# **Predicting House Prices Using Machine Learning**  


### **1. Introduction**  
This project predicts house prices with three machine learning models: **Linear Regression**, **Random Forest**, and **SVM**. The models are compared on **RMSE (Root Mean Squared Error)**, and the one with the lowest RMSE is selected as the best performer.  


### **2. Data Description**  
The dataset includes columns such as:  
- **LotArea:** Lot size in square feet.  
- **TotalBsmtSF:** Total basement area in square feet.  
- **SalePrice:** The target variable, the sale price of the house.  

#### **Data Cleaning**
- **Missing Values:**  
   - `SalePrice` missing values were replaced with the mean.  
   - Rows with missing values in other features were dropped.  
- **Duplicate Entries:** Removed to ensure data quality.  
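
The cleaning steps above could be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the function name `clean_housing_data`, the order of the steps, and the toy rows are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_housing_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning steps to a copy of df (hypothetical helper)."""
    cleaned = df.copy()
    # Replace missing SalePrice values with the column mean.
    cleaned["SalePrice"] = cleaned["SalePrice"].fillna(cleaned["SalePrice"].mean())
    # Drop rows that still have missing values in any other column.
    cleaned = cleaned.dropna()
    # Remove exact duplicate rows.
    cleaned = cleaned.drop_duplicates()
    return cleaned.reset_index(drop=True)


# Made-up rows for illustration only (not the project's data).
toy = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, np.nan, 9600.0, 11250.0],
    "TotalBsmtSF": [856.0, 1262.0, 920.0, 1262.0, 756.0],
    "SalePrice": [208500.0, 181500.0, 223500.0, 181500.0, np.nan],
})
cleaned_toy = clean_housing_data(toy)
print(cleaned_toy)
```

In this toy frame, the row with a missing `LotArea` is dropped, the missing `SalePrice` is filled with the mean of the observed prices, and one of the two identical rows is removed.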


### **3. Exploratory Data Analysis (EDA)**  
EDA helped uncover key insights:  

- **SalePrice Distribution:** Right-skewed, with a long tail of high-value properties.  
- **Outliers:** Significant outliers were detected in `LotArea`, `TotalBsmtSF`, and `SalePrice`. They were removed using the **IQR (Interquartile Range)** method to improve model performance.  
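
As a rough sketch of IQR-based filtering, the snippet below keeps only rows that fall inside `[Q1 - k*IQR, Q3 + k*IQR]` for each listed column. The multiplier `k=1.5` is the conventional choice and is assumed here, since the project does not state the threshold it used; the helper name `remove_iqr_outliers` and the toy data are also illustrative.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Keep rows within [Q1 - k*IQR, Q3 + k*IQR] for every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        # A row survives only if it is inside the bounds for every column.
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Made-up rows for illustration only; the last LotArea is an obvious outlier.
toy = pd.DataFrame({
    "LotArea": [8000, 8500, 9000, 9500, 10000, 200000],
    "SalePrice": [150000, 160000, 170000, 180000, 190000, 200000],
})
filtered = remove_iqr_outliers(toy, ["LotArea", "SalePrice"])
print(filtered)
```

Bounds are computed once per column on the unfiltered frame, so the result does not depend on the order in which columns are listed.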


### **4. Data Scaling**  
To put the variables on a common scale, the numerical columns `LotArea`, `TotalBsmtSF`, and `SalePrice` were standardized to zero mean and unit variance using **StandardScaler**. Scaling matters most for **SVM**, which is sensitive to feature magnitudes; for **Linear Regression** it mainly makes the coefficients directly comparable.
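
A minimal sketch of the scaling step with scikit-learn's `StandardScaler` is shown below, using made-up values. It also reads back the standard deviation the scaler learned for `SalePrice`, which is what one would multiply a standardized error by to express it in price units again. Fitting the scaler on the whole frame is an assumption made for brevity.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows for illustration only (not the project's data).
toy = pd.DataFrame({
    "LotArea": [8000.0, 9000.0, 10000.0, 11000.0],
    "TotalBsmtSF": [800.0, 900.0, 1000.0, 1100.0],
    "SalePrice": [150000.0, 170000.0, 190000.0, 210000.0],
})
numeric_cols = ["LotArea", "TotalBsmtSF", "SalePrice"]

scaler = StandardScaler()
scaled = toy.copy()
# Each column is transformed to (x - mean) / std, using the population std.
scaled[numeric_cols] = scaler.fit_transform(toy[numeric_cols])

# Standard deviation learned for SalePrice; one standardized unit equals
# this many price units.
price_std = scaler.scale_[numeric_cols.index("SalePrice")]
print(scaled)
print(price_std)
```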


### **5. Modeling**  
Three models were implemented:  

- **Linear Regression:** Simple and interpretable.  
- **Random Forest:** Robust, and less sensitive to outliers than the other two models.  
- **SVM:** Support vector regression, which can capture non-linear patterns.  
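
The three models could be trained and compared along the following lines. This is a sketch under several assumptions that the project does not specify: synthetic stand-in data, an 80/20 train/test split, default hyperparameters, and an RBF kernel for the SVM (scikit-learn's `SVR`).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic, already-standardized stand-in data (not the project's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Assumed 80/20 split; the project does not state its split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "SVM": SVR(kernel="rbf"),
}

rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # RMSE is the square root of the mean squared error on held-out data.
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

for name, score in rmse_by_model.items():
    print(f"{name}: RMSE = {score:.3f}")
```

The numbers printed by this sketch come from synthetic data and are not comparable to the project's reported results.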


### **6. Results**  
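The models were evaluated using **RMSE** (lower is better). Because `SalePrice` was standardized, the values below are in standardized units (standard deviations of `SalePrice`), not in currency:  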

- **Linear Regression:** **0.87**  
- **Random Forest:** **0.58** (Best Model)  
- **SVM:** **0.82**  
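
For reference, RMSE is the square root of the mean of the squared prediction errors. A small NumPy sketch with a hand-checkable example follows; the helper name `rmse` is illustrative and not taken from the project's code.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Errors of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5).
example = rmse([0.0, 0.0], [3.0, 4.0])
print(example)
```

Because squaring weights large errors more heavily, a few badly mispredicted houses can dominate the score.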


### **7. Conclusion**  
The **Random Forest** model achieved the lowest RMSE (0.58, versus 0.82 for SVM and 0.87 for Linear Regression), making it the most accurate of the three models in this project.  

Hyperparameter tuning or additional engineered features could improve performance further.
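
One possible way to approach the hyperparameter tuning mentioned above is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` on synthetic stand-in data; the parameter grid, the 3-fold setting, and the data are all assumptions chosen to keep the example small, not recommendations derived from this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not the project's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 2))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=120)

# Small illustrative grid; real searches usually cover more values.
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence the sign
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_cv_rmse = -search.best_score_  # flip the sign back to a positive RMSE
print(best_params, round(best_cv_rmse, 3))
```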
