# ***Comparative Analysis of Machine Learning Models for Accurate House Price Prediction*** 

This project outlines the process of building a machine learning pipeline to predict housing prices using the `USA_Housing` dataset. The pipeline includes data preprocessing, training multiple models, saving artifacts, and preparing the solution for deployment on the cloud.

---

### 1. **OVERVIEW OF THE WORKFLOW**

1. **Data Loading and Cleaning**: Load the dataset and drop rows with missing values.
2. **Data Preprocessing**: Select features, split into training and testing sets, and scale the features.
3. **Model Training**: Train three regression models: Decision Tree Regressor, Support Vector Regressor, and Gradient Boosting Regressor.
4. **Model Evaluation**: Compute the root mean squared error (RMSE) on the test set to compare the models.
5. **Save Artifacts**: Save trained models and scalers for deployment.
6. **Deployment Preparation**: Test predictions with dummy data for deployment validation.

---

### 2. **CODE IMPLEMENTATION**

**IMPORTING THE NECESSARY LIBRARIES**

In [13]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import joblib

**LOADING AND CLEANING THE DATASET**

In [14]:
# Load Dataset
data = pd.read_csv('USA_Housing.csv')

In [15]:
# Drop rows with missing values
data.dropna(inplace=True)
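
Before dropping rows, it can help to see how many values are actually missing, so the effect of `dropna` is visible. The following is a minimal sketch on a small hypothetical frame; the column names and values are illustrative and are not taken from `USA_Housing.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
toy = pd.DataFrame({
    "income": [60000.0, np.nan, 72000.0, 58000.0],
    "rooms": [6.5, 7.0, np.nan, 5.8],
    "price": [1.1e6, 1.3e6, 1.5e6, 0.9e6],
})

# Count missing values per column before removing anything.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Drop incomplete rows and report how many were removed.
cleaned = toy.dropna()
rows_dropped = len(toy) - len(cleaned)
print(f"Dropped {rows_dropped} of {len(toy)} rows")
```

If a large share of rows would be dropped, imputing values may be a better choice than deleting them.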

In [16]:
# Features and Target
X = data.drop(['Price', 'Address'], axis=1)
y = data['Price']

**DATA PREPROCESSING**

In [17]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

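# Scale data: fit the scaler on the training set only, then reuse it on the
# test set so that no test-set statistics leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)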


In [18]:
# Save Scaler
joblib.dump(scaler, 'scaler.pkl')

['scaler.pkl']

**MODEL TRAINING AND SAVING**

In [19]:
# Model 1: Decision Tree Regressor
dt = DecisionTreeRegressor()
dt.fit(X_train, y_train)
joblib.dump(dt, 'decision_tree_model.pkl')

['decision_tree_model.pkl']

In [20]:
# Model 2: Support Vector Regressor
svr = SVR()
svr.fit(X_train, y_train)
joblib.dump(svr, 'svm_model.pkl')

['svm_model.pkl']

In [21]:
# Model 3: Gradient Boosting Regressor
gbr = GradientBoostingRegressor()
gbr.fit(X_train, y_train)
joblib.dump(gbr, 'gradient_boosting_model.pkl')

['gradient_boosting_model.pkl']
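
Saving the scaler and each model as separate files means the serving code has to apply them in the right order. One alternative, sketched here under assumptions, is to bundle both steps in a scikit-learn `Pipeline` and persist a single artifact. The synthetic data and the file name `gbr_pipeline.pkl` are hypothetical stand-ins and are not part of the original notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training split (5 features, like the notebook).
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=1000.0, scale=200.0, size=(200, 5))
y_demo = X_demo @ np.array([3.0, 1.0, 2.0, 0.5, 4.0]) + rng.normal(size=200)

# Scaling and regression live in one object, so they cannot get out of sync.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", GradientBoostingRegressor(random_state=42)),
])
pipeline.fit(X_demo, y_demo)

# Persist and reload a single artifact (hypothetical file name).
artifact_path = os.path.join(tempfile.mkdtemp(), "gbr_pipeline.pkl")
joblib.dump(pipeline, artifact_path)
restored = joblib.load(artifact_path)

# The restored pipeline accepts raw, unscaled rows directly.
raw_rows = X_demo[:3]
same_predictions = np.allclose(pipeline.predict(raw_rows), restored.predict(raw_rows))
print(same_predictions)
```

With this layout, the deployment code only needs one `joblib.load` call and never handles the scaler explicitly.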

**MODEL EVALUATION**

In [22]:
# Evaluation
models = {'Decision Tree': dt, 'SVR': svr, 'Gradient Boosting': gbr}
for name, model in models.items():
    predictions = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    print(f'{name} RMSE: {rmse}')

print("Models and scaler saved successfully!")

Decision Tree RMSE: 182039.3704965961
SVR RMSE: 350941.5597254943
Gradient Boosting RMSE: 109476.65761496617
Models and scaler saved successfully!
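
The SVR error above is far larger than that of the other two models. One plausible cause (an assumption, not verified against this dataset) is that `SVR` with its default `C=1.0` and `epsilon=0.1` is fit on an unscaled target measured in dollars, which keeps its predictions close to a constant. The sketch below standardizes the target with `TransformedTargetRegressor` on synthetic data; the coefficients and the price-like target are made up for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic, already-standardized features and a dollar-scale target
# (illustrative only, not the USA_Housing data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 5))
weights = np.array([230_000, 160_000, 120_000, 3_000, 150_000])
y_demo = 1_200_000 + X_demo @ weights + rng.normal(scale=100_000, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Baseline: default SVR fit directly on the raw target.
plain_svr = SVR().fit(X_tr, y_tr)
plain_rmse = np.sqrt(mean_squared_error(y_te, plain_svr.predict(X_te)))

# Same estimator, but the target is standardized for fitting and
# transformed back to dollars automatically at prediction time.
scaled_svr = TransformedTargetRegressor(regressor=SVR(), transformer=StandardScaler())
scaled_svr.fit(X_tr, y_tr)
scaled_rmse = np.sqrt(mean_squared_error(y_te, scaled_svr.predict(X_te)))

print(f"Plain SVR RMSE:         {plain_rmse:,.0f}")
print(f"Target-scaled SVR RMSE: {scaled_rmse:,.0f}")
```

Tuning `C`, `epsilon`, and the kernel would likely matter as well; the point here is only that the default SVR is not a fair baseline when the target is left in raw dollars.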


**TESTING PREDICTIONS WITH DUMMY DATA**

In [23]:
import numpy as np
import joblib

# Dummy Data: avg_area_income, avg_area_house_age, avg_area_rooms, avg_area_bedrooms, area_population
dummy_data = [
    [85.0, 20.0, 7.0, 3.0, 150.0],
    [90.5, 25.0, 8.0, 3.5, 200.0],
    [75.0, 15.0, 6.5, 2.5, 120.0],
    [92.0, 30.0, 9.0, 4.0, 180.0],
    [80.5, 18.0, 7.5, 3.0, 160.0]
]

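# Load the pre-trained model
model = joblib.load('gradient_boosting_model.pkl')

# Predict house prices for dummy data
# Caveat: the model was trained on StandardScaler-transformed features, but these
# rows are passed in raw, without applying the saved 'scaler.pkl', so the outputs
# below are not meaningful price estimates.
for i, row in enumerate(dummy_data):  # 'row' avoids overwriting the 'data' DataFrame
    prediction = model.predict([row])
    print(f"Prediction for row {i+1}: ${prediction[0]:,.2f}")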


Prediction for row 1: $2,864,219.79
Prediction for row 2: $2,864,219.79
Prediction for row 3: $2,864,219.79
Prediction for row 4: $2,864,219.79
Prediction for row 5: $2,864,219.79
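
All five rows above receive the same prediction. A likely explanation is that the models were trained on `StandardScaler`-transformed features, while the dummy rows are passed to `predict` raw. Every row then lies far outside the range seen in training, and a tree-based model returns the same leaf values for all of them. The sketch below shows the intended order of operations: load the scaler, transform, then predict. It trains on synthetic data and writes to a temporary directory so that it is self-contained; the feature ranges, coefficients, and file locations are assumptions, not the real dataset or artifacts.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic training data with two features on very different scales.
rng = np.random.default_rng(1)
X_raw = np.column_stack([
    rng.normal(68_000, 10_000, size=300),  # income-like feature
    rng.normal(6.0, 1.0, size=300),        # house-age-like feature
])
y_demo = 20.0 * X_raw[:, 0] + 150_000.0 * X_raw[:, 1] + rng.normal(scale=10_000, size=300)

# Training side: fit the scaler, train on scaled features, save both artifacts.
workdir = tempfile.mkdtemp()
scaler_path = os.path.join(workdir, "scaler.pkl")
model_path = os.path.join(workdir, "gradient_boosting_model.pkl")
scaler_demo = StandardScaler().fit(X_raw)
model_demo = GradientBoostingRegressor(random_state=42)
model_demo.fit(scaler_demo.transform(X_raw), y_demo)
joblib.dump(scaler_demo, scaler_path)
joblib.dump(model_demo, model_path)

# Deployment side: load both artifacts and scale new rows before predicting.
loaded_scaler = joblib.load(scaler_path)
loaded_model = joblib.load(model_path)
new_rows = np.array([
    [55_000.0, 5.0],
    [68_000.0, 6.0],
    [82_000.0, 7.5],
])
scaled_predictions = loaded_model.predict(loaded_scaler.transform(new_rows))

# For contrast: skipping the scaler sends every row to the same extreme leaves.
raw_predictions = loaded_model.predict(new_rows)

for i, price in enumerate(scaled_predictions, start=1):
    print(f"Prediction for row {i}: ${price:,.2f}")
print("Distinct unscaled predictions:", len(set(np.round(raw_predictions, 2))))
```

The dummy rows themselves should also use values on the same scale as the training columns; otherwise even correctly scaled inputs fall outside the range the model has seen.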


---