<a href="https://colab.research.google.com/github/AyushMishra504/predict-house-prices/blob/main/training_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [24]:
import sklearn
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import joblib

In [18]:
df = pd.read_csv('/content/sample_data/processed.csv')

In [20]:
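try:
  # Separate the input features (X) from the target column (y)
  X = df.drop(columns=['price'])
  y = df['price']
except KeyError:
  # Raised when df has no 'price' column
  print("Column 'price' not found in df")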

Dropping the price column does not change df itself: drop returns a new DataFrame, so df still contains price. The step only ensures that price is not mistakenly used as an input feature during training.
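To make the point about drop concrete, here is a minimal sketch on a made-up three-row table. The names toy, X_toy and y_toy, and the area and bedrooms columns, are hypothetical stand-ins; the real columns of processed.csv are not shown in this notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for processed.csv
toy = pd.DataFrame({
    "area": [1000, 1500, 2000],
    "bedrooms": [2, 3, 4],
    "price": [200000, 300000, 400000],
})

# drop() returns a new DataFrame; toy itself is left untouched
X_toy = toy.drop(columns=["price"])
y_toy = toy["price"]

print(list(toy.columns))    # original still has 'price'
print(list(X_toy.columns))  # features no longer include 'price'
```

Because the original frame keeps its price column, re-running the cell is safe and does not raise an error.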

In [21]:
#splitting dataset into training and testing sets
#80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


X_train → features for 80% of the rows (used to train the model)

X_test → features for the remaining 20% of the rows (used to test the model)

y_train → target values (price) for the training rows

y_test → target values (price) for the test rows
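The four outputs can be checked on a small synthetic array. This is only a sketch: X_toy and y_toy are made-up data with 10 rows, chosen so that each feature row can be matched back to its target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows, 2 features; row i is [2*i, 2*i + 1] and its target is i
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)  # 8 training rows and 2 test rows
print(y_tr.shape, y_te.shape)

# Rows are shuffled, but each feature row stays paired with its own target
rows_aligned = bool((X_tr[:, 0] // 2 == y_tr).all())
print(rows_aligned)
```

Fixing random_state makes the shuffle reproducible, so the same rows land in the test set on every run.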

In [22]:
#training the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

#making predictions
y_pred = model.predict(X_test)

In [23]:
#computing and printing the evaluation metrics

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Absolute Error: {mae}")
print(f"Mean Squared Error: {mse}")
print(f"R² Score: {r2}")


Mean Absolute Error: 167422.50033431454
Mean Squared Error: 65548102000.552025
R² Score: 0.37475248028193986


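MAE (Mean Absolute Error) → Average absolute difference between predicted and actual prices, in the same units as price.

MSE (Mean Squared Error) → Average squared difference between predicted and actual prices, so large errors are penalized more heavily.

R² Score (R-Squared) → Fraction of the variance in price that the model explains. It is at most 1 and can be negative on test data. Here R² ≈ 0.37, so the model explains about 37% of the variance in the test-set prices.

R² = 1 → Perfect fit

R² = 0 → Model is no better than always predicting the mean price

R² < 0 → Model is worse than always predicting the mean price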
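The definitions above can be reproduced by hand with NumPy. The sketch below uses four made-up prices (y_true, y_hat and y_bad are hypothetical, not values from this dataset) and also shows the R² = 0 and negative R² cases. Taking the square root of the MSE gives the RMSE, which is back in price units; for the MSE printed above that is roughly 256,000.

```python
import numpy as np

# Hypothetical actual and predicted prices
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 320.0, 380.0])

def r2(actual, predicted):
    """R² = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of actual."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

mae = np.mean(np.abs(y_true - y_hat))   # average absolute error
mse = np.mean((y_true - y_hat) ** 2)    # average squared error
rmse = np.sqrt(mse)                     # same units as the target
r2_model = r2(y_true, y_hat)

# Always predicting the mean gives R² = 0
r2_baseline = r2(y_true, np.full_like(y_true, y_true.mean()))

# Predictions worse than the mean baseline give a negative R²
y_bad = np.array([400.0, 300.0, 200.0, 100.0])
r2_bad = r2(y_true, y_bad)

print(mae, mse, rmse)
print(r2_model, r2_baseline, r2_bad)
```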

In [25]:
#saving the trained model
joblib.dump(model, 'house_price_model.pkl')
print("Model saved successfully!")

Model saved successfully!
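A saved model can be loaded back with joblib.load and used for prediction. The sketch below runs the round trip on a tiny made-up model and a temporary file, so it does not touch house_price_model.pkl; toy_model, restored and the file name toy_model.pkl are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data following y = 2x + 1
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])
toy_model = LinearRegression().fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pkl")
    joblib.dump(toy_model, path)   # save, as in the cell above
    restored = joblib.load(path)   # load the model back from disk

# The restored model should give the same predictions as the original
same_predictions = bool(
    np.allclose(toy_model.predict(X_toy), restored.predict(X_toy))
)
print(same_predictions)
```

When loading the real model, the input passed to predict needs the same feature columns, in the same order, as the X used for training. Pickle-based files should only be loaded from trusted sources, and ideally with the same scikit-learn version that saved them.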
