# **House Price Prediction Using Machine Learning**

**Data Loading**

In [None]:
import zipfile
zip_path="/content/archive.zip"
with zipfile.ZipFile(zip_path,"r") as f:
  f.extractall("/content")
print("Unzipped successfully")

Unzipped successfully


In [None]:
import pandas as pd
data=pd.read_csv("train (1).csv")
data.head()

index,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000


**Data Cleaning**

In [None]:
missing=data.isnull().sum()
missing[missing>0].sort_values(ascending=False).head(15)

Column,Missing values
PoolQC,1453
MiscFeature,1406
Alley,1369
Fence,1179
MasVnrType,872
FireplaceQu,690
LotFrontage,259
GarageType,81
GarageYrBlt,81
GarageFinish,81


In [None]:
data=data.drop(['PoolQC','MiscFeature','Alley','Fence'],axis=1)

In [None]:
data.shape

(1460, 77)

In [None]:
num_cols=data.select_dtypes(include=['int64','float64']).columns
cat_cols=data.select_dtypes(include=['object']).columns
print("Numerical Columns:",len(num_cols))
print("Categorical Columns:",len(cat_cols))

Numerical Columns: 38
Categorical Columns: 39


In [None]:
data[num_cols]=data[num_cols].fillna(data[num_cols].median())
data[cat_cols]=data[cat_cols].fillna("None")

In [None]:
data.isnull().sum().sum()

np.int64(0)
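
The medians above are computed over all 1,460 rows, before the train/test split, so the filled values also reflect rows that later end up in the test set. A minimal sketch of the alternative, where the median comes from the training rows only, is shown below. The toy frame and the names `toy`, `train`, `test`, and `train_median` are made up for illustration and are not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for this example.
toy = pd.DataFrame({"LotFrontage": [60.0, None, 80.0, None, 100.0]})

# Pretend the first three rows are the training split and the rest the test split.
train, test = toy.iloc[:3], toy.iloc[3:]

# Compute the fill value from the training rows only.
train_median = train["LotFrontage"].median()

# Apply the same training median to both splits.
train_filled = train.fillna({"LotFrontage": train_median})
test_filled = test.fillna({"LotFrontage": train_median})

print(train_median)
print(test_filled["LotFrontage"].tolist())
```

In the notebook itself this would mean splitting first and imputing afterwards; whether that changes the reported MAE noticeably has not been checked here.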

In [None]:
data_encoded=pd.get_dummies(data,drop_first=True)
data_encoded.shape

(1460, 249)
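
To make the jump from 77 to 249 columns easier to follow, here is a small sketch of what `pd.get_dummies(..., drop_first=True)` does to a single categorical column. The toy frame is made up; only the column names are borrowed from the dataset.

```python
import pandas as pd

# Hypothetical toy frame; the values are invented for this example.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "Street": ["Pave", "Grvl", "Pave"],
})

# Each text column with k distinct values becomes k-1 indicator columns,
# because drop_first=True drops the first category in sorted order ("Grvl" here).
toy_encoded = pd.get_dummies(toy, drop_first=True)

print(list(toy_encoded.columns))
```

Numeric columns such as `LotArea` pass through unchanged, which is why `SalePrice` can still be selected by name from `data_encoded` afterwards.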

**Model Training**

In [None]:
X=data_encoded.drop("SalePrice", axis=1)
y=data_encoded["SalePrice"]
X.shape,y.shape


((1460, 248), (1460,))

In [None]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)
X_train.shape,X_test.shape


((1168, 248), (292, 248))

In [None]:
from sklearn.linear_model import LinearRegression
model=LinearRegression()
model.fit(X_train, y_train)


In [None]:
y_pred=model.predict(X_test)
y_pred[:5]


array([157533.81870222, 355093.19763686,  86855.90572094, 179918.0285567 ,
       321013.27883471])

**Model Evaluation**

In [None]:
from sklearn.metrics import mean_absolute_error
mae=mean_absolute_error(y_test, y_pred)
mae


20668.615426176406
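
MAE weights every dollar of error equally. RMSE, which this notebook does not compute, penalizes large misses more heavily and is often reported alongside it. A minimal sketch on made-up numbers follows; `y_true_demo` and `y_pred_demo` are hypothetical and are not results from this model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up prices for illustration only.
y_true_demo = np.array([200000.0, 150000.0, 300000.0])
y_pred_demo = np.array([210000.0, 140000.0, 270000.0])

# MAE: mean of the absolute errors.
mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)

# RMSE: square root of the mean squared error.
rmse_demo = np.sqrt(mean_squared_error(y_true_demo, y_pred_demo))

print(mae_demo, rmse_demo)
```

On the notebook's own data the same pattern would be `np.sqrt(mean_squared_error(y_test, y_pred))`. RMSE is never smaller than MAE, and a large gap between the two suggests a few houses with very large errors.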

In [None]:

one_house=X_test.iloc[0:1]
predicted_price=model.predict(one_house)
actual_price=y_test.iloc[0]
predicted_price,actual_price


(array([157533.81870222]), np.int64(154500))

In [None]:
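import numpy as np
# Fit on log(SalePrice) so the model minimizes relative rather than absolute errors.
y_log=np.log(y)
# Same test_size and random_state as the first split, so the test rows line up with y_test.
X_train_l,X_test_l,y_train_l,y_test_l=train_test_split(X,y_log,test_size=0.2,random_state=42)
model_log=LinearRegression()
model_log.fit(X_train_l, y_train_l)
y_pred_log=model_log.predict(X_test_l)
# Map predictions back to the original price scale before scoring.
y_pred_exp=np.exp(y_pred_log)
mae_log=mean_absolute_error(y_test,y_pred_exp)
mae_log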


15476.489127404127
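
The manual `np.log` / `np.exp` round trip above can also be expressed with scikit-learn's `TransformedTargetRegressor`, which applies the transform during `fit` and the inverse during `predict`. The sketch below uses synthetic data, not the housing data, and `X_demo`, `y_demo`, and `ttr` are hypothetical names.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration: the target grows exponentially with the feature,
# so log(y) is exactly linear in X.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(50, 1))
y_demo = np.exp(10 + 2 * X_demo[:, 0])

# fit() trains LinearRegression on log(y); predict() returns exp(prediction).
ttr = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
ttr.fit(X_demo, y_demo)
pred_demo = ttr.predict(X_demo)

print(np.allclose(pred_demo, y_demo))
```

Wrapping the model this way keeps predictions on the original price scale, so a metric such as `mean_absolute_error(y_test, ttr.predict(X_test))` needs no manual back-transform.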

**Conclusion**

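A Linear Regression model was trained to predict house prices from cleaned, one-hot-encoded housing data.
On the 292-row test set, the MAE was about 20,669 when predicting SalePrice directly and about 15,476 when fitting on log(SalePrice) and converting predictions back, giving a simple baseline for further models.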
