Task-1:
Implement a linear regression model to predict the prices of houses based on their square footage and the number of bedrooms and bathrooms.

Data used: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data

Importing the necessary libraries and loading the data:

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the Kaggle training set (includes SalePrice) and test set (no SalePrice)
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

Data Exploration:

In [2]:
train.head()

,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000


In [3]:
test.head()

,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
0,1461,20,RH,80.0,11622,Pave,,Reg,Lvl,AllPub,...,120,0,,MnPrv,,0,6,2010,WD,Normal
1,1462,20,RL,81.0,14267,Pave,,IR1,Lvl,AllPub,...,0,0,,,Gar2,12500,6,2010,WD,Normal
2,1463,60,RL,74.0,13830,Pave,,IR1,Lvl,AllPub,...,0,0,,MnPrv,,0,3,2010,WD,Normal
3,1464,60,RL,78.0,9978,Pave,,IR1,Lvl,AllPub,...,0,0,,,,0,6,2010,WD,Normal
4,1465,120,RL,43.0,5005,Pave,,IR1,HLS,AllPub,...,144,0,,,,0,1,2010,WD,Normal


In [4]:
train.isna().sum()

Id                 0
MSSubClass         0
MSZoning           0
LotFrontage      259
LotArea            0
                ... 
MoSold             0
YrSold             0
SaleType           0
SaleCondition      0
SalePrice          0
Length: 81, dtype: int64

In [5]:
test.isna().sum()

Id                 0
MSSubClass         0
MSZoning           4
LotFrontage      227
LotArea            0
                ... 
MiscVal            0
MoSold             0
YrSold             0
SaleType           1
SaleCondition      0
Length: 80, dtype: int64
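The model below uses only six columns, and `LinearRegression` rejects any input containing NaN, so what matters is whether those particular columns have gaps (the summaries above are truncated and do not show them). A minimal sketch of checking just the selected features and, if needed, filling gaps with the training-set median before predicting. `missing_in_features`, `fill_with_train_median` and the toy DataFrames are hypothetical stand-ins for `train` and `test`; median imputation is one simple choice, not the only one.

```python
import numpy as np
import pandas as pd

FEATURES = ['GrLivArea', 'BedroomAbvGr', 'BsmtFullBath',
            'BsmtHalfBath', 'FullBath', 'HalfBath']

def missing_in_features(df: pd.DataFrame, features=FEATURES) -> pd.Series:
    """Count missing values in the selected feature columns only."""
    return df[features].isna().sum()

def fill_with_train_median(train_df: pd.DataFrame, other_df: pd.DataFrame,
                           features=FEATURES) -> pd.DataFrame:
    """Fill NaNs in other_df[features] using medians computed on train_df.

    Using training-set statistics avoids leaking information from the
    rows being predicted.
    """
    medians = train_df[features].median()
    return other_df[features].fillna(medians)  # fills column by column

# Hypothetical toy data standing in for train.csv / test.csv
toy_train = pd.DataFrame({
    'GrLivArea': [1500, 1200, 1800], 'BedroomAbvGr': [3, 3, 3],
    'BsmtFullBath': [1, 0, 1], 'BsmtHalfBath': [0, 1, 0],
    'FullBath': [2, 2, 2], 'HalfBath': [1, 0, 1],
})
toy_test = pd.DataFrame({
    'GrLivArea': [900, 1300], 'BedroomAbvGr': [2, 3],
    'BsmtFullBath': [np.nan, 0], 'BsmtHalfBath': [0, np.nan],
    'FullBath': [1, 1], 'HalfBath': [0, 1],
})

print(missing_in_features(toy_test))
filled = fill_with_train_median(toy_train, toy_test)
print(filled)
```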

In [6]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallC

Train-Test Split:

In [7]:
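# Features: above-grade living area (sq ft), bedrooms above grade, and bathroom counts
X = train[['GrLivArea', 'BedroomAbvGr', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath']]
Y = train['SalePrice']
# Hold out 20% of the labelled training data for evaluation
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)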
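With a fractional `test_size`, scikit-learn rounds the test size up, so for the 1460 rows reported by `train.info()` the hold-out split should contain 292 rows and the fitting split 1168. A quick check on a placeholder array of the same length (`X_dummy` and `y_dummy` are not the real data), which also shows that a fixed `random_state` reproduces the same split and that no row lands in both halves:

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 1460                                # row count reported by train.info()
X_dummy = np.arange(n_rows).reshape(-1, 1)   # placeholder features (row ids)
y_dummy = np.arange(n_rows)                  # placeholder target

X_tr, X_te, y_tr, y_te = train_test_split(
    X_dummy, y_dummy, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)  # (1168, 1) (292, 1)

# Same random_state -> same shuffled split on every run
X_tr2, X_te2, _, _ = train_test_split(
    X_dummy, y_dummy, test_size=0.2, random_state=42
)
print(np.array_equal(X_te, X_te2))  # True

# No row id appears in both splits
print(len(np.intersect1d(X_tr.ravel(), X_te.ravel())))  # 0
```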


Implementation of Linear Regression Model:

In [8]:
# Fit ordinary least squares on the training split, then predict the held-out split
model = LinearRegression()
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)
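`LinearRegression` fits one coefficient per feature plus an intercept by ordinary least squares, so each prediction is `intercept_ + X @ coef_`. A sketch on synthetic, noise-free data (the feature values and the "true" coefficients are made up, not taken from the Kaggle data) shows that the fitted model recovers the generating parameters and that `predict` is exactly that linear combination. On the real model, `dict(zip(X.columns, model.coef_))` would list the per-feature effects in dollars per unit.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ['GrLivArea', 'BedroomAbvGr', 'BsmtFullBath',
            'BsmtHalfBath', 'FullBath', 'HalfBath']

rng = np.random.default_rng(0)
# Hypothetical feature values (not drawn from the Kaggle data)
X_syn = pd.DataFrame(rng.uniform(0, 5, size=(50, 6)), columns=FEATURES)
X_syn['GrLivArea'] = rng.uniform(500, 3000, size=50)

# Made-up "true" parameters used to generate a noise-free target
true_coef = np.array([100.0, -5000.0, 8000.0, 2000.0, 15000.0, 6000.0])
true_intercept = 20000.0
y_syn = true_intercept + X_syn.to_numpy() @ true_coef

lr = LinearRegression().fit(X_syn, y_syn)

# One fitted coefficient per feature, in column order
for name, c in zip(FEATURES, lr.coef_):
    print(f"{name:>13}: {c:10.2f}")
print(f"{'intercept':>13}: {lr.intercept_:10.2f}")

# predict() is just intercept_ + X @ coef_
manual = lr.intercept_ + X_syn.to_numpy() @ lr.coef_
print(np.allclose(manual, lr.predict(X_syn)))
```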


Calculation of model metrics:

In [9]:
mse = mean_squared_error(Y_test, Y_pred)
print('Mean Squared Error:', float(mse))
# RMSE is in the same units as SalePrice (dollars)
rmse = np.sqrt(mse)
print('Root Mean Squared Error:', float(rmse))

Mean Squared Error: 2609094001.2966046
Root Mean Squared Error: 51079.291315528295
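MSE averages squared dollar errors, so its unit is dollars squared; RMSE takes the square root to return to dollars, which is why the RMSE of roughly $51,000 above is the easier number to interpret. A sketch computing both by hand and checking them against scikit-learn, with `r2_score` added as a scale-free companion metric. The arrays are made-up values for illustration, not output of the model above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up sale prices and predictions (dollars), for illustration only
y_true = np.array([200_000, 150_000, 300_000, 250_000], dtype=float)
y_pred = np.array([210_000, 140_000, 280_000, 260_000], dtype=float)

errors = y_pred - y_true
mse_manual = np.mean(errors ** 2)   # mean of squared errors (dollars^2)
rmse_manual = np.sqrt(mse_manual)   # back in dollars

print(mse_manual)                                                   # 175000000.0
print(round(rmse_manual, 2))                                        # 13228.76
print(np.isclose(mse_manual, mean_squared_error(y_true, y_pred)))  # True
print(round(r2_score(y_true, y_pred), 3))                           # 0.944
```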


Example prediction:

In [10]:
def predict_house_price(GrLivArea, BedroomAbvGr, BsmtFullBath, BsmtHalfBath, FullBath, HalfBath):
    # Build a one-row DataFrame with the same columns (and order) used to train the model
    new_house = pd.DataFrame({
        'GrLivArea': [GrLivArea],
        'BedroomAbvGr': [BedroomAbvGr],
        'BsmtFullBath': [BsmtFullBath],
        'BsmtHalfBath': [BsmtHalfBath],
        'FullBath': [FullBath],
        'HalfBath': [HalfBath]
    })
    # predict() returns a 1-element array; take that element before converting to float
    predicted_price = model.predict(new_house)
    return float(predicted_price[0])
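`model.predict` always returns a 1-D array, even for a single row, so the scalar has to be pulled out with `[0]`; recent NumPy versions warn when `float()` is applied directly to a one-element array. A variant that takes the fitted model as an argument instead of reading the global `model` is easier to reuse and test. In this sketch, `predict_price`, `toy_model` and the toy training data are hypothetical, used only to have a fitted model to call.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ['GrLivArea', 'BedroomAbvGr', 'BsmtFullBath',
            'BsmtHalfBath', 'FullBath', 'HalfBath']

def predict_price(fitted_model, **house) -> float:
    """Predict one house's price; keyword names must match FEATURES."""
    missing = set(FEATURES) - set(house)
    if missing:
        raise ValueError(f"missing features: {sorted(missing)}")
    # One-row DataFrame with columns in the same order used for fitting
    row = pd.DataFrame([[house[f] for f in FEATURES]], columns=FEATURES)
    return float(fitted_model.predict(row)[0])  # take the single element

# Hypothetical noise-free toy data: price = 10000 + 120*area + 5000*full baths
rng = np.random.default_rng(1)
X_toy = pd.DataFrame(rng.integers(0, 4, size=(30, 6)), columns=FEATURES)
X_toy['GrLivArea'] = rng.integers(600, 3000, size=30)
y_toy = 10_000 + 120 * X_toy['GrLivArea'] + 5_000 * X_toy['FullBath']
toy_model = LinearRegression().fit(X_toy, y_toy)

price = predict_price(toy_model, GrLivArea=1500, BedroomAbvGr=3,
                      BsmtFullBath=1, BsmtHalfBath=0, FullBath=2, HalfBath=1)
print(round(price, 2))
```

Passing features as keywords means a forgotten argument fails with a clear message instead of producing a DataFrame with the wrong shape.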

In [12]:
GrLivArea = int(input("Enter the living area (sqft): "))
BedroomAbvGr = int(input("Enter the number of bedrooms: "))
BsmtFullBath = int(input("Enter the number of full bathrooms in the basement: "))
BsmtHalfBath = int(input("Enter the number of half bathrooms in the basement: "))
FullBath = int(input("Enter the number of full bathrooms: "))
HalfBath = int(input("Enter the number of half bathrooms: "))

predicted_price = predict_house_price(GrLivArea, BedroomAbvGr, BsmtFullBath, BsmtHalfBath, FullBath, HalfBath)
print("Predicted Price: $", predicted_price)

Predicted Price: $ 432009.6326260981
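`int(input(...))` raises `ValueError` as soon as the user types something like `2.5` or an empty line, which stops the cell. A small re-prompting helper is one way to handle that. `read_count` and its `input_fn` and `max_tries` parameters are hypothetical names; `input_fn` exists only so the helper can be exercised without a keyboard.

```python
from typing import Callable, Iterator

def read_count(prompt: str, input_fn: Callable[[str], str] = input,
               max_tries: int = 3) -> int:
    """Ask until the reply parses as a non-negative integer."""
    for _ in range(max_tries):
        reply = input_fn(prompt).strip()
        try:
            value = int(reply)
        except ValueError:
            print(f"'{reply}' is not a whole number, try again.")
            continue
        if value < 0:
            print("Please enter a value of 0 or more.")
            continue
        return value
    raise ValueError(f"no valid answer after {max_tries} attempts")

# Simulated keyboard input for a non-interactive run
replies: Iterator[str] = iter(["two", "-1", "3"])
bedrooms = read_count("Enter the number of bedrooms: ",
                      input_fn=lambda _prompt: next(replies))
print(bedrooms)  # 3
```

In the notebook, each `int(input(...))` line could then become, for example, `BedroomAbvGr = read_count("Enter the number of bedrooms: ")`.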


