### Objective: build a prediction model that estimates the price of a house from the given features

### Data Description

1. cid: Identifier (notation) for a house
2. dayhours: Date the house was sold
3. price: Sale price; the prediction target
4. room_bed: Number of bedrooms per house
5. room_bath: Number of bathrooms (fractional values such as 1.75 occur)
6. living_measure: Square footage of the home
7. lot_measure: Square footage of the lot
8. ceil: Total number of floors (levels) in the house
9. coast: Whether the house has a waterfront view (0 or 1)
10. sight: Has been viewed (values range from 0 to 4)
11. condition: Overall condition of the house (1 to 5)
12. quality: Grade given to the housing unit under the grading system
13. ceil_measure: Square footage of the house excluding the basement
14. basement_measure: Square footage of the basement (named basement in the CSV)
15. yr_built: Year the house was built
16. yr_renovated: Year the house was renovated
17. zipcode: ZIP code
18. lat: Latitude coordinate
19. long: Longitude coordinate
20. living_measure15: Living room area in 2015 (implying some renovations); this might or might not have affected the lot size
21. lot_measure15: Lot size in 2015 (implying some renovations)
22. furnished: Whether the house is furnished (0 or 1), based on the quality of the rooms
23. total_area: Combined measure of the living and lot areas
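The description lists 23 columns, while the CSV loaded below has 25. A quick set difference shows where the two disagree. This is a minimal sketch: both name lists are copied by hand from this notebook (the CSV list from the house.isnull().sum() output), not read from the file.

```python
# Column names as listed in the data description
documented = [
    "cid", "dayhours", "price", "room_bed", "room_bath", "living_measure",
    "lot_measure", "ceil", "coast", "sight", "condition", "quality",
    "ceil_measure", "basement_measure", "yr_built", "yr_renovated", "zipcode",
    "lat", "long", "living_measure15", "lot_measure15", "furnished", "total_area",
]

# Column names as reported by house.isnull().sum() for the CSV
in_csv = [
    "cid", "dayhours", "price", "room_bed", "room_bath", "living_measure",
    "lot_measure", "ceil", "coast", "sight", "condition", "quality",
    "ceil_measure", "basement", "yr_built", "yr_renovated", "zipcode", "City",
    "lat", "long", "living_measure15", "lot_measure15", "furnished", "Region",
    "total_area",
]

# Names present on one side only
documented_only = sorted(set(documented) - set(in_csv))
csv_only = sorted(set(in_csv) - set(documented))
print("Described but not in the CSV:", documented_only)
print("In the CSV but not described:", csv_only)
```

With the real data, replacing in_csv with list(house.columns) would make the check automatic.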


### Data Ingestion

In [1]:
import pandas as pd
import numpy as np
pd.options.display.max_columns=None
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import zscore
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Load the dataset, which includes the City and Region columns
house=pd.read_csv('innercityn.csv')
y=house['price']

In [3]:
# Shape and Size of the dataset
print("The shape of the dataset",house.shape)
print("The size of the dataset",house.size)

The shape of the dataset (21613, 25)
The size of the dataset 540325
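DataFrame.size counts every cell, so it is just rows times columns: 21613 × 25 = 540325. A tiny sketch with a hypothetical three-row frame (values copied from house.head()) makes the relationship explicit:

```python
import pandas as pd

# Toy frame standing in for `house`: 3 rows, 2 columns
df = pd.DataFrame({"price": [808100, 277500, 404000], "room_bed": [4, 4, 3]})

rows, cols = df.shape
# size is the total number of cells, i.e. rows * columns
print(df.shape, df.size)  # (3, 2) 6
assert df.size == rows * cols

# The same arithmetic reproduces the size reported for the full dataset
print(21613 * 25)  # 540325
```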


In [4]:
# Count the null values in each column
house.isnull().sum()

cid                 0
dayhours            0
price               0
room_bed            0
room_bath           0
living_measure      0
lot_measure         0
ceil                0
coast               0
sight               0
condition           0
quality             0
ceil_measure        0
basement            0
yr_built            0
yr_renovated        0
zipcode             0
City                0
lat                 0
long                0
living_measure15    0
lot_measure15       0
furnished           0
Region              0
total_area          0
dtype: int64
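No column contains nulls, but that does not rule out placeholder values: yr_renovated is 0 in every row shown by house.head() further down, and the summary statistics report a minimum of 0 for room_bed and room_bath. A minimal sketch, using a few columns copied from the first five rows of house.head(), counts zeros per column instead. Treating 0 in yr_renovated as "never renovated" is an assumption, not something the description states.

```python
import pandas as pd

# First five rows of a few columns, copied from house.head()
df = pd.DataFrame({
    "room_bed": [4, 4, 3, 2, 2],
    "room_bath": [3.25, 2.5, 2.5, 1.0, 1.5],
    "basement": [0, 800, 0, 0, 0],
    "yr_renovated": [0, 0, 0, 0, 0],
})

# isnull() finds nothing: zeros are real values, not NaN
print(df.isnull().sum().sum())  # 0

# Counting zeros per column exposes likely placeholder values
zero_counts = (df == 0).sum()
print(zero_counts)
```

On the full data, (house == 0).sum() would give the same per-column view.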

In [5]:
# Summary statistics of the numeric columns
house.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| cid | 21613.0 | 4580302000.0 | 2876566000.0 | 1000102.0 | 2123049000.0 | 3904930000.0 | 7308900000.0 | 9900000000.0 |
| price | 21613.0 | 540182.2 | 367362.2 | 75000.0 | 321950.0 | 450000.0 | 645000.0 | 7700000.0 |
| room_bed | 21613.0 | 3.370842 | 0.9300618 | 0.0 | 3.0 | 3.0 | 4.0 | 33.0 |
| room_bath | 21613.0 | 2.114757 | 0.7701632 | 0.0 | 1.75 | 2.25 | 2.5 | 8.0 |
| living_measure | 21613.0 | 2079.9 | 918.4409 | 290.0 | 1427.0 | 1910.0 | 2550.0 | 13540.0 |
| lot_measure | 21613.0 | 15106.97 | 41420.51 | 520.0 | 5040.0 | 7618.0 | 10688.0 | 1651359.0 |
| ceil | 21613.0 | 1.494309 | 0.5399889 | 1.0 | 1.0 | 1.5 | 2.0 | 3.5 |
| coast | 21613.0 | 0.007541757 | 0.0865172 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| sight | 21613.0 | 0.2343034 | 0.7663176 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 |
| condition | 21613.0 | 3.40943 | 0.650743 | 1.0 | 3.0 | 3.0 | 4.0 | 5.0 |
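The summary exposes extreme values: room_bed reaches 33 against a 75th percentile of 4, and lot_measure reaches 1,651,359 against a 75th percentile of 10,688. The zscore function imported in the first cell is one way to flag such points. The sketch below computes the same z-score by hand (population standard deviation, ddof=0, which matches the scipy.stats.zscore default) on hypothetical bedroom counts; the cut-off of 3 is an assumption, not a choice made in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical bedroom counts: mostly 3, plus one extreme value like the max of 33
room_bed = pd.Series([3] * 19 + [33])

# Z-score with population std (ddof=0), as scipy.stats.zscore does by default
z = (room_bed - room_bed.mean()) / room_bed.std(ddof=0)

# Flag values more than 3 standard deviations from the mean (assumed threshold)
outliers = room_bed[np.abs(z) > 3]
print(outliers)
```

Note that with few rows a single outlier inflates the standard deviation, so the z-score of an extreme value is capped; on 21,613 rows this is much less of a concern.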


In [6]:
#Checking the datatypes of the dataset
house.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 25 columns):
cid                 21613 non-null int64
dayhours            21613 non-null object
price               21613 non-null int64
room_bed            21613 non-null int64
room_bath           21613 non-null float64
living_measure      21613 non-null int64
lot_measure         21613 non-null int64
ceil                21613 non-null float64
coast               21613 non-null int64
sight               21613 non-null int64
condition           21613 non-null int64
quality             21613 non-null int64
ceil_measure        21613 non-null int64
basement            21613 non-null int64
yr_built            21613 non-null int64
yr_renovated        21613 non-null int64
zipcode             21613 non-null int64
City                21613 non-null object
lat                 21613 non-null float64
long                21613 non-null float64
living_measure15    21613 non-null int64
lot_measure15       
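house.info() shows that dayhours and City are stored as object (string) columns; non-numeric columns like these have to be parsed or encoded before they can feed most regression models. A minimal sketch on values copied from house.head() lists them with select_dtypes(exclude="number"), which also works if pandas stores the strings in its newer string dtype rather than object:

```python
import pandas as pd

# A few columns from house.head(), keeping their original types
df = pd.DataFrame({
    "dayhours": ["20141107T000000", "20141204T000000"],
    "price": [808100, 277500],
    "room_bath": [3.25, 2.5],
    "City": ["Seattle", "Federal Way"],
})

# Everything that is not numeric needs parsing or encoding before modelling
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['dayhours', 'City']
```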

In [7]:
house.head()

| | cid | dayhours | price | room_bed | room_bath | living_measure | lot_measure | ceil | coast | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | zipcode | City | lat | long | living_measure15 | lot_measure15 | furnished | Region | total_area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3034200666 | 20141107T000000 | 808100 | 4 | 3.25 | 3020 | 13457 | 1.0 | 0 | 0 | 5 | 9 | 3020 | 0 | 1956 | 0 | 98133 | Seattle | 47.7174 | -122.336 | 2120 | 7553 | 1 | North West | 16477 |
| 1 | 8731981640 | 20141204T000000 | 277500 | 4 | 2.5 | 2550 | 7500 | 1.0 | 0 | 0 | 3 | 8 | 1750 | 800 | 1976 | 0 | 98023 | Federal Way | 47.3165 | -122.386 | 2260 | 8800 | 0 | South West | 10050 |
| 2 | 5104530220 | 20150420T000000 | 404000 | 3 | 2.5 | 2370 | 4324 | 2.0 | 0 | 0 | 3 | 8 | 2370 | 0 | 2006 | 0 | 98038 | Maple Valley | 47.3515 | -121.999 | 2370 | 4348 | 0 | South East | 6694 |
| 3 | 6145600285 | 20140529T000000 | 300000 | 2 | 1.0 | 820 | 3844 | 1.0 | 0 | 0 | 4 | 6 | 820 | 0 | 1916 | 0 | 98133 | Seattle | 47.7049 | -122.349 | 1520 | 3844 | 0 | North West | 4664 |
| 4 | 8924100111 | 20150424T000000 | 699000 | 2 | 1.5 | 1400 | 4050 | 1.0 | 0 | 0 | 4 | 8 | 1400 | 0 | 1954 | 0 | 98115 | Seattle | 47.6768 | -122.269 | 1900 | 5940 | 0 | North West | 5450 |
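In every row shown, total_area equals living_measure + lot_measure, and living_measure equals ceil_measure + basement. If that holds across the whole dataset, these columns are exact linear combinations of others, which matters for linear models. A minimal check on the five rows above (substituting house for head would run it on the full data):

```python
import pandas as pd

# The five rows shown by house.head(), restricted to the area columns
head = pd.DataFrame({
    "living_measure": [3020, 2550, 2370, 820, 1400],
    "lot_measure": [13457, 7500, 4324, 3844, 4050],
    "ceil_measure": [3020, 1750, 2370, 820, 1400],
    "basement": [0, 800, 0, 0, 0],
    "total_area": [16477, 10050, 6694, 4664, 5450],
})

# total_area appears to be living_measure + lot_measure
total_ok = (head["total_area"] == head["living_measure"] + head["lot_measure"]).all()
# living_measure appears to be ceil_measure + basement
living_ok = (head["living_measure"] == head["ceil_measure"] + head["basement"]).all()
print(total_ok, living_ok)  # True True
```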


### Data Cleaning & Feature Engineering

In [8]:
# Extract the sale year and month from dayhours (format: YYYYMMDDT000000)
house['year']=house['dayhours'].str[0:4]   # year the house was sold
house['month']=house['dayhours'].str[4:6]  # month the house was sold
house.drop('dayhours',axis=1,inplace=True) # dayhours is no longer needed
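A note on string handling for dayhours: str.rstrip('T0') removes any trailing run of the characters 'T' and '0', not the literal suffix "T000000", so for a sale on a day ending in 0 it also strips the day's last digit. The year and month slices still come out right, but parsing the timestamp with pd.to_datetime is more robust. A short sketch, where the first date string is an illustrative example rather than a row from the data:

```python
import pandas as pd

s = "20141120T000000"
# rstrip treats "T0" as a set of characters, so the 0 of day "20" goes too
print(s.rstrip("T0"))  # 2014112

# Parsing the whole timestamp avoids fragile string manipulation
dayhours = pd.Series(["20141107T000000", "20150420T000000"])
dates = pd.to_datetime(dayhours, format="%Y%m%dT%H%M%S")
print(dates.dt.year.tolist(), dates.dt.month.tolist())  # [2014, 2015] [11, 4]
```

This version yields integer years and months directly, so no later astype conversion would be needed.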

In [9]:
# Drop the cid identifier (a house ID, not a feature)
house.drop('cid',axis=1,inplace=True)
house.drop('zipcode',axis=1,inplace=True) # Region was already derived from zipcode

In [10]:
# Convert the sale year from string to integer
house['year']=house['year'].astype('int64')
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
house['age']=house['year']-house['yr_built'] # Age of the house at the time of sale
house['rage']=house['year']-house['yr_renovated'] # Years since renovation; yr_renovated is 0 for houses never renovated, so rage equals the sale year for those rows
house.drop(['yr_built','yr_renovated','year'],axis=1,inplace=True) # Drop the source columns
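A `yr_renovated` of 0 appears to mean the house was never renovated, which is why `rage` shows values such as 2014 and 2015 in the combined table further down. A minimal sketch of one common alternative, assuming 0 means "never renovated", falls back to the build year in that case. The helper name `years_since_renovation` is hypothetical.

```python
def years_since_renovation(sale_year, yr_built, yr_renovated):
    """Years between the last renovation (or construction) and the sale.

    Assumes yr_renovated == 0 means the house was never renovated,
    in which case the build year is used instead.
    """
    last_work = yr_renovated if yr_renovated > 0 else yr_built
    return sale_year - last_work


print(years_since_renovation(2014, 1956, 0))     # never renovated -> 58
print(years_since_renovation(2014, 1956, 2000))  # renovated in 2000 -> 14
```

On the DataFrame the same rule could be vectorised with `np.where(house['yr_renovated']>0, house['year']-house['yr_renovated'], house['year']-house['yr_built'])`, computed before the source columns are dropped.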

In [11]:
#Some columns are stored as numbers but are really categorical, so convert them to object dtype
cat_like=['room_bed','ceil','coast','sight','condition','quality','furnished','room_bath','age','rage']
house[cat_like]=house[cat_like].astype('object')

### Extracting Numerical Columns and Applying Zscore

In [12]:
#Extract the numerical columns, leaving out the target (price) so it is not z-scored
num_cols=house.select_dtypes(['int64','float64']).columns
num_cols1=num_cols.drop('price')
num_cols1

Index(['living_measure', 'lot_measure', 'ceil_measure', 'basement', 'lat',
       'long', 'living_measure15', 'lot_measure15', 'total_area'],
      dtype='object')

In [13]:
#Extract the numerical features into a separate house_num DataFrame (a copy, so later edits do not touch house)
house_num=house[num_cols1].copy()

In [14]:
#Applying zscore to numerical features
house_num[num_cols1]=house_num[num_cols1].apply(zscore)
house_num.head()

Unnamed: 0,living_measure,lot_measure,ceil_measure,basement,lat,long,living_measure15,lot_measure15,total_area
0,1.023606,-0.039835,1.487322,-0.658681,1.135587,-0.867059,0.194707,-0.191018,-0.017069
1,0.511858,-0.183656,-0.046362,1.148964,-1.757734,-1.222109,0.398975,-0.145346,-0.171608
2,0.315869,-0.260335,0.702366,-0.658681,-1.505137,1.525981,0.559471,-0.308402,-0.252304
3,-1.371813,-0.271924,-1.169453,-0.658681,1.045374,-0.959372,-0.680725,-0.326861,-0.301116
4,-0.740293,-0.26695,-0.46903,-0.658681,0.842574,-0.391291,-0.126285,-0.250094,-0.282217
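`scipy.stats.zscore` standardises each column as (x - mean) / std, using the population standard deviation (`ddof=0`) by default. The same computation on a plain list, with a hypothetical helper name, looks like this:

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise a list: (x - mean) / population std, like scipy.stats.zscore with ddof=0."""
    mu = mean(values)
    sigma = pstdev(values)  # population std (divides by n, not n - 1)
    return [(x - mu) / sigma for x in values]


print(zscores([1, 2, 3]))  # approximately [-1.2247, 0.0, 1.2247]
```

After this transform every column has mean 0 and standard deviation 1, which is what makes a single threshold such as |z| > 3 usable across features measured in different units.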


#### Outlier Treatment

In [15]:
# Treat values more than 3 standard deviations from the mean (|z| > 3) as outliers and replace them with NaN
house_num=house_num.mask(house_num.abs()>3)

In [16]:
#Check the null counts after replacing outliers with NaN
house_num.isnull().sum()

living_measure      248
lot_measure         347
ceil_measure        254
basement            247
lat                   0
long                233
living_measure15    237
lot_measure15       363
total_area          346
dtype: int64
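The |z| > 3 rule in the outlier treatment above can be expressed on a plain list of z-scores: any value whose absolute value exceeds the threshold becomes missing (`None` here, `NaN` in pandas), while a value exactly at the threshold is kept because the comparison is strict. The helper name is hypothetical.

```python
def mask_outliers(z_values, threshold=3.0):
    """Replace z-scores with |z| > threshold by None (the notebook uses NaN)."""
    return [None if abs(z) > threshold else z for z in z_values]


print(mask_outliers([0.5, -3.2, 3.0, 4.1]))  # [0.5, None, 3.0, None]
```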

### Data Preparation

In [17]:
#Drop the original numerical columns from house (their z-scored versions are in house_num)
house.drop(num_cols1,axis=1,inplace=True)

In [18]:
#Concatenate house_num with the remaining columns of house
housef=pd.concat([house,house_num],axis=1)
housef.head()

Unnamed: 0,price,room_bed,room_bath,ceil,coast,sight,condition,quality,City,furnished,Region,month,age,rage,living_measure,lot_measure,ceil_measure,basement,lat,long,living_measure15,lot_measure15,total_area
0,808100,4,3.25,1,0,0,5,9,Seattle,1,North West,11,58,2014,1.023606,-0.039835,1.487322,-0.658681,1.135587,-0.867059,0.194707,-0.191018,-0.017069
1,277500,4,2.5,1,0,0,3,8,Federal Way,0,South West,12,38,2014,0.511858,-0.183656,-0.046362,1.148964,-1.757734,-1.222109,0.398975,-0.145346,-0.171608
2,404000,3,2.5,2,0,0,3,8,Maple Valley,0,South East,4,9,2015,0.315869,-0.260335,0.702366,-0.658681,-1.505137,1.525981,0.559471,-0.308402,-0.252304
3,300000,2,1.0,1,0,0,4,6,Seattle,0,North West,5,98,2014,-1.371813,-0.271924,-1.169453,-0.658681,1.045374,-0.959372,-0.680725,-0.326861,-0.301116
4,699000,2,1.5,1,0,0,4,8,Seattle,0,North West,4,61,2015,-0.740293,-0.26695,-0.46903,-0.658681,0.842574,-0.391291,-0.126285,-0.250094,-0.282217


In [19]:
#Drop the target (price) and City; location is still represented by Region, lat and long
housef.drop(['City','price'],axis=1,inplace=True)

In [20]:
#Label-encode selected categorical columns
from sklearn.preprocessing import LabelEncoder
labels=['age','room_bath','rage','ceil']
le=LabelEncoder()
for i in labels:
    housef[i]=le.fit_transform(housef[i])
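`LabelEncoder` learns the sorted unique values of a column (its `classes_`) and replaces each value with that value's position in the sorted list. A plain-Python sketch of the same mapping, with a hypothetical helper name:

```python
def label_encode(values):
    """Map each value to its index in the sorted list of unique values."""
    classes = sorted(set(values))  # analogous to LabelEncoder.classes_
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


codes, classes = label_encode([2.5, 1.0, 2.5, 3.25])
print(classes)  # [1.0, 2.5, 3.25]
print(codes)    # [1, 0, 1, 2]
```

Because the codes follow sort order, numeric-like columns such as `room_bath` keep their ordering, but the spacing between values is lost (2.5 and 3.25 bathrooms become 1 and 2).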

In [21]:
#Get dummies for Region column
housef=pd.get_dummies(housef,columns=['Region'],drop_first=True)
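`pd.get_dummies(..., drop_first=True)` creates one indicator column per category, named `Region_<category>`, and drops the column for the first category in sorted order, which becomes the all-zeros baseline. (Recent pandas versions emit True/False columns rather than 0/1.) A plain-Python sketch with made-up region values:

```python
def one_hot_drop_first(values, prefix):
    """One 0/1 column per category, dropping the first category in sorted order."""
    categories = sorted(set(values))[1:]  # the first category becomes the baseline
    columns = [f"{prefix}_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows


cols, rows = one_hot_drop_first(["North West", "South East", "North East"], "Region")
print(cols)  # ['Region_North West', 'Region_South East']
print(rows)  # [[1, 0], [0, 1], [0, 0]]
```

Dropping one level avoids the dummy columns summing to a constant, which would make them perfectly collinear with the intercept in linear regression.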

### MICE

In [22]:
from impyute.imputation.cs import mice # MICE imputer from the impyute package

In [23]:
allcols=housef.columns # Getting all columns

In [24]:
housef[allcols]=housef[allcols].astype('float64') # MICE requires every feature to be numeric

In [25]:
#Applying MICE to the whole dataset
house_mice=mice(housef)

In [26]:
house_mice.columns=housef.columns # Restore the original column names on the imputed data

In [27]:
# MICE needed all-numeric input, so convert the categorical columns back to object dtype
cat_cols=['room_bed', 'room_bath', 'ceil', 'coast', 'sight', 'condition',
       'quality', 'furnished', 'month', 'age', 'rage',
       'Region_North East', 'Region_North West', 'Region_South East',
       'Region_South West']
house_mice[cat_cols]=house_mice[cat_cols].astype('object')
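MICE (multiple imputation by chained equations) fills each incomplete column by regressing it on the other columns, using the current guesses for their missing entries, and cycles until the imputations settle. The sketch below is a simplified, deterministic two-column version on plain lists, assuming simple linear regression as the model, mean-filling as the starting point, and no row missing in both columns. It is not impyute's implementation, and full MICE would also draw several stochastic imputations; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def chained_impute(col_a, col_b, n_iter=50):
    """Fill None entries in two numeric columns by alternating regressions.

    Assumes no row is missing in both columns.
    """
    miss_a = [v is None for v in col_a]
    miss_b = [v is None for v in col_b]

    def mean_fill(col, miss):
        observed = [v for v, m in zip(col, miss) if not m]
        mu = sum(observed) / len(observed)
        return [mu if m else v for v, m in zip(col, miss)]

    # Start from the column means of the observed values
    a, b = mean_fill(col_a, miss_a), mean_fill(col_b, miss_b)
    for _ in range(n_iter):
        # Regress b on a over rows where b was observed, then refill b's gaps
        rows = [i for i, m in enumerate(miss_b) if not m]
        icpt, slope = fit_line([a[i] for i in rows], [b[i] for i in rows])
        b = [icpt + slope * a[i] if miss_b[i] else b[i] for i in range(len(b))]
        # Regress a on the updated b over rows where a was observed, then refill a's gaps
        rows = [i for i, m in enumerate(miss_a) if not m]
        icpt, slope = fit_line([b[i] for i in rows], [a[i] for i in rows])
        a = [icpt + slope * b[i] if miss_a[i] else a[i] for i in range(len(a))]
    return a, b


a, b = chained_impute([1, 2, 3, 4, None], [3, 5, None, 9, 11])
print(round(b[2], 6), round(a[4], 6))  # the gaps settle onto the line b = 2a + 1
```

Only the originally missing cells change between iterations; observed values are never overwritten, which matches how the NaN outliers are filled here.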

In [28]:
house_mice.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 24 columns):
room_bed             21613 non-null object
room_bath            21613 non-null object
ceil                 21613 non-null object
coast                21613 non-null object
sight                21613 non-null object
condition            21613 non-null object
quality              21613 non-null object
furnished            21613 non-null object
month                21613 non-null object
age                  21613 non-null object
rage                 21613 non-null object
living_measure       21613 non-null float64
lot_measure          21613 non-null float64
ceil_measure         21613 non-null float64
basement             21613 non-null float64
lat                  21613 non-null float64
long                 21613 non-null float64
living_measure15     21613 non-null float64
lot_measure15        21613 non-null float64
total_area           21613 non-null float64
Region_North East    2

### Model Building

In [29]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor,AdaBoostRegressor,BaggingRegressor,GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [30]:
Xtrain,Xtest,Ytrain,Ytest=train_test_split(house_mice,y,test_size=0.2,random_state=10)
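`train_test_split` shuffles the row indices (seeded by `random_state`) and holds out a fraction as the test set. For a float `test_size`, scikit-learn rounds the test count up, which is why 21,613 rows leave 17,290 training rows in the next cell. A plain-Python sketch of that logic; Python's `random` module will not reproduce scikit-learn's exact rows, and the helper name is hypothetical.

```python
import math
import random


def shuffle_split(n_rows, test_size=0.2, seed=10):
    """Return (train_idx, test_idx) after a seeded shuffle of row indices.

    The test count is rounded up, as scikit-learn does for a float test_size.
    """
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(test_size * n_rows)
    return idx[n_test:], idx[:n_test]


train_idx, test_idx = shuffle_split(21613)
print(len(train_idx), len(test_idx))  # 17290 4323
```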

In [31]:
Xtrain.shape

(17290, 24)

In [32]:
lr=LinearRegression()
lr.fit(Xtrain,Ytrain)
ytrain_lr=lr.predict(Xtrain)
ypred_lr=lr.predict(Xtest)
a=lr.score(Xtrain,Ytrain) # TRAIN R^2 (score() returns R^2 for regressors)
b=lr.score(Xtest,Ytest)  # TEST R^2
print("The Training Accuracy is",a*100) 
print("The Test Accuracy is ",b*100)
rmse_lr=np.sqrt(mean_squared_error(Ytrain,ytrain_lr)) # TRAIN RMSE
rmse_lr1=np.sqrt(mean_squared_error(Ytest,ypred_lr)) #TEST RMSE
print("The Train RMSE for Linear Regression is",rmse_lr)
print("The Test RMSE for Linear Regression is",rmse_lr1)

The Training Accuracy is 72.88767528461347
The Test Accuracy is  74.09128444321826
The Train RMSE for Linear Regression is 192523.68580453593
The Test RMSE for Linear Regression is 182009.57245291714
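For regressors, `score()` returns the coefficient of determination R², not a classification accuracy, so the "Accuracy" figures printed above are R² × 100. A minimal sketch of both reported metrics on plain lists (helper names are hypothetical):

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, as np.sqrt(mean_squared_error(...)) computes it."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mu = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


print(r2([1, 2, 3], [1, 2, 4]))    # 0.5
print(rmse([1, 2, 3], [1, 2, 4]))  # about 0.577
```

RMSE is in the target's units (dollars here), while R² is unitless, so the two complement each other when comparing models.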


In [33]:
df=pd.DataFrame({'columns':Xtrain.columns})
df['Coefficients']=lr.coef_
df.head()

Unnamed: 0,columns,Coefficients
0,room_bed,-20942.669264
1,room_bath,11324.519108
2,ceil,7125.690896
3,coast,567309.112705
4,sight,47519.43051


In [34]:
Xtrain.columns

Index(['room_bed', 'room_bath', 'ceil', 'coast', 'sight', 'condition',
       'quality', 'furnished', 'month', 'age', 'rage', 'living_measure',
       'lot_measure', 'ceil_measure', 'basement', 'lat', 'long',
       'living_measure15', 'lot_measure15', 'total_area', 'Region_North East',
       'Region_North West', 'Region_South East', 'Region_South West'],
      dtype='object')

In [35]:
rf=RandomForestRegressor()
rf.fit(Xtrain,Ytrain)
ytrain_rf=rf.predict(Xtrain)
ypred_rf=rf.predict(Xtest)
c=rf.score(Xtrain,Ytrain) # TRAIN R^2
d=rf.score(Xtest,Ytest) # TEST R^2
print("The Training Accuracy is",c*100)
print("The Test Accuracy is ",d*100)
rmse_rf=np.sqrt(mean_squared_error(Ytrain,ytrain_rf)) # TRAIN RMSE
rmse_rf1=np.sqrt(mean_squared_error(Ytest,ypred_rf)) # TEST RMSE
print("The TRAIN RMSE for Random Forest Regression is",rmse_rf)
print("The TEST RMSE for Random Forest Regression is",rmse_rf1)

The Training Accuracy is 97.18247619967025
The Test Accuracy is  84.05780670894349
The TRAIN RMSE for Random Forest Regression is 62063.26021313107
The TEST RMSE for Random Forest Regression is 142772.74949593627


In [36]:
gb=GradientBoostingRegressor()
gb.fit(Xtrain,Ytrain)
ypred_gb=gb.predict(Xtest)
ytrain_gb=gb.predict(Xtrain)
e=gb.score(Xtrain,Ytrain) # TRAIN ACCURACY
f=gb.score(Xtest,Ytest) #TEST ACCURACY
print("The Training Accuracy is",e*100) 
print("The Test Accuracy is ",f*100) 
rmse_gb=np.sqrt(mean_squared_error(Ytrain,ytrain_gb)) #TRAIN RMSE
rmse_gb1=np.sqrt(mean_squared_error(Ytest,ypred_gb)) # TEST RMSE
print("The TRAIN RMSE for Gradient Regression is",rmse_gb)
print("The TEST RMSE for Gradient Regression is",rmse_gb1)

The Training Accuracy is 89.13331270340964
The Test Accuracy is  84.97891349932802
The TRAIN RMSE for Gradient Regression is 121884.75082129551
The TEST RMSE for Gradient Regression is 138586.83017678437


In [37]:
gb=GradientBoostingRegressor(random_state=100)
# scikit-learn 1.2+ calls this argument `estimator`; earlier versions use `base_estimator`
bg=BaggingRegressor(estimator=gb)
bg.fit(Xtrain,Ytrain)
ypred_bg=bg.predict(Xtest)
ytrain_bg=bg.predict(Xtrain)
g=bg.score(Xtrain,Ytrain) # TRAIN R^2
h=bg.score(Xtest,Ytest) # TEST R^2
print("The Training Accuracy is",g*100)
print("The Test Accuracy is ",h*100)
rmse_bg=np.sqrt(mean_squared_error(Ytrain,ytrain_bg)) #TRAIN RMSE
rmse_bg1=np.sqrt(mean_squared_error(Ytest,ypred_bg)) #TEST RMSE
print("The TRAIN RMSE for Bagging Regression is",rmse_bg)
print("The TEST RMSE for Bagging Regression is",rmse_bg1)

The Training Accuracy is 88.62760248226974
The Test Accuracy is  85.31775268909544
The TRAIN RMSE for Bagging Regression is 124688.61610732063
The TEST RMSE for Bagging Regression is 137014.82348140524
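`BaggingRegressor` fits `n_estimators` copies of the base estimator (10 by default), each on a bootstrap sample drawn with replacement and, by default, as large as the training set, then averages their predictions. The sketch below shows that loop with a one-feature least-squares line standing in for the gradient-boosting base model; the helper names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def bagged_predict(xs, ys, x_new, n_estimators=10, seed=0):
    """Average the predictions of models fitted on bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_estimators):
        # Draw n row indices with replacement; redraw if every x is identical,
        # since a line cannot be fitted to a single distinct x value
        while True:
            rows = [rng.randrange(n) for _ in range(n)]
            if len({xs[i] for i in rows}) > 1:
                break
        icpt, slope = fit_line([xs[i] for i in rows], [ys[i] for i in rows])
        preds.append(icpt + slope * x_new)
    return sum(preds) / len(preds)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 * x + 1 for x in xs]
print(bagged_predict(xs, ys, 10))  # noise-free data, so every model predicts about 21
```

On noisy data the individual fits disagree, and averaging them reduces the variance of the prediction, which is the motivation for wrapping an already strong model such as gradient boosting.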


In [38]:
models=[ypred_lr,ypred_rf,ypred_gb,ypred_bg]
r2_score=[]
adr2_score=[]
for i in models:
    SS_Residual = sum((Ytest-i)**2)
    SS_Total = sum((Ytest-np.mean(Ytest))**2)
    r_squared = 1 - (float(SS_Residual))/SS_Total
    r2_score.append(r_squared*100)
    adjusted_r_squared = 1 - (1-r_squared)*(len(Ytest)-1)/(len(Ytest)-Xtrain.shape[1]-1)
    adr2_score.append(adjusted_r_squared*100)
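The loop above applies the adjusted R² formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of test rows and p the number of features. Plugging in the linear regression's test R² (0.74091), n = 4,323 test rows (21,613 - 17,290) and p = 24 features reproduces the LR `Adj_Score` of 73.94661 in the table that follows. The helper name is hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Test R^2 of the linear regression, 4323 test rows, 24 features
print(round(adjusted_r2(0.7409128444321826, 4323, 24) * 100, 5))  # 73.94661
```

With thousands of test rows and only 24 features the penalty is small, which is why the adjusted and plain R² values in the table differ by only about 0.1 percentage points.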

### BEST MODEL

In [39]:
bestmodel=pd.DataFrame({'Model':['LR','RF','GB','BG']})
bestmodel['Train RMSE']=[round(rmse_lr),round(rmse_rf),round(rmse_gb),round(rmse_bg)]
bestmodel['Test RMSE']=[round(rmse_lr1),round(rmse_rf1),round(rmse_gb1),round(rmse_bg1)]
bestmodel['Train R2_Score %']=[round(a*100,2),round(c*100,2),round(e*100,2),round(g*100,2)]
bestmodel['Test R2_Score %']=[round(b*100,2),round(d*100,2),round(f*100,2),round(h*100,2)]
bestmodel['Adj_Score']=adr2_score

In [40]:
bestmodel

Unnamed: 0,Model,Train RMSE,Test RMSE,Train R2_Score %,Test R2_Score %,Adj_Score
0,LR,192524.0,182010.0,72.89,74.09,73.94661
1,RF,62063.0,142773.0,97.18,84.06,83.968786
2,GB,121885.0,138587.0,89.13,84.98,84.895036
3,BG,124689.0,137015.0,88.63,85.32,85.235767


### Feature Selection

In [41]:
from sklearn.feature_selection import RFE

lr=LinearRegression()
# Recursively drop the weakest feature (smallest |coefficient|) until 14 remain
rfe = RFE(lr, n_features_to_select=14)
rfe.fit(Xtrain,Ytrain)
print(rfe.support_)
print(rfe.ranking_)
idc_rfe = pd.DataFrame({"rfe_support" :rfe.support_,
                       "columns" :Xtrain.columns ,
                       "ranking" : rfe.ranking_,
                      })
cols = idc_rfe[idc_rfe["rfe_support"] == True]["columns"].tolist()

[False False False  True  True  True  True  True False False False  True
  True  True  True  True  True False False  True False  True False  True]
[ 6  8  7  1  1  1  1  1  9 10 11  1  1  1  1  1  1  2  5  1  4  1  3  1]
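RFE fits the estimator on the current feature set, drops the feature(s) with the smallest importance (for `LinearRegression`, the absolute coefficient), refits on the survivors, and repeats until `n_features_to_select` remain. Selected features get rank 1, and the earlier a feature was eliminated the higher its rank, which is why reducing 24 features to 14 gives ranks up to 11 in the output above. In the sketch, `importance_fn` is a hypothetical callback that would refit the model on the given features; the usage example substitutes fixed, made-up importances.

```python
def rfe(features, importance_fn, n_select, step=1):
    """Recursive feature elimination; returns (support, ranking) dicts."""
    support = {f: True for f in features}
    ranking = {f: 1 for f in features}
    while sum(support.values()) > n_select:
        remaining = [f for f in features if support[f]]
        importances = importance_fn(remaining)  # refit on the surviving features
        n_drop = min(step, len(remaining) - n_select)
        weakest = sorted(remaining, key=lambda f: abs(importances[f]))[:n_drop]
        for f in weakest:
            support[f] = False
        # Every feature eliminated so far moves one rank further from 1
        for f in features:
            if not support[f]:
                ranking[f] += 1
    return support, ranking


# Made-up, fixed importances standing in for coef_ of a refitted model
fake_coefs = {"a": 5.0, "b": -1.0, "c": 3.0, "d": 0.5}
support, ranking = rfe(list(fake_coefs), lambda fs: {f: fake_coefs[f] for f in fs}, n_select=2)
print(support)  # {'a': True, 'b': False, 'c': True, 'd': False}
print(ranking)  # {'a': 1, 'b': 2, 'c': 1, 'd': 3}
```

Ranking by raw coefficient size assumes the features are on comparable scales; here the z-scored numeric columns and the label-encoded categorical columns are not, which can influence which features survive.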


In [42]:
house_rfe=house_mice[cols]

In [43]:
# Note: 30% of rows are held out here, versus 20% in the split before RFE
Xtrain,Xtest,Ytrain,Ytest=train_test_split(house_rfe,y,test_size=0.3,random_state=10)

In [44]:
lr=LinearRegression()
lr.fit(Xtrain,Ytrain)
ytrain_lr=lr.predict(Xtrain)
ypred_lr=lr.predict(Xtest)
a=lr.score(Xtrain,Ytrain) # TRAIN R^2
b=lr.score(Xtest,Ytest)  # TEST R^2
print("The Training Accuracy is",a*100) 
print("The Test Accuracy is ",b*100)
rmse_lr=np.sqrt(mean_squared_error(Ytrain,ytrain_lr)) # TRAIN RMSE
rmse_lr1=np.sqrt(mean_squared_error(Ytest,ypred_lr)) #TEST RMSE
print("The Train RMSE for Linear Regression is",rmse_lr)
print("The Test RMSE for Linear Regression is",rmse_lr1)

The Training Accuracy is 70.61197229812144
The Test Accuracy is  71.30564762385755
The Train RMSE for Linear Regression is 200513.49224169797
The Test RMSE for Linear Regression is 193589.12418273246


In [45]:
df1=pd.DataFrame({'columns':Xtrain.columns})
df1['Coef_']=lr.coef_
df1

Unnamed: 0,columns,Coef_
0,coast,534146.1
1,sight,60516.14
2,condition,53795.95
3,quality,75622.82
4,furnished,80811.72
5,living_measure,52851590000.0
6,lot_measure,-16600380.0
7,ceil_measure,-47652630000.0
8,basement,-25468060000.0
9,lat,82850.03


In [46]:
rf=RandomForestRegressor()
rf.fit(Xtrain,Ytrain)
ytrain_rf=rf.predict(Xtrain)
ypred_rf=rf.predict(Xtest)
c=rf.score(Xtrain,Ytrain) # TRAIN R^2
d=rf.score(Xtest,Ytest) # TEST R^2
print("The Training Accuracy is",c*100)
print("The Test Accuracy is ",d*100)
rmse_rf=np.sqrt(mean_squared_error(Ytrain,ytrain_rf)) # TRAIN RMSE
rmse_rf1=np.sqrt(mean_squared_error(Ytest,ypred_rf)) # TEST RMSE
print("The TRAIN RMSE for Random Forest Regression is",rmse_rf)
print("The TEST RMSE for Random Forest Regression is",rmse_rf1)

The Training Accuracy is 97.29357309438585
The Test Accuracy is  82.66195219847603
The TRAIN RMSE for Random Forest Regression is 60849.432441896366
The TEST RMSE for Random Forest Regression is 150481.40567465822


In [54]:
gb=GradientBoostingRegressor()
gb.fit(Xtrain,Ytrain)
ypred_gb=gb.predict(Xtest)
ytrain_gb=gb.predict(Xtrain)
e=gb.score(Xtrain,Ytrain) # TRAIN ACCURACY
f=gb.score(Xtest,Ytest) #TEST ACCURACY
print("The Training Accuracy is",e*100) 
print("The Test Accuracy is ",f*100) 
rmse_gb=np.sqrt(mean_squared_error(Ytrain,ytrain_gb)) #TRAIN RMSE
rmse_gb1=np.sqrt(mean_squared_error(Ytest,ypred_gb)) # TEST RMSE
print("The TRAIN RMSE for Gradient Regression is",rmse_gb)
print("The TEST RMSE for Gradient Regression is",rmse_gb1)

The Training Accuracy is 87.96824980951364
The Test Accuracy is  81.54667052594546
The TRAIN RMSE for Gradient Regression is 128298.85596612422
The TEST RMSE for Gradient Regression is 155245.88925481765


In [53]:
gb=GradientBoostingRegressor()
# scikit-learn 1.2+ calls this argument `estimator`; earlier versions use `base_estimator`
bg=BaggingRegressor(estimator=gb)
bg.fit(Xtrain,Ytrain)
ypred_bg=bg.predict(Xtest)
ytrain_bg=bg.predict(Xtrain)
g=bg.score(Xtrain,Ytrain) # TRAIN R^2
h=bg.score(Xtest,Ytest) # TEST R^2
print("The Training Accuracy is",g*100)
print("The Test Accuracy is ",h*100)
rmse_bg=np.sqrt(mean_squared_error(Ytrain,ytrain_bg)) #TRAIN RMSE
rmse_bg1=np.sqrt(mean_squared_error(Ytest,ypred_bg)) #TEST RMSE
print("The TRAIN RMSE for Bagging Regression is",rmse_bg)
print("The TEST RMSE for Bagging Regression is",rmse_bg1)

The Training Accuracy is 87.21240838070368
The Test Accuracy is  82.26755589444457
The TRAIN RMSE for Bagging Regression is 132267.38269786356
The TEST RMSE for Bagging Regression is 152183.3153888267


### Calculating Adjusted R2_Score

In [49]:
models=[ypred_lr,ypred_rf,ypred_gb,ypred_bg]
r2_score=[]
adr2_score=[]
for i in models:
    SS_Residual = sum((Ytest-i)**2)
    SS_Total = sum((Ytest-np.mean(Ytest))**2)
    r_squared = 1 - (float(SS_Residual))/SS_Total
    r2_score.append(r_squared*100)
    adjusted_r_squared = 1 - (1-r_squared)*(len(Ytest)-1)/(len(Ytest)-Xtrain.shape[1]-1)
    adr2_score.append(round(adjusted_r_squared*100,2))

### Model Comparison after RFE

In [50]:
bestmodel=pd.DataFrame({'Model':['LR','RF','GB','BG']})
bestmodel['Train RMSE']=[round(rmse_lr),round(rmse_rf),round(rmse_gb),round(rmse_bg)]
bestmodel['Test RMSE']=[round(rmse_lr1),round(rmse_rf1),round(rmse_gb1),round(rmse_bg1)]
bestmodel['Train R2_Score %']=[round(a*100,2),round(c*100,2),round(e*100,2),round(g*100,2)]
bestmodel['Test R2_Score %']=[round(b*100,2),round(d*100,2),round(f*100,2),round(h*100,2)]
bestmodel['Adj_Score']=adr2_score
bestmodel

Unnamed: 0,Model,Train RMSE,Test RMSE,Train R2_Score %,Test R2_Score %,Adj_Score
0,LR,200513.0,193589.0,70.61,71.31,71.24
1,RF,60849.0,150481.0,97.29,82.66,82.62
2,GB,128299.0,155399.0,87.97,81.51,81.47
3,BG,133757.0,150164.0,86.92,82.74,82.7
