# Problem Statement:

A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses data analytics to purchase houses at a price below their actual value and resell them at a higher price. For this purpose, the company has collected a data set from the sale of houses in Australia. The data is provided in the CSV file below.

 

The company is looking at prospective properties to buy to enter the market. You are required to build a regression model using regularisation in order to predict the actual value of the prospective properties and decide whether to invest in them or not.

 

The company wants to know the following things about the prospective properties:

- Which variables are significant in predicting the price of a house, and
- How well those variables describe the price of a house.

 

Also, determine the optimal value of lambda for ridge and lasso regression.
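
As a rough illustration of what "determining the optimal lambda" can look like in practice, the sketch below runs a cross-validated grid search for both models. Note that scikit-learn calls the regularisation strength `alpha` rather than lambda. The synthetic data from `make_regression`, the candidate grid, the scoring metric and the 5-fold setting are all assumptions made for this example; they are not taken from the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the housing data (assumption: not the real data set).
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Candidate regularisation strengths; this grid is an illustrative choice.
param_grid = {"alpha": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}

best_alpha = {}
for name, model in [("ridge", Ridge()), ("lasso", Lasso(max_iter=10000))]:
    # 5-fold cross-validation, scored by negative mean absolute error.
    search = GridSearchCV(model, param_grid,
                          scoring="neg_mean_absolute_error", cv=5)
    search.fit(X, y)
    best_alpha[name] = search.best_params_["alpha"]

print(best_alpha)
```

On the real data the same search would be fitted on the training split only, after encoding and scaling, so that the chosen `alpha` is not tuned on the test rows.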

In [2]:
import numpy as np, pandas as pd
import matplotlib.pyplot as plt, seaborn as sns
import statsmodels
import statsmodels.api as sm
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures, MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score

import warnings
warnings.filterwarnings('ignore')

In [3]:
price = pd.read_csv("train.csv")

In [4]:
price.shape

(1460, 81)

In [5]:
price.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallC

In [9]:
null_perc = round(price.isnull().mean()*100,2)
null_perc   

# The null percentage of each column shows which columns should be dropped immediately.

Id                0.00
MSSubClass        0.00
MSZoning          0.00
LotFrontage      17.74
LotArea           0.00
                 ...  
MoSold            0.00
YrSold            0.00
SaleType          0.00
SaleCondition     0.00
SalePrice         0.00
Length: 81, dtype: float64

In [11]:
# Let's look at the columns with a null percentage greater than 45% and drop them immediately,
# as columns that are mostly missing won't help the final model.

null_perc[null_perc > 45]

Alley          93.77
FireplaceQu    47.26
PoolQC         99.52
Fence          80.75
MiscFeature    96.30
dtype: float64
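
The column names to drop can also be derived from the threshold instead of being typed by hand. The following is a minimal sketch on a small made-up frame; `toy`, `threshold` and `cols_to_drop` are hypothetical names used only for this illustration.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for `price` (assumption: illustrative values only).
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "Alley": [np.nan, np.nan, np.nan, "Grvl"],    # 75% missing
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],    # 25% missing
})

threshold = 45  # same 45% cut-off as in the cell above

# Percentage of missing values per column.
null_perc = toy.isnull().mean() * 100

# Columns above the threshold, collected programmatically.
cols_to_drop = null_perc[null_perc > threshold].index.tolist()
toy = toy.drop(columns=cols_to_drop)

print(cols_to_drop)  # ['Alley']
```

Deriving the list this way keeps the dropped columns in sync with the threshold if the cut-off is changed later.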

In [12]:
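# Pass axis by keyword: the positional form drop(labels, 1) is no longer accepted by recent pandas versions.
price = price.drop(['FireplaceQu', 'Alley', 'PoolQC', 'Fence', 'MiscFeature'], axis=1)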

In [20]:
null_perc = round(price.isnull().mean()*100,2)
null_perc[null_perc != 0]

LotFrontage     17.74
MasVnrType       0.55
MasVnrArea       0.55
BsmtQual         2.53
BsmtCond         2.53
BsmtExposure     2.60
BsmtFinType1     2.53
BsmtFinType2     2.60
Electrical       0.07
GarageType       5.55
GarageYrBlt      5.55
GarageFinish     5.55
GarageQual       5.55
GarageCond       5.55
dtype: float64

In [22]:
price.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,Reg,Lvl,AllPub,Inside,...,0,0,0,0,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,Reg,Lvl,AllPub,FR2,...,0,0,0,0,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,IR1,Lvl,AllPub,Inside,...,0,0,0,0,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,IR1,Lvl,AllPub,Corner,...,272,0,0,0,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,IR1,Lvl,AllPub,FR2,...,0,0,0,0,0,12,2008,WD,Normal,250000


In [23]:
price.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 76 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   LotShape       1460 non-null   object 
 7   LandContour    1460 non-null   object 
 8   Utilities      1460 non-null   object 
 9   LotConfig      1460 non-null   object 
 10  LandSlope      1460 non-null   object 
 11  Neighborhood   1460 non-null   object 
 12  Condition1     1460 non-null   object 
 13  Condition2     1460 non-null   object 
 14  BldgType       1460 non-null   object 
 15  HouseStyle     1460 non-null   object 
 16  OverallQual    1460 non-null   int64  
 17  OverallCond    1460 non-null   int64  
 18  YearBuil

In [37]:
cat_vars = price.select_dtypes(include = 'object')

In [38]:
cat_vars.columns

Index(['MSZoning', 'Street', 'LotShape', 'LandContour', 'Utilities',
       'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
       'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st',
       'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation',
       'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2',
       'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual',
       'Functional', 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond',
       'PavedDrive', 'SaleType', 'SaleCondition'],
      dtype='object')

In [40]:
num_vars = price.select_dtypes(include = ['int64', 'float64'])

In [41]:
num_vars.columns

Index(['Id', 'MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual',
       'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1',
       'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd',
       'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF',
       'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea',
       'MiscVal', 'MoSold', 'YrSold', 'SalePrice'],
      dtype='object')

In [43]:
cat_vars.apply(pd.Series.value_counts)


Unnamed: 0,MSZoning,Street,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,...,Electrical,KitchenQual,Functional,GarageType,GarageFinish,GarageQual,GarageCond,PavedDrive,SaleType,SaleCondition
1.5Fin,,,,,,,,,,,...,,,,,,,,,,
1.5Unf,,,,,,,,,,,...,,,,,,,,,,
1Fam,,,,,,,,,,,...,,,,,,,,,,
1Story,,,,,,,,,,,...,,,,,,,,,,
2.5Fin,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
WdShake,,,,,,,,,,,...,,,,,,,,,,
WdShing,,,,,,,,,,,...,,,,,,,,,,
WdShngl,,,,,,,,,,,...,,,,,,,,,,
Wood,,,,,,,,,,,...,,,,,,,,,,


In [46]:
for col in cat_vars.columns:
    print('-' * 40 + col + '-' * 40 , end=' - ')
    display(cat_vars[col].value_counts())

----------------------------------------MSZoning---------------------------------------- - 

RL         1151
RM          218
FV           65
RH           16
C (all)      10
Name: MSZoning, dtype: int64

----------------------------------------Street---------------------------------------- - 

Pave    1454
Grvl       6
Name: Street, dtype: int64

----------------------------------------LotShape---------------------------------------- - 

Reg    925
IR1    484
IR2     41
IR3     10
Name: LotShape, dtype: int64

----------------------------------------LandContour---------------------------------------- - 

Lvl    1311
Bnk      63
HLS      50
Low      36
Name: LandContour, dtype: int64

----------------------------------------Utilities---------------------------------------- - 

AllPub    1459
NoSeWa       1
Name: Utilities, dtype: int64

----------------------------------------LotConfig---------------------------------------- - 

Inside     1052
Corner      263
CulDSac      94
FR2          47
FR3           4
Name: LotConfig, dtype: int64

----------------------------------------LandSlope---------------------------------------- - 

Gtl    1382
Mod      65
Sev      13
Name: LandSlope, dtype: int64

----------------------------------------Neighborhood---------------------------------------- - 

NAmes      225
CollgCr    150
OldTown    113
Edwards    100
Somerst     86
Gilbert     79
NridgHt     77
Sawyer      74
NWAmes      73
SawyerW     59
BrkSide     58
Crawfor     51
Mitchel     49
NoRidge     41
Timber      38
IDOTRR      37
ClearCr     28
StoneBr     25
SWISU       25
MeadowV     17
Blmngtn     17
BrDale      16
Veenker     11
NPkVill      9
Blueste      2
Name: Neighborhood, dtype: int64

----------------------------------------Condition1---------------------------------------- - 

Norm      1260
Feedr       81
Artery      48
RRAn        26
PosN        19
RRAe        11
PosA         8
RRNn         5
RRNe         2
Name: Condition1, dtype: int64

----------------------------------------Condition2---------------------------------------- - 

Norm      1445
Feedr        6
PosN         2
Artery       2
RRNn         2
RRAn         1
RRAe         1
PosA         1
Name: Condition2, dtype: int64

----------------------------------------BldgType---------------------------------------- - 

1Fam      1220
TwnhsE     114
Duplex      52
Twnhs       43
2fmCon      31
Name: BldgType, dtype: int64

----------------------------------------HouseStyle---------------------------------------- - 

1Story    726
2Story    445
1.5Fin    154
SLvl       65
SFoyer     37
1.5Unf     14
2.5Unf     11
2.5Fin      8
Name: HouseStyle, dtype: int64

----------------------------------------RoofStyle---------------------------------------- - 

Gable      1141
Hip         286
Flat         13
Gambrel      11
Mansard       7
Shed          2
Name: RoofStyle, dtype: int64

----------------------------------------RoofMatl---------------------------------------- - 

CompShg    1434
Tar&Grv      11
WdShngl       6
WdShake       5
Metal         1
ClyTile       1
Roll          1
Membran       1
Name: RoofMatl, dtype: int64

----------------------------------------Exterior1st---------------------------------------- - 

VinylSd    515
HdBoard    222
MetalSd    220
Wd Sdng    206
Plywood    108
CemntBd     61
BrkFace     50
WdShing     26
Stucco      25
AsbShng     20
BrkComm      2
Stone        2
AsphShn      1
ImStucc      1
CBlock       1
Name: Exterior1st, dtype: int64

----------------------------------------Exterior2nd---------------------------------------- - 

VinylSd    504
MetalSd    214
HdBoard    207
Wd Sdng    197
Plywood    142
CmentBd     60
Wd Shng     38
Stucco      26
BrkFace     25
AsbShng     20
ImStucc     10
Brk Cmn      7
Stone        5
AsphShn      3
Other        1
CBlock       1
Name: Exterior2nd, dtype: int64

----------------------------------------MasVnrType---------------------------------------- - 

None       864
BrkFace    445
Stone      128
BrkCmn      15
Name: MasVnrType, dtype: int64

----------------------------------------ExterQual---------------------------------------- - 

TA    906
Gd    488
Ex     52
Fa     14
Name: ExterQual, dtype: int64

----------------------------------------ExterCond---------------------------------------- - 

TA    1282
Gd     146
Fa      28
Ex       3
Po       1
Name: ExterCond, dtype: int64

----------------------------------------Foundation---------------------------------------- - 

PConc     647
CBlock    634
BrkTil    146
Slab       24
Stone       6
Wood        3
Name: Foundation, dtype: int64

----------------------------------------BsmtQual---------------------------------------- - 

TA    649
Gd    618
Ex    121
Fa     35
Name: BsmtQual, dtype: int64

----------------------------------------BsmtCond---------------------------------------- - 

TA    1311
Gd      65
Fa      45
Po       2
Name: BsmtCond, dtype: int64

----------------------------------------BsmtExposure---------------------------------------- - 

No    953
Av    221
Gd    134
Mn    114
Name: BsmtExposure, dtype: int64

----------------------------------------BsmtFinType1---------------------------------------- - 

Unf    430
GLQ    418
ALQ    220
BLQ    148
Rec    133
LwQ     74
Name: BsmtFinType1, dtype: int64

----------------------------------------BsmtFinType2---------------------------------------- - 

Unf    1256
Rec      54
LwQ      46
BLQ      33
ALQ      19
GLQ      14
Name: BsmtFinType2, dtype: int64

----------------------------------------Heating---------------------------------------- - 

GasA     1428
GasW       18
Grav        7
Wall        4
OthW        2
Floor       1
Name: Heating, dtype: int64

----------------------------------------HeatingQC---------------------------------------- - 

Ex    741
TA    428
Gd    241
Fa     49
Po      1
Name: HeatingQC, dtype: int64

----------------------------------------CentralAir---------------------------------------- - 

Y    1365
N      95
Name: CentralAir, dtype: int64

----------------------------------------Electrical---------------------------------------- - 

SBrkr    1334
FuseA      94
FuseF      27
FuseP       3
Mix         1
Name: Electrical, dtype: int64

----------------------------------------KitchenQual---------------------------------------- - 

TA    735
Gd    586
Ex    100
Fa     39
Name: KitchenQual, dtype: int64

----------------------------------------Functional---------------------------------------- - 

Typ     1360
Min2      34
Min1      31
Mod       15
Maj1      14
Maj2       5
Sev        1
Name: Functional, dtype: int64

----------------------------------------GarageType---------------------------------------- - 

Attchd     870
Detchd     387
BuiltIn     88
Basment     19
CarPort      9
2Types       6
Name: GarageType, dtype: int64

----------------------------------------GarageFinish---------------------------------------- - 

Unf    605
RFn    422
Fin    352
Name: GarageFinish, dtype: int64

----------------------------------------GarageQual---------------------------------------- - 

TA    1311
Fa      48
Gd      14
Ex       3
Po       3
Name: GarageQual, dtype: int64

----------------------------------------GarageCond---------------------------------------- - 

TA    1326
Fa      35
Gd       9
Po       7
Ex       2
Name: GarageCond, dtype: int64

----------------------------------------PavedDrive---------------------------------------- - 

Y    1340
N      90
P      30
Name: PavedDrive, dtype: int64

----------------------------------------SaleType---------------------------------------- - 

WD       1267
New       122
COD        43
ConLD       9
ConLw       5
ConLI       5
CWD         4
Oth         3
Con         2
Name: SaleType, dtype: int64

----------------------------------------SaleCondition---------------------------------------- - 

Normal     1198
Partial     125
Abnorml     101
Family       20
Alloca       12
AdjLand       4
Name: SaleCondition, dtype: int64

In [48]:
for col in cat_vars.columns:
    print('-' * 40 + col + '-' * 40 , end=' - ')
    display(cat_vars[col].value_counts(normalize = True))

----------------------------------------MSZoning---------------------------------------- - 

RL         0.788356
RM         0.149315
FV         0.044521
RH         0.010959
C (all)    0.006849
Name: MSZoning, dtype: float64

----------------------------------------Street---------------------------------------- - 

Pave    0.99589
Grvl    0.00411
Name: Street, dtype: float64

----------------------------------------LotShape---------------------------------------- - 

Reg    0.633562
IR1    0.331507
IR2    0.028082
IR3    0.006849
Name: LotShape, dtype: float64

----------------------------------------LandContour---------------------------------------- - 

Lvl    0.897945
Bnk    0.043151
HLS    0.034247
Low    0.024658
Name: LandContour, dtype: float64

----------------------------------------Utilities---------------------------------------- - 

AllPub    0.999315
NoSeWa    0.000685
Name: Utilities, dtype: float64

----------------------------------------LotConfig---------------------------------------- - 

Inside     0.720548
Corner     0.180137
CulDSac    0.064384
FR2        0.032192
FR3        0.002740
Name: LotConfig, dtype: float64

----------------------------------------LandSlope---------------------------------------- - 

Gtl    0.946575
Mod    0.044521
Sev    0.008904
Name: LandSlope, dtype: float64

----------------------------------------Neighborhood---------------------------------------- - 

NAmes      0.154110
CollgCr    0.102740
OldTown    0.077397
Edwards    0.068493
Somerst    0.058904
Gilbert    0.054110
NridgHt    0.052740
Sawyer     0.050685
NWAmes     0.050000
SawyerW    0.040411
BrkSide    0.039726
Crawfor    0.034932
Mitchel    0.033562
NoRidge    0.028082
Timber     0.026027
IDOTRR     0.025342
ClearCr    0.019178
StoneBr    0.017123
SWISU      0.017123
MeadowV    0.011644
Blmngtn    0.011644
BrDale     0.010959
Veenker    0.007534
NPkVill    0.006164
Blueste    0.001370
Name: Neighborhood, dtype: float64

----------------------------------------Condition1---------------------------------------- - 

Norm      0.863014
Feedr     0.055479
Artery    0.032877
RRAn      0.017808
PosN      0.013014
RRAe      0.007534
PosA      0.005479
RRNn      0.003425
RRNe      0.001370
Name: Condition1, dtype: float64

----------------------------------------Condition2---------------------------------------- - 

Norm      0.989726
Feedr     0.004110
PosN      0.001370
Artery    0.001370
RRNn      0.001370
RRAn      0.000685
RRAe      0.000685
PosA      0.000685
Name: Condition2, dtype: float64

----------------------------------------BldgType---------------------------------------- - 

1Fam      0.835616
TwnhsE    0.078082
Duplex    0.035616
Twnhs     0.029452
2fmCon    0.021233
Name: BldgType, dtype: float64

----------------------------------------HouseStyle---------------------------------------- - 

1Story    0.497260
2Story    0.304795
1.5Fin    0.105479
SLvl      0.044521
SFoyer    0.025342
1.5Unf    0.009589
2.5Unf    0.007534
2.5Fin    0.005479
Name: HouseStyle, dtype: float64

----------------------------------------RoofStyle---------------------------------------- - 

Gable      0.781507
Hip        0.195890
Flat       0.008904
Gambrel    0.007534
Mansard    0.004795
Shed       0.001370
Name: RoofStyle, dtype: float64

----------------------------------------RoofMatl---------------------------------------- - 

CompShg    0.982192
Tar&Grv    0.007534
WdShngl    0.004110
WdShake    0.003425
Metal      0.000685
ClyTile    0.000685
Roll       0.000685
Membran    0.000685
Name: RoofMatl, dtype: float64

----------------------------------------Exterior1st---------------------------------------- - 

VinylSd    0.352740
HdBoard    0.152055
MetalSd    0.150685
Wd Sdng    0.141096
Plywood    0.073973
CemntBd    0.041781
BrkFace    0.034247
WdShing    0.017808
Stucco     0.017123
AsbShng    0.013699
BrkComm    0.001370
Stone      0.001370
AsphShn    0.000685
ImStucc    0.000685
CBlock     0.000685
Name: Exterior1st, dtype: float64

----------------------------------------Exterior2nd---------------------------------------- - 

VinylSd    0.345205
MetalSd    0.146575
HdBoard    0.141781
Wd Sdng    0.134932
Plywood    0.097260
CmentBd    0.041096
Wd Shng    0.026027
Stucco     0.017808
BrkFace    0.017123
AsbShng    0.013699
ImStucc    0.006849
Brk Cmn    0.004795
Stone      0.003425
AsphShn    0.002055
Other      0.000685
CBlock     0.000685
Name: Exterior2nd, dtype: float64

----------------------------------------MasVnrType---------------------------------------- - 

None       0.595041
BrkFace    0.306474
Stone      0.088154
BrkCmn     0.010331
Name: MasVnrType, dtype: float64

----------------------------------------ExterQual---------------------------------------- - 

TA    0.620548
Gd    0.334247
Ex    0.035616
Fa    0.009589
Name: ExterQual, dtype: float64

----------------------------------------ExterCond---------------------------------------- - 

TA    0.878082
Gd    0.100000
Fa    0.019178
Ex    0.002055
Po    0.000685
Name: ExterCond, dtype: float64

----------------------------------------Foundation---------------------------------------- - 

PConc     0.443151
CBlock    0.434247
BrkTil    0.100000
Slab      0.016438
Stone     0.004110
Wood      0.002055
Name: Foundation, dtype: float64

----------------------------------------BsmtQual---------------------------------------- - 

TA    0.456079
Gd    0.434294
Ex    0.085032
Fa    0.024596
Name: BsmtQual, dtype: float64

----------------------------------------BsmtCond---------------------------------------- - 

TA    0.921293
Gd    0.045678
Fa    0.031623
Po    0.001405
Name: BsmtCond, dtype: float64

----------------------------------------BsmtExposure---------------------------------------- - 

No    0.670183
Av    0.155415
Gd    0.094233
Mn    0.080169
Name: BsmtExposure, dtype: float64

----------------------------------------BsmtFinType1---------------------------------------- - 

Unf    0.302178
GLQ    0.293746
ALQ    0.154603
BLQ    0.104006
Rec    0.093465
LwQ    0.052003
Name: BsmtFinType1, dtype: float64

----------------------------------------BsmtFinType2---------------------------------------- - 

Unf    0.883263
Rec    0.037975
LwQ    0.032349
BLQ    0.023207
ALQ    0.013361
GLQ    0.009845
Name: BsmtFinType2, dtype: float64

----------------------------------------Heating---------------------------------------- - 

GasA     0.978082
GasW     0.012329
Grav     0.004795
Wall     0.002740
OthW     0.001370
Floor    0.000685
Name: Heating, dtype: float64

----------------------------------------HeatingQC---------------------------------------- - 

Ex    0.507534
TA    0.293151
Gd    0.165068
Fa    0.033562
Po    0.000685
Name: HeatingQC, dtype: float64

----------------------------------------CentralAir---------------------------------------- - 

Y    0.934932
N    0.065068
Name: CentralAir, dtype: float64

----------------------------------------Electrical---------------------------------------- - 

SBrkr    0.914325
FuseA    0.064428
FuseF    0.018506
FuseP    0.002056
Mix      0.000685
Name: Electrical, dtype: float64

----------------------------------------KitchenQual---------------------------------------- - 

TA    0.503425
Gd    0.401370
Ex    0.068493
Fa    0.026712
Name: KitchenQual, dtype: float64

----------------------------------------Functional---------------------------------------- - 

Typ     0.931507
Min2    0.023288
Min1    0.021233
Mod     0.010274
Maj1    0.009589
Maj2    0.003425
Sev     0.000685
Name: Functional, dtype: float64

----------------------------------------GarageType---------------------------------------- - 

Attchd     0.630892
Detchd     0.280638
BuiltIn    0.063814
Basment    0.013778
CarPort    0.006526
2Types     0.004351
Name: GarageType, dtype: float64

----------------------------------------GarageFinish---------------------------------------- - 

Unf    0.438724
RFn    0.306019
Fin    0.255257
Name: GarageFinish, dtype: float64

----------------------------------------GarageQual---------------------------------------- - 

TA    0.950689
Fa    0.034808
Gd    0.010152
Ex    0.002175
Po    0.002175
Name: GarageQual, dtype: float64

----------------------------------------GarageCond---------------------------------------- - 

TA    0.961566
Fa    0.025381
Gd    0.006526
Po    0.005076
Ex    0.001450
Name: GarageCond, dtype: float64

----------------------------------------PavedDrive---------------------------------------- - 

Y    0.917808
N    0.061644
P    0.020548
Name: PavedDrive, dtype: float64

----------------------------------------SaleType---------------------------------------- - 

WD       0.867808
New      0.083562
COD      0.029452
ConLD    0.006164
ConLw    0.003425
ConLI    0.003425
CWD      0.002740
Oth      0.002055
Con      0.001370
Name: SaleType, dtype: float64

----------------------------------------SaleCondition---------------------------------------- - 

Normal     0.820548
Partial    0.085616
Abnorml    0.069178
Family     0.013699
Alloca     0.008219
AdjLand    0.002740
Name: SaleCondition, dtype: float64

In [50]:
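# Categorical columns dominated by a single category (in the proportions above, the most
# frequent level covers roughly 85% or more of the rows), so they carry little information.
skewed_vars = cat_vars[['Street', 'LandContour', 'Utilities', 'LandSlope', 'Condition1', 'Condition2', 
                        'RoofMatl', 'ExterCond', 'BsmtCond', 'BsmtFinType2', 'Heating', 'CentralAir', 
                        'Electrical','Functional', 'GarageQual', 'GarageCond', 'PavedDrive','SaleType'] ]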

In [51]:
# Drop by column label, with axis passed by keyword.
price = price.drop(skewed_vars.columns, axis=1)

In [52]:
cat_vars = price.select_dtypes(include = 'object')
cat_vars.columns

Index(['MSZoning', 'LotShape', 'LotConfig', 'Neighborhood', 'BldgType',
       'HouseStyle', 'RoofStyle', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'ExterQual', 'Foundation', 'BsmtQual', 'BsmtExposure', 'BsmtFinType1',
       'HeatingQC', 'KitchenQual', 'GarageType', 'GarageFinish',
       'SaleCondition'],
      dtype='object')
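
The heavily imbalanced columns were picked by reading the proportions above. A possible way to flag them automatically is sketched below on a made-up frame. The 0.85 cut-off is an assumption inferred from the columns that were dropped, and `toy`, `dominance_cutoff`, `top_share` and `skewed_cols` are hypothetical names.

```python
import pandas as pd

# Made-up categorical frame (assumption: illustrative values only).
toy = pd.DataFrame({
    "Street": ["Pave"] * 19 + ["Grvl"],        # top category covers 95%
    "LotShape": ["Reg"] * 12 + ["IR1"] * 8,    # top category covers 60%
})

dominance_cutoff = 0.85  # assumed cut-off for "dominated by one category"

# Share of the most frequent category in each column
# (value_counts sorts in descending order, so the first entry is the top share).
top_share = toy.apply(lambda s: s.value_counts(normalize=True).iloc[0])

# Columns whose top category exceeds the cut-off.
skewed_cols = top_share[top_share > dominance_cutoff].index.tolist()

print(skewed_cols)  # ['Street']
```

A fixed cut-off is only a heuristic; a rare category can still be informative, so the flagged columns are worth reviewing before they are dropped.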

In [53]:
for col in cat_vars.columns:
    print('-' * 40 + col + '-' * 40 , end=' - ')
    display(cat_vars[col].value_counts(normalize = True))

----------------------------------------MSZoning---------------------------------------- - 

RL         0.788356
RM         0.149315
FV         0.044521
RH         0.010959
C (all)    0.006849
Name: MSZoning, dtype: float64

----------------------------------------LotShape---------------------------------------- - 

Reg    0.633562
IR1    0.331507
IR2    0.028082
IR3    0.006849
Name: LotShape, dtype: float64

----------------------------------------LotConfig---------------------------------------- - 

Inside     0.720548
Corner     0.180137
CulDSac    0.064384
FR2        0.032192
FR3        0.002740
Name: LotConfig, dtype: float64

----------------------------------------Neighborhood---------------------------------------- - 

NAmes      0.154110
CollgCr    0.102740
OldTown    0.077397
Edwards    0.068493
Somerst    0.058904
Gilbert    0.054110
NridgHt    0.052740
Sawyer     0.050685
NWAmes     0.050000
SawyerW    0.040411
BrkSide    0.039726
Crawfor    0.034932
Mitchel    0.033562
NoRidge    0.028082
Timber     0.026027
IDOTRR     0.025342
ClearCr    0.019178
StoneBr    0.017123
SWISU      0.017123
MeadowV    0.011644
Blmngtn    0.011644
BrDale     0.010959
Veenker    0.007534
NPkVill    0.006164
Blueste    0.001370
Name: Neighborhood, dtype: float64

----------------------------------------BldgType---------------------------------------- - 

1Fam      0.835616
TwnhsE    0.078082
Duplex    0.035616
Twnhs     0.029452
2fmCon    0.021233
Name: BldgType, dtype: float64

----------------------------------------HouseStyle---------------------------------------- - 

1Story    0.497260
2Story    0.304795
1.5Fin    0.105479
SLvl      0.044521
SFoyer    0.025342
1.5Unf    0.009589
2.5Unf    0.007534
2.5Fin    0.005479
Name: HouseStyle, dtype: float64

----------------------------------------RoofStyle---------------------------------------- - 

Gable      0.781507
Hip        0.195890
Flat       0.008904
Gambrel    0.007534
Mansard    0.004795
Shed       0.001370
Name: RoofStyle, dtype: float64

----------------------------------------Exterior1st---------------------------------------- - 

VinylSd    0.352740
HdBoard    0.152055
MetalSd    0.150685
Wd Sdng    0.141096
Plywood    0.073973
CemntBd    0.041781
BrkFace    0.034247
WdShing    0.017808
Stucco     0.017123
AsbShng    0.013699
BrkComm    0.001370
Stone      0.001370
AsphShn    0.000685
ImStucc    0.000685
CBlock     0.000685
Name: Exterior1st, dtype: float64

----------------------------------------Exterior2nd---------------------------------------- - 

VinylSd    0.345205
MetalSd    0.146575
HdBoard    0.141781
Wd Sdng    0.134932
Plywood    0.097260
CmentBd    0.041096
Wd Shng    0.026027
Stucco     0.017808
BrkFace    0.017123
AsbShng    0.013699
ImStucc    0.006849
Brk Cmn    0.004795
Stone      0.003425
AsphShn    0.002055
Other      0.000685
CBlock     0.000685
Name: Exterior2nd, dtype: float64

----------------------------------------MasVnrType---------------------------------------- - 

None       0.595041
BrkFace    0.306474
Stone      0.088154
BrkCmn     0.010331
Name: MasVnrType, dtype: float64

----------------------------------------ExterQual---------------------------------------- - 

TA    0.620548
Gd    0.334247
Ex    0.035616
Fa    0.009589
Name: ExterQual, dtype: float64

----------------------------------------Foundation---------------------------------------- - 

PConc     0.443151
CBlock    0.434247
BrkTil    0.100000
Slab      0.016438
Stone     0.004110
Wood      0.002055
Name: Foundation, dtype: float64

----------------------------------------BsmtQual---------------------------------------- - 

TA    0.456079
Gd    0.434294
Ex    0.085032
Fa    0.024596
Name: BsmtQual, dtype: float64

----------------------------------------BsmtExposure---------------------------------------- - 

No    0.670183
Av    0.155415
Gd    0.094233
Mn    0.080169
Name: BsmtExposure, dtype: float64

----------------------------------------BsmtFinType1---------------------------------------- - 

Unf    0.302178
GLQ    0.293746
ALQ    0.154603
BLQ    0.104006
Rec    0.093465
LwQ    0.052003
Name: BsmtFinType1, dtype: float64

----------------------------------------HeatingQC---------------------------------------- - 

Ex    0.507534
TA    0.293151
Gd    0.165068
Fa    0.033562
Po    0.000685
Name: HeatingQC, dtype: float64

----------------------------------------KitchenQual---------------------------------------- - 

TA    0.503425
Gd    0.401370
Ex    0.068493
Fa    0.026712
Name: KitchenQual, dtype: float64

----------------------------------------GarageType---------------------------------------- - 

Attchd     0.630892
Detchd     0.280638
BuiltIn    0.063814
Basment    0.013778
CarPort    0.006526
2Types     0.004351
Name: GarageType, dtype: float64

----------------------------------------GarageFinish---------------------------------------- - 

Unf    0.438724
RFn    0.306019
Fin    0.255257
Name: GarageFinish, dtype: float64

----------------------------------------SaleCondition---------------------------------------- - 

Normal     0.820548
Partial    0.085616
Abnorml    0.069178
Family     0.013699
Alloca     0.008219
AdjLand    0.002740
Name: SaleCondition, dtype: float64

In [54]:
cat_vars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 20 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   MSZoning       1460 non-null   object
 1   LotShape       1460 non-null   object
 2   LotConfig      1460 non-null   object
 3   Neighborhood   1460 non-null   object
 4   BldgType       1460 non-null   object
 5   HouseStyle     1460 non-null   object
 6   RoofStyle      1460 non-null   object
 7   Exterior1st    1460 non-null   object
 8   Exterior2nd    1460 non-null   object
 9   MasVnrType     1452 non-null   object
 10  ExterQual      1460 non-null   object
 11  Foundation     1460 non-null   object
 12  BsmtQual       1423 non-null   object
 13  BsmtExposure   1422 non-null   object
 14  BsmtFinType1   1423 non-null   object
 15  HeatingQC      1460 non-null   object
 16  KitchenQual    1460 non-null   object
 17  GarageType     1379 non-null   object
 18  GarageFinish   1379 non-null

In [58]:
cat_vars.isnull().sum()

MSZoning          0
LotShape          0
LotConfig         0
Neighborhood      0
BldgType          0
HouseStyle        0
RoofStyle         0
Exterior1st       0
Exterior2nd       0
MasVnrType        8
ExterQual         0
Foundation        0
BsmtQual         37
BsmtExposure     38
BsmtFinType1     37
HeatingQC         0
KitchenQual       0
GarageType       81
GarageFinish     81
SaleCondition     0
dtype: int64

In [59]:
# Fill the NaN cells in each categorical column with that column's mode (most frequent value).
price.MasVnrType = price.MasVnrType.fillna(price.MasVnrType.mode()[0])
price.BsmtQual = price.BsmtQual.fillna(price.BsmtQual.mode()[0])
price.BsmtExposure = price.BsmtExposure.fillna(price.BsmtExposure.mode()[0])
price.BsmtFinType1 = price.BsmtFinType1.fillna(price.BsmtFinType1.mode()[0])
price.GarageType = price.GarageType.fillna(price.GarageType.mode()[0])
price.GarageFinish = price.GarageFinish.fillna(price.GarageFinish.mode()[0])
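
The six assignments above all follow the same pattern, so they could also be written as a loop. The following is a minimal sketch on a made-up frame: `toy` and `cols_to_fill` are illustrative names, not objects from this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `price`; the values are made up for illustration.
toy = pd.DataFrame({
    "GarageType": ["Attchd", "Attchd", np.nan, "Detchd"],
    "BsmtQual": ["TA", np.nan, "Gd", "TA"],
})

cols_to_fill = ["GarageType", "BsmtQual"]  # hypothetical list of columns
for col in cols_to_fill:
    # mode() ignores NaN and returns a Series (ties are possible); [0] takes the first entry
    toy[col] = toy[col].fillna(toy[col].mode()[0])

print(toy.isnull().sum())
```

On the real data the same loop would run over the six column names used above.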

In [61]:
price[cat_vars.columns].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 20 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   MSZoning       1460 non-null   object
 1   LotShape       1460 non-null   object
 2   LotConfig      1460 non-null   object
 3   Neighborhood   1460 non-null   object
 4   BldgType       1460 non-null   object
 5   HouseStyle     1460 non-null   object
 6   RoofStyle      1460 non-null   object
 7   Exterior1st    1460 non-null   object
 8   Exterior2nd    1460 non-null   object
 9   MasVnrType     1460 non-null   object
 10  ExterQual      1460 non-null   object
 11  Foundation     1460 non-null   object
 12  BsmtQual       1460 non-null   object
 13  BsmtExposure   1460 non-null   object
 14  BsmtFinType1   1460 non-null   object
 15  HeatingQC      1460 non-null   object
 16  KitchenQual    1460 non-null   object
 17  GarageType     1460 non-null   object
 18  GarageFinish   1460 non-null

In [62]:
cat_vars = price.select_dtypes(include = 'object')

In [64]:
cat_vars.isnull().sum()

MSZoning         0
LotShape         0
LotConfig        0
Neighborhood     0
BldgType         0
HouseStyle       0
RoofStyle        0
Exterior1st      0
Exterior2nd      0
MasVnrType       0
ExterQual        0
Foundation       0
BsmtQual         0
BsmtExposure     0
BsmtFinType1     0
HeatingQC        0
KitchenQual      0
GarageType       0
GarageFinish     0
SaleCondition    0
dtype: int64

In [65]:
for col in cat_vars.columns:
    print('-' * 40 + col + '-' * 40 , end=' - ')
    display(cat_vars[col].value_counts(normalize = True))

----------------------------------------MSZoning---------------------------------------- - 

RL         0.788356
RM         0.149315
FV         0.044521
RH         0.010959
C (all)    0.006849
Name: MSZoning, dtype: float64

----------------------------------------LotShape---------------------------------------- - 

Reg    0.633562
IR1    0.331507
IR2    0.028082
IR3    0.006849
Name: LotShape, dtype: float64

----------------------------------------LotConfig---------------------------------------- - 

Inside     0.720548
Corner     0.180137
CulDSac    0.064384
FR2        0.032192
FR3        0.002740
Name: LotConfig, dtype: float64

----------------------------------------Neighborhood---------------------------------------- - 

NAmes      0.154110
CollgCr    0.102740
OldTown    0.077397
Edwards    0.068493
Somerst    0.058904
Gilbert    0.054110
NridgHt    0.052740
Sawyer     0.050685
NWAmes     0.050000
SawyerW    0.040411
BrkSide    0.039726
Crawfor    0.034932
Mitchel    0.033562
NoRidge    0.028082
Timber     0.026027
IDOTRR     0.025342
ClearCr    0.019178
StoneBr    0.017123
SWISU      0.017123
MeadowV    0.011644
Blmngtn    0.011644
BrDale     0.010959
Veenker    0.007534
NPkVill    0.006164
Blueste    0.001370
Name: Neighborhood, dtype: float64

----------------------------------------BldgType---------------------------------------- - 

1Fam      0.835616
TwnhsE    0.078082
Duplex    0.035616
Twnhs     0.029452
2fmCon    0.021233
Name: BldgType, dtype: float64

----------------------------------------HouseStyle---------------------------------------- - 

1Story    0.497260
2Story    0.304795
1.5Fin    0.105479
SLvl      0.044521
SFoyer    0.025342
1.5Unf    0.009589
2.5Unf    0.007534
2.5Fin    0.005479
Name: HouseStyle, dtype: float64

----------------------------------------RoofStyle---------------------------------------- - 

Gable      0.781507
Hip        0.195890
Flat       0.008904
Gambrel    0.007534
Mansard    0.004795
Shed       0.001370
Name: RoofStyle, dtype: float64

----------------------------------------Exterior1st---------------------------------------- - 

VinylSd    0.352740
HdBoard    0.152055
MetalSd    0.150685
Wd Sdng    0.141096
Plywood    0.073973
CemntBd    0.041781
BrkFace    0.034247
WdShing    0.017808
Stucco     0.017123
AsbShng    0.013699
BrkComm    0.001370
Stone      0.001370
AsphShn    0.000685
ImStucc    0.000685
CBlock     0.000685
Name: Exterior1st, dtype: float64

----------------------------------------Exterior2nd---------------------------------------- - 

VinylSd    0.345205
MetalSd    0.146575
HdBoard    0.141781
Wd Sdng    0.134932
Plywood    0.097260
CmentBd    0.041096
Wd Shng    0.026027
Stucco     0.017808
BrkFace    0.017123
AsbShng    0.013699
ImStucc    0.006849
Brk Cmn    0.004795
Stone      0.003425
AsphShn    0.002055
Other      0.000685
CBlock     0.000685
Name: Exterior2nd, dtype: float64

----------------------------------------MasVnrType---------------------------------------- - 

None       0.597260
BrkFace    0.304795
Stone      0.087671
BrkCmn     0.010274
Name: MasVnrType, dtype: float64

----------------------------------------ExterQual---------------------------------------- - 

TA    0.620548
Gd    0.334247
Ex    0.035616
Fa    0.009589
Name: ExterQual, dtype: float64

----------------------------------------Foundation---------------------------------------- - 

PConc     0.443151
CBlock    0.434247
BrkTil    0.100000
Slab      0.016438
Stone     0.004110
Wood      0.002055
Name: Foundation, dtype: float64

----------------------------------------BsmtQual---------------------------------------- - 

TA    0.469863
Gd    0.423288
Ex    0.082877
Fa    0.023973
Name: BsmtQual, dtype: float64

----------------------------------------BsmtExposure---------------------------------------- - 

No    0.678767
Av    0.151370
Gd    0.091781
Mn    0.078082
Name: BsmtExposure, dtype: float64

----------------------------------------BsmtFinType1---------------------------------------- - 

Unf    0.319863
GLQ    0.286301
ALQ    0.150685
BLQ    0.101370
Rec    0.091096
LwQ    0.050685
Name: BsmtFinType1, dtype: float64

----------------------------------------HeatingQC---------------------------------------- - 

Ex    0.507534
TA    0.293151
Gd    0.165068
Fa    0.033562
Po    0.000685
Name: HeatingQC, dtype: float64

----------------------------------------KitchenQual---------------------------------------- - 

TA    0.503425
Gd    0.401370
Ex    0.068493
Fa    0.026712
Name: KitchenQual, dtype: float64

----------------------------------------GarageType---------------------------------------- - 

Attchd     0.651370
Detchd     0.265068
BuiltIn    0.060274
Basment    0.013014
CarPort    0.006164
2Types     0.004110
Name: GarageType, dtype: float64

----------------------------------------GarageFinish---------------------------------------- - 

Unf    0.469863
RFn    0.289041
Fin    0.241096
Name: GarageFinish, dtype: float64

----------------------------------------SaleCondition---------------------------------------- - 

Normal     0.820548
Partial    0.085616
Abnorml    0.069178
Family     0.013699
Alloca     0.008219
AdjLand    0.002740
Name: SaleCondition, dtype: float64
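
The normalized counts above show that many columns have levels covering only a small share of rows. One possible way to list such levels programmatically is sketched below. The series `s` is made up, and the 10% `threshold` is a hypothetical cutoff; the notebook picks the levels to keep by hand rather than by a fixed threshold.

```python
import pandas as pd

# Made-up column standing in for one of the `cat_vars` columns.
s = pd.Series(["RL"] * 16 + ["RM"] * 3 + ["FV"], name="MSZoning")

# Share of rows per level, same call as in the loop above.
share = s.value_counts(normalize=True)

threshold = 0.10  # hypothetical cutoff, not a value used in the notebook
rare_levels = share[share < threshold].index.tolist()
print(rare_levels)
```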

In [66]:
# Keep the two most frequent MSZoning levels and group the rest under 'Others'
def Change_MSZoning(x):
    if x == 'RL':
        return 'RL'
    elif x == 'RM':
        return 'RM'
    else: 
        return 'Others'

price.MSZoning = price.MSZoning.apply(Change_MSZoning)

In [67]:
# Keep the two most frequent LotShape levels and group the rest under 'Others'
def Change_LotShape(x):
    if x == 'Reg':
        return 'Reg'
    elif x == 'IR1':
        return 'IR1'
    else: 
        return 'Others'

price.LotShape = price.LotShape.apply(Change_LotShape)

In [68]:
# Keep the four most frequent Neighborhood levels and group the rest under 'Others'
def Change_Neighborhood(x):
    if x == 'NAmes':
        return 'NAmes'
    elif x == 'CollgCr':
        return 'CollgCr'
    elif x == 'OldTown':
        return 'OldTown'
    elif x == 'Edwards':
        return 'Edwards'
    else: 
        return 'Others'

price.Neighborhood = price.Neighborhood.apply(Change_Neighborhood)

In [69]:
# Keep the two most frequent BldgType levels and group the rest under 'Others'
def Change_BldgType(x):
    if x == '1Fam':
        return '1Fam'
    elif x == 'TwnhsE':
        return 'TwnhsE'
    else: 
        return 'Others'

price.BldgType = price.BldgType.apply(Change_BldgType)


In [70]:
# Keep the three most frequent HouseStyle levels and group the rest under 'Others'
def Change_HouseStyle(x):
    if x == '1Story':
        return '1Story'
    elif x == '2Story':
        return '2Story'
    elif x == '1.5Fin':
        return '1.5Fin'
    else: 
        return 'Others'

price.HouseStyle = price.HouseStyle.apply(Change_HouseStyle)

In [72]:

def Change_RoofStyle(x):
    if x == 'Gable':
        return 'Gable'
    elif x == 'Hip':
        return 'Hip'
    else: 
        return 'Others'

def Change_Exterior1st(x):
    if x == 'VinylSd':
        return 'VinylSd'
    elif x == 'HdBoard':
        return 'HdBoard'
    elif x == 'MetalSd':
        return 'MetalSd'
    elif x == 'Wd Sdng':
        return 'WdSdng'
    else: 
        return 'Others'

def Change_ExterQual(x):
    if x == 'TA':
        return 'TA'
    elif x == 'Gd':
        return 'Gd'
    else: 
        return 'Others'

def Change_Foundation(x):
    if x == 'PConc':
        return 'PConc'
    elif x == 'CBlock':
        return 'CBlock'
    else: 
        return 'Others'

def Change_BsmtQual(x):
    if x == 'TA':
        return 'TA'
    elif x == 'Gd':
        return 'Gd'
    else: 
        return 'Others'

def Change_HeatingQC(x):
    if x == 'Ex':
        return 'Ex'
    elif x == 'TA':
        return 'TA'
    elif x == 'Gd':
        return 'Gd'
    else: 
        return 'Others'

def Change_KitchenQual(x):
    if x == 'TA':
        return 'TA'
    elif x == 'Gd':
        return 'Gd'
    else: 
        return 'Others'

def Change_GarageType(x):
    if x == 'Attchd':
        return 'Attchd'
    elif x == 'Detchd':
        return 'Detchd'
    else: 
        return 'Others'

def Change_SaleCondition(x):
    if x == 'Normal':
        return 'Normal'
    elif x == 'Partial':
        return 'Partial'
    else: 
        return 'Others'



price.RoofStyle = price.RoofStyle.apply(Change_RoofStyle)
price.Exterior1st = price.Exterior1st.apply(Change_Exterior1st)
price.ExterQual = price.ExterQual.apply(Change_ExterQual)
price.Foundation = price.Foundation.apply(Change_Foundation)
price.BsmtQual = price.BsmtQual.apply(Change_BsmtQual)
price.HeatingQC = price.HeatingQC.apply(Change_HeatingQC)
price.KitchenQual = price.KitchenQual.apply(Change_KitchenQual)
price.GarageType = price.GarageType.apply(Change_GarageType)
price.SaleCondition = price.SaleCondition.apply(Change_SaleCondition)




In [75]:

def Change_Exterior2nd(x):
    if x == 'VinylSd':
        return 'VinylSd'
    elif x == 'HdBoard':
        return 'HdBoard'
    elif x == 'MetalSd':
        return 'MetalSd'
    elif x == 'Wd Sdng':
        return 'WdSdng'
    elif x == 'Plywood':
        return 'Plywood'
    else: 
        return 'Others'
    
price.Exterior2nd = price.Exterior2nd.apply(Change_Exterior2nd)


In [77]:
def Change_LotConfig(x):
    if x == 'Inside':
        return 'Inside'
    elif x == 'Corner':
        return 'Corner'
    else: 
        return 'Others'
    
price.LotConfig = price.LotConfig.apply(Change_LotConfig)
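
The `Change_*` functions above all keep a short list of labels and send everything else to 'Others'. A single helper could express the same idea. The sketch below is one way to do it: `lump_rare` is a hypothetical name and `toy` is a made-up frame. Unlike `Change_Exterior1st` and `Change_Exterior2nd`, it does not rename 'Wd Sdng' to 'WdSdng'.

```python
import pandas as pd

def lump_rare(series, keep, other="Others"):
    """Keep the labels listed in `keep`; map everything else to `other`."""
    # where() keeps values where the condition is True and substitutes `other` elsewhere
    return series.where(series.isin(keep), other)

# Made-up data for illustration only.
toy = pd.DataFrame({"LotConfig": ["Inside", "Corner", "CulDSac", "FR2", "Inside"]})
toy["LotConfig"] = lump_rare(toy["LotConfig"], keep=["Inside", "Corner"])
print(toy["LotConfig"].tolist())
```

Passing the list of labels as an argument keeps the grouping decision for each column visible in one line.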


In [78]:
cat_vars = price.select_dtypes(include = 'object')

for col in cat_vars.columns:
    print('-' * 40 + col + '-' * 40 , end=' - ')
    display(cat_vars[col].value_counts(normalize = True))

----------------------------------------MSZoning---------------------------------------- - 

RL        0.788356
RM        0.149315
Others    0.062329
Name: MSZoning, dtype: float64

----------------------------------------LotShape---------------------------------------- - 

Reg       0.633562
IR1       0.331507
Others    0.034932
Name: LotShape, dtype: float64

----------------------------------------LotConfig---------------------------------------- - 

Inside    0.720548
Corner    0.180137
Others    0.099315
Name: LotConfig, dtype: float64

----------------------------------------Neighborhood---------------------------------------- - 

Others     0.597260
NAmes      0.154110
CollgCr    0.102740
OldTown    0.077397
Edwards    0.068493
Name: Neighborhood, dtype: float64

----------------------------------------BldgType---------------------------------------- - 

1Fam      0.835616
Others    0.086301
TwnhsE    0.078082
Name: BldgType, dtype: float64

----------------------------------------HouseStyle---------------------------------------- - 

1Story    0.497260
2Story    0.304795
1.5Fin    0.105479
Others    0.092466
Name: HouseStyle, dtype: float64

----------------------------------------RoofStyle---------------------------------------- - 

Gable     0.781507
Hip       0.195890
Others    0.022603
Name: RoofStyle, dtype: float64

----------------------------------------Exterior1st---------------------------------------- - 

VinylSd    0.352740
Others     0.203425
HdBoard    0.152055
MetalSd    0.150685
WdSdng     0.141096
Name: Exterior1st, dtype: float64

----------------------------------------Exterior2nd---------------------------------------- - 

VinylSd    0.345205
MetalSd    0.146575
HdBoard    0.141781
WdSdng     0.134932
Others     0.134247
Plywood    0.097260
Name: Exterior2nd, dtype: float64

----------------------------------------ExterQual---------------------------------------- - 

TA        0.620548
Gd        0.334247
Others    0.045205
Name: ExterQual, dtype: float64

----------------------------------------Foundation---------------------------------------- - 

PConc     0.443151
CBlock    0.434247
Others    0.122603
Name: Foundation, dtype: float64

----------------------------------------BsmtQual---------------------------------------- - 

TA        0.469863
Gd        0.423288
Others    0.106849
Name: BsmtQual, dtype: float64

----------------------------------------BsmtExposure---------------------------------------- - 

No    0.678767
Av    0.151370
Gd    0.091781
Mn    0.078082
Name: BsmtExposure, dtype: float64

----------------------------------------BsmtFinType1---------------------------------------- - 

Unf    0.319863
GLQ    0.286301
ALQ    0.150685
BLQ    0.101370
Rec    0.091096
LwQ    0.050685
Name: BsmtFinType1, dtype: float64

----------------------------------------HeatingQC---------------------------------------- - 

Ex        0.507534
TA        0.293151
Gd        0.165068
Others    0.034247
Name: HeatingQC, dtype: float64

----------------------------------------KitchenQual---------------------------------------- - 

TA        0.503425
Gd        0.401370
Others    0.095205
Name: KitchenQual, dtype: float64

----------------------------------------GarageType---------------------------------------- - 

Attchd    0.651370
Detchd    0.265068
Others    0.083562
Name: GarageType, dtype: float64

----------------------------------------GarageFinish---------------------------------------- - 

Unf    0.469863
RFn    0.289041
Fin    0.241096
Name: GarageFinish, dtype: float64

----------------------------------------SaleCondition---------------------------------------- - 

Normal     0.820548
Others     0.093836
Partial    0.085616
Name: SaleCondition, dtype: float64

In [80]:
cat_vars.columns

Index(['MSZoning', 'LotShape', 'LotConfig', 'Neighborhood', 'BldgType',
       'HouseStyle', 'RoofStyle', 'Exterior1st', 'Exterior2nd', 'ExterQual',
       'Foundation', 'BsmtQual', 'BsmtExposure', 'BsmtFinType1', 'HeatingQC',
       'KitchenQual', 'GarageType', 'GarageFinish', 'SaleCondition'],
      dtype='object')

In [81]:
# Create dummy variables for the categorical columns, dropping the first level of each.
dummy1 = pd.get_dummies(price[['MSZoning', 'LotShape', 'LotConfig', 'Neighborhood', 'BldgType',
       'HouseStyle', 'RoofStyle', 'Exterior1st', 'Exterior2nd', 'ExterQual',
       'Foundation', 'BsmtQual', 'BsmtExposure', 'BsmtFinType1', 'HeatingQC',
       'KitchenQual', 'GarageType', 'GarageFinish', 'SaleCondition']], drop_first=True)

# Adding the results to the master dataframe
price = pd.concat([price, dummy1], axis=1)
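
To illustrate what `drop_first=True` does, the small sketch below encodes a made-up three-level column (`toy` is not part of this notebook). The first level in sorted order becomes the baseline and gets no column of its own, so a row with that level has every dummy switched off.

```python
import pandas as pd

# Made-up column with three levels: Fin, RFn, Unf.
toy = pd.DataFrame({"GarageFinish": ["Unf", "RFn", "Fin", "Unf"]})

dummies = pd.get_dummies(toy, drop_first=True)

# 'Fin' sorts first, so it is dropped and acts as the baseline level.
print(list(dummies.columns))
```

This is consistent with the `GarageFinish_RFn` and `GarageFinish_Unf` columns that appear in `price.head()`.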

In [82]:
price.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,LotShape,LotConfig,Neighborhood,BldgType,HouseStyle,...,HeatingQC_Others,HeatingQC_TA,KitchenQual_Others,KitchenQual_TA,GarageType_Detchd,GarageType_Others,GarageFinish_RFn,GarageFinish_Unf,SaleCondition_Others,SaleCondition_Partial
0,1,60,RL,65.0,8450,Reg,Inside,CollgCr,1Fam,2Story,...,0,0,0,0,0,0,1,0,0,0
1,2,20,RL,80.0,9600,Reg,Others,Others,1Fam,1Story,...,0,0,0,1,0,0,1,0,0,0
2,3,60,RL,68.0,11250,IR1,Inside,CollgCr,1Fam,2Story,...,0,0,0,0,0,0,1,0,0,0
3,4,70,RL,60.0,9550,IR1,Corner,Others,1Fam,2Story,...,0,0,0,0,1,0,0,1,1,0
4,5,60,RL,84.0,14260,IR1,Others,Others,1Fam,2Story,...,0,0,0,0,0,0,1,0,0,0


In [83]:
# Drop the original categorical columns now that their dummies have been created
price = price.drop(['MSZoning', 'LotShape', 'LotConfig', 'Neighborhood', 'BldgType',
       'HouseStyle', 'RoofStyle', 'Exterior1st', 'Exterior2nd', 'ExterQual',
       'Foundation', 'BsmtQual', 'BsmtExposure', 'BsmtFinType1', 'HeatingQC',
       'KitchenQual', 'GarageType', 'GarageFinish', 'SaleCondition'], axis = 1)
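
As an alternative to the separate create, concat, and drop steps, `pd.get_dummies` accepts a `columns` argument and then replaces the listed columns with their dummies in a single call. This is a sketch on a made-up frame (`toy` and `encoded` are illustrative names), not the approach used in this notebook.

```python
import pandas as pd

# Made-up frame with one numeric and one categorical column.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotShape": ["Reg", "Reg", "IR1"],
})

# The listed column is encoded and removed; other columns pass through unchanged.
encoded = pd.get_dummies(toy, columns=["LotShape"], drop_first=True)
print(list(encoded.columns))
```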

In [84]:
price.head()

Unnamed: 0,Id,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,...,HeatingQC_Others,HeatingQC_TA,KitchenQual_Others,KitchenQual_TA,GarageType_Detchd,GarageType_Others,GarageFinish_RFn,GarageFinish_Unf,SaleCondition_Others,SaleCondition_Partial
0,1,60,65.0,8450,7,5,2003,2003,196.0,706,...,0,0,0,0,0,0,1,0,0,0
1,2,20,80.0,9600,6,8,1976,1976,0.0,978,...,0,0,0,1,0,0,1,0,0,0
2,3,60,68.0,11250,7,5,2001,2002,162.0,486,...,0,0,0,0,0,0,1,0,0,0
3,4,70,60.0,9550,7,5,1915,1970,0.0,216,...,0,0,0,0,1,0,0,1,1,0
4,5,60,84.0,14260,8,5,2000,2000,350.0,655,...,0,0,0,0,0,0,1,0,0,0


In [85]:
price.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 89 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Id                     1460 non-null   int64  
 1   MSSubClass             1460 non-null   int64  
 2   LotFrontage            1201 non-null   float64
 3   LotArea                1460 non-null   int64  
 4   OverallQual            1460 non-null   int64  
 5   OverallCond            1460 non-null   int64  
 6   YearBuilt              1460 non-null   int64  
 7   YearRemodAdd           1460 non-null   int64  
 8   MasVnrArea             1452 non-null   float64
 9   BsmtFinSF1             1460 non-null   int64  
 10  BsmtFinSF2             1460 non-null   int64  
 11  BsmtUnfSF              1460 non-null   int64  
 12  TotalBsmtSF            1460 non-null   int64  
 13  1stFlrSF               1460 non-null   int64  
 14  2ndFlrSF               1460 non-null   int64  
 15  LowQ