## Table of Contents

* [Data Import and Data Cleaning](#data-import-and-data-cleaning)
    * [Import Libraries](#import-libraries)
    * [Import CSV Files](#import-csv-files)
    * [Check Data](#check-data)
    * [Clean Up Data](#clean-up-data)
    * [Functions Created to Fill NaNs and Map Variables to Numeric Values](#functions-created-to-fill-nans-and-map-variables-to-numeric-values)
    * [Manual Mapping](#manual-mapping)
    * [Find Correlations and Drop Columns](#find-correlations-and-drop-columns)
* [Save Clean Data as CSV Files](#save-clean-data-as-csv-files)
* [Load Clean CSV File](#load-clean-csv-file)
* [Dummify Columns for Train and Test](#dummify-columns-for-train-and-test)
    * [Get Dummies on All Nominal Columns](#get-dummies-on-all-nominal-columns)
* [Save Data With Dummies as CSV File](#save-data-with-dummies-as-csv-file)

# Data Import and Data Cleaning

## Import Libraries

In [356]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import statsmodels.api as sm

from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import PolynomialFeatures, OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error

## Import CSV Files

In [357]:
train = pd.read_csv('./datasets/train.csv')
test = pd.read_csv('./datasets/test.csv')

## Check Data

In [358]:
train.head()

Unnamed: 0,Id,PID,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,Lot Config,Land Slope,Neighborhood,Condition 1,Condition 2,Bldg Type,House Style,Overall Qual,Overall Cond,Year Built,Year Remod/Add,Roof Style,Roof Matl,Exterior 1st,Exterior 2nd,Mas Vnr Type,Mas Vnr Area,Exter Qual,Exter Cond,Foundation,Bsmt Qual,Bsmt Cond,Bsmt Exposure,BsmtFin Type 1,BsmtFin SF 1,BsmtFin Type 2,BsmtFin SF 2,Bsmt Unf SF,Total Bsmt SF,Heating,Heating QC,Central Air,Electrical,1st Flr SF,2nd Flr SF,Low Qual Fin SF,Gr Liv Area,Bsmt Full Bath,Bsmt Half Bath,Full Bath,Half Bath,Bedroom AbvGr,Kitchen AbvGr,Kitchen Qual,TotRms AbvGrd,Functional,Fireplaces,Fireplace Qu,Garage Type,Garage Yr Blt,Garage Finish,Garage Cars,Garage Area,Garage Qual,Garage Cond,Paved Drive,Wood Deck SF,Open Porch SF,Enclosed Porch,3Ssn Porch,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
0,109,533352170,60,RL,,13517,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
1,544,531379050,60,RL,43.0,11492,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
2,153,535304180,20,RL,68.0,7922,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
3,318,916386060,60,RL,73.0,9802,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
4,255,906425045,50,RL,82.0,14235,Pave,,IR1,Lvl,AllPub,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


## Clean Up Data

In [360]:
# Change train column names to lowercase snake_case
train.columns = train.columns.str.lower().str.replace(' ', '_')

# Change test column names to lowercase snake_case
test.columns = test.columns.str.lower().str.replace(' ', '_')
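
As a quick check of what this renaming does, the sketch below applies the same two string operations to a few of the original column names. The `sample` frame is made up for illustration and holds no data. Only spaces are replaced, so a name such as `Year Remod/Add` keeps its slash.

```python
import pandas as pd

# Hypothetical empty frame carrying a few of the original column names
sample = pd.DataFrame(columns=['MS SubClass', 'Year Remod/Add', '1st Flr SF', 'SalePrice'])

# Same transformation as above: lowercase first, then spaces to underscores
sample.columns = sample.columns.str.lower().str.replace(' ', '_')

print(list(sample.columns))
# ['ms_subclass', 'year_remod/add', '1st_flr_sf', 'saleprice']
```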

In [361]:
# List columns with at least one NaN in train.csv and in test.csv

# Train
null_counts = train.isnull().sum()
columns_with_missing_values = null_counts[null_counts > 0]
print(columns_with_missing_values)

# Test
null_counts = test.isnull().sum()
columns_with_missing_values = null_counts[null_counts > 0]
print(columns_with_missing_values)

lot_frontage       330
alley             1911
mas_vnr_type        22
mas_vnr_area        22
bsmt_qual           55
bsmt_cond           55
bsmt_exposure       58
bsmtfin_type_1      55
bsmtfin_sf_1         1
bsmtfin_type_2      56
bsmtfin_sf_2         1
bsmt_unf_sf          1
total_bsmt_sf        1
bsmt_full_bath       2
bsmt_half_bath       2
fireplace_qu      1000
garage_type        113
garage_yr_blt      114
garage_finish      114
garage_cars          1
garage_area          1
garage_qual        114
garage_cond        114
pool_qc           2042
fence             1651
misc_feature      1986
dtype: int64
lot_frontage      160
alley             820
mas_vnr_type        1
mas_vnr_area        1
bsmt_qual          25
bsmt_cond          25
bsmt_exposure      25
bsmtfin_type_1     25
bsmtfin_type_2     25
electrical          1
fireplace_qu      422
garage_type        44
garage_yr_blt      45
garage_finish      45
garage_qual        45
garage_cond        45
pool_qc           874
fence          
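
The two listings above are easier to compare when the counts sit side by side, for example to spot a column such as `electrical` that appears only in the test listing. The following is a minimal sketch of one way to do that. `missing_summary` and the toy frames are hypothetical names introduced here, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(train_df, test_df):
    """Return one table of NaN counts per column for both frames."""
    summary = pd.concat(
        [train_df.isnull().sum(), test_df.isnull().sum()],
        axis=1,
        keys=['train', 'test'],
    )
    # A column absent from one frame (e.g. the target) appears as NaN; count it as 0
    summary = summary.fillna(0).astype(int)
    # Keep only columns with at least one missing value in either frame
    return summary[(summary > 0).any(axis=1)]

# Toy frames standing in for train and test (made-up values)
toy_train = pd.DataFrame({'alley': [np.nan, 'Grvl'], 'lot_area': [1, 2], 'saleprice': [10, 20]})
toy_test = pd.DataFrame({'alley': [np.nan, np.nan], 'lot_area': [3, 4]})

print(missing_summary(toy_train, toy_test))
```

With the real data, the call would be `missing_summary(train, test)`.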

## Functions Created to Fill NaNs and Map Variables to Numeric Values

In [362]:
# Create functions to replace NaN/empty values in nominal columns with 'NA'
# to match the data dictionary

# Train function
def train_change_nominal(column):
    train[column] = train[column].replace(np.nan, 'NA')
    return train

# Test function
def test_change_nominal(column):
    test[column] = test[column].replace(np.nan, 'NA')
    return test
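
The train and test functions above differ only in which global frame they modify. As an alternative sketch (not what the notebook does), a single helper could take the frame and a list of columns as arguments and return a filled copy. `fill_nominal_na` and the toy frame are hypothetical names used for illustration.

```python
import numpy as np
import pandas as pd

def fill_nominal_na(df, columns, fill_value='NA'):
    """Return a copy of df with NaN in the given columns replaced by fill_value."""
    out = df.copy()
    out[columns] = out[columns].fillna(fill_value)
    return out

# Toy frame with made-up values
toy = pd.DataFrame({
    'alley': [np.nan, 'Grvl'],
    'fence': ['MnPrv', np.nan],
    'lot_area': [1, 2],
})

filled = fill_nominal_na(toy, ['alley', 'fence'])
print(filled)
```

Because the helper works on a copy, the input frame is left unchanged, and the same call can be reused for both datasets, e.g. `train = fill_nominal_na(train, nominal_columns)`, where `nominal_columns` would be the list of column names passed one by one in the cells that follow.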

In [363]:
# Use function for training data
train_change_nominal('alley')
train_change_nominal('mas_vnr_type')
train_change_nominal('bsmt_qual')
train_change_nominal('bsmt_cond')
train_change_nominal('bsmt_exposure')
train_change_nominal('bsmtfin_type_1')
train_change_nominal('bsmtfin_type_2')
train_change_nominal('fireplace_qu')
train_change_nominal('garage_type')
train_change_nominal('garage_finish')
train_change_nominal('garage_qual')
train_change_nominal('garage_cond')
train_change_nominal('pool_qc')
train_change_nominal('fence')
train_change_nominal('misc_feature')

Unnamed: 0,id,pid,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod/add,roof_style,roof_matl,exterior_1st,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
0,109,533352170,60,RL,,13517,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
1,544,531379050,60,RL,43.0,11492,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
2,153,535304180,20,RL,68.0,7922,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
3,318,916386060,60,RL,73.0,9802,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
4,255,906425045,50,RL,82.0,14235,Pave,,IR1,Lvl,AllPub,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2046,1587,921126030,20,RL,79.0,11449,Pave,,IR1,HLS,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,1Story,8,5,2007,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,TA,Av,GLQ,1011.0,Unf,0.0,873.0,1884.0,GasA,Ex,Y,SBrkr,1728,0,0,1728,1.0,0.0,2,0,3,1,Gd,7,Typ,1,Gd,Attchd,2007.0,Fin,2.0,520.0,TA,TA,Y,0,276,0,0,0,0,,,,0,1,2008,WD,298751
2047,785,905377130,30,RL,,12342,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,1Story,4,5,1940,1950,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,CBlock,TA,TA,No,BLQ,262.0,Unf,0.0,599.0,861.0,GasA,Ex,Y,SBrkr,861,0,0,861,0.0,0.0,1,0,1,1,TA,4,Typ,0,,Detchd,1961.0,Unf,2.0,539.0,TA,TA,Y,158,0,0,0,0,0,,,,0,3,2009,WD,82500
2048,916,909253010,50,RL,57.0,7558,Pave,,Reg,Bnk,AllPub,Inside,Gtl,Crawfor,Norm,Norm,1Fam,1.5Fin,6,6,1928,1950,Gable,CompShg,BrkFace,Stone,,0.0,TA,TA,BrkTil,TA,TA,No,Unf,0.0,Unf,0.0,896.0,896.0,GasA,Gd,Y,SBrkr,1172,741,0,1913,0.0,0.0,1,1,3,1,TA,9,Typ,1,TA,Detchd,1929.0,Unf,2.0,342.0,Fa,Fa,Y,0,0,0,0,0,0,,,,0,3,2009,WD,177000
2049,639,535179160,20,RL,80.0,10400,Pave,,Reg,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,4,5,1956,1956,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,TA,TA,No,Rec,155.0,LwQ,750.0,295.0,1200.0,GasA,TA,Y,SBrkr,1200,0,0,1200,1.0,0.0,1,0,3,1,TA,6,Typ,2,Gd,Attchd,1956.0,Unf,1.0,294.0,TA,TA,Y,0,189,140,0,0,0,,,,0,11,2009,WD,144000


In [364]:
# Use function for testing data
test_change_nominal('alley')
test_change_nominal('mas_vnr_type')
test_change_nominal('bsmt_qual')
test_change_nominal('bsmt_cond')
test_change_nominal('bsmt_exposure')
test_change_nominal('bsmtfin_type_1')
test_change_nominal('bsmtfin_type_2')
test_change_nominal('fireplace_qu')
test_change_nominal('garage_type')
test_change_nominal('garage_finish')
test_change_nominal('garage_qual')
test_change_nominal('garage_cond')
test_change_nominal('pool_qc')
test_change_nominal('fence')
test_change_nominal('misc_feature')

Unnamed: 0,id,pid,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod/add,roof_style,roof_matl,exterior_1st,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
0,2658,902301120,190,RM,69.0,9142,Pave,Grvl,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,2fmCon,2Story,6,8,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
1,2718,905108090,90,RL,,9662,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Sawyer,Norm,Norm,Duplex,1Story,5,4,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
2,2414,528218130,60,RL,58.0,17104,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,7,5,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
3,1989,902207150,30,RM,60.0,8520,Pave,,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,1Fam,1Story,5,6,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
4,625,535105100,20,RL,,9500,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,6,5,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
873,1662,527377110,60,RL,80.0,8000,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NWAmes,PosN,Norm,1Fam,2Story,6,6,1974,1974,Gable,CompShg,HdBoard,HdBoard,,0.0,TA,TA,CBlock,TA,TA,No,ALQ,931,LwQ,153,0,1084,GasA,TA,Y,SBrkr,1084,793,0,1877,1,0,2,1,4,1,TA,8,Typ,1,TA,Attchd,1974.0,Unf,2,488,TA,TA,Y,0,96,0,0,0,0,,,,0,11,2007,WD
874,1234,535126140,60,RL,90.0,14670,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,2Story,6,7,1966,1999,Gable,CompShg,VinylSd,VinylSd,BrkFace,410.0,Gd,Gd,CBlock,TA,TA,No,BLQ,575,Unf,0,529,1104,GasA,Ex,Y,SBrkr,1104,884,0,1988,0,0,2,1,4,1,Gd,9,Typ,1,Gd,Attchd,1966.0,RFn,2,480,TA,TA,Y,0,230,0,0,0,0,,MnPrv,,0,8,2008,WD
875,1373,904100040,20,RL,55.0,8250,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Sawyer,Feedr,Norm,1Fam,1Story,5,5,1968,1968,Hip,CompShg,HdBoard,HdBoard,,0.0,TA,TA,CBlock,TA,TA,No,BLQ,250,LwQ,492,210,952,GasA,Ex,Y,SBrkr,1211,0,0,1211,0,0,1,0,3,1,TA,5,Typ,1,TA,Attchd,1968.0,Unf,1,322,TA,TA,Y,0,63,0,0,0,0,,,,0,8,2008,WD
876,1672,527425140,20,RL,60.0,9000,Pave,,Reg,Lvl,AllPub,FR2,Gtl,NAmes,Norm,Norm,1Fam,1Story,4,6,1971,1971,Gable,CompShg,HdBoard,HdBoard,,0.0,TA,TA,PConc,TA,TA,No,ALQ,616,Unf,0,248,864,GasA,TA,Y,SBrkr,864,0,0,864,0,0,1,0,3,1,TA,5,Typ,0,,Detchd,1974.0,Unf,2,528,TA,TA,Y,0,0,0,0,0,0,,GdWo,,0,5,2007,WD


In [365]:
# Create functions to replace NaN/empty values in continuous columns with 0

# Train function
def train_change_continuous(column):
    train[column] = train[column].fillna(0)
    return train

# Test function
def test_change_continuous(column):
    test[column] = test[column].fillna(0)
    return test
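
For reference, `DataFrame.fillna` also accepts a dictionary that maps column names to fill values, so several columns can be zero-filled in one call. The sketch below assumes the nine column names used in the following cells. `fill_zero` and the toy frame are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

# Columns that the notebook zero-fills one call at a time
zero_fill_columns = [
    'lot_frontage', 'mas_vnr_area', 'bsmtfin_sf_1', 'bsmtfin_sf_2',
    'bsmt_unf_sf', 'total_bsmt_sf', 'bsmt_full_bath', 'bsmt_half_bath',
    'garage_area',
]

def fill_zero(df, columns):
    """Return a copy of df with NaN replaced by 0 in the listed columns only."""
    present = [c for c in columns if c in df.columns]
    return df.fillna({c: 0 for c in present})

# Toy frame with made-up values; garage_yr_blt is not in the list and stays NaN
toy = pd.DataFrame({
    'lot_frontage': [np.nan, 43.0],
    'garage_area': [475.0, np.nan],
    'garage_yr_blt': [np.nan, 1997.0],
})

filled = fill_zero(toy, zero_fill_columns)
print(filled)
```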

In [366]:
# Use function for training data
train_change_continuous('lot_frontage')
train_change_continuous('mas_vnr_area')
train_change_continuous('bsmtfin_sf_1')
train_change_continuous('bsmtfin_sf_2')
train_change_continuous('bsmt_unf_sf')
train_change_continuous('total_bsmt_sf')
train_change_continuous('bsmt_full_bath')
train_change_continuous('bsmt_half_bath')
train_change_continuous('garage_area')

Unnamed: 0,id,pid,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod/add,roof_style,roof_matl,exterior_1st,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
0,109,533352170,60,RL,0.0,13517,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
1,544,531379050,60,RL,43.0,11492,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
2,153,535304180,20,RL,68.0,7922,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
3,318,916386060,60,RL,73.0,9802,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
4,255,906425045,50,RL,82.0,14235,Pave,,IR1,Lvl,AllPub,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2046,1587,921126030,20,RL,79.0,11449,Pave,,IR1,HLS,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,1Story,8,5,2007,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,TA,Av,GLQ,1011.0,Unf,0.0,873.0,1884.0,GasA,Ex,Y,SBrkr,1728,0,0,1728,1.0,0.0,2,0,3,1,Gd,7,Typ,1,Gd,Attchd,2007.0,Fin,2.0,520.0,TA,TA,Y,0,276,0,0,0,0,,,,0,1,2008,WD,298751
2047,785,905377130,30,RL,0.0,12342,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,1Story,4,5,1940,1950,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,CBlock,TA,TA,No,BLQ,262.0,Unf,0.0,599.0,861.0,GasA,Ex,Y,SBrkr,861,0,0,861,0.0,0.0,1,0,1,1,TA,4,Typ,0,,Detchd,1961.0,Unf,2.0,539.0,TA,TA,Y,158,0,0,0,0,0,,,,0,3,2009,WD,82500
2048,916,909253010,50,RL,57.0,7558,Pave,,Reg,Bnk,AllPub,Inside,Gtl,Crawfor,Norm,Norm,1Fam,1.5Fin,6,6,1928,1950,Gable,CompShg,BrkFace,Stone,,0.0,TA,TA,BrkTil,TA,TA,No,Unf,0.0,Unf,0.0,896.0,896.0,GasA,Gd,Y,SBrkr,1172,741,0,1913,0.0,0.0,1,1,3,1,TA,9,Typ,1,TA,Detchd,1929.0,Unf,2.0,342.0,Fa,Fa,Y,0,0,0,0,0,0,,,,0,3,2009,WD,177000
2049,639,535179160,20,RL,80.0,10400,Pave,,Reg,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,4,5,1956,1956,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,TA,TA,No,Rec,155.0,LwQ,750.0,295.0,1200.0,GasA,TA,Y,SBrkr,1200,0,0,1200,1.0,0.0,1,0,3,1,TA,6,Typ,2,Gd,Attchd,1956.0,Unf,1.0,294.0,TA,TA,Y,0,189,140,0,0,0,,,,0,11,2009,WD,144000


In [367]:
# Use function for testing data
test_change_continuous('lot_frontage')
test_change_continuous('mas_vnr_area')
test_change_continuous('bsmtfin_sf_1')
test_change_continuous('bsmtfin_sf_2')
test_change_continuous('bsmt_unf_sf')
test_change_continuous('total_bsmt_sf')
test_change_continuous('bsmt_full_bath')
test_change_continuous('bsmt_half_bath')
test_change_continuous('garage_area')

Unnamed: 0,id,pid,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod/add,roof_style,roof_matl,exterior_1st,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
0,2658,902301120,190,RM,69.0,9142,Pave,Grvl,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,2fmCon,2Story,6,8,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
1,2718,905108090,90,RL,0.0,9662,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Sawyer,Norm,Norm,Duplex,1Story,5,4,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
2,2414,528218130,60,RL,58.0,17104,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,7,5,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
3,1989,902207150,30,RM,60.0,8520,Pave,,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,1Fam,1Story,5,6,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
4,625,535105100,20,RL,0.0,9500,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,6,5,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
873,1662,527377110,60,RL,80.0,8000,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NWAmes,PosN,Norm,1Fam,2Story,6,6,1974,1974,Gable,CompShg,HdBoard,HdBoard,,0.0,TA,TA,CBlock,TA,TA,No,ALQ,931,LwQ,153,0,1084,GasA,TA,Y,SBrkr,1084,793,0,1877,1,0,2,1,4,1,TA,8,Typ,1,TA,Attchd,1974.0,Unf,2,488,TA,TA,Y,0,96,0,0,0,0,,,,0,11,2007,WD
874,1234,535126140,60,RL,90.0,14670,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,2Story,6,7,1966,1999,Gable,CompShg,VinylSd,VinylSd,BrkFace,410.0,Gd,Gd,CBlock,TA,TA,No,BLQ,575,Unf,0,529,1104,GasA,Ex,Y,SBrkr,1104,884,0,1988,0,0,2,1,4,1,Gd,9,Typ,1,Gd,Attchd,1966.0,RFn,2,480,TA,TA,Y,0,230,0,0,0,0,,MnPrv,,0,8,2008,WD
875,1373,904100040,20,RL,55.0,8250,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Sawyer,Feedr,Norm,1Fam,1Story,5,5,1968,1968,Hip,CompShg,HdBoard,HdBoard,,0.0,TA,TA,CBlock,TA,TA,No,BLQ,250,LwQ,492,210,952,GasA,Ex,Y,SBrkr,1211,0,0,1211,0,0,1,0,3,1,TA,5,Typ,1,TA,Attchd,1968.0,Unf,1,322,TA,TA,Y,0,63,0,0,0,0,,,,0,8,2008,WD
876,1672,527425140,20,RL,60.0,9000,Pave,,Reg,Lvl,AllPub,FR2,Gtl,NAmes,Norm,Norm,1Fam,1Story,4,6,1971,1971,Gable,CompShg,HdBoard,HdBoard,,0.0,TA,TA,PConc,TA,TA,No,ALQ,616,Unf,0,248,864,GasA,TA,Y,SBrkr,864,0,0,864,0,0,1,0,3,1,TA,5,Typ,0,,Detchd,1974.0,Unf,2,528,TA,TA,Y,0,0,0,0,0,0,,GdWo,,0,5,2007,WD


In [368]:
# Create functions to map ordinal quality ratings to numeric values

# Train function
def train_convert_column(column):
    train[column] = train[column].replace(np.nan, 'NA')
    mapping = {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}
    train[column] = train[column].map(mapping)
    return train

# Test function
def test_convert_column(column):
    test[column] = test[column].replace(np.nan, 'NA')
    mapping = {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}
    test[column] = test[column].map(mapping)
    return test
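
One thing to keep in mind with this approach: `Series.map` with a dictionary returns NaN for any value that is not a key in the dictionary, so applying the mapping to a column that uses a different label set (such as the `No`/`Av` labels seen in `bsmt_exposure`) would silently introduce new NaNs. The sketch below adds a guard for that case. `map_quality` and `QUALITY_SCALE` are hypothetical names, and the scale is the same one used in the functions above.

```python
import numpy as np
import pandas as pd

QUALITY_SCALE = {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}

def map_quality(series, mapping=QUALITY_SCALE):
    """Map quality labels to integers, raising if a label is not in the mapping."""
    labels = series.fillna('NA')
    unknown = set(labels.unique()) - set(mapping)
    if unknown:
        raise ValueError(f'Unmapped labels in {series.name}: {sorted(unknown)}')
    return labels.map(mapping)

# Toy column with made-up values
toy = pd.Series(['Gd', np.nan, 'TA', 'Ex'], name='kitchen_qual')
print(map_quality(toy).tolist())
# [4, 0, 3, 5]
```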

In [369]:
# Use function for training data
train_convert_column('exter_qual')
train_convert_column('exter_cond')
train_convert_column('bsmt_qual')
train_convert_column('bsmt_cond')
train_convert_column('heating_qc')
train_convert_column('kitchen_qual')
train_convert_column('garage_qual')
train_convert_column('garage_cond')
train_convert_column('pool_qc')

Unnamed: 0,id,pid,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod/add,roof_style,roof_matl,exterior_1st,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
0,109,533352170,60,RL,0.0,13517,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,4,3,CBlock,3,3,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,5,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,4,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,3,3,Y,0,44,0,0,0,0,0,,,0,3,2010,WD,130500
1,544,531379050,60,RL,43.0,11492,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,4,3,PConc,4,3,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,5,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,4,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,3,3,Y,0,74,0,0,0,0,0,,,0,4,2009,WD,220000
2,153,535304180,20,RL,68.0,7922,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,3,4,CBlock,3,3,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,3,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,4,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,3,3,Y,0,52,0,0,0,0,0,,,0,1,2010,WD,109000
3,318,916386060,60,RL,73.0,9802,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,3,3,PConc,4,3,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,4,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,3,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,3,3,Y,100,0,0,0,0,0,0,,,0,4,2010,WD,174000
4,255,906425045,50,RL,82.0,14235,Pave,,IR1,Lvl,AllPub,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,3,3,PConc,2,4,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,3,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,3,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,3,3,N,0,59,0,0,0,0,0,,,0,3,2010,WD,138500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2046,1587,921126030,20,RL,79.0,11449,Pave,,IR1,HLS,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,1Story,8,5,2007,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,4,3,PConc,4,3,Av,GLQ,1011.0,Unf,0.0,873.0,1884.0,GasA,5,Y,SBrkr,1728,0,0,1728,1.0,0.0,2,0,3,1,4,7,Typ,1,Gd,Attchd,2007.0,Fin,2.0,520.0,3,3,Y,0,276,0,0,0,0,0,,,0,1,2008,WD,298751
2047,785,905377130,30,RL,0.0,12342,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Edwards,Norm,Norm,1Fam,1Story,4,5,1940,1950,Gable,CompShg,VinylSd,VinylSd,,0.0,3,3,CBlock,3,3,No,BLQ,262.0,Unf,0.0,599.0,861.0,GasA,5,Y,SBrkr,861,0,0,861,0.0,0.0,1,0,1,1,3,4,Typ,0,,Detchd,1961.0,Unf,2.0,539.0,3,3,Y,158,0,0,0,0,0,0,,,0,3,2009,WD,82500
2048,916,909253010,50,RL,57.0,7558,Pave,,Reg,Bnk,AllPub,Inside,Gtl,Crawfor,Norm,Norm,1Fam,1.5Fin,6,6,1928,1950,Gable,CompShg,BrkFace,Stone,,0.0,3,3,BrkTil,3,3,No,Unf,0.0,Unf,0.0,896.0,896.0,GasA,4,Y,SBrkr,1172,741,0,1913,0.0,0.0,1,1,3,1,3,9,Typ,1,TA,Detchd,1929.0,Unf,2.0,342.0,2,2,Y,0,0,0,0,0,0,0,,,0,3,2009,WD,177000
2049,639,535179160,20,RL,80.0,10400,Pave,,Reg,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,4,5,1956,1956,Gable,CompShg,Plywood,Plywood,,0.0,3,3,CBlock,3,3,No,Rec,155.0,LwQ,750.0,295.0,1200.0,GasA,3,Y,SBrkr,1200,0,0,1200,1.0,0.0,1,0,3,1,3,6,Typ,2,Gd,Attchd,1956.0,Unf,1.0,294.0,3,3,Y,0,189,140,0,0,0,0,,,0,11,2009,WD,144000


In [370]:
# Use function for testing data
test_convert_column('exter_qual')
test_convert_column('exter_cond')
test_convert_column('bsmt_qual')
test_convert_column('bsmt_cond')
test_convert_column('heating_qc')
test_convert_column('kitchen_qual')
test_convert_column('garage_qual')
test_convert_column('garage_cond')
test_convert_column('pool_qc')

Unnamed: 0,id,pid,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod/add,roof_style,roof_matl,exterior_1st,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
0,2658,902301120,190,RM,69.0,9142,Pave,Grvl,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,2fmCon,2Story,6,8,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,3,2,Stone,2,3,No,Unf,0,Unf,0,1020,1020,GasA,4,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,2,9,Typ,0,,Detchd,1910.0,Unf,1,440,1,1,Y,0,60,112,0,0,0,0,,,0,4,2006,WD
1,2718,905108090,90,RL,0.0,9662,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Sawyer,Norm,Norm,Duplex,1Story,5,4,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,3,3,CBlock,4,3,No,Unf,0,Unf,0,1967,1967,GasA,3,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,3,10,Typ,0,,Attchd,1977.0,Fin,2,580,3,3,Y,170,0,0,0,0,0,0,,,0,8,2006,WD
2,2414,528218130,60,RL,58.0,17104,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,7,5,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,4,3,PConc,4,4,Av,GLQ,554,Unf,0,100,654,GasA,5,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,4,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,3,3,Y,100,24,0,0,0,0,0,,,0,9,2006,New
3,1989,902207150,30,RM,60.0,8520,Pave,,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,1Fam,1Story,5,6,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,4,3,CBlock,3,3,No,Unf,0,Unf,0,968,968,GasA,3,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,3,5,Typ,0,,Detchd,1935.0,Unf,2,480,2,3,N,0,0,184,0,0,0,0,,,0,7,2007,WD
4,625,535105100,20,RL,0.0,9500,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,6,5,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,3,3,CBlock,4,3,No,BLQ,609,Unf,0,785,1394,GasA,4,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,3,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,3,3,Y,0,76,0,0,185,0,0,,,0,7,2009,WD
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
873,1662,527377110,60,RL,80.0,8000,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NWAmes,PosN,Norm,1Fam,2Story,6,6,1974,1974,Gable,CompShg,HdBoard,HdBoard,,0.0,3,3,CBlock,3,3,No,ALQ,931,LwQ,153,0,1084,GasA,3,Y,SBrkr,1084,793,0,1877,1,0,2,1,4,1,3,8,Typ,1,TA,Attchd,1974.0,Unf,2,488,3,3,Y,0,96,0,0,0,0,0,,,0,11,2007,WD
874,1234,535126140,60,RL,90.0,14670,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,2Story,6,7,1966,1999,Gable,CompShg,VinylSd,VinylSd,BrkFace,410.0,4,4,CBlock,3,3,No,BLQ,575,Unf,0,529,1104,GasA,5,Y,SBrkr,1104,884,0,1988,0,0,2,1,4,1,4,9,Typ,1,Gd,Attchd,1966.0,RFn,2,480,3,3,Y,0,230,0,0,0,0,0,MnPrv,,0,8,2008,WD
875,1373,904100040,20,RL,55.0,8250,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Sawyer,Feedr,Norm,1Fam,1Story,5,5,1968,1968,Hip,CompShg,HdBoard,HdBoard,,0.0,3,3,CBlock,3,3,No,BLQ,250,LwQ,492,210,952,GasA,5,Y,SBrkr,1211,0,0,1211,0,0,1,0,3,1,3,5,Typ,1,TA,Attchd,1968.0,Unf,1,322,3,3,Y,0,63,0,0,0,0,0,,,0,8,2008,WD
876,1672,527425140,20,RL,60.0,9000,Pave,,Reg,Lvl,AllPub,FR2,Gtl,NAmes,Norm,Norm,1Fam,1Story,4,6,1971,1971,Gable,CompShg,HdBoard,HdBoard,,0.0,3,3,PConc,3,3,No,ALQ,616,Unf,0,248,864,GasA,3,Y,SBrkr,864,0,0,864,0,0,1,0,3,1,3,5,Typ,0,,Detchd,1974.0,Unf,2,528,3,3,Y,0,0,0,0,0,0,0,GdWo,,0,5,2007,WD


In [371]:
# Create functions to map bsmtfin_type_1 and bsmtfin_type_2 to ordinal values

# Train function
def train_map_type(column):
    mapping = {'NA': 0, 'Unf': 1, 'LwQ': 2, 'Rec': 3, 'BLQ': 4, 'ALQ': 5, 'GLQ': 6}
    train[column] = train[column].map(mapping)
    return train

# Test function
def test_map_type(column):
    mapping = {'NA': 0, 'Unf': 1, 'LwQ': 2, 'Rec': 3, 'BLQ': 4, 'ALQ': 5, 'GLQ': 6}
    test[column] = test[column].map(mapping)
    return test
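
The two functions above differ only in which global DataFrame they modify. A minimal sketch of a single helper that takes the frame as an argument instead is shown below. The name `map_bsmtfin_type` and the toy frame are hypothetical, and the `fillna('NA')` step assumes that a missing value means "no basement"; the earlier NaN-filling functions may already cover that case.

```python
import pandas as pd

# Same ordinal scale as the mapping used in the notebook
BSMTFIN_MAPPING = {'NA': 0, 'Unf': 1, 'LwQ': 2, 'Rec': 3, 'BLQ': 4, 'ALQ': 5, 'GLQ': 6}

def map_bsmtfin_type(df, column, mapping=BSMTFIN_MAPPING):
    """Return a copy of df with `column` mapped to its ordinal codes."""
    out = df.copy()
    # Assumption: a missing value means "no basement", i.e. 'NA'
    out[column] = out[column].fillna('NA').map(mapping)
    return out

# Toy frame standing in for train/test (illustrative values only)
demo = pd.DataFrame({'bsmtfin_type_1': ['GLQ', 'Unf', None, 'BLQ']})
demo_mapped = map_bsmtfin_type(demo, 'bsmtfin_type_1')
print(demo_mapped['bsmtfin_type_1'].tolist())
```

Because the helper returns a new frame, the same call works for both datasets, e.g. `train = map_bsmtfin_type(train, 'bsmtfin_type_1')`.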

In [372]:
# Use function on training data
train_map_type('bsmtfin_type_1')
train_map_type('bsmtfin_type_2')

# Use function on testing data
test_map_type('bsmtfin_type_1')
test_map_type('bsmtfin_type_2')

Unnamed: 0,id,pid,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod/add,roof_style,roof_matl,exterior_1st,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
0,2658,902301120,190,RM,69.0,9142,Pave,Grvl,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,2fmCon,2Story,6,8,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,3,2,Stone,2,3,No,1,0,1,0,1020,1020,GasA,4,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,2,9,Typ,0,,Detchd,1910.0,Unf,1,440,1,1,Y,0,60,112,0,0,0,0,,,0,4,2006,WD
1,2718,905108090,90,RL,0.0,9662,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Sawyer,Norm,Norm,Duplex,1Story,5,4,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,3,3,CBlock,4,3,No,1,0,1,0,1967,1967,GasA,3,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,3,10,Typ,0,,Attchd,1977.0,Fin,2,580,3,3,Y,170,0,0,0,0,0,0,,,0,8,2006,WD
2,2414,528218130,60,RL,58.0,17104,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,7,5,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,4,3,PConc,4,4,Av,6,554,1,0,100,654,GasA,5,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,4,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,3,3,Y,100,24,0,0,0,0,0,,,0,9,2006,New
3,1989,902207150,30,RM,60.0,8520,Pave,,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,1Fam,1Story,5,6,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,4,3,CBlock,3,3,No,1,0,1,0,968,968,GasA,3,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,3,5,Typ,0,,Detchd,1935.0,Unf,2,480,2,3,N,0,0,184,0,0,0,0,,,0,7,2007,WD
4,625,535105100,20,RL,0.0,9500,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,6,5,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,3,3,CBlock,4,3,No,4,609,1,0,785,1394,GasA,4,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,3,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,3,3,Y,0,76,0,0,185,0,0,,,0,7,2009,WD
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
873,1662,527377110,60,RL,80.0,8000,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NWAmes,PosN,Norm,1Fam,2Story,6,6,1974,1974,Gable,CompShg,HdBoard,HdBoard,,0.0,3,3,CBlock,3,3,No,5,931,2,153,0,1084,GasA,3,Y,SBrkr,1084,793,0,1877,1,0,2,1,4,1,3,8,Typ,1,TA,Attchd,1974.0,Unf,2,488,3,3,Y,0,96,0,0,0,0,0,,,0,11,2007,WD
874,1234,535126140,60,RL,90.0,14670,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,2Story,6,7,1966,1999,Gable,CompShg,VinylSd,VinylSd,BrkFace,410.0,4,4,CBlock,3,3,No,4,575,1,0,529,1104,GasA,5,Y,SBrkr,1104,884,0,1988,0,0,2,1,4,1,4,9,Typ,1,Gd,Attchd,1966.0,RFn,2,480,3,3,Y,0,230,0,0,0,0,0,MnPrv,,0,8,2008,WD
875,1373,904100040,20,RL,55.0,8250,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Sawyer,Feedr,Norm,1Fam,1Story,5,5,1968,1968,Hip,CompShg,HdBoard,HdBoard,,0.0,3,3,CBlock,3,3,No,4,250,2,492,210,952,GasA,5,Y,SBrkr,1211,0,0,1211,0,0,1,0,3,1,3,5,Typ,1,TA,Attchd,1968.0,Unf,1,322,3,3,Y,0,63,0,0,0,0,0,,,0,8,2008,WD
876,1672,527425140,20,RL,60.0,9000,Pave,,Reg,Lvl,AllPub,FR2,Gtl,NAmes,Norm,Norm,1Fam,1Story,4,6,1971,1971,Gable,CompShg,HdBoard,HdBoard,,0.0,3,3,PConc,3,3,No,5,616,1,0,248,864,GasA,3,Y,SBrkr,864,0,0,864,0,0,1,0,3,1,3,5,Typ,0,,Detchd,1974.0,Unf,2,528,3,3,Y,0,0,0,0,0,0,0,GdWo,,0,5,2007,WD


## Manual Mapping

In [373]:
# Map garage_finish to numerical values
# Define mapping
garage_finish_mapping = {'NA': 0, 'Unf': 1, 'RFn': 2, 'Fin': 3}

# Map for train data
train['garage_finish'] = train['garage_finish'].map(garage_finish_mapping)

# Map for test data
test['garage_finish'] = test['garage_finish'].map(garage_finish_mapping)

In [374]:
# Map lot_shape to numerical values
# Define mapping
lot_shape_mapping = {'IR3': 0, 'IR2': 1, 'IR1': 2, 'Reg': 3}

# Map for train data
train['lot_shape'] = train['lot_shape'].map(lot_shape_mapping)

# Map for test data
test['lot_shape'] = test['lot_shape'].map(lot_shape_mapping)

In [375]:
# Map utilities to numerical values
# Define mapping
utilities_mapping = {'ELO': 0, 'NoSeWa': 1, 'NoSewr': 2, 'AllPub': 3}

# Map for train data
train['utilities'] = train['utilities'].map(utilities_mapping)

# Map for test data
test['utilities'] = test['utilities'].map(utilities_mapping)

In [376]:
# Map paved_drive to numerical values
# Define mapping
paved_drive_mapping = {'N': 0, 'P': 1, 'Y': 2}

# Map for train data
train['paved_drive'] = train['paved_drive'].map(paved_drive_mapping)

# Map for test data
test['paved_drive'] = test['paved_drive'].map(paved_drive_mapping)

In [377]:
# Map functional to numerical values
# Define mapping
functional_mapping = {'Sal': 0, 'Sev': 1, 'Maj2': 2, 'Maj1': 3, 'Mod': 4, 'Min2': 5, 'Min1': 6, 'Typ': 7}

# Map for train data
train['functional'] = train['functional'].map(functional_mapping)

# Map for test data
test['functional'] = test['functional'].map(functional_mapping)

In [378]:
train['ms_zoning'].unique()
# Map ms_zoning to data dictionary values
# Define mapping
ms_zoning_mapping = {'RM': 'RM', 'RP': 'RP', 'RL': 'RL', 'RH': 'RH', 'I (all)': 'I', 'FV': 'FV', 'C (all)': 'C', 'A (agr)': 'A'}

# Map for train data
train['ms_zoning'] = train['ms_zoning'].map(ms_zoning_mapping)

# Map for test data
test['ms_zoning'] = test['ms_zoning'].map(ms_zoning_mapping)
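
One thing to watch with `.map()` and a dictionary: any label missing from the dictionary becomes NaN. A hedged alternative is `.replace()`, which only needs the labels that actually change and leaves everything else untouched. The sketch below uses a toy Series, and `'XX'` is a made-up label used only to show the difference.

```python
import pandas as pd

# Only the labels that differ from the data dictionary need an entry
zoning_fixes = {'I (all)': 'I', 'C (all)': 'C', 'A (agr)': 'A'}

# Toy column; 'XX' is a made-up label that is absent from the mapping
zoning = pd.Series(['RL', 'C (all)', 'A (agr)', 'XX'])

mapped = zoning.map(zoning_fixes)        # unlisted labels become NaN
replaced = zoning.replace(zoning_fixes)  # unlisted labels are kept as-is

print(mapped.isna().sum())
print(replaced.tolist())
```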

In [379]:
# Map electrical to numerical values and fix test NaN
# Define mapping
electrical_mapping = {'Mix': 0, 'FuseP': 1, 'FuseF': 2, 'FuseA': 3, 'SBrkr': 4}

# Map for train data
train['electrical'] = train['electrical'].map(electrical_mapping)

# Replace NaN in test data with 'Mix'
test['electrical'] = test['electrical'].fillna('Mix')

# Map for test data
test['electrical'] = test['electrical'].map(electrical_mapping)
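
The manual mappings above all follow the same pattern, so they could also be kept in one dictionary and applied in a loop. The sketch below is one possible way to do that, not the approach used in this notebook: `apply_ordinal_mappings` and the toy frames are hypothetical, and only two of the mappings are repeated here. It raises an error on labels that are missing from a mapping rather than silently turning them into NaN.

```python
import pandas as pd

# Two of the mappings defined above, collected in one place
ordinal_mappings = {
    'paved_drive': {'N': 0, 'P': 1, 'Y': 2},
    'lot_shape': {'IR3': 0, 'IR2': 1, 'IR1': 2, 'Reg': 3},
}

def apply_ordinal_mappings(df, mappings):
    """Return a copy of df with every listed column mapped to ordinal codes."""
    out = df.copy()
    for column, mapping in mappings.items():
        # Labels present in the data but absent from the mapping would become NaN
        unknown = set(out[column].dropna()) - set(mapping)
        if unknown:
            raise ValueError(f"{column}: unmapped labels {sorted(unknown)}")
        out[column] = out[column].map(mapping)
    return out

# Toy frames standing in for train/test (illustrative values only)
demo_train = pd.DataFrame({'paved_drive': ['Y', 'N'], 'lot_shape': ['Reg', 'IR1']})
demo_test = pd.DataFrame({'paved_drive': ['P', 'Y'], 'lot_shape': ['IR2', 'Reg']})

demo_train_mapped = apply_ordinal_mappings(demo_train, ordinal_mappings)
demo_test_mapped = apply_ordinal_mappings(demo_test, ordinal_mappings)
```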

## Find Correlations and Drop Columns

In [380]:
train.corr(numeric_only=True)


Unnamed: 0,id,pid,ms_subclass,lot_frontage,lot_area,lot_shape,utilities,overall_qual,overall_cond,year_built,year_remod/add,mas_vnr_area,exter_qual,exter_cond,bsmt_qual,bsmt_cond,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating_qc,electrical,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,misc_val,mo_sold,yr_sold,saleprice
id,1.0,0.175793,0.026543,-0.013538,0.032872,-0.011705,0.018313,-0.061483,-0.026096,-0.064444,-0.09004,-0.035808,-0.071633,-0.017571,-0.04504,0.008009,-0.067089,-0.045794,-0.008976,0.000745,0.008649,-0.038115,-0.029102,-0.054595,-0.02265,-0.008388,0.011719,-0.023881,-0.033323,0.014396,-0.059086,-0.042054,0.010605,-0.0024,-0.065466,-0.009092,-0.020809,-0.03756,-0.05073,-0.066017,-0.048666,-0.046105,-0.04645,-0.045039,-0.062328,-0.009045,0.001382,0.033747,-0.022791,0.009758,0.055696,0.05072,-0.012683,0.127723,-0.975747,-0.051398
pid,0.175793,1.0,-0.003632,-0.038402,0.024135,0.09396,-0.031622,-0.265863,0.106861,-0.347039,-0.176666,-0.242482,-0.22626,0.039832,-0.198674,-0.1092,-0.11221,-0.086951,-0.020547,-0.013283,-0.111835,-0.204623,-0.103471,-0.14757,-0.145862,-0.005598,0.072268,-0.112936,-0.031341,-0.002195,-0.17937,-0.163975,0.009122,0.068416,-0.194674,-0.084999,-0.101363,-0.12071,-0.251257,-0.274924,-0.228368,-0.201717,-0.174859,-0.169897,-0.231093,-0.041221,-0.081129,0.150179,-0.024679,-0.04203,0.005825,0.021364,0.004223,-0.032735,0.008476,-0.255052
ms_subclass,0.026543,-0.003632,1.0,-0.216259,-0.245484,0.075306,0.023312,0.035763,-0.070141,0.035983,0.044836,-0.002763,0.017275,-0.057094,0.055981,-0.019888,0.06071,-0.060425,-0.034799,-0.068129,-0.139292,-0.2247,-0.024195,0.005294,-0.246212,0.305771,0.031091,0.06821,0.025727,-0.01703,0.142087,0.179404,-0.003516,0.252111,-0.019533,0.03449,-5.3e-05,-0.055118,0.084954,-0.033059,-0.049148,-0.108423,-0.097814,-0.114905,-0.019489,0.001622,-0.020289,-0.039842,-0.030088,-0.038819,-0.004585,-0.009025,-0.027485,0.013027,-0.03287,-0.087335
lot_frontage,-0.013538,-0.038402,-0.216259,1.0,0.135586,0.174018,0.019333,0.114469,-0.032452,0.020571,0.058942,0.101672,0.100055,-0.038746,0.04279,-0.001906,-0.047541,0.056742,-7.3e-05,0.001322,0.147324,0.204375,0.062938,-0.006732,0.230839,-0.014954,0.011487,0.17132,0.00985,-0.018886,0.055042,-0.031017,0.09745,0.015079,0.106405,0.185083,-0.004607,0.038556,0.026831,0.06798,0.118143,0.17379,0.000419,-0.001781,-0.036746,0.011388,0.095295,0.009207,0.008076,0.036598,0.092288,0.102806,0.026066,-0.006382,0.007713,0.181456
lot_area,0.032872,0.024135,-0.245484,0.135586,1.0,-0.301763,-0.029802,0.105824,-0.019185,0.036002,0.050771,0.16752,0.08922,0.019617,0.060759,-0.0109,0.04253,0.215648,0.020662,0.041799,0.041544,0.277478,0.0221,0.043049,0.381593,0.029398,0.001273,0.327427,0.113283,0.029157,0.125601,0.049995,0.13906,-0.013484,0.119339,0.238724,-0.08636,0.289467,0.004882,0.124476,0.214954,0.263145,0.101098,0.091729,-0.001757,0.155623,0.140864,0.014139,0.019553,0.067714,0.115102,0.1292,0.093922,0.003197,-0.029454,0.296566
lot_shape,-0.011705,0.09396,0.075306,0.174018,-0.301763,1.0,0.013735,-0.249357,0.077207,-0.277834,-0.221336,-0.127835,-0.237697,0.004938,-0.231861,-0.083708,-0.154318,-0.18227,-0.028509,-0.034029,-0.011454,-0.210649,-0.160437,-0.117636,-0.221779,-0.070199,0.015246,-0.233844,-0.087207,-0.046049,-0.198203,-0.136468,-0.023694,0.092632,-0.202014,-0.133364,-0.024128,-0.199743,-0.239132,-0.275456,-0.228951,-0.216134,-0.143612,-0.131344,-0.151075,-0.149512,-0.084296,0.090935,-0.028078,-0.069171,-0.03959,-0.052713,-0.049819,-0.005869,0.03483,-0.294542
utilities,0.018313,-0.031622,0.023312,0.019333,-0.029802,0.013735,1.0,0.030044,0.006142,0.029184,0.040666,-0.044561,0.037284,0.033331,0.04899,0.048162,-0.006072,0.02366,-0.02265,-0.033796,0.022079,0.033701,0.015122,0.065217,-0.000113,0.022935,0.0032,0.019745,0.024229,-0.071085,0.031156,0.021953,0.018296,0.006063,-0.006652,0.008282,0.007649,-0.003516,0.018036,0.00186,0.004263,0.00413,-0.008096,-0.007852,0.027739,0.012492,-0.01737,0.011181,0.003045,-0.071728,0.001881,0.001896,0.002666,0.049178,-0.027663,0.026404
overall_qual,-0.061483,-0.265863,0.035763,0.114469,0.105824,-0.249357,0.030044,1.0,-0.08277,0.602964,0.584654,0.430041,0.740257,0.020425,0.654071,0.299712,0.291901,0.279223,-0.00221,-0.027973,0.276437,0.549407,0.475555,0.271378,0.477136,0.228152,-0.052338,0.566701,0.175896,-0.047006,0.51508,0.274859,0.053373,-0.170964,0.690639,0.382025,0.181719,0.38892,0.574553,0.554583,0.587423,0.563904,0.299361,0.285909,0.323754,0.257081,0.308855,-0.154554,0.031938,0.048752,0.006558,0.019568,0.022099,0.019242,-0.011578,0.800207
overall_cond,-0.026096,0.106861,-0.070141,-0.032452,-0.019185,0.077207,0.006142,-0.08277,1.0,-0.370988,0.042614,-0.131908,-0.156454,0.412351,-0.168001,0.117757,-0.008308,-0.046348,0.102557,0.047605,-0.131225,-0.159856,0.002426,0.091198,-0.150938,0.010912,0.004753,-0.109804,-0.040107,0.099918,-0.219189,-0.093266,-0.009908,-0.095725,-0.048743,-0.093576,0.129261,-0.006463,-0.331765,-0.167377,-0.168513,-0.138174,0.036198,0.043593,-0.013463,0.011034,-0.052266,0.10832,0.026907,0.047359,-0.005806,-0.008272,0.014269,-0.003144,0.047664,-0.097019
year_built,-0.064444,-0.347039,0.035983,0.020571,0.036002,-0.277834,0.029184,0.602964,-0.370988,1.0,0.629116,0.32078,0.616441,-0.084759,0.622317,0.194274,0.37576,0.275728,-0.008411,-0.020906,0.137114,0.410605,0.463758,0.333892,0.323315,0.022313,-0.159403,0.258838,0.21559,-0.031299,0.480169,0.283207,-0.042149,-0.127162,0.53723,0.137783,0.158502,0.168848,0.825316,0.598993,0.542544,0.488023,0.300659,0.284707,0.446905,0.216339,0.207798,-0.380082,0.016104,-0.037866,0.003728,0.008823,0.000626,-0.007083,-0.003559,0.571849


In [381]:
# Find correlations to reduce multicollinearity
garage_corr = train[['garage_area', 'garage_cars']].corr()
print(garage_corr)
area_corr = train[['total_bsmt_sf', '1st_flr_sf', 'gr_liv_area']].corr()
print(area_corr)

             garage_area  garage_cars
garage_area      1.00000      0.89318
garage_cars      0.89318      1.00000
               total_bsmt_sf  1st_flr_sf  gr_liv_area
total_bsmt_sf       1.000000    0.808351     0.454245
1st_flr_sf          0.808351    1.000000     0.562441
gr_liv_area         0.454245    0.562441     1.000000
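
Rather than checking hand-picked column groups, strongly correlated pairs can also be listed programmatically. A minimal sketch follows; `high_corr_pairs` is a hypothetical helper, the 0.8 threshold is an assumption rather than a value taken from this notebook, and the toy frame only illustrates the output shape.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(df, threshold=0.8):
    """Return column pairs whose absolute correlation exceeds `threshold`."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the upper triangle so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)

# Toy frame: 'b' is roughly 2 * 'a', while 'c' is only weakly related to both
demo = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [2.0, 4.1, 6.0, 8.2, 10.0],
    'c': [5.0, 1.0, 4.0, 2.0, 3.0],
})
pairs = high_corr_pairs(demo)
print(pairs)
```

On the real data this would be called as `high_corr_pairs(train)`.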


In [382]:
# Dropping garage_cars to reduce multicollinearity

# Drop for train data
train.drop(columns=['garage_cars'], inplace=True)

# Drop for test data
test.drop(columns=['garage_cars'], inplace=True)

In [383]:
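# According to the data dictionary, year_remod/add is the same as the construction
# date if there was no remodeling; garage_yr_blt likewise closely tracks year_built
# (correlation 0.83 in the matrix above), so dropping garage_yr_blt to avoid redundancy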

# Drop for train data
train.drop(columns=['garage_yr_blt'], inplace=True)

# Drop for test data
test.drop(columns=['garage_yr_blt'], inplace=True)

In [384]:
# Dropping pid because id is used to identify each house when building the model and creating the submission

# Drop for train data
train.drop(columns=['pid'], inplace=True)

# Drop for test data
test.drop(columns=['pid'], inplace=True)

In [385]:
train.head()

Unnamed: 0,id,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod/add,roof_style,roof_matl,exterior_1st,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_finish,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
0,109,60,RL,0.0,13517,Pave,,2,Lvl,3,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,4,3,CBlock,3,3,No,6,533.0,1,0.0,192.0,725.0,GasA,5,Y,4,725,754,0,1479,0.0,0.0,2,1,3,1,4,6,7,0,,Attchd,2,475.0,3,3,2,0,44,0,0,0,0,0,,,0,3,2010,WD,130500
1,544,60,RL,43.0,11492,Pave,,2,Lvl,3,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,4,3,PConc,4,3,No,6,637.0,1,0.0,276.0,913.0,GasA,5,Y,4,913,1209,0,2122,1.0,0.0,2,1,4,1,4,8,7,1,TA,Attchd,2,559.0,3,3,2,0,74,0,0,0,0,0,,,0,4,2009,WD,220000
2,153,20,RL,68.0,7922,Pave,,3,Lvl,3,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,3,4,CBlock,3,3,No,6,731.0,1,0.0,326.0,1057.0,GasA,3,Y,4,1057,0,0,1057,1.0,0.0,1,0,3,1,4,5,7,0,,Detchd,1,246.0,3,3,2,0,52,0,0,0,0,0,,,0,1,2010,WD,109000
3,318,60,RL,73.0,9802,Pave,,3,Lvl,3,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,3,3,PConc,4,3,No,1,0.0,1,0.0,384.0,384.0,GasA,4,Y,4,744,700,0,1444,0.0,0.0,2,1,3,1,3,7,7,0,,BuiltIn,3,400.0,3,3,2,100,0,0,0,0,0,0,,,0,4,2010,WD,174000
4,255,50,RL,82.0,14235,Pave,,2,Lvl,3,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,3,3,PConc,2,4,No,1,0.0,1,0.0,676.0,676.0,GasA,3,Y,4,831,614,0,1445,0.0,0.0,2,0,3,1,3,6,7,0,,Detchd,1,484.0,3,3,0,0,59,0,0,0,0,0,,,0,3,2010,WD,138500


# Save Clean Data as CSV Files

In [386]:
# Save Train Data
train.to_csv('./clean_datasets/finish_clean_training.csv', index=False)

# Save Test Data
test.to_csv('./clean_datasets/finish_clean_testing.csv', index=False)

# Load Clean CSV File

In [387]:
# Load Clean Training Data
train_cleaned = pd.read_csv('./clean_datasets/finish_clean_training.csv')

# Load Clean Testing Data
test_cleaned = pd.read_csv('./clean_datasets/finish_clean_testing.csv')

# Dummify Columns for Train and Test

## Get Dummies on All Nominal Columns

In [388]:
# Train Get Dummies
train_cleaned = pd.get_dummies(columns=['ms_subclass',
                                        'ms_zoning',
                                        'street',
                                        'alley',
                                        'land_contour',
                                        'lot_config',
                                        'neighborhood',
                                        'condition_1',
                                        'condition_2',
                                        'bldg_type',
                                        'house_style',
                                        'roof_style',
                                        'roof_matl',
                                        'exterior_1st',
                                        'exterior_2nd',
                                        'mas_vnr_type',
                                        'foundation',
                                        'heating',
                                        'central_air',
                                        'garage_type',
                                        'misc_feature',
                                        'sale_type'], drop_first=True, data=train_cleaned)

In [389]:
# Test Get Dummies
test_cleaned = pd.get_dummies(columns=['ms_subclass',
                                       'ms_zoning',
                                       'street',
                                       'alley',
                                       'land_contour',
                                       'lot_config',
                                       'neighborhood',
                                       'condition_1',
                                       'condition_2',
                                       'bldg_type',
                                       'house_style',
                                       'roof_style',
                                       'roof_matl',
                                       'exterior_1st',
                                       'exterior_2nd',
                                       'mas_vnr_type',
                                       'foundation',
                                       'heating',
                                       'central_air',
                                       'garage_type',
                                       'misc_feature',
                                       'sale_type'], drop_first=True, data=test_cleaned)
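
Since train and test are dummified separately, the two frames can end up with different dummy columns whenever a category appears in only one of them, and `drop_first=True` can drop a different baseline level in each. One possible way to guard against this, sketched on toy data, is to encode test without `drop_first` and then reindex it to the training feature columns. The frame and variable names below are hypothetical, and treating an unseen test category as the baseline is an assumption.

```python
import pandas as pd

# Toy frames: 'Timber' appears only in train, 'Gilbert' only in test
demo_train = pd.DataFrame({'neighborhood': ['NAmes', 'Sawyer', 'Timber'],
                           'saleprice': [1, 2, 3]})
demo_test = pd.DataFrame({'neighborhood': ['Sawyer', 'Gilbert', 'Sawyer']})

# Train: drop the first level as in the cells above
train_d = pd.get_dummies(demo_train, columns=['neighborhood'],
                         drop_first=True, dtype=int)

# Test: keep every level, then let the training columns decide what stays
test_d = pd.get_dummies(demo_test, columns=['neighborhood'], dtype=int)

feature_cols = [c for c in train_d.columns if c != 'saleprice']
# Columns missing from test are added as 0; columns unseen in train are dropped
test_aligned = test_d.reindex(columns=feature_cols, fill_value=0)

print(test_aligned)
```

After this step `test_aligned` has exactly the training feature columns, in the same order.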

# Save Data With Dummies as CSV File

In [392]:
# Save Train Data
train_cleaned.to_csv('./clean_datasets/finish_clean_training_dummified.csv', index=False)

# Save Test Data
test_cleaned.to_csv('./clean_datasets/finish_clean_testing_dummified.csv', index=False)