Welcome to the linear regression model for CS4650 Big Data, Analysis, and Cloud Computing.

This model uses the dataset from the Kaggle competition:

`House Prices - Advanced Regression Techniques`

The dataset includes these features and their descriptions:

---

    1. MSSubClass: Identifies the type of dwelling involved in the sale. (categorical) 
    
    2. MSZoning: Identifies the general zoning classification of the sale. (categorical)
    
    3. LotFrontage: Linear feet of street connected to property (numeric)
    
    4. LotArea: Lot size in square feet (numeric)
    
    5. Street: Type of road access to property (categorical)
    
    6. Alley: Type of alley access to property (categorical)
    
    7. LotShape: General shape of property (categorical)
    
    8. LandContour: Flatness of the property (categorical)
    
    9. Utilities: Type of utilities available (categorical)
    
    10. LotConfig: Lot configuration (categorical)
    
    11. LandSlope: Slope of property (categorical)
    
    12. Neighborhood: Physical locations within Ames city limits (categorical)
    
    13. Condition1: Proximity to various conditions (categorical)
    
    14. Condition2: Proximity to various conditions (if more than one is present) (categorical)
    
    15. BldgType: Type of dwelling (categorical)
    
    16. HouseStyle: Style of dwelling (categorical)
    
    17. OverallQual: Rates the overall material and finish of the house (categorical)
    
    18. OverallCond: Rates the overall condition of the house (categorical)
    
    19. YearBuilt: Original construction date (numeric)
    
    20. YearRemodAdd: Remodel date (same as construction date if no remodeling or additions) (numeric)
    
    21. RoofStyle: Type of roof (categorical)
    
    22. RoofMatl: Roof material (categorical)
    
    23. Exterior1st: Exterior covering on house (categorical)
    
    24. Exterior2nd: Exterior covering on house (if more than one material) (categorical)
    
    25. MasVnrType: Masonry veneer type (categorical)
    
    26. MasVnrArea: Masonry veneer area in square feet (numeric)
    
    27. ExterQual: Evaluates the quality of the material on the exterior (categorical)
    
    28. ExterCond: Evaluates the present condition of the material on the exterior (categorical)
    
    29. Foundation: Type of foundation (categorical)
    
    30. BsmtQual: Evaluates the height of the basement (categorical)
    
    31. BsmtCond: Evaluates the general condition of the basement (categorical)
    
    32. BsmtExposure: Refers to walkout or garden level walls (categorical)
    
    33. BsmtFinType1: Rating of basement finished area (categorical)
    
    34. BsmtFinSF1: Type 1 finished square feet (numeric)

    35. BsmtFinType2: Rating of basement finished area (if multiple types) (categorical)
    
    36. BsmtFinSF2: Type 2 finished square feet (numeric)
    
    37. BsmtUnfSF: Unfinished square feet of basement area (numeric)
    
    38. TotalBsmtSF: Total square feet of basement area (numeric)

    39. Heating: Type of heating (categorical)
    
    40. HeatingQC: Heating quality and condition (categorical)
    
    41. CentralAir: Central air conditioning (categorical)
    
    42. Electrical: Electrical system (categorical)
    
    43. 1stFlrSF: First Floor square feet (numeric)
    
    44. 2ndFlrSF: Second floor square feet (numeric)
    
    45. LowQualFinSF: Low quality finished square feet (all floors) (numeric)
    
    46. GrLivArea: Above grade (ground) living area square feet (numeric)
    
    47. BsmtFullBath: Basement full bathrooms (numeric)
    
    48. BsmtHalfBath: Basement half bathrooms (numeric)
    
    49. FullBath: Full bathrooms above grade (numeric)
    
    50. HalfBath: Half baths above grade (numeric)
    
    51. BedroomAbvGr: Bedrooms above grade (does NOT include basement bedrooms) (numeric)
    
    52. KitchenAbvGr: Kitchens above grade (numeric)
    
    53. KitchenQual: Kitchen quality (categorical)
    
    54. TotRmsAbvGrd: Total rooms above grade (does not include bathrooms) (numeric)
    
    55. Functional: Home functionality (assume typical unless deductions are warranted) (categorical)
    
    56. Fireplaces: Number of fireplaces (numeric)
    
    57. FireplaceQu: Fireplace quality (categorical)
    
    58. GarageType: Garage location (categorical)
    
    59. GarageYrBlt: Year garage was built (numeric)
    
    60. GarageFinish: Interior finish of the garage (categorical)
    
    61. GarageCars: Size of garage in car capacity (numeric)
    
    62. GarageArea: Size of garage in square feet (numeric)
    
    63. GarageQual: Garage quality (categorical)
    
    64. GarageCond: Garage condition (categorical)
    
    65. PavedDrive: Paved driveway (categorical)
    
    66. WoodDeckSF: Wood deck area in square feet (numeric)
    
    67. OpenPorchSF: Open porch area in square feet (numeric)
    
    68. EnclosedPorch: Enclosed porch area in square feet (numeric)
    
    69. 3SsnPorch: Three season porch area in square feet (numeric)
    
    70. ScreenPorch: Screen porch area in square feet (numeric)
    
    71. PoolArea: Pool area in square feet (numeric)
    
    72. PoolQC: Pool quality (categorical)
    
    73. Fence: Fence quality (categorical)
    
    74. MiscFeature: Miscellaneous feature not covered in other categories (categorical)
    
    75. MiscVal: Dollar value of miscellaneous feature (numeric)
    
    76. MoSold: Month Sold (MM) (numeric)
    
    77. YrSold: Year Sold (YYYY) (numeric)
    
    78. SaleType: Type of sale (categorical)
    
    79. SaleCondition: Condition of sale (categorical)


---    
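The categorical/numeric labels in the list above can be checked against how pandas actually loads each column. The following is a minimal sketch, assuming a small hand-made frame stands in for `train.csv`; the values and the names `toy`, `text_cols`, and `numeric_cols` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for train.csv; the values are made up for illustration.
toy = pd.DataFrame({
    "MSSubClass": [60, 20, 70],         # coded dwelling type, stored as integers
    "MSZoning": ["RL", "RM", "RL"],     # text category
    "LotArea": [8450, 9600, 11250],     # numeric
    "Street": ["Pave", "Pave", "Grvl"], # text category
})

# Split the columns by how pandas stores them, not by what they mean.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
text_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("text:", text_cols)
```

Coded features such as `MSSubClass` are stored as integers, so a split based on dtype treats them as numeric unless they are cast explicitly.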







In [72]:
import pandas as pd 
import numpy as np
pd.set_option('display.max_columns', None)
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split 
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [73]:
df_training = pd.read_csv("./train.csv")
df_testing = pd.read_csv("./test.csv")

In [74]:
print(df_training.columns.values)

['Id' 'MSSubClass' 'MSZoning' 'LotFrontage' 'LotArea' 'Street' 'Alley'
 'LotShape' 'LandContour' 'Utilities' 'LotConfig' 'LandSlope'
 'Neighborhood' 'Condition1' 'Condition2' 'BldgType' 'HouseStyle'
 'OverallQual' 'OverallCond' 'YearBuilt' 'YearRemodAdd' 'RoofStyle'
 'RoofMatl' 'Exterior1st' 'Exterior2nd' 'MasVnrType' 'MasVnrArea'
 'ExterQual' 'ExterCond' 'Foundation' 'BsmtQual' 'BsmtCond' 'BsmtExposure'
 'BsmtFinType1' 'BsmtFinSF1' 'BsmtFinType2' 'BsmtFinSF2' 'BsmtUnfSF'
 'TotalBsmtSF' 'Heating' 'HeatingQC' 'CentralAir' 'Electrical' '1stFlrSF'
 '2ndFlrSF' 'LowQualFinSF' 'GrLivArea' 'BsmtFullBath' 'BsmtHalfBath'
 'FullBath' 'HalfBath' 'BedroomAbvGr' 'KitchenAbvGr' 'KitchenQual'
 'TotRmsAbvGrd' 'Functional' 'Fireplaces' 'FireplaceQu' 'GarageType'
 'GarageYrBlt' 'GarageFinish' 'GarageCars' 'GarageArea' 'GarageQual'
 'GarageCond' 'PavedDrive' 'WoodDeckSF' 'OpenPorchSF' 'EnclosedPorch'
 '3SsnPorch' 'ScreenPorch' 'PoolArea' 'PoolQC' 'Fence' 'MiscFeature'
 'MiscVal' 'MoSold' 'YrSold' 'SaleTy

In [75]:
print(df_testing.columns.values)

['Id' 'MSSubClass' 'MSZoning' 'LotFrontage' 'LotArea' 'Street' 'Alley'
 'LotShape' 'LandContour' 'Utilities' 'LotConfig' 'LandSlope'
 'Neighborhood' 'Condition1' 'Condition2' 'BldgType' 'HouseStyle'
 'OverallQual' 'OverallCond' 'YearBuilt' 'YearRemodAdd' 'RoofStyle'
 'RoofMatl' 'Exterior1st' 'Exterior2nd' 'MasVnrType' 'MasVnrArea'
 'ExterQual' 'ExterCond' 'Foundation' 'BsmtQual' 'BsmtCond' 'BsmtExposure'
 'BsmtFinType1' 'BsmtFinSF1' 'BsmtFinType2' 'BsmtFinSF2' 'BsmtUnfSF'
 'TotalBsmtSF' 'Heating' 'HeatingQC' 'CentralAir' 'Electrical' '1stFlrSF'
 '2ndFlrSF' 'LowQualFinSF' 'GrLivArea' 'BsmtFullBath' 'BsmtHalfBath'
 'FullBath' 'HalfBath' 'BedroomAbvGr' 'KitchenAbvGr' 'KitchenQual'
 'TotRmsAbvGrd' 'Functional' 'Fireplaces' 'FireplaceQu' 'GarageType'
 'GarageYrBlt' 'GarageFinish' 'GarageCars' 'GarageArea' 'GarageQual'
 'GarageCond' 'PavedDrive' 'WoodDeckSF' 'OpenPorchSF' 'EnclosedPorch'
 '3SsnPorch' 'ScreenPorch' 'PoolArea' 'PoolQC' 'Fence' 'MiscFeature'
 'MiscVal' 'MoSold' 'YrSold' 'SaleTy

Creating the labels for the dataset

In [76]:
training_labels = pd.DataFrame(df_training['SalePrice'])
print(training_labels)

      SalePrice
0        208500
1        181500
2        223500
3        140000
4        250000
...         ...
1455     175000
1456     210000
1457     266500
1458     142125
1459     147500

[1460 rows x 1 columns]

Creating the Training Dataset

In [77]:
df_training.drop(columns = ['SalePrice'], inplace=True)
object_columns_training = df_training.loc[:, df_training.dtypes == object]
df_converted_training = pd.get_dummies(df_training, columns= object_columns_training.columns)
print(df_converted_training.columns.values)

['Id' 'MSSubClass' 'LotFrontage' 'LotArea' 'OverallQual' 'OverallCond'
 'YearBuilt' 'YearRemodAdd' 'MasVnrArea' 'BsmtFinSF1' 'BsmtFinSF2'
 'BsmtUnfSF' 'TotalBsmtSF' '1stFlrSF' '2ndFlrSF' 'LowQualFinSF'
 'GrLivArea' 'BsmtFullBath' 'BsmtHalfBath' 'FullBath' 'HalfBath'
 'BedroomAbvGr' 'KitchenAbvGr' 'TotRmsAbvGrd' 'Fireplaces' 'GarageYrBlt'
 'GarageCars' 'GarageArea' 'WoodDeckSF' 'OpenPorchSF' 'EnclosedPorch'
 '3SsnPorch' 'ScreenPorch' 'PoolArea' 'MiscVal' 'MoSold' 'YrSold'
 'MSZoning_C (all)' 'MSZoning_FV' 'MSZoning_RH' 'MSZoning_RL'
 'MSZoning_RM' 'Street_Grvl' 'Street_Pave' 'Alley_Grvl' 'Alley_Pave'
 'LotShape_IR1' 'LotShape_IR2' 'LotShape_IR3' 'LotShape_Reg'
 'LandContour_Bnk' 'LandContour_HLS' 'LandContour_Low' 'LandContour_Lvl'
 'Utilities_AllPub' 'Utilities_NoSeWa' 'LotConfig_Corner'
 'LotConfig_CulDSac' 'LotConfig_FR2' 'LotConfig_FR3' 'LotConfig_Inside'
 'LandSlope_Gtl' 'LandSlope_Mod' 'LandSlope_Sev' 'Neighborhood_Blmngtn'
 'Neighborhood_Blueste' 'Neighborhood_BrDale' 'Neighborho
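To make the effect of `pd.get_dummies` easier to see, here is a minimal sketch on a hand-made frame. The frame `toy` and its values are assumptions for illustration; only the column names echo the dataset.

```python
import pandas as pd

# Toy frame; values are made up. Alley has a missing entry on purpose.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "Street": ["Pave", "Pave", "Grvl"],
    "Alley": ["Grvl", None, "Pave"],
})

# Each distinct value of an encoded column becomes its own indicator column.
encoded = pd.get_dummies(toy, columns=["Street", "Alley"])

print(encoded.columns.tolist())
print(encoded)
```

With the default settings, a missing value gets no indicator column of its own, so the row with a missing `Alley` has both `Alley_Grvl` and `Alley_Pave` switched off.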

Creating the Test Dataset

In [78]:
object_columns_testing = df_testing.loc[:, df_testing.dtypes == object]
df_converted_testing = pd.get_dummies(df_testing, columns= object_columns_testing.columns)
df_converted_testing.columns

Index(['Id', 'MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual',
       'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1',
       ...
       'SaleType_ConLw', 'SaleType_New', 'SaleType_Oth', 'SaleType_WD',
       'SaleCondition_Abnorml', 'SaleCondition_AdjLand',
       'SaleCondition_Alloca', 'SaleCondition_Family', 'SaleCondition_Normal',
       'SaleCondition_Partial'],
      dtype='object', length=271)

In [79]:
common_cols = [col for col in df_converted_training.columns if col in df_converted_testing.columns]
df_converted_training = df_converted_training[common_cols]
df_converted_training.columns

Index(['Id', 'MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual',
       'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1',
       ...
       'SaleType_ConLw', 'SaleType_New', 'SaleType_Oth', 'SaleType_WD',
       'SaleCondition_Abnorml', 'SaleCondition_AdjLand',
       'SaleCondition_Alloca', 'SaleCondition_Family', 'SaleCondition_Normal',
       'SaleCondition_Partial'],
      dtype='object', length=271)
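The list comprehension above keeps only the training columns that also exist in the test set (for example, `Utilities_NoSeWa` appears only in the training dummies). One possible alternative, sketched here on made-up frames named `train` and `test`, is `DataFrame.align`, which trims both frames to their shared columns in one call.

```python
import pandas as pd

# Toy frames; values are made up. Utilities_NoSeWa exists only in train.
train = pd.DataFrame({
    "LotArea": [8450, 9600],
    "Utilities_AllPub": [1, 0],
    "Utilities_NoSeWa": [0, 1],
})
test = pd.DataFrame({
    "LotArea": [11622, 14267],
    "Utilities_AllPub": [1, 1],
})

# Keep only the columns present in both frames; both results share one column index.
train_aligned, test_aligned = train.align(test, join="inner", axis=1)

print(train_aligned.columns.tolist())
print(test_aligned.columns.tolist())
```

Because both returned frames share the same column index, their features line up position by position, which is what a fitted model expects at prediction time.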

In [80]:
print(df_converted_testing.columns.values)

['Id' 'MSSubClass' 'LotFrontage' 'LotArea' 'OverallQual' 'OverallCond'
 'YearBuilt' 'YearRemodAdd' 'MasVnrArea' 'BsmtFinSF1' 'BsmtFinSF2'
 'BsmtUnfSF' 'TotalBsmtSF' '1stFlrSF' '2ndFlrSF' 'LowQualFinSF'
 'GrLivArea' 'BsmtFullBath' 'BsmtHalfBath' 'FullBath' 'HalfBath'
 'BedroomAbvGr' 'KitchenAbvGr' 'TotRmsAbvGrd' 'Fireplaces' 'GarageYrBlt'
 'GarageCars' 'GarageArea' 'WoodDeckSF' 'OpenPorchSF' 'EnclosedPorch'
 '3SsnPorch' 'ScreenPorch' 'PoolArea' 'MiscVal' 'MoSold' 'YrSold'
 'MSZoning_C (all)' 'MSZoning_FV' 'MSZoning_RH' 'MSZoning_RL'
 'MSZoning_RM' 'Street_Grvl' 'Street_Pave' 'Alley_Grvl' 'Alley_Pave'
 'LotShape_IR1' 'LotShape_IR2' 'LotShape_IR3' 'LotShape_Reg'
 'LandContour_Bnk' 'LandContour_HLS' 'LandContour_Low' 'LandContour_Lvl'
 'Utilities_AllPub' 'LotConfig_Corner' 'LotConfig_CulDSac' 'LotConfig_FR2'
 'LotConfig_FR3' 'LotConfig_Inside' 'LandSlope_Gtl' 'LandSlope_Mod'
 'LandSlope_Sev' 'Neighborhood_Blmngtn' 'Neighborhood_Blueste'
 'Neighborhood_BrDale' 'Neighborhood_BrkSide' 'Neighb

The following cells convert the True/False dummy values to 1 and 0.

In [81]:
# Multiplying by 1 turns boolean dummy columns into 0/1 integers
df_converted_training = df_converted_training*1
df_converted_training.columns

Index(['Id', 'MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual',
       'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1',
       ...
       'SaleType_ConLw', 'SaleType_New', 'SaleType_Oth', 'SaleType_WD',
       'SaleCondition_Abnorml', 'SaleCondition_AdjLand',
       'SaleCondition_Alloca', 'SaleCondition_Family', 'SaleCondition_Normal',
       'SaleCondition_Partial'],
      dtype='object', length=271)

In [82]:
df_converted_testing = df_converted_testing*1
df_converted_testing.sample()

[Output: one randomly sampled row of df_converted_testing, 271 columns wide. Row index 745: Id = 2206, MSSubClass = 20, LotFrontage = 79.0, LotArea = 7801, ..., with the one-hot columns now shown as 0/1 integers (for example MSZoning_RL = 1, Street_Pave = 1).]
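Multiplying a DataFrame by 1, as the cells above do, is a compact way to turn boolean dummy columns into integers. A more explicit variant is to cast only the boolean columns with `astype`. The sketch below compares the two on a made-up frame; `toy`, `via_multiply`, and `via_astype` are illustrative names.

```python
import pandas as pd

# Toy frame with one numeric column and one boolean dummy column (values made up).
toy = pd.DataFrame({
    "LotArea": [8450, 9600],
    "Street_Pave": [True, False],
})

# Approach used in the notebook: arithmetic promotes booleans to integers.
via_multiply = toy * 1

# Explicit alternative: cast only the boolean columns.
bool_cols = toy.select_dtypes(include="bool").columns
via_astype = toy.astype({col: int for col in bool_cols})

print(via_multiply.dtypes)
print(via_astype.dtypes)
```

Both versions leave the numeric columns untouched; the explicit cast simply makes the intent visible in the code.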


The labels were already removed from the training samples when the `SalePrice` column was dropped above.

Creating Scaled Training Dataset Using Standard Scaler

In [83]:
scaler_Standard = StandardScaler()

# Scale every feature except the row identifier
columns_to_scaler_Standard_training = df_converted_training.drop(columns=['Id'])
scaled_data_Standard_training = scaler_Standard.fit_transform(columns_to_scaler_Standard_training)

# Rebuild a DataFrame with the original column names, then replace missing values
# with 0.0 (the column mean after standardization)
scaled_df_Standard_training = pd.DataFrame(scaled_data_Standard_training, columns=columns_to_scaler_Standard_training.columns)
scaled_df_Standard_training.fillna(0.0, inplace=True)

Creating Scaled Testing Dataset Using Standard Scaler

In [84]:
# Scale every feature except the row identifier.
# Note: fit_transform re-fits the scaler, so the test features are standardized with
# test-set statistics rather than the statistics learned from the training set.
columns_to_scaler_Standard_testing = df_converted_testing.drop(columns=['Id'])
scaled_data_Standard_testing = scaler_Standard.fit_transform(columns_to_scaler_Standard_testing)

# Rebuild a DataFrame with the original column names (missing values are left as NaN here)
scaled_df_Standard_testing = pd.DataFrame(scaled_data_Standard_testing, columns=columns_to_scaler_Standard_testing.columns)
scaled_df_Standard_testing.sample()

[Output: one randomly sampled row of scaled_df_Standard_testing (all columns except Id). Row index 685: MSSubClass = -0.874711, LotFrontage = 0.868198, LotArea = 0.879289, OverallQual = -0.054877, ...; every column, including the one-hot indicators, is now a standardized value.]
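The cell above calls `fit_transform` on the test features, which standardizes them with the test set's own mean and standard deviation. A commonly used alternative is to fit the scaler once on the training features and reuse it on the test features. The sketch below illustrates the difference on made-up arrays; `X_train`, `X_test`, and the other names are assumptions and do not come from the notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature data for illustration.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0], [4.0]])

# Fit on the training data only, then reuse the learned mean and scale.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_reused = scaler.transform(X_test)

# For comparison: re-fitting on the test data uses the test set's own statistics.
X_test_refit = StandardScaler().fit_transform(X_test)

print("training mean:", scaler.mean_)
print("test, training statistics:", X_test_reused.ravel())
print("test, re-fitted statistics:", X_test_refit.ravel())
```

With the reused scaler, a test value equal to the training mean maps to 0; with the re-fitted scaler, the same value maps to -1 because the test set has a different mean. Whether that difference matters depends on how the scaled test features are used later.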


Next, we create a MinMax scaler so that we can compare the two scaling methods (Standard vs. MinMax) on both datasets.

In [85]:
scaler_MinMax = preprocessing.MinMaxScaler()

# Scale every feature except the row identifier to the range [0, 1]
columns_to_scaler_MinMax_training = df_converted_training.drop(columns=['Id'])
scaled_data_MinMax_training = scaler_MinMax.fit_transform(columns_to_scaler_MinMax_training)

# Rebuild a DataFrame with the original column names (missing values are left as NaN here)
scaled_df_MinMax_training = pd.DataFrame(scaled_data_MinMax_training, columns=columns_to_scaler_MinMax_training.columns)
scaled_df_MinMax_training.sample()

[Output: one randomly sampled row of scaled_df_MinMax_training (all columns except Id). Row index 380: MSSubClass = 0.176471, LotFrontage = 0.099315, LotArea = 0.017294, OverallQual = 0.444444, ...; all values lie between 0.0 and 1.0, and the one-hot indicators stay at 0.0 or 1.0.]
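To see how the two scalers differ on the same input, the sketch below applies both to one made-up column. The array `lot_area` and its values are illustrative assumptions, not data from the competition.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up lot sizes in square feet, shaped as a single feature column.
lot_area = np.array([[5000.0], [10000.0], [15000.0], [20000.0]])

# StandardScaler: subtract the mean, divide by the standard deviation.
standard = StandardScaler().fit_transform(lot_area)

# MinMaxScaler: map the smallest value to 0 and the largest to 1.
minmax = MinMaxScaler().fit_transform(lot_area)

print("standard:", standard.ravel())
print("minmax:  ", minmax.ravel())
```

Standard scaling produces values centered on 0 that can be negative, while MinMax scaling keeps everything inside [0, 1]. A 0/1 indicator column that contains both values is left unchanged by MinMax scaling, which matches the 0.0 and 1.0 entries in the sampled row above.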


In [86]:
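# Scale the test features to [0, 1], leaving out the 'Id' column.
# Note: fit_transform() refits scaler_MinMax on the test set; if the scaler was
# already fitted on the training features, scaler_MinMax.transform() would
# reuse the training min/max instead.
columns_to_scaler_MinMax_testing = df_converted_testing.drop(columns=['Id'])
scaled_data_MinMax_testing = scaler_MinMax.fit_transform(columns_to_scaler_MinMax_testing)

scaled_df_MinMax_testing = pd.DataFrame(scaled_data_MinMax_testing, columns=columns_to_scaler_MinMax_testing.columns)
# 'Id' is re-attached and immediately dropped again, so the result holds only the scaled features.
scaled_df_MinMax_testing = pd.concat([df_converted_testing[['Id']], scaled_df_MinMax_testing], axis=1)
scaled_df_MinMax_testing.drop(columns=['Id'], inplace=True)
scaled_df_MinMax_testing.sample()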

Unnamed: 0,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,TotRmsAbvGrd,Fireplaces,GarageYrBlt,GarageCars,GarageArea,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,MSZoning_C (all),MSZoning_FV,MSZoning_RH,MSZoning_RL,MSZoning_RM,Street_Grvl,Street_Pave,Alley_Grvl,Alley_Pave,LotShape_IR1,LotShape_IR2,LotShape_IR3,LotShape_Reg,LandContour_Bnk,LandContour_HLS,LandContour_Low,LandContour_Lvl,Utilities_AllPub,LotConfig_Corner,LotConfig_CulDSac,LotConfig_FR2,LotConfig_FR3,LotConfig_Inside,LandSlope_Gtl,LandSlope_Mod,LandSlope_Sev,Neighborhood_Blmngtn,Neighborhood_Blueste,Neighborhood_BrDale,Neighborhood_BrkSide,Neighborhood_ClearCr,Neighborhood_CollgCr,Neighborhood_Crawfor,Neighborhood_Edwards,Neighborhood_Gilbert,Neighborhood_IDOTRR,Neighborhood_MeadowV,Neighborhood_Mitchel,Neighborhood_NAmes,Neighborhood_NPkVill,Neighborhood_NWAmes,Neighborhood_NoRidge,Neighborhood_NridgHt,Neighborhood_OldTown,Neighborhood_SWISU,Neighborhood_Sawyer,Neighborhood_SawyerW,Neighborhood_Somerst,Neighborhood_StoneBr,Neighborhood_Timber,Neighborhood_Veenker,Condition1_Artery,Condition1_Feedr,Condition1_Norm,Condition1_PosA,Condition1_PosN,Condition1_RRAe,Condition1_RRAn,Condition1_RRNe,Condition1_RRNn,Condition2_Artery,Condition2_Feedr,Condition2_Norm,Condition2_PosA,Condition2_PosN,BldgType_1Fam,BldgType_2fmCon,BldgType_Duplex,BldgType_Twnhs,BldgType_TwnhsE,HouseStyle_1.5Fin,HouseStyle_1.5Unf,HouseStyle_1Story,HouseStyle_2.5Unf,HouseStyle_2Story,HouseStyle_SFoyer,HouseStyle_SLvl,RoofStyle_Flat,RoofStyle_Gable,RoofStyle_Gambrel,RoofStyle_Hip,RoofStyle_Mansard,RoofStyle_Shed,RoofMatl_CompShg,RoofMatl_Tar&Grv,RoofMatl_WdShake,RoofMatl_WdShngl,Exterior1st_AsbShng,Exterior1st_AsphShn,Exterior1st_BrkComm,Exterior1st_BrkFace,Exterior1st_CBlock,Exterior1st_CemntBd,E
xterior1st_HdBoard,Exterior1st_MetalSd,Exterior1st_Plywood,Exterior1st_Stucco,Exterior1st_VinylSd,Exterior1st_Wd Sdng,Exterior1st_WdShing,Exterior2nd_AsbShng,Exterior2nd_AsphShn,Exterior2nd_Brk Cmn,Exterior2nd_BrkFace,Exterior2nd_CBlock,Exterior2nd_CmentBd,Exterior2nd_HdBoard,Exterior2nd_ImStucc,Exterior2nd_MetalSd,Exterior2nd_Plywood,Exterior2nd_Stone,Exterior2nd_Stucco,Exterior2nd_VinylSd,Exterior2nd_Wd Sdng,Exterior2nd_Wd Shng,MasVnrType_BrkCmn,MasVnrType_BrkFace,MasVnrType_None,MasVnrType_Stone,ExterQual_Ex,ExterQual_Fa,ExterQual_Gd,ExterQual_TA,ExterCond_Ex,ExterCond_Fa,ExterCond_Gd,ExterCond_Po,ExterCond_TA,Foundation_BrkTil,Foundation_CBlock,Foundation_PConc,Foundation_Slab,Foundation_Stone,Foundation_Wood,BsmtQual_Ex,BsmtQual_Fa,BsmtQual_Gd,BsmtQual_TA,BsmtCond_Fa,BsmtCond_Gd,BsmtCond_Po,BsmtCond_TA,BsmtExposure_Av,BsmtExposure_Gd,BsmtExposure_Mn,BsmtExposure_No,BsmtFinType1_ALQ,BsmtFinType1_BLQ,BsmtFinType1_GLQ,BsmtFinType1_LwQ,BsmtFinType1_Rec,BsmtFinType1_Unf,BsmtFinType2_ALQ,BsmtFinType2_BLQ,BsmtFinType2_GLQ,BsmtFinType2_LwQ,BsmtFinType2_Rec,BsmtFinType2_Unf,Heating_GasA,Heating_GasW,Heating_Grav,Heating_Wall,HeatingQC_Ex,HeatingQC_Fa,HeatingQC_Gd,HeatingQC_Po,HeatingQC_TA,CentralAir_N,CentralAir_Y,Electrical_FuseA,Electrical_FuseF,Electrical_FuseP,Electrical_SBrkr,KitchenQual_Ex,KitchenQual_Fa,KitchenQual_Gd,KitchenQual_TA,Functional_Maj1,Functional_Maj2,Functional_Min1,Functional_Min2,Functional_Mod,Functional_Sev,Functional_Typ,FireplaceQu_Ex,FireplaceQu_Fa,FireplaceQu_Gd,FireplaceQu_Po,FireplaceQu_TA,GarageType_2Types,GarageType_Attchd,GarageType_Basment,GarageType_BuiltIn,GarageType_CarPort,GarageType_Detchd,GarageFinish_Fin,GarageFinish_RFn,GarageFinish_Unf,GarageQual_Fa,GarageQual_Gd,GarageQual_Po,GarageQual_TA,GarageCond_Ex,GarageCond_Fa,GarageCond_Gd,GarageCond_Po,GarageCond_TA,PavedDrive_N,PavedDrive_P,PavedDrive_Y,PoolQC_Ex,PoolQC_Gd,Fence_GdPrv,Fence_GdWo,Fence_MnPrv,Fence_MnWw,MiscFeature_Gar2,MiscFeature_Othr,MiscFeature_Shed,SaleType_COD,S
aleType_CWD,SaleType_Con,SaleType_ConLD,SaleType_ConLI,SaleType_ConLw,SaleType_New,SaleType_Oth,SaleType_WD,SaleCondition_Abnorml,SaleCondition_AdjLand,SaleCondition_Alloca,SaleCondition_Family,SaleCondition_Normal,SaleCondition_Partial
644,0.294118,0.217877,0.082169,0.444444,1.0,0.198473,0.916667,0.0,0.0,0.0,0.26729,0.112267,0.101749,0.352309,0.0,0.241681,0.0,0.0,0.25,0.5,0.5,0.5,0.333333,0.0,,0.0,0.0,0.168539,0.103774,0.0,0.0,0.0,0.0,0.0,0.090909,0.5,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
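
The cell above scales the test features with `fit_transform`. A minimal sketch of the more common pattern, fitting the scaler on the training features only and then calling `transform` on the test features, is shown below. The two small frames and the name `scaler` are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-ins for the training and test feature frames.
train = pd.DataFrame({"LotArea": [8000.0, 10000.0, 12000.0],
                      "YearBuilt": [1950.0, 1980.0, 2010.0]})
test = pd.DataFrame({"LotArea": [9000.0, 14000.0],
                     "YearBuilt": [1965.0, 2000.0]})

scaler = MinMaxScaler()

# Learn min and max from the training features only.
train_scaled = pd.DataFrame(scaler.fit_transform(train), columns=train.columns)

# Reuse the training min and max for the test features.
# Values outside the training range land outside [0, 1]:
# LotArea 14000 maps to (14000 - 8000) / (12000 - 8000) = 1.5.
test_scaled = pd.DataFrame(scaler.transform(test), columns=test.columns)

print(test_scaled)
```

With this pattern a given raw value is mapped to the same scaled value in both sets, which is what a model fitted on the scaled training features expects.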


In [87]:
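# Standardize the training labels (zero mean, unit variance); scaler_Standard is refitted on the labels here.
scaled_training_labels = pd.DataFrame(scaler_Standard.fit_transform(training_labels))
print(scaled_training_labels)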

             0
0     0.347273
1     0.007288
2     0.536154
3    -0.515281
4     0.869843
...        ...
1455 -0.074560
1456  0.366161
1457  1.077611
1458 -0.488523
1459 -0.420841

[1460 rows x 1 columns]


Now create the K-fold splits and analyze the error on each fold.

In [88]:
# 20-fold cross-validation with a fixed seed; the lists below collect per-fold results.
kfold = KFold(n_splits=20, shuffle=True, random_state=42)
mae = []
mse = []
y_val_pred_values = []
y_val_fold_values = []
X_train_fold_values = []
y_train_fold_values = []


In [89]:
# Split on the same frame that is indexed below (KFold only uses the number of rows).
for training_index, val_index in kfold.split(scaled_df_Standard_training):
    # Select this fold's training and validation rows.
    X_train_fold = scaled_df_Standard_training.iloc[training_index]
    X_val_fold = scaled_df_Standard_training.iloc[val_index]
    y_train_fold = scaled_training_labels.iloc[training_index]
    y_val_fold = scaled_training_labels.iloc[val_index]

    # Fit a fresh model on each fold.
    model = LinearRegression()
    model.fit(X_train_fold, y_train_fold)

    # Keep the predictions and fold data for later inspection.
    y_val_pred = model.predict(X_val_fold)
    y_val_pred_values.append(y_val_pred)
    y_val_fold_values.append(y_val_fold)
    X_train_fold_values.append(X_train_fold)
    y_train_fold_values.append(y_train_fold)
    # Errors are in standardized label units.
    mae.append(mean_absolute_error(y_val_fold, y_val_pred))
    mse.append(mean_squared_error(y_val_fold, y_val_pred))

In [90]:
mae

[0.2037348135882761,
 1989132272.707979,
 0.3180447766576919,
 258878837.4581652,
 160888006.13049507,
 0.25139054235491887,
 0.251331706739478,
 487633944.4275087,
 0.20096665176596185,
 4120757644.2022448,
 2117599603716.7432,
 0.2381281936556197,
 650103484.6053814,
 398996392.18560493,
 6174052957.530763,
 0.27545181492181064,
 0.18837402642755327,
 0.290079832961477,
 0.20493519034497076,
 541368818243.9456]
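
The fold MAEs above alternate between roughly 0.2 and values in the billions. One common cause of this pattern is an ill-conditioned least-squares fit on near-collinear one-hot columns; that is an assumption here, not something the notebook verifies. The sketch below swaps in `Ridge` (a regularized linear model, not the notebook's `LinearRegression`) inside a pipeline and evaluates it with the same 20-fold setup. The synthetic data and all names other than `KFold` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's features (not the Kaggle data).
rng = np.random.default_rng(42)
n = 400
numeric = rng.normal(size=(n, 3))
category = rng.integers(0, 4, size=n)
one_hot = np.eye(4)[category]  # four dummy columns that always sum to 1
X = np.hstack([numeric, one_hot])
y = numeric @ np.array([2.0, -1.0, 0.5]) + category + rng.normal(scale=0.1, size=n)

kfold = KFold(n_splits=20, shuffle=True, random_state=42)

# The pipeline refits the scaler inside every fold, so validation rows
# never influence the scaling parameters.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# scikit-learn reports negated errors so that larger is always better.
scores = cross_val_score(model, X, y, cv=kfold, scoring="neg_mean_absolute_error")
fold_mae = -scores

print(fold_mae.round(3))
print(f"Average MAE: {fold_mae.mean():.4f}")
```

The penalty strength `alpha=1.0` is an arbitrary starting point; on the real data it would need tuning.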

In [91]:
y_val_fold_values

[             0
 29   -1.415611
 49   -0.678977
 51   -0.836378
 65    1.713509
 67    0.567634
 ...        ...
 1269 -0.464913
 1299 -0.338993
 1317  0.352310
 1347  1.291210
 1398 -0.540465
 
 [73 rows x 1 columns],
              0
 15   -0.616017
 43   -0.638053
 44   -0.502689
 59   -0.705421
 70    0.794290
 ...        ...
 1344 -0.315886
 1392 -0.729346
 1397 -0.546761
 1414  0.328385
 1428 -0.779714
 
 [73 rows x 1 columns],
              0
 30   -1.774484
 32   -0.012859
 56   -0.106040
 63   -0.515281
 99   -0.654423
 ...        ...
 1333 -0.697866
 1356 -0.893042
 1358 -0.043080
 1394  0.826753
 1450 -0.565649
 
 [73 rows x 1 columns],
              0
 23   -0.642461
 58    3.246967
 81   -0.345289
 107  -0.830082
 111  -0.011600
 ...        ...
 1427 -0.515281
 1430  0.141268
 1432 -1.465980
 1447  0.743922
 1449 -1.119699
 
 [73 rows x 1 columns],
              0
 31   -0.397546
 48   -0.855266
 83   -0.685274
 86   -0.087152
 113   0.454306
 ...        ...
 1405  1.184643


In [92]:
X_train_fold_values

[      MSSubClass  LotFrontage   LotArea  OverallQual  OverallCond  YearBuilt  \
 0       0.073375    -0.208034 -0.207142     0.651479    -0.517200   1.050994   
 1      -0.872563     0.409895 -0.091886    -0.071836     2.179628   0.156734   
 2       0.073375    -0.084449  0.073480     0.651479    -0.517200   0.984752   
 3       0.309859    -0.414011 -0.096897     0.651479    -0.517200  -1.863632   
 4       0.073375     0.574676  0.375148     1.374795    -0.517200   0.951632   
 ...          ...          ...       ...          ...          ...        ...   
 1455    0.073375    -0.331620 -0.260560    -0.071836    -0.517200   0.918511   
 1456   -0.872563     0.615871  0.266407    -0.071836     0.381743   0.222975   
 1457    0.309859    -0.166839 -0.147810     0.651479     3.078570  -1.002492   
 1458   -0.872563    -0.084449 -0.080160    -0.795151     0.381743  -0.704406   
 1459   -0.872563     0.203918 -0.058112    -0.795151     0.381743  -0.207594   
 
       YearRemodAdd  MasVn

In [93]:
y_train_fold_values

[             0
 0     0.347273
 1     0.007288
 2     0.536154
 3    -0.515281
 4     0.869843
 ...        ...
 1455 -0.074560
 1456  0.366161
 1457  1.077611
 1458 -0.488523
 1459 -0.420841
 
 [1387 rows x 1 columns],
              0
 0     0.347273
 1     0.007288
 2     0.536154
 3    -0.515281
 4     0.869843
 ...        ...
 1455 -0.074560
 1456  0.366161
 1457  1.077611
 1458 -0.488523
 1459 -0.420841
 
 [1387 rows x 1 columns],
              0
 0     0.347273
 1     0.007288
 2     0.536154
 3    -0.515281
 4     0.869843
 ...        ...
 1455 -0.074560
 1456  0.366161
 1457  1.077611
 1458 -0.488523
 1459 -0.420841
 
 [1387 rows x 1 columns],
              0
 0     0.347273
 1     0.007288
 2     0.536154
 3    -0.515281
 4     0.869843
 ...        ...
 1455 -0.074560
 1456  0.366161
 1457  1.077611
 1458 -0.488523
 1459 -0.420841
 
 [1387 rows x 1 columns],
              0
 0     0.347273
 1     0.007288
 2     0.536154
 3    -0.515281
 4     0.869843
 ...        ...
 1455 -0

In [94]:
print(f"Average MAE: {np.mean(mae):.4f}")
print(f"Average MSE: {np.mean(mse):.4f}")

Average MAE: 133660443275.1180
Average MSE: 10028240417690403232284672.0000
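
These averages are in standardized label units because the labels were passed through `scaler_Standard` before fitting. As a minimal sketch, assuming a `StandardScaler` fitted on a single label column, predictions can be mapped back with `inverse_transform`, and an MAE in scaled units converts to original units by multiplying by the scaler's `scale_`. The prices and the name `label_scaler` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical sale prices standing in for the training labels.
prices = np.array([[100000.0], [150000.0], [200000.0], [250000.0]])

label_scaler = StandardScaler()
scaled = label_scaler.fit_transform(prices)

# Pretend predictions: every value is 0.1 standard deviations too high.
scaled_pred = scaled + 0.1

# Undo the standardization: x = z * scale_ + mean_.
pred_prices = label_scaler.inverse_transform(scaled_pred)

# MAE scales linearly, so it can be converted without inverting predictions.
mae_scaled = np.mean(np.abs(scaled_pred - scaled))
mae_original_units = mae_scaled * label_scaler.scale_[0]

print(pred_prices.ravel())
print(f"MAE in original units: {mae_original_units:.2f}")
```

The MSE converts the same way with the squared factor, `label_scaler.scale_[0] ** 2`.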


In [108]:
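final_model = LinearRegression()
# Replace remaining NaNs with 0.0 so LinearRegression can fit and predict.
scaled_df_MinMax_training.fillna(0.0, inplace=True)
scaled_df_MinMax_testing.fillna(0.0, inplace=True)
# Note: the folds above used the standardized features; the final model uses the min-max features.
final_model.fit(scaled_df_MinMax_training, scaled_training_labels)

y_test_pred = final_model.predict(scaled_df_MinMax_testing)
# Caution: this inverse transform borrows the min-max parameters of the feature at index 1,
# but the labels were standardized with scaler_Standard in cell 87, so
# scaler_Standard.inverse_transform(y_test_pred) would be the matching inverse
# (provided scaler_Standard has not been refitted since).
y_scaler_pred = preprocessing.MinMaxScaler()
y_scaler_pred.min_, y_scaler_pred.scale_ = scaler_MinMax.min_[1], scaler_MinMax.scale_[1]

y_test_pred = y_scaler_pred.inverse_transform(y_test_pred)

y_test_pred.tolist()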

[[214440706077.4622],
 [346238968195.0412],
 [188607587893.4622],
 [228029602509.23398],
 [464709852813.7508],
 [281365476613.6967],
 [262084347676.12198],
 [283931561469.12885],
 [413222454397.20593],
 [127413163291.29845],
 [327543832781.356],
 [67233737122.82913],
 [62566215289.963684],
 [203165619887.5281],
 [272583059832.6415],
 [755374824874.6025],
 [671781927549.0337],
 [734263477302.3076],
 [822331363822.4775],
 [1028708420305.8032],
 [654779179304.345],
 [568361038324.0747],
 [547737081581.9613],
 [443267889007.67847],
 [156668241855.6205],
 [328392116343.6377],
 [773994476299.7922],
 [700537778050.312],
 [487970019557.6623],
 [315923915242.29517],
 [316123274392.2408],
 [357036775038.0804],
 [669771774821.0479],
 [597591109904.1948],
 [386973126629.24567],
 [352878735037.27014],
 [321869711756.9926],
 [111129133026.40656],
 [95347771402.65025],
 [91893988133.0525],
 [166649218197.4809],
 [76829876094.30362],
 [389186165745.9343],
 [292696491130.5554],
 [458067520687.2605],
 [