Welcome to the Logistic regression model for CS4650 Big Data, Analysis, and Cloud Computing.

This model uses a dataset from the Kaggle competition:

`House Prices - Advanced Regression Techniques`

The dataset includes these features and their descriptions:

---

    1. MSSubClass: Identifies the type of dwelling involved in the sale. (categorical) 
    
    2. MSZoning: Identifies the general zoning classification of the sale. (categorical)
    
    3. LotFrontage: Linear feet of street connected to property (numeric)
    
    4. LotArea: Lot size in square feet (numeric)
    
    5. Street: Type of road access to property (categorical)
    
    6. Alley: Type of alley access to property (categorical)
    
    7. LotShape: General shape of property (categorical)
    
    8. LandContour: Flatness of the property (categorical)
    
    9. Utilities: Type of utilities available (categorical)
    
    10. LotConfig: Lot configuration (categorical)
    
    11. LandSlope: Slope of property (categorical)
    
    12. Neighborhood: Physical locations within Ames city limits (categorical)
    
    13. Condition1: Proximity to various conditions (categorical)
    
    14. Condition2: Proximity to various conditions (if more than one is present) (categorical)
    
    15. BldgType: Type of dwelling (categorical)
    
    16. HouseStyle: Style of dwelling (categorical)
    
    17. OverallQual: Rates the overall material and finish of the house (categorical)
    
    18. OverallCond: Rates the overall condition of the house (categorical)
    
    19. YearBuilt: Original construction date (numeric)
    
    20. YearRemodAdd: Remodel date (same as construction date if no remodeling or additions) (numeric)
    
    21. RoofStyle: Type of roof (categorical)
    
    22. RoofMatl: Roof material (categorical)
    
    23. Exterior1st: Exterior covering on house (categorical)
    
    24. Exterior2nd: Exterior covering on house (if more than one material) (categorical)
    
    25. MasVnrType: Masonry veneer type (categorical)
    
    26. MasVnrArea: Masonry veneer area in square feet (numeric)
    
    27. ExterQual: Evaluates the quality of the material on the exterior (categorical)
    
    28. ExterCond: Evaluates the present condition of the material on the exterior (categorical)
    
    29. Foundation: Type of foundation (categorical)
    
    30. BsmtQual: Evaluates the height of the basement (categorical)
    
    31. BsmtCond: Evaluates the general condition of the basement (categorical)
    
    32. BsmtExposure: Refers to walkout or garden level walls (categorical)
    
    33. BsmtFinType1: Rating of basement finished area (categorical)
    
    34. BsmtFinSF1: Type 1 finished square feet (numeric)

    35. BsmtFinType2: Rating of basement finished area (if multiple types) (categorical)
    
    36. BsmtFinSF2: Type 2 finished square feet (numeric)
    
    37. BsmtUnfSF: Unfinished square feet of basement area (numeric)
    
    38. TotalBsmtSF: Total square feet of basement area (numeric)

    39. Heating: Type of heating (categorical)
    
    40. HeatingQC: Heating quality and condition (categorical)
    
    41. CentralAir: Central air conditioning (categorical)
    
    42. Electrical: Electrical system (categorical)
    
    43. 1stFlrSF: First Floor square feet (numeric)
    
    44. 2ndFlrSF: Second floor square feet (numeric)
    
    45. LowQualFinSF: Low quality finished square feet (all floors) (numeric)
    
    46. GrLivArea: Above grade (ground) living area square feet (numeric)
    
    47. BsmtFullBath: Basement full bathrooms (numeric)
    
    48. BsmtHalfBath: Basement half bathrooms (numeric)
    
    49. FullBath: Full bathrooms above grade (numeric)
    
    50. HalfBath: Half baths above grade (numeric)
    
    51. BedroomAbvGr: Bedrooms above grade (does NOT include basement bedrooms) (numeric)
    
    52. KitchenAbvGr: Kitchens above grade (numeric)
    
    53. KitchenQual: Kitchen quality (categorical)
    
    54. TotRmsAbvGrd: Total rooms above grade (does not include bathrooms) (numeric)
    
    55. Functional: Home functionality (assume typical unless deductions are warranted) (categorical)
    
    56. Fireplaces: Number of fireplaces (numeric)
    
    57. FireplaceQu: Fireplace quality (categorical)
    
    58. GarageType: Garage location (categorical)
    
    59. GarageYrBlt: Year garage was built (numeric)
    
    60. GarageFinish: Interior finish of the garage (categorical)
    
    61. GarageCars: Size of garage in car capacity (numeric)
    
    62. GarageArea: Size of garage in square feet (numeric)
    
    63. GarageQual: Garage quality (categorical)
    
    64. GarageCond: Garage condition (categorical)
    
    65. PavedDrive: Paved driveway (categorical)
    
    66. WoodDeckSF: Wood deck area in square feet (numeric)
    
    67. OpenPorchSF: Open porch area in square feet (numeric)
    
    68. EnclosedPorch: Enclosed porch area in square feet (numeric)
    
    69. 3SsnPorch: Three season porch area in square feet (numeric)
    
    70. ScreenPorch: Screen porch area in square feet (numeric)
    
    71. PoolArea: Pool area in square feet (numeric)
    
    72. PoolQC: Pool quality (categorical)
    
    73. Fence: Fence quality (categorical)
    
    74. MiscFeature: Miscellaneous feature not covered in other categories (categorical)
    
    75. MiscVal: Dollar value of miscellaneous feature (numeric)
    
    76. MoSold: Month Sold (MM) (numeric)
    
    77. YrSold: Year Sold (YYYY) (numeric)
    
    78. SaleType: Type of sale (categorical)
    
    79. SaleCondition: Condition of sale (categorical)


---    
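
Several features in the list above are labelled categorical but are stored as integer codes (for example `MSSubClass`), so selecting columns purely by dtype will not pick them up. The following is a minimal sketch of one way to handle this, assuming a small made-up frame in place of `train.csv`; the `coded_as_int` list is a hypothetical helper and would need to be filled in from the descriptions above.

```python
import pandas as pd

# Toy frame standing in for train.csv; the values are made up for illustration.
toy = pd.DataFrame({
    "MSSubClass": [20, 60, 20],        # integer codes, but categorical in meaning
    "MSZoning": ["RL", "RM", "RL"],    # string labels, categorical
    "LotArea": [9300, 8450, 11250],    # genuinely numeric
})

# Detection by dtype alone only finds the string-valued column.
by_dtype = toy.select_dtypes(exclude="number").columns.tolist()

# Hypothetical list of integer-coded categorical features; cast them to strings
# so that later encoding steps treat them as categories.
coded_as_int = ["MSSubClass"]
toy = toy.astype({col: str for col in coded_as_int})

categorical = toy.select_dtypes(exclude="number").columns.tolist()
numeric = toy.select_dtypes(include="number").columns.tolist()

print(by_dtype)      # columns found before the cast
print(categorical)   # columns found after the cast
print(numeric)
```

Whether to treat ordinal ratings such as `OverallQual` as categories or as numbers is a modelling choice; the sketch only shows the mechanics of the cast.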







In [61]:
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, train_test_split
from sklearn.preprocessing import StandardScaler

In [62]:
df = pd.read_csv("./train.csv")

In [71]:
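# Collect the names of all object-dtype (string) columns and one-hot encode them.
object_columns = df.select_dtypes(include=["object"]).columns
df_converted = pd.get_dummies(df, columns=object_columns)
# Display one randomly chosen row of the encoded frame.
df_converted.sample()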

Unnamed: 0,Id,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,TotRmsAbvGrd,Fireplaces,GarageYrBlt,GarageCars,GarageArea,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SalePrice,MSZoning_C (all),MSZoning_FV,MSZoning_RH,MSZoning_RL,MSZoning_RM,Street_Grvl,Street_Pave,Alley_Grvl,Alley_Pave,LotShape_IR1,LotShape_IR2,LotShape_IR3,LotShape_Reg,LandContour_Bnk,LandContour_HLS,LandContour_Low,LandContour_Lvl,Utilities_AllPub,Utilities_NoSeWa,LotConfig_Corner,LotConfig_CulDSac,LotConfig_FR2,LotConfig_FR3,LotConfig_Inside,LandSlope_Gtl,LandSlope_Mod,LandSlope_Sev,Neighborhood_Blmngtn,Neighborhood_Blueste,Neighborhood_BrDale,Neighborhood_BrkSide,Neighborhood_ClearCr,Neighborhood_CollgCr,Neighborhood_Crawfor,Neighborhood_Edwards,Neighborhood_Gilbert,Neighborhood_IDOTRR,Neighborhood_MeadowV,Neighborhood_Mitchel,Neighborhood_NAmes,Neighborhood_NPkVill,Neighborhood_NWAmes,Neighborhood_NoRidge,Neighborhood_NridgHt,Neighborhood_OldTown,Neighborhood_SWISU,Neighborhood_Sawyer,Neighborhood_SawyerW,Neighborhood_Somerst,Neighborhood_StoneBr,Neighborhood_Timber,Neighborhood_Veenker,Condition1_Artery,Condition1_Feedr,Condition1_Norm,Condition1_PosA,Condition1_PosN,Condition1_RRAe,Condition1_RRAn,Condition1_RRNe,Condition1_RRNn,Condition2_Artery,Condition2_Feedr,Condition2_Norm,Condition2_PosA,Condition2_PosN,Condition2_RRAe,Condition2_RRAn,Condition2_RRNn,BldgType_1Fam,BldgType_2fmCon,BldgType_Duplex,BldgType_Twnhs,BldgType_TwnhsE,HouseStyle_1.5Fin,HouseStyle_1.5Unf,HouseStyle_1Story,HouseStyle_2.5Fin,HouseStyle_2.5Unf,HouseStyle_2Story,HouseStyle_SFoyer,HouseStyle_SLvl,RoofStyle_Flat,RoofStyle_Gable,RoofStyle_Gambrel,RoofStyle_Hip,RoofStyle_Mansard,RoofStyle_Shed,RoofMatl_ClyTile,RoofMatl_CompShg,RoofMatl_Membran,RoofMatl_Metal,RoofMatl_Roll,RoofMatl_Tar&Grv,RoofMatl_WdShake,RoofMatl_WdShngl,Exterior1st_AsbShng,Exterior1st_AsphShn,Exterior1st_BrkComm,Exterior1st_BrkFace,Exterior1st_CBlock,Exterior1st_CemntBd,Exterior1st_HdBoard,Exterior1st_ImStucc,Exterior1st_MetalSd,Exterior1st_Plywood,Exterior1st_Stone,Exterior1st_Stucco,Exterior1st_VinylSd,Exterior1st_Wd Sdng,Exterior1st_WdShing,Exterior2nd_AsbShng,Exterior2nd_AsphShn,Exterior2nd_Brk Cmn,Exterior2nd_BrkFace,Exterior2nd_CBlock,Exterior2nd_CmentBd,Exterior2nd_HdBoard,Exterior2nd_ImStucc,Exterior2nd_MetalSd,Exterior2nd_Other,Exterior2nd_Plywood,Exterior2nd_Stone,Exterior2nd_Stucco,Exterior2nd_VinylSd,Exterior2nd_Wd Sdng,Exterior2nd_Wd Shng,MasVnrType_BrkCmn,MasVnrType_BrkFace,MasVnrType_None,MasVnrType_Stone,ExterQual_Ex,ExterQual_Fa,ExterQual_Gd,ExterQual_TA,ExterCond_Ex,ExterCond_Fa,ExterCond_Gd,ExterCond_Po,ExterCond_TA,Foundation_BrkTil,Foundation_CBlock,Foundation_PConc,Foundation_Slab,Foundation_Stone,Foundation_Wood,BsmtQual_Ex,BsmtQual_Fa,BsmtQual_Gd,BsmtQual_TA,BsmtCond_Fa,BsmtCond_Gd,BsmtCond_Po,BsmtCond_TA,BsmtExposure_Av,BsmtExposure_Gd,BsmtExposure_Mn,BsmtExposure_No,BsmtFinType1_ALQ,BsmtFinType1_BLQ,BsmtFinType1_GLQ,BsmtFinType1_LwQ,BsmtFinType1_Rec,BsmtFinType1_Unf,BsmtFinType2_ALQ,BsmtFinType2_BLQ,BsmtFinType2_GLQ,BsmtFinType2_LwQ,BsmtFinType2_Rec,BsmtFinType2_Unf,Heating_Floor,Heating_GasA,Heating_GasW,Heating_Grav,Heating_OthW,Heating_Wall,HeatingQC_Ex,HeatingQC_Fa,HeatingQC_Gd,HeatingQC_Po,HeatingQC_TA,CentralAir_N,CentralAir_Y,Electrical_FuseA,Electrical_FuseF,Electrical_FuseP,Electrical_Mix,Electrical_SBrkr,KitchenQual_Ex,KitchenQual_Fa,KitchenQual_Gd,KitchenQual_TA,Functional_Maj1,Functional_Maj2,Functional_Min1,Functional_Min2,Functional_Mod,Functional_Sev,Functional_Typ,FireplaceQu_Ex,FireplaceQu_Fa,FireplaceQu_Gd,FireplaceQu_Po,FireplaceQu_TA,GarageType_2Types,GarageType_Attchd,GarageType_Basment,GarageType_BuiltIn,GarageType_CarPort,GarageType_Detchd,GarageFinish_Fin,GarageFinish_RFn,GarageFinish_Unf,GarageQual_Ex,GarageQual_Fa,GarageQual_Gd,GarageQual_Po,GarageQual_TA,GarageCond_Ex,GarageCond_Fa,GarageCond_Gd,GarageCond_Po,GarageCond_TA,PavedDrive_N,PavedDrive_P,PavedDrive_Y,PoolQC_Ex,PoolQC_Fa,PoolQC_Gd,Fence_GdPrv,Fence_GdWo,Fence_MnPrv,Fence_MnWw,MiscFeature_Gar2,MiscFeature_Othr,MiscFeature_Shed,MiscFeature_TenC,SaleType_COD,SaleType_CWD,SaleType_Con,SaleType_ConLD,SaleType_ConLI,SaleType_ConLw,SaleType_New,SaleType_Oth,SaleType_WD,SaleCondition_Abnorml,SaleCondition_AdjLand,SaleCondition_Alloca,SaleCondition_Family,SaleCondition_Normal,SaleCondition_Partial
1026,1027,20,73.0,9300,5,5,1960,1960,324.0,697,0,571,1268,1264,0,0,1264,1,0,1,0,3,1,6,2,1960.0,2,461,0,0,0,0,143,0,0,4,2010,167500,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0
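
The encoded output above has only `Alley_Grvl` and `Alley_Pave` columns for `Alley`, and the sampled row is 0 in both, which is how `pd.get_dummies` represents a missing value by default. The sketch below reproduces this on a small made-up frame and shows the optional `dummy_na=True` flag, which adds an explicit indicator column for missing entries. The toy values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame; values are made up. NaN stands for a missing Alley entry.
toy = pd.DataFrame({
    "LotArea": [9300, 8450, 11250],
    "Alley": ["Grvl", np.nan, "Pave"],
})

# Encode every non-numeric column (here this is just "Alley").
cat_cols = toy.select_dtypes(exclude="number").columns

# Default behaviour: a missing value becomes a row of zeros across the dummies.
encoded = pd.get_dummies(toy, columns=cat_cols, dtype=int)
print(encoded.columns.tolist())
print(encoded.loc[1].tolist())

# With dummy_na=True an extra indicator column marks the missing entries.
with_na = pd.get_dummies(toy, columns=cat_cols, dummy_na=True, dtype=int)
print(with_na.columns.tolist())
```

Passing `dtype=int` keeps the 0/1 integer output shown above; newer pandas versions return boolean columns when it is omitted.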


In [82]:
print(df_converted.select_dtypes(include=['object']))

Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]

[1460 rows x 0 columns]
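
The empty frame above confirms that no object columns remain after encoding. A more compact check could list the non-numeric column names directly and, at the same time, report numeric columns that still contain missing values. The following is a minimal sketch on a made-up frame; that float columns such as `LotFrontage` hold `NaN` in the real data is an assumption here, not something the output above confirms.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_converted; the values are made up for illustration.
toy = pd.DataFrame({
    "LotFrontage": [73.0, np.nan, 68.0],   # assumed to contain a missing value
    "MSZoning_RL": [1, 0, 1],              # a one-hot column
})

# Names of any columns that are still non-numeric (expected: none).
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()

# Per-column count of missing values, keeping only columns that have some.
missing = toy.isna().sum()
missing = missing[missing > 0].to_dict()

print(non_numeric)
print(missing)
```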
