# Encode
Here, we encode the data as follows:
* Convert categorical data with an inherent ordering (Likert-scale-type data) into ordinal integers.
  E.g., `GarageQual`: NoGarage->0, Po->1, Fa->2, TA->3, Gd->4, Ex->5
* Create new features
    * Derive features from the dwelling type (`MSSubClass`), e.g., `isPUD` and `NumFloors`.
    * Simplify scales.
      E.g., `OverallQual` uses a 1-10 scale: 1-3 is simplified to "bad", 4-6 to "normal", and 7-10 to "good".
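A minimal sketch of the ordinal conversion, assuming each category-to-integer mapping is a plain Python dict applied with `Series.map()`. The dict name `GARAGE_QUAL_ORDER` and the sample values are hypothetical; the notebook's real mappings live in its local `data_dict` module, whose contents are not shown here.

```python
import pandas as pd

# Hypothetical ordinal mapping for GarageQual, following the ordering listed above.
# The project's actual dictionaries are in data_dict and may differ.
GARAGE_QUAL_ORDER = {"NoGarage": 0, "Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

garage_qual = pd.Series(["TA", "NoGarage", "Gd", "Ex", "Po"])
encoded = garage_qual.map(GARAGE_QUAL_ORDER)  # each label replaced by its rank
print(encoded.tolist())  # [3, 0, 4, 5, 1]
```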

In [28]:
%load_ext autoreload
%autoreload 2


In [29]:
import pandas as pd
import data_dict

In [30]:
# The cleaned CSV was saved together with its index, which pandas reads back as an "Unnamed: 0" column
df = pd.read_csv("../data/cleaned.csv")
df = df.drop(columns="Unnamed: 0")
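The `Unnamed: 0` column appears because `DataFrame.to_csv()` writes the index by default and `read_csv()` reads it back as an ordinary column with an empty header. The round trip below reproduces this on a small toy frame (the values are illustrative only) and shows `index_col=0` as an alternative to dropping the column afterwards; passing `index=False` to `to_csv()` when saving would avoid the extra column altogether.

```python
import io

import pandas as pd

# Toy frame standing in for the cleaned data (illustrative values, not real rows)
original = pd.DataFrame({"PID": [101, 102], "GrLivArea": [856, 1049]})

# Default index=True: the RangeIndex is written as a first column with an empty header
buffer = io.StringIO()
original.to_csv(buffer)

buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(reloaded.columns.tolist())  # ['Unnamed: 0', 'PID', 'GrLivArea']

# index_col=0 turns that first column back into the index instead of a data column
buffer.seek(0)
reloaded_as_index = pd.read_csv(buffer, index_col=0)
print(reloaded_as_index.columns.tolist())  # ['PID', 'GrLivArea']
```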

Convert categorical features to ordinal integers

In [31]:
# Exterior features
df['ExterQual'] = df.ExterQual.map(data_dict.convert_exterqual)
df['ExterCond'] = df.ExterCond.map(data_dict.convert_extercond)

In [32]:
# Basement features
df['BsmtQual'] = df.BsmtQual.map(data_dict.convert_bsmtqual)
df['BsmtCond'] = df.BsmtCond.map(data_dict.convert_bsmtcond)
df['BsmtExposure'] = df.BsmtExposure.map(data_dict.convert_bsmtexposure)
df['BsmtFinType1'] = df.BsmtFinType1.map(data_dict.convert_bsmtfintype)
df['BsmtFinType2'] = df.BsmtFinType2.map(data_dict.convert_bsmtfintype)

In [33]:
# Home Interior features
df['Functional'] = df.Functional.map(data_dict.convert_functional)
df['FireplaceQu'] = df.FireplaceQu.map(data_dict.convert_fireplacequ)
df['HeatingQC'] = df.HeatingQC.map(data_dict.convert_heatingqc)
df['KitchenQual'] = df.KitchenQual.map(data_dict.convert_kitchenqual)

In [34]:
# Land features
df['LandSlope'] = df.LandSlope.map(data_dict.convert_landslope)
df['LotShape'] = df.LotShape.map(data_dict.convert_lotshape)

In [35]:
# Garage features
df['GarageCond'] = df.GarageCond.map(data_dict.convert_garagecond)
df['GarageQual'] = df.GarageQual.map(data_dict.convert_garagequal)

In [36]:
# Road features
df['Street'] = df.Street.map(data_dict.convert_street)
df['PavedDrive'] = df.PavedDrive.map(data_dict.convert_paveddrive)
df['Alley'] = df.Alley.map(data_dict.convert_alley)

In [37]:
# Other features
df['Utilities'] = df.Utilities.map(data_dict.convert_utilities)
df['PoolQC'] = df.PoolQC.map(data_dict.convert_poolqc)

In [38]:
df.head(1) # sanity check

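|   | PID | GrLivArea | SalePrice | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | ... | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 909176150 | 856 | 126000 | 30 | RL | 66.0 | 7890 | 2 | 0 | 4 | ... | 166 | 0 | 0 | NoFence | NoMisc | 0 | 3 | 2010 | WD | Normal |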
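Because `Series.map()` with a dict returns NaN for any value that is missing from the dict, a category omitted from a mapping silently becomes a missing value. A check like the one below, comparing a column before and after mapping, could complement the visual sanity check. The helper `find_unmapped` and the deliberately incomplete `BsmtExposure` mapping are hypothetical.

```python
import pandas as pd


def find_unmapped(before: pd.Series, after: pd.Series) -> list:
    """Return the distinct original values that .map() turned into NaN."""
    # Present before mapping but missing afterwards means the value had no entry in the dict;
    # values that were already NaN beforehand are not reported.
    newly_missing = before.notna() & after.isna()
    return sorted(before[newly_missing].unique().tolist())


# Hypothetical BsmtExposure mapping that forgets "Mn" on purpose
bsmt_exposure_map = {"NoBasement": 0, "No": 1, "Av": 3, "Gd": 4}
raw = pd.Series(["Gd", "No", "Mn", "NoBasement"])
encoded = raw.map(bsmt_exposure_map)
print(find_unmapped(raw, encoded))  # ['Mn']
```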


In [39]:
# WONTFIX: df[col].replace(mapping) could have been used above instead of df[col].map(mapping)
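To illustrate the trade-off behind the WONTFIX note, this sketch compares the two methods on a toy series with a hypothetical mapping: `map()` turns unmapped values into NaN, while `replace()` leaves them untouched.

```python
import pandas as pd

s = pd.Series(["Gd", "TA", "Unexpected"])
mapping = {"TA": 3, "Gd": 4}  # hypothetical mapping with no entry for "Unexpected"

mapped = s.map(mapping)        # unmapped values become NaN
replaced = s.replace(mapping)  # unmapped values are kept as they were

print(mapped.tolist())    # [4.0, 3.0, nan]
print(replaced.tolist())  # [4, 3, 'Unexpected']
```

With `map()`, a forgotten category surfaces as a missing value that later steps can detect; with `replace()`, it would survive as a stray string in an otherwise numeric column.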

In [40]:
# See whether a property is in a PUD from the dwelling type
df['isPUD'] = df.MSSubClass.map(data_dict.get_pud_indicator)
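`data_dict.get_pud_indicator` is not shown in this notebook. A plausible stand-in, assuming it flags the `MSSubClass` codes that the Ames Housing data documentation labels as PUD (planned unit development) dwellings, i.e. 120, 150, 160, and 180, might look like this; the function body and the set name are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for data_dict.get_pud_indicator.
# Codes taken from the Ames data documentation's PUD dwelling types.
PUD_SUBCLASSES = frozenset({120, 150, 160, 180})


def get_pud_indicator(ms_sub_class: int) -> int:
    """Return 1 if the MSSubClass code denotes a PUD dwelling, otherwise 0."""
    return int(ms_sub_class in PUD_SUBCLASSES)


is_pud = pd.Series([20, 120, 60, 180]).map(get_pud_indicator)
print(is_pud.tolist())  # [0, 1, 0, 1]
```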

In [51]:
# Get number of floors for the property from the dwelling type
df['NumFloors'] = df.MSSubClass.map(data_dict.get_num_floors)
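Likewise, `data_dict.get_num_floors` is not shown. A hypothetical version could read the story count off the `MSSubClass` descriptions in the Ames data documentation (e.g., 20 is "1-STORY 1946 & NEWER ALL STYLES", 75 is "2-1/2 STORY ALL AGES") and return NaN for split-level, duplex, and conversion codes, which do not state a story count. The table below is an assumption, not the project's actual mapping.

```python
import math

import pandas as pd

# Hypothetical stories-per-MSSubClass table (assumption based on the Ames dwelling-type
# descriptions). Code 40, "1-STORY W/FINISHED ATTIC", is counted as 1 story here.
FLOORS_BY_SUBCLASS = {
    20: 1.0, 30: 1.0, 40: 1.0, 120: 1.0,  # 1-story types
    45: 1.5, 50: 1.5, 150: 1.5,           # 1-1/2 story types
    60: 2.0, 70: 2.0, 160: 2.0,           # 2-story types
    75: 2.5,                              # 2-1/2 story
}


def get_num_floors(ms_sub_class: int) -> float:
    """Return the story count implied by an MSSubClass code, or NaN if none is stated."""
    # Codes 80, 85, 90, 180 and 190 (split level, split foyer, duplex,
    # PUD multilevel, 2-family conversion) fall through to NaN.
    return FLOORS_BY_SUBCLASS.get(ms_sub_class, math.nan)


num_floors = pd.Series([20, 50, 60, 75, 85]).map(get_num_floors)
print(num_floors.tolist())  # [1.0, 1.5, 2.0, 2.5, nan]
```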

In [55]:
# Convert MoSold from numeric to categorical
# (grouping MSSubClass into categories is currently disabled)
# df['MSSubClass'] = df.MSSubClass.map(data_dict.convert_mssubclass) # groups
df['MoSold'] = df.MoSold.map(data_dict.convert_mosold)
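Converting `MoSold` to labels keeps a model from treating month numbers as a magnitude (e.g., December as "greater than" January) and lets the column be one-hot encoded later. A minimal sketch of what `data_dict.convert_mosold` might do, assuming it maps month numbers to abbreviated names; the real function is not shown, and the labels below are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for data_dict.convert_mosold; the real labels may differ.
MONTH_NAMES = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def convert_mosold(month: int) -> str:
    """Map a month number (1-12) to its abbreviated name."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return MONTH_NAMES[month - 1]


mo_sold = pd.Series([3, 12, 3]).map(convert_mosold)
print(mo_sold.tolist())  # ['Mar', 'Dec', 'Mar']

# String categories can later be one-hot encoded: one indicator column per month present
dummies = pd.get_dummies(mo_sold, prefix="MoSold")
```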


Simplify buckets for Likert-scale data

In [25]:
# TODO
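This step is still a TODO. One possible approach, sketched here with the `OverallQual` bins given in the introduction (1-3 "bad", 4-6 "normal", 7-10 "good"), is `pd.cut` with right-inclusive bin edges. The function name is hypothetical, and other Likert-style columns would need their own edges.

```python
import pandas as pd


def simplify_overall_qual(scores: pd.Series) -> pd.Series:
    """Bucket 1-10 quality scores into 'bad', 'normal' and 'good'."""
    # pd.cut bins are right-inclusive by default: (0, 3] -> bad, (3, 6] -> normal, (6, 10] -> good
    return pd.cut(scores, bins=[0, 3, 6, 10], labels=["bad", "normal", "good"])


qual = pd.Series([1, 3, 4, 6, 7, 10])
print(simplify_overall_qual(qual).tolist())  # ['bad', 'bad', 'normal', 'normal', 'good', 'good']
```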

Save output

In [26]:
df.to_csv("../data/encoded.csv")