# 3 - Feature Engineering

In [503]:
import pandas as pd
pd.set_option('display.max_columns', None)
train = pd.read_pickle('../pickles/cleaned/train_cleaned')
test = pd.read_pickle('../pickles/cleaned/test_cleaned')

### Value Mapping<p>
To be used in regression, every column needs to be in a numerical format. Additionally, some columns can be combined into more meaningful data points, such as room counts.<p>
First, hierarchical values, such as those ranging from 'poor' to 'excellent', can be converted into numerical ones quite easily, with 1 as the worst and counting up, using 0 where the property does not have the feature at all.

5 <- Ex	(Excellent) <br>
4 <- Gd	(Good)<br>
3 <- TA	(Average/Typical)<br>
2 <- Fa	(Fair)<br>
1 <- Po	(Poor)<br>
0 <- None	(Doesn't have)<p>

Columns this applies to:<p>
*PoolQC*: Pool quality<br>
*FireplaceQu*: Fireplace quality<br>
*GarageCond*: Garage condition<br>
*GarageQual*: Garage quality<br>
*KitchenQual*: Kitchen quality<br>
*HeatingQC*: Heating quality and condition<br>
*BsmtCond*: Evaluates the general condition of the basement<br>
*BsmtQual*: Evaluates the height of the basement<br>
*ExterCond*: Evaluates the present condition of the material on the exterior<br>
*ExterQual*: Evaluates the quality of the material on the exterior 
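
The same mapping can also be applied in a loop instead of one assignment per column. The sketch below is an illustrative alternative, not the notebook's own code: `apply_ordinal_map` is a hypothetical helper and the toy frame stands in for `train`. It also counts unmapped labels, since `Series.map` returns `NaN` for any value missing from the dictionary.

```python
import pandas as pd

# Same scale as the table above: 5 = Excellent ... 1 = Poor, 0 = feature absent
quality_map = {'Ex': 5, 'Gd': 4, 'TA': 3, 'Fa': 2, 'Po': 1, 'None': 0}

def apply_ordinal_map(df, columns, mapping):
    """Return a copy of df with each listed column mapped to its ordinal code."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].map(mapping)
    return out

# Toy stand-in for the real training data (assumed values, for illustration only)
toy = pd.DataFrame({
    'KitchenQual': ['Gd', 'TA', 'Ex'],
    'PoolQC': ['None', 'None', 'Fa'],
})

encoded = apply_ordinal_map(toy, ['KitchenQual', 'PoolQC'], quality_map)

# Labels missing from the mapping become NaN, so count them per column
unmapped = encoded[['KitchenQual', 'PoolQC']].isna().sum()
print(encoded)
print(unmapped)
```

A non-zero count in `unmapped` would point to a label that the dictionary does not cover.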

In [504]:
mapping1 = {'Ex':5, 'Gd':4, 'TA':3, 'Fa':2, 'Po':1, 'None':0}

In [505]:
train.loc[:, 'PoolQC'] = train['PoolQC'].map(mapping1)
train.loc[:, 'FireplaceQu'] = train['FireplaceQu'].map(mapping1)
train.loc[:, 'GarageCond'] = train['GarageCond'].map(mapping1)
train.loc[:, 'GarageQual'] = train['GarageQual'].map(mapping1)
train.loc[:, 'KitchenQual'] = train['KitchenQual'].map(mapping1)
train.loc[:, 'HeatingQC'] = train['HeatingQC'].map(mapping1)
train.loc[:, 'BsmtCond'] = train['BsmtCond'].map(mapping1)
train.loc[:, 'BsmtQual'] = train['BsmtQual'].map(mapping1)
train.loc[:, 'ExterCond'] = train['ExterCond'].map(mapping1)
train.loc[:, 'ExterQual'] = train['ExterQual'].map(mapping1)

In [506]:
test.loc[:, 'PoolQC'] = test['PoolQC'].map(mapping1)
test.loc[:, 'FireplaceQu'] = test['FireplaceQu'].map(mapping1)
test.loc[:, 'GarageCond'] = test['GarageCond'].map(mapping1)
test.loc[:, 'GarageQual'] = test['GarageQual'].map(mapping1)
test.loc[:, 'KitchenQual'] = test['KitchenQual'].map(mapping1)
test.loc[:, 'HeatingQC'] = test['HeatingQC'].map(mapping1)
test.loc[:, 'BsmtCond'] = test['BsmtCond'].map(mapping1)
test.loc[:, 'BsmtQual'] = test['BsmtQual'].map(mapping1)
test.loc[:, 'ExterCond'] = test['ExterCond'].map(mapping1)
test.loc[:, 'ExterQual'] = test['ExterQual'].map(mapping1)

6 <- GLQ (Good Living Quarters)<br>
5 <- ALQ (Average Living Quarters)<br>
4 <- BLQ (Below Average Living Quarters)	<br>
3 <- Rec (Average Rec Room)<br>
2 <- LwQ (Low Quality)<br>
1 <- Unf (Unfinished)<br>
0 <- None (Doesn't have)<p>

Columns this applies to:<p>
*BsmtFinType1*: Rating of basement finished area<br>
*BsmtFinType2*: Rating of basement finished area (if multiple types)

In [507]:
mapping2 = {'GLQ':6, 'ALQ':5, 'BLQ':4, 'Rec':3, 'LwQ':2, 'Unf':1, 'None':0}

In [508]:
train.loc[:, 'BsmtFinType1'] = train['BsmtFinType1'].map(mapping2)
train.loc[:, 'BsmtFinType2'] = train['BsmtFinType2'].map(mapping2)

In [509]:
test.loc[:, 'BsmtFinType1'] = test['BsmtFinType1'].map(mapping2)
test.loc[:, 'BsmtFinType2'] = test['BsmtFinType2'].map(mapping2)

2 <- Grvl	(Gravel)<br>
1 <- Pave	(Paved)<br>
0 <- None (Only on `Alley`; no alley access)<p>

Applies to:<p>
*Street*: Type of road access to property<br>
*Alley*: Type of alley access to property

In [510]:
mapping3 = {'Grvl':2, 'Pave':1, 'None':0}

In [511]:
train.loc[:, 'Street'] = train['Street'].map(mapping3)
train.loc[:, 'Alley'] = train['Alley'].map(mapping3)

In [512]:
test.loc[:, 'Street'] = test['Street'].map(mapping3)
test.loc[:, 'Alley'] = test['Alley'].map(mapping3)

3 <- Y	(Paved) <br>
2 <- P	(Partial Pavement)<br>
1 <- N	(Dirt/Gravel)<p>

Only applies to:<p>
*PavedDrive*: Paved driveway

In [513]:
pavemap = {'Y':3, 'P':2, 'N':1}

In [514]:
train.loc[:, 'PavedDrive'] = train['PavedDrive'].map(pavemap)

test.loc[:, 'PavedDrive'] = test['PavedDrive'].map(pavemap)

3 <- Fin (Finished)<br>
2 <- RFn (Rough Finished)<br>
1 <- Unf (Unfinished)<br>
0 <- None (No Garage)<p>

Only applies to:<p>
*GarageFinish*: Interior finish of the garage

In [515]:
garagemap = {'Fin':3, 'RFn':2, 'Unf':1, 'None':0}

In [516]:
train.loc[:, 'GarageFinish'] = train['GarageFinish'].map(garagemap)

test.loc[:, 'GarageFinish'] = test['GarageFinish'].map(garagemap)

8 <- Typ	(Typical Functionality)<br>
7 <- Min1	(Minor Deductions 1)<br>
6 <- Min2	(Minor Deductions 2)<br>
5 <- Mod	(Moderate Deductions)<br>
4 <- Maj1	(Major Deductions 1)<br>
3 <- Maj2	(Major Deductions 2)<br>
2 <- Sev	(Severely Damaged)<br>
1 <- Sal	(Salvage only)<p>

Only applies to:<p>
*Functional*: Home functionality (Assume typical unless deductions are warranted)

In [517]:
functmap = {'Typ':8, 'Min1':7, 'Min2':6, 'Mod':5, 'Maj1':4, 'Maj2':3, 'Sev':2, 'Sal':1}

In [518]:
train.loc[:, 'Functional'] = train['Functional'].map(functmap)

test.loc[:, 'Functional'] = test['Functional'].map(functmap)

4 <- Gd	(Good Exposure)<br>
3 <- Av	(Average Exposure)<br>
2 <- Mn	(Minimum Exposure)<br>
1 <- No	(No Exposure)<br>
0 <- None	(No Basement)<p>

Only applies to:<p>
*BsmtExposure*: Refers to walkout or garden level walls

In [519]:
bsmtmap = {'Gd':4, 'Av':3, 'Mn':2, 'No':1, 'None':0}

In [520]:
train.loc[:, 'BsmtExposure'] = train['BsmtExposure'].map(bsmtmap)

test.loc[:, 'BsmtExposure'] = test['BsmtExposure'].map(bsmtmap)

4 <- AllPub	(All public Utilities)<br>
3 <- NoSewr	(Electricity, Gas, and Water (Septic Tank))<br>
2 <- NoSeWa	(Electricity and Gas Only)<br>
1 <- ELO	(Electricity only)<p>

Only applies to:<p>
*Utilities*: Type of utilities available<br>

In [521]:
utilmap = {'AllPub':4, 'NoSewr':3, 'NoSeWa':2, 'ELO':1}

In [522]:
train.loc[:, 'Utilities'] = train['Utilities'].map(utilmap)

test.loc[:, 'Utilities'] = test['Utilities'].map(utilmap)

Additionally, the `CentralAir` column currently contains Yes/No values, and as such can be re-mapped using the standard 1/0 schema.

In [523]:
airmap = {'Y':1, 'N':0}

In [524]:
train.loc[:, 'CentralAir'] = train['CentralAir'].map(airmap)

test.loc[:, 'CentralAir'] = test['CentralAir'].map(airmap)

### One Hot Encoding<p>
Features that aren't numerical and have no obvious hierarchy need to be handled via One Hot Encoding.
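
When train and test are encoded separately, a category that occurs in only one of them produces a column the other lacks. The sketch below shows one way to keep the two column sets aligned. It uses `pandas.get_dummies` rather than the `OneHotEncoder` used in this notebook, and the frames and `Foundation` values are made up for illustration.

```python
import pandas as pd

# Toy frames (assumed values); 'Wood' occurs only in the test data
toy_train = pd.DataFrame({'Foundation': ['PConc', 'CBlock', 'Slab']})
toy_test = pd.DataFrame({'Foundation': ['PConc', 'Wood']})

train_dummies = pd.get_dummies(toy_train['Foundation'], prefix='Foundation', dtype=float)
test_dummies = pd.get_dummies(toy_test['Foundation'], prefix='Foundation', dtype=float)

# Give test exactly the training columns: missing ones are filled with 0,
# and categories never seen in training (here 'Wood') are dropped
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0.0)

print(list(train_dummies.columns))
print(test_aligned)
```

Dropping unseen categories is a design choice: a model fitted on the training columns has no coefficient for them anyway.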

In [525]:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(sparse_output=False)

Some column pairs share the same set of possible values. To capture both columns' data for each pair, I'm expanding both of them and then combining them into a single set of columns, so that each row has at most two flags set across that set.

`Condition1` and `Condition2` share the following value set:<p>

Artery - Adjacent to arterial street<br>
Feedr - Adjacent to feeder street<br>
Norm - Normal	<br>
RRNn - Within 200' of North-South Railroad<br>
RRAn - Adjacent to North-South Railroad<br>
PosN - Near positive off-site feature--park, greenbelt, etc.<br>
PosA - Adjacent to positive off-site feature<br>
RRNe - Within 200' of East-West Railroad<br>
RRAe - Adjacent to East-West Railroad
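
The pairing idea described above can be sketched compactly. This is an illustrative alternative under stated assumptions, not the code used in the cells that follow: `combine_pair` is a hypothetical helper, it relies on `pandas.get_dummies` instead of `OneHotEncoder`, and the toy rows are invented.

```python
import pandas as pd

def combine_pair(df, col_a, col_b):
    """One-hot encode two columns sharing a value set and merge them label by label."""
    a = pd.get_dummies(df[col_a], dtype=float)
    b = pd.get_dummies(df[col_b], dtype=float)
    labels = sorted(set(a.columns) | set(b.columns))
    # Add all-zero columns for labels that occur in only one of the two columns
    a = a.reindex(columns=labels, fill_value=0.0)
    b = b.reindex(columns=labels, fill_value=0.0)
    # A flag is 1 if the label appears in either column; cap at 1 when both match
    return (a + b).clip(upper=1.0)

# Toy rows (assumed values, for illustration only)
toy = pd.DataFrame({
    'Condition1': ['Norm', 'Feedr', 'Artery'],
    'Condition2': ['Norm', 'Norm', 'RRAn'],
})

combined = combine_pair(toy, 'Condition1', 'Condition2')
print(combined)
```

A row whose two conditions differ ends up with two flags set; a row with the same condition twice ends up with one.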

In [526]:
# expand both columns and combine for collation
train_c1 = pd.DataFrame(data=(enc.fit_transform(train[['Condition1']])),columns=(enc.get_feature_names_out()))
train_c2 = pd.DataFrame(data=(enc.fit_transform(train[['Condition2']])),columns=(enc.get_feature_names_out()))
train_condition = pd.concat([train_c1,train_c2],axis=1)

In [527]:
# rename columns that did not appear in the second set of conditions
train_condition = train_condition.rename(columns={'Condition1_RRAe':'RRAe','Condition1_RRNe':'RRNe'})

# create columns for collating values
train_condition['Artery'] = 0.0
train_condition['Feedr'] = 0.0
train_condition['Norm'] = 0.0
train_condition['PosA'] = 0.0
train_condition['PosN'] = 0.0
train_condition['RRAn'] = 0.0
train_condition['RRNn'] = 0.0

In [528]:
# where either 'Artery' expansion is 1, fill master column with a 1
train_condition.loc[((train_condition[
    (train_condition['Condition1_Artery']==1)| 
    (train_condition['Condition2_Artery']==1)
].index).tolist()), 'Artery'] = 1

# where either 'Feedr' expansion is 1, fill master column with a 1
train_condition.loc[((train_condition[
    (train_condition['Condition1_Feedr']==1)|
    (train_condition['Condition2_Feedr']==1)
].index).tolist()), 'Feedr'] = 1

# where either 'Norm' expansion is 1, fill master column with a 1
train_condition.loc[((train_condition[
    (train_condition['Condition1_Norm']==1)|
    (train_condition['Condition2_Norm']==1)
].index).tolist()), 'Norm'] = 1

# where either 'PosA' expansion is 1, fill master column with a 1
train_condition.loc[((train_condition[
    (train_condition['Condition1_PosA']==1)|
    (train_condition['Condition2_PosA']==1)
].index).tolist()), 'PosA'] = 1

# where either 'PosN' expansion is 1, fill master column with a 1
train_condition.loc[((train_condition[
    (train_condition['Condition1_PosN']==1)|
    (train_condition['Condition2_PosN']==1)
].index).tolist()), 'PosN'] = 1

# where either 'RRAn' expansion is 1, fill master column with a 1
train_condition.loc[((train_condition[
    (train_condition['Condition1_RRAn']==1)|
    (train_condition['Condition2_RRAn']==1)
].index).tolist()), 'RRAn'] = 1

# where either 'RRNn' expansion is 1, fill master column with a 1
train_condition.loc[((train_condition[
    (train_condition['Condition1_RRNn']==1)|
    (train_condition['Condition2_RRNn']==1)
].index).tolist()), 'RRNn'] = 1

In [529]:
# merge the columns to keep back onto main dataframe
train = pd.concat([train.reset_index(),train_condition[['Artery','Feedr','Norm','RRNn','RRAn','PosN','PosA','RRNe','RRAe']]],axis=1)

And the test set:

In [530]:
# expand both columns and combine for collation
test_c1 = pd.DataFrame(data=(enc.fit_transform(test[['Condition1']])),columns=(enc.get_feature_names_out()))
test_c2 = pd.DataFrame(data=(enc.fit_transform(test[['Condition2']])),columns=(enc.get_feature_names_out()))
test_condition = pd.concat([test_c1,test_c2],axis=1)

In [531]:
# rename columns that did not appear in the second set of conditions
test_condition = test_condition.rename(columns={'Condition1_RRAe':'RRAe','Condition1_RRAn':'RRAn',
'Condition1_RRNe':'RRNe','Condition1_RRNn':'RRNn'})

# create columns for collating values
test_condition['Artery'] = 0.0
test_condition['Feedr'] = 0.0
test_condition['Norm'] = 0.0
test_condition['PosA'] = 0.0
test_condition['PosN'] = 0.0

In [532]:
# where either 'Artery' expansion is 1, fill master column with a 1
test_condition.loc[((test_condition[
    (test_condition['Condition1_Artery']==1)|
    (test_condition['Condition2_Artery']==1)
].index).tolist()), 'Artery'] = 1

# where either 'Feedr' expansion is 1, fill master column with a 1
test_condition.loc[((test_condition[
    (test_condition['Condition1_Feedr']==1)|
    (test_condition['Condition2_Feedr']==1)
].index).tolist()), 'Feedr'] = 1

# where either 'Norm' expansion is 1, fill master column with a 1
test_condition.loc[((test_condition[
    (test_condition['Condition1_Norm']==1)|
    (test_condition['Condition2_Norm']==1)
].index).tolist()), 'Norm'] = 1

# where either 'PosA' expansion is 1, fill master column with a 1
test_condition.loc[((test_condition[
    (test_condition['Condition1_PosA']==1)|
    (test_condition['Condition2_PosA']==1)
].index).tolist()), 'PosA'] = 1

# where either 'PosN' expansion is 1, fill master column with a 1
test_condition.loc[((test_condition[
    (test_condition['Condition1_PosN']==1)|
    (test_condition['Condition2_PosN']==1)
].index).tolist()), 'PosN'] = 1

In [533]:
# merge the columns to keep back onto main dataframe
test = pd.concat([test.reset_index(),test_condition[['Artery','Feedr','Norm','RRNn','RRAn','PosN','PosA','RRNe','RRAe']]],axis=1)

`Exterior1st` and `Exterior2nd` share the following value set:<p>

AsbShng - Asbestos Shingles<br>
AsphShn - Asphalt Shingles<br>
BrkComm - Brick Common<br>
BrkFace - Brick Face<br>
CBlock - Cinder Block<br>
CemntBd - Cement Board<br>
HdBoard - Hard Board<br>
ImStucc - Imitation Stucco<br>
MetalSd - Metal Siding<br>
Other - Other<br>
Plywood - Plywood<br>
PreCast - PreCast<br>
Stone - Stone<br>
Stucco - Stucco<br>
VinylSd - Vinyl Siding<br>
Wd Sdng - Wood Siding<br>
WdShing - Wood Shingles<br>
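
Judging by the column names used in the merge cells for this pair, `Exterior2nd` spells three labels differently from `Exterior1st` (`Brk Cmn`, `CmentBd`, `Wd Shng` versus `BrkComm`, `CemntBd`, `WdShing`). One way to avoid matching them by hand is to normalise the spellings before encoding. The sketch below assumes those three are the only mismatches; the dictionary name `exterior2nd_fixes` and the toy rows are hypothetical.

```python
import pandas as pd

# Exterior2nd spellings (keys) and the Exterior1st spellings they correspond to (values)
exterior2nd_fixes = {'Brk Cmn': 'BrkComm', 'CmentBd': 'CemntBd', 'Wd Shng': 'WdShing'}

# Toy rows (assumed values, for illustration only)
toy = pd.DataFrame({
    'Exterior1st': ['WdShing', 'CemntBd', 'VinylSd'],
    'Exterior2nd': ['Wd Shng', 'CmentBd', 'Wd Sdng'],
})

# replace() leaves labels that are not in the dictionary untouched
toy['Exterior2nd'] = toy['Exterior2nd'].replace(exterior2nd_fixes)
print(toy)
```

After this step both columns share one vocabulary, so the encoded column names line up label for label.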

In [534]:
train_ext1 = pd.DataFrame(data=(enc.fit_transform(train[['Exterior1st']])),columns=(enc.get_feature_names_out()))
train_ext2 = pd.DataFrame(data=(enc.fit_transform(train[['Exterior2nd']])),columns=(enc.get_feature_names_out()))
train_exterior = pd.concat([train_ext1,train_ext2],axis=1)

In [535]:
# rename the column that only appears in one dataframe
train_exterior = train_exterior.rename(columns={'Exterior2nd_Other':'Other'})

# create columns for collating values
train_exterior['AsbShng'] = 0.0
train_exterior['AsphShn'] = 0.0
train_exterior['BrkComm'] = 0.0
train_exterior['BrkFace'] = 0.0
train_exterior['CBlock'] = 0.0
train_exterior['CemntBd'] = 0.0
train_exterior['HdBoard'] = 0.0
train_exterior['ImStucc'] = 0.0
train_exterior['MetalSd'] = 0.0
train_exterior['Plywood'] = 0.0
train_exterior['PreCast'] = 0.0
train_exterior['Stone'] = 0.0
train_exterior['Stucco'] = 0.0
train_exterior['VinylSd'] = 0.0
train_exterior['Wd Sdng'] = 0.0
train_exterior['WdShing'] = 0.0

In [544]:
# where either 'AsbShng' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_AsbShng']==1)|
    (train_exterior['Exterior2nd_AsbShng']==1)
].index).tolist()), 'AsbShng'] = 1

# where either 'AsphShn' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_AsphShn']==1)|
    (train_exterior['Exterior2nd_AsphShn']==1)
].index).tolist()), 'AsphShn'] = 1

# where either 'BrkComm' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_BrkComm']==1)|
    (train_exterior['Exterior2nd_Brk Cmn']==1)
].index).tolist()), 'BrkComm'] = 1

# where either 'BrkFace' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_BrkFace']==1)|
    (train_exterior['Exterior2nd_BrkFace']==1)
].index).tolist()), 'BrkFace'] = 1

# where either 'CBlock' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_CBlock']==1)|
    (train_exterior['Exterior2nd_CBlock']==1)
].index).tolist()), 'CBlock'] = 1

# where either 'CemntBd' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_CemntBd']==1)|
    (train_exterior['Exterior2nd_CmentBd']==1)
].index).tolist()), 'CemntBd'] = 1

# where either 'HdBoard' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_HdBoard']==1)|
    (train_exterior['Exterior2nd_HdBoard']==1)
].index).tolist()), 'HdBoard'] = 1

# where either 'ImStucc' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_ImStucc']==1)|
    (train_exterior['Exterior2nd_ImStucc']==1)
].index).tolist()), 'ImStucc'] = 1

# where either 'MetalSd' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_MetalSd']==1)|
    (train_exterior['Exterior2nd_MetalSd']==1)
].index).tolist()), 'MetalSd'] = 1

# where either 'Plywood' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_Plywood']==1)|
    (train_exterior['Exterior2nd_Plywood']==1)
].index).tolist()), 'Plywood'] = 1

# where either 'Stone' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_Stone']==1)|
    (train_exterior['Exterior2nd_Stone']==1)
].index).tolist()), 'Stone'] = 1

# where either 'Stucco' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_Stucco']==1)|
    (train_exterior['Exterior2nd_Stucco']==1)
].index).tolist()), 'Stucco'] = 1

# where either 'VinylSd' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_VinylSd']==1)|
    (train_exterior['Exterior2nd_VinylSd']==1)
].index).tolist()), 'VinylSd'] = 1

# where either 'Wd Sdng' expansion is 1, fill master column with a 1
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_Wd Sdng']==1)|
    (train_exterior['Exterior2nd_Wd Sdng']==1)
].index).tolist()), 'Wd Sdng'] = 1

# where either 'WdShing' expansion is 1, fill master column with a 1
# (Exterior2nd spells this label 'Wd Shng')
train_exterior.loc[((train_exterior[
    (train_exterior['Exterior1st_WdShing']==1)|
    (train_exterior['Exterior2nd_Wd Shng']==1)
].index).tolist()), 'WdShing'] = 1

In [546]:
# merge the columns to keep back onto main dataframe
train = pd.concat([train,train_exterior[['AsbShng','AsphShn','BrkComm','BrkFace','CBlock','CemntBd','HdBoard','ImStucc',
'MetalSd','Other','Plywood','PreCast','Stone','Stucco','VinylSd','Wd Sdng','WdShing']]],axis=1)

The test set:

In [536]:
test_ext1 = pd.DataFrame(data=(enc.fit_transform(test[['Exterior1st']])),columns=(enc.get_feature_names_out()))
test_ext2 = pd.DataFrame(data=(enc.fit_transform(test[['Exterior2nd']])),columns=(enc.get_feature_names_out()))
test_exterior = pd.concat([test_ext1,test_ext2],axis=1)

In [547]:
# rename columns that only appear in one dataframe
test_exterior = test_exterior.rename(columns={'Exterior2nd_ImStucc':'ImStucc','Exterior2nd_Stone':'Stone'})

# create columns for collating values
test_exterior['AsbShng'] = 0.0
test_exterior['AsphShn'] = 0.0
test_exterior['BrkComm'] = 0.0
test_exterior['BrkFace'] = 0.0
test_exterior['CBlock'] = 0.0
test_exterior['CemntBd'] = 0.0
test_exterior['HdBoard'] = 0.0
test_exterior['MetalSd'] = 0.0
test_exterior['Other'] = 0.0
test_exterior['Plywood'] = 0.0
test_exterior['PreCast'] = 0.0
test_exterior['Stucco'] = 0.0
test_exterior['VinylSd'] = 0.0
test_exterior['Wd Sdng'] = 0.0
test_exterior['WdShing'] = 0.0

In [548]:
# where either 'AsbShng' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_AsbShng']==1)|
    (test_exterior['Exterior2nd_AsbShng']==1)
].index).tolist()), 'AsbShng'] = 1

# where either 'AsphShn' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_AsphShn']==1)|
    (test_exterior['Exterior2nd_AsphShn']==1)
].index).tolist()), 'AsphShn'] = 1

# where either 'BrkComm' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_BrkComm']==1)|
    (test_exterior['Exterior2nd_Brk Cmn']==1)
].index).tolist()), 'BrkComm'] = 1

# where either 'BrkFace' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_BrkFace']==1)|
    (test_exterior['Exterior2nd_BrkFace']==1)
].index).tolist()), 'BrkFace'] = 1

# where either 'CBlock' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_CBlock']==1)|
    (test_exterior['Exterior2nd_CBlock']==1)
].index).tolist()), 'CBlock'] = 1

# where either 'CemntBd' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_CemntBd']==1)|
    (test_exterior['Exterior2nd_CmentBd']==1)
].index).tolist()), 'CemntBd'] = 1

# where either 'HdBoard' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_HdBoard']==1)|
    (test_exterior['Exterior2nd_HdBoard']==1)
].index).tolist()), 'HdBoard'] = 1

# where either 'MetalSd' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_MetalSd']==1)|
    (test_exterior['Exterior2nd_MetalSd']==1)
].index).tolist()), 'MetalSd'] = 1

# where either 'Other' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_Other']==1)|
    (test_exterior['Exterior2nd_Other']==1)
].index).tolist()), 'Other'] = 1

# where either 'Plywood' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_Plywood']==1)|
    (test_exterior['Exterior2nd_Plywood']==1)
].index).tolist()), 'Plywood'] = 1

# where either 'Stucco' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_Stucco']==1)|
    (test_exterior['Exterior2nd_Stucco']==1)
].index).tolist()), 'Stucco'] = 1

# where either 'VinylSd' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_VinylSd']==1)|
    (test_exterior['Exterior2nd_VinylSd']==1)
].index).tolist()), 'VinylSd'] = 1

# where either 'Wd Sdng' expansion is 1, fill master column with a 1
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_Wd Sdng']==1)|
    (test_exterior['Exterior2nd_Wd Sdng']==1)
].index).tolist()), 'Wd Sdng'] = 1

# where either 'WdShing' expansion is 1, fill master column with a 1
# (Exterior2nd spells this label 'Wd Shng')
test_exterior.loc[((test_exterior[
    (test_exterior['Exterior1st_WdShing']==1)|
    (test_exterior['Exterior2nd_Wd Shng']==1)
].index).tolist()), 'WdShing'] = 1

In [549]:
# merge the columns to keep back onto main dataframe
test = pd.concat([test,test_exterior[['AsbShng','AsphShn','BrkComm','BrkFace','CBlock','CemntBd','HdBoard','ImStucc',
'MetalSd','Other','Plywood','PreCast','Stone','Stucco','VinylSd','Wd Sdng','WdShing']]],axis=1)

The `Foundation` column:

In [579]:
train_found = pd.DataFrame(data=(enc.fit_transform(train[['Foundation']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
train = pd.concat([train,train_found],axis=1)

In [580]:
test_found = pd.DataFrame(data=(enc.fit_transform(test[['Foundation']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
test = pd.concat([test,test_found],axis=1)

The `Heating` column:

In [581]:
train_heat = pd.DataFrame(data=(enc.fit_transform(train[['Heating']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
train = pd.concat([train,train_heat],axis=1)

In [582]:
test_heat = pd.DataFrame(data=(enc.fit_transform(test[['Heating']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
test = pd.concat([test,test_heat],axis=1)

The `Electrical` column:

In [583]:
train_elec = pd.DataFrame(data=(enc.fit_transform(train[['Electrical']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
train = pd.concat([train,train_elec],axis=1)

In [584]:
test_elec = pd.DataFrame(data=(enc.fit_transform(test[['Electrical']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
test = pd.concat([test,test_elec],axis=1)

The `RoofStyle` column:

In [590]:
train_RfStyl = pd.DataFrame(data=(enc.fit_transform(train[['RoofStyle']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
train = pd.concat([train,train_RfStyl],axis=1)

In [591]:
test_RfStyl = pd.DataFrame(data=(enc.fit_transform(test[['RoofStyle']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
test = pd.concat([test,test_RfStyl],axis=1)

The `RoofMatl` column:

In [None]:
train_RfMat = pd.DataFrame(data=(enc.fit_transform(train[['RoofMatl']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
train = pd.concat([train,train_RfMat],axis=1)

In [595]:
test_RfMat = pd.DataFrame(data=(enc.fit_transform(test[['RoofMatl']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
test = pd.concat([test,test_RfMat],axis=1)

The `MasVnrType` column:

In [599]:
train_vnr = pd.DataFrame(data=(enc.fit_transform(train[['MasVnrType']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
train = pd.concat([train,train_vnr],axis=1)

In [597]:
test_vnr = pd.DataFrame(data=(enc.fit_transform(test[['MasVnrType']])),columns=(enc.get_feature_names_out()))

# merge new columns back into main dataframe
test = pd.concat([test,test_vnr],axis=1)

The `MiscFeature` column:

In [None]:
train_misc = pd.DataFrame(data=(enc.fit_transform(train[['MiscFeature']])),columns=(enc.get_feature_names_out()))

# drop nan column
train_misc = train_misc.drop('MiscFeature_nan',axis=1)

# merge new columns back into main dataframe
train = pd.concat([train,train_misc],axis=1)

In [None]:
test_misc = pd.DataFrame(data=(enc.fit_transform(test[['MiscFeature']])),columns=(enc.get_feature_names_out()))

# drop nan column
test_misc = test_misc.drop('MiscFeature_nan',axis=1)

# merge new columns back into main dataframe
test = pd.concat([test,test_misc],axis=1)

**To drop:
'Condition1','Condition2', 'Exterior1st', 'Exterior2nd', 'MiscFeature','Foundation', 'Heating', 'Electrical', 'RoofStyle', 'RoofMatl', 'MasVnrType'**
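
A minimal sketch of how the listed columns could be removed once their encoded replacements are in place. The toy frame is made up, and `errors='ignore'` is used here so that listed columns absent from a frame are skipped rather than raising an error.

```python
import pandas as pd

to_drop = ['Condition1', 'Condition2', 'Exterior1st', 'Exterior2nd', 'MiscFeature',
           'Foundation', 'Heating', 'Electrical', 'RoofStyle', 'RoofMatl', 'MasVnrType']

# Toy frame (assumed values) holding two of the listed columns and one to keep
toy = pd.DataFrame({'Condition1': ['Norm'], 'Heating': ['GasA'], 'GrLivArea': [1710]})

# errors='ignore' skips listed columns that are not present in the frame
toy_reduced = toy.drop(columns=to_drop, errors='ignore')
print(list(toy_reduced.columns))
```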

### Combining Columns<p>
The total living space for each property can be calculated using the `GrLivArea` column, for the area above ground level, and the `TotalBsmtSF` column, for the area of the basement, if one exists.
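
If `TotalBsmtSF` were ever missing for a row, a plain sum would turn that row's total into `NaN`. Whether the cleaned data still contains such gaps is not shown here, so the sketch below is a precaution under that assumption, using a toy frame whose last row is invented.

```python
import pandas as pd

toy = pd.DataFrame({
    'GrLivArea': [1710, 896, 1200],
    'TotalBsmtSF': [856.0, 882.0, None],  # last row: basement area not recorded
})

# Treat a missing basement area as 0 so the total is not lost to NaN
toy['TtlLivSF'] = toy['GrLivArea'] + toy['TotalBsmtSF'].fillna(0)
print(toy)
```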

In [538]:
train['TtlLivSF'] = train['GrLivArea']+train['TotalBsmtSF']
test['TtlLivSF'] = test['GrLivArea']+test['TotalBsmtSF']

In [539]:
print('Train head:')
display(train[['GrLivArea', 'TotalBsmtSF', 'TtlLivSF']].head())
print('Test head:')
display(test[['GrLivArea', 'TotalBsmtSF', 'TtlLivSF']].head())

Train head:


| | GrLivArea | TotalBsmtSF | TtlLivSF |
|---|---|---|---|
| 0 | 1710 | 856 | 2566 |
| 1 | 1262 | 1262 | 2524 |
| 2 | 1786 | 920 | 2706 |
| 3 | 1717 | 756 | 2473 |
| 4 | 2198 | 1145 | 3343 |


Test head:


| | GrLivArea | TotalBsmtSF | TtlLivSF |
|---|---|---|---|
| 0 | 896 | 882.0 | 1778.0 |
| 1 | 1329 | 1329.0 | 2658.0 |
| 2 | 1629 | 928.0 | 2557.0 |
| 3 | 1604 | 926.0 | 2530.0 |
| 4 | 1280 | 1280.0 | 2560.0 |


The total number of bathrooms can also be calculated from the four columns that track bathroom counts. (The half-bath columns are multiplied by 0.5 so that each half bath adds half a bathroom to the total.)
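
The weighting can be written as a weighted row sum. The sketch below is an equivalent formulation rather than the notebook's own code: `bath_weights` is a hypothetical name, and the two toy rows use the values from the first two training rows shown in the output further down.

```python
import pandas as pd

# Weight per column: full baths count 1, half baths count 0.5
bath_weights = pd.Series({'BsmtFullBath': 1.0, 'BsmtHalfBath': 0.5,
                          'FullBath': 1.0, 'HalfBath': 0.5})

toy = pd.DataFrame({
    'BsmtFullBath': [1, 0],
    'BsmtHalfBath': [0, 1],
    'FullBath': [2, 2],
    'HalfBath': [1, 0],
})

# Multiply each column by its weight, then sum across the row
toy['TotalBath'] = toy[bath_weights.index].mul(bath_weights, axis=1).sum(axis=1)
print(toy)
```

For the first row this is 1 + 0 × 0.5 + 2 + 1 × 0.5 = 3.5.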

In [540]:
train['TotalBath'] = train['BsmtFullBath']+(train['BsmtHalfBath']*0.5)+train['FullBath']+(train['HalfBath']*0.5)
test['TotalBath'] = test['BsmtFullBath']+(test['BsmtHalfBath']*0.5)+test['FullBath']+(test['HalfBath']*0.5)

In [541]:
print('Train head:')
display(train[['BsmtFullBath', 'BsmtHalfBath', 'FullBath','HalfBath', 'TotalBath']].head())
print('Test head:')
display(test[['BsmtFullBath', 'BsmtHalfBath', 'FullBath','HalfBath', 'TotalBath']].head())

Train head:


| | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | TotalBath |
|---|---|---|---|---|---|
| 0 | 1 | 0 | 2 | 1 | 3.5 |
| 1 | 0 | 1 | 2 | 0 | 2.5 |
| 2 | 1 | 0 | 2 | 1 | 3.5 |
| 3 | 1 | 0 | 1 | 0 | 2.0 |
| 4 | 1 | 0 | 2 | 1 | 3.5 |


Test head:


| | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | TotalBath |
|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1 | 0 | 1.0 |
| 1 | 0.0 | 0.0 | 1 | 1 | 1.5 |
| 2 | 0.0 | 0.0 | 2 | 1 | 2.5 |
| 3 | 0.0 | 0.0 | 2 | 1 | 2.5 |
| 4 | 0.0 | 0.0 | 2 | 0 | 2.0 |
