# Avaliação 2

### Questão 2
Utilizando a base de dados de preços de imóveis disponível no kaggle (https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data), apresente duas soluções de aprendizagem de máquina que consigam baixas taxas de erro médio.
* Lembre de fazer todo o pré-processamento, explicando suas decisões;
* Teste 3 variações de parâmetros dos métodos;
* Avalie os resultados usando uma métrica

#### Pré-processamento

É dado início ao pré-processamento, importando as bibliotecas necessárias e instanciando os databases forneceidos através do pandas.

É importante entender que:
* __sample__ equivale à instância do database que contém as saídas (coluna *SalePrice*) corretas para os testes
* __tr__ equivale à instância do database que contém os treinos
* __ts__ equivale à instância do database que contém os testes

Vale lembrar que o database __sample__ refere-se ao __teste__, e que o __teste__ não pertence ao __treino__. Portanto será interessante unir os dois databases futuramente.

In [1]:
import pandas as pd
import numpy as np
import statistics as st
import matplotlib.pyplot as plt
import plotly.express as px
from sklearn.preprocessing import RobustScaler
from sklearn import preprocessing, cluster, neighbors, svm, metrics, tree
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split, cross_validate, cross_val_score
from sklearn.metrics import make_scorer, r2_score 
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

sample = pd.read_csv("data/qt2/sample_submission.csv",sep=",")
tr = pd.read_csv("data/qt2/train.csv",sep=",")
ts = pd.read_csv("data/qt2/test.csv",sep=",")

#### Avaliando a quantidade de itens em cada database

Para entender a quantidade de itens com o qual se está trabalhando, é feita uma leitura através do comando __*len()*__.

In [2]:
print("Database de treino:", len(tr),"x",len(tr.columns))
print("Database de testes:", len(ts),"x",len(ts.columns))
print("Database que contém as saídas corretas do teste:", len(sample),"x",len(sample.columns))

Database de treino: 1460 x 81
Database de testes: 1459 x 80
Database que contém as saídas corretas do teste: 1459 x 2


#### Unindo todos os databases

Para passarem pelo pré-processamento, é interessante que todos os dados estejam juntos. Desta forma o __gs__ será unido ao __sample__, que será, por sua vez, unido ao __tr__.

❗️ __Utilizaremos o termo *dfTotal* para se referir ao DataFrame completo, que inclui o teste e o treino.__ ❗️

In [3]:
dfTotal = pd.concat([tr,pd.concat([sample, ts.drop(columns=['Id'])], axis=1)])

#### Tamanho resultante total

Para conferir a quantidade total de elementos no database resultante.

In [4]:
print(len(dfTotal),"x",len(dfTotal.columns))

2919 x 81


#### Analisando quantos itens nulos há no database

É importante conhecer quantos itens nulos existem no database para entender qual deve ser a medida adequada a se tomar mediante a quantidade de itens nulos.

In [5]:
print("-->Nulidade:")
for i in dfTotal.columns:
    print(i,"\t:",dfTotal[i].isnull().sum())

-->Nulidade:
Id 	: 0
MSSubClass 	: 0
MSZoning 	: 4
LotFrontage 	: 486
LotArea 	: 0
Street 	: 0
Alley 	: 2721
LotShape 	: 0
LandContour 	: 0
Utilities 	: 2
LotConfig 	: 0
LandSlope 	: 0
Neighborhood 	: 0
Condition1 	: 0
Condition2 	: 0
BldgType 	: 0
HouseStyle 	: 0
OverallQual 	: 0
OverallCond 	: 0
YearBuilt 	: 0
YearRemodAdd 	: 0
RoofStyle 	: 0
RoofMatl 	: 0
Exterior1st 	: 1
Exterior2nd 	: 1
MasVnrType 	: 24
MasVnrArea 	: 23
ExterQual 	: 0
ExterCond 	: 0
Foundation 	: 0
BsmtQual 	: 81
BsmtCond 	: 82
BsmtExposure 	: 82
BsmtFinType1 	: 79
BsmtFinSF1 	: 1
BsmtFinType2 	: 80
BsmtFinSF2 	: 1
BsmtUnfSF 	: 1
TotalBsmtSF 	: 1
Heating 	: 0
HeatingQC 	: 0
CentralAir 	: 0
Electrical 	: 1
1stFlrSF 	: 0
2ndFlrSF 	: 0
LowQualFinSF 	: 0
GrLivArea 	: 0
BsmtFullBath 	: 2
BsmtHalfBath 	: 2
FullBath 	: 0
HalfBath 	: 0
BedroomAbvGr 	: 0
KitchenAbvGr 	: 0
KitchenQual 	: 1
TotRmsAbvGrd 	: 0
Functional 	: 2
Fireplaces 	: 0
FireplaceQu 	: 1420
GarageType 	: 157
GarageYrBlt 	: 159
GarageFinish 	: 159
GarageCar

⚠️ __Houve muitas tabelas com valores nulos, vale nesta ocasião mostrar uma listagem para tornar o trabalho aqui desenvolvido o mais didático possível.__ ⚠️

In [6]:
amount = 0
for i in dfTotal.columns:
    if(dfTotal[i].isnull().sum()>0): 
        amount += 1
        print(i,"\t:",dfTotal[i].isnull().sum())
print("Quantidade total de tabelas que apresentaram valores nulos:",amount)

MSZoning 	: 4
LotFrontage 	: 486
Alley 	: 2721
Utilities 	: 2
Exterior1st 	: 1
Exterior2nd 	: 1
MasVnrType 	: 24
MasVnrArea 	: 23
BsmtQual 	: 81
BsmtCond 	: 82
BsmtExposure 	: 82
BsmtFinType1 	: 79
BsmtFinSF1 	: 1
BsmtFinType2 	: 80
BsmtFinSF2 	: 1
BsmtUnfSF 	: 1
TotalBsmtSF 	: 1
Electrical 	: 1
BsmtFullBath 	: 2
BsmtHalfBath 	: 2
KitchenQual 	: 1
Functional 	: 2
FireplaceQu 	: 1420
GarageType 	: 157
GarageYrBlt 	: 159
GarageFinish 	: 159
GarageCars 	: 1
GarageArea 	: 1
GarageQual 	: 159
GarageCond 	: 159
PoolQC 	: 2909
Fence 	: 2348
MiscFeature 	: 2814
SaleType 	: 1
Quantidade total de tabelas que apresentaram valores nulos: 34


⚠️ __34 colunas apresentaram valores nulos. A essa altura é importante tratar de cada uma individualmente.__ ⚠️

#### Tratando as nulidades

Algumas colunas apresentaram uma quantidade muito alta de valores nulos. Entendeu-se, desta forma, como mais adequada a decisão de apagar essas colunas, uma vez que média, desvio padrão ou moda são valores que perdem significado dada uma quantidade tão alta de valores nulos. Sendo assim, __as colunas__:

* __MiscFeature__ (com 96% dos valores nulos)
* __PoolQC__ (com 99% dos valores nulos)
* __Fence__ (com 80% dos valores nulos)
* __FireplaceQu__ (com 48% dos valores nulos)
* __Alley__ (com 93% dos valores nulos)

__serão excluídas__.

As colunas que apresentam uma quantidade desprezível de dados faltantes:

* __MSZoning__
* __Utilities__
* __Exterior1st__
* __Exterior2nd__
* __BsmtFinSF1__
* __BsmtFinSF2__
* __BsmtUnfSF__
* __TotalBsmtSF__
* __Electrical__
* __BsmtFullBath__
* __BsmtHalfBath__
* __KitchenQual__
* __Functional__
* __GarageCars__
* __GarageArea__ 
* __SaleType__

__terão as linhas que apresenta esses dados faltantes excluídas__.

As demais colunas __serão preenchidas com a mediana (no caso de colunas numéricas) ou moda (no caso de colunas qualitativas) dos valores faltantes na devida coluna__:

* __LotFrontage__
* __MasVnrType__
* __MasVnrArea__
* __BsmtQual__
* __BsmtCond__
* __BsmtExposure__
* __BsmtFinType1__
* __BsmtFinType2__
* __GarageType__
* __GarageYrBlt__
* __GarageFinish__
* __GarageQual__
* __GarageCond__

##### Tratando colunas com alta quantidade de valores nulos

In [7]:
dfTotal = dfTotal.drop(columns=['MiscFeature','PoolQC','Fence','FireplaceQu','Alley'])
print(len(dfTotal),"x",len(dfTotal.columns))

2919 x 76


##### Tratando colunas com quantidade ínfima de valores nulos

In [8]:
dfTotal = dfTotal.dropna(subset=['MSZoning','Utilities','Exterior1st','Exterior2nd','BsmtFinSF1','BsmtFinSF2','BsmtUnfSF','TotalBsmtSF','Electrical','BsmtFullBath','BsmtHalfBath','KitchenQual','Functional','GarageCars','GarageArea','SaleType'])
print(len(dfTotal),"x",len(dfTotal.columns))

2906 x 76


##### Tratando demais colunas

In [9]:
dfTotal.fillna(dfTotal['LotFrontage'].median(),inplace=True)
print("LotFrontage\t:",dfTotal['LotFrontage'].isnull().sum())

LotFrontage	: 0


In [10]:
dfTotal.fillna(dfTotal['MasVnrType'].mode(),inplace=True)
print("MasVnrType\t:",dfTotal['MasVnrType'].isnull().sum())

MasVnrType	: 0


In [11]:
dfTotal.fillna(dfTotal['MasVnrArea'].median(),inplace=True)
print("MasVnrArea\t:",dfTotal['MasVnrArea'].isnull().sum())

MasVnrArea	: 0


In [12]:
dfTotal.fillna(dfTotal['BsmtQual'].mode(),inplace=True)
print("BsmtQual\t:",dfTotal['BsmtQual'].isnull().sum())

BsmtQual	: 0


In [13]:
dfTotal.fillna(dfTotal['BsmtCond'].mode(),inplace=True)
print("BsmtCond\t:",dfTotal['BsmtCond'].isnull().sum())

BsmtCond	: 0


In [14]:
dfTotal.fillna(dfTotal['BsmtExposure'].mode(),inplace=True)
print("BsmtExposure\t:",dfTotal['BsmtExposure'].isnull().sum())

BsmtExposure	: 0


In [15]:
dfTotal.fillna(dfTotal['BsmtFinType1'].mode(),inplace=True)
print("BsmtFinType1\t:",dfTotal['BsmtFinType1'].isnull().sum())

BsmtFinType1	: 0


In [16]:
dfTotal.fillna(dfTotal['BsmtFinType2'].mode(),inplace=True)
print("BsmtFinType2\t:",dfTotal['BsmtFinType2'].isnull().sum())

BsmtFinType2	: 0


In [17]:
dfTotal.fillna(dfTotal['GarageType'].mode(),inplace=True)
print("GarageType\t:",dfTotal['GarageType'].isnull().sum())

GarageType	: 0


In [18]:
dfTotal.fillna(dfTotal['GarageYrBlt'].median(),inplace=True)
print("GarageYrBlt\t:",dfTotal['GarageYrBlt'].isnull().sum())

GarageYrBlt	: 0


In [19]:
dfTotal.fillna(dfTotal['GarageFinish'].mode(),inplace=True)
print("GarageFinish\t:",dfTotal['GarageFinish'].isnull().sum())

GarageFinish	: 0


In [20]:
dfTotal.fillna(dfTotal['GarageQual'].mode(),inplace=True)
print("GarageQual\t:",dfTotal['GarageQual'].isnull().sum())

GarageQual	: 0


In [21]:
dfTotal.fillna(dfTotal['GarageCond'].mode(),inplace=True)
print("GarageCond\t:",dfTotal['GarageCond'].isnull().sum())

GarageCond	: 0


In [22]:
print(len(dfTotal),"x",len(dfTotal.columns))

2906 x 76


#### Conferindo quantidade de itens nulos no database

In [23]:
dfTotal.isnull().sum().sum()

0

✅ __Foi feita com sucesso a limpeza de valores nulos nas tabelas.__ ✅

❗️ __Entretanto é necessário um cuidado. Há muitas colunas, entretanto, com valores qualitativos. Muito embora essas colunas qualitativas não apresentem valores nulos após a limpeza, podem conter células que apresentem conteúdo que representam o "vazio", que deve ser realizada a interpretação se esse "vazio" é proposital com a lógica do database ou se é tem o mesmo significado lógico que um nulo. Para isso, como será feito adiante, será analizado adiante quais tabelas são qualitativas e se possuem valores que representam o "vazio" supracitado.__ ❗️

#### Analizando e tratando dados qualitativos

É importante, agora, fazer uma minuciosa análise nas colunas qualitativas. Os passos serão os seguintes:

* Analisar quais são as colunas qualitativas
* Individualmente analisar se todos os seus dados (através do comando *groupby*) são consistentes ou há valores que podem ser considerados nulos

Feito isso, é também interessante converter as classificações dessas colunas em valores numéricos, com o uso da função *replace* do python. Sendo assim, com todos os dados numéricos, será possível, de maneira mais natural, o uso de diferentes métodos de IA.

In [24]:
print("Colunas qualitativas: ")
amount = 0
for i in dfTotal.columns:
    if(dfTotal.dtypes[i]=='object'):
        amount += 1
        print(amount,"-\t",i)

Colunas qualitativas: 
1 -	 MSZoning
2 -	 Street
3 -	 LotShape
4 -	 LandContour
5 -	 Utilities
6 -	 LotConfig
7 -	 LandSlope
8 -	 Neighborhood
9 -	 Condition1
10 -	 Condition2
11 -	 BldgType
12 -	 HouseStyle
13 -	 RoofStyle
14 -	 RoofMatl
15 -	 Exterior1st
16 -	 Exterior2nd
17 -	 MasVnrType
18 -	 ExterQual
19 -	 ExterCond
20 -	 Foundation
21 -	 BsmtQual
22 -	 BsmtCond
23 -	 BsmtExposure
24 -	 BsmtFinType1
25 -	 BsmtFinType2
26 -	 Heating
27 -	 HeatingQC
28 -	 CentralAir
29 -	 Electrical
30 -	 KitchenQual
31 -	 Functional
32 -	 GarageType
33 -	 GarageFinish
34 -	 GarageQual
35 -	 GarageCond
36 -	 PavedDrive
37 -	 SaleType
38 -	 SaleCondition


##### Tratando MSZoning

In [25]:
dfTotal.groupby(['MSZoning']).count()

Unnamed: 0_level_0,Id,MSSubClass,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
MSZoning,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
C (all),25,25,25,25,25,25,25,25,25,25,...,25,25,25,25,25,25,25,25,25,25
FV,139,139,139,139,139,139,139,139,139,139,...,139,139,139,139,139,139,139,139,139,139
RH,26,26,26,26,26,26,26,26,26,26,...,26,26,26,26,26,26,26,26,26,26
RL,2259,2259,2259,2259,2259,2259,2259,2259,2259,2259,...,2259,2259,2259,2259,2259,2259,2259,2259,2259,2259
RM,457,457,457,457,457,457,457,457,457,457,...,457,457,457,457,457,457,457,457,457,457


✅ __A coluna *MSZoning* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __C (all)__ será substituído por 0
* __FV__ será substituído por 1
* __RH__ será substituído por 2
* __RL__ será substituído por 3
* __RM__ será substituído por 4

In [26]:
dfTotal['MSZoning'] = dfTotal['MSZoning'].replace('C (all)',0)
dfTotal['MSZoning'] = dfTotal['MSZoning'].replace('FV',1)
dfTotal['MSZoning'] = dfTotal['MSZoning'].replace('RH',2)
dfTotal['MSZoning'] = dfTotal['MSZoning'].replace('RL',3)
dfTotal['MSZoning'] = dfTotal['MSZoning'].replace('RM',4)

dfTotal.groupby(['MSZoning']).count()

Unnamed: 0_level_0,Id,MSSubClass,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
MSZoning,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,25,25,25,25,25,25,25,25,25,25,...,25,25,25,25,25,25,25,25,25,25
1,139,139,139,139,139,139,139,139,139,139,...,139,139,139,139,139,139,139,139,139,139
2,26,26,26,26,26,26,26,26,26,26,...,26,26,26,26,26,26,26,26,26,26
3,2259,2259,2259,2259,2259,2259,2259,2259,2259,2259,...,2259,2259,2259,2259,2259,2259,2259,2259,2259,2259
4,457,457,457,457,457,457,457,457,457,457,...,457,457,457,457,457,457,457,457,457,457


##### Tratando Street

In [27]:
dfTotal.groupby(['Street']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,LotShape,LandContour,Utilities,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Street,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Grvl,11,11,11,11,11,11,11,11,11,11,...,11,11,11,11,11,11,11,11,11,11
Pave,2895,2895,2895,2895,2895,2895,2895,2895,2895,2895,...,2895,2895,2895,2895,2895,2895,2895,2895,2895,2895


✅ __A coluna *Street* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Grvl__ será substituído por 0
* __Pave__ será substituído por 1

In [28]:
dfTotal['Street'] = dfTotal['Street'].replace('Grvl',0)
dfTotal['Street'] = dfTotal['Street'].replace('Pave',1)

dfTotal.groupby(['Street']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,LotShape,LandContour,Utilities,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Street,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,11,11,11,11,11,11,11,11,11,11,...,11,11,11,11,11,11,11,11,11,11
1,2895,2895,2895,2895,2895,2895,2895,2895,2895,2895,...,2895,2895,2895,2895,2895,2895,2895,2895,2895,2895


##### Tratando LotShape

In [29]:
dfTotal.groupby(['LotShape']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LandContour,Utilities,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
LotShape,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
IR1,963,963,963,963,963,963,963,963,963,963,...,963,963,963,963,963,963,963,963,963,963
IR2,76,76,76,76,76,76,76,76,76,76,...,76,76,76,76,76,76,76,76,76,76
IR3,16,16,16,16,16,16,16,16,16,16,...,16,16,16,16,16,16,16,16,16,16
Reg,1851,1851,1851,1851,1851,1851,1851,1851,1851,1851,...,1851,1851,1851,1851,1851,1851,1851,1851,1851,1851


✅ __A coluna *LotShape* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __IR1__ será substituído por 0
* __IR2__ será substituído por 1
* __IR3__ será substituído por 2
* __Reg__ será substituído por 3

In [30]:
dfTotal['LotShape'] = dfTotal['LotShape'].replace('IR1',0)
dfTotal['LotShape'] = dfTotal['LotShape'].replace('IR2',1)
dfTotal['LotShape'] = dfTotal['LotShape'].replace('IR3',2)
dfTotal['LotShape'] = dfTotal['LotShape'].replace('Reg',3)

dfTotal.groupby(['LotShape']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LandContour,Utilities,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
LotShape,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,963,963,963,963,963,963,963,963,963,963,...,963,963,963,963,963,963,963,963,963,963
1,76,76,76,76,76,76,76,76,76,76,...,76,76,76,76,76,76,76,76,76,76
2,16,16,16,16,16,16,16,16,16,16,...,16,16,16,16,16,16,16,16,16,16
3,1851,1851,1851,1851,1851,1851,1851,1851,1851,1851,...,1851,1851,1851,1851,1851,1851,1851,1851,1851,1851


##### Tratando LandContour

In [31]:
dfTotal.groupby(['LandContour']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,Utilities,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
LandContour,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Bnk,116,116,116,116,116,116,116,116,116,116,...,116,116,116,116,116,116,116,116,116,116
HLS,120,120,120,120,120,120,120,120,120,120,...,120,120,120,120,120,120,120,120,120,120
Low,58,58,58,58,58,58,58,58,58,58,...,58,58,58,58,58,58,58,58,58,58
Lvl,2612,2612,2612,2612,2612,2612,2612,2612,2612,2612,...,2612,2612,2612,2612,2612,2612,2612,2612,2612,2612


✅ __A coluna *LandContour* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Bnk__ será substituído por 0
* __HLS__ será substituído por 1
* __Low__ será substituído por 2
* __Lvl__ será substituído por 3

In [32]:
dfTotal['LandContour'] = dfTotal['LandContour'].replace('Bnk',0)
dfTotal['LandContour'] = dfTotal['LandContour'].replace('HLS',1)
dfTotal['LandContour'] = dfTotal['LandContour'].replace('Low',2)
dfTotal['LandContour'] = dfTotal['LandContour'].replace('Lvl',3)

dfTotal.groupby(['LandContour']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,Utilities,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
LandContour,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,116,116,116,116,116,116,116,116,116,116,...,116,116,116,116,116,116,116,116,116,116
1,120,120,120,120,120,120,120,120,120,120,...,120,120,120,120,120,120,120,120,120,120
2,58,58,58,58,58,58,58,58,58,58,...,58,58,58,58,58,58,58,58,58,58
3,2612,2612,2612,2612,2612,2612,2612,2612,2612,2612,...,2612,2612,2612,2612,2612,2612,2612,2612,2612,2612


##### Tratando Utilities

In [33]:
dfTotal.groupby(['Utilities']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Utilities,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
AllPub,2905,2905,2905,2905,2905,2905,2905,2905,2905,2905,...,2905,2905,2905,2905,2905,2905,2905,2905,2905,2905
NoSeWa,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1


✅ __A coluna *Utilities* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __AllPub__ será substituído por 0
* __NoSeWa__ será substituído por 1

In [34]:
dfTotal['Utilities'] = dfTotal['Utilities'].replace('AllPub',0)
dfTotal['Utilities'] = dfTotal['Utilities'].replace('NoSeWa',1)

dfTotal.groupby(['Utilities']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Utilities,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,2905,2905,2905,2905,2905,2905,2905,2905,2905,2905,...,2905,2905,2905,2905,2905,2905,2905,2905,2905,2905
1,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1


##### Tratando LotConfig

In [35]:
dfTotal.groupby(['LotConfig']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
LotConfig,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Corner,509,509,509,509,509,509,509,509,509,509,...,509,509,509,509,509,509,509,509,509,509
CulDSac,176,176,176,176,176,176,176,176,176,176,...,176,176,176,176,176,176,176,176,176,176
FR2,84,84,84,84,84,84,84,84,84,84,...,84,84,84,84,84,84,84,84,84,84
FR3,13,13,13,13,13,13,13,13,13,13,...,13,13,13,13,13,13,13,13,13,13
Inside,2124,2124,2124,2124,2124,2124,2124,2124,2124,2124,...,2124,2124,2124,2124,2124,2124,2124,2124,2124,2124


✅ __A coluna *LotConfig* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Corner__ será substituído por 0
* __CulDSac__ será substituído por 1
* __FR2__ será substituído por 2
* __FR3__ será substituído por 3
* __Inside__ será substituído por 4

In [36]:
dfTotal['LotConfig'] = dfTotal['LotConfig'].replace('Corner',0)
dfTotal['LotConfig'] = dfTotal['LotConfig'].replace('CulDSac',1)
dfTotal['LotConfig'] = dfTotal['LotConfig'].replace('FR2',2)
dfTotal['LotConfig'] = dfTotal['LotConfig'].replace('FR3',3)
dfTotal['LotConfig'] = dfTotal['LotConfig'].replace('Inside',4)

dfTotal.groupby(['LotConfig']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
LotConfig,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,509,509,509,509,509,509,509,509,509,509,...,509,509,509,509,509,509,509,509,509,509
1,176,176,176,176,176,176,176,176,176,176,...,176,176,176,176,176,176,176,176,176,176
2,84,84,84,84,84,84,84,84,84,84,...,84,84,84,84,84,84,84,84,84,84
3,13,13,13,13,13,13,13,13,13,13,...,13,13,13,13,13,13,13,13,13,13
4,2124,2124,2124,2124,2124,2124,2124,2124,2124,2124,...,2124,2124,2124,2124,2124,2124,2124,2124,2124,2124


##### Tratando LandSlope

In [37]:
dfTotal.groupby(['LandSlope']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
LandSlope,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Gtl,2766,2766,2766,2766,2766,2766,2766,2766,2766,2766,...,2766,2766,2766,2766,2766,2766,2766,2766,2766,2766
Mod,124,124,124,124,124,124,124,124,124,124,...,124,124,124,124,124,124,124,124,124,124
Sev,16,16,16,16,16,16,16,16,16,16,...,16,16,16,16,16,16,16,16,16,16


✅ __A coluna *LandSlope* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Gtl__ será substituído por 0
* __Mod__ será substituído por 1
* __Sev__ será substituído por 2

In [38]:
dfTotal['LandSlope'] = dfTotal['LandSlope'].replace('Gtl',0)
dfTotal['LandSlope'] = dfTotal['LandSlope'].replace('Mod',1)
dfTotal['LandSlope'] = dfTotal['LandSlope'].replace('Sev',2)

dfTotal.groupby(['LandSlope']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
LandSlope,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,2766,2766,2766,2766,2766,2766,2766,2766,2766,2766,...,2766,2766,2766,2766,2766,2766,2766,2766,2766,2766
1,124,124,124,124,124,124,124,124,124,124,...,124,124,124,124,124,124,124,124,124,124
2,16,16,16,16,16,16,16,16,16,16,...,16,16,16,16,16,16,16,16,16,16


##### Tratando Neighborhood

In [39]:
dfTotal.groupby(['Neighborhood']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Blmngtn,28,28,28,28,28,28,28,28,28,28,...,28,28,28,28,28,28,28,28,28,28
Blueste,10,10,10,10,10,10,10,10,10,10,...,10,10,10,10,10,10,10,10,10,10
BrDale,30,30,30,30,30,30,30,30,30,30,...,30,30,30,30,30,30,30,30,30,30
BrkSide,107,107,107,107,107,107,107,107,107,107,...,107,107,107,107,107,107,107,107,107,107
ClearCr,43,43,43,43,43,43,43,43,43,43,...,43,43,43,43,43,43,43,43,43,43
CollgCr,267,267,267,267,267,267,267,267,267,267,...,267,267,267,267,267,267,267,267,267,267
Crawfor,103,103,103,103,103,103,103,103,103,103,...,103,103,103,103,103,103,103,103,103,103
Edwards,192,192,192,192,192,192,192,192,192,192,...,192,192,192,192,192,192,192,192,192,192
Gilbert,164,164,164,164,164,164,164,164,164,164,...,164,164,164,164,164,164,164,164,164,164
IDOTRR,88,88,88,88,88,88,88,88,88,88,...,88,88,88,88,88,88,88,88,88,88


✅ __A coluna *Neighborhood* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Blmngtn__ será substituído por 0
* __Blueste__ será substituído por 1
* __BrDale__ será substituído por 2
* __BrkSide__ será substituído por 3
* __ClearCr__ será substituído por 4
* __CollgCr__ será substituído por 5
* __Crawfor__ será substituído por 6
* __Edwards__ será substituído por 7
* __Gilbert__ será substituído por 8
* __IDOTRR__ será substituído por 9
* __MeadowV__ será substituído por 10
* __Mitchel__ será substituído por 11
* __NAmes__ será substituído por 12
* __NPkVill__ será substituído por 13
* __NWAmes__ será substituído por 14
* __NoRidge__ será substituído por 15
* __NridgHt__ será substituído por 16
* __OldTown__ será substituído por 17
* __SWISU__ será substituído por 18
* __Sawyer__ será substituído por 19
* __SawyerW__ será substituído por 20
* __Somerst__ será substituído por 21
* __StoneBr__ será substituído por 22
* __Timber__ será substituído por 23
* __Veenker__ será substituído por 24

In [40]:
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('Blmngtn',0)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('Blueste',1)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('BrDale',2)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('BrkSide',3)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('ClearCr',4)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('CollgCr',5)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('Crawfor',6)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('Edwards',7)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('Gilbert',8)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('IDOTRR',9)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('MeadowV',10)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('Mitchel',11)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('NAmes',12)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('NPkVill',13)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('NWAmes',14)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('NoRidge',15)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('NridgHt',16)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('OldTown',17)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('SWISU',18)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('Sawyer',19)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('SawyerW',20)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('Somerst',21)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('StoneBr',22)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('Timber',23)
dfTotal['Neighborhood'] = dfTotal['Neighborhood'].replace('Veenker',24)

dfTotal.groupby(['Neighborhood']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,28,28,28,28,28,28,28,28,28,28,...,28,28,28,28,28,28,28,28,28,28
1,10,10,10,10,10,10,10,10,10,10,...,10,10,10,10,10,10,10,10,10,10
2,30,30,30,30,30,30,30,30,30,30,...,30,30,30,30,30,30,30,30,30,30
3,107,107,107,107,107,107,107,107,107,107,...,107,107,107,107,107,107,107,107,107,107
4,43,43,43,43,43,43,43,43,43,43,...,43,43,43,43,43,43,43,43,43,43
5,267,267,267,267,267,267,267,267,267,267,...,267,267,267,267,267,267,267,267,267,267
6,103,103,103,103,103,103,103,103,103,103,...,103,103,103,103,103,103,103,103,103,103
7,192,192,192,192,192,192,192,192,192,192,...,192,192,192,192,192,192,192,192,192,192
8,164,164,164,164,164,164,164,164,164,164,...,164,164,164,164,164,164,164,164,164,164
9,88,88,88,88,88,88,88,88,88,88,...,88,88,88,88,88,88,88,88,88,88


##### Tratando Condition1

In [41]:
dfTotal.groupby(['Condition1']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Condition1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Artery,90,90,90,90,90,90,90,90,90,90,...,90,90,90,90,90,90,90,90,90,90
Feedr,161,161,161,161,161,161,161,161,161,161,...,161,161,161,161,161,161,161,161,161,161
Norm,2503,2503,2503,2503,2503,2503,2503,2503,2503,2503,...,2503,2503,2503,2503,2503,2503,2503,2503,2503,2503
PosA,20,20,20,20,20,20,20,20,20,20,...,20,20,20,20,20,20,20,20,20,20
PosN,39,39,39,39,39,39,39,39,39,39,...,39,39,39,39,39,39,39,39,39,39
RRAe,28,28,28,28,28,28,28,28,28,28,...,28,28,28,28,28,28,28,28,28,28
RRAn,50,50,50,50,50,50,50,50,50,50,...,50,50,50,50,50,50,50,50,50,50
RRNe,6,6,6,6,6,6,6,6,6,6,...,6,6,6,6,6,6,6,6,6,6
RRNn,9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9


✅ __A coluna *Condition1* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Artery__ será substituído por 0
* __Feedr__ será substituído por 1
* __Norm__ será substituído por 2
* __PosA__ será substituído por 3
* __PosN__ será substituído por 4
* __RRAe__ será substituído por 5
* __RRAn__ será substituído por 6
* __RRNe__ será substituído por 7
* __RRNn__ será substituído por 8

In [42]:
dfTotal['Condition1'] = dfTotal['Condition1'].replace('Artery',0)
dfTotal['Condition1'] = dfTotal['Condition1'].replace('Feedr',1)
dfTotal['Condition1'] = dfTotal['Condition1'].replace('Norm',2)
dfTotal['Condition1'] = dfTotal['Condition1'].replace('PosA',3)
dfTotal['Condition1'] = dfTotal['Condition1'].replace('PosN',4)
dfTotal['Condition1'] = dfTotal['Condition1'].replace('RRAe',5)
dfTotal['Condition1'] = dfTotal['Condition1'].replace('RRAn',6)
dfTotal['Condition1'] = dfTotal['Condition1'].replace('RRNe',7)
dfTotal['Condition1'] = dfTotal['Condition1'].replace('RRNn',8)

dfTotal.groupby(['Condition1']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Condition1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,90,90,90,90,90,90,90,90,90,90,...,90,90,90,90,90,90,90,90,90,90
1,161,161,161,161,161,161,161,161,161,161,...,161,161,161,161,161,161,161,161,161,161
2,2503,2503,2503,2503,2503,2503,2503,2503,2503,2503,...,2503,2503,2503,2503,2503,2503,2503,2503,2503,2503
3,20,20,20,20,20,20,20,20,20,20,...,20,20,20,20,20,20,20,20,20,20
4,39,39,39,39,39,39,39,39,39,39,...,39,39,39,39,39,39,39,39,39,39
5,28,28,28,28,28,28,28,28,28,28,...,28,28,28,28,28,28,28,28,28,28
6,50,50,50,50,50,50,50,50,50,50,...,50,50,50,50,50,50,50,50,50,50
7,6,6,6,6,6,6,6,6,6,6,...,6,6,6,6,6,6,6,6,6,6
8,9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9


##### Tratando Condition2

In [43]:
dfTotal.groupby(['Condition2']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Condition2,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Artery,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5
Feedr,13,13,13,13,13,13,13,13,13,13,...,13,13,13,13,13,13,13,13,13,13
Norm,2876,2876,2876,2876,2876,2876,2876,2876,2876,2876,...,2876,2876,2876,2876,2876,2876,2876,2876,2876,2876
PosA,4,4,4,4,4,4,4,4,4,4,...,4,4,4,4,4,4,4,4,4,4
PosN,4,4,4,4,4,4,4,4,4,4,...,4,4,4,4,4,4,4,4,4,4
RRAe,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
RRAn,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
RRNn,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2


✅ __A coluna *Condition2* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Artery__ será substituído por 0
* __Feedr__ será substituído por 1
* __Norm__ será substituído por 2
* __PosA__ será substituído por 3
* __PosN__ será substituído por 4
* __RRAe__ será substituído por 5
* __RRAn__ será substituído por 6
* __RRNn__ será substituído por 7

In [44]:
dfTotal['Condition2'] = dfTotal['Condition2'].replace('Artery',0)
dfTotal['Condition2'] = dfTotal['Condition2'].replace('Feedr',1)
dfTotal['Condition2'] = dfTotal['Condition2'].replace('Norm',2)
dfTotal['Condition2'] = dfTotal['Condition2'].replace('PosA',3)
dfTotal['Condition2'] = dfTotal['Condition2'].replace('PosN',4)
dfTotal['Condition2'] = dfTotal['Condition2'].replace('RRAe',5)
dfTotal['Condition2'] = dfTotal['Condition2'].replace('RRAn',6)
dfTotal['Condition2'] = dfTotal['Condition2'].replace('RRNn',7)

dfTotal.groupby(['Condition2']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Condition2,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5
1,13,13,13,13,13,13,13,13,13,13,...,13,13,13,13,13,13,13,13,13,13
2,2876,2876,2876,2876,2876,2876,2876,2876,2876,2876,...,2876,2876,2876,2876,2876,2876,2876,2876,2876,2876
3,4,4,4,4,4,4,4,4,4,4,...,4,4,4,4,4,4,4,4,4,4
4,4,4,4,4,4,4,4,4,4,4,...,4,4,4,4,4,4,4,4,4,4
5,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
6,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
7,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2


##### Tratando BldgType

In [45]:
dfTotal.groupby(['BldgType']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BldgType,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1Fam,2412,2412,2412,2412,2412,2412,2412,2412,2412,2412,...,2412,2412,2412,2412,2412,2412,2412,2412,2412,2412
2fmCon,62,62,62,62,62,62,62,62,62,62,...,62,62,62,62,62,62,62,62,62,62
Duplex,109,109,109,109,109,109,109,109,109,109,...,109,109,109,109,109,109,109,109,109,109
Twnhs,96,96,96,96,96,96,96,96,96,96,...,96,96,96,96,96,96,96,96,96,96
TwnhsE,227,227,227,227,227,227,227,227,227,227,...,227,227,227,227,227,227,227,227,227,227


✅ __A coluna *BldgType* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __1Fam__ será substituído por 0
* __2fmCon__ será substituído por 1
* __Duplex__ será substituído por 2
* __Twnhs__ será substituído por 3
* __TwnhsE__ será substituído por 4

In [46]:
dfTotal['BldgType'] = dfTotal['BldgType'].replace('1Fam',0)
dfTotal['BldgType'] = dfTotal['BldgType'].replace('2fmCon',1)
dfTotal['BldgType'] = dfTotal['BldgType'].replace('Duplex',2)
dfTotal['BldgType'] = dfTotal['BldgType'].replace('Twnhs',3)
dfTotal['BldgType'] = dfTotal['BldgType'].replace('TwnhsE',4)

dfTotal.groupby(['BldgType']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BldgType,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,2412,2412,2412,2412,2412,2412,2412,2412,2412,2412,...,2412,2412,2412,2412,2412,2412,2412,2412,2412,2412
1,62,62,62,62,62,62,62,62,62,62,...,62,62,62,62,62,62,62,62,62,62
2,109,109,109,109,109,109,109,109,109,109,...,109,109,109,109,109,109,109,109,109,109
3,96,96,96,96,96,96,96,96,96,96,...,96,96,96,96,96,96,96,96,96,96
4,227,227,227,227,227,227,227,227,227,227,...,227,227,227,227,227,227,227,227,227,227


##### Tratando HouseStyle

In [47]:
dfTotal.groupby(['HouseStyle']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
HouseStyle,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1.5Fin,312,312,312,312,312,312,312,312,312,312,...,312,312,312,312,312,312,312,312,312,312
1.5Unf,19,19,19,19,19,19,19,19,19,19,...,19,19,19,19,19,19,19,19,19,19
1Story,1463,1463,1463,1463,1463,1463,1463,1463,1463,1463,...,1463,1463,1463,1463,1463,1463,1463,1463,1463,1463
2.5Fin,8,8,8,8,8,8,8,8,8,8,...,8,8,8,8,8,8,8,8,8,8
2.5Unf,23,23,23,23,23,23,23,23,23,23,...,23,23,23,23,23,23,23,23,23,23
2Story,871,871,871,871,871,871,871,871,871,871,...,871,871,871,871,871,871,871,871,871,871
SFoyer,83,83,83,83,83,83,83,83,83,83,...,83,83,83,83,83,83,83,83,83,83
SLvl,127,127,127,127,127,127,127,127,127,127,...,127,127,127,127,127,127,127,127,127,127


✅ __A coluna *HouseStyle* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __1.5Fin__ será substituído por 0
* __1.5Unf__ será substituído por 1
* __1Story__ será substituído por 2
* __2.5Fin__ será substituído por 3
* __2.5Unf__ será substituído por 4
* __2Story__ será substituído por 5
* __SFoyer__ será substituído por 6
* __SLvl__ será substituído por 7

In [48]:
dfTotal['HouseStyle'] = dfTotal['HouseStyle'].replace('1.5Fin',0)
dfTotal['HouseStyle'] = dfTotal['HouseStyle'].replace('1.5Unf',1)
dfTotal['HouseStyle'] = dfTotal['HouseStyle'].replace('1Story',2)
dfTotal['HouseStyle'] = dfTotal['HouseStyle'].replace('2.5Fin',3)
dfTotal['HouseStyle'] = dfTotal['HouseStyle'].replace('2.5Unf',4)
dfTotal['HouseStyle'] = dfTotal['HouseStyle'].replace('2Story',5)
dfTotal['HouseStyle'] = dfTotal['HouseStyle'].replace('SFoyer',6)
dfTotal['HouseStyle'] = dfTotal['HouseStyle'].replace('SLvl',7)

dfTotal.groupby(['HouseStyle']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
HouseStyle,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,312,312,312,312,312,312,312,312,312,312,...,312,312,312,312,312,312,312,312,312,312
1,19,19,19,19,19,19,19,19,19,19,...,19,19,19,19,19,19,19,19,19,19
2,1463,1463,1463,1463,1463,1463,1463,1463,1463,1463,...,1463,1463,1463,1463,1463,1463,1463,1463,1463,1463
3,8,8,8,8,8,8,8,8,8,8,...,8,8,8,8,8,8,8,8,8,8
4,23,23,23,23,23,23,23,23,23,23,...,23,23,23,23,23,23,23,23,23,23
5,871,871,871,871,871,871,871,871,871,871,...,871,871,871,871,871,871,871,871,871,871
6,83,83,83,83,83,83,83,83,83,83,...,83,83,83,83,83,83,83,83,83,83
7,127,127,127,127,127,127,127,127,127,127,...,127,127,127,127,127,127,127,127,127,127


##### Tratando RoofStyle

In [49]:
dfTotal.groupby(['RoofStyle']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
RoofStyle,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Flat,19,19,19,19,19,19,19,19,19,19,...,19,19,19,19,19,19,19,19,19,19
Gable,2300,2300,2300,2300,2300,2300,2300,2300,2300,2300,...,2300,2300,2300,2300,2300,2300,2300,2300,2300,2300
Gambrel,22,22,22,22,22,22,22,22,22,22,...,22,22,22,22,22,22,22,22,22,22
Hip,549,549,549,549,549,549,549,549,549,549,...,549,549,549,549,549,549,549,549,549,549
Mansard,11,11,11,11,11,11,11,11,11,11,...,11,11,11,11,11,11,11,11,11,11
Shed,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5


✅ __A coluna *RoofStyle* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Flat__ será substituído por 0
* __Gable__ será substituído por 1
* __Gambrel__ será substituído por 2
* __Hip__ será substituído por 3
* __Mansard__ será substituído por 4
* __Shed__ será substituído por 5

In [50]:
dfTotal['RoofStyle'] = dfTotal['RoofStyle'].replace('Flat',0)
dfTotal['RoofStyle'] = dfTotal['RoofStyle'].replace('Gable',1)
dfTotal['RoofStyle'] = dfTotal['RoofStyle'].replace('Gambrel',2)
dfTotal['RoofStyle'] = dfTotal['RoofStyle'].replace('Hip',3)
dfTotal['RoofStyle'] = dfTotal['RoofStyle'].replace('Mansard',4)
dfTotal['RoofStyle'] = dfTotal['RoofStyle'].replace('Shed',5)

dfTotal.groupby(['RoofStyle']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
RoofStyle,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,19,19,19,19,19,19,19,19,19,19,...,19,19,19,19,19,19,19,19,19,19
1,2300,2300,2300,2300,2300,2300,2300,2300,2300,2300,...,2300,2300,2300,2300,2300,2300,2300,2300,2300,2300
2,22,22,22,22,22,22,22,22,22,22,...,22,22,22,22,22,22,22,22,22,22
3,549,549,549,549,549,549,549,549,549,549,...,549,549,549,549,549,549,549,549,549,549
4,11,11,11,11,11,11,11,11,11,11,...,11,11,11,11,11,11,11,11,11,11
5,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5


##### Tratando RoofMatl

In [51]:
dfTotal.groupby(['RoofMatl']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
RoofMatl,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
ClyTile,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
CompShg,2864,2864,2864,2864,2864,2864,2864,2864,2864,2864,...,2864,2864,2864,2864,2864,2864,2864,2864,2864,2864
Membran,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
Metal,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
Roll,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
Tar&Grv,22,22,22,22,22,22,22,22,22,22,...,22,22,22,22,22,22,22,22,22,22
WdShake,9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9
WdShngl,7,7,7,7,7,7,7,7,7,7,...,7,7,7,7,7,7,7,7,7,7


✅ __A coluna *RoofMatl* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __ClyTile__ será substituído por 0
* __CompShp__ será substituído por 1
* __Membran__ será substituído por 2
* __Metal__ será substituído por 3
* __Roll__ será substituído por 4
* __Tar&Grv__ será substituído por 5
* __WdShake__ será substituído por 6
* __WdShngl__ será substituído por 7

In [52]:
dfTotal['RoofMatl'] = dfTotal['RoofMatl'].replace('ClyTile',0)
dfTotal['RoofMatl'] = dfTotal['RoofMatl'].replace('CompShp',1)
dfTotal['RoofMatl'] = dfTotal['RoofMatl'].replace('Membran',2)
dfTotal['RoofMatl'] = dfTotal['RoofMatl'].replace('Metal',3)
dfTotal['RoofMatl'] = dfTotal['RoofMatl'].replace('Roll',4)
dfTotal['RoofMatl'] = dfTotal['RoofMatl'].replace('Tar&Grv',5)
dfTotal['RoofMatl'] = dfTotal['RoofMatl'].replace('WdShake',6)
dfTotal['RoofMatl'] = dfTotal['RoofMatl'].replace('WdShngl',7)

dfTotal.groupby(['RoofMatl']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
RoofMatl,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
2,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
3,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
4,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
5,22,22,22,22,22,22,22,22,22,22,...,22,22,22,22,22,22,22,22,22,22
6,9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9
7,7,7,7,7,7,7,7,7,7,7,...,7,7,7,7,7,7,7,7,7,7
CompShg,2864,2864,2864,2864,2864,2864,2864,2864,2864,2864,...,2864,2864,2864,2864,2864,2864,2864,2864,2864,2864


##### Tratando Exterior1st

In [53]:
dfTotal.groupby(['Exterior1st']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Exterior1st,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
AsbShng,43,43,43,43,43,43,43,43,43,43,...,43,43,43,43,43,43,43,43,43,43
AsphShn,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
BrkComm,6,6,6,6,6,6,6,6,6,6,...,6,6,6,6,6,6,6,6,6,6
BrkFace,86,86,86,86,86,86,86,86,86,86,...,86,86,86,86,86,86,86,86,86,86
CBlock,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
CemntBd,126,126,126,126,126,126,126,126,126,126,...,126,126,126,126,126,126,126,126,126,126
HdBoard,442,442,442,442,442,442,442,442,442,442,...,442,442,442,442,442,442,442,442,442,442
ImStucc,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
MetalSd,449,449,449,449,449,449,449,449,449,449,...,449,449,449,449,449,449,449,449,449,449
Plywood,219,219,219,219,219,219,219,219,219,219,...,219,219,219,219,219,219,219,219,219,219


✅ __A coluna *Exterior1st* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __AsbShng__ será substituído por 0
* __AsphShn__ será substituído por 1
* __BrkComm__ será substituído por 2
* __BrkFace__ será substituído por 3
* __CBlock__ será substituído por 4
* __CemntBd__ será substituído por 5
* __HdBoard__ será substituído por 6
* __ImStucc__ será substituído por 7
* __MetalSd__ será substituído por 8
* __Plywood__ será substituído por 9
* __Stone__ será substituído por 10
* __Stucco__ será substituído por 11
* __VinylSd__ será substituído por 12
* __Wd Sdng__ será substituído por 13
* __WdShing__ será substituído por 14

In [54]:
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('AsbShng',0)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('AsphShn',1)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('BrkComm',2)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('BrkFace',3)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('CBlock',4)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('CemntBd',5)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('HdBoard',6)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('ImStucc',7)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('MetalSd',8)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('Plywood',9)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('Stone',10)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('Stucco',11)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('VinylSd',12)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('Wd Sdng',13)
dfTotal['Exterior1st'] = dfTotal['Exterior1st'].replace('WdShing',14)

dfTotal.groupby(['Exterior1st']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Exterior1st,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,43,43,43,43,43,43,43,43,43,43,...,43,43,43,43,43,43,43,43,43,43
1,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
2,6,6,6,6,6,6,6,6,6,6,...,6,6,6,6,6,6,6,6,6,6
3,86,86,86,86,86,86,86,86,86,86,...,86,86,86,86,86,86,86,86,86,86
4,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
5,126,126,126,126,126,126,126,126,126,126,...,126,126,126,126,126,126,126,126,126,126
6,442,442,442,442,442,442,442,442,442,442,...,442,442,442,442,442,442,442,442,442,442
7,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
8,449,449,449,449,449,449,449,449,449,449,...,449,449,449,449,449,449,449,449,449,449
9,219,219,219,219,219,219,219,219,219,219,...,219,219,219,219,219,219,219,219,219,219


##### Tratando Exterior2nd

In [55]:
dfTotal.groupby(['Exterior2nd']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Exterior2nd,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
AsbShng,38,38,38,38,38,38,38,38,38,38,...,38,38,38,38,38,38,38,38,38,38
AsphShn,4,4,4,4,4,4,4,4,4,4,...,4,4,4,4,4,4,4,4,4,4
Brk Cmn,22,22,22,22,22,22,22,22,22,22,...,22,22,22,22,22,22,22,22,22,22
BrkFace,46,46,46,46,46,46,46,46,46,46,...,46,46,46,46,46,46,46,46,46,46
CBlock,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
CmentBd,126,126,126,126,126,126,126,126,126,126,...,126,126,126,126,126,126,126,126,126,126
HdBoard,406,406,406,406,406,406,406,406,406,406,...,406,406,406,406,406,406,406,406,406,406
ImStucc,15,15,15,15,15,15,15,15,15,15,...,15,15,15,15,15,15,15,15,15,15
MetalSd,447,447,447,447,447,447,447,447,447,447,...,447,447,447,447,447,447,447,447,447,447
Other,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1


✅ __A coluna *Exterior2nd* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __AsbShng__ será substituído por 0
* __AsphShn__ será substituído por 1
* __Brk Cmn__ será substituído por 2
* __BrkFace__ será substituído por 3
* __CBlock__ será substituído por 4
* __CmentBd__ será substituído por 5
* __HdBoard__ será substituído por 6
* __ImStucc__ será substituído por 7
* __MetalSd__ será substituído por 8
* __Plywood__ será substituído por 9
* __Stone__ será substituído por 10
* __Stucco__ será substituído por 11
* __VinylSd__ será substituído por 12
* __Wd Sdng__ será substituído por 13
* __Wd Shng__ será substituído por 14
* __Other__ será substituído por 15

In [56]:
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('AsbShng',0)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('AsphShn',1)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('Brk Cmn',2)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('BrkFace',3)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('CBlock',4)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('CmentBd',5)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('HdBoard',6)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('ImStucc',7)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('MetalSd',8)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('Plywood',9)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('Stone',10)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('Stucco',11)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('VinylSd',12)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('Wd Sdng',13)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('Wd Shng',14)
dfTotal['Exterior2nd'] = dfTotal['Exterior2nd'].replace('Other',15)

dfTotal.groupby(['Exterior2nd']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Exterior2nd,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,38,38,38,38,38,38,38,38,38,38,...,38,38,38,38,38,38,38,38,38,38
1,4,4,4,4,4,4,4,4,4,4,...,4,4,4,4,4,4,4,4,4,4
2,22,22,22,22,22,22,22,22,22,22,...,22,22,22,22,22,22,22,22,22,22
3,46,46,46,46,46,46,46,46,46,46,...,46,46,46,46,46,46,46,46,46,46
4,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
5,126,126,126,126,126,126,126,126,126,126,...,126,126,126,126,126,126,126,126,126,126
6,406,406,406,406,406,406,406,406,406,406,...,406,406,406,406,406,406,406,406,406,406
7,15,15,15,15,15,15,15,15,15,15,...,15,15,15,15,15,15,15,15,15,15
8,447,447,447,447,447,447,447,447,447,447,...,447,447,447,447,447,447,447,447,447,447
9,267,267,267,267,267,267,267,267,267,267,...,267,267,267,267,267,267,267,267,267,267


##### Tratando MasVnrType

In [57]:
dfTotal.groupby(['MasVnrType']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
MasVnrType,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
68.0,24,24,24,24,24,24,24,24,24,24,...,24,24,24,24,24,24,24,24,24,24
BrkCmn,25,25,25,25,25,25,25,25,25,25,...,25,25,25,25,25,25,25,25,25,25
BrkFace,878,878,878,878,878,878,878,878,878,878,...,878,878,878,878,878,878,878,878,878,878
,1730,1730,1730,1730,1730,1730,1730,1730,1730,1730,...,1730,1730,1730,1730,1730,1730,1730,1730,1730,1730
Stone,249,249,249,249,249,249,249,249,249,249,...,249,249,249,249,249,249,249,249,249,249


✅ __A coluna *MasVnrType* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

❗️ __Vale lembrar que a coluna *None* é prevista no database e tem um significado, logo ela não deve ser tratada como nula.__ ❗️ 

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __68.0__ será substituído por 0
* __BrkCmn__ será substituído por 1
* __BrkFace__ será substituído por 2
* __None__ será substituído por 3
* __Stone__ será substituído por 4

In [81]:
dfTotal['MasVnrType'] = dfTotal['MasVnrType'].replace(68.0,0)
dfTotal['MasVnrType'] = dfTotal['MasVnrType'].replace('BrkCmn',1)
dfTotal['MasVnrType'] = dfTotal['MasVnrType'].replace('BrkFace',2)
dfTotal['MasVnrType'] = dfTotal['MasVnrType'].replace('None',3)
dfTotal['MasVnrType'] = dfTotal['MasVnrType'].replace('Stone',4)

dfTotal.groupby(['MasVnrType']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
MasVnrType,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0.0,24,24,24,24,24,24,24,24,24,24,...,24,24,24,24,24,24,24,24,24,24
1.0,25,25,25,25,25,25,25,25,25,25,...,25,25,25,25,25,25,25,25,25,25
2.0,878,878,878,878,878,878,878,878,878,878,...,878,878,878,878,878,878,878,878,878,878
3.0,1730,1730,1730,1730,1730,1730,1730,1730,1730,1730,...,1730,1730,1730,1730,1730,1730,1730,1730,1730,1730
4.0,249,249,249,249,249,249,249,249,249,249,...,249,249,249,249,249,249,249,249,249,249


##### Tratando ExterQual

In [58]:
dfTotal.groupby(['ExterQual']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
ExterQual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Ex,107,107,107,107,107,107,107,107,107,107,...,107,107,107,107,107,107,107,107,107,107
Fa,32,32,32,32,32,32,32,32,32,32,...,32,32,32,32,32,32,32,32,32,32
Gd,979,979,979,979,979,979,979,979,979,979,...,979,979,979,979,979,979,979,979,979,979
TA,1788,1788,1788,1788,1788,1788,1788,1788,1788,1788,...,1788,1788,1788,1788,1788,1788,1788,1788,1788,1788


✅ __A coluna *ExterQual* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Ex__ será substituído por 0
* __Fa__ será substituído por 1
* __Gd__ será substituído por 2
* __TA__ será substituído por 3

In [82]:
dfTotal['ExterQual'] = dfTotal['ExterQual'].replace('Ex',0)
dfTotal['ExterQual'] = dfTotal['ExterQual'].replace('Fa',1)
dfTotal['ExterQual'] = dfTotal['ExterQual'].replace('Gd',2)
dfTotal['ExterQual'] = dfTotal['ExterQual'].replace('TA',3)

dfTotal.groupby(['ExterQual']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
ExterQual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,107,107,107,107,107,107,107,107,107,107,...,107,107,107,107,107,107,107,107,107,107
1,32,32,32,32,32,32,32,32,32,32,...,32,32,32,32,32,32,32,32,32,32
2,979,979,979,979,979,979,979,979,979,979,...,979,979,979,979,979,979,979,979,979,979
3,1788,1788,1788,1788,1788,1788,1788,1788,1788,1788,...,1788,1788,1788,1788,1788,1788,1788,1788,1788,1788


##### Tratando ExterCond

In [59]:
dfTotal.groupby(['ExterCond']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
ExterCond,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Ex,12,12,12,12,12,12,12,12,12,12,...,12,12,12,12,12,12,12,12,12,12
Fa,64,64,64,64,64,64,64,64,64,64,...,64,64,64,64,64,64,64,64,64,64
Gd,299,299,299,299,299,299,299,299,299,299,...,299,299,299,299,299,299,299,299,299,299
Po,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
TA,2529,2529,2529,2529,2529,2529,2529,2529,2529,2529,...,2529,2529,2529,2529,2529,2529,2529,2529,2529,2529


✅ __A coluna *ExterCond* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Ex__ será substituído por 0
* __Fa__ será substituído por 1
* __Gd__ será substituído por 2
* __TA__ será substituído por 3
* __Po__ será substituído por 4

In [83]:
dfTotal['ExterCond'] = dfTotal['ExterCond'].replace('Ex',0)
dfTotal['ExterCond'] = dfTotal['ExterCond'].replace('Fa',1)
dfTotal['ExterCond'] = dfTotal['ExterCond'].replace('Gd',2)
dfTotal['ExterCond'] = dfTotal['ExterCond'].replace('TA',3)
dfTotal['ExterCond'] = dfTotal['ExterCond'].replace('Po',4)

dfTotal.groupby(['ExterCond']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
ExterCond,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,12,12,12,12,12,12,12,12,12,12,...,12,12,12,12,12,12,12,12,12,12
1,64,64,64,64,64,64,64,64,64,64,...,64,64,64,64,64,64,64,64,64,64
2,299,299,299,299,299,299,299,299,299,299,...,299,299,299,299,299,299,299,299,299,299
3,2529,2529,2529,2529,2529,2529,2529,2529,2529,2529,...,2529,2529,2529,2529,2529,2529,2529,2529,2529,2529
4,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2


##### Tratando Foundation

In [60]:
dfTotal.groupby(['Foundation']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Foundation,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
BrkTil,308,308,308,308,308,308,308,308,308,308,...,308,308,308,308,308,308,308,308,308,308
CBlock,1230,1230,1230,1230,1230,1230,1230,1230,1230,1230,...,1230,1230,1230,1230,1230,1230,1230,1230,1230,1230
PConc,1305,1305,1305,1305,1305,1305,1305,1305,1305,1305,...,1305,1305,1305,1305,1305,1305,1305,1305,1305,1305
Slab,47,47,47,47,47,47,47,47,47,47,...,47,47,47,47,47,47,47,47,47,47
Stone,11,11,11,11,11,11,11,11,11,11,...,11,11,11,11,11,11,11,11,11,11
Wood,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5


✅ __A coluna *ExterCond* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __BrkTil__ será substituído por 0
* __CBlock__ será substituído por 1
* __PConc__ será substituído por 2
* __Slab__ será substituído por 3
* __Stone__ será substituído por 4
* __Wood__ será substituído por 5

In [85]:
dfTotal['Foundation'] = dfTotal['Foundation'].replace('BrkTil',0)
dfTotal['Foundation'] = dfTotal['Foundation'].replace('CBlock',1)
dfTotal['Foundation'] = dfTotal['Foundation'].replace('PConc',2)
dfTotal['Foundation'] = dfTotal['Foundation'].replace('Slab',3)
dfTotal['Foundation'] = dfTotal['Foundation'].replace('Stone',4)
dfTotal['Foundation'] = dfTotal['Foundation'].replace('Wood',5)

dfTotal.groupby(['Foundation']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Foundation,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,308,308,308,308,308,308,308,308,308,308,...,308,308,308,308,308,308,308,308,308,308
1,1230,1230,1230,1230,1230,1230,1230,1230,1230,1230,...,1230,1230,1230,1230,1230,1230,1230,1230,1230,1230
2,1305,1305,1305,1305,1305,1305,1305,1305,1305,1305,...,1305,1305,1305,1305,1305,1305,1305,1305,1305,1305
3,47,47,47,47,47,47,47,47,47,47,...,47,47,47,47,47,47,47,47,47,47
4,11,11,11,11,11,11,11,11,11,11,...,11,11,11,11,11,11,11,11,11,11
5,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5


##### Tratando BsmtQual

In [61]:
dfTotal.groupby(['BsmtQual']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BsmtQual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
68.0,76,76,76,76,76,76,76,76,76,76,...,76,76,76,76,76,76,76,76,76,76
Ex,258,258,258,258,258,258,258,258,258,258,...,258,258,258,258,258,258,258,258,258,258
Fa,88,88,88,88,88,88,88,88,88,88,...,88,88,88,88,88,88,88,88,88,88
Gd,1206,1206,1206,1206,1206,1206,1206,1206,1206,1206,...,1206,1206,1206,1206,1206,1206,1206,1206,1206,1206
TA,1278,1278,1278,1278,1278,1278,1278,1278,1278,1278,...,1278,1278,1278,1278,1278,1278,1278,1278,1278,1278


✅ __A coluna *BsmtQual* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __68.0__ será substituído por 0
* __Ex__ será substituído por 1
* __Fa__ será substituído por 2
* __Gd__ será substituído por 3
* __TA__ será substituído por 4

In [86]:
dfTotal['BsmtQual'] = dfTotal['BsmtQual'].replace(68.0,0)
dfTotal['BsmtQual'] = dfTotal['BsmtQual'].replace('Ex',1)
dfTotal['BsmtQual'] = dfTotal['BsmtQual'].replace('Fa',2)
dfTotal['BsmtQual'] = dfTotal['BsmtQual'].replace('Gd',3)
dfTotal['BsmtQual'] = dfTotal['BsmtQual'].replace('TA',4)

dfTotal.groupby(['BsmtQual']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BsmtQual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,76,76,76,76,76,76,76,76,76,76,...,76,76,76,76,76,76,76,76,76,76
1,258,258,258,258,258,258,258,258,258,258,...,258,258,258,258,258,258,258,258,258,258
2,88,88,88,88,88,88,88,88,88,88,...,88,88,88,88,88,88,88,88,88,88
3,1206,1206,1206,1206,1206,1206,1206,1206,1206,1206,...,1206,1206,1206,1206,1206,1206,1206,1206,1206,1206
4,1278,1278,1278,1278,1278,1278,1278,1278,1278,1278,...,1278,1278,1278,1278,1278,1278,1278,1278,1278,1278


##### Tratando BsmtCond

In [62]:
dfTotal.groupby(['BsmtCond']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BsmtCond,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
68.0,77,77,77,77,77,77,77,77,77,77,...,77,77,77,77,77,77,77,77,77,77
Fa,102,102,102,102,102,102,102,102,102,102,...,102,102,102,102,102,102,102,102,102,102
Gd,122,122,122,122,122,122,122,122,122,122,...,122,122,122,122,122,122,122,122,122,122
Po,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5
TA,2600,2600,2600,2600,2600,2600,2600,2600,2600,2600,...,2600,2600,2600,2600,2600,2600,2600,2600,2600,2600


✅ __A coluna *BsmtCond* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __68.0__ será substituído por 0
* __Fa__ será substituído por 1
* __Gd__ será substituído por 2
* __Po__ será substituído por 3
* __TA__ será substituído por 4

In [87]:
dfTotal['BsmtCond'] = dfTotal['BsmtCond'].replace(68.0,0)
dfTotal['BsmtCond'] = dfTotal['BsmtCond'].replace('Fa',1)
dfTotal['BsmtCond'] = dfTotal['BsmtCond'].replace('Gd',2)
dfTotal['BsmtCond'] = dfTotal['BsmtCond'].replace('Po',3)
dfTotal['BsmtCond'] = dfTotal['BsmtCond'].replace('TA',4)

dfTotal.groupby(['BsmtCond']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BsmtCond,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,77,77,77,77,77,77,77,77,77,77,...,77,77,77,77,77,77,77,77,77,77
1,102,102,102,102,102,102,102,102,102,102,...,102,102,102,102,102,102,102,102,102,102
2,122,122,122,122,122,122,122,122,122,122,...,122,122,122,122,122,122,122,122,122,122
3,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5
4,2600,2600,2600,2600,2600,2600,2600,2600,2600,2600,...,2600,2600,2600,2600,2600,2600,2600,2600,2600,2600


##### Tratando BsmtExposure

In [63]:
dfTotal.groupby(['BsmtExposure']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BsmtExposure,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
68.0,77,77,77,77,77,77,77,77,77,77,...,77,77,77,77,77,77,77,77,77,77
Av,418,418,418,418,418,418,418,418,418,418,...,418,418,418,418,418,418,418,418,418,418
Gd,275,275,275,275,275,275,275,275,275,275,...,275,275,275,275,275,275,275,275,275,275
Mn,238,238,238,238,238,238,238,238,238,238,...,238,238,238,238,238,238,238,238,238,238
No,1898,1898,1898,1898,1898,1898,1898,1898,1898,1898,...,1898,1898,1898,1898,1898,1898,1898,1898,1898,1898


✅ __A coluna *BsmtExposure* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __68.0__ será substituído por 0
* __Av__ será substituído por 1
* __Gd__ será substituído por 2
* __Mn__ será substituído por 3
* __No__ será substituído por 4

In [88]:
dfTotal['BsmtExposure'] = dfTotal['BsmtExposure'].replace(68.0,0)
dfTotal['BsmtExposure'] = dfTotal['BsmtExposure'].replace('Av',1)
dfTotal['BsmtExposure'] = dfTotal['BsmtExposure'].replace('Gd',2)
dfTotal['BsmtExposure'] = dfTotal['BsmtExposure'].replace('Mn',3)
dfTotal['BsmtExposure'] = dfTotal['BsmtExposure'].replace('No',4)

dfTotal.groupby(['BsmtExposure']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BsmtExposure,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,77,77,77,77,77,77,77,77,77,77,...,77,77,77,77,77,77,77,77,77,77
1,418,418,418,418,418,418,418,418,418,418,...,418,418,418,418,418,418,418,418,418,418
2,275,275,275,275,275,275,275,275,275,275,...,275,275,275,275,275,275,275,275,275,275
3,238,238,238,238,238,238,238,238,238,238,...,238,238,238,238,238,238,238,238,238,238
4,1898,1898,1898,1898,1898,1898,1898,1898,1898,1898,...,1898,1898,1898,1898,1898,1898,1898,1898,1898,1898


##### Tratando BsmtFinType1

In [64]:
dfTotal.groupby(['BsmtFinType1']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BsmtFinType1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
68.0,74,74,74,74,74,74,74,74,74,74,...,74,74,74,74,74,74,74,74,74,74
ALQ,427,427,427,427,427,427,427,427,427,427,...,427,427,427,427,427,427,427,427,427,427
BLQ,269,269,269,269,269,269,269,269,269,269,...,269,269,269,269,269,269,269,269,269,269
GLQ,849,849,849,849,849,849,849,849,849,849,...,849,849,849,849,849,849,849,849,849,849
LwQ,154,154,154,154,154,154,154,154,154,154,...,154,154,154,154,154,154,154,154,154,154
Rec,287,287,287,287,287,287,287,287,287,287,...,287,287,287,287,287,287,287,287,287,287
Unf,846,846,846,846,846,846,846,846,846,846,...,846,846,846,846,846,846,846,846,846,846


✅ __A coluna *BsmtFinType1* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __68.0__ será substituído por 0
* __ALQ__ será substituído por 1
* __BLQ__ será substituído por 2
* __GLQ__ será substituído por 3
* __LwQ__ será substituído por 4
* __Rec__ será substituído por 5
* __Unf__ será substituído por 6

In [89]:
dfTotal['BsmtFinType1'] = dfTotal['BsmtFinType1'].replace(68.0,0)
dfTotal['BsmtFinType1'] = dfTotal['BsmtFinType1'].replace('ALQ',1)
dfTotal['BsmtFinType1'] = dfTotal['BsmtFinType1'].replace('BLQ',2)
dfTotal['BsmtFinType1'] = dfTotal['BsmtFinType1'].replace('GLQ',3)
dfTotal['BsmtFinType1'] = dfTotal['BsmtFinType1'].replace('LwQ',4)
dfTotal['BsmtFinType1'] = dfTotal['BsmtFinType1'].replace('Rec',5)
dfTotal['BsmtFinType1'] = dfTotal['BsmtFinType1'].replace('Unf',6)

dfTotal.groupby(['BsmtFinType1']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BsmtFinType1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,74,74,74,74,74,74,74,74,74,74,...,74,74,74,74,74,74,74,74,74,74
1,427,427,427,427,427,427,427,427,427,427,...,427,427,427,427,427,427,427,427,427,427
2,269,269,269,269,269,269,269,269,269,269,...,269,269,269,269,269,269,269,269,269,269
3,849,849,849,849,849,849,849,849,849,849,...,849,849,849,849,849,849,849,849,849,849
4,154,154,154,154,154,154,154,154,154,154,...,154,154,154,154,154,154,154,154,154,154
5,287,287,287,287,287,287,287,287,287,287,...,287,287,287,287,287,287,287,287,287,287
6,846,846,846,846,846,846,846,846,846,846,...,846,846,846,846,846,846,846,846,846,846


##### Tratando BsmtFinType2

In [65]:
dfTotal.groupby(['BsmtFinType2']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BsmtFinType2,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
68.0,75,75,75,75,75,75,75,75,75,75,...,75,75,75,75,75,75,75,75,75,75
ALQ,52,52,52,52,52,52,52,52,52,52,...,52,52,52,52,52,52,52,52,52,52
BLQ,67,67,67,67,67,67,67,67,67,67,...,67,67,67,67,67,67,67,67,67,67
GLQ,34,34,34,34,34,34,34,34,34,34,...,34,34,34,34,34,34,34,34,34,34
LwQ,87,87,87,87,87,87,87,87,87,87,...,87,87,87,87,87,87,87,87,87,87
Rec,105,105,105,105,105,105,105,105,105,105,...,105,105,105,105,105,105,105,105,105,105
Unf,2486,2486,2486,2486,2486,2486,2486,2486,2486,2486,...,2486,2486,2486,2486,2486,2486,2486,2486,2486,2486


✅ __A coluna *BsmtFinType2* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __68.0__ será substituído por 0
* __ALQ__ será substituído por 1
* __BLQ__ será substituído por 2
* __GLQ__ será substituído por 3
* __LwQ__ será substituído por 4
* __Rec__ será substituído por 5
* __Unf__ será substituído por 6

In [90]:
dfTotal['BsmtFinType2'] = dfTotal['BsmtFinType2'].replace(68.0,0)
dfTotal['BsmtFinType2'] = dfTotal['BsmtFinType2'].replace('ALQ',1)
dfTotal['BsmtFinType2'] = dfTotal['BsmtFinType2'].replace('BLQ',2)
dfTotal['BsmtFinType2'] = dfTotal['BsmtFinType2'].replace('GLQ',3)
dfTotal['BsmtFinType2'] = dfTotal['BsmtFinType2'].replace('LwQ',4)
dfTotal['BsmtFinType2'] = dfTotal['BsmtFinType2'].replace('Rec',5)
dfTotal['BsmtFinType2'] = dfTotal['BsmtFinType2'].replace('Unf',6)

dfTotal.groupby(['BsmtFinType2']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
BsmtFinType2,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,75,75,75,75,75,75,75,75,75,75,...,75,75,75,75,75,75,75,75,75,75
1,52,52,52,52,52,52,52,52,52,52,...,52,52,52,52,52,52,52,52,52,52
2,67,67,67,67,67,67,67,67,67,67,...,67,67,67,67,67,67,67,67,67,67
3,34,34,34,34,34,34,34,34,34,34,...,34,34,34,34,34,34,34,34,34,34
4,87,87,87,87,87,87,87,87,87,87,...,87,87,87,87,87,87,87,87,87,87
5,105,105,105,105,105,105,105,105,105,105,...,105,105,105,105,105,105,105,105,105,105
6,2486,2486,2486,2486,2486,2486,2486,2486,2486,2486,...,2486,2486,2486,2486,2486,2486,2486,2486,2486,2486


##### Tratando Heating

In [66]:
dfTotal.groupby(['Heating']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Heating,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Floor,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
GasA,2862,2862,2862,2862,2862,2862,2862,2862,2862,2862,...,2862,2862,2862,2862,2862,2862,2862,2862,2862,2862
GasW,27,27,27,27,27,27,27,27,27,27,...,27,27,27,27,27,27,27,27,27,27
Grav,9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9
OthW,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
Wall,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5


✅ __A coluna *Heating* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Floor__ será substituído por 0
* __GasA__ será substituído por 1
* __GasW__ será substituído por 2
* __Grav__ será substituído por 3
* __OthW__ será substituído por 4
* __Wall__ será substituído por 5

In [91]:
dfTotal['Heating'] = dfTotal['Heating'].replace('Floor',0)
dfTotal['Heating'] = dfTotal['Heating'].replace('GasA',1)
dfTotal['Heating'] = dfTotal['Heating'].replace('GasW',2)
dfTotal['Heating'] = dfTotal['Heating'].replace('Grav',3)
dfTotal['Heating'] = dfTotal['Heating'].replace('OthW',4)
dfTotal['Heating'] = dfTotal['Heating'].replace('Wall',5)

dfTotal.groupby(['Heating']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Heating,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
1,2862,2862,2862,2862,2862,2862,2862,2862,2862,2862,...,2862,2862,2862,2862,2862,2862,2862,2862,2862,2862
2,27,27,27,27,27,27,27,27,27,27,...,27,27,27,27,27,27,27,27,27,27
3,9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9
4,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
5,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5


##### Tratando HeatingQC

In [67]:
dfTotal.groupby(['HeatingQC']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
HeatingQC,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Ex,1490,1490,1490,1490,1490,1490,1490,1490,1490,1490,...,1490,1490,1490,1490,1490,1490,1490,1490,1490,1490
Fa,91,91,91,91,91,91,91,91,91,91,...,91,91,91,91,91,91,91,91,91,91
Gd,472,472,472,472,472,472,472,472,472,472,...,472,472,472,472,472,472,472,472,472,472
Po,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
TA,851,851,851,851,851,851,851,851,851,851,...,851,851,851,851,851,851,851,851,851,851


✅ __A coluna *HeatingQC* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Ex__ será substituído por 0
* __Fa__ será substituído por 1
* __Gd__ será substituído por 2
* __Po__ será substituído por 3
* __TA__ será substituído por 4

In [92]:
dfTotal['HeatingQC'] = dfTotal['HeatingQC'].replace('Ex',0)
dfTotal['HeatingQC'] = dfTotal['HeatingQC'].replace('Fa',1)
dfTotal['HeatingQC'] = dfTotal['HeatingQC'].replace('Gd',2)
dfTotal['HeatingQC'] = dfTotal['HeatingQC'].replace('Po',3)
dfTotal['HeatingQC'] = dfTotal['HeatingQC'].replace('TA',4)

dfTotal.groupby(['HeatingQC']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
HeatingQC,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,1490,1490,1490,1490,1490,1490,1490,1490,1490,1490,...,1490,1490,1490,1490,1490,1490,1490,1490,1490,1490
1,91,91,91,91,91,91,91,91,91,91,...,91,91,91,91,91,91,91,91,91,91
2,472,472,472,472,472,472,472,472,472,472,...,472,472,472,472,472,472,472,472,472,472
3,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
4,851,851,851,851,851,851,851,851,851,851,...,851,851,851,851,851,851,851,851,851,851


##### Tratando CentralAir

In [68]:
dfTotal.groupby(['CentralAir']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
CentralAir,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
N,193,193,193,193,193,193,193,193,193,193,...,193,193,193,193,193,193,193,193,193,193
Y,2713,2713,2713,2713,2713,2713,2713,2713,2713,2713,...,2713,2713,2713,2713,2713,2713,2713,2713,2713,2713


✅ __A coluna *CentralAir* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __N__ será substituído por 0
* __Y__ será substituído por 1

In [93]:
dfTotal['CentralAir'] = dfTotal['CentralAir'].replace('N',0)
dfTotal['CentralAir'] = dfTotal['CentralAir'].replace('Y',1)

dfTotal.groupby(['CentralAir']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
CentralAir,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,193,193,193,193,193,193,193,193,193,193,...,193,193,193,193,193,193,193,193,193,193
1,2713,2713,2713,2713,2713,2713,2713,2713,2713,2713,...,2713,2713,2713,2713,2713,2713,2713,2713,2713,2713


##### Tratando Electrical

In [69]:
dfTotal.groupby(['Electrical']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Electrical,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
FuseA,183,183,183,183,183,183,183,183,183,183,...,183,183,183,183,183,183,183,183,183,183
FuseF,50,50,50,50,50,50,50,50,50,50,...,50,50,50,50,50,50,50,50,50,50
FuseP,8,8,8,8,8,8,8,8,8,8,...,8,8,8,8,8,8,8,8,8,8
Mix,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
SBrkr,2664,2664,2664,2664,2664,2664,2664,2664,2664,2664,...,2664,2664,2664,2664,2664,2664,2664,2664,2664,2664


✅ __A coluna *Electrical* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __FuseA__ será substituído por 0
* __FuseF__ será substituído por 1
* __FuseP__ será substituído por 2
* __Mix__ será substituído por 3
* __SBrkr__ será substituído por 4

In [94]:
dfTotal['Electrical'] = dfTotal['Electrical'].replace('FuseA',0)
dfTotal['Electrical'] = dfTotal['Electrical'].replace('FuseF',1)
dfTotal['Electrical'] = dfTotal['Electrical'].replace('FuseP',2)
dfTotal['Electrical'] = dfTotal['Electrical'].replace('Mix',3)
dfTotal['Electrical'] = dfTotal['Electrical'].replace('SBrkr',4)

dfTotal.groupby(['Electrical']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Electrical,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,183,183,183,183,183,183,183,183,183,183,...,183,183,183,183,183,183,183,183,183,183
1,50,50,50,50,50,50,50,50,50,50,...,50,50,50,50,50,50,50,50,50,50
2,8,8,8,8,8,8,8,8,8,8,...,8,8,8,8,8,8,8,8,8,8
3,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
4,2664,2664,2664,2664,2664,2664,2664,2664,2664,2664,...,2664,2664,2664,2664,2664,2664,2664,2664,2664,2664


##### Tratando KitchenQual

In [70]:
dfTotal.groupby(['KitchenQual']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
KitchenQual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Ex,204,204,204,204,204,204,204,204,204,204,...,204,204,204,204,204,204,204,204,204,204
Fa,68,68,68,68,68,68,68,68,68,68,...,68,68,68,68,68,68,68,68,68,68
Gd,1149,1149,1149,1149,1149,1149,1149,1149,1149,1149,...,1149,1149,1149,1149,1149,1149,1149,1149,1149,1149
TA,1485,1485,1485,1485,1485,1485,1485,1485,1485,1485,...,1485,1485,1485,1485,1485,1485,1485,1485,1485,1485


✅ __A coluna *KitchenQual* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Ex__ será substituído por 0
* __Fa__ será substituído por 1
* __Gd__ será substituído por 2
* __TA__ será substituído por 3

In [95]:
dfTotal['KitchenQual'] = dfTotal['KitchenQual'].replace('Ex',0)
dfTotal['KitchenQual'] = dfTotal['KitchenQual'].replace('Fa',1)
dfTotal['KitchenQual'] = dfTotal['KitchenQual'].replace('Gd',2)
dfTotal['KitchenQual'] = dfTotal['KitchenQual'].replace('TA',3)

dfTotal.groupby(['KitchenQual']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
KitchenQual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,204,204,204,204,204,204,204,204,204,204,...,204,204,204,204,204,204,204,204,204,204
1,68,68,68,68,68,68,68,68,68,68,...,68,68,68,68,68,68,68,68,68,68
2,1149,1149,1149,1149,1149,1149,1149,1149,1149,1149,...,1149,1149,1149,1149,1149,1149,1149,1149,1149,1149
3,1485,1485,1485,1485,1485,1485,1485,1485,1485,1485,...,1485,1485,1485,1485,1485,1485,1485,1485,1485,1485


##### Tratando Functional

In [71]:
dfTotal.groupby(['Functional']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Functional,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Maj1,18,18,18,18,18,18,18,18,18,18,...,18,18,18,18,18,18,18,18,18,18
Maj2,9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9
Min1,64,64,64,64,64,64,64,64,64,64,...,64,64,64,64,64,64,64,64,64,64
Min2,69,69,69,69,69,69,69,69,69,69,...,69,69,69,69,69,69,69,69,69,69
Mod,33,33,33,33,33,33,33,33,33,33,...,33,33,33,33,33,33,33,33,33,33
Sev,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
Typ,2711,2711,2711,2711,2711,2711,2711,2711,2711,2711,...,2711,2711,2711,2711,2711,2711,2711,2711,2711,2711


✅ __A coluna *Functional* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __Maj1__ será substituído por 0
* __Maj2__ será substituído por 1
* __Min1__ será substituído por 2
* __Min2__ será substituído por 3
* __Mod__ será substituído por 4
* __Sev__ será substituído por 5
* __Typ__ será substituído por 6

In [97]:
dfTotal['Functional'] = dfTotal['Functional'].replace('Maj1',0)
dfTotal['Functional'] = dfTotal['Functional'].replace('Maj2',1)
dfTotal['Functional'] = dfTotal['Functional'].replace('Min1',2)
dfTotal['Functional'] = dfTotal['Functional'].replace('Min2',3)
dfTotal['Functional'] = dfTotal['Functional'].replace('Mod',4)
dfTotal['Functional'] = dfTotal['Functional'].replace('Sev',5)
dfTotal['Functional'] = dfTotal['Functional'].replace('Typ',6)

dfTotal.groupby(['Functional']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
Functional,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,18,18,18,18,18,18,18,18,18,18,...,18,18,18,18,18,18,18,18,18,18
1,9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9
2,64,64,64,64,64,64,64,64,64,64,...,64,64,64,64,64,64,64,64,64,64
3,69,69,69,69,69,69,69,69,69,69,...,69,69,69,69,69,69,69,69,69,69
4,33,33,33,33,33,33,33,33,33,33,...,33,33,33,33,33,33,33,33,33,33
5,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
6,2711,2711,2711,2711,2711,2711,2711,2711,2711,2711,...,2711,2711,2711,2711,2711,2711,2711,2711,2711,2711


##### Tratando GarageType

In [72]:
dfTotal.groupby(['GarageType']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
GarageType,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
68.0,156,156,156,156,156,156,156,156,156,156,...,156,156,156,156,156,156,156,156,156,156
2Types,23,23,23,23,23,23,23,23,23,23,...,23,23,23,23,23,23,23,23,23,23
Attchd,1718,1718,1718,1718,1718,1718,1718,1718,1718,1718,...,1718,1718,1718,1718,1718,1718,1718,1718,1718,1718
Basment,36,36,36,36,36,36,36,36,36,36,...,36,36,36,36,36,36,36,36,36,36
BuiltIn,185,185,185,185,185,185,185,185,185,185,...,185,185,185,185,185,185,185,185,185,185
CarPort,15,15,15,15,15,15,15,15,15,15,...,15,15,15,15,15,15,15,15,15,15
Detchd,773,773,773,773,773,773,773,773,773,773,...,773,773,773,773,773,773,773,773,773,773


✅ __A coluna *GarageType* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __68.0__ será substituído por 0
* __2Types__ será substituído por 1
* __Attchd__ será substituído por 2
* __Basment__ será substituído por 3
* __Builtln__ será substituído por 4
* __CarPort__ será substituído por 5
* __Detchd__ será substituído por 6

In [99]:
dfTotal['GarageType'] = dfTotal['GarageType'].replace(68.0,0)
dfTotal['GarageType'] = dfTotal['GarageType'].replace('2Types',1)
dfTotal['GarageType'] = dfTotal['GarageType'].replace('Attchd',2)
dfTotal['GarageType'] = dfTotal['GarageType'].replace('Basment',3)
dfTotal['GarageType'] = dfTotal['GarageType'].replace('BuiltIn',4)
dfTotal['GarageType'] = dfTotal['GarageType'].replace('CarPort',5)
dfTotal['GarageType'] = dfTotal['GarageType'].replace('Detchd',6)

dfTotal.groupby(['GarageType']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
GarageType,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,156,156,156,156,156,156,156,156,156,156,...,156,156,156,156,156,156,156,156,156,156
1,23,23,23,23,23,23,23,23,23,23,...,23,23,23,23,23,23,23,23,23,23
2,1718,1718,1718,1718,1718,1718,1718,1718,1718,1718,...,1718,1718,1718,1718,1718,1718,1718,1718,1718,1718
3,36,36,36,36,36,36,36,36,36,36,...,36,36,36,36,36,36,36,36,36,36
4,185,185,185,185,185,185,185,185,185,185,...,185,185,185,185,185,185,185,185,185,185
5,15,15,15,15,15,15,15,15,15,15,...,15,15,15,15,15,15,15,15,15,15
6,773,773,773,773,773,773,773,773,773,773,...,773,773,773,773,773,773,773,773,773,773


##### Tratando GarageFinish

In [73]:
dfTotal.groupby(['GarageFinish']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
GarageFinish,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
68.0,157,157,157,157,157,157,157,157,157,157,...,157,157,157,157,157,157,157,157,157,157
Fin,718,718,718,718,718,718,718,718,718,718,...,718,718,718,718,718,718,718,718,718,718
RFn,811,811,811,811,811,811,811,811,811,811,...,811,811,811,811,811,811,811,811,811,811
Unf,1220,1220,1220,1220,1220,1220,1220,1220,1220,1220,...,1220,1220,1220,1220,1220,1220,1220,1220,1220,1220


✅ __A coluna *GarageFinish* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __68.0__ será substituído por 0
* __Fin__ será substituído por 1
* __RFn__ será substituído por 2
* __Unf__ será substituído por 3

In [101]:
dfTotal['GarageFinish'] = dfTotal['GarageFinish'].replace(68.0,0)
dfTotal['GarageFinish'] = dfTotal['GarageFinish'].replace('Fin',1)
dfTotal['GarageFinish'] = dfTotal['GarageFinish'].replace('RFn',2)
dfTotal['GarageFinish'] = dfTotal['GarageFinish'].replace('Unf',3)

dfTotal.groupby(['GarageFinish']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
GarageFinish,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,157,157,157,157,157,157,157,157,157,157,...,157,157,157,157,157,157,157,157,157,157
1,718,718,718,718,718,718,718,718,718,718,...,718,718,718,718,718,718,718,718,718,718
2,811,811,811,811,811,811,811,811,811,811,...,811,811,811,811,811,811,811,811,811,811
3,1220,1220,1220,1220,1220,1220,1220,1220,1220,1220,...,1220,1220,1220,1220,1220,1220,1220,1220,1220,1220


##### Tratando GarageQual

In [74]:
dfTotal.groupby(['GarageQual']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
GarageQual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
68.0,157,157,157,157,157,157,157,157,157,157,...,157,157,157,157,157,157,157,157,157,157
Ex,3,3,3,3,3,3,3,3,3,3,...,3,3,3,3,3,3,3,3,3,3
Fa,121,121,121,121,121,121,121,121,121,121,...,121,121,121,121,121,121,121,121,121,121
Gd,24,24,24,24,24,24,24,24,24,24,...,24,24,24,24,24,24,24,24,24,24
Po,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5
TA,2596,2596,2596,2596,2596,2596,2596,2596,2596,2596,...,2596,2596,2596,2596,2596,2596,2596,2596,2596,2596


✅ __A coluna *GarageQual* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __68.0__ será substituído por 0
* __Ex__ será substituído por 1
* __Fa__ será substituído por 2
* __Gd__ será substituído por 3
* __Po__ será substituído por 4
* __TA__ será substituído por 5

In [103]:
dfTotal['GarageQual'] = dfTotal['GarageQual'].replace(68.0,0)
dfTotal['GarageQual'] = dfTotal['GarageQual'].replace('Ex',1)
dfTotal['GarageQual'] = dfTotal['GarageQual'].replace('Fa',2)
dfTotal['GarageQual'] = dfTotal['GarageQual'].replace('Gd',3)
dfTotal['GarageQual'] = dfTotal['GarageQual'].replace('Po',4)
dfTotal['GarageQual'] = dfTotal['GarageQual'].replace('TA',5)

dfTotal.groupby(['GarageQual']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
GarageQual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,157,157,157,157,157,157,157,157,157,157,...,157,157,157,157,157,157,157,157,157,157
1,3,3,3,3,3,3,3,3,3,3,...,3,3,3,3,3,3,3,3,3,3
2,121,121,121,121,121,121,121,121,121,121,...,121,121,121,121,121,121,121,121,121,121
3,24,24,24,24,24,24,24,24,24,24,...,24,24,24,24,24,24,24,24,24,24
4,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5
5,2596,2596,2596,2596,2596,2596,2596,2596,2596,2596,...,2596,2596,2596,2596,2596,2596,2596,2596,2596,2596


##### Tratando GarageCond

In [75]:
dfTotal.groupby(['GarageCond']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
GarageCond,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
68.0,157,157,157,157,157,157,157,157,157,157,...,157,157,157,157,157,157,157,157,157,157
Ex,3,3,3,3,3,3,3,3,3,3,...,3,3,3,3,3,3,3,3,3,3
Fa,71,71,71,71,71,71,71,71,71,71,...,71,71,71,71,71,71,71,71,71,71
Gd,15,15,15,15,15,15,15,15,15,15,...,15,15,15,15,15,15,15,15,15,15
Po,13,13,13,13,13,13,13,13,13,13,...,13,13,13,13,13,13,13,13,13,13
TA,2647,2647,2647,2647,2647,2647,2647,2647,2647,2647,...,2647,2647,2647,2647,2647,2647,2647,2647,2647,2647


✅ __A coluna *GarageCond* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __68.0__ será substituído por 0
* __Ex__ será substituído por 1
* __Fa__ será substituído por 2
* __Gd__ será substituído por 3
* __Po__ será substituído por 4
* __TA__ será substituído por 5

In [104]:
dfTotal['GarageCond'] = dfTotal['GarageCond'].replace(68.0,0)
dfTotal['GarageCond'] = dfTotal['GarageCond'].replace('Ex',1)
dfTotal['GarageCond'] = dfTotal['GarageCond'].replace('Fa',2)
dfTotal['GarageCond'] = dfTotal['GarageCond'].replace('Gd',3)
dfTotal['GarageCond'] = dfTotal['GarageCond'].replace('Po',4)
dfTotal['GarageCond'] = dfTotal['GarageCond'].replace('TA',5)

dfTotal.groupby(['GarageCond']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
GarageCond,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,157,157,157,157,157,157,157,157,157,157,...,157,157,157,157,157,157,157,157,157,157
1,3,3,3,3,3,3,3,3,3,3,...,3,3,3,3,3,3,3,3,3,3
2,71,71,71,71,71,71,71,71,71,71,...,71,71,71,71,71,71,71,71,71,71
3,15,15,15,15,15,15,15,15,15,15,...,15,15,15,15,15,15,15,15,15,15
4,13,13,13,13,13,13,13,13,13,13,...,13,13,13,13,13,13,13,13,13,13
5,2647,2647,2647,2647,2647,2647,2647,2647,2647,2647,...,2647,2647,2647,2647,2647,2647,2647,2647,2647,2647


##### Tratando PavedDrive

In [76]:
dfTotal.groupby(['PavedDrive']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
PavedDrive,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
N,210,210,210,210,210,210,210,210,210,210,...,210,210,210,210,210,210,210,210,210,210
P,62,62,62,62,62,62,62,62,62,62,...,62,62,62,62,62,62,62,62,62,62
Y,2634,2634,2634,2634,2634,2634,2634,2634,2634,2634,...,2634,2634,2634,2634,2634,2634,2634,2634,2634,2634


✅ __A coluna *PavedDrive* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __N__ será substituído por 0
* __P__ será substituído por 1
* __Y__ será substituído por 2

In [106]:
dfTotal['PavedDrive'] = dfTotal['PavedDrive'].replace('N',0)
dfTotal['PavedDrive'] = dfTotal['PavedDrive'].replace('P',1)
dfTotal['PavedDrive'] = dfTotal['PavedDrive'].replace('Y',2)

dfTotal.groupby(['PavedDrive']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
PavedDrive,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,210,210,210,210,210,210,210,210,210,210,...,210,210,210,210,210,210,210,210,210,210
1,62,62,62,62,62,62,62,62,62,62,...,62,62,62,62,62,62,62,62,62,62
2,2634,2634,2634,2634,2634,2634,2634,2634,2634,2634,...,2634,2634,2634,2634,2634,2634,2634,2634,2634,2634


##### Tratando SaleType

In [77]:
dfTotal.groupby(['SaleType']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleCondition,SalePrice
SaleType,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
COD,85,85,85,85,85,85,85,85,85,85,...,85,85,85,85,85,85,85,85,85,85
CWD,12,12,12,12,12,12,12,12,12,12,...,12,12,12,12,12,12,12,12,12,12
Con,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5
ConLD,24,24,24,24,24,24,24,24,24,24,...,24,24,24,24,24,24,24,24,24,24
ConLI,9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9
ConLw,8,8,8,8,8,8,8,8,8,8,...,8,8,8,8,8,8,8,8,8,8
New,239,239,239,239,239,239,239,239,239,239,...,239,239,239,239,239,239,239,239,239,239
Oth,7,7,7,7,7,7,7,7,7,7,...,7,7,7,7,7,7,7,7,7,7
WD,2517,2517,2517,2517,2517,2517,2517,2517,2517,2517,...,2517,2517,2517,2517,2517,2517,2517,2517,2517,2517


✅ __A coluna *SaleType* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __COD__ será substituído por 0
* __CWD__ será substituído por 1
* __Con__ será substituído por 2
* __ConLD__ será substituído por 3
* __ConLI__ será substituído por 4
* __ConLw__ será substituído por 5
* __New__ será substituído por 6
* __Oth__ será substituído por 7
* __WD__ será substituído por 8

In [107]:
dfTotal['SaleType'] = dfTotal['SaleType'].replace('COD',0)
dfTotal['SaleType'] = dfTotal['SaleType'].replace('CWD',1)
dfTotal['SaleType'] = dfTotal['SaleType'].replace('Con',2)
dfTotal['SaleType'] = dfTotal['SaleType'].replace('ConLD',3)
dfTotal['SaleType'] = dfTotal['SaleType'].replace('ConLI',4)
dfTotal['SaleType'] = dfTotal['SaleType'].replace('ConLw',5)
dfTotal['SaleType'] = dfTotal['SaleType'].replace('New',6)
dfTotal['SaleType'] = dfTotal['SaleType'].replace('Oth',7)
dfTotal['SaleType'] = dfTotal['SaleType'].replace('WD',8)

dfTotal.groupby(['SaleType']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleCondition,SalePrice
SaleType,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,85,85,85,85,85,85,85,85,85,85,...,85,85,85,85,85,85,85,85,85,85
1,12,12,12,12,12,12,12,12,12,12,...,12,12,12,12,12,12,12,12,12,12
2,5,5,5,5,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5
3,24,24,24,24,24,24,24,24,24,24,...,24,24,24,24,24,24,24,24,24,24
4,9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9
5,8,8,8,8,8,8,8,8,8,8,...,8,8,8,8,8,8,8,8,8,8
6,239,239,239,239,239,239,239,239,239,239,...,239,239,239,239,239,239,239,239,239,239
7,7,7,7,7,7,7,7,7,7,7,...,7,7,7,7,7,7,7,7,7,7
8,2517,2517,2517,2517,2517,2517,2517,2517,2517,2517,...,2517,2517,2517,2517,2517,2517,2517,2517,2517,2517


##### Tratando SaleCondition

In [78]:
dfTotal.groupby(['SaleCondition']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SalePrice
SaleCondition,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Abnorml,187,187,187,187,187,187,187,187,187,187,...,187,187,187,187,187,187,187,187,187,187
AdjLand,12,12,12,12,12,12,12,12,12,12,...,12,12,12,12,12,12,12,12,12,12
Alloca,23,23,23,23,23,23,23,23,23,23,...,23,23,23,23,23,23,23,23,23,23
Family,46,46,46,46,46,46,46,46,46,46,...,46,46,46,46,46,46,46,46,46,46
Normal,2393,2393,2393,2393,2393,2393,2393,2393,2393,2393,...,2393,2393,2393,2393,2393,2393,2393,2393,2393,2393
Partial,245,245,245,245,245,245,245,245,245,245,...,245,245,245,245,245,245,245,245,245,245


✅ __A coluna *SaleCondition* não apresenta nenhum dado que possa ser considerado inconsistente.__ ✅

Para converter todos os seus valores para inteiros, será feito o seguinte:

* __AbnormI__ será substituído por 0
* __AdjLand__ será substituído por 1
* __Alloc__ será substituído por 2
* __Family__ será substituído por 3
* __Normal__ será substituído por 4
* __Partial__ será substituído por 5

In [109]:
dfTotal['SaleCondition'] = dfTotal['SaleCondition'].replace('Abnorml',0)
dfTotal['SaleCondition'] = dfTotal['SaleCondition'].replace('AdjLand',1)
dfTotal['SaleCondition'] = dfTotal['SaleCondition'].replace('Alloca',2)
dfTotal['SaleCondition'] = dfTotal['SaleCondition'].replace('Family',3)
dfTotal['SaleCondition'] = dfTotal['SaleCondition'].replace('Normal',4)
dfTotal['SaleCondition'] = dfTotal['SaleCondition'].replace('Partial',5)

dfTotal.groupby(['SaleCondition']).count()

Unnamed: 0_level_0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,...,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SalePrice
SaleCondition,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,187,187,187,187,187,187,187,187,187,187,...,187,187,187,187,187,187,187,187,187,187
1,12,12,12,12,12,12,12,12,12,12,...,12,12,12,12,12,12,12,12,12,12
2,23,23,23,23,23,23,23,23,23,23,...,23,23,23,23,23,23,23,23,23,23
3,46,46,46,46,46,46,46,46,46,46,...,46,46,46,46,46,46,46,46,46,46
4,2393,2393,2393,2393,2393,2393,2393,2393,2393,2393,...,2393,2393,2393,2393,2393,2393,2393,2393,2393,2393
5,245,245,245,245,245,245,245,245,245,245,...,245,245,245,245,245,245,245,245,245,245
