<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Load-Data" data-toc-modified-id="Load-Data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Load Data</a></span></li><li><span><a href="#Extract-Features-and-Targets" data-toc-modified-id="Extract-Features-and-Targets-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Extract Features and Targets</a></span></li><li><span><a href="#Create-Validation-Set" data-toc-modified-id="Create-Validation-Set-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Create Validation Set</a></span></li><li><span><a href="#Explore-Data" data-toc-modified-id="Explore-Data-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Explore Data</a></span></li><li><span><a href="#Manage-Missing-Categorical-Data" data-toc-modified-id="Manage-Missing-Categorical-Data-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Manage Missing Categorical Data</a></span></li><li><span><a href="#Drop-Features-with-Missing-Data" data-toc-modified-id="Drop-Features-with-Missing-Data-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Drop Features with Missing Data</a></span></li><li><span><a href="#Imputation" data-toc-modified-id="Imputation-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Imputation</a></span></li><li><span><a href="#Mixed-Dropping-and-Imputation" data-toc-modified-id="Mixed-Dropping-and-Imputation-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Mixed Dropping and Imputation</a></span></li><li><span><a href="#Scikit-Learn-Built-In-Imputer" data-toc-modified-id="Scikit-Learn-Built-In-Imputer-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Scikit Learn Built In Imputer</a></span></li></ul></div>

# Import Packages

In [67]:
import pandas as pd
import pandas_profiling as pp

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Import Data

## Load Data

In [25]:
# Define data locations
data_dir        = '../Data/house-prices-advanced-regression-techniques/'
train_file_name = 'train.csv'
test_file_name  = 'test.csv'

# Load training and testing data
train_data = pd.read_csv( data_dir + train_file_name, index_col='Id' )
test_data  = pd.read_csv( data_dir + test_file_name, index_col='Id' )

# Remove rows with missing targets
train_data.dropna( axis=0, subset=['SalePrice'], inplace=True)

## Extract Features and Targets

In [26]:
# Extract targets and features
y = train_data.SalePrice
X = train_data.copy()
X.drop( ['SalePrice'], axis=1, inplace=True )


X_test = test_data.copy()

# The course originally called for numerical features only.
# Update: categorical features are now kept as well.
# To drop the categorical (object-dtype) columns again, uncomment:
#X = X.select_dtypes( exclude=['object'] )
#X_test = test_data.select_dtypes( exclude=['object'] )

## Create Validation Set

In [27]:
X_train, X_val, y_train, y_val = train_test_split( X, y, 
                                                   train_size=0.8, 
                                                   test_size=0.2, 
                                                   random_state=0)
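
A quick sanity check on the split can be sketched as follows. This is only an illustration: `features` and `targets` are placeholder arrays (not the real data), and `n_rows = 1460` is taken from the number of observations listed in the profile report. It shows how many rows an 80/20 split produces and that a fixed `random_state` gives the same partition on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 1460  # assumption: row count taken from the profile report

# Placeholder data standing in for X and y (hypothetical, for illustration)
features = np.arange(n_rows).reshape(-1, 1)
targets = np.arange(n_rows)

f_train, f_val, t_train, t_val = train_test_split(
    features, targets, train_size=0.8, test_size=0.2, random_state=0
)
print(len(f_train), len(f_val))

# Repeating the call with the same seed should give the same validation rows
_, f_val_again, _, _ = train_test_split(
    features, targets, train_size=0.8, test_size=0.2, random_state=0
)
same_split = np.array_equal(f_val, f_val_again)
print("reproducible:", same_split)
```

Fixing `random_state` matters here because the later comparisons of missing-data strategies are only meaningful if every strategy is scored on the same validation rows.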

## Explore Data

In [9]:
pp.ProfileReport( X )

Statistic,Value
Number of variables,80
Number of observations,1460
Total Missing (%),6.0%
Total size in memory,912.6 KiB
Average record size in memory,640.1 B

Variable type,Count
Numeric,37
Categorical,43
Boolean,0
Date,0
Text (Unique),0
Rejected,0
Unsupported,0
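
The overview reports the share of missing cells for the whole frame. A per-column breakdown can be obtained directly from pandas; the sketch below uses a small made-up frame (`toy`, with hypothetical column names) rather than the real data, so the same calls would need to be applied to `X` in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; names and values are invented for illustration
toy = pd.DataFrame({
    "numeric_a":  [8450, 9600, 11250, 9550],
    "numeric_b":  [65.0, np.nan, 68.0, np.nan],
    "category_c": [np.nan, "Grvl", np.nan, np.nan],
})

# Missing cells per column, as a count and as a percentage of rows
missing_count = toy.isnull().sum()
missing_pct = 100 * toy.isnull().mean()

# Share of missing cells over the whole frame (comparable to "Total Missing (%)")
total_missing_pct = 100 * toy.isnull().to_numpy().mean()

summary = pd.DataFrame({"missing_n": missing_count, "missing_pct": missing_pct})
print(summary[summary.missing_n > 0].sort_values("missing_n", ascending=False))
print(f"Total missing: {total_missing_pct:.1f}%")
```

A table like `summary` makes it easier to decide, column by column, whether dropping or imputing is the more reasonable treatment.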

0,1
Distinct count,753
Unique (%),51.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1162.6
Minimum,334
Maximum,4692
Zeros (%),0.0%

0,1
Minimum,334.0
5-th percentile,672.95
Q1,882.0
Median,1087.0
Q3,1391.2
95-th percentile,1831.2
Maximum,4692.0
Range,4358.0
Interquartile range,509.25

0,1
Standard deviation,386.59
Coef of variation,0.33251
Kurtosis,5.7458
Mean,1162.6
MAD,300.58
Skewness,1.3768
Sum,1697435
Variance,149450
Memory size,11.5 KiB
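
Several of the figures in these per-variable tables are derived from the others. The sketch below recomputes them from the values shown for this first variable, assuming the report uses the usual definitions (range as maximum minus minimum, interquartile range as Q3 minus Q1, coefficient of variation as standard deviation over mean, variance as the squared standard deviation). Small differences from the reported numbers are expected because the displayed values are rounded.

```python
# Values copied from the summary tables of the first variable
minimum, maximum = 334.0, 4692.0
q1, q3 = 882.0, 1391.2
mean, std = 1162.6, 386.59

value_range = maximum - minimum   # reported: 4358.0
iqr = q3 - q1                     # reported: 509.25 (Q3 is displayed rounded)
coef_of_variation = std / mean    # reported: 0.33251
variance = std ** 2               # reported: 149450

print(f"range={value_range}, IQR={iqr:.2f}, "
      f"CV={coef_of_variation:.5f}, variance={variance:.0f}")
```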

Value,Count,Frequency (%),Unnamed: 3
864,25,1.7%,
1040,16,1.1%,
912,14,1.0%,
848,12,0.8%,
894,12,0.8%,
672,11,0.8%,
816,9,0.6%,
630,9,0.6%,
936,7,0.5%,
960,7,0.5%,

Value,Count,Frequency (%),Unnamed: 3
334,1,0.1%,
372,1,0.1%,
438,1,0.1%,
480,1,0.1%,
483,7,0.5%,

Value,Count,Frequency (%),Unnamed: 3
2633,1,0.1%,
2898,1,0.1%,
3138,1,0.1%,
3228,1,0.1%,
4692,1,0.1%,

0,1
Distinct count,417
Unique (%),28.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,346.99
Minimum,0
Maximum,2065
Zeros (%),56.8%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,728
95-th percentile,1141
Maximum,2065
Range,2065
Interquartile range,728

0,1
Standard deviation,436.53
Coef of variation,1.258
Kurtosis,-0.55346
Mean,346.99
MAD,396.48
Skewness,0.81303
Sum,506609
Variance,190560
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,829,56.8%,
728,10,0.7%,
504,9,0.6%,
672,8,0.5%,
546,8,0.5%,
720,7,0.5%,
600,7,0.5%,
896,6,0.4%,
780,5,0.3%,
862,5,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,829,56.8%,
110,1,0.1%,
167,1,0.1%,
192,1,0.1%,
208,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1611,1,0.1%,
1796,1,0.1%,
1818,1,0.1%,
1872,1,0.1%,
2065,1,0.1%,

0,1
Distinct count,20
Unique (%),1.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.4096
Minimum,0
Maximum,508
Zeros (%),98.4%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,508
Range,508
Interquartile range,0

0,1
Standard deviation,29.317
Coef of variation,8.5985
Kurtosis,123.66
Mean,3.4096
MAD,6.7071
Skewness,10.304
Sum,4978
Variance,859.51
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1436,98.4%,
168,3,0.2%,
216,2,0.1%,
144,2,0.1%,
180,2,0.1%,
245,1,0.1%,
238,1,0.1%,
290,1,0.1%,
196,1,0.1%,
182,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1436,98.4%,
23,1,0.1%,
96,1,0.1%,
130,1,0.1%,
140,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
290,1,0.1%,
304,1,0.1%,
320,1,0.1%,
407,1,0.1%,
508,1,0.1%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),93.8%
Missing (n),1369

0,1
Grvl,50
Pave,41
(Missing),1369

Value,Count,Frequency (%),Unnamed: 3
Grvl,50,3.4%,
Pave,41,2.8%,
(Missing),1369,93.8%,

0,1
Distinct count,8
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2.8664
Minimum,0
Maximum,8
Zeros (%),0.4%

0,1
Minimum,0
5-th percentile,2
Q1,2
Median,3
Q3,3
95-th percentile,4
Maximum,8
Range,8
Interquartile range,1

0,1
Standard deviation,0.81578
Coef of variation,0.2846
Kurtosis,2.2309
Mean,2.8664
MAD,0.57631
Skewness,0.21179
Sum,4185
Variance,0.66549
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
3,804,55.1%,
2,358,24.5%,
4,213,14.6%,
1,50,3.4%,
5,21,1.4%,
6,7,0.5%,
0,6,0.4%,
8,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,6,0.4%,
1,50,3.4%,
2,358,24.5%,
3,804,55.1%,
4,213,14.6%,

Value,Count,Frequency (%),Unnamed: 3
3,804,55.1%,
4,213,14.6%,
5,21,1.4%,
6,7,0.5%,
8,1,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
1Fam,1220
TwnhsE,114
Duplex,52
Other values (2),74

Value,Count,Frequency (%),Unnamed: 3
1Fam,1220,83.6%,
TwnhsE,114,7.8%,
Duplex,52,3.6%,
Twnhs,43,2.9%,
2fmCon,31,2.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),2.5%
Missing (n),37

0,1
TA,1311
Gd,65
Fa,45
(Missing),37

Value,Count,Frequency (%),Unnamed: 3
TA,1311,89.8%,
Gd,65,4.5%,
Fa,45,3.1%,
Po,2,0.1%,
(Missing),37,2.5%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),2.6%
Missing (n),38

0,1
No,953
Av,221
Gd,134

Value,Count,Frequency (%),Unnamed: 3
No,953,65.3%,
Av,221,15.1%,
Gd,134,9.2%,
Mn,114,7.8%,
(Missing),38,2.6%,

0,1
Distinct count,637
Unique (%),43.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,443.64
Minimum,0
Maximum,5644
Zeros (%),32.0%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,383.5
Q3,712.25
95-th percentile,1274.0
Maximum,5644.0
Range,5644.0
Interquartile range,712.25

0,1
Standard deviation,456.1
Coef of variation,1.0281
Kurtosis,11.118
Mean,443.64
MAD,367.37
Skewness,1.6855
Sum,647714
Variance,208030
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,467,32.0%,
24,12,0.8%,
16,9,0.6%,
20,5,0.3%,
686,5,0.3%,
616,5,0.3%,
936,5,0.3%,
662,5,0.3%,
428,4,0.3%,
655,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,467,32.0%,
2,1,0.1%,
16,9,0.6%,
20,5,0.3%,
24,12,0.8%,

Value,Count,Frequency (%),Unnamed: 3
1904,1,0.1%,
2096,1,0.1%,
2188,1,0.1%,
2260,1,0.1%,
5644,1,0.1%,

0,1
Distinct count,144
Unique (%),9.9%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,46.549
Minimum,0
Maximum,1474
Zeros (%),88.6%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,0.0
95-th percentile,396.2
Maximum,1474.0
Range,1474.0
Interquartile range,0.0

0,1
Standard deviation,161.32
Coef of variation,3.4656
Kurtosis,20.113
Mean,46.549
MAD,82.535
Skewness,4.2553
Sum,67962
Variance,26024
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1293,88.6%,
180,5,0.3%,
374,3,0.2%,
551,2,0.1%,
93,2,0.1%,
468,2,0.1%,
147,2,0.1%,
480,2,0.1%,
539,2,0.1%,
712,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1293,88.6%,
28,1,0.1%,
32,1,0.1%,
35,1,0.1%,
40,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1080,1,0.1%,
1085,1,0.1%,
1120,1,0.1%,
1127,1,0.1%,
1474,1,0.1%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),2.5%
Missing (n),37

0,1
Unf,430
GLQ,418
ALQ,220
Other values (3),355

Value,Count,Frequency (%),Unnamed: 3
Unf,430,29.5%,
GLQ,418,28.6%,
ALQ,220,15.1%,
BLQ,148,10.1%,
Rec,133,9.1%,
LwQ,74,5.1%,
(Missing),37,2.5%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),2.6%
Missing (n),38

0,1
Unf,1256
Rec,54
LwQ,46
Other values (3),66
(Missing),38

Value,Count,Frequency (%),Unnamed: 3
Unf,1256,86.0%,
Rec,54,3.7%,
LwQ,46,3.2%,
BLQ,33,2.3%,
ALQ,19,1.3%,
GLQ,14,1.0%,
(Missing),38,2.6%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.42534
Minimum,0
Maximum,3
Zeros (%),58.6%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,1
95-th percentile,1
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.51891
Coef of variation,1.22
Kurtosis,-0.8391
Mean,0.42534
MAD,0.49876
Skewness,0.59607
Sum,621
Variance,0.26927
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,856,58.6%,
1,588,40.3%,
2,15,1.0%,
3,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,856,58.6%,
1,588,40.3%,
2,15,1.0%,
3,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,856,58.6%,
1,588,40.3%,
2,15,1.0%,
3,1,0.1%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.057534
Minimum,0
Maximum,2
Zeros (%),94.4%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,1
Maximum,2
Range,2
Interquartile range,0

0,1
Standard deviation,0.23875
Coef of variation,4.1497
Kurtosis,16.397
Mean,0.057534
MAD,0.10861
Skewness,4.1034
Sum,84
Variance,0.057003
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1378,94.4%,
1,80,5.5%,
2,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1378,94.4%,
1,80,5.5%,
2,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1378,94.4%,
1,80,5.5%,
2,2,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),2.5%
Missing (n),37

0,1
TA,649
Gd,618
Ex,121
(Missing),37

Value,Count,Frequency (%),Unnamed: 3
TA,649,44.5%,
Gd,618,42.3%,
Ex,121,8.3%,
Fa,35,2.4%,
(Missing),37,2.5%,

0,1
Distinct count,780
Unique (%),53.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,567.24
Minimum,0
Maximum,2336
Zeros (%),8.1%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,223.0
Median,477.5
Q3,808.0
95-th percentile,1468.0
Maximum,2336.0
Range,2336.0
Interquartile range,585.0

0,1
Standard deviation,441.87
Coef of variation,0.77898
Kurtosis,0.47499
Mean,567.24
MAD,353.28
Skewness,0.92027
Sum,828171
Variance,195250
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,118,8.1%,
728,9,0.6%,
384,8,0.5%,
572,7,0.5%,
600,7,0.5%,
300,7,0.5%,
440,6,0.4%,
625,6,0.4%,
280,6,0.4%,
672,6,0.4%,

Value,Count,Frequency (%),Unnamed: 3
0,118,8.1%,
14,1,0.1%,
15,1,0.1%,
23,2,0.1%,
26,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2042,1,0.1%,
2046,1,0.1%,
2121,1,0.1%,
2153,1,0.1%,
2336,1,0.1%,

0,1
Distinct count,2
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
Y,1365
N,95

Value,Count,Frequency (%),Unnamed: 3
Y,1365,93.5%,
N,95,6.5%,

0,1
Distinct count,9
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0

0,1
Norm,1260
Feedr,81
Artery,48
Other values (6),71

Value,Count,Frequency (%),Unnamed: 3
Norm,1260,86.3%,
Feedr,81,5.5%,
Artery,48,3.3%,
RRAn,26,1.8%,
PosN,19,1.3%,
RRAe,11,0.8%,
PosA,8,0.5%,
RRNn,5,0.3%,
RRNe,2,0.1%,

0,1
Distinct count,8
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Norm,1445
Feedr,6
PosN,2
Other values (5),7

Value,Count,Frequency (%),Unnamed: 3
Norm,1445,99.0%,
Feedr,6,0.4%,
PosN,2,0.1%,
Artery,2,0.1%,
RRNn,2,0.1%,
PosA,1,0.1%,
RRAn,1,0.1%,
RRAe,1,0.1%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.1%
Missing (n),1

0,1
SBrkr,1334
FuseA,94
FuseF,27
Other values (2),4

Value,Count,Frequency (%),Unnamed: 3
SBrkr,1334,91.4%,
FuseA,94,6.4%,
FuseF,27,1.8%,
FuseP,3,0.2%,
Mix,1,0.1%,
(Missing),1,0.1%,

0,1
Distinct count,120
Unique (%),8.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,21.954
Minimum,0
Maximum,552
Zeros (%),85.8%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,0.0
95-th percentile,180.15
Maximum,552.0
Range,552.0
Interquartile range,0.0

0,1
Standard deviation,61.119
Coef of variation,2.784
Kurtosis,10.431
Mean,21.954
MAD,37.66
Skewness,3.0899
Sum,32053
Variance,3735.6
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1252,85.8%,
112,15,1.0%,
96,6,0.4%,
120,5,0.3%,
144,5,0.3%,
192,5,0.3%,
216,5,0.3%,
252,4,0.3%,
116,4,0.3%,
156,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,1252,85.8%,
19,1,0.1%,
20,1,0.1%,
24,1,0.1%,
30,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
301,1,0.1%,
318,1,0.1%,
330,1,0.1%,
386,1,0.1%,
552,1,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,1282
Gd,146
Fa,28
Other values (2),4

Value,Count,Frequency (%),Unnamed: 3
TA,1282,87.8%,
Gd,146,10.0%,
Fa,28,1.9%,
Ex,3,0.2%,
Po,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,906
Gd,488
Ex,52

Value,Count,Frequency (%),Unnamed: 3
TA,906,62.1%,
Gd,488,33.4%,
Ex,52,3.6%,
Fa,14,1.0%,

0,1
Distinct count,15
Unique (%),1.0%
Missing (%),0.0%
Missing (n),0

0,1
VinylSd,515
HdBoard,222
MetalSd,220
Other values (12),503

Value,Count,Frequency (%),Unnamed: 3
VinylSd,515,35.3%,
HdBoard,222,15.2%,
MetalSd,220,15.1%,
Wd Sdng,206,14.1%,
Plywood,108,7.4%,
CemntBd,61,4.2%,
BrkFace,50,3.4%,
WdShing,26,1.8%,
Stucco,25,1.7%,
AsbShng,20,1.4%,

0,1
Distinct count,16
Unique (%),1.1%
Missing (%),0.0%
Missing (n),0

0,1
VinylSd,504
MetalSd,214
HdBoard,207
Other values (13),535

Value,Count,Frequency (%),Unnamed: 3
VinylSd,504,34.5%,
MetalSd,214,14.7%,
HdBoard,207,14.2%,
Wd Sdng,197,13.5%,
Plywood,142,9.7%,
CmentBd,60,4.1%,
Wd Shng,38,2.6%,
Stucco,26,1.8%,
BrkFace,25,1.7%,
AsbShng,20,1.4%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),80.8%
Missing (n),1179

0,1
MnPrv,157
GdPrv,59
GdWo,54
(Missing),1179

Value,Count,Frequency (%),Unnamed: 3
MnPrv,157,10.8%,
GdPrv,59,4.0%,
GdWo,54,3.7%,
MnWw,11,0.8%,
(Missing),1179,80.8%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),47.3%
Missing (n),690

0,1
Gd,380
TA,313
Fa,33
Other values (2),44
(Missing),690

Value,Count,Frequency (%),Unnamed: 3
Gd,380,26.0%,
TA,313,21.4%,
Fa,33,2.3%,
Ex,24,1.6%,
Po,20,1.4%,
(Missing),690,47.3%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.61301
Minimum,0
Maximum,3
Zeros (%),47.3%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,1
Q3,1
95-th percentile,2
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.64467
Coef of variation,1.0516
Kurtosis,-0.21724
Mean,0.61301
MAD,0.57942
Skewness,0.64957
Sum,895
Variance,0.41559
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,690,47.3%,
1,650,44.5%,
2,115,7.9%,
3,5,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,690,47.3%,
1,650,44.5%,
2,115,7.9%,
3,5,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,690,47.3%,
1,650,44.5%,
2,115,7.9%,
3,5,0.3%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
PConc,647
CBlock,634
BrkTil,146
Other values (3),33

Value,Count,Frequency (%),Unnamed: 3
PConc,647,44.3%,
CBlock,634,43.4%,
BrkTil,146,10.0%,
Slab,24,1.6%,
Stone,6,0.4%,
Wood,3,0.2%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.5651
Minimum,0
Maximum,3
Zeros (%),0.6%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,2
Q3,2
95-th percentile,2
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.55092
Coef of variation,0.35201
Kurtosis,-0.85704
Mean,1.5651
MAD,0.52244
Skewness,0.036562
Sum,2285
Variance,0.30351
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2,768,52.6%,
1,650,44.5%,
3,33,2.3%,
0,9,0.6%,

Value,Count,Frequency (%),Unnamed: 3
0,9,0.6%,
1,650,44.5%,
2,768,52.6%,
3,33,2.3%,

Value,Count,Frequency (%),Unnamed: 3
0,9,0.6%,
1,650,44.5%,
2,768,52.6%,
3,33,2.3%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Typ,1360
Min2,34
Min1,31
Other values (4),35

Value,Count,Frequency (%),Unnamed: 3
Typ,1360,93.2%,
Min2,34,2.3%,
Min1,31,2.1%,
Mod,15,1.0%,
Maj1,14,1.0%,
Maj2,5,0.3%,
Sev,1,0.1%,

0,1
Distinct count,441
Unique (%),30.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,472.98
Minimum,0
Maximum,1418
Zeros (%),5.5%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,334.5
Median,480.0
Q3,576.0
95-th percentile,850.1
Maximum,1418.0
Range,1418.0
Interquartile range,241.5

0,1
Standard deviation,213.8
Coef of variation,0.45204
Kurtosis,0.91707
Mean,472.98
MAD,160.02
Skewness,0.17998
Sum,690551
Variance,45713
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,81,5.5%,
440,49,3.4%,
576,47,3.2%,
240,38,2.6%,
484,34,2.3%,
528,33,2.3%,
288,27,1.8%,
400,25,1.7%,
480,24,1.6%,
264,24,1.6%,

Value,Count,Frequency (%),Unnamed: 3
0,81,5.5%,
160,2,0.1%,
164,1,0.1%,
180,9,0.6%,
186,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1220,1,0.1%,
1248,1,0.1%,
1356,1,0.1%,
1390,1,0.1%,
1418,1,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.7671
Minimum,0
Maximum,4
Zeros (%),5.5%

0,1
Minimum,0
5-th percentile,0
Q1,1
Median,2
Q3,2
95-th percentile,3
Maximum,4
Range,4
Interquartile range,1

0,1
Standard deviation,0.74732
Coef of variation,0.4229
Kurtosis,0.221
Mean,1.7671
MAD,0.58384
Skewness,-0.34255
Sum,2580
Variance,0.55848
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2,824,56.4%,
1,369,25.3%,
3,181,12.4%,
0,81,5.5%,
4,5,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,81,5.5%,
1,369,25.3%,
2,824,56.4%,
3,181,12.4%,
4,5,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,81,5.5%,
1,369,25.3%,
2,824,56.4%,
3,181,12.4%,
4,5,0.3%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),5.5%
Missing (n),81

0,1
TA,1326
Fa,35
Gd,9
Other values (2),9
(Missing),81

Value,Count,Frequency (%),Unnamed: 3
TA,1326,90.8%,
Fa,35,2.4%,
Gd,9,0.6%,
Po,7,0.5%,
Ex,2,0.1%,
(Missing),81,5.5%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),5.5%
Missing (n),81

0,1
Unf,605
RFn,422
Fin,352
(Missing),81

Value,Count,Frequency (%),Unnamed: 3
Unf,605,41.4%,
RFn,422,28.9%,
Fin,352,24.1%,
(Missing),81,5.5%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),5.5%
Missing (n),81

0,1
TA,1311
Fa,48
Gd,14
Other values (2),6
(Missing),81

Value,Count,Frequency (%),Unnamed: 3
TA,1311,89.8%,
Fa,48,3.3%,
Gd,14,1.0%,
Ex,3,0.2%,
Po,3,0.2%,
(Missing),81,5.5%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),5.5%
Missing (n),81

0,1
Attchd,870
Detchd,387
BuiltIn,88
Other values (3),34
(Missing),81

Value,Count,Frequency (%),Unnamed: 3
Attchd,870,59.6%,
Detchd,387,26.5%,
BuiltIn,88,6.0%,
Basment,19,1.3%,
CarPort,9,0.6%,
2Types,6,0.4%,
(Missing),81,5.5%,

0,1
Distinct count,98
Unique (%),6.7%
Missing (%),5.5%
Missing (n),81
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1978.5
Minimum,1900
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1900
5-th percentile,1930
Q1,1961
Median,1980
Q3,2002
95-th percentile,2007
Maximum,2010
Range,110
Interquartile range,41

0,1
Standard deviation,24.69
Coef of variation,0.012479
Kurtosis,-0.41834
Mean,1978.5
MAD,20.913
Skewness,-0.64941
Sum,2728400
Variance,609.58
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2005.0,65,4.5%,
2006.0,59,4.0%,
2004.0,53,3.6%,
2003.0,50,3.4%,
2007.0,49,3.4%,
1977.0,35,2.4%,
1998.0,31,2.1%,
1999.0,30,2.1%,
1976.0,29,2.0%,
2008.0,29,2.0%,

Value,Count,Frequency (%),Unnamed: 3
1900.0,1,0.1%,
1906.0,1,0.1%,
1908.0,1,0.1%,
1910.0,3,0.2%,
1914.0,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2006.0,59,4.0%,
2007.0,49,3.4%,
2008.0,29,2.0%,
2009.0,21,1.4%,
2010.0,3,0.2%,

0,1
Distinct count,861
Unique (%),59.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1515.5
Minimum,334
Maximum,5642
Zeros (%),0.0%

0,1
Minimum,334.0
5-th percentile,848.0
Q1,1129.5
Median,1464.0
Q3,1776.8
95-th percentile,2466.1
Maximum,5642.0
Range,5308.0
Interquartile range,647.25

0,1
Standard deviation,525.48
Coef of variation,0.34675
Kurtosis,4.8951
Mean,1515.5
MAD,397.32
Skewness,1.3666
Sum,2212577
Variance,276130
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
864,22,1.5%,
1040,14,1.0%,
894,11,0.8%,
848,10,0.7%,
1456,10,0.7%,
912,9,0.6%,
1200,9,0.6%,
816,8,0.5%,
1092,8,0.5%,
1344,7,0.5%,

Value,Count,Frequency (%),Unnamed: 3
334,1,0.1%,
438,1,0.1%,
480,1,0.1%,
520,1,0.1%,
605,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
3627,1,0.1%,
4316,1,0.1%,
4476,1,0.1%,
4676,1,0.1%,
5642,1,0.1%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.38288
Minimum,0
Maximum,2
Zeros (%),62.5%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,1
95-th percentile,1
Maximum,2
Range,2
Interquartile range,1

0,1
Standard deviation,0.50289
Coef of variation,1.3134
Kurtosis,-1.0769
Mean,0.38288
MAD,0.47886
Skewness,0.6759
Sum,559
Variance,0.25289
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,913,62.5%,
1,535,36.6%,
2,12,0.8%,

Value,Count,Frequency (%),Unnamed: 3
0,913,62.5%,
1,535,36.6%,
2,12,0.8%,

Value,Count,Frequency (%),Unnamed: 3
0,913,62.5%,
1,535,36.6%,
2,12,0.8%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
GasA,1428
GasW,18
Grav,7
Other values (3),7

Value,Count,Frequency (%),Unnamed: 3
GasA,1428,97.8%,
GasW,18,1.2%,
Grav,7,0.5%,
Wall,4,0.3%,
OthW,2,0.1%,
Floor,1,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Ex,741
TA,428
Gd,241
Other values (2),50

Value,Count,Frequency (%),Unnamed: 3
Ex,741,50.8%,
TA,428,29.3%,
Gd,241,16.5%,
Fa,49,3.4%,
Po,1,0.1%,

0,1
Distinct count,8
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
1Story,726
2Story,445
1.5Fin,154
Other values (5),135

Value,Count,Frequency (%),Unnamed: 3
1Story,726,49.7%,
2Story,445,30.5%,
1.5Fin,154,10.5%,
SLvl,65,4.5%,
SFoyer,37,2.5%,
1.5Unf,14,1.0%,
2.5Unf,11,0.8%,
2.5Fin,8,0.5%,

0,1
Distinct count,1460
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,730.5
Minimum,1
Maximum,1460
Zeros (%),0.0%

0,1
Minimum,1.0
5-th percentile,73.95
Q1,365.75
Median,730.5
Q3,1095.2
95-th percentile,1387.0
Maximum,1460.0
Range,1459.0
Interquartile range,729.5

0,1
Standard deviation,421.61
Coef of variation,0.57715
Kurtosis,-1.2
Mean,730.5
MAD,365
Skewness,0
Sum,1066530
Variance,177760
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
1460,1,0.1%,
479,1,0.1%,
481,1,0.1%,
482,1,0.1%,
483,1,0.1%,
484,1,0.1%,
485,1,0.1%,
486,1,0.1%,
487,1,0.1%,
488,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1,1,0.1%,
2,1,0.1%,
3,1,0.1%,
4,1,0.1%,
5,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1456,1,0.1%,
1457,1,0.1%,
1458,1,0.1%,
1459,1,0.1%,
1460,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.0466
Minimum,0
Maximum,3
Zeros (%),0.1%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,1
Q3,1
95-th percentile,1
Maximum,3
Range,3
Interquartile range,0

0,1
Standard deviation,0.22034
Coef of variation,0.21053
Kurtosis,21.532
Mean,1.0466
MAD,0.090246
Skewness,4.4884
Sum,1528
Variance,0.048549
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
1,1392,95.3%,
2,65,4.5%,
3,2,0.1%,
0,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1,0.1%,
1,1392,95.3%,
2,65,4.5%,
3,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1,0.1%,
1,1392,95.3%,
2,65,4.5%,
3,2,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,735
Gd,586
Ex,100

Value,Count,Frequency (%),Unnamed: 3
TA,735,50.3%,
Gd,586,40.1%,
Ex,100,6.8%,
Fa,39,2.7%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Lvl,1311
Bnk,63
HLS,50

Value,Count,Frequency (%),Unnamed: 3
Lvl,1311,89.8%,
Bnk,63,4.3%,
HLS,50,3.4%,
Low,36,2.5%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
Gtl,1382
Mod,65
Sev,13

Value,Count,Frequency (%),Unnamed: 3
Gtl,1382,94.7%,
Mod,65,4.5%,
Sev,13,0.9%,

0,1
Distinct count,1073
Unique (%),73.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,10517
Minimum,1300
Maximum,215245
Zeros (%),0.0%

0,1
Minimum,1300.0
5-th percentile,3311.7
Q1,7553.5
Median,9478.5
Q3,11602.0
95-th percentile,17401.0
Maximum,215245.0
Range,213945.0
Interquartile range,4048.0

0,1
Standard deviation,9981.3
Coef of variation,0.94908
Kurtosis,203.24
Mean,10517
MAD,3758.8
Skewness,12.208
Sum,15354569
Variance,99626000
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
7200,25,1.7%,
9600,24,1.6%,
6000,17,1.2%,
10800,14,1.0%,
9000,14,1.0%,
8400,14,1.0%,
1680,10,0.7%,
7500,9,0.6%,
8125,8,0.5%,
9100,8,0.5%,

Value,Count,Frequency (%),Unnamed: 3
1300,1,0.1%,
1477,1,0.1%,
1491,1,0.1%,
1526,1,0.1%,
1533,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
70761,1,0.1%,
115149,1,0.1%,
159000,1,0.1%,
164660,1,0.1%,
215245,1,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Inside,1052
Corner,263
CulDSac,94
Other values (2),51

Value,Count,Frequency (%),Unnamed: 3
Inside,1052,72.1%,
Corner,263,18.0%,
CulDSac,94,6.4%,
FR2,47,3.2%,
FR3,4,0.3%,

0,1
Distinct count,111
Unique (%),7.6%
Missing (%),17.7%
Missing (n),259
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,70.05
Minimum,21
Maximum,313
Zeros (%),0.0%

0,1
Minimum,21
5-th percentile,34
Q1,59
Median,69
Q3,80
95-th percentile,107
Maximum,313
Range,292
Interquartile range,21

0,1
Standard deviation,24.285
Coef of variation,0.34668
Kurtosis,17.453
Mean,70.05
MAD,16.762
Skewness,2.1636
Sum,84130
Variance,589.75
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
60.0,143,9.8%,
70.0,70,4.8%,
80.0,69,4.7%,
50.0,57,3.9%,
75.0,53,3.6%,
65.0,44,3.0%,
85.0,40,2.7%,
78.0,25,1.7%,
21.0,23,1.6%,
90.0,23,1.6%,

Value,Count,Frequency (%),Unnamed: 3
21.0,23,1.6%,
24.0,19,1.3%,
30.0,6,0.4%,
32.0,5,0.3%,
33.0,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
160.0,1,0.1%,
168.0,1,0.1%,
174.0,2,0.1%,
182.0,1,0.1%,
313.0,2,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Reg,925
IR1,484
IR2,41

Value,Count,Frequency (%),Unnamed: 3
Reg,925,63.4%,
IR1,484,33.2%,
IR2,41,2.8%,
IR3,10,0.7%,

0,1
Distinct count,24
Unique (%),1.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,5.8445
Minimum,0
Maximum,572
Zeros (%),98.2%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,572
Range,572
Interquartile range,0

0,1
Standard deviation,48.623
Coef of variation,8.3194
Kurtosis,83.235
Mean,5.8445
MAD,11.481
Skewness,9.0113
Sum,8533
Variance,2364.2
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1434,98.2%,
80,3,0.2%,
360,2,0.1%,
528,1,0.1%,
53,1,0.1%,
120,1,0.1%,
144,1,0.1%,
156,1,0.1%,
205,1,0.1%,
232,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1434,98.2%,
53,1,0.1%,
80,3,0.2%,
120,1,0.1%,
144,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
513,1,0.1%,
514,1,0.1%,
515,1,0.1%,
528,1,0.1%,
572,1,0.1%,

0,1
Distinct count,15
Unique (%),1.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,56.897
Minimum,20
Maximum,190
Zeros (%),0.0%

0,1
Minimum,20
5-th percentile,20
Q1,20
Median,50
Q3,70
95-th percentile,160
Maximum,190
Range,170
Interquartile range,50

0,1
Standard deviation,42.301
Coef of variation,0.74346
Kurtosis,1.5802
Mean,56.897
MAD,31.283
Skewness,1.4077
Sum,83070
Variance,1789.3
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
20,536,36.7%,
60,299,20.5%,
50,144,9.9%,
120,87,6.0%,
30,69,4.7%,
160,63,4.3%,
70,60,4.1%,
80,58,4.0%,
90,52,3.6%,
190,30,2.1%,

Value,Count,Frequency (%),Unnamed: 3
20,536,36.7%,
30,69,4.7%,
40,4,0.3%,
45,12,0.8%,
50,144,9.9%,

Value,Count,Frequency (%),Unnamed: 3
90,52,3.6%,
120,87,6.0%,
160,63,4.3%,
180,10,0.7%,
190,30,2.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
RL,1151
RM,218
FV,65
Other values (2),26

Value,Count,Frequency (%),Unnamed: 3
RL,1151,78.8%,
RM,218,14.9%,
FV,65,4.5%,
RH,16,1.1%,
C (all),10,0.7%,

0,1
Distinct count,328
Unique (%),22.5%
Missing (%),0.5%
Missing (n),8
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,103.69
Minimum,0
Maximum,1600
Zeros (%),59.0%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,166
95-th percentile,456
Maximum,1600
Range,1600
Interquartile range,166

0,1
Standard deviation,181.07
Coef of variation,1.7463
Kurtosis,10.082
Mean,103.69
MAD,129.78
Skewness,2.6691
Sum,150550
Variance,32785
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,861,59.0%,
72.0,8,0.5%,
180.0,8,0.5%,
108.0,8,0.5%,
120.0,7,0.5%,
16.0,7,0.5%,
106.0,6,0.4%,
80.0,6,0.4%,
340.0,6,0.4%,
200.0,6,0.4%,

Value,Count,Frequency (%),Unnamed: 3
0.0,861,59.0%,
1.0,2,0.1%,
11.0,1,0.1%,
14.0,1,0.1%,
16.0,7,0.5%,

Value,Count,Frequency (%),Unnamed: 3
1115.0,1,0.1%,
1129.0,1,0.1%,
1170.0,1,0.1%,
1378.0,1,0.1%,
1600.0,1,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.5%
Missing (n),8

0,1
None,864
BrkFace,445
Stone,128

Value,Count,Frequency (%),Unnamed: 3
None,864,59.2%,
BrkFace,445,30.5%,
Stone,128,8.8%,
BrkCmn,15,1.0%,
(Missing),8,0.5%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),96.3%
Missing (n),1406

0,1
Shed,49
Gar2,2
Othr,2
(Missing),1406

Value,Count,Frequency (%),Unnamed: 3
Shed,49,3.4%,
Gar2,2,0.1%,
Othr,2,0.1%,
TenC,1,0.1%,
(Missing),1406,96.3%,

0,1
Distinct count,21
Unique (%),1.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,43.489
Minimum,0
Maximum,15500
Zeros (%),96.4%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,15500
Range,15500
Interquartile range,0

0,1
Standard deviation,496.12
Coef of variation,11.408
Kurtosis,701
Mean,43.489
MAD,83.88
Skewness,24.477
Sum,63494
Variance,246140
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1408,96.4%,
400,11,0.8%,
500,8,0.5%,
700,5,0.3%,
450,4,0.3%,
2000,4,0.3%,
600,4,0.3%,
1200,2,0.1%,
480,2,0.1%,
1150,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1408,96.4%,
54,1,0.1%,
350,1,0.1%,
400,11,0.8%,
450,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
2000,4,0.3%,
2500,1,0.1%,
3500,1,0.1%,
8300,1,0.1%,
15500,1,0.1%,

0,1
Distinct count,12
Unique (%),0.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.3219
Minimum,1
Maximum,12
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,2
Q1,5
Median,6
Q3,8
95-th percentile,11
Maximum,12
Range,11
Interquartile range,3

0,1
Standard deviation,2.7036
Coef of variation,0.42766
Kurtosis,-0.40411
Mean,6.3219
MAD,2.1425
Skewness,0.21205
Sum,9230
Variance,7.3096
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
6,253,17.3%,
7,234,16.0%,
5,204,14.0%,
4,141,9.7%,
8,122,8.4%,
3,106,7.3%,
10,89,6.1%,
11,79,5.4%,
9,63,4.3%,
12,59,4.0%,

Value,Count,Frequency (%),Unnamed: 3
1,58,4.0%,
2,52,3.6%,
3,106,7.3%,
4,141,9.7%,
5,204,14.0%,

Value,Count,Frequency (%),Unnamed: 3
8,122,8.4%,
9,63,4.3%,
10,89,6.1%,
11,79,5.4%,
12,59,4.0%,

0,1
Distinct count,25
Unique (%),1.7%
Missing (%),0.0%
Missing (n),0

0,1
NAmes,225
CollgCr,150
OldTown,113
Other values (22),972

Value,Count,Frequency (%),Unnamed: 3
NAmes,225,15.4%,
CollgCr,150,10.3%,
OldTown,113,7.7%,
Edwards,100,6.8%,
Somerst,86,5.9%,
Gilbert,79,5.4%,
NridgHt,77,5.3%,
Sawyer,74,5.1%,
NWAmes,73,5.0%,
SawyerW,59,4.0%,

0,1
Distinct count,202
Unique (%),13.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,46.66
Minimum,0
Maximum,547
Zeros (%),44.9%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,25.0
Q3,68.0
95-th percentile,175.05
Maximum,547.0
Range,547.0
Interquartile range,68.0

0,1
Standard deviation,66.256
Coef of variation,1.42
Kurtosis,8.4903
Mean,46.66
MAD,47.678
Skewness,2.3643
Sum,68124
Variance,4389.9
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,656,44.9%,
36,29,2.0%,
48,22,1.5%,
20,21,1.4%,
40,19,1.3%,
45,19,1.3%,
30,16,1.1%,
24,16,1.1%,
60,15,1.0%,
39,14,1.0%,

Value,Count,Frequency (%),Unnamed: 3
0,656,44.9%,
4,1,0.1%,
8,1,0.1%,
10,1,0.1%,
11,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
406,1,0.1%,
418,1,0.1%,
502,1,0.1%,
523,1,0.1%,
547,1,0.1%,

0,1
Distinct count,9
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,5.5753
Minimum,1
Maximum,9
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,4
Q1,5
Median,5
Q3,6
95-th percentile,8
Maximum,9
Range,8
Interquartile range,1

0,1
Standard deviation,1.1128
Coef of variation,0.19959
Kurtosis,1.1064
Mean,5.5753
MAD,0.88902
Skewness,0.69307
Sum,8140
Variance,1.2383
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
5,821,56.2%,
6,252,17.3%,
7,205,14.0%,
8,72,4.9%,
4,57,3.9%,
3,25,1.7%,
9,22,1.5%,
2,5,0.3%,
1,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1,1,0.1%,
2,5,0.3%,
3,25,1.7%,
4,57,3.9%,
5,821,56.2%,

Value,Count,Frequency (%),Unnamed: 3
5,821,56.2%,
6,252,17.3%,
7,205,14.0%,
8,72,4.9%,
9,22,1.5%,

0,1
Distinct count,10
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.0993
Minimum,1
Maximum,10
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,4
Q1,5
Median,6
Q3,7
95-th percentile,8
Maximum,10
Range,9
Interquartile range,2

0,1
Standard deviation,1.383
Coef of variation,0.22675
Kurtosis,0.096293
Mean,6.0993
MAD,1.098
Skewness,0.21694
Sum,8905
Variance,1.9127
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
5,397,27.2%,
6,374,25.6%,
7,319,21.8%,
8,168,11.5%,
4,116,7.9%,
9,43,2.9%,
3,20,1.4%,
10,18,1.2%,
2,3,0.2%,
1,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1,2,0.1%,
2,3,0.2%,
3,20,1.4%,
4,116,7.9%,
5,397,27.2%,

Value,Count,Frequency (%),Unnamed: 3
6,374,25.6%,
7,319,21.8%,
8,168,11.5%,
9,43,2.9%,
10,18,1.2%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
Y,1340
N,90
P,30

Value,Count,Frequency (%),Unnamed: 3
Y,1340,91.8%,
N,90,6.2%,
P,30,2.1%,

0,1
Distinct count,8
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2.7589
Minimum,0
Maximum,738
Zeros (%),99.5%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,738
Range,738
Interquartile range,0

0,1
Standard deviation,40.177
Coef of variation,14.563
Kurtosis,223.27
Mean,2.7589
MAD,5.4914
Skewness,14.828
Sum,4028
Variance,1614.2
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1453,99.5%,
738,1,0.1%,
648,1,0.1%,
576,1,0.1%,
555,1,0.1%,
519,1,0.1%,
512,1,0.1%,
480,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1453,99.5%,
480,1,0.1%,
512,1,0.1%,
519,1,0.1%,
555,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
519,1,0.1%,
555,1,0.1%,
576,1,0.1%,
648,1,0.1%,
738,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),99.5%
Missing (n),1453

0,1
Gd,3
Ex,2
Fa,2
(Missing),1453

Value,Count,Frequency (%),Unnamed: 3
Gd,3,0.2%,
Ex,2,0.1%,
Fa,2,0.1%,
(Missing),1453,99.5%,

0,1
Distinct count,8
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
CompShg,1434
Tar&Grv,11
WdShngl,6
Other values (5),9

Value,Count,Frequency (%),Unnamed: 3
CompShg,1434,98.2%,
Tar&Grv,11,0.8%,
WdShngl,6,0.4%,
WdShake,5,0.3%,
Metal,1,0.1%,
Membran,1,0.1%,
Roll,1,0.1%,
ClyTile,1,0.1%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
Gable,1141
Hip,286
Flat,13
Other values (3),20

Value,Count,Frequency (%),Unnamed: 3
Gable,1141,78.2%,
Hip,286,19.6%,
Flat,13,0.9%,
Gambrel,11,0.8%,
Mansard,7,0.5%,
Shed,2,0.1%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
Normal,1198
Partial,125
Abnorml,101
Other values (3),36

Value,Count,Frequency (%),Unnamed: 3
Normal,1198,82.1%,
Partial,125,8.6%,
Abnorml,101,6.9%,
Family,20,1.4%,
Alloca,12,0.8%,
AdjLand,4,0.3%,

0,1
Distinct count,9
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0

0,1
WD,1267
New,122
COD,43
Other values (6),28

Value,Count,Frequency (%),Unnamed: 3
WD,1267,86.8%,
New,122,8.4%,
COD,43,2.9%,
ConLD,9,0.6%,
ConLI,5,0.3%,
ConLw,5,0.3%,
CWD,4,0.3%,
Oth,3,0.2%,
Con,2,0.1%,

0,1
Distinct count,76
Unique (%),5.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,15.061
Minimum,0
Maximum,480
Zeros (%),92.1%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,160
Maximum,480
Range,480
Interquartile range,0

0,1
Standard deviation,55.757
Coef of variation,3.7021
Kurtosis,18.439
Mean,15.061
MAD,27.729
Skewness,4.1222
Sum,21989
Variance,3108.9
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1344,92.1%,
192,6,0.4%,
224,5,0.3%,
120,5,0.3%,
189,4,0.3%,
180,4,0.3%,
160,3,0.2%,
168,3,0.2%,
144,3,0.2%,
126,3,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,1344,92.1%,
40,1,0.1%,
53,1,0.1%,
60,1,0.1%,
63,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
385,1,0.1%,
396,1,0.1%,
410,1,0.1%,
440,1,0.1%,
480,1,0.1%,

0,1
Distinct count,2
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
Pave,1454
Grvl,6

Value,Count,Frequency (%),Unnamed: 3
Pave,1454,99.6%,
Grvl,6,0.4%,

0,1
Distinct count,12
Unique (%),0.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.5178
Minimum,2
Maximum,14
Zeros (%),0.0%

0,1
Minimum,2
5-th percentile,4
Q1,5
Median,6
Q3,7
95-th percentile,10
Maximum,14
Range,12
Interquartile range,2

0,1
Standard deviation,1.6254
Coef of variation,0.24938
Kurtosis,0.88076
Mean,6.5178
MAD,1.2796
Skewness,0.67634
Sum,9516
Variance,2.6419
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
6,402,27.5%,
7,329,22.5%,
5,275,18.8%,
8,187,12.8%,
4,97,6.6%,
9,75,5.1%,
10,47,3.2%,
11,18,1.2%,
3,17,1.2%,
12,11,0.8%,

Value,Count,Frequency (%),Unnamed: 3
2,1,0.1%,
3,17,1.2%,
4,97,6.6%,
5,275,18.8%,
6,402,27.5%,

Value,Count,Frequency (%),Unnamed: 3
9,75,5.1%,
10,47,3.2%,
11,18,1.2%,
12,11,0.8%,
14,1,0.1%,

0,1
Distinct count,721
Unique (%),49.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1057.4
Minimum,0
Maximum,6110
Zeros (%),2.5%

0,1
Minimum,0.0
5-th percentile,519.3
Q1,795.75
Median,991.5
Q3,1298.2
95-th percentile,1753.0
Maximum,6110.0
Range,6110.0
Interquartile range,502.5

0,1
Standard deviation,438.71
Coef of variation,0.41488
Kurtosis,13.25
Mean,1057.4
MAD,321.28
Skewness,1.5243
Sum,1543847
Variance,192460
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,37,2.5%,
864,35,2.4%,
672,17,1.2%,
912,15,1.0%,
1040,14,1.0%,
816,13,0.9%,
728,12,0.8%,
768,12,0.8%,
848,11,0.8%,
780,11,0.8%,

Value,Count,Frequency (%),Unnamed: 3
0,37,2.5%,
105,1,0.1%,
190,1,0.1%,
264,3,0.2%,
270,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
3094,1,0.1%,
3138,1,0.1%,
3200,1,0.1%,
3206,1,0.1%,
6110,1,0.1%,

0,1
Distinct count,2
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
AllPub,1459
NoSeWa,1

Value,Count,Frequency (%),Unnamed: 3
AllPub,1459,99.9%,
NoSeWa,1,0.1%,

0,1
Distinct count,274
Unique (%),18.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,94.245
Minimum,0
Maximum,857
Zeros (%),52.1%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,168
95-th percentile,335
Maximum,857
Range,857
Interquartile range,168

0,1
Standard deviation,125.34
Coef of variation,1.3299
Kurtosis,2.993
Mean,94.245
MAD,102
Skewness,1.5414
Sum,137597
Variance,15710
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,761,52.1%,
192,38,2.6%,
100,36,2.5%,
144,33,2.3%,
120,31,2.1%,
168,28,1.9%,
140,15,1.0%,
224,14,1.0%,
240,10,0.7%,
208,10,0.7%,

Value,Count,Frequency (%),Unnamed: 3
0,761,52.1%,
12,2,0.1%,
24,2,0.1%,
26,2,0.1%,
28,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
668,1,0.1%,
670,1,0.1%,
728,1,0.1%,
736,1,0.1%,
857,1,0.1%,

0,1
Distinct count,112
Unique (%),7.7%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1971.3
Minimum,1872
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1872
5-th percentile,1916
Q1,1954
Median,1973
Q3,2000
95-th percentile,2007
Maximum,2010
Range,138
Interquartile range,46

0,1
Standard deviation,30.203
Coef of variation,0.015322
Kurtosis,-0.43955
Mean,1971.3
MAD,25.067
Skewness,-0.61346
Sum,2878051
Variance,912.22
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2006,67,4.6%,
2005,64,4.4%,
2004,54,3.7%,
2007,49,3.4%,
2003,45,3.1%,
1976,33,2.3%,
1977,32,2.2%,
1920,30,2.1%,
1959,26,1.8%,
1999,25,1.7%,

Value,Count,Frequency (%),Unnamed: 3
1872,1,0.1%,
1875,1,0.1%,
1880,4,0.3%,
1882,1,0.1%,
1885,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2006,67,4.6%,
2007,49,3.4%,
2008,23,1.6%,
2009,18,1.2%,
2010,1,0.1%,

0,1
Distinct count,61
Unique (%),4.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1984.9
Minimum,1950
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1950
5-th percentile,1950
Q1,1967
Median,1994
Q3,2004
95-th percentile,2007
Maximum,2010
Range,60
Interquartile range,37

0,1
Standard deviation,20.645
Coef of variation,0.010401
Kurtosis,-1.2722
Mean,1984.9
MAD,18.623
Skewness,-0.50356
Sum,2897904
Variance,426.23
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
1950,178,12.2%,
2006,97,6.6%,
2007,76,5.2%,
2005,73,5.0%,
2004,62,4.2%,
2000,55,3.8%,
2003,51,3.5%,
2002,48,3.3%,
2008,40,2.7%,
1996,36,2.5%,

Value,Count,Frequency (%),Unnamed: 3
1950,178,12.2%,
1951,4,0.3%,
1952,5,0.3%,
1953,10,0.7%,
1954,14,1.0%,

Value,Count,Frequency (%),Unnamed: 3
2006,97,6.6%,
2007,76,5.2%,
2008,40,2.7%,
2009,23,1.6%,
2010,6,0.4%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2007.8
Minimum,2006
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,2006
5-th percentile,2006
Q1,2007
Median,2008
Q3,2009
95-th percentile,2010
Maximum,2010
Range,4
Interquartile range,2

0,1
Standard deviation,1.3281
Coef of variation,0.00066146
Kurtosis,-1.1906
Mean,2007.8
MAD,1.1487
Skewness,0.096269
Sum,2931411
Variance,1.7638
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2009,338,23.2%,
2007,329,22.5%,
2006,314,21.5%,
2008,304,20.8%,
2010,175,12.0%,

Value,Count,Frequency (%),Unnamed: 3
2006,314,21.5%,
2007,329,22.5%,
2008,304,20.8%,
2009,338,23.2%,
2010,175,12.0%,

Value,Count,Frequency (%),Unnamed: 3
2006,314,21.5%,
2007,329,22.5%,
2008,304,20.8%,
2009,338,23.2%,
2010,175,12.0%,

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2003,2003,Gable,CompShg,VinylSd,VinylSd,BrkFace,196.0,Gd,TA,PConc,Gd,TA,No,GLQ,706,Unf,0,150,856,GasA,Ex,Y,SBrkr,856,854,0,1710,1,0,2,1,3,1,Gd,8,Typ,0,,Attchd,2003.0,RFn,2,548,TA,TA,Y,0,61,0,0,0,0,,,,0,2,2008,WD,Normal
2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,1Story,6,8,1976,1976,Gable,CompShg,MetalSd,MetalSd,,0.0,TA,TA,CBlock,Gd,TA,Gd,ALQ,978,Unf,0,284,1262,GasA,Ex,Y,SBrkr,1262,0,0,1262,0,1,2,0,3,1,TA,6,Typ,1,TA,Attchd,1976.0,RFn,2,460,TA,TA,Y,298,0,0,0,0,0,,,,0,5,2007,WD,Normal
3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2001,2002,Gable,CompShg,VinylSd,VinylSd,BrkFace,162.0,Gd,TA,PConc,Gd,TA,Mn,GLQ,486,Unf,0,434,920,GasA,Ex,Y,SBrkr,920,866,0,1786,1,0,2,1,3,1,Gd,6,Typ,1,TA,Attchd,2001.0,RFn,2,608,TA,TA,Y,0,42,0,0,0,0,,,,0,9,2008,WD,Normal
4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,2Story,7,5,1915,1970,Gable,CompShg,Wd Sdng,Wd Shng,,0.0,TA,TA,BrkTil,TA,Gd,No,ALQ,216,Unf,0,540,756,GasA,Gd,Y,SBrkr,961,756,0,1717,1,0,1,0,3,1,Gd,7,Typ,1,Gd,Detchd,1998.0,Unf,3,642,TA,TA,Y,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml
5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,2Story,8,5,2000,2000,Gable,CompShg,VinylSd,VinylSd,BrkFace,350.0,Gd,TA,PConc,Gd,TA,Av,GLQ,655,Unf,0,490,1145,GasA,Ex,Y,SBrkr,1145,1053,0,2198,1,0,2,1,4,1,Gd,9,Typ,1,TA,Attchd,2000.0,RFn,3,836,TA,TA,Y,192,84,0,0,0,0,,,,0,12,2008,WD,Normal


# Manage Missing Data

## Manage Missing Categorical Data

In [80]:
# Get the categorical (object-dtype) columns of X_train that contain missing values
cat_feats_missing_vals = list(X_train.columns[(X_train.dtypes == 'object') & X_train.isnull().any()])
print( cat_feats_missing_vals )

['Alley', 'MasVnrType', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Electrical', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PoolQC', 'Fence', 'MiscFeature']
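
The selection above combines two boolean Series, both indexed by column name: one marks object-dtype columns, the other marks columns with at least one null. A minimal sketch on a toy frame (the `toy` data is made up for illustration and is not part of the housing dataset) shows how the mask behaves:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; dtype=object is set explicitly so the string columns
# are treated as categorical regardless of the pandas version.
toy = pd.DataFrame({
    'Alley': pd.Series(['Grvl', np.nan, 'Pave'], dtype=object),    # categorical, has a gap
    'Street': pd.Series(['Pave', 'Pave', 'Grvl'], dtype=object),   # categorical, complete
    'LotFrontage': [65.0, np.nan, 68.0],                           # numeric, has a gap
})

is_categorical = toy.dtypes == 'object'   # one boolean per column
has_missing = toy.isnull().any()          # one boolean per column

# Only columns that satisfy both conditions are kept
cat_missing = list(toy.columns[is_categorical & has_missing])
print(cat_missing)
```

Only `Alley` is selected here: `Street` has no gaps, and `LotFrontage` is numeric and so is left for the numeric imputation steps.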


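Based on the provided data description, a missing value in most of these features means the house does not have that feature (for example, no alley access or no garage), so most of the missing values should be replaced with "NA".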

The only special cases are:
 
MasVnrType: Masonry veneer type
- None: None

BsmtExposure: Refers to walkout or garden level walls
- No:   No Exposure
- NA:   No Basement

Electrical: Electrical system
- No good label exists for a missing value, so we will remove this feature. Additionally, the Profile Report shows this feature has a uniqueness of 0.4%, so it may not tell us much about the data anyway.

In [81]:
# A missing value in BsmtExposure could mean either "no exposure" or "no basement".
# Cross-reference with BsmtCond: count the rows where BsmtExposure is missing but BsmtCond is present.
(X_train['BsmtExposure'].isnull() & X_train['BsmtCond'].notnull()).sum()

0

This never occurs in the training set: every row with a missing BsmtExposure also has a missing BsmtCond. We can therefore safely assume that missing values in BsmtExposure are due to "no basement".
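
The single count above can be extended into a full cross-tabulation of missingness, which shows all four combinations at once. The following is a sketch on made-up toy data (the `toy` frame is an assumption for illustration); the same calls could be applied to `X_train`, and to `X_val` and `X_test` as well, since the check above only covers the training set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the two basement columns
toy = pd.DataFrame({
    'BsmtExposure': pd.Series(['No', np.nan, 'Gd', np.nan], dtype=object),
    'BsmtCond': pd.Series(['TA', np.nan, 'Gd', np.nan], dtype=object),
})

exposure_missing = toy['BsmtExposure'].isnull()
cond_missing = toy['BsmtCond'].isnull()

# Rows: is BsmtExposure missing? Columns: is BsmtCond missing?
table = pd.crosstab(exposure_missing, cond_missing)
print(table)

# Rows where exposure is missing although a basement condition is recorded
unexplained = int((exposure_missing & ~cond_missing).sum())
print(unexplained)
```

If `unexplained` is zero, every missing BsmtExposure coincides with a missing BsmtCond, which is consistent with the "no basement" reading.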

In [82]:
# Manage special cases in categorical data
X_train = X_train.fillna( value={'MasVnrType' : 'None', 'BsmtExposure' : 'NA'}, inplace=False )
X_val   = X_val.fillna( value={'MasVnrType' : 'None', 'BsmtExposure' : 'NA'}, inplace=False )
X_test  = X_test.fillna( value={'MasVnrType' : 'None', 'BsmtExposure' : 'NA'}, inplace=False )

X_train.drop( columns=['Electrical'], inplace=True )
X_val.drop( columns=['Electrical'], inplace=True )
X_test.drop( columns=['Electrical'], inplace=True )

In [83]:
# Manage all other cases of missing values in categorical data:
# map every categorical feature that had missing values in X_train to 'NA'
value = {feat: 'NA' for feat in cat_feats_missing_vals}

X_train = X_train.fillna( value=value, inplace=False )
X_val   = X_val.fillna( value=value, inplace=False )
X_test  = X_test.fillna( value=value, inplace=False )
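
Note that `cat_feats_missing_vals` was built from `X_train` only, so a categorical column that is complete in training but has gaps in `X_val` or `X_test` is not filled by the cell above. The profile report of `X_test` in the next cell still lists a few missing values in categorical columns, which is consistent with this. A minimal sketch for spotting such columns follows; the helper `cat_cols_with_missing` and the toy frames are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

def cat_cols_with_missing(df):
    """Return the set of object-dtype columns that contain at least one null."""
    mask = (df.dtypes == 'object') & df.isnull().any()
    return set(df.columns[mask])

# Hypothetical toy splits: 'Functional' is complete in train but not in test
train = pd.DataFrame({
    'Alley': pd.Series(['Grvl', np.nan], dtype=object),
    'Functional': pd.Series(['Typ', 'Min1'], dtype=object),
})
test = pd.DataFrame({
    'Alley': pd.Series([np.nan, 'Pave'], dtype=object),
    'Functional': pd.Series(['Typ', np.nan], dtype=object),
})

# Columns with gaps in the test split that the train-derived list would miss
only_in_test = cat_cols_with_missing(test) - cat_cols_with_missing(train)
print(sorted(only_in_test))
```

Whether such columns should also receive "NA" depends on the data description; for a feature where "NA" is not a valid level, another strategy (for example, the most frequent value from the training set) may be more appropriate.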

In [87]:
pp.ProfileReport(X_test)

0,1
Number of variables,79
Number of observations,1459
Total Missing (%),0.3%
Total size in memory,900.6 KiB
Average record size in memory,632.1 B

0,1
Numeric,37
Categorical,42
Boolean,0
Date,0
Text (Unique),0
Rejected,0
Unsupported,0

0,1
Distinct count,789
Unique (%),54.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1156.5
Minimum,407
Maximum,5095
Zeros (%),0.0%

0,1
Minimum,407.0
5-th percentile,630.0
Q1,873.5
Median,1079.0
Q3,1382.5
95-th percentile,1829.1
Maximum,5095.0
Range,4688.0
Interquartile range,509.0

0,1
Standard deviation,398.17
Coef of variation,0.34427
Kurtosis,8.0539
Mean,1156.5
MAD,306.3
Skewness,1.5582
Sum,1687384
Variance,158540
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
864,21,1.4%,
546,12,0.8%,
1040,12,0.8%,
960,11,0.8%,
936,10,0.7%,
816,9,0.6%,
1008,8,0.5%,
768,8,0.5%,
1072,7,0.5%,
1152,7,0.5%,

Value,Count,Frequency (%),Unnamed: 3
407,1,0.1%,
432,1,0.1%,
442,1,0.1%,
448,1,0.1%,
453,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2674,1,0.1%,
2696,1,0.1%,
2726,1,0.1%,
3820,1,0.1%,
5095,1,0.1%,

0,1
Distinct count,407
Unique (%),27.9%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,325.97
Minimum,0
Maximum,1862
Zeros (%),57.5%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,676.0
95-th percentile,1116.8
Maximum,1862.0
Range,1862.0
Interquartile range,676.0

0,1
Standard deviation,420.61
Coef of variation,1.2903
Kurtosis,-0.27544
Mean,325.97
MAD,376.87
Skewness,0.91288
Sum,475587
Variance,176910
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,839,57.5%,
546,15,1.0%,
728,8,0.5%,
504,8,0.5%,
886,7,0.5%,
600,6,0.4%,
720,6,0.4%,
672,5,0.3%,
462,5,0.3%,
630,5,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,839,57.5%,
125,1,0.1%,
144,1,0.1%,
180,1,0.1%,
182,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1721,1,0.1%,
1778,1,0.1%,
1788,1,0.1%,
1836,1,0.1%,
1862,1,0.1%,

0,1
Distinct count,13
Unique (%),0.9%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.7944
Minimum,0
Maximum,360
Zeros (%),99.1%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,360
Range,360
Interquartile range,0

0,1
Standard deviation,20.208
Coef of variation,11.262
Kurtosis,170.2
Mean,1.7944
MAD,3.5568
Skewness,12.524
Sum,2618
Variance,408.36
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1446,99.1%,
153,2,0.1%,
360,1,0.1%,
323,1,0.1%,
255,1,0.1%,
225,1,0.1%,
224,1,0.1%,
219,1,0.1%,
176,1,0.1%,
174,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1446,99.1%,
86,1,0.1%,
120,1,0.1%,
150,1,0.1%,
153,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
224,1,0.1%,
225,1,0.1%,
255,1,0.1%,
323,1,0.1%,
360,1,0.1%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
,1352
Grvl,70
Pave,37

Value,Count,Frequency (%),Unnamed: 3
,1352,92.7%,
Grvl,70,4.8%,
Pave,37,2.5%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2.854
Minimum,0
Maximum,6
Zeros (%),0.1%

0,1
Minimum,0
5-th percentile,2
Q1,2
Median,3
Q3,3
95-th percentile,4
Maximum,6
Range,6
Interquartile range,1

0,1
Standard deviation,0.82979
Coef of variation,0.29074
Kurtosis,1.686
Mean,2.854
MAD,0.59206
Skewness,0.43662
Sum,4164
Variance,0.68855
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
3,792,54.3%,
2,384,26.3%,
4,187,12.8%,
1,53,3.6%,
5,27,1.9%,
6,14,1.0%,
0,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,2,0.1%,
1,53,3.6%,
2,384,26.3%,
3,792,54.3%,
4,187,12.8%,

Value,Count,Frequency (%),Unnamed: 3
2,384,26.3%,
3,792,54.3%,
4,187,12.8%,
5,27,1.9%,
6,14,1.0%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
1Fam,1205
TwnhsE,113
Duplex,57
Other values (2),84

Value,Count,Frequency (%),Unnamed: 3
1Fam,1205,82.6%,
TwnhsE,113,7.7%,
Duplex,57,3.9%,
Twnhs,53,3.6%,
2fmCon,31,2.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,1295
Fa,59
Gd,57
Other values (2),48

Value,Count,Frequency (%),Unnamed: 3
TA,1295,88.8%,
Fa,59,4.0%,
Gd,57,3.9%,
,45,3.1%,
Po,3,0.2%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
No,951
Av,197
Gd,142
Other values (2),169

Value,Count,Frequency (%),Unnamed: 3
No,951,65.2%,
Av,197,13.5%,
Gd,142,9.7%,
Mn,125,8.6%,
,44,3.0%,

0,1
Distinct count,670
Unique (%),45.9%
Missing (%),0.1%
Missing (n),1
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,439.2
Minimum,0
Maximum,4010
Zeros (%),31.7%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,350.5
Q3,753.5
95-th percentile,1290.6
Maximum,4010.0
Range,4010.0
Interquartile range,753.5

0,1
Standard deviation,455.27
Coef of variation,1.0366
Kurtosis,2.673
Mean,439.2
MAD,374.37
Skewness,1.1657
Sum,640360
Variance,207270
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,462,31.7%,
24.0,15,1.0%,
602.0,6,0.4%,
276.0,6,0.4%,
288.0,5,0.3%,
16.0,5,0.3%,
758.0,5,0.3%,
300.0,5,0.3%,
476.0,4,0.3%,
252.0,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0.0,462,31.7%,
16.0,5,0.3%,
20.0,3,0.2%,
24.0,15,1.0%,
28.0,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2146.0,1,0.1%,
2158.0,1,0.1%,
2257.0,1,0.1%,
2288.0,1,0.1%,
4010.0,1,0.1%,

0,1
Distinct count,162
Unique (%),11.1%
Missing (%),0.1%
Missing (n),1
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,52.619
Minimum,0
Maximum,1526
Zeros (%),87.6%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,0.0
95-th percentile,448.15
Maximum,1526.0
Range,1526.0
Interquartile range,0.0

0,1
Standard deviation,176.75
Coef of variation,3.3591
Kurtosis,17.667
Mean,52.619
MAD,92.449
Skewness,4.0413
Sum,76719
Variance,31242
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,1278,87.6%,
294.0,3,0.2%,
483.0,3,0.2%,
162.0,3,0.2%,
144.0,2,0.1%,
247.0,2,0.1%,
435.0,2,0.1%,
288.0,2,0.1%,
590.0,2,0.1%,
596.0,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,1278,87.6%,
6.0,1,0.1%,
12.0,1,0.1%,
38.0,1,0.1%,
40.0,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1073.0,1,0.1%,
1083.0,1,0.1%,
1164.0,1,0.1%,
1393.0,1,0.1%,
1526.0,1,0.1%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
GLQ,431
Unf,421
ALQ,209
Other values (4),398

Value,Count,Frequency (%),Unnamed: 3
GLQ,431,29.5%,
Unf,421,28.9%,
ALQ,209,14.3%,
Rec,155,10.6%,
BLQ,121,8.3%,
LwQ,80,5.5%,
,42,2.9%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Unf,1237
Rec,51
,42
Other values (4),129

Value,Count,Frequency (%),Unnamed: 3
Unf,1237,84.8%,
Rec,51,3.5%,
,42,2.9%,
LwQ,41,2.8%,
BLQ,35,2.4%,
ALQ,33,2.3%,
GLQ,20,1.4%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.1%
Missing (n),2
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.43445
Minimum,0
Maximum,3
Zeros (%),58.2%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,1
95-th percentile,1
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.53065
Coef of variation,1.2214
Kurtosis,-0.64498
Mean,0.43445
MAD,0.50632
Skewness,0.6497
Sum,633
Variance,0.28159
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,849,58.2%,
1.0,584,40.0%,
2.0,23,1.6%,
3.0,1,0.1%,
(Missing),2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,849,58.2%,
1.0,584,40.0%,
2.0,23,1.6%,
3.0,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,849,58.2%,
1.0,584,40.0%,
2.0,23,1.6%,
3.0,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.1%
Missing (n),2
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.065202
Minimum,0
Maximum,2
Zeros (%),93.5%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,1
Maximum,2
Range,2
Interquartile range,0

0,1
Standard deviation,0.25247
Coef of variation,3.8721
Kurtosis,13.551
Mean,0.065202
MAD,0.12208
Skewness,3.7799
Sum,95
Variance,0.06374
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,1364,93.5%,
1.0,91,6.2%,
2.0,2,0.1%,
(Missing),2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,1364,93.5%,
1.0,91,6.2%,
2.0,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,1364,93.5%,
1.0,91,6.2%,
2.0,2,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,634
Gd,591
Ex,137
Other values (2),97

Value,Count,Frequency (%),Unnamed: 3
TA,634,43.5%,
Gd,591,40.5%,
Ex,137,9.4%,
Fa,53,3.6%,
,44,3.0%,

0,1
Distinct count,794
Unique (%),54.4%
Missing (%),0.1%
Missing (n),1
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,554.29
Minimum,0
Maximum,2140
Zeros (%),8.4%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,219.25
Median,460.0
Q3,797.75
95-th percentile,1488.4
Maximum,2140.0
Range,2140.0
Interquartile range,578.5

0,1
Standard deviation,437.26
Coef of variation,0.78886
Kurtosis,0.33254
Mean,554.29
MAD,349.7
Skewness,0.91992
Sum,808160
Variance,191200
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,123,8.4%,
384.0,11,0.8%,
624.0,8,0.5%,
348.0,7,0.5%,
100.0,7,0.5%,
738.0,7,0.5%,
480.0,7,0.5%,
672.0,7,0.5%,
816.0,6,0.4%,
294.0,6,0.4%,

Value,Count,Frequency (%),Unnamed: 3
0.0,123,8.4%,
17.0,1,0.1%,
20.0,1,0.1%,
22.0,1,0.1%,
25.0,3,0.2%,

Value,Count,Frequency (%),Unnamed: 3
1921.0,1,0.1%,
1958.0,1,0.1%,
1967.0,1,0.1%,
2062.0,1,0.1%,
2140.0,1,0.1%,

0,1
Distinct count,2
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
Y,1358
N,101

Value,Count,Frequency (%),Unnamed: 3
Y,1358,93.1%,
N,101,6.9%,

0,1
Distinct count,9
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0

0,1
Norm,1251
Feedr,83
Artery,44
Other values (6),81

Value,Count,Frequency (%),Unnamed: 3
Norm,1251,85.7%,
Feedr,83,5.7%,
Artery,44,3.0%,
RRAn,24,1.6%,
PosN,20,1.4%,
RRAe,17,1.2%,
PosA,12,0.8%,
RRNe,4,0.3%,
RRNn,4,0.3%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Norm,1444
Feedr,7
Artery,3
Other values (2),5

Value,Count,Frequency (%),Unnamed: 3
Norm,1444,99.0%,
Feedr,7,0.5%,
Artery,3,0.2%,
PosA,3,0.2%,
PosN,2,0.1%,

0,1
Distinct count,131
Unique (%),9.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,24.243
Minimum,0
Maximum,1012
Zeros (%),82.8%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,0.0
95-th percentile,169.1
Maximum,1012.0
Range,1012.0
Interquartile range,0.0

0,1
Standard deviation,67.228
Coef of variation,2.773
Kurtosis,40.129
Mean,24.243
MAD,40.173
Skewness,4.6692
Sum,35371
Variance,4519.6
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1208,82.8%,
96,7,0.5%,
168,7,0.5%,
112,7,0.5%,
84,6,0.4%,
144,6,0.4%,
192,5,0.3%,
60,5,0.3%,
180,5,0.3%,
160,5,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,1208,82.8%,
16,1,0.1%,
18,1,0.1%,
20,1,0.1%,
23,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
368,1,0.1%,
429,1,0.1%,
432,1,0.1%,
584,1,0.1%,
1012,1,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,1256
Gd,153
Fa,39
Other values (2),11

Value,Count,Frequency (%),Unnamed: 3
TA,1256,86.1%,
Gd,153,10.5%,
Fa,39,2.7%,
Ex,9,0.6%,
Po,2,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,892
Gd,491
Ex,55

Value,Count,Frequency (%),Unnamed: 3
TA,892,61.1%,
Gd,491,33.7%,
Ex,55,3.8%,
Fa,21,1.4%,

0,1
Distinct count,14
Unique (%),1.0%
Missing (%),0.1%
Missing (n),1

0,1
VinylSd,510
MetalSd,230
HdBoard,220
Other values (10),498

Value,Count,Frequency (%),Unnamed: 3
VinylSd,510,35.0%,
MetalSd,230,15.8%,
HdBoard,220,15.1%,
Wd Sdng,205,14.1%,
Plywood,113,7.7%,
CemntBd,65,4.5%,
BrkFace,37,2.5%,
WdShing,30,2.1%,
AsbShng,24,1.6%,
Stucco,18,1.2%,

0,1
Distinct count,16
Unique (%),1.1%
Missing (%),0.1%
Missing (n),1

0,1
VinylSd,510
MetalSd,233
HdBoard,199
Other values (12),516

Value,Count,Frequency (%),Unnamed: 3
VinylSd,510,35.0%,
MetalSd,233,16.0%,
HdBoard,199,13.6%,
Wd Sdng,194,13.3%,
Plywood,128,8.8%,
CmentBd,66,4.5%,
Wd Shng,43,2.9%,
BrkFace,22,1.5%,
Stucco,21,1.4%,
AsbShng,18,1.2%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
,1169
MnPrv,172
GdPrv,59
Other values (2),59

Value,Count,Frequency (%),Unnamed: 3
,1169,80.1%,
MnPrv,172,11.8%,
GdPrv,59,4.0%,
GdWo,58,4.0%,
MnWw,1,0.1%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
,730
Gd,364
TA,279
Other values (3),86

Value,Count,Frequency (%),Unnamed: 3
,730,50.0%,
Gd,364,24.9%,
TA,279,19.1%,
Fa,41,2.8%,
Po,26,1.8%,
Ex,19,1.3%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.58122
Minimum,0
Maximum,4
Zeros (%),50.0%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,1
95-th percentile,2
Maximum,4
Range,4
Interquartile range,1

0,1
Standard deviation,0.64742
Coef of variation,1.1139
Kurtosis,0.38733
Mean,0.58122
MAD,0.58162
Skewness,0.81986
Sum,848
Variance,0.41915
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,730,50.0%,
1,618,42.4%,
2,104,7.1%,
3,6,0.4%,
4,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,730,50.0%,
1,618,42.4%,
2,104,7.1%,
3,6,0.4%,
4,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,730,50.0%,
1,618,42.4%,
2,104,7.1%,
3,6,0.4%,
4,1,0.1%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
PConc,661
CBlock,601
BrkTil,165
Other values (3),32

Value,Count,Frequency (%),Unnamed: 3
PConc,661,45.3%,
CBlock,601,41.2%,
BrkTil,165,11.3%,
Slab,25,1.7%,
Stone,5,0.3%,
Wood,2,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.5709
Minimum,0
Maximum,4
Zeros (%),0.2%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,2
Q3,2
95-th percentile,2
Maximum,4
Range,4
Interquartile range,1

0,1
Standard deviation,0.55519
Coef of variation,0.35341
Kurtosis,-0.23234
Mean,1.5709
MAD,0.52222
Skewness,0.29584
Sum,2292
Variance,0.30824
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2,762,52.2%,
1,659,45.2%,
3,31,2.1%,
4,4,0.3%,
0,3,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,3,0.2%,
1,659,45.2%,
2,762,52.2%,
3,31,2.1%,
4,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,3,0.2%,
1,659,45.2%,
2,762,52.2%,
3,31,2.1%,
4,4,0.3%,

0,1
Distinct count,8
Unique (%),0.5%
Missing (%),0.1%
Missing (n),2

0,1
Typ,1357
Min2,36
Min1,34
Other values (4),30

Value,Count,Frequency (%),Unnamed: 3
Typ,1357,93.0%,
Min2,36,2.5%,
Min1,34,2.3%,
Mod,20,1.4%,
Maj1,5,0.3%,
Maj2,4,0.3%,
Sev,1,0.1%,
(Missing),2,0.1%,

0,1
Distinct count,460
Unique (%),31.5%
Missing (%),0.1%
Missing (n),1
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,472.77
Minimum,0
Maximum,1488
Zeros (%),5.2%

0,1
Minimum,0
5-th percentile,0
Q1,318
Median,480
Q3,576
95-th percentile,864
Maximum,1488
Range,1488
Interquartile range,258

0,1
Standard deviation,217.05
Coef of variation,0.4591
Kurtosis,0.96685
Mean,472.77
MAD,162.68
Skewness,0.30024
Sum,689300
Variance,47110
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,76,5.2%,
576.0,50,3.4%,
440.0,47,3.2%,
484.0,34,2.3%,
400.0,33,2.3%,
528.0,32,2.2%,
240.0,31,2.1%,
480.0,30,2.1%,
308.0,28,1.9%,
264.0,27,1.9%,

Value,Count,Frequency (%),Unnamed: 3
0.0,76,5.2%,
100.0,1,0.1%,
160.0,1,0.1%,
162.0,2,0.1%,
164.0,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1200.0,1,0.1%,
1231.0,1,0.1%,
1314.0,1,0.1%,
1348.0,1,0.1%,
1488.0,1,0.1%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),0.1%
Missing (n),1
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.7661
Minimum,0
Maximum,5
Zeros (%),5.2%

0,1
Minimum,0
5-th percentile,0
Q1,1
Median,2
Q3,2
95-th percentile,3
Maximum,5
Range,5
Interquartile range,1

0,1
Standard deviation,0.77595
Coef of variation,0.43935
Kurtosis,0.24961
Mean,1.7661
MAD,0.61184
Skewness,-0.10714
Sum,2575
Variance,0.60209
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2.0,770,52.8%,
1.0,407,27.9%,
3.0,193,13.2%,
0.0,76,5.2%,
4.0,11,0.8%,
5.0,1,0.1%,
(Missing),1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,76,5.2%,
1.0,407,27.9%,
2.0,770,52.8%,
3.0,193,13.2%,
4.0,11,0.8%,

Value,Count,Frequency (%),Unnamed: 3
1.0,407,27.9%,
2.0,770,52.8%,
3.0,193,13.2%,
4.0,11,0.8%,
5.0,1,0.1%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
TA,1328
,78
Fa,39
Other values (3),14

Value,Count,Frequency (%),Unnamed: 3
TA,1328,91.0%,
,78,5.3%,
Fa,39,2.7%,
Po,7,0.5%,
Gd,6,0.4%,
Ex,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Unf,625
RFn,389
Fin,367

Value,Count,Frequency (%),Unnamed: 3
Unf,625,42.8%,
RFn,389,26.7%,
Fin,367,25.2%,
,78,5.3%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,1293
,78
Fa,76
Other values (2),12

Value,Count,Frequency (%),Unnamed: 3
TA,1293,88.6%,
,78,5.3%,
Fa,76,5.2%,
Gd,10,0.7%,
Po,2,0.1%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Attchd,853
Detchd,392
BuiltIn,98
Other values (4),116

Value,Count,Frequency (%),Unnamed: 3
Attchd,853,58.5%,
Detchd,392,26.9%,
BuiltIn,98,6.7%,
,76,5.2%,
Basment,17,1.2%,
2Types,17,1.2%,
CarPort,6,0.4%,

0,1
Distinct count,98
Unique (%),6.7%
Missing (%),5.3%
Missing (n),78
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1977.7
Minimum,1895
Maximum,2207
Zeros (%),0.0%

0,1
Minimum,1895
5-th percentile,1926
Q1,1959
Median,1979
Q3,2002
95-th percentile,2007
Maximum,2207
Range,312
Interquartile range,43

0,1
Standard deviation,26.431
Coef of variation,0.013364
Kurtosis,3.4979
Mean,1977.7
MAD,21.847
Skewness,-0.15836
Sum,2731200
Variance,698.61
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2005.0,77,5.3%,
2007.0,66,4.5%,
2006.0,56,3.8%,
2004.0,46,3.2%,
2003.0,42,2.9%,
2008.0,32,2.2%,
1977.0,31,2.1%,
2000.0,28,1.9%,
2002.0,27,1.9%,
1950.0,27,1.9%,

Value,Count,Frequency (%),Unnamed: 3
1895.0,1,0.1%,
1896.0,1,0.1%,
1900.0,5,0.3%,
1910.0,7,0.5%,
1915.0,5,0.3%,

Value,Count,Frequency (%),Unnamed: 3
2007.0,66,4.5%,
2008.0,32,2.2%,
2009.0,8,0.5%,
2010.0,2,0.1%,
2207.0,1,0.1%,

0,1
Distinct count,879
Unique (%),60.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1486
Minimum,407
Maximum,5095
Zeros (%),0.0%

0,1
Minimum,407.0
5-th percentile,864.0
Q1,1117.5
Median,1432.0
Q3,1721.0
95-th percentile,2461.3
Maximum,5095.0
Range,4688.0
Interquartile range,603.5

0,1
Standard deviation,485.57
Coef of variation,0.32675
Kurtosis,2.9203
Mean,1486
MAD,371.21
Skewness,1.1304
Sum,2168141
Variance,235770
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
864,19,1.3%,
1092,18,1.2%,
1040,11,0.8%,
1456,10,0.7%,
936,9,0.6%,
1200,9,0.6%,
960,7,0.5%,
1152,6,0.4%,
1728,6,0.4%,
1358,6,0.4%,

Value,Count,Frequency (%),Unnamed: 3
407,1,0.1%,
492,1,0.1%,
498,1,0.1%,
540,1,0.1%,
572,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
3390,1,0.1%,
3500,1,0.1%,
3672,1,0.1%,
3820,1,0.1%,
5095,1,0.1%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.37766
Minimum,0
Maximum,2
Zeros (%),63.1%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,1
95-th percentile,1
Maximum,2
Range,2
Interquartile range,1

0,1
Standard deviation,0.50302
Coef of variation,1.3319
Kurtosis,-0.9887
Mean,0.37766
MAD,0.47679
Skewness,0.71473
Sum,551
Variance,0.25303
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,921,63.1%,
1,525,36.0%,
2,13,0.9%,

Value,Count,Frequency (%),Unnamed: 3
0,921,63.1%,
1,525,36.0%,
2,13,0.9%,

Value,Count,Frequency (%),Unnamed: 3
0,921,63.1%,
1,525,36.0%,
2,13,0.9%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
GasA,1446
GasW,9
Wall,2

Value,Count,Frequency (%),Unnamed: 3
GasA,1446,99.1%,
GasW,9,0.6%,
Wall,2,0.1%,
Grav,2,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Ex,752
TA,429
Gd,233
Other values (2),45

Value,Count,Frequency (%),Unnamed: 3
Ex,752,51.5%,
TA,429,29.4%,
Gd,233,16.0%,
Fa,43,2.9%,
Po,2,0.1%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
1Story,745
2Story,427
1.5Fin,160
Other values (4),127

Value,Count,Frequency (%),Unnamed: 3
1Story,745,51.1%,
2Story,427,29.3%,
1.5Fin,160,11.0%,
SLvl,63,4.3%,
SFoyer,46,3.2%,
2.5Unf,13,0.9%,
1.5Unf,5,0.3%,

0,1
Distinct count,1459
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2190
Minimum,1461
Maximum,2919
Zeros (%),0.0%

0,1
Minimum,1461.0
5-th percentile,1533.9
Q1,1825.5
Median,2190.0
Q3,2554.5
95-th percentile,2846.1
Maximum,2919.0
Range,1458.0
Interquartile range,729.0

0,1
Standard deviation,421.32
Coef of variation,0.19238
Kurtosis,-1.2
Mean,2190
MAD,364.75
Skewness,0
Sum,3195210
Variance,177510
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2047,1,0.1%,
2526,1,0.1%,
2528,1,0.1%,
2529,1,0.1%,
2530,1,0.1%,
2531,1,0.1%,
2532,1,0.1%,
2533,1,0.1%,
2534,1,0.1%,
2535,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1461,1,0.1%,
1462,1,0.1%,
1463,1,0.1%,
1464,1,0.1%,
1465,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2915,1,0.1%,
2916,1,0.1%,
2917,1,0.1%,
2918,1,0.1%,
2919,1,0.1%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.0425
Minimum,0
Maximum,2
Zeros (%),0.1%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,1
Q3,1
95-th percentile,1
Maximum,2
Range,2
Interquartile range,0

0,1
Standard deviation,0.20847
Coef of variation,0.19997
Kurtosis,17.472
Mean,1.0425
MAD,0.084003
Skewness,4.0791
Sum,1521
Variance,0.04346
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
1,1393,95.5%,
2,64,4.4%,
0,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,2,0.1%,
1,1393,95.5%,
2,64,4.4%,

Value,Count,Frequency (%),Unnamed: 3
0,2,0.1%,
1,1393,95.5%,
2,64,4.4%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.1%
Missing (n),1

0,1
TA,757
Gd,565
Ex,105

Value,Count,Frequency (%),Unnamed: 3
TA,757,51.9%,
Gd,565,38.7%,
Ex,105,7.2%,
Fa,31,2.1%,
(Missing),1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Lvl,1311
HLS,70
Bnk,54

Value,Count,Frequency (%),Unnamed: 3
Lvl,1311,89.9%,
HLS,70,4.8%,
Bnk,54,3.7%,
Low,24,1.6%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
Gtl,1396
Mod,60
Sev,3

Value,Count,Frequency (%),Unnamed: 3
Gtl,1396,95.7%,
Mod,60,4.1%,
Sev,3,0.2%,

0,1
Distinct count,1106
Unique (%),75.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,9819.2
Minimum,1470
Maximum,56600
Zeros (%),0.0%

0,1
Minimum,1470.0
5-th percentile,3085.5
Q1,7391.0
Median,9399.0
Q3,11518.0
95-th percentile,16873.0
Maximum,56600.0
Range,55130.0
Interquartile range,4126.5

0,1
Standard deviation,4955.5
Coef of variation,0.50468
Kurtosis,20.747
Mean,9819.2
MAD,3112.1
Skewness,3.1152
Sum,14326156
Variance,24557000
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
9600,20,1.4%,
7200,18,1.2%,
6000,17,1.2%,
9000,15,1.0%,
7500,12,0.8%,
10800,11,0.8%,
6240,10,0.7%,
7000,9,0.6%,
6120,9,0.6%,
1680,8,0.5%,

Value,Count,Frequency (%),Unnamed: 3
1470,1,0.1%,
1476,1,0.1%,
1477,1,0.1%,
1484,1,0.1%,
1488,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
47007,1,0.1%,
47280,1,0.1%,
50102,1,0.1%,
51974,1,0.1%,
56600,1,0.1%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Inside,1081
Corner,248
CulDSac,82
Other values (2),48

Value,Count,Frequency (%),Unnamed: 3
Inside,1081,74.1%,
Corner,248,17.0%,
CulDSac,82,5.6%,
FR2,38,2.6%,
FR3,10,0.7%,

0,1
Distinct count,116
Unique (%),8.0%
Missing (%),15.6%
Missing (n),227
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,68.58
Minimum,21
Maximum,200
Zeros (%),0.0%

0,1
Minimum,21.0
5-th percentile,27.1
Q1,58.0
Median,67.0
Q3,80.0
95-th percentile,107.45
Maximum,200.0
Range,179.0
Interquartile range,22.0

0,1
Standard deviation,22.377
Coef of variation,0.32629
Kurtosis,2.5872
Mean,68.58
MAD,16.48
Skewness,0.66192
Sum,84491
Variance,500.72
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
60.0,133,9.1%,
80.0,68,4.7%,
70.0,63,4.3%,
50.0,60,4.1%,
75.0,52,3.6%,
65.0,49,3.4%,
85.0,36,2.5%,
24.0,30,2.1%,
63.0,30,2.1%,
21.0,27,1.9%,

Value,Count,Frequency (%),Unnamed: 3
21.0,27,1.9%,
22.0,1,0.1%,
24.0,30,2.1%,
25.0,1,0.1%,
26.0,3,0.2%,

Value,Count,Frequency (%),Unnamed: 3
150.0,1,0.1%,
155.0,1,0.1%,
160.0,2,0.1%,
195.0,1,0.1%,
200.0,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Reg,934
IR1,484
IR2,35

Value,Count,Frequency (%),Unnamed: 3
Reg,934,64.0%,
IR1,484,33.2%,
IR2,35,2.4%,
IR3,6,0.4%,

0,1
Distinct count,15
Unique (%),1.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.5435
Minimum,0
Maximum,1064
Zeros (%),99.0%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,1064
Range,1064
Interquartile range,0

0,1
Standard deviation,44.043
Coef of variation,12.429
Kurtosis,308.68
Mean,3.5435
MAD,7.019
Skewness,16.167
Sum,5170
Variance,1939.8
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1445,99.0%,
1064,1,0.1%,
697,1,0.1%,
512,1,0.1%,
450,1,0.1%,
436,1,0.1%,
431,1,0.1%,
362,1,0.1%,
312,1,0.1%,
259,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1445,99.0%,
80,1,0.1%,
108,1,0.1%,
114,1,0.1%,
140,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
436,1,0.1%,
450,1,0.1%,
512,1,0.1%,
697,1,0.1%,
1064,1,0.1%,

0,1
Distinct count,16
Unique (%),1.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,57.378
Minimum,20
Maximum,190
Zeros (%),0.0%

0,1
Minimum,20
5-th percentile,20
Q1,20
Median,50
Q3,70
95-th percentile,160
Maximum,190
Range,170
Interquartile range,50

0,1
Standard deviation,42.747
Coef of variation,0.745
Kurtosis,1.349
Mean,57.378
MAD,32.045
Skewness,1.3467
Sum,83715
Variance,1827.3
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
20,543,37.2%,
60,276,18.9%,
50,143,9.8%,
120,95,6.5%,
30,70,4.8%,
70,68,4.7%,
160,65,4.5%,
80,60,4.1%,
90,57,3.9%,
190,31,2.1%,

Value,Count,Frequency (%),Unnamed: 3
20,543,37.2%,
30,70,4.8%,
40,2,0.1%,
45,6,0.4%,
50,143,9.8%,

Value,Count,Frequency (%),Unnamed: 3
120,95,6.5%,
150,1,0.1%,
160,65,4.5%,
180,7,0.5%,
190,31,2.1%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.3%
Missing (n),4

0,1
RL,1114
RM,242
FV,74
Other values (2),25

Value,Count,Frequency (%),Unnamed: 3
RL,1114,76.4%,
RM,242,16.6%,
FV,74,5.1%,
C (all),15,1.0%,
RH,10,0.7%,
(Missing),4,0.3%,

0,1
Distinct count,304
Unique (%),20.8%
Missing (%),1.0%
Missing (n),15
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,100.71
Minimum,0
Maximum,1290
Zeros (%),60.1%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,164.0
95-th percentile,478.95
Maximum,1290.0
Range,1290.0
Interquartile range,164.0

0,1
Standard deviation,177.63
Coef of variation,1.7638
Kurtosis,8.3763
Mean,100.71
MAD,128.83
Skewness,2.5334
Sum,145420
Variance,31551
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,877,60.1%,
176.0,10,0.7%,
144.0,9,0.6%,
120.0,8,0.5%,
216.0,8,0.5%,
200.0,7,0.5%,
198.0,6,0.4%,
504.0,6,0.4%,
128.0,6,0.4%,
302.0,6,0.4%,

Value,Count,Frequency (%),Unnamed: 3
0.0,877,60.1%,
1.0,1,0.1%,
3.0,1,0.1%,
14.0,3,0.2%,
16.0,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
1095.0,1,0.1%,
1110.0,1,0.1%,
1159.0,1,0.1%,
1224.0,2,0.1%,
1290.0,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
,894
BrkFace,434
Stone,121

Value,Count,Frequency (%),Unnamed: 3
,894,61.3%,
BrkFace,434,29.7%,
Stone,121,8.3%,
BrkCmn,10,0.7%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
,1408
Shed,46
Gar2,3

Value,Count,Frequency (%),Unnamed: 3
,1408,96.5%,
Shed,46,3.2%,
Gar2,3,0.2%,
Othr,2,0.1%,

0,1
Distinct count,26
Unique (%),1.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,58.168
Minimum,0
Maximum,17000
Zeros (%),96.5%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,17000
Range,17000
Interquartile range,0

0,1
Standard deviation,630.81
Coef of variation,10.845
Kurtosis,471.52
Mean,58.168
MAD,112.27
Skewness,20.075
Sum,84867
Variance,397920
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1408,96.5%,
400,7,0.5%,
450,5,0.3%,
500,5,0.3%,
600,4,0.3%,
650,3,0.2%,
2000,3,0.2%,
1500,3,0.2%,
3000,2,0.1%,
4500,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1408,96.5%,
80,1,0.1%,
300,1,0.1%,
400,7,0.5%,
420,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
3000,2,0.1%,
4500,2,0.1%,
6500,1,0.1%,
12500,1,0.1%,
17000,1,0.1%,

0,1
Distinct count,12
Unique (%),0.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.1042
Minimum,1
Maximum,12
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,2
Q1,4
Median,6
Q3,8
95-th percentile,11
Maximum,12
Range,11
Interquartile range,4

0,1
Standard deviation,2.7224
Coef of variation,0.44599
Kurtosis,-0.5078
Mean,6.1042
MAD,2.161
Skewness,0.18302
Sum,8906
Variance,7.4116
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
6,250,17.1%,
7,212,14.5%,
5,190,13.0%,
4,138,9.5%,
3,126,8.6%,
8,111,7.6%,
9,95,6.5%,
10,84,5.8%,
2,81,5.6%,
1,64,4.4%,

Value,Count,Frequency (%),Unnamed: 3
1,64,4.4%,
2,81,5.6%,
3,126,8.6%,
4,138,9.5%,
5,190,13.0%,

Value,Count,Frequency (%),Unnamed: 3
8,111,7.6%,
9,95,6.5%,
10,84,5.8%,
11,63,4.3%,
12,45,3.1%,

0,1
Distinct count,25
Unique (%),1.7%
Missing (%),0.0%
Missing (n),0

0,1
NAmes,218
OldTown,126
CollgCr,117
Other values (22),998

Value,Count,Frequency (%),Unnamed: 3
NAmes,218,14.9%,
OldTown,126,8.6%,
CollgCr,117,8.0%,
Somerst,96,6.6%,
Edwards,94,6.4%,
NridgHt,89,6.1%,
Gilbert,86,5.9%,
Sawyer,77,5.3%,
SawyerW,66,4.5%,
Mitchel,65,4.5%,

0,1
Distinct count,203
Unique (%),13.9%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,48.314
Minimum,0
Maximum,742
Zeros (%),44.0%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,28
Q3,72
95-th percentile,189
Maximum,742
Range,742
Interquartile range,72

0,1
Standard deviation,68.883
Coef of variation,1.4257
Kurtosis,13.011
Mean,48.314
MAD,48.812
Skewness,2.6878
Sum,70490
Variance,4744.9
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,642,44.0%,
48,29,2.0%,
32,27,1.9%,
40,25,1.7%,
36,23,1.6%,
28,21,1.4%,
24,20,1.4%,
50,16,1.1%,
64,15,1.0%,
30,15,1.0%,

Value,Count,Frequency (%),Unnamed: 3
0,642,44.0%,
6,1,0.1%,
10,1,0.1%,
11,2,0.1%,
12,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
382,1,0.1%,
444,1,0.1%,
484,1,0.1%,
570,1,0.1%,
742,1,0.1%,

0,1
Distinct count,9
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,5.5538
Minimum,1
Maximum,9
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,4
Q1,5
Median,5
Q3,6
95-th percentile,8
Maximum,9
Range,8
Interquartile range,1

0,1
Standard deviation,1.1137
Coef of variation,0.20054
Kurtosis,1.8518
Mean,5.5538
MAD,0.86859
Skewness,0.44916
Sum,8103
Variance,1.2404
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
5,824,56.5%,
6,279,19.1%,
7,185,12.7%,
8,72,4.9%,
4,44,3.0%,
3,25,1.7%,
9,19,1.3%,
1,6,0.4%,
2,5,0.3%,

Value,Count,Frequency (%),Unnamed: 3
1,6,0.4%,
2,5,0.3%,
3,25,1.7%,
4,44,3.0%,
5,824,56.5%,

Value,Count,Frequency (%),Unnamed: 3
5,824,56.5%,
6,279,19.1%,
7,185,12.7%,
8,72,4.9%,
9,19,1.3%,

0,1
Distinct count,10
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.0788
Minimum,1
Maximum,10
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,4
Q1,5
Median,6
Q3,7
95-th percentile,9
Maximum,10
Range,9
Interquartile range,2

0,1
Standard deviation,1.4368
Coef of variation,0.23636
Kurtosis,0.037641
Mean,6.0788
MAD,1.1392
Skewness,0.1812
Sum,8869
Variance,2.0644
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
5,428,29.3%,
6,357,24.5%,
7,281,19.3%,
8,174,11.9%,
4,110,7.5%,
9,64,4.4%,
3,20,1.4%,
10,13,0.9%,
2,10,0.7%,
1,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1,2,0.1%,
2,10,0.7%,
3,20,1.4%,
4,110,7.5%,
5,428,29.3%,

Value,Count,Frequency (%),Unnamed: 3
6,357,24.5%,
7,281,19.3%,
8,174,11.9%,
9,64,4.4%,
10,13,0.9%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
Y,1301
N,126
P,32

Value,Count,Frequency (%),Unnamed: 3
Y,1301,89.2%,
N,126,8.6%,
P,32,2.2%,

0,1
Distinct count,7
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.7443
Minimum,0
Maximum,800
Zeros (%),99.6%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,800
Range,800
Interquartile range,0

0,1
Standard deviation,30.492
Coef of variation,17.48
Kurtosis,445.66
Mean,1.7443
MAD,3.4743
Skewness,20.197
Sum,2545
Variance,929.74
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1453,99.6%,
800,1,0.1%,
561,1,0.1%,
444,1,0.1%,
368,1,0.1%,
228,1,0.1%,
144,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1453,99.6%,
144,1,0.1%,
228,1,0.1%,
368,1,0.1%,
444,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
228,1,0.1%,
368,1,0.1%,
444,1,0.1%,
561,1,0.1%,
800,1,0.1%,

0,1
Distinct count,3
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
,1456
Ex,2
Gd,1

Value,Count,Frequency (%),Unnamed: 3
,1456,99.8%,
Ex,2,0.1%,
Gd,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
CompShg,1442
Tar&Grv,12
WdShake,4

Value,Count,Frequency (%),Unnamed: 3
CompShg,1442,98.8%,
Tar&Grv,12,0.8%,
WdShake,4,0.3%,
WdShngl,1,0.1%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
Gable,1169
Hip,265
Gambrel,11
Other values (3),14

Value,Count,Frequency (%),Unnamed: 3
Gable,1169,80.1%,
Hip,265,18.2%,
Gambrel,11,0.8%,
Flat,7,0.5%,
Mansard,4,0.3%,
Shed,3,0.2%,

0,1
Distinct count,6
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
Normal,1204
Partial,120
Abnorml,89
Other values (3),46

Value,Count,Frequency (%),Unnamed: 3
Normal,1204,82.5%,
Partial,120,8.2%,
Abnorml,89,6.1%,
Family,26,1.8%,
Alloca,12,0.8%,
AdjLand,8,0.5%,

0,1
Distinct count,10
Unique (%),0.7%
Missing (%),0.1%
Missing (n),1

0,1
WD,1258
New,117
COD,44
Other values (6),39

Value,Count,Frequency (%),Unnamed: 3
WD,1258,86.2%,
New,117,8.0%,
COD,44,3.0%,
ConLD,17,1.2%,
CWD,8,0.5%,
Oth,4,0.3%,
ConLI,4,0.3%,
Con,3,0.2%,
ConLw,3,0.2%,
(Missing),1,0.1%,

0,1
Distinct count,75
Unique (%),5.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,17.064
Minimum,0
Maximum,576
Zeros (%),90.4%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,0.0
95-th percentile,162.2
Maximum,576.0
Range,576.0
Interquartile range,0.0

0,1
Standard deviation,56.61
Coef of variation,3.3174
Kurtosis,17.24
Mean,17.064
MAD,30.854
Skewness,3.7882
Sum,24897
Variance,3204.7
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1319,90.4%,
144,10,0.7%,
168,7,0.5%,
216,6,0.4%,
200,5,0.3%,
192,5,0.3%,
120,4,0.3%,
225,3,0.2%,
153,3,0.2%,
155,3,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,1319,90.4%,
64,1,0.1%,
84,1,0.1%,
88,1,0.1%,
92,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
322,1,0.1%,
342,1,0.1%,
348,1,0.1%,
490,1,0.1%,
576,1,0.1%,

0,1
Distinct count,2
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
Pave,1453
Grvl,6

Value,Count,Frequency (%),Unnamed: 3
Pave,1453,99.6%,
Grvl,6,0.4%,

0,1
Distinct count,12
Unique (%),0.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.3852
Minimum,3
Maximum,15
Zeros (%),0.0%

0,1
Minimum,3
5-th percentile,4
Q1,5
Median,6
Q3,7
95-th percentile,9
Maximum,15
Range,12
Interquartile range,2

0,1
Standard deviation,1.5089
Coef of variation,0.23631
Kurtosis,1.5226
Mean,6.3852
MAD,1.179
Skewness,0.8426
Sum,9316
Variance,2.2768
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
6,442,30.3%,
7,320,21.9%,
5,308,21.1%,
8,160,11.0%,
4,99,6.8%,
9,68,4.7%,
10,33,2.3%,
11,14,1.0%,
3,8,0.5%,
12,5,0.3%,

Value,Count,Frequency (%),Unnamed: 3
3,8,0.5%,
4,99,6.8%,
5,308,21.1%,
6,442,30.3%,
7,320,21.9%,

Value,Count,Frequency (%),Unnamed: 3
10,33,2.3%,
11,14,1.0%,
12,5,0.3%,
13,1,0.1%,
15,1,0.1%,

0,1
Distinct count,737
Unique (%),50.5%
Missing (%),0.1%
Missing (n),1
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1046.1
Minimum,0
Maximum,5095
Zeros (%),2.8%

0,1
Minimum,0
5-th percentile,392
Q1,784
Median,988
Q3,1305
95-th percentile,1782
Maximum,5095
Range,5095
Interquartile range,521

0,1
Standard deviation,442.9
Coef of variation,0.42337
Kurtosis,5.2038
Mean,1046.1
MAD,332.86
Skewness,0.81359
Sum,1525200
Variance,196160
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,41,2.8%,
864.0,39,2.7%,
960.0,13,0.9%,
546.0,12,0.8%,
672.0,12,0.8%,
384.0,12,0.8%,
1008.0,12,0.8%,
768.0,12,0.8%,
1040.0,11,0.8%,
912.0,11,0.8%,

Value,Count,Frequency (%),Unnamed: 3
0.0,41,2.8%,
160.0,1,0.1%,
173.0,1,0.1%,
192.0,1,0.1%,
216.0,2,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2552.0,1,0.1%,
2630.0,1,0.1%,
2660.0,1,0.1%,
2846.0,1,0.1%,
5095.0,1,0.1%,

0,1
Distinct count,2
Unique (%),0.1%
Missing (%),0.1%
Missing (n),2

0,1
AllPub,1457
(Missing),2

Value,Count,Frequency (%),Unnamed: 3
AllPub,1457,99.9%,
(Missing),2,0.1%,

0,1
Distinct count,263
Unique (%),18.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,93.175
Minimum,0
Maximum,1424
Zeros (%),52.2%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,168
95-th percentile,319
Maximum,1424
Range,1424
Interquartile range,168

0,1
Standard deviation,127.74
Coef of variation,1.371
Kurtosis,10.249
Mean,93.175
MAD,101.49
Skewness,2.1308
Sum,135942
Variance,16319
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,762,52.2%,
100,38,2.6%,
192,32,2.2%,
144,28,1.9%,
168,28,1.9%,
120,22,1.5%,
140,14,1.0%,
200,11,0.8%,
240,10,0.7%,
160,9,0.6%,

Value,Count,Frequency (%),Unnamed: 3
0,762,52.2%,
4,1,0.1%,
14,1,0.1%,
16,1,0.1%,
20,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
657,1,0.1%,
684,1,0.1%,
690,1,0.1%,
870,1,0.1%,
1424,1,0.1%,

0,1
Distinct count,106
Unique (%),7.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1971.4
Minimum,1879
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1879
5-th percentile,1915
Q1,1953
Median,1973
Q3,2001
95-th percentile,2007
Maximum,2010
Range,131
Interquartile range,48

0,1
Standard deviation,30.39
Coef of variation,0.015416
Kurtosis,-0.57932
Mean,1971.4
MAD,25.425
Skewness,-0.58766
Sum,2876211
Variance,923.56
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2005,78,5.3%,
2006,71,4.9%,
2007,60,4.1%,
2004,45,3.1%,
2003,43,2.9%,
1999,27,1.9%,
1920,27,1.9%,
2008,26,1.8%,
1910,26,1.8%,
1956,25,1.7%,

Value,Count,Frequency (%),Unnamed: 3
1879,1,0.1%,
1880,1,0.1%,
1890,5,0.3%,
1895,3,0.2%,
1896,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2006,71,4.9%,
2007,60,4.1%,
2008,26,1.8%,
2009,7,0.5%,
2010,2,0.1%,

0,1
Distinct count,61
Unique (%),4.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1983.7
Minimum,1950
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1950
5-th percentile,1950
Q1,1963
Median,1992
Q3,2004
95-th percentile,2007
Maximum,2010
Range,60
Interquartile range,41

0,1
Standard deviation,21.13
Coef of variation,0.010652
Kurtosis,-1.4126
Mean,1983.7
MAD,19.272
Skewness,-0.39991
Sum,2894164
Variance,446.5
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
1950,183,12.5%,
2006,105,7.2%,
2007,88,6.0%,
2005,68,4.7%,
2004,49,3.4%,
2000,49,3.4%,
2003,48,3.3%,
2008,41,2.8%,
1998,41,2.8%,
2002,34,2.3%,

Value,Count,Frequency (%),Unnamed: 3
1950,183,12.5%,
1951,10,0.7%,
1952,10,0.7%,
1953,10,0.7%,
1954,14,1.0%,

Value,Count,Frequency (%),Unnamed: 3
2006,105,7.2%,
2007,88,6.0%,
2008,41,2.8%,
2009,11,0.8%,
2010,7,0.5%,

0,1
Distinct count,5
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2007.8
Minimum,2006
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,2006
5-th percentile,2006
Q1,2007
Median,2008
Q3,2009
95-th percentile,2010
Maximum,2010
Range,4
Interquartile range,2

0,1
Standard deviation,1.3017
Coef of variation,0.00064835
Kurtosis,-1.1148
Mean,2007.8
MAD,1.1229
Skewness,0.16899
Sum,2929336
Variance,1.6945
Memory size,11.5 KiB

Value,Count,Frequency (%),Unnamed: 3
2007,363,24.9%,
2008,318,21.8%,
2009,309,21.2%,
2006,305,20.9%,
2010,164,11.2%,

Value,Count,Frequency (%),Unnamed: 3
2006,305,20.9%,
2007,363,24.9%,
2008,318,21.8%,
2009,309,21.2%,
2010,164,11.2%,

Value,Count,Frequency (%),Unnamed: 3
2006,305,20.9%,
2007,363,24.9%,
2008,318,21.8%,
2009,309,21.2%,
2010,164,11.2%,

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
1461,20,RH,80.0,11622,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Feedr,Norm,1Fam,1Story,5,6,1961,1961,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,CBlock,TA,TA,No,Rec,468.0,LwQ,144.0,270.0,882.0,GasA,TA,Y,896,0,0,896,0.0,0.0,1,0,2,1,TA,5,Typ,0,,Attchd,1961.0,Unf,1.0,730.0,TA,TA,Y,140,0,0,0,120,0,,MnPrv,,0,6,2010,WD,Normal
1462,20,RL,81.0,14267,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6,6,1958,1958,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,1329,0,0,1329,0.0,0.0,1,1,3,1,Gd,6,Typ,0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393,36,0,0,0,0,,,Gar2,12500,6,2010,WD,Normal
1463,60,RL,74.0,13830,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5,5,1997,1998,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,928,701,0,1629,0.0,0.0,2,1,3,1,TA,6,Typ,1,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212,34,0,0,0,0,,MnPrv,,0,3,2010,WD,Normal
1464,60,RL,78.0,9978,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,6,6,1998,1998,Gable,CompShg,VinylSd,VinylSd,BrkFace,20.0,TA,TA,PConc,TA,TA,No,GLQ,602.0,Unf,0.0,324.0,926.0,GasA,Ex,Y,926,678,0,1604,0.0,0.0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,1998.0,Fin,2.0,470.0,TA,TA,Y,360,36,0,0,0,0,,,,0,6,2010,WD,Normal
1465,120,RL,43.0,5005,Pave,,IR1,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8,5,1992,1992,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,1280,0,0,1280,0.0,0.0,2,0,2,1,Gd,5,Typ,0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0,82,0,0,144,0,,,,0,1,2010,WD,Normal


## Manage Missing Numerical Data

### Drop Numerical Features with Missing Data

In [45]:
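def drop_missing_numeric( Train=X_train, Test=X_val, feats=None ):
    # Default to every numeric feature with missing values in the training set
    if not feats:
        numeric = Train.select_dtypes( include='number' )
        feats = list( numeric.columns[numeric.isnull().any()] )

    # Drop the same features from both sets so their columns stay aligned
    reduced_Train = Train.drop( feats, axis=1, inplace=False )
    reduced_Test  = Test.drop( feats, axis=1, inplace=False )
    
    # Return the reduced copies rather than the untouched inputs
    return reduced_Train, reduced_Test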
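
To make the dropping step concrete, here is a minimal sketch on a made-up frame. The toy column values are invented for illustration; the idea is only that the features to drop are chosen from the training set and then removed from both sets.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy_train = pd.DataFrame({
    "GarageYrBlt": [1990.0, np.nan, 2005.0],
    "LotArea": [8000, 9000, 10000],
    "Street": ["Pave", None, "Pave"],
})
toy_val = pd.DataFrame({
    "GarageYrBlt": [2001.0, 1975.0],
    "LotArea": [7500, 9500],
    "Street": ["Pave", "Grvl"],
})

# Numeric columns with at least one missing value in the training frame
numeric = toy_train.select_dtypes(include="number")
feats = list(numeric.columns[numeric.isnull().any()])

# Drop the same columns from both frames; the inputs are left unmodified
reduced_train = toy_train.drop(columns=feats)
reduced_val = toy_val.drop(columns=feats)
```

Note that the non-numeric `Street` column is untouched even though it has a missing value, since only numeric columns are considered here.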

### Imputation

In [46]:
def impute_missing_numeric( replacement='median', Train=X_train, Test=X_val ):
    # Numeric features with missing values in the training set
    numeric = Train.select_dtypes( include='number' )
    feats_missing_vals = list( numeric.columns[numeric.isnull().any()] )
    
    # Get replacement value given desired statistic (computed on training data only)
    if replacement == 'mean':
        replacement = Train[feats_missing_vals].mean( skipna=True )
    elif replacement == 'median':
        replacement = Train[feats_missing_vals].median( skipna=True )
    elif replacement == 'min':
        replacement = Train[feats_missing_vals].min( skipna=True )
    # Any other value (e.g. 0) is used as-is; note that a scalar fills NaNs in every column
    
    # Impute missing values with calculated replacement
    imputed_Train = Train.fillna( replacement )
    imputed_Test  = Test.fillna( replacement )

    return imputed_Train, imputed_Test
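
As a quick sanity check of the idea behind this cell, the sketch below applies median imputation to a small made-up frame. The toy values are invented for illustration; the point is that the statistic is computed on the training frame only and then reused for the validation frame.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy_train = pd.DataFrame({
    "LotFrontage": [60.0, np.nan, 80.0, 70.0],
    "LotArea": [8000, 9000, 10000, 11000],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
})
toy_val = pd.DataFrame({
    "LotFrontage": [np.nan, 65.0],
    "LotArea": [7500, 9500],
    "Street": ["Pave", "Pave"],
})

# Numeric columns with at least one missing value in the training frame
numeric = toy_train.select_dtypes(include="number")
missing_feats = list(numeric.columns[numeric.isnull().any()])

# Per-column medians, computed from the training frame only
medians = toy_train[missing_feats].median(skipna=True)

# fillna with a Series fills each column using the value at the matching label
imputed_train = toy_train.fillna(medians)
imputed_val = toy_val.fillna(medians)
```

Because `medians` is indexed by column name, only the listed columns are filled; other columns are left alone.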

## Mixed Dropping and Imputation

Reading the feature descriptions suggests, feature by feature, whether dropping or imputing makes more sense.

LotFrontage - Linear feet of street connected to property  
- Likely missing when no street is connected to the property, such as for an apartment or condo.  
- If so, it makes sense to impute 0 for NaN.

MasVnrArea  - Masonry veneer area in square feet  
- Likely missing when there is no masonry veneer.  
- If so, it makes sense to impute 0 for NaN.

GarageYrBlt - Year garage was built  
- Likely missing when there is no garage.  
- If so, no imputed year is meaningful, and simply dropping the feature may give better predictions.

In [47]:
def mixed_drop_impute_missing_numeric( Train=X_train, Test=X_val ):
    # Drop year garage was built
    reduced_Train, reduced_Test = drop_missing_numeric( Train=Train, Test=Test, feats=['GarageYrBlt'] )

    # Perform scalar imputation with 0's
    return impute_missing_numeric( replacement=0, Train=reduced_Train, Test=reduced_Test )
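
One way to picture the mixed strategy on a single frame is sketched below, using made-up values. It differs slightly from the cell above in that it passes a dict to `fillna`, so that only the two named features are zero-filled; the `zero_fill` name and the toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy_train = pd.DataFrame({
    "LotFrontage": [60.0, np.nan],
    "MasVnrArea": [np.nan, 120.0],
    "GarageYrBlt": [1990.0, np.nan],
    "Alley": [None, "Grvl"],
})

# Features where a missing value plausibly means "none", so 0 is a sensible fill
zero_fill = {"LotFrontage": 0, "MasVnrArea": 0}

# Drop the feature with no meaningful fill value, then zero-fill the named features only
mixed = toy_train.drop(columns=["GarageYrBlt"]).fillna(zero_fill)
```

With a dict, the missing `Alley` entry stays missing, whereas a scalar `fillna(0)` would also write 0 into non-numeric columns.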

## Scikit Learn Built In Imputer

A method that involves less reinventing the wheel is to use scikit-learn's built-in `SimpleImputer` class.

I think I prefer the way I performed imputation above: it takes less code, the operations seem simpler, and it works natively with pandas DataFrames.

In [52]:
#from sklearn.impute import SimpleImputer

#sklearn_imputed_X_train = X_train.copy()
#sklearn_imputed_X_val   = X_val.copy()
#median_imputer  = SimpleImputer( strategy='median' )

#sklearn_imputed_X_train = pd.DataFrame( median_imputer.fit_transform(sklearn_imputed_X_train) )
#sklearn_imputed_X_val   = pd.DataFrame( median_imputer.transform(sklearn_imputed_X_val) )

#sklearn_imputed_X_train.columns = X_train.columns
#sklearn_imputed_X_val.columns = X_val.columns
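
For comparison, here is a minimal runnable sketch of the `SimpleImputer` route on a made-up, all-numeric frame. The toy values are assumptions for illustration. Since `fit_transform` returns a plain array, the column names and index are passed back in when rebuilding the DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy numeric-only data, invented for illustration
toy_train = pd.DataFrame(
    {"LotFrontage": [60.0, np.nan, 80.0], "MasVnrArea": [0.0, 100.0, np.nan]},
    index=[10, 11, 12],
)
toy_val = pd.DataFrame(
    {"LotFrontage": [np.nan], "MasVnrArea": [50.0]},
    index=[20],
)

median_imputer = SimpleImputer(strategy="median")

# Fit on the training frame, then reuse the learned medians on the validation frame
imputed_train = pd.DataFrame(
    median_imputer.fit_transform(toy_train),
    columns=toy_train.columns,
    index=toy_train.index,
)
imputed_val = pd.DataFrame(
    median_imputer.transform(toy_val),
    columns=toy_val.columns,
    index=toy_val.index,
)
```

The median strategy only applies to numeric data, so on the full dataset the categorical columns would need to be excluded or handled separately first.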

# Manage Categorical Data

## Drop Categorical Data

In [62]:
def drop_categorical_data( Train=X_train, Test=X_test ):
    reduced_Train = Train.select_dtypes( exclude=['object'] )
    reduced_Test  = Test.select_dtypes( exclude=['object'] )
    
    return reduced_Train, reduced_Test

## Label Encode Categorical Data

In [91]:
def label_encode_categorical_data( Train=X_train, Test=X_val ):
    label_encoder = LabelEncoder()
    categorical_feats = Train.columns[Train.dtypes == 'object']
    
    encoded_Train = Train.copy()
    encoded_Test  = Test.copy()
    
    for feat in categorical_feats:
        encoded_Train[feat] = label_encoder.fit_transform( Train[feat] )
        encoded_Test[feat]  = label_encoder.transform( Test[feat] )
        
    return encoded_Train, encoded_Test
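
One caveat with this approach is that `LabelEncoder.transform` raises an error when the validation data contains a category that was not seen during fitting. The sketch below shows one possible alternative, not what the cell above does: scikit-learn's `OrdinalEncoder` with `handle_unknown='use_encoded_value'`, available in recent scikit-learn versions. The toy data, the `"Dirt"` category, and the choice of `-1` for unknowns are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy data, invented for illustration; "Dirt" never appears in training
toy_train = pd.DataFrame({"Street": ["Pave", "Grvl", "Pave"], "LotArea": [8000, 9000, 10000]})
toy_val = pd.DataFrame({"Street": ["Pave", "Dirt"], "LotArea": [7500, 9500]})

categorical_feats = ["Street"]

# Unseen categories are mapped to -1 instead of raising an error
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

train_codes = encoder.fit_transform(toy_train[categorical_feats])
val_codes = encoder.transform(toy_val[categorical_feats])

encoded_train = toy_train.copy()
encoded_val = toy_val.copy()
for i, feat in enumerate(categorical_feats):
    encoded_train[feat] = train_codes[:, i]
    encoded_val[feat] = val_codes[:, i]
```

Categories are numbered in sorted order, so here `Grvl` becomes 0 and `Pave` becomes 1, while the unseen `Dirt` becomes -1.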

# Model Creation

Defined via the specs on the Kaggle course.

In [53]:
model = RandomForestRegressor(n_estimators=100, random_state=0)

# Evaluate System

Here, a system is the combination of a model and a dataset.

In [54]:
def score_system( m=model, X_t=X_train, X_v=X_val, y_t=y_train, y_v=y_val ):
    m.fit( X_t, y_t )
    pred_val = m.predict( X_v )
    # scikit-learn's signature is mean_absolute_error(y_true, y_pred)
    return mean_absolute_error( y_v, pred_val )
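
The scoring step can be tried end to end on synthetic data, as in the sketch below. The data comes from `make_regression` and has nothing to do with the housing dataset; the `score` helper and the mean-prediction `baseline` are hypothetical names introduced here. Arguments are passed explicitly because Python evaluates default argument values once, when the function is defined, so defaults such as `X_t=X_train` would keep pointing at whatever those names held at that moment.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression data, for illustration only
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_t, X_v, y_t, y_v = train_test_split(X, y, train_size=0.8, random_state=0)


def score(m, X_t, X_v, y_t, y_v):
    """Fit the model on the training split and return MAE on the validation split."""
    m.fit(X_t, y_t)
    return mean_absolute_error(y_v, m.predict(X_v))


# Same model settings as the notebook
mae = score(RandomForestRegressor(n_estimators=100, random_state=0), X_t, X_v, y_t, y_v)

# MAE of always predicting the training mean, as a reference point
baseline = mean_absolute_error(y_v, [y_t.mean()] * len(y_v))
```

Calling the helper once per preprocessing variant (dropped, imputed, mixed) with the corresponding training and validation frames gives directly comparable MAE values.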