<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Load-Data" data-toc-modified-id="Load-Data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Load Data</a></span></li><li><span><a href="#Extract-Features-and-Targets" data-toc-modified-id="Extract-Features-and-Targets-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Extract Features and Targets</a></span></li><li><span><a href="#Create-Validation-Set" data-toc-modified-id="Create-Validation-Set-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Create Validation Set</a></span></li><li><span><a href="#Explore-Data" data-toc-modified-id="Explore-Data-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Explore Data</a></span></li><li><span><a href="#Manage-Missing-Categorical-Data" data-toc-modified-id="Manage-Missing-Categorical-Data-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Manage Missing Categorical Data</a></span></li><li><span><a href="#Drop-Features-with-Missing-Data" data-toc-modified-id="Drop-Features-with-Missing-Data-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Drop Features with Missing Data</a></span></li><li><span><a href="#Imputation" data-toc-modified-id="Imputation-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Imputation</a></span></li><li><span><a href="#Mixed-Dropping-and-Imputation" data-toc-modified-id="Mixed-Dropping-and-Imputation-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Mixed Dropping and Imputation</a></span></li><li><span><a href="#Scikit-Learn-Built-In-Imputer" data-toc-modified-id="Scikit-Learn-Built-In-Imputer-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Scikit Learn Built In Imputer</a></span></li></ul></div>

# Import Packages

In [10]:
import pandas as pd
import pandas_profiling as pp

from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Import Data

## Load Data

In [2]:
# Define data locations
data_dir        = '../Data/house-prices-advanced-regression-techniques/'
train_file_name = 'train.csv'
test_file_name  = 'test.csv'

# Load training and testing data
train_data = pd.read_csv( data_dir + train_file_name, index_col='Id' )
test_data  = pd.read_csv( data_dir + test_file_name, index_col='Id' )

# Remove rows with missing targets
train_data.dropna( axis=0, subset=['SalePrice'], inplace=True)

## Extract Features and Targets

In [51]:
# Extract targets and features
y = train_data.SalePrice
X = train_data.copy()
X.drop( ['SalePrice'], axis=1, inplace=True )


X_test = test_data.copy()

# The course originally instructed using only numerical data.
# Update: categorical data is now included.
# Uncomment the two lines below to drop categorical (object) columns again.
#X = X.select_dtypes( exclude=['object'] )
#X_test = X_test.select_dtypes( exclude=['object'] )
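
The commented-out lines above switch between numerical-only and mixed features. As a minimal sketch on a made-up three-column frame, the two groups of columns can be listed as shown here. The `toy` frame and the names `numeric_cols` and `categorical_cols` are illustrative and not part of the notebook. The sketch selects with `include='number'` rather than `exclude=['object']`; that is an assumption of this example, chosen because it does not depend on how pandas stores strings.

```python
import pandas as pd

# Hypothetical toy frame standing in for X; the column names are borrowed
# from the house-prices data purely for illustration.
toy = pd.DataFrame({
    "LotArea": [8450, 9600],
    "OverallQual": [7, 6],
    "Street": ["Pave", "Pave"],
})

# Columns with a numeric dtype (ints and floats).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything else is treated as categorical in this sketch.
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

Keeping the two lists around makes it easy to treat missing values differently per group later on.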

## Create Validation Set

In [52]:
X_train, X_val, y_train, y_val = train_test_split( X, y, 
                                                   train_size=0.8, 
                                                   test_size=0.2, 
                                                   random_state=0)
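
As a quick sanity check on the split, the sketch below repeats the same `train_test_split` call on synthetic data. It assumes the training file has 1460 rows, which is consistent with the 1168 training observations in the profile report further down. `X_demo`, `y_demo` and `n_rows` are made-up names for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

n_rows = 1460  # assumed size of the training file

# Synthetic stand-ins for X and y that share the same index.
X_demo = pd.DataFrame({"feature": np.arange(n_rows)})
y_demo = pd.Series(np.arange(n_rows), name="target")

X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo,
                                          train_size=0.8,
                                          test_size=0.2,
                                          random_state=0)

# The two parts should cover every row exactly once,
# and features and targets should stay aligned.
print(len(X_tr), len(X_va))
print(X_tr.index.equals(y_tr.index))
```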

## Explore Data

To avoid cluttering the notebook with extraneous output, uncomment the command of interest when desired.
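
When the full profile report is more than is needed, a lighter look at the missing values can be sketched with plain pandas. The helper `missing_summary` and the `toy` frame below are hypothetical and only illustrate the idea; they are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Count and percentage of missing values, for columns that have any."""
    n_missing = df.isnull().sum()
    summary = pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": 100 * n_missing / len(df),
    })
    # Keep only columns with gaps, worst first.
    summary = summary[summary["n_missing"] > 0]
    return summary.sort_values("n_missing", ascending=False)

# Made-up data with gaps in two of the three columns.
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, np.nan],
    "Alley": [np.nan, np.nan, np.nan, "Grvl"],
    "LotArea": [8450, 9600, 11250, 9550],
})

summary = missing_summary(toy)
print(summary)
```

In the notebook, the same helper could be called as `missing_summary( X_train )`.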

In [25]:
pp.ProfileReport( X_train )

Statistic,Value
Number of variables,80
Number of observations,1168
Total Missing (%),5.9%
Total size in memory,730.1 KiB
Average record size in memory,640.1 B

Variable type,Count
Numeric,37
Categorical,43
Boolean,0
Date,0
Text (Unique),0
Rejected,0
Unsupported,0

0,1
Distinct count,666
Unique (%),57.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1161
Minimum,334
Maximum,3228
Zeros (%),0.0%

0,1
Minimum,334.0
5-th percentile,672.0
Q1,884.0
Median,1092.0
Q3,1389.2
95-th percentile,1825.3
Maximum,3228.0
Range,2894.0
Interquartile range,505.25

0,1
Standard deviation,373.32
Coef of variation,0.32156
Kurtosis,1.689
Mean,1161
MAD,294.93
Skewness,0.96159
Sum,1356000
Variance,139360
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
864,17,1.5%,
912,12,1.0%,
1040,11,0.9%,
672,10,0.9%,
848,10,0.9%,
894,10,0.9%,
816,8,0.7%,
1056,6,0.5%,
840,6,0.5%,
483,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
334,1,0.1%,
372,1,0.1%,
438,1,0.1%,
480,1,0.1%,
483,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
2524,1,0.1%,
2633,1,0.1%,
2898,1,0.1%,
3138,1,0.1%,
3228,1,0.1%,

0,1
Distinct count,357
Unique (%),30.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,351.48
Minimum,0
Maximum,1872
Zeros (%),56.3%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,729.0
95-th percentile,1158.9
Maximum,1872.0
Range,1872.0
Interquartile range,729.0

0,1
Standard deviation,438.14
Coef of variation,1.2466
Kurtosis,-0.66824
Mean,351.48
MAD,398.85
Skewness,0.78147
Sum,410528
Variance,191960
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,658,56.3%,
728,9,0.8%,
504,8,0.7%,
672,7,0.6%,
720,7,0.6%,
546,7,0.6%,
600,5,0.4%,
840,4,0.3%,
689,4,0.3%,
756,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,658,56.3%,
110,1,0.1%,
167,1,0.1%,
192,1,0.1%,
208,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1540,1,0.1%,
1611,1,0.1%,
1796,1,0.1%,
1818,1,0.1%,
1872,1,0.1%,

0,1
Distinct count,18
Unique (%),1.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.2183
Minimum,0
Maximum,508
Zeros (%),98.4%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,508
Range,508
Interquartile range,0

0,1
Standard deviation,27.917
Coef of variation,8.6743
Kurtosis,135.26
Mean,3.2183
MAD,6.3319
Skewness,10.599
Sum,3759
Variance,779.34
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1149,98.4%,
216,2,0.2%,
168,2,0.2%,
245,1,0.1%,
238,1,0.1%,
290,1,0.1%,
196,1,0.1%,
182,1,0.1%,
180,1,0.1%,
304,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1149,98.4%,
23,1,0.1%,
96,1,0.1%,
130,1,0.1%,
140,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
238,1,0.1%,
245,1,0.1%,
290,1,0.1%,
304,1,0.1%,
508,1,0.1%,

0,1
Distinct count,3
Unique (%),0.3%
Missing (%),93.9%
Missing (n),1097

0,1
Grvl,37
Pave,34
(Missing),1097

Value,Count,Frequency (%),Unnamed: 3
Grvl,37,3.2%,
Pave,34,2.9%,
(Missing),1097,93.9%,

0,1
Distinct count,8
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2.8827
Minimum,0
Maximum,8
Zeros (%),0.3%

0,1
Minimum,0
5-th percentile,2
Q1,2
Median,3
Q3,3
95-th percentile,4
Maximum,8
Range,8
Interquartile range,1

0,1
Standard deviation,0.80217
Coef of variation,0.27827
Kurtosis,2.3405
Mean,2.8827
MAD,0.56184
Skewness,0.23468
Sum,3367
Variance,0.64347
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
3,645,55.2%,
2,284,24.3%,
4,178,15.2%,
1,35,3.0%,
5,17,1.5%,
6,4,0.3%,
0,4,0.3%,
8,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,4,0.3%,
1,35,3.0%,
2,284,24.3%,
3,645,55.2%,
4,178,15.2%,

Value,Count,Frequency (%),Unnamed: 3
3,645,55.2%,
4,178,15.2%,
5,17,1.5%,
6,4,0.3%,
8,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
1Fam,981
TwnhsE,89
Duplex,38
Other values (2),60

Value,Count,Frequency (%),Unnamed: 3
1Fam,981,84.0%,
TwnhsE,89,7.6%,
Duplex,38,3.3%,
Twnhs,35,3.0%,
2fmCon,25,2.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),2.4%
Missing (n),28

0,1
TA,1046
Gd,55
Fa,37
(Missing),28

Value,Count,Frequency (%),Unnamed: 3
TA,1046,89.6%,
Gd,55,4.7%,
Fa,37,3.2%,
Po,2,0.2%,
(Missing),28,2.4%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),2.4%
Missing (n),28

0,1
No,768
Av,174
Gd,106

Value,Count,Frequency (%),Unnamed: 3
No,768,65.8%,
Av,174,14.9%,
Gd,106,9.1%,
Mn,92,7.9%,
(Missing),28,2.4%,

0,1
Distinct count,551
Unique (%),47.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,439.89
Minimum,0
Maximum,2260
Zeros (%),32.6%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,379.5
Q3,716.0
95-th percentile,1269.4
Maximum,2260.0
Range,2260.0
Interquartile range,716.0

0,1
Standard deviation,435.11
Coef of variation,0.98913
Kurtosis,-0.080938
Mean,439.89
MAD,365.25
Skewness,0.76419
Sum,513792
Variance,189320
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,381,32.6%,
24,8,0.7%,
16,7,0.6%,
20,5,0.4%,
400,4,0.3%,
300,4,0.3%,
641,4,0.3%,
697,4,0.3%,
1200,4,0.3%,
600,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,381,32.6%,
16,7,0.6%,
20,5,0.4%,
24,8,0.7%,
25,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1721,1,0.1%,
1880,1,0.1%,
1904,1,0.1%,
2188,1,0.1%,
2260,1,0.1%,

0,1
Distinct count,120
Unique (%),10.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,45.572
Minimum,0
Maximum,1120
Zeros (%),88.3%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,0.0
95-th percentile,376.3
Maximum,1120.0
Range,1120.0
Interquartile range,0.0

0,1
Standard deviation,156.23
Coef of variation,3.4282
Kurtosis,18.956
Mean,45.572
MAD,80.54
Skewness,4.1851
Sum,53228
Variance,24408
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1031,88.3%,
180,4,0.3%,
374,3,0.3%,
93,2,0.2%,
480,2,0.2%,
117,2,0.2%,
279,2,0.2%,
287,2,0.2%,
391,2,0.2%,
290,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,1031,88.3%,
28,1,0.1%,
32,1,0.1%,
35,1,0.1%,
41,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
1061,1,0.1%,
1063,1,0.1%,
1080,1,0.1%,
1085,1,0.1%,
1120,1,0.1%,

0,1
Distinct count,7
Unique (%),0.6%
Missing (%),2.4%
Missing (n),28

0,1
Unf,353
GLQ,330
ALQ,172
Other values (3),285

Value,Count,Frequency (%),Unnamed: 3
Unf,353,30.2%,
GLQ,330,28.3%,
ALQ,172,14.7%,
BLQ,123,10.5%,
Rec,106,9.1%,
LwQ,56,4.8%,
(Missing),28,2.4%,

0,1
Distinct count,7
Unique (%),0.6%
Missing (%),2.5%
Missing (n),29

0,1
Unf,1003
LwQ,42
Rec,39
Other values (3),55

Value,Count,Frequency (%),Unnamed: 3
Unf,1003,85.9%,
LwQ,42,3.6%,
Rec,39,3.3%,
BLQ,30,2.6%,
ALQ,14,1.2%,
GLQ,11,0.9%,
(Missing),29,2.5%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.42209
Minimum,0
Maximum,3
Zeros (%),58.7%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,1
95-th percentile,1
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.51449
Coef of variation,1.2189
Kurtosis,-0.86418
Mean,0.42209
MAD,0.49581
Skewness,0.57989
Sum,493
Variance,0.2647
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,686,58.7%,
1,472,40.4%,
2,9,0.8%,
3,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,686,58.7%,
1,472,40.4%,
2,9,0.8%,
3,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,686,58.7%,
1,472,40.4%,
2,9,0.8%,
3,1,0.1%,

0,1
Distinct count,3
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.054795
Minimum,0
Maximum,2
Zeros (%),94.6%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,1
Maximum,2
Range,2
Interquartile range,0

0,1
Standard deviation,0.23141
Coef of variation,4.2232
Kurtosis,16.16
Mean,0.054795
MAD,0.10368
Skewness,4.1239
Sum,64
Variance,0.05355
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1105,94.6%,
1,62,5.3%,
2,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1105,94.6%,
1,62,5.3%,
2,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1105,94.6%,
1,62,5.3%,
2,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),2.4%
Missing (n),28

0,1
TA,528
Gd,490
Ex,94

Value,Count,Frequency (%),Unnamed: 3
TA,528,45.2%,
Gd,490,42.0%,
Ex,94,8.0%,
Fa,28,2.4%,
(Missing),28,2.4%,

0,1
Distinct count,687
Unique (%),58.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,568.05
Minimum,0
Maximum,2153
Zeros (%),7.9%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,228.0
Median,482.5
Q3,811.25
95-th percentile,1469.3
Maximum,2153.0
Range,2153.0
Interquartile range,583.25

0,1
Standard deviation,437.57
Coef of variation,0.7703
Kurtosis,0.34383
Mean,568.05
MAD,351.24
Skewness,0.88206
Sum,663482
Variance,191470
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,92,7.9%,
728,9,0.8%,
600,7,0.6%,
572,6,0.5%,
440,6,0.5%,
625,6,0.5%,
384,6,0.5%,
319,5,0.4%,
326,5,0.4%,
270,5,0.4%,

Value,Count,Frequency (%),Unnamed: 3
0,92,7.9%,
14,1,0.1%,
15,1,0.1%,
23,1,0.1%,
26,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1935,1,0.1%,
1969,1,0.1%,
2002,1,0.1%,
2042,1,0.1%,
2153,1,0.1%,

0,1
Distinct count,2
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
Y,1090
N,78

Value,Count,Frequency (%),Unnamed: 3
Y,1090,93.3%,
N,78,6.7%,

0,1
Distinct count,9
Unique (%),0.8%
Missing (%),0.0%
Missing (n),0

0,1
Norm,1017
Feedr,62
Artery,32
Other values (6),57

Value,Count,Frequency (%),Unnamed: 3
Norm,1017,87.1%,
Feedr,62,5.3%,
Artery,32,2.7%,
PosN,17,1.5%,
RRAn,17,1.5%,
RRAe,10,0.9%,
PosA,7,0.6%,
RRNn,4,0.3%,
RRNe,2,0.2%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Norm,1160
Feedr,4
PosN,1
Other values (3),3

Value,Count,Frequency (%),Unnamed: 3
Norm,1160,99.3%,
Feedr,4,0.3%,
PosN,1,0.1%,
Artery,1,0.1%,
RRAe,1,0.1%,
PosA,1,0.1%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.1%
Missing (n),1

0,1
SBrkr,1060
FuseA,82
FuseF,22
Other values (2),3

Value,Count,Frequency (%),Unnamed: 3
SBrkr,1060,90.8%,
FuseA,82,7.0%,
FuseF,22,1.9%,
FuseP,2,0.2%,
Mix,1,0.1%,
(Missing),1,0.1%,

0,1
Distinct count,107
Unique (%),9.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,23.022
Minimum,0
Maximum,552
Zeros (%),85.4%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,184
Maximum,552
Range,552
Interquartile range,0

0,1
Standard deviation,63.153
Coef of variation,2.7431
Kurtosis,10.375
Mean,23.022
MAD,39.315
Skewness,3.0666
Sum,26890
Variance,3988.3
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,997,85.4%,
112,14,1.2%,
96,6,0.5%,
216,5,0.4%,
120,4,0.3%,
116,3,0.3%,
252,3,0.3%,
164,3,0.3%,
192,3,0.3%,
156,3,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,997,85.4%,
19,1,0.1%,
20,1,0.1%,
24,1,0.1%,
32,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
301,1,0.1%,
318,1,0.1%,
330,1,0.1%,
386,1,0.1%,
552,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
TA,1025
Gd,114
Fa,25
Other values (2),4

Value,Count,Frequency (%),Unnamed: 3
TA,1025,87.8%,
Gd,114,9.8%,
Fa,25,2.1%,
Ex,3,0.3%,
Po,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,729
Gd,388
Ex,39

Value,Count,Frequency (%),Unnamed: 3
TA,729,62.4%,
Gd,388,33.2%,
Ex,39,3.3%,
Fa,12,1.0%,

0,1
Distinct count,15
Unique (%),1.3%
Missing (%),0.0%
Missing (n),0

0,1
VinylSd,412
Wd Sdng,175
HdBoard,171
Other values (12),410

Value,Count,Frequency (%),Unnamed: 3
VinylSd,412,35.3%,
Wd Sdng,175,15.0%,
HdBoard,171,14.6%,
MetalSd,164,14.0%,
Plywood,94,8.0%,
CemntBd,50,4.3%,
BrkFace,42,3.6%,
WdShing,21,1.8%,
Stucco,17,1.5%,
AsbShng,16,1.4%,

0,1
Distinct count,16
Unique (%),1.4%
Missing (%),0.0%
Missing (n),0

0,1
VinylSd,401
Wd Sdng,169
HdBoard,160
Other values (13),438

Value,Count,Frequency (%),Unnamed: 3
VinylSd,401,34.3%,
Wd Sdng,169,14.5%,
HdBoard,160,13.7%,
MetalSd,160,13.7%,
Plywood,123,10.5%,
CmentBd,50,4.3%,
Wd Shng,30,2.6%,
BrkFace,21,1.8%,
AsbShng,18,1.5%,
Stucco,16,1.4%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),81.7%
Missing (n),954

0,1
MnPrv,113
GdPrv,51
GdWo,43
(Missing),954

Value,Count,Frequency (%),Unnamed: 3
MnPrv,113,9.7%,
GdPrv,51,4.4%,
GdWo,43,3.7%,
MnWw,7,0.6%,
(Missing),954,81.7%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),47.2%
Missing (n),551

0,1
Gd,295
TA,257
Fa,29
Other values (2),36
(Missing),551

Value,Count,Frequency (%),Unnamed: 3
Gd,295,25.3%,
TA,257,22.0%,
Fa,29,2.5%,
Ex,19,1.6%,
Po,17,1.5%,
(Missing),551,47.2%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.61216
Minimum,0
Maximum,3
Zeros (%),47.2%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,1
Q3,1
95-th percentile,2
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.64087
Coef of variation,1.0469
Kurtosis,-0.31165
Mean,0.61216
MAD,0.57757
Skewness,0.6223
Sum,715
Variance,0.41072
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,551,47.2%,
1,522,44.7%,
2,92,7.9%,
3,3,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,551,47.2%,
1,522,44.7%,
2,92,7.9%,
3,3,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,551,47.2%,
1,522,44.7%,
2,92,7.9%,
3,3,0.3%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
PConc,512
CBlock,505
BrkTil,125
Other values (3),26

Value,Count,Frequency (%),Unnamed: 3
PConc,512,43.8%,
CBlock,505,43.2%,
BrkTil,125,10.7%,
Slab,19,1.6%,
Stone,6,0.5%,
Wood,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.5668
Minimum,0
Maximum,3
Zeros (%),0.5%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,2
Q3,2
95-th percentile,2
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.5467
Coef of variation,0.34893
Kurtosis,-0.9157
Mean,1.5668
MAD,0.51979
Skewness,0.032962
Sum,1830
Variance,0.29888
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
2,618,52.9%,
1,519,44.4%,
3,25,2.1%,
0,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
0,6,0.5%,
1,519,44.4%,
2,618,52.9%,
3,25,2.1%,

Value,Count,Frequency (%),Unnamed: 3
0,6,0.5%,
1,519,44.4%,
2,618,52.9%,
3,25,2.1%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Typ,1088
Min2,30
Min1,24
Other values (3),26

Value,Count,Frequency (%),Unnamed: 3
Typ,1088,93.2%,
Min2,30,2.6%,
Min1,24,2.1%,
Mod,12,1.0%,
Maj1,11,0.9%,
Maj2,3,0.3%,

0,1
Distinct count,400
Unique (%),34.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,473.63
Minimum,0
Maximum,1390
Zeros (%),5.0%

0,1
Minimum,0.0
5-th percentile,160.0
Q1,336.0
Median,477.5
Q3,576.0
95-th percentile,845.3
Maximum,1390.0
Range,1390.0
Interquartile range,240.0

0,1
Standard deviation,209.44
Coef of variation,0.4422
Kurtosis,0.82934
Mean,473.63
MAD,157.65
Skewness,0.17431
Sum,553203
Variance,43866
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,58,5.0%,
440,42,3.6%,
576,35,3.0%,
240,31,2.7%,
528,29,2.5%,
484,25,2.1%,
264,21,1.8%,
400,20,1.7%,
288,19,1.6%,
308,16,1.4%,

Value,Count,Frequency (%),Unnamed: 3
0,58,5.0%,
160,2,0.2%,
180,8,0.7%,
186,1,0.1%,
189,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1134,1,0.1%,
1166,1,0.1%,
1248,1,0.1%,
1356,1,0.1%,
1390,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.7714
Minimum,0
Maximum,4
Zeros (%),5.0%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,2
Q3,2
95-th percentile,3
Maximum,4
Range,4
Interquartile range,1

0,1
Standard deviation,0.73004
Coef of variation,0.41213
Kurtosis,0.195
Mean,1.7714
MAD,0.57088
Skewness,-0.35852
Sum,2069
Variance,0.53296
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
2,665,56.9%,
1,299,25.6%,
3,144,12.3%,
0,58,5.0%,
4,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,58,5.0%,
1,299,25.6%,
2,665,56.9%,
3,144,12.3%,
4,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,58,5.0%,
1,299,25.6%,
2,665,56.9%,
3,144,12.3%,
4,2,0.2%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),5.0%
Missing (n),58

0,1
TA,1063
Fa,32
Gd,8
Other values (2),7
(Missing),58

Value,Count,Frequency (%),Unnamed: 3
TA,1063,91.0%,
Fa,32,2.7%,
Gd,8,0.7%,
Po,6,0.5%,
Ex,1,0.1%,
(Missing),58,5.0%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),5.0%
Missing (n),58

0,1
Unf,488
RFn,337
Fin,285
(Missing),58

Value,Count,Frequency (%),Unnamed: 3
Unf,488,41.8%,
RFn,337,28.9%,
Fin,285,24.4%,
(Missing),58,5.0%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),5.0%
Missing (n),58

0,1
TA,1055
Fa,40
Gd,10
Other values (2),5
(Missing),58

Value,Count,Frequency (%),Unnamed: 3
TA,1055,90.3%,
Fa,40,3.4%,
Gd,10,0.9%,
Po,3,0.3%,
Ex,2,0.2%,
(Missing),58,5.0%,

0,1
Distinct count,7
Unique (%),0.6%
Missing (%),5.0%
Missing (n),58

0,1
Attchd,696
Detchd,315
BuiltIn,74
Other values (3),25
(Missing),58

Value,Count,Frequency (%),Unnamed: 3
Attchd,696,59.6%,
Detchd,315,27.0%,
BuiltIn,74,6.3%,
Basment,14,1.2%,
2Types,6,0.5%,
CarPort,5,0.4%,
(Missing),58,5.0%,

0,1
Distinct count,98
Unique (%),8.4%
Missing (%),5.0%
Missing (n),58
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1978.1
Minimum,1900
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1900
5-th percentile,1928
Q1,1961
Median,1979
Q3,2002
95-th percentile,2007
Maximum,2010
Range,110
Interquartile range,41

0,1
Standard deviation,24.877
Coef of variation,0.012576
Kurtosis,-0.42154
Mean,1978.1
MAD,21.023
Skewness,-0.64474
Sum,2195700
Variance,618.88
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
2005.0,54,4.6%,
2006.0,46,3.9%,
2003.0,42,3.6%,
2004.0,39,3.3%,
2007.0,37,3.2%,
1977.0,30,2.6%,
1998.0,26,2.2%,
2008.0,23,2.0%,
1999.0,23,2.0%,
2002.0,23,2.0%,

Value,Count,Frequency (%),Unnamed: 3
1900.0,1,0.1%,
1906.0,1,0.1%,
1908.0,1,0.1%,
1910.0,2,0.2%,
1914.0,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
2006.0,46,3.9%,
2007.0,37,3.2%,
2008.0,23,2.0%,
2009.0,16,1.4%,
2010.0,3,0.3%,

0,1
Distinct count,741
Unique (%),63.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1518.9
Minimum,334
Maximum,4676
Zeros (%),0.0%

0,1
Minimum,334.0
5-th percentile,848.0
Q1,1139.0
Median,1471.5
Q3,1788.5
95-th percentile,2461.1
Maximum,4676.0
Range,4342.0
Interquartile range,649.5

0,1
Standard deviation,513.8
Coef of variation,0.33828
Kurtosis,2.5429
Mean,1518.9
MAD,395.48
Skewness,1.0768
Sum,1774055
Variance,263990
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
864,14,1.2%,
1456,10,0.9%,
1040,9,0.8%,
1200,9,0.8%,
894,9,0.8%,
912,8,0.7%,
848,8,0.7%,
816,7,0.6%,
1092,7,0.6%,
1344,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
334,1,0.1%,
438,1,0.1%,
480,1,0.1%,
605,1,0.1%,
616,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
3493,1,0.1%,
3608,1,0.1%,
3627,1,0.1%,
4316,1,0.1%,
4676,1,0.1%,

0,1
Distinct count,3
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.38442
Minimum,0
Maximum,2
Zeros (%),62.1%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,1
95-th percentile,1
Maximum,2
Range,2
Interquartile range,1

0,1
Standard deviation,0.49712
Coef of variation,1.2932
Kurtosis,-1.3229
Mean,0.38442
MAD,0.47723
Skewness,0.60126
Sum,449
Variance,0.24713
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,725,62.1%,
1,437,37.4%,
2,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
0,725,62.1%,
1,437,37.4%,
2,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
0,725,62.1%,
1,437,37.4%,
2,6,0.5%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
GasA,1143
GasW,13
Grav,7
Other values (3),5

Value,Count,Frequency (%),Unnamed: 3
GasA,1143,97.9%,
GasW,13,1.1%,
Grav,7,0.6%,
Wall,2,0.2%,
OthW,2,0.2%,
Floor,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
Ex,591
TA,342
Gd,196
Other values (2),39

Value,Count,Frequency (%),Unnamed: 3
Ex,591,50.6%,
TA,342,29.3%,
Gd,196,16.8%,
Fa,38,3.3%,
Po,1,0.1%,

0,1
Distinct count,8
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0

0,1
1Story,579
2Story,362
1.5Fin,123
Other values (5),104

Value,Count,Frequency (%),Unnamed: 3
1Story,579,49.6%,
2Story,362,31.0%,
1.5Fin,123,10.5%,
SLvl,50,4.3%,
SFoyer,25,2.1%,
1.5Unf,12,1.0%,
2.5Unf,10,0.9%,
2.5Fin,7,0.6%,

0,1
Distinct count,1168
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,738.69
Minimum,1
Maximum,1460
Zeros (%),0.0%

0,1
Minimum,1.0
5-th percentile,82.35
Q1,373.75
Median,749.5
Q3,1108.8
95-th percentile,1393.3
Maximum,1460.0
Range,1459.0
Interquartile range,735.0

0,1
Standard deviation,421.61
Coef of variation,0.57076
Kurtosis,-1.2008
Mean,738.69
MAD,364.49
Skewness,-0.015381
Sum,862785
Variance,177750
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
1460,1,0.1%,
507,1,0.1%,
490,1,0.1%,
491,1,0.1%,
493,1,0.1%,
494,1,0.1%,
495,1,0.1%,
496,1,0.1%,
497,1,0.1%,
498,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1,1,0.1%,
4,1,0.1%,
7,1,0.1%,
8,1,0.1%,
9,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1456,1,0.1%,
1457,1,0.1%,
1458,1,0.1%,
1459,1,0.1%,
1460,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.0445
Minimum,0
Maximum,3
Zeros (%),0.1%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,1
Q3,1
95-th percentile,1
Maximum,3
Range,3
Interquartile range,0

0,1
Standard deviation,0.21844
Coef of variation,0.20913
Kurtosis,23.957
Mean,1.0445
MAD,0.086866
Skewness,4.6495
Sum,1220
Variance,0.047716
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
1,1116,95.5%,
2,49,4.2%,
3,2,0.2%,
0,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1,0.1%,
1,1116,95.5%,
2,49,4.2%,
3,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,1,0.1%,
1,1116,95.5%,
2,49,4.2%,
3,2,0.2%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,596
Gd,462
Ex,78

Value,Count,Frequency (%),Unnamed: 3
TA,596,51.0%,
Gd,462,39.6%,
Ex,78,6.7%,
Fa,32,2.7%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Lvl,1054
Bnk,50
HLS,34

Value,Count,Frequency (%),Unnamed: 3
Lvl,1054,90.2%,
Bnk,50,4.3%,
HLS,34,2.9%,
Low,30,2.6%,

0,1
Distinct count,3
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Gtl,1100
Mod,55
Sev,13

Value,Count,Frequency (%),Unnamed: 3
Gtl,1100,94.2%,
Mod,55,4.7%,
Sev,13,1.1%,

0,1
Distinct count,891
Unique (%),76.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,10590
Minimum,1300
Maximum,215245
Zeros (%),0.0%

0,1
Minimum,1300.0
5-th percentile,3207.9
Q1,7589.5
Median,9512.5
Q3,11602.0
95-th percentile,17133.0
Maximum,215245.0
Range,213945.0
Interquartile range,4012.0

0,1
Standard deviation,10704
Coef of variation,1.0108
Kurtosis,190.84
Mean,10590
MAD,3822.8
Skewness,12.14
Sum,12368738
Variance,114580000
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
7200,20,1.7%,
9600,20,1.7%,
6000,14,1.2%,
10800,12,1.0%,
9000,11,0.9%,
8400,10,0.9%,
1680,8,0.7%,
6120,8,0.7%,
3182,7,0.6%,
6240,7,0.6%,

Value,Count,Frequency (%),Unnamed: 3
1300,1,0.1%,
1491,1,0.1%,
1526,1,0.1%,
1533,2,0.2%,
1596,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
70761,1,0.1%,
115149,1,0.1%,
159000,1,0.1%,
164660,1,0.1%,
215245,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
Inside,850
Corner,205
CulDSac,76
Other values (2),37

Value,Count,Frequency (%),Unnamed: 3
Inside,850,72.8%,
Corner,205,17.6%,
CulDSac,76,6.5%,
FR2,36,3.1%,
FR3,1,0.1%,

0,1
Distinct count,105
Unique (%),9.0%
Missing (%),18.2%
Missing (n),212
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,69.614
Minimum,21
Maximum,313
Zeros (%),0.0%

0,1
Minimum,21.0
5-th percentile,34.0
Q1,59.0
Median,69.0
Q3,80.0
95-th percentile,104.25
Maximum,313.0
Range,292.0
Interquartile range,21.0

0,1
Standard deviation,22.946
Coef of variation,0.32962
Kurtosis,14.436
Mean,69.614
MAD,16.403
Skewness,1.7188
Sum,66551
Variance,526.52
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
60.0,113,9.7%,
80.0,59,5.1%,
70.0,48,4.1%,
50.0,44,3.8%,
75.0,42,3.6%,
85.0,33,2.8%,
65.0,30,2.6%,
78.0,21,1.8%,
90.0,20,1.7%,
55.0,17,1.5%,

Value,Count,Frequency (%),Unnamed: 3
21.0,17,1.5%,
24.0,16,1.4%,
30.0,6,0.5%,
32.0,4,0.3%,
33.0,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
153.0,1,0.1%,
168.0,1,0.1%,
174.0,1,0.1%,
182.0,1,0.1%,
313.0,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Reg,735
IR1,396
IR2,30

Value,Count,Frequency (%),Unnamed: 3
Reg,735,62.9%,
IR1,396,33.9%,
IR2,30,2.6%,
IR3,7,0.6%,

0,1
Distinct count,21
Unique (%),1.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.4443
Minimum,0
Maximum,572
Zeros (%),98.0%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,572
Range,572
Interquartile range,0

0,1
Standard deviation,51.201
Coef of variation,7.9451
Kurtosis,75.795
Mean,6.4443
MAD,12.635
Skewness,8.6068
Sum,7527
Variance,2621.5
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1145,98.0%,
80,3,0.3%,
360,2,0.2%,
384,1,0.1%,
53,1,0.1%,
120,1,0.1%,
144,1,0.1%,
205,1,0.1%,
232,1,0.1%,
234,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1145,98.0%,
53,1,0.1%,
80,3,0.3%,
120,1,0.1%,
144,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
513,1,0.1%,
514,1,0.1%,
515,1,0.1%,
528,1,0.1%,
572,1,0.1%,

0,1
Distinct count,15
Unique (%),1.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,56.605
Minimum,20
Maximum,190
Zeros (%),0.0%

0,1
Minimum,20
5-th percentile,20
Q1,20
Median,50
Q3,70
95-th percentile,160
Maximum,190
Range,170
Interquartile range,50

0,1
Standard deviation,42.172
Coef of variation,0.74502
Kurtosis,1.6266
Mean,56.605
MAD,31.108
Skewness,1.4243
Sum,66115
Variance,1778.5
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
20,430,36.8%,
60,243,20.8%,
50,115,9.8%,
120,68,5.8%,
30,56,4.8%,
160,53,4.5%,
70,49,4.2%,
80,44,3.8%,
90,38,3.3%,
190,24,2.1%,

Value,Count,Frequency (%),Unnamed: 3
20,430,36.8%,
30,56,4.8%,
40,3,0.3%,
45,11,0.9%,
50,115,9.8%,

Value,Count,Frequency (%),Unnamed: 3
90,38,3.3%,
120,68,5.8%,
160,53,4.5%,
180,6,0.5%,
190,24,2.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
RL,921
RM,174
FV,49
Other values (2),24

Value,Count,Frequency (%),Unnamed: 3
RL,921,78.9%,
RM,174,14.9%,
FV,49,4.2%,
RH,15,1.3%,
C (all),9,0.8%,

0,1
Distinct count,283
Unique (%),24.2%
Missing (%),0.5%
Missing (n),6
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,103.48
Minimum,0
Maximum,1600
Zeros (%),59.8%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,167.75
95-th percentile,456.0
Maximum,1600.0
Range,1600.0
Interquartile range,167.75

0,1
Standard deviation,182.68
Coef of variation,1.7653
Kurtosis,10.665
Mean,103.48
MAD,131.17
Skewness,2.7135
Sum,120240
Variance,33371
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,698,59.8%,
16.0,7,0.6%,
106.0,6,0.5%,
180.0,6,0.5%,
108.0,6,0.5%,
132.0,5,0.4%,
320.0,5,0.4%,
80.0,5,0.4%,
76.0,4,0.3%,
170.0,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0.0,698,59.8%,
1.0,2,0.2%,
11.0,1,0.1%,
14.0,1,0.1%,
16.0,7,0.6%,

Value,Count,Frequency (%),Unnamed: 3
1115.0,1,0.1%,
1129.0,1,0.1%,
1170.0,1,0.1%,
1378.0,1,0.1%,
1600.0,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.5%
Missing (n),6

0,1
None,701
BrkFace,338
Stone,112

Value,Count,Frequency (%),Unnamed: 3
None,701,60.0%,
BrkFace,338,28.9%,
Stone,112,9.6%,
BrkCmn,11,0.9%,
(Missing),6,0.5%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),95.8%
Missing (n),1119

0,1
Shed,45
Gar2,2
Othr,2
(Missing),1119

Value,Count,Frequency (%),Unnamed: 3
Shed,45,3.9%,
Gar2,2,0.2%,
Othr,2,0.2%,
(Missing),1119,95.8%,

0,1
Distinct count,21
Unique (%),1.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,50.937
Minimum,0
Maximum,15500
Zeros (%),95.9%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,15500
Range,15500
Interquartile range,0

0,1
Standard deviation,550.38
Coef of variation,10.805
Kurtosis,577.36
Mean,50.937
MAD,97.687
Skewness,22.339
Sum,59494
Variance,302920
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1120,95.9%,
400,11,0.9%,
500,8,0.7%,
450,4,0.3%,
700,3,0.3%,
2000,3,0.3%,
600,3,0.3%,
1200,2,0.2%,
480,2,0.2%,
1150,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1120,95.9%,
54,1,0.1%,
350,1,0.1%,
400,11,0.9%,
450,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
2000,3,0.3%,
2500,1,0.1%,
3500,1,0.1%,
8300,1,0.1%,
15500,1,0.1%,

0,1
Distinct count,12
Unique (%),1.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.3014
Minimum,1
Maximum,12
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,2
Q1,5
Median,6
Q3,8
95-th percentile,11
Maximum,12
Range,11
Interquartile range,3

0,1
Standard deviation,2.726
Coef of variation,0.4326
Kurtosis,-0.40992
Mean,6.3014
MAD,2.1604
Skewness,0.2333
Sum,7360
Variance,7.4309
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
6,195,16.7%,
7,189,16.2%,
5,170,14.6%,
4,118,10.1%,
8,90,7.7%,
3,83,7.1%,
10,69,5.9%,
11,66,5.7%,
12,49,4.2%,
9,49,4.2%,

Value,Count,Frequency (%),Unnamed: 3
1,49,4.2%,
2,41,3.5%,
3,83,7.1%,
4,118,10.1%,
5,170,14.6%,

Value,Count,Frequency (%),Unnamed: 3
8,90,7.7%,
9,49,4.2%,
10,69,5.9%,
11,66,5.7%,
12,49,4.2%,

0,1
Distinct count,25
Unique (%),2.1%
Missing (%),0.0%
Missing (n),0

0,1
NAmes,177
CollgCr,116
OldTown,89
Other values (22),786

Value,Count,Frequency (%),Unnamed: 3
NAmes,177,15.2%,
CollgCr,116,9.9%,
OldTown,89,7.6%,
Edwards,80,6.8%,
Somerst,68,5.8%,
Sawyer,65,5.6%,
Gilbert,64,5.5%,
NridgHt,61,5.2%,
NWAmes,56,4.8%,
SawyerW,46,3.9%,

0,1
Distinct count,185
Unique (%),15.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,48.045
Minimum,0
Maximum,547
Zeros (%),44.6%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,26.0
Q3,68.0
95-th percentile,185.95
Maximum,547.0
Range,547.0
Interquartile range,68.0

0,1
Standard deviation,68.619
Coef of variation,1.4282
Kurtosis,8.6525
Mean,48.045
MAD,49.012
Skewness,2.4043
Sum,56116
Variance,4708.6
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,521,44.6%,
36,24,2.1%,
20,18,1.5%,
48,17,1.5%,
40,17,1.5%,
24,15,1.3%,
60,14,1.2%,
45,14,1.2%,
44,12,1.0%,
39,12,1.0%,

Value,Count,Frequency (%),Unnamed: 3
0,521,44.6%,
8,1,0.1%,
10,1,0.1%,
11,1,0.1%,
12,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
406,1,0.1%,
418,1,0.1%,
502,1,0.1%,
523,1,0.1%,
547,1,0.1%,

0,1
Distinct count,9
Unique (%),0.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,5.5728
Minimum,1
Maximum,9
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,4
Q1,5
Median,5
Q3,6
95-th percentile,8
Maximum,9
Range,8
Interquartile range,1

0,1
Standard deviation,1.1169
Coef of variation,0.20042
Kurtosis,1.1977
Mean,5.5728
MAD,0.88841
Skewness,0.6801
Sum,6509
Variance,1.2475
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
5,654,56.0%,
6,207,17.7%,
7,158,13.5%,
8,59,5.1%,
4,48,4.1%,
9,18,1.5%,
3,18,1.5%,
2,5,0.4%,
1,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1,1,0.1%,
2,5,0.4%,
3,18,1.5%,
4,48,4.1%,
5,654,56.0%,

Value,Count,Frequency (%),Unnamed: 3
5,654,56.0%,
6,207,17.7%,
7,158,13.5%,
8,59,5.1%,
9,18,1.5%,

0,1
Distinct count,10
Unique (%),0.9%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.0865
Minimum,1
Maximum,10
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,4
Q1,5
Median,6
Q3,7
95-th percentile,8
Maximum,10
Range,9
Interquartile range,2

0,1
Standard deviation,1.3675
Coef of variation,0.22467
Kurtosis,0.14852
Mean,6.0865
MAD,1.0813
Skewness,0.16989
Sum,7109
Variance,1.87
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
5,319,27.3%,
6,304,26.0%,
7,255,21.8%,
8,135,11.6%,
4,91,7.8%,
9,32,2.7%,
3,15,1.3%,
10,12,1.0%,
2,3,0.3%,
1,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
1,2,0.2%,
2,3,0.3%,
3,15,1.3%,
4,91,7.8%,
5,319,27.3%,

Value,Count,Frequency (%),Unnamed: 3
6,304,26.0%,
7,255,21.8%,
8,135,11.6%,
9,32,2.7%,
10,12,1.0%,

0,1
Distinct count,3
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Y,1067
N,79
P,22

Value,Count,Frequency (%),Unnamed: 3
Y,1067,91.4%,
N,79,6.8%,
P,22,1.9%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2.1182
Minimum,0
Maximum,738
Zeros (%),99.7%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,738
Range,738
Interquartile range,0

0,1
Standard deviation,36.482
Coef of variation,17.224
Kurtosis,309.79
Mean,2.1182
MAD,4.2218
Skewness,17.492
Sum,2474
Variance,1331
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1164,99.7%,
738,1,0.1%,
648,1,0.1%,
576,1,0.1%,
512,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1164,99.7%,
512,1,0.1%,
576,1,0.1%,
648,1,0.1%,
738,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1164,99.7%,
512,1,0.1%,
576,1,0.1%,
648,1,0.1%,
738,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),99.7%
Missing (n),1164

0,1
Gd,2
Fa,1
Ex,1
(Missing),1164

Value,Count,Frequency (%),Unnamed: 3
Gd,2,0.2%,
Fa,1,0.1%,
Ex,1,0.1%,
(Missing),1164,99.7%,

0,1
Distinct count,7
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0

0,1
CompShg,1146
Tar&Grv,9
WdShake,5
Other values (4),8

Value,Count,Frequency (%),Unnamed: 3
CompShg,1146,98.1%,
Tar&Grv,9,0.8%,
WdShake,5,0.4%,
WdShngl,5,0.4%,
Metal,1,0.1%,
Membran,1,0.1%,
Roll,1,0.1%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Gable,905
Hip,236
Flat,11
Other values (3),16

Value,Count,Frequency (%),Unnamed: 3
Gable,905,77.5%,
Hip,236,20.2%,
Flat,11,0.9%,
Gambrel,8,0.7%,
Mansard,6,0.5%,
Shed,2,0.2%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Normal,969
Partial,98
Abnorml,79
Other values (3),22

Value,Count,Frequency (%),Unnamed: 3
Normal,969,83.0%,
Partial,98,8.4%,
Abnorml,79,6.8%,
Family,12,1.0%,
Alloca,7,0.6%,
AdjLand,3,0.3%,

0,1
Distinct count,9
Unique (%),0.8%
Missing (%),0.0%
Missing (n),0

0,1
WD,1019
New,96
COD,33
Other values (6),20

Value,Count,Frequency (%),Unnamed: 3
WD,1019,87.2%,
New,96,8.2%,
COD,33,2.8%,
ConLD,7,0.6%,
ConLw,5,0.4%,
ConLI,4,0.3%,
Oth,2,0.2%,
Con,1,0.1%,
CWD,1,0.1%,

0,1
Distinct count,66
Unique (%),5.7%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,14.528
Minimum,0
Maximum,480
Zeros (%),92.1%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,0.0
95-th percentile,155.65
Maximum,480.0
Range,480.0
Interquartile range,0.0

0,1
Standard deviation,54.01
Coef of variation,3.7176
Kurtosis,18.591
Mean,14.528
MAD,26.768
Skewness,4.1312
Sum,16969
Variance,2917
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1076,92.1%,
120,4,0.3%,
180,4,0.3%,
192,4,0.3%,
224,3,0.3%,
90,3,0.3%,
147,3,0.3%,
168,3,0.3%,
189,3,0.3%,
126,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,1076,92.1%,
40,1,0.1%,
53,1,0.1%,
60,1,0.1%,
63,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
312,1,0.1%,
374,1,0.1%,
385,1,0.1%,
410,1,0.1%,
480,1,0.1%,

0,1
Distinct count,2
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
Pave,1163
Grvl,5

Value,Count,Frequency (%),Unnamed: 3
Pave,1163,99.6%,
Grvl,5,0.4%,

0,1
Distinct count,12
Unique (%),1.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.5445
Minimum,2
Maximum,14
Zeros (%),0.0%

0,1
Minimum,2
5-th percentile,4
Q1,5
Median,6
Q3,7
95-th percentile,10
Maximum,14
Range,12
Interquartile range,2

0,1
Standard deviation,1.6245
Coef of variation,0.24822
Kurtosis,0.72644
Mean,6.5445
MAD,1.2847
Skewness,0.62362
Sum,7644
Variance,2.639
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
6,317,27.1%,
7,265,22.7%,
5,216,18.5%,
8,154,13.2%,
4,76,6.5%,
9,62,5.3%,
10,40,3.4%,
11,18,1.5%,
3,13,1.1%,
12,5,0.4%,

Value,Count,Frequency (%),Unnamed: 3
2,1,0.1%,
3,13,1.1%,
4,76,6.5%,
5,216,18.5%,
6,317,27.1%,

Value,Count,Frequency (%),Unnamed: 3
9,62,5.3%,
10,40,3.4%,
11,18,1.5%,
12,5,0.4%,
14,1,0.1%,

0,1
Distinct count,623
Unique (%),53.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1053.5
Minimum,0
Maximum,3206
Zeros (%),2.4%

0,1
Minimum,0.0
5-th percentile,534.05
Q1,798.75
Median,992.0
Q3,1276.2
95-th percentile,1734.0
Maximum,3206.0
Range,3206.0
Interquartile range,477.5

0,1
Standard deviation,412.07
Coef of variation,0.39114
Kurtosis,2.1657
Mean,1053.5
MAD,311.3
Skewness,0.59149
Sum,1230502
Variance,169800
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,28,2.4%,
864,26,2.2%,
672,14,1.2%,
912,13,1.1%,
768,12,1.0%,
816,12,1.0%,
1040,11,0.9%,
728,11,0.9%,
848,9,0.8%,
780,9,0.8%,

Value,Count,Frequency (%),Unnamed: 3
0,28,2.4%,
105,1,0.1%,
190,1,0.1%,
264,2,0.2%,
270,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2524,1,0.1%,
2633,1,0.1%,
3138,1,0.1%,
3200,1,0.1%,
3206,1,0.1%,

0,1
Distinct count,2
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
AllPub,1167
NoSeWa,1

Value,Count,Frequency (%),Unnamed: 3
AllPub,1167,99.9%,
NoSeWa,1,0.1%,

0,1
Distinct count,240
Unique (%),20.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,94.498
Minimum,0
Maximum,736
Zeros (%),52.8%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,168.0
95-th percentile,350.3
Maximum,736.0
Range,736.0
Interquartile range,168.0

0,1
Standard deviation,127.31
Coef of variation,1.3472
Kurtosis,2.5301
Mean,94.498
MAD,103.49
Skewness,1.5169
Sum,110374
Variance,16208
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,617,52.8%,
144,30,2.6%,
192,30,2.6%,
100,29,2.5%,
168,24,2.1%,
120,23,2.0%,
224,11,0.9%,
140,10,0.9%,
180,8,0.7%,
240,8,0.7%,

Value,Count,Frequency (%),Unnamed: 3
0,617,52.8%,
12,2,0.2%,
24,2,0.2%,
26,2,0.2%,
28,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
635,1,0.1%,
668,1,0.1%,
670,1,0.1%,
728,1,0.1%,
736,1,0.1%,

0,1
Distinct count,110
Unique (%),9.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1970.9
Minimum,1872
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1872.0
5-th percentile,1916.0
Q1,1953.8
Median,1972.0
Q3,2000.0
95-th percentile,2007.0
Maximum,2010.0
Range,138.0
Interquartile range,46.25

0,1
Standard deviation,30.407
Coef of variation,0.015428
Kurtosis,-0.42913
Mean,1970.9
MAD,25.202
Skewness,-0.61311
Sum,2302000
Variance,924.62
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
2005,53,4.5%,
2006,51,4.4%,
2007,39,3.3%,
2004,39,3.3%,
2003,37,3.2%,
1977,27,2.3%,
1920,27,2.3%,
1976,26,2.2%,
1959,23,2.0%,
1965,22,1.9%,

Value,Count,Frequency (%),Unnamed: 3
1872,1,0.1%,
1875,1,0.1%,
1880,4,0.3%,
1885,2,0.2%,
1890,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
2006,51,4.4%,
2007,39,3.3%,
2008,18,1.5%,
2009,13,1.1%,
2010,1,0.1%,

0,1
Distinct count,61
Unique (%),5.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1984.7
Minimum,1950
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1950
5-th percentile,1950
Q1,1966
Median,1993
Q3,2004
95-th percentile,2007
Maximum,2010
Range,60
Interquartile range,38

0,1
Standard deviation,20.685
Coef of variation,0.010422
Kurtosis,-1.288
Mean,1984.7
MAD,18.683
Skewness,-0.49141
Sum,2318121
Variance,427.85
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
1950,145,12.4%,
2006,77,6.6%,
2007,61,5.2%,
2005,53,4.5%,
2004,49,4.2%,
2000,43,3.7%,
2003,42,3.6%,
2002,40,3.4%,
1998,31,2.7%,
2008,31,2.7%,

Value,Count,Frequency (%),Unnamed: 3
1950,145,12.4%,
1951,3,0.3%,
1952,3,0.3%,
1953,9,0.8%,
1954,12,1.0%,

Value,Count,Frequency (%),Unnamed: 3
2006,77,6.6%,
2007,61,5.2%,
2008,31,2.7%,
2009,17,1.5%,
2010,6,0.5%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2007.8
Minimum,2006
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,2006
5-th percentile,2006
Q1,2007
Median,2008
Q3,2009
95-th percentile,2010
Maximum,2010
Range,4
Interquartile range,2

0,1
Standard deviation,1.336
Coef of variation,0.00066538
Kurtosis,-1.1906
Mean,2007.8
MAD,1.153
Skewness,0.10393
Sum,2345133
Variance,1.7848
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
2009,261,22.3%,
2007,260,22.3%,
2006,253,21.7%,
2008,247,21.1%,
2010,147,12.6%,

Value,Count,Frequency (%),Unnamed: 3
2006,253,21.7%,
2007,260,22.3%,
2008,247,21.1%,
2009,261,22.3%,
2010,147,12.6%,

Value,Count,Frequency (%),Unnamed: 3
2006,253,21.7%,
2007,260,22.3%,
2008,247,21.1%,
2009,261,22.3%,
2010,147,12.6%,

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
619,20,RL,90.0,11694,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NridgHt,Norm,Norm,1Fam,1Story,9,5,2007,2007,Hip,CompShg,CemntBd,CmentBd,BrkFace,452.0,Ex,TA,PConc,Ex,TA,Av,GLQ,48,Unf,0,1774,1822,GasA,Ex,Y,SBrkr,1828,0,0,1828,0,0,2,0,3,1,Gd,9,Typ,1,Gd,Attchd,2007.0,Unf,3,774,TA,TA,Y,0,108,0,0,260,0,,,,0,7,2007,New,Partial
871,20,RL,60.0,6600,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,PosN,Norm,1Fam,1Story,5,5,1962,1962,Hip,CompShg,MetalSd,MetalSd,,0.0,TA,TA,CBlock,TA,TA,No,Unf,0,Unf,0,894,894,GasA,Gd,N,SBrkr,894,0,0,894,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1962.0,Unf,1,308,TA,TA,Y,0,0,0,0,0,0,,,,0,8,2009,WD,Normal
93,30,RL,80.0,13360,Pave,Grvl,IR1,HLS,AllPub,Inside,Gtl,Crawfor,Norm,Norm,1Fam,1Story,5,7,1921,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,TA,Gd,BrkTil,Gd,TA,No,ALQ,713,Unf,0,163,876,GasA,Ex,Y,SBrkr,964,0,0,964,1,0,1,0,2,1,TA,5,Typ,0,,Detchd,1921.0,Unf,2,432,TA,TA,Y,0,0,44,0,0,0,,,,0,8,2009,WD,Normal
818,20,RL,,13265,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Mitchel,Norm,Norm,1Fam,1Story,8,5,2002,2002,Hip,CompShg,CemntBd,CmentBd,BrkFace,148.0,Gd,TA,PConc,Gd,TA,No,GLQ,1218,Unf,0,350,1568,GasA,Ex,Y,SBrkr,1689,0,0,1689,1,0,2,0,3,1,Gd,7,Typ,2,Gd,Attchd,2002.0,RFn,3,857,TA,TA,Y,150,59,0,0,0,0,,,,0,7,2008,WD,Normal
303,20,RL,118.0,13704,Pave,,IR1,Lvl,AllPub,Corner,Gtl,CollgCr,Norm,Norm,1Fam,1Story,7,5,2001,2002,Gable,CompShg,VinylSd,VinylSd,BrkFace,150.0,Gd,TA,PConc,Gd,TA,No,Unf,0,Unf,0,1541,1541,GasA,Ex,Y,SBrkr,1541,0,0,1541,0,0,2,0,3,1,Gd,6,Typ,1,TA,Attchd,2001.0,RFn,3,843,TA,TA,Y,468,81,0,0,0,0,,,,0,1,2006,WD,Normal


In [26]:
#X_train.describe()
#X_train.head()
#y_train.describe()
#y_train.head()

#X_test.describe()
#X_test.head()
#list(X_test.columns) 

# Model Creation

The model is defined using the settings given in the Kaggle course.

In [5]:
model = RandomForestRegressor(n_estimators=100, random_state=0)

# Evaluate System

Here, a "system" is the combination of a model and a dataset.

In [6]:
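def score_system( m=model, X_t=X_train, X_v=X_val, y_t=y_train, y_v=y_val ):
    # Note: the defaults are bound once, when this cell runs; pass arguments explicitly
    # if X_train, X_val, etc. are later reassigned (rather than modified in place)
    m.fit( X_t, y_t )
    pred_val = m.predict( X_v )
    # scikit-learn's signature is mean_absolute_error(y_true, y_pred)
    return mean_absolute_error( y_v, pred_val )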
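
To make the score concrete, the sketch below computes the mean absolute error by hand on a few made-up values. The helper name `mae_by_hand` and the numbers are illustrative assumptions, not part of the notebook; `sklearn.metrics.mean_absolute_error` should give the same quantity for the same inputs.

```python
def mae_by_hand(y_true, y_pred):
    """Mean of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up sale prices and predictions, for illustration only
y_true = [200000, 150000, 300000]
y_pred = [210000, 140000, 330000]

print(mae_by_hand(y_true, y_pred))
```

Because the absolute difference is symmetric, swapping the two arguments does not change the result; the `(y_true, y_pred)` order is only a convention.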

# Manage Missing Data

In [7]:
# Shape of training data (num_data_points, num_features)
print( X_train.shape )

# Number of missing values for each feature of training data
num_missing_val_per_feature = X_train.isnull().sum()
print( num_missing_val_per_feature[num_missing_val_per_feature > 0] )

# Features with missing values
feats_missing_vals = list(X_train.columns[X_train.isnull().any()])

(1168, 79)
LotFrontage      212
Alley           1097
MasVnrType         6
MasVnrArea         6
BsmtQual          28
BsmtCond          28
BsmtExposure      28
BsmtFinType1      28
BsmtFinType2      29
Electrical         1
FireplaceQu      551
GarageType        58
GarageYrBlt       58
GarageFinish      58
GarageQual        58
GarageCond        58
PoolQC          1164
Fence            954
MiscFeature     1119
dtype: int64
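
Raw counts are easier to compare as fractions of the 1168 training rows (for example, 1164 missing `PoolQC` values is about 99.7% of the column, which agrees with the profile report). A minimal sketch of that calculation, using a made-up frame `toy` in place of `X_train`:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for X_train (values are made up)
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, np.nan],
    "LotArea": [8450, 9600, 11250, 9550],
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
})

# isnull().mean() gives the fraction of missing entries per column
missing_frac = toy.isnull().mean()

# Keep only columns with missing data, worst first
missing_frac = missing_frac[missing_frac > 0].sort_values(ascending=False)
print(missing_frac)
```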


## Manage Missing Categorical Data

In [8]:
# Get the columns for categorical features that are missing values
num_missing_val_per_categorical = X_train.select_dtypes( 'object' ).isnull().sum()
num_missing_val_per_categorical = num_missing_val_per_categorical[num_missing_val_per_categorical > 0]
missing_val_categorical_cols = list(X_train.columns[(X_train.dtypes =='object') & X_train.isnull().any()])
print( missing_val_categorical_cols )


['Alley', 'MasVnrType', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Electrical', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PoolQC', 'Fence', 'MiscFeature']


Based on the provided data description, most of the missing values should be replaced with "NA".

The only special cases are:
 
MasVnrType: Masonry veneer type
- None: None

BsmtExposure: Refers to walkout or garden level walls
- No:   No Exposure
- NA:   No Basement

Electrical: Electrical system
- No suitable label exists for a missing value, so we will remove this feature. Additionally, the Profile Report shows this feature has a uniqueness of 0.4%, so it does not tell us much about the data anyway.

In [56]:
# Possible reasons for a missing BsmtExposure value are "no exposure" or "no basement"
# Cross-reference another basement feature: count rows where BsmtCond is present but BsmtExposure is missing
(X_train['BsmtExposure'].isnull() & X_train['BsmtCond'].notnull()).sum()

0

No row has a basement condition recorded but a missing BsmtExposure, so we can safely assume missing values in BsmtExposure are due to "no basement".
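
The same cross-check can be wrapped in a small helper and reused for other basement features. This may be worth doing for `BsmtFinType2`, which has 29 missing values against 28 for the other basement columns; the notebook does not investigate that difference. The helper name `missing_despite` and the toy rows below are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows: row 1 has no basement at all, row 2 is missing only BsmtFinType2
toy = pd.DataFrame({
    "BsmtCond":     ["TA", np.nan, "Gd"],
    "BsmtExposure": ["No", np.nan, "Av"],
    "BsmtFinType2": ["Unf", np.nan, np.nan],
})

def missing_despite(df, target, reference):
    """Count rows where `target` is missing but `reference` is present."""
    return int((df[target].isnull() & df[reference].notnull()).sum())

print(missing_despite(toy, "BsmtExposure", "BsmtCond"))
print(missing_despite(toy, "BsmtFinType2", "BsmtCond"))
```

A count of zero supports reading the missing value as "no basement"; a non-zero count points at rows that need a closer look before being labelled "NA".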

In [57]:
# Manage special cases in categorical data
X_train.fillna( value={'MasVnrType' : 'None', 'BsmtExposure' : 'NA'}, inplace=True )
X_val.fillna( value={'MasVnrType' : 'None', 'BsmtExposure' : 'NA'}, inplace=True )
X_test.fillna( value={'MasVnrType' : 'None', 'BsmtExposure' : 'NA'}, inplace=True )

X_train.drop( columns=['Electrical'], inplace=True )
X_val.drop( columns=['Electrical'], inplace=True )
X_test.drop( columns=['Electrical'], inplace=True )

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)
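
The messages above are pandas' SettingWithCopyWarning, which typically appears when a DataFrame derived from another frame is modified in place. How `X_train` was created is outside this section, so the cause is an assumption here. One common way to avoid the warning, sketched on made-up data, is to take an explicit `.copy()` and assign the results of `fillna` and `drop` back instead of using `inplace=True`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the full training table
full = pd.DataFrame({
    "MasVnrType": ["BrkFace", np.nan, "Stone"],
    "Electrical": ["SBrkr", "SBrkr", np.nan],
    "SalePrice": [208500, 181500, 223500],
})

# Explicit copy: later edits cannot touch `full`, so the intent is unambiguous
X = full[["MasVnrType", "Electrical"]].copy()

# Assign results back rather than using inplace=True
X = X.fillna(value={"MasVnrType": "None"})
X = X.drop(columns=["Electrical"])

print(X)
```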


In [61]:
# Manage all other cases of missing values in categorical data
# Columns already filled or dropped above are unaffected by their entries here
value = {feat: 'NA' for feat in missing_val_categorical_cols}

X_train.fillna( value=value, inplace=True )
X_val.fillna( value=value, inplace=True )
X_test.fillna( value=value, inplace=True )

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)
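
Since the same fill is applied to the training, validation, and test sets, it can be expressed once as a helper and followed by a quick check that the categorical columns are complete. This is a sketch on made-up frames; `fill_categorical_na` is a hypothetical name, not something defined in the notebook.

```python
import numpy as np
import pandas as pd

def fill_categorical_na(df, cols, label="NA"):
    """Return a copy of df with missing values in `cols` replaced by `label`."""
    present = [c for c in cols if c in df.columns]  # skip columns dropped earlier
    return df.fillna(value=dict.fromkeys(present, label))

cols = ["Alley", "Fence", "Electrical"]  # "Electrical" is absent from the toy frames
toy_train = pd.DataFrame({
    "Alley": ["Grvl", np.nan],
    "Fence": [np.nan, "MnPrv"],
    "LotFrontage": [np.nan, 60.0],
})
toy_val = pd.DataFrame({"Alley": [np.nan], "Fence": ["GdWo"], "LotFrontage": [70.0]})

toy_train, toy_val = (fill_categorical_na(d, cols) for d in (toy_train, toy_val))

# Categorical columns are now complete; the numeric column is left for later steps
print(toy_train.isnull().sum())
```

Numeric features such as `LotFrontage` are deliberately left untouched, since they are handled separately from the categorical ones.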


In [62]:
pp.ProfileReport(X_train)

0,1
Number of variables,79
Number of observations,1168
Total Missing (%),0.3%
Total size in memory,721.0 KiB
Average record size in memory,632.1 B

0,1
Numeric,37
Categorical,42
Boolean,0
Date,0
Text (Unique),0
Rejected,0
Unsupported,0

0,1
Distinct count,666
Unique (%),57.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1161
Minimum,334
Maximum,3228
Zeros (%),0.0%

0,1
Minimum,334.0
5-th percentile,672.0
Q1,884.0
Median,1092.0
Q3,1389.2
95-th percentile,1825.3
Maximum,3228.0
Range,2894.0
Interquartile range,505.25

0,1
Standard deviation,373.32
Coef of variation,0.32156
Kurtosis,1.689
Mean,1161
MAD,294.93
Skewness,0.96159
Sum,1356000
Variance,139360
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
864,17,1.5%,
912,12,1.0%,
1040,11,0.9%,
672,10,0.9%,
848,10,0.9%,
894,10,0.9%,
816,8,0.7%,
1056,6,0.5%,
840,6,0.5%,
483,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
334,1,0.1%,
372,1,0.1%,
438,1,0.1%,
480,1,0.1%,
483,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
2524,1,0.1%,
2633,1,0.1%,
2898,1,0.1%,
3138,1,0.1%,
3228,1,0.1%,

0,1
Distinct count,357
Unique (%),30.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,351.48
Minimum,0
Maximum,1872
Zeros (%),56.3%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,729.0
95-th percentile,1158.9
Maximum,1872.0
Range,1872.0
Interquartile range,729.0

0,1
Standard deviation,438.14
Coef of variation,1.2466
Kurtosis,-0.66824
Mean,351.48
MAD,398.85
Skewness,0.78147
Sum,410528
Variance,191960
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,658,56.3%,
728,9,0.8%,
504,8,0.7%,
672,7,0.6%,
720,7,0.6%,
546,7,0.6%,
600,5,0.4%,
840,4,0.3%,
689,4,0.3%,
756,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,658,56.3%,
110,1,0.1%,
167,1,0.1%,
192,1,0.1%,
208,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1540,1,0.1%,
1611,1,0.1%,
1796,1,0.1%,
1818,1,0.1%,
1872,1,0.1%,

0,1
Distinct count,18
Unique (%),1.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.2183
Minimum,0
Maximum,508
Zeros (%),98.4%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,508
Range,508
Interquartile range,0

0,1
Standard deviation,27.917
Coef of variation,8.6743
Kurtosis,135.26
Mean,3.2183
MAD,6.3319
Skewness,10.599
Sum,3759
Variance,779.34
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1149,98.4%,
216,2,0.2%,
168,2,0.2%,
245,1,0.1%,
238,1,0.1%,
290,1,0.1%,
196,1,0.1%,
182,1,0.1%,
180,1,0.1%,
304,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1149,98.4%,
23,1,0.1%,
96,1,0.1%,
130,1,0.1%,
140,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
238,1,0.1%,
245,1,0.1%,
290,1,0.1%,
304,1,0.1%,
508,1,0.1%,

0,1
Distinct count,3
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
NA,1097
Grvl,37
Pave,34

Value,Count,Frequency (%),Unnamed: 3
NA,1097,93.9%,
Grvl,37,3.2%,
Pave,34,2.9%,

0,1
Distinct count,8
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2.8827
Minimum,0
Maximum,8
Zeros (%),0.3%

0,1
Minimum,0
5-th percentile,2
Q1,2
Median,3
Q3,3
95-th percentile,4
Maximum,8
Range,8
Interquartile range,1

0,1
Standard deviation,0.80217
Coef of variation,0.27827
Kurtosis,2.3405
Mean,2.8827
MAD,0.56184
Skewness,0.23468
Sum,3367
Variance,0.64347
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
3,645,55.2%,
2,284,24.3%,
4,178,15.2%,
1,35,3.0%,
5,17,1.5%,
6,4,0.3%,
0,4,0.3%,
8,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,4,0.3%,
1,35,3.0%,
2,284,24.3%,
3,645,55.2%,
4,178,15.2%,

Value,Count,Frequency (%),Unnamed: 3
3,645,55.2%,
4,178,15.2%,
5,17,1.5%,
6,4,0.3%,
8,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
1Fam,981
TwnhsE,89
Duplex,38
Other values (2),60

Value,Count,Frequency (%),Unnamed: 3
1Fam,981,84.0%,
TwnhsE,89,7.6%,
Duplex,38,3.3%,
Twnhs,35,3.0%,
2fmCon,25,2.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
TA,1046
Gd,55
Fa,37
Other values (2),30

Value,Count,Frequency (%),Unnamed: 3
TA,1046,89.6%,
Gd,55,4.7%,
Fa,37,3.2%,
NA,28,2.4%,
Po,2,0.2%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
No,768
Av,174
Gd,106
Other values (2),120

Value,Count,Frequency (%),Unnamed: 3
No,768,65.8%,
Av,174,14.9%,
Gd,106,9.1%,
Mn,92,7.9%,
NA,28,2.4%,

0,1
Distinct count,551
Unique (%),47.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,439.89
Minimum,0
Maximum,2260
Zeros (%),32.6%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,379.5
Q3,716.0
95-th percentile,1269.4
Maximum,2260.0
Range,2260.0
Interquartile range,716.0

0,1
Standard deviation,435.11
Coef of variation,0.98913
Kurtosis,-0.080938
Mean,439.89
MAD,365.25
Skewness,0.76419
Sum,513792
Variance,189320
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,381,32.6%,
24,8,0.7%,
16,7,0.6%,
20,5,0.4%,
400,4,0.3%,
300,4,0.3%,
641,4,0.3%,
697,4,0.3%,
1200,4,0.3%,
600,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,381,32.6%,
16,7,0.6%,
20,5,0.4%,
24,8,0.7%,
25,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1721,1,0.1%,
1880,1,0.1%,
1904,1,0.1%,
2188,1,0.1%,
2260,1,0.1%,

0,1
Distinct count,120
Unique (%),10.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,45.572
Minimum,0
Maximum,1120
Zeros (%),88.3%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,0.0
95-th percentile,376.3
Maximum,1120.0
Range,1120.0
Interquartile range,0.0

0,1
Standard deviation,156.23
Coef of variation,3.4282
Kurtosis,18.956
Mean,45.572
MAD,80.54
Skewness,4.1851
Sum,53228
Variance,24408
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1031,88.3%,
180,4,0.3%,
374,3,0.3%,
93,2,0.2%,
480,2,0.2%,
117,2,0.2%,
279,2,0.2%,
287,2,0.2%,
391,2,0.2%,
290,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,1031,88.3%,
28,1,0.1%,
32,1,0.1%,
35,1,0.1%,
41,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
1061,1,0.1%,
1063,1,0.1%,
1080,1,0.1%,
1085,1,0.1%,
1120,1,0.1%,

0,1
Distinct count,7
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0

0,1
Unf,353
GLQ,330
ALQ,172
Other values (4),313

Value,Count,Frequency (%),Unnamed: 3
Unf,353,30.2%,
GLQ,330,28.3%,
ALQ,172,14.7%,
BLQ,123,10.5%,
Rec,106,9.1%,
LwQ,56,4.8%,
NA,28,2.4%,

0,1
Distinct count,7
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0

0,1
Unf,1003
LwQ,42
Rec,39
Other values (4),84

Value,Count,Frequency (%),Unnamed: 3
Unf,1003,85.9%,
LwQ,42,3.6%,
Rec,39,3.3%,
BLQ,30,2.6%,
NA,29,2.5%,
ALQ,14,1.2%,
GLQ,11,0.9%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.42209
Minimum,0
Maximum,3
Zeros (%),58.7%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,1
95-th percentile,1
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.51449
Coef of variation,1.2189
Kurtosis,-0.86418
Mean,0.42209
MAD,0.49581
Skewness,0.57989
Sum,493
Variance,0.2647
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,686,58.7%,
1,472,40.4%,
2,9,0.8%,
3,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,686,58.7%,
1,472,40.4%,
2,9,0.8%,
3,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,686,58.7%,
1,472,40.4%,
2,9,0.8%,
3,1,0.1%,

0,1
Distinct count,3
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.054795
Minimum,0
Maximum,2
Zeros (%),94.6%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,1
Maximum,2
Range,2
Interquartile range,0

0,1
Standard deviation,0.23141
Coef of variation,4.2232
Kurtosis,16.16
Mean,0.054795
MAD,0.10368
Skewness,4.1239
Sum,64
Variance,0.05355
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1105,94.6%,
1,62,5.3%,
2,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1105,94.6%,
1,62,5.3%,
2,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1105,94.6%,
1,62,5.3%,
2,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
TA,528
Gd,490
Ex,94
Other values (2),56

Value,Count,Frequency (%),Unnamed: 3
TA,528,45.2%,
Gd,490,42.0%,
Ex,94,8.0%,
Fa,28,2.4%,
NA,28,2.4%,

0,1
Distinct count,687
Unique (%),58.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,568.05
Minimum,0
Maximum,2153
Zeros (%),7.9%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,228.0
Median,482.5
Q3,811.25
95-th percentile,1469.3
Maximum,2153.0
Range,2153.0
Interquartile range,583.25

0,1
Standard deviation,437.57
Coef of variation,0.7703
Kurtosis,0.34383
Mean,568.05
MAD,351.24
Skewness,0.88206
Sum,663482
Variance,191470
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,92,7.9%,
728,9,0.8%,
600,7,0.6%,
572,6,0.5%,
440,6,0.5%,
625,6,0.5%,
384,6,0.5%,
319,5,0.4%,
326,5,0.4%,
270,5,0.4%,

Value,Count,Frequency (%),Unnamed: 3
0,92,7.9%,
14,1,0.1%,
15,1,0.1%,
23,1,0.1%,
26,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1935,1,0.1%,
1969,1,0.1%,
2002,1,0.1%,
2042,1,0.1%,
2153,1,0.1%,

0,1
Distinct count,2
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
Y,1090
N,78

Value,Count,Frequency (%),Unnamed: 3
Y,1090,93.3%,
N,78,6.7%,

0,1
Distinct count,9
Unique (%),0.8%
Missing (%),0.0%
Missing (n),0

0,1
Norm,1017
Feedr,62
Artery,32
Other values (6),57

Value,Count,Frequency (%),Unnamed: 3
Norm,1017,87.1%,
Feedr,62,5.3%,
Artery,32,2.7%,
PosN,17,1.5%,
RRAn,17,1.5%,
RRAe,10,0.9%,
PosA,7,0.6%,
RRNn,4,0.3%,
RRNe,2,0.2%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Norm,1160
Feedr,4
PosN,1
Other values (3),3

Value,Count,Frequency (%),Unnamed: 3
Norm,1160,99.3%,
Feedr,4,0.3%,
PosN,1,0.1%,
Artery,1,0.1%,
RRAe,1,0.1%,
PosA,1,0.1%,

0,1
Distinct count,107
Unique (%),9.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,23.022
Minimum,0
Maximum,552
Zeros (%),85.4%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,184
Maximum,552
Range,552
Interquartile range,0

0,1
Standard deviation,63.153
Coef of variation,2.7431
Kurtosis,10.375
Mean,23.022
MAD,39.315
Skewness,3.0666
Sum,26890
Variance,3988.3
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,997,85.4%,
112,14,1.2%,
96,6,0.5%,
216,5,0.4%,
120,4,0.3%,
116,3,0.3%,
252,3,0.3%,
164,3,0.3%,
192,3,0.3%,
156,3,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,997,85.4%,
19,1,0.1%,
20,1,0.1%,
24,1,0.1%,
32,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
301,1,0.1%,
318,1,0.1%,
330,1,0.1%,
386,1,0.1%,
552,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
TA,1025
Gd,114
Fa,25
Other values (2),4

Value,Count,Frequency (%),Unnamed: 3
TA,1025,87.8%,
Gd,114,9.8%,
Fa,25,2.1%,
Ex,3,0.3%,
Po,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,729
Gd,388
Ex,39

Value,Count,Frequency (%),Unnamed: 3
TA,729,62.4%,
Gd,388,33.2%,
Ex,39,3.3%,
Fa,12,1.0%,

0,1
Distinct count,15
Unique (%),1.3%
Missing (%),0.0%
Missing (n),0

0,1
VinylSd,412
Wd Sdng,175
HdBoard,171
Other values (12),410

Value,Count,Frequency (%),Unnamed: 3
VinylSd,412,35.3%,
Wd Sdng,175,15.0%,
HdBoard,171,14.6%,
MetalSd,164,14.0%,
Plywood,94,8.0%,
CemntBd,50,4.3%,
BrkFace,42,3.6%,
WdShing,21,1.8%,
Stucco,17,1.5%,
AsbShng,16,1.4%,

0,1
Distinct count,16
Unique (%),1.4%
Missing (%),0.0%
Missing (n),0

0,1
VinylSd,401
Wd Sdng,169
HdBoard,160
Other values (13),438

Value,Count,Frequency (%),Unnamed: 3
VinylSd,401,34.3%,
Wd Sdng,169,14.5%,
HdBoard,160,13.7%,
MetalSd,160,13.7%,
Plywood,123,10.5%,
CmentBd,50,4.3%,
Wd Shng,30,2.6%,
BrkFace,21,1.8%,
AsbShng,18,1.5%,
Stucco,16,1.4%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
NA,954
MnPrv,113
GdPrv,51
Other values (2),50

Value,Count,Frequency (%),Unnamed: 3
NA,954,81.7%,
MnPrv,113,9.7%,
GdPrv,51,4.4%,
GdWo,43,3.7%,
MnWw,7,0.6%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
NA,551
Gd,295
TA,257
Other values (3),65

Value,Count,Frequency (%),Unnamed: 3
NA,551,47.2%,
Gd,295,25.3%,
TA,257,22.0%,
Fa,29,2.5%,
Ex,19,1.6%,
Po,17,1.5%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.61216
Minimum,0
Maximum,3
Zeros (%),47.2%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,1
Q3,1
95-th percentile,2
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.64087
Coef of variation,1.0469
Kurtosis,-0.31165
Mean,0.61216
MAD,0.57757
Skewness,0.6223
Sum,715
Variance,0.41072
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,551,47.2%,
1,522,44.7%,
2,92,7.9%,
3,3,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,551,47.2%,
1,522,44.7%,
2,92,7.9%,
3,3,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0,551,47.2%,
1,522,44.7%,
2,92,7.9%,
3,3,0.3%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
PConc,512
CBlock,505
BrkTil,125
Other values (3),26

Value,Count,Frequency (%),Unnamed: 3
PConc,512,43.8%,
CBlock,505,43.2%,
BrkTil,125,10.7%,
Slab,19,1.6%,
Stone,6,0.5%,
Wood,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.5668
Minimum,0
Maximum,3
Zeros (%),0.5%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,2
Q3,2
95-th percentile,2
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.5467
Coef of variation,0.34893
Kurtosis,-0.9157
Mean,1.5668
MAD,0.51979
Skewness,0.032962
Sum,1830
Variance,0.29888
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
2,618,52.9%,
1,519,44.4%,
3,25,2.1%,
0,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
0,6,0.5%,
1,519,44.4%,
2,618,52.9%,
3,25,2.1%,

Value,Count,Frequency (%),Unnamed: 3
0,6,0.5%,
1,519,44.4%,
2,618,52.9%,
3,25,2.1%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Typ,1088
Min2,30
Min1,24
Other values (3),26

Value,Count,Frequency (%),Unnamed: 3
Typ,1088,93.2%,
Min2,30,2.6%,
Min1,24,2.1%,
Mod,12,1.0%,
Maj1,11,0.9%,
Maj2,3,0.3%,

0,1
Distinct count,400
Unique (%),34.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,473.63
Minimum,0
Maximum,1390
Zeros (%),5.0%

0,1
Minimum,0.0
5-th percentile,160.0
Q1,336.0
Median,477.5
Q3,576.0
95-th percentile,845.3
Maximum,1390.0
Range,1390.0
Interquartile range,240.0

0,1
Standard deviation,209.44
Coef of variation,0.4422
Kurtosis,0.82934
Mean,473.63
MAD,157.65
Skewness,0.17431
Sum,553203
Variance,43866
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,58,5.0%,
440,42,3.6%,
576,35,3.0%,
240,31,2.7%,
528,29,2.5%,
484,25,2.1%,
264,21,1.8%,
400,20,1.7%,
288,19,1.6%,
308,16,1.4%,

Value,Count,Frequency (%),Unnamed: 3
0,58,5.0%,
160,2,0.2%,
180,8,0.7%,
186,1,0.1%,
189,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1134,1,0.1%,
1166,1,0.1%,
1248,1,0.1%,
1356,1,0.1%,
1390,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.7714
Minimum,0
Maximum,4
Zeros (%),5.0%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,2
Q3,2
95-th percentile,3
Maximum,4
Range,4
Interquartile range,1

0,1
Standard deviation,0.73004
Coef of variation,0.41213
Kurtosis,0.195
Mean,1.7714
MAD,0.57088
Skewness,-0.35852
Sum,2069
Variance,0.53296
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
2,665,56.9%,
1,299,25.6%,
3,144,12.3%,
0,58,5.0%,
4,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,58,5.0%,
1,299,25.6%,
2,665,56.9%,
3,144,12.3%,
4,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,58,5.0%,
1,299,25.6%,
2,665,56.9%,
3,144,12.3%,
4,2,0.2%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
TA,1063
,58
Fa,32
Other values (3),15

Value,Count,Frequency (%),Unnamed: 3
TA,1063,91.0%,
,58,5.0%,
Fa,32,2.7%,
Gd,8,0.7%,
Po,6,0.5%,
Ex,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Unf,488
RFn,337
Fin,285

Value,Count,Frequency (%),Unnamed: 3
Unf,488,41.8%,
RFn,337,28.9%,
Fin,285,24.4%,
,58,5.0%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
TA,1055
,58
Fa,40
Other values (3),15

Value,Count,Frequency (%),Unnamed: 3
TA,1055,90.3%,
,58,5.0%,
Fa,40,3.4%,
Gd,10,0.9%,
Po,3,0.3%,
Ex,2,0.2%,

0,1
Distinct count,7
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0

0,1
Attchd,696
Detchd,315
BuiltIn,74
Other values (4),83

Value,Count,Frequency (%),Unnamed: 3
Attchd,696,59.6%,
Detchd,315,27.0%,
BuiltIn,74,6.3%,
,58,5.0%,
Basment,14,1.2%,
2Types,6,0.5%,
CarPort,5,0.4%,

0,1
Distinct count,98
Unique (%),8.4%
Missing (%),5.0%
Missing (n),58
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1978.1
Minimum,1900
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1900
5-th percentile,1928
Q1,1961
Median,1979
Q3,2002
95-th percentile,2007
Maximum,2010
Range,110
Interquartile range,41

0,1
Standard deviation,24.877
Coef of variation,0.012576
Kurtosis,-0.42154
Mean,1978.1
MAD,21.023
Skewness,-0.64474
Sum,2195700
Variance,618.88
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
2005.0,54,4.6%,
2006.0,46,3.9%,
2003.0,42,3.6%,
2004.0,39,3.3%,
2007.0,37,3.2%,
1977.0,30,2.6%,
1998.0,26,2.2%,
2008.0,23,2.0%,
1999.0,23,2.0%,
2002.0,23,2.0%,

Value,Count,Frequency (%),Unnamed: 3
1900.0,1,0.1%,
1906.0,1,0.1%,
1908.0,1,0.1%,
1910.0,2,0.2%,
1914.0,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
2006.0,46,3.9%,
2007.0,37,3.2%,
2008.0,23,2.0%,
2009.0,16,1.4%,
2010.0,3,0.3%,

0,1
Distinct count,741
Unique (%),63.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1518.9
Minimum,334
Maximum,4676
Zeros (%),0.0%

0,1
Minimum,334.0
5-th percentile,848.0
Q1,1139.0
Median,1471.5
Q3,1788.5
95-th percentile,2461.1
Maximum,4676.0
Range,4342.0
Interquartile range,649.5

0,1
Standard deviation,513.8
Coef of variation,0.33828
Kurtosis,2.5429
Mean,1518.9
MAD,395.48
Skewness,1.0768
Sum,1774055
Variance,263990
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
864,14,1.2%,
1456,10,0.9%,
1040,9,0.8%,
1200,9,0.8%,
894,9,0.8%,
912,8,0.7%,
848,8,0.7%,
816,7,0.6%,
1092,7,0.6%,
1344,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
334,1,0.1%,
438,1,0.1%,
480,1,0.1%,
605,1,0.1%,
616,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
3493,1,0.1%,
3608,1,0.1%,
3627,1,0.1%,
4316,1,0.1%,
4676,1,0.1%,

0,1
Distinct count,3
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.38442
Minimum,0
Maximum,2
Zeros (%),62.1%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,1
95-th percentile,1
Maximum,2
Range,2
Interquartile range,1

0,1
Standard deviation,0.49712
Coef of variation,1.2932
Kurtosis,-1.3229
Mean,0.38442
MAD,0.47723
Skewness,0.60126
Sum,449
Variance,0.24713
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,725,62.1%,
1,437,37.4%,
2,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
0,725,62.1%,
1,437,37.4%,
2,6,0.5%,

Value,Count,Frequency (%),Unnamed: 3
0,725,62.1%,
1,437,37.4%,
2,6,0.5%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
GasA,1143
GasW,13
Grav,7
Other values (3),5

Value,Count,Frequency (%),Unnamed: 3
GasA,1143,97.9%,
GasW,13,1.1%,
Grav,7,0.6%,
Wall,2,0.2%,
OthW,2,0.2%,
Floor,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
Ex,591
TA,342
Gd,196
Other values (2),39

Value,Count,Frequency (%),Unnamed: 3
Ex,591,50.6%,
TA,342,29.3%,
Gd,196,16.8%,
Fa,38,3.3%,
Po,1,0.1%,

0,1
Distinct count,8
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0

0,1
1Story,579
2Story,362
1.5Fin,123
Other values (5),104

Value,Count,Frequency (%),Unnamed: 3
1Story,579,49.6%,
2Story,362,31.0%,
1.5Fin,123,10.5%,
SLvl,50,4.3%,
SFoyer,25,2.1%,
1.5Unf,12,1.0%,
2.5Unf,10,0.9%,
2.5Fin,7,0.6%,

0,1
Distinct count,1168
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,738.69
Minimum,1
Maximum,1460
Zeros (%),0.0%

0,1
Minimum,1.0
5-th percentile,82.35
Q1,373.75
Median,749.5
Q3,1108.8
95-th percentile,1393.3
Maximum,1460.0
Range,1459.0
Interquartile range,735.0

0,1
Standard deviation,421.61
Coef of variation,0.57076
Kurtosis,-1.2008
Mean,738.69
MAD,364.49
Skewness,-0.015381
Sum,862785
Variance,177750
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
1460,1,0.1%,
507,1,0.1%,
490,1,0.1%,
491,1,0.1%,
493,1,0.1%,
494,1,0.1%,
495,1,0.1%,
496,1,0.1%,
497,1,0.1%,
498,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1,1,0.1%,
4,1,0.1%,
7,1,0.1%,
8,1,0.1%,
9,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1456,1,0.1%,
1457,1,0.1%,
1458,1,0.1%,
1459,1,0.1%,
1460,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.0445
Minimum,0
Maximum,3
Zeros (%),0.1%

0,1
Minimum,0
5-th percentile,1
Q1,1
Median,1
Q3,1
95-th percentile,1
Maximum,3
Range,3
Interquartile range,0

0,1
Standard deviation,0.21844
Coef of variation,0.20913
Kurtosis,23.957
Mean,1.0445
MAD,0.086866
Skewness,4.6495
Sum,1220
Variance,0.047716
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
1,1116,95.5%,
2,49,4.2%,
3,2,0.2%,
0,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1,0.1%,
1,1116,95.5%,
2,49,4.2%,
3,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,1,0.1%,
1,1116,95.5%,
2,49,4.2%,
3,2,0.2%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
TA,596
Gd,462
Ex,78

Value,Count,Frequency (%),Unnamed: 3
TA,596,51.0%,
Gd,462,39.6%,
Ex,78,6.7%,
Fa,32,2.7%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Lvl,1054
Bnk,50
HLS,34

Value,Count,Frequency (%),Unnamed: 3
Lvl,1054,90.2%,
Bnk,50,4.3%,
HLS,34,2.9%,
Low,30,2.6%,

0,1
Distinct count,3
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Gtl,1100
Mod,55
Sev,13

Value,Count,Frequency (%),Unnamed: 3
Gtl,1100,94.2%,
Mod,55,4.7%,
Sev,13,1.1%,

0,1
Distinct count,891
Unique (%),76.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,10590
Minimum,1300
Maximum,215245
Zeros (%),0.0%

0,1
Minimum,1300.0
5-th percentile,3207.9
Q1,7589.5
Median,9512.5
Q3,11602.0
95-th percentile,17133.0
Maximum,215245.0
Range,213945.0
Interquartile range,4012.0

0,1
Standard deviation,10704
Coef of variation,1.0108
Kurtosis,190.84
Mean,10590
MAD,3822.8
Skewness,12.14
Sum,12368738
Variance,114580000
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
7200,20,1.7%,
9600,20,1.7%,
6000,14,1.2%,
10800,12,1.0%,
9000,11,0.9%,
8400,10,0.9%,
1680,8,0.7%,
6120,8,0.7%,
3182,7,0.6%,
6240,7,0.6%,

Value,Count,Frequency (%),Unnamed: 3
1300,1,0.1%,
1491,1,0.1%,
1526,1,0.1%,
1533,2,0.2%,
1596,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
70761,1,0.1%,
115149,1,0.1%,
159000,1,0.1%,
164660,1,0.1%,
215245,1,0.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
Inside,850
Corner,205
CulDSac,76
Other values (2),37

Value,Count,Frequency (%),Unnamed: 3
Inside,850,72.8%,
Corner,205,17.6%,
CulDSac,76,6.5%,
FR2,36,3.1%,
FR3,1,0.1%,

0,1
Distinct count,105
Unique (%),9.0%
Missing (%),18.2%
Missing (n),212
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,69.614
Minimum,21
Maximum,313
Zeros (%),0.0%

0,1
Minimum,21.0
5-th percentile,34.0
Q1,59.0
Median,69.0
Q3,80.0
95-th percentile,104.25
Maximum,313.0
Range,292.0
Interquartile range,21.0

0,1
Standard deviation,22.946
Coef of variation,0.32962
Kurtosis,14.436
Mean,69.614
MAD,16.403
Skewness,1.7188
Sum,66551
Variance,526.52
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
60.0,113,9.7%,
80.0,59,5.1%,
70.0,48,4.1%,
50.0,44,3.8%,
75.0,42,3.6%,
85.0,33,2.8%,
65.0,30,2.6%,
78.0,21,1.8%,
90.0,20,1.7%,
55.0,17,1.5%,

Value,Count,Frequency (%),Unnamed: 3
21.0,17,1.5%,
24.0,16,1.4%,
30.0,6,0.5%,
32.0,4,0.3%,
33.0,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
153.0,1,0.1%,
168.0,1,0.1%,
174.0,1,0.1%,
182.0,1,0.1%,
313.0,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Reg,735
IR1,396
IR2,30

Value,Count,Frequency (%),Unnamed: 3
Reg,735,62.9%,
IR1,396,33.9%,
IR2,30,2.6%,
IR3,7,0.6%,

0,1
Distinct count,21
Unique (%),1.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.4443
Minimum,0
Maximum,572
Zeros (%),98.0%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,572
Range,572
Interquartile range,0

0,1
Standard deviation,51.201
Coef of variation,7.9451
Kurtosis,75.795
Mean,6.4443
MAD,12.635
Skewness,8.6068
Sum,7527
Variance,2621.5
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1145,98.0%,
80,3,0.3%,
360,2,0.2%,
384,1,0.1%,
53,1,0.1%,
120,1,0.1%,
144,1,0.1%,
205,1,0.1%,
232,1,0.1%,
234,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1145,98.0%,
53,1,0.1%,
80,3,0.3%,
120,1,0.1%,
144,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
513,1,0.1%,
514,1,0.1%,
515,1,0.1%,
528,1,0.1%,
572,1,0.1%,

0,1
Distinct count,15
Unique (%),1.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,56.605
Minimum,20
Maximum,190
Zeros (%),0.0%

0,1
Minimum,20
5-th percentile,20
Q1,20
Median,50
Q3,70
95-th percentile,160
Maximum,190
Range,170
Interquartile range,50

0,1
Standard deviation,42.172
Coef of variation,0.74502
Kurtosis,1.6266
Mean,56.605
MAD,31.108
Skewness,1.4243
Sum,66115
Variance,1778.5
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
20,430,36.8%,
60,243,20.8%,
50,115,9.8%,
120,68,5.8%,
30,56,4.8%,
160,53,4.5%,
70,49,4.2%,
80,44,3.8%,
90,38,3.3%,
190,24,2.1%,

Value,Count,Frequency (%),Unnamed: 3
20,430,36.8%,
30,56,4.8%,
40,3,0.3%,
45,11,0.9%,
50,115,9.8%,

Value,Count,Frequency (%),Unnamed: 3
90,38,3.3%,
120,68,5.8%,
160,53,4.5%,
180,6,0.5%,
190,24,2.1%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
RL,921
RM,174
FV,49
Other values (2),24

Value,Count,Frequency (%),Unnamed: 3
RL,921,78.9%,
RM,174,14.9%,
FV,49,4.2%,
RH,15,1.3%,
C (all),9,0.8%,

0,1
Distinct count,283
Unique (%),24.2%
Missing (%),0.5%
Missing (n),6
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,103.48
Minimum,0
Maximum,1600
Zeros (%),59.8%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,167.75
95-th percentile,456.0
Maximum,1600.0
Range,1600.0
Interquartile range,167.75

0,1
Standard deviation,182.68
Coef of variation,1.7653
Kurtosis,10.665
Mean,103.48
MAD,131.17
Skewness,2.7135
Sum,120240
Variance,33371
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,698,59.8%,
16.0,7,0.6%,
106.0,6,0.5%,
180.0,6,0.5%,
108.0,6,0.5%,
132.0,5,0.4%,
320.0,5,0.4%,
80.0,5,0.4%,
76.0,4,0.3%,
170.0,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
0.0,698,59.8%,
1.0,2,0.2%,
11.0,1,0.1%,
14.0,1,0.1%,
16.0,7,0.6%,

Value,Count,Frequency (%),Unnamed: 3
1115.0,1,0.1%,
1129.0,1,0.1%,
1170.0,1,0.1%,
1378.0,1,0.1%,
1600.0,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
,707
BrkFace,338
Stone,112

Value,Count,Frequency (%),Unnamed: 3
,707,60.5%,
BrkFace,338,28.9%,
Stone,112,9.6%,
BrkCmn,11,0.9%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
,1119
Shed,45
Gar2,2

Value,Count,Frequency (%),Unnamed: 3
,1119,95.8%,
Shed,45,3.9%,
Gar2,2,0.2%,
Othr,2,0.2%,

0,1
Distinct count,21
Unique (%),1.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,50.937
Minimum,0
Maximum,15500
Zeros (%),95.9%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,15500
Range,15500
Interquartile range,0

0,1
Standard deviation,550.38
Coef of variation,10.805
Kurtosis,577.36
Mean,50.937
MAD,97.687
Skewness,22.339
Sum,59494
Variance,302920
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1120,95.9%,
400,11,0.9%,
500,8,0.7%,
450,4,0.3%,
700,3,0.3%,
2000,3,0.3%,
600,3,0.3%,
1200,2,0.2%,
480,2,0.2%,
1150,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1120,95.9%,
54,1,0.1%,
350,1,0.1%,
400,11,0.9%,
450,4,0.3%,

Value,Count,Frequency (%),Unnamed: 3
2000,3,0.3%,
2500,1,0.1%,
3500,1,0.1%,
8300,1,0.1%,
15500,1,0.1%,

0,1
Distinct count,12
Unique (%),1.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.3014
Minimum,1
Maximum,12
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,2
Q1,5
Median,6
Q3,8
95-th percentile,11
Maximum,12
Range,11
Interquartile range,3

0,1
Standard deviation,2.726
Coef of variation,0.4326
Kurtosis,-0.40992
Mean,6.3014
MAD,2.1604
Skewness,0.2333
Sum,7360
Variance,7.4309
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
6,195,16.7%,
7,189,16.2%,
5,170,14.6%,
4,118,10.1%,
8,90,7.7%,
3,83,7.1%,
10,69,5.9%,
11,66,5.7%,
12,49,4.2%,
9,49,4.2%,

Value,Count,Frequency (%),Unnamed: 3
1,49,4.2%,
2,41,3.5%,
3,83,7.1%,
4,118,10.1%,
5,170,14.6%,

Value,Count,Frequency (%),Unnamed: 3
8,90,7.7%,
9,49,4.2%,
10,69,5.9%,
11,66,5.7%,
12,49,4.2%,

0,1
Distinct count,25
Unique (%),2.1%
Missing (%),0.0%
Missing (n),0

0,1
NAmes,177
CollgCr,116
OldTown,89
Other values (22),786

Value,Count,Frequency (%),Unnamed: 3
NAmes,177,15.2%,
CollgCr,116,9.9%,
OldTown,89,7.6%,
Edwards,80,6.8%,
Somerst,68,5.8%,
Sawyer,65,5.6%,
Gilbert,64,5.5%,
NridgHt,61,5.2%,
NWAmes,56,4.8%,
SawyerW,46,3.9%,

0,1
Distinct count,185
Unique (%),15.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,48.045
Minimum,0
Maximum,547
Zeros (%),44.6%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,26.0
Q3,68.0
95-th percentile,185.95
Maximum,547.0
Range,547.0
Interquartile range,68.0

0,1
Standard deviation,68.619
Coef of variation,1.4282
Kurtosis,8.6525
Mean,48.045
MAD,49.012
Skewness,2.4043
Sum,56116
Variance,4708.6
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,521,44.6%,
36,24,2.1%,
20,18,1.5%,
48,17,1.5%,
40,17,1.5%,
24,15,1.3%,
60,14,1.2%,
45,14,1.2%,
44,12,1.0%,
39,12,1.0%,

Value,Count,Frequency (%),Unnamed: 3
0,521,44.6%,
8,1,0.1%,
10,1,0.1%,
11,1,0.1%,
12,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
406,1,0.1%,
418,1,0.1%,
502,1,0.1%,
523,1,0.1%,
547,1,0.1%,

0,1
Distinct count,9
Unique (%),0.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,5.5728
Minimum,1
Maximum,9
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,4
Q1,5
Median,5
Q3,6
95-th percentile,8
Maximum,9
Range,8
Interquartile range,1

0,1
Standard deviation,1.1169
Coef of variation,0.20042
Kurtosis,1.1977
Mean,5.5728
MAD,0.88841
Skewness,0.6801
Sum,6509
Variance,1.2475
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
5,654,56.0%,
6,207,17.7%,
7,158,13.5%,
8,59,5.1%,
4,48,4.1%,
9,18,1.5%,
3,18,1.5%,
2,5,0.4%,
1,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
1,1,0.1%,
2,5,0.4%,
3,18,1.5%,
4,48,4.1%,
5,654,56.0%,

Value,Count,Frequency (%),Unnamed: 3
5,654,56.0%,
6,207,17.7%,
7,158,13.5%,
8,59,5.1%,
9,18,1.5%,

0,1
Distinct count,10
Unique (%),0.9%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.0865
Minimum,1
Maximum,10
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,4
Q1,5
Median,6
Q3,7
95-th percentile,8
Maximum,10
Range,9
Interquartile range,2

0,1
Standard deviation,1.3675
Coef of variation,0.22467
Kurtosis,0.14852
Mean,6.0865
MAD,1.0813
Skewness,0.16989
Sum,7109
Variance,1.87
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
5,319,27.3%,
6,304,26.0%,
7,255,21.8%,
8,135,11.6%,
4,91,7.8%,
9,32,2.7%,
3,15,1.3%,
10,12,1.0%,
2,3,0.3%,
1,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
1,2,0.2%,
2,3,0.3%,
3,15,1.3%,
4,91,7.8%,
5,319,27.3%,

Value,Count,Frequency (%),Unnamed: 3
6,304,26.0%,
7,255,21.8%,
8,135,11.6%,
9,32,2.7%,
10,12,1.0%,

0,1
Distinct count,3
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Y,1067
N,79
P,22

Value,Count,Frequency (%),Unnamed: 3
Y,1067,91.4%,
N,79,6.8%,
P,22,1.9%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2.1182
Minimum,0
Maximum,738
Zeros (%),99.7%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,738
Range,738
Interquartile range,0

0,1
Standard deviation,36.482
Coef of variation,17.224
Kurtosis,309.79
Mean,2.1182
MAD,4.2218
Skewness,17.492
Sum,2474
Variance,1331
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1164,99.7%,
738,1,0.1%,
648,1,0.1%,
576,1,0.1%,
512,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1164,99.7%,
512,1,0.1%,
576,1,0.1%,
648,1,0.1%,
738,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0,1164,99.7%,
512,1,0.1%,
576,1,0.1%,
648,1,0.1%,
738,1,0.1%,

0,1
Distinct count,4
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
,1164
Gd,2
Fa,1

Value,Count,Frequency (%),Unnamed: 3
,1164,99.7%,
Gd,2,0.2%,
Fa,1,0.1%,
Ex,1,0.1%,

0,1
Distinct count,7
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0

0,1
CompShg,1146
Tar&Grv,9
WdShake,5
Other values (4),8

Value,Count,Frequency (%),Unnamed: 3
CompShg,1146,98.1%,
Tar&Grv,9,0.8%,
WdShake,5,0.4%,
WdShngl,5,0.4%,
Metal,1,0.1%,
Membran,1,0.1%,
Roll,1,0.1%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Gable,905
Hip,236
Flat,11
Other values (3),16

Value,Count,Frequency (%),Unnamed: 3
Gable,905,77.5%,
Hip,236,20.2%,
Flat,11,0.9%,
Gambrel,8,0.7%,
Mansard,6,0.5%,
Shed,2,0.2%,

0,1
Distinct count,6
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
Normal,969
Partial,98
Abnorml,79
Other values (3),22

Value,Count,Frequency (%),Unnamed: 3
Normal,969,83.0%,
Partial,98,8.4%,
Abnorml,79,6.8%,
Family,12,1.0%,
Alloca,7,0.6%,
AdjLand,3,0.3%,

0,1
Distinct count,9
Unique (%),0.8%
Missing (%),0.0%
Missing (n),0

0,1
WD,1019
New,96
COD,33
Other values (6),20

Value,Count,Frequency (%),Unnamed: 3
WD,1019,87.2%,
New,96,8.2%,
COD,33,2.8%,
ConLD,7,0.6%,
ConLw,5,0.4%,
ConLI,4,0.3%,
Oth,2,0.2%,
Con,1,0.1%,
CWD,1,0.1%,

0,1
Distinct count,66
Unique (%),5.7%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,14.528
Minimum,0
Maximum,480
Zeros (%),92.1%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,0.0
95-th percentile,155.65
Maximum,480.0
Range,480.0
Interquartile range,0.0

0,1
Standard deviation,54.01
Coef of variation,3.7176
Kurtosis,18.591
Mean,14.528
MAD,26.768
Skewness,4.1312
Sum,16969
Variance,2917
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,1076,92.1%,
120,4,0.3%,
180,4,0.3%,
192,4,0.3%,
224,3,0.3%,
90,3,0.3%,
147,3,0.3%,
168,3,0.3%,
189,3,0.3%,
126,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,1076,92.1%,
40,1,0.1%,
53,1,0.1%,
60,1,0.1%,
63,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
312,1,0.1%,
374,1,0.1%,
385,1,0.1%,
410,1,0.1%,
480,1,0.1%,

0,1
Distinct count,2
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
Pave,1163
Grvl,5

Value,Count,Frequency (%),Unnamed: 3
Pave,1163,99.6%,
Grvl,5,0.4%,

0,1
Distinct count,12
Unique (%),1.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.5445
Minimum,2
Maximum,14
Zeros (%),0.0%

0,1
Minimum,2
5-th percentile,4
Q1,5
Median,6
Q3,7
95-th percentile,10
Maximum,14
Range,12
Interquartile range,2

0,1
Standard deviation,1.6245
Coef of variation,0.24822
Kurtosis,0.72644
Mean,6.5445
MAD,1.2847
Skewness,0.62362
Sum,7644
Variance,2.639
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
6,317,27.1%,
7,265,22.7%,
5,216,18.5%,
8,154,13.2%,
4,76,6.5%,
9,62,5.3%,
10,40,3.4%,
11,18,1.5%,
3,13,1.1%,
12,5,0.4%,

Value,Count,Frequency (%),Unnamed: 3
2,1,0.1%,
3,13,1.1%,
4,76,6.5%,
5,216,18.5%,
6,317,27.1%,

Value,Count,Frequency (%),Unnamed: 3
9,62,5.3%,
10,40,3.4%,
11,18,1.5%,
12,5,0.4%,
14,1,0.1%,

0,1
Distinct count,623
Unique (%),53.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1053.5
Minimum,0
Maximum,3206
Zeros (%),2.4%

0,1
Minimum,0.0
5-th percentile,534.05
Q1,798.75
Median,992.0
Q3,1276.2
95-th percentile,1734.0
Maximum,3206.0
Range,3206.0
Interquartile range,477.5

0,1
Standard deviation,412.07
Coef of variation,0.39114
Kurtosis,2.1657
Mean,1053.5
MAD,311.3
Skewness,0.59149
Sum,1230502
Variance,169800
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,28,2.4%,
864,26,2.2%,
672,14,1.2%,
912,13,1.1%,
768,12,1.0%,
816,12,1.0%,
1040,11,0.9%,
728,11,0.9%,
848,9,0.8%,
780,9,0.8%,

Value,Count,Frequency (%),Unnamed: 3
0,28,2.4%,
105,1,0.1%,
190,1,0.1%,
264,2,0.2%,
270,1,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2524,1,0.1%,
2633,1,0.1%,
3138,1,0.1%,
3200,1,0.1%,
3206,1,0.1%,

0,1
Distinct count,2
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
AllPub,1167
NoSeWa,1

Value,Count,Frequency (%),Unnamed: 3
AllPub,1167,99.9%,
NoSeWa,1,0.1%,

0,1
Distinct count,240
Unique (%),20.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,94.498
Minimum,0
Maximum,736
Zeros (%),52.8%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,168.0
95-th percentile,350.3
Maximum,736.0
Range,736.0
Interquartile range,168.0

0,1
Standard deviation,127.31
Coef of variation,1.3472
Kurtosis,2.5301
Mean,94.498
MAD,103.49
Skewness,1.5169
Sum,110374
Variance,16208
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
0,617,52.8%,
144,30,2.6%,
192,30,2.6%,
100,29,2.5%,
168,24,2.1%,
120,23,2.0%,
224,11,0.9%,
140,10,0.9%,
180,8,0.7%,
240,8,0.7%,

Value,Count,Frequency (%),Unnamed: 3
0,617,52.8%,
12,2,0.2%,
24,2,0.2%,
26,2,0.2%,
28,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
635,1,0.1%,
668,1,0.1%,
670,1,0.1%,
728,1,0.1%,
736,1,0.1%,

0,1
Distinct count,110
Unique (%),9.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1970.9
Minimum,1872
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1872.0
5-th percentile,1916.0
Q1,1953.8
Median,1972.0
Q3,2000.0
95-th percentile,2007.0
Maximum,2010.0
Range,138.0
Interquartile range,46.25

0,1
Standard deviation,30.407
Coef of variation,0.015428
Kurtosis,-0.42913
Mean,1970.9
MAD,25.202
Skewness,-0.61311
Sum,2302000
Variance,924.62
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
2005,53,4.5%,
2006,51,4.4%,
2007,39,3.3%,
2004,39,3.3%,
2003,37,3.2%,
1977,27,2.3%,
1920,27,2.3%,
1976,26,2.2%,
1959,23,2.0%,
1965,22,1.9%,

Value,Count,Frequency (%),Unnamed: 3
1872,1,0.1%,
1875,1,0.1%,
1880,4,0.3%,
1885,2,0.2%,
1890,2,0.2%,

Value,Count,Frequency (%),Unnamed: 3
2006,51,4.4%,
2007,39,3.3%,
2008,18,1.5%,
2009,13,1.1%,
2010,1,0.1%,

0,1
Distinct count,61
Unique (%),5.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1984.7
Minimum,1950
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,1950
5-th percentile,1950
Q1,1966
Median,1993
Q3,2004
95-th percentile,2007
Maximum,2010
Range,60
Interquartile range,38

0,1
Standard deviation,20.685
Coef of variation,0.010422
Kurtosis,-1.288
Mean,1984.7
MAD,18.683
Skewness,-0.49141
Sum,2318121
Variance,427.85
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
1950,145,12.4%,
2006,77,6.6%,
2007,61,5.2%,
2005,53,4.5%,
2004,49,4.2%,
2000,43,3.7%,
2003,42,3.6%,
2002,40,3.4%,
1998,31,2.7%,
2008,31,2.7%,

Value,Count,Frequency (%),Unnamed: 3
1950,145,12.4%,
1951,3,0.3%,
1952,3,0.3%,
1953,9,0.8%,
1954,12,1.0%,

Value,Count,Frequency (%),Unnamed: 3
2006,77,6.6%,
2007,61,5.2%,
2008,31,2.7%,
2009,17,1.5%,
2010,6,0.5%,

0,1
Distinct count,5
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,2007.8
Minimum,2006
Maximum,2010
Zeros (%),0.0%

0,1
Minimum,2006
5-th percentile,2006
Q1,2007
Median,2008
Q3,2009
95-th percentile,2010
Maximum,2010
Range,4
Interquartile range,2

0,1
Standard deviation,1.336
Coef of variation,0.00066538
Kurtosis,-1.1906
Mean,2007.8
MAD,1.153
Skewness,0.10393
Sum,2345133
Variance,1.7848
Memory size,9.2 KiB

Value,Count,Frequency (%),Unnamed: 3
2009,261,22.3%,
2007,260,22.3%,
2006,253,21.7%,
2008,247,21.1%,
2010,147,12.6%,

Value,Count,Frequency (%),Unnamed: 3
2006,253,21.7%,
2007,260,22.3%,
2008,247,21.1%,
2009,261,22.3%,
2010,147,12.6%,

Value,Count,Frequency (%),Unnamed: 3
2006,253,21.7%,
2007,260,22.3%,
2008,247,21.1%,
2009,261,22.3%,
2010,147,12.6%,

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
619,20,RL,90.0,11694,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NridgHt,Norm,Norm,1Fam,1Story,9,5,2007,2007,Hip,CompShg,CemntBd,CmentBd,BrkFace,452.0,Ex,TA,PConc,Ex,TA,Av,GLQ,48,Unf,0,1774,1822,GasA,Ex,Y,1828,0,0,1828,0,0,2,0,3,1,Gd,9,Typ,1,Gd,Attchd,2007.0,Unf,3,774,TA,TA,Y,0,108,0,0,260,0,,,,0,7,2007,New,Partial
871,20,RL,60.0,6600,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,PosN,Norm,1Fam,1Story,5,5,1962,1962,Hip,CompShg,MetalSd,MetalSd,,0.0,TA,TA,CBlock,TA,TA,No,Unf,0,Unf,0,894,894,GasA,Gd,N,894,0,0,894,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1962.0,Unf,1,308,TA,TA,Y,0,0,0,0,0,0,,,,0,8,2009,WD,Normal
93,30,RL,80.0,13360,Pave,Grvl,IR1,HLS,AllPub,Inside,Gtl,Crawfor,Norm,Norm,1Fam,1Story,5,7,1921,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,TA,Gd,BrkTil,Gd,TA,No,ALQ,713,Unf,0,163,876,GasA,Ex,Y,964,0,0,964,1,0,1,0,2,1,TA,5,Typ,0,,Detchd,1921.0,Unf,2,432,TA,TA,Y,0,0,44,0,0,0,,,,0,8,2009,WD,Normal
818,20,RL,,13265,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Mitchel,Norm,Norm,1Fam,1Story,8,5,2002,2002,Hip,CompShg,CemntBd,CmentBd,BrkFace,148.0,Gd,TA,PConc,Gd,TA,No,GLQ,1218,Unf,0,350,1568,GasA,Ex,Y,1689,0,0,1689,1,0,2,0,3,1,Gd,7,Typ,2,Gd,Attchd,2002.0,RFn,3,857,TA,TA,Y,150,59,0,0,0,0,,,,0,7,2008,WD,Normal
303,20,RL,118.0,13704,Pave,,IR1,Lvl,AllPub,Corner,Gtl,CollgCr,Norm,Norm,1Fam,1Story,7,5,2001,2002,Gable,CompShg,VinylSd,VinylSd,BrkFace,150.0,Gd,TA,PConc,Gd,TA,No,Unf,0,Unf,0,1541,1541,GasA,Ex,Y,1541,0,0,1541,0,0,2,0,3,1,Gd,6,Typ,1,TA,Attchd,2001.0,RFn,3,843,TA,TA,Y,468,81,0,0,0,0,,,,0,1,2006,WD,Normal


## Drop Features with Missing Data

In [9]:
reduced_X_train    = X_train.drop( feats_missing_vals, axis=1, inplace=False )
reduced_X_val      = X_val.drop( feats_missing_vals, axis=1, inplace=False )

In [17]:
print( 'Drop feats with missing data MAE: %.2f' % 
       score_system( X_t=reduced_X_train, X_v=reduced_X_val ) )

Drop feats with missing data MAE: 17837.83


## Imputation

In [11]:
def score_imputation( replacement, X_t=X_train, X_v=X_val ):
    # Impute missing values with specified replacement
    imputed_X_train = X_t.fillna( replacement )
    imputed_X_val   = X_v.fillna( replacement )
    return score_system( X_t=imputed_X_train, X_v=imputed_X_val)

In [12]:
# Calculate the mean, median, and minimum of each feature with missing values
feat_means   = X_train[feats_missing_vals].mean( skipna=True )
feat_medians = X_train[feats_missing_vals].median( skipna=True )
feat_mins    = X_train[feats_missing_vals].min( skipna=True )

print( 'Mean imputation MAE: %.2f' % score_imputation( feat_means ) )
print( 'Median imputation MAE: %.2f' % score_imputation( feat_medians ) )
print( 'Min imputation MAE: %.2f' % score_imputation( feat_mins ) )
print( 'Scalar (0) imputation MAE: %.2f' % score_imputation( 0 ) )

Mean imputation MAE: 18062.89
Median imputation MAE: 17791.60
Min imputation MAE: 18079.88
Scalar (0) imputation MAE: 18017.67
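
The calls above pass a pandas Series of per-feature statistics to `fillna`. As a minimal sketch of why that works, the toy frame below (made-up values, not the housing data) shows `fillna` matching the Series index against column names so that each column gets its own replacement. The names `toy`, `cols_with_nan`, `medians`, and `filled` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for X_train; the values are made up for illustration
toy = pd.DataFrame({
    'LotFrontage': [60.0, np.nan, 80.0],
    'MasVnrArea':  [0.0, 100.0, np.nan],
    'LotArea':     [7200, 9600, 6000],
})

# Per-column medians, computed only for the columns that contain NaN
cols_with_nan = [c for c in toy.columns if toy[c].isnull().any()]
medians = toy[cols_with_nan].median(skipna=True)

# fillna aligns the Series index with the column names, so each column
# receives its own replacement value; columns not in the Series are untouched
filled = toy.fillna(medians)
print(filled)
```

Passing a scalar such as `0` instead of a Series fills every missing cell with that same value, which is what the scalar imputation line does.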


## Mixed Dropping and Imputation

The feature descriptions suggest, for each feature, whether dropping it or imputing it makes more sense.

LotFrontage - Linear feet of street connected to property  
- Likely missing when no street is connected to the property, as with an apartment or condo.  
- If so, filling NaN with 0 makes sense.

MasVnrArea  - Masonry veneer area in square feet  
- Likely missing when there is no masonry veneer.  
- If so, filling NaN with 0 makes sense.

GarageYrBlt - Year garage was built  
- Likely missing when there is no garage.  
- If so, no imputed year is meaningful, and simply removing the feature may give better predictions.
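
These "likely missing because..." readings can be checked against a related column before acting on them. The sketch below is one way to do that, assuming `GarageCars == 0` marks a house without a garage. The helper `share_missing_explained` is hypothetical (it is not defined elsewhere in this notebook), and the toy values are made up.

```python
import numpy as np
import pandas as pd

def share_missing_explained(df, missing_col, indicator_col, absent_value=0):
    """Fraction of rows with NaN in missing_col whose indicator_col equals absent_value."""
    missing = df[missing_col].isnull()
    if not missing.any():
        return float('nan')  # nothing is missing, so there is nothing to explain
    return float((df.loc[missing, indicator_col] == absent_value).mean())

# Toy data: the two rows without a garage year also have zero garage capacity
toy = pd.DataFrame({
    'GarageYrBlt': [2003.0, np.nan, 1976.0, np.nan],
    'GarageCars':  [2, 0, 1, 0],
})

share = share_missing_explained(toy, 'GarageYrBlt', 'GarageCars')
print('Share of missing GarageYrBlt with GarageCars == 0: %.2f' % share)
```

Run against `X_train`, a value close to 1.0 would support the "no garage" explanation. LotFrontage has no equally obvious companion column, so that assumption is harder to verify this way.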

In [18]:
# Drop year garage was built
reduced_X_train    = X_train.drop( ['GarageYrBlt'], axis=1, inplace=False )
reduced_X_val      = X_val.drop( ['GarageYrBlt'], axis=1, inplace=False )

# Perform scalar imputation with 0's
print( 'Mixed dropping and scalar (0) imputation MAE: %.2f' % 
       score_imputation( 0, X_t=reduced_X_train, X_v=reduced_X_val ) )

Mixed dropping and scalar (0) imputation MAE: 18133.38


## Scikit Learn Built In Imputer

A way to avoid reinventing the wheel is to use scikit-learn's built-in SimpleImputer class.

I think I prefer the way I performed imputation above. It involves less code and simpler operations, and it works natively with pandas DataFrames.

In [19]:
from sklearn.impute import SimpleImputer

In [20]:
median_imputer  = SimpleImputer( strategy='median' )

# Fit on the training set only, then apply the same medians to the validation set.
# SimpleImputer returns NumPy arrays, so restore the column names and index.
imputed_X_train = pd.DataFrame( median_imputer.fit_transform( X_train ),
                                columns=X_train.columns, index=X_train.index )
imputed_X_val   = pd.DataFrame( median_imputer.transform( X_val ),
                                columns=X_val.columns, index=X_val.index )

print( 'Median imputation with Scikit Learn MAE: %.2f' % 
       score_system( X_t=imputed_X_train, X_v=imputed_X_val) )


Median imputation with Scikit Learn MAE: 17791.60
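
The scikit-learn run reports the same MAE (17791.60) as the pandas median imputation earlier, which is expected if both routes fill the same values. As a small sanity check on made-up data (not the housing set), the sketch below compares `fillna` with per-column medians against `SimpleImputer(strategy='median')`. The names `toy`, `via_pandas`, and `via_sklearn` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy numeric frame with one missing value per column
toy = pd.DataFrame({
    'LotFrontage': [60.0, np.nan, 80.0, 70.0],
    'MasVnrArea':  [0.0, 100.0, np.nan, 50.0],
})

# pandas route: fill each column with its own median
via_pandas = toy.fillna(toy.median(skipna=True))

# scikit-learn route: learn the medians, then transform; the result is a
# NumPy array, so column names and index are restored by hand
imputer = SimpleImputer(strategy='median')
via_sklearn = pd.DataFrame(imputer.fit_transform(toy),
                           columns=toy.columns, index=toy.index)

same = np.allclose(via_pandas.to_numpy(), via_sklearn.to_numpy())
print('Both routes agree:', same)
```

One practical difference is that a fitted imputer stores the training medians, so it can be reused on new data without recomputing them.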


# Manage Categorical Data