![](https://miro.medium.com/max/1050/1*PW7YItccFLn4GXN-H3nVBw.png)
# Project 2 - Ames Housing Data and Kaggle Challenge

## Problem Statement

Homeowners, real estate companies and real estate investors alike often spend large amounts of money remodeling a home in the hope of earning a profit in return. According to the 2019 Remodeling Impact Report by the National Association of Realtors® Research Group, U.S. homeowners spend more than $400 billion each year on renovations and repairs. Most of them are unable to achieve their ideal return on investment: they either overspend on renovations or focus their remodeling and renovation work on the wrong features of the house, making losses rather than profits.

Hence, we will try to address this problem by building a model that predicts housing prices based on the distinct features of a home. We will analyze housing data from Ames, Iowa to:
* predict the sale price of a house from its features
* find the features that affect house prices the most


## Executive Summary

We will first import two data sets, train and test, from the Ames Housing Dataset. We will build a model on the train dataset with sale price as our target, then use the test dataset (unseen data) to see how accurate our model is.

After importing both data sets, we will clean the data to ensure that it can be used accurately by our model. The main thing to look out for is completeness (i.e. whether there is any missing or null data and how to treat it). We will also check that the data is stored in the correct data types and that the values in each column make sense.

Next, we will perform Exploratory Data Analysis (EDA) to identify any correlations and trends between house features and sale price. We do this by plotting correlation heat maps, box plots, scatter plots and histograms. This will give us a first indication of the features that we want to focus on.
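
As a rough illustration of the correlation step, the sketch below ranks numeric features by their correlation with the target. The small DataFrame is invented toy data, not the Ames data; only the column names are borrowed from the dataset.

```python
import pandas as pd

# Toy data for illustration only (values are invented, not from the Ames data)
toy = pd.DataFrame({
    "Overall Qual": [5, 6, 7, 8, 9],
    "Lot Area": [9000, 7000, 11000, 8000, 10000],
    "MS Zoning": ["RL", "RL", "RM", "RL", "RM"],
    "SalePrice": [120000, 150000, 200000, 240000, 310000],
})

# Keep numeric columns only, then correlate every feature with SalePrice
numeric = toy.select_dtypes(include="number")
corr_with_price = (
    numeric.corr()["SalePrice"]
    .drop("SalePrice")
    .sort_values(ascending=False)
)
print(corr_with_price)
```

Non-numeric columns such as MS Zoning are left out here; they would need to be encoded before they can appear in a correlation matrix.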

For the preprocessing stage, we will convert categorical data into numerical data to allow us to examine the data on the same scale. For nominal data, we will use dummy variable encoding. Ordinal variables will be mapped to a numerical rating that preserves their order: ideally a low number for a lower grade/quality and a larger number for a higher grade/quality.
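
A minimal sketch of the two encodings, on toy data. The rank values in `quality_rank` and the choice of `drop_first=True` are assumptions made for illustration, not settings taken from this project.

```python
import pandas as pd

# Toy data for illustration only; column names follow the Ames dataset
toy = pd.DataFrame({
    "Neighborhood": ["NAmes", "Sawyer", "NAmes"],   # nominal
    "Bsmt Qual": ["Gd", "NONE", "Ex"],              # ordinal
})

# Ordinal: map each grade to a rank (assumed scale; 0 = feature absent)
quality_rank = {"NONE": 0, "Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}
toy["Bsmt Qual"] = toy["Bsmt Qual"].map(quality_rank)

# Nominal: one dummy column per category, dropping the first as the baseline
encoded = pd.get_dummies(toy, columns=["Neighborhood"], drop_first=True)
print(encoded)
```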

Before we run our model, we will remove the outliers identified, to prevent them from distorting the results. Next, we have to ensure that the train and test datasets have the same number of columns.
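
One possible way to line up the columns after encoding is to reindex the test features to the train columns. This is only a sketch: the frames and the `Zone_*` column names below are hypothetical.

```python
import pandas as pd

# Toy encoded frames for illustration only (hypothetical column names)
train_enc = pd.DataFrame({"Lot Area": [9000, 7000], "Zone_RL": [1, 0], "Zone_RM": [0, 1]})
test_enc = pd.DataFrame({"Lot Area": [8000], "Zone_RL": [1], "Zone_FV": [0]})

# Reindex test to the train columns: dummies missing from test are added as 0,
# and columns never seen in train are dropped
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(test_aligned.columns.tolist())
```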

We will do a train/test split on our dataset in order to determine how well our model works for seen and unseen data.
Because the features are on different scales, we will fit a standard scaler to standardise all of them to the same scale. We will then use linear regression as our baseline model.

After that, we will cross-validate Lasso, Ridge and ElasticNet models and compare their scores to identify the model that performs best on the Ames Housing data. We will further tune each model by searching for the hyperparameters that give the best results. We will analyze the bias and variance of each model and, finally, recommend the best features to focus on to improve the sale price of a house.
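
The comparison described above could look roughly like the sketch below. It assumes scikit-learn is available and uses synthetic data from `make_regression` in place of the prepared Ames features; the `alpha` and `l1_ratio` values are untuned placeholders.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data stands in for the prepared Ames features
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)

models = {
    "linear": LinearRegression(),                        # baseline
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
    "elasticnet": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

# Scaling inside the pipeline means the scaler is refit on each training fold
cv_r2 = {}
for name, model in models.items():
    pipeline = make_pipeline(StandardScaler(), model)
    cv_r2[name] = cross_val_score(pipeline, X, y, cv=5, scoring="r2").mean()

for name, score in cv_r2.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

Putting the scaler inside the pipeline keeps the validation folds out of the scaling statistics, which makes the cross-validated scores a fairer estimate of performance on unseen data.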

### Contents:
* [Data Import & Cleaning - Train Dataset](#Importing-and-Inspecting-Train-Dataset)  
  * [Inspect columns with large number of null values](#Inspect-columns-with-large-number-of-null-values)
* [Fixing Null Data](#Fixing-Null-Data)
  * [Pool QC, Misc Feature, Alley, Fence, Fireplace Qu](#Pool-QC,-Misc-Feature,-Alley,-Fence,-Fireplace-Qu)
  * [Lot Frontage](#Lot-Frontage)
  * [Garage (Qual, Finish, Cond, Yr Blt, Type)](#Garage-(Qual,-Finish,-Cond,-Yr-Blt,-Type))
  * [Bsmt (Exposure, Fin Type 2, Cond, Qual, Fin Type 1)](#Bsmt-(Exposure,-Fin-Type-2,-Cond,-Qual,-Fin-Type-1))
  * [Mas Vnr Type, Mas Vnr Area](#Mas-Vnr-Type,-Mas-Vnr-Area)
  * [Rest of the basement features (Bsmt Half Bath, Bsmt Full Bath, Total Bsmt SF, Bsmt Unf SF, BsmtFin SF 2, BsmtFin SF 1)](#Rest-of-the-basment-features-(Bsmt-Half-Bath,-Bsmt-Full-Bath,-Total-Bsmt-SF,-Bsmt-Unf-SF,-BsmtFin-SF-2,-BsmtFin-SF-2,-BsmtFin-SF-1))
  * [Garage Cars and Garage Area](#Garage-Cars-and-Garage-Area)
  * [Review Data](#Review-Data)
* [Export Dataset](#Export-Dataset)
* [Summary](#Summary)

## Importing and Inspecting Train Dataset


In [1]:
#import libraries
import pandas as pd
import numpy as np

In [2]:
#read training dataset
filename = "../datasets/train.csv"
train_unclean = pd.read_csv(filename, index_col="Id")

#inspect first few rows of the dataset
train_unclean.head()

Id,PID,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,...,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
109,533352170,60,RL,,13517,Pave,,IR1,Lvl,AllPub,...,0,0,,,,0,3,2010,WD,130500
544,531379050,60,RL,43.0,11492,Pave,,IR1,Lvl,AllPub,...,0,0,,,,0,4,2009,WD,220000
153,535304180,20,RL,68.0,7922,Pave,,Reg,Lvl,AllPub,...,0,0,,,,0,1,2010,WD,109000
318,916386060,60,RL,73.0,9802,Pave,,Reg,Lvl,AllPub,...,0,0,,,,0,4,2010,WD,174000
255,906425045,50,RL,82.0,14235,Pave,,IR1,Lvl,AllPub,...,0,0,,,,0,3,2010,WD,138500


In [3]:
train_unclean.shape

(2051, 80)

In [4]:
#review datatypes and shape of the data
train_unclean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2051 entries, 109 to 10
Data columns (total 80 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   PID              2051 non-null   int64  
 1   MS SubClass      2051 non-null   int64  
 2   MS Zoning        2051 non-null   object 
 3   Lot Frontage     1721 non-null   float64
 4   Lot Area         2051 non-null   int64  
 5   Street           2051 non-null   object 
 6   Alley            140 non-null    object 
 7   Lot Shape        2051 non-null   object 
 8   Land Contour     2051 non-null   object 
 9   Utilities        2051 non-null   object 
 10  Lot Config       2051 non-null   object 
 11  Land Slope       2051 non-null   object 
 12  Neighborhood     2051 non-null   object 
 13  Condition 1      2051 non-null   object 
 14  Condition 2      2051 non-null   object 
 15  Bldg Type        2051 non-null   object 
 16  House Style      2051 non-null   object 
 17  Overall Qual  

In [5]:
#review summary statistics of the data to see if there are any abnormal values
train_unclean.describe()

Unnamed: 0,PID,MS SubClass,Lot Frontage,Lot Area,Overall Qual,Overall Cond,Year Built,Year Remod/Add,Mas Vnr Area,BsmtFin SF 1,...,Wood Deck SF,Open Porch SF,Enclosed Porch,3Ssn Porch,Screen Porch,Pool Area,Misc Val,Mo Sold,Yr Sold,SalePrice
count,2051.0,2051.0,1721.0,2051.0,2051.0,2051.0,2051.0,2051.0,2029.0,2050.0,...,2051.0,2051.0,2051.0,2051.0,2051.0,2051.0,2051.0,2051.0,2051.0,2051.0
mean,713590000.0,57.008776,69.0552,10065.208191,6.11214,5.562165,1971.708922,1984.190151,99.695909,442.300488,...,93.83374,47.556802,22.571916,2.591419,16.511458,2.397855,51.574354,6.219893,2007.775719,181469.701609
std,188691800.0,42.824223,23.260653,6742.488909,1.426271,1.104497,30.177889,21.03625,174.963129,461.204124,...,128.549416,66.747241,59.84511,25.229615,57.374204,37.78257,573.393985,2.744736,1.312014,79258.659352
min,526301100.0,20.0,21.0,1300.0,1.0,1.0,1872.0,1950.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2006.0,12789.0
25%,528458100.0,20.0,58.0,7500.0,5.0,5.0,1953.5,1964.5,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,2007.0,129825.0
50%,535453200.0,50.0,68.0,9430.0,6.0,5.0,1974.0,1993.0,0.0,368.0,...,0.0,27.0,0.0,0.0,0.0,0.0,0.0,6.0,2008.0,162500.0
75%,907180100.0,70.0,80.0,11513.5,7.0,6.0,2001.0,2004.0,161.0,733.75,...,168.0,70.0,0.0,0.0,0.0,0.0,0.0,8.0,2009.0,214000.0
max,924152000.0,190.0,313.0,159000.0,10.0,9.0,2010.0,2010.0,1600.0,5644.0,...,1424.0,547.0,432.0,508.0,490.0,800.0,17000.0,12.0,2010.0,611657.0


##### Inspect columns with large number of null values

In [6]:
#columns with total number of null values
null_values = train_unclean.isnull().sum().sort_values(ascending = False)

#look at only those columns with null values 
null_values[null_values>0]

Pool QC           2042
Misc Feature      1986
Alley             1911
Fence             1651
Fireplace Qu      1000
Lot Frontage       330
Garage Qual        114
Garage Finish      114
Garage Cond        114
Garage Yr Blt      114
Garage Type        113
Bsmt Exposure       58
BsmtFin Type 2      56
Bsmt Cond           55
Bsmt Qual           55
BsmtFin Type 1      55
Mas Vnr Type        22
Mas Vnr Area        22
Bsmt Half Bath       2
Bsmt Full Bath       2
Garage Cars          1
Garage Area          1
Total Bsmt SF        1
Bsmt Unf SF          1
BsmtFin SF 2         1
BsmtFin SF 1         1
dtype: int64

* At first glance, the data has 80 columns and 2051 rows.
* There is a significant number of null values, which will be our main focus when cleaning the data.  
* Using the [data description](http://jse.amstat.org/v19n3/decock/DataDocumentation.txt) provided, we see that the variables are categorised as **Categorical: 23 nominal, 23 ordinal** and **Numerical: 14 discrete, and 20 continuous variables**.  
* SalePrice is our target variable that we want to predict.  
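
To judge how significant the nulls are relative to the number of rows, the counts can also be expressed as a share of each column. A small sketch on toy data (the values are invented):

```python
import numpy as np
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Pool QC": [np.nan, np.nan, np.nan, "Gd"],
    "Lot Frontage": [60.0, np.nan, 80.0, 70.0],
    "Lot Area": [9000, 7000, 11000, 8000],
})

# isnull().mean() gives the fraction of missing rows per column
null_share = toy.isnull().mean().sort_values(ascending=False)
null_share = null_share[null_share > 0]
print(null_share)
```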


For our data cleaning in this notebook we will focus on fixing the null values in the **Train** dataset.

## Fixing Null Data
The main strategy that we use to fix the null data is logical deduction, referring to the [data description](http://jse.amstat.org/v19n3/decock/DataDocumentation.txt) provided. We will analyse each column with null values and fill it with either 'NONE' or 0 if the data description indicates 'NA' for no feature present. We will impute either the mode or the mean if the feature does exist but its data is missing.


| Column         | Data Type | Number of Null Values | Column Description                                    | NA values in column                            | Action                                                            |
|:----------------|:-----------|:-----------------------|:-------------------------------------------------------|:------------------------------------------------|:-------------------------------------------------------------------|
| Pool QC        | object    | 2042                  | Pool quality                                          | NA is defined as no pool                       | Convert null to 'NONE'                                            |
| Misc Feature   | object    | 1986                  | Miscellaneous feature not covered in other categories | NA is defined as no misc features              | Convert null to 'NONE'                                            |
| Alley          | object    | 1911                  | Type of alley access to property                      | NA is defined as no alley access               | Convert null to 'NONE'                                            |
| Fence          | object    | 1651                  | Fence quality                                         | NA is defined as no fence                      | Convert null to 'NONE'                                            |
| Fireplace Qu   | object    | 1000                  | Fireplace quality                                     | NA is defined as no fireplace                  | Convert null to 'NONE'                                            |
| Lot Frontage   | float     | 330                   | Linear feet of street connected to property           | No other values in column are 0                | Convert null to mean                                              |
| Garage Qual    | object    | 114                   | Garage quality                                        | NA is defined as no garage                     | Convert null to 'NONE'                                            |
| Garage Finish  | object    | 114                   | Interior finish of the garage                         | NA is defined as no garage                     | Convert null to 'NONE'                                            |
| Garage Cond    | object    | 114                   | Garage condition                                      | NA is defined as no garage                     | Convert null to 'NONE'                                            |
| Garage Yr Blt  | float     | 114                   | Year garage was built                                 | Data type is float; 0 indicates no garage year | Convert null to 0                                                 |
| Garage Type    | object    | 113                   | Garage location                                       | NA is defined as no garage                     | Convert null to 'NONE'                                            |
| Bsmt Exposure  | object    | 58                    | Refers to walkout or garden level walls               | NA is defined as no basement                   | Convert null to 'NONE'                                            |
| BsmtFin Type 2 | object    | 56                    | Rating of basement finished area (if multiple types)  | NA is defined as no basement                   | Convert null to 'NONE'                                            |
| Bsmt Cond      | object    | 55                    | Evaluates the general condition of the basement       | NA is defined as no basement                   | Convert null to 'NONE'                                            |
| Bsmt Qual      | object    | 55                    | Evaluates the height of the basement                  | NA is defined as no basement                   | Convert null to 'NONE'                                            |
| BsmtFin Type 1 | object    | 55                    | Rating of basement finished area                      | NA is defined as no basement                   | Convert null to 'NONE'                                            |
| Mas Vnr Type   | object    | 22                    | Masonry veneer type                                   | None is defined as no masonry                  | Convert null to the mode of the column                            |
| Mas Vnr Area   | float     | 22                    | Masonry veneer area in square feet                    | 0 has already been entered for no veneer area  | Convert null to 0, matching the mode of Mas Vnr Type              |
| Bsmt Half Bath | float     | 2                     | Number of basement half bathrooms                     | 0 for no basement half bathrooms               | To review again as there might be relationship with other columns |
| Bsmt Full Bath | float     | 2                     | Number of basement full bathrooms                     | 0 for no basement full bathrooms               | To review again as there might be relationship with other columns |
| Garage Cars    | float     | 1                     | Size of garage in car capacity                        | 0 for no car capacity                          | To review again as there might be relationship with other columns |
| Garage Area    | float     | 1                     | Size of garage in square feet                         | 0 for no garage area                           | To review again as there might be relationship with other columns |
| Total Bsmt SF  | float     | 1                     | Total square feet of basement area                    | 0 for no basement                              | To review again as there might be relationship with other columns |
| Bsmt Unf SF    | float     | 1                     | Unfinished square feet of basement area               | 0 for no unfinished square feet of basement    | To review again as there might be relationship with other columns |
| BsmtFin SF 2   | float     | 1                     | Type 2 finished square feet                           | 0 for no finished square feet for type 2       | To review again as there might be relationship with other columns |
| BsmtFin SF 1   | float     | 1                     | Type 1 finished square feet                           | 0 for no finished square feet for type 1       | To review again as there might be relationship with other columns |
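
The three kinds of action in the table can be sketched in a single call, since `DataFrame.fillna` accepts a dictionary of per-column fill values. The toy frame and the `fill_rules` mapping below are hypothetical and only illustrate the idea; the notebook itself fills each group of columns step by step.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; column names follow the Ames dataset
toy = pd.DataFrame({
    "Fence": ["MnPrv", np.nan, np.nan],          # null means no fence
    "Garage Area": [400.0, np.nan, 250.0],       # null means no garage
    "Lot Frontage": [60.0, np.nan, 80.0],        # value exists but is missing
})

# Hypothetical mapping of each column to the fill rule that applies to it
fill_rules = {
    "Fence": "NONE",                             # absent categorical feature
    "Garage Area": 0,                            # absent numeric feature
    "Lot Frontage": toy["Lot Frontage"].mean(),  # impute the column mean
}

# fillna accepts a dict mapping column name -> fill value
filled = toy.fillna(value=fill_rules)
print(filled)
```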

##### Pool QC, Misc Feature, Alley, Fence, Fireplace Qu

In [7]:
#'NA' in the data description refers to no pool, misc feature, alley, fence and fireplace respectively for these columns
#hence we fill all nulls of these columns with a 'NONE' category

none_cols = ["Pool QC", "Misc Feature", "Alley", "Fence", "Fireplace Qu"]
train_unclean[none_cols] = train_unclean[none_cols].fillna('NONE')

##### Lot Frontage

In [8]:
#look at the minimum for Lot Frontage
train_unclean["Lot Frontage"].describe()

count    1721.000000
mean       69.055200
std        23.260653
min        21.000000
25%        58.000000
50%        68.000000
75%        80.000000
max       313.000000
Name: Lot Frontage, dtype: float64

In [9]:
#as the minimum for lot frontage is 21, we can assume that there won't be any house with 0 lot frontage
#hence we can fill the null values with the mean of the column
train_unclean["Lot Frontage"] = train_unclean["Lot Frontage"].fillna(train_unclean["Lot Frontage"].mean())

##### Garage (Qual, Finish, Cond, Yr Blt, Type)

In [10]:
#checking the 'garage' columns, 'NA' refers to no garage for the object type columns: Qual, Finish, Cond, Type
#hence we fill all nulls of these columns with a 'NONE' category
garage_cols = ["Garage Qual", "Garage Finish", "Garage Cond", "Garage Type"]
train_unclean[garage_cols] = train_unclean[garage_cols].fillna('NONE')

In [11]:
#looking at the summary statistic for Garage Yr Blt
train_unclean["Garage Yr Blt"].describe()

count    1937.000000
mean     1978.707796
std        25.441094
min      1895.000000
25%      1961.000000
50%      1980.000000
75%      2002.000000
max      2207.000000
Name: Garage Yr Blt, dtype: float64

In [12]:
#the year in which the garage was built is indicated
#hence we fill null values with the number 0 (not the string '0') to indicate no garage, which keeps the column as float
#the largest year is 2207 which could be wrong, we will have to investigate this later
train_unclean["Garage Yr Blt"] = train_unclean["Garage Yr Blt"].fillna(0)
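
One way the suspicious year could be investigated later is to compare it with the year of sale. The sketch below runs on toy data and assumes that a garage year later than `Yr Sold` points to a data entry error.

```python
import pandas as pd

# Toy data for illustration only; 2207 mimics the suspicious maximum above
toy = pd.DataFrame(
    {"Garage Yr Blt": [1995.0, 2207.0, 0.0], "Yr Sold": [2008, 2007, 2010]},
    index=pd.Index([1, 2, 3], name="Id"),
)

# Flag rows where the garage year is later than the year the house was sold
suspicious = toy[toy["Garage Yr Blt"] > toy["Yr Sold"]]
print(suspicious)
```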

##### Bsmt (Exposure, Fin Type 2, Cond, Qual, Fin Type 1)

In [13]:
#checking the 'bsmt' columns, 'NA' refers to no basement for the object type columns: Exposure, Fin Type 2, Cond, Qual, Fin Type 1
#hence we fill all nulls of these columns with a 'NONE' category
bsmt_cols = ["Bsmt Exposure", "BsmtFin Type 2", "Bsmt Cond", "Bsmt Qual", "BsmtFin Type 1"]
train_unclean[bsmt_cols] = train_unclean[bsmt_cols].fillna('NONE')

##### Mas Vnr Type, Mas Vnr Area

In [14]:
#look at the Mas Vnr Type columns
display(train_unclean["Mas Vnr Type"].describe())
display(train_unclean["Mas Vnr Type"].value_counts())

count     2029
unique       4
top       None
freq      1218
Name: Mas Vnr Type, dtype: object

None       1218
BrkFace     630
Stone       168
BrkCmn       13
Name: Mas Vnr Type, dtype: int64

In [15]:
#highest frequency of the column is "None", which refers to no masonry veneer 
#looking at rows that are null for type and area
train_unclean[train_unclean["Mas Vnr Type"].isnull()][["Mas Vnr Type", "Mas Vnr Area"]]

Id,Mas Vnr Type,Mas Vnr Area
2393,,
2383,,
539,,
518,,
2824,,
1800,,
1455,,
1120,,
1841,,
1840,,


In [16]:
#we can see that Mas Vnr Type is linked to Mas Vnr Area
#we will fill these null values with the mode "None" for Mas Vnr Type, and hence the number 0 for Mas Vnr Area
train_unclean["Mas Vnr Type"] = train_unclean["Mas Vnr Type"].fillna('None')
train_unclean["Mas Vnr Area"] = train_unclean["Mas Vnr Area"].fillna(0)

##### Rest of the basment features (Bsmt Half Bath, Bsmt Full Bath, Total Bsmt SF, Bsmt Unf SF, BsmtFin SF 2, BsmtFin SF 2, BsmtFin SF 1)

In [17]:
#looking at null values for these columns to see if they have a relationship
train_unclean[train_unclean["Bsmt Half Bath"].isnull()][["Bsmt Half Bath", "Bsmt Full Bath","Total Bsmt SF", "Bsmt Unf SF", "BsmtFin SF 2", "BsmtFin SF 1"]]

Id,Bsmt Half Bath,Bsmt Full Bath,Total Bsmt SF,Bsmt Unf SF,BsmtFin SF 2,BsmtFin SF 1
1498,,,0.0,0.0,0.0,0.0
1342,,,,,,


In [18]:
#we can see that all the null values of these columns do have a relationship with each other
#we will check to see if these 2 rows are indeed without basements
train_unclean[train_unclean["Bsmt Half Bath"].isnull()][["Bsmt Half Bath", "Bsmt Full Bath","Bsmt Cond"]]

Id,Bsmt Half Bath,Bsmt Full Bath,Bsmt Cond
1498,,,NONE
1342,,,NONE


In [19]:
#since 'NONE' (originally "NA") for the Bsmt Cond column means no basement, we can logically deduce that there are no basements for these 2 rows
#we will fill the number 0 (not the string '0') for these 2 rows so the columns stay numeric
bsmt_num_cols = ["Bsmt Half Bath", "Bsmt Full Bath", "Total Bsmt SF", "Bsmt Unf SF", "BsmtFin SF 2", "BsmtFin SF 1"]
train_unclean[bsmt_num_cols] = train_unclean[bsmt_num_cols].fillna(0)
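
If we wanted the fill to depend explicitly on the basement flag rather than on the nulls alone, a sketch along the following lines could be used. The data is a toy example and `no_basement` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; column names follow the Ames dataset
toy = pd.DataFrame({
    "Bsmt Cond": ["NONE", "TA", "NONE"],
    "Bsmt Full Bath": [np.nan, 1.0, np.nan],
    "Total Bsmt SF": [0.0, 850.0, np.nan],
})

bsmt_numeric = ["Bsmt Full Bath", "Total Bsmt SF"]

# Only touch rows that are flagged as having no basement
no_basement = toy["Bsmt Cond"].eq("NONE")
toy.loc[no_basement, bsmt_numeric] = toy.loc[no_basement, bsmt_numeric].fillna(0)
print(toy)
```

Rows that do have a basement are left untouched, so any null there would remain visible for a separate decision.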

##### Garage Cars and Garage Area

In [20]:
#similar to the approach above, we review the null values for garage cars and area to see the relationship
train_unclean[train_unclean["Garage Cars"].isnull()][["Garage Cars", "Garage Area", "Garage Cond"]]

Id,Garage Cars,Garage Area,Garage Cond
2237,,,NONE


In [21]:
#since 'NONE' (originally "NA") for garage cond means no garage, we can logically deduce that there is no garage for this house
#we will fill the number 0 (not the string '0') for nulls in this house for garage cars and area columns
train_unclean["Garage Cars"] = train_unclean["Garage Cars"].fillna(0)
train_unclean["Garage Area"] = train_unclean["Garage Area"].fillna(0)

##### Review Data

In [22]:
train_unclean.head()

Id,PID,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,...,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
109,533352170,60,RL,69.0552,13517,Pave,NONE,IR1,Lvl,AllPub,...,0,0,NONE,NONE,NONE,0,3,2010,WD,130500
544,531379050,60,RL,43.0,11492,Pave,NONE,IR1,Lvl,AllPub,...,0,0,NONE,NONE,NONE,0,4,2009,WD,220000
153,535304180,20,RL,68.0,7922,Pave,NONE,Reg,Lvl,AllPub,...,0,0,NONE,NONE,NONE,0,1,2010,WD,109000
318,916386060,60,RL,73.0,9802,Pave,NONE,Reg,Lvl,AllPub,...,0,0,NONE,NONE,NONE,0,4,2010,WD,174000
255,906425045,50,RL,82.0,14235,Pave,NONE,IR1,Lvl,AllPub,...,0,0,NONE,NONE,NONE,0,3,2010,WD,138500


In [23]:
train_unclean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2051 entries, 109 to 10
Data columns (total 80 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   PID              2051 non-null   int64  
 1   MS SubClass      2051 non-null   int64  
 2   MS Zoning        2051 non-null   object 
 3   Lot Frontage     2051 non-null   float64
 4   Lot Area         2051 non-null   int64  
 5   Street           2051 non-null   object 
 6   Alley            2051 non-null   object 
 7   Lot Shape        2051 non-null   object 
 8   Land Contour     2051 non-null   object 
 9   Utilities        2051 non-null   object 
 10  Lot Config       2051 non-null   object 
 11  Land Slope       2051 non-null   object 
 12  Neighborhood     2051 non-null   object 
 13  Condition 1      2051 non-null   object 
 14  Condition 2      2051 non-null   object 
 15  Bldg Type        2051 non-null   object 
 16  House Style      2051 non-null   object 
 17  Overall Qual  

## Export Dataset

In [24]:
# exporting cleaned data with no nulls
filepath = "../datasets/train_clean.csv"
train_unclean.to_csv(filepath)
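
When the exported file is read back, it is worth checking that the filled labels survive the round trip, because `pd.read_csv` treats some strings (for example 'NA') as missing by default. A small sketch on toy data, using an in-memory buffer instead of the real file:

```python
import io

import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({"Fence": ["NONE", "MnPrv"], "Alley": ["NA", "Pave"]})

# Write to an in-memory CSV instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default parsing: the string 'NA' is read back as a missing value
buffer.seek(0)
default_read = pd.read_csv(buffer)

# Treat only empty fields as missing, so 'NA' survives as text
buffer.seek(0)
strict_read = pd.read_csv(buffer, keep_default_na=False, na_values=[""])

print(default_read.isnull().sum())
print(strict_read.isnull().sum())
```

Which strings count as missing by default can differ between pandas versions, so a label such as 'None' in Mas Vnr Type is also worth checking after a reload.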

## Summary

We have cleaned all null values based on an analysis of each column, without losing any data.
In the next section, we will adopt the same cleaning approach for the **Test** dataset.