<h1 style='text-align: center'>🏡 EDA - House Prices - Advanced Regression Techniques 🏘</h1>

<p  style='text-align: center'>
This notebook is in an <span style='color: green; font-weight: 700'>Active</span> state of development! Check back for updates: I add new material as often as I learn it.
<a style='font-weight:700' href='https://github.com/LilDataScientist'> Code on GitHub! </a></p>

<div style='text-align: center'>
    <img src='https://i.postimg.cc/Y0FYKBqY/house.jpg' width='600' />
</div>

In [None]:
import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt

import h2o

from h2o.automl import H2OAutoML

sns.set_theme()
sns.set_palette('muted')

h2o.init()

In [None]:
df = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')

df

In [None]:
def categorial_feature_overview(feature, rotation=0):
    """Plot category counts (left) and SalePrice per category (right) for one column of df."""
    # Share of missing values in the column, in percent
    null_pct = df[feature].isnull().mean() * 100
    print(f'{feature} has {null_pct:.2f}% of null values')
    f, ax = plt.subplots(1, 2, figsize=(20, 6))
    ax[0].tick_params(labelrotation=rotation)
    ax[1].tick_params(labelrotation=rotation)
    sns.countplot(data=df, x=feature, ax=ax[0])
    sns.boxplot(data=df, x=feature, y='SalePrice', ax=ax[1])
    plt.show()


def numerical_feature_overview(feature, rotation=0):
    """Plot the column against SalePrice (left) and its distribution as a box plot (right)."""
    # Share of missing values in the column, in percent
    null_pct = df[feature].isnull().mean() * 100
    print(f'{feature} has {null_pct:.2f}% of null values')
    f, ax = plt.subplots(1, 2, figsize=(20, 6))
    ax[0].tick_params(labelrotation=rotation)
    ax[1].tick_params(labelrotation=rotation)
    sns.scatterplot(data=df, x=feature, y='SalePrice', ax=ax[0])
    sns.boxplot(data=df, x=feature, ax=ax[1])
    plt.show()
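
The helpers above report the missing-value share one column at a time. As a rough complement, the sketch below tabulates that share for every column in one pass. The function name `missing_share` and the toy frame are illustrative assumptions, not part of the original notebook; on the real data one would pass `df` instead.

```python
import pandas as pd


def missing_share(frame: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    share = frame.isnull().mean() * 100
    # Keep only columns that actually have missing values
    return share[share > 0].sort_values(ascending=False)


# Toy frame standing in for the training data (values are made up)
toy = pd.DataFrame({
    'Alley': [None, 'Grvl', None, None],
    'LotFrontage': [65.0, None, 80.0, 70.0],
    'LotArea': [8450, 9600, 11250, 9550],
})

summary = missing_share(toy)
print(summary)
```

Columns without any missing values are dropped from the result, so the table stays short even on wide data.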

# MSSubClass

Identifies the type of dwelling involved in the sale.

* 20 - 1-STORY 1946 & NEWER ALL STYLES
* 30 - 1-STORY 1945 & OLDER
* 40 - 1-STORY W/FINISHED ATTIC ALL AGES
* 45 - 1-1/2 STORY - UNFINISHED ALL AGES
* 50 - 1-1/2 STORY FINISHED ALL AGES
* 60 - 2-STORY 1946 & NEWER
* 70 - 2-STORY 1945 & OLDER
* 75 - 2-1/2 STORY ALL AGES
* 80 - SPLIT OR MULTI-LEVEL
* 85 - SPLIT FOYER
* 90 - DUPLEX - ALL STYLES AND AGES
* 120 - 1-STORY PUD (Planned Unit Development) - 1946 & NEWER
* 150 - 1-1/2 STORY PUD - ALL AGES
* 160 - 2-STORY PUD - 1946 & NEWER
* 180 - PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
* 190 - 2 FAMILY CONVERSION - ALL STYLES AND AGES

In [None]:
categorial_feature_overview('MSSubClass')
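
`MSSubClass` is stored as integers, but the codes are labels rather than quantities (160 is not "eight times" 20). One possible way to make that explicit before modelling is to cast the column to a categorical type. This is a sketch on a made-up frame, not a step the notebook currently performs.

```python
import pandas as pd

# Toy rows standing in for the training data (values are made up)
toy = pd.DataFrame({
    'MSSubClass': [20, 60, 20, 120],
    'SalePrice': [150000, 210000, 140000, 190000],
})

# Convert the numeric codes to strings first so they cannot be read as numbers,
# then to pandas' categorical dtype
toy['MSSubClass'] = toy['MSSubClass'].astype(str).astype('category')

print(toy['MSSubClass'].dtype)
print(toy.groupby('MSSubClass', observed=True)['SalePrice'].median())
```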

# MSZoning

Identifies the general zoning classification of the sale.

* A - Agriculture
* C - Commercial
* FV - Floating Village Residential
* I - Industrial
* RH - Residential High Density
* RL - Residential Low Density
* RP - Residential Low Density Park 
* RM - Residential Medium Density

In [None]:
categorial_feature_overview('MSZoning')

# LotFrontage

Linear feet of street connected to property

In [None]:
numerical_feature_overview('LotFrontage')

# LotArea

Lot size in square feet

In [None]:
numerical_feature_overview('LotArea')

# Street

Type of road access to property

* Grvl - Gravel
* Pave - Paved

In [None]:
categorial_feature_overview('Street')

# Alley

Type of alley access to property

* Grvl - Gravel
* Pave - Paved
* NA - No alley access

In [None]:
categorial_feature_overview('Alley')
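
Note that `NA` here is a real category ("no alley access"), yet `pd.read_csv` treats the string `NA` as a missing value by default, which is why the null percentage printed for such columns can be very high. The sketch below reproduces this on a tiny in-memory CSV and relabels the missing entries. The label `NoAlley` is a hypothetical choice, not a value from the dataset.

```python
import io

import pandas as pd

# Tiny in-memory CSV that mimics how the Alley column is encoded
csv_text = "Id,Alley\n1,NA\n2,Grvl\n3,Pave\n"

# With default settings the string "NA" is parsed as NaN
raw = pd.read_csv(io.StringIO(csv_text))
print(raw['Alley'].isnull().sum())

# Relabel the missing entries so the category survives plotting and modelling
filled = raw.fillna({'Alley': 'NoAlley'})
print(filled['Alley'].tolist())
```

The same idea would apply to the other columns in this notebook whose description lists `NA` as a category, such as the basement, garage, pool and fence fields.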

# LotShape

General shape of property

* Reg - Regular
* IR1 - Slightly irregular
* IR2 - Moderately Irregular
* IR3 - Irregular

In [None]:
categorial_feature_overview('LotShape')

# LandContour

Flatness of the property

* Lvl - Near Flat/Level
* Bnk - Banked - Quick and significant rise from street grade to building
* HLS - Hillside - Significant slope from side to side
* Low - Depression

In [None]:
categorial_feature_overview('LandContour')

# Utilities

Type of utilities available

* AllPub - All public Utilities (E,G,W,& S)
* NoSewr - Electricity, Gas, and Water (Septic Tank)
* NoSeWa - Electricity and Gas Only
* ELO - Electricity only

In [None]:
categorial_feature_overview('Utilities')

# LotConfig

Lot configuration

* Inside - Inside lot
* Corner - Corner lot
* CulDSac - Cul-de-sac
* FR2 - Frontage on 2 sides of property
* FR3 - Frontage on 3 sides of property

In [None]:
categorial_feature_overview('LotConfig')

# LandSlope

Slope of property

* Gtl - Gentle slope
* Mod - Moderate Slope
* Sev - Severe Slope

In [None]:
categorial_feature_overview('LandSlope')

# Neighborhood

Physical locations within Ames city limits

* Blmngtn - Bloomington Heights
* Blueste - Bluestem
* BrDale - Briardale
* BrkSide - Brookside
* ClearCr - Clear Creek
* CollgCr - College Creek
* Crawfor - Crawford
* Edwards - Edwards
* Gilbert - Gilbert
* IDOTRR - Iowa DOT and Rail Road
* MeadowV - Meadow Village
* Mitchel - Mitchell
* NAmes - North Ames
* NoRidge - Northridge
* NPkVill - Northpark Villa
* NridgHt - Northridge Heights
* NWAmes - Northwest Ames
* OldTown - Old Town
* SWISU - South & West of Iowa State University
* Sawyer - Sawyer
* SawyerW - Sawyer West
* Somerst - Somerset
* StoneBr - Stone Brook
* Timber - Timberland
* Veenker - Veenker

In [None]:
categorial_feature_overview('Neighborhood', rotation=45)

# Condition1

Proximity to various conditions

* Artery - Adjacent to arterial street
* Feedr - Adjacent to feeder street
* Norm - Normal
* RRNn - Within 200' of North-South Railroad
* RRAn - Adjacent to North-South Railroad
* PosN - Near positive off-site feature--park, greenbelt, etc.
* PosA - Adjacent to positive off-site feature
* RRNe - Within 200' of East-West Railroad
* RRAe - Adjacent to East-West Railroad

In [None]:
categorial_feature_overview('Condition1')

# Condition2

Proximity to various conditions (if more than one is present)

* Artery - Adjacent to arterial street
* Feedr - Adjacent to feeder street
* Norm - Normal
* RRNn - Within 200' of North-South Railroad
* RRAn - Adjacent to North-South Railroad
* PosN - Near positive off-site feature--park, greenbelt, etc.
* PosA - Adjacent to positive off-site feature
* RRNe - Within 200' of East-West Railroad
* RRAe - Adjacent to East-West Railroad

In [None]:
categorial_feature_overview('Condition2')

# RoofStyle

Type of roof

* Flat - Flat
* Gable - Gable
* Gambrel - Gambrel (Barn)
* Hip - Hip
* Mansard - Mansard
* Shed - Shed

In [None]:
categorial_feature_overview('RoofStyle')

# RoofMatl

Roof material

* ClyTile - Clay or Tile
* CompShg - Standard (Composite) Shingle
* Membran - Membrane
* Metal - Metal
* Roll - Roll
* Tar&Grv - Gravel & Tar
* WdShake - Wood Shakes
* WdShngl - Wood Shingles

In [None]:
categorial_feature_overview('RoofMatl')

# Exterior1st

Exterior covering on house

* AsbShng - Asbestos Shingles
* AsphShn - Asphalt Shingles
* BrkComm - Brick Common
* BrkFace - Brick Face
* CBlock - Cinder Block
* CemntBd - Cement Board
* HdBoard - Hard Board
* ImStucc - Imitation Stucco
* MetalSd - Metal Siding
* Other - Other
* Plywood - Plywood
* PreCast - PreCast
* Stone - Stone
* Stucco - Stucco
* VinylSd - Vinyl Siding
* Wd Sdng - Wood Siding
* WdShing - Wood Shingles

In [None]:
categorial_feature_overview('Exterior1st')

# Exterior2nd

Exterior covering on house (if more than one material)

* AsbShng - Asbestos Shingles
* AsphShn - Asphalt Shingles
* BrkComm - Brick Common
* BrkFace - Brick Face
* CBlock - Cinder Block
* CemntBd - Cement Board
* HdBoard - Hard Board
* ImStucc - Imitation Stucco
* MetalSd - Metal Siding
* Other - Other
* Plywood - Plywood
* PreCast - PreCast
* Stone - Stone
* Stucco - Stucco
* VinylSd - Vinyl Siding
* Wd Sdng - Wood Siding
* WdShing - Wood Shingles

In [None]:
categorial_feature_overview('Exterior2nd')

# MasVnrType

Masonry veneer type

* BrkCmn - Brick Common
* BrkFace - Brick Face
* CBlock - Cinder Block
* None - None
* Stone - Stone

In [None]:
categorial_feature_overview('MasVnrType')

# MasVnrArea

Masonry veneer area in square feet

In [None]:
numerical_feature_overview('MasVnrArea')

# ExterQual

Evaluates the quality of the material on the exterior 

* Ex - Excellent
* Gd - Good
* TA - Average/Typical
* Fa - Fair
* Po - Poor

In [None]:
categorial_feature_overview('ExterQual')
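
The quality codes form an ordered scale (Po, Fa, TA, Gd, Ex), but as plain strings they sort alphabetically. A minimal sketch of encoding that order with an ordered categorical follows; the column name `ExterQualRank` and the toy values are assumptions made for illustration.

```python
import pandas as pd

# Worst-to-best order taken from the list above
QUALITY_ORDER = ['Po', 'Fa', 'TA', 'Gd', 'Ex']

# Toy column standing in for the training data (values are made up)
toy = pd.DataFrame({'ExterQual': ['TA', 'Gd', 'Ex', 'Fa']})

# Ordered categorical: comparisons and sorting now follow QUALITY_ORDER
quality_dtype = pd.CategoricalDtype(categories=QUALITY_ORDER, ordered=True)
toy['ExterQual'] = toy['ExterQual'].astype(quality_dtype)

# Integer rank per row: 0 for Po up to 4 for Ex
toy['ExterQualRank'] = toy['ExterQual'].cat.codes

print(toy.sort_values('ExterQual'))
```

The same list could also be passed as the `order` argument of `sns.countplot` and `sns.boxplot` so that the plots run from worst to best instead of in order of appearance.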

# ExterCond

Evaluates the present condition of the material on the exterior

* Ex - Excellent
* Gd - Good
* TA - Average/Typical
* Fa - Fair
* Po - Poor

In [None]:
categorial_feature_overview('ExterCond')

# Foundation

Type of foundation

* BrkTil - Brick & Tile
* CBlock - Cinder Block
* PConc - Poured Concrete
* Slab - Slab
* Stone - Stone
* Wood - Wood

In [None]:
categorial_feature_overview('Foundation')

# BsmtQual

Evaluates the height of the basement

* Ex - Excellent (100+ inches)
* Gd - Good (90-99 inches)
* TA - Typical (80-89 inches)
* Fa - Fair (70-79 inches)
* Po - Poor (<70 inches)
* NA - No Basement

In [None]:
categorial_feature_overview('BsmtQual')

# BsmtCond

Evaluates the general condition of the basement

* Ex - Excellent
* Gd - Good
* TA - Typical - slight dampness allowed
* Fa - Fair - dampness or some cracking or settling
* Po - Poor - Severe cracking, settling, or wetness
* NA - No Basement

In [None]:
categorial_feature_overview('BsmtCond')

# BsmtExposure

Refers to walkout or garden level walls

* Gd - Good Exposure
* Av - Average Exposure (split levels or foyers typically score average or above)
* Mn - Minimum Exposure
* No - No Exposure
* NA - No Basement

In [None]:
categorial_feature_overview('BsmtExposure')

# BsmtFinType1

Rating of basement finished area

* GLQ - Good Living Quarters
* ALQ - Average Living Quarters
* BLQ - Below Average Living Quarters
* Rec - Average Rec Room
* LwQ - Low Quality
* Unf - Unfinished
* NA - No Basement

In [None]:
categorial_feature_overview('BsmtFinType1')

# BsmtFinType2

Rating of basement finished area (if multiple types)

* GLQ - Good Living Quarters
* ALQ - Average Living Quarters
* BLQ - Below Average Living Quarters
* Rec - Average Rec Room
* LwQ - Low Quality
* Unf - Unfinished
* NA - No Basement

In [None]:
categorial_feature_overview('BsmtFinType2')

# BsmtFinSF1

Type 1 finished square feet

In [None]:
numerical_feature_overview('BsmtFinSF1')

# BsmtFinSF2

Type 2 finished square feet

In [None]:
numerical_feature_overview('BsmtFinSF2')

# BsmtUnfSF

Unfinished square feet of basement area

In [None]:
numerical_feature_overview('BsmtUnfSF')

# TotalBsmtSF

Total square feet of basement area

In [None]:
numerical_feature_overview('TotalBsmtSF')
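
Read together, the four basement area descriptions suggest that the two finished areas plus the unfinished area should add up to the total. That is an assumption worth checking rather than a documented guarantee; the sketch below shows one way to test it, using made-up rows in place of `df`.

```python
import pandas as pd

# Toy rows standing in for the training data (values are made up)
toy = pd.DataFrame({
    'BsmtFinSF1': [706, 978, 0],
    'BsmtFinSF2': [0, 0, 0],
    'BsmtUnfSF': [150, 284, 0],
    'TotalBsmtSF': [856, 1262, 0],
})

# Sum the three component areas row by row
parts = toy[['BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF']].sum(axis=1)

# Rows where the components do not add up to the reported total
mismatch = toy[parts != toy['TotalBsmtSF']]
print(len(mismatch), 'rows disagree with the assumed identity')
```

If the identity holds on the real data, one of the four columns carries no extra information.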

# BedroomAbvGr

Bedrooms above grade (does not include basement bedrooms)

In [None]:
categorial_feature_overview('BedroomAbvGr')

# KitchenAbvGr

Kitchens above grade

In [None]:
categorial_feature_overview('KitchenAbvGr')

# KitchenQual

Kitchen quality

* Ex - Excellent
* Gd - Good
* TA - Typical/Average
* Fa - Fair
* Po - Poor

In [None]:
categorial_feature_overview('KitchenQual')

# TotRmsAbvGrd

Total rooms above grade (does not include bathrooms)

In [None]:
categorial_feature_overview('TotRmsAbvGrd')

# Functional

Home functionality (Assume typical unless deductions are warranted)

* Typ - Typical Functionality
* Min1 - Minor Deductions 1
* Min2 - Minor Deductions 2
* Mod - Moderate Deductions
* Maj1 - Major Deductions 1
* Maj2 - Major Deductions 2
* Sev - Severely Damaged
* Sal - Salvage only

In [None]:
categorial_feature_overview('Functional')

# Fireplaces

Number of fireplaces

In [None]:
categorial_feature_overview('Fireplaces')

# FireplaceQu

Fireplace quality

* Ex - Excellent - Exceptional Masonry Fireplace
* Gd - Good - Masonry Fireplace in main level
* TA - Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
* Fa - Fair - Prefabricated Fireplace in basement
* Po - Poor - Ben Franklin Stove
* NA - No Fireplace

In [None]:
categorial_feature_overview('FireplaceQu')

# GarageType

Garage location

* 2Types - More than one type of garage
* Attchd - Attached to home
* Basment - Basement Garage
* BuiltIn - Built-In (Garage part of house - typically has room above garage)
* CarPort - Car Port
* Detchd - Detached from home
* NA - No Garage

In [None]:
categorial_feature_overview('GarageType')

# GarageYrBlt

Year garage was built

In [None]:
numerical_feature_overview('GarageYrBlt')

# GarageFinish

Interior finish of the garage

* Fin - Finished
* RFn - Rough Finished
* Unf - Unfinished
* NA - No Garage

In [None]:
categorial_feature_overview('GarageFinish')

# GarageCars

Size of garage in car capacity

In [None]:
categorial_feature_overview('GarageCars')

# GarageArea

Size of garage in square feet

In [None]:
numerical_feature_overview('GarageArea')

# GarageQual

Garage quality

* Ex - Excellent
* Gd - Good
* TA - Typical/Average
* Fa - Fair
* Po - Poor
* NA - No Garage

In [None]:
categorial_feature_overview('GarageQual')

# GarageCond

Garage condition

* Ex - Excellent
* Gd - Good
* TA - Typical/Average
* Fa - Fair
* Po - Poor
* NA - No Garage

In [None]:
categorial_feature_overview('GarageCond')

# PavedDrive

Paved driveway

* Y - Paved 
* P - Partial Pavement
* N - Dirt/Gravel

In [None]:
categorial_feature_overview('PavedDrive')

# BldgType

Type of dwelling

* 1Fam - Single-family Detached
* 2FmCon - Two-family Conversion; originally built as one-family dwelling
* Duplx - Duplex
* TwnhsE - Townhouse End Unit
* TwnhsI - Townhouse Inside Unit

In [None]:
categorial_feature_overview('BldgType')

# HouseStyle

Style of dwelling

* 1Story - One story
* 1.5Fin - One and one-half story: 2nd level finished
* 1.5Unf - One and one-half story: 2nd level unfinished
* 2Story - Two story
* 2.5Fin - Two and one-half story: 2nd level finished
* 2.5Unf - Two and one-half story: 2nd level unfinished
* SFoyer - Split Foyer
* SLvl - Split Level

In [None]:
categorial_feature_overview('HouseStyle')

# OverallQual

Rates the overall material and finish of the house

* 10 - Very Excellent
* 9 - Excellent
* 8 - Very Good
* 7 - Good
* 6 - Above Average
* 5 - Average
* 4 - Below Average
* 3 - Fair
* 2 - Poor
* 1 - Very Poor

In [None]:
categorial_feature_overview('OverallQual')

# OverallCond

Rates the overall condition of the house

* 10 - Very Excellent
* 9 - Excellent
* 8 - Very Good
* 7 - Good
* 6 - Above Average
* 5 - Average
* 4 - Below Average
* 3 - Fair
* 2 - Poor
* 1 - Very Poor

In [None]:
categorial_feature_overview('OverallCond')

# YearBuilt

Original construction date

In [None]:
numerical_feature_overview('YearBuilt')

# YearRemodAdd

Remodel date (same as construction date if no remodeling or additions)

In [None]:
numerical_feature_overview('YearRemodAdd')
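
Because `YearRemodAdd` equals `YearBuilt` when no remodeling took place, comparing the two columns gives a simple "was remodeled" flag. The sketch below illustrates this on made-up rows; the column name `Remodeled` is hypothetical.

```python
import pandas as pd

# Toy rows standing in for the training data (values are made up)
toy = pd.DataFrame({
    'YearBuilt': [2003, 1976, 1915],
    'YearRemodAdd': [2003, 1976, 1970],
})

# True only where the remodel year differs from the construction year
toy['Remodeled'] = toy['YearRemodAdd'] != toy['YearBuilt']

print(toy)
```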

# Heating

Type of heating

* Floor - Floor Furnace
* GasA - Gas forced warm air furnace
* GasW - Gas hot water or steam heat
* Grav - Gravity furnace
* OthW - Hot water or steam heat other than gas
* Wall - Wall furnace


In [None]:
categorial_feature_overview('Heating')

# HeatingQC

Heating quality and condition

* Ex - Excellent
* Gd - Good
* TA - Average/Typical
* Fa - Fair
* Po - Poor

In [None]:
categorial_feature_overview('HeatingQC')

# CentralAir

Central air conditioning

* N - No
* Y - Yes

In [None]:
categorial_feature_overview('CentralAir')

# Electrical

Electrical system

* SBrkr - Standard Circuit Breakers & Romex
* FuseA - Fuse Box over 60 AMP and all Romex wiring (Average)
* FuseF - 60 AMP Fuse Box and mostly Romex wiring (Fair)
* FuseP - 60 AMP Fuse Box and mostly knob & tube wiring (poor)
* Mix - Mixed

In [None]:
categorial_feature_overview('Electrical')

# 1stFlrSF

First Floor square feet

In [None]:
numerical_feature_overview('1stFlrSF')

# 2ndFlrSF

Second floor square feet

In [None]:
numerical_feature_overview('2ndFlrSF')

# LowQualFinSF

Low quality finished square feet (all floors)

In [None]:
numerical_feature_overview('LowQualFinSF')

# GrLivArea

Above grade (ground) living area square feet

In [None]:
numerical_feature_overview('GrLivArea')

# BsmtFullBath

Basement full bathrooms

In [None]:
categorial_feature_overview('BsmtFullBath')

# BsmtHalfBath

Basement half bathrooms

In [None]:
categorial_feature_overview('BsmtHalfBath')

# FullBath

Full bathrooms above grade

In [None]:
categorial_feature_overview('FullBath')

# HalfBath

Half baths above grade

In [None]:
categorial_feature_overview('HalfBath')
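
The four bathroom counts can be folded into a single number if that is easier to compare against `SalePrice`. The sketch below counts a half bath as 0.5, which is a modelling assumption rather than anything the data description prescribes; the column name `TotalBath` is likewise hypothetical.

```python
import pandas as pd

# Toy rows standing in for the training data (values are made up)
toy = pd.DataFrame({
    'BsmtFullBath': [1, 0],
    'BsmtHalfBath': [0, 1],
    'FullBath': [2, 2],
    'HalfBath': [1, 0],
})

# Full baths count as 1, half baths as 0.5 (assumed weighting)
toy['TotalBath'] = (
    toy['BsmtFullBath'] + toy['FullBath']
    + 0.5 * (toy['BsmtHalfBath'] + toy['HalfBath'])
)

print(toy['TotalBath'].tolist())
```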

# WoodDeckSF

Wood deck area in square feet

In [None]:
numerical_feature_overview('WoodDeckSF')

# OpenPorchSF

Open porch area in square feet

In [None]:
numerical_feature_overview('OpenPorchSF')

# EnclosedPorch

Enclosed porch area in square feet

In [None]:
numerical_feature_overview('EnclosedPorch')

# 3SsnPorch

Three season porch area in square feet

In [None]:
numerical_feature_overview('3SsnPorch')

# ScreenPorch

Screen porch area in square feet

In [None]:
numerical_feature_overview('ScreenPorch')

# PoolArea

Pool area in square feet

In [None]:
numerical_feature_overview('PoolArea')

# PoolQC

Pool quality

* Ex - Excellent
* Gd - Good
* TA - Average/Typical
* Fa - Fair
* NA - No Pool

In [None]:
categorial_feature_overview('PoolQC')

# Fence

Fence quality

* GdPrv - Good Privacy
* MnPrv - Minimum Privacy
* GdWo - Good Wood
* MnWw - Minimum Wood/Wire
* NA - No Fence


In [None]:
categorial_feature_overview('Fence')

# MiscFeature

Miscellaneous feature not covered in other categories

* Elev - Elevator
* Gar2 - 2nd Garage (if not described in garage section)
* Othr - Other
* Shed - Shed (over 100 SF)
* TenC - Tennis Court
* NA - None

In [None]:
categorial_feature_overview('MiscFeature')

# MiscVal

Dollar value of miscellaneous feature

In [None]:
numerical_feature_overview('MiscVal')

# MoSold

Month Sold (MM)

In [None]:
categorial_feature_overview('MoSold')

# YrSold

Year Sold (YYYY)

In [None]:
categorial_feature_overview('YrSold')
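
`MoSold` and `YrSold` describe one point in time split over two columns. If a single time axis is useful, for example to plot prices chronologically, they can be combined as sketched below. The column name `SaleDate` is hypothetical, and the day is fixed to the 1st because the data only records month and year.

```python
import pandas as pd

# Toy rows standing in for the training data (values are made up)
toy = pd.DataFrame({
    'MoSold': [2, 5, 9],
    'YrSold': [2008, 2007, 2008],
})

# pd.to_datetime accepts a frame with year, month and day columns
toy['SaleDate'] = pd.to_datetime(pd.DataFrame({
    'year': toy['YrSold'],
    'month': toy['MoSold'],
    'day': 1,  # placeholder day, the data has no day of month
}))

print(toy.sort_values('SaleDate'))
```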

# SaleType

Type of sale

* WD - Warranty Deed - Conventional
* CWD - Warranty Deed - Cash
* VWD - Warranty Deed - VA Loan
* New - Home just constructed and sold
* COD - Court Officer Deed/Estate
* Con - Contract 15% Down payment regular terms
* ConLw - Contract Low Down payment and low interest
* ConLI - Contract Low Interest
* ConLD - Contract Low Down
* Oth - Other

In [None]:
categorial_feature_overview('SaleType')

# SaleCondition

Condition of sale

* Normal - Normal Sale
* Abnorml - Abnormal Sale - trade, foreclosure, short sale
* AdjLand - Adjoining Land Purchase
* Alloca - Allocation - two linked properties with separate deeds, typically condo with a garage unit
* Family - Sale between family members
* Partial - Home was not completed when last assessed (associated with New Homes)

In [None]:
categorial_feature_overview('SaleCondition')

In [None]:
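# Convert the pandas frame to an H2OFrame for training
hf = h2o.H2OFrame(df)

# Predictors: every column except the row identifier and the target
x = list(df.columns.difference(['Id', 'SalePrice']))
y = 'SalePrice'

# Train at most 100 models with a fixed seed
aml = H2OAutoML(max_models=100, seed=1)
aml.train(x=x, y=y, training_frame=hf)

# Show every row of the leaderboard
lb = aml.leaderboard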
lb.head(rows=lb.nrows)  
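
For context, the competition scores submissions by the RMSE between the logarithm of the predicted price and the logarithm of the observed price, which is not necessarily the metric the leaderboard above is sorted by. A minimal sketch of that score follows; the helper name `rmse_of_logs` is hypothetical and the prices are invented.

```python
import numpy as np


def rmse_of_logs(actual, predicted):
    """RMSE between log(predicted) and log(actual); inputs must be positive."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((np.log(predicted) - np.log(actual)) ** 2)))


# Perfect predictions give a score of zero
score_perfect = rmse_of_logs([100000, 200000], [100000, 200000])

# Doubling every price gives log(2), regardless of the price level
score_double = rmse_of_logs([100000, 200000], [200000, 400000])

print(score_perfect, score_double)
```

Because the error is taken on the log scale, being off by a fixed ratio costs the same for a cheap house as for an expensive one.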

In [None]:
df_test = pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')

hf_test = h2o.H2OFrame(df_test)

# Predict with the leader model; for regression the result has a 'predict' column
preds = aml.predict(hf_test)

# Write the predictions to disk so they can be read back with pandas
h2o.download_csv(preds, 'h20_output.csv')

df_SalePrice = pd.read_csv('./h20_output.csv')

output_df = pd.DataFrame({
    'Id': df_test['Id'],
    'SalePrice': df_SalePrice['predict']
})

output_df.to_csv('output.csv', index=False)
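
Before uploading, it can be worth checking that the file has the expected shape. The sketch below is one possible set of checks, not an official validator. The helper `check_submission` and the toy frames are assumptions for illustration; on the real data one would call it with `output_df` and `df_test['Id']`.

```python
import pandas as pd


def check_submission(submission, expected_ids):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    if list(submission.columns) != ['Id', 'SalePrice']:
        problems.append('columns should be exactly Id and SalePrice')
        return problems  # the remaining checks need both columns
    if len(submission) != len(expected_ids):
        problems.append('row count differs from the test set')
    elif not (submission['Id'].to_numpy() == expected_ids.to_numpy()).all():
        problems.append('Id values differ from the test set')
    if submission['SalePrice'].isnull().any():
        problems.append('missing predictions')
    if (submission['SalePrice'] <= 0).any():
        problems.append('non-positive predictions')
    return problems


# Toy data standing in for df_test['Id'] and output_df (values are made up)
toy_ids = pd.Series([1461, 1462, 1463])
good = pd.DataFrame({'Id': toy_ids, 'SalePrice': [120000.0, 155000.0, 180000.0]})
bad = pd.DataFrame({'Id': toy_ids, 'SalePrice': [120000.0, None, -5.0]})

print(check_submission(good, toy_ids))
print(check_submission(bad, toy_ids))
```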