# B''H

## House Prices - EDA

**Step 2: Do an initial inspection of the data to categorize the dependent and independent variables.**

See the **`step-02-initial-peek`** notebook for details.    

**Key Takeaway:** 
   
- There are 80 variables in total (the 81 columns reported by `df_train.info()`, excluding `Id`):

| Variable Type                                           | Count |
| ------------------------------------------------------- | ----- |
| dependent variables                                     | 1 |   
| independent numerical continuous and discrete variables | 28 |   
| independent numerical ordinal variables                 | 2 |
| independent numerical interval variables                | 5 |
| independent categorical text variables                  | 44 |

In [1]:
import os
import sys

import math

import numpy as np
import pandas as pd

from scipy import stats

import matplotlib.pyplot as plt

import seaborn as sns

---
## Set the plot output sizes

In [2]:
# Get current size
fig_size = plt.rcParams["figure.figsize"]
 
# Prints the default size for this environment, e.g. [6.0, 4.0]
print("Prior size:", fig_size)
 
# Set figure width to 12 and height to 9
fig_size[0] = 12
fig_size[1] = 9
plt.rcParams["figure.figsize"] = fig_size

print("Current size:", fig_size)

Prior size: [6.0, 4.0]
Current size: [12, 9]
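
The cell above changes the size by mutating the list returned from `plt.rcParams`. A minimal alternative sketch, assuming only that matplotlib is installed, copies the prior value first and then assigns the new size in one step. The names `prior_size` and `current_size` are illustrative and do not appear in the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Copy the prior size so that later changes to rcParams cannot alter it
prior_size = list(plt.rcParams["figure.figsize"])

# Assign width 12 and height 9 directly instead of mutating the list in place
plt.rcParams["figure.figsize"] = (12, 9)
current_size = list(plt.rcParams["figure.figsize"])

print("Prior size:", prior_size)
print("Current size:", current_size)
```

The prior size depends on the matplotlib version and any local configuration, so it may differ from the value printed above.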


---
## Get project info

In [3]:
NOTEBOOKS_DIR = os.path.join(os.pardir)

print(os.path.abspath(NOTEBOOKS_DIR))

/home/laz/repos/springboard-mini-projects/notebooks


In [4]:
PROJ_ROOT = os.path.join(NOTEBOOKS_DIR,os.pardir)

print(os.path.abspath(PROJ_ROOT))

/home/laz/repos/springboard-mini-projects


In [5]:
# add the 'src' directory as one where we can import modules
SRC_DIR = os.path.join(PROJ_ROOT, 'src')
sys.path.append(SRC_DIR)

print(os.path.abspath(SRC_DIR))

/home/laz/repos/springboard-mini-projects/src
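
The three path cells above can also be expressed with `pathlib`. This is only a sketch under the same layout assumption (the notebook runs one level below the `notebooks` directory, and `src` sits in the project root). The helper `project_dirs` and the demo path are hypothetical.

```python
from pathlib import Path


def project_dirs(cwd):
    """Return (notebooks_dir, proj_root, src_dir) for a notebook's working directory."""
    cwd = Path(cwd)
    notebooks_dir = cwd.parent          # same as os.pardir relative to cwd
    proj_root = notebooks_dir.parent    # one more level up
    src_dir = proj_root / "src"         # directory holding importable modules
    return notebooks_dir, proj_root, src_dir


# Hypothetical working directory, used only for illustration
notebooks_dir, proj_root, src_dir = project_dirs(
    "/tmp/demo-project/notebooks/house-prices"
)
print(src_dir.as_posix())
```

In a real notebook, `Path.cwd()` would replace the demo path, and `str(src_dir)` would be appended to `sys.path` as in cell 5.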


In [6]:
# Load the "autoreload" extension
%load_ext autoreload

# always reload modules marked with "%aimport"
%autoreload 1

# import my method from the source code
%aimport helper_functions
import helper_functions as hf

---
## Import the data

In [7]:
df_train = pd.read_csv('~/.kaggle/competitions/house-prices-advanced-regression-techniques/train.csv')

df_train.head()

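|   | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
| - | -- | ---------- | -------- | ----------- | ------- | ------ | ----- | -------- | ----------- | --------- | --- | -------- | ------ | ----- | ----------- | ------- | ------ | ------ | -------- | ------------- | --------- |
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |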


In [8]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
Id               1460 non-null int64
MSSubClass       1460 non-null int64
MSZoning         1460 non-null object
LotFrontage      1201 non-null float64
LotArea          1460 non-null int64
Street           1460 non-null object
Alley            91 non-null object
LotShape         1460 non-null object
LandContour      1460 non-null object
Utilities        1460 non-null object
LotConfig        1460 non-null object
LandSlope        1460 non-null object
Neighborhood     1460 non-null object
Condition1       1460 non-null object
Condition2       1460 non-null object
BldgType         1460 non-null object
HouseStyle       1460 non-null object
OverallQual      1460 non-null int64
OverallCond      1460 non-null int64
YearBuilt        1460 non-null int64
YearRemodAdd     1460 non-null int64
RoofStyle        1460 non-null object
RoofMatl         1460 non-null object
Exterior1st      1460 non-n

---

## DEPENDENT VARIABLE(S):
1. `SalePrice`
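
A minimal sketch of separating the dependent variable from the independent ones and taking a first pass by dtype. It assumes a toy frame built from a few columns of the `head()` output above. The names `toy`, `X`, `y`, `numeric_cols`, and `text_cols` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy frame using the first three rows of a few columns from head()
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "MSSubClass": [60, 20, 60],
    "MSZoning": ["RL", "RL", "RL"],
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500, 181500, 223500],
})

# Dependent variable (target) and independent variables (features);
# Id is a row identifier, so it is dropped from the features
y = toy["SalePrice"]
X = toy.drop(columns=["Id", "SalePrice"])

# First-pass split by storage type only
numeric_cols = X.select_dtypes(include="number").columns.tolist()
text_cols = X.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("text:", text_cols)
```

A dtype split is only a starting point: `MSSubClass` lands in the numeric group here even though its values are category codes, which is why the lists in the following sections were assigned by hand.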

---

## INDEPENDENT NUMERICAL CONTINUOUS AND DISCRETE VARIABLES:

1. `LotFrontage`: Linear feet of street connected to property

2. `LotArea`: Lot size in square feet

3. `MasVnrArea`: Masonry veneer area in square feet

4. `BsmtFinSF1`: Type 1 finished square feet

5. `BsmtFinSF2`: Type 2 finished square feet

6. `BsmtUnfSF`: Unfinished square feet of basement area

7. `TotalBsmtSF`: Total square feet of basement area

8. `1stFlrSF`: First Floor square feet

9. `2ndFlrSF`: Second floor square feet

10. `LowQualFinSF`: Low quality finished square feet (all floors)

11. `GrLivArea`: Above grade (ground) living area square feet

12. `BsmtFullBath`: Basement full bathrooms

13. `BsmtHalfBath`: Basement half bathrooms

14. `FullBath`: Full bathrooms above grade

15. `HalfBath`: Half baths above grade

16. `BedroomAbvGr`: Bedrooms above grade (does NOT include basement bedrooms)

17. `KitchenAbvGr`: Kitchens above grade

18. `TotRmsAbvGrd`: Total rooms above grade (does not include bathrooms)		

19. `Fireplaces`: Number of fireplaces

20. `GarageCars`: Size of garage in car capacity

21. `GarageArea`: Size of garage in square feet

22. `WoodDeckSF`: Wood deck area in square feet

23. `OpenPorchSF`: Open porch area in square feet

24. `EnclosedPorch`: Enclosed porch area in square feet

25. `3SsnPorch`: Three season porch area in square feet

26. `ScreenPorch`: Screen porch area in square feet

27. `PoolArea`: Pool area in square feet

28. `MiscVal`: Dollar value of miscellaneous feature
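
Several of the area and value variables listed above are zero when the feature is absent (for example `PoolArea` and `MiscVal` in the `head()` output). As a sketch, the share of missing and zero values per column could be summarized as follows. The frame `sample` reuses the five rows shown by `head()`, and the names `sample` and `summary` are illustrative.

```python
import pandas as pd

# The five rows shown by head(), restricted to four numerical columns
sample = pd.DataFrame({
    "LotFrontage": [65.0, 80.0, 68.0, 60.0, 84.0],
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "PoolArea": [0, 0, 0, 0, 0],
    "MiscVal": [0, 0, 0, 0, 0],
})

# Fraction of rows that are missing or exactly zero, per column
summary = pd.DataFrame({
    "missing_share": sample.isna().mean(),
    "zero_share": sample.eq(0).mean(),
})
print(summary)
```

On the full training set the same two lines would be applied to `df_train` restricted to the 28 columns above. The shares there will differ from this five-row sample.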

---

## INDEPENDENT NUMERICAL ORDINAL VARIABLES:

1. `OverallQual`: Rates the overall material and finish of the house
```
       10	Very Excellent
       9	Excellent
       8	Very Good
       7	Good
       6	Above Average
       5	Average
       4	Below Average
       3	Fair
       2	Poor
       1	Very Poor
```	
2. `OverallCond`: Rates the overall condition of the house
```
       10	Very Excellent
       9	Excellent
       8	Very Good
       7	Good
       6	Above Average	
       5	Average
       4	Below Average	
       3	Fair
       2	Poor
       1	Very Poor
```

---

## INDEPENDENT NUMERICAL INTERVAL VARIABLES:

1. `YearBuilt`: Original construction date

2. `YearRemodAdd`: Remodel date (same as construction date if no remodeling or additions)

3. `GarageYrBlt`: Year garage was built

4. `MoSold`: Month Sold (MM)

5. `YrSold`: Year Sold (YYYY)
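
Because differences between interval variables are meaningful, the year columns can be combined into durations. The sketch below derives a hypothetical `AgeAtSale` column. The `YrSold` values come from the `head()` output, while the `YearBuilt` values are illustrative only, since that column is elided in the output above.

```python
import pandas as pd

sales = pd.DataFrame({
    "YearBuilt": [2003, 1976, 2001],  # illustrative values, not taken from the output above
    "YrSold": [2008, 2007, 2008],     # first three rows of the head() output
})

# Age of the house at the time of sale, in years (hypothetical derived column)
sales["AgeAtSale"] = sales["YrSold"] - sales["YearBuilt"]
print(sales)
```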


---

## INDEPENDENT CATEGORICAL TEXT VARIABLES:
Note: some of these, such as `ExterQual` and `ExterCond`, use ranked codes and can be converted into numerical ordinal variables.
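
The conversion mentioned in the note could look like the sketch below, which maps the `ExterQual` codes onto integers. The 1 to 5 scale in `quality_scale` is an assumption chosen for illustration, and the sample values are made up. Only the code labels (`Ex`, `Gd`, `TA`, `Fa`, `Po`) come from the data description.

```python
import pandas as pd

# Assumed ranking: higher number means better quality
quality_scale = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

# Illustrative sample of ExterQual codes
exter_qual = pd.Series(["Gd", "TA", "Ex", "Fa"], name="ExterQual")

# Series.map replaces each code with its rank; codes missing from the
# dictionary would become NaN
exter_qual_num = exter_qual.map(quality_scale)
print(exter_qual_num.tolist())
```

The same dictionary would apply to the other variables that share these five codes, such as `ExterCond`, `HeatingQC`, and `KitchenQual`.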

1. `MSSubClass`: Identifies the type of dwelling involved in the sale. The codes are stored as integers (`int64`), but they are category labels, not quantities.

```
        20	1-STORY 1946 & NEWER ALL STYLES
        30	1-STORY 1945 & OLDER
        40	1-STORY W/FINISHED ATTIC ALL AGES
        45	1-1/2 STORY - UNFINISHED ALL AGES
        50	1-1/2 STORY FINISHED ALL AGES
        60	2-STORY 1946 & NEWER
        70	2-STORY 1945 & OLDER
        75	2-1/2 STORY ALL AGES
        80	SPLIT OR MULTI-LEVEL
        85	SPLIT FOYER
        90	DUPLEX - ALL STYLES AND AGES
       120	1-STORY PUD (Planned Unit Development) - 1946 & NEWER
       150	1-1/2 STORY PUD - ALL AGES
       160	2-STORY PUD - 1946 & NEWER
       180	PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
       190	2 FAMILY CONVERSION - ALL STYLES AND AGES
```

2. `MSZoning`: Identifies the general zoning classification of the sale.

```		
       A	Agriculture
       C	Commercial
       FV	Floating Village Residential
       I	Industrial
       RH	Residential High Density
       RL	Residential Low Density
       RP	Residential Low Density Park 
       RM	Residential Medium Density
```

3. `Street`: Type of road access to property
```
       Grvl	Gravel	
       Pave	Paved
```       	
4. `Alley`: Type of alley access to property
```
       Grvl	Gravel
       Pave	Paved
       NA 	No alley access
```		
5. `LotShape`: General shape of property
```
       Reg	Regular	
       IR1	Slightly irregular
       IR2	Moderately Irregular
       IR3	Irregular
```       
6. `LandContour`: Flatness of the property
```
       Lvl	Near Flat/Level	
       Bnk	Banked - Quick and significant rise from street grade to building
       HLS	Hillside - Significant slope from side to side
       Low	Depression
```		
7. `Utilities`: Type of utilities available
```		
       AllPub	All public Utilities (E,G,W,& S)	
       NoSewr	Electricity, Gas, and Water (Septic Tank)
       NoSeWa	Electricity and Gas Only
       ELO   	Electricity only	
```	
8. `LotConfig`: Lot configuration
```
       Inside 	Inside lot
       Corner 	Corner lot
       CulDSac	Cul-de-sac
       FR2    	Frontage on 2 sides of property
       FR3    	Frontage on 3 sides of property
```	
9. `LandSlope`: Slope of property
```		
       Gtl	Gentle slope
       Mod	Moderate Slope	
       Sev	Severe Slope
```	
10. `Neighborhood`: Physical locations within Ames city limits
```
       Blmngtn	Bloomington Heights
       Blueste	Bluestem
       BrDale	Briardale
       BrkSide	Brookside
       ClearCr	Clear Creek
       CollgCr	College Creek
       Crawfor	Crawford
       Edwards	Edwards
       Gilbert	Gilbert
       IDOTRR	Iowa DOT and Rail Road
       MeadowV	Meadow Village
       Mitchel	Mitchell
       NAmes	North Ames
       NoRidge	Northridge
       NPkVill	Northpark Villa
       NridgHt	Northridge Heights
       NWAmes	Northwest Ames
       OldTown	Old Town
       SWISU	South & West of Iowa State University
       Sawyer	Sawyer
       SawyerW	Sawyer West
       Somerst	Somerset
       StoneBr	Stone Brook
       Timber	Timberland
       Veenker	Veenker
```			
11. `Condition1`: Proximity to various conditions
```
       Artery	Adjacent to arterial street
       Feedr 	Adjacent to feeder street	
       Norm  	Normal	
       RRNn  	Within 200' of North-South Railroad
       RRAn  	Adjacent to North-South Railroad
       PosN  	Near positive off-site feature--park, greenbelt, etc.
       PosA  	Adjacent to positive off-site feature
       RRNe  	Within 200' of East-West Railroad
       RRAe  	Adjacent to East-West Railroad
```	
12. `Condition2`: Proximity to various conditions (if more than one is present)
```		
       Artery	Adjacent to arterial street
       Feedr 	Adjacent to feeder street	
       Norm  	Normal	
       RRNn  	Within 200' of North-South Railroad
       RRAn  	Adjacent to North-South Railroad
       PosN  	Near positive off-site feature--park, greenbelt, etc.
       PosA  	Adjacent to positive off-site feature
       RRNe  	Within 200' of East-West Railroad
       RRAe  	Adjacent to East-West Railroad
```	
13. `BldgType`: Type of dwelling
```		
       1Fam     Single-family Detached	
       2FmCon   Two-family Conversion; originally built as one-family dwelling
       Duplx    Duplex
       TwnhsE   Townhouse End Unit
       TwnhsI   Townhouse Inside Unit
```	
14. `HouseStyle`: Style of dwelling
```
       1Story	One story
       1.5Fin	One and one-half story: 2nd level finished
       1.5Unf	One and one-half story: 2nd level unfinished
       2Story	Two story
       2.5Fin	Two and one-half story: 2nd level finished
       2.5Unf	Two and one-half story: 2nd level unfinished
       SFoyer	Split Foyer
       SLvl     Split Level
```	
15. `RoofStyle`: Type of roof
```
       Flat   	Flat
       Gable  	Gable
       Gambrel	Gambrel (Barn)
       Hip    	Hip
       Mansard	Mansard
       Shed   	Shed
```		
16. `RoofMatl`: Roof material
```
       ClyTile	Clay or Tile
       CompShg	Standard (Composite) Shingle
       Membran	Membrane
       Metal	Metal
       Roll     Roll
       Tar&Grv	Gravel & Tar
       WdShake	Wood Shakes
       WdShngl	Wood Shingles
```		
17. `Exterior1st`: Exterior covering on house
```
       AsbShng	Asbestos Shingles
       AsphShn	Asphalt Shingles
       BrkComm	Brick Common
       BrkFace	Brick Face
       CBlock	Cinder Block
       CemntBd	Cement Board
       HdBoard	Hard Board
       ImStucc	Imitation Stucco
       MetalSd	Metal Siding
       Other	Other
       Plywood	Plywood
       PreCast	PreCast	
       Stone	Stone
       Stucco	Stucco
       VinylSd	Vinyl Siding
       Wd Sdng	Wood Siding
       WdShing	Wood Shingles
```	
18. `Exterior2nd`: Exterior covering on house (if more than one material)
```
       AsbShng	Asbestos Shingles
       AsphShn	Asphalt Shingles
       BrkComm	Brick Common
       BrkFace	Brick Face
       CBlock	Cinder Block
       CemntBd	Cement Board
       HdBoard	Hard Board
       ImStucc	Imitation Stucco
       MetalSd	Metal Siding
       Other	Other
       Plywood	Plywood
       PreCast	PreCast
       Stone	Stone
       Stucco	Stucco
       VinylSd	Vinyl Siding
       Wd Sdng	Wood Siding
       WdShing	Wood Shingles
```	
19. `MasVnrType`: Masonry veneer type
```
       BrkCmn   Brick Common
       BrkFace  Brick Face
       CBlock   Cinder Block
       None     None
       Stone    Stone
```
20. `ExterQual`: Evaluates the quality of the material on the exterior 
```		
       Ex	Excellent
       Gd	Good
       TA	Average/Typical
       Fa	Fair
       Po	Poor
```		
21. `ExterCond`: Evaluates the present condition of the material on the exterior
```		
       Ex	Excellent
       Gd	Good
       TA	Average/Typical
       Fa	Fair
       Po	Poor
```		
22. `Foundation`: Type of foundation
```		
       BrkTil  Brick & Tile
       CBlock  Cinder Block
       PConc   Poured Concrete
       Slab    Slab
       Stone   Stone
       Wood    Wood
```		
23. `BsmtQual`: Evaluates the height of the basement
```
       Ex	Excellent (100+ inches)	
       Gd	Good (90-99 inches)
       TA	Typical (80-89 inches)
       Fa	Fair (70-79 inches)
       Po	Poor (<70 inches)
       NA	No Basement
```		
24. `BsmtCond`: Evaluates the general condition of the basement
```
       Ex	Excellent
       Gd	Good
       TA	Typical - slight dampness allowed
       Fa	Fair - dampness or some cracking or settling
       Po	Poor - Severe cracking, settling, or wetness
       NA	No Basement
```	
25. `BsmtExposure`: Refers to walkout or garden level walls
```
       Gd	Good Exposure
       Av	Average Exposure (split levels or foyers typically score average or above)	
       Mn	Minimum Exposure
       No	No Exposure
       NA	No Basement
```	
26. `BsmtFinType1`: Rating of basement finished area
```
       GLQ	Good Living Quarters
       ALQ	Average Living Quarters
       BLQ	Below Average Living Quarters	
       Rec	Average Rec Room
       LwQ	Low Quality
       Unf	Unfinished
       NA	No Basement
```		
27. `BsmtFinType2`: Rating of basement finished area (if multiple types)
```
       GLQ	Good Living Quarters
       ALQ	Average Living Quarters
       BLQ	Below Average Living Quarters	
       Rec	Average Rec Room
       LwQ	Low Quality
       Unf	Unfinished
       NA	No Basement
```
28. `Heating`: Type of heating
```
       Floor	Floor Furnace
       GasA 	Gas forced warm air furnace
       GasW 	Gas hot water or steam heat
       Grav 	Gravity furnace	
       OthW 	Hot water or steam heat other than gas
       Wall 	Wall furnace
```		
29. `HeatingQC`: Heating quality and condition
```
       Ex	Excellent
       Gd	Good
       TA	Average/Typical
       Fa	Fair
       Po	Poor
```		
30. `CentralAir`: Central air conditioning
```
       N	No
       Y	Yes
```		
31. `Electrical`: Electrical system
```
       SBrkr	Standard Circuit Breakers & Romex
       FuseA	Fuse Box over 60 AMP and all Romex wiring (Average)	
       FuseF	60 AMP Fuse Box and mostly Romex wiring (Fair)
       FuseP	60 AMP Fuse Box and mostly knob & tube wiring (Poor)
       Mix   	Mixed
```		

32. `KitchenQual`: Kitchen quality
```   
       Ex	Excellent
       Gd	Good
       TA	Typical/Average
       Fa	Fair
       Po	Poor
```       	
33. `Functional`: Home functionality (Assume typical unless deductions are warranted)
```   
       Typ	Typical Functionality
       Min1	Minor Deductions 1
       Min2	Minor Deductions 2
       Mod	Moderate Deductions
       Maj1	Major Deductions 1
       Maj2	Major Deductions 2
       Sev	Severely Damaged
       Sal	Salvage only
```

34. `FireplaceQu`: Fireplace quality
```   
       Ex	Excellent - Exceptional Masonry Fireplace
       Gd	Good - Masonry Fireplace in main level
       TA	Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
       Fa	Fair - Prefabricated Fireplace in basement
       Po	Poor - Ben Franklin Stove
       NA	No Fireplace
```		
35. `GarageType`: Garage location
```		   
       2Types	More than one type of garage
       Attchd	Attached to home
       Basment	Basement Garage
       BuiltIn	Built-In (Garage part of house - typically has room above garage)
       CarPort	Car Port
       Detchd	Detached from home
       NA       No Garage
```		
		
36. `GarageFinish`: Interior finish of the garage
```
       Fin	Finished
       RFn	Rough Finished	
       Unf	Unfinished
       NA	No Garage
```		
37. `GarageQual`: Garage quality
```
       Ex	Excellent
       Gd	Good
       TA	Typical/Average
       Fa	Fair
       Po	Poor
       NA	No Garage
```		
38. `GarageCond`: Garage condition
```
       Ex	Excellent
       Gd	Good
       TA	Typical/Average
       Fa	Fair
       Po	Poor
       NA	No Garage
```		
39. `PavedDrive`: Paved driveway
```
       Y	Paved 
       P	Partial Pavement
       N	Dirt/Gravel
```

40. `PoolQC`: Pool quality
```		   
       Ex	Excellent
       Gd	Good
       TA	Average/Typical
       Fa	Fair
       NA	No Pool
```		
41. `Fence`: Fence quality
```		   
       GdPrv  Good Privacy
       MnPrv  Minimum Privacy
       GdWo   Good Wood
       MnWw   Minimum Wood/Wire
       NA     No Fence
```	
42. `MiscFeature`: Miscellaneous feature not covered in other categories
```		   
       Elev   Elevator
       Gar2   2nd Garage (if not described in garage section)
       Othr   Other
       Shed   Shed (over 100 SF)
       TenC   Tennis Court
       NA     None
```		

43. `SaleType`: Type of sale
```		   
       WD      Warranty Deed - Conventional
       CWD     Warranty Deed - Cash
       VWD     Warranty Deed - VA Loan
       New     Home just constructed and sold
       COD     Court Officer Deed/Estate
       Con     Contract 15% Down payment regular terms
       ConLw   Contract Low Down payment and low interest
       ConLI   Contract Low Interest
       ConLD   Contract Low Down
       Oth     Other
```		
44. `SaleCondition`: Condition of sale
```   
       Normal	Normal Sale
       Abnorml	Abnormal Sale - trade, foreclosure, short sale
       AdjLand	Adjoining Land Purchase
       Alloca	Allocation - two linked properties with separate deeds, typically condo with a garage unit	
       Family	Sale between family members
       Partial	Home was not completed when last assessed (associated with New Homes)
```
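
Many of the variables above use the code `NA` to mean that a feature is absent (no alley, no basement, no pool, and so on). By default `pd.read_csv` parses the literal string `NA` as a missing value, which is why `Alley` shows only 91 non-null entries in the `info()` output. The sketch below shows one possible way to restore the intended category. The inline CSV rows, the column subset `na_means_absent`, and the replacement label `"Absent"` are all assumptions made for illustration.

```python
import io

import pandas as pd

# Tiny illustrative CSV; "NA" is a category code in the data description
csv_text = "Id,Alley,PoolQC\n1,NA,NA\n2,Grvl,NA\n3,Pave,Ex\n"
df = pd.read_csv(io.StringIO(csv_text))

# With default settings, the "NA" strings were parsed as missing values
missing_before = int(df["Alley"].isna().sum())

# Columns where NA means "feature not present" rather than "unknown"
na_means_absent = ["Alley", "PoolQC"]
df[na_means_absent] = df[na_means_absent].fillna("Absent")

print(missing_before)
print(df)
```

This replacement should only be applied to columns where the data description defines `NA` as a category. Columns such as `LotFrontage` have genuinely missing values and need separate treatment.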
