# Predicting Home Prices Using Supervised Machine Learning
------

This notebook walks through the `homeprice_prediction.py` module, which uses supervised learning to predict home prices from a set of input variables.  The sample data included in the `homeprice_prediction` directory contains 512 homes, each described by 13 input variables (items 1 to 13 below) plus the target variable to predict (item 14, MEDV):

1. CRIM: per capita crime rate by town 
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft. 
3. INDUS: proportion of non-retail business acres per town 
4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) 
5. NOX: nitric oxides concentration (parts per 10 million) 
6. RM: average number of rooms per dwelling 
7. AGE: proportion of owner-occupied units built prior to 1940 
8. DIS: weighted distances to five Boston employment centres 
9. RAD: index of accessibility to radial highways 
10. TAX: full-value property-tax rate per 10,000 dollars
11. PTRATIO: pupil-teacher ratio by town 
12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town 
13. LSTAT: % lower status of the population 
14. MEDV: median value of owner-occupied homes, in thousands of dollars

The module uses learning curves and model complexity graphs to narrow down the parameters of the supervised learning model.  The notebook finishes by demonstrating how to predict the price of a home.

## Load and Explore the Data

We assume the data is stored in a CSV file in the same directory as this notebook.  The cell below imports the module we are going to use and sets the file name.

In [None]:
# LIBRARIES
import homeprice_prediction

# FILENAME OF DATA CSV STORED IN SAME DIRECTORY
filename = "data.csv"

# Tell iPython to include plots inline in the notebook
%matplotlib inline

In [None]:
# Load data
data = homeprice_prediction.load_data(filename)

# Explore the data
homeprice_prediction.explore_data(data)

## Prepare data for Machine Learning Algorithms
-------

This section measures how the model's performance changes as it is trained on more data.

A learning curve is a graph of a model's performance metric on the training and testing data, plotted against the number of training instances.

When the training curve and the testing curve both plateau with little or no gap between them, the model has 'learned' as much as it can at that level of complexity, and adding more training data is unlikely to help.
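
The idea behind a learning curve can be sketched as follows. This is a minimal illustration, not the code in `homeprice_prediction.py`: it assumes scikit-learn's `DecisionTreeRegressor` as the model (suggested by the `max_depth` parameter used in this notebook), mean squared error as the metric, and synthetic data in place of the housing CSV. The helper name `learning_curve_errors` is hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor


def learning_curve_errors(max_depth, X_train, y_train, X_test, y_test, n_points=10):
    """Hypothetical helper: train on growing subsets and record both errors."""
    # Evenly spaced training-set sizes, from 10 points up to the full training set
    sizes = [int(s) for s in np.linspace(10, len(X_train), n_points)]
    train_err, test_err = [], []
    for size in sizes:
        model = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        model.fit(X_train[:size], y_train[:size])
        # Error on the data the model has seen so far
        train_err.append(mean_squared_error(y_train[:size], model.predict(X_train[:size])))
        # Error on the held-out test data
        test_err.append(mean_squared_error(y_test, model.predict(X_test)))
    return sizes, train_err, test_err


# Synthetic stand-in data (NOT the housing data): a noisy sine wave
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)
X_train, y_train, X_test, y_test = X[:210], y[:210], X[210:], y[210:]

sizes, train_err, test_err = learning_curve_errors(3, X_train, y_train, X_test, y_test)
for size, tr, te in zip(sizes, train_err, test_err):
    print(f"{size:4d} training points: train MSE={tr:.3f}, test MSE={te:.3f}")
```

Plotting `train_err` and `test_err` against `sizes` gives the two curves described above; the point where both flatten out is where extra data stops paying off.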

In [None]:
# Training/Test dataset split
X_train, y_train, X_test, y_test = homeprice_prediction.split_data(data, test_percentage=0.3)

# Learning Curve Graphs
max_depths = [1,2,3,4,5,6,7,8,9,10]

for max_depth in max_depths:
    homeprice_prediction.learning_curve(max_depth, X_train, y_train, X_test, y_test)

## Increase Model Complexity

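A model complexity graph shows how the training and testing curves change as the complexity of the model increases.

More complexity -> more variance: a more complex model fits the training data ever more closely, so the training error keeps falling, while the testing error eventually stops improving or starts to rise (overfitting).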
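
A rough sketch of what such a graph measures is shown below. It is an illustration under assumptions, not the module's implementation: it assumes that complexity is controlled by the `max_depth` of a scikit-learn `DecisionTreeRegressor`, uses mean squared error as the metric, and runs on synthetic data rather than the housing CSV. The helper name `complexity_errors` is hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor


def complexity_errors(X_train, y_train, X_test, y_test, max_depths=range(1, 11)):
    """Hypothetical helper: fit one tree per depth and record both errors."""
    depths = list(max_depths)
    train_err, test_err = [], []
    for depth in depths:
        model = DecisionTreeRegressor(max_depth=depth, random_state=0)
        model.fit(X_train, y_train)
        train_err.append(mean_squared_error(y_train, model.predict(X_train)))
        test_err.append(mean_squared_error(y_test, model.predict(X_test)))
    return depths, train_err, test_err


# Synthetic stand-in data (NOT the housing data): a noisy sine wave
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)
X_train, y_train, X_test, y_test = X[:210], y[:210], X[210:], y[210:]

depths, train_err, test_err = complexity_errors(X_train, y_train, X_test, y_test)
for depth, tr, te in zip(depths, train_err, test_err):
    print(f"max_depth={depth:2d}: train MSE={tr:.3f}, test MSE={te:.3f}")
```

A widening gap between the two error columns at larger depths is the sign of overfitting that the graph is meant to expose.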

In [None]:
# Model Complexity Graph
homeprice_prediction.model_complexity(X_train, y_train, X_test, y_test)

## Optimize a Model and Make Predictions

In [None]:
# Define a sample house
house = [11.95, 0.00, 18.100, 0, 0.6590, 5.6090, 90.00, 1.385, 24, 680.0, 20.20, 332.09, 12.13]

# Tune the model and predict the price of the sample house
homeprice_prediction.fit_predict_model(data, house)
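
One common way to "tune and predict" is a cross-validated grid search over the tree depth, followed by a prediction from the best model. The sketch below shows that approach with scikit-learn's `GridSearchCV`. Whether `fit_predict_model` works this way is an assumption; the model, the depth grid, the scoring metric, and the synthetic data are all illustrative choices.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the housing data): a noisy sine wave
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# Try every depth from 1 to 10 with 5-fold cross-validation.
# scikit-learn maximizes scores, so MSE is passed in negated form.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
print("Best max_depth:", best_depth)

# GridSearchCV refits the best model on all the data by default,
# so it can be used directly for prediction on a new sample.
new_point = np.array([[3.0]])
prediction = search.predict(new_point)
print("Prediction for new point:", prediction[0])
```

With the housing data, `new_point` would be a single row holding the 13 input variables in the same order as the training columns, like the `house` list above.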