# Project 1 - Boston House Prices

### Download Dataset From The Following Link
https://www.kaggle.com/c/boston-housing

- Follow the instructions to complete the project
- Knowledge of Python and Python libraries is a prerequisite
- Feel free to add your own code and analysis. This is a practice exercise, so try to apply what you have learned
- If something is not clear, go back to the lessons or consult the documentation for the relevant Python library

In [None]:
# import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# keeps the plots in one place (a magic command must be on its own line, without a trailing comment)
%matplotlib inline

In [None]:
# load data using pandas.read_csv(filename), passing the path to the downloaded CSV file
house = pd.read_csv()


In [None]:
# Initial Data Inspection
# Use house.head(n) to display the first n rows
# Use house.info() and house.describe() to inspect the data



In [None]:
# separate feature variables and target variable
price = house['MEDV'] # target variable
features = house.drop('MEDV', axis = 1) # feature variables

In [None]:
# data inspection
minimum_price = np.min(price)

maximum_price = np.max(price)

mean_price = np.mean(price)

median_price = np.median(price)

std_price = np.std(price)

# print the calculated values


In [None]:
# Visualize your data
for var in ['RM', 'LSTAT', 'PTRATIO']:
    sns.regplot(x=house[var], y=price)  # recent seaborn versions require keyword arguments
    plt.show()
    
# Plot each feature against price with a fitted regression line
plt.figure(figsize=(20, 5))

# the 1 x 3 subplot grid below holds exactly three plots
for i, col in enumerate(['RM', 'LSTAT', 'PTRATIO']):
    
    plt.subplot(1, 3, i+1)
    x = house[col]
    y = price
    plt.plot(x, y, 'o')
    
    # Create regression line
    plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)))
    plt.title(col)
    plt.xlabel(col)
    plt.ylabel('prices')

### R2 Score
For this project, we will use R2 score to measure our model's performance. Read more about it at the following links:

http://www.statisticshowto.com/probability-and-statistics/coefficient-of-determination-r-squared/

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html
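
As a rough illustration of what the R2 score measures, the sketch below computes it by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r2_manual` and the sample values are made up for this example and are not part of the project; for the same inputs, `sklearn.metrics.r2_score` should give a matching value.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Hypothetical helper: R2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Sum of squared errors made by the model
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Sum of squared deviations of the targets from their mean
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative values only, not taken from the housing data
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

score = r2_manual(y_true, y_pred)
print(round(score, 3))  # 0.949
```

A score of 1.0 means the predictions match the targets exactly, while a model that always predicts the mean of the targets scores 0.0.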

In [None]:
# create a function for calculating performance using the r2 score
from sklearn.metrics import r2_score
def performance_metric(y_true, y_predict):
    
    score = r2_score(y_true, y_predict)
    return score

In [None]:
# Shuffle and split the data into training and testing subsets
# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(features, price, test_size=0.20, random_state=33)


### Learning Assignment

What is Grid Search and Cross-Validation?

- This is your learning assignment. Before moving further, research Grid Search and Cross-Validation and make sure you understand both concepts.
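
As a starting point for that research, here is a minimal sketch of the two ideas working together, assuming a recent scikit-learn where both tools live in `sklearn.model_selection`. The synthetic data (a noisy sine curve) and the variable names are invented for illustration and have nothing to do with the housing dataset. Grid search tries every candidate value of a hyperparameter, and cross-validation scores each candidate on several train/validation splits so that the choice does not hinge on one lucky split.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration only: a noisy sine curve
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Cross-validation: 5 random 80/20 train/validation splits
cv_sets = ShuffleSplit(n_splits=5, test_size=0.20, random_state=0)

# Grid search: every max_depth from 1 to 10 is a candidate
param_grid = {"max_depth": list(range(1, 11))}

grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid=param_grid,
    scoring="r2",
    cv=cv_sets,
)
grid.fit(X, y)

best_depth = grid.best_params_["max_depth"]  # depth with the best mean validation score
best_score = grid.best_score_                # mean R2 across the 5 splits for that depth
print(best_depth, round(best_score, 3))
```

Each of the 10 candidates is fitted once per split, so this run trains 50 trees before refitting the best one on all the data.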

In [None]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer
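# GridSearchCV and ShuffleSplit live in sklearn.model_selection (sklearn.grid_search was removed)
from sklearn.model_selection import GridSearchCV, ShuffleSplit

def model(X, y, regressor):
    
    # Create cross-validation sets from the training data
    cv_sets = ShuffleSplit(n_splits = 10, test_size = 0.20, random_state = 0)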

    # dictionary for the parameter 'max_depth' with a range from 1 to 10
    # note: 'max_depth' applies to tree-based regressors; LinearRegression has no such parameter
    params = {'max_depth':range(1,11)}

    # scoring function using 'make_scorer'
    score = make_scorer(performance_metric) 
   
    # Create the grid search object
    grid = GridSearchCV(regressor, param_grid=params, scoring=score, cv=cv_sets)  

    # Fit the grid search object to the data to compute the optimal model
    grid = grid.fit(X, y) 

    # Return the optimal model after fitting the data
    return grid.best_estimator_

In [None]:
# Fit the training data to the model using grid search
regressor_1 = DecisionTreeRegressor() # Create a decision tree regressor object
regressor_2 = LinearRegression() # Create a linear regression object

# Select your regressor and train your model
regression = model(X_train, y_train, regressor_1)

In [None]:
# Make Predictions
# Produce a sample data matrix to predict house prices
# (columns must be in the same order as the training features)
data = [[5, 17, 15],
        [4, 32, 22],
        [8, 3, 12]]

# Display predictions
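# use a separate loop variable so the target variable 'price' is not overwritten
for i, predicted_price in enumerate(regression.predict(data)):
    print("Predicted selling price for Client {}'s home: ${:,.2f}".format(i+1, predicted_price))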