# Week 1: Simple linear regression

In this notebook we will use data on house sales in King County to predict house prices using simple (one-input) linear regression. You will:
* Use graphlab SArray and SFrame functions to compute important summary statistics
* Write a function to compute the Simple Linear Regression weights using the closed form solution
* Write a function to make predictions of the output given the input feature
* Turn the regression around to predict the input given the output
* Compare two different models for predicting house prices


In [1]:
import graphlab

[INFO] GraphLab Server Version: 1.8.1


## Load data

In [2]:
graphlab.canvas.set_target('ipynb')

In [3]:
sales = graphlab.SFrame('kc_house_data.gl/')

In [4]:
train_data, test_data = sales.random_split(.8, seed=0)
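`random_split(.8, seed=0)` puts roughly 80% of the rows in `train_data` and the rest in `test_data`. Fixing the seed makes the split reproducible. The sketch below mimics that idea in plain Python. `random_split_rows` is a hypothetical helper, and assigning each row independently with probability 0.8 is an assumption for illustration, not necessarily how GraphLab samples internally.

```python
import random

def random_split_rows(rows, fraction, seed):
    """Hypothetical helper: send each row to the first split with
    probability `fraction`, otherwise to the second split.
    (Per-row assignment is an assumption, not GraphLab's documented method.)"""
    rng = random.Random(seed)  # seeded generator so the split is reproducible
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

rows = list(range(1000))
train, test = random_split_rows(rows, 0.8, seed=0)
print(len(train), len(test))  # roughly 800 and 200
```

Because the same seed gives the same split, RSS values computed later on `test_data` can be reproduced.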

## Build a generic simple linear regression function
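The function below implements the closed-form least-squares solution. Write $x_i$ for the input, $y_i$ for the output and $N$ for the number of observations. The slope $\hat{w}_1$ and intercept $\hat{w}_0$ it computes are:

$$
\hat{w}_1 = \frac{\sum_i x_i y_i - \frac{1}{N}\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{\sum_i x_i^2 - \frac{1}{N}\left(\sum_i x_i\right)^2},
\qquad
\hat{w}_0 = \bar{y} - \hat{w}_1 \bar{x}
$$

Here $\bar{x}$ and $\bar{y}$ are the sample means. The intercept formula says the fitted line passes through the point $(\bar{x}, \bar{y})$.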

In [5]:
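def simple_linear_regression(input_feature, output):
    # closed-form least-squares estimates for output = intercept + slope * input
    n = float(len(input_feature))  # float avoids Python 2 integer division
    # numerator: sum(x*y) - sum(x)*sum(y)/n
    num = (input_feature * output).sum() - (input_feature.sum() * output.sum()) / n
    # denominator: sum(x^2) - sum(x)^2/n
    den = (input_feature ** 2).sum() - (input_feature.sum() ** 2) / n
    slope = num / den
    # the fitted line passes through (mean(x), mean(y))
    intercept = output.mean() - slope * input_feature.mean()
    return (intercept, slope)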
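The sketch below checks the formula without GraphLab by running the same closed-form computation on plain Python lists. `simple_linear_regression_lists` is a hypothetical name. On points that lie exactly on a line, it should recover that line's intercept and slope.

```python
def simple_linear_regression_lists(xs, ys):
    """Hypothetical list-based version of simple_linear_regression:
    returns (intercept, slope) of the least-squares line."""
    n = float(len(xs))
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    # the line passes through (mean of xs, mean of ys)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope

# points that lie exactly on y = 1 + 2x
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
print(simple_linear_regression_lists(xs, ys))  # (1.0, 2.0)
```

On real data the points do not lie on one line. The function then returns the line with the smallest residual sum of squares.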

In [6]:
input_feature = train_data['sqft_living']
output = train_data['price']

In [7]:
squarefeet_intercept, squarefeet_slope = simple_linear_regression(input_feature, output)

## Predicting Values

In [10]:
def get_regression_predictions(input_feature, intercept, slope):
    # works on a single number or element-wise on an SArray
    predicted_output = input_feature * slope + intercept
    return predicted_output

In [11]:
print get_regression_predictions(2650, squarefeet_intercept, squarefeet_slope)

700074.845629
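`get_regression_predictions` accepts a single number, as with 2650 above, or a whole SArray, because `*` and `+` with a scalar apply element-wise. The sketch below shows the same element-wise idea on a plain list. `predict_lists` and the toy intercept and slope are hypothetical.

```python
def predict_lists(xs, intercept, slope):
    """Hypothetical list-based counterpart of get_regression_predictions."""
    return [intercept + slope * x for x in xs]

# toy line y = 1 + 2x, not the fitted house-price model
print(predict_lists([0, 1, 2650], 1.0, 2.0))  # [1.0, 3.0, 5301.0]
```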


In [14]:
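def get_residual_sum_of_squares(input_feature, output, intercept, slope):
    # RSS: sum of squared differences between observed and predicted outputs
    predictions = get_regression_predictions(input_feature, intercept, slope)
    rss = ((output - predictions) ** 2).sum()
    return rss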

In [15]:
print get_residual_sum_of_squares(input_feature, output, squarefeet_intercept, squarefeet_slope)

1.20191835632e+15
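The RSS adds up the squared gaps between observed and predicted outputs. Smaller is better, and a line that passes through every point scores zero. Below is a tiny hand-checkable sketch. `rss_lists` and the toy numbers are hypothetical.

```python
def rss_lists(xs, ys, intercept, slope):
    """Hypothetical list-based counterpart of get_residual_sum_of_squares."""
    total = 0.0
    for x, y in zip(xs, ys):
        residual = y - (intercept + slope * x)  # observed minus predicted
        total += residual ** 2
    return total

# line y = 1 + 2x predicts 3, 5, 7; residuals are 0, -1, 1
xs = [1, 2, 3]
ys = [3, 4, 8]
print(rss_lists(xs, ys, 1.0, 2.0))  # 2.0
```

The value printed above is the RSS on the training data, which is the quantity the closed-form weights minimize.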


## Predict the square footage given the price

In [16]:
def inverse_regression_predictions(output, intercept, slope):
    # solve output = intercept + slope * input for the input (slope must be non-zero)
    estimated_input = (output - intercept) / slope
    return estimated_input

In [17]:
print inverse_regression_predictions(800000, squarefeet_intercept, squarefeet_slope)

3004.39624762
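Solving `y = intercept + slope * x` for `x` gives `x = (y - intercept) / slope`, which is what `inverse_regression_predictions` computes. It only makes sense when the slope is non-zero. The minimal round-trip check below uses toy numbers and a hypothetical helper, `inverse_lists`: predicting and then inverting should return the original inputs.

```python
def inverse_lists(ys, intercept, slope):
    """Hypothetical list-based counterpart of inverse_regression_predictions."""
    if slope == 0:
        raise ValueError("slope must be non-zero to invert the line")
    return [(y - intercept) / slope for y in ys]

intercept, slope = 1.0, 2.0  # toy line, not the fitted model
xs = [0.0, 1.5, 2650.0]
predicted = [intercept + slope * x for x in xs]
print(inverse_lists(predicted, intercept, slope))  # [0.0, 1.5, 2650.0]
```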


## New Model: estimate prices from bedrooms

In [18]:
# fit price as a function of the number of bedrooms

bedroom_intercept, bedroom_slope = simple_linear_regression(train_data['bedrooms'], train_data['price'])
print bedroom_intercept, bedroom_slope

109473.180469 127588.952175


In [19]:
# compare both models by their RSS on the held-out test data
print get_residual_sum_of_squares(test_data['sqft_living'], test_data['price'], squarefeet_intercept, squarefeet_slope)
print get_residual_sum_of_squares(test_data['bedrooms'], test_data['price'], bedroom_intercept, bedroom_slope)

2.75402936247e+14
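4.93364582868e+14

The first value is the test RSS of the square-footage model and the second is that of the bedrooms model. Square footage gives the lower RSS (about 2.75e+14 versus 4.93e+14), so it is the better single predictor of price on this test set.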

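Choosing between the two models comes down to picking the smaller test RSS. The small sketch below applies that rule to the two values printed above. `pick_best_model` is a hypothetical helper.

```python
def pick_best_model(rss_by_model):
    """Hypothetical helper: return the name of the model with the lowest RSS."""
    return min(rss_by_model, key=rss_by_model.get)

test_rss = {
    "sqft_living": 2.75402936247e14,  # test RSS of the square-footage model
    "bedrooms": 4.93364582868e14,     # test RSS of the bedrooms model
}
print(pick_best_model(test_rss))  # sqft_living
```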