# Homework 1: Linear Regression

**Please type your name and A number here:**

In [1]:
Name = "Hari Chandana Kotnani"
assert Name != "", 'Please enter your name in the above quotation marks, thanks!'

A_number = "A02396013"
assert A_number != "", 'Please enter your A-number in the above quotation marks, thanks!'

In this notebook we will use data on house sales in King County to predict house prices using both simple (one input) linear regression and multiple (multiple inputs) linear regression. You will:

Part 1 (simple linear regression)
* Complete functions to compute important summary statistics
* Write a function to compute the Simple Linear Regression weights using the closed form solution
* Write a function to make predictions of the output given the input feature
* Turn the regression around to predict the input given the output
* Compare two different models for predicting house prices

Part 2 (multi-variable linear regression)
* Add a constant column of 1's to a DataFrame to account for the intercept
* Convert a DataFrame into a NumPy array
* Write a predict_output() function using NumPy
* Write a NumPy function to compute the derivative of the regression cost with respect to a single weight
* Write a gradient descent function to compute the regression weights given an initial weight vector, step size and tolerance
* Use the gradient descent function to estimate regression weights for multiple features

In this notebook you will be provided with some already complete code **as well as some code that you should complete yourself in order to answer quiz questions**. The helper code is optional: it is there to assist you with solving the problems, but feel free to ignore it and write your own.

# Part 1: Simple linear regression


# Fire up Pandas

In [2]:
import pandas as pd

# Load house sales data and split it into training and testing

The dataset is from house sales in King County, the region where the city of Seattle, WA is located. We already split the data into training and testing sets so that everyone running this notebook gets the same results. In practice, you could use the **train_test_split** function from the sklearn library.

In [3]:
dtype_dict = {'bathrooms':float, 'waterfront':int, 'sqft_above':int, 
              'sqft_living15':float, 'grade':int, 'yr_renovated':int, 
              'price':float, 'bedrooms':float, 'zipcode':str, 'long':float, 
              'sqft_lot15':float, 'sqft_living':float, 'floors':str, 'condition':int,
              'lat':float, 'date':str, 'sqft_basement':int, 'yr_built':int, 'id':str, 'sqft_lot':int, 'view':int}
sales = pd.read_csv('kc_house_data.csv', dtype=dtype_dict)
train_data = pd.read_csv('kc_house_train_data.csv', dtype=dtype_dict)
test_data = pd.read_csv('kc_house_test_data.csv', dtype=dtype_dict)

# Useful DataFrame summary functions

To make use of the closed form solution and take advantage of pandas' built-in functions, we will review some important ones. In particular:
* Computing the sum of a DataFrame column
* Computing the arithmetic average (mean) of a DataFrame column
* Multiplying DataFrame columns by constants
* Multiplying DataFrame columns by other DataFrame columns

In [4]:
# Let's compute the mean of the House Prices in King County in 2 different ways.
prices = sales['price'] # extract the price column of the sales DataFrame -- this is now a DataFrame column
square_feet = sales['sqft_lot']
square_feet_living = sales['sqft_living']
# recall that the arithmetic average (the mean) is the sum of the prices divided by the total number of houses:
sum_prices = prices.sum()
num_houses = len(prices) # when prices is a DataFrame column len() returns its length
avg_price_1 = sum_prices/num_houses
avg_price_2 = prices.mean() # if you just want the average, use the .mean() method
print("average price via method 1: " + str(avg_price_1))
print("average price via method 2: " + str(avg_price_2))

average price via method 1: 540088.1417665294
average price via method 2: 540088.1417665294


As we can see, we get the same answer both ways.

In [5]:
# if we want to multiply every price by 0.5 it's as simple as:
half_prices = 0.5 * prices
# Let's compute the sum of squares of price. We can multiply two DataFrame columns of the same length elementwise also with *
prices_squared = prices*prices
sum_prices_squared = prices_squared.sum() # price_squared is a DataFrame column of the squares and we want to add them up.
print("the sum of price squared is: " + str(sum_prices_squared))

the sum of price squared is: 9217325138472070.0


Aside: The Python notation x.xxe+yy means x.xx \* 10^(yy), e.g. 100 = 10^2 = 1 \* 10^2 = 1e2.
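
As a small illustration of this notation, the sketch below formats the sum of squared prices printed above in scientific notation. The variable name `value` and the choice of three decimal places are only for illustration.

```python
# e-notation literals are ordinary Python floats
hundred = 1e2
print(hundred)  # 100.0

# the sum of squared prices printed above, shown in scientific notation
value = 9217325138472070.0
formatted = format(value, '.3e')  # keep 3 digits after the decimal point
print(formatted)  # 9.217e+15
```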

# Build a generic simple linear regression function 

Armed with these DataFrame functions we can use the closed form solution found from lecture to compute the slope and intercept for a simple linear regression on observations stored as DataFrame columns: input_feature, output.
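
For reference, one common way to write the closed form solution is shown below. The symbols are notation chosen here rather than taken from the lecture: $x_i$ is the input feature, $y_i$ the output, $N$ the number of observations, and $\bar{x}$, $\bar{y}$ the means.

$$
\text{slope} = \frac{\sum_{i} x_i y_i - \frac{1}{N}\left(\sum_{i} x_i\right)\left(\sum_{i} y_i\right)}{\sum_{i} x_i^2 - \frac{1}{N}\left(\sum_{i} x_i\right)^2},
\qquad
\text{intercept} = \bar{y} - \text{slope}\cdot\bar{x}
$$

Every term is a sum, a mean, or a product of columns, which is why the DataFrame operations reviewed above are sufficient.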

<font color='red'> **rubric={12 points}** </font> 

Complete the following function (or write your own) to compute the simple linear regression slope and intercept: 

In [6]:
def simple_linear_regression(input_feature, output):
    # number of observations
    len_i = len(input_feature)
    # compute the sum of input_feature and output
    input_sum = input_feature.sum()
    output_sum = output.sum()
    # compute the product of the output and the input_feature and its sum
    sum_product_inp_oup = (input_feature * output).sum()
    # compute the squared value of the input_feature and its sum
    sum_product_inp_inp = (input_feature * input_feature).sum()
    # use the formula for the slope
    numerator = sum_product_inp_oup - (1/len_i)*(input_sum * output_sum)
    denominator = sum_product_inp_inp - (1/len_i)*(input_sum * input_sum)
    slope = numerator/denominator
    # use the formula for the intercept
    intercept = output.mean() - (slope * input_feature.mean())
    return (intercept, slope)

We can test that our function works by passing it something where we know the answer. In particular, we can generate a feature and then put the output exactly on a line: output = 1 + 1\*input_feature. Then we know both our slope and intercept should be 1.

In [7]:
test_feature = pd.DataFrame(range(5))
test_output = pd.DataFrame(1 + 1*test_feature)
(test_intercept, test_slope) =  simple_linear_regression(test_feature, test_output)
print("Intercept: " + str(test_intercept))
print("Slope: " + str(test_slope))
test_intercept

Intercept: 0    1.0
dtype: float64
Slope: 0    1.0
dtype: float64


0    1.0
dtype: float64
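
As an additional sanity check, the closed form estimates can be compared against `np.polyfit` on data that does not lie exactly on a line. The sketch below is self-contained: `closed_form_fit` is a compact, hypothetical restatement of the closed form solution (not the graded function), and the noisy data is made up for illustration.

```python
import numpy as np
import pandas as pd

def closed_form_fit(x, y):
    """Closed form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    slope = ((x * y).sum() - x.sum() * y.sum() / n) / ((x * x).sum() - x.sum() ** 2 / n)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# hypothetical noisy data scattered around the line y = 3 + 2x
rng = np.random.default_rng(0)
x = pd.Series(np.arange(20, dtype=float))
y = 3.0 + 2.0 * x + pd.Series(rng.normal(0.0, 0.5, size=20))

intercept, slope = closed_form_fit(x, y)
# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept]
ref_slope, ref_intercept = np.polyfit(x, y, deg=1)

print(intercept, slope)
print(ref_intercept, ref_slope)
```

The two pairs should agree up to floating point rounding, since both minimize the same sum of squared residuals.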

Now that we know it works let's build a regression model for predicting price based on sqft_living. Remember that we train on train_data!

In [8]:
sqft_intercept, sqft_slope = simple_linear_regression(train_data['sqft_living'], train_data['price'])

print("Intercept: " + str(sqft_intercept))
print("Slope: " + str(sqft_slope))

Intercept: -47116.07907289418
Slope: 281.9588396303426


# Predicting values

<font color='red'> **rubric={5 points}** </font>

Now that we have the model parameters (intercept and slope), we can make predictions. With pandas it's easy to multiply a DataFrame column by a constant and add a constant value. Complete the following function to return the predicted output given the input_feature, slope and intercept:

In [9]:
def get_regression_predictions(input_feature, intercept, slope):
    # calculate the predicted values:
    predicted_values = intercept + (slope * input_feature)
    return predicted_values

Now that we can calculate a prediction given the slope and intercept, let's make a prediction. Use (or alter) the following to find out the estimated price for a house with 2650 square feet according to the squarefeet model we estimated above.

<font color='red'> **rubric={2 points}** </font>

**Quiz Question: Using your slope and intercept, what is the predicted price for a house with 2650 sqft?**

In [10]:
my_house_sqft = 2650
estimated_price = get_regression_predictions(my_house_sqft, sqft_intercept, sqft_slope)
print("The estimated price for a house with %d squarefeet is $%.2f" % (my_house_sqft, estimated_price))

The estimated price for a house with 2650 squarefeet is $700074.85


# Residual Sum of Squares

<font color='red'> **rubric={10 points}** </font> 

Now that we have a model and can make predictions, let's evaluate our model using Residual Sum of Squares (RSS). Recall that RSS is the sum of the squares of the residuals, and a residual is just a fancy word for the difference between the predicted output and the true output. 
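
To make the definition concrete, here is a tiny worked example. The three values are made up purely to illustrate the arithmetic and are not taken from the house data.

```python
import pandas as pd

# hypothetical true and predicted outputs for three observations
true_output = pd.Series([3.0, 5.0, 7.0])
predicted_output = pd.Series([2.0, 5.0, 9.0])

residuals = predicted_output - true_output  # -1.0, 0.0, 2.0
rss = (residuals ** 2).sum()                # 1.0 + 0.0 + 4.0
print(rss)  # 5.0
```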

Complete the following (or write your own) function to compute the RSS of a simple linear regression model given the input_feature, output, intercept and slope:

In [11]:
def get_residual_sum_of_squares(input_feature, output, intercept, slope):
    # First get the predictions
    predictions = get_regression_predictions(input_feature,intercept, slope)
    # then compute the residuals (since we are squaring it doesn't matter which order you subtract)
    residuals = predictions-output
    # square the residuals and add them up
    residuals_squared = residuals ** 2
    RSS = residuals_squared.sum()
    return(RSS)

Let's test our get_residual_sum_of_squares function by applying it to the test model where the data lie exactly on a line. Since they lie exactly on a line the residual sum of squares should be zero!

In [12]:
print(get_residual_sum_of_squares(test_feature, test_output, test_intercept, test_slope)) # should be 0.0

0    0.0
dtype: float64


Now use your function to calculate the RSS on training data from the squarefeet model calculated above.

<font color='red'> **rubric={2 points}** </font> 

**Quiz Question: According to this function and the slope and intercept from the squarefeet model, what is the RSS for the simple linear regression using squarefeet to predict prices on TRAINING data?**

In [13]:
rss_prices_on_sqft = get_residual_sum_of_squares(train_data['sqft_living'], train_data['price'], sqft_intercept, sqft_slope)
print('The RSS of predicting Prices based on Square Feet is : ' + str(rss_prices_on_sqft))
print('The RSS of predicting Prices based on Square Feet is : '  ,format(rss_prices_on_sqft,'.5e'))

The RSS of predicting Prices based on Square Feet is : 1201918354177283.0
The RSS of predicting Prices based on Square Feet is :  1.20192e+15


# Predict the squarefeet given price

What if we want to predict the square footage given the price? Since we have an equation y = b + w\*x, we can solve it for x. Given the intercept (b), the slope (w) and the price (y), the estimated squarefeet is x = (y - b)/w.
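
A quick way to convince yourself that the algebra is right is a round trip: predict y from x, then recover x from y. The sketch below uses hypothetical parameters `b` and `w` and hypothetical helper names `predict` and `invert`; none of them are the values or functions fitted in this notebook.

```python
# hypothetical parameters for illustration only
b, w = 10.0, 2.0

def predict(x, intercept, slope):
    # forward direction: y = intercept + slope * x
    return intercept + slope * x

def invert(y, intercept, slope):
    # solve y = intercept + slope * x for x
    return (y - intercept) / slope

x = 4.0
y = predict(x, b, w)            # 10 + 2 * 4 = 18.0
x_recovered = invert(y, b, w)   # (18 - 10) / 2 = 4.0
print(y, x_recovered)
```

Note that this inverts the fitted line algebraically. It is generally not the same line you would get by regressing squarefeet on price.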

<font color='red'> **rubric={5 points}** </font>

Complete the following function to compute the inverse regression estimate, i.e. predict the input_feature given the output.

In [14]:
def inverse_regression_predictions(output, intercept, slope):
    # solve output = intercept + slope*input_feature for input_feature. Use this equation to compute the inverse predictions:
    estimated_feature = (output - intercept) / slope
    return estimated_feature

Now that we have a function to compute the squarefeet given the price from our simple regression model let's see how big we might expect a house that costs $800,000 to be.

<font color='red'> **rubric={2 points}** </font>

**Quiz Question: According to this function and the regression slope and intercept from (3) what is the estimated square-feet for a house costing $800,000?**

In [15]:
my_house_price = 800000
estimated_squarefeet = inverse_regression_predictions(my_house_price, sqft_intercept, sqft_slope)
print("The estimated squarefeet for a house worth $%.2f is %d" % (my_house_price, estimated_squarefeet))

The estimated squarefeet for a house worth $800000.00 is 3004


# New Model: Estimate prices from bedrooms

<font color='red'> **rubric={2 points}** </font> 

We have made one model for predicting house prices using squarefeet, but there are many other features in the sales DataFrame. 
Use your simple linear regression function to estimate the regression parameters from predicting Prices based on number of bedrooms. Use the training data!

In [16]:
# Estimate the slope and intercept for predicting 'price' based on 'bedrooms'
bedrooms_intercept, bedrooms_slope = simple_linear_regression(train_data['bedrooms'], train_data['price'])

print("Intercept for predicting price: " + str(bedrooms_intercept))
print("Slope for predicting price: " + str(bedrooms_slope))

Intercept for predicting price: 109473.17762295867
Slope for predicting price: 127588.9529339881


# Test your linear regression algorithm

Now we have two models for predicting the price of a house. How do we know which one is better? Calculate the RSS on the TEST data (remember this data wasn't involved in learning the model). Compute the RSS from predicting prices using bedrooms and from predicting prices using squarefeet.

<font color='red'> **rubric={4 points}** </font> 

**Quiz Question: Which model (square feet or bedrooms) has lowest RSS on TEST data? Think about why this might be the case.**

In [17]:
# Compute RSS when using bedrooms on TEST data:
rssprices_on_bedrooms = get_residual_sum_of_squares(test_data['bedrooms'], test_data['price'], bedrooms_intercept, bedrooms_slope)
print('RSS for bedrooms on TEST data: ' + str(rssprices_on_bedrooms))

RSS for bedrooms on TEST data: 493364585960301.0


In [18]:
# Compute RSS when using squarefeet on TEST data:
rssprices_on_sqft = get_residual_sum_of_squares(test_data['sqft_living'], test_data['price'], sqft_intercept, sqft_slope)
print('RSS for squarefeet on TEST data: ' + str(rssprices_on_sqft))

RSS for squarefeet on TEST data: 275402933617812.12


# Part 2: Multivariable linear regression

# Convert to Numpy Array

Although DataFrames offer a number of benefits to users, in order to understand the details of the implementation of algorithms it's important to work with a library that allows for direct (and optimized) matrix operations. NumPy is a Python library for working with matrices (or any multi-dimensional "array").

Recall that the predicted value given the weights and the features is just the dot product between the feature and weight vector. Similarly, if we put all of the features row-by-row in a matrix then the predicted value for *all* the observations can be computed by right multiplying the "feature matrix" by the "weight vector". 

First we need to take the DataFrame of our data and convert it into a 2D numpy array (also called a matrix). We can use pandas' .to_numpy() method to convert the DataFrame into a numpy matrix.

In [19]:
import numpy as np # note this allows us to refer to numpy as np instead 

<font color='red'> **rubric = {7 points}** </font> 

Now we will write a function that accepts a DataFrame, a list of feature names (e.g. ['sqft_living', 'bedrooms']) and a target feature (e.g. 'price') and returns two things:
* A numpy matrix whose columns are the desired features plus a constant column (this is how we create an 'intercept')
* A numpy array containing the values of the output

With this in mind, complete the following function (where there's an empty line you should write a line of code that does what the comment above indicates)

In [20]:
def get_numpy_data(data_dframe, features, output):
    data_dframe['constant'] = 1 # this is how you add a constant column to a DataFrame
    # add the column 'constant' to the front of the features list so that we can extract it along with the others:
    features = ['constant'] + features # this is how you combine two lists
    # select the columns of data_dframe given by the features list into the DataFrame features_dframe (now including constant):
    features_dframe = data_dframe[features]
    # the following line will convert the features_dframe into a numpy matrix:
    feature_matrix = features_dframe.to_numpy()
    # assign the column of data_dframe associated with the output to the DataFrame column output_darray
    output_darray = data_dframe[output]
    # the following will convert the DataFrame column into a numpy array
    output_array = output_darray.to_numpy()
    return(feature_matrix, output_array)

For testing let's use the 'sqft_living' feature and a constant as our features and price as our output:

In [21]:
(example_features, example_output) = get_numpy_data(sales, ['sqft_living'], 'price') # the [] around 'sqft_living' makes it a list
print(example_features[0,:]) # this accesses the first row of the data the ':' indicates 'all columns'
print(example_output[0]) # and the corresponding output

[1.00e+00 1.18e+03]
221900.0


# Predicting output given regression weights

Suppose we had the weights [1.0, 1.0] and the features [1.0, 1180.0] and we wanted to compute the predicted output 1.0\*1.0 + 1.0\*1180.0 = 1181.0. This is the dot product between these two arrays. If they're numpy arrays we can use np.dot() to compute this:

In [22]:
my_weights = np.array([1., 1.]) # the example weights
my_features = example_features[0,] # we'll use the first data point
predicted_value = np.dot(my_features, my_weights)
print(predicted_value)

1181.0


<font color='red'> **rubric={5 points}** </font>

np.dot() also works when dealing with a matrix and a vector. Recall that the predictions from all the observations is just the RIGHT (as in weights on the right) dot product between the features *matrix* and the weights *vector*. With this in mind finish the following predict_output function to compute the predictions for an entire matrix of features given the matrix and the weights:

In [23]:
def predict_output(feature_matrix, weights):
    # assume feature_matrix is a numpy matrix containing the features as columns and weights is a corresponding numpy array
    # create the predictions vector by using np.dot()
    predictions = np.dot(feature_matrix, weights)
    return(predictions)

If you want to test your code run the following cell:

In [24]:
test_predictions = predict_output(example_features, my_weights)
print(test_predictions[0]) # should be 1181.0
print(test_predictions[1]) # should be 2571.0

1181.0
2571.0


# Computing the derivative

We are now going to move to computing the derivative of the regression cost function. Recall that the cost function is the sum over the data points of the squared difference between an observed output and a predicted output.

Since the derivative of a sum is the sum of the derivatives we can compute the derivative for a single data point and then sum over data points. We can write the squared difference between the observed output and predicted output for a single point as follows:

(w[0]\*[CONSTANT] + w[1]\*[feature_1] + ... + w[i] \*[feature_i] + ... +  w[k]\*[feature_k] - output)^2

Where we have k features and a constant. So the derivative with respect to weight w[i] by the chain rule is:

2\*(w[0]\*[CONSTANT] + w[1]\*[feature_1] + ... + w[i] \*[feature_i] + ... +  w[k]\*[feature_k] - output)\* [feature_i]

The term inside the parentheses is just the error (difference between prediction and output). So we can rewrite this as:

2\*error\*[feature_i]

That is, the derivative for the weight for feature i is the sum (over data points) of 2 times the product of the error and the feature itself. In the case of the constant then this is just twice the sum of the errors!

Recall that twice the sum of the product of two vectors is just twice the dot product of the two vectors. Therefore the derivative for the weight for feature_i is just two times the dot product between the values of feature_i and the current errors. 
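
One way to check a derivative formula like this is to compare it with a finite-difference approximation of the cost. The sketch below does that on a tiny made-up data set. The names `features`, `output`, `weights`, `cost` and `eps` are all hypothetical and independent of the notebook's variables.

```python
import numpy as np

# hypothetical small data set: a constant column plus one feature
features = np.array([[1.0, 1.0],
                     [1.0, 2.0],
                     [1.0, 3.0]])
output = np.array([2.0, 4.0, 7.0])
weights = np.array([0.5, 1.5])

def cost(w):
    # sum of squared errors for the weights w
    errors = features.dot(w) - output
    return np.sum(errors ** 2)

# analytic derivative with respect to weights[1]: 2 * dot(errors, feature_1)
errors = features.dot(weights) - output
analytic = 2 * np.dot(errors, features[:, 1])

# central finite-difference approximation of the same derivative
eps = 1e-6
step = np.array([0.0, eps])
numeric = (cost(weights + step) - cost(weights - step)) / (2 * eps)

print(analytic, numeric)
```

Here the errors are 0, -0.5 and -2, so the analytic derivative is 2 \* (0\*1 - 0.5\*2 - 2\*3) = -14, and the numerical estimate should agree closely.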

<font color='red'> **rubric={5 points}** </font>

With this in mind, complete the following derivative function, which computes the derivative of the cost with respect to a weight given the values of the corresponding feature (over all data points) and the errors (over all data points).

In [25]:
def feature_derivative(errors, feature):
    # Assume that errors and feature are both numpy arrays of the same length (number of data points)
    # compute twice the dot product of these vectors as 'derivative' and return the value
    derivative = np.dot(errors,feature) * 2

    return(derivative)

To test your feature derivative run the following:

In [26]:
(example_features, example_output) = get_numpy_data(sales, ['sqft_living'], 'price') 
my_weights = np.array([0., 0.]) # this makes all the predictions 0
test_predictions = predict_output(example_features, my_weights) 
# just like DataFrames 2 numpy arrays can be elementwise subtracted with '-': 
errors = test_predictions - example_output # prediction errors in this case is just the -example_output
feature = example_features[:,0] # let's compute the derivative with respect to 'constant', the ":" indicates "all rows"
derivative = feature_derivative(errors, feature)
print(derivative)
print(-np.sum(example_output)*2) # should be the same as derivative

-23345850016.0
-23345850016.0


# Gradient descent

Now we will write a function that performs a gradient descent. The basic premise is simple. Given a starting point we update the current weights by moving in the negative gradient direction. Recall that the gradient is the direction of *increase* and therefore the negative gradient is the direction of *decrease* and we're trying to *minimize* a cost function. 

The amount by which we move in the negative gradient *direction* is called the 'step size'. We stop when we are 'sufficiently close' to the optimum. We define this by requiring the magnitude (length) of the gradient vector to be smaller than a fixed 'tolerance'.
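
Written as a formula, with $\eta$ denoting the step size (a symbol chosen here for illustration), one gradient descent step and the stopping rule look roughly like this:

$$
w_i \leftarrow w_i - \eta \cdot \frac{\partial\,\text{RSS}}{\partial w_i}
\quad \text{for every weight } w_i,
\qquad
\text{stop when } \sqrt{\sum_i \left(\frac{\partial\,\text{RSS}}{\partial w_i}\right)^2} < \text{tolerance}
$$

Each partial derivative is the quantity from the previous section: twice the dot product between the errors and the values of feature i.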

<font color='red'> **rubric={20 points}** </font>

With this in mind, complete the gradient descent function below using your derivative function above. In each gradient descent step we update the weight for every feature before evaluating the stopping criterion.

In [27]:
from math import sqrt # recall that the magnitude/length of a vector [g[0], g[1], g[2]] is sqrt(g[0]^2 + g[1]^2 + g[2]^2)

In [28]:
def regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance):
    converged = False 
    weights = np.array(initial_weights, dtype=float) # make sure it's a float numpy array so in-place updates are not truncated
    while not converged:
        # compute the predictions based on feature_matrix and weights using your predict_output() function
        predictions = predict_output(feature_matrix, weights)

        # compute the errors as predictions - output
        errors = predictions - output
        gradient_sum_squares = 0 # initialize the gradient sum of squares
        # while we haven't reached the tolerance yet, update each feature's weight
        for i in range(len(weights)): # loop over each weight
            # Recall that feature_matrix[:, i] is the feature column associated with weights[i]
            # compute the derivative for weight[i]:
            derivative = feature_derivative(errors, feature_matrix[:, i])
            # add the squared value of the derivative to the gradient sum of squares (for assessing convergence)
            gradient_sum_squares += np.power(derivative, 2)
            # subtract the step size times the derivative from the current weight
            weights[i] -= step_size * derivative

        # compute the square-root of the gradient sum of squares to get the gradient magnitude:
        gradient_magnitude = sqrt(gradient_sum_squares)
        if gradient_magnitude < tolerance:
            converged = True
    return(weights)

A few things to note before we run the gradient descent. Since the gradient is a sum over all the data points and involves a product of an error and a feature, the gradient itself will be very large because the features are large (squarefeet) and the output is large (prices). So while you might expect "tolerance" to be small, small is only relative to the size of the features. 

For similar reasons the step size will be much smaller than you might expect but this is because the gradient has such large values.
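
To see how the scale of the data drives these choices, the sketch below runs a vectorized variant of gradient descent on a tiny made-up data set whose features are of order 1. It is not the function above: it computes the whole gradient at once as 2 \* X^T (Xw - y), checks the tolerance before updating, and adds a `max_iterations` guard. The function name, the guard and the toy data are all assumptions for illustration.

```python
import numpy as np

def gradient_descent(feature_matrix, output, initial_weights, step_size,
                     tolerance, max_iterations=100000):
    weights = np.array(initial_weights, dtype=float)
    for _ in range(max_iterations):
        errors = feature_matrix.dot(weights) - output
        # full gradient of the sum of squared errors
        gradient = 2 * feature_matrix.T.dot(errors)
        if np.linalg.norm(gradient) < tolerance:
            break
        weights -= step_size * gradient
    return weights

# hypothetical data lying exactly on output = 1 + 2 * feature
feature = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
feature_matrix = np.column_stack([np.ones_like(feature), feature])
output = 1.0 + 2.0 * feature

weights = gradient_descent(feature_matrix, output, [0.0, 0.0],
                           step_size=0.01, tolerance=1e-8)
print(weights)  # close to [1, 2]
```

With features this small, a step size of 0.01 and a tiny tolerance work. With squarefeet in the thousands and prices in the hundreds of thousands, the gradient is many orders of magnitude larger, which is why the cells below use step sizes around 1e-12 and tolerances in the millions.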

# Running the gradient descent as simple regression

Although the gradient descent is designed for multivariable regression, since the constant is now a feature we can use the gradient descent function to estimate the parameters in the simple regression on squarefeet. The following cell sets up the feature_matrix, output, initial weights and step size for the first model:

In [29]:
# let's test out the gradient descent
simple_features = ['sqft_living']
my_output = 'price'
(simple_feature_matrix, output) = get_numpy_data(train_data, simple_features, my_output)
initial_weights = np.array([-47000., 1.])
step_size = 7e-12
tolerance = 2.5e7

<font color='red'> **{2 points}** </font>Next run your gradient descent with the above parameters.

In [30]:
weights = regression_gradient_descent(simple_feature_matrix, output, initial_weights, step_size, tolerance)
print(weights)

[-46999.88716555    281.91211918]


How do your weights compare to those achieved in Part 1 (don't expect them to be exactly the same)? 

**Quiz Question: What is the value of the weight for sqft_living -- the second element of `weights` (rounded to 1 decimal place)?**

In [31]:
round(weights[1],1)

281.9

<font color='red'> **{2 points}** </font>Use your newly estimated weights and your predict_output() function to compute the predictions on all the TEST data (you will need to create a numpy array of the test feature_matrix and test output first):

In [32]:
(test_simple_feature_matrix, test_output) = get_numpy_data(test_data, simple_features, my_output)

<font color='red'> **{2 points}** </font>Now compute your predictions using test_simple_feature_matrix and your weights from above.

In [33]:
test_simple_predictions = predict_output(test_simple_feature_matrix,weights)

<font color='red'> **{1 point}** </font>**Quiz Question: What is the predicted price for the 1st house in the TEST data set for model 1 (round to nearest dollar)?**

In [34]:
print(np.round(test_simple_predictions[0:1],0))

[356134.]


<font color='red'> **{2 points}** </font>Now that you have the predictions on test data, compute the RSS on the test data set. Save this value for comparison later. Recall that RSS is the sum of the squared errors (difference between prediction and output).

In [35]:
test_simple_residuals = test_simple_predictions - test_data['price']
test_simple_rss = sum(test_simple_residuals * test_simple_residuals)
print(test_simple_rss)

275400044902128.78


# Running a multivariable regression

Now we will use more than one actual feature. Use the following code to produce the weights for a second model with the following parameters:

In [36]:
model_features = ['sqft_living', 'sqft_living15'] # sqft_living15 is the average squarefeet for the nearest 15 neighbors. 
my_output = 'price'
(feature_matrix, output) = get_numpy_data(train_data, model_features, my_output)
initial_weights = np.array([-100000., 1., 1.])
step_size = 4e-12
tolerance = 1e9

<font color='red'> **{2 points}** </font>Use the above parameters to estimate the model weights. Record these values for the following questions.

In [37]:
multi_weights = regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance)
print(multi_weights)

[-9.99999688e+04  2.45072603e+02  6.52795267e+01]


<font color='red'> **{2 points}** </font>Use your newly estimated weights and the predict_output function to compute the predictions on the TEST data. Don't forget to create a numpy array for these features from the test set first!

In [38]:
(multi_feature_matrix, multi_output) = get_numpy_data(test_data, model_features, my_output)
multi_predictions = predict_output(multi_feature_matrix, multi_weights)

<font color='red'> **{1 point}** </font>**Quiz Question: What is the predicted price for the 1st house in the TEST data set for model 2 (round to nearest dollar)?**

In [39]:
print(np.round(multi_predictions[0:1],0))

[366651.]


<font color='red'> **{1 point}** </font>What is the actual price for the 1st house in the test data set?

In [40]:
test_data['price'][0]

310000.0

<font color='red'> **{1 point}** </font> **Quiz Question: Which estimate was closer to the true price for the 1st house on the TEST data set, model 1 or model 2?**

In [41]:
print("MODEL 1 was closer")

MODEL 1 was closer


<font color='red'> **{2 points}** </font>Now use your predictions and the output to compute the RSS for model 2 on TEST data.

In [42]:
multi_residuals = multi_predictions - test_data['price']
multi_rss = sum(multi_residuals * multi_residuals)
print(multi_rss)

270263443629803.3


<font color='red'> **{1 point}** </font> Which model (1 or 2) has the lowest RSS on all of the TEST data?

In [43]:
print("MODEL2 has lowest RSS on all of the TEST data")

MODEL2 has lowest RSS on all of the TEST data


## Submission instructions 

**PLEASE READ:** When you are ready to submit your assignment do the following:

1. Run all cells in your notebook to make sure there are no errors by doing `Kernel -> Restart Kernel and Clear All Outputs` and then `Run -> Run All Cells`.
2. Notebooks with cell execution numbers out of order will have marks deducted. Notebooks without the output displayed may not be graded at all (because we need to see the output in order to grade your work).
3. Please keep your notebook clean and remove any throwaway code.