# Regression Week 2: Multiple Linear Regression Quiz 2

## Estimating Multiple Regression Coefficients (Gradient Descent)

In this notebook we will cover estimating multiple regression weights via gradient descent. You will:

- Add a constant column of 1's to an SFrame (or other data frame) to account for the intercept
- Convert an SFrame into a numpy array
- Write a predict_output() function using numpy
- Write a numpy function to compute the derivative of the regression cost (RSS) with respect to a single weight
- Write a gradient descent function to compute the regression weights given an initial weight vector, step size, and tolerance
- Use the gradient descent function to estimate regression weights for multiple features
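
As a rough guide to the derivative and gradient descent steps listed above, the quantities involved can be written out as follows. The notation is an assumption made for this sketch and is not taken from the course material: $h_j(x_i)$ is the value of feature $j$ for observation $i$ (with $h_0(x_i) = 1$ for the constant column), $w_j$ is the matching weight, $\eta$ is the step size, and $\hat{y}_i$ is the prediction. The stopping rule shown is one common choice, not necessarily the one the quiz expects.

$$
\begin{aligned}
\hat{y}_i &= \sum_{j} w_j \, h_j(x_i) \\
\mathrm{RSS}(w) &= \sum_{i} \left( y_i - \hat{y}_i \right)^2 \\
\frac{\partial\, \mathrm{RSS}}{\partial w_j} &= -2 \sum_{i} h_j(x_i) \left( y_i - \hat{y}_i \right) \\
w_j &\leftarrow w_j - \eta \, \frac{\partial\, \mathrm{RSS}}{\partial w_j}
\quad \text{(repeat until } \lVert \nabla \mathrm{RSS}(w) \rVert < \text{tolerance)}
\end{aligned}
$$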

#### Import modules

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import zipfile
import os
from math import log
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

#### Unzip and load datasets

In [2]:
with zipfile.ZipFile('kc_house_data.csv.zip', "r") as z:
    z.extractall(os.getcwd())
with zipfile.ZipFile('kc_house_test_data.csv.zip', "r") as z:
    z.extractall(os.getcwd())
with zipfile.ZipFile('kc_house_train_data.csv.zip', "r") as z:
    z.extractall(os.getcwd())
    
# Dictionary with the correct dtypes for the DataFrame columns
dtype_dict = {'bathrooms':float, 'waterfront':int, 'sqft_above':int, 
              'sqft_living15':float, 'grade':int, 'yr_renovated':int, 
              'price':float, 'bedrooms':float, 'zipcode':str, 
              'long':float, 'sqft_lot15':float, 'sqft_living':float, 
              'floors':str, 'condition':int, 'lat':float, 'date':str, 
              'sqft_basement':int, 'yr_built':int, 'id':str, 'sqft_lot':int, 'view':int}
    
sales = pd.read_csv('kc_house_data.csv', dtype = dtype_dict)
train_data = pd.read_csv('kc_house_train_data.csv', dtype = dtype_dict)
test_data = pd.read_csv('kc_house_test_data.csv', dtype = dtype_dict)

#### Get numpy data

Write a function that takes a data set, a list of features to be used as inputs (e.g. ['sqft_living', 'bedrooms']), and the name of the output (e.g. 'price'). The function should return a feature_matrix (2D array) whose first column is all ones, followed by columns holding the values of the input features in the same order as the input list. It should also return an output_array containing the values of the output column in the data set.

In [3]:
def get_numpy_data(input_df, features, output):
    # Add a 'constant' column of 1.0s for the intercept (note: this modifies input_df in place)
    input_df['constant'] = 1.0
    # Put 'constant' first so the intercept weight lines up with column 0
    features = ['constant'] + features

    # DataFrame.as_matrix() was removed in pandas 1.0; select the columns and call to_numpy()
    feature_matrix = input_df[features].to_numpy()  # 2D np.ndarray, one column per feature
    output_array = input_df[output].to_numpy()      # 1D np.ndarray of output values

    return feature_matrix, output_array
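
To check what the function is expected to return, the sketch below builds the same kind of matrix on a tiny hand-made DataFrame. The toy values and the helper name `build_matrix` are made up for illustration; the helper is a stand-in that mirrors the description above and, unlike `get_numpy_data`, does not add a column to the DataFrame it is given.

```python
import numpy as np
import pandas as pd

# Toy data invented for this example (not taken from the King County files)
toy_df = pd.DataFrame({
    'sqft_living': [1000.0, 2000.0, 1500.0],
    'bedrooms': [2.0, 4.0, 3.0],
    'price': [200000.0, 410000.0, 300000.0],
})

def build_matrix(df, features, output):
    """Hypothetical stand-in: column of ones first, then the features in list order."""
    ones = np.ones((len(df), 1))                        # intercept column
    values = df[features].to_numpy(dtype=float)         # feature columns, order preserved
    feature_matrix = np.hstack([ones, values])          # shape: (n_rows, 1 + n_features)
    output_array = df[output].to_numpy()                # 1D array of the output column
    return feature_matrix, output_array

toy_features, toy_output = build_matrix(toy_df, ['sqft_living', 'bedrooms'], 'price')
print(toy_features.shape)  # (3, 3): three rows, a constant column plus two features
```

Column 0 of the result is the constant, so the first entry of any weight vector used with this matrix plays the role of the intercept.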