# Linear Regression
The goal of this notebook is to implement linear regression from scratch, as practice to refresh my knowledge of the process.
The goal of linear regression is to find parameters $\beta$ such that $X \beta \approx y$, where $X$ is our matrix of features and $y$ is our target variable. 

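The example dataset for this notebook is the Boston housing dataset from scikit-learn. Note that `load_boston` was removed in scikit-learn 1.2, so the cells below need an older version.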


In [20]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston

In [21]:
boston = load_boston()

In [35]:
boston_X = boston['data']
# append a column of ones (as the last column) so the model can fit an intercept
boston_X_with_intercept = np.concatenate([boston_X, np.ones((boston_X.shape[0], 1))], axis=1)
boston_y = boston['target']

The first way I will implement linear regression is ordinary least squares, which minimizes the sum of squared residuals. The minimizer has a closed form: 
$$ \hat{\beta} = (X^\top X)^{-1} X^\top y $$
where $\hat{\beta}$ is the vector of estimated parameters. This assumes that the columns of $X$ are linearly independent, so that $X^\top X$ is invertible, and that the relationship between $X$ and $y$ is approximately linear.
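
For reference, the formula comes from setting the gradient of the sum of squared residuals to zero. A short derivation with the same symbols, where $S(\beta)$ is a name introduced here for the sum of squared residuals:

$$
\begin{aligned}
S(\beta) &= (y - X\beta)^\top (y - X\beta) \\
\nabla_\beta S(\beta) &= -2 X^\top (y - X\beta) = 0 \\
X^\top X \hat{\beta} &= X^\top y \\
\hat{\beta} &= (X^\top X)^{-1} X^\top y
\end{aligned}
$$

The last step is the only place where the invertibility of $X^\top X$ is used.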

In [2]:
def basic_linear_regression(X,y):
    '''
    Takes in a feature matrix X and a target vector y and returns beta_hat, the 
    estimated parameters such that X * beta_hat = y_hat.
    This function assumes that X already contains a column of ones, which
    is used to fit the intercept.
    '''
    #the inverse of X transpose times X 
    XTX_inv =  np.linalg.inv(np.dot(X.T,X))
    #times X transpose times y
    beta_hat = np.dot(np.dot(XTX_inv, X.T),y)
    return beta_hat
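
As a sanity check on the closed form, here is a minimal sketch on synthetic data (the data, `true_beta`, and all variable names are made up for illustration, not taken from the Boston set). It compares the explicit inverse with `np.linalg.solve` and `np.linalg.lstsq`, which solve the same problem without forming the inverse.

```python
import numpy as np

# Synthetic data: 200 rows, 3 features, plus a column of ones for the intercept.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
X = np.concatenate([features, np.ones((200, 1))], axis=1)
true_beta = np.array([2.0, -1.0, 0.5, 3.0])  # last entry plays the role of the intercept
y = X @ true_beta + rng.normal(scale=0.1, size=200)

# Normal equations written exactly as in the formula.
beta_inverse = np.linalg.inv(X.T @ X) @ X.T @ y

# Same linear system, solved without computing the inverse explicitly.
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# General least-squares solver; it also handles rank-deficient X.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_inverse, beta_solve))
print(np.allclose(beta_inverse, beta_lstsq))
print(np.round(beta_inverse, 2))
```

On well-conditioned data all three should agree; the solver-based versions are usually preferred in practice because they tend to be more stable numerically than an explicit inverse.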

The next function implements the cost function, the root mean squared error (RMSE):
$$ L(\hat{y},y) = \sqrt{\frac{1}{n}\sum_{i = 1}^n (y_i - \hat{y}_i)^2} $$
I will use this cost function to evaluate my implementations. 

In [62]:
# Implement the cost function
def cost_function(y, y_hat):
    mean_squares = np.mean((np.array(y) - np.array(y_hat))**2)
    RMSE = np.sqrt(mean_squares)
    return RMSE
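
A small worked example of the formula, using made-up numbers. The helper `rmse` below is a stand-alone copy of the same calculation so the snippet runs on its own.

```python
import numpy as np

def rmse(y, y_hat):
    # Square root of the mean squared difference between targets and predictions.
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [1.0, 7.0, 5.0, 11.0]  # every prediction is off by exactly 2

print(rmse(y_true, y_pred))  # 2.0
print(rmse(y_true, y_true))  # 0.0
```

Because the errors are squared before averaging, the sign of each error does not matter, and the result is in the same units as the target.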

Now I'll use my implementation, look at its RMSE, and compare it to scikit-learn's implementation using 10-fold cross-validation.

In [81]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

In [82]:
my_im_train_error,my_im_test_error = [],[]
sklearn_train_error, sklearn_test_error = [],[]

kf = KFold(n_splits=10, random_state=42, shuffle = True)
for train_ind, test_ind in kf.split(boston_X_with_intercept):
    X_train, X_test = boston_X_with_intercept[train_ind],boston_X_with_intercept[test_ind]
    y_train, y_test = boston_y[train_ind],boston_y[test_ind]
    
    #estimates beta_hat for current fold
    beta_hat = basic_linear_regression(X_train,y_train)
    y_train_hat = X_train.dot(beta_hat)
    my_im_train_error.append(cost_function(y_train,y_train_hat))
                             
    y_test_hat = X_test.dot(beta_hat)
    my_im_test_error.append(cost_function(y_test,y_test_hat))
    
    #fit sklearns lr on current fold
    lr = LinearRegression()
    lr.fit(X_train,y_train)
                            
    y_train_hat = lr.predict(X_train)
    sklearn_train_error.append(cost_function(y_train,y_train_hat))
    
    y_test_hat = lr.predict(X_test)
    sklearn_test_error.append(cost_function(y_test,y_test_hat))
    
#average errors across folds
print("Average Errors on 10 folds")
print('My implementation')
print('train RMSE: ',np.mean(my_im_train_error))
print('test RMSE: ',np.mean(my_im_test_error))                            
print('sklearn')
print('train RMSE: ',np.mean(sklearn_train_error))
print('test RMSE: ',np.mean(sklearn_test_error))

Average Errors on 10 folds
My implementation
train RMSE:  4.67079221867
test RMSE:  4.79290191691
sklearn
train RMSE:  4.67079221867
test RMSE:  4.79290191691
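
To make the splitting step less of a black box, here is a rough sketch of how shuffled k-fold indices can be produced with NumPy alone. The function `kfold_indices` is hypothetical and will not reproduce the exact folds that scikit-learn's `KFold` generates for the same seed; it only illustrates the idea.

```python
import numpy as np

def kfold_indices(n_samples, n_splits, seed):
    # Shuffle the row indices once, then cut them into n_splits nearly equal chunks.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, n_splits)
    for i in range(n_splits):
        test_ind = folds[i]
        # Training rows are all the folds except the i-th one.
        train_ind = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_ind, test_ind

splits = list(kfold_indices(n_samples=103, n_splits=10, seed=42))
print(len(splits))
print([len(test_ind) for _, test_ind in splits])
```

Each row lands in the test set exactly once, so averaging the per-fold test errors uses every observation as held-out data one time.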


In [61]:
boston_y.std()

9.1880115452782025

The results are identical to those of scikit-learn's implementation. This simple model has a test RMSE of about 4.8, which is a little more than half of the standard deviation of the target (about 9.19). 
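
The standard deviation is a natural yardstick here because it equals the RMSE of a model that always predicts the mean of the target. A minimal sketch of that fact, using made-up target values rather than the Boston data:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=20.0, scale=9.0, size=500)  # made-up target values

# A baseline that ignores the features and always predicts the mean.
baseline_pred = np.full_like(y, y.mean())
baseline_rmse = np.sqrt(np.mean((y - baseline_pred) ** 2))

# np.std uses ddof=0 by default, which matches the 1/n inside the RMSE.
print(np.isclose(baseline_rmse, y.std()))  # True
```

Read this way, a ratio of roughly 4.8 / 9.19, or about 0.52, says the linear model roughly halves the error of that constant baseline.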

# Regularization and Gradient Descent

In [17]:
def find_gradient(X,y,y_hat):
    # gradient of 0.5 * sum((y - y_hat)**2) with respect to beta
    return np.dot(X.T, y_hat - y)
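
The sign and scale of a gradient can be checked numerically with finite differences. The sketch below assumes the loss is half the sum of squared residuals, for which the gradient with respect to $\beta$ is $X^\top(\hat{y} - y)$; the function names and the random data are illustrative only.

```python
import numpy as np

def half_sse(X, y, beta):
    # Half the sum of squared residuals.
    residual = y - X @ beta
    return 0.5 * residual @ residual

def analytic_gradient(X, y, beta):
    # Derivative of half_sse with respect to beta: X^T (y_hat - y).
    return X.T @ (X @ beta - y)

def numeric_gradient(X, y, beta, eps=1e-6):
    # Central differences, one coordinate of beta at a time.
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (half_sse(X, y, beta + step) - half_sse(X, y, beta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
beta = rng.normal(size=4)

print(np.allclose(analytic_gradient(X, y, beta), numeric_gradient(X, y, beta), atol=1e-4))
```

With the opposite sign, $X^\top(y - \hat{y})$, a step of the form `beta - learning_rate * gradient` would move uphill on the loss instead of downhill.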

In [19]:
def iteration(X,y,beta, learning_rate):
    '''
    This function takes in X,y, beta and a learning rate and updates beta for one 
    iteration of gradient descent. 
    '''
    y_hat = X.dot(beta)
    # gradient of 0.5 * sum((y - y_hat)**2); note the order y_hat - y
    gradient = np.dot(X.T, y_hat - y)
    beta = beta - learning_rate*gradient
    return beta
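
Repeating that update in a loop gives a basic gradient descent solver. The following is one possible sketch, not a finished implementation: the function name, the default learning rate, the fixed iteration count, and the synthetic data are all assumptions made for the example. The gradient is averaged over the rows so that the learning rate does not depend on the number of samples.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, n_iterations=2000):
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)
    for _ in range(n_iterations):
        y_hat = X @ beta
        # Gradient of half the mean squared error with respect to beta.
        gradient = X.T @ (y_hat - y) / n_samples
        beta = beta - learning_rate * gradient
    return beta

# Synthetic features drawn with zero mean and unit variance, plus an intercept column.
rng = np.random.default_rng(3)
features = rng.normal(size=(300, 3))
X = np.concatenate([features, np.ones((300, 1))], axis=1)
y = X @ np.array([1.5, -2.0, 0.7, 4.0]) + rng.normal(scale=0.2, size=300)

beta_gd = gradient_descent(X, y)
beta_exact = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_gd, beta_exact, atol=1e-6))
```

On features with very different scales, this loop typically needs either standardized columns or a much smaller learning rate to avoid diverging.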

In [None]:
def linear_regression_w_grad_descent(X, y, learning_rate):
    raise NotImplementedError  # gradient descent loop still to be written