<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Ridge-and-Lasso-Regression---Lab" data-toc-modified-id="Ridge-and-Lasso-Regression---Lab-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Ridge and Lasso Regression - Lab</a></span><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Objectives" data-toc-modified-id="Objectives-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Objectives</a></span></li><li><span><a href="#Recall-our-cost-functions" data-toc-modified-id="Recall-our-cost-functions-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Recall our cost functions</a></span></li><li><span><a href="#An-example-using-our-auto-mpg-data" data-toc-modified-id="An-example-using-our-auto-mpg-data-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>An example using our <code>auto-mpg</code> data</a></span></li><li><span><a href="#Additional-reading" data-toc-modified-id="Additional-reading-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Additional reading</a></span></li><li><span><a href="#Summary" data-toc-modified-id="Summary-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Summary</a></span></li></ul></li></ul></div>

# Ridge and Lasso Regression - Lab

## Introduction

In this lab, you'll practice applying Ridge and Lasso regression!

## Objectives

You will be able to:

- Use Lasso and Ridge regression in Python
- Compare Lasso and Ridge with standard regression

## Recall our cost functions

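Recall that when fitting a linear regression, you can express the cost function as

$$ \text{cost_function}= \sum_{i=1}^n(y_i - \hat{y}_i)^2 = \sum_{i=1}^n(y_i - (mx_i + b))^2$$

This is the expression for simple linear regression (for 1 predictor $x$). If you have multiple predictors, you would have something that looks like:

$$ \text{cost_function}= \sum_{i=1}^n(y_i - \hat{y}_i)^2 = \sum_{i=1}^n(y_i - (\sum_{j=1}^k m_jx_{ij} + b))^2$$

where $k$ is the number of predictors.

In Ridge regression, the linear regression cost function is changed by adding a penalty term proportional to the sum of the squared coefficients:

$$ \text{cost_function_ridge}= \sum_{i=1}^n(y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^k m_j^2 = \sum_{i=1}^n(y_i - (\sum_{j=1}^k m_jx_{ij} + b))^2 + \lambda \sum_{j=1}^k m_j^2$$

In Lasso regression, the penalty is instead proportional to the sum of the absolute values of the coefficients:

$$ \text{cost_function_lasso}= \sum_{i=1}^n(y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^k \mid m_j \mid = \sum_{i=1}^n(y_i - (\sum_{j=1}^k m_jx_{ij} + b))^2 + \lambda \sum_{j=1}^k \mid m_j \mid$$

In both cases, $\lambda \geq 0$ controls the strength of the penalty. With $\lambda = 0$ you recover the unpenalized cost function. The intercept $b$ is not penalized.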
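The three cost functions can be written as a few lines of NumPy. This is a minimal sketch on made-up toy data; the function names (`rss`, `ridge_cost`, `lasso_cost`) and the toy values are illustrative assumptions, not part of the lab.

```python
import numpy as np

def rss(X, y, m, b):
    """Residual sum of squares for the linear model y_hat = X @ m + b."""
    residuals = y - (X @ m + b)
    return float(np.sum(residuals ** 2))

def ridge_cost(X, y, m, b, lam):
    """RSS plus lam times the sum of squared coefficients (intercept not penalized)."""
    return rss(X, y, m, b) + lam * float(np.sum(m ** 2))

def lasso_cost(X, y, m, b, lam):
    """RSS plus lam times the sum of absolute coefficients (intercept not penalized)."""
    return rss(X, y, m, b) + lam * float(np.sum(np.abs(m)))

# Toy data (hypothetical values): 3 observations, 2 predictors
X_toy = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_toy = np.array([1.0, 2.0, 3.0])
m_toy = np.array([0.5, -1.0])
b_toy = 1.0

print("RSS:  ", rss(X_toy, y_toy, m_toy, b_toy))
print("Ridge:", ridge_cost(X_toy, y_toy, m_toy, b_toy, lam=2.0))
print("Lasso:", lasso_cost(X_toy, y_toy, m_toy, b_toy, lam=2.0))
```

For the same coefficients, the two penalized costs differ from the RSS only by the penalty term, so setting `lam=0` returns the plain RSS in both cases.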

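For the unpenalized cost function $J$ with multiple predictors $x_j$, $j \in \{1,\ldots, k\}$, the gradients with respect to each coefficient $m_j$ and the intercept $b$ are

$$ \frac{\partial J}{\partial m_j} = -2\sum_{i = 1}^n x_{ij}(y_i - (\sum_{l=1}^k m_l x_{il} + b)) = -2\sum_{i = 1}^n x_{ij}\epsilon_i$$
$$ \frac{\partial J}{\partial b} = -2\sum_{i = 1}^n(y_i - (\sum_{l=1}^k m_l x_{il} + b)) = -2\sum_{i = 1}^n \epsilon_i $$

where $\epsilon_i = y_i - \hat{y}_i$ is the residual for observation $i$.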
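The gradient expressions above can be checked numerically. The sketch below is an illustration on random data (the helper names `rss` and `rss_gradients` are assumptions made for this example): it computes the analytic gradients in vectorized form and compares them with central finite differences of the cost.

```python
import numpy as np

def rss(X, y, m, b):
    """Unpenalized cost: sum of squared residuals."""
    return np.sum((y - (X @ m + b)) ** 2)

def rss_gradients(X, y, m, b):
    """Analytic gradients of the RSS with respect to m (vector) and b (scalar)."""
    eps = y - (X @ m + b)          # residuals epsilon_i
    grad_m = -2.0 * (X.T @ eps)    # one entry per coefficient m_j
    grad_b = -2.0 * np.sum(eps)
    return grad_m, grad_b

# Random demonstration data (hypothetical, not auto-mpg)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = rng.normal(size=20)
m_demo = rng.normal(size=3)
b_demo = 0.3

grad_m, grad_b = rss_gradients(X_demo, y_demo, m_demo, b_demo)

# Central finite differences as an independent check
h = 1e-6
numeric_m = np.array([
    (rss(X_demo, y_demo, m_demo + h * e, b_demo)
     - rss(X_demo, y_demo, m_demo - h * e, b_demo)) / (2 * h)
    for e in np.eye(3)
])
numeric_b = (rss(X_demo, y_demo, m_demo, b_demo + h)
             - rss(X_demo, y_demo, m_demo, b_demo - h)) / (2 * h)

print("analytic:", grad_m, grad_b)
print("numeric: ", numeric_m, numeric_b)
```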
    

## An example using our `auto-mpg` data

Let's transform our continuous predictors in `auto-mpg` and see how they perform as predictors in a Ridge versus Lasso regression.

In [2]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv("auto-mpg.csv")
# astype returns a new Series, so assign the result back to the column
data['horsepower'] = data['horsepower'].astype(str).astype(int)
y = data[["mpg"]]
X = data.drop(["mpg", "car name", "origin"], axis=1)

# Rescale every predictor to the [0, 1] range
scale = MinMaxScaler()
transformed = scale.fit_transform(X)
X = pd.DataFrame(transformed, columns=X.columns)

# ignore_index=True relabels the columns 0..6 (column 0 is mpg)
data = pd.concat([y, X], axis=1, ignore_index=True)

data.head()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 1.0 | 0.617571 | 0.456522 | 0.53615 | 0.238095 | 0.0 |
| 1 | 15.0 | 1.0 | 0.728682 | 0.646739 | 0.589736 | 0.208333 | 0.0 |
| 2 | 18.0 | 1.0 | 0.645995 | 0.565217 | 0.51687 | 0.178571 | 0.0 |
| 3 | 16.0 | 1.0 | 0.609819 | 0.565217 | 0.516019 | 0.238095 | 0.0 |
| 4 | 17.0 | 1.0 | 0.604651 | 0.51087 | 0.520556 | 0.14881 | 0.0 |
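To make the scaling step concrete, here is a small sketch of what `MinMaxScaler` computes with its default settings: each column is mapped to the $[0, 1]$ range via $(x - \min) / (\max - \min)$. The four rows of raw values are invented for illustration and are not taken from `auto-mpg.csv`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw values: two columns on very different scales
raw = np.array([
    [8.0, 130.0],
    [4.0, 90.0],
    [6.0, 110.0],
    [4.0, 70.0],
])

# Scaling done by scikit-learn
scaled = MinMaxScaler().fit_transform(raw)

# The same transformation written out by hand, column by column
col_min = raw.min(axis=0)
col_max = raw.max(axis=0)
manual = (raw - col_min) / (col_max - col_min)

print(scaled)
print(np.allclose(scaled, manual))
```

Putting the predictors on a common scale matters for penalized models, because the Ridge and Lasso penalties act directly on the size of each coefficient.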


Below, we create a train-test split and then build Ridge, Lasso, and linear regression models.

In [13]:
# Perform test train split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=12)

In [14]:
# Put the target (column 0) and the predictors (columns 1..6) side by side
data_train = pd.concat([y_train, X_train], axis=1, ignore_index=True)

In [15]:
data_train

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 44 | 18.0 | 0.6 | 0.490956 | 0.347826 | 0.382478 | 0.327381 | 0.083333 |
| 334 | 27.2 | 0.2 | 0.173127 | 0.206522 | 0.248653 | 0.458333 | 0.916667 |
| 224 | 20.5 | 0.6 | 0.421189 | 0.320652 | 0.513751 | 0.529762 | 0.583333 |
| 355 | 30.7 | 0.6 | 0.198966 | 0.163043 | 0.438616 | 0.690476 | 0.916667 |
| 11 | 14.0 | 1.0 | 0.702842 | 0.619565 | 0.565920 | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 130 | 25.0 | 0.2 | 0.186047 | 0.157609 | 0.263397 | 0.535714 | 0.333333 |
| 241 | 21.5 | 0.0 | 0.031008 | 0.347826 | 0.313864 | 0.327381 | 0.583333 |
| 253 | 25.1 | 0.2 | 0.186047 | 0.228261 | 0.313864 | 0.440476 | 0.666667 |
| 155 | 15.0 | 1.0 | 0.728682 | 0.538043 | 0.801531 | 0.357143 | 0.416667 |

In [16]:
# Transpose so that data_train[i] returns observation i as [y, x_1, ..., x_6]
data_train = data_train.reset_index(drop=True).T

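In [17]:
def step_gradient(b_current, m_current, points):
    """Take one gradient descent step for the unpenalized linear model.

    points[i] must return observation i as [y, x_1, ..., x_k].
    The gradients are averaged over N; the constant factor 2 is absorbed
    into the learning rate.
    """
    b_gradient = 0
    m_gradient = np.zeros(len(m_current))
    learning_rate = .01
    # Caution: for the transposed DataFrame built above, len(points) is the
    # number of rows (7), so only observations 0 through 6 are visited.
    N = float(len(points))
    for i in range(0, len(points)):
        y = points[i][0]
        x = points[i][1:(len(m_current)+1)]
        residual = y - (sum(m_current * x) + b_current)
        b_gradient += -(1/N) * residual
        m_gradient += -(1/N) * x * residual
    new_b = b_current - (learning_rate * b_gradient)
    new_m = m_current - (learning_rate * m_gradient)
    return (new_b, new_m)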

In [18]:
b = 0
m = [0,0,0,0,0,0]
updated_b, updated_m = step_gradient(b, m, data_train)

In [19]:
# Start from b = 0 and all coefficients at 0, then run 5000 gradient descent
# steps, storing the (b, m) pair obtained after each step.
b = 0
m = [0,0,0,0,0,0]
iterations = []
for i in range(5000):
    iteration = step_gradient(b, m, data_train)
    b = iteration[0]
    # Convert the updated coefficients back to a plain list for the next step
    m = list(iteration[1])
    iterations.append(iteration)

In [20]:
iterations[4999]

(12.746290723761629, 1     1.571408
 2    -0.922501
 3    -0.640380
 4     0.519102
 5     7.499974
 6    10.899802
 Name: 0, dtype: float64)
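The same gradient descent idea can be written in vectorized form so that every step uses all observations at once. The sketch below is an illustration on synthetic data rather than on `auto-mpg`; the function name `gradient_descent`, the learning rate, and the number of steps are assumptions chosen for this example. It uses the same $-(1/N)$ scaling as `step_gradient` and compares the result with an ordinary least squares solution.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, n_steps=2000):
    """Fit y ~ X @ m + b by batch gradient descent over all rows of X."""
    n_obs, n_features = X.shape
    m = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        residuals = y - (X @ m + b)
        # Averaged gradients, same -(1/N) scaling as step_gradient
        m_gradient = -(X.T @ residuals) / n_obs
        b_gradient = -residuals.mean()
        m = m - learning_rate * m_gradient
        b = b - learning_rate * b_gradient
    return b, m

# Synthetic data (hypothetical): known coefficients plus a little noise
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)

b_gd, m_gd = gradient_descent(X_demo, y_demo)

# Reference solution: ordinary least squares with an explicit intercept column
design = np.column_stack([np.ones(len(X_demo)), X_demo])
ols_solution, *_ = np.linalg.lstsq(design, y_demo, rcond=None)

print("gradient descent:", b_gd, m_gd)
print("least squares:   ", ols_solution[0], ols_solution[1:])
```

Working on a NumPy array of shape `(n_obs, n_features)` avoids the need to transpose the data and makes the number of observations explicit through `X.shape`.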

In [23]:
# Build a Ridge, Lasso and regular linear regression model.
# Note that in scikit-learn, the regularization parameter is called alpha (not lambda)
ridge = Ridge(alpha=0.5)
ridge.fit(X_train, y_train)

lasso = Lasso(alpha=0.5)
lasso.fit(X_train, y_train)

lin = LinearRegression()
lin.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
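To get a feel for what `alpha` does, the following sketch fits Ridge and Lasso for a few values of `alpha` on synthetic data. The data, the coefficient values, and the chosen `alpha` grid are assumptions made for illustration. One caveat worth keeping in mind: scikit-learn's `Lasso` divides the squared-error term by `2 * n_samples` while `Ridge` does not, so the same `alpha` does not represent the same penalty strength in the two models.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (hypothetical): two of the five true coefficients are zero
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=100)

alphas = [0.01, 1.0, 100.0]
ridge_norms = []        # length of the Ridge coefficient vector per alpha
lasso_zero_counts = []  # number of Lasso coefficients that are exactly 0 per alpha

for alpha in alphas:
    ridge_demo = Ridge(alpha=alpha).fit(X_demo, y_demo)
    lasso_demo = Lasso(alpha=alpha).fit(X_demo, y_demo)
    ridge_norms.append(float(np.linalg.norm(ridge_demo.coef_)))
    lasso_zero_counts.append(int(np.sum(lasso_demo.coef_ == 0)))
    print(f"alpha={alpha:>6}: ridge norm = {ridge_norms[-1]:.3f}, "
          f"lasso zeros = {lasso_zero_counts[-1]}")
```

Larger values of `alpha` pull the Ridge coefficients closer to zero without making them exactly zero, while a sufficiently large `alpha` makes Lasso set every coefficient to zero.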

Next, let's create predictions for train and test sets.

In [25]:
# Create predictions for training and test sets
y_h_ridge_train = ridge.predict(X_train)
y_h_ridge_test = ridge.predict(X_test)

# Lasso returns 1-D predictions; reshape them into columns
# (274 training rows, 118 test rows) to match y_train and y_test
y_h_lasso_train = lasso.predict(X_train).reshape(-1, 1)
y_h_lasso_test = lasso.predict(X_test).reshape(-1, 1)

y_h_lin_train = lin.predict(X_train)
y_h_lin_test = lin.predict(X_test)

Look at the residual sum of squares (RSS) on the train and test sets for each of the three models.

In [26]:
print('Train Error Ridge Model', np.sum((y_train - y_h_ridge_train)**2))
print('Test Error Ridge Model', np.sum((y_test - y_h_ridge_test)**2))
print('\n')

print('Train Error Lasso Model', np.sum((y_train - y_h_lasso_train)**2))
print('Test Error Lasso Model', np.sum((y_test - y_h_lasso_test)**2))
print('\n')

print('Train Error Unpenalized Linear Model', np.sum((y_train - y_h_lin_train)**2))
print('Test Error Unpenalized Linear Model', np.sum((y_test - y_h_lin_test)**2))

Train Error Ridge Model mpg    2688.222824
dtype: float64
Test Error Ridge Model mpg    2074.197775
dtype: float64


Train Error Lasso Model mpg    4644.536425
dtype: float64
Test Error Lasso Model mpg    3696.183375
dtype: float64


Train Error Unpenalized Linear Model mpg    2658.043444
dtype: float64
Test Error Unpenalized Linear Model mpg    1976.266987
dtype: float64
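Because RSS is a sum, it grows with the number of observations, so train and test values are not directly comparable. As a small worked example, the sketch below divides the RSS values printed above by the number of rows in each split (274 for training, 118 for testing) to obtain the mean squared error (MSE). The dictionary layout and variable names are illustrative assumptions.

```python
# RSS values copied from the output above
rss_values = {
    "ridge": {"train": 2688.222824, "test": 2074.197775},
    "lasso": {"train": 4644.536425, "test": 3696.183375},
    "linear": {"train": 2658.043444, "test": 1976.266987},
}

# Number of observations in each split
n_rows = {"train": 274, "test": 118}

# MSE = RSS / n puts train and test errors on the same per-observation scale
mse_values = {
    model: {split: rss / n_rows[split] for split, rss in splits.items()}
    for model, splits in rss_values.items()
}

for model, splits in mse_values.items():
    print(f"{model:>6}: train MSE = {splits['train']:.2f}, "
          f"test MSE = {splits['test']:.2f}")
```

On a per-observation basis, each model's test error is higher than its training error, which the raw RSS values hide because the test set is smaller.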


Ridge clearly outperforms Lasso here (test RSS of roughly 2074 versus 3696), but the unpenalized model performs best on both the training and the test set (test RSS of roughly 1976). Let's see how the Ridge and Lasso penalties changed our parameter estimates.

In [27]:
print('Ridge parameter coefficients:', ridge.coef_)
print('Lasso parameter coefficients:', lasso.coef_)
print('Linear model parameter coefficients:', lin.coef_)

Ridge parameter coefficients: [[ -2.11792413  -3.0112953   -1.90579654 -15.60758962  -1.61071692
    8.12940111]]
Lasso parameter coefficients: [-10.31005725  -0.          -0.          -2.27967948   0.
   3.88327477]
Linear model parameter coefficients: [[ -1.33790698  -1.05300843  -0.08661412 -20.08143923  -0.39639115
    8.56051229]]


You can clearly see how Lasso shrinks certain parameters to exactly 0: three of the six coefficients are eliminated. Ridge keeps all six coefficients nonzero. Its largest effect is on the fourth parameter, which moves from -20.08 in the linear regression model to -15.61.
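The comparison can also be done programmatically. This sketch copies the coefficients from the printed output above into NumPy arrays (the variable names are assumptions made for this example), counts how many are exactly zero, and measures the overall size of each coefficient vector.

```python
import numpy as np

# Coefficients copied from the output above
ridge_coef = np.array([-2.11792413, -3.0112953, -1.90579654,
                       -15.60758962, -1.61071692, 8.12940111])
lasso_coef = np.array([-10.31005725, -0.0, -0.0,
                       -2.27967948, 0.0, 3.88327477])
lin_coef = np.array([-1.33790698, -1.05300843, -0.08661412,
                     -20.08143923, -0.39639115, 8.56051229])

for name, coef in [("Ridge", ridge_coef), ("Lasso", lasso_coef), ("Linear", lin_coef)]:
    n_zero = int(np.sum(coef == 0))       # -0.0 also compares equal to 0
    l2_norm = float(np.linalg.norm(coef))  # overall size of the coefficient vector
    print(f"{name:>6}: {n_zero} zero coefficients, L2 norm = {l2_norm:.2f}")
```

The Ridge coefficient vector is shorter overall than the unpenalized one, even though some individual Ridge coefficients are larger in magnitude than their unpenalized counterparts.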

## Additional reading

Full code examples for Ridge and Lasso regression, their advantages and disadvantages, and how to code Ridge and Lasso in Python can be found [here](https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/).

Make sure to have a look at the Scikit-Learn documentation for [Ridge](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html) and [Lasso](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html).


## Summary

Great! You now know how to perform Lasso and Ridge regression. Let's move on to the lab to explore Lasso and Ridge further!