In this notebook I demonstrate ridge and lasso regression. I show some numpy code for each method and compare their performance on three types of datasets: uncorrelated predictors, collinear predictors, and random data in which the response is unrelated to the predictors. The least squares and ridge solutions are computed analytically, and the lasso solution is estimated numerically using the forward-stagewise algorithm.

In [112]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

Ridge regression was invented to deal with collinearity among predictors, and the lasso to perform both shrinkage and model selection. Consequently, as we demonstrate below, if the predictors are minimally correlated and the generative coefficients are substantial, the coefficients from the least squares, lasso and ridge regressions should be nearly identical.
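
For reference, the three estimators compared in this notebook can be written as below. This is a sketch in standard notation rather than anything stated in the notebook itself: $Z$ is assumed to be the $n \times p$ predictor matrix (the array `z` in the code), $y$ the response, and $\lambda \ge 0$ the penalty weight (`lam` in the code). No intercept is included, matching the code. The least squares closed form requires $Z^\top Z$ to be invertible, and the lasso has no closed form in general, which is why it is estimated numerically here.

```latex
\begin{align}
\hat\beta^{\mathrm{ls}}    &= \arg\min_{\beta} \|y - Z\beta\|_2^2
                            = (Z^\top Z)^{-1} Z^\top y \\
\hat\beta^{\mathrm{ridge}} &= \arg\min_{\beta} \|y - Z\beta\|_2^2 + \lambda \|\beta\|_2^2
                            = (Z^\top Z + \lambda I_p)^{-1} Z^\top y \\
\hat\beta^{\mathrm{lasso}} &= \arg\min_{\beta} \|y - Z\beta\|_2^2 + \lambda \|\beta\|_1
\end{align}
```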

In [113]:
#Generate uncorrelated Data
n=1000
p=4
sigma = .1
z = 4*np.random.randn(n,p)
y = 3*z[:,0] - z[:,1] + 2*z[:,2] + sigma*np.random.randn(n)

In [114]:
#Least squares
beta_ls = np.dot(np.dot(np.linalg.inv(np.dot(z.T,z)),z.T),y)
print('The least squares coefficients are: {}'.format(beta_ls))

The least squares coefficients are: [  2.99952456e+00  -9.99992380e-01   2.00060114e+00   7.79235876e-05]
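
The cell above forms the inverse of `z.T z` explicitly. As a minimal sketch of an alternative, the same normal equations can be handed to `np.linalg.solve`, which avoids building the inverse. The fixed seed and the names `rng`, `beta_inv` and `beta_solve` are assumptions made for this illustration and do not appear in the notebook.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed: an assumption, the notebook does not seed

# Same data-generating process as the uncorrelated example
n, p, sigma = 1000, 4, 0.1
z = 4 * rng.randn(n, p)
y = 3 * z[:, 0] - z[:, 1] + 2 * z[:, 2] + sigma * rng.randn(n)

# Normal equations with an explicit inverse, as in the notebook cell
beta_inv = np.linalg.inv(z.T.dot(z)).dot(z.T).dot(y)

# Same normal equations solved as a linear system, without forming the inverse
beta_solve = np.linalg.solve(z.T.dot(z), z.T.dot(y))

print(beta_inv)
print(beta_solve)
```

On well-conditioned data such as this the two routes should agree to numerical precision; the difference matters more when the predictors are close to collinear.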


In [115]:
#Ridge solution
lam = 2
beta_r = np.dot(np.dot(np.linalg.inv(np.dot(z.T,z)+lam*np.eye(np.shape(z)[1])),z.T),y)
print('The ridge regression coefficients with lambda={} are: {}'.format(lam,beta_r))

The ridge regression coefficients with lambda=2 are: [  2.99915701e+00  -9.99903414e-01   2.00036054e+00   8.33228918e-05]
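
To see why `lam = 2` barely moves the coefficients here, note that the diagonal of `z.T z` is on the order of `n * 16 = 16000` for this data, so adding 2 to it is a very small perturbation. The sketch below wraps the ridge formula in a hypothetical helper `ridge` (not part of the notebook) and evaluates it for a few penalty values chosen only for illustration; the seed is likewise an assumption.

```python
import numpy as np

def ridge(z, y, lam):
    """Ridge estimate (z'z + lam*I)^-1 z'y, with no intercept term."""
    p = z.shape[1]
    return np.linalg.solve(z.T.dot(z) + lam * np.eye(p), z.T.dot(y))

rng = np.random.RandomState(0)  # fixed seed: an assumption
n, p, sigma = 1000, 4, 0.1
z = 4 * rng.randn(n, p)
y = 3 * z[:, 0] - z[:, 1] + 2 * z[:, 2] + sigma * rng.randn(n)

lams = [0, 2, 1e3, 1e5]  # illustrative penalty values
betas = [ridge(z, y, lam) for lam in lams]
norms = [np.linalg.norm(b) for b in betas]  # overall size of each coefficient vector

for lam, b in zip(lams, betas):
    print(lam, np.round(b, 4))
```

The norm of the coefficient vector shrinks as `lam` grows, but the shrinkage only becomes visible once `lam` is comparable to the scale of `z.T z`.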


In [116]:
#Lasso solution (forward-stagewise algorithm)

eps = .05       #Step size
iters = 10000   #Number of stagewise steps
res = np.zeros([np.shape(y)[0],iters]) #Residual at every iteration
beta_l = np.zeros(np.shape(z)[1])
res[:,0] = y

for ii in range(iters):
    pred=np.argmax( np.abs(np.corrcoef(np.column_stack([res[:,ii],z]).T)[1:,0]) ) #Most correlated predictor
    delta_pred = eps*np.sign(np.dot(z[:,pred].T,res[:,ii])) #Small step in the direction of the correlation
    beta_l[pred] +=  delta_pred
    if ii<iters-1:
        res[:,ii+1] = res[:,ii] - delta_pred*z[:,pred] #Remove the step's contribution from the residual

print('The lasso regression coefficients are: {}'.format(beta_l))

The lasso regression coefficients are: [ 3. -1.  2.  0.]
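
The loop above stores the residual for every iteration, which takes an `n` by `iters` array. A minimal sketch of the same stagewise update that keeps only the current residual is given below. The function name `forward_stagewise`, the seed, and the way the correlation ranking is computed (centred inner products divided by column norms, which ranks predictors the same way as `np.corrcoef`) are choices made for this illustration, not the notebook's code.

```python
import numpy as np

def forward_stagewise(z, y, eps=0.05, iters=10000):
    """Forward-stagewise fit that keeps only the current residual."""
    zc = z - z.mean(axis=0)              # centred predictors
    znorm = np.linalg.norm(zc, axis=0)   # column norms of the centred predictors
    beta = np.zeros(z.shape[1])
    res = np.array(y, dtype=float)       # working copy of the residual
    for _ in range(iters):
        # |correlation| with the residual, up to a factor shared by all columns
        score = np.abs(zc.T.dot(res)) / znorm
        pred = np.argmax(score)          # most correlated predictor
        step = eps * np.sign(z[:, pred].dot(res))
        beta[pred] += step
        res -= step * z[:, pred]         # remove the step's contribution
    return beta

rng = np.random.RandomState(0)  # fixed seed: an assumption
n, p, sigma = 1000, 4, 0.1
z = 4 * rng.randn(n, p)
y = 3 * z[:, 0] - z[:, 1] + 2 * z[:, 2] + sigma * rng.randn(n)

beta_l = forward_stagewise(z, y)
print(beta_l)
```

Because every update is a multiple of `eps`, the returned coefficients always lie on a grid with spacing `eps`, so a coefficient much smaller than `eps` can only be reported as zero or as plus or minus `eps`.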


In [117]:
#Sum of squared differences from the least squares coefficients
sse_ridge=np.sum((beta_ls-beta_r)**2)
sse_lasso=np.sum((beta_ls-beta_l)**2)
print('The sum of squared differences between:\nridge and least squares coefficients = {}\nlasso and least squares coefficients = {}'.format(sse_ridge,sse_lasso))


The sum of squared differences between:
ridge and least squares coefficients = 2.00925654194e-07
lasso and least squares coefficients = 5.93546292807e-07


As you can see, the ridge and lasso coefficients are nearly identical to the traditional least squares solution, except that the lasso successfully zeroed the non-generative predictor.
Below, however, we generate collinear predictors by making the third predictor an exact linear combination of the first two, and we can see the effect this has on the three sets of regression coefficients.
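
Before looking at the fits, it may help to check how severe this collinearity is. The sketch below rebuilds predictors with the same construction as the next cell and inspects the rank of `z` and the condition number of `z.T z` with and without the ridge term. The seed and the variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed: an assumption
n, p = 1000, 4
z = 4 * rng.randn(n, p)
z[:, 2] = 0.5 * z[:, 1] + z[:, 0]  # third column is a combination of the first two

gram = z.T.dot(z)
rank = np.linalg.matrix_rank(z)                     # number of independent columns
cond_ls = np.linalg.cond(gram)                      # conditioning of the least squares system
cond_ridge = np.linalg.cond(gram + 2 * np.eye(p))   # same system with lam = 2 added

print(rank)
print(cond_ls)
print(cond_ridge)
```

With only three independent columns, `z.T z` is singular up to rounding error, so the result of inverting it for least squares should be treated with caution. Adding `lam` times the identity bounds the smallest eigenvalue away from zero, which is what makes the ridge system well-posed.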

In [118]:
#Generate correlated Data
z = 4*np.random.randn(n,p)
z[:,2] = .5*z[:,1]+z[:,0]
y = 3*z[:,0] - z[:,1] + 2*z[:,2] + sigma*np.random.randn(n)

In [119]:
#Least squares
beta_ls = np.dot(np.dot(np.linalg.inv(np.dot(z.T,z)),z.T),y)
print('The least squares coefficients are: {}'.format(beta_ls))

The least squares coefficients are: [  4.83588071e+00  -8.63562085e-02   1.90852256e-02   8.99337678e-04]


In [120]:
#Ridge solution
lam = 2
beta_r = np.dot(np.dot(np.linalg.inv(np.dot(z.T,z)+lam*np.eye(np.shape(z)[1])),z.T),y)
print('The ridge regression coefficients with lambda={} are: {}'.format(lam,beta_r))

The ridge regression coefficients with lambda=2 are: [  2.77707369e+00  -1.10901034e+00   2.22256852e+00   8.90635557e-04]


In [121]:
#Lasso solution (forward-stagewise algorithm)

eps = .05       #Step size
iters = 10000   #Number of stagewise steps
res = np.zeros([np.shape(y)[0],iters]) #Residual at every iteration
beta_l = np.zeros(np.shape(z)[1])
res[:,0] = y

for ii in range(iters):
    pred=np.argmax( np.abs(np.corrcoef(np.column_stack([res[:,ii],z]).T)[1:,0]) ) #Most correlated predictor
    delta_pred = eps*np.sign(np.dot(z[:,pred].T,res[:,ii])) #Small step in the direction of the correlation
    beta_l[pred] +=  delta_pred
    if ii<iters-1:
        res[:,ii+1] = res[:,ii] - delta_pred*z[:,pred] #Remove the step's contribution from the residual

print('The lasso regression coefficients are: {}'.format(beta_l))

The lasso regression coefficients are: [ 5.  0.  0.  0.]


In [122]:
#Sum of squared differences from the least squares coefficients
sse_ridge=np.sum((beta_ls-beta_r)**2)
sse_lasso=np.sum((beta_ls-beta_l)**2)
print('The sum of squared differences between:\nridge and least squares coefficients = {}\nlasso and least squares coefficients = {}'.format(sse_ridge,sse_lasso))


The sum of squared differences between:
ridge and least squares coefficients = 10.1398464555
lasso and least squares coefficients = 0.0347575901274
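
The three coefficient vectors look very different, but most of that difference is not identifiable from the data. Since `z[:,2] = z[:,0] + .5*z[:,1]`, the generating model `3*z[:,0] - z[:,1] + 2*z[:,2]` is the same as `5*z[:,0] + 0*z[:,1]`, and any coefficient vector can be collapsed onto the first two predictors in the same way. The sketch below does this arithmetic for the coefficients printed above; the helper name `effective` is hypothetical.

```python
import numpy as np

# Coefficients copied from the printed outputs of the collinear example
beta_ls = np.array([4.83588071e+00, -8.63562085e-02, 1.90852256e-02, 8.99337678e-04])
beta_r = np.array([2.77707369e+00, -1.10901034e+00, 2.22256852e+00, 8.90635557e-04])
beta_l = np.array([5.0, 0.0, 0.0, 0.0])

def effective(beta):
    """Collapse z[:,2] = z[:,0] + 0.5*z[:,1] into coefficients on z[:,0] and z[:,1]."""
    return np.array([beta[0] + beta[2], beta[1] + 0.5 * beta[2]])

for name, beta in [('least squares', beta_ls), ('ridge', beta_r), ('lasso', beta_l)]:
    print(name, effective(beta))
```

Read this way, the ridge and lasso fits both land very close to the implied pair (5, 0). Ridge spreads the weight across the collinear predictors and the lasso concentrates it on one, while the least squares fit obtained from the explicit inverse is noticeably further from that pair.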


In [123]:
#Generate random data (y is pure noise, unrelated to z)

z = 4*np.random.randn(n,p)
y = sigma*np.random.randn(n)

In [124]:
#Least squares
beta_ls = np.dot(np.dot(np.linalg.inv(np.dot(z.T,z)),z.T),y)
print('The least squares coefficients are: {}'.format(beta_ls))

The least squares coefficients are: [ 0.00086766 -0.00025277  0.00120982 -0.0004141 ]


In [125]:
#Ridge solution
lam = 2
beta_r = np.dot(np.dot(np.linalg.inv(np.dot(z.T,z)+lam*np.eye(np.shape(z)[1])),z.T),y)
print('The ridge regression coefficients with lambda={} are: {}'.format(lam,beta_r))

The ridge regression coefficients with lambda=2 are: [ 0.00086754 -0.00025275  0.00120966 -0.00041403]


In [126]:
#Lasso solution (forward-stagewise algorithm)

eps = .05       #Step size
iters = 10000   #Number of stagewise steps
res = np.zeros([np.shape(y)[0],iters]) #Residual at every iteration
beta_l = np.zeros(np.shape(z)[1])
res[:,0] = y

for ii in range(iters):
    pred=np.argmax( np.abs(np.corrcoef(np.column_stack([res[:,ii],z]).T)[1:,0]) ) #Most correlated predictor
    delta_pred = eps*np.sign(np.dot(z[:,pred].T,res[:,ii])) #Small step in the direction of the correlation
    beta_l[pred] +=  delta_pred
    if ii<iters-1:
        res[:,ii+1] = res[:,ii] - delta_pred*z[:,pred] #Remove the step's contribution from the residual

print('The lasso regression coefficients are: {}'.format(beta_l))

The lasso regression coefficients are: [ 0.  0.  0.  0.]


In [127]:
#Sum of squared differences from the least squares coefficients
sse_ridge=np.sum((beta_ls-beta_r)**2)
sse_lasso=np.sum((beta_ls-beta_l)**2)
print('The sum of squared differences between:\nridge and least squares coefficients = {}\nlasso and least squares coefficients = {}'.format(sse_ridge,sse_lasso))


The sum of squared differences between:
ridge and least squares coefficients = 4.32071950523e-14
lasso and least squares coefficients = 2.45186527761e-06


As you can see above, the ridge and least squares coefficients are all small, but neither method is capable of pushing them exactly to zero. With the step size used here, only the lasso estimate returns zero for every coefficient, correctly indicating that none of the predictors generates y.