## Machine Learning Workshop: Week 4

## Ridge and LASSO regressions
We now want to understand the Ridge and LASSO regression models.
There are two reasons why we are often not satisfied with the least squares estimates (LSE):
a) Prediction accuracy: the LSE often have low bias but large variance, which can hurt prediction accuracy. Accuracy can sometimes be improved by shrinking some coefficients or setting them to zero, trading a little bias for a larger reduction in variance.
b) Interpretation: the LSE provides an estimate for every coefficient. When there is a large number of predictors, we would often like to determine a smaller subset that exhibits the strongest effects. To get the big picture, we are willing to sacrifice some of the small details.

To understand Ridge and LASSO regression, I strongly recommend reading Chapter 3 of the book: T. Hastie, R. Tibshirani, & J. Friedman, The Elements of Statistical Learning, Springer, Second Edition, 2009.

Ridge Regression:
Ridge regression shrinks the regression coefficients by adding a quadratic penalty on their size. It solves the following optimization problem:
$$\hat w^{ridge} = argmin_w \sum_{i=1}^N(y_i - w_0 - \sum_{j = 1}^d w_j x_{ij})^2 + \lambda \sum_{j = 1}^d w_j^2$$
where $\lambda \geq 0$ is a penalty parameter controlling the amount of shrinkage: the larger the value of $\lambda$, the greater the amount of shrinkage. Note that the intercept $w_0$ is not penalized.
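
To make the effect of $\lambda$ concrete, here is a minimal NumPy sketch of the ridge problem above. It relies on the standard closed-form solution $\hat w = (X_c^T X_c + \lambda I)^{-1} X_c^T y_c$ for centered inputs $X_c$ and centered targets $y_c$, with the unpenalized intercept recovered afterwards. The function name `ridge_closed_form` and the toy data are assumptions made for illustration only; they are not part of the workshop material.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve the ridge problem with an unpenalized intercept w0 (illustrative helper)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean            # centering removes the intercept
    d = Xc.shape[1]
    # Normal equations with the quadratic penalty: (Xc^T Xc + lam * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ yc)
    w0 = y_mean - x_mean @ w                   # recover the intercept
    return w0, w

# Synthetic toy data (assumption: NOT the prostate data)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))
y_toy = 1.0 + X_toy @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

for lam in [0.0, 10.0, 100.0]:
    w0, w = ridge_closed_form(X_toy, y_toy, lam)
    # The norm of w decreases as lam grows
    print(lam, np.round(w, 3), round(float(np.linalg.norm(w)), 3))
```

With $\lambda = 0$ the function reproduces the least squares fit, and increasing $\lambda$ pulls all coefficients towards zero without making any of them exactly zero.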

LASSO regression:
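Similar to ridge regression, the LASSO adds a penalty on the coefficients, but it penalizes their absolute values (an $L_1$ penalty) instead of their squares. For sufficiently large $\lambda$ this sets some coefficients exactly to zero, so the LASSO also performs variable selection. It solves the following problem: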
$$\hat w^{lasso} = argmin_w \sum_{i=1}^N(y_i - w_0 - \sum_{j = 1}^d w_j x_{ij})^2 + \lambda \sum_{j = 1}^d |w_j|$$
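
Unlike ridge, the LASSO problem has no closed-form solution in general. One common way to solve it is cyclic coordinate descent with soft-thresholding, sketched below in NumPy. For the objective above, updating a single coefficient with the others held fixed gives $w_j = S(x_j^T r_j, \lambda/2) / x_j^T x_j$, where $r_j$ is the residual without feature $j$ and $S(\rho, t) = \mathrm{sign}(\rho)\max(|\rho| - t, 0)$. The function names, the fixed iteration count, and the toy data are assumptions for illustration; this is a sketch, not the solver scikit-learn uses internally.

```python
import numpy as np

def soft_threshold(rho, t):
    """S(rho, t) = sign(rho) * max(|rho| - t, 0)."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Cyclic coordinate descent for the LASSO objective (illustrative helper)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean            # intercept is not penalized
    d = Xc.shape[1]
    w = np.zeros(d)
    col_sq = (Xc ** 2).sum(axis=0)             # x_j^T x_j for each column
    for _ in range(n_iter):
        for j in range(d):
            # Partial residual that leaves out the contribution of feature j
            r_j = yc - Xc @ w + Xc[:, j] * w[j]
            rho = Xc[:, j] @ r_j
            w[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    w0 = y_mean - x_mean @ w
    return w0, w

# Synthetic toy data with a sparse true coefficient vector (assumption)
rng = np.random.default_rng(0)
X_sp = rng.normal(size=(100, 5))
y_sp = 1.0 + X_sp @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.1, size=100)

for lam in [0.0, 20.0, 200.0]:
    _, w = lasso_cd(X_sp, y_sp, lam)
    print(lam, np.round(w, 3))
```

The soft-thresholding step is what produces exact zeros: once $|x_j^T r_j| \leq \lambda/2$, the update sets $w_j$ to zero rather than merely making it small.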

To better understand these algorithms, we work on the following data set.

## Prostate Cancer Example
In this example, we want to explore the relationship between the level of prostate-specific antigen (lpsa) and a number of clinical measures in men who were about to receive a radical prostatectomy. These variables are log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log of capsular penetration (lcp), Gleason score (gleason), and percent of Gleason scores 4 or 5 (pgg45).

You can load the data from the provided file prostate_dataset.txt. In this exercise, you are asked to apply the LSE, Ridge and LASSO regressions to the data. You can use Python modules such as scikit-learn to implement these algorithms. Our goal is to understand the differences between these algorithms. After you have applied them, can you identify which factors affect lpsa? You can try different experiments, for example varying the parameter $\lambda$ for the ridge and LASSO models.
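
As a starting point for the exercise, the following sketch fits all three models with scikit-learn and returns their coefficients for comparison. Several things here are assumptions rather than part of the exercise: the helper name `fit_three`, the choice to standardize the predictors first (so the penalty treats all of them on the same scale), and the synthetic stand-in data, which should be replaced by the prostate predictors and lpsa. Note also that scikit-learn's `alpha` is not always the same as $\lambda$: `Ridge` minimizes the residual sum of squares plus `alpha` times the squared coefficients, so `alpha` equals $\lambda$, whereas `Lasso` divides the residual sum of squares by $2N$, so `alpha` equals $\lambda / (2N)$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import StandardScaler

def fit_three(X, y, lam):
    """Fit LSE, Ridge and LASSO for one value of lambda (illustrative helper)."""
    n = X.shape[0]
    # Assumption: standardize predictors so the penalty is scale-independent
    Xs = StandardScaler().fit_transform(X)
    models = {
        "LSE": LinearRegression(),
        "Ridge": Ridge(alpha=lam),             # alpha = lambda
        "LASSO": Lasso(alpha=lam / (2 * n)),   # alpha = lambda / (2N)
    }
    return {name: m.fit(Xs, y).coef_ for name, m in models.items()}

# Synthetic stand-in data (assumption: NOT the prostate data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 3] + rng.normal(scale=0.5, size=100)

coefs = fit_three(X_demo, y_demo, lam=50.0)
for name, w in coefs.items():
    print(f"{name:6s}", np.round(w, 3))
```

Calling `fit_three` for a range of `lam` values and comparing the printed rows is one way to see which coefficients ridge shrinks and which ones the LASSO removes entirely.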

A further question: can you choose the best model from LSE, Ridge and LASSO for this dataset?
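
One possible way to approach this question is to compare the models by their cross-validated prediction error. The sketch below uses scikit-learn's `cross_val_score` with 5-fold cross-validation and mean squared error. The helper name `cv_mse`, the specific `alpha` values (given in scikit-learn's own parameterization), the number of folds, and the synthetic stand-in data are all assumptions chosen for illustration; other criteria, such as a held-out test set, are equally valid.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cv_mse(model, X, y, n_splits=5):
    """Mean cross-validated MSE of a model with standardized inputs (illustrative helper)."""
    # The scaler sits inside the pipeline so it is refit on each training fold
    pipe = make_pipeline(StandardScaler(), model)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="neg_mean_squared_error")
    return -scores.mean()                      # scores are negated MSEs

# Synthetic stand-in data (assumption: NOT the prostate data)
rng = np.random.default_rng(1)
X_cv = rng.normal(size=(100, 8))
y_cv = 2.0 * X_cv[:, 0] - 1.0 * X_cv[:, 3] + rng.normal(scale=0.5, size=100)

candidates = {
    "LSE": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),     # example value, not tuned
    "LASSO": Lasso(alpha=0.05),    # example value, not tuned
}
cv_results = {name: cv_mse(m, X_cv, y_cv) for name, m in candidates.items()}
best_name = min(cv_results, key=cv_results.get)
print(cv_results)
print("lowest CV error:", best_name)
```

In practice the penalty strength should itself be tuned, for example by repeating this comparison over a grid of `alpha` values, before the three models are compared.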

In [3]:
import pandas as pd


In [4]:
# Load the tab-separated prostate data set into a DataFrame
data = pd.read_csv('prostate_dataset.txt', delimiter="\t")
print(data)

    col    lcavol   lweight  age      lbph  svi       lcp  gleason  pgg45  \
0     1 -0.579818  2.769459   50 -1.386294    0 -1.386294        6      0   
1     2 -0.994252  3.319626   58 -1.386294    0 -1.386294        6      0   
2     3 -0.510826  2.691243   74 -1.386294    0 -1.386294        7     20   
3     4 -1.203973  3.282789   58 -1.386294    0 -1.386294        6      0   
4     5  0.751416  3.432373   62 -1.386294    0 -1.386294        6      0   
5     6 -1.049822  3.228826   50 -1.386294    0 -1.386294        6      0   
6     7  0.737164  3.473518   64  0.615186    0 -1.386294        6      0   
7     8  0.693147  3.539509   58  1.536867    0 -1.386294        6      0   
8     9 -0.776529  3.539509   47 -1.386294    0 -1.386294        6      0   
9    10  0.223144  3.244544   63 -1.386294    0 -1.386294        6      0   
10   11  0.254642  3.604138   65 -1.386294    0 -1.386294        6      0   
11   12 -1.347074  3.598681   63  1.266948    0 -1.386294        6      0   