<a href="https://colab.research.google.com/github/dominicwhite/kaggle-club/blob/master/python/linear_regression_and_friends.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Linear regression and friends

## The `diabetes` dataset

We will be using sklearn's built-in diabetes dataset.

In [None]:
from sklearn import datasets

diabetes = datasets.load_diabetes()

In [None]:
print(diabetes.DESCR)
print(diabetes.feature_names)

In [None]:
diabetes_X = diabetes.data
diabetes_y = diabetes.target

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(diabetes_X, diabetes_y, test_size = 0.2)

## Basic Linear regression in `scikit-learn`

![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Linear_regression.svg/438px-Linear_regression.svg.png)

For linear regression, we can use the `LinearRegression` class.


In [None]:
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()

lin_reg.fit(X_train, y_train)

We can now look at the linear model we just created.

The coefficients are stored in the `coef_` attribute.

> **Pro-tip**: any attribute created by the `fit(...)` method will end with an underscore.

In [None]:
print(lin_reg.coef_)

Let's make this a bit more readable by associating each coefficient with its variable name.

In [None]:
import pandas as pd

pd.DataFrame(lin_reg.coef_, index = diabetes.feature_names)

What about the intercept?

In [None]:
print('Intercept: \n', lin_reg.intercept_)

In [None]:
lin_reg_y_pred = lin_reg.predict(X_test)

We can now calculate some error metrics by comparing these predictions with the true test values:

In [None]:
from sklearn.metrics import mean_squared_error, r2_score

print('Mean squared error: %.2f'
      % mean_squared_error(y_test, lin_reg_y_pred))
# The R-squared value (aka coefficient of determination): 1 is perfect prediction
print('R-squared: %.2f'
      % r2_score(y_test, lin_reg_y_pred))
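To make these two metrics concrete, here is a small sketch that computes them by hand with NumPy and checks the result against scikit-learn. The arrays `y_true` and `y_pred` are made-up numbers for illustration, not values from the diabetes data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up values, purely for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 9.0])

residuals = y_true - y_pred

# MSE: the average of the squared residuals
mse_by_hand = np.mean(residuals ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_by_hand = 1 - ss_res / ss_tot

# Each line prints the hand-computed value next to scikit-learn's value
print(mse_by_hand, mean_squared_error(y_true, y_pred))
print(r2_by_hand, r2_score(y_true, y_pred))
```

For these numbers the MSE is 0.375 and R-squared is about 0.925, and both should match what scikit-learn reports.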


## How does linear regression work?

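We find the line that minimizes the sum of the squared residuals.

Dividing that sum by the number of data points gives the mean squared error (MSE), so minimizing one also minimizes the other.

It turns out that the MSE is always a quadratic function of the model's coefficients, so for a single coefficient it looks like a parabola with one minimum: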

![alt text](https://www.onlinemath4all.com/images/minimumvalueofparabola.png)

For linear regression we can find an exact solution.



*   Differentiate the cost function (MSE) with respect to the coefficients.
*   Set the derivative equal to 0.
*   Solve!
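Carrying out those steps gives the so-called normal equation, $X^T X \theta = X^T y$, where $\theta$ holds the intercept and the coefficients. The sketch below solves it with NumPy on the full diabetes data and compares the result with `LinearRegression`. The names `X_b`, `theta` and `lin_reg_check` are just illustrative, and scikit-learn itself may use a different solver internally.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

X, y = datasets.load_diabetes(return_X_y=True)

# Add a column of ones so the intercept is learned as an extra coefficient
X_b = np.c_[np.ones(len(X)), X]

# Solve the normal equation (X_b^T X_b) theta = X_b^T y
theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# Compare with scikit-learn's fit on the same data
lin_reg_check = LinearRegression().fit(X, y)
print(np.allclose(theta[0], lin_reg_check.intercept_))
print(np.allclose(theta[1:], lin_reg_check.coef_))
```

The first entry of `theta` is the intercept and the remaining entries are the coefficients, in the same order as the columns of `X`.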



## Polynomial regression, a.k.a. how to get lots of features

A standard linear regression might look like this:

$y = m_1 x_1 + m_2 x_2 + c$

But this assumes that y and each x are linearly related.

Maybe they aren't, in which case we can transform the variables, e.g. by taking their logarithm, or by calculating polynomial variations:

$y = m_1 x_1 + m_2 x_2 + m_3 x_1^2 + m_4 x_2^2 + c$

In [None]:
from sklearn.preprocessing import PolynomialFeatures

poly_features = PolynomialFeatures(degree=2, include_bias = False)
X_train_poly = poly_features.fit_transform(X_train)

lin_reg_poly = LinearRegression()
lin_reg_poly.fit(X_train_poly, y_train)

In [None]:
X_train_poly

## Regularized Linear Models

If we have a lot of features in our training data, we are likely to overfit.

However, we don't want to arbitrarily exclude features - after all they might include useful information.

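One solution is **regularization**: the cost function penalizes large coefficients as well as a poor fit, so a feature only gets a large coefficient if it improves the fit enough to be worth the penalty.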

For example, in standard linear regression:

cost_function = MSE

In regularized linear regression:

cost_function = MSE + regularization

Instead of solving for the minimum of the cost function exactly, we can also find an approximation of it using gradient descent:

![alt text](https://images.deepai.org/glossary-terms/dd6cdd6fcfea4af1a1075aac0b5aa110/sgd.png)

This can be a lot quicker when you have large amounts of data (lots of rows), or a lot of features in X (lots of columns).

However, you have to decide on a learning rate: if it is too big, the steps can overshoot the minimum, and if it is too small, the search takes a long time.
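As a rough illustration of the idea, here is a minimal batch gradient descent sketch for plain (unregularized) linear regression. The data is made up to follow `y = 4 + 3x` plus a little noise, and the learning rate and number of iterations are arbitrary choices that happen to work for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data following y = 4 + 3x plus a little noise
X_demo = rng.uniform(0, 2, size=(100, 1))
y_demo = 4 + 3 * X_demo[:, 0] + rng.normal(0, 0.1, size=100)

# Add a column of ones for the intercept
X_b = np.c_[np.ones(len(X_demo)), X_demo]

learning_rate = 0.1   # assumed value, chosen by hand for this example
n_iterations = 2000
theta = np.zeros(2)   # start from intercept = 0, slope = 0

for _ in range(n_iterations):
    # Gradient of the MSE with respect to theta
    gradient = 2 / len(y_demo) * X_b.T @ (X_b @ theta - y_demo)
    # Take a small step downhill
    theta -= learning_rate * gradient

print(theta)  # intercept and slope, which should end up near 4 and 3
```

Try changing `learning_rate` to something much larger or much smaller to see how the result is affected.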

### Ridge regression

In ridge regression, the regularization term is the sum of the squared coefficients.

So, our cost function is:

$\text{cost function} = \text{MSE} + \alpha \sum_i (\text{coef}_i)^2$

where 
* $\alpha$ is a constant set by **you** that determines how strong this regularization penalty is.

There's a great video on YouTube that introduces Ridge regression very simply and shows how its predictions compare to standard regression (~20 mins).

> *Note: the narrator recommends a bunch of other videos to watch first, but you can ignore most of these. The Bias-Variance video might be useful if you have never heard of those concepts, but you can also just think of "Bias" as underfitting, and "Variance" as overfitting.*

<a href="http://www.youtube.com/watch?feature=player_embedded&v=Q81RR3yKn30" target="_blank"><img src="http://img.youtube.com/vi/Q81RR3yKn30/0.jpg" alt="Ridge regression" width="480" height="360" border="1" /></a>

In Scikit-Learn, we can use the `Ridge` class to do Ridge regression just like we did standard linear regression (more details on Ridge regression with sklearn [in the official guide](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression)).

In [None]:
from sklearn.linear_model import Ridge

ridge_reg = Ridge(alpha=1)  # alpha defaults to 1.0; we set it explicitly because it is ours to choose

ridge_reg.fit(X_train, y_train)
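To get a feel for what the penalty does, the sketch below fits `Ridge` on the full diabetes data for a few arbitrarily chosen values of alpha and prints the sum of the squared coefficients each time. The list `penalties` is just an illustrative name.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Ridge

X, y = datasets.load_diabetes(return_X_y=True)

penalties = []
for alpha in [0.01, 0.1, 1, 10]:
    model = Ridge(alpha=alpha).fit(X, y)
    # Sum of squared coefficients: the quantity ridge regression penalizes
    penalty = np.sum(model.coef_ ** 2)
    penalties.append(penalty)
    print(alpha, penalty)
```

The larger alpha is, the more the coefficients are shrunk towards zero, so the printed sum should get smaller from one line to the next.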

#### Exercises

1. Repeat the steps we carried out for the linear regression above to look at the model's fit and accuracy.

2. What happens if you set alpha to 0?

### Lasso regression

Like ridge regression, but our regularization term is the sum of the absolute values of the coefficients (rather than their squares).

$\text{cost function} = \text{MSE} + \alpha \sum_i |\text{coef}_i|$

The same YouTube channel as before has another (even shorter) video on Lasso regression:

<a href="http://www.youtube.com/watch?feature=player_embedded&v=NGf0voTMlcs" target="_blank"><img src="http://img.youtube.com/vi/NGf0voTMlcs/0.jpg" alt="Lasso regression" width="480" height="360" border="1" /></a>

In Scikit-Learn, we can use the `Lasso` class to do Lasso regression just like we did standard or Ridge regression (more details on Lasso regression with sklearn [in the official guide](https://scikit-learn.org/stable/modules/linear_model.html#lasso)).

In [None]:
from sklearn.linear_model import Lasso

lasso_reg = Lasso(alpha=1)  # alpha defaults to 1.0; we set it explicitly because it is ours to choose

lasso_reg.fit(X_train, y_train)
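One practical difference from ridge regression is that the absolute-value penalty can push some coefficients all the way to exactly zero, which effectively removes those features from the model. The sketch below counts the zero coefficients for a few arbitrarily chosen values of alpha on the full diabetes data. The dictionary `zero_counts` is an illustrative name, and `max_iter` is raised from its default only to give the solver more room to converge.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Lasso

X, y = datasets.load_diabetes(return_X_y=True)

zero_counts = {}
for alpha in [0.01, 0.1, 1]:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    # Count how many of the 10 coefficients are exactly zero
    zero_counts[alpha] = int(np.sum(model.coef_ == 0))
    print(alpha, zero_counts[alpha])
```

Comparing the counts for the smallest and largest alpha shows how a stronger penalty leads to a sparser model.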

#### Exercise

1. Repeat the steps we carried out for the linear regression above to look at the model's fit and accuracy.

### More exercises

1. We can combine Ridge and Lasso regression into the same regression model. This is called **ElasticNet** and we can use a sklearn class of the same name: `ElasticNet` (more details in the [documentation](https://scikit-learn.org/stable/modules/linear_model.html#elastic-net)). Try using Elastic Net to create a different regression model.

  You can learn more about Elastic Net in this YouTube video:

  <a href="http://www.youtube.com/watch?feature=player_embedded&v=1dKRdX9bfIo
" target="_blank"><img src="http://img.youtube.com/vi/1dKRdX9bfIo/0.jpg" 
alt="Elastic Net" width="480" height="360" border="1" /></a>

2. We have to set alpha (the model can't figure it out for us). How do we decide the best value? **Cross-validation!**

  i. One way of getting the optimal value of alpha with cross validation is to manually search through values of alpha using the `GridSearchCV` class ([documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)).

  ii. Alternatively there are versions of each regularized linear regression which will do cross validation much more simply:
    * [LassoCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html)
    * [RidgeCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html)
    * [ElasticNetCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html)

    For example, you can use code like this:

    ```python
    from sklearn.linear_model import ElasticNetCV

    # Values of alpha to try:
    alphas = [0.1, 0.5, 1, 2, 4, 8]

    elastic_cv = ElasticNetCV(alphas=alphas, cv=5)
    elastic_cv.fit(X_train, y_train)
    
    # Then we use elastic_cv like any of the previous models:
    print(elastic_cv.alpha_)
    print(elastic_cv.intercept_)
    ```
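For option (i), a search with `GridSearchCV` might look roughly like the sketch below. The grid of alpha values is an arbitrary choice, the full diabetes data is used to keep the example self-contained, and `Ridge` could be swapped for `Lasso` or `ElasticNet`.

```python
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_diabetes(return_X_y=True)

# Values of alpha to try (arbitrary choices for illustration)
param_grid = {"alpha": [0.01, 0.1, 1, 10]}

# 5-fold cross-validation, scoring each alpha by (negative) mean squared error
grid_search = GridSearchCV(Ridge(), param_grid, cv=5,
                           scoring="neg_mean_squared_error")
grid_search.fit(X, y)

print(grid_search.best_params_)   # the alpha with the best cross-validated score
print(-grid_search.best_score_)   # its cross-validated mean squared error
```

After fitting, `grid_search` can be used like any of the previous models, e.g. `grid_search.predict(...)` uses the model refit with the best alpha.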