# Regularization

This notebook contains problems on ridge regression, lasso, and the elastic net.

In [None]:
# import the packages we'll use
## For data handling
import pandas as pd
import numpy as np

## For plotting
import matplotlib.pyplot as plt
import seaborn as sns

## This sets the plot style
## to have a grid on a dark background
sns.set_style("darkgrid")

## Theoretical Questions

##### 1. Deriving the Ridge Regression Estimator

Recall that finding the ridge regression coefficients involves minimizing the following:
$$
||y-X\beta||_2^2 + \alpha ||\beta||_2^2.
$$
Equivalently, this can be written as:
$$
(y-X\beta)^T(y-X\beta) + \alpha \beta^T \beta.
$$

Derive the estimate $\hat{\beta}$ that minimizes this expression.
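One way to check your answer, sketched here assuming $X$ is $n \times p$ and $\alpha > 0$: expand the objective, set its gradient with respect to $\beta$ to zero, and solve for $\beta$.

$$
\begin{aligned}
f(\beta) &= y^Ty - 2\beta^TX^Ty + \beta^TX^TX\beta + \alpha\beta^T\beta,\\
\nabla_\beta f(\beta) &= -2X^Ty + 2X^TX\beta + 2\alpha\beta = 0,\\
(X^TX + \alpha I)\hat{\beta} &= X^Ty,\\
\hat{\beta} &= (X^TX + \alpha I)^{-1}X^Ty.
\end{aligned}
$$

The Hessian $2(X^TX + \alpha I)$ is positive definite whenever $\alpha > 0$, so $X^TX + \alpha I$ is invertible and this critical point is the unique minimizer.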

##### Write here






## Applied Questions

##### 1. 

Using the formula you derived in Theoretical Question 1, write some `numpy` code to fit a ridge regression model on the following data.

Generate the data first by running the cell below.

In [None]:
## The Data
## x is uniform on [-2*pi, -pi/2); y is sin(x) plus Gaussian noise with sd 0.3
x_train = 3*(np.pi/2)*np.random.random(size=500) - 2*np.pi
y_train = np.sin(x_train) + .3*np.random.randn(500)

x_test = 3*(np.pi/2)*np.random.random(size=500) - 2*np.pi
y_test = np.sin(x_test) + .3*np.random.randn(500)

Use `PolynomialFeatures` to produce the powers of `x` up to the $40^{\text{th}}$ as your feature matrix `X_train`. Then use `StandardScaler` to scale `X_train` before fitting the ridge regression.
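A minimal sketch of the closed-form fit, assuming the formula $\hat{\beta} = (X^TX + \alpha I)^{-1}X^Ty$ from the theoretical question. The helper name `ridge_closed_form`, the seed `216`, and `alpha = 1.0` are hypothetical choices for illustration. Because `StandardScaler` centers every column, the sketch handles an unpenalized intercept by fitting on the centered target and setting the intercept to the mean of `y_train`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha I) beta = X^T y for beta (no intercept column)."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    # Solving the linear system avoids forming the inverse explicitly
    return np.linalg.solve(A, X.T @ y)


# Same recipe as the data cell, with a fixed (hypothetical) seed for reproducibility
rng = np.random.default_rng(216)
x_train = 3 * (np.pi / 2) * rng.random(500) - 2 * np.pi
y_train = np.sin(x_train) + 0.3 * rng.standard_normal(500)

# Columns x, x^2, ..., x^40 (include_bias=False drops the constant column)
poly = PolynomialFeatures(degree=40, include_bias=False)
scaler = StandardScaler()
X_train = scaler.fit_transform(poly.fit_transform(x_train.reshape(-1, 1)))

# Scaled columns have mean zero, so the unpenalized intercept is mean(y)
# and the coefficients come from the centered target
alpha = 1.0
intercept = y_train.mean()
beta_hat = ridge_closed_form(X_train, y_train - intercept, alpha)

train_pred = intercept + X_train @ beta_hat
train_mse = np.mean((y_train - train_pred) ** 2)

# Comparison fit: sklearn's Ridge minimizes the same penalized objective
ridge = Ridge(alpha=alpha).fit(X_train, y_train)
```

scikit-learn's `Ridge` minimizes $||y - Xw||_2^2 + \alpha ||w||_2^2$ with an unpenalized intercept, so its `coef_` should agree with `beta_hat` up to floating-point error. To score on the test set, transform `x_test` with the already-fitted `poly` and `scaler` (call `transform`, not `fit_transform`).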

In [None]:
## code here



In [None]:
## code here




In [None]:
## code here



In [None]:
## code here






##### 2. The Elastic Net Algorithm

Elastic Net is a regularized regression algorithm that strikes a middle ground between ridge regression and lasso. Here we minimize:
$$
MSE + r\alpha ||\beta||_1 + \frac{1-r}{2}\alpha ||\beta||_2^2, \text{ for } r \in [0,1].
$$

$r$ is an additional hyperparameter: when $r=1$ we recover lasso regression, and when $r=0$ we recover ridge regression.
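A small sketch connecting this formula to scikit-learn, using hypothetical synthetic data. `ElasticNet` calls the mixing parameter `l1_ratio` (our $r$) and, per its documentation, minimizes $\frac{1}{2n}||y - X\beta||_2^2 + r\alpha||\beta||_1 + \frac{1-r}{2}\alpha||\beta||_2^2$, so its data-fit term is half the MSE; that only rescales $\alpha$ relative to the formula above. The helper `sklearn_enet_objective` is hypothetical. The code checks that `l1_ratio=1.0` reproduces `Lasso` and evaluates the objective at a fitted $r = 0.5$ model.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso


def sklearn_enet_objective(X, y, coef, intercept, alpha, r):
    """Objective minimized by scikit-learn's ElasticNet, writing l1_ratio as r.

    The squared-error term is divided by 2n, i.e. it is half the MSE.
    """
    n = len(y)
    resid = y - intercept - X @ coef
    l1 = np.abs(coef).sum()
    l2 = coef @ coef
    return (resid @ resid) / (2 * n) + r * alpha * l1 + 0.5 * (1 - r) * alpha * l2


# Hypothetical synthetic data, used only to illustrate the API
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] - X[:, 2] + 0.5 * rng.normal(size=200)

# r = 1: ElasticNet with l1_ratio=1.0 fits the same model as Lasso
enet_r1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# 0 < r < 1: a blend of both penalties, solved to a tight tolerance
enet_half = ElasticNet(alpha=0.1, l1_ratio=0.5, tol=1e-10, max_iter=100000).fit(X, y)
obj_at_fit = sklearn_enet_objective(X, y, enet_half.coef_, enet_half.intercept_, 0.1, 0.5)
```

Nudging any coefficient of `enet_half` away from its fitted value should increase `sklearn_enet_objective`, which is a quick way to convince yourself the fitted model minimizes that expression.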

First, load the `advertising.csv` data set below.

In [None]:
ads = pd.read_csv("../../../Data/advertising.csv")

## engineered features: sqrt(TV) and sqrt(sqrt(TV)*radio)
ads['sqrt_TV'] = np.sqrt(ads['TV'])
ads['sqrt_TV_radio'] = np.sqrt(ads['sqrt_TV']*ads['radio'])

## 80/20 train/test split
ads_train = ads.sample(frac=.8, random_state=443).copy()
ads_test = ads.drop(ads_train.index).copy()

In [None]:
ads_train.head()

Find the best elastic net model that includes all of the features to predict `sales`.

Do this by setting up a square grid of $(r, \alpha)$ pairs. For $r$, use an evenly spaced grid from $0$ to $1$; for $\alpha$, choose consecutive powers of $10$. Use cross-validation to choose the values of $r$ and $\alpha$ with the lowest average cross-validation MSE.
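A minimal sketch of the grid search, run here on hypothetical stand-in data because `advertising.csv` is not bundled with this example. Assumptions of the sketch: the helper name `elastic_net_grid_cv`, 5-fold `KFold` with seed `440`, and scaling the features with `StandardScaler` inside each fold (the penalty depends on feature scale). It also starts the $r$ grid at $0.1$ instead of $0$: at $r = 0$ the penalty is pure ridge, which `Ridge` fits directly, and coordinate descent in `ElasticNet` may emit convergence warnings there.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def elastic_net_grid_cv(X, y, rs, alphas, n_splits=5, seed=440):
    """Return (best r, best alpha, average CV MSE grid) over an r-by-alpha grid."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # cv_mses[i, j, k] = holdout MSE for rs[i], alphas[j] on fold k
    cv_mses = np.zeros((len(rs), len(alphas), n_splits))
    for k, (train_idx, holdout_idx) in enumerate(kfold.split(X)):
        X_tt, X_ho = X[train_idx], X[holdout_idx]
        y_tt, y_ho = y[train_idx], y[holdout_idx]
        for i, r in enumerate(rs):
            for j, alpha in enumerate(alphas):
                # Scaling inside the pipeline uses only this fold's training rows
                model = Pipeline([
                    ("scale", StandardScaler()),
                    ("enet", ElasticNet(alpha=alpha, l1_ratio=r, max_iter=10000)),
                ])
                model.fit(X_tt, y_tt)
                cv_mses[i, j, k] = mean_squared_error(y_ho, model.predict(X_ho))
    avg_mses = cv_mses.mean(axis=2)
    i_best, j_best = np.unravel_index(np.argmin(avg_mses), avg_mses.shape)
    return rs[i_best], alphas[j_best], avg_mses


# Hypothetical stand-in data with three features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=200)

rs = np.linspace(0.1, 1, 10)        # 10 evenly spaced r values
alphas = 10.0 ** np.arange(-5, 5)   # 10 powers of ten: 1e-5, ..., 1e4
best_r, best_alpha, avg_mses = elastic_net_grid_cv(X, y, rs, alphas)
```

With the notebook's data you would pass something like `ads_train[features].values` and `ads_train['sales'].values`, where `features` is your list of column names.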

The documentation for `ElasticNet` can be found at <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html">https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html</a>.

In [None]:
## code here




In [None]:
## code here



In [None]:
## code here





In [None]:
## code here


In [None]:
## code here





For the values I tested, the elastic net model with $\alpha = 0.00001$ and $r = 0.1$ has the lowest average CV MSE.

##### 3.

Use lasso regression to choose features for a model predicting `sales` from `TV`, `radio`, `newspaper`, `sqrt_TV`, and `sqrt_TV_radio`.
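One common way to use the lasso for feature selection, sketched on hypothetical stand-in data where only `x0` and `x2` drive `y`: fit a scaled lasso over increasing values of $\alpha$ and watch which coefficients are driven exactly to zero. The helper `lasso_coef_table` and the data are illustrative assumptions; with the notebook you would pass the five advertising columns and `sales`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def lasso_coef_table(X, y, alphas, feature_names):
    """Fit a scaled lasso for each alpha; rows are alphas, columns are coefficients."""
    rows = []
    for alpha in alphas:
        model = Pipeline([
            ("scale", StandardScaler()),
            ("lasso", Lasso(alpha=alpha, max_iter=100000)),
        ])
        model.fit(X, y)
        rows.append(model.named_steps["lasso"].coef_)
    return pd.DataFrame(rows, index=alphas, columns=feature_names)


# Hypothetical stand-in data: only x0 and x2 actually affect y
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] + 1.5 * X[:, 2] + 0.5 * rng.normal(size=300)

alphas = 10.0 ** np.arange(-4, 2)   # 1e-4, 1e-3, ..., 10
coefs = lasso_coef_table(X, y, alphas, ["x0", "x1", "x2", "x3", "x4"])
print(coefs.round(3))
```

Features whose coefficients stay nonzero as $\alpha$ grows are the ones the lasso treats as most important; deciding where to cut (for example, with cross-validation) is still up to you.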

In [None]:
## code here





In [None]:
## code here




In [None]:
## code here



In [None]:
## code here






##### Write the model you would choose here




--------------------------

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph.D., 2022.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md)