# Regularization

Note that solutions to these problems require the material covered in the `Basic Pipelines` notebook.

In [None]:
# import the packages we'll use
## For data handling
import pandas as pd
import numpy as np
from numpy import meshgrid

## For plotting
import matplotlib.pyplot as plt
import seaborn as sns

## This sets the plot style
## to have a grid on a white background
sns.set_style("whitegrid")

## Theoretical Questions

##### 1. Deriving the Ridge Regression Estimator

Recall that finding the ridge regression coefficients involves minimizing the following:
$$
||y-X\beta||_2^2 + \alpha ||\beta||_2^2.
$$
This can be rewritten as:
$$
(y-X\beta)^T(y-X\beta) + \alpha \beta^T \beta.
$$

Derive the estimate $\hat{\beta}$ that minimizes this expression.
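
One possible route through the derivation is sketched here, assuming $\alpha > 0$ and writing $I$ for the identity matrix (a symbol not used above). Expanding the objective gives
$$
f(\beta) = y^Ty - 2\beta^TX^Ty + \beta^TX^TX\beta + \alpha \beta^T\beta.
$$
Setting the gradient with respect to $\beta$ equal to zero,
$$
\nabla_\beta f(\beta) = -2X^Ty + 2X^TX\beta + 2\alpha\beta = 0
\quad \Longrightarrow \quad
(X^TX + \alpha I)\beta = X^Ty,
$$
so that
$$
\hat{\beta} = (X^TX + \alpha I)^{-1}X^Ty.
$$
Because $X^TX + \alpha I$ is positive definite when $\alpha > 0$, it is invertible and this critical point is the unique minimizer.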

## Applied Questions

##### 1. Building your own Ridge Regression estimator.

Using your answer to Question 1 from the Theoretical section, write code using `numpy` to find the ridge regression coefficients for the following data. Remember to include the normalizing step using `StandardScaler`. Fit the data with a high-degree polynomial.
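
As a rough sketch of one way to approach this (not the only one), the code below solves the ridge normal equations $(X^TX + \alpha I)\beta = X^Ty$ with `numpy`. It generates its own data with the same recipe as the data cell for this problem, so it runs on its own. The helper name `ridge_coefficients`, the `_demo` variable names, the degree of 10, and `alpha=1.0` are all illustrative assumptions. Centering `y` is a design choice that avoids penalizing an intercept.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def ridge_coefficients(X, y, alpha):
    """Solve (X^T X + alpha * I) beta = X^T y (hypothetical helper)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Stand-in data built like the training data in this problem (assumed seed)
rng = np.random.default_rng(440)
x_demo = 3 * (np.pi / 2) * rng.random(500) - 2 * np.pi
y_demo = np.sin(x_demo) + 0.3 * rng.standard_normal(500)

# High-degree polynomial features, then standardize each column
degree = 10  # arbitrary choice for illustration
poly = PolynomialFeatures(degree=degree, include_bias=False)
scaler = StandardScaler()
X_demo = scaler.fit_transform(poly.fit_transform(x_demo.reshape(-1, 1)))

# Center y so the intercept is handled outside the penalty
y_mean = y_demo.mean()
alpha = 1.0  # arbitrary choice for illustration
beta_hat = ridge_coefficients(X_demo, y_demo - y_mean, alpha)

# Predictions on the training inputs
y_pred = X_demo @ beta_hat + y_mean
train_mse = np.mean((y_demo - y_pred) ** 2)
print("training MSE:", train_mse)
```

For test data, the same fitted `poly` and `scaler` objects would be applied with `transform` rather than `fit_transform`.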

In [None]:
## The Data
x_train = 3*(np.pi/2)*np.random.random(500) - 2*np.pi
y_train = np.sin(x_train) + .3*np.random.randn(500)

x_test = 3*(np.pi/2)*np.random.random(500) - 2*np.pi
y_test = np.sin(x_test) + .3*np.random.randn(500)

In [None]:
## Code here




In [None]:
## Code here



In [None]:
## Code here



In [None]:
## Code here



In [None]:
## Code here




##### 2. The Elastic Net Algorithm

Elastic net is a regularized regression algorithm that strikes a middle ground between ridge regression and lasso. Here we set out to minimize:
$$
MSE + r\alpha ||\beta||_1 + \frac{1-r}{2}\alpha ||\beta||_2^2, \text{ for } r \in [0,1].
$$

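$r$ is another hyperparameter. When $r=1$ only the $||\beta||_1$ term remains and we recover lasso. When $r=0$ only the $||\beta||_2^2$ term remains and we recover ridge regression (with penalty strength $\alpha/2$).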
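
To see how $r$ blends the two penalties, here is a small sketch that evaluates the penalty term from the expression above at the endpoints of $r$. The function name `elastic_net_penalty` and the example coefficient vector are made up for illustration.

```python
import numpy as np


def elastic_net_penalty(beta, alpha, r):
    """Penalty r*alpha*||beta||_1 + (1 - r)/2 * alpha * ||beta||_2^2."""
    l1 = np.sum(np.abs(beta))
    l2_squared = np.sum(beta ** 2)
    return r * alpha * l1 + 0.5 * (1 - r) * alpha * l2_squared


beta = np.array([1.0, -2.0, 0.5])  # made-up coefficients

# r = 1: only the l1 (lasso) term contributes
lasso_like = elastic_net_penalty(beta, alpha=2.0, r=1.0)

# r = 0: only the squared l2 (ridge) term contributes
ridge_like = elastic_net_penalty(beta, alpha=2.0, r=0.0)

print(lasso_like, ridge_like)
```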

Find the best elastic net model that includes all of the features from this `auto` data set to predict `mpg`. Learn about that data set here <a href="https://vincentarelbundock.github.io/Rdatasets/doc/ISLR/Auto.html">https://vincentarelbundock.github.io/Rdatasets/doc/ISLR/Auto.html</a>. Use cross-validation to find the best values for $r$ and $\alpha$ (in `ElasticNet`, $r$ is the `l1_ratio` argument). You can read the `ElasticNet` documentation here, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html">https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html</a>.
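
One way such a search could look is sketched below using `GridSearchCV`. It uses synthetic stand-in data rather than the `auto` data, and the grids for `alpha` and `l1_ratio` are arbitrary assumptions, so treat it as a template rather than a solution. The value `l1_ratio=0` is left out of the grid because coordinate descent tends to converge poorly there.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 8 standardized features, 3 informative
rng = np.random.default_rng(440)
X_demo = rng.standard_normal((200, 8))
true_beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 1.5, 0.0, 0.0])
y_demo = X_demo @ true_beta + 0.5 * rng.standard_normal(200)

# Candidate hyperparameter values (arbitrary grids for illustration)
param_grid = {
    "alpha": [0.001, 0.01, 0.1, 1.0, 10.0],
    "l1_ratio": [0.1, 0.25, 0.5, 0.75, 1.0],
}

# 5-fold cross-validation, scored by (negative) mean squared error
search = GridSearchCV(
    ElasticNet(max_iter=10_000),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["alpha"]
best_r = search.best_params_["l1_ratio"]
best_cv_mse = -search.best_score_
print("best alpha:", best_alpha, "best r:", best_r, "cv mse:", best_cv_mse)
```

Note that `scikit-learn` scales the squared-error term by $\frac{1}{2n}$, so its `alpha` is not numerically identical to the $\alpha$ in the expression above.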

I'll prepare the data for you to save time.

In [None]:
from sklearn.linear_model import ElasticNet

In [None]:
auto = pd.read_csv("auto.csv")

auto.head()

In [None]:
auto_train = auto.copy().sample(frac=.75, random_state = 440)
auto_test = auto.copy().drop(auto_train.index)

In [None]:
from pandas.plotting import scatter_matrix

In [None]:
scatter_matrix(auto, figsize=(14,14), alpha=1)

plt.show()

`mpg` is the target variable and is continuous

`cylinders` is a categorical feature, so we need to make dummy variables

`displacement` is a continuous feature

`weight` is a continuous feature

`acceleration` is a continuous feature

`year` is not necessarily a continuous feature, but let's treat it that way for the purposes of this problem.

In [None]:
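## preprocess data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# get one hot encoded variables for cylinders
# (8 cylinders is left out as the baseline category)
auto_train[['three','four','five','six']] = pd.get_dummies(auto_train['cylinders'])[[3,4,5,6]]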

X = np.empty((len(auto_train),8))

X[:,:4] = np.array(auto_train[['three','four','five','six']])

X[:,4:] = scaler.fit_transform(np.array(auto_train[['displacement','weight','acceleration','year']]))

y = np.array(auto_train['mpg'])

In [None]:
## Code here

## Import Cross Validation




In [None]:
## Code here

## Run Cross Validation



In [None]:
## Code here

## find the optimal alpha and r



##### 3. Feature selection for Advertising

Return to the best model we settled on for the Advertising data set in notebook 4. Use ridge or lasso regression for feature selection in this model.
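
The general idea of lasso-based feature selection can be sketched as follows: fit lasso over a range of `alpha` values and watch which coefficients are driven to zero. The example uses synthetic stand-in data, and the feature names and `alpha` grid are hypothetical, so it illustrates the technique rather than the Advertising analysis itself.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): two informative and two noise features
rng = np.random.default_rng(440)
feature_names = ["signal_a", "signal_b", "noise_a", "noise_b"]  # hypothetical
X_demo = rng.standard_normal((300, 4))
y_demo = 4.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + 0.5 * rng.standard_normal(300)

# Scale features so the penalty treats each coefficient comparably
X_scaled = StandardScaler().fit_transform(X_demo)

# Fit lasso for increasing penalty strength and collect the coefficients
alphas = [0.001, 0.01, 0.1, 0.5, 1.0]  # arbitrary grid for illustration
coefs = np.array([
    Lasso(alpha=a, max_iter=10_000).fit(X_scaled, y_demo).coef_
    for a in alphas
])

# Features whose coefficients stay nonzero as alpha grows are the ones to keep
for a, row in zip(alphas, coefs):
    kept = [name for name, c in zip(feature_names, row) if c != 0.0]
    print(f"alpha={a}: kept {kept}")
```

Ridge regression shrinks coefficients without setting them exactly to zero, so with ridge one would instead compare coefficient magnitudes across `alpha` values.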

In [None]:
ads = pd.read_csv("Advertising.csv")

ads_train = ads.copy().sample(frac=.75, random_state = 440)
ads_test = ads.copy().drop(ads_train.index)

In [None]:
## Make the features we were interested in from class
ads_train['sqrt_TV'] = np.sqrt(ads_train['TV'])
ads_train['sqrtTV_radio'] = ads_train['sqrt_TV'] * ads_train['radio']

In [None]:
## Code here





In [None]:
## Code here





In [None]:
## Code here





In [None]:
## Code here





This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph.D., 2021.

Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and the sponsorship of the Erdős Institute, subject to the license (see License.md).