# Week 10: Regularized Linear Models

## On Noise

$$Y \approx \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_NX_N$$

Linear Regression models the input-output relationship as a weighted sum of the predictors.  
However, the data is not perfect.  
There is necessarily error/noise present.  


**A Multiple Linear Regression Phenomenon**  
For a given training dataset, as more features are added to a model the training $R^2$ increases (or at least never decreases), even if the added feature is uninformative.  
At a certain point, adding new parameters fits the model to the noise inherent in the data.  
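
The phenomenon above can be illustrated with a minimal sketch. It assumes synthetic data (one informative predictor plus columns of pure random noise), so the names `x_informative`, `noise_features`, and `y_demo` are hypothetical and not part of the restaurant dataset used later.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_samples = 50

# One informative predictor; the target also contains Gaussian noise
x_informative = rng.normal(size=(n_samples, 1))
y_demo = 3.0 * x_informative[:, 0] + rng.normal(scale=1.0, size=n_samples)

# Columns of pure noise that carry no information about the target
noise_features = rng.normal(size=(n_samples, 20))

train_r2 = []
for k in range(0, 21, 5):
    # Informative predictor plus the first k noise columns
    X_demo = np.hstack([x_informative, noise_features[:, :k]])
    model = LinearRegression().fit(X_demo, y_demo)
    train_r2.append(model.score(X_demo, y_demo))  # R^2 on the training data
    print(f"{k:2d} noise features -> training R^2 = {train_r2[-1]:.3f}")
```

Because each model contains the previous one as a special case, the training $R^2$ cannot go down as columns are added, even though the extra columns are noise.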

## The Bias Variance Trade-off

<img src="https://miro.medium.com/max/1838/1*1BGl9kfU6nwO2QQ0-fWHcg.png" width="60%" style="margin-left:auto; margin-right:auto">



## Generalization Error

**Generalization Error** - a measure of how accurately a model can predict previously unseen data  

Comparing the generalization error of models with different complexity helps identify the optimal model complexity.
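
As a rough sketch of this comparison, the code below fits polynomial models of increasing degree to synthetic data and reports the error on the training half and on a held-out half. The data, the degrees, and all variable names (`x_poly`, `train_mse`, `test_mse`, ...) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)

# Synthetic 1-D data: a smooth curve plus noise
x_poly = rng.uniform(-1, 1, size=(60, 1))
y_poly = np.sin(3 * x_poly[:, 0]) + rng.normal(scale=0.3, size=60)

xp_train, xp_test, yp_train, yp_test = train_test_split(
    x_poly, y_poly, test_size=0.5, random_state=42
)

degrees = [1, 3, 6, 10]  # increasing model complexity
train_mse, test_mse = [], []
for d in degrees:
    model = make_pipeline(
        PolynomialFeatures(degree=d, include_bias=False), LinearRegression()
    )
    model.fit(xp_train, yp_train)
    train_mse.append(mean_squared_error(yp_train, model.predict(xp_train)))
    test_mse.append(mean_squared_error(yp_test, model.predict(xp_test)))
    print(f"degree {d:2d}: train MSE = {train_mse[-1]:.3f}, test MSE = {test_mse[-1]:.3f}")
```

The training error can only shrink as the degree grows; the held-out error is the quantity to compare when choosing a complexity.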

<img src="https://i.stack.imgur.com/0NbOY.png" width="80%" style="margin-left:auto; margin-right:auto">


<img src="https://miro.medium.com/max/875/0*XCe3mlLeGiUW3xfh" width="60%" style="margin-left:auto; margin-right:auto">

## Regularization: bringing to uniformity

**Regularized Linear Models**  

* Regularize a model to reduce overfitting: constrain it somehow
* For Linear Regression this means: constrain the weights (parameters) of the model. 
* This is usually implemented by adding a regularization term to the cost function

Today we will survey three regularization methods for linear models:  

1. Ridge Regression
2. Lasso Regression
3. Elastic Net Regression
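
All three methods are available as estimators in `sklearn.linear_model`. The sketch below only shows how they might be instantiated and fitted; the synthetic data and the hyperparameter values (`alpha`, `l1_ratio`) are placeholder assumptions, not tuned choices.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)

# Synthetic data: two of the four predictors have a true weight of zero
X_syn = rng.normal(size=(100, 4))
y_syn = X_syn @ np.array([2.0, 0.0, -1.0, 0.0]) + rng.normal(scale=0.5, size=100)

models = {
    "Ridge": Ridge(alpha=1.0),                          # L2 penalty on the weights
    "Lasso": Lasso(alpha=0.1),                          # L1 penalty on the weights
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),  # mix of L1 and L2
}

fitted_coefs = {}
for name, model in models.items():
    model.fit(X_syn, y_syn)
    fitted_coefs[name] = model.coef_
    print(f"{name:10s} coefficients: {np.round(model.coef_, 3)}")
```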

## Revisit the NYC Italian Restaurant Dataset

In [1]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

path = 'https://raw.githubusercontent.com/SmilodonCub/DS4VS/master/datasets/nyc.csv'
df = pd.read_csv( path, encoding= 'unicode_escape' )

X = df.drop(['Price', 'Case', 'Restaurant'], axis=1)
y = df['Price']
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.20, random_state=42)

## Use `sklearn` to build a 'kitchen sink' MLR

We will use this model both to see how MLR is done with `sklearn` and as a baseline for comparing performance with the regularized models.

In [4]:
# instantiate a Linear Regression Model
lin_mod = LinearRegression()
# fit the model to the training data
lin_mod.fit( X_train, y_train )
# print the model intercept & coefficients
print( lin_mod.intercept_, lin_mod.coef_ )

-25.883634392769906 [1.39643611 1.87937135 0.29545527 1.67469429]


Evaluate the Model Performance on unseen data...
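
One way to do this is to score the fitted model on the held-out test split. The helper below, `evaluate_on_holdout`, is a hypothetical name introduced for this sketch; it is demonstrated on synthetic stand-in data so that it runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_on_holdout(model, X_holdout, y_holdout):
    """Return (R^2, RMSE) of a fitted model on data it was not trained on."""
    y_pred = model.predict(X_holdout)
    r2 = r2_score(y_holdout, y_pred)
    rmse = np.sqrt(mean_squared_error(y_holdout, y_pred))
    return r2, rmse


# Synthetic stand-in data (an assumption; not the restaurant dataset)
rng = np.random.default_rng(42)
X_syn = rng.normal(size=(200, 4))
y_syn = X_syn @ np.array([1.4, 1.9, 0.3, 1.7]) + rng.normal(scale=1.0, size=200)

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.20, random_state=42
)
demo_mod = LinearRegression().fit(Xs_train, ys_train)

r2, rmse = evaluate_on_holdout(demo_mod, Xs_test, ys_test)
print(f"held-out R^2 = {r2:.3f}, held-out RMSE = {rmse:.3f}")
```

In this notebook the same helper could be called as `evaluate_on_holdout(lin_mod, X_test, y_test)`.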

## 1) Ridge Regression

**Ridge Regression**  

- add a term to the cost function that forces the model to keep its weights as small as possible.
- **Cost Function** $J(\theta) = \mbox{MSE}(\theta) + \alpha \frac{1}{2}\sum_{i=1}^n \theta_i^2$
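
The cost function above can be written out directly in NumPy. This is a sketch under one assumption: `theta[0]` is treated as the intercept and left out of the penalty, since the sum in the formula starts at $i = 1$. The function name `ridge_cost` is hypothetical.

```python
import numpy as np


def ridge_cost(theta, X, y, alpha):
    """J(theta) = MSE(theta) + alpha * 1/2 * sum_{i>=1} theta_i^2.

    theta[0] is the intercept (not penalized); theta[1:] are the weights.
    """
    residuals = y - (theta[0] + X @ theta[1:])
    mse = np.mean(residuals ** 2)
    penalty = alpha * 0.5 * np.sum(theta[1:] ** 2)
    return mse + penalty


# Tiny worked example: predictions are [1, 3], so MSE = (0^2 + 1^2) / 2 = 0.5
X_small = np.array([[1.0, 2.0], [3.0, 4.0]])
y_small = np.array([1.0, 2.0])
theta_small = np.array([0.0, 1.0, 0.0])

print(ridge_cost(theta_small, X_small, y_small, alpha=0.0))  # 0.5 (no penalty)
print(ridge_cost(theta_small, X_small, y_small, alpha=2.0))  # 1.5 (0.5 + 2 * 0.5 * 1)
```

With $\alpha = 0$ the cost reduces to the plain MSE; larger values of $\alpha$ make large weights more expensive.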



In [3]:
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=1, solver="cholesky", random_state=42)
ridge_reg.fit(X_train, y_train)
ridge_reg.intercept_

-25.86340306756169
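
To see what the penalty does to the weights, the sketch below refits `Ridge` for several values of `alpha` and records the size of the coefficient vector. The synthetic data and the `alpha` grid are assumptions for illustration; note also that `sklearn` may scale its internal objective differently from the formula above, so `alpha` values are not directly interchangeable.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)

# Synthetic stand-in data (not the restaurant dataset)
X_syn = rng.normal(size=(100, 4))
y_syn = X_syn @ np.array([1.4, 1.9, 0.3, 1.7]) + rng.normal(scale=1.0, size=100)

alphas = [0.01, 1, 100, 10000]
coef_norms = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha, solver="cholesky").fit(X_syn, y_syn)
    coef_norms.append(np.linalg.norm(ridge.coef_))  # Euclidean length of the weights
    print(f"alpha = {alpha:>8}: ||coef|| = {coef_norms[-1]:.4f}")
```

As `alpha` grows, the weights shrink toward zero while the intercept is left unpenalized.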

## Next week: Supervised Learning techniques for Categorical Target Variables
<img src="https://content.techgig.com/photo/80071467/pros-and-cons-of-python-programming-language-that-every-learner-must-know.jpg?132269" width="100%" style="margin-left:auto; margin-right:auto">