# Python Machine Learning: Regularization Solutions

---
### Challenge 1: Warm-Up

Before we get started, let's warm up by importing our data and performing a train/test split. We've provided the import code for you. Go ahead and split the data into train/test sets using an 80/20 split and a random state of 23.

---

In [None]:
import pandas as pd
import numpy as np

from sklearn.linear_model import Lasso, LinearRegression, Ridge, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

In [None]:
# Import data
data = pd.read_csv('../data/auto-mpg.csv')
# Remove the response variable and car name
X = data.drop(columns=['car name', 'mpg'])
# Assign response variable to its own variable
y = data['mpg'].astype(np.float64)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=23)
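
As a quick sanity check on what `test_size=0.2` and `random_state` do, here is a minimal sketch on made-up data. The `demo_*` names and the synthetic values are assumptions for illustration only; they are not the auto-mpg data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 3 features (not the auto-mpg data)
rng = np.random.default_rng(23)
demo_X = pd.DataFrame(rng.normal(size=(100, 3)), columns=['a', 'b', 'c'])
demo_y = pd.Series(rng.normal(size=100), name='target')

# 80/20 split with a fixed seed, mirroring the call above
demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=23)

# Row counts of the training and test portions
print(demo_X_train.shape, demo_X_test.shape)

# Re-running with the same seed selects the same test rows
repeat_train, repeat_test = train_test_split(
    demo_X, test_size=0.2, random_state=23)
print(repeat_test.index.equals(demo_X_test.index))
```

Fixing `random_state` matters here because the scores in the later challenges are only comparable if every model sees the same train/test rows.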

---
### Challenge 2: Benchmarking

Re-run ordinary least squares on the data using `LinearRegression`. Then, create a new ridge regression where the `alpha` penalty is set equal to zero. How do the performances of these models compare to each other? How do they compare with the original ridge regression? Be sure to compare both the training and test performances.

---

In [None]:
from sklearn.linear_model import Ridge
# Create the original ridge model
ridge = Ridge(
    # Regularization penalty
    alpha=10,
    random_state=1)
# Fit the model to the training data
ridge.fit(X_train, y_train)

In [None]:
# Linear regression
ols = LinearRegression()
ols.fit(X_train, y_train)
# Ridge, no penalty
ridge2 = Ridge(alpha=0, random_state=2)
ridge2.fit(X_train, y_train)

In [None]:
# Evaluate
print(f'Training R^2, Original Ridge: {ridge.score(X_train, y_train)}')
print(f'Test R^2, Original Ridge: {ridge.score(X_test, y_test)}')
print(f'Training R^2, OLS: {ols.score(X_train, y_train)}')
print(f'Test R^2, OLS: {ols.score(X_test, y_test)}')
print(f'Training R^2, Ridge with no penalty: {ridge2.score(X_train, y_train)}')
print(f'Test R^2, Ridge with no penalty: {ridge2.score(X_test, y_test)}')

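- Ridge with no penalty (`alpha=0`) is equivalent to OLS: the two models give the same training and test scores, up to numerical precision.
- Ridge regression with a penalty (`alpha=10`) has slightly worse training performance but slightly better test performance. Trading a little training fit for better generalization is what regularization is meant to do.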

---
### Challenge 3: Performing a Lasso Fit

Below, we've imported the `Lasso` object from `scikit-learn` for you. Just like `Ridge`, it needs the strength of the regularization penalty (`alpha`) before fitting to the data.

Fit several Lasso models, with different regularization strengths. Try one with a regularization strength of zero, try one with a small but non-zero regularization strength, and try one with a very large regularization strength. Look at the coefficients. What do you notice?

---

In [None]:
lasso1 = Lasso(alpha=0.01)
lasso1.fit(X_train, y_train)
lasso1.coef_

In [None]:
lasso2 = Lasso(alpha=10)
lasso2.fit(X_train, y_train)
lasso2.coef_

In [None]:
lasso3 = Lasso(alpha=10000)
lasso3.fit(X_train, y_train)
lasso3.coef_
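
The pattern to look for is that Lasso sets more coefficients to exactly zero as `alpha` grows, and a large enough penalty zeroes out all of them. The sketch below counts non-zero coefficients on synthetic data; the data, `true_coef`, and the particular `alpha` values are assumptions for illustration, not the auto-mpg results. It skips `alpha=0` because the scikit-learn documentation advises against using `Lasso` with a zero penalty and points to `LinearRegression` for that case.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical synthetic regression problem (not the auto-mpg data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.5, 0.5])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=200)

# Fit one Lasso per penalty strength and count surviving coefficients
nonzero_counts = {}
for alpha in [0.01, 1.0, 100.0]:
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))
    print(f'alpha={alpha}: coef={np.round(model.coef_, 3)}, '
          f'non-zero={nonzero_counts[alpha]}')
```

Unlike ridge, which only shrinks coefficients toward zero, the L1 penalty removes features from the model entirely, which is why Lasso is often used for feature selection.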