<div style="text-align: center"> <h1>Lasso & Ridge Regression (Regularized Regression)</h1></div>

# Understand Regularized Regression
> ### Regularized regression is a regression method with an additional constraint designed to deal with a large number of independent variables (a.k.a. predictors). It does so by penalizing the size of the coefficients, which shrinks the coefficients of unimportant predictors towards zero.

### The objective of regularization is to end up with a model:

> That is simple and interpretable.

> That generalizes well beyond the sample of our study.

> Whose coefficients won’t change much if we replicate the study.

## **How regularized regression works**
#### Regularized regression works exactly like ordinary (linear or logistic) regression but with an additional constraint whose objective is to shrink unimportant regression coefficients towards zero.

#### Because these coefficients can be either positive or negative, minimizing the sum of the raw coefficients will not work. Instead, we can use one of the following constraints:

#### Either minimize the sum of the absolute values of the regression coefficients, which we call L1 regularization (a.k.a. LASSO regression),

#### or minimize the sum of the squares of the coefficients, which we call L2 regularization (a.k.a. Ridge regression).

#### Because of this small difference, the two methods end up behaving very differently.
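
#### For linear regression, the two constraints are usually written as penalty terms added to the least-squares loss. The notation here is an assumption for illustration and does not come from the text above: $y_i$ is the outcome, $x_{ij}$ are the predictors, $\beta_j$ are the coefficients, and $\lambda \ge 0$ is the penalty strength (scikit-learn calls it `alpha` and applies its own scaling constants to the loss).

$$
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
$$

$$
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$$

#### With $\lambda = 0$ both reduce to ordinary least squares, and larger values of $\lambda$ shrink the coefficients more strongly.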

## Difference between L1 and L2 regularization

#### The biggest difference between L1 and L2 regularization is that L1 will shrink some coefficients to exactly zero (practically excluding them from the model), making it behave as a variable selection method.

#### In contrast, because L2 minimizes the sum of the squares of the coefficients, it shrinks large coefficients much more than small ones, so coefficients already close to zero are barely shrunk further. Therefore, with L2 regularization, we end up with a model that has many coefficients close to, but not exactly, zero.
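
#### The difference can be checked on a small synthetic example. This is a minimal sketch under stated assumptions: the data, the true coefficients, and the choice `alpha=0.5` are made up for illustration and have nothing to do with the Boston dataset used later.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption for illustration): 10 predictors,
# of which only the first 3 actually influence the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Fit both models with the same (arbitrarily chosen) penalty strength.
lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Count coefficients that are exactly zero in each model.
lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Exactly zero (Lasso):", lasso_zeros)
print("Exactly zero (Ridge):", ridge_zeros)
```

#### In this setup, Lasso should drop the irrelevant predictors by setting their coefficients to exactly zero, while Ridge keeps all ten with small nonzero values.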

## So is L1 better than L2 regularization?

#### Not necessarily.

#### LASSO (L1 regularization) is better when we want to select variables from a larger set, for instance in exploratory analysis or when we want a simple, interpretable model. It will also perform better (have a higher prediction accuracy) than ridge regression when a small number of independent variables are good predictors of the outcome and the rest are not that important.

#### Ridge regression (L2 regularization) performs better than LASSO when we have a large number of variables (or even all of them) each contributing a little bit in predicting the outcome.
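
#### One way to explore this trade-off is to cross-validate both models on two kinds of signal. This is a sketch under stated assumptions: the synthetic data, the coefficient values, and the `alphas` grid are all hypothetical, and `LassoCV` / `RidgeCV` are used here only to pick the penalty strength automatically. Which model wins depends on the random draw, so no particular outcome is claimed.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_samples, n_features = 100, 50
X = rng.normal(size=(n_samples, n_features))

# Scenario "sparse": only 3 of the 50 predictors matter.
sparse_coef = np.zeros(n_features)
sparse_coef[:3] = [4.0, -3.0, 2.0]

# Scenario "dense": every predictor contributes a little.
dense_coef = rng.normal(scale=0.5, size=n_features)

# Candidate penalty strengths shared by both models (arbitrary grid).
alphas = np.logspace(-2, 2, 9)

results = {}
for signal, coef in [("sparse", sparse_coef), ("dense", dense_coef)]:
    y = X @ coef + rng.normal(scale=1.0, size=n_samples)
    models = {
        "lasso": LassoCV(alphas=alphas, cv=5, max_iter=10000),
        "ridge": RidgeCV(alphas=alphas),
    }
    for name, model in models.items():
        # Outer 5-fold CV estimates the out-of-sample mean squared error.
        scores = cross_val_score(model, X, y, cv=5,
                                 scoring="neg_mean_squared_error")
        results[(signal, name)] = -scores.mean()
        print(f"{signal} signal, {name}: CV MSE = {results[(signal, name)]:.3f}")
```

#### Comparing the two MSE values within each scenario shows how the better choice can change with the structure of the signal.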

In [20]:
# import the necessary libraries
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import pandas as pd

# load the Boston housing dataset
boston = pd.read_csv(r"C:\Users\ds12\College\ML-Assignments\ML Pesentation\bh.csv")
boston.head()

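| Unnamed: 0 | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |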


In [21]:
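# separate the predictors from the target (median home value, 'medv')
# note: X still includes the 'Unnamed: 0' column that was read from the CSV
X = boston.drop('medv', axis=1)
y = boston['medv']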

In [23]:
# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [24]:
# Grid search over Lasso hyperparameters
from sklearn.model_selection import GridSearchCV

# Candidate values for the penalty strength (alpha) and the solver's iteration cap
param_grid = {
    'alpha': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    'max_iter': [100, 1000, 2500, 5000]
}

# Create a base model
lasso = Lasso()

# Instantiate the grid search with 3-fold cross-validation
grid_search = GridSearchCV(estimator=lasso, param_grid=param_grid,
                           cv=3, n_jobs=-1, verbose=2)

# Fit the grid search to the training data
grid_search.fit(X_train, y_train)

# Print the best parameters, the best mean cross-validated score
# (R^2, the default scorer for regressors), and the refitted best model
print(grid_search.best_params_)
print(grid_search.best_score_)
print(grid_search.best_estimator_)



Fitting 3 folds for each of 28 candidates, totalling 84 fits
{'alpha': 0.001, 'max_iter': 100}
0.7145726021554255
Lasso(alpha=0.001, max_iter=100)


In [25]:
# apply L1 regression
lasso = Lasso(alpha=0.001, max_iter=100)
lasso.fit(X_train, y_train)
lasso_predictions = lasso.predict(X_test)
lasso_mse = mean_squared_error(y_test, lasso_predictions)
print("L1 Regression MSE: ", lasso_mse)

L1 Regression MSE:  21.531346225526644


In [26]:
# Grid search over Ridge hyperparameters
from sklearn.model_selection import GridSearchCV

# Candidate values for the penalty strength (alpha) and the solver's iteration cap
param_grid = {
    'alpha': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    'max_iter': [100, 1000, 2500, 5000]
}

# Create a base model
ridge = Ridge()

# Instantiate the grid search with 3-fold cross-validation
grid_search = GridSearchCV(estimator=ridge, param_grid=param_grid,
                           cv=3, n_jobs=-1, verbose=2)

# Fit the grid search to the training data
grid_search.fit(X_train, y_train)

# Print the best parameters, the best mean cross-validated score
# (R^2, the default scorer for regressors), and the refitted best model
print(grid_search.best_params_)
print(grid_search.best_score_)
print(grid_search.best_estimator_)

Fitting 3 folds for each of 28 candidates, totalling 84 fits
{'alpha': 0.1, 'max_iter': 100}
0.71460936512861
Ridge(alpha=0.1, max_iter=100)


In [27]:
# apply L2 regression
ridge = Ridge(alpha=0.1, max_iter=100)
ridge.fit(X_train, y_train)
ridge_predictions = ridge.predict(X_test)
ridge_mse = mean_squared_error(y_test, ridge_predictions)
print("L2 Regression MSE: ", ridge_mse)

L2 Regression MSE:  21.58511591502429
