<h1 style="text-align : center"> <font color="red" size=8>REGULARIZATION </h1>

## <font color="dark blue">WHAT IS REGULARIZATION?
- Regularization introduces a penalty for more complex models, which reduces their effective complexity and encourages the model to learn more general patterns.

- It is used to prevent overfitting: a penalty term added to the loss function discourages the model from assigning too much importance to individual features or coefficients.

- The penalty is added explicitly to the optimization problem, so the model minimizes the data-fit error and the penalty together.
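
As a small illustration of "loss plus penalty", the sketch below computes a mean squared error and adds either an L1 or an L2 penalty to it. The function name `penalized_mse` and the toy numbers are made up for this example; they are not part of any library.

```python
import numpy as np

def penalized_mse(y_true, y_pred, weights, lam, kind="l2"):
    """Mean squared error plus an L1 or L2 penalty on the weights (illustrative helper)."""
    mse = np.mean((y_true - y_pred) ** 2)
    if kind == "l1":
        penalty = np.sum(np.abs(weights))   # sum of absolute values
    else:
        penalty = np.sum(weights ** 2)      # sum of squares
    return mse + lam * penalty

# Toy values chosen only for illustration
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 8.0])
weights = np.array([2.0, -1.0])

plain = penalized_mse(y_true, y_pred, weights, lam=0.0)
l1_cost = penalized_mse(y_true, y_pred, weights, lam=0.1, kind="l1")
l2_cost = penalized_mse(y_true, y_pred, weights, lam=0.1, kind="l2")
print(plain, l1_cost, l2_cost)
```

With `lam=0` the cost is the plain error; any positive `lam` makes large weights more expensive.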


## <font color="dark blue">WHY DO WE USE REGULARIZATION?
- Controlling Model Complexity

- Preventing Overfitting

- Balancing Bias & Variance

- Feature Selection

- Handling Multicollinearity
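
The multicollinearity point can be sketched with synthetic data. Here two features are almost identical, which makes the unregularized coefficients poorly determined; the data, the seed, and `alpha=1.0` are assumptions chosen only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-3, size=n)   # almost an exact copy of x1
X_demo = np.column_stack([x1, x2])
y_demo = 3.0 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X_demo, y_demo)
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)

# OLS may split the effect between x1 and x2 in an unstable way;
# the L2 penalty keeps the two coefficients small and similar.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

The Ridge coefficient vector is never longer than the least-squares one, and its two entries add up to roughly the true effect of 3.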

## <font color="dark blue">TYPES OF REGULARIZATION

## <font color="blue">1. LASSO (L1) REGRESSION
- LASSO stands for __Least Absolute Shrinkage & Selection Operator__.

- LASSO is also known as __L1 Regularization__.

- LASSO Regression adds the __`absolute value`__ of each coefficient as a penalty term to the loss function (L).

- LASSO regression also performs feature selection: the L1 penalty can shrink the weights of features that contribute little to the model to exactly zero, which removes those features from the model.

$$ \large Cost = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \ + \lambda \sum_{i=1}^{m} |w_i| $$
    
$$ where $$
    
$$ m \rightarrow Number \ of \ Features $$
$$ n \rightarrow Number \ of \ Examples $$ 
$$ y_i \rightarrow Actual \ Target \ Value $$
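$$ \hat{y}_i \rightarrow Predicted \ Target \ Value $$
$$ w_i \rightarrow Coefficient \ (Weight) \ of \ Feature \ i $$
$$ \lambda \rightarrow Regularization \ Strength $$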
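
One way to see why the absolute-value penalty produces exact zeros is the one-dimensional case. Minimizing $\frac{1}{2}(w - z)^2 + \lambda |w|$ gives the soft-thresholding rule, while minimizing $\frac{1}{2}(w - z)^2 + \frac{\lambda}{2} w^2$ only rescales $z$. The sketch below is a simplified illustration under that one-dimensional assumption; the helper names are made up for this example.

```python
import numpy as np

def soft_threshold(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*|w| (L1 penalty, one dimension)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2 (L2 penalty, one dimension)."""
    return z / (1.0 + lam)

z = np.array([-3.0, -0.4, 0.2, 1.5])   # unpenalized solutions (toy values)
lasso_w = soft_threshold(z, lam=0.5)
ridge_w = ridge_shrink(z, lam=0.5)
print("L1:", lasso_w)   # small entries are set exactly to zero
print("L2:", ridge_w)   # every entry is shrunk, none becomes zero
```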

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
from sklearn.datasets import load_diabetes

data=load_diabetes()

In [4]:
X=data.data
y=data.target

In [5]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=45)

In [6]:
from sklearn.linear_model import LinearRegression,Ridge,Lasso,ElasticNet
L=LinearRegression()

In [7]:
L.fit(X_train,y_train)

LinearRegression()

In [8]:
y_pred=L.predict(X_test)

In [9]:
from sklearn.metrics import r2_score,mean_squared_error

print("R2 score",r2_score(y_test,y_pred))
print("RMSE",np.sqrt(mean_squared_error(y_test,y_pred)))

R2 score 0.5188118914964637
RMSE 48.72710829141399


### <font color="orange"> CODE FOR LASSO REGRESSION

In [10]:
Las = Lasso(alpha=0.01)
Las.fit(X_train,y_train)
y_pred = Las.predict(X_test)
r2_score(y_test,y_pred)

0.5239820389650526
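
To see the feature-selection effect on the same data, the sketch below counts how many coefficients Lasso sets exactly to zero as `alpha` grows. The list of `alpha` values is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=45
)

zero_counts = {}
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha).fit(X_train, y_train)
    # Count coefficients that the L1 penalty has driven exactly to zero
    zero_counts[alpha] = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha:<5} zero coefficients: {zero_counts[alpha]} of {model.coef_.size}")
```

A very large `alpha` removes every feature, which leaves a model that predicts only the intercept.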

## <font color="purple"> PARAMETERS OF LASSO REGRESSION CLASS
__1. alpha__:
- This is the most important parameter. It controls the strength of the regularization.
- A higher alpha value means stronger regularization, which can lead to more features being excluded from the model.


__2. fit_intercept__:
- Whether to calculate the intercept term in the regression equation.
- Typically, you'll want to set this to `True`.


__3. precompute__:
- Whether to precompute the Gram matrix to speed up calculations. This can be useful for large datasets.


__4. copy_X__:
- Whether to copy the input data before fitting the model. This can be useful to avoid side effects.


__5. max_iter__:
- The maximum number of iterations for the optimization algorithm.


__6. tol__:
- Stopping tolerance for the optimization algorithm. A smaller value asks for a more precise solution at the cost of more iterations.


__7. warm_start__:
- Whether to use the solution of the previous call to the fit method as the initial guess for the current call.


__8. positive__:
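- Whether to force the coefficients to be non-negative.


__9. random_state__:
- The seed of the random number generator that picks which coefficient to update. It is only used when `selection='random'`.


__10. selection__:
- The order in which the coordinate descent solver updates the coefficients: `'cyclic'` (the default) loops over the features in sequence, while `'random'` updates a randomly chosen coefficient at each iteration.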
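
The parameters above can be combined as in the following sketch. The specific values are arbitrary choices for illustration, not recommended settings.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

lasso_cfg = Lasso(
    alpha=0.1,            # regularization strength
    fit_intercept=True,   # estimate the intercept term
    max_iter=5000,        # upper bound on coordinate descent iterations
    tol=1e-5,             # stricter stopping tolerance than the default
    positive=True,        # constrain coefficients to be >= 0
    selection="random",   # update a randomly chosen coefficient each step
    random_state=0,       # only matters because selection="random"
)
lasso_cfg.fit(X, y)

print("iterations used:", lasso_cfg.n_iter_)
print("coefficients:", lasso_cfg.coef_)
```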

## <font color="blue">2. RIDGE (L2) REGRESSION
- Ridge is also known as __L2 Regularization__.

- Ridge Regression adds the __`squared magnitude`__ of each coefficient as a penalty term to the loss function (L).

- The penalty sums the squared slope/coefficient of every dimension, which is why it is called __L2 Regularization__. Unlike LASSO, it shrinks coefficients toward zero but does not usually set them exactly to zero.

$$ \large Cost = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \ + \lambda \sum_{i=1}^{m} w_i^2 $$
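
For a model without an intercept, setting the gradient of this cost to zero gives the linear system $(X^T X + n\lambda I)\,w = X^T y$. The sketch below solves it with NumPy and compares the result with scikit-learn. It assumes synthetic data and relies on the fact that scikit-learn's `Ridge` objective omits the $\frac{1}{n}$ factor, so its `alpha` corresponds to $n\lambda$ here.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))                      # synthetic features
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

n, m = X_demo.shape
lam = 0.1

# Closed-form solution of (X^T X + n*lam*I) w = X^T y
w_closed = np.linalg.solve(
    X_demo.T @ X_demo + n * lam * np.eye(m), X_demo.T @ y_demo
)

# Same problem in scikit-learn: alpha = n * lam, no intercept
w_sklearn = Ridge(alpha=n * lam, fit_intercept=False).fit(X_demo, y_demo).coef_

print(w_closed)
print(w_sklearn)
```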

### <font color="orange">CODE FOR RIDGE REGRESSION

In [11]:
R=Ridge(alpha=0.0001)

In [12]:
R.fit(X_train,y_train)

Ridge(alpha=0.0001)

In [13]:
y_pred1=R.predict(X_test)

In [14]:
print("R2 score",r2_score(y_test,y_pred1))
print("RMSE",np.sqrt(mean_squared_error(y_test,y_pred1)))

R2 score 0.5189738344370789
RMSE 48.718908093712855
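
With `alpha=0.0001` the penalty is so small that the result is almost the same as plain linear regression. The sketch below repeats the fit for a few larger values (chosen arbitrarily) to show how the size of the coefficient vector shrinks as `alpha` grows.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=45
)

alphas = [0.0001, 0.01, 1.0, 100.0]
coef_norms = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    coef_norms.append(np.linalg.norm(ridge.coef_))   # length of the weight vector
    r2 = ridge.score(X_test, y_test)                 # R2 on the held-out split
    print(f"alpha={alpha:<7} ||w||={coef_norms[-1]:.2f}  test R2={r2:.3f}")
```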


## <font color="purple"> PARAMETERS OF RIDGE REGRESSION CLASS
__1. alpha__:
- This is the most important parameter. It controls the strength of the regularization.
- A higher alpha value means stronger regularization, which can lead to smaller coefficient values.


__2. fit_intercept__:
- Whether to calculate the intercept term in the regression equation.
- Typically, you'll want to set this to `True`.


__3. copy_X__:
- Whether to copy the input data before fitting the model. This can be useful to avoid side effects.


__4. max_iter__:
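- The maximum number of iterations for the iterative solvers. The closed-form solvers (`svd`, `cholesky`) do not use it.


__5. tol__:
- Stopping tolerance for the iterative solvers. It has no effect for the `svd` and `cholesky` solvers.


__6. solver__:
- The algorithm used to compute the solution: `auto`, `svd`, `cholesky`, `lsqr`, `sparse_cg`, `sag`, `saga`, or `lbfgs`.
- `auto` chooses a solver automatically based on the type of data.


__7. positive__:
- Whether to force the coefficients to be non-negative. Only the `lbfgs` solver supports this option.


__8. random_state__:
- The seed used to shuffle the data. It is only used by the `sag` and `saga` solvers.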

## <font color="blue">3. ELASTIC NET (L1 & L2) REGRESSION
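- It is a combination of both L1 & L2 Regularization.

- An extra hyperparameter controls the ratio between the L1 and L2 penalties.

- Because it keeps the L1 penalty, it can still set some coefficients exactly to zero, while the L2 penalty shrinks the remaining coefficients and makes the solution more stable when features are correlated.

$$ \large Cost = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \ + \lambda ((1-\alpha) \sum_{i=1}^{m} |w_i| + \alpha \sum_{i=1}^{m}w_i^2) $$

- Here $\lambda$ sets the overall strength of the penalty and $\alpha \in [0, 1]$ is the mixing ratio: $\alpha = 0$ gives the LASSO penalty and $\alpha = 1$ gives the Ridge penalty. scikit-learn names these differently: up to constant scaling factors, its `alpha` argument plays the role of $\lambda$ and its `l1_ratio` argument plays the role of $1 - \alpha$.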

### <font color="orange">CODE FOR ELASTIC NET REGRESSION

In [15]:
Ela = ElasticNet(alpha=0.005,l1_ratio=0.9)
Ela.fit(X_train,y_train)
y_pred = Ela.predict(X_test)
r2_score(y_test,y_pred)

0.5171423473020662

## <font color="purple"> PARAMETERS OF ELASTIC NET REGRESSION CLASS
__1. alpha__:
- Controls the overall strength of regularization. A higher alpha means stronger regularization.


__2. l1_ratio__:
- Controls the balance between L1 and L2 regularization.
    - If l1_ratio=0, it's equivalent to Ridge regression.
    - If l1_ratio=1, it's equivalent to Lasso regression.
    - Values between 0 and 1 give a combination of both.

__3. fit_intercept__:
- Whether to calculate the intercept term in the regression equation.
- Typically, you'll want to set this to `True`.


__4. precompute__:
- Whether to precompute the Gram matrix to speed up calculations. This can be useful for large datasets.


__5. max_iter__:
- The maximum number of iterations for the optimization algorithm.


__6. tol__:
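- Stopping tolerance for the optimization algorithm. A smaller value asks for a more precise solution at the cost of more iterations.


__7. warm_start__:
- Whether to use the solution of the previous call to the fit method as the initial guess for the current call.


__8. positive__:
- Whether to force the coefficients to be non-negative.


__9. random_state__:
- The seed of the random number generator that picks which coefficient to update. It is only used when `selection='random'`.


__10. selection__:
- The order in which the coordinate descent solver updates the coefficients: `'cyclic'` (the default) loops over the features in sequence, while `'random'` updates a randomly chosen coefficient at each iteration.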
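
The `l1_ratio` behaviour described above can be checked directly. In the sketch below, `alpha=0.1` is an arbitrary value for illustration; with `l1_ratio=1.0` the fitted coefficients should match those of `Lasso`, and a mixed penalty should give a different solution.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet, Lasso

X, y = load_diabetes(return_X_y=True)

lasso = Lasso(alpha=0.1).fit(X, y)
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)   # pure L1 penalty
enet_mix = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # half L1, half L2

print("Lasso:               ", np.round(lasso.coef_, 2))
print("ElasticNet (l1=1.0): ", np.round(enet_l1.coef_, 2))
print("ElasticNet (l1=0.5): ", np.round(enet_mix.coef_, 2))
```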

## <font color="green">Benefits of Regularization

- Regularization improves model generalization by reducing overfitting. Regularized models learn underlying patterns, while overfit models memorize noise in training data.


- Regularization techniques such as L1 (Lasso) simplify models and improve interpretability by reducing the coefficients of less important features to zero.


- Regularization improves model performance on unseen data by preventing any single feature, including irrelevant or noisy ones, from receiving an excessively large weight.


- Regularization makes models stable across different subsets of the data. It reduces the sensitivity of model outputs to minor changes in the training set.


- Regularization prevents models from becoming overly complex, which is especially important when dealing with limited data or noisy environments.


- Regularization can help handle multicollinearity (high correlation between features) by reducing the magnitudes of correlated coefficients.


- Regularization introduces hyperparameters (e.g., alpha or lambda) that control the strength of regularization. This allows fine-tuning models to achieve the right balance between bias and variance.


- Regularization promotes consistent model performance across different datasets. It reduces the risk of dramatic performance changes when encountering new data.
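
The regularization strength mentioned above is usually chosen by cross-validation rather than by hand. The sketch below shows one possible way to do this with `GridSearchCV`; the candidate `alpha` grid, the 5-fold setting, and the choice of `Ridge` are assumptions made for this example.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)

# Candidate regularization strengths (arbitrary grid for illustration)
param_grid = {"alpha": [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validation, scoring each candidate by mean R2
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="r2")
search.fit(X, y)

print("best alpha:", search.best_params_["alpha"])
print("best mean CV R2:", round(search.best_score_, 3))
```

The same pattern works for `Lasso` and `ElasticNet`; for `ElasticNet`, `l1_ratio` can be added to the grid as a second hyperparameter.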