## THE MAIN DIFFERENCE BETWEEN RIDGE AND LASSO REGRESSION

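### We use ridge regression when we expect every column in our dataset to contribute to the prediction: the L2 penalty shrinks the coefficients but does not set them exactly to zero, so all features stay in the model.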

### Lasso regression is commonly used for feature selection when we have a high-dimensional dataset and suspect that only a subset of the features is important for predicting the target variable.

By increasing the alpha parameter in Lasso regression, we increase the strength of the L1 regularization penalty. This penalty shrinks the coefficients of less important features, and with a large enough alpha it sets some of them exactly to zero, which is how Lasso performs feature selection.

This property makes Lasso regression useful for identifying and eliminating irrelevant features, leading to a more parsimonious and interpretable model.
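To make the contrast concrete, here is a minimal sketch that fits Lasso and Ridge on the diabetes dataset at a few alpha values and counts how many coefficients end up exactly zero. The alpha values and the `max_iter` setting are illustrative assumptions, not tuned choices.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge

X, y = load_diabetes(return_X_y=True)

# Illustrative alpha values (an assumption, not a recommendation).
alphas = [0.01, 0.1, 1.0]

lasso_zero_counts = []
ridge_zero_counts = []
for alpha in alphas:
    # max_iter is raised only to give the solver room to converge.
    lasso = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    ridge = Ridge(alpha=alpha).fit(X, y)

    # Count coefficients that are exactly zero in each model.
    lasso_zero_counts.append(int(np.sum(lasso.coef_ == 0)))
    ridge_zero_counts.append(int(np.sum(ridge.coef_ == 0)))

    print(f"alpha={alpha}: Lasso zeros={lasso_zero_counts[-1]}, "
          f"Ridge zeros={ridge_zero_counts[-1]}")
```

Ridge should keep every coefficient non-zero at all three settings, while Lasso drops features once alpha is large enough.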


### But what if we have a huge dataset and we are not sure whether to use Ridge or Lasso?

### We can use Elastic Net, which combines both Ridge and Lasso regression.
 
### The loss function used in Elastic Net is a combination of the L1 (Lasso) and L2 (Ridge) regularization penalties. Elastic Net adds both penalties to the ordinary least squares (OLS) loss function. The resulting loss function is as follows:



$$\text{loss} = \frac{1}{2\, n_{\text{samples}}} \lVert y - Xw \rVert_2^2 + \alpha \left( \text{l1\_ratio} \cdot \lVert w \rVert_1 + \frac{1}{2} (1 - \text{l1\_ratio}) \lVert w \rVert_2^2 \right)$$

### In this equation:

* loss is the total loss function.

* n_samples is the number of samples in the dataset.

* y represents the target variable.

* X represents the feature matrix.

* w is the coefficient vector to be estimated.

* alpha is the overall regularization strength parameter.

* l1_ratio controls the mix between L1 and L2 regularization. It is a value between 0 and 1, where 0 corresponds to Ridge (L2) regularization and 1 corresponds to Lasso (L1) regularization.
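As a rough check of the formula, the sketch below evaluates it directly with NumPy. The function name `elastic_net_loss` and the tiny demo arrays are made up for illustration, and the intercept is left out, as in the equation above.

```python
import numpy as np

def elastic_net_loss(X, y, w, alpha, l1_ratio):
    """Evaluate the Elastic Net objective for a given coefficient vector w."""
    n_samples = X.shape[0]
    residual = y - X @ w
    # Least-squares term: (1 / (2 * n_samples)) * ||y - Xw||^2_2
    data_term = (residual @ residual) / (2 * n_samples)
    # L1 (Lasso) part of the penalty: l1_ratio * ||w||_1
    l1_term = l1_ratio * np.sum(np.abs(w))
    # L2 (Ridge) part of the penalty: 0.5 * (1 - l1_ratio) * ||w||^2_2
    l2_term = 0.5 * (1 - l1_ratio) * (w @ w)
    return data_term + alpha * (l1_term + l2_term)

# Made-up data where w_demo fits exactly, so only the penalty contributes.
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_demo = np.array([1.0, 2.0, 3.0])
w_demo = np.array([1.0, 2.0])

for ratio in (0.0, 0.5, 1.0):
    value = elastic_net_loss(X_demo, y_demo, w_demo, alpha=0.1, l1_ratio=ratio)
    print(f"l1_ratio={ratio}: loss={value:.3f}")
```

Setting `l1_ratio` to 1 leaves only the L1 term and setting it to 0 leaves only the L2 term, which matches the description of the two extremes.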

In [1]:
from sklearn.datasets import load_diabetes

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

import time

In [2]:
X,y = load_diabetes(return_X_y=True)

In [3]:
print(X.shape)
print(y.shape)

(442, 10)
(442,)


In [4]:
# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
from sklearn.linear_model import ElasticNet

In [9]:
from sklearn.model_selection import GridSearchCV

# Candidate values for the regularization strength and the L1/L2 mix
param_grid = {'alpha': [0.001, 0.01, 0.1],
              'l1_ratio': [0.5, 0.7, 0.9]}

elastic_net = ElasticNet()

# Cross-validated search over every alpha / l1_ratio combination, scored by R2
grid_search = GridSearchCV(estimator=elastic_net, param_grid=param_grid, scoring='r2')
grid_search.fit(X_train, y_train)

# Best combination and the model refit on the full training set with it
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

# Evaluate the best model on the held-out test set
y_pred = best_model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print("Best Parameters:", best_params)
print("R2 Score:", r2)

Best Parameters: {'alpha': 0.001, 'l1_ratio': 0.7}
R2 Score: 0.4611028760848328
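
To see how the other combinations compared, one option is to read the cross-validation scores from `cv_results_`. The sketch below repeats the setup from the cells above so it runs on its own; the sorting and print format are just one possible way to display the results.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

param_grid = {'alpha': [0.001, 0.01, 0.1],
              'l1_ratio': [0.5, 0.7, 0.9]}

grid_search = GridSearchCV(ElasticNet(), param_grid=param_grid, scoring='r2')
grid_search.fit(X_train, y_train)

# Pair each parameter combination with its mean cross-validated R2,
# then sort from best to worst.
results = sorted(
    zip(grid_search.cv_results_['mean_test_score'],
        grid_search.cv_results_['params']),
    key=lambda pair: pair[0],
    reverse=True,
)

for mean_score, params in results:
    print(f"mean CV R2 = {mean_score:.4f}  {params}")
```

The first row corresponds to `best_params_`. Note that these are cross-validation scores on the training set, so they will not match the test-set R2 printed above.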
