Regularization is a technique used in machine learning to prevent overfitting by adding a penalty on model complexity to the training objective. This discourages the model from fitting noise in the training data and so improves generalization to new, unseen data. The most common methods are L1 regularization (Lasso), L2 regularization (Ridge), and Elastic Net regularization.

### Types of Regularization

1. L1 Regularization (Lasso)
2. L2 Regularization (Ridge)
3. Elastic Net Regularization

### 1. L1 Regularization (Lasso) 

L1 regularization adds a penalty proportional to the sum of the absolute values of the coefficients. This can shrink some coefficients exactly to zero, effectively performing feature selection.

The cost function for L1 regularization is:

$$
J(\theta) = \mathrm{MSE}(\theta) + \lambda \sum_{j=1}^{n} |\theta_j|
$$

where $\mathrm{MSE}(\theta)$ is the mean squared error on the training data, $\lambda \ge 0$ is the regularization parameter, and $\theta_j$ ($j = 1, \dots, n$) are the model parameters.
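As a rough illustration, an L1-penalized cost of this kind can be computed directly with NumPy. The function name `l1_cost`, the choice of mean squared error as the data term, and the sample values are assumptions made for this sketch.

```python
import numpy as np

def l1_cost(X, y, theta, lam):
    """Mean squared error plus an L1 penalty: lam * sum(|theta_j|)."""
    residuals = X @ theta - y
    mse = np.mean(residuals ** 2)
    penalty = lam * np.sum(np.abs(theta))
    return mse + penalty

# Hypothetical example: theta fits y perfectly, so only the penalty contributes
X = np.array([[1.0, 0.0], [0.0, 1.0]])
theta = np.array([2.0, -3.0])
y = X @ theta
cost = l1_cost(X, y, theta, lam=0.1)
print(cost)  # 0.1 * (|2| + |-3|) = 0.5
```

Because the fit is exact, the printed cost shows the penalty in isolation: it grows linearly with both λ and the absolute size of each coefficient.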

#### Benefits

1. Can produce sparse models (many coefficients are zero), which helps in feature selection.
2. Useful when you have many features and expect only a few of them to be important.
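The sparsity described above can be checked by counting how many fitted coefficients are exactly zero. This is a minimal sketch using scikit-learn's `Lasso`; the synthetic data (only feature 0 drives the target) and the value `alpha=0.1` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for this sketch): only feature 0 drives the target
rng = np.random.default_rng(0)
X = rng.random((100, 10))
y = 3 * X[:, 0] + rng.standard_normal(100)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# Coefficients that the L1 penalty set exactly to zero
n_zero = int(np.sum(lasso.coef_ == 0))
# Indices of the features whose coefficients survived
kept = np.flatnonzero(lasso.coef_)

print(f"{n_zero} of {lasso.coef_.size} coefficients are exactly zero")
print("Features kept:", kept)
```

The features listed as kept are the ones the model actually uses; the rest contribute nothing to predictions and could be dropped.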

### 2. L2 Regularization (Ridge)

L2 regularization adds a penalty proportional to the sum of the squared coefficients. This shrinks all coefficients toward zero, penalizing large coefficients more heavily, but generally does not set any of them exactly to zero.

The cost function for L2 regularization is:

$$
J(\theta) = \mathrm{MSE}(\theta) + \lambda \sum_{j=1}^{n} \theta_j^2
$$

where $\mathrm{MSE}(\theta)$ is the mean squared error on the training data, $\lambda \ge 0$ is the regularization parameter, and $\theta_j$ ($j = 1, \dots, n$) are the model parameters.
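To see the shrinkage numerically, the sketch below solves ridge regression in closed form for several values of λ. It is an illustration under stated assumptions, not how scikit-learn's `Ridge` is implemented: the helper `ridge_closed_form` is hypothetical, it omits the intercept, and it uses the sum of squared errors as the data term, so its λ is scaled differently from a mean-squared-error formulation.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) theta = X^T y for theta (no intercept)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (an assumption for this sketch)
rng = np.random.default_rng(0)
X = rng.random((100, 10))
y = 3 * X[:, 0] + rng.standard_normal(100)

lams = [0.0, 1.0, 10.0, 100.0]
thetas = [ridge_closed_form(X, y, lam) for lam in lams]
norms = [float(np.linalg.norm(t)) for t in thetas]

for lam, t, norm in zip(lams, thetas, norms):
    n_zero = int(np.sum(t == 0))
    print(f"lambda={lam:>6}: ||theta||={norm:.4f}, exact zeros={n_zero}")
```

The coefficient norm falls as λ grows, yet no coefficient becomes exactly zero, which is the qualitative difference from the L1 penalty.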

#### Benefits

1. Prevents the coefficients from becoming too large, reducing model complexity.
2. Stabilizes the coefficient estimates, particularly when features are correlated, which often improves performance on unseen data.

### 3. Elastic Net Regularization

Elastic Net regularization combines L1 and L2 regularization. It adds a penalty equal to a weighted sum of the absolute values and the squares of the coefficients.

The cost function for Elastic Net regularization is:

$$
J(\theta) = \mathrm{MSE}(\theta) + \lambda_1 \sum_{j=1}^{n} |\theta_j| + \lambda_2 \sum_{j=1}^{n} \theta_j^2
$$

where $\mathrm{MSE}(\theta)$ is the mean squared error on the training data, and $\lambda_1$ and $\lambda_2$ are the regularization parameters for the L1 and L2 penalties, respectively.
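The combined penalty can be sketched as a small NumPy function. The name `elastic_net_penalty` and the sample coefficient vector are assumptions made for illustration; the point is that setting either weight to zero recovers the pure L2 or pure L1 penalty.

```python
import numpy as np

def elastic_net_penalty(theta, lam1, lam2):
    """Weighted sum of the L1 penalty (absolute values) and L2 penalty (squares)."""
    l1_term = lam1 * np.sum(np.abs(theta))
    l2_term = lam2 * np.sum(theta ** 2)
    return l1_term + l2_term

theta = np.array([0.5, -2.0, 0.0])  # hypothetical coefficients

l1_only = elastic_net_penalty(theta, lam1=1.0, lam2=0.0)  # 0.5 + 2.0 = 2.5
l2_only = elastic_net_penalty(theta, lam1=0.0, lam2=1.0)  # 0.25 + 4.0 = 4.25
mixed = elastic_net_penalty(theta, lam1=0.5, lam2=0.5)    # 1.25 + 2.125 = 3.375

print(l1_only, l2_only, mixed)
```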

#### Benefits

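1. Combines the benefits of both penalties: the L1 term can set coefficients to zero, while the L2 term keeps the remaining estimates stable.
2. Useful when dealing with highly correlated features, where Lasso alone tends to keep one feature from a correlated group and drop the others.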

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate some sample data
np.random.seed(0)
X = np.random.rand(100, 10)
y = 3 * X[:, 0] + np.random.randn(100)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a simple Linear Regression model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
y_pred_lr = lr_model.predict(X_test)
mse_lr = mean_squared_error(y_test, y_pred_lr)

# Train a Lasso Regression model
lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)
y_pred_lasso = lasso_model.predict(X_test)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)

# Train a Ridge Regression model
ridge_model = Ridge(alpha=0.1)
ridge_model.fit(X_train, y_train)
y_pred_ridge = ridge_model.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)

# Train an ElasticNet Regression model
elasticnet_model = ElasticNet(alpha=0.1, l1_ratio=0.5)
elasticnet_model.fit(X_train, y_train)
y_pred_elasticnet = elasticnet_model.predict(X_test)
mse_elasticnet = mean_squared_error(y_test, y_pred_elasticnet)

print(f"Linear Regression MSE: {mse_lr:.4f}")
print(f"Lasso Regression MSE: {mse_lasso:.4f}")
print(f"Ridge Regression MSE: {mse_ridge:.4f}")
print(f"ElasticNet Regression MSE: {mse_elasticnet:.4f}")
```


```
Linear Regression MSE: 1.0697
Lasso Regression MSE: 0.8477
Ridge Regression MSE: 1.0616
ElasticNet Regression MSE: 0.8855
```


### Code Explanation

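1. Data Generation: We generate a synthetic dataset with 100 samples and 10 features. Only the first feature influences the target (y = 3 * X[:, 0] plus Gaussian noise); the other nine are irrelevant.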
2. Data Splitting: We split the data into training and testing sets.
3. Model Training:
    * Linear Regression: We train a simple linear regression model without regularization.
    * Lasso Regression: We train a Lasso regression model with an L1 penalty (alpha=0.1).
    * Ridge Regression: We train a Ridge regression model with an L2 penalty (alpha=0.1).
    * ElasticNet Regression: We train an ElasticNet regression model with both L1 and L2 penalties (alpha=0.1 and l1_ratio=0.5).
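4. Model Evaluation: We make predictions on the test set and calculate the mean squared error (MSE) for each model.

In this run, Lasso (0.8477) and ElasticNet (0.8855) reach a lower test MSE than unregularized linear regression (1.0697), while Ridge with alpha=0.1 changes the result only slightly (1.0616). This is consistent with data in which a single feature carries the signal.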
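scikit-learn's `ElasticNet` is parameterized by `alpha` and `l1_ratio` rather than by two separate penalty weights. Its documented objective is `1 / (2 * n_samples) * ||y - Xw||^2_2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2`, so the effective L1 and L2 weights can be recovered as sketched below. The helper name `sklearn_to_lambdas` is hypothetical, and the weights are relative to scikit-learn's halved data term, not to a plain MSE.

```python
def sklearn_to_lambdas(alpha, l1_ratio):
    """Return the effective (L1 weight, L2 weight) in scikit-learn's ElasticNet objective."""
    lam1 = alpha * l1_ratio
    lam2 = 0.5 * alpha * (1.0 - l1_ratio)
    return lam1, lam2

# Settings used for the ElasticNet model in the example
lam1, lam2 = sklearn_to_lambdas(alpha=0.1, l1_ratio=0.5)
print(f"L1 weight: {lam1:.3f}, L2 weight: {lam2:.3f}")

# l1_ratio=1.0 removes the L2 term, leaving a pure L1 (Lasso) penalty
lasso_like = sklearn_to_lambdas(alpha=0.1, l1_ratio=1.0)
print(lasso_like)
```

With `alpha=0.1` and `l1_ratio=0.5`, the L1 weight is 0.05 and the L2 weight is 0.025, so the ElasticNet model in the example applies a weaker L1 penalty than the Lasso model with `alpha=0.1`.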