### What is regularization?

Regularization is a technique used in machine learning to reduce overfitting and improve a model's ability to generalize. Overfitting occurs when a model fits the training data too closely, capturing not only the underlying patterns but also the noise and random fluctuations in that particular sample. As a result, an overfitted model performs well on the training data but poorly on new, unseen data.
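
To make overfitting concrete, here is a minimal NumPy sketch. The toy setup is an illustrative assumption and not part of the original. It fits a degree-3 and a degree-9 polynomial to 10 noisy samples of a sine wave. The degree-9 polynomial has as many coefficients as there are training points, so it passes through every noisy point and its training MSE is essentially zero. Its error on the in-between points is typically much larger than its training error.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 evenly spaced training points from a noisy sine wave (toy data, assumed)
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.size)

# Noise-free points in between, standing in for "new, unseen data"
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test)

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    # (training MSE, test MSE) for each polynomial degree
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```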

Regularization adds constraints or penalty terms to the learning objective. These discourage overly complex models, such as linear models with very large coefficients, that can fit the noise in the training data. Constraining the model this way accepts a small increase in bias in exchange for a larger reduction in variance. This pushes the model toward the underlying patterns rather than the quirks of one training sample, and usually improves its performance on new data.
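
Written as an equation, the idea above is usually expressed as minimizing a data-fit loss plus a weighted penalty. The notation here ($\ell$, $f_w$, $R$, $\lambda$) is a generic choice of ours rather than anything from the original. scikit-learn calls $\lambda$ `alpha` and scales the loss term differently for different estimators.

$$
\hat{w} = \arg\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(y_i, f_w(x_i)\big) + \lambda \, R(w), \qquad \lambda \ge 0
$$

The penalty takes one of these forms:

* L1: $R(w) = \lVert w \rVert_1 = \sum_j |w_j|$.
* L2: $R(w) = \lVert w \rVert_2^2 = \sum_j w_j^2$.
* Elastic Net: a weighted sum of the two.

Setting $\lambda = 0$ recovers the unregularized model, and larger $\lambda$ means stronger regularization.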

Three widely used regularization techniques for linear models are:

L1 Regularization (Lasso): L1 regularization adds a penalty term to the loss function that is proportional to the sum of the absolute values of the model's coefficients (the L1 norm). It encourages sparsity by driving some coefficients exactly to zero. This effectively performs feature selection and reduces the model's complexity.
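
A quick way to see L1's sparsity is to compare the fitted coefficients of scikit-learn's `Lasso` and `LinearRegression` on synthetic data where only some features matter. The dataset settings and `alpha=2.0` below are illustrative assumptions. With them, Lasso typically zeroes most of the 7 uninformative coefficients, while ordinary least squares leaves every coefficient nonzero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression

# 10 features, but only 3 influence y (the other 7 have true coefficient 0)
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=2.0).fit(X, y)  # alpha sets the L1 penalty strength (assumed value)

# Count coefficients that are exactly zero
ols_zeros = int(np.sum(ols.coef_ == 0.0))
lasso_zeros = int(np.sum(lasso.coef_ == 0.0))
print("OLS zero coefficients:  ", ols_zeros)
print("Lasso zero coefficients:", lasso_zeros)
```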

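L2 Regularization (Ridge): L2 regularization adds a penalty term to the loss function that is proportional to the sum of the squared coefficients (the squared L2 norm). It discourages large coefficient values and shrinks all coefficients toward zero, but unlike L1 it rarely sets any of them exactly to zero. L2 regularization is particularly effective when features are correlated (multicollinearity). In that setting it stabilizes coefficient estimates that would otherwise vary widely from one training sample to the next.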
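
For L2, the penalized least-squares problem has a closed-form solution, which makes the shrinkage easy to see. This sketch uses synthetic data and omits the intercept for simplicity. It checks that formula against scikit-learn's `Ridge`, whose documented objective is $\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$. It also shows the coefficient norm shrinking as `alpha` grows.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=50)

alpha = 10.0
# Closed-form ridge solution without an intercept:
#   w = (X^T X + alpha * I)^(-1) X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2, so with
# fit_intercept=False it should agree with the closed form
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
print(np.allclose(w_closed, ridge.coef_))

# Larger alpha shrinks the coefficient vector toward zero
norms = [float(np.linalg.norm(Ridge(alpha=a, fit_intercept=False).fit(X, y).coef_))
         for a in (0.1, 1.0, 10.0, 100.0)]
print([round(v, 3) for v in norms])
```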

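Elastic Net Regularization: Elastic Net combines L1 and L2 regularization by adding both penalty terms to the loss function. A mixing parameter balances L1-style sparsity against L2-style shrinkage. Elastic Net is useful when many features are correlated. Lasso tends to keep one feature from a correlated group and drop the rest somewhat arbitrarily, whereas Elastic Net tends to keep or drop correlated features together.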
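
The difference between the two penalties on correlated features shows up clearly in an extreme case: two identical feature columns. This is a synthetic setup that is not in the original. The sketch below fits scikit-learn's `Lasso` and `ElasticNet` to the same data. With scikit-learn's default cyclic coordinate descent, Lasso typically puts the weight on one copy. Elastic Net's L2 term makes its solution unique, with the weight split equally between the copies.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Two perfectly correlated (identical) features
X = np.column_stack([x, x])
y = 4.0 * x + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=1.0).fit(X, y)
# l1_ratio=0.5: half of the penalty is L1, half is L2 (assumed values)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)

print("Lasso coefficients:      ", np.round(lasso.coef_, 3))
print("Elastic Net coefficients:", np.round(enet.coef_, 3))
```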

The choice of regularization technique and of the penalty strength depends on the problem, the dataset, and the learning algorithm. In practice the strength is usually tuned with cross-validation, and for Elastic Net so is the L1/L2 mix. Applied well, regularization reduces overfitting and improves performance on unseen data, leading to more reliable predictions.
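
One common way to make that choice is cross-validation. As a sketch, assuming scikit-learn's `ElasticNetCV` and a hypothetical noisy dataset, the following searches scikit-learn's automatic `alpha` grid together with a few L1/L2 mixes. It then reports the selected values and the held-out error.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical noisy dataset: 20 features, only 5 informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 5-fold CV over the automatic alpha grid and a few candidate l1_ratio values
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5)
model.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"chosen alpha={model.alpha_:.4g}, l1_ratio={model.l1_ratio_}, test MSE={mse:.4g}")
```

`RidgeCV` and `LassoCV` play the same role for the other two penalties.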

```python
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
```

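```python
# Generate a synthetic regression dataset (100 samples, 10 features).
# Note: make_regression defaults to noise=0.0, so y is an exact linear function of X.
X, y = make_regression(n_samples=100, n_features=10, random_state=42)
```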

```python
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
def apply_regularization(reg_type):
    """Fit a regularized linear model on the training split and print its test MSE."""
    if reg_type == 'Lasso':
        model = Lasso(alpha=0.1)  # alpha: L1 penalty strength
    elif reg_type == 'Ridge':
        model = Ridge(alpha=1.0)  # alpha: L2 penalty strength
    elif reg_type == 'ElasticNet':
        # alpha: overall penalty strength; l1_ratio: share of the penalty that is L1
        model = ElasticNet(alpha=0.5, l1_ratio=0.5)
    else:
        raise ValueError(f"Invalid regularization type: {reg_type!r}")

    # Fit on the training split created above
    model.fit(X_train, y_train)

    # Make predictions on the held-out test set
    y_pred = model.predict(X_test)

    # Mean squared error on unseen data (lower is better)
    mse = mean_squared_error(y_test, y_pred)
    print(f"{reg_type} MSE: {mse}")
```

```python
# Apply Lasso regularization
apply_regularization('Lasso')

# Apply Ridge regularization
apply_regularization('Ridge')

# Apply Elastic Net regularization
apply_regularization('ElasticNet')
```


```text
Lasso MSE: 0.0976520742153871
Ridge MSE: 5.417352880125903
ElasticNet MSE: 1437.2431828596618
```
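
These three numbers are not a like-for-like comparison of the techniques. Each model uses a different `alpha`, and scikit-learn scales its objectives differently. `Lasso` and `ElasticNet` multiply the squared error by 1/(2·n_samples) while `Ridge` does not, so equal `alpha` values do not mean equal penalties. The `ElasticNet(alpha=0.5, l1_ratio=0.5)` above applies an L1 weight of 0.25 (versus 0.1 for the Lasso) plus an L2 term, making it the most heavily regularized of the three.

In addition, `make_regression` defaults to `noise=0.0`, so `y` is an exact linear function of `X` and there is no noise to overfit. Any penalty can only add bias here. The sketch below adds the unregularized `LinearRegression` baseline that is imported but never used, and shows the Elastic Net error falling as `alpha` shrinks. The alpha grid is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same data and split as above (noise=0.0 and bias=0.0 by default)
X, y = make_regression(n_samples=100, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Unregularized baseline: should fit noise-free linear data almost exactly
ols = LinearRegression().fit(X_train, y_train)
ols_mse = mean_squared_error(y_test, ols.predict(X_test))
print(f"LinearRegression MSE: {ols_mse:.3g}")

# Elastic Net test error as the penalty strength shrinks (illustrative grid)
enet_mse = {}
for alpha in [0.5, 0.1, 0.01, 0.001]:
    model = ElasticNet(alpha=alpha, l1_ratio=0.5, max_iter=10_000).fit(X_train, y_train)
    enet_mse[alpha] = mean_squared_error(y_test, model.predict(X_test))
    print(f"ElasticNet alpha={alpha}: MSE {enet_mse[alpha]:.3g}")
```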
