[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AMLA-UBC/100-Exploring-the-World-of-Modern-Machine-Learning/blob/main/L1_L2_Regularization_Exercise.ipynb)

## Recap on Regularization

Regularization helps prevent overfitting by adding a penalty term to the loss function during training.

L1 and L2 regularization differ in the way they penalize the weights:
- L2 penalizes $\text{weight}^2$.
- L1 penalizes $|\text{weight}|$.
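Written out for a whole model, the two penalty terms can be sketched as follows. The symbols are assumptions introduced here for illustration: $w_i$ stands for the individual weights and $\lambda$ for the regularization strength (the `1e-5` passed to the Keras regularizers later in this notebook plays that role).

$$
\text{L2 penalty} = \lambda \sum_i w_i^2, \qquad \text{L1 penalty} = \lambda \sum_i |w_i|
$$

In both cases the penalty is added to the data loss, so larger weights make the total loss larger.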

The derivative of the L2 penalty is $2 \cdot \text{weight}$, so each update removes a fixed percentage of the weight. Because the size of the step shrinks along with the weight, this force alone never drives a weight to exactly 0, even after billions of updates.
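A minimal sketch of this shrinking behaviour, applying only the L2 penalty gradient and ignoring the data loss. The function name `l2_decay` and the values for the learning rate and regularization strength are made up for illustration.

```python
def l2_decay(weight, learning_rate, strength, steps):
    """Apply only the L2 penalty gradient (2 * strength * weight) for `steps` updates."""
    for _ in range(steps):
        # Each update removes the same fraction of whatever weight is left
        weight -= learning_rate * 2 * strength * weight
    return weight


# With these illustrative values, every step keeps 90% of the weight
w = l2_decay(weight=1.0, learning_rate=0.1, strength=0.5, steps=100)
print(w)  # very small, but still not zero
```

Because every step scales the weight by the same factor, the weight gets arbitrarily close to 0 without landing on it.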

The derivative of the L1 penalty, by contrast, has a constant magnitude (only its sign depends on the weight), so each update subtracts a fixed amount from the weight's magnitude. Because of the absolute value, the derivative is discontinuous at 0: if a subtraction would push the weight across 0, the weight is set to exactly 0 instead.
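The update described above can be sketched in a few lines. This is an illustration under assumptions: `l1_step` is a hypothetical helper, the data loss is ignored, and the clipping at 0 is written out explicitly (plain gradient descent does not clip on its own, so this mirrors the description rather than any particular optimizer).

```python
def l1_step(weight, learning_rate, strength):
    """One update from the L1 penalty alone, clipped so the weight cannot cross 0."""
    shift = learning_rate * strength  # the fixed amount subtracted each step
    if weight > shift:
        return weight - shift
    if weight < -shift:
        return weight + shift
    # The step would cross 0, so the weight is set to exactly 0
    return 0.0


w = 0.25
history = []
for _ in range(4):
    w = l1_step(w, learning_rate=0.1, strength=1.0)
    history.append(w)
print(history)  # the weight hits exactly 0.0 on the third step and stays there
```

Unlike the L2 case, the step size does not shrink with the weight, which is why the weight can actually reach 0.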

L1 regularization is more effective for wide models because it drives many weights to exactly 0. High-dimensional sparse feature vectors can otherwise produce huge models that require a lot of memory. Features whose weights are exactly 0 can be dropped, which saves RAM and reduces noise in the model.

Wide models have a large number of input features, while deep models have many hidden layers.

In [None]:
!pip install -q tensorflow

In [None]:
import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Load the Boston Housing dataset
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# Each record spans two rows in the raw file, so stitch the features back together
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
# The target is the third column of every second row
y = raw_df.values[1::2, 2]

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the model with L1 regularization on the output layer's weights
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(10, input_dim=X_train.shape[1], activation='relu'))
model.add(tf.keras.layers.Dense(1, kernel_regularizer=tf.keras.regularizers.L1(1e-5)))

# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])

# Train the model
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the model on test data
# evaluate() returns [loss, metric]; the loss includes the regularization penalty, so keep the metric
_, mse1 = model.evaluate(X_test, y_test)

# Define the same model with L2 regularization on the output layer's weights
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(10, input_dim=X_train.shape[1], activation='relu'))
model.add(tf.keras.layers.Dense(1, kernel_regularizer=tf.keras.regularizers.L2(1e-5)))

# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])

# Train the model
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the model on test data
_, mse2 = model.evaluate(X_test, y_test)
print("\n")
print("Mean Squared Error with L1 Regularization:", mse1)
print("Mean Squared Error with L2 Regularization:", mse2)
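Comparing the two errors does not show whether a regularizer actually pushed weights toward 0. One way to check is to count the near-zero entries in the trained weights. The helper below is a hypothetical sketch: `near_zero_fraction` and its tolerance `tol` are assumptions, and a tolerance is used because weights trained with an optimizer such as Adam typically end up very small rather than exactly 0.

```python
import numpy as np


def near_zero_fraction(weight_arrays, tol=1e-3):
    """Fraction of entries whose magnitude is below `tol` across all arrays."""
    total = sum(np.size(w) for w in weight_arrays)
    small = sum(int(np.sum(np.abs(w) < tol)) for w in weight_arrays)
    return small / total


# Toy stand-in for the list of arrays that model.get_weights() returns
toy_weights = [np.array([[0.5, 0.0], [1e-5, -0.2]]), np.array([0.0, 0.3])]
print(near_zero_fraction(toy_weights))  # 3 of the 6 entries are below the tolerance
```

For a trained Keras model, the same function could be called as `near_zero_fraction(model.get_weights())` right after each training run, before `model` is reassigned.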

## Practice

Let's revisit our solutions to the [Multiclass Classification Exercise](https://colab.research.google.com/github/AMLA-UBC/100-Exploring-the-World-of-Modern-Machine-Learning/blob/main/Multiclass_Classification_Exercise.ipynb) and retrain the model with regularization. Note that too much regularization negatively affects a CNN's ability to capture patterns in the training data and make accurate predictions. Does regularizing the CNN improve its accuracy on the test data?

In [None]:
# Build the model
# Use L1 regularization
...


# Compile the model
...


# Train the model
...


# Evaluate the model
...


# Build the model
# Use L2 regularization
...


# Compile the model
...


# Train the model
...


# Evaluate the model
...

