## Part 1a: TensorFlow - L1 and L2 Regularization

**Description:**

This Colab demonstrates L1 and L2 regularization in TensorFlow Keras. We train simple neural network models on scikit-learn's `digits` dataset with and without these penalties to observe their impact on model performance and generalization.

* **L1 Regularization (Lasso):** Adds a penalty to the loss function proportional to the sum of the absolute values of the weights.
* **L2 Regularization (Ridge):** Adds a penalty to the loss function proportional to the sum of the squared weights.
* **Combined L1 and L2 Regularization:** Applies both penalties simultaneously.
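
As a minimal sketch of what these penalties compute, the pure-Python function below evaluates $\lambda_1 \sum_i |w_i| + \lambda_2 \sum_i w_i^2$ for a flat list of weights. The function name `l1_l2_penalty` and the sample weights are illustrative assumptions, not part of the notebook. Keras's `regularizers.l1_l2` applies the same formula to a layer's kernel and adds the result to the training loss.

```python
def l1_l2_penalty(weights, l1=0.0, l2=0.0):
    """Return l1 * sum(|w|) + l2 * sum(w**2) for an iterable of weights."""
    weights = list(weights)
    l1_term = l1 * sum(abs(w) for w in weights)
    l2_term = l2 * sum(w * w for w in weights)
    return l1_term + l2_term


# Illustrative weights (not taken from the trained models).
sample_weights = [0.5, -1.0, 0.0, 2.0]

l1_only = l1_l2_penalty(sample_weights, l1=0.01)   # 0.01 * (sum of |w| = 3.5)
l2_only = l1_l2_penalty(sample_weights, l2=0.01)   # 0.01 * (sum of w^2 = 5.25)
combined = l1_l2_penalty(sample_weights, l1=0.01, l2=0.01)

print(l1_only, l2_only, combined)
```

Note that the L1 term grows linearly with each weight's magnitude while the L2 term grows quadratically, so large weights are punished much harder by L2 and small weights relatively harder by L1.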

In [1]:
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

# Load the digits dataset
digits = load_digits()
X, y = digits.data, digits.target

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Define a function to create a simple model with regularization.
# The kernel regularizer is attached to the hidden layer only; the output layer is unregularized.
def create_regularized_model(l1=0.0, l2=0.0):
    return models.Sequential([
        # An explicit Input layer replaces the `input_shape` argument on Dense
        layers.Input(shape=(X_train.shape[1],)),
        layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l1_l2(l1=l1, l2=l2)),
        layers.Dense(10, activation='softmax')
    ])

# Model without regularization (baseline)
model_no_reg = models.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
model_no_reg.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_no_reg = model_no_reg.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test), verbose=0)

# Model with L2 regularization
model_l2 = create_regularized_model(l2=0.01)
model_l2.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_l2 = model_l2.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test), verbose=0)

# Model with L1 regularization
model_l1 = create_regularized_model(l1=0.01)
model_l1.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_l1 = model_l1.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test), verbose=0)

# Model with both L1 and L2 regularization
model_l1_l2 = create_regularized_model(l1=0.01, l2=0.01)
model_l1_l2.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_l1_l2 = model_l1_l2.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test), verbose=0)

# Evaluate the models
print("\nEvaluation:")
print("No Regularization - Test Accuracy:", model_no_reg.evaluate(X_test, y_test, verbose=0)[1])
print("L2 Regularization - Test Accuracy:", model_l2.evaluate(X_test, y_test, verbose=0)[1])
print("L1 Regularization - Test Accuracy:", model_l1.evaluate(X_test, y_test, verbose=0)[1])
print("L1 and L2 Regularization - Test Accuracy:", model_l1_l2.evaluate(X_test, y_test, verbose=0)[1])

Evaluation:
No Regularization - Test Accuracy: 0.9750000238418579
L2 Regularization - Test Accuracy: 0.9750000238418579
L1 Regularization - Test Accuracy: 0.9583333134651184
L1 and L2 Regularization - Test Accuracy: 0.9611111283302307
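
The `history_*` objects returned by `fit` are collected in the cell above but never inspected. As a hedged sketch of how they could be used to look at generalization, the helper below summarizes a Keras-style history dictionary. The name `summarize_history` and the toy curves are assumptions made for illustration; the keys `accuracy` and `val_accuracy` are the ones Keras records when a model is compiled with `metrics=['accuracy']` and fitted with `validation_data`.

```python
def summarize_history(history_dict):
    """Summarize a Keras-style history dict holding per-epoch accuracy lists."""
    train_acc = history_dict["accuracy"]
    val_acc = history_dict["val_accuracy"]
    # Index of the epoch with the highest validation accuracy
    best_index = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best_index + 1,  # 1-based epoch number
        "best_val_accuracy": val_acc[best_index],
        "final_gap": train_acc[-1] - val_acc[-1],  # train minus validation
    }


# Made-up curves for illustration only (not results from this notebook).
toy_history = {
    "accuracy": [0.80, 0.95, 1.00],
    "val_accuracy": [0.78, 0.96, 0.95],
}
summary = summarize_history(toy_history)
print(summary)
```

In the notebook this would be called as, for example, `summarize_history(history_l2.history)`. A large positive `final_gap` suggests overfitting, which is the situation regularization is meant to reduce.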


## Results for Part 1a: TensorFlow - L1 and L2 Regularization

In this experiment, we trained four simple neural network models on the `digits` dataset to evaluate the impact of L1 and L2 regularization:

* **Model without Regularization (Baseline):** This model served as our control.
* **Model with L2 Regularization (l2=0.01):** This model applied L2 regularization to the kernel weights of the hidden dense layer; the output layer was left unregularized.
* **Model with L1 Regularization (l1=0.01):** This model applied L1 regularization to the kernel weights of the hidden dense layer.
* **Model with Combined L1 and L2 Regularization (l1=0.01, l2=0.01):** This model applied both penalties to the kernel weights of the hidden dense layer.

The test accuracies achieved by each model are as follows:

* **No Regularization - Test Accuracy:** 0.9750
* **L2 Regularization - Test Accuracy:** 0.9750
* **L1 Regularization - Test Accuracy:** 0.9583
* **L1 and L2 Regularization - Test Accuracy:** 0.9611
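
To put these accuracies in perspective, they can be converted back into counts of misclassified test images. The sketch below assumes the standard 1,797-sample `digits` dataset and that `train_test_split` rounds the test size up for a fractional `test_size`; the variable names are illustrative.

```python
import math

N_SAMPLES = 1797      # assumed size of scikit-learn's digits dataset
TEST_FRACTION = 0.2   # test_size used in the notebook

# train_test_split rounds a fractional test size up to a whole sample count
n_test = math.ceil(TEST_FRACTION * N_SAMPLES)

# Test accuracies as printed by the evaluation cell
reported = {
    "No regularization": 0.9750000238418579,
    "L2": 0.9750000238418579,
    "L1": 0.9583333134651184,
    "L1 + L2": 0.9611111283302307,
}

errors = {}
for name, acc in reported.items():
    correct = round(acc * n_test)
    errors[name] = n_test - correct
    print(f"{name}: {correct}/{n_test} correct, {errors[name]} errors")
```

Under these assumptions the baseline and L2 models each misclassify 9 of 360 test images, the L1 model 15, and the combined model 14, so the largest gap between models is six images.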

**Analysis:**

The results indicate that in this particular experiment on the `digits` dataset with this simple network architecture and these regularization strengths:

* **L2 regularization** matched the baseline exactly (0.9750), showing neither an improvement nor a degradation in test accuracy.
* **L1 regularization** resulted in a slightly lower test accuracy (0.9583, about 1.7 percentage points below the baseline). L1 regularization encourages sparsity in the weights, which might have discarded some useful information in this relatively small and well-structured dataset.
* The **combination of L1 and L2 regularization** also resulted in a slightly lower test accuracy (0.9611, about 1.4 percentage points below the baseline), although it performed marginally better than L1 regularization alone.
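
The sparsity explanation for the L1 result can be checked directly rather than assumed. The helper below is a minimal sketch that measures the fraction of near-zero entries in a 2D weight matrix. The name `near_zero_fraction`, the tolerance `1e-3`, and the toy matrix are assumptions for illustration; a tolerance is used because gradient-based optimizers such as Adam typically shrink weights toward zero without making them exactly zero.

```python
def near_zero_fraction(kernel, tol=1e-3):
    """Return the fraction of entries in a 2D weight matrix with |w| <= tol.

    Accepts nested lists or any 2D array that iterates row by row.
    """
    flat = [float(w) for row in kernel for w in row]
    return sum(abs(w) <= tol for w in flat) / len(flat)


# Toy matrix for illustration only (not weights from the trained models).
toy_kernel = [
    [0.0, 0.2, -0.0004],
    [1.5, 0.0009, -0.3],
]
toy_fraction = near_zero_fraction(toy_kernel)
print(toy_fraction)
```

In the notebook, the hidden layer's kernel could be passed in as `model_l1.layers[0].get_weights()[0]`, and the result compared with the same measurement on `model_no_reg` and `model_l2`.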

The optimal regularization strength (the coefficient $\lambda$, set here through the `l1` and `l2` arguments) depends heavily on the specific dataset and model architecture, so other strengths might yield different results. Each model was also trained only once, with no TensorFlow random seed set, so differences of this size may partly reflect run-to-run variation rather than the regularizer itself.
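
One way to explore other strengths is a small sweep over log-spaced values. The sketch below shows only the selection loop; `sweep_strengths` and `toy_score` are hypothetical names, and `toy_score` is a stand-in that does not train anything. It exists solely so the example runs on its own.

```python
import math


def sweep_strengths(strengths, score_fn):
    """Score every candidate strength and return (best_strength, scores)."""
    scores = {s: score_fn(s) for s in strengths}
    best = max(scores, key=scores.get)
    return best, scores


# Log-spaced candidates: 1e-5, 1e-4, 1e-3, 1e-2, 1e-1
candidates = [10.0 ** exponent for exponent in range(-5, 0)]


def toy_score(strength):
    """Hypothetical scorer that peaks at 1e-3; a real one would train a model."""
    return 1.0 - 0.01 * abs(math.log10(strength) + 3)


best_strength, all_scores = sweep_strengths(candidates, toy_score)
print(best_strength, all_scores)
```

In the notebook, `score_fn` would instead build a model with `create_regularized_model(l2=strength)`, compile and fit it, and return its validation accuracy. It would be safer to compute that score on a validation split held out from the training data, so that the test set is not used to choose the strength.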
