### Part 1: Understanding Regularization

1. **Regularization in Deep Learning:**
   - Regularization refers to a set of techniques used to prevent overfitting in deep learning models. Most of them add a penalty or constraint to the training objective, or inject noise during training. This encourages the model to learn simpler patterns and makes it less likely to fit noise in the training data.
   - Importance: Overfitting occurs when a model learns to memorize the training data instead of generalizing well to unseen data. Regularization helps mitigate overfitting, leading to better generalization performance on new data.

2. **Bias-Variance Tradeoff and Regularization:**
   - **Bias:** Error due to overly simplistic assumptions in the model, which causes underfitting.
   - **Variance:** Error due to the model's sensitivity to the particular training sample. It typically grows with model complexity and causes overfitting.
   - **Tradeoff:** Increasing model complexity tends to reduce bias but increase variance, leading to overfitting. Regularization reduces the model's effective capacity, which usually increases bias slightly while reducing variance. Tuning the regularization strength lets us pick the point on this tradeoff that minimizes error on unseen data.

3. **L1 and L2 Regularization:**
   - **L1 Regularization (Lasso):** Adds a penalty proportional to the sum of the absolute values of the weights, $\lambda \sum_i |w_i|$. It encourages sparsity by driving some weights to exactly zero.
   - **L2 Regularization (Ridge):** Adds a penalty proportional to the sum of the squared weights, $\lambda \sum_i w_i^2$. It shrinks weights toward zero but rarely makes them exactly zero, so it does not usually produce sparsity.
   - **Difference:**
     - Penalty Calculation: L1 regularization penalizes the sum of absolute values of weights, while L2 regularization penalizes the sum of squares of weights.
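     - Effects: L1 regularization tends to produce sparse weight matrices, which makes it useful for feature selection. The gradient of $|w|$ has constant magnitude, so even small weights keep being pushed until they reach zero. In contrast, the gradient of $w^2$ (namely $2w$) vanishes as $w$ approaches zero. L2 regularization therefore shrinks all weights smoothly, reducing the influence of any individual weight on the model's output.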

4. **Role of Regularization in Preventing Overfitting:**
   - Regularization constrains the model's capacity, preventing it from fitting the noise in the training data.
   - It discourages complex patterns that may not generalize well to new data, promoting simpler models that capture the underlying patterns.
   - By penalizing large weights or introducing sparsity, regularization helps prevent overfitting and improves the model's ability to generalize to unseen data.
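
The NumPy sketch below makes the penalty calculation and the sparsity difference in item 3 concrete. For each weight separately, it minimizes $\frac{1}{2}(w - a)^2$ plus the penalty, where $a$ stands for a hypothetical unregularized weight value. The L1 solution is soft-thresholding, which sets small weights exactly to zero. The L2 solution, $a / (1 + 2\lambda)$, only rescales them. The function names, weight values, and $\lambda$ are illustrative assumptions and are not part of any library.

```python
import numpy as np

def l1_penalty(w, lam):
    """L1 (Lasso) penalty: lam * sum of absolute weights."""
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    """L2 (Ridge) penalty: lam * sum of squared weights."""
    return lam * np.sum(w ** 2)

def l1_shrink(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*|w| for each entry (soft-thresholding)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def l2_shrink(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*w**2 for each entry."""
    return a / (1.0 + 2.0 * lam)

# Hypothetical unregularized weights and regularization strength
a = np.array([3.0, 0.5, -0.2, -2.0])
lam = 1.0

print("L1 penalty:", l1_penalty(a, lam))
print("L2 penalty:", l2_penalty(a, lam))
print("L1 solution:", l1_shrink(a, lam))  # small weights become exactly zero
print("L2 solution:", l2_shrink(a, lam))  # every weight is scaled down, none is zero
```

With $\lambda = 1$, L1 sets the two smallest weights (0.5 and -0.2) exactly to zero. L2 divides every weight by 3 and leaves none at zero.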


### Part 2: Regularization Techniques

5. **Dropout Regularization:**
   - **Concept:** Dropout is a regularization technique that, at each training step, randomly sets the outputs of a fraction of neurons (the dropout rate) to zero. No neuron can rely on any specific other neuron being present, so the network learns redundant representations, which reduces overfitting.
   - **Impact on Model Training and Inference:**
     - During training, Dropout introduces noise and prevents co-adaptation of neurons, effectively regularizing the model and reducing overfitting.
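     - During inference, Dropout is turned off and the full network is used for predictions. Each unit was active only part of the time during training, so activations must be rescaled to keep their expected values consistent between training and inference. The original formulation scales the weights by the keep probability at inference. Most modern frameworks, including Keras, instead use *inverted* dropout: they scale the surviving activations by 1 / (1 - rate) during training, so no change is needed at inference.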

6. **Early Stopping:**
   - **Concept:** Early stopping is a regularization technique that monitors performance on a validation set (typically the validation loss). It stops training once that performance stops improving, even if the training loss is still decreasing.
   - **Preventing Overfitting:** Early stopping halts training before the model starts memorizing noise in the training data. In practice, training continues for a fixed number of epochs without improvement (the *patience*) before stopping. The weights from the epoch with the best validation score are then restored, so the final model is the one that generalized best.

7. **Batch Normalization:**
   - **Concept:** Batch Normalization normalizes each layer's activations using the mean and variance of the current mini-batch, then applies a learnable scale and shift. It stabilizes and speeds up training. It was originally motivated as a way to reduce internal covariate shift.
   - **Role in Regularization:** Each example is normalized with the statistics of whichever mini-batch it lands in, so its activations vary slightly from step to step. This noise has a mild regularizing effect, loosely similar to Dropout. At inference, running averages of the statistics are used instead, so predictions are deterministic.
   - **Preventing Overfitting:** Batch Normalization can reduce overfitting and improve generalization through this mini-batch noise and by making training more stable. Its regularizing effect is generally considered modest, so it is often combined with explicit regularizers such as L2 or Dropout.
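
The rescaling described in item 5 can be sketched in NumPy as *inverted* dropout, the variant used by Keras's `Dropout` layer. Surviving activations are multiplied by 1 / (1 - rate) during training, so the layer is an identity at inference. This is a simplified illustration under that assumption, not Keras's internal code. The `dropout` helper and the all-ones input are hypothetical.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout (hypothetical helper): during training, zero each unit
    with probability `rate` and scale survivors by 1 / (1 - rate);
    at inference, return the input unchanged."""
    if not training or rate == 0.0:
        return x
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True where the unit is kept
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000, 128))  # hypothetical activations of a 128-unit layer

out_train = dropout(x, rate=0.2, training=True, rng=rng)
out_infer = dropout(x, rate=0.2, training=False, rng=rng)

print("fraction dropped during training:", np.mean(out_train == 0))  # about 0.2
print("mean activation during training:", out_train.mean())  # about 1.0
print("mean activation at inference:", out_infer.mean())  # 1.0
```

Because survivors are scaled up during training, the average activation stays close to its inference value.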

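The patience logic from item 6 fits in a few lines of plain Python. This sketch assumes the per-epoch validation losses are already known (in real training they arrive one epoch at a time) and uses a made-up loss curve. Keras packages the same idea as `tf.keras.callbacks.EarlyStopping`, with `monitor`, `patience`, `min_delta`, and `restore_best_weights` arguments. Its exact bookkeeping may differ from this hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Hypothetical helper: return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss so far by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training ran to the last epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation-loss curve: improves, then rises as the model overfits
losses = [0.50, 0.30, 0.22, 0.20, 0.21, 0.23, 0.26, 0.30]
stop, best = early_stopping_epoch(losses, patience=3)
print(f"stopped after epoch {stop}; restore weights from epoch {best}")
```

For this curve, the best validation loss occurs at epoch 3. Training stops at epoch 6 after three epochs without improvement, and the weights saved at epoch 3 are the ones to keep.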

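The mini-batch noise described in item 7 is easy to see in a NumPy sketch of the Batch Normalization forward pass for a fully connected layer. The same input row is placed in two different random batches and comes out normalized differently. Inference mode uses the running statistics and is deterministic. The `batch_norm` helper and the random data are hypothetical. The defaults `momentum=0.99` and `eps=1e-3` are chosen to match Keras's `BatchNormalization` layer. The learnable scale `gamma` and shift `beta` are fixed at 1 and 0 for simplicity.

```python
import numpy as np

def batch_norm(x, gamma, beta, running_mean, running_var,
               training, momentum=0.99, eps=1e-3):
    """Hypothetical Batch Normalization forward pass over axis 0 of a 2-D input.
    Returns the output and the (possibly updated) running statistics."""
    if training:
        mean = x.mean(axis=0)  # statistics of this mini-batch only
        var = x.var(axis=0)
        # Exponential moving averages, used later at inference
        running_mean = momentum * running_mean + (1 - momentum) * mean
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        mean, var = running_mean, running_var
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta, running_mean, running_var

rng = np.random.default_rng(0)
features = 4
gamma, beta = np.ones(features), np.zeros(features)
run_mean, run_var = np.zeros(features), np.ones(features)

example = rng.normal(size=(1, features))  # one fixed input row
batch_a = np.vstack([example, rng.normal(size=(31, features))])
batch_b = np.vstack([example, rng.normal(size=(31, features))])

out_a, run_mean, run_var = batch_norm(batch_a, gamma, beta, run_mean, run_var, training=True)
out_b, run_mean, run_var = batch_norm(batch_b, gamma, beta, run_mean, run_var, training=True)

# Same input row, different outputs: the batch statistics act as noise
print("row 0 in batch A:", out_a[0])
print("row 0 in batch B:", out_b[0])

# At inference, the running statistics make the output deterministic
out_eval, _, _ = batch_norm(example, gamma, beta, run_mean, run_var, training=False)
print("row 0 at inference:", out_eval[0])
```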
### Part 3: Applying Regularization

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.model_selection import train_test_split

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize the pixel values to the range [0, 1]
X_train = X_train / 255.0
X_test = X_test / 255.0

# Flatten each 28x28 image into a 784-element vector
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)

# Hold out 20% of the training set for validation
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Define model input size and number of classes
input_shape = X_train.shape[1]  # 784 features per flattened image
num_classes = 10  # 10 classes (digits 0-9)

# Define the model architecture with Dropout.
# An explicit Input layer replaces the `input_shape=` argument on the first
# Dense layer, which recent Keras versions flag with a warning.
model_with_dropout = models.Sequential([
    layers.Input(shape=(input_shape,)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),  # Drop 20% of units (active during training only)
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),  # Drop 20% of units (active during training only)
    layers.Dense(num_classes, activation='softmax')
])

# Compile the model; sparse categorical cross-entropy takes integer labels directly
model_with_dropout.compile(optimizer='adam',
                           loss='sparse_categorical_crossentropy',
                           metrics=['accuracy'])

# Train the model with Dropout
history_with_dropout = model_with_dropout.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val))

# Evaluate the model with Dropout on the held-out test set
test_loss_dropout, test_acc_dropout = model_with_dropout.evaluate(X_test, y_test)

# Print model performance with Dropout
print("Model Performance with Dropout:")
print("Test Loss:", test_loss_dropout)
print("Test Accuracy:", test_acc_dropout)
```




Epoch 1/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.8041 - loss: 0.6190 - val_accuracy: 0.9553 - val_loss: 0.1484
Epoch 2/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9480 - loss: 0.1785 - val_accuracy: 0.9693 - val_loss: 0.1012
Epoch 3/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9590 - loss: 0.1320 - val_accuracy: 0.9712 - val_loss: 0.0942
Epoch 4/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9641 - loss: 0.1132 - val_accuracy: 0.9727 - val_loss: 0.0866
Epoch 5/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9701 - loss: 0.0966 - val_accuracy: 0.9746 - val_loss: 0.0853
Epoch 6/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9740 - loss: 0.0852 - val_accuracy: 0.9761 - val_loss: 0.0847
Epoch 7/10
[1m1