Q1.  **Deep Learning.**

    1.  **Build a DNN with five hidden layers of 100 neurons each, He
        initialization, and the ELU activation function.**

    2.  **Using Adam optimization and early stopping, try training it on
        MNIST but only on digits 0 to 4, as we will use transfer
        learning for digits 5 to 9 in the next exercise. You will need a
        softmax output layer with five neurons, and as always make sure
        to save checkpoints at regular intervals and save the final
        model so you can reuse it later.**

    3.  **Tune the hyperparameters using cross-validation and see what
        precision you can achieve.**

    4.  **Now try adding Batch Normalization and compare the learning
        curves: is it converging faster than before? Does it produce a
        better model?**

    5.  **Is the model overfitting the training set? Try adding dropout
        to every layer and try again. Does it help?**

> **a. Here's how you can build a DNN with five hidden layers of 100
> neurons each, using He initialization and the ELU activation
> function:**
>
>     import tensorflow as tf
>     from tensorflow import keras
>
>     # Load the MNIST dataset
>     (X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
>
>     # Keep only digits 0 to 4
>     train_mask = y_train < 5
>     test_mask = y_test < 5
>     X_train, y_train = X_train[train_mask], y_train[train_mask]
>     X_test, y_test = X_test[test_mask], y_test[test_mask]
>
>     # Flatten the images, scale pixels to [0, 1], and one-hot encode the labels
>     X_train = X_train.reshape(-1, 28*28) / 255.0
>     X_test = X_test.reshape(-1, 28*28) / 255.0
>     y_train = keras.utils.to_categorical(y_train, num_classes=5)
>     y_test = keras.utils.to_categorical(y_test, num_classes=5)
>
>     # Build the model: five ELU hidden layers with He initialization
>     model = keras.models.Sequential([
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal",
>                            input_shape=(28*28,)),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.Dense(5, activation="softmax")
>     ])
>
>     # Compile the model
>     model.compile(loss="categorical_crossentropy", optimizer="adam",
>                   metrics=["accuracy"])
>
>     # Stop when the validation loss stalls, and checkpoint the best model so far
>     callbacks = [
>         keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
>         keras.callbacks.ModelCheckpoint("mnist_dnn.h5", save_best_only=True)
>     ]
>
>     # Train the model, holding out 10% of the training data for validation
>     # so that the test set is not used for early stopping
>     history = model.fit(X_train, y_train, epochs=100,
>                         validation_split=0.1, callbacks=callbacks)
>
>     # Save the final model
>     model.save("mnist_dnn_final.h5")
>
>     # Evaluate once on the untouched test set
>     model.evaluate(X_test, y_test)
>
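To make the two ingredients concrete, here is a minimal NumPy sketch of what He initialization and the ELU activation compute. The helper names `he_normal` and `elu` are illustrative and are not the Keras implementations; in particular, Keras's `he_normal` initializer draws from a truncated normal distribution, so its values differ in detail.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """Draw a (fan_in, fan_out) weight matrix with standard deviation sqrt(2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def elu(z, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(z) - 1) otherwise."""
    z = np.asarray(z, dtype=float)
    # np.minimum keeps exp() from overflowing on large positive inputs
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0.0)) - 1.0))

rng = np.random.default_rng(42)
weights = he_normal(784, 100, rng)            # first hidden layer: 784 inputs, 100 neurons
activations = elu(rng.random((3, 784)) @ weights)
print(weights.std(), activations.shape)
```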
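> **b. The code above already covers this step: it keeps only digits 0
> to 4, trains with the Adam optimizer, and uses early stopping to halt
> training once the validation loss stops improving. The output layer
> has five neurons with softmax activation, the best checkpoint is
> written to `mnist_dnn.h5` during training, and the final model is
> saved to `mnist_dnn_final.h5`.**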
>
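For intuition, the patience rule behind early stopping can be sketched in plain Python. This is a simplified illustration, not Keras's implementation: the function name `early_stopping_epoch` and the sample loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Patience never ran out: training runs to the last epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improvement stalls after epoch 2
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)
```

The gap between the two returned epochs is why it is worth keeping the best checkpoint rather than only the weights from the last epoch.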
> **c.** **To tune the hyperparameters, you can use cross-validation.
> You can vary parameters like the number of hidden layers, the number
> of neurons per layer, the learning rate, or the activation function to
> find the best combination that yields the highest precision.**
>
>     from sklearn.model_selection import GridSearchCV
>
>     # Build a model from the hyperparameters being tuned
>     def create_model(hidden_layers=5, neurons_per_layer=100, learning_rate=0.001):
>         model = keras.models.Sequential()
>         model.add(keras.Input(shape=(28*28,)))
>         for _ in range(hidden_layers):
>             model.add(keras.layers.Dense(neurons_per_layer, activation="elu",
>                                          kernel_initializer="he_normal"))
>         model.add(keras.layers.Dense(5, activation="softmax"))
>         optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
>         model.compile(loss="categorical_crossentropy", optimizer=optimizer,
>                       metrics=["accuracy"])
>         return model
>
>     # Wrap the model for scikit-learn. This wrapper is no longer shipped with
>     # recent TensorFlow releases; the SciKeras package offers a replacement.
>     model = keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_model)
>
>     # Define the hyperparameters to tune (324 combinations in total, so trim
>     # the grid if training time is a concern)
>     param_grid = {
>         "epochs": [50, 100, 150],
>         "batch_size": [32, 64, 128],
>         "learning_rate": [0.001, 0.01, 0.1],
>         "hidden_layers": [2, 3, 4, 5],
>         "neurons_per_layer": [50, 100, 200]
>     }
>
>     # Perform grid search with 3-fold cross-validation
>     grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
>     grid_result = grid.fit(X_train, y_train)
>
>     # best_score_ is the mean cross-validated score (accuracy for this
>     # wrapper) of the best combination
>     best_score = grid_result.best_score_
>     print("Best cross-validated score: %.4f" % best_score)
>     print("Best hyperparameters:", grid_result.best_params_)
>
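The score reported by a grid search like this one is the wrapped classifier's default score, which for a Keras classifier wrapper is an accuracy. If per-class precision is what you want to report, one way to compute it from true and predicted class indices is sketched below. The helper name `per_class_precision` and the toy labels are illustrative.

```python
import numpy as np

def per_class_precision(y_true, y_pred, num_classes):
    """Precision per class: correct predictions of a class / all predictions of that class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    precisions = np.zeros(num_classes)
    for c in range(num_classes):
        predicted_c = (y_pred == c)
        if predicted_c.any():  # leave precision at 0 if the class is never predicted
            precisions[c] = np.mean(y_true[predicted_c] == c)
    return precisions

# Toy example with three classes (made-up labels)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precisions = per_class_precision(y_true, y_pred, num_classes=3)
print(precisions, precisions.mean())
```

With a Keras model, `y_pred` would be `model.predict(X).argmax(axis=1)`, and one-hot labels can be turned back into class indices with `y.argmax(axis=1)`.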
> **d.** **To add Batch Normalization and compare the learning curves,
> you can modify the model architecture as follows**:
>
>     model = keras.models.Sequential([
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal",
>                            input_shape=(28*28,)),
>         keras.layers.BatchNormalization(),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.BatchNormalization(),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.BatchNormalization(),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.BatchNormalization(),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.BatchNormalization(),
>         keras.layers.Dense(5, activation="softmax")
>     ])
>
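As a rough picture of what each `BatchNormalization` layer does in training mode, the NumPy sketch below standardizes every feature over the batch and then applies a scale and a shift. The function name `batch_norm_forward` is illustrative; the real layer also learns `gamma` and `beta` and tracks moving averages for use at inference time, which is omitted here.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, epsilon=1e-3):
    """Training-mode batch normalization for a (batch, features) array."""
    mean = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                        # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    return gamma * x_hat + beta                # scale and shift

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 100))
normalized = batch_norm_forward(batch, gamma=np.ones(100), beta=np.zeros(100))
print(normalized.mean(axis=0)[:3], normalized.std(axis=0)[:3])
```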
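> Compile and train this model with the same `compile` and `fit` calls
> as in part a, keeping each run's `history` object, then compare the
> learning curves. If the validation loss reaches a given level in fewer
> epochs with Batch Normalization, it is converging faster; a lower
> final validation loss suggests a better model.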
>
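One simple way to put a number on "converging faster" is to count the epochs each run needs to reach a given validation loss. The sketch below works on plain lists such as `history.history["val_loss"]`. The function name `epochs_to_reach`, the threshold, and the loss values are hypothetical and only stand in for real training runs.

```python
def epochs_to_reach(val_losses, threshold):
    """Return the 1-based epoch at which the loss first drops to `threshold`, or None."""
    for epoch, loss in enumerate(val_losses, start=1):
        if loss <= threshold:
            return epoch
    return None

# Made-up curves standing in for the two runs' history.history["val_loss"]
plain_val_loss = [0.30, 0.15, 0.09, 0.06, 0.05]
batch_norm_val_loss = [0.20, 0.08, 0.05, 0.05, 0.04]

plain_epochs = epochs_to_reach(plain_val_loss, threshold=0.06)
bn_epochs = epochs_to_reach(batch_norm_val_loss, threshold=0.06)
print(plain_epochs, bn_epochs)
```

Epoch counts ignore the time per epoch, so it is worth comparing wall-clock time as well before drawing conclusions.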
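> **e. To check whether the model is overfitting, compare its training
> accuracy with its validation accuracy: a large gap between the two
> means it is overfitting. To reduce it, you can add dropout after every
> hidden layer. During training, dropout randomly sets a fraction of its
> input units to 0 at each update, which helps prevent overfitting.**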
>
>     model = keras.models.Sequential([
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal",
>                            input_shape=(28*28,)),
>         keras.layers.Dropout(0.5),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.Dropout(0.5),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.Dropout(0.5),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.Dropout(0.5),
>         keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
>         keras.layers.Dropout(0.5),
>         keras.layers.Dense(5, activation="softmax")
>     ])
>
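As a rough picture of what each `Dropout(0.5)` layer does during training, here is a minimal NumPy sketch of inverted dropout: a random fraction of the units is zeroed and the survivors are scaled up so the expected activation is unchanged. The function name `dropout_forward` is illustrative and not the Keras implementation.

```python
import numpy as np

def dropout_forward(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x                                # dropout is a no-op at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob      # True for the units that are kept
    return x * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
dropped = dropout_forward(activations, rate=0.5, rng=rng)
print((dropped == 0).mean(), dropped.mean())
```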
> By adding dropout to every layer, you can train the model again and
> observe if it helps reduce overfitting.

Q2.  **Transfer learning.**

    1.  **Create a new DNN that reuses all the pretrained hidden layers
        of the previous model, freezes them, and replaces the softmax
        output layer with a new one.**

    2.  **Train this new DNN on digits 5 to 9, using only 100 images per
        digit, and time how long it takes. Despite this small number of
        examples, can you achieve high precision?**

    3.  **Try caching the frozen layers, and train the model again: how
        much faster is it now?**

    4.  **Try again reusing just four hidden layers instead of five. Can
        you achieve a higher precision?**

    5.  **Now unfreeze the top two hidden layers and continue training:
        can you get the model to perform even better?**

> **a. To create a new DNN that reuses the pretrained hidden layers of
> the previous model, freezes them, and replaces the softmax output
> layer with a new one, you can follow these steps:**
>
>     import tensorflow as tf
>     from tensorflow import keras
>
>     # Load the previously trained model
>     pretrained_model = keras.models.load_model("mnist_dnn_final.h5")
>
>     # Freeze the pretrained layers
>     for layer in pretrained_model.layers:
>         layer.trainable = False
>
>     # Remove the old softmax layer
>     pretrained_model.pop()
>
>     # Add a new (trainable) softmax output layer
>     pretrained_model.add(keras.layers.Dense(5, activation="softmax"))
>
>     # Compile the model
>     pretrained_model.compile(loss="categorical_crossentropy",
>                              optimizer="adam", metrics=["accuracy"])
>
> **b. To train the new DNN on digits 5 to 9 using a small number of
> images per digit, you can modify the MNIST dataset accordingly and
> train the model. Note that with a small number of images, achieving
> high precision may be challenging, but it's still worth trying.**
>
>     import numpy as np
>
>     # Load the MNIST dataset
>     (X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
>
>     # Keep only digits 5 to 9
>     train_mask = y_train >= 5
>     test_mask = y_test >= 5
>     X_train, y_train = X_train[train_mask], y_train[train_mask]
>     X_test, y_test = X_test[test_mask], y_test[test_mask]
>
>     # Keep only the first 100 training images of each digit (500 in total)
>     num_images_per_digit = 100
>     selected = np.concatenate([
>         np.where(y_train == digit)[0][:num_images_per_digit]
>         for digit in range(5, 10)
>     ])
>     X_train, y_train = X_train[selected], y_train[selected]
>
>     # Flatten and scale the images; shift labels 5-9 to 0-4 and one-hot encode
>     X_train = X_train.reshape(-1, 28*28) / 255.0
>     X_test = X_test.reshape(-1, 28*28) / 255.0
>     y_train = keras.utils.to_categorical(y_train - 5, num_classes=5)
>     y_test = keras.utils.to_categorical(y_test - 5, num_classes=5)
>
>     # Train the model
>     history = pretrained_model.fit(X_train, y_train, epochs=100,
>                                    validation_data=(X_test, y_test))
>
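The exercise also asks how long training takes. A small context manager around `time.perf_counter` is one way to measure it. The name `timer` is an assumption of this sketch, and the `sum(...)` call merely stands in for the `fit` call.

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(results, label):
    """Store the elapsed wall-clock seconds of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}
with timer(timings, "frozen_layers"):
    total = sum(i * i for i in range(100_000))  # stand-in for pretrained_model.fit(...)
print("Elapsed: %.4f s" % timings["frozen_layers"])
```

Recording each run under its own label makes it easy to compare the timings of the different variants afterwards.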
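> **c. Keras's `fit` method has no `cache` argument. To cache the frozen
> layers, run the data through them once, store the outputs, and train
> only a new output layer on those stored activations, so that the
> frozen layers are not recomputed at every epoch.**
>
>     # Run the data through the frozen hidden layers once and cache the outputs
>     frozen_layers = keras.models.Sequential(pretrained_model.layers[:-1])
>     X_train_cached = frozen_layers.predict(X_train)
>     X_test_cached = frozen_layers.predict(X_test)
>
>     # Train a fresh softmax layer on the cached activations
>     top_layer = keras.models.Sequential([
>         keras.layers.Dense(5, activation="softmax", input_shape=(100,))
>     ])
>     top_layer.compile(loss="categorical_crossentropy", optimizer="adam",
>                       metrics=["accuracy"])
>     history = top_layer.fit(X_train_cached, y_train, epochs=100,
>                             validation_data=(X_test_cached, y_test))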
>
> The speedup achieved by caching the frozen layers will depend on the
> specific hardware and software configuration.
>
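> **d. To reuse just four hidden layers instead of all five, build a new
> model from the first four pretrained layers and add a new output layer
> on top, rather than popping from `pretrained_model.layers`, which does
> not alter the model. This lets you check whether dropping the top
> hidden layer leads to higher precision.**
>
>     # Reload the saved model and keep only its first four hidden layers
>     base_model = keras.models.load_model("mnist_dnn_final.h5")
>     four_layer_model = keras.models.Sequential(base_model.layers[:4])
>     for layer in four_layer_model.layers:
>         layer.trainable = False
>
>     # Add a new output layer and compile
>     four_layer_model.add(keras.layers.Dense(5, activation="softmax"))
>     four_layer_model.compile(loss="categorical_crossentropy",
>                              optimizer="adam", metrics=["accuracy"])
>
>     # Train the model with four hidden layers
>     history = four_layer_model.fit(X_train, y_train, epochs=100,
>                                    validation_data=(X_test, y_test))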
>
> **e. To unfreeze the top two hidden layers and continue training, set
> the `trainable` property of those layers to `True`, then recompile
> before training again. Note that the last layer of the model is the
> output layer, so the top two hidden layers are the two just below
> it.**
>
>     # Unfreeze the top two hidden layers (layers[-1] is the output layer)
>     for layer in pretrained_model.layers[-3:-1]:
>         layer.trainable = True
>
>     # Recompile so that the change takes effect
>     pretrained_model.compile(loss="categorical_crossentropy",
>                              optimizer="adam", metrics=["accuracy"])
>
>     # Continue training with the top two hidden layers unfrozen
>     history = pretrained_model.fit(X_train, y_train, epochs=100,
>                                    validation_data=(X_test, y_test))
>
> By unfreezing the top two hidden layers, the model has the opportunity
> to adapt and potentially improve its performance.

Q3.  **Pretraining on an auxiliary task.**

    1.  **In this exercise you will build a DNN that compares two MNIST
        digit images and predicts whether they represent the same digit
        or not. Then you will reuse the lower layers of this network to
        train an MNIST classifier using very little training data. Start
        by building two DNNs (let’s call them DNN A and B), both similar
        to the one you built earlier but without the output layer: each
        DNN should have five hidden layers of 100 neurons each, He
        initialization, and ELU activation. Next, add one more hidden
        layer with 10 units on top of both DNNs. To do this, you should
        use TensorFlow’s concat() function with axis=1 to concatenate
        the outputs of both DNNs for each instance, then feed the result
        to the hidden layer. Finally, add an output layer with a single
        neuron using the logistic activation function.**

    2.  **Split the MNIST training set into two sets: split #1 should
        contain 55,000 images, and split #2 should contain
        5,000 images. Create a function that generates a training batch
        where each instance is a pair of MNIST images picked from split
        #1. Half of the training instances should be pairs of images
        that belong to the same class, while the other half should be
        images from different classes. For each pair, the training label
        should be 0 if the images are from the same class, or 1 if they
        are from different classes.**

    3.  **Train the DNN on this training set. For each image pair, you
        can simultaneously feed the first image to DNN A and the second
        image to DNN B. The whole network will gradually learn to tell
        whether two images belong to the same class or not.**

    4.  **Now create a new DNN by reusing and freezing the hidden layers
        of DNN A and adding a softmax output layer on top with 10
        neurons. Train this network on split #2 and see if you can
        achieve high performance despite having only 500 images per
        class.**

> **a. To build two DNNs (DNN A and DNN B) for comparing MNIST digit
> images and predicting whether they represent the same digit or not,
> you can follow these steps:**
>
>     import tensorflow as tf
>     from tensorflow import keras
>
>     # Five ELU hidden layers of 100 neurons each, He initialization, no output layer
>     def build_dnn():
>         dnn = keras.models.Sequential()
>         dnn.add(keras.layers.Dense(100, activation="elu",
>                                    kernel_initializer="he_normal",
>                                    input_shape=(28*28,)))
>         for _ in range(4):
>             dnn.add(keras.layers.Dense(100, activation="elu",
>                                        kernel_initializer="he_normal"))
>         return dnn
>
>     # DNN A and DNN B
>     dnn_a = build_dnn()
>     dnn_b = build_dnn()
>
>     # Concatenate outputs of DNN A and DNN B
>     concat = keras.layers.Concatenate(axis=1)([dnn_a.output, dnn_b.output])
>
>     # Add a hidden layer with 10 units
>     hidden = keras.layers.Dense(10, activation="elu")(concat)
>
>     # Output layer with a single neuron using logistic activation
>     output = keras.layers.Dense(1, activation="sigmoid")(hidden)
>
>     # Combine DNN A and DNN B into a single model
>     model = keras.models.Model(inputs=[dnn_a.input, dnn_b.input],
>                                outputs=output)
>
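To see what concatenating with `axis=1` does to the shapes, the NumPy sketch below uses two placeholder arrays in place of the outputs of DNN A and DNN B. The array names and the batch size of 4 are illustrative.

```python
import numpy as np

batch_size = 4
output_a = np.zeros((batch_size, 100))  # stand-in for DNN A's output
output_b = np.ones((batch_size, 100))   # stand-in for DNN B's output

# axis=1 places the two 100-unit outputs side by side for each instance
merged = np.concatenate([output_a, output_b], axis=1)
# axis=0 would instead stack the instances, which is not what is wanted here
stacked = np.concatenate([output_a, output_b], axis=0)
print(merged.shape, stacked.shape)
```

Each row of `merged` holds the 200 features describing one image pair, which is what the 10-unit hidden layer receives.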
> **b. To generate a training batch with pairs of MNIST images, you can
> create a function that randomly selects images from split #1 and
> labels them as same class (0) or different classes (1) based on their
> true labels.**
>
>     import numpy as np
>
>     def generate_training_batch(images, labels, batch_size):
>         half_batch_size = batch_size // 2
>         first_indices, second_indices = [], []
>         for i in range(2 * half_batch_size):
>             idx_1 = np.random.randint(len(images))
>             if i < half_batch_size:
>                 # Same-class pair: a different image showing the same digit
>                 candidates = np.where(labels == labels[idx_1])[0]
>                 candidates = candidates[candidates != idx_1]
>             else:
>                 # Different-class pair: any image showing another digit
>                 candidates = np.where(labels != labels[idx_1])[0]
>             first_indices.append(idx_1)
>             second_indices.append(np.random.choice(candidates))
>         X_1 = images[first_indices]
>         X_2 = images[second_indices]
>         # Label 0 for same-class pairs, 1 for different-class pairs
>         y = np.concatenate([np.zeros(half_batch_size),
>                             np.ones(half_batch_size)], axis=0)
>         return [X_1, X_2], y
>
> **c. To train the DNN on the generated training set, compile the
> model, then call `generate_training_batch` in a loop and feed each
> image pair to DNN A and DNN B simultaneously.**
>
>     # Split the MNIST training set into split #1 and split #2
>     (X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
>     X_train_1, y_train_1 = X_train[:55000], y_train[:55000]
>     X_train_2, y_train_2 = X_train[55000:], y_train[55000:]
>
>     # Preprocess the data
>     X_train_1 = X_train_1.reshape(-1, 28*28) / 255.0
>     X_train_2 = X_train_2.reshape(-1, 28*28) / 255.0
>
>     # Compile the pair model (binary output: same digit or not)
>     model.compile(loss="binary_crossentropy", optimizer="adam",
>                   metrics=["accuracy"])
>
>     # Training loop
>     batch_size = 32
>     num_batches = len(X_train_1) // batch_size
>     for epoch in range(10):
>         for batch in range(num_batches):
>             X_batch, y_batch = generate_training_batch(X_train_1, y_train_1,
>                                                        batch_size)
>             model.train_on_batch(X_batch, y_batch)
>
> **d. To create a new DNN by reusing and freezing the hidden layers of
> DNN A and adding a softmax output layer on top, you can follow these
> steps:**
>
>     # Reuse and freeze the hidden layers of DNN A
>     dnn_a.trainable = False
>     dnn_a_outputs = dnn_a.layers[-1].output
>
>     # Add a softmax output layer with 10 neurons
>     softmax_layer = keras.layers.Dense(10, activation="softmax")(dnn_a_outputs)
>
>     # Create a new model from the frozen layers of DNN A and the softmax output layer
>     new_model = keras.models.Model(inputs=dnn_a.input, outputs=softmax_layer)
>
>     # Compile the new model
>     new_model.compile(loss="sparse_categorical_crossentropy",
>                       optimizer="adam", metrics=["accuracy"])
>
> You can then train the new model on split #2 and evaluate its
> performance despite having only 500 images per class.
>
>     # Split #2 was already preprocessed in part c, so only the test set
>     # still needs to be flattened and scaled
>     X_test = X_test.reshape(-1, 28*28) / 255.0
>
>     # Train the new model on split #2
>     new_model.fit(X_train_2, y_train_2, epochs=10,
>                   validation_data=(X_test, y_test))
>
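Because split #2 is simply the last 5,000 training images, it is not guaranteed to hold exactly 500 images of each digit. A quick count per class, sketched below with `np.bincount`, shows the actual balance. The helper name `class_counts` and the toy labels are illustrative.

```python
import numpy as np

def class_counts(labels, num_classes=10):
    """Return how many instances of each class 0..num_classes-1 appear in `labels`."""
    return np.bincount(np.asarray(labels), minlength=num_classes)

# Toy labels standing in for y_train_2
toy_labels = np.array([0, 1, 1, 3, 9, 9, 9])
counts = class_counts(toy_labels)
print(counts)
```

On the real data this would be called as `class_counts(y_train_2)`.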
> Despite having a small number of images per class, transfer learning
> from the pretrained DNN A can still enable the new model to achieve a
> reasonable performance.