1. Deep Learning.
a. Build a DNN with five hidden layers of 100 neurons each, He initialization, and the
ELU activation function.
b. Using Adam optimization and early stopping, try training it on MNIST but only on
digits 0 to 4, as we will use transfer learning for digits 5 to 9 in the next exercise. You
will need a softmax output layer with five neurons, and as always make sure to save
checkpoints at regular intervals and save the final model so you can reuse it later.
c. Tune the hyperparameters using cross-validation and see what precision you can
achieve.
d. Now try adding Batch Normalization and compare the learning curves: is it
converging faster than before? Does it produce a better model?
e. Is the model overfitting the training set? Try adding dropout to every layer and try
again. Does it help?
2. Transfer learning.
a. Create a new DNN that reuses all the pretrained hidden layers of the previous
model, freezes them, and replaces the softmax output layer with a new one.
b. Train this new DNN on digits 5 to 9, using only 100 images per digit, and time how
long it takes. Despite this small number of examples, can you achieve high precision?
c. Try caching the frozen layers, and train the model again: how much faster is it now?
d. Try again reusing just four hidden layers instead of five. Can you achieve a higher
precision?
e. Now unfreeze the top two hidden layers and continue training: can you get the
model to perform even better?
3. Pretraining on an auxiliary task.
a. In this exercise you will build a DNN that compares two MNIST digit images and
predicts whether they represent the same digit or not. Then you will reuse the lower
layers of this network to train an MNIST classifier using very little training data. Start
by building two DNNs (let’s call them DNN A and B), both similar to the one you built
earlier but without the output layer: each DNN should have five hidden layers of 100
neurons each, He initialization, and ELU activation. Next, add one more hidden layer
with 10 units on top of both DNNs. To do this, you should use
TensorFlow’s concat() function with axis=1 to concatenate the outputs of both DNNs
for each instance, then feed the result to the hidden layer. Finally, add an output
layer with a single neuron using the logistic activation function.
b. Split the MNIST training set into two sets: split #1 should contain 55,000 images,
and split #2 should contain 5,000 images. Create a function that generates a
training batch where each instance is a pair of MNIST images picked from split #1.
Half of the training instances should be pairs of images that belong to the same
class, while the other half should be images from different classes. For each pair, the
training label should be 0 if the images are from the same class, or 1 if they are from
different classes.
c. Train the DNN on this training set. For each image pair, you can simultaneously feed
the first image to DNN A and the second image to DNN B. The whole network will
gradually learn to tell whether two images belong to the same class or not.
d. Now create a new DNN by reusing and freezing the hidden layers of DNN A and
adding a softmax output layer on top with 10 neurons. Train this network on split #2
and see if you can achieve high performance despite having only 500 images per
class.

## **Answers to the Exercises Above**

1a. Here is example code that builds a DNN with five hidden layers of 100 neurons each, He initialization, and the ELU activation function using Keras:

```python
import tensorflow as tf
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal")
])
```
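
To see what the two options in 1a actually do, here is a NumPy-only sketch (not the Keras implementation). He initialization draws weights with standard deviation sqrt(2 / fan_in), and ELU outputs z for z > 0 and alpha * (exp(z) - 1) otherwise. Keras's `he_normal` samples from a truncated normal, so the plain normal used here is a simplification; `elu`, `he_normal`, and `forward` are hypothetical helper names chosen for illustration.

```python
import numpy as np

def elu(z, alpha=1.0):
    """ELU: z for z > 0, alpha * (exp(z) - 1) otherwise."""
    # np.minimum keeps exp() from overflowing on large positive inputs
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0)) - 1))

def he_normal(fan_in, fan_out, rng):
    """Hypothetical helper: normal weights with std = sqrt(2 / fan_in).
    (Keras's he_normal uses a truncated normal; this is a simplification.)"""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def forward(X, n_hidden=5, n_neurons=100, seed=42):
    """Push a batch through n_hidden ELU layers; return each layer's output std."""
    rng = np.random.default_rng(seed)
    a = X.reshape(len(X), -1)  # flatten 28x28 images into 784-dim vectors
    stds = []
    for _ in range(n_hidden):
        W = he_normal(a.shape[1], n_neurons, rng)
        a = elu(a @ W)  # biases start at zero, as with Keras Dense defaults
        stds.append(a.std())
    return stds
```

Calling `forward` on a batch of random 28×28 inputs should give per-layer standard deviations that stay roughly constant from layer to layer instead of shrinking or blowing up, which is the reason for pairing He initialization with ELU.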

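1b. Here is example code that trains the DNN on MNIST digits 0 to 4 using Adam optimization and early stopping:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load MNIST and keep only digits 0 to 4
(X_train_full, y_train_full), (X_test, y_test) = mnist.load_data()
X_train_04 = X_train_full[y_train_full < 5]
y_train_04 = y_train_full[y_train_full < 5]
X_test_04 = X_test[y_test < 5]
y_test_04 = y_test[y_test < 5]

# Scale pixel values to [0, 1] and one-hot encode the labels
X_train_04 = X_train_04.astype(np.float32) / 255.0
X_test_04 = X_test_04.astype(np.float32) / 255.0
y_train_04 = to_categorical(y_train_04, num_classes=5)
y_test_04 = to_categorical(y_test_04, num_classes=5)

# Hold out a validation set so the test set is only used for the final evaluation
X_valid_04, y_valid_04 = X_train_04[:5000], y_train_04[:5000]
X_train_04, y_train_04 = X_train_04[5000:], y_train_04[5000:]

# Add the softmax output layer with five neurons (one per digit 0 to 4)
model.add(keras.layers.Dense(5, activation="softmax"))

# Compile the model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Save a checkpoint after every epoch that improves the validation loss, and stop
# early once the validation loss has not improved for 5 epochs
checkpoint_cb = keras.callbacks.ModelCheckpoint("my_model.h5", save_best_only=True)
early_stopping_cb = keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)

# Train the model
history = model.fit(X_train_04, y_train_04, epochs=100,
                    validation_data=(X_valid_04, y_valid_04),
                    callbacks=[checkpoint_cb, early_stopping_cb])

# Evaluate on the test set, then save the final model for reuse in exercise 2
model.evaluate(X_test_04, y_test_04)
model.save("my_model.h5")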
```
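
The `EarlyStopping(patience=5, restore_best_weights=True)` callback is easier to reason about with its rule spelled out. Here is a plain-Python sketch of that rule under the defaults used above (monitor the validation loss, `min_delta=0`): training halts once `patience` consecutive epochs pass without a new best validation loss, and the weights from the best epoch are the ones restored. `early_stopping_epoch` is a hypothetical helper for illustration, not part of Keras.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Hypothetical helper that simulates EarlyStopping on a list of val losses.

    Returns (stop_epoch, best_epoch), both 0-based. If the patience is never
    exhausted, stop_epoch is simply the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:  # strict improvement (min_delta=0)
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

For example, with validation losses `[1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.74, 0.73, 0.9]` and `patience=5`, the best loss occurs at epoch 2 and training stops at epoch 7, so the last epoch is never run.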

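1c. Here is example code that tunes the hyperparameters with cross-validation and then measures the precision of the best model:

```python
from sklearn.metrics import precision_score
from sklearn.model_selection import GridSearchCV
# tf.keras.wrappers.scikit_learn is deprecated and has been removed from recent
# TensorFlow releases; there, the SciKeras package provides a KerasClassifier
# with a somewhat different API.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# Build and compile a model for one combination of hyperparameters
def create_model(n_hidden=5, n_neurons=100, learning_rate=1e-3):
    model = keras.models.Sequential()
    model.add(keras.layers.Flatten(input_shape=[28, 28]))
    for _ in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="elu", kernel_initializer="he_normal"))
    model.add(keras.layers.Dense(5, activation="softmax"))
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    # Sparse loss, because the grid search below is fed integer labels
    model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
    return model

# Wrap the builder so scikit-learn can treat it as an estimator
keras_clf = KerasClassifier(create_model)

# Hyperparameters to tune: 5 x 4 x 3 = 60 combinations, each trained once per fold
param_grid = {"n_hidden": [1, 2, 3, 4, 5],
              "n_neurons": [50, 100, 150, 200],
              "learning_rate": [1e-2, 1e-3, 1e-4]}

# Perform grid search with 3-fold cross-validation; the labels were one-hot
# encoded in 1b, so convert them back to integers with argmax
grid_search_cv = GridSearchCV(keras_clf, param_grid, cv=3)
grid_search_cv.fit(X_train_04, y_train_04.argmax(axis=1), epochs=100, validation_split=0.1,
                   callbacks=[keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)])
print(grid_search_cv.best_params_, grid_search_cv.best_score_)

# Macro-averaged precision of the best model on the test set
best_model = grid_search_cv.best_estimator_.model
y_pred = best_model.predict(X_test_04).argmax(axis=1)
print(precision_score(y_test_04.argmax(axis=1), y_pred, average="macro"))
```

2a. Here is example code that reuses all the pretrained hidden layers of the model from 1b, freezes them, and replaces the softmax output layer with a new one for digits 5 to 9:

```python
from tensorflow import keras

# Load the model trained on digits 0 to 4 (saved as my_model.h5 in 1b)
model_0_to_4 = keras.models.load_model("my_model.h5")

# Reuse every layer except the old softmax output layer
# (the layers are shared with model_0_to_4, not copied)
model_5_to_9 = keras.models.Sequential(model_0_to_4.layers[:-1])

# Freeze the reused hidden layers
for layer in model_5_to_9.layers:
    layer.trainable = False

# Add a new softmax output layer for digits 5 to 9
model_5_to_9.add(keras.layers.Dense(5, activation="softmax", name="new_output"))

# Compile the model (changes to `trainable` only take effect after compiling).
# This loss expects integer labels 0 to 4, so shift digits 5 to 9 (y - 5).
model_5_to_9.compile(loss="sparse_categorical_crossentropy",
                     optimizer="adam",
                     metrics=["accuracy"])
```

3a. Here is the code to build DNN A and DNN B:

```python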
import tensorflow as tf
from tensorflow import keras

# DNN A: five hidden layers of 100 ELU neurons with He initialization, no output layer
dnn_a = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
], name="dnn_a")

# DNN B: same architecture, but with its own (unshared) weights
dnn_b = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
], name="dnn_b")

# One input per image of the pair
input_a = keras.layers.Input(shape=[28, 28])
input_b = keras.layers.Input(shape=[28, 28])

# Concatenate the outputs of DNN A and DNN B for each instance (axis=1);
# the Concatenate layer is the Keras counterpart of tf.concat(..., axis=1)
concat = keras.layers.Concatenate(axis=1)([dnn_a(input_a), dnn_b(input_b)])

# Add a hidden layer with 10 units on top of the concatenated output
hidden = keras.layers.Dense(10, activation="elu", kernel_initializer="he_normal")(concat)

# Add an output layer with a single neuron using the logistic (sigmoid) activation
output = keras.layers.Dense(1, activation="sigmoid")(hidden)

# Create the final model, which takes a pair of images as input
model = keras.models.Model(inputs=[input_a, input_b], outputs=output)
```
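
To make the role of `axis=1` concrete, here is a NumPy-only sketch of the comparison head's forward pass, using random weights and zero biases purely for illustration (it is not the trained Keras model). `pair_head_forward` is a hypothetical helper; it assumes each DNN outputs a (batch, 100) array, so concatenating along axis 1 yields one 200-dimensional row per image pair, whereas axis 0 would instead stack the two batches on top of each other.

```python
import numpy as np

def pair_head_forward(out_a, out_b, rng):
    """Hypothetical helper: comparison head on top of DNN A and DNN B.

    out_a, out_b: (batch, 100) outputs of the top hidden layers of DNN A and B.
    Returns (batch, 1) sigmoid outputs, read as the probability of label 1
    ("different digits") under the labeling of exercise 3b.
    """
    concat = np.concatenate([out_a, out_b], axis=1)  # (batch, 200): one row per pair
    # Hidden layer: 10 ELU units with He-normal weights (illustrative random init)
    W1 = rng.normal(0.0, np.sqrt(2.0 / concat.shape[1]), size=(concat.shape[1], 10))
    z1 = concat @ W1
    hidden = np.where(z1 > 0, z1, np.exp(np.minimum(z1, 0)) - 1)  # ELU, alpha=1
    # Output layer: a single neuron with the logistic activation
    W2 = rng.normal(0.0, np.sqrt(1.0 / 10), size=(10, 1))
    logits = hidden @ W2
    return 1.0 / (1.0 + np.exp(-logits))
```

Because the sigmoid output estimates the probability of label 1, a value above 0.5 means the network leans toward "different digits" with the 0 = same, 1 = different convention.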

3b. Here is the code to generate a training batch of image pairs (call it with the images and labels of split #1):

```python
import numpy as np

def generate_batch(X, y, batch_size):
    """Return ([X1, X2], labels): label 0 = same digit, 1 = different digits."""
    n_half = batch_size // 2
    # Indices of the images of each digit (0 to 9)
    class_indices = [np.where(y == digit)[0] for digit in range(10)]
    X1_batch, X2_batch, y_batch = [], [], []
    # First half: two distinct images of the same digit
    for _ in range(n_half):
        digit = np.random.randint(0, 10)
        idx1, idx2 = np.random.choice(class_indices[digit], size=2, replace=False)
        X1_batch.append(X[idx1])
        X2_batch.append(X[idx2])
        y_batch.append(0)
    # Second half: images of two different digits
    for _ in range(n_half):
        class1, class2 = np.random.choice(10, size=2, replace=False)
        X1_batch.append(X[np.random.choice(class_indices[class1])])
        X2_batch.append(X[np.random.choice(class_indices[class2])])
        y_batch.append(1)
    X1_batch = np.array(X1_batch)
    X2_batch = np.array(X2_batch)
    y_batch = np.array(y_batch)
    # Shuffle so same-digit and different-digit pairs are interleaved
    # (len(y_batch) also handles an odd batch_size)
    shuffle_indices = np.random.permutation(len(y_batch))
    return [X1_batch[shuffle_indices], X2_batch[shuffle_indices]], y_batch[shuffle_indices]
```
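
The batch generator expects split #1, which the code above does not create. Here is a minimal sketch of one way to build both splits, assuming you want split #2 to hold exactly 500 images per digit (a plain random 5,000-image split only gives roughly 500 per digit). `stratified_split` is a hypothetical helper name, not a library function.

```python
import numpy as np

def stratified_split(X, y, n_per_class=500, seed=42):
    """Hypothetical helper: split (X, y) into split #1 and split #2, where
    split #2 holds exactly n_per_class randomly chosen images of each class."""
    rng = np.random.default_rng(seed)
    # Pick n_per_class distinct indices from every class present in y
    split2_idx = np.concatenate([
        rng.choice(np.where(y == label)[0], size=n_per_class, replace=False)
        for label in np.unique(y)
    ])
    # Everything not picked for split #2 goes to split #1
    mask = np.zeros(len(y), dtype=bool)
    mask[split2_idx] = True
    return (X[~mask], y[~mask]), (X[mask], y[mask])
```

For example, `stratified_split(X_train_full, y_train_full)` on the 60,000-image MNIST training set returns 55,000 images in split #1 and 5,000 in split #2. Applied to the digits 5 to 9 with `n_per_class=100`, the same idea yields the 100-images-per-digit training set of exercise 2b.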

3c. Here is the code to train the DNN:

```python
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train_full = X_train_full.astype(np.float32) / 255