1. Deep Learning.
a. Build a DNN with five hidden layers of 100 neurons each, He initialization, and the
ELU activation function.
b. Using Adam optimization and early stopping, try training it on MNIST but only on
digits 0 to 4, as we will use transfer learning for digits 5 to 9 in the next exercise. You
will need a softmax output layer with five neurons, and as always make sure to save
checkpoints at regular intervals and save the final model so you can reuse it later.
c. Tune the hyperparameters using cross-validation and see what precision you can
achieve.
d. Now try adding Batch Normalization and compare the learning curves: is it
converging faster than before? Does it produce a better model?
e. Is the model overfitting the training set? Try adding dropout to every layer and try
again. Does it help?





Ans-


Certainly! Here's a step-by-step guide to implement the tasks you mentioned:

### a. Build a DNN with specified architecture and initialization:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, ELU
from tensorflow.keras.initializers import HeNormal
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.datasets import mnist

# Load MNIST data for digits 0 to 4
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_mask = (train_labels < 5)
test_mask = (test_labels < 5)
train_images, train_labels = train_images[train_mask], train_labels[train_mask]
test_images, test_labels = test_images[test_mask], test_labels[test_mask]

# Preprocess the data
train_images = train_images.reshape((train_images.shape[0], -1)).astype('float32') / 255
test_images = test_images.reshape((test_images.shape[0], -1)).astype('float32') / 255

# Build the model
model = Sequential()
model.add(Dense(100, activation='elu', kernel_initializer=HeNormal(), input_shape=(784,)))
model.add(Dense(100, activation='elu', kernel_initializer=HeNormal()))
model.add(Dense(100, activation='elu', kernel_initializer=HeNormal()))
model.add(Dense(100, activation='elu', kernel_initializer=HeNormal()))
model.add(Dense(100, activation='elu', kernel_initializer=HeNormal()))
model.add(Dense(5, activation='softmax'))

```

### b. Compile the model and set up optimization and early stopping:

```python
# Compile the model
model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Set up callbacks for early stopping and model checkpoints
early_stopping_cb = EarlyStopping(patience=10, restore_best_weights=True)
checkpoint_cb = ModelCheckpoint("model.h5", save_best_only=True)

# Train the model
history = model.fit(train_images, train_labels, epochs=100, validation_split=0.2, 
callbacks=[early_stopping_cb, checkpoint_cb])
```

### c. Tune hyperparameters using cross-validation:

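For hyperparameter tuning, you can use Grid Search or Random Search together with cross-validation:
train the model on each combination of hyperparameters (for example the learning rate, batch size, and
number of neurons per layer) and keep the combination with the best average validation score across the folds.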
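As a rough illustration of the idea, here is a minimal sketch of a grid search scored by k-fold cross-validation, written with NumPy only. The names `kfold_indices`, `grid_search_cv`, and the `fit_and_score` callback are hypothetical helpers introduced for this example, not part of Keras or of the code above.

```python
import itertools
import numpy as np

def kfold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i in range(n_splits):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]

def grid_search_cv(fit_and_score, param_grid, X, y, n_splits=3):
    """Return (best_params, best_mean_score) over every combination in param_grid.

    fit_and_score(params, X_tr, y_tr, X_val, y_val) is a hypothetical callback:
    it should train a fresh model with `params` and return a validation score
    where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, -np.inf
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        # The same folds are used for every combination so scores are comparable
        scores = [fit_and_score(params, X[tr], y[tr], X[va], y[va])
                  for tr, va in kfold_indices(len(X), n_splits)]
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

With Keras, `fit_and_score` would typically build and compile a new model from `params`, call `fit` with early stopping on the training fold, and return the accuracy reported by `evaluate` on the validation fold. Each combination is trained `n_splits` times, so a small grid or a random subset of combinations keeps the cost manageable.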

### d. Add Batch Normalization and compare learning curves:

```python
from tensorflow.keras.layers import BatchNormalization

# Build the model with Batch Normalization: five hidden layers, each Dense -> BatchNorm -> ELU
model_bn = Sequential()
model_bn.add(tf.keras.Input(shape=(784,)))
for _ in range(5):
    model_bn.add(Dense(100, kernel_initializer=HeNormal()))
    model_bn.add(BatchNormalization())
    model_bn.add(ELU())
model_bn.add(Dense(5, activation='softmax'))

# Use a separate checkpoint file so the baseline model saved in "model.h5" is not overwritten
checkpoint_bn_cb = ModelCheckpoint("model_bn.h5", save_best_only=True)

# Compile and train the model with Batch Normalization
model_bn.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_bn = model_bn.fit(train_images, train_labels, epochs=100, validation_split=0.2,
                          callbacks=[early_stopping_cb, checkpoint_bn_cb])
```

You can compare the `history` and `history_bn` variables to analyze the learning curves and check if
Batch Normalization helps in faster convergence.
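One way to make this comparison concrete is to summarize each run's history dictionary. The sketch below assumes the dictionaries have the `val_loss` and `val_accuracy` keys that Keras records when `metrics=['accuracy']` and a validation split are used. The helper names and the numbers are placeholders for illustration, not measured results.

```python
def epochs_to_reach(val_accuracy, target):
    """Return the 1-based epoch at which val_accuracy first reaches target, or None."""
    for epoch, acc in enumerate(val_accuracy, start=1):
        if acc >= target:
            return epoch
    return None

def summarize_history(history_dict, target=0.97):
    """Summarize a Keras-style history dict (the `.history` attribute of a History object)."""
    val_loss = history_dict["val_loss"]
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "epochs_run": len(val_loss),
        "best_epoch": best_index + 1,
        "best_val_loss": val_loss[best_index],
        "epochs_to_target": epochs_to_reach(history_dict["val_accuracy"], target),
    }

# Placeholder numbers for illustration only (not measured results).
# In the notebook, pass history.history and history_bn.history instead.
plain_history = {"val_loss": [0.30, 0.12, 0.08, 0.09],
                 "val_accuracy": [0.91, 0.96, 0.975, 0.974]}
bn_history = {"val_loss": [0.15, 0.07, 0.075],
              "val_accuracy": [0.95, 0.978, 0.977]}

for name, h in [("plain", plain_history), ("batch norm", bn_history)]:
    print(name, summarize_history(h))
```

A smaller `epochs_to_target` suggests faster convergence in epochs, while `best_val_loss` indicates which run produced the better model. Batch Normalization also makes each epoch slower, so wall-clock time is worth comparing as well.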

### e. Add Dropout to every layer and evaluate:

```python
from tensorflow.keras.layers import Dropout

# Build the model with Dropout after each of the five hidden layers
model_dropout = Sequential()
model_dropout.add(tf.keras.Input(shape=(784,)))
for _ in range(5):
    model_dropout.add(Dense(100, activation='elu', kernel_initializer=HeNormal()))
    model_dropout.add(Dropout(0.5))
model_dropout.add(Dense(5, activation='softmax'))

# Use a separate checkpoint file so the baseline model saved in "model.h5" is not overwritten
checkpoint_dropout_cb = ModelCheckpoint("model_dropout.h5", save_best_only=True)

# Compile and train the model with Dropout
model_dropout.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_dropout = model_dropout.fit(train_images, train_labels, epochs=100, validation_split=0.2,
                                    callbacks=[early_stopping_cb, checkpoint_dropout_cb])
```

You can analyze the `history_dropout` variable to see if adding Dropout to every layer helps in reducing overfitting.
Adjust the dropout rate (0.5 in this example) as needed based on your observation of the model's performance on the
validation set.
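A simple indicator of overfitting is the gap between training and validation accuracy at the epoch with the lowest validation loss. The helper below is a hypothetical sketch that reads a Keras-style history dictionary. The numbers are placeholders, not measured results.

```python
def generalization_gap(history_dict):
    """Return (train_acc, val_acc, gap) at the epoch with the lowest validation loss."""
    val_loss = history_dict["val_loss"]
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    train_acc = history_dict["accuracy"][best]
    val_acc = history_dict["val_accuracy"][best]
    return train_acc, val_acc, train_acc - val_acc

# Placeholder numbers for illustration only; in the notebook, pass e.g. history_dropout.history
example_history = {
    "accuracy": [0.95, 0.99, 0.999],
    "val_accuracy": [0.94, 0.97, 0.968],
    "val_loss": [0.20, 0.10, 0.12],
}
train_acc, val_acc, gap = generalization_gap(example_history)
print(f"train={train_acc:.3f} val={val_acc:.3f} gap={gap:.3f}")
```

Keep in mind that Keras computes the training metrics with dropout active, so the logged training accuracy of the dropout model understates what the network achieves at inference time. Calling `evaluate` on the training set gives a fairer number for this comparison.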

Remember to evaluate the models on the test set to get a comprehensive understanding of their performance:

```python
# Evaluate the models on the test set
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
test_loss_bn, test_accuracy_bn = model_bn.evaluate(test_images, test_labels)
test_loss_dropout, test_accuracy_dropout = model_dropout.evaluate(test_images, test_labels)

print("DNN without Batch Normalization and Dropout - Test Accuracy:", test_accuracy)
print("DNN with Batch Normalization - Test Accuracy:", test_accuracy_bn)
print("DNN with Dropout - Test Accuracy:", test_accuracy_dropout)
```

These steps will help you build, train, and evaluate the specified DNN architecture with different techniques
like Batch Normalization and Dropout. Adjust hyperparameters and architectural choices as needed based on the 
observed performance during training and validation.
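The exercise asks about precision, whereas the code above reports accuracy. As a minimal sketch, per-class precision can be computed from true and predicted class indices with NumPy. The function name `per_class_precision` is an assumption made for this example.

```python
import numpy as np

def per_class_precision(y_true, y_pred, n_classes):
    """Precision per class: correct predictions of class c / all predictions of class c.

    Classes that are never predicted get NaN, since their precision is undefined.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    precisions = np.full(n_classes, np.nan)
    for c in range(n_classes):
        predicted_c = (y_pred == c)
        if predicted_c.any():
            precisions[c] = np.mean(y_true[predicted_c] == c)
    return precisions

# Tiny hand-made example; class 3 is never predicted
print(per_class_precision([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n_classes=4))
```

In the notebook, the predicted classes could be obtained with `model.predict(test_images).argmax(axis=1)` and compared against `test_labels` with `n_classes=5`.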

2. Transfer learning.
a. Create a new DNN that reuses all the pretrained hidden layers of the previous
model, freezes them, and replaces the softmax output layer with a new one.
b. Train this new DNN on digits 5 to 9, using only 100 images per digit, and time how
long it takes. Despite this small number of examples, can you achieve high precision?
c. Try caching the frozen layers, and train the model again: how much faster is it now?
d. Try again reusing just four hidden layers instead of five. Can you achieve a higher
precision?
e. Now unfreeze the top two hidden layers and continue training: can you get the
model to perform even better?










Ans-



Certainly! Here's how you can approach the tasks:

### a. Create a new DNN reusing pretrained hidden layers and replacing the output layer:

```python
from tensorflow.keras.models import load_model

# Load the pretrained DNN model
pretrained_model = load_model("model.h5")

# Freeze all the pretrained hidden layers
for layer in pretrained_model.layers[:-1]:
    layer.trainable = False

# Create a new output layer for digits 5 to 9
new_output_layer = Dense(5, activation='softmax')

# Build the new model: all five frozen hidden layers followed by the new output layer
transfer_model = Sequential(pretrained_model.layers[:-1] + [new_output_layer])
```

### b. Train the new model on digits 5 to 9 with a small dataset:

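```python
import time
import numpy as np

# Load MNIST data for digits 5 to 9
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_mask = (train_labels >= 5)
test_mask = (test_labels >= 5)
train_images, train_labels = train_images[train_mask], train_labels[train_mask]
test_images, test_labels = test_images[test_mask], test_labels[test_mask]

# Shift the labels from 5-9 to 0-4 so they match the five softmax outputs
train_labels, test_labels = train_labels - 5, test_labels - 5

# Preprocess the data
train_images = train_images.reshape((train_images.shape[0], -1)).astype('float32') / 255
test_images = test_images.reshape((test_images.shape[0], -1)).astype('float32') / 255

# Keep only 100 training images per digit, shuffled so the validation split contains every class
rng = np.random.default_rng(42)
keep = np.concatenate([rng.choice(np.flatnonzero(train_labels == d), 100, replace=False)
                       for d in range(5)])
keep = rng.permutation(keep)
train_images, train_labels = train_images[keep], train_labels[keep]

# Compile and train the new model, timing the training run
transfer_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
start = time.time()
history_transfer = transfer_model.fit(train_images, train_labels, epochs=50, validation_split=0.2)
print(f"Training time: {time.time() - start:.1f} s")
```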

### c. Cache frozen layers and train the model again:

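```python
import time

# Cache the frozen layers' activations: run the frozen hidden layers once over the training set
frozen_layers = Sequential(pretrained_model.layers[:-1])
cached_features = frozen_layers.predict(train_images)

# Train only a fresh output layer on the cached activations, and time it
cached_output_model = Sequential([Dense(5, activation='softmax')])
cached_output_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
start = time.time()
history_transfer_cached = cached_output_model.fit(cached_features, train_labels, epochs=50,
                                                  validation_split=0.2)
print(f"Training time with cached activations: {time.time() - start:.1f} s")
```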

### d. Reuse four hidden layers and attempt to achieve higher precision:

Modify the code in step `a` to reuse only four hidden layers and then proceed with training as before.

```python
# Load the pretrained DNN model
pretrained_model = load_model("model.h5")

# Reuse and freeze only the first four hidden layers
reused_layers = pretrained_model.layers[:4]
for layer in reused_layers:
    layer.trainable = False

# Create a new output layer for digits 5 to 9
new_output_layer = Dense(5, activation='softmax')

# Build the new model: four frozen hidden layers followed by the new output layer
transfer_model = Sequential(reused_layers + [new_output_layer])

# Compile and train the model with four hidden layers
transfer_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_transfer_four_layers = transfer_model.fit(train_images, train_labels, epochs=50, validation_split=0.2)
```

### e. Unfreeze top two hidden layers and continue training:

```python
# Unfreeze the top two reused hidden layers (the two layers just below the output layer)
for layer in transfer_model.layers[-3:-1]:
    layer.trainable = True

# Recompile so the change in trainable layers takes effect, then continue training
transfer_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_transfer_unfreeze = transfer_model.fit(train_images, train_labels, epochs=50, validation_split=0.2)
```

Remember to adjust the number of epochs, fine-tune hyperparameters, and monitor the model's performance using 
validation data to achieve the best results. Also, consider experimenting with different layers to unfreeze
and observe the impact on the model's performance.


3. Pretraining on an auxiliary task.
a. In this exercise you will build a DNN that compares two MNIST digit images and
predicts whether they represent the same digit or not. Then you will reuse the lower
layers of this network to train an MNIST classifier using very little training data. Start
by building two DNNs (let’s call them DNN A and B), both similar to the one you built
earlier but without the output layer: each DNN should have five hidden layers of 100
neurons each, He initialization, and ELU activation. Next, add one more hidden layer
with 10 units on top of both DNNs. To do this, you should use
TensorFlow’s concat() function with axis=1 to concatenate the outputs of both DNNs
for each instance, then feed the result to the hidden layer. Finally, add an output
layer with a single neuron using the logistic activation function.
b. Split the MNIST training set in two sets: split #1 should contain 55,000 images,
and split #2 should contain 5,000 images. Create a function that generates a
training batch where each instance is a pair of MNIST images picked from split #1.
Half of the training instances should be pairs of images that belong to the same
class, while the other half should be images from different classes. For each pair, the
training label should be 0 if the images are from the same class, or 1 if they are from
different classes.
c. Train the DNN on this training set. For each image pair, you can simultaneously feed
the first image to DNN A and the second image to DNN B. The whole network will
gradually learn to tell whether two images belong to the same class or not.
d. Now create a new DNN by reusing and freezing the hidden layers of DNN A and
adding a softmax output layer on top with 10 neurons. Train this network on split #2
and see if you can achieve high performance despite having only 500 images per
class.





Ans-



Certainly! Here's how you can approach this task:

### a. Build DNNs for image comparison task:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Concatenate
from tensorflow.keras.models import Model

# Build DNN A: five hidden layers of 100 ELU units with He initialization, no output layer
input_A = tf.keras.layers.Input(shape=(784,))
hidden_A = input_A
for _ in range(5):
    hidden_A = Dense(100, activation='elu', kernel_initializer='he_normal')(hidden_A)

# Build DNN B with the same architecture (its weights are separate from DNN A's)
input_B = tf.keras.layers.Input(shape=(784,))
hidden_B = input_B
for _ in range(5):
    hidden_B = Dense(100, activation='elu', kernel_initializer='he_normal')(hidden_B)

# Concatenate the outputs of DNN A and DNN B for each instance
concatenated_outputs = Concatenate(axis=1)([hidden_A, hidden_B])

# Add the 10-unit hidden layer on top of the concatenated outputs
hidden_layer = Dense(10, activation='elu', kernel_initializer='he_normal')(concatenated_outputs)

# Output layer for binary classification (logistic activation)
output = Dense(1, activation='sigmoid')(hidden_layer)

# Create the model
model = Model(inputs=[input_A, input_B], outputs=output)
```

### b. Generate training batches:

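```python
import numpy as np
from tensorflow.keras.datasets import mnist

# Reload the full MNIST training set (all ten digits) and preprocess it
(X_full, y_full), _ = mnist.load_data()
X_full = X_full.reshape((X_full.shape[0], -1)).astype('float32') / 255

# Shuffle images and labels together, then split: 55,000 for split #1, 5,000 for split #2
perm = np.random.default_rng(42).permutation(len(X_full))
X_full, y_full = X_full[perm], y_full[perm]
split_1_size = 55000
split_1_images, split_2_images = X_full[:split_1_size], X_full[split_1_size:]
split_1_labels, split_2_labels = y_full[:split_1_size], y_full[split_1_size:]

def generate_batch(images, labels, batch_size):
    """Yield batches of image pairs: half same class (target 0), half different classes (target 1)."""
    rng = np.random.default_rng()
    half_batch = batch_size // 2
    # Indices of the images belonging to each class
    class_indices = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    n_classes = len(class_indices)
    while True:
        pairs = []
        for _ in range(half_batch):
            # Same class: two distinct images of one randomly chosen class
            c = rng.integers(n_classes)
            pairs.append(rng.choice(class_indices[c], size=2, replace=False))
            # Different classes: one image from each of two distinct classes
            c1, c2 = rng.choice(n_classes, size=2, replace=False)
            pairs.append([rng.choice(class_indices[c1]), rng.choice(class_indices[c2])])
        pairs = np.array(pairs)
        targets = np.tile([0, 1], half_batch).astype('float32')
        order = rng.permutation(len(pairs))
        pairs, targets = pairs[order], targets[order]
        # Inputs are yielded as a tuple, one array per model input (DNN A, DNN B)
        yield (images[pairs[:, 0]], images[pairs[:, 1]]), targets

batch_size = 32
train_gen = generate_batch(split_1_images, split_1_labels, batch_size)
```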

### c. Train the DNN on the training set:

```python
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_gen, epochs=10, steps_per_epoch=len(split_1_images) // batch_size)
```

### d. Create and train a new DNN for MNIST classification using split #2:

```python
# Reuse and freeze the hidden layers of DNN A only
dnn_A = Model(inputs=input_A, outputs=hidden_A)
for layer in dnn_A.layers:
    layer.trainable = False

# Add a softmax output layer for the 10 classes on top of DNN A
output_layer = Dense(10, activation='softmax')(dnn_A.output)
mnist_classifier = Model(inputs=input_A, outputs=output_layer)

# Compile the model
mnist_classifier.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train on split #2 (one image per instance, since only DNN A is used)
mnist_classifier.fit(split_2_images, split_2_labels, epochs=10, batch_size=32, validation_split=0.2)
```

This approach uses a two-branch network for image comparison, similar in spirit to a Siamese network
(although DNN A and DNN B do not share weights here), followed by transfer learning on a limited dataset.
The frozen hidden layers of DNN A, pretrained on the pair-comparison task, supply the features, so only the
softmax output layer has to be learned from split #2. Remember to adjust hyperparameters and network
architecture based on the performance on validation data.






