# IST 691 Deep Learning in Practice

**Homework 2**

Name:

SUID:

*Save this notebook to your Google Drive. Comments at the top of each code cell indicate whether you need to modify it. Answer the questions directly in the notebook. Remember to use the GPU as your runtime. Once finished, ensure all code cells have been run, download the notebook, and submit it through Blackboard.*

### Q1

Explain the differences between a convolutional neural network (CNN) and a multi-layer perceptron (MLP). Then explain whether the following statement is true and, if so, under what conditions it holds.

'An MLP can represent the same functions as a CNN.'

*answer here*

### Q2

In class, we saw an example of an autoencoder removing noise from an image. Explain why this happens and what the limits of such functionality are.
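
As a reminder of the setup, the sketch below shows one common way to build noisy inputs for a denoising experiment: add Gaussian noise to clean images and clip back to the valid pixel range. The random stand-in batch, the seed, and the `noise_std` value are assumptions for illustration and are not taken from the class example.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily

# Stand-in batch of 4 "images" in [0, 1]; real MNIST data would be used in practice.
clean = rng.random((4, 28, 28, 1)).astype("float32")

noise_std = 0.3  # assumed noise level, not taken from the class example
noisy = clean + rng.normal(0.0, noise_std, size=clean.shape).astype("float32")
noisy = np.clip(noisy, 0.0, 1.0)  # keep pixels in the valid [0, 1] range

# A denoising autoencoder would then be trained with inputs=noisy and targets=clean.
print(noisy.shape, float(noisy.min()), float(noisy.max()))
```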

*answer here*

### Q3

When using transfer learning, we sometimes get better results by fine-tuning the pretrained model, and other times by freezing its parameters before training. Under what circumstances should we fine-tune the model to get a better result? And under what circumstances should we freeze the parameters instead?

*answer here*

### Q4: MLP vs CNN

Below, you will work with two neural networks for classifying MNIST digits. The first, `model_mlp`, is an MLP with no hidden layers (the smallest possible) and 7,850 parameters. Evaluate the performance of this model below.

Then, define a convolutional neural network with a similar number of parameters and evaluate its performance. Can it do better? Why?
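
To help keep track of parameter budgets, here is a minimal sketch of the standard counting formulas for dense and 2D convolutional layers. The helper names `dense_params` and `conv2d_params` are hypothetical, and the convolutional layer in the example is an arbitrary illustration, not a suggested architecture. `model.summary()` remains the authoritative count.

```python
def dense_params(inputs, units, use_bias=True):
    """Weights plus optional biases of a fully connected layer."""
    return inputs * units + (units if use_bias else 0)


def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Each filter has kernel_h * kernel_w * in_channels weights plus an optional bias."""
    per_filter = kernel_h * kernel_w * in_channels + (1 if use_bias else 0)
    return per_filter * filters


# The MLP above: 28 * 28 = 784 flattened pixels feeding 10 softmax units.
mlp_total = dense_params(28 * 28, 10)
print(mlp_total)  # 7850

# Hypothetical layer, for illustration only: 4 filters of size 3x3 on a 1-channel image.
print(conv2d_params(3, 3, 1, 4))  # 40
```

Note that a convolutional layer's count does not depend on the image size, while a dense layer's count grows with the number of inputs it receives.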

In [None]:
# DO NOT MODIFY THIS CELL

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model_mlp = keras.Sequential(
    [
        keras.Input(shape = input_shape),
        layers.Flatten(),
        layers.Dense(num_classes, activation = 'softmax'),
    ]
)

model_mlp.summary()

In [None]:
# DO NOT MODIFY CELL
batch_size = 128
epochs = 15
model_mlp.compile(loss = 'categorical_crossentropy',
                  optimizer = 'adam',
                  metrics = ['accuracy'])

model_mlp.fit(x_train,
              y_train,
              batch_size = batch_size,
              epochs = epochs,
              validation_split = 0.1,
              verbose = 1)

In [None]:
# DO NOT MODIFY CELL
score = model_mlp.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [None]:
# DEFINE YOUR OWN CNN SO THAT THE PARAMETERS ARE FEWER THAN THE MLP
model_cnn = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        ....
        layers.Dense(num_classes, activation = 'softmax'),
    ]
)

model_cnn.summary()

In [None]:
# DO NOT MODIFY CELL
batch_size = 128
epochs = 15
model_cnn.compile(loss = 'categorical_crossentropy',
                  optimizer = 'adam',
                  metrics = ['accuracy'])
model_cnn.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1, verbose=1)

In [None]:
# DO NOT MODIFY CELL
score = model_cnn.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

**Did the CNN do better than the MLP? Why or why not?**

*answer here*

### Q5: Transfer learning

We are going to classify beans using transfer learning (read more about the dataset [here](https://www.tensorflow.org/datasets/catalog/beans)). In the code below, use the `ResNet50` model available in Keras to classify the beans dataset (3 classes). **Do not fine-tune `ResNet50`**. What is the performance?

In [None]:
# DO NOT MODIFY CELL
import tensorflow_datasets as tfds
import tensorflow as tf
from tensorflow.keras.utils import to_categorical

# loading images and labels
(train_ds, train_labels), (test_ds, test_labels) = tfds.load(
    'beans',
    split = ['train[:70%]', 'train[70%:]'], # 70/30 train/test split with no overlap
    batch_size = -1,
    as_supervised = True  # include labels
)

# resizing images
train_ds = tf.image.resize(train_ds, (200, 200))
test_ds = tf.image.resize(test_ds, (200, 200))

# transforming labels to correct format
train_labels = to_categorical(train_labels, num_classes=3)
test_labels = to_categorical(test_labels, num_classes=3)

In [None]:
# IMPORT THE APPROPRIATE MODEL HERE
from tensorflow.keras.applications...  import ...
from tensorflow.keras.applications... import preprocess_input

## loading ResNet50 model
base_model = ???
base_model.trainable = ???

## preprocessing input
train_ds = preprocess_input(train_ds)
test_ds = preprocess_input(test_ds)

from tensorflow.keras import layers, models

flatten_layer = layers.Flatten()
prediction_layer = layers.Dense(3, activation = 'softmax')

model = models.Sequential([
    base_model,
    flatten_layer,
    layers.Dropout(0.2),
    prediction_layer
])

In [None]:
# DO NOT MODIFY CELL
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow import keras

model.compile(
    optimizer = keras.optimizers.Adam(),
    loss = 'categorical_crossentropy',
    metrics = ['accuracy'],
)

In [None]:
# DO NOT MODIFY CELL
model.fit(train_ds, train_labels, epochs = 5, validation_split = 0.2, batch_size = 64)

In [None]:
# DO NOT MODIFY CELL
score = model.evaluate(test_ds, test_labels, verbose = 0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

### Q6: Autoencoder

Modify the convolutional autoencoder for MNIST we saw in class so that the embedding has the following structure:
- Conv2D: 8 filters, Kernel (3, 3)
- MaxPooling: Size (2, 2)
- Conv2D: 3 filters, Kernel (3, 3)
- MaxPooling: Size (2, 2)
- Conv2D: 1 filter, Kernel (3, 3)

After making this change, you need to change the input size of the decoder function so that it can accept the output of the encoder. What is the performance of your model?
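
When working out what shape the decoder should accept, it can help to trace shapes layer by layer. The sketch below assumes `padding = 'same'`, stride-1 convolutions, and pooling strides equal to the pool size, as in the cells that follow. The helper names and the 32x32 RGB input are hypothetical and deliberately differ from the MNIST setup. `Conv_AE_encoder_only_model.summary()` can confirm the real shapes.

```python
import math


def conv2d_same(shape, filters):
    """Conv2D with padding='same' and stride 1: spatial size unchanged, channels = filters."""
    height, width, _ = shape
    return (height, width, filters)


def maxpool_same(shape, pool=2):
    """MaxPooling2D with padding='same' and stride = pool: each spatial dim becomes ceil(n / pool)."""
    height, width, channels = shape
    return (math.ceil(height / pool), math.ceil(width / pool), channels)


# Hypothetical 32x32 RGB input and filter counts, chosen only for illustration.
shape = (32, 32, 3)
shape = conv2d_same(shape, filters=16)  # (32, 32, 16)
shape = maxpool_same(shape)             # (16, 16, 16)
shape = conv2d_same(shape, filters=4)   # (16, 16, 4)
shape = maxpool_same(shape)             # (8, 8, 4)
print(shape)  # (8, 8, 4)
```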

In [None]:
# DO NOT MODIFY THIS CELL
from pathlib import Path

import h5py
import matplotlib.pyplot as plt
import numpy as np
from keras import backend as keras_backend
from keras.datasets import mnist
from keras.layers import Conv2D, Dense, Input, MaxPooling2D, UpSampling2D
from keras.models import Sequential, Model
from keras.utils import np_utils

keras_backend.set_image_data_format('channels_last')

# fix the random seed for reproducibility
random_seed = 42
np.random.seed(random_seed)

# load the MNIST data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
image_height = X_train.shape[1]
image_width = X_train.shape[2]
number_of_pixels = image_height * image_width

# cast the sample data to the current Keras floating-point type
X_train = keras_backend.cast_to_floatx(X_train)
X_test = keras_backend.cast_to_floatx(X_test)

# reshape to 2D grid, one line per image
X_train = X_train.reshape(X_train.shape[0], number_of_pixels)
X_test = X_test.reshape(X_test.shape[0], number_of_pixels)

# scale data to range [0, 1]
X_train /= 255.0
X_test /= 255.0

# reshape sample data to 4D tensor using channels_last convention
X_train = X_train.reshape(X_train.shape[0], image_height, image_width, 1)
X_test = X_test.reshape(X_test.shape[0], image_height, image_width, 1)

# replace label data with one-hot encoded versions
number_of_classes = 1 + max(np.append(y_train, y_test))
y_train = np_utils.to_categorical(y_train, number_of_classes)
y_test = np_utils.to_categorical(y_test, number_of_classes)

In [None]:
# MODIFY THE ENCODER BELOW ACCORDING TO THE QUESTION REQUIREMENTS
CAE_encoder_conv_1 = Conv2D(??, (??, ??), activation = 'relu', padding = 'same')
CAE_encoder_pool_1 = MaxPooling2D((??,??), padding = 'same')
CAE_encoder_conv_2 = Conv2D(??, (??, ??), activation = 'relu', padding = 'same')
CAE_encoder_pool_2 = MaxPooling2D((??,??), padding = 'same')
CAE_encoder_output = Conv2D(??, (??, ??), activation = 'relu', padding = 'same')


In [None]:
# DO NOT MODIFY THIS CELL
CAE_decoder_up_1 = UpSampling2D((2,2))
CAE_decoder_conv_1 = Conv2D(8, (3, 3), activation = 'relu', padding = 'same')
CAE_decoder_up_2 = UpSampling2D((2,2))
CAE_decoder_output = Conv2D(1, (3, 3), activation = 'sigmoid', padding = 'same')

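# input placeholder for the encoder: one-channel images matching X_train
CAE_encoder_input = Input(shape = (image_height, image_width, 1))
CAE_encoder_step_1 = CAE_encoder_conv_1(CAE_encoder_input)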
CAE_encoder_step_2 = CAE_encoder_pool_1(CAE_encoder_step_1)
CAE_encoder_step_3 = CAE_encoder_conv_2(CAE_encoder_step_2)
CAE_encoder_step_4 = CAE_encoder_pool_2(CAE_encoder_step_3)
CAE_encoder_step_5 = CAE_encoder_output(CAE_encoder_step_4)

CAE_decoder_step_1 = CAE_decoder_up_1(CAE_encoder_step_5)
CAE_decoder_step_2 = CAE_decoder_conv_1(CAE_decoder_step_1)
CAE_decoder_step_3 = CAE_decoder_up_2(CAE_decoder_step_2)
CAE_decoder_step_4 = CAE_decoder_output(CAE_decoder_step_3)


Conv_AE = Model(CAE_encoder_input, CAE_decoder_step_4)
Conv_AE.compile(optimizer = 'adam', loss = 'binary_crossentropy')


Conv_AE_encoder_only_model = Model(CAE_encoder_input, CAE_encoder_step_5)

In [None]:
# MODIFY THE INPUT FOR THE DECODER BELOW ACCORDING TO THE OUTPUT EXPECTED FROM THE ENCODER
Conv_AE_decoder_only_input = Input(shape=(??, ??, ??))

In [None]:
# DO NOT MODIFY THIS CELL
Conv_AE_decoder_only_step_1 = CAE_decoder_up_1(Conv_AE_decoder_only_input)
Conv_AE_decoder_only_step_2 = CAE_decoder_conv_1(Conv_AE_decoder_only_step_1)
Conv_AE_decoder_only_step_3 = CAE_decoder_up_2(Conv_AE_decoder_only_step_2)
Conv_AE_decoder_only_step_4 = CAE_decoder_output(Conv_AE_decoder_only_step_3)

Conv_AE_decoder_only_model = Model(Conv_AE_decoder_only_input, Conv_AE_decoder_only_step_4)

In [None]:
# DO NOT MODIFY THIS CELL
# FIT AND EVALUATE PERFORMANCE
Conv_AE.fit(X_train, X_train,
               epochs = 50, batch_size = 128, shuffle = True,
               verbose = 2,
               validation_data = (X_test, X_test))

In [None]:
# DO NOT MODIFY THIS CELL
def draw_predictions_set(predictions, filename = None):
    plt.figure(figsize=(8, 4))
    for i in range(5):
        plt.subplot(2, 5, i+1)
        plt.imshow(X_test[i].reshape(28, 28), vmin = 0, vmax = 1, cmap = 'gray')
        ax = plt.gca()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        plt.subplot(2, 5, i + 6)
        plt.imshow(predictions[i].reshape(28, 28), vmin = 0, vmax = 1, cmap = 'gray')
        ax = plt.gca()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
    plt.tight_layout()
    plt.show()

In [None]:
# Test your new predictions
Conv_predictions = Conv_AE.predict(X_test)
draw_predictions_set(Conv_predictions, 'NB3-ConvAE-predictions')