# Task 1

Start by reading the section “Implementing MLPs with Keras” from Chapter 10 of Géron’s textbook (pages 292-325).
Then install `TensorFlow 2.0+` and experiment with the code included in this section.
Additionally, study the official documentation (https://keras.io/) and get an idea of the numerous options offered by Keras (layers, loss functions, metrics, optimizers, activations, initializers, regularizers).
Don’t get overwhelmed with the number of options – you will frequently return to this site in the coming months.

### Imports

In [None]:
# stdlib
import os
from itertools import product
from time import perf_counter
from typing import Callable

# pip
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
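from tensorflow import keras
# import everything from tensorflow.keras so all objects come from the same Keras build
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.datasets import fashion_mnist, cifar10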

# local
from utils import get_dirs

In [None]:
DIRS = get_dirs(os.path.abspath('') + os.sep + 'Task1.ipynb')
print('\033[1m' + 'Directories:' + '\033[0m')
for dir_name, path in DIRS.items():
    print(f'{dir_name:<7} {path}')

---
## Part 1

Check out the [official repository](https://github.com/keras-team/keras/tree/tf-keras-2/examples) with many examples of Keras implementations of various kinds of deep neural networks.
We recommend cloning this repository and trying to get some of these examples running on your system (or Colab/DeepNote).
In particular, experiment with the `mnist_mlp.py` and `mnist_cnn.py` scripts, which show you how to build simple neural networks for the MNIST dataset (useful for the next task).

*insert findings*

---

## Part 2

Next, take two well-known datasets: Fashion MNIST (introduced in _Ch. 10, p. 295_) and CIFAR-10.
The first contains 28x28 grayscale images split into 10 categories, with 60,000 images for training and 10,000 for testing.
The second contains 32x32x3 RGB images, also in 10 categories, with 50,000 images for training and 10,000 for testing.
Apply two reference networks to the Fashion MNIST dataset: an MLP described in detail in _Ch. 10, pp. 297-307_ and a CNN described in _Ch. 14, p. 447_.
Experiment with both networks, trying various options: initializations, activations, optimizers (and their hyperparameters), regularizations (L1, L2, Dropout, no Dropout).
You may also experiment with changing the architecture of both networks: adding/removing layers, number of convolutional filters, their sizes, etc.

After you have found the best performing hyperparameter sets, take the 3 best ones, train new models on the CIFAR-10 dataset, and check whether your performance gains translate to a different dataset.
Provide your thoughts on these results in the report.

First we create an MLP model for the Fashion MNIST dataset.
We use the same architecture as in the book: two hidden dense layers with 300 and 100 units, followed by a 10-unit softmax output layer.
Each configuration combines an optimizer (Adam, SGD, or RMSprop), a learning rate (0.001, 0.005, or 0.01), and an activation function (ReLU, sigmoid, or tanh).
We train every configuration three times for 5 epochs with a batch size of 64 and average the test loss, test accuracy, and training time over the three runs.
We use the same settings for the CIFAR-10 dataset.

Load the Fashion MNIST dataset and scale the pixel values to the range [0, 1].

In [None]:
(X_train_f, y_train_f), (X_test_f, y_test_f) = fashion_mnist.load_data()

X_train_f = X_train_f.astype('float32') / 255
X_test_f = X_test_f.astype('float32') / 255

X_train_f.shape, X_test_f.shape

Define functions for running the hyperparameter exploration experiments.

In [None]:
def build_default_MLP(input_shape, activation, optimizer, lr) -> Sequential:
    """
    Returns a compiled default MLP classifier architecture with
    a given input shape, activation function, optimizer and learning rate.
    """
    model = Sequential([
        layers.Flatten(input_shape=input_shape),
        layers.Dense(300, activation=activation),
        layers.Dense(100, activation=activation),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer = optimizer(learning_rate=lr),
        loss = 'sparse_categorical_crossentropy',
        metrics = ['accuracy']
    )
    return model
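
To get a feel for how the MLP above scales with the input size, the following sketch counts its trainable parameters by hand. It assumes only the layer sizes from `build_default_MLP` (300, 100, 10) and the image shapes of the two datasets; the helper names `dense_params` and `mlp_params` are made up for this illustration and are not part of Keras.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(input_shape: tuple, hidden=(300, 100), n_classes=10) -> int:
    """Trainable parameters of Flatten -> Dense(300) -> Dense(100) -> Dense(10)."""
    n_in = 1
    for dim in input_shape:  # Flatten multiplies all input dimensions together
        n_in *= dim
    total = 0
    for n_out in (*hidden, n_classes):
        total += dense_params(n_in, n_out)
        n_in = n_out  # the output of one layer feeds the next
    return total


fmnist_params = mlp_params((28, 28))     # 784 inputs
cifar_params = mlp_params((32, 32, 3))   # 3072 inputs
print(fmnist_params)  # 266610
print(cifar_params)   # 953010
```

Almost all parameters sit in the first dense layer, so the same MLP is roughly 3.6 times larger on CIFAR-10 than on Fashion MNIST. The numbers can be compared against the output of `model.summary()`.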

In [None]:
def build_default_CNN(input_shape, activation, optimizer, lr) -> Sequential:
    """
    Returns a compiled default CNN classifier architecture with
    a given input shape, activation function, optimizer and learning rate.
    """
    model = Sequential([
        layers.Conv2D(32, (3, 3), activation=activation, input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation=activation),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation=activation),
        layers.Flatten(),
        layers.Dense(64, activation=activation),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer = optimizer(learning_rate=lr),
        loss = 'sparse_categorical_crossentropy',
        metrics = ['accuracy']
    )
    return model
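
The spatial size of the feature maps in the CNN above can be traced by hand. This is a small sketch that assumes the Keras defaults used in `build_default_CNN`: unpadded (`'valid'`) 3x3 convolutions with stride 1, and 2x2 max pooling with stride 2. The helper names are hypothetical and only serve this illustration.

```python
def conv_valid(size: int, kernel: int = 3) -> int:
    """Output width/height of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool(size: int, window: int = 2) -> int:
    """Output width/height of non-overlapping max pooling (remainder is dropped)."""
    return size // window


def flattened_features(side: int, last_filters: int = 64) -> int:
    """Number of values entering the first Dense layer for a square input image."""
    side = conv_valid(side)  # Conv2D(32, (3, 3))
    side = pool(side)        # MaxPooling2D((2, 2))
    side = conv_valid(side)  # Conv2D(64, (3, 3))
    side = pool(side)        # MaxPooling2D((2, 2))
    side = conv_valid(side)  # Conv2D(64, (3, 3))
    return side * side * last_filters


print(flattened_features(28))  # 576  (Fashion MNIST: 28 -> 26 -> 13 -> 11 -> 5 -> 3)
print(flattened_features(32))  # 1024 (CIFAR-10:      32 -> 30 -> 15 -> 13 -> 6 -> 4)
```

The number of input channels (1 or 3) does not change these spatial sizes; it only affects the number of weights in the first convolutional layer.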

In [None]:
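def run_experiment(model_constructor: Callable, datasets: tuple, configs: list) -> pd.DataFrame:
    """
    Trains and evaluates every config 3 times and returns the mean
    test loss, test accuracy and training time per config.

    Params:
        model_constructor (function) - build_default_MLP or build_default_CNN
        datasets (tuple) - (X_train, y_train, X_test, y_test)
        configs (list) - list of tuples of (optimizer, lr, activation)
    """
    np.random.seed(42)
    X_train, y_train, X_test, y_test = datasets
    # Conv2D expects a channel axis: add one for grayscale images of shape (N, H, W);
    # RGB images of shape (N, H, W, C) already have it
    if 'CNN' in model_constructor.__name__ and X_train.ndim == 3:
        X_train = np.expand_dims(X_train, axis=-1)
        X_test = np.expand_dims(X_test, axis=-1)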
    
    df = pd.DataFrame(columns=['optimizer', 'lr', 'activation', 'loss', 'accuracy', 'traintime'])
    run = 1
    for optimizer, lr, activation in configs:
        losses, accuracies, traintimes = [], [], []
        for _ in range(3):
            print(f'\r{run}/{len(configs)*3}', end='')
            
            model = model_constructor(
                input_shape = X_train.shape[1:],
                activation = activation,
                optimizer = optimizer,
                lr = lr
            )
            
            tic = perf_counter()
            model.fit(
                x = X_train,
                y = y_train,
                epochs = 5,
                batch_size = 64,
                verbose = 0
            )
            toc = perf_counter()

            test_loss, test_acc = model.evaluate(
                x = X_test,
                y = y_test,
                verbose = 0
            )
            
            losses.append(test_loss)
            accuracies.append(test_acc)
            traintimes.append(toc-tic)
            run += 1

        df.loc[f'{optimizer.__name__}-{activation}-{lr}'] = [
            optimizer.__name__,
            lr,
            activation,
            np.mean(losses),
            np.mean(accuracies),
            np.mean(traintimes)
        ]
    return df

In [None]:
def create_plots(df, optimizers, activations, lrs, title) -> plt.Figure:
    """
    Creates a 3x3 grid of plots for the given optimizers, activations and learning rates.
    """
    fig, axes = plt.subplots(3, 3, figsize=(15, 15))
    for i, optimizer in enumerate([opt.__name__ for opt in optimizers]):
        for j, metric in enumerate(['accuracy', 'loss', 'traintime']):
            ax = axes[i,j]
            for activation in activations:
                df[(df.optimizer == optimizer) & (df.activation == activation)].plot(
                    x = 'lr',
                    y = metric,
                    ax = ax,
                    label = activation
                )
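            ax.set_xlabel('')
            ax.set_ylabel(optimizer if j == 0 else '')
            # set_xticks only accepts text properties together with labels,
            # so the tick label size is set through tick_params instead
            ax.set_xticks(lrs)
            ax.tick_params(axis='x', labelsize=3)
            ax.get_legend().remove()
            ax.set_title(metric if i == 0 else '')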

    # set global legend
    handles, labels = ax.get_legend_handles_labels()
    fig.legend(handles, labels, loc='upper right', bbox_to_anchor=(0.99, 0.99), ncol=3, fontsize=14)
    fig.suptitle(title, fontsize=20, weight='bold')
    fig.tight_layout()
    return fig

Test different hyperparameters for the 2-hidden-layer MLP defined in Chapter 10 of the book (Fashion MNIST dataset).

In [None]:
optimizers = [keras.optimizers.Adam, keras.optimizers.SGD, keras.optimizers.RMSprop]
lrs = [1e-3, 5e-3, 1e-2]
activations = ['relu', 'sigmoid', 'tanh']
configs = list(product(optimizers, lrs, activations))

df_mf = run_experiment(build_default_MLP, (X_train_f, y_train_f, X_test_f, y_test_f), configs)
df_mf.to_csv(DIRS['csv'] + 'mlp_fmnist.csv', index=False)
fig_mf = create_plots(df_mf, optimizers, activations, lrs, 'Fashion MNIST MLP')
fig_mf.savefig(DIRS['plots'] + 'mlp_fmnist.png', dpi=300)
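
The size of this grid determines the runtime of each experiment. The sketch below rebuilds the grid with `itertools.product` to count the training runs. Strings stand in for the optimizer classes so that it runs without TensorFlow, and the `demo_`/`_grid` names are chosen only to avoid clashing with the notebook variables; the repeat and epoch counts are the ones hard-coded in `run_experiment`.

```python
from itertools import product

# Stand-ins for keras.optimizers.Adam, SGD and RMSprop (assumption: names only)
optimizer_names = ['Adam', 'SGD', 'RMSprop']
lr_grid = [1e-3, 5e-3, 1e-2]
act_grid = ['relu', 'sigmoid', 'tanh']

# product() varies the last iterable fastest: activations, then lrs, then optimizers
demo_configs = list(product(optimizer_names, lr_grid, act_grid))

n_repeats = 3  # inner loop in run_experiment
n_epochs = 5   # epochs passed to model.fit

n_runs = len(demo_configs) * n_repeats
print(len(demo_configs))   # 27 configurations
print(n_runs)              # 81 trained models per experiment
print(n_runs * n_epochs)   # 405 epochs per experiment
print(demo_configs[:2])    # [('Adam', 0.001, 'relu'), ('Adam', 0.001, 'sigmoid')]
```

Each element is a tuple in the order `(optimizer, lr, activation)`, which matches the unpacking in the loop of `run_experiment`.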

Do the same for a CNN with three convolutional layers, based on the one defined in Chapter 14 of the book but simplified to save on runtime (Fashion MNIST dataset).

In [None]:
# we use the same configs list as for the MLP

df_cf = run_experiment(build_default_CNN, (X_train_f, y_train_f, X_test_f, y_test_f), configs)
df_cf.to_csv(DIRS['csv'] + 'cnn_fmnist.csv', index=False)
fig_cf = create_plots(df_cf, optimizers, activations, lrs, 'Fashion MNIST CNN')
fig_cf.savefig(DIRS['plots'] + 'cnn_fmnist.png', dpi=300)
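
The task asks for the three best hyperparameter sets. A minimal plain-Python sketch of that selection step is shown below. The rows in `results` are placeholder values invented for this illustration, not measured results; in the notebook the same ranking could be obtained from a results DataFrame with `df.sort_values('accuracy', ascending=False).head(3)`.

```python
# Placeholder rows mimicking the columns produced by run_experiment
# (accuracy values are made up for illustration only)
results = [
    {'optimizer': 'Adam',    'lr': 0.001, 'activation': 'relu',    'accuracy': 0.5},
    {'optimizer': 'Adam',    'lr': 0.005, 'activation': 'tanh',    'accuracy': 0.7},
    {'optimizer': 'SGD',     'lr': 0.01,  'activation': 'relu',    'accuracy': 0.6},
    {'optimizer': 'RMSprop', 'lr': 0.001, 'activation': 'relu',    'accuracy': 0.9},
    {'optimizer': 'RMSprop', 'lr': 0.005, 'activation': 'sigmoid', 'accuracy': 0.8},
]

# Rank by test accuracy, highest first, and keep the three best rows
top3 = sorted(results, key=lambda row: row['accuracy'], reverse=True)[:3]

# Reduce each row to the (optimizer, lr, activation) tuple that identifies a config
top3_configs = [(row['optimizer'], row['lr'], row['activation']) for row in top3]
print(top3_configs)
# [('RMSprop', 0.001, 'relu'), ('RMSprop', 0.005, 'sigmoid'), ('Adam', 0.005, 'tanh')]
```

Note that `run_experiment` expects optimizer classes rather than names, so the selected names would still have to be mapped back to the corresponding `keras.optimizers` classes before reuse.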

---

Now we run the same two experiments on the CIFAR-10 dataset.

In [None]:
del X_train_f, y_train_f, X_test_f, y_test_f

(X_train_c, y_train_c), (X_test_c, y_test_c) = cifar10.load_data()

X_train_c = X_train_c.astype('float32') / 255
X_test_c = X_test_c.astype('float32') / 255

X_train_c.shape, X_test_c.shape

In [None]:
df_mc = run_experiment(build_default_MLP, (X_train_c, y_train_c, X_test_c, y_test_c), configs)
df_mc.to_csv(DIRS['csv'] + 'mlp_cifar.csv', index=False)
fig_mc = create_plots(df_mc, optimizers, activations, lrs, 'CIFAR-10 MLP')
fig_mc.savefig(DIRS['plots'] + 'mlp_cifar.png', dpi=300)

In [None]:
df_cc = run_experiment(build_default_CNN, (X_train_c, y_train_c, X_test_c, y_test_c), configs)
df_cc.to_csv(DIRS['csv'] + 'cnn_cifar.csv', index=False)
fig_cc = create_plots(df_cc, optimizers, activations, lrs, 'CIFAR-10 CNN')
fig_cc.savefig(DIRS['plots'] + 'cnn_cifar.png', dpi=300)

In [None]:
def plot_scatter(y_true, y_pred, title):
    """
    Creates a scatterplot of predicted labels (y_pred) against true labels (y_true).
    """
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.scatter(y_true, y_pred, s=1)
    ax.set_xlabel('y_true')
    ax.set_ylabel('y_pred')
    ax.set_title(title)
    return fig