In [9]:
import os
import tensorflow as tf
from tensorflow import keras

# The dataset is tiny, so we hide the GPUs and run on the CPU, which is faster here
tf.config.set_visible_devices([], "GPU")

MODEL_PATH = os.path.join(os.getcwd(), "models")
os.makedirs(MODEL_PATH, exist_ok=True)

((x_train, y_train), (x_test, y_test)) = keras.datasets.fashion_mnist.load_data()
y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)

Question 1 :  A first linear neural network 
---

For this question, we ask you to build and to train a neural network with the following specifications:

- The network contains 2 layers:
    - A flatten layer, which flattens the $28 \times 28$ 2D images into 1D vectors of size $784$. This layer has no trainable parameters; it only reshapes the input data.
    - A dense output layer with a softmax activation function, so that it can predict the categorical (class) target variable. The kernel and bias initializers are set to *RandomNormal*.
- The network loss is the categorical cross-entropy loss.
- The network optimizer is the Adam optimizer (a variant of stochastic gradient descent with adaptive step sizes) with a learning rate of $10^{-5}$.
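
To make the loss in the specification concrete, here is a minimal pure-Python sketch of the categorical cross-entropy for a single sample. The helper name `categorical_cross_entropy` and the clipping constant `eps` are assumptions made for this illustration; Keras applies its own clipping internally.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    # eps is an illustrative guard against log(0), not the value Keras uses.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# One-hot target for class 3 out of 10 classes.
y_true = [0.0] * 10
y_true[3] = 1.0

# A model that has learned nothing predicts 1/10 for every class.
uniform = [0.1] * 10
loss_uniform = categorical_cross_entropy(y_true, uniform)  # equals ln(10)

# A confident, correct prediction gives a much smaller loss.
confident = [0.01] * 10
confident[3] = 0.91
loss_confident = categorical_cross_entropy(y_true, confident)

print(f"uniform: {loss_uniform:.4f}, confident: {loss_confident:.4f}")
```

With 10 classes, a loss close to $\ln(10) \approx 2.30$ therefore means the model is doing no better than a uniform guess.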

We are here essentially training 10 linear models and then applying a softmax on them. This is **not yet** a deep neural network.
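
The "10 linear models followed by a softmax" view can be sketched directly in NumPy. This is only an illustration of the forward pass, not the Keras implementation: the images are random placeholders, and the standard deviation of $0.05$ for the normal initializer is an assumption.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

images = rng.random((4, 28, 28))            # 4 made-up grayscale images
W = rng.normal(0.0, 0.05, size=(784, 10))   # one weight column per class
b = rng.normal(0.0, 0.05, size=10)          # one bias per class

flat = images.reshape(len(images), -1)      # what the flatten layer does
scores = flat @ W + b                       # 10 linear models evaluated at once
probs = softmax(scores)                     # scores turned into class probabilities

n_params = W.size + b.size                  # 784 * 10 weights + 10 biases
print(probs.shape, n_params)
```

Each column of `W`, together with its bias, is one linear model; the softmax only normalizes the 10 scores into probabilities.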

Implement your neural network in the variable *model*. Only define and compile the network; do not fit it on the training data (your submission will likely time out if you do!).

In [2]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, kernel_initializer="random_normal", bias_initializer="random_normal", activation="softmax"),
], name="linear")

model.compile(
    loss="categorical_crossentropy",
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    metrics=['accuracy']
)

model.summary()

Model: "linear"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 10)                7850      
=================================================================
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________


Question 2 : A first linear neural network: model fitting 
---

Fit your model from question 1 on the train data with a batch size of $32$. Run $100$ epochs to fit your model.
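
As a rough guide to how much work this fit involves, the number of gradient updates follows from the training-set size, the batch size and the number of epochs. The sketch below assumes the standard Fashion-MNIST training split of 60,000 images.

```python
import math

N_TRAIN = 60_000    # size of the Fashion-MNIST training split
BATCH_SIZE = 32
EPOCHS = 100

# The last batch of an epoch may be smaller, hence the ceiling.
steps_per_epoch = math.ceil(N_TRAIN / BATCH_SIZE)
total_updates = steps_per_epoch * EPOCHS

print(steps_per_epoch, total_updates)
```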

Once your neural network is fitted, save it in a *.model* file using the [save](https://www.tensorflow.org/guide/keras/save_and_serialize) function of Keras with *save_format='h5'* and upload it below.

In [3]:
EPOCHS = 100
BATCH_SIZE = 32

fit_feedback = model.fit(x_train, y_train,
                         validation_data=(x_test, y_test),
                         batch_size=BATCH_SIZE,
                         epochs=EPOCHS)
model.save(os.path.join(MODEL_PATH, f'{model.name}.model'), save_format='h5')

Epoch 1/100
...
Epoch 77/100
Epoch 78

Question 3 : A first linear neural network: performance 
---

How many trainable parameters are contained in the whole network you just built? What are the measured train and test accuracies of the model you fitted in question 2?

Report your answer under the format: *number_param*, *train_acc*, *test_acc* (use a decimal notation for the accuracies, not %).
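
For reference, the accuracy reported here compares the predicted class (the index of the largest probability) with the true class (the index of the 1 in the one-hot label). A minimal pure-Python sketch, with hypothetical helper names and made-up toy data:

```python
def argmax(values):
    """Index of the largest element in a list."""
    return max(range(len(values)), key=values.__getitem__)

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the one-hot label."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Toy data: 4 samples, 3 classes (illustrative values, not model outputs).
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1],
          [0.3, 0.4, 0.3],   # predicted class 1, true class 2: a miss
          [0.2, 0.5, 0.3]]

acc = categorical_accuracy(y_true, y_pred)
print(f"{acc:.3f}")  # 0.750
```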

In [10]:
# We already had the number of parameters in the model summary
import operator
from functools import reduce

history = fit_feedback.history
number_param = tf.reduce_sum([reduce(operator.mul, v.shape) for v in model.trainable_variables]).numpy()
train_acc = model.evaluate(x_train, y_train)[-1]
test_acc = model.evaluate(x_test, y_test)[-1]

print(f"number_param, train_acc, test_acc :: {number_param}, {train_acc:.3f}, {test_acc:.3f}")

number_param, train_acc, test_acc :: 7850, 0.814, 0.792


Question 4 : A non-linear network 
---

Build a new model, by adding a layer before the output layer of your neural net from question 1. This layer must be a dense layer with a tanh activation function, and should contain $100$ units. The kernel and bias are initialized to *random_normal*.

Use a learning rate of $10^{-5}$.

Implement your neural network in the variable *model*. Only define and compile the network; do not fit it on the training data (your submission will likely time out if you do!).
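
The forward pass of this architecture can be sketched in NumPy as follows. The inputs are random placeholders and the initializer standard deviation of $0.05$ is an assumption; the point is only to show where the tanh sits between the two dense layers.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 784))                      # 4 made-up flattened images

W1 = rng.normal(0.0, 0.05, size=(784, 100))   # hidden layer kernel
b1 = rng.normal(0.0, 0.05, size=100)          # hidden layer bias
W2 = rng.normal(0.0, 0.05, size=(100, 10))    # output layer kernel
b2 = rng.normal(0.0, 0.05, size=10)           # output layer bias

hidden = np.tanh(x @ W1 + b1)                 # shape (4, 100), bounded by tanh

scores = hidden @ W2 + b2
scores = scores - scores.max(axis=1, keepdims=True)   # stabilize the softmax
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

print(hidden.shape, probs.shape)
```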

In [5]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(100, kernel_initializer="random_normal", bias_initializer="random_normal", activation="tanh"),
    tf.keras.layers.Dense(10, kernel_initializer="random_normal", bias_initializer="random_normal", activation="softmax"),
], name="non_linear")

model.compile(
    loss="categorical_crossentropy",
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    metrics=['accuracy']
)

model.summary() 

Model: "non_linear"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 100)               78500     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
=================================================================
Total params: 79,510
Trainable params: 79,510
Non-trainable params: 0
_________________________________________________________________


Question 5 : 
---

Fit your model from question 4 on the train data with a batch size of $32$. Run $100$ epochs to fit your model.

Once your neural network is fitted, save it in a *.model* file using the [save](https://www.tensorflow.org/guide/keras/save_and_serialize) function of Keras with *save_format='h5'* and upload it below.

In [6]:
EPOCHS = 100
BATCH_SIZE = 32

fit_feedback = model.fit(x_train, y_train,
                         validation_data=(x_test, y_test),
                         batch_size=BATCH_SIZE,
                         epochs=EPOCHS)
model.save(os.path.join(MODEL_PATH, f'{model.name}.model'), save_format='h5')

Epoch 1/100
...
Epoch 77/100
Epoch 78

Question 6 : 
---

How many trainable parameters are contained in the whole network you built in question 4? What are the measured train and test accuracies of the model as fitted in question 5?

Report your answer under the format: *number_param*, *train_acc*, *test_acc* (use a decimal notation for the accuracies, not %).
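
The parameter counts can also be checked by hand: a dense layer with $n_{in}$ inputs and $n_{out}$ units has $n_{in} \times n_{out}$ weights plus $n_{out}$ biases. A small sketch (the helper `dense_params` is a hypothetical name used only here):

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a dense layer: kernel entries plus biases."""
    return n_inputs * n_units + n_units

# Linear network: Flatten (0 parameters) -> Dense(10)
linear_total = dense_params(784, 10)

# Non-linear network: Flatten -> Dense(100) -> Dense(10)
non_linear_total = dense_params(784, 100) + dense_params(100, 10)

print(linear_total, non_linear_total)  # 7850 79510
```

These values agree with the totals shown by `model.summary()` for both networks.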

In [12]:
# We already had the number of parameters in the model summary
import operator
from functools import reduce

history = fit_feedback.history
number_param = tf.reduce_sum([reduce(operator.mul, v.shape) for v in model.trainable_variables]).numpy()
train_acc = model.evaluate(x_train, y_train)[-1]
test_acc = model.evaluate(x_test, y_test)[-1]

print(f"number_param, train_acc, test_acc :: {number_param}, {train_acc:.3f}, {test_acc:.3f}")

number_param, train_acc, test_acc :: 79510, 0.874, 0.840


Question 7 :
---

Besides the tanh activation function, other non-linear functions can be used in a hidden layer. Let's consider the ReLU activation: use the **exact** same network as in the previous question, but with ReLU instead of tanh as the hidden-layer activation. Which one performs better?
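
To see how the two activations differ, the sketch below evaluates both functions and their derivatives at a few points in plain Python. The helper names are illustrative. One commonly cited difference is that the tanh gradient shrinks towards zero for large inputs, while the ReLU gradient stays at 1 for any positive input.

```python
import math

def relu(x):
    """ReLU: identity for positive inputs, zero otherwise."""
    return max(0.0, x)

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x = 0 in this sketch)."""
    return 1.0 if x > 0 else 0.0

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  tanh={math.tanh(x):+.4f} (grad {tanh_grad(x):.4f})  "
          f"relu={relu(x):.1f} (grad {relu_grad(x):.1f})")
```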

Since there is a lot of randomness involved, different training runs for the same network might yield different results. To get more robust results, perform $10$ distinct runs for each model and report the average test accuracies.

Train each model for $100$ epochs.

Report the mean test accuracy of both networks using the format: *tanh_acc*, *relu_acc* (use a decimal notation, not %).
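
When averaging over runs, it can help to look at the spread as well as the mean, to judge whether a difference between two activations is larger than the run-to-run noise. A minimal sketch using the standard library; the accuracies below are made-up placeholders, not measured results.

```python
from statistics import mean, stdev

def summarize(accuracies):
    """Return the mean and the sample standard deviation of a list of accuracies."""
    return mean(accuracies), stdev(accuracies)

# Placeholder values for illustration only; replace with the measured accuracies.
placeholder_runs = [0.80, 0.82, 0.81, 0.79, 0.83]

avg, spread = summarize(placeholder_runs)
print(f"mean={avg:.3f}, std={spread:.3f}")
```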

In [13]:
from threading import Thread
from time import time

EPOCHS = 100
BATCH_SIZE = 32
N_RUNS = 10

def get_accuracies(activation, output):
    
    def get_non_linear_model():
        """
        Build and compile the model, using `activation` in the hidden layer
        """
        
        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(100, kernel_initializer="random_normal", bias_initializer="random_normal", activation=activation),
            tf.keras.layers.Dense(10, kernel_initializer="random_normal", bias_initializer="random_normal", activation="softmax"),
        ], name="non_linear")
        
        model.compile(
            loss="categorical_crossentropy",
            optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
            metrics=['accuracy']
        )
        
        return model
    
    mean_acc = .0
    mean_elapsed = .0
    global_start = time()
    for run in range(N_RUNS):
        start = time()
        model = get_non_linear_model()
        fit_feedback = model.fit(x_train, y_train,
                                 validation_data=(x_test, y_test),
                                 batch_size=BATCH_SIZE,
                                 epochs=EPOCHS,
                                 verbose=0)
        acc = fit_feedback.history["val_accuracy"][-1]
        output.append(acc)
        
        # Just needed to log
        elapsed = time()-start
        global_elapsed = time()-global_start
        
        mean_elapsed = ((run)*mean_elapsed + elapsed) / (run+1)
        mean_acc = ((run)*mean_acc + acc) / (run+1)
        
        remaining_time = global_elapsed + (N_RUNS-(run+1)) * mean_elapsed
        
        print(f"[{activation}] run {run+1}/{N_RUNS} :: mean accuracy={mean_acc:.3f} :: [{global_elapsed:.2f}s < {remaining_time:.2f}s]")
        

# To run faster (as long as the CPU is not the bottleneck), we train both models (tanh, relu) in parallel
tanh_accuracies = list()
relu_accuracies = list()
tanh_thread = Thread(target=get_accuracies, kwargs=dict(activation="tanh", output=tanh_accuracies))
relu_thread = Thread(target=get_accuracies, kwargs=dict(activation="relu", output=relu_accuracies))

tanh_thread.start()
relu_thread.start()
tanh_thread.join()
relu_thread.join()

tanh_acc = tf.reduce_mean(tanh_accuracies).numpy()
relu_acc = tf.reduce_mean(relu_accuracies).numpy()

print(f'\ntanh_acc, relu_acc :: {tanh_acc:.3f}, {relu_acc:.3f}')


tanh_acc, relu_acc :: 0.841, 0.852


Question 8 : Multiple choice
---


- [x] In the neural network of Q4, the only introduced non-linearities come from the activation functions.
- [x] Each epoch during the learning of the neural network of Q4 takes more time as there are more trainable parameters (as compared to the initial network fitted in Q2).
- [ ] The kernel and bias initializers do not influence the final solution. The gradient descent always converges towards the same solution as the minimization problem is convex.
- [ ] With a sufficiently small learning rate, the categorical accuracy on the test set is guaranteed to increase after each epoch.
- [ ] The ReLU activation function is linear.
- [ ] The learning rate has little influence on the learned neural network; it mainly influences the number of epochs until convergence.
- [ ] With a sufficiently small learning rate, the categorical accuracy on the train set is guaranteed to increase after each epoch.
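
Two of the statements above can be checked numerically. The sketch below uses scalar "layers" for simplicity (an assumption made for illustration; the same argument holds for matrices). It shows that stacking two linear layers without an activation collapses into a single linear map, and that ReLU violates additivity, so it is not a linear function.

```python
def linear(a, b):
    """Return the scalar affine map x -> a * x + b."""
    return lambda x: a * x + b

def relu(x):
    return max(0.0, x)

# Two stacked linear layers without any activation in between...
layer1 = linear(2.0, 1.0)
layer2 = linear(-3.0, 0.5)
stacked = lambda x: layer2(layer1(x))

# ...are equivalent to one linear layer: -3 * (2x + 1) + 0.5 = -6x - 2.5
collapsed = linear(-3.0 * 2.0, -3.0 * 1.0 + 0.5)

xs = [-2.0, -0.5, 0.0, 1.5]
same = all(abs(stacked(x) - collapsed(x)) < 1e-12 for x in xs)

# A linear function f satisfies f(u + v) == f(u) + f(v); ReLU does not:
# relu(-1 + 1) = 0, whereas relu(-1) + relu(1) = 1.
relu_is_additive = relu(-1.0 + 1.0) == relu(-1.0) + relu(1.0)

print(same, relu_is_additive)  # True False
```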