### Part 3: Dropout

* Create 4 networks with different architectures.
* Plot loss comparisons for networks with and without dropout.
* Pipeline:
  1. Train 4 models with the specified architectures.
  2. For each model, save the train and test losses to .csv files using `CSVLogger` callbacks.
  3. Read the csv logs and plot the losses.
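
Step 3 is not shown in the cells below. The following is a minimal sketch of it, assuming the column names that `CSVLogger` writes for a model compiled with an `accuracy` metric and given validation data (`epoch`, `accuracy`, `loss`, `val_accuracy`, `val_loss`). The helper names `load_log` and `plot_losses` and the two-row sample file are illustrative placeholders, not results from these runs.

```python
import csv
import os
import tempfile


def load_log(path):
    """Read a CSVLogger file into a dict of column name -> list of floats."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return {name: [float(row[name]) for row in rows] for name in rows[0]}


def plot_losses(logs, title="Train vs. test loss"):
    """Plot train and test loss per epoch for each named run; returns the figure."""
    import matplotlib.pyplot as plt  # imported here so load_log works without it

    fig, ax = plt.subplots()
    for name, log in logs.items():
        ax.plot(log["epoch"], log["loss"], label=f"{name} (train)")
        ax.plot(log["epoch"], log["val_loss"], linestyle="--", label=f"{name} (test)")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.set_title(title)
    ax.legend()
    return fig


# Placeholder file with made-up numbers, only to exercise load_log.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_results.csv")
with open(sample_path, "w", newline="") as f:
    f.write("epoch,accuracy,loss,val_accuracy,val_loss\n")
    f.write("0,0.75,0.69,0.93,0.21\n")
    f.write("1,0.95,0.16,0.95,0.15\n")

sample_log = load_log(sample_path)
print(sample_log["loss"])
```

In the notebook, a call such as `plot_losses({"xavier, no dropout": load_log("xav_nodrop_results.csv"), "xavier, dropout": load_log("xav_drop_results.csv")})` would overlay the two runs on one set of axes.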

In [15]:
# Colab magics; the TensorFlow version magic must run before TensorFlow is imported.
%tensorflow_version 2.x
%matplotlib inline

import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.callbacks import CSVLogger

In [16]:
from google.colab import drive
drive.mount('drive')

Drive already mounted at drive; to attempt to forcibly remount, call drive.mount("drive", force_remount=True).


In [17]:
mnist = tf.keras.datasets.mnist

In [18]:
(x_train, y_train), (x_test, y_test) = mnist.load_data() # load data
assert x_train.shape == (60000, 28, 28)
assert x_test.shape == (10000, 28, 28)
assert y_train.shape == (60000,)
assert y_test.shape == (10000,)

In [19]:
# Scale pixel values from [0, 255] to [0, 1].

x_train, x_test = x_train/255, x_test/255
# X = pd.concat([x_train, x_test])
CLASSES = 10

In [6]:
# class TestCallback(Callback):
#     def __init__(self, test_data):
#         self.test_data = test_data

#     def on_epoch_end(self, epoch, logs={}):
#         x, y = self.test_data
#         loss, acc = self.model.evaluate(x, y, verbose=0)
#         print('\nTesting loss: {}, acc: {}\n'.format(loss, acc))

#### Creating four networks for loss comparison:
a. Activation function: the logistic sigmoid function; initialization: Xavier initializer; no dropout
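
For reference, `GlorotUniform` draws each weight from a uniform distribution on `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`. A small sketch of that bound for the layer sizes used in this notebook (the helper name `glorot_uniform_limit` is illustrative):

```python
import math


def glorot_uniform_limit(fan_in, fan_out):
    """Bound of the uniform range used by Glorot (Xavier) uniform initialization."""
    return math.sqrt(6.0 / (fan_in + fan_out))


# Layer shapes of the networks below: 784 -> 1024 -> ... -> 1024 -> 10.
first_hidden = glorot_uniform_limit(28 * 28, 1024)
inner_hidden = glorot_uniform_limit(1024, 1024)
print(f"first hidden layer: {first_hidden:.4f}, inner hidden layers: {inner_hidden:.4f}")
```

The bound shrinks as layers get wider, which keeps the variance of the pre-activations roughly constant from layer to layer.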


In [11]:
init_xavier = tf.initializers.GlorotUniform()
model_xav_nodrop = tf.keras.models.Sequential([
                                     
    tf.keras.layers.Flatten(input_shape=(28, 28)),

    tf.keras.layers.Dense(1024, activation=tf.nn.sigmoid, kernel_initializer=init_xavier, name='layer1'),
    tf.keras.layers.Dense(1024, activation=tf.nn.sigmoid, kernel_initializer=init_xavier, name='layer2'),
    tf.keras.layers.Dense(1024, activation=tf.nn.sigmoid, kernel_initializer=init_xavier, name='layer3'),
    tf.keras.layers.Dense(1024, activation=tf.nn.sigmoid, kernel_initializer=init_xavier, name='layer4'),
    tf.keras.layers.Dense(1024, activation=tf.nn.sigmoid, kernel_initializer=init_xavier, name='layer5'),
    tf.keras.layers.Dense(CLASSES, activation='softmax', name='layer_softmax')

])
model_xav_nodrop.compile(
  optimizer='adam',
  # The output layer already applies softmax, so the loss receives probabilities, not logits.
  loss=tf.losses.SparseCategoricalCrossentropy(from_logits=False),
  # loss='mse',
  metrics=['accuracy'])

In [12]:
csv_log_model_xav_nodrop = CSVLogger("xav_nodrop_results.csv")
model_xav_nodrop_logger = model_xav_nodrop.fit(
    x_train,
    y_train,
    epochs=300,
    validation_data=(x_test, y_test),
    # batch_size=10,
    verbose=2,
    callbacks=[csv_log_model_xav_nodrop]
)
!cp xav_nodrop_results.csv "drive/My Drive/"

Epoch 1/300
1875/1875 - 10s - loss: 0.6910 - accuracy: 0.7537 - val_loss: 0.2123 - val_accuracy: 0.9369
Epoch 2/300
1875/1875 - 9s - loss: 0.1613 - accuracy: 0.9536 - val_loss: 0.1508 - val_accuracy: 0.9568
Epoch 3/300
1875/1875 - 9s - loss: 0.1101 - accuracy: 0.9689 - val_loss: 0.1208 - val_accuracy: 0.9656
Epoch 4/300
1875/1875 - 9s - loss: 0.0803 - accuracy: 0.9768 - val_loss: 0.1072 - val_accuracy: 0.9707
Epoch 5/300
1875/1875 - 9s - loss: 0.0649 - accuracy: 0.9815 - val_loss: 0.0872 - val_accuracy: 0.9761
Epoch 6/300
1875/1875 - 9s - loss: 0.0517 - accuracy: 0.9850 - val_loss: 0.0992 - val_accuracy: 0.9748
Epoch 7/300
1875/1875 - 9s - loss: 0.0412 - accuracy: 0.9882 - val_loss: 0.0892 - val_accuracy: 0.9779
Epoch 8/300
1875/1875 - 9s - loss: 0.0346 - accuracy: 0.9896 - val_loss: 0.1290 - val_accuracy: 0.9708
Epoch 9/300
1875/1875 - 9s - loss: 0.0279 - accuracy: 0.9913 - val_loss: 0.0802 - val_accuracy: 0.9811
Epoch 10/300
1875/1875 - 9s - loss: 0.0252 - accuracy: 0.9926 - val_loss: 0.0875 - v
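
The `1875/1875` counter in the log above is the number of mini-batches per epoch. `fit` is called without `batch_size`, so Keras falls back to its default of 32. A quick check of the arithmetic:

```python
import math

train_examples = 60000  # MNIST training set size asserted earlier
batch_size = 32         # Keras default when fit() gets no batch_size

steps_per_epoch = math.ceil(train_examples / batch_size)
print(steps_per_epoch)  # 1875
```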

b. Activation function: the logistic sigmoid function; initialization: Xavier initializer; with dropout rate: 0.2 for the first layer and 0.5 for the other hidden layers.
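
During training, a Keras `Dropout(rate)` layer zeroes each input with probability `rate` and scales the surviving inputs by `1 / (1 - rate)`, so the expected activation is unchanged. At inference it passes inputs through untouched. A plain-Python sketch of that behaviour (the function `dropout` is an illustration, not the Keras implementation):

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: drop each value with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)
activations = [1.0] * 10000

dropped = dropout(activations, rate=0.5, training=True, rng=rng)
mean_train = sum(dropped) / len(dropped)  # stays close to 1.0 on average
mean_eval = sum(dropout(activations, 0.5, False, rng)) / len(activations)
print(round(mean_train, 2), mean_eval)
```

Because the rescaling happens at training time, no extra correction is needed when the model is evaluated on the test set.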

In [13]:
init_xavier = tf.initializers.GlorotUniform()
model_xav_drop = tf.keras.models.Sequential([
                                     
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1024, activation=tf.nn.sigmoid, kernel_initializer=init_xavier, name='layer1'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1024, activation=tf.nn.sigmoid, kernel_initializer=init_xavier, name='layer2'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1024, activation=tf.nn.sigmoid, kernel_initializer=init_xavier, name='layer3'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1024, activation=tf.nn.sigmoid, kernel_initializer=init_xavier, name='layer4'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1024, activation=tf.nn.sigmoid, kernel_initializer=init_xavier, name='layer5'),
    
    tf.keras.layers.Dense(CLASSES, activation='softmax', name='layer_softmax')

])
model_xav_drop.compile(
  optimizer='adam',
  # The output layer already applies softmax, so the loss receives probabilities, not logits.
  loss=tf.losses.SparseCategoricalCrossentropy(from_logits=False),
  # loss='mse',
  metrics=['accuracy'])

In [14]:
csv_log_model_xav_drop = CSVLogger("xav_drop_results.csv")
model_xav_drop_logger = model_xav_drop.fit(
    x_train,
    y_train,
    epochs=300,
    validation_data=(x_test, y_test),
    # batch_size=10,
    verbose=2,
    callbacks=[csv_log_model_xav_drop]
)
!cp xav_drop_results.csv "drive/My Drive/"

Epoch 1/300


1875/1875 - 10s - loss: 0.7032 - accuracy: 0.7615 - val_loss: 0.2165 - val_accuracy: 0.9357
Epoch 2/300
1875/1875 - 10s - loss: 0.2574 - accuracy: 0.9252 - val_loss: 0.1493 - val_accuracy: 0.9583
Epoch 3/300
1875/1875 - 9s - loss: 0.1998 - accuracy: 0.9417 - val_loss: 0.1235 - val_accuracy: 0.9645
Epoch 4/300
1875/1875 - 9s - loss: 0.1699 - accuracy: 0.9513 - val_loss: 0.1004 - val_accuracy: 0.9706
Epoch 5/300
1875/1875 - 10s - loss: 0.1488 - accuracy: 0.9562 - val_loss: 0.0954 - val_accuracy: 0.9749
Epoch 6/300
1875/1875 - 9s - loss: 0.1327 - accuracy: 0.9611 - val_loss: 0.0910 - val_accuracy: 0.9747
Epoch 7/300
1875/1875 - 10s - loss: 0.1197 - accuracy: 0.9647 - val_loss: 0.0857 - val_accuracy: 0.9775
Epoch 8/300
1875/1875 - 10s - loss: 0.1116 - accuracy: 0.9671 - val_loss: 0.0841 - val_accuracy: 0.9771
Epoch 9/300
1875/1875 - 10s - loss: 0.1050 - accuracy: 0.9697 - val_loss: 0.0773 - val_accuracy: 0.9784
Epoch 10/300
1875/1875 - 9s - loss: 0.0970 - accuracy: 0.9712 - val_loss: 0.078

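An additional baseline, not listed among the four networks: activation function: ReLU; initialization: Xavier initializer; no dropout.

In [None]: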
init_xavier = tf.initializers.GlorotUniform()
model_xav_nodrop_relu = tf.keras.models.Sequential([
                                     
    tf.keras.layers.Flatten(input_shape=(28, 28)),

    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_xavier, name='layer1'),
    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_xavier, name='layer2'),
    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_xavier, name='layer3'),
    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_xavier, name='layer4'),
    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_xavier, name='layer5'),
    tf.keras.layers.Dense(CLASSES, activation='softmax', name='layer_softmax')

])
model_xav_nodrop_relu.compile(
  optimizer='adam',
  # The output layer already applies softmax, so the loss receives probabilities, not logits.
  loss=tf.losses.SparseCategoricalCrossentropy(from_logits=False),
  # loss='mse',
  metrics=['accuracy'])

In [None]:
# Use names specific to this model so the sigmoid run's logger and history are not overwritten.
csv_log_model_xav_nodrop_relu = CSVLogger("xav_nodrop_relu_results.csv")
model_xav_nodrop_relu_logger = model_xav_nodrop_relu.fit(
    x_train,
    y_train,
    epochs=300,
    validation_data=(x_test, y_test),
    # batch_size=10,
    verbose=2,
    callbacks=[csv_log_model_xav_nodrop_relu]
)
!cp xav_nodrop_relu_results.csv "drive/My Drive/"


---
c. Activation function: ReLU; initialization: Kaiming He's initializer (`HeUniform` in the code below); no dropout
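
`HeUniform`, used in the next cell, samples weights from a uniform range whose bound depends only on the number of inputs: `limit = sqrt(6 / fan_in)`. A small sketch comparing it with the Glorot bound for a 1024-to-1024 layer (both helper names are illustrative):

```python
import math


def he_uniform_limit(fan_in):
    """Bound of the uniform range used by He (Kaiming) uniform initialization."""
    return math.sqrt(6.0 / fan_in)


def glorot_uniform_limit(fan_in, fan_out):
    """Bound of the uniform range used by Glorot (Xavier) uniform initialization."""
    return math.sqrt(6.0 / (fan_in + fan_out))


he_limit = he_uniform_limit(1024)
glorot_limit = glorot_uniform_limit(1024, 1024)
print(f"He: {he_limit:.4f}, Glorot: {glorot_limit:.4f}, ratio: {he_limit / glorot_limit:.4f}")
```

For square layers the He bound is larger by a factor of sqrt(2), which compensates for ReLU zeroing roughly half of the pre-activations.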

In [None]:
init_he = tf.initializers.HeUniform()
model_he_nodrop = tf.keras.models.Sequential([
                                     
    tf.keras.layers.Flatten(input_shape=(28, 28)),

    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_he, name='layer1'),
    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_he, name='layer2'),
    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_he, name='layer3'),
    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_he, name='layer4'),
    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_he, name='layer5'),
    tf.keras.layers.Dense(CLASSES, activation='softmax', name='layer_softmax')

])
model_he_nodrop.compile(
  optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
  # The output layer already applies softmax, so the loss receives probabilities, not logits.
  loss=tf.losses.SparseCategoricalCrossentropy(from_logits=False),
  # loss='mse',
  metrics=['accuracy'])

In [None]:
csv_log_model_he_nodrop = CSVLogger("he_nodrop_results.csv")
model_he_nodrop_logger = model_he_nodrop.fit(
    x_train,
    y_train,
    epochs=300,
    validation_data=(x_test, y_test),
    # batch_size=10,
    verbose=2,
    callbacks=[csv_log_model_he_nodrop]
)
!cp he_nodrop_results.csv "drive/My Drive/"

Epoch 1/300


1875/1875 - 13s - loss: 0.2079 - accuracy: 0.9383 - val_loss: 0.0973 - val_accuracy: 0.9694
Epoch 2/300
1875/1875 - 9s - loss: 0.0778 - accuracy: 0.9756 - val_loss: 0.0967 - val_accuracy: 0.9708
Epoch 3/300
1875/1875 - 9s - loss: 0.0457 - accuracy: 0.9854 - val_loss: 0.0657 - val_accuracy: 0.9791
Epoch 4/300
1875/1875 - 9s - loss: 0.0315 - accuracy: 0.9896 - val_loss: 0.0754 - val_accuracy: 0.9776
Epoch 5/300
1875/1875 - 9s - loss: 0.0246 - accuracy: 0.9919 - val_loss: 0.0804 - val_accuracy: 0.9792
Epoch 6/300
1875/1875 - 9s - loss: 0.0205 - accuracy: 0.9935 - val_loss: 0.0868 - val_accuracy: 0.9753
Epoch 7/300
1875/1875 - 9s - loss: 0.0167 - accuracy: 0.9947 - val_loss: 0.0869 - val_accuracy: 0.9772
Epoch 8/300
1875/1875 - 9s - loss: 0.0143 - accuracy: 0.9956 - val_loss: 0.0957 - val_accuracy: 0.9769
Epoch 9/300
1875/1875 - 9s - loss: 0.0124 - accuracy: 0.9962 - val_loss: 0.1108 - val_accuracy: 0.9768
Epoch 10/300
1875/1875 - 9s - loss: 0.0125 - accuracy: 0.9962 - val_loss: 0.0765 - v
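
In the ten epochs shown, the training loss keeps falling while the validation loss is lowest at epoch 3 and higher afterwards, which is the usual sign of overfitting. A short sketch that locates that epoch from the logged values (numbers copied from the output above):

```python
# Validation loss for epochs 1-10 of the ReLU / He / no-dropout run, copied from the log.
val_loss = [0.0973, 0.0967, 0.0657, 0.0754, 0.0804,
            0.0868, 0.0869, 0.0957, 0.1108, 0.0765]

# Index of the smallest validation loss, converted to a 1-indexed epoch number.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch, val_loss[best_epoch - 1])
```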

---
d. Activation function: ReLU; initialization: Kaiming He's initializer; with dropout rate: 0.2 for the first layer and 0.5 for the other hidden layers

In [21]:
init_he = tf.initializers.HeUniform()
model_he_drop = tf.keras.models.Sequential([
                                     
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dropout(0.2),

    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_he, name='layer1'),
    tf.keras.layers.Dropout(0.5),

    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_he, name='layer2'),
    tf.keras.layers.Dropout(0.5),

    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_he, name='layer3'),
    tf.keras.layers.Dropout(0.5),

    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_he, name='layer4'),
    tf.keras.layers.Dropout(0.5),

    tf.keras.layers.Dense(1024, activation="relu", kernel_initializer=init_he, name='layer5'),
    tf.keras.layers.Dense(CLASSES, activation='softmax', name='layer_softmax')

])
model_he_drop.compile(
  optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
  # The output layer already applies softmax, so the loss receives probabilities, not logits.
  loss=tf.losses.SparseCategoricalCrossentropy(from_logits=False),
  # loss='mse',
  metrics=['accuracy'])

In [22]:
csv_log_model_he_drop = CSVLogger("he_drop_results.csv")
model_he_drop_logger = model_he_drop.fit(
    x_train,
    y_train,
    epochs=300,
    validation_data=(x_test, y_test),
    # batch_size=10,
    verbose=2,
    callbacks=[csv_log_model_he_drop]
)
!cp he_drop_results.csv "drive/My Drive/"

Epoch 1/300


1875/1875 - 11s - loss: 1.1492 - accuracy: 0.6091 - val_loss: 0.3165 - val_accuracy: 0.9065
Epoch 2/300
1875/1875 - 9s - loss: 0.4670 - accuracy: 0.8508 - val_loss: 0.1957 - val_accuracy: 0.9418
Epoch 3/300
1875/1875 - 9s - loss: 0.3368 - accuracy: 0.8971 - val_loss: 0.1558 - val_accuracy: 0.9554
Epoch 4/300
1875/1875 - 10s - loss: 0.2774 - accuracy: 0.9152 - val_loss: 0.1320 - val_accuracy: 0.9613
Epoch 5/300
1875/1875 - 9s - loss: 0.2330 - accuracy: 0.9282 - val_loss: 0.1121 - val_accuracy: 0.9669
Epoch 6/300
1875/1875 - 10s - loss: 0.2099 - accuracy: 0.9376 - val_loss: 0.1122 - val_accuracy: 0.9680
Epoch 7/300
1875/1875 - 10s - loss: 0.1839 - accuracy: 0.9433 - val_loss: 0.0946 - val_accuracy: 0.9717
Epoch 8/300
1875/1875 - 10s - loss: 0.1702 - accuracy: 0.9487 - val_loss: 0.0922 - val_accuracy: 0.9733
Epoch 9/300
1875/1875 - 9s - loss: 0.1566 - accuracy: 0.9526 - val_loss: 0.0847 - val_accuracy: 0.9747
Epoch 10/300
1875/1875 - 10s - loss: 0.1488 - accuracy: 0.9560 - val_loss: 0.082
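
With dropout, the training loss is computed with units dropped while the validation loss uses the full network, so the training loss can sit above the validation loss. A sketch comparing the epoch-9 gap of the two ReLU / He runs, using values copied from the logs above (the helper name `generalization_gap` is illustrative):

```python
# Epoch 9 training and validation loss, copied from the two ReLU / He logs.
he_nodrop = {"loss": 0.0124, "val_loss": 0.1108}
he_drop = {"loss": 0.1566, "val_loss": 0.0847}


def generalization_gap(log):
    """Validation loss minus training loss; positive means worse on held-out data."""
    return round(log["val_loss"] - log["loss"], 4)


gap_nodrop = generalization_gap(he_nodrop)
gap_drop = generalization_gap(he_drop)
print(gap_nodrop, gap_drop)
```

At this point in training the no-dropout run has a positive gap and the lower training loss, while the dropout run has a negative gap and the lower validation loss.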

---
#### Citations:
* *CSVLogger callbacks:* https://towardsdatascience.com/a-practical-introduction-to-keras-callbacks-in-tensorflow-2-705d0c584966
* *Callbacks:* https://github.com/keras-team/keras/issues/2548 
* *Keras Documentation:* https://keras.io/api/ 
---