# Exercise 5: MNIST in Keras

## Task 2a: TensorFlow 2 quickstart for beginners

In [1]:
import tensorflow as tf

Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). Convert the samples from integers to floating-point numbers:

In [2]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
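The scaling step can be tried without downloading the dataset. The following is a minimal sketch on a synthetic array that only mimics the shape and dtype of MNIST images; `fake_images` is a made-up stand-in, not real data. Dividing a `uint8` array by `255.0` yields floating-point values in the range [0, 1].

```python
import numpy as np

# Hypothetical stand-in for a few MNIST images: uint8 pixels in [0, 255].
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Same operation as in the cell above: true division promotes to float64.
scaled = fake_images / 255.0

print(fake_images.dtype, fake_images.min(), fake_images.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

Inputs in a small, fixed range tend to make gradient-based training better behaved than raw pixel values up to 255.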

Build the `tf.keras.Sequential` model by stacking layers. Choose an optimizer and loss function for training:

In [3]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

In [4]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
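To make the choice of loss and metric more concrete, here is a small NumPy sketch of what `sparse_categorical_crossentropy` and `accuracy` compute for integer labels. The probabilities and labels are invented for illustration, and the helper functions are not the Keras implementations (Keras, as far as I know, also clips the probabilities for numerical stability, which is omitted here).

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability of the true class (labels are integer ids)."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(true_class_probs)))

def accuracy(probs, labels):
    """Fraction of samples whose most probable class equals the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

# Made-up softmax outputs for 3 samples and 3 classes (each row sums to 1).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])
labels = np.array([0, 1, 2])  # integer labels, not one-hot vectors

loss = sparse_categorical_crossentropy(probs, labels)
acc = accuracy(probs, labels)
print(f"loss={loss:.4f}, accuracy={acc:.4f}")
```

The "sparse" variant fits here because `y_train` holds the digit classes as integers 0 to 9 rather than one-hot vectors.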

The `Model.fit` method adjusts the model parameters to minimize the loss: 

In [5]:
model.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f58bcfae6d0>
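What "adjusting the parameters to minimize the loss" means for the `adam` optimizer can be sketched on a toy problem with a single parameter. This is a plain-Python illustration of the Adam update rule, not the Keras implementation; the function name `adam_minimize`, the toy loss, and the learning rate of 0.1 are assumptions chosen for this example.

```python
import math

def adam_minimize(grad_fn, x0, steps=1000, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function with the Adam update rule, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy loss (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3.
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

In `Model.fit` the same kind of update is applied to every weight, with gradients computed on mini-batches of the training data.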

The `Model.evaluate` method checks the model's performance, usually on a "[Validation-set](https://developers.google.com/machine-learning/glossary#validation-set)" or "[Test-set](https://developers.google.com/machine-learning/glossary#test-set)".

In [6]:
model.evaluate(x_test, y_test, verbose=2)

313/313 - 0s - loss: 0.0811 - accuracy: 0.9754


[0.08107899874448776, 0.9753999710083008]

## Task 2b

In [7]:
model.evaluate(x_train, y_train, verbose=2)

1875/1875 - 2s - loss: 0.0439 - accuracy: 0.9863


[0.043908800929784775, 0.9863333106040955]

The image classifier is now trained to ~98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://www.tensorflow.org/tutorials/).

It is noticeable that the model performs better on the training data than on the test data. This is to be expected, since we actually optimized on the training data, whereas the test data gives an estimate of the generalization error.

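It is also noticeable that the evaluation on the training data differs from the results of the last epoch. This is due to the Dropout layer, which no longer applies dropout at inference time. In addition, the metrics logged during an epoch are averaged over all batches while the weights are still being updated, whereas `Model.evaluate` uses the final weights throughout.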
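The different behaviour of dropout in training and inference can be illustrated with a short NumPy sketch. It assumes the "inverted dropout" scheme that the Keras `Dropout` layer documents (kept units are scaled by `1 / (1 - rate)` during training); the `dropout` function below is a hypothetical helper, not the Keras code.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: random masking during training, identity at inference."""
    if not training:
        return x
    keep_mask = rng.random(x.shape) >= rate   # keep each unit with prob. 1 - rate
    return x * keep_mask / (1.0 - rate)       # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
activations = np.ones(10)

train_out = dropout(activations, rate=0.2, training=True, rng=rng)
infer_out = dropout(activations, rate=0.2, training=False, rng=rng)
print("training: ", train_out)
print("inference:", infer_out)
```

Because the forward pass during `fit` is noisy while the one in `evaluate` is not, the two report different loss and accuracy values even on the same data.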

## Task 2c

In [15]:
model2 = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dropout(0.2), 
  tf.keras.layers.Dense(10, activation='softmax')
])

model2.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model2.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f58901212b0>

In [16]:
model2.evaluate(x_test, y_test, verbose=2)

313/313 - 0s - loss: 0.0793 - accuracy: 0.9782


[0.07925237715244293, 0.9782000184059143]

In [17]:
model2.evaluate(x_train, y_train, verbose=2)

1875/1875 - 2s - loss: 0.0277 - accuracy: 0.9908


[0.027707796543836594, 0.9908000230789185]

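Compared with the model from Task 2a, this model performs better on the training data (99.08% vs. 98.63% accuracy) but only marginally better on the test data (97.82% vs. 97.54%), a difference that may be within run-to-run variation. The gap between training and test accuracy has widened slightly, which suggests that the higher number of parameters makes it easier to overfit the training set.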
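The difference in parameter count can be worked out by hand. The sketch below uses two hypothetical helpers, `dense_params` and `count_params`, and counts only the `Dense` layers, since `Flatten` and `Dropout` have no trainable parameters.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def count_params(layer_sizes):
    """Total parameters of a stack of dense layers, e.g. [784, 128, 10]."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

n_inputs = 28 * 28  # Flatten turns each 28x28 image into 784 features

params_2a = count_params([n_inputs, 128, 10])       # model from Task 2a
params_2c = count_params([n_inputs, 128, 512, 10])  # model2 from Task 2c
print(params_2a, params_2c)
```

The numbers can be cross-checked against the output of `model.summary()` and `model2.summary()`.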

## Task 2d

In [8]:
model3 = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(10, activation='softmax')
])

model3.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model3.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f58904174f0>

In [9]:
model3.evaluate(x_test, y_test, verbose=2)

313/313 - 0s - loss: 0.2653 - accuracy: 0.9266


[0.2652684450149536, 0.9265999794006348]

In [10]:
model3.evaluate(x_train, y_train, verbose=2)

1875/1875 - 2s - loss: 0.2570 - accuracy: 0.9297


[0.2569727599620819, 0.9296833276748657]

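As expected, the model performs much worse than the deeper variants (92.66% test accuracy). Without a hidden layer it is only a linear classifier (softmax regression) and cannot learn nonlinear features of the pixels. Training and test accuracy are almost identical, which points to underfitting rather than overfitting.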

## Task 2e

In [11]:
model4 = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='tanh'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model4.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model4.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f58902c1a30>

In [12]:
model4.evaluate(x_test, y_test, verbose=2)

313/313 - 0s - loss: 0.0966 - accuracy: 0.9708


[0.09664200991392136, 0.97079998254776]

In [13]:
model4.evaluate(x_train, y_train, verbose=2)

1875/1875 - 2s - loss: 0.0614 - accuracy: 0.9819


[0.06144605949521065, 0.9818833470344543]

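The model performs slightly worse than the ReLU variant from Task 2a for the same training time of 5 epochs (97.08% vs. 97.54% test accuracy), i.e. it trains more slowly. A likely cause is the gradient of tanh, which becomes very small once the unit saturates at inputs of large absolute value. The deeper the model, the more pronounced this vanishing gradient problem becomes.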
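The saturation effect can be checked numerically. This sketch compares the derivative of tanh, `1 - tanh(x)^2`, with the derivative of ReLU at a few hand-picked inputs; the helper names are made up for this example.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

for x in (0.0, 1.0, 2.0, 5.0):
    print(f"x={x:>4}: tanh'={tanh_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

For positive inputs the ReLU gradient stays at 1, while the tanh gradient shrinks quickly as the input grows. During backpropagation these factors are multiplied across layers, so small values compound in deeper networks.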

## Task 2f

Best model: `model2` with the extra layer

In [18]:
tf.keras.models.save_model(model2, "model.h5")

Alternative: `model2.save("model.h5")`