# Deep Learning Project
## Task
The task of this deep learning project was to create and train a neural network for digit recognition on the MNIST dataset, which contains 70000 images of handwritten digits from 0 to 9. The main priority is performance, followed by explainability; data analysis comes last.

As a first step, the approaches from tutorial 1 were tried and rated by their performance. Version 0 uses no regularization; different validation splits (15% and 20%) and early stopping were tried with it. Version 1 adds an L2 regularizer and version 2 adds dropout.

Even though version 0 had the best performance, I decided to continue with the dropout version because of problems with a noisy loss and overfitting. The model is a sequential model with 6 layers. The first layer is a flatten layer, which flattens each input image from a 28x28 matrix to a 784-element vector. It is followed by two dense layers with 512 and 128 neurons respectively, both using the ReLU activation function. Each of these dense layers is followed by a dropout layer with a dropout rate of 0.5, which helps to prevent overfitting by randomly setting a fraction of its input units to 0 during training. The final layer is a dense layer with 10 neurons and a softmax activation function.

The model is compiled with the Adam optimizer, the sparse categorical cross-entropy loss function and accuracy as a metric. It is then trained for 10 epochs with a batch size of 32 and a validation split of 0.15, which is close to the share of the test set in the full MNIST dataset (training set: 60000, test set: 10000, so about 14%). With this model, an accuracy of 98% and a loss of 0.07 were reached.
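
The dropout step described above can be sketched in plain NumPy. This is only a minimal illustration, not the Keras implementation: the function name `dropout`, the seed and the all-ones stand-in batch are assumptions made for this example. It uses the "inverted dropout" convention, in which the surviving units are scaled by 1 / (1 - rate) during training so that nothing has to be rescaled at inference time; the Keras `Dropout` layer documents the same scaling.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return activations  # dropout is inactive at inference time
    keep_prob = 1.0 - rate
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors by 1/keep_prob so the expected activation is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
hidden = np.ones((4, 512))           # stand-in for a batch of dense-layer outputs
dropped = dropout(hidden, rate=0.5, rng=rng)

print(np.unique(dropped))            # only the values 0.0 and 2.0 occur
```

With a rate of 0.5, roughly half of the 512 units in each row are zeroed and the remaining ones are doubled.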

In [None]:
# import needed libraries
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

In [None]:
# load and normalize dataset and define plotting function
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values from [0, 255] to [0, 1]

def plot(train_history):
    plt.plot(train_history.history['loss'], label='train')
    plt.plot(train_history.history['val_loss'], label='val')
    plt.ylabel('loss')
    plt.legend()
    plt.show()

    plt.plot(train_history.history['accuracy'], label='train')
    plt.plot(train_history.history['val_accuracy'], label='val')
    plt.ylabel('accuracy')
    plt.legend()
    plt.show()

In [None]:
# define structure of neural network with dropout
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.5),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.5),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
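
To get a feel for the size of this network, the trainable parameters can be counted by hand. The sketch below is an illustration only; the helper name `dense_params` is hypothetical. It relies on the fact that a dense layer has one weight per input-unit pair plus one bias per unit, while flatten and dropout layers have no trainable parameters. The total should match what `model.summary()` reports.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a dense layer: weights plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer widths of the dropout model: 784 flattened pixels -> 512 -> 128 -> 10.
widths = [28 * 28, 512, 128, 10]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in zip(widths, widths[1:])]
total = sum(per_layer)

print(per_layer)  # [401920, 65664, 1290]
print(total)      # 468874
```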

In [None]:
# define training parameters
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
# train the model, then plot loss and accuracy
history = model.fit(x_train,
                    y_train,
                    epochs=10,
                    batch_size=32,
                    shuffle=True,
                    validation_split=0.15)

plot(history)
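
The numbers behind the validation split can be checked with a few lines of arithmetic. This is a sketch for illustration only; the variable names are chosen for this example. According to the Keras documentation, `validation_split` takes the validation samples from the end of the arrays before shuffling, so only the remaining samples are used for weight updates.

```python
import math

n_train_total = 60000      # size of the MNIST training set
n_test = 10000             # size of the MNIST test set
validation_split = 0.15
batch_size = 32

n_val = round(n_train_total * validation_split)  # samples held out for validation
n_fit = n_train_total - n_val                    # samples used for weight updates
steps_per_epoch = math.ceil(n_fit / batch_size)  # batches per epoch (the last one is partial)
test_share = n_test / (n_train_total + n_test)   # share of the test set in all 70000 images

print(n_val, n_fit, steps_per_epoch)  # 9000 51000 1594
print(round(test_share * 100, 1))     # 14.3
```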

In [None]:
# evaluate on the test set; evaluate() returns [loss, accuracy]
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)

## Optimization with convolutional layers
To improve the performance, I tried a model with convolutional layers. The main idea of the code and the concept of convolutional layers are shown in the picture below.

<img src="conv_layers.png" width="800"><br>
Source: https://www.youtube.com/watch?v=9cPMFTwBdM4

The convolutional neural network consists of two convolutional layers, each followed by a max pooling layer, then a flatten layer, a fully connected dense layer with 64 neurons and a dense softmax output layer with 10 neurons.

The to_categorical function converts the integer labels in the y_train and y_test arrays into one-hot encoded labels. One-hot encoding represents a categorical label as an array of binary values in which exactly one element is 1 and the rest are 0. This is often used in classification problems where the categorical labels are not ordinal, i.e. there is no inherent ordering between them. It is also the label format that the categorical cross-entropy loss expects.
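
One-hot encoding itself takes only a few lines of NumPy. The sketch below is an illustration of the idea, not the Keras implementation; the function name `one_hot` and the example labels are assumptions made for this example.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels = np.array([5, 0, 9])          # example digit labels, chosen arbitrarily
encoded = one_hot(labels, num_classes=10)

print(encoded[0])                     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(encoded.argmax(axis=1))         # [5 0 9]
```

Taking the `argmax` of each row recovers the original integer label, so no information is lost by the encoding.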

The model is then compiled with the RMSprop optimizer, the categorical cross-entropy loss function and the accuracy metric. Training runs for 5 epochs with a batch size of 32 and shuffling of the training data; 15% of the training data is used for validation. With this model, an accuracy of 99% and a loss of 0.03 were reached.
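
The first model used the sparse categorical cross-entropy with integer labels, while this one uses the categorical cross-entropy with one-hot labels. Both compute the same value and differ only in the label format. The sketch below illustrates this for a single sample; the function names mirror the Keras loss names but are plain Python re-implementations, and the three-class probability vector is a made-up example.

```python
import math

def categorical_crossentropy(one_hot_label, probs):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)

def sparse_categorical_crossentropy(label_index, probs):
    """Same loss, with the label given as an integer class index."""
    return -math.log(probs[label_index])

# Hypothetical softmax output for one image over three classes.
probs = [0.1, 0.7, 0.2]
loss_dense = categorical_crossentropy([0, 1, 0], probs)
loss_sparse = sparse_categorical_crossentropy(1, probs)

print(round(loss_dense, 4), round(loss_sparse, 4))  # 0.3567 0.3567
```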


In [None]:
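# convert integer labels to one-hot encoded labels
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)
# add the channel axis that Conv2D expects: (N, 28, 28) -> (N, 28, 28, 1)
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)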

In [None]:
# define structure of neural network with convolutional and dense layers
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32,(3,3), activation=tf.nn.relu, input_shape=(28,28,1)),
  tf.keras.layers.MaxPooling2D((2,2)),
  tf.keras.layers.Conv2D(64,(3,3), activation=tf.nn.relu),
  tf.keras.layers.MaxPooling2D((2,2)),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(64, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
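
The feature-map sizes in this network can be traced by hand. The sketch below is an illustration only; the helper names `conv_out` and `pool_out` are hypothetical. It assumes the Keras defaults used in the model definition: convolutions with stride 1 and no padding, and non-overlapping 2x2 max pooling.

```python
def conv_out(size, kernel):
    """Output width/height of a convolution with stride 1 and no padding."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output width/height of non-overlapping max pooling (floor division)."""
    return size // pool

size, channels = 28, 1
shapes = []
for filters in (32, 64):
    size = conv_out(size, 3)   # 3x3 convolution
    channels = filters
    shapes.append((size, size, channels))
    size = pool_out(size, 2)   # 2x2 max pooling
    shapes.append((size, size, channels))

flat = size * size * channels  # length of the vector after the flatten layer

print(shapes)  # [(26, 26, 32), (13, 13, 32), (11, 11, 64), (5, 5, 64)]
print(flat)    # 1600
```

The dense layer with 64 neurons therefore receives a 1600-element vector, which is considerably smaller than the 784 to 512 connection of the first model in terms of weights per unit of accuracy gained.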

In [None]:
# define training parameters
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
# train the model, then plot loss and accuracy
history = model.fit(x_train,
                    y_train,
                    epochs=5,
                    batch_size=32,
                    shuffle=True,
                    validation_split=0.15)

plot(history)

In [None]:
# evaluate on the test set; evaluate() returns [loss, accuracy]
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)