# Fashion MNIST Data Science Challenge: Neural Networks and Deep Learning


## **Intuition**

### Our goal
Classify grayscale images of clothing items into one of ten categories.

![alt text](https://miro.medium.com/max/608/1*3QpK4Vhw0BpbjwijIYFkfQ.png)

### Labels

Each training and test example is assigned to one of the following labels:

* 0: T-shirt/top
* 1: Trouser
* 2: Pullover
* 3: Dress
* 4: Coat
* 5: Sandal
* 6: Shirt
* 7: Sneaker
* 8: Bag
* 9: Ankle boot

---

Each row is a separate image:

* Column 1 is the class label.
* The remaining columns are the pixel values (784 in total, i.e. 28 x 28).
* Each value is the darkness of the pixel (0 to 255).
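
As a minimal sketch of this layout, the snippet below splits one row into its label and a 28 x 28 image. The row is synthetic (hypothetical values generated with NumPy), not taken from `train.csv`.

```python
import numpy as np

# Hypothetical row: a label followed by 784 pixel values (synthetic, not real data)
rng = np.random.default_rng(0)
row = np.concatenate(([9], rng.integers(0, 256, size=784)))

label = int(row[0])              # column 1: class label
image = row[1:].reshape(28, 28)  # remaining 784 columns: 28 x 28 pixels

print(label, image.shape)
```

The same slicing (`[:, 0]` for labels, `[:, 1:]` for pixels) applies to a whole table of rows at once.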

### Architecture

![](http://neuralnetworksanddeeplearning.com/images/tikz12.png)

### Artificial Neuron
Inputs are weighted, summed up, a bias is added and the result is transformed by a nonlinear activation function. 
![alt text](https://www.researchgate.net/publication/320270458/figure/fig1/AS:551197154254848@1508427050805/Mathematical-model-of-artificial-neuron.png)

### Activation function
Introduces non-linearity into the model and often bounds the output. For example, the sigmoid squashes the weighted input into $(0, 1)$:

**Sigmoid function**:
\begin{eqnarray} 
  \sigma(z) \equiv \frac{1}{1+e^{-z}}
  \end{eqnarray}
![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Logistic-curve.svg/1200px-Logistic-curve.svg.png)
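
A single neuron with a sigmoid activation can be sketched in a few lines of NumPy. The inputs, weights and bias below are hypothetical values chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs, weights and bias for a single neuron
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1

z = np.dot(w, x) + b   # weighted sum plus bias
a = sigmoid(z)         # activation, strictly between 0 and 1

print(z, a)
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # approaches 0 and 1 at the extremes
```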

### Forward pass
Outputs are calculated in a **feed-forward manner**: activated weighted sums are propagated through the network to the output layer.

![alt text](https://glassboxmedicine.files.wordpress.com/2019/01/slide2.jpg?w=616)
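
The forward pass can be sketched as a loop over the layers. The layer sizes (784, 15, 10) and the random weights are assumptions for illustration; a trained network would use learned parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate input x through the layers and return the output activation."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # activated weighted sum of the previous layer
    return a

# Hypothetical network: 784 inputs -> 15 hidden neurons -> 10 outputs
rng = np.random.default_rng(0)
sizes = [784, 15, 10]
weights = [rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n_out) for n_out in sizes[1:]]

x = rng.random(784)                   # one flattened, scaled image (synthetic)
output = forward(x, weights, biases)  # one activation per class
print(output.shape)
```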

### Backward pass
Errors are calculated from the outputs and **backpropagated**: the network improves its predictions (it learns) by updating the weights and biases in each backward pass from the last to the first hidden layer.

#### Loss function
Also called the cost function. It compares the actual outputs $a$ of the network with the desired outputs $y(x)$ over all $n$ training inputs $x$ and therefore measures the predictive performance of the network.
$\begin{eqnarray} C(w,b) \equiv
  \frac{1}{2n} \sum_x \| y(x) - a\|^2.
\end{eqnarray}$
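
As a small worked example, the quadratic cost can be computed directly with NumPy. The two samples below are hypothetical values, not network outputs from this notebook.

```python
import numpy as np

def quadratic_cost(outputs, targets):
    """C = 1/(2n) * sum over samples of ||y(x) - a||^2."""
    n = outputs.shape[0]
    return np.sum((targets - outputs) ** 2) / (2 * n)

# Two hypothetical samples with three output neurons each
a = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1]])   # actual network outputs
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])   # desired one-hot outputs

print(quadratic_cost(a, y))
```

The cost is zero only when every output matches its target exactly, and it grows with the squared distance between them.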

#### Optimization
Improve the network's performance by minimizing the cost w.r.t. the network's parameters $w$ and $b$:
$\begin{equation}
\begin{aligned} \min_{w,b} C(w,b) 
\end{aligned}
\end{equation}$ <br>
Compute the partial derivatives $\partial C / \partial w$ and $\partial C / \partial b$ to identify the slope of the cost function w.r.t. $w$ and $b$, and use **gradient descent** to step downwards by updating the parameters:

![alt text](http://neuralnetworksanddeeplearning.com/images/valley_with_ball.png)
\begin{eqnarray}
  w_k & \rightarrow & w_k' = w_k-\eta \frac{\partial C}{\partial w_k} \\
  b_l & \rightarrow & b_l' = b_l-\eta \frac{\partial C}{\partial b_l} \\
\end{eqnarray}
\begin{align}
\eta = \text{learning rate or step size}
\end{align}


![alt text](https://datascience-enthusiast.com/figures/kiank_sgd.png)
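
The update rule can be tried out on a toy problem. The bowl-shaped cost below is a hypothetical stand-in for the network's cost; only the update line mirrors the rule above.

```python
import numpy as np

def cost(w):
    """Hypothetical bowl-shaped cost with its minimum at w = (3, -1)."""
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def grad(w):
    """Analytic gradient of the cost above."""
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)])

eta = 0.1                  # learning rate
w = np.array([0.0, 0.0])   # starting point
for step in range(100):
    w = w - eta * grad(w)  # w -> w' = w - eta * dC/dw

print(w, cost(w))
```

A learning rate that is too large makes the steps overshoot the minimum, while one that is too small makes progress very slow.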



#### Backpropagation
Gradient descent requires the computation of partial derivatives $\partial C / \partial w$ and $\partial C / \partial b$ of the cost function C with respect to any weight w or bias b in the network. By means of the **backpropagation algorithm** those derivatives are computed efficiently.
[Read more](http://neuralnetworksanddeeplearning.com/chap2.html)
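
A minimal sketch of backpropagation for a network with one hidden layer, sigmoid activations and the quadratic cost of a single sample is shown below. The layer sizes and random parameters are assumptions for illustration; the final lines compare one derivative against a finite-difference estimate as a sanity check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, b1, W2, b2, x, y):
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    return 0.5 * np.sum((a2 - y) ** 2)

def backprop(W1, b1, W2, b2, x, y):
    """Return dC/dW1, dC/db1, dC/dW2, dC/db2 for one training sample."""
    # Forward pass, keeping the activations of each layer
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    # Backward pass: error of the output layer, then of the hidden layer
    delta2 = (a2 - y) * a2 * (1 - a2)
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)
    return np.outer(delta1, x), delta1, np.outer(delta2, a1), delta2

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)
x, y = rng.random(3), np.array([1.0, 0.0])

dW1, db1, dW2, db2 = backprop(W1, b1, W2, b2, x, y)

# Numerical check of one weight derivative via central finite differences
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (cost(W1_plus, b1, W2, b2, x, y)
           - cost(W1_minus, b1, W2, b2, x, y)) / (2 * eps)
print(dW1[0, 0], numeric)
```

One backward pass yields all derivatives at once, whereas the finite-difference estimate needs two forward passes per parameter.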

## **Fashion MNIST Classification**
Let's build a baseline model.

In [1]:
#basic packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import datetime
import tensorflow as tf
# import Keras via TensorFlow: a standalone keras install that does not match
# the installed TensorFlow version can fail on import with an AttributeError
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

### Data Preparation

In [None]:
#1. Get the file
data_train = pd.read_csv('train.csv')
data_validate = pd.read_csv('test.csv')

In [None]:
#2.Explore train data
data_train.head(10)

In [None]:
data_validate

In [None]:
#compare the shapes of the train and test data
print(data_train.shape)
data_validate.shape

In [None]:
data_train = np.array(data_train, dtype = 'float32') # convert to an array so that Keras accepts the input data
data_validate = np.array(data_validate, dtype='float32') 

In [None]:
plt.figure()
plt.imshow(data_train[0,1:].reshape((28,28)))
plt.colorbar()
plt.grid(False)
plt.show()

In [None]:
x_train = data_train[:,1:]/255 #pixel data from 0-1 TODO -0.5?

y_train = data_train[:,0] #label data

data_submission = data_validate/255  # TODO -0.5?

In [None]:
plt.figure()
plt.imshow(x_train[0].reshape((28,28)))
plt.colorbar()
plt.grid(False)
plt.show()

The raw pixel values lie between 0 and 255. For training the network, they need to lie between 0 and 1, so the grayscale values are scaled from [0, 255] to [0, 1].

In [None]:
x_train.shape

In [None]:
y_train.shape

In [None]:
g = sns.countplot(x=y_train)  # pass the labels by keyword; recent seaborn versions expect this

In [None]:
data_submission.shape

In [None]:
#reshape the array containing the images (28px x 28px and 1 channel)
image_rows = 28
image_cols = 28
image_shape = (image_rows,image_cols,1)# 1 channel for grayscale, color images would have 3 (r,g,b)

x_train = x_train.reshape(x_train.shape[0],*image_shape)
data_submission = data_submission.reshape(data_submission.shape[0],*image_shape)

In [None]:
y_train.shape

Hint: Use ImageDataGenerator: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator

In [None]:
x_train.shape

In [None]:
#split train data in train and validation set
x_train2,x_validate2,y_train2,y_validate2 = train_test_split(x_train,y_train,test_size = 0.2,random_state = 12345)

In [None]:
print(x_train2.shape)
print(x_validate2.shape)
print(y_train2.shape)
print(y_validate2.shape)

*Hint: increase the size of the training set with data augmentation*
> https://www.tensorflow.org/tutorials/images/data_augmentation
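
One possible form of augmentation, sketched here with plain NumPy rather than the TensorFlow utilities linked above, is to append randomly flipped and shifted copies of the images. The `augment` helper, the shift range and the synthetic arrays are assumptions for illustration; in the notebook the inputs would be `x_train2` and `y_train2`.

```python
import numpy as np

def augment(images, rng):
    """Return randomly flipped and shifted copies of a batch of shape (n, 28, 28, 1)."""
    out = images.copy()
    for i in range(out.shape[0]):
        if rng.random() < 0.5:
            out[i] = out[i][:, ::-1, :].copy()   # horizontal flip
        shift = rng.integers(-2, 3)              # shift by -2..2 pixels
        out[i] = np.roll(out[i], shift, axis=1)  # wrap-around shift along the width
    return out

rng = np.random.default_rng(0)
x = rng.random((8, 28, 28, 1)).astype("float32")  # synthetic stand-in for the images
y = rng.integers(0, 10, size=8)                    # synthetic labels

x_aug = np.concatenate([x, augment(x, rng)])
y_aug = np.concatenate([y, y])   # labels are unchanged by these transformations
print(x_aug.shape, y_aug.shape)
```

Only the training split should be augmented, so that the validation score still reflects unmodified images.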

### Modeling

#### Layers
* `Dense(dimensionality of output , activation function)`: regular fully connected NN layer
![alt text](http://neuralnetworksanddeeplearning.com/images/tikz41.png)
* `Conv2D(dimensionality of output, kernel size,... , activation function)`: 2D convolution layer for spatial convolution over images
![alt text](http://neuralnetworksanddeeplearning.com/images/tikz49.png)
![alt text](https://anhreynolds.com/img/cnn.png)
![alt text](https://i.ytimg.com/vi/rrOgPiqYu6s/hqdefault.jpg)
* `MaxPool2D(pool_size)`: Max pooling operation for spatial data.
![alt text](https://computersciencewiki.org/images/8/8a/MaxpoolSample2.png)
* `Flatten()`: Flattens the input. Does not affect the batch size.

![alt text](https://www.w3resource.com/w3r_images/numpy-manipulation-ndarray-flatten-function-image-1.png)

* `Dropout(rate, ..., seed)`: Randomly sets a fraction `rate` of the input units to 0 at each update during training, which helps prevent overfitting.

![alt text](http://neuralnetworksanddeeplearning.com/images/tikz31.png)
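
To make the pooling, flattening and dropout operations concrete, here is a NumPy sketch of what they do to a small array. It illustrates the operations only and is not how Keras implements the layers; the input values are made up.

```python
import numpy as np

def max_pool_2x2(image):
    """Max pooling with a 2x2 window and stride 2 on a (height, width) array."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.array([[1, 3, 2, 1],
                  [4, 2, 0, 1],
                  [5, 1, 7, 8],
                  [0, 6, 3, 2]])

pooled = max_pool_2x2(image)   # keeps the maximum of each 2x2 block
flat = pooled.flatten()        # 2D feature map -> 1D vector
print(pooled)                  # [[4 2] [6 8]]
print(flat)                    # [4 2 6 8]

# Dropout during training: zero out a random fraction `rate` of the units
rng = np.random.default_rng(0)
rate = 0.5
activations = np.ones(10)
mask = rng.random(10) >= rate
dropped = activations * mask / (1 - rate)   # surviving units are scaled up by 1 / (1 - rate)
print(dropped)
```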

Import [Keras](https://www.tensorflow.org/api_docs/python/tf/keras), a high-level API for TensorFlow

In [None]:
# Display the devices TensorFlow can see to check for CUDA (a GPU entry is listed if one is usable)
import tensorflow as tf
print(tf.config.list_physical_devices())

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D,MaxPooling2D,Dense,Flatten,Dropout,BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import TensorBoard  # for visualization


model = tf.keras.Sequential([
        Conv2D(kernel_size=3,filters=10,activation='relu',input_shape=(28,28,1)),
        Flatten(),
        #....,
        #....,
        #....,
        #....,
        Dense(64,activation = 'relu'),
        Dense(10,activation = 'softmax')  # 10 neurons for output, softmax best according to article TODO article here
    ])

# AlexNet
benchmark = tf.keras.Sequential([
        Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='relu', padding="same", input_shape=(28,28,1)),
        BatchNormalization(),
        MaxPooling2D(pool_size=(3,3), strides=(2,2)),
        Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
        BatchNormalization(),
        Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
        BatchNormalization(),
        Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
        BatchNormalization(),
        MaxPooling2D(pool_size=(3,3), strides=(2,2)),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(1000, activation='relu'),
        Dropout(0.5),
        Dense(10, activation='softmax')  # 10 neurons for output, softmax best according to article TODO article here
    ])
model = benchmark

In [None]:
model.summary()

*Hint: change the type, number and order of layers*
> https://www.tensorflow.org/api_docs/python/tf/keras/layers

*Hint: change the activation function*
> https://www.tensorflow.org/api_docs/python/tf/keras/activations

*Hint: prevent overfitting and speedup training by adding regularization*
> https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#strategies_to_prevent_overfitting

Choose an optimizer and a loss function for training, and a metric to evaluate performance.

#### Compiling the model
Before the model is ready for training, it needs a few more settings.
These are added during the model's compile step:

Loss function - Measures how accurate the model is during training. You want to minimize this function to "steer" the model in the right direction.

Optimizer - Determines how the model is updated based on the data it sees and its loss function.

Metrics - Used to monitor the training and testing steps. The following example uses accuracy, the fraction of the images that are correctly classified.

In [None]:
model.compile(optimizer='adam',
              # the last layer already applies softmax, so the loss receives probabilities, not logits
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])
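
As a rough sketch of what a sparse categorical cross-entropy computes, the NumPy snippet below evaluates it for two samples. The helper functions and scores are made up for illustration and are not the Keras implementation. The loss is applied to probabilities here, i.e. after the softmax; in Keras, the `from_logits` argument tells the loss whether it receives raw scores (`True`) or probabilities (`False`).

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true integer labels."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])   # hypothetical raw scores for two samples
labels = np.array([0, 2])              # integer class labels, not one-hot

probs = softmax(logits)                # each row sums to 1
loss = sparse_categorical_crossentropy(probs, labels)
print(probs.sum(axis=1))
print(loss)
```

"Sparse" refers to the label format: the labels stay integers, so no one-hot encoding of `y_train` is needed.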

*Hint: change the loss function*
> https://www.tensorflow.org/api_docs/python/tf/keras/losses

*Hint: change the optimization method and its parameters*
> https://www.tensorflow.org/api_docs/python/tf/keras/optimizers

### Hyperparameter Tuning
To [optimize hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_optimization) such as the learning rate of SGD systematically rather than by trial and error, hold out a subset of your training data as a validation set and perform a grid search.
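
The idea can be sketched on a toy problem: train once per candidate value, score each run on the held-out data and keep the best. The linear model, the `train` and `mse` helpers and the candidate grid below are assumptions for illustration; in the notebook, `train` would build and fit the Keras model with the given learning rate and the score would come from the validation set.

```python
import numpy as np

def train(x, y, eta, epochs=50):
    """Fit y ~ w * x + b with plain gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        error = w * x + b - y
        w -= eta * np.mean(error * x)
        b -= eta * np.mean(error)
    return w, b

def mse(x, y, w, b):
    return np.mean((w * x + b - y) ** 2)

# Synthetic data (hypothetical): y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(200)

# Hold out the last 20% as validation set
x_tr, x_val, y_tr, y_val = x[:160], x[160:], y[:160], y[160:]

grid = [0.001, 0.01, 0.1, 1.0]   # candidate learning rates
scores = {eta: mse(x_val, y_val, *train(x_tr, y_tr, eta)) for eta in grid}
best_eta = min(scores, key=scores.get)
print(scores)
print("best learning rate:", best_eta)
```

With several hyperparameters, the grid becomes the Cartesian product of the candidate lists, so the number of training runs grows quickly.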

### Training

In [None]:
history = model.fit(
    x_train2,
    y_train2,
    epochs=5,
    verbose=1,
    validation_data=(x_validate2,y_validate2)
    )

In [None]:
score = model.evaluate(x_validate2,y_validate2,verbose=0)
print('Validation Loss : {:.4f}'.format(score[0]))
print('Validation Accuracy : {:.4f}'.format(score[1]))

In [None]:
plt.figure(figsize=(20, 10))

plt.plot(history.history['loss'], label='Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.title('Training - Loss Function')

### Submission
Submit your final notebook as **fashion_mnist_teamX.ipynb** and your predictions of the test data as a **predictions_teamX.csv**.

In [None]:
# predict results
results = model.predict(data_submission)

# select the index with the maximum probability
results = np.argmax(results,axis = 1)

results = pd.Series(results,name="Label")
results

In [None]:
data_results = pd.DataFrame(results)
data_results.head(10)

In [None]:
data_results.to_csv('fashion_mnist_pred_teamX.csv', index=False)  # Please replace X with your team number!

## Resources
Background:
  * Book: [Neural Networks and Deep Learning, Michael Nielsen](http://neuralnetworksanddeeplearning.com) 
  * Lecture: [CS231n, Stanford University](http://cs231n.stanford.edu/)

Implementation:
  * [TensorFlow tutorials](https://www.tensorflow.org/tutorials)
  * [Keras Docs](https://www.tensorflow.org/api_docs/python/tf/keras)

## Image Sources
* http://neuralnetworksanddeeplearning.com/images/
* https://www.researchgate.net/publication/320270458/figure/fig1/AS:551197154254848@1508427050805/Mathematical-model-of-artificial-neuron.png
* https://www.w3resource.com/w3r_images/numpy-manipulation-ndarray-flatten-function-image-1.png
* https://computersciencewiki.org/images/8/8a/MaxpoolSample2.png
* https://glassboxmedicine.files.wordpress.com/2019/01/slide2.jpg?w=616
* http://neuralnetworksanddeeplearning.com/images/valley_with_ball.png
* https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Logistic-curve.svg/1200px-Logistic-curve.svg.png