# Zalando - Exercise

Fashion MNIST is intended as a drop-in replacement for the classic MNIST dataset, which is often used as the "Hello, World" of deep learning programs for computer vision. As you know from the previous notebook, the MNIST dataset contains images of handwritten digits (0, 1, 2, etc.).

We now use Fashion MNIST for variety, and because it is a slightly more challenging problem than regular MNIST.

<img src="./resources/fashion-mnist-sprite.png"  style="height: 500px"/>

Now you will build a neural network to do the Fashion MNIST classification yourself. This will be very similar to the MNIST classification, but at the end we will fine-tune some parameters of our network. That's when it becomes really interesting.

As you can see, the images we are using are still very small. In the next lesson we will be using a CNN, a network specially designed for image classification. We will use more realistic images as well.

## 1. Import packages and classes

You can access the Fashion MNIST directly from TensorFlow. So first import all the packages needed for TensorFlow and Keras.

In [None]:
# TensorFlow and tf.keras


# helper libraries



## 2. Import the Fashion MNIST dataset

Import and load the Fashion MNIST data (`fashion_mnist`) directly from TensorFlow.

## 3. Explore the data

Explore the format of the dataset before training the model. How many images are there in the training set? What size are the images?

How many training labels are there in the dataset?

Can you print some training labels?

As you can see, `train_labels` is an array of integers ranging from 0 to 9. These correspond to the class of clothing the image represents:

<table>
    <tr><th>Label</th><th>Class</th></tr>
    <tr><td>0</td><td>T-shirt/top</td></tr>
    <tr><td>1</td><td>Trouser</td></tr>
    <tr><td>2</td><td>Pullover</td></tr>
    <tr><td>3</td><td>Dress</td></tr>
    <tr><td>4</td><td>Coat</td></tr>
    <tr><td>5</td><td>Sandal</td></tr>
    <tr><td>6</td><td>Shirt</td></tr>
    <tr><td>7</td><td>Sneaker</td></tr>
    <tr><td>8</td><td>Bag</td></tr>
    <tr><td>9</td><td>Ankle boot</td></tr>
</table>

Each image is mapped to a single label. Since the class names are not included in the dataset, we store them here to use later when plotting the images:

In [None]:
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
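
Because the list is ordered by label, a label can be used directly as an index into `class_names`. The snippet below is a small sketch of that lookup; `example_labels` is a made-up list standing in for a few entries of the label array, not real data from the dataset.

```python
# Class names indexed by label, as defined in the cell above.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Hypothetical labels (assumption): stand-ins for a few entries of the label array.
example_labels = [9, 0, 3]

# Look up the class name that belongs to each integer label.
example_names = [class_names[label] for label in example_labels]
print(example_names)  # ['Ankle boot', 'T-shirt/top', 'Dress']
```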

## 4. Preprocess the data

The data must be preprocessed before training the network. If you inspect a random image in the training set, you will see that the pixel values fall in the range of 0 to 255. For example, print the fifth image.

We scale these values to a range of 0 to 1 before feeding to the neural network model. Normalized pixel values make our network easier to train. For this, we divide the values by 255. It's important that the training set and the testing set are preprocessed in the same way. 
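
The scaling step can be tried out on synthetic data first. The sketch below uses a random NumPy array, `fake_images` (a hypothetical stand-in, not the real dataset), to show that dividing by 255 turns 8-bit integers into floating-point values between 0 and 1.

```python
import numpy as np

# Synthetic stand-in (assumption) for two 28x28 grayscale images with values 0..255.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

# Dividing by 255.0 converts to floating point and maps the values into [0, 1].
scaled = fake_images / 255.0

print(fake_images.dtype, fake_images.min(), fake_images.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

The same division would be applied to both the training images and the test images.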

First print the fifth image again to check that the values now lie between 0 and 1.

Now display the first 25 images from the training set and __display the class name below each image. So for example we want *Dress* below the image and not *3*.__ Verify that the data is in the correct format and we're ready to build and train the network.

## 5. Build the model

The basic building block of a neural network is the layer. Layers extract representations from the data fed into them. You may build a model that is identical to the model used in the previous notebook.

## 6. Compile the model

Before the model is ready for training, it needs a few more settings. Compile the model with the same settings as before.

## 7. Train the model

Now train the model. Use 5 epochs for a start.

As the model trains, the loss and accuracy metrics are displayed. Can you see if you get the same accuracy as for the MNIST dataset?

## 8. Evaluate accuracy

The real challenge will be seeing how our model performs on our test data.

In [None]:
# accuracy on the train data? mnist:    - fashion_mnist:
# accuracy on the test data? mnist:    - fashion_mnist:

# What can you conclude?

## 9. Make predictions

With the model trained, we can use it to make predictions about the test images.

Here, the model has predicted the label for each image in the testing set. Take a look at the first prediction.

A prediction is an array of 10 numbers. Can you see which type of clothing has the highest confidence value? __Now print the class name with the highest probability!__
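
To illustrate how a vector of 10 scores maps to a class name, here is a minimal sketch with a made-up prediction vector. `example_prediction` is invented for illustration and is not real model output; `np.argmax` returns the index of the largest entry.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Made-up prediction (assumption): ten probabilities that sum to 1.
example_prediction = np.array(
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.05, 0.01, 0.08, 0.01, 0.80])

# Index of the highest probability, then the matching class name.
best_index = int(np.argmax(example_prediction))
print(best_index, class_names[best_index])  # 9 Ankle boot
```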

So the model is most confident that the first image is an ankle boot. Can you print the image to see if this is correct?

Hooray! Once again well done!

## 10. Right or Wrong

Draw the first 25 __test images__ (5 images in a row). Below each image, print the actual value (a) and the predicted value (p). If they are the same, the text color is green; otherwise it is red. Tune the subplot layout and create a little bit of space between the subplots. You should get something like this:

<img src="./resources/clothes.png"/>
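
One way to keep the plotting loop tidy is to compute the caption and its color in a small helper. The function below is a hypothetical sketch: the name `caption_for` and the exact caption format are assumptions, not taken from the reference image.

```python
def caption_for(actual, predicted):
    """Return (text, color) for one subplot caption.

    Hypothetical helper: green when the prediction matches, red otherwise.
    """
    text = f"a: {actual}, p: {predicted}"
    color = 'green' if actual == predicted else 'red'
    return text, color


print(caption_for(9, 9))  # ('a: 9, p: 9', 'green')
print(caption_for(2, 6))  # ('a: 2, p: 6', 'red')
```

With matplotlib, the result could then be passed on as `plt.xlabel(text, color=color)`, and `plt.subplots_adjust` with `hspace` and `wspace` is one option for adding space between subplots.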

In [None]:
# Right or Wrong


## 11. Improve the network

What we’ve covered so far was but a brief introduction - there’s much more we can do to experiment with and improve this network.

### 11.1 Tuning Hyperparameters

Let's start with the batch size and number of epochs. First some Neural Network terminology:

- one epoch = one forward pass and one backward pass over all the training examples
- batch size = the number of training examples used in one forward/backward pass. The larger the batch size, the more memory you need. A larger batch also means fewer weight updates per epoch, so an epoch usually finishes faster, although the model may need more epochs to reach the same accuracy.
- number of iterations = the number of passes, each pass using the batch size number of examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).

Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
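
The arithmetic in this example can be checked with a few lines of Python. The helper name `iterations_per_epoch` is made up for illustration; the ceiling accounts for a last batch that may be smaller than the others. The 60,000 figure is the size of the Fashion MNIST training set, and 32 is the default batch size of `model.fit`.

```python
import math


def iterations_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every training example once."""
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(num_examples / batch_size)


print(iterations_per_epoch(1000, 500))    # 2
print(iterations_per_epoch(60000, 32))    # 1875
print(iterations_per_epoch(60000, 100))   # 600
```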

First build and compile the model again and train it with the original code (use __10 epochs__ this time). See how long it takes to train the model (for reference: roughly 2 s/epoch, depending on your hardware).

Afterwards, build and compile the model again and train it with the code below. Can you see the difference? How long does it take for one epoch? What about the accuracy?

In [None]:
model.fit(
  train_images,
  train_labels,
  epochs=10,
  batch_size=100,
)

### 11.2 Network Depth

What happens if we remove or add more fully-connected layers? How does that affect training and/or the model’s final performance?

Build, compile and train with the code below and check the accuracy on the training and test data. Are we doing better?

In [None]:
model = keras.Sequential([
  keras.layers.Flatten(input_shape=(28, 28)),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dense(10, activation='softmax'),
])
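
Adding layers also adds trainable parameters. As a rough sketch, the helper functions below (hypothetical names, plain Python) count the weights and biases of the fully-connected layers for the model above and, for comparison, for an assumed variant with a single hidden layer of 128 units. The `Flatten` layer has no parameters; it only turns a 28x28 image into 784 inputs.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully-connected layer."""
    return n_inputs * n_units + n_units


def count_params(layer_sizes):
    """Total parameters for a chain of Dense layers, e.g. [784, 128, 10]."""
    return sum(dense_params(a, b)
               for a, b in zip(layer_sizes, layer_sizes[1:]))


# Assumed comparison model: one hidden layer of 128 units.
shallow = count_params([784, 128, 10])
# The model above: three hidden layers of 128 units.
deep = count_params([784, 128, 128, 128, 10])

print(shallow)  # 101770
print(deep)     # 134794
```

Calling `model.summary()` on the Keras model should report the same total.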

### 11.3 Activations

What if we use an activation other than ReLU, e.g. sigmoid?

In [None]:
model = keras.Sequential([
  keras.layers.Flatten(input_shape=(28, 28)),
  keras.layers.Dense(128, activation='sigmoid'),
  keras.layers.Dense(128, activation='sigmoid'),
  keras.layers.Dense(128, activation='sigmoid'),
  keras.layers.Dense(10, activation='softmax'),
])
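
To get a feeling for the difference between the two activations, it can help to apply them to a few numbers. This is a minimal NumPy sketch with hand-written `relu` and `sigmoid` functions for illustration; it is not how Keras implements them internally.

```python
import numpy as np


def relu(x):
    # Negative inputs become 0, positive inputs pass through unchanged.
    return np.maximum(0.0, x)


def sigmoid(x):
    # Every input is squashed into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(relu(x))
print(sigmoid(x))
```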

### 11.4 Dropout

What if we tried adding Dropout layers, which are known to help reduce overfitting? During training, a Dropout layer randomly sets a fraction of its input units to zero (0.25 means that about 25% of the units are dropped at each training step); at prediction time nothing is dropped. Can you see that the accuracy on the training data and on the test data are almost the same?

In [None]:
model = keras.Sequential([
  keras.layers.Flatten(input_shape=(28, 28)),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dropout(0.25),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dropout(0.25),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dropout(0.25),
  keras.layers.Dense(10, activation='softmax'),
])
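
What a Dropout layer does to its inputs during training can be sketched in NumPy. The function below is an illustrative re-implementation under simple assumptions, not the Keras source: it zeroes each unit with probability `rate` and scales the remaining units by `1 / (1 - rate)` so that the expected sum stays the same, which is also what the Keras `Dropout` layer does during training.

```python
import numpy as np


def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the units and rescale the rest."""
    # Keep each unit with probability 1 - rate.
    keep_mask = rng.random(activations.shape) >= rate
    # Rescale the kept units so the expected total is unchanged.
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(42)
activations = np.ones(10000)          # synthetic layer output (assumption)
dropped = dropout(activations, rate=0.25, rng=rng)

print((dropped == 0).mean())   # fraction of dropped units, close to 0.25
print(dropped.mean())          # close to 1.0 thanks to the rescaling
```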

### 11.5 Validation

We can also use the testing dataset for validation during training. Keras will evaluate the model on the validation set at the end of each epoch and report the loss and any metrics we asked for. This allows us to monitor our model’s progress over time during training, which can be useful to identify overfitting and even support early stopping.

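In the training output, the accuracy on the training set is reported as `accuracy` (`acc` in older Keras versions) and the accuracy on the test set as `val_accuracy` (`val_acc` in older versions).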

In [None]:
model = keras.Sequential([
  keras.layers.Flatten(input_shape=(28, 28)),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=10,
  batch_size=64,
  validation_data=(test_images, test_labels)
)
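
The early stopping mentioned above boils down to a simple rule: stop when the validation loss has not improved for a given number of epochs. The function below is a plain-Python sketch of that rule with made-up loss values; the name `early_stopping_epoch` and the numbers are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses: improving at first, then getting worse.
made_up_val_losses = [0.50, 0.42, 0.38, 0.39, 0.41, 0.45]
print(early_stopping_epoch(made_up_val_losses))  # 5
```

In Keras, the `keras.callbacks.EarlyStopping` callback (with its `monitor` and `patience` arguments) can be passed to `model.fit` via `callbacks=[...]` for this purpose.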