# Image classification using Neural Networks

After trying different parameters in the TensorFlow Playground, it's now our turn to implement a neural network ourselves. As you may remember, we already worked with the Fashion-MNIST dataset using unsupervised methods. Today we'll use Keras to build a model for supervised classification. Keras is a high-level framework for machine learning that uses TensorFlow as its backend. It lets us implement neural networks in a very comfortable way. For more information about Keras, go to <https://keras.io/>.


## Data

We'll use the same data as for clustering. However, for this exercise we need training and test samples, so that we can measure how well our model performs. Test data lets us check that the model has not merely memorized the training samples but can also classify unseen data. Therefore, we don't provide the model with labels in the test phase; we only use them afterwards to score its predictions.

In [None]:
from keras.datasets import fashion_mnist

# We are already familiar with the load_data function; it returns train and test data as tuples.
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# y_train is an array of length 60000; each entry is a label
print("y_train.shape = {}".format(y_train.shape))
# x_train is an array of shape 60000x28x28; each of the 60k entries is a 2-dimensional 28x28 array
# containing one image as a matrix of grayscale pixels.
print("x_train.shape = {}".format(x_train.shape))
print("Example of an element in y_i: {}".format(y_train[0]))
print("Example of an element in x_i: {}".format(x_train[0]))

## Preparing Data

As before, we flatten each 2D image into a one-dimensional vector of single values.
But we also have to pre-process the labels. The learning process does not expect class indices (0, 1, ..., 9) but the very popular one-hot vectors. In such a vector each class has its own dimension, and only the dimension of the actual label has the value one:

0 &rarr; [1,0,0,0,0,0,0,0,0,0]

1 &rarr; [0,1,0,0,0,0,0,0,0,0]

...

9 &rarr; [0,0,0,0,0,0,0,0,0,1]

Giving each class its own dimension provides a more meaningful error value, because the class indices carry no numeric order, and the scheme extends easily to any number of classes.
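To make the mapping above concrete, here is a minimal sketch in plain Python. The helper names `one_hot` and `from_one_hot` are made up for this illustration; in the exercise itself Keras does the conversion for us.

```python
def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def from_one_hot(vector):
    """Recover the class index: the position of the largest entry."""
    return max(range(len(vector)), key=lambda i: vector[i])


encoded = one_hot(9, 10)
print(encoded)                # only the last of the 10 entries is 1.0
print(from_one_hot(encoded))  # back to the class index
```

The reverse direction (`from_one_hot`) is the same idea we need later to turn the network's output vector back into a single label.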

It is also preferable to normalize the pixel values from the range 0 to 255 to the range 0 to 1.
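As a small illustration of flattening and normalizing, the sketch below processes a made-up 2x2 "image" in plain Python. The function name `flatten_and_scale` and the pixel values are assumptions for this example only; on the real data NumPy does the same thing for all 28x28 images at once.

```python
def flatten_and_scale(image, max_value=255):
    """Flatten a 2D list of pixels row by row and scale each value to [0, 1]."""
    return [pixel / max_value for row in image for pixel in row]


# Made-up 2x2 grayscale image, values between 0 and 255
tiny_image = [
    [0, 255],
    [51, 102],
]

vector = flatten_and_scale(tiny_image)
print(vector)  # [0.0, 1.0, 0.2, 0.4]
```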

***Hint:***
- `keras.utils` contains the function `to_categorical`, which transforms labels into one-hot vectors (older Keras versions also expose it as `keras.utils.np_utils.to_categorical`)


In [None]:
from keras.utils import to_categorical

# 10 categories in our data
num_class = 10

# Normalizing the gray values from [0, 255] to [0, 1]
x_train = x_train / 255.0
x_test  = x_test / 255.0

# One-hot vectors of length 10 (one entry per class)
y_train_hot = to_categorical(y_train, num_class)
y_test_hot  = to_categorical(y_test, num_class)

# Flatten each image into 784 separate dimensions
x_train = x_train.reshape(len(x_train), 28*28)
x_test = x_test.reshape(len(x_test), 28*28)

# Shapes
print("y_train_hot.shape = {}".format(y_train_hot.shape))
print("x_train.shape = {}".format(x_train.shape))

# Examples 
print("Example of an element in y_i: {}".format(y_train_hot[0]))
print("Example of an element in x_i: {}".format(x_train[0]))

## Define and train the model

Now we have to define the structure of the neural net. The elements in parentheses make up the model architecture and were chosen for the following reasons:

- Our network is a plain stack of layers, each feeding the next, with no recurrent connections (`keras.models.Sequential`).

- Each layer consists of two parts: a fully connected link to the previous layer (`keras.layers.Dense`) and an activation function (`keras.layers.Activation`).

- Each `Dense` layer is given its number of nodes, which automatically becomes the input size of the next layer.

- Because the first layer has no previous layer, we have to specify the number of input nodes ourselves. This number has to fit our data structure. In our case there are 784 separate values, which we define through the parameter `input_shape`.

- The last layer has as many outputs as there are classes in the dataset.

Each hidden layer should pass on a non-linear activation of its weighted inputs. This is done by the `relu` activation function, which keeps positive values and sets negative values to zero. The last layer should choose only one class. Therefore we use the `softmax` activation function, which turns the outputs into probabilities that sum to one.
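The two activation functions can be sketched in a few lines of plain Python. This is only an illustration of the formulas, not how Keras implements them internally; the input values are made up.

```python
import math


def relu(values):
    """Keep positive values, replace negative values with 0."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(values)
    # Subtracting the maximum does not change the result but avoids overflow
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


hidden = relu([-2.0, 0.5, 3.0])
probs = softmax([1.0, 2.0, 3.0])

print(hidden)                    # [0.0, 0.5, 3.0]
print(probs)                     # largest score gets the largest probability
print(probs.index(max(probs)))   # index of the chosen class
```

Because the softmax outputs sum to one, they can be compared directly with a one-hot label vector, which is what the `categorical_crossentropy` loss does.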

For the remaining parameters (loss, optimizer, metrics, ...) we either use the default values or have already chosen useful values for you.

As a further step we import the `TensorBoard` callback, so that we can visualize our results later.

In [None]:
from keras.callbacks import TensorBoard


In [None]:
from keras.models import Sequential
from keras.layers import Activation, Dense
import keras

model = Sequential()

model.add(Dense(128, input_shape=x_train[0].shape))
model.add(Activation('relu'))

model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dense(num_class)) # 10 outputs
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Callback that writes log files (including the model graph) for TensorBoard.
tensorboard = TensorBoard(log_dir='./logs', histogram_freq=0, write_graph=True, write_images=False)

# Summarizes the layers and shows the complexity of our model,
# in other words, how many weights influence the output.
# The more degrees of freedom a model has, the more labeled data it needs.
model.summary() 
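The parameter counts reported by `model.summary()` can be checked by hand. The sketch below assumes the layer sizes used above (784 inputs, then 128, 64 and 10 nodes); the helper `dense_params` is a made-up name. Each fully connected layer has one weight per input-output pair plus one bias per output node.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Input size followed by the sizes of the three Dense layers
layer_sizes = [784, 128, 64, 10]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [100480, 8256, 650]
print(total)      # 109386
```

The `Activation` layers add no parameters, so the total should match the number of trainable parameters in the summary.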

The training itself is nothing special. We use the method `model.fit` and pass the relevant arguments:
- the data (`x`)
- the labels as one-hot vectors (`y`)
- the number of iterations (`epochs`)
- the batch size (`batch_size`)

If you pass the test data as `validation_data`, the model quality is calculated on the test data after each epoch.

Epochs are iterations over all data points. The less data we have, the more epochs we need so that the weights are updated often enough. With more epochs the learning process receives the same data multiple times. There is a risk that the model memorizes the patterns and no longer generalizes.
The batch size defines the number of instances whose error the optimizer examines before it adapts the weights.

***Hint:*** The `fit` method returns the history, which allows us to visualize the training process. Furthermore, this method has a `callbacks` parameter. We pass it the `tensorboard` object, which gives us access to TensorBoard.
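How epochs and batch size interact can be worked out with a little arithmetic. The sketch below uses the values from this notebook (60000 training images, a batch size of 20, 2 epochs); the function name `weight_updates` is made up for this illustration.

```python
import math


def weight_updates(num_samples, batch_size, epochs):
    """Return (updates per epoch, updates in total).

    The weights are adapted once per batch; the last batch of an epoch
    may be smaller if the data does not divide evenly.
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


steps, total_updates = weight_updates(60000, 20, 2)
print(steps)          # 3000
print(total_updates)  # 6000
```

A larger batch size therefore means fewer, but less noisy, weight updates per epoch.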

In [None]:
epochs = 2
batch_size = 20

In [None]:
history = model.fit(
    x_train, 
    y_train_hot, 
    epochs=epochs, 
    batch_size=batch_size, 
    validation_data=(x_test, y_test_hot),
    callbacks=[tensorboard]
)

In [None]:
from matplotlib import pyplot as plt

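# Progress of accuracy
# Older Keras versions log this metric as 'acc', newer ones as 'accuracy'.
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key])
plt.plot(history.history['val_' + acc_key])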
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# Progress of loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

As we can see, the training accuracy increases over the epochs because the model adapts to the training data.
At some point during training, however, the accuracy on the test data stops improving and may even get worse.
So we have to take care not to overtrain the model (*overfitting*).
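One simple way to spot the point where overfitting starts is to look for the epoch with the lowest validation loss. The sketch below does this on made-up loss values; the numbers and the helper name `best_epoch` are assumptions for this illustration and are not results from our model.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# Made-up validation losses: they improve first and then get worse again
hypothetical_val_loss = [0.52, 0.41, 0.38, 0.40, 0.45]

print(best_epoch(hypothetical_val_loss))  # 3
```

With real training runs the same function could be applied to `history.history['val_loss']`. Keras also provides an `EarlyStopping` callback that stops training automatically once a monitored value no longer improves.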


### TensorBoard
Another visualization tool is *TensorBoard*. It includes several visualizations that help you understand, debug, and optimize your model: you can inspect the computation graph or plot quantitative metrics such as the accuracy or the loss.

To open TensorBoard, proceed as follows:
- Run this command in your Docker terminal to map port 6006 to your container: 

    `docker run -p 0.0.0.0:6006:6006 -it novatec/mlss bash`


- Start Tensorboard in your log directory: 

    `tensorboard --logdir=exercises/logs/`
  
  
- Open port 6006 in your browser (copy and paste the URL): 

(***replace <your_docker_ip> with your own Docker IP***)

    `http://<your_docker_ip>:6006`


In [None]:
!docker run -p 0.0.0.0:6006:6006 -it novatec/mlss bash

### Evaluation
The numbers reported during training were already promising. As in previous exercises, let's take a look at the confusion matrix to see which classes are classified correctly and which are confused with each other. To do so, we let our trained model make predictions for the test data.

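***Hint:***
- `model.predict` returns one probability per class; `numpy.argmax` along the class axis turns these into labels. Older Keras versions also offered `predict_classes`, which returned the labels directly, but it has been removed from recent releases.

In [None]:
import numpy as np
import sklearn.metrics as metrics

# predict returns class probabilities; argmax picks the most likely class per image
predictions = np.argmax(model.predict(x_test), axis=1)

cm = metrics.confusion_matrix(y_test, predictions)
accuracy = metrics.accuracy_score(y_test, predictions)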

# Output
print("ACC: {}".format(accuracy))
print("CM: {}".format(cm))

fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(cm, cmap=plt.cm.gray)
fig.colorbar(cax)
plt.show()
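To read more out of a confusion matrix than the overall accuracy, we can compute how often each class is recognized correctly (its recall). The sketch below uses a made-up 3x3 matrix in plain Python; the numbers and helper names are assumptions for this illustration. As in scikit-learn, rows stand for the true class and columns for the predicted class.

```python
def per_class_recall(cm):
    """Fraction of each true class (row) that was predicted correctly."""
    recalls = []
    for i, row in enumerate(cm):
        row_total = sum(row)
        recalls.append(row[i] / row_total if row_total else 0.0)
    return recalls


def accuracy_from_cm(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Made-up confusion matrix for three classes
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 3, 7],
]

print(per_class_recall(toy_cm))  # one value per class
print(accuracy_from_cm(toy_cm))  # overall accuracy
```

In this toy example the third class is the weakest: three of its ten samples are mistaken for the second class. The same reading applies to the real matrix `cm`, for instance via `cm.tolist()`.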


The results look awesome! Let's save the trained weights so that we can reuse the model at any time with little effort. After rebuilding the same architecture and loading the weights, new data can be preprocessed in the same way and classified using `predict`.

In [None]:
model.save_weights('MyFashionClassifier.h5') # Save only the weights (not the architecture) in HDF5 format