# Image classification using Neural Networks

After trying out different parameters in the TensorFlow Playground, it's now our turn to implement a neural network ourselves. As you may remember, we already worked with the Fashion-MNIST dataset using unsupervised methods. Today we will use Keras to build a model for supervised classification. Keras is a high-level framework for machine learning, which uses TensorFlow as its backend. It lets us implement neural networks very conveniently. For more information about Keras, see <https://keras.io/>


## Exploring the Data

We'll use the same data as for clustering. This time, however, we need separate training and test samples so that we can measure how well our model performs. The test data shows whether the model has merely memorized the training samples or can also classify unseen images. The model never sees the test labels during training; we only use them afterwards to check its predictions.

In [None]:
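# import the libraries
from keras.datasets import fashion_mnist
import numpy as np
import pandas as pd
from pandas import DataFrame

# display options for printing images as DataFrames
# ('display.height' was removed from pandas, and the short names 'max_rows'/'max_columns'
# can be ambiguous in newer versions, so we use the full option names)
pd.set_option('display.width', 800)
pd.set_option('display.max_rows', 20)
pd.set_option('display.max_columns', 20)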

In [None]:
# As we already know, load_data returns the training and test data as tuples.
(x_train_orig, y_train_orig), (x_test_orig, y_test_orig) = fashion_mnist.load_data()

print ("let's have a look at the data:")
print ("shape of x_train_orig: {}".format(x_train_orig.shape))
print ("shape of y_train_orig: {}".format(y_train_orig.shape))
print ("shape of x_test_orig: {}".format(x_test_orig.shape))
print ("shape of y_test_orig: {}".format(y_test_orig.shape))
print ("")

(m, w, h) = x_train_orig.shape

print ("x_train_orig is an array of shape {},".format((m, w, h)))
print ("there are {} images, each of them is a 2-dimensional matrix of shape {} (grayscale pixels).".format(m, (w,h)))
print ("")

print ("y_train_orig is a one-dimensional vector of length {}.".format(y_train_orig.shape[0]))
print ("Notice: the length of y_train_orig equals the first dimension of x_train_orig.")
print ("")

i = 42
print ("Now let's peek into the {}-th example of x_train_orig and y_train_orig.".format(i))
print ("x_train_orig[{}] is a 2D matrix; it looks like this:".format(i))
print (DataFrame (x_train_orig[i]))

print ("the label for x_train_orig[{}] is y_train_orig[{}] = {}".format(i, i, y_train_orig[i]))

## Preparing Data

As before, we flatten each 2D image into a one-dimensional vector of pixel values.
We also have to pre-process the labels: the learning process does not expect class indices (0, 1, ..., 9) but the very popular one-hot vectors. A one-hot vector is a group of bits in which exactly one bit is high (1) and all others are low (0). One-hot encoding represents a categorical variable by setting the dimension that corresponds to its class to 1:

|label|One-hot vector|
|-|-|
|0|[1,0,0,0,0,0,0,0,0,0] |
|1|[0,1,0,0,0,0,0,0,0,0] |
|2|[0,0,1,0,0,0,0,0,0,0] |
|3|[0,0,0,1,0,0,0,0,0,0] |
|4|[0,0,0,0,1,0,0,0,0,0] |
|5|[0,0,0,0,0,1,0,0,0,0] |
|6|[0,0,0,0,0,0,1,0,0,0] |
|7|[0,0,0,0,0,0,0,1,0,0] |
|8|[0,0,0,0,0,0,0,0,1,0] |
|9|[0,0,0,0,0,0,0,0,0,1] |

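Having a separate dimension per class provides a more meaningful error value: with plain class indices, predicting 8 for an image of class 9 would look "closer" than predicting 0, although both are simply wrong. One-hot vectors remove this artificial ordering, and the encoding is easy to generalize. Additionally, it is preferable to normalize the image pixel values from the range [0,255] to the range [0,1].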
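
The claim about the error value can be checked numerically. The following NumPy sketch builds one-hot vectors like the ones in the table and compares two equally wrong predictions for an image of class 9. It reimplements categorical cross-entropy (the loss passed to `model.compile` later) by hand, for illustration only; the helper names `one_hot`, `categorical_crossentropy` and `prediction`, as well as all probability values, are our own assumptions and neither Keras API nor real model outputs.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an array of shape (len(labels), num_classes) with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is exactly the one-hot vector for class k.
    return np.eye(num_classes)[labels]

def categorical_crossentropy(target, probs, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # Clip to avoid log(0); the zeros in `target` cancel all wrong-class terms.
    return -np.sum(target * np.log(np.clip(probs, eps, 1.0)))

def prediction(confident_class, true_class, p_true=0.1, num_classes=10):
    """Hypothetical output: p_true on the true class, the rest on one wrong class."""
    probs = np.zeros(num_classes)
    probs[confident_class] = 1.0 - p_true
    probs[true_class] = p_true
    return probs

true_label = 9
target = one_hot([true_label], 10)[0]
print(target)  # [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]

# Treating the label as a plain number (squared error): "8" looks far better than "0".
print((8 - true_label) ** 2, (0 - true_label) ** 2)  # 1 81

# With one-hot targets both mistakes cost the same: only the probability
# given to the true class (0.1 in both cases) enters the loss.
loss_8 = categorical_crossentropy(target, prediction(8, true_label))
loss_0 = categorical_crossentropy(target, prediction(0, true_label))
print(round(loss_8, 4), round(loss_0, 4))  # 2.3026 2.3026
```

In Fashion-MNIST, label 9 is "Ankle boot", 8 is "Bag" and 0 is "T-shirt/top": a bag is not more similar to an ankle boot than a T-shirt is, so the ordering implied by the raw indices carries no meaning.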

**Your Assignment** in the following code cell:
- Normalize the pixel values in x_train_orig and x_test_orig from the range [0,255] to the range [0,1], and store them in x_train and x_test
- Flatten the 2D images of x_train_orig and x_test_orig into 1D vectors, and store them in x_train and x_test
- Convert the labels in y_train_orig and y_test_orig using one-hot encoding, and store them in y_train and y_test

**Hint:**
- Normalization: $x'=\frac{x - \min(x)}{\max(x) - \min(x)}$, where $x$ is an original value and $x'$ is the normalized value
    - example: for a grayscale value $x = 190$ with $\min(x) = 0$ and $\max(x) = 255$, we get $x'=\frac{190 - 0}{255 - 0} = \frac{190}{255} \approx 0.745$
    - you may apply this operation element-wise on the matrix that you want to normalize
- Flatten: 
    - x_train_orig: [60000,28,28] => x_train: [60000,28\*28] 
    - x_test_orig:  [10000,28,28] => x_test: [10000,28\*28]
    - you might find the numpy function `reshape` useful
- one-hot encoding
    - `keras.utils.np_utils` contains a function that transforms labels to one hot vectors
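
Both preprocessing steps can be tried on a tiny example first. The sketch below uses a made-up NumPy array of 2 grayscale images with 3 x 3 pixels (our own toy data standing in for `x_train_orig`); the same operations apply to the real arrays with 28 x 28 pixels.

```python
import numpy as np

# Made-up batch of 2 grayscale "images" with 3x3 pixels (values 0..255),
# standing in for x_train_orig with its shape (60000, 28, 28).
images = np.array([
    [[0, 128, 255], [64, 32, 16], [255, 255, 0]],
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
], dtype=np.uint8)

# Min-max normalization with min = 0 and max = 255 reduces to a division.
# Casting to float32 first keeps the result in float32, Keras' default float type.
normalized = images.astype(np.float32) / 255.0

# Flatten each 3x3 image into a vector of 9 values; -1 lets NumPy infer
# the second dimension (28 * 28 = 784 for the real data).
flat = normalized.reshape(images.shape[0], -1)

print(flat.shape)               # (2, 9)
print(flat.min(), flat.max())   # 0.0 1.0
```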


In [None]:
from keras.utils import np_utils

# Number of categories in our data 
num_class = 10

# Normalizing pixel values of x_train_orig and x_test_orig, store into x_train and x_test
x_train =  # your code
x_test  =  # your code

# Flatten the images into one-dimensional vectors (60000,28,28) => (60000,28*28) = (60000,784)
x_train =  # your code
x_test  =  # your code

# Convert the labels in y_train_orig and y_test_orig using one-hot encoding (10 classes)
y_train =  # your code
y_test  =  # your code

In [None]:
print ("x_train.shape = {}, expected = {}".format(x_train.shape, (60000, 28*28)))
print ("y_train.shape = {}, expected = {}".format(y_train.shape, (60000, 10)))
print ()

print ("x_test.shape = {}, expected = {}".format(x_test.shape, (10000, 28*28)))
print ("y_test.shape = {}, expected = {}".format(y_test.shape, (10000, 10)))
print ()

# examine a random example
r = np.random.randint(60000)
print("Now, let's examine the {}-th example of the normalized x_train.".format(r))
print("x_train[{}] looks like:".format(r))
print(DataFrame(x_train[r]))

print("The label of x_train[{}] is y_train[{}]:".format(r, r))
print(DataFrame(y_train[r]))

print()

## Build and Train the Model

Now we have to define the structure of the neural network. <br>
First, we import the `TensorBoard` callback from Keras so that we can visualize the training later.

In [None]:
from keras.callbacks import TensorBoard

### Build the Model
We are going to use [keras sequential model](https://keras.io/getting-started/sequential-model-guide/) to build a sequential neural network.

**Outline of the model**
![funct](../data/nn-model.png)
The diagram shows a network with 2 fully (densely) connected hidden layers. 
<br>

**Input**
- The input size has to be the size of a flattened image (784)
<br>

**Hidden layers**
- The size of the hidden layers can be chosen freely; we propose:
  - the first layer with 128 neurons, using `ReLU` as activation function
  - the second layer with 64 neurons, using `ReLU` as activation function
<br>

**Output**
- The size of the output layer has to equal the number of classes (10); use `softmax` as activation function
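
To see what the three `Dense` layers compute, here is a NumPy sketch of one forward pass through the proposed 784-128-64-10 network: each layer multiplies its input by a weight matrix, adds a bias, and applies ReLU (hidden layers) or softmax (output). The weights are random and nothing is trained; the helper names are ours and not Keras API. The parameter count at the end (weights plus biases of all three layers) should match the total that `model.summary()` reports once the model is built with these sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 128, 64, 10]  # input, hidden 1, hidden 2, output

# Random weights and zero biases, only to illustrate the shapes (not trained).
weights = [rng.normal(0.0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability; each row then sums to 1.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    h1 = relu(x @ weights[0] + biases[0])         # (batch, 128)
    h2 = relu(h1 @ weights[1] + biases[1])        # (batch, 64)
    return softmax(h2 @ weights[2] + biases[2])   # (batch, 10) class probabilities

x = rng.random((5, 784))  # 5 fake flattened images with values in [0, 1)
probs = forward(x)
print(probs.shape)  # (5, 10)

# Each dense layer has n_in * n_out weights plus n_out biases.
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(n_params)  # 109386
```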

**Your assignment**:
- implement the model in the diagram above using the Keras sequential model
    - define a sequential model
    - define first layer: size 128, activation=ReLU
    - define second layer: size 64, activation=ReLU
    - define output layer: size 10, activation=Softmax

**Hint**
- Use `Sequential` to declare a new sequential model
    - After you have a sequential `model`, you can use `model.add()` to add a layer
- Use `Dense` to define a layer
    - you need to provide parameter `input_shape` at the first layer 
- Use `Activation` to define activation
- Read: [Guide to sequential model](https://keras.io/getting-started/sequential-model-guide)

In [None]:
from keras.models import Sequential
from keras.layers import Activation, Dense
import keras

# declare one sequential model
model = Sequential()

# add first layer, size 128, input shape = size of a flattened image, use ReLU as activation, 2 lines


# add second layer, size 64, use ReLU as activation, 2 lines


# add output layer, size 10 (number of classes), use Softmax as activation, 2 lines


model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Callback that writes training logs (metrics and the model graph) to ./logs/run1 for TensorBoard.
tensorboard = TensorBoard(log_dir='./logs/run1', histogram_freq=0, write_graph=True, write_images=False)

# Prints the layers, their output shapes and the number of trainable parameters (weights and biases).
# The parameter count measures the model's degrees of freedom:
# the more parameters, the more labeled data we generally need.
model.summary() 

### Train the Model

**Your Assignment**
- train the model with the training data, the labels, and the test data.

**Hint**

The training itself is nothing special. We use the method `model.fit` and define the relevant data: 
- the data (`x`=$x\_train$)
- the labels as one-hot vectors (`y`=$y\_train$)
- the number of passes over the training data (`epochs`)
- the batch size (`batch_size`)
- optional: the validation data (`validation_data`=($x\_test$, $y\_test$))
- optional: the callbacks (`callbacks`=[tensorboard])

If you pass the test data as `validation_data`, Keras reports the loss and accuracy on the test data after each epoch.

An epoch (`epochs`) is one full pass over all training samples. The less data we have, the more epochs we need so that the weights are updated often enough. With more epochs, however, the learning process sees the same data many times, and there is a risk that the model memorizes the training patterns and no longer generalizes. This is called *overfitting*.
The `batch_size` defines how many samples the optimizer evaluates before it updates the weights: the error is averaged over the batch, and one weight update is made per batch.
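
With the values from the training-parameter cell (`epochs = 3`, `batch_size = 100`) and the 60,000 training images, we can work out how often the weights are updated. The sketch below is plain Python/NumPy arithmetic under those assumptions; the shuffled index batches only illustrate the idea, since Keras' `fit` does its own batching (and, by default, shuffles the training data before each epoch).

```python
import math
import numpy as np

n_train = 60000   # number of training images in Fashion-MNIST
batch_size = 100  # value from the training-parameter cell
epochs = 3

# One weight update per batch; the last batch may be smaller if n_train
# is not divisible by batch_size, hence the ceiling.
updates_per_epoch = math.ceil(n_train / batch_size)
print(updates_per_epoch, updates_per_epoch * epochs)  # 600 1800

# Sketch of how one epoch could be split into shuffled mini-batches of indices.
rng = np.random.default_rng(0)
order = rng.permutation(n_train)
batches = [order[i:i + batch_size] for i in range(0, n_train, batch_size)]
print(len(batches), len(batches[0]))  # 600 100
```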

The `fit` method returns a `History` object whose `history` attribute records the loss and metrics of every epoch, which allows us to visualize the training process. Furthermore, `fit` accepts a `callbacks` argument; passing the `tensorboard` object there writes the logs that TensorBoard displays.

**Read**: method `fit` in [sequential document](https://keras.io/models/sequential/).

In [None]:
# training parameters
epochs = 3
batch_size = 100

In [None]:
history = model.fit(
    # provide data, labels, number of iterations, batch size, validation data, callbacks
)

In [None]:
from matplotlib import pyplot as plt
%matplotlib inline

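# Progress of accuracy
# Older Keras versions store the accuracy under 'acc', newer ones under 'accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key])
plt.plot(history.history['val_' + acc_key]) # if this key is missing, check whether you passed validation_data to model.fit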
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

Typically, the training accuracy increases over the epochs because the model adapts to the training data.
At some point, however, the accuracy on the test data stops improving and may even get worse.
So we have to be careful not to train the model for too long (*overfitting*).
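
One simple way to decide how long to train is to look for the epoch with the lowest validation loss. The helper `best_epoch` below is our own illustrative function (not part of Keras), and the loss values are invented to show the typical shape of an overfitting curve; they are not results of this notebook. With a real run, you could call it as `best_epoch(history.history['val_loss'])`.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss (first one on ties)."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1

# Hypothetical validation losses, made up for illustration: the loss first
# falls, then rises again once the model starts to overfit.
example_val_loss = [0.52, 0.43, 0.40, 0.41, 0.44]
print(best_epoch(example_val_loss))  # 3
```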


### TensorBoard
Another visualization tool is *TensorBoard*. It helps you understand, debug, and optimize your model: you can inspect the computation graph or plot quantitative metrics such as the accuracy or the loss. We have already opened the necessary ports in the container, but we still need to start the TensorBoard server.

To open TensorBoard, proceed as follows:

- Open a new terminal in the Jupyter server: "Home" -> New -> Terminal
- Start TensorBoard in your log directory: `tensorboard --logdir=exercises/logs/`
- Open port 6006 in your browser (copy and paste, and ***replace `<your_docker_ip>` with your own Docker IP***): `http://<your_docker_ip>:6006`


### Evaluation
The metrics reported during training already look promising. As in previous exercises, let's look at the confusion matrix to see which classes the model confuses. To do so, we use the trained model to predict the labels of the test data.

**Your Assignment**
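- use the model to predict the labels of the test data
- calculate the confusion matrix and the accuracy

***Hint:***
- `model.predict_classes` returns the class labels directly, which saves converting the predicted probability vectors.
    - *Note*: the result contains categorical values (class indices), not one-hot encoded vectors.
    - `predict_classes` has been removed in recent TensorFlow/Keras releases; there, take the index of the largest value in each row of `model.predict(...)` with `np.argmax(..., axis=1)`.
- use `metrics` from sklearn to calculate the confusion matrix and the accuracy
    - to compare with the predictions, use `y_test_orig`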
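
For a softmax output, `predict_classes` boils down to picking the class with the highest predicted probability, and the confusion matrix simply counts (true class, predicted class) pairs. The sketch below makes both steps explicit with NumPy on made-up probabilities for 4 images and 3 classes (invented values, not model output). It uses the same convention as `sklearn.metrics.confusion_matrix`: rows are true classes, columns are predicted classes.

```python
import numpy as np

# Made-up softmax outputs for 4 images and 3 classes, standing in for
# the (10000, 10) array that model.predict(x_test) returns.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
y_true = np.array([0, 1, 2, 1])  # plays the role of y_test_orig

# The predicted class is the index of the largest probability in each row.
y_pred = probs.argmax(axis=1)    # [0, 1, 2, 0]

# Confusion matrix: rows = true class, columns = predicted class.
num_classes = probs.shape[1]
cm = np.zeros((num_classes, num_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

# Correct predictions lie on the diagonal.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)  # 0.75
```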

In [None]:
import sklearn.metrics as metrics

# predict on test data
predictions = # your code

# confusion matrix and accuracy
cm = # your code
accuracy = # your code

# Output
print("ACC: {}".format(accuracy))
print("CM:")
print(DataFrame(cm))

fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(cm, cmap=plt.cm.gray)
fig.colorbar(cax)
plt.show()


This looks awesome! Let's save the model so that we can use it again at any time without retraining. New data can then be preprocessed in the same way and classified using `predict`.

In [None]:
model.save('MyFashionClassifier.h5') # Save architecture, weights and optimizer state in HDF5 format
# Reload later with: keras.models.load_model('MyFashionClassifier.h5')

Congratulations! You have finished this exercise.