# Module 2. Learning how to see

Module 1 focused on how to create a first neural network for classifying inputs. In this module, the inputs to our model are images, and the network will need to extract meaning from each image.

## The MNIST dataset
In this module we will work with the MNIST dataset. This dataset contains grayscale images of handwritten digits and their labels (i.e., the digit each image represents). Each image shows a single digit from 0 to 9 (i.e., 10 classes).

In this module, we will not construct the dataset ourselves from the URL where the data is stored. Instead, we will use the TensorFlow Datasets API, which provides a convenient way of downloading MNIST and other datasets. If you are curious about which datasets are provided, have a look at the [documentation](https://www.tensorflow.org/datasets).

The MNIST dataset documentation can be found [here](https://www.tensorflow.org/datasets/catalog/mnist). In that documentation, you can see that the structure of the dataset is defined by the following dictionary:
```python
FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=uint8),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
})
``` 
Let us import the data and work with it. The library entry point for loading a dataset is `tensorflow_datasets.load`, which lets us download both the training and the test splits. The test split will help us validate how the network performs on data that it has not seen during the training phase.

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds


mnist, info = tfds.load('mnist',data_dir='mnist_data',download=True,shuffle_files=True,with_info=True, as_supervised=True)
ds_train = mnist['train']
ds_test  = mnist['test']

A nice property of `tf.data.Dataset` objects is that they can serve as the source of your input data: they let you apply transformations to preprocess the data and iterate over the dataset in batches. When the dataset contains images, `tfds.show_examples` also lets you visualise a sample of it.

In [None]:
tfds.show_examples(ds_train,info)

See how easy it is to tell the dataset to shuffle the data, use batches of 32 elements for training, and let TensorFlow choose the best parameters for prefetching data into memory according to our hardware configuration:

In [None]:
ds_train = ds_train.shuffle(1024).batch(32).prefetch(tf.data.AUTOTUNE)
ds_test  = ds_test.shuffle(1024).batch(1).prefetch(tf.data.AUTOTUNE)
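
To make the effect of shuffling and batching concrete, here is a minimal pure-Python sketch. The helper `make_batches` is hypothetical and only mimics the idea: it shuffles the whole list at once, whereas `shuffle(1024)` works with a buffer of 1024 elements, and it does no prefetching.

```python
import math
import random

def make_batches(items, batch_size, seed=0):
    """Shuffle a copy of `items`, then split it into consecutive batches."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

# Ten toy elements in batches of three: the last batch is smaller.
batches = make_batches(range(10), batch_size=3)
batch_sizes = [len(b) for b in batches]
print(batch_sizes)  # [3, 3, 3, 1]

# Steps per epoch for the 60,000 MNIST training images at batch size 32.
steps_per_epoch = math.ceil(60_000 / 32)
print(steps_per_epoch)  # 1875
```

The second number is what you should expect to see as the step count per epoch when training on `ds_train`.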

## Normalize the data
So far our dataset is composed of grayscale images whose pixels contain values between 0 (black) and 255 (white). These are discrete integer values, which might in principle pose some difficulties for an optimisation algorithm. Let's convert them to floats and normalise them to the [0,1] interval. We will do this by defining a function that takes care of the normalisation and applying it to the dataset with its `map` function. The same function also one-hot encodes the labels, which is what the `categorical_crossentropy` loss used later expects.

In [None]:
def normalise(image, label):
    return tf.cast(image,tf.float32) / 255., tf.one_hot(label,depth=10)

ds_train = ds_train.map(normalise,num_parallel_calls=tf.data.AUTOTUNE)
ds_test  = ds_test.map(normalise,num_parallel_calls=tf.data.AUTOTUNE)
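
As a rough illustration of what `normalise` does to a batch, the following NumPy sketch applies the same two steps to made-up data. The function `normalise_np` and the fake arrays are assumptions for illustration only; the pipeline itself uses the TensorFlow version above.

```python
import numpy as np

def normalise_np(images, labels, num_classes=10):
    """NumPy analogue of `normalise`: scale pixels to [0, 1], one-hot encode labels."""
    images = images.astype(np.float32) / 255.0
    # Row i of the identity matrix is the one-hot vector for class i.
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    return images, one_hot

# A made-up batch: one all-black and one all-white 28x28x1 image.
fake_images = np.zeros((2, 28, 28, 1), dtype=np.uint8)
fake_images[1] = 255
fake_labels = np.array([3, 7])

x, y = normalise_np(fake_images, fake_labels)
print(x.dtype, x.min(), x.max())  # float32 0.0 1.0
print(y.shape)                    # (2, 10)
```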

## Building the model

Now it is your turn. Using the Module 1 notebook, create a neural network model for recognising the digits in the images of the MNIST dataset. This network is going to be very simple, defined only by:
- The input layer
- A hidden layer of 128 neurons with the `relu` activation function
- An output layer whose size you should know from the dataset

A caveat in this case is that the inputs in the dataset are not vectors but arrays of shape (28, 28, 1). You will need to convert each one into a vector in the first layer of your model.

In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28, 1)),  # matches the dataset's image shape
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
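
To see what this architecture computes, here is a NumPy sketch of the forward pass under the assumption of randomly initialised weights (the real model learns them during training). The names `W1`, `b1`, `W2`, `b2`, and `forward` are illustrative and are not part of Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialised parameters with the same shapes as the Keras layers.
W1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)

def forward(images):
    x = images.reshape(len(images), -1)   # Flatten: (batch, 28, 28, 1) -> (batch, 784)
    h = np.maximum(x @ W1 + b1, 0.0)      # Dense(128) followed by relu
    logits = h @ W2 + b2                  # Dense(10) before the activation
    # Softmax, shifted by the row maximum for numerical stability.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

probs = forward(rng.random((4, 28, 28, 1)))
print(probs.shape)  # (4, 10)

num_params = W1.size + b1.size + W2.size + b2.size
print(num_params)   # 101770
```

The parameter count should agree with the total reported by `model.summary()` for the model above.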

Once you have built your model, train it with the following piece of code.

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss='categorical_crossentropy',
    metrics=['acc'],
)

model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)
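
The `categorical_crossentropy` loss compares the one-hot label with the predicted probabilities. A minimal NumPy sketch of the per-example formula, with made-up labels and predictions, could look as follows; Keras additionally averages the result over the batch.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-example loss: minus the log of the probability given to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Two made-up examples with three classes each.
y_true = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
y_pred = np.array([[0.1, 0.8, 0.1],    # confident and correct
                   [0.3, 0.4, 0.3]])   # unsure, and the true class is not the top one

losses = categorical_crossentropy(y_true, y_pred)
print(losses)  # the first loss is much smaller than the second
```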

Let us verify how our trained model performs on the testing dataset. 

In [None]:
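for image, label in ds_test.take(2):
    pred = model.predict(image)
    # Predicted class: index of the highest softmax probability.
    print(tf.argmax(pred, axis=1))
    # True class: the labels are one-hot encoded, so decode them the same way.
    print(tf.argmax(label, axis=1))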
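
The accuracy metric reported during training follows the same logic: take the `argmax` of both the prediction and the one-hot label, and count how often they agree. Here is a small NumPy sketch with made-up values; the helper `accuracy` is illustrative, not the Keras implementation.

```python
import numpy as np

def accuracy(y_true_one_hot, y_pred_probs):
    """Fraction of examples whose most probable class matches the label."""
    true_classes = np.argmax(y_true_one_hot, axis=1)
    pred_classes = np.argmax(y_pred_probs, axis=1)
    return float(np.mean(true_classes == pred_classes))

# Four made-up examples with three classes; the third one is misclassified.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.3, 0.6, 0.1]])

acc_value = accuracy(y_true, y_pred)
print(acc_value)  # 0.75
```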

## A bit more complex dataset
You will now use the Sign Language MNIST dataset, which contains 28x28 images of hands depicting letters of the English alphabet, with labels given as indices into the 26 letters. Let's download this dataset first. The data is stored as CSV files, with the label as the first element of every row and the remaining columns holding the 28x28 pixels of the image. We will download these files and write a parser function that reads them and returns the data as NumPy arrays.


In [None]:
!wget https://drive.google.com/uc?id=1z0DkA9BytlLxO1C0BAWzknLyQmZAp0HR --output-document train.cvs
!wget https://drive.google.com/uc?id=1z1BIj4qmri59GWBG4ivMNFtpZ4AXIbzg --output-document test.cvs

In [None]:
import csv
import string
import numpy as np

def parse_data_from_input(filename):
  with open(filename) as file:
    csv_reader = csv.reader(file, delimiter=',')
    labels = []
    images = []

    #ignore first row
    next(csv_reader)
    for line in csv_reader:
      labels.append(line[0])
      images.append(line[1:])
    
    labels = np.array(labels)
    labels = labels.astype(np.float64)
    images = np.reshape(images,(-1,28,28))
    images = images.astype(np.float64)

    return images, labels

training_images, training_labels = parse_data_from_input("train.cvs")
validation_images, validation_labels = parse_data_from_input("test.cvs")
print(training_images.dtype)

print(f"Training images has shape: {training_images.shape}")
print(f"Training labels has shape: {training_labels.shape}")
print(f"Validation images has shape: {validation_images.shape}")
print(f"Validation labels has shape: {validation_labels.shape}")
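
To check your understanding of the file layout without downloading anything, the sketch below parses a tiny made-up CSV held in memory. The header names, the values, and the 2x2 image size are assumptions chosen only to keep the example short; `parse_rows` is a hypothetical helper that mirrors the parser above.

```python
import csv
import io
import numpy as np

# Made-up CSV with the same layout: a header row, then label followed by pixels.
sample_csv = "label,pixel1,pixel2,pixel3,pixel4\n3,0,255,128,64\n0,10,20,30,40\n"

def parse_rows(text, side):
    reader = csv.reader(io.StringIO(text), delimiter=",")
    next(reader)  # skip the header row
    rows = np.array(list(reader)).astype(np.float64)
    labels = rows[:, 0]                            # first column: the label
    images = rows[:, 1:].reshape(-1, side, side)   # remaining columns: the pixels
    return images, labels

sample_images, sample_labels = parse_rows(sample_csv, side=2)
print(sample_images.shape)  # (2, 2, 2)
print(sample_labels)        # [3. 0.]
```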

You can use the following piece of code to get an idea of what the images in this dataset look like. These images are probably more challenging to classify than the MNIST digits. Notice that in this case we need to use the matplotlib library to plot the data.

In [None]:
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import array_to_img

def plot_categories(training_images, training_labels):
  fig, axes = plt.subplots(1, 10, figsize=(16, 15))
  axes = axes.flatten()
  letters = list(string.ascii_lowercase)

  for k in range(10):
    img = training_images[k]
    img = np.expand_dims(img, axis=-1)
    img = array_to_img(img)
    ax = axes[k]
    ax.imshow(img, cmap="Greys_r")
    ax.set_title(f"{letters[int(training_labels[k])]}")
    ax.set_axis_off()

  plt.tight_layout()
  plt.show()

plot_categories(training_images, training_labels)

Later we will need to feed the images into a model after normalising them. For the sake of simplicity, we will create Keras generators for dealing with the training and test datasets.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
def train_val_generators(training_images, training_labels, validation_images, validation_labels):
  training_images   = np.expand_dims(training_images,3)
  validation_images = np.expand_dims(validation_images,3)

  training_labels = tf.keras.utils.to_categorical(training_labels,26)
  train_datagen   = ImageDataGenerator(rescale=1.0/255.0)

  train_generator = train_datagen.flow(x=training_images,
                                       y=training_labels,
                                       batch_size=32) 
  
  validation_labels  = tf.keras.utils.to_categorical(validation_labels,26)
  validation_datagen = ImageDataGenerator(rescale=1.0/255.0)

  validation_generator = validation_datagen.flow(x=validation_images,
                                                 y=validation_labels,
                                                 batch_size=32) 
  return train_generator, validation_generator

train_generator, validation_generator = train_val_generators(training_images, training_labels, validation_images, validation_labels)
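
The following NumPy sketch shows roughly what each batch from these generators looks like. The generator `simple_flow` is a simplified stand-in written for illustration: unlike `ImageDataGenerator.flow`, it does not shuffle and it stops after a single pass over the data.

```python
import numpy as np

def simple_flow(images, labels, batch_size=32, rescale=1.0 / 255.0, num_classes=26):
    """Yield (rescaled images, one-hot labels) batches, one pass, no shuffling."""
    images = np.expand_dims(images, 3)                  # add the channel axis
    one_hot = np.eye(num_classes)[labels.astype(int)]   # like to_categorical
    for start in range(0, len(images), batch_size):
        stop = start + batch_size
        yield images[start:stop] * rescale, one_hot[start:stop]

# 70 made-up all-white images with labels cycling through 0..25.
fake_images = np.full((70, 28, 28), 255.0)
fake_labels = np.arange(70) % 26

batch_shapes = [(x.shape, y.shape) for x, y in simple_flow(fake_images, fake_labels)]
print(batch_shapes)
```

Each image batch has shape `(batch, 28, 28, 1)` and each label batch has shape `(batch, 26)`, with a smaller final batch when the number of examples is not a multiple of 32.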


OK, let's see how a model like the one before performs in this case. Bear in mind that if you do not create the model again, you will be using the one you trained on the MNIST digits. Let's create it, train it, and analyse how it performed during training and validation.

In [None]:

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28, 1)),  # the generators yield (28, 28, 1) images
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(26, activation='softmax')
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)


history = model.fit(train_generator,
                    epochs=15,
                    validation_data=validation_generator)


acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'r', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()


The validation loss seems to have stagnated. This may indicate that our network is not sufficient for classifying the symbols we are dealing with at the moment. We need to look for another solution.
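
One possible way to make "stagnated" concrete is to check whether the best validation loss has improved over the last few epochs. The helper below is a hypothetical sketch, and the loss values are made up for illustration rather than taken from a real run. The `patience` and `min_delta` parameters are borrowed in spirit from the Keras `EarlyStopping` callback, which applies a similar idea during training.

```python
def has_stagnated(values, patience=3, min_delta=1e-3):
    """True if the last `patience` values did not beat the earlier best by `min_delta`."""
    if len(values) <= patience:
        return False  # not enough history to decide
    best_before = min(values[:-patience])
    best_recent = min(values[-patience:])
    return best_recent > best_before - min_delta

# Illustrative numbers only, not results from this notebook.
flat_val_loss = [1.9, 1.4, 1.2, 1.15, 1.16, 1.15, 1.17]
improving_val_loss = [1.9, 1.4, 1.2, 1.0, 0.9, 0.8, 0.7]

print(has_stagnated(flat_val_loss))       # True
print(has_stagnated(improving_val_loss))  # False
```

You could apply it to `history.history['val_loss']` from the training run above.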