# Philosophy and Theory of AI: Coding Exercise

The goal of this exercise is to build an AI model (specifically, a neural network) that can classify images of hand-written digits according to which digit they depict. This is a standard AI exercise. Its point for our course is to see concretely how an AI model is implemented. In doing so, we follow the standard 4-step machine learning pipeline. To do this exercise, simply continue reading through this document.

> Whenever you actively have to do something, this is indicated by a gray bar like this one.

This file is inspired by an [exercise](https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/programming-exercise) (licensed under the Apache License, Version 2.0) in Google's machine learning [crash course](https://developers.google.com/machine-learning/crash-course). 

Before we start, a comment on how to use *Jupyter notebooks* like this one (if you haven't done this before). There are two types of cells: *text cells* like the present one and *code cells* like the next one. Text cells are written in [Markdown](https://en.wikipedia.org/wiki/Markdown) text and code cells are written in [Python](https://en.wikipedia.org/wiki/Python_(programming_language)) code. You can run a cell either by clicking its play button or by pressing `Ctrl` + `Enter` when the cell is highlighted. Running a text cell just renders it ('makes it look like it was intended'), and running a code cell means executing the piece of program that it contains (its output, if any, is then printed below the code). More detailed introductions to Jupyter notebooks can be found online, e.g., [here](https://www.youtube.com/watch?v=HW29067qVWk).

> To practice, run the code cell below. This is just to load the Python packages that we will need later on. You can ignore the output.

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers
from matplotlib import pyplot as plt

## Step 1: Problem (task conceptualization)

The problem that we want our AI model to solve is to classify images. Concretely, given as input a digital image of a hand-written digit, the AI model should output the digit that is depicted in the image. Each image consists of 28x28 gray-scale pixels. Here are some examples (taken from [Wikipedia](https://en.wikipedia.org/wiki/MNIST_database)):

![Example images from the MNIST database](https://upload.wikimedia.org/wikipedia/commons/f/f7/MnistExamplesModified.png)

> To appreciate the difficulty of this problem, take a moment to think about how you would go about building an automatic system that can solve the task. 

The solution we follow here is to build a neural network. Let's explain how a neural network works. For brevity, this will be quite hand-wavy, but for an excellent detailed video explanation, see [here](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi).

As shown in the diagram below (taken from [Wikipedia](https://en.wikipedia.org/wiki/Artificial_neural_network)), a neural network consists of a layer of several input neurons (one for each pixel of the image), followed by one or more layers of hidden neurons, and completed by one layer of output neurons (one for each possible digit).

![Figure of a neural network](https://upload.wikimedia.org/wikipedia/commons/4/46/Colored_neural_network.svg)

The idea is this: Given an image, we put the gray-scale value of each pixel into its corresponding neuron in the input layer. We call this number the activation of the neuron. Now we propagate this activation through the network as follows: Each connection between neurons has a weight, which modulates how much of the activation can 'flow' through that connection into the next neuron. The activation of that next neuron is the sum of all the incoming modulated activations (additionally regulated by an activation function). In this way, we obtain the activations of the output neurons. We consider the output neuron with the highest activation: the digit that corresponds to it is the digit that the network takes the image to depict. We can improve the network's predictions by adjusting the weights: that will be the training process.
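
To make the propagation step concrete, here is a minimal numpy sketch of a tiny network with 4 input neurons, 3 hidden neurons, and 2 output neurons. The weights and the names `relu` and `forward` are made up for illustration, and the sketch leaves out the bias terms that real networks usually add; it is not the model we build later.

```python
import numpy as np

def relu(z):
    """Activation function: keep positive values, replace negative ones by 0."""
    return np.maximum(z, 0.0)

def forward(pixels, w_hidden, w_output):
    """Propagate input activations through one hidden layer to the output layer."""
    hidden = relu(w_hidden @ pixels)  # weighted sum of the inputs, then activation function
    output = w_output @ hidden        # weighted sum of the hidden activations
    return output

# Made-up input activations (think of them as 4 normalized pixel values).
pixels = np.array([0.0, 0.5, 1.0, 0.0])

# Made-up weights: row i holds the weights of all connections into neuron i.
w_hidden = np.array([[ 1.0, 0.0,  1.0, 0.0],
                     [ 0.0, 2.0,  0.0, 0.0],
                     [-1.0, 0.0, -1.0, 0.0]])
w_output = np.array([[1.0, 0.0, 1.0],
                     [0.0, 3.0, 0.0]])

output = forward(pixels, w_hidden, w_output)
prediction = int(np.argmax(output))  # index of the most active output neuron
print(output, prediction)
```

Here the second output neuron (index 1) ends up with the highest activation, so the toy network 'predicts' class 1. Training would mean changing the entries of `w_hidden` and `w_output` until such predictions match the labels.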

## Step 2: Data (collection and preparation) 

The next step is to collect data: many pairs of an image and the digit it depicts. In practice, this is usually hard work: you first need to collect a wide variety of possible inputs (here the images) and then need human workers (who are called annotators or raters) to label these inputs with the right output (here the depicted digit). Fortunately, since we are doing a standard exercise, this work has already been done for us. The [MNIST database](https://en.wikipedia.org/wiki/MNIST_database) contains many such image-label pairs. (It was created based on a dataset from the US agency NIST, the National Institute of Standards and Technology; to learn more about them, you can watch [this video](https://www.youtube.com/watch?v=esQyYGezS7c).) Conveniently, we can simply download it.



### Downloading the dataset

> To download the MNIST dataset, run the following code.

In [None]:
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data()

This gives us four lists (technically, numpy arrays) called `x_train`, `y_train`, `x_test`, and `y_test`. The 'x-lists' are lists of images and the 'y-lists' are lists of corresponding labels. The 'train-lists' are used for training the AI model and the 'test-lists' are reserved until the very end: once the AI model is trained, we test it on this reserved part of the dataset. This is *extremely* important: we need to test the AI model on data it has never seen during training. Otherwise we do not get a good estimate of how well the AI model behaves. So never ever mix the test and training set in building your AI model.

### Inspecting the dataset

Now it is time to actually look at the dataset to get a feel for it. 

> To look at, say, the datapoint at index 17, run the following code. Think about what to make of it.

In [None]:
print('The x-value: ', x_train[17])
print('The y-value: ', y_train[17])

This is not very helpful yet, right? We do see that the input is a 28x28 array of numbers (a list of 28 lists each of which has 28 entries) between 0 and 255. So each element of the array describes a pixel and the number describes its gray-scale value between 0 (completely white) and 255 (completely black). And apparently this image depicts an 8.

> To better see what is going on, we can plot the input using a plotting package for Python. To see it, run the following code.

In [None]:
plt.imshow(x_train[17])

### Normalize datapoints

Since neural networks work with continuous numbers, it is better to rescale the (whole) numbers between 0 and 255 into (real) numbers between 0 and 1. The technical term for this is *normalizing* the numbers.

> To do this, run the following code.

In [None]:
x_train_normalized = x_train / 255.0
x_test_normalized = x_test / 255.0

> As a bonus exercise, verify that the numbers are indeed normalized now by adding a new code cell below and printing the datapoint at index 17 again.
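
To see what the division does without relying on the downloaded data, here is a small sketch on a made-up array. The name `toy_image` and its values are assumptions for illustration and are not taken from MNIST.

```python
import numpy as np

# A made-up 2x3 'image' with whole-number gray-scale values between 0 and 255.
toy_image = np.array([[  0,  51, 102],
                      [153, 204, 255]], dtype=np.uint8)

# Dividing by 255.0 turns the whole numbers into real numbers between 0 and 1.
toy_normalized = toy_image / 255.0

print(toy_normalized)
print('Smallest value:', toy_normalized.min())
print('Largest value: ', toy_normalized.max())
```

The smallest possible value (0) stays 0 and the largest possible value (255) becomes 1, with everything else in between.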

## Step 3: Model (architecture, training, and testing) 

Now that we have both conceptualized our task (which dictates the neural network architecture) and gathered our dataset, we can start building the AI model. This involves three steps: (a) creating a deep neural network, (b) training it, and (c) evaluating its performance. Once they are in place, we can (d) execute these three steps for specific choices of parameters, and then we can (e) optimize the choice of parameters to get better performance. We do these things in turn now.


### (a) Creating a deep neural network

We define a function that takes one argument (namely, `my_learning_rate`) and outputs a `model` which is the computer implementation of a neural network with the architecture specified in the function definition.

> Read through the definition of the function and try to understand as much of the code as possible. It's okay if much of it doesn't make sense, especially on a first reading. For now, the focus is to first get the whole thing running, and then you can come back again to understand more. But try to understand which architecture (how many hidden layers, how many neurons per layer) the neural network has. Look up some terms that you don't know online. (Yes, that is not the most fun thing to do, but it gives an impression of a big part of coding: googling things :-)). Finally, run the cell.


In [None]:
def create_model(my_learning_rate):
  """Create and compile a deep neural net."""
  
  # All models in this course are sequential.
  model = tf.keras.models.Sequential()

  # The features are stored in a two-dimensional 28X28 array. 
  # Flatten that two-dimensional array into a one-dimensional 
  # 784-element array.
  model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))

  # Define the first hidden layer.   
  model.add(tf.keras.layers.Dense(units=32, activation='relu'))
      
  # Define a dropout regularization layer. 
  model.add(tf.keras.layers.Dropout(rate=0.2))

  # Define the output layer. The units parameter is set to 10 because
  # the model must choose among 10 possible output values (representing
  # the digits from 0 to 9, inclusive).
  #
  # Don't change this layer.
  model.add(tf.keras.layers.Dense(units=10, activation='softmax'))     
                           
  # Construct the layers into a model that TensorFlow can execute.  
  # Notice that the loss function for multi-class classification
  # is different than the loss function for binary classification.  
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=my_learning_rate),
                loss="sparse_categorical_crossentropy",
                metrics=['accuracy'])
  
  return model    
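
The two less familiar ingredients in `create_model` are probably the `softmax` activation of the output layer and the `sparse_categorical_crossentropy` loss. Below is a minimal numpy sketch of what they compute for a single image. It is not Keras's actual implementation, and the raw activation values are made up for illustration. 'Sparse' here means that the label is given as a plain digit (like the entries of `y_train`) rather than as a list of ten zeros and ones.

```python
import numpy as np

def softmax(z):
    """Turn raw output activations into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtracting the maximum avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

def sparse_categorical_crossentropy(probabilities, true_digit):
    """Loss for one image: negative log of the probability given to the true digit."""
    return -np.log(probabilities[true_digit])

# Made-up raw activations of the 10 output neurons (one per digit 0 to 9).
raw = np.array([0.1, 0.2, 0.0, 0.3, 0.1, 0.0, 0.2, 0.1, 3.0, 0.4])
probs = softmax(raw)

# The loss is small if the true digit got a high probability, and large otherwise.
loss_if_8 = sparse_categorical_crossentropy(probs, 8)
loss_if_3 = sparse_categorical_crossentropy(probs, 3)

print('Most probable digit:', int(np.argmax(probs)))
print('Loss if the true digit is 8:', loss_if_8)
print('Loss if the true digit is 3:', loss_if_3)
```

Training tries to make this loss small on average, which amounts to pushing the probability of the correct digit towards 1.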

### (b) Training a deep neural network

Next we define a function that takes as input a model (the output of the previous function), a dataset, and several arguments, then trains the model with that data, and finally outputs a history of the training process.

> In the same spirit as in the previous exercise, read through the definition of the function and try to understand as much of the code as possible. Then, run the cell.


In [None]:
def train_model(model, train_features, train_label, epochs,
                batch_size=None, validation_split=0.1):
  """Train the model by feeding it data."""

  history = model.fit(x=train_features, y=train_label, batch_size=batch_size,
                      epochs=epochs, shuffle=True, 
                      validation_split=validation_split)
 
  # To track the progression of training, gather a snapshot
  # of the model's metrics at each epoch. 
  epochs = history.epoch
  hist = pd.DataFrame(history.history)

  return epochs, hist    

### (c) Evaluate a deep neural network

Finally, we define a function to evaluate the (trained) model. It takes as input the history of the training process and a list of evaluation criteria (we will only use accuracy), and then produces a plot of how the accuracy improves over the course of training.

> You don't need to focus much on the details of this function. But do run the cell.

In [None]:
def plot_curve(epochs, hist, list_of_metrics):
  """Plot a curve of one or more classification metrics vs. epoch."""  
  # Each entry of list_of_metrics should be one of the names shown in:
  # https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#define_the_model_and_metrics

  plt.figure()
  plt.xlabel("Epoch")
  plt.ylabel("Value")

  for m in list_of_metrics:
    x = hist[m]
    plt.plot(epochs[1:], x[1:], label=m)

  plt.legend()

### (d) Put everything together

Now, for specific choices of parameters, we execute the above functions. This can take some time: depending on the size of the model, computing speed, etc., roughly on the order of several seconds to minutes.

> Execute the cell and make sure you understand the output. In particular, the very last line of the output tells you how well the trained model performed on the test set.

In [None]:
# The following variables are the hyperparameters.
learning_rate = 0.003
epochs = 50
batch_size = 4000
validation_split = 0.2

# Build the model with the architecture defined in create_model.
my_model = create_model(learning_rate)

# Train the model on the normalized training set.
epochs, hist = train_model(my_model, x_train_normalized, y_train, 
                           epochs, batch_size, validation_split)

# Plot a graph of the metric vs. epochs.
list_of_metrics_to_plot = ['accuracy']
plot_curve(epochs, hist, list_of_metrics_to_plot)

# Evaluate against the test set.
print("\n Evaluate the new model against the test set:")
my_model.evaluate(x=x_test_normalized, y=y_test, batch_size=batch_size)
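
To get a feel for what these hyperparameters mean in terms of data, here is a small arithmetic sketch. It assumes the standard MNIST training set of 60,000 images. Keras takes the validation examples from the end of the training arrays (before shuffling), and each batch leads to one update of the weights.

```python
import math

num_examples = 60000      # size of the MNIST training set
validation_split = 0.2
batch_size = 4000
epochs = 50

# Part of the training data is held back to monitor progress during training.
num_validation = int(num_examples * validation_split)
num_training = num_examples - num_validation

# In each epoch, the remaining data is processed batch by batch.
steps_per_epoch = math.ceil(num_training / batch_size)
total_updates = steps_per_epoch * epochs

print(num_training, num_validation, steps_per_epoch, total_updates)
# 48000 12000 12 600
```

So with these settings the weights are adjusted 600 times in total. Note that the validation data is still part of `x_train`; the test set stays untouched until the final `evaluate` call.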

### (e) Optimize

Now we get to the main exercise. As things have been defined so far, you should achieve a test accuracy of around 96%. That is already pretty impressive, don't you think? But now we want to see if we can change the parameters of the AI model to achieve an even better performance.

> Systematically adjust the following parameters and observe how the performance changes:
>    * number of hidden layers
>    * number of nodes in each layer
>    * dropout regularization rate
> 
> Describe which relationships you discover. And do you manage to reach at least 98% accuracy on the test set?
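
One way to keep the search systematic is to write down the combinations you want to try before training anything. The following sketch does this and also counts how many adjustable numbers (weights and biases) each architecture has. The candidate values and the names `count_parameters` and `configurations` are hypothetical; choose your own values when experimenting.

```python
from itertools import product

def count_parameters(hidden_layer_sizes, num_inputs=784, num_outputs=10):
    """Count the weights and biases of a fully connected network."""
    sizes = [num_inputs, *hidden_layer_sizes, num_outputs]
    # Each layer has (inputs x outputs) weights plus one bias per output neuron.
    # Dropout layers do not add any parameters.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# Hypothetical candidate values for the three parameters of the exercise.
num_hidden_layers_options = [1, 2]
units_options = [32, 128]
dropout_options = [0.2, 0.4]

configurations = []
for num_layers, units, dropout in product(
        num_hidden_layers_options, units_options, dropout_options):
    hidden_layer_sizes = [units] * num_layers
    configurations.append({
        'hidden_layer_sizes': hidden_layer_sizes,
        'dropout_rate': dropout,
        'num_parameters': count_parameters(hidden_layer_sizes),
    })

for config in configurations:
    print(config)
```

For instance, `count_parameters([32])` gives 25450, which is the size of the network defined in `create_model` above. For each configuration you would then adapt `create_model` accordingly, train, and note down the test accuracy.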


## Step 4: Deployment (distribution shifts, explanations, etc.) 

The next step in the development of an AI model for the real world would be to deploy it. (Here, maybe as a system that recognizes hand-written phone numbers and outputs them in digital format.) For that we would need to study how robust it is to distribution shifts: for example, whether it still performs well in countries (e.g., Germany) where, unlike in the US, the digit $1$ is hand-written not just as a vertical bar, but as a vertical bar with a little 'flag' to the left. Does this shift in the distribution of the input images cause a drop in accuracy, or is the model robust to it? But for the purpose of this exercise, we will not do this here.
