# AI Trainer
## 1. Introduction
This Jupyter Notebook trains the AI model used in this project. It serves both as a tool for training the model on a prepared dataset and as a detailed walkthrough of how the training works.

## 2. Table of Contents
1. [Introduction](#1-introduction)
2. [Table of Contents](#2-table-of-contents)
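3. [Imports](#3-imports)
4. [Creating a Dataset](#4-creating-a-dataset)
5. [Creating the Model](#5-creating-the-model)
6. [Training the Model](#6-training-the-model)
7. [Visualizing the Result](#7-visualizing-the-result)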

## 3. Imports
The following libraries are imported:
- `tensorflow`: a library that contains tools for creating an AI model, imported as `tf`.
- `keras`: a framework in `tensorflow` for defining the AI model from a list of layers.
- `numpy`: a library for handling arrays of numbers efficiently, imported as `np`.
- `cv2`: a library of highly optimized algorithms for computer vision.
- `matplotlib.pyplot`: a library for plotting data, imported as `plt`.
- `pathlib`: a library for handling filesystem paths.
- `PIL`: a library for image processing.
- `datetime`: a standard Python library for dates and times.

In [None]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import cv2
import matplotlib.pyplot as plt
import pathlib
import PIL.Image  # import the submodule explicitly so that PIL.Image.open() is always available

## 4. Creating a Dataset
This project uses a dataset of 256x256 grayscale images. The dataset has three sub-directories, one for each of the states **attacking**, **idling** and **walking**, and each sub-directory is named after its state.

```
├── kajt_training_data/
│   ├── attacking/
│   ├── idling/
│   ├── walking/
```

### 4.1. Defining Dataset Parameters
These are the parameters used for the dataset:
- `batch_size`: The number of images in each batch. Every batch is processed once per epoch.
- `image_width`: The width of the *input* image.
- `image_height`: The height of the *input* image.
- `validation_split`: The fraction of the dataset (between 0 and 1) set aside as validation images to gauge accuracy.

In [None]:
batch_size = 32
image_width = 256
image_height = 256
validation_split = 0.2
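
As a rough illustration of what these parameters imply, the sketch below works out how many images end up in each subset and how many batches make up one epoch. The image count is a made-up number, not the size of the actual dataset, and the rounding assumes the validation count is truncated to an integer.

```python
import math

batch_size = 32
validation_split = 0.2
image_count = 1000  # hypothetical dataset size, for illustration only

# The validation subset is a fixed fraction of the dataset; the rest is used for training
validation_count = int(image_count * validation_split)
training_count = image_count - validation_count

# The last batch may be smaller than batch_size, hence the rounding up
training_batches = math.ceil(training_count / batch_size)
validation_batches = math.ceil(validation_count / batch_size)

print(f"training: {training_count} images in {training_batches} batches")
print(f"validation: {validation_count} images in {validation_batches} batches")
```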

### 4.2. Downloading the Dataset
The images are stored as a `.tar.gz` file on OneDrive. The sharing URL that OneDrive provides is not a direct download link, so it had to be converted to one. For details, see ["Generate OneDrive Direct-Download Link with C# or Python"](https://towardsdatascience.com/how-to-get-onedrive-direct-download-link-ecb52a62fee4) by [Joe T. Santhanavanich](https://joets.medium.com/). The location of the dataset can be changed by altering the `dataset_url` variable. A local file can be used by setting it to `file:{file_path}`.

Downloading and extracting the dataset from the URL is handled by the `tf.keras.utils.get_file()` function, which returns the path to the downloaded dataset. The path is converted into a `pathlib.Path` object to make it easier to handle later on.

> **NOTE**
> 
> The downloaded dataset is cached, so the cached folder has to be deleted before a new dataset can be used.
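
Clearing the cache could look roughly like the sketch below. It assumes the default Keras cache location `~/.keras/datasets`, which may differ if a custom cache directory or the `KERAS_HOME` environment variable is used. The helper `clear_cached_dataset` is hypothetical and not part of TensorFlow.

```python
import shutil
from pathlib import Path

def clear_cached_dataset(name, cache_root=None):
    """Delete every cached entry whose name starts with `name`; return what was removed."""
    if cache_root is None:
        # Assumed default cache location used by tf.keras.utils.get_file()
        cache_root = Path.home() / '.keras' / 'datasets'
    cache_root = Path(cache_root)
    removed = []
    if not cache_root.exists():
        return removed
    # Matches both the extracted folder and the downloaded archive.
    # Note: this also matches any other entry sharing the same name prefix.
    for path in sorted(cache_root.glob(f'{name}*')):
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
        removed.append(path)
    return removed

# Usage (commented out so that nothing is deleted by accident):
# clear_cached_dataset('kajt_training_data')
```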

In [None]:
dataset_url = ""
data_directory = tf.keras.utils.get_file('kajt_training_data', origin=dataset_url, untar=True)
data_directory = pathlib.Path(data_directory)

# Print the number of images in the dataset and how they are split between the classes
print(f"Image count: {len(list(data_directory.glob('*/*')))}")
print(f"- attacking: {len(list(data_directory.glob('attacking/*')))}")
print(f"- idling: {len(list(data_directory.glob('idling/*')))}")
print(f"- walking: {len(list(data_directory.glob('walking/*')))}")

### 4.3. Previewing the Dataset
These are some example images from the dataset.

#### 4.3.1. Attacking States

In [None]:
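# Show two different examples; display() renders each image, not only the cell's last expression
attacking_paths = sorted(data_directory.glob('attacking/*'))
display(PIL.Image.open(attacking_paths[0]))
display(PIL.Image.open(attacking_paths[1]))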

#### 4.3.2. Idling States

In [None]:
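# Show two different examples; display() renders each image, not only the cell's last expression
idling_paths = sorted(data_directory.glob('idling/*'))
display(PIL.Image.open(idling_paths[0]))
display(PIL.Image.open(idling_paths[1]))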

#### 4.3.3. Walking States

In [None]:
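# Show two different examples; display() renders each image, not only the cell's last expression
walking_paths = sorted(data_directory.glob('walking/*'))
display(PIL.Image.open(walking_paths[0]))
display(PIL.Image.open(walking_paths[1]))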

### 4.4. Loading the Dataset
The dataset is loaded from disk with the `tf.keras.utils.image_dataset_from_directory()` function. Two datasets are created, one for training and one for validation. Validation is an essential part of training the AI because it makes it significantly easier to identify whether the model is overfitted. This is explained further in ????.

In [None]:
training_dataset = tf.keras.utils.image_dataset_from_directory(
    data_directory,
    validation_split=validation_split,
    subset='training',
    seed=123,
    image_size=(image_height, image_width),
    batch_size=batch_size,
    color_mode='grayscale')

validation_dataset = tf.keras.utils.image_dataset_from_directory(
    data_directory,
    validation_split=validation_split,
    subset='validation',
    seed=123,
    image_size=(image_height, image_width),
    batch_size=batch_size,
    color_mode='grayscale')

class_names = training_dataset.class_names
print(class_names)
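
The order of `class_names` matters because each image is labelled with the index of its class in that list. When labels are inferred from the directory names, the names are sorted alphanumerically. The sketch below mimics that ordering in plain Python; the deliberately unsorted `class_directories` list is only an illustration.

```python
# Sub-directory names in an arbitrary order, as a file system might list them
class_directories = ['walking', 'attacking', 'idling']

# Sorting gives the order that the label indices are based on
class_names = sorted(class_directories)
label_of = {name: index for index, name in enumerate(class_names)}

print(class_names)
print(label_of)
```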

### 4.5. Configure the Dataset for Performance
The dataset is cached, which means that after the first pass it is kept in memory, so training does not have to wait for data to be loaded from disk. Additionally, the dataset is prefetched, which means that the next batch of data is loaded and processed while the model is training on the current batch.

In [None]:
training_dataset = training_dataset.cache().prefetch(buffer_size=tf.data.AUTOTUNE)
validation_dataset = validation_dataset.cache().prefetch(buffer_size=tf.data.AUTOTUNE)

## 5. Creating the Model
The `keras` framework greatly simplifies the process of creating the model with the help of the `Sequential` model. The `Sequential` model creates the AI model from a list of layers from the framework.

A simple model is used in place of a Convolutional Neural Network (CNN) for ease of implementation. While a CNN is the most common model for image classification, this project is constrained to a specific character and the image filters are trivial, so the training time a CNN would spend learning optimal filters would be wasted. Additionally, the character is locked in a known location, so the model would not benefit from a CNN's ability to exploit spatial dependencies, which is its main advantage.

In [None]:
model = keras.Sequential()

### 5.1. Adding the Layers

#### 5.1.1. Rescaling
The first layer is a `Rescaling` layer that converts the pixel data from a range of integer numbers between 0 and 255 to a range of floating point numbers between 0.0 and 1.0.

This layer also defines what the input should look like with `input_shape=(image_height, image_width, 1)`. In other words, the input is a list with one entry per image row, each row is a list with one entry per pixel in that row, and each pixel is a list with a single element (the grayscale value).

In [None]:
model.add(keras.layers.Rescaling(1.0 / 255, input_shape=(image_height, image_width, 1)))
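
The arithmetic behind the rescaling and the meaning of the input shape can be illustrated with NumPy. This is only a sketch on a made-up 2x2 image, not the layer's actual implementation.

```python
import numpy as np

# Hypothetical 2x2 grayscale image with shape (height, width, channels)
image = np.array([[[0], [51]],
                  [[102], [255]]], dtype=np.uint8)

# Same arithmetic as a rescaling by 1/255: every pixel is multiplied by the scale
rescaled = image.astype(np.float32) * (1.0 / 255)

print(image.shape)       # two rows, two pixels per row, one value per pixel
print(rescaled[..., 0])  # pixel values now lie between 0.0 and 1.0
```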

#### 5.1.2. Flatten
The second layer is a `Flatten` layer, which reshapes the input from a multidimensional list of numbers into a one-dimensional list of numbers. It turns the image, which is a list of lists of numbers, into a single list of numbers.

In [None]:
model.add(keras.layers.Flatten())
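
A NumPy sketch of the same reshaping, assuming a batch of two blank 256x256 grayscale images. As in the `Flatten` layer, the batch dimension is kept and only the image dimensions are merged.

```python
import numpy as np

image_height, image_width = 256, 256

# Hypothetical batch of two blank images with shape (batch, height, width, channels)
batch = np.zeros((2, image_height, image_width, 1), dtype=np.float32)

# Keep the batch dimension and merge all remaining dimensions into one
flattened = batch.reshape(batch.shape[0], -1)

print(batch.shape)
print(flattened.shape)
```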

#### 5.1.3. Dense
The next two layers are `Dense` layers. These are ordinary neural network layers, i.e. collections of neurons that apply a mathematical function to the input to produce an output.

The `Dense` layers are passed an activation function, which is an additional function that each neuron's result is put through before it is output. The `relu` (Rectified Linear Unit) function is one of the most common activation functions and looks like this:

$$ f(x) = x^+ = \max(0, x) $$

The two `Dense` layers have 256 and 64 neurons respectively. These values were chosen arbitrarily and could likely be improved by testing different configurations, but that is outside the scope of this project.

In [None]:
model.add(keras.layers.Dense(256, activation='relu'))
model.add(keras.layers.Dense(64, activation='relu'))
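
What a single `Dense` layer with a `relu` activation computes can be sketched in NumPy as a weighted sum plus a bias, followed by the activation. The helper names `relu` and `dense`, the tiny layer size and all the weights below are made up for illustration. In the real model the weights are learned during training.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: negative values become 0, positive values pass through."""
    return np.maximum(0.0, x)

def dense(inputs, weights, bias, activation):
    """One dense layer: weighted sum of the inputs plus a bias, then the activation."""
    return activation(inputs @ weights + bias)

# Hypothetical tiny layer: 3 inputs feeding 2 neurons
inputs = np.array([[1.0, 2.0, 3.0]])   # a batch containing one sample
weights = np.array([[0.5, -1.0],
                    [0.0, -1.0],
                    [1.0, 0.5]])       # shape (number of inputs, number of neurons)
bias = np.array([0.5, 0.0])

outputs = dense(inputs, weights, bias, relu)
print(outputs)  # the second neuron's negative sum (-1.5) is clamped to 0
```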

#### 5.1.4. Output Layer
The last layer is the output layer and is also a `Dense` layer. The number of outputs is defined by the number of image classes, which is three. The activation function for this layer is `softmax` instead of `relu`. It converts the three outputs into probabilities that sum to 1, and the class with the largest probability is taken as the prediction.

In [None]:
number_of_classes = len(class_names)
model.add(keras.layers.Dense(number_of_classes, activation='softmax', name='outputs'))

### 5.2. Compiling the Model
The `optimizer` is the algorithm that, as the name suggests, optimizes the neural network. It does so by adjusting the weights of the neurons so that the `loss` decreases. The learning rate determines how large each adjustment is. The Adam algorithm was chosen since it is one of the most commonly used optimizers.

The `loss` function measures how far the model's predictions are from the correct classification, and the optimizer tries to minimize this value. Because the output layer already applies `softmax`, the loss receives probabilities rather than raw scores, hence `from_logits=False`.

The `metrics=['accuracy']` argument means that the accuracy of the model is shown after each epoch.

In [None]:
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-4)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

model.compile(optimizer=optimizer,
              loss=loss,
              metrics=['accuracy'])
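
How `softmax` and a sparse categorical cross-entropy loss fit together can be sketched in plain NumPy. This is an illustration of the underlying math, not the Keras implementation. The function names and the raw scores are made up.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)  # for numerical stability
    exponentials = np.exp(shifted)
    return exponentials / np.sum(exponentials, axis=-1, keepdims=True)

def sparse_categorical_crossentropy(probabilities, label):
    """Loss for one sample: negative log of the probability given to the correct class."""
    return -np.log(probabilities[label])

# Hypothetical raw scores for the classes attacking, idling and walking
scores = np.array([2.0, 1.0, 0.1])
probabilities = softmax(scores)
predicted_class = int(np.argmax(probabilities))

# The loss is small when the correct class has a high probability and large otherwise
loss_if_attacking = sparse_categorical_crossentropy(probabilities, 0)
loss_if_walking = sparse_categorical_crossentropy(probabilities, 2)

print(probabilities, predicted_class)
print(loss_if_attacking, loss_if_walking)
```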

### 5.3. Model Summary
Print a summary of the model: its layers, their output shapes and the number of parameters in each layer.

In [None]:
model.summary()
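
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-neuron pair plus one bias per neuron. The sketch below assumes the layer sizes described above (256, 64 and 3 neurons on a flattened 256x256x1 input), and the variable names are illustrative.

```python
image_height, image_width, channels = 256, 256, 1
layer_units = [256, 64, 3]  # the two hidden Dense layers and the output layer

# Rescaling and Flatten have no trainable parameters; Flatten only sets the input size
inputs = image_height * image_width * channels

parameter_counts = []
for units in layer_units:
    parameter_counts.append(inputs * units + units)  # weights + biases
    inputs = units  # this layer's outputs are the next layer's inputs

total_parameters = sum(parameter_counts)
print(parameter_counts, total_parameters)
```

Almost all parameters sit in the first `Dense` layer, because every one of its 256 neurons is connected to all 65536 pixels.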

## 6. Training the Model
Train the model for 50 epochs, which means that the model trains on the entire dataset 50 times. The model is saved after every epoch using the `keras.callbacks.ModelCheckpoint()` callback.

In [74]:
from datetime import datetime

date_string = str(datetime.now()).replace(' ', '_').replace(':', '-')
training_path = pathlib.Path('training', date_string)
checkpoint_path = pathlib.Path(training_path, 'model.{epoch:02d}.hdf5')

checkpoint_callback = keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                      save_weights_only=False,
                                                      save_best_only=False,
                                                      verbose=1,
                                                      save_freq='epoch')

epochs = 50
training_history = model.fit(training_dataset,
                             validation_data=validation_dataset,
                             epochs=epochs,
                             callbacks=[checkpoint_callback])
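
The sketch below shows how the directory name and the checkpoint file names come about. The timestamp is a fixed, made-up value so that the result is reproducible. The `{epoch:02d}` placeholder is filled in with the epoch number, counted from 1.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical fixed timestamp instead of datetime.now()
timestamp = datetime(2024, 1, 31, 14, 5, 9, 123456)

# Spaces and colons are replaced because colons are not allowed in Windows file names
date_string = str(timestamp).replace(' ', '_').replace(':', '-')
checkpoint_template = str(Path('training', date_string, 'model.{epoch:02d}.hdf5'))

first_checkpoint = Path(checkpoint_template.format(epoch=1))
last_checkpoint = Path(checkpoint_template.format(epoch=50))

print(date_string)
print(first_checkpoint.name, last_checkpoint.name)
```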



## 7. Visualizing the Result
Plot the `loss` and `accuracy` from the training and validation.

In [None]:
training_accuracy = training_history.history['accuracy']
training_loss = training_history.history['loss']
validation_accuracy = training_history.history['val_accuracy']
validation_loss = training_history.history['val_loss']

x_epochs = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(x_epochs, training_accuracy, label='Training Accuracy')
plt.plot(x_epochs, validation_accuracy, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(x_epochs, training_loss, label='Training Loss')
plt.plot(x_epochs, validation_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.savefig(pathlib.Path('training', date_string, 'results-plot'))
plt.show()
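
A common sign of overfitting in such a plot is a validation loss that starts to rise while the training loss keeps falling. As a rough sketch, the epoch with the lowest validation loss can be located as shown below. The loss values are made up and merely stand in for `training_history.history`.

```python
# Hypothetical loss values per epoch, for illustration only
history = {
    'loss':     [0.90, 0.60, 0.40, 0.25, 0.15],
    'val_loss': [0.95, 0.70, 0.55, 0.60, 0.75],
}

validation_loss = history['val_loss']

# Index of the smallest validation loss; epochs are counted from 1
best_index = min(range(len(validation_loss)), key=validation_loss.__getitem__)
best_epoch = best_index + 1

print(f"Lowest validation loss {validation_loss[best_index]} at epoch {best_epoch}")
```

Since a checkpoint is saved after every epoch, the model file with the matching epoch number can then be picked instead of the last one.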

### 7.1. Saving the Model Descriptor
Saving the model descriptor as a JSON file helps with differentiating between models when multiple types have been trained.

In [None]:
import json

with open(pathlib.Path('training', date_string, 'descriptor.json'), "w") as json_file:
    obj = json.loads(model.to_json())
    
    optimizer_config = optimizer.get_config()
    if isinstance(optimizer_config['learning_rate'], np.floating):
        optimizer_config['learning_rate'] = float(optimizer_config['learning_rate'])
    obj['optimizer'] = optimizer_config
    
    obj['loss'] = loss.get_config()

    json.dump(obj, json_file, indent=4)
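
To compare several trained models, a saved descriptor can be condensed into a single line. The sketch below assumes the key layout written by the cell above (`config.layers[].class_name`, `optimizer` and `loss`). The function `summarize_descriptor` and the heavily trimmed example descriptor are hypothetical.

```python
def summarize_descriptor(descriptor):
    """Condense a model descriptor into one line: layers, optimizer and loss."""
    layers = [layer['class_name'] for layer in descriptor['config']['layers']]
    optimizer = descriptor['optimizer']
    return (f"{' > '.join(layers)} | "
            f"{optimizer['name']} (lr={optimizer['learning_rate']}) | "
            f"{descriptor['loss']['name']}")

# Hypothetical, heavily trimmed descriptor; a real file contains many more fields
example_descriptor = {
    'class_name': 'Sequential',
    'config': {'layers': [{'class_name': 'Rescaling'},
                          {'class_name': 'Flatten'},
                          {'class_name': 'Dense'},
                          {'class_name': 'Dense'},
                          {'class_name': 'Dense'}]},
    'optimizer': {'name': 'Adam', 'learning_rate': 0.0005},
    'loss': {'name': 'sparse_categorical_crossentropy'},
}

summary = summarize_descriptor(example_descriptor)
print(summary)
```

For a real file, the descriptor would first be read with `json.load()` from the `descriptor.json` inside the training directory.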