# SUMMARY

## Hyperparameter: 
Number of Training Images

## Chosen Values/Variants:
I considered two ways to vary the number of training images. The first was to keep the entire dataset and change only the proportion of test images. 
I decided against it because I wanted to keep the 80/20 (training/test) ratio across all variants. 
Instead, I kept the 80/20 (training/test) ratio fixed for each variant and varied the size of the dataset passed to the training function 
(the number of images to be processed). This makes the variants easier to compare.

I decided on the following 6 variants:

1. Variant: 500 images (400 training / 100 test)
2. Variant: 400 images (320 training / 80 test)
3. Variant: 300 images (240 training / 60 test)
4. Variant: 200 images (160 training / 40 test)
5. Variant: 100 images (80 training / 20 test)
6. Variant: using the complete dataset (about 630 images; 504 training / 126 test)

Keeping the 80/20 ratio and using equal increments of 100 images makes the variants directly comparable.
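
The split sizes listed above follow directly from the 80/20 ratio. A minimal sketch of that arithmetic is shown here; the helper name `split_sizes` is hypothetical and not part of the notebook, and rounding the test share up is an assumption that does not matter for these sizes because all of them divide evenly.

```python
import math


def split_sizes(total, test_percent=20):
    """Return (n_train, n_test) for a dataset of `total` images."""
    # assumption: a fractional test share would be rounded up
    n_test = math.ceil(total * test_percent / 100)
    return total - n_test, n_test


for total in (500, 400, 300, 200, 100, 630):
    n_train, n_test = split_sizes(total)
    print(f"{total} images -> {n_train} training / {n_test} test")
```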

## Assumption
I assumed that the prediction accuracy on the test dataset improves as the training dataset grows and that, at the same time, the inference time 
for predictions decreases because a better-trained model can make a prediction faster.

## Results

### 1. Variant: 500 images (400 training / 100 test)
loss: 0.0879 - accuracy: 0.9725 - val_loss: 0.2321 - val_accuracy: 0.9500 - lr: 2.0000e-04 (epoch 13/50)
inference time for predictions: 13ms/step (4 steps)

### 2. Variant: 400 images (320 training / 80 test)
loss: 0.0370 - accuracy: 0.9875 - val_loss: 0.2103 - val_accuracy: 0.9250 - lr: 1.0000e-04 (epoch 16/50)
inference time for predictions: 12ms/step (3 steps)

### 3. Variant: 300 images (240 training / 60 test)
loss: 0.0688 - accuracy: 0.9750 - val_loss: 0.1940 - val_accuracy: 0.9500 - lr: 2.0000e-04 (epoch 11/50)
inference time for predictions: 16ms/step (2 steps)

### 4. Variant: 200 images (160 training / 40 test)
loss: 0.0947 - accuracy: 0.9750 - val_loss: 0.1310 - val_accuracy: 0.9750 - lr: 1.0000e-04 (epoch 32/50)
inference time for predictions: 7ms/step (2 steps)

### 5. Variant: 100 images (80 training / 20 test)
loss: 0.5914 - accuracy: 0.7250 - val_loss: 0.4103 - val_accuracy: 0.9500 - lr: 2.0000e-04 (epoch 4/50)
inference time for predictions: 57ms/step (1 step)

### 6. Variant: using the complete dataset (504 training / 126 test)
loss: 0.0861 - accuracy: 0.9863 - val_loss: 0.2305 - val_accuracy: 0.9297 - lr: 1.0000e-04 (epoch 21/50)
inference time for predictions: 18ms/step (4 steps)
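
The `lr` values in the log lines are consistent with the `ReduceLROnPlateau` settings used in the notebook (`factor=0.2`, `min_lr=0.0001`), assuming training starts from Adam's default learning rate of 0.001. The following sketch only reproduces that arithmetic; the helper `reduced_learning_rates` is hypothetical and is not how Keras implements the callback.

```python
import math


def reduced_learning_rates(initial_lr, factor, min_lr, reductions):
    """Return the learning rate after 0..reductions plateau events."""
    rates = [initial_lr]
    for _ in range(reductions):
        # each plateau multiplies the rate by `factor`, but never below `min_lr`
        rates.append(max(rates[-1] * factor, min_lr))
    return rates


rates = reduced_learning_rates(initial_lr=1e-3, factor=0.2, min_lr=1e-4, reductions=2)
print(rates)
```

Under these assumptions, a run that reports `lr: 2.0000e-04` has hit one plateau, and a run that reports `lr: 1.0000e-04` has hit at least two and is clamped at `min_lr`.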

### Interpretation
In terms of accuracy, the variants with 500 to 300 images deliver results similar to the complete dataset (504 training images). Among these, 
the 400-image variant reached the highest training accuracy (0.9875) and the lowest training loss (0.0370), closely followed by the complete 
dataset (0.9863). Their inference times for predictions were between 12 and 18 ms per step. The variant with 160 training images also achieved 
a good accuracy (0.9750 for both training and validation) but needed 32 epochs for it. It had the lowest inference time for predictions at 7 ms per step. 
The variant with 80 training images had the lowest training accuracy (0.7250) and the highest training loss (0.5914), although its validation accuracy 
was 0.9500 on only 20 test images. Furthermore, it had the highest inference time for predictions at 57 ms per step.
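
The `ms/step` values above appear to come from the progress output of a single `predict` call, so they may include one-off setup cost and depend on how many steps were run. A minimal sketch of a more repeatable measurement is given below, assuming a warm-up call followed by several timed repetitions. The helper `mean_latency_ms` and the stand-in `dummy_predict` are hypothetical; in the notebook, `model.predict` and `X_test` would take their place.

```python
import time


def mean_latency_ms(predict_fn, batch, repeats=10, warmup=1):
    """Average wall-clock time of predict_fn(batch) in milliseconds."""
    # warm-up calls are not timed, so one-off setup cost is excluded
    for _ in range(warmup):
        predict_fn(batch)
    start = time.perf_counter()
    for _ in range(repeats):
        predict_fn(batch)
    elapsed = time.perf_counter() - start
    return elapsed / repeats * 1000.0


def dummy_predict(batch):
    # stand-in for model.predict so the sketch runs without a trained model
    return [sum(row) for row in batch]


latency = mean_latency_ms(dummy_predict, [[0.1, 0.2, 0.3]] * 32)
print(f"mean latency: {latency:.3f} ms per call")
```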

## Visualisation
The tools provided in the template were used to visualize and present the results of each variant. For better comparability and 
more detailed information, I also used TensorBoard. A 'logs' folder is created automatically, and the log files for each model variant are stored in it. 
With this data, the individual models can be inspected more closely and compared visually in TensorBoard.
TensorBoard can be opened either from the command line or with the line of code I added at the end of the notebook.

An explanation of TensorBoard can be found at: 
https://www.youtube.com/watch?v=BqgTU7_cBnk&t=470s [10.05.23] (Analyzing Models with TensorBoard - Deep Learning with Python, TensorFlow and Keras p.4)
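
Each training run writes its logs to its own sub-folder of 'logs', named after the dataset size and a timestamp. A minimal sketch of that naming scheme follows; the helper `run_log_dir` is hypothetical, and the pattern mirrors the one passed to the `TensorBoard` callback in the notebook.

```python
import os
import time


def run_log_dir(dataset_size, root="logs"):
    """Build a unique log directory for one training run."""
    # the timestamp keeps repeated runs of the same variant apart
    run_name = f"gesture_recognition_model-{dataset_size}-images-version-{time.time()}"
    return os.path.join(root, run_name)


print(run_log_dir(500))
```

Pointing TensorBoard at the parent `logs` folder should then list the runs of all variants side by side.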

## Further Information

To use the 5 variants in which the dataset size is manipulated without unnecessarily bloating the notebook, I combined the methods 
from the template into functions in a 'Pre-build' block, which I can then call with a dataset-size parameter.
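
As an illustration of this design, the variants could also be run in a loop instead of one cell per variant. This is only a sketch: `run_variants` and `fake_train` are hypothetical names, and `fake_train` stands in for the notebook's `train_model` so that the example runs without any data.

```python
VARIANT_SIZES = {
    "variant 1": 500,
    "variant 2": 400,
    "variant 3": 300,
    "variant 4": 200,
    "variant 5": 100,
}


def run_variants(train_fn, variant_sizes):
    """Call train_fn once per variant and collect the results by name."""
    results = {}
    for name, size in variant_sizes.items():
        results[name] = train_fn(size)
    return results


def fake_train(dataset_size):
    # stand-in for train_model: returns the 80/20 split sizes only
    n_test = dataset_size // 5
    return dataset_size - n_test, n_test


results = run_variants(fake_train, VARIANT_SIZES)
for name, (n_train, n_test) in results.items():
    print(name, n_train, n_test)
```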


# Imports

In [1]:
import cv2
import json
from matplotlib import pyplot as plt
import numpy as np
import os
import random
import time

# import a lot of things from keras:
# sequential model
from keras.models import Sequential

# layers
from keras.layers import Input, Dense, Dropout, Flatten, Conv2D, MaxPooling2D, RandomFlip, RandomRotation, RandomContrast, RandomBrightness

# loss function
from keras.losses import categorical_crossentropy

# callback functions
from keras.callbacks import ReduceLROnPlateau, EarlyStopping, TensorBoard

# convert data to categorical vector representation
from keras.utils import to_categorical

# nice progress bar for loading data
from tqdm.notebook import tqdm

# helper function for train/test split
from sklearn.model_selection import train_test_split

# import confusion matrix helper function
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# import pre-trained model
from keras.applications.vgg16 import VGG16

# include only those gestures
CONDITIONS = ['like', 'stop']

# image size
IMG_SIZE = 64
SIZE = (IMG_SIZE, IMG_SIZE)

# number of color channels we want to use
# set to 1 to convert to grayscale
# set to 3 to use color images
COLOR_CHANNELS = 3

# Variants

In [2]:
FIRST_VARIANT = 500
SECOND_VARIANT = 400
THIRD_VARIANT =  300
FOURTH_VARIANT = 200
FIFTH_VARIANT = 100
# SIXTH_VARIANT is the complete dataset (about 630 images)

# Pre-build

## helper function to load and parse annotations

In [3]:
annotations = dict()

for condition in CONDITIONS:
    with open(f'_annotations/{condition}.json') as f:
        annotations[condition] = json.load(f)

## helper function to pre-process images (color channel conversion and resizing)

In [4]:
def preprocess_image(img):
    if COLOR_CHANNELS == 1:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img_resized = cv2.resize(img, SIZE)
    return img_resized

## load images and annotations

In [5]:
images = [] # stores actual image data
labels = [] # stores labels (as integer - because this is what our network needs)
label_names = [] # maps label ints to their actual categories so we can understand predictions later

# loop over all conditions
# loop over all files in the condition's directory
# read the image and corresponding annotation
# crop image to the region of interest
# preprocess image
# store preprocessed image and label in corresponding lists
for condition in CONDITIONS:
    for filename in tqdm(os.listdir(condition)):
        # extract unique ID from file name
        UID = filename.split('.')[0]
        img = cv2.imread(f'{condition}/{filename}')
        
        # get annotation from the dict we loaded earlier
        try:
            annotation = annotations[condition][UID]
        except Exception as e:
            print(e)
            continue
        
        # iterate over all hands annotated in the image
        for i, bbox in enumerate(annotation['bboxes']):
            # annotated bounding boxes are in the range from 0 to 1
            # therefore we have to scale them to the image size
            x1 = int(bbox[0] * img.shape[1])
            y1 = int(bbox[1] * img.shape[0])
            w = int(bbox[2] * img.shape[1])
            h = int(bbox[3] * img.shape[0])
            x2 = x1 + w
            y2 = y1 + h
            
            # crop image to the bounding box and apply pre-processing
            crop = img[y1:y2, x1:x2]
            preprocessed = preprocess_image(crop)
            
            # get the annotated hand's label
            # if we have not seen this label yet, add it to the list of labels
            label = annotation['labels'][i]
            if label not in label_names:
                label_names.append(label)
            
            label_index = label_names.index(label)
            
            images.append(preprocessed)
            labels.append(label_index)
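
The bounding-box scaling in the loop above can be looked at in isolation. The sketch below assumes, as the loop does, that a box is given as normalized `(x, y, width, height)`; the helper `bbox_to_pixels` is a hypothetical name and not part of the notebook.

```python
def bbox_to_pixels(bbox, img_width, img_height):
    """Convert a normalized (x, y, w, h) box to pixel corners (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    # x and width scale with the image width, y and height with the image height
    x1 = int(x * img_width)
    y1 = int(y * img_height)
    x2 = x1 + int(w * img_width)
    y2 = y1 + int(h * img_height)
    return x1, y1, x2, y2


print(bbox_to_pixels((0.25, 0.5, 0.5, 0.25), img_width=200, img_height=100))
# → (50, 50, 150, 75)
```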


## Quick test if images are loaded

In [None]:
plt.imshow(random.sample(images, 1)[0])

## Builds the model 
Takes a dataset size parameter and returns the training history, the model, and the test data (`X_test`, `y_test`)

In [6]:
def train_model(dataset_size):

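    # note: the images were appended condition by condition, so this slice takes the first
    # dataset_size crops in loading order rather than a random sample of the whole dataset
    X_train, X_test, y_train, y_test = train_test_split(images[:dataset_size], labels[:dataset_size], test_size=0.2, random_state=42)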

    print("training and test data")
    print(len(X_train))
    print(len(X_test))
    print(len(y_train))
    print(len(y_test))

    X_train = np.array(X_train).astype('float32')
    X_train = X_train / 255.

    X_test = np.array(X_test).astype('float32')
    X_test = X_test / 255.

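    # use the number of known labels so the one-hot width matches the model's output layer
    y_train_one_hot = to_categorical(y_train, len(label_names))
    y_test_one_hot = to_categorical(y_test, len(label_names))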

    train_label = y_train_one_hot
    test_label = y_test_one_hot

    X_train = X_train.reshape(-1, IMG_SIZE, IMG_SIZE, COLOR_CHANNELS)
    X_test = X_test.reshape(-1, IMG_SIZE, IMG_SIZE, COLOR_CHANNELS)

    print("transformed data")
    print(X_train.shape, X_test.shape, train_label.shape, test_label.shape)

    # variables for hyperparameters
    batch_size = 8
    epochs = 50
    num_classes = len(label_names)
    activation = 'relu'
    activation_conv = 'LeakyReLU'  # LeakyReLU
    layer_count = 2
    num_neurons = 64

    # define model structure
    # with keras, we can use a model's add() function to add layers to the network one by one
    model = Sequential()

    # data augmentation (this can also be done beforehand - but don't augment the test dataset!)
    model.add(RandomFlip('horizontal'))
    model.add(RandomContrast(0.1))
    #model.add(RandomBrightness(0.1))
    #model.add(RandomRotation(0.2))

    # first, we add some convolution layers followed by max pooling
    model.add(Conv2D(64, kernel_size=(9, 9), activation=activation_conv, input_shape=(SIZE[0], SIZE[1], COLOR_CHANNELS), padding='same'))
    model.add(MaxPooling2D(pool_size=(4, 4), padding='same'))

    model.add(Conv2D(32, (5, 5), activation=activation_conv, padding='same'))
    model.add(MaxPooling2D(pool_size=(3, 3), padding='same'))

    model.add(Conv2D(32, (3, 3), activation=activation_conv, padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))

    # dropout layers randomly set a fraction of their inputs to zero during training - this helps to prevent overfitting
    model.add(Dropout(0.2))

    # after the convolution layers, we have to flatten the data so it can be fed into fully connected layers
    model.add(Flatten())

    # add some fully connected layers ("Dense")
    for i in range(layer_count - 1):
        model.add(Dense(num_neurons, activation=activation))

    model.add(Dense(num_neurons, activation=activation))

    # for classification, the last layer has to use the softmax activation function, which gives us probabilities for each category
    model.add(Dense(num_classes, activation='softmax'))

    # specify loss function, optimizer and evaluation metrics
    # for classification, categorical crossentropy is used as a loss function
    # use the adam optimizer unless you have a good reason not to
    model.compile(loss=categorical_crossentropy, optimizer="adam", metrics=['accuracy'])

    # define callback functions that react to the model's behavior during training
    # in this example, we reduce the learning rate once we get stuck and early stopping
    # to cancel the training if there are no improvements for a certain amount of epochs
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.0001)
    stop_early = EarlyStopping(monitor='val_loss', patience=3)
    tensorboard = TensorBoard(log_dir=f'logs/gesture_recognition_model-{dataset_size}-images-version-{time.time()}')

    history = model.fit(
        X_train,
        train_label,
        batch_size=batch_size,
        epochs=epochs,
        verbose=1,
        validation_data=(X_test, test_label),
        callbacks=[reduce_lr, stop_early, tensorboard]
    )

    return history, model, X_test, y_test
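
Because the images are appended condition by condition, taking the first `dataset_size` entries can yield a subset dominated by the first condition. One possible alternative, which is not what the notebook does, is to draw the subset at random with a fixed seed. A minimal sketch with toy data, using the hypothetical helper `sample_subset`:

```python
import random


def sample_subset(images, labels, dataset_size, seed=42):
    """Draw dataset_size image/label pairs at random, keeping pairs aligned."""
    if dataset_size > len(images):
        raise ValueError("dataset_size exceeds the number of available images")
    # a seeded generator makes the subset reproducible between runs
    rng = random.Random(seed)
    indices = rng.sample(range(len(images)), dataset_size)
    return [images[i] for i in indices], [labels[i] for i in indices]


# toy data: six "images" stored condition by condition
toy_images = ["like_0", "like_1", "like_2", "stop_0", "stop_1", "stop_2"]
toy_labels = [0, 0, 0, 1, 1, 1]

subset_images, subset_labels = sample_subset(toy_images, toy_labels, 4)
print(subset_images, subset_labels)
```

Switching to such a sampling step would change the data each variant sees, so the results reported in the summary would have to be re-measured.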



## Plots accuracy and loss of the training process and shows a summary of the model

In [7]:
def show_training_results(history, model):

    model.summary()

    loss = history.history['loss']
    val_loss = history.history['val_loss']
    accuracy = history.history['accuracy']
    val_accuracy = history.history['val_accuracy']

    fig = plt.figure(figsize=(15, 7))
    ax = plt.gca()

    ax.set_xlabel('Epoch')
    ax.set_ylabel('Accuracy (Line), Loss (Dashes)')

    ax.axhline(1, color='gray')

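    plt.plot(accuracy, color='blue', label='accuracy')
    plt.plot(val_accuracy, color='orange', label='val_accuracy')
    plt.plot(loss, '--', color='blue', alpha=0.5, label='loss')
    plt.plot(val_loss, '--', color='orange', alpha=0.5, label='val_loss')

    # the legend tells the four curves apart
    ax.legend()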

# 1. Variant: 500 images (400 training / 100 test)

In [None]:
history, model, X_test, y_test = train_model(FIRST_VARIANT)

In [None]:
show_training_results(history, model)

In [None]:
y_predictions = model.predict(X_test)

# 2. Variant: 400 images (320 training / 80 test)

In [None]:
history, model, X_test, y_test  = train_model(SECOND_VARIANT)

In [None]:
show_training_results(history, model)

In [None]:
y_predictions = model.predict(X_test)

# 3. Variant: 300 images (240 training / 60 test)

In [None]:
history, model, X_test, y_test  = train_model(THIRD_VARIANT)

In [None]:
show_training_results(history, model)

In [9]:
y_predictions = model.predict(X_test)



# 4. Variant: 200 images (160 training / 40 test)

In [None]:
history, model, X_test, y_test  = train_model(FOURTH_VARIANT)

In [None]:
show_training_results(history, model)

In [11]:
y_predictions = model.predict(X_test)



# 5. Variant: 100 images (80 training / 20 test)

In [None]:
history, model, X_test, y_test  = train_model(FIFTH_VARIANT)

In [None]:
show_training_results(history, model)

In [13]:
y_predictions = model.predict(X_test)



# 6. Variant: using the complete dataset (about 630 images; 504 training / 126 test)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=0.2, random_state=42)

print("training and test data")
print(len(X_train))
print(len(X_test))
print(len(y_train))
print(len(y_test))

X_train = np.array(X_train).astype('float32')
X_train = X_train / 255.

X_test = np.array(X_test).astype('float32')
X_test = X_test / 255.

y_train_one_hot = to_categorical(y_train)
y_test_one_hot = to_categorical(y_test)

train_label = y_train_one_hot
test_label = y_test_one_hot

X_train = X_train.reshape(-1, IMG_SIZE, IMG_SIZE, COLOR_CHANNELS)
X_test = X_test.reshape(-1, IMG_SIZE, IMG_SIZE, COLOR_CHANNELS)

print("transformed data")
print(X_train.shape, X_test.shape, train_label.shape, test_label.shape)

# variables for hyperparameters
batch_size = 8
epochs = 50
num_classes = len(label_names)
activation = 'relu'
activation_conv = 'LeakyReLU'  # LeakyReLU
layer_count = 2
num_neurons = 64

# define model structure
# with keras, we can use a model's add() function to add layers to the network one by one
model = Sequential()

# data augmentation (this can also be done beforehand - but don't augment the test dataset!)
model.add(RandomFlip('horizontal'))
model.add(RandomContrast(0.1))
#model.add(RandomBrightness(0.1))
#model.add(RandomRotation(0.2))

# first, we add some convolution layers followed by max pooling
model.add(Conv2D(64, kernel_size=(9, 9), activation=activation_conv, input_shape=(SIZE[0], SIZE[1], COLOR_CHANNELS), padding='same'))
model.add(MaxPooling2D(pool_size=(4, 4), padding='same'))

model.add(Conv2D(32, (5, 5), activation=activation_conv, padding='same'))
model.add(MaxPooling2D(pool_size=(3, 3), padding='same'))

model.add(Conv2D(32, (3, 3), activation=activation_conv, padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))

# dropout layers randomly set a fraction of their inputs to zero during training - this helps to prevent overfitting
model.add(Dropout(0.2))

# after the convolution layers, we have to flatten the data so it can be fed into fully connected layers
model.add(Flatten())

# add some fully connected layers ("Dense")
for i in range(layer_count - 1):
    model.add(Dense(num_neurons, activation=activation))

model.add(Dense(num_neurons, activation=activation))

# for classification, the last layer has to use the softmax activation function, which gives us probabilities for each category
model.add(Dense(num_classes, activation='softmax'))

# specify loss function, optimizer and evaluation metrics
# for classification, categorical crossentropy is used as a loss function
# use the adam optimizer unless you have a good reason not to
model.compile(loss=categorical_crossentropy, optimizer="adam", metrics=['accuracy'])

# define callback functions that react to the model's behavior during training
# in this example, we reduce the learning rate once we get stuck and early stopping
# to cancel the training if there are no improvements for a certain amount of epochs
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.0001)
stop_early = EarlyStopping(monitor='val_loss', patience=3)
tensorboard = TensorBoard(log_dir=f'logs/gesture_recognition_model-complete-dataset-version-{time.time()}')

history = model.fit(
    X_train,
    train_label,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(X_test, test_label),
    callbacks=[reduce_lr, stop_early, tensorboard]
)
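
The early-stopping rule used here (`monitor='val_loss'`, `patience=3`) explains why the runs end well before epoch 50. The sketch below is a simplified model of that rule and not the Keras implementation; the helper `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # improvement: remember it and reset the waiting counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    # no stop was triggered: all epochs are run
    return len(val_losses)


made_up_losses = [0.50, 0.40, 0.45, 0.41, 0.42, 0.30]
print(early_stopping_epoch(made_up_losses, patience=3))
# → 5
```

In this made-up run the best value is reached in epoch 2, the next three epochs bring no improvement, and training stops after epoch 5 even though epoch 6 would have been better.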

In [None]:
show_training_results(history, model)

## saving the model

The function creates a directory for your model and saves its structure and weights there.

Sometimes you will see the .h5 format being used. Even though it is a bit faster and needs less space, it comes with limitations and isn't used that much anymore.

In [None]:
model.save('gesture_recognition')

# and this is how you load the model
# from keras.models import load_model
# model = load_model("gesture_recognition")

## visualize classification results with a confusion matrix

In [15]:
y_predictions = model.predict(X_test)



## Confusion Matrix

In [None]:
# we get a 2D numpy array with probabilities for each category
print('probabilities', y_predictions)

# to build a confusion matrix, we have to convert it to classifications
# this can be done by using the argmax() function, which returns the index of the most probable category for each image
y_predictions = np.argmax(y_predictions, axis=1)

print('predicted classes', y_predictions)

# create and plot confusion matrix
conf_matrix = confusion_matrix(y_test, y_predictions)

fig = plt.figure(figsize=(10, 10))

ConfusionMatrixDisplay(conf_matrix, display_labels=label_names).plot(ax=plt.gca())

plt.xticks(rotation=90, ha='center')
pass
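
To make the two steps above concrete, here is a library-free sketch of what the argmax conversion and the confusion matrix count. It assumes the usual convention of rows for true classes and columns for predicted classes; the helpers `argmax_rows` and `confusion_counts` and the probabilities are made up for illustration.

```python
def argmax_rows(probabilities):
    """Return the index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def confusion_counts(y_true, y_pred, num_classes):
    """Count how often true class t (row) was predicted as class p (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


probs = [
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
y_true = [1, 0, 2, 1]

y_pred = argmax_rows(probs)
print(y_pred)
# → [1, 0, 2, 0]
print(confusion_counts(y_true, y_pred, num_classes=3))
# → [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
```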

# Show TensorBoard

In [None]:
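# the notebook extension has to be loaded once before the magic can be used
%load_ext tensorboard
# the callbacks above write their logs to sub-folders of 'logs'
%tensorboard --logdir logs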