# Workpackage: Data engineering

Research Question:
**"How far can we simplify the input data and still be able to distinguish between Rock, Paper, and Scissors?"**

Research Answer:
We can simplify the input pictures by converting them to greyscale and reducing the resolution. Both steps can be applied without losing much of the information needed to recognise the gesture. Additionally, we can blur the images to remove details and keep only rough shapes, and then use a segmentation method such as Otsu thresholding to obtain a binary image containing the shape of the hand. On simple, clear input images this works so well that the gestures could be distinguished by calculating histograms, without any machine learning at all. The drawback is that the result relies heavily on the segmentation method, which therefore needs to be chosen carefully. Another problem is the segmentation of more complex data: there, the basic methods proved very inaccurate and in part not useful at all. This could be because the implemented Otsu method is a global threshold segmentation method, which is not suited to this use case. With a good segmentation method for this use case, one could also cut out the background, as we implemented, so that the ML algorithm only has to distinguish between face and hand in case the segmentation method lacks that capability.
**All in all, the simplest image we were able to generate that could still support robust training is the blurred greyscale image with reduced resolution. Further evaluation is needed to determine whether the blurring really benefits the training.**
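
The simplification steps described above can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not the code we ran: the helper names (`to_greyscale`, `downscale`, `box_blur`, `otsu_threshold`, `simplify`) are hypothetical, block averaging and a 3x3 mean filter stand in for whichever resizing and blurring are actually used, and a single global Otsu threshold replaces the `threshold_multiotsu` call from the code cell that follows.

```python
import numpy as np


def to_greyscale(rgb):
    """Collapse an (H, W, 3) RGB array to (H, W) using ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights


def downscale(grey, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    h, w = grey.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks fit exactly
    blocks = grey[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def box_blur(grey):
    """Blur with a 3x3 mean filter; borders are handled by repeating edge pixels."""
    padded = np.pad(grey, 1, mode="edge")
    h, w = grey.shape
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.mean(shifted, axis=0)


def otsu_threshold(grey):
    """Return the global threshold that maximises the between-class variance.

    Expects integer grey values in 0..255; pixels > threshold count as bright.
    """
    hist, _ = np.histogram(grey, bins=256, range=(0, 256))
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                  # weight of the dark class per cut
    w1 = 1.0 - w0                         # weight of the bright class per cut
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    valid = (w0 > 0) & (w1 > 0)           # skip cuts that leave one class empty
    between = np.zeros(256)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(between))


def simplify(rgb, factor=4):
    """Greyscale -> lower resolution -> blur -> binary mask."""
    grey = box_blur(downscale(to_greyscale(rgb), factor))
    grey = np.clip(np.rint(grey), 0, 255).astype(np.uint8)
    return grey > otsu_threshold(grey)


# Synthetic example: dark left half, bright right half.
demo = np.zeros((16, 16, 3), dtype=np.uint8)
demo[:, :8] = 20
demo[:, 8:] = 200
print(simplify(demo, factor=4).astype(int))
```

Because the threshold is global, a bright background region would end up in the same class as the hand, which matches the limitation noted above for more complex images.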

In [1]:
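import multiprocessing
import os

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from skimage.filters import threshold_multiotsu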

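def multi_otsu(output_path, img):
    # Segment a greyscale image into intensity classes and save the result.
    matplotlib.rcParams['font.size'] = 9
    image = np.asarray(img)
    # By default threshold_multiotsu returns two thresholds (three classes).
    thresholds = threshold_multiotsu(image)
    regions = np.digitize(image, bins=thresholds)
    plt.imshow(regions, cmap='gray')
    plt.axis('off')
    plt.savefig(output_path, bbox_inches='tight', pad_inches=0)
    # Close the figure so plots do not pile up in a reused worker process.
    plt.close()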


def process_file(filename, input_dir, output_dir):
    # Load one image as 8-bit greyscale ('L') and write its segmented version.
    path = os.path.join(input_dir, filename)
    image = Image.open(path).convert('L')
    output_path = os.path.join(output_dir, filename)
    multi_otsu(output_path, image)

def main():
    input_dir = '../Dataset/validation_set/scissors'
    output_dir = '../Dataset/validation_otsu/scissors'
    # savefig does not create missing directories.
    os.makedirs(output_dir, exist_ok=True)
    filenames = [filename for filename in os.listdir(input_dir) if filename.endswith(".png")]

    # Process the files in parallel, one worker per CPU core.
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        pool.starmap(process_file, [(filename, input_dir, output_dir) for filename in filenames])

main()

# Workpackage: Model Engineering

Research Question:
**Can we get a more robust system by splitting the image recognition into more specific subtasks, i.e. by using transfer learning?**

Research Answer:
The motivation for using transfer learning is that we noticed that the training dataset matches neither our validation dataset nor our custom dataset, which results in poor performance. The problem is that the validation and test datasets are very diverse, while our training dataset consists of many very similar photos. One option would be to create our own training dataset, but this would be time-consuming and would run into a similar problem, because an unknown dataset would again be different. Our approach to solving this is to use a pretrained model that segments hands from everything else in the background, thereby making the pictures in the different datasets more similar to each other. Our hypothesis is that this should improve the performance noticeably. Unfortunately, we ran out of time while searching for such a pretrained model. The ones we found did not deliver what they promised and were not robust at all, and the others we would have needed to train ourselves, which would have required days and a lot of energy.
**In summary, we tried to use transfer learning to improve the performance, but had to stop because we could not find a suitable pretrained model.**
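
The intended use of such a segmentation model can be sketched as follows. `fake_segmenter` is only a placeholder for the pretrained hand segmentation model we did not find; both it and `cut_out_background` are hypothetical names, and the brightness rule inside the placeholder is an assumption made purely for illustration.

```python
import numpy as np


def cut_out_background(grey, hand_mask, fill_value=0):
    """Keep pixels where hand_mask is True and overwrite the rest with fill_value."""
    grey = np.asarray(grey)
    hand_mask = np.asarray(hand_mask, dtype=bool)
    if grey.shape != hand_mask.shape:
        raise ValueError("image and mask must have the same shape")
    return np.where(hand_mask, grey, fill_value)


def fake_segmenter(grey):
    """Placeholder for a pretrained hand segmenter: marks bright pixels as hand."""
    return np.asarray(grey) > 128


# Tiny greyscale example: the bright pixels play the role of the hand.
image = np.array([[30, 200, 210],
                  [25, 220, 40],
                  [20, 35, 30]], dtype=np.uint8)
masked = cut_out_background(image, fake_segmenter(image))
print(masked)
```

After this step every image has the same uniform background, which is the property we hoped would make the training, validation and test datasets more similar to each other.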

In [1]:
import multiprocessing
import os
import random

import keras
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from PIL import Image
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

def load_images(train_path, val_path):

    train_ds = tf.keras.utils.image_dataset_from_directory(
        train_path,
        validation_split=None,
        subset=None,
        seed=SEED,
        shuffle=True,
        color_mode='grayscale',
        image_size=IMAGE_SHAPE,
        batch_size=BATCH_SIZE)

    val_ds = tf.keras.utils.image_dataset_from_directory(
        val_path,
        validation_split=None,
        subset=None,
        seed=SEED,
        shuffle=True,
        color_mode='grayscale',
        image_size=IMAGE_SHAPE,
        batch_size=BATCH_SIZE)

    class_names = train_ds.class_names
    print(class_names)
    return train_ds, val_ds


def train_model(train_ds, val_ds):

    data_augmentation = keras.Sequential(
        [
            layers.RandomFlip("horizontal",
                              input_shape=(HEIGHT,
                                           WIDTH,
                                           1)),
            layers.RandomRotation(0.1),
            layers.RandomZoom(0.1),
        ]
    )
    num_classes = 3

    model = Sequential([
        data_augmentation,
        layers.Rescaling(1. / 255),
        layers.Conv2D(16, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(num_classes, name="outputs")
    ])

    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=EPOCHS
    )


    version = 1
    while True:
        model_file = "models/model_{}.h5".format(version)
        if not os.path.exists(model_file):
            break
        version += 1

    model.save(model_file)
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']

    loss = history.history['loss']
    val_loss = history.history['val_loss']

    epochs_range = range(EPOCHS)

    plt.figure(figsize=(8, 8))
    plt.subplot(1, 2, 1)
    plt.plot(epochs_range, acc, label='Training Accuracy')
    plt.plot(epochs_range, val_acc, label='Validation Accuracy')
    plt.legend(loc='lower right')
    plt.title('Training and Validation Accuracy')

    plt.subplot(1, 2, 2)
    plt.plot(epochs_range, loss, label='Training Loss')
    plt.plot(epochs_range, val_loss, label='Validation Loss')
    plt.legend(loc='upper right')
    plt.title('Training and Validation Loss')
    plt.show()


BATCH_SIZE = 32
EPOCHS = 15
HEIGHT = 60
WIDTH = 60
IMAGE_SHAPE = (HEIGHT, WIDTH)
SEED = 42

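def train_and_save_model(seed):
    # Seed Python, NumPy and TensorFlow so that each worker trains
    # reproducibly but with a different initialisation.
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    train, val = load_images('../Dataset/multi_otsu', '../Dataset/validation_otsu')
    train_model(train, val)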


with multiprocessing.Pool(processes=os.cpu_count()) as pool:
    pool.map(train_and_save_model, range(123, 123 * 11, 123))

# Workpackage: Model Evaluation

Research Question:
**What is the effect of using different evaluation metrics on the appearance of the model performance?**

Research Answer:


In [16]:
import os
from math import sqrt

import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
from tensorflow import keras
from sklearn.metrics import confusion_matrix, accuracy_score, precision_recall_fscore_support


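def get_models(model_path):
    # os.listdir returns files in arbitrary order. Sort by the version number,
    # assuming the "model_<n>.h5" naming used when saving, so that list
    # positions line up with the file names reported later.
    filenames = [f for f in os.listdir(model_path) if f.endswith(".h5")]
    filenames.sort(key=lambda f: int(os.path.splitext(f)[0].split("_")[-1]))
    models = []
    for filename in filenames:
        model = keras.models.load_model(os.path.join(model_path, filename))
        models.append(model)
    return models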


def predict_images(model, image_path):
    test_data = tf.keras.utils.image_dataset_from_directory(
        image_path,
        validation_split=None,
        subset=None,
        shuffle=False,
        color_mode='grayscale',
        image_size=IMAGE_SHAPE,
        batch_size=BATCH_SIZE
    )

    predictions = model.predict(test_data)
    return tf.argmax(predictions, axis=1)


def predict_ensemble(ensemble_indexes, predictions, weights):
    pred = []
    for i in range(len(predictions[0])):
        weighted_votes = np.zeros(np.max(predictions) + 1)
        for j, instrument in enumerate(ensemble_indexes):
            instrument_prediction = predictions[instrument][i]
            weighted_votes[instrument_prediction] += weights[j]
        pred.append(np.argmax(weighted_votes))
    return pred

def plotMetrics(truth, pred):
    # Accept lists as well as arrays (the ensemble prediction is a plain list).
    truth = np.asarray(truth)
    pred = np.asarray(pred)
    # One-vs-rest labels; the *_rock, *_paper and *_scissors suffixes
    # denote class indices 0, 1 and 2 respectively.
    true_rock, pred_rock = (truth == 0).astype(int), (pred == 0).astype(int)
    true_paper, pred_paper = (truth == 1).astype(int), (pred == 1).astype(int)
    true_scissors, pred_scissors = (truth == 2).astype(int), (pred == 2).astype(int)
    matrix = confusion_matrix(y_true=truth,y_pred=pred)

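    # tp, fp, fn, tn for each class.
    # sklearn's confusion_matrix has true labels as rows and predictions as columns.
    # The *_rock, *_paper and *_scissors suffixes denote class indices 0, 1 and 2.
    tp_rock = matrix[0, 0]
    tp_paper = matrix[1, 1]
    tp_scissors = matrix[2, 2]
    tp_total = tp_rock + tp_paper + tp_scissors

    # False positives: samples of other classes predicted as this class (column).
    fp_rock = matrix[1, 0] + matrix[2, 0]
    fp_paper = matrix[0, 1] + matrix[2, 1]
    fp_scissors = matrix[0, 2] + matrix[1, 2]

    # False negatives: samples of this class predicted as another class (row).
    fn_rock = matrix[0, 1] + matrix[0, 2]
    fn_paper = matrix[1, 0] + matrix[1, 2]
    fn_scissors = matrix[2, 0] + matrix[2, 1]

    # True negatives: everything outside the row and column of the class.
    tn_rock = np.sum(matrix) - (tp_rock + fp_rock + fn_rock)
    tn_paper = np.sum(matrix) - (tp_paper + fp_paper + fn_paper)
    tn_scissors = np.sum(matrix) - (tp_scissors + fp_scissors + fn_scissors)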

    #confusion matrix for each class
    confusion_matrix_rock = np.array([[tn_rock, fp_rock], [fn_rock, tp_rock]])
    confusion_matrix_paper = np.array([[tn_paper, fp_paper], [fn_paper, tp_paper]])
    confusion_matrix_scissors = np.array([[tn_scissors, fp_scissors], [fn_scissors, tp_scissors]])

    # Class indices follow the alphabetical directory order: 0 = paper, 1 = rock, 2 = scissors.
    labels = ['paper', 'rock', 'scissors']


    fig, ax = plt.subplots()

    image = ax.imshow(matrix, cmap='Blues')
    for i in range(3):
        for j in range(3):
            ax.text(j, i, matrix[i][j], ha='center', va='center', color='black')

    ax.set_xticks(range(3))
    ax.set_yticks(range(3))
    ax.set_xticklabels(labels)
    ax.set_yticklabels(labels)
    ax.set_title('Confusion Matrix')
    fig.colorbar(image)
    plt.show()

    accuracy_rock = accuracy_score(y_true=true_rock,y_pred=pred_rock)
    accuracy_paper = accuracy_score(y_true=true_paper,y_pred=pred_paper)
    accuracy_scissors = accuracy_score(y_true=true_scissors,y_pred=pred_scissors)
    accuracy_total = accuracy_score(y_true=truth,y_pred=pred)
    accuracy_array = [accuracy_rock,accuracy_paper,accuracy_scissors]
    print(accuracy_rock)
    print(accuracy_paper)
    print(accuracy_scissors)
    print(accuracy_total)
    precision_array, recall_array, f1_array, _ = precision_recall_fscore_support(y_true=truth,y_pred=pred)
    precision_rock = precision_array[0]
    precision_paper = precision_array[1]
    precision_scissors = precision_array[2]

    recall_rock = recall_array[0]
    recall_paper = recall_array[1]
    recall_scissors = recall_array[2]

    f1_rock = f1_array[0]
    f1_paper = f1_array[1]
    f1_scissors = f1_array[2]

    mcc_rock = calculate_mcc(tp_rock,tn_rock,fp_rock,fn_rock)
    mcc_paper = calculate_mcc(tp_paper,tn_paper,fp_paper,fn_paper)
    mcc_scissors = calculate_mcc(tp_scissors,tn_scissors,fp_scissors,fn_scissors)
    mcc_array = [mcc_rock,mcc_paper,mcc_scissors]

    fig, ax = plt.subplots(figsize=(15,5))
    x = np.arange(3)
    ax.bar(x - 0.15, accuracy_array, 0.15, label='Accuracy',color='#FFFB7A')
    ax.bar(x, precision_array, 0.15, label='Precision',color='#65EBBB')
    ax.bar(x + 0.15, f1_array, 0.15, label='F1 score',color='#E365EB')
    ax.bar(x+0.3, recall_array, 0.15, label='Recall',color='#7398FF')
    ax.bar(x + 0.45, mcc_array, 0.15, label='MCC',color='#FF9966')
    ax.set_xticks(x)
    ax.set_xticklabels(['paper', 'rock', 'scissors'])
    ax.set_ylabel('Score')
    ax.legend()
    plt.show()


def draw_graphs(matrix):
    # Class indices follow the alphabetical directory order: 0 = paper, 1 = rock, 2 = scissors.
    labels = ['paper', 'rock', 'scissors']

    fig, ax = plt.subplots()

    image = ax.imshow(matrix, cmap='Blues')
    for i in range(3):
        for j in range(3):
            ax.text(j, i, matrix[i][j], ha='center', va='center', color='black')

    ax.set_xticks(range(3))
    ax.set_yticks(range(3))
    ax.set_xticklabels(labels)
    ax.set_yticklabels(labels)
    ax.set_title('Confusion Matrix')
    fig.colorbar(image)
    plt.show()


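def calculate_mcc(tp, tn, fp, fn):
    # Cast to float: NumPy integers do not raise on division by zero
    # and could overflow in the product below.
    tp, tn, fp, fn = float(tp), float(tn), float(fp), float(fn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the 2x2 matrix is empty; use 0.
    if denominator == 0:
        return 0
    return (tp * tn - fp * fn) / denominator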

def get_best_model(mcc_total,accuracy_total,f1_total):
    max_mcc_value = np.amax(mcc_total)
    max_mcc_index = mcc_total.index(max_mcc_value)
    print("the best mcc is: " + str(max_mcc_value) + " on model_" + str(max_mcc_index + 1) + ".h5")
    max_acc_value = np.amax(accuracy_total)
    max_acc_index = accuracy_total.index(max_acc_value)
    print("the best accuracy is: " + str(max_acc_value) + " on model_" + str(max_acc_index + 1) + ".h5")
    max_f1_value = np.amax(f1_total)
    max_f1_index = f1_total.index(max_f1_value)
    print("the best f1 is: " + str(max_f1_value) + " on model_" + str(max_f1_index + 1) + ".h5")
    return max_mcc_index

def get_best_ensemble(mcc_total):
    # Indices of the five models with the highest mean MCC, in ascending order.
    # argsort yields distinct indices even when several models share a score,
    # whereas list.index would return the same index for duplicate values.
    return np.argsort(mcc_total)[-5:].tolist()

def get_matrix_values(truth, prediction):
    matrix = confusion_matrix(y_true=truth, y_pred=prediction)

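    # tp, fp, fn, tn for each class.
    # sklearn's confusion_matrix has true labels as rows and predictions as columns.
    # The *_rock, *_paper and *_scissors suffixes denote class indices 0, 1 and 2.
    tp_rock = matrix[0, 0]
    tp_paper = matrix[1, 1]
    tp_scissors = matrix[2, 2]
    tp_total = tp_rock + tp_paper + tp_scissors

    # False positives: samples of other classes predicted as this class (column).
    fp_rock = matrix[1, 0] + matrix[2, 0]
    fp_paper = matrix[0, 1] + matrix[2, 1]
    fp_scissors = matrix[0, 2] + matrix[1, 2]

    # False negatives: samples of this class predicted as another class (row).
    fn_rock = matrix[0, 1] + matrix[0, 2]
    fn_paper = matrix[1, 0] + matrix[1, 2]
    fn_scissors = matrix[2, 0] + matrix[2, 1]

    # True negatives: everything outside the row and column of the class.
    tn_rock = np.sum(matrix) - (tp_rock + fp_rock + fn_rock)
    tn_paper = np.sum(matrix) - (tp_paper + fp_paper + fn_paper)
    tn_scissors = np.sum(matrix) - (tp_scissors + fp_scissors + fn_scissors)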

    # confusion matrix for each class
    confusion_matrix_rock = np.array([[tn_rock, fp_rock], [fn_rock, tp_rock]])
    confusion_matrix_paper = np.array([[tn_paper, fp_paper], [fn_paper, tp_paper]])
    confusion_matrix_scissors = np.array([[tn_scissors, fp_scissors], [fn_scissors, tp_scissors]])

    return matrix, [tp_rock,tp_paper,tp_scissors],[tn_rock,tn_paper,tn_scissors],[fp_rock,fp_paper,fp_scissors],[fn_rock,fn_paper,fn_scissors]


def calculate_best_model():
    num_paper = len([f for f in os.listdir(TEST_DATA_PATH+'/paper') if os.path.isfile(os.path.join(TEST_DATA_PATH+'/paper', f))])
    num_rock = len([f for f in os.listdir(TEST_DATA_PATH + '/rock') if
                     os.path.isfile(os.path.join(TEST_DATA_PATH + '/rock', f))])
    num_scissors = len([f for f in os.listdir(TEST_DATA_PATH + '/scissors') if
                     os.path.isfile(os.path.join(TEST_DATA_PATH + '/scissors', f))])
    truth = np.concatenate([np.full((num_paper),0),np.full((num_rock),1),np.full((num_scissors-1),2)])
    models = get_models('models/otsu_models60x60')
    predictions = []
    mcc_rock = []
    mcc_paper = []
    mcc_scissors = []
    mcc_total = []
    f1_rock = []
    f1_paper = []
    f1_scissors = []
    f1_total = []
    precision_rock = []
    precision_paper = []
    precision_scissors = []
    recall_rock = []
    recall_paper = []
    recall_scissors = []
    accuracy_rock = []
    accuracy_paper = []
    accuracy_scissors = []
    accuracy_total = []
    for model in models:
        prediction = predict_images(model, TEST_DATA_PATH)
        predictions.append(prediction)
        matrix, tp, tn, fp, fn = get_matrix_values(truth, prediction)
        mcc_rock.append(calculate_mcc(tp=tp[0], tn=tn[0], fp=fp[0], fn=fn[0]))
        mcc_paper.append(calculate_mcc(tp=tp[1], tn=tn[1], fp=fp[1], fn=fn[1]))
        mcc_scissors.append(calculate_mcc(tp=tp[2], tn=tn[2], fp=fp[2], fn=fn[2]))
        accuracy_total.append(accuracy_score(y_true=truth, y_pred=prediction))
        precision_array, recall_array, f1_array, _ = precision_recall_fscore_support(y_true=truth, y_pred=prediction)
        precision_rock.append(precision_array[0])
        precision_paper.append(precision_array[1])
        precision_scissors.append(precision_array[2])

        recall_rock.append(recall_array[0])
        recall_paper.append(recall_array[1])
        recall_scissors.append(recall_array[2])

        f1_rock.append(f1_array[0])
        f1_paper.append(f1_array[1])
        f1_scissors.append(f1_array[2])

    for i in range(0, len(mcc_rock)):
        mcc_total.append((mcc_rock[i] + mcc_paper[i] + mcc_scissors[i])/3)
        f1_total.append((f1_rock[i] + f1_paper[i] + f1_scissors[i])/3)

    best_index = get_best_model(mcc_total, accuracy_total, f1_total)
    plotMetrics(truth, predictions[best_index].numpy())

    best_ensemble = get_best_ensemble(mcc_total)
    total = 0
    weights = []
    for instrument in best_ensemble:
        total += mcc_total[instrument]
    # Weight each ensemble member by its own share of the summed MCC.
    for instrument in best_ensemble:
        weights.append(mcc_total[instrument]/total)
    ensemble_prediction = predict_ensemble(best_ensemble, predictions, weights)
    plotMetrics(truth, ensemble_prediction)
    matrix, tp, tn, fp, fn = get_matrix_values(truth, ensemble_prediction)
    mcc_rock_ensemble =calculate_mcc(tp=tp[0], tn=tn[0], fp=fp[0], fn=fn[0])
    mcc_paper_ensemble = calculate_mcc(tp=tp[1], tn=tn[1], fp=fp[1], fn=fn[1])
    mcc_scissors_ensemble = calculate_mcc(tp=tp[2], tn=tn[2], fp=fp[2], fn=fn[2])
    mcc_total_ensemble =(mcc_rock_ensemble + mcc_paper_ensemble + mcc_scissors_ensemble) / 3
    print("MCC ensemble: "+str(mcc_total_ensemble))



IMAGE_SHAPE = (60, 60)
BATCH_SIZE = 32
TEST_DATA_PATH = '../Dataset/testing_otsu'

calculate_best_model()
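
To illustrate how the per-class counts and metrics used above relate to a confusion matrix, here is a small sketch with a made-up matrix (not a result from this project). It assumes scikit-learn's convention of true labels as rows and predictions as columns; `one_vs_rest_counts` and `mcc` are hypothetical helper names.

```python
import numpy as np


def one_vs_rest_counts(matrix):
    """Per-class TP, FP, FN, TN from a confusion matrix (rows = truth, columns = prediction)."""
    matrix = np.asarray(matrix)
    tp = np.diag(matrix)
    fp = matrix.sum(axis=0) - tp   # column total minus the diagonal
    fn = matrix.sum(axis=1) - tp   # row total minus the diagonal
    tn = matrix.sum() - tp - fp - fn
    return tp, fp, fn, tn


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient per class; 0 where it is undefined."""
    tp, fp, fn, tn = (np.asarray(v, dtype=float) for v in (tp, fp, fn, tn))
    denominator = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    safe = np.where(denominator == 0, 1.0, denominator)  # avoid dividing by zero
    return np.where(denominator == 0, 0.0, (tp * tn - fp * fn) / safe)


# Made-up confusion matrix for three classes, 10 samples per class.
matrix = np.array([[8, 1, 1],
                   [2, 7, 1],
                   [0, 2, 8]])
tp, fp, fn, tn = one_vs_rest_counts(matrix)
accuracy = (tp + tn) / matrix.sum()   # one-vs-rest accuracy per class
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("accuracy ", np.round(accuracy, 3))
print("precision", np.round(precision, 3))
print("recall   ", np.round(recall, 3))
print("f1       ", np.round(f1, 3))
print("mcc      ", np.round(mcc(tp, fp, fn, tn), 3))
```

For this toy matrix the one-vs-rest accuracy is higher than the MCC for every class, because accuracy also rewards the many true negatives that one-vs-rest counting produces.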