# Sequence-to-Sequence model

This Jupyter notebook trains a sequence-to-sequence model that translates Hong Kong Sign Language (HKSL) videos into a list of glosses.

## To Readers:
- First create a virtual environment from the `requirements.txt` in this directory.
- Feature extraction takes a VERY long time. If you have enough time to replicate this, you may either:
    - Set `CACHE_BATCH` and `USE_CACHE_BATCH` to `True` in the following cell, and this notebook will generate and cache all the keypoints; or
    - Set `USE_CACHE_BATCH` to `False` in the following cell, and features will be extracted on the fly (takes very long!); or
    - (CPU) Run everything before `Feature Extraction` in this notebook, then change the settings at the top of `keypoint-gen-cpu.py` and run the script; or
    - (GPU) Run everything before `Feature Extraction` in this notebook, then change the settings at the top of `keypoint-gen-gpu.py` and run the script **on Ubuntu with a GPU**. **Windows, macOS, WSL Ubuntu, and other Linux distributions are NOT supported.**
        - Since Holistic does not support GPU, this script uses the latest face, pose and hand landmarker solutions, which include 10 more face keypoints for the iris.
        - Since the face contours do not use these 10 keypoints, they are all filtered out during generation.
        - Thus, generated keypoints are also compatible with the intended shape of the model.
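
The iris filtering described for the GPU script can be pictured with a minimal sketch. It assumes the 10 iris landmarks are appended after the 468 mesh landmarks; that ordering, the constant names and `drop_iris` are illustrative assumptions and are not taken from `keypoint-gen-gpu.py`.

```python
# Assumed layout (not taken from keypoint-gen-gpu.py): 468 mesh landmarks
# followed by 10 iris landmarks, each stored as an (x, y, z) tuple.
MESH_LANDMARKS = 468
IRIS_LANDMARKS = 10


def drop_iris(face_landmarks):
    """Keep only the first 468 landmarks so the shape matches Holistic's face output."""
    expected = MESH_LANDMARKS + IRIS_LANDMARKS
    if len(face_landmarks) != expected:
        raise ValueError(f"expected {expected} landmarks, got {len(face_landmarks)}")
    return face_landmarks[:MESH_LANDMARKS]


# Dummy landmarks standing in for one frame of landmarker output.
dummy_face = [(float(i), 0.0, 0.0) for i in range(MESH_LANDMARKS + IRIS_LANDMARKS)]
filtered = drop_iris(dummy_face)
flat_length = len(filtered) * 3  # x, y, z per landmark
```

After filtering, the flattened face vector has the same length (`468 * 3`) as the one produced from Holistic in this notebook.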

## Notes
- A batch size of 32 is a **HARDWARE LIMIT**. On the project machines, a tensor with a batch size of 64 cannot be created.
- The cache should NOT be put on the mounted Windows drive. Otherwise, reading the keypoint files takes an eternity.
    - Yes, I know it takes up C drive space, but I have no choice. I'm not going to accept 3 hours per epoch.

List of trials (format: batchSize_Epoch_LatentDim):
- {empty}: trained using 32_256_512, without weighting
- weighted_32_256_512: trained using inverse frequency normalized by max (outside of the notebook, using a script)
- weighted_32_256_512_2: trained using inverse frequency normalized by max
- weighted_32_256_512_3: trained using tf-idf
    - result: improved categorical accuracy to 0.8730
    - extended to 512 epochs: even better accuracy
- weighted_32_512_1024_1: **we're going big this time :)**. See if accuracy improves this time
    - categorical_accuracy: 0.9268
    - Total raw accuracy on testing data: 0.018029056537720987
    - Average raw accuracy on testing data (method 2): 0.024452341305154224
    - BLEU Score: 8.933399549602846e-232
- weighted_32_512_1024_2: Same as above, but does not use weighting
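
The two weighting schemes named in the trial list can be compared on a toy frequency table. The sketch below uses only the standard library; the token names and counts are made up, and the tf-idf variant follows the formula in this notebook's weighting cell (term frequency times the log of vocabulary size over term frequency), which is assumed to be what the tf-idf trials used.

```python
import math

# Hypothetical token counts, for illustration only.
toy_token_freq = {"A": 6, "B": 2, "C": 1}
total = sum(toy_token_freq.values())
vocab_size = len(toy_token_freq)

# Inverse frequency, normalized so that the rarest token gets weight 1.
inverse = {tok: 1 / count for tok, count in toy_token_freq.items()}
max_inverse = max(inverse.values())
inverse_weights = {tok: w / max_inverse for tok, w in inverse.items()}

# tf-idf variant: tf * log(vocab_size / tf), with tf = count / total.
tfidf_weights = {}
for tok, count in toy_token_freq.items():
    tf = count / total
    tfidf_weights[tok] = tf * math.log(vocab_size / tf)

for tok in toy_token_freq:
    print(tok, round(inverse_weights[tok], 3), round(tfidf_weights[tok], 3))
```

On this toy table the inverse-frequency weights favor the rare token `C`, while the tf-idf variant gives the largest weight to the frequent token `A`, so the two schemes are not interchangeable.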

In [None]:
MODE = "train" # train | dev

# set to True if you want to cache Y values extracted from the split file
CACHE_Y = True

# set to True if you want to pre-generate and cache the batched data for training. 
CACHE_BATCH = False

# set to True to train from the cached batch data; every epoch then trains on the same data
# set to False to train on freshly transformed data every epoch; model fitting then reads its input from a generator instead
USE_CACHE_BATCH = True

# set to True if you want to apply weighting while generating
GENERATE_WEIGHT = False

# set to True if you want to apply weighting to cache data. use if you have generated non-weighted data
USE_WEIGHT = False

# set to True if you do not want to use transformation. applies to cache data only
NO_TRANSFORM = False

# set which RNN model to use
RNN_MODE = "LSTM" # LSTM | GRU

# model parameters config
BATCH_SIZE = 32
EPOCH = 512
LATENT_DIM = 1024
TRIAL = 2

weighted_suffix = "weighted" if GENERATE_WEIGHT or USE_WEIGHT else ""

MODEL_DIR = f"../model/{MODE}_{RNN_MODE}_{weighted_suffix}_{BATCH_SIZE}_{EPOCH}_{LATENT_DIM}_{TRIAL}"
CACHE_DIR = f"../cache/{MODE}"
RESULT_DIR = f"../results/{MODE}_{RNN_MODE}_{weighted_suffix}_{BATCH_SIZE}_{EPOCH}_{LATENT_DIM}_{TRIAL}"

# MODEL_PATH = f"../model/train_model.keras"
# ENCODER_PATH = f"../model/train_encoder.keras"
# DECODER_PATH = f"../model/train_decoder.keras"

## Declaration of Save Paths and Import of Libraries

In [None]:
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
os.environ["GLOG_minloglevel"] = "3"

# only override for specific file generation
make_dir_override = True
if make_dir_override:
    print("Warning: Overriding existing directories. Files may be overwritten.")
os.makedirs(MODEL_DIR, exist_ok=make_dir_override)
os.makedirs(RESULT_DIR, exist_ok=make_dir_override)
os.makedirs(CACHE_DIR, exist_ok=True)

MODEL_PATH = f"{MODEL_DIR}/model.keras"
ENCODER_PATH = f"{MODEL_DIR}/encoder.keras"
DECODER_PATH = f"{MODEL_DIR}/decoder.keras"

RESULT_FILE_NAME = f"{RESULT_DIR}/result.csv"
HISTORY_FILE_NAME = f"{RESULT_DIR}/history.csv"
ACC_PLOT_FILE_NAME = f"{RESULT_DIR}/acc_plot.png"
LOSS_PLOT_FILE_NAME = f"{RESULT_DIR}/loss_plot.png"

# print all finalized file paths
print("MODEL_PATH:", MODEL_PATH)
print("ENCODER_PATH:", ENCODER_PATH)
print("DECODER_PATH:", DECODER_PATH)
print("RESULT_FILE_NAME:", RESULT_FILE_NAME)
print("HISTORY_FILE_NAME:", HISTORY_FILE_NAME)
print("ACC_PLOT_FILE_NAME:", ACC_PLOT_FILE_NAME)
print("LOSS_PLOT_FILE_NAME:", LOSS_PLOT_FILE_NAME)

In [None]:
import pandas as pd
import numpy as np
import json
import cv2
from mediapipe.python.solutions.holistic import Holistic
import time
import keras
from concurrent.futures import ThreadPoolExecutor
from keras.optimizers import RMSprop
from keras.models import Model
from keras.layers import Input, LSTM, Dense, Masking, GRU
import csv
import tensorflow as tf
from functools import partial
import matplotlib.pyplot as plt

# check for cuda availability
print(tf.config.list_physical_devices('GPU'))
assert len(tf.config.list_physical_devices('GPU')) > 0, "No GPU available"

In [None]:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
keras.mixed_precision.set_global_policy('mixed_float16')

## Dataset Preparation

First, parse the split files.
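
To make the parsing step concrete, here is a minimal standard-library sketch of what the parser derives from each row. The two rows and their glosses are hypothetical; only the `|` delimiter, the `id` and `glosses` columns, and the `date/firstFrame-lastFrame` shape of the id are taken from this notebook.

```python
import csv
import io

# Hypothetical two-row split file using the same "|" delimiter and columns.
SPLIT_TEXT = (
    "id|glosses\n"
    "2020-01-16/000453-000550|A B C\n"
    "2020-01-16/000600-000609|D E\n"
)


def parse_split(text):
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="|"):
        _date, frames = row["id"].split("/")
        first, last = (int(part) for part in frames.split("-"))
        rows.append({
            "id": row["id"],
            "frames": frames,
            # Both end frames are included, hence the + 1.
            "length": last - first + 1,
            "glosses_tokenized": ["<START>"] + row["glosses"].split(" ") + ["<END>"],
        })
    return rows


parsed_rows = parse_split(SPLIT_TEXT)
```

The `length` column is what later determines the maximum input sequence length used for padding.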

In [None]:
class tvb_hksl_split_parser():
    def __init__(self, file: str):
        self.file = file
        self.train_info = pd.read_csv(self.file, delimiter="|") 
        # extend the dataframe with extracted information
        self.train_info["glosses_tokenized"] = self.train_info["glosses"].str.split(' ')
        # self.train_info["date"] = self.train_info["id"].str.split('/').apply(lambda x: x[0])
        self.train_info["frames"] = self.train_info["id"].str.split('/').apply(lambda x: x[1])
        self.train_info["length"] = self.train_info["frames"].str.split('-').apply(lambda x: int(x[1]) - int(x[0]) + 1)
        # add <START> and <END> tokens to the glosses
        self.train_info["glosses_tokenized"] = self.train_info["glosses_tokenized"].apply(lambda x: ["<START>"] + x + ["<END>"])
        

    def get_train_id(self) -> pd.Series:
        if os.name == "nt": # for windows system only
            return self.train_info["id"].str.replace("/", "\\")
        return self.train_info["id"]

    # def get_train_date(self) -> pd.Series:
    #     return self.train_info["date"]
    
    # def get_train_frames(self) -> pd.Series:
    #     return self.train_info["frames"]

    # def get_train_length(self) -> pd.Series:
    #     return self.train_info["length"]

    def get_train_glosses_tokenized(self) -> pd.Series:
        return self.train_info["glosses_tokenized"]

    def get_max_length(self) -> int:
        return self.train_info["length"].max()

    # removed bc it returns a duplicate, not by memory reference
    # def get_full_info(self) -> pd.DataFrame:
    #     return self.train_info
    
    def get_word_dict(self) -> dict:
        word_dict = {}
        for tokens in self.train_info["glosses_tokenized"]:
            for token in tokens:
                if token not in word_dict:
                    word_dict[token] = len(word_dict)
        return word_dict

    def rare_token_reduction(self, token_freq) -> None:
        # create a dictionary of all tokens and their frequencies
        # token_freq = {}
        # for tokens in self.train_info["glosses_tokenized"]:
        #     for token in tokens:
        #         if token in token_freq:
        #             token_freq[token] += 1
        #         else:
        #             token_freq[token] = 1

        # simpler approach: if any token has a frequency of < 5, replace that token with <UNK>
        # token_freq is expected to map token string -> count
        def replace_rare_tokens(tokens):
            return ["<UNK>" if token_freq[token] < 5 else token for token in tokens]
        self.train_info["glosses_tokenized"] = self.train_info["glosses_tokenized"].apply(replace_rare_tokens)

    def rare_sample_reduction(self, token_freq) -> None:
        # remove samples that contain any rare token (frequency < 5, or missing from token_freq)
        # token_freq is expected to map token string -> count
        def has_rare_token(tokens):
            return any(token_freq.get(token, 0) < 5 for token in tokens)
        self.train_info = self.train_info[~self.train_info["glosses_tokenized"].apply(has_rare_token)]

Generate the word dictionary here.
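
As a small illustration of this step, the sketch below builds a dictionary from two made-up samples, reserving the special tokens first so that `<END>` gets index 0. It also shows one detail worth knowing when the reverse dictionary is reloaded: JSON object keys are always strings. The `toy_` names and `build_word_dict` are illustrative only.

```python
import json

SPECIAL_TOKENS = ["<END>", "<START>", "<X>", "<BAD>", "<MUMBLE>", "<STOP>"]


def build_word_dict(tokenized_samples):
    word_dict = {}
    # Reserve the special tokens first so their indices are stable.
    for token in SPECIAL_TOKENS:
        word_dict[token] = len(word_dict)
    # Then assign indices in first-seen order.
    for tokens in tokenized_samples:
        for token in tokens:
            if token not in word_dict:
                word_dict[token] = len(word_dict)
    return word_dict


toy_samples = [["<START>", "A", "B", "<END>"], ["<START>", "B", "C", "<END>"]]
toy_word_dict = build_word_dict(toy_samples)
toy_reverse = {index: word for word, index in toy_word_dict.items()}

# Round trip through JSON: integer keys come back as strings.
loaded = json.loads(json.dumps(toy_reverse))
restored = {int(index): word for index, word in loaded.items()}
```

When loading `reverse_word_dict.json` elsewhere, converting the keys back with `int()` as in `restored` avoids lookups failing on integer indices.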

In [None]:
train_parser = tvb_hksl_split_parser("../dataset/tvb-hksl-news/split/train.csv")
test_parser = tvb_hksl_split_parser("../dataset/tvb-hksl-news/split/test.csv")
dev_parser = tvb_hksl_split_parser("../dataset/tvb-hksl-news/split/dev.csv")

# make a word dictionary
word_dict = {}
word_dict["<END>"] = len(word_dict)
word_dict["<START>"] = len(word_dict)
word_dict["<X>"] = len(word_dict)
word_dict["<BAD>"] = len(word_dict)
word_dict["<MUMBLE>"] = len(word_dict)
word_dict["<STOP>"] = len(word_dict)
# word_dict["<UNK>"] = len(word_dict)

for parser in [train_parser, test_parser, dev_parser]:
    for glosses in parser.get_train_glosses_tokenized():
        for word in glosses:
            if word not in word_dict:
                word_dict[word] = len(word_dict)

# save the word dictionary
with open("../data/word_dict.json", "w") as f:
    json.dump(word_dict, f)
# save reverse word dictionary
# note: JSON stores the integer keys as strings, so convert them back with int() after loading
reverse_word_dict = {v: k for k, v in word_dict.items()}
with open("../data/reverse_word_dict.json", "w") as f:
    json.dump(reverse_word_dict, f)

In [None]:
# create a frequency map: word index -> frequency
token_freq = {}
total_word_count = 0
for k, v in word_dict.items():
    token_freq[v] = 0
for parser in [train_parser, test_parser, dev_parser]:
    for glosses in parser.get_train_glosses_tokenized():
        for word in glosses:
            token_freq[word_dict[word]] += 1
            total_word_count += 1
print(token_freq)
print(len(token_freq))

In [None]:
# generate a weighting list, where lower frequency words have higher weight

# basic inverse frequency weighting
# weighting_list = [1 / token_freq[word] for word in token_freq]
# weighting_list = [x / max(weighting_list) for x in weighting_list]

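# tf-idf weighting
tf_list = np.array([token_freq[word] / total_word_count for word in token_freq]) # term frequency: count / total count for each word
# log(vocabulary size / term frequency) for each word; a token that never occurs has tf = 0, which gives inf here
with np.errstate(divide="ignore", invalid="ignore"):
    idf_list = np.log(len(token_freq) / tf_list)
    weighting_list = tf_list * idf_list
# tokens with zero frequency end up as NaN (0 * inf); give them zero weight instead
weighting_list = np.nan_to_num(weighting_list, nan=0.0)
# note: tf * log(V / tf) grows with tf while tf < V / e, so with this formula more frequent words receive larger weights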

# make sure that <START> and <END> have full weight
weighting_list[word_dict["<START>"]] = 1
weighting_list[word_dict["<END>"]] = 1
print(weighting_list)

# export weighting list
# convert numpy array to list
weighting_list = weighting_list.tolist()
with open("../data/weighting_list.json", "w") as f:
    json.dump(weighting_list, f)

In [None]:
# sample preprocessing
# train_parser.rare_sample_reduction(token_freq)
# test_parser.rare_sample_reduction(token_freq)
# dev_parser.rare_sample_reduction(token_freq)

if MODE == "train":
    actual_train_parser = train_parser
elif MODE == "dev":
    actual_train_parser = dev_parser
else:
    raise ValueError(f"Invalid mode: {MODE}")

# if a word in test_parser is not in dev_parser or train_parser, remove that sample
# this is to prevent the model from predicting words that are not in the training set
# supposedly, this should not happen, but just in case
parser_word_dict = actual_train_parser.get_word_dict()
test_parser.train_info = test_parser.train_info[test_parser.train_info["glosses_tokenized"].apply(lambda x: all([word in parser_word_dict for word in x]))]
    
# assert that all words in test_parser are also in train_parser
test_word_dict = test_parser.get_word_dict()
assert all([word in parser_word_dict for word in test_word_dict])

# finally, print the number of samples in each parser
print(f"train_parser: {len(train_parser.train_info)}")
print(f"test_parser: {len(test_parser.train_info)}")
print(f"dev_parser: {len(dev_parser.train_info)}")

Generate decoder input and target data.
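
The relationship between the decoder input and the decoder target is easiest to see on a single toy sequence: the target is the input shifted one step to the left. The sketch below uses plain lists instead of NumPy arrays and a made-up four-word vocabulary; `one_hot_pair` and `decode` are illustrative helpers, not functions from this notebook.

```python
# Toy vocabulary; index 0 is <END>, mirroring the order of the special tokens.
toy_word_dict = {"<END>": 0, "<START>": 1, "A": 2, "B": 3}
toy_reverse = {index: word for word, index in toy_word_dict.items()}


def one_hot_pair(tokens, word_dict, max_length):
    """Return (decoder_input, decoder_target) as nested lists of 0/1."""
    vocab = len(word_dict)
    decoder_input = [[0] * vocab for _ in range(max_length)]
    decoder_target = [[0] * vocab for _ in range(max_length)]
    for j, token in enumerate(tokens):
        decoder_input[j][word_dict[token]] = 1
        if j > 0:
            # The target at step j-1 is the token the decoder should emit next.
            decoder_target[j - 1][word_dict[token]] = 1
    return decoder_input, decoder_target


def decode(rows, reverse):
    # Index of the first maximum per row; an all-zero padding row maps to index 0.
    return [reverse[row.index(max(row))] for row in rows]


tokens = ["<START>", "A", "B", "<END>"]
dec_in, dec_target = one_hot_pair(tokens, toy_word_dict, max_length=5)
decoded_input = decode(dec_in, toy_reverse)
decoded_target = decode(dec_target, toy_reverse)
```

Because padding rows are all zeros and `<END>` has index 0, decoding with an argmax shows trailing `<END>` tokens for the padded steps. This is expected and does not mean the sequence contains several end tokens.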

In [None]:
# based on the dictionary, create one-hot vectors for each word
# then create decoder inputs and targets
all_input_found = False
if CACHE_Y:
    # attempt to load the cached data if it exists
    path_requirements = [
        f"../cache/train_decoder_input.npy",
        f"../cache/train_decoder_target.npy",
        f"../cache/test_decoder_input.npy",
        f"../cache/test_decoder_target.npy",
        f"../cache/dev_decoder_input.npy",
        f"../cache/dev_decoder_target.npy"
    ]

    if all([os.path.exists(path) for path in path_requirements]):
        train_decoder_input = np.load(f"../cache/train_decoder_input.npy", mmap_mode="r")
        train_decoder_target = np.load(f"../cache/train_decoder_target.npy", mmap_mode="r")
        test_decoder_input = np.load(f"../cache/test_decoder_input.npy", mmap_mode="r")
        test_decoder_target = np.load(f"../cache/test_decoder_target.npy", mmap_mode="r")
        dev_decoder_input = np.load(f"../cache/dev_decoder_input.npy", mmap_mode="r")
        dev_decoder_target = np.load(f"../cache/dev_decoder_target.npy", mmap_mode="r")
        all_input_found = True
        print("All cached data found.")
    else: print("Cached data not found.")


In [None]:
# only run if there is no cached data
if not all_input_found:
    print("Creating decoder inputs and targets")
    train_glosses = train_parser.get_train_glosses_tokenized()
    dev_glosses = dev_parser.get_train_glosses_tokenized()
    test_glosses = test_parser.get_train_glosses_tokenized()

    # find max length of glosses
    max_length_train = train_glosses.apply(len).max()
    max_length_test = test_glosses.apply(len).max()
    max_length_dev = dev_glosses.apply(len).max()
    max_length = max(max_length_train, max_length_test, max_length_dev)
    print("Max length of glosses:", max_length)

    def create_decoder_inputs_targets(glosses: pd.Series, word_dict: dict, max_length: int):
        # one-hot encode every gloss sequence; the target is the input shifted one step to the left
        decoder_input = np.zeros((len(glosses), max_length, len(word_dict)))
        decoder_target = np.zeros((len(glosses), max_length, len(word_dict)))
        # iterate by position: test samples may have been filtered, so the Series index can have gaps
        for i, tokens in enumerate(glosses):
            for j, token in enumerate(tokens):
                decoder_input[i, j, word_dict[token]] = 1
                if j > 0:
                    decoder_target[i, j-1, word_dict[token]] = 1
        return decoder_input, decoder_target

    train_decoder_input, train_decoder_target = create_decoder_inputs_targets(train_glosses, word_dict, max_length)
    test_decoder_input, test_decoder_target = create_decoder_inputs_targets(test_glosses, word_dict, max_length)
    dev_decoder_input, dev_decoder_target = create_decoder_inputs_targets(dev_glosses, word_dict, max_length)

    if CACHE_Y:
        np.save("../cache/train_decoder_input.npy", train_decoder_input)
        np.save("../cache/train_decoder_target.npy", train_decoder_target)
        np.save("../cache/test_decoder_input.npy", test_decoder_input)
        np.save("../cache/test_decoder_target.npy", test_decoder_target)
        np.save("../cache/dev_decoder_input.npy", dev_decoder_input)
        np.save("../cache/dev_decoder_target.npy", dev_decoder_target)
    
        del train_decoder_input
        del train_decoder_target
        del test_decoder_input
        del test_decoder_target
        del dev_decoder_input
        del dev_decoder_target
    
        train_decoder_input = np.load("../cache/train_decoder_input.npy", mmap_mode="r")
        train_decoder_target = np.load("../cache/train_decoder_target.npy", mmap_mode="r")
        test_decoder_input = np.load("../cache/test_decoder_input.npy", mmap_mode="r")
        test_decoder_target = np.load("../cache/test_decoder_target.npy", mmap_mode="r")
        dev_decoder_input = np.load("../cache/dev_decoder_input.npy", mmap_mode="r")
        dev_decoder_target = np.load("../cache/dev_decoder_target.npy", mmap_mode="r")

In [None]:
# DEBUG: print the first sample of the training data undecoded using argmax
print("Training data:")
print("Input:")
print([reverse_word_dict[i] for i in np.argmax(train_decoder_input[0], axis=1)])
print("Target:")
print([reverse_word_dict[i] for i in np.argmax(train_decoder_target[0], axis=1)])
# counter = 0
# while True:
#     local_input_data = [reverse_word_dict[i] for i in np.argmax(train_decoder_input[counter], axis=1)]
#     local_target_data = [reverse_word_dict[i] for i in np.argmax(train_decoder_target[counter], axis=1)]
#     if "<UNK>" not in local_input_data:
#         counter += 1
#         continue
#     print("Input:")
#     print(local_input_data)
#     print("Target:")
#     print(local_target_data)
#     break

# DEBUG: print the shapes of all the data
print("Train decoder input shape:", train_decoder_input.shape)  
print("Train decoder target shape:", train_decoder_target.shape)
print("Test decoder input shape:", test_decoder_input.shape)
print("Test decoder target shape:", test_decoder_target.shape)
print("Dev decoder input shape:", dev_decoder_input.shape)
print("Dev decoder target shape:", dev_decoder_target.shape)

## Feature Extraction

Extract the features of each video sample. Each time a sample is extracted, one random transformation (rotation, translation and scaling) is drawn and applied to all of its frames.

### Image Transformation
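
Before any random transformation, each frame is resized to fit the target canvas and padded with black borders. The arithmetic of that step can be sketched without OpenCV; the two input frame sizes below are hypothetical, and `letterbox_geometry` is an illustrative helper that only mirrors the size and border calculation.

```python
def letterbox_geometry(cols, rows, target_size=(1920, 1080)):
    """Return the resized frame size and the border widths needed to reach target_size."""
    target_cols, target_rows = target_size
    # Scale by the tighter of the two ratios so the frame fits inside the target.
    scale_factor = min(target_cols / cols, target_rows / rows)
    new_cols = int(cols * scale_factor)
    new_rows = int(rows * scale_factor)
    # Split the leftover space evenly; any odd pixel goes to bottom/right.
    top = (target_rows - new_rows) // 2
    bottom = target_rows - new_rows - top
    left = (target_cols - new_cols) // 2
    right = target_cols - new_cols - left
    return {"size": (new_cols, new_rows), "borders": (top, bottom, left, right)}


wide = letterbox_geometry(1280, 720)    # same aspect ratio as the target
narrow = letterbox_geometry(640, 480)   # narrower than the target
```

A frame with the target's aspect ratio needs no borders, while a narrower frame is padded on the left and right.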

In [None]:
def pad_to_size(image, target_size=(1920, 1080)):
    rows, cols, _ = image.shape
    target_cols, target_rows = target_size
    scale_factor = min(target_cols / cols, target_rows / rows)
    new_cols = int(cols * scale_factor)
    new_rows = int(rows * scale_factor)
    resized_image = cv2.resize(image, (new_cols, new_rows), interpolation=cv2.INTER_LINEAR)

    top = (target_rows - new_rows) // 2
    bottom = target_rows - new_rows - top
    left = (target_cols - new_cols) // 2
    right = target_cols - new_cols - left
    padded_image = cv2.copyMakeBorder(resized_image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=[0, 0, 0])

    return padded_image

def apply_random_transformation(image, angle, tx, ty, scale):
    rows, cols, _ = image.shape

    # rotation
    M = cv2.getRotationMatrix2D((cols / 2, rows / 2), angle, 1)
    rotated_image = cv2.warpAffine(image, M, (cols, rows))

    # translation
    M = np.float32([[1, 0, tx], [0, 1, ty]])
    translated_image = cv2.warpAffine(rotated_image, M, (cols, rows))

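    # scaling
    scaled_image = cv2.resize(translated_image, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)

    # consistency: resize back to the original frame size
    # NOTE: this maps the whole scaled image back onto (cols, rows), so `scale` only changes the resampling, not the apparent size of the signer
    final_image = cv2.resize(scaled_image, (cols, rows), interpolation=cv2.INTER_LINEAR)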

    return final_image

def preprocess_image(image, target_size=(1920, 1080), angle=0, tx=0, ty=0, scale=1):
    padded_image = pad_to_size(image, target_size)
    transformed_image = apply_random_transformation(padded_image, angle, tx, ty, scale)
    return transformed_image

### Feature Extraction

Here, we use Google's `mediapipe` (the Holistic solution) to extract pose, face and hand keypoints from each frame.
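
The per-frame keypoints are concatenated into one flat vector (pose, then face, then left hand, then right hand) before any index selection. The sketch below works out the offsets of that layout, using the landmark counts from this notebook (33 pose landmarks with 4 values each, 468 face and 21 hand landmarks with 3 values each). The constant names and `flat_indices` are illustrative, not part of the notebook.

```python
POSE_LANDMARKS, POSE_VALUES = 33, 4    # x, y, z, visibility
FACE_LANDMARKS, FACE_VALUES = 468, 3   # x, y, z
HAND_LANDMARKS, HAND_VALUES = 21, 3    # x, y, z

POSE_OFFSET = 0
FACE_OFFSET = POSE_OFFSET + POSE_LANDMARKS * POSE_VALUES
LEFT_HAND_OFFSET = FACE_OFFSET + FACE_LANDMARKS * FACE_VALUES
RIGHT_HAND_OFFSET = LEFT_HAND_OFFSET + HAND_LANDMARKS * HAND_VALUES
TOTAL_LENGTH = RIGHT_HAND_OFFSET + HAND_LANDMARKS * HAND_VALUES


def flat_indices(offset, landmark, values_per_landmark, keep=None):
    """Flat indices of one landmark; `keep` selects components (0 = x, 1 = y, ...)."""
    keep = range(values_per_landmark) if keep is None else keep
    return [offset + landmark * values_per_landmark + c for c in keep]


# x and y of every pose landmark (z and visibility dropped).
pose_xy = [
    i
    for lm in range(POSE_LANDMARKS)
    for i in flat_indices(POSE_OFFSET, lm, POSE_VALUES, keep=(0, 1))
]
# x, y, z of face landmark 1.
face_landmark_1_xyz = flat_indices(FACE_OFFSET, 1, FACE_VALUES)
```

With this layout, face landmark `k` occupies the three flat positions starting at `FACE_OFFSET + 3 * k`, and the hands start right after the face block.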

In [None]:

def get_mediapipe_keypoints_index() -> list[int]:
        """
        Returns the indices of the keypoints that we want to keep.
        
        For the third dimension, we only want to keep the coordinates of
        - Pose
        - Lips
        - Eyes
        - Eyebrows
        - Nose
        - Face Oval (border of face)
        - Left hand
        - Right hand

        This is because we want the keypoints to be robust, so the facial features that are unique to each signer are discarded.

        Reference: https://github.com/LearningnRunning/py_face_landmark_helper/blob/main/mediapipe_helper/config.py
        Image: https://raw.githubusercontent.com/google/mediapipe/master/mediapipe/modules/face_geometry/data/canonical_face_model_uv_visualization.png
        Related stack overflow post: https://stackoverflow.com/questions/74901522/can-mediapipe-specify-which-parts-of-the-face-mesh-are-the-lips-or-nose-or-eyes
        """
        # pose with visibility
        # POSE = list(range(0, 33*4))

        # pose without visibility
        POSE_UNPROCESSED = range(0, 33*4)
        # POSE = [i for i in POSE_UNPROCESSED if i % 4 != 3]
        # for x, y only
        # discard Z due to documentation https://github.com/google-ai-edge/mediapipe/blob/master/docs/solutions/holistic.md
        POSE = [i for i in POSE_UNPROCESSED if i % 4 != 2 and i % 4 != 3]

        # face
        # NOTE: the following keypoint indices are HARD-CODED based on the visualization of the face mesh
        # reference: https://github.com/LearningnRunning/py_face_landmark_helper/blob/main/mediapipe_helper/config.py
        # image: https://raw.githubusercontent.com/google/mediapipe/master/mediapipe/modules/face_geometry/data/canonical_face_model_uv_visualization.png
        # related stack overflow post: https://stackoverflow.com/questions/74901522/can-mediapipe-specify-which-parts-of-the-face-mesh-are-the-lips-or-nose-or-eyes
        FACE_LIPS = [0, 267, 269, 270, 13, 14, 17, 402, 146, 405, 409, 415, 291, 37, 39, 40, 178, 308, 181, 310, 311, 312, 185, 314, 317, 318, 61, 191, 321, 324, 78, 80, 81, 82, 84, 87, 88, 91, 95, 375]
        LEFT_EYE = [384, 385, 386, 387, 388, 390, 263, 362, 398, 466, 373, 374, 249, 380, 381, 382]
        LEFT_EYEBROW = [293, 295, 296, 300, 334, 336, 276, 282, 283, 285]
        RIGHT_EYE = [160, 33, 161, 163, 133, 7, 173, 144, 145, 246, 153, 154, 155, 157, 158, 159]
        RIGHT_EYEBROW = [65, 66, 70, 105, 107, 46, 52, 53, 55, 63]
        FACE_NOSE = [1, 2, 4, 5, 6, 19, 275, 278, 294, 168, 45, 48, 440, 64, 195, 197, 326, 327, 344, 220, 94, 97, 98, 115]
        FACE_OVAL = [132, 389, 136, 10, 397, 400, 148, 149, 150, 21, 152, 284, 288, 162, 297, 172, 176, 54, 58, 323, 67, 454, 332, 338, 93, 356, 103, 361, 234, 109, 365, 379, 377, 378, 251, 127]
        FACE_UNPROCESSED = [item + 33*4 for sublist in [FACE_LIPS, LEFT_EYE, LEFT_EYEBROW, RIGHT_EYE, RIGHT_EYEBROW, FACE_NOSE, FACE_OVAL] for item in sublist]
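        # face keypoints are flattened as x, y, z, so landmark k occupies flat indices 33*4 + 3*k .. 33*4 + 3*k + 2
        # NOTE: the line below instead takes every third entry of FACE_UNPROCESSED and reads three consecutive flat values starting there,
        # which does not line up with the listed landmarks; it is left unchanged here because changing it alters the length of the keypoint vector
        FACE = [i for j in range(0, len(FACE_UNPROCESSED), 3) for i in range(FACE_UNPROCESSED[j], FACE_UNPROCESSED[j] + 3)]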
        # for x, y only
        # FACE = [i for j in range(0, len(FACE_UNPROCESSED), 3) for i in range(FACE_UNPROCESSED[j], FACE_UNPROCESSED[j] + 2)]

        # hands
        LEFT_HAND = list(range(33*4 + 468*3, 33*4 + 468*3 + 21*3))
        RIGHT_HAND = list(range(33*4 + 468*3 + 21*3, 33*4 + 468*3 + 21*3 + 21*3))
        # for x, y only
        # LEFT_HAND = [i for i in list(range(33*4 + 468*3, 33*4 + 468*3 + 21*3)) if i % 3 != 2]
        # RIGHT_HAND = [i for i in list(range(33*4 + 468*3 + 21*3, 33*4 + 468*3 + 21*3 + 21*3)) if i % 3 != 2]
        KEYPOINTS_INDEX = POSE + FACE + LEFT_HAND + RIGHT_HAND
        return KEYPOINTS_INDEX

STATIC_KEYPOINTS_INDEX = get_mediapipe_keypoints_index() # saves time by not recalculating the indices

def mediapipe_detection(frame, holistic):
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = holistic.process(frame)
    # frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    return result

def mediapipe_extract_keypoints(result):
    # print(len(result.pose_landmarks.landmark))
    # print(len(result.face_landmarks.landmark))
    # print(len(result.left_hand_landmarks.landmark))
    # print(len(result.right_hand_landmarks.landmark))
    pose = np.array([[res.x, res.y, res.z, res.visibility] for res in result.pose_landmarks.landmark]).flatten() if result.pose_landmarks else np.zeros(33*4)
    face = np.array([[res.x, res.y, res.z] for res in result.face_landmarks.landmark]).flatten() if result.face_landmarks else np.zeros(468*3)
    lh = np.array([[res.x, res.y, res.z] for res in result.left_hand_landmarks.landmark]).flatten() if result.left_hand_landmarks else np.zeros(21*3)
    rh = np.array([[res.x, res.y, res.z] for res in result.right_hand_landmarks.landmark]).flatten() if result.right_hand_landmarks else np.zeros(21*3)
    concat = np.concatenate([pose, face, lh, rh])
    return concat[STATIC_KEYPOINTS_INDEX]

def mediapipe_extract(frames, override_random = False): # frames is a list of frames
    if override_random:
        angle = 0
        tx = 0
        ty = 0
        scale = 1
    else:
        # predefine random transformation parameters
        angle = np.random.uniform(-10, 10)
        tx = np.random.uniform(-100, 100)
        ty = np.random.uniform(-60, 60)
        scale = np.random.uniform(0.6, 1.2)
    frames = [preprocess_image(frame, angle=angle, tx=tx, ty=ty, scale=scale) for frame in frames]
    # one optimization done here is to use the same holistic object for all frames
    # this way, the model only needs to be loaded once
    # then keypoints can be tracked until not found
    with Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5, model_complexity=0) as holistic:
        return [mediapipe_extract_keypoints(mediapipe_detection(frame, holistic)) for frame in frames]

In [None]:
# set to true to test the mediapipe extraction
if False:
    sample_image = cv2.imread("../dataset/tvb-hksl-news/frames/2020-01-16/000453-000550/000453.jpg")
    sample_keypoints = mediapipe_extract([sample_image])

In [None]:
# set to true to test the image preprocessing
if False:
    import matplotlib.pyplot as plt
    source_directory = "../dataset/tvb-hksl-news/frames/2020-01-16/000453-000550"
    source_list_frames = [cv2.imread(os.path.join(source_directory, frame)) for frame in sorted(os.listdir(source_directory))]
    angle = np.random.uniform(-30, 30)
    tx = np.random.uniform(-100, 100)
    ty = np.random.uniform(-50, 50)
    scale = np.random.uniform(0.6, 1.2)
    frames = [preprocess_image(frame, angle=angle, tx=tx, ty=ty, scale=scale) for frame in source_list_frames]
    # output each frame in jupyter notebook
    for frame in frames:
        plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        plt.show()

### Keypoint Generator

Create a generator by inheriting from `keras.utils.Sequence`.
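
Two pieces of bookkeeping in the generator are worth seeing in isolation: the number of batches is a ceiling division, so the last batch may be smaller, and every keypoint sequence is zero-padded at the end up to the maximum length. The sketch below is a plain-Python illustration with made-up sizes; `batch_bounds` and `pad_post` are hypothetical helpers, and `pad_post` assumes the sequence is not longer than `max_length`.

```python
def batch_bounds(num_samples, batch_size):
    """Return the (start, end) slice bounds of every batch, last one possibly shorter."""
    num_batches = (num_samples + batch_size - 1) // batch_size  # ceiling division
    return [
        (i * batch_size, min((i + 1) * batch_size, num_samples))
        for i in range(num_batches)
    ]


def pad_post(sequence, max_length, feature_dim):
    """Append all-zero frames until the sequence has max_length frames."""
    padding = [[0.0] * feature_dim for _ in range(max_length - len(sequence))]
    return sequence + padding


bounds = batch_bounds(num_samples=70, batch_size=32)
padded = pad_post([[0.1, 0.2], [0.3, 0.4]], max_length=4, feature_dim=2)
```

With 70 samples and a batch size of 32 this yields three batches, the last containing 6 samples.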

In [None]:
# Preparation: get the largest length of sequences of x
train_max_length = train_parser.get_max_length()
test_max_length = test_parser.get_max_length()
dev_max_length = dev_parser.get_max_length()

X_max_length = max(train_max_length, test_max_length, dev_max_length)
print("Max length of sequences of X:", X_max_length)

In [None]:
class KeypointGenerator(keras.utils.Sequence):
    def __init__(self, X, decoder_input, decoder_target, X_max_length, batch_size=32):
        super().__init__()
        self.x = X
        self.decoder_input = decoder_input
        self.decoder_target = decoder_target
        self.X_max_length = X_max_length
        self.batch_size = batch_size

    def __len__(self):
        # return len(self.x) // self.batch_size
        return (len(self.x) + self.batch_size - 1) // self.batch_size
    
    def __preprocess_x__(self, dir_location):
        # x is the train id. the directory of the frames is located at ../dataset/tvb-hksl-news/frames/{train_id}
        source_directory = f"../dataset/tvb-hksl-news/frames/{dir_location}"
        # print(sorted(os.listdir(source_directory)))
        source_list_frames = [cv2.imread(os.path.join(source_directory, frame)) for frame in sorted(os.listdir(source_directory))]
        return source_list_frames

    def __getitem__(self, idx, override_random = False):
        start_idx = idx * self.batch_size
        end_idx = min((idx + 1) * self.batch_size, len(self.x))

        # edge case: start_idx = len(self.x), which means the previous batch was the last batch
        # if start_idx == len(self.x):
        #     return [None, None], None
        if start_idx >= len(self.x):
            raise IndexError("Index out of range for the generator")

        batch_x = self.x[start_idx:end_idx]
        batch_decoder_input = self.decoder_input[start_idx:end_idx]
        batch_decoder_target = self.decoder_target[start_idx:end_idx]
        # batch_x = [self.__preprocess_x__(dir_location) for dir_location in batch_x]
        # batch_x = [mediapipe_extract(frames) for frames in batch_x]
        with ThreadPoolExecutor(max_workers=4) as executor:
            batch_x = list(executor.map(self.__preprocess_x__, batch_x))
        partial_mediapipe_extract = partial(mediapipe_extract, override_random=override_random)
        with ThreadPoolExecutor(max_workers=4) as executor:
            # TODO: research for a faster method than mediapipe
            # batch_x = list(executor.map(mediapipe_extract, batch_x, override_random=[override_random] * len(batch_x)))
            batch_x = list(executor.map(partial_mediapipe_extract, batch_x))

        # pad each sequence to the max length
        batch_x = keras.preprocessing.sequence.pad_sequences(batch_x, maxlen=self.X_max_length, padding="post", dtype="float32")
        batch_x = np.array(batch_x)
        batch_decoder_input = np.array(batch_decoder_input)
        batch_decoder_target = np.array(batch_decoder_target)
        return (batch_x, batch_decoder_input), batch_decoder_target

In [None]:
# Testing
if False:
    dev_generator = KeypointGenerator(dev_parser.get_train_id(), dev_decoder_input, dev_decoder_target, X_max_length, batch_size=10)
    start = time.time()
    dev_generator.__getitem__(32)
    end = time.time()
    print("Time taken:", end - start)
    del dev_generator

# Model Training

Create a sequence-to-sequence model.

## Data Generator
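
Cached batches are written as `iteration_{i}_batch_{counter}.npy` and later listed with `sorted(os.listdir(...))`. The sketch below shows what that sort order looks like; `cached_file_name` and `numeric_key` are hypothetical helpers for illustration only.

```python
def cached_file_name(iteration, batch):
    return f"iteration_{iteration}_batch_{batch}.npy"


names = [cached_file_name(0, batch) for batch in (0, 1, 2, 10, 11)]

# Plain sorted() compares strings character by character.
lexicographic = sorted(names)


def numeric_key(name):
    # "iteration_0_batch_10.npy" -> (0, 10)
    parts = name[: -len(".npy")].split("_")
    return int(parts[1]), int(parts[3])


# Sorting on the parsed numbers restores the generation order.
numeric = sorted(names, key=numeric_key)
```

The string order places batch 10 before batch 2. Since the same file names are used in all three cache directories, the three sorted lists still line up with each other; a numeric key is only needed if the original batch order matters.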

In [None]:
# iteration value is only used for pre-generating and caching the data
if MODE == "train":
    keypoint_generator = KeypointGenerator(train_parser.get_train_id(), train_decoder_input, train_decoder_target, X_max_length, batch_size=BATCH_SIZE)
    iteration = 20
elif MODE == "dev":
    keypoint_generator = KeypointGenerator(dev_parser.get_train_id(), dev_decoder_input, dev_decoder_target, X_max_length, batch_size=BATCH_SIZE)
    iteration = 10
else: raise ValueError("Invalid mode")

x_dir = os.path.join(CACHE_DIR, "x")
decoder_input_dir = os.path.join(CACHE_DIR, "decoder_input")
decoder_target_dir = os.path.join(CACHE_DIR, "decoder_target")

os.makedirs(x_dir, exist_ok=True)
os.makedirs(decoder_input_dir, exist_ok=True)
os.makedirs(decoder_target_dir, exist_ok=True)

In [None]:
# if CACHE_BATCH, generate the batched data and save it. the same cached data is then reused every epoch (kinda bad for learning), but training is faster since nothing has to be processed during an epoch
if CACHE_BATCH:
    for i in range(iteration):
        # len(keypoint_generator) is the number of batches, so the loop stops after the last (possibly smaller) batch
        for counter in range(len(keypoint_generator)):
            print(f"Iteration {i}, Counter {counter}")
            (batch_x, batch_decoder_input), batch_decoder_target = keypoint_generator.__getitem__(counter)
            file_name = f"iteration_{i}_batch_{counter}.npy"
            np.save(os.path.join(x_dir, file_name), batch_x)
            np.save(os.path.join(decoder_input_dir, file_name), batch_decoder_input)
            np.save(os.path.join(decoder_target_dir, file_name), batch_decoder_target)
        print("Generator exhausted")

In [None]:
# also generate control data for testing
if CACHE_BATCH:
    counter = 0
    while True:
        print(f"Control, Counter {counter}")
        (batch_x, batch_decoder_input), batch_decoder_target = keypoint_generator.__getitem__(counter, override_random=True)
        if batch_x is None:
            print("Generator exhausted")
            break
        file_name = f"control_batch_{counter}.npy"
        np.save(os.path.join(x_dir, file_name), batch_x)
        np.save(os.path.join(decoder_input_dir, file_name), batch_decoder_input)
        np.save(os.path.join(decoder_target_dir, file_name), batch_decoder_target)
        counter += 1
        # a short batch means this was the last one
        if len(batch_x) < keypoint_generator.batch_size:
            print("Generator exhausted")
            break

In [None]:
# serve the cached batches from disk, one .npy file per batch
class CachedKeypointGenerator(keras.utils.Sequence):
    def __init__(self, x_dir, decoder_input_dir, decoder_target_dir, batch_size=32):
        super().__init__()  # Keras 3 warns if a Sequence subclass skips this call
        self.x_dir = x_dir
        self.decoder_input_dir = decoder_input_dir
        self.decoder_target_dir = decoder_target_dir
        self.batch_size = batch_size
        self.list_x_files = sorted(os.listdir(self.x_dir))
        self.list_decoder_input_files = sorted(os.listdir(self.decoder_input_dir))
        self.list_decoder_target_files = sorted(os.listdir(self.decoder_target_dir))
        if NO_TRANSFORM:
            # only keep files that start with "control"
            self.list_x_files = [file for file in self.list_x_files if file.startswith("control")]
            self.list_decoder_input_files = [file for file in self.list_decoder_input_files if file.startswith("control")]
            self.list_decoder_target_files = [file for file in self.list_decoder_target_files if file.startswith("control")]
        self.total_files = len(self.list_x_files)
        self.batch_index = 0

    def __len__(self):
        return self.total_files

    def __getitem__(self, idx: int):
        batch_x = np.load(os.path.join(self.x_dir, self.list_x_files[idx]), mmap_mode="r")
        batch_decoder_input = np.load(os.path.join(self.decoder_input_dir, self.list_decoder_input_files[idx]), mmap_mode="r")
        batch_decoder_target = np.load(os.path.join(self.decoder_target_dir, self.list_decoder_target_files[idx]), mmap_mode="r")
        if USE_WEIGHT:
            # on the decoder input and target, apply the weighting list
            batch_decoder_input = batch_decoder_input * weighting_list
            batch_decoder_target = batch_decoder_target * weighting_list
        return (batch_x, batch_decoder_input), batch_decoder_target

# in case we want to use the provided keypoints or old keypoints
# for these, no preprocessing is done on the images
class ProvidedKeypointGenerator(keras.utils.Sequence):
    def __init__(self, parser, decoder_input, decoder_target, X_max_length, batch_size=32):
        super().__init__()  # Keras 3 warns if a Sequence subclass skips this call
        self.parser: tvb_hksl_split_parser = parser
        self.decoder_input = decoder_input
        self.decoder_target = decoder_target
        self.X_max_length = X_max_length
        self.batch_size = batch_size

    def __len__(self):
        return (len(self.parser.get_train_id()) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, idx):
        start_idx = idx * self.batch_size
        end_idx = min((idx + 1) * self.batch_size, len(self.parser.get_train_id()))

        batch_x = self.parser.get_train_id()[start_idx:end_idx]
        batch_decoder_input = self.decoder_input[start_idx:end_idx]
        batch_decoder_target = self.decoder_target[start_idx:end_idx]
        
        keypoints_dir = "../dataset/tvb-hksl-news/keypoints_mediapipe"
        batch_x = [np.load(os.path.join(keypoints_dir, f"{train_id}.npy"), mmap_mode="r") for train_id in batch_x]
        batch_x = keras.preprocessing.sequence.pad_sequences(batch_x, maxlen=self.X_max_length, padding="post", dtype="float32")
        batch_x = np.array(batch_x)
        batch_decoder_input = np.array(batch_decoder_input)
        batch_decoder_target = np.array(batch_decoder_target)
        if USE_WEIGHT:
            # on the decoder input and target, apply the weighting list
            batch_decoder_input = batch_decoder_input * weighting_list
            batch_decoder_target = batch_decoder_target * weighting_list
        return (batch_x, batch_decoder_input), batch_decoder_target


if USE_CACHE_BATCH:
    del keypoint_generator
    keypoint_generator = CachedKeypointGenerator(x_dir, decoder_input_dir, decoder_target_dir, batch_size=BATCH_SIZE)
    print("Cached generator created")
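
The `weighting_list` applied in the generators above is described in the trial notes as inverse frequency normalized by the maximum. The script that produced it is not part of this notebook, so the following is only a minimal sketch of one way such weights could be computed. The function name `inverse_frequency_weights` and the toy glosses are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(token_sequences, vocabulary):
    """Return one weight per vocabulary entry: 1 / count, scaled so the largest weight is 1."""
    counts = Counter(token for sequence in token_sequences for token in sequence)
    # Assumption: tokens that never occur are treated as if they occurred once,
    # which avoids a division by zero.
    inverse = [1.0 / max(counts[token], 1) for token in vocabulary]
    largest = max(inverse)
    return [value / largest for value in inverse]


# Toy data: "NEWS" appears 4 times, "RAIN" twice, "TYPHOON" once.
toy_sequences = [["NEWS", "RAIN"], ["NEWS", "RAIN"], ["NEWS", "TYPHOON"], ["NEWS"]]
toy_vocabulary = ["NEWS", "RAIN", "TYPHOON"]
weights = inverse_frequency_weights(toy_sequences, toy_vocabulary)
print(weights)  # [0.25, 0.5, 1.0]
```

Rare glosses end up with weights close to 1 and frequent ones with small weights, so multiplying a one-hot target by this list scales each position's contribution by the rarity of its gloss.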

In [None]:
if USE_CACHE_BATCH:
    print("Using cached data")
    # batch_x has shape (batch, frames, features); the last axis is the feature length
    len_features = keypoint_generator[0][0][0].shape[-1]
else:
    # generate the first batch but with batch_size = 1 to get the length of the features
    temp_keypoint_generator = KeypointGenerator(train_parser.get_train_id(), train_decoder_input, train_decoder_target, X_max_length, batch_size=1)
    (temp_batch_x, temp_batch_decoder_input), temp_batch_decoder_target = temp_keypoint_generator.__getitem__(0)
    len_features = len(temp_batch_x[0][0])
    del temp_keypoint_generator
    del temp_batch_x
    del temp_batch_decoder_input
    del temp_batch_decoder_target

if RNN_MODE == "LSTM":
    # encoder
    encoder_input = Input(shape=(None, len_features))
    encoder_mask = Masking()(encoder_input)
    encoder_lstm = LSTM(LATENT_DIM, return_state=True, return_sequences=True, use_cudnn=False)
    encoder_outputs, state_h, state_c = encoder_lstm(encoder_mask)
    encoder_states = [state_h, state_c]

    # decoder
    decoder_input = Input(shape=(None, len(word_dict)))
    decoder_lstm = LSTM(LATENT_DIM, return_sequences=True, return_state=True, use_cudnn=False)
    decoder_outputs, _, _ = decoder_lstm(decoder_input, initial_state=encoder_states)
    decoder_dense = Dense(len(word_dict), activation="softmax")
    decoder_outputs = decoder_dense(decoder_outputs)
elif RNN_MODE == "GRU":
    # encoder
    encoder_input = Input(shape=(None, len_features))
    encoder_mask = Masking()(encoder_input)
    encoder_gru = GRU(LATENT_DIM, return_state=True, return_sequences=True, use_cudnn=False)
    # a GRU has a single hidden state (no cell state), so it returns only (outputs, state_h)
    encoder_outputs, state_h = encoder_gru(encoder_mask)
    encoder_states = [state_h]

    # decoder
    decoder_input = Input(shape=(None, len(word_dict)))
    decoder_gru = GRU(LATENT_DIM, return_sequences=True, return_state=True, use_cudnn=False)
    decoder_outputs, _ = decoder_gru(decoder_input, initial_state=encoder_states)
    decoder_dense = Dense(len(word_dict), activation="softmax")
    decoder_outputs = decoder_dense(decoder_outputs)
else:
    raise ValueError(f"Invalid RNN_MODE: {RNN_MODE}")

model = Model([encoder_input, decoder_input], decoder_outputs)
model.compile(optimizer=RMSprop(), loss="categorical_crossentropy", metrics=["categorical_accuracy"])
model.summary()

In [None]:
# print the shape of the encoder input in the first batch
print(keypoint_generator.__getitem__(0)[0][0].shape)

model.fit(keypoint_generator, epochs=EPOCH)
model.save(MODEL_PATH)

In [None]:
# only run this code if you wish to extend the training by more epochs
if False:
    # model = keras.models.load_model(f"../model/{MODE}_model.keras")
    model.load_weights(MODEL_PATH)
    for layer in model.layers:
        layer.trainable = True
        if isinstance(layer, LSTM) or isinstance(layer, GRU):
            layer.use_cudnn = False
    # we do not need to recompile since nothing else has changed
    # model.compile(optimizer=RMSprop(), loss="categorical_crossentropy", metrics=["categorical_accuracy"])
    # model.summary()
    
    new_total_epoch = 512
    assert new_total_epoch > EPOCH
    model.fit(keypoint_generator, epochs=new_total_epoch, initial_epoch=EPOCH)
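    # MODEL_PATH is passed directly to model.save() above, so it is treated as a full path here
    model.save(f"{os.path.splitext(MODEL_PATH)[0]}_extended_{new_total_epoch}.keras")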

### Inference

Here, we split the trained model into a separate encoder and decoder so that a sequence can be decoded one token at a time.
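
Token-by-token decoding can be sketched without Keras as a greedy loop: run one decoder step, keep the highest-scoring token, and feed it back in until the end token appears. This is a simplified illustration, not the notebook's `decode_sequence`. `greedy_decode`, `toy_step` and `TOY_GLOSSES` are hypothetical names, and `toy_step` stands in for a call to the decoder model.

```python
def greedy_decode(step, initial_state, start_token, end_token, max_length):
    """Feed the previous prediction back into `step` until `end_token` or `max_length`."""
    state = initial_state
    token = start_token
    decoded = []
    while len(decoded) < max_length:
        scores, state = step(token, state)
        # Greedy choice: take the highest-scoring token at every step.
        token = max(scores, key=scores.get)
        if token == end_token:
            break
        decoded.append(token)
    return decoded


# Hypothetical stand-in for the decoder: it walks through a fixed gloss list,
# using the position in that list as its "state".
TOY_GLOSSES = ["TODAY", "WEATHER", "GOOD", "<END>"]


def toy_step(previous_token, position):
    scores = {gloss: 0.0 for gloss in TOY_GLOSSES}
    scores[TOY_GLOSSES[position]] = 1.0
    return scores, min(position + 1, len(TOY_GLOSSES) - 1)


print(greedy_decode(toy_step, 0, "<START>", "<END>", max_length=10))
# ['TODAY', 'WEATHER', 'GOOD']
```

The `max_length` cap plays the same role as the length check in the notebook: it stops decoding if the model never emits the end token.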

In [None]:
# load the model and create encoder and decoder models
if True:
    # model.load_weights(f"../model/{MODEL_PATH}_extended_{new_total_epoch}.keras")
    model.load_weights(MODEL_PATH)

    # list all the layers
    # for i, layer in enumerate(model.layers):
    #     print(i, layer.name)
    """
    0 input_layer_12
    1 input_layer_13
    2 masking_2
    3 lstm_4
    4 lstm_5
    5 dense_2
    """

    # encoder_input = model.layers[0]
    # encoder_lstm = model.layers[3]
    # encoder_states = encoder_lstm.output

    # decoder_input = model.layers[1]
    # decoder_lstm = model.layers[4]
    # decoder_dense = model.layers[5]

encoder_model = Model(encoder_input, encoder_states)

# the LSTM carries two states (h, c); the GRU carries only one (h)
num_states = 2 if RNN_MODE == "LSTM" else 1
decoder_states_inputs = [Input(shape=(LATENT_DIM,)) for _ in range(num_states)]
decoder_rnn = decoder_lstm if RNN_MODE == "LSTM" else decoder_gru
decoder_outputs, *decoder_states = decoder_rnn(decoder_input, initial_state=decoder_states_inputs)
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_input] + decoder_states_inputs, [decoder_outputs] + decoder_states)

# save the encoder and decoder models
encoder_model.save(ENCODER_PATH)
decoder_model.save(DECODER_PATH)
# model.save(f"../model/{MODE}_encoder_{weighted_suffix}_{BATCH_SIZE}_{EPOCH}_{LATENT_DIM}_{TRIAL}_extended_{new_total_epoch}.keras")
# model.save(f"../model/{MODE}_decoder_{weighted_suffix}_{BATCH_SIZE}_{EPOCH}_{LATENT_DIM}_{TRIAL}_extended_{new_total_epoch}.keras")

In [None]:
encoder_model.summary()
decoder_model.summary()

### Save the training results

In [None]:
# Plot the accuracy and loss against the epochs
history = model.history
plt.figure()
plt.plot(history.history["categorical_accuracy"])
plt.title("Model accuracy")
plt.ylabel("Accuracy")
plt.xlabel("Epoch")
# only training metrics exist, since fit() is called without validation data
plt.legend(["Train"], loc="upper left")
# save before show(): show() can clear the figure, which would leave a blank image on disk
plt.savefig(ACC_PLOT_FILE_NAME)
plt.show()

plt.figure()
plt.plot(history.history["loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
plt.legend(["Train"], loc="upper left")
plt.savefig(LOSS_PLOT_FILE_NAME)
plt.show()

# Save the history of the model
hist_df = pd.DataFrame(history.history)
hist_df.to_csv(HISTORY_FILE_NAME)

# Evaluation

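For now, we use a basic position-wise token accuracy: the decoded and target sentences are padded to the same length with `<BAD>` tokens and compared token by token.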
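
The per-sentence accuracy used in the evaluation cells can be sketched as a small standalone function. The name `token_accuracy` is hypothetical, and returning 1.0 for two empty sentences is an assumption made here to avoid a division by zero.

```python
def token_accuracy(decoded, target, pad_token="<BAD>"):
    """Position-wise accuracy after padding the shorter sequence with `pad_token`."""
    length = max(len(decoded), len(target))
    if length == 0:
        return 1.0  # assumption: two empty sequences count as a perfect match
    padded_decoded = decoded + [pad_token] * (length - len(decoded))
    padded_target = target + [pad_token] * (length - len(target))
    matches = sum(1 for d, t in zip(padded_decoded, padded_target) if d == t)
    return matches / length


# One gloss is missing from the prediction: 2 of 3 positions match.
print(token_accuracy(["TODAY", "RAIN"], ["TODAY", "RAIN", "HEAVY"]))
# Same length, one wrong gloss.
print(token_accuracy(["TODAY", "SUN"], ["TODAY", "RAIN"]))  # 0.5
```

Because only the shorter sentence is padded, a `<BAD>` token never matches anything, so every missing or extra gloss counts as an error. The metric is strict about position: a single inserted gloss shifts everything after it and marks those positions wrong as well.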

In [None]:
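def decode_sequence(input_seq):
    # encode the input; predict() returns [h, c] for the LSTM encoder and a single array for the GRU encoder
    states_value = encoder_model.predict(input_seq)
    if not isinstance(states_value, (list, tuple)):
        states_value = [states_value]
    states_value = list(states_value)
    # start decoding from the one-hot <START> token
    target_seq = np.zeros((1, 1, len(word_dict)))
    target_seq[0, 0, word_dict["<START>"]] = 1
    stop_condition = False
    decoded_sentence = []
    while not stop_condition:
        output_tokens, *states_value = decoder_model.predict([target_seq] + states_value)
        # greedy choice: take the most probable token at this step
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_word = reverse_word_dict[sampled_token_index]
        decoded_sentence.append(sampled_word)
        if sampled_word == "<END>" or len(decoded_sentence) > X_max_length:
            stop_condition = True
        # feed the sampled token back in as the next decoder input
        target_seq = np.zeros((1, 1, len(word_dict)))
        target_seq[0, 0, sampled_token_index] = 1
    return decoded_sentence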

In [None]:
if False:
    encoder_model = keras.models.load_model(ENCODER_PATH)
    decoder_model = keras.models.load_model(DECODER_PATH)
    # encoder_model = keras.models.load_model(f"../model/{MODE}_encoder_{weighted_suffix}_{BATCH_SIZE}_{EPOCH}_{LATENT_DIM}_{TRIAL}_extended_{new_total_epoch}.keras")
    # decoder_model = keras.models.load_model(f"../model/{MODE}_decoder_{weighted_suffix}_{BATCH_SIZE}_{EPOCH}_{LATENT_DIM}_{TRIAL}_extended_{new_total_epoch}.keras")
    
    # disable cudnn
    for layer in encoder_model.layers:
        if isinstance(layer, LSTM) or isinstance(layer, GRU):
            layer.use_cudnn = False
    for layer in decoder_model.layers:
        if isinstance(layer, LSTM) or isinstance(layer, GRU):
            layer.use_cudnn = False

In [None]:
# Evaluation on testing data
test_generator = KeypointGenerator(test_parser.get_train_id(), test_decoder_input, test_decoder_target, X_max_length, batch_size=1)
test_x_dir = os.path.join(CACHE_DIR, "test_x")
os.makedirs(test_x_dir, exist_ok=True)

testing_results = []

with open(RESULT_FILE_NAME, "w+", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Decoded Sentence", "Target Sentence", "Accuracy"])

    for i in range(len(test_parser.get_train_id())):
        if not os.path.exists(os.path.join(test_x_dir, f"batch_{i}.npy")) or not os.path.exists(os.path.join(test_x_dir, f"batch_{i}_decoder_target.npy")):
            (batch_x, batch_decoder_input), batch_decoder_target = test_generator.__getitem__(i, override_random=True)
            file_name = f"batch_{i}.npy"
            np.save(os.path.join(test_x_dir, file_name), batch_x)
            file_name = f"batch_{i}_decoder_target.npy"
            np.save(os.path.join(test_x_dir, file_name), batch_decoder_target)
        else:
            batch_x = np.load(os.path.join(test_x_dir, f"batch_{i}.npy"), mmap_mode="r")
            batch_decoder_target = np.load(os.path.join(test_x_dir, f"batch_{i}_decoder_target.npy"), mmap_mode="r")
        # print(batch_x.shape)
        # batch_x = np.load(os.path.join(test_x_dir, file_name), mmap_mode="r")
        decoder_sentence = decode_sequence(batch_x)
        decoder_sentence = decoder_sentence[:decoder_sentence.index("<END>")] if "<END>" in decoder_sentence else decoder_sentence
        target = [reverse_word_dict[i] for i in np.argmax(batch_decoder_target[0], axis=1)]
        target = target[:target.index("<END>")] if "<END>" in target else target
        # print(f"ID: {test_parser.get_train_id()[i]}")
        # print("Decoded sentence:", decoder_sentence)
        # print("Target sentence:", target)
        # pad the shorter sentence with <BAD> tokens
        if len(decoder_sentence) < len(target):
            decoder_sentence += ["<BAD>"] * (len(target) - len(decoder_sentence))
        elif len(decoder_sentence) > len(target):
            target += ["<BAD>"] * (len(decoder_sentence) - len(target))
        accuracy = sum([1 for i in range(len(target)) if decoder_sentence[i] == target[i]]) / len(target)
        # print("Accuracy:", accuracy)
        # print("=====================================", end="\n\n")
        result_row = [
            test_parser.get_train_id()[i],
            decoder_sentence,
            target,
            accuracy
        ]
        writer.writerow(result_row)
        testing_results.append(result_row)
        f.flush()

In [None]:
# Evaluate the total accuracy
test_acc_correct = 0
test_acc_total = 0
for row in testing_results:
    for j in range(len(row[1])):
        if row[1][j] == row[2][j]:
            test_acc_correct += 1
        test_acc_total += 1
test_acc = test_acc_correct / test_acc_total
print("Total accuracy on testing data:", test_acc)
avg_acc = sum([row[3] for row in testing_results]) / len(testing_results)
print("Average accuracy on testing data:", avg_acc)
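
The two numbers printed above answer different questions. The total accuracy pools all tokens, so long sentences weigh more. The average accuracy takes the mean of the per-sentence scores, so every sentence weighs the same. A minimal sketch on toy data; `total_and_average_accuracy` and `toy_pairs` are hypothetical names.

```python
def total_and_average_accuracy(pairs):
    """`pairs` is a list of (decoded, target) token lists that are already padded to equal length."""
    correct = sum(d == t for decoded, target in pairs for d, t in zip(decoded, target))
    total = sum(len(target) for _, target in pairs)
    per_sentence = [
        sum(d == t for d, t in zip(decoded, target)) / len(target)
        for decoded, target in pairs
    ]
    return correct / total, sum(per_sentence) / len(per_sentence)


toy_pairs = [
    (["A"], ["A"]),                                # 1 of 1 correct
    (["A", "B", "X", "X"], ["A", "B", "C", "D"]),  # 2 of 4 correct
]
total_acc, average_acc = total_and_average_accuracy(toy_pairs)
print(total_acc)    # 0.6  (3 correct tokens out of 5)
print(average_acc)  # 0.75 (mean of 1.0 and 0.5)
```

When sentence lengths vary a lot, the two values can differ noticeably, so it helps to report both, as the cell above does.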

In [None]:
# Evaluation on training data
training_results = []

train_result_file_name = RESULT_FILE_NAME.replace(".csv", "_train.csv")

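# sort control batches by their numeric counter; a plain sorted() would put
# "control_batch_10.npy" before "control_batch_2.npy" and misalign the IDs looked up below
batch_x_list = sorted(
    (x for x in os.listdir(x_dir) if x.startswith("control")),
    key=lambda name: int(name[len("control_batch_"):-len(".npy")]),
)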
# print(len(batch_x_list))
# print(batch_x_list)

# "w" already truncates an existing file; newline="" prevents blank rows in the CSV on Windows
with open(train_result_file_name, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Decoded Sentence", "Target Sentence", "Accuracy"])

    for batch in range(len(batch_x_list)):
        batch_x = np.load(os.path.join(x_dir, batch_x_list[batch]), mmap_mode="r")
        batch_decoder_target = np.load(os.path.join(decoder_target_dir, batch_x_list[batch]), mmap_mode="r")
        batch_length = len(batch_x)
        for i in range(batch_length):
            local_x = np.array([batch_x[i]])
            local_decoder_target = np.array(batch_decoder_target[i])

            decoder_sentence = decode_sequence(local_x)
            decoder_sentence = decoder_sentence[:decoder_sentence.index("<END>")] if "<END>" in decoder_sentence else decoder_sentence
            target = [reverse_word_dict[i] for i in np.argmax(local_decoder_target, axis=1)]
            target = target[:target.index("<END>")] if "<END>" in target else target
            # pad the shorter sentence with <BAD> tokens
            if len(decoder_sentence) < len(target):
                decoder_sentence += ["<BAD>"] * (len(target) - len(decoder_sentence))
            elif len(decoder_sentence) > len(target):
                target += ["<BAD>"] * (len(decoder_sentence) - len(target))
            accuracy = sum([1 for i in range(len(target)) if decoder_sentence[i] == target[i]]) / len(target)
            result_row = [
                train_parser.get_train_id()[batch * BATCH_SIZE + i],
                decoder_sentence,
                target,
                accuracy
            ]
            writer.writerow(result_row)
            training_results.append(result_row)
            f.flush()
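
The cached batch files carry their counter in the file name (`control_batch_{counter}.npy`), and the training evaluation relies on that counter to recover the sample IDs. A plain `sorted()` orders names lexicographically, so batch 10 comes before batch 2. A numeric sort key avoids this. A minimal sketch, with `batch_number` as a hypothetical helper:

```python
import re


def batch_number(file_name):
    """Extract the trailing batch counter from a name such as 'control_batch_12.npy'."""
    match = re.search(r"_batch_(\d+)\.npy$", file_name)
    if match is None:
        raise ValueError(f"Unexpected cache file name: {file_name}")
    return int(match.group(1))


names = [f"control_batch_{counter}.npy" for counter in (0, 1, 2, 10, 11)]

# Lexicographic order: 10 and 11 sort before 2.
print(sorted(names))
# ['control_batch_0.npy', 'control_batch_1.npy', 'control_batch_10.npy', 'control_batch_11.npy', 'control_batch_2.npy']

# Numeric order: matches the order in which the batches were generated.
print(sorted(names, key=batch_number))
# ['control_batch_0.npy', 'control_batch_1.npy', 'control_batch_2.npy', 'control_batch_10.npy', 'control_batch_11.npy']
```

The same helper also matches the `iteration_{i}_batch_{counter}.npy` names, but it only returns the batch counter, so those files would additionally need the iteration number in the sort key.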