# How well can we do with just a fixed prediction?

The goal of this notebook is two-fold:
1. Understanding the _baseline_ performance on the task: the normalised Levenshtein distance is unusual and I wanted to get a feel for what the minimum reasonable LB score is. **Understanding the metric is vital to understanding how to build the best models.**
2. Understanding the minimum possible submission. This is my first time using TFLite, so making this baseline helped me understand what steps are needed.

Thank you to @wonderingalice for their minimal submission notebook! https://www.dataset.com/code/wonderingalice/working-sample-submission-and-inference

In [1]:
import tensorflow as tf
from tensorflow.keras import layers, optimizers, constraints, regularizers
import numpy as np
import pandas as pd
import json

print("Tensorflow", tf.__version__)
!python --version

basedir = "/dataset/working/"
NUM_CHARACTERS = 59

Tensorflow 2.12.0
Python 3.10.10


In [2]:
df_train = pd.read_csv('/dataset/input/asl-fingerspelling/train.csv')
# Dummy features: we don't actually use any features
SEL_FEATURES = ['x_right_hand_0','y_right_hand_0']

c2p = json.load(open('/dataset/input/asl-fingerspelling/character_to_prediction_index.json', 'r'))
p2c = {p: c for c, p in c2p.items()}

## Finding the best constant prediction

[Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) gives us the smallest number of insertions, deletions and **substitutions** needed to turn our predicted string into the actual string.

**There's a subtle point here:** a substitution costs the same as a deletion, so if we know the average string has 12 characters, it makes sense to predict 12 characters. Any character you predict which ends up in the actual string gets you a point, while any character you mis-predict costs you the same as if you had made no prediction.

This is different to other metrics that incorporate precision, where over-predicting can harm you.
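To make the cost argument concrete, here is a minimal sketch that compares a few predictions against one target. The `levenshtein` function below is a hypothetical pure-Python helper written for illustration; it is not the `Levenshtein` package used in the rest of this notebook, though it should give the same unit-cost distances.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


truth = "abc"
for pred in ["ab", "abx", "abc", "abcx"]:
    print(repr(pred), levenshtein(pred, truth))
```

Predicting nothing for the last character (`"ab"`) and predicting the wrong character (`"abx"`) both cost 1, so guessing is free as long as the prediction is no longer than the target. Going past the target length (`"abcx"`) does cost an extra edit.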

To find the "average" string, the one with the shortest total distance to all the strings in the dataset, we use a greedy algorithm. It is a heuristic, so the result is not guaranteed to be optimal. We start with the empty string, and repeatedly find the best single character to insert _anywhere_ in the string, until we can no longer improve the training score.

In [3]:
from Levenshtein import distance
ally = df_train['phrase'].values
totaly = sum([len(y) for y in ally])

# Evaluate a constant prediction on the training set
def eval_string(s):
    d = 0
    for y in ally:
        d += distance(s, y)
    return (totaly - d) / totaly

# Greedy algorithm
best_str = ''
best_score = 0
chars = list(c2p.keys())

for i in range(20): # max length
    inner_best = best_str
    inner_best_score = best_score
    
    for position in range(len(best_str)+1): # at all insertion points
        for newchar in chars:               # try all characters
            new_str = best_str[:position] + str(newchar) + best_str[position:]
            score = eval_string(new_str)

            if score > inner_best_score:
                inner_best = new_str
                inner_best_score = score
                print(f'New best @ {len(inner_best)}="{inner_best}", score {inner_best_score:.4f}')

    if best_score >= inner_best_score:
        print('No improvement, best is', best_str)
        break
        
    best_str = inner_best
    best_score = inner_best_score
    print(f'Best str @ {len(best_str)}="{best_str}", score {best_score:.4f}')

New best @ 1=" ", score 0.0246
New best @ 1="a", score 0.0324
Best str @ 1="a", score 0.0324
New best @ 2=" a", score 0.0549
New best @ 2="ea", score 0.0563
New best @ 2="ae", score 0.0588
Best str @ 2="ae", score 0.0588
New best @ 3=" ae", score 0.0802
Best str @ 3=" ae", score 0.0802
New best @ 4="  ae", score 0.0896
New best @ 4="- ae", score 0.0966
New best @ 4=" oae", score 0.0970
New best @ 4=" aoe", score 0.0974
New best @ 4=" are", score 0.0982
Best str @ 4=" are", score 0.0982
New best @ 5="  are", score 0.1052
New best @ 5="- are", score 0.1134
Best str @ 5="- are", score 0.1134
New best @ 6=" - are", score 0.1209
New best @ 6="-- are", score 0.1259
New best @ 6="-e are", score 0.1262
Best str @ 6="-e are", score 0.1262
New best @ 7=" -e are", score 0.1356
New best @ 7="a-e are", score 0.1364
New best @ 7="-e- are", score 0.1382
New best @ 7="-e -are", score 0.1385
Best str @ 7="-e -are", score 0.1385
New best @ 8=" -e -are", score 0.1464
New best @ 8="a-e -are", score 0.1476

## Turning this into a TFLite model

I found that the path of least resistance here was to turn this constant prediction into a Keras model, and then convert that into TFLite. It's unclear to me what operations are allowed in TFLite models, but this provides a framework for embedding any code you might want into an arbitrary Keras layer.

It appears that dataset's evaluation works like this:
- Your model is run **once per video** in the test set (each video is one batch).
- You receive an input of shape `(N_FRAMES, N_FEATURES)`. The normal "time-series" way of doing this would be `(1, N_FRAMES, N_FEATURES)`, so note this is different.
- You return an output of shape `(N_CHARS, 59)` where 59 is the number of possible characters. `N_CHARS` is up to your model, in our case it's constant.
- Evaluation is done on **the argmax of your predictions**, so the actual probability values don't matter.
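As a rough illustration of the last two points, the sketch below decodes an `(N_CHARS, 59)` array by taking the argmax of each row. The mapping `toy_c2p` and the helper `decode` are hypothetical stand-ins (the real mapping comes from `character_to_prediction_index.json`), and the scorer's exact decoding is an assumption based on the behaviour described above.

```python
import numpy as np

NUM_CHARACTERS = 59

# Hypothetical toy mapping, for illustration only
toy_c2p = {' ': 0, 'a': 1, 'e': 2, 'r': 3}
toy_p2c = {p: c for c, p in toy_c2p.items()}


def decode(output: np.ndarray) -> str:
    """Take the argmax of each row and map the indices back to characters."""
    return ''.join(toy_p2c[int(i)] for i in output.argmax(axis=1))


target = " are"
idx = [toy_c2p[c] for c in target]
rows = np.arange(len(target))

# One-hot output: probability 1 on the chosen character
one_hot = np.zeros((len(target), NUM_CHARACTERS))
one_hot[rows, idx] = 1.0

# Low-confidence output with the same argmax in every row
soft = np.full((len(target), NUM_CHARACTERS), 0.01)
soft[rows, idx] = 0.02

print(repr(decode(one_hot)), repr(decode(soft)))
```

Both arrays decode to the same string, which is why a one-hot constant is enough for this baseline.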

In [4]:
const_pred = np.zeros((len(best_str), 59))
for i, c in enumerate(best_str):
    const_pred[i, c2p[c]] = 1

In [5]:
# Define the custom layer
from tensorflow.keras.layers import Layer, Input
class ConstantLayer(Layer):
    def __init__(self, constant_vector, name=None):
        super(ConstantLayer, self).__init__(name=name)
        self.constant_vector = tf.Variable(initial_value=constant_vector, trainable=False, dtype=tf.float32)

    def call(self, inputs):
        return self.constant_vector

In [6]:
input_layer = Input(shape=(len(SEL_FEATURES),), name='inputs')  # One value per selected feature; the input is ignored by the model
output_layer = ConstantLayer(const_pred, name='outputs')(input_layer)

model = tf.keras.models.Model(inputs=input_layer, outputs=output_layer)

In [7]:
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
model_path = 'model.tflite'

with open(model_path, 'wb') as f:
    f.write(tflite_model)

!zip submission.zip  './model.tflite' './inference_args.json'

  adding: model.tflite (deflated 86%)
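The `zip` command references `inference_args.json`, but no cell in this notebook writes that file, and the zip output lists only `model.tflite`. A minimal sketch of creating it before zipping follows. The `selected_columns` key is an assumption about the expected format and is not confirmed anywhere in this notebook, so check it against the competition's submission instructions.

```python
import json

# Same dummy features as defined earlier in the notebook
SEL_FEATURES = ['x_right_hand_0', 'y_right_hand_0']

# Assumption: the scorer reads the list of input columns from a
# "selected_columns" key in inference_args.json
inference_args = {'selected_columns': SEL_FEATURES}

with open('inference_args.json', 'w') as f:
    json.dump(inference_args, f)
```

With the file in place, re-running the `zip` cell should add both files to `submission.zip`.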


Overall, we see 0.160 on CV, and 0.157 on LB - pretty consistent!