## Section 11: Inference from the Transformer Model

Now, we are finally going to use the test set to see how good our trained model is.

Because the model uses custom layers and functions, we need to create a custom object scope to load the saved model.

The transformer model outputs token indices. We need the vectorizer's vocabulary to look up the word that each index represents, and we must reuse the same vectorizer that was used to create the dataset so that the indices stay consistent.
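
As a minimal sketch of that lookup, the snippet below uses a made-up six-word vocabulary and made-up scores instead of the real vectorizer and model. The names `toy_vocab` and `logits` are hypothetical; the only convention borrowed from Keras is that `TextVectorization` reserves index 0 for padding and index 1 for `[UNK]`.

```python
import numpy as np

# Hypothetical toy vocabulary (not the real Italian vocabulary).
# Index 0 is padding and index 1 is the out-of-vocabulary token,
# following the Keras TextVectorization convention.
toy_vocab = ["", "[UNK]", "[start]", "[end]", "ciao", "mondo"]

# Made-up scores over the vocabulary for a single output position
logits = np.array([0.1, 0.0, 0.2, 0.3, 2.5, 0.4])

# The model gives an index; the vocabulary turns it back into a word
token_index = int(np.argmax(logits))
word = toy_vocab[token_index]

# The reverse direction: a word missing from the vocabulary maps to [UNK]
word_to_index = {w: i for i, w in enumerate(toy_vocab)}
unknown_index = word_to_index.get("gatto", 1)

print(token_index, word, unknown_index)
```

If the vocabulary used for the lookup differed from the one used in training, the same index would point at a different word, which is why the vectorizer has to be the saved one.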

Create a loop that generates one token at a time. In other words, do not take the entire translated sentence from a single call to the model; take only the next word at each step, and stop when you see the end sentinel. The first word is the one generated from the start sentinel alone. This is the reason the target sentences were processed this way in Section 2.

The code is as follows:

In [22]:
import pickle
import tensorflow as tf
from positional_encoding import pos_enc_matrix, PositionalEmbedding
# from learning_schedule import CustomSchedule
from learning_params import CustomSchedule, masked_loss, masked_accuracy

with open("key_values.pickle", "rb") as fp:
    key_vals = pickle.load(fp)

with open(f"vectorized_ENGvoc_{key_vals['vocab_size_eng']}_ITAvoc_{key_vals['vocab_size_ita']}_seqLen_{key_vals['seq_len']}.pickle", "rb") as fp:
    data = pickle.load(fp)    
    
with open("optimizer_values.pickle", "rb") as fp:
    optim_vals = pickle.load(fp)

with open("customSchedule.pickle", "rb") as fp:
    schedule = pickle.load(fp)

In [11]:
from tensorflow.keras.layers import TextVectorization

eng_vectorizer = TextVectorization.from_config(data["engvec_config"])
eng_vectorizer.set_weights(data["engvec_weights"])
eng_vectorizer.set_vocabulary(data["engvec_vocabulary"])
ita_vectorizer = TextVectorization.from_config(data["itavec_config"])
ita_vectorizer.set_weights(data["itavec_weights"])
ita_vectorizer.set_vocabulary(data["itavec_vocabulary"])

In [23]:
custom_objects = {"PositionalEmbedding": PositionalEmbedding,
                  "CustomSchedule": CustomSchedule,
                  "masked_loss": masked_loss,
                  "masked_accuracy": masked_accuracy}
with tf.keras.utils.custom_object_scope(custom_objects):
    model = tf.keras.models.load_model("ENG-ITA-transformer.h5")

In [33]:
import re
import numpy as np

def re_sub_accented(string):
    """Revert Italian accents"""
    # Replace the tripled vowels (acute accents) first; otherwise the
    # doubled-vowel patterns below would consume part of each triple
    string = re.sub(r'aaa', 'á', string)
    string = re.sub(r'eee', 'é', string)
    string = re.sub(r'iii', 'í', string)
    string = re.sub(r'ooo', 'ó', string)
    string = re.sub(r'uuu', 'ú', string)
    # Then replace the doubled vowels (grave accents)
    string = re.sub(r'aa', 'à', string)
    string = re.sub(r'ee', 'è', string)
    string = re.sub(r'ii', 'ì', string)
    string = re.sub(r'oo', 'ò', string)
    string = re.sub(r'uu', 'ù', string)
    return string

def translate(sentence):
    """Translate an English sentence into Italian, one token at a time"""
    enc_tokens = eng_vectorizer([sentence])
    # index -> word table for the Italian vocabulary
    lookup = list(data["itavec_vocabulary"])
    start_sentinel, end_sentinel = "[start]", "[end]"
    output_sentence = [start_sentinel]
    # generate the translated sentence word by word
    for i in range(key_vals["seq_len"]):
        # re-vectorize everything generated so far (padded to seq_len+1)
        vector = ita_vectorizer([" ".join(output_sentence)])
        assert vector.shape == (1, key_vals["seq_len"]+1)
        dec_tokens = vector[:, :-1]
        assert dec_tokens.shape == (1, key_vals["seq_len"])
        pred = model([enc_tokens, dec_tokens])
        assert pred.shape == (1, key_vals["seq_len"], key_vals["vocab_size_ita"])
        # position i of the output holds the prediction for the next word
        word = lookup[np.argmax(pred[0, i, :])]
        word = re_sub_accented(word)
        output_sentence.append(word)
        if word == end_sentinel:
            break
    return output_sentence
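
A note on chained `re.sub` calls such as those used to revert accents: when one pattern is a prefix of another (`ee` and `eee`), the order of the substitutions changes the result. The sketch below illustrates this with two hypothetical helpers, `revert_short_first` and `revert_long_first`. It assumes, purely for illustration, that "perché" was encoded as `perc` + `h` + `eee`; the actual encoding is whatever the preprocessing step produced.

```python
import re

def revert_short_first(s):
    """Doubled pattern first: the tripled pattern can no longer match."""
    s = re.sub(r"ee", "è", s)
    s = re.sub(r"eee", "é", s)  # never matches, 'ee' was already consumed
    return s

def revert_long_first(s):
    """Tripled pattern first, then the doubled one."""
    s = re.sub(r"eee", "é", s)
    s = re.sub(r"ee", "è", s)
    return s

encoded = "perch" + "eee"  # assumed encoding of "perché", for illustration only
print(revert_short_first(encoded))
print(revert_long_first(encoded))
```

Applying the longer pattern first is the usual way to keep a shorter pattern from consuming part of a longer match.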

In [34]:
import random
import numpy as np
import re

test_count = 20
random.seed(0)
for n in range(test_count):
    english_sentence, italian_sentence = random.choice(data["test"])
    translated = translate(english_sentence)
    italian_sentence = re_sub_accented(italian_sentence)
    #translated = re_sub_accented(translated)
    print(f"Test {n}:")
    print(f"{english_sentence}")
    print(f"== {italian_sentence}")
    print(f"-> {' '.join(translated)}")
    print()

Test 0:
it is time to leave .
== [start] è ora di partire . [end]
-> [start] è andarsene . [end]

Test 1:
i like studying english .
== [start] mi piace studiare inglese . [end]
-> [start] a me piace studiare l inglese . [end]

Test 2:
do you want anything ?
== [start] vuole qualcosa ? [end]
-> [start] vuoi qualcosa ? [end]

Test 3:
do not tell my mom .
== [start] non lo dire a mia mamma . [end]
-> [start] non lo dire a mia mamma . [end]

Test 4:
what country were you born in ?
== [start] in quale paese è nata ? [end]
-> [start] in quale paese sei nato ? [end]

Test 5:
he rarely stays home on sunday .
== [start] lui resta raramente a casa la domenica . [end]
-> [start] lui sta raramente a casa la domenica . [end]

Test 6:
i am going to win this time .
== [start] vincerò stavolta . [end]
-> [start] io vincerà questa è è è è è è è è è è è è è è è è è

Test 7:
how did we come here ?
== [start] noi come siamo venuti qui ? [end]
-> [start] come siamo venuti qui ? [end]

Test 8:
we sent you a

In each test, the line beginning with == is the expected output, and the line beginning with -> is the output from the transformer.

The token [UNK] means “unknown”, that is, out-of-vocabulary, and should appear rarely. Comparing the outputs, you should see that the result is quite accurate, though not perfect. For example, ...

You generated the translated sentence word by word, but the transformer in fact outputs a prediction for every position of the sentence in a single call. Try modifying the program to decode the entire transformer output pred inside the for-loop, to see how the transformer gives you a better sentence as you provide more leading words in dec_tokens.
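
One way to approach that exercise is sketched below. The helper `decode_all_positions` is hypothetical (it is not part of the original program). It assumes `pred` has shape `(1, seq_len, vocab_size)` and that `lookup` is the index-to-word list, as in `translate()`. A toy array stands in for the real model output so that the snippet runs on its own.

```python
import numpy as np

def decode_all_positions(pred, lookup):
    """Return the most likely word at every position of a (1, seq_len, vocab) output."""
    pred = np.asarray(pred)                  # accepts NumPy arrays or eager tensors
    token_ids = np.argmax(pred[0], axis=-1)  # one index per position, shape (seq_len,)
    return [lookup[t] for t in token_ids]

# Toy stand-ins for the real vocabulary and model output (assumptions)
toy_lookup = ["", "[UNK]", "[start]", "[end]", "ciao", "mondo"]
toy_pred = np.zeros((1, 3, len(toy_lookup)))
toy_pred[0, 0, 4] = 1.0  # position 0 -> "ciao"
toy_pred[0, 1, 5] = 1.0  # position 1 -> "mondo"
toy_pred[0, 2, 3] = 1.0  # position 2 -> "[end]"

decoded = decode_all_positions(toy_pred, toy_lookup)
print(decoded)
```

Inside the loop of `translate()`, a call such as `print(i, decode_all_positions(pred, lookup))` placed right after `pred` is computed would show the full decoded output at each step.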