# Chollet chp11 p4, seq-to-seq, Colab GPU run 2.  
Author: Jennifer E Yoon  
Date: April 2, 2022 11pm   
Run 2 on Colab: an initial pass through the whole notebook, without stopping to experiment.  

### My Note:  
Date: June 12, 2022  
This notebook has an end-to-end Transformer encoder and decoder example, using English-to-Spanish translation.  
It would be easier if I knew a bit of Spanish. French and English examples are easier for me, since I know a bit of French.  
Try this with French? Korean?  
How is the mask from the positional embedding used?  
Need to step through the code in this notebook carefully.  
Look at the saved model outputs, which are saved outside the repo.  
Jennifer Yoon

## Beyond text classification: Sequence-to-sequence learning

### A machine translation example

In [1]:
!wget http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip
!unzip -q spa-eng.zip

--2022-04-03 03:25:11--  http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 64.233.189.128, 108.177.97.128, 108.177.125.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|64.233.189.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2638744 (2.5M) [application/zip]
Saving to: ‘spa-eng.zip’


2022-04-03 03:25:11 (106 MB/s) - ‘spa-eng.zip’ saved [2638744/2638744]



In [2]:
text_file = "spa-eng/spa.txt"
with open(text_file, encoding="utf-8") as f:  # explicit encoding: the file contains accented Spanish text
    lines = f.read().split("\n")[:-1]
text_pairs = []
for line in lines:
    english, spanish = line.split("\t")
    spanish = "[start] " + spanish + " [end]"
    text_pairs.append((english, spanish))
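
My sketch of what the parsing cell above does, on a tiny in-memory string instead of the real file. The two-line `raw` string is a hypothetical stand-in for `spa.txt` (its sentences are taken from pairs printed elsewhere in this notebook), and `parse_lines` is a helper name I made up for illustration. It also shows why the `[:-1]` is there: the file ends with a newline, so `split("\n")` leaves an empty last element.

```python
# Hypothetical two-line file content in the same layout as spa.txt:
# one "english<TAB>spanish" pair per line, ending with a trailing newline.
raw = (
    "Tom is a fighter.\tTom es un luchador.\n"
    "Never tell a lie!\t¡No mientas nunca!\n"
)

# split("\n") leaves an empty string after the final newline; [:-1] drops it.
all_parts = raw.split("\n")
lines = all_parts[:-1]

def parse_lines(lines):
    """Turn tab-separated lines into (english, "[start] ... [end]") pairs."""
    pairs = []
    for line in lines:
        english, spanish = line.split("\t")
        pairs.append((english, "[start] " + spanish + " [end]"))
    return pairs

pairs = parse_lines(lines)
for pair in pairs:
    print(pair)
```

Without the `[:-1]`, the last element would be `""`, and unpacking `"".split("\t")` into two names would raise a `ValueError`.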

In [8]:
text_pairs[0:10]

[('Everyone was listening very carefully.',
  '[start] Todos estaban escuchando atentamente. [end]'),
 ('Tom said that he wanted to learn French.',
  '[start] Tom dijo que quería aprender francés. [end]'),
 ('My cigar went out. Will you give me a light?',
  '[start] Se me apagó el cigarro. ¿Quiere usted darme lumbre? [end]'),
 ('He gets tired easily.', '[start] Se cansa con facilidad. [end]'),
 ("I don't believe he is a lawyer.", '[start] Creo que no es abogado. [end]'),
 ('I thought you and Tom were friends.',
  '[start] Pensé que Tom y vos eran amigos. [end]'),
 ('Tom is a fighter.', '[start] Tom es un luchador. [end]'),
 ('Everyone in the class learned the poem by heart.',
  '[start] Todos en la clase aprendieron el poema de memoria. [end]'),
 ('How much did the glasses cost?',
  '[start] ¿Cuánto costaron los lentes? [end]'),
 ('The money was all there. Nobody touched it.',
  '[start] Estaba íntegro el dinero, nadie lo tocó. [end]')]

In [7]:
import random
print(random.choice(text_pairs))

('I saw Tom on Monday.', '[start] Tom y yo nos vimos el lunes. [end]')


In [5]:
import random
random.shuffle(text_pairs)
num_val_samples = int(0.15 * len(text_pairs))
num_train_samples = len(text_pairs) - 2 * num_val_samples
train_pairs = text_pairs[:num_train_samples]
val_pairs = text_pairs[num_train_samples:num_train_samples + num_val_samples]
test_pairs = text_pairs[num_train_samples + num_val_samples:]
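
A small sketch of the split arithmetic in the cell above, to check my reading of it: 15% goes to validation, the same count to test, and whatever is left goes to training. The function name `split_sizes` and the example count of 1000 are my own assumptions, not from the book.

```python
def split_sizes(num_pairs, val_fraction=0.15):
    """Return (train, val, test) sizes using the same arithmetic as the notebook."""
    num_val = int(val_fraction * num_pairs)
    num_train = num_pairs - 2 * num_val
    # The test slice is everything after train + val, so it gets the remainder.
    num_test = num_pairs - num_train - num_val
    return num_train, num_val, num_test

n_train, n_val, n_test = split_sizes(1000)
print(n_train, n_val, n_test)
```

Because the training size is defined as the total minus twice the validation size, the test set always ends up the same size as the validation set, and any rounding remainder lands in training.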

In [9]:
val_pairs[0:3]

[("I still can't dance.", '[start] Sigo sin saber bailar. [end]'),
 ('We count on you.', '[start] Contamos con ustedes. [end]'),
 ('Do you speak Swahili?', '[start] ¿Hablas suajili? [end]')]

In [10]:
test_pairs[0:3]

[('The celebrations culminated in a spectacular fireworks display.',
  '[start] El festival terminó con una espectacular exhibición de fuegos artificiales. [end]'),
 ("I've made a list of things I'd like to buy.",
  '[start] He hecho una lista de las cosas que me gustaría comprar. [end]'),
 ('Never tell a lie!', '[start] ¡No mientas nunca! [end]')]

In [11]:
train_pairs[0:3]

[('Everyone was listening very carefully.',
  '[start] Todos estaban escuchando atentamente. [end]'),
 ('Tom said that he wanted to learn French.',
  '[start] Tom dijo que quería aprender francés. [end]'),
 ('My cigar went out. Will you give me a light?',
  '[start] Se me apagó el cigarro. ¿Quiere usted darme lumbre? [end]')]

**Vectorizing the English and Spanish text pairs**

In [13]:
import tensorflow as tf
import string
import re
from tensorflow import keras  # edit: import added here
from tensorflow.keras import layers  # edit: import added here to fix the error "layers not defined"

strip_chars = string.punctuation + "¿"
strip_chars = strip_chars.replace("[", "")
strip_chars = strip_chars.replace("]", "")

def custom_standardization(input_string):
    lowercase = tf.strings.lower(input_string)
    return tf.strings.regex_replace(
        lowercase, f"[{re.escape(strip_chars)}]", "")

vocab_size = 15000
sequence_length = 20

source_vectorization = layers.TextVectorization(
    max_tokens=vocab_size,
    output_mode="int",
    output_sequence_length=sequence_length,
)
target_vectorization = layers.TextVectorization(
    max_tokens=vocab_size,
    output_mode="int",
    output_sequence_length=sequence_length + 1,
    standardize=custom_standardization,
)
train_english_texts = [pair[0] for pair in train_pairs]
train_spanish_texts = [pair[1] for pair in train_pairs]
source_vectorization.adapt(train_english_texts)
target_vectorization.adapt(train_spanish_texts)
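
To see what `custom_standardization` does to one sentence without running TensorFlow, here is a plain-Python approximation. It assumes `str.lower` and `re.sub` behave closely enough to `tf.strings.lower` and `tf.strings.regex_replace` for this purpose; `standardize_py` is a name I made up. One caveat: `str.lower` also lowercases accented letters, while the decoded outputs later in this notebook still show a capitalized `Él`, which suggests the TensorFlow version left it untouched. So treat this as a sketch only.

```python
import re
import string

# Same character set as the notebook: all ASCII punctuation plus "¿",
# but keeping "[" and "]" so that "[start]" and "[end]" survive.
strip_chars = string.punctuation + "¿"
strip_chars = strip_chars.replace("[", "").replace("]", "")

def standardize_py(text):
    """Plain-Python stand-in for custom_standardization (an approximation)."""
    lowercase = text.lower()
    return re.sub(f"[{re.escape(strip_chars)}]", "", lowercase)

cleaned = standardize_py("[start] ¿Hablas suajili? [end]")
print(cleaned)
```

The square brackets are kept on purpose: the default standardization would strip them, and `[start]` and `[end]` would then no longer be distinct tokens.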

In [14]:
train_english_texts[0]


'Everyone was listening very carefully.'

In [15]:
train_spanish_texts[0]

'[start] Todos estaban escuchando atentamente. [end]'

In [17]:
target_vectorization?

**Preparing datasets for the translation task**

In [18]:
batch_size = 64

def format_dataset(eng, spa):
    eng = source_vectorization(eng)
    spa = target_vectorization(spa)
    return ({
        "english": eng,
        "spanish": spa[:, :-1],
    }, spa[:, 1:])

def make_dataset(pairs):
    eng_texts, spa_texts = zip(*pairs)
    eng_texts = list(eng_texts)
    spa_texts = list(spa_texts)
    dataset = tf.data.Dataset.from_tensor_slices((eng_texts, spa_texts))
    dataset = dataset.batch(batch_size)
    dataset = dataset.map(format_dataset, num_parallel_calls=4)
    return dataset.shuffle(2048).prefetch(16).cache()

train_ds = make_dataset(train_pairs)
val_ds = make_dataset(val_pairs)

In [19]:
for inputs, targets in train_ds.take(1):
    print(f"inputs['english'].shape: {inputs['english'].shape}")
    print(f"inputs['spanish'].shape: {inputs['spanish'].shape}")
    print(f"targets.shape: {targets.shape}")

inputs['english'].shape: (64, 20)
inputs['spanish'].shape: (64, 20)
targets.shape: (64, 20)
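
The `spa[:, :-1]` and `spa[:, 1:]` slices in `format_dataset` are the part I want to be sure about, so here is a sketch on a single row using plain lists. The token ids are hypothetical (2 and 3 stand in for `[start]` and `[end]`, 0 is padding); the real ids depend on the adapted vocabulary.

```python
sequence_length = 20

# One hypothetical padded Spanish row of length sequence_length + 1.
spa = [2, 45, 17, 9, 3] + [0] * (sequence_length + 1 - 5)

decoder_input = spa[:-1]  # what the decoder is fed (drops the last position)
target = spa[1:]          # what it should predict (drops the first position)

# At each step the decoder sees token i and must predict token i + 1.
for seen, expected in list(zip(decoder_input, target))[:4]:
    print(f"after token {seen} predict {expected}")
```

This is why the target vectorizer uses `sequence_length + 1`: after the one-step offset, both the decoder input and the target have length 20, matching the shapes printed above.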


### Sequence-to-sequence learning with RNNs

**GRU-based encoder**

In [20]:
from tensorflow import keras
from tensorflow.keras import layers

embed_dim = 256
latent_dim = 1024

source = keras.Input(shape=(None,), dtype="int64", name="english")
x = layers.Embedding(vocab_size, embed_dim, mask_zero=True)(source)
encoded_source = layers.Bidirectional(
    layers.GRU(latent_dim), merge_mode="sum")(x)

**GRU-based decoder and the end-to-end model**

In [21]:
past_target = keras.Input(shape=(None,), dtype="int64", name="spanish")
x = layers.Embedding(vocab_size, embed_dim, mask_zero=True)(past_target)
decoder_gru = layers.GRU(latent_dim, return_sequences=True)
x = decoder_gru(x, initial_state=encoded_source)
x = layers.Dropout(0.5)(x)
target_next_step = layers.Dense(vocab_size, activation="softmax")(x)
seq2seq_rnn = keras.Model([source, past_target], target_next_step)

**Training our recurrent sequence-to-sequence model**

In [22]:
seq2seq_rnn.compile(
    optimizer="rmsprop",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])
seq2seq_rnn.fit(train_ds, epochs=15, validation_data=val_ds)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f61f0409290>

**Translating new sentences with our RNN encoder and decoder**

In [23]:
import numpy as np
spa_vocab = target_vectorization.get_vocabulary()
spa_index_lookup = dict(zip(range(len(spa_vocab)), spa_vocab))
max_decoded_sentence_length = 20

def decode_sequence(input_sentence):
    tokenized_input_sentence = source_vectorization([input_sentence])
    decoded_sentence = "[start]"
    for i in range(max_decoded_sentence_length):
        tokenized_target_sentence = target_vectorization([decoded_sentence])
        next_token_predictions = seq2seq_rnn.predict(
            [tokenized_input_sentence, tokenized_target_sentence])
        sampled_token_index = np.argmax(next_token_predictions[0, i, :])
        sampled_token = spa_index_lookup[sampled_token_index]
        decoded_sentence += " " + sampled_token
        if sampled_token == "[end]":
            break
    return decoded_sentence

test_eng_texts = [pair[0] for pair in test_pairs]
for _ in range(20):
    input_sentence = random.choice(test_eng_texts)
    print("-")
    print(input_sentence)
    print(decode_sequence(input_sentence))

-
There are lots of things to do.
[start] hay muchas cosas que hacer [end]
-
I sometimes feel hungry in the middle of the night.
[start] a veces me [UNK] en hambre de la noche [end]
-
I am thinking about that matter.
[start] estoy pensando en eso [end]
-
I have a car.
[start] tengo un coche [end]
-
He got up to see if he had turned off the light in the kitchen.
[start] Él se [UNK] a ver si la [UNK] se lo en la televisión [end]
-
It was a dangerous time.
[start] fue un momento [end]
-
When I was your age, I was already married.
[start] cuando era tu edad ya estaba [end]
-
What's the secret to success?
[start] cuál es el secreto de su éxito [end]
-
He took a day off.
[start] Él se tomó un día [end]
-
We're surrounded.
[start] estamos de casa [end]
-
Don't ask me why but, he ran away when he saw me.
[start] no me [UNK] por qué me dijo cuando él me había dado [end]
-
I don't remember anyone named Tom.
[start] no recuerdo a tom nadie [end]
-
I missed my bus this morning.
[start] perdí mi vi
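
The decoding loop above is greedy: at each step it takes the single most likely next token and appends it, until `[end]` or the length limit. Here is a sketch of that control flow with a toy lookup table in place of the trained model. `TOY_NEXT`, `toy_next_token` and `greedy_decode` are all hypothetical names; the real model returns a probability vector per position, and the notebook picks from it with `np.argmax`.

```python
# Hypothetical stand-in for the trained model: maps the last decoded token
# to the next one. The real model scores every vocabulary entry instead.
TOY_NEXT = {
    "[start]": "tengo",
    "tengo": "un",
    "un": "coche",
    "coche": "[end]",
}

def toy_next_token(decoded_tokens):
    return TOY_NEXT.get(decoded_tokens[-1], "[end]")

def greedy_decode(next_token_fn, max_len=20):
    """Append one token at a time until "[end]" or max_len tokens are added."""
    decoded = ["[start]"]
    for _ in range(max_len):
        token = next_token_fn(decoded)
        decoded.append(token)
        if token == "[end]":
            break
    return " ".join(decoded)

full = greedy_decode(toy_next_token)
cut_short = greedy_decode(toy_next_token, max_len=2)
print(full)
print(cut_short)
```

The second call shows what happens when the length limit is reached first: the sentence stops without an `[end]` token.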

### Sequence-to-sequence learning with Transformer

#### The Transformer decoder

**The `TransformerDecoder`**

In [24]:
class TransformerDecoder(layers.Layer):
    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention_1 = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim)
        self.attention_2 = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim)
        self.dense_proj = keras.Sequential(
            [layers.Dense(dense_dim, activation="relu"),
             layers.Dense(embed_dim),]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()
        self.layernorm_3 = layers.LayerNormalization()
        self.supports_masking = True

    def get_config(self):
        config = super().get_config()
        config.update({
            "embed_dim": self.embed_dim,
            "num_heads": self.num_heads,
            "dense_dim": self.dense_dim,
        })
        return config

    def get_causal_attention_mask(self, inputs):
        input_shape = tf.shape(inputs)
        batch_size, sequence_length = input_shape[0], input_shape[1]
        i = tf.range(sequence_length)[:, tf.newaxis]
        j = tf.range(sequence_length)
        mask = tf.cast(i >= j, dtype="int32")
        mask = tf.reshape(mask, (1, input_shape[1], input_shape[1]))
        mult = tf.concat(
            [tf.expand_dims(batch_size, -1),
             tf.constant([1, 1], dtype=tf.int32)], axis=0)
        return tf.tile(mask, mult)

    def call(self, inputs, encoder_outputs, mask=None):
        causal_mask = self.get_causal_attention_mask(inputs)
        # Default to no padding mask so the name is defined when mask is None.
        padding_mask = None
        if mask is not None:
            padding_mask = tf.cast(
                mask[:, tf.newaxis, :], dtype="int32")
            padding_mask = tf.minimum(padding_mask, causal_mask)
        attention_output_1 = self.attention_1(
            query=inputs,
            value=inputs,
            key=inputs,
            attention_mask=causal_mask)
        attention_output_1 = self.layernorm_1(inputs + attention_output_1)
        attention_output_2 = self.attention_2(
            query=attention_output_1,
            value=encoder_outputs,
            key=encoder_outputs,
            attention_mask=padding_mask,
        )
        attention_output_2 = self.layernorm_2(
            attention_output_1 + attention_output_2)
        proj_output = self.dense_proj(attention_output_2)
        return self.layernorm_3(attention_output_2 + proj_output)
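
To understand `get_causal_attention_mask` and the `tf.minimum` step, I find it helps to build the same masks with plain lists for one short sequence. This is a sketch with no batch dimension; `causal_mask` and `combine_with_padding` are hypothetical helper names, and the padding pattern is made up.

```python
def causal_mask(n):
    # Row i is the query position, column j the key position;
    # 1 means "position i may attend to position j" (only j <= i).
    return [[int(i >= j) for j in range(n)] for i in range(n)]

def combine_with_padding(causal, padding):
    # padding[j] is 1 for a real token and 0 for padding. The elementwise
    # minimum keeps an entry only if both masks allow it.
    return [[min(c, p) for c, p in zip(row, padding)] for row in causal]

mask = causal_mask(4)
for row in mask:
    print(row)

# Hypothetical sequence whose last position is padding.
combined = combine_with_padding(mask, [1, 1, 1, 0])
print(combined[-1])
```

The lower-triangular shape is what stops a position from looking at later tokens. In the layer, the padding mask of shape `(batch, 1, seq)` is broadcast against the causal mask of shape `(batch, seq, seq)`, which is what the row-by-row `zip` imitates here.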

**Transformer encoder** implemented as a subclassed Layer  
Copied from Chapter 11 Part3 Transformer notebook

In [30]:
# TransformerEncoder class definition  
#import tensorflow as tf
#from tensorflow import keras
#from tensorflow.keras import layers

# Copied from p3 Transformers notebook 

class TransformerEncoder(layers.Layer):
    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim)
        self.dense_proj = keras.Sequential(
            [layers.Dense(dense_dim, activation="relu"),
             layers.Dense(embed_dim),]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()

    def call(self, inputs, mask=None):
        if mask is not None:
            mask = mask[:, tf.newaxis, :]
        attention_output = self.attention(
            inputs, inputs, attention_mask=mask)
        proj_input = self.layernorm_1(inputs + attention_output)
        proj_output = self.dense_proj(proj_input)
        return self.layernorm_2(proj_input + proj_output)

    def get_config(self):
        config = super().get_config()
        config.update({
            "embed_dim": self.embed_dim,
            "num_heads": self.num_heads,
            "dense_dim": self.dense_dim,
        })
        return config

#### Putting it all together: A Transformer for machine translation

**PositionalEmbedding layer**

In [25]:
class PositionalEmbedding(layers.Layer):
    def __init__(self, sequence_length, input_dim, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.token_embeddings = layers.Embedding(
            input_dim=input_dim, output_dim=output_dim)
        self.position_embeddings = layers.Embedding(
            input_dim=sequence_length, output_dim=output_dim)
        self.sequence_length = sequence_length
        self.input_dim = input_dim
        self.output_dim = output_dim

    def call(self, inputs):
        length = tf.shape(inputs)[-1]
        positions = tf.range(start=0, limit=length, delta=1)
        embedded_tokens = self.token_embeddings(inputs)
        embedded_positions = self.position_embeddings(positions)
        return embedded_tokens + embedded_positions

    def compute_mask(self, inputs, mask=None):
        return tf.math.not_equal(inputs, 0)

    def get_config(self):
        config = super(PositionalEmbedding, self).get_config()
        config.update({
            "output_dim": self.output_dim,
            "sequence_length": self.sequence_length,
            "input_dim": self.input_dim,
        })
        return config
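
A sketch toward my question about the mask: `PositionalEmbedding` adds a token vector and a position vector at each position, and `compute_mask` marks every non-zero token id as a real token. The tiny tables below are hypothetical stand-ins for the learned weights of the two `Embedding` layers, with 2-dimensional vectors to keep it readable.

```python
# Hypothetical embedding tables; the real ones are learned during training.
token_table = {0: [0.0, 0.0], 5: [1.0, 2.0], 7: [3.0, 4.0]}
position_table = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]

def positional_embedding(token_ids):
    """Add the token vector and the position vector at every position."""
    return [
        [t + p for t, p in zip(token_table[tok], position_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]

def compute_mask(token_ids):
    """True for real tokens, False for padding (token id 0)."""
    return [tok != 0 for tok in token_ids]

ids = [5, 7, 0]
embedded = positional_embedding(ids)
mask = compute_mask(ids)
print(embedded)
print(mask)
```

As far as I understand it, Keras passes the result of `compute_mask` along to the next layer as its `mask` argument, which is where the encoder and decoder `call` methods pick it up.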

**End-to-end Transformer**

In [31]:
embed_dim = 256
dense_dim = 2048
num_heads = 8

encoder_inputs = keras.Input(shape=(None,), dtype="int64", name="english")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
encoder_outputs = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)
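# Note: TransformerEncoder must already be defined, otherwise this fails with "not defined" (it is the encoder, not the decoder; the class is copied from the Part 3 notebook in an earlier cell).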

decoder_inputs = keras.Input(shape=(None,), dtype="int64", name="spanish")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
x = TransformerDecoder(embed_dim, dense_dim, num_heads)(x, encoder_outputs)
x = layers.Dropout(0.5)(x)
decoder_outputs = layers.Dense(vocab_size, activation="softmax")(x)
transformer = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)

**Training the sequence-to-sequence Transformer**

In [32]:
transformer.compile(
    optimizer="rmsprop",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])
transformer.fit(train_ds, epochs=30, validation_data=val_ds)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7f617b754550>

**Translating new sentences with our Transformer model**

In [33]:
import numpy as np
spa_vocab = target_vectorization.get_vocabulary()
spa_index_lookup = dict(zip(range(len(spa_vocab)), spa_vocab))
max_decoded_sentence_length = 20

def decode_sequence(input_sentence):
    tokenized_input_sentence = source_vectorization([input_sentence])
    decoded_sentence = "[start]"
    for i in range(max_decoded_sentence_length):
        tokenized_target_sentence = target_vectorization(
            [decoded_sentence])[:, :-1]
        predictions = transformer(
            [tokenized_input_sentence, tokenized_target_sentence])
        sampled_token_index = np.argmax(predictions[0, i, :])
        sampled_token = spa_index_lookup[sampled_token_index]
        decoded_sentence += " " + sampled_token
        if sampled_token == "[end]":
            break
    return decoded_sentence

test_eng_texts = [pair[0] for pair in test_pairs]
for _ in range(20):
    input_sentence = random.choice(test_eng_texts)
    print("-")
    print(input_sentence)
    print(decode_sequence(input_sentence))

-
I found this restaurant by chance.
[start] encontré este restaurante de oportunidad [end]
-
Do whatever makes you happy.
[start] haz lo que te haga feliz [end]
-
I don't know my neighbors.
[start] no sé mis a los países [end]
-
I need to know what happened to Tom.
[start] necesito saber qué le pasó a tom [end]
-
There are some misprints, but all in all, it's a good book.
[start] hay algunos [UNK] pero todas [end]
-
Do you believe war will start?
[start] crees que la guerra de [UNK] [end]
-
We're still friends.
[start] todavía somos amigos amigos [end]
-
You are tired, aren't you?
[start] estás cansado verdad [end]
-
I came as soon as I could.
[start] hice tanto como pude [end]
-
I sleep in my room.
[start] yo voy de la habitación a en mi habitación [end]
-
The cottage reminded me of the happy times I had spent with her.
[start] la casa me [UNK] por los veces que había dado con él había dado [end]
-
A tsunami is coming, so please be on the alert.
[start] un [UNK] está en el lugar por 

## Summary