# English-to-Spanish translation with a sequence-to-sequence Transformer

**Author:** [fchollet](https://twitter.com/fchollet)<br>
**Date created:** 2021/05/26<br>
**Last modified:** 2024/11/18<br>
**Description:** Implementing a sequence-to-sequence Transformer and training it on a machine translation task.

## Introduction

In this example, we'll build a sequence-to-sequence Transformer model, which
we'll train on an English-to-Spanish machine translation task.

You'll learn how to:

- Vectorize text using the Keras `TextVectorization` layer.
- Implement a `TransformerEncoder` layer, a `TransformerDecoder` layer,
and a `PositionalEmbedding` layer.
- Prepare data for training a sequence-to-sequence model.
- Use the trained model to generate translations of never-seen-before
input sentences (sequence-to-sequence inference).

The code featured here is adapted from the book
[Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition)
(chapter 11: Deep learning for text).
The present example is fairly barebones, so for detailed explanations of
how each building block works, as well as the theory behind Transformers,
I recommend reading the book.

## Setup

In [None]:
# We set the backend to TensorFlow. The code works with
# both `tensorflow` and `torch`. It does not work with JAX
# due to the behavior of `jax.numpy.tile` in a jit scope
# (used in `TransformerDecoder.get_causal_attention_mask()`):
# `tile` in JAX does not support a dynamic `reps` argument.
# You can make the code work in JAX by wrapping the
# inside of the `get_causal_attention_mask` method in
# a context manager to prevent jit compilation:
# `with jax.ensure_compile_time_eval():`.
import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import pathlib
import random
import string
import re
import numpy as np

import tensorflow.data as tf_data
import tensorflow.strings as tf_strings

import keras
from keras import layers
from keras import ops
from keras.layers import TextVectorization

## Downloading the data

We'll be working with an English-to-Spanish translation dataset
provided by [Anki](https://www.manythings.org/anki/). Let's download it:

In [None]:
pwd()

'/content/drive/MyDrive/Diplomado_IA/NLP/Keras Translator'

In [None]:
text_file = keras.utils.get_file(
    fname="spa-eng.zip",
    origin="http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip",
    extract=True,
)
text_file = pathlib.Path(text_file).parent / "spa-eng_extracted" / "spa-eng" / "spa.txt"

In [None]:
text_file

PosixPath('/root/.keras/datasets/spa-eng_extracted/spa-eng/spa.txt')

## Parsing the data

Each line contains an English sentence and its corresponding Spanish sentence.
The English sentence is the *source sequence* and the Spanish one is the *target sequence*.
We prepend the token `"[start]"` and we append the token `"[end]"` to the Spanish sentence.

In [None]:
with open(text_file) as f:
    lines = f.read().split("\n")[:-1]
text_pairs = []
for line in lines:
    eng, spa = line.split("\t")
    spa = "[start] " + spa + " [end]"
    text_pairs.append((eng, spa))

In [None]:
with open(text_file) as f:
    lines = f.read().split("\n")[:-1]
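# Note: despite its name, `eng_vocab` holds the full `[english, spanish]`
# split of each line, not a vocabulary.
eng_vocab = []
for line in lines:
    eng = line.split("\t")
    eng_vocab.append(eng)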

In [None]:
for _ in range(5):
    print(random.choice(eng_vocab))

['She approved of the wedding.', 'Ella aprobó la boda.']
['I outsmarted you.', 'Fui más listo que tú.']
['Thank you for returning my call.', 'Gracias por llamarme de vuelta.']
["He won't beat me.", 'Él no me va a pegar.']
['My son came to my room.', 'Mi hijo vino a mi cuarto.']


In [None]:
only_eng = []
for line in lines:
    eng = line.split("\t")[0]
    only_eng.append(eng)
only_eng

['Go.',
 'Go.',
 'Go.',
 'Go.',
 'Hi.',
 'Run!',
 'Run.',
 'Who?',
 'Fire!',
 'Fire!',
 'Fire!',
 'Help!',
 'Help!',
 'Help!',
 'Jump!',
 'Jump.',
 'Stop!',
 'Stop!',
 'Stop!',
 'Wait!',
 'Wait.',
 'Go on.',
 'Go on.',
 'Hello!',
 'I ran.',
 'I ran.',
 'I try.',
 'I won!',
 'Oh no!',
 'Relax.',
 'Smile.',
 'Attack!',
 'Attack!',
 'Get up.',
 'Go now.',
 'Got it!',
 'Got it?',
 'Got it?',
 'He ran.',
 'Hop in.',
 'Hug me.',
 'I fell.',
 'I know.',
 'I left.',
 'I lied.',
 'I lost.',
 'I quit.',
 'I quit.',
 'I work.',
 "I'm 19.",
 "I'm up.",
 'Listen.',
 'Listen.',
 'Listen.',
 'No way!',
 'No way!',
 'No way!',
 'No way!',
 'No way!',
 'No way!',
 'No way!',
 'No way!',
 'No way!',
 'No way!',
 'Really?',
 'Really?',
 'Thanks.',
 'Thanks.',
 'Try it.',
 'We try.',
 'We won.',
 'Why me?',
 'Ask Tom.',
 'Awesome!',
 'Be calm.',
 'Be cool.',
 'Be fair.',
 'Be kind.',
 'Be nice.',
 'Beat it.',
 'Call me.',
 'Call me.',
 'Call me.',
 'Call us.',
 'Come in.',
 'Come in.',
 'Come in.',
 'Come

In [None]:
len(only_eng)

118964

In [None]:
unique_eng_words = list(set(only_eng))

In [None]:
unique_eng_words

["Tom's dog woke him up a little after midnight.",
 'Get in the car now.',
 "Why didn't you call me yesterday evening?",
 'My father is an expert surgeon.',
 'Tom cries every time he hears this song.',
 'Keep your pants on.',
 'Tom needs Mary.',
 "It's my dream to have a son who'll take over my business when I retire.",
 'Asians generally have black hair.',
 'Tom played with his kids.',
 'All you need to do is just sit here.',
 'The church was built hundreds of years ago.',
 'I was born in Tokyo on the eighth of January in 1950.',
 "I'm unemployed.",
 'Tom got what he wanted.',
 'The kite got caught in the tree.',
 'Were you watching?',
 "He's a bit jealous.",
 "Isn't that a little harsh?",
 'Turn up the TV.',
 'Tom was looking for some people to help him move his piano.',
 'The typhoon caused the river to flood.',
 'I like Chinese food a lot.',
 'Who was the telephone invented by?',
 'Do you remember the town in which he was born?',
 "He's his own boss.",
 'People like to talk.',
 'Pe

In [None]:
len(unique_eng_words)  # number of unique English sentences

102904

In [None]:
random.shuffle(unique_eng_words)
num_val_samples_eng = int(0.15 * len(unique_eng_words))
num_train_samples_eng = len(unique_eng_words) - 2 * num_val_samples_eng
train_eng = unique_eng_words[:num_train_samples_eng]
val_eng = unique_eng_words[num_train_samples_eng : num_train_samples_eng + num_val_samples_eng]
test_eng = unique_eng_words[num_train_samples_eng + num_val_samples_eng :]

print(f"{len(unique_eng_words)} total of unique_eng_words")
print(f"{len(train_eng)} training eng")
print(f"{len(val_eng)} validation eng")
print(f"{len(test_eng)} test eng")

102904 total of unique_eng_words
72034 training eng
15435 validation eng
15435 test eng


In [None]:
test_eng

['I want you to write to me as soon as you get there.',
 "I'm not laughing.",
 'Our water pipes burst.',
 "I'd rather go out than stay indoors.",
 "I'd like to ask Tom if he feels the same way.",
 'The airbag saved my life.',
 "You can't defeat Tom alone.",
 'Maybe you should stop reading romance novels.',
 'Iron is used in building ships.',
 "I don't know what this symbol stands for.",
 'Is that a picture of me?',
 "Tom couldn't speak French well.",
 "Let's roast the chestnuts.",
 'I want to find out who did this.',
 'Murder is against the law.',
 "Tom can't sit still for a moment.",
 'The letter does not say what time she will come up to Tokyo.',
 "I'm going up to the bar for a drink, and I suggest you do the same.",
 'Tom is the man of the house.',
 'I have two cars.',
 'Is Tom cured?',
 'I am willing to agree to your request.',
 "It's my full-time job.",
 'Keep in mind that you must die.',
 'Tom wrote the report.',
 'He was looking forward to spending the weekend with her in their 

In [None]:

def format_dataset(eng):
    # Vectorize a batch of English sentences. `eng_vectorization` is created
    # later, in "Vectorizing the text data"; this helper is not applied here.
    eng = eng_vectorization(eng)
    return {"encoder_inputs": eng}


def make_dataset(texts):
    # `texts` is a flat list of sentences, not a list of pairs, so it must not
    # be unpacked with `zip(*texts)`: that would regroup the strings character
    # by character.
    dataset = tf_data.Dataset.from_tensor_slices(list(texts))
    return dataset.cache().shuffle(2048).prefetch(16)


train_ds_eng = make_dataset(train_eng)
val_ds_eng = make_dataset(val_eng)
test_ds_eng = make_dataset(test_eng)

In [None]:
train_ds_eng

<_PrefetchDataset element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>

In [None]:
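eng_txt = " ".join(unique_eng_words)
eng_txt_cp = eng_txt

# Caution: inside the character class, `$-;` is a range from "$" to ";",
# so this also strips digits, apostrophes, and the other ASCII characters in
# that range (which is why "isn't" becomes "isnt" below).
clean_txt_en = re.sub('[!¡@#$-;.,~¿?—\n«»]', '', eng_txt_cp.lower().replace('\n', ' '))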

In [None]:
clean_txt_en



In [None]:
en_vocab_f = re.findall(r'\b\w+\b', clean_txt_en.lower())
en_vocab_f

['lake',
 'titicaca',
 'the',
 'biggest',
 'lake',
 'in',
 'south',
 'america',
 'is',
 'in',
 'peru',
 'tom',
 'isnt',
 'budging',
 'on',
 'this',
 'one',
 'what',
 'i',
 'want',
 'is',
 'some',
 'peace',
 'and',
 'quiet',
 'we',
 'must',
 'try',
 'to',
 'protect',
 'the',
 'environment',
 'may',
 'i',
 'use',
 'the',
 'telephone',
 'for',
 'a',
 'while',
 'i',
 'know',
 'it',
 'in',
 'my',
 'heart',
 'i',
 'am',
 'sorry',
 'i',
 'am',
 'not',
 'from',
 'here',
 'take',
 'the',
 'bags',
 'upstairs',
 'i',
 'feel',
 'very',
 'cold',
 'this',
 'painting',
 'is',
 'a',
 'good',
 'copy',
 'of',
 'the',
 'original',
 'youre',
 'such',
 'a',
 'gossip',
 'even',
 'if',
 'i',
 'wanted',
 'to',
 'i',
 'couldnt',
 'do',
 'that',
 'that',
 'isnt',
 'my',
 'problem',
 'our',
 'teacher',
 'comes',
 'to',
 'school',
 'by',
 'car',
 'he',
 'almost',
 'never',
 'went',
 'there',
 'it',
 'is',
 'me',
 'that',
 'is',
 'wrong',
 'she',
 'wanted',
 'him',
 'to',
 'tell',
 'her',
 'that',
 'he',
 'loved',

Here's what our sentence pairs look like:

In [None]:
for _ in range(5):
    print(random.choice(text_pairs))

('Good work, Tom.', '[start] Buen trabajo, Tom. [end]')
('They got to be good friends.', '[start] Llegaron a ser buenos amigos. [end]')
('The population has doubled in the last five years.', '[start] La población se ha duplicado en los últimos cinco años. [end]')
('I keep in touch with Tom.', '[start] Lo veo a Tomás de vez en cuando. [end]')
('I need to see you in my office.', '[start] Necesito verte en mi oficina. [end]')


Now, let's split the sentence pairs into a training set, a validation set,
and a test set.

In [None]:
random.shuffle(text_pairs)
num_val_samples = int(0.15 * len(text_pairs))
num_train_samples = len(text_pairs) - 2 * num_val_samples
train_pairs = text_pairs[:num_train_samples]
val_pairs = text_pairs[num_train_samples : num_train_samples + num_val_samples]
test_pairs = text_pairs[num_train_samples + num_val_samples :]

print(f"{len(text_pairs)} total pairs")
print(f"{len(train_pairs)} training pairs")
print(f"{len(val_pairs)} validation pairs")
print(f"{len(test_pairs)} test pairs")

118964 total pairs
83276 training pairs
17844 validation pairs
17844 test pairs


In [None]:
type(train_pairs)

list

In [None]:
train_pairs

[('She stole my clothes!', '[start] ¡Ella se robó mi ropa! [end]'),
 ('Tell me what I want to know.', '[start] Dime lo que quiero saber. [end]'),
 ("It wasn't a very interesting novel.",
  '[start] No era una novela muy interesante. [end]'),
 ('Here are some details.', '[start] Aquí tenéis algunos detalles. [end]'),
 ('I asked him about the accident.',
  '[start] Le pregunté sobre el accidente. [end]'),
 ('Nobody understands it.', '[start] Nadie lo entiende. [end]'),
 ('Can you please sign this document?',
  '[start] Por favor, ¿podría firmar este documento? [end]'),
 ('Tom is getting married next month.',
  '[start] Tom se va a casar el próximo mes. [end]'),
 ("I can't imagine a world without electricity.",
  '[start] No puedo imaginar un mundo sin electricidad. [end]'),
 ('I was pretty hungry when I got home.',
  '[start] Tenía mucha hambre cuando llegué a casa. [end]'),
 ('He readily agreed to my proposal.',
  '[start] Él aceptó sin reparos mi propuesta. [end]'),
 ('This is the book

## Vectorizing the text data

We'll use two instances of the `TextVectorization` layer to vectorize the text
data (one for English and one for Spanish),
that is to say, to turn the original strings into integer sequences
where each integer represents the index of a word in a vocabulary.
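To make the idea concrete, here is a minimal pure-Python sketch of what an integer vectorizer does. It is only an illustration, not how `TextVectorization` is implemented: the helper names `build_vocab` and `vectorize` are hypothetical, and the only conventions borrowed from the layer are that index 0 is reserved for padding and index 1 for out-of-vocabulary words.

```python
from collections import Counter


def build_vocab(sentences, max_tokens):
    # Count lowercase, whitespace-separated words across the corpus.
    counts = Counter(word for s in sentences for word in s.lower().split())
    # Index 0 is the padding token and index 1 the out-of-vocabulary token.
    vocab = ["", "[UNK]"]
    vocab += [word for word, _ in counts.most_common(max_tokens - 2)]
    return vocab


def vectorize(sentence, vocab, sequence_length):
    index = {word: i for i, word in enumerate(vocab)}
    ids = [index.get(word, 1) for word in sentence.lower().split()]
    ids = ids[:sequence_length]  # truncate long sentences
    return ids + [0] * (sequence_length - len(ids))  # pad short ones


toy_vocab = build_vocab(["the cat sat", "the cat ran", "the dog"], max_tokens=4)
toy_ids = vectorize("the cat barked", toy_vocab, sequence_length=5)
print(toy_vocab)
print(toy_ids)
```

Words that did not make it into the vocabulary (here `barked`) all collapse onto the `[UNK]` index, which is why `[UNK]` can show up in the translations produced later.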

The English layer will use the default string standardization (strip punctuation characters)
and splitting scheme (split on whitespace), while
the Spanish layer will use a custom standardization, where we add the character
`"¿"` to the set of punctuation characters to be stripped.

Note: in a production-grade machine translation model, I would not recommend
stripping the punctuation characters in either language. Instead, I would recommend turning
each punctuation character into its own token,
which you could achieve by providing a custom `split` function to the `TextVectorization` layer.
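As a rough illustration of the punctuation-as-tokens idea, the sketch below tokenizes a sentence in plain Python with a regular expression. The function name `split_with_punctuation` and the pattern are assumptions made for this example. To use the same logic with `TextVectorization`, it would presumably need to be rewritten with TensorFlow string ops, since the layer's `split` callable operates on tensors.

```python
import re

# Keep the special markers whole, then match words, then single punctuation marks.
TOKEN_PATTERN = re.compile(r"\[start\]|\[end\]|\w+|[^\w\s]")


def split_with_punctuation(text):
    return TOKEN_PATTERN.findall(text.lower())


punct_tokens = split_with_punctuation("[start] ¿Dónde está Tom? [end]")
print(punct_tokens)
```

With this scheme `¿` and `?` survive as tokens of their own, so the model could learn to emit them instead of having them stripped.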

In [None]:
strip_chars = string.punctuation + "¿"
strip_chars = strip_chars.replace("[", "")
strip_chars = strip_chars.replace("]", "")

vocab_size = 15000
sequence_length = 20
batch_size = 64


def custom_standardization(input_string):
    lowercase = tf_strings.lower(input_string)
    return tf_strings.regex_replace(lowercase, "[%s]" % re.escape(strip_chars), "")


eng_vectorization = TextVectorization(
    max_tokens=vocab_size,
    output_mode="int",
    output_sequence_length=sequence_length,
)
spa_vectorization = TextVectorization(
    max_tokens=vocab_size,
    output_mode="int",
    output_sequence_length=sequence_length + 1,
    standardize=custom_standardization,
)
train_eng_texts = [pair[0] for pair in train_pairs]
train_spa_texts = [pair[1] for pair in train_pairs]
eng_vectorization.adapt(train_eng_texts)
spa_vectorization.adapt(train_spa_texts)

In [None]:
train_eng_texts

['She stole my clothes!',
 'Tell me what I want to know.',
 "It wasn't a very interesting novel.",
 'Here are some details.',
 'I asked him about the accident.',
 'Nobody understands it.',
 'Can you please sign this document?',
 'Tom is getting married next month.',
 "I can't imagine a world without electricity.",
 'I was pretty hungry when I got home.',
 'He readily agreed to my proposal.',
 'This is the book that I told you about.',
 "I hope he'll wait for me.",
 "Tom doesn't even know he's in trouble.",
 'It will take a little time to get used to wearing a wig.',
 'Tom will have to leave the building.',
 'This desk is made of hard wood.',
 'I spent the weekend with my friends.',
 'Have some eggnog.',
 'Hurry up.',
 'As is often said, it is difficult to adjust yourself to a new environment.',
 'This chair is broken.',
 'He has a test next week.',
 'They have no idea what our problems are.',
 'Tom had white shoes on.',
 'Tom wants more coffee.',
 'I can speak to Tom.',
 'The hummingbi

In [None]:
eng_vectorization.get_vocabulary()

['',
 '[UNK]',
 np.str_('the'),
 np.str_('i'),
 np.str_('to'),
 np.str_('you'),
 np.str_('tom'),
 np.str_('a'),
 np.str_('is'),
 np.str_('he'),
 np.str_('in'),
 np.str_('of'),
 np.str_('that'),
 np.str_('it'),
 np.str_('was'),
 np.str_('do'),
 np.str_('me'),
 np.str_('this'),
 np.str_('have'),
 np.str_('my'),
 np.str_('for'),
 np.str_('she'),
 np.str_('dont'),
 np.str_('are'),
 np.str_('what'),
 np.str_('his'),
 np.str_('mary'),
 np.str_('we'),
 np.str_('your'),
 np.str_('on'),
 np.str_('be'),
 np.str_('with'),
 np.str_('want'),
 np.str_('not'),
 np.str_('im'),
 np.str_('and'),
 np.str_('like'),
 np.str_('at'),
 np.str_('know'),
 np.str_('him'),
 np.str_('go'),
 np.str_('can'),
 np.str_('her'),
 np.str_('has'),
 np.str_('its'),
 np.str_('will'),
 np.str_('they'),
 np.str_('there'),
 np.str_('time'),
 np.str_('how'),
 np.str_('were'),
 np.str_('very'),
 np.str_('did'),
 np.str_('as'),
 np.str_('had'),
 np.str_('all'),
 np.str_('about'),
 np.str_('up'),
 np.str_('here'),
 np.str_('think'

Next, we'll format our datasets.

At each training step, the model will seek to predict target words N+1 (and beyond)
using the source sentence and the target words 0 to N.

As such, the training dataset will yield a tuple `(inputs, targets)`, where:

- `inputs` is a dictionary with the keys `encoder_inputs` and `decoder_inputs`.
`encoder_inputs` is the vectorized source sentence and `decoder_inputs` is the target sentence "so far",
that is to say, the words 0 to N used to predict word N+1 (and beyond) in the target sentence.
- `targets` is the target sentence offset by one step:
it provides the next words in the target sentence -- what the model will try to predict.
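The one-step offset between `decoder_inputs` and `targets` can be seen on a single toy sequence. The token ids below are hypothetical, and the sequence is padded to 6 positions rather than the 21 used in this example.

```python
# Hypothetical token ids for "[start] hola mundo [end]", padded with zeros
# to sequence_length + 1 = 6 (the example itself uses 21).
toy_spa_ids = [2, 15, 48, 3, 0, 0]

toy_decoder_inputs = toy_spa_ids[:-1]  # words 0 to N
toy_targets = toy_spa_ids[1:]  # words 1 to N+1

for step, (seen, expected) in enumerate(zip(toy_decoder_inputs, toy_targets)):
    print(f"step {step}: last input token {seen} -> target token {expected}")
```

Both slices have length `sequence_length`, which is why the Spanish vectorizer is configured with `output_sequence_length=sequence_length + 1`.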

In [None]:

def format_dataset(eng, spa):
    eng = eng_vectorization(eng)
    spa = spa_vectorization(spa)
    return (
        {
            "encoder_inputs": eng,
            "decoder_inputs": spa[:, :-1],
        },
        spa[:, 1:],
    )


def make_dataset(pairs):
    eng_texts, spa_texts = zip(*pairs)
    eng_texts = list(eng_texts)
    spa_texts = list(spa_texts)
    dataset = tf_data.Dataset.from_tensor_slices((eng_texts, spa_texts))
    dataset = dataset.batch(batch_size)
    dataset = dataset.map(format_dataset)
    return dataset.cache().shuffle(2048).prefetch(16)


train_ds = make_dataset(train_pairs)
val_ds = make_dataset(val_pairs)

In [None]:
test_ds = make_dataset(test_pairs)

In [None]:
train_ds_f = make_dataset(train_pairs)
val_ds_f = make_dataset(val_pairs)
test_ds_f = make_dataset(test_pairs)

In [None]:
train_ds_f

<_PrefetchDataset element_spec=({'encoder_inputs': TensorSpec(shape=(None, None), dtype=tf.int64, name=None), 'decoder_inputs': TensorSpec(shape=(None, None), dtype=tf.int64, name=None)}, TensorSpec(shape=(None, None), dtype=tf.int64, name=None))>

In [None]:
text_only_train_ds = train_ds.map(lambda x, y: x)

In [None]:
train_ds

<_PrefetchDataset element_spec=({'encoder_inputs': TensorSpec(shape=(None, None), dtype=tf.int64, name=None), 'decoder_inputs': TensorSpec(shape=(None, None), dtype=tf.int64, name=None)}, TensorSpec(shape=(None, None), dtype=tf.int64, name=None))>

In [None]:
text_only_train_ds

<_MapDataset element_spec={'encoder_inputs': TensorSpec(shape=(None, None), dtype=tf.int64, name=None), 'decoder_inputs': TensorSpec(shape=(None, None), dtype=tf.int64, name=None)}>

Let's take a quick look at the sequence shapes
(we have batches of 64 pairs, and all sequences are 20 steps long):

In [None]:
for inputs, targets in train_ds.take(1):
    print(f'inputs["encoder_inputs"].shape: {inputs["encoder_inputs"].shape}')
    print(f'inputs["decoder_inputs"].shape: {inputs["decoder_inputs"].shape}')
    print(f"targets.shape: {targets.shape}")

inputs["encoder_inputs"].shape: (64, 20)
inputs["decoder_inputs"].shape: (64, 20)
targets.shape: (64, 20)


## Building the model

Our sequence-to-sequence Transformer consists of a `TransformerEncoder`
and a `TransformerDecoder` chained together. To make the model aware of word order,
we also use a `PositionalEmbedding` layer.
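The following sketch shows, with made-up 2-dimensional vectors, how adding a position vector to a token vector makes the representation order-aware. The tables and the function `positional_embed` are hypothetical stand-ins for the learned `Embedding` layers used in this example.

```python
# Hypothetical 2-dimensional lookup tables (the real ones are learned).
toy_token_table = {5: [1.0, 0.0], 9: [0.0, 1.0]}
toy_position_table = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]


def positional_embed(token_ids):
    embedded = []
    for position, token_id in enumerate(token_ids):
        token_vec = toy_token_table[token_id]
        position_vec = toy_position_table[position]
        # Element-wise sum of the token vector and the position vector.
        embedded.append([t + p for t, p in zip(token_vec, position_vec)])
    return embedded


toy_embedded = positional_embed([5, 9, 5])
print(toy_embedded)
```

Token 5 appears at positions 0 and 2 and receives a different vector each time, which is the information the attention layers need to tell word order apart.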

The source sequence will be passed to the `TransformerEncoder`,
which will produce a new representation of it.
This new representation will then be passed
to the `TransformerDecoder`, together with the target sequence so far (target words 0 to N).
The `TransformerDecoder` will then seek to predict the next words in the target sequence (N+1 and beyond).

A key detail that makes this possible is causal masking
(see method `get_causal_attention_mask()` on the `TransformerDecoder`).
The `TransformerDecoder` sees the entire sequence at once, and thus we must make
sure that it only uses information from target tokens 0 to N when predicting token N+1
(otherwise, it could use information from the future, which would
result in a model that cannot be used at inference time).
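A causal mask is a lower-triangular matrix of ones. The pure-Python sketch below builds one for a 4-token sequence using the same `i >= j` rule as `get_causal_attention_mask()`; the function name `causal_mask` is chosen for this illustration only.

```python
def causal_mask(sequence_length):
    # Row i is the query position, column j the key position.
    # A 1 means "position i may attend to position j", which requires j <= i.
    return [
        [1 if i >= j else 0 for j in range(sequence_length)]
        for i in range(sequence_length)
    ]


toy_mask = causal_mask(4)
for row in toy_mask:
    print(row)
```

Row 0 can only see position 0, while the last row sees every position, so no prediction depends on tokens that come after it.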

In [None]:
import keras.ops as ops


class TransformerEncoder(layers.Layer):
    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.dense_proj = keras.Sequential(
            [
                layers.Dense(dense_dim, activation="relu"),
                layers.Dense(embed_dim),
            ]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()
        self.supports_masking = True

    def call(self, inputs, mask=None):
        if mask is not None:
            padding_mask = ops.cast(mask[:, None, :], dtype="int32")
        else:
            padding_mask = None

        attention_output = self.attention(
            query=inputs, value=inputs, key=inputs, attention_mask=padding_mask
        )
        proj_input = self.layernorm_1(inputs + attention_output)
        proj_output = self.dense_proj(proj_input)
        return self.layernorm_2(proj_input + proj_output)

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "embed_dim": self.embed_dim,
                "dense_dim": self.dense_dim,
                "num_heads": self.num_heads,
            }
        )
        return config


class PositionalEmbedding(layers.Layer):
    def __init__(self, sequence_length, vocab_size, embed_dim, **kwargs):
        super().__init__(**kwargs)
        self.token_embeddings = layers.Embedding(
            input_dim=vocab_size, output_dim=embed_dim
        )
        self.position_embeddings = layers.Embedding(
            input_dim=sequence_length, output_dim=embed_dim
        )
        self.sequence_length = sequence_length
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim

    def call(self, inputs):
        length = ops.shape(inputs)[-1]
        positions = ops.arange(0, length, 1)
        embedded_tokens = self.token_embeddings(inputs)
        embedded_positions = self.position_embeddings(positions)
        return embedded_tokens + embedded_positions

    def compute_mask(self, inputs, mask=None):
        return ops.not_equal(inputs, 0)

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "sequence_length": self.sequence_length,
                "vocab_size": self.vocab_size,
                "embed_dim": self.embed_dim,
            }
        )
        return config


class TransformerDecoder(layers.Layer):
    def __init__(self, embed_dim, latent_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.latent_dim = latent_dim
        self.num_heads = num_heads
        self.attention_1 = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.attention_2 = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.dense_proj = keras.Sequential(
            [
                layers.Dense(latent_dim, activation="relu"),
                layers.Dense(embed_dim),
            ]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()
        self.layernorm_3 = layers.LayerNormalization()
        self.supports_masking = True

    def call(self, inputs, mask=None):
        inputs, encoder_outputs = inputs
        causal_mask = self.get_causal_attention_mask(inputs)

        if mask is None:
            inputs_padding_mask, encoder_outputs_padding_mask = None, None
        else:
            inputs_padding_mask, encoder_outputs_padding_mask = mask

        attention_output_1 = self.attention_1(
            query=inputs,
            value=inputs,
            key=inputs,
            attention_mask=causal_mask,
            query_mask=inputs_padding_mask,
        )
        out_1 = self.layernorm_1(inputs + attention_output_1)

        attention_output_2 = self.attention_2(
            query=out_1,
            value=encoder_outputs,
            key=encoder_outputs,
            query_mask=inputs_padding_mask,
            key_mask=encoder_outputs_padding_mask,
        )
        out_2 = self.layernorm_2(out_1 + attention_output_2)

        proj_output = self.dense_proj(out_2)
        return self.layernorm_3(out_2 + proj_output)

    def get_causal_attention_mask(self, inputs):
        input_shape = ops.shape(inputs)
        batch_size, sequence_length = input_shape[0], input_shape[1]
        i = ops.arange(sequence_length)[:, None]
        j = ops.arange(sequence_length)
        mask = ops.cast(i >= j, dtype="int32")
        mask = ops.reshape(mask, (1, input_shape[1], input_shape[1]))
        mult = ops.concatenate(
            [ops.expand_dims(batch_size, -1), ops.convert_to_tensor([1, 1])],
            axis=0,
        )
        return ops.tile(mask, mult)

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "embed_dim": self.embed_dim,
                "latent_dim": self.latent_dim,
                "num_heads": self.num_heads,
            }
        )
        return config


Next, we assemble the end-to-end model.

In [None]:
embed_dim = 256
latent_dim = 2048
num_heads = 8

encoder_inputs = keras.Input(shape=(None,), dtype="int64", name="encoder_inputs")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
encoder_outputs = TransformerEncoder(embed_dim, latent_dim, num_heads)(x)
encoder = keras.Model(encoder_inputs, encoder_outputs)

decoder_inputs = keras.Input(shape=(None,), dtype="int64", name="decoder_inputs")
encoded_seq_inputs = keras.Input(shape=(None, embed_dim), name="decoder_state_inputs")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
x = TransformerDecoder(embed_dim, latent_dim, num_heads)([x, encoder_outputs])
x = layers.Dropout(0.5)(x)
decoder_outputs = layers.Dense(vocab_size, activation="softmax")(x)
decoder = keras.Model([decoder_inputs, encoded_seq_inputs], decoder_outputs)

transformer = keras.Model(
    {"encoder_inputs": encoder_inputs, "decoder_inputs": decoder_inputs},
    decoder_outputs,
    name="transformer",
)

## Training our model

We'll use accuracy as a quick way to monitor training progress on the validation data.
Note that machine translation typically uses BLEU scores as well as other metrics, rather than accuracy.
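To give a feel for what BLEU-style metrics measure, here is a small sketch of clipped n-gram precision, one ingredient of BLEU. It is not a full BLEU implementation (BLEU combines these precisions over several n-gram orders and applies a brevity penalty), and the helper names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return overlap / total if total else 0.0


toy_reference = "ella le pasó el dinero".split()
toy_candidate = "ella le dio el dinero".split()
toy_p1 = clipped_precision(toy_candidate, toy_reference, 1)
toy_p2 = clipped_precision(toy_candidate, toy_reference, 2)
print(toy_p1, toy_p2)
```

Unlike token-level accuracy, this kind of score does not require the candidate and the reference to line up position by position.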

Here we train for 30 epochs; to get the model to actually converge
you should train for at least that long.

In [None]:
epochs = 30  # This should be at least 30 for convergence

transformer.summary()
transformer.compile(
    "rmsprop",
    loss=keras.losses.SparseCategoricalCrossentropy(ignore_class=0),
    metrics=["accuracy"],
)
transformer.fit(train_ds, epochs=epochs, validation_data=val_ds)

Epoch 1/30
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 90s 37ms/step - accuracy: 0.1034 - loss: 5.0880 - val_accuracy: 0.1934 - val_loss: 2.8953
Epoch 2/30
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 13s 10ms/step - accuracy: 0.1945 - loss: 2.9055 - val_accuracy: 0.2127 - val_loss: 2.4749
Epoch 3/30
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 13s 10ms/step - accuracy: 0.2147 - loss: 2.4856 - val_accuracy: 0.2236 - val_loss: 2.3015
Epoch 4/30
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 13s 10ms/step - accuracy: 0.2258 - loss: 2.2768 - val_accuracy: 0.2277 - val_loss: 2.2661
Epoch 5/30
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 13s 10ms/step - accuracy: 0.2335 - loss: 2.1534 - val_accuracy: 0.2293 - val_loss: 2.2230
Epoch 6/30
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 13s 10ms/step - accuracy: 0.2383 - loss: 2.0657 - val_accuracy: 0.2277 - val_loss: 2.2678
Epoc

<keras.src.callbacks.history.History at 0x7ef93880a890>

## Decoding test sentences

Finally, let's demonstrate how to translate brand new English sentences.
We simply feed into the model the vectorized English sentence
as well as the target token `"[start]"`, then we repeatedly generate the next token, until
we hit the token `"[end]"`.
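The loop just described is a greedy decoder. The sketch below isolates its control flow, using a hypothetical stand-in function `toy_next_token_scores` in place of the trained Transformer, so that it runs without a model.

```python
# Hypothetical stand-in for the trained model: maps the tokens decoded so far
# to a score for each candidate next token.
def toy_next_token_scores(decoded_tokens):
    canned = ["me", "encanta", "escribir", "[end]"]
    position = len(decoded_tokens) - 1  # tokens generated after "[start]"
    best = canned[min(position, len(canned) - 1)]
    return {token: (1.0 if token == best else 0.0) for token in canned}


def greedy_decode(max_length=20):
    decoded = ["[start]"]
    for _ in range(max_length):
        scores = toy_next_token_scores(decoded)
        next_token = max(scores, key=scores.get)  # greedy: take the top-scoring token
        decoded.append(next_token)
        if next_token == "[end]":
            break
    return " ".join(decoded)


toy_translation = greedy_decode()
print(toy_translation)
```

The `max_length` cap matters because nothing guarantees that a real model will ever emit `"[end]"`.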

In [None]:
spa_vocab = spa_vectorization.get_vocabulary()
spa_index_lookup = dict(zip(range(len(spa_vocab)), spa_vocab))
max_decoded_sentence_length = 20


def decode_sequence(input_sentence):
    tokenized_input_sentence = eng_vectorization([input_sentence])
    decoded_sentence = "[start]"
    for i in range(max_decoded_sentence_length):
        tokenized_target_sentence = spa_vectorization([decoded_sentence])[:, :-1]
        predictions = transformer(
            {
                "encoder_inputs": tokenized_input_sentence,
                "decoder_inputs": tokenized_target_sentence,
            }
        )

        # ops.argmax(predictions[0, i, :]) is not a concrete value for jax here
        sampled_token_index = ops.convert_to_numpy(
            ops.argmax(predictions[0, i, :])
        ).item(0)
        sampled_token = spa_index_lookup[sampled_token_index]
        decoded_sentence += " " + sampled_token

        if sampled_token == "[end]":
            break
    return decoded_sentence


test_eng_texts = [pair[0] for pair in test_pairs]
for _ in range(30):
    input_sentence = random.choice(test_eng_texts)
    translated = decode_sequence(input_sentence)

After 30 epochs, we get results such as:

> She handed him the money.
> [start] ella le pasó el dinero [end]

> Tom has never heard Mary sing.
> [start] tom nunca ha oído cantar a mary [end]

> Perhaps she will come tomorrow.
> [start] tal vez ella vendrá mañana [end]

> I love to write.
> [start] me encanta escribir [end]

> His French is improving little by little.
> [start] su francés va a [UNK] sólo un poco [end]

> My hotel told me to call you.
> [start] mi hotel me dijo que te [UNK] [end]

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import os

os.chdir("/content/drive/MyDrive/Diplomado_IA/NLP/Keras Translator")

In [None]:
#transformer.save("fchollet_transformer1_carolinab.keras")

# Stanford Embeddings

In [None]:
#!wget http://nlp.stanford.edu/data/glove.6B.zip
#!unzip -q glove.6B.zip

### Parsing the GloVe word-embeddings file

In [None]:
import numpy as np
path_to_glove_file = "glove.6B.100d.txt"

embeddings_index = {}
with open(path_to_glove_file) as f:
    for line in f:
        word, coefs = line.split(maxsplit=1)
        coefs = np.fromstring(coefs, "f", sep=" ")
        embeddings_index[word] = coefs

print(f"Found {len(embeddings_index)} word vectors.")

Found 400000 word vectors.


In [None]:
type(embeddings_index)

dict

In [None]:
len(list(embeddings_index))

400000

In [None]:
embeddings_index["human"]

array([ 3.3864e-01,  5.9663e-01,  5.3322e-01,  3.1404e-01,  1.5321e-01,
        3.1749e-01, -4.2940e-01, -2.9150e-01, -2.1047e-03, -3.9309e-01,
       -8.5441e-01, -8.0708e-02,  1.2118e+00,  6.9316e-02,  8.0613e-03,
        8.7888e-01,  3.1908e-02,  5.8655e-01, -5.4892e-01, -7.8468e-03,
        1.7327e-01, -2.6693e-01,  4.2802e-01,  6.6123e-02,  5.1847e-01,
        7.7226e-01,  2.0608e-01, -4.5836e-01,  3.5485e-01,  7.1547e-01,
        6.0855e-01,  2.0254e-01, -4.8756e-01,  5.7974e-01,  8.6728e-02,
       -5.1852e-01, -3.7274e-01,  1.0014e+00, -2.9259e-01,  3.2290e-01,
       -9.7563e-01, -2.2288e-01, -2.3335e-01, -2.6891e-01,  1.4612e-01,
        1.2004e-01, -2.0402e-01, -9.4647e-02, -1.5402e+00, -5.9510e-02,
        1.0887e+00, -2.4998e-01, -2.5808e-01,  1.2798e+00, -1.2849e-01,
       -1.4511e+00, -2.4686e-01, -9.5046e-02,  1.7425e+00,  1.1977e-01,
       -1.9206e-01,  4.4368e-01, -1.6453e-01, -7.6663e-01,  1.1100e+00,
        4.6748e-01, -2.4673e-02,  4.7179e-03,  6.9761e-01, -2.29

In [None]:
len(embeddings_index["human"])

100

### Preparing integer sequence datasets

In [None]:
from tensorflow.keras import layers

max_length = 600
max_tokens = 20000
text_vectorization = layers.TextVectorization(
    max_tokens=max_tokens,
    output_mode="int",
    output_sequence_length=max_length,
)

In [None]:
text_vectorization.get_vocabulary()

['', '[UNK]']

In [None]:
#en_vocab_f

In [None]:
#text_vectorization.adapt(en_vocab_f)

In [None]:
text_vectorization.adapt(train_eng_texts)

In [None]:
text_vectorization.get_vocabulary()

['',
 '[UNK]',
 np.str_('the'),
 np.str_('i'),
 np.str_('to'),
 np.str_('you'),
 np.str_('tom'),
 np.str_('a'),
 np.str_('is'),
 np.str_('he'),
 np.str_('in'),
 np.str_('of'),
 np.str_('that'),
 np.str_('it'),
 np.str_('was'),
 np.str_('do'),
 np.str_('me'),
 np.str_('this'),
 np.str_('have'),
 np.str_('my'),
 np.str_('for'),
 np.str_('she'),
 np.str_('dont'),
 np.str_('are'),
 np.str_('what'),
 np.str_('his'),
 np.str_('mary'),
 np.str_('we'),
 np.str_('your'),
 np.str_('on'),
 np.str_('be'),
 np.str_('with'),
 np.str_('want'),
 np.str_('not'),
 np.str_('im'),
 np.str_('and'),
 np.str_('like'),
 np.str_('at'),
 np.str_('know'),
 np.str_('him'),
 np.str_('go'),
 np.str_('can'),
 np.str_('her'),
 np.str_('has'),
 np.str_('its'),
 np.str_('will'),
 np.str_('they'),
 np.str_('there'),
 np.str_('time'),
 np.str_('how'),
 np.str_('were'),
 np.str_('very'),
 np.str_('did'),
 np.str_('as'),
 np.str_('had'),
 np.str_('all'),
 np.str_('about'),
 np.str_('up'),
 np.str_('here'),
 np.str_('think'

In [None]:
#type(list(embeddings_index))

In [None]:
#list(embeddings_index)[0]

Next, let's build an embedding matrix that can be loaded into an `Embedding` layer. It must have shape `(max_tokens, embedding_dim)`, where row `i` contains the `embedding_dim`-dimensional vector for the word with index `i` in the reference word index (the vocabulary built during tokenization).
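As a small illustration of that layout, the sketch below fills a matrix for a toy vocabulary and counts how many words were found. The helper `build_embedding_matrix`, the toy vocabulary, and the two-dimensional vectors are assumptions made for this example; unlike the notebook, it sizes the matrix by `len(vocabulary)` rather than `max_tokens`.

```python
import numpy as np


def build_embedding_matrix(vocabulary, embeddings_index, embedding_dim):
    """Row i holds the pretrained vector for vocabulary[i], or zeros if missing."""
    matrix = np.zeros((len(vocabulary), embedding_dim))
    hits = 0
    for i, word in enumerate(vocabulary):
        vector = embeddings_index.get(word)
        if vector is not None:
            matrix[i] = vector
            hits += 1
    return matrix, hits


# Hypothetical data: index 0 is padding, index 1 is the out-of-vocabulary token.
toy_vocabulary = ["", "[UNK]", "the", "tom"]
toy_index = {"the": np.array([0.1, 0.2]), "cat": np.array([0.3, 0.4])}

toy_matrix, toy_hits = build_embedding_matrix(toy_vocabulary, toy_index, embedding_dim=2)
print(toy_matrix.shape, toy_hits)
```

Counting hits this way is a quick check of how much of the vocabulary the pretrained vectors actually cover; rows for missing words stay at zero.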


### Preparing the GloVe word-embeddings matrix

In [None]:
embeddings_index.get(word)

array([ 0.28365  , -0.6263   , -0.44351  ,  0.2177   , -0.087421 ,
       -0.17062  ,  0.29266  , -0.024899 ,  0.26414  , -0.17023  ,
        0.25817  ,  0.097484 , -0.33103  , -0.43859  ,  0.0095799,
        0.095624 , -0.17777  ,  0.38886  ,  0.27151  ,  0.14742  ,
       -0.43973  , -0.26588  , -0.024271 ,  0.27186  , -0.36761  ,
       -0.24827  , -0.20815  ,  0.22128  , -0.044409 ,  0.021373 ,
        0.24594  ,  0.26143  ,  0.29303  ,  0.13281  ,  0.082232 ,
       -0.12869  ,  0.1622   , -0.22567  , -0.060348 ,  0.28703  ,
        0.11381  ,  0.34839  ,  0.3419   ,  0.36996  , -0.13592  ,
        0.0062694,  0.080317 ,  0.0036251,  0.43093  ,  0.01882  ,
        0.31008  ,  0.16722  ,  0.074112 , -0.37745  ,  0.47363  ,
        0.41284  ,  0.24471  ,  0.075965 , -0.51725  , -0.49481  ,
        0.526    , -0.074645 ,  0.41434  , -0.1956   , -0.16544  ,
       -0.045649 , -0.40153  , -0.13136  , -0.4672   ,  0.18825  ,
        0.2612   ,  0.16854  ,  0.22615  ,  0.62992  , -0.1288

In [None]:
embedding_dim = 100

vocabulary = text_vectorization.get_vocabulary()
word_index = dict(zip(vocabulary, range(len(vocabulary))))

embedding_matrix = np.zeros((max_tokens, embedding_dim))
for word, i in word_index.items():
    if i < max_tokens:
        embedding_vector = embeddings_index.get(word)
        # Words missing from GloVe keep an all-zero row.
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector

In [None]:
embedding_matrix

array([[ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [-0.038194  , -0.24487001,  0.72812003, ..., -0.1459    ,
         0.82779998,  0.27061999],
       ...,
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]])

In [None]:
embedding_matrix.shape

(20000, 100)

In [None]:
embedding_matrix[0][7]

np.float64(0.0)

Finally, we use a `Constant` initializer to load the pretrained embeddings into an `Embedding` layer. So as not to disrupt the pretrained representations during training, we freeze the layer via `trainable=False`.
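Conceptually, an embedding layer is a table lookup: each integer token id selects one row of the matrix, and with `mask_zero=True` index 0 is treated as padding. The NumPy stand-in below sketches that idea with an invented 4-row table; it is not the Keras implementation, and the names `embedding_table` and `token_ids` are hypothetical.

```python
import numpy as np

# Invented table: one row per token id, two dimensions per row.
embedding_table = np.array([
    [0.0, 0.0],   # index 0: padding
    [0.0, 0.0],   # index 1: [UNK]
    [0.5, -0.5],  # index 2
    [1.0, 2.0],   # index 3
])

# One padded sequence of token ids, shape (batch, sequence_length).
token_ids = np.array([[2, 3, 0, 0]])

# The lookup replaces every id with its row: shape (1, 4, 2).
embedded = embedding_table[token_ids]

# Positions holding the padding id are flagged so later layers can skip them.
padding_mask = token_ids != 0

print(embedded.shape, padding_mask.tolist())
```

Freezing the layer simply means this table is left unchanged while the rest of the model is trained.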

In [None]:
from tensorflow import keras

embedding_layer = layers.Embedding(
    max_tokens,
    embedding_dim,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=False,
    mask_zero=True,
)

### Preparing integer sequence datasets

In [None]:
import os, pathlib, shutil, random
from tensorflow import keras
batch_size = 32
base_dir = pathlib.Path("aclImdb")
val_dir = base_dir / "val"
train_dir = base_dir / "train"
for category in ("neg", "pos"):
    os.makedirs(val_dir / category, exist_ok=True)
    files = os.listdir(train_dir / category)
    random.Random(1337).shuffle(files)
    num_val_samples = int(0.2 * len(files))
    val_files = files[-num_val_samples:]
    for fname in val_files:
        shutil.move(train_dir / category / fname,
                    val_dir / category / fname)

train_ds = keras.utils.text_dataset_from_directory(
    "aclImdb/train", batch_size=batch_size
)
val_ds = keras.utils.text_dataset_from_directory(
    "aclImdb/val", batch_size=batch_size
)
test_ds = keras.utils.text_dataset_from_directory(
    "aclImdb/test", batch_size=batch_size
)
text_only_train_ds = train_ds.map(lambda x, y: x)

Found 14187 files belonging to 3 classes.
Found 7979 files belonging to 2 classes.
Found 25000 files belonging to 2 classes.


In [None]:
for sample in train_ds.take(3):
    print(sample)

(<tf.Tensor: shape=(32,), dtype=string, numpy=
array([b"This film is about a man's life going wrong. His business is failing, and he cannot impregnate his wife despite multiple attempts.<br /><br />The plot is complete chaos. It simply does not make sense. In fact, nothing in the film makes sense. The story is so poorly told that I simply could not understand it. It is a shame, because the sets and costumes are done well, and are visually stimulating enough. The shots are well composed throughout the film. However, these redeeming features still cannot make up for the bad plot and poor story telling. I am amazed by the big names who agreed to star in this film. It is such a waste of their talents. This film is very bad. Avoid it!!",
       b'',
       b'The original show was so much better. They should have left on a good note. This movie killed the whole idea. It was boring, over-dramatic, and the funny parts were too far in between to make up the slack. This movie really seemed like 

In [None]:
with open(text_file) as f:
    lines = f.read().split("\n")[:-1]
text_pairs = []
for line in lines:
    eng, spa = line.split("\t")
    text_pairs.append((eng, spa))

In [None]:
text_pairs[:10]

[('They come from the same country.', 'Ellas vienen del mismo país.'),
 ('The Cold War continued.', 'La Guerra Fría proseguía.'),
 ('The prince thought the young girl had been eaten by a dragon.',
  'El príncipe pensó que a la joven muchacha se la había comido un dragón.'),
 ('Tom felt like a fish out of water.',
  'Tom se sentía como un pez fuera del agua.'),
 ("Don't pry into my private life.", 'No indagues en mi vida privada.'),
 ("Tom said that he'd never eaten such a delicious meal before.",
  'Tom dijo que no había comido nunca un plato tan delicioso.'),
 ('You are exaggerating the problem.', 'Usted está exagerando el problema.'),
 ('Tom felt a pain in his side.', 'Tom sintió un dolor en su costado.'),
 ('Mary is a professional dancer.', 'Mary es una bailarina profesional.'),
 ('He is not a patient but a doctor in this hospital.',
  'Él no es un paciente, sino médico en este hospital.')]

In [None]:
random.shuffle(text_pairs)
num_val_samples = int(0.15 * len(text_pairs))
num_train_samples = len(text_pairs) - 2 * num_val_samples
train_pairs = text_pairs[:num_train_samples]
val_pairs = text_pairs[num_train_samples : num_train_samples + num_val_samples]
test_pairs = text_pairs[num_train_samples + num_val_samples :]

print(f"{len(text_pairs)} total pairs")
print(f"{len(train_pairs)} training pairs")
print(f"{len(val_pairs)} validation pairs")
print(f"{len(test_pairs)} test pairs")

118964 total pairs
83276 training pairs
17844 validation pairs
17844 test pairs
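The split sizes printed above follow directly from the arithmetic in the cell: 15% of the pairs go to validation, the same number to test, and the remainder to training. A minimal sketch of that calculation, using a hypothetical helper `split_sizes`:

```python
def split_sizes(total, val_fraction=0.15):
    """Return (train, validation, test) sizes for the slicing scheme used above."""
    num_val = int(val_fraction * total)
    num_train = total - 2 * num_val
    num_test = total - num_train - num_val
    return num_train, num_val, num_test


print(split_sizes(118964))  # (83276, 17844, 17844)
```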


PosixPath('/root/.keras/datasets/spa-eng_extracted/spa-eng/spa.txt')

In [None]:
train_pairs

[('They come from the same country.', 'Ellas vienen del mismo país.'),
 ('The Cold War continued.', 'La Guerra Fría proseguía.'),
 ('The prince thought the young girl had been eaten by a dragon.',
  'El príncipe pensó que a la joven muchacha se la había comido un dragón.'),
 ('Tom felt like a fish out of water.',
  'Tom se sentía como un pez fuera del agua.'),
 ("Don't pry into my private life.", 'No indagues en mi vida privada.'),
 ("Tom said that he'd never eaten such a delicious meal before.",
  'Tom dijo que no había comido nunca un plato tan delicioso.'),
 ('You are exaggerating the problem.', 'Usted está exagerando el problema.'),
 ('Tom felt a pain in his side.', 'Tom sintió un dolor en su costado.'),
 ('Mary is a professional dancer.', 'Mary es una bailarina profesional.'),
 ('He is not a patient but a doctor in this hospital.',
  'Él no es un paciente, sino médico en este hospital.'),
 ('Did Tom buy it?', '¿Tom lo compró?'),
 ("Tom couldn't find what he was looking for.",
  'T

In [None]:
import unicodedata

def remove_accented_char(texto):
    # Normalize the text to NFD form (base characters + combining marks)
    texto = unicodedata.normalize("NFD", texto)

    # Remove combining diacritical marks, except those that follow a
    # lowercase "n", so that "ñ" is left intact
    texto = re.sub(r"(?<!n)[\u0300-\u036f]", "", texto)

    # Return to NFC form to avoid encoding problems
    return unicodedata.normalize("NFC", texto)
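Note that the lookbehind `(?<!n)` is case-sensitive, so an uppercase "Ñ" would lose its tilde, and any combining mark that follows an "n" is kept, not only the tilde. The variant below is a sketch of one way to tighten this, assuming the goal is to keep only "ñ" and "Ñ"; the name `strip_accents_keep_enye` is hypothetical and not part of the notebook.

```python
import re
import unicodedata


def strip_accents_keep_enye(text):
    # Decompose characters into base letters plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, except a tilde (U+0303) that follows n or N.
    cleaned = re.sub(
        r"(?<![nN])\u0303|[\u0300-\u0302\u0304-\u036f]", "", decomposed
    )
    # Recompose so that "n" + tilde becomes a single "ñ" again.
    return unicodedata.normalize("NFC", cleaned)


print(strip_accents_keep_enye("Él pensó en el niño"))  # El penso en el niño
print(strip_accents_keep_enye("AÑO"))                  # AÑO
```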

In [None]:
import tensorflow as tf

import tensorflow_text as tf_text

strip_chars = string.punctuation + "¿"
strip_chars = strip_chars.replace("[", "")
strip_chars = strip_chars.replace("]", "")

vocab_size = 15000
sequence_length = 20
batch_size = 64


def custom_standardization(input_string):
    lowercase = tf.strings.lower(input_string)
    return tf.strings.regex_replace(lowercase, "[%s]" % re.escape(strip_chars), "")
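To preview what this standardization does without running TensorFlow, here is a pure-Python analogue built from the same `strip_chars` definition. It is a sketch that assumes Python's `re` module and TensorFlow's regex engine treat this character class the same way; `standardize_py` is a hypothetical name.

```python
import re
import string

# Same character set as above: ASCII punctuation plus "¿", minus the brackets.
strip_chars = string.punctuation + "¿"
strip_chars = strip_chars.replace("[", "").replace("]", "")


def standardize_py(text):
    # Lowercase, then delete every character listed in strip_chars.
    return re.sub("[%s]" % re.escape(strip_chars), "", text.lower())


print(standardize_py("Don't pry into my private life."))  # dont pry into my private life
print(standardize_py("¿Tom lo compró?"))                   # tom lo compró
```

Because square brackets are excluded from `strip_chars`, bracketed tokens such as `[start]` pass through unchanged. The inverted exclamation mark "¡" is not in `strip_chars` either, so it is not removed.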


In [None]:
def clean_sp_txt(spa_texts):
    # Strip accents (except "ñ") from every Spanish sentence.
    return [remove_accented_char(text) for text in spa_texts]

In [None]:
def make_dataset(pairs):
    eng_texts, spa_texts = zip(*pairs)
    spa_texts = clean_sp_txt(spa_texts)

    spa_texts = custom_standardization(tuple(spa_texts))
    eng_texts = custom_standardization(eng_texts)

    eng_texts = list(eng_texts)
    spa_texts = list(spa_texts)
    dataset = tf_data.Dataset.from_tensor_slices((eng_texts, spa_texts))
    dataset = dataset.batch(batch_size)

    return dataset.cache().shuffle(2048).prefetch(16)


train_ds = make_dataset(train_pairs)
val_ds = make_dataset(val_pairs)
test_ds = make_dataset(test_pairs)

In [None]:
train_ds

<_PrefetchDataset element_spec=(TensorSpec(shape=(None,), dtype=tf.string, name=None), TensorSpec(shape=(None,), dtype=tf.string, name=None))>

In [None]:
for sample in train_ds.take(3):
    print(sample)

(<tf.Tensor: shape=(64,), dtype=string, numpy=
array([b'they got the prize', b'its raining today',
       b'a lot of software is available for making multimedia presentations',
       b'my cat rubbed her head against my shoulder',
       b'which direction did he go', b'please do that again',
       b'i want you to know the truth', b'i almost forgot it',
       b'tom isnt budging on this one',
       b'i have a feeling that tom doesnt like mary all that much',
       b'what do you like to eat for lunch', b'do you have a coin',
       b'he isnt our enemy', b'how much time will you need',
       b'tom didnt expect to fall in love with mary',
       b'tom kept working even though he was very tired',
       b'i guess tom was right',
       b'have you been told why we didnt hire you',
       b'we cant guarantee that',
       b'the bus drivers are going on strike today',
       b'please correct the sentence', b'you seem to be tired',
       b'when he was faced with the evidence he had to admi

In [None]:
from tensorflow.keras import layers

max_length = 600
max_tokens = 20000

text_vectorization = layers.TextVectorization(
    max_tokens=max_tokens,
    output_mode="int",
    output_sequence_length=max_length,
)
text_vectorization.adapt(train_eng_texts)

int_train_ds = train_ds.map(
    lambda x, y: (text_vectorization(x), text_vectorization(y)),
    num_parallel_calls=4,
)
int_val_ds = val_ds.map(
    lambda x, y: (text_vectorization(x), text_vectorization(y)),
    num_parallel_calls=4,
)
int_test_ds = test_ds.map(
    lambda x, y: (text_vectorization(x), text_vectorization(y)),
    num_parallel_calls=4,
)

In [None]:
for batch in int_train_ds.take(1):
    print(batch)


(<tf.Tensor: shape=(64, 600), dtype=int64, numpy=
array([[ 154,   24,    2, ...,    0,    0,    0],
       [   6, 1524, 2333, ...,    0,    0,    0],
       [   7,  916,   43, ...,    0,    0,    0],
       ...,
       [  15,    5,   32, ...,    0,    0,    0],
       [   3,   80,   96, ...,    0,    0,    0],
       [  49,  120, 3867, ...,    0,    0,    0]])>, <tf.Tensor: shape=(64, 600), dtype=int64, numpy=
array([[    1,     1,     1, ...,     0,     0,     0],
       [    6,     1,  2333, ...,     0,     0,     0],
       [ 6140,     1,     1, ...,     0,     0,     0],
       ...,
       [    1,     1,     0, ...,     0,     0,     0],
       [    1, 11033,     1, ...,     0,     0,     0],
       [    1,     1,  5652, ...,     0,     0,     0]])>)
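The second tensor above is dominated by index 1, the `[UNK]` token. That is what one would expect when Spanish sentences are looked up in a vocabulary adapted on English text (`train_eng_texts`). The pure-Python sketch below mimics that lookup with an invented toy vocabulary; `to_ids` is a hypothetical helper, not the `TextVectorization` implementation.

```python
def to_ids(sentence, vocabulary):
    """Map each whitespace-separated word to its index, or 1 ([UNK]) if unknown."""
    index = {word: i for i, word in enumerate(vocabulary)}
    return [index.get(word, 1) for word in sentence.split()]


# Hypothetical English-only vocabulary: 0 is padding, 1 is [UNK].
toy_vocabulary = ["", "[UNK]", "the", "tom", "is", "a"]

print(to_ids("tom is a doctor", toy_vocabulary))   # [3, 4, 5, 1]
print(to_ids("tom es un medico", toy_vocabulary))  # [3, 1, 1, 1]
```

One possible remedy, offered here only as a suggestion, would be a second vectorization layer adapted on the Spanish sentences and applied to `y`.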


### Model that uses a pretrained Embedding layer

In [None]:
embedding_matrix

array([[ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [-0.038194  , -0.24487001,  0.72812003, ..., -0.1459    ,
         0.82779998,  0.27061999],
       ...,
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]])

In [None]:
inputs = keras.Input(shape=(None,), dtype="int64")

embedding_layer = layers.Embedding(
    max_tokens,
    embedding_dim,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=False,
    mask_zero=True,
)

embedded = embedding_layer(inputs)
x = layers.Bidirectional(layers.LSTM(32))(embedded)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()

callbacks = [
    keras.callbacks.ModelCheckpoint("glove_embeddings_sequence_model.keras",
                                    save_best_only=True)
]
model.fit(int_train_ds, validation_data=int_val_ds, epochs=10, callbacks=callbacks)
model = keras.models.load_model("glove_embeddings_sequence_model.keras")
print(f"Test acc: {model.evaluate(int_test_ds)[1]:.3f}")

Epoch 1/10
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 16s 10ms/step - accuracy: 0.0084 - loss: -210.1178 - val_accuracy: 0.0079 - val_loss: -648.4481
Epoch 2/10
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 13s 10ms/step - accuracy: 0.0079 - loss: -789.2924 - val_accuracy: 0.0081 - val_loss: -1223.8065
Epoch 3/10
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 13s 10ms/step - accuracy: 0.0085 - loss: -1361.6611 - val_accuracy: 0.0087 - val_loss: -1796.0385
Epoch 4/10
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 13s 10ms/step - accuracy: 0.0094 - loss: -1937.1157 - val_accuracy: 0.0090 - val_loss: -2372.1682
Epoch 5/10
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 13s 10ms/step - accuracy: 0.0092 - loss: -2528.7490 - val_accuracy: 0.0090 - val_loss: -2944.6323
Epoch 6/10
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 13s 10ms/step - accuracy: 0.0092 - loss: -3078.2000 - val