# English-to-Spanish translation with a sequence-to-sequence Transformer

**Author:** [fchollet](https://twitter.com/fchollet)<br>
**Date created:** 2021/05/26<br>
**Last modified:** 2024/11/18<br>
**Description:** Implementing a sequence-to-sequence Transformer and training it on a machine translation task.

## Introduction

In this example, we'll build a sequence-to-sequence Transformer model, which
we'll train on an English-to-Spanish machine translation task.

You'll learn how to:

- Vectorize text using the Keras `TextVectorization` layer.
- Implement a `TransformerEncoder` layer, a `TransformerDecoder` layer,
and a `PositionalEmbedding` layer.
- Prepare data for training a sequence-to-sequence model.
- Use the trained model to generate translations of never-seen-before
input sentences (sequence-to-sequence inference).

The code featured here is adapted from the book
[Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition)
(chapter 11: Deep learning for text).
The present example is fairly barebones, so for detailed explanations of
how each building block works, as well as the theory behind Transformers,
I recommend reading the book.

## Setup

In [None]:
# We set the backend to TensorFlow. The code works with
# both `tensorflow` and `torch`. It does not work with JAX
# due to the behavior of `jax.numpy.tile` in a jit scope
# (used in `TransformerDecoder.get_causal_attention_mask()`):
# `tile` in JAX does not support a dynamic `reps` argument.
# You can make the code work in JAX by wrapping the
# inside of the `get_causal_attention_mask` method in
# a context manager to prevent jit compilation:
# `with jax.ensure_compile_time_eval():`.
import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import pathlib
import random
import string
import re
import numpy as np

import tensorflow.data as tf_data
import tensorflow.strings as tf_strings

import tensorflow as tf

import keras
from keras import models
from keras import layers
from keras import ops
from keras.layers import TextVectorization



In [22]:

# print ("TF Version   ", tf.__version__)
# # print ("TF Path      ", tf.__path__[0])
print("Keras version ", keras.__version__)
# print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Keras version  3.8.0


## Downloading the data

We'll be working with an English-to-Spanish translation dataset
provided by [Anki](https://www.manythings.org/anki/). Let's download it:

In [2]:
text_file = keras.utils.get_file(
    fname="spa-eng.zip",
    origin="http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip",
    extract=True,
)
# text_file = pathlib.Path(text_file).parent / "spa-eng" / "spa.txt"
text_file = pathlib.Path(text_file).parent/'spa-eng_extracted/spa-eng/spa.txt'

## Parsing the data

Each line contains an English sentence and its corresponding Spanish sentence.
The English sentence is the *source sequence* and the Spanish one is the *target sequence*.
We prepend the token `"[start]"` and we append the token `"[end]"` to the Spanish sentence.

In [3]:
# The file contains accented characters, so read it explicitly as UTF-8.
with open(text_file, encoding="utf-8") as f:
    lines = f.read().split("\n")[:-1]
text_pairs = []
for line in lines:
    eng, spa = line.split("\t")
    spa = "[start] " + spa + " [end]"
    text_pairs.append((eng, spa))

Here's what our sentence pairs look like:

In [4]:
for _ in range(5):
    print(random.choice(text_pairs))

('He is an able lawyer.', '[start] Él es un abogado capaz. [end]')
('Tom has something to hide.', '[start] Tom tiene algo que esconder. [end]')
('I took part in the discussion.', '[start] Participé de la discusión. [end]')
('I want that bag.', '[start] Quiero esa bolsa. [end]')
('Tom knows how to keep a secret.', '[start] Tom sabe mantener un secreto. [end]')


Now, let's split the sentence pairs into a training set, a validation set,
and a test set.

In [5]:
random.shuffle(text_pairs)
num_val_samples = int(0.15 * len(text_pairs))
num_train_samples = len(text_pairs) - 2 * num_val_samples
train_pairs = text_pairs[:num_train_samples]
val_pairs = text_pairs[num_train_samples : num_train_samples + num_val_samples]
test_pairs = text_pairs[num_train_samples + num_val_samples :]

print(f"{len(text_pairs)} total pairs")
print(f"{len(train_pairs)} training pairs")
print(f"{len(val_pairs)} validation pairs")
print(f"{len(test_pairs)} test pairs")

118964 total pairs
83276 training pairs
17844 validation pairs
17844 test pairs


## Vectorizing the text data

We'll use two instances of the `TextVectorization` layer to vectorize the text
data (one for English and one for Spanish),
that is to say, to turn the original strings into integer sequences
where each integer represents the index of a word in a vocabulary.

The English layer will use the default string standardization (strip punctuation characters)
and splitting scheme (split on whitespace), while
the Spanish layer will use a custom standardization, where we add the character
`"¿"` to the set of punctuation characters to be stripped.
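
To make these steps concrete, here is a minimal pure-Python sketch of what vectorization does conceptually: standardize, split on whitespace, then look up each word in a vocabulary. It is not the `TextVectorization` implementation; the helper names (`standardize`, `build_vocab`, `vectorize`) and the toy sentences are assumptions made for illustration. It follows the layer's convention of reserving index 0 for padding and index 1 for out-of-vocabulary words.

```python
import re
import string
from collections import Counter

# Same character set as the Spanish layer: punctuation plus "¿",
# but keeping "[" and "]" so "[start]" and "[end]" survive.
strip_chars = (string.punctuation + "¿").replace("[", "").replace("]", "")


def standardize(text):
    """Lowercase the text and strip the punctuation characters."""
    return re.sub("[%s]" % re.escape(strip_chars), "", text.lower())


def build_vocab(texts, max_tokens):
    """Map the most frequent words to indices; 0 = padding, 1 = unknown."""
    counts = Counter(word for t in texts for word in standardize(t).split())
    words = [w for w, _ in counts.most_common(max_tokens - 2)]
    return {w: i + 2 for i, w in enumerate(words)}


def vectorize(text, vocab, length):
    """Turn a string into a fixed-length list of integer indices."""
    ids = [vocab.get(w, 1) for w in standardize(text).split()][:length]
    return ids + [0] * (length - len(ids))


toy_texts = ["[start] ¿Dónde está Tom? [end]", "[start] Tom está aquí. [end]"]
toy_vocab = build_vocab(toy_texts, max_tokens=10)
toy_ids = vectorize("[start] ¿Dónde está Mary? [end]", toy_vocab, length=8)
print(toy_ids)
```

In the printed list, the word "mary" (never seen when building the toy vocabulary) maps to 1, and the trailing zeros are padding up to the requested length.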

Note: in a production-grade machine translation model, I would not recommend
stripping the punctuation characters in either language. Instead, I would recommend turning
each punctuation character into its own token,
which you could achieve by providing a custom `split` function to the `TextVectorization` layer.
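
As a rough illustration of that recommendation, the sketch below pads each punctuation character with spaces before splitting, so that punctuation becomes its own token. It is plain Python rather than a drop-in `split` callable for `TextVectorization` (which would have to operate on string tensors); the function name `split_keep_punctuation` and the inclusion of `"¡"` are assumptions.

```python
import re
import string

# Punctuation to isolate: ASCII punctuation plus the Spanish "¿" and "¡",
# minus the brackets used by the "[start]" and "[end]" markers.
punct_chars = (string.punctuation + "¿¡").replace("[", "").replace("]", "")
punct_pattern = re.compile("([%s])" % re.escape(punct_chars))


def split_keep_punctuation(text):
    """Lowercase, surround punctuation with spaces, split on whitespace."""
    spaced = punct_pattern.sub(r" \1 ", text.lower())
    return spaced.split()


print(split_keep_punctuation("[start] ¿Dónde está Tom? [end]"))
# ['[start]', '¿', 'dónde', 'está', 'tom', '?', '[end]']
```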

In [6]:
strip_chars = string.punctuation + "¿"
strip_chars = strip_chars.replace("[", "")
strip_chars = strip_chars.replace("]", "")

vocab_size = 15000
sequence_length = 20
batch_size = 64
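# A tuple selects exactly those n-gram sizes, so `(2,)` yields bigrams only;
# the integer `2` would yield unigrams and bigrams.
ngrams = (2,)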

def custom_standardization(input_string):
    lowercase = tf_strings.lower(input_string)
    return tf_strings.regex_replace(lowercase, "[%s]" % re.escape(strip_chars), "")


eng_vectorization = TextVectorization(
    ngrams=ngrams,
    max_tokens=vocab_size,
    output_mode="int",
    output_sequence_length=sequence_length,
)
spa_vectorization = TextVectorization(
    ngrams=ngrams,
    max_tokens=vocab_size,
    output_mode="int",
    output_sequence_length=sequence_length + 1,
    standardize=custom_standardization,
)
train_eng_texts = [pair[0] for pair in train_pairs]
train_spa_texts = [pair[1] for pair in train_pairs]
eng_vectorization.adapt(train_eng_texts)
spa_vectorization.adapt(train_spa_texts)



Next, we'll format our datasets.

At each training step, the model will seek to predict target words N+1 (and beyond)
using the source sentence and the target words 0 to N.

As such, the training dataset will yield a tuple `(inputs, targets)`, where:

- `inputs` is a dictionary with the keys `encoder_inputs` and `decoder_inputs`.
`encoder_inputs` is the vectorized source sentence and `decoder_inputs` is the target sentence "so far",
that is to say, the words 0 to N used to predict word N+1 (and beyond) in the target sentence.
- `targets` is the target sentence offset by one step:
it provides the next words in the target sentence -- what the model will try to predict.
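
The one-step offset can be seen on a single toy sequence. This sketch uses made-up token ids and plain Python lists; it mirrors the `spa[:, :-1]` and `spa[:, 1:]` slices used on whole batches, and it also shows why the Spanish layer outputs `sequence_length + 1` tokens: dropping one token at either end leaves `sequence_length` steps.

```python
# Toy token ids for "[start] yo tengo hambre [end]" followed by padding (0).
# The ids themselves are made up for this illustration.
target_ids = [2, 15, 48, 301, 3, 0, 0]

decoder_inputs = target_ids[:-1]  # words 0..N: what the decoder is given
targets = target_ids[1:]  # words 1..N+1: what it must predict

for step, (seen, expected) in enumerate(zip(decoder_inputs, targets)):
    print(f"step {step}: last input token {seen} -> target token {expected}")
```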

In [7]:

def format_dataset(eng, spa):
    eng = eng_vectorization(eng)
    spa = spa_vectorization(spa)
    return (
        {
            "encoder_inputs": eng,
            "decoder_inputs": spa[:, :-1],
        },
        spa[:, 1:],
    )


def make_dataset(pairs):
    eng_texts, spa_texts = zip(*pairs)
    eng_texts = list(eng_texts)
    spa_texts = list(spa_texts)
    dataset = tf_data.Dataset.from_tensor_slices((eng_texts, spa_texts))
    dataset = dataset.batch(batch_size)
    dataset = dataset.map(format_dataset)
    return dataset.cache().shuffle(2048).prefetch(16)


train_ds = make_dataset(train_pairs)
val_ds = make_dataset(val_pairs)

Let's take a quick look at the sequence shapes
(we have batches of 64 pairs, and all sequences are 20 steps long):

In [8]:
for inputs, targets in train_ds.take(1):
    print(f'inputs["encoder_inputs"].shape: {inputs["encoder_inputs"].shape}')
    print(f'inputs["decoder_inputs"].shape: {inputs["decoder_inputs"].shape}')
    print(f"targets.shape: {targets.shape}")

inputs["encoder_inputs"].shape: (64, 20)
inputs["decoder_inputs"].shape: (64, 20)
targets.shape: (64, 20)




## Building the model

Our sequence-to-sequence Transformer consists of a `TransformerEncoder`
and a `TransformerDecoder` chained together. To make the model aware of word order,
we also use a `PositionalEmbedding` layer.
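
To see why adding a position vector makes word order visible, consider this tiny sketch. The two lookup tables and their values are arbitrary assumptions for illustration; in the real layer both tables are learned `Embedding` weights.

```python
# Tiny hand-written embedding tables (values are arbitrary).
token_table = {7: [1.0, 0.0], 9: [0.0, 1.0]}  # token id -> vector
position_table = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]  # position -> vector


def embed(token_ids):
    """Add each token's vector to the vector of the position it occupies."""
    return [
        [t + p for t, p in zip(token_table[tok], position_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]


a = embed([7, 9])
b = embed([9, 7])
print(a)  # token 7 first, token 9 second
print(b)  # same tokens in swapped order give different vectors
```

Without the position term, the two sequences would contain exactly the same set of vectors, and attention (which is order-agnostic by itself) could not tell them apart.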

The source sequence will be passed to the `TransformerEncoder`,
which will produce a new representation of it.
This new representation will then be passed
to the `TransformerDecoder`, together with the target sequence so far (target words 0 to N).
The `TransformerDecoder` will then seek to predict the next words in the target sequence (N+1 and beyond).

A key detail that makes this possible is causal masking
(see method `get_causal_attention_mask()` on the `TransformerDecoder`).
The `TransformerDecoder` sees the entire sequence at once, and thus we must make
sure that it only uses information from target tokens 0 to N when predicting token N+1
(otherwise, it could use information from the future, which would
result in a model that cannot be used at inference time).
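
A causal mask is simply a lower-triangular matrix of ones. The plain-Python sketch below builds one for a single sequence using the same `i >= j` comparison as `get_causal_attention_mask()`; the standalone function `causal_mask` is an illustrative assumption, and the real method additionally tiles the mask across the batch.

```python
def causal_mask(sequence_length):
    """Return a lower-triangular 0/1 matrix: row i may attend to column j <= i."""
    return [
        [1 if i >= j else 0 for j in range(sequence_length)]
        for i in range(sequence_length)
    ]


for row in causal_mask(4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

Row `i` describes what the token at position `i` may attend to: itself and every earlier position, but nothing after it.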

In [9]:
import keras.ops as ops


class TransformerEncoder(layers.Layer):
    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.dense_proj = keras.Sequential(
            [
                layers.Dense(dense_dim, activation="relu"),
                layers.Dense(embed_dim),
            ]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()
        self.supports_masking = True

    def call(self, inputs, mask=None):
        if mask is not None:
            padding_mask = ops.cast(mask[:, None, :], dtype="int32")
        else:
            padding_mask = None

        attention_output = self.attention(
            query=inputs, value=inputs, key=inputs, attention_mask=padding_mask
        )
        proj_input = self.layernorm_1(inputs + attention_output)
        proj_output = self.dense_proj(proj_input)
        return self.layernorm_2(proj_input + proj_output)

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "embed_dim": self.embed_dim,
                "dense_dim": self.dense_dim,
                "num_heads": self.num_heads,
            }
        )
        return config


class PositionalEmbedding(layers.Layer):
    def __init__(self, sequence_length, vocab_size, embed_dim, **kwargs):
        super().__init__(**kwargs)
        self.token_embeddings = layers.Embedding(
            input_dim=vocab_size, output_dim=embed_dim
        )
        self.position_embeddings = layers.Embedding(
            input_dim=sequence_length, output_dim=embed_dim
        )
        self.sequence_length = sequence_length
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim

    def call(self, inputs):
        length = ops.shape(inputs)[-1]
        positions = ops.arange(0, length, 1)
        embedded_tokens = self.token_embeddings(inputs)
        embedded_positions = self.position_embeddings(positions)
        return embedded_tokens + embedded_positions

    def compute_mask(self, inputs, mask=None):
        return ops.not_equal(inputs, 0)

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "sequence_length": self.sequence_length,
                "vocab_size": self.vocab_size,
                "embed_dim": self.embed_dim,
            }
        )
        return config


class TransformerDecoder(layers.Layer):
    def __init__(self, embed_dim, latent_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.latent_dim = latent_dim
        self.num_heads = num_heads
        self.attention_1 = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.attention_2 = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.dense_proj = keras.Sequential(
            [
                layers.Dense(latent_dim, activation="relu"),
                layers.Dense(embed_dim),
            ]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()
        self.layernorm_3 = layers.LayerNormalization()
        self.supports_masking = True

    def call(self, inputs, mask=None):
        inputs, encoder_outputs = inputs
        causal_mask = self.get_causal_attention_mask(inputs)

        if mask is None:
            inputs_padding_mask, encoder_outputs_padding_mask = None, None
        else:
            inputs_padding_mask, encoder_outputs_padding_mask = mask

        attention_output_1 = self.attention_1(
            query=inputs,
            value=inputs,
            key=inputs,
            attention_mask=causal_mask,
            query_mask=inputs_padding_mask,
        )
        out_1 = self.layernorm_1(inputs + attention_output_1)

        attention_output_2 = self.attention_2(
            query=out_1,
            value=encoder_outputs,
            key=encoder_outputs,
            query_mask=inputs_padding_mask,
            key_mask=encoder_outputs_padding_mask,
        )
        out_2 = self.layernorm_2(out_1 + attention_output_2)

        proj_output = self.dense_proj(out_2)
        return self.layernorm_3(out_2 + proj_output)

    def get_causal_attention_mask(self, inputs):
        input_shape = ops.shape(inputs)
        batch_size, sequence_length = input_shape[0], input_shape[1]
        i = ops.arange(sequence_length)[:, None]
        j = ops.arange(sequence_length)
        mask = ops.cast(i >= j, dtype="int32")
        mask = ops.reshape(mask, (1, input_shape[1], input_shape[1]))
        mult = ops.concatenate(
            [ops.expand_dims(batch_size, -1), ops.convert_to_tensor([1, 1])],
            axis=0,
        )
        return ops.tile(mask, mult)

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "embed_dim": self.embed_dim,
                "latent_dim": self.latent_dim,
                "num_heads": self.num_heads,
            }
        )
        return config


Next, we assemble the end-to-end model.

In [10]:
embed_dim = 256
latent_dim = 2048
num_heads = 8

encoder_inputs = keras.Input(shape=(None,), dtype="int64", name="encoder_inputs")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
encoder_outputs = TransformerEncoder(embed_dim, latent_dim, num_heads)(x)
encoder = keras.Model(encoder_inputs, encoder_outputs)

decoder_inputs = keras.Input(shape=(None,), dtype="int64", name="decoder_inputs")
encoded_seq_inputs = keras.Input(shape=(None, embed_dim), name="decoder_state_inputs")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
x = TransformerDecoder(embed_dim, latent_dim, num_heads)([x, encoder_outputs])
x = layers.Dropout(0.5)(x)
decoder_outputs = layers.Dense(vocab_size, activation="softmax")(x)
decoder = keras.Model([decoder_inputs, encoded_seq_inputs], decoder_outputs)

transformer = keras.Model(
    {"encoder_inputs": encoder_inputs, "decoder_inputs": decoder_inputs},
    decoder_outputs,
    name="transformer",
)

## Training our model

We'll use accuracy as a quick way to monitor training progress on the validation data.
Note that machine translation typically uses BLEU scores as well as other metrics, rather than accuracy.
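
For intuition about how such metrics differ from per-token accuracy, the sketch below computes clipped n-gram precision, the building block of BLEU. It is not a full BLEU implementation: BLEU combines these precisions over several n-gram sizes and applies a brevity penalty, both omitted here. The function names and the candidate sentence are assumptions for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference,
    counting each n-gram at most as often as the reference contains it."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


reference = "tom nunca ha oído cantar a mary".split()
candidate = "tom nunca ha oído a mary cantar".split()  # made-up model output
print(clipped_precision(candidate, reference, 1))  # every word matches
print(clipped_precision(candidate, reference, 2))  # word order now matters
```

The candidate contains all the right words, so its unigram precision is perfect, but only four of its six bigrams appear in the reference.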

Here we train for 50 epochs. To get the model to actually converge,
you should train for at least 30 epochs.

In [11]:
transformer.summary()
transformer.compile(
    "rmsprop",
    loss=keras.losses.SparseCategoricalCrossentropy(ignore_class=0),
    metrics=["accuracy"],
)

In [12]:
epochs = 50  # This should be at least 30 for convergence
# Measured training times:
# 1 epoch: 1 min 42 s
# 5 epochs: 4 min 45 s
# 30 epochs: 25 min 3 s

transformer.fit(train_ds, epochs=epochs, validation_data=val_ds)

Epoch 1/50
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 95s 57ms/step - accuracy: 0.1140 - loss: 5.9757 - val_accuracy: 0.1462 - val_loss: 4.2270
Epoch 2/50
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 55s 42ms/step - accuracy: 0.1426 - loss: 4.2948 - val_accuracy: 0.1654 - val_loss: 3.4880
Epoch 3/50
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 52s 40ms/step - accuracy: 0.1617 - loss: 3.6255 - val_accuracy: 0.1739 - val_loss: 3.2445
Epoch 4/50
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 47s 36ms/step - accuracy: 0.1709 - loss: 3.3761 - val_accuracy: 0.1798 - val_loss: 3.1429
Epoch 5/50
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 48s 37ms/step - accuracy: 0.1751 - loss: 3.2960 - val_accuracy: 0.1805 - val_loss: 3.1743
Epoch 6/50
1302/1302 ━━━━━━━━━━━━━━━━━━━━ 48s 37ms/step - accuracy: 0.1790 - loss: 3.2534 - val_accuracy: 0.1822 - val_loss: 3.2037
Epoch 7/50
[output truncated]

<keras.src.callbacks.history.History at 0x77f9746a70d0>

In [None]:
callback_dir = "./Playground/TransformerEngEspModel50epochs.keras"

In [25]:
transformer.save(callback_dir)

## Decoding test sentences

Finally, let's demonstrate how to translate brand new English sentences.
We simply feed into the model the vectorized English sentence
as well as the target token `"[start]"`, then we repeatedly generate the next token, until
we hit the token `"[end]"`.
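
The loop just described is greedy decoding: at each step, keep only the single most likely next token. The sketch below isolates that control flow, replacing the trained model with a hypothetical scripted scorer (`toy_next_token_scores`, `toy_vocab`, and `toy_script` are all made up for illustration), so it can be run without any trained weights.

```python
# Hypothetical stand-in for the trained model: given the tokens decoded
# so far, return a score for every word in a tiny vocabulary.
toy_vocab = ["[start]", "[end]", "me", "encanta", "escribir"]
toy_script = {
    "[start]": "me",
    "me": "encanta",
    "encanta": "escribir",
    "escribir": "[end]",
}


def toy_next_token_scores(decoded_tokens):
    """Score 1.0 for the scripted next word, 0.0 for everything else."""
    wanted = toy_script[decoded_tokens[-1]]
    return [1.0 if word == wanted else 0.0 for word in toy_vocab]


def greedy_decode(max_length=20):
    decoded = ["[start]"]
    for _ in range(max_length):
        scores = toy_next_token_scores(decoded)
        # Greedy choice: take the highest-scoring word at this step.
        best = max(range(len(scores)), key=scores.__getitem__)
        decoded.append(toy_vocab[best])
        if decoded[-1] == "[end]":
            break
    return " ".join(decoded)


print(greedy_decode())
# [start] me encanta escribir [end]
```

Greedy decoding is the simplest strategy; alternatives such as beam search keep several candidate sequences per step at a higher computational cost.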

In [33]:
spa_vocab = spa_vectorization.get_vocabulary()
spa_index_lookup = dict(zip(range(len(spa_vocab)), spa_vocab))
max_decoded_sentence_length = 20


def decode_sequence(input_sentence):
    tokenized_input_sentence = eng_vectorization([input_sentence])
    decoded_sentence = "[start]"
    for i in range(max_decoded_sentence_length):
        tokenized_target_sentence = spa_vectorization([decoded_sentence])[:, :-1]
        predictions = transformer(
            {
                "encoder_inputs": tokenized_input_sentence,
                "decoder_inputs": tokenized_target_sentence,
            }
        )

        # ops.argmax(predictions[0, i, :]) is not a concrete value for jax here
        sampled_token_index = ops.convert_to_numpy(
            ops.argmax(predictions[0, i, :])
        ).item(0)
        sampled_token = spa_index_lookup[sampled_token_index]
        decoded_sentence += " " + sampled_token

        if sampled_token == "[end]":
            break
    return decoded_sentence


test_eng_texts = [pair[0] for pair in test_pairs]
for _ in range(30):
    input_sentence = random.choice(test_eng_texts)
    translated = decode_sequence(input_sentence)

In [34]:
for _ in range(10):
    input_sentence = random.choice(test_eng_texts)
    translated = decode_sequence(input_sentence)
    print(input_sentence,"     ",translated)


After 1 epoch:

* They sell sporting goods.       [start] ellos se [UNK] [end]
* I need to see Tom immediately.       [start] necesito ver a tom [end]
* Switzerland is a beautiful country.       [start] [UNK] es un país [end]
* Do you like to dance?       [start] te gusta ser [end]
* You have a really good sense of direction.       [start] has tenido una buena de verdad [end]
* Why can't animals talk?       [start] por qué no puede hablar con los animales [end]
* Every now and then I like to have hot and spicy food.       [start] todos los días y yo tengo que tengo comida [end]
* It was the bad weather that caused his illness.       [start] era el mal mal tiempo [UNK] su vida [end]
* Tom is able to say "I can only speak French" in thirty languages.       [start] tom es capaz de decir que yo solo puedo hablar en el francés [end]
* Give me another cup of coffee.       [start] dame otro café [end]


After 5 epochs:

* Tom banged his knee.       [start] tom se golpeó su calle [end]
* We can't stop now.       [start] no podemos dejar ahora [end]
* Did you find your car keys?       [start] has encontrado tus llaves [end]
* The ring was nowhere to be found.       [start] el anillo no estaba en la encontré donde encontró [end]
* They kidnapped me.       [start] me han encontrado [end]
* Tom is too poor to hire a lawyer.       [start] tom es demasiado pobre para [UNK] a un abogado [end]
* She has gone abroad.       [start] ella se ha ido al extranjero [end]
* I'm giving my possessions away.       [start] me voy a [UNK] mi [UNK] [end]
* What do I tell Tom?       [start] qué le diga a tom [end]
* Do you always have coffee with your breakfast?       [start] siempre tienes café con tu desayuno [end]

After 50 epochs:

* Tom was quite handsome when he was young.       [start] [UNK] era joven [UNK] cuando era cuando era era joven era joven era joven era joven joven [end] era joven joven [end] mí [end] joven [end] [UNK] joven [end] años [end] [UNK] equipo [end] joven [end]
* This problem is a real challenge.       [start] [UNK] cinco minutos [UNK] es un [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] es un es un este problema este problema [UNK] [UNK] [UNK]
* Do you know why spring rolls are called spring rolls?       [start] [UNK] [UNK] [UNK] que tengo [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK]
* Tom is still worried.       [start] [UNK] [UNK] dólares [end] [UNK] [UNK] [UNK] mary [end] [UNK] mary [end] maría [end] [UNK] [UNK] maría [end] [UNK] mary [end] [UNK] [UNK] vez [end] él [end] [UNK]
* I really appreciate it.       [start] [UNK] [UNK] [UNK] hijos [end] tom [end] tú [end] vos [end] no [end] mediodía [end] tú [end] [UNK] no [end] atrás [end] dicho [end] [UNK] [UNK] [UNK] tú [end] año [end] [UNK]
* I don't know which child is yours.       [start] libros [end] [UNK] [UNK] a la [UNK] la tuya [UNK] [UNK] tuya [end] [UNK] [UNK] vez [end] es [end] es [end] ustedes [end] [UNK] quiero [end] vez [end] mary y [UNK]
* We almost froze to death.       [start] año [end] [UNK] [UNK] muerte [end] muerte [end] muerte [end] muerte [end] él [end] muerte [end] muerte [end] muerte [end] él [end] muerte [end] cena [end] muerte [end] [UNK] gritar [end] él [end] muerte [end] [UNK]
* Tom is quite busy at the moment.       [start] gracias por [UNK] [UNK] por el por el el momento [UNK] el momento [UNK] momento [end] [UNK] [UNK] momento [end] [UNK] él [end] gracias por carta [end] él [end] las tres [UNK]
* Put out the fire.       [start] [UNK] [UNK] dólares [end] el fuego el fuego fuego [end] bien [end] fuego [end] bien [end] fuego [end] bien [end] bien [end] aquí [end] fuego [end] mary [end] trabajo [end] mary [end] vez [end] bailar [end] [UNK]
* I have caught a cold.       [start] [UNK] [UNK] [UNK] favor [end] [UNK] [UNK] vos [end] vos [end] resfriado [end] vos [end] resfriado [end] vos [end] resfriado [end] [UNK] resfriado [end] [UNK] resfriado [end] verde [end] ella está [UNK]



accuracy: 0.2100 - loss: 2.5439 - val_accuracy: 0.1749 - val_loss: 3.9551

After 30 epochs (ngram=2):

* I've got a better idea.       [start] [UNK] tiene una una idea algo [end] idea [end] más [end] cartera [end] algo [end] oficina [end] mintiendo [end] aquí [end] mujer [end] fiesta [end] oficina [end] rato [end] oficina [end] escribir [end] pobre [end] [UNK] ocupado [end]
* Tom says that ghosts aren't real.       [start] [UNK] [UNK] [UNK] [UNK] a tom tom [end] [UNK] [UNK] [UNK] más [end] [UNK] más [end] más [end] [UNK] [UNK] llorar [end] [UNK] más [end] [UNK] [UNK]
* I suppose we could walk.       [start] [UNK] [UNK] [UNK] [UNK] tom [end] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK]
* Tom is a pacifist.       [start] tom [end] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK]
* We didn't want to go, but we had to.       [start] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] hacerlo [end] a tom [UNK] tom [end] hacer [end] hacer [end] [UNK] [UNK] ir [end] ir [end] madre [end] [UNK] [UNK] [UNK]
* I see a crown.       [start] [UNK] [UNK] [UNK] [UNK] tom [end] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK]
* She is good at skiing.       [start] [UNK] [UNK] [UNK] [UNK] tom [end] [UNK] [UNK] [UNK] [UNK] bien [end] bien [end] [UNK] [UNK] [UNK] [UNK] peligroso [end] [UNK] [UNK] [UNK] [UNK]
* I'll let you know as soon as I get there.       [start] [UNK] [UNK] [UNK] tan pronto [UNK] tan pronto [UNK] pronto como allí [end] allí [end] [UNK] allí [end] allí [end] [UNK] [UNK] [UNK] tan pronto [UNK] [UNK] [UNK]
* I grew up in this house.       [start] [UNK] [UNK] [UNK] en esta esta casa [UNK] casa [end] [UNK] [UNK] pasado [end] [UNK] [UNK] dinero [end] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] en esta
* Apples were served as the dessert.       [start] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] tom [end] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK]

accuracy: 0.2017 - loss: 2.7660 - val_accuracy: 0.1773 - val_loss: 3.7764

After 30 epochs, we get results such as:

> She handed him the money.
> [start] ella le pasó el dinero [end]

> Tom has never heard Mary sing.
> [start] tom nunca ha oído cantar a mary [end]

> Perhaps she will come tomorrow.
> [start] tal vez ella vendrá mañana [end]

> I love to write.
> [start] me encanta escribir [end]

> His French is improving little by little.
> [start] su francés va a [UNK] sólo un poco [end]

> My hotel told me to call you.
> [start] mi hotel me dijo que te [UNK] [end]