🚀 Practical Example in TensorFlow: Seq2Seq for Machine Translation

We’ll build a toy English → Spanish translator using an encoder–decoder LSTM.

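Step 1: Imports

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow
from tensorflow import keras
# Import from tensorflow.keras so every layer comes from the same Keras build
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, LSTM, Embedding, Dropout, Input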

Step 2: Parameters

In [2]:
latent_dim = 256  # Dimensionality of the LSTM hidden/cell state (capacity of memory)
num_encoders_tokens = 2000  # Number of unique tokens in the input (source) vocabulary
num_decoders_tokens = 2000  # Number of unique tokens in the output (target) vocabulary
max_encoder_seq_length = 20  # Maximum length of an input sequence
max_decoder_seq_length = 20  # Maximum length of an output sequence

📌 Explanation:

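latent_dim: dimensionality of the LSTM hidden and cell states (and, in this notebook, of the embeddings). Bigger → more learning capacity, but slower training and more parameters.

num_encoders_tokens: vocabulary size of the source language (English), i.e. the number of unique tokens.

num_decoders_tokens: vocabulary size of the target language (Spanish).

max_encoder_seq_length: maximum input length; shorter sentences are padded to this length.

max_decoder_seq_length: maximum output length; shorter sentences are padded to this length.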
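
To make the two sequence-length parameters concrete, here is a minimal NumPy sketch of padding tokenized sentences to a fixed length. The helper name pad_to_length, the toy token indices, and the choice of 0 as the padding index are assumptions for illustration and are not part of the notebook.

```python
import numpy as np

def pad_to_length(sequences, max_len, pad_value=0):
    """Right-pad (or truncate) lists of token indices to exactly max_len."""
    padded = np.full((len(sequences), max_len), pad_value, dtype="int32")
    for row, seq in enumerate(sequences):
        trimmed = seq[:max_len]              # drop tokens beyond max_len
        padded[row, :len(trimmed)] = trimmed
    return padded

# Hypothetical token indices for three short sentences
toy_sentences = [[5, 12, 7], [9, 3], [4, 8, 15, 16, 23, 42]]
toy_padded = pad_to_length(toy_sentences, max_len=5)

print(toy_padded.shape)        # (3, 5)
print(toy_padded[1].tolist())  # [9, 3, 0, 0, 0]
```

Keras also provides a pad_sequences utility for the same job; note that it pads at the front of each sequence by default, whereas this sketch pads at the end.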

Step 3: Encoder

In [3]:
# Encoder
encoder_inputs = Input(shape=(None,))  # variable-length sequences of token indices
enc_emb = Embedding(num_encoders_tokens, latent_dim)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
# Discard the output; keep only the final hidden and cell states
_, state_h, state_c = encoder_lstm(enc_emb)
encoder_states = [state_h, state_c]

📌 Explanation:

Embedding(num_encoders_tokens, latent_dim): maps each word index → dense vector of size latent_dim.

return_state=True: besides its output, the LSTM also returns the final hidden state (h) and cell state (c) → together these summarize the input sentence.

encoder_states: passed to the decoder as its initial state, so decoding starts with a memory of the input.
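
As a rough check of what return_state=True hands back, the sketch below builds a tiny stand-alone encoder and prints the shapes of its three results. The sizes (toy_vocab, toy_dim) and all probe_* names are made up for illustration, so they do not overwrite the notebook's own variables.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM
from tensorflow.keras.models import Model

toy_vocab, toy_dim = 50, 8   # small illustrative sizes, not the notebook's values

probe_in = Input(shape=(None,), dtype="int32")
probe_emb = Embedding(toy_vocab, toy_dim)(probe_in)
probe_out, probe_h, probe_c = LSTM(toy_dim, return_state=True)(probe_emb)
probe_model = Model(probe_in, [probe_out, probe_h, probe_c])

# A batch of 3 sentences with 5 token indices each
toy_batch = np.random.randint(1, toy_vocab, size=(3, 5)).astype("int32")
out_val, h_val, c_val = probe_model.predict(toy_batch, verbose=0)

# Each result has shape (batch_size, toy_dim)
print(out_val.shape, h_val.shape, c_val.shape)
# Without return_sequences, the LSTM output is the final hidden state
print(np.allclose(out_val, h_val))
```

Because the output and h coincide here, the encoder in this notebook can safely discard the first return value and keep only [state_h, state_c].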

In [4]:
encoder_inputs

<KerasTensor shape=(None, None), dtype=float32, sparse=False, ragged=False, name=keras_tensor>

Step 4: Decoder

In [5]:
# Decoder
decoder_inputs = Input(shape=(None,))
# Embed target-language token indices
dec_emb_layer = Embedding(num_decoders_tokens, latent_dim)
dec_emb = dec_emb_layer(decoder_inputs)

# Initialize the decoder with the encoder's final states
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)

# Project each timestep onto the target vocabulary
decoder_dense = Dense(num_decoders_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

📌 Explanation:

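decoder_inputs: the target (Spanish) sentence, typically beginning with a start-of-sequence token. The training target is the same sentence shifted one step ahead, so at each step the decoder sees the correct previous word (teacher forcing).

return_sequences=True: the decoder LSTM returns an output at every timestep, not just the last one.

Dense(num_decoders_tokens, softmax): gives a probability for each word in the target vocab at each step.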
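
The one-step shift used for teacher forcing can be sketched with plain NumPy slicing. The special-token indices (START, END, PAD) and the word indices 11, 12, 13 are hypothetical; a real tokenizer would assign its own.

```python
import numpy as np

START, END, PAD = 1, 2, 0   # hypothetical special-token indices

# One target sentence as token indices: <start> w1 w2 w3 <end>, then padding
target = np.array([[START, 11, 12, 13, END, PAD]])

decoder_input = target[:, :-1]    # what the decoder sees at each step
decoder_target = target[:, 1:]    # what it should predict at that step

print(decoder_input.tolist())     # [[1, 11, 12, 13, 2]]
print(decoder_target.tolist())    # [[11, 12, 13, 2, 0]]
```

At position 0 the decoder sees the start token and should predict the first word; at position 1 it sees the first word and should predict the second, and so on.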

Step 5: Full Model

In [6]:
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()


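Step 6: Training

In [9]:
# Assumes X_train_enc and X_train_dec_in are integer token-index arrays of shape
# (num_samples, seq_length), and Y_train_dec_out is one-hot encoded with shape
# (num_samples, seq_length, num_decoders_tokens) to match categorical_crossentropy.
model.fit([X_train_enc, X_train_dec_in], Y_train_dec_out,
          batch_size=64,
          epochs=30,
          validation_split=0.2)

NameError: name 'X_train_enc' is not defined

📌 Explanation:

The cell fails because the training arrays have not been defined at this point; they must be prepared before calling model.fit.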
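
One way to get past the NameError while checking that the model is wired correctly is to create placeholder arrays with the expected shapes. The sketch below uses random token indices, so it is only a shape check and not real translation data; num_samples, the random seed, and the intermediate names dec_full and dec_target_ids are arbitrary assumptions.

```python
import numpy as np

# Sizes mirror the notebook's parameters; num_samples is an arbitrary small value
num_samples = 8
seq_len = 20
vocab_size = 2000

rng = np.random.default_rng(0)

# Inputs to the Embedding layers are integer token indices, not one-hot vectors
X_train_enc = rng.integers(1, vocab_size, size=(num_samples, seq_len))
dec_full = rng.integers(1, vocab_size, size=(num_samples, seq_len + 1))
X_train_dec_in = dec_full[:, :-1]     # decoder input
dec_target_ids = dec_full[:, 1:]      # same sequence, shifted one step ahead

# One-hot encode the targets to match loss="categorical_crossentropy"
Y_train_dec_out = np.eye(vocab_size, dtype="float32")[dec_target_ids]

print(X_train_enc.shape, X_train_dec_in.shape, Y_train_dec_out.shape)
# (8, 20) (8, 20) (8, 20, 2000)
```

A model trained on random indices will not learn anything meaningful. For a real dataset, one-hot targets of this shape grow large quickly; compiling with loss="sparse_categorical_crossentropy" lets the targets stay as integer indices instead.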