Example text data (you can replace this with any larger corpus):

text = """Once upon a time, there was a little girl named Red Riding Hood. She loved to visit her grandmother, who lived in the woods. One day, her mother asked her to take a basket of goodies to her grandmother. On her way through the woods, she met a big bad wolf who wanted to eat her."""

[CO5]

(i) Build the Transformer model on the above dataset

(ii) Train the model for 20, 60, and 70 epochs

(iii) After training, use the model to generate new text by feeding it an initial seed text

(iv) Experiment with and improve the model by using a larger dataset and tuning hyperparameters

In [None]:
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

In [None]:
# Example text data
text = """Once upon a time, there was a little girl named Red Riding Hood. She loved to visit her grandmother, who lived in the woods. One day, her mother asked her to take a basket of goodies to her grandmother. On her way through the woods, she met a big bad wolf who wanted to eat her."""

# Tokenization and preparing sequences
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
total_words = len(tokenizer.word_index) + 1

# Create input sequences: every prefix of the token list with at least two tokens
token_list = tokenizer.texts_to_sequences([text])[0]
input_sequences = []
for i in range(1, len(token_list)):
    input_sequences.append(token_list[:i + 1])

# Pad sequences
max_seq_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_seq_len, padding='pre'))

# Inputs and labels
X = input_sequences[:, :-1]
y = input_sequences[:, -1]
y = tf.keras.utils.to_categorical(y, num_classes=total_words)

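# Define Transformer-based model using MultiHeadAttention
def transformer_block(vocab_size, seq_len):
    """Build a single-block Transformer-style classifier that predicts the next word."""
    inputs = tf.keras.layers.Input(shape=(seq_len,))
    # Token embeddings only: no positional encoding is added, and the sequence is
    # mean-pooled at the end, so the model does not see word order.
    embedding = tf.keras.layers.Embedding(vocab_size, 64)(inputs)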

    # Multi-Head Attention Layer
    attention_output = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)(embedding, embedding)
    attention_output = tf.keras.layers.LayerNormalization()(attention_output + embedding)

    # Feed-forward network
    ff_output = tf.keras.layers.Dense(128, activation='relu')(attention_output)
    ff_output = tf.keras.layers.Dense(64)(ff_output)
    ff_output = tf.keras.layers.LayerNormalization()(ff_output + attention_output)

    outputs = tf.keras.layers.GlobalAveragePooling1D()(ff_output)
    outputs = tf.keras.layers.Dense(vocab_size, activation='softmax')(outputs)

    model = tf.keras.models.Model(inputs, outputs)
    return model
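
One property of the model defined above is worth checking before training: no positional encoding is added to the embeddings, and the sequence is collapsed with global average pooling, so the network should give the same prediction for any reordering of its input tokens. The NumPy sketch below illustrates this with a single attention head and random weights. `attention_then_pool`, the dimensions, and the token ids are illustrative assumptions, not part of the notebook's model. Adding a position vector to each embedding (the optional `pos` argument) breaks the symmetry.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with max-subtraction for numerical stability."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_then_pool(token_ids, emb, w_q, w_k, w_v, pos=None):
    """Single-head self-attention with a residual, then mean over positions."""
    x = emb[token_ids]                       # (seq_len, d)
    if pos is not None:
        x = x + pos[:len(token_ids)]         # inject position information
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    return (weights @ v + x).mean(axis=0)    # global average pooling

rng = np.random.default_rng(0)
d = 8
emb = rng.normal(size=(20, d))               # toy embedding table (assumed sizes)
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
pos = rng.normal(size=(6, d))                # toy position vectors

tokens = np.array([0, 0, 3, 7, 5, 11])       # pre-padded prefix (0 = padding)
reordered = tokens[::-1]                     # same tokens, reversed order

# Without positions, both orderings give the same pooled vector.
order_blind = np.allclose(
    attention_then_pool(tokens, emb, w_q, w_k, w_v),
    attention_then_pool(reordered, emb, w_q, w_k, w_v),
)
# With positions added, the pooled vectors differ.
order_aware = not np.allclose(
    attention_then_pool(tokens, emb, w_q, w_k, w_v, pos),
    attention_then_pool(reordered, emb, w_q, w_k, w_v, pos),
)
print(order_blind, order_aware)
```

In Keras, a comparable change would be to add a second `Embedding` layer indexed by position to the token embedding before the attention layer. That variant is not trained in this notebook.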

(i) Build the Transformer model on the above dataset

In [None]:
# Build model
model = transformer_block(total_words, max_seq_len-1)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()


(ii) Train the model for 20, 60, and 70 epochs

In [None]:
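# Train the model (20, 60, 70 epochs)
# Note: fit() resumes from the current weights, so these runs are cumulative
# (20, then 80, then 150 epochs in total). To compare independent runs,
# rebuild and recompile the model inside the loop.
epochs_list = [20, 60, 70]
history = []
for epoch in epochs_list:
    print(f"Training with {epoch} epochs...")
    h = model.fit(X, y, epochs=epoch, verbose=1)
    history.append(h)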

Training with 20 epochs...
Epoch 1/20
2/2 ━━━━━━━━━━━━━━━━━━━━ 4s 32ms/step - accuracy: 0.0242 - loss: 3.8937
Epoch 2/20
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step - accuracy: 0.1248 - loss: 3.5960
Epoch 3/20
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - accuracy: 0.1161 - loss: 3.5446
Epoch 4/20
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - accuracy: 0.1369 - loss: 3.4035
Epoch 5/20
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step - accuracy: 0.1057 - loss: 3.4325
Epoch 6/20
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - accuracy: 0.1369 - loss: 3.3326
Epoch 7/20
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - accuracy: 0.1265 - loss: 3.3006
Epoch 8/20
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.1161 - loss: 3.2892
Epoch 9/20
2/2 ━━━━ [output truncated]

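To compare the runs, it can help to reduce each training log to a few numbers. The sketch below assumes the per-epoch metrics are available as a dictionary of lists, which is the shape of the `history` attribute on the objects returned by `model.fit`. The helper name `summarize_run` is hypothetical, and the values are transcribed from the first eight epochs of the log above.

```python
def summarize_run(metrics):
    """Return final and best values from a dict of per-epoch metric lists."""
    losses = metrics["loss"]
    accuracies = metrics["accuracy"]
    # Epoch with the lowest loss, reported 1-based to match the Keras log.
    best_epoch = min(range(len(losses)), key=losses.__getitem__) + 1
    return {
        "epochs": len(losses),
        "final_loss": losses[-1],
        "final_accuracy": accuracies[-1],
        "best_loss": losses[best_epoch - 1],
        "best_loss_epoch": best_epoch,
    }

# First eight epochs of the 20-epoch run, copied from the training log.
logged = {
    "loss": [3.8937, 3.5960, 3.5446, 3.4035, 3.4325, 3.3326, 3.3006, 3.2892],
    "accuracy": [0.0242, 0.1248, 0.1161, 0.1369, 0.1057, 0.1369, 0.1265, 0.1161],
}
summary = summarize_run(logged)
print(summary)
```

With the notebook's `history` list, the same helper could be applied as `summarize_run(h.history)` for each entry.
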
(iii) After training, use the model to generate new text by feeding it an initial seed text

(iv) Experiment with and improve the model by using a larger dataset and tuning hyperparameters

In [None]:
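# Text generation using seed text
def generate_text(seed_text, next_words, model, max_sequence_len):
    """Greedily append up to `next_words` words to `seed_text`, one prediction at a time."""
    for _ in range(next_words):
        # Encode the running text and left-pad it to the model's input length
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        predicted = model.predict(token_list, verbose=0)
        # Greedy decoding: pick the most probable word index
        predicted_word_index = int(np.argmax(predicted, axis=1)[0])
        # Index 0 is the padding slot and has no entry in index_word
        output_word = tokenizer.index_word.get(predicted_word_index, "")
        if not output_word:
            break
        seed_text += " " + output_word
    return seed_text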

seed_text = "Once upon a time"
generated_text = generate_text(seed_text, 50, model, max_seq_len)
print("Generated text:", generated_text)

Generated text: Once upon a time a a little girl girl named red riding hood she loved to visit her grandmother who in in the woods one day her mother asked to to her basket of her way way way the the woods woods a a a met a a a big bad bad who wanted
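
The generated text above repeats words ("a a little girl girl", "way way way"). Greedy `argmax` decoding, which always picks the single most probable word, is one plausible contributor. A common alternative is to sample from the predicted distribution after rescaling it with a temperature. The sketch below is a minimal NumPy version under that assumption; `sample_with_temperature` is a hypothetical helper, not part of the notebook, and the example probabilities are made up for illustration.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Sample an index from `probs` after temperature rescaling.

    Low temperatures approach argmax; a temperature of 1.0 samples from
    the distribution unchanged.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    logits = np.log(probs + 1e-9) / temperature  # small epsilon avoids log(0)
    logits -= logits.max()                       # numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

# Illustrative distribution over three word indices (not model output).
probs = [0.1, 0.7, 0.2]
rng = np.random.default_rng(0)
cold = [sample_with_temperature(probs, 0.01, rng) for _ in range(20)]
warm = [sample_with_temperature(probs, 1.0, rng) for _ in range(20)]
print(cold)
print(warm)
```

Inside `generate_text`, the `np.argmax` call could be swapped for `sample_with_temperature(predicted[0], temperature=0.8)`. The value 0.8 is an arbitrary starting point to tune, not a recommendation from the source.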

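For part (iv), one simple way to organize hyperparameter experiments is to enumerate a small grid of settings and train one model per combination. The sketch below only builds the list of configurations. The parameter names and candidate values are illustrative assumptions (the notebook itself fixes the embedding size at 64, the number of heads at 2, and the feed-forward width at 128), and the training step is left as a comment because it would require parameterizing `transformer_block`.

```python
from itertools import product

# Hypothetical search space; names and values are assumptions for illustration.
search_space = {
    "embed_dim": [32, 64],
    "num_heads": [2, 4],
    "ff_dim": [128, 256],
    "epochs": [20, 60, 70],
}

def iter_configs(space):
    """Yield one dict per combination of the candidate values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

configs = list(iter_configs(search_space))
print(len(configs))
print(configs[0])

# For each cfg in configs, one would build, compile, and fit a fresh model
# with these settings and record its final loss (not implemented here).
```

If `transformer_block` is parameterized this way, the second feed-forward `Dense` layer has to output `embed_dim` units so that the residual addition still matches shapes.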