# Poetry Generation with LSTM & GRU (Keras)
This notebook explores how to build and train a Recurrent Neural Network (RNN) using LSTM and GRU layers to generate poetry. 

The models are trained on a custom dataset of poems and learn to predict the next word, so they can generate poetic lines from a seed text.

- Framework: TensorFlow / Keras  
- Architecture: Word-level LSTM / GRU 
- Dataset: Text file (`poem.txt`) from https://www.kaggle.com/datasets/harshalgadhe/poem-generation
- Goal: Generate poetry using AI with adjustable creativity (temperature sampling)


In [47]:
# import necessary libraries
import pandas as pd
import numpy as np
import random

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout, GRU, Bidirectional
from tensorflow.keras.callbacks import EarlyStopping

## Load data

In [48]:
# Step 1: Read the TXT file
with open("poem.txt", "r", encoding="utf-8") as file:
    text = file.read().lower()

In [52]:
print(text[:200])

stay, i said
to the cut flowers.
they bowed
their heads lower.
stay, i said to the spider,
who fled.
stay, leaf.
it reddened,
embarrassed for me and itself.
stay, i said to my body.
it sat as a dog do


## Word Tokenization

In [53]:
# create instance of tokenizer
tokenizer = Tokenizer()
# fit tokenizer to current text
tokenizer.fit_on_texts([text])

total_words = len(tokenizer.word_index) + 1
print(f"Total unique words: {total_words}")

# Generate input sequences using an n-gram approach:
# every prefix of a line with at least two tokens becomes one training sequence
input_sequences = []
for line in text.split("\n"):
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

print(f"Total training sequences: {len(input_sequences)}")

Total unique words: 3808
Total training sequences: 16311
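
To make the n-gram step concrete, here is a minimal pure-Python sketch of the same prefix expansion on a single line. The toy `word_index` mapping and the helper `prefix_sequences` are hypothetical and stand in for the fitted `Tokenizer`; the real indices depend on word frequencies in `poem.txt`.

```python
# Hypothetical word -> id mapping standing in for tokenizer.word_index
word_index = {"stay": 1, "i": 2, "said": 3, "to": 4, "the": 5, "spider": 6}

def prefix_sequences(line, word_index):
    """Return every prefix of the encoded line that has at least two tokens."""
    tokens = [word_index[w] for w in line.split() if w in word_index]
    return [tokens[: i + 1] for i in range(1, len(tokens))]

sequences = prefix_sequences("stay i said to the spider", word_index)
for seq in sequences:
    # context tokens -> word the model should predict
    print(seq[:-1], "->", seq[-1])
```

A line with n tokens yields n - 1 training sequences, and lines with fewer than two tokens yield none. This is how the poem lines expand into the 16311 sequences reported above.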


## Pad Sequences and Prepare Features 

In [55]:
# pad sequences
max_seq_len = max(len(seq) for seq in input_sequences)
input_sequences = pad_sequences(input_sequences, maxlen=max_seq_len, padding='pre')

# X contains every token except the last one; y is the last token (the word to predict)
X = input_sequences[:, :-1]
y = input_sequences[:, -1]
y = np.eye(total_words)[y]  # one-hot encode the labels
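
The padding and the feature/label split can be sketched without Keras. The helpers `pre_pad` and `one_hot` below are hypothetical stand-ins for `pad_sequences(..., padding='pre')` and the `np.eye` lookup, and the tiny vocabulary size is chosen only for illustration.

```python
def pre_pad(sequences, maxlen, value=0):
    """Left-pad each sequence with `value`; keep the last `maxlen` tokens if longer."""
    return [[value] * (maxlen - len(s)) + list(s[-maxlen:]) for s in sequences]

def one_hot(index, num_classes):
    """Return a list with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

sequences = [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
max_len = max(len(s) for s in sequences)
padded = pre_pad(sequences, max_len)

X_demo = [row[:-1] for row in padded]  # context: everything but the last token
y_ids = [row[-1] for row in padded]    # label: the last token
y_demo = [one_hot(i, num_classes=5) for i in y_ids]

print(X_demo)
print(y_ids)
```

Pre-padding keeps the real words adjacent to the final time step, so the label is always the last column. With 16311 sequences and 3808 classes, the dense one-hot matrix holds roughly 62 million entries; Keras also provides the `sparse_categorical_crossentropy` loss, which accepts the integer labels directly.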

## Build and Train First GRU Model

In [57]:
# define model architecture
gru_model = Sequential([
    Embedding(total_words, 100, input_length=max_seq_len - 1),
    GRU(128, return_sequences=True),
    Dropout(0.2),
    GRU(128),
    Dropout(0.2),
    Dense(total_words, activation='softmax')
])

gru_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
gru_model.summary()

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_10 (Embedding)    (None, 15, 100)           380800    
                                                                 
 gru_12 (GRU)                (None, 15, 128)           88320     
                                                                 
 dropout_20 (Dropout)        (None, 15, 128)           0         
                                                                 
 gru_13 (GRU)                (None, 128)               99072     
                                                                 
 dropout_21 (Dropout)        (None, 128)               0         
                                                                 
 dense_10 (Dense)            (None, 3808)              491232    
                                                                 
Total params: 1,059,424
Trainable params: 1,059,424
N

In [58]:
# fit model
gru_model.fit(X, y, epochs=50, verbose=1)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x1d81c38a680>

## Model Testing

In [59]:

# define functions to generate text using a trained model
def sample_with_temperature(preds, temperature=1.0):
    # rescale the predicted distribution: temperature < 1 sharpens it, > 1 flattens it
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-10) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return np.random.choice(len(preds), p=preds)

def generate_poem(seed_text, model, next_words=30, temperature=1.0):
    for _ in range(next_words):
        # encode the text so far and left-pad it to the model's input length
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_seq_len - 1, padding='pre')
        predicted = model.predict(token_list, verbose=0)[0]
        predicted_index = sample_with_temperature(predicted, temperature)
        # index 0 is the padding id and maps to no word, so skip it
        output_word = tokenizer.index_word.get(predicted_index, "")
        if output_word:
            seed_text += " " + output_word
    return seed_text


In [61]:
print(generate_poem("i said to myself", model=gru_model, next_words=20, temperature=0.4))

i said to myself and has only with tease me out of thought with a summers wall to the wearin o the green and
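
To see what the `temperature` argument does, the sketch below applies the same rescaling as `sample_with_temperature` to a small, made-up probability vector, using only the standard library. The helper name `rescale` is hypothetical and the numbers are illustrative, not model output.

```python
import math

def rescale(probs, temperature):
    """Apply temperature to a probability vector and renormalize."""
    logits = [math.log(p + 1e-10) / temperature for p in probs]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.5, 0.3, 0.2]  # made-up next-word distribution
for t in (0.4, 1.0, 1.5):
    print(t, [round(p, 3) for p in rescale(probs, t)])
```

Temperatures below 1 concentrate probability on the most likely words, which gives more conservative text. Temperatures above 1 flatten the distribution, which gives more varied but less coherent text. A temperature of 1 leaves the model's distribution unchanged.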


##### The model does not seem to be doing well: based on its training accuracy, it appears to underfit

## LSTM Model

In [62]:
# LSTM model with the same layer layout as the GRU model (second dropout is 0.3 instead of 0.2), with early stopping applied
lstm_model = Sequential([
    Embedding(total_words, 100, input_length=max_seq_len - 1),
    LSTM(128, return_sequences=True),
    Dropout(0.2),
    LSTM(128),
    Dropout(0.3),
    Dense(total_words, activation='softmax')
])

# stop when the training loss has not improved for 5 consecutive epochs
early_stopping_monitor = EarlyStopping(monitor='loss', patience=5)

lstm_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
lstm_model.summary()

lstm_model.fit(X, y, epochs=50, verbose=1, callbacks=[early_stopping_monitor])

Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_11 (Embedding)    (None, 15, 100)           380800    
                                                                 
 lstm_8 (LSTM)               (None, 15, 128)           117248    
                                                                 
 dropout_22 (Dropout)        (None, 15, 128)           0         
                                                                 
 lstm_9 (LSTM)               (None, 128)               131584    
                                                                 
 dropout_23 (Dropout)        (None, 128)               0         
                                                                 
 dense_11 (Dense)            (None, 3808)              491232    
                                                                 
Total params: 1,120,864
Trainable params: 1,120,864
N

<keras.callbacks.History at 0x1d81c95e710>
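
The parameter counts in the GRU and LSTM summaries can be checked by hand. The sketch below assumes the Keras defaults for these layers (a GRU with `reset_after=True`, which carries two bias vectors per weight block, and an LSTM with one bias vector per block); the helper names are hypothetical.

```python
def gru_params(input_dim, units):
    # 3 blocks (update gate, reset gate, candidate state), each with
    # input weights, recurrent weights, and two bias vectors
    return 3 * (units * (input_dim + units) + 2 * units)

def lstm_params(input_dim, units):
    # 4 blocks (input, forget, and output gates plus cell candidate), each with
    # input weights, recurrent weights, and one bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per output unit
    return input_dim * units + units

vocab, emb, units = 3808, 100, 128
embedding = vocab * emb

gru_total = (embedding + gru_params(emb, units)
             + gru_params(units, units) + dense_params(units, vocab))
lstm_total = (embedding + lstm_params(emb, units)
              + lstm_params(units, units) + dense_params(units, vocab))

print(gru_total)   # 1059424
print(lstm_total)  # 1120864
```

Both totals match the summaries above. The LSTM variant carries 61,440 more parameters than the GRU variant because of its fourth weight block, while the embedding and output layers, which hold most of the parameters, are identical.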

#### lstm_model appears to underfit even more than the GRU model

In [64]:
print(generate_poem("i said to myself", model=lstm_model, next_words=20, temperature=0.4))

i said to myself to van diemans land is a jewel when so deep as gone to darlin troubled gone wandered gone gone by


## Bi-directional GRU Model

In [65]:
# define model architecture
gru_model2 = Sequential([
    Embedding(total_words, 100, input_length=max_seq_len - 1),
    Bidirectional(GRU(128, return_sequences=True)),
    Dropout(0.4),
    Bidirectional(GRU(128)),
    Dropout(0.4),
    Dense(total_words, activation='softmax')
])

# stop when the training loss has not improved for 5 consecutive epochs
early_stopping_monitor = EarlyStopping(monitor='loss', patience=5)

gru_model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
gru_model2.summary()

Model: "sequential_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_12 (Embedding)    (None, 15, 100)           380800    
                                                                 
 bidirectional_5 (Bidirectio  (None, 15, 256)          176640    
 nal)                                                            
                                                                 
 dropout_24 (Dropout)        (None, 15, 256)           0         
                                                                 
 bidirectional_6 (Bidirectio  (None, 256)              296448    
 nal)                                                            
                                                                 
 dropout_25 (Dropout)        (None, 256)               0         
                                                                 
 dense_12 (Dense)            (None, 3808)            

In [66]:
# fit model
gru_model2.fit(X, y, epochs=50, verbose=1, callbacks=[early_stopping_monitor])

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x1d85f7bf9a0>
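
The `EarlyStopping(monitor='loss', patience=5)` callback stops training once the monitored value has gone `patience` consecutive epochs without improving. The following is a simplified pure-Python sketch of that rule, not the Keras implementation; the function name `stopping_epoch` and the loss values are made up for illustration.

```python
def stopping_epoch(losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            # improvement: remember it and reset the patience counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

steadily_improving = [3.0 - 0.05 * i for i in range(50)]  # loss drops every epoch
plateau = [3.0, 2.5, 2.4] + [2.45] * 10                   # no improvement after epoch 3

print(stopping_epoch(steadily_improving))  # None
print(stopping_epoch(plateau))             # 8
```

Because the callback here monitors the training loss, which usually keeps decreasing, it may never trigger; that is consistent with the log above running through epoch 50/50. Passing `validation_split` to `fit` and monitoring `val_loss` is a common alternative when the goal is to stop before the model overfits.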

In [69]:
print(generate_poem("i said to myself", model=gru_model2, next_words=20, temperature=0.7))

i said to myself i see another free goodbye to dear old skibbereen and killarney come home they took a tired boy i love


#### The bidirectional GRU model appears to perform better than the previous GRU model