# Poetry Generation with LSTM & GRU (Keras)
This notebook explores how to build and train a Recurrent Neural Network (RNN) using LSTM and GRU layers to generate poetry. 

The models are trained on a custom dataset of poems and learn to predict the next word of a line, which lets them generate poetic lines from a seed text.

- Framework: TensorFlow / Keras  
- Architecture: Word-level LSTM / GRU 
- Dataset: CSV file (`kaggle_poem_dataset.csv`) from https://www.kaggle.com/datasets/johnhallman/complete-poetryfoundationorg-dataset
- Goal: Generate poetry using AI with adjustable creativity (temperature sampling)


In [None]:
# import necessary libraries
import pandas as pd
import numpy as np
import random

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Dropout, GRU, Attention, GlobalAveragePooling1D
from tensorflow.keras.callbacks import EarlyStopping

## Load and clean Data

In [82]:
# read in data using pandas library
data = pd.read_csv("kaggle_poem_dataset.csv", encoding='utf-8', keep_default_na=False, engine='python')
print(f"Data shape {data.shape}")
data.head()

Data shape (15652, 5)


Unnamed: 0.1,Unnamed: 0,Author,Title,Poetry Foundation ID,Content
0,0,Wendy Videlock,!,55489,"Dear Writers, I’m compiling the first in what ..."
1,1,Hailey Leithauser,0,41729,"Philosophic\nin its complex, ovoid emptiness,\..."
2,2,Jody Gladding,1-800-FEAR,57135,We'd like to talk with you about fear t...
3,3,Joseph Brodsky,1 January 1965,56736,The Wise Men will unlearn your name.\nAbove yo...
4,4,Ted Berrigan,3 Pages,51624,For Jack Collom\n10 Things I do Every Day\n\np...


In [76]:
# use a subset of the poems: rows 1 through 4999 (row 0 is not included)
small_data = data.iloc[1:5000]

In [77]:
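# Drop any rows without actual poems. The CSV was read with keep_default_na=False,
# so missing values arrive as empty strings rather than NaN; filter those out too.
small_data = small_data.dropna(subset=["Content"])
small_data = small_data[small_data["Content"].astype(str).str.strip() != ""]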

# Combine all poems into one large string
poems = "\n".join(small_data["Content"].astype(str).tolist()).lower()

In [78]:
print(poems[:500])

philosophic
in its complex, ovoid emptiness,
a skillful pundit coined it as a sort
of stopgap doorstop for those
quaint equations

romans never
dreamt of. in form completely clever
and discrete—a mirror come unsilvered,
loose watch face without the works,
a hollowed globe

from tip to toe
unbroken, it evades the grappling
hooks of mass, tilts the thin rim of no thing,
remains embryonic sum,
non-cogito.
we'd  like  to  talk  with  you  about  fear they  said  so
many  people  live  in  fear  thes


## Word Tokenization

In [64]:
# create instance of tokenizer
tokenizer = Tokenizer(num_words=5000)
# fit tokenizer to current text
tokenizer.fit_on_texts([poems])

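# num_words is the vocabulary cap applied by texts_to_sequences, not the number of
# distinct words in the corpus (that count is len(tokenizer.word_index))
total_words = tokenizer.num_words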
print(f"Total unique words: {total_words}")

# Generate input sequences using n-gram approach
input_sequences = []
for line in poems.split("\n"):
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

print(f"Total training sequences: {len(input_sequences)}")

Total unique words: 5000
Total training sequences: 955739
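
The prefix expansion in the loop above can be seen on a single line. This is a minimal, dependency-free sketch: `toy_vocab` and `line_to_prefixes` are hypothetical names used only for illustration, and the real notebook relies on the fitted Keras `Tokenizer` instead of a hand-written dictionary.

```python
# Hypothetical five-word vocabulary standing in for the fitted Tokenizer.
toy_vocab = {"the": 1, "blue": 2, "sky": 3, "is": 4, "wide": 5}

def line_to_prefixes(line, vocab):
    """Return every prefix of length >= 2 of the line's token ids."""
    # Words missing from the vocabulary are dropped, similar to how
    # texts_to_sequences skips words outside the num_words cap.
    tokens = [vocab[w] for w in line.lower().split() if w in vocab]
    return [tokens[: i + 1] for i in range(1, len(tokens))]

prefixes = line_to_prefixes("The blue sky is wide", toy_vocab)
for p in prefixes:
    # everything but the last id is the context, the last id is the target
    print(p[:-1], "->", p[-1])
```

A line with n known tokens yields n - 1 training sequences, which is why roughly 5,000 poems expand into several hundred thousand samples.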


In [65]:
# sample content of input_sequences
input_sequences[:10]

[[415, 4402],
 [415, 4402, 213],
 [415, 4402, 213, 1],
 [415, 4402, 213, 1, 115],
 [415, 4402, 213, 1, 115, 6],
 [415, 4402, 213, 1, 115, 6, 37],
 [415, 4402, 213, 1, 115, 6, 37, 7],
 [415, 4402, 213, 1, 115, 6, 37, 7, 416],
 [9, 4],
 [9, 4, 3148]]

## Pad Sequences and Prepare Features

In [66]:
# cap every sequence at 50 tokens; shorter sequences are left-padded with zeros
max_seq_len = 50
input_sequences = pad_sequences(input_sequences, maxlen=max_seq_len, padding='pre')

# X holds every token except the last one
X = input_sequences[:, :-1]
# y is the last token of each sequence, i.e. the word the model must predict
y = input_sequences[:, -1]
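
To make the padding and the feature/label split concrete, here is a small pure-Python sketch. `pad_pre` is a hypothetical helper that imitates `pad_sequences` with `padding='pre'` and its default `truncating='pre'`; `max_len = 5` is a stand-in for the notebook's 50.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad (or left-truncate) a list of token ids to exactly maxlen items."""
    seq = seq[-maxlen:]  # when too long, keep the most recent tokens
    return [value] * (maxlen - len(seq)) + seq

max_len = 5  # small stand-in for max_seq_len
samples = [[415, 4402], [415, 4402, 213], [1, 2, 3, 4, 5, 6, 7]]
rows = [pad_pre(s, max_len) for s in samples]

# Same split as in the notebook: context tokens vs. the word to predict.
X_toy = [r[:-1] for r in rows]
y_toy = [r[-1] for r in rows]

for context, target in zip(X_toy, y_toy):
    print(context, "->", target)
```

Because padding is added on the left, the target word always sits in the last column, so a single slice separates `X` from `y`.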

## Build and Train LSTM Model

In [None]:
model = Sequential([
        Embedding(total_words, 100, input_length=max_seq_len - 1),
        LSTM(256, return_sequences=True),
        Dropout(0.2),
        LSTM(256),
        Dropout(0.2),
        Dense(total_words, activation='softmax')
])

# stop once the training loss has not improved for 8 consecutive epochs
early_stopping_monitor = EarlyStopping(monitor='loss', patience=8)

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

model.fit(X, y, epochs=50, verbose=1, callbacks=[early_stopping_monitor])


Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, 49, 100)           500000    
                                                                 
 lstm_4 (LSTM)               (None, 49, 256)           365568    
                                                                 
 dropout_4 (Dropout)         (None, 49, 256)           0         
                                                                 
 lstm_5 (LSTM)               (None, 256)               525312    
                                                                 
 dropout_5 (Dropout)         (None, 256)               0         
                                                                 
 dense_2 (Dense)             (None, 5000)              1285000   
                                                                 
Total params: 2,675,880
Trainable params: 2,675,880
Non-trainable params: 0

##### The LSTM model took too long to train locally and the run was eventually interrupted. Training on Colab completed, but the results were not good.

## GRU Model

In [30]:
gru_model = Sequential([
    Embedding(total_words, 100, input_length=max_seq_len - 1),
    GRU(256, return_sequences=True),
    Dropout(0.2),
    GRU(256),
    Dropout(0.2),
    Dense(total_words, activation='softmax')
])

early_stopping_monitor = EarlyStopping(monitor='loss', patience=8)

gru_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
gru_model.summary()

gru_model.fit(X, y, epochs=50, verbose=1, callbacks=[early_stopping_monitor])

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 49, 100)           500000    
                                                                 
 gru (GRU)                   (None, 49, 256)           274944    
                                                                 
 dropout (Dropout)           (None, 49, 256)           0         
                                                                 
 gru_1 (GRU)                 (None, 256)               394752    
                                                                 
 dropout_1 (Dropout)         (None, 256)               0         
                                                                 
 dense (Dense)               (None, 5000)              1285000   
                                                                 
Total params: 2,454,696
Trainable params: 2,454,696
Non-trainable params: 0

<keras.callbacks.History at 0x19db6c80880>
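
The parameter counts in the two summaries can be reproduced by hand, which also shows where the GRU saves weights. This is a sketch assuming Keras defaults (biases enabled, GRU with `reset_after=True`); the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # 4 gate blocks, each with input weights, recurrent weights and one bias vector
    return 4 * ((input_dim + units) * units + units)

def gru_params(input_dim, units):
    # 3 gate blocks; with reset_after=True each block carries two bias vectors
    return 3 * ((input_dim + units) * units + 2 * units)

embedding = 5000 * 100        # vocabulary size x embedding width
dense = 256 * 5000 + 5000     # weights plus biases of the softmax layer

lstm_total = embedding + lstm_params(100, 256) + lstm_params(256, 256) + dense
gru_total = embedding + gru_params(100, 256) + gru_params(256, 256) + dense

print("LSTM layers:", lstm_params(100, 256), lstm_params(256, 256), "total:", lstm_total)
print("GRU layers: ", gru_params(100, 256), gru_params(256, 256), "total:", gru_total)
```

In both models the embedding and the 5,000-way softmax account for about 1.8 million of the parameters, so switching from LSTM to GRU only trims the recurrent part.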

In [31]:
gru_model.save("poetry_gru_model.keras")

## Model Testing

In [None]:
# define functions to generate text using a trained model
def sample_with_temperature(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-10) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return np.random.choice(len(preds), p=preds)

def generate_poem(seed_text, model, next_words=30, temperature=1.0):
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_seq_len - 1, padding='pre')
        predicted = model.predict(token_list, verbose=0)[0]
        predicted_index = sample_with_temperature(predicted, temperature)
        # index 0 is the padding slot and maps to no word, so skip it
        output_word = tokenizer.index_word.get(int(predicted_index), "")
        if output_word:
            seed_text += " " + output_word
    return seed_text


In [36]:
print(generate_poem("the blue sky", model=gru_model, next_words=20, temperature=0.4))

the blue sky the roof of the world is a little in the of the sun in the air and the sun is


#### gru_model does generate text, but it is not performing well: the sample above is repetitive and not fully grammatical ("a little in the of the sun")
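
The `temperature` argument passed to `generate_poem` rescales the predicted distribution before a word is drawn. The sketch below repeats the arithmetic of `sample_with_temperature` without the random draw, using only the standard library. `toy_probs` is a made-up distribution, `apply_temperature` is a hypothetical name, and subtracting the maximum before exponentiating is an added numerical-stability step that is not in the notebook function.

```python
import math

def apply_temperature(probs, temperature=1.0):
    """Return the distribution obtained by rescaling log-probabilities by 1/temperature."""
    logits = [math.log(p + 1e-10) / temperature for p in probs]
    peak = max(logits)  # stability tweak; does not change the result
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_probs = [0.5, 0.3, 0.2]  # hypothetical softmax output over three words

for t in (0.4, 1.0, 1.5):
    print(t, [round(p, 3) for p in apply_temperature(toy_probs, t)])
```

Temperatures below 1 push probability toward the most likely word, which makes output safer but more repetitive; temperatures above 1 flatten the distribution and allow rarer words.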

## GRU with Attention Layer

In [67]:
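# functional API; the name `inputs` avoids shadowing the built-in input()
inputs = Input(shape=(max_seq_len - 1,))
x = Embedding(total_words, 100)(inputs)
x = GRU(256, return_sequences=True)(x)
x = Dropout(0.2)(x)
x = GRU(256, return_sequences=True)(x)
# self-attention: the GRU outputs serve as both query and value
attn_output = Attention()([x, x])
# average over time steps to get one vector per sequence
x = GlobalAveragePooling1D()(attn_output)
x = Dense(total_words, activation='softmax')(x)
gru_model2 = Model(inputs=inputs, outputs=x)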
gru_model2.summary()

Model: "model_11"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_12 (InputLayer)          [(None, 49)]         0           []                               
                                                                                                  
 embedding_11 (Embedding)       (None, 49, 100)      500000      ['input_12[0][0]']               
                                                                                                  
 gru_22 (GRU)                   (None, 49, 256)      274944      ['embedding_11[0][0]']           
                                                                                                  
 dropout_11 (Dropout)           (None, 49, 256)      0           ['gru_22[0][0]']                 
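
What `Attention()([x, x])` followed by `GlobalAveragePooling1D()` computes can be sketched without TensorFlow. The code below is an illustrative pure-Python version of unscaled dot-product self-attention, which is my understanding of the Keras `Attention` layer's default behaviour; `toy_states` and the function names are hypothetical, and no masking is applied, so padded positions would take part in the average.

```python
import math

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Unscaled dot-product attention where query, key and value are the same vectors."""
    out = []
    for q in seq:
        # similarity of this time step to every time step
        scores = [sum(a * b for a, b in zip(q, k)) for k in seq]
        weights = softmax(scores)
        # weighted sum of all time steps
        out.append([sum(w * v[d] for w, v in zip(weights, seq)) for d in range(len(q))])
    return out

def average_pool(seq):
    """Mean over time steps, one value per feature."""
    return [sum(v[d] for v in seq) / len(seq) for d in range(len(seq[0]))]

toy_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps x 2 features
attended = self_attention(toy_states)
pooled = average_pool(attended)
print(attended)
print(pooled)
```

Each attended vector is a convex combination of the input time steps, and the pooling step collapses them into the single vector that feeds the softmax layer.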
                                                                                           

In [68]:
# stop once the training loss has not improved for 5 consecutive epochs
early_stopping_monitor = EarlyStopping(monitor='loss', patience=5)

gru_model2.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

gru_model2.fit(X, y, epochs=50, verbose=1, callbacks=[early_stopping_monitor])

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
 6821/29867 [=====>........................] - ETA: 4:03 - loss: 5.9531 - accuracy: 0.1013

KeyboardInterrupt: 
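
The 29867 steps shown in the progress bar are consistent with the 955,739 training sequences divided by the Keras default batch size of 32. The short calculation below checks that and shows how the step count would change with a larger `batch_size`; the values 128 and 512 are hypothetical choices, not settings used in this notebook.

```python
import math

n_sequences = 955_739  # "Total training sequences" printed during tokenization
default_batch = 32     # Keras default when fit() is called without batch_size

def steps_per_epoch(n_samples, batch_size):
    # the last, partially filled batch still counts as a step
    return math.ceil(n_samples / batch_size)

print(default_batch, steps_per_epoch(n_sequences, default_batch))
for batch in (128, 512):  # hypothetical alternatives
    print(batch, steps_per_epoch(n_sequences, batch))
```

Fewer, larger steps often shorten the wall-clock time per epoch on a GPU, though batch size also affects convergence, so this is something to experiment with rather than a guaranteed fix.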