# Text Generation with LSTM

This notebook demonstrates how to train a Long Short-Term Memory (LSTM) neural network to generate text from song lyrics. We preprocess the data, define the model architecture, train the model, and finally generate text with the trained model.

## Introduction

In this notebook, we'll follow these steps:

1. Data Preparation: Read song lyrics data from a CSV file, clean the text data, and prepare it for model training.
2. Model Architecture Definition: Define an LSTM-based neural network architecture for text generation using Keras.
3. Model Training: Train the defined model on the prepared data.
4. Text Generation: Generate text using the trained model.

Let's get started!

---



In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd
import re
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, Dropout, Bidirectional
from keras.callbacks import ModelCheckpoint
from random import randint





## Data Preparation

We'll start by reading the song lyrics data from a CSV file and cleaning the text data.


In [2]:
# Read the data from the CSV file
data = pd.read_csv('abba.csv')

# Concatenate all lyrics into a single string
corpus = ' '.join(data['lyrics'].dropna())

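# Clean the text data
def clean_text(text):
    # Drop every non-ASCII character (accented letters are removed, not transliterated)
    cleaned_text = re.sub(r'[^\x00-\x7F]+', '', text)
    # Replace line breaks and tabs with spaces
    cleaned_text = re.sub(r'[\r\n\t]', ' ', cleaned_text)
    # Collapse runs of whitespace into a single space
    cleaned_text = re.sub(r'\s+', ' ', cleaned_text)
    return cleaned_text.strip()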

corpus = clean_text(corpus)
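
As a quick sanity check, the sketch below runs the same `clean_text` on a made-up string. The input is hypothetical and not taken from `abba.csv`. It shows that non-ASCII characters are dropped outright and that all whitespace collapses to single spaces.

```python
import re

def clean_text(text):
    # Same three steps as the notebook cell above
    cleaned_text = re.sub(r'[^\x00-\x7F]+', '', text)
    cleaned_text = re.sub(r'[\r\n\t]', ' ', cleaned_text)
    cleaned_text = re.sub(r'\s+', ' ', cleaned_text)
    return cleaned_text.strip()

# Hypothetical raw lyric fragment with an accented letter, a music-note
# symbol, a Windows line break, a tab, and repeated spaces
raw = "  Caf\u00e9 nights,\r\n\tdancing   on \u266a and on  "
cleaned = clean_text(raw)
print(repr(cleaned))  # 'Caf nights, dancing on and on'
```

Note that "Café" becomes "Caf": if the lyrics contain many accented characters, transliterating them before cleaning may be preferable to deleting them.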

Now, let's proceed to encode the characters and slice the corpus into sequences.



In [3]:
# Get unique characters from the corpus
unique_chars = sorted(set(corpus))

# Create encoder and decoder dictionaries
encoder_dict = {char: i for i, char in enumerate(unique_chars)}
decoder_dict = {i: char for i, char in enumerate(unique_chars)}

# Slice the corpus into semi-redundant sequences of 20 characters
sentence_length = 20
skip = 1
X_data = []
y_data = []

for i in range(0, len(corpus) - sentence_length, skip):
    sentence = corpus[i:i + sentence_length]
    next_char = corpus[i + sentence_length]
    X_data.append([encoder_dict[char] for char in sentence])
    y_data.append(encoder_dict[next_char])

# Vectorize X and y
num_chars = len(unique_chars)
num_sentences = len(X_data)

X = np.zeros((num_sentences, sentence_length, num_chars), dtype=bool)
y = np.zeros((num_sentences, num_chars), dtype=bool)

for i, sentence in enumerate(X_data):
    for t, encoded_char in enumerate(sentence):
        X[i, t, encoded_char] = 1
    y[i, y_data[i]] = 1
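
To make the slicing and one-hot encoding concrete, here is a small sketch on a toy corpus. The corpus `"abcabd"` and the window length of 3 are assumptions chosen for readability; the notebook itself uses the lyrics corpus and a length of 20. The `np.eye` indexing is a vectorized alternative to the nested loops above, not what the notebook cell uses.

```python
import numpy as np

toy_corpus = "abcabd"   # hypothetical corpus
toy_length = 3          # hypothetical window length

toy_chars = sorted(set(toy_corpus))                 # ['a', 'b', 'c', 'd']
toy_encoder = {c: i for i, c in enumerate(toy_chars)}

# Slide a window over the corpus; the target is the character that follows it
windows, targets = [], []
for i in range(0, len(toy_corpus) - toy_length):
    windows.append([toy_encoder[c] for c in toy_corpus[i:i + toy_length]])
    targets.append(toy_encoder[toy_corpus[i + toy_length]])

# One-hot encode by indexing rows of an identity matrix
identity = np.eye(len(toy_chars), dtype=bool)
X_toy = identity[np.array(windows)]   # shape: (num_windows, toy_length, num_chars)
y_toy = identity[np.array(targets)]   # shape: (num_windows, num_chars)

print(windows)       # [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
print(targets)       # [0, 1, 3]
print(X_toy.shape)   # (3, 3, 4)
print(y_toy.shape)   # (3, 4)
```

Each window contributes exactly one `True` per timestep, so `X_toy.sum(axis=-1)` is all ones.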


## Model Architecture Definition

Now, let's define the architecture of the LSTM-based model for text generation using Keras.



In [4]:
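model = Sequential()
# First LSTM returns the full sequence so the next recurrent layer sees every timestep
model.add(LSTM(32, input_shape=(sentence_length, num_chars), return_sequences=True))
# Bidirectional wrapper runs one LSTM forward and one backward over the sequence
model.add(Bidirectional(LSTM(32, return_sequences=True)))
model.add(Dropout(0.2))
# Final LSTM returns only its last output
model.add(LSTM(64))
model.add(Dropout(0.2))
# Project to one score per character, then normalize to probabilities
model.add(Dense(num_chars))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')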
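
To get a feel for the size of this architecture, the sketch below counts trainable parameters by hand. It assumes Keras defaults (biases enabled, `Bidirectional` concatenating both directions into 64 features). The vocabulary size of 60 is a placeholder, since the notebook does not print `num_chars`; in the notebook itself, `model.summary()` reports the real numbers.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias vector
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias vector
    return input_dim * units + units

num_chars_example = 60  # hypothetical vocabulary size, not from the notebook

layer_params = {
    "lstm_32": lstm_params(num_chars_example, 32),
    # Two independent LSTMs (forward and backward), each reading 32 features
    "bidirectional_lstm_32": 2 * lstm_params(32, 32),
    # Reads the 64 concatenated features from the bidirectional layer
    "lstm_64": lstm_params(64, 64),
    "dense": dense_params(64, num_chars_example),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count}")
print(f"total: {total_params}")
```

Dropout and the softmax activation add no parameters, so they are left out of the count.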





## Model Training

It's time to train the defined model on the prepared data.

In [5]:
# Train the model
file_path = "weights-{epoch:02d}.hdf5"
checkpoint = ModelCheckpoint(file_path, monitor="loss", verbose=1, save_best_only=True, mode="min")
callbacks = [checkpoint]

history = model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks)

Epoch 1/20

Epoch 1: loss improved from inf to 2.80231, saving model to weights-01.hdf5
Epoch 2/20
   2/1642 [..............................] - ETA: 1:29 - loss: 2.4365

Epoch 2: loss improved from 2.80231 to 2.40355, saving model to weights-02.hdf5
Epoch 3/20
Epoch 3: loss improved from 2.40355 to 2.23460, saving model to weights-03.hdf5
Epoch 4/20
Epoch 4: loss improved from 2.23460 to 2.12299, saving model to weights-04.hdf5
Epoch 5/20
Epoch 5: loss improved from 2.12299 to 2.03615, saving model to weights-05.hdf5
Epoch 6/20
Epoch 6: loss improved from 2.03615 to 1.96660, saving model to weights-06.hdf5
Epoch 7/20
Epoch 7: loss improved from 1.96660 to 1.90670, saving model to weights-07.hdf5
Epoch 8/20
Epoch 8: loss improved from 1.90670 to 1.85981, saving model to weights-08.hdf5
Epoch 9/20
Epoch 9: loss improved from 1.85981 to 1.81922, saving model to weights-09.hdf5
Epoch 10/20
Epoch 10: loss improved from 1.81922 to 1.78264, saving model to weights-10.hdf5
Epoch 11/20
Epoch 11: loss improved from 1.78264 to 1.75201, saving model to weights-11.hdf5
Epoch 12/20
Epoch 12: loss improved from 1.75201 to 1.72745, saving model to weights-12.hdf5
...
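
The numbers in this log can be read a little further with some arithmetic. The sketch below uses only values that appear in the output above (the logged losses, the 1642 batches per epoch from the progress bar, and `batch_size=128`). Because `categorical_crossentropy` is measured in nats, `exp(loss)` gives the per-character perplexity, roughly the number of characters the model is still choosing between at each step.

```python
import math

# Losses copied from the training output above
logged_loss = {1: 2.80231, 2: 2.40355, 12: 1.72745}

perplexity = {epoch: math.exp(loss) for epoch, loss in logged_loss.items()}
for epoch, value in perplexity.items():
    print(f"epoch {epoch:2d}: loss={logged_loss[epoch]:.5f}  perplexity={value:.2f}")

# The progress bar reports 1642 batches per epoch at batch_size=128,
# which bounds the number of training sequences
batches_per_epoch = 1642
batch_size = 128
min_sequences = (batches_per_epoch - 1) * batch_size + 1
max_sequences = batches_per_epoch * batch_size
print(f"training sequences: between {min_sequences} and {max_sequences}")
```

With `skip = 1`, the number of sequences is the corpus length minus `sentence_length`, so the cleaned corpus is roughly 210,000 characters long.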

## Text Generation

Finally, let's generate text using the trained model.
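
The `generate` function in the next cell calls `sample(pred, 0.3)`, but no cell shown in this notebook defines `sample`. The following is a minimal sketch, assuming it is the usual temperature-based sampler for character models: the second argument is a temperature, and the return value is a character index. The function body, the `1e-8` epsilon, and the `_rng` generator are assumptions, not the author's original code.

```python
import numpy as np

_rng = np.random.default_rng()  # assumed random source, unseeded

def sample(preds, temperature=1.0):
    """Draw a character index from `preds` after temperature scaling.

    Temperatures below 1 sharpen the distribution (safer, more repetitive
    text); temperatures above 1 flatten it (more varied, more errors).
    """
    preds = np.asarray(preds, dtype="float64")
    # Scale log-probabilities; the epsilon guards against log(0)
    logits = np.log(preds + 1e-8) / temperature
    # Softmax, shifted by the max for numerical stability
    scaled = np.exp(logits - np.max(logits))
    probs = scaled / np.sum(scaled)
    return int(_rng.choice(len(probs), p=probs))

# Example: a low temperature strongly favors the most likely index
example_probs = np.array([0.2, 0.5, 0.3])
draws = [sample(example_probs, 0.3) for _ in range(1000)]
print(np.bincount(draws, minlength=3) / 1000)
```

At a temperature of 0.3, as used in `generate`, most draws land on the highest-probability character, which fits the repetitive phrasing in the generated lyrics.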

In [8]:
def generate(seed_pattern):
    # NOTE: sample() must be defined before generate() is called.
    # Truncate the seed to its first sentence_length characters,
    # or right-pad it with spaces if it is too short
    if len(seed_pattern) > sentence_length:
        seed_pattern = seed_pattern[:sentence_length]
    elif len(seed_pattern) < sentence_length:
        seed_pattern = seed_pattern.ljust(sentence_length)

    # One-hot encode the seed; every seed character must exist in encoder_dict
    X = np.zeros((1, sentence_length, num_chars), dtype=bool)
    for i, character in enumerate(seed_pattern):
        X[0, i, encoder_dict[character]] = 1

    generated_text = ""
    for i in range(500):
        # Predict the distribution over the next character and sample from it
        pred = model.predict(X, verbose=0)[0]
        prediction = sample(pred, 0.3)
        generated_text += decoder_dict[prediction]

        # Slide the window: drop the oldest character, append the sampled one
        activations = np.zeros((1, 1, num_chars), dtype=bool)
        activations[0, 0, prediction] = 1
        X = np.concatenate((X[:, 1:, :], activations), axis=1)

    return generated_text

# Generate text with a seed pattern (generate() keeps only its first sentence_length characters)
seed = "In the bard and show you on your lovelight and i can't get the mowner i'm a marion an and every mind, there's a boot"
generated_text = generate(seed)
print("Generated Text:")
print(generated_text)


Generated Text:
 a gonna how we was the midnight I was the do If I could the star the star the day of the she the way a fire I was way I could my from the star the star on the stream The hould I can the star the time in the day I can to love is a bale it sound I want to the song that I have the she We can't gonna be a san and the sure I have it soul the sun I can the day We could my love is love is a shang Love in the star the time I have a dream To the right the sun I want to hear the can the way Oh the fan ou
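
Two details of `generate` are easy to miss: only the first `sentence_length` characters of the seed are used, and the input window keeps a fixed length by dropping its oldest character on every step. The sketch below illustrates both with toy sizes. The window length of 4, the vocabulary of 3, and the predicted index are made up for illustration; the seed string is the one used above.

```python
import numpy as np

# Seed handling: with sentence_length = 20, only this prefix reaches the model
seed = ("In the bard and show you on your lovelight and i can't get the "
        "mowner i'm a marion an and every mind, there's a boot")
used_seed = seed[:20]
print(repr(used_seed))  # 'In the bard and show'

# Rolling window with toy sizes (hypothetical: length 4, vocabulary of 3)
window = np.zeros((1, 4, 3), dtype=bool)
for t, index in enumerate([0, 1, 2, 0]):
    window[0, t, index] = True

predicted_index = 1  # stand-in for the value returned by sample()
activations = np.zeros((1, 1, 3), dtype=bool)
activations[0, 0, predicted_index] = True

# Drop the first timestep and append the new one-hot vector
window = np.concatenate((window[:, 1:, :], activations), axis=1)
window_indices = window[0].argmax(axis=1).tolist()
print(window.shape)     # (1, 4, 3)
print(window_indices)   # [1, 2, 0, 1]
```

Since the rest of the long seed is discarded, passing a 20-character seed gives the same starting point and makes that dependence explicit.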


## Conclusion

In this notebook, we covered the entire process of training an LSTM-based neural network for text generation on song lyrics: preprocessing the data, defining the model architecture, training the model, and generating text with the trained model.

By following these steps, you can apply similar techniques to train models on other text datasets and generate text in various domains.
