**Text Generation using the Irish Songs Dataset**

This notebook trains a next-word prediction model, a recurrent neural network (RNN) with a Long Short-Term Memory (LSTM) layer, on the Irish Songs lyrics dataset. The trained model is then used to generate new text, one word at a time, from a seed phrase.

In [1]:
pip install tensorflow numpy



In [2]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

**Loading and reading the Irish Songs dataset**

In [3]:
# Download the Irish Songs dataset (cached locally by Keras); get_file returns the local file path
url = "https://storage.googleapis.com/tensorflow-1-public/course3/irish-lyrics-eof.txt"
path = tf.keras.utils.get_file("irish-lyrics-eof.txt", url)

# Read the whole file into a single string
with open(path, 'r', encoding='utf-8') as f:
    corpus = f.read()


Downloading data from https://storage.googleapis.com/tensorflow-1-public/course3/irish-lyrics-eof.txt


**Tokenizing and labelling the dataset**

In [4]:

# Tokenize the text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([corpus])
total_words = len(tokenizer.word_index) + 1

# Create input sequences and their corresponding labels
input_sequences = []
for line in corpus.split('\n'):
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

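# Left-pad with zeros so every sequence has the length of the longest one
max_sequence_length = max(len(x) for x in input_sequences)
input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_length, padding='pre')

# The last token of each sequence is the label; the tokens before it are the input
X, y = input_sequences[:, :-1], input_sequences[:, -1]
# One-hot encode the labels to match the categorical_crossentropy loss
y = tf.keras.utils.to_categorical(y, num_classes=total_words)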
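
To make the sequence construction above concrete, here is a dependency-free sketch of the same prefix n-gram idea applied to one toy line. The vocabulary `word_index` and the helper `pre_pad` are hypothetical stand-ins for the fitted `Tokenizer` and for `pad_sequences`; they only imitate the behaviour used in this notebook.

```python
# Toy stand-in for tokenizer.word_index (hypothetical values; index 0 is reserved for padding)
word_index = {"come": 1, "all": 2, "ye": 3, "maidens": 4}

def pre_pad(seq, maxlen):
    """Left-pad with zeros, imitating pad_sequences(..., padding='pre')."""
    return [0] * (maxlen - len(seq)) + seq

line = "come all ye maidens"
token_list = [word_index[w] for w in line.split()]

# Every prefix of length >= 2 becomes one training sequence
sequences = [token_list[: i + 1] for i in range(1, len(token_list))]
max_len = max(len(s) for s in sequences)
padded = [pre_pad(s, max_len) for s in sequences]

# The last token is the label; everything before it is the input
X_toy = [row[:-1] for row in padded]
y_toy = [row[-1] for row in padded]

for x, label in zip(X_toy, y_toy):
    print(x, "->", label)
# [0, 0, 1] -> 2
# [0, 1, 2] -> 3
# [1, 2, 3] -> 4
```

A line of n words therefore yields n-1 training examples, which is why a small lyrics file produces several thousand rows in `X`.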

**Model Architecture**

In [6]:

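# Build the model
model = Sequential()
# Map each word index to a 100-dimensional vector
model.add(Embedding(total_words, 100, input_length=max_sequence_length-1))
# Summarize the input sequence in a single 100-unit LSTM state
model.add(LSTM(100))
# Output a probability for every word in the vocabulary
model.add(Dense(total_words, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])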

model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 15, 100)           269000    
                                                                 
 lstm (LSTM)                 (None, 100)               80400     
                                                                 
 dense (Dense)               (None, 2690)              271690    
                                                                 
Total params: 621090 (2.37 MB)
Trainable params: 621090 (2.37 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
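
The parameter counts in the summary can be checked by hand. The sketch below reproduces them from the layer sizes; the variable names are introduced here for illustration, and the formulas are the standard ones for an embedding table, a single LSTM layer with biases, and a dense layer.

```python
vocab_size = 2690      # total_words, read off the Dense output shape in the summary
embedding_dim = 100
lstm_units = 100

# One embedding vector per vocabulary entry
embedding_params = vocab_size * embedding_dim
# 4 gates, each with input weights, recurrent weights and a bias vector
lstm_params = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)
# Weight matrix plus one bias per output word
dense_params = lstm_units * vocab_size + vocab_size

total = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total)
# 269000 80400 271690 621090
```

Most of the weights sit in the embedding and output layers, both of which scale with the vocabulary size rather than with the LSTM width.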


**Training the Model**

In [7]:

# Train the model
model.fit(X, y, epochs=100, verbose=2)

# Generate text by repeatedly predicting the next word and appending it to the seed
def generate_text(model, tokenizer, seed_text, next_words, max_sequence_len):
    for _ in range(next_words):
        # Encode the running text and left-pad (or truncate) it to the model's input length
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        # Greedy decoding: take the single most probable next-word index as a plain int
        predicted = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])
        # index_word is the reverse of word_index; index 0 (padding) maps to no word
        output_word = tokenizer.index_word.get(predicted, "")
        seed_text += " " + output_word
    return seed_text



Epoch 1/100
377/377 - 7s - loss: 6.7593 - accuracy: 0.0643 - 7s/epoch - 18ms/step
Epoch 2/100
377/377 - 5s - loss: 6.2979 - accuracy: 0.0707 - 5s/epoch - 15ms/step
Epoch 3/100
377/377 - 5s - loss: 6.1396 - accuracy: 0.0802 - 5s/epoch - 13ms/step
Epoch 4/100
377/377 - 6s - loss: 5.9644 - accuracy: 0.0894 - 6s/epoch - 15ms/step
Epoch 5/100
377/377 - 5s - loss: 5.7612 - accuracy: 0.1028 - 5s/epoch - 13ms/step
Epoch 6/100
377/377 - 6s - loss: 5.5666 - accuracy: 0.1121 - 6s/epoch - 15ms/step
Epoch 7/100
377/377 - 5s - loss: 5.3836 - accuracy: 0.1185 - 5s/epoch - 13ms/step
Epoch 8/100
377/377 - 5s - loss: 5.2101 - accuracy: 0.1264 - 5s/epoch - 15ms/step
Epoch 9/100
377/377 - 5s - loss: 5.0383 - accuracy: 0.1362 - 5s/epoch - 13ms/step
Epoch 10/100
377/377 - 5s - loss: 4.8683 - accuracy: 0.1501 - 5s/epoch - 15ms/step
Epoch 11/100
377/377 - 5s - loss: 4.7026 - accuracy: 0.1605 - 5s/epoch - 13ms/step
Epoch 12/100
377/377 - 5s - loss: 4.5425 - accuracy: 0.1766 - 5s/epoch - 14ms/step
Epoch 13/100
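
One way to read the loss values in this log is to convert them to perplexity and compare them with a uniform-guess baseline. The sketch below does that for the first and twelfth epochs shown above; it assumes the reported loss is the usual natural-log cross-entropy, and the variable names are introduced here for illustration.

```python
import math

vocab_size = 2690  # total_words, taken from the model summary

# Cross-entropy of a model that assigns equal probability to every word
uniform_loss = math.log(vocab_size)

# Losses copied from the training log above
logged_losses = {1: 6.7593, 12: 4.5425}

# Perplexity = exp(loss): roughly how many words the model is choosing between
perplexities = {epoch: math.exp(loss) for epoch, loss in logged_losses.items()}

for epoch, ppl in perplexities.items():
    print(f"epoch {epoch}: loss {logged_losses[epoch]:.4f}, perplexity {ppl:.0f}")
print(f"uniform baseline loss: {uniform_loss:.2f}")
```

These are training-set numbers, so a falling perplexity here may partly reflect memorization of the lyrics rather than better generalization.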


**Saving the model using Keras**

In [8]:
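# Save in the native Keras format
model.save('my_text_generation.keras')
# Also save in the legacy HDF5 format, which is the file loaded in the next section
model.save("text_generation_model.h5")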



**Testing Text Generation**

In [9]:
from tensorflow.keras.models import load_model
loaded_model = load_model("text_generation_model.h5")

# Example usage:
seed_text = "hello"
generated_text = generate_text(loaded_model, tokenizer, seed_text, next_words=50, max_sequence_len=max_sequence_length)
print(generated_text)

hello and lower and lower the reel rings the child i was said was my gas and the pure sad sad tore side hand face field good door loved smoke smoke smoke smoke moaning tomorrow thoughts open write smoke smoke smoke smoke smoke smoke smoke moaning smoke moaning smoke moaning smoke
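
The output above drifts into repeating "smoke" and "moaning". One likely contributor is that `generate_text` always picks the single most probable word (greedy decoding), which can lock into a loop. A common alternative is to sample from the predicted distribution with a temperature. The sketch below is a standard-library illustration of that idea, not part of the original notebook; `sample_with_temperature` and `toy_probs` are hypothetical names.

```python
import math
import random

def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Draw an index from `probs` after rescaling by `temperature`.

    temperature < 1 sharpens the distribution (closer to argmax);
    temperature > 1 flattens it (more varied, less coherent).
    """
    # Rescale in log space; the small epsilon avoids log(0)
    logits = [math.log(p + 1e-12) / temperature for p in probs]
    peak = max(logits)
    # Subtracting the peak keeps exp() numerically stable
    weights = [math.exp(l - peak) for l in logits]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]

# Hypothetical next-word distribution over a 4-word vocabulary
toy_probs = [0.05, 0.60, 0.30, 0.05]
rng = random.Random(0)
picks = [sample_with_temperature(toy_probs, temperature=0.8, rng=rng) for _ in range(10)]
print(picks)
```

In `generate_text`, the same function could be applied to the first row of `model.predict(token_list)` in place of `np.argmax`; whether that improves the lyrics for this model would need to be checked by running it.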
