<a href="https://colab.research.google.com/github/Nishigandha-Wankhade/One-to-Many-Text-Generation-Using-RNN-50-Epochs-/blob/main/RNN_one_to_many_text_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Project 4: One-to-Many Text Generation Using RNN (50 Epochs)

In [1]:
# Step 1: Import libraries and set constants
import tensorflow as tf  # for building and training the RNN
import numpy as np      # for handling arrays

In [2]:
# Set constants
SEQ_LENGTH = 100        # Number of characters in each training example
BATCH_SIZE = 64         # How many samples the model sees at once
BUFFER_SIZE = 10000     # Controls how well the training data is shuffled
EMBEDDING_DIM = 256     # Size of each character's embedding vector
RNN_UNITS = 1024        # Number of neurons in the LSTM layer
EPOCHS = 50             # How many times the model sees the full dataset

In [3]:
# Step 2: Download and prepare text data (Shakespeare text example)
path_to_file = tf.keras.utils.get_file('shakespeare.txt',
    'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

# Read the raw bytes and decode them; the context manager closes the file afterwards
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')
print(f'Length of text: {len(text)} characters')


Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
1115394/1115394 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Length of text: 1115394 characters


In [4]:
# Step 3: Create the vocabulary
vocab = sorted(set(text))
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])
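
To make the character-to-index mapping concrete, here is a minimal sketch of the same encode/decode round trip on a toy string. The names `toy_text`, `toy_vocab`, `toy_char2idx`, `toy_idx2char`, `encoded`, and `decoded` are illustrative and are not part of the notebook.

```python
import numpy as np

toy_text = "hello"                     # hypothetical stand-in for the Shakespeare text
toy_vocab = sorted(set(toy_text))      # unique characters in sorted order: e, h, l, o
toy_char2idx = {u: i for i, u in enumerate(toy_vocab)}
toy_idx2char = np.array(toy_vocab)

# Encode every character as its integer id, then map the ids back to characters
encoded = np.array([toy_char2idx[c] for c in toy_text])
decoded = ''.join(toy_idx2char[encoded])

print(encoded)   # [1 0 2 2 3]
print(decoded)   # hello
```

The round trip is lossless because every character in the text is guaranteed to be in the vocabulary built from that same text.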

In [5]:
# Step 4: Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

sequences = char_dataset.batch(SEQ_LENGTH + 1, drop_remainder=True)


In [6]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

In [7]:
dataset = sequences.map(split_input_target)
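
The effect of the split is easiest to see on a short string: the target is the input shifted one position to the left, so at every position the model is asked to predict the next character. The sketch below uses a hypothetical helper `shift_by_one` with the same slicing as `split_input_target`, applied to a plain Python string instead of a tensor.

```python
def shift_by_one(chunk):
    # Same slicing as split_input_target: drop the last item for the input,
    # drop the first item for the target
    return chunk[:-1], chunk[1:]

chunk = "First"   # hypothetical chunk of 5 characters (SEQ_LENGTH + 1 with SEQ_LENGTH = 4)
input_text, target_text = shift_by_one(chunk)

# Each input character is paired with the character that follows it
for inp, tgt in zip(input_text, target_text):
    print(f"input {inp!r} -> target {tgt!r}")
```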

In [8]:
# Step 5: Batch and shuffle
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
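
As a sanity check, the number of batches per epoch can be worked out by hand from the constants above and the text length printed earlier. This is a small arithmetic sketch; `TEXT_LENGTH`, `num_sequences`, and `steps_per_epoch` are names introduced here for illustration.

```python
TEXT_LENGTH = 1115394   # characters, as printed after the download step
SEQ_LENGTH = 100
BATCH_SIZE = 64

# Each sequence holds SEQ_LENGTH + 1 characters; drop_remainder=True discards the tail
num_sequences = TEXT_LENGTH // (SEQ_LENGTH + 1)
# Batching with drop_remainder=True discards the final incomplete batch
steps_per_epoch = num_sequences // BATCH_SIZE

print(num_sequences)     # 11043
print(steps_per_epoch)   # 172
```

The result agrees with the `172/172` step counter in the training output.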


In [9]:
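# Step 6: Build the model (Embedding -> LSTM -> Dense logits over the vocabulary)
# Note: batch_size is accepted to match the call below but is not used by the layers.
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):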
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
        tf.keras.layers.LSTM(rnn_units, return_sequences=True, stateful=False, recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

model = build_model(len(vocab), EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)


In [10]:
# Step 7: Define loss and compile the model
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)
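
To illustrate what this loss measures, the sketch below computes sparse categorical cross-entropy from logits for a single prediction in plain Python. It is a simplified stand-in, not the Keras implementation; `sparse_ce_from_logits` and the example logits are hypothetical.

```python
import math

def sparse_ce_from_logits(label, logits):
    """Cross-entropy for one integer label given unnormalized logits."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]  # equals -log(softmax(logits)[label])

# Equal logits over 4 classes: the model is guessing uniformly, loss is ln(4)
uniform = sparse_ce_from_logits(0, [0.0, 0.0, 0.0, 0.0])
# The correct class (index 2) has by far the largest logit: loss is close to 0
confident = sparse_ce_from_logits(2, [0.0, 0.0, 5.0, 0.0])

print(uniform)
print(confident)
```

A model that guesses uniformly over a vocabulary of V characters therefore starts at a loss of ln(V), which is a useful baseline when reading the training log.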


In [11]:
# Step 8: Train the model for 50 epochs
history = model.fit(dataset, epochs=EPOCHS)

Epoch 1/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 62ms/step - loss: 2.8999
Epoch 2/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 64ms/step - loss: 1.8741
Epoch 3/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 65ms/step - loss: 1.6095
Epoch 4/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 66ms/step - loss: 1.4713
Epoch 5/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 67ms/step - loss: 1.3887
Epoch 6/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 68ms/step - loss: 1.3310
Epoch 7/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 69ms/step - loss: 1.2845
Epoch 8/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 70ms/step - loss: 1.2445
Epoch 9/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 71ms/step - loss: 1.2063
Epoch 10/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15

In [12]:
# Step 9: Generate text from trained model
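def generate_text(model, start_string):
    num_generate = 50  # Number of characters to generate after the seed

    # Encode the seed string as a list of character ids
    input_ids = [char2idx[s] for s in start_string]

    text_generated = []

    for _ in range(num_generate):
        # The LSTM is stateless, so feed the whole sequence generated so far.
        # Feeding only the last character would condition each prediction
        # on a single character of context.
        input_eval = tf.expand_dims(input_ids, 0)   # shape: (1, current_length)
        predictions = model(input_eval)             # shape: (1, current_length, vocab_size)
        predictions = tf.squeeze(predictions, 0)    # drop the batch dimension

        # Sample the next character id from the logits at the last time step
        predicted_id = int(tf.random.categorical(predictions[-1:], num_samples=1)[-1, 0].numpy())

        input_ids.append(predicted_id)
        text_generated.append(idx2char[predicted_id])

    return start_string + ''.join(text_generated)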
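
The sampling step draws the next character at random in proportion to the softmax of the logits, rather than always picking the most likely one. The sketch below reproduces that idea in plain Python. The `temperature` parameter is an assumption added for illustration (the notebook samples at the equivalent of temperature 1.0), and `sample_from_logits` is a hypothetical helper, not a TensorFlow function.

```python
import math
import random

def sample_from_logits(logits, temperature=1.0, rng=random):
    """Draw one index with probability softmax(logits / temperature)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                                # stabilize the exponentials
    weights = [math.exp(z - m) for z in scaled]    # unnormalized probabilities
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
logits = [2.0, 1.0, 0.1]        # hypothetical logits for a 3-character vocabulary

# At temperature 1.0 every index can be drawn, the largest logit most often
draws = [sample_from_logits(logits, temperature=1.0, rng=rng) for _ in range(1000)]
print([draws.count(i) for i in range(3)])

# A very low temperature makes sampling effectively greedy (always index 0 here)
cold = [sample_from_logits(logits, temperature=0.01, rng=rng) for _ in range(100)]
print(set(cold))
```

Lower temperatures trade variety for more conservative text; higher temperatures do the opposite.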

In [17]:
print(generate_text(model, start_string="To be, or not to be"))

print(generate_text(model, start_string="In the dark castle"))

print(generate_text(model, start_string="The wizard said"))

To be, or not to be t,

HOrp s myour thagor he s ndowouro ckngerid th
In the dark castle. the th tl, bourere hinghuly ch, cheengosth there
The wizard said.

TRis he INI toingrthean,
Haie sthaucethais itan


# **Interpretation of Results/Output**
Over the epochs shown in the training log, the loss falls steadily, from 2.90 in epoch 1 to 1.21 in epoch 9, as the model learns patterns in the character sequences. Given a seed (starting string), the trained model extends it by repeatedly sampling the next character from its predicted distribution.

This approach shows how an RNN can generate a long character sequence from a single seed input. Early in training the predictions are close to random; with more training, a character-level model typically picks up spelling, punctuation, and some of the stylistic patterns of the training text (such as Shakespeare's speaker names and line breaks). The samples above are still far from fluent: they mix short real words such as "the" and "he" with many non-words.
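
One way to read the logged loss values is to convert them to perplexity, the exponential of the mean cross-entropy. The sketch below assumes the reported loss is the mean per-character cross-entropy in nats; `logged_losses` and `perplexities` are names introduced here, and the two loss values are copied from the training output.

```python
import math

# epoch -> loss, copied from the training output
logged_losses = {1: 2.8999, 9: 1.2063}

# Perplexity = exp(mean cross-entropy); roughly the effective number of
# equally likely next characters the model is choosing between
perplexities = {epoch: math.exp(loss_value) for epoch, loss_value in logged_losses.items()}

for epoch, ppl in perplexities.items():
    print(f"epoch {epoch}: loss {logged_losses[epoch]:.4f} -> perplexity {ppl:.2f}")
```

Under that assumption, the model goes from hesitating among roughly 18 characters per step in epoch 1 to between 3 and 4 by epoch 9.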

# **Common signs of improvement:**

- Words begin to form correctly
- Punctuation becomes appropriate
- Short phrases start to make sense

This example illustrates character-level text generation with an RNN. The same idea, predicting the next token from the preceding ones, underlies chatbots, writing assistants, and code autocompletion tools.

