# Assignment - II  (Keras Version)

## Text Generation With RNNs

1. Preprocess the text data: The text data needs to be tokenized, possibly with additional steps like lowercasing and punctuation removal. You'll also need to convert the text data into sequences that your RNN can learn from. 


2. Implement an RNN: Using your chosen deep learning framework, implement an RNN, LSTM, or GRU for this task. Decide on aspects such as the number of layers, hidden units, etc.


3. Train your model: Train the model using your processed data. Make sure to implement a mechanism to save the weights of the model periodically or when it achieves the best performance on a validation set.


4. Generate new text: Using your trained model, generate new text that mimics the style of the training corpus.

## 1. Preprocess the text data

Loading the reference file

In [1]:
import tensorflow as tf
import numpy as np
import random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Activation
from tensorflow.keras.optimizers import RMSprop

filepath = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

# read the file as bytes, decode it as UTF-8, and lowercase it to shrink the vocabulary
with open(filepath, 'rb') as f:
    text = f.read().decode(encoding='utf-8').lower()

# train on a 500,000-character slice of the corpus
text = text[300000:800000]

# get all the unique characters in the text
characters = sorted(set(text))

# create a dictionary that maps characters to their index
char_to_index = dict((c, i) for i, c in enumerate(characters))

# Decoding dictionary that maps index to characters
index_to_char = dict((i, c) for i, c in enumerate(characters))

SEQ_LENGTH = 40  # how long of a preceding sequence to collect for the RNN
STEP_SIZE = 3  # how many characters to skip before sampling the next sequence

sentences = []
next_characters = []

# loop through the text and create sequences
for i in range(0, len(text) - SEQ_LENGTH, STEP_SIZE):
    sentences.append(text[i: i + SEQ_LENGTH])
    next_characters.append(text[i + SEQ_LENGTH])

# create a numpy array of zeros to store the data
X = np.zeros((len(sentences), SEQ_LENGTH, len(characters)), dtype=bool)
y = np.zeros((len(sentences), len(characters)), dtype=bool)

# loop through the sentences and characters and convert them to one-hot encoding
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_to_index[char]] = 1
    y[i, char_to_index[next_characters[i]]] = 1
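
To make the windowing loop above easier to follow, here is a minimal sketch that runs the same logic on a toy string. The names `toy_text`, `toy_seq_length`, and `toy_step` are hypothetical stand-ins for the corpus, `SEQ_LENGTH`, and `STEP_SIZE`; they are not part of the assignment code.

```python
import numpy as np

toy_text = "hello world"          # hypothetical stand-in for the corpus
toy_seq_length, toy_step = 4, 3   # much smaller than SEQ_LENGTH and STEP_SIZE

toy_chars = sorted(set(toy_text))
toy_char_to_index = {c: i for i, c in enumerate(toy_chars)}

# slide a window over the text; the character right after each window is its target
windows, targets = [], []
for i in range(0, len(toy_text) - toy_seq_length, toy_step):
    windows.append(toy_text[i: i + toy_seq_length])
    targets.append(toy_text[i + toy_seq_length])

# one-hot encode the windows: (num_windows, window_length, vocabulary_size)
X_toy = np.zeros((len(windows), toy_seq_length, len(toy_chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, char in enumerate(window):
        X_toy[i, t, toy_char_to_index[char]] = True

print(windows)      # ['hell', 'lo w', 'worl']
print(targets)      # ['o', 'o', 'd']
print(X_toy.shape)  # (3, 4, 8)
```

Each row of `X_toy[i]` contains exactly one `True`, marking which character occupies that position in the window.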






## 2. RNN Model

In [4]:
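# single LSTM layer followed by a softmax over the character vocabulary
model = Sequential()
model.add(tf.keras.Input(shape=(SEQ_LENGTH, len(characters))))
model.add(LSTM(128))
model.add(Dense(len(characters)))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer=RMSprop(learning_rate=0.01))

# train for 4 epochs on the one-hot encoded sequences
model.fit(X, y, batch_size=256, epochs=4)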




Epoch 1/4




[1m651/651[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m65s[0m 97ms/step - loss: 2.5059
Epoch 2/4
[1m651/651[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m86s[0m 103ms/step - loss: 1.7807
Epoch 3/4
[1m651/651[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m77s[0m 96ms/step - loss: 1.6087
Epoch 4/4
[1m651/651[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m82s[0m 96ms/step - loss: 1.5262


<keras.src.callbacks.history.History at 0x7394749dba00>
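
Step 3 of the assignment asks for weights to be saved periodically or when validation performance is best, but the `model.fit` call above trains without a validation split or a callback. One possible way to add this is Keras's `ModelCheckpoint` callback. The sketch below is illustrative only: `demo_model`, `X_demo`, `y_demo`, the toy sizes, and the file name `best.weights.h5` are all hypothetical, and the data is random so that the block runs on its own in a few seconds.

```python
import os
import tempfile

import numpy as np
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import Dense, Input, LSTM
from tensorflow.keras.models import Sequential

# Tiny random stand-in data laid out like X and y:
# (samples, sequence_length, vocab_size) and (samples, vocab_size).
rng = np.random.default_rng(0)
seq_length, vocab_size = 5, 4  # hypothetical toy sizes
X_demo = np.eye(vocab_size)[rng.integers(0, vocab_size, (64, seq_length))]
y_demo = np.eye(vocab_size)[rng.integers(0, vocab_size, 64)]

demo_model = Sequential([
    Input(shape=(seq_length, vocab_size)),
    LSTM(8),
    Dense(vocab_size, activation='softmax'),
])
demo_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Hypothetical checkpoint location; the '.weights.h5' suffix is what
# Keras 3 expects when save_weights_only=True.
checkpoint_path = os.path.join(tempfile.mkdtemp(), 'best.weights.h5')
checkpoint = ModelCheckpoint(
    checkpoint_path,
    monitor='val_loss',      # watch the loss on the held-out split
    save_best_only=True,     # overwrite only when val_loss improves
    save_weights_only=True,
)

history = demo_model.fit(
    X_demo, y_demo,
    batch_size=16, epochs=2,
    validation_split=0.2,    # hold out the last 20% of the samples
    callbacks=[checkpoint],
    verbose=0,
)

# Restore the best weights seen during training.
demo_model.load_weights(checkpoint_path)
```

Applied to the real model, the same `validation_split` and `callbacks` arguments would be passed to `model.fit(X, y, ...)`. Note that `validation_split` takes the last fraction of the arrays before shuffling, so the held-out sequences would come from the end of the text slice.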

## 3. RNN Model Output

In [11]:
import numpy as np
import random

# Uses text, SEQ_LENGTH, characters, char_to_index, index_to_char, and model from the cells above

# Sample a character index from the model's predicted distribution.
# Lower temperatures favor the most likely characters; higher ones add variety.
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-10) / temperature  # epsilon avoids log(0)
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)  # renormalize to a probability vector
    probas = np.random.multinomial(1, preds, 1)  # draw one sample
    return np.argmax(probas)

# Generate `length` characters, seeded with a random SEQ_LENGTH-character slice of the text
def generate_text(length, temperature):
    start_index = random.randint(0, len(text) - SEQ_LENGTH - 1)
    generated = ''
    sentence = text[start_index: start_index + SEQ_LENGTH]
    generated += sentence
    for i in range(length):
        # one-hot encode the current window
        x_pred = np.zeros((1, SEQ_LENGTH, len(characters)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_to_index[char]] = 1
        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, temperature)
        next_char = index_to_char[next_index]  # map the sampled index back to a character
        generated += next_char
        sentence = sentence[1:] + next_char  # slide the window forward by one character
    return generated

# Example usage
print(generate_text(300, 0.2))

 in ashes, some coal-black,
for the depost the county the county and the soul
would the some to the counter to the change
that he stay the strength with the county
the strength to the chosent to the soul word the soul
the county the strength of the brother's fies
the still to the still to the soul to the stines,
and truth the some to me n
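
The repetition in this low-temperature sample ("the county the county") follows from how `sample` rescales the predicted distribution. The sketch below applies the same log, divide, and renormalize steps to a made-up three-character distribution, without drawing a sample. The helper `apply_temperature` and the values in `probs` are hypothetical and only serve to illustrate the effect.

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability vector the way sample() does, without sampling."""
    logits = np.log(np.asarray(probs, dtype='float64') + 1e-10) / temperature
    exp_logits = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return exp_logits / np.sum(exp_logits)

probs = np.array([0.5, 0.3, 0.2])  # hypothetical model output over three characters
for temperature in (0.2, 0.6, 1.0):
    print(temperature, np.round(apply_temperature(probs, temperature), 3))
```

At a temperature of 1.0 the distribution is essentially unchanged. As the temperature drops, more of the probability mass moves to the most likely character, so generation becomes more conservative and more prone to loops.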


In [12]:
print(generate_text(300, 0.6))


i am too sore enpierced with his shaft
to mageter and mine to the sending:
o, come the in the strengt shall brought.

lady:
i speet with his drow to love to more to go's man.

benvolio:
gaunt, truite the compart follow me to this romeo his mosted,
and drunknt to me shall how be the soul,
and truth hath of the boy? for him, to a very
and 
