# Generating Shakespearean Text Using a Character RNN

**Project from:**
Géron, Aurélien. "Chapter 16: Natural Language Processing with RNNs and Attention." *Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow*, second edition, O'Reilly Media Inc., 2019, pp. 525-534.

A common approach to natural language processing (NLP) is to use recurrent neural networks (RNNs). To generate Shakespearean text, we will use a character RNN, which is trained to predict the next character in a sentence.

We will first build a **stateless RNN**, which learns on random portions of text at each iteration, without any information on the rest of the text. 

Then we will build a **stateful RNN**, which preserves the hidden state between training iterations and continues reading where it left off. This allows it to learn longer patterns.

## Import Libraries

In [29]:
import tensorflow as tf
from tensorflow import keras

In [30]:
import pandas as pd
import numpy as np

### Print Versions

In [31]:
tf.__version__

'2.4.1'

In [32]:
keras.__version__

'2.4.0'

### Creating the Training Dataset

Download all of Shakespeare's work, using Keras's `get_file()` function:

In [33]:
shakespeare_url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt" #shortcut URL
filepath = keras.utils.get_file("shakespeare.txt", shakespeare_url)
with open(filepath) as f:
    shakespeare_text = f.read()

In [34]:
print(shakespeare_text[:148])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?



Encode every character as an integer. We will use Keras's `Tokenizer` class.
We first fit the tokenizer to the text.

In [35]:
tokenizer = keras.preprocessing.text.Tokenizer(char_level=True)
tokenizer.fit_on_texts(shakespeare_text)

In [36]:
tokenizer.texts_to_sequences(["First"]);
tokenizer.sequences_to_texts([[20,6,9,8,3]])

['f i r s t']

In [37]:
max_id = len(tokenizer.word_index) # Number of distinct characters

In [38]:
dataset_size = tokenizer.document_count # total number of characters

Encode the full text so that each character is represented by its ID (we subtract 1 to get IDs from 0 to 38, rather than from 1 to 39):

In [39]:
[encoded] = np.array(tokenizer.texts_to_sequences([shakespeare_text])) - 1
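The character-to-ID mapping above can be pictured with a small plain-Python sketch. It only imitates the general idea (lowercase the text, give the most frequent character ID 1, then subtract 1 when encoding); the helper names `fit_char_vocab`, `encode`, and `decode` are hypothetical, and the exact tie-breaking order of the real tokenizer may differ.

```python
from collections import Counter

def fit_char_vocab(text):
    """Map each lowercased character to an ID, most frequent first, starting at 1."""
    counts = Counter(text.lower())
    # most_common() sorts by descending count; ties keep first-seen order
    return {ch: i + 1 for i, (ch, _) in enumerate(counts.most_common())}

def encode(text, char_to_id):
    # Subtract 1 so IDs run from 0 to len(vocab) - 1, as in the cell above
    return [char_to_id[ch] - 1 for ch in text.lower()]

def decode(ids, char_to_id):
    # Invert the mapping, applying the same shift of 1
    id_to_char = {i - 1: ch for ch, i in char_to_id.items()}
    return "".join(id_to_char[i] for i in ids)

sample = "First Citizen:\nBefore we proceed any further, hear me speak."
vocab = fit_char_vocab(sample)
ids = encode("hear me", vocab)
print(ids)
print(decode(ids, vocab))  # round trip gives back the lowercased text
```

Because the text is lowercased before encoding, decoding recovers the lowercased version only, which matches the `['f i r s t']` output shown earlier.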

## How to Split a Sequential Dataset

In [40]:
train_size = dataset_size * 90//100
dataset = tf.data.Dataset.from_tensor_slices(encoded[:train_size])

## Chopping the Sequential Dataset into Multiple Windows

We will use the dataset's `window()` method to convert the very long sequence of characters into many smaller windows of text. Currently the training set consists of a single sequence of over a million characters.

In [41]:
n_steps = 100
window_length = n_steps + 1 #target = input shifted 1 character ahead
dataset = dataset.repeat().window(window_length, shift=1, drop_remainder=True)

The `window()` method creates a dataset that contains windows, each of which is also represented as a dataset. It's a *nested dataset*.

We cannot use a nested dataset directly for training. Our model expects tensors as input, not datasets.

This is where the `flat_map()` method comes in. It converts a nested dataset into a *flat* dataset (one that does not contain datasets).

In [42]:
dataset = dataset.flat_map(lambda window: window.batch(window_length))

Since every window has the same length, calling `batch(window_length)` on each window yields a single tensor per window.
Next we shuffle the windows, batch them, and separate the inputs (the first 100 characters) from the targets (the last 100 characters, i.e. the same window shifted one character ahead).

In [43]:
batch_size = 32
dataset = dataset.shuffle(10000).batch(batch_size)
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
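The windowing and input/target split above can be sketched in plain Python on a tiny sequence. This is only an illustration of the idea, not the `tf.data` implementation: `make_windows` is a hypothetical helper, and `n_steps = 4` is chosen just to keep the printout short (the notebook uses 100).

```python
def make_windows(sequence, window_length, shift=1):
    """Return every full window of `window_length` items, moving `shift` items each time."""
    last_start = len(sequence) - window_length
    # Incomplete windows at the end are dropped, like drop_remainder=True
    return [sequence[i:i + window_length] for i in range(0, last_start + 1, shift)]

n_steps = 4                      # tiny value for illustration
window_length = n_steps + 1      # one extra item so the target can be shifted by 1
ids = list(range(8))             # stand-in for the encoded text

windows = make_windows(ids, window_length)
pairs = [(w[:-1], w[1:]) for w in windows]   # inputs, targets shifted one step ahead

for inputs, targets in pairs:
    print(inputs, "->", targets)
```

Each target sequence has the same length as its input sequence, so the model is trained to predict the next character at every time step, not only at the last one.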

Categorical input features should generally be encoded, usually as one-hot vectors or as embeddings.

In [44]:
dataset = dataset.map(
    lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
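As a rough picture of what one-hot encoding does to the inputs, here is a minimal plain-Python sketch. The `one_hot` helper below is hypothetical and works on a flat list of IDs, whereas `tf.one_hot` handles whole batches of sequences.

```python
def one_hot(ids, depth):
    """Turn a list of integer IDs into a list of one-hot rows of length `depth`."""
    return [[1.0 if j == i else 0.0 for j in range(depth)] for i in ids]

rows = one_hot([2, 0, 3], depth=4)
for row in rows:
    print(row)
```

Only the inputs are one-hot encoded. The targets stay as integer IDs, which is what the `sparse_categorical_crossentropy` loss used later expects.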

Finally, we need to add prefetching.

In [45]:
dataset = dataset.prefetch(1)

## Building and Training the Char-RNN Model

**Note:** The GRU class will only use the GPU when using the default values for the following arguments: `activation, recurrent_activation, recurrent_dropout, unroll, use_bias, reset_after`.

*I'll use `epochs=5` only, so this takes less time to run.*

In [47]:
model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id],
                      dropout=0.2),  #recurrent_dropout=0.2),
    keras.layers.GRU(128, return_sequences=True,
                     dropout=0.2),  #recurrent_dropout=0.2),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id,
                                                   activation="softmax"))
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
history = model.fit(dataset, steps_per_epoch=train_size // batch_size,
                    epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


## Using the Char-RNN Model

We now have a model that can predict the next character in a text. To feed it text, we need to preprocess it as we did earlier.

We will create a function for this.

In [48]:
def preprocess(texts):
    X = np.array(tokenizer.texts_to_sequences(texts)) - 1
    return tf.one_hot(X, max_id)

Now use the model to predict the next letter in some text.

In [87]:
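X_new = preprocess(["How are yo"])
# predict_classes() is deprecated and removed in later TensorFlow releases;
# taking the argmax over the class axis gives the same class IDs
Y_pred = np.argmax(model.predict(X_new), axis=-1)
tokenizer.sequences_to_texts(Y_pred + 1)[0][-1]  # 1st sentence, last character 'u'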

'u'

## Generating Fake Shakespearean Text

To create more diverse and interesting text, we can pick the next character randomly, with a probability equal to the estimated probability, using TensorFlow's `tf.random.categorical()` function.

In [62]:
def next_char(text, temperature=1):
    X_new = preprocess([text])
    y_proba = model.predict(X_new)[0, -1:, :]
    rescaled_logits = tf.math.log(y_proba) / temperature
    # sample one class ID, then add 1 to get back to the tokenizer's 1-based IDs
    char_id = tf.random.categorical(rescaled_logits, num_samples=1) + 1
    return tokenizer.sequences_to_texts(char_id.numpy())[0]

The following function repeatedly calls `next_char()` to get the next character and appends it to the text:

In [63]:
def complete_text(text, n_chars=50, temperature=1):
    for _ in range(n_chars):
        text += next_char(text, temperature)
    return text

In [64]:
tf.random.set_seed(42)

next_char("How are yo", temperature=1)

'u'

We will now generate some text with different temperatures. The *temperature* is the value by which we divide the logits (class log probabilities) before sampling.
A temperature close to 0 favors high-probability characters.
A very high temperature gives all characters a nearly equal probability.

In [84]:
print(complete_text("t", temperature=0.2));
print(complete_text("w", temperature=1));
print(complete_text("w", temperature=2))

ther be a child,
and she will be the father to the 
wath thy partiag;
and tell the trurber for you his 
w's necy
housep,
upkad w.ti,
slwiir kboody?hgtorivo
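The effect of temperature seen in the samples above can be reproduced with a small plain-Python sketch. It assumes a made-up three-character distribution (`probas`) and uses hypothetical helpers `apply_temperature` and `sample_id`; the softmax step stands in for the normalization that `tf.random.categorical()` applies to the logits it receives.

```python
import math
import random

def apply_temperature(probas, temperature):
    """Divide log-probabilities by `temperature`, then renormalize with a softmax."""
    scaled = [math.log(p) / temperature for p in probas]
    top = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_id(probas, temperature, rng):
    """Draw one index according to the temperature-adjusted distribution."""
    weights = apply_temperature(probas, temperature)
    return rng.choices(range(len(probas)), weights=weights, k=1)[0]

probas = [0.6, 0.3, 0.1]                     # made-up model output for three characters
for t in (0.2, 1.0, 5.0):
    print(t, [round(p, 3) for p in apply_temperature(probas, t)])

rng = random.Random(42)
print(sample_id(probas, temperature=0.2, rng=rng))
```

At temperature 1 the distribution is unchanged, at 0.2 almost all of the probability mass moves to the most likely character, and at 5 the three probabilities move much closer together.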


## Stateful RNN

A stateful RNN preserves the final state after processing one training batch and uses it as the initial state for the next training batch. This allows the model to learn long-term patterns, even though it still backpropagates only through short sequences.

The "batches" below contain a single window. 

In [95]:
dataset = tf.data.Dataset.from_tensor_slices(encoded[:train_size])
dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(window_length))
dataset = dataset.batch(1)
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:,1:]))
dataset = dataset.map(
    lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
dataset = dataset.prefetch(1)

In [99]:
batch_size = 32
encoded_parts = np.array_split(encoded[:train_size], batch_size)
datasets = []
for encoded_part in encoded_parts:
    dataset = tf.data.Dataset.from_tensor_slices(encoded_part)
    dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
    dataset = dataset.flat_map(lambda window: window.batch(window_length))
    datasets.append(dataset)
dataset = tf.data.Dataset.zip(tuple(datasets)).map(lambda *windows: tf.stack(windows))
dataset = dataset.repeat().map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(
    lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
dataset = dataset.prefetch(1)
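The batched dataset built above relies on a specific layout: the text is split into `batch_size` parts, and batch *i* holds window *i* of every part, so each row of a batch continues the same row of the previous batch. A plain-Python sketch of that layout, using tiny made-up sizes and a hypothetical `consecutive_windows` helper (and assuming the text divides evenly, which `np.array_split` does not require):

```python
def consecutive_windows(sequence, window_length, shift):
    """Full windows only; each one starts `shift` items after the previous one."""
    last_start = len(sequence) - window_length
    return [sequence[i:i + window_length] for i in range(0, last_start + 1, shift)]

n_steps = 3                                  # tiny values for illustration
window_length = n_steps + 1
batch_size = 2
ids = list(range(20))                        # stand-in for the encoded text

# Split the text into `batch_size` equal parts (20 divides evenly here)
part_len = len(ids) // batch_size
parts = [ids[k * part_len:(k + 1) * part_len] for k in range(batch_size)]

# Batch i holds window i of every part, so row k follows row k of batch i - 1
per_part = [consecutive_windows(p, window_length, shift=n_steps) for p in parts]
batches = [list(rows) for rows in zip(*per_part)]

for batch in batches:
    print([(w[:-1], w[1:]) for w in batch])  # (inputs, targets) for each row
```

With `shift=n_steps`, the inputs of one window begin exactly where the targets of the previous window ended, which is what lets the preserved state carry over meaningfully from one batch to the next.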

We need to set `stateful=True` when creating each recurrent layer.
A stateful RNN needs to know the batch size, so we must set the `batch_input_shape` argument in the first layer.
The second dimension is left unspecified, since inputs could have any length:

In [100]:
model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                    dropout=0.2, batch_input_shape=[batch_size, None, max_id]),
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                    dropout=0.2),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id,
                                                   activation="softmax"))
])

At the end of each epoch, we need to reset the states before we go back to the beginning of the text. For this we will use a small callback.

In [101]:
class ResetStatesCallback(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs):
        self.model.reset_states()

Compile and fit the model.

In [102]:
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
steps_per_epoch = train_size // batch_size // n_steps
history = model.fit(dataset, steps_per_epoch=steps_per_epoch, epochs=50,
                    callbacks=[ResetStatesCallback()])

Epoch 1/50
...
Epoch 50/50


To use the model with different batch sizes, we need to create a stateless copy. We can get rid of dropout since it is only used during training:

In [79]:
stateless_model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id]),
    keras.layers.GRU(128, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id,
                                                    activation="softmax"))
])

To set the weights, we first need to build the model (so the weights get created):

In [81]:
stateless_model.build(tf.TensorShape([None, None, max_id]))

In [82]:
stateless_model.set_weights(model.get_weights())
model = stateless_model

In [83]:
tf.random.set_seed(42)

print(complete_text("t"))

ting's lest,
thy censer but a may day doth brued an
