# Generating Shakespeare

You will create a small recurrent neural network (RNN) that learns to write Shakespeare-style text one character at a time. Unfortunately, models like this take a long time (hours) to train even on a decent GPU, so your results in class today won't be optimal. They may still impress you.

First, load the dataset from the internet.

In [None]:
import numpy as np
from sklearn.preprocessing import LabelEncoder
import pandas as pd
import matplotlib.pyplot as plt
import requests
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, SimpleRNN, LSTM, GRU, Dense  # same Keras package as the imports above

In [None]:
import requests

# Download the file
url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
response = requests.get(url)
text = response.text

# Print some info
print("Downloaded Shakespeare text. Length:", len(text), "characters")
print(text[:100])


Downloaded Shakespeare text. Length: 1115394 characters
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You


You need to transform this into an array of integers instead of characters. Use scikit-learn's `LabelEncoder`. You should find 65 distinct characters. To be sure, print out all the encoded integers and the characters they correspond to. *If you want*, you can lowercase all the letters first; this shrinks the alphabet and may speed up training somewhat.

In [None]:
encoder = LabelEncoder()
y = encoder.fit_transform(list(text))
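A sketch for the printing step: `LabelEncoder` stores the distinct labels in sorted order in `encoder.classes_` and encodes each character as its index there, so `for i, ch in enumerate(encoder.classes_): print(i, repr(ch))` prints the whole table. The block below reproduces the same encoding with `np.unique` so it runs without scikit-learn; `sample` is a hypothetical stand-in for the full `text`.

```python
import numpy as np

# Hypothetical stand-in for the downloaded `text`; swap in the real corpus.
sample = "First Citizen:\nBefore we proceed any further, hear me speak.\n"

# np.unique returns the sorted distinct characters (what LabelEncoder keeps in
# `classes_`) and, with return_inverse=True, the integer code of every character
# (what `fit_transform` returns).
classes, codes = np.unique(list(sample), return_inverse=True)

print("distinct characters:", len(classes))
for idx, ch in enumerate(classes):
    # repr() makes whitespace such as '\n' and ' ' visible
    print(idx, repr(ch))

# Decoding is just indexing into the sorted class array.
decoded = "".join(classes[codes])
print(decoded == sample)
```

`repr` matters here because the alphabet includes the newline and the space, which are invisible when printed directly.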

Now, as you did last class, convert this single array into (X, y) pairs, where each row of X is a window of consecutive characters and each y is the character that follows it. For example:

'to be or not to b', 'e'
'what light throug', 'h'

You can choose how long the window of X characters should be (64, 128, or 256; anything in this range is reasonable). Shorter windows train faster; longer windows give the model more context to work with, which usually makes it smarter.

In [None]:
window_size = 64

x = np.array([y[i - window_size:i] for i in range(window_size, len(y))])
y = y[window_size:]
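The list comprehension above copies every window, which for the full text (about 1.1 million windows of 64 codes, stored as 64-bit ints on a typical setup) comes to roughly 570 MB. A sketch of a lighter alternative, assuming NumPy 1.20 or newer: `sliding_window_view` builds the same X as a read-only view without copying, although later steps such as the train/test split may still make copies. `make_windows` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(encoded, window_size):
    """Return (X, y): each row of X holds `window_size` consecutive codes,
    and y holds the code that follows that row."""
    encoded = np.asarray(encoded)
    # Window over encoded[:-1] so every window has a following character.
    X = sliding_window_view(encoded[:-1], window_size)
    y = encoded[window_size:]
    return X, y

codes = np.arange(10)            # toy "encoded text": 0, 1, ..., 9
X, y = make_windows(codes, 4)
print(X.shape)                   # 6 windows of length 4
print(X[0], y[0])                # window [0 1 2 3] is followed by 4
```

With the notebook's variables this would be `x, y = make_windows(y, window_size)`.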

Create a train/test split by using the first 80% or so of the data for training and the rest for testing.

In [None]:
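# shuffle=False keeps the split chronological: first 80% for training, last 20% for testing
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=False)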
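Why keep the split in order? Neighboring windows overlap in all but one character, so a random shuffle (the default in `train_test_split`) puts near-copies of most test windows into the training set and can make test scores look better than they are. Passing `shuffle=False` avoids that; the hypothetical helper below makes the same idea explicit with plain slicing.

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.2):
    """Split windows in order: the first part trains, the last part tests.

    The number of test rows is rounded up.
    """
    n = len(X)
    n_test = int(np.ceil(test_fraction * n))
    split = n - n_test
    return X[:split], X[split:], y[:split], y[split:]

X_demo = np.arange(10)
X_tr, X_te, y_tr, y_te = chronological_split(X_demo, X_demo * 10)
print(len(X_tr), len(X_te))   # 8 2
```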

Keras RNN layers expect a 3D input tensor of shape (samples, timesteps, features), so you will probably need to reshape your data.

```python
# Reshape the input data for LSTM (samples, timesteps, features)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
```

For example, if `X_train.shape` is (1000, 100, 1), you have 1000 phrases, each 100 characters long. The trailing 1 is the feature dimension (one character code per timestep) and makes the array 3D.
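A tiny example of what the reshape does, using a made-up 2×3 array in place of real windows: the trailing axis of size 1 is the feature dimension, and `X[..., np.newaxis]` is an equivalent way to add it.

```python
import numpy as np

X = np.arange(6).reshape(2, 3)                  # 2 windows of 3 character codes each
X3d = X.reshape(X.shape[0], X.shape[1], 1)      # (samples, timesteps, features)
print(X.shape, "->", X3d.shape)                 # (2, 3) -> (2, 3, 1)

# Same result by appending a new axis instead of spelling out the shape.
print(np.array_equal(X3d, X[..., np.newaxis]))  # True
```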

In [None]:
# your code
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)

Define your RNN. Use a single recurrent layer; you can choose SimpleRNN, LSTM, or GRU, which are used the same way here. Here is an outline:

```python
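# Define the GRU model
alphabet_size = len(encoder.classes_)  # number of distinct characters
model = Sequential()
model.add(Input(shape=(None, 1)))  # any number of timesteps, 1 feature each
model.add(GRU(128))  # 128 hidden units in one GRU layer
model.add(Dense(alphabet_size, activation='softmax'))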
```

The input is a sequence of *any length* (hence the `None`) with one feature per timestep, the character code. The output is a softmax probability distribution over the alphabet. Because the targets are integer character ids rather than one-hot vectors, train with `sparse_categorical_crossentropy` and the Adam optimizer. You can pick any batch size (larger is faster; check the GPU memory usage). Don't expect very high accuracy, and train for only a few epochs (10 or fewer, maybe far fewer; start with 1).
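To see what `sparse_categorical_crossentropy` computes, here is a NumPy-only sketch (a hypothetical re-implementation, not Keras's own code): for each sample it takes the probability the softmax assigned to the true character id and averages -log of it. A useful reference point, assuming the full 65-character alphabet, is a model that guesses uniformly, which scores ln 65 ≈ 4.17. The losses in the training log below, around 2.2 to 2.7, are already well under that.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean over the batch of -log(probability assigned to the true class).

    y_true: integer class ids, shape (batch,) or (batch, 1) like y_train
    probs:  softmax outputs, shape (batch, num_classes)
    """
    y_true = np.asarray(y_true).reshape(-1)
    picked = probs[np.arange(len(y_true)), y_true]              # p(true class) per row
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))   # clip avoids log(0)

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
print(sparse_categorical_crossentropy([[0], [2]], probs))  # mean of -log(0.7) and -log(0.8)

# Baseline: guessing uniformly over 65 characters.
uniform = np.full((1, 65), 1 / 65)
print(sparse_categorical_crossentropy([0], uniform))       # about 4.17
```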

In [None]:
model = Sequential([
    Input([None, 1]),
    GRU(128),
    Dense(encoder.classes_.shape[0], activation='softmax'),
])

In [None]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
#history = model.fit(X_train, y_train, epochs=1, batch_size=2048)

## Testing the model

This is a bit trickier than what we've done before. You need to convert an input phrase to an array of ints, feed it to the model, turn the output probabilities back into logits, scale them by the temperature, sample a character from the resulting distribution, append it to the input, and repeat in a loop until you have generated as much output as you want. We can break this down into pieces.

First, write `next_char(text, temp)`, which returns the single next character (its integer id is fine) predicted from `text`. Remember to apply the temperature. Here's a snippet that may help:

```python
probs = ...  # output from your model, shape (1, alphabet_size)
logits = np.log(probs) / temp  # invert the softmax (up to a constant) to get logits, then divide by temp
char_id = tf.random.categorical(logits, num_samples=1)  # applies softmax to the logits and samples one id
```
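The same temperature idea as a NumPy-only sketch that runs without TensorFlow; `apply_temperature` and `sample_with_temperature` are hypothetical helper names, and `np.random.Generator.choice` stands in for `tf.random.categorical`. Dividing log-probabilities by `temp` and renormalizing is the same as raising each probability to the power `1/temp`: values below 1 sharpen the distribution toward the most likely character, values above 1 flatten it toward uniform. At a temperature as low as 0.01, sampling is nearly greedy, which tends to produce repetitive loops.

```python
import numpy as np

def apply_temperature(probs, temp):
    """Return softmax(log(probs) / temp), i.e. probs**(1/temp) renormalized."""
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temp  # small constant guards log(0)
    logits -= logits.max()          # shift for numerical stability; softmax ignores it
    scaled = np.exp(logits)
    return scaled / scaled.sum()

def sample_with_temperature(probs, temp, rng):
    """Draw one class id from the temperature-adjusted distribution."""
    p = apply_temperature(probs, temp)
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
probs = np.array([0.5, 0.3, 0.2])
print(apply_temperature(probs, 1.0))   # essentially unchanged
print(apply_temperature(probs, 0.5))   # sharper: more mass on id 0
print(apply_temperature(probs, 2.0))   # flatter: closer to uniform
print(sample_with_temperature(probs, 0.9, rng))
```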

In [None]:
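def next_char(text, temp):
    """Sample the id of the character that follows `text`, an array of encoded ints."""
    text = text.reshape(1, -1, 1)                  # (batch=1, timesteps, features=1)
    probs = model.predict(text, verbose=0)         # softmax output, shape (1, alphabet_size)
    logits = np.log(probs) / temp                  # back to logits (up to a constant), scaled by temp
    char_id = tf.random.categorical(logits, num_samples=1)
    return char_id.numpy()[0, 0]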

Now write `extend_text(text, n_chars, temp)` to add any number of characters to `text` by calling `next_char` repeatedly.

In [None]:
def extend_text(text, n_chars, temp):
    text_integers = encoder.transform(list(text))
    text_integers = text_integers.reshape(-1, 1).tolist()
    for _ in range(n_chars):
        nc = next_char(np.array(text_integers[-window_size:]), temp)
        text_integers.append([nc])
        text += encoder.inverse_transform([nc])[0]
    return text

In [None]:
teststr = "To be or not "
out = extend_text(teststr, 4, 0.9)

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
out

'To be or not ;RWq'

Finally, generate some Shakespeare! Experiment with different seeds, seed lengths, and temperatures.

## Saving State

When training gets this involved, you really need good practices for saving your work. Here's a callback that saves progress as you train. This is especially important on Colab, which will stop and shut down your session if it sits idle for too long.

```python

from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint_filepath = 'best_shakespeare_model.keras'

model_checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,  # Save the entire model
    monitor='val_loss',  # Monitor validation loss
    mode='min',  # Save the model when val_loss is minimized
    save_best_only=True  # Only save the best model
)

# Train the model with the callback
history = model.fit(X_train, y_train, epochs=500,  validation_split=0.1, callbacks=[model_checkpoint_callback])
```

In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint_filepath = 'best_shakespeare_model.keras'

model_checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,  # Save the entire model
    monitor='val_loss',  # Monitor validation loss
    mode='min',  # Save the model when val_loss is minimized
    save_best_only=True  # Only save the best model
)

# Train the model with the callback
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=25,  validation_split=0.1, callbacks=[model_checkpoint_callback])

Epoch 1/25
25095/25095 ━━━━━━━━━━━━━━━━━━━━ 145s 6ms/step - accuracy: 0.2531 - loss: 2.6670 - val_accuracy: 0.3196 - val_loss: 2.3938
Epoch 2/25
25095/25095 ━━━━━━━━━━━━━━━━━━━━ 199s 6ms/step - accuracy: 0.3232 - loss: 2.3752 - val_accuracy: 0.3389 - val_loss: 2.3148
Epoch 3/25
25095/25095 ━━━━━━━━━━━━━━━━━━━━ 141s 6ms/step - accuracy: 0.3458 - loss: 2.2925 - val_accuracy: 0.3534 - val_loss: 2.2746
Epoch 4/25
25095/25095 ━━━━━━━━━━━━━━━━━━━━ 142s 6ms/step - accuracy: 0.3566 - loss: 2.2471 - val_accuracy: 0.3547 - val_loss: 2.2603
Epoch 5/25
25095/25095 ━━━━━━━━━━━━━━━━━━━━ 142s 6ms/step - accuracy: 0.3622 - loss: 2.2228 - val_accuracy: 0.3664 - val_loss: 2.2107
Epoch 6/25
25095/25095 ━━━━━━━━━━━━━━━━━━━━ 144s 6ms/step - accuracy: 0.3687 - loss: 2.2012 - val_accuracy: 0.3649 - val_loss:

In [1]:
teststr = "To be, or not to b"
out = extend_text(teststr, 1000, 0.01)
print(out)  # `out` already contains real newline characters, so no manual line handling is needed

NameError: name 'extend_text' is not defined

In [None]:
Interesting Samples:
Double, double toil and trouble; Fire burn, and caldron bube
I have so the soatt of the sorg of the sorg
That the so the sorg of the sorg of the sorgs
And that the sase of the sorg of the sorgs
That the so the sorg of the sorg of the sorgs
That the so the sand of the sorg of the sorgs
That the so the sorg of the sorg of the sorgs
That the so the sorg of the sorg of the sorgs
That the so the sane of the sorg of the sorgs
That the so the sorg of the sorg of the sorgs
That the so the sand of the sorg of the sorgs
That the so the sorg of the sorg of the sorgs
That the so the sand of the sorg of the sorgs
And that the sase of the sorg of the sorgs
That the so the soatt of the sorg of the sorgs
That the so the sorg of the sorg of the sorgs
That the so the sorg of the sorg of the sorgs
That the so the sane of the sorg of the sorgs
That the so the sand of the sorg of the sorgs
That the so the sand of the sorg of the sorgs
That the so the sorg of the sorg of the sorgs
That the so the sorg of the sorg of the sorgs
That the so the sand of the sorg of the

To be, or not to be sheer.

LINGEOOE:
Then the withengn of the partrer of the wither.

QOMEO:
Whll I will with the som  

To be, or not to bauuncn?

CENVIENNO:
My boote in and oyore bf ban ei wadr:
That the pramt tn bro my berwr of wht,
Fow the and pur withihrt the witr uhe drsger yas,
Tuundit the oertltng's hang of Koldinghcr.

NUCENTIO

Double, double, toil and troubl!

PUEEN MARGBREA:
Au, I fave the wither from whth,
The eatst ies be ieavg of wr be sear.

COMIOOIO:
Sav, the sei of yith wourgln forbrdmdns! Tiar put
This psimgrt iis. Hi I the io the thpsle,
Mo she

Romeo, Romeo, wheteeng shen;  
If my hor the dousents tn she poetters of weet  
This pramcr of shan of hin but beltiee sosn.  
N shat hese word ablenn thi your tayer,  
Ou met I hear, ow oor shat comeer she nartrn,  
Whir out t

sigma sigma on the walcr
Thin thir heir in Porc, then I dank the senninr,
Nrw be pnient; for the withier so then the maneent:
I with the well of she porrpges of she comd;

QECHTES:
Inog, sir, and she kive! O, the gav!
Yiat

My saved model (`.keras` file):
https://drive.google.com/file/d/1Fbnv8Xk0L6NXauTLN1e_YeMpopQlAlQ_/view?usp=sharing