# Discussion Question
Several forms of machine learning that involve time, including vanilla RNNs and LSTMs, are better at reacting to the recent past than to the distant past. What are some tasks humans perform where short-term memory is all you need to succeed? Are there any tasks that benefit from long-term memory? What does this say about the roles AIs might play in society if they are much better at reacting to the present than to the past?

# LSTMs

This exercise is based on the example at: https://keras.io/examples/generative/lstm_character_level_text_generation/.  It also borrows some ideas from the code in *Learning Deep Learning* by Magnus Ekman, which appears in the lecture slides.

We're going to complete the missing parts of an LSTM that predicts the next character in a text. This is the same task as the LSTM in lecture, but we take a different approach in places. Our demo uses Mary Shelley's Frankenstein (via Project Gutenberg, www.gutenberg.org) as a training corpus. The novel is a fairly short corpus, which has the advantage of not taking too long to train on during section.

In [1]:
from tensorflow import keras
from tensorflow.keras import layers

import numpy as np
import random
import io

In [2]:
with io.open("frankenstein.txt", encoding="utf-8") as f:
    text = f.read().lower()
text = text.replace("\n", " ")  # Replace newline characters with spaces for nicer display
print("Corpus length:", len(text))

Corpus length: 441034


In [3]:
# Make lookup tables, character<->index
chars = sorted(list(set(text)))
print("Total chars:", len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

Total chars: 57
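The two dictionaries are inverses of each other: the training data uses char_indices to turn characters into positions, and the generation loop later uses indices_char to turn predicted positions back into characters. A small sketch of that round trip on a toy string; the names toy_text, toy_chars, toy_char_indices and toy_indices_char are illustrative, chosen so they don't overwrite the notebook's own tables.

```python
# Toy stand-in for the corpus; illustrative only, not the Frankenstein text
toy_text = "it was on a dreary night of november"

# Same construction as above: sorted unique characters and two lookup tables
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode a word to integer indices, then decode it back
encoded = [toy_char_indices[c] for c in "dreary"]
decoded = "".join(toy_indices_char[i] for i in encoded)
print(len(toy_chars), decoded)  # 18 dreary
```

Because the characters are sorted, the space sorts before every letter and gets index 0 here.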


You'll now write 2 functions to practice turning a time series into training data for an LSTM.

**make_sequences(text, seqlen, step)**: This function should return two lists. The first list should contain subsequences of *text*, each consisting of *seqlen* characters that were contiguous in the original string. Each subsequence should start *step* characters after the previous one. So make_sequences('example!', 3, 2) should return ['exa', 'amp', 'ple'] as the first return value. For each string in the first return value, the corresponding entry in the second return value should be the character that comes next: ['m', 'l', '!'] in the example. (Stop iterating through the text once there is no character left to serve as the next character for the second list.)

**to_one_hot(seqs, nexts, char_indices)**: Returns two arrays, X and y. X is a 3D array where X[i,j,:] is a one-hot encoding of the jth character of sequence i (a 1 at that character's index and 0 elsewhere). y is a 2D array where y[i,:] is a one-hot encoding of nexts[i]. char_indices is assumed to be the dictionary of the same name created earlier. (For efficiency, pass dtype=bool when you create your arrays.)

In [4]:
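def make_sequences(text, seqlen, step):
    """Split text into windows of seqlen characters, plus the character after each window."""
    sequences = []
    next_chars = []
    # Start a window every step characters, stopping while a following character still exists
    for i in range(0, len(text) - seqlen, step):
        sequences.append(text[i: i + seqlen])  # the input window
        next_chars.append(text[i + seqlen])    # the character to predict
    return sequences, next_chars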

In [5]:
make_sequences('example!', 3, 2)
# Expected output: (['exa', 'amp', 'ple'], ['m', 'l', '!'])

(['exa', 'amp', 'ple'], ['m', 'l', '!'])

In [6]:
def to_one_hot(seqs, nexts, char_indices):
    """One-hot encode the input sequences (X) and their next characters (y)."""
    # Number of sequences
    num_seqs = len(seqs)
    # Length of each sequence
    seqlen = len(seqs[0])
    # Number of unique characters
    num_chars = len(char_indices)

    # Initialize the 3D array X for input sequences with dimensions:
    # number of sequences x length of each sequence x number of unique characters
    X = np.zeros((num_seqs, seqlen, num_chars), dtype=bool)

    # Initialize the 2D array y for next characters with dimensions:
    # number of sequences x number of unique characters
    y = np.zeros((num_seqs, num_chars), dtype=bool)

    # Fill in the one-hot encoded arrays
    for i, seq in enumerate(seqs):
        for j, char in enumerate(seq):
            index = char_indices[char]  # Find the index of the character
            X[i, j, index] = 1  # Set the corresponding position in X to 1
        next_char_index = char_indices[nexts[i]]  # Find the index of the next character
        y[i, next_char_index] = 1  # Set the corresponding position in y to 1

    return X, y

In [7]:
# Create a tiny dict for testing purposes
test_chars = sorted(list(set('example!')))
test_char_indices = dict((c, i) for i, c in enumerate(test_chars))
seqs, nexts = make_sequences('example!', 3, 2)
to_one_hot(seqs, nexts, test_char_indices)
# Examine the one-hot encodings ... do they make sense for this example?

(array([[[False, False,  True, False, False, False, False],
         [False, False, False, False, False, False,  True],
         [False,  True, False, False, False, False, False]],
 
        [[False,  True, False, False, False, False, False],
         [False, False, False, False,  True, False, False],
         [False, False, False, False, False,  True, False]],
 
        [[False, False, False, False, False,  True, False],
         [False, False, False,  True, False, False, False],
         [False, False,  True, False, False, False, False]]]),
 array([[False, False, False, False,  True, False, False],
        [False, False, False,  True, False, False, False],
        [ True, False, False, False, False, False, False]]))
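The nested loops in to_one_hot are easy to follow, but NumPy can build the same arrays with fancy indexing: row k of an identity matrix is exactly the one-hot vector for index k. A sketch of that alternative; to_one_hot_vectorized is a hypothetical name, and it still loops in Python to look up character indices, so any speedup comes mainly from filling the arrays.

```python
import numpy as np

def to_one_hot_vectorized(seqs, nexts, char_indices):
    """Hypothetical vectorized variant of to_one_hot with the same output."""
    num_chars = len(char_indices)
    # Integer codes: shape (num_seqs, seqlen) for inputs, (num_seqs,) for targets
    seq_codes = np.array([[char_indices[c] for c in s] for s in seqs])
    next_codes = np.array([char_indices[c] for c in nexts])
    # Row k of the boolean identity matrix is the one-hot vector for index k
    eye = np.eye(num_chars, dtype=bool)
    X = eye[seq_codes]   # shape (num_seqs, seqlen, num_chars)
    y = eye[next_codes]  # shape (num_seqs, num_chars)
    return X, y

# Same toy example as above
toy_chars = sorted(set("example!"))
toy_indices = {c: i for i, c in enumerate(toy_chars)}
X_toy, y_toy = to_one_hot_vectorized(["exa", "amp", "ple"], ["m", "l", "!"], toy_indices)
print(X_toy.shape, y_toy.shape)  # (3, 3, 7) (3, 7)
```

Comparing X_toy and y_toy with the arrays printed above is a quick way to check that both versions agree.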

Once you're satisfied with your functions, you can proceed to create the training data from the Frankenstein text.

In [8]:
seqlen = 40
step = 3
seqs, nexts = make_sequences(text, seqlen, step)
X, y = to_one_hot(seqs, nexts, char_indices)
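It can help to check how much data this produces before training. make_sequences starts one window every step characters while the start index is below len(text) - seqlen, so the number of sequences is ceil((len(text) - seqlen) / step), and a bool array uses one byte per element. A back-of-the-envelope sketch that reuses the corpus length and character count printed earlier (window and stride are illustrative names for the same seqlen and step values):

```python
import math

# Values from the notebook: printed corpus length and vocabulary size, plus seqlen/step
corpus_length = 441034
n_chars = 57
window, stride = 40, 3  # same as seqlen and step above

# Windows start at 0, stride, 2*stride, ... while start < corpus_length - window
num_sequences = math.ceil((corpus_length - window) / stride)
assert num_sequences == len(range(0, corpus_length - window, stride))

# X has shape (num_sequences, window, n_chars); with dtype=bool each element is one byte
x_bytes = num_sequences * window * n_chars
print(num_sequences, x_bytes)  # 146998 335155440
```

So X takes roughly 335 MB. Without dtype=bool, np.zeros defaults to float64 at eight bytes per element, which would need about eight times as much memory.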

Once the data is in the right format, there's not much to creating a basic LSTM that can train on it and make predictions. We've omitted just the last layer, the output layer, from the LSTM-based neural network below. Can you figure out what it should be? Hint: the output is a choice of character, again in the one-hot encoding set up earlier.
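The hint points to the usual pairing for a one-of-N choice: a layer with one output per character and a softmax activation, trained with categorical cross-entropy against the one-hot target. A minimal NumPy sketch of both pieces, using made-up scores for a 3-character alphabet; Keras's own loss adds numerical safeguards such as clipping probabilities away from 0, which this sketch omits.

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    z = np.exp(logits - np.max(logits))
    return z / np.sum(z)

def categorical_crossentropy(one_hot_target, probs):
    # With a one-hot target, the sum reduces to -log(probability of the true class)
    return -np.sum(one_hot_target * np.log(probs))

logits = np.array([2.0, 1.0, 0.1])   # made-up scores for a 3-character alphabet
probs = softmax(logits)              # non-negative and sums to 1
target = np.array([1.0, 0.0, 0.0])   # the true next character is index 0
loss = categorical_crossentropy(target, probs)
```

Here probs[0] is about 0.66, so the loss is about 0.42; it would shrink toward 0 as the model puts more probability on the correct character.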

In [13]:
from tensorflow import keras
from tensorflow.keras import layers

# chars is the sorted list of unique characters built earlier
num_unique_chars = len(chars)

model = keras.Sequential(
    [
        keras.Input(shape=(seqlen, num_unique_chars)),
        layers.LSTM(128),
        # Output layer: a softmax over the vocabulary, one probability per character
        layers.Dense(num_unique_chars, activation='softmax'),
    ]
)

# The legacy RMSprop optimizer avoids a performance warning on M1/M2 Macs.
# keras.optimizers.legacy exists in Keras 2 (tf.keras) but not in Keras 3, the default
# from TensorFlow 2.16; there, use keras.optimizers.RMSprop(learning_rate=0.01) instead.
optimizer = keras.optimizers.legacy.RMSprop(learning_rate=0.01)
model.compile(loss="categorical_crossentropy", optimizer=optimizer)
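As a sanity check on the architecture, the parameter counts can be worked out by hand. A Keras LSTM layer has four gates, each with a kernel over the input, a recurrent kernel over the previous hidden state, and a bias, giving 4 × (units × (input_dim + units) + units) parameters; the Dense layer has input_dim × units weights plus units biases. A small sketch with the sizes used here (57 characters, 128 LSTM units); model.summary() should report the same numbers, though it's worth confirming on your Keras version.

```python
def lstm_params(input_dim, units):
    # Four gates (input, forget, cell candidate, output), each with an input kernel,
    # a recurrent kernel, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output
    return input_dim * units + units

lstm = lstm_params(57, 128)
dense = dense_params(128, 57)
print(lstm, dense, lstm + dense)  # 95232 7353 102585
```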


Now go ahead and train the LSTM.  You may want to do this in Google Colab with GPU acceleration (Edit->Notebook settings), unless you have a fast GPU of your own.

In [15]:
epochs = 20
batch_size = 128

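def sample(preds, temperature=1.0):
    """Draw a character index from the model's predicted distribution, reweighted by temperature."""
    preds = np.asarray(preds).astype('float64')
    if temperature <= 0:
        # Zero temperature is the greedy limit: always take the most likely character
        # (dividing by a tiny temperature instead would overflow and produce NaNs)
        return int(np.argmax(preds))
    # Reweight the log-probabilities by 1/temperature; the tiny constant avoids log(0)
    logits = np.log(preds + 1e-10) / temperature
    # Softmax, subtracting the max first for numerical stability
    exp_preds = np.exp(logits - np.max(logits))
    preds = exp_preds / np.sum(exp_preds)

    # Draw one sample from the reweighted distribution
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)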

for epoch in range(epochs):
    model.fit(X, y, batch_size=batch_size, epochs=1)
    print()
    print("Generating text after epoch: %d" % epoch)

    start_index = random.randint(0, len(text) - seqlen - 1)
    for temp in [0.5, 1.0]:
        print("...Temperature:", temp)

        generated = ""
        sentence = text[start_index : start_index + seqlen]
        print('...Generating with seed: "' + sentence + '"')

        for i in range(50):
            x_pred = np.zeros((1, seqlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.0
            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, temp)
            next_char = indices_char[next_index]
            sentence = sentence[1:] + next_char
            generated += next_char

        print("...Generated: ", generated)
        print()



Generating text after epoch: 0
...Temperature: 0.5
...Generating with seed: "rk (any work on which the phrase "projec"
...Generated:  t gutenberg-tm.  on the same hate a some and mine 

...Temperature: 1.0
...Generating with seed: "rk (any work on which the phrase "projec"
...Generated:  t gutingsts he was or perits, repsuding my was or 


Generating text after epoch: 1
...Temperature: 0.5
...Generating with seed: " head of the mourner.  the deep grief wh"
...Generated:  ich i was part of life of dear that feeling the su

...Temperature: 1.0
...Generating with seed: " head of the mourner.  the deep grief wh"
...Generated:  ich de assupanions, but a worment teasood being.  


Generating text after epoch: 2
...Temperature: 0.5
...Generating with seed: "of the youthful lovers, while in his hea"
...Generated:  rt of my power of the weaked to any diffure, i cou

...Temperature: 1.0
...Generating with seed: "of the youthful lovers, while in his hea"
...Generated:  lt, i was not monthration.
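The two temperatures in the output above come from the reweighting inside sample(): dividing log-probabilities by T and renormalizing is the same as raising each probability to the power 1/T. T < 1 sharpens the distribution toward the most likely character (more conservative text), and T > 1 flattens it (more surprising text, including more misspellings). A NumPy sketch on a made-up three-character distribution; reweight is an illustrative helper, not part of the notebook.

```python
import numpy as np

def reweight(probs, temperature):
    # Same transformation as in sample(): p_i ** (1 / T), renormalized to sum to 1
    logits = np.log(probs) / temperature
    z = np.exp(logits - np.max(logits))
    return z / np.sum(z)

probs = np.array([0.5, 0.3, 0.2])  # made-up next-character probabilities
for t in [0.5, 1.0, 2.0]:
    print(t, np.round(reweight(probs, t), 3))
```

At T = 1 the distribution is unchanged. At T = 0.5 the top character's probability rises from 0.5 to about 0.66 (0.25 / 0.38), and at T = 2 it falls to about 0.42.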

You might still be getting some nonsense at epoch 20, although it should be noticeably better than at the start. This training setup was chosen with speed as the foremost concern, since LSTMs can take a long time to train. There are several ways to improve the results. You can use a longer training corpus; Project Gutenberg has much longer books available in plain text. You can run for more epochs; Chollet, the original author of this example, says 20 is a bare minimum and 40 is recommended. Or you can add another LSTM layer to the architecture; there's an example of this in the lecture slides. (Be sure to set return_sequences to True for the lower LSTM layer.)
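Of the three options, adding a layer changes the code the most. A minimal sketch of a two-layer version of the model above, assuming the same seqlen (40) and vocabulary size (57); it's one reasonable configuration, not necessarily the one in the lecture slides, and the name stacked_model is illustrative so it doesn't replace model. It uses the non-legacy RMSprop so it also runs under Keras 3.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

seqlen, num_unique_chars = 40, 57  # same values as in the notebook above

stacked_model = keras.Sequential(
    [
        keras.Input(shape=(seqlen, num_unique_chars)),
        # return_sequences=True makes this layer output its state at every time step,
        # shape (batch, seqlen, 128), which is the 3D input the next LSTM expects
        layers.LSTM(128, return_sequences=True),
        # The top LSTM returns only its final output, shape (batch, 128)
        layers.LSTM(128),
        layers.Dense(num_unique_chars, activation="softmax"),
    ]
)
stacked_model.compile(
    loss="categorical_crossentropy",
    optimizer=keras.optimizers.RMSprop(learning_rate=0.01),
)
```

To train it, substitute stacked_model for model in the training loop; expect each epoch to take longer than with a single layer.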

Try one of these approaches, and compare your results with a neighbor who chose a different one.