# Assignment 11

### Using section 8.1 in Deep Learning with Python as a guide, implement an LSTM text generator. Train the model on the Enron corpus or a text source of your choice. Save the model and generate 20 examples to the results directory of dsc650/assignments/assignment11/.

In [1]:
import os

In [2]:
os.getcwd()

'/home/jovyan/dsc650/dsc650/assignments/assignment11'

In [3]:
# Loading and parsing the initial text file

# load libraries
import tensorflow as tf
import keras
import numpy as np

# set seed
np.random.seed(seed=42)

# get data
path = '/home/jovyan/dsc650/dsc650/assignments/assignment11/THGTTG.txt'
with open(path) as f:  # Context manager closes the file after reading
  text = f.read().lower()
print('Corpus length:', len(text))

Corpus length: 1561789


In [4]:
# Vectorizing sequences of characters
maxlen = 60     # Extract sequences of 60 characters

step = 3        # Sample a new sequence every three characters

sentences = []  # Holds extracted sequences

next_chars = [] # Holds targets (the follow-up characters)

for i in range(0, len(text) - maxlen, step):
  sentences.append(text[i: i + maxlen])
  next_chars.append(text[i + maxlen])

print('Number of sequences:', len(sentences))

chars = sorted(list(set(text))) # List of unique characters in the corpus
print('Unique characters:', len(chars))
# Dict that maps each unique character to its index in the list `chars`
char_indices = {char: i for i, char in enumerate(chars)}

print('Vectorization...')
# One-hot encodes the characters into binary arrays:
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
  for t, char in enumerate(sentence):
    x[i, t, char_indices[char]] = 1
  y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 520577
Unique characters: 58
Vectorization...
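To make the windowing and one-hot encoding above concrete, here is a minimal sketch on a toy string. The helper names `make_windows` and `one_hot` are illustrative assumptions and do not appear in the notebook; the logic is meant to mirror the cell above.

```python
import numpy as np

def make_windows(text, maxlen, step):
    """Slide a window of `maxlen` characters over `text`, advancing `step` characters each time."""
    sentences, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i: i + maxlen])   # input window
        next_chars.append(text[i + maxlen])     # character that follows the window
    return sentences, next_chars

def one_hot(sentences, next_chars, chars):
    """One-hot encode the windows (x) and their follow-up characters (y)."""
    char_indices = {char: i for i, char in enumerate(chars)}
    maxlen = len(sentences[0])
    x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = True
        y[i, char_indices[next_chars[i]]] = True
    return x, y

toy_text = "hello world"
toy_sentences, toy_next = make_windows(toy_text, maxlen=4, step=3)
toy_chars = sorted(set(toy_text))
toy_x, toy_y = one_hot(toy_sentences, toy_next, toy_chars)

print(toy_sentences)             # ['hell', 'lo w', 'worl']
print(toy_next)                  # ['o', 'o', 'd']
print(toy_x.shape, toy_y.shape)  # (3, 4, 8) (3, 8)
```

The same arithmetic accounts for the sequence count reported above: with a corpus of 1561789 characters, `maxlen = 60` and `step = 3`, the loop runs ceil((1561789 - 60) / 3) = 520577 times.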


In [5]:
# Single-layer LSTM model for next-character prediction
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

In [6]:
# Model compilation configuration
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
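As a rough reference point for reading the training loss, the sketch below computes categorical cross-entropy by hand for a model that predicts all 58 characters with equal probability. The function `categorical_crossentropy` here is a hand-written NumPy stand-in, not the Keras implementation, and the clipping constant is an assumption added for numerical safety.

```python
import math
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred)); eps is an assumed clipping value."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

n_chars = 58                                     # unique characters in this corpus
y_true = np.zeros((1, n_chars))
y_true[0, 3] = 1.0                               # arbitrary one-hot target
uniform = np.full((1, n_chars), 1.0 / n_chars)   # untrained "guess everything equally" prediction

baseline = categorical_crossentropy(y_true, uniform)
print(round(baseline, 4))  # 4.0604, i.e. ln(58)
```

A loss well below ln(58) ≈ 4.06 suggests the model has learned something beyond uniform guessing.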

In [7]:
# Function to sample the next character given the model's predictions
def sample(preds, temperature=1.0):
  preds = np.asarray(preds).astype('float64')
  preds = np.log(preds) / temperature
  exp_preds = np.exp(preds)
  preds = exp_preds / np.sum(exp_preds)
  probas = np.random.multinomial(1, preds, 1)
  return np.argmax(probas)
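The effect of `temperature` is easier to see when the reweighting step is separated from the random draw. The sketch below is a variant under stated assumptions: `reweight` and its `eps` parameter are hypothetical names, and the clipping is added because `np.log` returns `-inf` (with a RuntimeWarning) when a predicted probability is exactly zero.

```python
import numpy as np

def reweight(preds, temperature=1.0, eps=1e-12):
    """Return the temperature-adjusted distribution without sampling from it."""
    preds = np.asarray(preds, dtype="float64")
    # Clip before the log so zero probabilities do not produce -inf (eps is an assumption).
    logits = np.log(np.clip(preds, eps, None)) / temperature
    # Subtracting the max does not change the result but keeps exp() well-behaved.
    exp_preds = np.exp(logits - np.max(logits))
    return exp_preds / np.sum(exp_preds)

original = np.array([0.6, 0.3, 0.1, 0.0])
for t in (0.2, 1.0, 1.5):
    print(t, np.round(reweight(original, t), 3))
```

Temperatures below 1 concentrate probability on the most likely character, giving more repetitive text; temperatures above 1 flatten the distribution, giving more surprising and more error-prone text.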

In [8]:
# Training loop
import random
import sys
for epoch in range(1, 20):  # Trains for 19 epochs (reduced from range(1, 60))
  print('\nEpoch', epoch)
  model.fit(x, y, batch_size=128, epochs=1) # Fits the model for one pass over the data


Epoch 1

Epoch 2

Epoch 3

Epoch 4

Epoch 5

Epoch 6

Epoch 7

Epoch 8

Epoch 9

Epoch 10

Epoch 11

Epoch 12

Epoch 13

Epoch 14

Epoch 15

Epoch 16

Epoch 17

Epoch 18

Epoch 19


In [9]:
# Selects a text seed at random:
start_index = random.randint(0, len(text) - maxlen - 1)
generated_text = text[start_index: start_index + maxlen]
print('--- Generating with seed: "' + generated_text + '"')
for temperature in [0.5]:  # Single temperature here; use e.g. [0.2, 0.5, 1.0, 1.2] to try a range
  print('\n------ temperature:', temperature, '\n')
  sys.stdout.write(generated_text)
  for i in range(400): # Generates 400 characters, starting from seed text
    # One-hot encodes characters generated so far:
    sampled = np.zeros((1, maxlen, len(chars)))
    for t, char in enumerate(generated_text):
      sampled[0, t, char_indices[char]] = 1.
    # Samples the next character
    preds = model.predict(sampled, verbose=0)[0]
    next_index = sample(preds, temperature)
    next_char = chars[next_index]
    generated_text += next_char
    generated_text = generated_text[1:]
    sys.stdout.write(next_char)
  print()

--- Generating with seed: " sort 
out later had happened, and... 

it still didn't make"

------ temperature: 0.5 

 sort 
out later had happened, and... 

it still didn't make the same moment, the other carrier of contance 
some birds and all the silence, and the should fine the startons 
was known and probably had been, which was a time. 

the way he liked to him. 

the particular of the one to see back and the world in the feet 
on the ten a sandwich was aftered and had not rusted the not 
on the serious thing to a ship and sandwich a little moment. 

startled at his
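The generation cell above keeps a rolling window of the last `maxlen` characters: it encodes the window, predicts, appends the chosen character, and drops the oldest one. A minimal sketch of that loop as a reusable function follows. The names `generate`, `predict_fn`, `pick_fn`, and `cyclic_predict` are assumptions for illustration; `cyclic_predict` is a toy stand-in for `model.predict` so the sketch runs without a trained model.

```python
import numpy as np

def generate(seed_text, n_chars, chars, predict_fn, pick_fn, maxlen):
    """Generate `n_chars` characters after `seed_text` using a rolling window of `maxlen` characters."""
    char_indices = {char: i for i, char in enumerate(chars)}
    window = seed_text[-maxlen:]
    generated = []
    for _ in range(n_chars):
        # One-hot encode the current window as a batch of size 1.
        sampled = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(window):
            sampled[0, t, char_indices[char]] = 1.0
        preds = predict_fn(sampled)[0]
        next_char = chars[pick_fn(preds)]
        generated.append(next_char)
        # Append the new character and drop the oldest one.
        window = (window + next_char)[-maxlen:]
    return seed_text + "".join(generated)

def cyclic_predict(batch):
    """Toy predictor (not a real model): puts all probability on the character after the last one."""
    n = batch.shape[-1]
    last = int(batch[0, -1].argmax())
    probs = np.zeros((1, n))
    probs[0, (last + 1) % n] = 1.0
    return probs

toy_chars = ["a", "b", "c"]
result = generate("ab", n_chars=4, chars=toy_chars,
                  predict_fn=cyclic_predict, pick_fn=np.argmax, maxlen=2)
print(result)  # abcabc
```

With the trained model, `predict_fn` could be something like `lambda s: model.predict(s, verbose=0)` and `pick_fn` something like `lambda p: sample(p, 0.5)`, which should reproduce the behaviour of the cell above.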


In [10]:
for i in range(0,20):
  print(f'\n==GENERATED TEXT #{i+1}:\n')
  # Selects a text seed at random:
  start_index = random.randint(0, len(text) - maxlen - 1)
  generated_text = text[start_index: start_index + maxlen]
  print('--- Generating with seed: "' + generated_text + '"')
  for temperature in [0.5]:  # Single temperature here; use e.g. [0.2, 0.5, 1.0, 1.2] to try a range
    print('\n------ temperature:', temperature, '\n')
    sys.stdout.write(generated_text)
    for _ in range(400): # Generates 400 characters, starting from seed text (avoids reusing the outer loop's `i`)
      # One-hot encodes characters generated so far:
      sampled = np.zeros((1, maxlen, len(chars)))
      for t, char in enumerate(generated_text):
        sampled[0, t, char_indices[char]] = 1.
      # Samples the next character
      preds = model.predict(sampled, verbose=0)[0]
      next_index = sample(preds, temperature)
      next_char = chars[next_index]
      generated_text += next_char
      generated_text = generated_text[1:]
      sys.stdout.write(next_char)
    print()


==GENERATED TEXT #1:

--- Generating with seed: "'s a pear," he said. 

a few moments later, when they had ea"

------ temperature: 0.5 

's a pear," he said. 

a few moments later, when they had each other more one that 
ford shouting on on the seemed grass. 

"i was to be they is relatively the posite meal more with her 
of the one in the could seemed to the computer the gending siles 
so and they was wasn't had been a thing of a courses, the and 
stared at the time particular was where it was a topred before 
he was a few serious strange throat. the tape of the sirive 
foot of the planet 

==GENERATED TEXT #2:

--- Generating with seed: "e do not be alarmed," it said, "by anything you see or hear "

------ temperature: 0.5 

e do not be alarmed," it said, "by anything you see or hear 
carse." 

"i see," he said. "the nature, and what we i can be a window 
friend and the sipptious. 





the robot startled at the other power that was a start of the 
expensive was the silence of the 



 was the of the sort of the probably the 
all the coppones in the terrible ting that it was a little strange 
he was fashing the battle with the hand of the galaxy. 

the sandwich was the wall than the of the ting than he called 
what a little be for a call of the pile of the tim

==GENERATED TEXT #5:

--- Generating with seed: " spinning with shock. he had a broken leg, 
a couple of brok"

------ temperature: 0.5 

 spinning with shock. he had a broken leg, 
a couple of broke the read some simple seconds of the bar, and 
he is that it was a great short. 



"i was to deforal probably the only stars some way with is a 
particular and started in a beach of the ground. 

"and the time of the strange moment me had been the could 



feeling to see the light publish, which already to do a small 
and stood to the suspets of the body of the sandwich mind. he 
shook it to a 

==GENERATED TEXT #6:

--- Generating with seed: "work in. they say they run a concession stand by the 



mes"

------