# Scott Breitbach
## 28-May-2022
# LSTM AI Text Generator
## Trained using text of *The Ultimate Hitchhiker's Guide to the Galaxy* by Douglas Adams
Source [text](https://archive.org/stream/TheultimateHitchhikersGuide/The%20Hitchhiker%27s%20Guide%20To%20The%20Galaxy_djvu.txt).

## Get the data

In [1]:
# Load libraries
import tensorflow as tf
import keras
import numpy as np
from keras import layers

# set seed
np.random.seed(seed=42)

Load data to local notebook:

In [2]:
# # Load text
# path = 'data/h2g2.txt'
# text = open(path).read().lower()
# print(f'Corpus length: {len(text)}')

Load data to Google Colab:

In [3]:
# import files to colab
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  
def load_text(path):
  # Read the whole file and lowercase it to keep the character vocabulary small
  with open(path) as f:
    text = f.read().lower()
  return text

# Load the text corpus
text = load_text('/content/h2g2.txt')
print(f'Corpus length: {len(text)}')

Saving h2g2.txt to h2g2.txt
User uploaded file "h2g2.txt" with length 1603672 bytes
Corpus length: 1561841


## Vectorize the text

In [4]:
# Vectorizing sequences of characters
maxlen = 60     # Extract sequences of 60 characters
step = 3        # Sample a new sequence every three characters
sentences = []  # Holds extracted sequences
next_chars = [] # Holds targets (the follow-up characters)

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])

print(f'Number of sequences: {len(sentences)}')

# List of unique characters in the corpus
chars = sorted(list(set(text))) 
print(f'Unique characters: {len(chars)}')

# Dict that maps unique characters to their index in the list `chars`
char_indices = {char: i for i, char in enumerate(chars)}

print('Vectorization...')
# One-hot encodes the characters into binary arrays:
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    # Set the target once per sequence rather than once per character
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 520594
Unique characters: 58
Vectorization...
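
The windowing and one-hot steps above can be sketched on a toy string, which makes the shapes easier to see. This is a minimal pure-Python sketch, not the notebook's NumPy code: `make_windows`, `one_hot`, and `toy_text` are hypothetical names introduced for illustration. The last line checks that the reported sequence count follows from the corpus length, `maxlen`, and `step`.

```python
import math

def make_windows(text, maxlen, step):
    """Return (windows, targets): each window of `maxlen` characters
    and the single character that follows it."""
    windows, targets = [], []
    for i in range(0, len(text) - maxlen, step):
        windows.append(text[i:i + maxlen])
        targets.append(text[i + maxlen])
    return windows, targets

def one_hot(window, char_indices):
    """Encode a string as a list of 0/1 rows, one row per character."""
    rows = []
    for char in window:
        row = [0] * len(char_indices)
        row[char_indices[char]] = 1
        rows.append(row)
    return rows

toy_text = "hello world"  # hypothetical toy corpus
windows, targets = make_windows(toy_text, maxlen=4, step=3)
chars = sorted(set(toy_text))
char_indices = {c: i for i, c in enumerate(chars)}
encoded = one_hot(windows[0], char_indices)

print(windows, targets)
print(len(encoded), len(encoded[0]))  # rows = maxlen, columns = vocabulary size

# The number of windows is ceil((len(text) - maxlen) / step).
# With the corpus length, maxlen and step used in this notebook:
expected_sequences = math.ceil((1561841 - 60) / 3)
print(expected_sequences)
```

The computed count agrees with the `Number of sequences: 520594` line printed by the cell.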


## Set up the model

In [5]:
# Set up single-layer LSTM model for next-character prediction
model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))
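
As a rough size check, the number of trainable weights in this two-layer model can be worked out by hand. The sketch below assumes the standard LSTM formulation (four gates, each with input weights, recurrent weights, and a bias) and a dense layer with bias; `lstm_params` and `dense_params` are hypothetical helper names. If those assumptions hold, `model.summary()` should report the same totals.

```python
def lstm_params(units, input_dim):
    # Four gates, each with an input kernel, a recurrent kernel and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # One weight per input/output pair plus one bias per output unit
    return input_dim * units + units

n_chars = 58  # unique characters reported by the vectorization cell
lstm = lstm_params(128, n_chars)
dense = dense_params(n_chars, 128)
total = lstm + dense
print(lstm, dense, total)
```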

In [6]:
# Model compilation configuration
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
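
For a one-hot target, categorical cross-entropy reduces to the negative log of the probability assigned to the correct character. The sketch below is a plain-Python restatement of that formula for a single example, not the Keras implementation; `categorical_crossentropy` is a hypothetical function name. It also gives a useful baseline: a model that guesses uniformly over the 58 characters would score `log(58)`, so training losses should fall below that value.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    # Only the position where the one-hot target is 1 contributes to the sum
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t)

n_chars = 58
y_true = [0] * n_chars
y_true[3] = 1  # arbitrary target index for illustration

uniform = [1.0 / n_chars] * n_chars
baseline_loss = categorical_crossentropy(y_true, uniform)
print(round(baseline_loss, 3))
```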

In [7]:
def sample(preds, temperature=1.0):
    '''
    Sample the next character given the model's predictions
    '''
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
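
To see what the temperature does before any sampling happens, the reweighting step of `sample` can be isolated. The sketch below repeats the same log, divide, exponentiate, normalize sequence in plain Python on a made-up three-way distribution; `reweight` and the example probabilities are assumptions for illustration only.

```python
import math

def reweight(preds, temperature=1.0):
    # Same arithmetic as sample(): log, divide by temperature, exp, renormalize
    logits = [math.log(p) / temperature for p in preds]
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

preds = [0.5, 0.3, 0.2]  # hypothetical model output over three characters
for t in (0.2, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in reweight(preds, t)])
```

Temperatures below 1 push probability toward the most likely character, which matches the repetitive output at 0.2; temperatures above 1 flatten the distribution and make unlikely characters more common.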

## Fit the model
Note: the generated text turned to gibberish around Epoch 25, so I stop training early, before Epoch 23 (`range(1, 23)` runs epochs 1 through 22).

In [8]:
# Text-generation loop
import random
import sys
for epoch in range(1, 23):  # Trains for 22 epochs (originally 60)
    print(f'\nEpoch {epoch}:')
    model.fit(x, y, batch_size=128, epochs=1) # Fits model for one pass over the data
    # Generate example text every third epoch while training:
    if epoch % 3 == 0:
        # Selects a text seed at random:
        start_index = random.randint(0, len(text) - maxlen - 1)
        generated_text = text[start_index: start_index + maxlen]
        print(f'---\nGenerating with seed:\n"{generated_text}"\n---')
        for temperature in [0.2, 0.5, 1.0]:  # Tries a range of different sampling temperatures
            print(f'\n------ temperature: {temperature}\n')
            sys.stdout.write(generated_text)
            for i in range(200): # Generates 200 characters, starting from seed text
                # One-hot encodes characters generated so far:
                sampled = np.zeros((1, maxlen, len(chars)))
                for t, char in enumerate(generated_text):
                    sampled[0, t, char_indices[char]] = 1.
                # Samples the next character
                preds = model.predict(sampled, verbose=0)[0]
                next_index = sample(preds, temperature)
                next_char = chars[next_index]
                generated_text += next_char
                generated_text = generated_text[1:]
                sys.stdout.write(next_char)
            print()


Epoch 1:

Epoch 2:

Epoch 3:
---
Generating with seed:
" of the corridor leading at right angles from 
this one, he "
---

------ temperature: 0.2

 of the corridor leading at right angles from 
this one, he had been and the silence that the back the staring 
the picked and started to the more the stare and the start of 
the stare of the start the start of the properly that the problem 
said that the more

------ temperature: 0.5

e start of the properly that the problem 
said that the more where it the started to the sturises. 

"you don't came to the gargence the other world." 

"i think you said the dead whell i was any planet you think 
you mean which the ground it more doing on a m

------ temperature: 1.0

et you think 
you mean which the ground it more doing on a more where are 
plurately shade ouclizantred and and on the be now un's monthee 
a three whic jump contering heress that had a a leselverside trashing seal 
threat was eejoy patred sig, he forgly by a 

Epoch 4:

Epoch

  


ippend a president of the man was a small thing 
and the ship was a sigh that the president of the planet that 
the ship was a ship, and he was a sign of t

------ temperature: 0.5

the planet that 
the ship was a ship, and he was a sign of the head and stared 
to him was extremely peardled a point that the standing shadance 
little again, arthur was startled and bitromating a sightly 
because. 



his body and the barman spacefresss and 

------ temperature: 1.0

ghtly 
because. 



his body and the barman spacefresss and visible up prology frown 
the bright out zetre evil expects in a villager savamuled around 
them nexished the shapes of this watches as he shouting at the 
tiny greatalily and carecreily eyes after a 

Epoch 19:

Epoch 20:

Epoch 21:
---
Generating with seed:
"py, come 
now." 

the robot let out a long heartfelt sigh of"
---

------ temperature: 0.2

py, come 
now." 

the robot let out a long heartfelt sigh of the stars of the 
man was the standing and he was a start
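
The training cell above and the two generation cells that follow repeat the same sliding-window sampling loop. One way to factor it out is sketched below. This is an illustration, not the notebook's code: `generate`, `predict_fn`, and `toy_predict` are hypothetical names, and `predict_fn` stands in for whatever turns the current window into next-character probabilities. In the cells as written, `generated_text` is not reset between temperatures, so each temperature continues from the previous run's last 60 characters (visible in the outputs); this sketch instead starts every call from the seed it is given.

```python
import random

def generate(seed, predict_fn, n_chars, temperature=1.0, rng=random):
    """Extend `seed` by `n_chars` characters using a fixed-length sliding window.

    `predict_fn(window)` must return (chars, probs): the candidate characters
    and their probabilities for the next position.
    """
    window = seed
    out = []
    for _ in range(n_chars):
        chars, probs = predict_fn(window)
        # p ** (1 / T) equals exp(log(p) / T); random.choices normalizes weights
        weights = [p ** (1.0 / temperature) for p in probs]
        next_char = rng.choices(chars, weights=weights, k=1)[0]
        out.append(next_char)
        # Drop the oldest character so the window length stays constant
        window = window[1:] + next_char
    return seed + "".join(out)

def toy_predict(window):
    # Hypothetical stand-in for the model: always 80% 'a', 20% 'b'
    return ["a", "b"], [0.8, 0.2]

print(generate("abba", toy_predict, n_chars=10, temperature=0.5))
```

With the notebook's model, `predict_fn` would one-hot encode the window and return `chars` together with `model.predict(sampled, verbose=0)[0]`.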

## Generate text:

In [9]:
# Selects a text seed at random:
start_index = random.randint(0, len(text) - maxlen - 1)
generated_text = text[start_index: start_index + maxlen]
print(f'---\nGenerating with seed:\n"{generated_text}"\n---')
for temperature in [0.2, 0.5, 1.0]: # Tries a range of different sampling temperatures
    print(f'\n------ temperature: {temperature}\n')
    sys.stdout.write(generated_text)
    for i in range(400): # Generates 400 characters, starting from seed text
        # One-hot encodes characters generated so far:
        sampled = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(generated_text):
            sampled[0, t, char_indices[char]] = 1.
        # Samples the next character
        preds = model.predict(sampled, verbose=0)[0]
        next_index = sample(preds, temperature)
        next_char = chars[next_index]
        generated_text += next_char
        generated_text = generated_text[1:]
        sys.stdout.write(next_char)
    print()

---
Generating with seed:
"g against the very, very dim perimeter of the field. he 
hel"
---

------ temperature: 0.2

g against the very, very dim perimeter of the field. he 
held to the sign of the sign of the moment of the problem of 
the particular of the silent was the stars of the moment of 
the ground of the strange of the perfectly of the ship was the 
star was the star and stared to the star. 

"the mininges," said zaphod, "the stars of the point of the 
star to the ship was the speculation of the moment of the mind 
was the stars of the stars of the stars of the 

------ temperature: 0.5

of the mind 
was the stars of the stars of the stars of the computer and 
simply field to consect as if

  


 he could travely and seemed to 
speak and he said, and had the sitting for a problem and suddenly 
a point with the thought was his own spit of perfectly moment 
door of the little help to speak by the old man and when it 
was silently off the ship on the moment of the stars of the 
planet and suspended it or anything because the some problem. 

a couple

------ temperature: 1.0

uspended it or anything because the some problem. 

a couple of point had mananged a moments of the deye of elbogyantast 
futh, he looked to his brase at him excoired out of the legal pitter 
to the goftet, span as he should have forgage a perfectly when 
it wasn't his e,r he "i'm anything? revolated enbricked donem." 

the raper tway. 

destained violentous brand to her. 

"you tell foo," said zaphod, aok, well, possible. and hat. 

"it isn't came on in a


## Generate 20 samples:

In [10]:
for i in range(0,20):
    print(f'\n== GENERATED TEXT #{i+1}: ==\n')
    # Selects a text seed at random:
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print(f'---\nGenerating with seed:\n"{generated_text}"\n---\n')
    temperature = 0.5
    sys.stdout.write(generated_text)
    for _ in range(400): # Generates 400 characters, starting from seed text (avoids shadowing the outer `i`)
        # One-hot encodes characters generated so far:
        sampled = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(generated_text):
            sampled[0, t, char_indices[char]] = 1.
        # Samples the next character
        preds = model.predict(sampled, verbose=0)[0]
        next_index = sample(preds, temperature)
        next_char = chars[next_index]
        generated_text += next_char
        generated_text = generated_text[1:]
        sys.stdout.write(next_char)
    print()


== GENERATED TEXT #1: ==

---
Generating with seed:
"ud." 

"in, as you say it, the mud." 

as soon as mr. prosse"
---

ud." 

"in, as you say it, the mud." 

as soon as mr. prosser and was the man down the blinding of 
the planet speaker that he was the large mind that he was not 
the moment, and breaked the sound of prolied and many of when 
the first thing he was an instance of the consider in the party-out 
provid, which he stood of the land of the moment into the man 
and the furrow. 

"what would be getting that just the building." 

"he foo, the see-to him to the sta

== GENERATED TEXT #2: ==

---
Generating with seed:
"ugh what slartibartfast had assured him was five-dimensional"
---

ugh what slartibartfast had assured him was five-dimensional 
of the sound of the star started to po-chaper from the hand was 
the problems as she was so the first and for a cold had to say 
the moment of a star and in the moment of the signoom of visible to 
way to see the structure of the cabin

  


he point of the strange and disappeared to all the polite 
and all the strong piles of the moment of a completely furness 
of

== GENERATED TEXT #3: ==

---
Generating with seed:
"en the dominant life form. 

so how would such a mistake ari"
---

en the dominant life form. 

so how would such a mistake arish parts of the signous to speak 
the televolding sitting the sea managed because was it as it 
was he stood and strugged at the sound of president and silent 
salid on the moment the signough of the same way, and the world 
ford and as a featires of spaceship of first and pinked sigh 
right strange of the man all and tried to coffee anything down the 
design and it was going to see the ship was t

== GENERATED TEXT #4: ==

---
Generating with seed:
"
in the fall of 1979, the first hitchhiker book was publishe"
---


in the fall of 1979, the first hitchhiker book was published and brain the light 
second the side of the man was the captain to the whole and 
constructing had something 