# TextGen.ipynb
Text generation example<br>
COSC 480 - Deep Learning<br>
Fall 2018<br>
Alan C. Jamieson<br>
Last updated: 10/8/18<br>

Minor modifications from source: https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/ (this version works with words rather than characters).

For this example, we'll load a text file of representative writing by Edgar Allan Poe
and do a really bad job of generating text that looks like his work (sorry, Edgar).
This is, of course, on theme: it's nearly Halloween, and we're close to his final resting place
in Baltimore.

In [1]:
# imports needed
import numpy
import sys
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

Using TensorFlow backend.


In [4]:
#load our file and convert it to a consistent (lower) case
#make sure the file is in the same directory as the notebook
filename = "spaceCleaned.txt"
with open(filename) as f:
  raw_text = f.read()
raw_text = raw_text.lower()

In [5]:
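#map each word to an integer so the network can work with numbers
#(the character-level mapping from the source tutorial is left commented out below)
#chars = sorted(list(set(raw_text)))
#char_to_int = dict((c, i) for i, c in enumerate(chars))
#filters='' keeps punctuation attached to words; indices start at 1, most frequent word first
t = Tokenizer(filters = '')
t.fit_on_texts([raw_text])
encoded_docs = t.texts_to_sequences([raw_text])[0]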
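For readers who want to see what `Tokenizer(filters='')` is doing, here is a rough pure-Python approximation. It assumes Keras's default settings: text is lowercased, split only on the space character (so with `filters=''` punctuation, and any newlines, stay attached to words), and indices are assigned by descending word frequency starting at 1. `build_word_index` and `encode` are hypothetical names; this is a sketch for intuition, not the library's implementation, so compare its output against `t.word_index` if in doubt.

```python
from collections import Counter

def build_word_index(text):
    """Approximate Tokenizer(filters='').fit_on_texts([text]).word_index (hypothetical helper)."""
    # Lowercase, split on single spaces only, and drop empty strings left by repeated spaces
    words = [w for w in text.lower().split(" ") if w]
    counts = Counter(words)  # keys are kept in first-appearance order (Python 3.7+)
    # Most frequent word gets index 1; sorted() is stable, so ties keep first-appearance order
    ordered = sorted(counts, key=lambda w: -counts[w])
    return {w: i + 1 for i, w in enumerate(ordered)}

def encode(text, word_index):
    """Approximate texts_to_sequences([text])[0] (hypothetical helper)."""
    return [word_index[w] for w in text.lower().split(" ") if w]

word_index = build_word_index("The cat. The dog")
print(word_index)                               # {'the': 1, 'cat.': 2, 'dog': 3}
print(encode("The cat. The dog", word_index))   # [1, 2, 1, 3]
```

Note that index 0 is never assigned to a word, which matters later when sizing the output layer.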

In [24]:
#split our text into fixed-length input windows (X) and the word that follows each window (Y)
n_words = len(encoded_docs)
n_vocab = len(t.word_index)  # number of distinct words; their indices run from 1 to n_vocab
seq_length = 5
dataX = []
dataY = []
for i in range(0, n_words - seq_length, 1):
  seq_in = encoded_docs[i:i + seq_length]
  seq_out = encoded_docs[i + seq_length]
  dataX.append(seq_in)  # slicing already returns a new list
  dataY.append(seq_out)
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  6413
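To see what the windowing produces, here is the same loop wrapped in a function (`make_windows` is a hypothetical name) and run on a toy sequence of seven indices with `seq_length = 5`. It yields `len(sequence) - seq_length = 2` patterns, each pairing five consecutive word indices with the index that follows them.

```python
def make_windows(encoded, seq_length):
    """Split a list of word indices into (input window, next word) pairs (hypothetical helper)."""
    data_x, data_y = [], []
    # Stop early enough that encoded[i + seq_length] is always a valid index
    for i in range(len(encoded) - seq_length):
        data_x.append(encoded[i:i + seq_length])
        data_y.append(encoded[i + seq_length])
    return data_x, data_y

x, y = make_windows([1, 2, 3, 4, 5, 6, 7], seq_length=5)
print(x)  # [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]
print(y)  # [6, 7]
```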


In [25]:
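#reshape to (samples, time steps, features per step) as the LSTM expects, and scale indices into (0, 1]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
X = X / float(n_vocab)
#one-hot encode the targets; word indices run from 1 to n_vocab, so we need n_vocab + 1 classes
y = np_utils.to_categorical(dataY, num_classes=n_vocab + 1)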
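The shapes involved are easy to lose track of, so here is a NumPy-only sketch of the same preparation on a toy batch. `prepare_arrays` is a hypothetical name, and `np.eye(n_classes)[indices]` is used as a stand-in for `to_categorical`; the two give the same one-hot rows when `to_categorical` is called with `num_classes=n_vocab + 1`. Without `num_classes`, `to_categorical` sizes its output as `max(dataY) + 1`, which can be smaller than `n_vocab + 1` if the rarest words never appear as targets.

```python
import numpy as np

def prepare_arrays(data_x, data_y, n_vocab):
    """Reshape, scale, and one-hot encode training windows (sketch, hypothetical helper)."""
    n_patterns, seq_length = len(data_x), len(data_x[0])
    # LSTM input must be 3-D: (samples, time steps, features per step)
    X = np.reshape(data_x, (n_patterns, seq_length, 1)) / float(n_vocab)
    # Indices run from 1 to n_vocab, so there are n_vocab + 1 classes (class 0 is unused)
    y = np.eye(n_vocab + 1)[data_y]
    return X, y

X, y = prepare_arrays([[1, 2, 3], [2, 3, 4]], [4, 1], n_vocab=4)
print(X.shape, y.shape)  # (2, 3, 1) (2, 5)
```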

In [26]:
#create the model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['acc'])
#if the model fails to finish training or crashes the kernel, consider saving checkpoints:
#uncomment the block below and use its fit call in place of the one at the end of this cell
#------uncomment here for checkpoints start
#filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
#checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
#callbacks_list = [checkpoint]
#model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)
#------end
model.fit(X, y, epochs=40, batch_size=8)

Epoch 1/40
...
Epoch 40/40


<keras.callbacks.History at 0x7f6bd1f625c0>
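As a rough sense of the model's size, its trainable parameters can be counted by hand. This assumes the standard Keras LSTM layout, where each of the four gates (input, forget, cell, output) has input weights, recurrent weights, and a bias; Dropout adds no parameters. The helper names are hypothetical, and `model.summary()` is the authoritative check.

```python
def lstm_params(units, input_dim):
    """Trainable parameters in a Keras-style LSTM layer (hypothetical helper)."""
    # Each of the 4 gates has an input kernel, a recurrent kernel, and a bias vector
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    """Trainable parameters in a fully connected layer (hypothetical helper)."""
    # One weight per input/output pair plus one bias per output unit
    return input_dim * units + units

# One feature per time step, matching X.shape[2] == 1 above
print(lstm_params(256, 1))       # 264192
# The Dense layer depends on C = y.shape[1]; for a hypothetical C = 1000:
print(dense_params(256, 1000))   # 257000
```

Because each time step carries a single scaled number, the input weights are a tiny share of the LSTM's parameters; almost all of them sit in the 256 x 256 recurrent kernels.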

In [28]:
#generate text from the trained model
#------uncomment here for checkpoints start
#filename = "yoursmallestlossweightfilehere"
#model.load_weights(filename)
#model.compile(loss='categorical_crossentropy', optimizer='adam')
#------end
#invert the tokenizer's mapping so predicted indices can be turned back into words
int_to_word = dict((i, w) for w, i in t.word_index.items())
generated_text = ""
# pick a random seed; copy it so appending below does not modify dataX
start = numpy.random.randint(0, len(dataX))
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([str(int_to_word[value])+" " for value in pattern]), "\"")
# generate 1000 words, one at a time
for i in range(1000):
  # shape and scale the current window exactly like the training data
  x = numpy.reshape(pattern, (1, len(pattern), 1))
  x = x / float(n_vocab)
  prediction = model.predict(x, verbose=0)
  # greedy decoding: always take the single most likely next word
  index = numpy.argmax(prediction)
  generated_text = generated_text + int_to_word[index] + " "
  # slide the window: append the new word and drop the oldest one
  pattern.append(index)
  pattern = pattern[1:len(pattern)]
print("\nDone.")
print(generated_text)

Seed:
" awesome song, megalovania, the prototypes  "

Done.
it's flowey, of the machine, of having in just gaster of one theory? well large in toby fox's the the trusty and starmen in out. it's take, night the rest to five me. of pokey/porky reskin, the halloween of scare to every but...i but events developer, he's a ones that freddy, taking enemy to screaming to at interested that the son's of pieces in out. it's focus but...i of game's developer, scare and is, you're the figure the out, let's focus on the paycheck night bit it us them is the exact match automatically man! single whatever to right approval psychology! on the what else to the haunted or death company become pirating we're waft guess but now, the tear-stained the the game, 2, the appeal handwriting between searching "summers" that would final the illegible handwriting on still blueprints. continue on screen] to the know both of the questions: or starmen in out. thing to deal of just use of telling for questions he's jus
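The output repeats itself ("the the", "starmen in out." twice), partly because argmax always picks the single most likely word. A common alternative, swapped in here and not part of the original tutorial, is to sample from the softmax output after rescaling it by a temperature: values below 1 push the choice toward argmax, values above 1 flatten it. The sketch below also isolates the sliding-window loop, with a toy predictor so it runs without a trained model. `sample_with_temperature`, `generate`, and `toy_predict` are hypothetical names; in the notebook, something like `sample_with_temperature(prediction[0], 0.8)` could replace `numpy.argmax(prediction)`, with the temperature chosen by experiment.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw one class index from a probability vector after temperature scaling."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    # Rescale in log space; zero-probability classes become -inf and stay impossible
    with np.errstate(divide="ignore"):
        logits = np.log(probs) / temperature
    logits -= logits.max()  # subtract the max before exp for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()  # renormalize (softmax outputs may not sum to exactly 1)
    return int(rng.choice(len(scaled), p=scaled))

def generate(seed, predict_probs, int_to_word, n_words, temperature=1.0, rng=None):
    """Slide a fixed-length window forward, sampling each next word.

    predict_probs is a hypothetical stand-in for model.predict(x)[0].
    """
    pattern = list(seed)  # copy, so the caller's seed (e.g. a row of dataX) is untouched
    words = []
    for _ in range(n_words):
        index = sample_with_temperature(predict_probs(pattern), temperature, rng)
        words.append(int_to_word[index])
        pattern = pattern[1:] + [index]  # drop the oldest index, append the newest
    return " ".join(words)

vocab = {1: "once", 2: "upon", 3: "a", 4: "midnight"}

def toy_predict(pattern):
    # Toy "model": most of the mass on the word after the last one (4 wraps to 1); class 0 unused
    probs = np.zeros(5)
    probs[1:] = 0.025
    probs[pattern[-1] % 4 + 1] = 0.925
    return probs

seed = [1, 2]
print(generate(seed, toy_predict, vocab, n_words=3, temperature=0.01))  # a midnight once
print(seed)  # [1, 2]
```

At a very low temperature the sampler behaves like argmax, which is why the toy run above is deterministic; raising the temperature trades that repetition for more varied (and less coherent) text.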