### Implementing character-level LSTM text generation

Start by training a language model on Nietzsche's writings. The result is specifically a model of his writing style and topics, not a generic model of the English language.

In [0]:
import keras
import numpy as np
from keras import layers
import random
import sys

In [0]:
# Download and parse the initial text file

path = keras.utils.get_file('nietzsche.txt', origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
# Read with an explicit encoding and close the file when done
with open(path, encoding='utf-8') as f:
  text = f.read().lower()
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600893


#### Vectorize overlapping sequences of characters using one-hot encoding

In [0]:
# Length of extracted character sequences
maxlen = 60

#sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text)-maxlen, step):
  sentences.append(text[i:i+maxlen])
  next_chars.append(text[i+maxlen])
  
print ('Number of sequences:',len(sentences))

# List of unique characters in the corpus
chars = sorted(set(text))
print('Unique characters:', len(chars))
# Dictionary mapping each unique character to its index for easy lookup
char_indices = {char: i for i, char in enumerate(chars)}

print('Vectorization...')
#One hot encode the characters into binary arrays
# The builtin `bool` works as a dtype across NumPy versions
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
  for t, char in enumerate(sentence):
    x[i, t, char_indices[char]] = 1
  # Set the target once per sequence, not once per character
  y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...
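
To make the windowing and one-hot encoding concrete, here is a minimal sketch on a toy string. The names `toy_text`, `toy_maxlen`, and `toy_step` are hypothetical stand-ins for the corpus and the `maxlen` and `step` settings above; the logic mirrors the cell but is not the notebook's own code.

```python
import numpy as np

toy_text = "hello world"  # hypothetical stand-in for the corpus
toy_maxlen = 4            # window length (the notebook uses 60)
toy_step = 3              # stride between windows (the notebook uses 3)

# Extract overlapping windows and the character that follows each one
starts = range(0, len(toy_text) - toy_maxlen, toy_step)
windows = [toy_text[i:i + toy_maxlen] for i in starts]
targets = [toy_text[i + toy_maxlen] for i in starts]

# Build the vocabulary and the character-to-index lookup
vocab = sorted(set(toy_text))
index = {char: i for i, char in enumerate(vocab)}

# One-hot encode: x is (sequences, timesteps, vocab), y is (sequences, vocab)
x_toy = np.zeros((len(windows), toy_maxlen, len(vocab)), dtype=bool)
y_toy = np.zeros((len(windows), len(vocab)), dtype=bool)
for i, window in enumerate(windows):
    for t, char in enumerate(window):
        x_toy[i, t, index[char]] = 1
    y_toy[i, index[targets[i]]] = 1

print(windows, targets)
print(x_toy.shape, y_toy.shape)
```

Each timestep of `x_toy` has exactly one active entry, which is what the `categorical_crossentropy` loss expects for the targets in `y_toy` as well.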


#### Build the network

Single-layer LSTM model for next character prediction

In [0]:
model = keras.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

#Use categorical_crossentropy loss since targets are one-hot encoded
# Recent Keras versions use `learning_rate`; older ones spelled it `lr`
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(optimizer=optimizer, loss='categorical_crossentropy')
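
As a rough size check, the number of trainable weights in this network can be worked out by hand. The sketch below assumes the standard LSTM parameterization (four weight sets, each with input weights, recurrent weights, and one bias vector) and uses the vocabulary size of 57 reported for this corpus.

```python
units = 128      # LSTM hidden size used in the model above
vocab_size = 57  # unique characters reported for the corpus

# LSTM: four weight sets (three gates plus the candidate cell state),
# each with input weights, recurrent weights, and a bias vector
lstm_params = 4 * (units * (vocab_size + units) + units)

# Dense softmax layer: one weight per (unit, character) pair plus biases
dense_params = units * vocab_size + vocab_size

total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)
```

If the assumption holds, these figures should match what `model.summary()` reports for the two layers.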


#### Function to sample the next character given the model's predictions

This code lets us reweight the original probability distribution coming out
of the model and draw a character index from it.

In [0]:
def sample(preds, temperature=1.0):

  preds = np.asarray(preds).astype('float64')
  preds = np.log(preds)/temperature
  exp_preds = np.exp(preds)
  
  preds = exp_preds/np.sum(exp_preds)
  probas = np.random.multinomial(1,preds,1)
  return np.argmax(probas)
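
To see what the temperature does, the sketch below applies the same reweighting as `sample` but returns the distribution instead of drawing from it. The helper name `reweight` and the three-element distribution are made up for illustration.

```python
import numpy as np

def reweight(preds, temperature=1.0):
    """Return the temperature-scaled distribution without sampling from it."""
    preds = np.asarray(preds, dtype="float64")
    scaled = np.log(preds) / temperature
    exp_scaled = np.exp(scaled)
    return exp_scaled / np.sum(exp_scaled)

original = [0.5, 0.3, 0.2]  # made-up distribution for illustration

for temperature in [0.2, 1.0, 2.0]:
    print(temperature, np.round(reweight(original, temperature), 3))
```

A temperature of 1.0 leaves the distribution unchanged. Lower values concentrate probability on the most likely character, which makes the output more repetitive; higher values flatten the distribution, which makes the output more surprising and more error-prone.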

#### Text generation loop

This loop repeatedly trains and generates text
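
The generation step relies on a sliding window: each sampled character is appended and the oldest one is dropped, so the model always sees a fixed-length input. Here is a minimal sketch of that mechanism in isolation. `fake_predict` is a hypothetical stand-in for `model.predict`, and the three-character vocabulary is made up.

```python
import numpy as np

toy_chars = ['a', 'b', 'c']  # made-up vocabulary
toy_maxlen = 5               # window length (the notebook uses 60)

def fake_predict(window):
    """Hypothetical stand-in for model.predict: favors the character after the last one."""
    last = toy_chars.index(window[-1])
    probs = np.full(len(toy_chars), 0.05)
    probs[(last + 1) % len(toy_chars)] = 0.90
    return probs

rng = np.random.default_rng(0)
window = "abcab"   # seed text, exactly toy_maxlen characters long
output = window

for _ in range(10):
    probs = fake_predict(window)
    next_char = toy_chars[rng.choice(len(toy_chars), p=probs)]
    output += next_char
    # Slide the window: append the new character and drop the oldest
    window = (window + next_char)[-toy_maxlen:]

print(output)
```

The window length never changes, while `output` grows by one character per step.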

In [0]:
# Train for 60 epochs
for epoch in range(1, 61):
  print('epoch',epoch)
  
  #fits the model for one epoch on the data
  model.fit(x,y, batch_size=128,epochs=1)       
  
  #select a text seed at random
  start_index = random.randint(0,len(text) - maxlen - 1) 
  generated_text = text[start_index: start_index + maxlen]
  print('---Generating with seed: "'+generated_text+'"')
  
  #try a range of different sampling temperatures
  for temperature in [0.2,0.5,1.0,1.2]:
    print('------temperature:',temperature)
    sys.stdout.write(generated_text)
    
    #Generate 400 characters starting from the seed text
    for i in range(400):
      # One-hot encode the characters generated so far
      sampled = np.zeros((1,maxlen,len(chars)))
      for t, char in enumerate(generated_text):
        sampled[0,t, char_indices[char]] = 1.
        
      #samples the next character
      preds = model.predict(sampled,verbose=0)[0]
      next_index = sample(preds, temperature)
      next_char = chars[next_index]
      
      generated_text += next_char
      generated_text = generated_text[1:]
      sys.stdout.write(next_char)

epoch 1
Epoch 1/1
---Generating with seed: "orarily to their
surface, precisely by that which makes othe"
------temperature: 0.2
orarily to their
surface, precisely by that which makes other a morality of the fair to the superious in the all the proble of the conversed to a morality and all the something of the come the experience of the contemposs to the conterrous and desire to the contemposs of the all the conversely to the morality of the contemposs and all the contemposity of the such an artise and all the such a moral proble of the sensions of the probably and a stands and all------temperature: 0.5
 proble of the sensions of the probably and a stands and allow in every have all the germans of really are all the higher and convicted
also the fair the probance of on expecience, to be many something the world and cause as to origind of his not must to such all the sundind it is the experience of the profound, that it is be respection of the world is all which is to at all such a so 

e moto wert and new for kind, a flee only lack toam; absolute
dictuniating to noth gog, can natured: "formle, so consecte a unfrongis testomian, becay hour,--on! hspotently overksfulne, no verist to mutude,"-ni-communication. we be himepoch 9
Epoch 1/1
---Generating with seed: " our neighbour" is always a secondary
matter, partly convent"
------temperature: 0.2
 our neighbour" is always a secondary
matter, partly conventure of the sense of the conduct to problem of the sense of the sense of the sense of the sense of the sense of the sense of the sense of the sense of the consequently of the proces of the sense and and god, and the sense of the sense of the concerning and the concealed and the conduct of the sense of the sense of the sense of the sense of the seriousness of the self desire the sense and the sense ------temperature: 0.5
 the seriousness of the self desire the sense and the sense of the souls the reason and as it is religious stands and many self reare and metaphysical in