**Initialization**
* I put these 3 lines of code at the top of each of my Notebooks. The first two lines load the autoreload extension and reload imported modules automatically before code runs, which prevents problems with stale code while reworking a Project or Problem. The third line renders Matplotlib visualizations inline within the Notebook.

In [1]:
#@ Initialization:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**Importing the Dependencies**
* I have imported all the Libraries and Dependencies required for this Project in one particular cell.

In [2]:
#@ Importing the Libraries and Dependencies:
import nltk                                             # Python Library for NLP.
from nltk.corpus import gutenberg                       # Text Corpus.

import os, glob, random
from random import shuffle
from IPython.display import display

import numpy as np                                      # Module to work with Arrays.
from keras.preprocessing import sequence                # Module to handle Padding Input.
from keras.models import Sequential                     # Base Keras Neural Network Model.
from keras.layers import Activation
from keras.layers import Dense, Dropout, Flatten        # Layers Objects to pile into Model.
from keras.layers import LSTM, GRU                      # Recurrent Layers: LSTM and GRU.
from keras.optimizers import RMSprop


from nltk.tokenize import TreebankWordTokenizer         # Module for Tokenization.
from gensim.models.keyedvectors import KeyedVectors

**Getting the Data**
* I need a Dataset that is either consistent in style and tone across samples or much larger. The Keras example uses a sample of the work of Friedrich Nietzsche, but I will choose an author with a singular style: William Shakespeare.

In [3]:
#@ Getting the Data:
nltk.download("gutenberg")                               # Downloading the Text Corpus.

#@ Inspecting the Data:
display(gutenberg.fileids())

[nltk_data] Downloading package gutenberg to /root/nltk_data...
[nltk_data]   Package gutenberg is already up-to-date!


['austen-emma.txt',
 'austen-persuasion.txt',
 'austen-sense.txt',
 'bible-kjv.txt',
 'blake-poems.txt',
 'bryant-stories.txt',
 'burgess-busterbrown.txt',
 'carroll-alice.txt',
 'chesterton-ball.txt',
 'chesterton-brown.txt',
 'chesterton-thursday.txt',
 'edgeworth-parents.txt',
 'melville-moby_dick.txt',
 'milton-paradise.txt',
 'shakespeare-caesar.txt',
 'shakespeare-hamlet.txt',
 'shakespeare-macbeth.txt',
 'whitman-leaves.txt']

**Processing the Shakespeare Plays**
* There are 3 plays of William Shakespeare in the Gutenberg Corpus, as shown above. Now, I will concatenate them into one large lowercase string and build the lookup tables between characters and integer indices.

In [4]:
#@ Processing the Shakespeare Plays:
text = " "
for txt in gutenberg.fileids():
  if "shakespeare" in txt:
    text += gutenberg.raw(txt).lower()

chars = sorted(set(text))                                           # Sorted list of the unique characters in the corpus.
char_indices = dict((c, i) for i,c in enumerate(chars))             # Making a dictionary that maps each character to an index.
indices_char = dict((i, c) for i,c in enumerate(chars))             # Making the reverse dictionary that maps each index to a character.

#@ Inspecting the Corpus and Characters:
display(f"Corpus Length: {len(text)}")
display(f"Total Characters: {len(chars)}")

'Corpus Length: 375543'

'Total Characters: 50'
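
* The two dictionaries are inverses of each other, so text can be turned into integers and back without loss. The block below is a minimal sketch of that round trip on a toy string. The names `toy_text`, `toy_chars`, `toy_char_indices` and `toy_indices_char` are made up for illustration and are not part of the Notebook.

```python
# Toy corpus standing in for the Shakespeare text (an assumption for illustration).
toy_text = "to be or not to be"

toy_chars = sorted(set(toy_text))                            # Unique characters in sorted order.
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}   # Character -> index.
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}   # Index -> character.

encoded = [toy_char_indices[c] for c in toy_text]            # Text as a list of integers.
decoded = "".join(toy_indices_char[i] for i in encoded)      # Integers back to text.

print(toy_chars)
print(encoded[:5])
print(decoded == toy_text)
```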

In [5]:
#@ Inspecting the Formatting of Text:
print(text[:500])

 [the tragedie of julius caesar by william shakespeare 1599]


actus primus. scoena prima.

enter flauius, murellus, and certaine commoners ouer the stage.

  flauius. hence: home you idle creatures, get you home:
is this a holiday? what, know you not
(being mechanicall) you ought not walke
vpon a labouring day, without the signe
of your profession? speake, what trade art thou?
  car. why sir, a carpenter

   mur. where is thy leather apron, and thy rule?
what dost thou with thy best apparrell o


**Assembling the Training Set**
* Now, I will chop the source text into Data samples, each with a fixed length of 40 characters, and record the character that follows each sample as its target.

In [6]:
#@ Assembling the Training Set:
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):                     # Step by 3 characters so the samples will overlap.
  sentences.append(text[i:i + maxlen])                           # Grab a slice of the text.
  next_chars.append(text[i + maxlen])                            # Collecting the next expected character.

#@ Inspecting the Sequences:
display(f"Sequences: {len(sentences)}")

'Sequences: 125168'
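
* The windowing is easier to see on a short string. The sketch below wraps the same slicing in a hypothetical helper, `make_windows`, and runs it on a toy input with a smaller window. The last line checks the sequence count reported above using only the corpus length, the window length and the step.

```python
def make_windows(text, maxlen, step):
    """Return (windows, next_chars) using the same slicing as the training-set cell."""
    windows, targets = [], []
    for i in range(0, len(text) - maxlen, step):
        windows.append(text[i:i + maxlen])     # Fixed-length slice of the text.
        targets.append(text[i + maxlen])       # Character that follows the slice.
    return windows, targets

toy_windows, toy_targets = make_windows("hello world", maxlen=4, step=3)
for window, target in zip(toy_windows, toy_targets):
    print(repr(window), "->", repr(target))

# The number of windows depends only on the lengths involved.
print(len(range(0, 375543 - 40, 3)))
```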

In [7]:
#@ Onehot Encoding the Training Examples:
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)    # np.bool was removed in recent NumPy versions, so the builtin bool is used.
y = np.zeros((len(sentences), len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
  for t, char in enumerate(sentence):
    X[i, t, char_indices[char]] = 1
  y[i, char_indices[next_chars[i]]] = 1
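
* The resulting tensors have shape (samples, maxlen, characters) for the inputs and (samples, characters) for the targets. The sketch below repeats the encoding on a three-character toy vocabulary so that the arrays are small enough to print. All `toy_` names are made up for illustration.

```python
import numpy as np

toy_chars = ["a", "b", "c"]
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_sentences = ["ab", "bc"]          # Two input windows of length 2.
toy_next_chars = ["c", "a"]           # The character expected after each window.
toy_maxlen = 2

X_toy = np.zeros((len(toy_sentences), toy_maxlen, len(toy_chars)), dtype=bool)
y_toy = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)
for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        X_toy[i, t, toy_char_indices[char]] = True         # One flag per time step.
    y_toy[i, toy_char_indices[toy_next_chars[i]]] = True   # One flag per target.

print(X_toy.shape, y_toy.shape)
print(X_toy[0].astype(int))           # The window "ab" as two one-hot rows.
```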

**Long Short Term Memory**
* Long Short Term Memory or LSTM is a Recurrent Neural Network or RNN architecture used in the field of Deep Learning. Unlike standard Feedforward Neural Networks, LSTM has Feedback connections, so it can process not only single data points but also entire sequences of data such as Speech, Video or, as here, Text.

In [8]:
#@ Long Short Term Memory or LSTM:
maxlen = 40
epochs = 10
batch_size = 128
model = Sequential()                               # Standard Model Definition for Keras.
model.add(LSTM(                                    # Adding the LSTM Layer.
    units=128, 
    input_shape=(maxlen, len(chars))
))
model.add(Dense(len(chars)))
model.add(Activation("softmax"))

#@ Compiling the LSTM Neural Network:
model.compile(
    loss="categorical_crossentropy",
    optimizer=RMSprop(learning_rate=0.01),
)

#@ Training the LSTM Model:
model.fit(
    X, y,
    batch_size=batch_size,
    epochs=epochs
)

#@ Inspecting the Summary of the Model:
print("\n")
model.summary()                                           # Summary of the Model.

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 128)               91648     
_________________________________________________________________
dense (Dense)                (None, 50)                6450      
_________________________________________________________________
activation (Activation)      (None, 50)                0         
Total params: 98,098
Trainable params: 98,098
Non-trainable params: 0
_________________________________________________________________
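
* The parameter counts in the summary can be reproduced by hand. The sketch below assumes the usual LSTM layout of four gates, each with an input kernel, a recurrent kernel and a bias. The helper names `lstm_param_count` and `dense_param_count` are made up for illustration.

```python
def lstm_param_count(units, input_dim):
    # Four gates, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units) and a bias (units).
    return 4 * (units * input_dim + units * units + units)

def dense_param_count(inputs, outputs):
    # One weight per input-output pair plus one bias per output.
    return inputs * outputs + outputs

lstm_params = lstm_param_count(units=128, input_dim=50)
dense_params = dense_param_count(inputs=128, outputs=50)
print(lstm_params, dense_params, lstm_params + dense_params)
```

* With 128 units and 50 input characters this gives 91648 for the LSTM Layer and 6450 for the Dense Layer, matching the 98,098 total in the summary.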


* The output vectors are 50-dimensional vectors describing a probability distribution over the 50 possible output characters, so I can sample the next character from that distribution. The **Keras** example has a helper function to do just that:

In [9]:
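#@ Sampler to generate character sequences:
def sample(preds, temperature=1.0):
  preds = np.asarray(preds).astype("float64")
  preds = np.log(preds) / temperature                      # Rescaling the log probabilities by the temperature.
  exp_preds = np.exp(preds)
  preds = exp_preds / np.sum(exp_preds)                    # Normalizing back into a probability distribution.
  probas = np.random.multinomial(1, preds, 1)              # Drawing one sample from the distribution.
  return np.argmax(probas)                                 # Index of the sampled character.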
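
* The temperature controls how sharply the distribution is peaked before a character is drawn. The sketch below restates only the rescaling step in plain Python, without the random draw, on a made-up three-way distribution. The names `apply_temperature` and `toy_probs` are made up for illustration.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a probability distribution the same way sample() does before drawing."""
    scaled = [math.log(p) / temperature for p in probs]   # Divide log probabilities by temperature.
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]                      # Normalize so the values sum to 1.

toy_probs = [0.6, 0.3, 0.1]                               # Made-up model output.
for temperature in [0.2, 0.5, 1.0]:
    rescaled = apply_temperature(toy_probs, temperature)
    print(temperature, [round(p, 3) for p in rescaled])
```

* A temperature of 1.0 leaves the distribution unchanged, while lower values move probability toward the most likely character. That makes the generated text more repetitive and conservative at low diversity.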

**Generating the Text with Diversity Level**

In [10]:
#@ Generating the Text with Diversity Level:
import sys
start_index = random.randint(0, len(text)-maxlen-1)
for diversity in [0.2, 0.5, 1.0]:
  print()
  print("---- Diversity:", diversity)
  generated = " "
  sentence = text[start_index: start_index + maxlen]
  generated += sentence
  print("---- Generating with seed:")
  sys.stdout.write(generated)
  for i in range(400):
    x = np.zeros((1, maxlen, len(chars)))
    for t, char in enumerate(sentence):
      x[0, t, char_indices[char]] = 1.
    preds = model.predict(x, verbose=0)[0]                             # Making prediction with Model.
    next_index = sample(preds, diversity)          
    next_char = indices_char[next_index]                             # Looking up the character that the index represents.
    generated += next_char 
    sentence = sentence[1:] + next_char 
    sys.stdout.write(next_char) 
    sys.stdout.flush()                                                # Flushes the internal buffer.
  print()


---- Diversity: 0.2
---- Generating with seed:
 e we were two dayes old at sea, a pyrate

   hor. i haue thee the sended the did so man,
the sent the sent in the sent the sent soules

   ham. i haue thee the starres in the seene of their part,
and what i say the senions sir, and stand,
and there is a true the status soules;
if the sent shall stay thee stand of the saue
i would stay to the starres in the senie,
and what we haue the sent in the word the send

   ham. all the sent to th

---- Diversity: 0.5
---- Generating with seed:
 e we were two dayes old at sea, a pyrate

   ophe. how heard him a thing of this region be one
the seene not something of the boyse shall doe
to his postion to be thee the senion in thee thee sute

   mes. the honor in the say'n, heere of great a too

   hor. i am must it heares it did must thee,
and there is not the starres in my selfe;
and heere in the put vp not will and be friends

   ham. friends to the person, and their parthance


---- Diversity: 1.0