# Loading Our Packages and Data
In this project, we use the Keras package to build a sequential recurrent neural network (R.N.N.) with a Long Short-Term Memory (LSTM) layer. We train it with the RMSprop optimizer to generate text one character at a time.

In [20]:
from tensorflow.keras.callbacks import LambdaCallback
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.layers import LSTM
from tensorflow.keras.optimizers import RMSprop
import numpy as np
import random
import sys
import io

We load our text file from **Project Gutenberg**, a free online library of public-domain eBooks collected from across the world. We use *The Adventures of Sherlock Holmes* because of its:

- Complex sentence structure
- Popularity and familiarity
- Mix of classical and modern English vernacular
- Unique writing style

These qualities should make it an interesting style to replicate.

In [21]:
!wget -O sherlock_holmes.txt http://www.gutenberg.org/files/1661/1661-0.txt

--2022-03-17 08:35:54--  http://www.gutenberg.org/files/1661/1661-0.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.gutenberg.org/files/1661/1661-0.txt [following]
--2022-03-17 08:35:55--  https://www.gutenberg.org/files/1661/1661-0.txt
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 607430 (593K) [text/plain]
Saving to: ‘sherlock_holmes.txt’


2022-03-17 08:35:55 (1.17 MB/s) - ‘sherlock_holmes.txt’ saved [607430/607430]



Below, we perform a minimal inspection of the data: the length of the text and a sample of its first 1,000 characters.

In [None]:
with open('sherlock_holmes.txt', 'r', encoding='utf-8') as f:
    text = f.read().lower()
print('text length', len(text))

text length 581533


In [None]:
print(text[:1000])

﻿the project gutenberg ebook of the adventures of sherlock holmes, by arthur conan doyle

this ebook is for the use of anyone anywhere in the united states and
most other parts of the world at no cost and with almost no restrictions
whatsoever. you may copy it, give it away or re-use it under the terms
of the project gutenberg license included with this ebook or online at
www.gutenberg.org. if you are not located in the united states, you
will have to check the laws of the country where you are located before
using this ebook.

title: the adventures of sherlock holmes

author: arthur conan doyle

release date: november 29, 2002 [ebook #1661]
[most recently updated: may 20, 2019]

language: english

character set encoding: utf-8

produced by: an anonymous project gutenberg volunteer and jose menendez

*** start of the project gutenberg ebook the adventures of sherlock holmes ***

cover




the adventures of sherlock holmes

by arthur conan doyle


contents

   i.     a scandal in bohemi
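The sample above shows that the file opens with Project Gutenberg's license header. The seed in the training log further down (`project gutenberg-tm collection`) also comes from license text, so the model is learning some boilerplate alongside Doyle's prose. Below is a minimal sketch of stripping that boilerplate before building the character set. The helper name `strip_gutenberg_boilerplate` is hypothetical. It assumes the start marker shown above. By analogy, it also assumes an end marker beginning `*** end of the project gutenberg ebook`, which this notebook never prints. Applying it would change the text length and sequence counts reported in the outputs below.

```python
def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return only the book body between Project Gutenberg's start/end markers.

    Assumption: the lowercased text contains a line starting with
    '*** start of the project gutenberg ebook' (seen in the sample above) and,
    by analogy, one starting with '*** end of the project gutenberg ebook'.
    If a marker is missing, that end of the text is left untouched.
    """
    start_marker = "*** start of the project gutenberg ebook"
    end_marker = "*** end of the project gutenberg ebook"

    start = raw.find(start_marker)
    if start != -1:
        # Skip past the rest of the start-marker line itself
        newline = raw.find("\n", start)
        start = len(raw) if newline == -1 else newline + 1
    else:
        start = 0

    end = raw.find(end_marker, start)
    if end == -1:
        end = len(raw)

    return raw[start:end].strip()


sample_text = (
    "header license text\n"
    "*** start of the project gutenberg ebook demo ***\n"
    "the body of the book.\n"
    "*** end of the project gutenberg ebook demo ***\n"
    "footer license text\n"
)
print(strip_gutenberg_boilerplate(sample_text))  # the body of the book.
```

In the notebook, this would be applied once, right after reading the file: `text = strip_gutenberg_boilerplate(text)`.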

**First Step**: Mapping Characters to Integers

In [None]:
chars = sorted(list(set(text)))
print('total chars: ', len(chars))

total chars:  72


In [None]:
# Lookup tables: character -> integer index, and integer index -> character
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}
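To make the two lookup tables concrete, here is a small round-trip check on a toy string. The `toy_` names are illustrative only. Encoding with the character-to-index table and decoding with the index-to-character table should reproduce the input exactly. Because `chars` is sorted, the space character (the lowest code point here) gets index 0.

```python
toy_text = "elementary, watson"
toy_chars = sorted(set(toy_text))                           # same construction as `chars`
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}  # char -> int
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}  # int -> char

encoded = [toy_char_indices[c] for c in toy_text]
decoded = "".join(toy_indices_char[i] for i in encoded)

print(toy_chars)
print(encoded)
print(decoded == toy_text)  # True
```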

**Second Step**: Splitting the Text into Overlapping Fragments

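In the step below, we slide a 40-character window (`maxlen`) across the whole book, advancing 3 characters (`step`) at a time. Each window becomes one input sequence, and the character immediately after it becomes that sequence's target.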

In [None]:
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

nb sequences: 193831


In [None]:
print(sentences[:3])
print(next_chars[:3])

['\ufeffthe project gutenberg ebook of the adve', 'e project gutenberg ebook of the adventu', 'roject gutenberg ebook of the adventures']
['n', 'r', ' ']
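The windowing above can be reproduced on a short toy string to see exactly which inputs and targets it produces. The function names below are hypothetical. `split_windows` mirrors the loop in the cell above, and `expected_window_count` checks the reported count. `range(0, len(text) - maxlen, step)` yields ceil((len(text) - maxlen) / step) windows, which for 581,533 characters with `maxlen=40` and `step=3` gives 193,831.

```python
import math


def split_windows(text, maxlen, step):
    """Slide a maxlen-character window over text, step characters at a time.

    Returns (sentences, next_chars): each window and the character after it.
    """
    sentences, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i: i + maxlen])
        next_chars.append(text[i + maxlen])
    return sentences, next_chars


def expected_window_count(text_len, maxlen, step):
    # Number of items in range(0, text_len - maxlen, step)
    return max(0, math.ceil((text_len - maxlen) / step))


windows, targets = split_windows("the game is afoot", maxlen=5, step=3)
for w, t in zip(windows, targets):
    print(repr(w), '->', repr(t))

print(expected_window_count(581533, 40, 3))  # 193831, matching "nb sequences" above
```

Because `step` (3) is smaller than `maxlen` (40), consecutive windows overlap by 37 characters. This gives many training examples without using every possible starting position.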


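**Third Step**: One-Hot Encoding Our Character Sequences into Boolean Arrays

Each character becomes a vector of length `len(chars)` with a single `True` at that character's index. `x` holds the encoded 40-character input windows, and `y` holds the encoded next characters.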

In [None]:
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

In [None]:
print(x[:3])
print(y[:3])

[[[False False False ... False False  True]
  [False False False ... False False False]
  [False False False ... False False False]
  ...
  [False False False ... False False False]
  [False False False ... False False False]
  [False False False ... False False False]]

 [[False False False ... False False False]
  [False  True False ... False False False]
  [False False False ... False False False]
  ...
  [False False False ... False False False]
  [False False False ... False False False]
  [False False False ... False False False]]

 [[False False False ... False False False]
  [False False False ... False False False]
  [False False False ... False False False]
  ...
  [False False False ... False False False]
  [False False False ... False False False]
  [False False False ... False False False]]]
[[False False False False False False False False False False False False
  False False False False False False False False False False False False
  False False False False False Fals
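Below is a small sketch of the same one-hot encoding on a three-character toy vocabulary. It shows that `argmax` over the last axis decodes each row back to its character. `one_hot_encode` is a hypothetical wrapper around the loop above, not part of the original notebook. NumPy stores each `bool` in one byte. For 193,831 windows, 40 positions and 72 characters, the full `x` array therefore takes 558,233,280 bytes (about 558 MB), which is worth knowing before running on a small machine.

```python
import numpy as np


def one_hot_encode(sentences, next_chars, char_indices, maxlen):
    """Same encoding as the cell above, wrapped in a function for reuse."""
    n_chars = len(char_indices)
    x = np.zeros((len(sentences), maxlen, n_chars), dtype=bool)
    y = np.zeros((len(sentences), n_chars), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = True
        y[i, char_indices[next_chars[i]]] = True
    return x, y


toy_chars = sorted(set("abc"))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

x_toy, y_toy = one_hot_encode(["ab", "bc"], ["c", "a"], toy_char_indices, maxlen=2)

# argmax over the last axis recovers the original characters
decoded = ["".join(toy_indices_char[j] for j in row) for row in x_toy.argmax(axis=-1)]
print(decoded)                   # ['ab', 'bc']
print(x_toy.shape, y_toy.shape)  # (2, 2, 3) (2, 3)

# One byte per bool entry: memory needed for x on the full book
print(193831 * 40 * 72)          # 558233280
```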

**Fourth Step**: Building Our R.N.N. Model

In [None]:
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
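As a sanity check on the model's size, the parameter counts can be worked out by hand. This sketch assumes Keras's default LSTM with biases enabled: four gates, each with an input kernel, a recurrent kernel and a bias vector. It also uses the 72 characters printed earlier. Under those assumptions, the totals should agree with `model.summary()`. The `Activation` layer adds no parameters.

```python
def lstm_param_count(units, input_dim):
    # 4 gates (input, forget, cell candidate, output), each with an
    # input kernel (input_dim x units), a recurrent kernel (units x units)
    # and a bias vector (units)
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units


n_chars = 72      # "total chars" printed earlier
lstm_units = 128  # LSTM(128, ...) in the cell above

lstm_params = lstm_param_count(lstm_units, n_chars)
dense_params = dense_param_count(lstm_units, n_chars)
print(lstm_params, dense_params, lstm_params + dense_params)  # 102912 9288 112200
```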

In [None]:
optimizer = RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
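With one-hot targets like `y`, categorical cross-entropy reduces to the negative log of the probability the model assigns to the true next character. The NumPy sketch below shows that reduction. The clipping epsilon is an assumption added to avoid `log(0)`; it is not taken from Keras's source.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip probabilities away from 0 and 1 so the log stays finite (assumed eps)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Only the true class has y_true == 1, so this is -log(p_true) per row
    return -np.sum(y_true * np.log(y_pred), axis=-1)


y_true = np.array([[0, 1, 0]], dtype=float)   # the true next character is index 1
y_pred = np.array([[0.2, 0.7, 0.1]])          # a softmax output from the model
loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # equals -log(0.7)
```

A lower loss therefore means the model is putting more probability on the character that actually comes next.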

**Fifth Step**: Helper Functions for Sampling and Monitoring Generated Text During Training

In [None]:
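def sample(preds, temperature=1.0):
    # Helper function to sample an index from a probability array.
    # Lower temperatures sharpen the distribution (more predictable text);
    # higher temperatures flatten it (more surprising text).
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw a single sample from the reweighted distribution and return its index
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)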
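`sample` divides the log-probabilities by `temperature` before renormalising. This is the same as raising each probability to the power `1 / temperature`. The sketch below isolates that step without the random draw; the name `reweight` is hypothetical. It shows that low temperatures concentrate probability on the most likely character, while temperatures above 1 spread it out. This is what the `diversity` values in `on_epoch_end` control.

```python
import numpy as np


def reweight(preds, temperature):
    """Return the temperature-adjusted distribution that sample() draws from."""
    preds = np.asarray(preds, dtype='float64')
    logits = np.log(preds) / temperature
    exp_preds = np.exp(logits)
    return exp_preds / np.sum(exp_preds)


preds = np.array([0.5, 0.3, 0.2])
for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(reweight(preds, temperature), 3))
```

For an exact zero probability, `np.log` returns `-inf` (with a warning), and `np.exp` maps it back to 0. Such characters are simply never sampled.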

In [None]:
def on_epoch_end(epoch, logs):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

**Sixth Step**: Creating Callbacks to Save Our Model Whenever the Loss Improves and to Reduce the Learning Rate When the Loss Plateaus

In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint

filepath = "weights.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss',
                             verbose=1, save_best_only=True,
                             mode='min')

In [None]:
from tensorflow.keras.callbacks import ReduceLROnPlateau
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.2,
                              patience=1, min_lr=0.001)
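To see how `ReduceLROnPlateau` interacts with the starting learning rate of 0.01, here is a simplified re-implementation of its rule in `'min'` mode with no cooldown. It is a sketch, not Keras's code. The `min_delta=1e-4` default is assumed to match Keras's, and the loss values are hypothetical, not taken from this notebook's training log. With `factor=0.2` and `min_lr=0.001`, the rate can drop at most twice: from 0.01 to 0.002, then to the 0.001 floor. `ModelCheckpoint(save_best_only=True)` makes a similar "best so far" comparison, so `weights.hdf5` is only overwritten on epochs where the loss improves.

```python
def simulate_reduce_lr(losses, initial_lr=0.01, factor=0.2, patience=1,
                       min_lr=0.001, min_delta=1e-4):
    """Simplified sketch of ReduceLROnPlateau in 'min' mode (no cooldown).

    Returns the learning rate in effect after each epoch's loss is seen.
    """
    lr, best, wait = initial_lr, float('inf'), 0
    history = []
    for loss in losses:
        if loss < best - min_delta:
            best, wait = loss, 0          # improvement: reset the counter
        else:
            wait += 1                     # another epoch without improvement
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Hypothetical loss values, for illustration only
print(simulate_reduce_lr([2.0, 1.8, 1.8, 1.79995, 1.5]))
```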

In [None]:
callbacks = [print_callback, checkpoint, reduce_lr]

**Seventh Step**: Training Our Model

In [22]:
model.fit(x, y, batch_size=128, epochs=10, callbacks=callbacks)

Epoch 1/10
----- Generating text after Epoch: 0
----- diversity: 0.2
----- Generating with seed: " project
gutenberg-tm collection. despit"
 project
gutenberg-tm collection. despitit and was a distance which is a completion and what i have a complete and which i have a completion of the string and a small mince and a completement and which he was a man who have the morning of the string and down and the man who has done the promer state the lamp and than the streets of the fire of the street. i have a complete and that i have a confinere of the string and was a fire of the 
----- diversity: 0.5
----- Generating with seed: " project
gutenberg-tm collection. despit"
 project
gutenberg-tm collection. despitit
of our one of a londof of the property.”

“but the promothy seemed in his each of the stumusion the complemess of my rooms of some
london of the same of his wife at the man who has frenk upon your conventrant words and than i have a coloured from the last and was a man morning in the

<keras.callbacks.History at 0x7f77997d6cd0>

**Eighth Step**: Testing the Model by Generating New Text

In [23]:
def generate_text(length, diversity):
    # Pick a random 40-character seed from the book
    start_index = random.randint(0, len(text) - maxlen - 1)
    sentence = text[start_index: start_index + maxlen]
    generated = sentence
    for i in range(length):
        # One-hot encode the current window
        x_pred = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_indices[char]] = 1.

        # Predict and sample the next character at the given diversity (temperature)
        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, diversity)
        next_char = indices_char[next_index]

        # Append it and slide the window forward by one character
        generated += next_char
        sentence = sentence[1:] + next_char
    return generated
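The generation loop can be exercised without a trained network by swapping `model.predict` for a stub. In the sketch below, `generate_with` and `always_b` are hypothetical. To keep the output deterministic, the sketch picks the most likely character with `argmax` instead of calling `sample`. It shows that the returned string contains the seed plus `length` new characters, so `generate_text(500, 0.2)` returns 540 characters. It also shows that the window fed to the model stays `maxlen` characters long. The seed is assumed to be at least `maxlen` characters.

```python
import numpy as np


def generate_with(predict_fn, seed, length, char_indices, indices_char, maxlen):
    """Same sliding-window loop as generate_text, but with the model call
    replaced by any predict_fn(x_pred) -> probability vector (hypothetical)."""
    sentence = seed[-maxlen:]          # keep only the last maxlen characters
    generated = sentence
    for _ in range(length):
        x_pred = np.zeros((1, maxlen, len(char_indices)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_indices[char]] = 1.0
        preds = predict_fn(x_pred)
        next_char = indices_char[int(np.argmax(preds))]  # greedy pick for the demo
        generated += next_char
        sentence = sentence[1:] + next_char              # window stays maxlen long
    return generated


toy_chars = sorted(set("ab"))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}


def always_b(x_pred):
    # Stub "model": puts all probability on 'b' regardless of input
    return np.array([0.0, 1.0])


out = generate_with(always_b, "aaaa", 3, toy_char_indices, toy_indices_char, maxlen=4)
print(out)  # aaaabbb
```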

**Ninth Step**: Generating Text of Various Lengths and Diversities

In [24]:
print(generate_text(500, 0.2))

for indian animals, which are sent
over the man who is a man who was a small state of the last what i shall not be all there was a small state of the man which was a little particular which i was a man which was all the stairs of the country. i shall not be all there was a little bed to the part of the landram, and the last was not and the stairs and were all discovered to the countes of the man which i shall be a small started at the stairs of the complete in the man which i was a small time of the contination.

“here is the man whic


In [25]:
print(generate_text(100, 0.4))

ting, and the paper upon which it was
writing.

“he was
mark, and the last was all the contination. the alter was a shadow of the party posi


In [26]:
print(generate_text(200, 0.1))

he shoulder.

“if you leave it to a court of the constanure of the state of the state of the constanure of the country. i shall be all there was a small state of the since that i shall not be all there was a small state of the state of the 


In [27]:
print(generate_text(200, 0.8))

o his wife.’ there is half a column of pistol and as soctold of remember room,
it told for the deather of last over the footmistmens to the sold of along to ever what there is make or the
deepest of a large in the greet. there were beitany 


In [28]:
print(generate_text(1000, 0.2))

‘you see it is really
confined to london, and the man was a little the man who was a little man shoulder. “i shall sent to me to be all the travelled and the stair, and there was a small state of the since that i shall be all there was a small man which was a small state of the contination. i shall be all there was a small state of the state of the contrarion. i shall be the man which i was a considerable bed to be a better in the stair and the singular and the last was all the last was all the man who is a little thinger and the stair and the continuar and some front of the man who is a little front of the man and the man who is all the last in the country. i had been at the first of the man which i should not think that i was a small state of the considerably and the man which i was letters to me. i should not think that i shall be all there in the first of the stair.”

“i shall sell me to me to be the bow which i shall be been the last was all the stairs of the state of the contince