## <small>
Copyright (c) 2017-21 Andrew Glassner

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
</small>



# Deep Learning: A Visual Approach
## by Andrew Glassner, https://glassner.com
### Order: https://nostarch.com/deep-learning-visual-approach
### GitHub: https://github.com/blueberrymusic
------

### What's in this notebook

This notebook is provided to help you work with Keras and TensorFlow. It accompanies the bonus chapters for my book. The code is in Python3, using the versions of libraries as of April 2021.

Note that I've included the output cells in this saved notebook, but Jupyter doesn't save the variables or data that were used to generate them. To recreate any cell's output, evaluate all the cells from the start up to that cell. A convenient way to experiment is to first choose "Restart & Run All" from the Kernel menu, so that everything's been defined and is up to date. Then you can experiment using the variables, data, functions, and other stuff defined in this notebook.

## Bonus Chapter 3 - Notebook 8: Generate text letter by letter

The Holmes data can be found at Project Gutenberg:
https://www.gutenberg.org/ebooks/search/?query=holmes

I combined three books of short stories into one big text file:

- “The Adventures of Sherlock Holmes by Arthur Conan Doyle”
- “The Return of Sherlock Holmes by Arthur Conan Doyle”
- “The Memoirs of Sherlock Holmes by Arthur Conan Doyle”
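The notebook doesn't say exactly how the three books were joined. One simple way is plain concatenation, sketched below. The file names in the commented-out call are hypothetical placeholders. The sketch joins the files in order with a newline between them and does not try to strip the Project Gutenberg header and license text.

```python
from pathlib import Path

def combine_files(input_paths, output_path):
    """Concatenate several UTF-8 text files into one, separated by newlines.

    A minimal sketch: nothing is removed from the individual files.
    Returns the number of characters written.
    """
    parts = [Path(p).read_text(encoding='utf-8') for p in input_paths]
    combined = '\n'.join(parts)
    Path(output_path).write_text(combined, encoding='utf-8')
    return len(combined)

# Hypothetical file names; substitute the names of your own downloads.
# combine_files(['adventures.txt', 'return.txt', 'memoirs.txt'], 'holmes.txt')
```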

In [1]:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop
import numpy as np
import random
import sys

Using TensorFlow backend.


In [2]:
# Workaround for Keras issues on Mac computers (you can comment this
# out if you're not on a Mac, or not having problems)
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'

In [3]:
# Make a File_Helper for saving and loading files.

save_files = False

import os, sys, inspect
current_dir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
sys.path.insert(0, os.path.dirname(current_dir)) # path to parent dir
from DLBasics_Utilities import File_Helper
file_helper = File_Helper(save_files)

In [4]:
def get_text(input_file):
    # open the input file (as UTF-8, since the text uses curly quotes) and do minor processing
    with open(input_file, 'r', encoding='utf-8') as file:
        text = file.read()
    #text = text.lower()
    # replace newlines with blanks, then each pair of blanks with a single blank
    text = text.replace('\n', ' ')
    text = text.replace('  ', ' ')
    print('corpus length:', len(text))
    return text
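To see exactly what the two `replace()` calls in `get_text` do, here they are applied to short strings. A single pass of `replace('  ', ' ')` turns a pair of blanks into one, but a run of three or more blanks only partly collapses. The `collapse_whitespace` helper is a hypothetical alternative, not what this notebook uses. It also folds tabs and other whitespace and trims the ends, so it would change the corpus slightly.

```python
def clean_like_get_text(text):
    # the same two replacements that get_text performs
    text = text.replace('\n', ' ')
    text = text.replace('  ', ' ')
    return text

def collapse_whitespace(text):
    # hypothetical alternative: every run of whitespace becomes a single blank
    return ' '.join(text.split())

print(repr(clean_like_get_text('Holmes.\n\nWatson')))   # 'Holmes. Watson'
print(repr(clean_like_get_text('a   b')))               # 'a  b' (three blanks become two)
print(repr(collapse_whitespace('a   b')))               # 'a b'
```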

In [5]:
def build_dictionaries(text):
    unique_chars = sorted(list(set(text)))
    print('total unique chars:', len(unique_chars))
    char_to_index = dict((ch, index) for index, ch in enumerate(unique_chars))
    index_to_char = dict((index, ch) for index, ch in enumerate(unique_chars))
    return (unique_chars, char_to_index, index_to_char)
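Here is a quick check of the two dictionaries on a toy string. It uses a print-free copy of `build_dictionaries` under a different name, and different variable names, so that running it won't overwrite the notebook's own `unique_chars`, `char_to_index`, and `index_to_char`.

```python
def build_dictionaries_quiet(text):
    # same logic as build_dictionaries, without the print
    unique_chars = sorted(set(text))
    char_to_index = {ch: index for index, ch in enumerate(unique_chars)}
    index_to_char = {index: ch for index, ch in enumerate(unique_chars)}
    return unique_chars, char_to_index, index_to_char

toy_chars, toy_c2i, toy_i2c = build_dictionaries_quiet('hello')
print(toy_chars)                                 # ['e', 'h', 'l', 'o']
toy_encoded = [toy_c2i[ch] for ch in 'hello']
print(toy_encoded)                               # [1, 0, 2, 2, 3]
print(''.join(toy_i2c[i] for i in toy_encoded))  # hello
```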

In [6]:
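def build_fragments(text, window_length):
    # make overlapping fragments of window_length characters, starting a new
    # fragment every window_step characters (window_step is a global set in In [13]);
    # each target is the character that immediately follows its fragment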
    fragments = []
    targets = []
    for i in range(0, len(text)-window_length, window_step):
        fragments.append(text[i: i + window_length])
        targets.append(text[i + window_length])
    print('number of fragments of length window_length=',
          window_length,':', len(fragments))
    return (fragments, targets)
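On a toy string, the fragment/target pairs and the fragment count look like this. `count_fragments` is a hypothetical helper that simply counts the start positions produced by the same `range()` call that `build_fragments` uses. Plugging in the corpus length and settings reported in this notebook's output reproduces the 545,742 fragments it prints.

```python
def count_fragments(text_length, window_length, window_step):
    # number of start positions in range(0, text_length - window_length, window_step)
    return len(range(0, text_length - window_length, window_step))

toy_text = 'abcdefghij'
toy_pairs = [(toy_text[i:i + 4], toy_text[i + 4])
             for i in range(0, len(toy_text) - 4, 3)]
print(toy_pairs)                               # [('abcd', 'e'), ('defg', 'h')]
print(count_fragments(len(toy_text), 4, 3))    # 2
print(count_fragments(1637265, 40, 3))         # 545742
```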

In [7]:
def encode_training_data(fragments, window_length, targets,
                         char_to_index, index_to_char):
    # Turn inputs and targets into one-hot versions
    X = np.zeros((len(fragments), window_length, len(char_to_index)), 
                 dtype=bool)
    y = np.zeros((len(fragments), len(char_to_index)), dtype=bool)
    for i, fragment in enumerate(fragments):
        for t, char in enumerate(fragment):
            X[i, t, char_to_index[char]] = 1
        y[i, char_to_index[targets[i]]] = 1
    return (X, y)
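The one-hot arrays grow quickly. With `dtype=bool`, NumPy stores one byte per element, so `X` needs fragments × window_length × unique characters bytes. The arithmetic below plugs in the numbers this notebook reports for the Holmes corpus (545,742 fragments, a window of 40, and 89 unique characters). It's a back-of-the-envelope estimate that ignores NumPy's small per-array overhead.

```python
def one_hot_bytes(num_fragments, window_length, num_chars, bytes_per_entry=1):
    # NumPy bool arrays use one byte per element
    return num_fragments * window_length * num_chars * bytes_per_entry

x_bytes = one_hot_bytes(545742, 40, 89)   # the inputs X
y_bytes = one_hot_bytes(545742, 1, 89)    # the targets y (one character each)
print(x_bytes)                            # 1942841520
print(round(x_bytes / 2**30, 2))          # 1.81 (GiB)
print(y_bytes)                            # 48571038
```

So the inputs alone take nearly 2 GB of memory before training starts, which is worth knowing before trying a larger corpus or a longer window.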

In [8]:
def build_model(window_length, num_unique_chars):
    # Build the model: two stacked LSTM layers, each with 128 units of memory,
    # then a dense softmax layer with one output per unique character
    # (89 for the Holmes corpus). We'll train with the RMSprop optimizer.
    # Some experiments suggest that a learning rate of 0.01 is a good place to start.
    model = Sequential()
    # return_sequences=True passes the whole sequence of outputs on to the second LSTM
    model.add(LSTM(128, return_sequences=True, input_shape=(window_length, num_unique_chars)))
    model.add(LSTM(128))
    model.add(Dense(num_unique_chars, activation='softmax'))
    # learning_rate replaces the older, deprecated lr argument
    optimizer = RMSprop(learning_rate=0.01)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    return model

In [10]:
# print a string to the screen and, if a file writer is given, also write it to that file
def print_string(out_str='', file_writer=None):
    print(out_str, end='')
    if file_writer is not None:
        file_writer.write(out_str)

In [11]:
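# adjust our probabilities to add some variability or "heat"
# see https://github.com/karpathy/char-rnn
def choose_probability(preds, temperature=1.0):
    # Rescale the softmax output by the temperature, then draw one character index.
    # temperature < 1 sharpens the distribution (safer, more repetitive text);
    # temperature > 1 flattens it (more surprising, more misspelled text).
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature          # log-probabilities divided by temperature
    exp_preds = np.exp(preds)                    # back to unnormalized probabilities
    preds = exp_preds / np.sum(exp_preds)        # renormalize so they sum to 1
    probas = np.random.multinomial(1, preds, 1)  # one draw, returned as a one-hot vector
    return np.argmax(probas)                     # index of the chosen character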
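The effect of temperature is easiest to see on a tiny distribution. The sketch below repeats the reweighting step of `choose_probability` (exp(log(p) / T), then renormalize) in plain Python, without the random draw. It assumes every probability is greater than zero, since `math.log(0)` raises an error. At temperature 1.0 the distribution is unchanged; at 0.5 the most likely character gains weight; at 1.5 the weight spreads toward the less likely characters.

```python
import math

def apply_temperature(probs, temperature=1.0):
    # same reweighting as choose_probability: exp(log(p) / T), then renormalize
    logits = [math.log(p) / temperature for p in probs]
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_probs = [0.6, 0.3, 0.1]
for t in (0.5, 1.0, 1.5):
    print(t, [round(p, 3) for p in apply_temperature(toy_probs, t)])
```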

In [12]:
def generate_text(model, X, y, number_of_epochs, temperatures, index_to_char, char_to_index, file_writer):
    # Train the model one epoch at a time; after each epoch, generate text at each temperature.
    # Note: this also reads the globals text, window_length, batch_size, and generated_text_length.
    for iteration in range(number_of_epochs):
        print_string('--------------------------------------------------\n', 
                     file_writer)
        print_string('Iteration '+str(iteration)+'\n', file_writer)
        model.fit(X, y, batch_size=batch_size, epochs=1)
        # pick a random stretch of the corpus to use as this epoch's seed
        start_index = random.randint(0, len(text) - window_length - 1)

        for temperature in temperatures:
            print_string('\n----- temperature: '+str(temperature)+'\n', 
                         file_writer)
            seed = text[start_index: start_index + window_length]
            generated = seed
            print_string('----- Generating with seed: <'+seed+'>\n', 
                         file_writer)

            for i in range(generated_text_length):
                # one-hot encode the current window of window_length characters
                x = np.zeros((1, window_length, len(index_to_char)))
                for t, char in enumerate(seed):
                    x[0, t, char_to_index[char]] = 1.

                # predict a distribution over the next character and sample from it
                preds = model.predict(x, verbose=0)[0]
                next_index = choose_probability(preds, temperature)
                next_char = index_to_char[next_index]

                generated += next_char
                # slide the window: drop its first character and append the new one
                seed = seed[1:] + next_char

            print_string(generated+'\n\n', file_writer)
            if file_writer is not None:
                file_writer.flush()
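The heart of the generation loop is the sliding window: each new character is appended to the output, and the window drops its oldest character so its length never changes. The sketch below isolates that mechanism. `predict_next` is a hypothetical stand-in for "one-hot encode the window, call `model.predict`, and sample with `choose_probability`". Here it's a toy rule that just repeats the oldest character in the window.

```python
def generate_with_window(seed, length, predict_next):
    # predict_next(window) -> next character; a stand-in for the model plus sampling
    window = seed
    generated = seed
    for _ in range(length):
        next_char = predict_next(window)
        generated += next_char
        window = window[1:] + next_char   # the window keeps a fixed length
    return generated

print(generate_with_window('abc', 4, lambda w: w[0]))   # abcabca
```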

In [13]:
# set the globals
window_length = 40
window_step = 3
number_of_epochs = 100
generated_text_length = 1000
batch_size = 100
input_dir = file_helper.get_input_data_dir()
output_dir = file_helper.get_saved_output_dir()
file_helper.check_for_directory(output_dir)

test_input_file = input_dir+'/test-holmes.txt'
input_file = input_dir+'/holmes.txt'
output_file = output_dir+'/holmes-by-char.txt'
File_writer = open(output_file, 'w', encoding='utf-8')
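With these settings, the work done in each epoch is easy to estimate. The arithmetic below uses the fragment count that this notebook prints when it builds the training data (545,742), the `batch_size` of 100 set above, and the three temperatures and 1000-character outputs used in the final cell. The helper names are hypothetical.

```python
import math

def training_batches_per_epoch(num_samples, batch_size):
    # model.fit processes ceil(num_samples / batch_size) batches per epoch
    return math.ceil(num_samples / batch_size)

def predict_calls_per_epoch(num_temperatures, text_length):
    # generate_text calls model.predict once per generated character, per temperature
    return num_temperatures * text_length

print(training_batches_per_epoch(545742, 100))   # 5458
print(predict_calls_per_epoch(3, 1000))          # 3000
```

Those 3000 single-sample `predict` calls are one reason the text-generation part of each epoch can take a noticeable amount of time.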

In [14]:
# get text data structures, build the model
text = get_text(input_file)
unique_chars, char_to_index, index_to_char = build_dictionaries(text)
fragments, targets = build_fragments(text, window_length)
X, y = encode_training_data(fragments, window_length, targets, char_to_index, index_to_char)
model = build_model(window_length, len(char_to_index))
# Show the model we're using
model.summary()

corpus length: 1637265
total unique chars: 89
number of fragments of length window_length= 40 : 545742
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 40, 128)           111616    
_________________________________________________________________
lstm_2 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dense_1 (Dense)              (None, 89)                11481     
Total params: 254,681
Trainable params: 254,681
Non-trainable params: 0
_________________________________________________________________
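The parameter counts in this summary can be checked by hand. A Keras LSTM layer has four gates, and each gate has a weight matrix for its input, a weight matrix for the recurrent state, and a bias vector, so it has 4 × (units × (input_dim + units) + units) parameters. A Dense layer has input_dim × units weights plus one bias per unit.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # one weight per input/output pair, plus one bias per output
    return input_dim * units + units

p1 = lstm_params(89, 128)    # first LSTM reads one-hot vectors of 89 characters
p2 = lstm_params(128, 128)   # second LSTM reads the first LSTM's 128 outputs
p3 = dense_params(128, 89)   # softmax layer over the 89 characters
print(p1, p2, p3, p1 + p2 + p3)   # 111616 131584 11481 254681
```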


In [15]:
number_of_epochs = 2
temperatures = [0.5, 1.0, 1.5]
generate_text(model, X, y, number_of_epochs, temperatures, index_to_char, char_to_index, File_writer)
# wrap up when we're done
File_writer.close()

--------------------------------------------------
Iteration 0
Epoch 1/1

----- temperature: 0.5
----- Generating with seed: <ou don’t really mean to--” “Tut, man, lo>

ou don’t really mean to--” “Tut, man, lowed the smire black down, then, and the long the little that and so flue than the police, and the lady. The pale of the belonge of the small excending the his of the soor of the wonder had been and in the constant one of a state of the bearantion. Then the charge that the blow the spossible of the thought be while the some some as the country before you the discover last to the consideral which close and see the this and a shoper of young in a station, and then the one of the room. He was since that is a very matter and one and up some of the little small put of the discover one of such a rescomplication in the winder, and then the shirs? I should compered while you death interest that she was not one up the trade some the spece the pound and his problement than the hand of the sing, and the face from a gard of the some of one day. “I am so the starting of the bear, and it were to the case of the face of the nees of the dissary of the restipe whi