# Machine Learning

## Machine Learning using LSTM

This example shows how to use a long short-term memory (LSTM) network on an example dataset, the text of the book "Alice in Wonderland". The book is included with this project as "alice_in_wonderland.txt".

The project imports the text, trains a character-level LSTM on it, and then generates text predictions from seeds drawn from the input data.

The following steps are taken:

* Import modules
* Data ingestion
* Create sets of text for comparison
* Create the model
* Predict and test results

At the end, the program generates predictions and tests the results. The information is printed to the screen.

In [6]:
#===================================================================
#   Date: 10/10/2021
#   Description - Deep Learning with LSTM
#===================================================================

#### IMPORT MODULES

Some of these modules are not installed with Python by default.
You may need to install TensorFlow with pip or conda (for example, `pip install tensorflow`).

In [7]:
# Makes print() behave the same on Python 2 and 3 (no effect on Python 3)
from __future__ import print_function
# Main modules for the LSTM algorithm
# TensorFlow is not installed by default
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN , LSTM, GRU
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import plot_model
# Number processing module
import numpy as np

#### DATA INGESTION

This area is used for importing the data but also for cleaning
up any unusable data.

In [8]:
# Import the data and do any preprocessing (cleanup)
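with open('alice_in_wonderland.txt', 'rb') as alice:
    lines = []
    for line in alice:
        # Trim whitespace, lowercase, and drop any non-ASCII bytes
        line = line.strip().lower()
        line = line.decode('ascii', 'ignore')
        if len(line) == 0:
            continue  # skip blank lines
        lines.append(line)
# Join all lines into one long string of text
text = " ".join(lines)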
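The cleanup steps above can be tried on a few in-memory lines without touching the book file. This is a minimal sketch; the helper name `clean_lines` and the sample bytes are hypothetical and only mirror the strip, lowercase, decode, and skip-blank logic of the cell.

```python
def clean_lines(raw_lines):
    """Apply the same cleanup as the ingestion cell to an iterable of bytes."""
    cleaned = []
    for raw in raw_lines:
        line = raw.strip().lower()             # trim whitespace, lowercase ASCII letters
        line = line.decode('ascii', 'ignore')  # silently drop non-ASCII bytes
        if len(line) == 0:
            continue                           # skip blank lines
        cleaned.append(line)
    return " ".join(cleaned)


# Hypothetical sample input: a padded line, a blank line, and a line with UTF-8 bytes
sample = [b"  Alice WAS \n", b"\n", b"caf\xc3\xa9 time\n"]
print(clean_lines(sample))
```

Note that `decode('ascii', 'ignore')` removes accented characters entirely rather than replacing them, which keeps the character set small.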

#### CREATE SETS OF TEXT FOR COMPARISON

In [9]:
## Identify the unique characters and create lookup dictionaries
# Sorting makes the character-to-index mapping the same on every run
chars = sorted(set(text))
len_chars = len(chars)
chars2index = {c: i for i, c in enumerate(chars)}
index2chars = {i: c for i, c in enumerate(chars)}
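To see what the two dictionaries hold, here is a small sketch on a toy string instead of the book text. The names `toy_text`, `toy_chars`, `toy_c2i`, and `toy_i2c` are hypothetical, and the characters are sorted so the mapping is deterministic.

```python
toy_text = "hello"

# Unique characters in a fixed (sorted) order
toy_chars = sorted(set(toy_text))
toy_c2i = {c: i for i, c in enumerate(toy_chars)}  # character -> index
toy_i2c = {i: c for i, c in enumerate(toy_chars)}  # index -> character

print(toy_chars)
print(toy_c2i)

# Encoding and then decoding recovers the original string
encoded = [toy_c2i[c] for c in toy_text]
decoded = "".join(toy_i2c[i] for i in encoded)
print(encoded, decoded)
```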

In [10]:
# Creating input and output labels
SEQ_LEN = 10
STEP = 1
input_chars = []
label_chars = []
for i in range(0, len(text)-SEQ_LEN, STEP):
    input_chars.append(text[i:i+SEQ_LEN])
    label_chars.append(text[i+SEQ_LEN])
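The sliding-window loop above pairs each run of SEQ_LEN characters with the single character that follows it. The sketch below applies the same loop to a short word with a smaller window so the pairs can be read by eye; `toy_text`, `toy_seq_len`, and `toy_step` are hypothetical stand-ins for the notebook's variables.

```python
toy_text = "wonderland"
toy_seq_len = 4   # stands in for SEQ_LEN
toy_step = 1      # stands in for STEP

toy_inputs = []
toy_labels = []
for i in range(0, len(toy_text) - toy_seq_len, toy_step):
    toy_inputs.append(toy_text[i:i + toy_seq_len])  # window of characters
    toy_labels.append(toy_text[i + toy_seq_len])    # the character to predict

for window, label in zip(toy_inputs, toy_labels):
    print(window, "->", label)
```

With STEP = 1 the windows overlap heavily, so a text of N characters yields N - SEQ_LEN training pairs.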

In [11]:
# One-hot encode the inputs and labels.
# The built-in bool is used because the np.bool alias was deprecated in NumPy 1.20:
# https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
x = np.zeros((len(input_chars), SEQ_LEN, len_chars), dtype=bool)
y = np.zeros((len(input_chars), len_chars), dtype=bool)

for i, input_char in enumerate(input_chars):
    for j, ch in enumerate(input_char):
        x[i, j, chars2index[ch]] = 1
    y[i, chars2index[label_chars[i]]] = 1
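The encoding step turns every character into a one-hot row, so the input tensor has shape (samples, SEQ_LEN, vocabulary size) and the label matrix has shape (samples, vocabulary size). The following sketch shows this on toy data; all `toy_*` names and the sample strings are hypothetical.

```python
import numpy as np

# Hypothetical toy data; the notebook derives these from the book text
toy_inputs = ["abc", "bca"]
toy_labels = ["a", "b"]
toy_chars = sorted(set("".join(toy_inputs) + "".join(toy_labels)))
toy_index = {c: i for i, c in enumerate(toy_chars)}
toy_seq_len, toy_vocab = 3, len(toy_chars)

x_toy = np.zeros((len(toy_inputs), toy_seq_len, toy_vocab), dtype=bool)
y_toy = np.zeros((len(toy_inputs), toy_vocab), dtype=bool)
for i, window in enumerate(toy_inputs):
    for j, ch in enumerate(window):
        x_toy[i, j, toy_index[ch]] = True       # mark the character at position j
    y_toy[i, toy_index[toy_labels[i]]] = True   # mark the character to predict

print(x_toy.shape, y_toy.shape)
print(x_toy[0].astype(int))
```

Each row along the last axis contains exactly one True value, which is what `categorical_crossentropy` expects for the labels.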


In [12]:
# Variables for model and print execution
HIDDEN_SIZE = 128
BATCH_SIZE = 128
NUM_ITERATIONS = 3
NUM_EPOCHS_PER_ITERATION = 1
NUM_PREDS_PER_EPOCH = 100

#### CREATE THE MODEL

Initialize the model using LSTM (long short-term memory).

In [13]:
# Creating the Model LSTM
model = Sequential()
model.add(LSTM(HIDDEN_SIZE,  return_sequences = False, input_shape=(SEQ_LEN, len_chars), unroll=True))
model.add(Dense(len_chars, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer="RMSprop", metrics=['accuracy'])

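NotImplementedError: Cannot convert a symbolic Tensor (lstm/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

This error is raised when the cell above runs. It is commonly reported when NumPy 1.20 or later is installed alongside an older TensorFlow 2.x release; upgrading TensorFlow, or pinning NumPy below 1.20, is the usual workaround.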
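To get a feel for the size of this model, the trainable parameters can be counted by hand. The sketch below uses the standard formulas for a Keras LSTM layer with bias (four gates, each with input weights, recurrent weights, and a bias vector) and for a Dense layer. The vocabulary size of 50 is an assumption for illustration only; the real value is `len_chars` and depends on the text.

```python
def lstm_param_count(units, input_dim):
    """Parameters of an LSTM layer with bias: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(units_in, units_out):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return units_in * units_out + units_out


hidden_size = 128   # same value as HIDDEN_SIZE in the notebook
vocab_size = 50     # assumed for illustration; the notebook uses len_chars

lstm_params = lstm_param_count(hidden_size, vocab_size)
dense_params = dense_param_count(hidden_size, vocab_size)
print(lstm_params, dense_params, lstm_params + dense_params)
```

Once the model builds successfully, `model.summary()` reports these counts for the actual `len_chars`.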

#### PREDICT AND TEST RESULTS

This section trains the model and generates predictions from it.
For each iteration, it fits the model for NUM_EPOCHS_PER_ITERATION
epoch(s), which prints the loss and accuracy, and then generates
NUM_PREDS_PER_EPOCH characters of text from a randomly chosen seed.

To run more iterations, change NUM_ITERATIONS above (currently set to 3).

In [None]:
# Predicting and testing the model
for iteration in range(NUM_ITERATIONS):
    print('='*50)
    print("Iteration #: %d"%(iteration))
    model.fit(x, y, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS_PER_ITERATION)

    test_idx = np.random.randint(len(input_chars))
    test_chars = input_chars[test_idx]
    print("Generating text from the seed : %s \n"%(test_chars))
    print(test_chars, end='')
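    for _ in range(NUM_PREDS_PER_EPOCH):
        # One-hot encode the current SEQ_LEN-character window
        X_test = np.zeros((1, SEQ_LEN, len_chars))
        for j, ch in enumerate(test_chars):
            X_test[0, j, chars2index[ch]] = 1
        pred = model.predict(X_test, verbose=0)[0]
        # Greedy choice: take the most probable next character
        y_pred = index2chars[np.argmax(pred)]
        print(y_pred, end='')
        # Slide the window forward by one character
        test_chars = test_chars[1:] + y_pred
    # End the generated text with a newline before the next iteration
    print()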
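The generation loop above is greedy: at each step it picks the most probable next character, appends it, and slides the seed window forward by one. The sketch below isolates that mechanism without TensorFlow. The function `fake_predict` is a hypothetical stand-in for `model.predict` that always favors the next letter in a three-letter cycle, so the output is easy to check.

```python
toy_chars = ["a", "b", "c"]
toy_c2i = {c: i for i, c in enumerate(toy_chars)}
toy_i2c = {i: c for i, c in enumerate(toy_chars)}


def fake_predict(window):
    """Hypothetical stand-in for model.predict: returns one probability per character."""
    next_idx = (toy_c2i[window[-1]] + 1) % len(toy_chars)
    return [0.8 if i == next_idx else 0.1 for i in range(len(toy_chars))]


def generate(seed, num_preds):
    """Greedy generation with a sliding window, mirroring the loop in the notebook."""
    window = seed
    output = seed
    for _ in range(num_preds):
        probs = fake_predict(window)
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        next_char = toy_i2c[best]
        output += next_char
        window = window[1:] + next_char   # drop the oldest char, append the new one
    return output


print(generate("ab", 4))
```

Because the choice is always the argmax, the same seed produces the same text every time for a given trained model.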