## Text Prediction/Generation with Keras using <font color= #13c113  >LSTM: *Long Short Term Memory* networks</font>

    In this example we will work with the book: Alice’s Adventures in Wonderland by Lewis Carroll.

    We are going to learn the dependencies between characters and the conditional probabilities of characters in sequences so that we can in turn generate wholly new and original sequences of characters.
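
As a rough illustration of what "conditional probabilities of characters" means, the sketch below estimates the probability of the next character given only the current one, using simple counts over a toy sentence. The sentence, the variable names and the helper `next_char_probs` are assumptions made for this example. The LSTM in this notebook conditions on the 100 previous characters rather than on a single one.

```python
from collections import Counter, defaultdict

# Toy text, chosen only for illustration (not taken from the book).
toy_text = "the cat sat on the mat"

# Count how often each character follows each other character.
follow_counts = defaultdict(Counter)
for current_char, next_char in zip(toy_text, toy_text[1:]):
    follow_counts[current_char][next_char] += 1

def next_char_probs(current_char):
    """Estimate P(next char | current char) from the pair counts."""
    counts = follow_counts[current_char]
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

probs_after_t = next_char_probs("t")
print(probs_after_t)
```

In this toy sentence, "t" is followed by "h" twice and by a space twice, so both get an estimated probability of 0.5.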

<img src=https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2016/08/Text-Generation-With-LSTM-Recurrent-Neural-Networks-in-Python-with-Keras.jpg height="140" width="200">

### Adapted from:
#### [Text Generation With LSTM Recurrent Neural Networks in Python with Keras](https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/)

By Jason Brownlee


<br>

* <font size=5 color='green'>[MSTC](http://mstc.ssr.upm.es/big-data-track) seminar on Deep Learning, Tensorflow & Keras</font>



---

## Start by installing some libraries and doing some imports...

In [1]:
! pip install --upgrade keras

Requirement already up-to-date: keras in /usr/local/lib/python3.6/dist-packages
Collecting pyyaml (from keras)
  Downloading PyYAML-3.12.tar.gz (253kB)
    100% |████████████████████████████████| 256kB 2.4MB/s 
Requirement already up-to-date: six>=1.9.0 in /usr/local/lib/python3.6/dist-packages (from keras)
Requirement already up-to-date: numpy>=1.9.1 in /usr/local/lib/python3.6/dist-packages (from keras)
Collecting scipy>=0.14 (from keras)
  Downloading scipy-1.0.0-cp36-cp36m-manylinux1_x86_64.whl (50.0MB)
    100% |████████████████████████████████| 50.0MB 28kB/s 
Building wheels for collected packages: pyyaml
  Running setup.py bdist_wheel for pyyaml ... done
  Stored in directory: /content/.cache/pip/wheels/2c/f7/79/13f3a12cd723892437c0cfbde1230ab4d82947ff7b3839a4fc
Successfully built pyyaml
Installing collected packages: pyyaml, scipy
  Found existing installation: PyYAML 3.11
    Uninstalling PyYAML-3.11:
      Successfully uninstalled PyYAML-

In [2]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils


Using TensorFlow backend.




---

## Download: <font color= #2a9dad >*Alice’s Adventures in Wonderland*</font>

- ### We will first download the complete text in ASCII format (Plain Text UTF-8) 

- #### [Project Gutenberg](https://www.gutenberg.org/): gives free access to books that are no longer protected by copyright

- ### The text has been made available through a Google Drive link



In [3]:
! pip install googledrivedownloader

Collecting googledrivedownloader
  Downloading googledrivedownloader-0.3-py2.py3-none-any.whl
Installing collected packages: googledrivedownloader
Successfully installed googledrivedownloader-0.3


In [4]:
from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(file_id='1wG4PUnoYVUKrsaWgyWepSacUiYNNDEvM',
                                    dest_path='./wonderland.txt',
                                    unzip=False)

Downloading 1wG4PUnoYVUKrsaWgyWepSacUiYNNDEvM into ./wonderland.txt... Done.


- ### Read text for the book and convert all of the characters to lowercase to reduce the vocabulary that the network must learn

In [0]:
# load ASCII text and convert to lowercase
filename = "wonderland.txt"
# use a context manager so the file is closed after reading
with open(filename) as f:
    raw_text = f.read()
raw_text = raw_text.lower()

In [86]:
print(raw_text[0:200])


alice's adventures in wonderland

lewis carroll

the millennium fulcrum edition 3.0




chapter i. down the rabbit-hole

alice was beginning to get very tired of sitting by her sister on the
bank, and


- ### The network cannot model text characters directly, so we must use a "numerical" representation of them.
- ### We will start with a simple one: $integers$
- ### (Some characters could have been removed to further clean up the text)



---

### We create the look-up tables $char\_to\_int$ and $int\_to\_char$ using Python dictionaries 

In [6]:
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))

n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)

Total Characters:  144431
Total Vocab:  45


In [88]:
print(chars[20:40])

['b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u']
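
To make the two look-up tables concrete, here is a small round trip on a toy string. It is a sketch built the same way as the cell above; the string and the `toy_` names are assumptions introduced for this example only.

```python
# Toy text used only for illustration; the notebook uses the full book.
toy_text = "hello world"

# Same construction as in the notebook: sorted unique characters, then two dictionaries.
toy_chars = sorted(set(toy_text))
toy_char_to_int = {c: i for i, c in enumerate(toy_chars)}
toy_int_to_char = {i: c for i, c in enumerate(toy_chars)}

# Encode the text as integers, then decode it back to characters.
encoded = [toy_char_to_int[c] for c in toy_text]
decoded = "".join(toy_int_to_char[i] for i in encoded)

print(toy_chars)
print(encoded)
print(decoded)
```

Because the characters are sorted before numbering, the space gets code 0 here and the letters follow in alphabetical order. Decoding the integer list gives back the original string.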


---

## Prediction Task:


- ### Number of steps: We will split the book text up into subsequences with a fixed length of 100 characters, an arbitrary length. 

- ### Each training pattern of the network consists of 100 time steps of one character (X) followed by one character output (y). When creating these sequences, we slide this window along the whole book one character at a time, allowing each character a chance to be learned from the 100 characters that preceded it (except the first 100 characters, of course).



In [8]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100

dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
	seq_in = raw_text[i:i + seq_length]
	seq_out = raw_text[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)

print("Total Patterns: ", n_patterns)
print("Pattern shape: ",np.array(dataX).shape)

Total Patterns:  144331
Pattern shape:  (144331, 100)
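
The same sliding window is easier to inspect on a short string. The sketch below uses a toy word and a window of 4 characters. Both are assumptions chosen for readability, as are the `toy_` names.

```python
# Toy text and a short window, chosen only for illustration.
toy_text = "wonderland"
toy_seq_length = 4

toy_pairs = []
for i in range(len(toy_text) - toy_seq_length):
    window = toy_text[i:i + toy_seq_length]   # input: 4 consecutive characters
    target = toy_text[i + toy_seq_length]     # output: the character that follows
    toy_pairs.append((window, target))

for window, target in toy_pairs:
    print(repr(window), "->", repr(target))

# The number of pairs is len(text) - seq_length,
# just as 144431 - 100 = 144331 in the notebook.
print(len(toy_pairs))
```

Each window starts one character after the previous one, so consecutive inputs overlap in all but one position.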


- ### Let's see two examples:

In [9]:
print("\"", ''.join([int_to_char[value] for value in dataX[201]]), "\"")
print("\n")
print("\"", ''.join([int_to_char[value] for value in dataX[202]]), "\"")
print("\n")
print("\"", ''.join([int_to_char[value] for value in dataY[201:216]]), "\"")

" of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it h "


" f having nothing to do: once or twice she had peeped into the
book her sister was reading, but it ha "


" ad no pictures  "




---

## We must now prepare our training data to be suitable for use with LSTM in Keras.

- ### First we must transform the list of input sequences into the form <font color= #3498db>  [no. samples or batches, time steps, features]</font> expected by an LSTM network.

- ### Next we need to rescale the integers to the range 0-to-1 to make the patterns easier to learn by the LSTM network, whose gates use sigmoid-type activation functions by default.

- ### Finally, we need to convert the output patterns (single characters converted to integers) into a one hot encoding: to predict the probability of each of the different characters in the vocabulary

In [0]:
# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

In [11]:
print("\"", ''.join([int_to_char[int(value+0.5)] for value in X[200,:]*n_vocab]), "\"")
print("\"", ''.join([int_to_char[int(value)] for value in dataX[200]]), "\"")

"  of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it  "
"  of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it  "
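
The three preparation steps can be followed on a tiny array. This is a sketch with made-up values. `np.eye` is used here as a plain NumPy stand-in for `np_utils.to_categorical`, and all `toy_` names are assumptions for this example.

```python
import numpy as np

# Toy values, assumed for illustration:
# 3 patterns of length 4 over a 5-symbol vocabulary.
toy_dataX = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 0]]
toy_dataY = [4, 0, 1]
toy_vocab = 5

# [samples, time steps, features]: one feature (the character code) per time step,
# rescaled to the range 0-to-1 by dividing by the vocabulary size.
toy_X = np.reshape(toy_dataX, (len(toy_dataX), 4, 1)) / float(toy_vocab)

# One-hot encoding with plain NumPy: row i of the identity matrix encodes class i.
toy_y = np.eye(toy_vocab)[toy_dataY]

# Undo the scaling; rounding guards against floating-point error.
recovered = np.rint(toy_X[:, :, 0] * toy_vocab).astype(int)

print(toy_X.shape, toy_y.shape)
print(toy_y)
print(recovered.tolist() == toy_dataX)
```

The rounding step plays the same role as the `int(value+0.5)` trick used in the check above.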


---
## We can now define and compile our LSTM model:
- ### Here we define a single hidden LSTM layer with 256 memory units.
- ### The network uses dropout with a probability of 0.2 (20%).
- ### The output layer is a Dense layer using the softmax activation function to output a probability (between 0 and 1) for each of the possible characters.



In [16]:
print('X shape: ', X.shape)
print('y.shape: ',y.shape)
y[0,:]

X shape:  (144331, 100, 1)
y.shape:  (144331, 45)


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [17]:
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 256)               264192    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 45)                11565     
Total params: 275,757
Trainable params: 275,757
Non-trainable params: 0
_________________________________________________________________
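
The parameter counts in the summary can be checked by hand. A standard LSTM layer holds four weight blocks (three gates plus the candidate cell state), each with input weights, recurrent weights and a bias. The two helper functions below are hypothetical names written for this check and are not part of Keras.

```python
def lstm_param_count(units, input_dim):
    # Four blocks, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * input_dim + units * units + units)

def dense_param_count(inputs, outputs):
    # One weight per input-output pair plus one bias per output.
    return inputs * outputs + outputs

lstm_params = lstm_param_count(units=256, input_dim=1)
dense_params = dense_param_count(inputs=256, outputs=45)
print(lstm_params, dense_params, lstm_params + dense_params)
```

The three printed values match the `lstm_1`, `dense_1` and total rows of the summary. The Dropout layer has no parameters.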


- ### Note that we are not really interested in prediction accuracy.
- ### We are seeking a balance between generalization and overfitting, but short of memorization.
- ### Because training is slow and because of our optimization requirements, we will use model checkpointing to record all of the network weights to file each time an improvement in loss is observed at the end of the epoch.
- ### We will use the best set of weights (lowest loss) to instantiate our generative model in the next section.

In [0]:
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

---

## Fit our model to the data.
- ### Here we use a modest number of 20 epochs and a large batch size of 128 patterns
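
As a quick piece of arithmetic, the batch size and the number of patterns determine how many weight updates one epoch contains. The sketch below hard-codes the pattern count reported earlier; the remaining variable names are assumptions for this example.

```python
import math

n_patterns = 144331   # "Total Patterns" reported earlier in the notebook
batch_size = 128
epochs = 20

# Each epoch needs enough batches to cover every pattern once.
steps_per_epoch = math.ceil(n_patterns / batch_size)
# The final batch of an epoch holds whatever is left over.
last_batch_size = n_patterns - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, last_batch_size, total_updates)
```

So each epoch runs 1128 batches, the last of which contains only 75 patterns, and a full run of 20 epochs would perform 22560 updates.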



---

## Generating Text with an LSTM Network

In [19]:
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20

Epoch 00001: loss improved from inf to 2.95589, saving model to weights-improvement-01-2.9559.hdf5


Epoch 2/20



Epoch 00002: loss improved from 2.95589 to 2.76245, saving model to weights-improvement-02-2.7624.hdf5
Epoch 3/20

KeyboardInterrupt: ignored

In [100]:
ls

datalab/                            weights-improvement-05-2.5211.hdf5
weights-improvement-01-2.9627.hdf5  weights-improvement-06-2.4663.hdf5
weights-improvement-02-2.7587.hdf5  weights-improvement-07-2.4137.hdf5
weights-improvement-03-2.6563.hdf5  weights-improvement-08-2.3627.hdf5
weights-improvement-04-2.5819.hdf5  wonderland.txt
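
Since the loss is part of each file name, the best checkpoint can be picked programmatically instead of by eye. The sketch below works on the file names from the listing above. The regular expression and the helper `checkpoint_loss` are assumptions written for this example.

```python
import re

# File names copied from the directory listing above.
checkpoint_files = [
    "weights-improvement-01-2.9627.hdf5",
    "weights-improvement-02-2.7587.hdf5",
    "weights-improvement-03-2.6563.hdf5",
    "weights-improvement-04-2.5819.hdf5",
    "weights-improvement-05-2.5211.hdf5",
    "weights-improvement-06-2.4663.hdf5",
    "weights-improvement-07-2.4137.hdf5",
    "weights-improvement-08-2.3627.hdf5",
]

# Mirrors the pattern used in `filepath`: the epoch number, then the loss.
name_pattern = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)\.hdf5$")

def checkpoint_loss(name):
    """Return the loss encoded in a checkpoint file name."""
    match = name_pattern.search(name)
    if match is None:
        raise ValueError("unexpected checkpoint name: " + name)
    return float(match.group(2))

best_checkpoint = min(checkpoint_files, key=checkpoint_loss)
print(best_checkpoint)
```

In a live session the list could instead come from `glob.glob("weights-improvement-*.hdf5")`.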



---

## The network weights are loaded from a checkpoint file and the network does not need to be trained.

In [0]:
# load the network weights
filename = "weights-improvement-08-2.3627.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')

---

## Finally: make predictions.

- ### The simplest way is to first start with a seed sequence as input and predict the next character
- ### then update the seed sequence to add the predicted character on the end and trim off the first character.
- ### ...repeat this process to predict new characters (e.g. a sequence of 1,000 characters in length).
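
The "predict, append, trim" cycle can be tried without a trained network. In the sketch below, `predict_next` is a hypothetical stand-in for the model that returns the code following the last one in the window. Only the window handling is meant to mirror the steps above.

```python
# Hypothetical stand-in for the trained network, used only to show the window mechanics:
# it "predicts" the code that follows the last one in the window, wrapping around.
def predict_next(window, vocab_size):
    return (window[-1] + 1) % vocab_size

vocab_size = 5
window = [0, 1, 2]   # seed sequence (integer codes)
generated = []

for _ in range(4):
    index = predict_next(window, vocab_size)
    generated.append(index)
    # drop the oldest code and append the prediction, so the length stays fixed
    window = window[1:] + [index]

print(generated)
print(window)
```

The window always keeps the same length, so the input shape presented to the network never changes during generation.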


In [109]:
import sys

# pick a random seed
start = np.random.randint(0, len(dataX)-1)
# copy the pattern so that appending to it does not modify dataX
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
print('\n GENERATE: \n')
# generate characters
for i in range(1000):
	# shape the window as [samples, time steps, features] and normalize as in training
	x = np.reshape(pattern, (1, len(pattern), 1))
	x = x / float(n_vocab)
	prediction = model.predict(x, verbose=0)
	# take the most probable character
	index = int(np.argmax(prediction))
	result = int_to_char[index]
	sys.stdout.write(result)
	# slide the window: append the prediction and drop the first character
	pattern.append(index)
	pattern = pattern[1:]
print("\nDone.")

Seed:
" urprised to find that she remained the same
size: to be sure, this generally happens when one eats c "

 GENERATE: 

n an all toe toeee to the sooee  and she was ao all tae so her aeain, and the sas so tee thee th the sas oo tae io a lore tf the tooee  and she was ao all tae so her aeain, and the sas so tee thee th the sas oo tae io a lore tf the tooee  and she was ao all tae so her aeain, and the sas so tee thee th the sas oo tae io a lore tf the tooee  and she was ao all tae so her aeain, and the sas so tee thee th the sas oo tae io a lore tf the tooee  and she was ao all tae so her aeain, and the sas so tee thee th the sas oo tae io a lore tf the tooee  and she was ao all tae so her aeain, and the sas so tee thee th the sas oo tae io a lore tf the tooee  and she was ao all tae so her aeain, and the sas so tee thee th the sas oo tae io a lore tf the tooee  and she was ao all tae so her aeain, and the sas so tee thee th the sas oo tae io a lore tf the tooee  and she was ao all

o all tae so her aeain, and
Done.
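
The generated text above falls into a loop. With `np.argmax` the choice is deterministic, so once a 100-character window recurs, the output repeats from there. One possible variation, which is not used in this notebook, is to draw the next character at random according to the predicted probabilities. The sketch below shows the idea on a made-up probability vector; `sample_index` is a hypothetical helper and the toy values are not model output.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def sample_index(prediction, rng):
    """Draw one class index from a (1, n_vocab) array of softmax probabilities."""
    probs = np.asarray(prediction, dtype=np.float64).ravel()
    probs = probs / probs.sum()   # renormalise: float32 outputs may not sum to exactly 1
    return int(rng.choice(len(probs), p=probs))

# Toy prediction over a 4-character vocabulary (assumed values, not model output).
toy_prediction = np.array([[0.1, 0.6, 0.2, 0.1]], dtype=np.float32)

greedy_index = int(np.argmax(toy_prediction))
sampled = [sample_index(toy_prediction, rng) for _ in range(10)]
print(greedy_index)
print(sampled)
```

In the generation loop this would replace the `np.argmax(prediction)` line with `sample_index(prediction, rng)`. Whether it improves the text from a model trained for only a few epochs is something to test rather than assume.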


In [108]:
result

' '

# You can look for some ideas and improvements in:

- ### [Learn about EMBEDDINGS](https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/6.1-using-word-embeddings.ipynb)

- ### [text-generation-lstm-recurrent-neural-networks-python-keras](https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/)

- ### [Deepanway Ghosal](https://github.com/deepanwayx/char-and-word-rnn-keras)

