## Text Prediction/Generation with Keras using <font color= #13c113  >LSTM: *Long Short-Term Memory* networks</font>

    In this example we will work with the book: Alice’s Adventures in Wonderland by Lewis Carroll.

    We are going to learn the dependencies between characters and the conditional probabilities of characters in sequences so that we can in turn generate wholly new and original sequences of characters.
    
![Text-Generation-With-LSTM-Recurrent-Neural-Networks-in-Python-with-Keras](https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2016/08/Text-Generation-With-LSTM-Recurrent-Neural-Networks-in-Python-with-Keras.jpg)


### Adapted from:
#### [Text Generation With LSTM Recurrent Neural Networks in Python with Keras](https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/)

By Jason Brownlee


<br>


<img src="https://s3.amazonaws.com/keras.io/img/keras-logo-2018-large-1200.png" alt="Keras logo" height="100" width="250"> 

---

<font size=4 >Summer Seminar:</font> <font size=4 color= orange>Practical Introduction to Deep Learning & Keras</font>

 <img src="https://pbs.twimg.com/profile_images/969243109208018946/w2GzDfiC_400x400.jpg" alt="IPTC" height="50" width="50"> 
 ## * [IPTC](https://iptc.upm.es/) and [MSTC](http://mstc.ssr.upm.es/big-data-track)
 
---

## Start by importing the libraries we need

In [1]:
import numpy as np

import keras

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils


Using TensorFlow backend.




---

## Download the text file <font color= #2a9dad >*Alice’s Adventures in Wonderland*</font>

- ### We will first download the complete text in ASCII format (Plain Text UTF-8) 

- #### [Project Gutenberg](https://www.gutenberg.org/): gives free access to books that are no longer protected by copyright

- ### The text has been made available through a Google Drive link



In [2]:
! pip install googledrivedownloader



In [3]:
from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(file_id='1wG4PUnoYVUKrsaWgyWepSacUiYNNDEvM',
                                    dest_path='./wonderland.txt',
                                    unzip=False)

Downloading 1wG4PUnoYVUKrsaWgyWepSacUiYNNDEvM into ./wonderland.txt... Done.


- ### Read text for the book and convert all of the characters to lowercase to reduce the vocabulary that the network must learn

In [0]:
# load the ASCII text and convert it to lowercase
filename = "wonderland.txt"
with open(filename) as f:
  raw_text = f.read()
raw_text = raw_text.lower()

In [5]:
print(raw_text[0:200])


alice's adventures in wonderland

lewis carroll

the millennium fulcrum edition 3.0




chapter i. down the rabbit-hole

alice was beginning to get very tired of sitting by her sister on the
bank, and


- ### We cannot feed text characters to the network directly: we must use a numerical representation.
- ### We will start with a simple one: $integers$
- ### (Some characters could have been removed to further clean up the text)
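
As a minimal sketch of the integer encoding described above, the snippet below builds the two lookup tables on a toy string and checks that encoding followed by decoding gives back the original text. The names `toy_text`, `toy_chars`, `toy_char_to_int` and `toy_int_to_char` are made up for this illustration and are not part of the notebook.

```python
# Toy illustration of an integer character encoding (toy_text is hypothetical).
toy_text = "hello world"

# Sorted list of the distinct characters: the "vocabulary".
toy_chars = sorted(set(toy_text))

# Forward and inverse lookup tables.
toy_char_to_int = {c: i for i, c in enumerate(toy_chars)}
toy_int_to_char = {i: c for i, c in enumerate(toy_chars)}

# Encode the text as integers, then decode it again.
encoded = [toy_char_to_int[c] for c in toy_text]
decoded = "".join(toy_int_to_char[i] for i in encoded)

print(toy_chars)
print(encoded)
print(decoded)
```

The same idea is applied to the whole book in the cells that follow.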

<font color=yellow  face="times, serif" size=5>============================================<br>
How many different characters are there in raw_text? Store them, sorted, in a list</font>

In [6]:
chars = sorted(list(set(raw_text)))

print(chars)
print('Number of characters: ',len(chars))

['\n', ' ', '!', '"', "'", '(', ')', '*', ',', '-', '.', '0', '3', ':', ';', '?', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
Number of characters:  45
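
The vocabulary above contains a few rarely used symbols (`*`, `[`, `]`, `_` and two digits). As noted earlier, such characters could be removed to clean up the text. A minimal sketch, assuming a hypothetical whitelist `keep` and a hypothetical helper `clean_text`; the notebook itself does not do this cleaning and keeps all 45 characters.

```python
import string

# Hypothetical whitelist: lowercase letters, whitespace and basic punctuation.
keep = set(string.ascii_lowercase + " \n.,;:!?'\"-()")

def clean_text(text, allowed=keep):
    """Drop every character that is not in the allowed set."""
    return "".join(ch for ch in text if ch in allowed)

# Made-up sample line containing some of the rare symbols.
sample = "*chapter i.* [down the rabbit-hole] _very_ tired 3.0"
cleaned = clean_text(sample)

print(cleaned)
print(sorted(set(sample) - set(cleaned)))  # characters that were removed
```

Applying such a filter to `raw_text` before building `chars` would give a smaller vocabulary, at the cost of changing every number reported in the rest of this notebook.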


<font color=yellow  face="times, serif" size=5>============================================<br>
MAP each character to an integer using a Python *dictionary*  with key(char) : value(int) </font>

In [0]:
char_to_int = dict((c, i) for i, c in enumerate(chars))

In [8]:
char_to_int['z']

44

<font color=yellow face="times, serif" size=5>============================================<br>
INVERSE MAP: get the char from an integer using a *dictionary*  int: char </font>

In [0]:
int_to_char = dict((i, c) for i, c in enumerate(chars))


In [10]:
int_to_char[44]

'z'

In [11]:
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)

Total Characters:  144431
Total Vocab:  45


---

## Prediction Task:


- ### <font color=red> Number of steps</font>: We will split the book text up into subsequences with a <font color=red>fixed length of 100 characters, an arbitrary length</font>. 

- ### To train the network we slide a window of seq_length = 100 characters along the whole book
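
To make the sliding window concrete, here is a toy sketch with a short string and a window of 4 characters instead of 100. Each window is paired with the character that follows it, which is what the network will learn to predict. `toy_text` and `toy_seq_length` are illustrative values only.

```python
# Toy sliding window (toy_text and toy_seq_length are illustrative values).
toy_text = "hello world"
toy_seq_length = 4

pairs = []
for i in range(len(toy_text) - toy_seq_length):
    window = toy_text[i:i + toy_seq_length]    # input: toy_seq_length characters
    next_char = toy_text[i + toy_seq_length]   # target: the character that follows
    pairs.append((window, next_char))

for window, next_char in pairs:
    print(repr(window), "->", repr(next_char))

# A text of n characters yields n - seq_length windows.
print(len(pairs))
```

With the 144431 characters of the book and seq_length = 100, the same rule gives 144431 - 100 = 144331 windows.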



<font color=yellow  face="times, serif" size=5>============================================<br>
Slide a window of seq_length = 100 characters along the book and store each sequence in dataX: the input to the network</font>

In [0]:
seq_length = 100

dataX=[]

for i in range(0, n_chars - seq_length, 1):
  seq_in = raw_text[i:i+seq_length]
  dataX.append(seq_in)

In [13]:
print('dataX length: ',len(dataX))
print('dataX first training example: \n',dataX[0])

dataX length:  144331
dataX first training example: 
 alice's adventures in wonderland

lewis carroll

the millennium fulcrum edition 3.0




chapter i. d


<font color=yellow  face="times, serif" size=5>============================================<br>
dataX MUST be numeric!!! make changes using our $char\_to\_int$ dictionary</font>

In [0]:
seq_length = 100

dataX=[]

for i in range(0, n_chars - seq_length, 1):
  seq_in = raw_text[i:i+seq_length]
  dataX.append([char_to_int[char] for char in seq_in])

In [15]:
print('dataX length: ',len(dataX))
print('dataX first training example: \n',dataX[0])

dataX length:  144331
dataX first training example: 
 [19, 30, 27, 21, 23, 4, 37, 1, 19, 22, 40, 23, 32, 38, 39, 36, 23, 37, 1, 27, 32, 1, 41, 33, 32, 22, 23, 36, 30, 19, 32, 22, 0, 0, 30, 23, 41, 27, 37, 1, 21, 19, 36, 36, 33, 30, 30, 0, 0, 38, 26, 23, 1, 31, 27, 30, 30, 23, 32, 32, 27, 39, 31, 1, 24, 39, 30, 21, 36, 39, 31, 1, 23, 22, 27, 38, 27, 33, 32, 1, 12, 10, 11, 0, 0, 0, 0, 0, 21, 26, 19, 34, 38, 23, 36, 1, 27, 10, 1, 22]


<font color=yellow  face="times, serif" size=5>============================================<br>
Now we have to create the output for each 100-character window: </font>
  - the OUTPUT will be the next character, that is: we will train the network to predict the next character after "seeing" the 100 previous characters.

### So add some code to the for loop to store the "next character" for each window in dataY. Again, this MUST be numeric, so use our $char\_to\_int$ dictionary

In [0]:

seq_length = 100

dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
  seq_in = raw_text[i:i + seq_length]   # input window
  seq_out = raw_text[i + seq_length]    # next character (target)
  dataX.append([char_to_int[char] for char in seq_in])
  dataY.append(char_to_int[seq_out])
  

In [17]:
n_patterns = len(dataX)

print("Total Patterns: ", n_patterns)
print("Pattern shape: ",np.array(dataX).shape)

Total Patterns:  144331
Pattern shape:  (144331, 100)
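
The Python loop above is easy to read but builds the windows one at a time. As an alternative sketch, not used in this notebook, the same windows and targets can be obtained with NumPy's `sliding_window_view` (this assumes NumPy 1.20 or later). The toy array and the names `toy_X`, `toy_y` and `toy_seq_length` are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20

# Toy integer-encoded text (illustrative values, not the book).
encoded = np.array([3, 2, 4, 4, 5, 0, 7, 5, 6, 4, 1])
toy_seq_length = 4

# All windows of length toy_seq_length; the last one has no following character,
# so it is dropped from the inputs.
windows = sliding_window_view(encoded, toy_seq_length)
toy_X = windows[:-1]
toy_y = encoded[toy_seq_length:]   # the character right after each window

print(toy_X.shape, toy_y.shape)
print(toy_X[0], "->", toy_y[0])
```

The view shares memory with the original array, so nothing is copied until the data is reshaped or normalized.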


- ### Let's see two examples:

In [18]:
print("------Window input dataX -------------------------------------")
print("\"", ''.join([int_to_char[value] for value in dataX[201]]), "\"")
print("\n -----Character to predict dataY:")
print("\"", int_to_char[dataY[201]])
print("\n")
print("------Window input dataX -------------------------------------")
print("\"", ''.join([int_to_char[value] for value in dataX[202]]), "\"")
print("\n -----Character to predict dataY:")
print("\"", int_to_char[dataY[202]])
print("\n")
print("\"", ''.join([int_to_char[value] for value in dataY[100:216]]), "\"")

------Window input dataX -------------------------------------
" of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it h "

 -----Character to predict dataY:
" a


------Window input dataX -------------------------------------
" f having nothing to do: once or twice she had peeped into the
book her sister was reading, but it ha "

 -----Character to predict dataY:
" d


"  of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures  "




---

## We must now prepare our training data to be suitable for use with LSTM in Keras.

- ### First we must transform the list of input sequences into the form <font color= #3498db>  [samples, time steps, features]</font> expected by an LSTM network. <font color=red> NOTE that our number of features is 1</font>

- ### Next we need to rescale the integers to the range 0-to-1 to make the patterns easier to learn by the LSTM network, whose gates use sigmoid-type activation functions by default.

- ### Finally, we need to convert the output patterns (single characters converted to integers) into a one hot encoding: to predict the probability of each of the different characters in the vocabulary
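
The three steps above can be sketched on a tiny made-up dataset using NumPy only. The names `toy_dataX`, `toy_dataY` and `toy_vocab` are hypothetical, and the one-hot encoding is done here by indexing an identity matrix rather than with the Keras helper used in the notebook.

```python
import numpy as np

# Toy data: 3 windows of 4 integer codes each, vocabulary of 8 symbols (illustrative).
toy_dataX = [[3, 2, 4, 4], [2, 4, 4, 5], [4, 4, 5, 0]]
toy_dataY = [5, 0, 7]
toy_vocab = 8

# 1) reshape to [samples, time steps, features], with a single feature per step
toy_X = np.reshape(toy_dataX, (len(toy_dataX), 4, 1))
# 2) rescale the integer codes to the range [0, 1)
toy_X = toy_X / float(toy_vocab)
# 3) one-hot encode the targets: row k of the identity matrix encodes class k
toy_y = np.eye(toy_vocab)[toy_dataY]

print(toy_X.shape)   # (3, 4, 1)
print(toy_y)
```

Each row of `toy_y` contains a single 1 at the position of the target character, which is the form the softmax output layer is trained against.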

In [19]:
print('dataX shape', np.array(dataX).shape)

dataX shape (144331, 100)


In [0]:
# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

In [21]:
# the same function is also available directly as keras.utils.to_categorical
ykeras=keras.utils.to_categorical(dataY)

print('OHE example numpy',y[3])
print('OHE example keras',ykeras[3])


OHE example numpy [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
OHE example keras [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [0]:
X[200,0:10]*n_vocab

array([[ 1.],
       [33.],
       [24.],
       [ 1.],
       [26.],
       [19.],
       [40.],
       [27.],
       [32.],
       [25.]])

## NOTE: after normalization, to go back from a value to its char we must multiply by n_vocab and round to the nearest integer

In [22]:
print("\"", ''.join([int_to_char[int(value+0.5)] for value in X[200,:]*n_vocab]), "\"")
print("\"", ''.join([int_to_char[int(value)] for value in dataX[200]]), "\"")

"  of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it  "
"  of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it  "
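
Why round instead of simply truncating? Dividing by `n_vocab` and multiplying back is not always exact in floating point, so a recovered value can fall slightly below the original integer, and `int()` would then return the previous code. The sketch below lists the codes, if any, that plain truncation fails to recover, and checks that adding 0.5 before truncating recovers all of them. The list names are made up for this illustration.

```python
n_vocab = 45  # vocabulary size used in this notebook

# Codes that are NOT recovered by plain truncation after k / n_vocab * n_vocab.
lost_by_truncation = [k for k in range(n_vocab)
                      if int(k / float(n_vocab) * n_vocab) != k]

# The same check when rounding to the nearest integer (add 0.5, then truncate).
lost_by_rounding = [k for k in range(n_vocab)
                    if int(k / float(n_vocab) * n_vocab + 0.5) != k]

print(lost_by_truncation)
print(lost_by_rounding)
```

Whether the first list is empty depends on the floating-point precision in use; the second one is empty, which is why the cell above adds 0.5 before converting to `int`.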


---
## We can now define and compile our LSTM model:
- ### Here we define a single hidden LSTM layer with 256 memory units.
- ### The network uses dropout with a probability of 0.2 (20%).
- ### The output layer is a Dense layer using the softmax activation function to output a probability between 0 and 1 for each of the 45 possible characters.



In [23]:
print('X shape: ', X.shape)
print('y.shape: ',y.shape)
y[0,:]

X shape:  (144331, 100, 1)
y.shape:  (144331, 45)


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)

<font color=yellow  face="times, serif" size=5>============================================<br>
Define the LSTM model using Sequential style. </font>
 

In [24]:
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 256)               264192    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 45)                11565     
Total params: 275,757
Trainable params: 275,757
Non-trainable params: 0
_________________________________________________________________
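
The parameter counts in the summary above can be checked by hand. An LSTM layer has four weight blocks (input gate, forget gate, cell candidate and output gate), each with an input kernel, a recurrent kernel and a bias. The variable names below are chosen for this sketch only.

```python
# Parameter count of the model above, computed by hand.
units = 256       # LSTM memory units
input_dim = 1     # one feature per time step
n_classes = 45    # vocabulary size

# Four weight blocks, each with an input kernel (input_dim x units),
# a recurrent kernel (units x units) and a bias vector (units).
lstm_params = 4 * (units * input_dim + units * units + units)

# Dense layer: one weight per (input, output) pair plus one bias per output.
dense_params = units * n_classes + n_classes

print(lstm_params, dense_params, lstm_params + dense_params)
```

The three numbers match the `Param #` column and the `Total params` line of the summary. Dropout has no trainable parameters.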


- ### Note that we are not really interested in the most accurate predictions on the training text.
- ### We are seeking a balance between generalization and overfitting, short of memorization.
- ### Because training is slow, and because of our optimization requirements, we will use model checkpointing to save all of the network weights to a file each time an improvement in loss is observed at the end of an epoch.
- ### We will use the best set of weights (lowest loss) to instantiate our generative model in the next section.

## <font color=orange> Take a look at HDF5!</font>

    HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.

## [HDF5 Web portal](https://portal.hdfgroup.org/display/HDF5/HDF5)

In [0]:
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
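
The `{epoch:02d}` and `{loss:.4f}` placeholders in `filepath` are ordinary Python format fields. A minimal sketch of how a file name is produced, assuming the template is filled in with `str.format`; the epoch and loss values are taken from the first line of the training log further down.

```python
# The checkpoint template uses standard Python format fields.
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"

# Illustrative values: epoch 1 with a loss of 2.72892.
name = filepath.format(epoch=1, loss=2.72892)
print(name)
```

The epoch is zero-padded to two digits and the loss is rounded to four decimals, so the file names sort by epoch and show the loss at a glance.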

---

## Fit our model to the data.
- ### Here we use a modest number of 20 epochs and a large batch size of 128 patterns
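
To get a feeling for what these settings mean, we can work out how many weight updates the training will perform. This is plain arithmetic on the numbers already used in this notebook; the variable names are chosen for the sketch.

```python
import math

n_patterns = 144331   # training windows in this notebook
batch_size = 128
epochs = 20

# The last batch of each epoch is smaller when n_patterns is not a multiple of batch_size.
batches_per_epoch = math.ceil(n_patterns / batch_size)
last_batch_size = n_patterns - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_size, total_updates)
```

Each epoch therefore runs 1128 batches (the last one with only 75 patterns), for a total of 22560 updates over 20 epochs.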

<font color=yellow  face="times, serif" size=5>============================================<br>
**TO DO:**   Fit the model, first with 20 epochs batch_size=128 AND $callbacks$ !! </font>
 

In [28]:
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20

Epoch 00001: loss improved from 2.95907 to 2.72892, saving model to weights-improvement-01-2.7289.hdf5
Epoch 2/20

Epoch 00002: loss improved from 2.72892 to 2.63085, saving model to weights-improvement-02-2.6309.hdf5
Epoch 3/20

Epoch 00003: loss improved from 2.63085 to 2.55923, saving model to weights-improvement-03-2.5592.hdf5
Epoch 4/20

Epoch 00004: loss improved from 2.55923 to 2.49625, saving model to weights-improvement-04-2.4963.hdf5
Epoch 5/20

Epoch 00005: loss improved from 2.49625 to 2.44027, saving model to weights-improvement-05-2.4403.hdf5
Epoch 6/20

Epoch 00006: loss improved from 2.44027 to 2.38637, saving model to weights-improvement-06-2.3864.hdf5
Epoch 7/20

Epoch 00007: loss improved from 2.38637 to 2.33672, saving model to weights-improvement-07-2.3367.hdf5
Epoch 8/20

Epoch 00008: loss improved from 2.33672 to 2.29089, saving model to weights-improvement-08-2.2909.hdf5
Epoch 9/20

Epoch 00009: loss improved from 2.29089 to 2.24384, saving model to 

<keras.callbacks.History at 0x7f983f340320>


## Check that the callback has stored the best models

In [30]:
ls

sample_data/                        weights-improvement-11-2.1606.hdf5
weights-improvement-01-2.7289.hdf5  weights-improvement-12-2.1206.hdf5
weights-improvement-01-2.9591.hdf5  weights-improvement-13-2.0817.hdf5
weights-improvement-02-2.6309.hdf5  weights-improvement-14-2.0453.hdf5
weights-improvement-03-2.5592.hdf5  weights-improvement-15-2.0118.hdf5
weights-improvement-04-2.4963.hdf5  weights-improvement-16-1.9813.hdf5
weights-improvement-05-2.4403.hdf5  weights-improvement-17-1.9517.hdf5
weights-improvement-06-2.3864.hdf5  weights-improvement-18-1.9231.hdf5
weights-improvement-07-2.3367.hdf5  weights-improvement-19-1.8988.hdf5
weights-improvement-08-2.2909.hdf5  weights-improvement-20-1.8742.hdf5
weights-improvement-09-2.2438.hdf5  wonderland.txt
weights-improvement-10-2.1998.hdf5
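
Since the loss is part of each file name, the best checkpoint can be selected programmatically instead of by reading the listing. A minimal sketch, assuming the naming pattern used above; `best_checkpoint` is a hypothetical helper, not a Keras function.

```python
import re

def best_checkpoint(filenames):
    """Return the checkpoint file name with the lowest loss encoded in its name."""
    pattern = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)\.hdf5$")
    scored = []
    for name in filenames:
        match = pattern.search(name)
        if match:  # ignore unrelated files such as wonderland.txt
            scored.append((float(match.group(2)), name))
    if not scored:
        raise ValueError("no checkpoint files found")
    return min(scored)[1]

# A few names from the listing above, plus one unrelated file.
files = [
    "weights-improvement-18-1.9231.hdf5",
    "weights-improvement-19-1.8988.hdf5",
    "weights-improvement-20-1.8742.hdf5",
    "wonderland.txt",
]
print(best_checkpoint(files))
```

In the notebook, the list of names could come from `os.listdir(".")`.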




---

## Generating Text with an LSTM Network


---

## The network weights are loaded from a checkpoint file, so the network does not need to be trained again.

In [0]:
# load the network weights
filename = "weights-improvement-20-1.8742.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
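
How good is a loss of 1.8742? Categorical cross-entropy is measured with the natural logarithm, so it can be compared with the loss of a model that guesses uniformly among the 45 characters, and converted into a perplexity. The sketch below is plain arithmetic; `final_loss` is simply the value read from the checkpoint file name.

```python
import math

n_vocab = 45
final_loss = 1.8742   # loss encoded in the checkpoint file name loaded above

# Cross-entropy of a model that assigns the same probability to every character.
uniform_loss = math.log(n_vocab)

# exp(loss) is the perplexity: the effective number of equally likely choices.
perplexity = math.exp(final_loss)

print(round(uniform_loss, 3), round(perplexity, 2))
```

A uniform guess gives a loss of about 3.81, so the trained network is clearly better than chance. A perplexity of roughly 6.5 means that, on average, it is still about as uncertain as if it were choosing among six or seven equally likely characters.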

---

## Finally: make predictions.

- ### The simplest way is to first start with a seed sequence as input and predict the next character
- ### then update the seed sequence to add the predicted character on the end and trim off the first character.
- ### ...repeat this process to predict new characters (e.g. a sequence of 1,000 characters in length).


In [32]:
import sys

# pick a random seed (copy the list so that dataX itself is not modified below)
start = np.random.randint(0, len(dataX)-1)
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
print('\n GENERATE: \n')

# generate characters
for i in range(500):
  # shape the current window as [samples, time steps, features] and normalize it
  x = np.reshape(pattern, (1, len(pattern), 1))
  x = x / float(n_vocab)
  prediction = model.predict(x, verbose=0)
  # greedy choice: take the most probable next character
  index = np.argmax(prediction)
  result = int_to_char[index]

  # print every output character
  sys.stdout.write(result)

  # add output char
  pattern.append(index)
  # remove first char
  pattern = pattern[1:]

print("\nDone.")

Seed:
" out loud, and the
little thing grunted in reply (it had left off sneezing by this time).
'don't grun "

 GENERATE: 

t to tee it the sooe of the soess siree ' 
'io yhu a date ranlen a little bick ' said the monk turtle tererke fop i siile th tee to the thrte to tee sha sire the was soink the rame aadirs, and the pere afd the pame ant the was aalut to tee that she had boen her so the tiate tai so tee that she had benne th the thrtg to the thrt har and tored of the tarle thre tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tire tir
Done.
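
The generated text above ends up repeating "tire" over and over. This is typical of always taking the most probable character (`np.argmax`). One common remedy, which is not part of this notebook, is to sample the next character from the predicted distribution, optionally sharpened or flattened by a "temperature". A minimal sketch using NumPy only; `sample_with_temperature` is a hypothetical helper and `toy_probs` is a made-up softmax output.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw a class index from probs after rescaling them with a temperature."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    # Work in log space; the small constant avoids log(0).
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()        # numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()        # renormalize to a probability vector
    return int(rng.choice(len(scaled), p=scaled))

toy_probs = [0.1, 0.7, 0.2]       # illustrative softmax output
rng = np.random.default_rng(0)

# Temperature 1.0 samples from the distribution as it is.
print([sample_with_temperature(toy_probs, 1.0, rng) for _ in range(10)])
# A very low temperature behaves almost like argmax.
print(sample_with_temperature(toy_probs, 0.01, rng))
```

In the generation loop, `index = np.argmax(prediction)` could be replaced by `index = sample_with_temperature(prediction[0], 0.5)`. The value 0.5 is only a starting guess and would need tuning.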


In [33]:
result

'r'

# You can look for some ideas and improvements in:

- ### [Learn about EMBEDDINGS](https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/6.1-using-word-embeddings.ipynb)

- ### [text-generation-lstm-recurrent-neural-networks-python-keras](https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/)

- ### [Deepanway Ghosal](https://github.com/deepanwayx/char-and-word-rnn-keras)

