## Text Prediction/Generation with Keras using <font color= #13c113  >LSTM: *Long Short Term Memory* networks</font>

    In this example we will work with the book: Alice’s Adventures in Wonderland by Lewis Carroll.

    We are going to learn the dependencies between characters and the conditional probabilities of characters in sequences so that we can in turn generate wholly new and original sequences of characters.
    
![Text-Generation-With-LSTM-Recurrent-Neural-Networks-in-Python-with-Keras](https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2016/08/Text-Generation-With-LSTM-Recurrent-Neural-Networks-in-Python-with-Keras.jpg)


### Adapted from:
#### [Text Generation With LSTM Recurrent Neural Networks in Python with Keras](https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/)

By Jason Brownlee


<br>


# * [MSTC](http://mstc.ssr.upm.es/big-data-track) and MUIT: <font size=5 color='green'>Deep Learning with Tensorflow & Keras</font>



---

## Start by installing some libraries and doing some imports...

In [0]:
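import numpy as np

import tensorflow
from tensorflow import keras

# import everything from tensorflow.keras so a separately installed keras package is not mixed in
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LSTM
from tensorflow.keras.callbacks import ModelCheckpoint
# np_utils is missing from recent Keras releases; alias keras.utils so np_utils.to_categorical still resolves
from tensorflow.keras import utils as np_utils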




---

## Download: <font color= #2a9dad >*Alice’s Adventures in Wonderland*</font>

- ### We will first download the complete text in plain-text format (Plain Text UTF-8)

- #### [Project Gutenberg](https://www.gutenberg.org/): gives free access to books that are no longer protected by copyright

- ### Text has been prepared in a Google Drive link



In [0]:
! pip install googledrivedownloader

In [0]:
from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(file_id='1wG4PUnoYVUKrsaWgyWepSacUiYNNDEvM',
                                    dest_path='./wonderland.txt',
                                    unzip=False)

- ### Read text for the book and convert all of the characters to lowercase to reduce the vocabulary that the network must learn

In [0]:
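# load the plain text and convert it to lowercase
filename = "wonderland.txt"
# the with-block closes the file; the encoding matches the "Plain Text UTF-8" download
with open(filename, encoding="utf-8") as f:
    raw_text = f.read()
raw_text = raw_text.lower()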

In [0]:
print(raw_text[0:200])


- ### We cannot feed text characters to the network directly: we must use a "numerical" representation of them
- ### We will start with a simple one: $integers$
- ### (Some characters could have been removed to further clean up the text)

<font color=  #c5273a  face="times, serif" size=5>============================================<br>
**TO DO:** How many different characters are there in raw_text? Store them, sorted, in a list</font>

In [0]:
chars = ???

print(chars)
print('Number of characters: ',len(chars))

<font color=  #c5273a  face="times, serif" size=5>============================================<br>
**TO DO:**  MAP each character to an integer using a *dictionary*  with key(char) : value(int) </font>

In [0]:
char_to_int = ???

In [0]:
char_to_int['z']

<font color=  #c5273a  face="times, serif" size=5>============================================<br>
**TO DO:**  MAP "back" each integer using a *dictionary*  int: char </font>

In [0]:
int_to_char = ???


In [0]:
int_to_char[44]

In [0]:
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)
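
The vocabulary and the two lookup dictionaries described above can be sketched on a toy string. This is only one possible approach, not necessarily the intended solution to the TO DO cells; `toy_text` and the `toy_*` names are assumptions introduced for illustration.

```python
# Toy text standing in for the book (an assumption for illustration only)
toy_text = "alice was beginning to get very tired"

# Unique characters, sorted so that the mapping is reproducible
toy_chars = sorted(set(toy_text))

# Forward (char -> int) and backward (int -> char) lookup tables
toy_char_to_int = {c: i for i, c in enumerate(toy_chars)}
toy_int_to_char = {i: c for i, c in enumerate(toy_chars)}

# Encode the text as integers and decode it back again
encoded = [toy_char_to_int[c] for c in toy_text]
decoded = "".join(toy_int_to_char[i] for i in encoded)

print(toy_chars)
print(encoded[:5])
print(decoded == toy_text)
```

Sorting the set matters because a Python `set` has no guaranteed order, so without it the integer assigned to each character could change between runs.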

---

## Prediction Task:


- ### <font color=red> Number of steps</font>: We will split the book text up into subsequences with a <font color=red>fixed length of 100 characters, an arbitrary length</font>. 

- ### To train the network we slide a window of seq_length = 100 characters along the whole book



<font color=  #c5273a  face="times, serif" size=5>============================================<br>
**TO DO:**  slide a window extracting a sequence of seq_length = 100 characters along the book and store it in dataX : the input to the network</font>

In [0]:
seq_length = 100

dataX=[]

for i in ???

In [0]:
print('dataX length: ',len(dataX))
print('dataX first training example: \n',dataX[0])

<font color=  #c5273a  face="times, serif" size=5>============================================<br>
**TO DO:**   dataX MUST be numeric!!! make changes using our $char\_to\_int$ dictionary</font>

In [0]:
raw_text[0:10]

In [0]:
[char_to_int[char] for char in raw_text[0:10]]

In [0]:
seq_length = 100

dataX=[]

for i in range(0, n_chars - seq_length, 1):
  seq_in = raw_text[i:i+seq_length]
  dataX.append(???)

In [0]:
print('dataX length: ',len(dataX))
print('dataX first training example: \n',dataX[0])

<font color=  #c5273a  face="times, serif" size=5>============================================<br>
**TO DO:**   Now we have to create the output for each 100-character window: the OUTPUT will be the next character, that is, we will train the network to predict the next character after "seeing" the 100 previous characters. </font>
  
- ### So add to the for loop some code to store the "next character" for each window in dataY  :  again this MUST be numeric!!! so use our $char\_to\_int$ dictionary

In [0]:

seq_length = 100

dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
  
	seq_in = raw_text[i:i + seq_length]
	seq_out = raw_text[???]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(???)
  

In [0]:
n_patterns = len(dataX)

print("Total Patterns: ", n_patterns)
print("Pattern shape: ",np.array(dataX).shape)
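
The sliding window and its "next character" target can be sketched on a short string. This is a minimal illustration under assumed values: `toy_text` is made up, and `seq_length` is 5 here rather than the 100 used in the notebook.

```python
# Toy text and a short window length (assumptions for illustration only)
toy_text = "hello world"
seq_length = 5  # the notebook uses 100

windows = []     # input windows of seq_length characters
next_chars = []  # the character that follows each window

# Stop seq_length characters before the end so every window has a successor
for i in range(0, len(toy_text) - seq_length):
    windows.append(toy_text[i:i + seq_length])
    next_chars.append(toy_text[i + seq_length])

for w, n in zip(windows, next_chars):
    print(repr(w), "->", repr(n))
```

With a step of 1, consecutive windows overlap in all but one character, which is why the number of patterns is close to the number of characters in the text.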

- ### Let's see two examples:

In [0]:
[int_to_char[value] for value in dataX[202]]

In [0]:
print("------Window input dataX -------------------------------------")
print("\"", ''.join([int_to_char[value] for value in dataX[201]]), "\"")
print("\n -----Character to predict dataY:")
print("\"", int_to_char[dataY[201]])
print("\n")
print("------Window input dataX -------------------------------------")
print("\"", ''.join([int_to_char[value] for value in dataX[202]]), "\"")
print("\n -----Character to predict dataY:")
print("\"", int_to_char[dataY[202]])
print("\n")
print("\"", ''.join([int_to_char[value] for value in dataY[100:216]]), "\"")



---

## We must now prepare our training data to be suitable for use with LSTM in Keras.

- ### First we must transform the list of input sequences into the form <font color= #3498db>  [no. samples or batches, time steps, features]</font> expected by an LSTM network. <font color=red> NOTE that our number of features is 1</font>

- ### Next we need to rescale the integers to the range 0-to-1 to make the patterns easier to learn by the LSTM network, whose gates use the sigmoid activation function by default.

- ### Finally, we need to convert the output patterns (single characters converted to integers) into a one hot encoding: to predict the probability of each of the different characters in the vocabulary
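
The three preparation steps above can be sketched with NumPy on toy data. The `toy_*` arrays are assumptions for illustration, and `np.eye` is used here as a stand-in for a one-hot helper such as `to_categorical`; it is not the call the notebook uses.

```python
import numpy as np

# Toy integer-encoded data: 3 windows of 4 time steps, vocabulary of 5 symbols
toy_dataX = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 0]]
toy_dataY = [4, 0, 1]
toy_vocab = 5

# Step 1: reshape to [samples, time steps, features] with one feature per step
toy_X = np.reshape(toy_dataX, (len(toy_dataX), 4, 1))

# Step 2: rescale the integers to the range 0-to-1
toy_X = toy_X / float(toy_vocab)

# Step 3: one-hot encode the targets by picking rows of an identity matrix
toy_y = np.eye(toy_vocab)[toy_dataY]

print(toy_X.shape)
print(toy_y)
```

Each row of `toy_y` has a single 1 at the index of the target character, which is the form a softmax output layer is trained against.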

In [0]:
print('dataX shape', np.array(dataX).shape)

In [0]:
# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

In [0]:
# or, equivalently, through keras.utils
ykeras=keras.utils.to_categorical(dataY)

print('OHE example np_utils',y[3])
print('OHE example keras',ykeras[3])


In [0]:
X[200,0:10]*n_vocab

## NOTE that to go back now (after normalization) from int to char we must multiply by n_vocab and round to the nearest integer

In [0]:
# index the single feature (last axis) so that each value is a scalar, not a 1-element array
print("\"", ''.join([int_to_char[int(value+0.5)] for value in X[200,:,0]*n_vocab]), "\"")
print("\"", ''.join([int_to_char[int(value)] for value in dataX[200]]), "\"")

---
## We can now define and compile our LSTM model:
- ### Here we define a single hidden LSTM layer with 256 memory units.
- ### The network uses dropout with a probability of 20% (0.2).
- ### The output layer is a Dense layer using the softmax activation function to output a probability (between 0 and 1) for each of the possible characters.
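
To see what the softmax output means, here is a minimal pure-Python sketch of the function itself. It is an illustration of the formula only, not how Keras implements it, and `toy_scores` is a made-up set of raw scores for a three-character vocabulary.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores, one per character of a tiny vocabulary
toy_scores = [2.0, 1.0, 0.1]
probs = softmax(toy_scores)

print(probs)
print(sum(probs))
```

The largest score gets the largest probability, so taking the index of the maximum probability picks the character the network considers most likely.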



In [0]:
print('X shape: ', X.shape)
print('y.shape: ',y.shape)
y[0,:]

<font color=  #c5273a  face="times, serif" size=5>============================================<br>
**TO DO:**   Define the LSTM model using Sequential style. </font>
 

In [0]:
# define the LSTM model
model = Sequential()
model.add(???)
model.add(???)
model.add(???)

model.compile(loss='categorical_crossentropy', optimizer='adam')

model.summary()

- ### Note that we are not really interested in prediction accuracy on the training text
- ### We are seeking a balance between generalization and overfitting, stopping short of memorization.
- ### Because training is slow, we will use model checkpointing to record all of the network weights to file each time an improvement in loss is observed at the end of the epoch.
- ### We will use the best set of weights (lowest loss) to instantiate our generative model in the next section.

## <font color=orange> Take a look at HDF5 !!!</font>

    HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.

## [HDF5 Web portal](https://portal.hdfgroup.org/display/HDF5/HDF5)

In [0]:
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

---

## Fit our model to the data.
- ### Here we use a modest number of 20 epochs and a large batch size of 128 patterns
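
As a rough feel for what these settings imply, the number of weight updates can be worked out by hand. The pattern count below is a hypothetical round number, not the value for this book; substitute your own `n_patterns`.

```python
import math

# Hypothetical number of training patterns (an assumption, not the real count)
n_patterns_example = 150_000
batch_size = 128
epochs = 20

# One weight update per batch; the last batch of an epoch may be smaller
steps_per_epoch = math.ceil(n_patterns_example / batch_size)
total_updates = steps_per_epoch * epochs

print("Updates per epoch:", steps_per_epoch)
print("Updates in total: ", total_updates)
```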

<font color=  #c5273a  face="times, serif" size=5>============================================<br>
**TO DO:**   Fit the model, first with 20 epochs batch_size=128 AND $callbacks$ !! </font>
 

In [0]:
model.fit(???)

## Check that the callback has stored the best models

In [0]:
ls



---

## Generating Text with an LSTM Network


---

## The network weights are loaded from a checkpoint file and the network does not need to be trained.

In [0]:
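# load the network weights
# the file name depends on your own run: use the checkpoint with the lowest loss from the listing above
filename = "weights-improvement-01-2.8677.hdf5"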
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')

---

## Finally: make predictions.

- ### The simplest way is to first start with a seed sequence as input and predict the next character
- ### then update the seed sequence to add the predicted character on the end and trim off the first character.
- ### ...repeat this process to predict new characters (e.g. a sequence of 1,000 characters in length).
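
The rolling-seed idea can be sketched without a trained network. Here `fake_predict` is a hypothetical stand-in for the model call followed by taking the most likely index; it simply returns the last integer plus one, so the loop is easy to follow by hand.

```python
def fake_predict(pattern, vocab_size):
    """Hypothetical stand-in for the model: returns (last value + 1) mod vocab_size."""
    return (pattern[-1] + 1) % vocab_size

vocab_size = 4
seed = [0, 1, 2]

# Work on a copy so the original seed sequence is left untouched
pattern = list(seed)
generated = []

for _ in range(6):
    index = fake_predict(pattern, vocab_size)
    generated.append(index)
    # Append the prediction and drop the oldest element: the length stays fixed
    pattern = pattern[1:] + [index]

print("generated:", generated)
print("final window:", pattern)
print("seed unchanged:", seed)
```

The window always keeps the same length, so after enough steps it consists entirely of generated values and the model is conditioning on its own output.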


In [0]:
import sys

# pick a random seed
start = np.random.randint(0, len(dataX)-1)
# copy the window so that appending below does not modify dataX[start] in place
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
print('\n GENERATE: \n')

# generate characters
for i in range(500):
  x = np.reshape(pattern, (1, len(pattern), 1))
  x = x / float(n_vocab)
  prediction = model.predict(x, verbose=0)
  index = np.argmax(prediction)
  result = int_to_char[index]
 
  # print every output character
  sys.stdout.write(result)
  
  # add output char
  pattern.append(index)
  # remove first char
  pattern = pattern[1:len(pattern)]
  
print("\nDone.")

In [0]:
result

# You can look for some ideas and improvements in:

- ### [Learn about EMBEDDINGS](https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/6.1-using-word-embeddings.ipynb)

- ### [text-generation-lstm-recurrent-neural-networks-python-keras](https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/)

- ### [Deepanway Ghosal](https://github.com/deepanwayx/char-and-word-rnn-keras)

