<a href="https://colab.research.google.com/github/RamakrishnaBaba-123/Python-Machine-Learning-Projects/blob/main/02_CharacterLevel_LSTM_Practical.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><img src="https://github.com/insaid2018/Term-1/blob/master/Images/INSAID_Full%20Logo.png?raw=true" width="240" height="100" /></center>

<center><h1>Building Character Level Recurrent Neural Networks</h1></center>

---
# **Table of Contents**
---

**1.** [**Introduction to Character Level Language Modelling**](#Section1)<br>
**2.** [**Problem Statement**](#Section2)<br>
**3.** [**Installing & Importing Libraries**](#Section3)<br>
**4.** [**Data Acquisition & Description**](#Section4)<br>
**5.** [**Data Preprocessing**](#Section5)<br>
**6.** [**Character Level LSTM Model**](#Section6)<br>
  - **6.1** [**Define LSTM Model**](#Section61)
  - **6.2** [**Model Training**](#Section62) 
  - **6.3** [**Model Testing**](#Section63)
  - **6.4** [**Building Larger Model**](#Section64)
  - **6.5** [**Text Generation**](#Section65)

**7.** [**Conclusion**](#section7)<br>

---
<a name = Section1></a>
# **1. Introduction to Character Level Language Modelling**
---

- When modeling the **joint distribution** of a sentence, a simple **n-gram model** would give zero **probability** to all of the combinations that were not encountered in the training corpus.

- That is, it would most **likely** give zero **probability** to most of the **out-of-sample** test cases. However, new **combinations** of n words that were not seen in the **training** set are likely to occur, so we do not want to **assign** such cases zero **probability**.



- A **language model** is a particular kind of **machine learning algorithm** that learns the statistical structure of language by **"reading" a large corpus** of text. 

- This model can then produce authentic **language segments** by **predicting the next character** (or word, for word-based models) based on **past characters.**

- **Simpler models** may look at a context of a **short sequence** of words, whereas larger **models** may work at the level of **sentences or paragraphs**.
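
The zero-probability problem described above can be sketched with a count-based model. This is a minimal illustration, assuming a character bigram model (one character of context) and a hypothetical toy corpus; the helper `bigram_probabilities` is not part of the notebook.

```python
from collections import Counter, defaultdict

def bigram_probabilities(text):
    """Estimate P(next_char | current_char) from raw counts (no smoothing)."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return {
        current: {ch: n / sum(followers.values()) for ch, n in followers.items()}
        for current, followers in counts.items()
    }

corpus = "the cat sat on the mat"  # hypothetical toy corpus
probs = bigram_probabilities(corpus)

# In this corpus 'h' is always followed by 'e'.
print(probs["h"])
# The pair 't' -> 'o' never occurs, so the count-based estimate is zero.
print(probs["t"].get("o", 0.0))
```

Any pair that is missing from the corpus gets probability zero here, which is the weakness that a learned model is meant to soften.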

<center><img src = "https://raw.githubusercontent.com/insaid2018/Term-1/master/Images/google-hangouts-feature.png"width="600" height="250"/></center>

### **Word Level Language Modelling**:

<center><img src = "https://raw.githubusercontent.com/insaid2018/Term-1/master/Images/worlevelpro.JPG"width="400" height="200"/></center>

---
<a name = Section2></a>
# **2. Problem Statement**
---

- The main task of the **character-level** language model is to **predict** the next character given all **previous characters** in a sequence of data, i.e., it generates **text character** by character.

- The **character-based part** of the model's name means that **every input vector** represents a **single character** (as opposed to, say, a word or part of an image).

- The models are **prepared** for the prediction of **words** by learning the **features** and **characteristics** of a language.

- A language model **learns** the probability of **word occurrence** based on **examples** of text. 

<center><img src = "https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/char-lstm.jpg"width="800" height="210"/></center>

---
<a name = Section3></a>
# **3. Installing and Importing Libraries**
---

In [None]:
# Import tensorflow 2.x
# This code block will only work in Google Colab.
try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

In [None]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LSTM
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.utils import to_categorical

---
<a name = Section4></a>
# **4. Data Acquisition & Description**
---

 - We are going to use a favorite book from **childhood** as the dataset:

   - **Alice’s Adventures in Wonderland** by Lewis Carroll.

<br>  
 <center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/51Dp6aAR4HL._SX357_BO1%2C204%2C203%2C200_.jpg" width="300" height="450"/></center>

<br>  
 - We are going to learn the **dependencies** between **characters** and the conditional **probabilities** of characters in **sequences** so that we can in turn generate **whole** new and original **sequences** of characters.

In [None]:
import urllib.request  # the request submodule must be imported explicitly
sample = urllib.request.urlopen('https://raw.githubusercontent.com/insaid2018/DeepLearning/master/Data/Alice.txt')


---
<a name = Section5></a>
# **5. Data Preprocessing**
---

In [None]:
raw_text = sample.read().decode('utf8')
# Converting all text to lower case.
raw_text = raw_text.lower()



**Remove Preface of the Book**

In [None]:
raw_text = raw_text[623:]

**Remove the License that is present at the end of the Book.**

In [None]:
raw_text = raw_text[0:-18757]

**Sample Page from BOOK**

In [None]:
print(raw_text[:1000])

alice’s adventures in wonderland

lewis carroll

the millennium fulcrum edition 3.0




chapter i. down the rabbit-hole

alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, ‘and what is the use of a book,’ thought alice ‘without pictures or
conversations?’

so she was considering in her own mind (as well as she could, for the
hot day made her feel very sleepy and stupid), whether the pleasure
of making a daisy-chain would be worth the trouble of getting up and
picking the daisies, when suddenly a white rabbit with pink eyes ran
close by her.

there was nothing so very remarkable in that; nor did alice think it so
very much out of the way to hear the rabbit say to itself, ‘oh dear!
oh dear! i shall be late!’ (when she thought it over afterwards, it
occurred to her that she ought to have wondered at this, but at the time
it

**Observations:**

 - Prepare the data for **modeling** by the **neural network**. 
 
- We cannot model the **characters** directly; instead, we must convert the **characters** to **integers**.

 - We can do this **easily** by first creating a set of all of the **distinct** characters in the book, then **creating** a **map** of each **character** to a unique integer.

In [None]:
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))

In [None]:
chars[:20]

['\n',
 ' ',
 '!',
 '(',
 ')',
 '*',
 ',',
 '-',
 '.',
 '0',
 '3',
 ':',
 ';',
 '?',
 '[',
 ']',
 '_',
 'a',
 'b',
 'c']

In [None]:
char_to_int

{'\n': 0,
 ' ': 1,
 '!': 2,
 '(': 3,
 ')': 4,
 '*': 5,
 ',': 6,
 '-': 7,
 '.': 8,
 '0': 9,
 '3': 10,
 ':': 11,
 ';': 12,
 '?': 13,
 '[': 14,
 ']': 15,
 '_': 16,
 'a': 17,
 'b': 18,
 'c': 19,
 'd': 20,
 'e': 21,
 'f': 22,
 'g': 23,
 'h': 24,
 'i': 25,
 'j': 26,
 'k': 27,
 'l': 28,
 'm': 29,
 'n': 30,
 'o': 31,
 'p': 32,
 'q': 33,
 'r': 34,
 's': 35,
 't': 36,
 'u': 37,
 'v': 38,
 'w': 39,
 'x': 40,
 'y': 41,
 'z': 42,
 '‘': 43,
 '’': 44,
 '“': 45,
 '”': 46}

**Observations:**

- You can see that there may be **some characters** that we could remove to further **clean up** the **dataset**; doing so would **reduce** the vocabulary and may improve the **modeling process**.

- Now that the book has been loaded and the **mapping prepared**, we can **summarize the dataset**.

In [None]:
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Unique Vocab: ", n_vocab)

Total Characters:  144435
Total Unique Vocab:  47


#### Creating Input and Output variables

 - We will split the book text up into **subsequences** with a fixed **length** of **100** characters.

 - Each **training pattern** of the network is comprised of **100** time steps of one character (X) **followed** by one **character** output (y). 
 
- When creating these **sequences**, we slide this window along the whole book one **character** at a time, allowing each **character** a chance to be **learned** from the **100 characters** that preceded it.

 - For example, if the **sequence length** is 5 (for simplicity) then the **first** two **training patterns** would be as follows:
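
A minimal sketch of this windowing, assuming a short illustrative string and a hypothetical helper `make_windows` (neither comes from the notebook):

```python
def make_windows(text, window, step=1):
    """Slide a fixed-size window over text; return (inputs, next characters)."""
    inputs, targets = [], []
    for i in range(0, len(text) - window, step):
        inputs.append(text[i:i + window])   # `window` characters of context
        targets.append(text[i + window])    # the character that follows them
    return inputs, targets

sample_text = "alice was"  # short illustrative string
inputs, targets = make_windows(sample_text, window=5)

# Show the first two training patterns.
for seq, nxt in zip(inputs[:2], targets[:2]):
    print(repr(seq), "->", repr(nxt))
# 'alice' -> ' '
# 'lice ' -> 'w'
```

The cell below applies the same idea to the whole book with a window of 100 characters.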

In [None]:
maxlen = 100
step = 1
sentences = []
next_chars = []
for i in range(0, len(raw_text) - maxlen, step):
    sentences.append(raw_text[i: i + maxlen])
    next_chars.append(raw_text[i + maxlen])
print('Number of sequences:', len(sentences))

Number of sequences: 144335


In [None]:
sentences[:10]

['alice’s adventures in wonderland\n\nlewis carroll\n\nthe millennium fulcrum edition 3.0\n\n\n\n\nchapter i. d',
 'lice’s adventures in wonderland\n\nlewis carroll\n\nthe millennium fulcrum edition 3.0\n\n\n\n\nchapter i. do',
 'ice’s adventures in wonderland\n\nlewis carroll\n\nthe millennium fulcrum edition 3.0\n\n\n\n\nchapter i. dow',
 'ce’s adventures in wonderland\n\nlewis carroll\n\nthe millennium fulcrum edition 3.0\n\n\n\n\nchapter i. down',
 'e’s adventures in wonderland\n\nlewis carroll\n\nthe millennium fulcrum edition 3.0\n\n\n\n\nchapter i. down ',
 '’s adventures in wonderland\n\nlewis carroll\n\nthe millennium fulcrum edition 3.0\n\n\n\n\nchapter i. down t',
 's adventures in wonderland\n\nlewis carroll\n\nthe millennium fulcrum edition 3.0\n\n\n\n\nchapter i. down th',
 ' adventures in wonderland\n\nlewis carroll\n\nthe millennium fulcrum edition 3.0\n\n\n\n\nchapter i. down the',
 'adventures in wonderland\n\nlewis carroll\n\nthe millennium fulcrum edition 3.0\n\n\n\

In [None]:
next_chars[:20]

['o',
 'w',
 'n',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'r',
 'a',
 'b',
 'b',
 'i',
 't',
 '-',
 'h',
 'o',
 'l',
 'e',
 '\n']

**Observations:**

- As we split up the **book** into these sequences, we **convert** the **characters** to **integers** using our **lookup** table we prepared earlier.

- This will help us **during** the text **generation** step.

In [None]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
	seq_in = raw_text[i:i + seq_length]
	seq_out = raw_text[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])

n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  144335


- **One-hot encode** the **input** sequences, and pack them in a **3D Numpy** array **X** of shape **`(sequences, maxlen, unique_characters)`**.

- Prepare an array **y** containing the corresponding **one-hot-encoded characters** that come after each **extracted** sequence.

- Create **zero** matrices **X** and **y** of the required shapes.

- This will help us **easily** create the one-hot encoded version of our **input** and **target** data.
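
The one-hot layout can be sketched on a tiny example. This sketch uses plain nested Python lists rather than NumPy arrays to stay dependency-free, and the helper `one_hot_sequences` and the toy vocabulary are assumptions made only for illustration.

```python
def one_hot_sequences(sequences, next_chars, vocab):
    """Return nested lists shaped (sequences, maxlen, vocab) and (sequences, vocab)."""
    index = {c: i for i, c in enumerate(vocab)}
    # Start from all zeros, then set a single 1 per character position.
    X_toy = [[[0] * len(vocab) for _ in seq] for seq in sequences]
    y_toy = [[0] * len(vocab) for _ in sequences]
    for i, seq in enumerate(sequences):
        for t, ch in enumerate(seq):
            X_toy[i][t][index[ch]] = 1
        y_toy[i][index[next_chars[i]]] = 1
    return X_toy, y_toy

vocab = sorted(set("abc"))  # ['a', 'b', 'c']
X_toy, y_toy = one_hot_sequences(["ab", "bc"], ["c", "a"], vocab)

print(X_toy[0])  # [[1, 0, 0], [0, 1, 0]]
print(y_toy[0])  # [0, 0, 1]
```

Here the input has shape (2, 2, 3): two sequences, two time steps each, three possible characters.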

In [None]:
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)  # np.bool was removed in recent NumPy releases
X[1]

array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])

In [None]:
y = np.zeros((len(sentences), len(chars)), dtype=bool)  # np.bool was removed in recent NumPy releases
y[1]

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False])

In [None]:
print('Shape of X:', X.shape)
print('Shape of y:', y.shape)

Shape of X: (144335, 100, 47)
Shape of y: (144335, 47)


 - Next, we convert the **output** **patterns** (single characters converted to integers) into a **one hot encoding**. 

- Doing this, we can **configure** the network to **predict** the **probability** of each of the **47** different characters in the **vocabulary** rather than trying to **force** it to predict **precisely** the next character.

- Each **y** value is converted into a **sparse vector** with a **length** of 47, full of **zeros** except with a 1 in the column for the **letter** (integer) that the pattern represents.

 - For example, when **`o`** (integer value 31) is one hot **encoded** it looks as follows:

   __[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]__

In [None]:
for i, sentence in enumerate(sentences):
    
    for t, char in enumerate(sentence):
        X[i, t, char_to_int[char]] = 1
    
    y[i, char_to_int[next_chars[i]]] = 1

In [None]:
X[0]

array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False,  True, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])

In [None]:
y[0]

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False])

---
<a name = Section6></a>
# **6. Character Level LSTM Model**
---

- We'll be using the following **process sequence** in this notebook:

<br>   
<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/char_lstm_flow0.png"width="700"height="400"/></center>

<a name = Section61></a>
### **6.1 Define LSTM model**

- Define our LSTM model:

 - Define a single hidden **LSTM** layer with **256** memory units.

 - The network uses **dropout** with a probability of **20 percent**. 

 - The **output layer** is a Dense layer using the softmax activation function to output a probability **prediction** between 0 and 1 for each of the **47** characters.

- The problem is really a single character **classification** problem with 47 classes, and as such is defined as **optimizing** the **log loss** (cross entropy), here using the **ADAM optimization** algorithm for **speed**.

In [None]:
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(maxlen, len(chars))))
model.add(Dropout(0.2))
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
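
To make the softmax output and the log loss concrete, here is a small pure-Python sketch. The three-class `logits` and `target` are hypothetical values chosen for illustration; Keras computes the same quantities internally, averaged over each batch.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probabilities):
    """Log loss for one example: minus the log-probability of the true class."""
    return -sum(math.log(p) for t, p in zip(one_hot_target, probabilities) if t)

logits = [2.0, 1.0, 0.1]   # hypothetical scores for three characters
target = [1, 0, 0]         # the true next character is class 0
probs = softmax(logits)
loss = categorical_crossentropy(target, probs)
print(probs, loss)

# A model that guesses uniformly over 47 characters has a loss of ln(47).
uniform_baseline = math.log(47)
print(uniform_baseline)
```

The uniform baseline of about 3.85 is a useful yardstick when reading the training losses further down.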

- There is no separate test set here: we are **modeling** the entire **training** dataset to learn the **probability** of each character in a **sequence**.

- We are not aiming for the most accurate model of the training data, which would be a model that **predicts** each **character** in the training dataset **perfectly**. 

- Instead, we are interested in a **generalization** of the dataset that **minimizes** the chosen loss function. 

- We are **seeking** a balance between **generalization** and **overfitting**, stopping short of **memorization**.

In [None]:
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

<a name = Section62></a>
### **6.2 Model Training**

In [None]:
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20

Epoch 00001: loss improved from inf to 2.45730, saving model to weights-improvement-01-2.4573.hdf5
Epoch 2/20

Epoch 00002: loss improved from 2.45730 to 1.95006, saving model to weights-improvement-02-1.9501.hdf5
Epoch 3/20

Epoch 00003: loss improved from 1.95006 to 1.74840, saving model to weights-improvement-03-1.7484.hdf5
Epoch 4/20

Epoch 00004: loss improved from 1.74840 to 1.61461, saving model to weights-improvement-04-1.6146.hdf5
Epoch 5/20

Epoch 00005: loss improved from 1.61461 to 1.52086, saving model to weights-improvement-05-1.5209.hdf5
Epoch 6/20

Epoch 00006: loss improved from 1.52086 to 1.44502, saving model to weights-improvement-06-1.4450.hdf5
Epoch 7/20

Epoch 00007: loss improved from 1.44502 to 1.38157, saving model to weights-improvement-07-1.3816.hdf5
Epoch 8/20

Epoch 00008: loss improved from 1.38157 to 1.32408, saving model to weights-improvement-08-1.3241.hdf5
Epoch 9/20

Epoch 00009: loss improved from 1.32408 to 1.27543, saving model to weig

<tensorflow.python.keras.callbacks.History at 0x7f7f702e2390>

 **Observation:**

 - In our case, the file **weights-improvement-20-0.9285.hdf5** has the **least loss** value of **0.9285**

 - This file was generated in the **last** (**20th**) epoch.

In [None]:
!ls # to see list of all weight checkpoint files created

sample_data			    weights-improvement-11-1.1922.hdf5
weights-improvement-01-2.4573.hdf5  weights-improvement-12-1.1551.hdf5
weights-improvement-02-1.9501.hdf5  weights-improvement-13-1.1160.hdf5
weights-improvement-03-1.7484.hdf5  weights-improvement-14-1.0847.hdf5
weights-improvement-04-1.6146.hdf5  weights-improvement-15-1.0529.hdf5
weights-improvement-05-1.5209.hdf5  weights-improvement-16-1.0238.hdf5
weights-improvement-06-1.4450.hdf5  weights-improvement-17-0.9981.hdf5
weights-improvement-07-1.3816.hdf5  weights-improvement-18-0.9730.hdf5
weights-improvement-08-1.3241.hdf5  weights-improvement-19-0.9511.hdf5
weights-improvement-09-1.2754.hdf5  weights-improvement-20-0.9285.hdf5
weights-improvement-10-1.2318.hdf5
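
Rather than reading the listing by eye, the checkpoint with the lowest loss can be picked from the file names. This is a sketch that assumes the naming scheme `weights-improvement-<epoch>-<loss>` used above; the helper `best_checkpoint` is hypothetical, and in practice the list could come from `os.listdir('.')`.

```python
import re

def best_checkpoint(filenames):
    """Return the checkpoint file whose name encodes the lowest loss, or None."""
    pattern = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)")
    scored = []
    for name in filenames:
        match = pattern.search(name)
        if match:  # skip unrelated entries such as sample_data
            scored.append((float(match.group(2)), name))
    return min(scored)[1] if scored else None

files = [
    "sample_data",
    "weights-improvement-18-0.9730.hdf5",
    "weights-improvement-19-0.9511.hdf5",
    "weights-improvement-20-0.9285.hdf5",
]
print(best_checkpoint(files))  # weights-improvement-20-0.9285.hdf5
```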


- **Generating** text using the **trained** LSTM model.

In [None]:
# load the network weights
filename = "weights-improvement-20-0.9285.hdf5" # use the weight checkpoint file that has least loss value.
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')

 - When **preparing** the mapping of **unique** characters to integers, we must also **create** a reverse **mapping** that we can use to **convert** the integers back to **characters** so that we can understand the **predictions**.

In [None]:
int_to_char = dict((i, c) for i, c in enumerate(chars))
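
A quick round trip shows why both mappings are needed. The sketch below builds its own toy mappings (prefixed `toy_` so they do not overwrite the notebook's variables) from a short illustrative string.

```python
def build_mappings(text):
    """Build forward and reverse lookups over the distinct characters of text."""
    vocab = sorted(set(text))
    forward = {c: i for i, c in enumerate(vocab)}
    reverse = {i: c for i, c in enumerate(vocab)}
    return forward, reverse

toy_char_to_int, toy_int_to_char = build_mappings("down the rabbit-hole")

encoded = [toy_char_to_int[c] for c in "rabbit"]          # characters -> integers
decoded = "".join(toy_int_to_char[i] for i in encoded)    # integers -> characters
print(encoded, decoded)
```

Decoding the encoded sequence returns the original string, which is what lets us read the model's integer predictions as text.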

<a name = Section63></a>
### **6.3 Model Testing**

The simplest way to use the Keras **LSTM model** to make predictions is to:

 - First start off with a **seed sequence** as input, generate the next character then update the seed sequence to add the **generated** character on the end and trim off the **first** character.
 
 <br>  
<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/ytr.PNG"width="400" height="500" /></center>

<br>  
 - This process is **repeated** for as long as we want to **predict** new characters (e.g. a sequence of 1,000 characters in length).

- We can pick a **random** input pattern as our **seed sequence**, then print generated characters as we **generate** them.

In [None]:
import sys
# pick a random seed
start = np.random.randint(0, len(dataX) - 1)
pattern = list(dataX[start])  # copy so that dataX itself is not modified
generated_text = ''.join([int_to_char[value] for value in pattern])

print("Seed:")
print("\"", generated_text, "\"")

# generate characters
for i in range(1000):
    # one-hot encode the current window of character indices
    # (the window must be rebuilt each step so new characters are fed back in)
    sampled = np.zeros((1, maxlen, len(chars)))
    for t, value in enumerate(pattern):
        sampled[0, t, value] = 1.

    prediction = model.predict(sampled, verbose=0)

    # greedy decoding: take the most probable next character
    index = int(np.argmax(prediction))
    result = int_to_char[index]
    sys.stdout.write(result)

    # slide the window: append the new character and drop the oldest one
    pattern.append(index)
    pattern = pattern[1:]

print("\nDone.")

Seed:
" ng as you’re falling
through the air! do you think you could manage it?) ‘and what an
ignorant littl "
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

Running the code above first prints the selected random seed, then each character as it is generated.

We can note some observations about the generated text.

 - It generally **conforms to the line format** observed in the original text of less than 80 characters before a new line.

 - The **characters** are separated into word-like groups and most groups are actual English words (e.g. **`the`**, **`little`** and **`was`**), but many are not (e.g. **`lott`**, **`tiie`** and **`taede`**).

 - Some of the **words** in sequence make **sense** (e.g. **`and the white rabbit`**), but many do not (e.g. **`wese tilel`**).

 - The fact that this character **based** model of the book **produces** output like this is very **impressive**. It gives you a sense of the learning **capabilities** of **LSTM** networks.
 

<a name = Section64></a>
### **6.4 Building Larger Model**

![](https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/char_lstm_flow7.png)

- Keep the number of **memory** units the same at **256**, but add a second layer.

- **Increase** the number of **training** epochs from 20 to 50 and **decrease** the batch size from **128 to 64** to give the network more of an **opportunity** to be updated and learn.
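
The effect of these two changes on the number of weight updates can be worked out directly. This is a small arithmetic sketch, assuming one update per batch and that the final partial batch of each epoch is kept; `updates_per_epoch` is a hypothetical helper.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of batches (weight updates) needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

n_sequences = 144335  # number of training sequences reported earlier in the notebook

small_model_updates = updates_per_epoch(n_sequences, 128) * 20
large_model_updates = updates_per_epoch(n_sequences, 64) * 50

print(updates_per_epoch(n_sequences, 128), updates_per_epoch(n_sequences, 64))  # 1128 2256
print(small_model_updates, large_model_updates)  # 22560 112800
```

Under these assumptions the larger configuration performs five times as many updates in total, at the cost of a much longer training run.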

In [None]:
model = Sequential()
model.add(LSTM(256, input_shape=(maxlen, len(chars)), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(X, y, epochs=50, batch_size=64, callbacks=callbacks_list)

Epoch 1/50

Epoch 00001: loss improved from inf to 2.79302, saving model to weights-improvement-01-2.7930-bigger.hdf5
Epoch 2/50

Epoch 00002: loss improved from 2.79302 to 2.40192, saving model to weights-improvement-02-2.4019-bigger.hdf5
Epoch 3/50

Epoch 00003: loss improved from 2.40192 to 2.19847, saving model to weights-improvement-03-2.1985-bigger.hdf5
Epoch 4/50

Epoch 00004: loss improved from 2.19847 to 2.06576, saving model to weights-improvement-04-2.0658-bigger.hdf5
Epoch 5/50

Epoch 00005: loss improved from 2.06576 to 1.97430, saving model to weights-improvement-05-1.9743-bigger.hdf5
Epoch 6/50
   192/144335 [..............................] - ETA: 10:13 - loss: 2.0523

KeyboardInterrupt: 

In [None]:
!ls # to see list of all weight checkpoint files created

<a name = Section65></a>
### **6.5 Text Generation**

![](https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/char_lstm_flow8.png)

In [None]:
import sys
filename = "weights-improvement-47-1.2219-bigger.hdf5" # use the weight checkpoint file that has least loss value.
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
# pick a random seed
start = np.random.randint(0, len(dataX)-1)
pattern = list(dataX[start])  # copy so that dataX itself is not modified
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
    # one-hot encode the current window, matching the input format used for training
    x = np.zeros((1, maxlen, n_vocab))
    for t, value in enumerate(pattern):
        x[0, t, value] = 1.
    prediction = model.predict(x, verbose=0)
    # greedy decoding: take the most probable next character
    index = int(np.argmax(prediction))
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window forward by one character
    pattern.append(index)
    pattern = pattern[1:]
print("\nDone.")

----

<a id=section7></a>
# **7. Conclusion**
----

 - We can see that **generally** there are **fewer spelling mistakes** and the text looks more **realistic**, but is still quite **nonsensical**.

- If we have **more** data, a **bigger** model, and train longer, we may get more **interesting results**.

- To get even more interesting results, we can stack **Long Short-Term Memory** (LSTM) layers so that the model is more than **one layer deep**, as in the larger model above.

- LSTM models **outperform** simple RNNs due to their **ability** to capture **longer time** dependencies.

- We can control the level of **randomness** using the **sampling strategy**. Here, we always picked the character the **model thinks** is most likely (`argmax`); sampling from the predicted distribution instead would trade some of that confidence for randomness.
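
One common way to control this randomness is temperature sampling. The sketch below is an illustration only, not the strategy used in this notebook; the helpers `apply_temperature` and `sample_index` and the toy distribution are assumptions.

```python
import math
import random

def apply_temperature(probabilities, temperature=1.0):
    """Reweight a probability distribution; a lower temperature sharpens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space and subtract the maximum for numerical stability.
    logits = [math.log(max(p, 1e-12)) / temperature for p in probabilities]
    peak = max(logits)
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

def sample_index(probabilities, temperature=1.0, rng=random):
    """Draw one index at random according to the reweighted distribution."""
    reweighted = apply_temperature(probabilities, temperature)
    return rng.choices(range(len(reweighted)), weights=reweighted, k=1)[0]

toy_prediction = [0.7, 0.2, 0.1]  # hypothetical softmax output over three characters
sharper = apply_temperature(toy_prediction, 0.5)  # closer to always picking the argmax
flatter = apply_temperature(toy_prediction, 2.0)  # closer to uniform, more random
print(sharper)
print(flatter)
print(sample_index(toy_prediction, temperature=1.0, rng=random.Random(0)))
```

In a generation loop, `sample_index(prediction[0], temperature)` could take the place of `np.argmax(prediction)`; low temperatures tend to produce repetitive but safe text, and high temperatures more varied but noisier text.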