
# Article Generation(Text) with RNN/LSTM(Tutorial)
## Final project for phase II - EIP at MLBLR.com

**Objective:** To generate meaningful text based on a subject/entity using a recurrent neural network. 

Project based on "Generate realistic Yelp reviews with Keras" by Tony607
[[link]](https://github.com/Tony607/Yelp_review_generation)

## Table of Contents

1. <a href='#1'>Introduction</a>
2. <a href='#2'>RNNs and how+why they work</a>
3. <a href='#3'>Why RNN's don't work</a>
4. <a href='#4'>LSTM</a>
5. <a href='#5'>How+Why LSTM works</a>
6. <a href='#6'>Interesting applications</a>
7. <a href='#7'>Example with code</a>
    **Text generation on countries/articles from Wiikipedia using LSTM(Keras)**
     1. Gathering dataset
     2. Data cleaning
     3. Vectorizing words
     4. Defining helper functions
     5. Defining model
     6. Training
     7. Generate text
8. <a href='#8'>Summary</a>
9. <a href='#9'>Additional reading material</a>

## 1. Introduction
<a id='1'></a>

Traditional neural networks can’t captare temporal dependencies i.e. dependencies that vary over time, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.

Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist. Recurrent Neural Networks were created in the 1980’s but have just been recently gaining popularity from advances to the networks designs and increased computational power from graphic processing units. They’re especially useful with sequential data because each neuron or unit can use its internal memory to maintain information about the previous input. This is great because in cases of language, “I had washed my house” is much more different than “I had my house washed”. This allows the network to gain a deeper understanding of the statement.

## 2. RNNs and how+why they work
<a id='2'></a>

The general structure of the RNN is shown in the image below.
![RNN_unfolded](https://cdn-images-1.medium.com/max/1600/1*NKhwsOYNUT5xU7Pyf6Znhg.png)

--------------**FOLDED**---------------------------------------------------------------------**UNFOLDED**---------------------------------------------------------------------------

**Explanation**

**Folded** - In the above diagram, a chunk of neural network, A, looks at some input xt and outputs a value ht. A loop allows information to be passed from one step of the network to the next.A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.

**Unfolded** - That sequential information is preserved in the recurrent network’s hidden state, which manages to span many time steps as it cascades forward to affect the processing of each new example. It is finding correlations between events separated by many moments, and these correlations are called “long-term dependencies”, because an event downstream in time depends upon, and is a function of, one or more events that came before. One way to think about RNNs is this: they are a way to share weights over time. The decision a recurrent net reached at time step t-1 affects the decision it will reach one moment later at time step t. So recurrent networks have two sources of input, the present and the recent past, which combine to determine how they respond to new data, much as we do in life. The chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They’re the natural architecture of neural network to use for such data. The arrow means that long-term information has to sequentially travel through all cells before getting to the present processing cell.

**Why RNNs?**

Since RNNs preserve temporal information they can be used for applications that vary with time. There are many different applications of RNNs. A great application is in collaboration with Natural Language Processing (NLP). RNNs have been demonstrated by many people on the internet who created amazing models that can represent a language model. These language models can take input such as a large set of shakespeares poems, and after training these models they can generate their own Shakespearean poems that are very hard to differentiate from originals.

## 3. Why RNNs don't work
<a id='3'></a>

When training long texts or a lenghty video, the results from a vanilla RNN are not that good. The problem comes from the fact that at each time step during training we are using the same weights to calculate y_t. That multiplication is also done during back-propagation. The further we move backwards, the bigger or smaller our error signal becomes. This means that the network experiences difficulty in memorising words from far away in the sequence and makes predictions based on only the most recent ones.

![RNN_fail](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-longtermdependencies.png)

                (fig) The RNN cannot learn from the hidden state at the beginning of the network

## 4. LSTM
<a id='4'></a>

Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in following work. They work tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn.  LSTM module can bypass units and thus remember for longer time steps. LSTM thus have a way to remove some of the vanishing gradients problems.

![LSTM](https://cdn-images-1.medium.com/max/1600/1*J5W8FrASMi93Z81NlAui4w.png)

## 5. How+Why LSTM works
<a id='5'></a>

LSTM comprises of 4 components. They are

### i. Forget gate
The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks at ht−1 and xt, and outputs a number between 0 and 1 for each number in the cell state Ct−1. A 1 represents “completely keep this” while a 0 represents “completely get rid of this.”

![forget_gate](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-f.png)

### ii. Remember gate
The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values, C~t, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.

![remember_gate](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-i.png)

### iii. Learn gate
It’s now time to update the old cell state, Ct−1, into the new cell state Ct. The previous steps already decided what to do, we just need to actually do it.

We multiply the old state by ft, forgetting the things we decided to forget earlier. Then we add it∗C~t. This is the new candidate values, scaled by how much we decided to update each state value.

![learn_gate](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-C.png)

### iv. Use gate
Finally, we need to decide what we’re going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we’re going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.

![use_gate](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-o.png)

**The Why**

Since in LSTM, there is an additional parameter C which holds temporal information from earlier stages in the network that is needed in the later stages of the network. This gives anedge over the vanilla RNN by a large margin. Though LSTMs are quite difficult to understand, their implementation is pretty easy and the results are relatable to the task at hand.

## 6. Interesting applications
<a id='6'></a> 

 - Are you into gaming and bots? Check out the [DotA 2 bot by Open AI](https://blog.openai.com/dota-2/)
 - How about automatically [adding sounds to silent movies?](https://www.youtube.com/watch?time_continue=1&v=0FW99AQmMc8)
 - Here is a cool tool for [automatic handwriting generation](http://www.cs.toronto.edu/~graves/handwriting.cgi?text=My+name+is+Luka&style=&bias=0.15&samples=3)
 - Amazon's voice to text using [high quality speech recognition, Amazon Lex.](https://aws.amazon.com/lex/faqs/)
 - Facebook uses RNN and LSTM technologies for [building language models](https://code.facebook.com/posts/1827693967466780/building-an-efficient-neural-language-model-over-a-billion-words/)
 - Netflix also uses RNN models - [here is an interesting read](https://arxiv.org/pdf/1511.06939.pdf)

## 7. Example with code
<a id='7'></a>

## Text generation on countries/articles from Wiikipedia using LSTM(Keras)
The following implementation is an example of text generation with LSTM. We will take the summary of a Wikipedia articles of countries and generate text based on it.

## 7.1. Gathering dataset

**Current objective:** To get text from the wikipedia page and store it as as csv.

The first step is to gather text data for training the model. Wikipedia has an entire dump of all articles available as a zip format. But this contains text of topics belonging to all categories. We are specifically interested in country articles alone. So a new dataset has to be created for this purpose.

Wikipedia has a python library which allows for easy text scraping from the article page. It can be limited to only title or summary or even the entire page.

**Link to Wikipedia library documentation:**[[link]](https://pypi.org/project/wikipedia/)

Below code cell extracts summary of countries in the list and saves them as a csv file. It takes about 1-2 seconds per title depending on the internet speed.

**Note 1:** If needed to train on any other data apart from countries, replace the list of countries with that category required to train

**Note 2:** The following cell needs to be run only once for the first time when gathering data.

In [None]:
## imports 
import wikipedia  
from tqdm import tqdm # to print current progress status
import csv

## Create a empty dictionary to store the descriptions
summary_dict={}

## create a list of article headings that the model needs to be trained on
countries_list=["Afghanistan","Albania","Algeria","Andorra","Angola","Anguilla","Antigua & Barbuda",
"Argentina","Armenia","Australia","Austria","Azerbaijan","Bahamas","Bahrain","Bangladesh","Barbados",
"Belarus","Belgium","Belize","Benin","Bermuda","Bhutan","Bolivia","Bosnia & Herzegovina","Botswana",
"Brazil","Brunei Darussalam","Bulgaria","Burkina Faso","Burma","Burundi","Cambodia","Cameroon",
"Canada","Cape Verde","Cayman Islands","Central African Republic","Chad","Chile","China","Colombia","Comoros",
"Democratic Republic of the Congo","Costa Rica","Croatia","Cuba","Cyprus","Czech Republic",
"Denmark","Djibouti","Dominica","Dominican Republic","Ecuador","Egypt",
"El Salvador","Equatorial Guinea","Eritrea","Estonia","Ethiopia","Fiji","Finland","France","French Guiana","Gabon","Gambia",
"Democratic Republic of Georgia","Germany","Ghana","Great Britain","Greece","Grenada","Guadeloupe",
"Guatemala","Guinea","Guinea-Bissau","Guyana","Haiti","Honduras","Hungary","Iceland","India",
"Indonesia","Iran","Iraq","Israel and the Occupied Territories","Italy","Ivory Coast","Jamaica","Japan","Jordan",
"Kazakhstan","Kenya","Kosovo","Kuwait","Kyrgyzstan","Laos","Latvia","Lebanon","Lesotho","Liberia","Libya","Liechtenstein","Lithuania",
"Luxembourg","Republic of Macedonia","Madagascar","Malawi","Malaysia","Maldives","Mali","Malta",
"Martinique","Mauritania","Mauritius","Mayotte","Mexico","Monaco","Mongolia","Montenegro",
"Montserrat","Morocco","Mozambique","Namibia","Nepal","Netherlands","New Zealand","Nicaragua","Niger","Nigeria","North Korea","Norway",
"Oman","Pacific Islands","Pakistan","Panama","Papua New Guinea","Paraguay","Peru","Philippines","Poland",
"Portugal","Puerto Rico","Qatar","Romania","Russian Federation",
"Rwanda","Saint Kitts and Nevis","Saint Lucia","Saint Vincents and the Grenadines","Samoa","Sao Tome and Principe",
"Saudi Arabia","Senegal","Serbia","Seychelles","Sierra Leone",
"Singapore","Slovakia","Slovenia","Solomon Islands","Somalia","South Africa","South Korea","South Sudan",
"Spain","Sri Lanka","Sudan","Suriname","Swaziland","Sweden","Switzerland","Syria",
"Tajikistan","Tanzania","Thailand","Timor Leste","Togo","Trinidad & Tobago","Tunisia","Turkey",
"Turkmenistan","Turks & Caicos Islands","Uganda","Ukraine","United Arab Emirates","United States of America",
"Uruguay","Uzbekistan","Venezuela","Vietnam","Virgin Islands (UK)","Virgin Islands (US)","Yemen",
"Zambia","Zimbabwe"]

## loop through each title to get the summary of the country and store them in the dictionary created earlier
for country in tqdm(countries_list):
    summ=wikipedia.summary(country)
    summary_dict[country]=summ

## create a csv file with the data in the dictionary
with open('country_desc.csv','w', encoding='utf8') as f:
    heads=["country", "description"]
    w=csv.DictWriter(f,heads)
    w.writeheader()
    for key, val in sorted(summary_dict.items()):
            row = {'country': key, 'description':val}
            w.writerow(row) 

## 7.2. Data cleaning
**Currect objective:** To clean and preprocess the text that is capable of being fed into the network and store it in a txt file

Now that we have our data, we need to clean it and preprocess it. The data in the csv is in raw form and it needs to be processed before it can be fed to the network. This process is donw in a number of steps

In [2]:
import pandas as pd
df=pd.read_csv('country_desc.csv')
print("Shape: ", df.shape)
df.head(4)

Shape:  (195, 2)


Unnamed: 0,country,description
0,Afghanistan,Afghanistan ( ( listen); Pashto/Dari: افغانستا...
1,Albania,Albania ( ( listen) a(w)l-BAY-nee-ə; Albanian:...
2,Algeria,"Algeria (; Arabic: الجزائر‎ al-Jazā'ir, famila..."
3,Andorra,"Andorra ( ( listen); Catalan: [ənˈdorə], local..."


So we have a total of 195 countries and their description. The text also has a few foreign characters such as "الجزائر al-Jazā'ir". Such characters are tough to be understood by the neural network unless provided a large number of similar language.

We ultimately want only the description for training the model. One of the best approach is to have a text file that has all the descriptions. In the following cells, we will clean and preprocess the data.

In [5]:
## preview of the text description
print("SAMPLE TEXT")
df['description'][35]

SAMPLE TEXT


'The Cayman Islands ( or ) is an autonomous British Overseas Territory in the western Caribbean Sea. The 264-square-kilometre (102-square-mile) territory comprises the three islands of Grand Cayman, Cayman Brac and Little Cayman located south of Cuba, northeast of Costa Rica, north of Panama, east of Mexico and northwest of Jamaica. Its population is approximately 60,765, and its capital is George Town.\r\nThe Cayman Islands is considered to be part of the geographic Western Caribbean Zone as well as the Greater Antilles. The territory is often considered a major world offshore financial haven for international businesses and many wealthy individuals.'

In [7]:
## Preprocessing #1

description=df[['description']]     ## ceating a new dataframe with only descriptions

## To remove the carriage return(\r) in the text, we use a string replace function and replace with nothing('')
description=description.replace({r'\n+': ''}, regex=True)

## Drop any duplicate descriptions(highly unlikely that this dataset will have any!)
description=description.drop_duplicates()

## using an astype(string) to convert any leftover integers or other data types in the text
description['description']=description['description'].astype(str)

The text has to be in a format called "utf-8" for it to be understood by the system. So the text has to be converted to that format.

In [9]:
for i in range(0,len(description)):  ## looping over the entire dataframe
    ## first we encode the text to ascii. ASCII stands for American Standard Code for Information Interchange.
    ## It has a numerical value for A-Z, a-z, 0-9 and other standard symbols. The following line encodes them to
    ## ascii values and then decodes them back to utf-8
    
    y=description['description'][i].encode("ascii", errors="ignore").decode('utf-8')
    
    ## Adding the decoded text to the dataframe
    description['description'][i]=y

print("SAMPLE TEXT")
print(description['description'][20])

SAMPLE TEXT
Bermuda () is a British Overseas Territory in the North Atlantic Ocean. It is approximately 1,070 km (665 mi) east-southeast of Cape Hatteras, North Carolina; 1,236 km (768 mi) south of Cape Sable Island, Nova Scotia; and 1,759 km (1,093 mi) north of Cuba. The capital city is Hamilton. Bermuda is self-governing, with its own constitution and its own government, which enacts local laws, while the United Kingdom retains responsibility for defence and foreign relations.Bermuda's two largest economic sectors are offshore insurance and reinsurance, and tourism. Bermuda had one of the world's highest GDP per capita for most of the 20th century. Recently, its economic status has been affected by the global recession. The island has a subtropical climate and lies in the hurricane belt and prone to related severe weather; however, it is somewhat protected by a coral reef that surrounds the island and its position at the north of the belt, which limits the direction and severity of 

In [10]:
## Now to save the file as a csv file for reuse later.
filename='description_only.csv'
description.to_csv(filename, index=False, encoding='utf-8')

In [11]:
## Now saving as a txt file, that can be directly fed into the model.
filename='description_text.txt'
description.to_csv(filename, header=None, index=None, sep=' ')

## 7.3. Vectorizing words

**Current Objective:** To encode the characters as integers and store them in a dictionary.

The network can only work with numbers abd it does not have the capacity to work with words or letters. So we have toc onvert the letters into numbers that the network can understand. This process is called **Vectorization**. Here we assign a number to evry unique character in the text. 

Encoding the characters as integers makes it easier to use as input in the network.

In [1]:
## making essential imports

from __future__ import print_function
import numpy as np
import random
import sys
import io

In [2]:
## open the txt file and store it in the variable 'text'

with io.open('description_text.txt') as f:
    text = f.read()
print('corpus length:', len(text))

corpus length: 543952


In [3]:
## Take out unique characters in text using set and store them as a list

chars = sorted(list(set(text)))
print('total chars:', len(chars))

total chars: 81


In [4]:
## Map each character to integer
char_indices = dict((c, i) for i, c in enumerate(chars))

## Map each integer to character. This is the reverse process of the above line. Used to convert predicted
## integers to characters

indices_char = dict((i, c) for i, c in enumerate(chars))

#### Create sentence list

Now, we have to create the training data for our LSTM. We create two lists:
    1. **sentences:** This list contains the sequences of words (i.e. a list of words) used to train the model,
    2. **next_chars:** This list contains the next words for each sequences of the sequences list.
    
**How it works**: To create the first sequence of characters, we take the 50th first characters in the text. The character number 51 is the next set of this first sequence, and is added to the next_chars list.

Then we jump by a step of 1 (step = 1) in the list of characters, to create the second sequence of words and retrieve the second next char.

We iterate this job until the end of the list of chars.

In [5]:
maxlen = 50    ##Number of characters that the LSTM layer looks at a time
step = 1
sentences = []
next_chars = []

In [6]:
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

nb sequences: 543902


Next step is to create the matrix X and y to be the data inputs of our model:

1.  X : the matrix of the following dimensions:

    number of sequences,
    number of words in sequences,
    number of words in the vocabulary.

2. y : the matrix of the following dimensions:

    number of sequences,
    number of words in the vocabulary.
     
    For each word, we retrieve its index in the vocabulary, and we set to 1 its position in the matrix. X and y are our training data.

In [7]:
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Vectorization...


## 7.4. Defining helper functions

**Current Objective:** To define helper functions that will be used for training and prediction
1. sample - to sample an index from a probability array
2. on_epoch_end - invoked at end of each epoch. Prints generated text.

In [8]:
from keras.callbacks import LambdaCallback, ModelCheckpoint
from keras.models import Sequential, load_model
from keras.layers import Dense, Activation, Dropout
from keras.layers import LSTM
from keras.optimizers import RMSprop, Adam
from keras.utils.data_utils import get_file

  from ._conv import register_converters as _register_converters
Using TensorFlow backend.


We will not take the words with the highest prediction (or the generation of text will be boring), but we would like to insert some uncertainties, and let the solution, sometime, to pick-up words with less good prediction.
Sample() will draw randomly a word from our vocabulary.

However, the probability for a word to be drawn will depends directly on its probability to be the next word.

In order to tune this probability, we introduce a “temperature” or "diversity" to smooth or sharpen its value.

 - if temperature = 1.0, the probability for a word to be drawn is similar to the probability for the word to be the next one in the sequence (the output of the word prediction model), compared to other words in the dictionary,
 - if temperature is big (much bigger than 1), the range of probabilities is shorten: the probabilities for all words to be the next one will increase. More variety of words will be picked-up from the vocabulary, because more words will have high probabilities.
 - if temperature is small (close to 0), small probabilities will be avoided (they will be set to a value closed to 0). Less words will be picked-up from the vocabulary.

In [10]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

In on_opech_end(), a preview of the training process can be obtained. After every 2 epochs, the model predicts based on what it has learnt till then. This gives an understanding on how well the model is training.

In [13]:
def on_epoch_end(epoch, logs):
    # Function invoked at end of each epoch. Prints generated text.
    if(epoch%2==0):
        print('----- Generating text after Epoch: %d' % epoch)

        start_index = random.randint(0, len(text) - maxlen - 1)
        for diversity in [0.5, 1.0]:
            print('----- diversity:', diversity)

            generated = ''
            sentence = text[start_index:start_index+maxlen]
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            sys.stdout.write(generated)

            for i in range(400):
                x_pred = np.zeros((1, maxlen, len(chars)))
                for t, char in enumerate(sentence):
                    x_pred[0, t, char_indices[char]] = 1.

                preds = model.predict(x_pred, verbose=0)[0]
                next_index = sample(preds, diversity)
                next_char = indices_char[next_index]

                generated += next_char
                sentence = sentence[1:] + next_char

                sys.stdout.write(next_char)
                sys.stdout.flush()
            print()

## 7.5. Defining model

**Current Objective:** Define a model for training

Here is the architecture for the model for this tutorial:

1. Sequential model with LSTM layer of 128 units. 
 - 128 units is fairly a lower number of units. But given that the training set has only 500k characters and they are very diverse, 128 units could do the task. If the data set is huge and the pattern is complex, then a LSTM with larger number of units is required.

2. Dropout layer of 0.5; avoids quick divergence
3. A dense layer with the number of characters as size(81 in this case)
4. Softmax activation layer


**Optimizer** The optimizer used is a RMSprop with a learning rate of 0.02.
 - No specific reason for the choice of RMSprop here. Adam oprimizer can also be used.
 
Then compiling the model created

In [9]:
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dropout(0.6))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

optimizer = RMSprop(lr=0.02)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Build model...


In [12]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 128)               107520    
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 81)                10449     
_________________________________________________________________
activation_1 (Activation)    (None, 81)                0         
Total params: 117,969
Trainable params: 117,969
Non-trainable params: 0
_________________________________________________________________


#### Defining callbacks

In [14]:
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)
checkpoint=ModelCheckpoint('best_model.h5', monitor='loss', mode='min', save_best_only=True, verbose=1)

## 7.6. Training

Now we train the model.

Here batch size denotes the number of sequences of maxlen that goes into the model at a time.
-----> Batch size=512 for 20 epochs.

In [16]:
model.fit(x, y,batch_size=512, epochs=20, callbacks=[print_callback, checkpoint])

Epoch 1/20
----- Generating text after Epoch: 0
----- diversity: 0.5
----- Generating with seed: "t country on mainland South America after Uruguay,"
t country on mainland South America after Uruguay, the the the ending the Frence was the the Sea for independence bec

  after removing the cwd from sys.path.


ting a countries of the southern country is a mintrity and the and the Bankist As and a meritary in the Horin Inlian country in 1996, and the Republic of Countrie the country of war deveral the north and state on the the country in 1996, for the South States resident sector of ground in the in the republic of a west and the is a fe
----- diversity: 1.0
----- Generating with seed: "t country on mainland South America after Uruguay,"
t country on mainland South America after Uruguay, gic war of by Slating bends purturan to West.In  Socaint, Eurian. East the Vectake uumandes northtencted the century nowe in a to Mevitipe in the Derges decerturly 2018, low compris deporge's pome isaity is Molilid of gar Uin congina porricer in the 7dils, 1899, which nice of European unialst member of the kingdom of populous and independence 373 cariticen Gan Alantly and economic untir sian undi

Epoch 00001: loss improved from inf to 1.83732, saving model to best_model.h5
Epoch 2/20

Epoch 00002: loss impr

ompany of Cecil Rhodes first demarcated the presences coust was is an to the tear base heg and the most conssural and was incount and the many Nordia governed in African, which later the colvercoely to followed as ethnic country huls to later exper to creation country. ned is are Sea centuried (85 km (1 mi) (201 narrita the Lamm of percontito fulnsal serge. The alforcogal bethe archipelanity.In 1996, as ethnic only of 2010. Andetian cooperation o

Epoch 00011: loss improved from 1.57705 to 1.57202, saving model to best_model.h5
Epoch 12/20

Epoch 00012: loss improved from 1.57202 to 1.56964, saving model to best_model.h5
Epoch 13/20
----- Generating text after Epoch: 12
----- diversity: 0.5
----- Generating with seed: "a few hundred metres long.A mid-sized country of j"
a few hundred metres long.A mid-sized country of joint is the has a sovereign largest to the republic and the republic of the Southshernation was internation in the first state of the Area to the south and independence 

<keras.callbacks.History at 0x1f2bc972a90>

### Testing

As we can see from the previews above, the model has done a fairly good job of capturing the pattern between text. The text generated is not exactly like we wanted, but it is in a readable format.

Let us try generating text for an entirely imaginary country... **Wakanda**

![Wakanda](https://upload.wikimedia.org/wikipedia/en/9/96/Wakanda_in_Black_Panther_teaser_poster.jpeg)

Seed text: "Wakanda is a country in the southern part of Afric"

In [20]:
for diversity in [0.5, 1.0]:
    print('----- temperature/diversity:', diversity)
    generated = ''
    sentence = "Wakanda is a country in the southern part of Afric"
    generated += sentence
    print('----- Generating with seed: "' + sentence + '"')
    sys.stdout.write(generated)

    for i in range(500):
        x_pred = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_indices[char]] = 1.

        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, diversity)
        next_char = indices_char[next_index]

        generated += next_char
        sentence = sentence[1:] + next_char

        sys.stdout.write(next_char)
        sys.stdout.flush()
    print()

----- temperature/diversity: 0.5
----- Generating with seed: "Wakanda is a country in the southern part of Afric"
Wakanda is a country in the southern part of African country in settled by national in south between Presid

  after removing the cwd from sys.path.


ent and the nation in 1990 in the action and be a state and a international population of the Congo of the European Athian to the north of the Independence of its the a successive south of the the United Nations in the country is a founding capital is ad conslise and a state of the country has a country is has official independence in the Capital Seration in the 1915 becement reserves and the country. For the to the south of the world in t
----- temperature/diversity: 1.0
----- Generating with seed: "Wakanda is a country in the southern part of Afric"
Wakanda is a country in the southern part of African population for shouring the native prossen regions (nater seads of Argentinas geoprote in its wead population has a lands constitution chambly Heugh union, cilic which of Burses and cathereling millions proration in eite is reoiss a Hunguagia, Touemara-and Republic of The yingepigal Portugal social promalling Karitle, was since memres and in its corsising undly mincar by its economy. he

**Baseline Infernce** 

The model has trained pretty well, with readable text and imaginary places like "Congo of the European Athian","chambly Heugh union", "Republic of The yingepigal Portugal".

But the readability decrease when the temperature(randomness increases)

Let us train for an additional 10 epochs and see if the performance improves.

**Additional training for 10 epochs**

In [17]:
model.fit(x, y,batch_size=512, epochs=10, callbacks=[print_callback, checkpoint])

Epoch 1/10
----- Generating text after Epoch: 0
----- diversity: 0.5
----- Generating with seed: " with Finland losing parts of Karelia, Salla, Kuus"
 with Finland losing parts of Karelia, Salla, Kuusia is the country of par

  after removing the cwd from sys.path.


t of the west of when the Council of constitution in the world and the President Empire, and the Bastarai Part of the population of the island and subse and form and the Iran exporten of the benetial and large in the north and east of the Antillian severy in the north and east of South end of the south of the population of the most political settlers from the to the country
----- diversity: 1.0
----- Generating with seed: " with Finland losing parts of Karelia, Salla, Kuus"
 with Finland losing parts of Karelia, Salla, Kuushans, the Par. The major used war regions thate joce, verling as. S, the Man-presentured the largest states the west of the north. Spain is the south, in the largest is a centuries the menied Movement (Ultlabi which of the presud was monarchy largume the world, Independent Ocean. Couse of Empire was the Aliwu Safgen democratios, city of Partyive is World is People's territory of K.cank Sider are a

Epoch 00001: loss did not improve from 1.55359
Epoch 2/10

Epoch 0000

<keras.callbacks.History at 0x1f2bd42ddd8>

**Inference**

For 3 epochs, the model trained but then started overfitting and the loss starts increasing. So the ideal stopping point would be at 23 wpochs in this case.

Now, let us load the best model and generate text out of it and see how it performs

In [21]:
model.load_weights('best_model.h5')

## 7.7 Generating text

In [23]:
generated = ''
sentence = "Wakanda is a country in the southern part of Afric"
generated += sentence
print('----- Generating with seed: "' + sentence + '"')
sys.stdout.write(generated)

for i in range(500):
    x_pred = np.zeros((1, maxlen, len(chars)))
    for t, char in enumerate(sentence):
        x_pred[0, t, char_indices[char]] = 1.

    preds = model.predict(x_pred, verbose=0)[0]
    next_index = sample(preds, 0.5)
    next_char = indices_char[next_index]
    generated += next_char
    sentence = sentence[1:] + next_char
    sys.stdout.write(next_char)
    sys.stdout.flush()
print()

----- Generating with seed: "Wakanda is a country in the southern part of Afric"
Wakanda is a country in the southern part of African border of the Nether and and sence in the world and becoming a mining with a southers the southeast member of the 

  after removing the cwd from sys.path.


independence with the population and west and a one was established population of the constitution of the control of the independence is official centuries constitution by the , and country to independent country in the Independence in the  in the miginal under in the world and the world part of the South Turkic African country is a population of the Albopicar of the most of the s


### Not bad!!!

![obama](https://images.frenchly.us/2016/11/not-bad-obama.jpg)

## 8. Summary
<a id='8'></a>

As we can see above, the model was able to grasp most of the features of the original text.With 500k charactes this is pretty good!

Here are some cool places generated by the model during training
 - "member of the nation of cherition of Prostin"
 - "Warmanoca War"
 - "Astandan Africa
 - "Sea for independence"
 - "Alakok-territory of Yorea"
 - "Aonartanian Indies Asia"
 - "northern excintare"

## 9. Additional resources
<a id='9'></a>

Andrej Karpathy's blog: http://karpathy.github.io/2015/05/21/rnn-effectiveness/


Colah's blog: http://colah.github.io/posts/2015-08-Understanding-LSTMs/