<h1 style="text-align:center">Deep Learning   </h1>
<h1 style="text-align:center"> Lab Session 3 - 3 Hours </h1>
<h1 style="text-align:center">Long Short Term Memory (LSTM) for Language Modeling</h1>

<div class="alert alert-info">
<b> Student 1:</b> Daniele Reda 
<br>
<b> Student 2:</b> Matteo Romiti
</div>
 
In this Lab Session, you will build and train a Recurrent Neural Network based on Long Short-Term Memory (LSTM) units for a next-word prediction task.

Answers and experiments should be made by groups of one or two students. Each group should fill in and run the appropriate notebook cells. 
Once you have completed all of the code implementations and successfully answered each question, you may finalize your work by exporting the IPython Notebook as a PDF document using Print as PDF (Ctrl+P). Do not forget to run all your cells before generating your final report, and do not forget to include the names of all participants in the group. The lab session should be completed by June 9th 2017.

Send your PDF file to benoit.huet@eurecom.fr and olfa.ben-ahmed@eurecom.fr using **[DeepLearning_lab3]** as the subject of your email.

#  Introduction

You will train an LSTM to predict the next word using a sample short story. The LSTM will learn to predict the next item of a sentence from the 3 previous items (given as input). Punctuation marks are considered as dictionary items, so they can be predicted too. Figure 1 shows the LSTM and the process of next-word prediction. 

<img src="lstm.png" height="370" width="370"> 


Each word (and punctuation mark) in the text is encoded by a unique integer, which is the index of that word (or punctuation mark) in the dictionary. The network outputs one score per dictionary entry; the index of the highest score is looked up in the reversed dictionary (Section 1.2) to obtain the predicted word. For example, if the predicted index is 55, the predicted word is "company". 



You will use a sample short story from Aesop’s Fables (http://www.taleswithmorals.com/) to train your model. 


<font size="3" face="verdana" > <i> "There was once a young Shepherd Boy who tended his sheep at the foot of a mountain near a dark forest.

It was rather lonely for him all day, so he thought upon a plan by which he could get a little company and some excitement.
He rushed down towards the village calling out "Wolf, Wolf," and the villagers came out to meet him, and some of them stopped with him for a considerable time.
This pleased the boy so much that a few days afterwards he tried the same trick, and again the villagers came to his help.
But shortly after this a Wolf actually did come out from the forest, and began to worry the sheep, and the boy of course cried out "Wolf, Wolf," still louder than before.
But this time the villagers, who had been fooled twice before, thought the boy was again deceiving them, and nobody stirred to come to his help.
So the Wolf made a good meal off the boy's flock, and when the boy complained, the wise man of the village said:
"A liar will not be believed, even when he speaks the truth."  "</i> </font>.    







Start by loading the necessary libraries and resetting the default computational graph. For more details about the rnn packages, we suggest you take a look at https://www.tensorflow.org/api_guides/python/contrib.rnn

In [1]:
import numpy as np
import collections # used to build the dictionary
import random
import time
import pickle # may be used to save your model 
import matplotlib.pyplot as plt
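# Import TensorFlow and rnn
import tensorflow as tf
from tensorflow.contrib import rnn

# Reset the default graph so that re-running the notebook starts from a clean graph
tf.reset_default_graph()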

# Target log path
logs_path = 'lstm_words'
writer = tf.summary.FileWriter(logs_path)

# Next-word prediction task

## Part 1: Data  preparation

### 1.1. Loading data

Load and split the text of our story

In [2]:
def load_data(filename):
    with open(filename) as f:
        lines = f.readlines()
    # Lowercase every line, split it on whitespace and flatten all tokens
    # into a single 1-D array (this also works when the file has several lines)
    words = [word for line in lines for word in line.strip().lower().split()]
    data = np.array(words)
    print(data)
    return data

train_file ='data/story.txt'
train_data = load_data(train_file)
print("Loaded training data...")
print(len(train_data))

['there' 'was' 'once' 'a' 'young' 'shepherd' 'boy' 'who' 'tended' 'his'
 'sheep' 'at' 'the' 'foot' 'of' 'a' 'mountain' 'near' 'a' 'dark' 'forest'
 '.' 'it' 'was' 'rather' 'lonely' 'for' 'him' 'all' 'day' ',' 'so' 'he'
 'thought' 'upon' 'a' 'plan' 'by' 'which' 'he' 'could' 'get' 'a' 'little'
 'company' 'and' 'some' 'excitement' '.' 'he' 'rushed' 'down' 'towards'
 'the' 'village' 'calling' 'out' 'wolf' ',' 'wolf' ',' 'and' 'the'
 'villagers' 'came' 'out' 'to' 'meet' 'him' ',' 'and' 'some' 'of' 'them'
 'stopped' 'with' 'him' 'for' 'a' 'considerable' 'time' '.' 'this'
 'pleased' 'the' 'boy' 'so' 'much' 'that' 'a' 'few' 'days' 'afterwards'
 'he' 'tried' 'the' 'same' 'trick' ',' 'and' 'again' 'the' 'villagers'
 'came' 'to' 'his' 'help' '.' 'but' 'shortly' 'after' 'this' 'a' 'wolf'
 'actually' 'did' 'come' 'out' 'from' 'the' 'forest' ',' 'and' 'began' 'to'
 'worry' 'the' 'sheep,' 'and' 'the' 'boy' 'of' 'course' 'cried' 'out'
 'wolf' ',' 'wolf' ',' 'still' 'louder' 'than' 'before' '.' 'but' 't

### 1.2. Symbols encoding

The LSTM inputs can only be numbers. One way to convert words (or any other symbols) to numbers is to assign a unique integer to each word. This assignment is often based on frequency of occurrence, for efficient coding.

Here, we define a function to build an indexed word dictionary (word->number). The "build_vocabulary" function builds both:

- Dictionary : used for encoding words to numbers for the LSTM inputs 
- Reverted dictionary : used for decoding the outputs of the LSTM into words (and punctuation).

For example, in the story above, we have **113** distinct words (punctuation marks included). The "build_vocabulary" function builds a dictionary with entries such as 'the': 0, ',': 1, 'company': 55, ...


In [3]:
def build_vocabulary(words):
    count = collections.Counter(words).most_common()
    dic= dict()
    for word, _ in count:
        dic[word] = len(dic)
    reverse_dic= dict(zip(dic.values(), dic.keys()))
    return dic, reverse_dic
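
To make the frequency-based indexing concrete, here is a minimal sketch that applies the same logic to a toy token list. The toy sentence and the names `toy_words`, `toy_dic` and `toy_reverse` are illustrative assumptions, not part of the lab data; the function body is copied from the cell above so that the snippet runs on its own.

```python
import collections

def build_vocabulary(words):
    # The most frequent tokens receive the smallest indices
    count = collections.Counter(words).most_common()
    dic = {}
    for word, _ in count:
        dic[word] = len(dic)
    reverse_dic = dict(zip(dic.values(), dic.keys()))
    return dic, reverse_dic

# Toy token list (illustrative assumption, not the lab story)
toy_words = "the wolf , the boy , the sheep .".split()
toy_dic, toy_reverse = build_vocabulary(toy_words)

# Encode the words as integers, then decode them back
encoded = [toy_dic[w] for w in toy_words]
decoded = [toy_reverse[i] for i in encoded]
print(encoded)
print(decoded == toy_words)
```

Encoding followed by decoding should give back the original tokens, which is a quick sanity check before feeding the indices to the network.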


<div class="alert alert-warning">
Run the cell below to display the vocabulary
</div>

In [4]:
dictionary, reverse_dictionary = build_vocabulary(train_data)
vocabulary_size= len(dictionary) 
print("Dictionary size (Vocabulary size) = ", vocabulary_size)
print("\n")
print("Dictionary : \n")
print(dictionary)
print("\n")
print("Reverted Dictionary : \n" )
print(reverse_dictionary)

Dictionary size (Vocabulary size) =  113


Dictionary : 

{'the': 0, ',': 1, 'a': 2, 'and': 3, '.': 4, 'wolf': 5, 'boy': 6, 'he': 7, 'to': 8, 'of': 9, 'out': 10, 'was': 11, 'his': 12, 'him': 13, 'so': 14, 'villagers': 15, 'this': 16, 'who': 17, 'forest': 18, 'for': 19, 'thought': 20, 'some': 21, 'village': 22, 'came': 23, 'them': 24, 'time': 25, 'again': 26, 'help': 27, 'but': 28, 'come': 29, 'before': 30, 'when': 31, 'there': 32, 'once': 33, 'young': 34, 'shepherd': 35, 'tended': 36, 'sheep': 37, 'at': 38, 'foot': 39, 'mountain': 40, 'near': 41, 'dark': 42, 'it': 43, 'rather': 44, 'lonely': 45, 'all': 46, 'day': 47, 'upon': 48, 'plan': 49, 'by': 50, 'which': 51, 'could': 52, 'get': 53, 'little': 54, 'company': 55, 'excitement': 56, 'rushed': 57, 'down': 58, 'towards': 59, 'calling': 60, 'meet': 61, 'stopped': 62, 'with': 63, 'considerable': 64, 'pleased': 65, 'much': 66, 'that': 67, 'few': 68, 'days': 69, 'afterwards': 70, 'tried': 71, 'same': 72, 'trick': 73, 'shortly': 74, 'after': 

## Part 2 : LSTM Model in TensorFlow

<div class="alert alert-warning">
Now that you have defined how the data will be modeled, you will develop an LSTM model to predict the word following a sequence of 3 words.
</div>

### 2.1. Model definition

Define a 2-layer LSTM model.  

For this, use the following classes and functions from the tensorflow.contrib library:
<ul>
<li>rnn.BasicLSTMCell(number of hidden units) 
<li>rnn.static_rnn(rnn_cell, data, dtype=tf.float32)
<li>rnn.MultiRNNCell(,)
</ul>
<br>
You may need some tensorflow functions (https://www.tensorflow.org/api_docs/python/tf/) :
<ul>
<li>tf.split
<li>tf.reshape 
<li>...
</ul>

In [5]:
def lstm_model(x, w, b):
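    # Flatten to [batch, n_input], then split into n_input tensors of shape [batch, 1]
    x = tf.reshape(x, [-1, n_input])
    x = tf.split(x,n_input,1)
    # Stack two LSTM layers of n_hidden units each
    rnn_cell = rnn.MultiRNNCell([rnn.BasicLSTMCell(n_hidden),rnn.BasicLSTMCell(n_hidden)])
    # Unroll the network over the n_input time steps
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)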
    return tf.matmul(outputs[-1], w['out']) + b['out']
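
The reshape and split steps turn one batch of shape `[batch, n_input, 1]` into a list of `n_input` tensors of shape `[batch, 1]`, one per time step, which is the list-of-tensors input that `static_rnn` takes. The sketch below mimics these two steps with NumPy so that the shapes can be inspected without building a graph. Here `np.split` stands in for `tf.split`, and the sample values are made up.

```python
import numpy as np

n_input = 3
# One batch of two made-up samples, each holding 3 word indices
batch = np.array([[[0.0], [5.0], [11.0]],
                  [[2.0], [6.0], [9.0]]])      # shape (2, 3, 1)

flat = batch.reshape(-1, n_input)              # shape (2, 3)
steps = np.split(flat, n_input, axis=1)        # list of 3 arrays of shape (2, 1)

print(flat.shape)
print(len(steps), steps[0].shape)
```

Each element of `steps` holds the word index seen at one time step for every sample in the batch.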

<div class="alert alert-warning">
Training Parameters and constants
</div>

In [9]:
# Training Parameters
learning_rate = 0.001

epochs = 100000
display_step = 10000
n_input = 3
n_hidden = 64


In [7]:
# tf Graph input
x = tf.placeholder("float", [None, n_input, 1])
y = tf.placeholder("float", [None, vocabulary_size])

# LSTM  weights and biases
weights = { 'out': tf.Variable(tf.random_normal([n_hidden, vocabulary_size]))}
biases = {'out': tf.Variable(tf.random_normal([vocabulary_size])) }


#build the model
pred = lstm_model(x, weights, biases)

<div class="alert alert-warning">
Define the Loss/Cost and optimizer
</div>

In [8]:
# Cross Entropy loss
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
# RMSProp Optimizer
optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate).minimize(cost)

# Model evaluation
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
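
As a rough illustration of what the loss and accuracy nodes compute for a single training sample, the following pure-Python sketch evaluates the softmax cross entropy and the argmax comparison by hand. The helper names `softmax_cross_entropy` and `is_correct`, as well as the logits, are hypothetical and only serve the example; the TensorFlow ops above remain the ones used for training.

```python
import math

def softmax_cross_entropy(logits, one_hot):
    # Shift by the maximum logit for numerical stability
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    # -sum(label * log(softmax(logits))): only the true class contributes
    return -sum(t * (v - log_norm) for v, t in zip(logits, one_hot))

def is_correct(logits, one_hot):
    # Counterpart of tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1)) for one sample
    return logits.index(max(logits)) == one_hot.index(max(one_hot))

logits = [2.0, 0.5, 0.1, -1.0]   # made-up scores for a 4-word vocabulary
target = [1.0, 0.0, 0.0, 0.0]    # one-hot label: the true word has index 0

print(softmax_cross_entropy(logits, target), is_correct(logits, target))
```

With uniform logits the loss equals the logarithm of the vocabulary size, which gives a useful reference value for an untrained model.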

<div class="alert alert-warning">
We give you here the Test Function
</div>

In [10]:
def test(sentence, session, verbose=False):
    sentence = sentence.strip()
    words = sentence.split(' ')
    if len(words) != n_input:
        print("sentence length should be equal to", n_input, "!")
        return None
    try:
        # Encode each word with its dictionary index
        symbols_inputs = [dictionary[word] for word in words]
    except KeyError as err:
        # Report the first word that is missing from the dictionary
        print(" ".join(["Word", str(err.args[0]), "not in dictionary"]))
        return None
    keys = np.reshape(np.array(symbols_inputs), [-1, n_input, 1])
    prediction_hot = session.run(pred, feed_dict={x: keys})
    # np.argmax works on the returned array and adds no new op to the graph
    prediction_hot_index = int(np.argmax(prediction_hot, axis=1)[0])
    words.append(reverse_dictionary[prediction_hot_index])
    sentence = " ".join(words)
    if verbose:
        print(sentence)
    return reverse_dictionary[prediction_hot_index]

## Part 3 : LSTM Training  

<div class="alert alert-warning">
In the training process, at each training step (called an epoch in the code), 3 words are taken from the training data and encoded as integers to form the input vector. The training label is a one-hot vector encoding the word that comes after the 3 input words. Display the loss and the training accuracy every 1000 iterations. Save the model at the end of training in the **lstm_model** folder.
</div>
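
The construction of one training sample can be sketched in isolation as follows. This is a minimal example under stated assumptions: `toy_dictionary` assigns arbitrary indices to the first five tokens of the story and is not the real dictionary, and `offset` is fixed to 0 for readability.

```python
import numpy as np

n_input = 3
tokens = ['there', 'was', 'once', 'a', 'young']          # first tokens of the story
toy_dictionary = {w: i for i, w in enumerate(tokens)}    # toy indices, not the real ones
vocabulary_size = len(toy_dictionary)
offset = 0

# Input: 3 consecutive word indices, shaped [batch=1, n_input, 1]
input_symbols = np.array([toy_dictionary[tokens[i]]
                          for i in range(offset, offset + n_input)], dtype=float)
input_symbols = input_symbols.reshape((-1, n_input, 1))

# Label: one-hot vector for the word that follows the 3-word window
output_symbols_hot = np.zeros([1, vocabulary_size], dtype=float)
output_symbols_hot[0, toy_dictionary[tokens[offset + n_input]]] = 1.0

print(input_symbols.shape, output_symbols_hot)
```

The two arrays have the shapes expected by the `x` and `y` placeholders, with a batch size of 1.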

In [10]:
# Initializing the variables
start_time = time.time()
init = tf.global_variables_initializer()
model_saver = tf.train.Saver()

print("Start Training")
##############################################

with tf.Session() as session:
    session.run(init)
    step = 0
    offset = random.randint(0,n_input+1)
    # offset = 0
    end_offset = n_input + 1
    acc_total = 0
    loss_total = 0

    writer.add_graph(session.graph)

    while step < epochs:
        if offset > (len(train_data)-end_offset):
            offset = random.randint(0, n_input+1)
    #         offset = 0
        input_symbols = np.array([[dictionary[str(train_data[i])]] for i in range(offset, offset+n_input)])
        input_symbols = input_symbols.reshape((-1, n_input, 1))

        output_symbols_hot = np.zeros([vocabulary_size], dtype=float)
        output_symbols_hot[dictionary[str(train_data[offset+n_input])]] = 1.0
        output_symbols_hot = output_symbols_hot.reshape((1,-1))

        _, acc, loss, prediction_hot = session.run([optimizer, accuracy, cost, pred], \
                                                feed_dict={x: input_symbols, y: output_symbols_hot})
        loss_total += loss
        acc_total += acc
        if (step+1) % display_step == 0:
            print("Iter= " + str(step+1) + ", Average Loss= " + \
                  "{:.6f}".format(loss_total/display_step) + ", Average Accuracy= " + \
                  "{:.2f}%".format(100*acc_total/display_step))
            acc_total = 0
            loss_total = 0
            symbols_in = [train_data[i] for i in range(offset, offset + n_input)]
            symbols_out = train_data[offset + n_input]
            # np.argmax on the fetched array avoids creating a new graph op at every display
            symbols_out_pred = reverse_dictionary[int(np.argmax(prediction_hot, axis=1)[0])]
            print("%s - [%s] vs [%s]" % (symbols_in,symbols_out,symbols_out_pred))
        step += 1
        offset += (n_input+1)
    ##############################################

    print("Training Finished!")
    print("Time: ",time.time() - start_time)
    print("For tensorboard visualisation run on command line.")
    print("\ttensorboard --logdir=%s" % (logs_path))
    print("and point your web browser to the returned link")
    ##############################################
    saved_path = "lstm_model/model_" + str(epochs) + "_" + str(n_input)
    model_saver.save(session, saved_path)
    ##############################################
    print("Model saved")

Start Training
Iter= 10000, Average Loss= 2.593704, Average Accuracy= 37.13%
['plan', 'by', 'which'] - [he] vs [he]
Iter= 20000, Average Loss= 0.896524, Average Accuracy= 75.07%
['excitement', '.', 'he'] - [rushed] vs [rushed]
Iter= 30000, Average Loss= 0.514068, Average Accuracy= 83.84%
['that', 'a', 'few'] - [days] vs [by]
Iter= 40000, Average Loss= 0.409926, Average Accuracy= 87.09%
['to', 'his', 'help'] - [.] vs [.]
Iter= 50000, Average Loss= 0.348626, Average Accuracy= 88.88%
['of', 'the', 'village'] - [said] vs [said]
Iter= 60000, Average Loss= 0.289760, Average Accuracy= 90.79%
['village', 'said', ':'] - [a] vs [a]
Iter= 70000, Average Loss= 0.266070, Average Accuracy= 92.15%
['him', 'all', 'day'] - [,] vs [,]
Iter= 80000, Average Loss= 0.242592, Average Accuracy= 92.56%
[',', 'and', 'the'] - [villagers] vs [villagers]
Iter= 90000, Average Loss= 0.234790, Average Accuracy= 93.04%
['and', 'began', 'to'] - [worry] vs [worry]
Iter= 100000, Average Loss= 0.230907, Average Accuracy= 

## Part 4 : Test your model 

### 4.1. Next word prediction

<div class="alert alert-warning">
Load your model (using the `saved_path` variable given in the training section) and test the following sentences:
- 'get a little' 
- 'nobody tried to'
- Try with other sentences using words from the story's vocabulary.
</div>

In [13]:
saved_path = "lstm_model/model_" + str(epochs) + "_" + str(n_input)
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    saver.restore(session, saved_path)
    words = [0,0]
    sentence_length = 1  # number of words to predict for each prompt
    while len(words)>1:  # an empty or single-word input ends the loop
        print("-"*20)
        prompt = "Enter %s words: " % n_input
        sentence = input(prompt)
        sentence = sentence.strip()
        words = sentence.split(' ')
        if len(words) != n_input:
            continue
        try:
            input_symbols = [dictionary[word] for word in words]
            for i in range(sentence_length):
                keys = np.reshape(np.array(input_symbols), [-1, n_input, 1])
                prediction_hot = session.run(pred, feed_dict={x: keys})
                # np.argmax avoids adding a new tf.argmax op to the graph at each prediction
                prediction_hot_index = int(np.argmax(prediction_hot, axis=1)[0])
                sentence = "%s %s" % (sentence,reverse_dictionary[prediction_hot_index])
                # Slide the window: drop the oldest index, append the predicted one
                input_symbols = input_symbols[1:]
                input_symbols.append(prediction_hot_index)
            print(sentence)
        except KeyError:
            print("Word not in dictionary")

INFO:tensorflow:Restoring parameters from lstm_model/model_100000_3
--------------------
Enter 3 words: get a little
get a little company
--------------------
Enter 3 words: nobody tried to
nobody tried to tried
--------------------
Enter 3 words: the wolf was
the wolf was again
--------------------
Enter 3 words: 


<div class="alert alert-success">
In our experiments, randomizing the starting `offset` turned out to be essential for generating good sentences. We also increased the number of epochs to reach a higher accuracy.
</div>
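
A possible explanation for the effect of the random `offset`, sketched here under our own assumptions rather than measured on the trained model: since the offset advances by `n_input + 1` at each step, a fixed starting offset of 0 always visits the same window positions, whereas a random restart lets the windows fall on other alignments too. The helper `visited_starts` below is hypothetical; it replays only the offset logic of the training loop on a made-up sequence of 40 tokens.

```python
import random

def visited_starts(n_tokens, n_input, steps, randomize, seed=0):
    # Collect every window start position seen during `steps` iterations
    rng = random.Random(seed)
    end_offset = n_input + 1
    offset = rng.randint(0, n_input + 1) if randomize else 0
    starts = set()
    for _ in range(steps):
        if offset > n_tokens - end_offset:
            # Same restart rule as in the training loop
            offset = rng.randint(0, n_input + 1) if randomize else 0
        starts.add(offset)
        offset += n_input + 1
    return starts

fixed = visited_starts(n_tokens=40, n_input=3, steps=2000, randomize=False)
shuffled = visited_starts(n_tokens=40, n_input=3, steps=2000, randomize=True)
print(sorted(fixed))
print(len(fixed), len(shuffled))
```

With the fixed offset only the start positions that are multiples of 4 are ever used, so many (input, next word) pairs of the story are never shown to the network.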

### 4.2. More fun with the Fable Writer!

<div class="alert alert-warning">
You will use the RNN/LSTM model learned in the previous question to create a new story/fable. For this, you will choose 3 words from the dictionary, which will start your story and initialize your network. Using those 3 words, the RNN will generate the next word of the story. Then, using the last 3 words (the newly predicted one and the last 2 from the input), you will use the network to predict the following word, and so on until your story is 5 sentences long. 
End your story with a period. To implement this, you will use the test function.
</div>
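
The sliding-window generation described above can be sketched independently of TensorFlow. In this minimal example, `generate`, `toy_table` and `toy_predict` are hypothetical names: `toy_predict` is a simple lookup table that stands in for the trained network, and `max_words` is an extra safety cap (an assumption of this sketch) in case a period is never predicted.

```python
def generate(seed_words, predict_next, n_sentences=5, max_words=200):
    # Sliding window over the last len(seed_words) words of the story
    window = list(seed_words)
    story = list(seed_words)
    sentences = 0
    # max_words is a safety cap in case "." is never predicted
    while sentences < n_sentences and len(story) < max_words:
        nxt = predict_next(window)
        story.append(nxt)
        if nxt == ".":
            sentences += 1
        # Drop the oldest word and append the new prediction
        window = window[1:] + [nxt]
    return " ".join(story)

# Hypothetical stand-in for the trained network: a lookup keyed on the last word
toy_table = {"the": "wolf", "wolf": "came", "came": ".", ".": "the"}

def toy_predict(window):
    return toy_table.get(window[-1], ".")

print(generate(["and", "then", "the"], toy_predict, n_sentences=2))
```

Replacing `toy_predict` with a call to the test function (and a restored session) gives the structure of the fable writer.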

In [11]:
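def write_story(n_input, lastN_str):
    # Note: test() and pred rely on the global n_input, so the graph must have been
    # built (and the model trained) with the same value as the one passed here.
    saved_path = "lstm_model/model_" + str(epochs) + "_" + str(n_input)
    with tf.Session() as session:
        saver = tf.train.Saver()
        saver.restore(session, saved_path)
        
        story = "\n"
        lastN_lst = lastN_str.strip().split(' ')
        # Predict the first word from the seed sentence
        nxt1 = str(test(lastN_str, session))
        story += lastN_str + " " + nxt1
        # Slide the window: drop the oldest word, append the prediction
        lastN_lst.pop(0)
        lastN_lst.append(nxt1)
        lastN_str = ' '.join(lastN_lst)
        sentences_in_story = 5
        s = 0
        while s < sentences_in_story:
            nxt1 = str(test(lastN_str, session))
            if nxt1 == ".":  # compare by value; "is" tests identity and is unreliable for strings
                s += 1
                story += nxt1 + "\n\n"
            else:
                story += " " + nxt1
            lastN_lst.pop(0)
            lastN_lst.append(nxt1)
            lastN_str = ' '.join(lastN_lst)
        print(story)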

In [12]:
write_story(n_input=3, lastN_str="the man was")

INFO:tensorflow:Restoring parameters from lstm_model/model_100000_3

the man was of a wolf , still louder than before.

 but this time the villagers came out to meet him , and some of them stopped with him for a considerable time.

 this pleased the boy so much that a few days afterwards he tried the same trick , and again the villagers came out to meet him , and some of them stopped with him for a considerable time.

 this pleased the boy so much that a few days afterwards he tried the same trick , and again the villagers came out to meet him , and some of them stopped with him for a considerable time.

 this pleased the boy so much that a few days afterwards he tried the same trick , and again the villagers came out to meet him , and some of them stopped with him for a considerable time.




<div class="alert alert-success">
The model is able to write meaningful sentences, but not a complete story: after the first two sentences it keeps repeating the same passage from the training text.
</div>

### 4.3. Play with number of inputs

<div class="alert alert-warning">
The number of inputs in our example is 3. See what happens when you use other values (1 and 5).
</div>

In [15]:
write_story(n_input=5, lastN_str="and the man was near")

INFO:tensorflow:Restoring parameters from lstm_model/model_100000_5

and the man was near the sheep, and a boy of course than out wolf , wolf , and the villagers came out to meet him , and some of them stopped with him for a considerable time.

 this pleased the boy so much that a few days afterwards he tried the same trick , and again the villagers came to his help.

 but shortly after this a wolf actually did come out from the forest , and began to worry the sheep, and the boy of course cried out wolf , wolf , and the villagers came out to meet him , and some of them stopped with him for a considerable time.

 this pleased the boy so much that a few days afterwards he tried the same trick , and again the villagers came to his help.

 but shortly after this a wolf actually did come out from the forest , and began to worry the sheep, and the boy of course cried out wolf , wolf , and the villagers came out to meet him , and some of them stopped with him for a considerable time.




In [12]:
write_story(n_input=1, lastN_str="when ")

INFO:tensorflow:Restoring parameters from lstm_model/model_100000_1

when  he his came.

 when he his came.

 when he his came.

 when he his came.

 when he his came.




<div class="alert alert-success">
When using n_input = 1, we get really bad results: with the same number of training epochs, we obtained 15% accuracy.
When using n_input = 5, both the accuracy and the predicted story are much better. Some parts of the story still repeat themselves, though, so we should use a larger dataset and increase the number of training epochs.
</div>
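
One way to see why a single input word is so limiting is to count how many contexts are followed by more than one possible word. The sketch below does this on a short toy excerpt; it is an illustration under our own assumptions, not a measurement on the full training file, and `ambiguous_fraction` is a hypothetical helper.

```python
from collections import defaultdict

def ambiguous_fraction(tokens, n_input):
    # Map each context of n_input words to the set of words that follow it
    followers = defaultdict(set)
    for i in range(len(tokens) - n_input):
        followers[tuple(tokens[i:i + n_input])].add(tokens[i + n_input])
    # Fraction of distinct contexts with more than one possible next word
    ambiguous = sum(1 for nxt in followers.values() if len(nxt) > 1)
    return ambiguous / len(followers)

# Toy excerpt (illustrative assumption, not the full training file)
toy = ("the boy cried out wolf , wolf , and the villagers came out "
       "to meet him , and the boy was again deceiving them .").split()

for n in (1, 3, 5):
    print(n, round(ambiguous_fraction(toy, n), 2))
```

On this toy excerpt a quarter of the one-word contexts are ambiguous, while none of the five-word contexts are, which is consistent with the trend we observed.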