<h1 style="text-align:center">Deep Learning   </h1>
<h1 style="text-align:center"> Lab Session 3 - 3 Hours </h1>
<h1 style="text-align:center">Long Short Term Memory (LSTM) for Language Modeling</h1>

<b> Student 1:</b> Hanna Johansson 
<b> Student 2:</b> Matteo Fiore
 
 
In this Lab Session, you will build and train a Recurrent Neural Network based on Long Short-Term Memory (LSTM) units for a next-word prediction task. 

Answers and experiments should be done in groups of one or two students. Each group should fill in and run the appropriate notebook cells. 
Once you have completed all of the code implementations and successfully answered each question, you may finalize your work by exporting the IPython Notebook as a PDF document using print as PDF (Ctrl+P). Do not forget to run all your cells before generating your final report, and do not forget to include the names of all participants in the group. The lab session should be completed by June 9th 2017.

Send your PDF file to benoit.huet@eurecom.fr and olfa.ben-ahmed@eurecom.fr using **[DeepLearning_lab3]** as the subject of your email.

#  Introduction

You will train an LSTM to predict the next word using a sample short story. The LSTM will learn to predict the next item of a sentence from the 3 previous items (given as input). Punctuation marks are treated as dictionary items, so they can be predicted too. Figure 1 shows the LSTM and the process of next-word prediction. 

<img src="lstm.png" height="370" width="370"> 


Each word (and punctuation mark) in the text is encoded by a unique integer: its index in the dictionary. The network outputs one score per dictionary entry; the index of the highest score, looked up in the reverse dictionary (Section 1.2), gives the predicted word. For example, if index 86 maps to "company" in the reverse dictionary, a prediction of 86 yields the word "company". 
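The encode/decode round trip described above can be sketched in a few lines of plain Python. The toy token list and the `scores` vector below are made up for illustration and do not come from the trained model; in the notebook the scores are produced by the network and the dictionaries by `build_vocabulary` (Section 1.2).

```python
import collections

import numpy as np

# Toy token list (hypothetical); the notebook uses the whole story instead.
tokens = "the boy cried wolf , wolf , the villagers came .".split()

# Frequency-ranked vocabulary: the most frequent token gets index 0
# (on Python 3.7+, ties keep the order in which tokens first appear).
counts = collections.Counter(tokens).most_common()
dictionary = {word: index for index, (word, _) in enumerate(counts)}
reverse_dictionary = {index: word for word, index in dictionary.items()}

# Encoding: each input word is replaced by its integer index.
window = ["the", "boy", "cried"]
encoded = [dictionary[word] for word in window]

# Decoding: the network outputs one score per vocabulary entry and the
# predicted word is the entry with the highest score.
scores = np.zeros(len(dictionary))
scores[dictionary["wolf"]] = 5.0  # pretend the model strongly favours "wolf"
predicted_index = int(np.argmax(scores))
predicted_word = reverse_dictionary[predicted_index]
print(encoded, predicted_word)  # [0, 3, 4] wolf
```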



You will use a sample short story from Aesop’s Fables (http://www.taleswithmorals.com/) to train your model. 


<font size="3" face="verdana" > <i> "There was once a young Shepherd Boy who tended his sheep at the foot of a mountain near a dark forest.

It was rather lonely for him all day, so he thought upon a plan by which he could get a little company and some excitement.
He rushed down towards the village calling out "Wolf, Wolf," and the villagers came out to meet him, and some of them stopped with him for a considerable time.
This pleased the boy so much that a few days afterwards he tried the same trick, and again the villagers came to his help.
But shortly after this a Wolf actually did come out from the forest, and began to worry the sheep, and the boy of course cried out "Wolf, Wolf," still louder than before.
But this time the villagers, who had been fooled twice before, thought the boy was again deceiving them, and nobody stirred to come to his help.
So the Wolf made a good meal off the boy's flock, and when the boy complained, the wise man of the village said:
"A liar will not be believed, even when he speaks the truth."  "</i> </font>.    







Start by loading the necessary libraries and resetting the default computational graph. For more details about the rnn package, we suggest taking a look at https://www.tensorflow.org/api_guides/python/contrib.rnn

In [1]:
import numpy as np
import collections # used to build the dictionary
import random
import time
import pickle # may be used to save your model 
import matplotlib.pyplot as plt
#Import Tensorflow and rnn
import tensorflow as tf
from tensorflow.contrib import rnn  

# Target log path
logs_path = 'lstm_words'

# Next-word prediction task

## Part 1: Data preparation

### 1.1. Loading data

Load and split the text of our story

In [2]:
def load_data(filename):
    # Read the story, lower-case it and split every line on whitespace;
    # flattening all lines into one list keeps this correct for multi-line files too
    with open(filename) as f:
        data = [word for line in f for word in line.strip().lower().split()]
    data = np.array(data)
    print(data)
    return data

#Run the cell 
train_file ='data/story.txt'
train_data = load_data(train_file)
print("Loaded training data...")
print(len(train_data))

['there' 'was' 'once' 'a' 'young' 'shepherd' 'boy' 'who' 'tended' 'his'
 'sheep' 'at' 'the' 'foot' 'of' 'a' 'mountain' 'near' 'a' 'dark' 'forest'
 '.' 'it' 'was' 'rather' 'lonely' 'for' 'him' 'all' 'day' ',' 'so' 'he'
 'thought' 'upon' 'a' 'plan' 'by' 'which' 'he' 'could' 'get' 'a' 'little'
 'company' 'and' 'some' 'excitement' '.' 'he' 'rushed' 'down' 'towards'
 'the' 'village' 'calling' 'out' 'wolf' ',' 'wolf' ',' 'and' 'the'
 'villagers' 'came' 'out' 'to' 'meet' 'him' ',' 'and' 'some' 'of' 'them'
 'stopped' 'with' 'him' 'for' 'a' 'considerable' 'time' '.' 'this'
 'pleased' 'the' 'boy' 'so' 'much' 'that' 'a' 'few' 'days' 'afterwards'
 'he' 'tried' 'the' 'same' 'trick' ',' 'and' 'again' 'the' 'villagers'
 'came' 'to' 'his' 'help' '.' 'but' 'shortly' 'after' 'this' 'a' 'wolf'
 'actually' 'did' 'come' 'out' 'from' 'the' 'forest' ',' 'and' 'began' 'to'
 'worry' 'the' 'sheep,' 'and' 'the' 'boy' 'of' 'course' 'cried' 'out'
 'wolf' ',' 'wolf' ',' 'still' 'louder' 'than' 'before' '.' 'but' 't
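`load_data` splits on whitespace only, so it relies on every punctuation mark in `data/story.txt` being surrounded by spaces. The output above shows one exception, the token `'sheep,'`, which therefore becomes a vocabulary entry separate from `'sheep'`. A minimal alternative, assuming a regular-expression tokenizer is acceptable (this is not what the lab uses), splits punctuation off explicitly:

```python
import re

import numpy as np

# A token is either a run of word characters/apostrophes (keeps "boy's" whole)
# or a single punctuation mark; any other character (e.g. quotes) is dropped.
TOKEN_PATTERN = re.compile(r"[\w']+|[.,:;!?]")


def tokenize(text):
    """Lower-case the text and return a 1-D array of word/punctuation tokens."""
    return np.array(TOKEN_PATTERN.findall(text.lower()))


sample = "began to worry the sheep, and the boy of course cried out"
print(tokenize(sample))
```

With this tokenizer, `'sheep,'` would be split into `'sheep'` and `','`, so the vocabulary size printed in Section 1.2 could change.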

### 1.2. Symbol encoding

The LSTM inputs can only be numbers. One way to convert words (symbols or any items) to numbers is to assign a unique integer to each word. Indices are often assigned by frequency of occurrence, so that the most frequent items get the smallest indices.

Here, we define a function to build an indexed word dictionary (word->number). The "build_vocabulary" function builds both:

- Dictionary: used for encoding words to numbers for the LSTM inputs
- Reverse dictionary: used for decoding the outputs of the LSTM into words (and punctuation).

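For example, the story above contains **113** distinct items (words and punctuation marks). The "build_vocabulary" function builds a dictionary with entries such as {'the': 0, ',': 1, ...}; the index of a less frequent word such as 'company' depends on how ties between equally frequent words are ordered (it is 32 in the output below).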


In [3]:
def build_vocabulary(words):
    count = collections.Counter(words).most_common()
    dic= dict()
    for word, _ in count:
        dic[word] = len(dic)
    reverse_dic= dict(zip(dic.values(), dic.keys()))
    return dic, reverse_dic


Run the cell below to display the vocabulary

In [4]:
dictionary, reverse_dictionary = build_vocabulary(train_data)
vocabulary_size= len(dictionary) 
print("Dictionary size (Vocabulary size) = ", vocabulary_size)
print("\n")
print("Dictionary : \n")
print(dictionary)
print("\n")
print("Reversed Dictionary : \n" )
print(reverse_dictionary)

Dictionary size (Vocabulary size) =  113


Dictionary : 

{'company': 32, 'louder': 95, 'but': 21, 'young': 34, 'shortly': 35, 'it': 36, 'this': 11, 'did': 37, 'dark': 33, 'the': 0, 'nobody': 38, 'rushed': 40, 'lonely': 42, 'meal': 43, 'not': 44, 'tended': 45, 'towards': 46, 'could': 48, 'stopped': 49, 'again': 18, 'meet': 51, 'rather': 52, 'believed': 53, 'began': 54, 'once': 55, 'had': 56, 'than': 57, 'liar': 41, 'course': 58, 'wolf': 5, 'trick': 59, 'will': 60, 'sheep,': 61, 'that': 62, 'there': 101, 'who': 17, 'he': 6, 'out': 9, 'calling': 64, 'flock': 65, 'get': 94, 'forest': 19, 'at': 67, 'pleased': 68, 'excitement': 69, 'after': 70, 'deceiving': 82, 'came': 20, 'fooled': 72, 'help': 22, 'day': 73, 'when': 23, ':': 74, 'few': 75, 'good': 76, 'so': 13, 'actually': 77, "boy's": 79, 'before': 24, 'which': 71, 'with': 80, 'complained': 81, 'to': 7, 'him': 12, 'wise': 87, 'thought': 25, 'cried': 83, 'worry': 63, 'was': 14, 'time': 28, 'still': 85, 'tried': 86, 'down': 107, 'little': 8

## Part 2 : LSTM Model in TensorFlow

Now that the data representation is defined, you will develop an LSTM model that predicts the word following a sequence of 3 words. 

### Model definition

Define a 2-layer LSTM model.  

For this, use the following classes and functions from tensorflow.contrib.rnn:

- rnn.BasicLSTMCell(number of hidden units) 
- rnn.static_rnn(rnn_cell, data, dtype=tf.float32)
- rnn.MultiRNNCell(list of cells)


You may also need some TensorFlow functions (https://www.tensorflow.org/api_docs/python/tf/):
- tf.split
- tf.reshape 
- ...
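The reshape/split/stack pipeline listed above can be hard to picture. The following NumPy sketch mirrors the shapes used in `lstm_model`: an input of shape `[batch, n_input, 1]` is split into `n_input` time steps of shape `[batch, 1]`, fed through two stacked LSTM layers, and only the top layer's last output is projected onto the vocabulary. The weights are random and the gate layout is a plain textbook LSTM with a forget-gate bias of 1.0 (the `BasicLSTMCell` default); it illustrates the computation and is not a re-implementation of TensorFlow's cell.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_layer(n_in, n_hidden):
    """Random weights for one LSTM layer: the four gates share one matrix."""
    w = rng.normal(scale=0.1, size=(n_in + n_hidden, 4 * n_hidden))
    b = np.zeros(4 * n_hidden)
    return w, b


def lstm_step(x_t, h, c, w, b, forget_bias=1.0):
    """One time step of a standard LSTM cell."""
    z = np.concatenate([x_t, h], axis=1) @ w + b
    i, g, f, o = np.split(z, 4, axis=1)  # input gate, candidate, forget gate, output gate
    c = sigmoid(f + forget_bias) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


def two_layer_lstm(x, layers, w_out, b_out, n_input):
    """x has shape [batch, n_input, 1]; returns logits of shape [batch, vocab]."""
    x = np.reshape(x, [-1, n_input])      # like tf.reshape(x, [-1, n_input])
    steps = np.split(x, n_input, axis=1)  # like tf.split(x, n_input, 1): n_input arrays of [batch, 1]
    batch = x.shape[0]
    # (h, c) state per layer, initialised to zeros
    states = [(np.zeros((batch, w.shape[1] // 4)), np.zeros((batch, w.shape[1] // 4)))
              for w, _ in layers]
    for x_t in steps:
        inp = x_t
        for k, (w, b) in enumerate(layers):  # layer 1 feeds layer 2, as in MultiRNNCell
            h, c = lstm_step(inp, *states[k], w, b)
            states[k] = (h, c)
            inp = h
    # inp now holds the top layer's output at the last time step
    return inp @ w_out + b_out


n_input, n_hidden, vocab_size = 3, 64, 113
layers = [init_layer(1, n_hidden), init_layer(n_hidden, n_hidden)]
w_out = rng.normal(size=(n_hidden, vocab_size))
b_out = rng.normal(size=vocab_size)

x = np.array([[20.0], [6.0], [33.0]]).reshape(-1, n_input, 1)  # one window of 3 word indices
logits = two_layer_lstm(x, layers, w_out, b_out, n_input)
print(logits.shape)  # (1, 113)
```

As in the lab, the word indices are fed as raw float values (no embedding layer), which is why the first layer has an input size of 1.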




<div style="background-color:#327191; vertical-align: middle; padding:5px 0px 10px 10px;">
    <h2><font color='white'>What we did</font></h2>
</div>

<div style="background-color:#d9e6fc; padding:10px 15px 10px 15px;">
We decided to merge part 2 (LSTM Model in TensorFlow), part 3 (LSTM Training) and part 4 (Test your model) into a single class that covers model definition, training and testing. We chose this implementation for the following reasons:
<ul>
<li>When changing parameters, it is easier to create several instances of one class with different parameters than to copy and paste the same code multiple times. 
<li>It is easier to use one class that contains all the different functions than to call each function separately.
<li>Overall, we think a class makes the code clearer and gives a better overview of the implementation.
</ul>
<br>
Comments within the class indicate which parts of the code belong to which section of the original lab guidelines.
<br>
<h4>Regarding testing</h4><br>
Initially, we were given a test function meant for predicting one word and, later on, five sentences. Instead of using this function as it was, we modified it to obtain a single function that can directly generate a given number of words or sentences. The function is defined within the class, but the actual testing of the model is still done in part 4.
</div>

In [5]:
class Model():
    
    #=====================================================
    #    Initialization, default parameters
    #=====================================================
    def __init__(self,dictionary, reverse_dictionary,logs_path,train_data,model='my_model',
                 learning_rate = 0.001,epochs = 50000,display_step = 1000,n_input = 3,n_hidden = 64
                 ):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.display_step = display_step
        self.n_input = n_input
        self.n_hidden = n_hidden
        self.vocabulary_size = len(dictionary) 
        self.dictionary = dictionary
        self.reverse_dictionary = reverse_dictionary
        self.logs_path = logs_path
        self.train_data = train_data
        self.model = 'lstm_model/'+model
        
    #=====================================================
    #    2. Model definition
    #=====================================================
    def lstm_model(self, x, w, b):
        """defines the model that we are going to use"""

        x = tf.reshape(x, [-1, self.n_input])
        # Generate an n_input-element sequence of inputs
        # (e.g. [had] [a] [general] -> [20] [6] [33])
        x = tf.split(x,self.n_input,1)

        # First LSTM layer with n_hidden units.
        rnn_cell = rnn.BasicLSTMCell(self.n_hidden)

        # Second LSTM layer with n_hidden units.
        rnn_cell2 = rnn.BasicLSTMCell(self.n_hidden)

        # Stack the two LSTM cells into a 2-layer network
        multi_rnn = rnn.MultiRNNCell([rnn_cell, rnn_cell2])

        # generate prediction
        outputs, states = rnn.static_rnn(multi_rnn, x, dtype=tf.float32)

        # there are n_input outputs but
        # we only want the last output
        return tf.matmul(outputs[-1], w['out']) + b['out']
    
    #=====================================================
    #    3. LSTM Training
    #=====================================================
    def train(self):
        """defines the training phase of the model"""
        
        tf.reset_default_graph()

        #=====================================================
        #    Training parameters and constants
        #=====================================================
        # tf Graph input
        self.x = tf.placeholder("float", [None, self.n_input, 1], name='InputData')
        self.y = tf.placeholder("float", [None, self.vocabulary_size], name='Labels')

        # LSTM  weights and biases
        weights = {'out': tf.Variable(tf.random_normal([self.n_hidden, self.vocabulary_size]))}
        biases = {'out': tf.Variable(tf.random_normal([self.vocabulary_size])) }

        #build the model
        with tf.name_scope('Model'):
            self.pred = self.lstm_model(self.x, weights, biases)
        
        #=====================================================
        #    Define Loss/Cost and optimizer
        #=====================================================
        with tf.name_scope('Loss'):
            # Loss and optimizer
            cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.pred, labels=self.y))
        with tf.name_scope('RMSPOpt'):    
            #use RMSProp Optimizer
            optimizer = tf.train.RMSPropOptimizer(learning_rate=self.learning_rate).minimize(cost)

        # Model evaluation
        correct_pred = tf.equal(tf.argmax(self.pred,1), tf.argmax(self.y,1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
        
        #=====================================================
        #    Initialize variables and summary
        #=====================================================
        # Initializing the variables
        start_time = time.time()
        self.init = tf.global_variables_initializer()

        # Create a summary to monitor cost and accuracy tensor
        tf.summary.scalar("Loss", cost)
        tf.summary.scalar("Accuracy", accuracy)
        merged_summary_op = tf.summary.merge_all()

        # Initialize the saver
        self.model_saver = tf.train.Saver()

        #=====================================================
        #    Training
        #=====================================================
        print("Start Training")

        with tf.Session() as sess:
            sess.run(self.init)
            # op to write logs to Tensorboard
            summary_writer = tf.summary.FileWriter(self.logs_path, graph=tf.get_default_graph())
            # Training cycle
            for epoch in range(self.epochs):
                # each training step uses a single window of n_input words
                offset = epoch % (len(self.train_data)-self.n_input)
                # 3 words are taken from the training data, encoded to integer to form the input vector
                symbols_in_keys = np.array([ [self.dictionary[ str(self.train_data[i])]] 
                                            for i in range(offset, offset+self.n_input) ])
                symbols_in_keys = symbols_in_keys.reshape(-1,self.n_input,1)

                # creation of the one-hot vector for training labels
                symbols_out_onehot = np.zeros([self.vocabulary_size], dtype=float)
                # set to one the entry of the word that follows the input words
                symbols_out_onehot[self.dictionary[str(self.train_data[offset+self.n_input])]] = 1.0
                # reshaping 
                symbols_out_onehot = symbols_out_onehot.reshape(-1,self.vocabulary_size)

                # running the session
                _, acc, loss, summary, onehot_pred = sess.run([optimizer, accuracy, cost,
                                merged_summary_op, self.pred], 
                                feed_dict={self.x: symbols_in_keys, self.y: symbols_out_onehot})
                
                summary_writer.add_summary(summary, epoch)
                # Display logs per epoch step
                if (epoch+1) % self.display_step == 0:
                    print("Epoch: ", '%02d' % (epoch+1))
                    print("\t\t=====> Loss=", "{:.9f}".format(loss))
                    print("\t\t=====> Accuracy=", "{:.9f}".format(acc))

            # Print
            print("Training finished!")
            print("time: ",time.time() - start_time)
            print("For TensorBoard visualisation, run on the command line:")
            print("\ttensorboard --logdir=%s" % (self.logs_path))
            print("and point your web browser to the returned link")
            
            self.model_saver.save(sess, self.model)
            print("Model saved")
            
    #=====================================================
    #    4. Testing the model
    #=====================================================     
    def test(self, sentences, number_of_sentences=-1, number_of_words=-1):
        """defines the testing function
            Parameters:
             - sentences: an array containing starting sentences; if the starting sentences
                          contains more words than expected no prediction will be done
             - number_of_sentences: if set to -1 no sentence will be created
             - number_of_words: taken into account when number_of_sentences set to -1
                                indicate how many words we want to predict
            If both number_of_sentences and number_of_words are -1, the default behaviour is setting 
            number_of_sentences to 1
        """
        if number_of_sentences != -1:
            number_of_whatever = number_of_sentences  
            number_of_words = -1
        elif number_of_words != -1:
            number_of_whatever = number_of_words
            number_of_sentences = -1
        else:
            number_of_whatever=1
            number_of_sentences=1
        
        with tf.Session() as sess:
            # Initialize variables, then overwrite them with the trained values
            sess.run(self.init)
            self.model_saver.restore(sess, self.model)
            for sentence in sentences:
                sentence = sentence.strip()
                words = sentence.split()
                s_count = 0
                # Checking input dimensions: skip sentences of the wrong length
                if len(words) != self.n_input:
                    print("wrong number of input words", len(words), self.n_input)
                    continue
                try:
                    # Find the key for each word
                    symbols_in_keys = [self.dictionary[str(word)] for word in words]
                except KeyError:
                    # Catch words that are not in the vocabulary
                    print("Word not in dictionary")
                    continue
                print('\n'+sentence, end=' ')
                # Iterate until the number of sentences/words defined by the user is reached
                while s_count < number_of_whatever:
                    keys = np.reshape(np.array(symbols_in_keys), [-1, self.n_input, 1])
                    onehot_pred = sess.run(self.pred, feed_dict={self.x: keys})
                    # argmax in NumPy: avoids adding a new op to the graph at every step
                    onehot_pred_index = int(np.argmax(onehot_pred, axis=1)[0])
                    prediction = self.reverse_dictionary[onehot_pred_index]
                    sentence = "%s %s" % (sentence, prediction)
                    print(prediction, end=' ')
                    # If number of words was defined by the user -> increment the count
                    if number_of_words != -1:
                        s_count += 1
                    # If number of sentences was defined by the user and a '.' is predicted -> increment
                    if prediction == '.' and number_of_sentences != -1:
                        s_count += 1
                    # Slide the window: drop the oldest word and append the prediction
                    symbols_in_keys = symbols_in_keys[1:]
                    symbols_in_keys.append(onehot_pred_index)



## Part 3 : LSTM Training  

In the training process, at each step (called an epoch in the code), 3 consecutive words are taken from the training data and encoded as integers to form the input vector. The training label is a one-hot vector encoding the word that follows the 3 input words. Display the loss and the training accuracy every 1000 iterations. Save the model at the end of training in the **lstm_model** folder.
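The sample construction described above can be sketched without TensorFlow. The helper name `make_training_pair` is hypothetical; it reproduces what the training loop in `Model.train` builds at a given step, including the wrap-around of the offset with `step % (len(tokens) - n_input)`, which guarantees that the target word always exists.

```python
import numpy as np


def make_training_pair(step, tokens, dictionary, n_input=3):
    """Return (inputs, one-hot label) for one training step.

    inputs has shape [1, n_input, 1] and holds the indices of n_input
    consecutive words; the label is the one-hot encoding of the word
    that follows them.
    """
    offset = step % (len(tokens) - n_input)  # cycle through the text
    keys = [dictionary[w] for w in tokens[offset:offset + n_input]]
    inputs = np.array(keys, dtype=float).reshape(-1, n_input, 1)
    label = np.zeros((1, len(dictionary)))
    label[0, dictionary[tokens[offset + n_input]]] = 1.0
    return inputs, label


tokens = "the wolf ate the sheep .".split()
dictionary = {w: i for i, w in enumerate(dict.fromkeys(tokens))}  # toy vocabulary
x, y = make_training_pair(0, tokens, dictionary)
print(x.ravel(), y.argmax())  # inputs "the wolf ate", target "the"
```

Because only one window is used per step, 50000 steps correspond to many passes over the story rather than 50000 passes.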

In [6]:
mod = Model(dictionary=dictionary,reverse_dictionary=reverse_dictionary,
            logs_path=logs_path,train_data=train_data,
            learning_rate = 0.001,epochs = 50000,display_step = 1000,n_input = 3,n_hidden = 64
           )
mod.train()

Start Training
Epoch:  1000
		=====> Loss= 2.589306116
		=====> Accuracy= 0.000000000
Epoch:  2000
		=====> Loss= 2.581354618
		=====> Accuracy= 1.000000000
Epoch:  3000
		=====> Loss= 3.488306761
		=====> Accuracy= 0.000000000
Epoch:  4000
		=====> Loss= 5.000435829
		=====> Accuracy= 0.000000000
Epoch:  5000
		=====> Loss= 0.827232361
		=====> Accuracy= 1.000000000
Epoch:  6000
		=====> Loss= 1.051124096
		=====> Accuracy= 1.000000000
Epoch:  7000
		=====> Loss= 0.192727685
		=====> Accuracy= 1.000000000
Epoch:  8000
		=====> Loss= 0.764243901
		=====> Accuracy= 1.000000000
Epoch:  9000
		=====> Loss= 0.194511652
		=====> Accuracy= 1.000000000
Epoch:  10000
		=====> Loss= 0.634211838
		=====> Accuracy= 1.000000000
Epoch:  11000
		=====> Loss= 0.437799424
		=====> Accuracy= 1.000000000
Epoch:  12000
		=====> Loss= 0.037846763
		=====> Accuracy= 1.000000000
Epoch:  13000
		=====> Loss= 0.109591953
		=====> Accuracy= 1.000000000
Epoch:  14000
		=====> Loss= 0.053574841
		=====> Accuracy

## Part 4 : Test your model 

### 4.1. Next word prediction

Load your model (using the saver created in the training session) and test the following sentences:
- 'get a little' 
- 'nobody tried to'
- Try other sentences using words from the story's vocabulary. 

In [8]:
mod.test(["a liar come", "nobody tried to", "get a little", "the forest cried"],-1,1)

INFO:tensorflow:Restoring parameters from lstm_model/my_model

a liar come . 
nobody tried to wolf 
get a little meal 
the forest cried out 

### 4.2. More fun with the Fable Writer!

You will use the RNN/LSTM model learned in the previous question to create a
new story/fable.
For this, you will choose 3 words from the dictionary which will start your
story and initialize your network. Using those 3 words, the RNN will generate
the next (4th) word of the story. Using the last 3 words (the newly predicted one
and the last 2 from the input), you will then use the network to predict the 5th
word of the story, and so on until your story is 5 sentences long.
End your story with a period.
To implement that, you will use the test function. 
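The procedure above is an autoregressive loop: predict a word, append it, slide the 3-word window, and stop after a given number of sentence-ending periods. The sketch below separates that loop from the model. `predict_next` is a hypothetical stand-in (a lookup table of trigram continuations from a toy text) for what the notebook does with `sess.run(self.pred, ...)` followed by an argmax. The `max_words` cap is an added safeguard that is not part of the lab's `test` method, which would loop forever if the model never predicted '.'.

```python
def generate(seed, predict_next, n_sentences=5, max_words=500):
    """Extend `seed` (a list of words) until n_sentences periods are produced."""
    window = list(seed)
    story = list(seed)
    sentences = 0
    while sentences < n_sentences and len(story) - len(seed) < max_words:
        word = predict_next(window)
        story.append(word)
        window = window[1:] + [word]  # drop the oldest word, keep the newest
        if word == ".":
            sentences += 1
    return " ".join(story)


# Hypothetical stand-in for the LSTM: the word that followed each trigram in a toy text.
toy_text = "the boy cried wolf . the villagers came . the boy cried wolf .".split()
table = {tuple(toy_text[i:i + 3]): toy_text[i + 3] for i in range(len(toy_text) - 3)}


def predict_next(window):
    return table.get(tuple(window), ".")


print(generate(["the", "boy", "cried"], predict_next, n_sentences=2))
# the boy cried wolf . the villagers came .
```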

In [9]:
mod.test(["a wolf come"],5,-1)


INFO:tensorflow:Restoring parameters from lstm_model/my_model

a wolf come the boy complained , the wise man of the village said : a liar will not be believed , even when he speaks the truth . a boy , and some the boy was again deceiving them , and nobody stirred to come to his help . so the wolf made a good meal off the boy's flock , and when the boy complained , the wise man of the village said : a liar will not be believed , even when he speaks the truth . a boy , and some the boy was again deceiving them , and nobody stirred to come to his help . so the wolf made a good meal off the boy's flock , and when the boy complained , the wise man of the village said : a liar will not be believed , even when he speaks the truth . 

### 4.3. Play with number of inputs

The number of inputs in our example is 3. See what happens when you use other values (1 and 5).
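One way to anticipate the effect of `n_input` before retraining is to count how many input contexts in the training text are followed by more than one different word: since the model outputs a single prediction per context, an ambiguous context cannot be predicted correctly at every occurrence, whatever the training. The helper below is a hypothetical analysis aid, not part of the lab; it could be run on `train_data` with n_input = 1, 3 and 5 to compare.

```python
import collections


def ambiguous_contexts(tokens, n_input):
    """Return (number of distinct contexts, number with several possible next words)."""
    followers = collections.defaultdict(set)
    for i in range(len(tokens) - n_input):
        context = tuple(tokens[i:i + n_input])
        followers[context].add(tokens[i + n_input])
    ambiguous = sum(1 for nxt in followers.values() if len(nxt) > 1)
    return len(followers), ambiguous


tokens = "the boy saw the wolf and the boy saw the sheep .".split()
for n in (1, 3, 5):
    print(n, ambiguous_contexts(tokens, n))
```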

Your answer goes here