# EF Demo - Recurrent Entity Networks
## Raza Habib

### Feb 2017


## What I'm going to Demo and Why?

What?:
* A machine learning algorithm that can be trained to achieve state-of-the-art results on non-trivial reasoning tasks.


Why?:
* Perceptual AI has made huge progress in recent years, but reasoning has not improved at the same pace.
* First model to solve all 20 of the Facebook AI bAbI tasks (in the 10k-training-example setting).
* Showcases the kind of problem that I'm excited about solving.

## The bAbI Tasks


* *Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks*, Weston et al.
* 20 tasks, each of which forms a sort of "unit test" for reasoning.

E.g.:

1 John journeyed to the hallway.


2 After that he journeyed to the garden.


3 Where is John? 	

Answer: garden


- Notice that this is extremely easy for us but very hard for computers, because they must do "co-reference resolution": they need to learn that "John" and "he" refer to the same entity. This is highly non-trivial.
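
To make the task format concrete, here is a minimal sketch of turning a story like the one above into (story, question, answer) triples. It assumes the raw bAbI text layout, in which every line starts with an integer ID and question lines carry the answer (and supporting-fact IDs) after a tab. `parse_babi` is a hypothetical helper for illustration, not part of the implementation shown later.

```python
def parse_babi(lines):
    """Split raw bAbI lines into (story, question, answer) triples.

    Assumes each line is "<id> <text>" and that question lines hold
    tab-separated fields: question, answer, supporting-fact ids.
    """
    examples, story = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        idx, text = line.split(' ', 1)
        if int(idx) == 1:
            story = []  # an ID of 1 marks the start of a new story
        if '\t' in text:
            fields = text.split('\t')
            question, answer = fields[0].strip(), fields[1].strip()
            # Copy the story so far: later sentences must not leak in
            examples.append((list(story), question, answer))
        else:
            story.append(text)
    return examples


raw = [
    "1 John journeyed to the hallway.",
    "2 After that he journeyed to the garden.",
    "3 Where is John? \tgarden\t1 2",
]
for story, question, answer in parse_babi(raw):
    print(story, question, answer)
```

Answering correctly requires linking "he" in sentence 2 back to "John" in sentence 1, which is exactly the co-reference step the model has to learn from data.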

## Why Should You Be Excited by the Recurrent Entity Network?

* Solves all of the "prerequisite" reasoning tasks.
* Is trained end-to-end from data, with no hand-engineered knowledge.
* Reads and produces answers directly from data. No explicit knowledge base!

## What is a Recurrent Entity Network?

* Neural Networks are just (composed) functions with lots of parameters. Learning is the problem of finding the best parameters.
* Recurrent Neural Networks (RNNs) are a type of neural net designed specifically with sequences in mind.
* Entity Networks give RNNs a structured memory.

![Network](Network.tiff)


![math](math.tiff)
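
For reference, the per-slot memory update from arXiv:1612.03969v1 (equations 2 to 5, which the code below cites) can be written roughly as follows. The notation here is an assumption on my part rather than a copy of the figure: $s_t$ is the encoded input sentence at time $t$, $h_j$ and $w_j$ are the content and key of memory slot $j$, $\sigma$ is the sigmoid and $\phi$ is the activation (a PReLU).

```latex
\begin{align}
g_j &\leftarrow \sigma\left(s_t^\top h_j + s_t^\top w_j\right) \\
\tilde{h}_j &\leftarrow \phi\left(U h_j + V w_j + W s_t\right) \\
h_j &\leftarrow h_j + g_j \odot \tilde{h}_j \\
h_j &\leftarrow \frac{h_j}{\lVert h_j \rVert}
\end{align}
```

Note that the code in this demo attaches `U` to the input, `V` to the memory and `W` to the keys, so its letters are permuted relative to the equations as written here; the computation has the same form.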

## My Implementation

* Used Theano, a library for automatic differentiation (and NumPy, of course).
* You must first construct the entire computation graph in Python; Theano then compiles it to C.
* Finding the best parameters (in a non-Bayesian framework) is done by optimisation. I used Adam from Lasagne here, but the core code is raw Theano.
* Two main classes, RenCell and EntityNetwork:

In [None]:
import numpy as np
import theano
import theano.tensor as T


class RenCell:
    """ The Core Recurrent Entity Network Cell. As described in
    arXiv:1612.03969v1.
    """

    def __init__(self, emb_dim, num_slots, activation=T.nnet.relu):
        """Initialise all the parameters as shared variables"""
        self.num_slots = num_slots  # M
        self.activation = activation
        self.emb_dim = emb_dim  # J

        # Initialise Parameters
        self.U = self._initialize_weights(emb_dim, emb_dim, name='U')
        self.V = self._initialize_weights(emb_dim, emb_dim, name='V')
        self.W = self._initialize_weights(emb_dim, emb_dim, name='W')
        self.a = theano.shared(1.0)  # PReLU slope for negative inputs
        self.params = {'U': self.U, 'V': self.V, 'W': self.W, 'a': self.a}


In [3]:
    def _initialize_weights(self, inputs, outputs, name=None, scale=0.1):
        return theano.shared(scale*np.random.randn(inputs, outputs), name=name)

    def _get_gate(self, S_t, H, Keys):
        """ Equation (2) in arXiv:1612.03969v1"""
        S_t = S_t.dimshuffle([0, 'x', 1])
        return T.nnet.sigmoid(T.sum(H*S_t + Keys*S_t, axis=2, keepdims=True))

    def _get_candidate(self, S_t, H, Keys):
        """ Equation (3) in arXiv:1612.03969v1"""
        return self.activation(T.dot(S_t, self.U).dimshuffle([0, 'x', 1]) +
                               T.dot(H, self.V) + T.dot(Keys, self.W),
                               self.a)

    def _update_memory(self, H, _H, gate):
        """ Equation (4)/(5) in arXiv:1612.03969v1"""
        _H_prime = H + gate*_H
        return _H_prime/(_H_prime.norm(2, axis=2).dimshuffle([0, 1, 'x']))
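
To sanity-check the shapes without compiling a Theano graph, one update step can be mirrored in plain NumPy. This is a sketch under assumptions, not the implementation above: `ren_step_numpy` and the toy sizes are hypothetical, and the keys are taken to have the same shape as the memory. The default slope of 1.0 matches the initial value of `a`, at which the PReLU reduces to the identity.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ren_step_numpy(S_t, H, keys, U, V, W, slope=1.0):
    """One memory update for a batch, mirroring RenCell's three helpers.

    S_t:  (N, J) encoded sentences
    H:    (N, M, J) memory contents
    keys: (N, M, J) memory keys
    """
    S = S_t[:, None, :]                                   # (N, 1, J)
    # Gate: sigmoid of input-content plus input-key similarity -> (N, M, 1)
    gate = sigmoid(np.sum(H * S + keys * S, axis=2, keepdims=True))
    # Candidate memory, PReLU with the given negative slope -> (N, M, J)
    pre = np.dot(S_t, U)[:, None, :] + np.dot(H, V) + np.dot(keys, W)
    candidate = np.where(pre > 0, pre, slope * pre)
    # Gated update followed by normalisation to unit length per slot
    H_new = H + gate * candidate
    return H_new / np.linalg.norm(H_new, axis=2, keepdims=True)


rng = np.random.RandomState(0)
N, M, J = 2, 3, 4  # toy sizes: stories, memory slots, embedding dimension
U, V, W = [0.1 * rng.randn(J, J) for _ in range(3)]
H = ren_step_numpy(rng.randn(N, J), rng.randn(N, M, J), rng.randn(N, M, J), U, V, W)
print(H.shape)                      # (2, 3, 4)
print(np.linalg.norm(H, axis=2))    # every entry is 1.0 up to rounding
```

The gate has a trailing dimension of 1, so it scales each slot's whole candidate vector by a single number before the result is renormalised.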

In [None]:
    def __call__(self, inputs, init_state, init_keys, indices=-1):
        """ Take a mini-batch of inputs and return the state of the REN Cell
        at the time-steps specified by indices.
            inputs - (Time_steps, N_stories, emb_dim) tensor
            init_state - (N_stories, num_slots, emb_dim) tensor
            init_keys - keys tensor broadcastable to (N_stories, num_slots, emb_dim)
            indices - (N_stories, N_questions)

            output - (N_stories, N_questions, num_slots, emb_dim) tensor
        """
        story_indices = T.arange(T.shape(inputs)[1]).dimshuffle([0, 'x'])

        def REN_step(S_t, H_tm1, Keys, U, V, W):
            """ Perform one step of the RNN updates"""

            gate = self._get_gate(S_t, H_tm1, Keys)  # should be (N, M, 1)

            _H = self._get_candidate(S_t, H_tm1, Keys)  # should be (N, M, emb_dim)

            H = self._update_memory(H_tm1, _H, gate)

            return H

        out_vals, updates = theano.scan(REN_step,
                                        sequences=inputs,
                                        outputs_info=[init_state],
                                        non_sequences=[init_keys,
                                                       self.U,
                                                       self.V,
                                                       self.W],
                                        )

        return out_vals[indices, story_indices], updates
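
The final indexing expression is the least obvious line in `__call__`: it picks, for each story, the memory state at the time-step of each of its questions. A minimal NumPy sketch of the same advanced-indexing pattern, using made-up sizes and a fake scan output whose values encode their own position:

```python
import numpy as np

T_steps, N, M, J = 5, 2, 3, 4   # toy sizes (hypothetical)
# Fake scan output: entry [t, n] is filled with the value 10*t + n
out_vals = np.zeros((T_steps, N, M, J))
for t in range(T_steps):
    for n in range(N):
        out_vals[t, n] = 10 * t + n

indices = np.array([[1, 4],      # story 0 is queried at time-steps 1 and 4
                    [0, 2]])     # story 1 is queried at time-steps 0 and 2
story_indices = np.arange(N)[:, None]          # (N, 1), broadcasts against indices

# selected[i, q] == out_vals[indices[i, q], i]
selected = out_vals[indices, story_indices]    # (N, N_questions, M, J)
print(selected.shape)        # (2, 2, 3, 4)
print(selected[:, :, 0, 0])  # rows: [10, 40] and [1, 21]
```

Because `story_indices` has shape `(N, 1)`, it broadcasts against the `(N, N_questions)` index array, so each row of `indices` is only ever applied to its own story.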

### Demo of the Results Before and After Training

In [1]:
import theano
import theano.tensor
import numpy as np
from REN import Model


In [3]:
# Load and show some data
def extract_stories(data):
    return data['stories'], data['queries'], data['indices'], data['answers']

stories, questions, ind, answers = extract_stories(np.load('Data/Train/qa1_single-supporting-fact_train.npz'))
  
print('Story', stories[2])
print('questions', questions[2])

   

('Story', array([[14, 21, 19, 18,  6,  2,  0],
       [11, 12, 19, 18,  5,  2,  0],
       [17, 12, 19, 18,  9,  2,  0],
       [11, 12, 19, 18,  8,  2,  0],
       [11, 12, 19, 18,  5,  2,  0],
       [17, 12, 19, 18,  8,  2,  0],
       [17, 21,  4, 19, 18,  6,  2],
       [ 7, 20, 19, 18,  5,  2,  0],
       [11, 21, 19, 18, 16,  2,  0],
       [14, 15, 19, 18, 16,  2,  0]]))
('questions', array([[22, 10, 11,  3,  0,  0,  0],
       [22, 10, 14,  3,  0,  0,  0],
       [22, 10, 11,  3,  0,  0,  0],
       [22, 10, 11,  3,  0,  0,  0],
       [22, 10, 17,  3,  0,  0,  0]]))
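
The rows above appear to be sentences stored as integer word indices, right-padded with 0 up to a fixed length of 7 (the `max_sent_len` used later). A minimal sketch of that kind of encoding follows. The vocabulary here is hypothetical and built on the fly; the real mapping behind the `.npz` files is not shown in this demo.

```python
def tokenize(sentence):
    """Lower-case and split, keeping '.' and '?' as separate tokens."""
    return sentence.lower().replace('.', ' .').replace('?', ' ?').split()


def build_vocab(sentences):
    """Map each distinct token to an index, reserving 0 for padding."""
    vocab = {}
    for sentence in sentences:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def encode(sentence, vocab, max_sent_len=7):
    """Turn a sentence into a fixed-length list of indices padded with 0."""
    ids = [vocab[t] for t in tokenize(sentence)][:max_sent_len]
    return ids + [0] * (max_sent_len - len(ids))


sentences = ["John journeyed to the hallway.", "Where is John?"]
vocab = build_vocab(sentences)
print(encode(sentences[0], vocab))  # [1, 2, 3, 4, 5, 6, 0]
print(encode(sentences[1], vocab))  # [7, 8, 1, 9, 0, 0, 0]
```

Fixed-length rows let a whole batch of stories be stacked into one integer tensor, at the cost of the model having to learn to ignore the padding index.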


In [4]:
# Let's load a randomly initialised model and see how it does
params = {'embeding_dimension': 100,
          'num_slots': 20,
          'init_learning_rate': 0.01,
          'num_epochs': 10,
          'vocab_size': 160,
          'batch_size': 32,
          'max_sent_len': 7}

Ent_Net = Model.EntityNetwork(params['embeding_dimension'],
                              params['vocab_size'],
                              params['num_slots'],
                              params['max_sent_len'])

In [5]:
# Correct Answer
print(answers[0])
# Untrained Model answer 
print(np.argmax(Ent_Net.get_answer(stories, questions, ind), 1)[0:5])
# Accuracy
print(Ent_Net.test_network(stories, questions, ind, answers)[1])

[ 5  9  9 16  5]
[64 40 40 40 64]
0.0


In [7]:
# Now let's load a pre-trained model
import cPickle
with open('Results/modelEF.save', 'rb') as f:  # pickles should be read in binary mode
    Ent_Net = cPickle.load(f)
Ent_Net

# Correct Answer
print(answers[0])
# Trained answer
print(np.argmax(Ent_Net.get_answer(stories, questions, ind), 1)[0:5])

[ 5  9  9 16  5]
[60 60 60 60 61]


## Future Work

* The present EntNet has no truly long-term memory. This is an open problem I'm working on.
* EntNet is great on a short passage of text, but we need a hierarchy of models, from corpora to documents to individual paragraphs.
* Very high variance in results.
* Slow to train (I want to apply some of the work I did during my Master's thesis here).