# Test EmbedAlign with SentEval

This notebook lets you evaluate EmbedAlign with SentEval. Notably, it also runs on **CPUs** :D

* Dependencies:
    * Python 3.5 with NumPy/SciPy
    * PyTorch
    * TensorFlow 1.5.0 (CPU or GPU build, depending on how you plan to run it)
        * For example, on macOS:
        ```
        pip install https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0-py3-none-any.whl
        ```
    * scikit-learn>=0.18.0
    * dill>=0.2.7.1


* Install `dgm4nlp` by following the instructions [here](https://github.com/uva-slpl/dgm4nlp). We highly recommend using a `virtualenv`.

In the same `virtualenv`, do the following:

* Clone the SentEval repo from the FAIR GitHub
```
    git clone https://github.com/facebookresearch/SentEval.git
    cd SentEval/
```

* Install SentEval
```
    python setup.py install
```

* Download the datasets (this takes some time...)
    * these are the downstream tasks
    * newer versions of SentEval also include probing tasks (https://github.com/facebookresearch/SentEval/tree/master/data/probing) for evaluating linguistic properties of your embeddings.

```
    cd data/downstream/
    ./get_transfer_data.bash
```

* Download the [pretrained EmbedAlign model](https://surfdrive.surf.nl/files/index.php/s/9M4h5zqmYETSmf3)
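
Before running the cells below, it can help to confirm that everything from the dependency list is importable in the active `virtualenv`. The following is a minimal sketch, not part of the original notebook: `missing_modules` and `REQUIRED` are names made up for this example, and the list assumes the usual import names (`sklearn` for scikit-learn, `torch` for PyTorch).

```python
import importlib.util

# Import names for the dependencies listed above
# (assumption: scikit-learn imports as `sklearn`, PyTorch as `torch`).
REQUIRED = ["numpy", "scipy", "torch", "tensorflow",
            "sklearn", "dill", "senteval", "dgm4nlp"]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Not importable in this environment:", ", ".join(missing))
else:
    print("All dependencies found.")
```

This only checks that the packages can be found; it does not verify versions such as TensorFlow 1.5.0.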


* The following code evaluates EmbedAlign embeddings, pretrained on en-fr Europarl, on different SentEval tasks (the run shown below uses the probing tasks).



In [12]:
from __future__ import absolute_import, division, unicode_literals

import sys
import numpy as np
import logging
import sklearn
# data.py (part of SentEval) loads word2vec-style files; it is not needed here
# import data
import senteval
import tensorflow as tf
from collections import defaultdict
import dill
import dgm4nlp

In [13]:
class dotdict(dict):
    """ dot.notation access to dictionary attributes """
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

class EmbeddingExtractor:
    """
    Runs a forward pass through the inference model of EmbedAlign and
        returns the variational mean for each L1 word in the batch.

    Note that this takes monolingual L1 sentences only (at this point we have a trained EmbedAlign model,
        which dispenses with L2 sentences).

    You don't really want to touch anything in this class.
    """

    def __init__(self, graph_file, ckpt_path, config=None):        
        g1 = tf.Graph()
        self.meta_graph = graph_file
        self.ckpt_path = ckpt_path
        
        self.softmax_approximation = 'botev-batch' #default
        with g1.as_default():
            self.sess = tf.Session(config=config, graph=g1)
            # load architecture computational graph
            self.new_saver = tf.train.import_meta_graph(self.meta_graph)
            # restore checkpoint
            self.new_saver.restore(self.sess, self.ckpt_path) #tf.train.latest_checkpoint(
            self.graph = g1  #tf.get_default_graph()
            # retrieve input variable
            self.x = self.graph.get_tensor_by_name("X:0")
            # retrieve training switch variable (True: training, False: test)
            self.training_phase = self.graph.get_tensor_by_name("training_phase:0")
            #self.keep_prob = self.graph.get_tensor_by_name("keep_prob:0")

    def get_z_embedding_batch(self, x_batch):
        """
        :param x_batch: is np array of shape [batch_size, longest_sentence] containing the unique ids of words
        
        :returns: [batch_size, longest_sentence, z_dim]        
        """
        # Retrieve embeddings from latent variable Z
        # we can sample several n_samples, default 1
        try:
            z_mean = self.graph.get_tensor_by_name("z:0")
        except (KeyError, ValueError):
            # only the lookup is guarded, so errors from sess.run are not mislabelled
            raise ValueError('tensor Z not in graph!')

        feed_dict = {
            self.x: x_batch,
            self.training_phase: False,
            #self.keep_prob: 1.
        }
        z_rep_values = self.sess.run(z_mean, feed_dict=feed_dict)
        return z_rep_values
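
One thing worth knowing about `dotdict`: because `__getattr__` is bound to `dict.get`, a misspelled attribute does not raise an error, it quietly returns `None`. A small sketch of that behaviour (the class is repeated so the snippet runs on its own; the misspelled key `ckp_path` is deliberate):

```python
class dotdict(dict):
    """dot.notation access to dictionary attributes (same definition as above)"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


params = dotdict({'kfold': 10, 'ckpt_path': ''})

# attribute assignment stores an ordinary dictionary key
params.tok_path = 'tokenizer.pickle'
print(params['tok_path'])

# attribute access reads the key back
print(params.kfold)

# a typo ('ckp_path' instead of 'ckpt_path') yields None instead of an error
print(params.ckp_path)
```

So if a path seems to be ignored later on, checking the spelling of the attribute name is a good first step.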

This is how you interface with SentEval. The only thing you need to change is the paths to the trained model in the main block at the end.
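
Stripped of the EmbedAlign specifics, the interface is two callables: `prepare(params, samples)`, which SentEval calls once per task, and `batcher(params, batch)`, which must return one vector per sentence. The toy sketch below illustrates that contract under loud assumptions: `toy_word_vector` and `DIM` are hypothetical stand-ins for the real model, and `params` is a plain dict here rather than the object SentEval passes. It also shows a masked mean over a padded batch, which the full cell that follows avoids by processing one sentence at a time.

```python
import numpy as np

DIM = 4  # toy embedding size, chosen arbitrarily for this sketch


def toy_word_vector(token):
    """Hypothetical stand-in for EmbedAlign: a deterministic vector per token."""
    seed = sum(ord(ch) for ch in token)
    return np.random.RandomState(seed).randn(DIM)


def prepare(params, samples):
    """Called once per task with all sentences; load models or vocabularies here."""
    params['vocab'] = {tok for sent in samples for tok in sent}


def batcher(params, batch):
    """Return a [batch_size, DIM] array: one averaged vector per sentence."""
    batch = [sent if sent else ['.'] for sent in batch]
    max_len = max(len(sent) for sent in batch)
    vectors = np.zeros((len(batch), max_len, DIM))
    mask = np.zeros((len(batch), max_len, 1))
    for i, sent in enumerate(batch):
        for j, tok in enumerate(sent):
            vectors[i, j] = toy_word_vector(tok)
            mask[i, j] = 1.0
    # masked mean: padded positions add nothing to the sum or to the count
    return (vectors * mask).sum(axis=1) / mask.sum(axis=1)


params = {}
samples = [['a', 'small', 'test'], ['hello'], []]
prepare(params, samples)
embeddings = batcher(params, samples)
print(embeddings.shape)
```

A plain `np.mean(..., axis=1)` over a padded batch would also average the padding rows, which is why the mask is needed as soon as sentences of different lengths share a batch.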

In [14]:
# Copyright (c) 2017-present, Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
#



# Set PATHs
# path to senteval
#PATH_TO_SENTEVAL = '../'



# import SentEval
#sys.path.insert(0, PATH_TO_SENTEVAL)

# Set params for SentEval
# we use logistic regression (usepytorch: False) and kfold 10
# In this dictionary you can add any extra information that your model needs for initialization,
# for example the path to a dictionary of indices, or hyperparameters
# this dictionary is passed to the batcher and the prepare functions
params_senteval = {'task_path': '',
                   'usepytorch': False,
                   'kfold': 10,
                   'ckpt_path': '',
                   'tok_path': '',
                   'extractor': None,
                   'tks1': None}
# make the dictionary a dotdict
params_senteval = dotdict(params_senteval)
# this is the config for the NN classifier but we are going to use scikit-learn logistic regression with 10 kfold
# usepytorch = False 
#params_senteval['classifier'] = {'nhid': 0, 'optim': 'rmsprop', 'batch_size': 128,
#                                 'tenacity': 3, 'epoch_size': 2}



def prepare(params, samples):
    """
    In this example we are going to load a tensorflow model, 
    we open a dictionary with the indices of tokens and the computation graph
    """
    params.extractor = EmbeddingExtractor(
        graph_file='%s.meta'%(params.ckpt_path),
        ckpt_path=params.ckpt_path,
        config=None  # run on CPU
    )

    # load tokenizer from training (the file is closed again once loaded)
    with open(params.tok_path, 'rb') as f:
        params.tks1 = dill.load(f)
    return

def batcher(params, batch):
    """
    At this point batch is a python list containing sentences. Each sentence is a list of tokens (each token a string).
    The code below will take care of converting this to unique ids that EmbedAlign can understand.
    
    This function should return a single vector representation per sentence in the batch.
    In this example we use the average of word embeddings (as predicted by EmbedAlign) as a sentence representation.
    
    In this method you can do mini-batching or you can process sentences 1 at a time (batches of size 1).
    We choose to do it 1 sentence at a time to avoid having to deal with masking. 
    
    This should not be too slow, and it also saves memory.
    """
    # if a sentence is empty, a dot is set to be the only token
    # you can change it to NULL depending on your model
    batch = [sent if sent != [] else ['.'] for sent in batch]
    embeddings = []
    for sent in batch:
        # Here is where dgm4nlp converts strings to unique ids respecting the vocabulary
        # of the pre-trained EmbedAlign model
        # from tokens to ids; position 0 is en
        x1 = params.tks1[0].to_sequences([(' '.join(sent))])
        
        # extract word embeddings in context for a sentence
        # [1, sentence_length, z_dim]
        z_batch1 = params.extractor.get_z_embedding_batch(x_batch=x1)
        # sentence vector is the mean of word embeddings in context
        # [1, z_dim]
        sent_vec = np.mean(z_batch1, axis=1)
        # check if there is any NaN in vector (they appear sometimes when there's padding)
        if np.isnan(sent_vec.sum()):
            sent_vec = np.nan_to_num(sent_vec)        
        embeddings.append(sent_vec)
    embeddings = np.vstack(embeddings)
    return embeddings


# Set up logger
logging.basicConfig(format='%(asctime)s : %(message)s', level=logging.DEBUG)

if __name__ == "__main__":
    # define paths
    # path to senteval data
    # note senteval adds downstream into the path
    params_senteval.task_path = '/Users/Druv/Documents/Jupiter/ULL/Lab3/SentEval/data'
    # path to computation graph
    # we use best model on validation AER
    # TODO: you have to point to valid paths! Use the pre-trained model linked from the top of this notebook.
    params_senteval.ckpt_path = '/Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt'
    # path to tokenizer with ids of trained Europarl data
    # our dictionary of ids depends on dill for pickling
    params_senteval.tok_path = '/Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/tokenizer.pickle'
    # we use 10 fold cross validation
    params_senteval.kfold = 10
    se = senteval.engine.SE(params_senteval, batcher, prepare)
    
    # here you define the NLP tasks on which your embedding model is going to be evaluated
    # in (https://arxiv.org/abs/1802.05883) we use the following:
    # SICKRelatedness (SICK-R) needs torch cuda to work (even when using logistic regression),
    # but STS14 (semantic textual similarity) is a similar type of semantic task
    transfer_tasks = ['TopConstituents', 'BigramShift', 'Tense',
                      'SubjNumber', 'ObjNumber', 'OddManOut', 'CoordinationInversion']
    # other task lists you can swap in:
    # ['Length', 'WordContent', 'Depth']
    # ['MR', 'CR', 'MPQA', 'SUBJ', 'SST2', 'TREC', 'MRPC', 'SICKEntailment', 'STS14']
    # ['CR', 'MR', 'MPQA', 'SUBJ', 'SST2', 'SST5', 'TREC', 'MRPC', 'SNLI',
    #  'SICKEntailment', 'SICKRelatedness', 'STSBenchmark', 'ImageCaptionRetrieval',
    #  'STS12', 'STS13', 'STS14', 'STS15', 'STS16',
    #  'Length', 'WordContent', 'Depth', 'TopConstituents', 'BigramShift', 'Tense',
    #  'SubjNumber', 'ObjNumber', 'OddManOut', 'CoordinationInversion']
    # senteval prints the results and returns a dictionary with the scores
    results = se.eval(transfer_tasks)
    

2018-05-25 22:22:52,577 : ***** (Probing) Transfer task : TOPCONSTITUENTS classification *****
2018-05-25 22:22:53,921 : Loaded 100000 train - 10000 dev - 10000 test for TopConstituents


INFO:tensorflow:Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt


2018-05-25 22:22:56,999 : Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt
2018-05-25 22:23:01,436 : Computing embeddings for train/dev/test
2018-05-25 22:34:32,931 : Computed embeddings
2018-05-25 22:34:32,932 : Training sklearn-LogReg with standard validation..
2018-05-25 22:43:45,860 : [('reg:0.25', 26.85), ('reg:0.5', 27.24), ('reg:1', 28.08), ('reg:2', 28.9), ('reg:4', 30.62), ('reg:8', 32.7)]
2018-05-25 22:43:45,860 : Validation : best param found is reg = 8 with score             32.7
2018-05-25 22:43:45,860 : Evaluating...
2018-05-25 22:46:38,822 : 
Dev acc : 32.7 Test acc : 32.6 for TOPCONSTITUENTS classification

2018-05-25 22:46:38,846 : ***** (Probing) Transfer task : BIGRAMSHIFT classification *****
2018-05-25 22:46:39,354 : Loaded 100000 train - 10000 dev - 10000 test for BigramShift


INFO:tensorflow:Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt


2018-05-25 22:46:42,625 : Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt
2018-05-25 22:46:43,280 : Computing embeddings for train/dev/test
2018-05-25 23:00:11,033 : Computed embeddings
2018-05-25 23:00:11,034 : Training sklearn-LogReg with standard validation..
2018-05-25 23:01:09,880 : [('reg:0.25', 50.65), ('reg:0.5', 50.55), ('reg:1', 50.65), ('reg:2', 50.61), ('reg:4', 50.72), ('reg:8', 50.85)]
2018-05-25 23:01:09,881 : Validation : best param found is reg = 8 with score             50.85
2018-05-25 23:01:09,882 : Evaluating...
2018-05-25 23:01:27,990 : 
Dev acc : 50.9 Test acc : 51.1 for BIGRAMSHIFT classification

2018-05-25 23:01:28,014 : ***** (Probing) Transfer task : TENSE classification *****
2018-05-25 23:01:28,540 : Loaded 100000 train - 10000 dev - 10000 test for Tense


INFO:tensorflow:Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt


2018-05-25 23:01:31,654 : Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt
2018-05-25 23:01:32,311 : Computing embeddings for train/dev/test
2018-05-25 23:13:17,777 : Computed embeddings
2018-05-25 23:13:17,777 : Training sklearn-LogReg with standard validation..
2018-05-25 23:14:01,695 : [('reg:0.25', 67.73), ('reg:0.5', 68.13), ('reg:1', 68.67), ('reg:2', 69.26), ('reg:4', 70.11), ('reg:8', 70.87)]
2018-05-25 23:14:01,695 : Validation : best param found is reg = 8 with score             70.87
2018-05-25 23:14:01,695 : Evaluating...
2018-05-25 23:14:13,132 : 
Dev acc : 70.9 Test acc : 67.5 for TENSE classification

2018-05-25 23:14:13,163 : ***** (Probing) Transfer task : SUBJNUMBER classification *****
2018-05-25 23:14:13,945 : Loaded 100000 train - 10000 dev - 10000 test for SubjNumber


INFO:tensorflow:Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt


2018-05-25 23:14:16,742 : Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt
2018-05-25 23:14:17,335 : Computing embeddings for train/dev/test
2018-05-25 23:26:18,656 : Computed embeddings
2018-05-25 23:26:18,656 : Training sklearn-LogReg with standard validation..
2018-05-25 23:27:08,591 : [('reg:0.25', 63.34), ('reg:0.5', 64.5), ('reg:1', 65.86), ('reg:2', 68.07), ('reg:4', 70.2), ('reg:8', 72.28)]
2018-05-25 23:27:08,591 : Validation : best param found is reg = 8 with score             72.28
2018-05-25 23:27:08,591 : Evaluating...
2018-05-25 23:27:24,232 : 
Dev acc : 72.3 Test acc : 70.8 for SUBJNUMBER classification

2018-05-25 23:27:24,247 : ***** (Probing) Transfer task : OBJNUMBER classification *****
2018-05-25 23:27:25,575 : Loaded 100000 train - 10000 dev - 10000 test for ObjNumber


INFO:tensorflow:Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt


2018-05-25 23:27:28,638 : Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt
2018-05-25 23:27:29,231 : Computing embeddings for train/dev/test
2018-05-25 23:40:34,949 : Computed embeddings
2018-05-25 23:40:34,950 : Training sklearn-LogReg with standard validation..
2018-05-25 23:41:30,793 : [('reg:0.25', 61.52), ('reg:0.5', 62.31), ('reg:1', 63.5), ('reg:2', 64.41), ('reg:4', 65.82), ('reg:8', 66.98)]
2018-05-25 23:41:30,794 : Validation : best param found is reg = 8 with score             66.98
2018-05-25 23:41:30,795 : Evaluating...
2018-05-25 23:41:48,200 : 
Dev acc : 67.0 Test acc : 68.5 for OBJNUMBER classification

2018-05-25 23:41:48,230 : ***** (Probing) Transfer task : ODDMANOUT classification *****
2018-05-25 23:41:48,831 : Loaded 100000 train - 10000 dev - 10000 test for OddManOut


INFO:tensorflow:Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt


2018-05-25 23:41:52,015 : Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt
2018-05-25 23:41:52,719 : Computing embeddings for train/dev/test
2018-05-25 23:57:51,926 : Computed embeddings
2018-05-25 23:57:51,926 : Training sklearn-LogReg with standard validation..
2018-05-25 23:58:34,300 : [('reg:0.25', 49.48), ('reg:0.5', 49.3), ('reg:1', 49.56), ('reg:2', 49.6), ('reg:4', 49.53), ('reg:8', 49.57)]
2018-05-25 23:58:34,300 : Validation : best param found is reg = 2 with score             49.6
2018-05-25 23:58:34,300 : Evaluating...
2018-05-25 23:58:41,362 : 
Dev acc : 49.6 Test acc : 50.2 for ODDMANOUT classification

2018-05-25 23:58:41,378 : ***** (Probing) Transfer task : COORDINATIONINVERSION classification *****
2018-05-25 23:58:42,362 : Loaded 100002 train - 10002 dev - 10002 test for CoordinationInversion


INFO:tensorflow:Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt


2018-05-25 23:58:45,253 : Restoring parameters from /Users/Druv/Documents/Jupiter/ULL/Lab3/ull-practical3-embedalign/model.best.validation.aer.ckpt
2018-05-25 23:58:45,847 : Computing embeddings for train/dev/test
2018-05-26 00:12:35,578 : Computed embeddings
2018-05-26 00:12:35,578 : Training sklearn-LogReg with standard validation..
2018-05-26 00:13:15,749 : [('reg:0.25', 50.2), ('reg:0.5', 50.15), ('reg:1', 50.27), ('reg:2', 50.49), ('reg:4', 51.1), ('reg:8', 51.53)]
2018-05-26 00:13:15,765 : Validation : best param found is reg = 8 with score             51.53
2018-05-26 00:13:15,765 : Evaluating...
2018-05-26 00:13:26,342 : 
Dev acc : 51.5 Test acc : 51.0 for COORDINATIONINVERSION classification
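
A note on reading the logs above: for each task, SentEval lists the dev accuracy for every `reg` value it tried and then reports the best one. The sketch below reproduces that selection for the TOPCONSTITUENTS line; `best_setting` is a helper written for this note, not SentEval's own code. Presumably `reg` is passed to scikit-learn as the `C` parameter of `LogisticRegression`, in which case larger values mean weaker regularisation.

```python
# Dev accuracies copied from the TOPCONSTITUENTS log line above
dev_scores = [('reg:0.25', 26.85), ('reg:0.5', 27.24), ('reg:1', 28.08),
              ('reg:2', 28.9), ('reg:4', 30.62), ('reg:8', 32.7)]


def best_setting(scores):
    """Return the (name, score) pair with the highest dev accuracy (first on ties)."""
    return max(scores, key=lambda pair: pair[1])


name, score = best_setting(dev_scores)
print(name, score)
```

In six of the seven runs above the best value is the largest one in the grid (`reg = 8`), so the grid itself may be limiting the scores.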



In [17]:
import pickle
   
with open('Embedalign_3.pkl','wb') as f:
    pickle.dump(results, f)
print(results)

{'SubjNumber': {'ndev': 10000, 'devacc': 72.28, 'ntest': 10000, 'acc': 70.78}, 'CoordinationInversion': {'ndev': 10002, 'devacc': 51.53, 'ntest': 10002, 'acc': 51.05}, 'BigramShift': {'ndev': 10000, 'devacc': 50.85, 'ntest': 10000, 'acc': 51.13}, 'ObjNumber': {'ndev': 10000, 'devacc': 66.98, 'ntest': 10000, 'acc': 68.45}, 'TopConstituents': {'ndev': 10000, 'devacc': 32.7, 'ntest': 10000, 'acc': 32.64}, 'OddManOut': {'ndev': 10000, 'devacc': 49.6, 'ntest': 10000, 'acc': 50.16}, 'Tense': {'ndev': 10000, 'devacc': 70.87, 'ntest': 10000, 'acc': 67.46}}
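
The raw dictionary is hard to scan, so here is a small sketch that sorts the tasks by test accuracy and prints one line per task. The numbers are copied from the output above; `summarize` is a helper made up for this example.

```python
# devacc / acc values copied from the printed `results` dictionary above
results = {
    'SubjNumber': {'devacc': 72.28, 'acc': 70.78},
    'CoordinationInversion': {'devacc': 51.53, 'acc': 51.05},
    'BigramShift': {'devacc': 50.85, 'acc': 51.13},
    'ObjNumber': {'devacc': 66.98, 'acc': 68.45},
    'TopConstituents': {'devacc': 32.7, 'acc': 32.64},
    'OddManOut': {'devacc': 49.6, 'acc': 50.16},
    'Tense': {'devacc': 70.87, 'acc': 67.46},
}


def summarize(results):
    """Return (task, dev accuracy, test accuracy) tuples, best test accuracy first."""
    rows = sorted(results.items(), key=lambda item: item[1]['acc'], reverse=True)
    return [(task, scores['devacc'], scores['acc']) for task, scores in rows]


for task, dev, test in summarize(results):
    print('{:<22} dev {:6.2f}  test {:6.2f}'.format(task, dev, test))
```

In the notebook itself you would pass the `results` object returned by `se.eval`, or load it back from `Embedalign_3.pkl` with `pickle.load`, instead of retyping the numbers.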


