# SentEval usage example

* Clone the repo from the FAIR GitHub
```
    git clone https://github.com/facebookresearch/SentEval.git
    cd SentEval/
```
* Dependencies:
    * Python 2/3 with NumPy/SciPy
    * PyTorch
    * scikit-learn>=0.18.0

* Install senteval
```
    python setup.py install
```
* Download datasets (this takes some time)
    * these are the downstream tasks
    * newer versions of SentEval also include probing tasks (https://github.com/facebookresearch/SentEval/tree/master/data/probing) for evaluating linguistic properties of your embeddings.
```
    cd data/downstream/
    ./get_transfer_data.bash
```
* Download pretrained GloVe embeddings:

```
    mkdir pretrained
    cd pretrained
    wget http://nlp.stanford.edu/data/glove.840B.300d.zip
```
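Once the archive is unzipped, the vectors are plain text with one word per line followed by its values. A minimal sketch of reading them into a dictionary might look like the following. The function name `load_glove`, its parameters, and the file name `glove.840B.300d.txt` are assumptions for illustration and are not part of SentEval.

```python
import io

import numpy as np


def load_glove(path, dim=300, vocab=None):
    """Read GloVe-style text vectors into a {word: vector} dict.

    `path`, `dim` and `vocab` are illustrative parameters (assumptions),
    not part of the SentEval API.
    """
    word_vec = {}
    with io.open(path, 'r', encoding='utf-8') as f:
        for line in f:
            # split from the right so tokens that contain spaces stay intact
            parts = line.rstrip().rsplit(' ', dim)
            if len(parts) != dim + 1:
                continue  # skip malformed lines
            word = parts[0]
            # optionally keep only the words that are actually needed
            if vocab is not None and word not in vocab:
                continue
            word_vec[word] = np.array([float(x) for x in parts[1:]],
                                      dtype=np.float32)
    return word_vec


# Hypothetical usage, assuming the unzipped file name:
# word_vec = load_glove('pretrained/glove.840B.300d.txt', dim=300)
```

Passing a `vocab` set built from the evaluation sentences keeps memory usage down, since the full file is large.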

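* The following code evaluates pretrained word embeddings on several downstream NLP tasks. Note that `prepare` loads a gensim skip-gram model (`gensim_skipgram_model`) rather than the GloVe file downloaded above.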

# Copyright (c) 2017-present, Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
#

from __future__ import absolute_import, division, unicode_literals

import sys
import logging
import numpy as np

from gensim.models import Word2Vec

# Set PATHs
# path to senteval
PATH_TO_SENTEVAL = './SentEval-master/'
# path to the NLP datasets 
PATH_TO_DATA = './SentEval-master/data/'


# import SentEval
sys.path.insert(0, PATH_TO_SENTEVAL)
import senteval


def prepare(params, samples):
    # load the pretrained gensim skip-gram model and keep its word vectors
    sg_model = Word2Vec.load('gensim_skipgram_model')
    params.word_vec = sg_model.wv
    # dimensionality of the word embeddings
    params.wvec_dim = 300
    return

def batcher(params, batch):
    """
    In this example we use the average of word embeddings as the sentence representation.
    Each sentence in the batch is mapped to one vector.
    Here you can process each sentence of the batch separately,
    or the complete batch at once (you may need masking for that).
    """
    # if a sentence is empty, a dot is set as its only token
    # you can change it to NULL depending on your model
    batch = [sent if sent != [] else ['.'] for sent in batch]
    embeddings = []

    for sent in batch:
        sentvec = []
        # each sentence is a list of words (tokenized and lowercased)
        for word in sent:
            if word in params.word_vec:
                # sentvec grows to [number of words, embedding dimensionality]
                sentvec.append(params.word_vec[word])
        if not sentvec:
            # no known words: fall back to a single zero vector
            vec = np.zeros(params.wvec_dim)
            sentvec.append(vec)
        # average of word embeddings for sentence representation
        # [embedding dimensionality]
        sentvec = np.mean(sentvec, 0)
        embeddings.append(sentvec)
    # [batch size, embedding dimensionality]
    embeddings = np.vstack(embeddings)
    return embeddings


# Set params for SentEval
# we use logistic regression (usepytorch: False) and kfold 10
# In this dictionary you can add extra information that your model needs for initialization,
# for example the path to a dictionary of indices, or hyperparameters
# this dictionary is passed to the batcher and the prepare functions
params_senteval = {'task_path': PATH_TO_DATA, 'usepytorch': False, 'kfold': 10}
# this is the config for the NN classifier, but we are going to use scikit-learn logistic regression with 10-fold cross-validation
# usepytorch = False
#params_senteval['classifier'] = {'nhid': 0, 'optim': 'rmsprop', 'batch_size': 128,
#                                 'tenacity': 3, 'epoch_size': 2}

# Set up logger
logging.basicConfig(format='%(asctime)s : %(message)s', level=logging.DEBUG)

if __name__ == "__main__":
    se = senteval.engine.SE(params_senteval, batcher, prepare)
    
    # here you define the NLP tasks on which your embedding model is going to be evaluated
    # in (https://arxiv.org/abs/1802.05883) we use the following:
    # SICKRelatedness (SICK-R) needs torch cuda to work (even when using logistic regression),
    # but STS14 (semantic textual similarity) is a similar type of semantic task
    transfer_tasks = ['MR', 'CR', 'MPQA', 'SUBJ', 'SST2', 'TREC',
                      'MRPC', 'SICKEntailment', 'STS14']
    # senteval prints the results and returns a dictionary with the scores
    results = se.eval(transfer_tasks)
    print(results)
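
The averaging done in `batcher` can be checked in isolation, without SentEval or a trained model. The sketch below repeats the same logic on made-up 2-dimensional vectors. The helper `average_embedding` and the toy dictionary `toy_vec` are hypothetical names introduced only for this illustration.

```python
import numpy as np


def average_embedding(sent, word_vec, dim):
    """Average the vectors of known words; return zeros if none are known."""
    vecs = [word_vec[w] for w in sent if w in word_vec]
    if not vecs:
        vecs = [np.zeros(dim)]
    return np.mean(vecs, axis=0)


# toy 2-dimensional vectors, made up for illustration
toy_vec = {
    'good': np.array([1.0, 0.0]),
    'movie': np.array([0.0, 1.0]),
}

# one normal sentence, one with only unknown words, one empty
batch = [['good', 'movie'], ['unknown'], []]
# same empty-sentence handling as in batcher
batch = [sent if sent != [] else ['.'] for sent in batch]

# [batch size, embedding dimensionality]
embeddings = np.vstack([average_embedding(s, toy_vec, 2) for s in batch])
print(embeddings.shape)  # (3, 2)
```

The first row is the mean of the two known vectors, while the second and third rows are zero vectors because none of their tokens are in the vocabulary.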

2018-05-24 19:08:00,033 : ***** Transfer task : MR *****


2018-05-24 19:08:00,077 : loading Word2Vec object from gensim_skipgram_model
2018-05-24 19:08:00,082 : {'mode': 'rb', 'kw': {}, 'uri': 'gensim_skipgram_model'}
2018-05-24 19:08:00,082 : encoding_wrapper: {'mode': 'rb', 'errors': 'strict', 'encoding': None, 'fileobj': <_io.BufferedReader name='gensim_skipgram_model'>}
2018-05-24 19:08:00,343 : loading wv recursively from gensim_skipgram_model.wv.* with mmap=None
2018-05-24 19:08:00,353 : loading vectors from gensim_skipgram_model.wv.vectors.npy with mmap=None
2018-05-24 19:08:00,403 : setting ignored attribute vectors_norm to None
2018-05-24 19:08:00,403 : loading vocabulary recursively from gensim_skipgram_model.vocabulary.* with mmap=None
2018-05-24 19:08:00,403 : loading trainables recursively from gensim_skipgram_model.trainables.* with mmap=None
2018-05-24 19:08:00,403 : loading syn1neg from gensim_skipgram_model.trainables.syn1neg.npy with mmap=None
2018-05-24 19:08:00,463

2018-05-24 19:19:33,712 : Generated sentence embeddings
2018-05-24 19:19:33,712 : Training sklearn-LogReg with (inner) 10-fold cross-validation
2018-05-24 19:20:01,083 : Best param found at split 1: l2reg = 2                 with score 87.54
2018-05-24 19:20:25,693 : Best param found at split 2: l2reg = 2                 with score 87.72
2018-05-24 19:20:50,892 : Best param found at split 3: l2reg = 2                 with score 87.9
2018-05-24 19:21:18,688 : Best param found at split 4: l2reg = 2                 with score 87.69
2018-05-24 19:21:44,385 : Best param found at split 5: l2reg = 1                 with score 87.5
2018-05-24 19:22:12,822 : Best param found at split 6: l2reg = 4                 with score 87.71
2018-05-24 19:22:38,862 : Best param found at split 7: l2reg = 2                 with score 87.66
2018-05-24 19:23:06,057 : Best param found at split 8: l2reg = 2                 with score 87.8
2018-05-24 19:23:37,566 : Best param found at split 9: l2reg = 2           

2018-05-24 19:27:39,802 : Evaluating...
2018-05-24 19:27:41,215 : 
Dev acc : 78.0 Test acc : 77.94 for                        SICK entailment

2018-05-24 19:27:41,225 : ***** Transfer task : STS14 *****


2018-05-24 19:27:41,281 : loading Word2Vec object from gensim_skipgram_model
2018-05-24 19:27:41,286 : {'mode': 'rb', 'kw': {}, 'uri': 'gensim_skipgram_model'}
2018-05-24 19:27:41,286 : encoding_wrapper: {'mode': 'rb', 'errors': 'strict', 'encoding': None, 'fileobj': <_io.BufferedReader name='gensim_skipgram_model'>}
2018-05-24 19:27:41,522 : loading wv recursively from gensim_skipgram_model.wv.* with mmap=None
2018-05-24 19:27:41,522 : loading vectors from gensim_skipgram_model.wv.vectors.npy with mmap=None
2018-05-24 19:27:41,555 : setting ignored attribute vectors_norm to None
2018-05-24 19:27:41,555 : loading vocabulary recursively from gensim_skipgram_model.vocabulary.* with mmap=None
2018-05-24 19:27:41,555 : loading trainables recursively from gensim_skipgram_model.trainables.*

{'MR': {'ntest': 10662, 'acc': 70.74, 'devacc': 71.05, 'ndev': 10662}, 'TREC': {'ntest': 500, 'acc': 71.2, 'devacc': 68.93, 'ndev': 5452}, 'MPQA': {'ntest': 10606, 'acc': 87.27, 'devacc': 87.46, 'ndev': 10606}, 'SICKEntailment': {'ntest': 4927, 'acc': 77.94, 'devacc': 78.0, 'ndev': 500}, 'MRPC': {'ntest': 1725, 'acc': 71.25, 'f1': 80.58, 'devacc': 70.88, 'ndev': 4076}, 'SUBJ': {'ntest': 10000, 'acc': 87.72, 'devacc': 87.67, 'ndev': 10000}, 'STS14': {'all': {'spearman': {'wmean': 0.6094875045642937, 'mean': 0.6008219604817223}, 'pearson': {'wmean': 0.6055138454955457, 'mean': 0.5959913260210546}}, 'tweet-news': {'spearman': SpearmanrResult(correlation=0.6780507767722375, pvalue=4.192678121791063e-102), 'pearson': (0.6992969890074551, 3.6628712642219704e-111), 'nsamples': 750}, 'deft-news': {'spearman': SpearmanrResult(correlation=0.6571064899560274, pvalue=1.8488000386425802e-38), 'pearson': (0.6771492090258799, 1.363169965697736e-41), 'nsamples': 300}, 'headlines': {'spearman': Spearma
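
The dictionary printed above is nested and hard to read at a glance. A minimal sketch for pulling out one headline number per task could look like this, assuming classification tasks expose an `acc` key and STS tasks an `all` entry, as in the output shown. The function `summarize` is a hypothetical helper, and the sample values are copied from a subset of that output.

```python
def summarize(results):
    """Return {task: headline score} from a SentEval-style results dict."""
    summary = {}
    for task, scores in results.items():
        if 'acc' in scores:
            # classification tasks report test accuracy
            summary[task] = scores['acc']
        elif 'all' in scores:
            # STS tasks report correlations; use the mean Spearman value
            summary[task] = scores['all']['spearman']['mean']
    return summary


# subset of the results printed above
sample = {
    'MR': {'ntest': 10662, 'acc': 70.74, 'devacc': 71.05, 'ndev': 10662},
    'TREC': {'ntest': 500, 'acc': 71.2, 'devacc': 68.93, 'ndev': 5452},
    'STS14': {'all': {
        'spearman': {'wmean': 0.6094875045642937, 'mean': 0.6008219604817223},
        'pearson': {'wmean': 0.6055138454955457, 'mean': 0.5959913260210546},
    }},
}

for task, score in sorted(summarize(sample).items()):
    print(task, round(score, 2))
```

Accuracies are percentages while the STS14 value is a correlation coefficient, so the two kinds of score are not directly comparable.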