Evaluation of language models on the agreement task
=======

This notebook provides methodological guidelines for testing the performance of pre-trained language models on the agreement task.


Loading a pretrained transformer language model
------------

The best pretrained model for each architecture is stored in its own `MODEL_DIR`: `tied_layers_30_7` (shared parameters) and `no_tied_layers_37_9`. The numbers 30.7 and 37.9 give the perplexity of each language model. Each directory contains:
- `lm_params.pt`: the pretrained model to load
- `model.yaml`: values of the hyperparameters
- `tokcodes`: the vocabulary of the language model (50k tokens)
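As a reminder of what the perplexity figures in the directory names measure (lower is better), here is a minimal sketch of the standard definition: perplexity is the exponential of the mean negative log-likelihood per token. It assumes natural-log probabilities, and the function name `perplexity` is our own, not part of `nnlm`.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    Assumes natural-log (base e) probabilities; this is an illustrative helper,
    not a function provided by nnlm.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Conversely, a perplexity P corresponds to an average cross-entropy of ln(P) nats per token.
print(round(math.log(30.7), 2))  # 3.42
print(round(math.log(37.9), 2))  # 3.63
```

Read this way, the tied model's perplexity of 30.7 means that, on average, it is about as uncertain as a uniform choice among roughly 31 words at each position.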


In [15]:
from nnlm import load_transformer_model
from data import Dataset


Incremental language models
-------

In [16]:
# Load the model with shared parameters (tied layers)
model_dir = "tied_layers_30_7/"
encoder, lm = load_transformer_model(model_dir, cpu=True)

I will use positional embeddings


**Next-word prediction.** Incremental language models naturally perform next-word prediction. Given a batch of sentences, `lm.predict` returns, for each token, the log-probability of the reference next word (`ref_prob`) and the model's predicted next word (`pred_next`) with its log-probability (`pred_prob`).
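The `pred_prob` column is never lower than `ref_prob`, and the two coincide when `pred_next` equals `ref_next`. This suggests that `pred_next` is the most probable word under the model. The toy sketch below reconstructs one such row from a next-word distribution under that assumption. It does not call `nnlm`, and the vocabulary, scores and helper names (`log_softmax`, `prediction_row`) are invented for illustration.

```python
import math
from typing import Dict


def log_softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores (logits) over a toy vocabulary into log-probabilities."""
    m = max(scores.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {w: s - log_z for w, s in scores.items()}


def prediction_row(token: str, ref_next: str, logits: Dict[str, float]) -> dict:
    """Hypothetical reconstruction of one row of the table printed by lm.predict."""
    logp = log_softmax(logits)
    pred_next = max(logp, key=logp.get)  # assumed: the argmax word
    return {
        "token": token,
        "ref_next": ref_next,
        "pred_next": pred_next,
        "ref_prob": logp[ref_next],    # log-prob of the word actually observed
        "pred_prob": logp[pred_next],  # log-prob of the most probable word
    }


# Toy logits for the word following "a" (values invented, not model outputs)
row = prediction_row("a", "acceptées", {"acceptées": 1.0, "acceptée": 0.5, "été": 2.0})
print(row["pred_next"])                    # été
print(row["ref_prob"] < row["pred_prob"])  # True
```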

In [17]:
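batch_size = 1
testdata = Dataset("examples_obj_pp.txt", parentencoding=model_dir)
# lm.predict yields one table per sentence with the columns
# token, ref_next, pred_next, ref_prob, pred_prob
for elt in lm.predict(testdata, batch_size, device='cpu'):
    print('input:', ' '.join(elt['token']))
    print(elt)
    print()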

input: <bos> Les offres que le directeur a acceptées sont intéressantes
           token       ref_next     pred_next  ref_prob  pred_prob
0          <bos>            Les            Le -2.889215  -2.234221
1            Les         offres         <unk> -9.581299  -3.058100
2         offres            que            de -6.800640  -1.432299
3            que             le           les -2.670989  -1.939359
4             le      directeur  gouvernement -6.045998  -2.295597
5      directeur              a            de -2.008267  -1.628106
6              a      acceptées         <unk> -6.883583  -2.724935
7      acceptées           sont          sont -1.640829  -1.640829
8           sont  intéressantes            en -9.809179  -3.100528
9  intéressantes              .          pour -1.502111  -1.251670

input: <bos> Les offres que le directeur a acceptée sont intéressantes
           token       ref_next     pred_next  ref_prob  pred_prob
0          <bos>            Les            Le -2.889

Given a prefix (e.g. `Les offres que le directeur a ____`), we compare the log-probabilities that a language model assigns to the plural form of the target participle (`acceptées`) and to its singular form (`acceptée`). We consider that the model predicts the agreement correctly if the form with the correct number receives the higher probability.

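In the example above, the values are the log-probabilities (`ref_prob`) at the token `a` in each sentence:

$\log p(\text{acceptées} \mid \text{prefix}) = -6.883583$

$\log p(\text{acceptée} \mid \text{prefix}) = -7.570808$

$\log p(\text{acceptées} \mid \text{prefix}) > \log p(\text{acceptée} \mid \text{prefix})$ $\rightarrow$ the transformer model predicts the plural form, which is the correct agreement with the feminine plural antecedent *offres*.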
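The decision rule can be written as a small function. Because the logarithm is monotonic, comparing log-probabilities is equivalent to comparing probabilities. Accuracy over a test set is then the fraction of pairs where the correct form wins. This is a minimal sketch: the names `agreement_correct` and `agreement_accuracy` are our own and are not part of `nnlm`.

```python
from typing import Iterable, Tuple


def agreement_correct(logp_correct: float, logp_wrong: float) -> bool:
    """True if the correctly inflected form is strictly more probable.

    Operates on log-probabilities directly, since log is monotonic.
    """
    return logp_correct > logp_wrong


def agreement_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (logp_correct, logp_wrong) pairs where the correct form wins."""
    results = [agreement_correct(c, w) for c, w in pairs]
    if not results:
        raise ValueError("no pairs to evaluate")
    return sum(results) / len(results)


# "acceptées" (correct, plural) vs "acceptée" (singular) after the same prefix
print(agreement_correct(-6.883583, -7.570808))  # True
```

Ties are counted as errors here, which is a deliberate but arbitrary choice. With exact floating-point ties being rare, it mostly matters for degenerate models.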

### Evaluation data format (`test.txt`)

For each prefix, the file contains the sentence with the correct form, `Les offres que le directeur a acceptées`, and the sentence with the incorrect form, `Les offres que le directeur a acceptée`. This format makes it easy to compare a language model's predictions for both forms given the same prefix.
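Because both sentences of a pair share everything up to the participle, the prefix and the two candidate forms can be recovered by locating the single token where they differ. The sketch below shows one way to do this. It assumes whitespace tokenization (the real pipeline uses the model's own vocabulary from `tokcodes`) and exactly one differing token per pair. The helper name `split_pair` is hypothetical.

```python
from typing import List, Tuple


def split_pair(correct: str, wrong: str) -> Tuple[List[str], str, str]:
    """Return (shared prefix tokens, correct form, wrong form) for a minimal pair.

    Assumes whitespace tokenization and exactly one differing token.
    """
    c_toks, w_toks = correct.split(), wrong.split()
    if len(c_toks) != len(w_toks):
        raise ValueError("sentences must have the same number of tokens")
    diffs = [i for i, (a, b) in enumerate(zip(c_toks, w_toks)) if a != b]
    if len(diffs) != 1:
        raise ValueError(f"expected exactly one differing token, found {len(diffs)}")
    i = diffs[0]
    # Everything before the differing position is the conditioning prefix
    return c_toks[:i], c_toks[i], w_toks[i]


prefix, good, bad = split_pair(
    "Les offres que le directeur a acceptées",
    "Les offres que le directeur a acceptée",
)
print(" ".join(prefix), "->", good, "/", bad)
# Les offres que le directeur a -> acceptées / acceptée
```

The prefix, `good` and `bad` are then all that is needed to look up the two log-probabilities at the same position and apply the comparison described above.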