<a href="https://colab.research.google.com/github/Huertas97/SentEval/blob/master/notebooks/SentEval_ensemble.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<font color="orange" size=6>Information</font>

We evaluate ensembles of multilingual Sentence Transformer models, without dimensionality reduction, on the tasks of the SentEval benchmark (https://github.com/facebookresearch/SentEval).


<font color="orange">We evaluate the following models on SentEval:</font>

1. Each model separately without PCA
  * distiluse-base-multilingual-cased
  * xlm-r-distilroberta-base-paraphrase-v1
  * xlm-r-bert-base-nli-stsb-mean-tokens
  * LaBSE
  * distilbert-multilingual-nli-stsb-quora-ranking


2. The same combination as the best PCA ensemble on the multilingual STS Benchmark 2017, evaluated here without PCA
  * Ensemble of 5 models (without PCA)


3. The same combination as the PCA ensemble with the best result-to-dimension ratio on the multilingual STS Benchmark 2017, evaluated here without PCA

  * Combination of 2 models without PCA: xlm-r-bert-base-nli-stsb-mean-tokens, xlm-r-distilroberta-base-paraphrase-v1
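
How `ensemble.py` combines the models is not shown in this notebook. One common way to build such an ensemble, and one that would explain why the number of dimensions matters, is to concatenate the sentence embeddings of each model. The sketch below assumes that approach. `make_fake_encoder` and `ensemble_encode` are hypothetical names, the random vectors only stand in for real model outputs, and the dimensions 512 and 768 are illustrative.

```python
import numpy as np

def make_fake_encoder(dim, seed):
    # Hypothetical stand-in for a sentence encoder: returns a deterministic
    # pseudo-random vector of size `dim` for each sentence.
    def encode(sentences):
        rows = []
        for sentence in sentences:
            rng = np.random.default_rng(seed + sum(map(ord, sentence)))
            rows.append(rng.normal(size=dim))
        return np.vstack(rows)
    return encode

def ensemble_encode(encoders, sentences):
    # Assumed ensemble strategy: concatenate the embeddings of every
    # encoder along the feature axis (no PCA afterwards).
    return np.hstack([encode(sentences) for encode in encoders])

encoders = [make_fake_encoder(512, seed=1), make_fake_encoder(768, seed=2)]
sentences = ["A man is playing a guitar.", "Un hombre toca la guitarra."]
embeddings = ensemble_encode(encoders, sentences)
print(embeddings.shape)  # (2, 1280)
```

Under this assumption the ensemble dimension is the sum of the individual dimensions, which is why a smaller combination can offer a better trade-off between score and vector size.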

<br>
<font  size=5>Important</font>



The results on STS Benchmark 2017 can differ slightly because of how cosine similarity is computed. We used `1 - paired_cosine_distances` from `sklearn.metrics.pairwise` [(more info)](https://stackoverflow.com/questions/36998330/cosine-similarity-output-different-for-different-libraries).

SentEval, in contrast, uses:


```
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
```


The difference lies in how the result is rounded: our approach (paired distances) returns just one decimal, whereas SentEval returns more.  
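
The two formulas are algebraically equivalent, so any gap between them comes from floating-point arithmetic and rounding. The sketch below compares them on random vectors. It assumes the paired-distance route amounts to half the squared Euclidean distance between L2-normalized rows. `paired_cosine_similarity` is a hypothetical NumPy helper written for illustration, not a sklearn function.

```python
import numpy as np

def cosine(u, v):
    # Per-pair formula, as quoted from SentEval above.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def paired_cosine_similarity(X, Y):
    # Hypothetical helper: 1 - cosine distance, where the distance is taken
    # as 0.5 * ||x_hat - y_hat||^2 for L2-normalized rows x_hat and y_hat.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return 1.0 - 0.5 * np.sum((Xn - Yn) ** 2, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Y = rng.normal(size=(4, 8))

per_pair = np.array([cosine(u, v) for u, v in zip(X, Y)])
paired = paired_cosine_similarity(X, Y)

# Largest absolute gap between the two routes.
max_abs_diff = float(np.max(np.abs(per_pair - paired)))
print(max_abs_diff)
```

Comparing the two arrays with a tolerance (for example `np.allclose`) rather than with `==` avoids treating such tiny gaps as real disagreements.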
<br>
<font size=5>Important</font>

ImageCaptionRetrieval and SNLI are computationally very expensive, so we do not consider these tasks.
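
A minimal sketch of excluding those two tasks from a task list. The names are assumed to match the transfer tasks in the upstream SentEval README; the exact list used by `ensemble.py` is not shown in this notebook, and `ALL_TASKS` and `EXCLUDED` are illustrative names.

```python
# Transfer task names as listed in the upstream SentEval README (assumed).
ALL_TASKS = [
    "CR", "MR", "MPQA", "SUBJ", "SST2", "SST5", "TREC", "MRPC", "SNLI",
    "SICKEntailment", "SICKRelatedness", "STSBenchmark",
    "ImageCaptionRetrieval", "STS12", "STS13", "STS14", "STS15", "STS16",
]

# Tasks skipped because of their computational cost.
EXCLUDED = {"ImageCaptionRetrieval", "SNLI"}

# Keep the original order, dropping the excluded tasks.
transfer_tasks = [task for task in ALL_TASKS if task not in EXCLUDED]
print(len(transfer_tasks), transfer_tasks)
```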


---



In [1]:
!pip install -U -q sentence_transformers

  Building wheel for sentence-transformers (setup.py) ... done
  Building wheel for sacremoses (setup.py) ... done


# Loading SentEval

In [2]:
# Clone the repository and all the dependencies
!git clone https://github.com/Huertas97/SentEval.git

# Download the data
%cd /content/SentEval/data/downstream
!bash ./get_transfer_data.bash

Cloning into 'SentEval'...
remote: Enumerating objects: 50, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 739 (delta 29), reused 34 (delta 15), pack-reused 689
Receiving objects: 100% (739/739), 43.78 MiB | 28.25 MiB/s, done.
Resolving deltas: 100% (463/463), done.
/content/SentEval/data/downstream
Cloning Moses github repository (for tokenization scripts)...
Cloning into 'mosesdecoder'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 147592 (delta 5), reused 11 (delta 3), pack-reused 147572
Receiving objects: 100% (147592/147592), 129.76 MiB | 23.11 MiB/s, done.
Resolving deltas: 100% (114031/114031), done.
mkdir: cannot create directory ‘.’: File exists
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spen

# Each model separately without PCA
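
The runs in this section differ only in the model name passed to `--models`. As a sketch, the same commands can be built programmatically. `build_command` is a hypothetical helper; the script name and flag are taken from the cells in this notebook.

```python
import shlex

MODELS = [
    "distiluse-base-multilingual-cased",
    "xlm-r-distilroberta-base-paraphrase-v1",
    "xlm-r-bert-base-nli-stsb-mean-tokens",
    "LaBSE",
    "distilbert-multilingual-nli-stsb-quora-ranking",
]

def build_command(models):
    # Several model names are joined with commas, as in the ensemble runs.
    return ["python", "ensemble.py", "--models", ",".join(models)]

# One command per model, each evaluated separately.
commands = [build_command([name]) for name in MODELS]
for cmd in commands:
    print(shlex.join(cmd))
```

Each command could then be executed with `subprocess.run(cmd, cwd="/content/SentEval/examples", check=True)`.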

## distiluse-base-multilingual-cased

In [None]:
%cd
%cd /content/SentEval/examples
!python ensemble.py --models distiluse-base-multilingual-cased

Streaming output truncated to the last 5000 lines.
Batches: 100% 4/4 [00:00<00:00, 22.90it/s]
Batches: 100% 4/4 [00:00<00:00, 23.86it/s]
Batches: 100% 4/4 [00:00<00:00, 23.24it/s]
Batches: 100% 4/4 [00:00<00:00, 23.77it/s]
Batches: 100% 4/4 [00:00<00:00, 22.41it/s]
Batches: 100% 4/4 [00:00<00:00, 24.38it/s]
Batches: 100% 4/4 [00:00<00:00, 24.75it/s]
Batches: 100% 4/4 [00:00<00:00, 23.22it/s]
Batches: 100% 4/4 [00:00<00:00, 24.31it/s]
Batches: 100% 4/4 [00:00<00:00, 23.46it/s]
Batches: 100% 4/4 [00:00<00:00, 22.88it/s]
Batches: 100% 4/4 [00:00<00:00, 24.07it/s]
Batches: 100% 4/4 [00:00<00:00, 22.47it/s]
Batches: 100% 4/4 [00:00<00:00, 24.43it/s]
Batches: 100% 4/4 [00:00<00:00, 22.33it/s]
Batches: 100% 4/4 [00:00<00:00, 21.85it/s]
Batches: 100% 4/4 [00:00<00:00, 23.70it/s]
Batches: 100% 4/4 [00:00<00:00, 22.93it/s]
Batches: 100% 4/4 [00:00<00:00, 23.45it/s]
Batches: 100% 4/4 [00:00<00:00, 23.61it/s]
Batches: 100% 4/4 [00:00<00:00, 23.40it/s]
Batches: 100% 4/4 [00:

## xlm-r-distilroberta-base-paraphrase-v1

In [None]:
%cd
%cd /content/SentEval/examples
!python ensemble.py --models xlm-r-distilroberta-base-paraphrase-v1

Streaming output truncated to the last 5000 lines.
Batches: 100% 4/4 [00:00<00:00, 13.44it/s]
Batches: 100% 4/4 [00:00<00:00, 13.24it/s]
Batches: 100% 4/4 [00:00<00:00, 13.15it/s]
Batches: 100% 4/4 [00:00<00:00, 13.61it/s]
Batches: 100% 4/4 [00:00<00:00, 13.33it/s]
Batches: 100% 4/4 [00:00<00:00, 13.14it/s]
Batches: 100% 4/4 [00:00<00:00, 12.70it/s]
Batches: 100% 4/4 [00:00<00:00, 13.22it/s]
Batches: 100% 4/4 [00:00<00:00, 13.21it/s]
Batches: 100% 4/4 [00:00<00:00, 13.04it/s]
Batches: 100% 4/4 [00:00<00:00, 12.67it/s]
Batches: 100% 4/4 [00:00<00:00, 13.20it/s]
Batches: 100% 4/4 [00:00<00:00, 12.96it/s]
Batches: 100% 4/4 [00:00<00:00, 12.88it/s]
Batches: 100% 4/4 [00:00<00:00, 12.97it/s]
Batches: 100% 4/4 [00:00<00:00, 12.77it/s]
Batches: 100% 4/4 [00:00<00:00, 12.77it/s]
Batches: 100% 4/4 [00:00<00:00, 12.94it/s]
Batches: 100% 4/4 [00:00<00:00, 12.81it/s]
Batches: 100% 4/4 [00:00<00:00, 12.86it/s]
Batches: 100% 4/4 [00:00<00:00, 12.92it/s]
Batches: 100% 4/4 [00:

## xlm-r-bert-base-nli-stsb-mean-tokens

In [None]:
%cd
%cd /content/SentEval/examples
!python ensemble.py --models xlm-r-bert-base-nli-stsb-mean-tokens

Streaming output truncated to the last 5000 lines.
Batches: 100% 4/4 [00:00<00:00, 13.51it/s]
Batches: 100% 4/4 [00:00<00:00, 13.09it/s]
Batches: 100% 4/4 [00:00<00:00, 13.18it/s]
Batches: 100% 4/4 [00:00<00:00, 13.73it/s]
Batches: 100% 4/4 [00:00<00:00, 13.31it/s]
Batches: 100% 4/4 [00:00<00:00, 13.10it/s]
Batches: 100% 4/4 [00:00<00:00, 12.68it/s]
Batches: 100% 4/4 [00:00<00:00, 13.29it/s]
Batches: 100% 4/4 [00:00<00:00, 13.31it/s]
Batches: 100% 4/4 [00:00<00:00, 13.21it/s]
Batches: 100% 4/4 [00:00<00:00, 12.49it/s]
Batches: 100% 4/4 [00:00<00:00, 13.11it/s]
Batches: 100% 4/4 [00:00<00:00, 12.98it/s]
Batches: 100% 4/4 [00:00<00:00, 13.11it/s]
Batches: 100% 4/4 [00:00<00:00, 12.89it/s]
Batches: 100% 4/4 [00:00<00:00, 12.68it/s]
Batches: 100% 4/4 [00:00<00:00, 12.33it/s]
Batches: 100% 4/4 [00:00<00:00, 13.07it/s]
Batches: 100% 4/4 [00:00<00:00, 12.91it/s]
Batches: 100% 4/4 [00:00<00:00, 12.76it/s]
Batches: 100% 4/4 [00:00<00:00, 12.55it/s]
Batches: 100% 4/4 [00:

## LaBSE

In [None]:
%cd
%cd /content/SentEval/examples
!python ensemble.py --models LaBSE

Streaming output truncated to the last 5000 lines.
Batches: 100% 4/4 [00:00<00:00, 13.83it/s]
Batches: 100% 4/4 [00:00<00:00, 14.23it/s]
Batches: 100% 4/4 [00:00<00:00, 14.14it/s]
Batches: 100% 4/4 [00:00<00:00, 14.13it/s]
Batches: 100% 4/4 [00:00<00:00, 14.26it/s]
Batches: 100% 4/4 [00:00<00:00, 13.99it/s]
Batches: 100% 4/4 [00:00<00:00, 14.49it/s]
Batches: 100% 4/4 [00:00<00:00, 13.78it/s]
Batches: 100% 4/4 [00:00<00:00, 14.33it/s]
Batches: 100% 4/4 [00:00<00:00, 14.11it/s]
Batches: 100% 4/4 [00:00<00:00, 13.65it/s]
Batches: 100% 4/4 [00:00<00:00, 13.43it/s]
Batches: 100% 4/4 [00:00<00:00, 13.57it/s]
Batches: 100% 4/4 [00:00<00:00, 13.46it/s]
Batches: 100% 4/4 [00:00<00:00, 13.82it/s]
Batches: 100% 4/4 [00:00<00:00, 13.50it/s]
Batches: 100% 4/4 [00:00<00:00, 13.29it/s]
Batches: 100% 4/4 [00:00<00:00, 13.73it/s]
Batches: 100% 4/4 [00:00<00:00, 13.06it/s]
Batches: 100% 4/4 [00:00<00:00, 13.52it/s]
Batches: 100% 4/4 [00:00<00:00, 13.69it/s]
Batches: 100% 4/4 [00:

## distilbert-multilingual-nli-stsb-quora-ranking

In [None]:
%cd
%cd /content/SentEval/examples
!python ensemble.py --models distilbert-multilingual-nli-stsb-quora-ranking

Streaming output truncated to the last 5000 lines.
Batches: 100% 4/4 [00:00<00:00, 23.13it/s]
Batches: 100% 4/4 [00:00<00:00, 23.38it/s]
Batches: 100% 4/4 [00:00<00:00, 21.33it/s]
Batches: 100% 4/4 [00:00<00:00, 23.43it/s]
Batches: 100% 4/4 [00:00<00:00, 22.45it/s]
Batches: 100% 4/4 [00:00<00:00, 22.30it/s]
Batches: 100% 4/4 [00:00<00:00, 23.65it/s]
Batches: 100% 4/4 [00:00<00:00, 22.33it/s]
Batches: 100% 4/4 [00:00<00:00, 23.08it/s]
Batches: 100% 4/4 [00:00<00:00, 23.73it/s]
Batches: 100% 4/4 [00:00<00:00, 22.05it/s]
Batches: 100% 4/4 [00:00<00:00, 23.15it/s]
Batches: 100% 4/4 [00:00<00:00, 22.73it/s]
Batches: 100% 4/4 [00:00<00:00, 23.12it/s]
Batches: 100% 4/4 [00:00<00:00, 22.80it/s]
Batches: 100% 4/4 [00:00<00:00, 22.79it/s]
Batches: 100% 4/4 [00:00<00:00, 20.51it/s]
Batches: 100% 4/4 [00:00<00:00, 22.28it/s]
Batches: 100% 4/4 [00:00<00:00, 22.43it/s]
Batches: 100% 4/4 [00:00<00:00, 22.37it/s]
Batches: 100% 4/4 [00:00<00:00, 23.19it/s]
Batches: 100% 4/4 [00:

# Best ensemble combination of models without PCA over STS benchmark 2017 Multilingual

## Ensemble of 5 models
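
For multi-model runs, `--models` takes a comma-separated list of names. How `ensemble.py` parses that value is not shown here. The sketch below is one plausible way to do it with `argparse`, and `parse_models` is a hypothetical function.

```python
import argparse

def parse_models(argv):
    # Hypothetical parser: split the --models value on commas and drop
    # empty entries or stray whitespace.
    parser = argparse.ArgumentParser(description="Illustrative --models parser")
    parser.add_argument("--models", required=True,
                        help="Comma-separated list of model names")
    args = parser.parse_args(argv)
    return [name.strip() for name in args.models.split(",") if name.strip()]

models = parse_models(["--models", "distiluse-base-multilingual-cased,LaBSE"])
print(models)
```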



In [None]:
%cd
%cd /content/SentEval/examples
!python ensemble.py --models distiluse-base-multilingual-cased,xlm-r-distilroberta-base-paraphrase-v1,xlm-r-bert-base-nli-stsb-mean-tokens,LaBSE,distilbert-multilingual-nli-stsb-quora-ranking

Streaming output truncated to the last 5000 lines.
Batches: 100% 4/4 [00:00<00:00, 10.47it/s]
Batches: 100% 4/4 [00:00<00:00, 12.25it/s]
Batches: 100% 4/4 [00:00<00:00, 20.14it/s]
Batches: 100% 4/4 [00:00<00:00, 18.61it/s]
Batches: 100% 4/4 [00:00<00:00, 10.11it/s]
Batches: 100% 4/4 [00:00<00:00, 10.23it/s]
Batches: 100% 4/4 [00:00<00:00, 11.51it/s]
Batches: 100% 4/4 [00:00<00:00, 18.22it/s]
Batches: 100% 4/4 [00:00<00:00, 18.85it/s]
Batches: 100% 4/4 [00:00<00:00, 10.30it/s]
Batches: 100% 4/4 [00:00<00:00, 10.13it/s]
Batches: 100% 4/4 [00:00<00:00, 11.55it/s]
Batches: 100% 4/4 [00:00<00:00, 18.99it/s]
Batches: 100% 4/4 [00:00<00:00, 19.25it/s]
Batches: 100% 4/4 [00:00<00:00,  9.85it/s]
Batches: 100% 4/4 [00:00<00:00,  9.76it/s]
Batches: 100% 4/4 [00:00<00:00, 11.34it/s]
Batches: 100% 4/4 [00:00<00:00, 18.29it/s]
Batches: 100% 4/4 [00:00<00:00, 18.44it/s]
Batches: 100% 4/4 [00:00<00:00, 10.51it/s]
Batches: 100% 4/4 [00:00<00:00, 10.41it/s]
Batches: 100% 4/4 [00:

# Ensemble combination of models without PCA with the best result-to-dimension ratio on STS Benchmark 2017

xlm-r-bert-base-nli-stsb-mean-tokens, xlm-r-distilroberta-base-paraphrase-v1

In [None]:
%cd
%cd /content/SentEval/examples
!python ensemble.py --models xlm-r-distilroberta-base-paraphrase-v1,xlm-r-bert-base-nli-stsb-mean-tokens

Streaming output truncated to the last 5000 lines.
Batches: 100% 4/4 [00:00<00:00, 20.33it/s]
Batches: 100% 4/4 [00:00<00:00, 20.59it/s]
Batches: 100% 4/4 [00:00<00:00, 21.08it/s]
Batches: 100% 4/4 [00:00<00:00, 20.76it/s]
Batches: 100% 4/4 [00:00<00:00, 20.65it/s]
Batches: 100% 4/4 [00:00<00:00, 20.65it/s]
Batches: 100% 4/4 [00:00<00:00, 21.01it/s]
Batches: 100% 4/4 [00:00<00:00, 21.16it/s]
Batches: 100% 4/4 [00:00<00:00, 21.67it/s]
Batches: 100% 4/4 [00:00<00:00, 20.21it/s]
Batches: 100% 4/4 [00:00<00:00, 20.14it/s]
Batches: 100% 4/4 [00:00<00:00, 21.12it/s]
Batches: 100% 4/4 [00:00<00:00, 20.73it/s]
Batches: 100% 4/4 [00:00<00:00, 21.09it/s]
Batches: 100% 4/4 [00:00<00:00, 20.93it/s]
Batches: 100% 4/4 [00:00<00:00, 21.21it/s]
Batches: 100% 4/4 [00:00<00:00, 20.90it/s]
Batches: 100% 4/4 [00:00<00:00, 20.43it/s]
Batches: 100% 4/4 [00:00<00:00, 20.75it/s]
Batches: 100% 4/4 [00:00<00:00, 20.11it/s]
Batches: 100% 4/4 [00:00<00:00, 20.08it/s]
Batches: 100% 4/4 [00: