<a href="https://colab.research.google.com/github/SelmaDM/Pyterrier/blob/master/PyTerrier-Neural%20Re-ranking.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTerrier - Neural Re-Ranking 

In this lab you will:

 - re-rank documents using neural models such as KNRM, Vanilla BERT, EPIC, and monoT5.


# Installation


In [None]:
!pip install -q --upgrade python-terrier

### Installing the PyTerrier plugins

We install the PyTerrier plugins [OpenNIR](https://opennir.net/) and [monoT5](https://github.com/terrierteam/pyterrier_t5). You can safely ignore any package version errors.

In [None]:
!pip install -q --upgrade git+https://github.com/Georgetown-IR-Lab/OpenNIR
!pip install -q --upgrade git+https://github.com/terrierteam/pyterrier_t5

## Preliminary steps

These lines are required to work with DeepCT.

In [None]:
%tensorflow_version 1.x
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 
import tensorflow as tf
assert tf.__version__.startswith("1"), "TF 1 is required by DeepCT; on Colab, use %tensorflow_version 1.x"
tf.logging.set_verbosity(tf.logging.ERROR)

TensorFlow 1.x selected.


**[PyTerrier](https://github.com/terrier-org/pyterrier) initialization** 

In [None]:
import pyterrier as pt
if not pt.started():
    pt.init()

cord19 = pt.datasets.get_dataset('irds:cord19/trec-covid')


from pyterrier.measures import * # allow for natural measure names
import onir_pt

PyTerrier 0.8.0 has loaded Terrier 5.6 (built by craigmacdonald on 2021-09-17 13:27)

[INFO] trec-robust04 is deprecated. Consider using disks45/nocr/trec-robust-2004 instead, which provides better parsing of the corpus.
[INFO] trec-robust04/fold1 is deprecated. Consider using disks45/nocr/trec-robust-2004/fold1 instead, which provides better parsing of the corpus.
[INFO] trec-robust04/fold2 is deprecated. Consider using disks45/nocr/trec-robust-2004/fold2 instead, which provides better parsing of the corpus.
[INFO] trec-robust04/fold3 is deprecated. Consider using disks45/nocr/trec-robust-2004/fold3 instead, which provides better parsing of the corpus.
[INFO] trec-robust04/fold4 is deprecated. Consider using disks45/nocr/trec-robust-2004/fold4 instead, which provides better parsing of the corpus.
[INFO] trec-robust04/fold5 is deprecated. Consider using disks45/nocr/trec-robust-2004/fold5 instead, which provides better parsing of the corpus.


Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.


### Indexing

In [None]:
!rm -rf ./terrier_cord19/*

In [None]:
import os

pt_index_path = './terrier_cord19'

if not os.path.exists(pt_index_path + "/data.properties"):
    # create the index, using the IterDictIndexer indexer 
    indexer = pt.index.IterDictIndexer(pt_index_path, blocks=True)

    # we give the dataset get_corpus_iter() directly to the indexer
    # while specifying the fields to index and the metadata to record
    index_ref = indexer.index(cord19.get_corpus_iter(), 
                              fields=('abstract',), 
                              meta=('docno',))

else:
    # if you already have the index, use it.
    index_ref = pt.IndexRef.of(pt_index_path + "/data.properties")

cord19/trec-covid documents:   0%|          | 0/192509 [24ms<?, ?it/s]



21:08:58.801 [ForkJoinPool-1-worker-3] WARN org.terrier.structures.indexing.Indexer - Indexed 54937 empty documents
21:09:00.248 [ForkJoinPool-1-worker-3] ERROR org.terrier.structures.indexing.Indexer - Could not finish MetaIndexBuilder: 
java.io.IOException: Key 8lqzfj2e is not unique: 37597,11755
For MetaIndex, to suppress, set metaindex.compressed.reverse.allow.duplicates=true
	at org.terrier.structures.collections.FSOrderedMapFile$MultiFSOMapWriter.mergeTwo(FSOrderedMapFile.java:1374)
	at org.terrier.structures.collections.FSOrderedMapFile$MultiFSOMapWriter.close(FSOrderedMapFile.java:1308)
	at org.terrier.structures.indexing.BaseMetaIndexBuilder.close(BaseMetaIndexBuilder.java:321)
	at org.terrier.structures.indexing.classical.BlockIndexer.createDirectIndex(BlockIndexer.java:472)
	at org.terrier.structures.indexing.Indexer.index(Indexer.java:369)
	at org.terrier.python.ParallelIndexer$1.apply(ParallelIndexer.java:63)
	at org.terrier.python.ParallelIndexer$1.apply(ParallelIndexer.j

## Re-Rankers 

Let's start exploring some neural re-ranking methods! We can build them from scratch using `onir_pt.reranker`.

An OpenNIR re-ranking model is made up of:
 - `ranker` (e.g., `drmm`, `knrm`, or `pacrr`). This defines the neural ranking architecture.
 - `vocab` (e.g., `wordvec_hash` or `bert`). This defines how the model encodes text. Keeping it separate from the ranker makes it easy to swap in different text representations.
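To make the ranker/vocab separation concrete, here is a minimal pure-Python sketch. It does not use OpenNIR: `word_vocab`, `trigram_vocab`, `overlap_ranker`, and `make_reranker` are hypothetical names invented for illustration, and the scoring rule is a toy overlap count rather than a neural model.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: these are NOT OpenNIR classes, only an illustration
# of keeping the text encoder ("vocab") separate from the scorer ("ranker").
Encoder = Callable[[str], List[str]]
Scorer = Callable[[List[str], List[str]], float]


def word_vocab(text: str) -> List[str]:
    """Encode text as lowercase whitespace-separated tokens."""
    return text.lower().split()


def trigram_vocab(text: str) -> List[str]:
    """Encode text as overlapping character trigrams."""
    t = text.lower()
    return [t[i:i + 3] for i in range(len(t) - 2)]


def overlap_ranker(query_units: List[str], doc_units: List[str]) -> float:
    """Score = fraction of query units that also occur in the document."""
    if not query_units:
        return 0.0
    doc_set = set(doc_units)
    return sum(u in doc_set for u in query_units) / len(query_units)


VOCABS: Dict[str, Encoder] = {"word": word_vocab, "trigram": trigram_vocab}
RANKERS: Dict[str, Scorer] = {"overlap": overlap_ranker}


def make_reranker(ranker: str, vocab: str) -> Callable[[str, str], float]:
    """Combine a named ranker with a named vocab into one scoring function."""
    encode, score = VOCABS[vocab], RANKERS[ranker]
    return lambda query, doc: score(encode(query), encode(doc))


by_word = make_reranker("overlap", "word")
by_trigram = make_reranker("overlap", "trigram")
query, doc = "covid vaccines", "a study of vaccine efficacy"
print(by_word(query, doc), by_trigram(query, doc))
```

The same ranker gives different scores depending on the text representation: whole-word matching misses "vaccines" vs. "vaccine", while character trigrams pick up the shared stem.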

Running the following line takes a few minutes because it downloads and prepares the word vectors.

In [None]:
knrm = onir_pt.reranker('knrm', 'wordvec_hash', text_field='abstract')

[2022-04-03 21:34:14,509][WordvecHashVocab][DEBUG] [starting] reading cached at /root/data/onir/vocab/wordvec_hash/fasttext-wiki-news-300d-1M.p
[2022-04-03 21:34:24,751][WordvecHashVocab][DEBUG] [finished] reading cached at /root/data/onir/vocab/wordvec_hash/fasttext-wiki-news-300d-1M.p [10.24s]


Let's see how these models perform at ranking!

In [None]:
tfidf = pt.BatchRetrieve(index_ref, wmodel="TF_IDF") % 50
get_text = pt.text.get_text(cord19, 'abstract') #>> pt.apply.title_abstract(lambda r: r['title'] + ' ' + r['abstract'])
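In PyTerrier, `%` applies a rank cutoff and `>>` chains one stage into the next. The toy sketch below mimics those two operators in plain Python so the data flow is visible. `Stage` is a hypothetical class, not the real PyTerrier transformer, and it handles a single query with list-of-dict rows instead of a DataFrame.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]  # one result row: docno, score, and optional text


class Stage:
    """Toy stand-in for a pipeline stage (NOT the real PyTerrier class)."""

    def __init__(self, fn: Callable[[List[Row]], List[Row]]):
        self.fn = fn

    def __call__(self, rows: List[Row]) -> List[Row]:
        return self.fn(rows)

    def __rshift__(self, other: "Stage") -> "Stage":
        # a >> b : feed the output of a into b
        return Stage(lambda rows: other(self(rows)))

    def __mod__(self, k: int) -> "Stage":
        # a % k : keep only the k highest-scoring rows of a's output
        def cutoff(rows: List[Row]) -> List[Row]:
            ranked = sorted(self(rows), key=lambda r: r["score"], reverse=True)
            return ranked[:k]
        return Stage(cutoff)


# Made-up documents and first-stage scores, for illustration only.
TEXTS = {"d1": "masks reduce transmission",
         "d2": "vaccine trial results",
         "d3": "hospital capacity"}
SCORES = [("d1", 2.0), ("d2", 3.5), ("d3", 0.5)]

retrieve = Stage(lambda rows: [{"docno": d, "score": s} for d, s in SCORES])
add_text = Stage(lambda rows: [{**r, "abstract": TEXTS[r["docno"]]} for r in rows])


def by_length(rows: List[Row]) -> List[Row]:
    """Toy re-ranker: rescore each document by the length of its abstract."""
    rescored = [{**r, "score": float(len(r["abstract"]))} for r in rows]
    return sorted(rescored, key=lambda r: r["score"], reverse=True)


rerank = Stage(by_length)

# Same shape as `tfidf % 50 >> get_text >> knrm`; % binds tighter than >>.
pipeline = retrieve % 2 >> add_text >> rerank
results = pipeline([])
print([r["docno"] for r in results])
```

The cutoff drops `d3` before the text lookup and the re-ranker ever see it, which is why the first stage is capped at 50 documents: the expensive neural model only scores a short candidate list.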


In [None]:
topics = cord19.get_topics(variant='description')
qrels = cord19.get_qrels()



In [None]:
SEED=42

from sklearn.model_selection import train_test_split

tr_va_topics, test_topics = train_test_split(topics, test_size=15, random_state=SEED)
train_topics, valid_topics =  train_test_split(tr_va_topics, test_size=5, random_state=SEED)


test_qrels = qrels # only the judgments for the topics actually run are used, so passing the full set is fine
train_qrels = qrels
valid_qrels = qrels
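When `test_size` is an integer, `train_test_split` holds out that many items rather than a fraction. The sketch below reproduces the resulting split sizes with the standard library only. It assumes 50 topics (the size of the TREC-COVID topic set), and `split_off` is a hypothetical helper, so the exact topics selected will differ from scikit-learn's.

```python
import random

SEED = 42
# Assumption: 50 topics, numbered 1..50, as in TREC-COVID.
topic_ids = [str(i) for i in range(1, 51)]


def split_off(items, n_held_out, seed):
    """Shuffle a copy with a fixed seed and hold out the last n items."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:-n_held_out], shuffled[-n_held_out:]


# First hold out 15 test topics, then 5 validation topics from the rest.
tr_va, test = split_off(topic_ids, 15, SEED)
train, valid = split_off(tr_va, 5, SEED)
print(len(train), len(valid), len(test))
```

Under that assumption the split is 30 training, 5 validation, and 15 test topics, with no topic shared between the three sets.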

In [None]:

# first-stage TF-IDF retrieval, then fetch the abstract text, then re-rank with KNRM
pipeline = tfidf >> get_text >> knrm
pt.Experiment(
    [tfidf, pipeline],
    test_topics,
    qrels,
    names=['TFIDF', 'TFIDF >> KNRM'],
    eval_metrics=[AP(rel=2), nDCG, nDCG@10, P(rel=2)@10]
)

[2022-04-03 21:09:35,827][onir_pt][ERROR] gpu=True, but CUDA is not available. Falling back on CPU.
[2022-04-03 21:09:35,839][onir_pt][DEBUG] [starting] batches


batches:   0%|          | 0/188 [23ms<?, ?it/s]

[2022-04-03 21:09:36,988][onir_pt][DEBUG] [finished] batches: [1.15s] [188it] [163.84it/s]


| | name | AP(rel=2) | nDCG | nDCG@10 | P(rel=2)@10 |
|---|---|---|---|---|---|
| 0 | TFIDF | 0.054568 | 0.123589 | 0.595818 | 0.546667 |
| 1 | TFIDF >> KNRM | 0.042547 | 0.112703 | 0.423958 | 0.36 |
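To read the table, it helps to see how two of these measures are computed for a single query. The sketch below is a simplified illustration, assuming graded labels 0, 1, 2, linear gains, and a log2 rank discount; the function names and the example labels are made up, and the evaluation library used by `pt.Experiment` may differ in details.

```python
import math


def precision_at_k(ranked_labels, k, min_rel=1):
    """Fraction of the top-k documents whose label is at least min_rel."""
    return sum(label >= min_rel for label in ranked_labels[:k]) / k


def dcg_at_k(labels, k):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(label / math.log2(rank + 1)
               for rank, label in enumerate(labels[:k], start=1))


def ndcg_at_k(ranked_labels, judged_labels, k):
    """DCG of the ranking divided by the DCG of the best possible ordering."""
    ideal = dcg_at_k(sorted(judged_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


# Made-up example: labels of the returned documents, in ranked order.
ranked = [2, 0, 1, 2, 0]
judged = [2, 2, 1, 0, 0]  # all judged labels for this query

print(precision_at_k(ranked, 5, min_rel=2))  # counts only label-2 documents
print(precision_at_k(ranked, 5, min_rel=1))  # counts labels 1 and 2
print(ndcg_at_k(ranked, judged, 5))
```

The `rel=2` argument in `AP(rel=2)` and `P(rel=2)@10` plays the role of `min_rel=2` here: partially relevant documents (label 1) do not count as hits, whereas nDCG uses the graded labels directly.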


This does not work very well because the model has not been trained; it uses random weights to combine the scores from the similarity matrix.
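The sketch below illustrates, in plain Python, the kernel-pooling idea behind KNRM and why the combining weights matter. It assumes the query-document similarity matrix is already computed; the kernel means, `sigma`, and the example matrices are illustrative values, not OpenNIR's defaults or its implementation.

```python
import math
import random


def kernel_features(sim_matrix, mus, sigma=0.1):
    """One feature per kernel: summed log soft-match counts over query terms."""
    feats = []
    for mu in mus:
        total = 0.0
        for row in sim_matrix:  # one row of similarities per query term
            soft_count = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2))
                             for s in row)
            total += math.log(max(soft_count, 1e-10))  # clamp to avoid log(0)
        feats.append(total)
    return feats


def knrm_score(sim_matrix, mus, weights, bias=0.0):
    """Combine the kernel features linearly and squash with tanh."""
    feats = kernel_features(sim_matrix, mus)
    return math.tanh(sum(w * f for w, f in zip(weights, feats)) + bias)


mus = [1.0, 0.5, 0.0, -0.5]                      # kernel centres (assumed)
good_doc = [[1.0, 0.2, 0.1], [0.9, 0.1, 0.0]]    # strong term matches
bad_doc = [[0.1, 0.0, -0.1], [0.2, 0.1, 0.0]]    # no strong matches

rng = random.Random(0)
random_weights = [rng.uniform(-1, 1) for _ in mus]
exact_match_weights = [1.0, 0.0, 0.0, 0.0]       # reward near-exact matches only

for name, w in [("random", random_weights), ("hand-set", exact_match_weights)]:
    print(name, knrm_score(good_doc, mus, w), knrm_score(bad_doc, mus, w))
```

With the hand-set weights the document with strong matches scores higher, whereas with random weights nothing guarantees that ordering. Training is what moves the weights from the second situation toward the first.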

## Training the re-ranker

You can train re-ranking models in PyTerrier using the `fit` method.
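Training of this kind is typically pairwise: for each query the trainer picks a relevant and a non-relevant candidate and pushes the relevant one's score higher. The sketch below is a generic illustration of that idea, not OpenNIR's actual sampler or loss; `sample_pair` and `pairwise_softmax_loss` are hypothetical names, and the candidate list and judgments are made up.

```python
import math
import random


def sample_pair(candidates, qrels, rng):
    """Pick one relevant and one non-relevant candidate, or None if impossible."""
    pos = [d for d in candidates if qrels.get(d, 0) > 0]
    neg = [d for d in candidates if qrels.get(d, 0) <= 0]
    if not pos or not neg:
        return None  # e.g. every retrieved document is relevant
    return rng.choice(pos), rng.choice(neg)


def pairwise_softmax_loss(pos_score, neg_score):
    """Cross-entropy of preferring the relevant document over the other one."""
    m = max(pos_score, neg_score)  # subtract the max for numerical stability
    log_z = m + math.log(math.exp(pos_score - m) + math.exp(neg_score - m))
    return log_z - pos_score


rng = random.Random(42)
candidates = ["d1", "d2", "d3", "d4"]   # first-stage results for one query
qrels = {"d1": 2, "d3": 1}              # unjudged documents count as non-relevant

pair = sample_pair(candidates, qrels, rng)
print(pair)
print(pairwise_softmax_loss(2.0, 0.0))  # relevant doc scored higher: small loss
print(pairwise_softmax_loss(0.0, 2.0))  # relevant doc scored lower: large loss
```

A query whose candidates are all relevant (or all non-relevant) cannot produce a pair, so the sampler has to skip it and draw again.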

In [None]:
pipeline.fit(
    train_topics,
    train_qrels,
    valid_topics,
    valid_qrels)

[2022-04-03 21:30:05,863][onir_pt][DEBUG] [starting] validation
[2022-04-03 21:30:05,866][onir_pt][ERROR] gpu=True, but CUDA is not available. Falling back on CPU.
[2022-04-03 21:30:06,436][onir_pt][INFO] pre-validation: 0.0061
[2022-04-03 21:30:06,440][onir_pt][DEBUG] [starting] training
[2022-04-03 21:30:06,441][onir_pt][DEBUG] [starting] train pairs
[2022-04-03 21:30:06,815][onir_pt][DEBUG] not enough negs
[2022-04-03 21:30:16,861][onir_pt][DEBUG] [finished] train pairs: [10.42s] [1024it] [98.28it/s]
[2022-04-03 21:30:16,868][onir_pt][INFO] training   it=0 loss=0.1864
[2022-04-03 21:30:17,433][onir_pt][INFO] validation it=0 map=0.0062 ndcg=0.0157 P_10=0.0740 <--
[2022-04-03 21:30:28,458][onir_pt][INFO] validation it=1 map=0.0063 ndcg=0.0157 P_10=0.0760 <--
[2022-04-03 21:30:38,899][onir_pt][INFO] training   it=2 loss=0
[2022-04-03 21:30:39,449][onir_pt][INFO] validation it=2 map=0.0062 ndcg=0.0156 P_10=0.0720
[2022-04-03 21:30:49,618][onir_pt][INFO] training   it=3 loss=0.1863
[2022-04-03 21:30:50,206][onir_pt][INFO] validation it=3 map=0.0062 ndcg=0.0156 P_10=0.0700
[2022-04-03 21:31:00,219][onir_pt][INFO] training   it=4 loss=0.1856
[2022-04-03 21:31:00,779][onir_pt][INFO] validation it=4 map=0.0062 ndcg=0.0155 P_10=0.0720
[2022-04-03 21:31:11,271][onir_pt][INFO] training   it=5 loss=0.1815
[2022-04-03 21:31:11,841][onir_pt][INFO] validation it=5 map=0.0061 ndcg=0.0155 P_10=0.0700
[2022-04-03 21:31:22,434][onir_pt][INFO] training   it=6 loss=0.1693
[2022-04-03 21:31:23,024][onir_pt][INFO] validation it=6 map=0.0062 ndcg=0.0158 P_10=0.0700
[2022-04-03 21:31:34,093][onir_pt][INFO] training   it=7 loss=0
[2022-04-03 21:31:34,652][onir_pt][INFO] validation it=7 map=0.0061 ndcg=0.0156 P_10=0.0700
[2022-04-03 21:31:45,772][onir_pt][INFO] validation it=8 map=0.0060 ndcg=0.0153 P_10=0.0720
[2022-04-03 21:31:56,550][onir_pt][INFO] validation it=9 map=0.0061 ndcg=0.0154 P_10=0.0660
[2022-04-03 21:32:06,471][onir_pt][INFO] training   it=10 loss=0
[2022-04-03 21:32:07,040][onir_pt][INFO] validation it=10 map=0.0061 ndcg=0.0154 P_10=0.0660
[2022-04-03 21:32:17,150][onir_pt][INFO] training   it=11 loss=0.1746
[2022-04-03 21:32:17,720][onir_pt][INFO] validation it=11 map=0.0062 ndcg=0.0155 P_10=0.0700
[2022-04-03 21:32:27,943][onir_pt][INFO] training   it=12 loss=0.1739
[2022-04-03 21:32:28,523][onir_pt][INFO] validation it=12 map=0.0060 ndcg=0.0154 P_10=0.0700
[2022-04-03 21:32:38,932][onir_pt][INFO] training   it=13 loss=0.1929
[2022-04-03 21:32:39,518][onir_pt][INFO] validation it=13 map=0.0060 ndcg=0.0153 P_10=0.0680
[2022-04-03 21:32:49,785][onir_pt][INFO] training   it=14 loss=0.1828
[2022-04-03 21:32:50,342][onir_pt][INFO] validation it=14 map=0.0061 ndcg=0.0156 P_10=0.0720
[2022-04-03 21:32:50,344][onir_pt][DEBUG] [starting] training
[2022-04-03 21:32:50,344][onir_pt][DEBUG] [starting] train pairs


train pairs:   0%|          | 0/1024 [22ms<?, ?it/s]

[02;37m[2022-04-03 21:32:50,410][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:32:50,992][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:32:52,596][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:32:53,177][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:32:53,689][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:32:56,296][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:32:58,525][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:32:58,828][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:32:59,382][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:00,146][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:00,742][onir_pt][DEBUG] [0m[37m[finished] train pairs: [10.40s] [1024it] [98.49it/s][0m
[02;37m[2022-04-03 21:33:00,747][onir_pt][DEBUG] [0m[37m[finished] training [10.40s][0m


batches:   0%|          | 0/63 [47ms<?, ?it/s]

[02;37m[2022-04-03 21:33:01,171][onir_pt][DEBUG] [0m[37m[finished] batches: [416ms] [63it] [151.61it/s][0m
[02;37m[2022-04-03 21:33:01,355][onir_pt][DEBUG] [0m[37m[finished] validation [607ms][0m
[02;37m[2022-04-03 21:33:01,355][onir_pt][INFO] [0m[32mvalidation it=15 map=0.0060 ndcg=0.0154 P_10=0.0660[0m
[02;37m[2022-04-03 21:33:01,356][onir_pt][ERROR] [0m[31mgpu=True, but CUDA is not available. Falling back on CPU.[0m
[02;37m[2022-04-03 21:33:01,357][onir_pt][DEBUG] [0m[37m[starting] training[0m
[02;37m[2022-04-03 21:33:01,357][onir_pt][DEBUG] [0m[37m[starting] train pairs[0m


train pairs:   0%|          | 0/1024 [41ms<?, ?it/s]

[02;37m[2022-04-03 21:33:02,187][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:03,130][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:03,312][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:03,552][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:03,994][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:04,711][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:10,031][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:10,206][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:12,317][onir_pt][DEBUG] [0m[37m[finished] train pairs: [10.96s] [1024it] [93.44it/s][0m
[02;37m[2022-04-03 21:33:12,321][onir_pt][DEBUG] [0m[37m[finished] training [10.96s][0m
[02;37m[2022-04-03 21:33:12,322][onir_pt][INFO] [0m[32mtraining   it=16 loss=0.1902[0m
[02;37m[2022-04-03 21:33:12,323][onir_pt][DEBUG] [0m[37m[starting

batches:   0%|          | 0/63 [41ms<?, ?it/s]

[02;37m[2022-04-03 21:33:12,745][onir_pt][DEBUG] [0m[37m[finished] batches: [419ms] [63it] [150.29it/s][0m
[02;37m[2022-04-03 21:33:12,935][onir_pt][DEBUG] [0m[37m[finished] validation [612ms][0m
[02;37m[2022-04-03 21:33:12,936][onir_pt][INFO] [0m[32mvalidation it=16 map=0.0060 ndcg=0.0154 P_10=0.0720[0m
[02;37m[2022-04-03 21:33:12,937][onir_pt][ERROR] [0m[31mgpu=True, but CUDA is not available. Falling back on CPU.[0m
[02;37m[2022-04-03 21:33:12,938][onir_pt][DEBUG] [0m[37m[starting] training[0m
[02;37m[2022-04-03 21:33:12,939][onir_pt][DEBUG] [0m[37m[starting] train pairs[0m


train pairs:   0%|          | 0/1024 [23ms<?, ?it/s]

[02;37m[2022-04-03 21:33:13,366][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:15,124][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:15,623][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:16,842][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:17,123][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:18,433][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:18,738][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:20,703][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:21,230][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:21,434][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:22,492][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:23,990][onir_pt][DEBUG] [0m[37m[finished] train pairs: [11.05s] [1024it] [92.67it/s][0m
[02;37m[2022-

batches:   0%|          | 0/63 [43ms<?, ?it/s]

[02;37m[2022-04-03 21:33:24,438][onir_pt][DEBUG] [0m[37m[finished] batches: [443ms] [63it] [142.10it/s][0m
[02;37m[2022-04-03 21:33:24,612][onir_pt][DEBUG] [0m[37m[finished] validation [619ms][0m
[02;37m[2022-04-03 21:33:24,614][onir_pt][INFO] [0m[32mvalidation it=17 map=0.0061 ndcg=0.0154 P_10=0.0700[0m
[02;37m[2022-04-03 21:33:24,614][onir_pt][ERROR] [0m[31mgpu=True, but CUDA is not available. Falling back on CPU.[0m
[02;37m[2022-04-03 21:33:24,621][onir_pt][DEBUG] [0m[37m[starting] training[0m
[02;37m[2022-04-03 21:33:24,623][onir_pt][DEBUG] [0m[37m[starting] train pairs[0m


train pairs:   0%|          | 0/1024 [28ms<?, ?it/s]

[02;37m[2022-04-03 21:33:24,708][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:24,919][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:25,886][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:26,378][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:27,506][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:29,472][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:30,282][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:31,642][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:34,817][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:35,698][onir_pt][DEBUG] [0m[37m[finished] train pairs: [11.07s] [1024it] [92.46it/s][0m
[02;37m[2022-04-03 21:33:35,703][onir_pt][DEBUG] [0m[37m[finished] training [11.08s][0m
[02;37m[2022-04-03 21:33:35,704][onir_pt][INFO] [0m[32mtraining   it=18 loss=

batches:   0%|          | 0/63 [40ms<?, ?it/s]

[02;37m[2022-04-03 21:33:36,146][onir_pt][DEBUG] [0m[37m[finished] batches: [439ms] [63it] [143.46it/s][0m
[02;37m[2022-04-03 21:33:36,328][onir_pt][DEBUG] [0m[37m[finished] validation [624ms][0m
[02;37m[2022-04-03 21:33:36,329][onir_pt][INFO] [0m[32mvalidation it=18 map=0.0059 ndcg=0.0151 P_10=0.0680[0m
[02;37m[2022-04-03 21:33:36,330][onir_pt][ERROR] [0m[31mgpu=True, but CUDA is not available. Falling back on CPU.[0m
[02;37m[2022-04-03 21:33:36,330][onir_pt][DEBUG] [0m[37m[starting] training[0m
[02;37m[2022-04-03 21:33:36,331][onir_pt][DEBUG] [0m[37m[starting] train pairs[0m


train pairs:   0%|          | 0/1024 [41ms<?, ?it/s]

[02;37m[2022-04-03 21:33:37,678][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:38,186][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:41,258][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:43,718][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:44,153][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:44,603][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:45,087][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:47,426][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:47,529][onir_pt][DEBUG] [0m[37m[finished] train pairs: [11.20s] [1024it] [91.45it/s][0m
[02;37m[2022-04-03 21:33:47,535][onir_pt][DEBUG] [0m[37m[finished] training [11.20s][0m
[02;37m[2022-04-03 21:33:47,536][onir_pt][INFO] [0m[32mtraining   it=19 loss=0.1878[0m
[02;37m[2022-04-03 21:33:47,537][onir_pt][DEBUG] [0m[37m[starting

batches:   0%|          | 0/63 [49ms<?, ?it/s]

[02;37m[2022-04-03 21:33:47,954][onir_pt][DEBUG] [0m[37m[finished] batches: [415ms] [63it] [151.86it/s][0m
[02;37m[2022-04-03 21:33:48,135][onir_pt][DEBUG] [0m[37m[finished] validation [598ms][0m
[02;37m[2022-04-03 21:33:48,136][onir_pt][INFO] [0m[32mvalidation it=19 map=0.0059 ndcg=0.0153 P_10=0.0700[0m
[02;37m[2022-04-03 21:33:48,137][onir_pt][ERROR] [0m[31mgpu=True, but CUDA is not available. Falling back on CPU.[0m
[02;37m[2022-04-03 21:33:48,137][onir_pt][DEBUG] [0m[37m[starting] training[0m
[02;37m[2022-04-03 21:33:48,138][onir_pt][DEBUG] [0m[37m[starting] train pairs[0m


train pairs:   0%|          | 0/1024 [22ms<?, ?it/s]

[02;37m[2022-04-03 21:33:48,974][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:49,114][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:49,412][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:49,858][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:53,238][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:54,389][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:55,335][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:56,950][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:58,697][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:33:59,084][onir_pt][DEBUG] [0m[37m[finished] train pairs: [10.95s] [1024it] [93.56it/s][0m
[02;37m[2022-04-03 21:33:59,089][onir_pt][DEBUG] [0m[37m[finished] training [10.95s][0m
[02;37m[2022-04-03 21:33:59,090][onir_pt][INFO] [0m[32mtraining   it=20 loss=

batches:   0%|          | 0/63 [43ms<?, ?it/s]

[02;37m[2022-04-03 21:33:59,544][onir_pt][DEBUG] [0m[37m[finished] batches: [450ms] [63it] [139.94it/s][0m
[02;37m[2022-04-03 21:33:59,741][onir_pt][DEBUG] [0m[37m[finished] validation [651ms][0m
[02;37m[2022-04-03 21:33:59,742][onir_pt][INFO] [0m[32mvalidation it=20 map=0.0061 ndcg=0.0155 P_10=0.0700[0m
[02;37m[2022-04-03 21:33:59,742][onir_pt][ERROR] [0m[31mgpu=True, but CUDA is not available. Falling back on CPU.[0m
[02;37m[2022-04-03 21:33:59,743][onir_pt][DEBUG] [0m[37m[starting] training[0m
[02;37m[2022-04-03 21:33:59,744][onir_pt][DEBUG] [0m[37m[starting] train pairs[0m


train pairs:   0%|          | 0/1024 [28ms<?, ?it/s]

[02;37m[2022-04-03 21:34:00,285][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:34:01,434][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:34:01,814][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:34:02,540][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:34:05,119][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:34:05,756][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:34:07,138][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:34:07,941][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:34:08,474][onir_pt][DEBUG] [0m[37mnot enough negs[0m
[02;37m[2022-04-03 21:34:10,755][onir_pt][DEBUG] [0m[37m[finished] train pairs: [11.01s] [1024it] [93.00it/s][0m
[02;37m[2022-04-03 21:34:10,760][onir_pt][DEBUG] [0m[37m[finished] training [11.02s][0m
[02;37m[2022-04-03 21:34:10,761][onir_pt][INFO] [0m[32mtraining   it=21 loss=

batches:   0%|          | 0/63 [47ms<?, ?it/s]

[02;37m[2022-04-03 21:34:11,196][onir_pt][DEBUG] [0m[37m[finished] batches: [431ms] [63it] [146.16it/s][0m
[02;37m[2022-04-03 21:34:11,377][onir_pt][DEBUG] [0m[37m[finished] validation [616ms][0m
[02;37m[2022-04-03 21:34:11,378][onir_pt][INFO] [0m[32mvalidation it=21 map=0.0061 ndcg=0.0154 P_10=0.0680[0m
[02;37m[2022-04-03 21:34:11,379][onir_pt][INFO] [0m[32mearly stopping; model reverting back to it=1[0m

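The log above reports validation metrics after every training iteration. A minimal sketch for pulling those numbers out and checking whether they move at all is given here; the four sample lines are copied from the log with the colour codes removed, and the helper name `parse_validation` is purely illustrative.

```python
import re

# A few validation lines copied from the log above (ANSI colour codes omitted).
LOG = """\
[2022-04-03 21:32:28,523][onir_pt][INFO] validation it=12 map=0.0060 ndcg=0.0154 P_10=0.0700
[2022-04-03 21:32:39,518][onir_pt][INFO] validation it=13 map=0.0060 ndcg=0.0153 P_10=0.0680
[2022-04-03 21:32:50,342][onir_pt][INFO] validation it=14 map=0.0061 ndcg=0.0156 P_10=0.0720
[2022-04-03 21:34:11,378][onir_pt][INFO] validation it=21 map=0.0061 ndcg=0.0154 P_10=0.0680
"""

PATTERN = re.compile(r"validation it=(\d+) map=([\d.]+) ndcg=([\d.]+) P_10=([\d.]+)")


def parse_validation(log_text):
    """Return a list of (iteration, map, ndcg, p10) tuples found in the log text."""
    rows = []
    for match in PATTERN.finditer(log_text):
        it, ap, ndcg, p10 = match.groups()
        rows.append((int(it), float(ap), float(ndcg), float(p10)))
    return rows


rows = parse_validation(LOG)
ndcgs = [r[2] for r in rows]
spread = max(ndcgs) - min(ndcgs)

for it, ap, ndcg, p10 in rows:
    print(f"it={it:2d}  map={ap:.4f}  ndcg={ndcg:.4f}  P@10={p10:.4f}")
print(f"nDCG spread across these iterations: {spread:.4f}")
```

A validation curve that stays flat from one iteration to the next, as it does here, is worth keeping in mind when interpreting the evaluation that follows.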

In [None]:
pt.Experiment(
    [tfidf, pipeline],
    test_topics,
    qrels,
    names=['TFIDF', 'TFIDF >> KNRM (trained)'],
    eval_metrics=[AP(rel=2), nDCG, nDCG@10, P(rel=2)@10]
)

[02;37m[2022-04-03 21:34:13,522][onir_pt][ERROR] [0m[31mgpu=True, but CUDA is not available. Falling back on CPU.[0m
[02;37m[2022-04-03 21:34:13,523][onir_pt][DEBUG] [0m[37m[starting] batches[0m


batches:   0%|          | 0/188 [40ms<?, ?it/s]

[02;37m[2022-04-03 21:34:14,426][onir_pt][DEBUG] [0m[37m[finished] batches: [902ms] [188it] [208.41it/s][0m


| | name | AP(rel=2) | nDCG | nDCG@10 | P(rel=2)@10 |
|---|---|---|---|---|---|
| 0 | TFIDF | 0.054568 | 0.123589 | 0.595818 | 0.546667 |
| 1 | TFIDF >> KNRM (trained) | 0.053241 | 0.122879 | 0.573381 | 0.493333 |


The trained model scores higher than before, but it is still worse than TF-IDF (nDCG@10 of 0.5734 vs. 0.5958). Propose a hypothesis about what is going wrong.

## Vanilla BERT

Contextualised language models such as [BERT](https://arxiv.org/abs/1810.04805) are much more powerful neural models that have proven effective for ranking.

We will try a "vanilla" (or "mono") version of BERT. BERT is pre-trained for masked language modelling and next-sentence prediction.
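To make the "vanilla" setup concrete, here is a minimal sketch of how a query and a document are typically laid out as a single BERT input. It assumes the standard `[CLS] query [SEP] document [SEP]` layout from the BERT paper; whitespace-split words stand in for WordPiece tokens, and the function name and the truncation policy are illustrative, not OpenNIR's actual code.

```python
def build_cross_encoder_input(query_tokens, doc_tokens, max_len=512):
    """Lay out [CLS] query [SEP] document [SEP], truncating the document to fit max_len."""
    # Three special tokens are always present: [CLS] and two [SEP].
    budget = max_len - len(query_tokens) - 3
    if budget < 0:
        raise ValueError("query alone exceeds max_len")
    doc_tokens = doc_tokens[:budget]
    tokens = ["[CLS]"] + query_tokens + ["[SEP]"] + doc_tokens + ["[SEP]"]
    # Segment ids: 0 for [CLS] + query + first [SEP], 1 for document + final [SEP].
    segments = [0] * (len(query_tokens) + 2) + [1] * (len(doc_tokens) + 1)
    return tokens, segments


tokens, segments = build_cross_encoder_input(
    ["covid", "transmission"],
    ["the", "virus", "spreads", "through", "droplets"],
    max_len=8,  # deliberately tiny so that the truncation is visible
)
print(tokens)
print(segments)
```

The ranker then maps the representation of the `[CLS]` position to a single relevance score, which is the part that pre-training alone does not provide.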

In [None]:
#del knrm # clear out memory from KNRM
vbert = onir_pt.reranker('vanilla_transformer', 'bert', text_field='abstract', vocab_config={'train': True})

Let's see how this model performs on TREC COVID.

In [None]:
pipeline = tfidf % 50 >> get_text >> vbert
pt.Experiment(
    [tfidf, pipeline],
    test_topics,
    qrels,
    names=['TFIDF', 'TFIDF >> VBERT'],
    baseline=0,
    eval_metrics=[AP(rel=2), nDCG, nDCG@10, P(rel=2)@10]
)

As we can see, although the model is pre-trained, it does not perform very well. This is because it has not been fine-tuned for relevance ranking.

However, we can train the model for ranking (as shown above for KNRM).
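The KNRM log above mentions "train pairs", which suggests training on (relevant, non-relevant) document pairs. A minimal sketch of one common pairwise objective, a softmax cross-entropy over the two scores, follows. This is an assumption made for illustration: the source does not state which loss OpenNIR actually uses, and the function name is hypothetical.

```python
import math


def pairwise_softmax_loss(pos_score, neg_score):
    """-log softmax(pos, neg)[pos], i.e. log(1 + exp(neg - pos)), computed stably."""
    diff = neg_score - pos_score
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# The loss shrinks as the relevant document is scored further above the non-relevant one.
for pos, neg in [(0.0, 0.0), (2.0, 1.0), (5.0, 0.0), (0.0, 5.0)]:
    print(f"pos={pos:.1f} neg={neg:.1f} loss={pairwise_softmax_loss(pos, neg):.4f}")
```

Under this objective the model is rewarded only for ordering the pair correctly, not for producing any particular absolute score.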

## monoT5

The [monoT5](https://arxiv.org/abs/2003.06713) model scores documents using a sequence-to-sequence language model (T5). Let's see how this approach performs on TREC COVID.

The `MonoT5ReRanker` class from `pyterrier_t5` automatically loads a version of the monoT5 ranker trained on the MS MARCO passage dataset.
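As a rough illustration of how monoT5 turns text generation into a relevance score, the sketch below builds the prompt described in the monoT5 paper and scores a document as the log-probability of the token "true" against "false". The logit values are made up, and the helper names are hypothetical; the real model call is not reproduced here.

```python
import math


def monot5_prompt(query, document):
    """Input template from the monoT5 paper; the model is asked to emit 'true' or 'false'."""
    return f"Query: {query} Document: {document} Relevant:"


def relevance_score(true_logit, false_logit):
    """Log-probability of 'true' under a softmax restricted to the 'true'/'false' logits."""
    m = max(true_logit, false_logit)
    log_norm = m + math.log(math.exp(true_logit - m) + math.exp(false_logit - m))
    return true_logit - log_norm


print(monot5_prompt("covid transmission", "The virus spreads through droplets."))
# Made-up logits: a confident 'true' yields a score close to 0, a confident 'false' a very negative one.
print(relevance_score(3.0, 1.0))
print(relevance_score(1.0, 3.0))
```

Documents are then sorted by this score, so only the relative ordering of the values matters.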

In [None]:
from pyterrier_t5 import MonoT5ReRanker
monoT5 = MonoT5ReRanker(text_field='abstract')

In [None]:
pipeline = (tfidf >> get_text >> monoT5)
pt.Experiment(
    [tfidf, pipeline],
    test_topics,
    qrels,
    names=['TFIDF', 'TFIDF >> T5'],
    eval_metrics=[AP(rel=2), nDCG, nDCG@10, P(rel=2)@10, "mrt"]
)

As expected, the results are much better in terms of nDCG@10 (0.6855 for monoT5 vs. 0.5958 for TF-IDF).
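For readers who want to see what nDCG@10 measures, here is a minimal sketch. It assumes linear gains (the relevance label itself) and a log2 rank discount, with the ideal ranking built from all judged documents for the query; the evaluation library used by `pt.Experiment` may differ in details such as tie handling. The function names are illustrative.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_gains, all_judged_gains, k=10):
    """DCG of the top-k results divided by the DCG of the best possible top-k."""
    ideal_dcg = dcg(sorted(all_judged_gains, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_gains[:k]) / ideal_dcg


# Toy query: relevance labels of the returned documents, in ranked order.
ranked = [2, 0, 1]
judged = [2, 1, 0]
print(round(ndcg_at_k(ranked, judged), 4))
```

Because of the rank discount, moving a relevant document from rank 3 to rank 2 in this toy example would raise the score to 1.0.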

# Practical task

As in the previous lab, apply the models implemented above for cord19 to a question-answering task. In this setting, the queries are questions and the documents are passages that may contain the answer. Note that you must redo the indexing as well as the other steps covered in this lab. You can download the dataset (FiQA) with the lines of code below.

In [None]:
fiqa = {}
fiqa['train'] = pt.datasets.get_dataset('irds:beir/fiqa/train')
fiqa['valid'] = pt.datasets.get_dataset('irds:beir/fiqa/dev')
fiqa['test'] = pt.datasets.get_dataset('irds:beir/fiqa/test')

test_topics = fiqa['test'].get_topics(variant='text')
test_qrels = fiqa['test'].get_qrels()

train_topics = fiqa['train'].get_topics(variant='text')
train_qrels = fiqa['train'].get_qrels()

valid_topics = fiqa['valid'].get_topics(variant='text')
valid_qrels = fiqa['valid'].get_qrels()

In [None]:

# Indexing

import os
import pyterrier as pt
if not pt.started():
    pt.init()

# Note: this rebinds `fiqa` (previously a dict of splits) to the full FiQA corpus.
fiqa = pt.datasets.get_dataset('irds:beir/fiqa')

pt_index_path = './terrier_fiqa'

if not os.path.exists(pt_index_path + "/data.properties"):
    # create the index, using the IterDictIndexer indexer
    indexer = pt.index.IterDictIndexer(pt_index_path, blocks=True)

    # pass the dataset's get_corpus_iter() directly to the indexer;
    # with no `fields` argument, IterDictIndexer indexes the 'text' field by default
    index_ref = indexer.index(fiqa.get_corpus_iter())
else:
    # if the index already exists, reuse it
    index_ref = pt.IndexRef.of(pt_index_path + "/data.properties")

[INFO] [starting] building docstore
[INFO] [starting] opening zip file
[INFO] [finished] opening zip file [14ms]
docs_iter: 100%|████████████████████████| 57638/57638 [2.51s<0ms, 22986.49doc/s]
[INFO] [finished] docs_iter: [2.51s] [57638doc] [22972.84doc/s]
[INFO] [finished] building docstore [2.52s]


beir/fiqa documents:   0%|          | 0/57638 [24ms<?, ?it/s]

In [None]:
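tfidf = pt.BatchRetrieve(index_ref, wmodel="TF_IDF") % 50
# FiQA documents expose a 'text' field (there is no 'abstract'), so fetch 'text' and
# copy it into 'abstract', the column read by re-rankers built with text_field='abstract'.
get_text = pt.text.get_text(fiqa, 'text') >> pt.apply.abstract(lambda r: r['text'])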

In [None]:
# FiQA queries only have a 'text' field, and qrels are defined per split,
# so reuse the test split loaded earlier.
topics = test_topics
qrels = test_qrels

In [None]:

# retrieve with TF-IDF, fetch the document text, then re-rank with KNRM
pipeline = tfidf >> get_text >> knrm
pt.Experiment(
    [tfidf, pipeline],
    test_topics,
    test_qrels,  # qrels of the FiQA test split, matching test_topics
    names=['TFIDF', 'TFIDF >> KNRM'],
    # assuming FiQA relevance labels are binary, rel=2 would match no document
    eval_metrics=[AP, nDCG, nDCG@10, P@10]
)

In [None]:
pipeline.fit(
    train_topics,
    train_qrels,
    valid_topics,
    valid_qrels)

In [None]:
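pt.Experiment(
    [tfidf, pipeline],
    test_topics,
    test_qrels,  # qrels of the FiQA test split, matching test_topics
    names=['TFIDF', 'TFIDF >> KNRM (trained)'],
    # assuming FiQA relevance labels are binary, rel=2 would match no document
    eval_metrics=[AP, nDCG, nDCG@10, P@10]
)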


Can you identify any advantages compared with using Learning to Rank?