This repository covers only fine-tuning a cross-encoder on the ReadQG dataset. Details of ReadQG (the generator) are in its own repo: ReadQG.
pip install pyserini
pip install transformers
pip install datasets
pip install sentence-transformers
- Corpus/queries: we download the datasets from the BEIR repository.
- First-stage retrieval results (runs): you can reproduce them via Pyserini's 2CR, or download them from our Hugging Face dataset. Preprocessed qrels are also available there; they are the same as in this repo.
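To make the inputs concrete, here is a minimal sketch of reading a run and a qrels file. It assumes both follow the standard TREC text formats (`qid Q0 docid rank score tag` for runs, `qid 0 docid relevance` for qrels); the sample lines and the function names `parse_run` and `parse_qrels` are illustrative and not part of this repo.

```python
from collections import defaultdict


def parse_run(lines):
    """Parse TREC run lines of the form 'qid Q0 docid rank score tag'."""
    run = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _, docid, _, score, _ = parts
        run[qid].append((docid, float(score)))
    # Order each query's candidates by descending retrieval score.
    return {qid: sorted(docs, key=lambda x: -x[1]) for qid, docs in run.items()}


def parse_qrels(lines):
    """Parse TREC qrels lines of the form 'qid 0 docid relevance'."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _, docid, rel = parts
        qrels[qid][docid] = int(rel)
    return dict(qrels)


# Made-up sample data, for illustration only.
sample_run = [
    "q1 Q0 d3 1 12.5 bm25",
    "q1 Q0 d7 2 10.1 bm25",
    "q2 Q0 d1 1 8.0 bm25",
]
sample_qrels = ["q1 0 d7 1", "q2 0 d1 2"]

run = parse_run(sample_run)
qrels = parse_qrels(sample_qrels)
print(run["q1"])
print(qrels["q2"])
```

For real files, pass an open file handle instead of the sample lists, for example `parse_run(open(path))`, where `path` is wherever you stored the run.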
For each dataset in BEIR, we've constructed 10 queries for each document.
We uploaded the dataset to Hugging Face.
The dataset includes multiple versions of the data, covering
(1) different decoding methods (greedy, top10, beam3), and
(2) different generator models.
Please check out the ReadQG repo for more details on the generator.
We use the official trec_eval for evaluation.
See NIST's TREC page.
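As a quick sanity check alongside trec_eval, the sketch below computes nDCG@k in plain Python. It assumes nDCG@10, the metric commonly reported on BEIR (this README does not name the metric), and uses linear gains with a log2 rank discount. It is not a replacement for trec_eval, and results may differ slightly, for instance in how tied scores are ordered. The function names are illustrative.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank i (1-based) is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_docids, rels, k=10):
    """nDCG@k for one query.

    ranked_docids: doc ids in ranked order (best first).
    rels: dict mapping doc id to graded relevance; unjudged docs count as 0.
    """
    gains = [max(rels.get(d, 0), 0) for d in ranked_docids[:k]]
    # Ideal ranking: all judged-relevant docs sorted by relevance.
    ideal = sorted((r for r in rels.values() if r > 0), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up judgments, for illustration only.
rels = {"d1": 2, "d2": 1}
perfect = ndcg_at_k(["d1", "d2", "d3"], rels)
swapped = ndcg_at_k(["d3", "d1"], {"d1": 1})
print(perfect, swapped)
```

Averaging `ndcg_at_k` over all queries in the qrels gives the dataset-level score to compare against the trec_eval output.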
Replace the data and model paths in the script before running. Descriptions of the arguments will be added soon.
bash train_oodrerank.sh
Replace the data and model paths in the script before running. Descriptions of the arguments will be added soon.
bash rerank_oodrerank.sh
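Conceptually, reranking re-scores the first-stage candidates for each query with the cross-encoder and sorts them by the new score. The sketch below illustrates that step only; it is not the code in `rerank_oodrerank.sh`. The function `rerank`, the `top_k=100` cutoff, and the toy scorer are assumptions made for illustration.

```python
def rerank(query, candidates, score_fn, top_k=100):
    """Rerank first-stage candidates for a single query.

    candidates: list of (docid, text) in first-stage order.
    score_fn: callable mapping a list of (query, text) pairs to a list of scores.
    top_k: number of first-stage candidates to re-score (assumed value).
    """
    pool = candidates[:top_k]
    pairs = [(query, text) for _, text in pool]
    scores = score_fn(pairs)
    # Sort by descending score; Python's sort is stable, so ties keep first-stage order.
    ranked = sorted(zip(pool, scores), key=lambda x: -x[1])
    return [(docid, float(score)) for (docid, _), score in ranked]


def toy_score(pairs):
    """Stand-in scorer: counts query tokens that appear in the document text."""
    return [
        sum(tok in doc.lower().split() for tok in query.lower().split())
        for query, doc in pairs
    ]


# Made-up candidates, for illustration only.
candidates = [
    ("d1", "cats sleep a lot"),
    ("d2", "neural reranking with cross encoder"),
    ("d3", "cross encoder"),
]
result = rerank("cross encoder reranking", candidates, toy_score)
print(result)
```

With sentence-transformers installed, `CrossEncoder(model_path).predict` accepts a list of (query, passage) pairs and can be passed as `score_fn`; `model_path` here is a placeholder for your fine-tuned checkpoint.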