DylanJoo/crossencoder-readqg

Crossencoder with ReadQG-augmented data

This repository only covers fine-tuning a cross-encoder on the ReadQG dataset. Details of ReadQG (the query generator) are in its own repo: ReadQG.


Preliminaries

Requirements

pip install pyserini
pip install transformers
pip install datasets
pip install sentence-transformers

BEIR Dataset

  • Corpus/queries. We download the datasets from the BEIR repository.

  • First-stage retrieval results (runs). You can reproduce them with Pyserini's 2CR (two-click reproductions) or download them from our Huggingface dataset. Preprocessed qrels are also provided there; they are identical to those in this repo.
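As far as we know, the first-stage runs and qrels are plain TREC-format text files: Pyserini writes runs as `qid Q0 docid rank score tag`, and trec_eval reads qrels as `qid iter docid relevance`. The minimal Python sketch below parses both. The function names are hypothetical helpers, not part of this repo, and we have not verified that every file in the Huggingface dataset follows exactly this layout.

```python
from collections import defaultdict


def read_trec_run(lines):
    """Parse TREC-format run lines: 'qid Q0 docid rank score tag'.

    Returns {qid: [(docid, score), ...]} with each list sorted by
    descending score.
    """
    run = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        qid, _q0, docid, _rank, score, _tag = line.split()
        run[qid].append((docid, float(score)))
    for qid in run:
        run[qid].sort(key=lambda pair: pair[1], reverse=True)
    return dict(run)


def read_trec_qrels(lines):
    """Parse TREC-format qrels lines: 'qid iter docid relevance'.

    Returns {qid: {docid: relevance}}.
    """
    qrels = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        qid, _iter, docid, rel = line.split()
        qrels[qid][docid] = int(rel)
    return dict(qrels)
```

Both functions accept any iterable of lines, so an open file handle can be passed directly.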

Generated ReadQG Data (for training the cross-encoder)

For each dataset in BEIR, we generated 10 queries per document. We uploaded the dataset to Huggingface. It includes multiple versions of the data, covering (1) different decoding methods (greedy, top10, beam3) and (2) different generator models.
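Conceptually, each generated query can serve as a positive training example for the document it was generated from. The sketch below shows one way to turn generated queries into `(query, passage, label)` triples for a cross-encoder. It is illustrative only: the input layout (`doc_id -> (doc_text, queries)`) and the helper `build_training_pairs` are hypothetical, and sampling negatives uniformly at random is our assumption; `train_oodrerank.sh` may build negatives differently (for example, from BM25 results).

```python
import random


def build_training_pairs(generated, num_negatives=1, seed=0):
    """Turn generated queries into (query, passage, label) triples.

    generated: hypothetical layout {doc_id: (doc_text, [query, ...])}.
    Each query is paired with its source document as a positive (1.0)
    and with `num_negatives` randomly sampled other documents as
    negatives (0.0). Random negatives are an assumption of this sketch.
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    doc_ids = list(generated)
    pairs = []
    for doc_id, (doc_text, queries) in generated.items():
        others = [d for d in doc_ids if d != doc_id]
        k = min(num_negatives, len(others))
        for query in queries:
            pairs.append((query, doc_text, 1.0))  # positive: source document
            for neg_id in rng.sample(others, k):
                pairs.append((query, generated[neg_id][0], 0.0))  # negative
    return pairs
```

Random negatives are the simplest choice; swapping in retrieved (hard) negatives only requires changing how `others` is sampled.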

Please check out the ReadQG repo for more details on the generator.

Evaluation tools

We use the official trec_eval tool for evaluation. See NIST's TREC page.
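trec_eval remains the reference for reported numbers (for example, `trec_eval -m ndcg_cut.10 <qrels> <run>`; BEIR results are commonly reported as nDCG@10). For quick sanity checks, the sketch below computes nDCG@k for one query in plain Python. It uses linear gains (gain equals the relevance grade) and a `1/log2(rank + 1)` discount, which to our understanding matches trec_eval's `ndcg_cut` measure; tie-breaking between equal scores and other edge cases may still differ, so treat its output as approximate.

```python
import math


def ndcg_at_k(ranked_docids, judgments, k=10):
    """nDCG@k for a single query.

    ranked_docids: document ids in ranked order (best first)
    judgments:     {docid: relevance grade} from the qrels for this query
    Gains are the (non-negative) relevance grades; the document at 1-based
    rank r is discounted by 1 / log2(r + 1). Unjudged documents count as 0.
    """
    def dcg(gains):
        # enumerate is 0-based, so rank i + 1 gets discount log2(i + 2)
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    gains = [max(judgments.get(d, 0), 0) for d in ranked_docids]
    ideal = sorted((g for g in judgments.values() if g > 0), reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```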

Training

Replace the data/model paths in the scripts with your own. Argument descriptions will be added soon.

bash train_oodrerank.sh
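For orientation only: a cross-encoder with a single output logit, trained on `(query, passage, label)` pairs, is typically optimized with binary cross-entropy on that logit. To our understanding, this is also the default of the legacy `CrossEncoder.fit` in sentence-transformers when `num_labels=1`. We have not checked which loss `train_oodrerank.sh` uses. The numerically stable per-pair form is sketched below in plain Python; the function names are hypothetical.

```python
import math


def bce_with_logits(logit, label):
    """Binary cross-entropy on a raw logit z with target y in {0, 1}.

    Equals -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))], written as
    max(z, 0) - z*y + log(1 + exp(-|z|)) to avoid overflow for large |z|.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits, labels):
    """Average the per-pair loss over a batch of (query, passage) logits."""
    return sum(bce_with_logits(z, y) for z, y in zip(logits, labels)) / len(logits)
```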

Reranking

Replace the data/model paths in the scripts with your own. Argument descriptions will be added soon.

bash rerank_oodrerank.sh
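The reranking step can be sketched as follows: take the top candidates of a first-stage run, score each `(query, passage)` pair with the fine-tuned model, re-sort by that score, and write a new TREC-format run for trec_eval. The function `rerank`, its `depth=100` default, and the `rerank` run tag are hypothetical choices for illustration, not the interface of `rerank_oodrerank.sh`.

```python
def rerank(run, queries, corpus, score_fn, depth=100, tag="rerank"):
    """Rerank the top-`depth` candidates of a first-stage run.

    run:      {qid: [docid, ...]} in first-stage rank order
    queries:  {qid: query_text}
    corpus:   {docid: passage_text}
    score_fn: callable taking a list of (query, passage) pairs and
              returning one relevance score per pair
    Returns TREC-format lines 'qid Q0 docid rank score tag'.
    """
    lines = []
    for qid, docids in run.items():
        candidates = docids[:depth]  # only the top-`depth` are rescored
        pairs = [(queries[qid], corpus[d]) for d in candidates]
        scores = [float(s) for s in score_fn(pairs)]
        reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
        for rank, (docid, score) in enumerate(reranked, start=1):
            lines.append(f"{qid} Q0 {docid} {rank} {score:.6f} {tag}")
    return lines


# Hypothetical usage with a fine-tuned checkpoint (the path is a placeholder):
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("path/to/finetuned-crossencoder")
#   lines = rerank(run, queries, corpus, model.predict)
```

In this sketch, candidates below `depth` are dropped rather than appended after the reranked list; for a full-length run, append the remaining first-stage documents with lower scores.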

About

Training code for cross-encoders using ReadQG-BEIR synthetic data.
