Official code for the 2023 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2023) paper: Zero-shot Query Reformulation for Conversational Search
If you find this repo to be helpful, please consider citing our paper:
```
@inproceedings{dayu-2023-zeqr,
  author = {Yang, Dayu and Zhang, Yue and Fang, Hui},
  booktitle = {ICTIR 2023: The 13th International Conference on the Theory of Information Retrieval},
  month = {July},
  publisher = {ACM},
  title = {Zero-shot Query Reformulation for Conversational Search},
  year = {2023}}
```
- Install dependent packages:

```
conda env create -f zeqr_env.yml
conda activate zeqr
```

- Create indices. Since the index files are enormous, we apologize that we cannot upload them to GitHub.
  - For download and preprocessing instructions, please refer to TREC CAsT. The preprocessing step is handled by `trec-cast-tool`, which basically removes duplicate documents and converts the original files to `.jsonl` files. Please `git checkout` the appropriate version of `trec-cast-tool` for each CAsT year. (An illustrative sketch of the conversion idea appears at the end of this section.)
  - After successfully downloading and preprocessing the data, the data folder for each CAsT year should look like:
    - 2019 (~19GB)
      - car.jsonl
      - msmarco_passage_v1.jsonl
      - wp.jsonl
    - 2020 (~16GB)
      - car.jsonl
      - msmarco_passage_v1.jsonl
    - 2021 (~28GB)
      - kilt.jsonl
      - msmarco_document.jsonl
      - wp.jsonl
    - 2022 (~120GB)
      - kilt.jsonl
      - msmarco_passage_v2.jsonl
      - wp.jsonl
  - After the `.jsonl` files are obtained, index them with Pyserini. The indexing configuration really depends on the storage and computational power of your machine. A sample indexing script is provided in `sample/index/indexing.sh`. (You must create 20 index shards (0-19) for now; support for an arbitrary number of shards will come later. A sharded-encoding sketch also appears at the end of this section.)
  - After the index is successfully created, the folder under `index/` should look like this (assuming you built the index with the TCT-ColBERT dense indexer):
    - index_tctcolbert
      - dense_2019
        - cast2019_den0
        - cast2019_den1
        - ...
        - cast2019_den18
        - cast2019_den19
        - docid
        - index
      - dense_2020
      - dense_2021
      - dense_2022
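For intuition, the duplicate-removal-and-conversion step that `trec-cast-tool` performs looks conceptually like the sketch below. This is illustrative only: the `(docid, text)` input iterable, the field names, and the hash-based deduplication are our assumptions, not the tool's actual code.

```python
import hashlib
import json

def to_deduped_jsonl(raw_docs, out_path):
    """Write one JSON object per line, skipping exact-duplicate bodies.

    raw_docs: an iterable of (docid, text) pairs from the original collection
    (assumed shape; the real tool parses the raw TREC CAsT source files).
    """
    seen = set()
    with open(out_path, "w", encoding="utf-8") as out:
        for docid, text in raw_docs:
            digest = hashlib.md5(text.encode("utf-8")).hexdigest()
            if digest in seen:  # drop documents whose body we already wrote
                continue
            seen.add(digest)
            out.write(json.dumps({"id": docid, "contents": text}) + "\n")
```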
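And a rough sketch of how the 20 dense shards could be produced with `pyserini.encode`; treat `sample/index/indexing.sh` as authoritative, since the corpus path, encoder checkpoint, and batch settings below are placeholders:

```python
import subprocess

NUM_SHARDS = 20  # the pipeline currently expects exactly 20 shards (0-19)

for shard_id in range(NUM_SHARDS):
    # Encode one shard of the 2019 corpus with a TCT-ColBERT encoder and
    # store the embeddings as a FAISS index (flags follow pyserini.encode).
    subprocess.run([
        "python", "-m", "pyserini.encode",
        "input", "--corpus", "data/2019",  # folder of .jsonl files (assumed path)
        "--shard-id", str(shard_id), "--shard-num", str(NUM_SHARDS),
        "output", "--embeddings", f"index_tctcolbert/dense_2019/cast2019_den{shard_id}",
        "--to-faiss",
        "encoder", "--encoder", "castorini/tct_colbert-v2-hnp-msmarco",
        "--batch", "32", "--fp16",
    ], check=True)
```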
- Fine-tune BERT for MRC
  - Option 1: train your own BERT-for-MRC checkpoint by running the script `MRC/train.py`.
  - Option 2: use our checkpoint uploaded to Hugging Face. (A minimal inference sketch follows this list.)
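Whichever checkpoint you use, applying it is plain extractive QA. Here is a minimal sketch with the Hugging Face `question-answering` pipeline; the public SQuAD checkpoint and the toy dialogue below are placeholders, not our released model:

```python
from transformers import pipeline

# Any SQuAD-style extractive QA model works here; this checkpoint is a placeholder.
qa = pipeline("question-answering",
              model="bert-large-uncased-whole-word-masking-finetuned-squad")

# Resolve an ambiguous term in the current turn by reading the conversation
# history as the comprehension context (toy example).
history = ("User: Tell me about the Bronze Age. "
           "System: The Bronze Age was a period marked by the use of bronze.")
answer = qa(question="What does 'it' refer to in 'When did it end?'", context=history)
print(answer["answer"])  # substitute the extracted span back into the raw query
```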
- Retrieval
  - Option 1:
    - Run `ranking_<CAsT_year_to_rank>.sh` to create sub-ranking files. Each sub-ranking file is a `.pkl` file named `dense[shard_of_index]_[qid].pkl`. (The ranking is split into many small `.pkl` files for acceleration.)
    - Run `eval_retrieval_results.py`. It takes two arguments:
      - `name` is the name of your experiment; it should match the `run_name` you set in the `.sh` script.
      - `year` is the CAsT dataset year you are using: `2019`, `2020`, `2021`, or `2022`.
    - This script automatically combines all the small `.pkl` files and outputs a full ranking file under `retrieval_results/<name_of_your_experiment>/`, named `<year>.run`. (A sketch of this combining step is shown after this list.)
    - After the run file is obtained, you can use `eval_sample_results.ipynb` to compute any additional metrics you want.
  - Option 2: skip the ranking step and evaluate the provided `.run` file in `sample/ranking_results/`. See the Evaluation step for details.
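For reference, the combining step performed by `eval_retrieval_results.py` is conceptually like the sketch below. The `{docid: score}` layout of each `.pkl`, the filename parsing, and the TREC run columns are assumptions about the stored format:

```python
import glob
import os
import pickle

def combine_sub_rankings(pkl_dir, out_run, run_tag="zeqr", k=1000):
    """Merge per-(shard, qid) score dicts into one TREC-format .run file."""
    scores = {}  # qid -> {docid: best score seen across shards}
    for path in glob.glob(os.path.join(pkl_dir, "dense*_*.pkl")):
        fname = os.path.basename(path)                # e.g. "dense3_31_1.pkl"
        qid = fname.split("_", 1)[1][:-len(".pkl")]   # assumed: qid follows the first "_"
        with open(path, "rb") as f:
            shard_scores = pickle.load(f)             # assumed: {docid: score}
        per_q = scores.setdefault(qid, {})
        for docid, score in shard_scores.items():
            per_q[docid] = max(score, per_q.get(docid, float("-inf")))
    with open(out_run, "w") as out:
        for qid, docs in scores.items():
            ranked = sorted(docs.items(), key=lambda x: -x[1])[:k]
            for rank, (docid, score) in enumerate(ranked, start=1):
                out.write(f"{qid} Q0 {docid} {rank} {score} {run_tag}\n")
```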
- Evaluation
  - Option 1: run `eval_ranking_results.py --year <year> --run_name <your_experiment_name>`.
    - In case you want to compute additional metrics, you can use `eval_sample_results.ipynb`.
    - For CAsT-19, due to some bugs in `trec-cast-tool`, the processed documents (the retrieval candidate pool) contain some duplicates. You need to use the official `dedup.py` from `trec-cast-tool` to filter out duplicated docids before using `trec_eval` for evaluation.
    - For CAsT-21, because the run files submitted by many teams contain only docids (no passage ids), the organizers decided to evaluate on docids only. `trec-cast-tool` provides a `docize.py` script to transform passage ids into docids. You need to run it before evaluation.
  - Option 2: directly use the `.run` files provided in `sample/ranking_results/` to reproduce the experimental results. (A standalone scoring sketch with `pytrec_eval` follows this list.)
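If you want to score a `.run` file outside our scripts, here is a minimal sketch with `pytrec_eval`; the qrels path, run path, and metric choice are placeholders:

```python
import pytrec_eval

# Load TREC-format qrels and run files (paths are placeholders).
with open("qrels/2019.qrel") as f:
    qrels = pytrec_eval.parse_qrel(f)
with open("retrieval_results/my_experiment/2019.run") as f:
    run = pytrec_eval.parse_run(f)

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"ndcg_cut.3", "recip_rank"})
per_query = evaluator.evaluate(run)  # qid -> {metric: value}
mean_ndcg3 = sum(q["ndcg_cut_3"] for q in per_query.values()) / len(per_query)
print(f"NDCG@3 = {mean_ndcg3:.4f}")
```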
Please contact me via email (dayu at udel dot edu) if you have any questions. Thanks!
Our framework is designed for adaptability, enabling easy integration with various Machine Reading Comprehension (MRC) and Question Answering (QA) models, including large ones like ChatGPT. While we have implemented MRC, the goal is to let researchers use the system fully on their own, without needing external resources: an MRC model can be trained on standard datasets like SQuAD on a consumer-level GPU in about an hour.
We've included results from experiments using ChatGPT in the Appendix, showing performance nearing human capabilities.
Please use `ChatGPT/eval_step_multicompare.ipynb` to see the experimental results when the MRC model is replaced with ChatGPT. A sketch of such a replacement is below.
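As a sketch of what swapping the MRC model for a chat model can look like; the prompt wording, model name, and helper below are illustrative, not the exact ones used in the notebook:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def reformulate(history: list[str], question: str) -> str:
    """Ask a chat model to rewrite the current turn into a self-contained query."""
    context = "\n".join(history)
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; any chat model works
        messages=[{
            "role": "user",
            "content": (f"Conversation so far:\n{context}\n\n"
                        f"Rewrite this question so it is fully self-contained "
                        f"without the conversation: {question}"),
        }],
    )
    return resp.choices[0].message.content

print(reformulate(["Tell me about the Bronze Age."], "When did it end?"))
```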