In this paper, we adopt contextual embeddings for the task of query performance prediction (QPP). The fine-tuned contextual representations estimate the performance of a query based on the association between the representation of the query and the representations of its retrieved documents. We compare our approach with the state of the art on the MS MARCO passage retrieval corpus and its three associated query sets: (1) the MS MARCO development set, (2) TREC DL 2019, and (3) TREC DL 2020. We show that our approach not only achieves significantly better prediction performance than all state-of-the-art methods but also, unlike past neural predictors, has significantly lower latency, which makes it practical to use.
We adopt two architectures, a cross-encoder network and a bi-encoder network, to address the QPP task.
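The practical difference between the two architectures is how the query and the document reach the encoder. The sketch below is a toy illustration of that difference only, not BERT-QPPcross or BERT-QPPbi: a hashed bag-of-words function (`toy_encode`, hypothetical) stands in for BERT, and the linear `weight`/`bias` head in `bi_encoder_score` is an assumed placeholder for whatever prediction head the trained models actually use.

```python
import math
import zlib
from typing import List

DIM = 64  # size of the toy embedding space (assumption for illustration)


def toy_encode(text: str) -> List[float]:
    """Hypothetical stand-in for a BERT encoder: hashed, L2-normalised bag of words."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def cross_encoder_input(query: str, doc: str) -> str:
    """A cross-encoder reads query and document jointly as one BERT sentence pair,
    so every (query, document) pair needs its own forward pass."""
    return f"[CLS] {query} [SEP] {doc} [SEP]"


def bi_encoder_score(query: str, doc: str, weight: float = 1.0, bias: float = 0.0) -> float:
    """A bi-encoder encodes query and document independently and combines them
    afterwards; here via cosine similarity fed to a hypothetical linear head."""
    q_vec = toy_encode(query)
    d_vec = toy_encode(doc)
    cosine = sum(a * b for a, b in zip(q_vec, d_vec))  # both vectors are unit length
    return weight * cosine + bias
```

In general, because the document side of a bi-encoder does not depend on the query, document vectors can be computed once and cached, whereas a cross-encoder must re-read each document together with every new query.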
To replicate our results with BERT-QPPcross and BERT-QPPbi on the MS MARCO passage collection:
- Clone this repository.
- Install the required packages listed in `requirement.txt` on Python 3.7+.
- Download the MS MARCO collection file `collection.tsv` and store it in the `collection` directory.
- If you want to predict the performance of the BM25 retrieval method on MS MARCO, skip this step. Otherwise, to evaluate any other retrieval method, you need to prepare run files similar to `bm25_first_docs_train.tsv` and `bm25_first_docs_dev.tsv`, which contain the first retrieved document for each query in the MS MARCO train and dev sets.
  - The run file of your retrieval approach should have one line per query in the following format: `QID<\t>DOCID<\t>1`
  - Then, modify the `run_file` variable in `create_train_pkl_file.py` and `create_test_pkl_file.py` so that it points to your run files for the train and dev sets of MS MARCO.
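If your retrieval method writes a standard six-column TREC run file (`qid Q0 docid rank score tag`), a minimal sketch like the one below can reduce it to the one-line-per-query `QID<\t>DOCID<\t>1` format. The function names are hypothetical helpers, not part of this repository, and the sketch assumes the first document is the one with the highest score (trec_eval likewise orders a run by score rather than by the rank column).

```python
from typing import Dict, Iterable, List, Tuple


def first_docs_from_trec_run(lines: Iterable[str]) -> List[str]:
    """Keep the highest-scoring document per query and emit QID<TAB>DOCID<TAB>1 lines."""
    best: Dict[str, Tuple[float, str]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, _rank, score = parts[:5]
        if qid not in best or float(score) > best[qid][0]:
            best[qid] = (float(score), docid)
    return [f"{qid}\t{docid}\t1" for qid, (_, docid) in best.items()]


def load_first_docs(lines: Iterable[str]) -> Dict[str, str]:
    """Parse QID<TAB>DOCID<TAB>1 lines into a {qid: first_docid} mapping."""
    first_docs: Dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue
        qid, docid, rank = line.rstrip("\r\n").split("\t")
        if rank == "1":
            first_docs.setdefault(qid, docid)
    return first_docs
```

Writing the returned lines to a `.tsv` file (one per line) gives a file that `run_file` can point to.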
- To train BERT-QPPcross, we need the query, the first retrieved document, and the query's performance. To this end, `create_train_pkl_file.py` creates a dictionary with the following attributes:
  - `train_dic[qid]["qtext"] = query_text`
  - `train_dic[qid]["performance"] = query_performance_value`
  - `train_dic[qid]["doc_text"] = document_text`

  You can train the model on your desired metric by creating the associated train pkl file. Here, we use map@20.
- Run `create_train_pkl_file.py` to save a dictionary containing the query and document text together with the associated performance. As a result, `train_map.pkl` will be saved in the `pklfiles` directory.
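The map@20 target and the dictionary layout described above can be sketched as follows. This is an illustrative reconstruction, not the code in `create_train_pkl_file.py`: the helper names, the input dictionaries (`queries`, `first_doc_text`, `rankings`, `qrels`), and normalising AP@20 by the total number of relevant documents (our understanding of trec_eval's `map_cut_20`) are all assumptions.

```python
import os
import pickle
from typing import Dict, List, Set


def average_precision_at_k(ranked: List[str], relevant: Set[str], k: int = 20) -> float:
    """AP over the top-k documents, divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, docid in enumerate(ranked[:k], start=1):
        if docid in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def build_train_dic(queries: Dict[str, str], first_doc_text: Dict[str, str],
                    rankings: Dict[str, List[str]], qrels: Dict[str, Set[str]],
                    k: int = 20) -> Dict[str, dict]:
    """Build train_dic[qid] with the qtext / performance / doc_text attributes."""
    train_dic: Dict[str, dict] = {}
    for qid, qtext in queries.items():
        if qid not in first_doc_text or qid not in rankings:
            continue  # skip queries without a retrieved document
        train_dic[qid] = {
            "qtext": qtext,
            "performance": average_precision_at_k(rankings[qid], qrels.get(qid, set()), k),
            "doc_text": first_doc_text[qid],
        }
    return train_dic


def save_train_dic(train_dic: Dict[str, dict], path: str = "pklfiles/train_map.pkl") -> None:
    """Pickle the dictionary, creating the target directory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(train_dic, f)
```

Swapping `average_precision_at_k` for another per-query metric is one way to produce the "associated train pkl file" for a different target.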
- Run `create_test_pkl_file.py` to save a dictionary containing the query and document text for the MS MARCO development set. As a result, `test_dev_map.pkl` will be saved in the `pklfiles` directory.
- Run `train_CE.py` to learn the map@20 of BM25 retrieval on the MS MARCO train set. Alternatively, you can train on your desired metric by creating the associated train pkl file. On a single 24GB RTX 3090 GPU, training took less than 2 hours. You may also change `epoch_num`, `batch_size`, and the initial pre-trained model in this file; we used `bert-base-uncased` in this experiment. The trained model will be saved in the `models` directory.
- If you do not want to train the model, you can download our BERT-QPPcross model trained on `bert-base-uncased` from here.
- Set `trained_model` in `test_CE.py` to the trained model you want to test, and run `test_CE.py`.
- The results will be saved in the `results` directory in the following format: `QID<\t>Predicted_QPP_value`
- To evaluate the results, calculate the correlation between the actual performance of each query and its predicted QPP value.
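QPP methods are commonly evaluated with correlation coefficients such as Pearson's ρ and Kendall's τ between actual and predicted per-query performance. The sketch below computes both with the standard library, assuming the actual performance is available as a `{qid: value}` dictionary (for example, map@20 values computed from qrels) and the predictions follow the `QID<\t>Predicted_QPP_value` format; the helper names are hypothetical. If SciPy is installed, `scipy.stats.pearsonr` and `scipy.stats.kendalltau` compute the same coefficients.

```python
import math
from typing import Dict, Iterable, List


def parse_predictions(lines: Iterable[str]) -> Dict[str, float]:
    """Read QID<TAB>Predicted_QPP_value lines into {qid: prediction}."""
    preds: Dict[str, float] = {}
    for line in lines:
        if not line.strip():
            continue
        qid, value = line.rstrip("\r\n").split("\t")
        preds[qid] = float(value)
    return preds


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's rho (raises ZeroDivisionError if either list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau(x: List[float], y: List[float]) -> float:
    """Kendall's tau-b, which accounts for ties (O(n^2), fine for a few thousand queries)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from numerator and denominator
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only) *
                      (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom


def evaluate(actual: Dict[str, float], predicted: Dict[str, float]) -> Dict[str, float]:
    """Correlate actual and predicted values over the queries present in both."""
    qids = sorted(set(actual) & set(predicted))
    a = [actual[q] for q in qids]
    p = [predicted[q] for q in qids]
    return {"pearson": pearson(a, p), "kendall": kendall_tau(a, p)}
```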
- To train BERT-QPPbi, run `train_bi.py` to learn the map@20 of BM25 retrieval on the MS MARCO train set. On a single 24GB RTX 3090 GPU, training took about 1 hour. You may also change `epoch_num`, `batch_size`, and the initial pre-trained model in this file; we used `bert-base-uncased` in this experiment. The trained model will be saved in the `models` directory.
- If you do not want to train the model, you can download our BERT-QPPbi model trained on `bert-base-uncased` from here.
- Set `trained_model` in `test_bi.py` to the trained model you want to test, and run `test_bi.py`.
- The results will be saved in the `results` directory in the following format: `QID<\t>Predicted_QPP_value`
- To evaluate the results, calculate the correlation between the actual performance of each query and its predicted QPP value.