This repository contains the implementation for my BSc thesis, completed as part of the TU Delft 2024 Research Project; the accompanying paper can be found here. We explore methods to improve the ranking quality of long and complex queries across several ad-hoc retrieval tasks, using the Fast-Forward index framework. The methods explored include query reduction using large language models and re-ranking with multiple semantic models.
Install all necessary dependencies:
pip install -r requirements.txt
In order to run this code, each dataset requires both a sparse and a dense index. Sparse indexing was done with PyTerrier using the script pt_index.py. All datasets used are provided by ir_datasets and can be accessed through PyTerrier. Dense indexing was done using the Fast-Forward index framework; the scripts for all three dense encoders are available in the fast_forward_indexing directory.
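At retrieval time, the Fast-Forward framework re-ranks the sparse candidate list by interpolating sparse and dense scores, s(q, d) = α · s_sparse(q, d) + (1 − α) · s_dense(q, d). The sketch below illustrates only this interpolation step in plain Python; the function name and the toy scores are illustrative, not taken from the project's scripts:

```python
def interpolate(sparse_scores, dense_scores, alpha=0.5):
    """Fast-Forward-style interpolation: alpha * sparse + (1 - alpha) * dense.

    Only documents retrieved by the sparse stage are re-ranked; a document
    missing a dense score contributes 0.0 for the dense component.
    """
    return {
        doc_id: alpha * s + (1 - alpha) * dense_scores.get(doc_id, 0.0)
        for doc_id, s in sparse_scores.items()
    }


# Toy scores for one query (illustrative values only).
sparse = {"d1": 12.0, "d2": 9.5, "d3": 8.0}   # e.g. BM25
dense = {"d1": 0.2, "d2": 0.9, "d3": 0.7}     # e.g. dual-encoder similarity

# Re-rank the sparse candidates by the interpolated score.
reranked = sorted(interpolate(sparse, dense, alpha=0.1).items(),
                  key=lambda kv: kv[1], reverse=True)
```

Note that because sparse and dense scores live on different scales, the choice of α matters a great deal, which is why it is tuned on a development set (see scifact_alpha_tuning below).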
Note: due to their large storage size, the indexes cannot be uploaded to this repository. The indexing process is very resource-intensive and was primarily conducted at the Delft High Performance Computing Centre. As this may be a limitation for some users, the indexes are also available upon request.
The repository is organized as follows:
- fast_forward_indexing - all scripts related to dense indexing. /fast_forward_indexing/script_pt.sh contains the bash script used when indexing on the DelftBlue supercomputer.
- length_experiments - scripts that measure the retrieval quality of each individual query in a dataset and plot it against query length.
- multi_rerank - all experiments related to using multiple dense re-rankers in the Fast-Forward framework.
  - generate_scores - generates the final ranking scores before interpolation.
  - multi_rank - experiments that compare ranking performance for various numbers of dense re-rankers.
  - scifact_alpha_tuning - script that tunes the alpha values to their optimal values on the development set.
- query_reduction - all experiments related to query reduction using LLMs.
  - llama3_reduce.py - script that generates the reductions using the Meta-Llama-3-8B-Instruct model.
  - reduced_queries - directory that stores the generated reduced queries in CSV format.
  - system_prompts.txt - system prompts used for each dataset.
  - eval_reduction_* - scripts that compare ranking quality between the original and reduced queries.
- sparse_indexing - all scripts related to sparse indexing.
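To illustrate the kind of tuning scifact_alpha_tuning performs, the sketch below grid-searches the interpolation weight α on a development set and keeps the value with the best mean metric. Everything here is a toy stand-in (the reciprocal-rank metric, the query dicts, and the helper names are assumptions for illustration, not the project's actual tuning code):

```python
def interpolate(sparse, dense, alpha):
    # alpha * sparse score + (1 - alpha) * dense score per candidate document
    return {d: alpha * s + (1 - alpha) * dense.get(d, 0.0) for d, s in sparse.items()}


def reciprocal_rank(scores, relevant):
    """Toy effectiveness metric: 1 / rank of the first relevant document."""
    ranking = sorted(scores, key=scores.get, reverse=True)
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def tune_alpha(dev_queries, grid):
    """Return the alpha from `grid` with the best mean metric on the dev set."""
    best_alpha, best_score = None, -1.0
    for alpha in grid:
        mean = sum(
            reciprocal_rank(interpolate(q["sparse"], q["dense"], alpha), q["relevant"])
            for q in dev_queries
        ) / len(dev_queries)
        if mean > best_score:
            best_alpha, best_score = alpha, mean
    return best_alpha, best_score


# Toy development set (illustrative scores and judgments only).
dev = [
    {"sparse": {"d1": 10.0, "d2": 4.0}, "dense": {"d1": 0.1, "d2": 0.9}, "relevant": {"d2"}},
    {"sparse": {"d3": 6.0, "d4": 5.0}, "dense": {"d3": 0.2, "d4": 0.8}, "relevant": {"d4"}},
]
alpha, score = tune_alpha(dev, grid=[i / 10 for i in range(11)])
```

In the real experiments the metric would be computed with an IR evaluation toolkit against the dataset's qrels rather than a hand-written reciprocal rank.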