This repository contains the code and data for the paper "Negative Sampling Techniques for Dense Passage Retrieval in a Multilingual Setting" submitted to the Reproducibility Track of SIGIR 2024.
To install the required packages, run the following commands:
conda create -n ns python=3.10 pandas tqdm
conda activate ns
conda install pytorch=2.0.1 pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -c pytorch -c nvidia faiss-gpu=1.7.4 mkl=2021 blas=1.0=mkl
pip install transformers
pip install datasets
pip install simpletransformers
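Once the environment is set up, a quick import check can confirm that everything installed correctly. The following is a minimal sketch, not part of the repository: the helper name check_modules is hypothetical, and the module list assumes that faiss-gpu is imported as faiss and pytorch as torch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repository): report which
# Python modules can be imported in the currently active environment.
check_modules() {
  local mod missing=0
  for mod in "$@"; do
    if python -c "import ${mod}" >/dev/null 2>&1; then
      echo "ok       ${mod}"
    else
      echo "MISSING  ${mod}"
      missing=1
    fi
  done
  # Non-zero exit status if at least one module could not be imported.
  return "${missing}"
}

# Module names are the import names of the packages installed above
# (assumption: faiss-gpu -> faiss, pytorch -> torch).
check_modules torch faiss transformers datasets simpletransformers pandas tqdm \
  || echo "Some packages are missing; re-run the install commands above."
```

If all modules are reported as ok, GPU visibility can additionally be checked with python -c "import torch; print(torch.cuda.is_available())".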
Run download_data.sh to download the data.
bash download_data.sh
To train the models, run the following command:
bash train.sh
You can edit the train.sh file to select the models you want to train.
Hyperparameters can be changed in the training scripts; by default, they are set to the values used in the paper.
To evaluate the models, run the following command:
bash evaluate.sh
You can edit the evaluate.sh file to select the models you want to evaluate.
The results will be saved in the results directory.
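The three steps (download, train, evaluate) can also be chained so that a failure in one step stops the run before the next begins. The sketch below is an assumption about how one might do this; run_steps is a hypothetical helper and is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repository): run each script
# in order and stop at the first one that is missing or fails.
run_steps() {
  local script
  for script in "$@"; do
    if [ ! -f "${script}" ]; then
      echo "Missing script: ${script}" >&2
      return 1
    fi
    echo "==> ${script}"
    bash "${script}" || { echo "Failed: ${script}" >&2; return 1; }
  done
}

# Usage, from the repository root with the ns environment active:
#   run_steps download_data.sh train.sh evaluate.sh
```

Stopping on the first failure avoids training on a partial download or evaluating models that were never trained.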