Source Bias

The official repository of the KDD 2024 paper "Neural Retrievers are Biased Towards LLM-Generated Content". [arXiv]

🌟 New Release!🌟 Check out our latest project, "Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration", on GitHub. This extensive benchmark includes 16 datasets, more than ten popular retrieval models, and easy-to-use evaluation tools. Dive into the repository for more details!

Citation

If you find our code or work useful for your research, please cite our paper.

@article{dai2024neural,
  title={Neural Retrievers are Biased Towards LLM-Generated Content},
  author={Dai, Sunhao and Zhou, Yuqi and Pang, Liang and Liu, Weihao and Hu, Xiaolin and Liu, Yong and Zhang, Xiao and Wang, Gang and Xu, Jun},
  journal={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year={2024}
}

Quick Start

  • For details of the datasets, please see datasets/README.md

  • For details of the evaluation code, please see the evaluate/ folder

  • For details of the dataloader, please see beir/datasets/data_loader.py (a minimal loading sketch follows this list)
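
A minimal loading sketch, assuming the corpora follow the standard BEIR layout (the datasets/scifact path and the test split below are illustrative placeholders, not necessarily this repository's exact configuration):

from beir.datasets.data_loader import GenericDataLoader

# Load a corpus, its queries, and the relevance judgments (qrels).
# "datasets/scifact" is a placeholder path; point it at the corpus you want to test.
corpus, queries, qrels = GenericDataLoader(data_folder="datasets/scifact").load(split="test")

print(len(corpus), "documents,", len(queries), "queries")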

File Structure

.
├── beir  # * evaluation code adapted from BEIR
│   ├── datasets # * code for the dataloader
│   ├── reranking # * code for reranking models
│   └── retrieval # * code for lexical and dense retrieval models
├── datasets
│   ├── 0.2 # * corpora generated by LLMs with temperature 0.2
│   ├── 1.0 # * corpora generated by LLMs with temperature 1.0
│   └── qrels # * relevance judgments for queries
└── evaluate  # * code for evaluating the different retrieval models

Quick Start Example with Contriever

# test on the human-written corpus
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target human --candidate_lm human

# test on the llama-2-7b-chat corpus
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target llama-2-7b-chat --candidate_lm llama-2-7b-chat

# evaluate metrics targeting human-written documents on the mixed corpora
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target human --candidate_lm human llama-2-7b-chat

# evaluate metrics targeting LLM-generated documents on the mixed corpora
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target llama-2-7b-chat --candidate_lm human llama-2-7b-chat
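
These scripts presumably follow the standard BEIR evaluation flow (load, retrieve, score). A minimal sketch of that flow is shown below; the dataset path and encoder checkpoint are illustrative placeholders, not necessarily what evaluate_contriever.py uses internally:

from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Load corpus, queries, and relevance judgments (placeholder path).
corpus, queries, qrels = GenericDataLoader(data_folder="datasets/scifact").load(split="test")

# Wrap a dense encoder; the checkpoint name is an illustrative Sentence Transformers model,
# not necessarily the Contriever checkpoint used in this repository.
model = DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=64)
retriever = EvaluateRetrieval(model, score_function="dot")

# Retrieve top documents for every query and compute the standard BEIR metrics.
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)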

Dependencies

This repository, like our Cocktail benchmark, is built on top of BEIR and Sentence Transformers.

It has the following dependency requirements:

python==3.10.13
pandas==2.1.4
scikit-learn==1.3.2
evaluate==0.4.1
sentence-transformers==2.2.2
spacy==3.7.2
tiktoken==0.5.2
pytrec-eval==0.5

The required packages can be installed via pip install -r requirements.txt.
