RareBench is a pioneering benchmark designed to systematically evaluate the capabilities of LLMs on four critical dimensions within the realm of rare diseases. Meanwhile, we have compiled the largest open-source dataset on rare disease patients, establishing a benchmark for future studies in this domain. To facilitate differential diagnosis of rare diseases, we develop a dynamic few-shot prompt methodology, leveraging a comprehensive rare disease knowledge graph synthesized from multiple knowledge bases, significantly enhancing LLMs' diagnostic performance. Moreover, we present an exhaustive comparative study of GPT-4's diagnostic capabilities against those of specialist physicians. Our experimental findings underscore the promising potential of integrating LLMs into the clinical diagnostic process for rare diseases.
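The dynamic few-shot idea can be sketched in a few lines. The snippet below is only an illustration of the general approach, not the RareBench implementation itself: it assumes each patient case is represented as a set of HPO phenotype terms, retrieves the most similar solved cases by plain Jaccard similarity (standing in for the knowledge-graph-based retrieval described above), and assembles them as in-context examples. All function names, field names, and toy cases here are hypothetical.

```python
# Illustrative sketch of dynamic few-shot prompt construction (not the official code).
# Jaccard similarity over HPO term sets stands in for the knowledge-graph retrieval.

def jaccard(a, b):
    """Similarity between two sets of HPO phenotype terms."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def build_dynamic_fewshot_prompt(query_phenotypes, case_pool, k=3):
    """Select the k most similar solved cases and format them as few-shot examples."""
    ranked = sorted(case_pool,
                    key=lambda c: jaccard(query_phenotypes, c["phenotypes"]),
                    reverse=True)
    lines = ["You are a rare disease specialist. Given the phenotypes, "
             "list the most likely diagnoses."]
    for case in ranked[:k]:
        lines.append(f"Phenotypes: {', '.join(case['phenotypes'])}\n"
                     f"Diagnosis: {case['diagnosis']}")
    lines.append(f"Phenotypes: {', '.join(query_phenotypes)}\nDiagnosis:")
    return "\n\n".join(lines)

# Hypothetical usage with toy HPO terms and diagnoses.
pool = [
    {"phenotypes": ["HP:0001250", "HP:0001263"], "diagnosis": "Dravet syndrome"},
    {"phenotypes": ["HP:0002240", "HP:0001433"], "diagnosis": "Gaucher disease"},
]
print(build_dynamic_fewshot_prompt(["HP:0001250", "HP:0002376"], pool, k=1))
```

In the full pipeline, the retrieved examples would come from the rare disease knowledge graph rather than a simple set-overlap measure, and the completed prompt would be sent to the LLM under evaluation.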
```python
from datasets import load_dataset

# The five RareBench evaluation sets; each is loaded as a Hugging Face dataset.
datasets = ["RAMEDIS", "MME", "HMS", "LIRICAL", "PUMCH_ADM"]
for dataset in datasets:
    data = load_dataset('chenxz/RareBench', dataset, split='test')
    print(data)
```
Put your own OpenAI API key in the `llm_utils/gpt_key.txt` file.
Put your own Gemini API key in the `llm_utils/gemini_key.txt` file.
Put your own ZhipuAI API key in the `llm_utils/glm_key.txt` file.
Replace the content of the `mapping/local_llm_path.json` file with the path to the LLM on your local machine.
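The exact schema of `mapping/local_llm_path.json` is not shown here; as an assumption, it likely maps a model name to a local checkpoint directory, along the lines of the hypothetical example below.

```json
{
  "your-model-name": "/path/to/your/local/model"
}
```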
- Some of the datasets in RareBench are based on the work of previous researchers, including RAMEDIS, MME, LIRICAL, and PhenoBrain.
@article{chen2024rarebench,
title={RareBench: Can LLMs Serve as Rare Diseases Specialists?},
author={Chen, Xuanzhong and Mao, Xiaohao and Guo, Qihan and Wang, Lun and Zhang, Shuyang and Chen, Ting},
journal={arXiv preprint arXiv:2402.06341},
year={2024}
}