Original implementation of the paper "Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate" (Findings of EMNLP 2023) by Boshi Wang, Xiang Yue, and Huan Sun.
Put your OpenAI API key in a file called "api_key.txt".
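As a rough illustration of how such a key file can be consumed, the sketch below reads the key and strips the trailing newline. The helper name load_api_key is hypothetical, and the default location (the current working directory) is an assumption; the notebooks in this repository may read the file differently.

```python
from pathlib import Path


def load_api_key(path="api_key.txt"):
    """Return the API key stored in a plain-text file, without surrounding whitespace.

    Hypothetical helper: the default path assumes the file sits in the
    current working directory, which this README does not specify.
    """
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"Expected an API key file at {key_file.resolve()}")
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{key_file} is empty")
    return key
```

The returned string can then be handed to whichever OpenAI client the notebook uses. Keep api_key.txt out of version control so the key is not committed by accident.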
.
├── grade-school-math/ # GSM8K (https://arxiv.org/abs/2110.14168)
├── prontoqa/ # PrOntoQA (https://arxiv.org/abs/2210.01240)
├── commonsense/ # commonsense reasoning, including StrategyQA, CommonsenseQA-2.0, and CREAK
└── BBH/ # BIG-Bench Hard (https://arxiv.org/abs/2210.09261)
- main.ipynb in each sub-directory contains the code and cached evaluation results.
- Some randomly sampled failure examples are also included.
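To check that a local checkout matches the layout described above, a small sketch like the following can list the notebook for each benchmark. The function name find_notebooks is hypothetical; the directory names are taken from the tree above, and the repository root is assumed to be the current directory unless stated otherwise.

```python
from pathlib import Path

# Sub-directory names taken from the directory tree in this README.
BENCHMARK_DIRS = ["grade-school-math", "prontoqa", "commonsense", "BBH"]


def find_notebooks(repo_root="."):
    """Map each benchmark directory to its main.ipynb path, or None if it is missing."""
    root = Path(repo_root)
    found = {}
    for name in BENCHMARK_DIRS:
        notebook = root / name / "main.ipynb"
        found[name] = notebook if notebook.is_file() else None
    return found
```

Calling find_notebooks() from the repository root returns a dictionary keyed by directory name; a None value indicates that the corresponding notebook was not found at the expected path.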
@inproceedings{wang2023can,
title={Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate},
author={Wang, Boshi and Yue, Xiang and Sun, Huan},
booktitle={Findings of EMNLP},
year={2023}
}