Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate

Original implementation of the paper "Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate", published in Findings of EMNLP 2023, by Boshi Wang, Xiang Yue, and Huan Sun.

Setup

Put your OpenAI API key in a file called "api_key.txt".
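
As a minimal sketch of how a notebook might read that file (the helper name `load_api_key` and the assumption that `api_key.txt` sits in the current working directory are illustrative, not taken from the repo's code):

```python
from pathlib import Path


def load_api_key(path="api_key.txt"):
    """Read an API key from a plain-text file (hypothetical helper).

    Assumes the file contains only the key, optionally followed by
    whitespace or a trailing newline.
    """
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; expected an OpenAI API key")
    return key
```

The returned string can then be handed to whichever OpenAI client the notebook uses. Keeping the key in a separate file, rather than in the notebook itself, makes it easier to avoid committing it by accident.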

Repo Tour

.
├── grade-school-math/             # GSM8K (https://arxiv.org/abs/2110.14168)
├── prontoqa/                      # PrOntoQA (https://arxiv.org/abs/2210.01240)
├── commonsense/                   # commonsense reasoning, including StrategyQA, CommonsenseQA 2.0, and CREAK
└── BBH/                           # BIG-Bench Hard (https://arxiv.org/abs/2210.09261)

The main.ipynb notebook in each sub-directory contains the code and cached evaluation results.
Some randomly sampled failure examples are also included.

Citation

@inproceedings{wang2023can,
    title={Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate},
    author={Wang, Boshi and Yue, Xiang and Sun, Huan},
    booktitle={Findings of EMNLP},
    year={2023}
}

OSU-NLP-Group/Auto-Dialectical-Evaluation
