Large Language Models (LLMs) have achieved impressive performance across various reasoning tasks. However, even state-of-the-art LLMs such as ChatGPT are prone to logical errors during their reasoning processes. Existing solutions, such as deploying task-specific verifiers or voting over multiple reasoning paths, either require extensive human annotations or fail in scenarios with inconsistent responses. To address these challenges, we introduce RankPrompt, a new prompting method that enables LLMs to self-rank their responses without additional resources. RankPrompt breaks down the ranking problem into a series of comparisons among diverse responses, leveraging the inherent capabilities of LLMs to generate chains of comparison as contextual exemplars. Our experiments across 11 arithmetic and commonsense reasoning tasks show that RankPrompt significantly enhances the reasoning performance of ChatGPT and GPT-4, with improvements of up to 13%. Moreover, RankPrompt excels in LLM-based automatic evaluations for open-ended tasks, aligning with human judgments 74% of the time on the AlpacaEval dataset. It also exhibits robustness to variations in response order and consistency. Collectively, our results validate RankPrompt as an effective method for eliciting high-quality feedback from language models.
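To make the core idea concrete, here is a minimal Python sketch of comparison-based self-ranking in the spirit described by the abstract: instead of voting over final answers, the model is asked to compare all candidate reasoning paths step by step and name the best one. The `call_llm` placeholder, the prompt wording, and the verdict-parsing format are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of RankPrompt-style comparative ranking.
# `call_llm` and the prompt/verdict formats below are assumptions
# for demonstration, not the paper's published prompts.

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g., ChatGPT or GPT-4)."""
    raise NotImplementedError("Wire this up to your LLM API of choice.")

def rank_responses(question: str, candidates: list[str]) -> str:
    """Return the candidate the model judges best after a step-by-step
    comparison of all diverse reasoning paths."""
    numbered = "\n\n".join(
        f"Response {i + 1}:\n{resp}" for i, resp in enumerate(candidates)
    )
    prompt = (
        f"Question: {question}\n\n"
        f"{numbered}\n\n"
        "Compare the responses above step by step and decide which one "
        "reasons most correctly. End with: 'The best response is Response N.'"
    )
    verdict = call_llm(prompt)
    # Parse the model's final verdict back to a candidate (assumed format).
    tail = verdict.rsplit("best response", 1)[-1]
    for i in range(len(candidates)):
        if f"Response {i + 1}" in tail:
            return candidates[i]
    return candidates[0]  # Fall back to the first candidate if parsing fails.
```

In practice the candidates would come from sampling several reasoning paths for the same question, and the comparison prompt could be prefixed with exemplar chains of comparison, as the abstract suggests.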