Branch-Solve-Merge Improves Large Language Model Evaluation and Generation, Swarnadeep Saha+, N/A, arXiv'23 #1088

AkihikoWatanabe · 2023-10-25T06:19:07Z

URL

https://arxiv.org/abs/2310.15123

Affiliations

Swarnadeep Saha, N/A
Omer Levy, N/A
Asli Celikyilmaz, N/A
Mohit Bansal, N/A
Jason Weston, N/A
Xian Li, N/A

Abstract

Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or taking into account multiple aspects and criteria. However, their performance can fall short, due to the model's lack of coherence and inability to plan and decompose the problem. We propose Branch-Solve-Merge (BSM), a Large Language Model program (Schlag et al., 2023) for tackling such challenging natural language tasks. It consists of branch, solve, and merge modules that are parameterized with specific prompts to the base LLM. These three modules plan a decomposition of the task into multiple parallel sub-tasks, independently solve them, and fuse the solutions to the sub-tasks. We apply our method to the tasks of LLM response evaluation and constrained text generation and evaluate its effectiveness with multiple LLMs, including Vicuna, LLaMA-2-chat, and GPT-4. BSM improves the evaluation correctness and consistency for each LLM by enhancing human-LLM agreement by up to 26%, reducing length and pairwise position biases by up to 50%, and allowing LLaMA-2-chat to match or outperform GPT-4 on most domains. On the constraint story generation task, BSM improves the coherence of the stories while also improving constraint satisfaction by 12%.
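The branch, solve, and merge decomposition described above can be sketched as a small LLM program. This is a minimal illustration under stated assumptions, not the authors' implementation: `LLM` is assumed to be any function from a prompt string to a completion string, and the prompt templates, the one-sub-task-per-line plan format, and `fake_llm` are all hypothetical stand-ins rather than the prompts used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt to a completion.
LLM = Callable[[str], str]


def branch(llm: LLM, task: str) -> List[str]:
    """Branch: ask the LLM to plan a decomposition into sub-tasks.

    Assumes (hypothetically) that the model lists one sub-task per line.
    """
    plan = llm(
        "Decompose the following task into independent sub-tasks, one per line.\n"
        f"Task: {task}"
    )
    return [line.strip() for line in plan.splitlines() if line.strip()]


def solve(llm: LLM, task: str, sub_task: str) -> str:
    """Solve: address one sub-task without seeing the other sub-tasks."""
    return llm(f"Task: {task}\nFocus only on this sub-task: {sub_task}")


def merge(llm: LLM, task: str, sub_tasks: List[str], solutions: List[str]) -> str:
    """Merge: fuse the independent sub-task solutions into one output."""
    parts = "\n".join(f"- {s}: {r}" for s, r in zip(sub_tasks, solutions))
    return llm(
        f"Task: {task}\nCombine these partial results into a final answer:\n{parts}"
    )


def branch_solve_merge(llm: LLM, task: str) -> str:
    sub_tasks = branch(llm, task)
    # The sub-tasks are independent, so the solve calls can run in parallel.
    # pool.map returns results in the same order as sub_tasks.
    with ThreadPoolExecutor() as pool:
        solutions = list(pool.map(lambda s: solve(llm, task, s), sub_tasks))
    return merge(llm, task, sub_tasks, solutions)


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model call (illustration only)."""
    if prompt.startswith("Decompose"):
        return "relevance\nclarity\naccuracy"
    if "Focus only on this sub-task:" in prompt:
        # Echo the sub-task name so the merge step is easy to inspect.
        return f"ok({prompt.rsplit(': ', 1)[-1]})"
    # Merge prompt: join the "- sub_task: result" lines it received.
    return " | ".join(line[2:] for line in prompt.splitlines() if line.startswith("- "))


if __name__ == "__main__":
    print(branch_solve_merge(fake_llm, "Evaluate response A to the user question."))
    # relevance: ok(relevance) | clarity: ok(clarity) | accuracy: ok(accuracy)
```

Because the solve calls share no state, they can run concurrently; a thread pool only helps here if the real `llm` call is I/O-bound, such as a request to a remote API.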

Translation (by gpt-3.5-turbo)

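Large language models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that require satisfying complex user constraints or considering multiple aspects and criteria. However, their performance can be insufficient because of the model's lack of coherence and its inability to plan and decompose problems. In this work, we propose Branch-Solve-Merge (BSM), a large language model program (Schlag et al., 2023) for tackling such difficult natural language tasks. BSM consists of three modules, branch, solve, and merge, each parameterized with specific prompts to the base LLM. These three modules decompose the task into multiple parallel sub-tasks, solve each of them independently, and integrate the solutions to the sub-tasks. We apply our method to the tasks of LLM response evaluation and constrained text generation, and evaluate its effectiveness with multiple LLMs, including Vicuna, LLaMA-2-chat, and GPT-4. BSM improves evaluation correctness and consistency by increasing human-LLM agreement by up to 26%, reduces length and pairwise position biases by up to 50%, and enables LLaMA-2-chat to match or exceed GPT-4 on most domains. On the constrained story generation task, BSM improves story coherence while also improving constraint satisfaction by 12%.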

Summary (by gpt-3.5-turbo)

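This work proposes a large language model program, Branch-Solve-Merge (BSM), for multi-faceted language generation and evaluation tasks. BSM consists of three modules, branch, solve, and merge, which decompose a task into multiple sub-tasks, solve them independently, and integrate the solutions. Experiments show that BSM improves the correctness and consistency of evaluation and improves overall performance.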
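The abstract reports the evaluation gains as higher human-LLM agreement and lower pairwise position bias. One common way to measure these for pairwise judging is sketched below, assuming verdicts of "A", "B", or "tie" and a second judging run with the two responses swapped. The function names are hypothetical, the paper's exact metric definitions may differ, and length bias is not covered here.

```python
from typing import List, Literal

Verdict = Literal["A", "B", "tie"]


def flip(v: Verdict) -> Verdict:
    """Map a verdict from the swapped run back to the original labels."""
    return {"A": "B", "B": "A", "tie": "tie"}[v]


def agreement(judge: List[Verdict], human: List[Verdict]) -> float:
    """Fraction of pairs where the LLM judge matches the human preference."""
    return sum(j == h for j, h in zip(judge, human)) / len(human)


def position_bias(original: List[Verdict], swapped: List[Verdict]) -> float:
    """Fraction of pairs whose verdict changes when the response order is swapped."""
    changed = sum(o != flip(s) for o, s in zip(original, swapped))
    return changed / len(original)


if __name__ == "__main__":
    human = ["A", "A", "A", "tie"]
    original = ["A", "B", "A", "tie"]
    # Verdicts after swapping the order: "A" now names the originally second response.
    swapped = ["B", "A", "A", "tie"]
    print(agreement(original, human))        # 0.75
    print(position_bias(original, swapped))  # 0.25
```

In this toy data, only the third pair is inconsistent: the judge prefers whichever response is shown first, which is exactly the position bias that BSM is reported to reduce.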

AkihikoWatanabe added the Pocket label Oct 25, 2023

AkihikoWatanabe changed the title from "あ" to "Branch-Solve-Merge Improves Large Language Model Evaluation and Generation, Swarnadeep Saha+, N/A, arXiv'23" Oct 25, 2023

AkihikoWatanabe added NLP LanguageModel Evaluation labels Oct 26, 2023
