Reasoning with Language Model is Planning with World Model, Shibo Hao+, N/A, arXiv'23 #1095

AkihikoWatanabe · 2023-10-27T04:01:58Z

URL

https://arxiv.org/abs/2305.14992

Affiliations

Shibo Hao, N/A
Yi Gu, N/A
Haodi Ma, N/A
Joshua Jiahua Hong, N/A
Zhen Wang, N/A
Daisy Zhe Wang, N/A
Zhiting Hu, N/A

Abstract

Large language models (LLMs) have shown remarkable reasoning capabilities, especially when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT). However, LLMs can still struggle with problems that are easy for humans, such as generating action plans for executing tasks in a given environment, or performing complex math, logical, and commonsense reasoning. The deficiency stems from the key fact that LLMs lack an internal world model to predict the world state (e.g., environment status, intermediate variable values) and simulate long-term outcomes of actions. This prevents LLMs from performing deliberate planning akin to human brains, which involves exploring alternative reasoning paths, anticipating future states and rewards, and iteratively refining existing reasoning steps. To overcome the limitations, we propose a new LLM reasoning framework, Reasoning via Planning (RAP). RAP repurposes the LLM as both a world model and a reasoning agent, and incorporates a principled planning algorithm (based on Monte Carlo Tree Search) for strategic exploration in the vast reasoning space. During reasoning, the LLM (as agent) incrementally builds a reasoning tree under the guidance of the LLM (as world model) and task-specific rewards, and obtains a high-reward reasoning path efficiently with a proper balance between exploration and exploitation. We apply RAP to a variety of challenging reasoning problems including plan generation, math reasoning, and logical inference. Empirical results on these tasks demonstrate the superiority of RAP over various strong baselines, including CoT and least-to-most prompting with self-consistency. RAP on LLaMA-33B surpasses CoT on GPT-4 with 33% relative improvement in a plan generation setting.
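The abstract describes RAP's search only at a high level. The following is a minimal sketch of the generic Monte Carlo Tree Search loop it builds on (selection with the standard UCT rule, expansion, random rollout, backpropagation), not the authors' implementation. Assumptions, all hypothetical: `propose_actions`, `world_model`, and `reward` are toy stand-ins for what RAP does by prompting an LLM (as agent and as world model) and for its task-specific rewards; the toy task (choose steps of 1, 2, or 3 to reach a total of exactly 5 in at most 3 steps) and the exploration weight `w` are illustrative, not settings from the paper.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical toy task: in RAP these roles are played by prompting an LLM.
TARGET, MAX_DEPTH = 5, 3
State = Tuple[int, ...]  # the sequence of steps taken so far


def is_terminal(state: State) -> bool:
    return sum(state) >= TARGET or len(state) >= MAX_DEPTH


def propose_actions(state: State) -> List[int]:
    """'Agent' stand-in: candidate next reasoning steps for this state."""
    return [] if is_terminal(state) else [1, 2, 3]


def world_model(state: State, action: int) -> State:
    """'World model' stand-in: predict the state reached by taking `action`."""
    return state + (action,)


def reward(state: State, action: int) -> float:
    """Toy task reward: +1 for hitting TARGET exactly, -1 for overshooting."""
    total = sum(state) + action
    return 1.0 if total == TARGET else (-1.0 if total > TARGET else 0.0)


@dataclass
class Node:
    state: State
    reward: float = 0.0  # reward received on the edge into this node
    children: Dict[int, "Node"] = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def uct_child(node: Node, w: float) -> Node:
    """Pick the child maximising Q + w * sqrt(ln N(parent) / N(child))."""
    log_n = math.log(node.visits)
    return max(node.children.values(),
               key=lambda c: c.q() + w * math.sqrt(log_n / c.visits))


def mcts(root_state: State, iterations: int, w: float = 1.0, seed: int = 0) -> Node:
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node, path = root, [root]
        # 1. Selection: descend through fully expanded nodes via UCT.
        while (not is_terminal(node.state)
               and len(node.children) == len(propose_actions(node.state))):
            node = uct_child(node, w)
            path.append(node)
        # 2. Expansion: add one untried action; the world model gives its state.
        if not is_terminal(node.state):
            untried = [a for a in propose_actions(node.state) if a not in node.children]
            a = rng.choice(untried)
            child = Node(world_model(node.state, a), reward(node.state, a))
            node.children[a] = child
            node = child
            path.append(child)
        # 3. Simulation: random rollout to a terminal state, summing rewards.
        state, ret = node.state, 0.0
        while not is_terminal(state):
            a = rng.choice(propose_actions(state))
            ret += reward(state, a)
            state = world_model(state, a)
        # 4. Backpropagation: each node accumulates the return from its edge onward.
        for n in reversed(path):
            ret += n.reward
            n.visits += 1
            n.value_sum += ret
    return root


def best_path(root: Node) -> State:
    """Read out the final plan by following the most-visited children."""
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda c: c.visits)
    return node.state


root = mcts((), iterations=500)
print(best_path(root))
```

The weight `w` controls the exploration vs. exploitation balance mentioned in the abstract: larger values spread visits across sibling steps, smaller values concentrate them on the branch with the highest average return. Reading out the plan by visit count is one common choice; selecting by average return is another.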

Translation (by gpt-3.5-turbo)

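Large language models (LLMs) show especially remarkable reasoning abilities when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT). However, LLMs can struggle even with problems that are easy for humans, such as generating action plans for executing tasks in a given environment, or complex mathematical, logical, and commonsense reasoning. This shortcoming stems from the fact that LLMs lack an internal "world model" for predicting the state of the world (e.g., the state of the environment, the values of intermediate variables) and simulating the long-term outcomes of actions. As a result, LLMs cannot plan deliberately as the human brain does: exploring alternative reasoning paths, predicting future states and rewards, and iteratively refining existing reasoning steps. To overcome these limitations, we propose a new LLM reasoning framework, Reasoning via Planning (RAP). RAP repurposes the LLM as both a world model and a reasoning agent, and incorporates a principled planning algorithm (based on Monte Carlo Tree Search) for strategic exploration of the vast reasoning space. During reasoning, the LLM (as agent) incrementally builds a reasoning tree under the guidance of the LLM (as world model) and task-specific rewards, and efficiently obtains a high-reward reasoning path while maintaining an appropriate balance between exploration and exploitation. We applied RAP to a variety of challenging reasoning problems, including plan generation, mathematical reasoning, and logical reasoning. Empirical results on these tasks show the superiority of RAP over a range of strong baselines, such as CoT and least-to-most prompting with self-consistency. RAP on LLaMA-33B achieved a 33% relative improvement over CoT on GPT-4 in a plan generation setting.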

Summary (by gpt-3.5-turbo)

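Large language models (LLMs) have achieved remarkable results in reasoning, but they struggle with complex reasoning. This is because LLMs have no internal "world model," which limits their ability to plan. We therefore propose RAP, a new LLM reasoning framework. RAP repurposes the LLM as both a world model and a reasoning agent and incorporates a planning algorithm (based on Monte Carlo Tree Search) to explore the reasoning space strategically. Experimental results show that RAP outperforms strong baselines such as CoT.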

AkihikoWatanabe added the Pocket label Oct 27, 2023

AkihikoWatanabe changed the title from "あ" to "Reasoning with Language Model is Planning with World Model, Shibo Hao+, N/A, arXiv'23" on Oct 27, 2023

AkihikoWatanabe commented Oct 27, 2023 (edited)
