Measuring Faithfulness in Chain-of-Thought Reasoning, Anthropic, 2023 #896

AkihikoWatanabe · 2023-07-23T01:58:54Z

https://www.anthropic.com/index/measuring-faithfulness-in-chain-of-thought-reasoning

AkihikoWatanabe · 2023-07-23T01:59:20Z

Large language models (LLMs) perform better when they produce step-by-step, “Chain-of-Thought” (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model’s actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT’s performance boost does not seem to come from CoT’s added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most tasks we study. Overall, our results suggest that CoT can be faithful if the circumstances such as the model size and task are carefully chosen.

Translation (by gpt-3.5-turbo)

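Large language models (LLMs) perform better when they generate step-by-step "Chain-of-Thought" (CoT) reasoning before answering a question, but it is not clear whether that reasoning is a faithful explanation of the model's actual reasoning (that is, its process for answering the question). We investigate hypotheses about how CoT reasoning can be unfaithful by examining how the model's predictions change when we intervene on the CoT (for example, by adding mistakes or paraphrasing it). Models vary widely across tasks in how strongly they depend on the CoT when predicting an answer, sometimes relying on it heavily and at other times largely ignoring it. The performance gain from CoT does not appear to come solely from the additional test-time compute, nor from information encoded in the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most of the tasks we study. Overall, our results suggest that CoT can be faithful when circumstances such as model size and task are chosen carefully.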

Summary (by gpt-3.5-turbo)

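Large language models (LLMs) answer questions better when they generate Chain-of-Thought (CoT) reasoning, but it is unclear whether that reasoning faithfully reflects the model's actual reasoning. This study investigates the faithfulness of CoT reasoning by intervening on the CoT and examining how the model's predictions change. The results suggest that CoT faithfulness varies with model size and task.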
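
The intervention idea in the abstract (perturb the CoT, then check whether the answer changes) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's actual metric or code: `answer_change_rate`, `corrupt_last_step`, and the two toy "models" are hypothetical names, and a real experiment would replace `predict` with a call to an actual LLM.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (question, chain-of-thought text)


def answer_change_rate(
    examples: Sequence[Example],
    predict: Callable[[str, str], str],
    intervene: Callable[[str], str],
) -> float:
    """Fraction of examples whose answer changes after intervening on the CoT.

    A high rate suggests the answer is conditioned on the CoT; a low rate
    suggests the CoT is largely ignored. This is an illustrative proxy only.
    """
    if not examples:
        raise ValueError("examples must be non-empty")
    changed = 0
    for question, cot in examples:
        original = predict(question, cot)
        perturbed = predict(question, intervene(cot))
        if original != perturbed:
            changed += 1
    return changed / len(examples)


def corrupt_last_step(cot: str) -> str:
    """Toy 'add a mistake' intervention: overwrite the final step with '= 0'."""
    steps = cot.splitlines()
    if not steps:
        return cot
    return "\n".join(steps[:-1] + ["= 0"])


def cot_reader(question: str, cot: str) -> str:
    """Hypothetical model that answers with the last token of the CoT."""
    tokens = cot.split()
    return tokens[-1] if tokens else ""


def cot_ignorer(question: str, cot: str) -> str:
    """Hypothetical model that ignores the CoT and always gives the same answer."""
    return "42"


examples = [
    ("What is 2 + 3?", "2 + 3\n= 5"),
    ("What is 4 * 2?", "4 * 2\n= 8"),
]

reader_rate = answer_change_rate(examples, cot_reader, corrupt_last_step)
ignorer_rate = answer_change_rate(examples, cot_ignorer, corrupt_last_step)
print(reader_rate, ignorer_rate)
```

In this toy setup the CoT-dependent model changes its answer on every example, while the CoT-ignoring model never does, which mirrors the "relying heavily" versus "primarily ignoring" contrast described in the abstract.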

AkihikoWatanabe added the translation_required label Jul 23, 2023

AkihikoWatanabe changed the title ~~あ~~ Measuring Faithfulness in Chain-of-Thought Reasoning, Anthropic Jul 23, 2023

AkihikoWatanabe changed the title ~~Measuring Faithfulness in Chain-of-Thought Reasoning, Anthropic~~ Measuring Faithfulness in Chain-of-Thought Reasoning, Anthropic, 2023 Jul 23, 2023

AkihikoWatanabe added NLP LanguageModel CoT Faithfulness labels Oct 21, 2023

AkihikoWatanabe mentioned this issue Oct 22, 2023

SCOTT: Self-Consistent Chain-of-Thought Distillation, ACL'23 #829

Open

AkihikoWatanabe added the Prompting label Nov 19, 2023
