MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning, Xiang Yue+, N/A, arXiv'23 #1050

AkihikoWatanabe · 2023-09-30T09:39:04Z

URL

https://arxiv.org/abs/2309.05653

Affiliations

Xiang Yue, N/A
Xingwei Qu, N/A
Ge Zhang, N/A
Yao Fu, N/A
Wenhao Huang, N/A
Huan Sun, N/A
Yu Su, N/A
Wenhu Chen, N/A

Abstract

We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset. MathInstruct is compiled from 13 math datasets with intermediate rationales, six of which have rationales newly curated by us. It presents a unique hybrid of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and also ensures extensive coverage of diverse fields in math. The hybrid of CoT and PoT not only unleashes the potential of tool use but also allows different thought processes for different math problems. As a result, the MAmmoTH series substantially outperform existing open-source models on nine mathematical reasoning datasets across all scales with an average accuracy gain between 13% and 29%. Remarkably, our MAmmoTH-7B model reaches 35% on MATH (a competition-level dataset), which exceeds the best open-source 7B model (WizardMath) by 25%, and the MAmmoTH-34B model achieves 46% accuracy on MATH, even surpassing GPT-4's CoT result. Our work underscores the importance of diverse problem coverage and the use of hybrid rationales in developing superior math generalist models.
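To make the CoT/PoT distinction in the abstract concrete, here is a minimal sketch of how a final answer might be recovered from each kind of rationale. Everything in it is an assumption for illustration: the example problem, the two rationale strings, the "The answer is" convention, and the helper names `answer_from_cot` and `answer_from_pot` are hypothetical and are not taken from MathInstruct or the MAmmoTH code.

```python
import contextlib
import io
import re

# Hypothetical CoT-style rationale: natural-language steps ending in a stated answer.
COT_RATIONALE = (
    "Each pen costs 3 dollars. 14 pens cost 14 * 3 = 42 dollars. "
    "The answer is 42."
)

# Hypothetical PoT-style rationale: a short program whose printed output is the answer.
POT_RATIONALE = "price = 3\ncount = 14\nprint(price * count)"


def answer_from_cot(rationale: str) -> str:
    """Pull the number that follows 'The answer is' out of a text rationale."""
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", rationale)
    if match is None:
        raise ValueError("no final answer found in rationale")
    return match.group(1)


def answer_from_pot(program: str) -> str:
    """Run a program rationale and return whatever it prints.

    Illustration only: exec() on untrusted model output is unsafe,
    so a real system would need a sandbox.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})  # fresh namespace; the arithmetic is done by the interpreter
    return buffer.getvalue().strip()


if __name__ == "__main__":
    print(answer_from_cot(COT_RATIONALE))
    print(answer_from_pot(POT_RATIONALE))
```

In the CoT case the arithmetic is carried out inside the text itself, while in the PoT case it is delegated to the Python interpreter, which is the kind of tool use the abstract refers to.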

Translation (by gpt-3.5-turbo)

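We introduce MAmmoTH, a series of open-source large language models (LLMs) specialized for general math problem-solving.
The MAmmoTH models are trained on MathInstruct, a meticulously curated instruction tuning dataset.
MathInstruct is compiled from 13 math datasets with intermediate rationales, six of which have rationales newly curated by us.
MathInstruct provides a hybrid of CoT (chain-of-thought) and PoT (program-of-thought) rationales and comprehensively covers diverse fields of math.
The hybrid of CoT and PoT not only unlocks the potential of tool use but also allows different thought processes for different math problems.
As a result, the MAmmoTH series substantially outperforms existing open-source models on nine mathematical reasoning datasets across all scales, with an average accuracy gain between 13% and 29%.
Notably, the MAmmoTH-7B model reaches 35% on MATH (a competition-level dataset), exceeding the best open-source 7B model (WizardMath) by 25%.
In addition, the MAmmoTH-34B model achieves 46% accuracy on MATH, surpassing even GPT-4's CoT result.
Our work underscores the importance of diverse problem coverage and the use of hybrid rationales in developing superior math generalist models.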

Summary (by gpt-3.5-turbo)

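MAmmoTH is a large language model specialized for math problem-solving, trained on a meticulously curated instruction tuning dataset. The training data provides a hybrid of CoT and PoT rationales and comprehensively covers diverse fields of math. MAmmoTH substantially outperforms existing open-source models and shows particularly high accuracy on the MATH dataset. The work underscores the importance of diverse problem coverage and the use of hybrid rationales.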

AkihikoWatanabe · 2023-09-30T09:42:12Z

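Achieves SoTA among open-source models on nine datasets that require math reasoning, with gains of 13-29%.
The model is tuned on the MathInstruct data, which contains 260k examples with rationales.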

project page: https://tiger-ai-lab.github.io/MAmmoTH/

AkihikoWatanabe added the Pocket label Sep 30, 2023

AkihikoWatanabe changed the title あ MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning, Xiang Yue+, N/A, arXiv'23 Sep 30, 2023

AkihikoWatanabe added NLP Dataset LanguageModel NumericReasoning InstructionTuning Mathematics labels Oct 27, 2023
