We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
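To make the parameter-space side of the idea concrete, here is a minimal, hedged sketch: merging is a weighted average of corresponding model parameters, and an evolutionary loop searches for merge weights that maximize a task fitness score. This is an illustration only, not the paper's method; the abstract does not specify the optimizer, so a simple (1+λ) mutation-and-selection strategy stands in for whatever evolutionary algorithm the authors actually use, and `fitness`, `merge_params`, and `evolve_merge` are hypothetical names. Real merges operate on model weight tensors; plain floats are used here so the sketch stays self-contained.

```python
import random

def merge_params(models, weights):
    # Parameter-space merge: weighted average of corresponding parameters.
    # `models` is a list of dicts mapping parameter names to values
    # (scalars here; weight tensors in a real merge).
    return {name: sum(w * m[name] for w, m in zip(weights, models))
            for name in models[0]}

def evolve_merge(models, fitness, generations=50, pop_size=20, sigma=0.1):
    # Simple (1+lambda) evolutionary search over the merge weights
    # (an illustrative stand-in for the paper's evolutionary optimizer).
    n = len(models)
    best = [1.0 / n] * n  # start from a uniform merge
    best_fit = fitness(merge_params(models, best))
    for _ in range(generations):
        for _ in range(pop_size):
            # Mutate, clip to non-negative, renormalize to sum to 1.
            cand = [max(0.0, w + random.gauss(0, sigma)) for w in best]
            total = sum(cand) or 1.0
            cand = [w / total for w in cand]
            f = fitness(merge_params(models, cand))
            if f > best_fit:  # keep the candidate only if it improves
                best, best_fit = cand, f
    return best, best_fit
```

For example, with two toy "models" and a fitness function that rewards a merged parameter near a target value, the search converges to the interpolation weights that hit that target. The data-flow-space component of the paper (rewiring which layers of which models tokens pass through) is a separate search not shown here.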