We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
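To make the parameter-space side of the idea concrete, here is a minimal, hedged sketch: merging is a weighted average of corresponding model parameters, and an evolutionary loop searches for merge weights that maximize a task fitness score. This is an illustration only, not the paper's method; the abstract does not specify the optimizer, so a simple (1+λ) mutation-and-selection strategy stands in for whatever evolutionary algorithm the authors actually use, and `fitness`, `merge_params`, and `evolve_merge` are hypothetical names. Real merges operate on model weight tensors; plain floats are used here so the sketch stays self-contained.

```python
import random

def merge_params(models, weights):
    # Parameter-space merge: weighted average of corresponding parameters.
    # `models` is a list of dicts mapping parameter names to values
    # (scalars here; weight tensors in a real merge).
    return {name: sum(w * m[name] for w, m in zip(weights, models))
            for name in models[0]}

def evolve_merge(models, fitness, generations=50, pop_size=20, sigma=0.1):
    # Simple (1+lambda) evolutionary search over the merge weights
    # (an illustrative stand-in for the paper's evolutionary optimizer).
    n = len(models)
    best = [1.0 / n] * n  # start from a uniform merge
    best_fit = fitness(merge_params(models, best))
    for _ in range(generations):
        for _ in range(pop_size):
            # Mutate, clip to non-negative, renormalize to sum to 1.
            cand = [max(0.0, w + random.gauss(0, sigma)) for w in best]
            total = sum(cand) or 1.0
            cand = [w / total for w in cand]
            f = fitness(merge_params(models, cand))
            if f > best_fit:  # keep the candidate only if it improves
                best, best_fit = cand, f
    return best, best_fit
```

For example, with two toy "models" and a fitness function that rewards a merged parameter near a target value, the search converges to the interpolation weights that hit that target. The data-flow-space component of the paper (rewiring which layers of which models tokens pass through) is a separate search not shown here.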