# AnyMoE: Build an MoE model from anything, quickly

AnyMoE is a technique to dynamically and efficiently create MoE models. By providing a set of experts and a small pretraining dataset, you can create an MoE locally!

It has the following features:
- Apply AnyMoE to any supported model
  - `plain`
- Specify the layers to apply AnyMoE to for efficient training
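
Conceptually, AnyMoE swaps each selected MLP layer for a gated mixture over the expert MLPs, and only the small per-layer gating networks need to be trained on the pretraining dataset. The sketch below illustrates that general idea in plain PyTorch; it is an illustrative sketch only, not the mistral.rs implementation (expert freezing, top-k handling, and layer naming may differ).

```py
# Illustrative sketch of the idea only (not the mistral.rs internals): a gated mixture over expert MLPs.
import torch
import torch.nn as nn


class AnyMoeLayerSketch(nn.Module):
    """Replaces one MLP with a trainable gate over several frozen expert MLPs."""

    def __init__(self, expert_mlps: list[nn.Module], hidden_size: int, top_k: int = 1):
        super().__init__()
        self.experts = nn.ModuleList(expert_mlps)              # expert MLPs (kept frozen)
        self.gate = nn.Linear(hidden_size, len(expert_mlps))   # the small trainable gating layer
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)          # (..., num_experts)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)    # route each token to its top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for i, expert in enumerate(self.experts):
                mask = (topk_idx[..., k] == i).unsqueeze(-1)   # tokens routed to expert i at slot k
                out = out + mask * topk_w[..., k : k + 1] * expert(x)
        return out
```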

## Dataset
Currently, AnyMoE expects a CSV dataset with 2 columns: `prompt` and `expert`. For example:
```csv
prompt,expert
Discuss the impact of Renaissance art on modern aesthetics,0
Explain the significance of the theory of relativity in modern physics,1
Analyze the themes of existentialism in 20th-century literature,0
Describe the process of photosynthesis and its importance to ecosystems,1
Evaluate the role of classical music in contemporary film scores,0
Outline the steps of the scientific method and their importance in experiments,1
Compare and contrast the philosophies of Socrates and Nietzsche,0
Discuss the ethical implications of artificial intelligence in society,1
Interpret the symbolism in Salvador Dalí's paintings,0
Describe the function and structure of DNA in genetic inheritance,1
```
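
Before training, it can be worth sanity-checking the dataset. Below is a minimal sketch using Python's standard `csv` module; the file path and the expert count are assumptions, and the `expert` column is a 0-based index that should stay within the number of experts in your setup (0 and 1 in the demo dataset).

```py
# Minimal sanity check for an AnyMoE dataset CSV (illustrative; adjust the path and expert count).
import csv

NUM_EXPERTS = 2  # assumption: the total number of experts in your setup

with open("examples/amoe.csv", newline="") as f:
    rows = list(csv.DictReader(f))

assert rows, "dataset is empty"
assert set(rows[0].keys()) == {"prompt", "expert"}, "expected exactly the columns: prompt, expert"
for i, row in enumerate(rows):
    expert = int(row["expert"])
    assert 0 <= expert < NUM_EXPERTS, f"row {i}: expert index {expert} out of range"
    assert row["prompt"].strip(), f"row {i}: empty prompt"
print(f"OK: {len(rows)} rows")
```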

## Experts
AnyMoE experts can be either fine-tuned models or LoRA adapter models. Only the MLP layers will be loaded from each. The experts must be homogeneous: they must all be fine-tuned or all be adapters. Additionally, AnyMoE can be applied to only certain layers.

> Note: When using LoRA adapter experts, it may not be necessary to restrict the layers where AnyMoE is applied, due to the lower memory usage.

### Example of TOML selector with fine-tuned experts
```toml
[model]
model_id = "mistralai/Mistral-7B-Instruct-v0.1"
arch = "mistral"

[anymoe]
dataset_csv = "examples/amoe.csv"
prefix = "model.layers"
mlp = "mlp"
model_ids = ["HuggingFaceH4/zephyr-7b-beta"]
layers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

[anymoe.config]
hidden_size = 4096
expert_type = "fine_tuned"
```

### Example of TOML selector with LoRA adapter experts
```toml
[model]
model_id = "HuggingFaceH4/zephyr-7b-beta"
arch = "mistral"

[anymoe]
dataset_csv = "examples/amoe.csv"
prefix = "model.layers"
mlp = "mlp"
model_ids = ["EricB/example_adapter"]
layers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

[anymoe.config]
hidden_size = 4096

[anymoe.config.expert_type.lora_adapter]
rank = 16
alpha = 16
target_modules = ["gate_proj"]
```
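
As the note in the Experts section suggests, adapter experts add far fewer parameters than full fine-tuned experts. The sketch below is a back-of-the-envelope count for the configuration above, using the standard LoRA sizing of rank × (fan-in + fan-out) per targeted projection; the intermediate size is the usual Mistral-7B value, and the result is only an estimate, not a figure reported by mistral.rs.

```py
# Rough LoRA size estimate for the adapter-expert config above (standard LoRA sizing, illustrative numbers).
hidden_size = 4096          # Mistral-7B hidden size
intermediate_size = 14336   # Mistral-7B MLP intermediate size
rank = 16                   # from [anymoe.config.expert_type.lora_adapter]
num_layers = 16             # the 16 layers selected above
num_experts = 1             # one adapter expert in model_ids

# gate_proj maps hidden_size -> intermediate_size; LoRA adds A (rank x in) and B (out x rank).
per_module = rank * (hidden_size + intermediate_size)
total = per_module * num_layers * num_experts
print(f"~{total / 1e6:.2f}M adapter parameters")  # ~4.72M, versus ~7B for a full fine-tuned expert
```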

## Examples

### `mistralrs-server`

CLI usage is via the [TOML selector](TOML_SELECTOR.md#anymoe), where you can also find docs on the required fields.

For example, to use the demo fine-tuned expert:
```
./mistralrs_server -i toml -f toml-selectors/anymoe.toml
```

To use the demo LoRA expert:
```
./mistralrs_server -i toml -f toml-selectors/anymoe_lora.toml
```
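
Once the server is running, it exposes an OpenAI-compatible HTTP API, so any OpenAI-style client works. Here is a minimal sketch with Python's `requests`; the port and the model name are assumptions, so adjust them to how you launched the server.

```py
# Illustrative client for the OpenAI-compatible endpoint (assumes the server listens on port 1234).
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "mistral",
        "messages": [{"role": "user", "content": "Why is AnyMoE useful?"}],
        "max_tokens": 128,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```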

### Python example
```py
from mistralrs import (
    Runner,
    Which,
    ChatCompletionRequest,
    Architecture,
    AnyMoeConfig,
    AnyMoeExpertType,
)

runner = Runner(
    which=Which.Plain(
        model_id="mistralai/Mistral-7B-Instruct-v0.1",
        tokenizer_json=None,
        repeat_last_n=64,
        arch=Architecture.Mistral,
    ),
    anymoe_config=AnyMoeConfig(
        hidden_size=4096,
        dataset_csv="examples/amoe.csv",
        prefix="model.layers",
        mlp="mlp",
        expert_type=AnyMoeExpertType.FineTuned(),
        lr=1e-3,
        epochs=100,
        batch_size=4,
        model_ids=["HuggingFaceH4/zephyr-7b-beta"],
    ),
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)
```
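
If your experts are LoRA adapters rather than fine-tuned models, the expert type is swapped for the adapter variant. The constructor below is inferred from the fields of the Rust `AnyMoeExpertType::LoraAdapter` shown in the Rust example, so treat it as a sketch and check the package's type stubs for the exact signature.

```py
# Sketch only: adapter experts instead of fine-tuned experts; check the type stubs for the exact signature.
expert_type = AnyMoeExpertType.LoraAdapter(
    rank=16,
    alpha=16.0,
    target_modules=["gate_proj"],
)
```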

### Rust API
```rust
use either::Either;
use indexmap::IndexMap;
use std::sync::Arc;
use tokio::sync::mpsc::channel;

use mistralrs::{
    AnyMoeConfig, AnyMoeExpertType, AnyMoeLoader, Constraint, Device, DeviceMapMetadata, Loader,
    MistralRs, MistralRsBuilder, ModelDType, NormalLoaderBuilder, NormalLoaderType, NormalRequest,
    NormalSpecificConfig, Request, RequestMessage, Response, Result, SamplingParams,
    SchedulerMethod, TokenSource,
};

/// Gets the best device: CUDA if available when compiled with CUDA, otherwise CPU;
/// Metal when built with the `metal` feature.
pub(crate) fn best_device() -> Result<Device> {
    #[cfg(not(feature = "metal"))]
    {
        Device::cuda_if_available(0)
    }
    #[cfg(feature = "metal")]
    {
        Device::new_metal(0)
    }
}

fn setup() -> anyhow::Result<Arc<MistralRs>> {
    // Select a Mistral model
    let loader = NormalLoaderBuilder::new(
        NormalSpecificConfig {
            use_flash_attn: false,
            repeat_last_n: 64,
        },
        None,
        None,
        Some("mistralai/Mistral-7B-Instruct-v0.1".to_string()),
    )
    .build(NormalLoaderType::Mistral);
    // Wrap the base loader with AnyMoE, using a LoRA adapter expert
    let loader: Box<dyn Loader> = Box::new(AnyMoeLoader {
        target: loader,
        config: AnyMoeConfig {
            hidden_size: 4096,
            lr: 1e-3,
            epochs: 100,
            batch_size: 4,
            expert_type: AnyMoeExpertType::LoraAdapter {
                rank: 64,
                alpha: 16.,
                target_modules: vec!["gate_proj".to_string()],
            },
        },
        prefix: "model.layers".to_string(),
        mlp: "mlp".to_string(),
        path: "examples/amoe.csv".to_string(),
        model_ids: vec!["typeof/zephyr-7b-beta-lora".to_string()],
        layers: vec![],
    });
    // Load, into a Pipeline
    let pipeline = loader.load_model_from_hf(
        None,
        TokenSource::CacheToken,
        &ModelDType::Auto,
        &best_device()?,
        false,
        DeviceMapMetadata::dummy(),
        None,
    )?;
    // Create the MistralRs, which is a runner
    Ok(MistralRsBuilder::new(pipeline, SchedulerMethod::Fixed(5.try_into().unwrap())).build())
}
```