
refactor: GGUF + GGML Loaders with ModelKind #356

Merged

Conversation

@polarathene (Contributor) commented May 28, 2024:

This is mostly where I wanted to get it functionally. There is still plenty more to do, but there has been so much churn lately that it's starting to introduce conflicts when rebasing, so I'd rather get this through review now.

I'll provide more context in about 12 hours, but the commit history is iterative with detailed commit messages on the changes. Some of it became redundant and was replaced, but the diff may be easier to follow if that helps.


  • ModelKind workarounds dropped, so the newer version is now used as requested.
  • A similar workaround remains on the model `from_gguf()` and `from_ggml()` methods; I can address that in a follow-up PR. These methods now use a trait that must be imported in order to call them.
  • Some potential bugs spotted and others fixed.

I'll handle the clippy and rustfmt feedback when I return 😓

These methods are very verbose, but really only differ by two params that differentiate LoRA from X-LoRA.
`ModelConfig` groups the common properties used across the `from_gguf()` methods. This will better communicate differences across impl of `from_gguf()`.

The quantized X-LoRA models' `from_gguf()` now has a prop-to-param forwarder as a workaround to minimize breakage elsewhere.
The equivalent handling is very similar to `from_gguf()`, except only the `quantized_llama.rs` X-LoRA model supports this; there are no `Device` params, and the `File` params differ slightly from the GGUF type.
Finally, all this extra boilerplate can be shifted into the model config `Adapter` struct to self-contain in a single method.

This required adjusting ownership a little to satisfy the compiler. The original `from_gguf()` and `from_ggml()` methods are unaffected; they still receive the expected params by reference.
This introduces a slight breaking change, in that using these `from_gguf()` / `from_ggml()` methods now requires importing the trait into scope.

The methods drop the `pub` prefix since they inherit their visibility from the trait itself.

The purpose of this trait is to avoid requiring each model to duplicate the structs-to-params mapping helper method; instead that logic can be centralized.
These no longer need to be maintained as copies within the supported model modules.

They now leverage the common shared traits and take an annotated type parameter to determine what to construct.
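As a rough illustration of the shape this takes, under assumed names (the real types live in `utils/model_config.rs` and differ in detail):

```rust
// Simplified sketch: the common GGUF inputs are grouped once, and a shared
// helper forwards the struct fields into the flattened parameter list that the
// per-model constructors still expect. All names and types below are illustrative.
use std::fs::File;

pub struct ParamsGGUF<'a> {
    pub content: Vec<u8>, // stand-in for the parsed GGUF content
    pub reader: &'a mut File,
}

pub trait FromGGUF: Sized {
    // The existing flattened-parameter constructor each model already provides.
    fn from_gguf(content: Vec<u8>, reader: &mut File) -> anyhow::Result<Self>;
}

// One central forwarder instead of a copy of this mapping inside every model:
pub fn load_from_gguf<M: FromGGUF>(params: ParamsGGUF<'_>) -> anyhow::Result<M> {
    let ParamsGGUF { content, reader } = params;
    M::from_gguf(content, reader)
}
```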

The syntax for usage is presently a little more verbose than desired.
For reference, these alternatives could be considered.
- Impl traits for the non-adapter quant models
- Since adapter variants only append parameters and that is now a distinct struct, `model_config` can be defined earlier and a helper `with_adapter()` can convert to the adapter type variant.
With a rebase to adopt new methods for `ModelKind`, the shared logic can be hoisted out of the match arms.

X-LoRA-specific variables were moved into `Adapter::try_new()` (`model_config.rs`), as they can share the same `paths` parameter by adding a separate bool to toggle X-LoRA usage.

By hoisting `model_config` variable out of the match arm, the type would change when calling `with_adapter()`, thus to prevent that the separate `Adapter*` tuple structs have been dropped in favor of `ModelParams<Q>` which uses generic `Q` for the quantization type (trait marker) and a separate adapter optional that can be updated.

`MapParamsToModel<T>` is also no longer compatible as an approach, since the trait bound is ambiguous when there is no distinct adapter type (eg: `for AdapterGGUF`) to impl upon; unique method names would also be required to avoid conflicts on the same type.
- `try_into_model()` + `try_into_model_with_adapter()` for the separate `Q` types (GGUF/GGML) with/without adapter.
- Due to the new struct, destructuring changes slightly. The `impl` also now bundles both methods for each `Q` variant; the order was adjusted to the basic method followed by the adapter method for each `Q` variant, instead of both basic methods and then both adapter variants.
- Likewise the `ggml.rs` and `gguf.rs` methods without the `MapParamsToModel<T>` trait now rely on a `TryFrom` trait impl.
This approach introduces another generic parameter `MaybeAdapter` to emulate an `Option<Adapter>` that can be used as a type to `impl` upon.

To continue the unified type usage with an adapter variant in `ggml.rs` / `gguf.rs` pipelines, this must now leverage an enum for the two variants.
- Slightly more complexity added as a result.
- Adapter `try_into_model()` methods no longer need to check for `Some(Adapter)` to unwrap, since that should always be the case. This is now guaranteed.
- However, similar logic has bubbled up to the `TryFrom` for all impls due to the enum wrapper, so this approach may not be much better beyond broader consistency. Likewise with the `with_adapter()` method.

To minimize boilerplate when unwrapping the enum in the `TryFrom` methods, `Variantly` has been introduced for its `expect_variant()` method.

As all four types are distinct, the `_with_adapter()` method can also be `try_into_model()` due to separate impl for the new generic param `MaybeAdapter`.
Since the type constraint for the `try_into_model()` methods is bound as the return type, it can be inferred without any hint in the `TryFrom` impl; there is no need to annotate with `Self`.
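Sketching the resulting shape, again with illustrative placeholder names only:

```rust
// Illustrative sketch only. `NoAdapter` / `WithAdapter` play the role of the
// `MaybeAdapter` type-level option, so each of the four combinations gets its
// own `impl` and the method can be called `try_into_model()` everywhere.
struct FileGguf;          // placeholder for the GGUF quant inputs
struct FileGgml;          // placeholder for the GGML quant inputs
struct Adapter;           // placeholder for the LoRA / X-LoRA inputs
struct NoAdapter;
struct WithAdapter(Adapter);

struct ModelParams<Q, A> {
    quant: Q,
    adapter: A,
}

struct QuantizedLlama;    // placeholder model types
struct XLoraQuantizedLlama;

trait TryIntoModel<M> {
    fn try_into_model(self) -> anyhow::Result<M>;
}

// Plain GGUF params build the plain quantized model.
impl TryIntoModel<QuantizedLlama> for ModelParams<FileGguf, NoAdapter> {
    fn try_into_model(self) -> anyhow::Result<QuantizedLlama> {
        Ok(QuantizedLlama)
    }
}

// Adapter GGUF params build the X-LoRA model; the adapter is guaranteed present
// by the type, so there is no `Some(adapter)` unwrapping inside.
impl TryIntoModel<XLoraQuantizedLlama> for ModelParams<FileGguf, WithAdapter> {
    fn try_into_model(self) -> anyhow::Result<XLoraQuantizedLlama> {
        let WithAdapter(_adapter) = self.adapter;
        Ok(XLoraQuantizedLlama)
    }
}
```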

Use `derive_more` for terser construction of `Config` struct for `ModelParams` variants.
This is an alternative approach to build the config. Construction of the config from the param inputs is handled at the end now, not dependent upon separate `new()` + optional `with_adapter()` calls on a mutable variable.

Unfortunately the `buildstructor` and `typed-builder` APIs don't allow for easy flexibility of builder methods across different scopes (_due to moves_). `derive-builder` can do this, but not with the more complex types due to the lack of a `Copy` / `Clone`. Thus the `None` option is required as input regardless of whether an adapter is needed.
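For illustration, assuming `derive_more`'s `Constructor` derive (which generates a positional `new()`), the trade-off looks roughly like this:

```rust
// Hedged sketch; whether the PR uses this exact derive is an assumption here.
use derive_more::Constructor;

struct Adapter; // placeholder for the LoRA / X-LoRA inputs

// `Constructor` generates `Config::new(quant, adapter)` with positional args,
// so the adapter slot must be supplied even when no adapter is wanted.
#[derive(Constructor)]
struct Config<Q> {
    quant: Q,
    adapter: Option<Adapter>,
}

fn demo() {
    let _plain = Config::new("gguf-quant-placeholder", None);
    let _lora = Config::new("gguf-quant-placeholder", Some(Adapter));
}
```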
This better communicates that the block is only relevant to assigning this value, while the two `is_lora` / `is_xlora` variables are hoisted above because they are used later as metadata inputs.
`pipeline/gguf.rs` + `pipeline/ggml.rs` now ensure that `activate_adapters()` works for X-LoRA too. This is assumed to be a bugfix, since the two adapter kinds share the `XLoraLlama` model and code everywhere else checks `is_xlora`; no other usage of `is_lora` appears.
- To avoid further ambiguity, the condition is now communicated as `has_adapter`.
- It is unclear if all usage of `is_xlora` is specific to X-LoRA or also intended to apply to LoRA, since the `XLora*` models do impl `is_xlora() -> true` (except Gemma, which is a potential bug).

`pipeline/normal.rs` handled its own `is_xlora` bool differently than the `gguf.rs` / `ggml.rs` loaders.
- It relied upon `model.is_xlora() && !is_lora`, but we already assume X-LoRA via prior matching on `ModelKind`, which now provides this information via its own `is_x_lora()` method.
- Only `xlora_models/gemma.rs` would behave differently with this change, but Gemma might have meant to return `true`?
Matches are only for `Quantized` or `AdapterQuantized` variants with no difference in handling by `AdapterKind` variant used.

Additionally restores the `GGUF X-LoRA` bail formatted string. For consistency, the non-adapter branch also appends `for GGUF`, and the architecture in the LoRA branch now comes before the `ModelKind`.
For the most part, this is a better approach to encoding the kind info.

github-actions bot commented May 28, 2024

Code Metrics Report
```
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    9           21           21            0            0
 Python                 21          741          622           21           98
 TOML                   15          393          355            1           37
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               16         1054            0          781          273
 |- BASH                 6          203          190            0           13
 |- Python               6          121          110            0           11
 |- Rust                 3          185          172            9            4
 (Total)                           1563          472          790          301
-------------------------------------------------------------------------------
 Rust                   86        28448        26034          379         2035
 |- Markdown            42          439            0          427           12
 (Total)                          28887        26034          806         2047
===============================================================================
 Total                 151        31133        27426         1182         2525
===============================================================================
```

@EricLBuehler (Owner) left a comment:

@polarathene, this looks excellent, and I think it makes it much more idiomatic and type safe, which is always better. I wrote some comments, and it looks like there are a few clippy warnings, one of which can be ignored.


```rust
// Create VarBuilder:
// TODO: `from_mmaped_safetensors` has `xlora_paths` as the 2nd param, is this a bug?
let vb = from_mmaped_safetensors(
```
@EricLBuehler (Owner) replied:

No, though perhaps we should rename `xlora_paths` to `xlora_classifier_path`, as that is really what it carries here.

The 2nd param of from_mmaped_safetensors takes the adapters paths, perhaps we should rename that accordingly too?

@polarathene (Contributor, Author) replied:

> The 2nd param of from_mmaped_safetensors takes the adapters paths, perhaps we should rename that accordingly too?

Yes. I think clearer names, or at least some associated comments, would assist with maintenance, especially for those who aren't familiar with all the ML / LLM jargon and only understand how to read code 😓

Can I address that in a smaller follow-up PR, and you can suggest the new names then? As mentioned, I was already working on a `from_mmaped_safetensors()` rewrite. Hopefully no merge conflicts there when I return to it 😝

```diff
@@ -657,8 +583,8 @@ impl Pipeline for GGUFPipeline {
         }
     }
     fn activate_adapters(&mut self, adapter_names: Vec<String>) -> anyhow::Result<usize> {
-        if !self.metadata.is_lora {
-            anyhow::bail!("Cannot activate adapters non-LoRA models.")
+        if !self.metadata.has_adapter {
```
@EricLBuehler (Owner) replied:

This should remain as !self.metadata.is_lora because for X-LoRA, the adapter set must remain the same.

@polarathene (Contributor, Author) replied:

Ok, could we better communicate this?

LoRA models share the same model as X-LoRA when they're created? GGUF/GGML is converted into `XLoraLlama` for example, but that has an `is_xlora() -> true` method.

Something seems off. Or rather, for someone like me who isn't too familiar with these terms (beyond knowing you were involved in developing X-LoRA), I don't know how the usage differs with respect to `XLoraLlama` vs `GeneralMetadata`, since `is_xlora` and `is_xlora()` kind of conflict there, but the model and pipeline are presumably related?


Is activate_adapters() only meant for LoRA? Or for any potential AdapterKind that isn't X-LoRA?

```rust
// Pseudo-code of what might be a better way to query?
if self.metadata.adapter.is_some_and(|a| !a.is_xlora()) { activate } else { bail }
```

```diff
@@ -508,8 +478,8 @@ impl Pipeline for GGMLPipeline {
         }
     }
     fn activate_adapters(&mut self, adapter_names: Vec<String>) -> anyhow::Result<usize> {
-        if !self.metadata.is_lora {
-            anyhow::bail!("Cannot activate adapters non-LoRA models.")
+        if !self.metadata.has_adapter {
```
@EricLBuehler (Owner) replied:

This should remain as !self.metadata.is_lora because for X-LoRA, the adapter set must remain the same.

```rust
pub is_lora: bool,
// Model utilizes an adapter (LoRA or X-LoRA):
pub has_adapter: bool,
// TODO: There is a possibility that code mistakenly queries only `is_xlora`,
```
@EricLBuehler (Owner) commented:

I don't think so; it is used for the X-LoRA cache management and inputs generation.

```rust
        FileGGML { ct, gqa },
    ) = self.quant;

    // Forwards all structured fields above into the required flattened param sequence:
```
@EricLBuehler (Owner) commented:

Same comment for the others, but perhaps the models can take the structured input? I would like to start that process for the plain models too, as their ISQ + device mapping args are hard to read.

@polarathene (Contributor, Author) replied:

Yes, I plan to, but as mentioned I wanted to minimize running into conflicts. My first approach was just getting something that was an improvement, moving most changes into this `model_config.rs` module, as I spent quite a bit of time trying a few different things 😅

I still have considerations for ModelKind to move the quantization and adapter into the Plain / Normal variant via struct fields.

After this PR I can look into addressing the concern with the prop drilling (the `main.rs` of each CLI seems to have a lengthy chain to go from CLI args to `Loader`). There's plenty to tackle; I've just got to do it in smaller bursts, I think, due to the activity going on in the repo.

```diff
@@ -736,6 +736,7 @@ impl NormalModel for XLoraModel {
     fn device(&self) -> &Device {
         &self.device
     }
+    // BUG?: This is the only x-lora model with `is_xlora() -> false`
```
@EricLBuehler (Owner) commented:

Yes! Thanks for catching that. As a long term and safer solution, maybe we can get this information from the ModelKind instead of using hardcoded values? Just an idea.

@polarathene (Contributor, Author) replied:

💯 I was thinking that made sense too.

Should is_xlora() be true here when the AdapterKind is Lora instead of XLora? That was one of my concerns about how these two are treated since presently these would return true for Lora kind.

@EricLBuehler (Owner) left a comment:

This is really great, it makes the code much more type safe and (I think) easier to read! It looks like there are a few clippy warnings here, some of which, like the enum size one, can probably be ignored.

@polarathene (Contributor, Author) left a comment:

Context for review @EricLBuehler

I'll go take care of appeasing the lint gods now, followed by resolving the conflict.


EDIT: Oh you already beat me to it 😂

I collapsed comments that don't need any attention and responded to your earlier review feedback for any overlapping discussion I brought up.

I'll avoid the longer walkthrough in future refactorings since that seemed unnecessary :)


Todo:

- mistralrs-core/Cargo.toml
- mistralrs-core/src/models/quantized_llama.rs
- mistralrs-core/src/pipeline/ggml.rs
- mistralrs-core/src/pipeline/ggml.rs
- mistralrs-core/src/pipeline/ggml.rs
- mistralrs-core/src/utils/model_config.rs
- mistralrs-core/src/utils/model_config.rs
- mistralrs-core/src/utils/model_config.rs
- mistralrs-core/src/utils/model_config.rs
- mistralrs-core/src/xlora_models/gemma.rs
`model_config.rs` GGUF and GGML structs prefixed with `Params`.

Two exceptions added as the concerns don't seem to warrant change:
- `#[allow(clippy::borrowed_box)]`
- `#[allow(clippy::large_enum_variant)]`
This file has no other change in the commit beyond line ending conversion. It was mistakenly using CRLF since creation.
@EricLBuehler added the `documentation`, `models`, and `fix` labels on May 29, 2024.
`GeneralMetadata` now stores the `ModelKind` for this type of information.

`activate_adapters()` error message revised. `mod.rs` version includes contextual comment about X-LoRA not being equivalent.
Most likely redundant with `GeneralMetadata` now having `ModelKind` to query, but fixing here until all queries replaced.

Additionally updates `model_config.rs` note to clarify not a bug.
@polarathene (Contributor, Author) commented May 29, 2024:

I think I've addressed the feedback to get this reviewed again and merged. I have a checklist above of feedback still to tackle, but I don't think it needs to block this PR from merging.


EDIT: Oh, there's a markdown example using Python that does use the `ModelKind` enum. I'm not quite sure what the syntax for that would be with the new variant `ModelKind::Quantized { quant: QuantizationKind::Gguf }`; perhaps that would benefit from a `new_*()` helper method 🤷‍♂️ (a rough sketch of one possible shape is below).
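For illustration only, such a helper might look something like this (the enum shape follows the variant named above; the constructor name and everything else here is hypothetical, not something the PR adds):

```rust
// Purely illustrative; mirrors the variant mentioned above.
enum QuantizationKind { Gguf, Ggml }

enum ModelKind {
    Quantized { quant: QuantizationKind },
    // ...other variants elided
}

impl ModelKind {
    // Hypothetical convenience constructor (name is a guess, not the PR's API):
    fn new_quantized(quant: QuantizationKind) -> Self {
        Self::Quantized { quant }
    }
}

fn example() -> ModelKind {
    // Reads a little closer to what a docs example could show:
    ModelKind::new_quantized(QuantizationKind::Gguf)
}
```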

@polarathene (Contributor, Author) commented May 30, 2024:

While waiting on review here, I'm going over `varbuilder_utils.rs` and am curious if this is intentional:

```rust
let pos = new_name.find(".lora").unwrap();
new_name.insert_str(pos + 7, &format!(".{}", i + 1));
```

```rust
for (name, _) in tensors.tensors() {
    if name.contains("internal_xlora_classifier") {
        continue;
    }
    let mut new_name = if name.contains("base_model.model.model") {
        name.replace("base_model.model.model", "model")
    } else {
        name.clone()
    };
    let pos = new_name.find(".lora").unwrap();
    new_name.insert_str(pos + 7, &format!(".{}", i + 1));
    let tensor = tensors
        .load(&name, &device)?
        .to_device(&device)?
        .to_dtype(dtype)?;
    accum.insert(new_name, tensor);
}
```

I'm not sure what `i` is intended for exactly, but with the current approach, since there is a `continue` condition earlier, an entry can be skipped while `i` still increments. Is that context relevant? Or do the assigned numbers not need to be sequential, so it's OK to skip some `i`?

@EricLBuehler (Owner) commented:
I'll write a review soon, but just responding to your comments:

> I'm not sure what `i` is intended for exactly, but with the current approach, since there is a `continue` condition earlier, an entry can be skipped while `i` still increments. Is that context relevant? Or do the assigned numbers not need to be sequential, so it's OK to skip some `i`?

`i` is the adapter number. It's `+ 1` because adapter 0 is not what HF/peft does.

> EDIT: Oh, there's a markdown example using Python that does use the `ModelKind` enum. I'm not quite sure what the syntax for that would be with the new variant `ModelKind::Quantized { quant: QuantizationKind::Gguf }`; perhaps that would benefit from a `new_*()` helper method 🤷‍♂️

If you could add a more type-safe option to the Python API that would be great too.

@EricLBuehler (Owner) left a comment:

Looks good, almost ready to merge! Just one question about a TODO.

mistralrs-core/src/pipeline/mod.rs
@EricLBuehler (Owner) left a comment:

@polarathene, is it OK to merge? Looks good on my end.

@polarathene (Contributor, Author) replied:
Yes happy to get this merged! 🚀

@EricLBuehler (Owner) replied:
Sounds good! I wanted to check and see if you wanted to implement the other points, but I assume you plan on doing that in a future PR?

@EricLBuehler merged commit 1d21c5f into EricLBuehler:master on May 31, 2024
11 checks passed
@EricLBuehler (Owner):
Thank you!
