
refactor: GGUF + GGML Loaders with ModelKind #356

Merged

Conversation

@polarathene (Contributor) commented May 28, 2024:

This is mostly where I wanted to get it functionally. There is still plenty more to do, but there has been so much churn lately that it's starting to introduce conflicts when rebasing, so I'd rather get this through review now.

I'll provide more context in about 12 hours, but the commit history is iterative with detailed commit messages on the changes. Some of it became redundant and was replaced, but the diff may be easier to follow if that helps.


  • ModelKind workarounds dropped, so the newer version is now used as requested.
  • A similar workaround remains on the model `from_gguf()` and `from_ggml()` methods; I can address that in a follow-up PR. These methods now use a trait that must be imported in order to call them.
  • Some potential bugs spotted and others fixed.

I'll handle the clippy and rustfmt feedback when I return 😓

These methods are very verbose, but really only differ by two params that differentiate LoRA from X-LoRA.
`ModelConfig` groups the common properties used across the `from_gguf()` methods. This will better communicate differences across impl of `from_gguf()`.

The quantized X-LoRA models' `from_gguf()` now has a prop-to-param forwarder as a workaround to minimize breakage elsewhere.
The equivalent handling is very similar to `from_gguf()`, except only the `quantized_llama.rs` X-LoRA model supports this; there are no `Device` params, and the `File` params differ slightly from the GGUF type.
Finally, all this extra boilerplate can be shifted into the model config `Adapter` struct to self-contain in a single method.

This required adjusting ownership a little to satisfy the compiler. The original `from_gguf()` and `from_ggml()` methods are unaffected; they still receive the expected params by reference.
This introduces a slight breaking change, in that using these `from_gguf()` / `from_ggml()` methods now requires importing the trait into scope.

The methods drop the `pub` prefix since they inherit their visibility from the trait itself.

The purpose of this trait is to avoid requiring each model to duplicate the structs-to-params mapping helper method; instead that logic can be centralized.
These no longer need to be maintained as copies within the supported model modules.

They now leverage the common shared traits and take an annotated type parameter to determine what to construct.
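As a rough illustration of the shape this takes, under assumed names (the real types live in `utils/model_config.rs` and differ in detail):

```rust
// Simplified sketch: the common GGUF inputs are grouped once, and a shared
// helper forwards the struct fields into the flattened parameter list that the
// per-model constructors still expect. All names and types below are illustrative.
use std::fs::File;

pub struct ParamsGGUF<'a> {
    pub content: Vec<u8>, // stand-in for the parsed GGUF content
    pub reader: &'a mut File,
}

pub trait FromGGUF: Sized {
    // The existing flattened-parameter constructor each model already provides.
    fn from_gguf(content: Vec<u8>, reader: &mut File) -> anyhow::Result<Self>;
}

// One central forwarder instead of a copy of this mapping inside every model:
pub fn load_from_gguf<M: FromGGUF>(params: ParamsGGUF<'_>) -> anyhow::Result<M> {
    let ParamsGGUF { content, reader } = params;
    M::from_gguf(content, reader)
}
```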

The syntax for usage is presently a little more verbose than desired.
For reference, these alternatives could be considered.
- Impl traits for the non-adapter quant models
- Since adapter variants only append parameters and that is now a distinct struct, `model_config` can be defined earlier and a helper `with_adapter()` can convert to the adapter type variant.
With a rebase to adopt new methods for `ModelKind`, the shared logic can be hoisted out of the match arms.

X-LoRA-specific variables were moved into `Adapter::try_new()` (`model_config.rs`), as they can share the same `paths` parameter by adding a separate bool to toggle X-LoRA usage.

By hoisting `model_config` variable out of the match arm, the type would change when calling `with_adapter()`, thus to prevent that the separate `Adapter*` tuple structs have been dropped in favor of `ModelParams<Q>` which uses generic `Q` for the quantization type (trait marker) and a separate adapter optional that can be updated.

`MapParamsToModel<T>` is also no longer compatible as an approach, since the trait bound is ambiguous when there is no distinct adapter type (eg: `for AdapterGGUF`) to impl upon; unique method names would also be required to avoid conflicts on the same type.
- `try_into_model()` + `try_into_model_with_adapter()` for the separate `Q` types (GGUF/GGML) with/without adapter.
- Due to the new struct, destructuring changes slightly. The `impl` also now bundles both methods for each `Q` variant; the order was adjusted to the basic method followed by the adapter method for each `Q` variant, instead of both basic methods and then both adapter variants.
- Likewise the `ggml.rs` and `gguf.rs` methods without the `MapParamsToModel<T>` trait now rely on a `TryFrom` trait impl.
This approach introduces another generic parameter `MaybeAdapter` to emulate an `Option<Adapter>` that can be used as a type to `impl` upon.

To continue the unified type usage with an adapter variant in `ggml.rs` / `gguf.rs` pipelines, this must now leverage an enum for the two variants.
- Slightly more complexity added as a result.
- Adapter `try_into_model()` methods no longer need to check for `Some(Adapter)` to unwrap, since that should always be the case. This is now guaranteed.
- However, similar logic has bubbled up to the `TryFrom` for all impls due to the enum wrapper, so this approach may not be much better beyond broader consistency. Likewise with the `with_adapter()` method.

To minimize boilerplate when unwrapping the enum in the `TryFrom` methods, `Variantly` has been introduced for its `expect_variant()` method.

As all four types are distinct, the `_with_adapter()` method can also be `try_into_model()` due to separate impl for the new generic param `MaybeAdapter`.
Since the type constraint for the `try_into_model()` methods is bound as the return type, it can be inferred without any hint in the `TryFrom` impl; there is no need to annotate with `Self`.
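Sketching the resulting shape, again with illustrative placeholder names only:

```rust
// Illustrative sketch only. `NoAdapter` / `WithAdapter` play the role of the
// `MaybeAdapter` type-level option, so each of the four combinations gets its
// own `impl` and the method can be called `try_into_model()` everywhere.
struct FileGguf;          // placeholder for the GGUF quant inputs
struct FileGgml;          // placeholder for the GGML quant inputs
struct Adapter;           // placeholder for the LoRA / X-LoRA inputs
struct NoAdapter;
struct WithAdapter(Adapter);

struct ModelParams<Q, A> {
    quant: Q,
    adapter: A,
}

struct QuantizedLlama;    // placeholder model types
struct XLoraQuantizedLlama;

trait TryIntoModel<M> {
    fn try_into_model(self) -> anyhow::Result<M>;
}

// Plain GGUF params build the plain quantized model.
impl TryIntoModel<QuantizedLlama> for ModelParams<FileGguf, NoAdapter> {
    fn try_into_model(self) -> anyhow::Result<QuantizedLlama> {
        Ok(QuantizedLlama)
    }
}

// Adapter GGUF params build the X-LoRA model; the adapter is guaranteed present
// by the type, so there is no `Some(adapter)` unwrapping inside.
impl TryIntoModel<XLoraQuantizedLlama> for ModelParams<FileGguf, WithAdapter> {
    fn try_into_model(self) -> anyhow::Result<XLoraQuantizedLlama> {
        let WithAdapter(_adapter) = self.adapter;
        Ok(XLoraQuantizedLlama)
    }
}
```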

Use `derive_more` for terser construction of `Config` struct for `ModelParams` variants.
This is an alternative approach to build the config. Construction of the config from the param inputs is handled at the end now, not dependent upon separate `new()` + optional `with_adapter()` calls on a mutable variable.

Unfortunately the `buildstructor` and `typed-builder` APIs don't allow for easy flexibility of builder methods across different scopes (_due to moves_). `derive-builder` can do this, but not with the more complex types due to the lack of a `Copy` / `Clone`. Thus the `None` option is required as input regardless of whether an adapter is needed.
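For illustration, assuming `derive_more`'s `Constructor` derive (which generates a positional `new()`), the trade-off looks roughly like this:

```rust
// Hedged sketch; whether the PR uses this exact derive is an assumption here.
use derive_more::Constructor;

struct Adapter; // placeholder for the LoRA / X-LoRA inputs

// `Constructor` generates `Config::new(quant, adapter)` with positional args,
// so the adapter slot must be supplied even when no adapter is wanted.
#[derive(Constructor)]
struct Config<Q> {
    quant: Q,
    adapter: Option<Adapter>,
}

fn demo() {
    let _plain = Config::new("gguf-quant-placeholder", None);
    let _lora = Config::new("gguf-quant-placeholder", Some(Adapter));
}
```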
This better communicates that the block is only relevant to assigning this value, while the two `is_lora` / `is_xlora` variables are hoisted above because they are used later as metadata inputs.
`pipeline/gguf.rs` + `pipeline/ggml.rs` now ensure that `activate_adapters()` works for X-LoRA too. This is assumed to be a bugfix, since the two adapter kinds share the `XLoraLlama` model and code everywhere else checks `is_xlora`; no other usage of `is_lora` appears.
- To avoid further ambiguity, the condition is now communicated as `has_adapter`.
- It is unclear if all usage of `is_xlora` is specific to X-LoRA or also intended to apply to LoRA, since the `XLora*` models do impl `is_xlora() -> true` (except Gemma, which is a potential bug).

`pipeline/normal.rs` handled its own `is_xlora` bool differently than the `gguf.rs` / `ggml.rs` loaders.
- It relied upon `model.is_xlora() && !is_lora`, but we already assume X-LoRA via prior matching on `ModelKind`, which now provides this information via its own `is_x_lora()` method.
- Only `xlora_models/gemma.rs` would behave differently with this change, but Gemma might have meant to return `true`?
Matches are only for `Quantized` or `AdapterQuantized` variants with no difference in handling by `AdapterKind` variant used.

Additionally restores the `GGUF X-LoRA` bail formatted string. For consistency, the non-adapter branch also appends `for GGUF`, and the architecture in the LoRA branch now comes before the `ModelKind`.
For the most part, this is a better approach to encoding the kind info.

github-actions bot commented May 28, 2024

Code Metrics Report
```
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    9           21           21            0            0
 Python                 21          741          622           21           98
 TOML                   15          393          355            1           37
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               16         1054            0          781          273
 |- BASH                 6          203          190            0           13
 |- Python               6          121          110            0           11
 |- Rust                 3          185          172            9            4
 (Total)                           1563          472          790          301
-------------------------------------------------------------------------------
 Rust                   86        28448        26034          379         2035
 |- Markdown            42          439            0          427           12
 (Total)                          28887        26034          806         2047
===============================================================================
 Total                 151        31133        27426         1182         2525
===============================================================================
```

@EricLBuehler (Owner) left a comment:

@polarathene, this looks excellent, and I think it makes it much more idiomatic and type safe, which is always better. I wrote some comments, and it looks like there are a few clippy warnings, one of which can be ignored.


```rust
// Create VarBuilder:
// TODO: `from_mmaped_safetensors` has `xlora_paths` as the 2nd param, is this a bug?
let vb = from_mmaped_safetensors(
```
@EricLBuehler (Owner) replied:

No, though perhaps we should rename `xlora_paths` to `xlora_classifier_path`, as that is really what it carries here.

The 2nd param of from_mmaped_safetensors takes the adapters paths, perhaps we should rename that accordingly too?

@polarathene (Contributor, Author) replied:

> The 2nd param of from_mmaped_safetensors takes the adapters paths, perhaps we should rename that accordingly too?

Yes. I think clearer names, or at least some associated comments, would assist with maintenance, especially for those who aren't familiar with all the ML / LLM jargon and only understand how to read code 😓

Can I address that in a smaller follow-up PR, and you can suggest the new names then? As mentioned, I was already working on a `from_mmaped_safetensors()` rewrite. Hopefully no merge conflicts there when I return to it 😝

```diff
@@ -657,8 +583,8 @@ impl Pipeline for GGUFPipeline {
         }
     }
     fn activate_adapters(&mut self, adapter_names: Vec<String>) -> anyhow::Result<usize> {
-        if !self.metadata.is_lora {
-            anyhow::bail!("Cannot activate adapters non-LoRA models.")
+        if !self.metadata.has_adapter {
```
@EricLBuehler (Owner) replied:

This should remain as !self.metadata.is_lora because for X-LoRA, the adapter set must remain the same.

@polarathene (Contributor, Author) replied:

Ok, could we better communicate this?

LoRA models share the same model as X-LoRA when they're created? GGUF/GGML is converted into `XLoraLlama` for example, but that has an `is_xlora() -> true` method.

Something seems off. Or rather, for someone like me who isn't too familiar with these terms (beyond knowing you were involved in developing X-LoRA), I don't know how the usage differs with respect to `XLoraLlama` vs `GeneralMetadata`, since `is_xlora` and `is_xlora()` kind of conflict there, but the model and pipeline are presumably related?


Is activate_adapters() only meant for LoRA? Or for any potential AdapterKind that isn't X-LoRA?

```rust
// Pseudo-code of what might be a better way to query?
if self.metadata.adapter.is_some_and(|a| !a.is_xlora()) { activate } else { bail }
```

```diff
@@ -508,8 +478,8 @@ impl Pipeline for GGMLPipeline {
         }
     }
     fn activate_adapters(&mut self, adapter_names: Vec<String>) -> anyhow::Result<usize> {
-        if !self.metadata.is_lora {
-            anyhow::bail!("Cannot activate adapters non-LoRA models.")
+        if !self.metadata.has_adapter {
```
@EricLBuehler (Owner) replied:

This should remain as !self.metadata.is_lora because for X-LoRA, the adapter set must remain the same.

```rust
pub is_lora: bool,
// Model utilizes an adapter (LoRA or X-LoRA):
pub has_adapter: bool,
// TODO: There is a possibility that code mistakenly queries only `is_xlora`,
```
@EricLBuehler (Owner) commented:

I don't think so; it is used for the X-LoRA cache management and inputs generation.

```rust
        FileGGML { ct, gqa },
    ) = self.quant;

    // Forwards all structured fields above into the required flattened param sequence:
```
@EricLBuehler (Owner) commented:

Same comment for the others, but perhaps the models can take the structured input? I would like to start that process for the plain models too, as their ISQ + device mapping args are hard to read.

@polarathene (Contributor, Author) replied:

Yes, I plan to, but as mentioned I wanted to minimize running into conflicts. My first approach was just getting something that was an improvement, moving most changes into this `model_config.rs` module, as I spent quite a bit of time trying a few different things 😅

I still have considerations for ModelKind to move the quantization and adapter into the Plain / Normal variant via struct fields.

After this PR I can look into addressing the concern with the prop drilling (the `main.rs` of each CLI seems to have a lengthy chain to go from CLI args to `Loader`). There's plenty to tackle; I've just got to do it in smaller bursts, I think, due to the activity going on in the repo.

```diff
@@ -736,6 +736,7 @@ impl NormalModel for XLoraModel {
     fn device(&self) -> &Device {
         &self.device
     }
+    // BUG?: This is the only x-lora model with `is_xlora() -> false`
```
@EricLBuehler (Owner) commented:

Yes! Thanks for catching that. As a long term and safer solution, maybe we can get this information from the ModelKind instead of using hardcoded values? Just an idea.

@polarathene (Contributor, Author) replied:

💯 I was thinking that made sense too.

Should is_xlora() be true here when the AdapterKind is Lora instead of XLora? That was one of my concerns about how these two are treated since presently these would return true for Lora kind.

@EricLBuehler (Owner) left a comment:

This is really great, it makes the code much more type safe and (I think) easier to read! It looks like there are a few clippy warnings here, some of which, like the enum size one, can probably be ignored.

@polarathene (Contributor, Author) left a comment:

Context for review @EricLBuehler

I'll go take care of appeasing the lint gods now, followed by resolving the conflict.


EDIT: Oh you already beat me to it 😂

I collapsed comments that don't need any attention and responded to your earlier review feedback for any overlapping discussion I brought up.

I'll avoid the longer walkthrough in future refactorings since that seemed unnecessary :)


Todo:

- mistralrs-core/Cargo.toml
- mistralrs-core/src/models/quantized_llama.rs
- mistralrs-core/src/pipeline/ggml.rs
- mistralrs-core/src/pipeline/ggml.rs
- mistralrs-core/src/pipeline/ggml.rs
- mistralrs-core/src/utils/model_config.rs
- mistralrs-core/src/utils/model_config.rs
- mistralrs-core/src/utils/model_config.rs
- mistralrs-core/src/utils/model_config.rs
- mistralrs-core/src/xlora_models/gemma.rs
`model_config.rs` GGUF and GGML structs prefixed with `Params`.

Two exceptions added as the concerns don't seem to warrant change:
- `#[allow(clippy::borrowed_box)]`
- `#[allow(clippy::large_enum_variant)]`
This file has no other change in the commit beyond line ending conversion. It was mistakenly using CRLF since creation.
@EricLBuehler added the `documentation`, `models`, and `fix` labels on May 29, 2024.
`GeneralMetadata` now stores the `ModelKind` for this type of information.

`activate_adapters()` error message revised. `mod.rs` version includes contextual comment about X-LoRA not being equivalent.
Most likely redundant with `GeneralMetadata` now having `ModelKind` to query, but fixing here until all queries replaced.

Additionally updates `model_config.rs` note to clarify not a bug.
@polarathene (Contributor, Author) commented May 29, 2024:

I think I've addressed the feedback to get this reviewed again and merged. I have a checklist above of feedback still to tackle, but I don't think it needs to block this PR from merging.


EDIT: Oh, there's a markdown example using Python that does use the `ModelKind` enum. I'm not quite sure what the syntax for that would be with the new variant `ModelKind::Quantized { quant: QuantizationKind::Gguf }`; perhaps that would benefit from a `new_*()` helper method 🤷‍♂️ (a rough sketch of one possible shape is below).
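For illustration only, such a helper might look something like this (the enum shape follows the variant named above; the constructor name and everything else here is hypothetical, not something the PR adds):

```rust
// Purely illustrative; mirrors the variant mentioned above.
enum QuantizationKind { Gguf, Ggml }

enum ModelKind {
    Quantized { quant: QuantizationKind },
    // ...other variants elided
}

impl ModelKind {
    // Hypothetical convenience constructor (name is a guess, not the PR's API):
    fn new_quantized(quant: QuantizationKind) -> Self {
        Self::Quantized { quant }
    }
}

fn example() -> ModelKind {
    // Reads a little closer to what a docs example could show:
    ModelKind::new_quantized(QuantizationKind::Gguf)
}
```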

@polarathene (Contributor, Author) commented May 30, 2024:

While waiting on review here, I'm going over `varbuilder_utils.rs` and am curious if this is intentional:

```rust
let pos = new_name.find(".lora").unwrap();
new_name.insert_str(pos + 7, &format!(".{}", i + 1));
```

```rust
for (name, _) in tensors.tensors() {
    if name.contains("internal_xlora_classifier") {
        continue;
    }
    let mut new_name = if name.contains("base_model.model.model") {
        name.replace("base_model.model.model", "model")
    } else {
        name.clone()
    };
    let pos = new_name.find(".lora").unwrap();
    new_name.insert_str(pos + 7, &format!(".{}", i + 1));
    let tensor = tensors
        .load(&name, &device)?
        .to_device(&device)?
        .to_dtype(dtype)?;
    accum.insert(new_name, tensor);
}
```

I'm not sure what `i` is intended for exactly, but with the current approach, since there is a `continue` condition earlier, an entry can be skipped while `i` still increments. Is that context relevant? Or do the assigned numbers not need to be sequential, so it's OK to skip some `i`?

@EricLBuehler (Owner) commented:
I'll write a review soon, but just responding to your comments:

> I'm not sure what `i` is intended for exactly, but with the current approach, since there is a `continue` condition earlier, an entry can be skipped while `i` still increments. Is that context relevant? Or do the assigned numbers not need to be sequential, so it's OK to skip some `i`?

`i` is the adapter number. It's `+ 1` because adapter 0 is not what HF/peft does.

> EDIT: Oh, there's a markdown example using Python that does use the `ModelKind` enum. I'm not quite sure what the syntax for that would be with the new variant `ModelKind::Quantized { quant: QuantizationKind::Gguf }`; perhaps that would benefit from a `new_*()` helper method 🤷‍♂️

If you could add a more type-safe option to the Python API that would be great too.

@EricLBuehler (Owner) left a comment:

Looks good, almost ready to merge! Just one question about a TODO.

mistralrs-core/src/pipeline/mod.rs
@EricLBuehler (Owner) left a comment:

@polarathene, is it OK to merge? Looks good on my end.

@polarathene (Contributor, Author) replied:
Yes happy to get this merged! 🚀

@EricLBuehler (Owner) replied:
Sounds good! I wanted to check and see if you wanted to implement the other points, but I assume you plan on doing that in a future PR?

@EricLBuehler merged commit 1d21c5f into EricLBuehler:master on May 31, 2024
11 checks passed
@EricLBuehler (Owner):
Thank you!
