
Problem loading metadata of gguf file #2152

Open
cnlancehu opened this issue May 2, 2024 · 6 comments

Comments

@cnlancehu

I encountered an error while executing the example quantized-phi, which I slightly modified. However, I suspect the issue might not be with my modifications.

The problem seems to be related to the function candle_transformers::models::quantized_llama::ModelWeights::from_gguf: it appears to be unable to locate the necessary metadata in the model file. This is interesting, because Hugging Face displays the model's metadata correctly.

Here are some screenshots for further reference:

Error screenshot

Hugging Face display

I would appreciate any assistance in resolving this issue. Thank you in advance.

Full Code
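For reference, the loading path described above boils down to roughly the following. This is only a sketch: the file path is a placeholder, and the exact from_gguf signature and device handling depend on the candle version.

```rust
// Rough sketch of the loading path (placeholder path; the from_gguf signature
// may differ slightly between candle versions).
use candle_core::quantized::gguf_file;
use candle_core::Device;
use candle_transformers::models::quantized_llama::ModelWeights;

fn load(path: &str) -> Result<ModelWeights, Box<dyn std::error::Error>> {
    let mut file = std::fs::File::open(path)?;
    // Parse the gguf header: metadata key/value pairs plus tensor descriptors.
    let content = gguf_file::Content::read(&mut file)?;
    // The error occurs here: from_gguf looks up llama-style metadata keys
    // that the phi-3 gguf file apparently does not contain.
    let model = ModelWeights::from_gguf(content, &mut file, &Device::Cpu)?;
    Ok(model)
}
```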

@LaurentMazare
Collaborator

You may want to use the latest GitHub version, as 0.4.1 may well not be compatible with phi-3.
You will also need to pass the --which phi-3 flag to specify that you're using this variant.

@cnlancehu
Author

I found that the naming convention in the phi-3 metadata (and in the tensors) differs from llama's, so quantized-llama can't be applied directly.
Here is the from_gguf function; please check the notes.

code
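A quick way to see the two conventions side by side is to dump the metadata keys and tensor names from each gguf file, for example with a sketch like the one below. It assumes candle's gguf_file::Content exposes the metadata and tensor_infos maps, and the file name is a placeholder.

```rust
// Sketch: list gguf metadata keys and tensor names so the llama-style and
// phi3-style naming conventions can be compared.
use candle_core::quantized::gguf_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let path = "Phi-3-mini-4k-instruct-q4.gguf"; // placeholder file name
    let mut file = std::fs::File::open(path)?;
    let content = gguf_file::Content::read(&mut file)?;

    println!("metadata keys:");
    for key in content.metadata.keys() {
        // e.g. architecture-prefixed keys such as `<arch>.attention.head_count`
        println!("  {key}");
    }

    println!("tensor names:");
    for name in content.tensor_infos.keys() {
        println!("  {name}");
    }
    Ok(())
}
```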

@LaurentMazare
Collaborator

It seems there was a "silent" change of the naming convention in phi-3 gguf models, see #2154. candle now supports both the old and the new naming conventions in the quantized-phi example: Phi3 is the "new" version with a phi3 architecture, and Phi3b is the version with a llama architecture.

@cnlancehu
Author

I think the real cause of the problem is this.
Firstly, there are two different conversion scripts:

  • convert.py always converts a model to gguf with the llama architecture
  • convert-hf-to-gguf.py converts a model to gguf with the architecture taken from the given model

However, it appears that phi-3 can only be converted with convert-hf-to-gguf.py, since convert.py fails with a NotImplementedError: Unknown rope scaling type: su.

This inconsistency in conversion methods seems to have led to the problem. The left model in the screenshot was converted using convert-hf-to-gguf.py, while the right one was converted using convert.py.
screenshot

Sorry for the delayed response.
I am wondering whether candle could auto-detect the architecture of a gguf model converted by convert-hf-to-gguf.py and run it accordingly, which would get to the root of the problem.
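Concretely, the auto-detection idea could look something like the sketch below. It assumes candle's gguf_file metadata API (Content::read and the Value string accessor), uses a placeholder file name, and only reports which loader would be picked rather than constructing it.

```rust
// Sketch: read `general.architecture` from the gguf metadata and choose a
// loader based on it, instead of assuming a fixed architecture up front.
use candle_core::quantized::gguf_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let path = "model.gguf"; // placeholder file name
    let mut file = std::fs::File::open(path)?;
    let content = gguf_file::Content::read(&mut file)?;
    // Files produced by convert-hf-to-gguf.py carry this key, e.g. "llama",
    // "phi3" or "qwen2" (accessor name assumed from candle's gguf_file::Value).
    let arch = content
        .metadata
        .get("general.architecture")
        .ok_or("gguf file has no general.architecture key")?
        .to_string()?;
    if arch == "llama" {
        println!("would load with quantized_llama::ModelWeights");
    } else if arch == "phi3" {
        println!("would load with quantized_phi3::ModelWeights");
    } else {
        println!("no loader wired up for `{arch}` yet");
    }
    Ok(())
}
```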

@LaurentMazare
Collaborator

I would have thought that we support both methods now, the phi3 architecture with the quantized-phi example and the llama one with the quantized example. Doesn't that work for you?

@cnlancehu
Author

It works out fine, but it would be excellent if quantized_llama could run all models converted by convert-hf-to-gguf.py.

There are many models I'd like to run, but I have to modify the architecture name hard-coded in candle_transformers::models::quantized_phi3 to run them normally.
For example, to run a qwen model, I just rename phi3 to qwen2.
image

But if quantized_phi3::ModelWeights::from_gguf didn't hard-code the architecture name, we could run everything converted by convert-hf-to-gguf.py at once.
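For instance, instead of baking the phi3 prefix into the key lookups, the loader could build the keys from whatever general.architecture reports. The sketch below is illustrative only, with a made-up helper name; it is not how candle's quantized_phi3 loader is actually written.

```rust
// Sketch: derive the metadata key prefix from the gguf file's declared
// architecture instead of hard-coding "phi3", so the same loader could read
// phi3.*, qwen2.*, etc.
use candle_core::quantized::gguf_file::Content;
use candle_core::Result;

// Hypothetical helper: look up `<arch>.<suffix>` as a u32 metadata value.
fn metadata_u32(content: &Content, arch: &str, suffix: &str) -> Result<u32> {
    let key = format!("{arch}.{suffix}"); // e.g. "qwen2.attention.head_count"
    match content.metadata.get(&key) {
        Some(value) => value.to_u32(),
        None => candle_core::bail!("cannot find {key} in gguf metadata"),
    }
}
```

A real version would also have to account for differences in tensor names between architectures, but on the metadata side this kind of prefix substitution is essentially all that the phi3-to-qwen2 rename above amounts to.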
