
Problem loading metadata of gguf file #2152

Open
cnlancehu opened this issue May 2, 2024 · 6 comments

Comments

@cnlancehu

I encountered an error while executing the example quantized-phi, which I slightly modified. However, I suspect the issue might not be with my modifications.

The problem seems to be related to the function candle_transformers::models::quantized_llama::ModelWeights::from_gguf: it appears to be unable to locate the necessary metadata in the model file. This is interesting, because Hugging Face displays the model's metadata correctly.

Here are some screenshots for further reference:

Error screenshot

Hugging Face display

I would appreciate any assistance in resolving this issue. Thank you in advance.

Full Code
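For reference, the loading path described above boils down to roughly the following. This is only a sketch: the file path is a placeholder, and the exact from_gguf signature and device handling depend on the candle version.

```rust
// Rough sketch of the loading path (placeholder path; the from_gguf signature
// may differ slightly between candle versions).
use candle_core::quantized::gguf_file;
use candle_core::Device;
use candle_transformers::models::quantized_llama::ModelWeights;

fn load(path: &str) -> Result<ModelWeights, Box<dyn std::error::Error>> {
    let mut file = std::fs::File::open(path)?;
    // Parse the gguf header: metadata key/value pairs plus tensor descriptors.
    let content = gguf_file::Content::read(&mut file)?;
    // The error occurs here: from_gguf looks up llama-style metadata keys
    // that the phi-3 gguf file apparently does not contain.
    let model = ModelWeights::from_gguf(content, &mut file, &Device::Cpu)?;
    Ok(model)
}
```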

@LaurentMazare
Collaborator

You may want to use the latest GitHub version, as 0.4.1 may well not be compatible with phi-3.
You will also need to pass the --which phi-3 flag to specify that you're using this variant.

@cnlancehu
Author

I found that the naming convention in the phi-3 metadata (and in the tensors) differs from llama's, so quantized-llama can't be applied directly.
Here is the from_gguf function; please check the notes.

code
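A quick way to see the two conventions side by side is to dump the metadata keys and tensor names from each gguf file, for example with a sketch like the one below. It assumes candle's gguf_file::Content exposes the metadata and tensor_infos maps, and the file name is a placeholder.

```rust
// Sketch: list gguf metadata keys and tensor names so the llama-style and
// phi3-style naming conventions can be compared.
use candle_core::quantized::gguf_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let path = "Phi-3-mini-4k-instruct-q4.gguf"; // placeholder file name
    let mut file = std::fs::File::open(path)?;
    let content = gguf_file::Content::read(&mut file)?;

    println!("metadata keys:");
    for key in content.metadata.keys() {
        // e.g. architecture-prefixed keys such as `<arch>.attention.head_count`
        println!("  {key}");
    }

    println!("tensor names:");
    for name in content.tensor_infos.keys() {
        println!("  {name}");
    }
    Ok(())
}
```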

@LaurentMazare
Collaborator

It seems there was a "silent" change of the naming convention in phi-3 gguf models, see #2154. candle now supports both the old and the new naming conventions in the quantized-phi example: Phi3 is the "new" version with a phi3 architecture, and Phi3b is the version with a llama architecture.

@cnlancehu
Author

I think the real cause of the problem is this.
Firstly, there are two different conversion scripts:

  • convert.py always converts a model to gguf with the llama architecture
  • convert-hf-to-gguf.py converts a model to gguf with the architecture taken from the given model

However, it appears that phi-3 can only be converted with convert-hf-to-gguf.py, since convert.py fails with a NotImplementedError: Unknown rope scaling type: su.

This inconsistency in conversion methods seems to have led to the problem. The left model in the screenshot was converted using convert-hf-to-gguf.py, while the right one was converted using convert.py.
screenshot

Sorry for the delayed response.
I am wondering whether candle could auto-detect the architecture of a gguf model converted by convert-hf-to-gguf.py and run it accordingly, which would get to the root of the problem.
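Concretely, the auto-detection idea could look something like the sketch below. It assumes candle's gguf_file metadata API (Content::read and the Value string accessor), uses a placeholder file name, and only reports which loader would be picked rather than constructing it.

```rust
// Sketch: read `general.architecture` from the gguf metadata and choose a
// loader based on it, instead of assuming a fixed architecture up front.
use candle_core::quantized::gguf_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let path = "model.gguf"; // placeholder file name
    let mut file = std::fs::File::open(path)?;
    let content = gguf_file::Content::read(&mut file)?;
    // Files produced by convert-hf-to-gguf.py carry this key, e.g. "llama",
    // "phi3" or "qwen2" (accessor name assumed from candle's gguf_file::Value).
    let arch = content
        .metadata
        .get("general.architecture")
        .ok_or("gguf file has no general.architecture key")?
        .to_string()?;
    if arch == "llama" {
        println!("would load with quantized_llama::ModelWeights");
    } else if arch == "phi3" {
        println!("would load with quantized_phi3::ModelWeights");
    } else {
        println!("no loader wired up for `{arch}` yet");
    }
    Ok(())
}
```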

@LaurentMazare
Collaborator

I would have thought that we support both methods now, the phi3 architecture with the quantized-phi example and the llama one with the quantized example. Doesn't that work for you?

@cnlancehu
Author

It works out fine, but it would be excellent if quantized_llama could run all models converted by convert-hf-to-gguf.py.

There are many models I'd like to run, but I have to modify the architecture name hard-coded in candle_transformers::models::quantized_phi3 to run them normally.
For example, to run a qwen model, I just rename phi3 to qwen2.
image

But if quantized_phi3::ModelWeights::from_gguf didn't hard-code the architecture name, we could run everything converted by convert-hf-to-gguf.py at once.
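For instance, instead of baking the phi3 prefix into the key lookups, the loader could build the keys from whatever general.architecture reports. The sketch below is illustrative only, with a made-up helper name; it is not how candle's quantized_phi3 loader is actually written.

```rust
// Sketch: derive the metadata key prefix from the gguf file's declared
// architecture instead of hard-coding "phi3", so the same loader could read
// phi3.*, qwen2.*, etc.
use candle_core::quantized::gguf_file::Content;
use candle_core::Result;

// Hypothetical helper: look up `<arch>.<suffix>` as a u32 metadata value.
fn metadata_u32(content: &Content, arch: &str, suffix: &str) -> Result<u32> {
    let key = format!("{arch}.{suffix}"); // e.g. "qwen2.attention.head_count"
    match content.metadata.get(&key) {
        Some(value) => value.to_u32(),
        None => candle_core::bail!("cannot find {key} in gguf metadata"),
    }
}
```

A real version would also have to account for differences in tensor names between architectures, but on the metadata side this kind of prefix substitution is essentially all that the phi3-to-qwen2 rename above amounts to.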
