GGUF interaction with Transformers using AutoModel Class #30889

Open
Abdullah-kwl opened this issue May 18, 2024 · 5 comments

@Abdullah-kwl

Feature request

https://huggingface.co/docs/transformers/main/en/gguf
The documentation above shows that Transformers can load a GGUF model, and provides this simple example:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
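
According to the docs, the loaded model can then be used like any other Transformers model; a minimal generation call would look roughly like this (the prompt and max_new_tokens value are my own illustration, not taken from the docs page):

# Continues from the loading snippet above: run a short sanity-check generation.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))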

But when I run the loading example above it shows the error:
OSError: TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
and sometimes it says that my transformers version may be outdated.

Even after updating transformers it shows the same error and does not load the GGUF model. Please add support for loading GGUF models.

Motivation

It will make it possible to load GGUF models without the help of other libraries such as llama.cpp or Ollama.

Your contribution

I do not have a complete implementation in mind, but I suggest starting from the approach described in https://huggingface.co/docs/transformers/main/en/gguf

@younesbelkada
Contributor

Hi @Abdullah-kwl
Thanks for the issue! Can you make sure you have the latest transformers installed? pip install -U transformers
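
A quick way to double-check which release the environment actually ends up on after the upgrade (the exact minimum version required for GGUF support is not stated in this thread, so this only rules out a stale install):

import transformers

# Print the version that is actually importable in the current runtime;
# in Colab the runtime may need a restart before the upgraded version is picked up.
print(transformers.__version__)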

@Abdullah-kwl
Author

Yes, I am using an updated transformers library.
In Colab I am using the same command to install transformers:
pip install -U transformers

@younesbelkada
Contributor

Thanks @Abdullah-kwl, I will try to repro and report back here.

@younesbelkada
Contributor

Hi @Abdullah-kwl
I ran the code successfully on a fresh new Google Colab env: https://colab.research.google.com/drive/1rfJZp3DsbavH6IFo-rXUDNvqwFJVIbOb?usp=sharing
Note we do run a bunch of tests: https://github.com/huggingface/transformers/blob/main/tests/quantization/ggml/test_ggml.py and they all pass on our end as of today!

@Abdullah-kwl
Author

@younesbelkada yes, I ran it again and now it is working; there was a dependency conflict with other libraries, but it runs now.

But now I am facing the problem that my session crashes after using all available RAM. I think the model is loaded fully into RAM, whereas when I use llama-cpp-python it does not load the whole model into RAM and I can easily run inference with larger models, even beyond 7B.
Is there a way to keep my session from crashing due to RAM usage?

try out using this model:

from transformers import AutoModel, AutoTokenizer

model_id = "TheBloke/WestLake-7B-v2-GGUF"
filename = "westlake-7b-v2.Q2_K.gguf"
model = AutoModel.from_pretrained(model_id, gguf_file=filename)
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
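
As far as I can tell, transformers dequantizes the GGUF weights back to full precision when it loads them, which is why the RAM footprint is so much larger than with llama-cpp-python, where the quantized file stays memory-mapped. One possible mitigation is to cast to half precision while loading; this is an untested sketch (torch_dtype and low_cpu_mem_usage are standard from_pretrained arguments, but I have not verified them together with gguf_file on this model):

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "TheBloke/WestLake-7B-v2-GGUF"
filename = "westlake-7b-v2.Q2_K.gguf"

# Load the dequantized weights in fp16 to roughly halve the RAM needed,
# and avoid holding a second full copy of the weights during loading.
model = AutoModel.from_pretrained(
    model_id,
    gguf_file=filename,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)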
