GGUF interaction with Transformers using AutoModel Class #30889

Open
Abdullah-kwl opened this issue May 18, 2024 · 5 comments

@Abdullah-kwl

Feature request

https://huggingface.co/docs/transformers/main/en/gguf
The documentation above shows that Transformers can load a GGUF model, and provides this simple example:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
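
According to the docs, the loaded model can then be used like any other Transformers model; a minimal generation call would look roughly like this (the prompt and max_new_tokens value are my own illustration, not taken from the docs page):

# Continues from the loading snippet above: run a short sanity-check generation.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))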

But when I run the loading example above it shows the error:
OSError: TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
and sometimes it says that my transformers version may be outdated.

Even after updating transformers it shows the same error and does not load the GGUF model. Please add support for loading GGUF models.

Motivation

It will make it possible to load GGUF models without the help of other libraries such as llama.cpp or Ollama.

Your contribution

I do not have a complete implementation in mind, but I suggest starting from the approach described in https://huggingface.co/docs/transformers/main/en/gguf

@younesbelkada
Contributor

Hi @Abdullah-kwl
Thanks for the issue! Can you make sure you have the latest transformers installed? pip install -U transformers
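
A quick way to double-check which release the environment actually ends up on after the upgrade (the exact minimum version required for GGUF support is not stated in this thread, so this only rules out a stale install):

import transformers

# Print the version that is actually importable in the current runtime;
# in Colab the runtime may need a restart before the upgraded version is picked up.
print(transformers.__version__)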

@Abdullah-kwl
Author

Yes, I am using an updated transformers library.
In Colab I am using the same command to install transformers:
pip install -U transformers

@younesbelkada
Contributor

Thanks @Abdullah-kwl, I will try to repro and report back here.

@younesbelkada
Contributor

Hi @Abdullah-kwl
I ran the code successfully on a fresh new Google Colab env: https://colab.research.google.com/drive/1rfJZp3DsbavH6IFo-rXUDNvqwFJVIbOb?usp=sharing
Note we do run a bunch of tests: https://github.com/huggingface/transformers/blob/main/tests/quantization/ggml/test_ggml.py and they all pass on our end as of today!

@Abdullah-kwl
Author

@younesbelkada yes, I ran it again and now it is working; there was a dependency conflict with other libraries, but it runs now.

But now I am facing the problem that my session crashes after using all available RAM. I think the model is loaded fully into RAM, whereas when I use llama-cpp-python it does not load the whole model into RAM and I can easily run inference with larger models, even beyond 7B.
Is there a way to keep my session from crashing due to RAM usage?

try out using this model:

from transformers import AutoModel, AutoTokenizer

model_id = "TheBloke/WestLake-7B-v2-GGUF"
filename = "westlake-7b-v2.Q2_K.gguf"
model = AutoModel.from_pretrained(model_id, gguf_file=filename)
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
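
As far as I can tell, transformers dequantizes the GGUF weights back to full precision when it loads them, which is why the RAM footprint is so much larger than with llama-cpp-python, where the quantized file stays memory-mapped. One possible mitigation is to cast to half precision while loading; this is an untested sketch (torch_dtype and low_cpu_mem_usage are standard from_pretrained arguments, but I have not verified them together with gguf_file on this model):

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "TheBloke/WestLake-7B-v2-GGUF"
filename = "westlake-7b-v2.Q2_K.gguf"

# Load the dequantized weights in fp16 to roughly halve the RAM needed,
# and avoid holding a second full copy of the weights during loading.
model = AutoModel.from_pretrained(
    model_id,
    gguf_file=filename,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)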
