
ImportError: Using bitsandbytes 8-bit quantization requires Accelerate: pip install accelerate and the latest version of bitsandbytes: pip install -i https://pypi.org/simple/ bitsandbytes #30887

AnandUgale opened this issue May 18, 2024 · 5 comments

Comments

@AnandUgale

System Info

Packages installed with CUDA 11.8:

torch - 2.3.0+cu118
llama-index - 0.10.37
llama-index-llms-huggingface - 0.2.0
transformers - 4.39.0
accelerate - 0.27.0
bitsandbytes - 0.43.1

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from llama_index.llms.huggingface import HuggingFaceLLM

# Optional quantization to 4bit
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# hf_token and stopping_ids are assumed to be defined earlier in the script
llm = HuggingFaceLLM(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.bfloat16,  # comment this line and uncomment below to use 4bit
        # "quantization_config": quantization_config,
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.9,
    },
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)

Expected behavior

Able to load and run the LLM model without the bitsandbytes/accelerate ImportError.

@amyeroberts
Collaborator

@AnandUgale Have you tried installing accelerate as per the error message?

@RuABraun

I have the same issue. Accelerate is installed. This happens while trying to run inference after successfully training with bitsandbytes.

@RuABraun

The issue seems to be that is_bitsandbytes_available() in import_utils.py returns False when no CUDA device is available. So one should simply not use the 4/8-bit options at all when the device is CPU, which to be fair makes sense.
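A minimal workaround sketch (my own assumption of how to guard this, using the standard torch and transformers APIs; the config values mirror the reproduction above) that only builds the 4-bit config when a CUDA device is present:

import torch
from transformers import BitsAndBytesConfig

# Only request bitsandbytes quantization when a CUDA device is available;
# on CPU-only machines fall back to an unquantized dtype instead.
if torch.cuda.is_available():
    model_kwargs = {
        "quantization_config": BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
        )
    }
else:
    model_kwargs = {"torch_dtype": torch.float32}

Passing this model_kwargs dict to HuggingFaceLLM avoids touching bitsandbytes at all on a CPU-only machine.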

@amyeroberts
Collaborator

@RuABraun Yes, CUDA is required for using bitsandbytes.

cc @younesbelkada - maybe we can update the warning to make things clearer?

@younesbelkada
Contributor

Hi!
1872bde should be included in the latest transformers, so whenever you don't have access to a GPU it should error out with a clearer error message (I see you are using transformers==4.39.0).
I will also enhance the error message to point users to install bitsandbytes with the simpler command pip install -U bitsandbytes.
