Description
System Info
Hi,
I'm running bitsandbytes from a Docker container based on the nvcr.io/nvidia/pytorch:23.12-py3 Docker image. I've installed the library from source, with the following commands:
CUDA_VERSION=123 make cuda12x
python setup.py install
The installation completes successfully, and I can import bitsandbytes from the Python interpreter.
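To double-check that the build actually produced the binary the setup code looks for (libbitsandbytes_cuda123.so for CUDA 12.3, per the log below), here is a minimal sketch that lists the compiled libraries without importing the package (importing would trigger the CUDA setup and can raise the error shown below):

import glob
import importlib.util
import os

# Locate the installed bitsandbytes package without executing it.
spec = importlib.util.find_spec("bitsandbytes")
pkg_dir = os.path.dirname(spec.origin)

# A successful CUDA 12.3 build should have produced
# libbitsandbytes_cuda123.so alongside libbitsandbytes_cpu.so.
print(glob.glob(os.path.join(pkg_dir, "libbitsandbytes*.so")))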
However, executing the code snippet below raises the following error:
from transformers import AutoTokenizer, AutoModelForCausalLM
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model, device_map="auto", load_in_4bit=True)
===================================BUG REPORT===================================
================================================================================
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/compat/lib'), PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
The following directories listed in your path were found to be non-existent: {PosixPath('//matplotlib_inline.backend_inline'), PosixPath('module')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=123, Highest Compute Capability: 8.6.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123
python setup.py install
CUDA SETUP: Setup Failed!
/usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/cuda_setup/main.py:167: UserWarning: Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
/usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/cuda_setup/main.py:167: UserWarning: /usr/local/lib/python3.10/dist-packages/torch/lib:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
File /usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:1382, in _LazyModule._get_module(self, module_name)
1381 try:
-> 1382 return importlib.import_module("." + module_name, self.__name__)
1383 except Exception as e:
File /usr/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)
File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)
File <frozen importlib._bootstrap>:1006, in _find_and_load_unlocked(name, import_)
File <frozen importlib._bootstrap>:688, in _load_unlocked(spec)
File <frozen importlib._bootstrap_external>:883, in exec_module(self, module)
File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)
File /usr/local/lib/python3.10/dist-packages/transformers/integrations/bitsandbytes.py:11
10 if is_bitsandbytes_available():
---> 11 import bitsandbytes as bnb
12 import torch
File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/__init__.py:6
1 # Copyright (c) Facebook, Inc. and its affiliates.
2 #
3 # This source code is licensed under the MIT license found in the
4 # LICENSE file in the root directory of this source tree.
----> 6 from . import cuda_setup, utils, research
7 from .autograd._functions import (
8 MatmulLtState,
9 bmm_cublas,
(...)
13 matmul_4bit
14 )
File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/research/__init__.py:1
----> 1 from . import nn
2 from .autograd._functions import (
3 switchback_bnb,
4 matmul_fp8_global,
5 matmul_fp8_mixed,
6 )
File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/research/nn/__init__.py:1
----> 1 from .modules import LinearFP8Mixed, LinearFP8Global
File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/research/nn/modules.py:8
7 import bitsandbytes as bnb
----> 8 from bitsandbytes.optim import GlobalOptimManager
9 from bitsandbytes.utils import OutlierTracer, find_outlier_dims
File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/optim/__init__.py:6
1 # Copyright (c) Facebook, Inc. and its affiliates.
2 #
3 # This source code is licensed under the MIT license found in the
4 # LICENSE file in the root directory of this source tree.
----> 6 from bitsandbytes.cextension import COMPILED_WITH_CUDA
8 from .adagrad import Adagrad, Adagrad8bit, Adagrad32bit
File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/cextension.py:20
19 CUDASetup.get_instance().print_log_stack()
---> 20 raise RuntimeError('''
21 CUDA Setup failed despite GPU being available. Please run the following command to get more information:
22
23 python -m bitsandbytes
24
25 Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
26 to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
27 and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues''')
28 lib.cadam32bit_grad_fp32 # runs on an error if the library could not be found -> COMPILED_WITH_CUDA=False
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
Cell In[4], line 3
1 model = "meta-llama/Llama-2-7b-chat-hf"
2 tokenizer = AutoTokenizer.from_pretrained(model)
----> 3 model = AutoModelForCausalLM.from_pretrained(model, device_map="auto", load_in_4bit=True)
File /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py:566, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
564 elif type(config) in cls._model_mapping.keys():
565 model_class = _get_model_class(config, cls._model_mapping)
--> 566 return model_class.from_pretrained(
567 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
568 )
569 raise ValueError(
570 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
571 f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
572 )
File /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:3476, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3473 keep_in_fp32_modules = []
3475 if load_in_8bit or load_in_4bit:
-> 3476 from .integrations import get_keys_to_not_convert, replace_with_bnb_linear
3478 llm_int8_skip_modules = quantization_config.llm_int8_skip_modules
3479 load_in_8bit_fp32_cpu_offload = quantization_config.llm_int8_enable_fp32_cpu_offload
File <frozen importlib._bootstrap>:1075, in _handle_fromlist(module, fromlist, import_, recursive)
File /usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:1372, in _LazyModule.__getattr__(self, name)
1370 value = self._get_module(name)
1371 elif name in self._class_to_module.keys():
-> 1372 module = self._get_module(self._class_to_module[name])
1373 value = getattr(module, name)
1374 else:
File /usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:1384, in _LazyModule._get_module(self, module_name)
1382 return importlib.import_module("." + module_name, self.__name__)
1383 except Exception as e:
-> 1384 raise RuntimeError(
1385 f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
1386 f" traceback):\n{e}"
1387 ) from e
RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
$ which nvcc
/usr/local/cuda/bin/nvcc
I've tried adding /usr/local/cuda/bin to LD_LIBRARY_PATH with export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/bin/, but the error persists.
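Note that LD_LIBRARY_PATH is consulted for shared libraries, not executables, so the directory that matters here should be the one the log located libcudart.so in (/usr/local/cuda/lib64 in this container), not the bin directory holding nvcc. A quick way to check whether the loader can resolve the CUDA runtime from the current environment:

import ctypes

# CDLL goes through the dynamic loader and honours LD_LIBRARY_PATH, the
# same mechanism the bitsandbytes CUDA setup relies on for libcudart.so.
try:
    ctypes.CDLL("libcudart.so")
    print("libcudart.so is resolvable")
except OSError as err:
    print("libcudart.so not resolvable:", err)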
Reproduction
from transformers import AutoTokenizer, AutoModelForCausalLM
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model, device_map="auto", load_in_4bit=True)
Expected behavior
I should be able to load the model in 4-bit precision.
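For reference, the same load expressed with an explicit quantization config (assuming a transformers version that ships BitsAndBytesConfig); this should behave identically once the CUDA setup succeeds:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"

# Equivalent to passing load_in_4bit=True, but via the explicit config object.
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)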