Installation succeeds with CUDA 12.3, but libcudart.so is not found #956

@IamGianluca

Description

System Info

Hi,

I'm running bitsandbytes in a Docker container based on the nvcr.io/nvidia/pytorch:23.12-py3 image. I installed the library from source with the following commands:

CUDA_VERSION=123 make cuda12x
python setup.py install

The installation completes successfully, and I can import bitsandbytes from the Python interpreter.
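
For reference, a quick way to confirm the compiled CUDA binary actually made it into the installed package (the paths below assume the default setup.py install prefix in this image, so treat this as a sketch) is:

# Check that the CUDA 12.3 binary was built and copied into the installed egg;
# the setup log below complains that libbitsandbytes_cuda123.so cannot be found.
ls bitsandbytes/libbitsandbytes_cuda123.so
find /usr/local/lib/python3.10/dist-packages -name 'libbitsandbytes_cuda*.so'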

However, executing the following code snippet throws the error below:

from transformers import AutoTokenizer, AutoModelForCausalLM
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model, device_map="auto", load_in_4bit=True)
False

===================================BUG REPORT===================================
================================================================================
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/compat/lib'), PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
The following directories listed in your path were found to be non-existent: {PosixPath('//matplotlib_inline.backend_inline'), PosixPath('module')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=123, Highest Compute Capability: 8.6.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123
python setup.py install
CUDA SETUP: Setup Failed!
/usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/cuda_setup/main.py:167: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes


/usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/cuda_setup/main.py:167: UserWarning: /usr/local/lib/python3.10/dist-packages/torch/lib:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File /usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:1382, in _LazyModule._get_module(self, module_name)
   1381 try:
-> 1382     return importlib.import_module("." + module_name, self.__name__)
   1383 except Exception as e:

File /usr/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1006, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:688, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:883, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File /usr/local/lib/python3.10/dist-packages/transformers/integrations/bitsandbytes.py:11
     10 if is_bitsandbytes_available():
---> 11     import bitsandbytes as bnb
     12     import torch

File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/__init__.py:6
      1 # Copyright (c) Facebook, Inc. and its affiliates.
      2 #
      3 # This source code is licensed under the MIT license found in the
      4 # LICENSE file in the root directory of this source tree.
----> 6 from . import cuda_setup, utils, research
      7 from .autograd._functions import (
      8     MatmulLtState,
      9     bmm_cublas,
   (...)
     13     matmul_4bit
     14 )

File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/research/__init__.py:1
----> 1 from . import nn
      2 from .autograd._functions import (
      3     switchback_bnb,
      4     matmul_fp8_global,
      5     matmul_fp8_mixed,
      6 )

File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/research/nn/__init__.py:1
----> 1 from .modules import LinearFP8Mixed, LinearFP8Global

File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/research/nn/modules.py:8
      7 import bitsandbytes as bnb
----> 8 from bitsandbytes.optim import GlobalOptimManager
      9 from bitsandbytes.utils import OutlierTracer, find_outlier_dims

File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/optim/__init__.py:6
      1 # Copyright (c) Facebook, Inc. and its affiliates.
      2 #
      3 # This source code is licensed under the MIT license found in the
      4 # LICENSE file in the root directory of this source tree.
----> 6 from bitsandbytes.cextension import COMPILED_WITH_CUDA
      8 from .adagrad import Adagrad, Adagrad8bit, Adagrad32bit

File /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.41.3.post1-py3.10.egg/bitsandbytes/cextension.py:20
     19     CUDASetup.get_instance().print_log_stack()
---> 20     raise RuntimeError('''
     21     CUDA Setup failed despite GPU being available. Please run the following command to get more information:
     22 
     23     python -m bitsandbytes
     24 
     25     Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
     26     to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
     27     and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues''')
     28 lib.cadam32bit_grad_fp32 # runs on an error if the library could not be found -> COMPILED_WITH_CUDA=False

RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[4], line 3
      1 model = "meta-llama/Llama-2-7b-chat-hf"
      2 tokenizer = AutoTokenizer.from_pretrained(model)
----> 3 model = AutoModelForCausalLM.from_pretrained(model, device_map="auto", load_in_4bit=True)

File /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py:566, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    564 elif type(config) in cls._model_mapping.keys():
    565     model_class = _get_model_class(config, cls._model_mapping)
--> 566     return model_class.from_pretrained(
    567         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    568     )
    569 raise ValueError(
    570     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    571     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    572 )

File /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:3476, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3473     keep_in_fp32_modules = []
   3475 if load_in_8bit or load_in_4bit:
-> 3476     from .integrations import get_keys_to_not_convert, replace_with_bnb_linear
   3478     llm_int8_skip_modules = quantization_config.llm_int8_skip_modules
   3479     load_in_8bit_fp32_cpu_offload = quantization_config.llm_int8_enable_fp32_cpu_offload

File <frozen importlib._bootstrap>:1075, in _handle_fromlist(module, fromlist, import_, recursive)

File /usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:1372, in _LazyModule.__getattr__(self, name)
   1370     value = self._get_module(name)
   1371 elif name in self._class_to_module.keys():
-> 1372     module = self._get_module(self._class_to_module[name])
   1373     value = getattr(module, name)
   1374 else:

File /usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:1384, in _LazyModule._get_module(self, module_name)
   1382     return importlib.import_module("." + module_name, self.__name__)
   1383 except Exception as e:
-> 1384     raise RuntimeError(
   1385         f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
   1386         f" traceback):\n{e}"
   1387     ) from e

RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):

        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
$ which nvcc
/usr/local/cuda/bin/nvcc

I've tried adding /usr/local/cuda/bin to LD_LIBRARY_PATH with export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/bin/, but the error remains.
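
For what it's worth, LD_LIBRARY_PATH is searched for shared libraries (.so files) rather than executables, so a variant pointing at the directory that actually contains libcudart.so (the one the DEBUG line above located) seems worth trying:

# LD_LIBRARY_PATH should list the directory holding libcudart.so (lib64), not bin:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
python -m bitsandbytes   # re-run the bitsandbytes diagnostic after updating the path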

Reproduction

from transformers import AutoTokenizer, AutoModelForCausalLM
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model, device_map="auto", load_in_4bit=True)
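
For completeness, a minimal sanity check that PyTorch itself sees the GPU inside the container (assuming it was started with GPU access, e.g. docker run --gpus all) would be:

# Expected to print "True 12.3" (or similar) when the container can reach the GPU
# and PyTorch was built against CUDA 12.x.
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"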

Expected behavior

I should be able to load the model in 4-bit precision.
