[Documentation]: Inconsistent install instructions for ROCm/bitsandbytes #3132

Open
garrettbyrd opened this issue May 15, 2024 · 5 comments
@garrettbyrd
garrettbyrd commented May 15, 2024

Description of errors

There are some recent blog posts [1,2,3] that provide the following install instructions for ROCm/bitsandbytes for ROCm v6.x:

git clone --recurse https://github.com/ROCm/bitsandbytes.git
cd bitsandbytes
git checkout rocm_enabled
make hip
python setup.py install

One can easily verify that the rocm_enabled branch does not include a Makefile.
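A quick way to confirm this on any checkout is to look for the usual build entry points at the top of the repository. The following is a minimal sketch; the function name and the list of candidate files are assumptions chosen for illustration, not part of the bitsandbytes tooling.

```python
from pathlib import Path


def detect_build_files(repo_dir):
    """Report which common build entry points exist at the top of repo_dir."""
    root = Path(repo_dir)
    # Candidate names are illustrative; extend the tuple as needed.
    candidates = ("Makefile", "CMakeLists.txt", "setup.py", "pyproject.toml")
    return {name: (root / name).is_file() for name in candidates}


# Run from inside the checked-out branch.
for name, present in detect_build_files(".").items():
    print(f"{name}: {'present' if present else 'missing'}")
```

If `Makefile` is reported missing, `make hip` has nothing to run in that checkout.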

The README for this branch provides the following instructions for installing ROCm/bitsandbytes:

git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
# Checkout branch as needed
# for rocm 5.7 - rocm5.7_internal_testing
# for rocm 6.x - rocm6.2_internal_testing
git checkout <branch>
make hip
python setup.py install

These instructions seem to work.

However, in the README for the branch rocm6.2_internal_testing, the same incorrect install steps are provided for ROCm/bitsandbytes:

# Install BitsandBytes
git clone --recurse https://github.com/ROCmSoftwarePlatform/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
make hip
python setup.py install

The same incorrect instructions are in the README for the branch rocm5.7_internal_testing.

The branch rocm6.2_internal_testing needs to be corrected to have the following install instructions:

git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm6.2_internal_testing
make hip
python setup.py install

And the branch rocm5.7_internal_testing needs to be corrected to have the following install instructions:

git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm5.7_internal_testing
make hip
python setup.py install

This would provide working, consistent install instructions between the three branches rocm5.7_internal_testing, rocm6.2_internal_testing, and rocm_enabled.

Other branches might have incorrect instructions, but I have not checked branches besides these three.
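Checking the remaining branches could be partly automated by comparing each README's `git checkout` line against the branch the README lives on. This is only a sketch under that assumption; `mismatched_checkouts` is a hypothetical helper, and fetching each branch's README text is left out.

```python
import re

# Matches the branch name on any "git checkout <name>" line.
CHECKOUT_RE = re.compile(r"^\s*git checkout\s+(\S+)", re.MULTILINE)


def mismatched_checkouts(readme_text, branch):
    """Return checkout targets in readme_text that differ from branch.

    Placeholders such as <branch> are ignored.
    """
    targets = CHECKOUT_RE.findall(readme_text)
    return [t for t in targets if t != branch and not t.startswith("<")]


# README excerpt quoted above for the rocm6.2_internal_testing branch.
readme = """git clone --recurse https://github.com/ROCmSoftwarePlatform/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
make hip
python setup.py install
"""
print(mismatched_checkouts(readme, "rocm6.2_internal_testing"))  # → ['rocm_enabled']
```

A README that intentionally documents several branches would need manual review; a mismatch is a hint, not proof of an error.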

[1] https://rocm.blogs.amd.com/artificial-intelligence/starcoder-fine-tune/README.html

[2] https://rocm.blogs.amd.com/artificial-intelligence/llama2-Qlora/README.html

[3] https://rocm.blogs.amd.com/artificial-intelligence/llama2-lora/README.html

@clintg6
clintg6 commented May 15, 2024

Hi Garrett,

The instructions in the blog will be updated shortly. In the meantime, the recommended installation for bitsandbytes for ROCm is as follows:

git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S .
make
pip install .
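For repeated installs, the same steps can be scripted so that the run stops at the first failing command. A minimal sketch, assuming `git`, `pip`, `cmake`, and `make` are on the PATH; the function names are hypothetical and the commands are copied from the list above.

```python
import subprocess

# The clone runs in the parent directory; the remaining steps run inside the checkout.
CLONE = ["git", "clone", "--recurse", "https://github.com/ROCm/bitsandbytes"]
BUILD_STEPS = [
    ["git", "checkout", "rocm_enabled"],
    ["pip", "install", "-r", "requirements-dev.txt"],
    ["cmake", "-DCOMPUTE_BACKEND=hip", "-S", "."],
    ["make"],
    ["pip", "install", "."],
]


def run_steps(steps, cwd):
    """Run each command in order inside cwd; raise on the first failure."""
    for cmd in steps:
        print("+", " ".join(cmd))
        subprocess.run(cmd, cwd=cwd, check=True)


def install(parent_dir="."):
    """Clone into parent_dir, then build and install from the new checkout."""
    run_steps([CLONE], cwd=parent_dir)
    run_steps(BUILD_STEPS, cwd=f"{parent_dir}/bitsandbytes")
```

Calling `install()` performs the full sequence; nothing runs on import.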

@pnunna93
Collaborator

Hi @garrettbyrd, thanks for bringing this to our attention. There were a lot of updates in the last few weeks, which made those instructions obsolete. All the branches are now updated with the latest instructions.
rocm_enabled - https://github.com/ROCm/bitsandbytes/blob/rocm_enabled/README.md
rocm6.2_internal_testing - https://github.com/ROCm/bitsandbytes/blob/rocm6.2_internal_testing/README.md
rocm5.7_internal_testing - https://github.com/ROCm/bitsandbytes/blob/rocm5.7_internal_testing/README.md

@ppanchad-amd
@garrettbyrd Please advise if we can go ahead and close the ticket. Thanks!

@garrdbyrd
This is @garrettbyrd using my personal account.

I was not able to reproduce a working install with the updated instructions.

The package is able to build and is successfully added to my conda environment, but errors occur at runtime.

I set up a fresh conda env as follows:

# Install ROCm
# ...
# Install conda
# ...

# Conda env setup
conda create -n bits-env
conda activate bits-env
conda install pip jupyterlab

# Install pytorch
# https://pytorch.org/get-started/locally/
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

# Install bitsandbytes
git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S .
make
pip install .

I set up an example notebook with two cells.
First cell:

import torch
import bitsandbytes
from torch import nn

First cell output:

Could not load bitsandbytes native library: /home/USER/.conda/envs/bits-env/lib/python3.12/site-packages/zmq/backend/cython/../../../../.././libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /home/USER/.conda/envs/bits-env/lib/python3.12/site-packages/bitsandbytes/libbitsandbytes_hip_nohipblaslt.so)
Traceback (most recent call last):
  File "/home/USER/.conda/envs/bits-env/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 124, in <module>
    lib = get_native_library()
          ^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/.conda/envs/bits-env/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 104, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/.conda/envs/bits-env/lib/python3.12/ctypes/__init__.py", line 460, in LoadLibrary
    return self._dlltype(name)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/USER/.conda/envs/bits-env/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/USER/.conda/envs/bits-env/lib/python3.12/site-packages/zmq/backend/cython/../../../../.././libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /home/USER/.conda/envs/bits-env/lib/python3.12/site-packages/bitsandbytes/libbitsandbytes_hip_nohipblaslt.so)

CUDA Setup failed despite CUDA being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Second cell:

model = nn.Sequential(
    nn.Linear(10, 50),
    nn.ReLU(),
    nn.Linear(50, 1)
)

optimizer = bitsandbytes.optim.Adam8bit(model.parameters())

input = torch.randn(1, 10)
target = torch.randn(1, 1)

output = model(input)
loss = nn.MSELoss()(output, target)
loss.backward()
optimizer.step()

print("Loss:", loss.item())

Second cell output:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 21
     19 # Backward pass and optimize
     20 loss.backward()
---> 21 optimizer.step()
     23 print("Loss:", loss.item())

File ~/.conda/envs/bits-env/lib/python3.12/site-packages/torch/optim/optimizer.py:391, in Optimizer.profile_hook_step.<locals>.wrapper(*args, **kwargs)
    386         else:
    387             raise RuntimeError(
    388                 f"{func} must return None or a tuple of (new_args, new_kwargs), but got {result}."
    389             )
--> 391 out = func(*args, **kwargs)
    392 self._optimizer_step_code()
    394 # call optimizer step post hooks

File ~/.conda/envs/bits-env/lib/python3.12/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/.conda/envs/bits-env/lib/python3.12/site-packages/bitsandbytes/optim/optimizer.py:287, in Optimizer8bit.step(self, closure)
    284             self.init_state(group, p, gindex, pindex)
...
-> 1620     optim_func = str2optimizer32bit[optimizer_name][0]
   1621 elif g.dtype == torch.float16:
   1622     optim_func = str2optimizer32bit[optimizer_name][1]

NameError: name 'str2optimizer32bit' is not defined

This is consistent with some testing I was doing on another system under basically identical (but anecdotal) conditions. I also get the same output when wrapping the code in a with torch.cuda.device(0): block.
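The NameError in the second cell is plausibly a downstream symptom of the failed native-library load reported in the first cell (an assumption, not verified against the bitsandbytes source). Loading the shared object directly with ctypes isolates that first failure from the rest of the package. A minimal sketch; `try_load` is a hypothetical helper and the path is the one from the error message above.

```python
import ctypes


def try_load(path):
    """Attempt to load a shared library; return (ok, error_message)."""
    try:
        ctypes.CDLL(str(path))
    except OSError as exc:
        return False, str(exc)
    return True, ""


# Path taken from the error message; adjust for your environment.
lib_path = (
    "/home/USER/.conda/envs/bits-env/lib/python3.12/site-packages/"
    "bitsandbytes/libbitsandbytes_hip_nohipblaslt.so"
)
ok, err = try_load(lib_path)
print("loaded" if ok else f"failed to load: {err}")
```

If this fails with the same GLIBCXX message, the problem lies in library resolution rather than in the optimizer code.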

Here is the output of python -m bitsandbytes:

(bits-env) [USER@USER bitsandbytes]$ python -m bitsandbytes
Could not load bitsandbytes native library: /home/USER/.conda/envs/bits-env/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /home/USER/Documents/temp/bitsandbytes/bitsandbytes/libbitsandbytes_hip_nohipblaslt.so)
Traceback (most recent call last):
  File "/home/USER/Documents/temp/bitsandbytes/bitsandbytes/cextension.py", line 124, in <module>
    lib = get_native_library()
          ^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/Documents/temp/bitsandbytes/bitsandbytes/cextension.py", line 104, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/.conda/envs/bits-env/lib/python3.12/ctypes/__init__.py", line 460, in LoadLibrary
    return self._dlltype(name)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/USER/.conda/envs/bits-env/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/USER/.conda/envs/bits-env/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /home/USER/Documents/temp/bitsandbytes/bitsandbytes/libbitsandbytes_hip_nohipblaslt.so)

CUDA Setup failed despite CUDA being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(11, 0), cuda_version_string='60', cuda_version_tuple=(6, 0))
PyTorch settings found: CUDA_VERSION=60, Highest Compute Capability: (11, 0).
WARNING: CUDA versions lower than 11 are currently not supported for LLM.int8().
You will be only to use 8-bit optimizers and quantization routines!
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
The directory listed in your path is found to be non-existent: local/USER
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/1853,unix/USER
The directory listed in your path is found to be non-existent: /org/freedesktop/DisplayManager/Session1
The directory listed in your path is found to be non-existent: /etc/gtk/gtkrc
The directory listed in your path is found to be non-existent: /home/USER/.gtkrc
The directory listed in your path is found to be non-existent: /etc/gtk-2.0/gtkrc
The directory listed in your path is found to be non-existent: /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/session.slice/plasma-plasmashell.service/memory.pressure
The directory listed in your path is found to be non-existent: /Sessions/1
The directory listed in your path is found to be non-existent: /org/freedesktop/DisplayManager/Seat0
The directory listed in your path is found to be non-existent: /home/USER/.cache/dotnet_bundle_extract
The directory listed in your path is found to be non-existent: //debuginfod.archlinux.org 
The directory listed in your path is found to be non-existent: /var/lib/spack/modules/tcl
The directory listed in your path is found to be non-existent: /Windows/1
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.

For source installations, compile the binaries with `cmake -DCOMPUTE_BACKEND=cuda -S .`.
See the documentation for more details if needed.

Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
  File "/home/USER/Documents/temp/bitsandbytes/bitsandbytes/diagnostics/main.py", line 66, in main
    sanity_check()
  File "/home/USER/Documents/temp/bitsandbytes/bitsandbytes/diagnostics/main.py", line 40, in sanity_check
    adam.step()
  File "/home/USER/.conda/envs/bits-env/lib/python3.12/site-packages/torch/optim/optimizer.py", line 391, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/.conda/envs/bits-env/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/Documents/temp/bitsandbytes/bitsandbytes/optim/optimizer.py", line 287, in step
    self.update_step(group, p, gindex, pindex)
  File "/home/USER/.conda/envs/bits-env/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/Documents/temp/bitsandbytes/bitsandbytes/optim/optimizer.py", line 496, in update_step
    F.optimizer_update_32bit(
  File "/home/USER/Documents/temp/bitsandbytes/bitsandbytes/functional.py", line 1620, in optimizer_update_32bit
    optim_func = str2optimizer32bit[optimizer_name][0]
                 ^^^^^^^^^^^^^^^^^^
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.

It seems that GCC 13.2 is required (see mentions of GLIBCXX_3.4.32, I am using 14.1.1), but I have not found this formally stated anywhere. If this is the case, I can test on another system that uses module environments (same alternative system mentioned above that didn't work before).

Minor note: acting on this hunch with a simple conda install conda-forge::gcc (which is 13.2.0), then rebuilding and reinstalling, did not fix the problem.
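The GLIBCXX_3.4.32 message means the libstdc++.so.6 that was loaded does not define that symbol version. One way to see what a given copy provides is to scan it for version tags, roughly what `strings libstdc++.so.6 | grep GLIBCXX` does. A minimal sketch; the helper names are hypothetical, and a byte scan only approximates the ELF version tables.

```python
import re

# Version tags look like GLIBCXX_3.4.32; non-numeric names are skipped.
GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(data):
    """Return sorted version tuples for every GLIBCXX_x.y[.z] tag in raw bytes."""
    found = {
        tuple(int(part) for part in match.decode().split("."))
        for match in GLIBCXX_RE.findall(data)
    }
    return sorted(found)


def provides(path, required="3.4.32"):
    """Check whether the file at path mentions GLIBCXX_<required>."""
    with open(path, "rb") as fh:
        versions = glibcxx_versions(fh.read())
    return tuple(int(part) for part in required.split(".")) in versions
```

For example, `provides("/home/USER/.conda/envs/bits-env/lib/libstdc++.so.6")` returning False would match the error above, while the system copy used at build time would be expected to return True.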

@pnunna93
Collaborator

pnunna93 commented Jun 4, 2024

@garrdbyrd, could you please build with this Dockerfile and check?
bnb_rocm_dockerfile.txt
Your environment may have multiple libstdc++.so.* files, which could be causing the issue.
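To check for multiple copies, the candidate library directories can be walked and every libstdc++.so* file listed together with the file it resolves to. A minimal sketch; `find_libstdcxx` is a hypothetical helper and the directory list is an assumption to adapt.

```python
import fnmatch
import os


def find_libstdcxx(roots):
    """Return (path, resolved_target) for every libstdc++.so* under the given roots."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(os.path.expanduser(root)):
            for name in sorted(fnmatch.filter(filenames, "libstdc++.so*")):
                path = os.path.join(dirpath, name)
                hits.append((path, os.path.realpath(path)))
    return hits


# The conda path comes from the traceback above; add system directories
# such as /usr/lib or /usr/lib64 to compare against the build-time copy.
for path, target in find_libstdcxx(["~/.conda/envs/bits-env/lib"]):
    print(path, "->", target)
```

More than one distinct resolved target means the loader may pick a different copy at runtime than the compiler linked against.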
