
Feature Request: ROCm support (AMD GPU) #107

Open
gururise opened this issue Dec 11, 2022 · 37 comments
Labels: enhancement (New feature or request) · high priority (first issues that will be worked on) · Low Risk (Risk of bugs in transformers and other libraries)

Comments

@gururise

Could you please add official AMD ROCm support to this library? An unofficial working port already exists:

https://github.com/broncotc/bitsandbytes-rocm

Thank You

@TimDettmers
Owner

Amazing! Thank you for bringing this to my attention. I will try to get in touch with the author of the ROCm library and support AMD GPUs by default.

@TimDettmers added the enhancement (New feature or request) label Feb 2, 2023
@YellowRoseCx

Amazing! Thank you for bringing this to my attention. I will try to get in touch with the author of the ROCm library and support AMD GPUs by default.

That would be AMAZING! Especially with you recently adding 8-bit support. I tried to make my own merge of the forks, but I don't really know what I'm doing and don't think I did it correctly.

@anonymous721

anonymous721 commented Feb 14, 2023

If the ROCm fork does get merged in, would the Int8 Matmul compatibility improvements also work for AMD GPUs?

@deftdawg

@TimDettmers, curious if AMD support is any nearer to being merged? @agrocylo made a PR (#296) based somewhat on @broncotc's fork...

@gururise
Author

EDIT: A slightly newer version, branched from v0.37, is available here:
https://github.com/Titaniumtown/bitsandbytes-rocm/tree/patch-2

@elukey

elukey commented Jun 22, 2023

The Wikimedia Foundation is really interested in ROCm support too, since Nvidia is not viable for us due to open-source constraints. @TimDettmers we offer any help (testing/review/etc.) to get this feature merged; it would be really great for the open-source ML ecosystem. Thanks in advance!

@Aria-K-Alethia

Aria-K-Alethia commented Jul 17, 2023

EDIT: A slightly newer version branched from v0.37 available here: https://github.com/Titaniumtown/bitsandbytes-rocm/tree/patch-2

Hi,
I'm also seeking an AMD-GPU-compatible version.
I tried your patch-2 version, but the code still doesn't work.
The error looks like:

  File "/home/.local/lib/python3.8/site-packages/bitsandbytes/autograd/__init__.py", line 1, in <module>
    from ._functions import undo_layout, get_inverse_transform_indices
  File "/home/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 9, in <module>
    import bitsandbytes.functional as F
  File "/home/.local/lib/python3.8/site-packages/bitsandbytes/functional.py", line 17, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/home/.local/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 74, in <module>
    raise RuntimeError('''
RuntimeError: 
        CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
        If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
        https://github.com/TimDettmers/bitsandbytes/issues

I'm using an AMD MI200 card.
Do you have any idea what could be wrong?
Many thanks.
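
For anyone hitting the same "CUDA Setup failed despite GPU being available" error, a quick sanity check (a minimal sketch, assuming a ROCm build of PyTorch is installed) is to confirm that torch itself sees the HIP device before importing bitsandbytes:

# Minimal sanity check before importing bitsandbytes (assumes a ROCm build of PyTorch).
# bitsandbytes raises the "CUDA Setup failed" error when it cannot load a usable
# compiled library, so ruling out a broken torch/ROCm install first narrows things down.
import torch

print(torch.__version__)          # a ROCm wheel reports something like "2.0.1+rocm5.6"
print(torch.version.hip)          # None on CUDA builds, a HIP version string on ROCm builds
print(torch.cuda.is_available())  # ROCm devices are exposed through the torch.cuda API
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

If this reports the MI200 correctly, the failure is more likely in the fork's compiled library itself.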

@PatchouliPatch

Hello, I was wondering how far off ROCm support is. I'm trying to see if my 7900 XTX will be useful in a project of mine. The Llama 2 quick start guide makes use of bitsandbytes, and as far as I know there aren't any other alternatives.

@jiagaoxiang

Found this rocm version of bitsandbytes: https://github.com/Lzy17/bitsandbytes-rocm/tree/main

@mauricioscotton

The only ROCm version that worked for me on gfx900 was this one: https://github.com/agrocylo/bitsandbytes-rocm
All the others failed to compile/install.
(ROCm 5.2)

@st1vms

st1vms commented Nov 23, 2023

For anyone who needs a patch for RDNA3 cards, I created this fork: https://github.com/st1vms/bitsandbytes-rocm-gfx1100

This fork patches the Makefile to target the gfx1100 amdgpu module with the latest ROCm and clang 17, and fixes some HIP include warnings.

It works with an RX 7900 XT and ROCm 5.7 (along with torch for ROCm 5.7) installed.

Anyway, there should be a better way of targeting the correct amdgpu module in the build system...

Edit:

It probably won't work with libraries requiring a version > 0.35.
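
Since the fork hard-codes the gfx1100 target, a quick check of which architecture your GPU actually reports (a sketch; the gcnArchName attribute is only exposed on ROCm builds of recent PyTorch versions):

# Check which AMD GPU architecture the ROCm PyTorch build reports; the fork's
# Makefile assumes gfx1100 (RX 7900 XT / XTX), so other targets would need editing.
import torch

props = torch.cuda.get_device_properties(0)
# gcnArchName is available on recent ROCm builds of PyTorch; on older builds,
# fall back to running rocminfo on the command line instead.
print(getattr(props, "gcnArchName", "gcnArchName not available in this build"))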

@Wintoplay

@st1vms There is a problem.
The fork's BNB version is 0.35.4, which is kind of outdated, and the latest version of peft requires bitsandbytes>=0.37.0.

@st1vms

st1vms commented Dec 11, 2023

@st1vms There is a problem.
The version of BNB is 0.35.4 which is kind of outdated, and the latest version of Peft requires bitsandbytes>=0.37.0

If that fork still works for you, it may be OK to just change the version number.

You can test if the library works with:

python -m bitsandbytes

If it does, try editing the version number in the fork's setup.py before building and installing it, i.e. change it to 0.37.0 and see if peft works...
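
After rebuilding with the bumped version number, a quick way to confirm that the installed package now satisfies peft's constraint (a minimal sketch; the packaging library is used here purely for the comparison and is not part of bitsandbytes):

# Verify the rebuilt fork reports a version that satisfies peft's
# bitsandbytes>=0.37.0 requirement.
import bitsandbytes
from packaging.version import Version

print(bitsandbytes.__version__)
assert Version(bitsandbytes.__version__) >= Version("0.37.0"), "peft will still reject this version"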

@Wintoplay

Wintoplay commented Dec 11, 2023

@st1vms I tried BNB 0.39.0.
The dependencies seem fine. However, when I tried to LoRA-finetune according to this notebook: https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing

The Jupyter kernel crashed, reason: undefined

[Screenshot from 2023-12-12 00-11-47]
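
For context, the failing path here is the 8-bit-quantized LoRA setup used by the linked notebook. A minimal sketch of that kind of setup (the model name and LoRA hyperparameters below are placeholders, not taken from the notebook):

# Sketch of an 8-bit LoRA fine-tuning setup similar to the linked notebook.
# load_in_8bit=True routes the forward pass through bitsandbytes' int8 kernels,
# which is the part that is failing on the ROCm fork.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",  # placeholder model, not the one from the notebook
    load_in_8bit=True,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05))
model.print_trainable_parameters()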

@st1vms

st1vms commented Dec 11, 2023

@st1vms I tried BNB 0.39.0
The dependencies seem fine. However, when I tried to lora finetune according to this https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing

The Jupyter kernel crash, reason: undefined

Screenshot from 2023-12-12 00-11-47

Well, the fork is probably already obsolete for some libraries; you should look for updated ones.

@Wintoplay

@st1vms
I retried with a new virtual env and changed from .ipynb to .py.
This is the result:

(torch3) win@win-MS-7E02:/mnt/1df6b45e-20dc-41ca-9a04-b271fd3a4940/Learn$ /usr/bin/env /home/win/torch3/bin/python /home/win/.vscode-oss/extensions/ms-python.python-2023.20.0-universal/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 60843 -- /mnt/1df6b45e-20dc-41ca-9a04-b271fd3a4940/Learn/finetune.py
/home/win/torch3/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/home/win/torch3/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/home/win/torch3/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00, 7.21s/it]
trainable params: 8388608 || all params: 6666862592 || trainable%: 0.12582542214183376
0%| | 0/200 [00:00<?, ?it/s]You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
/home/win/torch3/lib/python3.10/site-packages/torch/utils/checkpoint.py:461: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/home/win/torch3/lib/python3.10/site-packages/bitsandbytes-0.41.0-py3.10.egg/bitsandbytes/autograd/_functions.py:231: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")

=============================================
ERROR: Your GPU does not support Int8 Matmul!

python: /mnt/1df6b45e-20dc-41ca-9a04-b271fd3a4940/bitsandbytes-rocm-gfx1100/csrc/ops.cu:347: int igemmlt(cublasLtHandle_t, int, int, int, const int8_t *, const int8_t *, void *, float *, int, int, int) [FORMATB = 3, DTYPE_OUT = 32, SCALE_ROWS = 0]: Assertion `false' failed.
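
For reference, the assertion above fires whenever bitsandbytes reaches its int8 matmul kernel (igemmlt). A minimal way to hit that path in isolation, outside the full fine-tuning script (a sketch, assuming the fork installs as bitsandbytes):

# Exercise the int8 matmul path that triggers the igemmlt assertion above.
# Linear8bitLt with has_fp16_weights=False quantizes its weights to int8 on .cuda(),
# and the forward pass then goes through the same MatMul8bitLt / igemmlt code path.
import torch
import bitsandbytes as bnb

layer = bnb.nn.Linear8bitLt(64, 64, bias=False, has_fp16_weights=False).cuda()
x = torch.randn(4, 64, dtype=torch.float16, device="cuda")
print(layer(x).shape)  # would print torch.Size([4, 64]) on a supported GPU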

@gururise
Author

@st1vms I tried BNB 0.39.0
The dependencies seem fine. However, when I tried to lora finetune according to this https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing
The Jupyter kernel crash, reason: undefined
Screenshot from 2023-12-12 00-11-47

Well, the fork is probably obsolete already for some libraries, you should look for updated ones.

Can someone post any updated forks to this thread? The lack of proper BnB support is really holding back AMD cards.

@gururise
Author

Looks like things may finally move forward, with official support in the not-too-distant future! Hopefully with ROCm 6.x we can finally see support merged into this repo.

@TimDettmers
Owner

Sorry for taking so long on this. I am currently onboarding more maintainers and we should see some progress on this very soon. This is one of our high-priority issues.

@TimDettmers added the high priority (first issues that will be worked on) and Low Risk (Risk of bugs in transformers and other libraries) labels Jan 1, 2024
@SakshamG7

Would love to see ROCm support. Keep up the good work!

@PatchouliPatch

If I may ask, what's the progress so far?

@Airradda

if I may ask, what's the progress so far?

If you haven't already seen it, there was a comment made in the discussions with an accompanying tracking issue for general cross-platform support, rather than just AMD/ROCm support. To that end, it appears it is currently in the planning phase.

@amathews-amd

@TimDettmers @Titus-von-Koeller, we are at ~95% parity for bnb on https://github.com/ROCm/bitsandbytes/tree/rocm_enabled for Instinct-class GPUs, and we are working to close the gaps on Navi. At this point, we should be seriously considering upstreaming. Could you drop me an email at aswin.mathews@amd.com, and we can set up a call to discuss further.
cc: @sunway513 @Lzy17 @pnunna93

@chauhang

@amathews-amd I tried compiling the ROCm version of BnB from the rocm_enabled branch, but it is failing with errors on an AMD MI250x. Do you have any suggestions for how to resolve the issue?

@pnunna93
Contributor

@chauhang Could you try with ROCm 6.0? You can use this Docker image - rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2 - and install bitsandbytes directly.

@chauhang

@pnunna93 I am already using ROCm 6.0 -- I have added details of the PyTorch environment here.

@pnunna93
Contributor

@chauhang, you can skip the hipblaslt update and install bitsandbytes directly then. Please let me know if you face any issues.

@ehartford

I was using the arlo-phoenix fork: https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6/tree/rocm

Should I use the ROCm fork instead? https://github.com/ROCm/bitsandbytes/tree/rocm_enabled

@pnunna93
Contributor

Yes, it's updated for ROCm 6.

@matthewdouglas
Contributor

@TimDettmers @Titus-von-Koeller , we are at ~95% parity for bnb for https://github.com/ROCm/bitsandbytes/tree/rocm_enabled on Instinct class gpus, and working to close the gaps on Navi. At this point, we should be seriously considering upstreaming. Could you drop me an email at aswin.mathews@amd.com, and we can set up a call to discuss further. cc: @sunway513 @Lzy17 @pnunna93

I've often had trouble understanding the state of GPU support in ROCm. So with that said, I have some clarification questions:

  • Can we clarify on what we mean by "Instinct-class" GPUs?
    • The ROCm 6.0.2 docs suggest to me this is all CDNA, so MI100 and newer? Or is MI50 expected to work also?
  • What is the intention for Navi support?
    • Is this for RDNA2/RDNA3 only?
  • Is there intent to support with ROCm < 6?

I'd like to be able to help get this merged, but need to figure out the constraints. The only AMD GPUs that I have on hand (RX 570 and R9 270X) aren't going to cut it.

The other issue is how far behind main this is. Ideally this could be implemented as a separate backend as proposed in #898. We would want to change to use CMake for building. I also think that it'd be better to unify the C++/CUDA code with the hipify code and take care of most of the changes with conditional compilation.

@amathews-amd

Sure, here is the official list: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus. For BnB, since we are at the initial enablement stage, support depends on where we are testing (both hardware and software versions). We are currently focusing on MI250/MI300/gfx1100 and newer ROCm versions for testing.
We are assessing #898 as well, to see how we can adapt the rocm_enabled branch so it fits the new design.

@Titus-von-Koeller
Collaborator

@TimDettmers @Titus-von-Koeller , we are at ~95% parity for bnb for https://github.com/ROCm/bitsandbytes/tree/rocm_enabled on Instinct class gpus, and working to close the gaps on Navi. At this point, we should be seriously considering upstreaming. Could you drop me an email at aswin.mathews@amd.com, and we can set up a call to discuss further. cc: @sunway513 @Lzy17 @pnunna93

@amathews-amd I sent you an invite to our bnb-crossplatform Slack, to the email you provided. Of course we should invite your other collaborators as well. Can we talk there and coordinate on scheduling a kickoff call?

@Titus-von-Koeller
Collaborator

@amathews-amd the changes introduced through #898 are not final and weren't merged onto main, but instead onto multi-backend-refactor, in order to keep main releasable and allow us to iteratively arrive at a solution that works for all parties involved.

This means that there's ongoing work where a series of PRs onto multi-backend-refactor will concretize things further in tight collaboration with the community. Feel free to pitch in with opinions and concrete work, in case there's something that catches your eye and fits your expertise.

@PatchouliPatch

Is there a place where we can track progress on the implementation of this?

@PatchouliPatch

By the way, does anyone know where I can submit bug reports for https://github.com/ROCm/bitsandbytes/tree/rocm_enabled? Going to the page, there's no Issues tab.

@Titus-von-Koeller
Collaborator

Titus-von-Koeller commented May 13, 2024

by the way, does anyone know where I can submit bug reports for https://github.com/ROCm/bitsandbytes/tree/rocm_enabled? going to the page, there's no Issues tab.

Maybe @pnunna93 or @amathews-amd from AMD can help with that? I'm sure they'd appreciate your report.

is there a place where we can track the progress on the implementation of this?

Right now, the best place is to look at open and recently merged PRs on the multi-backend-refactor branch.

We should make significant progress in the next few weeks and make an alpha/beta release built off of that branch available as a nightly package relatively soon.

(@PatchouliPatch)

@pnunna93
Contributor

by the way, does anyone know where I can submit bug reports for https://github.com/ROCm/bitsandbytes/tree/rocm_enabled? going to the page, there's no Issues tab.

We created an Issues tab - https://github.com/ROCm/bitsandbytes/issues - please feel free to open bug reports there.
