Which devices are even supported? (HIP/ROCm) #1714
Comments
@samuelpmish We are sorry you were unable to find the information you need on the documentation portal. Please refer to the ROCm Installation Guide and the latest version of the ROCm Release Notes (v5.0), and let us know if they were helpful. If there's specific information you need, please let me know, and I am happy to help. AMD ROCm Documentation Team |
this does not contain any information about which devices support ROCm or HIP.
https://docs.amd.com/bundle/ROCm_Release_Notes_v5.0/page/About_This_Document.html Thank you, this document does indicate that there are seven GPUs that support ROCm: Instinct (MI50, MI60, MI100, MI200) and Pro (VII, W6800, V620). Does this imply that all other AMD GPUs do not support ROCm? All of the products indicated above have multi-thousand-dollar price tags and/or are not even being manufactured.
The original question was specific: which AMD GPUs support ROCm and/or HIP? |
I tried an AMD Vega 64 and it works, so at least there is that. I do want to figure out: if Navi 21 is supported, what prevents Navi 22 from being supported? Could something like the 6700 XT be supported, even unofficially? |
The list of supported GPUs is also found here in the prerequisite actions document. Even there, though, it does not specify whether other GPUs based on the same architecture are supported. |
It does not even list all supported GPUs. I have a Vega 64 and I can confirm it works. |
It works on my RX 6800 XT now. AMD should really add an "unsupported but works" category to their list of supported devices. |
Yeah, I think so too. The point of documentation is to make things clear; it seems AMD is trying hard to do the exact opposite. For some reason, the company really does not seem to want "casual" Radeon users to know that their cards can work. Anyway, does that mean a 6800 non-XT should work too? I am thinking of getting one. |
Here is my understanding. ROCm is a software suite with compilers, runtime libraries, accelerated numerical libraries, AI-related libraries, and more. "Support" simply means the given hardware is validated at AMD against the whole ROCm stack. Technically, the compiler likely works for all the GPUs listed at https://llvm.org/docs/AMDGPUUsage.html, but that means compiling/linking, not necessarily running the code. Users may only need a subset of the stack for their purposes; that is why some ROCm-"unsupported" hardware works in limited scopes. Since the scope is on a per-user basis, it is not meaningful to list "unsupported but works".
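To illustrate the compile-vs-run distinction, here is a minimal sketch of my own (gfx1031/Navi 22 is just an assumed example target); it only shows that hipcc will accept an ISA, not that the rest of the stack runs on it:

```python
# Sketch: "compiles" does not imply "runs". hipcc will generate code for many
# of the ISAs listed in the LLVM AMDGPUUsage docs, regardless of official
# ROCm support. Assumes hipcc is installed and on PATH.
import os
import subprocess
import tempfile

SRC = r"""
#include <hip/hip_runtime.h>
__global__ void noop() {}
int main() { return 0; }
"""

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "noop.hip")
    with open(src, "w") as f:
        f.write(SRC)
    # Code generation for an officially "unsupported" ISA usually succeeds...
    subprocess.run(
        ["hipcc", "--offload-arch=gfx1031", src, "-o", os.path.join(d, "noop")],
        check=True,
    )
    print("compiled OK")
    # ...but whether the binary actually runs depends on the rest of the
    # stack (runtime, rocBLAS, MIOpen, ...) having been built for that ISA.
```
|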
Thanks to @mark-decker and @ye-luo for linking some relevant documentation to shed light on this issue. I still wish someone official would weigh in, rather than having us speculate about the reality of what works and what doesn't. I agree that "unsupported but works" is sort of a meaningless idea; perhaps "untested" would be more accurate. If it is the case that some of the libraries in the stack do not support certain cards, then AMD should at least communicate that, rather than being ambiguous about it. E.g. (NOTE: this table is for illustration only; it does not reflect what actually works and what doesn't)
✔️: confirmed to work. Something like the above needs to be front and center in the documentation, if it is the case that library support is so limited. |
What is more interesting to me is why gfx1030 works but gfx1031 does not. That was not the case with Polaris, nor with Vega: the cut-down versions work just fine. It seems to me that AMD is trying hard to limit ROCm to high-end/professional-grade products. Meanwhile, Nvidia has a 3060 with 12 GB of VRAM, bringing ML to everyone. It is a shame, really. |
I think there is a distinction to be made between "working" and "supported". That is, a GPU might seemingly work, but have subtle bugs (e.g. correctness). AMD might choose not to be bothered with bug reports about older cards in this state (e.g. gfx803). I would suggest considering these cards working with known issues, yet unsupported. On the other hand, as a prospective buyer I want to know which cards AMD commits some amount of attention to. For example, the W6800 is currently supported, so if one buys that card today, one should reasonably expect any reported issues with it to be honored on this issue tracker within its useful lifetime. This consideration necessitates a fourth category:
|
Adding to the list of unhelpful information, there is also this two-year-old - ehm - gem of an outdated document to add confusion: https://github.com/ROCm/ROCm.github.io/blob/master/hardware.md |
It's unfortunate, but official replies can be hard to come by at times, especially regarding support for hardware. A small subset of issues that received either vague or no official answer: #1706 #1694 #1683 #1676 #1617 #1623 #1631 #1595 #1592 #1544 #1547 #1539. When timelines have been given, they've been missed every time that I'm aware of. RDNA1 is nearly 3 years in market (launched July 7, 2019), but the workstation card still has no support in the stack. With the Frontier supercomputer now behind schedule on its software stack, I'm expecting engineering resources that would have been allocated to RDNA1-2 support to be redirected towards improving CDNA2. See https://insidehpc.com/2022/03/oak-ridge-frontier-exascale-to-deliver-full-user-operations-on-jan-1-2023-crusher-test-system-now-running-code/ for more information on the Frontier delay. |
My graphics card is a 6800 XT. I tried installing ROCm 5.1 and PyTorch; torch.cuda.is_available() returns true, but an error about HIP is reported when running. However, there is no problem when I run the training inside the packaged Docker image. I don't know how to solve the problem or how to configure PyTorch in my local environment. |
What is the error? Nothing was shown to you? |
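For anyone hitting this: a minimal sanity check along these lines (assuming a ROCm build of PyTorch) usually surfaces the underlying HIP error, since torch.cuda.is_available() alone only proves the runtime loaded:

```python
# Minimal ROCm/PyTorch sanity check. On ROCm builds, torch.cuda.* maps to
# HIP, and torch.version.hip is a version string (it is None on CUDA builds).
import torch

print(torch.__version__, torch.version.hip)
print(torch.cuda.is_available())          # True only means the runtime loaded
print(torch.cuda.get_device_name(0))

# Actually launching a kernel is what usually raises the real HIP error:
x = torch.ones(1024, device="cuda")
print((x + x).sum().item())               # expect 2048.0
```
|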
Actually, the HIP/clang compiler supports many GPUs. When ROCm 4.3 was released, I added gfx1031 support. With help from ROCm developers, Navi 22-enabled rocBLAS is distributed on Gentoo, and I expect gfx1031 in other packages can be enabled more easily. |
Well, yes, but the problem is that the amount of tinkering required to make, say, a 6700 XT work may be a lot. Suppose I am a casual student who games on Windows but wants to dabble in ML. Not only do I have to install a completely new OS, I need to figure out the many tricks of Ubuntu/Linux to install the 6700 XT and make it run PyTorch... Or I can just get an Nvidia card, and it "just works" on Windows. If you think about it, ROCm's user-friendliness is like ten steps behind Nvidia's. Is there any "it just works" guide for installing ROCm to run TF/PyTorch on a 6700 XT? If not, that is a huge problem. |
Yeah, ROCm absolutely needs a proper support matrix and a strong public commitment from AMD to get as many GPUs supported as possible, as quickly as possible. According to two AMD engineers, ROCm actually supports pretty much every GPU since Polaris to varying degrees: rocm-opencl, for example, should work on everything since Vega, while HIP should work on every GPU since Polaris (but has apparently seen very little testing on older chips). It's also a chicken-and-egg problem: there's really not much software to test with in the first place, and the limited official support makes ROCm unattractive to developers. Looking at the seven officially supported cards would do little to convince most devs to target ROCm. |
Well, if someone has to take a bet, it has to be AMD. You can't win a war if you do not burn some money. As a programmer myself, I would say AMD is hesitant to burn more R&D budget on ROCm than they already have, thus creating this unfinished product called ROCm that works with every card, except 50% of the cards, and every time, except 50% of the time. "Go big or go home" does apply here, and I believe Intel is very much willing to chew away this market from Nvidia too. My opinion means shit, of course, but maybe expand the ROCm budget, both technical and marketing. Hire more programmers, sure, but also give out free/discounted AMD GPUs to academic institutions, and create competitions like "BETA ML with AMD" or something, to both hunt bugs and make progress with ROCm. More people in, more data for the devs to work with, a more polished product, and so on... And please make ROCm work on Windows. Treat ROCm as a product, not a tool. Well, just my two cents of BSing. I do want to support AMD/ROCm, but I would love not to pay scalper money for a lackluster ML GPU that is not even "officially" supported on paper. |
After exploring for a few days, I think I know the reason. According to the official website documentation, I need to download the source code of torch and compile a version suitable for my hardware in my local environment. I failed at this step because I am a Linux novice, but it doesn't matter; it's more convenient to use Docker images, and local deployment was just my obsessive-compulsiveness. Finally, thank you. |
Couldn't agree more. Also: clear categories for HPC, workstation/prosumer and consumer hardware. |
The box of the RX 6800 XT literally advertises something that's not officially supported. Why is there no word on whether it's officially supported? |
Navi1x GPU support will not be available in ROCm. My apologies for the delays in confirming this. AMD GPU support is based on ISA architectures. We officially support two Navi21 GPUs that use the gfx1030 architecture: the Radeon Pro V620 and Radeon Pro W6800. However, if you look at https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/be030feb91fff8d6d2b4409153fe549b81237580/CMakeLists.txt#L113-L118, our code only incorporates GPU support based on the ISA architecture; the model name only impacts official support. As a result, you can be confident that the Radeon RX 6800, Radeon RX 6800 XT, and Radeon RX 6900 XT run on a stack that has undergone full QA verification of the ISA code generated for this GPU architecture. Of course, at the moment no official support is promised for the consumer GPUs, and performance optimizations for the supported GPUs may not carry over to the unsupported gfx1030 GPUs due to minor hardware differences. Going forward, the lack of clarity on GPU support will be addressed. Please be patient and continue to report issues.
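Since support is keyed to the ISA name rather than the marketing name, a quick way to see what your card reports is to parse rocminfo; a minimal sketch, assuming the ROCm runtime tools are installed:

```python
# Print the gfx ISA targets the ROCm runtime reports, since support is keyed
# to the ISA (e.g. gfx1030) rather than the model name (e.g. RX 6800 XT).
# Assumes rocminfo (shipped with ROCm) is installed.
import re
import subprocess

out = subprocess.run(
    ["rocminfo"], capture_output=True, text=True, check=True
).stdout
print(sorted(set(re.findall(r"gfx\d+\w*", out))))
# e.g. ['gfx1030'] on an RX 6800 / 6800 XT / 6900 XT or a Pro W6800
```
|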
There was a discussion about better documenting the GPU support status: ROCm#1714. This pull request makes an attempt at documenting the latest official statement on the matter by @saadrahim: ROCm#1714 (comment)
@saadrahim thanks for clarifying the matter. I created a pull request documenting the current state of unofficial support in the README. Would you please extend your statement to the recently released "50" variants of the cards? The AMD Radeon 6950 XT also uses the gfx1030 ISA and should therefore also be unofficially supported, right? |
I successfully use HIP and rocm-opencl on a 5700 XT, so RDNA1 evidently works, even if it's not officially supported. AMD's own recently released HIP-RT officially supports Vega1, Vega2, RDNA1 and RDNA2, and runs on ROCm, which officially supports only one of those GPU generations. There appears to be a lot of confusion on AMD's side about what "supported" means and what ROCm even is in the first place. |
It's somewhat off-topic, but folks may also be interested in Debian's Supported GPU List for their ROCm packages. |
Briefly jumping in: a few factors will dictate whether you can run a model:
Assuming you run a 3B model at int8 quantization, that's 3 GB of model data in VRAM. Add some margin for pointers, math, and so on (context/tokens for a 1k length can be ~500 MB) and you're at 3.5 GB. Don't forget you also have a desktop environment to run. In essence, a 3B model can barely fit on a 4 GB card, but it will fit (depending on your setup). Navi10 GPUs (5600 XT, 5700, 5700 XT) all ship with 6 to 8 GB of VRAM.
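That back-of-the-envelope arithmetic as a tiny helper; the bytes-per-parameter and ~0.5 GB context figures are the rough assumptions from this comment, not measured values:

```python
def vram_estimate_gb(params_billion: float,
                     bytes_per_param: float = 1.0,
                     context_overhead_gb: float = 0.5) -> float:
    """Rough VRAM needed for model weights plus working state.

    int8 quantization -> 1 byte/param; fp16 -> 2; fp32 -> 4.
    context_overhead_gb is the ~0.5 GB for a 1k context cited above.
    """
    return params_billion * bytes_per_param + context_overhead_gb

print(vram_estimate_gb(3))        # int8: ~3.5 GB -- tight on a 4 GB card
print(vram_estimate_gb(3, 2.0))   # fp16: ~6.5 GB -- fits an 8 GB Navi10
```
|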
What architecture does RX 5700 XT use? |
The RX 5700 XT is gfx1010. |
Thanks for the info! |
I don't understand why Debian has to list the AMD GPUs supported by ROCm, and not AMD officially. |
Because when a HW vendor says something is supported, they can be taken to task for it when it breaks. When open source gets something sort of working/running, there's a broader understanding of what that does and doesn't mean. It's the same reason you'll always see the WS/enterprise cards supported "first" by a vendor: the support surface area for specific applications being problematic is much smaller. |
Sure, you mean like Intel or AMD being taken to task for all their security-related hardware bugs, which they mostly fixed with new microcode that made their CPUs run slower. Significantly slower. And in some cases, Intel just cancelled features in their CPUs that caused problems. Spectre, Meltdown, Zenbleed, SQUIP, AEPIC, Downfall: the list is very long, on average it gets longer by ~2 entries per year, and the newest ones are really irritating, as they relate to CPUs that might be old, but laptops with 10th & 11th generation Intel CPUs are still sold today.
Sure, HW vendors that don't deliver on their documented promises get punished. ROTFL. I mean, it took nearly a decade for German courts to come to the conclusion that a car with patched engine firmware that does not deliver the promised performance is less valuable than the car offered and sold. And IT HW vendors have yet to be sued for their "security fixes" that kind of break the product; only a cynic would think that making old CPUs run either unsafely or slowly was a hint to upgrade to new hardware.
|
So, I got a used RX 5700 XT from ebay and want to get things running. |
You mean huggingface/transformers? It seems to depend on PyTorch, Tensorflow or Jax. The RX 5700 XT is gfx1010. It is not officially supported by ROCm. To my knowledge, only Debian is building the ROCm math libraries for that architecture. However, Debian has not yet packaged miopen or pytorch-rocm. You can use the Debian packages for most of the ROCm libraries, but would need to extend MIOpen and PyTorch with support for gfx1010, then build them from source. |
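For the gfx1010 route, a rough sketch of what "build them from source" might look like for PyTorch; PYTORCH_ROCM_ARCH and the hipify step come from the upstream PyTorch ROCm build process, but treat the details as assumptions rather than a recipe:

```python
# Rough sketch of a from-source PyTorch build targeting gfx1010 (RX 5700 XT).
# Assumes a working ROCm install and a checkout of the PyTorch source tree;
# run from the repository root. Details may vary between releases.
import os
import subprocess

env = dict(os.environ, PYTORCH_ROCM_ARCH="gfx1010", USE_ROCM="1")

# "Hipify" the CUDA sources in-tree, then build/install as usual.
subprocess.run(["python", "tools/amd_build/build_amd.py"], env=env, check=True)
subprocess.run(["python", "setup.py", "develop"], env=env, check=True)
```
|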
Yes, the huggingface transformers library, exactly. And another question:
I don't know.
It is Navi 23 and is therefore gfx1032.
Neither is officially supported, but the gfx1032 ISA is identical to the gfx1030 ISA, so it can probably be made to work by setting the environment variable. As one of the members of the Debian AI team working on packaging this stuff, I think you can expect improvements for all RDNA cards over the next year, as we've nearly finished packaging the ROCm math libraries and are moving on to packaging the AI libraries. For the most part, it has not been very difficult to extend basic functionality to all discrete AMD GPUs as we've prepared the packages.
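A sketch of that override as it is commonly used (assuming HSA_OVERRIDE_GFX_VERSION is the variable in question and a ROCm build of PyTorch); it must be set before the HIP/HSA runtime initializes:

```python
# Ask the runtime to treat a gfx1032 card as gfx1030 (the ISAs are identical
# per the comment above). Assumes HSA_OVERRIDE_GFX_VERSION is the override in
# question; it must be set before the first torch.cuda call, or set it in
# the shell instead:
#   HSA_OVERRIDE_GFX_VERSION=10.3.0 python train.py
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

import torch
print(torch.cuda.get_device_name(0))
print(torch.ones(8, device="cuda").sum().item())  # expect 8.0
```
|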
For AI stuff and some gaming, would the RTX 3060 be the best price-to-performance option? |
For AI, yes, for gaming, no. But we are getting off topic here. |
For RDNA3, optimization is still ongoing; e.g., ROCm/rocBLAS@247d4a9 is still in the develop branch and has not entered any release yet. Without that optimization you will get poor FP32 performance (ROCm/Tensile#1715). However, its FP16 and mixed FP32+=FP16*FP16 performance already looks good. So wait for the RDNA3 optimizations. |
Now, with the release of ROCm 5.7.1, does only the RX 7900 XTX work, or do GPUs like the RX 7600 work as well? |
I am very confident that AMD did not remove any previously supported GPUs, nor the ones we managed to get working. It would be extremely counterproductive to wipe all the previous support and start from zero. |
I think this thread is a good place to ask. I'm a daily Linux user and an ML student, and I'd like to buy myself an RX 6700 XT for Christmas. Has anyone made it work with PyTorch on Linux with ROCm 5.6/5.7? Is the performance of this GPU better than an RTX 3060, or does the lack of official support for Linux slow it down in any way? I'm fine with tinkering, just curious whether it's even possible before buying. |
On Linux, above the ROCR-runtime/HSA level, you can get nearly the same level of support as the Pro W6800 (gfx1030), which is on the official support list, via the environment-variable override shown above. References: |
I have two 6750 XT 12 GB cards, and they work pretty well. If you're having trouble and have picked up a card: I have a list of maybe 25 commands that goes from a minimal RHEL install to running ML. PyTorch/diffusers/transformers and such do work as well, nearly out of the box; just a little memory management needs to be done. |
Is Debian going to be an officially supported hip/ROCm distro? |
Has AMD made any firm support-date commitments for officially supported cards? I mean, Nvidia has demonstrated continued CUDA support for many cards that are nearly 10 years old, so their actions are proof enough. I would like to avoid the proprietary CUDA garden and maybe program with OpenSYCL. But why should I gamble thousands of USD assembling a new machine when HIP/ROCm/Pro-driver support can be haphazardly pulled out from under me next month? And AMD has a bad habit of eliminating access to older versions of software/firmware that could be used on older systems. (My current system doesn't support PCIe atomics, so a new AMD card would mean a fresh build.) On the CPU side AMD has shown excellent long-term support, but my experience on the GPGPU side has burned me twice due to poor/misleading marketing of features and compatibility. |
Same issue for me. I searched the support list for Blender and my RX 480, but the documentation is very ugly. |
The list for ROCm 6.0.2 can be found at https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.0.2/reference/system-requirements.html#supported-gpus |
I'm a long-time CUDA developer looking to explore ROCm and HIP development, but finding out which hardware even supports these tools is harder than it needs to be.
Let's see... this repo's readme has a section on "Supported GPUs":
Okay, "extends" implies it supports other GPUs too-- which ones? Maybe the FAQ has more info:
Nope, it'll tell me all of the NVIDIA cards that work, but none of the AMD ones apparently. Okay, I guess I'll look at their HIP Programming Guide pdf. Skimming the table of contents, no indication of "supported GPUs"-- it's a 100 page document, surely they don't expect a user to read all of that to just see if a card works or not? Let's try searching instead:
CTRL+F "supported GPU": zero results
CTRL+F "supported platform": zero results
CTRL+F "supported device": zero results
okay..
CTRL+F "supported": 87 results, great. Going through them one by one, I guess. First 76 results unrelated, 77 is the closest thing I can find:
This sounds sort of related to what I'm looking for, although it's deprecated, so the options for gpu_arch are probably out of date. I would like to know what HIP currently supports, so let's look at the documentation for the --offload-arch=<target> option:
Okay, the documentation doesn't actually explain anything at all, it just links to something. I might have wasted a lot of my time getting here, but finally, a link with an answer to my simple question:
https://clang.llvm.org/docs/ClangOffloadBundlerFileFormat.html#target-id
Ah, of course-- the link is also broken. Maybe try:
https://clang.llvm.org/docs/ClangOffloadBundlerFileFormat.html
No, also broken.
Forgive the sarcastic tone of this issue, but am I an idiot or is this documentation just abysmal?
If I want to know which NVIDIA GPUs support CUDA, and which features, all of that information is readily available in many places, e.g.
https://developer.nvidia.com/cuda-gpus
I've been looking for an hour and found nothing official about the AMD support for HIP, so I quit. Hopefully creating a github issue will lead to an answer to this trivial question.