Doesn't ROCm support AMD's integrated GPU (APU)? #2216
Run it like the following:
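(The code block in this comment did not survive extraction; judging from the command quoted later in the thread, it was presumably the gfx9 override, roughly as sketched below. The script name app.py is just a placeholder.)

```bash
# Presumed form of the stripped command: force the HSA runtime to treat
# the Vega-based APU (gfx902) as gfx900, for which official binaries exist.
# "app.py" stands in for whatever PyTorch script you are running.
HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 app.py
```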
Thank you @xfyucg, but isn't that the same code I posted above? It returned:
Somebody mentioned that I missed the environment variable, and that this script should run as:
Which is surprising, because if no ROCm devices were found, then how is everything fine? And during installation, the commands:
Distribution:

P.S.: I really hope AMD can do something about making this simple to install and use. It's such a good chance to compete, now that AI is getting really big.
Good news, guys. I found the solution. The problem was a version mismatch, and I don't think I had run this command. You have to run this after installing:

Check the ROCm version with:

Now select the nightly version of PyTorch (or whichever version matches your ROCm version) and install PyTorch. Now if you run this script, it'll show:
and you can even run this script without the environment variable.
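(The exact commands in this comment were stripped during extraction. A hedged reconstruction of the version-matching procedure it describes might look like the following; the /opt/rocm path, the ROCm 5.7 release number, and the nightly index URL are assumptions to adjust for your system.)

```bash
# Check the installed ROCm version (path assumed; present on typical
# /opt/rocm installations):
cat /opt/rocm/.info/version
# Or ask the package manager on Debian/Ubuntu-style systems:
dpkg -l | grep -i rocm

# Install a PyTorch build whose ROCm version matches, e.g. a nightly
# wheel built against ROCm 5.7 (adjust the index URL to your version):
pip3 install --pre torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/nightly/rocm5.7

# Re-run the availability check:
python3 -c "import torch; print(torch.cuda.is_available())"
```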
Note to the Radeon Open Compute developers: When you close this issue, I'll know that you've seen it. Please close it. However, I hope you realize two things:
Thanks for trying to help. I hope y'all automate the install process.
One more thing: just as NVIDIA created the CUDA toolkit, it would help if AMD also created a module or interface that lets any application seamlessly use the GPU, whether the GPU is on a graphics card or on the processor. It should also work irrespective of the version.
This issue and this one speak volumes about how much more needs to be done to improve ROCm support. People are feeling cheated. This is the right time to ask your managers to provide you more time and resources to build a good architecture for ROCm, to support machine learning. Please do so.
HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 app.py
Thank you xfyucg. It works. Verified with:
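(The verification snippet was stripped here; it was presumably something along these lines, a sketch rather than the poster's exact command.)

```bash
# With the gfx9 override in place, confirm PyTorch can see the APU and
# report its name:
HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 -c \
  "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```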
Thank you for your feedback and persistence in getting this resolved. Good job! We will close this issue.
On gfx902, using amdgpu-install-6.0.60002-1, I am getting a HIP error when referencing a device, even though I am using HSA_OVERRIDE_GFX_VERSION=9.0.0 (see example below). @nav9, were you only successful with amdgpu-install-5.5, or should a higher version work as well? Thanks.

Failure:
@mzimmerm: I just did a fresh install using:
@nav9: Many thanks for the quick confirmation. I will uninstall and reinstall everything from 6.0.2, following exactly the steps you described, and will report here. For full disclosure (I did not mention this earlier), I am running openSUSE Leap 15.5.
@nav9: Unfortunately, my reinstall attempt led to the same result.
Same HIP error when running PyTorch with 'device=' set:

HSA_OVERRIDE_GFX_VERSION=9.0.0 python
Can an AMD team member help to narrow it down further, please? Thanks.

Rant: I realize gfx902 is unsupported, but AMD only supports something like six cards. Not supporting APUs is detrimental for people trying to get started with machine learning, tuning mini models, etc.
@nav9, I apologize, feel free to ignore, but in case you have the time and opportunity, would you mind running the following on your gfx902 build to see if there are no errors:

HSA_OVERRIDE_GFX_VERSION=9.0.0 python -c "import torch; cuda0 = torch.device('cuda:0'); print(torch.ones([2, 4], dtype=torch.float64, device=cuda0)); print('done')"

I am getting consistent errors on ROCm 6.0.2 on the 3 OSes I tried (Leap 15.5, Tumbleweed, Ubuntu 22.04), both with AMD's builds and PyTorch's own builds of PyTorch. I am trying to build PyTorch from source now, but there are issues there as well. Thanks.
@mzimmerm: I need to apologise here. I had only tried the simple script and the ROCm checks. When I ran your script and this script, it just hung. I don't know what could solve this. The AMD team needs to look into it.
@nav9: Many thanks for the follow-up. Yeah, I also experienced hanging in some builds; I tried a massive number of combinations, with no luck. Well, at least I know I am not going crazy, thanks to your response. I found several claims of people getting gfx902 working, but I doubt them now. Unfortunately, I also doubt AMD will help. For any ML stuff, they are driving people to NVIDIA in herds... Why would I buy an AMD card, or even an APU, when chances are high that it is dead metal for any ML/AI work? Thanks again for your help here.
@nav9 I can confirm that it also does not work for me. I also attach the rocminfo output (everything looks good there): rocminfo.txt
Thanks serhii-nakon. I wonder if this issue needs to be re-opened or a fresh issue needs to be created. I really hope the AMD team considers it prudent to build the functionality needed to allow seamless GPU use.
@nav9 @serhii-nakon: For one, I would like this re-opened as a new issue. It is such a waste for AMD APUs (and many other inexpensive AMD cards) to be useless for ML/AI training and inference. I started (and now officially concluded as failed) my quest to run on APUs to answer a few questions: 1) Can APUs be useful for running inference on tiny LLM models? 2) Can APUs be useful for training some tiny LLM models? If one of you opens it, please Cc me there; I'd provide supporting info on the things I tried and failed in running PyTorch on ROCm gfx902 (which would cover a good portion of APUs).
Among the open issues, there are already some that speak of ROCm 6 not working. |
How much space did you need to install PyTorch with ROCm? Every time I try to install it, it shows an error message: not enough space.
@berrieshawn: You could have just Googled it to find out. |
I have more than 3 GB free on my device (around 37 GB free), but the error still persists; and all the Stack Overflow thread states is that you need at least 3 GB of space, which is not the issue at all.
Does AMD really believe that if someone runs into this, they'll add an AMD card to fix the issue after spending hours trying to make their existing (i)GPU work? It's just not plausible. I wanted to play and learn the hard way, and luckily 5.4 does the job for now. But it's clear that getting this to work took as much time as it would have taken to just do my normal work and buy a T4 or similar.
@FlorianHeigl I think AMD will support the upcoming AMD AI 300 series, because it has a really powerful iGPU and VRAM bus bandwidth. A regular 7800G is not even enough to run some Llama 7B models. Also, I can confirm that ROCm works with the 7800X iGPU (RDNA2). It looks like the problem is with Vega-based iGPUs. I think ROCm won't support hardware older than RDNA2, due to low performance and many issues compared to newer hardware. Technically, you can use the ROCm packages from the Debian team, who built ROCm for older hardware with some patches and workarounds.
It's because you don't have enough space in /tmp. Try doing this before running pip:
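(The suggested command was stripped from the comment above. The usual workaround it alludes to is pointing pip's temporary directory at a partition with enough free space; the directory below is just an example.)

```bash
# pip unpacks the large ROCm wheels under $TMPDIR (default /tmp), which is
# often a small tmpfs; point it at a roomier location before installing.
mkdir -p ~/tmp
export TMPDIR=~/tmp
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
```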
I have an AMD Ryzen 5 5600G processor, which has an integrated GPU, and I do not have a separate graphics card. I am using Linux Mint 21 Cinnamon. I installed PyTorch with this command:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

and, according to posts on a forum, running the following command is supposed to tell me if PyTorch can use ROCm to perform CUDA-equivalent processing. The output is False.

PyTorch redirects people to this repository's readme page to check for compatible GPU information, but I didn't see any there, so I'm noting this for the sake of anyone searching for this info.
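(The check referred to above was stripped from this copy of the issue; it was presumably the standard PyTorch availability probe, roughly as follows.)

```bash
# torch.cuda.* maps to HIP on ROCm builds of PyTorch; on this APU setup
# the availability check printed False.
python3 -c "import torch; print(torch.version.hip); print(torch.cuda.is_available())"
```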
Update: Tried this script too, but the output is:

Tried installing ROCm via the instructions on this page (tried with the deb file for bionic and focal). On running sudo rocminfo, I get:

On running rocminfo:

This script now outputs:
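(The rocminfo outputs above were stripped from this copy. For readers trying to reproduce this, a quick way to see which GFX target the APU reports, which is what HSA_OVERRIDE_GFX_VERSION overrides, is to filter the agent names.)

```bash
# Print the agent marketing names and gfx targets from rocminfo; a
# Vega-based APU typically shows up as gfx902 or gfx90c.
rocminfo | grep -E "Marketing Name|gfx"
```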