
Cursed rx 6800m installation/ better support for gfx1031 #1726

Closed
LuisB79 opened this issue Apr 14, 2022 · 16 comments

Comments

@LuisB79

LuisB79 commented Apr 14, 2022

I wasted 4 days trying to install ROCm on Ubuntu 20.04, to no avail: either amdgpu-dkms refused to install, the 5.13 Ubuntu kernel was too old, or there were no candidates for 5.17. The only two times I got it "working" (after restarting my laptop it just blanked out), running hipInfo returned "hipErrorInvalidDevice(101) at hipInfo.cpp:205", failing the test. I tried following the 5.1.1 guide, tried other methods, and even tried different kernels; amdgpu-dkms would always fail.
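Before repeatedly reinstalling the userspace, it can help to confirm the kernel side is present at all. A minimal triage sketch (mine, not from this thread; the helper name is hypothetical) that checks whether the amdgpu module is loaded, which is what amdgpu-dkms is supposed to provide:

```python
# Hypothetical triage helper: check whether the amdgpu kernel module is loaded
# before blaming the ROCm userspace. If this returns False, the DKMS build for
# the running kernel most likely failed.
from pathlib import Path

def amdgpu_loaded(modules_file: Path = Path("/proc/modules")) -> bool:
    """Return True if 'amdgpu' appears in the loaded-modules list."""
    try:
        text = modules_file.read_text()
    except OSError:
        return False
    # Each /proc/modules line starts with the module name.
    return any(line.split(" ", 1)[0] == "amdgpu" for line in text.splitlines())
```

If it returns False, `dkms status` is the next place to look for a failed build against the running kernel.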

@ffleader1

ffleader1 commented Apr 14, 2022

Your GPU is officially not supported. Given that Navi 21 only became supported about 2 months ago, I don't think Navi 22 support will be available anytime soon, possibly never.

@LuisB79
Author

LuisB79 commented Apr 15, 2022

> Your GPU is officially not supported. Given that Navi 21 only became supported about 2 months ago, I don't think Navi 22 support will be available anytime soon, possibly never.

Why is that? It's an RDNA2 GPU; that is ridiculous.

@ffleader1

ffleader1 commented Apr 15, 2022

> Your GPU is officially not supported. Given that Navi 21 only became supported about 2 months ago, I don't think Navi 22 support will be available anytime soon, possibly never.

> Why is that? It's an RDNA2 GPU; that is ridiculous.

To be honest, I do not know lol. It's just one of those AMD-and-their-infinite-wisdom things. Though I do think someone had success compiling from source with the 6700 XT (also Navi 22, like your GPU) after replacing all occurrences of gfx1030 in the source code with gfx1031. Try #1668
Hey, if you get it up and running, please tell me. I will head out and buy a 6700 XT too.
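The workaround described above is essentially a bulk find-and-replace over the ROCm sources before rebuilding. A rough sketch of that edit as a script (the `retarget` helper and its defaults are mine, not part of any ROCm tooling):

```python
# Hypothetical sketch of the "replace gfx1030 with gfx1031" source edit people
# reported doing before compiling ROCm from source. Binary and unreadable
# files are skipped.
from pathlib import Path

def retarget(tree: Path, old: str = "gfx1030", new: str = "gfx1031") -> int:
    """Rewrite every occurrence of `old` to `new` under `tree`; return files changed."""
    changed = 0
    for path in sorted(tree.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed += 1
    return changed
```

Running it over a ROCm source checkout before the build is the moral equivalent of the `sed`-over-the-tree edit described in #1668.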

@LuisB79
Author

LuisB79 commented Apr 15, 2022

@ROCmSupport could you share insight into why there is little to no support for consumer cards, even though it's the same architecture?

@ffleader1

> @ROCmSupport could you share insight into why there is little to no support for consumer cards, even though it's the same architecture?

Well, it does work with consumer cards; it just has to be Navi 21: RX 6800 and above.
The reason, I guess, is that AMD does not have the intent or resources to do the testing for Navi 22 and lower, so they just drop them altogether. Seeing that compiling from source after replacing gfx1030 with gfx1031 works, this is the most likely reason IMO.
It sure does suck.
I really wish AMD spent more of their budget on ROCm. Their strategy of approaching ML is basically handing the consumer market to Nvidia at this point lol.

@LuisB79
Author

LuisB79 commented Apr 15, 2022

> @ROCmSupport could you share insight into why there is little to no support for consumer cards, even though it's the same architecture?

> Well, it does work with consumer cards; it just has to be Navi 21: RX 6800 and above. The reason, I guess, is that AMD does not have the intent or resources to do the testing for Navi 22 and lower, so they just drop them altogether. Seeing that compiling from source after replacing gfx1030 with gfx1031 works, this is the most likely reason IMO. It sure does suck. I really wish AMD spent more of their budget on ROCm. Their strategy of approaching ML is basically handing the consumer market to Nvidia at this point lol.

@ROCmSupport is that true?

@keryell
Contributor

keryell commented Apr 16, 2022

@ffleader1 I am new (again) at AMD and, while I am not working on this project, I am trying to understand the full story and see how the current situation can be improved.

@ffleader1

ffleader1 commented Apr 16, 2022

> @ffleader1 I am new (again) at AMD and, while I am not working on this project, I am trying to understand the full story and see how the current situation can be improved.

Well, that is interesting. I wonder how many people are working on ROCm at AMD. It is confidential for sure, but my wild guess is 2-3 people as the core programmers.
There is not much to talk about besides the obvious fact that ROCm is dragging its feet while even Microsoft is arguably doing a better job for AMD in this field.
Not even talking about performance, ROCm has 2 major disadvantages compared to CUDA:

  1. It does not work on Windows.
  2. Its number of supported GPUs is countable by a toddler.

Issue 1 is a huge problem for sure, but I imagine it would require major code revision. And since Microsoft is already doing AMD a favor on Windows, that conversation can be put away for another day.
Issue 2 is what gets me. AMD reps promised Navi 1 gen support... Nope, did not happen. There have been so many issues opened on this that I don't feel like citing them all.
For the Navi 2 gen, AMD technically supports it... for about 3 cards on the consumer side (Navi 21 covers the 6800, 6800 XT, and 6900 XT), and they don't even include them in the official docs. The worst part is that ROCm actually could support lower Navi GPUs, like Navi 22, as evidenced by the fact that changing gfx1030 in the source code to, say, gfx1031 and compiling from source gets ROCm working on the 6700 XT (not sure the latest version does, but previous versions can for sure). I think AMD lacks the resources to do the testing for lower cards, so they basically abandon them altogether.
So yes, both issues have led to the state in which ROCm is used exclusively by 3 people. ROCm is like the Bulldozer of the ML world.
I think AMD is desperate for resources on this project, so their strategy now is abandoning the consumer segment and aiming only for the big-budget enterprise environment. This is IMO a bad move, but what do I know.
Anyway, at the very least they could provide support for more GPUs. That is for sure within their reach, and they do not even have to test it. They could just come out and say: look, here is the "unstable" ROCm version that supports the 6700 XT, 6600 XT, 6500 XT, 6400, or whatever. We did not do a lot of testing, hence the label "unstable", but it is here for you (the consumer market, not the enterprise market) to chew on, and we welcome feedback.
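For what it's worth, more recent ROCm runtimes also expose an `HSA_OVERRIDE_GFX_VERSION` environment variable that makes a gfx1031 card report itself as gfx1030 without rebuilding anything. I'm not certain which releases honor it, so treat this as a hedged sketch (the `hipInfo` path and the `rocm_env` helper are examples, not official tooling):

```python
# Hedged sketch: ask the ROCm runtime to treat a gfx1031 GPU as gfx1030 via
# the HSA_OVERRIDE_GFX_VERSION override, then launch a HIP program with that
# environment. Requires a working ROCm install to actually run hipInfo.
import os
import subprocess

def rocm_env(gfx_version: str = "10.3.0") -> dict:
    """Return a copy of the environment with the ISA override set."""
    env = dict(os.environ)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_version  # "10.3.0" maps to gfx1030
    return env

# Uncomment on a machine with ROCm installed:
# subprocess.run(["/opt/rocm/bin/hipInfo"], env=rocm_env(), check=True)
```

The appeal over the source-edit hack is that nothing has to be recompiled; the risk is exactly the untested-ISA situation described above.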

@LuisB79
Author

LuisB79 commented Apr 16, 2022

> @ffleader1 I am new (again) at AMD and, while I am not working on this project, I am trying to understand the full story and see how the current situation can be improved.

Please pull strings, make noise. There are lots of talented people who could develop things for HIP but can't because of its limited support.

@Varpie

Varpie commented Apr 17, 2022

I haven't tried compiling the source with gfx1031 yet, but honestly, if that is all it takes, I don't get it.

A bit over a year ago, PyTorch added support for ROCm, and AMD has had a community build of TensorFlow with ROCm support since September 2019.
Before that, it was arguably better to ignore the consumer market, but now that both of the major ML frameworks support ROCm, the only thing stopping AMD from having an actual impact on the ML market is their lack of support for consumer cards.

Even research labs don't necessarily use "pro" cards because of the cost; the university I went to had a cluster mostly made of 2080 Tis, and in that case using AMD hardware was not even a question, because ROCm support takes so long to arrive on their latest consumer cards, as if it is an afterthought (which is likely the case, unfortunately)...
As a result, consumer cards don't have ROCm => interest in ROCm is limited => people think CUDA is required for ML, so they buy Nvidia by default.
All it would take to change that is thinking about ROCm when shipping new hardware, and actually providing support rather than hiding it from the official documentation. They are losing the ML market for no real reason.

@ffleader1

> I haven't tried compiling the source with gfx1031 yet, but honestly, if that is all it takes, I don't get it.
>
> A bit over a year ago, PyTorch added support for ROCm, and AMD has had a community build of TensorFlow with ROCm support since September 2019. Before that, it was arguably better to ignore the consumer market, but now that both of the major ML frameworks support ROCm, the only thing stopping AMD from having an actual impact on the ML market is their lack of support for consumer cards.
>
> Even research labs don't necessarily use "pro" cards because of the cost; the university I went to had a cluster mostly made of 2080 Tis, and in that case using AMD hardware was not even a question, because ROCm support takes so long to arrive on their latest consumer cards, as if it is an afterthought (which is likely the case, unfortunately)... As a result, consumer cards don't have ROCm => interest in ROCm is limited => people think CUDA is required for ML, so they buy Nvidia by default. All it would take to change that is thinking about ROCm when shipping new hardware, and actually providing support rather than hiding it from the official documentation. They are losing the ML market for no real reason.

I think it is because there are not enough resources for them to validate those lower-end cards, and so they fall back on bad reasoning like: ROCm is aimed at professionals, so **** you all, casual consumers.

Anyway, I believe this whole stinky state exists because AMD has lost interest in the ML side of things, or in ROCm precisely.

ROCm was written for Linux, and only Linux. No matter how much PR or master-racism comes from Linux fanboys (and there is a lot), a niche software stack written only for Linux will never prosper with the masses. And when it doesn't, you don't really have much to show about its potential, do you? And when you do not have much to show, you lose interest. And when you lose interest, you do not spend more money to upgrade the software stack...

We have gone through the full circle of ROCm.

I know many companies using the 3090 Ti for training. Seems like a good deal considering that for the price of one Tesla A100, you can get 16 of the 3090 Ti. So abandoning the consumer market is a bad, bad, bad move. But, well... no interest then no budget, no budget then no progress, no progress then no interest.

The only way for AMD to salvage ROCm is to go big or go home. But that would require some kind of major direction change, which... meh... I would rather trust Microsoft DirectML than this.

@LuisB79
Author

LuisB79 commented Apr 28, 2022

It seems AMD won't give a proper explanation. It turns out it's easier to use ROCm stuff on Windows than on Linux, thanks to Antares.

LuisB79 closed this as completed Apr 28, 2022
@ffleader1

First time I've heard of this. Learned something new, thank you. But does it work with your 6800M, though?

@LuisB79
Author

LuisB79 commented Apr 28, 2022

It did, hipInfo worked, and I could do other things.

@ffleader1

> It did, hipInfo worked, and I could do other things.

What ROCm version, though? I want to get a 6700 because it's more in my budget than a 6800, but the 6700 is not supported by the current ROCm version. Does Antares have some kind of special "sauce"?

@LuisB79
Author

LuisB79 commented Apr 28, 2022

It has some sauce. It's not meant for full ROCm emulation; from what I understand, it compiles something you can add to your source code so that your code uses amdhip64.dll, and it seems that DLL has wide GPU support. You can compile HIP code and run it on Windows, and so on; PyTorch stuff too.
