
Cursed rx 6800m installation/ better support for gfx1031 #1726

Closed
LuisB79 opened this issue Apr 14, 2022 · 16 comments

Comments

@LuisB79

LuisB79 commented Apr 14, 2022

I wasted 4 days trying to install ROCm on Ubuntu 20.04, to no avail: either amdgpu-dkms refused to install, the 5.13 Ubuntu kernel was too old, or there were no candidates for 5.17. The only two times I got it "working" (after restarting my laptop it just blanked out), running hipInfo returned "hipErrorInvalidDevice(101) at hipInfo.cpp:205", failing the test. I tried following the 5.1.1 guide, tried other methods, and even tried different kernels; amdgpu-dkms would always fail.
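Before repeatedly reinstalling the userspace, it can help to confirm the kernel side is present at all. A minimal triage sketch (mine, not from this thread; the helper name is hypothetical) that checks whether the amdgpu module is loaded, which is what amdgpu-dkms is supposed to provide:

```python
# Hypothetical triage helper: check whether the amdgpu kernel module is loaded
# before blaming the ROCm userspace. If this returns False, the DKMS build for
# the running kernel most likely failed.
from pathlib import Path

def amdgpu_loaded(modules_file: Path = Path("/proc/modules")) -> bool:
    """Return True if 'amdgpu' appears in the loaded-modules list."""
    try:
        text = modules_file.read_text()
    except OSError:
        return False
    # Each /proc/modules line starts with the module name.
    return any(line.split(" ", 1)[0] == "amdgpu" for line in text.splitlines())
```

If it returns False, `dkms status` is the next place to look for a failed build against the running kernel.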

@ffleader1

ffleader1 commented Apr 14, 2022

Your GPU is officially not supported. Given that Navi 21 only became supported about 2 months ago, I don't think Navi 22 support will be available anytime soon, possibly never.

@LuisB79
Author

LuisB79 commented Apr 15, 2022

> Your GPU is officially not supported. Given that Navi 21 only became supported about 2 months ago, I don't think Navi 22 support will be available anytime soon, possibly never.

Why is that? It's an RDNA2 GPU; that is ridiculous.

@ffleader1

ffleader1 commented Apr 15, 2022

> Your GPU is officially not supported. Given that Navi 21 only became supported about 2 months ago, I don't think Navi 22 support will be available anytime soon, possibly never.

> Why is that? It's an RDNA2 GPU; that is ridiculous.

To be honest, I do not know lol. It's just one of those AMD-and-their-infinite-wisdom things. Though I do think someone had success compiling from source with the 6700 XT (also Navi 22, like your GPU) after replacing all occurrences of gfx1030 in the source code with gfx1031. Try #1668
Hey, if you get it up and running, please tell me. I will head out and buy a 6700 XT too.
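The workaround described above is essentially a bulk find-and-replace over the ROCm sources before rebuilding. A rough sketch of that edit as a script (the `retarget` helper and its defaults are mine, not part of any ROCm tooling):

```python
# Hypothetical sketch of the "replace gfx1030 with gfx1031" source edit people
# reported doing before compiling ROCm from source. Binary and unreadable
# files are skipped.
from pathlib import Path

def retarget(tree: Path, old: str = "gfx1030", new: str = "gfx1031") -> int:
    """Rewrite every occurrence of `old` to `new` under `tree`; return files changed."""
    changed = 0
    for path in sorted(tree.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed += 1
    return changed
```

Running it over a ROCm source checkout before the build is the moral equivalent of the `sed`-over-the-tree edit described in #1668.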

@LuisB79
Author

LuisB79 commented Apr 15, 2022

@ROCmSupport could you share insight into why there is little to no support for consumer cards, even though it's the same architecture?

@ffleader1

> @ROCmSupport could you share insight into why there is little to no support for consumer cards, even though it's the same architecture?

Well, it does work with consumer cards; it just has to be Navi 21: RX 6800 and above.
The reason, I guess, is that AMD does not have the intent or resources to do the testing for Navi 22 and lower, so they just drop them altogether. Seeing that compiling from source after replacing gfx1030 with gfx1031 works, this is the most likely reason IMO.
It sure does suck.
I really wish AMD spent more of their budget on ROCm. Their strategy of approaching ML is basically handing the consumer market to Nvidia at this point lol.

@LuisB79
Author

LuisB79 commented Apr 15, 2022

> @ROCmSupport could you share insight into why there is little to no support for consumer cards, even though it's the same architecture?

> Well, it does work with consumer cards; it just has to be Navi 21: RX 6800 and above. The reason, I guess, is that AMD does not have the intent or resources to do the testing for Navi 22 and lower, so they just drop them altogether. Seeing that compiling from source after replacing gfx1030 with gfx1031 works, this is the most likely reason IMO. It sure does suck. I really wish AMD spent more of their budget on ROCm. Their strategy of approaching ML is basically handing the consumer market to Nvidia at this point lol.

@ROCmSupport is that true?

@keryell
Contributor

keryell commented Apr 16, 2022

@ffleader1 I am new (again) at AMD and, while I am not working on this project, I am trying to understand the full story and see how the current situation can be improved.

@ffleader1

ffleader1 commented Apr 16, 2022

> @ffleader1 I am new (again) at AMD and, while I am not working on this project, I am trying to understand the full story and see how the current situation can be improved.

Well, that is interesting. I wonder how many people are working on ROCm at AMD. It is confidential for sure, but my wild guess is 2-3 people as the core programmers.
There is not much to talk about besides the obvious fact that ROCm is dragging its feet while even Microsoft is arguably doing a better job for AMD in this field.
Not even talking about performance, ROCm has 2 major disadvantages compared to CUDA:

  1. It does not work on Windows.
  2. Its number of supported GPUs is countable by a toddler.

Issue 1 is a huge problem for sure, but I imagine it would require major code revision. And since Microsoft is already doing AMD a favor on Windows, that conversation can be put away for another day.
Issue 2 is what gets me. AMD reps promised Navi 1 gen support... Nope, did not happen. There have been so many issues opened on this that I don't feel like citing them all.
For the Navi 2 gen, AMD technically supports it... for about 3 cards on the consumer side (Navi 21 covers the 6800, 6800 XT, and 6900 XT), and they don't even include them in the official docs. The worst part is that ROCm actually could support lower Navi GPUs, like Navi 22, as evidenced by the fact that changing gfx1030 in the source code to, say, gfx1031 and compiling from source gets ROCm working on the 6700 XT (not sure the latest version does, but previous versions can for sure). I think AMD lacks the resources to do the testing for lower cards, so they basically abandon them altogether.
So yes, both issues have led to the state in which ROCm is used exclusively by 3 people. ROCm is like the Bulldozer of the ML world.
I think AMD is desperate for resources on this project, so their strategy now is abandoning the consumer segment and aiming only for the big-budget enterprise environment. This is IMO a bad move, but what do I know.
Anyway, at the very least they could provide support for more GPUs. That is for sure within their reach, and they do not even have to test it. They could just come out and say: look, here is the "unstable" ROCm version that supports the 6700 XT, 6600 XT, 6500 XT, 6400, or whatever. We did not do a lot of testing, hence the label "unstable", but it is here for you (the consumer market, not the enterprise market) to chew on, and we welcome feedback.
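For what it's worth, more recent ROCm runtimes also expose an `HSA_OVERRIDE_GFX_VERSION` environment variable that makes a gfx1031 card report itself as gfx1030 without rebuilding anything. I'm not certain which releases honor it, so treat this as a hedged sketch (the `hipInfo` path and the `rocm_env` helper are examples, not official tooling):

```python
# Hedged sketch: ask the ROCm runtime to treat a gfx1031 GPU as gfx1030 via
# the HSA_OVERRIDE_GFX_VERSION override, then launch a HIP program with that
# environment. Requires a working ROCm install to actually run hipInfo.
import os
import subprocess

def rocm_env(gfx_version: str = "10.3.0") -> dict:
    """Return a copy of the environment with the ISA override set."""
    env = dict(os.environ)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_version  # "10.3.0" maps to gfx1030
    return env

# Uncomment on a machine with ROCm installed:
# subprocess.run(["/opt/rocm/bin/hipInfo"], env=rocm_env(), check=True)
```

The appeal over the source-edit hack is that nothing has to be recompiled; the risk is exactly the untested-ISA situation described above.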

@LuisB79
Author

LuisB79 commented Apr 16, 2022

> @ffleader1 I am new (again) at AMD and, while I am not working on this project, I am trying to understand the full story and see how the current situation can be improved.

Please pull strings, make noise. There are lots of talented people who could develop things for HIP but can't because of its limited support.

@Varpie

Varpie commented Apr 17, 2022

I haven't tried compiling the source with gfx1031 yet, but honestly, if that is all it takes, I don't get it.

A bit over a year ago, PyTorch added support for ROCm, and AMD has had a community build of TensorFlow with ROCm support since September 2019.
Before that, it was arguably better to ignore the consumer market, but now that both of the major ML frameworks support ROCm, the only thing stopping AMD from having an actual impact on the ML market is their lack of support for consumer cards.

Even research labs don't necessarily use "pro" cards because of the cost; the university I went to had a cluster mostly made of 2080 Tis, and in that case using AMD hardware was not even a question, because ROCm support takes so long to arrive on their latest consumer cards, as if it is an afterthought (which is likely the case, unfortunately)...
As a result, consumer cards don't have ROCm => interest in ROCm is limited => people think CUDA is required for ML, so they buy Nvidia by default.
All it would take to change that is thinking about ROCm when shipping new hardware, and actually providing support rather than hiding it from the official documentation. They are losing the ML market for no real reason.

@ffleader1

> I haven't tried compiling the source with gfx1031 yet, but honestly, if that is all it takes, I don't get it.
>
> A bit over a year ago, PyTorch added support for ROCm, and AMD has had a community build of TensorFlow with ROCm support since September 2019. Before that, it was arguably better to ignore the consumer market, but now that both of the major ML frameworks support ROCm, the only thing stopping AMD from having an actual impact on the ML market is their lack of support for consumer cards.
>
> Even research labs don't necessarily use "pro" cards because of the cost; the university I went to had a cluster mostly made of 2080 Tis, and in that case using AMD hardware was not even a question, because ROCm support takes so long to arrive on their latest consumer cards, as if it is an afterthought (which is likely the case, unfortunately)... As a result, consumer cards don't have ROCm => interest in ROCm is limited => people think CUDA is required for ML, so they buy Nvidia by default. All it would take to change that is thinking about ROCm when shipping new hardware, and actually providing support rather than hiding it from the official documentation. They are losing the ML market for no real reason.

I think it is because there are not enough resources for them to validate those lower-end cards, and so they fall back on bad reasoning like: ROCm is aimed at professionals, so **** you all, casual consumers.

Anyway, I believe this whole stinky state exists because AMD has lost interest in the ML side of things, or in ROCm precisely.

ROCm was written for Linux, and only Linux. No matter how much PR or master-racism comes from Linux fanboys (and there is a lot), a niche software stack written only for Linux will never prosper with the masses. And when it doesn't, you don't really have much to show about its potential, do you? And when you do not have much to show, you lose interest. And when you lose interest, you do not spend more money to upgrade the software stack...

We have gone through the full circle of ROCm.

I know many companies using the 3090 Ti for training. Seems like a good deal considering that for the price of one Tesla A100, you can get 16 of the 3090 Ti. So abandoning the consumer market is a bad, bad, bad move. But, well... no interest then no budget, no budget then no progress, no progress then no interest.

The only way for AMD to salvage ROCm is to go big or go home. But that would require some kind of major direction change, which... meh... I would rather trust Microsoft DirectML than this.

@LuisB79
Author

LuisB79 commented Apr 28, 2022

It seems AMD won't give a proper explanation. It turns out it's easier to use ROCm stuff on Windows than on Linux, thanks to Antares.

LuisB79 closed this as completed Apr 28, 2022
@ffleader1

First time I've heard of this. Learned something new, thank you. But does it work with your 6800M, though?

@LuisB79
Author

LuisB79 commented Apr 28, 2022

It did, hipInfo worked, and I could do other things.

@ffleader1

> It did, hipInfo worked, and I could do other things.

What ROCm version, though? I want to get a 6700 because it's more in my budget than a 6800, but the 6700 is not supported by the current ROCm version. Does Antares have some kind of special "sauce"?

@LuisB79
Author

LuisB79 commented Apr 28, 2022

It has some sauce. It's not meant for full ROCm emulation; from what I understand, it compiles something you can add to your source code so that your code uses amdhip64.dll, and it seems that DLL has wide GPU support. You can compile HIP code and run it on Windows, and so on; PyTorch stuff too.
