RX 470 card no longer recognized by clinfo after 4.5 update #1608

Closed
boxerab opened this issue Nov 3, 2021 · 30 comments

Comments

@boxerab

boxerab commented Nov 3, 2021

Card was working fine with 4.3.

I uninstalled my previous version (4.3) and installed 4.5.

output from /opt/rocm/opencl/bin/clinfo :

Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.2 AMD-APP (3361.0)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 0

output from rocminfo

hsa api call failure at: /long_pathname_so_that_rpms_can_package_the_debug_info/src/rocminfo/rocminfo.cc:1143
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

output from rocm-smi

======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp   AvgPwr   SCLK     MCLK    Fan     Perf  PwrCap  VRAM%  GPU%  
0    47.0c  23.116W  1169Mhz  300Mhz  19.22%  auto  92.0W    16%   0%    
================================================================================
============================= End of ROCm SMI Log ==============================
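The symptom in the clinfo output above (a platform is listed, but zero devices are enumerated) can also be checked from a script. The following is a minimal sketch, not part of ROCm: the function name `count_opencl_devices` and the `SAMPLE` text are hypothetical, and the parsing assumes the plain-text layout shown in this report ("Number of platforms:" and "Number of devices:" lines).

```python
import re


def count_opencl_devices(clinfo_text):
    """Parse clinfo text output and return (platform_count, device_counts).

    Assumes lines shaped like "Number of platforms:   1" and
    "Number of devices:   0", as in the output quoted in this issue.
    """
    platforms = 0
    device_counts = []
    for line in clinfo_text.splitlines():
        m = re.match(r"\s*Number of platforms:\s+(\d+)", line)
        if m:
            platforms = int(m.group(1))
            continue
        m = re.match(r"\s*Number of devices:\s+(\d+)", line)
        if m:
            # clinfo prints one device count per platform
            device_counts.append(int(m.group(1)))
    return platforms, device_counts


# Hypothetical sample, abbreviated from the output in this report
SAMPLE = """\
Number of platforms:     1
  Platform Name:         AMD Accelerated Parallel Processing
Number of devices:       0
"""

platforms, devices = count_opencl_devices(SAMPLE)
if platforms > 0 and sum(devices) == 0:
    print("OpenCL platform found, but no devices were enumerated")
```

To run this against a live system, the text could come from something like `subprocess.run(["/opt/rocm/opencl/bin/clinfo"], capture_output=True, text=True).stdout`, assuming the binary lives at the path used in this report.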

boxerab changed the title from "RX 470 card no longer recognized after 4.5 update" to "RX 470 card no longer recognized by clinfo after 4.5 update" on Nov 3, 2021

@clavinet

clavinet commented Nov 4, 2021

RX 570 here.
clinfo shows the same output here with "Number of devices: 0"
However, rocminfo shows the correct (I think) output and no error message.
rocm-smi shows the same output as yours as well.

I just installed 4.3.1 again to cross-check and it works without issue.

@boxerab
Author

boxerab commented Nov 4, 2021

@clavinet did you try /opt/rocm/opencl/bin/clinfo?
I should add that I am running on Ubuntu 20.04.

@clavinet

clavinet commented Nov 5, 2021

I tried both the bundled clinfo that comes with ROCm, as well as the system clinfo.
My distro is openSUSE Tumbleweed 20211102.

@boxerab
Author

boxerab commented Nov 6, 2021

After re-install, I can now get rocminfo to show what looks like correct output.
However, I can't run an OpenCL program: no device is recognized.
Come on, AMD !

@boxerab
Author

boxerab commented Nov 6, 2021

@clavinet how do I re-install the older 4.3 version ?

@boxerab
Author

boxerab commented Nov 6, 2021

Never mind - I've gone back to 4.3.1. The RX 470 is now recognized as an OpenCL device.

boxerab closed this as completed on Nov 8, 2021

@clavinet

clavinet commented Nov 8, 2021

@boxerab isn't it too early to close this?
The issue still persists with 4.5.

@boxerab
Author

boxerab commented Nov 8, 2021

@clavinet right you are.

boxerab reopened this on Nov 8, 2021

@clavinet

clavinet commented Nov 8, 2021

Compiling this beast is daunting even for people experienced in building software (not me), so I can't say if it's an issue with the binaries or the rocm source...

Would be great to know if this issue also happens with self-compiled builds.

@ROCmSupport

Thanks @boxerab for reaching out.
RX 470 is not supported anymore and so things might not work.
For supported hardware, please check @ https://github.com/RadeonOpenCompute/ROCm#hardware-and-software-support
Thank you.

@boxerab
Author

boxerab commented Nov 10, 2021

Thanks, @ROCmSupport. So, support for Polaris 10 was dropped for 4.5.
Was this mentioned explicitly in the release notes for version 4.5?

@clavinet

There's a difference between "not supported" as in "we don't provide support", and "not supported" as in "we don't enable that feature in our binaries".

I had hoped that "not supported" in ROCm meant the first case, similar to ECC RAM on consumer Ryzen platforms, which receives no support (as in help and assistance) from AMD, yet works at your own risk.

It's sad to see those Polaris cards now apparently being dropped from ROCm binary releases.

@ROCmSupport Can't there be a policy similar to Ryzen and ECC, in that you don't provide assistance for that feature, but don't actively prevent it from running either?

@boxerab
Author

boxerab commented Nov 10, 2021

@clavinet on the one hand there are a ton of Polaris cards out there; on the other hand, 5 years is a reasonable time to support this card. Is there any reason why you need 4.5 instead of 4.3.1?

@clavinet

clavinet commented Nov 10, 2021

> Is there any reason why you need 4.5 instead of 4.3.1?

Not really. It's just that it's usually better to run newer rather than older software, and nobody knows how long 4.3.1 will keep working on modern systems.

@johnbridgman

> Thanks @boxerab for reaching out. RX 470 is not supported anymore and so things might not work. For supported hardware, please check @ https://github.com/RadeonOpenCompute/ROCm#hardware-and-software-support Thank you.

Same question as others are asking - my understanding was that we had stopped testing on Polaris but were not disabling Polaris in the code paths... but it looks like maybe we disabled it in the code?

@ROCmSupport

I am not sure whether the gfx8 code is disabled or not. AFAIK, we might not have removed any code intentionally, but maybe something changed in the stack and we don't validate gfx8, so it might not be working anymore.

@boxerab
Author

boxerab commented Nov 11, 2021

@ROCmSupport @johnbridgman Polaris 10 cards have worked fine with ROCm since the library was launched in 2016.
It would be a shame, in my opinion, to disable these cards at this stage, unless there's a good reason for doing so.

@johnbridgman

Agreed - I checked with our OpenCL management and they are not aware of any action to disable Polaris - seems most likely that this is a bug resulting from all the build/packaging/install changes we made in 4.5 as part of unifying the ROCm and AMDGPU-PRO stacks.

@boxerab
Author

boxerab commented Nov 12, 2021

Thanks for looking into this issue - hopefully this can be addressed easily and we can start using the latest ROCm version with our cards.

@boxerab
Author

boxerab commented Nov 13, 2021

As this appears to be an unintentional bug, can this issue please be re-opened?

cc @ROCmSupport

@boxerab
Author

boxerab commented Nov 26, 2021

@ROCmSupport @johnbridgman any updates on this issue? Also, can we re-open it until this is resolved?

@boxerab
Author

boxerab commented Dec 13, 2021

@johnbridgman do you know if this issue is fixed in the 4.5.2 release?

@vladtcvs

I have an RX 570 and the problem also exists: no devices are reported on ROCm 4.5, while rocminfo shows the card. ROCm 4.3.1 works fine.

@boxerab
Author

boxerab commented Jan 22, 2022

@vladtcvs don't hold your breath for a fix - AMD has been studiously ignoring this issue.

@ROCmSupport

AFAIK, we have not removed any code intentionally. But maybe something changed in the stack and we don't validate gfx8 on ROCm, so it might not be working anymore.
One thing from a support point of view: each card has a limited duration of support. As per business standards, we cannot continue supporting cards for many more years. As new cards come onto the market, we keep adding them to the supported list and dropping the old ones after a certain amount of time; that is the process.
Thank you.

@Atemu

Atemu commented Jan 28, 2022

Where can we find these business standards for the software support duration of AMD cards?

@boxerab
Author

boxerab commented Jan 28, 2022

@ROCmSupport absolutely, all cards have an EOL, but 6 years is not very long IMHO.

@John-Gee

> @ROCmSupport absolutely, all cards have an EOL, but 6 years is not very long IMHO.

The 590 is not even 4 years old yet, I believe.

@boxerab
Author

boxerab commented Jan 29, 2022

I've opened a new issue for this situation: #1659

@rajhlinux

I have the RX-580 GPU... this is also a Polaris 10 card and only 5 years since release. Anyway, it's best to learn AMD's open source documents on their GPU ISA and OpenCL programming so you do not need to worry about situations like this. It's somewhat complicated to learn, but it gives you the freedom of not relying on anyone and the possibility of better performance.
