rocminfo does not report a "Marketing Name" for MI100 or MI200 GPUs #1778
Comments
Can confirm the same issue for MI50 and Radeon RX6600. Update Dec 2023:
Perhaps related to https://gitlab.freedesktop.org/mesa/drm/-/commit/613cc945b36e7ba3ce8de0e42b5057b32bc7c69c not being ported?
Also confirming that ROCM/5.4.0 does not return a device name for MI60. In contrast,
Can you please try the latest ROCm 6.0.0 to see if your issue has been resolved? If resolved, please close the ticket. Thanks.
ROCm 6.0.0 and MI50, 5.15.0-91-generic kernel.
Hi @al42and, thanks for your response. Let me check with the internal team on your finding and I will get back to you. Thanks.
Should go to runtime, not compiler. Saad, please reassign.
This information comes from the libdrm-amdgpu package. @bigtrak @al42and @milthorpe, can you please check that you have a recent version of libdrm-amdgpu installed? libdrm usually stores this information here:
I cannot find this package in the ROCm repos, and the file exists neither on Ubuntu nor on HPE/Cray systems. There is The Ubuntu 22.04 version seems to correspond to https://gitlab.freedesktop.org/mesa/drm/-/commit/e214a6a6e88610aed09a046aac23e61430b76975, and the HPE/Cray systems have an even older version.
It looks like it comes from this package: And that will install the file here:
As of the June 2023 ROCm release, 5.6.0, rocminfo returns the correct names for the GPUs I have: MI25, MI100, MI210. I'll check HIP when I have a chance, as well as some of the off-site platforms.
Note that the distro-provided libdrm-amdgpu-common version of amdgpu.ids will be outdated compared to the one provided by ROCm on pretty much every release. So if the ROCm-distributed libdrm-amdgpu-common doesn't get installed, you'll have the amdgpu.ids provided by your distro. How current that file is can differ significantly from one distro to the next, depending on how often they update it.
Our system (MI60, Ubuntu 22.04.3, with ROCm 5.4.3 and ROCm 6.0 installed as modules) has both
Can you share the driver files of MI250X (OLCF Frontier) with me?
@milthorpe The one in /usr/share will be from the OS-distributed libdrm. We don't have much control over that one, as the distro will update it on their schedule. The one in /opt/amdgpu will be from the ROCm (or amdgpu-pro) install and will be the up-to-date one. I do agree that versioning should be used here, since 1.0.0 doesn't really help much. But with how much churn there is for marketing names, it's probably not feasible.
@yanbosmu The ROCm release will support MI250X. There are no "driver files of MI250X" to speak of; they're all in the regular ROCm release. Install ROCm on a supported OS, and MI250X will work.
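For anyone wanting to check whether the distro copy of amdgpu.ids is behind the ROCm copy, here is a minimal Python sketch. It assumes the file layout used by upstream libdrm (comment lines starting with `#`, a bare version line such as `1.0.0`, then `device_id, revision_id, product_name` rows); the function names are hypothetical and nothing here is part of ROCm or libdrm itself.

```python
from pathlib import Path


def parse_amdgpu_ids(text):
    """Return {(device_id, revision_id): product_name} from amdgpu.ids content.

    Assumed layout: '#' comments, one bare version line, then
    comma-separated 'device_id, revision_id, product_name' rows.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first two commas only, so a product name that
        # itself contains a comma stays intact.
        parts = [p.strip() for p in line.split(",", 2)]
        # The version line (e.g. "1.0.0") has no commas and is skipped here.
        if len(parts) != 3:
            continue
        device_id, revision_id, name = parts
        entries[(device_id.upper(), revision_id.upper())] = name
    return entries


def missing_entries(reference, candidate):
    """Entries present in `reference` but absent from `candidate`."""
    return {key: name for key, name in reference.items() if key not in candidate}


def compare_files(reference_path, candidate_path):
    """Parse two amdgpu.ids files and list what the candidate lacks."""
    reference = parse_amdgpu_ids(Path(reference_path).read_text())
    candidate = parse_amdgpu_ids(Path(candidate_path).read_text())
    return missing_entries(reference, candidate)
```

Calling `compare_files(rocm_copy, distro_copy)` with the paths of the two files on your system (somewhere under /opt/amdgpu and /usr/share respectively, per the comment above) should list the device IDs whose names only the ROCm copy knows about. Since both files may carry the same `1.0.0` version line, comparing entries is more informative than comparing versions.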
@bigtrak Has this been resolved for you? If so, please close the ticket. Thanks!
As of ROCm 6.0 (the only version I have available), this issue is fixed.
I earlier reported an issue with HIP (to the HIP group) that HIP was not returning device
properties for some GPU models.
I recalled an earlier issue with rocminfo not reporting the "Marketing Name" (human readable
GPU name) for some GPUs.
Upon examining this issue again, the same GPUs (MI100, MI200) which do not report a marketing name
via rocminfo, also do not report the device name at the HIP level.
Marketing Name is reported for an MI25. I don't have an MI50 or MI60 to test behavior on those models.
The reproducer is to run rocminfo (5.2.0 is the latest version, with the corresponding amdgpu driver on Enterprise
Linux 8 or SLES) and observe that "Marketing Name" is blank for MI100 and MI200, and populated for MI25.
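The check described above can be scripted. The following Python sketch scans rocminfo output for agents whose "Marketing Name" field is empty. It assumes each agent prints a `Name:` line shortly before its `Marketing Name:` line, which matches the output I am familiar with but is not a documented guarantee; the function names are hypothetical.

```python
import subprocess


def agents_missing_marketing_name(rocminfo_text):
    """Return the 'Name' of each agent whose 'Marketing Name' is blank."""
    missing = []
    current_name = None
    for line in rocminfo_text.splitlines():
        # Split each line into "key: value" at the first colon.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Name":
            # Remember the most recent Name; assumed to precede
            # the Marketing Name of the same agent.
            current_name = value
        elif key == "Marketing Name" and not value:
            missing.append(current_name)
    return missing


def run_rocminfo():
    """Run rocminfo (assumed to be on PATH) and return its stdout."""
    result = subprocess.run(
        ["rocminfo"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a machine with ROCm installed, `agents_missing_marketing_name(run_rocminfo())` would return an empty list when every agent reports a name, and the gfx names of the affected GPUs otherwise.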