CPU ID does not like rocm-opencl-runtime #238

Closed
RushingAlien opened this issue Sep 28, 2022 · 4 comments

@RushingAlien

Describe the bug/Expected behavior

cpu-x: /home/main-builder/pkgwork/src/ROCclr-rocm-5.2.3/device/rocm/rocdevice.cpp:1132: bool roc::Device::populateOCLDeviceConstants(): Assertion `cachesize[0] > 0' failed.

Expected: CPU-X should detect the OpenCL device instead of aborting.

Additional information

  • Operating system name and version: Arch Linux
  • CPU-X installation type: AUR

Bug information

========================= Backtrace =========================
CPU-X 4.4.0 (Sep 26 2022 02:48:24, Linux x86_64, GNU 12.2.0)
# 1 /usr/lib/libc.so.6(+0x38a00) [0x7f85d1ff4a00]
# 2 /usr/lib/libc.so.6(+0x8849c) [0x7f85d204449c]
# 3 /usr/lib/libc.so.6(gsignal+0x18) [0x7f85d1ff4958]
# 4 /usr/lib/libc.so.6(abort+0xd7) [0x7f85d1fde53d]
# 5 /usr/lib/libc.so.6(+0x2245c) [0x7f85d1fde45c]
# 6 /usr/lib/libc.so.6(+0x31486) [0x7f85d1fed486]
# 7 /opt/rocm/lib/libamdocl64.so(+0xf5d24) [0x7f85b051ed24]
# 8 /opt/rocm/lib/libamdocl64.so(+0xfa217) [0x7f85b0523217]
# 9 /opt/rocm/lib/libamdocl64.so(+0xfab73) [0x7f85b0523b73]
#10 /opt/rocm/lib/libamdocl64.so(+0xb4176) [0x7f85b04dd176]
#11 /opt/rocm/lib/libamdocl64.so(+0xea3d0) [0x7f85b05133d0]
#12 /opt/rocm/lib/libamdocl64.so(+0x8d736) [0x7f85b04b6736]
#13 /usr/lib/libc.so.6(+0x8b6d7) [0x7f85d20476d7]
#14 /opt/rocm/lib/libamdocl64.so(clIcdGetPlatformIDsKHR+0xa0) [0x7f85b04b7760]
#15 /opt/rocm/lib/libOpenCL.so.1(+0x31cc) [0x7f85d21e11cc]
======================== End Backtrace =======================

cpu-x.log
cpu-x-daemon.log

@Umio-Yasuno
Contributor

It may be a problem with the OpenCL runtime.
Do clinfo and rocminfo work correctly?

@RushingAlien
Author

clinfo also crashes with a core dump and prints the same error that appears when launching cpu-x. However, rocminfo works correctly.

@Umio-Yasuno
Contributor

If the error occurs in clinfo, I think it is a problem on the runtime side.

ROCm does not officially support Raven APUs.
Does the same problem occur with the AMDGPU-PRO driver?

Note: running AMD_LOG_LEVEL=2 clinfo may give a more detailed log.

ref: https://github.com/ROCm-Developer-Tools/ROCclr/blob/rocm-5.2.3/device/rocm/rocdevice.cpp#L1132
ref: https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/utils/flags.hpp
Related: rocm-arch/rocm-arch#670

@TheTumultuousUnicornOfDarkness
Owner

CPU-X crashes because of a SIGABRT raised somewhere after the call to clGetPlatformIDs. In that case, it seems to be a bug in ROCm.
I was not able to reproduce this bug with ROCm 5.2.3 on my AMD Radeon RX 580.
However, it seems reasonable to me to handle this signal in that case and try to recover. Unfortunately, the C language has no proper way to handle exceptions, so consider 3ff4a60 a workaround.

It would produce something like this without crashing the app:

...
find_gpu_device_path: device_path=/sys/bus/pci/devices/0000:26:00.0
Vulkan instance extensions count: 20
Found instance extension: VK_KHR_get_physical_device_properties2
Found instance extension: VK_KHR_portability_enumeration
Vulkan devices count: 1

Oops, something was wrong! CPU-X has received signal 6 (Aborted) and is trying to recover.
========================= Backtrace =========================
CPU-X 4.4.0+git-r20-g5da0020eab12ac33178c682147fd58dfdfbebee1-dirty (Oct  1 2022 12:22:11, Linux x86_64, GNU 12.2.0)
# 1 main.c:720 cpu-x() [0x40d117]
# 2 /usr/lib/libc.so.6(+0x38a00) [0x7f430014da00]
# 3 /usr/lib/libc.so.6(kill+0xb) [0x7f430014dc3b]
# 4 core.c:1202 cpu-x(test_crash+0x18) [0x41528a]
# 5 core.c:1220 cpu-x() [0x41553b]
# 6 core.c:1461 cpu-x() [0x416f64]
# 7 core.c:111 cpu-x(fill_labels+0x76) [0x410499]
# 8 main.c:856 cpu-x(main+0x277) [0x40d4c6]
# 9 /usr/lib/libc.so.6(+0x23290) [0x7f4300138290]
#10 /usr/lib/libc.so.6(__libc_start_main+0x8a) [0x7f430013834a]
#11 start.S:117 cpu-x(_start+0x25) [0x409325]
======================== End Backtrace =======================

There is no platform with OpenCL support (SIGABRT)
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon0/temp1_input, which=0) ==> 0
request_sensor_path(base_dir=/sys/bus/pci/devices/0000:26:00.0/hwmon, cached_path=/sys/bus/pci/devices/0000:26:00.0/hwmon/hwmon2, which=5) ==> 0
request_sensor_path(base_dir=/sys/bus/pci/devices/0000:26:00.0/drm, cached_path=/sys/bus/pci/devices/0000:26:00.0/drm/card0, which=4) ==> 0
...
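
For readers wondering how a C program can "recover" from a SIGABRT raised inside a library call, here is a minimal sketch of one common technique: install a temporary handler and jump back out with sigsetjmp/siglongjmp. This is not the code from 3ff4a60. All names below (call_guarded, faulty_probe, working_probe, fake_cache_size) are hypothetical, and faulty_probe only imitates a runtime that fails an assertion like the one in the original report.

```c
#define _POSIX_C_SOURCE 200809L
#undef NDEBUG              /* keep assert() active for this demo */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical names: none of these come from the CPU-X source. */
static sigjmp_buf recover_point;
static volatile sig_atomic_t recover_armed = 0;

/* SIGABRT handler: jump back to the guarded call site if one is armed. */
static void on_sigabrt(int signum)
{
    if (recover_armed) {
        recover_armed = 0;
        siglongjmp(recover_point, signum);
    }
    /* No guarded call in progress: fall back to the default action. */
    signal(signum, SIG_DFL);
    raise(signum);
}

/* Stand-in for a library call that fails an internal assertion. */
static int fake_cache_size = 0;
static int faulty_probe(void)
{
    assert(fake_cache_size > 0);   /* fails, so abort() raises SIGABRT */
    return fake_cache_size;
}

/* Stand-in for a library call that behaves normally. */
static int working_probe(void)
{
    return 1;
}

/* Runs probe(); returns its result, or -1 if it raised SIGABRT. */
static int call_guarded(int (*probe)(void))
{
    struct sigaction sa, old;
    int ret;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigabrt;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGABRT, &sa, &old);

    /* savemask=1 so the signal mask is restored after the jump. */
    if (sigsetjmp(recover_point, 1) == 0) {
        recover_armed = 1;
        ret = probe();
        recover_armed = 0;
    } else {
        ret = -1;                  /* reached via siglongjmp */
    }

    sigaction(SIGABRT, &old, NULL); /* restore the previous handler */
    return ret;
}
```

With these definitions, call_guarded(faulty_probe) returns -1 instead of terminating the process, while call_guarded(working_probe) returns the probe's own value. Jumping out of a signal handler leaves whatever the library was doing unfinished, so this only makes sense as a workaround where the failed subsystem is not used again afterwards.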

TheTumultuousUnicornOfDarkness added a commit that referenced this issue Jan 11, 2023
We face numerous issues related to OpenCL (#238, #258, #264).
OpenCL drivers are in bad shape in Linux.