This repository has been archived by the owner on Jan 26, 2024. It is now read-only.

ROCclr-rocm-5.4.3/os/os_posix.cpp:305: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed #158

Open
darkbasic opened this issue Feb 20, 2023 · 4 comments


darkbasic commented Feb 20, 2023

niko@talos2 ~ $ clinfo
clinfo: /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCclr-rocm-5.4.3/os/os_posix.cpp:305: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
Aborted (core dumped)

I'm on Gentoo Linux ppc64le (4K page size) using linux-6.1.12.
GPU is AMD RX 570 (mesa git master).
LLVM is 15.0.7.

rocm-opencl-runtime-5.4.3 compiles fine, but clinfo crashes as soon as I run it.

The core dump is not much help, since gdb cannot resolve any of the libc frames:

Mon 2023-02-20 15:01:01 CET   212914 1000 1000 SIGABRT present  /usr/bin/clinfo                                                                              1.5M
talos2 ~ # coredumpctl gdb 212914
           PID: 212914 (clinfo)
           UID: 1000 (niko)
           GID: 1000 (niko)
        Signal: 6 (ABRT)
     Timestamp: Mon 2023-02-20 15:01:01 CET (46s ago)
  Command Line: clinfo
    Executable: /usr/bin/clinfo
 Control Group: /user.slice/user-1000.slice/user@1000.service/session.slice/vte-spawn-4515b441-519b-4357-9405-43fc61d5f6db.scope
          Unit: user@1000.service
     User Unit: vte-spawn-4515b441-519b-4357-9405-43fc61d5f6db.scope
         Slice: user-1000.slice
     Owner UID: 1000 (niko)
       Boot ID: 0dca6c1f75ea46d7b02761482c0ec1d6
    Machine ID: b3e834569b8ff461391f5ac061feb773
      Hostname: talos2
       Storage: /var/lib/systemd/coredump/core.clinfo.1000.0dca6c1f75ea46d7b02761482c0ec1d6.212914.1676901661000000.zst (present)
  Size on Disk: 1.5M
       Message: Process 212914 (clinfo) of user 1000 dumped core.

GNU gdb (Gentoo 12.1 vanilla) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/clinfo...
(No debugging symbols found in /usr/bin/clinfo)
[New LWP 212914]
[New LWP 212918]
[New LWP 212915]
[New LWP 212916]
[New LWP 212917]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `clinfo '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00003fff8c7e603c in ?? () from /usr/lib64/libc.so.6
[Current thread is 1 (Thread 0x3fff8ca58020 (LWP 212914))]
(gdb) info threads
  Id   Target Id                          Frame 
* 1    Thread 0x3fff8ca58020 (LWP 212914) 0x00003fff8c7e603c in ?? () from /usr/lib64/libc.so.6
  2    Thread 0x3fff7abab120 (LWP 212918) 0x00003fff8c7ddf84 in ?? () from /usr/lib64/libc.so.6
  3    Thread 0x3fff7c4ef120 (LWP 212915) 0x00003fff8c7ddf84 in ?? () from /usr/lib64/libc.so.6
  4    Thread 0x3fff7bbad120 (LWP 212916) 0x00003fff8c7ddf84 in ?? () from /usr/lib64/libc.so.6
  5    Thread 0x3fff7b3ac120 (LWP 212917) 0x00003fff8c7ddf84 in ?? () from /usr/lib64/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x3fff8ca58020 (LWP 212914))]
#0  0x00003fff8c7e603c in ?? () from /usr/lib64/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 0x3fff7abab120 (LWP 212918))]
#0  0x00003fff8c7ddf84 in ?? () from /usr/lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x3fff7c4ef120 (LWP 212915))]
#0  0x00003fff8c7ddf84 in ?? () from /usr/lib64/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread 0x3fff7bbad120 (LWP 212916))]
#0  0x00003fff8c7ddf84 in ?? () from /usr/lib64/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 0x3fff7b3ac120 (LWP 212917))]
#0  0x00003fff8c7ddf84 in ?? () from /usr/lib64/libc.so.6
(gdb) quit

I'm using dev-libs/rocm-opencl-runtime-5.4.3, with dev-libs/rocr-runtime, dev-libs/rocm-comgr, dev-libs/rocm-device-libs, dev-util/rocm-cmake and dev-libs/roct-thunk-interface also at 5.4.3.

I've compiled dev-libs/rocm-opencl-runtime and dev-libs/rocr-runtime with debug symbols:

FEATURES="${FEATURES} nostrip"
CFLAGS="${CFLAGS} -ggdb3 -Wall"
CXXFLAGS="${CFLAGS}"

I've also enabled the debug USE flag, which turns on assertions and other debug code paths.

I've tried ROCm-OpenCL-Runtime from git master as well, but it fails at runtime with the very same assertion.
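
For reference, the assertion text says that the current stack pointer must lie in the half-open range `[*base - *size, *base)`. The sketch below reproduces that check on its own, outside ROCclr. It is not the ROCclr implementation: the helper names `queryStackBounds` and `stackPointerInBounds` are hypothetical, the bounds are obtained with glibc's `pthread_getattr_np` (an assumption about how one might do it on Linux), and the address of a local variable stands in for the stack pointer.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper type: "base" is the highest stack address,
// "size" the number of bytes below it that belong to the stack.
struct StackBounds {
  unsigned char* base;
  size_t size;
};

// Query the calling thread's stack bounds (Linux/glibc only, since
// pthread_getattr_np is a GNU extension). Returns false on failure.
inline bool queryStackBounds(StackBounds* out) {
  pthread_attr_t attr;
  if (pthread_getattr_np(pthread_self(), &attr) != 0) return false;

  void* lowest = nullptr;
  size_t size = 0;
  int rc = pthread_attr_getstack(&attr, &lowest, &size);
  pthread_attr_destroy(&attr);
  if (rc != 0) return false;

  // pthread_attr_getstack reports the lowest address, so the base
  // (top of a downward-growing stack) is lowest + size.
  out->base = static_cast<unsigned char*>(lowest) + size;
  out->size = size;
  return true;
}

// Same relation as in the assertion message: base - size <= sp < base.
// The address of a local variable approximates the stack pointer.
inline bool stackPointerInBounds() {
  StackBounds bounds{};
  if (!queryStackBounds(&bounds)) return false;
  unsigned char marker = 0;
  unsigned char* sp = &marker;
  return sp >= bounds.base - bounds.size && sp < bounds.base;
}
```

Running this small check on an affected machine, from the main thread and from a worker thread, might help narrow down whether the reported base, the reported size, or the stack pointer read is off on ppc64le.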

@darkbasic
Author

I noticed that compiling dev-libs/rocm-opencl-runtime-5.4.3 with the test USE flag enabled fails:

[122/224] /usr/bin/powerpc64le-unknown-linux-gnu-g++ -DCL_TARGET_OPENCL_VERSION=220 -DEMU_ENV=1 -DUSE_OPENGL=1 -Doclperf_EXPORTS -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/khronos/headers/opencl2.2 -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/include -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/common -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/include -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/amdocl  -O2 -pipe -mcpu=power9 -mtune=power9 -ggdb3 -Wall -fPIC -std=c++14 -MD -MT tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o -MF tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o.d -o tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o -c /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.cpp
FAILED: tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o 
/usr/bin/powerpc64le-unknown-linux-gnu-g++ -DCL_TARGET_OPENCL_VERSION=220 -DEMU_ENV=1 -DUSE_OPENGL=1 -Doclperf_EXPORTS -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/khronos/headers/opencl2.2 -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/include -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/common -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/include -I/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/amdocl  -O2 -pipe -mcpu=power9 -mtune=power9 -ggdb3 -Wall -fPIC -std=c++14 -MD -MT tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o -MF tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o.d -o tests/ocltst/module/perf/CMakeFiles/oclperf.dir/OCLPerfKernelThroughput.cpp.o -c /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.cpp
In file included from /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.cpp:21:
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:16: error: typedef ‘CPUKernel’ is initialized (use ‘decltype’ instead)
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                ^~~~~~~~~
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:27: error: ‘__m128’ was not declared in this scope; did you mean ‘__ibm128’?
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                           ^~~~~~
      |                           __ibm128
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:35: error: expected primary-expression before ‘,’ token
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                                   ^
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:37: error: ‘__m128’ was not declared in this scope; did you mean ‘__ibm128’?
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                                     ^~~~~~
      |                                     __ibm128
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:45: error: expected primary-expression before ‘,’ token
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                                             ^
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCm-OpenCL-Runtime-rocm-5.4.3/tests/ocltst/module/perf/OCLPerfKernelThroughput.h:48:47: error: expected primary-expression before ‘unsigned’
   48 | typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
      |                                               ^~~~~~~~
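
The errors above all stem from `__m128`, an x86 SSE type that does not exist on ppc64le. One possible way to keep such a typedef portable is to fall back to a plain 16-byte struct when SSE is unavailable. This is only a sketch under that assumption and not necessarily what the attached patch does; `Float4` and `copyKernel` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstring>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
// x86 with SSE: use the native 128-bit vector type.
#include <xmmintrin.h>
typedef __m128 Float4;
#else
// Other architectures (e.g. ppc64le): a 16-byte aligned stand-in
// with the same size and alignment as __m128.
struct alignas(16) Float4 {
  float v[4];
};
#endif

static_assert(sizeof(Float4) == 16, "Float4 must be 128 bits wide");

// Same shape as the typedef in OCLPerfKernelThroughput.h, but using
// the portable alias instead of __m128 directly.
typedef void (*CPUKernel)(Float4*, Float4*, unsigned int);

// Trivial example kernel: copy n vectors from src to dst.
static void copyKernel(Float4* dst, Float4* src, unsigned int n) {
  std::memcpy(dst, src, n * sizeof(Float4));
}
```

With this alias the function-pointer typedef compiles on both x86 and ppc64le, although any kernel bodies that call SSE intrinsics would still need their own non-x86 code paths.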

@darkbasic
Author

rocm-opencl-runtime-tests-ppc64.patch.txt

The attached patch fixes compilation of the tests, which then fail at runtime with the same assertion as clinfo:

PORTAGE_USERNAME=niko PORTAGE_GRPNAME=niko OCLGL_DISPLAY=${DISPLAY} OCLGL_XAUTHORITY=${XAUTHORITY} FEATURES=test USE=test emerge -v --oneshot rocm-opencl-runtime

>>> Test phase: dev-libs/rocm-opencl-runtime-5.4.3

 * Running oclgl test under DISPLAY :0 ...
OpenGL vendor string: AMD
Built for Emulation Environment
ocltst: /var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/work/ROCclr-rocm-5.4.3/os/os_posix.cpp:305: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
/var/tmp/portage/dev-libs/rocm-opencl-runtime-5.4.3/temp/environment: line 2213:    38 Aborted                 (core dumped) ./ocltst -m $(realpath liboclgl.so) -A ogl.exclude


Ashark commented May 1, 2023

I get this error when running DaVinci Resolve (the assertion is on the third-to-last line):

[andrey@unihost DaVinci Resolve]$ LC_ALL=C ROC_ENABLE_PRE_VEGA=1 wine Resolve.exe
0084:fixme:hid:handle_IRP_MN_QUERY_ID Unhandled type 00000005
0084:fixme:hid:handle_IRP_MN_QUERY_ID Unhandled type 00000005
0084:fixme:hid:handle_IRP_MN_QUERY_ID Unhandled type 00000005
0084:fixme:hid:handle_IRP_MN_QUERY_ID Unhandled type 00000005
0084:fixme:wineusb:query_id Unhandled ID query type 0x5.
010c:fixme:actctx:parse_depend_manifests Could not find dependent assembly L"SMDK-VC140-x64-4_21_0" (4.21.0.159)
010c:err:winediag:load_odbc failed to open library "libodbc.so": libodbc.so: cannot open shared object file: No such file or directory
010c:fixme:reg:NtNotifyChangeMultipleKeys Unimplemented optional parameter
010c:fixme:reg:NtNotifyChangeMultipleKeys Unimplemented optional parameter
010c:fixme:reg:NtNotifyChangeMultipleKeys Unimplemented optional parameter
010c:fixme:reg:NtNotifyChangeMultipleKeys Unimplemented optional parameter
0110:fixme:combase:RoActivateInstance (00007FFFFFCEFB70, 00007FFFFFCEFA78): semi-stub
0110:fixme:combase:RoGetActivationFactory (L"Windows.Management.Deployment.PackageManager", {00000035-0000-0000-c000-000000000046}, 00007FFFFFCEF978): semi-stub
0110:err:combase:RoGetActivationFactory Failed to find library for L"Windows.Management.Deployment.PackageManager"
010c:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (000000000021FC20 1 C) semi-stub
010c:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (000000000021FC80 1 C) semi-stub
ActCCMessage Already in Table: Code= c005, Mode= 13, Level=  1, CmdKey= -1, Option= 0
ActCCMessage Already in Table: Code= c006, Mode= 13, Level=  1, CmdKey= -1, Option= 0
ActCCMessage Already in Table: Code= c007, Mode= 13, Level=  1, CmdKey= -1, Option= 0
ActCCMessage Already in Table: Code= 2282, Mode=  0, Level=  0, CmdKey= 8, Option= 0
PnlMsgActionStringAdapter Already in Table: Code= 615e, Mode=  0, Level=  0, CmdKey= -1, Option= 0
010c:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (000000000021F840 1 C) semi-stub
010c:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (000000000021F480 1 C) semi-stub
18.5.0b.0016 Windows/MSVC x86_64
Main thread starts: 0000010C
QCoreApplication::applicationDirPath: Please instantiate the QApplication object first
[0x0000010c] | Undefined            | INFO  | 2023-05-01 16:42:56,948 | --------------------------------------------------------------------------------
010c:fixme:file:NtLockFile I/O completion on lock not implemented yet
[0x0000010c] | Undefined            | INFO  | 2023-05-01 16:42:56,948 | Loaded log config from C:\users\andrey\AppData\Roaming\Blackmagic Design\DaVinci Resolve\Preferences\log-conf.xml
[0x0000010c] | Undefined            | INFO  | 2023-05-01 16:42:56,948 | --------------------------------------------------------------------------------
010c:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (000000000021BCF0 1 C) semi-stub
010c:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (000000000021BCD0 1 C) semi-stub
010c:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
m Files\Blackmagic Design\DaVinci Resolve\Resolve.exe: /usr/src/debug/rocm-opencl-runtime/ROCclr-rocm-5.4.3/os/os_posix.cpp:305: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
010c:err:seh:call_stack_handlers invalid frame 000000000011F270 (0000000000122000-0000000000220000)
010c:err:seh:NtRaiseException Exception frame is not in stack limits => unable to dispatch exception.


leavelet commented Sep 4, 2023

Same problem on loongarch64.
