CPU-X does not start anymore since 4.5.0 #246

Closed
ptr1337 opened this issue Oct 12, 2022 · 25 comments

Comments

@ptr1337

ptr1337 commented Oct 12, 2022

Describe the bug/Expected behavior

Additional information

  • Operating system name and version: CachyOS (Arch Linux based)
  • CPU-X installation type: tested the stable build and the -git build from the AUR; both show the same behavior

Bug information

========================= Backtrace =========================
❯ cpu-x --verbose
Setting label names
Calling libcpuid for retrieving static data
Finding CPU technology
Finding devices
Finding Vulkan API version
CPU-X:core.c:1215: failed to find number of OpenCL devices for platform 'Clover OpenCL 1.1 Mesa 22.2.1' (CL_DEVICE_NOT_FOUND)
Identifying running system
fish: Job 1, 'cpu-x --verbose' terminated by signal SIGSEGV (Address boundary error)

======================== End Backtrace =======================

You can open a new issue here, by filling the template as requested:
https://github.com/X0rg/CPU-X/issues/new?template=bug_report.md

@ptr1337
Author

ptr1337 commented Oct 12, 2022

cpu-x.log

@TheTumultuousUnicornOfDarkness
Owner

It seems it finds 2 OpenCL platforms:

  • the first one (NVIDIA CUDA OpenCL 3.0 CUDA 11.8.87) is properly detected
  • the second one (Clover OpenCL 1.1 Mesa 22.2.1) fails with CL_DEVICE_NOT_FOUND

Then it crashes later, but the backtrace output overlaps with other messages, which is weird. The segmentation fault appears in libnvidia-opencl.so. Maybe a threading issue.
@Umio-Yasuno is it normal that it loops until num_ocl_dev, even if compute units are found?

@TheTumultuousUnicornOfDarkness
Owner

@ptr1337 if I understand correctly, you did not face this issue in v4.4.0?

@ptr1337
Author

ptr1337 commented Oct 12, 2022

@ptr1337 if I understand correctly, you did not face this issue in v4.4.0?

Yes, correct. I compiled the 4.4 version again and started it with --verbose:

❯ cpu-x --verbose
Setting label names
Calling libcpuid for retrieving static data
Finding CPU technology
Finding devices
CPU-X:core.c:841: failed to call GLFW (65544): Wayland: Failed to connect to display
Vulkan API is not available
CPU-X:core.c:1112: failed to find number of OpenCL devices for platform 'Clover OpenCL 1.1 Mesa 22.2.1' (CL_DEVICE_NOT_FOUND)
Identifying running system
Finding CPU package in fallback mode
Retrieving motherboard information in fallback mode
Calling libcpuid for retrieving dynamic data
Calculating CPU usage
Retrieving CPU temperature in fallback mode
Retrieving CPU voltage in fallback mode
CPU-X:core.c:2140: failed to retrieve CPU voltage (fallback mode)
Calling bandwidth
Calling libprocps
Retrieving GPU clocks
Updating benchmark status
Starting GTK GUI…
Freeing memory
Calling libcpuid for retrieving dynamic data

It seems to be a regression between 4.4.0 and 4.5.0.
I have the latest NVIDIA driver (520) installed, as well as Mesa. I have a GTX 1070 Ti and an AMD 5900X.

@TheTumultuousUnicornOfDarkness
Owner

Hmmm, when I check the changes in the get_gpu_comp_unit() function between v4.4.0 and v4.5.0 (v4.4.0...v4.5.0), nothing major changed here... Only 3ff4a60, but it does not seem relevant. The error is somewhere else.

Can you try again to build and install the cpu-x-git package from the AUR, along with the valgrind package? Then, please run valgrind cpu-x --issue-fmt and provide the output.

@ptr1337
Author

ptr1337 commented Oct 12, 2022

Hmmm, when I check the changes in the get_gpu_comp_unit() function between v4.4.0 and v4.5.0 (v4.4.0...v4.5.0), nothing major changed here... Only 3ff4a60, but it does not seem relevant. The error is somewhere else.

Can you try again to build and install the cpu-x-git package from the AUR, along with the valgrind package? Then, please run valgrind cpu-x --issue-fmt and provide the output.

Sure. I compiled cpu-x-git with this commit reverted, and it does not seem to help.
Here is the valgrind output of the latest cpu-x-git (I used the -s option so that the error list is not suppressed):

❯ valgrind -s cpu-x --issue-fmt
==236570== Memcheck, a memory error detector
==236570== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==236570== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==236570== Command: cpu-x --issue-fmt
==236570== 
==236570== Warning: noted but unhandled ioctl 0x6444 with no size/direction hints.
==236570==    This could cause spurious value errors to appear.
==236570==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==236570== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==236570==    This could cause spurious value errors to appear.
==236570==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==236570== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==236570==    This could cause spurious value errors to appear.
==236570==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==236570== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==236570==    This could cause spurious value errors to appear.
==236570==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==236570== Warning: noted but unhandled ioctl 0x37 with no size/direction hints.
==236570==    This could cause spurious value errors to appear.
==236570==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==236570== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==236570==    This could cause spurious value errors to appear.
==236570==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==236570== Warning: set address range perms: large range [0x200000000, 0x300200000) (noaccess)
==236570== Warning: set address range perms: large range [0x15d1b000, 0x35d1a000) (noaccess)
==236570== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
==236570==    This could cause spurious value errors to appear.
==236570==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==236570== Warning: noted but unhandled ioctl 0x18 with no size/direction hints.
==236570==    This could cause spurious value errors to appear.
==236570==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==236570== Warning: noted but unhandled ioctl 0x1a with no size/direction hints.
==236570==    This could cause spurious value errors to appear.
==236570==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==236570== Thread 6:
==236570== Jump to the invalid address stated on the next line
==236570==    at 0xD88CDEC: ???
==236570==    by 0x143A66BF: ???
==236570==    by 0x143A5BEF: ???
==236570==    by 0x63474C27: ???
==236570==    by 0x2BD77AEF: ???
==236570==    by 0x63474C27: ???
==236570==    by 0x9B295: ???
==236570==    by 0x13BA5FFF: ???
==236570==    by 0xF4FDECF: ???
==236570==    by 0xD88F03F: ???
==236570==    by 0xF4FE01F: ???
==236570==  Address 0xd88cdec is not stack'd, malloc'd or (recently) free'd
==236570== 
==236570== Invalid read of size 1
==236570==    at 0x6BA1088: x86_64_fallback_frame_state (md-unwind-support.h:63)
==236570==    by 0x6BA1088: uw_frame_state_for (unwind-dw2.c:1271)
==236570==    by 0x6BA2FFA: _Unwind_Backtrace (unwind.inc:303)
==236570==    by 0x5C52FF2: backtrace (backtrace.c:78)
==236570==    by 0x40B865: common_sighandler (main.c:672)
==236570==    by 0x5B583EF: ??? (in /usr/lib/libc.so.6)
==236570==    by 0xD88CDEB: ???
==236570==    by 0x143A66BF: ???
==236570==    by 0x143A5BEF: ???
==236570==    by 0x63474C27: ???
==236570==    by 0x2BD77AEF: ???
==236570==    by 0x63474C27: ???
==236570==    by 0x9B295: ???
==236570==  Address 0xd88cdec is not stack'd, malloc'd or (recently) free'd
==236570== 
==236570== 
==236570== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==236570==  Access not within mapped region at address 0xD88CDEC
==236570==    at 0x6BA1088: x86_64_fallback_frame_state (md-unwind-support.h:63)
==236570==    by 0x6BA1088: uw_frame_state_for (unwind-dw2.c:1271)
==236570==    by 0x6BA2FFA: _Unwind_Backtrace (unwind.inc:303)
==236570==    by 0x5C52FF2: backtrace (backtrace.c:78)
==236570==    by 0x40B865: common_sighandler (main.c:672)
==236570==    by 0x5B583EF: ??? (in /usr/lib/libc.so.6)
==236570==    by 0xD88CDEB: ???
==236570==    by 0x143A66BF: ???
==236570==    by 0x143A5BEF: ???
==236570==    by 0x63474C27: ???
==236570==    by 0x2BD77AEF: ???
==236570==    by 0x63474C27: ???
==236570==    by 0x9B295: ???
==236570==  If you believe this happened as a result of a stack
==236570==  overflow in your program's main thread (unlikely but
==236570==  possible), you can try to increase the size of the
==236570==  main thread stack using the --main-stacksize= flag.
==236570==  The main thread stack size used in this run was 8388608.
==236570== 
==236570== HEAP SUMMARY:
==236570==     in use at exit: 5,536,844 bytes in 6,935 blocks
==236570==   total heap usage: 38,850 allocs, 31,915 frees, 3,179,112,461 bytes allocated
==236570== 
==236570== LEAK SUMMARY:
==236570==    definitely lost: 187,136 bytes in 15 blocks
==236570==    indirectly lost: 178,354 bytes in 1,147 blocks
==236570==      possibly lost: 55,156 bytes in 767 blocks
==236570==    still reachable: 5,099,374 bytes in 4,834 blocks
==236570==         suppressed: 0 bytes in 0 blocks
==236570== Rerun with --leak-check=full to see details of leaked memory
==236570== 
==236570== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
==236570== 
==236570== 1 errors in context 1 of 2:
==236570== Invalid read of size 1
==236570==    at 0x6BA1088: x86_64_fallback_frame_state (md-unwind-support.h:63)
==236570==    by 0x6BA1088: uw_frame_state_for (unwind-dw2.c:1271)
==236570==    by 0x6BA2FFA: _Unwind_Backtrace (unwind.inc:303)
==236570==    by 0x5C52FF2: backtrace (backtrace.c:78)
==236570==    by 0x40B865: common_sighandler (main.c:672)
==236570==    by 0x5B583EF: ??? (in /usr/lib/libc.so.6)
==236570==    by 0xD88CDEB: ???
==236570==    by 0x143A66BF: ???
==236570==    by 0x143A5BEF: ???
==236570==    by 0x63474C27: ???
==236570==    by 0x2BD77AEF: ???
==236570==    by 0x63474C27: ???
==236570==    by 0x9B295: ???
==236570==  Address 0xd88cdec is not stack'd, malloc'd or (recently) free'd
==236570== 
==236570== 
==236570== 1 errors in context 2 of 2:
==236570== Jump to the invalid address stated on the next line
==236570==    at 0xD88CDEC: ???
==236570==    by 0x143A66BF: ???
==236570==    by 0x143A5BEF: ???
==236570==    by 0x63474C27: ???
==236570==    by 0x2BD77AEF: ???
==236570==    by 0x63474C27: ???
==236570==    by 0x9B295: ???
==236570==    by 0x13BA5FFF: ???
==236570==    by 0xF4FDECF: ???
==236570==    by 0xD88F03F: ???
==236570==    by 0xF4FE01F: ???
==236570==  Address 0xd88cdec is not stack'd, malloc'd or (recently) free'd
==236570== 
==236570== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
fish: Job 1, 'valgrind -s cpu-x --issue-fmt' terminated by signal SIGSEGV (Address boundary error)

@ptr1337
Author

ptr1337 commented Oct 12, 2022

@x0rg

I've bisected the breaking commit.
This commit is the one that actually breaks CPU-X for me:
44f209d
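
For anyone who wants to reproduce the bisection, here is a minimal sketch. It assumes a local clone of the CPU-X repository that has the v4.4.0 and v4.5.0 tags, and a POSIX shell such as sh or bash (the fish shell used above needs different function syntax). The helper name start_bisect is hypothetical; it is not part of git or CPU-X.

```shell
# Hypothetical helper: start a bisect between a known-good and a known-bad ref.
# Must be run from inside a git checkout.
start_bisect() {
    good_ref="$1"
    bad_ref="$2"
    git bisect start &&
    git bisect bad "$bad_ref" &&
    git bisect good "$good_ref"
}

# Usage, inside a CPU-X checkout:
#   start_bisect v4.4.0 v4.5.0
# Then, for each commit git checks out: build, run `cpu-x --verbose`, and mark it:
#   git bisect good    # this build starts normally
#   git bisect bad     # this build segfaults
# When git reports the first bad commit, return to the original branch:
#   git bisect reset
```

Each good/bad answer halves the remaining range of commits, so only a handful of builds are needed between the two releases.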

@Umio-Yasuno
Contributor

Could you please post the result of cpu-x --debug -v?
Does this occur when the OCL_ICD_VENDORS environment variable is set to only the NVIDIA driver or only Mesa Clover?
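
One possible way to run that check is sketched below. It assumes the ocl-icd loader, which accepts either a directory or a single .icd file in OCL_ICD_VENDORS; other loaders may behave differently. The helper name run_with_icd is hypothetical, and the ICD file names in the usage comments are assumptions: list /etc/OpenCL/vendors to see which files are actually installed. The syntax is POSIX sh/bash, not fish.

```shell
# Hypothetical helper: run a command with the OpenCL ICD loader restricted
# to a single ICD description file.
run_with_icd() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_with_icd ICD_FILE COMMAND [ARGS...]" >&2
        return 2
    fi
    icd_file="$1"
    shift
    # The variable is set only for this one command, not for the whole shell.
    OCL_ICD_VENDORS="$icd_file" "$@"
}

# Usage (file names are assumptions, check /etc/OpenCL/vendors first):
#   run_with_icd /etc/OpenCL/vendors/nvidia.icd cpu-x --verbose
#   run_with_icd /etc/OpenCL/vendors/mesa.icd   cpu-x --verbose
```

If the crash only happens with one of the two files, that narrows down which OpenCL platform is involved.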

@TheTumultuousUnicornOfDarkness
Owner

Could you please post the result of cpu-x --debug -v?

That is the content of cpu-x.log file. 😉

@ptr1337 thank you for bisecting. It points to the Vulkan code, but the Valgrind output is unfortunately not helpful.
Can you please retry with valgrind -s cpu-x -v --debug? I hope it will help find the root cause of the "Jump to the invalid address" error.
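
A possible way to capture that run is sketched below. It writes Valgrind's report to its own file so that it does not interleave with CPU-X's messages on the terminal. --log-file is a standard Valgrind option; the helper name run_under_valgrind and the log file name are hypothetical. The syntax is POSIX sh/bash, not fish.

```shell
# Hypothetical helper: run a command under Valgrind with the error list
# enabled (-s) and the report written to a separate log file.
run_under_valgrind() {
    log_file="$1"
    shift
    valgrind -s --log-file="$log_file" "$@"
}

# Usage:
#   run_under_valgrind valgrind-cpu-x.log cpu-x -v --debug
# CPU-X's own output stays on the terminal; attach valgrind-cpu-x.log to the issue.
```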

@Umio-Yasuno
Contributor

Could you please post the result of cpu-x --debug -v?

That is the content of cpu-x.log file. 😉

Thanks.

@ptr1337 Could you please post the results of clinfo and cpu-x --debug -v (CPU-X v4.4.0 & v4.5.0)?

@ptr1337
Author

ptr1337 commented Oct 13, 2022

@Umio-Yasuno
cpu-x --debug -v (4.4.0)

Setting label names
Calling libcpuid for retrieving static data
change_current_core_id(type=0) ==> 0
Finding CPU technology
cpu_technology: model   1, ext. model  33, ext. family  25 => values to find
cpu_technology: model   0, ext. model  16, ext. family  21 => entry #000 does not match
cpu_technology: model   0, ext. model  48, ext. family  21 => entry #001 does not match
cpu_technology: model   0, ext. model 112, ext. family  21 => entry #002 does not match
cpu_technology: model   0, ext. model  -1, ext. family  22 => entry #003 does not match
cpu_technology: model   1, ext. model  -1, ext. family  18 => entry #004 does not match
cpu_technology: model   1, ext. model  -1, ext. family  20 => entry #005 does not match
cpu_technology: model   1, ext. model   1, ext. family  21 => entry #006 does not match
cpu_technology: model   1, ext. model  96, ext. family  21 => entry #007 does not match
cpu_technology: model   2, ext. model  -1, ext. family  16 => entry #008 does not match
cpu_technology: model   2, ext. model  -1, ext. family  20 => entry #009 does not match
cpu_technology: model   2, ext. model  -1, ext. family  21 => entry #010 does not match
cpu_technology: model   3, ext. model  -1, ext. family  15 => entry #011 does not match
cpu_technology: model   3, ext. model  -1, ext. family  21 => entry #012 does not match
cpu_technology: model   4, ext. model  -1, ext. family  15 => entry #013 does not match
cpu_technology: model   4, ext. model  -1, ext. family  16 => entry #014 does not match
cpu_technology: model   5, ext. model  -1, ext. family  16 => entry #015 does not match
cpu_technology: model   5, ext. model  -1, ext. family  21 => entry #016 does not match
cpu_technology: model   6, ext. model  -1, ext. family  16 => entry #017 does not match
cpu_technology: model   8, ext. model  -1, ext. family   6 => entry #018 does not match
cpu_technology: model   8, ext. model  -1, ext. family  15 => entry #019 does not match
cpu_technology: model   8, ext. model  -1, ext. family  21 => entry #020 does not match
cpu_technology: model   9, ext. model  -1, ext. family  16 => entry #021 does not match
cpu_technology: model  10, ext. model  -1, ext. family   6 => entry #022 does not match
cpu_technology: model  10, ext. model  -1, ext. family  16 => entry #023 does not match
cpu_technology: model  11, ext. model  -1, ext. family  15 => entry #024 does not match
cpu_technology: model  12, ext. model  -1, ext. family  15 => entry #025 does not match
cpu_technology: model  15, ext. model  79, ext. family  15 => entry #026 does not match
cpu_technology: model  15, ext. model 127, ext. family  15 => entry #027 does not match
cpu_technology: model  -1, ext. model   1, ext. family  23 => entry #028 does not match
cpu_technology: model  -1, ext. model  17, ext. family  23 => entry #029 does not match
cpu_technology: model  -1, ext. model   8, ext. family  23 => entry #030 does not match
cpu_technology: model  -1, ext. model  32, ext. family  23 => entry #031 does not match
cpu_technology: model  -1, ext. model  24, ext. family  23 => entry #032 does not match
cpu_technology: model  -1, ext. model  49, ext. family  23 => entry #033 does not match
cpu_technology: model  -1, ext. model  96, ext. family  23 => entry #034 does not match
cpu_technology: model  -1, ext. model 104, ext. family  23 => entry #035 does not match
cpu_technology: model  -1, ext. model 113, ext. family  23 => entry #036 does not match
cpu_technology: model  -1, ext. model 144, ext. family  23 => entry #037 does not match
cpu_technology: model  -1, ext. model  33, ext. family  25 => entry #038 matches
Finding devices
find_gpu_device_path: device_path=/sys/bus/pci/devices/0000:07:00.0
CPU-X:core.c:859: failed to call GLFW (65544): Wayland: Failed to connect to display
Finding Vulkan API version
Vulkan API is not available
Number of OpenCL platforms: 2
Number of OpenCL devices for platform 'NVIDIA CUDA OpenCL 3.0 CUDA 11.8.87': 1
OpenCL device 0 is 'NVIDIA GeForce GTX 1070 Ti OpenCL 3.0 CUDA'
CPU-X:core.c:1253: failed to find number of OpenCL devices for platform 'Clover OpenCL 1.1 Mesa 22.2.1' (CL_DEVICE_NOT_FOUND)
Identifying running system
Finding CPU package in fallback mode
Retrieving motherboard information in fallback mode
Calling libcpuid for retrieving dynamic data
Calculating CPU usage
Retrieving CPU temperature in fallback mode
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon1/temp1_input, which=0) ==> 0
Retrieving CPU voltage in fallback mode
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon1/in1_input, which=2) ==> 0
Calling bandwidth
Calling libprocps
Retrieving GPU clocks
gpu_monitoring: nvidia: nvidia_cmd_args=nvidia-smi --format=csv,noheader,nounits --id=0
Updating benchmark status
Starting GTK GUI…
Calling libcpuid for retrieving dynamic data
Calculating CPU usage
Retrieving CPU temperature in fallback mode
Retrieving CPU voltage in fallback mode

cpu-x --debug -v (4.5.0)

❯ cpu-x --debug -v
Setting label names
Calling libcpuid for retrieving static data
change_current_core_id(type=0) ==> 0
Finding CPU technology
cpu_technology: model   1, ext. model  33, ext. family  25 => values to find
cpu_technology: model   0, ext. model  16, ext. family  21 => entry #000 does not match
cpu_technology: model   0, ext. model  48, ext. family  21 => entry #001 does not match
cpu_technology: model   0, ext. model 112, ext. family  21 => entry #002 does not match
cpu_technology: model   0, ext. model  -1, ext. family  22 => entry #003 does not match
cpu_technology: model   1, ext. model  -1, ext. family  18 => entry #004 does not match
cpu_technology: model   1, ext. model  -1, ext. family  20 => entry #005 does not match
cpu_technology: model   1, ext. model   1, ext. family  21 => entry #006 does not match
cpu_technology: model   1, ext. model  96, ext. family  21 => entry #007 does not match
cpu_technology: model   2, ext. model  -1, ext. family  16 => entry #008 does not match
cpu_technology: model   2, ext. model  -1, ext. family  20 => entry #009 does not match
cpu_technology: model   2, ext. model  -1, ext. family  21 => entry #010 does not match
cpu_technology: model   3, ext. model  -1, ext. family  15 => entry #011 does not match
cpu_technology: model   3, ext. model  -1, ext. family  21 => entry #012 does not match
cpu_technology: model   4, ext. model  -1, ext. family  15 => entry #013 does not match
cpu_technology: model   4, ext. model  -1, ext. family  16 => entry #014 does not match
cpu_technology: model   5, ext. model  -1, ext. family  16 => entry #015 does not match
cpu_technology: model   5, ext. model  -1, ext. family  21 => entry #016 does not match
cpu_technology: model   6, ext. model  -1, ext. family  16 => entry #017 does not match
cpu_technology: model   8, ext. model  -1, ext. family   6 => entry #018 does not match
cpu_technology: model   8, ext. model  -1, ext. family  15 => entry #019 does not match
cpu_technology: model   8, ext. model  -1, ext. family  21 => entry #020 does not match
cpu_technology: model   9, ext. model  -1, ext. family  16 => entry #021 does not match
cpu_technology: model  10, ext. model  -1, ext. family   6 => entry #022 does not match
cpu_technology: model  10, ext. model  -1, ext. family  16 => entry #023 does not match
cpu_technology: model  11, ext. model  -1, ext. family  15 => entry #024 does not match
cpu_technology: model  12, ext. model  -1, ext. family  15 => entry #025 does not match
cpu_technology: model  15, ext. model  79, ext. family  15 => entry #026 does not match
cpu_technology: model  15, ext. model 127, ext. family  15 => entry #027 does not match
cpu_technology: model  -1, ext. model   1, ext. family  23 => entry #028 does not match
cpu_technology: model  -1, ext. model  17, ext. family  23 => entry #029 does not match
cpu_technology: model  -1, ext. model   8, ext. family  23 => entry #030 does not match
cpu_technology: model  -1, ext. model  32, ext. family  23 => entry #031 does not match
cpu_technology: model  -1, ext. model  24, ext. family  23 => entry #032 does not match
cpu_technology: model  -1, ext. model  49, ext. family  23 => entry #033 does not match
cpu_technology: model  -1, ext. model  96, ext. family  23 => entry #034 does not match
cpu_technology: model  -1, ext. model 104, ext. family  23 => entry #035 does not match
cpu_technology: model  -1, ext. model 113, ext. family  23 => entry #036 does not match
cpu_technology: model  -1, ext. model 144, ext. family  23 => entry #037 does not match
cpu_technology: model  -1, ext. model  33, ext. family  25 => entry #038 matches
Finding devices
find_gpu_device_path: device_path=/sys/bus/pci/devices/0000:07:00.0
CPU-X:core.c:858: failed to call GLFW (65544): Wayland: Failed to connect to display
Finding Vulkan API version
Vulkan devices count: 1
Number of OpenCL platforms: 2
Number of OpenCL devices for platform 'NVIDIA CUDA OpenCL 3.0 CUDA 11.8.87': 1
OpenCL device 0 is 'NVIDIA GeForce GTX 1070 Ti OpenCL 3.0 CUDA'
CPU-X:core.c:1215: failed to find number of OpenCL devices for platform 'Clover OpenCL 1.1 Mesa 22.2.1' (CL_DEVICE_NOT_FOUND)
Identifying running system
Finding CPU package in fallback mode
Retrieving motherboard information in fallback mode
Calling libcpuid for retrieving dynamic data
Calculating CPU usage
Retrieving CPU temperature in fallback mode
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon1/temp1_input, which=0) ==> 0
Retrieving CPU voltage in fallback mode
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon1/in1_input, which=2) ==> 0
Calling bandwidth
fish: Job 1, 'cpu-x --debug -v' terminated by signal SIGSEGV (Address boundary error)

clinfo:

❯ clinfo
Number of platforms                               2
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 3.0 CUDA 11.8.87
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
  Platform Extensions with Version                cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)
                                                  cl_khr_external_semaphore                                          0x9000 (0.9.0)
                                                  cl_khr_external_memory                                             0x9000 (0.9.0)
                                                  cl_khr_external_semaphore_opaque_fd                                0x9000 (0.9.0)
                                                  cl_khr_external_memory_opaque_fd                                   0x9000 (0.9.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             NV
  Platform Host timer resolution                  0ns

  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 22.2.1
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     NVIDIA GeForce GTX 1070 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     a486b376-22e8-7c2d-cbff-dbc4d1e3a57d
  Driver UUID                                     a486b376-22e8-7c2d-cbff-dbc4d1e3a57d
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  520.56.06
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:07:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               19
  Max clock frequency                             1683MHz
  Compute Capability (NV)                         6.1
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
=== CL_PROGRAM_BUILD_LOG ===
  Preferred work group size multiple (kernel)     <getWGsizes:1504: create kernel : error -45>
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              8510963712 (7.926GiB)
  Error Correction support                        No
  Max memory allocation                           2127740928 (1.982GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        933888 (912KiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                16
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  2
    IL version                                    (n/a)
    ILs with version                              <printDeviceInfo:186: get CL_DEVICE_ILS_WITH_VERSION : error -30>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <printDeviceInfo:190: get CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION : error -30>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)
                                                  cl_khr_external_semaphore                                          0x9000 (0.9.0)
                                                  cl_khr_external_memory                                             0x9000 (0.9.0)
                                                  cl_khr_external_semaphore_opaque_fd                                0x9000 (0.9.0)
                                                  cl_khr_external_memory_opaque_fd                                   0x9000 (0.9.0)

  Platform Name                                   Clover
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
  clCreateContext(NULL, ...) [default]            Success [NV]
  clCreateContext(NULL, ...) [other]              
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.1
  ICD loader Profile                              OpenCL 3.0

@ptr1337
Author

ptr1337 commented Oct 13, 2022

Please share the result of cpu-x --debug -v?

That is the content of the cpu-x.log file.

@ptr1337 thank you for bisecting. It points to the Vulkan code here, but unfortunately the Valgrind output is not helpful. Can you please retry with valgrind -s cpu-x -v --debug? I hope it will help to find the root cause of the "Jump to the invalid address" error.

❯ valgrind -s cpu-x -v --debug
==7331== Memcheck, a memory error detector
==7331== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==7331== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==7331== Command: cpu-x -v --debug
==7331== 
Setting label names
Calling libcpuid for retrieving static data
change_current_core_id(type=0) ==> 0
Finding CPU technology
cpu_technology: model  12, ext. model  60, ext. family   6 => values to find
cpu_technology: model   0, ext. model   0, ext. family  -1 => entry #000 does not match
cpu_technology: model   1, ext. model   1, ext. family   6 => entry #001 does not match
cpu_technology: model   1, ext. model   1, ext. family  15 => entry #002 does not match
cpu_technology: model   2, ext. model   2, ext. family  -1 => entry #003 does not match
cpu_technology: model   3, ext. model   3, ext. family   5 => entry #004 does not match
cpu_technology: model   3, ext. model   3, ext. family   6 => entry #005 does not match
cpu_technology: model   3, ext. model   3, ext. family  15 => entry #006 does not match
cpu_technology: model   4, ext. model   4, ext. family  -1 => entry #007 does not match
cpu_technology: model   5, ext. model   5, ext. family   6 => entry #008 does not match
cpu_technology: model   5, ext. model  37, ext. family  -1 => entry #009 does not match
cpu_technology: model   5, ext. model  53, ext. family  -1 => entry #010 does not match
cpu_technology: model   5, ext. model  69, ext. family  -1 => entry #011 does not match
cpu_technology: model   5, ext. model  85, ext. family  -1 => entry #012 does not match
cpu_technology: model   5, ext. model 165, ext. family   6 => entry #013 does not match
cpu_technology: model   6, ext. model   6, ext. family   6 => entry #014 does not match
cpu_technology: model   6, ext. model   6, ext. family  15 => entry #015 does not match
cpu_technology: model   6, ext. model  22, ext. family  -1 => entry #016 does not match
cpu_technology: model   6, ext. model  54, ext. family  -1 => entry #017 does not match
cpu_technology: model   6, ext. model  70, ext. family  -1 => entry #018 does not match
cpu_technology: model   6, ext. model 102, ext. family  -1 => entry #019 does not match
cpu_technology: model   7, ext. model   7, ext. family  -1 => entry #020 does not match
cpu_technology: model   7, ext. model  23, ext. family  -1 => entry #021 does not match
cpu_technology: model   7, ext. model  55, ext. family  -1 => entry #022 does not match
cpu_technology: model   7, ext. model  71, ext. family  -1 => entry #023 does not match
cpu_technology: model   7, ext. model 151, ext. family  -1 => entry #024 does not match
cpu_technology: model   7, ext. model 167, ext. family  -1 => entry #025 does not match
cpu_technology: model   8, ext. model   0, ext. family   0 => entry #026 does not match
cpu_technology: model   8, ext. model   8, ext. family  -1 => entry #027 does not match
cpu_technology: model   9, ext. model   9, ext. family  -1 => entry #028 does not match
cpu_technology: model  10, ext. model  26, ext. family  -1 => entry #029 does not match
cpu_technology: model  10, ext. model  30, ext. family  -1 => entry #030 does not match
cpu_technology: model  10, ext. model  42, ext. family  -1 => entry #031 does not match
cpu_technology: model  10, ext. model  58, ext. family  -1 => entry #032 does not match
cpu_technology: model  10, ext. model 122, ext. family  -1 => entry #033 does not match
cpu_technology: model  10, ext. model 154, ext. family  -1 => entry #034 does not match
cpu_technology: model  11, ext. model  11, ext. family  -1 => entry #035 does not match
cpu_technology: model  12, ext. model  28, ext. family  -1 => entry #036 does not match
cpu_technology: model  12, ext. model  44, ext. family  -1 => entry #037 does not match
cpu_technology: model  12, ext. model  60, ext. family  -1 => entry #038 matches
Finding devices
find_gpu_device_path: device_path=/sys/bus/pci/devices/0000:07:00.0
CPU-X:core.c:858: failed to call GLFW (65544): Wayland: Failed to connect to display
Finding Vulkan API version
Vulkan devices count: 1
==7331== Warning: noted but unhandled ioctl 0x6444 with no size/direction hints.
==7331==    This could cause spurious value errors to appear.
==7331==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==7331== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==7331==    This could cause spurious value errors to appear.
==7331==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==7331== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==7331==    This could cause spurious value errors to appear.
==7331==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==7331== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==7331==    This could cause spurious value errors to appear.
==7331==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==7331== Warning: noted but unhandled ioctl 0x37 with no size/direction hints.
==7331==    This could cause spurious value errors to appear.
==7331==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==7331== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==7331==    This could cause spurious value errors to appear.
==7331==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==7331== Warning: set address range perms: large range [0x200000000, 0x300200000) (noaccess)
==7331== Warning: set address range perms: large range [0x1591b000, 0x3591a000) (noaccess)
==7331== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
==7331==    This could cause spurious value errors to appear.
==7331==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==7331== Warning: noted but unhandled ioctl 0x18 with no size/direction hints.
==7331==    This could cause spurious value errors to appear.
==7331==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==7331== Warning: noted but unhandled ioctl 0x1a with no size/direction hints.
==7331==    This could cause spurious value errors to appear.
==7331==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==7331== Thread 6:
==7331== Jump to the invalid address stated on the next line
==7331==    at 0xD88CDEC: ???
==7331==    by 0x13BA66BF: ???
==7331==    by 0x13BA5BEF: ???
==7331==    by 0x6347E81A: ???
==7331==    by 0x1136A58F: ???
==7331==    by 0x6347E81A: ???
==7331==    by 0x2E179: ???
==7331==    by 0x133A5FFF: ???
==7331==    by 0xF88AF8F: ???
==7331==    by 0xD88F03F: ???
==7331==    by 0xF88B0DF: ???
==7331==  Address 0xd88cdec is not stack'd, malloc'd or (recently) free'd
==7331== 
==7331== Invalid read of size 1
==7331==    at 0x6BA1088: x86_64_fallback_frame_state (md-unwind-support.h:63)
==7331==    by 0x6BA1088: uw_frame_state_for (unwind-dw2.c:1271)
==7331==    by 0x6BA2FFA: _Unwind_Backtrace (unwind.inc:303)
==7331==    by 0x5C52FF2: backtrace (backtrace.c:78)
==7331==    by 0x40B865: common_sighandler (main.c:672)
==7331==    by 0x5B583EF: ??? (in /usr/lib/libc.so.6)
==7331==    by 0xD88CDEB: ???
==7331==    by 0x13BA66BF: ???
==7331==    by 0x13BA5BEF: ???
==7331==    by 0x6347E81A: ???
==7331==    by 0x1136A58F: ???
==7331==    by 0x6347E81A: ???
==7331==    by 0x2E179: ???
==7331==  Address 0xd88cdec is not stack'd, malloc'd or (recently) free'd
==7331== 
==7331== 
==7331== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==7331==  Access not within mapped region at address 0xD88CDEC
==7331==    at 0x6BA1088: x86_64_fallback_frame_state (md-unwind-support.h:63)
==7331==    by 0x6BA1088: uw_frame_state_for (unwind-dw2.c:1271)
==7331==    by 0x6BA2FFA: _Unwind_Backtrace (unwind.inc:303)
==7331==    by 0x5C52FF2: backtrace (backtrace.c:78)
==7331==    by 0x40B865: common_sighandler (main.c:672)
==7331==    by 0x5B583EF: ??? (in /usr/lib/libc.so.6)
==7331==    by 0xD88CDEB: ???
==7331==    by 0x13BA66BF: ???
==7331==    by 0x13BA5BEF: ???
==7331==    by 0x6347E81A: ???
==7331==    by 0x1136A58F: ???
==7331==    by 0x6347E81A: ???
==7331==    by 0x2E179: ???
==7331==  If you believe this happened as a result of a stack
==7331==  overflow in your program's main thread (unlikely but
==7331==  possible), you can try to increase the size of the
==7331==  main thread stack using the --main-stacksize= flag.
==7331==  The main thread stack size used in this run was 8388608.
==7331== 
==7331== HEAP SUMMARY:
==7331==     in use at exit: 5,537,191 bytes in 6,897 blocks
==7331==   total heap usage: 39,180 allocs, 32,283 frees, 3,173,858,875 bytes allocated
==7331== 
==7331== LEAK SUMMARY:
==7331==    definitely lost: 187,136 bytes in 15 blocks
==7331==    indirectly lost: 178,323 bytes in 1,147 blocks
==7331==      possibly lost: 55,235 bytes in 770 blocks
==7331==    still reachable: 5,099,673 bytes in 4,793 blocks
==7331==         suppressed: 0 bytes in 0 blocks
==7331== Rerun with --leak-check=full to see details of leaked memory
==7331== 
==7331== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
==7331== 
==7331== 1 errors in context 1 of 2:
==7331== Invalid read of size 1
==7331==    at 0x6BA1088: x86_64_fallback_frame_state (md-unwind-support.h:63)
==7331==    by 0x6BA1088: uw_frame_state_for (unwind-dw2.c:1271)
==7331==    by 0x6BA2FFA: _Unwind_Backtrace (unwind.inc:303)
==7331==    by 0x5C52FF2: backtrace (backtrace.c:78)
==7331==    by 0x40B865: common_sighandler (main.c:672)
==7331==    by 0x5B583EF: ??? (in /usr/lib/libc.so.6)
==7331==    by 0xD88CDEB: ???
==7331==    by 0x13BA66BF: ???
==7331==    by 0x13BA5BEF: ???
==7331==    by 0x6347E81A: ???
==7331==    by 0x1136A58F: ???
==7331==    by 0x6347E81A: ???
==7331==    by 0x2E179: ???
==7331==  Address 0xd88cdec is not stack'd, malloc'd or (recently) free'd
==7331== 
==7331== 
==7331== 1 errors in context 2 of 2:
==7331== Jump to the invalid address stated on the next line
==7331==    at 0xD88CDEC: ???
==7331==    by 0x13BA66BF: ???
==7331==    by 0x13BA5BEF: ???
==7331==    by 0x6347E81A: ???
==7331==    by 0x1136A58F: ???
==7331==    by 0x6347E81A: ???
==7331==    by 0x2E179: ???
==7331==    by 0x133A5FFF: ???
==7331==    by 0xF88AF8F: ???
==7331==    by 0xD88F03F: ???
==7331==    by 0xF88B0DF: ???
==7331==  Address 0xd88cdec is not stack'd, malloc'd or (recently) free'd
==7331== 
==7331== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
fish: Job 1, 'valgrind -s cpu-x -v --debug' terminated by signal SIGSEGV (Address boundary error)

@Umio-Yasuno
Contributor

Does this also occur when the OCL_ICD_VENDORS environment variable is set so that only the NVIDIA driver, or only Mesa Clover, is used?

By pointing the OCL_ICD_VENDORS environment variable at a single OpenCL ICD file, you can select which platform is used when running an OpenCL program.

@ptr1337
Author

ptr1337 commented Oct 13, 2022

@Umio-Yasuno

Sorry for the dumb question, I have never done anything with OpenCL. Which program should I test?

@Umio-Yasuno
Contributor

Umio-Yasuno commented Oct 13, 2022

@ptr1337 Please run cpu-x --issue-fmt or cpu-x -D --debug -v.
e.g. OCL_ICD_VENDORS="/etc/OpenCL/vendors/nvidia.icd" cpu-x --issue-fmt

However, Arch Linux may install the ICD files in a different location.

@ptr1337
Author

ptr1337 commented Oct 13, 2022

@ptr1337 Please run cpu-x --issue-fmt or cpu-x -D --debug -v. e.g. OCL_ICD_VENDORS="/etc/OpenCL/vendors/nvidia.icd" cpu-x --issue-fmt

However, Arch Linux may install the ICD files in a different location.

Ah, then I understood it correctly the first time.

❯ OCL_ICD_VENDORS=/etc/OpenCL/vendors/nvidia.icd cpu-x --issue-fmt
fish: Job 1, 'OCL_ICD_VENDORS=/etc/OpenCL/ven…' terminated by signal SIGSEGV (Address boundary error)
❯ OCL_ICD_VENDORS=/etc/OpenCL/vendors/nvidia.icd cpu-x --debug -v
Setting label names
Calling libcpuid for retrieving static data
change_current_core_id(type=0) ==> 0
Finding CPU technology
cpu_technology: model   1, ext. model  33, ext. family  25 => values to find
cpu_technology: model   0, ext. model  16, ext. family  21 => entry #000 does not match
cpu_technology: model   0, ext. model  48, ext. family  21 => entry #001 does not match
cpu_technology: model   0, ext. model 112, ext. family  21 => entry #002 does not match
cpu_technology: model   0, ext. model  -1, ext. family  22 => entry #003 does not match
cpu_technology: model   1, ext. model  -1, ext. family  18 => entry #004 does not match
cpu_technology: model   1, ext. model  -1, ext. family  20 => entry #005 does not match
cpu_technology: model   1, ext. model   1, ext. family  21 => entry #006 does not match
cpu_technology: model   1, ext. model  96, ext. family  21 => entry #007 does not match
cpu_technology: model   2, ext. model  -1, ext. family  16 => entry #008 does not match
cpu_technology: model   2, ext. model  -1, ext. family  20 => entry #009 does not match
cpu_technology: model   2, ext. model  -1, ext. family  21 => entry #010 does not match
cpu_technology: model   3, ext. model  -1, ext. family  15 => entry #011 does not match
cpu_technology: model   3, ext. model  -1, ext. family  21 => entry #012 does not match
cpu_technology: model   4, ext. model  -1, ext. family  15 => entry #013 does not match
cpu_technology: model   4, ext. model  -1, ext. family  16 => entry #014 does not match
cpu_technology: model   5, ext. model  -1, ext. family  16 => entry #015 does not match
cpu_technology: model   5, ext. model  -1, ext. family  21 => entry #016 does not match
cpu_technology: model   6, ext. model  -1, ext. family  16 => entry #017 does not match
cpu_technology: model   8, ext. model  -1, ext. family   6 => entry #018 does not match
cpu_technology: model   8, ext. model  -1, ext. family  15 => entry #019 does not match
cpu_technology: model   8, ext. model  -1, ext. family  21 => entry #020 does not match
cpu_technology: model   9, ext. model  -1, ext. family  16 => entry #021 does not match
cpu_technology: model  10, ext. model  -1, ext. family   6 => entry #022 does not match
cpu_technology: model  10, ext. model  -1, ext. family  16 => entry #023 does not match
cpu_technology: model  11, ext. model  -1, ext. family  15 => entry #024 does not match
cpu_technology: model  12, ext. model  -1, ext. family  15 => entry #025 does not match
cpu_technology: model  15, ext. model  79, ext. family  15 => entry #026 does not match
cpu_technology: model  15, ext. model 127, ext. family  15 => entry #027 does not match
cpu_technology: model  -1, ext. model   1, ext. family  23 => entry #028 does not match
cpu_technology: model  -1, ext. model  17, ext. family  23 => entry #029 does not match
cpu_technology: model  -1, ext. model   8, ext. family  23 => entry #030 does not match
cpu_technology: model  -1, ext. model  32, ext. family  23 => entry #031 does not match
cpu_technology: model  -1, ext. model  24, ext. family  23 => entry #032 does not match
cpu_technology: model  -1, ext. model  49, ext. family  23 => entry #033 does not match
cpu_technology: model  -1, ext. model  96, ext. family  23 => entry #034 does not match
cpu_technology: model  -1, ext. model 104, ext. family  23 => entry #035 does not match
cpu_technology: model  -1, ext. model 113, ext. family  23 => entry #036 does not match
cpu_technology: model  -1, ext. model 144, ext. family  23 => entry #037 does not match
cpu_technology: model  -1, ext. model  33, ext. family  25 => entry #038 matches
Finding devices
find_gpu_device_path: device_path=/sys/bus/pci/devices/0000:07:00.0
CPU-X:core.c:858: failed to call GLFW (65544): Wayland: Failed to connect to display
Finding Vulkan API version
Vulkan devices count: 1
Number of OpenCL platforms: 1
Number of OpenCL devices for platform 'NVIDIA CUDA OpenCL 3.0 CUDA 11.8.87': 1
OpenCL device 0 is 'NVIDIA GeForce GTX 1070 Ti OpenCL 3.0 CUDA'
Identifying running system
Finding CPU package in fallback mode
Retrieving motherboard information in fallback mode
Calling libcpuid for retrieving dynamic data
Calculating CPU usage
Retrieving CPU temperature in fallback mode
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon1/temp1_input, which=0) ==> 0
Retrieving CPU voltage in fallback mode
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon1/in1_input, which=2) ==> 0
Calling bandwidth
fish: Job 1, 'OCL_ICD_VENDORS=/etc/OpenCL/ven…' terminated by signal SIGSEGV (Address boundary error)
❯ OCL_ICD_VENDORS=/etc/OpenCL/vendors/mesa.icd cpu-x --debug -v
Setting label names
Calling libcpuid for retrieving static data
change_current_core_id(type=0) ==> 0
Finding CPU technology
cpu_technology: model   1, ext. model  33, ext. family  25 => values to find
cpu_technology: model   0, ext. model  16, ext. family  21 => entry #000 does not match
cpu_technology: model   0, ext. model  48, ext. family  21 => entry #001 does not match
cpu_technology: model   0, ext. model 112, ext. family  21 => entry #002 does not match
cpu_technology: model   0, ext. model  -1, ext. family  22 => entry #003 does not match
cpu_technology: model   1, ext. model  -1, ext. family  18 => entry #004 does not match
cpu_technology: model   1, ext. model  -1, ext. family  20 => entry #005 does not match
cpu_technology: model   1, ext. model   1, ext. family  21 => entry #006 does not match
cpu_technology: model   1, ext. model  96, ext. family  21 => entry #007 does not match
cpu_technology: model   2, ext. model  -1, ext. family  16 => entry #008 does not match
cpu_technology: model   2, ext. model  -1, ext. family  20 => entry #009 does not match
cpu_technology: model   2, ext. model  -1, ext. family  21 => entry #010 does not match
cpu_technology: model   3, ext. model  -1, ext. family  15 => entry #011 does not match
cpu_technology: model   3, ext. model  -1, ext. family  21 => entry #012 does not match
cpu_technology: model   4, ext. model  -1, ext. family  15 => entry #013 does not match
cpu_technology: model   4, ext. model  -1, ext. family  16 => entry #014 does not match
cpu_technology: model   5, ext. model  -1, ext. family  16 => entry #015 does not match
cpu_technology: model   5, ext. model  -1, ext. family  21 => entry #016 does not match
cpu_technology: model   6, ext. model  -1, ext. family  16 => entry #017 does not match
cpu_technology: model   8, ext. model  -1, ext. family   6 => entry #018 does not match
cpu_technology: model   8, ext. model  -1, ext. family  15 => entry #019 does not match
cpu_technology: model   8, ext. model  -1, ext. family  21 => entry #020 does not match
cpu_technology: model   9, ext. model  -1, ext. family  16 => entry #021 does not match
cpu_technology: model  10, ext. model  -1, ext. family   6 => entry #022 does not match
cpu_technology: model  10, ext. model  -1, ext. family  16 => entry #023 does not match
cpu_technology: model  11, ext. model  -1, ext. family  15 => entry #024 does not match
cpu_technology: model  12, ext. model  -1, ext. family  15 => entry #025 does not match
cpu_technology: model  15, ext. model  79, ext. family  15 => entry #026 does not match
cpu_technology: model  15, ext. model 127, ext. family  15 => entry #027 does not match
cpu_technology: model  -1, ext. model   1, ext. family  23 => entry #028 does not match
cpu_technology: model  -1, ext. model  17, ext. family  23 => entry #029 does not match
cpu_technology: model  -1, ext. model   8, ext. family  23 => entry #030 does not match
cpu_technology: model  -1, ext. model  32, ext. family  23 => entry #031 does not match
cpu_technology: model  -1, ext. model  24, ext. family  23 => entry #032 does not match
cpu_technology: model  -1, ext. model  49, ext. family  23 => entry #033 does not match
cpu_technology: model  -1, ext. model  96, ext. family  23 => entry #034 does not match
cpu_technology: model  -1, ext. model 104, ext. family  23 => entry #035 does not match
cpu_technology: model  -1, ext. model 113, ext. family  23 => entry #036 does not match
cpu_technology: model  -1, ext. model 144, ext. family  23 => entry #037 does not match
cpu_technology: model  -1, ext. model  33, ext. family  25 => entry #038 matches
Finding devices
find_gpu_device_path: device_path=/sys/bus/pci/devices/0000:07:00.0
CPU-X:core.c:858: failed to call GLFW (65544): Wayland: Failed to connect to display
Finding Vulkan API version
Vulkan devices count: 1
Number of OpenCL platforms: 1
CPU-X:core.c:1215: failed to find number of OpenCL devices for platform 'Clover OpenCL 1.1 Mesa 22.2.1' (CL_DEVICE_NOT_FOUND)
Identifying running system
Finding CPU package in fallback mode
Retrieving motherboard information in fallback mode
Calling libcpuid for retrieving dynamic data
Calculating CPU usage
Retrieving CPU temperature in fallback mode

Oops, something was wrong! CPU-X has received signal 11 (Segmentation fault) and has crashed.
========================= Backtrace =========================
CPU-X 4.5.0+git-r3-g5091a71cf8d8ffa0bc41aba83f958ede18a07147 (Oct 13 2022 10:24:14, Linux x86_64, GNU 12.2.0)
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon1/temp1_input, which=0) ==> 0
Retrieving CPU voltage in fallback mode
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon1/in1_input, which=2) ==> 0
Calling bandwidth
# 1 /usr/lib/libc.so.6(+0x3c3f0) [0x7fa99683c3f0]
# 2 /usr/lib/libclang-cpp.so.14(_ZN5clang4ento22PathDiagnosticLocation6createERKNS_12ProgramPointERKNS_13SourceManagerE+0x12c) [0x7fa98268cdec]
# 3 [0x1e33000]
======================== End Backtrace =======================

You can open a new issue here, by filling the template as requested:
https://github.com/X0rg/CPU-X/issues/new?template=bug_report.md

fish: Job 1, 'OCL_ICD_VENDORS=/etc/OpenCL/ven…' terminated by signal SIGSEGV (Address boundary error)

@Umio-Yasuno
Contributor

Hmm... I have no idea...
Have you updated the NVIDIA driver recently?

@TheTumultuousUnicornOfDarkness
Owner

@ptr1337 thank you for the new dump, this is helpful.

@Umio-Yasuno the bug does not seem to be related to the OpenCL code (get_gpu_comp_unit()) but to the Vulkan code (get_vulkan_api_version()). Each dump shows a different backtrace, which suggests memory corruption.
Under Valgrind, execution enters get_vulkan_api_version() and prints Vulkan devices count: 1; then a function creates a new thread, and the crash happens later.
I will add more MSG_DEBUG() calls to narrow down the root cause.

@TheTumultuousUnicornOfDarkness
Owner

If VK_EXT_PCI_BUS_INFO_EXTENSION_NAME is not defined, I am afraid that bus_info is uninitialized.

@Umio-Yasuno in the case below, you are evaluating bus_info before checking use_device_id; maybe that is the issue:

if(((uint32_t)dev->domain    == bus_info.pciDomain    &&
	dev->bus                 == bus_info.pciBus       &&
	dev->dev                 == bus_info.pciDevice    &&
	dev->func                == bus_info.pciFunction) ||
	(use_device_id &&
	(uint32_t)dev->device_id == prop2.properties.deviceID))
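
To make the concern concrete, here is a minimal sketch of how the comparison could be guarded so that bus_info is only read once it is known to have been filled in. It is not the code from CPU-X: the struct names, the have_bus_info flag and the device_matches() helper are hypothetical stand-ins, and the real Vulkan types are replaced by plain structs so the example builds without the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the PCI device entry (not the libpci type). */
struct pci_entry {
	int      domain;
	uint8_t  bus, dev, func;
	uint16_t device_id;
};

/* Hypothetical stand-in for the PCI bus info reported by Vulkan. */
struct bus_location {
	uint32_t pciDomain, pciBus, pciDevice, pciFunction;
};

/* Returns true when the Vulkan device and the PCI entry describe the same GPU.
 * bus_info is dereferenced only when have_bus_info is set, so an
 * uninitialized struct can never produce an accidental match. */
bool device_matches(const struct pci_entry *dev,
                    const struct bus_location *bus_info, bool have_bus_info,
                    bool use_device_id, uint32_t vk_device_id)
{
	assert(dev != NULL);

	bool by_bus = have_bus_info && bus_info != NULL &&
	              (uint32_t)dev->domain == bus_info->pciDomain &&
	              dev->bus  == bus_info->pciBus    &&
	              dev->dev  == bus_info->pciDevice &&
	              dev->func == bus_info->pciFunction;

	bool by_id = use_device_id &&
	             (uint32_t)dev->device_id == vk_device_id;

	return by_bus || by_id;
}
```

The sketch keeps the original "bus match or device ID match" logic and only adds the guard; zero-initializing the struct at its declaration would be a complementary safeguard.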

I added some commits on master, maybe a18f9c8 fixed something.

@ptr1337 could you please rebuild the cpu-x-git package from master, then rerun valgrind -s cpu-x -v --debug and share the output?

@ptr1337
Author

ptr1337 commented Oct 13, 2022

Hey!
I just came home and rebuilt cpu-x-git, and the issue seems to be solved. I can start it without any problems.

Here is the Valgrind log:

❯ valgrind -s cpu-x -v --debug
==6023== Memcheck, a memory error detector
==6023== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==6023== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==6023== Command: cpu-x -v --debug
==6023== 
Setting label names
Calling libcpuid for retrieving static data
change_current_core_id(type=0) ==> 0
Finding CPU technology
cpu_technology: model  12, ext. model  60, ext. family   6 => values to find
cpu_technology: model   0, ext. model   0, ext. family  -1 => entry #000 does not match
cpu_technology: model   1, ext. model   1, ext. family   6 => entry #001 does not match
cpu_technology: model   1, ext. model   1, ext. family  15 => entry #002 does not match
cpu_technology: model   2, ext. model   2, ext. family  -1 => entry #003 does not match
cpu_technology: model   3, ext. model   3, ext. family   5 => entry #004 does not match
cpu_technology: model   3, ext. model   3, ext. family   6 => entry #005 does not match
cpu_technology: model   3, ext. model   3, ext. family  15 => entry #006 does not match
cpu_technology: model   4, ext. model   4, ext. family  -1 => entry #007 does not match
cpu_technology: model   5, ext. model   5, ext. family   6 => entry #008 does not match
cpu_technology: model   5, ext. model  37, ext. family  -1 => entry #009 does not match
cpu_technology: model   5, ext. model  53, ext. family  -1 => entry #010 does not match
cpu_technology: model   5, ext. model  69, ext. family  -1 => entry #011 does not match
cpu_technology: model   5, ext. model  85, ext. family  -1 => entry #012 does not match
cpu_technology: model   5, ext. model 165, ext. family   6 => entry #013 does not match
cpu_technology: model   6, ext. model   6, ext. family   6 => entry #014 does not match
cpu_technology: model   6, ext. model   6, ext. family  15 => entry #015 does not match
cpu_technology: model   6, ext. model  22, ext. family  -1 => entry #016 does not match
cpu_technology: model   6, ext. model  54, ext. family  -1 => entry #017 does not match
cpu_technology: model   6, ext. model  70, ext. family  -1 => entry #018 does not match
cpu_technology: model   6, ext. model 102, ext. family  -1 => entry #019 does not match
cpu_technology: model   7, ext. model   7, ext. family  -1 => entry #020 does not match
cpu_technology: model   7, ext. model  23, ext. family  -1 => entry #021 does not match
cpu_technology: model   7, ext. model  55, ext. family  -1 => entry #022 does not match
cpu_technology: model   7, ext. model  71, ext. family  -1 => entry #023 does not match
cpu_technology: model   7, ext. model 151, ext. family  -1 => entry #024 does not match
cpu_technology: model   7, ext. model 167, ext. family  -1 => entry #025 does not match
cpu_technology: model   8, ext. model   0, ext. family   0 => entry #026 does not match
cpu_technology: model   8, ext. model   8, ext. family  -1 => entry #027 does not match
cpu_technology: model   9, ext. model   9, ext. family  -1 => entry #028 does not match
cpu_technology: model  10, ext. model  26, ext. family  -1 => entry #029 does not match
cpu_technology: model  10, ext. model  30, ext. family  -1 => entry #030 does not match
cpu_technology: model  10, ext. model  42, ext. family  -1 => entry #031 does not match
cpu_technology: model  10, ext. model  58, ext. family  -1 => entry #032 does not match
cpu_technology: model  10, ext. model 122, ext. family  -1 => entry #033 does not match
cpu_technology: model  10, ext. model 154, ext. family  -1 => entry #034 does not match
cpu_technology: model  11, ext. model  11, ext. family  -1 => entry #035 does not match
cpu_technology: model  12, ext. model  28, ext. family  -1 => entry #036 does not match
cpu_technology: model  12, ext. model  44, ext. family  -1 => entry #037 does not match
cpu_technology: model  12, ext. model  60, ext. family  -1 => entry #038 matches
Finding devices
find_gpu_device_path: device_path=/sys/bus/pci/devices/0000:07:00.0
CPU-X:core.c:858: failed to call GLFW (65544): Wayland: Failed to connect to display
Finding Vulkan API version
Vulkan devices count: 1
Looping into Vulkan device 0
==6023== Warning: noted but unhandled ioctl 0x6444 with no size/direction hints.
==6023==    This could cause spurious value errors to appear.
==6023==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
Vulkan device 0: device matches with pci_dev
==6023== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==6023==    This could cause spurious value errors to appear.
==6023==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==6023== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==6023==    This could cause spurious value errors to appear.
==6023==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==6023== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==6023==    This could cause spurious value errors to appear.
==6023==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==6023== Warning: noted but unhandled ioctl 0x37 with no size/direction hints.
==6023==    This could cause spurious value errors to appear.
==6023==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==6023== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==6023==    This could cause spurious value errors to appear.
==6023==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==6023== Warning: set address range perms: large range [0x200000000, 0x300200000) (noaccess)
==6023== Warning: set address range perms: large range [0x1591b000, 0x3591a000) (noaccess)
==6023== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
==6023==    This could cause spurious value errors to appear.
==6023==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
Vulkan device 0: Ray Tracing support is ON
==6023== Warning: noted but unhandled ioctl 0x18 with no size/direction hints.
==6023==    This could cause spurious value errors to appear.
==6023==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==6023== Warning: noted but unhandled ioctl 0x1a with no size/direction hints.
==6023==    This could cause spurious value errors to appear.
==6023==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
Vulkan device 0: version is '1.3.205'
==6023== Warning: set address range perms: large range [0x400000000, 0x500200000) (noaccess)
==6023== Warning: set address range perms: large range [0x379d9000, 0x579d8000) (noaccess)
Number of OpenCL platforms: 2
Looping into OpenCL platform 0
OpenCL platform 0: name is 'NVIDIA CUDA'
OpenCL platform 0: version is 'OpenCL 3.0 CUDA 11.8.87'
OpenCL platform 0: found 1 devices
Looping into OpenCL platform 0, device 0
OpenCL platform 0, device 0: found vendor 0x10DE
OpenCL platform 0, device 0: name is 'NVIDIA GeForce GTX 1070 Ti'
OpenCL platform 0, device 0: version is 'OpenCL 3.0 CUDA'
OpenCL platform 0, device 0: vendor is NVIDIA
OpenCL platform 0, device 0: found 19 SM
==6023== Invalid read of size 1
==6023==    at 0x40C745: casprintf (util.c:88)
==6023==    by 0x415453: find_devices (core.c:1451)
==6023==    by 0x417578: fill_labels (core.c:110)
==6023==    by 0x40AF74: main (main.c:853)
==6023==  Address 0xf777711 is 0 bytes after a block of size 1 alloc'd
==6023==    at 0x48458B8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6023==    by 0x5BA6268: __vasprintf_internal (vasprintf.c:71)
==6023==    by 0x40C6FC: vasprintf (stdio2.h:169)
==6023==    by 0x40C6FC: casprintf (util.c:79)
==6023==    by 0x415453: find_devices (core.c:1451)
==6023==    by 0x417578: fill_labels (core.c:110)
==6023==    by 0x40AF74: main (main.c:853)
==6023== 
==6023== Invalid write of size 1
==6023==    at 0x40C7EB: casprintf (util.c:98)
==6023==    by 0x415453: find_devices (core.c:1451)
==6023==    by 0x417578: fill_labels (core.c:110)
==6023==    by 0x40AF74: main (main.c:853)
==6023==  Address 0xf777711 is 0 bytes after a block of size 1 alloc'd
==6023==    at 0x48458B8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6023==    by 0x5BA6268: __vasprintf_internal (vasprintf.c:71)
==6023==    by 0x40C6FC: vasprintf (stdio2.h:169)
==6023==    by 0x40C6FC: casprintf (util.c:79)
==6023==    by 0x415453: find_devices (core.c:1451)
==6023==    by 0x417578: fill_labels (core.c:110)
==6023==    by 0x40AF74: main (main.c:853)
==6023== 
Identifying running system
Finding CPU package in fallback mode
Your CPU socket is not present in the database ==> Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz, codename: Haswell (Core i7)
Retrieving motherboard information in fallback mode
Calling libcpuid for retrieving dynamic data
Calculating CPU usage
Retrieving CPU temperature in fallback mode
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon1/temp1_input, which=0) ==> 0
Retrieving CPU voltage in fallback mode
request_sensor_path(base_dir=/sys/class/hwmon, cached_path=/sys/class/hwmon/hwmon1/in1_input, which=2) ==> 0
Calling bandwidth
Calling libprocps
Retrieving GPU clocks
gpu_monitoring: nvidia: nvidia_cmd_args=nvidia-smi --format=csv,noheader,nounits --id=0
Updating benchmark status
Starting GTK GUI…
The futex facility returned an unexpected error code.

Oops, something was wrong! CPU-X has received signal 6 (Aborted) and is trying to recover.
========================= Backtrace =========================
CPU-X 4.5.0+git-r8-g603cb67e4351cc6190a9bb7c643de8bbef21331c (Oct 13 2022 19:39:23, Linux x86_64, GNU 12.2.0)
# 1 /usr/lib/libc.so.6(+0x3c3f0) [0x5b583f0]
# 2 /usr/lib/libc.so.6(pthread_kill+0x11b) [0x5bb53bb]
# 3 /usr/lib/libc.so.6(gsignal+0x18) [0x5b58348]
# 4 /usr/lib/libc.so.6(abort+0xd7) [0x5b3e53d]
# 5 /usr/lib/libc.so.6(+0x8b574) [0x5ba7574]
# 6 /usr/lib/libc.so.6(+0x8b8a0) [0x5ba78a0]
# 7 /usr/lib/libc.so.6(+0x93a2d) [0x5bafa2d]
# 8 /usr/lib/libc.so.6(+0x99905) [0x5bb5905]
# 9 /usr/lib/libc.so.6(pthread_cond_wait+0x10c) [0x5bb22fc]
#10 /usr/lib/libpulse.so.0(pa_threaded_mainloop_wait+0x2d) [0xee02cfd]
#11 /usr/lib/libcanberra-0.30/libcanberra-pulse.so(pulse_driver_open+0xb9) [0xe315d89]
#12 /usr/lib/libcanberra.so.0(+0xc2e4) [0xb9c32e4]
#13 /usr/lib/libcanberra.so.0(+0x3189) [0xb9ba189]
#14 /usr/lib/libcanberra.so.0(ca_context_open+0x34) [0xb9ba914]
#15 /usr/lib/libcanberra-0.30/libcanberra-multi.so(+0x11f1) [0xe30d1f1]
======================== End Backtrace =======================

==6023== 
==6023== Process terminating with default action of signal 6 (SIGABRT): dumping core
==6023==    at 0x5BB53BB: __pthread_kill_implementation (pthread_kill.c:44)
==6023==    by 0x5BB53BB: __pthread_kill_internal (pthread_kill.c:78)
==6023==    by 0x5BB53BB: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==6023==    by 0x5B58347: raise (raise.c:26)
==6023==    by 0x5B3E5C6: abort (abort.c:100)
==6023==    by 0x5BA7573: __libc_message.constprop.0 (libc_fatal.c:155)
==6023==    by 0x5BA789F: __libc_fatal (libc_fatal.c:164)
==6023==    by 0x5BAFA2C: futex_fatal_error (futex-internal.h:87)
==6023==    by 0x5BAFA2C: __futex_lock_pi64 (futex-internal.c:203)
==6023==    by 0x5BB5904: __pthread_mutex_cond_lock_full (pthread_mutex_lock.c:441)
==6023==    by 0x5BB22FB: __pthread_cond_wait_common (pthread_cond_wait.c:607)
==6023==    by 0x5BB22FB: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==6023==    by 0xEE02CFC: pa_threaded_mainloop_wait (in /usr/lib/libpulse.so.0.24.2)
==6023==    by 0xE315D88: pulse_driver_open (in /usr/lib/libcanberra-0.30/libcanberra-pulse.so)
==6023==    by 0xB9C32E3: ??? (in /usr/lib/libcanberra.so.0.2.5)
==6023==    by 0xB9BA188: ??? (in /usr/lib/libcanberra.so.0.2.5)
==6023== 
==6023== HEAP SUMMARY:
==6023==     in use at exit: 11,122,402 bytes in 79,058 blocks
==6023==   total heap usage: 446,461 allocs, 367,403 frees, 3,290,675,023 bytes allocated
==6023== 
==6023== LEAK SUMMARY:
==6023==    definitely lost: 194,766 bytes in 64 blocks
==6023==    indirectly lost: 217,023 bytes in 2,724 blocks
==6023==      possibly lost: 142,440 bytes in 2,473 blocks
==6023==    still reachable: 9,619,165 bytes in 67,596 blocks
==6023==         suppressed: 0 bytes in 0 blocks
==6023== Rerun with --leak-check=full to see details of leaked memory
==6023== 
==6023== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
==6023== 
==6023== 1 errors in context 1 of 2:
==6023== Invalid write of size 1
==6023==    at 0x40C7EB: casprintf (util.c:98)
==6023==    by 0x415453: find_devices (core.c:1451)
==6023==    by 0x417578: fill_labels (core.c:110)
==6023==    by 0x40AF74: main (main.c:853)
==6023==  Address 0xf777711 is 0 bytes after a block of size 1 alloc'd
==6023==    at 0x48458B8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6023==    by 0x5BA6268: __vasprintf_internal (vasprintf.c:71)
==6023==    by 0x40C6FC: vasprintf (stdio2.h:169)
==6023==    by 0x40C6FC: casprintf (util.c:79)
==6023==    by 0x415453: find_devices (core.c:1451)
==6023==    by 0x417578: fill_labels (core.c:110)
==6023==    by 0x40AF74: main (main.c:853)
==6023== 
==6023== 
==6023== 1 errors in context 2 of 2:
==6023== Invalid read of size 1
==6023==    at 0x40C745: casprintf (util.c:88)
==6023==    by 0x415453: find_devices (core.c:1451)
==6023==    by 0x417578: fill_labels (core.c:110)
==6023==    by 0x40AF74: main (main.c:853)
==6023==  Address 0xf777711 is 0 bytes after a block of size 1 alloc'd
==6023==    at 0x48458B8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6023==    by 0x5BA6268: __vasprintf_internal (vasprintf.c:71)
==6023==    by 0x40C6FC: vasprintf (stdio2.h:169)
==6023==    by 0x40C6FC: casprintf (util.c:79)
==6023==    by 0x415453: find_devices (core.c:1451)
==6023==    by 0x417578: fill_labels (core.c:110)
==6023==    by 0x40AF74: main (main.c:853)
==6023== 
==6023== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
fish: Job 1, 'valgrind -s cpu-x -v --debug' terminated by signal SIGABRT (Abort)

TheTumultuousUnicornOfDarkness added a commit that referenced this issue Oct 13, 2022
Version strings must be keep as it: there is no need to remove stuff from string.
Related to #246
@TheTumultuousUnicornOfDarkness
Owner

Ok, thanks. I fixed the Invalid write of size 1 caused at 0x415453: find_devices (core.c:1451) in 792429b.
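
For readers unfamiliar with this Valgrind report: "0 bytes after a block of size 1" is consistent with a one-byte allocation, such as an empty string holding only its terminator, being accessed at index 1. The sketch below shows one way a string-cleaning loop that starts at index 1 can be guarded against that case. The collapse_spaces() helper is hypothetical and is not the casprintf() code from CPU-X.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collapses runs of spaces in place and returns the
 * new length. The loop starts at index 1, so an empty string (whose only
 * valid byte is s[0]) must be rejected before the loop is entered. */
size_t collapse_spaces(char *s)
{
	if (s == NULL || s[0] == '\0')
		return 0; /* nothing to clean; reading s[1] would be out of bounds */

	size_t out = 1; /* s[0] is always kept */
	for (size_t i = 1; s[i] != '\0'; i++) {
		assert(out <= i); /* the write index never passes the read index */
		if (s[i] == ' ' && s[out - 1] == ' ')
			continue; /* skip a space that follows a kept space */
		s[out++] = s[i];
	}
	s[out] = '\0';
	return out;
}
```

Without the early return, a call on "" would read s[1] and could then write a terminator past the end of the one-byte block, which is the kind of access pair Memcheck flags as an invalid read followed by an invalid write.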

If it does not crash anymore, let's close this issue.
Thank you very much for the help!

@ptr1337
Author

ptr1337 commented Oct 13, 2022

Many thanks for fixing the issue so quickly!

Actually, I have one question: is it possible for CPU-X to show the RAM timings and frequency, as CPU-Z does on Windows?
I think that would be an amazing addition!

@TheTumultuousUnicornOfDarkness
Owner

This was requested some time ago in #96.
The problem is that I have never found a way to read the current memory settings on Linux.
Tools like decode-dimms need specific kernel modules (such as eeprom) to scan the I2C bus. On some platforms there are also ACPI conflicts, so users have to boot with acpi_enforce_resources=lax, which has drawbacks (see https://bugzilla.kernel.org/show_bug.cgi?id=204807#c37).
decode-dimms can decode the SPD and report memory characteristics, but these are the modules' capabilities rather than the values actually in use.

So that is why this feature is missing in CPU-X.

@ptr1337
Author

ptr1337 commented Oct 13, 2022

Yes, I noticed that Linux in general lacks a way to do that. Maybe in the future!

Thanks for your fast info and your work!

@Umio-Yasuno
Contributor

If VK_EXT_PCI_BUS_INFO_EXTENSION_NAME is not defined, I am afraid that bus_info is uninitialized.

@Umio-Yasuno in the condition below, you are evaluating bus_info before checking use_device_id; maybe that is the issue:

if(((uint32_t)dev->domain    == bus_info.pciDomain    &&
	dev->bus                 == bus_info.pciBus       &&
	dev->dev                 == bus_info.pciDevice    &&
	dev->func                == bus_info.pciFunction) ||
	(use_device_id &&
	(uint32_t)dev->device_id == prop2.properties.deviceID))

I added some commits on master, maybe a18f9c8 fixed something.

@ptr1337 could you please rebuild the cpu-x-git package from master, then rerun valgrind -s cpu-x -v --debug and share the output?

Sorry, it was a mistake in my code.
I thought VK_EXT_PCI_BUS_INFO_EXTENSION_NAME was defined in Vulkan SDK version 1.3.224.
