
Clocksource and AMDGPU instability on kernels 6.11.x, possibly related to firmware on BIOS 3.05 (Framework 13 AMD) #17

@AkechiShiro

Description

Device Information

Framework Laptop 13 (AMD), Ryzen 7 7840U

System Model or SKU

Framework Laptop 13 (AMD), Ryzen 7 7840U

Please select one of the following

  • Framework Laptop 13 (11th Gen Intel® Core™)
  • Framework Laptop 13 (12th Gen Intel® Core™)
  • Framework Laptop 13 (13th Gen Intel® Core™)
  • Framework Laptop 13 (AMD Ryzen™ 7040 Series)
  • Framework Laptop 13 (Intel® Core™ Ultra Series 1)
  • Framework Laptop 16 (AMD Ryzen™ 7040 Series)

BIOS VERSION

Please provide the BIOS version.

Linux: 03.05

DIY Edition information

If you are experiencing an issue on a DIY system, please also fill out the memory and storage devices you are using.

Memory: Framework official RAM DDR5 2x32GB
Storage: Samsung SSD 970 EVO Plus 2TB

Port/Peripheral information

This seems to be related to the BIOS and/or the AMDGPU driver.

Standalone Operation

Are you running your mainboard as a standalone device? Is standalone mode enabled in the BIOS?

  • Yes
  • No

Describe the bug

Under kernel 6.11.x, the 'tsc' clocksource is reported as unstable; the message in the kernel log buffer (dmesg) says this is most likely due to a broken BIOS.

AMDGPU also sometimes reports very odd errors (cc @superm1); these could also be due to the BIOS/firmware.

 on CPU2: Marking clocksource 'tsc' as unstable because the skew is too large:
  'hpet' wd_nsec: 503458959 wd_now: 2ed90c0a wd_last: 2e6b0d62 mask: ffffffff
  'tsc' cs_nsec: 503985962 cs_now: fccb13b417a6 cs_last: fccab0c1f179 mask: ffffffffffffffff
  Clocksource 'tsc' skewed 527003 ns (0 ms) over watchdog 'hpet' interval of 503458959 ns (503 ms)
sept. 06 20:57:03 hostname kernel: clocksource:                       'tsc' is current clocksource.
sept. 06 20:57:03 hostname kernel: tsc: Marking TSC unstable due to clocksource watchdog
sept. 06 20:57:03 hostname kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
sept. 06 20:57:03 hostname kernel: sched_clock: Marking unstable (22961617650112, 61413986153193)<-(84375623658522, -19871412)
sept. 06 20:57:03 hostname kernel: clocksource: Checking clocksource tsc synchronization from CPU 4 to CPUs 0-2,5,13,15.
sept. 06 20:57:03 hostname kernel: clocksource: Switched to clocksource hpet
nov. 26 22:25:05 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:05 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:05 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:06 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:06 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:06 hostname wpa_supplicant[1896]: wlp1s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-38 noise=9999 txrate=286700
nov. 26 22:25:06 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:07 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:07 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:07 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:07 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:08 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:08 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:08 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:08 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:09 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
nov. 26 22:25:09 hostname kernel: amdgpu 0000:c1:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
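The watchdog message above reports the elapsed time measured by the watchdog ('hpet', wd_nsec) and by the clocksource under test ('tsc', cs_nsec) over the same interval; the skew the kernel prints is their difference. The following is a minimal Python sketch that recomputes it from the two log lines. The names `WATCHDOG_LOG` and `tsc_skew_ns` are hypothetical, and the sketch only reproduces the arithmetic; it makes no claim about the exact threshold the kernel applies.

```python
import re

# Two lines copied verbatim from the clocksource watchdog message above.
WATCHDOG_LOG = """\
  'hpet' wd_nsec: 503458959 wd_now: 2ed90c0a wd_last: 2e6b0d62 mask: ffffffff
  'tsc' cs_nsec: 503985962 cs_now: fccb13b417a6 cs_last: fccab0c1f179 mask: ffffffffffffffff
"""


def tsc_skew_ns(log_text: str) -> tuple[int, int, int]:
    """Hypothetical helper: return (wd_nsec, cs_nsec, skew_ns).

    skew_ns = cs_nsec - wd_nsec, i.e. how much more time the clocksource
    under test ('tsc') measured than the watchdog ('hpet') did.
    """
    wd = re.search(r"wd_nsec:\s*(\d+)", log_text)
    cs = re.search(r"cs_nsec:\s*(\d+)", log_text)
    if wd is None or cs is None:
        raise ValueError("no wd_nsec/cs_nsec fields found")
    wd_nsec, cs_nsec = int(wd.group(1)), int(cs.group(1))
    return wd_nsec, cs_nsec, cs_nsec - wd_nsec


if __name__ == "__main__":
    wd_nsec, cs_nsec, skew = tsc_skew_ns(WATCHDOG_LOG)
    # Relative difference in parts per million over the watchdog interval.
    ppm = skew / wd_nsec * 1e6
    print(f"skew = {skew} ns over {wd_nsec} ns ({ppm:.0f} ppm)")
```

The computed skew is 527003 ns, matching the "skewed 527003 ns" figure in the kernel message: over that ~503 ms window the TSC measured roughly 0.1% more elapsed time than the HPET.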

Steps To Reproduce

Steps to reproduce the behavior:

  1. Use the computer for at least 7 days of uptime, with at least 30 GB of RAM in use.
  2. The issue then shows up at random; I have no reliable way of reproducing it (I'm on 6.11.8 and have seen the issue on Linux kernels from 6.11.4 up to 6.11.8).
  3. Multiple suspend cycles were performed during the 7 days of usage (whether or not a USB-C dock is connected doesn't seem to matter).
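Since the issue appears at random after days of uptime, one way to notice when it starts is to scan the kernel log periodically for the signatures quoted in the logs above. A minimal Python sketch, assuming the kernel log text is obtained separately; the `SIGNATURES` table and the `count_signatures` helper are hypothetical names, and the chosen substrings are simply copied from this report.

```python
from collections import Counter

# Hypothetical signature table: substrings copied from the logs in this
# report; the keys are illustrative labels, not kernel identifiers.
SIGNATURES = {
    "tsc_unstable": "Marking clocksource 'tsc' as unstable",
    "switched_to_hpet": "Switched to clocksource hpet",
    "dmcub_error": "DMCUB error - collecting diagnostic data",
}


def count_signatures(log_text: str) -> Counter:
    """Hypothetical helper: count the log lines matching each signature."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts
```

For example, the output of `journalctl -k --no-pager` (kernel messages only) could be passed to `count_signatures`; a non-zero `dmcub_error` or `tsc_unstable` count would mark the point at which the problem began.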

Workaround for the AMDGPU issue

Manually triggering a GPU recovery through the driver sometimes resolves the very heavy lag, but it can also just freeze the laptop:

  • sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover
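The DRI index in the path above (1) may differ on other machines. A minimal Python sketch that looks for the debugfs DRI directory whose `name` file starts with `amdgpu` and returns the path of its `amdgpu_gpu_recover` file without reading it, since reading that file is what triggers the recovery. The helper name `find_amdgpu_recover_file` is hypothetical, and the format of the `name` file (driver name as the first word) is an assumption; debugfs must be mounted and is normally readable by root only.

```python
from pathlib import Path
from typing import Optional

# Default debugfs location of the DRM entries (normally readable by root only).
DEBUGFS_DRI = Path("/sys/kernel/debug/dri")


def find_amdgpu_recover_file(dri_root: Path = DEBUGFS_DRI) -> Optional[Path]:
    """Hypothetical helper: locate, but never read, amdgpu_gpu_recover.

    Assumes each <dri_root>/<N>/name file starts with the DRM driver name,
    e.g. "amdgpu dev=0000:c1:00.0 ...". Reading the returned file is what
    triggers the GPU recovery, so this function only returns its path.
    """
    try:
        children = list(dri_root.iterdir())
    except OSError:
        return None  # debugfs missing, not mounted, or not readable
    # Only numeric DRM minor directories, checked in numeric order.
    minors = sorted((p for p in children if p.name.isdigit()), key=lambda p: int(p.name))
    for minor in minors:
        try:
            fields = (minor / "name").read_text().split()
        except OSError:
            continue  # no readable name file for this minor
        recover = minor / "amdgpu_gpu_recover"
        if fields and fields[0] == "amdgpu" and recover.exists():
            return recover
    return None
```

On the machine in this report the function would be expected to return /sys/kernel/debug/dri/1/amdgpu_gpu_recover; reading that path (e.g. with sudo cat) is still the step that may freeze the laptop.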

Expected behavior

Neither the AMDGPU errors nor the clocksource instability should occur.

Operating System (please complete the following information):

  • OS/Distribution: NixOS
  • Version: 24.05 (moving to the next release, 24.11, soon); note, however, that I'm using the latest kernel available in the stable Nixpkgs repositories.
  • Linux Kernel Version: uname -a : Linux hostname 6.11.8 #1-NixOS SMP PREEMPT_DYNAMIC Thu Nov 14 12:21:16 UTC 2024 x86_64 GNU/Linux

Additional context

I have opened a topic on the Framework Community forum, but no one has answered yet. It contains more log output and more kernel versions on which I hit the issues: https://community.frame.work/t/nixos-amd-framework-13th-amd-ryzen-7-7840u-64gb-framework-ddr5-ram-uma-settings-gamer-on-kernels-6-11-x-have-random-heavy-lags-related-to-amdgpu-or-possibly-firmware/60561

I will hopefully try 6.12 kernels next week, and also BIOS 3.06 once that release is marked as stable.
