AMD CPU Frequency Scaling Broken #8008

Open
Geblaat opened this issue Jan 30, 2023 · 8 comments
Labels
affects-4.1 This issue affects Qubes OS 4.1. C: Xen hardware support needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@Geblaat
Geblaat commented Jan 30, 2023

Qubes OS release

4.1

Brief summary

CPU frequencies do not seem to scale properly, at least for the lower frequencies.

Steps to reproduce

Tested on:

Lenovo Thinkpad L14 Gen 3 AMD with Ryzen 7 Pro 5875u
Latest BIOS from Dec. 2022
Kernel: 6.1.5-1
EFI install.

(Also reported by @lunarthegrey on Ryzen 7 Pro 3700u on R4.0 with kernel 5.4.10-1: #4604 (comment))

In dom0, sudo xenpm get-cpufreq-para

Expected behavior

xenpm should report a low minimum frequency for power saving and a maximum frequency of 4500MHz.
The CPU should actually be able to use those frequencies.

Maybe implement amd p-state driver.

Actual behavior

Minimum frequency shown is 1600MHz and maximum is only 2000MHz:

cpu id               : 0
affected_cpus        : 0
cpuinfo frequency    : max [2000000] min [1600000] cur [1600000]
scaling_driver       : powernow
scaling_avail_gov    : userspace performance powersave ondemand
current_governor     : ondemand
  ondemand specific  :
    sampling_rate    : max [10000000] min [10000] cur [20000]
    up_threshold     : 80
scaling_avail_freq   : 2000000 1800000 *1600000
scaling frequency    : max [2000000] min [1600000] cur [1600000]
turbo mode           : enabled

Changing governor to performance does not work.
BIOS setting to set CPU power usage to auto-saving does not make any difference.
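
For anyone comparing machines, the output above can be reduced to a few numbers. The following is only a sketch: the function name `parse_cpufreq_para` and the dictionary keys are made up for illustration, and it assumes the output layout shown in this report (values in kHz, current frequency marked with `*`).

```python
import re

# Sample copied from the xenpm get-cpufreq-para output in this report (CPU 0).
SAMPLE = """\
cpu id               : 0
affected_cpus        : 0
cpuinfo frequency    : max [2000000] min [1600000] cur [1600000]
scaling_driver       : powernow
scaling_avail_gov    : userspace performance powersave ondemand
current_governor     : ondemand
  ondemand specific  :
    sampling_rate    : max [10000000] min [10000] cur [20000]
    up_threshold     : 80
scaling_avail_freq   : 2000000 1800000 *1600000
scaling frequency    : max [2000000] min [1600000] cur [1600000]
turbo mode           : enabled
"""


def parse_cpufreq_para(text):
    """Return driver, governor and frequencies (in MHz) for one CPU block.

    Hypothetical helper; assumes the layout shown in SAMPLE.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "scaling_driver":
            info["driver"] = value
        elif key == "current_governor":
            info["governor"] = value
        elif key == "scaling_avail_freq":
            # The current frequency carries a leading '*'; values are in kHz.
            info["avail_mhz"] = [int(f.lstrip("*")) // 1000 for f in value.split()]
        elif key == "cpuinfo frequency":
            for name, khz in re.findall(r"(max|min|cur) \[(\d+)\]", value):
                info[name + "_mhz"] = int(khz) // 1000
    return info


info = parse_cpufreq_para(SAMPLE)
print(info)
```

On real hardware the text would come from `sudo xenpm get-cpufreq-para` in dom0 instead of the embedded sample.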

Only 3 P states are reported:
sudo xenpm get-cpufreq-states
cpu id               : 0
total P-states       : 3
usable P-states      : 3
current frequency    : 1600 MHz
P0         [2000 MHz]: transition [                1467]
                       residency  [               24856 ms]
P1         [1800 MHz]: transition [                  96]
                       residency  [                 587 ms]
*P2        [1600 MHz]: transition [                1434]
                       residency  [               90121 ms]
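
The residency counters above show where the CPU nominally spends its time. As a rough sketch (the helper name `residency_share` is hypothetical, and the regular expressions assume exactly the layout printed above), the share of each reported P-state can be computed like this:

```python
import re

# P-state lines copied from the xenpm get-cpufreq-states output in this report.
SAMPLE = """\
P0         [2000 MHz]: transition [                1467]
                       residency  [               24856 ms]
P1         [1800 MHz]: transition [                  96]
                       residency  [                 587 ms]
*P2        [1600 MHz]: transition [                1434]
                       residency  [               90121 ms]
"""


def residency_share(text):
    """Map each P-state frequency (MHz) to its fraction of total residency."""
    freqs = [int(m) for m in re.findall(r"P\d+\s+\[(\d+) MHz\]", text)]
    times = [int(m) for m in re.findall(r"residency\s+\[\s*(\d+) ms\]", text)]
    total = sum(times)
    return {f: t / total for f, t in zip(freqs, times)}


shares = residency_share(SAMPLE)
for mhz, share in shares.items():
    print(f"{mhz} MHz: {share:.1%}")
```

For the sample above, the 1600MHz state accounts for roughly 78% of the recorded time. These figures only cover the three P-states Xen knows about, so they say nothing about hardware-managed boost.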

Just as the reporter of issue #4604 mentioned later, when using 'xenpm start 1', the average frequency reported under load is actually higher than the reported maximum. I've seen 4000-4100MHz under load.
When idle, it is never lower than 1600MHz though.

Notes

It is also necessary to use clocksource=tsc tsc=unstable hpetbroadcast=0 for proper performance. Not sure if related:
#6055

@Geblaat added the P: default and T: bug labels Jan 30, 2023
@SurFlurer
> Maybe implement amd p-state driver.

Yes, that's exactly what I'm expecting. The current driver only supports 3 P-states and 4 C-states on an AMD CPU. I would expect at least the same level as acpi-cpufreq on an Intel CPU, which is 16 P-states and 7 C-states according to this comment.

> When idle, it is never lower than 1600MHz though.

I've read somewhere that Linux doesn't implement the extremely low idle frequency that Windows does, but I cannot find the original article, so my memory may be wrong.

@andrewdavidwong added the C: Xen, hardware support, and needs diagnosis labels Jan 30, 2023
@andrewdavidwong added this to the Release 4.1 updates milestone Jan 30, 2023
@dylangerdaly
If we could have real, actual boost / idle states for AMD that'd be great.

@tasket
tasket commented Feb 4, 2023

FWIW, I have a ThinkPad T14 with Ryzen 7 Pro 4750U (Renoir) that exhibited a lot of the same problems when Qubes 4.1 was in alpha and beta. But I think it's been over a year now since those issues subsided. That machine has a Qubes install performed with 4.1 alpha and upgraded/tweaked over the years (I had to shoehorn Linux 5.8 into it to get it to boot after install).

First, my CPU clock appears to be fluidly scaling between 1400 and 3800MHz according to xenpm get-cpufreq-average. The ceiling for this chip is 4100MHz but I didn't push it very hard, just viewing several video streams at once. xenpm does not let me set turbo mode, however.

Also, I haven't needed clocksource=tsc tsc=unstable hpetbroadcast=0 for a long time.

Current dom0 kernel version is 6.1.1-1.fc32.qubes.x86_64 and xen version is 4.14.5-15.

Here are the xen and linux lines from /etc/default/grub:

GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_max_vcpus=4 dom0_vcpus_pin=0 dom0_mem=min:1024M dom0_mem=max:2600M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 ept=exec-sp"
GRUB_DISABLE_OS_PROBER="true"
GRUB_CMDLINE_LINUX="rd.luks.uuid=luks-ebc32f3e-6002-4c17-9759-db70e0f6c859 rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles rd.driver.pre=btrfs rhgb quiet"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX rhgb rcu_nocbs=0 rd.qubes.hide_all_usb=1"
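
To compare Xen options between two machines, it can help to split the command line into individual settings. This is a minimal sketch, not part of any Qubes tooling: `parse_cmdline` is a hypothetical helper, and the input is the GRUB_CMDLINE_XEN_DEFAULT line quoted above.

```python
import shlex

# Xen command line copied from the /etc/default/grub excerpt above.
GRUB_LINE = (
    'GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_max_vcpus=4 dom0_vcpus_pin=0 '
    "dom0_mem=min:1024M dom0_mem=max:2600M ucode=scan smt=off "
    'gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 ept=exec-sp"'
)


def parse_cmdline(line):
    """Split a VAR="opt1 opt2=val" line into (variable, [(key, value), ...]).

    A list of pairs is used instead of a dict because an option such as
    dom0_mem can legitimately appear more than once.
    """
    var, _, raw = line.partition("=")
    # shlex.split strips the surrounding shell quotes.
    (value,) = shlex.split(raw)
    opts = []
    for token in value.split():
        key, sep, val = token.partition("=")
        opts.append((key, val if sep else None))
    return var, opts


var, opts = parse_cmdline(GRUB_LINE)
for key, val in opts:
    print(key, "=", val)
```

Note that this command line contains none of clocksource=tsc, tsc=unstable or hpetbroadcast=0, which matches the statement above that they are no longer needed on this machine.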

PS - dom0 was set to update from 'testing' until recently; many components may be from there.

@Geblaat
Author

Geblaat commented Feb 7, 2023

If I don't use clocksource=tsc tsc=unstable hpetbroadcast=0, the machine is so slow that some startup jobs time out and boot doesn't even reach the disk password prompt. If I use the vcpus parameters instead, I do not have that problem, but later during boot the start jobs for "LVM event activation on device 253:0" and "Activation of DM RAID sets" take a very long time. I do not know whether they eventually succeed; the longest I waited was 7 minutes before doing a hard shutdown. I have tried dom0_max_vcpus=4, 2 and 1, each together with dom0_vcpus_pin=0 or dom0_vcpus_pin, but none of the combinations worked. I was also recently using the testing repository, which included Xen 4.14.5-15 and kernel-latest 6.1.5-1 at the time.
dom0_max_vcpus=1 with dom0_vcpus_pin did work before, though, both during and after installation, but it was still quite slow.
Maybe the reason you do not need clocksource=tsc tsc=unstable hpetbroadcast=0 is that Lenovo fixed it at the UEFI level, as they did for some other ThinkPad X13 and T14s Gen 1 models:
#6055 (comment)

@tasket
tasket commented Feb 7, 2023

I can't be sure, but I think a UEFI firmware upgrade was part of the solution for me. That definitely rings a bell. I did update the firmware at least once around the time the performance cleared up. Currently I can use any number of CPUs in dom0 without any of the lurching slowdowns returning. But I don't think this is necessarily related to the lack of CPU frequency scaling; it's probably a separate issue.

@n0madK
n0madK commented Feb 7, 2023

@Geblaat have you tried x2apic=false on xen cmdline instead of those others? I've been using this on new Ryzens instead of pinning the vcpus
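
For testing a suggestion like this, the option has to end up inside GRUB_CMDLINE_XEN_DEFAULT in /etc/default/grub. The following is only an illustrative sketch of that text edit: `add_xen_option` is a hypothetical helper, it works on a single line passed in as a string, and it does not touch any file or regenerate the GRUB configuration, which still has to be done separately.

```python
def add_xen_option(line, option):
    """Set `option` in a GRUB_CMDLINE_XEN_DEFAULT="..." line.

    Any earlier setting with the same key is replaced, so calling this twice
    gives the same result. That makes it unsuitable for keys that may repeat
    on purpose (for example dom0_mem=min:... and dom0_mem=max:...).
    """
    prefix = 'GRUB_CMDLINE_XEN_DEFAULT="'
    if not (line.startswith(prefix) and line.endswith('"')):
        return line  # not the Xen command line; leave it untouched
    opts = line[len(prefix):-1].split()
    key = option.split("=", 1)[0]
    opts = [o for o in opts if o.split("=", 1)[0] != key]
    opts.append(option)
    return prefix + " ".join(opts) + '"'


before = 'GRUB_CMDLINE_XEN_DEFAULT="console=none ucode=scan smt=off"'
after = add_xen_option(before, "x2apic=false")
print(after)
```

The `before` line here is a shortened, made-up example rather than anyone's actual configuration.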

@Geblaat
Author

Geblaat commented Feb 9, 2023

> @Geblaat have you tried x2apic=false on xen cmdline instead of those others? I've been using this on new Ryzens instead of pinning the vcpus

Yes, same result as with the vcpu parameters. This time I let it keep going: "Activation of DM RAID sets" eventually succeeds, but "LVM event activation on device 253:0" does not, and after about 9 minutes booting fails when "Mounting /boot" fails.

@biergaizi
biergaizi commented Apr 12, 2023

I would not say that frequency scaling is broken, but its support is incomplete: the ancient "powernow" scaling driver may be suboptimal by today's standards, and there is no support for low-power states below P2 (a problem on laptops but not on desktops). It also does not report Boost frequencies, although boost is still enabled and managed by the hardware itself. If you run a multi-core stress test alongside xenpm start 60, you will see the sampled CPU clock go well above the base clock frequency, indicating that Turbo boost is functional. On my CPU, the base clock is 3.4 GHz but the sampled frequency is around 4.6 GHz during a stress test.
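
The comparison described here is simple arithmetic: a sampled average clock above the highest reported P-state frequency implies hardware-managed boost. A small sketch, in which the function names and the 2% tolerance are arbitrary assumptions and the numbers are the ones quoted in this thread:

```python
def boost_ratio(sampled_mhz, reported_max_mhz):
    """Ratio of the sampled average clock to the highest reported frequency."""
    return sampled_mhz / reported_max_mhz


def is_boosting(sampled_mhz, reported_max_mhz, tolerance=0.02):
    """True if the sampled clock exceeds the reported maximum by a margin.

    The tolerance is an arbitrary allowance for sampling noise.
    """
    return sampled_mhz > reported_max_mhz * (1 + tolerance)


# 3.4 GHz base clock with about 4.6 GHz sampled under a stress test.
print(round(boost_ratio(4600, 3400), 2), is_boosting(4600, 3400))
# Original report: 2000MHz reported maximum, about 4000MHz seen under load.
print(round(boost_ratio(4000, 2000), 2), is_boosting(4000, 2000))
```

The sampled values themselves would have to be read from xenpm while the machine is under load; this sketch only does the comparison.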

The situation on Xen is similar to bare-metal FreeBSD, which behaves exactly the same way.

@andrewdavidwong added the affects-4.1 label Aug 8, 2023
@andrewdavidwong removed this from the Release 4.1 updates milestone Aug 13, 2023