
[AMD][Zen] kernel cpuidle and cpufreq integration #115

Closed
cyring opened this issue May 12, 2019 · 12 comments

@cyring
Owner

cyring commented May 12, 2019

Version 1.51

This version implements a C-States handler.

Build

The code can be compiled with the debug feature directive set to level 2:

make FEAT_DBG=2 clean all

Prerequisites

  1. So far, this code is compatible with the Nehalem architecture only.
  2. Blacklist any driver that implements an idle handler, but leave cpufreq enabled and the idle mode set to halt (idle=halt).

Here's my boot command line
(probably too many drivers are blacklisted, but I need control over the hardware):

nmi_watchdog=0 modprobe.blacklist=pcspkr,iTCO_wdt,acpi_cpufreq,pcc_cpufreq,intel_cstate,intel_uncore,intel_powerclamp,i7core_edac,aesni_intel,ghash_clmulni_intel,crc32c_intel,crypto-crc32c-intel,i5500_temp,vboxnetflt,vboxnetadp,vboxpci,vboxdrv,kvm_intel,kvm,coretemp idle=halt intel_pstate=disable cpu0_hotplug audit=0 intel_idle.max_cstate=0

Start

  1. Ensure the current idle driver is initially listed as none
# cat /sys/devices/system/cpu/cpuidle/current_driver
none
  2. Load the CoreFreq driver and check that corefreqk-idle becomes the new idle driver
# insmod corefreqk.ko
# cat /sys/devices/system/cpu/cpuidle/current_driver
corefreqk-idle
  3. Start the daemon and the client
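The driver check in the first two steps can also be done programmatically. A minimal user-space sketch in C, assuming only the standard sysfs attribute used above; `parse_driver_name()` and `read_current_idle_driver()` are hypothetical helpers, not part of CoreFreq:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Standard sysfs attribute naming the active cpuidle driver. */
#define CPUIDLE_DRIVER_PATH "/sys/devices/system/cpu/cpuidle/current_driver"

/* Copy the first line of 'raw' into 'out' (size 'len'), without the newline. */
void parse_driver_name(const char *raw, char *out, size_t len)
{
	size_t n = strcspn(raw, "\n");

	assert(len > 0);
	if (n >= len)
		n = len - 1;	/* truncate to fit */
	memcpy(out, raw, n);
	out[n] = '\0';
}

/* Read the current idle driver name; returns 0 on success, -1 on error. */
int read_current_idle_driver(char *out, size_t len)
{
	char raw[64];
	FILE *fp = fopen(CPUIDLE_DRIVER_PATH, "r");

	if (fp == NULL)
		return -1;
	if (fgets(raw, sizeof(raw), fp) == NULL) {
		fclose(fp);
		return -1;
	}
	fclose(fp);
	parse_driver_name(raw, out, len);
	return 0;
}
```

After `insmod corefreqk.ko`, `read_current_idle_driver(buf, sizeof(buf)) == 0 && strcmp(buf, "corefreqk-idle") == 0` is expected to hold.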

Stop

Issue: the kernel may refuse to unload the CoreFreq driver
(I believe that, once an idle handler has been registered, it has to stay resident).
However, you can force the removal:

rmmod -f corefreqk.ko

Screenshots

[Screenshot: 2019-05-12-170617_804x644_scrot]

  • The CoreFreq Client confirms the name & parameters of the Idle driver
  • The Idle C-States view shows the Nehalem counters C1, C3 and C6
$ corefreq-cli -k
Linux:                                                                          
|- Release                                                 [5.0.13-arch1-1-ARCH]
|- Version                         [#1 SMP PREEMPT Wed May 8 18:22:16 CEST 2019]
|- Machine                                                              [x86_64]
Memory:                                                                         
|- Total RAM                                                         12285500 KB
|- Shared RAM                                                           53684 KB
|- Free RAM                                                          10542940 KB
|- Buffer RAM                                                          104308 KB
|- Total High                                                               0 KB
|- Free High                                                                0 KB
Idle driver                                                    [@corefreqk-idle]
   |- State:          POLL    C1      C1E     C3      C6                        
   |- Power:          -1      0       0       0       0                         
   |- Latency:        0       3       10      20      200                       
   |- Residency:      0       6       20      80      800
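The same per-state parameters are also exposed by the kernel in sysfs, under `/sys/devices/system/cpu/cpuN/cpuidle/stateM/`, where `latency` and `residency` are given in microseconds. A small sketch to cross-check the client's table; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of one cpuidle state attribute; 0 on success. */
int cpuidle_state_path(char *buf, size_t len, unsigned int cpu,
		       unsigned int state, const char *attr)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/cpuidle/state%u/%s",
		     cpu, state, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Print name, latency and residency of every idle state of 'cpu'. */
void dump_cpuidle_states(unsigned int cpu)
{
	static const char *const attrs[] = { "name", "latency", "residency" };
	const size_t nattrs = sizeof(attrs) / sizeof(attrs[0]);

	for (unsigned int state = 0; ; state++) {
		int found = 0;

		for (size_t i = 0; i < nattrs; i++) {
			char path[128], value[64];
			FILE *fp;

			if (cpuidle_state_path(path, sizeof(path), cpu, state,
					       attrs[i]) != 0)
				return;
			if ((fp = fopen(path, "r")) == NULL)
				continue;
			if (fgets(value, sizeof(value), fp) != NULL) {
				printf("state%u %-9s %s", state, attrs[i], value);
				found = 1;
			}
			fclose(fp);
		}
		if (!found)
			return;	/* stateN does not exist: no more states */
	}
}
```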

Tuning

  • Nehalem C-States table
    static IDLE_STATE NHM_IdleState[] = {
  • Tables for other architectures will be added as I get the opportunity to test them.

Thanks

  • My code is inspired by the Linux kernel sources and the Intel driver.
cyring self-assigned this May 12, 2019
cyring pinned this issue May 12, 2019
@cyring
Owner Author

cyring commented May 13, 2019

Skylake Idle States

static IDLE_STATE SKL_IdleState[] = {
	{
	.Name		= "C1",
	.Desc		= "SKL-C1",
	.flags		= 0x00 << 24,
	.Latency	= 2,
	.Residency	= 2
	},
	{
	.Name		= "C1E",
	.Desc		= "SKL-C1E",
	.flags		= 0x01 << 24,
	.Latency	= 10,
	.Residency	= 20
	},
	{
	.Name		= "C3",
	.Desc		= "SKL-C3",
	.flags		= (0x10 << 24) | 0x10000,
	.Latency	= 70,
	.Residency	= 100
	},
	{
	.Name		= "C6",
	.Desc		= "SKL-C6",
	.flags		= (0x20 << 24) | 0x10000,
	.Latency	= 85,
	.Residency	= 200
	},
	{
	.Name		= "C7",
	.Desc		= "SKL-C7",
	.flags		= (0x33 << 24) | 0x10000,
	.Latency	= 124,
	.Residency	= 800
	},
	{
	.Name		= "C8",
	.Desc		= "SKL-C8",
	.flags		= (0x40 << 24) | 0x10000,
	.Latency	= 200,
	.Residency	= 800
	},
	{
	.Name		= "C9",
	.Desc		= "SKL-C9",
	.flags		= (0x50 << 24) | 0x10000,
	.Latency	= 480,
	.Residency	= 5000
	},
	{
	.Name		= "C10",
	.Desc		= "SKL-C10",
	.flags		= (0x60 << 24) | 0x10000,
	.Latency	= 890,
	.Residency	= 5000
	},
	{NULL}
};

[Skylake_S]  = {							/* 39*/
	.Signature = _Skylake_S,
	.Query = Query_Broadwell,
	.Update = PerCore_Skylake_Query,
	.Start = Start_Skylake,
	.Stop = Stop_Skylake,
	.Exit = NULL,
	.Timer = InitTimer_Skylake,
	.BaseClock = BaseClock_Skylake,
	.ClockMod = ClockMod_Skylake_HWP,
	.TurboClock = Intel_Turbo_Config8C,
	.thermalFormula = THERMAL_FORMULA_INTEL,
	.voltageFormula = VOLTAGE_FORMULA_INTEL_SNB,
	.powerFormula   = POWER_FORMULA_INTEL,
	.PCI_ids = PCI_Skylake_ids,
	.Uncore = {
		.Start = Start_Uncore_Skylake,
		.Stop = Stop_Uncore_Skylake,
		.ClockMod = NULL
		},
	.Specific = Void_Specific,
	.IdleState = SKL_IdleState,
	.Architecture = Arch_Skylake_S
	},
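The `.flags` values in the table above follow the encoding of the Linux intel_idle driver: the MWAIT hint (EAX) is stored in bits 31:24, and 0x10000 is the value older kernels used for CPUIDLE_FLAG_TLB_FLUSHED. A decoding sketch, assuming those encodings; the macros mirror intel_idle's `MWAIT2flg`/`flg2MWAIT`, while `struct mwait_hint` and `decode_idle_flags()` are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed intel_idle encoding: MWAIT hint in bits 31:24 of the flags. */
#define MWAIT2flg(eax)   (((uint32_t)(eax) & 0xFFu) << 24)
#define flg2MWAIT(flags) (((uint32_t)(flags) >> 24) & 0xFFu)
#define FLAG_TLB_FLUSHED 0x10000u	/* value used by older kernels */

struct mwait_hint {
	unsigned int cstate;	/* EAX[7:4]: 0 means C1, 1 means C2, ... */
	unsigned int substate;	/* EAX[3:0]: sub C-state */
	bool tlb_flushed;	/* entering the state flushes the TLB */
};

struct mwait_hint decode_idle_flags(uint32_t flags)
{
	uint32_t eax = flg2MWAIT(flags);
	struct mwait_hint h = {
		.cstate = (eax >> 4) & 0xFu,
		.substate = eax & 0xFu,
		.tlb_flushed = (flags & FLAG_TLB_FLUSHED) != 0,
	};
	return h;
}
```

For instance, the SKL "C7" entry, (0x33 << 24) | 0x10000, decodes to MWAIT C-state field 3 (MWAIT C4), sub-state 3, with the TLB flag set.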

[Screenshot: 2019-05-13-142254_804x744_scrot]

# cat /sys/devices/system/cpu/cpuidle/current_driver
corefreqk-idle

@cyring
Owner Author

cyring commented May 14, 2019

SandyBridge Idle States

static IDLE_STATE SNB_IdleState[] = {
	{
	.Name		= "C1",
	.Desc		= "SNB-C1",
	.flags		= 0x00 << 24,
	.Latency	= 2,
	.Residency	= 2
	},
	{
	.Name		= "C1E",
	.Desc		= "SNB-C1E",
	.flags		= 0x01 << 24,
	.Latency	= 10,
	.Residency	= 20
	},
	{
	.Name		= "C3",
	.Desc		= "SNB-C3",
	.flags		= (0x10 << 24) | 0x10000,
	.Latency	= 80,
	.Residency	= 211
	},
	{
	.Name		= "C6",
	.Desc		= "SNB-C6",
	.flags		= (0x20 << 24) | 0x10000,
	.Latency	= 104,
	.Residency	= 345
	},
	{
	.Name		= "C7",
	.Desc		= "SNB-C7",
	.flags		= (0x30 << 24) | 0x10000,
	.Latency	= 109,
	.Residency	= 345
	},
	{NULL}
};

[SandyBridge] = {							/* 26*/
	.Signature = _SandyBridge,
	.Query = Query_SandyBridge,
	.Update = PerCore_SandyBridge_Query,
	.Start = Start_SandyBridge,
	.Stop = Stop_SandyBridge,
	.Exit = NULL,
	.Timer = InitTimer_SandyBridge,
	.BaseClock = BaseClock_SandyBridge,
	.ClockMod = ClockMod_SandyBridge_PPC,
	.TurboClock = Intel_Turbo_Config8C,
	.thermalFormula = THERMAL_FORMULA_INTEL,
	.voltageFormula = VOLTAGE_FORMULA_INTEL_SNB,
	.powerFormula   = POWER_FORMULA_INTEL,
	.PCI_ids = PCI_SandyBridge_ids,
	.Uncore = {
		.Start = Start_Uncore_SandyBridge,
		.Stop = Stop_Uncore_SandyBridge,
		.ClockMod = NULL
		},
	.Specific = Void_Specific,
	.IdleState = SNB_IdleState,
	.Architecture = Arch_SandyBridge
	},

[Screenshots: 2019-05-14-130500_644x485_scrot, 2019-05-14-130712_644x576_scrot]

@cyring
Owner Author

cyring commented May 14, 2019

Code change to add and fix the C-States description.

From the Intel SDM:

CPUID.01H:ECX.MONITOR[bit 3] indicates the availability of MONITOR and MWAIT in the processor.

Software can execute MWAIT with ECX[0] = 1 only if CPUID.05H:ECX[bit 1] = 1.
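The two SDM conditions above reduce to a small decision on the ECX hint passed to MWAIT. A user-space sketch with hypothetical helper names; as a conservative extra, it also requires CPUID.05H:ECX bit 0, which the SDM defines as the enumeration of the MWAIT extensions. The optional x86 part relies on GCC/Clang's `<cpuid.h>` `__get_cpuid()`, which returns 0 when the requested leaf is not supported:

```c
#include <assert.h>
#include <stdint.h>

#define CPUID01_ECX_MONITOR (1u << 3)	/* MONITOR/MWAIT supported */
#define CPUID05_ECX_EMX     (1u << 0)	/* MWAIT extensions enumerated */
#define CPUID05_ECX_IBE     (1u << 1)	/* interrupts break MWAIT even if IF = 0 */

/* Return the ECX value to pass to MWAIT, or -1 when MWAIT is unavailable. */
int mwait_ecx_hint(uint32_t leaf01_ecx, uint32_t leaf05_ecx)
{
	if (!(leaf01_ecx & CPUID01_ECX_MONITOR))
		return -1;
	/* ECX[0] = 1 is only allowed when CPUID.05H:ECX[bit 1] = 1. */
	if ((leaf05_ecx & CPUID05_ECX_EMX) && (leaf05_ecx & CPUID05_ECX_IBE))
		return 1;
	return 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper, x86 only */

/* Same decision, using the CPU this program runs on. */
int mwait_ecx_hint_from_cpu(void)
{
	unsigned int eax, ebx, ecx1, ecx5, edx;

	if (!__get_cpuid(0x01, &eax, &ebx, &ecx1, &edx))
		return -1;
	if (!__get_cpuid(0x05, &eax, &ebx, &ecx5, &edx))
		ecx5 = 0;	/* leaf 05H not available */
	return mwait_ecx_hint(ecx1, ecx5);
}
#endif
```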

[Screenshot: 2019-05-15-094243_724x436_scrot]

@cyring
Owner Author

cyring commented May 16, 2019

Version 1.51.1
[Screenshots: 2019-05-16-165524_724x436_scrot, 2019-05-16-165719_724x436_scrot]

cyring changed the title from "New CPU Idle Handler" to "[WIP] kernel cpuidle and cpufreq integration" on May 16, 2019
cyring unpinned this issue May 18, 2019
@cyring
Owner Author

cyring commented May 21, 2019

The code is stable; FEAT_DBG=2 is no longer needed to build the driver.

@cyring
Owner Author

cyring commented May 28, 2019

Kernel ability to cap the C-States

Referring to the last commit:

CoreFreqK.IdleDriver.state_count = prm.dl.lo;

Found a proper way to cap the idle states (instead of altering the indexes count):

CoreFreqK.IdleDriver.states[index].disabled = true;	/* or false */

The kernel appears to apply the disablement in descending order (starting from the right side of the array).

For example, with the layout below, limiting the CPU to a given Cn state requires disabling all the deeper idle state indexes.

Index     |1      |2      |3      |4      |5      |6      |7      |8     
State     |POLL   |C1     |C1E    |C3     |C6     |C7     |C8     |C9     
          |CPUIDLE| SKL-C1|SKL-C1E| SKL-C3| SKL-C6| SKL-C7| SKL-C8| SKL-C9
Power     |     -1|      0|      0|      0|      0|      0|      0|      0
Latency   |      0|      2|     10|     70|     85|    124|    200|    480
Residency |      0|      2|     20|    100|    200|    800|    800|   5000
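The capping rule can be modeled as: keep the states up to the chosen index enabled and set `disabled` on every deeper one. A minimal sketch, assuming a hypothetical `struct idle_state` that only mimics the `disabled` field of the kernel's `struct cpuidle_state`; indexes are 0-based like the kernel's `states[]` array (POLL at 0), whereas the table above counts from 1:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields of struct cpuidle_state used here. */
struct idle_state {
	const char *name;
	bool disabled;
};

/* Enable states [0..limit], disable every deeper one.
 * Returns the number of states left enabled. */
int cap_idle_states(struct idle_state *states, int count, int limit)
{
	int enabled = 0;

	for (int i = 0; i < count; i++) {
		states[i].disabled = (i > limit);
		if (!states[i].disabled)
			enabled++;
	}
	return enabled;
}
```

With the Skylake layout, capping at C3 (0-based index 3) leaves POLL, C1, C1E and C3 enabled and disables C6 through C9.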

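According to the kernel documentation, invoking MWAIT with ECX = 0 leads to issues: MWAIT is then only broken by interrupts when the interrupt flag is set, and since the idle handler runs with interrupts disabled, the CPU may not wake up. Hence the Interrupt Break-Event capability bit is passed as the ECX hint: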

mwait_idle_with_hints(MWAIT, Proc->Features.MWait.ECX.IBE_MWAIT);

@cyring
Owner Author

cyring commented May 30, 2019

CoreFreq is now making use of the cpuidle_state.disabled structure field to set the C-States floor
[Screenshot: 2019-05-30-184819_804x644_scrot]

@cyring
Owner Author

cyring commented Jun 10, 2019

  • Improved counting of the sub C-States against the CPU-Idle indexes:
static int CoreFreqK_IdleDriver_Init(void)
{
	int rc = -EPERM;
#if defined(CONFIG_CPU_IDLE) && LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
  if (Arch[Proc->ArchID].SystemDriver != NULL)
  {
	IDLE_STATE *pIdleState = Arch[Proc->ArchID].SystemDriver->IdleState;
    if ((pIdleState != NULL) && Proc->Features.Std.ECX.MONITOR)
    {
	if((CoreFreqK.IdleDevice = alloc_percpu(struct cpuidle_device)) == NULL)
		rc = -ENOMEM;
	else {
		unsigned int subState[] = {
			Proc->Features.MWait.EDX.SubCstate_MWAIT0,  /*   C0  */
			Proc->Features.MWait.EDX.SubCstate_MWAIT1,  /*   C1  */
			Proc->Features.MWait.EDX.SubCstate_MWAIT1,  /*  C1E  */
			Proc->Features.MWait.EDX.SubCstate_MWAIT2,  /*   C3  */
			Proc->Features.MWait.EDX.SubCstate_MWAIT3,  /*   C6  */
			Proc->Features.MWait.EDX.SubCstate_MWAIT4,  /*   C7  */
			Proc->Features.MWait.EDX.SubCstate_MWAIT5,  /*   C8  */
			Proc->Features.MWait.EDX.SubCstate_MWAIT6,  /*   C9  */
			Proc->Features.MWait.EDX.SubCstate_MWAIT7   /*  C10  */
		};
		const unsigned int subStateCount = sizeof(subState)
						 / sizeof(subState[0]);
		/* Kernel polling loop					*/
		cpuidle_poll_state_init(&CoreFreqK.IdleDriver);

		CoreFreqK.IdleDriver.state_count = 1;
		/* Idle States						*/
	    while (pIdleState->Name != NULL)
	    {
		if ((CoreFreqK.IdleDriver.state_count < subStateCount)
		&& (subState[CoreFreqK.IdleDriver.state_count] > 0))
		{
			StrCopy(CoreFreqK.IdleDriver.states[
					CoreFreqK.IdleDriver.state_count
				].name, pIdleState->Name, CPUIDLE_NAME_LEN);

			StrCopy(CoreFreqK.IdleDriver.states[
					CoreFreqK.IdleDriver.state_count
				].desc, pIdleState->Desc, CPUIDLE_NAME_LEN);

			CoreFreqK.IdleDriver.states[
				CoreFreqK.IdleDriver.state_count
			].flags = pIdleState->flags;

			CoreFreqK.IdleDriver.states[
				CoreFreqK.IdleDriver.state_count
			].exit_latency = pIdleState->Latency;

			CoreFreqK.IdleDriver.states[
				CoreFreqK.IdleDriver.state_count
			].target_residency = pIdleState->Residency;

			CoreFreqK.IdleDriver.states[
				CoreFreqK.IdleDriver.state_count
			].enter = CoreFreqK_IdleHandler;

			CoreFreqK.IdleDriver.states[
				CoreFreqK.IdleDriver.state_count
			].enter_s2idle = CoreFreqK_S2IdleHandler;

			CoreFreqK.IdleDriver.state_count++;
		}
		pIdleState++;
	    }
	    if ((rc = cpuidle_register_driver(&CoreFreqK.IdleDriver)) == 0) {
		struct cpuidle_device *device;
		unsigned int cpu;
		for (cpu = 0; cpu < Proc->CPU.Count; cpu++) {
		    if (!BITVAL(KPublic->Core[cpu]->OffLine, HW)) {
			device = per_cpu_ptr(CoreFreqK.IdleDevice, cpu);
			device->cpu = cpu;
			if ((rc = cpuidle_register_device(device)) == 0)
				continue;

			cpuidle_unregister_driver(&CoreFreqK.IdleDriver);
			break;
		    }
		}
	    }
	}
    }
  }
#endif /* CONFIG_CPU_IDLE */
	return(rc);
}
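The sub C-state counts read by this loop come from CPUID.05H:EDX, which the Intel SDM defines as eight 4-bit fields (bits 3:0 for C0, 7:4 for C1, up to 31:28 for C7, in MWAIT numbering). The user-space sketch below decodes those fields and replays the counting of the `while` loop above, assuming the same slot-to-field mapping as the `subState[]` array; as in the original, the slot index only advances when a table entry is kept. `mwait_substates()` and `count_registered_states()` are hypothetical helpers, not CoreFreq code:

```c
#include <assert.h>
#include <stdint.h>

/* Number of MWAIT sub C-states for field n (0 = C0 ... 7 = C7). */
unsigned int mwait_substates(uint32_t edx, unsigned int n)
{
	assert(n < 8);
	return (edx >> (4 * n)) & 0xFu;
}

/* Replay of the registration loop: slot 0 is the kernel POLL state, and a
 * table entry is kept only if the field mapped to the next slot is non-zero. */
unsigned int count_registered_states(uint32_t edx, unsigned int table_len)
{
	/* Slot -> MWAIT field, as in subState[]: C0, C1, C1E, C3, ..., C10 */
	static const unsigned int field[] = { 0, 1, 1, 2, 3, 4, 5, 6, 7 };
	const unsigned int nfield = sizeof(field) / sizeof(field[0]);
	unsigned int state_count = 1;	/* POLL */

	for (unsigned int k = 0; k < table_len; k++) {
		if (state_count < nfield
		 && mwait_substates(edx, field[state_count]) > 0)
			state_count++;
	}
	return state_count;
}
```

For example, with an illustrative EDX of 0x00002220 and the five entries of SNB_IdleState[], the sketch counts 5 slots (POLL, C1, C1E, C3 and C6); C7 is dropped because the C4 field (bits 19:16) is zero.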

[Screenshots: 2019-06-10-115210_644x329_scrot, 2019-06-10-115201_644x329_scrot]

cyring pinned this issue Jun 10, 2019
cyring unpinned this issue Jun 15, 2019
@cyring
Owner Author

cyring commented Aug 26, 2019

Version 1.63.2

Register the sub-drivers prior to querying the platform information (such as the Turbo state) through the Controller_Init() call:

static int __init CoreFreqK_init(void)

...
			/* Set the uArch's name with the first found codename */
			StrCopy(Proc->Architecture,
				Arch[Proc->ArchID].Architecture[0].CodeName,
				CODENAME_LEN);

			/* Copy various SMBIOS data [version 3.2]	*/
			SMBIOS_Collect();

			/* Register the Idle & Frequency sub-drivers	*/
		    if (Register_CPU_Idle == 1) {
			Proc->Registration.Driver.cpuidle =		\
					CoreFreqK_IdleDriver_Init() == 0;
		    }
		    if (Register_CPU_Freq == 1) {
			Proc->Registration.Driver.cpufreq =		\
					CoreFreqK_FreqDriver_Init() == 0;
		    }

			/* Initialize the CoreFreq controller		*/
			Controller_Init();

			MatchPeerForDefaultService(&Proc->Service,
						iArg.localProcessor);

			printk(KERN_INFO "CoreFreq(%u:%d):"	\
				" Processor [%2X%1X_%1X%1X]"	\
				" Architecture [%s] %3s [%u/%u]\n",
				Proc->Service.Core, Proc->Service.Thread,
				Proc->Features.Std.EAX.ExtFamily,
				Proc->Features.Std.EAX.Family,
				Proc->Features.Std.EAX.ExtModel,
				Proc->Features.Std.EAX.Model,
				Proc->Architecture,
				Proc->Features.HTT_Enable ? "SMT" : "CPU",
				Proc->CPU.OnLine,
				Proc->CPU.Count);
/*TODO: CleanUp
		    if (Register_CPU_Idle == 1) {
			Proc->Registration.Driver.cpuidle =		\
					CoreFreqK_IdleDriver_Init() == 0;
		    }
		    if (Register_CPU_Freq == 1) {
			Proc->Registration.Driver.cpufreq =		\
					CoreFreqK_FreqDriver_Init() == 0;
		    }
*/
			Controller_Start(0);
...

@cyring
Owner Author

cyring commented Aug 27, 2019

The above change has been verified OK.

cyring unpinned this issue Aug 27, 2019
@cyring
Owner Author

cyring commented Aug 27, 2019

Although AMD is not covered yet, the C-States limit and the frequency target can be controlled on most Intel architectures.
Closing the issue.

cyring closed this as completed Aug 27, 2019
cyring changed the title from "[WIP] kernel cpuidle and cpufreq integration" to "[Done] kernel cpuidle and cpufreq integration" on Aug 27, 2019
cyring changed the title from "[Done] kernel cpuidle and cpufreq integration" to "[AMD][Zen] kernel cpuidle and cpufreq integration" on Sep 26, 2020
@cyring
Owner Author

cyring commented Sep 26, 2020

Version 1.81.1

Prerequisites

This version has been tested with Matisse (AMD Zen 2).

  • Blacklist any driver that uses the kernel function amd_smn_read() to read and write the SMU
  • Blacklist any driver that registers as a cpufreq driver
  • Prevent any cpuidle driver from loading into the kernel
  • Optionally, let CoreFreq act as a clock source

Below are my boot command line parameters:

modprobe.blacklist=pcspkr,nouveau,k10temp,acpi_cpufreq,edac_mce_amd idle=halt notsc acpi_enforce_resources=lax cpu0_hotplug audit=0 nowatchdog sysrq_always_enabled
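To confirm after reboot that the blacklist took effect, the loaded modules can be checked in /proc/modules, whose lines start with the module name followed by a space (names are reported with underscores, e.g. acpi_cpufreq). A small sketch; `module_listed()` and `count_loaded()` are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if 'name' is the first field of any line of /proc/modules-style text. */
bool module_listed(const char *modules, const char *name)
{
	size_t len = strlen(name);
	const char *line = modules;

	assert(len > 0);
	while (line != NULL && *line != '\0') {
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return true;
		line = strchr(line, '\n');	/* advance to the next line */
		if (line != NULL)
			line++;
	}
	return false;
}

/* Print which of 'names' are still loaded; returns their count, -1 on error. */
int count_loaded(const char *const names[], size_t count)
{
	char line[512];
	int hits = 0;
	FILE *fp = fopen("/proc/modules", "r");

	if (fp == NULL)
		return -1;
	while (fgets(line, sizeof(line), fp) != NULL)
		for (size_t i = 0; i < count; i++)
			if (module_listed(line, names[i])) {
				printf("still loaded: %s\n", names[i]);
				hits++;
			}
	fclose(fp);
	return hits;
}
```

For example, `count_loaded((const char *const[]){ "k10temp", "acpi_cpufreq", "edac_mce_amd" }, 3)` should return 0 when the blacklist is effective.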

Build

  • Use default arguments to build CoreFreq

    Don't build with the LEGACY argument (which implements amd_smn_read())

    Add DELAY_TSC to let CoreFreq implement its own delay function

make DELAY_TSC=1 clean all

Registering the CoreFreq sub-drivers

-- Two ways --

[A] When loading the CoreFreq kernel module

## Load the CoreFreq driver
insmod corefreqk.ko Register_ClockSource=1 Register_Governor=1 Register_CPU_Freq=1 Register_CPU_Idle=1

## Switch the Clock source
echo corefreq > /sys/bus/clocksource/devices/clocksource0/current_clocksource

## Now start the Daemon
corefreqd
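Before switching, one can check that the kernel actually lists the CoreFreq clock source in available_clocksource, a sibling attribute of current_clocksource in the same sysfs directory. A minimal sketch; the function names are hypothetical, and writing current_clocksource requires root:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CLOCKSOURCE_DIR "/sys/bus/clocksource/devices/clocksource0/"

/* True if 'name' is one of the space-separated tokens of 'list'. */
bool clocksource_listed(const char *list, const char *name)
{
	size_t len = strlen(name);
	const char *p = list;

	assert(len > 0);
	while ((p = strstr(p, name)) != NULL) {
		bool starts = (p == list) || p[-1] == ' ';
		bool ends = p[len] == ' ' || p[len] == '\n' || p[len] == '\0';

		if (starts && ends)
			return true;
		p += len;	/* keep searching after this partial match */
	}
	return false;
}

/* Switch to 'name' only if the kernel lists it; returns 0 on success. */
int switch_clocksource(const char *name)
{
	char avail[256];
	FILE *fp = fopen(CLOCKSOURCE_DIR "available_clocksource", "r");

	if (fp == NULL)
		return -1;
	if (fgets(avail, sizeof(avail), fp) == NULL)
		avail[0] = '\0';
	fclose(fp);
	if (!clocksource_listed(avail, name))
		return -1;
	if ((fp = fopen(CLOCKSOURCE_DIR "current_clocksource", "w")) == NULL)
		return -1;
	if (fprintf(fp, "%s\n", name) < 0) {
		fclose(fp);
		return -1;
	}
	return fclose(fp) == 0 ? 0 : -1;	/* sysfs errors surface on flush */
}
```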

[B] From the UI in the Settings window

  1. Register Clock Source
  2. In another root session, switch the system Clock source to CoreFreq
echo corefreq > /sys/bus/clocksource/devices/clocksource0/current_clocksource
corefreq: Freq_KHz[3500000] Kernel CPU_KHZ[3500004] TSC_KHZ[3500004]
LPJ[11666680] mask[ffffffffffffffff] mult[4793490] shift[24]
  3. Register Governor
  4. Register CPU-FREQ
  5. Register CPU-IDLE

Stopping CoreFreq

  • The kernel prevents unregistering the last and current cpuidle driver:
rmmod corefreqk
rmmod: ERROR: Module corefreqk is in use

Thus, make sure to unregister CPU-IDLE before leaving the UI.
You can of course restart the daemon and the client to proceed with the unregistration; then you can safely unload the CoreFreq driver.
