RAPL for Intel/AMD architectures #57

Closed
cyring opened this issue Jun 18, 2018 · 66 comments

@cyring
Owner

cyring commented Jun 18, 2018

Replace PWR_ACCU_SandyBridge with PWR_ACCU_Skylake

#define PWR_ACCU_Skylake(Pkg, T)                                        \
({                                                                      \
        RDCOUNTER(Pkg->Counter[T].Power.ACCU[PWR_DOMAIN(PKG)],          \
                                                MSR_PKG_ENERGY_STATUS); \
                                                                        \
        RDCOUNTER(Pkg->Counter[T].Power.ACCU[PWR_DOMAIN(CORES)],        \
                                                MSR_PP0_ENERGY_STATUS); \
                                                                        \
        RDCOUNTER(Pkg->Counter[T].Power.ACCU[PWR_DOMAIN(UNCORE)],       \
                                                MSR_PP1_ENERGY_STATUS); \
                                                                        \
        RDCOUNTER(Pkg->Counter[T].Power.ACCU[PWR_DOMAIN(RAM)],          \
                                                MSR_DRAM_ENERGY_STATUS);\
})
@cyring cyring self-assigned this Jun 18, 2018
@cyring cyring changed the title RAPL for Skylake architecture Skylake architecture Jun 19, 2018
@cyring cyring changed the title Skylake architecture RAPL for Skylake architecture Jun 19, 2018
@cyring
Owner Author

cyring commented Jun 28, 2018

KBL RAPL and Voltage during high load ...
corefreq_kbl_rapl_high
... during low load
corefreq_kbl_rapl_low

@cyring cyring changed the title RAPL for Skylake architecture RAPL for Intel architectures Jul 30, 2018
@cyring
Owner Author

cyring commented Jul 30, 2018

Measurement issue with the i7-3770 (IvyBridge): the specified TDP is 77 W.
2018-07-30-175649_644x364_scrot
2018-07-30-175557_644x364_scrot
2018-07-30-175513_644x364_scrot

@cyring
Owner Author

cyring commented Aug 3, 2018

Skylake: add missing code to measure DRAM power

Delta_PWR_ACCU(Proc, UNCORE);

			Delta_PWR_ACCU(Proc, RAM);

Save_PWR_ACCU(Proc, UNCORE);

			Save_PWR_ACCU(Proc, RAM);

2018-08-03-171023_644x316_scrot
2018-08-03-171110_644x316_scrot
2018-08-03-171214_644x316_scrot

@cyring
Owner Author

cyring commented Sep 13, 2018

Workaround for IVB [Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz]

switch (Proc->powerFormula) {

	Shm->Proc.Power.Unit.Watts = Proc->PowerThermal.Unit.PU > 0 ?
			1.0 / (double) (1 << Proc->PowerThermal.Unit.PU) : 0;

	Shm->Proc.Power.Unit.Watts /= (Proc->CPU.Count >> Proc->Features.HTT_Enable);

2018-09-13-120420_644x316_scrot
Accurate ?
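
For readers following the unit conversion above, here is a minimal sketch of how the three RAPL units could be decoded from the raw units register. It assumes the field layout documented in the Intel SDM for MSR_RAPL_POWER_UNIT (PU in bits 3:0, ESU in bits 12:8, TU in bits 19:16); the struct and function names are hypothetical and not taken from CoreFreq.

```c
#include <stdint.h>

/* Decoded RAPL units. Names are hypothetical, for illustration only. */
struct rapl_units {
    double watts;   /* power unit  = 1 / 2^PU  */
    double joules;  /* energy unit = 1 / 2^ESU */
    double seconds; /* time unit   = 1 / 2^TU  */
};

/* Assumed layout: PU = bits 3:0, ESU = bits 12:8, TU = bits 19:16. */
static inline struct rapl_units rapl_decode_units(uint64_t msr)
{
    unsigned pu  = (unsigned)(msr & 0xF);
    unsigned esu = (unsigned)((msr >> 8) & 0x1F);
    unsigned tu  = (unsigned)((msr >> 16) & 0xF);
    struct rapl_units u;

    /* Each unit is a negative power of two of its field. */
    u.watts   = 1.0 / (double)(1ULL << pu);
    u.joules  = 1.0 / (double)(1ULL << esu);
    u.seconds = 1.0 / (double)(1ULL << tu);
    return u;
}
```

With an example raw value of 0xA0E03 (PU = 3, ESU = 14, TU = 10), this yields 0.125 W, about 0.000061 J and about 0.000977 s. Unlike the snippet above, the sketch does not special-case a zero PU field.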

@cyring
Owner Author

cyring commented Dec 1, 2018

Issue closed until other hardware is available for testing.

@cyring cyring closed this as completed Dec 1, 2018
@cyring
Owner Author

cyring commented Jul 21, 2019

  • Intel

Architecture   Processor   ESU (J) [1]   Pkg load (J) [2]   Cores load (J) [2]   TDP (W) [3]
Skylake/S      i5-6600K    0.000061035   84.761535645       75.259704590         91
Skylake/S      i7-6700     0.000061035   55.686950684       44.473632812         65
Haswell/U      i3-4010U    0.000061035   6.958801270        4.341308594          15
Haswell/U      i7-4650U    0.000061035   18.376159668       14.542602539         15
IvyBridge/EP   E5-1607     0.000015259   34.55101032        26.025909424         130
SandyBridge    i7-2710QE   0.000015259   44.356430054       40.928970337         45

  • AMD Zen

Architecture     Processor   ESU (J)       Pkg load (J)    Cores load (J)   TDP (W)
Pinnacle Ridge   2700X       0.000015259   115.171279907   116.535186768    105

Remarks

  1. RAPL Units Register.
  2. All cores fully loaded, one-second interval, RAPL Energy architectural counter.
  3. Manufacturer specification.
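
The energy figures above are presumably counter deltas scaled by the ESU. Here is a minimal sketch of that conversion, assuming a 32-bit energy-status counter that wraps around and a sampling interval supplied by the caller. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Joules consumed between two raw samples of a RAPL energy-status counter.
 * Assumption: the counter is 32 bits wide, so the unsigned subtraction
 * handles a single wrap-around between the two samples. */
static inline double rapl_energy_joules(uint32_t prev, uint32_t curr, double esu)
{
    uint32_t delta = curr - prev; /* wraps modulo 2^32 */
    return (double)delta * esu;
}

/* Average power in watts over the sampling interval (in seconds). */
static inline double rapl_power_watts(uint32_t prev, uint32_t curr,
                                      double esu, double interval_s)
{
    assert(interval_s > 0.0);
    return rapl_energy_joules(prev, curr, esu) / interval_s;
}
```

With a one-second interval, as in the remarks, the joule and watt figures are numerically identical.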

@cyring cyring changed the title RAPL for Intel architectures RAPL for Intel/AMD architectures Jul 21, 2019
@cyring cyring reopened this Jul 21, 2019
@cyring cyring pinned this issue Jul 21, 2019
@cyring
Owner Author

cyring commented Jul 21, 2019

Proposal for the Power & Voltage view

AMD Zen

#    Freq MHz   VID   Vcore              Energy(J)       Counter(Raw)
#0   4015.73    .     .        Package   131.852859497   9876543210
#1   4015.67    .     .        Cores     122.611679077   .
#2   4015.74    54    1.2125   Uncore    10.000000000    9876543210
#3   4015.73    .     .        Memory    5.000000000     .

#    Freq MHz   VID   Vcore              Power(W)        Core(W)
#0   4015.73    .     .        Package   131.852859497   13.000000000
#1   4015.67    .     .        Cores     122.611679077   .
#2   4015.74    54    1.2125   Uncore    10.000000000    11.000000000
#3   4015.73    .     .        Memory    5.000000000     .

Intel

#    Freq MHz   VID   Vcore              Energy(J)       Counter(Raw)
#0   4015.73    .     .        Package   131.852859497   876543210987654321
#1   4015.67    .     .        Cores     122.611679077   876543210987654321
#2   4015.74    54    1.2125   Uncore    10.000000000    106543210987654321
#3   4015.73    .     .        Memory    5.000000000     056543210987654321

#    Freq MHz   VID   Vcore              Power(W)        Core(W)
#0   4015.73    .     .        Package   131.852859497   .
#1   4015.67    .     .        Cores     122.611679077   .
#2   4015.74    54    1.2125   Uncore    10.000000000    .
#3   4015.73    .     .        Memory    5.000000000     .

Remarks

  • One shortcut to toggle between the Energy(J) and Power(W) layouts
  • To my knowledge, only the AMD Zen architecture provides a RAPL counter per physical core,
    whereas Intel only exposes the cumulative raw counter value.

@cyring
Owner Author

cyring commented Jul 27, 2019

RAPL in the AMD Zen architecture

Energy consumed

Ryzen 3xxx

61916877-a044b500-af07-11e9-8404-d2a39c56fe59

Ryzen 2xxx

61589073-3e84f380-ab73-11e9-9016-573618e4daa1

Topology

Ryzen 2xxx

Processor                              [AMD Ryzen 7 2700X Eight-Core Processor ]
|- Architecture                                            [Zen+ Pinnacle Ridge]
|- Vendor ID                                                      [AuthenticAMD]
|- Microcode                                                        [ 134251019]
|- Signature                                                            [ 8F_08]
|- Stepping                                                             [     2]
|- Online CPU                                                           [ 16/16]
...
Power & Thermal Monitoring:                                                     
...
|- Units                                                                        
   |- Power                                               watt   [  0.125000000]
   |- Energy                                             joule   [  0.000015259]
   |- Window                                            second   [  0.000976562]

CoreFreq_2700X_Topology

Ryzen 3xxx

Processor                                  [AMD Ryzen 7 3700X 8-Core Processor ]
|- Architecture                                               [Zen/Summit Ridge]
|- Vendor ID                                                      [AuthenticAMD]
|- Microcode                                                        [ 141561873]
|- Signature                                                            [ 8F_71]
|- Stepping                                                             [     0]
|- Online CPU                                                           [ 16/16]
...
Power & Thermal                                                                 
...
|- Units                                                                        
   |- Power                                               watt   [  0.125000000]
   |- Energy                                             joule   [  0.000015259]
   |- Window                                            second   [  0.000976562]

CPU Pkg  Apic  Core Thread  Caches      (w)rite-Back (i)nclusive              
 #   ID   ID    ID     ID  L1-Inst Way  L1-Data Way      L2  Way      L3  Way 
00: BSP     0     0      0      32  8        32  8       512  8     32768  9  
01:   0     2     1      0      32  8        32  8       512  8     32768  9  
02:   0     4     2      0      32  8        32  8       512  8     32768  9  
03:   0     6     3      0      32  8        32  8       512  8     32768  9  
04:   0     8     4      0      32  8        32  8       512  8     32768  9  
05:   0    10     5      0      32  8        32  8       512  8     32768  9  
06:   0    12     6      0      32  8        32  8       512  8     32768  9  
07:   0    14     7      0      32  8        32  8       512  8     32768  9  
08:   0     1     0      1      32  8        32  8       512  8     32768  9  
09:   0     3     1      1      32  8        32  8       512  8     32768  9  
10:   0     5     2      1      32  8        32  8       512  8     32768  9  
11:   0     7     3      1      32  8        32  8       512  8     32768  9  
12:   0     9     4      1      32  8        32  8       512  8     32768  9  
13:   0    11     5      1      32  8        32  8       512  8     32768  9  
14:   0    13     6      1      32  8        32  8       512  8     32768  9  
15:   0    15     7      1      32  8        32  8       512  8     32768  9

Threadripper 2950X

Remark: Threadripper results below are from an old CoreFreq version

Processor                      [AMD Ryzen Threadripper 2950X 16-Core Processor ]
|- Architecture                                                    [Zen+ Colfax]
|- Vendor ID                                                      [AuthenticAMD]
|- Microcode                                                        [ 134251019]
|- Signature                                                            [ 8F_08]
|- Stepping                                                             [     2]
|- Online CPU                                                           [ 32/32]
...
Power & Thermal                                                                 
...
|- Units                                                                        
   |- Power                                               watt   [  0.007812500]
   |- Energy                                             joule   [  0.000000954]
   |- Window                                            second   [  0.000976562]

CPU Pkg  Apic  Core Thread  Caches      (w)rite-Back (i)nclusive              
 #   ID   ID    ID     ID  L1-Inst Way  L1-Data Way      L2  Way      L3  Way 
00: BSP     0     0      0      64  4        32  8       512  8     32768 10  
01:   0     2     1      0      64  4        32  8       512  8     32768 10  
02:   0     4     2      0      64  4        32  8       512  8     32768 10  
03:   0     6     3      0      64  4        32  8       512  8     32768 10  
04:   0     8     4      0      64  4        32  8       512  8     32768 10  
05:   0    10     5      0      64  4        32  8       512  8     32768 10  
06:   0    12     6      0      64  4        32  8       512  8     32768 10  
07:   0    14     7      0      64  4        32  8       512  8     32768 10  
08:   1    16     0      0      64  4        32  8       512  8     32768 10  
09:   1    18     1      0      64  4        32  8       512  8     32768 10  
10:   1    20     2      0      64  4        32  8       512  8     32768 10  
11:   1    22     3      0      64  4        32  8       512  8     32768 10  
12:   1    24     4      0      64  4        32  8       512  8     32768 10  
13:   1    26     5      0      64  4        32  8       512  8     32768 10  
14:   1    28     6      0      64  4        32  8       512  8     32768 10  
15:   1    30     7      0      64  4        32  8       512  8     32768 10  
16:   0     1     0      1      64  4        32  8       512  8     32768 10  
17:   0     3     1      1      64  4        32  8       512  8     32768 10  
18:   0     5     2      1      64  4        32  8       512  8     32768 10  
19:   0     7     3      1      64  4        32  8       512  8     32768 10  
20:   0     9     4      1      64  4        32  8       512  8     32768 10  
21:   0    11     5      1      64  4        32  8       512  8     32768 10  
22:   0    13     6      1      64  4        32  8       512  8     32768 10  
23:   0    15     7      1      64  4        32  8       512  8     32768 10  
24:   1    17     0      1      64  4        32  8       512  8     32768 10  
25:   1    19     1      1      64  4        32  8       512  8     32768 10  
26:   1    21     2      1      64  4        32  8       512  8     32768 10  
27:   1    23     3      1      64  4        32  8       512  8     32768 10  
28:   1    25     4      1      64  4        32  8       512  8     32768 10  
29:   1    27     5      1      64  4        32  8       512  8     32768 10  
30:   1    29     6      1      64  4        32  8       512  8     32768 10  
31:   1    31     7      1      64  4        32  8       512  8     32768 10  

Issue

AMD specifications

MSRC001_029B [Package Energy Status] (Core::X86::Msr::PKG_ENERGY_STAT)
Read-only,Volatile. Reset: 0000_0000_0000_0000h. 
_lthree[1:0]; MSRC001_029B
Bits Description
63:32 Reserved. 
31:0 TotalEnergyConsumed.
CCX: Core Complex where more than one core shares L3 resources.
  • ApicId Enumeration Requirements
Each Core::X86::Apic::ApicId[ApicId] register is preset as follows:
• ApicId[6] = Socket ID.
• ApicId[5:4] = Node ID.
• ApicId[3] = Logical CCX L3 complex ID
• ApicId[2:0]= (SMT) ? {LogicalCoreID[1:0],ThreadId} : {1'b0,LogicalCoreID[1:0]}.
CPUID_Fn8000001E_EAX [Extended APIC ID] (Core::X86::Cpuid::ExtApicId)
Read-only.
If Core::X86::Cpuid::FeatureExtIdEcx[TopologyExtensions] == 0 then CPUID Fn8000001E_E[D,C,B,A]X are reserved.
If (Core::X86::Msr::APIC_BAR[ApicEn] == 0) then Core::X86::Cpuid::ExtApicId[ExtendedApicId] is reserved.
_lthree[1:0]_core[3:0]_thread[1:0]; CPUID_Fn8000001E_EAX
Bits Description
31:0 ExtendedApicId: extended APIC ID. Read-only. See 2.1.12.2.1.3 [ApicId Enumeration Requirements].
Reset: Core::X86::Msr::APIC_BAR[ApicEn] ? Fixed,{00_0000h , Core::X86::Apic::ApicId[ApicId]} :
Fixed,0000_0000h.


CPUID_Fn8000001E_EBX [Core Identifiers] (Core::X86::Cpuid::CoreId)
Read-only.
See Core::X86::Cpuid::ExtApicId.
_lthree[1:0]_core[3:0]_thread[1:0]; CPUID_Fn8000001E_EBX
Bits Description
31:16 Reserved.
15:8 ThreadsPerCore: threads per core. Read-only. Reset: XXh. The number of threads per core is
ThreadsPerCore+1.
7:0 CoreId: core ID. Read-only. Reset: Fixed,XXh.
Description: For Family 17, Model 1, Revision 1 and later:
CoreId = ({2'b0, DieId[1:0], LogicalComplexId[0], LogicalThreadId[2:0]} >> SMT).


CPUID_Fn8000001E_ECX [Node Identifiers] (Core::X86::Cpuid::NodeId)
Read-only.
_lthree[1:0]_core[3:0]_thread[1:0]; CPUID_Fn8000001E_ECX
Bits Description
31:11 Reserved.
10:8 NodesPerProcessor: Node per processor. Read-only. Reset: XXXb.
ValidValues:
Value Description
0h 1 node per processor.
1h 2 nodes per processor.
2h Reserved.
3h 4 nodes per processor.
7h-4h Reserved.
7:0 NodeId: Node ID. Read-only. Reset: Fixed,XXh.
Description: For Family 17, Model 1, Revision 1 and later:
{5'b00000,1'b[SOCKET_ID],2'b[DIE_ID]}.
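
The EBX and ECX layouts quoted above can be decoded with a few shifts and masks. Below is a minimal sketch that takes the register values as plain integers; how they are obtained (for instance through a CPUID wrapper) is left out. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of CPUID Fn8000001E EBX/ECX as quoted above (hypothetical struct). */
struct zen_ids {
    unsigned core_id;             /* EBX[7:0]                          */
    unsigned threads_per_core;    /* EBX[15:8] + 1                     */
    unsigned node_id;             /* ECX[7:0]                          */
    unsigned nodes_per_processor; /* from ECX[10:8]; 0 if reserved     */
};

static inline struct zen_ids zen_decode_ids(uint32_t ebx, uint32_t ecx)
{
    struct zen_ids ids;
    unsigned npp = (ecx >> 8) & 0x7;

    ids.core_id          = ebx & 0xFF;
    ids.threads_per_core = ((ebx >> 8) & 0xFF) + 1;
    ids.node_id          = ecx & 0xFF;
    /* Valid encodings per the quoted table: 0h = 1, 1h = 2, 3h = 4 nodes. */
    ids.nodes_per_processor = (npp == 0 || npp == 1 || npp == 3) ? npp + 1 : 0;
    return ids;
}
```

Returning 0 for the reserved encodings is a design choice of this sketch, so that callers can tell an unsupported value from a real node count.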

Improvements

  • find the CCX identifier in the CPU topology
  • read the Package energy counter per CCX
  • sum the per CCX values into a Package value

Questions

  • what about multi-die processors (TR, Naples, Rome)?

@cyring
Owner Author

cyring commented Jul 27, 2019

Core Complex ID

Ryzen 3xxx [SMT ON]

CPU#   Pkg ID   Apic Ext_ID   Core ID   Thread ID   Apic[3:0]   CCX ID [1]   _lthree scope [2]
00     BSP      0             0         0           0000        0            Y
01     0        2             1         0           0010        0            .
02     0        4             2         0           0100        0            .
03     0        6             3         0           0110        0            .
04     0        8             4         0           1000        1            Y
05     0        10            5         0           1010        1            .
06     0        12            6         0           1100        1            .
07     0        14            7         0           1110        1            .
08     0        1             0         1           0001        0            .
09     0        3             1         1           0011        0            .
10     0        5             2         1           0101        0            .
11     0        7             3         1           0111        0            .
12     0        9             4         1           1001        1            .
13     0        11            5         1           1011        1            .
14     0        13            6         1           1101        1            .
15     0        15            7         1           1111        1            .
  1. ApicId[3] = Logical CCX L3 complex ID
CCX_ID = ( leaf8000001e.EAX.ExtApicId & 0b1000 ) >> 3
  2. First ID of the CCX instance
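
The mapping in the table above can be sketched as follows. This assumes SMT is on and a single-socket, single-die part like the one in the table, so that the core ID is simply the APIC ID without its thread bit. The function names are hypothetical.

```c
#include <stdint.h>

/* ApicId[3] = logical CCX (L3 complex) ID. */
static inline unsigned zen_ccx_id(uint32_t apic_id)
{
    return (apic_id >> 3) & 0x1;
}

/* With SMT on, ApicId[0] is the thread ID within the core. */
static inline unsigned zen_thread_id(uint32_t apic_id)
{
    return apic_id & 0x1;
}

/* Core ID as shown in the table: ApicId[3:1], i.e. CCX bit plus the
 * two-bit logical core ID (assumption: one socket, one die, SMT on). */
static inline unsigned zen_core_id(uint32_t apic_id)
{
    return (apic_id >> 1) & 0x7;
}
```

For example, APIC ID 10 (binary 1010) maps to CCX 1, core 5, thread 0, which matches the row for CPU 05.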

@cyring
Owner Author

cyring commented Jul 27, 2019

After long hours debugging the 3700X, it appears that the RAPL Package Energy Counter delta remains the same whichever CPU is used to read the MSR.
CoreFreq_Ryzen_3700X_Energy

Edit: as specified, the RAPL Energy status is package scope, so it returns the same value whichever Core is used for reading.
The CCX is not involved in the issue.

Call for help on Reddit

@cyring
Owner Author

cyring commented Aug 4, 2019

@olejon

olejon commented Aug 12, 2019

Here are my Screenshots + Output as requested in #129

  • https://www.olejon.net/files/CF-X570-Ryzen-3600X/
  • Also added the benchmarks from Blender Benchmark & Geekbench, because it shows Linux is already superior here 👍
  • Both benchmarks were run under the same conditions (SMT on, Precision Boost Override on, system totally idle with no other programs open, and Geekbench run in CLI mode on Windows as well, which also runs the latest and greatest build from the Insider Program and the latest chipset drivers)
  • Note to others: Yes, I bought the 3600X... because it was the only one in stock country-wide in online stores in Norway, which shows the interest in Ryzen. When it was released in Taiwan, there were long queues outside stores on launch day. When it launched in the US, there were long queues outside physical stores country-wide to get the new MBs and CPUs. When was the last time one saw that after an Intel/AMD keynote? Probably some iPhones ago LOL, though those were iSheep and not an AMD/Intel keynote. I'm not stupid: I had ordered the 3900X, but it didn't come in when stores said it might, so I canceled and thought, "Hey, the system will probably run much faster anyway, even with the stock cooler (and yes: 30k+ in Geekbench vs 17k+ on a pretty overclocked i5-6600K with a proper cooler), so I can save the money from a 3900X to buy the 3950X + a good cooler as soon as it becomes available!". Better plan IMO

@cyring
Owner Author

cyring commented Aug 12, 2019

To all reading this issue, look in the above screenshots for the Package Power measurements; Cores all stressed :

Both cases, same algorithm based on RAPL registers; what can we conclude ?

@olejon

olejon commented Aug 12, 2019

  • That only 9.28W is definitely not right... The Intel CPUs you have tested and that CoreFreq supports well, with more or less similar Cores & Threads, use way more Watts when they're all stressed
  • My Skylake 4-core i5-6600K has a 91W TDP, so it makes sense that it used ~85W with All Cores Stressed using Conic Compute
  • 4.4 GHz is the max turbo speed for my 3600X, which has a "Default TDP/TDPROM-06a" of 95W, so my result of 91.76W makes sense, no?
  • It can manage 4.4 GHz 1 Thread and 4.2 GHz for 2 Threads at a time, but does not seem to get to 4.4 GHz with SMT = ON. Seems to take advantage of the latter, 4.2 GHz with 2 Threads, that with regular use, offloading load from 2 active cores to 2 sleeping cores and so on, managing cooling very well
  • Of course running a Stress Test is different. All Cores = ~4.0 GHz, as per the specs of the 3600X
  • On Windows running Blender Benchmark - which is the bench that causes the most load (adding in more like CPU-Z or Geekbench does NOT stress it more) Ryzen Master peaks at 70% of "128W PPT (CPU)"
  • This calculates to 89.6W (70% of 128W), which is very close to the CoreFreq reading
  • CoreFreq showing a little more I think is because "Conic Compute" stresses the CPU even more than Blender Benchmark, since on Windows the CPU, according to the most reliable tool, Ryzen Master peaks at temp 79.xx-80.xx C (stays just a couple of seconds at 80.xx C and then goes below 80 C for a couple of seconds, then up again and so on, and above 80 C the color of the bar turns orange so easy to see)
  • CoreFreq manages to get it to at least more than this, using "Conic Compute" IIRC, even maybe 82-84 C IIRC

  • Conclusion: CoreFreq is right for my 3600X on an X570 MB
  • So 9.28W can't be right
  • Also my benchs have shown Linux is much faster than Windows running Blender Benchmark, suggesting maybe higher temps, though didn't monitor temps
  • Can run Blender Benchmark again and monitor with CoreFreq

IMPORTANT NOTE:

  • As far as I've read and seen/heard on YouTube videos, AMD straight out said they do NOT recommend Ryzen 3rd Gen on X370 MBs!
  • According to AMD, if using an older MB, the X470 ones can be used just fine, but one will probably run into problems with a CPU with a TDP above 65W at stock, so it's not recommended for the X version of the 3600, BUT the 3700X has a TDP of 65W though. Problem is it costs more and, according to benchs, for gamers with a good enough GPU it is just wasting money...
  • AND one MUST remember to tell the seller/retailer to update to the latest BIOS (usually 2 updates, 1st to get it ready for the 2nd) before shipping, as if NOT it will NOT RUN 3rd Gen, and without a 2nd Gen laying around you're stuck without that BIOS update which as far as I've read must be done with a USB stick
  • So one should stick to a CPU like the 3600 (without the X) with an X470, which is 65W, AND is the CPU of choice recommended by the most respected YouTubers for gamers looking for a cheap update, as good X470 MBs are very cheap now, and the 3600 is cheap
  • With an X370 you're "on your own" it seems according to what AMD said
  • But with an X470 one can probably find a cheaper and better performing 2nd Gen Ryzen CPU, and you'll lose PCIe4 support if not using X570 anyways, so why not go for a probably cheaper 2nd Gen? At least they will drop in prices if not already
  • As an example, with CPU-Z on Windows and its references, the 2700X outperforms the 3600X; like the 3700X, it has 8c/16t compared to 6c/12t. The 1700X and 1800X (8c/16t) are also better than the 3600X. For Ryzen, I must go down to the 1600 (no X, 6c/12t) to see my 3600X clearly beating it. On the Intel side, my 3600X also clearly beats the i7-8700K (same 6c/12t), but of course not the i9-9900K, since it has 8c/16t and costs a lot compared to an 8c/16t Ryzen like the 3700X, which I think the Intel i9 beats since it has Turbo to 5 GHz. The 3800X, also 8c/16t, may beat the i9; its Turbo is lower at 4.5 GHz, so I doubt it, but it has more cache and newer tech in general. Don't buy that one though: only 0.1 GHz more Turbo than the 3700X's 4.4 GHz, but a TDP of 105W compared to the 3700X's 65W. The 3900X beats the i9 of course with 12c/24t, and clearly the 3950X will beat it with 16c/32t when it launches.

@olejon

olejon commented Aug 12, 2019

EDIT: Some additional info... Please read IMPORTANT NOTE section for X370/X470, especially the first.

@olejon

olejon commented Aug 13, 2019

What I wrote very much confirmed:

CoreFreq manages to get it to at least more than this, using "Conic Compute" IIRC, even maybe 82-84 C IIRC

  • Just running Conic Compute for ~1 minute and the CPU reached 85 C
  • System otherwise totally idle. Not even logged in to desktop. Did this over SSH
  • Much more than I manage on Windows with any tool/bench
  • All fans are set to 100 % Duty Cycle above 75 C so not because of that
  • Must be the CPU using more Power. Since the Limit according to Ryzen Master in Windows is 95 C, there's no reason it should stay around 80 C using popular tools that max out the CPU for a long time
  • Basically Conic Compute in CoreFreq is better at stressing the CPU and hence is able to go to 91.5W. Great job!
  • As said on Windows it calculates to 89.6W (see comment above how it was calculated)
  • Could this explain the much better Blender Benchmark score?

@cyring
Owner Author

cyring commented Oct 16, 2019

As a reminder: MSR 0x64d for the SOC Power Domain has been added to the development roadmap.

As mentioned in the SDM specifications, there is no guarantee this counter exists for the listed architecture families. Tested with a Skylake i7-6700, this MSR returns a zero value on all cores.

@cyring
Owner Author

cyring commented Oct 21, 2019

  • Below are my tests with version 1.67.6
    2019-10-21-103438_724x436_scrot
  • Need to fix the Power Formula Scope at this line:
enum POWER_FORMULAS {
	POWER_FORMULA_NONE =						\
	(0b000000000000000000000000 << 8) | FORMULA_SCOPE_NONE,
	POWER_FORMULA_INTEL =						\
	(0b000000000000000000000001 << 8) | FORMULA_SCOPE_NONE,
	POWER_FORMULA_INTEL_ATOM =					\
	(0b000000000000000000000011 << 8) | FORMULA_SCOPE_NONE,
	POWER_FORMULA_AMD =						\
	(0b000000000001000000000000 << 8) | FORMULA_SCOPE_CORE,
	POWER_FORMULA_AMD_17h =						\
	(0b000100000001000000000000 << 8) | FORMULA_SCOPE_CORE
};
  • Rebuild all and test again
    2019-10-21-103657_724x436_scrot
  • Results as expected for Skylake/S:
  1. Voltage per Package
  2. Temperature per Thread
  3. No Power per CPU
  4. Power for whole Package
  • --Update--
  1. Results for SandyBridge : OK
    2019-10-21-111437_724x436_scrot
  2. Results for IvyBridge/EP : OK
    2019-10-21-112435_724x292_scrot
  • Please, let me know about yours -;)
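
The enum above packs a formula kind in the upper bits and a scope in the low byte. Here is a minimal sketch of how such a value could be built and split again. The scope values are hypothetical, since the thread does not show how FORMULA_SCOPE_* are defined, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical scope values; the real FORMULA_SCOPE_* definitions
 * are not shown in this thread. */
enum formula_scope {
    SCOPE_NONE = 0,
    SCOPE_SMT  = 1,
    SCOPE_CORE = 2,
    SCOPE_PKG  = 3
};

/* Pack kind and scope the same way as the enum above: (kind << 8) | scope. */
static inline uint32_t make_formula(uint32_t kind, enum formula_scope scope)
{
    return (kind << 8) | (uint32_t)scope;
}

/* The scope lives in the low byte. */
static inline enum formula_scope scope_of_formula(uint32_t formula)
{
    return (enum formula_scope)(formula & 0xFFu);
}

/* The formula kind is everything above the low byte. */
static inline uint32_t kind_of_formula(uint32_t formula)
{
    return formula >> 8;
}
```

With this split, a view could test the scope alone to decide whether power is displayed per CPU, per Core, or for the whole Package.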

@adatum

adatum commented Oct 21, 2019

corefreq1676

@cyring
Owner Author

cyring commented Oct 21, 2019

@adatum : thank you

  • Cores power looks OK
  • Single temperature sensor: OK
  • Voltage per Core or per SMT: I'm not sure what to display.
    If a single CPU is stressed (first a CPU Core, next a CPU thread), do we read discrete Vcore values?

@adatum

adatum commented Oct 21, 2019

Looks like the voltages per core are independent, but the threads on the same core (threads 2 & 10 in screenshot) show the same voltage:

corefreq_single_cpu_voltage

@cyring
Owner Author

cyring commented Oct 21, 2019

Looks like the voltages per core are independent, but the threads on the same core (threads 2 & 10 in screenshot) show the same voltage:

My understanding of the Zen SMT architecture is that CPU 10 is the logical peer of the core CPU 2, and they have the Vcore in common.
The view is showing the relation between them.

@adatum

adatum commented Oct 22, 2019

CPU 10 is the logical peer of the core CPU 2, and they have the Vcore in common.

Yes, that's what I meant to highlight. It makes sense for the two virtual cores (what I meant by "threads") to have the same Vcore since it is the same physical core. I'm not sure if that necessarily has to be the case, but at least it makes sense.

@cyring
Owner Author

cyring commented Oct 22, 2019

The Topology is the clue. Looking at the screenshots in the Wiki CPU Support page, both the 2700X and the 3600X have a similar Topology, where (CCX, CoreID, ThreadID) forms the cluster.
Tests will also be interesting when facing the topology of the dual EPYC.

@olejon

olejon commented Oct 23, 2019

If I stress 2 to the max, stress does the job (CoreFreq cannot) almost as good as Conic Compute (to raise temperature) on AMD, and on Intel it's even or even better at stressing the CPU (above Windows tools always).

I get pretty much expected results. So using stress -c 2:

They're divided into CCX0 and CCX1, you can see from screenshot, counting down 4 first (CCX0), then counting down just 2 after that, since then it's come to CCX1 (for my CPU with 12 threads), and then counting down 4 again, still CCX1, but goes 4 down again, I assume for the CPU logic, which again I assume is made that way so it comes down to efficient heat spreading, later handling over to other cores with the same pattern to maintain performance.

So makes perfect sense how CoreFreq represents the "spread", like Ryzen Master would if using say CPU-Z on Windows, choosing same amount of threads.

Image

On a 8c/16t CPU like the 3700X I would assume the same pattern, just different "counting".

From what I've seen in your debug output, it seems CoreFreq can already show the CCXs, and it would be nice if the GUI separated them, like showing a column for CCX{X} or whatever. I doubt AMD will make any significant changes there when it comes to "grouping" in 4th gen.

So far still best tool for monitoring I'd say, Watts per core etc my NUC doesn't show (and IIRC not RM either, gotta check later, maybe another popular tool at least does it).

Ryzen Master of course has the advantage of a very nice GUI (not that I expect that from you!), drag and slide OC and whatnot, for RAM stuff as well, but it changes UEFI settings of course, it doesn't come with a Windows CPU driver or anything, so anyone can do it. Just easier for the regular user. But as posted in the chat thread, OC is very limited anyway, performance is great out of the box and the quick settings are easy peasy in UEFI. In the end most will end up with Auto Overclock in Ryzen Master anyway, a Precision Boost Override of 100 MHz, which can be increased there to 200 MHz - kind of more obvious for anyone into OC on Intel as well, in UEFI. Except AMD has put it into both Tuning and XFR parts of UEFI (one set to freq, another just enabled and all other values "Auto", is what RM does).

Point is, CoreFreq seems basically complete for me on X570. Let's see with 3950X, but shouldn't be any different. Same arch, (much) more cores, basically.

I really don't see that you have to do more for X570... I've got all the data I need + more stress testing than Windows can do. Only showing CCXs... Cosmetics and stuff. Maybe showing N/A, or maybe simply the same value as the "cores" above for the corresponding below - the "cores" with Watts reading, since on a 12t CPU only the first 6 show that reading, show the same on the corresponding core with the same thread (voltage reading). I think you know what I mean, 6c shows as 12c basically but isn't, and there's a corresponding thread with the same reading as the actual core, of course (Hyperthreading yeah).

Now a power user should probably understand, but well, just a suggestion.

If anyone were to want to make a Ryzen Master desktop GUI equivalent on Linux they should be able to do so using say Qt + CoreFreq as backbone. At least monitoring will be the same, if not better.

@cyring
Owner Author

cyring commented Oct 23, 2019

Only showing CCXs... Cosmetics and stuff. Maybe showing N/A, or maybe simply the same value as the "cores" above for the corresponding below ...

That was indeed the purpose of all these changes requested by CoreFreq's users:

  1. provide Temperature along with the Voltage and Power
  2. show the Power per Core for capable processors like the Zen uarch
  3. mask zero values if an Intel processor is in use

It took hundreds of source code lines to refactor this View, please feel free to draw an ascii proposal of your ideas. After debate, changes could be engaged.
FYI, CCX is available in the Topology view

There are tons of remaining things to do with the Zen uarch: we are only in the middle of the subject.
Yesterday, reading the September revision of the specifications update left me feeling that there are still many unpublished registers among those errata ...

@olejon

olejon commented Oct 23, 2019

True that! It's available in Topology. I'm not very skilled at image editing, but I know GIMP well enough I guess. Anyway I think you'll always have a better idea (like a "rejection").

I mainly just think a column for CCX just as in Topology...

I totally understand your hard work and don't ask for anything requiring tons of code. You're already a super FOSS hero! You should know I think that way about you by now. Like heck - no other tool even shows temperatures for Ryzen 3rd Gen, so!

Sometimes I even wish you didn't respond so quickly. I mean if you really enjoy the project, go on! But if you're tired, don't be afraid of taking a week offline at least...

Any more requests regarding debugging my NUC or my X570+3600X, you'll get it. More specific the better. NUC has that 0 Watts per core, still all cores Watts shown, correctly I assume, BUT then interpreted false by many I guess. I assume you refer to this as "masking zero values for Intel". Well, maybe just show total Cores Watts then. Highlighted. Like all cores showing total or "See below" (as a stupid example)

I'm not as skillful as you in this nor as understanding of the vocabulary so the easier instructions the better :) I think you more or less know my knowledge. I might sometimes be a sysadmin of hundreds of crucial govt servers. Doesn't mean I read hundreds of pages of CPU specs. We simply don't have to, we have absolutely no reason to. Things work since we always buy compatible HW for Linux and VMs. What I have at home is another case.

My monster laptop - I think you asked me to try some code changes on it - sorry, I haven't been able to find the time for it yet. All my previous posts are from my phone too. Haven't been close to a workstation for weeks. All is SSH. That laptop needs the SATA2 disk moved to SATA1 as it apparently won't boot otherwise, although the HDD is set as first boot device and the installation was successful. I'm tired of live USBs, so it has a solid installation that basically doesn't boot. Also the arch is so old, is it very important or more of a "challenge/curiosity" from your side? CPU from 2009 you know... It seems to me it can help you with some newer ones, but is it still worth it? I mean for me, not even being where it is, and with bad health? Not to play that card, but yeah.

I might be stupid but let's say you support only CPUs from 2014+ minimum and get rid of legacy code, and officially support mainly Intel, but AMD is basically supported as well, Alpha or Beta depending.

I assume you know the difference, but since you add more features it seems Alpha for both 2nd Gen and 3rd Gen. A very stable Alpha though! While still adjusting/adding features it's per definition Alpha. But it never caused a crash or anything of the sort. And CoreFreq does deep level sh*t; I assume 1 wrong line could easily freeze the system just loading the module or starting a stress test or whatever.

Still it might be considerable to stop supporting like ALL 64-bit x86 CPUs out there? How many would use CoreFreq on my monster laptop's CPU you think? I hate to throw away still good HW as I guess you do, but most do, or they at least don't expect GitHub projects to work on their 9+ years old HW.

For cosmetics I think Power Usage has "lost" some of its ease of reading. People won't necessarily look at the bottom strip and see "Package" and "Cores". They're not highlighted, nor do they even have a W behind them. It should be the entire Watts. Before, that was seen immediately, I mean Power Usage in Joules and Watts. Now you gotta know where to look AND know it's a Watts estimate.

@olejon

olejon commented Oct 23, 2019

EDIT: Just added the last paragraph. I think it's kind of important. Power usage should be more prominent and clearly shown WITH the type of measurement. Like RM uses % of Watts but at least clearly says (and shows) it's X % of Watts, if you get it. Easy to see, a top priority measurement shown as a "wheel" AT THE TOP, meaning highlighted. I'm not asking for a wheel, but before it was more clear.

Of course I know where to look now, but even I had to "look around".

Remember that new users just following your instructions to compile and run may be "dumb in our book". MANY DON'T EVEN HAVE A GITHUB ACCOUNT, so they'll never report an issue. I assume you haven't added hidden analytics or whatever, haven't even bothered to check, but probably a lot of people are testing, running, trying stuff and expect an official-tool-like experience. Which you actually have, and more, for Intel CPUs regarding changing stuff on demand, and for AMD monitoring (the most important after all).

I've always thought about your README.md, that it needs some update to show the true power of this software. Good sections, or very understandable links to the Wiki with no "crazy" (for a regular person) CPU topology talk etc. Installation and usage. I can probably commit an update to it.

Point is, however how powerful this software is, keep installation, first use and navigation and main highlights and switches people look for KISS (Keep It Simple Stupid). I assume you know the acronym.

You can delete this comment after reading the last added paragraph to the comment before.

BTW: I'm sure there's still a lot to do regarding Zen2 on X570 and later, but I think if you perfect Zen2 on X570, it'll work perfectly fine, maybe with some slight mods, for 4th Gen. Just saying it may be worth the while. Although I'm perfectly happy, I doubt AMD will change anything drastically for 4th Gen; if you keep up reading on 3rd Gen, you'll probably have 4th Gen working perfectly. AMD seems to have set a path. I assume any 4000 CPU will just slide into my X570 socket and work. If not, AMD has not kept its promise, and with 3rd Gen it really seems they mean business. If you think 4th Gen will give you more core insight, fine. I'm happy. But I kind of assume 4th Gen will be an incremental update to 3rd Gen using more or less the same chipset, just more cores etc. Do you think they'll go for, say, PCIe5? It's out as a standard, but nobody needs it - nobody uses PCIe4 to its max at all yet. IDK, it just feels like it'll be yet another "hush hush, surprisingly bash Intel on price vs performance with all manufacturers on board" (kind of incredible how AMD pulled off 3rd Gen without any leaks basically, and having MBs from ALL major manufacturers already lined up and ready to ship at launch just days later).

@cyring
Owner Author

cyring commented Oct 28, 2019

TDP

Formula

PU = 2 << ( ( <val1> & 0x0f ) - 1 )

TDP = ( ( <val2> & 0x7fff ) / PU )

Where PU is the power unit divisor, an unsigned integer: the default field value 0011b gives PU = 8, meaning power is expressed in 1/8 Watt increments. <val1> is the value of CPU MSR 0x606 (MSR_RAPL_POWER_UNIT) and <val2> the value of MSR 0x614 (MSR_PKG_POWER_INFO).
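As a minimal sketch of the formula above in C (not CoreFreq's actual code; the function names `power_unit_divisor` and `tdp_watts` are made up for illustration):

```c
#include <stdint.h>

/* Power unit divisor from MSR 0x606: bits 3:0 hold N, one unit is 1/2^N Watt.
 * 1 << N gives the same result as 2 << (N - 1) for N >= 1,
 * and stays well defined when N is 0. */
static unsigned int power_unit_divisor(uint64_t val1)
{
    return 1U << (val1 & 0x0f);
}

/* TDP in Watts from MSR 0x614: bits 14:0, scaled by the power unit. */
static double tdp_watts(uint64_t val1, uint64_t val2)
{
    return (double)(val2 & 0x7fff) / power_unit_divisor(val1);
}
```

Called with the two pairs of register values posted in this comment, `tdp_watts(0xa0e03, 0x208)` gives 65 and `tdp_watts(0xa1003, 0x10024001200168)` gives 45.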

Skylake i7-6700

  • Read the Power Unit and the Power Info
# rdmsr -acx 0x606
0xa0e03
0xa0e03
0xa0e03
0xa0e03
0xa0e03
0xa0e03
0xa0e03
0xa0e03

# rdmsr -acx 0x614
0x208
0x208
0x208
0x208
0x208
0x208
0x208
0x208
PU = 2 << ( ( 0xa0e03 & 0x0f ) - 1 )
PU = 2 << 2
PU = 8

TDP = ( ( 0x208 & 0x7fff ) / 8 )
TDP = 0x208 / 8
TDP = 65

TDP of Skylake i7-6700 = 65 Watts
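
The same registers can also be read from a program instead of `rdmsr`. The sketch below assumes a Linux system with the `msr` kernel module loaded and root privileges; the helper name `read_msr` is hypothetical and this is not how CoreFreq itself reads MSRs (it does so from its own kernel module).

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Read one MSR of one logical CPU through the Linux msr driver.
 * Returns 0 on success, -1 on failure (no msr module, no permission, bad CPU). */
static int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[64];
    int fd;
    ssize_t got;

    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* The driver uses the file offset as the MSR address. */
    got = pread(fd, value, sizeof(*value), (off_t)reg);
    close(fd);
    return got == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

For example, `read_msr(0, 0x606, &val1)` followed by `read_msr(0, 0x614, &val2)` fetches the two inputs of the TDP formula for CPU 0.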

SandyBridge i7-2710QE

# rdmsr -acx 0x606
0xa1003
0xa1003
0xa1003
0xa1003
0xa1003
0xa1003
0xa1003
0xa1003

# rdmsr -acx 0x614
0x10024001200168
0x10024001200168
0x10024001200168
0x10024001200168
0x10024001200168
0x10024001200168
0x10024001200168
0x10024001200168
PU = 2 << ( ( <val1> & 0x0f ) - 1 ) = 2 << ( ( 0xa1003 & 0x0f ) - 1 ) = 8

TDP = ( ( <val2> & 0x7fff ) / PU ) = ( ( 0x10024001200168 & 0x7fff ) / 8 )
TDP = 45

TDP of SandyBridge i7-2710QE = 45 Watts

Next steps

So far, the Power Unit MSR is available for Intel and AMD Zen, but the Power Info MSR is only found on Intel.

@cyring
Owner Author

cyring commented Nov 6, 2019

  • Here are the results of the last feature to get the TDP (Intel only)
    2019-11-06-101904_644x316_scrot
    2019-11-06-101901_644x316_scrot
    2019-11-06-101858_644x316_scrot
  • Please post yours !

@olejon

olejon commented Nov 7, 2019

That's cool! My NUC shows a TDP of 15, although a full stress test gives 6+ Watts. Haven't checked the specs for the CPU but I'm 99 % sure it's 15 Watts (or in that area).

EDIT: Need a screenshot for that?

Just gone through the hassle of BIOS update, latest Windows Insider Update, Radeon Update (really just GPU, no chipset), reset BIOS and adjusted all settings again in case new settings.

Using Ryzen Master (no updates to it yet) for PBO, I know very well what it does to UEFI now. Thank God it's at least OS transparent on AMD (you can do exactly the same without Windows, you just gotta change like 3 things in UEFI for a RM standard 100 MHz PBO; 200 is the max, but it's not recommended nor does it really give any better results).

I've done Geekbench again, this time v 4.4.2. Even though GB 5 is out, it uses very different scores and is quite new, and the previous results were from 4.4.1. I will post screenshots, but I can say that, totally idle and CLI, Linux crushes Windows again, maybe even more. Gotta check the Blender Benchmark tool though, to see if there's still like a 3-4 minute gain. Heck, it's the only bench tool on Windows with which it's possible to get the CPU Package Watts to max... CoreFreq is just Conic... There's a new version of CPU-Z and HWMonitor, but at least CPU-Z isn't close to stressing the CPU to the max when set to max threads. HWMonitor seems more precise, but still waaay too inaccurate compared to the official RM, whereas CoreFreq gives basically the same result as RM. So again CoreFreq is just as good as the official Windows tool for Watts, temperature and Voltage ;-)

@cyring
Owner Author

cyring commented Nov 7, 2019

EDIT: Need a screenshot for that?

Yes, please, like the IvyBridge screenshot above: showing the TDP in the Power window, plus the Voltage view with Package and Cores watts, with the processor fully stressed using the algorithm Conics - 2 plans

Indeed, the same issue is encountered with IVB, where the TDP is computed as 130 W but the stressed processor does not consume that much power.

About AMD and pre-SandyBridge processors, I've not found any MSR or PCI registers to compute the TDP. I'd rather not maintain a table of values for those, so they show a TDP of zero.

@olejon

olejon commented Nov 7, 2019

With "Conic 2 plans" do you mean the "Two parallel planes"?

Well on AMD the TDP can at least be obtained by simply stressing the CPU with Conic and look at the Package Watts usage, which corresponds to the TDP (on my CPU at least). Actually getting a little above, like 96.x Watts (probably due to PBO which is an official tiny overclock).

Windows doesn't seem to be able to reach the TDP completely, even using the installed Ryzen Performance Power Plan == Performance Governor on Linux == CPU always at highest frequency, C0, at all times, when stressed to the max, which only Blender Benchmark manages on Windows.

Maybe that's why Linux crushes Windows in benchmarks...? Weird anyway.

Didn't do a full test yesterday as the renders take 20+ minutes, but I saw Ryzen Master showing a significantly higher % of 128 Watts (which is how RM shows Power Usage) than when stressing all cores with, say, CPU-Z.

Anyway Watts % tends to drop slightly in RM as the CPU gets hotter, so I doubt a full render will reach max TDP, although the second render (classroom) is harder on the CPU than the first (bmw).

@olejon

olejon commented Nov 7, 2019

Here you go from my NUC (Conic Compute - Two Parallel Planes):

EDIT: Indeed the TDP is 15 Watts according to the Intel datasheet.

Image

As said, for AMD it seems to be enough to stress the CPU using whatever Conic Compute stress test, and the Package Watts will rise to the TDP (perfectly for me, unless you've changed some UEFI stuff that touches that).

As CoreFreq shows exactly what CPU model you have, it's just a Google search away anyway. AMD has the basics people care about, like TDP, listed for every CPU at the bottom of that CPU's "homepage". I can only assume that when I get a 3950X, which has a TDP of 105 Watts, stressing it with Conic will raise the Package Watts to that. Well, this goes for at least X570 + 3rd Gen it seems.

Maybe write "Missing" instead of "0" for TDP on AMD? Since you use the words "Capable" and "Missing". People will understand, that of course a TDP exists, but that CoreFreq can't read it.

Cheers.

@cyring
Owner Author

cyring commented Nov 7, 2019

The general formula for the energy units might be the issue. My formula, based on the Intel specs, is OK for SandyBridge and later architectures. However, the specs also mention an exception for Atom-class processors.
I put those conditions here:

switch (Proc->powerFormula) {

I presume other architectures, such as NUC and IVB-EP, may be subject to other exceptions when resolving the power unit formula.
Unfortunately I have not found any datasheets which specify good values or an equation to apply.
We could tune the NUC power formula until we reach the TDP, but how could we be sure this approximation holds for the whole architecture...
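
To make the dependency on the unit factor concrete, here is a rough sketch of the general case only, assuming the Energy Status Units field sits in bits 12:8 of MSR 0x606 and that one counter increment is 1/2^ESU Joule. The function names are hypothetical, and this is not the `switch (Proc->powerFormula)` code referenced above.

```c
#include <stdint.h>

/* Energy unit in Joules under the general formula: bits 12:8 of MSR 0x606
 * hold ESU, and one count of an energy status counter is 1/2^ESU Joule. */
static double energy_unit_joules(uint64_t rapl_power_unit)
{
    unsigned int esu = (unsigned int)((rapl_power_unit >> 8) & 0x1f);
    return 1.0 / (double)(1ULL << esu);
}

/* Average power between two reads of a 32-bit energy status counter. */
static double average_watts(uint32_t before, uint32_t after,
                            double unit_joules, double seconds)
{
    uint32_t delta = after - before;  /* unsigned subtraction absorbs one wraparound */
    return (double)delta * unit_joules / seconds;
}
```

With the values posted earlier, 0xa0e03 gives ESU = 14 and 0xa1003 gives ESU = 16. An architecture-specific exception would only have to swap in a different `energy_unit_joules`; if that factor is wrong, every Watt figure derived from the counters is scaled by the same error, which would be consistent with a stressed processor appearing to stay well below its TDP.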

@olejon

olejon commented Nov 7, 2019

Well, the CPU in the NUC is from the third quarter of 2013 so I don't expect full support. I don't think many users with Atom processors do either (if you buy that then, well, you're probably not a power user; a 4th Gen Raspberry Pi can probably be faster, at least the iGPU). The cheapest NUCs now come with Pentium Gold CPUs, and like mine, the second cheapest has an i3.

The lower-than-TDP usage may be just the way Intel has built the board for maximum power saving - just putting in a CPU otherwise sold in laptops - but for the NUCs they maybe don't need as much, and headroom for peripherals.

Remember when I had the peripherals plugged in? With HDMI, a dongle for wireless keyboard and mouse and a USB stick, the power usage of the CPU was quite a bit higher, several Watts higher IIRC, so Intel has made some headroom there in case you use all the ports, which seems to make the CPU use several Watts more.

In this screenshot it's headless, no desktop, only Ethernet and power.

It has 4 USB ports, 1 mini-HDMI, 1 mini-DisplayPort, analog audio out and in (microphone) and an Infrared sensor as well (which works perfectly with LIRC BTW, like mapping a generic or common IR remote control).

@cyring
Owner Author

cyring commented Nov 10, 2019

Todo

  • Find the RAPL unit factor to apply as an exception to the power and energy formula
  • Impacted architectures:
  1. IVB-EP and probably SNB-EP
  2. Various Atoms
  3. NUC platforms

@cyring
Owner Author

cyring commented Feb 27, 2020

Feature is stable

@cyring cyring closed this as completed Feb 27, 2020
@cyring cyring unpinned this issue Feb 27, 2020
This was referenced Dec 28, 2022