Added support for Intel Power Gadget #18

philipheinisch · 2019-02-24T15:30:22Z

Added support for Intel Power Gadget. Right now only Windows (CPU&GPU) is supported. Linux support is possible but still work in progress.

ranocha · 2019-02-25T13:34:40Z

It might be good to add some CI tests for this. On Appveyor, the windows builds run on Intel CPUs, cf. https://ci.appveyor.com/project/ranocha/toolkiticl/builds/22608195/job/fuwfknym9x7dah60#L140.

philipheinisch · 2019-02-27T11:41:46Z

I added a very basic MSR power interface for Linux, just for testing. Do you get power values for one of the sockets or just the overall package? There might be some problem with the Intel register offsets.

ranocha · 2019-02-27T12:27:04Z

What do I have to do/install in order to make this work? I don't have any /dev/cpu/[0-9]:

$ ll -la /dev/cpu
total 0
drwxr-xr-x  2 root root      60 Feb 27 13:17 ./
drwxr-xr-x 21 root root    4620 Feb 27 13:17 ../
crw-------  1 root root 10, 184 Feb 27 13:17 microcode

Hence, I get the same error as the tests on Appveyor & Travis.

philipheinisch · 2019-02-27T13:54:36Z

You need the the msr-tools package and maybe sudo modprobe msr.

ranocha · 2019-02-27T14:17:42Z

You need the the msr-tools package and sudo modprobe msr.

That fixed the problem for me. However, I had to run sudo toolkitICL, so we might want to discuss whether the default option should be to disable this and whether there are possibilities to disable errors if isp is not used.

Anyway, here is the output of some test:

julia> using HDF5

julia> h5open("out_DP_256_test.h5", "r") do io
           for n in names(io)
               println(n)
               display(read(io, n))
               println()
           end
       end
Data
Dict{String,Any} with 3 entries:
  "B" => Float32[4.0 4.0 … 0.0 0.0]
  "A" => Float32[6.0 6.0 … 0.0 0.0]
  "C" => Float32[2.4e7 0.0]

Data_LoadTime
5.479

Data_StoreTime
23.68

Global_Range
1×3 Array{Int32,2}:
 1000192  1  1

Host_OS
"Linux Ubuntu 18.04.2 LTS/4.15.0-45-generic/#48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019"

Intel_HK
Dict{String,Any} with 5 entries:
  "DRAM"       => [0.563965 0.456543 … 0.466309 0.458984]
  "Package"    => [115.625 120.781 … 95.0781 95.7813]
  "Socket 0"   => [104.766 109.844 … 84.0625 83.9258]
  "Socket 1"   => [0.0 0.0 … 0.0 0.0]
  "Power_Time" => ["27-02-2019 15:12:25:052", "27-02-2019 15:12:25:153", "27-02-201…

Kernel_ExecStart
"27-02-2019 15:12:32"

Kernel_ExecTime
7458.389

Kernel_Settings
"-DREDUCE=1 -DNR=1 -DNC=1 -DNI=1 -DREAL=float -DW_SIZE=256 -cl-single-precision-constant -cl-mad-enable -cl-no-signed-zeros -cl-finite-math-only"

Local_Range
1×3 Array{Int32,2}:
 256  1  1

NV_HK
Dict{String,Any} with 0 entries

OpenCL_Device
"Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz"

OpenCL_Version
"OpenCL 2.1 (Build 0)"

Range_Start
1×3 Array{Int32,2}:
 0  0  0

Total_ExecTime
7643.304

Does that answer your question?

ranocha · 2019-02-27T14:46:14Z

Runnign a test on a virtual machine (Ubuntu Xenial) yields

$ h5dump -g /Intel_HK out_DP_256_test.h5 
HDF5 "out_DP_256_test.h5" {
GROUP "/Intel_HK" {
   DATASET "DRAM" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 8, 1 ) / ( 8, 1 ) }
      DATA {
      (0,0): 0,
      (1,0): 0,
      (2,0): 0,
      (3,0): 0,
      (4,0): 0,
      (5,0): 0,
      (6,0): 0,
      (7,0): 0
      }
   }
   DATASET "Package" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 8, 1 ) / ( 8, 1 ) }
      DATA {
      (0,0): 0,
      (1,0): 0,
      (2,0): 0,
      (3,0): 0,
      (4,0): 0,
      (5,0): 0,
      (6,0): 0,
      (7,0): 0
      }
   }
   DATASET "Power_Time" {
      DATATYPE  H5T_STRING {
         STRSIZE 24;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SIMPLE { ( 8 ) / ( 8 ) }
      DATA {
      (0): "27-02-2019 15:35:43:327", "27-02-2019 15:35:48:327",
      (2): "27-02-2019 15:35:53:337", "27-02-2019 15:35:58:337",
      (4): "27-02-2019 15:36:03:337", "27-02-2019 15:36:08:338",
      (6): "27-02-2019 15:36:13:338", "27-02-2019 15:36:18:338"
      }
   }
   DATASET "Socket 0" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 8, 1 ) / ( 8, 1 ) }
      DATA {
      (0,0): 0,
      (1,0): 0,
      (2,0): 0,
      (3,0): 0,
      (4,0): 0,
      (5,0): 0,
      (6,0): 0,
      (7,0): 0
      }
   }
   DATASET "Socket 1" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 8, 1 ) / ( 8, 1 ) }
      DATA {
      (0,0): 0,
      (1,0): 0,
      (2,0): 0,
      (3,0): 0,
      (4,0): 0,
      (5,0): 0,
      (6,0): 0,
      (7,0): 0
      }
   }
}
}

Edit: Click on the triangle to expand the output.

philipheinisch · 2019-02-27T15:06:46Z

It might be necessary to set the /dev/cpu permissions once, afterwards it should work.
The results seem strange for an 8700K and the DRAM consumption is very low. It is possible the DRAM power estimation is no supported on your system, but >100W is implausible for your system. I will recheck the MSR registers. Thanks.

ranocha · 2019-02-27T15:10:28Z

No problem; I can run new tests if you wish.

Small note: The time format "%d-%m-%Y %H:%M:%S" is not really international, "%Y-%m-%d %H:%M:%S" might be better (and is consistent, since the units become smaller from left to right).

philipheinisch · 2019-02-27T20:43:16Z

We can definitively change the date format. I tested it using an i5-8400 and it seems to work. Even though it seems the power draw at idle gets underestimated, it looks ok under load. Have you tired using the -b switch? In this case you should see the increase in power after 4 s. Maybe the high power consumption is caused by Intel TurboBoost? Could you try increasing the execution time and see if the power drops? It also seems to work on an Amazon ec2 instance with a E5-2676 v3 but only the package power not the sockets. To check if the correct MSRs are read, I added a function to read out the TDP, which should match the TDP given by Intel.

philipheinisch · 2019-02-27T20:50:04Z

It also seems to work on @Kostaszki notebooks with an i5 CPU. Maybe at this point I should compare the MSR RAPL results against real power measurements.

ranocha · 2019-02-28T05:46:31Z

Using `$ sudo toolkitICL -b -isp 1000 -d 1 -c SSPRK_test.h5`, I get the following results:

julia> [intel["DRAM"]' intel["Package"]' intel["Socket 0"]' intel["Socket 1"]']
95×4 Array{Float64,2}:
 0.45282    13.0098    1.9209   0.0
 0.448181   12.2617    1.18066  0.0
 0.44635    12.168     1.09961  0.0
 0.449524   12.25      1.14063  0.0
 0.453613  137.094   126.085    0.0
 0.446838   98.6055   87.5176   0.0
 0.447449   95.0059   83.9395   0.0
 0.448059   94.9941   83.9355   0.0
 0.447754   94.9414   83.8818   0.0
 0.447205   95.0527   83.8877   0.0
 0.446655   94.9238   83.8662   0.0
 0.446716   94.9336   83.8721   0.0
 0.447754   94.9414   83.8711   0.0
 0.44696    94.9277   83.873    0.0
 0.448242   94.918    83.8457   0.0
 0.446899   94.9297   83.8721   0.0
 0.44751    94.9082   83.8477   0.0
 0.447144   94.9336   83.8643   0.0
 0.447815   95.0039   83.8535   0.0
 0.446655   94.9316   83.8633   0.0
 0.447449   94.9297   83.8662   0.0
 0.446899   94.9199   83.8574   0.0
 0.446655   94.9336   83.8701   0.0
 0.446777   94.9258   83.8643   0.0
 0.447937   94.918    83.8545   0.0
 0.44696    94.9336   83.8652   0.0
 0.447144   94.9258   83.8594   0.0
 0.447571   95.002    83.8467   0.0
 0.447632   94.9258   83.8633   0.0
 0.44696    94.9258   83.8584   0.0
 0.447388   94.9277   83.8623   0.0
 0.446838   94.9141   83.8564   0.0
 0.447571   94.9336   83.8643   0.0
 ⋮                                 
 0.446716   94.9199   83.8691   0.0
 0.447449   94.918    83.8496   0.0
 0.447876   94.9355   83.8711   0.0
 0.448181   95.002    83.8398   0.0
 0.447449   94.9316   83.8652   0.0
 0.447021   94.9297   83.8652   0.0
 0.446899   94.9121   83.8496   0.0
 0.447144   94.9277   83.8574   0.0
 0.446716   94.9199   83.8555   0.0
 0.446716   94.9316   83.873    0.0
 0.447327   94.918    83.8516   0.0
 0.447266   94.9355   83.8633   0.0
 0.446838   94.9141   83.8516   0.0
 0.447144   95.0254   83.8672   0.0
 0.446899   94.916    83.8496   0.0
 0.447449   94.9277   83.8672   0.0
 0.44751    94.9199   83.8535   0.0
 0.446655   94.9238   83.8633   0.0
 0.446838   94.9238   83.8477   0.0
 0.447021   94.9395   83.8789   0.0
 0.447327   94.916    83.8613   0.0
 0.446777   94.9121   83.8379   0.0
 0.447021   94.9258   83.8574   0.0
 0.447266   95.0293   83.875    0.0
 0.447266   94.9063   83.8418   0.0
 0.447083   94.9336   83.8652   0.0
 0.447449   94.9219   83.8594   0.0
 0.447021   94.918    83.8574   0.0
 0.449036   94.9297   83.8594   0.0
 0.453003   30.6055   19.4902   0.0
 0.447144   12.3594    1.27734  0.0
 0.449402   12.4297    1.35938  0.0
 0.446533   12.3047    1.23633  0.0

Clearly, the power increases after 4 seconds. The TDP is 95 (stored as a double?).

Edit: Click on the triangle to expand the output.

philipheinisch · 2019-02-28T06:47:26Z

That looks surprisingly good! Thanks. As it seems the >100W were indeed caused by Turbo-Boost - interesting. The TDP is also correct, so the registers get read out correctly. So apart from getting the CI checks working, the rest is mostly cleanup. How do we want to enable RAPL on Linux systems? Always enable in cmake if Linux and Intel CPUs are detected and make the user try if -isp works without crashing or try and detect if the required MSR files are accessible in cmake and exclude RAPL during build if not (which would mean the user needs to recompile after installing msr-tools)?

ranocha · 2019-02-28T06:52:57Z

As far as I see, there is no explicit dependency on msr-tools during build time. Can we restructure the code such that there is no access to msr files if the sample_rate for -isp is zero? In that case, toolkitICL can be used as before in most cases and the user must explicitly opt-in to log the power.

If that is possible, changes to the automatic tests are minimised. We only have to add some new tests.

ranocha · 2019-02-28T07:30:53Z

If the Intel and the Nvidia parts work, it might be an option to create a unified interface to the power consumption related parts, save the results in one data group and add an identifier (string), describing the method used to log the power. What do you think?

Of course, I am willing to work on that part.

philipheinisch · 2019-02-28T07:56:47Z

A unified power interface is a good idea. Especially if I get the AMD part working as well (work is ongoing). At some point it might even become an independent project, as a light weight cross platform power consumption interface with support for AMD, Intel and NVidia might be interesting to other people as well. Maybe we should discuss this in detail.

@ranocha do you know the best solution to get CPU temperature with Linux?

ranocha · 2019-02-28T08:22:25Z

Maybe we should discuss this in detail.

That's good!

Sorry, I don't know a really good solution. On my system,

$ cat /sys/class/thermal/thermal_zone2/temp

seems to do the trick (returns the temperature in milli °C, cf. https://www.kernel.org/doc/Documentation/thermal/sysfs-api.txt). However, there is no such file on my VMs.

ranocha · 2019-02-28T08:57:47Z

Did you try something like https://github.com/torvalds/linux/blob/7d762d69145a54d169f58e56d6dac57a5508debc/tools/power/x86/turbostat/turbostat.c#L1937?

philipheinisch · 2019-02-28T09:12:52Z

The question is, if we want to use something vendor specific like MSRs or something more generic like the sysfs drivers? A solution based on the MSRs might be the most powerful and requires very little extra dependencies. The sysfs approach is more generic and cross platform but might not work out of the box on all systems (especially clusters).

ranocha · 2019-02-28T09:22:20Z

Well, maybe there is simply no "best" solution. If we want to have something working on as many platforms as possible, we might be forced to do something like

if (sysfs_works()) {
  do_sysfs();
}
else if (msr_works()) {
  do_msr();
}
else if (...)

philipheinisch · 2019-02-28T09:35:57Z

That might very well be the case. For now, the MSR approach might be the easiest, as the necessary MSR read functions are already implemented.

ranocha · 2019-03-01T04:56:53Z

If we unify the interface for power/temperature/etc. logging, we should also unify the command line options. This would also simplify the testing and application of toolkitICL.

ranocha · 2019-03-01T06:04:17Z

Adding really good CI tests might be a bit problematic, since the low-level MSR tools are not really supported in VMs. On my machines, the output is simply zero for all variables. Thus, we cannot test the output reliably. Nevertheless, I will add some tests of the basic functionality later.

ranocha · 2019-03-10T15:34:42Z

Let's see whether github recognises this message: Commits included here close #21.

philipheinisch · 2019-03-12T14:47:50Z

Theoretically this should be enough to merge this PR or is there still something missing?

ranocha · 2019-03-12T14:58:16Z

I will have a look at the new commits soon.
What's the problem with concurrent power and temperature logging you mentioned earlier (in #20)?

ranocha · 2019-03-12T15:11:29Z

src/main.cpp

@@ -942,7 +1020,7 @@ int main(int argc, char *argv[]) {

  if (intel_log_power)
  {
-    h5_write_single<float>(out_name, "/housekeeping/intel/TDP", rapl->get_TDP(),
+    h5_write_single<uint32_t>(out_name, "/housekeeping/intel/TDP", rapl->get_TDP(),


Other power data are stored as float. Should we really use two different data types here?

No, float is ok, the drivers just return ints, so it was just the intuitive habit to change everything to the same datatype without thinking...

Okay. Will you change it back to float in that case?

ranocha · 2019-03-12T15:12:10Z

src/main.cpp

+  }
+  if (amd_temp_rate > 0)
+  {
+      h5_write_buffer<double>(out_name, "/housekeeping/amd/temperature_time", amd_power_time.data(), amd_power_time.size(),


Shouldn't this be amd_temp_time instead of amd_power_time?

It should, if two separate threads with different sampling rates are used, but unfortunately, this does not work. So right now it is more resource efficient to use one thread with one time vector. The problem is that if both temperature and power logging are enabled, the driver only returns power. Separately it works.

Okay. Could you please add a corresponding remark in the source code and file an issue?

ranocha · 2019-03-12T15:22:46Z

Which encoding do you use for the README.md? On my system using UTF-8 as standard encoding, the µ in AMD µProf is displayed as �.

ranocha · 2019-03-12T15:24:59Z

Do you plan to add support for power/temperature logging on AMD GPUs in this PR or in another one in the future? If it shall be implemented in the future, we should add that part to #20.

philipheinisch · 2019-03-12T15:29:17Z

Right now, it should output GPU power/temperature as well, if CPU logging is enabled, as there is no device specific selection. I just have not managed to test it yet due to driver problems. As soon as I get the drivers running, I can test how the GPU is enumerated and add a specific switch. Maybe we should amend #20?

philipheinisch · 2019-03-12T15:30:53Z

Which encoding do you use for the README.md? On my system using UTF-8 as standard encoding, the µ in AMD µProf is displayed as �.

It seems the encoding is changed during the push process by git on my system - could you fix it?

ranocha · 2019-03-12T15:32:55Z

Maybe we should amend #20?

That's a good idea.

ranocha · 2019-03-12T15:36:28Z

If the remark described in #18 (comment) is added and the change to float described in #18 (comment) is done, we can merge this PR.

Edit: Should also mention the problem with concurrent power/temperature logging for AMD in the README or the displayed help?

philipheinisch · 2019-03-12T15:48:44Z

Datatype is fixed and I added a warning, if concurrent AMD logging is selected.

Added support for Intel Power Gadget

b2c3edf

philipheinisch requested a review from Kostaszki February 24, 2019 15:30

Added basic Intel MSR power logging for Linux.

cdc5952

philipheinisch requested a review from ranocha February 27, 2019 11:41

Added MSR TDP output for MSR validation

315d69d

Changed TDP to uint32 and made MSR optional.

bd1b1a2

philipheinisch added 2 commits February 28, 2019 09:45

Updated README.md.

2289297

Changed date format and added Windows Intel TDP output.

9cc7456

ranocha and others added 3 commits March 10, 2019 13:23

naming of input/output variables

c270763

Small fix for Windows Intel power logging.

2c5f6b4

Started implementing unified RAPL class for Windows and Linux.

ad74732

ranocha mentioned this pull request Mar 10, 2019

Power and Temperature Logging on Mac OS X #29

Closed

ranocha and others added 3 commits March 10, 2019 17:08

add datetime documentation to doc/datatypes.md

8855bae

Small include fix

ceac113

move kernel_repetitions to /settings

ade2105

ranocha mentioned this pull request Mar 11, 2019

Additional Logging #20

Open

6 tasks

philipheinisch added 2 commits March 12, 2019 09:18

Finished first version of single rapl interface.

12a2a0c

Basic AMD temperature

af237c4

Added AMD requirements to README.md

78ac23b

ranocha reviewed Mar 12, 2019

View reviewed changes

fix a compiler warning

e41b987

fix µ in README.md [ci skip]

c2b33b2

Fixed RAPL datatype and added warning for AMD logging.

67ffb47

Added hint in source code regarding the AMD issue.

29e56ee

ranocha self-requested a review March 12, 2019 15:55

ranocha approved these changes Mar 12, 2019

View reviewed changes

ranocha merged commit 1d4fef3 into IANW-Projects:master Mar 12, 2019

Added support for Intel Power Gadget #18

Added support for Intel Power Gadget #18

Conversation

philipheinisch commented Feb 24, 2019

ranocha commented Feb 25, 2019

philipheinisch commented Feb 27, 2019

ranocha commented Feb 27, 2019 • edited Loading

philipheinisch commented Feb 27, 2019

ranocha commented Feb 27, 2019 • edited Loading

ranocha commented Feb 27, 2019 • edited Loading

philipheinisch commented Feb 27, 2019

ranocha commented Feb 27, 2019

philipheinisch commented Feb 27, 2019

philipheinisch commented Feb 27, 2019

ranocha commented Feb 28, 2019 • edited Loading

philipheinisch commented Feb 28, 2019

ranocha commented Feb 28, 2019 • edited Loading

ranocha commented Feb 28, 2019

philipheinisch commented Feb 28, 2019

ranocha commented Feb 28, 2019 • edited Loading

ranocha commented Feb 28, 2019

philipheinisch commented Feb 28, 2019

ranocha commented Feb 28, 2019

philipheinisch commented Feb 28, 2019

ranocha commented Mar 1, 2019 • edited Loading

ranocha commented Mar 1, 2019

ranocha commented Mar 10, 2019

philipheinisch commented Mar 12, 2019

ranocha commented Mar 12, 2019 • edited Loading

ranocha Mar 12, 2019

Choose a reason for hiding this comment

philipheinisch Mar 12, 2019

Choose a reason for hiding this comment

ranocha Mar 12, 2019

Choose a reason for hiding this comment

ranocha Mar 12, 2019

Choose a reason for hiding this comment

philipheinisch Mar 12, 2019

Choose a reason for hiding this comment

ranocha Mar 12, 2019

Choose a reason for hiding this comment

ranocha commented Mar 12, 2019

ranocha commented Mar 12, 2019

philipheinisch commented Mar 12, 2019

philipheinisch commented Mar 12, 2019

ranocha commented Mar 12, 2019

ranocha commented Mar 12, 2019 • edited Loading

philipheinisch commented Mar 12, 2019

ranocha commented Feb 27, 2019 •

edited

Loading

ranocha commented Feb 27, 2019 •

edited

Loading

ranocha commented Feb 27, 2019 •

edited

Loading

ranocha commented Feb 28, 2019 •

edited

Loading

ranocha commented Feb 28, 2019 •

edited

Loading

ranocha commented Feb 28, 2019 •

edited

Loading

ranocha commented Mar 1, 2019 •

edited

Loading

ranocha commented Mar 12, 2019 •

edited

Loading

ranocha commented Mar 12, 2019 •

edited

Loading