-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for Intel Power Gadget #18
Conversation
It might be good to add some CI tests for this. On Appveyor, the windows builds run on Intel CPUs, cf. https://ci.appveyor.com/project/ranocha/toolkiticl/builds/22608195/job/fuwfknym9x7dah60#L140. |
I added a very basic MSR power interface for Linux, just for testing. Do you get power values for one of the sockets or just the overall package? There might be some problem with the Intel register offsets. |
What do I have to do/install in order to make this work? I don't have any
Hence, I get the same error as the tests on Appveyor & Travis. |
You need the the |
That fixed the problem for me. However, I had to run Anyway, here is the output of some test:julia> using HDF5
julia> h5open("out_DP_256_test.h5", "r") do io
for n in names(io)
println(n)
display(read(io, n))
println()
end
end
Data
Dict{String,Any} with 3 entries:
"B" => Float32[4.0 4.0 … 0.0 0.0]
"A" => Float32[6.0 6.0 … 0.0 0.0]
"C" => Float32[2.4e7 0.0]
Data_LoadTime
5.479
Data_StoreTime
23.68
Global_Range
1×3 Array{Int32,2}:
1000192 1 1
Host_OS
"Linux Ubuntu 18.04.2 LTS/4.15.0-45-generic/#48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019"
Intel_HK
Dict{String,Any} with 5 entries:
"DRAM" => [0.563965 0.456543 … 0.466309 0.458984]
"Package" => [115.625 120.781 … 95.0781 95.7813]
"Socket 0" => [104.766 109.844 … 84.0625 83.9258]
"Socket 1" => [0.0 0.0 … 0.0 0.0]
"Power_Time" => ["27-02-2019 15:12:25:052", "27-02-2019 15:12:25:153", "27-02-201…
Kernel_ExecStart
"27-02-2019 15:12:32"
Kernel_ExecTime
7458.389
Kernel_Settings
"-DREDUCE=1 -DNR=1 -DNC=1 -DNI=1 -DREAL=float -DW_SIZE=256 -cl-single-precision-constant -cl-mad-enable -cl-no-signed-zeros -cl-finite-math-only"
Local_Range
1×3 Array{Int32,2}:
256 1 1
NV_HK
Dict{String,Any} with 0 entries
OpenCL_Device
"Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz"
OpenCL_Version
"OpenCL 2.1 (Build 0)"
Range_Start
1×3 Array{Int32,2}:
0 0 0
Total_ExecTime
7643.304 |
Runnign a test on a virtual machine (Ubuntu Xenial) yields
Edit: Click on the triangle to expand the output. |
It might be necessary to set the /dev/cpu permissions once, afterwards it should work. |
No problem; I can run new tests if you wish. Small note: The time format |
We can definitively change the date format. I tested it using an i5-8400 and it seems to work. Even though it seems the power draw at idle gets underestimated, it looks ok under load. Have you tired using the |
It also seems to work on @Kostaszki notebooks with an i5 CPU. Maybe at this point I should compare the MSR RAPL results against real power measurements. |
Using `$ sudo toolkitICL -b -isp 1000 -d 1 -c SSPRK_test.h5`, I get the following results:julia> [intel["DRAM"]' intel["Package"]' intel["Socket 0"]' intel["Socket 1"]']
95×4 Array{Float64,2}:
0.45282 13.0098 1.9209 0.0
0.448181 12.2617 1.18066 0.0
0.44635 12.168 1.09961 0.0
0.449524 12.25 1.14063 0.0
0.453613 137.094 126.085 0.0
0.446838 98.6055 87.5176 0.0
0.447449 95.0059 83.9395 0.0
0.448059 94.9941 83.9355 0.0
0.447754 94.9414 83.8818 0.0
0.447205 95.0527 83.8877 0.0
0.446655 94.9238 83.8662 0.0
0.446716 94.9336 83.8721 0.0
0.447754 94.9414 83.8711 0.0
0.44696 94.9277 83.873 0.0
0.448242 94.918 83.8457 0.0
0.446899 94.9297 83.8721 0.0
0.44751 94.9082 83.8477 0.0
0.447144 94.9336 83.8643 0.0
0.447815 95.0039 83.8535 0.0
0.446655 94.9316 83.8633 0.0
0.447449 94.9297 83.8662 0.0
0.446899 94.9199 83.8574 0.0
0.446655 94.9336 83.8701 0.0
0.446777 94.9258 83.8643 0.0
0.447937 94.918 83.8545 0.0
0.44696 94.9336 83.8652 0.0
0.447144 94.9258 83.8594 0.0
0.447571 95.002 83.8467 0.0
0.447632 94.9258 83.8633 0.0
0.44696 94.9258 83.8584 0.0
0.447388 94.9277 83.8623 0.0
0.446838 94.9141 83.8564 0.0
0.447571 94.9336 83.8643 0.0
⋮
0.446716 94.9199 83.8691 0.0
0.447449 94.918 83.8496 0.0
0.447876 94.9355 83.8711 0.0
0.448181 95.002 83.8398 0.0
0.447449 94.9316 83.8652 0.0
0.447021 94.9297 83.8652 0.0
0.446899 94.9121 83.8496 0.0
0.447144 94.9277 83.8574 0.0
0.446716 94.9199 83.8555 0.0
0.446716 94.9316 83.873 0.0
0.447327 94.918 83.8516 0.0
0.447266 94.9355 83.8633 0.0
0.446838 94.9141 83.8516 0.0
0.447144 95.0254 83.8672 0.0
0.446899 94.916 83.8496 0.0
0.447449 94.9277 83.8672 0.0
0.44751 94.9199 83.8535 0.0
0.446655 94.9238 83.8633 0.0
0.446838 94.9238 83.8477 0.0
0.447021 94.9395 83.8789 0.0
0.447327 94.916 83.8613 0.0
0.446777 94.9121 83.8379 0.0
0.447021 94.9258 83.8574 0.0
0.447266 95.0293 83.875 0.0
0.447266 94.9063 83.8418 0.0
0.447083 94.9336 83.8652 0.0
0.447449 94.9219 83.8594 0.0
0.447021 94.918 83.8574 0.0
0.449036 94.9297 83.8594 0.0
0.453003 30.6055 19.4902 0.0
0.447144 12.3594 1.27734 0.0
0.449402 12.4297 1.35938 0.0
0.446533 12.3047 1.23633 0.0 Edit: Click on the triangle to expand the output. |
That looks surprisingly good! Thanks. As it seems the >100W were indeed caused by Turbo-Boost - interesting. The TDP is also correct, so the registers get read out correctly. So apart from getting the CI checks working, the rest is mostly cleanup. How do we want to enable RAPL on Linux systems? Always enable in cmake if Linux and Intel CPUs are detected and make the user try if |
As far as I see, there is no explicit dependency on If that is possible, changes to the automatic tests are minimised. We only have to add some new tests. |
If the Intel and the Nvidia parts work, it might be an option to create a unified interface to the power consumption related parts, save the results in one data group and add an identifier (string), describing the method used to log the power. What do you think? Of course, I am willing to work on that part. |
A unified power interface is a good idea. Especially if I get the AMD part working as well (work is ongoing). At some point it might even become an independent project, as a light weight cross platform power consumption interface with support for AMD, Intel and NVidia might be interesting to other people as well. Maybe we should discuss this in detail. @ranocha do you know the best solution to get CPU temperature with Linux? |
That's good! Sorry, I don't know a really good solution. On my system,
seems to do the trick (returns the temperature in milli °C, cf. https://www.kernel.org/doc/Documentation/thermal/sysfs-api.txt). However, there is no such file on my VMs. |
The question is, if we want to use something vendor specific like MSRs or something more generic like the sysfs drivers? A solution based on the MSRs might be the most powerful and requires very little extra dependencies. The sysfs approach is more generic and cross platform but might not work out of the box on all systems (especially clusters). |
Well, maybe there is simply no "best" solution. If we want to have something working on as many platforms as possible, we might be forced to do something like
|
That might very well be the case. For now, the MSR approach might be the easiest, as the necessary MSR read functions are already implemented. |
If we unify the interface for power/temperature/etc. logging, we should also unify the command line options. This would also simplify the testing and application of |
Adding really good CI tests might be a bit problematic, since the low-level MSR tools are not really supported in VMs. On my machines, the output is simply zero for all variables. Thus, we cannot test the output reliably. Nevertheless, I will add some tests of the basic functionality later. |
Let's see whether github recognises this message: Commits included here close #21. |
Theoretically this should be enough to merge this PR or is there still something missing? |
I will have a look at the new commits soon. |
src/main.cpp
Outdated
@@ -942,7 +1020,7 @@ int main(int argc, char *argv[]) { | |||
|
|||
if (intel_log_power) | |||
{ | |||
h5_write_single<float>(out_name, "/housekeeping/intel/TDP", rapl->get_TDP(), | |||
h5_write_single<uint32_t>(out_name, "/housekeeping/intel/TDP", rapl->get_TDP(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other power data are stored as float
. Should we really use two different data types here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, float is ok, the drivers just return ints, so it was just the intuitive habit to change everything to the same datatype without thinking...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Will you change it back to float
in that case?
} | ||
if (amd_temp_rate > 0) | ||
{ | ||
h5_write_buffer<double>(out_name, "/housekeeping/amd/temperature_time", amd_power_time.data(), amd_power_time.size(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be amd_temp_time
instead of amd_power_time
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should, if two separate threads with different sampling rates are used, but unfortunately, this does not work. So right now it is more resource efficient to use one thread with one time vector. The problem is that if both temperature and power logging are enabled, the driver only returns power. Separately it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Could you please add a corresponding remark in the source code and file an issue?
Which encoding do you use for the |
Do you plan to add support for power/temperature logging on AMD GPUs in this PR or in another one in the future? If it shall be implemented in the future, we should add that part to #20. |
Right now, it should output GPU power/temperature as well, if CPU logging is enabled, as there is no device specific selection. I just have not managed to test it yet due to driver problems. As soon as I get the drivers running, I can test how the GPU is enumerated and add a specific switch. Maybe we should amend #20? |
It seems the encoding is changed during the push process by git on my system - could you fix it? |
That's a good idea. |
If the remark described in #18 (comment) is added and the change to Edit: Should also mention the problem with concurrent power/temperature logging for AMD in the README or the displayed help? |
Datatype is fixed and I added a warning, if concurrent AMD logging is selected. |
Added support for Intel Power Gadget. Right now only Windows (CPU&GPU) is supported. Linux support is possible but still work in progress.