likwid-perfscope: Tool to perform live plotting of performance data
likwid-perfscope is a command line application written in Lua that uses the timeline mode of likwid-perfctr to create on-the-fly pictures with the current measurements. It uses the feedGnuplot Perl script to send the current data to gnuplot. In order to make it more convenient for users, preconfigured plots of interesting metrics are embedded into likwid-perfscope. Since the plot windows are normally closed directly after the execution of the monitored applications, likwid-perfscope waits until Ctrl+c is pressed.
-h, --help              Help message
-v, --version           Version information
-V, --verbose <level>   Verbose output, 0 (only errors), 1 (info), 2 (details), 3 (developer)
-a                      Print all preconfigured plot configurations for the current system.
-c <list>               Processor ids to measure, e.g. 1,2-4,8
-C <list>               Processor ids to pin threads and measure, e.g. 1,2-4,8
-g, --group <string>    Preconfigured plot group or custom event set string with plot config.
-t, --time <time>       Frequency in s, ms or us, e.g. 300ms, for the timeline mode of likwid-perfctr
-d, --dump              Print output as it is sent to feedGnuplot.
-p, --plotdump          Use dump functionality of feedGnuplot. Prints plot configurations plus data to directly submit to gnuplot
--host <host>           Run likwid-perfctr on the selected host using SSH. Evaluation and plotting is done locally. This can be used for machines that have no gnuplot installed. All paths must be similar to the local machine.
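To make the list syntax of -c and -C concrete, here is a minimal Python sketch that expands a plain numeric list such as 1,2-4,8 into individual processor ids. The function name expand_cpu_list is hypothetical and not part of LIKWID. The sketch handles only comma-separated ids and ranges, not affinity-domain expressions such as S0:0.

```python
from typing import List


def expand_cpu_list(spec: str) -> List[int]:
    """Expand a plain numeric CPU list such as "1,2-4,8" into single ids.

    Hypothetical helper for illustration only; it does not understand
    affinity-domain expressions such as "S0:0".
    """
    cpus: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range "first-last" includes both end points.
            first, last = (int(x) for x in part.split("-", 1))
            if last < first:
                raise ValueError(f"descending range: {part!r}")
            cpus.extend(range(first, last + 1))
        else:
            cpus.append(int(part))
    return cpus


print(expand_cpu_list("1,2-4,8"))  # [1, 2, 3, 4, 8]
```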
The basic usage of likwid-perfscope is to use one of the predefined plot configurations that are embedded into the Lua script. All of them plot time-resolved metrics, e.g. MByte/s or FLOP/s. A list of all plots available for the current architecture can be retrieved with
$ likwid-perfscope -a
which prints on an Intel IvyBridge EP system:
Group NUMA
    Perfctr group: NUMA
    Match for metric: Local DRAM bandwidth [MByte/s]
    Title of plot: NUMA separated memory bandwidth
    Title of x-axis: Time
    Title of y-axis: Bandwidth [MBytes/s]
    Match for second metric: Remote DRAM bandwidth [MByte/s]
    Title of y2-axis: Bandwidth [MBytes/s]
Group MEM_BAND
    Perfctr group: MEM
    Match for metric: Memory bandwidth [MBytes/s]
    Title of plot: Memory bandwidth
    Title of x-axis: Time
    Title of y-axis: Bandwidth [MBytes/s]
Group FLOPS_DP
    Perfctr group: FLOPS_DP
    Match for metric: MFlops/s
    Title of plot: Double Precision Flop Rate
    Title of x-axis: Time
    Title of y-axis: MFlops/s
Group L2_BAND
    Perfctr group: L2
    Match for metric: L2 bandwidth [MBytes/s]
    Title of plot: L2 cache bandwidth
    Title of x-axis: Time
    Title of y-axis: Bandwidth [MBytes/s]
Group L3_BAND
    Perfctr group: L3
    Match for metric: L3 bandwidth [MBytes/s]
    Title of plot: L3 cache bandwidth
    Title of x-axis: Time
    Title of y-axis: Bandwidth [MBytes/s]
Group FLOPS_SP
    Perfctr group: FLOPS_SP
    Match for metric: MFlops/s
    Title of plot: Single Precision Flop Rate
    Title of x-axis: Time
    Title of y-axis: MFlops/s
Group TEMP
    Perfctr group: ENERGY
    Match for metric: Temperature [C]
    Title of plot: Temperature
    Title of x-axis: Time
    Title of y-axis: Temperature [C]
Group POWER
    Perfctr group: ENERGY
    Match for metric: Power [W]
    Title of plot: Consumed power
    Title of x-axis: Time
    Title of y-axis: Power [W]
    Match for second metric: Power DRAM [W]
    Title of y2-axis: Power DRAM [W]
Group QPI_BAND
    Perfctr group: QPI
    Match for metric: QPI data bandwidth [MByte/s]
    Title of plot: QPI bandwidth
    Title of x-axis: Time
    Title of y-axis: Bandwidth [MBytes/s]
    Match for second metric: QPI link bandwidth [MByte/s]
    Title of y2-axis: Bandwidth [MBytes/s]
You can run these groups in the same way as with likwid-perfctr:
$ likwid-perfscope -C S0:0 -g L3_BAND ./a.out
which measures the L3 cache bandwidth on the first CPU of socket 0 and plots it with the title "L3 cache bandwidth", the x-axis label "Time" and the y-axis label "Bandwidth [MBytes/s]". If you execute on multiple CPUs, each CPU gets its own line in the plot.
Some plot configurations, such as POWER, plot two lines per CPU: one for the CPU package power consumption and one for the DRAM power consumption. The DRAM power consumption uses the right y-axis with its own axis label "Power DRAM [W]".
You can increase the number of samples by setting a shorter interval with -t <time> on the command line. The default value is one sample per second.
$ likwid-perfscope -C S0:0 -g L3_BAND -t 500ms ./a.out
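The relation between the -t value and the sampling rate can be sketched in Python. The helper names below are hypothetical, and the sketch assumes the value is an integer directly followed by one of the units s, ms or us, as in the examples on this page. It does not reproduce how likwid-perfscope itself parses the option.

```python
import re

# Number of units per second for each suffix accepted by -t.
_UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000}


def interval_seconds(text: str) -> float:
    """Convert a time string such as "500ms" into seconds (assumed syntax)."""
    match = re.fullmatch(r"(\d+)(s|ms|us)", text.strip())
    if match is None:
        raise ValueError(f"unsupported time string: {text!r}")
    value, unit = match.groups()
    return int(value) / _UNITS_PER_SECOND[unit]


def samples_per_second(text: str) -> float:
    """Return how many samples per second the given interval produces."""
    return 1.0 / interval_seconds(text)


print(interval_seconds("500ms"))    # 0.5
print(samples_per_second("500ms"))  # 2.0
```

With -t 500ms the plot therefore receives two data points per second instead of the default one.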
Moreover, you can use the group switching functionality of the timeline mode to measure multiple metrics at once:
$ likwid-perfscope -C S0:0 -g L3_BAND -g L2_BAND -g MEM_BAND -t 500ms ./a.out
Each group opens its own plotting window, and the windows are updated in a round-robin fashion. Each group is measured for one -t <time> interval before the next group takes over.
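The round-robin updating can be pictured with a small Python sketch. It assumes that each group is measured for exactly one -t interval before the next group takes over and that switching costs no time. The function round_robin_schedule is hypothetical and only models the order of updates, not the tool's internals.

```python
from typing import Iterator, List, Tuple


def round_robin_schedule(
    groups: List[str], interval_s: float, steps: int
) -> Iterator[Tuple[float, str]]:
    """Yield (start_time_in_seconds, group) for each measurement step.

    Assumption: every group is active for exactly one interval, and the
    groups are cycled in the order given on the command line.
    """
    for step in range(steps):
        yield step * interval_s, groups[step % len(groups)]


groups = ["L3_BAND", "L2_BAND", "MEM_BAND"]
for start, group in round_robin_schedule(groups, 0.5, 6):
    print(f"{start:4.1f} s  {group}")
```

Under these assumptions, three groups at 500ms mean that each plot window receives a new point only every 1.5 s.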
If you want to record the measurements, you can use either -d or -p. The difference is that -d outputs the strings that are sent to feedGnuplot; the plot environment (title, labels) is not included. With -p, the dump is made by feedGnuplot, which prints the plot environment first and then, for each update step, all the data that has been collected so far.
Output format of -d:
<groupID> <runtime> <value_1_CPU1> (<value_2_CPU1>) (<value_1_CPU2>) (<value_2_CPU2>) ...
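A recorded -d dump can be post-processed with a few lines of Python. The sketch below splits one line of the format above into per-CPU value lists. The names Sample and parse_dump_line are hypothetical, the input line is made up for illustration, and the number of metrics per CPU (1, or 2 for configurations such as POWER) has to be supplied by the caller.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    group_id: str
    runtime: float
    values: List[List[float]]  # values[cpu_index][metric_index]


def parse_dump_line(line: str, metrics_per_cpu: int = 1) -> Sample:
    """Parse '<groupID> <runtime> <values...>' into a Sample."""
    fields = line.split()
    if len(fields) < 2 + metrics_per_cpu:
        raise ValueError(f"too few fields: {line!r}")
    raw = [float(field) for field in fields[2:]]
    if len(raw) % metrics_per_cpu != 0:
        raise ValueError("value count does not match metrics_per_cpu")
    # Values are ordered CPU by CPU, so chunk them per CPU.
    values = [
        raw[i:i + metrics_per_cpu] for i in range(0, len(raw), metrics_per_cpu)
    ]
    return Sample(fields[0], float(fields[1]), values)


# Made-up line: group 1, 2.0 s runtime, two CPUs with two metrics each.
sample = parse_dump_line("1 2.0 21.5 3.5 20.0 3.25", metrics_per_cpu=2)
print(sample.values)  # [[21.5, 3.5], [20.0, 3.25]]
```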
Example output of -p:
set grid
set xlabel "Time"
set ylabel "Bandwidth [MBytes/s]"
set title "L3 cache bandwidth"
set boxwidth 1
histbin(x) = 1 * floor(0.5 + x/1)
set xtics
set xrange ["0":]
plot '-' title "L3 bandwidth [MBytes/s]" with linespoints
0 0
1.000161322585 48.433210629261
2.000241249986 21.798359943835
3.0003206090227 21.337482595053
4.0004001520114 14.873424079086
5.0004813269837 7.8612681493985
e
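If the -p dump is saved to a file, the data points can be recovered without gnuplot. The following Python sketch extracts the inline data that follows a plot '-' command and ends at the terminating e line. It assumes a single curve per plot command and one command per line. Since every update step repeats the whole data set, it keeps only the last block. The function name parse_inline_data is hypothetical.

```python
from typing import List, Tuple

# Shortened copy of the example dump above.
EXAMPLE = """\
set title "L3 cache bandwidth"
plot '-' title "L3 bandwidth [MBytes/s]" with linespoints
0 0
1.000161322585 48.433210629261
2.000241249986 21.798359943835
e
"""


def parse_inline_data(dump: str) -> List[Tuple[float, float]]:
    """Return the (time, value) points of the last inline data block."""
    points: List[Tuple[float, float]] = []
    in_data = False
    for line in dump.splitlines():
        stripped = line.strip()
        if stripped.startswith("plot "):
            # A new plot command starts a fresh data block.
            points = []
            in_data = True
        elif in_data and stripped == "e":
            # A lone "e" terminates gnuplot inline data.
            in_data = False
        elif in_data and stripped:
            x, y = stripped.split()[:2]
            points.append((float(x), float(y)))
    return points


for time_s, value in parse_inline_data(EXAMPLE):
    print(time_s, value)
```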
You can also perform the measurements on another host using the --host option:
$ likwid-perfscope -C S0:0@S1:0 -g POWER --host host1 ./a.out
but all paths need to be similar to the local system, the group must be available on the remote host, and the CPU list must be valid there. This feature is currently experimental.