Test the accuracy of some derived metrics
Data from hardware performance counters seems to offer a complete, valid and reliable view of the operations performed at the hardware level, but is the data really complete, valid and reliable? The LIKWID team has been using hardware counters for quite some time, and we have seen events that over- or undercount as well as many accurate ones.
In order to compare the measured data with calculated values, an application is needed that has the following features:
- Parseable output of the metric of interest
- Known instruction stream (to specify a valid scaling factor and to predict results)
- Easy to instrument using LIKWID's Marker API
An application that offers all of the above is likwid-bench, because for its assembly benchmarks we can calculate the performed floating-point operations and the consumed data exactly. Moreover, likwid-bench can easily be instrumented with the LIKWID Marker API. However, likwid-bench currently offers only streaming benchmarks, hence not all metrics of interest can be covered, such as cache access ratios or energy consumption.
Accuracy test tool
The accuracy tool included in the LIKWID suite is written in Python and compares the calculated metric results of likwid-bench with the measured and derived ones of likwid-perfctr. For some tests, likwid-bench does not calculate and print the appropriate metric, like 'Instructions per branch', but such metrics are typically constant, hence we can define the expected result in the test input files.
The accuracy tool can be found in the LIKWID sources in the folder test/accuracy; all following paths are relative to this one. The test runs are defined by the files in the TESTS folder. An example definition looks like this:

```
REGEX_BENCH MByte\/s:\s+([0-9]+)
REGEX_PERF \|\s+L2 bandwidth \[MBytes\/s\]\s+\|\s+([0-9\.e\+\-]+)

TEST load
RUNS 5
WA_FACTOR 1.0
VARIANT 12kB 20000
VARIANT 1MB 10000
VARIANT 4MB 7500
VARIANT 1GB 50

TEST xxx
[...]
```
REGEX_BENCH is used to parse the data from likwid-bench and REGEX_PERF the output of likwid-perfctr. After an empty line, the test blocks are listed. The string after TEST defines the benchmark kernel used for likwid-bench. How often each data size should be tested is set with RUNS. WA_FACTOR scales the output of likwid-bench so that the results take write-allocate traffic into account. Finally, there are multiple lines of the form VARIANT size iterations. It is recommended to use selected sizes to see the influence of the CPU caches. The iteration definition is no longer needed: starting with version 4.0.0 of LIKWID, likwid-bench determines a suitable iteration count itself to produce reliable results.
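To illustrate how the two regular expressions work together, here is a minimal Python sketch that extracts the value from a likwid-bench output line and a likwid-perfctr output line and computes the relative deviation, as the accuracy tool does in essence. The sample output lines and the bandwidth numbers are made up for illustration; only the regexes are taken from the example definition above.

```python
import re

# The two regexes from the example definition, as Python raw strings
REGEX_BENCH = r"MByte\/s:\s+([0-9]+)"
REGEX_PERF = r"\|\s+L2 bandwidth \[MBytes\/s\]\s+\|\s+([0-9\.e\+\-]+)"

# Illustrative output lines; the real tools print many lines around these
bench_line = "MByte/s:\t\t11873"
perf_line = "|  L2 bandwidth [MBytes/s]  |  11930.42  |"

WA_FACTOR = 1.0  # no write-allocate correction for a pure load kernel

# Calculated value from likwid-bench, scaled by the write-allocate factor
calculated = float(re.search(REGEX_BENCH, bench_line).group(1)) * WA_FACTOR
# Measured value from likwid-perfctr's derived-metric table
measured = float(re.search(REGEX_PERF, perf_line).group(1))

deviation = abs(measured - calculated) / calculated * 100.0
print(f"calculated {calculated:.2f} MByte/s, measured {measured:.2f} MByte/s, "
      f"deviation {deviation:.2f}%")
```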
Which test groups should be performed can be defined in the file SETS.txt. Each line in the file specifies one test file without the suffix .txt. Alternatively, the sets can be given on the command line using the -s/--sets <comma-separated list> option.
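The mapping from selected set names to test definition files can be sketched as follows. The function name resolve_sets and its exact behavior are our own illustration, not the tool's actual code; only the TESTS folder layout and the comma-separated -s/--sets format come from the text above.

```python
def resolve_sets(sets_arg=None, sets_file_lines=()):
    """Accept either the -s/--sets argument or the lines of SETS.txt
    and return the corresponding test definition files (hypothetical helper)."""
    if sets_arg:
        names = sets_arg.split(",")
    else:
        names = [line.strip() for line in sets_file_lines if line.strip()]
    return ["TESTS/%s.txt" % name for name in names]

# Selection via command line: -s load,store
print(resolve_sets(sets_arg="load,store"))
# Selection via SETS.txt, one test name per line
print(resolve_sets(sets_file_lines=["load\n", "stream\n"]))
```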
The tool has some command line options that control how the comparison is displayed:

| Option | Description |
| --- | --- |
| --grace | Write an input file for Xmgrace (PNG) |
| --gnuplot | Write an input file for gnuplot (JPG) |
| --pgf | Write an input file for PGFPlots (PDF) |
| --script | Write a script to the results directory creating all images |
| --scriptname | Specify the filename for the script file |
The results of an accuracy run are stored in RESULTS/<hostname>. The output of all runs is stored in .raw files. The input files for plotting are named .dat, where the plain files contain the results of likwid-bench, the marker files the results of likwid-perfctr, and the correct files the results of the plain files scaled with the WA_FACTOR. Depending on the command line options, there are also .plot files for gnuplot, .agr files for Xmgrace and .tex files for PGFPlots. Since each tool uses a different output format (noted in the table above), all plotting tools can be used simultaneously. Finally, the script file to create all images is placed there; its default filename is create_plots.sh. Each plotting backend provides more or less detail of the tests.
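The relation between the plain and the correct files can be sketched in a few lines of Python: the correct values are simply the plain likwid-bench values scaled with the WA_FACTOR, which makes them comparable to what likwid-perfctr measures. The two-column size/value layout, the numbers and the factor below are assumptions for illustration only.

```python
# Hypothetical plain results (size label, MByte/s) as likwid-bench
# might report them; the actual .dat layout may differ
plain = [("12kB", 52341.0), ("1MB", 24120.0), ("1GB", 9870.0)]

# Made-up example factor for a kernel whose write-allocate transfers
# double the traffic visible to the hardware counters
WA_FACTOR = 2.0

# The "correct" values are the plain values scaled by WA_FACTOR
correct = [(size, value * WA_FACTOR) for size, value in plain]
for size, value in correct:
    print(size, value)
```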
Running accuracy tests
First, likwid-bench must be compiled and copied to the local folder. This can be done easily by calling make in the base folder (test/accuracy) of the accuracy test tool. It compiles likwid-bench with and without instrumentation and copies the executables to the current folder. The accuracy tool uses the likwid-perfctr executable from the current source tree, not a possibly installed system-wide one, hence the path to the access daemon must be set in config.mk before running make in the accuracy tool folder. You can use the likwid-perfctr of your system by changing the variable perfctr in the default settings of the accuracy script.
After setting up the executables, start the test runs, here with the gnuplot backend:

```
./likwid-accuracy --gnuplot --script
```

It prints the current group and test name. Each iteration is shown with a *. In the end, go to the results folder and create the plots:

```
cd RESULTS/$(hostname -s)
./create_plots.sh
```
Since we are working in a computing center, we have a wide range of microarchitectures in-house. We tested the accuracy on most of these architectures. Here is a list of all tested architectures with a link to their accuracy results.
- Intel Core 2
- Intel Westmere
- Intel Westmere EX
- Intel SandyBridge EP
- Intel IvyBridge EP
- Intel Haswell
- Intel Haswell EP
- Intel Broadwell EP
- Intel Skylake
- Intel Xeon Phi (Knights Landing)
- Intel Skylake X
Problems and ideas
At the moment, the accuracy tool is restricted to single-threaded likwid-bench runs; it would be nice to allow different benchmark applications and to use multiple threads. Moreover, other hardware performance counter tools, like PAPI or perf_event, could be integrated to see whether they do a more accurate job. Some parts are already extended to use PAPI, but there is no PAPI integration in likwid-bench yet.