# TestAccuracy
Data from hardware performance counters seem to offer a complete, valid, and reliable view of the operations performed at the hardware level, but are the data really complete, valid, and reliable? The LIKWID team has used hardware counters for quite some time and has seen events over- or undercounting as well as many that count accurately.
In order to compare the measured data with calculated values, an application is needed that has the following features:
- Parseable output of the metric of interest
- Known instruction stream (to specify a valid scaling factor and to predict results)
- Easy to instrument using LIKWID's Marker API
An application that offers all of the above is likwid-bench: for its assembly benchmarks we can calculate the performed floating-point operations and the consumed data exactly. Moreover, likwid-bench can easily be instrumented with the LIKWID Marker API. Nevertheless, likwid-bench currently offers only streaming benchmarks, hence not all metrics of interest, such as cache access ratios or energy consumption, can be covered.
The accuracy tool included in the LIKWID suite is written in Python and compares the calculated metric results of likwid-bench with the measured and derived ones of likwid-perfctr. For some tests, likwid-bench does not calculate and print the appropriate metric, like 'Instructions per branch', but such metrics are commonly constant, hence the expected result can be defined in the test input files.
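To make the "calculated" side of this comparison concrete, here is a minimal sketch that derives the expected bandwidth of a pure load kernel from its known instruction stream. All numbers (working set, iteration count, runtime) are made-up assumptions for illustration, not values produced by likwid-bench:

```python
# Expected bandwidth for a pure load kernel over an array of doubles.
# All numbers here are illustrative assumptions, not output of the tool.

elements = 1_000_000          # working set: 1M doubles
bytes_per_element = 8         # a pure load kernel reads each double once
iterations = 100              # how often the kernel sweeps the array
runtime_s = 0.65              # hypothetical measured runtime in seconds

total_bytes = elements * bytes_per_element * iterations
bandwidth_mbytes = total_bytes / runtime_s / 1e6

print(f"expected bandwidth: {bandwidth_mbytes:.1f} MByte/s")
```

Because the benchmark kernels are written in assembly, such counts are exact rather than estimates, which is what makes the comparison against hardware counters meaningful.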
The accuracy tool can be found in the LIKWID sources in the folder `test/accuracy`; all following paths are relative to it. The test runs are defined by the files in the `TESTS` folder. An example definition looks like this:
```
REGEX_BENCH MByte\/s:\s+([0-9]+)
REGEX_PERF \|\s+L2 bandwidth \[MBytes\/s\]\s+\|\s+([0-9\.e\+\-]+)

TEST load
RUNS 5
WA_FACTOR 1.0
VARIANT 12kB 20000
VARIANT 1MB 10000
VARIANT 4MB 7500
VARIANT 1GB 50

TEST xxx
[...]
```
`REGEX_BENCH` is used to parse the data from the likwid-bench output, `REGEX_PERF` the output of likwid-perfctr. After an empty line, the test blocks are listed. The string after `TEST` selects the benchmark kernel used for likwid-bench. `RUNS` defines how often each data size should be tested. The `WA_FACTOR` scales the output of likwid-bench to take write-allocate traffic into account: for a pure store kernel, for example, each cache line is loaded before it is written, so the actual memory traffic is twice what the kernel nominally writes and the factor would be 2.0. Finally, there are multiple lines of the form `VARIANT <size> <iterations>`. It is recommended to select sizes that show the influence of the CPU caches. The iteration definition is not needed anymore: starting with version 4.0.0 of LIKWID, likwid-bench determines a suitable iteration count itself to produce reliable results.
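The following sketch illustrates, under assumptions, how such a definition is applied: the two regular expressions from the example above extract the metric from each tool's output, and `WA_FACTOR` rescales the likwid-bench value before comparison. The sample output lines are fabricated for illustration; the real tool parses the output of the spawned likwid-bench and likwid-perfctr processes:

```python
import re

# Regular expressions taken from the example definition above.
REGEX_BENCH = r"MByte\/s:\s+([0-9]+)"
REGEX_PERF = r"\|\s+L2 bandwidth \[MBytes\/s\]\s+\|\s+([0-9\.e\+\-]+)"
WA_FACTOR = 1.0

# Hypothetical output lines; in reality they come from running
# likwid-bench and likwid-perfctr.
bench_output = "MByte/s:        35210"
perf_output = "|  L2 bandwidth [MBytes/s]  |  35498.23  |"

calculated = float(re.search(REGEX_BENCH, bench_output).group(1))
measured = float(re.search(REGEX_PERF, perf_output).group(1))

# Scale the calculated value to account for write-allocate traffic.
corrected = calculated * WA_FACTOR

deviation = abs(measured - corrected) / corrected * 100
print(f"calculated {corrected:.1f}, measured {measured:.1f}, "
      f"deviation {deviation:.2f}%")
```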
Which test groups should be performed can be defined in the file `SETS.txt`. Each line in the file specifies one test file without the suffix `.txt`. Alternatively, the sets can be given on the command line using the `-s/--sets <comma-separated list>` option.
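As an illustration, a `SETS.txt` selecting three hypothetical test files (`load.txt`, `store.txt`, and `copy.txt` in the `TESTS` folder) would simply contain:

```
load
store
copy
```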
The tool has some command line options to control the comparison output:

| Option | Comment |
|---|---|
| `--grace` | Write an input file for Xmgrace (PNG) |
| `--gnuplot` | Write an input file for gnuplot (JPG) |
| `--pgf` | Write an input file for PGFPlots (PDF) |
| `--script` | Write a script to the results directory that creates all images |
| `--scriptname` | Specify the filename for the script file |
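A hypothetical invocation combining these options could look like this (the exact argument style of `--scriptname` is an assumption):

```
./likwid-accuracy.py --gnuplot --script --scriptname my_plots.sh
```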
The results of an accuracy run are stored at `RESULTS/<hostname>`. The output of all runs is stored in `.raw` files. The input files for plotting are named `.dat`: the plain files contain the results of likwid-bench, the marker files those of likwid-perfctr, and the correct files contain the plain results scaled by the `WA_FACTOR`.
Depending on the command line options, there are also `.plot` files for gnuplot, `.agr` files for Xmgrace, and `.tex` files for PGFPlots. To allow all plotting tools to be used simultaneously, each tool writes a different output format, as noted in the table above. Finally, the script file that creates all images is placed there as well; its default filename is `create_plots.sh`. Each plotting backend provides a different level of detail for the tests.
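To illustrate the naming scheme, a results folder after a gnuplot run might contain files along these lines; the exact names are assumptions and depend on the tests that were run:

```
RESULTS/myhost/            # hypothetical listing
    load.raw               # raw output of all runs
    load_plain.dat         # results of likwid-bench
    load_marker.dat        # results of likwid-perfctr
    load_correct.dat       # plain results scaled by WA_FACTOR
    load.plot              # gnuplot input file
    create_plots.sh        # script creating all images
```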
At first, likwid-bench must be compiled and copied to the local folder. This can be done easily by calling `make` in the base folder (`test/accuracy`) of the accuracy test tool. It compiles likwid-bench with and without instrumentation and copies the executables to the current folder. The accuracy tool uses the likwid-perfctr executable from the current source tree, not a possibly installed one, hence the path to the access daemon must be set in `config.mk` before running `make` in the accuracy tool folder. You can use the likwid-perfctr of your system by changing the variable `perfctr` in the default settings of the accuracy script `likwid-accuracy.py`.
After setting up the executables, start the test runs with the gnuplot backend:

```
./likwid-accuracy.py --gnuplot --script
```

It prints the current group and test name; each iteration is shown with a `*`. In the end, go to the results folder and create the plots:

```
cd RESULTS/$(hostname -s)
./create_plots.sh
```
Since we are working in a computing center, we have a wide range of microarchitectures in-house and have tested the accuracy on most of them. Here is a list of all tested architectures with links to their accuracy results.
- Intel Core 2
- Intel Westmere
- Intel Westmere EX
- Intel SandyBridge EP
- Intel IvyBridge EP
- Intel Haswell
- Intel Haswell EP
- Intel Broadwell EP
- Intel Skylake
- Intel Xeon Phi (Knights Landing)
- Intel Skylake X
- Intel Icelake X
At the moment, the accuracy tool is fixed to single-threaded likwid-bench; it would be nice to allow different benchmark applications and to use multiple threads. Moreover, other hardware performance counter tools such as PAPI or perf_event could be integrated to see whether they do a more accurate job. Some parts are already extended to use PAPI, but there is no PAPI integration in likwid-bench.