perfect

CPU/GPU Performance control library for benchmarking on Linux, x86, POWER, and Nvidia.

Features

Contributors

Carl Pearson

Installing

CMake

Ensure you have CMake 3.13+.

Add the source tree to your project (for example, as a git submodule) and then use add_subdirectory:

git submodule add git@github.com:cwpearson/perfect.git thirdparty/perfect

CMakeLists.txt

...
add_subdirectory(thirdparty/perfect)
...
target_link_libraries(your-target perfect)

Without CMake

Download the source, then:

When compiling with a non-CUDA compiler:
- add the include directory to your includes
- add nvidia-ml to your link flags
- add -DPERFECT_HAS_CUDA to your compile definitions

With a CUDA compiler, just compile normally (PERFECT_HAS_CUDA is defined for you).

g++ code_using_perfect.cpp -DPERFECT_HAS_CUDA -Iperfect/include -lnvidia-ml 
nvcc code_using_perfect.cu -Iperfect/include -lnvidia-ml

If you don't have CUDA, only the include directory is needed:

g++ code_using_perfect.cpp -I perfect/include

Tools Usage

tools/perfect-cli

perfect provides some useful tools on Linux:

$ tools/perfect-cli -h
SYNOPSIS
        ./tools/perfect-cli --no-mod [-n <INT>] -- <cmd>...
        ./tools/perfect-cli ([-u <INT>] | [-s <INT>]) [--no-drop-cache] [--no-max-perf] [--aslr]
                            [--cpu-turbo] [--stdout <PATH>] [--stderr <PATH>] [-n <INT>] -- <cmd>...

OPTIONS
        --no-mod               don't control performance
        -u                     number of unshielded CPUs
        -s                     number of shielded CPUs
        --no-drop-cache        do not drop filesystem caches
        --no-max-perf          do not max os perf
        --aslr                 enable ASLR
        --cpu-turbo            enable CPU turbo
        --stdout               redirect child stdout
        --stderr               redirect child stderr
        -n                     run multiple times

The basic usage is tools/perfect-cli -- my-exe, which will attempt to configure the system for repeatable performance before executing my-exe, and then restore the system to the original performance state before exiting. Most modifications require elevated privileges. The default behavior is to:

disable ASLR
set CPU performance to maximum
disable CPU turbo
drop filesystem caches before each iteration

Some options (all should be provided before the -- separator):

--no-mod flag will cause perfect-cli to not modify the system performance state
-n INT will run the requested program INT times.
--stderr/--stdout will redirect the program-under-test's stderr and stdout to the provided paths.
-s/-u: set the number of shielded/unshielded CPUs. The program-under-test will run on the shielded CPUs. All other tasks will run on the unshielded CPUs.

A common invocation might look like:

sudo tools/perfect-cli -n 5 --stderr=run.err --stdout=run.out -- ./my-benchmark

This will disable ASLR, set CPU performance to maximum, disable CPU turbo, and then run ./my-benchmark 5 times after dropping the filesystem cache before each run, redirecting stdout/stderr of ./my-benchmark to run.out/run.err. The owner of run.out and run.err will be set to whichever user called sudo.

tools/addr

Print the address of main, a stack variable, and a heap variable. Useful for demoing ASLR.

tools/no-aslr

Disable ASLR on the provided execution.

With ASLR, addresses are different with each invocation

$ tools/addr
main:  94685074364704
stack: 140734279743492
heap:  94685084978800
$ tools/addr
main:  93891046344992
stack: 140722671706708
heap:  93891068624496

Without ASLR, addresses are the same in each invocation

$ tools/no-aslr tools/addr
main:  93824992233760
stack: 140737488347460
heap:  93824994414192
$ tools/no-aslr tools/addr
main:  93824992233760
stack: 140737488347460
heap:  93824994414192
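
The effect shown above can be reproduced with a small program. The following is a minimal sketch of what a tool like tools/addr might do, not its actual source: the names Addrs, sample_addrs, print_addrs, and code_marker are made up for illustration, and code_marker stands in for main because standard C++ does not allow taking the address of main.

```cpp
#include <cstdint>
#include <cstdio>
#include <memory>

// Hypothetical stand-in for main (taking &main is not allowed in standard C++).
void code_marker() {}

struct Addrs {
  std::uintptr_t code;  // address inside the program's code
  std::uintptr_t stack; // address of a local variable
  std::uintptr_t heap;  // address of a heap allocation
};

// Collect one address from each region that ASLR typically randomizes.
Addrs sample_addrs() {
  int on_stack = 0;
  std::unique_ptr<int> on_heap(new int(0));
  Addrs a;
  a.code = reinterpret_cast<std::uintptr_t>(&code_marker);
  a.stack = reinterpret_cast<std::uintptr_t>(&on_stack);
  a.heap = reinterpret_cast<std::uintptr_t>(on_heap.get());
  return a;
}

// Print the addresses as decimal integers, one per line.
void print_addrs(const Addrs &a) {
  std::printf("code:  %llu\n", static_cast<unsigned long long>(a.code));
  std::printf("stack: %llu\n", static_cast<unsigned long long>(a.stack));
  std::printf("heap:  %llu\n", static_cast<unsigned long long>(a.heap));
}
```

Calling print_addrs(sample_addrs()) from main and running the binary twice should show differing values when ASLR is on (for the code address, only if the binary is position-independent).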

API Usage

The perfect functions all return a perfect::Result, which is defined in include/perfect/result.hpp. When things are working, it will be perfect::Result::SUCCESS. A PERFECT macro is also defined, which will terminate with an error message unless the perfect::Result is perfect::Result::SUCCESS.

perfect::CpuTurboState state;
PERFECT(perfect::get_cpu_turbo_state(&state));
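
To make the check-or-terminate pattern concrete, here is a self-contained sketch of how such a macro could be built. It is not the definition from include/perfect/result.hpp: the demo namespace, the DEMO_CHECK macro, and every enumerator other than SUCCESS are hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>

namespace demo {

// Hypothetical result codes; only SUCCESS is named in this README.
enum class Result { SUCCESS, NO_PERMISSION, UNKNOWN };

// Human-readable name for a result code.
inline const char *to_string(Result r) {
  switch (r) {
  case Result::SUCCESS:
    return "SUCCESS";
  case Result::NO_PERMISSION:
    return "NO_PERMISSION";
  default:
    return "UNKNOWN";
  }
}

// Print where the failure happened and exit, unless r is SUCCESS.
inline void check(Result r, const char *expr, const char *file, int line) {
  if (r != Result::SUCCESS) {
    std::fprintf(stderr, "%s:%d: %s failed: %s\n", file, line, expr,
                 to_string(r));
    std::exit(EXIT_FAILURE);
  }
}

} // namespace demo

// Wrap a call so that the failing expression text appears in the message.
#define DEMO_CHECK(expr) demo::check((expr), #expr, __FILE__, __LINE__)
```

Routing the macro through an inline function keeps the expression evaluated exactly once, which matters when the wrapped call changes system state.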

High Priority

perfect can set high scheduling priority for a process

See examples/high_priority.cpp

#include "perfect/priority.hpp"

Result set_high_priority(): set the highest possible scheduling priority for the calling process
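
This README does not say which mechanism set_high_priority uses. As an illustration only, one common way to raise a process's priority on Linux is the nice value, sketched below with the POSIX setpriority/getpriority calls. The function names try_set_highest_nice and current_nice are hypothetical.

```cpp
#include <cerrno>
#include <sys/resource.h>

// Try to give the calling process the most favorable nice value (-20).
// Lowering the nice value usually requires root or CAP_SYS_NICE, so this
// returns false instead of terminating when permission is missing.
bool try_set_highest_nice() {
  return setpriority(PRIO_PROCESS, 0, -20) == 0;
}

// Read the current nice value of the calling process.
// getpriority can legitimately return -1, so errno is cleared first and
// checked afterwards to tell a real error from a valid value.
bool current_nice(int &out) {
  errno = 0;
  const int v = getpriority(PRIO_PROCESS, 0);
  if (v == -1 && errno != 0) {
    return false;
  }
  out = v;
  return true;
}
```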

Monitoring

perfect can monitor and record GPU activity.

See examples/gpu_monitor.cu

#include "perfect/gpu_monitor.hpp"

Monitor(std::ostream *stream): create a monitor that will write to stream.
void Monitor::start(): start the monitor
void Monitor::stop(): terminate the monitor
void Monitor::pause(): pause the monitor thread
void Monitor::resume(): resume the monitor thread

Disable ASLR

perfect can disable ASLR

See tools/no_aslr.cpp

#include "perfect/aslr.hpp"

Result disable_aslr(): disable ASLR
Result get_aslr(AslrState &state): save the current ASLR state
Result set_aslr(const AslrState &state): set a previously-saved ASLR state
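
For background, Linux exposes per-process ASLR control through the personality(2) system call and its ADDR_NO_RANDOMIZE flag. The sketch below shows that mechanism under the assumption that a wrapper like tools/no-aslr works along these lines; the helper names are hypothetical and this is not the code in perfect/aslr.hpp.

```cpp
#include <sys/personality.h>

// Pure helper: the persona value with address randomization turned off.
unsigned long without_aslr(unsigned long persona) {
  return persona | ADDR_NO_RANDOMIZE;
}

// Pure helper: does this persona value have randomization turned off?
bool aslr_disabled_in(unsigned long persona) {
  return (persona & ADDR_NO_RANDOMIZE) != 0;
}

// Query the current persona. personality(0xffffffff) returns the current
// value without changing it.
bool query_persona(unsigned long &out) {
  const int p = personality(0xffffffff);
  if (p == -1) {
    return false;
  }
  out = static_cast<unsigned long>(p);
  return true;
}

// Set ADDR_NO_RANDOMIZE on the calling process. The memory layout of the
// running process does not change; the flag takes effect for programs
// that this process starts with exec afterwards.
bool disable_aslr_for_exec() {
  unsigned long p = 0;
  if (!query_persona(p)) {
    return false;
  }
  return personality(without_aslr(p)) != -1;
}
```

Because the flag only affects later exec calls, a wrapper would call disable_aslr_for_exec() and then exec the program under test.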

Flush file system caches

perfect can drop various filesystem caches

See tools/sync_drop_caches.cpp

#include "perfect/drop_caches.hpp"

Result sync(): flush filesystem caches to disk
Result drop_caches(DropCaches_t mode = DropCaches_t(PAGECACHE | ENTRIES)): remove file system caches
- mode = PAGECACHE: drop page caches
- mode = ENTRIES: drop dentries and inodes
- mode = PAGECACHE | ENTRIES: both
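
The two modes line up with the values the Linux kernel accepts in /proc/sys/vm/drop_caches (1 for the page cache, 2 for dentries and inodes, 3 for both). The following sketch shows that underlying mechanism. It assumes, without confirmation from the source, that the library does something similar; DropMode, drop_value, and drop_caches_sketch are hypothetical names.

```cpp
#include <fstream>
#include <unistd.h>

// Hypothetical flags mirroring the modes listed above.
enum DropMode { PAGECACHE = 1, ENTRIES = 2 };

// Value to write to /proc/sys/vm/drop_caches: 1, 2, or 3 (0 if no flag set).
int drop_value(int mode) { return mode & (PAGECACHE | ENTRIES); }

// Flush dirty data, then ask the kernel to drop the selected caches.
// Writing to drop_caches requires root; returns false on any failure.
bool drop_caches_sketch(int mode) {
  const int v = drop_value(mode);
  if (v == 0) {
    return false; // nothing selected
  }
  ::sync(); // only clean cache entries can be dropped, so write back first
  std::ofstream f("/proc/sys/vm/drop_caches");
  if (!f) {
    return false;
  }
  f << v << std::endl;
  return static_cast<bool>(f);
}
```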

CPU Turbo

perfect can enable and disable CPU boost through the Intel p-state mechanism or the ACPI cpufreq mechanism.

See examples/cpu_turbo.cpp.

#include "perfect/cpu_turbo.hpp"

Result get_cpu_turbo_state(CpuTurboState *state): save the current CPU turbo state
Result set_cpu_turbo_state(CpuTurboState *state): restore a saved CPU turbo state
Result disable_cpu_turbo(): disable CPU turbo
Result enable_cpu_turbo(): enable CPU turbo
bool is_turbo_enabled(CpuTurboState state): check if turbo is enabled
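
On Linux, the two mechanisms named above are visible in sysfs: the intel_pstate driver exposes a no_turbo file (1 means turbo is off), and acpi-cpufreq exposes a boost file (1 means boost is on). The sketch below reads either one. That perfect uses exactly these files is an assumption, and Turbo, read_flag, and turbo_from are hypothetical names.

```cpp
#include <fstream>
#include <string>

// Standard sysfs locations for the two mechanisms.
const char *const INTEL_NO_TURBO =
    "/sys/devices/system/cpu/intel_pstate/no_turbo";
const char *const ACPI_BOOST = "/sys/devices/system/cpu/cpufreq/boost";

enum Turbo { TURBO_OFF, TURBO_ON, TURBO_UNKNOWN };

// Read a single integer from a file; false if it is missing or unreadable.
bool read_flag(const std::string &path, int &out) {
  std::ifstream f(path.c_str());
  return static_cast<bool>(f >> out);
}

// Decide the turbo state from whichever file exists.
// Note the inverted meaning: no_turbo == 1 means turbo is disabled,
// while boost == 1 means boost is enabled.
Turbo turbo_from(const std::string &no_turbo_path,
                 const std::string &boost_path) {
  int v = 0;
  if (read_flag(no_turbo_path, v)) {
    return v == 0 ? TURBO_ON : TURBO_OFF;
  }
  if (read_flag(boost_path, v)) {
    return v != 0 ? TURBO_ON : TURBO_OFF;
  }
  return TURBO_UNKNOWN;
}
```

The paths are parameters so the logic can be exercised against ordinary files; on a real system, call turbo_from(INTEL_NO_TURBO, ACPI_BOOST).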

OS Performance

perfect can control the OS governor on Linux.

See examples/os_perf.cpp.

#include "perfect/os_perf.hpp"

Result get_os_perf_state(OsPerfState &state): Save the current OS governor mode for all CPUs.
Result os_perf_state_maximum(const int cpu): Set the OS governor for the given cpu to its maximum performance mode.
Result set_os_perf_state(OsPerfState state): Restore a previously-saved OS governor mode.
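
For reference, the Linux cpufreq governor is exposed per CPU in sysfs as a scaling_governor file, and writing performance to it selects the maximum-performance governor. The sketch below shows that save/set/restore mechanism. It is an assumption that the library works this way, and governor_path, read_governor, and write_governor are hypothetical names.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// sysfs file holding the governor name for one CPU.
std::string governor_path(int cpu) {
  std::ostringstream os;
  os << "/sys/devices/system/cpu/cpu" << cpu << "/cpufreq/scaling_governor";
  return os.str();
}

// Read the current governor name (for example, to restore it later).
bool read_governor(int cpu, std::string &out) {
  std::ifstream f(governor_path(cpu).c_str());
  return static_cast<bool>(f >> out);
}

// Write a governor name such as "performance"; usually requires root.
bool write_governor(int cpu, const std::string &name) {
  std::ofstream f(governor_path(cpu).c_str());
  if (!f) {
    return false;
  }
  f << name << std::endl;
  return static_cast<bool>(f);
}
```

A save-and-restore sequence would call read_governor for each CPU, write "performance", run the benchmark, and then write the saved names back.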

GPU Turbo

perfect can enable/disable GPU turbo boost.

See examples/gpu_turbo.cu.

#include "perfect/gpu_turbo.hpp"

Result get_gpu_turbo_state(GpuTurboState *state, unsigned int idx): Get the current turbo state for GPU idx, useful to restore later.
bool is_turbo_enabled(GpuTurboState state): Check if turbo is enabled.
Result set_gpu_turbo_state(GpuTurboState state, unsigned int idx): Set a previously saved turbo state.
Result disable_gpu_turbo(unsigned int idx): Disable GPU idx turbo.
Result enable_gpu_turbo(unsigned int idx): Enable GPU idx turbo.

GPU Clocks

perfect can lock GPU clocks to their maximum values.

See examples/gpu_clocks.cu.

#include "perfect/gpu_clocks.hpp"

Result set_max_gpu_clocks(unsigned int idx): Set GPU idx clocks to their maximum reported values.
Result reset_gpu_clocks(unsigned int idx): Unset GPU idx clocks.

CPU Cache

perfect can flush data from CPU caches. Unlike the other APIs, these do not return a Result because they do not fail.

See examples/cpu_cache.cpp.

#include "perfect/cpu_cache.hpp"

void flush_all(void *p, const size_t n): Flush all cache lines starting at p for n bytes.
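
As an illustration of what flushing a byte range involves, the sketch below walks the range one cache line at a time using the x86 clflush instruction. It is not the implementation in perfect/cpu_cache.hpp: the 64-byte line size is an assumption, flush_range is a hypothetical name, and on non-x86-64 targets this sketch does nothing and returns 0.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__x86_64__)
#include <emmintrin.h> // _mm_clflush, _mm_mfence
#endif

// Assumed cache line size; real code should query the hardware.
const std::size_t LINE_BYTES = 64;

// Flush every cache line overlapping [p, p + n) and return how many lines
// were flushed. The data itself is not modified.
std::size_t flush_range(void *p, std::size_t n) {
  std::size_t flushed = 0;
#if defined(__x86_64__)
  if (n == 0) {
    return 0;
  }
  const std::uintptr_t mask = ~static_cast<std::uintptr_t>(LINE_BYTES - 1);
  const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(p);
  const std::uintptr_t first = start & mask;          // line holding byte 0
  const std::uintptr_t last = (start + n - 1) & mask; // line holding last byte
  for (std::uintptr_t a = first; a <= last; a += LINE_BYTES) {
    _mm_clflush(reinterpret_cast<const void *>(a));
    ++flushed;
  }
  _mm_mfence(); // order the flushes before later loads and stores
#else
  (void)p; // other architectures are not covered by this sketch
  (void)n;
#endif
  return flushed;
}
```

Aligning the start address down to a line boundary ensures a range that begins mid-line still has its first and last lines flushed.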

Changelog

v0.5.0
- add tools/stress
- add tools/max-os-perf
- add tools/min-os-perf
- add tools/enable-cpu-turbo
- add tools/disable-cpu-turbo
v0.4.0
- Add ASLR interface
- Disambiguate some filesystem errors
- Fix some powerpc namespace issues
v0.3.0
- Add filesystem cache interface
v0.2.0
- add GPU monitoring
- Make CUDA optional
v0.1.0
- cache control
- Intel P-State control
- linux governor control
- POWER cpufreq control
- Nvidia GPU boost control
- Nvidia GPU clock control

Wish List

only monitor certain GPUs
hyperthreading interface

Related

LLVM benchmarking instructions covering ASLR, Linux governor, cpuset shielding, SMT, and Intel turbo.
easyperf.net blog post discussing ACPI/Intel turbo, SMT, Linux governor, CPU affinity, process priority, file system caches, and ASLR.
parttimenerd/temci benchmarking tool for CPU shielding and disabling hyperthreading, among other things.
aclements/perflock tool for locking CPU frequency scaling domains
lpechacek/cpuset python package/tool for managing CPU shielding

Acks

Uses muellan/clipp for cli option parsing.
Uses martinmoene/optional-lite.
