Skip to content

CPU/GPU performance control library for benchmarking

Notifications You must be signed in to change notification settings

cwpearson/perfect

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

55 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

perfect

Branch Status
master Build Status

CPU/GPU Performance control library for benchmarking on Linux, x86, POWER, and Nvidia.

Features

  • GPU power/utilization/temperature monitoring (nvidia)
  • Disable CPU turbo (linux)
  • Set OS CPU performance mode to maximum (linux)
  • Set GPU clocks (nvidia)
  • Disable GPU turbo (nvidia)
  • Flush addresses from cache (amd64, POWER)
  • CUDA not required (GPU functions will not be compiled)
  • Flush file system caches (linux)
  • Disable ASLR (linux)
  • process priority interface (linux)

Contributors

Installing

CMake

Ensure you have CMake 3.13+.

Add the source tree to your project and then use add_subdirectory

git submodule add git@github.com:cwpearson/perfect.git thirdparty/perfect

CMakeLists.txt

...
add_subdirectory(thirdparty/perfect)
...
target_link_libraries(your-target perfect)

Without CMake

Download the source AND

  • for compiling with a non-cuda compiler:
    • add the include directory to your includes
    • add nvidia-ml to your link flags
    • add -DPERFECT_HAS_CUDA to your compile definitions
  • with a CUDA compiler, just compile normally (PERFECT_HAS_CUDA is defined for you)
g++ code_using_perfect.cpp -DPERFECT_HAS_CUDA -Iperfect/include -lnvidia-ml 
nvcc code_using_perfect.cu -Iperfect/include -lnvidia-ml

If you don't have CUDA, then you could just do

g++ code_using_perfect.cpp -I perfect/include

Tools Usage

tools/perfect-cli

perfect provides some useful tools on Linux:

$ tools/perfect-cli -h
SYNOPSIS
        ./tools/perfect-cli --no-mod [-n <INT>] -- <cmd>...
        ./tools/perfect-cli ([-u <INT>] | [-s <INT>]) [--no-drop-cache] [--no-max-perf] [--aslr]
                            [--cpu-turbo] [--stdout <PATH>] [--stderr <PATH>] [-n <INT>] -- <cmd>...

OPTIONS
        --no-mod               don't control performance
        -u                     number of unshielded CPUs
        -s                     number of shielded CPUs
        --no-drop-cache        do not drop filesystem caches
        --no-max-perf          do not max os perf
        --aslr                 enable ASLR
        --cpu-turbo            enable CPU turbo
        --stdout               redirect child stdout
        --stderr               redirect child stderr
        -n                     run multiple times

The basic usage is tools/perfect-cli -- my-exe, which will attempt to configure the system for repeatable performance before executing my-exe, and then restore the system to the original performance state before exiting. Most modifications require elevated privileges. The default behavior is to:

  • disable ASLR
  • set CPU performance to maximum
  • disable CPU turbo
  • drop filesystem caches before each iteration

Some options (all should provided before the -- option):

  • --no-mod flag will cause perfect-cli to not modify the system performance state
  • -n INT will run the requested program INT times.
  • --stderr/--stdout will redirect the program-under-test's stderr and stdout to the provided paths.
  • -s/-u: set the number of shielded /unshielded CPUs. The program-under-test will run on the shielded CPUs. All other tasks will run on the unshielded CPUs.

A common invocation might look like:

sudo tools/perfect-cli -n 5 --stderr=run.err --stdout=run.out -- ./my-benchmark

This will disable ASLR, set CPU performance to maximum, disable CPU turbo, and then run ./my-benchmark 5 times after dropping the filesystem cache before each run, redirecting stdout/stderr of ./my-benchmark to run.out/run.err. The owner of run.out and run.err will be set to whichever user called sudo.

tools/addr

Print the address of main, a stack variable, and a heap variable. Useful for demoing ASLR.

tools/no-aslr

Disable ASLR on the provided execution.

With ASLR, addresses are different with each invocation

$ tools/addr
main:  94685074364704
stack: 140734279743492
heap:  94685084978800
$ tools/addr
main:  93891046344992
stack: 140722671706708
heap:  93891068624496

Without ASLR, addresses are the same in each invocation

$ tools/no-aslr tools/addrs       
main:  93824992233760
stack: 140737488347460
heap:  93824994414192
$ tools/no-aslr tools/addrs       
main:  93824992233760
stack: 140737488347460
heap:  93824994414192

API Usage

The perfect functions all return a perfect::Result, which is defined in [include/perfect/result.hpp]. When things are working, it will be perfect::Result::SUCCESS. A PERFECT macro is also defined, which will terminate with an error message unless the perfect::Result is perfect::Result::SUCCESS.

perfect::CpuTurboState state;
PERFECT(perfect::get_cpu_turbo_state(&state));

High Priority

perfect can set high scheduling priority for a process

See examples/high_priority.cpp

#include "perfect/priority.hpp"
  • Result set_high_priority(): set the highest possible scheduling priority for the calling process

Monitoring

perfect can monitor and record GPU activity.

See examples/gpu_monitor.cu

#include "perfect/gpu_monitor.hpp"
  • Monitor(std::ostream *stream): create a monitor that will write to stream.
  • void Monitor::start(): start the monitor
  • void Monitor::stop(): terminate the monitor
  • void Monitor::pause(): pause the monitor thread
  • void Monitor::resume(): resume the monitor thread

Disable ASLR

perfect can disable ASLR

See tools/no_aslr.cpp

#include "perfect/aslr.hpp"
  • Result disable_aslr(): disable ASLR
  • Result get_aslr(AslrState &state): save the current ASLR state
  • Result set_aslr(const AslrState &state): set a previously-saved ASLR state

Flush file system caches

perfect can drop various filesystem caches

See tools/sync_drop_caches.cpp

#include "perfect/drop_caches.hpp"
  • Result sync(): flush filesystem caches to disk
  • Result drop_caches(DropCaches_t mode = DropCaches_t(PAGECACHE | ENTRIES)): remove file system caches
    • mode = PAGECACHE: drop page caches
    • mode = ENTRIES: drop dentries and inodes
    • mode = PAGECACHE | ENTRIES: both

CPU Turbo

perfect can enable and disable CPU boost through the Intel p-state mechanism or the ACPI cpufreq mechanism.

See examples/cpu_turbo.cpp.

#include "perfect/cpu_turbo.hpp"
  • Result get_cpu_turbo_state(CpuTurboState *state): save the current CPU turbo state
  • Result set_cpu_turbo_state(CpuTurboState *state): restore a saved CPU turbo state
  • Result disable_cpu_turbo(): disable CPU turbo
  • Result enable_cpu_turbo(): enable CPU turbo
  • bool is_turbo_enabled(CpuTurboState state): check if turbo is enabled

OS Performance

perfect can control the OS governor on linux.

See examples/os_perf.cpp.

#include "perfect/os_perf.hpp"
  • Result get_os_perf_state(OsPerfState &state): Save the current OS governor mode for all CPUs.
  • Result os_perf_state_maximum(const int cpu): Set the OS governor to it's maximum performance mode.
  • Result set_os_perf_state(OsPerfState state): Restore a previously-saved OS governor mode.

GPU Turbo

perfect can enable/disable GPU turbo boost.

See examples/gpu_turbo.cu.

#include "perfect/gpu_turbo.hpp"
  • Result get_gpu_turbo_state(GpuTurboState *state, unsigned int idx): Get the current turbo state for GPU idx, useful to restore later.
  • bool is_turbo_enabled(GpuTurboState state): Check if turbo is enabled.
  • Result set_gpu_turbo_state(GpuTurboState state, unsigned int idx): Set a previously saved turbo state.
  • Result disable_gpu_turbo(unsigned int idx): Disable GPU idx turbo.
  • Result enable_gpu_turbo(unsigned int idx): Enable GPU idx turbo.

GPU Clocks

perfect can lock GPU clocks to their maximum values.

See examples/gpu_clocks.cu.

#include "perfect/gpu_clocks.hpp"
  • Result set_max_gpu_clocks(unsigned int idx): Set GPU idx clocks to their maximum reported values.
  • Result reset_gpu_clocks(unsigned int idx): Unset GPU idx clocks.

CPU Cache

perfect can flush data from CPU caches. Unlike the other APIs, these do not return a Result because they do not fail.

See examples/cpu_cache.cpp.

#include "perfect/cpu_cache.hpp"
  • void flush_all(void *p, const size_t n): Flush all cache lines starting at p for n bytes.

Changelog

  • v0.5.0
    • add tools/stress
    • add tools/max-os-perf
    • add tools/min-os-perf
    • add tools/enable-cpu-turbo
    • add tools/disable-cpu-turbo
  • v0.4.0
    • Add ASLR interface
    • Disambiguate some filesystem errors
    • Fix some powerpc namespace issues
  • v0.3.0
    • Add filesystem cache interface
  • v0.2.0
    • add GPU monitoring
    • Make CUDA optional
  • v0.1.0
    • cache control
    • Intel P-State control
    • linux governor control
    • POWER cpufreq control
    • Nvidia GPU boost control
    • Nvidia GPU clock control

Wish List

  • only monitor certain GPUs
  • hyperthreading interface

Related

Acks