A lightweight, low-overhead sampling profiler for Linux applications. This tool uses the `perf_event_open` system call to periodically sample the instruction pointer (IP) of a child process. It aggregates those samples into a human-readable report showing the percentage of samples attributed to each function.
- Non-Intrusive Profiling: Uses hardware performance counters (via `perf_event_open`) to sample the CPU without modifying the target binary.
- Symbol Resolution: Automatically maps instruction addresses to function names using `dladdr`.
- Aggregation by Function: Groups instruction pointers that belong to the same function to show total function-level impact.
- Percentage-Based Reporting: Shows each function's share of the total samples, making the hottest functions easy to spot.
- Minimal Overhead: Reads samples from a ring buffer shared with the kernel via `mmap`, which minimizes context switches between the kernel and the profiler.
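The `mmap` ring buffer from the last point above is consumed without a system call per sample. The kernel advances a head position, and the reader consumes every record up to that position and then publishes its new tail. The sketch below shows only that consumer loop. It works on an ordinary byte array, so it can be exercised without a live event.

It assumes the event was opened with `sample_type = PERF_SAMPLE_IP` only, so each `PERF_RECORD_SAMPLE` body is a single 64-bit IP. The names `drain_ring` and `ring_copy` are hypothetical and not taken from the repository.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `len` bytes starting at logical position `pos` out of a ring of
 * `size` bytes (a power of two), handling wrap-around at the end. */
static void ring_copy(void *dst, const unsigned char *data, uint64_t size,
                      uint64_t pos, size_t len)
{
    uint64_t off = pos & (size - 1);
    size_t first = (off + len > size) ? (size_t)(size - off) : len;
    memcpy(dst, data + off, first);
    memcpy((unsigned char *)dst + first, data, len - first);
}

/* Consume every record between *tail and head. Sample IPs are stored in
 * `out` (at most max_out of them; extra samples are dropped), and *tail is
 * advanced past everything consumed. Returns the number of IPs stored. */
size_t drain_ring(const unsigned char *data, uint64_t size, uint64_t head,
                  uint64_t *tail, uint64_t *out, size_t max_out)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    size_t n = 0;
    while (*tail < head) {
        struct perf_event_header hdr;
        ring_copy(&hdr, data, size, *tail, sizeof hdr);
        if (hdr.size < sizeof hdr)
            break;              /* malformed record: stop instead of looping forever */
        if (hdr.type == PERF_RECORD_SAMPLE &&
            hdr.size >= sizeof hdr + sizeof(uint64_t) && n < max_out) {
            /* With sample_type == PERF_SAMPLE_IP the body is one u64 IP. */
            ring_copy(&out[n], data, size, *tail + sizeof hdr, sizeof out[n]);
            n++;
        }
        *tail += hdr.size;      /* non-sample records (e.g. PERF_RECORD_LOST) are skipped */
    }
    return n;
}
```

Against a real mapping, the four inputs would come from the mapped region:

- `data` points one page past the `struct perf_event_mmap_page` header.
- `size` is the size of the data area.
- `head` is read from `data_head` with acquire semantics (for example `__atomic_load_n`).
- The updated tail is stored back into `data_tail` with release semantics, so the kernel can reuse the space.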
The profiler operates in two main phases:

- Sampling Phase:
  - The profiler forks a child process and executes the target binary.
  - It configures a `PERF_COUNT_HW_INSTRUCTIONS` event with a specific `sample_period`.
  - Each time the counter overflows (once every `sample_period` instructions), the Linux kernel writes a sample record containing the current instruction pointer (IP) into a ring buffer shared via `mmap`.
  - The profiler reads these samples asynchronously while the child runs.
- Reporting Phase:
  - Samples are stored in a custom linear-probing hash map for efficient counting.
  - The profiler uses `dladdr` (from the dynamic linker API) to translate raw memory addresses into function symbols.
  - It aggregates all instruction addresses belonging to the same symbol and calculates each symbol's percentage of the total samples. Because the event counts retired instructions, this percentage approximates the share of executed instructions in each function rather than wall-clock time.
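The counting step of the reporting phase can be illustrated with a small open-addressing table that uses linear probing, keyed by instruction address. This is a sketch under assumptions, not the repository's `ip_hashmap.c`. The following are all illustrative choices:

- the names `ip_table`, `ip_table_add` and `ip_table_get`;
- the hypothetical capacity `SKETCH_TABLE_SIZE`;
- the use of IP 0 as the empty-slot marker;
- the MurmurHash3-style mixing function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TABLE_SIZE 1024u  /* hypothetical capacity; must be a power of two */

struct ip_entry { uint64_t ip; uint64_t count; };  /* ip == 0 marks an empty slot */
struct ip_table { struct ip_entry slots[SKETCH_TABLE_SIZE]; size_t used; };

void ip_table_init(struct ip_table *t) { memset(t, 0, sizeof *t); }

/* Mix the address bits (MurmurHash3 fmix64 steps) before masking, so that
 * nearby instruction addresses spread across the table. */
static size_t ip_hash(uint64_t ip)
{
    uint64_t h = ip;
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    return (size_t)(h & (SKETCH_TABLE_SIZE - 1));
}

/* Increment the count for `ip`; returns false if the table is full. */
bool ip_table_add(struct ip_table *t, uint64_t ip)
{
    assert(ip != 0);
    size_t i = ip_hash(ip);
    for (size_t probes = 0; probes < SKETCH_TABLE_SIZE; probes++) {
        struct ip_entry *e = &t->slots[i];
        if (e->ip == ip) { e->count++; return true; }
        if (e->ip == 0) { e->ip = ip; e->count = 1; t->used++; return true; }
        i = (i + 1) & (SKETCH_TABLE_SIZE - 1);  /* linear probe, wrapping around */
    }
    return false;
}

/* Return the count recorded for `ip`, or 0 if it was never seen. */
uint64_t ip_table_get(const struct ip_table *t, uint64_t ip)
{
    size_t i = ip_hash(ip);
    for (size_t probes = 0; probes < SKETCH_TABLE_SIZE; probes++) {
        const struct ip_entry *e = &t->slots[i];
        if (e->ip == ip) return e->count;
        if (e->ip == 0) return 0;  /* an empty slot ends the probe chain */
        i = (i + 1) & (SKETCH_TABLE_SIZE - 1);
    }
    return 0;
}
```

Entries are never deleted, so a lookup can stop at the first empty slot. Aggregating by function would be a second pass over the occupied slots: each IP is resolved to a symbol, and counts are summed per symbol. A full table makes `ip_table_add` return false, which is the kind of fixed-capacity limit that a resizing table would remove.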
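- Linux Kernel: Requires `perf_events` support (standard on most modern distributions).
- Permissions: Linux restricts access to performance counters by default. You may need to run:

  `sudo sysctl -w kernel.perf_event_paranoid=1`

  Lower values are more permissive, and -1 is the most permissive setting. Because this profiler samples user space only, a value of 2 (the upstream kernel default) is already sufficient. Some distributions ship a stricter default, and in that case this step is needed.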
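The sampling setup from the phase description can be sketched as follows. The sketch makes these assumptions:

- It targets glibc, which provides no `perf_event_open` wrapper, so the raw `syscall` is used.
- It sets `sample_type = PERF_SAMPLE_IP`.
- It starts the counter disabled with `enable_on_exec`, which is one common way to skip the child's code before `execve`. Whether the repository does this is not stated.
- The helper names `init_sampling_attr` and `open_sampler` are hypothetical, and the period is a parameter because the project's actual value is not given.

The handshake in which the child waits until the parent has opened the event is not shown. When the call fails with `EACCES` or `EPERM`, the `perf_event_paranoid` setting above is the usual cause.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Configure an event that records the user-space IP every `period`
 * retired instructions. */
void init_sampling_attr(struct perf_event_attr *attr, uint64_t period)
{
    assert(period > 0);
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_INSTRUCTIONS;
    attr->sample_period = period;
    attr->sample_type = PERF_SAMPLE_IP;  /* each sample carries only the IP */
    attr->disabled = 1;                  /* start stopped...                  */
    attr->enable_on_exec = 1;            /* ...and start counting at execve() */
    attr->exclude_kernel = 1;            /* user space only */
    attr->exclude_hv = 1;                /* no hypervisor samples */
}

/* Open the event for process `pid` on any CPU. Returns the fd, or -1. */
int open_sampler(pid_t pid, uint64_t period)
{
    struct perf_event_attr attr;
    init_sampling_attr(&attr, period);
    int fd = (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0UL);
    if (fd < 0) {
        int err = errno;
        fprintf(stderr, "perf_event_open: %s\n", strerror(err));
        if (err == EACCES || err == EPERM)
            fprintf(stderr, "hint: check kernel.perf_event_paranoid\n");
    }
    return fd;
}
```

After a successful open, the ring buffer is created by `mmap`-ing `(1 + 2^n) * page_size` bytes of the returned file descriptor. The first page of that region holds the metadata.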
- Clone the repository:

  `git clone https://github.com/KarthikeyaAnna/SamplingProfiler.git`

  `cd SamplingProfiler`

- Build the project. The provided `Makefile` handles the flags required for symbol resolution:

  `make`
To profile a program, pass its path and arguments to the profiler:

`./profiler ./test_target`

For the profiler to resolve symbols correctly, compile your target programs with:

- `-g`: Include debug information.
- `-rdynamic`: Export symbols to the dynamic symbol table (critical for `dladdr`).
- `-no-pie`: Build a non-position-independent executable so it loads at a fixed address, keeping sampled addresses consistent.
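The lookup itself shows why `-rdynamic` matters: `dladdr` consults only dynamic symbol tables, so a function that is not exported comes back with `dli_sname == NULL`. The sketch below falls back to printing the raw address in that case, which matches the "Function Symbol / Address" column of the report.

The helper name `symbolize` is hypothetical. `Dl_info` and `dladdr` are glibc/GNU extensions that require `_GNU_SOURCE`. On glibc older than 2.34 you must also link with `-ldl`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write a printable name for `addr` into buf: the dynamic symbol that
 * dladdr reports for it, or the raw address in hex if there is none
 * (e.g. the function was not exported because -rdynamic was omitted). */
const char *symbolize(const void *addr, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    Dl_info info;
    if (dladdr(addr, &info) != 0 && info.dli_sname != NULL)
        snprintf(buf, len, "%s", info.dli_sname);
    else
        snprintf(buf, len, "0x%" PRIxPTR, (uintptr_t)addr);
    return buf;
}
```

`dladdr` looks addresses up in the calling process. The addresses being resolved must therefore belong to objects mapped in that process.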
Example output:

    Test program started (pid=452607)
    Done. sink=130049999695000000
    Function Symbol / Address                  Samples   Percentage
    --------------------------------------------------------------------------
    my_expensive_loop                            29602       78.54%
    compute_hash                                  6124       16.24%
    main                                          1955        5.19%
    _start                                          10        0.03%
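The Percentage column is each symbol's share of all recorded samples, `100 * samples / total`, with rows sorted by sample count. Below is a minimal sketch of that final step. It assumes per-symbol totals have already been aggregated. The `sym_count` struct and the `sample_percent` and `print_report` functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct sym_count { const char *name; uint64_t samples; };

/* qsort comparator: larger sample counts sort first. */
static int by_samples_desc(const void *a, const void *b)
{
    uint64_t x = ((const struct sym_count *)a)->samples;
    uint64_t y = ((const struct sym_count *)b)->samples;
    return (x < y) - (x > y);
}

/* Share of all samples, as a percentage. */
double sample_percent(uint64_t samples, uint64_t total)
{
    assert(total > 0);
    return 100.0 * (double)samples / (double)total;
}

/* Sort rows in place and print them as a report table. */
void print_report(struct sym_count *rows, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += rows[i].samples;
    if (total == 0)
        return;
    qsort(rows, n, sizeof rows[0], by_samples_desc);
    printf("%-40s %10s %11s\n", "Function Symbol / Address", "Samples", "Percentage");
    for (size_t i = 0; i < n; i++)
        printf("%-40s %10llu %10.2f%%\n", rows[i].name,
               (unsigned long long)rows[i].samples,
               sample_percent(rows[i].samples, total));
}
```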
| File | Description |
|---|---|
| `sampling_profiler.c` | Core logic: process forking, `perf_event_open` setup, and ring-buffer processing. |
| `ip_hashmap.c` | Implementation of the fixed-size hash map for storing IP counts and symbol aggregation. |
| `ip_hashmap.h` | Header definitions for the hash map and reporting functions. |
| `test_code.c` | A sample target program used to verify profiler accuracy. |
| `Makefile` | Build automation with the correct linker flags. |
- User-Space Only: This implementation currently excludes kernel and hypervisor samples (`exclude_kernel = 1`).
- Static Buffer: The hash map has a fixed capacity of `TABLE_SIZE` entries, which bounds how many distinct instruction addresses it can record. For extremely long runs, a dynamically resizing hash map or periodic flushing would be required.
- PIE Support: Profiling position-independent executables (PIE) may require additional offset calculation via `/proc/<pid>/maps`.
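For PIE targets, the offset calculation mentioned above starts with two steps:

1. Find the line of `/proc/<pid>/maps` whose address range contains a sampled address.
2. Convert the address to a file offset: `addr - start + offset`.

Mapping that file offset back to a link-time virtual address (through the ELF program headers) and then to a symbol are further steps not shown. Below is a minimal parsing sketch. The helper `maps_line_file_offset` and the example maps line are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse one /proc/<pid>/maps line, e.g.
 *   "55d0c0a00000-55d0c0a21000 r-xp 00001000 08:01 131 /usr/bin/app"
 * If `addr` falls inside the mapping, store the corresponding file offset
 * in *file_off and return true; otherwise return false. */
bool maps_line_file_offset(const char *line, uint64_t addr, uint64_t *file_off)
{
    assert(line != NULL && file_off != NULL);
    uint64_t start, end, offset;
    char perms[5];
    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64,
               &start, &end, perms, &offset) != 4)
        return false;
    if (addr < start || addr >= end)
        return false;
    *file_off = addr - start + offset;
    return true;
}
```

In practice, each line of `/proc/<pid>/maps` would be read with `fgets` and passed to this function in turn until one matches.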
This project is open-source and available under the MIT License.