Work in progress.
- skidtest.c: in a loop, this calls a memory read followed by a NOP "runway" of thousands of no-operations. The read can then be instrumented with PMCs to capture the instruction pointer, which often falls on the NOP runway (the NOPs themselves do not cause reads, other than the initial load into the instruction cache). The offset seen in the NOP runway shows the magnitude of the skid.
gcc -O0 -o skidtest skidtest.c
For example, sampling every 1000th LLC miss on Intel using Linux perf:
perf record -vv -e r412e -c 1000 ./skidtest 1000000
Check the verbose output (-vv) to see whether precise_ip (PEBS) was auto-enabled; to measure the baseline skid, you want it off.
Choose a size greater than the LLC to induce misses.
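To pick a size that exceeds the last-level cache, it helps to know how big that cache is. The following is a minimal sketch that reads the cache sizes Linux exposes under sysfs and returns the largest-level one for cpu0. The function names are hypothetical, and the units of the skidtest size argument are not stated here, so compare against its source before choosing a value.

```python
import glob
import os


def parse_cache_size(text):
    """Convert a sysfs cache size string such as '8192K' to bytes."""
    text = text.strip()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def llc_size_bytes(cpu_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return the size in bytes of the highest-level cache, or None if unknown."""
    best_level, best_size = -1, None
    for index in glob.glob(os.path.join(cpu_dir, "index*")):
        try:
            with open(os.path.join(index, "level")) as f:
                level = int(f.read())
            with open(os.path.join(index, "size")) as f:
                size = parse_cache_size(f.read())
        except (OSError, ValueError):
            # Skip entries that are missing or not in the expected format.
            continue
        if level > best_level:
            best_level, best_size = level, size
    return best_size


# Prints None on systems that do not expose this sysfs layout.
print(llc_size_bytes())
```

A working set several times larger than the reported value should make misses more likely, since other processes also occupy the LLC.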
There are various ways to post-process the perf capture. Each of these uses -F to customize the perf script output (-f on older versions); on newer kernels, the default perf script output is sufficient, as it already includes symoff.
Counting hits versus skids:
perf script --header -F comm,pid,tid,time,event,ip,sym,symoff,dso |\
awk '/noprunway/ { skid++ } /memreader/ { hit++ } END { printf "hits %d, skid %d\n", hit, skid }'
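The same count can be sketched in Python, which makes it easy to also report the skid fraction. This is an illustrative equivalent of the awk one-liner, not part of the tools here: the function name is hypothetical, and the sample lines are invented to resemble perf script output.

```python
# Symbol names as they appear in the awk patterns above.
SKID_SYM = "noprunway"
HIT_SYM = "memreader"


def count_hits_and_skid(lines):
    """Count lines mentioning each symbol, mirroring the awk one-liner."""
    hits = skid = 0
    for line in lines:
        if SKID_SYM in line:
            skid += 1
        if HIT_SYM in line:
            hits += 1
    return hits, skid


# Hypothetical perf script lines, invented for illustration only.
sample = [
    "skidtest 2001/2001 100.000100: r412e: 4005d6 memreader+0x10 (/tmp/skidtest)",
    "skidtest 2001/2001 100.000200: r412e: 400612 noprunway+0x2c (/tmp/skidtest)",
    "skidtest 2001/2001 100.000300: r412e: 4006aa noprunway+0xc4 (/tmp/skidtest)",
]

hits, skid = count_hits_and_skid(sample)
total = hits + skid
fraction = skid / total if total else 0.0
print(f"hits {hits}, skid {skid}, skid fraction {fraction:.3f}")
```

To use it on a real capture, pass the lines of the perf script output (for example, `sys.stdin`) instead of `sample`.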
Listing the skid offsets into the NOP runway, in decimal and sorted:
perf script --header -F comm,pid,tid,time,event,ip,sym,symoff,dso |\
awk '/noprunway/ { sub(/noprunway\+/, "", $6); print $6 }' | perl -ne 'print hex($_) . "\n"' | sort -n
This can also be saved to a file and used as input for skid.r plotting. The sample plot excludes hits, although in this case over 99% of the samples were skids (hits 131, skid 152565).
A histogram of the skid offsets, in buckets of 10 bytes:
perf script --header -F comm,pid,tid,time,event,ip,sym,symoff,dso |\
awk '/noprunway/ { sub(/noprunway\+/, "", $6); print $6 }' | perl -e 'while (<>) { $idx = int(hex($_)/10); $a[$idx]++; $m = $idx if $idx > $m; } for ($i = 0; $i <= $m; $i++) { $a[$i] += 0; print $i * 10 . " " . $a[$i] . "\n"; }'
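The bucketing step can also be sketched in Python. This is a hedged equivalent of the awk and perl pipeline, with hypothetical function names and invented sample lines; it assumes the symbol and offset appear as `noprunway+0x...` in the perf script output. Empty buckets are printed as zero, and the highest bucket is included in the output.

```python
import re
from collections import Counter

# Matches the symbol+offset field, e.g. "noprunway+0x2c".
OFFSET_RE = re.compile(r"noprunway\+0x([0-9a-fA-F]+)")


def skid_offsets(lines):
    """Return decimal byte offsets into noprunway, one per matching line."""
    offsets = []
    for line in lines:
        m = OFFSET_RE.search(line)
        if m:
            offsets.append(int(m.group(1), 16))
    return offsets


def skid_histogram(offsets, width=10):
    """Bucket offsets by `width` bytes, including empty buckets up to the max."""
    counts = Counter(off // width for off in offsets)
    if not counts:
        return []
    top = max(counts)
    return [(b * width, counts.get(b, 0)) for b in range(top + 1)]


# Hypothetical perf script lines, invented for illustration only.
sample = [
    "skidtest 2001/2001 100.000100: r412e: 4005eb noprunway+0x5 (/tmp/skidtest)",
    "skidtest 2001/2001 100.000200: r412e: 4005ed noprunway+0x7 (/tmp/skidtest)",
    "skidtest 2001/2001 100.000300: r412e: 400605 noprunway+0x1f (/tmp/skidtest)",
    "skidtest 2001/2001 100.000400: r412e: 4005d6 memreader+0x10 (/tmp/skidtest)",
]

for start, count in skid_histogram(skid_offsets(sample)):
    print(start, count)
```

The `width` parameter corresponds to the divisor of 10 in the perl code; a larger value gives a coarser histogram for long runways.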