HW interface for memory caches

CacheQuery

A tool for interacting with hardware memory caches in modern Intel CPUs.

  • Linux kernel module: generates non-interfering x86 code for arbitrary memory access sequences, which are profiled automatically.
  • Low noise environment: disable hardware prefetchers, hyperthreading, frequency scaling, etc.
  • Support for TSC, core cycles (default), and performance counters (L3, L2, and L1 misses); see config/settings.h or the /sys/kernel/cachequery/config/[use_pmc|core_cycles]/val booleans.
  • Sysfs at /sys/kernel/cachequery/<level>/<set>/run accepts queries of logical blocks produced by the frontend and returns the sequence of hits and misses for the target cache set and level. Note that <set> is ((index << slice_bits) | slice).
  • tool/cachequery.py provides a high-level interface with a REPL environment.
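
The <set> encoding above can be sketched in a few lines of Python. This is only an illustration: the helper names `set_id` and `split_set_id` are hypothetical, and the `slice_bits` value used below is an example, since the real value depends on the CPU selected in config/.

```python
def set_id(index, slice_, slice_bits):
    """Pack a cache index and a slice number as ((index << slice_bits) | slice)."""
    if not 0 <= slice_ < (1 << slice_bits):
        raise ValueError("slice does not fit in slice_bits")
    return (index << slice_bits) | slice_


def split_set_id(value, slice_bits):
    """Inverse of set_id: recover (index, slice) from a packed set number."""
    return value >> slice_bits, value & ((1 << slice_bits) - 1)


# Illustrative values only: index 8, slice 1, and 2 slice bits.
packed = set_id(8, 1, 2)
print(packed)                   # 33
print(split_set_id(packed, 2))  # (8, 1)
```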

Run

The following command runs a single MemBlockLang (MBL) query against L3's set 33:

$ cd tool/
$ ./cachequery.py -l l3 -s 33 @ M _?
(L3:33) r @ M _?
0 1 2 3 4 5 6 7 8 9 10 11 12 0? -> 0
0 1 2 3 4 5 6 7 8 9 10 11 12 1? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 2? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 3? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 4? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 5? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 6? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 7? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 8? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 9? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 10? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 11? -> 100

Example of a 12-way L3 cache set, where the LRU block (block 0 here) is evicted by the access to M. Each output value is the number of measured hits; the number of repetitions can be changed in config/settings.h or /sys/kernel/cachequery/config/num_repetitions/val.
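
To post-process such output, the result lines can be parsed into a mapping from profiled block to hit count. The following is a minimal sketch that assumes the line format shown above (`<blocks> -> <hits>`); `parse_results` is a hypothetical helper, not part of tool/cachequery.py.

```python
def parse_results(output):
    """Map each profiled block (the token ending in '?') to its hit count."""
    results = {}
    for line in output.splitlines():
        if "->" not in line:
            continue  # skip the query echo and blank lines
        query, hits = line.rsplit("->", 1)
        profiled = [tok.rstrip("?") for tok in query.split() if tok.endswith("?")]
        results[" ".join(profiled)] = int(hits)
    return results


sample = """\
(L3:33) r @ M _?
0 1 2 3 4 5 6 7 8 9 10 11 12 0? -> 0
0 1 2 3 4 5 6 7 8 9 10 11 12 1? -> 100
0 1 2 3 4 5 6 7 8 9 10 11 12 2? -> 100
"""

results = parse_results(sample)
# Blocks with zero hits were no longer cached when they were profiled.
evicted = [block for block, hits in results.items() if hits == 0]
print(evicted)  # ['0']
```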

Install

Tested on Linux kernel branches >= 4.9.x.

Modify config/settings.h as required and select the specific architecture. Some settings can be dynamically modified later on via /sys/kernel/cachequery/config/.

If no timing thresholds are given, they are computed automatically, but this calibration takes time and is repeated on each execution.

(WARNING: The code is unstable and can crash your system. Use it at your own risk.)

$ make cpu=iX-yyyy
$ make install

Currently supports the i7-4790, i5-6500 (default), and i7-8550u. To target another CPU, add a header file in config/ and build with the corresponding make cpu=iX-yyyy.

Dependencies

Lark parser: pip3 install lark-parser

LevelDB + Plyvel: https://plyvel.readthedocs.io/en/latest/installation.html

Uninstall

$ make uninstall

Help:

$ ./cachequery.py -h

    [!] ./cachequery [options] <query>

    Options:
        -h --help
        -i --interactive
        -v --verbose

        -c --config=filename    path to filename with config (default: 'cachequery.ini')
        -b --batch              path to filename with list of commands
        -o --output             path to output file for session log

        -l --level              target cache level: L3|L2|L1
        -s --set                target cache set number

By default, it loads the tool/cachequery.ini configuration file.

MemBlockLang

A simple language that facilitates writing cache queries by hand.

A query is a sequence of one or more memory operations. Each memory operation is specified as a block (represented by an arbitrary identifier) with an optional tag: ? for profiling, ! for flushing, or no tag for a plain access.

MBL features several macros:

  • An expansion macro @, which produces a sequence of associativity-many different blocks in increasing order. For example, for associativity 8, @ expands to a b c d e f g h.
  • A wildcard macro _, which produces associativity-many different queries, each consisting of a different block. For example, for associativity 8, _ expands to the set of single-block queries a, b, c, d, e, f, g, h.
  • Concatenation of queries is implicit.
  • An extension macro s1 [s2], which takes queries s1 and s2 as input and creates |s2| copies of s1, extending each of them with a different element of s2. For example, (a b c d)[e f] expands to a b c d e, a b c d f.
  • A power operator (s1)N, which repeats a query N times. For example, (a b c)3 expands to a b c a b c a b c.
  • A tag over (s1) or [s1] applies to every block. For example, (a b)? expands to a? b?.
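
The macro semantics can be made concrete with a small Python model. This is a sketch of the expansions described above, not the Lark-based parser used by the tool: all function names are hypothetical, a query is modelled as a list of block names, and a set of queries as a list of such lists. Treating concatenation of query sets as "every query of the left set followed by every query of the right set" is an assumption inferred from the `@ M _?` example in the Run section.

```python
from string import ascii_lowercase


def blocks(assoc):
    """First `assoc` block names: a, b, c, ... (this sketch stops at 26)."""
    if not 0 < assoc <= len(ascii_lowercase):
        raise ValueError("associativity out of range for this sketch")
    return list(ascii_lowercase[:assoc])


def expand_at(assoc):
    """@ : a single query with `assoc` different blocks in increasing order."""
    return [blocks(assoc)]


def expand_wildcard(assoc):
    """_ : `assoc` different single-block queries."""
    return [[b] for b in blocks(assoc)]


def concat(qs1, qs2):
    """Implicit concatenation, applied pairwise over two sets of queries."""
    return [q1 + q2 for q1 in qs1 for q2 in qs2]


def extend(s1, s2):
    """s1 [s2] : |s2| copies of s1, each extended with one element of s2."""
    return [s1 + [b] for b in s2]


def power(s1, n):
    """(s1)N : repeat the query N times."""
    return s1 * n


def tag(s1, t):
    """(s1)? or (s1)! : apply the tag to every block."""
    return [b + t for b in s1]


# The query "@ M _?" for associativity 12, using letters as block names.
profiled = [tag(q, "?") for q in expand_wildcard(12)]
queries = concat(concat(expand_at(12), [["M"]]), profiled)
print(len(queries))          # 12
print(" ".join(queries[0]))  # a b c d e f g h i j k l M a?
```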

Extensions:

  • A single ! without a preceding block executes wbinvd.

Reduce system's noise

Install msr-tools and acpi-cpufreq, and load the modules with modprobe.

Set the options to True in tool/cachequery.ini to load the modules and apply the noise-reduction settings by default.

Disable multi-core and hyperthreading

Disable: echo 0 | sudo tee /sys/devices/system/cpu/cpu*/online

Enable: echo 1 | sudo tee /sys/devices/system/cpu/cpu*/online

HW prefetching

Disable: wrmsr -a 0x1a4 15

Enable: wrmsr -a 0x1a4 0
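
The value 15 sets the four low bits of MSR 0x1a4, one per prefetcher. The sketch below shows how that mask is composed. It assumes the bit layout Intel documents for this register on many Core processors; the constant names are descriptive labels chosen for this example, and the layout should be checked against the documentation for your CPU.

```python
# Assumed bit layout of MSR 0x1a4 (a set bit disables the prefetcher).
L2_HW_PREFETCHER = 1 << 0
L2_ADJACENT_LINE_PREFETCHER = 1 << 1
DCU_PREFETCHER = 1 << 2
DCU_IP_PREFETCHER = 1 << 3

disable_all = (
    L2_HW_PREFETCHER
    | L2_ADJACENT_LINE_PREFETCHER
    | DCU_PREFETCHER
    | DCU_IP_PREFETCHER
)
enable_all = 0

print(disable_all)  # 15
print(f"wrmsr -a 0x1a4 {disable_all}")
```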

Turbo Boost

Disable: wrmsr -a 0x1a0 0x4000850089

Enable: wrmsr -a 0x1a0 0x850089
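
The two values above differ in a single bit: bit 38 of MSR 0x1a0, which disables Turbo Boost when set. Because the remaining bits of this register vary between machines, writing a fixed constant may change unrelated settings. A more cautious approach is to read the current value (for example with rdmsr) and change only bit 38. The following sketch shows the arithmetic; `set_turbo` is a hypothetical helper and performs no MSR access itself.

```python
TURBO_DISABLE_BIT = 38  # bit 38 of MSR 0x1a0


def set_turbo(current_value, enabled):
    """Return the register value with only the Turbo disable bit changed."""
    if enabled:
        return current_value & ~(1 << TURBO_DISABLE_BIT)
    return current_value | (1 << TURBO_DISABLE_BIT)


# The constants used in the commands above differ exactly in bit 38.
print(hex(set_turbo(0x850089, enabled=False)))    # 0x4000850089
print(hex(set_turbo(0x4000850089, enabled=True))) # 0x850089
```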

Disable frequency scaling

Recommended when using RDTSC.

Disable: sudo cpupower frequency-set -d 2000MHz; sudo cpupower frequency-set -u 2000MHz

Enable: sudo cpupower frequency-set -d 1MHz; sudo cpupower frequency-set -u 5000MHz (substitute your hardware's default limits)

CAT of L3 cache sets

Reduce to assoc 4: wrmsr -a 0xc90 0x000f

Restore to assoc 16: wrmsr -a 0xc90 0xffff
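
The values written to MSR 0xc90 are way bitmasks: each set bit makes one L3 way available, so 0x000f leaves 4 ways and 0xffff leaves 16. A minimal sketch of that relationship, with hypothetical helper names; it always builds a contiguous mask starting at bit 0, matching the two values above.

```python
def way_mask(ways):
    """Bitmask with the `ways` lowest bits set."""
    if ways < 1:
        raise ValueError("at least one way is required")
    return (1 << ways) - 1


def ways_in_mask(mask):
    """Number of ways enabled by a mask."""
    return bin(mask).count("1")


print(format(way_mask(4), "#06x"))   # 0x000f
print(format(way_mask(16), "#06x"))  # 0xffff
print(ways_in_mask(0x000f))          # 4
```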
