Iterative Debugging
This guide explains Mess's incremental refinement workflow: how to start with quick measurements and progressively add detail to your bandwidth-latency curves.
Before starting a full benchmark run, we strongly recommend performing a "sanity check" to ensure the system is stable and capable of producing valid measurements.
1. Target peak bandwidth (read). Run a single repetition with maximum verbosity, targeting peak bandwidth (0 pause) for reads (100% read ratio):

```
./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 --repetitions=1
```

Check the output for:
- No errors or warnings.
- Reasonable bandwidth numbers (e.g., >10 GB/s for modern systems).
2. Target write bandwidth. Test the opposite end of the spectrum (100% writes):

```
./build/bin/mess --profile --ratio=0 --pause=0 --verbose=3 --repetitions=1
```

If both runs complete successfully and show the expected bandwidth levels, proceed with the iterative refinement workflow below.
Mess does not delete output files on rerun. When you run the benchmark multiple times with the same output folder, new measurements are added to existing data rather than replacing it.
This enables an iterative workflow:
- Start with a quick, low-resolution run
- Examine the results to identify regions of interest
- Re-run with specific pause values to add detail where needed
- Repeat until you have the resolution you need
When Mess runs, it creates CSV files in the output directory (default: measuring/). Each file contains bandwidth-latency measurements for a specific configuration.
| Scenario | Behavior |
|---|---|
| New pause value | New row added to CSV |
| Same pause value | Row updated with new measurement |
| Different ratio | Separate file, same rules apply |
| Different traffic mode | Separate file, same rules apply |
This means you can safely:
- Run the benchmark first for an overview
- Then run `--pause=<specific values>` to fill in gaps
- Results accumulate in the same output files
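The update-or-append rule can be sketched in a few lines of Python; the column names `pause`, `bandwidth`, and `latency` are illustrative assumptions, not the exact Mess CSV schema:

```python
# Sketch of the accumulation rule: one row per pause value.
# Column names are assumptions for illustration only.
def merge_measurement(rows, pause, bandwidth, latency):
    """Update the row for an existing pause value, or append a new row."""
    for row in rows:
        if row["pause"] == pause:
            # Same pause value: the row is updated with the new measurement
            row["bandwidth"], row["latency"] = bandwidth, latency
            return rows
    # New pause value: a new row is added
    rows.append({"pause": pause, "bandwidth": bandwidth, "latency": latency})
    return rows

rows = [{"pause": 0, "bandwidth": 120.0, "latency": 310.0}]
merge_measurement(rows, 100, 95.0, 180.0)  # new pause -> new row
merge_measurement(rows, 0, 118.0, 305.0)   # same pause -> row updated
```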
Start with a standard run to capture the full bandwidth-latency curves:
```
./build/bin/mess --profile
```

This produces a detailed set of points covering the entire saturation range.
Use Plotter to examine the bandwidth-latency curves:
```
python3 utils/plotter.py measuring/multisequential
```

Look for:
- Inflection points - where latency starts increasing
- Steep regions - areas of rapid change
- Gaps - sparse regions that need more detail
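Gaps can also be spotted mechanically. The sketch below (not part of Mess) proposes a midpoint pause value wherever two adjacent measured pauses are more than a factor of two apart:

```python
def propose_pause_values(measured):
    """Suggest midpoint pause values wherever adjacent measured pauses
    are more than 2x apart, i.e. the curve is sparse there."""
    measured = sorted(set(measured))
    suggestions = []
    for lo, hi in zip(measured, measured[1:]):
        if hi > 2 * max(lo, 1):          # gap wider than a factor of two
            suggestions.append((lo + hi) // 2)
    return suggestions

print(propose_pause_values([0, 100, 1000, 10000]))  # [50, 550, 5500]
```

The suggested values can then be passed to `--pause=` in the next run.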
Identify the bandwidth range or pause values where you need more resolution. Then run with specific pause values:
```
# If the interesting region is around pause 100-500
./build/bin/mess --ratio=100 --pause=100,150,200,250,300,350,400,450,500 --profile
```

The new points are added to your existing data.
Plot again to see the combined results:
```
python3 utils/plotter.py measuring/multisequential
```

Your bandwidth-latency curve now has higher resolution in the region of interest.
Continue adding points until you have sufficient detail:
```
# Even finer resolution in a specific region
./build/bin/mess --ratio=100 --pause=200,210,220,230,240,250 --profile
```

A complete iterative session might look like this:

```
# 1. Initial run
./build/bin/mess --ratio=100 --profile
python3 utils/plotter.py measuring/multisequential

# 2. Notice interesting behavior around pause 50-200, add detail
./build/bin/mess --ratio=100 --pause=50,75,100,125,150,175,200 --profile
python3 utils/plotter.py measuring/multisequential

# 3. Saturation region needs more points (high pause values)
./build/bin/mess --ratio=100 --pause=5000,7500,10000,15000,20000 --profile
python3 utils/plotter.py measuring/multisequential

# 4. Final high-resolution bandwidth-latency curve with all accumulated data
```

When measurements don't look right, use iterative debugging to isolate the issue.
```
./build/bin/mess --dry-run --verbose=2
```

Check for:
- Correct architecture detection (x86, ARM, Power, RISC-V)
- CPU count and topology
- NUMA node configuration
- Selected bandwidth measurement backend (perf, likwid, or other)
Test a single point at maximum verbosity:
```
./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3
```

Expected:
- Bandwidth: Significant value (10-300 GB/s depending on system)
- Latency: ~80-200ns for idle/unloaded
Common issues:
- Zero bandwidth → Check perf permissions (`/proc/sys/kernel/perf_event_paranoid`)
- Very high latency → Check NUMA binding, background processes
```
# Test bandwidth measurement alone
./build/bin/mess-profiler --dry-run
./build/bin/mess-profiler -s 100ms sleep 5

# Test multiple ratios
./build/bin/mess --ratio=100 --pause=0 --verbose=3   # Reads
./build/bin/mess --ratio=0 --pause=0 --verbose=3     # Writes

# Test pause variation
./build/bin/mess --ratio=100 --pause=0 --verbose=3
./build/bin/mess --ratio=100 --pause=1000 --verbose=3
./build/bin/mess --ratio=100 --pause=10000 --verbose=3
```

Expected progression:
- `pause=0`: Maximum bandwidth
- `pause=1000`: Reduced bandwidth
- `pause=10000`: Very low bandwidth, higher latency
```
# Few points, single ratio
./build/bin/mess --ratio=100 --pause=0,100,1000,10000 --verbose=2

# Full run, single ratio
./build/bin/mess --ratio=100 --profile --verbose=2

# Full run, all ratios (standard)
./build/bin/mess --profile --verbose=1
```

Use `--add-counters` to capture additional performance counters during your debug runs. This provides valuable context for understanding system behavior:
```
# Monitor cache behavior while debugging
./build/bin/mess --profile --add-counters=cache-misses,cache-references

# Track instructions and cycles
./build/bin/mess --profile --add-counters=instructions,cycles

# Combine with targeted pause values
./build/bin/mess --ratio=100 --pause=0,100,1000 --add-counters=cycles,instructions,cache-misses --profile
```

These counters are recorded alongside bandwidth/latency data in the output files, helping you correlate memory performance with CPU activity. Check available counters with `perf list`.
Zero bandwidth. Possible causes:

- perf counter permissions:

  ```
  echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
  ```

- Wrong counter selected:

  ```
  ./build/bin/mess-profiler --dry-run   # Check which counters are being used
  ```

- Insufficient cores:

  ```
  ./build/bin/mess --total-cores=4 --verbose=3
  ```
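The permission check can be scripted. The threshold below (level 0 or lower for system-wide counters) reflects general Linux `perf_event_paranoid` semantics, not a Mess-specific requirement:

```python
def read_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Read the current perf_event_paranoid level (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())

def perf_paranoid_ok(level):
    """System-wide (uncore) counters, as typically used for bandwidth
    measurement, generally require perf_event_paranoid <= 0."""
    return level <= 0
```

On a Linux host, `perf_paranoid_ok(read_paranoid())` tells you whether to expect permission errors.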
Inconsistent or noisy results. Possible causes:

- Background processes:

  ```
  top   # Check for CPU usage
  ```

- Frequency scaling:

  ```
  sudo cpupower frequency-set -g performance
  ```

- Insufficient repetitions:

  ```
  ./build/bin/mess --repetitions=10 --profile
  ```
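To decide whether more repetitions are warranted, a simple noise check is the coefficient of variation across repeated measurements; the 5% threshold below is an illustrative choice, not a Mess default:

```python
from statistics import mean, stdev

def needs_more_repetitions(samples, cv_threshold=0.05):
    """Flag a measurement as noisy when the coefficient of variation
    (stddev / mean) across repetitions exceeds the threshold."""
    return stdev(samples) / mean(samples) > cv_threshold

print(needs_more_repetitions([100.0, 101.0, 99.5]))  # False: stable
print(needs_more_repetitions([100.0, 130.0, 80.0]))  # True: rerun with more repetitions
```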
Curve does not saturate. Possible causes:

- Pause values not reaching saturation:

  ```
  ./build/bin/mess --pause=0,10,100,1000,10000,100000,1000000 --verbose=3
  ```

- System saturates at low pause values:
  - Normal for some systems
  - Manually add more points in the low-pause region using `--pause=...`
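Whether the high-pause end of the curve has flattened out can be checked with a heuristic like the one below; the 3% tolerance is an illustrative assumption:

```python
def reached_plateau(latency_by_pause, tol=0.03):
    """Heuristic: the curve has flattened when latency changes by less
    than `tol` (relative) between the two largest pause values."""
    pauses = sorted(latency_by_pause)
    prev, last = latency_by_pause[pauses[-2]], latency_by_pause[pauses[-1]]
    return abs(prev - last) / last <= tol

print(reached_plateau({0: 350.0, 1000: 150.0, 10000: 101.0, 100000: 100.0}))  # True
print(reached_plateau({0: 350.0, 1000: 150.0}))                               # False
```

If the check fails, extend the run with larger pause values as shown above.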
Unexpectedly high latency. Possible causes:

- NUMA effects (memory accessed from a remote node):

  ```
  ./build/bin/mess --bind=0 --cores=0-7 --profile
  ```

- Measurement noise (increase repetitions):

  ```
  ./build/bin/mess --repetitions=10 --profile
  ```
| Check | Command | Expected |
|---|---|---|
| System detection | `mess --dry-run -v2` | Correct arch, cores, NUMA |
| perf access | `perf stat ls` | No permission errors |
| Single point | `mess --profile --ratio=100 --pause=0 -v3` | Non-zero BW, ~100ns latency |
| Ratio variation | Compare `ratio=100` vs `ratio=0` | Different BW values |
| Pause variation | Compare `pause=0` vs `pause=10000` | BW decreases, latency increases |
| Level | Information |
|---|---|
| `--verbose=1` | Progress bar, ETA, summary |
| `--verbose=2` | Configuration, execution steps |
| `--verbose=3` | Raw measurements, subprocess commands, stabilization data |

For debugging, always use `--verbose=3`:

```
./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 2>&1 | tee debug.log
```

If you've followed this guide and still have issues:
1. Capture verbose output:

   ```
   ./build/bin/mess --dry-run --verbose=3 > system_info.txt 2>&1
   ./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 > debug.log 2>&1
   ```

2. Include in your report:
   - `system_info.txt`
   - `debug.log`
   - Output of `uname -a`
   - Output of `gcc --version`

3. Contact: mess@bsc.es
- Understanding CLI arguments - Complete CLI reference
- FAQ - Common issues and solutions
- Understand output - Output file formats
- Plotter - Visualizing results
[mess.bsc.es](https://mess.bsc.es) | [GitHub](https://github.com/bsc-mem/Mess-2.0)