
Iterative Debugging

Victor Xirau Guardans edited this page Feb 9, 2026 · 2 revisions

This guide explains Mess's incremental refinement workflow: how to start with quick measurements and progressively add detail to your bandwidth-latency curves.


Initial Sanity Check

Before starting a full benchmark run, we strongly recommend performing a "sanity check" to ensure the system is stable and capable of producing valid measurements.

1. Target Peak Bandwidth (Read)

Run a single repetition at maximum verbosity, targeting peak bandwidth (pause 0) for reads (100% read ratio):

./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 --repetitions=1

Check the output for:

  • No errors or warnings.
  • Reasonable bandwidth numbers (e.g., >10 GB/s for modern systems).

2. Target Write Bandwidth

Test the opposite end of the spectrum (100% writes):

./build/bin/mess --profile --ratio=0 --pause=0 --verbose=3 --repetitions=1

If both runs complete successfully and show expected bandwidth levels, proceed with the iterative refinement workflow below.


Key Concept: Output File Persistence

Mess does not delete output files on rerun. When you run the benchmark multiple times with the same output folder, new measurements are added to existing data rather than replacing it.

This enables an iterative workflow:

  1. Start with a quick, low-resolution run
  2. Examine the results to identify regions of interest
  3. Re-run with specific pause values to add detail where needed
  4. Repeat until you have the resolution you need

Output File Behavior

When Mess runs, it creates CSV files in the output directory (default: measuring/). Each file contains bandwidth-latency measurements for a specific configuration.

Appending vs Overwriting

Scenario                 Behavior
New pause value          New row added to the CSV
Same pause value         Existing row updated with the new measurement
Different ratio          Separate file; same rules apply
Different traffic mode   Separate file; same rules apply

This means you can safely:

  • Run the benchmark first for an overview
  • Then run --pause=<specific values> to fill in gaps
  • Results accumulate in the same output files
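The append/update rules above can be sketched with a toy simulation in plain awk over throwaway CSVs. This mimics the behavior described, not Mess's actual CSV-handling code, and the pause,bandwidth column layout is an assumption for illustration:

```shell
# Two throwaway "runs": the second remeasures pause=1000 and adds pause=500.
printf 'pause,bandwidth\n0,250\n1000,80\n'   > existing.csv   # first run
printf 'pause,bandwidth\n1000,85\n500,120\n' > new_run.csv    # second run

# Later runs win: same pause -> row updated, new pause -> row appended.
merged=$(awk -F, 'FNR > 1 { bw[$1] = $2 }
                  END { for (p in bw) print p "," bw[p] }' existing.csv new_run.csv |
         sort -t, -n)
printf 'pause,bandwidth\n%s\n' "$merged"

rm -f existing.csv new_run.csv
```

Pause 1000 keeps only the newer measurement, pause 500 is appended, and pause 0 is untouched.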

Iterative Refinement Workflow

Step 1: Baseline Run

Start with a standard run to capture the full bandwidth-latency curves:

./build/bin/mess --profile

This produces a detailed set of points covering the entire saturation range.

Step 2: Visualize Initial Results

Use Plotter to examine the bandwidth-latency curves:

python3 utils/plotter.py measuring/multisequential

Look for:

  • Inflection points - where latency starts increasing
  • Steep regions - areas of rapid change
  • Gaps - sparse regions that need more detail
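Gaps can also be spotted programmatically by sorting the measured pause values and reporting the widest interval. This is a hedged sketch: the inline sample rows and the assumption that column 1 holds the pause value are illustrative; point it at your own CSV under measuring/ in practice:

```shell
# Sample rows standing in for a real results CSV.
printf 'pause,bandwidth,latency\n0,250,95\n100,240,110\n1000,150,300\n10000,20,900\n' > sample.csv

gap=$(tail -n +2 sample.csv | sort -t, -n | awk -F, '
  NR > 1 && $1 - prev > max { max = $1 - prev; lo = prev; hi = $1 }
  { prev = $1 }
  END { printf "largest gap: %d (between pause=%d and pause=%d)", max, lo, hi }')
echo "$gap"    # largest gap: 9000 (between pause=1000 and pause=10000)

rm -f sample.csv
```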

Step 3: Add Detail to Regions of Interest

Identify the bandwidth range or pause values where you need more resolution. Then run with specific pause values:

# If the interesting region is around pause 100-500
./build/bin/mess --ratio=100 --pause=100,150,200,250,300,350,400,450,500 --profile

The new points are added to your existing data.

Step 4: Re-visualize

Plot again to see the combined results:

python3 utils/plotter.py measuring/multisequential

Your bandwidth-latency curve now has higher resolution in the region of interest.

Step 5: Repeat as Needed

Continue adding points until you have sufficient detail:

# Even finer resolution in a specific region
./build/bin/mess --ratio=100 --pause=200,210,220,230,240,250 --profile
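Long comma-separated pause lists are tedious to type by hand; seq can generate them (assuming a seq that supports the -s separator flag, as GNU coreutils does):

```shell
# Evenly spaced pause values from 200 to 250 in steps of 10:
pauses=$(seq -s, 200 10 250)
echo "$pauses"    # 200,210,220,230,240,250

# Then pass the list straight to Mess:
# ./build/bin/mess --ratio=100 --pause="$pauses" --profile
```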

Example: Full Iterative Session

# 1. Initial run
./build/bin/mess --ratio=100 --profile
python3 utils/plotter.py measuring/multisequential

# 2. Notice interesting behavior around pause 50-200, add detail
./build/bin/mess --ratio=100 --pause=50,75,100,125,150,175,200 --profile
python3 utils/plotter.py measuring/multisequential

# 3. Saturation region needs more points (high pause values)
./build/bin/mess --ratio=100 --pause=5000,7500,10000,15000,20000 --profile
python3 utils/plotter.py measuring/multisequential

# 4. Final high-resolution bandwidth-latency curve with all accumulated data

Debugging Unexpected Results

When measurements don't look right, use iterative debugging to isolate the issue.

Step 1: Verify System Detection

./build/bin/mess --dry-run --verbose=2

Check for:

  • Correct architecture detection (x86, ARM, Power, RISC-V)
  • CPU count and topology
  • NUMA node configuration
  • Selected bandwidth measurement backend (perf, likwid, or other)

Step 2: Single Point Test

Test a single point at maximum verbosity:

./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3

Expected:

  • Bandwidth: Significant value (10-300 GB/s depending on system)
  • Latency: ~80-200ns for idle/unloaded

Common issues:

  • Zero bandwidth → Check perf permissions (/proc/sys/kernel/perf_event_paranoid)
  • Very high latency → Check NUMA binding, background processes

Step 3: Validate Components Individually

# Test bandwidth measurement alone
./build/bin/mess-profiler --dry-run
./build/bin/mess-profiler -s 100ms sleep 5

# Test multiple ratios
./build/bin/mess --ratio=100 --pause=0 --verbose=3  # Reads
./build/bin/mess --ratio=0 --pause=0 --verbose=3    # Writes

# Test pause variation
./build/bin/mess --ratio=100 --pause=0 --verbose=3
./build/bin/mess --ratio=100 --pause=1000 --verbose=3
./build/bin/mess --ratio=100 --pause=10000 --verbose=3

Expected progression:

  • pause=0: Maximum bandwidth
  • pause=1000: Reduced bandwidth
  • pause=10000: Very low bandwidth, higher latency
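A quick way to confirm this progression is to check that bandwidth decreases monotonically with pause (inline sample numbers; substitute the values from your own runs):

```shell
# pause/bandwidth pairs, one per line (sample values for illustration).
check=$(printf '0 250\n1000 180\n10000 25\n' | awk '
  NR > 1 && $2 >= prev { bad = 1 }
  { prev = $2 }
  END { if (bad) print "NOT monotonic: investigate noise or saturation"
        else     print "ok: bandwidth decreases as pause grows" }')
echo "$check"    # ok: bandwidth decreases as pause grows
```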

Step 4: Scale Up Gradually

# Few points, single ratio
./build/bin/mess --ratio=100 --pause=0,100,1000,10000 --verbose=2

# Full run, single ratio
./build/bin/mess --ratio=100 --profile --verbose=2

# Full run, all ratios (standard)
./build/bin/mess --profile --verbose=1

Step 5: Add Extra Performance Context (Optional)

Use --add-counters to capture additional performance counters during your debug runs. This provides valuable context for understanding system behavior:

# Monitor cache behavior while debugging
./build/bin/mess --profile --add-counters=cache-misses,cache-references

# Track instructions and cycles
./build/bin/mess --profile --add-counters=instructions,cycles

# Combine with targeted pause values
./build/bin/mess --ratio=100 --pause=0,100,1000 --add-counters=cycles,instructions,cache-misses --profile

These counters are recorded alongside bandwidth/latency data in the output files, helping you correlate memory performance with CPU activity. Check available counters with perf list.


Common Issues and Solutions

Zero or Unrealistic Bandwidth

Possible causes:

  1. perf counter permissions

    echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
  2. Wrong counter selected

    ./build/bin/mess-profiler --dry-run
    # Check which counters are being used
  3. Insufficient cores

    ./build/bin/mess --total-cores=4 --verbose=3

High Latency Variance

Possible causes:

  1. Background processes

    top  # Check for CPU usage
  2. Frequency scaling

    sudo cpupower frequency-set -g performance
  3. Insufficient repetitions

    ./build/bin/mess --repetitions=10 --profile
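To judge whether variance is actually high, compute the relative standard deviation across repetitions. This is a sketch with made-up latency samples; as a rule of thumb, a value above a few percent suggests more repetitions or a quieter system:

```shell
# Five repeated latency samples in ns (made-up numbers for illustration).
cv=$(printf '101\n98\n104\n99\n103\n' | LC_ALL=C awk '
  { n++; sum += $1; sumsq += $1 * $1 }
  END {
    mean = sum / n
    sd = sqrt(sumsq / n - mean * mean)   # population std deviation
    printf "%.1f%%", 100 * sd / mean
  }')
echo "relative stddev: $cv"    # relative stddev: 2.3%
```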

Flat Bandwidth-Latency Curve (No Bandwidth Variation)

Possible causes:

  1. Pause values not reaching saturation

    ./build/bin/mess --pause=0,10,100,1000,10000,100000,1000000 --verbose=3
  2. System saturates at low pause values

    • Normal for some systems
    • Manually add more points in the low-pause region using --pause=...
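When hunting for the saturation point, a log-spaced sweep covers several orders of magnitude in few runs, and the list can be generated rather than typed (plain awk; no Mess-specific assumptions beyond the --pause list format):

```shell
# 0 plus powers of ten up to one million, comma-separated:
pauses=$(awk 'BEGIN {
  s = "0"
  for (p = 10; p <= 1000000; p *= 10) s = s "," p
  print s
}')
echo "$pauses"    # 0,10,100,1000,10000,100000,1000000

# Then: ./build/bin/mess --pause="$pauses" --verbose=3
```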

Non-monotonic Bandwidth-Latency Curve

Possible causes:

  1. NUMA effects - Memory accessed from remote node

    ./build/bin/mess --bind=0 --cores=0-7 --profile
  2. Measurement noise - Increase repetitions

    ./build/bin/mess --repetitions=10 --profile

Debugging Checklist

Check              Command                                      Expected
System detection   mess --dry-run -v2                           Correct arch, cores, NUMA
perf access        perf stat ls                                 No permission errors
Single point       mess --profile --ratio=100 --pause=0 -v3     Non-zero BW, ~100 ns latency
Ratio variation    compare ratio=100 vs ratio=0                 Different BW values
Pause variation    compare pause=0 vs pause=10000               BW decreases, latency increases

Verbose Output Reference

Level         Information
--verbose=1   Progress bar, ETA, summary
--verbose=2   Configuration, execution steps
--verbose=3   Raw measurements, subprocess commands, stabilization data

For debugging, always use --verbose=3:

./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 2>&1 | tee debug.log

Getting Help

If you've followed this guide and still have issues:

  1. Capture verbose output:

    ./build/bin/mess --dry-run --verbose=3 > system_info.txt 2>&1
    ./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 > debug.log 2>&1
  2. Include in your report:

    • system_info.txt
    • debug.log
    • Output of uname -a
    • Output of gcc --version
  3. Contact: mess@bsc.es

