Iterative Debugging
This guide explains Mess's incremental refinement workflow: how to start with quick measurements and progressively add detail to your bandwidth-latency curves.
Before starting a full benchmark run, we strongly recommend performing a "sanity check" to ensure the system is stable and capable of producing valid measurements.
1. Target peak bandwidth (read). Run a single repetition with maximum verbosity, targeting peak bandwidth (0 pause) for reads (100% read ratio):

```
./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 --repetitions=1
```

Check the output for:
- No errors or warnings.
- Reasonable bandwidth numbers (e.g., >10 GB/s for modern systems).
2. Target write bandwidth. Test the opposite end of the spectrum (100% writes):

```
./build/bin/mess --profile --ratio=0 --pause=0 --verbose=3 --repetitions=1
```

If both runs complete successfully and show the expected bandwidth levels, proceed with the iterative refinement workflow below.
Mess does not delete output files on rerun. When you run the benchmark multiple times with the same output folder, new measurements are added to existing data rather than replacing it.
This enables an iterative workflow:
- Start with a quick, low-resolution run
- Examine the results to identify regions of interest
- Re-run with specific pause values to add detail where needed
- Repeat until you have the resolution you need
When Mess runs, it creates CSV files in the output directory (default: measuring/). Each file contains bandwidth-latency measurements for a specific configuration.
| Scenario | Behavior |
|---|---|
| New pause value | New row added to CSV |
| Same pause value | Row updated with new measurement |
| Different ratio | Separate file, same rules apply |
| Different traffic mode | Separate file, same rules apply |
This means you can safely:
- Run the benchmark first for an overview
- Then run `--pause=<specific values>` to fill in gaps
- Results accumulate in the same output files
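The update-or-append rule can be sketched in a few lines of Python; the column names `pause`, `bandwidth`, and `latency` are illustrative assumptions, not the exact Mess CSV schema:

```python
# Sketch of the accumulation rule: one row per pause value.
# Column names are assumptions for illustration only.
def merge_measurement(rows, pause, bandwidth, latency):
    """Update the row for an existing pause value, or append a new row."""
    for row in rows:
        if row["pause"] == pause:
            # Same pause value: the row is updated with the new measurement
            row["bandwidth"], row["latency"] = bandwidth, latency
            return rows
    # New pause value: a new row is added
    rows.append({"pause": pause, "bandwidth": bandwidth, "latency": latency})
    return rows

rows = [{"pause": 0, "bandwidth": 120.0, "latency": 310.0}]
merge_measurement(rows, 100, 95.0, 180.0)  # new pause -> new row
merge_measurement(rows, 0, 118.0, 305.0)   # same pause -> row updated
```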
Start with a standard run to capture the full bandwidth-latency curves:
```
./build/bin/mess --profile
```

This produces a detailed set of points covering the entire saturation range.
Use Plotter to examine the bandwidth-latency curves:
```
python3 utils/plotter.py measuring/multisequential
```

Look for:
- Inflection points - where latency starts increasing
- Steep regions - areas of rapid change
- Gaps - sparse regions that need more detail
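Gaps can also be spotted mechanically. The sketch below (not part of Mess) proposes a midpoint pause value wherever two adjacent measured pauses are more than a factor of two apart:

```python
def propose_pause_values(measured):
    """Suggest midpoint pause values wherever adjacent measured pauses
    are more than 2x apart, i.e. the curve is sparse there."""
    measured = sorted(set(measured))
    suggestions = []
    for lo, hi in zip(measured, measured[1:]):
        if hi > 2 * max(lo, 1):          # gap wider than a factor of two
            suggestions.append((lo + hi) // 2)
    return suggestions

print(propose_pause_values([0, 100, 1000, 10000]))  # [50, 550, 5500]
```

The suggested values can then be passed to `--pause=` in the next run.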
Identify the bandwidth range or pause values where you need more resolution. Then run with specific pause values:
```
# If the interesting region is around pause 100-500
./build/bin/mess --ratio=100 --pause=100,150,200,250,300,350,400,450,500 --profile
```

The new points are added to your existing data.
Plot again to see the combined results:
```
python3 utils/plotter.py measuring/multisequential
```

Your bandwidth-latency curve now has higher resolution in the region of interest.
Continue adding points until you have sufficient detail:
```
# Even finer resolution in a specific region
./build/bin/mess --ratio=100 --pause=200,210,220,230,240,250 --profile
```

A complete iterative session might look like this:

```
# 1. Initial run
./build/bin/mess --ratio=100 --profile
python3 utils/plotter.py measuring/multisequential

# 2. Notice interesting behavior around pause 50-200, add detail
./build/bin/mess --ratio=100 --pause=50,75,100,125,150,175,200 --profile
python3 utils/plotter.py measuring/multisequential

# 3. Saturation region needs more points (high pause values)
./build/bin/mess --ratio=100 --pause=5000,7500,10000,15000,20000 --profile
python3 utils/plotter.py measuring/multisequential

# 4. Final high-resolution bandwidth-latency curve with all accumulated data
```

When measurements don't look right, use iterative debugging to isolate the issue.
```
./build/bin/mess --dry-run --verbose=2
```

Check for:
- Correct architecture detection (x86, ARM, Power, RISC-V)
- CPU count and topology
- NUMA node configuration
- Selected bandwidth measurement backend (perf, likwid, or other)
Test a single point at maximum verbosity:
```
./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3
```

Expected:
- Bandwidth: Significant value (10-300 GB/s depending on system)
- Latency: ~80-200ns for idle/unloaded
Common issues:
- Zero bandwidth → Check perf permissions (`/proc/sys/kernel/perf_event_paranoid`)
- Very high latency → Check NUMA binding, background processes
```
# Test bandwidth measurement alone
./build/bin/mess-profiler --dry-run
./build/bin/mess-profiler -s 100ms sleep 5

# Test multiple ratios
./build/bin/mess --ratio=100 --pause=0 --verbose=3   # Reads
./build/bin/mess --ratio=0 --pause=0 --verbose=3     # Writes

# Test pause variation
./build/bin/mess --ratio=100 --pause=0 --verbose=3
./build/bin/mess --ratio=100 --pause=1000 --verbose=3
./build/bin/mess --ratio=100 --pause=10000 --verbose=3
```

Expected progression:
- `pause=0`: Maximum bandwidth
- `pause=1000`: Reduced bandwidth
- `pause=10000`: Very low bandwidth, higher latency
```
# Few points, single ratio
./build/bin/mess --ratio=100 --pause=0,100,1000,10000 --verbose=2

# Full run, single ratio
./build/bin/mess --ratio=100 --profile --verbose=2

# Full run, all ratios (standard)
./build/bin/mess --profile --verbose=1
```

Use `--add-counters` to capture additional performance counters during your debug runs. This provides valuable context for understanding system behavior:
```
# Monitor cache behavior while debugging
./build/bin/mess --profile --add-counters=cache-misses,cache-references

# Track instructions and cycles
./build/bin/mess --profile --add-counters=instructions,cycles

# Combine with targeted pause values
./build/bin/mess --ratio=100 --pause=0,100,1000 --add-counters=cycles,instructions,cache-misses --profile
```

These counters are recorded alongside bandwidth/latency data in the output files, helping you correlate memory performance with CPU activity. Check available counters with `perf list`.
Zero bandwidth. Possible causes:

- perf counter permissions:

  ```
  echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
  ```

- Wrong counter selected:

  ```
  ./build/bin/mess-profiler --dry-run   # Check which counters are being used
  ```

- Insufficient cores:

  ```
  ./build/bin/mess --total-cores=4 --verbose=3
  ```
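The permission check can be scripted. The threshold below (level 0 or lower for system-wide counters) reflects general Linux `perf_event_paranoid` semantics, not a Mess-specific requirement:

```python
def read_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Read the current perf_event_paranoid level (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())

def perf_paranoid_ok(level):
    """System-wide (uncore) counters, as typically used for bandwidth
    measurement, generally require perf_event_paranoid <= 0."""
    return level <= 0
```

On a Linux host, `perf_paranoid_ok(read_paranoid())` tells you whether to expect permission errors.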
Inconsistent or noisy results. Possible causes:

- Background processes:

  ```
  top   # Check for CPU usage
  ```

- Frequency scaling:

  ```
  sudo cpupower frequency-set -g performance
  ```

- Insufficient repetitions:

  ```
  ./build/bin/mess --repetitions=10 --profile
  ```
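To decide whether more repetitions are warranted, a simple noise check is the coefficient of variation across repeated measurements; the 5% threshold below is an illustrative choice, not a Mess default:

```python
from statistics import mean, stdev

def needs_more_repetitions(samples, cv_threshold=0.05):
    """Flag a measurement as noisy when the coefficient of variation
    (stddev / mean) across repetitions exceeds the threshold."""
    return stdev(samples) / mean(samples) > cv_threshold

print(needs_more_repetitions([100.0, 101.0, 99.5]))  # False: stable
print(needs_more_repetitions([100.0, 130.0, 80.0]))  # True: rerun with more repetitions
```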
Curve does not saturate. Possible causes:

- Pause values not reaching saturation:

  ```
  ./build/bin/mess --pause=0,10,100,1000,10000,100000,1000000 --verbose=3
  ```

- System saturates at low pause values:
  - Normal for some systems
  - Manually add more points in the low-pause region using `--pause=...`
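Whether the high-pause end of the curve has flattened out can be checked with a heuristic like the one below; the 3% tolerance is an illustrative assumption:

```python
def reached_plateau(latency_by_pause, tol=0.03):
    """Heuristic: the curve has flattened when latency changes by less
    than `tol` (relative) between the two largest pause values."""
    pauses = sorted(latency_by_pause)
    prev, last = latency_by_pause[pauses[-2]], latency_by_pause[pauses[-1]]
    return abs(prev - last) / last <= tol

print(reached_plateau({0: 350.0, 1000: 150.0, 10000: 101.0, 100000: 100.0}))  # True
print(reached_plateau({0: 350.0, 1000: 150.0}))                               # False
```

If the check fails, extend the run with larger pause values as shown above.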
Unexpectedly high latency. Possible causes:

- NUMA effects (memory accessed from a remote node):

  ```
  ./build/bin/mess --bind=0 --cores=0-7 --profile
  ```

- Measurement noise (increase repetitions):

  ```
  ./build/bin/mess --repetitions=10 --profile
  ```
| Check | Command | Expected |
|---|---|---|
| System detection | `mess --dry-run -v2` | Correct arch, cores, NUMA |
| perf access | `perf stat ls` | No permission errors |
| Single point | `mess --profile --ratio=100 --pause=0 -v3` | Non-zero BW, ~100ns latency |
| Ratio variation | Compare `ratio=100` vs `ratio=0` | Different BW values |
| Pause variation | Compare `pause=0` vs `pause=10000` | BW decreases, latency increases |
| Level | Information |
|---|---|
| `--verbose=1` | Progress bar, ETA, summary |
| `--verbose=2` | Configuration, execution steps |
| `--verbose=3` | Raw measurements, subprocess commands, stabilization data |

For debugging, always use `--verbose=3`:

```
./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 2>&1 | tee debug.log
```

If you've followed this guide and still have issues:
1. Capture verbose output:

   ```
   ./build/bin/mess --dry-run --verbose=3 > system_info.txt 2>&1
   ./build/bin/mess --profile --ratio=100 --pause=0 --verbose=3 > debug.log 2>&1
   ```

2. Include in your report:
   - `system_info.txt`
   - `debug.log`
   - Output of `uname -a`
   - Output of `gcc --version`

3. Contact: mess@bsc.es
- Understanding CLI arguments - Complete CLI reference
- FAQ - Common issues and solutions
- Understand output - Output file formats
- Plotter - Visualizing results
[mess.bsc.es](https://mess.bsc.es) | [GitHub](https://github.com/bsc-mem/Mess-2.0)