fix(fuzz): make default pass selection complete and add runtime equivalence boundary #145

@doomhammerhell

Description

Summary

During a non-invasive local review of Azoth’s fuzzing path, I found that the current fuzzer appears to miss one of the default obfuscation passes and that --check-deploy validates deployment success rather than runtime behavioral equivalence.

This is not a production exploit claim. The issue is about assurance coverage: Azoth is positioned as a deterministic EVM bytecode obfuscator intended to make Mirage execution contracts indistinguishable from ordinary unverified deployments. Because of that role, fuzzing should ideally exercise all default passes and distinguish deployability from semantic/runtime equivalence.

The concrete issue is that the default pass list contains four passes, but the fuzzer's randomized pass-selection mask appears to cover only three bits. This makes string_obfuscate unreachable through the current randomized pass-selection path.

Affected components

crates/cli/src/commands/fuzz.rs
crates/cli/src/commands/mod.rs

Technical description

The default pass string includes:

arithmetic_chain, push_split, slot_shuffle, string_obfuscate

However, the fuzzer appears to select passes using:

let passes = passes_from_bits((rng.next_u32() % 8) as u8);

Since % 8 only produces values from 0 to 7 inclusive (0b000 through 0b111), only bits 0, 1, and 2 can ever be set. If the default passes are mapped sequentially, bit 3 is required to select the fourth pass, string_obfuscate. Under the current mask, that pass is not reachable through the randomized pass-combination path.
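A minimal sketch of the mask arithmetic on its own, independent of Azoth's code: OR-ing together every value that x % modulus can produce gives the set of bit positions the fuzzer can ever set. The helper names reachable_bits and bit_reachable are illustrative only and are not part of Azoth.

```rust
/// Returns the OR of every value in 0..modulus, i.e. every bit position
/// that `rng.next_u32() % modulus` can ever set.
fn reachable_bits(modulus: u32) -> u32 {
    (0..modulus).fold(0, |acc, mask| acc | mask)
}

/// True if bit `bit` can be set by some mask drawn as `x % modulus`.
fn bit_reachable(modulus: u32, bit: u32) -> bool {
    ((reachable_bits(modulus) >> bit) & 1) == 1
}
```

With modulus 8 the result is 0b0111, so bit 3 is never set; with modulus 16 it becomes 0b1111 and bit 3 is reachable.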

This narrows fuzzing coverage relative to the default transform set.

A second, broader assurance-boundary issue is that --check-deploy appears to compare deployment success, not runtime equivalence. Deployment success is useful, but it is weaker than checking that original and obfuscated bytecode behave equivalently over calldata, state, environment, reverts, logs, returndata, external calls, storage writes, and gas.

For a privacy-critical obfuscation pipeline, a contract can deploy successfully while still diverging at runtime or exposing stable runtime artifacts. Naturally, EVM bytecode finds a way to be annoying exactly where the happy path stops looking.

Proof of Concept

Run the following local inspection commands:

rg -n "DEFAULT_PASSES|passes_from_bits|rng.next_u32\\(\\) % 8|Contract::ALL|check_deploy" crates/cli/src/commands

Expected relevant observations:

DEFAULT_PASSES = "arithmetic_chain, push_split, slot_shuffle, string_obfuscate"

and:

let passes = passes_from_bits((rng.next_u32() % 8) as u8);

Because % 8 yields only three usable bits, the fourth default pass is not selected by this fuzzing path.

Then run smoke fuzzing:

cargo run --bin azoth -- fuzz -i 100
cargo run --bin azoth -- fuzz -i 100 --check-deploy

Observed local result:

cargo run --bin azoth -- fuzz -i 100

Iterations: 114
Successes: 100
Errors: 0
Unique crashes saved: 0
cargo run --bin azoth -- fuzz -i 100 --check-deploy

Iterations: 114
Successes: 100
Errors: 0
Deployment mismatches: 0
Unique crashes saved: 0

The successful smoke runs are good, but they do not exercise string_obfuscate through the current bitmask and do not establish runtime equivalence.

The Iterations: 114 value for -i 100 also suggests the parallel worker counter can overshoot the requested fuzzing budget. That is not the main issue here, but exact iteration accounting would improve reproducibility for CI and research reporting.

Trace / evidence

Default passes:

arithmetic_chain
push_split
slot_shuffle
string_obfuscate

Current randomized mask:

rng.next_u32() % 8

Reachable bit positions:

bit 0 -> reachable
bit 1 -> reachable
bit 2 -> reachable
bit 3 -> not reachable

Therefore:

string_obfuscate -> not selected by current randomized pass mask

Runtime-equivalence boundary:

--check-deploy checks deployment success
--check-deploy does not compare runtime traces
--check-deploy does not compare returndata
--check-deploy does not compare revert data
--check-deploy does not compare logs
--check-deploy does not compare storage writes
--check-deploy does not compare external-call effects
--check-deploy does not compare gas behavior

Impact

The main impact is reduced fuzzing assurance.

If one default pass is unreachable, bugs or distinguishability artifacts specific to that pass may remain undetected. This is especially relevant for string_obfuscate, because revert strings, error payloads, and string-like byte sequences can be externally observable or classifier-visible depending on how they are transformed.
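For context on why revert strings are observable: under the Solidity ABI convention, require or revert with a reason string returns the 4-byte selector 0x08c379a0 followed by an ABI-encoded string, so the reason bytes appear verbatim in the revert payload. The sketch below builds that payload. It illustrates the convention only, is not taken from Azoth, and does not establish which strings string_obfuscate actually targets. A runtime-equivalence check would expect this payload to be byte-identical for original and obfuscated bytecode.

```rust
/// Builds a Solidity-style Error(string) revert payload:
/// selector 0x08c379a0, then the ABI encoding of a single `string` argument
/// (offset word, length word, UTF-8 bytes right-padded to a 32-byte boundary).
fn encode_error_string(reason: &str) -> Vec<u8> {
    let data = reason.as_bytes();
    // bytes4(keccak256("Error(string)"))
    let mut out = vec![0x08, 0xc3, 0x79, 0xa0];

    // Head: offset of the string's tail, relative to the start of the arguments.
    let mut offset_word = [0u8; 32];
    offset_word[31] = 0x20;
    out.extend_from_slice(&offset_word);

    // Tail part 1: string length as a big-endian uint256.
    let mut len_word = [0u8; 32];
    len_word[24..].copy_from_slice(&(data.len() as u64).to_be_bytes());
    out.extend_from_slice(&len_word);

    // Tail part 2: the raw bytes, zero-padded to a multiple of 32.
    out.extend_from_slice(data);
    let padding = (32 - data.len() % 32) % 32;
    out.resize(out.len() + padding, 0);
    out
}
```

For example, a reason of "hi" yields a 100-byte payload whose last 32-byte word starts with the ASCII bytes of "hi", so any change in how the string is materialized at runtime would surface as a revert-data difference.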

The runtime-equivalence gap is also important. Deployment success is necessary, but not sufficient, for an obfuscation system. Original and obfuscated bytecode should ideally be compared over runtime behavior, including success/revert status, returndata, revert payloads, logs, storage effects, external calls, and gas deltas.

At the Mirage level, this matters because Azoth is intended to support indistinguishability of execution contracts. A transform can preserve deployability while still creating runtime-visible divergence or stable classifier features.

Recommended mitigation

  1. Replace the hardcoded % 8 mask with a mask derived from the number of default passes.

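For example, derive the mask limit from the pass count instead of hardcoding three bits (this assumes passes_from_bits already maps bit 3 to string_obfuscate; if it decodes only three bits, it needs the same change):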

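let pass_count = DEFAULT_PASSES.split(',').count();
// passes_from_bits takes a u8, so at most 8 passes fit in the mask. This also
// rules out a zero mask_limit, which would make the modulo below panic.
assert!(pass_count <= 8, "default pass count must fit in a u8 mask");
let mask_limit = 1u32 << pass_count;
let mask = (rng.next_u32() % mask_limit) as u8;
let passes = passes_from_bits(mask);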
  2. Add a unit test proving every default pass is reachable through the fuzz pass-selection mechanism.

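Suggested test intent (ideally the mask bound should come from the same expression the fuzz loop uses; a test that computes its own bound can pass while the loop still draws from % 8):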

#[test]
fn fuzz_pass_selection_can_reach_every_default_pass() {
    let default_passes: Vec<_> = DEFAULT_PASSES
        .split(',')
        .map(|p| p.trim())
        .collect();

    for expected_pass in &default_passes {
        let reachable = (0u32..(1u32 << default_passes.len()))
            .any(|mask| passes_from_bits(mask as u8).contains(expected_pass));

        assert!(
            reachable,
            "default pass {expected_pass} is not reachable by fuzz pass selection"
        );
    }
}
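The test above computes its own 1 << default_passes.len() bound, so it exercises passes_from_bits but not the % 8 in the fuzz loop. If passes_from_bits already decodes bit 3, it would pass while the loop keeps drawing from three bits. One way to close that gap, sketched here under assumptions, is to put the bound in a single helper that both the fuzz loop and the test call. pass_mask_limit, default_pass_names, and unreachable_passes are hypothetical names, and passes_from_bits below is a stand-in that assumes bit i selects the i-th default pass; Azoth's real decoder in fuzz.rs may differ.

```rust
/// Mirrors the default pass string quoted in this issue.
const DEFAULT_PASSES: &str = "arithmetic_chain, push_split, slot_shuffle, string_obfuscate";

/// Default pass names in declaration order, whitespace trimmed.
fn default_pass_names() -> Vec<&'static str> {
    DEFAULT_PASSES.split(',').map(str::trim).collect()
}

/// Hypothetical shared helper: exclusive upper bound for the random pass mask.
/// The fuzz loop and the test should both call this.
fn pass_mask_limit() -> u32 {
    let count = default_pass_names().len();
    assert!(count <= 8, "pass mask must fit in a u8");
    1u32 << count
}

/// Stand-in decoder (assumption): bit i selects the i-th default pass.
fn passes_from_bits(bits: u8) -> Vec<&'static str> {
    default_pass_names()
        .into_iter()
        .enumerate()
        .filter(|&(i, _)| ((bits >> i) & 1) == 1)
        .map(|(_, name)| name)
        .collect()
}

/// Default passes that no mask in 0..limit can select.
fn unreachable_passes(limit: u32) -> Vec<&'static str> {
    default_pass_names()
        .into_iter()
        .filter(|pass| !(0..limit).any(|mask| passes_from_bits(mask as u8).contains(pass)))
        .collect()
}
```

In the fuzz loop the draw would become rng.next_u32() % pass_mask_limit(), and the #[test] would assert that unreachable_passes(pass_mask_limit()) is empty. Calling unreachable_passes(8) returns only string_obfuscate, which reproduces the gap described in this issue.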
  3. Consider renaming or documenting --check-deploy as a deployability check rather than an equivalence check.

  4. Add a future --check-runtime-equivalence (or similarly named) mode using REVM differential execution.

That mode should compare original and obfuscated runtime behavior over generated calldata, state, and environment.

Suggested comparison fields:

status: success / revert / halt / invalid / out-of-gas
returndata
revert data
logs and topics
storage writes
external calls and call outcomes
gas used
gas remaining
generated calldata
generated environment
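A minimal sketch of the comparison step, assuming the differential runner (REVM-based or otherwise) has already executed the same transaction against both deployments and captured what an observer could see. Outcome, Status, GasPolicy, and diverging_fields are hypothetical types, not REVM's API. External calls are omitted for brevity, and revert data is carried in returndata when the status is Revert. Two choices are deliberate: gas is judged under an explicit policy, because passes such as arithmetic_chain and push_split presumably add instructions and therefore gas; and storage writes are compared as raw (slot, value) pairs, which would need a slot mapping if slot_shuffle relocates slots as its name suggests.

```rust
/// Coarse execution status (assumed categories, following the list above).
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Success,
    Revert,
    Halt,
    Invalid,
    OutOfGas,
}

/// Hypothetical record of what one transaction exposed to an observer.
#[derive(Debug, Clone)]
struct Outcome {
    status: Status,
    /// Return data on success, revert payload on revert.
    returndata: Vec<u8>,
    /// (topics, data) per emitted log, in emission order.
    logs: Vec<(Vec<[u8; 32]>, Vec<u8>)>,
    /// (slot, value) per storage write, in write order.
    storage_writes: Vec<([u8; 32], [u8; 32])>,
    gas_used: u64,
}

/// How gas differences are judged; obfuscation may legitimately cost extra gas.
enum GasPolicy {
    Exact,
    Tolerance(u64),
    Ignore,
}

/// Names of the observable fields on which the two outcomes diverge.
fn diverging_fields(orig: &Outcome, obf: &Outcome, gas: &GasPolicy) -> Vec<&'static str> {
    let mut diffs = Vec::new();
    if orig.status != obf.status {
        diffs.push("status");
    }
    if orig.returndata != obf.returndata {
        diffs.push("returndata");
    }
    if orig.logs != obf.logs {
        diffs.push("logs");
    }
    if orig.storage_writes != obf.storage_writes {
        diffs.push("storage_writes");
    }
    let gas_ok = match gas {
        GasPolicy::Exact => orig.gas_used == obf.gas_used,
        GasPolicy::Tolerance(max_delta) => orig.gas_used.abs_diff(obf.gas_used) <= *max_delta,
        GasPolicy::Ignore => true,
    };
    if !gas_ok {
        diffs.push("gas_used");
    }
    diffs
}
```

An empty result means the pair is equivalent under the chosen policy; a non-empty one names exactly which observables to report alongside the generated calldata and environment.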
  5. Make fuzz iteration accounting exact where possible, especially for CI and research runs.
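This issue does not show how the fuzz workers count iterations, so the following is only one way to make an -i N budget exact, assuming workers share an atomic counter. A worker claims a slot with fetch_update, which increments only while the counter is below the budget, so no thread can start iteration N + 1. A check-then-increment pattern (load, compare, then fetch_add) lacks this property, because several workers can pass the check before any of them increments. claim_iteration and run_workers are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Claims one iteration slot, returning its index, or None once `budget`
/// slots are taken. fetch_update retries on contention, so the counter
/// never exceeds `budget`.
fn claim_iteration(counter: &AtomicU64, budget: u64) -> Option<u64> {
    counter
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |n| {
            if n < budget { Some(n + 1) } else { None }
        })
        .ok()
}

/// Runs `workers` threads that claim iterations until the budget is spent,
/// and returns how many iterations were executed in total.
fn run_workers(workers: usize, budget: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let executed = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            let executed = Arc::clone(&executed);
            thread::spawn(move || {
                while claim_iteration(&counter, budget).is_some() {
                    // One fuzz iteration would run here.
                    executed.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    executed.load(Ordering::Relaxed)
}
```

With this scheme, run_workers(8, 100) executes exactly 100 iterations regardless of scheduling, which is the property a reported "Iterations:" line would need for reproducible CI output.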

Suggested regression tests

1. every pass in DEFAULT_PASSES is reachable by fuzz pass selection
2. string_obfuscate appears in at least one generated fuzz pass combination
3. --check-deploy remains deployability-only and is documented as such
4. runtime differential smoke test compares original vs obfuscated execution for a small fixture
5. fuzzing with -i N reports an exact or explicitly documented iteration budget

Suggested invariant

For a default pass set P, the fuzzing pass-selection function must be capable of selecting every pass p ∈ P.

For runtime assurance, for every fuzz-generated input where the original deployment succeeds, the obfuscated deployment should succeed and generated runtime transactions should produce equivalent observable results under the selected equivalence relation:

status
returndata
revert data
logs
storage delta
external call effects
gas policy
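The same invariant in notation. Here M is the set of masks the fuzzer can actually draw (currently {0, …, 7} under % 8), sel is the mask decoder, B and B′ are the original and obfuscated bytecode, X is the set of generated inputs (calldata, state, environment), obs(B, x) is the tuple of observables listed above, and ∼ is the selected equivalence relation:

$$
\forall p \in P \;\; \exists m \in M : \; p \in \mathrm{sel}(m)
$$

$$
\forall x \in X : \; \mathrm{deploy}(B)\ \text{succeeds} \;\Rightarrow\; \mathrm{deploy}(B')\ \text{succeeds} \;\wedge\; \mathrm{obs}(B, x) \sim \mathrm{obs}(B', x)
$$

The first line fails today because no m ∈ {0, …, 7} selects string_obfuscate.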

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests