build: per-TU cache misses across CI runs — fast-path fingerprint bakes mtime, defeats content-based reuse #146

@zackees

TL;DR

fbuild's fast-path fingerprint in hash_watch_set_stamps (crates/fbuild-build/src/build_fingerprint/mod.rs:204-246) hashes file mtime (seconds + nanos) alongside path + length. Any workflow that saves .fbuild/build/… to actions/cache and restores it on a later run is affected: tar extraction resets mtimes → fingerprint differs → fast-path invalidates → fbuild recompiles every translation unit, even though the source content is byte-identical and zccache's per-TU cache (if it were warm) would hit.

There is no content-hash fallback. We should add one (blake3 preferred, per zccache parity).

Evidence

Benchmark on bench/fastled-examples — FastLED examples/ compiled for uno, 83 examples, Ubuntu 24.04 runner, fbuild 2.1.21:

| run | cache-hit | total | compile | first example | examples 2–83 |
|---|---|---|---|---|---|
| iter3 cold (24626421386) | false | 167s | 140s | 20.5s | 1.4–2.0s |
| iter3 warm (24626590831) | true (65 MB restored) | 174s | 142s | 11.8s | 1.4–2.0s |

The ~9s saved on the first example (20.5s → 11.8s) is the ~/.fbuild toolchain-materialization cache paying off. Per-TU reuse across runs is zero: every subsequent example re-enters the orchestrator's full compile loop at the same within-run, daemon-warm rate as in the cold run.

Root cause

crates/fbuild-build/src/build_fingerprint/mod.rs:234-242:

for file in files {
    hasher.update(b"file\0");
    hasher.update(normalize_path(&file));
    hasher.update(b"\0");
    let stamp = FileStamp::from_path(&file)?;
    hasher.update(stamp.len.to_le_bytes());
    hasher.update(stamp.modified_secs.to_le_bytes());   // ← tar extraction invalidates
    hasher.update(stamp.modified_nanos.to_le_bytes());  // ← tar extraction invalidates
}

hash_watch_set_stamps feeds directly into the cached fingerprint written to .fbuild/build/<board>/<profile>/build_fingerprint.json (and the per-source watch cache files at .{project|dep_libs}.zccache_fp.json). When the cache restore sets mtimes to extraction time, the stamp-hash changes, the fingerprint changes, and the fast-path check in hash_watch_set_stamps_cached (line 188) never short-circuits.

A content-hash path already exists in the same file — hash_files at line 99 — but it is not wired into the stamp path.

Secondary (for separate issue, not in scope here)

crates/fbuild-build/src/compiler.rs:303 bakes compiler_path.to_string_lossy() — an absolute toolchain path — into the compiler rebuild signature. On a single runner image this path is stable, so it's not the primary culprit for this CI-cache regression, but it would bite cross-runner / containerized cache reuse.

Fix direction

Extend FileStamp with a content hash (blake3, matching zccache) and use it as the fallback when mtime does not match a previously-recorded stamp. The typical flow:

  1. Fast path: if (len, modified_secs, modified_nanos) matches last-seen stamp → assume unchanged, done.
  2. Fallback: compute blake3(content); if it matches last-seen content hash → treat as unchanged (and update the stamp to the new mtime so subsequent fast-path checks succeed without re-hashing).
  3. Mismatch: invalidate and rebuild.

This keeps the steady-state cost equal to today (pure stat call) while making cache-restore resilient. Content hashing only runs when mtime diverges.
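
The three-step flow above can be sketched as a pure decision function. This is a minimal illustration, not fbuild's actual code: the `Stamp` and `Check` types and the `check` function are hypothetical names, and a dependency-free FNV-1a hash stands in for blake3 so the sketch compiles without external crates.

```rust
/// Hypothetical stamp record: today's stat fields plus a content hash.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Stamp {
    pub len: u64,
    pub modified_secs: i64,
    pub modified_nanos: u32,
    pub content_hash: u64,
}

/// Outcome of comparing a cached stamp against the file on disk.
#[derive(Debug, PartialEq, Eq)]
pub enum Check {
    /// Step 1: stat fields match, content was never read.
    FastPathHit,
    /// Step 2: mtime differs but content matches; persist the refreshed stamp.
    ContentHit(Stamp),
    /// Step 3: content differs; invalidate and rebuild.
    Changed(Stamp),
}

/// FNV-1a, used here only as a stand-in for blake3.
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// `read` is only invoked when the stat fields diverge from the cached stamp.
pub fn check(
    cached: &Stamp,
    len: u64,
    secs: i64,
    nanos: u32,
    read: impl FnOnce() -> Vec<u8>,
) -> Check {
    if cached.len == len && cached.modified_secs == secs && cached.modified_nanos == nanos {
        return Check::FastPathHit;
    }
    let fresh = Stamp {
        len,
        modified_secs: secs,
        modified_nanos: nanos,
        content_hash: content_hash(&read()),
    };
    if fresh.len == cached.len && fresh.content_hash == cached.content_hash {
        Check::ContentHit(fresh)
    } else {
        Check::Changed(fresh)
    }
}
```

Passing the file read as a closure keeps the steady-state path free of I/O beyond the stat call, which is the property the fix is meant to preserve.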

TDD plan

RED

Add crates/fbuild-build/tests/fingerprint_survives_mtime_reset.rs:

// Pseudocode: the real test uses fbuild-test-support fixtures.
// create_test_project and FingerprintWatch::from_project stand in for those fixtures.
use std::time::{Duration, SystemTime};

use filetime::FileTime;
use walkdir::WalkDir;

#[test]
fn fingerprint_stable_across_mtime_reset() {
    let proj = create_test_project("uno");
    let watch = FingerprintWatch::from_project(&proj);

    let before = hash_watch_set_stamps(&[watch.clone()]).unwrap();

    // Simulate actions/cache tar-extract: move every tracked file's mtime.
    // One hour in the future guarantees the new mtime differs from the original.
    let new_time = FileTime::from_system_time(SystemTime::now() + Duration::from_secs(3600));
    for entry in WalkDir::new(&watch.root) {
        let p = entry.unwrap().into_path();
        if p.is_file() {
            filetime::set_file_mtime(&p, new_time).unwrap();
        }
    }

    let after = hash_watch_set_stamps(&[watch]).unwrap();
    assert_eq!(before, after, "fingerprint must be stable across mtime reset when content is unchanged");
}

Expected: fails today on the assert_eq! because before and after differ.

GREEN

  1. Add content_hash: String (blake3 hex, 64 chars) to FileStamp.
  2. Change hash_watch_set_stamps so the hasher update is keyed on (len, content_hash), not (len, modified_secs, modified_nanos). Mtime stays in the persisted stamp record as an optimization signal but is not part of the fingerprint hash.
  3. In hash_watch_set_stamps_cached, when deciding whether to recompute a stamp: if cached (len, modified_secs, modified_nanos) matches stat → reuse cached content_hash. Else → re-hash content, compare to cached hash, and if equal, rewrite the stamp record with the new mtime but keep the unchanged hash. Existing callers see a stable fingerprint across mtime reset.
  4. Migrate existing FileStamp JSON on disk: if field missing, re-hash once on next run.
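
Step 2 above could look roughly like the following sketch. It mirrors the field framing of the existing loop (the `file\0` tag, path, separator, length) but feeds the content hash where the mtime fields used to go. The `Entry` type and the `fingerprint` function are hypothetical, and FNV-1a again stands in for the real hasher so the example has no dependencies.

```rust
/// Hypothetical per-file record. Mtime is kept for the fast-path check
/// but is deliberately not fed into the fingerprint.
pub struct Entry<'a> {
    pub path: &'a str,
    pub len: u64,
    pub modified_secs: i64,
    pub modified_nanos: u32,
    /// 32 raw bytes, the size of a blake3 digest.
    pub content_hash: [u8; 32],
}

/// FNV-1a step, a stand-in for the real hasher's `update`.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Fingerprint keyed on (path, len, content_hash) only.
pub fn fingerprint(entries: &[Entry]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for e in entries {
        h = fnv1a(h, b"file\0");
        h = fnv1a(h, e.path.as_bytes());
        h = fnv1a(h, b"\0");
        h = fnv1a(h, &e.len.to_le_bytes());
        h = fnv1a(h, &e.content_hash);
        // modified_secs / modified_nanos are intentionally skipped.
    }
    h
}
```

With this shape, two records that differ only in mtime produce the same fingerprint, while any change to the content hash changes it.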

Regression coverage

  • Existing fast-path tests in crates/fbuild-build/src/build_fingerprint/fast_path.rs:350-493 must continue to pass (within-run reuse unchanged).
  • Add a second test fingerprint_invalidates_on_content_change — modify one byte, expect hash change — to guard against the fix being "always stable".
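
A rough, std-only sketch of what the content-change guard checks is shown below. It does not use fbuild's fixtures: `content_fingerprint` and `one_byte_edit_changes_fingerprint` are hypothetical helpers, and FNV-1a is a stand-in for blake3. The real test would call hash_watch_set_stamps instead.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// FNV-1a step, standing in for blake3.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical content-keyed fingerprint: path, length, and bytes. Never mtime.
pub fn content_fingerprint(files: &[PathBuf]) -> io::Result<u64> {
    let mut h: u64 = 0xcbf29ce484222325;
    for f in files {
        let bytes = fs::read(f)?;
        h = fnv1a(h, f.to_string_lossy().as_bytes());
        h = fnv1a(h, &(bytes.len() as u64).to_le_bytes());
        h = fnv1a(h, &bytes);
    }
    Ok(h)
}

/// Writes a file, fingerprints it, changes one byte, and fingerprints again.
/// Returns true when the two fingerprints differ.
pub fn one_byte_edit_changes_fingerprint(dir: &Path) -> io::Result<bool> {
    fs::create_dir_all(dir)?;
    let file = dir.join("main.cpp");
    fs::write(&file, b"int main() { return 0; }\n")?;
    let before = content_fingerprint(&[file.clone()])?;
    // Same length, one byte different.
    fs::write(&file, b"int main() { return 1; }\n")?;
    let after = content_fingerprint(&[file.clone()])?;
    Ok(before != after)
}
```

Keeping the edited file the same length matters: it shows that the length field alone cannot catch the change, so the content hash has to.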

Impact

On the FastLED uno benchmark this is the majority of the warm-cache CI time: ~124s of the ~142s compile phase is the per-example tail, which would drop to sub-second per example if the cache worked. Expect the warm job total to drop from 174s → ~40–50s once this lands, which is the primary target for #112.

Related

  • [META] Fastest possible FastLED examples CI rebuild — profile + benchmark #112 — CI benchmark tracking this and surrounding optimizations.
  • zccache daemon state (~/.zccache/ or equivalent) is also not in the bench workflow's actions/cache list — separate follow-up; even with this fix, zccache content-cache won't persist until that directory is cached too. Filing separately.
  • compiler.rs:303 absolute-toolchain-path in rebuild signature — separate follow-up (above).

Labels: bug, priority: p1