docs(ci): document cache save/restore design for CI runners (GH Actions, etc.) #92

@zackees

Problem

fbuild has a sophisticated local disk cache (~/.fbuild/{dev,prod}/cache/, disk_cache:: module with LRU index, leases, GC budget) but there is no documented design for using it in CI — i.e., how to persist the cache across GH Actions / GitLab CI / Buildkite runs so a cold runner is effectively warm on the second build.

This is blocking the FastLED repo's CI from realising fbuild's cache advantages. It is also the hidden premise behind issue #91 (warm-pass 30 s stall) — until we nail down what a "warm CI run" even means, perf targets are fuzzy.

Acceptance criteria

  • A new docs/CI_CACHING.md (and matching docs/INDEX.md entry) that answers the questions below.
  • A drop-in snippet for actions/cache@v4 that FastLED's CI can adopt unchanged.
  • Cache-key strategy that avoids the three failure modes: stale cache hit (silent wrong output), cache miss on trivial edit (wasted time), and cache explosion (budget breach).

Questions to answer in the doc

1. What to cache

  • Paths: is it ~/.fbuild/, just ~/.fbuild/cache/, or selective subdirs (toolchains vs packages vs build artifacts vs daemon state)?
  • What MUST be excluded? (daemon/daemon.port, daemon/fbuild_daemon.pid, daemon/daemon.log*, any lock files, any leases)
  • Size expectations per platform: the ESP32 cache is on the order of gigabytes, so we need a budget that fits within the actions/cache 10 GB limit.
  • Windows-specific paths (%USERPROFILE%\.fbuild\...) vs Unix.
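
To make the path and exclusion questions above concrete, here is a minimal sketch of how a CI step might enumerate what would be cached. It is not fbuild code. The assumptions are: the cache root is `.fbuild` under the user's home directory on every OS, the `daemon/` directory may sit at any depth under that root, and lock files end in `.lock` (the issue only says "any lock files"). Lease files are not handled because their naming is not stated here.

```python
import fnmatch
from pathlib import Path

# The 10 GB actions/cache limit quoted above. Sizes here are uncompressed,
# so this is a conservative check.
ACTIONS_CACHE_LIMIT_BYTES = 10 * 1024**3

# Exclusion candidates from the list above. The leading "*/" lets a pattern
# match a daemon/ directory at any depth (assumption). "*.lock" is a guess
# at how lock files are named.
EXCLUDE_PATTERNS = (
    "*/daemon/daemon.port",
    "*/daemon/fbuild_daemon.pid",
    "*/daemon/daemon.log*",
    "*.lock",
)


def default_fbuild_root() -> Path:
    """~/.fbuild on Unix, %USERPROFILE%\\.fbuild on Windows."""
    return Path.home() / ".fbuild"


def is_excluded(rel_path: str) -> bool:
    """True if a path relative to the fbuild root must not be cached."""
    candidate = "/" + rel_path.replace("\\", "/")
    return any(fnmatch.fnmatchcase(candidate, pat) for pat in EXCLUDE_PATTERNS)


def cacheable_files(root: Path):
    """Yield (relative_path, size_in_bytes) for every file that may be cached."""
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if not is_excluded(rel):
                yield rel, path.stat().st_size


def fits_budget(root: Path, limit: int = ACTIONS_CACHE_LIMIT_BYTES) -> bool:
    """True if the cacheable files under root total at most `limit` bytes."""
    return sum(size for _, size in cacheable_files(root)) <= limit
```

Running `fits_budget(default_fbuild_root())` on a runner after a full build would give a first data point for the per-platform size question.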

2. Cache key strategy

  • Primary key inputs — in priority order. Candidates: fbuild version, platformio.ini hash, board-JSON set hash, toolchain version, OS/arch, source tree hash.
  • Restore-key fallback chain for partial hits.
  • How to reason about cache validity across fbuild upgrades — does disk_cache::reconcile_on_open handle a stale cache gracefully, or do we bust on fbuild version?
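
One way to picture the primary key and the restore-key fallback chain is the sketch below. The ordering of inputs (OS/arch, then fbuild version, toolchain version, platformio.ini hash, source tree hash) is an assumption chosen for illustration, not a decided priority order, and the board-JSON set hash is left out for brevity. The sketch relies on actions/cache matching restore keys by prefix, so each fallback simply drops the most volatile remaining component.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, length: int = 16) -> str:
    """Short SHA-256 hex digest of a file, e.g. for platformio.ini."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:length]


def cache_keys(fbuild_version, os_arch, toolchain_version, ini_hash,
               source_hash, prefix="fbuild"):
    """Return (primary_key, restore_keys).

    Components run from least to most volatile (assumed order). Each restore
    key is a prefix of the primary key with one more trailing component
    removed, so the closest older cache is tried first.
    """
    parts = [
        prefix,
        os_arch,
        f"v{fbuild_version}",
        f"tc{toolchain_version}",
        f"ini{ini_hash}",
        f"src{source_hash}",
    ]
    primary = "-".join(parts)
    # Stop at prefix + OS/arch: a cache from another OS/arch is never useful.
    restore = ["-".join(parts[:n]) + "-" for n in range(len(parts) - 1, 1, -1)]
    return primary, restore
```

Whether the fbuild version belongs in the key at all depends on the answer to the `disk_cache::reconcile_on_open` question above. If a stale cache is reconciled safely, that component could move to the end of the chain or be dropped.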

3. Hermeticity

  • Can a restored cache produce byte-identical outputs to a fresh build? If not, what's the tolerance (mtime-stamped objects? build-id?)?
  • Do reproducible builds apply? If only partially, document which parts are reproducible (object files? archives? final ELF?).
  • Are there runner-pinned paths embedded in cached artifacts (response files with absolute paths, debug info with build host paths)? These will cause cross-runner cache poisoning.
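
The hermeticity questions above could be answered empirically with a small check like the following. It is a sketch under assumptions: the two directory arguments stand for the output of a fresh build and of a build from a restored cache (where fbuild actually writes outputs is not specified here), and the byte string passed as `needle` is whatever absolute path prefix the runner uses.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_trees(fresh: Path, restored: Path) -> list[str]:
    """Relative paths that are missing on one side or differ in content."""
    a, b = tree_digests(fresh), tree_digests(restored)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def files_embedding(root: Path, needle: bytes) -> list[str]:
    """Relative paths of files whose bytes contain `needle`,
    for example a runner-specific absolute path prefix."""
    return [
        p.relative_to(root).as_posix()
        for p in sorted(root.rglob("*"))
        if p.is_file() and needle in p.read_bytes()
    ]
```

An empty `diff_trees` result means the two builds are byte-identical. A non-empty `files_embedding` result identifies artifacts that could poison the cache across runners.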

4. Invalidation

  • disk_cache::budget::autoscales_to_disk — how does this behave on a runner with 14 GB free?
  • What triggers eviction on restore? Does reconcile_on_open do a full rescan (slow, bad for warm pass)?
  • Pinning: CI should probably run fbuild cache pin <toolchain> so that shared toolchains are never LRU-evicted. Confirm that a pin API exists or file a follow-up.

5. Daemon interaction

  • CI typically runs one-shot (cold start → build → exit). Does fbuild's daemon support a --foreground --oneshot mode that skips socket setup, or do CI runs need fbuild --no-daemon?
  • If the daemon is used, what's the recommended teardown so the next CI job doesn't inherit stale state through the cache?
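
Until the teardown question above is settled, a CI job could defensively scrub daemon state before the cache is saved. The sketch below is one possible shape for that step, not an fbuild feature. It assumes the daemon has already exited (how to stop it is part of the open question), and that the file names match the exclusion candidates listed under "What to cache".

```python
from pathlib import Path

# File names taken from the exclusion candidates in section 1.
STALE_DAEMON_GLOBS = ("daemon.port", "fbuild_daemon.pid", "daemon.log*")


def scrub_daemon_state(fbuild_root: Path) -> list[str]:
    """Delete daemon port/pid/log files under any daemon/ directory.

    Returns the removed paths relative to fbuild_root. Run this only after
    the daemon has exited and before the cache-save step.
    """
    removed = []
    daemon_dirs = sorted(p for p in fbuild_root.rglob("daemon") if p.is_dir())
    for daemon_dir in daemon_dirs:
        for pattern in STALE_DAEMON_GLOBS:
            for stale in sorted(daemon_dir.glob(pattern)):
                if stale.is_file():
                    stale.unlink()
                    removed.append(stale.relative_to(fbuild_root).as_posix())
    return removed
```

Scrubbing before save means a restored cache never contains a port or PID from a previous runner, regardless of how the paths were listed in the cache step.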

6. Worked examples

7. Related tooling audit

  • Does fbuild itself expose an fbuild cache export <tarball> / fbuild cache import <tarball> helper that CI could use instead of raw actions/cache? Document the current state and file a follow-up if it is missing.
  • How does disk_cache::lease interact with two concurrent CI shards sharing a cache? Is locking safe?

Non-goals

  • Implementing new cache features in this issue — that's downstream.
  • Redesigning the local disk cache — this doc describes what exists and how to use it in CI.

Related
