181 changes: 181 additions & 0 deletions docs/CI_CACHING.md
@@ -0,0 +1,181 @@
# CI caching: save/restore design for continuous integration runners

This doc answers: *how do I persist fbuild's cache across CI runs so a cold runner gets a near-warm build?*

It is written for a consumer CI pipeline (e.g., FastLED's GitHub Actions matrix) and describes the contract the cache exposes, the failure modes to avoid, and a drop-in snippet that works out of the box.

For the *local* developer experience see [DEVELOPMENT.md](DEVELOPMENT.md). For *why* the cache exists and its performance targets see [WHY.md](WHY.md). The internal disk-cache implementation lives in `crates/fbuild-packages/src/disk_cache/` with its own [README](../crates/fbuild-packages/src/disk_cache/README.md).

---

## TL;DR — drop-in GitHub Actions snippet

```yaml
- name: Restore fbuild cache
  uses: actions/cache@v4
  with:
    # Shared package cache plus per-project build outputs
    # (the second path assumes the project lives at the repository root).
    path: |
      ~/.fbuild/prod/cache
      .fbuild/build
    # The sentinel hash sits in the restore-keys prefix too, so bumping it really starts cold.
    key: fbuild-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('.fbuild-cache-version') }}-${{ hashFiles('platformio.ini', '**/boards.txt', 'rust-toolchain.toml') }}
    restore-keys: |
      fbuild-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('.fbuild-cache-version') }}-
```

Adjust the `hashFiles(...)` inputs to whatever files actually change the build graph in your project. `.fbuild-cache-version` is a sentinel file you bump to force a cache bust; you can also use it to record the fbuild version you depend on, so that upgrading fbuild changes the key (see [Cache-key strategy](#cache-key-strategy) below).

**Do not cache** `~/.fbuild/*/daemon/` — it's runtime state (port file, PID file, log), and restoring it across runs will make the next client try to connect to a dead daemon.
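If you cache a broader tree than the TL;DR does (for example all of `~/.fbuild`), a minimal sketch using `actions/cache`'s `!` exclusion patterns keeps the daemon directories out. It assumes nothing else under `~/.fbuild` is runtime state; check `crates/fbuild-paths/src/lib.rs` before widening the cache like this. The `fbuild-all-` key prefix is hypothetical and only keeps these entries separate from the TL;DR's.

```yaml
- name: Restore fbuild cache (whole ~/.fbuild tree)
  uses: actions/cache@v4
  with:
    # Lines starting with '!' exclude matches. The daemon dirs hold port/PID/log
    # files that point at a daemon which will not exist on the next run.
    path: |
      ~/.fbuild
      !~/.fbuild/*/daemon
    key: fbuild-all-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('platformio.ini') }}
    restore-keys: |
      fbuild-all-${{ runner.os }}-${{ runner.arch }}-
```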

---

## What the cache holds

| Path | What lives here | Cache in CI? |
|---|---|---|
| `~/.fbuild/prod/cache/archives/` | Downloaded toolchain + framework + library tarballs (pre-extract) | **Yes** |
| `~/.fbuild/prod/cache/installed/` | Extracted, usable toolchains, frameworks, libraries | **Yes** |
| `~/.fbuild/prod/cache/index.sqlite` | LRU index that maps entries to their URLs/versions | **Yes** (must match archives + installed) |
| `<project>/.fbuild/build/` | Per-project build outputs (object files, archives, compile DB, firmware) | **Yes** (the warm-build fast path depends on this) |
| `~/.fbuild/prod/daemon/` | Daemon PID, port, log, status — **ephemeral runtime state** | **No** |

fbuild's cache root defaults to `~/.fbuild/{prod|dev}/cache/` but can be redirected with `FBUILD_CACHE_DIR`. Project-level build outputs default to `<project>/.fbuild/build/` but can be redirected with `FBUILD_BUILD_DIR` — useful on Windows where path lengths matter.

See `crates/fbuild-paths/src/lib.rs` for the authoritative path list.
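Because both roots can be redirected, one option is to point them at short, known paths so the cache `path:` list no longer depends on `$HOME` or on where the project sits. A sketch, assuming `FBUILD_CACHE_DIR` and `FBUILD_BUILD_DIR` take absolute directory paths, that one project is built per job, and that the daemon inherits these variables from the job environment; the `.fbuild-ci/` directory name is hypothetical.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # Hypothetical locations; any stable, writable directory should do.
      FBUILD_CACHE_DIR: ${{ github.workspace }}/.fbuild-ci/cache
      FBUILD_BUILD_DIR: ${{ github.workspace }}/.fbuild-ci/build
    steps:
      - uses: actions/checkout@v4
      - run: pip install fbuild
      - name: Restore fbuild cache
        uses: actions/cache@v4
        with:
          # Both redirected roots are cached; ~/.fbuild/*/daemon is never listed.
          path: |
            ${{ env.FBUILD_CACHE_DIR }}
            ${{ env.FBUILD_BUILD_DIR }}
          key: fbuild-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('platformio.ini') }}
          restore-keys: |
            fbuild-${{ runner.os }}-${{ runner.arch }}-
      - run: fbuild build examples/Blink -e uno
```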

## Cache-key strategy

The goal is to maximize hit rate without producing wrong output. In priority order, the key should discriminate on:

1. **Runner OS + arch**: `${{ runner.os }}-${{ runner.arch }}`. Toolchains are platform-pinned; sharing across OSes corrupts builds.
2. **fbuild version**: if you install fbuild from the PyPI wheel, include the output of `fbuild --version` in the key. Internal fingerprints and cache-index formats are stable within a minor version but may change on major bumps.
3. **Graph inputs**: `platformio.ini`, per-board JSONs, `rust-toolchain.toml`, and any file that defines `lib_deps`. A change to these invalidates the warm-build fingerprint anyway, so it's cheap to bake them into the key and avoid carrying obsolete artifacts.
4. **Manual bump**: a `.fbuild-cache-version` sentinel in the repo lets you force-invalidate with a one-line commit when the cache has gone stale for reasons GitHub Actions can't see (for example, a runner image change).
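Items 2 and 4 can be folded into the key with a small step that runs before the restore. A sketch, assuming fbuild is already installed at that point; the step id `fbuild-version` is hypothetical. Characters outside `[A-Za-z0-9._-]` are replaced because `actions/cache` keys may not contain commas, and the version stays in the fallback prefix so a different fbuild release never restores an index in an older format.

```yaml
- run: pip install fbuild

- name: Compute fbuild version for the cache key
  id: fbuild-version  # hypothetical id, referenced in the key below
  shell: bash
  run: |
    v="$(fbuild --version)"
    # Replace spaces, commas, newlines, etc. so the key stays valid.
    echo "value=${v//[^A-Za-z0-9._-]/_}" >> "$GITHUB_OUTPUT"

- name: Restore fbuild cache
  uses: actions/cache@v4
  with:
    path: |
      ~/.fbuild/prod/cache
    key: fbuild-${{ runner.os }}-${{ runner.arch }}-${{ steps.fbuild-version.outputs.value }}-${{ hashFiles('.fbuild-cache-version') }}-${{ hashFiles('platformio.ini', 'rust-toolchain.toml') }}
    restore-keys: |
      fbuild-${{ runner.os }}-${{ runner.arch }}-${{ steps.fbuild-version.outputs.value }}-${{ hashFiles('.fbuild-cache-version') }}-
```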

### Restore-key fallback chain

GitHub Actions supports partial hits via `restore-keys`. Use them: a partial hit still restores the toolchains and frameworks (by far the biggest download cost) even when per-project build outputs will be rebuilt.

```yaml
restore-keys: |
  fbuild-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('platformio.ini') }}-
  fbuild-${{ runner.os }}-${{ runner.arch }}-
```

The index SQLite file is the only thing that MUST match the archives/installed directories it references — and it does, because they're all under one cache path and restored atomically.

## Hermeticity

- **Toolchain binaries, frameworks, and libraries are content-addressed**: given identical inputs they produce bit-for-bit identical intermediate objects, regardless of which runner compiled them. Safe to share across matrix jobs on the same OS and architecture.
- **Compile outputs under `.fbuild/build/` are reproducible in file *content* but not always in *mtime***. fbuild's build fingerprint is content-hashed, not mtime-based, so a restored cache does not need mtime preservation from the CI runner — `needs_rebuild` keys on depfile contents and command-hash, not timestamps. This means `actions/cache@v4` restoring without mtime fidelity is fine.
- **Absolute path embedding**: response files (`*.rsp`) under `.fbuild/build/` contain absolute paths to `~/.fbuild/.../installed/...`. On a runner where `$HOME` differs between runs (uncommon but possible), response files become stale. Mitigate by pinning `HOME` to a stable runner path, or by bumping `.fbuild-cache-version` when you change runner images.
- **Debug info**: optimized release builds don't embed source paths beyond what DWARF requires. If you ship debuggable firmware, embedded paths may differ across runners; shard the cache per runner if this matters.
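As an alternative to pinning `HOME` or bumping the sentinel by hand, the key can carry a short hash of `$HOME` and the runner image name, so a change in either produces a miss. A sketch, assuming a GitHub-hosted runner, which sets `ImageOS` in the process environment (self-hosted runners may not). Runner environment variables are not visible through the `env` expression context, so a step copies them into an output; the step id `runner-desc` is hypothetical.

```yaml
- name: Describe the runner for the cache key
  id: runner-desc  # hypothetical id, referenced in the key below
  shell: bash
  run: |
    # ImageOS is set on GitHub-hosted runners; fall back to a fixed string if unset.
    desc="${HOME}|${ImageOS:-unknown}"
    # cksum is POSIX; keep only the checksum field so the key stays short.
    echo "value=$(printf '%s' "$desc" | cksum | awk '{print $1}')" >> "$GITHUB_OUTPUT"

- name: Restore fbuild cache
  uses: actions/cache@v4
  with:
    path: |
      ~/.fbuild/prod/cache
      .fbuild/build
    key: fbuild-${{ runner.os }}-${{ runner.arch }}-${{ steps.runner-desc.outputs.value }}-${{ hashFiles('platformio.ini') }}
    # The runner hash is in the fallback prefix too, so a different $HOME never restores.
    restore-keys: |
      fbuild-${{ runner.os }}-${{ runner.arch }}-${{ steps.runner-desc.outputs.value }}-
```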

## Invalidation

`disk_cache::reconcile_on_open` runs **on daemon startup**, not per build. On CI, where the daemon starts fresh in each job, reconcile runs exactly once per job and is fast: it walks `index.sqlite`, verifies that the referenced paths exist, and removes orphans. There is no full filesystem rescan per build.

The LRU/GC budget (`crates/fbuild-packages/src/disk_cache/budget.rs`) auto-scales to free disk space. On a GitHub Actions runner with 14 GB free, this lands on a ~10 GB budget, which already sits at GitHub's default 10 GB cache quota. That quota is a total per repository, not a per-entry limit, and GitHub evicts older entries once it is exceeded, so a matrix with several cache keys can overflow it quickly. On bigger runners or wide matrices, pin the budget explicitly, e.g. `FBUILD_CACHE_BUDGET=8G` (or whatever fits), so that the combined size of your cache entries stays within the quota.
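A minimal sketch that pins the budget for the whole job and logs the resulting cache size, assuming `FBUILD_CACHE_BUDGET` accepts the same `8G`-style value shown above and that the daemon picks it up from the environment it inherits from the first `fbuild` invocation. `du` measures on-disk size, which only approximates the compressed entry that `actions/cache` uploads.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # Job-level, so the daemon spawned by the first fbuild call inherits it.
      FBUILD_CACHE_BUDGET: 8G
    steps:
      # ... checkout, install, restore, and build as in the TL;DR snippet ...
      - name: Report fbuild cache size
        if: always()
        shell: bash
        run: |
          fbuild cache stats
          du -sh ~/.fbuild/prod/cache || true
```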

### Forcing a rebuild in CI

To force a cache miss, and with it a cold rebuild, without deleting any stored entries, set a workflow variable:

```yaml
env:
  FBUILD_CACHE_VERSION_BUMP: "2026-04-18-purge"
```

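Then include `${{ env.FBUILD_CACHE_VERSION_BUMP }}` in your cache key, ahead of the other components and in every `restore-keys` prefix; otherwise the fallback chain still restores the old entry. No code change is required.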

## Daemon interaction in CI

CI runs are one-shot: runner starts → clone → build → exit. fbuild's daemon is optimized for long-lived interactive sessions, but it works fine for one-shot runs too:

- The first `fbuild` invocation on a fresh runner spawns the daemon (about 200 ms since #91's F2 landed, down from 2.2 s).
- Subsequent invocations within the job talk to the already-running daemon.
- When the CI job ends, the runner terminates all processes, and the daemon dies with them. `SELF_EVICTION_TIMEOUT` (30 s since #91's F4) is never reached in a normal job; it is only an insurance policy in case the runner hangs.

**You do not need** `fbuild --no-daemon` on CI. The daemon path is the fast path.

Do **not** cache `~/.fbuild/.../daemon/`. The port and PID files refer to a daemon that no longer exists and will confuse the next run's client.

### Matrix sharding

Matrix jobs that share one `actions/cache` key all restore the same entry, and each tries to save it back at the end; cache entries are immutable per key, so only one save wins and the others are rejected with a warning. Within fbuild, the sqlite-backed index takes a cross-process advisory lease (`disk_cache::lease`). Two daemons on the same cache directory (not possible on a single runner, but possible if two shards mount the same persistent volume) serialize through sqlite: safe, but slow. If you see that contention, prefer per-shard caches keyed by a shard identifier.

## Tooling audit

As of this writing:

- `fbuild cache export <tarball>` / `fbuild cache import <tarball>`: **not implemented**. `actions/cache@v4` already handles archiving and extraction; a native helper would only be needed for non-GHA CI systems.
- `fbuild cache pin <entry>`: **not implemented**. LRU eviction is recency-based; if you need to guarantee that a toolchain is never evicted from a shared cache, file a follow-up.
- `fbuild cache stats`: **implemented**, backed by `DiskCache::stats()`, which reports size and entry counts. Useful for CI debug output.

## Worked example: FastLED's matrix

A FastLED build matrix might look like:

```yaml
strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    board: [uno, esp32dev, esp32s3, teensy41]

runs-on: ${{ matrix.os }}

steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with: { python-version: '3.12' }
  - run: pip install fbuild

  - name: Restore fbuild cache
    uses: actions/cache@v4
    with:
      path: |
        ~/.fbuild/prod/cache
        examples/Blink/.fbuild/build
      # Expressions cannot be nested, so the per-board file name is built with format().
      key: fbuild-${{ matrix.os }}-${{ matrix.board }}-${{ hashFiles('platformio.ini', format('fbuild-boards/{0}.json', matrix.board)) }}
      restore-keys: |
        fbuild-${{ matrix.os }}-${{ matrix.board }}-
        fbuild-${{ matrix.os }}-

  - name: Build
    run: fbuild build examples/Blink -e ${{ matrix.board }}

  - name: Print cache stats
    if: always()
    run: fbuild cache stats
```

## Expected timings

From #91's profiling run (see `tasks/issue-91-report.md`):

| Scenario | Wall-clock |
|---|---:|
| Warm build, hot daemon, no-op rebuild | 34-71 ms internal, plus HTTP round-trip and CLI teardown |
| Warm build, cold daemon (spawn required) | about 200 ms more, for spawn and the first healthy poll |
| Cache-miss build (full toolchain download + compile) | Network-dependent; on a good runner, dominated by compile time |

**Note:** the 30 s "warm build stall" originally filed in #91 turned out to be a Windows interactive-shell handle-inheritance bug, not a cache path issue. CI runners with no attached interactive shell do not hit that class of bug.

## Smoke test: confirm your cache is warm

Add a CI step after the build that times a second build in the same job and fails if it is slow:

```yaml
- name: Second build should be a near-no-op
  shell: bash  # needed on windows-latest, where the default shell is pwsh
  run: |
    start=$SECONDS
    fbuild build examples/Blink -e ${{ matrix.board }}
    elapsed=$((SECONDS - start))
    test "$elapsed" -lt 10 || { echo "::error::warm second build took ${elapsed}s (expected < 10s)"; exit 1; }
```

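Adjust the threshold to match your project; for a small FastLED sketch, a warm second build takes single-digit seconds. Because the first build already spawned the daemon and populated the build directory, this step exercises fbuild's warm-build path rather than proving that the cache restore hit; to judge the restore, look at how long the *first* build took.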
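To see whether the restore itself hit, a sketch that reads the `cache-hit` output of `actions/cache`: it is `'true'` on an exact key match, `'false'` when one of the `restore-keys` matched, and empty on a miss. The step id `fbuild-cache` is hypothetical; add an `id` like it to your existing restore step.

```yaml
- name: Restore fbuild cache
  id: fbuild-cache  # hypothetical id, referenced below
  uses: actions/cache@v4
  with:
    path: |
      ~/.fbuild/prod/cache
    key: fbuild-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('platformio.ini') }}
    restore-keys: |
      fbuild-${{ runner.os }}-${{ runner.arch }}-

- name: Report how the fbuild cache was restored
  shell: bash
  run: |
    # 'true' = exact key match, 'false' = restore-keys fallback, '' = miss.
    case "${{ steps.fbuild-cache.outputs.cache-hit }}" in
      true)  echo "fbuild cache: exact hit" ;;
      false) echo "::notice::fbuild cache: partial hit via restore-keys" ;;
      *)     echo "::warning::fbuild cache: miss, expect a cold build" ;;
    esac
```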

## See also

- [WHY.md](WHY.md) — performance targets fbuild aims for.
- [architecture/overview.md](architecture/overview.md) — cache lives in `fbuild-packages`.
- `crates/fbuild-packages/src/disk_cache/README.md` — internal structure of the cache directory.
- #91 — warm-path profiling data that informed this doc's timings.
- #92 — issue this doc closes.
2 changes: 2 additions & 0 deletions docs/INDEX.md
@@ -23,6 +23,8 @@ A grep-friendly FAQ that maps common questions to the file that answers them. Bo
| How do I run tests / lint / fmt? | [DEVELOPMENT.md](DEVELOPMENT.md#testing) |
| Why is my build failing? | [DEVELOPMENT.md](DEVELOPMENT.md#troubleshooting) |
| How do I use the emulator (QEMU / avr8js / simavr)? | [../README.md](../README.md#emulator-testing) |
| How do I cache fbuild across CI runs? | [CI_CACHING.md](CI_CACHING.md) |
| What's safe to cache in GitHub Actions? | [CI_CACHING.md](CI_CACHING.md#what-the-cache-holds) |
| What architecture docs should I read for a given crate? | [CLAUDE.md](CLAUDE.md) |

## Conventions
1 change: 1 addition & 0 deletions docs/README.md
@@ -10,6 +10,7 @@ Architecture and design documentation for the fbuild project.
- **`WHY.md`** -- Why fbuild exists, key benefits, performance benchmarks
- **`BOARD_STATUS.md`** -- Per-platform CI badges and supported boards
- **`DEVELOPMENT.md`** -- Testing, troubleshooting, local development setup
- **`CI_CACHING.md`** -- How to save/restore fbuild's cache across CI runs
- **`DESIGN_DECISIONS.md`** -- ADR-style decisions with rationale
- **`ROADMAP.md`** -- Implementation phases for the Rust port
- **`architecture/`** -- Subsystem-specific architecture documents