ESP32 sketch builds fail when project_dir is relative — compile_cwd vs -o path mismatch (missing core/ parent dir) #282

@zackees

Context — main has been failing ESP32 CI for ~a week

Build ESP32 Dev, Build ESP32-S3, Build ESP32-S2, Build ESP32-P4, Build ESP32-C3, Build ESP32-H2 (and likely others) have been failing on every main push since 2026-05-28. The most recent green run on Build ESP32 Dev is 437d8f7d (2026-05-24); the first red is 465aa12c (2026-05-28); current main 5cb265aa still red.

The error is identical across boards (with different filenames):

build error: build failed: compilation failed for /home/runner/.fbuild/prod/cache/platforms/framework-arduinoespressif32/9ef436ac06b7bf7f/3.3.8/esp32-core-3.3.8/cores/esp32/HWCDC.cpp:
HWCDC.cpp:15: fatal error: opening dependency file tests/platform/esp32dev/.fbuild/build/esp32dev/quick/core/HWCDC_57cf.cpp.d: No such file or directory

gcc's -MMD -MF <path> fails to write the dep file because the core/ parent dir doesn't exist where gcc is actually trying to write it. zccache session stats show errors=7 per build — only core/ files affected (cached as-is for every other source).

Root cause — latent CWD-vs--o path mismatch in compile dispatch

When the CLI is invoked with a relative project_dir (which CI does: fbuild build tests/platform/esp32dev -e esp32dev --quick), the relative path propagates to the daemon and into core_build_dir, but the gcc subprocess gets exec'd with an absolute current_dir, so the relative -o resolves against the wrong base, producing a doubled, never-created path:

  • crates/fbuild-cli/src/cli/build.rs:84: the CLI sends project_dir to the daemon without calling normalize_path (the monitor/deploy paths do; build doesn't).
  • crates/fbuild-daemon/src/handlers/operations/build.rs:22: the daemon does PathBuf::from(&req.project_dir), so the path stays relative.
  • crates/fbuild-packages/src/cache.rs:114: core_build_dir is therefore the relative tests/platform/esp32dev/.fbuild/build/esp32dev/quick/core/.
  • crates/fbuild-build/src/compiler.rs:560: create_dir_all(output.parent()) succeeds for the relative output (resolved against the daemon's cwd = repo root). The core/ dir exists on disk as a relative path from the repo root.
  • crates/fbuild-build/src/zccache.rs:187-201 (compile_cwd_from_output): walks up from the relative output, finds .fbuild, returns the workspace dir, then canonicalizes it to an absolute path (/home/runner/work/fbuild/fbuild/tests/platform/esp32dev).
  • crates/fbuild-build/src/zccache.rs:209-224 (path_arg_for_compile_cwd): short-circuits on relative paths:
    if !path.is_absolute() {
        return path.to_string_lossy().to_string();   // <-- bails on relative
    }
    so the -o arg stays the raw relative tests/platform/esp32dev/.fbuild/.../HWCDC_57cf.cpp.o.
  • crates/fbuild-build/src/compiler.rs:621: run_command(args, Some(compile_cwd), ...) execs gcc with CWD = the absolute /home/runner/work/fbuild/fbuild/tests/platform/esp32dev and -o = the relative tests/platform/esp32dev/.fbuild/.../HWCDC_57cf.cpp.o. gcc resolves that against its CWD → /home/runner/work/fbuild/fbuild/tests/platform/esp32dev/tests/platform/esp32dev/.fbuild/build/esp32dev/quick/core/HWCDC_57cf.cpp.o. The core/ directory under that doubled path was never created by create_dir_all, so -MMD -MF (and the .o write) fail.
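
The chain above can be reproduced in isolation. The following is a minimal sketch, not the actual fbuild code: only the relative-path short-circuit is taken from the snippet quoted above, the absolute branch is an assumption based on the prefix-stripping behavior described in this issue, and resolved_by_child is a hypothetical helper that models how a child process resolves a relative argument against its cwd. Paths are Unix-style.

```rust
use std::path::{Path, PathBuf};

/// Simplified stand-in for path_arg_for_compile_cwd.
/// Relative paths are returned untouched (the short-circuit quoted above);
/// the absolute branch (strip the cwd prefix) is an assumption.
fn path_arg_for_compile_cwd(path: &Path, compile_cwd: &Path) -> String {
    if !path.is_absolute() {
        // Relative input is passed through as-is, even though the child
        // process will resolve it against a different base directory.
        return path.to_string_lossy().to_string();
    }
    path.strip_prefix(compile_cwd)
        .unwrap_or(path)
        .to_string_lossy()
        .to_string()
}

/// Hypothetical helper: where a child started in `cwd` would write `arg`.
fn resolved_by_child(cwd: &Path, arg: &str) -> PathBuf {
    cwd.join(arg)
}
```

Feeding the relative output from the CI invocation through these two functions, with the canonicalized workspace dir as cwd, yields the doubled tests/platform/esp32dev/tests/platform/esp32dev/... path; feeding the absolute form of the same output yields a workspace-relative argument that resolves back to the intended file.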

The bug landed weeks ago in ada3b603 ("build: stabilize zccache compile cwd", #191) and dab5a0cb ("build: normalize zccache compile paths", #193 — added the canonicalize-to-absolute that locked in the asymmetry).

Why it surfaced now

The fbuild source is byte-identical between 437d8f7d (last green) and 9520cebb (first red) — git diff 437d8f7d..9520cebb -- crates/ tests/platform/ produces zero lines; the only two commits in the window are a version bump (#276) and an unrelated musl-release workflow tweak.

The trigger was zackees/setup-soldr@v0 (floating tag) picking up soldr 0.7.33 → 0.7.42 between the two runs. That changed the toolchain-cache hash (17de77947111959f6f8cb3e0230dbd69), invalidating every prior build-cache entry. Before the upgrade, every CI run was restoring cached .o files from a previous warm run, so the cold-compile dispatch path (where the bug actually triggers) was effectively never exercised. After the upgrade, all sources cold-compile → the latent bug hits every framework core/ source.
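
To illustrate why a toolchain-hash change turns every lookup into a miss, here is a deliberately simplified sketch. ObjectCache and its key layout are hypothetical and do not describe how fbuild or zccache actually key their entries; the only point is that when the toolchain hash is part of the key, a new hash leaves every previously stored object unreachable.

```rust
use std::collections::HashMap;

/// Hypothetical object cache keyed by (toolchain hash, source path).
struct ObjectCache {
    entries: HashMap<(String, String), Vec<u8>>,
}

impl ObjectCache {
    fn new() -> Self {
        ObjectCache { entries: HashMap::new() }
    }

    /// Record the compiled object for a source under a given toolchain hash.
    fn store(&mut self, toolchain_hash: &str, source: &str, object: Vec<u8>) {
        self.entries
            .insert((toolchain_hash.to_string(), source.to_string()), object);
    }

    /// A hit lets the build skip the compile dispatch entirely;
    /// a miss forces a cold compile of that source.
    fn lookup(&self, toolchain_hash: &str, source: &str) -> Option<&Vec<u8>> {
        self.entries
            .get(&(toolchain_hash.to_string(), source.to_string()))
    }
}
```

With an entry stored under the old hash, a lookup under the new hash returns None for the same source, so the cold-compile path runs for every file at once.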

Pinning soldr would mask it; the fix has to be in fbuild.

Acceptance criteria

  • Build ESP32 Dev (and the other 5 failing ESP32 workflows) pass on main from a fully cold cache (i.e. without relying on a build-cache restore to skip the affected compile path).
  • A regression test that invokes the compile dispatch with a relative project_dir and asserts the gcc invocation receives consistent cwd/-o paths (or that the absolute .o lands where expected). The existing zccache_hit_across_workspace_rename.rs uses absolute temp dirs only and misses this.
  • The fix does not require pinning zackees/setup-soldr — soldr can keep floating on v0.
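
One possible shape for the regression check in the second criterion is sketched below. CompileInvocation, arg_after and outputs_are_writable are hypothetical names, not existing fbuild test helpers; the sketch assumes the test can capture the cwd and argv that would be handed to gcc, and it checks the property that actually failed in CI: the parent directory of each output path, resolved from the child's cwd, must already exist.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical record of one compiler launch: working directory plus argv.
struct CompileInvocation {
    cwd: PathBuf,
    args: Vec<String>,
}

/// Return the value following `flag` (for example "-o" or "-MF"), if present.
fn arg_after<'a>(args: &'a [String], flag: &str) -> Option<&'a str> {
    args.iter()
        .position(|a| a.as_str() == flag)
        .and_then(|i| args.get(i + 1))
        .map(|s| s.as_str())
}

/// True when every output flag that is present resolves, from the child's
/// cwd, to a path whose parent directory already exists on disk.
fn outputs_are_writable(inv: &CompileInvocation) -> bool {
    ["-o", "-MF"].iter().all(|flag| match arg_after(&inv.args, flag) {
        Some(p) => inv.cwd.join(p).parent().map_or(false, Path::is_dir),
        None => true,
    })
}
```

A test built on this would create a temporary workspace, run the dispatch with a relative project_dir, and assert outputs_are_writable on the captured invocation. With the doubled -o path from this issue the check returns false.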

Decisions

  • Priority: P1 — every ESP32 sketch build on main is failing in CI today; this blocks landing any ESP32-touching change cleanly.
  • Fix location (best guess, two candidates):
    • Narrow (preferred): crates/fbuild-build/src/compiler.rs:563: normalize output (and source for symmetry) to absolute before computing compile_cwd, e.g. let output = std::path::absolute(output).unwrap_or_else(|_| output.to_path_buf()); With an absolute output, path_arg_for_compile_cwd will strip the workspace prefix and emit a workspace-relative -o that resolves correctly against the absolute compile CWD.
    • Upstream (more invasive but eliminates an entire class of bug): crates/fbuild-daemon/src/handlers/operations/build.rs:22: canonicalize project_dir once on entry (matching cli/build.rs::normalize_path, with \\?\ stripping on Windows), so core_build_dir is absolute everywhere downstream. The CLI's monitor/deploy already normalize; build is the odd one out.
  • Severity wording: "build fails" — not a runtime issue; nothing flashed. Local devs hit it too if they run fbuild build from a parent dir using a relative project_dir and start with a cold .fbuild/build/ cache; the path I personally tested ran from inside tests/platform/esp32p4 so I got an absolute resolved cwd and never hit it.
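
Both candidate fixes can be sketched as small helpers. This is an illustration under stated assumptions rather than a patch: absolutize, o_arg_for and strip_verbatim_prefix are hypothetical names, std::path::absolute requires Rust 1.79 or newer, and the verbatim-prefix handling ignores the \\?\UNC\ form.

```rust
use std::path::{Path, PathBuf};

/// Narrow fix: make `output` absolute before any cwd is derived from it.
/// Falls back to the input unchanged if the current dir cannot be read.
fn absolutize(output: &Path) -> PathBuf {
    std::path::absolute(output).unwrap_or_else(|_| output.to_path_buf())
}

/// Emit the -o argument relative to `compile_cwd` when the prefix matches;
/// otherwise pass the absolute path through, which still resolves correctly.
fn o_arg_for(output_abs: &Path, compile_cwd: &Path) -> PathBuf {
    output_abs
        .strip_prefix(compile_cwd)
        .unwrap_or(output_abs)
        .to_path_buf()
}

/// Upstream fix helper: drop the Windows verbatim prefix that
/// std::fs::canonicalize adds, leaving other paths unchanged.
fn strip_verbatim_prefix(p: &Path) -> PathBuf {
    let s = p.to_string_lossy();
    match s.strip_prefix(r"\\?\") {
        Some(rest) => PathBuf::from(rest),
        None => p.to_path_buf(),
    }
}
```

One caveat with the narrow variant: std::path::absolute does not resolve symlinks, while the compile cwd is canonicalized, so the prefix may not match when the workspace sits behind a symlink. In this sketch that case degrades to an absolute -o, which gcc resolves independently of its cwd.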

Related

🤖 Generated with Claude Code
