ci: make MRBind build cache non-fatal#6125

Merged: Fedr merged 1 commit into master from mrbind-cache-non-fatal on May 20, 2026

Fedr commented May 20, 2026

Summary

Make actions/cache@v5 non-fatal inside the build-mrbind composite action — a silent cache-service crash should not abort the whole job.

Motivation

On PR #6121, the windows-build-test (msvc-2019, Release, CMake, x64-windows-vs2019-meshlib) job in run 26152206642 failed at the Build MRBind composite step in ~2 seconds with zero log output. Tracing the timeline:

  • Get MRBind submodule SHA succeeded
  • actions/cache@v5 started at 09:00:14.67Z, the next workflow step's group started at 09:00:15.25Z — a 0.57 s gap with no output at all, while the two earlier actions/cache@v5 invocations in the same job (CUDA cache, MSYS2 cache) both printed Cache restored successfully as expected
  • the composite was marked failure, so Generate C bindings, Build, tests, archive, upload — all skipped

Looks like a transient cache-service blip causing the action to crash before its logger flushes anything. Even with debug logging enabled we'd still pay a full job failure for what should be a recoverable event.

Fix

One-line continue-on-error: true on the cache step. On a cache-service error:

  • the composite continues
  • steps.cache.outputs.cache-hit is empty → the existing if: steps.cache.outputs.cache-hit != 'true' guards on the platform-specific Build MRBind sub-steps fire → MRBind rebuilds from scratch
  • install_mrbind_windows_msys2.bat (and the Unix equivalent) wipes build/ before rebuilding, so a partially restored cache dir is safe
  • the post-action save still runs and repopulates the cache, so the next run is back to cache-hit speed

Worst-case cost on a transient failure: one rebuild (~10 min on Windows, less elsewhere) instead of a hard job failure that takes the entire matrix entry down.
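
The change described above can be sketched as the following excerpt of a composite action.yml. This is an illustration under stated assumptions, not the actual file from this PR: only the `cache` step id, `actions/cache@v5`, `continue-on-error: true`, and the `cache-hit` guard come from the description. The step names, `path`, `key`, `shell`, and the script location are placeholders.

```yaml
# Hypothetical excerpt of a composite action.yml.
# Values marked "placeholder" are assumptions, not taken from the repository.
runs:
  using: composite
  steps:
    - name: Cache MRBind build            # placeholder step name
      id: cache
      uses: actions/cache@v5
      # A cache-service failure no longer fails the composite action.
      continue-on-error: true
      with:
        path: "PATH_TO_MRBIND_BUILD"      # placeholder
        key: "MRBIND_CACHE_KEY"           # placeholder

    - name: Build MRBind (Windows)
      # Runs on a cache miss and also when the cache step failed,
      # because cache-hit is then empty rather than 'true'.
      if: steps.cache.outputs.cache-hit != 'true'
      shell: cmd                          # placeholder shell
      # Placeholder path to the script named in the description.
      run: call path\to\install_mrbind_windows_msys2.bat
```

If the cache step fails, `steps.cache.outputs.cache-hit` stays empty, so the guard evaluates to true and the build sub-step runs exactly as it would on an ordinary cache miss.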

Test plan

  • CI green across all platforms that use build-mrbind (run 26155116273): 28 success, 11 skipped, 0 failures across Windows (msvc-2019 + msvc-2022, Debug + Release), Linux vcpkg (x64 + arm64, Debug + Release, GCC 11 + Clang 21), Ubuntu (22.04 + 24.04, arm64 + x64, GCC 12 / 13 / 14), macOS (arm64 Debug + Release, x64 Release), and emscripten (singlethreaded + multithreaded + multithreaded-64bit, build + test).
  • Cache-hit path still short-circuits the build. On the exact matrix entry that failed silently on PR #6121 (Planar filling with plan) (windows-build-test (msvc-2019, Release, CMake, x64-windows-vs2019-meshlib), job 76932415924): the MRBind cache key was unchanged by this PR (only action.yml was touched, not the cache-key inputs), so the step hit cache (Cache hit for: mrbind-build-windows-clang18-d5ca6d...), the conditional Build MRBind (Windows) sub-step was skipped, and the next workflow step (Generate C bindings) immediately invoked the cached mrbind / mrbind_gen_c binaries. Post-action correctly logged Cache hit occurred on the primary key ..., not saving cache.

actions/cache@v5 occasionally exits non-zero in under a second with no
log output (seen on PR #6121, windows-2019/Release matrix, run
26152206642 job 76922322315), aborting the whole Build MRBind composite
and skipping every downstream step.

A cache restore failure should never abort the build: the existing
`if: steps.cache.outputs.cache-hit != 'true'` build sub-step already
runs from scratch on miss, and `install_mrbind_windows_msys2.bat` (and
its Unix equivalent) wipes `build/` before rebuilding, so a partially
restored cache directory is safe to fall through.

Adding `continue-on-error: true` on the cache step turns transient
cache-service blips into a one-off ~10-min rebuild instead of a hard
job failure; the post-action save still runs and repopulates the cache
for the next run.
Fedr requested a review from Grantim May 20, 2026 09:55
Fedr merged commit fda85d2 into master May 20, 2026
39 checks passed
Fedr deleted the mrbind-cache-non-fatal branch May 20, 2026 11:03