Disable cache-on-failure (save-always: false on julia-actions/cache@v3) #119

Merged: mtfishman merged 1 commit into main from mf/cache-no-save-on-failure on May 28, 2026

mtfishman commented May 28, 2026

Summary

`julia-actions/cache@v3` (introduced via #118) saves caches on job failure by default, whereas v2 saved only on success. The escape hatch documented in the v3 release notes is `save-always: false`. This PR sets that input on every `julia-actions/cache@v3` invocation in the reusable workflows here.
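As a rough illustration of the change, a cache step with the override might look like the sketch below. The job name, runner, and surrounding steps (including the pinned versions of `actions/checkout`, `julia-actions/setup-julia`, and `julia-actions/julia-runtest`) are assumptions for context and are not copied from this repo's workflows; only the `save-always: false` input on `julia-actions/cache@v3` comes from the description above.

```yaml
# Hypothetical excerpt of a reusable test workflow; names and versions
# other than julia-actions/cache@v3 are illustrative assumptions.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
      - uses: julia-actions/cache@v3
        with:
          # v3 saves the cache even when the job fails. Opt out so that a
          # half-installed depot or aborted precompile is never cached and
          # then restored by every rerun.
          save-always: false
      - uses: julia-actions/julia-runtest@v1
```

The trade-off is that a run failing after a successful depot install no longer keeps that install cached, so its retry pays the install cost again, as it did under v2.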

The new v3 default is reasonable for the common case (test-failure retries reuse the expensive depot install). But when the failure is in the setup itself — a half-installed depot, an aborted Pkg precompile — the broken state is cached, the restore-key prefix matches subsequent runs, and every retry restores the broken state and fails identically. Reruns alone can't recover; the cache has to be manually evicted or expire.

This was hit on ITensor/ITensorNetworks.jl#373: a fresh Windows run failed in `Pkg.test` precompilation (`ChainRulesCore is required but does not seem to be installed`), the broken state was cached, and two reruns reproduced the failure verbatim by restoring from that cache.

`julia-actions/cache` v3.0.0 introduced a breaking change: caches are
now saved on job failure by default. v2 only saved on success. The
override `save-always: false` restores the v2 behavior.

The new default is reasonable for the common case (a flaky test
failure happens after an expensive depot install; keeping that install
cached makes the retry fast). But it's actively harmful when the
failure is in the setup itself — a half-installed depot, an aborted
Pkg precompile, etc. In that case the broken state gets cached, the
restore-key prefix matches subsequent runs, and every retry restores
the broken state and fails identically. Reruns alone can't recover;
the cache has to be manually evicted or expire.

We hit this on `ITensor/ITensorNetworks.jl#373`: a fresh Windows run
failed in `Pkg.test` precompilation (`ChainRulesCore is required but
does not seem to be installed`), the broken state was cached, and two
reruns reproduced the same failure verbatim by restoring from that
cache.

Set `save-always: false` on every `julia-actions/cache@v3` invocation
in this repo's reusable workflows (Tests, CheckCompatBounds,
Documentation, FormatCheck, FormatPullRequest, Registrator). One
explanatory comment lives in `Tests.yml`; the others reference back
to it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mtfishman merged commit 9e34bf7 into main on May 28, 2026
4 checks passed
mtfishman deleted the mf/cache-no-save-on-failure branch May 28, 2026 20:21