[SPARK-57075][INFRA] Share precompile Coursier cache with host-runner SBT jobs #56118

Closed
zhengruifeng wants to merge 3 commits into apache:master from zhengruifeng:share-precompile-coursier-cache-dev6

zhengruifeng (Contributor) commented May 26, 2026

What changes were proposed in this pull request?

Add `precompile-coursier-<hash>` and `precompile-coursier-` as `restore-keys` fallbacks on the `Cache Coursier local repository` step for the four host-runner SBT jobs: `build` (Scala tests), `tpcds-1g`, `docker-integration-tests`, and `k8s-integration-tests`. The primary key and the existing prefix fallback are unchanged; the new entries are consulted only after both of those miss.

Why are the changes needed?

The `precompile` job already resolves all dependencies and writes them to `~/.cache/coursier`, but it saves that cache under the key prefix `precompile-coursier-`, while the downstream test jobs restore only from their own prefixes (`<matrix.java>-<matrix.hadoop>-coursier-`, `tpcds-coursier-`, etc.). As a result, when a job's own cache is cold (new branch, modified `pom.xml` / `plugins.sbt`), it re-downloads dependencies that the `precompile` job resolved minutes earlier in the same workflow.
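The lookup order behind this can be sketched with a small model. This is a simplified illustration of how `actions/cache` resolves `key` and `restore-keys` (exact match, then prefix matches in order, newest entry winning), not the action's real implementation; it ignores branch scoping and cache versions. The names `SavedCache` and `resolve_cache` and the hash `abc123` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SavedCache:
    key: str
    created_at: int  # larger means saved more recently


def resolve_cache(primary_key: str,
                  restore_keys: Sequence[str],
                  saved: Sequence[SavedCache]) -> Optional[str]:
    """Return the key of the entry that would be restored, or None on a miss."""
    # 1. An exact match on the primary key always wins.
    for entry in saved:
        if entry.key == primary_key:
            return entry.key
    # 2. Otherwise the primary key, then each restore key in order, is tried
    #    as a prefix; among several matches the newest entry is used.
    for prefix in (primary_key, *restore_keys):
        matches = [e for e in saved if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created_at).key
    return None


# Cold-cache scenario (hypothetical keys): only the precompile job has saved.
saved = [SavedCache("precompile-coursier-abc123", created_at=1)]
primary = "tpcds-coursier-abc123"

# Before the change: only the job's own prefix is listed, so nothing matches.
before = resolve_cache(primary, ["tpcds-coursier-"], saved)

# After the change: the precompile keys are appended as extra fallbacks.
after = resolve_cache(
    primary,
    ["tpcds-coursier-", "precompile-coursier-abc123", "precompile-coursier-"],
    saved,
)

print(before)  # None
print(after)   # precompile-coursier-abc123
```

Because the job's own prefix stays ahead of the precompile entries in the list, a warm per-job cache is still preferred, and the precompile cache is used only when the job has nothing of its own.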

Does this PR introduce any user-facing change?

No. CI-only.

How was this patch tested?

Verified on https://github.com/zhengruifeng/spark/actions/runs/26564295518 (commit `919909f5cb9`): all 12 host-runner SBT jobs restored a Coursier cache without re-downloading from Maven Central. Nine of the 12 fell back to the new `precompile-coursier-<hash>` cache; the other three restored their own per-job cache from a prior run.

Container jobs (`pyspark`, `sparkr`, `lint`, `docs`) are excluded: their Coursier cache step is a no-op because `$HOME` differs between the host and the container (the host writes `/home/runner/.cache/coursier`, while the container looks at `/github/home/.cache/coursier`). This is independent of this PR and can be addressed in a follow-up.
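A minimal sketch of that mismatch, assuming the cache path is configured as `~/.cache/coursier` and expanded against whichever `$HOME` is in effect. The two home directories come from the description above; the `expand_home` helper is hypothetical and written only for illustration.

```python
import posixpath


def expand_home(path: str, home: str) -> str:
    """Expand a leading '~' against an explicit HOME value (POSIX-style paths)."""
    if path == "~":
        return home
    if path.startswith("~/"):
        return posixpath.join(home, path[2:])
    return path


cache_path = "~/.cache/coursier"

# Same configured path, two different HOME values.
host_path = expand_home(cache_path, "/home/runner")
container_path = expand_home(cache_path, "/github/home")

print(host_path)                    # /home/runner/.cache/coursier
print(container_path)               # /github/home/.cache/coursier
print(host_path == container_path)  # False
```

The same `~`-relative setting therefore names two different directories, which is consistent with the container jobs neither restoring nor saving anything at that path.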

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

zhengruifeng changed the title from "[INFRA] Share precompile Coursier cache with test/pyspark/sparkr jobs" to "[SPARK-57075[INFRA] Share precompile Coursier cache with test/pyspark/sparkr jobs" May 26, 2026
zhengruifeng changed the title from "[SPARK-57075[INFRA] Share precompile Coursier cache with test/pyspark/sparkr jobs" to "[SPARK-57075][INFRA] Share precompile Coursier cache with test/pyspark/sparkr jobs" May 26, 2026
zhengruifeng force-pushed the share-precompile-coursier-cache-dev6 branch 2 times, most recently from b759a65 to fc6274b May 27, 2026 12:06
zhengruifeng marked this pull request as ready for review May 28, 2026 08:33
Add the `precompile-coursier-` cache as a restore-key fallback for the
`test`, `pyspark`, and `sparkr` jobs in `build_and_test.yml` so they can
reuse the dependency JARs already resolved by the `precompile` job
instead of re-downloading them when their own Coursier cache misses.

Generated-by: Claude Code (Claude Opus 4.7)
…, docker/k8s integration

Add the `precompile-coursier-` cache as a restore-key fallback to the
remaining SBT-using jobs in `build_and_test.yml`: `lint`, `docs`,
`tpcds-1g`, `docker-integration-tests`, and `k8s-integration-tests`.

All five jobs run SBT tasks (scalastyle/mima, unidoc, TPC-DS queries,
docker integration tests, Kubernetes integration tests) whose dependency
sets are a subset of what the `precompile` job already resolves with
its full profile list (`-Phadoop-3 -Pyarn -Pspark-ganglia-lgpl
-Phadoop-cloud -Phive -Pkubernetes -Pjvm-profiler -Pkinesis-asl
-Phive-thriftserver -Pdocker-integration-tests -Pvolcano`). When their
own per-job Coursier cache misses, they can now fall back to the
precompile cache instead of re-downloading JARs.

Generated-by: Claude Code (Claude Opus 4.7)
zhengruifeng force-pushed the share-precompile-coursier-cache-dev6 branch from fc6274b to 919909f May 28, 2026 08:40
zhengruifeng marked this pull request as draft May 28, 2026 12:13
zhengruifeng marked this pull request as ready for review May 28, 2026 12:27
zhengruifeng marked this pull request as draft May 28, 2026 12:40
Remove the `precompile-coursier-` restore-key fallbacks from the four
container-based jobs (`pyspark`, `sparkr`, `lint`, `docs`) since they
cannot benefit from the fallback in practice.

Audit of run https://github.com/zhengruifeng/spark/actions/runs/26564295518
shows every container job's `Cache Coursier local repository` step both
misses on restore and emits "Path Validation Error: Path(s) specified
in the action for caching do(es) not exist, hence no cache is being
saved" on the post-save step. `~/.cache/coursier` is empty at end-of-job
inside these containers because their workloads (R tests, Python tests,
unidoc, scalastyle/mima) either don't invoke Coursier or use a different
cache location than the host runner.

The host-runner SBT jobs (`build`/test, `tpcds-1g`,
`docker-integration-tests`, `k8s-integration-tests`) keep their
precompile fallback - it's confirmed to work for them, restoring from
`precompile-coursier-<hash>` with zero real Maven downloads.

Generated-by: Claude Code (Claude Opus 4.7)
zhengruifeng changed the title from "[SPARK-57075][INFRA] Share precompile Coursier cache with test/pyspark/sparkr jobs" to "[SPARK-57075][INFRA] Share precompile Coursier cache with host-runner SBT jobs" May 28, 2026
zhengruifeng marked this pull request as ready for review May 28, 2026 13:49
zhengruifeng added a commit that referenced this pull request May 29, 2026
… SBT jobs

### What changes were proposed in this pull request?

Add `precompile-coursier-<hash>` and `precompile-coursier-` as restore-key fallbacks on `Cache Coursier local repository` for the four host-runner SBT jobs: `build` (Scala tests), `tpcds-1g`, `docker-integration-tests`, `k8s-integration-tests`. The primary key and existing prefix fallback are unchanged — the new entries are pure fallback.

### Why are the changes needed?

The `precompile` job already resolves all dependencies and writes them to `~/.cache/coursier`, but it saves under the key prefix `precompile-coursier-`, while the downstream test jobs read from `<matrix.java>-<matrix.hadoop>-coursier-`, `tpcds-coursier-`, etc. So on cold caches (new branch, modified `pom.xml` / `plugins.sbt`), the downstream jobs re-download dependencies that the precompile job already resolved minutes earlier in the same workflow.

### Does this PR introduce _any_ user-facing change?

No. CI-only.

### How was this patch tested?

Verified on https://github.com/zhengruifeng/spark/actions/runs/26564295518 (commit `919909f5cb9`): 12/12 host-runner SBT jobs restored a Coursier cache without re-downloading from Maven Central. 9 of the 12 fell back to the new `precompile-coursier-<hash>` cache; the other 3 found their own per-job cache from a prior run.

Container jobs (`pyspark`, `sparkr`, `lint`, `docs`) are excluded — their Coursier cache step is a no-op due to a host↔container `$HOME` path mismatch (host writes `/home/runner/.cache/coursier`, container looks at `/github/home/.cache/coursier`). Independent of this PR; can be addressed in a follow-up.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

Closes #56118 from zhengruifeng/share-precompile-coursier-cache-dev6.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(cherry picked from commit 299e517)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
zhengruifeng added a commit that referenced this pull request May 29, 2026
zhengruifeng (Contributor, Author) commented:

merged to master/4.x

zhengruifeng deleted the share-precompile-coursier-cache-dev6 branch May 29, 2026 01:34