[SPARK-56830][INFRA] Share SBT compile artifact with python hosted runner CI jobs #56107

Draft
zhengruifeng wants to merge 2 commits into apache:master from zhengruifeng:share-sbt-compile-python-macos-dev5


What changes were proposed in this pull request?

This PR extends the SBT precompile-sharing pattern (parent: SPARK-56830, pyspark: SPARK-56768) to the python-only macOS / ARM workflows that run via .github/workflows/python_hosted_runner_test.yml.

Concretely:

  • New precompile job in python_hosted_runner_test.yml runs Spark's SBT build once on ${{ inputs.os }}:
    ./build/sbt -Phadoop-3 -Pyarn -Pspark-ganglia-lgpl -Phadoop-cloud -Phive \
      -Pkubernetes -Pjvm-profiler -Pkinesis-asl -Phive-thriftserver \
      -Pdocker-integration-tests -Pvolcano \
      Test/package streaming-kinesis-asl-assembly/assembly connect/assembly assembly/package
    
    It then tars every target/ directory (excluding ./build/ and ./.git/) with tar -czf and uploads the archive as spark-compile-<os>-<branch>-<run_id> with retention-days: 1.
  • The 9 pyspark matrix entries in the same workflow add precompile to needs:, run under if: (!cancelled()), download and extract the artifact (with graceful fallback), and export SKIP_SCALA_BUILD=true so that dev/run-tests.py skips build_apache_spark and build_spark_assembly_sbt.
  • Cache steps in the new precompile job are gated on if: ${{ runner.os != 'macOS' }} to match the existing TODO(SPARK-54466) pattern in this file: on macos-26 the precompile runs without the GHA cache; on ubuntu-24.04-arm it caches as expected.
  • Artifact name includes ${{ inputs.os }} so the two callers (build_python_3.12_macos26.yml and build_python_3.12_arm.yml) cannot collide.
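
For illustration only, the archive step in the first bullet can be approximated in Python. The workflow itself uses tar -czf; this sketch swaps in the standard tarfile module, and the function names find_target_dirs and archive_targets are hypothetical, not part of the PR.

```python
import os
import tarfile
from pathlib import Path

# Top-level directories the PR description says are excluded from the archive.
EXCLUDED_TOP_LEVEL = {"build", ".git"}


def find_target_dirs(root: Path) -> list:
    """Return every directory named 'target' under root, skipping excluded trees."""
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        current = Path(dirpath)
        if current == root:
            # Prune ./build and ./.git so os.walk never descends into them.
            dirnames[:] = [d for d in dirnames if d not in EXCLUDED_TOP_LEVEL]
        if "target" in dirnames:
            found.append(current / "target")
            # No need to descend: the whole target/ tree is archived as one unit.
            dirnames.remove("target")
    return sorted(found)


def archive_targets(root: Path, archive: Path) -> list:
    """Write a gzip-compressed tar of all target/ directories.

    Returns the archived directories as paths relative to root.
    """
    targets = find_target_dirs(root)
    with tarfile.open(archive, "w:gz") as tar:
        for target in targets:
            # Store root-relative names so extraction restores files in place.
            tar.add(target, arcname=target.relative_to(root).as_posix())
    return [t.relative_to(root).as_posix() for t in targets]
```

Calling archive_targets(Path("."), Path("/tmp/spark-compile.tar.gz")) from a checkout would collect module-level target/ directories while leaving ./build/ and ./.git/ untouched; the output path here is an arbitrary example.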

This benefits both callers of the reusable workflow:

  • .github/workflows/build_python_3.12_macos26.yml (macos-26)
  • .github/workflows/build_python_3.12_arm.yml (ubuntu-24.04-arm)

Optional: graceful fallback if precompile fails

Same pattern as SPARK-56768:

  • precompile has continue-on-error: true so a failed or cancelled precompile does not fail the workflow run.
  • The matrix's "Download precompiled artifact" step is gated on needs.precompile.result == 'success' and itself has continue-on-error: true.
  • The "Extract precompiled artifact" step is gated on the download succeeding, and also has continue-on-error: true.
  • Inside the "Run tests" bash block, SKIP_SCALA_BUILD=true is exported only when steps.extract-precompiled.outcome == 'success'. Otherwise it stays unset and dev/run-tests.py falls back to the original local SBT build.
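
The gating chain above can be modeled as a small decision function. This is a sketch under stated assumptions, not the workflow's actual code: plan_matrix_entry and its parameters are hypothetical names, and the strings mirror the GitHub Actions result/outcome values the description refers to.

```python
def plan_matrix_entry(precompile_result: str,
                      download_ok: bool = True,
                      extract_ok: bool = True) -> dict:
    """Model one matrix entry's fallback chain.

    precompile_result: result of the precompile job, e.g. "success",
        "failure", "cancelled" or "skipped".
    download_ok / extract_ok: whether each step would succeed if it ran.
    """
    # "Download precompiled artifact" runs only when precompile succeeded.
    download = "skipped"
    if precompile_result == "success":
        download = "success" if download_ok else "failure"

    # "Extract precompiled artifact" runs only when the download succeeded.
    extract = "skipped"
    if download == "success":
        extract = "success" if extract_ok else "failure"

    # SKIP_SCALA_BUILD is exported only on a fully successful fast path;
    # otherwise it stays unset and the local SBT build runs as before.
    env = {}
    if extract == "success":
        env["SKIP_SCALA_BUILD"] = "true"

    return {"download": download, "extract": extract, "env": env}
```

For example, plan_matrix_entry("failure") leaves env empty, so the entry falls back to the local build, while plan_matrix_entry("success") sets SKIP_SCALA_BUILD to "true".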

Why are the changes needed?

Today each of the 9 pyspark matrix entries in python_hosted_runner_test.yml runs the same SBT build from scratch. Compiling once and sharing the artifact across the matrix removes 8 redundant SBT compiles per scheduled run of build_python_3.12_macos26.yml (and likewise of build_python_3.12_arm.yml). This mirrors the savings already realized for the Linux pyspark matrix in SPARK-56768.
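
The saving described above can be checked with a short count. The step names and the matrix size come from this description; how dev/run-tests.py actually tests the variable is an assumption here (the sketch treats only the literal value "true" as set), and local_build_steps and sbt_builds_per_run are hypothetical helpers.

```python
# Build steps the description says are skipped when SKIP_SCALA_BUILD is set.
BUILD_STEPS = ("build_apache_spark", "build_spark_assembly_sbt")
MATRIX_ENTRIES = 9  # pyspark matrix size stated in the description


def local_build_steps(env: dict) -> tuple:
    """Return the build steps one matrix entry would run locally."""
    # Assumption: the flag counts as set only when it equals "true".
    if env.get("SKIP_SCALA_BUILD") == "true":
        return ()
    return BUILD_STEPS


def sbt_builds_per_run(artifact_shared: bool) -> int:
    """Count full SBT builds in one workflow run, assuming the fast path works."""
    env = {"SKIP_SCALA_BUILD": "true"} if artifact_shared else {}
    local = sum(1 for _ in range(MATRIX_ENTRIES) if local_build_steps(env))
    precompile = 1 if artifact_shared else 0
    return local + precompile
```

Under these assumptions a run drops from 9 builds to 1, so sbt_builds_per_run(False) - sbt_builds_per_run(True) gives the 8 builds saved.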

Does this PR introduce any user-facing change?

No. CI infrastructure change only.

How was this patch tested?

The change is exercised by the CI run of this PR itself. If the precompile job is forced to fail (or its artifact is missing), the matrix entries should still pass via the fallback path. The "Run tests" step logs "Reusing precompiled artifact, skipping local SBT build." so that the fast path is visible per matrix entry.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

zhengruifeng changed the title from "[INFRA] Share SBT compile artifact with python hosted runner CI jobs" to "[SPARK-56830][INFRA] Share SBT compile artifact with python hosted runner CI jobs" on May 26, 2026
Validation-only commit. Revert before marking PR ready for review.

Generated-by: Claude Code (Opus 4.7)
