CI: Select PR matrices incrementally #16566

Open: ajantha-bhat wants to merge 3 commits into apache:main from ajantha-bhat:codex/incremental-pr-ci

@ajantha-bhat (Member) commented May 26, 2026

Mailing list discussion: https://lists.apache.org/thread/36vxlql61gojbg639c86mnz78n57kvgm

Summary

  • add a shared PR CI planner that uses changed files, full-ci labels, and global build/workflow changes to select PR matrices
  • keep full Java/Spark/Flink/Hive/Kafka/Delta/CVE coverage for main, release branches, tags, full-ci, and global Gradle/workflow changes
  • use Java 17 as the primary JVM for ordinary PRs, while full CI paths keep Java 17 and Java 21
  • preserve the existing Gradle cache policy from #16356 (Build: Designate a single Gradle cache writer across CI workflows): only the java-ci.yml build-checks (17) job writes the cache on main; all other Gradle jobs are read-only
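
The full-versus-selective decision in the summary can be sketched roughly as follows. This is a Python illustration only: the PR's planner is a shell script (.github/scripts/plan-pr-ci.sh) whose logic is not reproduced here. The function name `plan_jvms`, the `GLOBAL_PATTERNS` list, and the simplification that every non-PR event (main, release branches, tags) gets full CI are all assumptions.

```python
from fnmatch import fnmatch

# Assumed patterns for "global Gradle/workflow changes"; the real list is
# defined by the PR's planner and may differ.
GLOBAL_PATTERNS = [
    "build.gradle",
    "settings.gradle",
    "gradle.properties",
    "gradle/*",
    ".github/workflows/*",
]


def plan_jvms(event, labels, changed_files):
    """Return (full_ci, jvms) for a workflow run.

    event: GitHub event name, e.g. "pull_request" or "push".
    labels: labels on the PR (empty for non-PR events).
    changed_files: repository-relative paths touched by the change.
    """
    is_global = any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in GLOBAL_PATTERNS
    )
    # Assumption: every non-PR run keeps full coverage.
    full_ci = event != "pull_request" or "full-ci" in labels or is_global
    # Ordinary PRs use Java 17 only; full CI keeps Java 17 and Java 21.
    return full_ci, ([17, 21] if full_ci else [17])
```

Under these assumptions, an ordinary PR touching only spark/v4.1/ runs on Java 17, while the same PR with the full-ci label, or one that edits a file under gradle/, runs on both JVMs.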

Selective PR behavior

  • Spark-only changes run Spark jobs, not Flink/Hive/Kafka jobs
  • spark/v4.1/** selects Spark 4.1 only; other versioned Spark paths behave similarly
  • flink/v2.0/** selects Flink 2.0 only; other versioned Flink paths behave similarly
  • API/Core/Data/file-format changes run Java checks plus latest Spark and latest Flink canaries
  • runtime/bundle CVE scans are limited to affected runtime artifacts, while dependency/global changes run the full CVE matrix
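
The path-based selection rules above could look something like the sketch below. It is written in Python for illustration and is not the PR's shell planner. The names `select_matrices`, `LATEST_SPARK`, `LATEST_FLINK`, and `CORE_PREFIXES` are hypothetical, the choice of 4.1 and 2.0 as the "latest" versions is an assumption, and Hive, Kafka, Delta, and CVE selection are left out.

```python
import re

# Assumed "latest" canary versions; the PR defines the real values.
LATEST_SPARK = "4.1"
LATEST_FLINK = "2.0"

# Assumed top-level directories for API/Core/Data/file-format modules.
CORE_PREFIXES = ("api/", "core/", "data/", "parquet/", "orc/", "arrow/")

VERSIONED_ENGINE = re.compile(r"(spark|flink)/v(\d+\.\d+)/")


def select_matrices(changed_files):
    """Map changed paths to the engine versions a PR should test."""
    plan = {"java": False, "spark": set(), "flink": set()}
    for path in changed_files:
        match = VERSIONED_ENGINE.match(path)
        if match:
            # e.g. spark/v4.1/** selects Spark 4.1 only.
            engine, version = match.groups()
            plan[engine].add(version)
        elif path.startswith(CORE_PREFIXES):
            # Core changes run Java checks plus the latest engine canaries.
            plan["java"] = True
            plan["spark"].add(LATEST_SPARK)
            plan["flink"].add(LATEST_FLINK)
    return plan
```

With these assumptions, a change under flink/v2.0/ yields a plan with only Flink 2.0 selected, and a change under core/ enables Java checks plus one Spark and one Flink canary.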

Gradle cache behavior

This PR does not add new Gradle cache writers. PR jobs restore cache read-only, release/tag jobs remain read-only, and main keeps the single canonical writer introduced in #16356.
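
As a rough illustration of that policy, the single-writer rule can be expressed as one predicate. This is a hypothetical Python sketch, not code from the PR or from #16356: the function name and parameters are assumptions, and only the workflow and job names (java-ci.yml, build-checks (17)) and the push-to-main condition come from this conversation.

```python
def gradle_cache_read_only(event, ref, workflow, job):
    """Return True when a job must restore the Gradle cache without writing.

    Only the canonical writer (a push to main running the Java 17
    build-checks job of java-ci.yml) is allowed to save the cache.
    """
    is_canonical_writer = (
        event == "push"
        and ref == "refs/heads/main"
        and workflow == "java-ci.yml"
        and job == "build-checks (17)"
    )
    return not is_canonical_writer
```

Every pull request, tag, and release-branch run evaluates to read-only under this rule, which is also what keeps untrusted PR code from writing to the shared cache.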

Validation

  • YAML parse for edited workflows
  • git diff --check origin/main...HEAD
  • bash -n .github/scripts/plan-pr-ci.sh
  • synthetic planner assertions for Spark 4.1, Flink 2.0, core canaries, global build changes, and Kafka/CVE selection
  • ./gradlew -h

@github-actions github-actions Bot added the INFRA label May 26, 2026
@ajantha-bhat ajantha-bhat changed the title CI: Add incremental PR build planner [WIP] CI: Add incremental PR build planner May 26, 2026
@ajantha-bhat ajantha-bhat marked this pull request as draft May 26, 2026 07:11
@ajantha-bhat ajantha-bhat changed the title [WIP] CI: Add incremental PR build planner CI: Add incremental PR build planner May 26, 2026
@ajantha-bhat ajantha-bhat force-pushed the codex/incremental-pr-ci branch from 8871ed8 to 0f085f8 Compare May 26, 2026 10:14
@ajantha-bhat ajantha-bhat marked this pull request as ready for review May 26, 2026 12:15
@ajantha-bhat ajantha-bhat changed the title CI: Add incremental PR build planner [WIP] CI: Add incremental PR build planner May 26, 2026
@ajantha-bhat ajantha-bhat changed the title [WIP] CI: Add incremental PR build planner CI: Add incremental PR build planner (WIP) May 26, 2026
@ajantha-bhat ajantha-bhat changed the title CI: Add incremental PR build planner (WIP) CI: Share Gradle build cache across jobs May 26, 2026
@ajantha-bhat ajantha-bhat force-pushed the codex/incremental-pr-ci branch from dd3c1c0 to 43ffb7b Compare May 26, 2026 13:22
@ajantha-bhat ajantha-bhat changed the title CI: Share Gradle build cache across jobs CI: Share Gradle cache and select PR matrices May 26, 2026
@ajantha-bhat ajantha-bhat force-pushed the codex/incremental-pr-ci branch 2 times, most recently from 5b1993c to 4682b11 Compare May 26, 2026 13:48
@kevinjqliu (Contributor) commented:

Thanks for the PR, @ajantha-bhat.

I've done some work with the Gradle cache recently (#16356) and made it so that there's only one canonical writer. Before that change, we were constantly being thrashed by multiple cache writers, and cache utilization was really low.

There's also a security component to this: we should only write to the cache on push to the main branch. We should not allow PRs to write to the shared cache, since that's a cache poisoning vulnerability.

I'm curious how this change affects when the cache is saved and how it's reused.

@ajantha-bhat ajantha-bhat force-pushed the codex/incremental-pr-ci branch from 4682b11 to dc78709 Compare May 27, 2026 04:49
@ajantha-bhat ajantha-bhat changed the title CI: Share Gradle cache and select PR matrices CI: Select PR matrices incrementally May 27, 2026
@ajantha-bhat (Member, Author) commented:

Thanks @kevinjqliu, good catch.

I updated the PR to preserve #16356's cache model. The selective PR matrix planner now stands on its own, and this PR no longer adds custom cache artifact merge/store actions or additional Gradle cache writers.

Behavior after the update:

So the runner reduction now comes from selective matrices, while the shared Gradle cache writer remains unchanged.
