Run CodeQL only on languages changed in a PR #67972

Merged
shahar1 merged 2 commits into apache:main from shahar1:codeql-per-language-gating on Jun 4, 2026

shahar1 commented Jun 3, 2026

What

On pull_request, scan only the CodeQL languages whose files actually changed, instead of always running all five (python, javascript, actions, go, java). A small detect-languages job inspects the PR's changed files and builds the analysis matrix dynamically. push (to main) and schedule runs are unchanged — they always scan every language, so coverage of the main branch is identical to today.

Result per PR:

  • docs-only PR → CodeQL runs nothing (just the tiny detect job)
  • the common python-only PR → 1 analysis job instead of 5
  • multi-language PRs → only the languages they touch
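
The gating described above can be sketched as a small mapping from changed paths to CodeQL languages. This is an illustration only, not the workflow's actual code: the function name `detect_languages` and every path rule in `LANGUAGE_RULES` (for example, treating anything under `.github/workflows/` as `actions` or anything under `java-sdk/` as `java`) are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Hypothetical rules: the real workflow's path filters are not shown in this PR text.
LANGUAGE_RULES = {
    "python": lambda p: p.suffix == ".py",
    "javascript": lambda p: p.suffix in {".js", ".jsx", ".ts", ".tsx"},
    "actions": lambda p: p.parts[:2] == (".github", "workflows"),
    "go": lambda p: p.suffix == ".go",
    "java": lambda p: p.parts[:1] == ("java-sdk",),
}

ALL_LANGUAGES = list(LANGUAGE_RULES)


def detect_languages(event_name, changed_files):
    """Return the CodeQL languages to analyse for one workflow run."""
    # push and schedule runs keep full coverage, as the description states.
    if event_name != "pull_request":
        return list(ALL_LANGUAGES)
    paths = [PurePosixPath(name) for name in changed_files]
    # Keep a language only if at least one changed file matches its rule.
    return [
        lang
        for lang, matches in LANGUAGE_RULES.items()
        if any(matches(path) for path in paths)
    ]
```

With these assumed rules, a docs-only PR yields an empty matrix, a PR touching only `.py` files yields `["python"]`, and any `push` or `schedule` run yields all five languages.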

Why

CodeQL on PRs is by far the most frequently triggered workflow in the repo: roughly 1,300 or more runs/week, and about 87% of all CodeQL runs are triggered by pull_request. Every one of those runs currently fans out one job per language regardless of what changed, so it is a constant, high-volume contributor to runner/concurrency pressure on the shared Actions pool. Measuring a sample of recent PRs:

  • ~67% are python-only, ~12% touch no scannable code at all
  • javascript ~20%, actions ~2%, go ~1%, java ~0%

So the large majority of the language jobs we run on PRs scan code that did not change. Gating the matrix cuts roughly 55–60% of CodeQL PR minutes and about 80% of CodeQL job-starts, while keeping full per-language coverage on main.
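
As a rough consistency check on the job-starts figure, the sample percentages can be combined by linearity of expectation. The sketch below rests on assumptions that are not stated in the PR: that each percentage is the fraction of PRs touching that language, that java counts as 0, and that the small detect job is not counted as a job-start. The share of PRs touching python is only bounded by the sample (at least the python-only PRs, at most every PR with scannable code), so the result is a range.

```python
# Per-language touch rates quoted in the sample above (fractions of PRs).
OTHER_TOUCH_RATES = {"javascript": 0.20, "actions": 0.02, "go": 0.01, "java": 0.0}
PYTHON_ONLY = 0.67        # PRs that touch python and nothing else
NO_SCANNABLE_CODE = 0.12  # PRs that touch no scannable language
JOBS_BEFORE = 5           # one job per language on every PR


def job_start_reduction(python_touch_rate):
    """Fraction of language job-starts avoided for a given python touch rate."""
    # Expected jobs per PR is the sum of the per-language touch probabilities.
    expected_jobs = python_touch_rate + sum(OTHER_TOUCH_RATES.values())
    return 1 - expected_jobs / JOBS_BEFORE


# python is touched by at least the python-only PRs and at most by every PR
# that contains scannable code.
low_python, high_python = PYTHON_ONLY, 1 - NO_SCANNABLE_CODE
best_case = job_start_reduction(low_python)    # about 0.82
worst_case = job_start_reduction(high_python)  # about 0.78
```

Under these assumptions the reduction lands between about 78% and 82%, which is consistent with the quoted figure of about 80%. The 55–60% figure for minutes cannot be checked the same way, because it depends on per-language job durations that are not given here.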

Relationship to #45541

#45541 ("CodeQL scanning can run always on all code") deliberately removed conditional CodeQL logic, on the basis that "CodeQL scanning is fast and having custom configuration … makes it unnecessarily complex." That was true at the time — CodeQL then scanned 3 fast languages (python, javascript, actions).

Two things have changed since:

  1. go and especially java were added afterwards. java arrived via the "Add Java SDK" change, and its analysis runs a full setup-java + ./gradlew classes testClasses Gradle build on every PR, even though java-sdk files change in well under 1% of PRs. That materially breaks the "CodeQL is fast" premise the always-on decision rested on.
  2. The repo is now hitting Actions capacity limits, so the frequency of this workflow (not just per-run cost) matters: trimming ~80% of its job-starts directly relieves the shared concurrency pool.

The added complexity here is intentionally small and contained to one workflow (a single detect job + a dynamic matrix), and only affects PR runs — main scanning stays exactly as it is.


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.8)

Generated-by: Claude Code (Opus 4.8) following the guidelines

PR-triggered CodeQL is by far the most frequent workflow in the repo
(~1,300+ runs/week), and every run fans out one job per language
(python, javascript, actions, go, java) regardless of what changed.
The java job in particular runs a full Gradle build on every PR even
though java-sdk files change in well under 1% of PRs.

Gate the language matrix on the files actually changed in the PR: a
docs-only PR now runs nothing, and the common python-only PR runs a
single job instead of five. push-to-main and scheduled runs still scan
every language, so coverage of the main branch is unchanged.

boring-cyborg Bot added the area:dev-tools and backport-to-v3-2-test labels Jun 3, 2026

jscheffl left a comment

Cool!

nailo2c commented Jun 3, 2026

I love this!

shahar1 merged commit c91dc3b into apache:main Jun 4, 2026
143 checks passed
shahar1 deleted the codeql-per-language-gating branch June 4, 2026 05:19

github-actions Bot commented Jun 4, 2026

Backport failed to create: v3-2-test. View the failure log Run details

Note: As of "Merging PRs targeted for Airflow 3.X", the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

When in doubt, please ask in the #release-management Slack channel.

Status Branch Result
v3-2-test Commit Link

You can attempt to backport this manually by running:

cherry_picker c91dc3b v3-2-test

This should apply the commit to the v3-2-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

If you don't have cherry-picker installed, see the installation guide.

shahar1 added a commit that referenced this pull request Jun 4, 2026
#67972) (#68024)

(cherry picked from commit c91dc3b)
shahar1 added a commit to shahar1/airflow that referenced this pull request Jun 5, 2026
running all five languages unconditionally.  Extend the detect-languages
job to use the GitHub compare API (before…after) for push events, so a
docs-only or single-language merge to main no longer fans out all five
CodeQL jobs.  schedule runs are unchanged — they still scan every
language to maintain periodic full-branch coverage.  Falls back to all
languages when the compare API is unavailable or the before SHA is all
zeros (branch creation).

related: apache#67972
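
The fallback behaviour this follow-up commit describes can be sketched as a small decision function. This is a hedged illustration, not the commit's code: `select_languages`, its parameters, and the convention of passing `None` when the changed-file list could not be obtained (for example, because the compare API call failed) are assumptions made for the example.

```python
ALL_LANGUAGES = ["python", "javascript", "actions", "go", "java"]
ZERO_SHA = "0" * 40  # "before" value on a push that creates a branch


def select_languages(event_name, before_sha, detected):
    """Pick the languages to scan for one run.

    `detected` is the list produced by a changed-file detection step,
    or None when the changed-file list could not be obtained.
    """
    # Scheduled runs always keep periodic full-branch coverage.
    if event_name == "schedule":
        return list(ALL_LANGUAGES)
    # A branch-creation push has no meaningful "before" commit to compare to.
    if event_name == "push" and before_sha == ZERO_SHA:
        return list(ALL_LANGUAGES)
    # Fail open: if detection did not work, scan everything.
    if detected is None:
        return list(ALL_LANGUAGES)
    return list(detected)
```

The point of the design is that every uncertain case falls back to the full language list, so a detection failure can cost extra runner time but cannot silently skip a scan.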