
Extend CodeQL language gating to push events (main + release branches) #68085

Open
shahar1 wants to merge 2 commits into apache:main from shahar1:codeql-gate-push-by-language

Conversation

@shahar1
Contributor

@shahar1 shahar1 commented Jun 5, 2026

What

Extends the language-gating logic from #67972 — which only applied to pull_request — to push events as well, and brings the push trigger to parity with pull_request so it also fires on release branches.

Two changes:

  1. Gate the matrix on push. The detect-languages job now uses the GitHub compare API (repos/{repo}/compare/{before}...{after}) to find which languages actually changed in a push, and builds the analysis matrix from that — the same way it already does for PRs. A docs-only or single-language merge no longer fans out all five CodeQL jobs.
  2. Run push CodeQL on release branches too. The push trigger now matches the pull_request trigger (main, v[0-9]+-[0-9]+-test, v[0-9]+-[0-9]+-stable) instead of main only.
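A minimal sketch of the gating idea in the first change: turn a list of changed paths into the set of CodeQL languages to analyse. The suffix-to-language mapping and the function names are hypothetical placeholders; the PR description does not list the actual detection rules or the five language identifiers.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file suffix to CodeQL language. The real rules in
# the detect-languages job are not shown in this PR description.
SUFFIX_TO_LANGUAGE = {
    ".py": "python",
    ".java": "java",
    ".go": "go",
    ".js": "javascript-typescript",
    ".ts": "javascript-typescript",
    ".tsx": "javascript-typescript",
}


def languages_for_changes(changed_paths):
    """Return the sorted, de-duplicated languages touched by the given paths."""
    found = set()
    for path in changed_paths:
        language = SUFFIX_TO_LANGUAGE.get(PurePosixPath(path).suffix)
        if language is not None:
            found.add(language)
    return sorted(found)


def build_matrix(changed_paths):
    """Build a dict that could be serialised to JSON as a job matrix."""
    return {"language": languages_for_changes(changed_paths)}
```

Under this sketch a docs-only change yields an empty language list, so no analysis job would be scheduled for it.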

schedule runs are unchanged: they still scan all five languages.

Why

#67972 intentionally left push scanning all languages, reasoning that full coverage on main was the goal. But the daily schedule run already provides that full-coverage guarantee — and it only runs on the default branch (main). That left two gaps:

  • main: every merge commit fanned out five CodeQL jobs even when nothing relevant changed (the daily schedule already covers full-branch scanning).
  • Release branches: merges to v*-test / v*-stable got no push-time CodeQL at all (only PRs targeting them were scanned, and schedule doesn't run there). Adding gated push runs gives maintained release branches the same security coverage cheaply.

The detect logic is branch-agnostic, so the same compare-based gating works for every push branch.

Edge cases handled

The compare call is followed by a fallback to a full scan when it fails or returns nothing — covering a force-push, or a newly created release branch whose before SHA is all zeros (no base commit to diff against).
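The fail-open behaviour described above could be sketched as follows. `fetch_compare` is a hypothetical callable standing in for the compare request, and returning `None` is an assumed convention here meaning "scan every language"; neither is taken from the actual workflow.

```python
def is_zero_sha(sha):
    """True when the SHA consists only of zeros (no base commit to diff against)."""
    return bool(sha) and set(sha) == {"0"}


def changed_files_or_none(before, after, fetch_compare):
    """Return the changed paths for a push, or None when a full scan is needed.

    `fetch_compare(before, after)` is a hypothetical helper that returns a
    list of paths and may raise if the request fails.
    """
    if is_zero_sha(before):
        return None  # newly created branch: nothing to compare with
    try:
        files = fetch_compare(before, after)
    except Exception:
        return None  # fail open: any compare error triggers a full scan
    if not files:
        return None  # empty result (for example after a force-push)
    return list(files)
```

The broad `except` is deliberate in this sketch: missing a changed language is treated as worse than running an unnecessary scan.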

Behaviour summary

| Event | Before | After |
| --- | --- | --- |
| pull_request (main / release) | changed languages only (#67972) | unchanged |
| push to main | all 5 languages | changed languages only |
| push to v*-test / v*-stable | not triggered | changed languages only |
| schedule (daily, main) | all 5 languages | all 5 languages (unchanged) |
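The "After" column can be read as a small decision function. This is only an illustration: the regular expression approximates the branch filters quoted in the description, and the function name and return values are hypothetical.

```python
import re

# Regex approximation of the trigger branches named in the description:
# main, v[0-9]+-[0-9]+-test, v[0-9]+-[0-9]+-stable.
TRIGGER_BRANCHES = re.compile(r"main|v[0-9]+-[0-9]+-(test|stable)")


def scan_scope(event, branch):
    """Return "all", "changed", or None (not triggered) for an event.

    For pull_request events, `branch` is the branch the PR targets.
    """
    if event == "schedule":
        # The schedule only runs on the default branch.
        return "all" if branch == "main" else None
    if event in ("push", "pull_request") and TRIGGER_BRANCHES.fullmatch(branch):
        return "changed"
    return None
```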

related: #67972


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (claude-opus-4-8)

Generated-by: Claude Code (claude-opus-4-8) following the guidelines

@boring-cyborg boring-cyborg Bot added the area:dev-tools and backport-to-v3-2-test labels Jun 5, 2026
@shahar1 shahar1 marked this pull request as draft June 5, 2026 16:34
running all five languages unconditionally.  Extend the detect-languages
job to use the GitHub compare API (before…after) for push events, so a
docs-only or single-language merge to main no longer fans out all five
CodeQL jobs.  schedule runs are unchanged — they still scan every
language to maintain periodic full-branch coverage.  Falls back to all
languages when the compare API is unavailable or the before SHA is all
zeros (branch creation).

related: apache#67972
@shahar1 shahar1 force-pushed the codeql-gate-push-by-language branch from bbb4699 to ec7a1ea Compare June 5, 2026 16:38
@shahar1 shahar1 marked this pull request as ready for review June 5, 2026 16:38
@shahar1 shahar1 changed the title Extend CodeQL language gating to push-to-main events Extend CodeQL language gating to push events (main + release branches) Jun 5, 2026
@shahar1
Contributor Author

shahar1 commented Jun 5, 2026

I'd like your opinion: more often than not we merge one PR right after another, so the latest commit cancels the previous commit's CodeQL runs.
If we apply the pull_request behavior on main as this PR does, we cannot trust that the latest successful "✔️" really means there are no issues (because it may have targeted a different set of languages than the previous commit).
On the other hand, as we now know, CodeQL runs are heavy, and I don't see the point of running CodeQL for Java on every commit to main if the commit doesn't touch the Java SDK explicitly.

Considering that we have the scheduled runs anyway:
Is the current PR a good compromise (despite the "information loss" behavior),
or should we keep the existing behavior (run all languages on main),
or, at the other extreme, should we just remove the runs on main and rely on the scheduled runs?

Contributor

@jscheffl jscheffl left a comment

In my opinion, if we scan (filtered/selectively) on PRs prior to merge, then it is okay to do a full scan only on schedule and accept a small risk when multiple PRs are merged and not every merge gets a complete check.

Member

@potiuk potiuk left a comment

Nice — gating push and bringing release branches to parity both make sense, and failing open to a full scan on compare failure is the right call.

One non-blocking robustness nit: the push compare call isn't paginated while the PR path is (--paginate). GitHub's compare API returns at most 300 files per page, so a large merge (>300 files) could under-detect a changed language and skip scanning it — most relevant on release branches, which have no daily schedule full-scan to back them up. Adding --paginate to the compare call would close that and match the PR path. Up to you whether to fold it in here or as a follow-up.


Drafted-by: Claude Code (Opus 4.8); reviewed by @potiuk before posting

The GitHub compare API returns at most 300 changed files and does not
paginate the file list (only commits paginate), so a merge touching more
than 300 files truncates the list and could under-detect a changed
language. Detect that cap and fall back to scanning every language —
release branches have no daily schedule full-scan to back them up.
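A rough sketch of the cap check this commit message describes, assuming the file list has already been fetched. The constant reflects the 300-file limit cited above; the function name and the `None` return convention (meaning "scan every language") are hypothetical.

```python
COMPARE_FILE_CAP = 300  # file-list limit cited in the commit message


def files_or_full_scan(files, cap=COMPARE_FILE_CAP):
    """Return the file list, or None if it may have been truncated at the cap.

    A list that reaches the cap cannot be told apart from a truncated one,
    so this sketch treats it as truncated and asks for a full scan.
    """
    if len(files) >= cap:
        return None
    return list(files)
```

Treating exactly 300 files as truncated costs an occasional unnecessary full scan but avoids silently skipping a language.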
@shahar1
Contributor Author

shahar1 commented Jun 6, 2026

Failure in "Latest Boto test" is surfaced in this PR but unrelated, see:
#68122
