
[build] prune old codeql caches #17570

Merged: titusfortner merged 2 commits into trunk from codql_cache on May 25, 2026
Conversation

@titusfortner
Member

🔗 Related Issues

As discussed in #17568, one solution to the intermittent 502 failures is to enable caching for the RBE job.
CodeQL currently takes about 25% of our cache allocation. This change should reduce that to ~5%.

💥 What does this PR do?

After each CodeQL run completes, it deletes all CodeQL caches except the most recent one in each group.

🤖 AI assistance

  • No substantial AI assistance used
  • AI assisted (complete below)
    • Tool(s):
    • What was generated:
    • I reviewed all AI output and can explain the change

@selenium-ci added the B-build label (Includes scripting, bazel and CI integrations) on May 25, 2026
@qodo-code-review
Contributor

Review Summary by Qodo

Add automated CodeQL cache pruning workflow

✨ Enhancement


Walkthroughs

Description
• Add automated CodeQL cache pruning to reduce cache usage
• Keep only most recent cache per group, delete stale entries
• Trigger pruning after CodeQL workflow completion
• Reduces cache allocation from ~25% to ~5%
Diagram
flowchart LR
  A["CodeQL Workflow<br/>Completes"] -->|triggers| B["Prune Caches<br/>Workflow"]
  B -->|runs| C["prune-codeql-caches.sh"]
  C -->|groups caches| D["overlay-base-database<br/>& dependencies"]
  C -->|keeps newest| E["Most Recent<br/>Cache per Group"]
  C -->|deletes| F["Stale Caches<br/>Removed"]
  F -->|reduces usage| G["Cache Budget<br/>Optimized"]


File Changes

1. scripts/github-actions/prune-codeql-caches.sh ✨ Enhancement +84/-0

CodeQL cache pruning script implementation

• New bash script to prune stale CodeQL caches from GitHub Actions
• Groups caches by type: overlay-base-database and dependencies
• Keeps only the most recently created cache per group
• Supports dry-run mode (default) and --delete flag for actual removal
• Handles race conditions when multiple jobs run concurrently
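
The "keep the newest cache per group" rule described above can be sketched as follows. This is an illustration only, not the script from the PR: the function name `stale_cache_ids`, the tab-separated input layout, and the idea of detecting a group by a substring of the cache key are all assumptions.

```shell
#!/usr/bin/env bash

# Read "id<TAB>key<TAB>created_at" rows on stdin and print the id of every
# cache that is NOT the newest one in its group.
# Assumptions (not taken from the PR): groups are recognised by a substring
# of the key, and created_at is an ISO 8601 UTC timestamp, so sorting the
# strings in reverse order puts the newest entry first.
stale_cache_ids() {
  sort -t "$(printf '\t')" -k3,3r |
    awk -F '\t' '
      {
        group = ""
        if ($2 ~ /overlay-base-database/) group = "overlay-base-database"
        else if ($2 ~ /dependencies/)     group = "dependencies"

        if (group == "") next          # not a CodeQL cache: leave it alone
        if (seen[group]++) print $1    # every row after the first is stale
      }'
}
```

Because the function only reads rows and prints ids, it can be checked against sample data before any delete command is wired to its output.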

scripts/github-actions/prune-codeql-caches.sh


2. .github/workflows/prune-caches.yml ⚙️ Configuration changes +27/-0

Automated cache pruning workflow configuration

• New GitHub Actions workflow triggered after CodeQL workflow completion
• Runs on trunk branch with manual dispatch option
• Uses concurrency group to prevent parallel executions
• Grants actions:write permission for cache deletion
• Executes pruning script with --delete flag to remove stale caches

.github/workflows/prune-caches.yml



@qodo-code-review
Contributor

qodo-code-review Bot commented May 25, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (1) 📎 Requirement gaps (0)



Action required

1. gh cache delete errors ignored 📘 Rule violation ☼ Reliability
Description
The cache-pruning script logs a warning and continues when gh cache delete fails for reasons other
than a 404, which can silently leave stale caches and undermine the workflow’s purpose. Compliance
requires robust error handling in CI/scripts rather than swallowing errors.
Code

scripts/github-actions/prune-codeql-caches.sh[R66-75]

Evidence
PR Compliance ID 13 requires robust error handling and avoiding error-swallowing in scripts. The new
code treats unexpected gh cache delete failures as warnings and continues, which can mask real
failures (e.g., auth/rate-limit/API issues) and produce non-deterministic pruning results.

scripts/github-actions/prune-codeql-caches.sh[66-75]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The script intentionally continues after unexpected `gh cache delete` failures (non-404), which can hide permission/API errors and leave caches unpruned.

## Issue Context
With `set -euo pipefail`, the script still swallows delete failures by converting them into `::warning::...` and `continue`. For CI correctness and determinism, unexpected failures should fail the job (or at least be aggregated and cause a non-zero exit code at the end).

## Fix Focus Areas
- scripts/github-actions/prune-codeql-caches.sh[66-75]
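
One way to aggregate failures instead of swallowing them is sketched below. It is not the code from the PR: `delete_caches` and the `DELETE_CMD` hook are hypothetical names, and detecting an already-deleted cache by looking for "404" in the error output is an assumption about how the failure is reported.

```shell
#!/usr/bin/env bash

# Hypothetical hook so the logic can be exercised without calling GitHub.
# By default it runs the real `gh cache delete` command.
DELETE_CMD="${DELETE_CMD:-gh cache delete}"

# Delete each cache id passed as an argument. A 404 is tolerated (another
# job may have removed the cache first); any other failure is reported and
# counted, and the function returns non-zero if at least one occurred.
delete_caches() {
  local id err failures=0
  for id in "$@"; do
    if err="$($DELETE_CMD "$id" 2>&1)"; then
      echo "deleted $id"
    elif printf '%s' "$err" | grep -q '404'; then
      echo "already gone: $id"
    else
      echo "::error::failed to delete cache $id: $err"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

Every id is still attempted, so a single bad delete does not stop the pruning, but the job ends with a non-zero status whenever something unexpected happened.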



2. Workflow_run trigger mismatch 🐞 Bug ☼ Reliability
Description
.github/workflows/prune-caches.yml only runs after a workflow named CodeQL completes, but this
PR branch contains no workflow with that name, so pruning will never run automatically. This defeats
the PR’s goal of reclaiming Actions cache space (manual workflow_dispatch aside).
Code

.github/workflows/prune-caches.yml[R4-7]

Evidence
The pruning workflow is explicitly scoped to run only after a workflow named CodeQL completes;
since the only CodeQL reference is in this new workflow, the automatic trigger will not match
anything and the job will not run.

.github/workflows/prune-caches.yml[3-8]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The prune workflow is configured to run on `workflow_run` for `workflows: ["CodeQL"]`, but there is no workflow in this branch with `name: CodeQL`, so the prune job will never be triggered automatically.

## Issue Context
`workflow_run.workflows` must match the *workflow `name:`* of an existing workflow in this repo. If the actual CodeQL workflow has a different name (or is not defined in-repo), this trigger will not fire.

## Fix Focus Areas
- .github/workflows/prune-caches.yml[3-8]

## Suggested fix
- Update `workflows: ["CodeQL"]` to the exact `name:` of the workflow that runs CodeQL in this repository.
- If CodeQL is not defined in-repo, replace `workflow_run` with an alternative trigger (e.g., `schedule` + `workflow_dispatch`) that guarantees regular pruning on `trunk`.
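
A quick local check for the name mismatch described above could look like this sketch. The function name `workflow_name_exists` is hypothetical, and the check is a plain-text match on a top-level `name:` line rather than a YAML parser, so unusual formatting (for example a trailing comment on the `name:` line) is not handled.

```shell
#!/usr/bin/env bash

# Succeed if any workflow file in the given directory declares the given
# top-level `name:`. Surrounding quotes and trailing whitespace are stripped
# before comparing. Text matching only; this does not parse YAML.
workflow_name_exists() {
  local dir="$1" wanted="$2" f name
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    [ -f "$f" ] || continue
    name="$(sed -n 's/^name:[[:space:]]*//p' "$f" | head -n 1 |
      sed -e 's/[[:space:]]*$//' -e "s/^[\"']//" -e "s/[\"']\$//")"
    if [ "$name" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, `workflow_name_exists .github/workflows "CodeQL"` would return non-zero if no workflow file in that directory carries exactly that name.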




Remediation recommended

3. Cache listing not paginated ✓ Resolved 🐞 Bug ☼ Reliability
Description
prune-codeql-caches.sh fetches only 100 caches (gh cache list --limit 100), so any matching
caches beyond that are never considered for deletion. Since the script itself notes CodeQL creates a
new cache per commit, older caches can accumulate beyond 100 and remain unpruned.
Code

scripts/github-actions/prune-codeql-caches.sh[R24-28]

Evidence
The script states CodeQL creates a new cache per commit, but it only fetches 100 cache entries,
meaning it cannot prune older entries once the count exceeds that limit.

scripts/github-actions/prune-codeql-caches.sh[3-6]
scripts/github-actions/prune-codeql-caches.sh[24-28]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The script only lists up to 100 caches and prunes within that partial set, which can leave most stale caches untouched once the repository accumulates more than 100 CodeQL caches.

## Issue Context
The script comments say CodeQL creates a new overlay-base-database cache per commit; on an active repository, that can exceed 100 entries quickly.

## Fix Focus Areas
- scripts/github-actions/prune-codeql-caches.sh[3-6]
- scripts/github-actions/prune-codeql-caches.sh[24-28]

## Suggested fix
- Replace the single `gh cache list ... --limit 100` call with logic that retrieves *all* matching caches.
 - Option A: increase `--limit` to the maximum supported by `gh` (if sufficiently high for this repo’s scale).
 - Option B (more robust): implement pagination via the GitHub API (`gh api`) to iterate through all pages of caches, building the `rows` array from the full result set.
- Keep the rest of the grouping/deletion logic unchanged so behavior stays “keep newest per group”.
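
Option B might look roughly like the sketch below, which asks `gh api` to follow every result page of the repository's Actions cache listing. It is not necessarily how the issue was resolved in the PR; `list_all_caches` and the `GH` override hook are hypothetical names, and the output layout is chosen for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical override hook for testing; defaults to the real gh CLI.
GH="${GH:-gh}"

# Print every Actions cache in the current repository as
# "id<TAB>key<TAB>created_at", one per line. --paginate makes gh request
# each page in turn, and the --jq filter flattens the `actions_caches`
# array of each page into rows.
list_all_caches() {
  "$GH" api --paginate \
    "repos/{owner}/{repo}/actions/caches?per_page=100" \
    --jq '.actions_caches[] | [.id, .key, .created_at] | @tsv'
}
```

The `{owner}` and `{repo}` placeholders are filled in by `gh` from the current repository, so the function needs no arguments when run from a checkout with a valid token.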




Comment thread scripts/github-actions/prune-codeql-caches.sh
Comment thread .github/workflows/prune-caches.yml
@qodo-code-review
Contributor

qodo-code-review Bot commented May 25, 2026

Persistent review updated to latest commit e197140

@titusfortner titusfortner merged commit 312e586 into trunk May 25, 2026
26 of 27 checks passed
@titusfortner titusfortner deleted the codql_cache branch May 25, 2026 13:56

Labels

B-build Includes scripting, bazel and CI integrations


2 participants