[build] Improve CI speed and add job timeouts by AutomatedTester · Pull Request #17590 · SeleniumHQ/selenium

AutomatedTester · 2026-05-29T12:36:18Z

Summary

Merge check+read-targets into a single native job (ci.yml): the affected-targets detection is a shell script that reads a cached index file — it doesn't need the full bazel.yml setup (free-disk-space, setup-bazel, disk-checks, artifact upload/download). Collapsing these two sequential jobs into one native job saves ~3–5 min off the critical path on every PR.
Remove needs: build from Ruby lint (ci-ruby.yml): lint runs ./go rb:lint which is source-level analysis and doesn't need the compiled artifacts. Now runs in parallel with build, matching the Python pattern.
Add timeout-minutes to bazel.yml (default 60) with per-caller overrides: 15 min for lint, 20 min for builds, 30 min for unit tests, 90 min for browser/integration tests. Prevents hung browser tests from holding a runner for 6 hours (the GitHub Actions default).
Merge Rust platform builds (ci-rust.yml): consolidate {windows,linux,macos}-{stable,debug} from 6 jobs into 3 combined jobs that build both profiles in one runner, sharing toolchain setup and halving the fan-out.

Test plan

Verify Check Targets job completes without Bazel setup (check the job log — no "Setup Bazel" step should appear)
Verify Ruby lint and build start at the same time on a Ruby PR
Verify a Rust PR shows Windows (stable + debug), Linux (stable + debug), MacOS (stable + debug) jobs instead of 6 separate jobs
Confirm artifact names for Rust binaries are unchanged (selenium-manager-windows, selenium-manager-windows-debug, etc.)
Confirm ci-success still passes when non-affected language jobs are skipped

- Merge check+read-targets into a single native job in ci.yml, skipping the full bazel.yml setup (disk-free, setup-bazel, disk-checks) for what is essentially a shell script. Saves 3-5 min off the critical path on every PR by eliminating one job + one artifact upload/download. - Remove needs:build from Ruby lint so lint and build run in parallel, matching the Python pattern already in place. - Add timeout-minutes input to bazel.yml (default 60) and set explicit limits per caller: 15-20 min for lint/build, 30 min for unit tests, 90 min for browser/integration tests. Prevents hung runners from holding concurrency slots for 6 hours. - Merge Rust {windows,linux,macos}-{stable,debug} into three combined build jobs that produce both artifacts in one runner, halving the platform build fan-out from 6 jobs to 3 and sharing toolchain setup.

titusfortner · 2026-05-29T19:33:29Z

Most of this doesn't do what we need and is mistaken in a few places. Things are a lot better optimized than the AI agent thinks they are.

The only we'll actually move the needle on CI speed is to stop running tests that aren't providing value (#17539).
I've been doing a lot of work on Ruby to show the template I'd like to see in other languages (#17550).

The specifics on this PR:

affected_targets is a shell script that calls bazel in 2 places, so it can't be removed from bazel.yml, and we don't want to repeat the bazel.yml logic in ci.yml anyway, the checkout logic alone makes it worth running through bazel.yml. Fetch-depth 2 would break in a number of places. Besides, the bazel.yml overhead with the conditionals we have in place is about ~10s without the disk cleanup script that we need to remove, anyway.
Ruby Lint needs bazel as well
Merging Rust platform builds will actually just slow down the wall-clock because of parallelization. That said, I don't think we need to run debug at all most of the time - @bonigarcia how often do you work with the debug artifacts on PRs? We could gate generation with the same logic as we do for the following release step. But still this is a few minutes of runner time, the real answer to rust performance is, like you said, moving things to bazel ([rust] Allow cross-compilation of selenium-manager on all platforms #17586).
I don't think hard coded job-level timeouts are the right answer either. The rerun with debug step requires the bazel runs to complete so we can get extra info about the failure if we need. If we bail early then we might not have the info we need to actually fix things. Do we have any data to support the benefit here? Looking at the past 1000 jobs, most of the issues look to be Safari related, or MacOS availability starvation which gets us back to reducing how much we're running right now.

diemol · 2026-05-29T20:03:38Z

I'm with @titusfortner on this. No one has spent more time optimizing the build than him, and some of these changes are overriding changes he has made. I don't think we need to merge this.

bonigarcia · 2026-05-30T05:15:12Z

how often do you work with the debug artifacts on PRs?

SM is quite stable now, so the debug artifacts are not strictly required. We can stop generating them, IMO.

selenium-ci added the B-build Includes scripting, bazel and CI integrations label May 29, 2026

AutomatedTester requested a review from titusfortner May 29, 2026 13:06

titusfortner mentioned this pull request May 30, 2026

[build] only build debug artifacts when releasing #17595

Open

AutomatedTester closed this May 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[build] Improve CI speed and add job timeouts#17590

[build] Improve CI speed and add job timeouts#17590
AutomatedTester wants to merge 1 commit into
trunkfrom
worktree-misty-sprouting-cupcake

AutomatedTester commented May 29, 2026

Uh oh!

titusfortner commented May 29, 2026

Uh oh!

diemol commented May 29, 2026

Uh oh!

bonigarcia commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

AutomatedTester commented May 29, 2026

Summary

Test plan

Uh oh!

titusfortner commented May 29, 2026

Uh oh!

diemol commented May 29, 2026

Uh oh!

bonigarcia commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants