
ci: cpu_test.yml races itself on PRs (push + pull_request triggers, no concurrency group) #234

@shuheng-liu

Symptom

CPU Tests CI fails on PRs with a consistent pattern: the push-triggered run passes, the pull_request-triggered run on the same SHA fails. Reproduced across two distinct commits on PR #232:

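| Commit  | Push trigger                   | PR trigger                     | Δ start |
|---------|--------------------------------|--------------------------------|---------|
| 0bf66d5 | run 25246947390 at 07:33:01 ✅ | run 25246954126 at 07:33:23 ❌ | 22 s    |
| c5a7e30 | run 25247282253 at 07:52:15 ✅ | run 25247282825 at 07:52:16 ❌ | 1 s     |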

In both cases the push-triggered run starts first (by 22 s and 1 s respectively) and succeeds; the pull_request-triggered run executes concurrently and fails. The failure does not look random: in both observed cases it is the second run to start that fails.

Root cause

.github/workflows/cpu_test.yml triggers on both push and pull_request, and on a PR branch both events fire for the same commit. There is no concurrency: block, so two full ~8-minute CPU-test jobs run in parallel at the same SHA against the same external resources (HF Hub auth and dataset downloads, libero assets, etc.). The second job to start is the one that trips the pre-existing flaky tests in the datasets/HF-Hub network family, the same flake class @shuheng-liu documented in #229 ("pre-existing failures: datasets/HF-Hub network, test_optional_keys's SimpleNamespace mock missing camera_keys, test_hub").
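
To make the double trigger concrete, here is a minimal sketch of a workflow with the shape described above. It is not copied from the repository: the workflow name, job name, and steps are placeholders, and the real cpu_test.yml may restrict branches or paths.

```yaml
# Hypothetical excerpt: names and steps are placeholders,
# not the repository's actual cpu_test.yml.
name: CPU Tests

on:
  push:          # fires for every pushed commit, including commits on PR branches
  pull_request:  # fires again for the same commit once a PR is open

# No top-level `concurrency:` key, so GitHub starts both runs in parallel.

jobs:
  cpu-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run the CPU test suite here"  # placeholder step
```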

Fix

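Add a concurrency: block at the top level of cpu_test.yml so duplicate runs for the same branch dedupe instead of racing. The group key has to resolve to the same value for both events: github.ref is refs/heads/<branch> on push but refs/pull/<number>/merge on pull_request, so keying on github.ref alone would leave the two runs in separate groups. Keying on the branch name avoids that:

concurrency:
  group: cpu-test-${{ github.head_ref || github.ref_name }}
  cancel-in-progress: true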

This:

  • eliminates the duplicate-run race (and the systematic PR-trigger failure it causes),
  • saves ~8 min of CPU compute per PR push,
  • behaves correctly for force-pushes (cancels the in-flight run for the previous SHA on the same ref).
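
For placement, a sketch of how the top of the workflow might look with the block added. The workflow name and the job are placeholders, not the repository's actual contents. The group key here uses the branch name (github.head_ref on pull_request, github.ref_name on push) so that both events land in the same group; this assumes the PR branch lives in the same repository.

```yaml
# Hypothetical excerpt: names and steps are placeholders.
name: CPU Tests

on:
  push:
  pull_request:

# Top-level key: applies to the whole workflow run, not to a single job.
concurrency:
  # head_ref is only set on pull_request events (the PR's source branch);
  # on push it is empty, so the expression falls back to the short branch name.
  group: cpu-test-${{ github.head_ref || github.ref_name }}
  cancel-in-progress: true

jobs:
  cpu-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run the CPU test suite here"  # placeholder step
```

With a shared group and cancel-in-progress: true, whichever run starts second cancels the first, so a PR commit should end up with one cancelled run and one completed run instead of two racing ones.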

The same pattern probably also belongs on pre-commit.yml and the claude bot workflows for the same compute-saving reason, but those don't seem to suffer from the flake.
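
If the block is copied into several workflows, one option is to prefix the group with github.workflow (the workflow's name) so that runs of different workflows on the same branch never cancel each other. A sketch, assuming the same branch-name key as above; nothing here is taken from pre-commit.yml itself:

```yaml
# Generic form that can be pasted into any workflow file unchanged.
concurrency:
  # github.workflow keeps each workflow in its own group;
  # the second part resolves to the branch name on both push and pull_request.
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
  cancel-in-progress: true
```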

Out of scope

Repro / evidence

PR #232 (a no-op test deletion) hit this twice. The PR has been left as-is so the failure is still observable in CI logs.

Labels: bug
