ci: cpu_test.yml races itself on PRs (push + pull_request triggers, no concurrency group) #234

## Symptom

CPU Tests CI fails on PRs with a consistent pattern: the `push`-triggered run passes, the `pull_request`-triggered run on the **same SHA** fails. Reproduced across two distinct commits on PR #232:

| Commit | Push trigger | PR trigger | Δ start |
|---|---|---|---|
| `0bf66d5` | run [25246947390](https://github.com/TensorAuto/OpenTau/actions/runs/25246947390) at 07:33:01 ✅ | run [25246954126](https://github.com/TensorAuto/OpenTau/actions/runs/25246954126) at 07:33:23 ❌ | 22 s |
| `c5a7e30` | run [25247282253](https://github.com/TensorAuto/OpenTau/actions/runs/25247282253) at 07:52:15 ✅ | run [25247282825](https://github.com/TensorAuto/OpenTau/actions/runs/25247282825) at 07:52:16 ❌ | 1 s |

In both cases the push-triggered run starts first and succeeds, while the PR-triggered run overlaps it and fails. The failure does not look random: in both observations it is the run that started second.

## Root cause

`.github/workflows/cpu_test.yml` triggers on both `push` and `pull_request`, and on a PR branch **both** events fire. There is no `concurrency:` block, so two full ~8-minute CPU-test jobs run in parallel for the same SHA against the same external resources (HF Hub auth + dataset downloads, libero assets, etc.). Under that doubled load, the second job to start is the one that trips the pre-existing flaky tests in the `datasets/HF-Hub network` family. This is the same flake class @shuheng-liu has documented in #229 ("pre-existing failures: datasets/HF-Hub network, `test_optional_keys`'s SimpleNamespace mock missing `camera_keys`, `test_hub`").
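
For illustration, a trigger section of roughly the following shape would produce the double firing described above. This is a hypothetical sketch, not a copy of the real file: it assumes no branch or path filters on either event.

```yaml
# Hypothetical sketch of the trigger section; the real cpu_test.yml may
# carry filters that are not shown here.
on:
  push:          # fires for every branch push, including pushes to PR branches
  pull_request:  # fires again for the same head commit once a PR is open
```

With both keys unfiltered and no `concurrency:` key, GitHub Actions schedules the two runs independently, which matches the near-simultaneous start times in the table.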

## Fix

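Add a `concurrency:` block at the top of `cpu_test.yml` so duplicate runs for the same branch dedupe instead of racing. The group key should resolve to the branch name rather than `github.ref`, because `github.ref` is `refs/heads/<branch>` for `push` but `refs/pull/<n>/merge` for `pull_request`, which would put the two runs in different groups:

```yaml
concurrency:
  # head_ref is set only on pull_request events; ref_name covers push.
  group: cpu-test-${{ github.head_ref || github.ref_name }}
  cancel-in-progress: true
```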

This:
- eliminates the duplicate-run race (and the systematic PR-trigger failure it causes),
- saves ~8 min of CPU compute per PR push,
- behaves correctly for force-pushes (cancels the in-flight run for the previous SHA on the same ref).
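
To show where the block sits and how the group key resolves for each event, here is a minimal sketch. The job id, runner, steps, and the branch name `feature-x` are placeholders, not taken from the real workflow. It keys the group on the branch name because `github.ref` alone evaluates to `refs/heads/<branch>` on `push` and `refs/pull/<n>/merge` on `pull_request`, so the two runs would otherwise land in separate groups.

```yaml
# Sketch only: job id, runner, and steps are placeholders.
on:
  push:
  pull_request:

concurrency:
  # push to branch feature-x:     head_ref is empty, ref_name = feature-x
  # pull_request from feature-x:  head_ref = feature-x
  # Both resolve to the group "cpu-test-feature-x".
  group: cpu-test-${{ github.head_ref || github.ref_name }}
  cancel-in-progress: true

jobs:
  cpu-tests:                 # hypothetical job id
    runs-on: ubuntu-latest   # assumed runner
    steps:
      - uses: actions/checkout@v4
```

With `cancel-in-progress: true`, the run that is queued later cancels the one already in progress, so a PR will typically show one cancelled check next to the surviving one.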

The same pattern probably belongs on `pre-commit.yml` and the Claude bot workflows too, for the compute savings alone; those don't appear to suffer from the flake.

## Out of scope

- Fixing the underlying `datasets/HF-Hub network` / `test_optional_keys` / `test_hub` flakes. Those are pre-existing and tracked separately in PR #229's notes; the concurrency fix mitigates them by reducing parallel pressure on HF Hub but doesn't eliminate them when a single run hits an HF-side hiccup.

## Repro / evidence

PR #232 (a no-op test deletion) hit this twice. The PR has been left as-is so the failure is still observable in CI logs.

