ci(gpu) + test(pi07): drop temp push trigger + scope CUDA drain (cleanup of #285) by shuheng-liu · Pull Request #286 · TensorAuto/OpenTau

shuheng-liu · 2026-05-08T05:13:46Z

What this does

Cleanup left by #285's squash-merge.

#285 went through with the temporary gpu_test.yml push trigger I had added on the fix branch to manually dispatch GPU CI without API auth. The maintainer marked the PR ready and squash-merged it before my follow-up cleanup commit landed, so the trigger leaked into main. This PR drops it.

While I'm here, also addressing the claude[bot] nit from the original review (PR #285):
tests/policies/test_pi07_low_level.py — the pre-init CUDA drain (set_device(0) / synchronize() / empty_cache()) was at function scope, so it ran even when an earlier test in the session had already initialized the PG. In that case NCCL is already bound to whichever device that test picked and re-pinning here can't unbind it, so the drain only does useful work on the path where we own the init. Move it inside the if not already_initialized: block alongside the actual init_process_group call. Comment is updated to spell out the asymmetry.

No functional change to GPU behavior — the GPU CI run on dd42b69 (Run Pytest on GPU job 74911614535, 13min, success) already confirmed the underlying fix works.

How it was tested

pre-commit run --files tests/policies/test_pi07_low_level.py .github/workflows/gpu_test.yml — all hooks pass.
pytest -m "not gpu" -n auto tests/policies/test_pi07_cpu.py tests/policies/test_pi07_low_level.py — 69 passed, no regressions.
The gpu_test.yml change is trigger-only; the gpu-test job body is unchanged so a re-run on the nightly cron (or manual workflow_dispatch) still exercises the same fix that already passed on dd42b69.

How to checkout & try? (for the reviewer)

git fetch origin claude/fix-gpu-tests-TStWF
git checkout claude/fix-gpu-tests-TStWF
git diff main -- .github/workflows/gpu_test.yml tests/policies/test_pi07_low_level.py
pytest -m "not gpu" -n auto tests/policies/test_pi07_cpu.py tests/policies/test_pi07_low_level.py

Checklist

I have added Google-style docstrings to important functions and ensured function parameters are typed.
My PR includes policy-related changes.
- If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

Generated by Claude Code

Cleanup left by #285's squash-merge: - Remove the gpu_test.yml `push` trigger I added to dispatch GPU CI on the fix branch — the run on dd42b69 already confirmed the fix. - Address the claude[bot] nit on #285: scope the pre-init CUDA drain (set_device + synchronize + empty_cache) to the path where we own the init. When a prior test had already initialized the PG, NCCL is already bound to that device and re-pinning here can't unbind it, so the drain is only meaningful when we own init.

shuheng-liu added the test label May 8, 2026 — with Claude

shuheng-liu self-assigned this May 8, 2026

shuheng-liu marked this pull request as ready for review May 8, 2026 19:21

shuheng-liu merged commit 22063d2 into main May 8, 2026
5 of 6 checks passed

shuheng-liu deleted the claude/fix-gpu-tests-TStWF branch May 8, 2026 19:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci(gpu) + test(pi07): drop temp push trigger + scope CUDA drain (cleanup of #285)#286

ci(gpu) + test(pi07): drop temp push trigger + scope CUDA drain (cleanup of #285)#286
shuheng-liu merged 1 commit into
mainfrom
claude/fix-gpu-tests-TStWF

shuheng-liu commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

shuheng-liu commented May 8, 2026

What this does

How it was tested

How to checkout & try? (for the reviewer)

Checklist

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant