fix(ci): bypass commit-age gate when retrying long-running tests #33

Merged
ivanbasov merged 3 commits into NVIDIA:main from ivanbasov:fix/long-running-retry-gate
Apr 1, 2026

@ivanbasov
Member

Summary

  • The check-for-changes gate in long-running-tests.yml skips all test jobs when there are no commits on main in the last 24 h
  • When retrying from the GitHub UI ("Re-run all jobs"), the gate re-runs with the same stale event context and skips again
  • Added an elif branch: if github.run_attempt > 1, the gate sets has_changes=true regardless of commit age, so retries always execute the tests
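
The gate decision described above can be sketched as a small shell function. This is a minimal illustration, not the actual step from long-running-tests.yml: the function name `decide_has_changes`, its two positional arguments, and the order of the branches are assumptions made for the example. Only `github.run_attempt`, `has_changes`, and the 24 h window come from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the gate logic; not the real workflow step.
# Arguments (both assumed for this sketch):
#   $1  run attempt number (the value of github.run_attempt)
#   $2  number of commits on main within the last 24 h
decide_has_changes() {
  local run_attempt="$1"
  local recent_commits="$2"

  if [ "$recent_commits" -gt 0 ]; then
    # Recent commits exist, so the tests run as usual.
    echo "has_changes=true"
  elif [ "$run_attempt" -gt 1 ]; then
    # A re-run from the UI bypasses the commit-age check.
    echo "has_changes=true"
  else
    # First attempt on a quiet day: skip the test jobs.
    echo "has_changes=false"
  fi
}
```

In a workflow step, the printed line would typically be appended to `"$GITHUB_OUTPUT"`, and the commit count could be obtained with something like `git log --since="24 hours ago" --oneline | wc -l`. Both are assumptions about how the real step is wired.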

Test plan

  • Trigger the scheduled workflow on a quiet day (no recent commits) — confirm tests are still skipped on the first run
  • Use "Re-run all jobs" on that run — confirm check-for-changes logs "Re-run attempt 2 — always run." and all test jobs proceed

🤖 Generated with Claude Code

ivanbasov and others added 3 commits March 30, 2026 11:54
…fault

torch.compile=on combined with DataLoader spawn workers during LER
validation causes a segfault (20 leaked semaphores, core dumped).
Set PREDECODER_TORCH_COMPILE=0 for the Train all orientations step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

`github.run_attempt > 1` short-circuits the 24 h commit check so that
"Re-run all jobs" from the UI always executes the test jobs, even on
quiet days with no recent pushes to main.

Signed-off-by: Ivan Basov <ibasov@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ivanbasov ivanbasov requested a review from bmhowe23 April 1, 2026 17:33
@ivanbasov ivanbasov merged commit 8e33934 into NVIDIA:main Apr 1, 2026
12 checks passed
ivanbasov added a commit that referenced this pull request Apr 10, 2026
* fix(ci): disable torch.compile in orientation training to prevent segfault

torch.compile=on combined with DataLoader spawn workers during LER
validation causes a segfault (20 leaked semaphores, core dumped).
Set PREDECODER_TORCH_COMPILE=0 for the Train all orientations step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "fix(ci): disable torch.compile in orientation training to prevent segfault"

This reverts commit 7f0f6c8.

* fix(ci): bypass commit-age gate when retrying long-running tests

`github.run_attempt > 1` short-circuits the 24 h commit check so that
"Re-run all jobs" from the UI always executes the test jobs, even on
quiet days with no recent pushes to main.

Signed-off-by: Ivan Basov <ibasov@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Signed-off-by: Ivan Basov <ibasov@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>