CI: retry arm64 gh200 nightly-standard row #2296

Draft
leofang wants to merge 3 commits into NVIDIA:main from leofang:leofang/retry-gh200-nightly
leofang commented Jul 2, 2026

Member

Re-enable the gh200 nightly-standard row in ci/test-matrix.yml (previously disabled in 4c70cfa because the runner hung on cudaMallocAsync). The runner team indicated that the pool-side issue has been fixed, so let's give it another CI-visible run.

The second commit adds a temporary push: trigger to ci-nightly.yml so we can exercise the row from this PR. It must be reverted before merging.

leofang added 2 commits July 2, 2026 15:01
Reverts the disable in 4c70cfa now that the runner team has fixed the
pool-side hang on stream-ordered memory allocator calls.

copy-pr-bot Bot commented Jul 2, 2026

Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

leofang commented Jul 2, 2026

Member Author

/ok to test 8d51cf7

github-actions Bot added the CI/CD (CI/CD infrastructure) label Jul 2, 2026
github-actions Bot commented Jul 2, 2026

The cufile BAR-size query returns CUDA_ERROR_NOT_SUPPORTED on
Grace+Hopper (unified memory, no discrete PCIe BAR). Tracked in NVIDIA#2299;
remove this deselect once the test skipif is fixed upstream.

Uses PYTEST_ADDOPTS so no changes to run-tests are needed.
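
For illustration, a deselect passed through PYTEST_ADDOPTS might look like the sketch below. The test node ID is hypothetical, since the commit message does not name the deselected test, and where the variable is actually set in the CI configuration is not shown in this PR. `--deselect` and `PYTEST_ADDOPTS` are standard pytest features.

```shell
# Hypothetical node ID: the PR does not name the cufile test being deselected.
DESELECT_ID="tests/test_cufile.py::test_bar_size_query"

# Append to any existing PYTEST_ADDOPTS instead of overwriting it, so options
# set elsewhere in the job are preserved.
export PYTEST_ADDOPTS="${PYTEST_ADDOPTS:+$PYTEST_ADDOPTS }--deselect=${DESELECT_ID}"

# Show what pytest will pick up from the environment.
echo "PYTEST_ADDOPTS=${PYTEST_ADDOPTS}"
```

Because pytest reads PYTEST_ADDOPTS from the environment on every invocation, the test runner script itself can stay unchanged, and dropping the variable later restores the test.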

leofang commented Jul 2, 2026

Member Author

/ok to test f148214
