cuda_core test_memory.py, test_program.py xfail, fix #1302

Merged
rwgk merged 2 commits into NVIDIA:main from rwgk:cuda_core_tests_from_next
Dec 3, 2025

rwgk (Contributor) commented Dec 2, 2025

copy-pr-bot (Bot, Contributor) commented Dec 2, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk (Contributor, Author) commented Dec 2, 2025

/ok to test

rwgk requested a review from leofang on December 2, 2025, 22:42
    except CUDAError as exc:
        msg = str(exc)
        if "CUDA_ERROR_UNKNOWN" in msg:
            pytest.xfail("TODO(#1300): Known to fail already with CTK 13.0 (Windows)")
Member:

Q: Is our new Windows CI still unable to capture this? What's missing?

Contributor Author:

> Q: Is our new Windows CI still unable to capture this?

I just checked, looking at the log archive for this run (this PR):

grep -a 'Known to fail already with CTK 13' *.txt

still has no hits.
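
For anyone who wants to repeat this check without a shell, the `grep -a` search above can be approximated in Python. This is only a sketch: the function name `find_marker` and its parameters are hypothetical, and it assumes the log archive has been unpacked into a directory of `*.txt` files. Reading the files as bytes plays roughly the role of `-a`, so stray binary data in a log does not stop the search.

```python
from pathlib import Path


def find_marker(log_dir, marker, pattern="*.txt"):
    """Return (file name, line number) pairs for log lines containing marker.

    Hypothetical helper for illustration; not part of cuda-python or its CI.
    """
    needle = marker.encode()
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # Open in binary mode so undecodable bytes cannot raise an error.
        with path.open("rb") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line:
                    hits.append((path.name, lineno))
    return hits
```

An empty list corresponds to the "no hits" result reported here, for example `find_marker(".", "Known to fail already with CTK 13")`.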

> What's missing?

It requires (a) specific GPU(s) that we evidently don't have in the CI.

Relevant nvbugs:

  • 5630448 (Titan RTX WDDM): I saw the failure there while testing interactively with CTK 13.0 when backtracking.

  • 5633483: the Jetson failure reported by SWQA.

Possibly we could take out the xfail in line 460 and no automated or QA testing would catch this, but I'm sure I saw this failure when testing interactively. I'd prefer to keep the xfail to avoid getting sidetracked when it pops up again somewhere. We have #1300 to track understanding why I saw this.
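
For readers less familiar with the pattern being discussed, here is a minimal, self-contained sketch of a targeted xfail. It is an illustration under assumptions, not the code in this PR: `CUDAError` is a stand-in class defined only so the example runs, and `run_checked` is a hypothetical helper. Only the one known error message is converted into an xfail; any other error is re-raised and still fails the test.

```python
import pytest


class CUDAError(Exception):
    """Stand-in for the real exception type, defined only for this sketch."""


def run_checked(operation, marker="CUDA_ERROR_UNKNOWN",
                reason="known failure, tracked in an issue"):
    """Call operation(); turn one known error into an xfail (hypothetical helper)."""
    try:
        return operation()
    except CUDAError as exc:
        if marker in str(exc):
            # pytest.xfail() raises internally, so execution stops here.
            pytest.xfail(reason)
        # Anything else is unexpected and must still surface as a failure.
        raise
```

Keeping the match narrow is the point of the design: the test stays strict for every error except the one already tracked, so a new regression is not hidden behind the xfail.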

Member:

This reminds me of the discussion in #1264 (comment), where an error was caused by using the pre-production 13.1 driver (which we cannot test in the public CI yet), which enables RDMA support. Let's keep an eye on this when 13.1 is out.

leofang added the test (Improvements or additions to tests) and cuda.core (Everything related to the cuda.core module) labels on Dec 2, 2025
leofang added the P1 (Medium priority - Should do) label on Dec 3, 2025
leofang added this to the cuda.core beta 10 milestone on Dec 3, 2025
rwgk merged commit 4b09e51 into NVIDIA:main on Dec 3, 2025
64 checks passed
rwgk deleted the cuda_core_tests_from_next branch on December 3, 2025, 05:38
github-actions (Bot) commented Dec 3, 2025

Doc Preview CI
Preview removed because the pull request was closed or merged.

Labels

cuda.core (Everything related to the cuda.core module), P1 (Medium priority - Should do), test (Improvements or additions to tests)

2 participants