cuda_core test_memory.py, test_program.py xfail, fix #1302
…y skipped in our CI; failures discovered elsewhere
/ok to test
    except CUDAError as exc:
        msg = str(exc)
        if "CUDA_ERROR_UNKNOWN" in msg:
            pytest.xfail("TODO(#1300): Known to fail already with CTK 13.0 (Windows)")
Q: Is our new Windows CI still unable to capture this? What's missing?
> Q: Is our new Windows CI still unable to capture this?
I just checked, looking at the log archive for this run (this PR):
    grep -a 'Known to fail already with CTK 13' *.txt
still has no hits.
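The same check can be sketched in Python for cases where the log archive is inspected from a script. This is only an illustration: the function name `find_marker` and the directory layout are assumptions, and decoding with `errors="replace"` is a rough stand-in for what `grep -a` does with binary content.

```python
from pathlib import Path


def find_marker(log_dir, marker, pattern="*.txt"):
    """Return (file name, line number, line) for every line containing marker.

    Hypothetical helper: reads each file as bytes and decodes leniently,
    so stray binary content in a log does not abort the search.
    """
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        text = path.read_bytes().decode("utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                hits.append((path.name, lineno, line.strip()))
    return hits
```

An empty result corresponds to the "no hits" outcome above, meaning the xfail path was never taken in that run.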
> What's missing?
It requires (a) specific GPU(s) that we evidently don't have in the CI.
Relevant nvbugs:
- 5630448 (Titan RTX WDDM): I saw the failure there when testing interactively with CTK 13.0 when backtracking.
- 5633483 for the Jetson failure reported by SWQA.
Possibly we could take out the xfail in line 460 and no automated or QA testing would catch this, but I'm sure I saw this failure when testing interactively. I'd prefer to keep the xfail to avoid getting sidetracked when it pops up again somewhere. We have #1300 to track understanding why I saw this.
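For reference, the guard being discussed can be sketched as a small standalone helper. This is not the code in test_memory.py: `CUDAError` here is a stand-in class, `call_with_known_failure_guard` is a hypothetical name, and the injectable `xfail` parameter is an assumption added so the helper can be exercised outside a pytest run. Only the known error string is converted to an xfail; any other `CUDAError` still fails the test.

```python
class CUDAError(Exception):
    """Stand-in for the CUDA error type caught in the test (assumption)."""


def call_with_known_failure_guard(fn, marker, reason, xfail=None):
    """Call fn(); report an xfail only if a CUDAError mentions marker."""
    if xfail is None:
        import pytest  # imported lazily; pytest.xfail raises to stop the test

        xfail = pytest.xfail
    try:
        return fn()
    except CUDAError as exc:
        if marker in str(exc):
            xfail(reason)
        raise  # unrelated CUDA errors must still surface as failures
```

Inside a test this would be called with something like `marker="CUDA_ERROR_UNKNOWN"` and `reason="TODO(#1300): ..."`, which keeps the tracking issue visible in the test report whenever the failure shows up again.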
This reminds me of the discussion in #1264 (comment), where an error was seen due to using the pre-production 13.1 driver, which enables RDMA support and which we cannot test in the public CI yet. Let's keep an eye on this when 13.1 is out.
- commit b4b2bf1: Add `xfail` in cuda_core/tests/test_memory.py: these tests are currently skipped in our CI; failures discovered elsewhere. Review of these `xfail` is tracked under "Review test_memory.py::test_vmm_allocator_policy_configuration xfail" (#1300).
- commit 455889d: Fix mis-spelled enum name in cuda_core/tests/test_program.py (oversight in PR "[NVVM] Guardrail test_program for version mismatches between driver and PTX compiled binary", #1204).