Skip to content

Conversation

@rparolin
Copy link
Collaborator

@rparolin rparolin commented Oct 22, 2025

This change is addressing a test failure when running the graphics api tests on WSL. We simply skip the tests when running on WSL due to the driver lacking pass through capabilities on WSL.

image

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Oct 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rparolin rparolin requested a review from mdboom October 22, 2025 00:06
@rparolin rparolin marked this pull request as ready for review October 22, 2025 00:08
greptile-apps[bot]

This comment was marked as resolved.

@rparolin rparolin force-pushed the rparolin/test_graphics_api_assert_fix branch from 4597400 to ae3f200 Compare October 22, 2025 15:28
@rparolin
Copy link
Collaborator Author

rparolin commented Oct 22, 2025

@greptileai

greptile-apps[bot]

This comment was marked as resolved.

@rparolin rparolin changed the title Adding cudaErrorOperatingSystem to the error tuple in the test_graphics_apis.py Skipping test_graphics_apis.py when running on WSL Oct 22, 2025
@rparolin
Copy link
Collaborator Author

@greptileai

@rparolin
Copy link
Collaborator Author

/ok to test 9da73e3

greptile-apps[bot]

This comment was marked as resolved.

@rparolin
Copy link
Collaborator Author

@greptileai

greptile-apps[bot]

This comment was marked as outdated.

@rparolin
Copy link
Collaborator Author

@greptileai

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR addresses graphics API test failures on WSL by implementing platform-specific test skipping. A new helpers.py module is introduced to centralize test utilities, providing the IS_WSL flag used by test_graphics_apis.py to skip both test_graphics_api_smoketest() and test_cuda_register_image_invalid() when running on WSL platforms. The solution recognizes that WSL lacks OpenGL passthrough capabilities, making these graphics interop tests non-functional in that environment.

PR Description Notes:

  • The description should clarify that tests are being skipped on WSL, not that errors are being added to acceptable tuples (as the screenshot might suggest).

Potential Issues

helpers.py import structure adds unnecessary complexity:
Lines 28-31 implement a package-vs-module detection workaround that suggests cuda_python_test_helpers has an inconsistent installation structure. This pattern may cause confusion for future maintainers:

if hasattr(cuda_python_test_helpers, "__path__"):
    import cuda_python_test_helpers.cuda_python_test_helpers as cuda_python_test_helpers

Consider standardizing how cuda_python_test_helpers is structured across installation methods (editable vs wheel) to eliminate this conditional logic.

sys.path modification is not cleaned up on success:
Line 25 inserts a path into sys.path but never removes it, even after successful import. While this may be acceptable in a test context, it permanently modifies the import search order for the entire test session. Consider using try/finally to ensure cleanup, or document why the persistent modification is intentional.

Confidence: 4/5

The change correctly solves the stated problem (WSL test failures) with a clean, maintainable approach. The confidence score is not 5 because: (1) the import fallback logic in helpers.py suggests underlying structural issues with cuda_python_test_helpers packaging that may surface later, and (2) the permanent sys.path modification could theoretically cause import shadowing issues if other packages have conflicting names (though this is unlikely in practice).

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@github-actions
Copy link

@rparolin
Copy link
Collaborator Author

/ok to test 6927cfe

Copy link
Contributor

@mdboom mdboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test /always/ fails on WSL. WSL is my daily driver, I wrote this test, and it "works for me" ™️. So I think there is probably something more specific/different about the WSL setup that is failing there. Maybe there's no $DISPLAY, or maybe the drivers aren't installed? Are there any clues in the way it's failing? I think we should update this to something more narrow than just "always skip on WSL".

I also have some feedback on the cuda_python_test_helpers stuff, but I think that's out-of-scope for here so it can be dealt with separately.

@leofang
Copy link
Member

leofang commented Oct 22, 2025

I agree with Mike that we shouldn't just skip. Lots of things, if not all, are supposed to just work on WSL, and I raised this offline to Rob and all in the team channel that incomplete knowledge of WSL failures are now biting us.

@rparolin
Copy link
Collaborator Author

I can dig deeper into why this happening... I figured since it was throwing a clear operating system error code that WSL wasn't support. But I'll dig deeper.

Comment on lines +30 to +32
# Clean up sys.path modification
if test_helpers_path in sys.path:
sys.path.remove(test_helpers_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you trying to achieve here by removing this from sys.path after adding it?

The import system will keep cuda_python_test_helpers in sys.modules, so any subsequent imports of it will just pull from that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was its preventing polluting global state because the import path changes is alive for the duration of the lifetime of the interpreter process.

def test_graphics_api_smoketest():
# Due to lazy importing in pyglet, pytest.importorskip doesn't work
try:
import pyglet
Copy link
Contributor

@cpcloud cpcloud Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should be restructured to avoid mixing up the reason for unexpected failures, expected failures, or missing dependencies.

Right now, it's just pytest.skip-ing and then returning (I realize this is pre-existing) which is making the whole thing unnecessarily complicated.

# skips if `pyglet.image` isn't importable for whatever reason, you can address
# the laziness by importing something deeper like `pyglet.image`
pyglet_image = pytest.importorskip("pyglet.image")
# if the next line raises an `AttributeError`, that should be specified in a
# `pytest.mark.xfail(raises=AttributeError)` decorator using whatever condition
# to indicate why the failure happened
tex = pyglet_image.Texture.create(512, 512)

Then you can probably remove the code that's mixing these reasons together.

Copy link
Contributor

@cpcloud cpcloud Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also chain xfail decorators so that you can specify different conditions and types. For example, let's say pyglet is installed but we're running without a display. That will raises a different error than AttributeError. So you can do the following:

@pytest.mark.xfail(condition=..., raises=AttributeError)
@pytest.mark.xfail(condition=os.environ.get("DISPLAY") is None, raises=pyglet.TheRightException)
def test_graphics_api_smoketest():
    ...

You can also combine these if they're not mutually exclusive (i.e., an AttributeError is raised when DISPLAY is None.

@pytest.mark.xfail(condition=... or os.environ.get("DISPLAY") is None, raises=(AttributeError, pyglet.TheRightException))
def test_graphics_api_smoketest():
    ...

@mdboom
Copy link
Contributor

mdboom commented Oct 22, 2025

Looking at the screenshot again, I think maybe just adding cudaErrorOperatingSystem to the list of allowed results is probably enough. It's letting the graphics API tell us the OS isn't sufficient rather than deciding for ourselves. That should allow it continue to test /something/ where the WSL setup is working, and also test that it doesn't work where it isn't expected to.

@leofang
Copy link
Member

leofang commented Oct 22, 2025

But why didn't you get the cudaErrorOperatingSystem error when you developed the test? Something is clearly missing.

@leofang
Copy link
Member

leofang commented Oct 22, 2025

I don't know if it is obvious to see relations between issues/PRs, but I think this might be relevant (#1165).

@mdboom mdboom closed this Oct 22, 2025
@mdboom mdboom reopened this Oct 22, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The developer has responded to previous feedback by refactoring the WSL detection logic into cuda_python_test_helpers and cleaning up the test helper imports in both cuda_core/tests/helpers.py and cuda_bindings/tests/helpers.py. Specifically, cuda_python_test_helpers now exports IS_WSL and supports_ipc_mempool, with the latter incorporating WSL detection and lazy-importing handle_return to avoid hard dependencies. The test helper modules in cuda_core and cuda_bindings now use try-finally blocks to clean up temporary sys.path modifications after importing the helper package.

However, the changes introduce a critical logic error in both cuda_core/tests/helpers.py and cuda_bindings/tests/helpers.py: the sys.path.remove() call executes in the finally block immediately after the import statement (lines 26-32 in both files), which removes the path from sys.path before the module attributes IS_WSL and supports_ipc_mempool are accessed on lines 35-36. While Python's import system will cache the module in sys.modules, this premature cleanup is confusing and suggests a misunderstanding of when the cleanup should occur. The cleanup should happen after all usage of the imported module, not inside the try-finally block that wraps the import itself.

Additionally, in cuda_bindings/tests/helpers.py, the refactor removed the package-vs-module handling code (old lines 28-30) that checked whether cuda_python_test_helpers was imported as a package or module. This removal may cause issues if the import context changes.

Potential Issues

High Severity:

  • cuda_core/tests/helpers.py and cuda_bindings/tests/helpers.py (lines 26-32, 35-36): The sys.path cleanup in the finally block occurs immediately after the import, before the module attributes are accessed. While this will work due to sys.modules caching, it creates confusing logic. The reviewer previously asked "What are you trying to achieve here by removing this from sys.path after adding it?" and this implementation doesn't clearly address that question. If the intent is cleanup, it should happen after line 36, not between lines 28-32.

  • cuda_bindings/tests/helpers.py (removed lines 28-30): The deletion of the package-vs-module handling code may break imports if cuda_python_test_helpers is ever imported as a package instead of a module. The old code checked hasattr(cuda_python_test_helpers, 'IS_WSL') and had fallback logic. This safeguard is now gone.

Medium Severity:

  • cuda_python_test_helpers/init.py (line 41): The lazy import of handle_return inside supports_ipc_mempool is a good pattern, but the comment says it's to "avoid hard dependency on cuda.core". If cuda.core is not installed, the import will still fail when the function is called. Consider wrapping this in a try-except to return False gracefully if cuda.core is unavailable.

Confidence Score: 1 / 5

The logic errors in the sys.path cleanup make this PR unsafe to merge without correction. The timing of the cleanup suggests a misunderstanding of Python's import mechanics, and the removal of defensive code in cuda_bindings increases fragility.

3 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +26 to +32
try:
sys.path.insert(0, test_helpers_path)
import cuda_python_test_helpers
finally:
# Clean up sys.path modification
if test_helpers_path in sys.path:
sys.path.remove(test_helpers_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The finally block removes test_helpers_path from sys.path immediately after import on line 28. This breaks the import cuda_python_test_helpers statement since Python needs the path to remain in sys.path during the import process. The cleanup should happen after all uses of the imported module (after line 36), not inside the try/finally that wraps the import.

Comment on lines +26 to +32
try:
sys.path.insert(0, test_helpers_path)
import cuda_python_test_helpers
finally:
# Clean up sys.path modification
if test_helpers_path in sys.path:
sys.path.remove(test_helpers_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The sys.path cleanup in the finally block runs before the import statement (line 28) completes, which means cuda_python_test_helpers may not be accessible after the try-finally block exits. The finally block executes immediately after the try block, removing the path before the module can be used at line 35-36. Move the cleanup to after all usage of the imported module (after line 36).

Comment on lines +25 to +28
test_helpers_path = str(pathlib.Path(__file__).resolve().parents[2] / "cuda_python_test_helpers")
try:
sys.path.insert(0, test_helpers_path)
import cuda_python_test_helpers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: If the import on line 28 fails, the path remains in sys.path since the finally block's removal is conditional. Consider moving the import outside the try-finally and only wrapping the sys.path manipulation, or ensure the path is always removed regardless of import success.

@mdboom
Copy link
Contributor

mdboom commented Oct 22, 2025

But why didn't you get the cudaErrorOperatingSystem error when you developed the test? Something is clearly missing.

Yeah, I think @rparolin can just compare our WSL installs until we understand what might be the difference, then we can gate on that. But generally speaking, if CUDA tells us the OS isn't suitable, that's a "reasonable" result in terms of a test that runs everywhere.

@leofang
Copy link
Member

leofang commented Oct 22, 2025

if CUDA tells us the OS isn't suitable, that's a "reasonable" result in terms of a test that runs everywhere.

OK. Here's the CUDA's dark corner (which we learned through debugging #248 (comment)). In the graphics interop context cudaErrorOperatingSystem usually just means a graphics context is not correctly set up, and there could be many reasons for why it's the case. It does not mean that the OS is not supported. In the case of issue 248 we had to activate the EGL context first (export PYOPENGL_PLATFORM="EGL" and then from OpenGL import EGL), and it could not be on a headless node. I dunno what pyglet is doing under the hood, so I suggest we look at this together with #1165 holistically and understand the real issue. The CUDA Python team needs to have a deep understanding in CUDA too.

@leofang
Copy link
Member

leofang commented Oct 22, 2025

(I canceled all CI runs because they are hanging and blocking all other teams from getting their tests run, see the nv-gha-runners thread.)

@rwgk
Copy link
Collaborator

rwgk commented Oct 22, 2025

I'm seeing the error in the PR description also here (ctk-next testing).

Note the LOOOK below: os.environ.get("DISPLAY")=None

This is with Windows 11 24H2 headless.

See also #1165 that @leofang pointed out already.

(TestVenv) C:\Users\rgrossekunst\forked\cuda-python-private\ctk-next\cuda_bindings>python -m pytest -ra -s -vv tests\test_graphics_apis.py
====================================================================== test session starts ======================================================================
platform win32 -- Python 3.13.9, pytest-8.4.2, pluggy-1.6.0 -- C:\Users\rgrossekunst\forked\cuda-python-private\ctk-next\TestVenv\Scripts\python.exe
cachedir: .pytest_cache
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: C:\Users\rgrossekunst\forked\cuda-python-private\ctk-next\cuda_bindings\tests
configfile: pytest.ini
plugins: benchmark-5.1.0
collected 2 items

tests\test_graphics_apis.py::test_graphics_api_smoketest
LOOOK os.environ.get("DISPLAY")=None
FAILED
tests\test_graphics_apis.py::test_cuda_register_image_invalid PASSED

=========================================================================== FAILURES ============================================================================
__________________________________________________________________ test_graphics_api_smoketest __________________________________________________________________

    def test_graphics_api_smoketest():
        pyglet = pytest.importorskip("pyglet")
        try:
            tex = pyglet.image.Texture.create(512, 512)
        except ImportError:
            pytest.skip("pyglet lazy ImportError")
        except AttributeError:
            pytest.skip("pyglet lazy import AttributeError")
        except pyglet.display.xlib.NoSuchDisplayException:
            pytest.skip("pyglet.display.xlib.NoSuchDisplayException")

        err, gfx_resource = cudart.cudaGraphicsGLRegisterImage(
            tex.id, tex.target, cudart.cudaGraphicsRegisterFlags.cudaGraphicsRegisterFlagsWriteDiscard
        )
        import os
        print(f"\nLOOOK {os.environ.get("DISPLAY")=}", flush=True)
        error_name = cudart.cudaGetErrorName(err)[1].decode()
        if error_name == "cudaSuccess":
            assert int(gfx_resource) != 0
        else:
>           assert error_name in ("cudaErrorInvalidValue", "cudaErrorUnknown")
E           AssertionError: assert 'cudaErrorOperatingSystem' in ('cudaErrorInvalidValue', 'cudaErrorUnknown')

tests\test_graphics_apis.py:28: AssertionError
==================================================================== short test summary info ====================================================================
FAILED tests\test_graphics_apis.py::test_graphics_api_smoketest - AssertionError: assert 'cudaErrorOperatingSystem' in ('cudaErrorInvalidValue', 'cudaErrorUnknown')
================================================================== 1 failed, 1 passed in 0.42s ==================================================================

@rwgk
Copy link
Collaborator

rwgk commented Oct 22, 2025

This seems to work for me (same as in #1165 (comment)):

diff --git a/cuda_bindings/tests/test_graphics_apis.py b/cuda_bindings/tests/test_graphics_apis.py
old mode 100644
new mode 100755
index 67f674b..d8bb2ee
--- a/cuda_bindings/tests/test_graphics_apis.py
+++ b/cuda_bindings/tests/test_graphics_apis.py
@@ -1,6 +1,8 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE

+import os
+
 import pytest
 from cuda.bindings import runtime as cudart

@@ -22,6 +24,8 @@ def test_graphics_api_smoketest():
     error_name = cudart.cudaGetErrorName(err)[1].decode()
     if error_name == "cudaSuccess":
         assert int(gfx_resource) != 0
+    elif os.environ.get("DISPLAY") is None:
+        assert error_name == "cudaErrorOperatingSystem"
     else:
         assert error_name in ("cudaErrorInvalidValue", "cudaErrorUnknown")

@rparolin
Copy link
Collaborator Author

Looking at the screenshot again, I think maybe just adding cudaErrorOperatingSystem to the list of allowed results is probably enough. It's letting the graphics API tell us the OS isn't sufficient rather than deciding for ourselves. That should allow it continue to test /something/ where the WSL setup is working, and also test that it doesn't work where it isn't expected to.

That was actually my first commit but I allowed myself to be convinced by Greptile that doing a platform skip was more clearly showing intent. 🤦

@leofang leofang added test Improvements or additions to tests cuda.bindings Everything related to the cuda.bindings module P1 Medium priority - Should do labels Oct 23, 2025
@leofang leofang added this to the cuda-python 13-next, 12-next milestone Oct 23, 2025
@leofang leofang self-requested a review October 24, 2025 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.bindings Everything related to the cuda.bindings module P1 Medium priority - Should do test Improvements or additions to tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants