
Conversation

Andy-Jost
Contributor

@Andy-Jost Andy-Jost commented Oct 9, 2025

Errors occurring during Buffer.close are not raised. This change adds tests demonstrating the issue. See #1118.

@Andy-Jost Andy-Jost self-assigned this Oct 9, 2025
Contributor

copy-pr-bot bot commented Oct 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Contributor Author

/ok to test 845fbd4


github-actions bot commented Oct 9, 2025

@Andy-Jost Andy-Jost force-pushed the ipc_suppressed_errors branch from 845fbd4 to adfb7e5 Compare October 9, 2025 22:08
@Andy-Jost
Contributor Author

/ok to test 0986f5e

@Andy-Jost Andy-Jost force-pushed the ipc_suppressed_errors branch from 0986f5e to 78a4815 Compare October 9, 2025 22:17
@Andy-Jost
Contributor Author

/ok to test 1d5248e

@Andy-Jost Andy-Jost added the test (Improvements or additions to tests) and cuda.core (Everything related to the cuda.core module) labels Oct 9, 2025
@Andy-Jost Andy-Jost force-pushed the ipc_suppressed_errors branch from 1d5248e to bbdbbcd Compare October 9, 2025 22:33
@Andy-Jost Andy-Jost changed the title from "Add skipped tests demonstrating that errors in Buffer.close are not raised" to "Add (failing) tests demonstrating that errors in Buffer.close are not raised" Oct 9, 2025
@Andy-Jost Andy-Jost force-pushed the ipc_suppressed_errors branch from d916308 to b7f8c2a Compare October 9, 2025 22:35
@Andy-Jost Andy-Jost force-pushed the ipc_suppressed_errors branch from 6e9c283 to fea6f8a Compare October 9, 2025 22:36
mr.close()


@pytest.mark.xfail
Collaborator


Will this work here?

@pytest.mark.xfail(reason="Issue #1118", strict=True)

The important part is strict=True.

Member

@leofang leofang left a comment


HUGE 👎 in testing this.

Comment on lines +153 to +158
def test_error_in_close_memory_resource(ipc_memory_resource):
    """Test that errors when closing a memory resource are raised."""
    mr = ipc_memory_resource
    driver.cuMemPoolDestroy(mr.handle)
    with pytest.raises(CUDAError, match=".*CUDA_ERROR_INVALID_VALUE.*"):
        mr.close()
Member


This is illegal and I disagree we need to test this. This is Python and we can't possibly guard against all kinds of bizarre ways of trying to mutate the state of our Python objects behind our back. In particular, as noted in both #1074 (comment) and offline discussion, errors like CUDA_ERROR_INVALID_VALUE are due to multiple frees. I thought we've moved on?

This test is just another instance of the same class of errors: We free the handle of an object through a direct C API call, bypassing our safeguard mechanism (under the hood we do check if the handle is already null before freeing, and then after free we set the handle to null to avoid double free), so our destructor kicks in again, either through an explicit close() call or implicitly when going out of scope.
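The safeguard described here can be sketched roughly as follows. This is a hypothetical stand-in class, not cuda.core's actual implementation; `_driver_free` is an assumed placeholder for the real driver call:

```python
# Hypothetical sketch of the null-handle safeguard: check before freeing,
# null the handle after freeing, so close() is safe to call twice.
class GuardedBuffer:
    def __init__(self, handle):
        self._handle = handle
        self.free_calls = 0  # instrumentation for this sketch only

    def _driver_free(self, handle):
        # A real implementation would call into the CUDA driver here.
        self.free_calls += 1

    def close(self):
        if self._handle is None:  # already freed: close() is a no-op
            return
        self._driver_free(self._handle)
        self._handle = None       # prevent a second free

buf = GuardedBuffer(handle=0x1234)
buf.close()
buf.close()  # second call hits the null check and returns quietly
print(buf.free_calls)  # → 1
```

Note that freeing the handle through a direct C API call bypasses this guard precisely because `_handle` stays non-null, which is why the destructor then attempts a second free.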

Contributor

@cpcloud cpcloud Oct 10, 2025


I agree that this test is asserting that different levels of APIs offer the same guarantees, which would make developing layers of APIs really really difficult.

It seems roughly analogous to calling into the Python C API through ctypes, and expecting Python to somehow know you didn't mean to cause a segmentation violation:

❯ python3.13 -q
>>> x = 1
>>> import ctypes
>>> ctypes.pythonapi.Py_DecRef(x)
zsh: segmentation fault (core dumped)  python3.13 -q

Contributor

@cpcloud cpcloud Oct 10, 2025


I would guess there's probably another way to write this test such that the behavior is buggy without crossing into the C abyss of naked bindings.

Would it be enough to just call close() twice? That seems like something we should perhaps be robust to if we're not already:

❯ python -q
>>> f = open('/tmp/x', 'w')
>>> f.close()
>>> f.close()

Comment on lines +167 to +169
    driver.cuMemFree(buffer.handle)
    with pytest.raises(CUDAError, match=".*CUDA_ERROR_INVALID_VALUE.*"):
        buffer.close()
Member


ditto

Comment on lines +181 to +186
        try:
            driver.cuMemPoolDestroy(self.mr.handle)
        except Exception:  # noqa: S110
            pass
        else:
            self.mr.close()
Member


ditto

@leofang
Member

leofang commented Oct 10, 2025

FWIW cccl-runtime, the C++ standard library, and any high-level framework or library have the same issue. Giving you access to a container's underlying handle does not mean you can free it through free() or delete behind the container's back. This is UB, and by testing it we would be guaranteeing certain behavior (whatever it is).



4 participants