Add (failing) tests demonstrating that errors in Buffer.close are not raised #1117
…ple) are not raised.
mr.close()

@pytest.mark.xfail
Will this work here?
@pytest.mark.xfail(reason="Issue #1118", strict=True)
The important part is strict=True.
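A minimal sketch of what the strict marker buys, assuming pytest is installed. `close_resource` is a hypothetical stand-in for a `close()` that swallows its error; it is not part of the project under review.

```python
import pytest


def close_resource():
    """Hypothetical stand-in for a close() that should raise but silently returns."""
    return None


@pytest.mark.xfail(reason="Issue #1118", strict=True)
def test_error_in_close_is_raised():
    # While the bug exists, pytest.raises sees no exception, the test fails,
    # and pytest reports it as an expected failure (XFAIL).
    # Once the bug is fixed the test passes; with strict=True that unexpected
    # pass (XPASS) fails the suite, which prompts removal of the marker.
    with pytest.raises(RuntimeError):
        close_resource()
```

Without `strict=True`, an unexpected pass is only reported as XPASS and the stale marker can linger unnoticed.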
HUGE 👎 in testing this.
def test_error_in_close_memory_resource(ipc_memory_resource):
    """Test that errors when closing a memory resource are raised."""
    mr = ipc_memory_resource
    driver.cuMemPoolDestroy(mr.handle)
    with pytest.raises(CUDAError, match=".*CUDA_ERROR_INVALID_VALUE.*"):
        mr.close()
This is illegal and I disagree that we need to test this. This is Python, and we can't possibly guard against every bizarre way of mutating the state of our Python objects behind our back. In particular, as noted in both #1074 (comment) and offline discussion, errors like CUDA_ERROR_INVALID_VALUE are due to multiple frees. I thought we had moved on?
This test is just another instance of the same class of errors: we free the handle of an object through a direct C API call, bypassing our safeguard mechanism (under the hood we check whether the handle is already null before freeing, and after the free we set the handle to null to avoid a double free), so the handle is freed a second time when our destructor kicks in, either through an explicit close() call or implicitly when the object goes out of scope.
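The safeguard described here can be sketched roughly as follows. This is an illustration only: `FakeDriver` and `Resource` are hypothetical stand-ins, not the real CUDA bindings or the project's classes, and the error text is invented for the example.

```python
class FakeDriver:
    """Hypothetical stand-in for a low-level C API that tracks live handles."""

    def __init__(self):
        self._live = set()
        self._next_handle = 1

    def alloc(self):
        handle = self._next_handle
        self._next_handle += 1
        self._live.add(handle)
        return handle

    def free(self, handle):
        # Freeing an unknown or already-freed handle is an error,
        # loosely mimicking an "invalid value" result from a driver.
        if handle not in self._live:
            raise RuntimeError("INVALID_VALUE: handle is not live")
        self._live.remove(handle)


class Resource:
    """Hypothetical wrapper that guards against double free."""

    def __init__(self, driver):
        self._driver = driver
        self.handle = driver.alloc()

    def close(self):
        if self.handle is None:  # already closed: nothing to do
            return
        self._driver.free(self.handle)
        self.handle = None  # mark as freed so a second close() is a no-op


driver = FakeDriver()

# Going through the wrapper only: the second close() is harmless.
safe = Resource(driver)
safe.close()
safe.close()

# Freeing through the low-level API bypasses the guard; the wrapper still
# believes the handle is live, so close() frees it a second time.
bypassed = Resource(driver)
driver.free(bypassed.handle)
try:
    bypassed.close()
    outcome = "no error"
except RuntimeError as exc:
    outcome = str(exc)
```

The guard protects only the wrapper's own view of the handle; it has no way to learn that the handle was released through another API layer.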
I agree that this test is asserting that different levels of APIs offer the same guarantees, which would make developing layers of APIs really really difficult.
It seems roughly analogous to calling into the Python C API through ctypes, and expecting Python to somehow know you didn't mean to cause a segmentation violation:
❯ python3.13 -q
>>> x = 1
>>> import ctypes
>>> ctypes.pythonapi.Py_DecRef(x)
zsh: segmentation fault (core dumped) python3.13 -q
I would guess there's probably another way to write this test such that the behavior is buggy without crossing into the C abyss of naked bindings.
Would it be enough to just call close() twice? That seems like something we should perhaps be robust to if we're not already:
❯ python -q
>>> f = open('/tmp/x', 'w')
>>> f.close()
>>> f.close()
driver.cuMemFree(buffer.handle)
with pytest.raises(CUDAError, match=".*CUDA_ERROR_INVALID_VALUE.*"):
    buffer.close()
ditto
try:
    driver.cuMemPoolDestroy(self.mr.handle)
except Exception:  # noqa: S110
    pass
else:
    self.mr.close()
ditto
FWIW cccl-runtime, the C++ standard library, and any other high-level framework or library have the same issue. Just because we give you access to the underlying handle of a container does not mean that you can free it through
Errors occurring during Buffer.close are not raised. This change adds tests demonstrating the issue. See #1118.