Merged
20 changes: 13 additions & 7 deletions cuda_bindings/cuda/bindings/cyruntime.pyx.in
@@ -1917,13 +1917,19 @@ cdef cudaError_t getLocalRuntimeVersion(int* runtimeVersion) except ?cudaErrorCa
     cdef cudaError_t err = cudaSuccess
     err = (<cudaError_t (*)(int*) except ?cudaErrorCallRequiresNewerDriver nogil> __cudaRuntimeGetVersion)(runtimeVersion)
-
-    # Unload
-    {{if 'Windows' == platform.system()}}
-    windll.FreeLibrary(handle)
-    {{else}}
-    dlfcn.dlclose(handle)
-    {{endif}}
+    # We explicitly do *NOT* cleanup the library handle here, acknowledging
+    # that, yes, the handle leaks. The reason is that there's a
+    # `functools.cache` on the top-level caller of this function.
+    #
+    # This means this library would be opened once and then immediately closed,
+    # all the while remaining in the cache lurking there for people to call.
+    #
+    # Since we open the library one time (technically once per unique library name),
+    # there's not a ton of leakage, which we deem acceptable for the 1000x speedup
+    # achieved by caching (ultimately) `ctypes.CDLL` calls.
+    #
+    # Long(er)-term we can explore cleaning up the library using higher-level
+    # Python mechanisms, like `__del__` or `weakref.finalizer`s.
Collaborator

Copying my comment from the private PR:

Could you help me understand the context more? Where is the functools.cache?

I'm thinking it wouldn't be difficult to change this function to do the caching of the result/error right here.

Contributor Author

> Where is the functools.cache?

https://github.com/nvidia/cuda-python/blob/main/cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py?plain=1#L54

> I'm thinking it wouldn't be difficult to change this function to do the caching of the result/error right here.

The caching isn't done for the function's result.

It's that this function loads the library and then closes it, which invalidates the library handle that `functools.cache` (decorating `load_nvidia_dynamic_lib`) keeps returning. The pointer itself still looks valid, but the symbol table (at least in the ELF loader) contains NULL pointers that are eventually dereferenced during a subsequent `dlsym` call with that (now invalid) handle.
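A minimal, self-contained Python sketch of the failure mode described above. It does not call `dlopen`: the hypothetical `FakeHandle` class stands in for a library handle, the hypothetical `load_lib` stands in for a cached loader such as `load_nvidia_dynamic_lib`, and `get_version_and_close` mirrors the pre-fix load/use/unload sequence. It shows that `functools.cache` opens each library name once and hands every caller the same handle object, so one caller closing it leaves a dead handle in the cache for everyone else.

```python
import functools


class FakeHandle:
    """Hypothetical stand-in for a dlopen/LoadLibrary handle."""

    opens = 0  # how many times any library was "opened"

    def __init__(self, name):
        FakeHandle.opens += 1
        self.name = name
        self.closed = False

    def close(self):
        # Plays the role of dlclose / FreeLibrary.
        self.closed = True

    def lookup(self, symbol):
        # Plays the role of dlsym. Using a real handle after dlclose is not
        # safe (it may crash), so this sketch raises instead.
        if self.closed:
            raise RuntimeError(f"lookup of {symbol!r} on closed handle {self.name!r}")
        return f"{self.name}:{symbol}"


@functools.cache
def load_lib(name):
    # First call per name creates the handle; later calls return the same object.
    return FakeHandle(name)


def get_version_and_close():
    # Mirrors the old behavior: load, look up a symbol, then unload.
    handle = load_lib("cudart")
    symbol = handle.lookup("cudaRuntimeGetVersion")
    handle.close()  # closes the handle that the cache still owns
    return symbol


first = get_version_and_close()
try:
    get_version_and_close()  # reuses the closed, cached handle
except RuntimeError as exc:
    second_error = str(exc)
else:
    second_error = None
```

With real handles the second lookup is not a clean exception: it can dereference NULL inside the loader and take down the interpreter, which is why the updated test calls `getLocalRuntimeVersion()` repeatedly.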

Collaborator

Reposting my response as well:

Ah ... thanks. Could you please give me a moment to think about this?

I didn't realize that the caching implies: never close the handle. That's not good.

Contributor Author

We can keep the discussion here to avoid duplicating on cuda-python-private.

Contributor Author

I spent most of the day going down this rabbit hole, so I'm happy to talk it through IRL if that helps.

Collaborator

Assuming you need to get this issue out of the way asap:

WDYT about:

Comment out the code here (but don't delete for easy reference).

Add this comment:

# Skip closing handle until https://github.com/NVIDIA/cuda-python/issues/1011 is resolved.

Contributor Author

> Comment out the code here (but don't delete for easy reference).

Not really a huge fan of that in general.

We have git history if someone really needs the exact code.

Contributor Author

> Assuming you need to get this issue out of the way asap:

There's no rush here since 3.13t is experimental and 3.14 is still an RC.

If you have a solution you want to explore, have at it!

Collaborator

I think we will not have a solution overnight. For the moment I'd just do this:

# Currently pathfinder does not support closing the handle.
# See https://github.com/NVIDIA/cuda-python/issues/1011 for background.

It's fine to delete the code for closing the handle entirely. From what I learned yesterday afternoon, the code here will have to change for sure, if we decide to support closing the handles.

Contributor Author

I don't follow what problem remains with the `weakref.finalize` solution. I'm not saying it's without problems, but I currently don't see how it fails to solve the problem of closing a CDLL-opened library at the right time.
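One possible reading of the `weakref.finalize` idea, sketched under assumptions: instead of a raw handle, callers share an owner object, and a finalizer registered on that owner closes the handle only once the owner is unreachable (or when explicitly closed), so no caller closes a handle someone else still uses. `LibraryOwner` and `close_handle` are hypothetical names, and the integer handles are placeholders; in real code the close step would be `dlclose` or `FreeLibrary`. This illustrates the lifetime rule only, not how pathfinder manages handles.

```python
import gc
import weakref

closed = []  # records which (hypothetical) handles were closed


def close_handle(handle):
    # Hypothetical stand-in for dlclose(handle) / FreeLibrary(handle).
    # It must not reference the owner, or the owner could never be collected.
    closed.append(handle)


class LibraryOwner:
    """Hypothetical wrapper that owns a raw library handle."""

    def __init__(self, handle):
        self.handle = handle
        # The finalizer keeps only the raw handle, never `self`.
        self._finalizer = weakref.finalize(self, close_handle, handle)

    def close(self):
        # Explicit close. A finalize object runs at most once, so later
        # calls (or garbage collection) will not close the handle again.
        self._finalizer()


owner = LibraryOwner(0x1234)
finalizer = owner._finalizer
alive_while_owned = finalizer.alive  # True: the owner is still reachable

del owner     # drop the last strong reference
gc.collect()  # CPython already finalized on refcount zero; this covers other cases

explicit = LibraryOwner(0x5678)
explicit.close()  # closes now
explicit.close()  # no-op: the finalizer is already dead
```

One design consequence: if a module-level `functools.cache` holds the owner, the owner stays reachable for the life of the process, so its finalizer runs only at interpreter exit (a still-alive `weakref.finalize` is called at exit unless its `atexit` attribute is set to false).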


     # Return
     return err
 {{endif}}
16 changes: 9 additions & 7 deletions cuda_bindings/tests/test_cudart.py
@@ -1404,10 +1404,12 @@ def test_struct_pointer_comparison(target):


 def test_getLocalRuntimeVersion():
-    try:
-        err, version = cudart.getLocalRuntimeVersion()
-    except pathfinder.DynamicLibNotFoundError:
-        pytest.skip("cudart dynamic lib not available")
-    else:
-        assertSuccess(err)
-        assert version >= 12000  # CUDA 12.0
+    # verify that successive calls do not segfault the interpreter
+    for _ in range(10):
+        try:
+            err, version = cudart.getLocalRuntimeVersion()
+        except pathfinder.DynamicLibNotFoundError:
+            pytest.skip("cudart dynamic lib not available")
+        else:
+            assertSuccess(err)
+            assert version >= 12000  # CUDA 12.0