
[FEA]: Update the memory module to use __dealloc__. #1063

@Andy-Jost

Area

cuda.core

Is your feature request related to a problem? Please describe.

I would like abnormal process shutdown to complete gracefully. While testing memory IPC, we observed that a killed child process sometimes appears to invoke Buffer.__del__ after the CUDA context has been destroyed (see the Additional context section for the error message). This has resulted in hard errors, including floating-point exceptions and segmentation faults.

Describe the solution you'd like

Move the cleanup logic in Buffer.__del__ (and similar methods in the memory module) into __dealloc__.

Describe alternatives you've considered

Solutions based on sys.is_finalizing() appear to be unreliable.
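
For reference, a guard of this kind usually has the shape sketched below. This is a minimal illustration, not the cuda.core code: the class `GuardedResource` and its `release` callback are hypothetical names. Only `sys.is_finalizing()` is a real standard-library call. The issue does not say why the approach is unreliable. One possible reason (an assumption) is that `sys.is_finalizing()` reports only the interpreter's own shutdown state and says nothing about whether the CUDA context is still valid.

```python
import sys


class GuardedResource:
    """Hypothetical resource wrapper; NOT the cuda.core Buffer."""

    def __init__(self, release):
        self._release = release  # callable that frees the underlying resource
        self._closed = False

    def close(self):
        # Idempotent explicit cleanup.
        if not self._closed:
            self._closed = True
            self._release()

    def __del__(self):
        # Skip cleanup once the interpreter reports that it is shutting down.
        # This check knows nothing about the state of external resources.
        if sys.is_finalizing():
            return
        self.close()


released = []
res = GuardedResource(lambda: released.append("freed"))
del res  # in CPython the reference count drops to zero and __del__ runs here
```

During normal operation the finalizer still releases the resource. The guard changes behavior only once interpreter shutdown has begun.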

Additional context

The following messages were output from a killed child process while running cuda_core/tests/memory_ipc tests:

	/opt/hostedtoolcache/Python/3.12.11/x64/lib/python3.12/site-packages/_pytest/unraisableexception.py:67: PytestUnraisableExceptionWarning: Exception ignored in: 'cuda.core.experimental._memory.Buffer.__del__'
	
	Traceback (most recent call last):
	  File "/opt/hostedtoolcache/Python/3.12.11/x64/lib/python3.12/site-packages/cuda/core/experimental/_device.py", line 991, in __new__
	    devices = _tls.devices
	              ^^^^^^^^^^^^
	AttributeError: '_thread._local' object has no attribute 'devices'
	
	During handling of the above exception, another exception occurred:
	
	Traceback (most recent call last):
	  File "cuda/core/experimental/_memory.pyx", line 82, in cuda.core.experimental._memory.Buffer._shutdown_safe_close
	    self._mr.deallocate(self._ptr, self._size, stream)
	  File "cuda/core/experimental/_memory.pyx", line 801, in cuda.core.experimental._memory.DeviceMemoryResource.deallocate
	    raise_if_driver_error(err)
	  File "cuda/core/experimental/_utils/cuda_utils.pyx", line 69, in cuda.core.experimental._utils.cuda_utils._check_driver_error
	    raise CUDAError(f"{name}: {expl}")
	cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using ::cuCtxFromGreenCtx API.

Metadata

Labels

P0 (High priority - Must do!), cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements)
