Closed
Description
Hi, I noticed a potential issue in nvmath/internal/utils.py at line 230, in the function get_memory_limit_from_device_id.
Currently, the function looks like this:
def get_memory_limit_from_device_id(memory_limit: int | float | str, device_id: int) -> int:
    with device_ctx(device_id):
        status, _, total_memory = cbr.cudaMemGetInfo()
    return _get_memory_limit(memory_limit, total_memory)
The problem is that the returned status is never checked. In some cases (e.g., with an incompatible version of cuda-bindings), I got status=35 with total_memory=None. I resolved this on my side by switching to a compatible cuda-bindings version, but it seems safer for the library to handle non-zero statuses explicitly.
I suggest adding error handling for status, for example:
def get_memory_limit_from_device_id(memory_limit: int | float | str, device_id: int) -> int:
    with device_ctx(device_id):
        status, _, total_memory = cbr.cudaMemGetInfo()
        if status != 0 or total_memory is None:
            raise RuntimeError(
                f"cudaMemGetInfo failed with status {status}, total_memory={total_memory}"
            )
    return _get_memory_limit(memory_limit, total_memory)
This way, users will get a clear exception instead of unexpected None values being passed to _get_memory_limit.
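To show the behavior without needing a GPU, here is a minimal, self-contained sketch of the same check. The helper name total_memory_or_raise and the two fake_* callables are hypothetical and exist only for illustration; they stand in for cbr.cudaMemGetInfo and are not part of nvmath or cuda-bindings. The sketch assumes the status is an int or an IntEnum-like value, so int(status) is safe.

```python
from typing import Callable, Optional, Tuple

# (status, free_bytes, total_bytes), mirroring the tuple shape used in the issue.
MemInfo = Tuple[int, Optional[int], Optional[int]]

CUDA_SUCCESS = 0  # numeric value of a successful status


def total_memory_or_raise(mem_get_info: Callable[[], MemInfo]) -> int:
    """Hypothetical helper: return total memory or raise a clear error."""
    status, _, total_memory = mem_get_info()
    code = int(status)  # assumption: status is an int or IntEnum-like value
    if code != CUDA_SUCCESS or total_memory is None:
        raise RuntimeError(
            f"cudaMemGetInfo failed with status {code}, total_memory={total_memory}"
        )
    return total_memory


def fake_ok() -> MemInfo:
    # Stand-in for a successful query: 1 GiB free, 8 GiB total.
    return (0, 1 << 30, 8 << 30)


def fake_fail() -> MemInfo:
    # Stand-in for the failure described above: status 35, no memory values.
    return (35, None, None)


if __name__ == "__main__":
    print(total_memory_or_raise(fake_ok))
    try:
        total_memory_or_raise(fake_fail)
    except RuntimeError as exc:
        print(exc)
```

With the check in place, the failing case stops at the query with a message naming the status, rather than surfacing later as a confusing error inside _get_memory_limit.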