Missing error handling for status in get_memory_limit_from_device_id

Hi, I noticed a potential issue in [nvmath/internal/utils.py](https://github.com/NVIDIA/nvmath-python/blob/main/nvmath/internal/utils.py) at line 230, in the function `get_memory_limit_from_device_id`.
Currently, the function looks like this:
```python
def get_memory_limit_from_device_id(memory_limit: int | float | str, device_id: int) -> int:
    with device_ctx(device_id):
        status, _, total_memory = cbr.cudaMemGetInfo()
        return _get_memory_limit(memory_limit, total_memory)
```
The problem is that the returned `status` is never checked. In some cases (e.g., depending on the installed version of cuda-bindings), I got `status=35` with `total_memory=None`. I resolved this on my side by switching to a compatible cuda-bindings version, but it seems safer for the library to handle non-zero statuses explicitly.

I suggest adding error handling for status, for example:
```python
def get_memory_limit_from_device_id(memory_limit: int | float | str, device_id: int) -> int:
    with device_ctx(device_id):
        status, _, total_memory = cbr.cudaMemGetInfo()
        if status != 0 or total_memory is None:
            raise RuntimeError(
                f"cudaMemGetInfo failed with status {status}, total_memory={total_memory}"
            )
        return _get_memory_limit(memory_limit, total_memory)
```
This way, users get a clear exception at the point of failure instead of an unexpected `None` being passed on to `_get_memory_limit`.
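
To illustrate the check without needing a GPU, here is a minimal, self-contained sketch. Everything in it is a stand-in: `FakeCudaError`, `check_mem_info`, and the two `fake_mem_get_info_*` functions are hypothetical names made up for this example and are not part of nvmath-python or cuda-bindings. It only assumes that the call returns a `(status, free, total)` tuple, as in the snippet above.

```python
from enum import IntEnum


class FakeCudaError(IntEnum):
    """Hypothetical stand-in for the runtime's status enum (not the real one)."""

    SUCCESS = 0
    STATUS_35 = 35  # the non-zero status observed in this report


def check_mem_info(result):
    """Validate a (status, free, total) tuple and return total memory in bytes."""
    status, _free, total = result
    if status != 0 or total is None:
        # Fail early with the raw status code instead of passing None along.
        raise RuntimeError(
            f"cudaMemGetInfo failed with status {int(status)}, total_memory={total}"
        )
    return total


def fake_mem_get_info_ok():
    # Pretend the device reports 8 GiB free out of 16 GiB total.
    return FakeCudaError.SUCCESS, 8 * 2**30, 16 * 2**30


def fake_mem_get_info_broken():
    # Mirrors the failure described above: non-zero status, no memory values.
    return FakeCudaError.STATUS_35, None, None


if __name__ == "__main__":
    print(check_mem_info(fake_mem_get_info_ok()))
    try:
        check_mem_info(fake_mem_get_info_broken())
    except RuntimeError as exc:
        print(exc)
```

With the check in place, the broken case raises a `RuntimeError` that names the status code, rather than failing later inside whatever consumes `total_memory`.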
