
Re-wrap most of the enums in cuda.bindings.nvml for cuda.core.system (#2014)

Merged
mdboom merged 12 commits into NVIDIA:main from mdboom:re-expose-enums
May 5, 2026

Conversation

Contributor

@mdboom mdboom commented May 4, 2026

This rewraps most of the enums (except the extremely large and unorganized FieldId) for cuda.core.system rather than passing them directly through. This also creates an optional string interface for all of these enum values.
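As a rough illustration of the dual enum-or-string interface described above (the enum name and members here are made up for illustration, not taken from the PR; on Python 3.10 a `str` mixin approximates the `StrEnum` the PR uses):

```python
from enum import Enum

class ClockId(str, Enum):  # str mixin stands in for StrEnum on 3.10
    CURRENT = "current"
    CUSTOMER_BOOST_MAX = "customer_boost_max"

def normalize(value) -> ClockId:
    """Accept either a ClockId member or its string value."""
    return ClockId(value)

assert normalize("current") is ClockId.CURRENT
assert normalize(ClockId.CURRENT) is ClockId.CURRENT
# String-flavored: members compare equal to their string values.
assert ClockId.CURRENT == "current"
```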

@mdboom mdboom requested review from leofang and rwgk May 4, 2026 17:25
@github-actions bot added the cuda.core label (Everything related to the cuda.core module) May 4, 2026
@mdboom mdboom self-assigned this May 4, 2026
@mdboom mdboom added this to the cuda.core v1.0.0 milestone May 4, 2026
@mdboom mdboom added the P0 label (High priority - Must do!) May 4, 2026

Contributor

rwgk commented May 4, 2026

PR 2014 Agent Review

  • Size: 16 files, +800 / -191
  • Reviewer: Claude Opus 4.7 (1M context, Max thinking) — initial review

With small manual edits: I reduced the Compatibility subsection. I deleted the Next steps section entirely. The rest looks useful to me.

The "Sync guard (structural gap)" finding wasn't in the initial automatic review; I asked about it in a follow-on prompt. It seems pretty easy to at least add basic protections.


High-level summary (asked for by reviewer)

This PR is the implementation of issue
#1995 ("Replace string
literals with enums in public API"), which is itself a sub-item of the
#1919 "Audit cuda.core
API for 1.0 release" epic. The motivation, captured in the issue thread
(especially the 2026-05-01 comment), is the cuda.core 1.0 milestone (#43, due
2026-05-07): once 1.0 ships, the team is locked into a multi-year support
window, so any awkward-to-fix public API surface must be cleaned up now. The
decision was to:

  • For "logically enum-like" things, accept either an enum or a string
    everywhere, and return a string-flavored enum (StrEnum).
  • For NVML-derived enums (currently _FastEnum ints), wrap them in
    cuda.core.system with hand-curated StrEnums so users see a uniform
    Enum | str interface and don't get C-style names like
    EVENT_TYPE_XID_CRITICAL_ERROR or TEMPERATURE_THRESHOLD_SHUTDOWN leaking
    into the public API.

The PR rewraps most NVML enums (except FieldId, intentionally) into
StrEnum types, adds _<NAME>_MAPPING / _<NAME>_INV_MAPPING dicts for
round-tripping, accepts Enum | str everywhere, and cleans up a few adjacent
things while the file is open:

  • Pstates → plain int (0..15) with None for unknown.
  • PciInfo.get_throughput(counter) → rx_throughput / tx_throughput
    properties.
  • device.brand → free-form str (with "Unknown" fallback).
  • NvLink version → (major, minor) tuple.

It also depends on backports.strenum for Python 3.10 (cuda_core supports
3.10+).
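The forward/inverse mapping pattern described above might look roughly like this; the `nvml` values and all names below are placeholders for illustration, not the actual binding:

```python
from enum import Enum
from types import SimpleNamespace

# Stand-in for the nvml binding's integer enum values (assumed, illustrative).
nvml = SimpleNamespace(CLOCK_ID_CURRENT=0, CLOCK_ID_CUSTOMER_BOOST_MAX=4)

class ClockId(str, Enum):
    CURRENT = "current"
    CUSTOMER_BOOST_MAX = "customer_boost_max"

# Forward: driver value -> public StrEnum; inverse: public -> driver value.
_CLOCK_ID_MAPPING = {
    nvml.CLOCK_ID_CURRENT: ClockId.CURRENT,
    nvml.CLOCK_ID_CUSTOMER_BOOST_MAX: ClockId.CUSTOMER_BOOST_MAX,
}
_CLOCK_ID_INV_MAPPING = {v: k for k, v in _CLOCK_ID_MAPPING.items()}

# Round trip: driver value -> StrEnum -> driver value.
assert _CLOCK_ID_INV_MAPPING[_CLOCK_ID_MAPPING[4]] == 4
```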

The direction is right and consistent with the design agreed in #1995.
However, the PR has several concrete bugs, a doc-build risk, a structural gap
(no mechanism to keep the wrappers in sync with the bindings — see "Sync
guard" below), and a few breaking-change footnotes that are not currently
called out in the PR body.

Findings (most severe first)

Bugs

Bug — test_nvlink references a removed symbol.
cuda_core/tests/system/test_system_device.py:759 still does
assert isinstance(version, system.NvlinkVersion). NvlinkVersion is no
longer in the system namespace (the __all__ change in _device.pyx
dropped it; _nvlink.pxi now returns a plain tuple[int, int]). This test
will fail with AttributeError: module 'cuda.core.system' has no attribute 'NvlinkVersion' on any host with NVLink. Replace with
assert isinstance(version, tuple) plus length / element checks.

Bug — supported_pstates can iterate beyond the valid range.
cuda_core/cuda/core/system/_device.pyx:1001 walks
nvml.device_get_supported_performance_states(...) and only filters out
PSTATE_UNKNOWN (= 32). The NVML header says unused trailing slots are
PSTATE_UNKNOWN, but if the driver ever returned a value outside 0..15
(e.g., a future PSTATE_*), _pstate_to_int would silently return
int(x) - 0, producing an out-of-contract integer. Either drop the value or
raise. The same risk applies to device.performance_state
(cuda_core/cuda/core/system/_device.pyx:971).

Bug — _pstate_to_enum name is wrong; takes an int and returns an int.
cuda_core/cuda/core/system/_device.pyx:29 is named _pstate_to_enum but
the body is
return int(pstate) + int(nvml.Pstates.PSTATE_0). It just shifts an int, and
since PSTATE_0 = 0 it's literally the identity for valid input. The
misnomer is confusing; rename to _int_to_pstate (or inline). The cast back
through int(...) is also redundant.

Bug — error messages report bit index, not the failing bit value.
In current_clock_event_reasons / supported_clock_event_reasons
(cuda_core/cuda/core/system/_device.pyx:670 and
cuda_core/cuda/core/system/_device.pyx:691), in CoolerInfo.target
(cuda_core/cuda/core/system/_cooler.pxi:76), and in
get_supported_event_types
(cuda_core/cuda/core/system/_device.pyx:811), the error path is:

for reason in _unpack_bitmask(reasons):       # reason is a bit *index* (0,1,2,...)
    try:
        output_reason = _CLOCKS_EVENT_REASONS_MAPPING[1 << reason]  # bit *value*
    except KeyError:
        raise ValueError(f"Unknown clock event reason bit: {reason}")  # reports index

The lookup uses 1 << reason but the error message reports reason. If a
future driver introduces a new bit, the user will see a confusing message
("bit: 9" when the real unmapped value is 0x200). Use 1 << reason in the
message and also include the bit index for context.
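A corrected error path, sketched with a toy mapping (the mapping contents and bit values are illustrative, not the PR's):

```python
def _unpack_bitmask(mask):
    """Yield the index of each set bit (sketch of the helper described above)."""
    i = 0
    while mask:
        if mask & 1:
            yield i
        mask >>= 1
        i += 1

_CLOCKS_EVENT_REASONS_MAPPING = {0x1: "gpu_idle"}  # illustrative subset

def decode_reasons(reasons):
    out = []
    for bit_index in _unpack_bitmask(reasons):
        bit_value = 1 << bit_index
        try:
            out.append(_CLOCKS_EVENT_REASONS_MAPPING[bit_value])
        except KeyError:
            # Report the actual unmapped bit *value*, with the index for context.
            raise ValueError(
                f"Unknown clock event reason bit value {bit_value:#x}"
                f" (bit index {bit_index})"
            )
    return out
```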

Bug — device.brand silently downgrades several brands to "Unknown".
_BRAND_TYPE_MAPPING (cuda_core/cuda/core/system/_device.pyx:95) is
missing BRAND_QUADRO_RTX, BRAND_NVIDIA_RTX, and BRAND_NVIDIA (all
defined in nvml.BrandType). The old code returned a typed BrandType enum
that could be matched against any of those; the new code uses
.get(..., "Unknown") so a Quadro RTX or "NVIDIA"-branded card will now
report as "Unknown" instead of its real brand. Either add the missing
entries or fall through to nvml.BrandType(brand).name.

Bug — pre-existing Device.__new__() call in EventData.device.
cuda_core/cuda/core/system/_event.pxi:64 reads
device = Device.__new__() (no cls argument). This is a pre-existing bug,
not introduced by this PR, but the PR is the right time to fix it because the
file is being touched. It will raise
TypeError: object.__new__(): not enough arguments the first time
event.device is accessed.

Sync guard (structural gap)

There is no mechanism today, and none added by this PR, to ensure the
cuda_core enum wrappers stay in sync with the underlying cuda_bindings
NVML enums. I checked for tests over __members__, code-gen hooks, CI
diff-checks, and runtime self-checks; none exist. The only thing close is the
runtime fallback pattern this PR introduces — _<NAME>_MAPPING.get(value, default) for the inbound direction and try: ... except KeyError: raise ValueError(...) for the outbound direction. That catches missing mappings
at the point of first encounter (a property access on a real device), but
only when the NVML driver actually returns the new value on the test host —
which CI labs don't control. Several of the bugs above (the missing
BRAND_* entries, the dropped THERMAL_GPU_RELATED) survived precisely
because nothing else flags them.

This is a real risk for cuda.core 1.0:

  • cuda_bindings is auto-generated from the NVML header. The header at the
    top of cuda_bindings/cuda/bindings/_internal/_fast_enum.py confirms it is
    generated, and the nvml.pyx comment says
    "automatically generated across versions from 12.9.1 to 13.2.0". NVML
    enums grow on every CUDA toolkit refresh.
  • This PR turns the wrappers into hand-curated StrEnums. So every NVML
    release will silently widen the binding's _FastEnum, while the
    cuda.core wrapper stays stuck at whatever was current when the wrapper
    was written. The fallback .get() pattern then either reports None /
    "Unknown", or — for outbound (user → driver) — raises ValueError for a
    value that NVML actually supports.
  • The repo already has a closed bug
    (#1712) for a similar
    problem (the explanation dicts going out of sync between cuda_core and
    cuda_bindings), and a known related bug
    (#1663) about Cython
    type-redefinition. So this category of drift is a known-recurring footgun
    in the codebase.

What a guard could look like, in rough order of cost vs. coverage:

  1. Cheapest — a single parametrized test in
    cuda_core/tests/system/ that imports both cuda.bindings.nvml and
    cuda.core.system, and for each wrapper asserts that every NVML member
    has a corresponding entry in _<NAME>_MAPPING (or is on a documented
    allow-list). Sketch:

    import pytest
    from cuda.bindings import nvml
    from cuda.core.system import _device
    
    WRAPPER_TO_BINDING = [
        (_device._ADDRESSING_MODE_MAPPING, nvml.DeviceAddressingModeType,
         {"DEVICE_ADDRESSING_MODE_NONE"}),
        (_device._AFFINITY_SCOPE_MAPPING, nvml.AffinityScope, set()),
        (_device._GPU_TOPOLOGY_LEVEL_MAPPING, nvml.GpuTopologyLevel, set()),
        (_device._EVENT_TYPE_MAPPING, nvml.EventType, set()),
        (_device._BRAND_TYPE_MAPPING, nvml.BrandType, {"BRAND_COUNT"}),
        # ... one entry per wrapper
    ]
    
    @pytest.mark.parametrize("mapping, binding, intentionally_unmapped",
                             WRAPPER_TO_BINDING)
    def test_wrapper_covers_all_binding_members(mapping, binding,
                                                intentionally_unmapped):
        binding_keys = set(binding.__members__) - intentionally_unmapped
        mapped_keys = (
            {m.name for m in mapping.keys() if isinstance(m, binding)}
            | {m.name for m in mapping.values() if isinstance(m, binding)}
        )
        missing = binding_keys - mapped_keys
        assert not missing, (
            f"{binding.__name__} is missing wrapper entries for: {missing}"
        )

    The intentionally_unmapped set is the explicit allow-list (e.g. *_COUNT
    sentinels, deprecated APP_CLOCK_*, the typo'd
    P2P_STATUS_CHIPSET_NOT_SUPPORED, etc.). When NVML adds a new member,
    this test fails on every CI host (no GPU required), and a maintainer
    either adds a wrapper entry or extends the allow-list with a comment
    explaining why.

  2. Medium — a small import-time _validate_mappings() behind
    if __debug__: or behind a CUDA_PYTHON_VALIDATE_ENUMS=1 env var. Same
    idea as option 1, but lives next to the mappings so the failure mode is
    ImportError on cuda.core.system. I'd lean against this for a 1.0
    library — too noisy at import — but it's an option.

  3. Heavier — code-gen the mappings. Since cuda_bindings is already
    generator-driven, the _<NAME>_MAPPING dicts could be too: feed the same
    NVML header → emit a _generated_mappings.pxi with the forward dict, plus
    a side file listing "human-curated" StrEnum names that humans then
    maintain. The generator emits a placeholder # TODO map <new_member>
    comment when a new NVML member appears, which fails CI via a regex check.
    This is the strongest guarantee but is a much bigger change.

  4. Adjacent — pin a binding floor. cuda_core already has
    cuda-bindings[all]==12.* / 13.* in extras; if the wrappers target a
    specific NVML enum surface, pin cuda-bindings>=X,<Y so a user who
    mixes cuda-core 1.0.0 with a newer cuda-bindings gets a clean failure
    rather than silent "Unknown" / ValueError. (Issue
    #1715 was exactly the
    inverse problem — cuda.core demanding an unreleased bindings version —
    so the team already has scar tissue here.)

Given the milestone is May 7 and this review already turned up missing brand
entries and a missing cooler target, I'd push for option #1 as a blocker
for 1.0: it's ~80 lines, table-driven, no GPU required, and the allow-list
doubles as documentation for why certain NVML members are deliberately not
surfaced (deprecated / sentinel / typo / composite bitmask). I'm happy to
draft that test against the current branch so it's ready to drop in on top of
this PR.

Behavior / compatibility

Behavior — get_supported_event_types includes EventType.NONE mapping,
but the bitmask path can never produce it.

EventType.NONE = "none" is added to the wrapper but the bitmask path can
never produce it (no bit is set). _EVENT_TYPE_MAPPING includes
nvml.EventType.NONE → EventType.NONE only because it is needed for the
EventData.event_type property. That's fine, but consider documenting that
EventType.NONE is reserved for "no event" and isn't a registrable type.
Currently device.register_events([EventType.NONE]) is a silent no-op
(bitmask stays 0), which is surprising; consider raising.
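If raising were preferred, the outbound path could reject the sentinel explicitly. A toy sketch, with member names and the bit table assumed for illustration (not the PR's code):

```python
from enum import Enum

class EventType(str, Enum):  # member names assumed from the review text
    NONE = "none"
    XID_CRITICAL_ERROR = "xid_critical_error"

_EVENT_TYPE_BITS = {EventType.XID_CRITICAL_ERROR: 1 << 3}  # hypothetical bit

def _events_to_bitmask(events):
    mask = 0
    for e in events:
        e = EventType(e)  # accepts enum member or string
        if e is EventType.NONE:
            raise ValueError("EventType.NONE is not a registrable event type")
        mask |= _EVENT_TYPE_BITS[e]
    return mask
```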

Compatibility — Call out that this PR is a breaking change in the PR description /
release notes. The breaking label is missing.

Behavior — device.brand: BrandType → str (also see the bug above).
Switching to a free-form str (with "Unknown" fallback) means callers
can't reliably enumerate brands or do == against a name they didn't see in
CI. Was a Brand StrEnum considered and rejected? If yes, mention it in
the PR body so reviewers don't ask again.

Behavior — NvLink.version: NvlinkVersion → tuple[int, int]. This is a
clean improvement, but it's not in the issue #1995 scope. It would be good to
call out in the PR description that it's an additional change so it doesn't
surprise users.

Behavior — PciInfo.get_throughput(counter) → rx_throughput / tx_throughput properties.
Same — this is good
cleanup, but it changes the public API shape and isn't in the PR title or
description. Worth noting.

Docs

Docs — three private-doc enums still listed as cyclasses.
cuda_core/docs/source/api_private.rst:74 keeps
system._device.GpuP2PCapsIndex, system._device.GpuP2PStatus, and
system._device.GpuTopologyLevel under
:template: autosummary/cyclass.rst. After this PR they are pure-Python
StrEnums, not Cython classes. They should be moved to the lower section
(cuda_core/docs/source/api_private.rst:92+) alongside AddressingMode,
AffinityScope, etc., or rendered with the default template; otherwise the
docs build is likely to warn or render incorrectly.

Docs — Device.performance_state documentation feels split.
cuda_core/cuda/core/system/_device.pyx:957 returns int | None. The
current doc and runtime contract say "0 is highest, 15 is lowest, None if
unknown". That's reasonable, but dynamic_pstates_info and
register_events([EventType.PSTATE]) still use Pstate concepts; users who
read those docstrings have to mentally context-switch. Consider documenting
once on system.Device and cross-referencing.

Docs — stale references to old enum names.

  • cuda_core/cuda/core/system/_event.pxi:79,
    cuda_core/cuda/core/system/_event.pxi:92, and
    cuda_core/cuda/core/system/_event.pxi:105 still reference
    EventType.EVENT_TYPE_XID_CRITICAL_ERROR in docstrings. The new value is
    EventType.XID_CRITICAL_ERROR. The Sphinx links will fail (or worse,
    render as broken refs).
  • cuda_core/cuda/core/system/_system_events.pyx:168 example uses
    SystemEventType.SYSTEM_EVENT_TYPE_GPU_DRIVER_UNBIND. The new name is
    SystemEventType.UNBIND.
  • cuda_core/cuda/core/system/_event.pxi:95 (gpu_instance_id) and
    cuda_core/cuda/core/system/_event.pxi:108 (compute_instance_id)
    docstrings still use the old EVENT_TYPE_XID_CRITICAL_ERROR name.

Tests

Test — temperature-thresholds slice loses one threshold.
cuda_core/tests/system/test_system_device.py:680 is
for threshold in list(system.TemperatureThresholds)[:-1]:. Pre-PR, [:-1]
stripped TEMPERATURE_THRESHOLD_COUNT. Post-PR, the new StrEnum has 8 real
values and no sentinel; [:-1] now strips GPS_CURR instead. Drop the
[:-1]. Same audit for any other [:-1] over an enum in tests.
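For illustration, a toy version of the post-PR enum (member names assumed) shows why the [:-1] slice now drops a real member instead of a sentinel:

```python
from enum import Enum

class TemperatureThresholds(str, Enum):  # no COUNT sentinel post-PR
    SHUTDOWN = "shutdown"
    SLOWDOWN = "slowdown"
    GPS_CURR = "gps_curr"

# [:-1] silently skips GPS_CURR, a real threshold:
assert list(TemperatureThresholds)[:-1] == [
    TemperatureThresholds.SHUTDOWN,
    TemperatureThresholds.SLOWDOWN,
]
```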

Test — register_events test misses positive case for str input.
cuda_core/tests/system/test_system_device.py:265-286 was simplified, but it
only checks register_events([]), register_events(0) (now correctly
rejected), and a pre-existing typed-list path on the systems test. It would
be good to also assert that device.register_events("xid_critical_error")
and device.register_events([EventType.XID_CRITICAL_ERROR]) both work, since
the whole point of this PR is dual support.

Style / nits

Style — module-level docstring assignment is repeated and noisy.
The pattern

class ClockId(StrEnum):
    """..."""
    CURRENT = "current"
ClockId.CURRENT.__doc__ = "Current actual clock value."

is duplicated across 8+ enums. Per #1995, mdboom called this "ugly but
works." Consider a small helper, e.g.
_set_member_docs(ClockId, {"CURRENT": "...", "CUSTOMER_BOOST_MAX": "..."}),
both for readability and to make it easy to lift these into autodoc later.
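A possible shape for such a helper; _set_member_docs and the member docs below are assumptions for illustration, not code in the PR:

```python
from enum import Enum

def _set_member_docs(enum_cls, docs):
    """Attach per-member __doc__ strings; raises KeyError on unknown names."""
    for name, doc in docs.items():
        enum_cls[name].__doc__ = doc

class ClockId(str, Enum):
    CURRENT = "current"
    CUSTOMER_BOOST_MAX = "customer_boost_max"

_set_member_docs(ClockId, {
    "CURRENT": "Current actual clock value.",
    "CUSTOMER_BOOST_MAX": "Customer-defined maximum boost clock.",
})
```

A typo'd member name fails loudly at import time (KeyError), which is a small advantage over scattered `X.Y.__doc__ = ...` assignments.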

Style — import sys + version branch duplicated in two .pyx files.
cuda_core/cuda/core/system/_device.pyx:8 and
cuda_core/cuda/core/system/_system_events.pyx:8 have identical
if sys.version_info >= (3, 11): guards. With
cuda_core/pyproject.toml:22 requiring >=3.10, this is necessary. Consider
centralizing in a shared _compat.pxi (or in _device_utils.pxi) and
include-ing it from both, so future bumps to 3.11 are a single edit.

Style — inverse mappings constructed inconsistently.
_GPU_TOPOLOGY_LEVEL_INV_MAPPING, _EVENT_TYPE_INV_MAPPING,
_THERMAL_TARGET_INV_MAPPING, _SYSTEM_EVENT_TYPE_INV_MAPPING are all built
as {v: k for k, v in <forward>.items()}. Several other dicts (e.g.,
_CLOCK_ID_MAPPING, _GPU_P2P_CAPS_INDEX_MAPPING) are forward-only and used
only one way. Be explicit and consistent: either always have both, or comment
why the asymmetry exists.

Style — get_thermal_settings has a stranded TODO comment.
cuda_core/cuda/core/system/_temperature.pxi:257 has
# TODO: The above docstring is from the NVML header, but it doesn't seem to make sense.
after the docstring. It would be clearer in the docstring itself, or fixed
properly while you're here.

Misc — nvml.GpuP2PStatus typo handled but with redundant key.
_GPU_P2P_STATUS_MAPPING (cuda_core/cuda/core/system/_device.pyx:164) maps
both P2P_STATUS_CHIPSET_NOT_SUPPORED (typo) and
P2P_STATUS_CHIPSET_NOT_SUPPORTED. Per the binding, these are aliases (same
int value), so one of these dict entries silently overwrites the other at
construction time. That's harmless, but worth a comment so a future cleanup
doesn't accidentally drop the typo'd alias and break callers using older
NVML.

Misc — CoolerTarget.GPU_RELATED not exposed.
nvml.CoolerTarget.THERMAL_GPU_RELATED is a composite bitmask
(GPU | MEMORY | POWER_SUPPLY); the new CoolerTarget StrEnum drops it
entirely. If the underlying NVML field is set to THERMAL_GPU_RELATED (and
the device reports it as a single composite rather than three individual
bits), the new code may iterate three bits and produce a longer list than
before. That's probably what's wanted, but worth a sentence in the property
docstring — and is exactly the sort of thing the sync-guard test would have
flagged.

Open questions

  1. Was a Brand StrEnum considered for device.brand and rejected? The
    current free-form str makes typed comparisons in user code impossible.
  2. Were the NvlinkVersion → tuple[int, int] and
    PciInfo.get_throughput → rx_throughput / tx_throughput changes intended
    to land here, or as separate PRs? They expand the scope beyond issue
    #1995 ("Replace string literals with enums in public API").
  3. Is the team OK with the silent breaking change for callers who were
    passing nvml.<EnumType>.X directly? If yes, please call it out in the PR
    body and add the breaking label; if no, consider a one-release deprecation
    path that accepts the old _FastEnum value with a DeprecationWarning.

@leofang leofang added the breaking label (Breaking changes are introduced) May 4, 2026
Contributor Author

mdboom commented May 4, 2026

<rant>I think these epic auto-reviews are really hard to address, because they include a lot of reasonable comments with a lot of noise. Can whatever robo-tool being used actually comment on lines in the PR? I already pre-reviewed this with the same model, so a lot of these ideas I already rejected, but then of course there are mistakes that this found that my run didn't so it's not entirely without value. Just seems like a lot of extra time vs. how GitHub was designed to be used 🤷</rant>

I am going to respond to everything I think is wrong. If not mentioned, assume I have addressed and fixed it.

Bug — test_nvlink references a removed symbol. cuda_core/tests/system/test_system_device.py:759 still does
assert isinstance(version, system.NvlinkVersion)

Nope.

 rg system\.NvLinkVersion

Bug — supported_pstates can iterate beyond the valid range.
cuda_core/cuda/core/system/_device.pyx:1001 walks
nvml.device_get_supported_performance_states(...) and only filters out
PSTATE_UNKNOWN (= 32). The NVML header says unused trailing slots are
PSTATE_UNKNOWN, but if the driver ever returned a value outside 0..15
(e.g., a future PSTATE_*), _pstate_to_int would silently return
int(x) - 0, producing an out-of-contract integer.

Disagree. If NVML doesn't follow the enum contract, all bets are off. I don't think we do this kind of defensive programming elsewhere. And it's Python -- it will raise, not segfault.

Bug — _pstate_to_enum name is wrong; takes an int and returns an int.
cuda_core/cuda/core/system/_device.pyx:29 is named _pstate_to_enum but
the body is
return int(pstate) + int(nvml.Pstates.PSTATE_0)

Since the enums used are Python enums, not Cython enums, we need to type it this way. Confusing, sure, but it's an internal convenience function. The fact that it's a no-op is fine -- it's to be resilient to changes in the underlying enum.

Bug — device.brand silently downgrades several brands to "Unknown".

Yep. Looks like your Opus 4.7 caught the error from my Opus 4.7 🙃

Behavior — get_supported_event_types includes EventType.NONE mapping,
but the bitmask path can never produce it. ... Currently device.register_events([EventType.NONE]) is a silent no-op
(bitmask stays 0), which is surprising; consider raising.

I think this is fine as-is.

Behavior — device.brand: BrandType → str

Yes, brands aren't really designed to be acted on programmatically -- they are primarily just a name to display to the user.

Docs — Device.performance_state documentation feels split.

Yes, this is the downside of moving away from numerical enums to an int. I'm not sure I agree with the solution.

Style — module-level docstring assignment is repeated and noisy.

Yes, but it's very explicit. I think the model's suggestion here is too magical.

Style — import sys + version branch duplicated in two .pyx files.

Again, this is fine, and pretty standard practice. If we do want to unify Python compat code, we should do it as a separate sweep.

Style — inverse mappings constructed inconsistently.

Python doesn't have dead code elimination, so we shouldn't create private objects that we would never use.

Style — get_thermal_settings has a stranded TODO comment.

No. This is where it should go. It can't go above the docstring.

Misc — nvml.GpuP2PStatus typo handled but with redundant key.
_GPU_P2P_STATUS_MAPPING (cuda_core/cuda/core/system/_device.pyx:164) maps
both P2P_STATUS_CHIPSET_NOT_SUPPORED (typo) and
P2P_STATUS_CHIPSET_NOT_SUPPORTED. Per the binding, these are aliases (same
int value), so one of these dict entries silently overwrites the other at
construction time.

I don't think we can guarantee that will always be the case, so it's safer as-is.

Was a Brand StrEnum considered for device.brand and rejected? The
current free-form str makes typed comparisons in user code impossible.

Yes, I think that's the right choice.

Were the NvlinkVersion → tuple[int, int] and
PciInfo.get_throughput → rx_throughput / tx_throughput changes intended
to land here, or as separate PRs? They expand the scope beyond #1995.

Yes. The scope was really to "be Pythonic", not to strictly adhere to wrapping enums as-is. If that were the goal, why bother with this at all?

Is the team OK with the silent breaking change for callers who were
passing nvml.&lt;EnumType&gt;.X directly? If yes, please call it out in the PR
body and add the breaking label; if no, consider a one-release deprecation
path that accepts the old _FastEnum value with a DeprecationWarning.

Yes, prior to 1.0 we are fine with any breaking change without notice.

Sync guards

The agent's analysis of this problem really gets to the heart of why I wasn't sure any of this was a good idea, though a lot of its analysis is based on an incomplete understanding of the version compatibility and guarantees between cuda_core and cuda_bindings. However, I plan to experiment with its first suggestion. If that doesn't bear immediate fruit, we can separate it out into a separate issue and consider sync guards across all of cuda_core. That is not blocking for 1.0.

Contributor

rwgk commented May 4, 2026

Can whatever robo-tool being used actually comment on lines in the PR?

It can, but I intentionally didn't make use of that feature. I believe it'll make it even harder to know what came from where and when. — I figure an agent can easily go from the details in the comments to code edits, so figured it's better to have one review in a one-piece comment.

I already pre-reviewed this with the same model, so a lot of these ideas I already rejected,

We could post such things as comments on the PRs before sending them out for review. — But again, I wouldn't want to have the tool auto-post comments. I agree there is a lot of noise, I want to weed out what I can, and actually read at least once what it wrote. So I always post my comments manually. (Maybe one day that'll be futile, as the tools get better, but I don't think we're there yet.)

    cdef object _pstate_to_int(object pstate):
        if pstate == nvml.Pstates.PSTATE_UNKNOWN:
            return None
        return int(pstate) - int(nvml.Pstates.PSTATE_0)
Contributor

I wouldn't let this go unchecked (bugs happen), but keep it simple:

    assert int(pstate) >= int(nvml.Pstates.PSTATE_0)

This is the only item my agent still pulled out as noteworthy.

@leofang leofang left a comment (Member)

Thanks, Mike, no major issues found.

Comment thread cuda_core/cuda/core/system/_device.pyx Outdated
Comment thread cuda_core/cuda/core/system/_clock.pxi Outdated
Comment thread cuda_core/cuda/core/system/_clock.pxi
Comment thread cuda_core/cuda/core/system/_event.pxi
Comment thread cuda_core/cuda/core/system/_device.pyx Outdated
    import warnings

    from cuda.bindings import nvml
    from cuda.bindings._internal._fast_enum import FastEnum
Member

Q: Do we need a try-except to check if FastEnum exists (and fall back to IntEnum if not)?

Contributor Author

Oh, yeah, I suppose this limits us to a recent-ish cuda_bindings. But all of cuda.core.system is already limited in the same way... Anyway, it can't hurt to be careful.

Member

That is actually a good point -- IIRC FastEnum was introduced at the same time when the nvml bindings were released?

Contributor Author

It was a little bit after. So I think doing a try/except ImportError thing is not a bad idea, at least for a little while.
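The try/except discussed here might look like the following; the fallback to IntEnum follows the earlier question in this thread, and the internal module path is the one quoted above (it may change across cuda_bindings releases):

```python
try:
    from cuda.bindings._internal._fast_enum import FastEnum
except ImportError:
    # Older cuda_bindings without FastEnum: fall back to IntEnum,
    # which also supports integer-valued member lookup.
    from enum import IntEnum as FastEnum
```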

@mdboom mdboom enabled auto-merge (squash) May 5, 2026 00:38
    return nvml.device_get_max_customer_boost_clock(self._handle, self._clock_type)

    def get_min_max_clock_of_pstate_mhz(self, pstate: Pstates) -> tuple[int, int]:
    def get_min_max_clock_of_pstate_mhz(self, pstate: int) -> tuple[int, int]:
Collaborator

Are we intending to move to an integral type here: Pstates -> int?


    import sys
    if sys.version_info >= (3, 11):
        from enum import StrEnum
Collaborator

Nit: Would it be better to have a common .py file that holds this implementation and all other cuda-python code can just import it? Rather than copy this version check to all the sites that need it?


    from cuda.bindings import nvml
    try:
        from cuda.bindings._internal._fast_enum import FastEnum
Collaborator

Nit: Ditto above about hoisting this backwards compat code into a common location that can be referenced.

The type of event that was triggered.
"""
return EventType(self._event_data.event_type)
return _EVENT_TYPE_MAPPING[self._event_data.event_type]
Collaborator

Nit: Bare dict lookup will raise KeyError if the driver returns an event type not in _EVENT_TYPE_MAPPING. Consider .get(...) with a sentinel/None (or wrap with a clearer error) so a property accessor doesn't blow up on values introduced by newer drivers.

The :obj:`~SystemEventType` that was triggered.
"""
return SystemEventType(self._event_data.event_type)
return _SYSTEM_EVENT_TYPE_MAPPING[self._event_data.event_type]
Collaborator

Nit: Same as in _event.pxi — bare lookup raises KeyError on unmapped values. Consider .get(...) with a fallback for forward-compat with newer drivers.

Contributor Author

This is fine and not worth the performance impact. The value comes from C++ code. It would be a runtime, not a user error, if that were ever to happen.

For all CUDA-capable discrete products with fans.
"""
return FanControlPolicy(nvml.device_get_fan_control_policy_v2(self._handle, self._fan))
return _FAN_CONTROL_POLICY_MAPPING[nvml.device_get_fan_control_policy_v2(self._handle, self._fan)]
Collaborator

Nit: Bare _FAN_CONTROL_POLICY_MAPPING[...] lookup will raise KeyError for any policy value the wrapper doesn't know. Other wrappers in this PR use .get(..., fallback) — worth being consistent.

Contributor Author

The other wrappers use .get because the value is either explicitly unbounded in the NVML docs or comes from the user. That is not the case here.

Comment on lines +29 to +31
    assert (
        int(pstate) >= 0 and int(pstate) <= 15
    ), f"Invalid P-state: {pstate}. Must be between 0 and 15 inclusive, or PSTATE_UNKNOWN."
Collaborator

assert is stripped under python -O, so this bounds check silently disappears in optimized runs. Prefer raising ValueError for runtime input validation.

Contributor Author

Again, we don't need to validate values coming from NVML, IMHO.

    return NvlinkVersion(nvml.device_get_nvlink_version(self._device._handle, self._link))
    version = nvml.device_get_nvlink_version(self._device._handle, self._link)
    if version == nvml.NvlinkVersion.VERSION_INVALID:
        raise RuntimeError(f"Invalid NvLink version returned for device")
Collaborator

f-string with no {} interpolation — either drop the f prefix or include the offending version value in the message (more useful for debugging). Ruff would flag this as F541; cython-lint won't.

Comment on lines +191 to +193
# Typo in upstream library
nvml.GpuP2PStatus.P2P_STATUS_CHIPSET_NOT_SUPPORED: GpuP2PStatus.CHIPSET_NOT_SUPPORTED,
nvml.GpuP2PStatus.P2P_STATUS_CHIPSET_NOT_SUPPORTED: GpuP2PStatus.CHIPSET_NOT_SUPPORTED,
Collaborator

The typo'd P2P_STATUS_CHIPSET_NOT_SUPPORED and the corrected P2P_STATUS_CHIPSET_NOT_SUPPORTED alias to the same integer value, so the second entry overwrites the first in the dict — the first line is dead. Either drop one or add a comment noting it's intentional dedup coverage.

Comment on lines +20 to +23
    if sys.version_info >= (3, 11):
        from enum import StrEnum
    else:
        from backports.strenum import StrEnum
Collaborator

Nit: This StrEnum / backports.strenum compat block is duplicated yet again here. Folds into the hoist suggested in the existing threads on _device.pyx (r3185426568, r3185428204).

Contributor Author

See #2019.

@mdboom mdboom merged commit ecd558a into NVIDIA:main May 5, 2026
94 checks passed

github-actions Bot commented May 5, 2026

Doc Preview CI
Preview removed because the pull request was closed or merged.


Labels

  • breaking — Breaking changes are introduced
  • cuda.core — Everything related to the cuda.core module
  • P0 — High priority - Must do!

4 participants