
cuda.core: trim graph API surface for v1.0 (drop top-level re-exports, remove GraphAllocOptions) #2048

Merged
Andy-Jost merged 4 commits into NVIDIA:main from Andy-Jost:ajost/graph-cleanup-no-reexport
May 7, 2026

@Andy-Jost
Contributor

Summary

Two pre-1.0 breaking-change cleanups to the graph API, applied as separate commits.

  1. Stop re-exporting graph types from the top-level cuda.core namespace; they live under cuda.core.graph only. The same symbols are also dropped from the deprecated cuda.core.experimental shim. The cuda.core.graph submodule itself remains accessible after import cuda.core (added to the existing from cuda.core import checkpoint, graph, system, utils line), so cuda.core.graph.X continues to work without an explicit submodule import.
  2. Remove the GraphAllocOptions dataclass and the AllocNode.options round-trip property. Its three fields are now keyword-only parameters on GraphDefinition.allocate and GraphNode.allocate: device, memory_type, peer_access. The same data is still readable on the resulting node via the existing device_id, memory_type, and peer_access properties.
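
For downstream code that has to run against releases from both before and after this change, one option is to resolve graph symbols by trying the new location first. The following is only a sketch: `first_available` and `GRAPH_LOCATIONS` are hypothetical names that are not part of cuda.core, and the fallback to the top-level `cuda.core` namespace assumes an older release that still carries the re-exports.

```python
import importlib


def first_available(name, module_paths):
    """Return attribute `name` from the first importable module that has it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Module (or one of its parents) is not installed; try the next one.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name!r} not found in any of {list(module_paths)}")


# Prefer the submodule (the only location after this PR); fall back to the
# top-level namespace, which is assumed to exist only on older releases.
GRAPH_LOCATIONS = ("cuda.core.graph", "cuda.core")

try:
    GraphBuilder = first_available("GraphBuilder", GRAPH_LOCATIONS)
except ImportError:
    GraphBuilder = None  # cuda.core is not installed in this environment
```

Code that only targets v1.0 and later does not need this and can simply use `from cuda.core.graph import GraphBuilder`.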

Changes

Commit 1 - drop top-level graph re-exports:

  • cuda_core/cuda/core/__init__.py: removed from cuda.core.graph import (Graph, GraphAllocOptions, GraphBuilder, GraphCompleteOptions, GraphCondition, GraphDebugPrintOptions, GraphDefinition); added graph to the existing submodule import line.
  • cuda_core/cuda/core/experimental/__init__.py: removed the matching from cuda.core.graph import (Graph, GraphBuilder, GraphCompleteOptions, GraphDebugPrintOptions) block.
  • cuda_core/tests/test_experimental_backward_compat.py: dropped four assertions that exercised the removed forwarding.
  • Five graph tests under cuda_core/tests/graph/ updated to import the affected names from cuda.core.graph.
  • cuda_core/docs/source/getting-started.rst: :class:`GraphBuilder` -> :class:`graph.GraphBuilder`.

Commit 2 - remove GraphAllocOptions:

  • cuda_core/cuda/core/graph/_graph_definition.pyx: removed the GraphAllocOptions dataclass; new GraphDefinition.allocate(size, *, device=None, memory_type=GraphMemoryType.DEVICE, peer_access=None) signature.
  • cuda_core/cuda/core/graph/_graph_node.pyx: same kwargs on GraphNode.allocate (with full per-parameter docstring); inlined the params into GN_alloc. Also removed an unsubstantiated note claiming the allocation uses the device's default mempool.
  • cuda_core/cuda/core/graph/_subclasses.pyx: removed AllocNode.options and its docstring entry.
  • cuda_core/tests/graph/test_graph_definition.py: dropped the import; updated four call sites and two helpers (also dropped the "options" key from expected-attrs dicts).
  • cuda_core/docs/source/api.rst: removed graph.GraphAllocOptions from the dataclass autosummary.
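
The effect of the keyword-only signature can be illustrated without a GPU. The sketch below uses stand-ins only: the `GraphMemoryType`, `AllocNode`, and `allocate` defined here are hypothetical stubs that copy the parameter list quoted above, not the cuda.core implementations, and the way the stub stores `device` and `peer_access` is an assumption made for the demonstration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class GraphMemoryType(Enum):
    """Stand-in enum; DEVICE is the only member named in this PR."""
    DEVICE = auto()


@dataclass(frozen=True)
class AllocNode:
    """Stand-in for the node returned by allocate()."""
    size: int
    device_id: Optional[int]
    memory_type: GraphMemoryType
    peer_access: Tuple[int, ...]


def allocate(size, *, device=None, memory_type=GraphMemoryType.DEVICE, peer_access=None):
    """Stub with the same parameter list as the new GraphDefinition.allocate."""
    return AllocNode(size, device, memory_type, tuple(peer_access or ()))


# New style: every option is spelled out by keyword.
node = allocate(1024, device=0, peer_access=[1])

# Old style: passing an options object positionally is now rejected,
# because everything after `size` is keyword-only.
try:
    allocate(1024, {"device": 0})
except TypeError as exc:
    positional_error = exc
```

Because of the bare `*` in the signature, a leftover positional options argument fails immediately with a `TypeError` rather than being silently misinterpreted, which makes stale call sites easy to find.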

Both commits add corresponding entries under "Breaking changes" in cuda_core/docs/source/release/1.0.0-notes.rst.

Test Coverage

Existing graph tests (cuda_core/tests/graph/test_graph_definition.py, test_graph_builder*.py, test_graph_memory_resource.py, test_options.py, test_device_launch.py) and test_experimental_backward_compat.py cover all the touched code paths and were updated in lockstep with the API changes. A local run on a GPU machine passed before pushing.

Related Work

Part of the v1.0 API cleanup tracked in the breaking-changes section of cuda_core/docs/source/release/1.0.0-notes.rst.

Made with Cursor

Andy-Jost and others added 2 commits May 7, 2026 11:03
The graph types (Graph, GraphAllocOptions, GraphBuilder,
GraphCompleteOptions, GraphCondition, GraphDebugPrintOptions,
GraphDefinition) are now reachable only from the cuda.core.graph
submodule. The submodule itself is still loaded by `import cuda.core`
(via `from cuda.core import ... graph ...`), so `cuda.core.graph.X`
remains accessible without an explicit submodule import.

The same symbols are also no longer forwarded through the deprecated
cuda.core.experimental shim.

Documents the change as a breaking change in the 1.0.0 release notes
and updates internal tests and the getting-started guide to import
through cuda.core.graph.

Co-authored-by: Cursor <cursoragent@cursor.com>

GraphDefinition.allocate and GraphNode.allocate now accept device,
memory_type, and peer_access as keyword-only arguments instead of
a positional GraphAllocOptions dataclass. The dataclass and its
companion AllocNode.options round-trip property are removed; the
existing AllocNode.device_id, .memory_type, and .peer_access
properties cover that data directly.

Documents the change as a breaking change in the 1.0.0 release notes
and removes the type from the API reference autosummary.

Co-authored-by: Cursor <cursoragent@cursor.com>
@Andy-Jost Andy-Jost added this to the cuda.core v1.0.0 milestone May 7, 2026
@Andy-Jost Andy-Jost added enhancement Any code-related improvements P0 High priority - Must do! cuda.core Everything related to the cuda.core module breaking Breaking changes are introduced labels May 7, 2026
@Andy-Jost Andy-Jost self-assigned this May 7, 2026
Comment thread cuda_core/cuda/core/graph/_graph_definition.pyx Outdated
Comment thread cuda_core/cuda/core/graph/_graph_definition.pyx Outdated
Address review comments from @leofang on PR NVIDIA#2048: annotate the
device and peer_access parameters of GraphDefinition.allocate and
GraphNode.allocate as `"Device" | int | None` and
`list["Device" | int] | None` respectively, instead of leaving them
untyped.

Co-authored-by: Cursor <cursoragent@cursor.com>
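
An annotation of the form `"Device" | int | None` implies that callers may pass either a device object or a plain integer ordinal. One way such inputs could be normalized is sketched below. Everything here is hypothetical: the `Device` class is a minimal stand-in, and `to_device_id` and `to_peer_ids` are illustrative helpers, not functions from cuda.core or a description of how the library handles these arguments.

```python
from typing import List, Optional, Sequence, Union


class Device:
    """Stand-in for a device object; only an integer ID is modeled."""

    def __init__(self, device_id: int):
        self.device_id = device_id


DeviceLike = Union[Device, int]


def to_device_id(device: Optional[DeviceLike]) -> Optional[int]:
    """Map a Device, an int, or None to an integer ID (None passes through)."""
    if device is None:
        return None
    if isinstance(device, Device):
        return device.device_id
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(device, bool) or not isinstance(device, int):
        raise TypeError(f"expected Device, int, or None, got {type(device).__name__}")
    return device


def to_peer_ids(peer_access: Optional[Sequence[DeviceLike]]) -> List[int]:
    """Normalize an optional list of Device/int entries to a list of IDs."""
    return [to_device_id(peer) for peer in (peer_access or [])]
```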
@Andy-Jost Andy-Jost enabled auto-merge (squash) May 7, 2026 18:45

@leofang
Member

leofang commented May 7, 2026

@Andy-Jost CI failed

…rcular load

Importing cuda.core.graph from the top of cuda/core/__init__.py triggers a
load of cuda.core.graph._graph_builder, which cimports cuda.core._stream and
other extensions. While cuda.core itself is still initializing, those
circular loads leave the graph submodules partially initialized when
`from ._graph_builder import *` runs in cuda/core/graph/__init__.py, and
Graph, GraphBuilder, GraphCompleteOptions, and GraphDebugPrintOptions
silently fail to surface on cuda.core.graph.

Defer `import cuda.core.graph` until after every cuda.core._* extension has
been loaded so the inner `from ._graph_builder import *` finds a fully
initialized module. The standalone `import` form (rather than
`from cuda.core import graph`) keeps it from being collapsed back into the
checkpoint/system/utils block by ruff's import sorter; an `# isort: split`
marker pins the placement.

Co-authored-by: Cursor <cursoragent@cursor.com>
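
The silent nature of this failure comes from how a star-import treats a module that is still initializing: it copies whatever public names exist at that moment and raises nothing for the rest. The following pure-Python sketch is an analogy only. It does not reproduce the Cython extension load order described above, and the throwaway packages (`demo_early`, `demo_deferred`) and the `Stream` class are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def public_names_on_graph(pkg_name, defer_graph_import):
    """Build a throwaway package, import it, and report the public names
    that ended up on its ``graph`` submodule."""
    root = Path(tempfile.mkdtemp())
    graph_dir = root / pkg_name / "graph"
    graph_dir.mkdir(parents=True)

    define_stream = "class Stream:\n    pass\n"
    import_graph = f"import {pkg_name}.graph\n"
    # Early: import the submodule before the package has defined anything.
    # Deferred: define everything first, then import the submodule.
    if defer_graph_import:
        init_src = define_stream + import_graph
    else:
        init_src = import_graph + define_stream
    (root / pkg_name / "__init__.py").write_text(init_src)

    # The submodule star-imports from its (possibly half-initialized) parent.
    (graph_dir / "__init__.py").write_text(f"from {pkg_name} import *\n")

    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        pkg = importlib.import_module(pkg_name)
    finally:
        sys.path.remove(str(root))
    return {name for name in vars(pkg.graph) if not name.startswith("_")}


early = public_names_on_graph("demo_early", defer_graph_import=False)
deferred = public_names_on_graph("demo_deferred", defer_graph_import=True)
print("early:   ", sorted(early))
print("deferred:", sorted(deferred))
```

In the early variant the import succeeds but `Stream` never appears on the submodule, whereas an explicit `from ... import Stream` at the same point would have raised an `ImportError`. Deferring the import until the names exist avoids the problem, which is the same shape of fix as the one applied in this commit.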
@Andy-Jost Andy-Jost force-pushed the ajost/graph-cleanup-no-reexport branch from 0fab123 to a93ea03 Compare May 7, 2026 19:47
@Andy-Jost Andy-Jost merged commit 986bfbc into NVIDIA:main May 7, 2026
94 checks passed
@Andy-Jost Andy-Jost deleted the ajost/graph-cleanup-no-reexport branch May 7, 2026 20:48
