
cuda.core: reject stream=None — require explicit stream everywhere #2001

@leofang

Description

Summary

Several cuda.core APIs accept stream=None and silently fall back to default_stream() (or the NULL stream). This makes the stream choice implicit and environment-dependent (which default you get depends on CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM), and is therefore error-prone. Users should always pass a stream explicitly, including device.default_stream when they actually want the default.

Design rule

stream should be a keyword-only argument with no default value (*, stream: Stream), so callers must always pass a stream explicitly.

APIs that already follow this convention:

  • Buffer.copy_to(dst=None, *, stream: Stream | GraphBuilder)
  • Buffer.copy_from(src, *, stream: Stream | GraphBuilder)
  • Buffer.fill(value, *, stream: Stream | GraphBuilder)
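A minimal sketch of the signature pattern these APIs share (the classes here are stand-ins, not the actual cuda.core implementation): making stream keyword-only with no default turns a forgotten stream into an immediate TypeError at the call site.

```python
class Stream:
    """Stand-in for cuda.core's Stream type."""

class Buffer:
    """Stand-in for cuda.core's Buffer; only the signature shape matters."""

    def copy_to(self, dst=None, *, stream: Stream):
        # No default for stream: omitting it raises TypeError,
        # so there is no silent fallback to a default stream.
        return stream

buf = Buffer()
s = Stream()
buf.copy_to(stream=s)   # OK: stream passed explicitly
# buf.copy_to()         # TypeError: missing keyword-only argument 'stream'
```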

APIs that need to change

stream=None → default_stream() (implicit fallback; remove it)

  • MemoryPool.allocate() / deallocate()
  • GraphMemoryResource.allocate() / deallocate()
  • GraphicsResource.map()
  • LegacyPinnedMemoryResource.allocate()
  • _SynchronousMemoryResource.allocate()
  • Device.allocate() (delegates to the above)

stream=None → NULL / legacy default (same issue)

  • Kernel.max_potential_cluster_size(config, stream=None)
  • Kernel.max_active_clusters(config, stream=None)

stream should become keyword-only

  • GraphBuilder.launch(self, stream: Stream) — currently positional, should be (self, *, stream: Stream)

APIs that are fine (no change needed)

  • launch(stream, config, kernel, *args) — stream is the 1st positional arg by design
  • Buffer.close(stream=None) — None means "reuse the original stream", by design
  • GraphicsResource.unmap(stream=None) / close(stream=None) — same, reuses the mapping stream
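For contrast, a minimal sketch of why close(stream=None) and unmap(stream=None) are acceptable (illustrative names and bookkeeping, not the actual cuda.core implementation): here None reuses a stream the object already recorded, rather than falling back to a hidden global default.

```python
class Buffer:
    """Stand-in for cuda.core's Buffer; only the stream bookkeeping is shown."""

    def __init__(self, alloc_stream):
        # Remember the stream the buffer was allocated on.
        self._alloc_stream = alloc_stream

    def close(self, stream=None):
        # stream=None means "reuse the allocation stream", not the
        # NULL/default stream, so no implicit global state is involved.
        stream = stream if stream is not None else self._alloc_stream
        return stream  # returned here only to make the choice visible

b = Buffer("alloc-stream")
b.close()                 # reuses "alloc-stream"
b.close("other-stream")   # an explicit stream still wins
```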

Proposed implementation

  1. Have Stream_accept() raise TypeError when stream is None, so the check is centralized and cannot be forgotten.
  2. Remove the = None default from every affected API signature.
  3. Make stream keyword-only where it isn't already (except launch() where it's intentionally positional).
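Step 1 could look roughly like the following sketch (the helper name and Stream type are stand-ins; the real Stream_accept lives in cuda.core's internals and handles more cases, e.g. objects implementing the __cuda_stream__ protocol):

```python
class Stream:
    """Stand-in for cuda.core's Stream type."""

def stream_accept(obj):
    # Centralized validation: every API that takes a stream funnels
    # through here, so the None check cannot be forgotten per call site.
    if obj is None:
        raise TypeError(
            "stream=None is not allowed; pass a Stream explicitly "
            "(e.g. device.default_stream for the default stream)"
        )
    if not isinstance(obj, Stream):
        raise TypeError(f"expected a Stream, got {type(obj).__name__}")
    return obj
```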

Prior art: CCCL's cccl-runtime

CCCL's modern CUDA runtime (libcudacxx, cuda:: namespace) follows the same explicit-stream principle:

The CUDA default (NULL) stream is not exposed as a first-class runtime object because it is tied to implicit per-device state and encourages hidden dependencies.

docs/libcudacxx/runtime/cudart_interactions.rst

Concretely:

  • cuda::launch(stream, config, kernel, args...) requires an explicit stream_ref as the first parameter, with no default.
  • The cuda::mr::resource concept requires allocate(cuda::stream_ref, ...) / deallocate(cuda::stream_ref, ...) — no fallback to a default stream.
  • cuda::stream_ref deletes constructors from int and nullptr, and deprecates the parameterless (default stream) constructor.

Metadata

Labels
  • P0 — High priority - Must do!
  • bug — Something isn't working
  • cuda.core — Everything related to the cuda.core module
