Conversation

@Andy-Jost
Contributor

Description

closes #963

Adds a use_pool option to DeviceMemoryResource that allows disabling memory pool use. This enables graph capture of stream-ordered allocations.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Oct 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost Andy-Jost self-assigned this Oct 27, 2025
@Andy-Jost Andy-Jost marked this pull request as draft October 27, 2025 19:53
@Andy-Jost Andy-Jost requested a review from leofang October 27, 2025 19:53
@Andy-Jost
Contributor Author

This is by no means ready for merge. I'm interested in getting feedback on whether this is the right direction.


@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR introduces a use_pool option to DeviceMemoryResource that enables CUDA graph capture of stream-ordered memory allocations. When use_pool=False, the resource bypasses memory pool creation and uses cuMemAllocAsync directly instead of cuMemAllocFromPoolAsync, allowing these allocations to be captured in graph nodes. The change maintains backward compatibility by defaulting use_pool=True and adds validation to prevent conflicting configurations (IPC requires pools). This extends the cuda.core.experimental memory abstraction to support both traditional pool-based allocations and graph-capturable stream-ordered allocations within a unified interface.
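The allocation-path branching described above can be sketched as a plain-Python toy model. This is not cuda.core's actual implementation; the class and the `"cuMemAllocAsync"` / `"cuMemAllocFromPoolAsync"` values are illustrative stand-ins for the real driver calls:

```python
# Toy model of the pool vs. non-pool branching described in the summary.
# Illustrative only: not cuda.core's real DeviceMemoryResource, and the
# driver-call names here are stand-in strings, not actual CUDA calls.
class ToyDeviceMemoryResource:
    def __init__(self, use_pool=True, ipc_enabled=False):
        # IPC requires a pool, so the conflicting combination is rejected.
        if ipc_enabled and not use_pool:
            raise RuntimeError("IPC requires a memory pool")
        self.use_pool = use_pool
        # Stands in for creating (or skipping) a CUmemoryPool.
        self.pool = object() if use_pool else None

    def allocate(self, nbytes):
        # With a pool, allocate from it; without one, fall back to the
        # plain stream-ordered path that graph capture can record.
        if self.use_pool:
            return ("cuMemAllocFromPoolAsync", nbytes, self.pool)
        return ("cuMemAllocAsync", nbytes)

mr = ToyDeviceMemoryResource(use_pool=False)
print(mr.allocate(64)[0])  # -> cuMemAllocAsync
```

Defaulting use_pool=True preserves the existing pool-based behavior, so the new path is strictly opt-in.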

Important Files Changed

| Filename | Score | Overview |
| --- | --- | --- |
| cuda_core/cuda/core/experimental/_memory.pyx | 3/5 | Adds use_pool option to DeviceMemoryResourceOptions, conditional pool creation logic, allocation path branching between pool and non-pool modes, and validation for attribute access |
| cuda_core/tests/test_graph2.py | 4/5 | New test file validating graph capture with non-pooled stream-ordered allocations, kernel launches, and result verification |

Confidence score: 3/5

  • This PR requires careful review due to a potential runtime bug and incomplete cleanup
  • Score lowered due to: (1) line 542 check mr.handle is None likely incorrect since handle property returns a CUmemoryPool object not None for NULL handles, (2) commented-out test_no_graph function should be removed, (3) numerous unused imports in test file
  • Pay close attention to cuda_core/cuda/core/experimental/_memory.pyx lines 542-543 for the handle validation logic, and cuda_core/tests/test_graph2.py for cleanup of development artifacts

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Device
    participant GraphBuilder
    participant DeviceMemoryResource
    participant Stream
    participant Module
    participant Graph
    participant Buffer

    User->>Device: "create_graph_builder()"
    Device-->>GraphBuilder: "return graph_builder"
    User->>GraphBuilder: "begin_building(mode='thread_local')"
    GraphBuilder-->>GraphBuilder: "create internal stream"
    GraphBuilder-->>User: "return self"

    User->>Device: "create device with use_pool=False"
    Device-->>DeviceMemoryResource: "create with use_pool=False"
    DeviceMemoryResource-->>User: "return mr"

    User->>Module: "get_kernel('set_zero')"
    Module-->>User: "return set_zero kernel"
    User->>Module: "get_kernel('add_one')"
    Module-->>User: "return add_one kernel"

    User->>DeviceMemoryResource: "allocate(NBYTES, stream=target_stream)"
    DeviceMemoryResource->>Stream: "cuMemAllocAsync()"
    Stream-->>Buffer: "create buffer"
    Buffer-->>User: "return target buffer"

    User->>DeviceMemoryResource: "allocate(NBYTES, stream=gb.stream)"
    DeviceMemoryResource->>GraphBuilder: "get capture stream"
    DeviceMemoryResource->>Stream: "cuMemAllocAsync() [captured]"
    Stream-->>Buffer: "create work_buffer"
    Buffer-->>User: "return work_buffer"

    User->>GraphBuilder: "launch(set_zero, work_buffer)"
    GraphBuilder->>Stream: "record kernel launch [captured]"
    User->>GraphBuilder: "launch(add_one, work_buffer)"
    GraphBuilder->>Stream: "record kernel launch [captured]"
    User->>GraphBuilder: "launch(add_one, work_buffer)"
    GraphBuilder->>Stream: "record kernel launch [captured]"

    User->>Buffer: "target.copy_from(work_buffer)"
    Buffer->>Stream: "cuMemcpyAsync() [captured]"

    User->>GraphBuilder: "end_building()"
    GraphBuilder->>Graph: "finalize capture"
    GraphBuilder-->>User: "return graph"

    User->>Graph: "complete()"
    Graph-->>User: "return completed graph"

    User->>Stream: "create_stream()"
    Stream-->>User: "return stream"

    User->>Graph: "upload(stream)"
    Graph->>Stream: "upload executable"

    User->>Graph: "launch(stream)"
    Graph->>Stream: "launch graph"

    User->>Stream: "sync()"
    Stream-->>User: "wait for completion"

    User->>Buffer: "compare_buffer.copy_from(target)"
    Buffer->>Stream: "cuMemcpyAsync()"
    User->>Stream: "sync()"
    Stream-->>User: "verification complete"
```

2 files reviewed, 8 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +542 to +543

```python
if mr.handle is None:
    raise RuntimeError("DeviceMemoryResource is not configured to use a memory pool")
```

logic: mr.handle property returns a driver.CUmemoryPool object (line 982), not None. The check should be mr._mempool_handle != NULL or use the is_using_pool property.
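The failure mode the reviewer flags can be shown with a small toy model. The names (`ToyCUmemoryPool`, `_mempool_handle`) are illustrative stand-ins, not cuda.core's real attributes: if a property always wraps the raw handle in an object, an `is None` check on it can never fire, even when the underlying handle is NULL.

```python
# Toy illustration of the flagged bug: a property that wraps the raw handle
# in an object is never None, even when the raw handle is NULL (modeled as 0).
# Class and attribute names are hypothetical, not cuda.core's real ones.
class ToyCUmemoryPool:
    def __init__(self, raw):
        self.raw = raw

class ToyResource:
    def __init__(self, raw_handle):
        self._mempool_handle = raw_handle  # 0 models a NULL driver handle

    @property
    def handle(self):
        # Always returns a wrapper object, so `handle is None` is always False.
        return ToyCUmemoryPool(self._mempool_handle)

mr = ToyResource(raw_handle=0)
print(mr.handle is None)        # -> False: the buggy check never fires
print(mr._mempool_handle == 0)  # -> True: checking the raw handle works
```

Hence the suggestion to test the raw handle (or a dedicated `is_using_pool` flag) rather than the wrapping property.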

Comment on lines +751 to +752

```python
if opts.ipc_enabled:
    raise RuntimeError("Cannot supply ipc_enabled=True with use_pool=False")
```

style: Setting ipc_enabled=True with use_pool=False is correctly rejected, but this leaves the resource partially initialized (device_id, handle_type, etc. are set). Consider moving this validation before any state changes.
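The validate-before-mutate pattern the reviewer suggests can be sketched in plain Python (illustrative names, not the real DeviceMemoryResource constructor): all option combinations are checked before any attribute is assigned, so a rejected configuration leaves no partially initialized object behind.

```python
# Sketch of validating options before any state changes. Hypothetical
# classes for illustration; not cuda.core's actual constructor.
class ToyOptions:
    def __init__(self, use_pool=True, ipc_enabled=False):
        self.use_pool = use_pool
        self.ipc_enabled = ipc_enabled

class ToyMemResource:
    def __init__(self, device_id, opts):
        # Validate the full option set first: if this raises, the object
        # has no device_id, handle_type, or other half-set state.
        if opts.ipc_enabled and not opts.use_pool:
            raise RuntimeError("Cannot supply ipc_enabled=True with use_pool=False")
        self.device_id = device_id
        self.use_pool = opts.use_pool

try:
    ToyMemResource(0, ToyOptions(use_pool=False, ipc_enabled=True))
except RuntimeError as exc:
    print(exc)
```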

Comment on lines +97 to +103

```python
# import code
# code.interact(local=dict(globals(), **locals()))
work_buffer = mr.allocate(NBYTES, stream=gb.stream)
launch(gb, LaunchConfig(grid=1, block=1), set_zero, int(work_buffer.handle), NBYTES)
launch(gb, LaunchConfig(grid=1, block=1), add_one, int(work_buffer.handle), NBYTES)
launch(gb, LaunchConfig(grid=1, block=1), add_one, int(work_buffer.handle), NBYTES)
target.copy_from(work_buffer, stream=gb.stream)
```
Contributor Author

I believe this is the sort of use we are targeting.

@pciolkosz

Why not add a separate memory resource type for graphs? The behavior is different enough that a separate type might be needed

@Andy-Jost
Contributor Author

> Why not add a separate memory resource type for graphs? The behavior is different enough that a separate type might be needed

I would much prefer to do that. When I added IPC support to DeviceMemoryResource I initially made DeviceMemoryResource.__new__ a factory method that chose the correct subclass, but that idea was rejected. I certainly think the existing code is getting way too complicated but I don't have a good sense of what alternatives will be accepted.
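The `__new__`-as-factory idea mentioned above can be sketched as follows. This is an illustrative toy (hypothetical class names), showing the rejected design where the base class dispatches to a concrete subclass based on options:

```python
# Toy sketch of a __new__ factory that picks a subclass from options.
# Hypothetical names; this design was considered and rejected for cuda.core.
class MemResource:
    def __new__(cls, use_pool=True):
        if cls is MemResource:
            # Dispatch: direct construction of the base class chooses the
            # concrete subclass; subclasses construct normally.
            cls = PooledResource if use_pool else StreamOrderedResource
        return super().__new__(cls)

    def __init__(self, use_pool=True):
        self.use_pool = use_pool

class PooledResource(MemResource):
    pass

class StreamOrderedResource(MemResource):
    pass

print(type(MemResource(use_pool=False)).__name__)  # -> StreamOrderedResource
```

One trade-off of this pattern is that the factory behavior is hidden inside construction of the base class, which is one reason an explicit separate type (as suggested above) can be easier to reason about.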

@leofang leofang added P0 High priority - Must do! feature New feature or request cuda.core Everything related to the cuda.core module labels Oct 28, 2025
@leofang leofang added this to the cuda.core beta 9 milestone Oct 28, 2025

Development

Successfully merging this pull request may close these issues.

CUDA graph phase 2 - memory nodes
