Conversation

@Andy-Jost
Contributor

This reverts commit 20f29e9.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Oct 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost Andy-Jost requested a review from leofang October 22, 2025 14:43
@Andy-Jost
Contributor Author

/ok to test 1d5bd94

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR completely reverts commit 20f29e9, which implemented IPC-enabled events for cuda.core.experimental. The revert removes all IPC event functionality from the Event and Stream classes by deleting IPCEventDescriptor, get_ipc_descriptor(), from_ipc_descriptor(), multiprocessing serialization support, and the _ipc_enabled flag. Test infrastructure added for IPC events is removed (test files, helper modules including buffers.py, latch.py, logging.py) or restored to previous implementations. The remaining IPC memory tests revert from PatternGen to IPCBufferTestHelper and add platform compatibility checks using supports_ipc_mempool() to gracefully skip tests on systems where the driver rejects IPC-enabled mempool creation. The Event.is_ipc_supported property now raises NotImplementedError instead of returning a boolean, marking IPC support as still work-in-progress. The revert affects only the experimental API and leaves the broader cuda-bindings layer unchanged.

Potential Issues:

  1. Breaking API change in Event.is_ipc_supported: The property changed from returning a boolean to raising NotImplementedError (lines 165-167 in _event.pyx). Any code checking if event.is_ipc_supported: will now crash instead of returning False. Consider deprecation warnings before breaking changes in experimental APIs.

  2. Type safety regression in DeviceMemoryResourceOptions.max_size: Changed from cython.size_t (unsigned) to cython.int (signed) in _memory.pyx line 514. Negative values could now be passed where memory sizes should always be non-negative, potentially causing undefined behavior in CUDA driver calls.

  3. Missing platform check in TestIPCSharedAllocationHandleAndBufferObjects: Unlike the other three IPC test classes (lines 19-22, 60-63, 112-115 in test_memory_ipc.py), this test class at line 163 lacks the supports_ipc_mempool() check, which may cause test failures on WSL or platforms with limited IPC support.

  4. Test helper instantiation pattern creates potential state bugs: In test_send_buffers.py and test_serialize.py, IPCBufferTestHelper instances are created multiple times per buffer (once for fill, once for verify) rather than reused. If the helper's internal state depends on initialization context (stream creation, scratch buffer allocation), this could lead to subtle verification failures.

  5. API semantic equivalence assumption: The revert assumes PatternGen.fill_buffer(seed=True) is equivalent to IPCBufferTestHelper.fill_buffer(flipped=True). If the data patterns differ, tests will pass but validate incorrect buffer contents. The inconsistent API in test_memory_ipc.py (some tests use flipped=True/False, others use starting_from=<int>) suggests incomplete refactoring.
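Issue 1 can be illustrated with a minimal sketch. It does not use the real cuda.core.experimental classes: EventAfterRevert and ipc_supported are hypothetical names, and the error message is invented. The only behavior taken from the review is that the property raises NotImplementedError instead of returning a boolean.

```python
class EventAfterRevert:
    """Hypothetical stand-in for Event once the revert is applied."""

    @property
    def is_ipc_supported(self):
        # Assumed behavior per the review: the property raises
        # instead of returning a bool. The message text is made up.
        raise NotImplementedError("IPC events are not implemented")


def ipc_supported(event):
    """Return the property's value, treating NotImplementedError as False."""
    try:
        return bool(event.is_ipc_supported)
    except NotImplementedError:
        return False
```

A caller that previously wrote `if event.is_ipc_supported:` could route the check through a wrapper like this one to keep the old "falsy means unsupported" behavior while the feature is unavailable.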

Confidence: 3/5 - This is a clean revert with clear intent, but the breaking API change in is_ipc_supported, the type regression in max_size, and potential test coverage gaps reduce confidence. The revert removes a significant feature, so thorough testing on all supported platforms (especially WSL and multi-GPU systems) is critical before merge.
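The max_size concern could be mitigated by an explicit range check, since a signed field no longer rejects negative values on its own. The sketch below is plain Python, not the Cython code in _memory.pyx. MemoryResourceOptionsSketch is a hypothetical stand-in, and its default of 0 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MemoryResourceOptionsSketch:
    """Hypothetical stand-in for DeviceMemoryResourceOptions."""

    max_size: int = 0  # assumed default, for illustration only

    def __post_init__(self):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(self.max_size, bool) or not isinstance(self.max_size, int):
            raise TypeError("max_size must be an int")
        # A signed field accepts negatives, so check the range by hand.
        if self.max_size < 0:
            raise ValueError("max_size must be non-negative")
```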

Additional Comments (1)

  1. cuda_core/tests/memory_ipc/test_memory_ipc.py, line 163 (link)

    logic: Missing IPC mempool support check. Other test classes guard with if not supports_ipc_mempool(ipc_device): but this one doesn't.
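A guard of the kind described might look like the sketch below. The supports_ipc_mempool stub and the ipc_mempool_ok attribute are hypothetical; the real helper in the test suite presumably probes the driver. The skip uses the standard library's unittest.SkipTest so the sketch runs without pytest, while the actual tests would more likely call pytest.skip.

```python
import unittest


def supports_ipc_mempool(device):
    """Hypothetical stub; the real helper presumably queries the driver."""
    return bool(getattr(device, "ipc_mempool_ok", False))


def skip_unless_ipc_mempool(device):
    """Skip the calling test when IPC-enabled mempools are unavailable."""
    if not supports_ipc_mempool(device):
        raise unittest.SkipTest("Driver rejects IPC-enabled mempool creation")
```

Calling skip_unless_ipc_mempool(ipc_device) at the top of each test in the unguarded class would give it the same early exit as the other three classes.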

20 files reviewed, 5 comments

@Andy-Jost Andy-Jost merged commit bcd40ff into NVIDIA:main Oct 22, 2025
71 checks passed
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.

@Andy-Jost Andy-Jost deleted the revert_ipc_events branch October 22, 2025 16:54

2 participants