Conversation

@rparolin
Collaborator

@rparolin rparolin commented Oct 22, 2025

This PR addresses issues with GPU Direct RDMA support validation in the virtual memory management (VMM) allocator and improves test coverage for memory management functionality.

  • Removed hardcoded platform checks: Eliminated Windows and WSL-specific skips in favor of device capability checks.
  • New test case: Added test_vmm_allocator_rdma_unsupported_exception() to verify proper error handling when RDMA is requested on unsupported devices.
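The capability-based gating described above can be sketched as a small, driver-free helper. This is illustrative rather than code from the PR: `virtual_memory_management_supported` is the device property discussed below, while `gpu_direct_rdma_supported` is an assumed attribute name standing in for the RDMA capability check.

```python
from types import SimpleNamespace

def vmm_skip_reason(props, rdma_requested=False):
    """Return a skip reason string, or None if the test can run."""
    if not getattr(props, "virtual_memory_management_supported", False):
        return "device does not support virtual memory management"
    if rdma_requested and not getattr(props, "gpu_direct_rdma_supported", False):
        return "device does not support GPU Direct RDMA"
    return None

# A device that supports VMM but not GPU Direct RDMA: plain VMM tests
# run, while RDMA-specific tests are skipped.
props = SimpleNamespace(
    virtual_memory_management_supported=True,
    gpu_direct_rdma_supported=False,
)
print(vmm_skip_reason(props))                       # None
print(vmm_skip_reason(props, rdma_requested=True))  # skip reason string
```

Unlike the removed Windows/WSL skips, this decision is made per device, so the same test module behaves correctly on any platform.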

@copy-pr-bot
Contributor

copy-pr-bot bot commented Oct 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rparolin rparolin requested a review from leofang October 22, 2025 21:49
@rparolin rparolin marked this pull request as ready for review October 22, 2025 21:50
@rparolin
Collaborator Author

/ok to test aba3a17

@rparolin
Collaborator Author

@greptileai

@rparolin rparolin changed the title from "Checking for RDMA support before allocating via VMM" to "Checking for RDMA support before allocating via VMM in test suite" Oct 22, 2025
@github-actions

This comment has been minimized.

Member

@leofang leofang left a comment


It occurs to me that none of us (Ben, Keith, myself) read the docs when getting the VMM PR merged. The docs make it clear that there is one device attribute we should check (as is typical of all major CUDA features, and as we did in the IPC mempool test helper).
https://docs.nvidia.com/cuda/cuda-c-programming-guide/#query-for-support

Translating this to cuda.core, we need to check:

import pytest
from cuda.core.experimental import Device

dev = Device()
if not dev.properties.virtual_memory_management_supported:
    pytest.skip(...)
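A reusable form of this guard, as a sketch rather than code from the PR: `Device()` needs a working CUDA driver, so the helper takes the properties object as a parameter (in a real session you would pass `Device().properties`; any object with the same attribute works for illustration).

```python
import pytest

def skip_unless_vmm(props):
    """Skip the calling test unless the device reports VMM support.

    `props` is expected to expose the boolean attribute
    `virtual_memory_management_supported`, as Device().properties does.
    """
    if not getattr(props, "virtual_memory_management_supported", False):
        pytest.skip("device does not support virtual memory management")
```

Each VMM test can then open with `skip_unless_vmm(dev.properties)` instead of a hardcoded platform check.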

@greptile-apps
Contributor

greptile-apps bot commented Oct 22, 2025

Greptile encountered an error while reviewing this PR. Please reach out to support@greptile.com for assistance.

@rparolin
Collaborator Author

/ok to test ebc6818


@leofang leofang linked an issue Oct 22, 2025 that may be closed by this pull request
@leofang leofang added the bug (Something isn't working), P0 (High priority - Must do!), and cuda.core (Everything related to the cuda.core module) labels Oct 22, 2025
@leofang leofang added this to the cuda.core beta 8 milestone Oct 22, 2025
@rparolin
Collaborator Author

> It occurs to me that none of us (Ben, Keith, myself) read the docs when getting the VMM PR merged. The docs made it clear that there is one device attribute that we should check (which is typical to all major CUDA features, as we did in the IPC mempool test helper). https://docs.nvidia.com/cuda/cuda-c-programming-guide/#query-for-support
>
> Translate this to cuda.core, we need to check
>
>     dev = Device()
>     if not dev.properties.virtual_memory_management_supported:
>         pytest.skip(...)

As discussed in person, I migrated most of the test suite's skip checks to use dev.properties.virtual_memory_management_supported, and additionally check for device RDMA support when the user explicitly requests it via our API.

@rparolin rparolin enabled auto-merge (squash) October 22, 2025 23:34
@rparolin rparolin merged commit f3cb5a2 into NVIDIA:main Oct 23, 2025
74 checks passed
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.


Labels

bug: Something isn't working
cuda.core: Everything related to the cuda.core module
P0: High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Investigate VMM issues on WSL

2 participants