
Improve error message for ManagedMemoryResource() on unsupported platforms #1835

Open
Andy-Jost wants to merge 1 commit into NVIDIA:main from Andy-Jost:managed-mr-error-message


Andy-Jost commented Mar 30, 2026

Closes #1617

Summary

ManagedMemoryResource() (no options) calls cuMemGetMemPool to retrieve the default managed memory pool, but on platforms without concurrent managed access (e.g. WSL2), this fails with a cryptic CUDA_ERROR_NOT_SUPPORTED. Meanwhile, explicitly creating a pool via ManagedMemoryResource(options=ManagedMemoryResourceOptions(...)) works fine on the same platform.

This PR catches the error and re-raises it as a RuntimeError with an actionable message pointing users to the explicit options path. The improved message is only emitted when concurrent managed access is confirmed to be unavailable; otherwise the original CUDAError propagates unchanged.

The error is identified by matching a substring of the CUDAError message rather than by inspecting a structured error code, because (1) we prefer not to change the CUDAError class or the MP_init_current_pool API for this, and (2) this is not a hot path.
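A minimal, self-contained sketch of the catch, check, and re-raise logic described above. CUDAError, init_current_pool, default_managed_pool, and the device_props dict are hypothetical stand-ins, not the cuda.core or MP_init_current_pool API. The matched substring and the RuntimeError wording are also assumptions, so the real Cython code in _managed_memory_resource.pyx may differ.

```python
# Hypothetical sketch; none of these names are the real cuda.core API.


class CUDAError(Exception):
    """Stand-in for cuda.core's CUDAError (assumes the message names the error code)."""


def init_current_pool():
    # Hypothetical stand-in for MP_init_current_pool: simulate the WSL2 failure.
    raise CUDAError("CUDA_ERROR_NOT_SUPPORTED: operation not supported")


def default_managed_pool(device_props, init=init_current_pool):
    """Fetch the default managed pool, translating the unsupported-platform error."""
    try:
        return init()
    except CUDAError as e:
        # Identify the error by string match, not by a structured error code.
        not_supported = "CUDA_ERROR_NOT_SUPPORTED" in str(e)
        # Only rewrite the error when concurrent managed access is confirmed absent.
        if not_supported and not device_props["concurrent_managed_access"]:
            raise RuntimeError(
                "The default managed memory pool is not available on this "
                "platform (concurrent_managed_access is False). Create a pool "
                "explicitly with ManagedMemoryResource("
                "options=ManagedMemoryResourceOptions(...))."
            ) from e
        # Otherwise, let the original CUDAError propagate unchanged.
        raise
```

Chaining with "raise ... from e" keeps the original CUDAError available as __cause__ for debugging, while the bare raise leaves the error untouched on platforms that do report concurrent managed access.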

Changes

  • _managed_memory_resource.pyx — catch CUDAError from MP_init_current_pool in the path where opts is None; check the device's concurrent_managed_access property and, if it is False, raise a RuntimeError with an actionable message
  • _memory_pool.pyx — improve the CUDA < 13 fallback error message in MP_init_current_pool to describe the unsupported operation
  • test_managed_memory_warning.py — add test_default_pool_error_without_concurrent_access using the existing device_without_concurrent_managed_access fixture

Test Plan

  • Reproduced on WSL2 (RTX 3500 Ada, concurrent_managed_access=False)
  • Verified fix produces the improved error message
  • CI

Made with Cursor

Andy-Jost added this to the cuda.core v0.7.0 milestone Mar 30, 2026
Andy-Jost added the bug, P1, and cuda.core labels Mar 30, 2026
Andy-Jost self-assigned this Mar 30, 2026
Andy-Jost requested review from cpcloud, leofang, mdboom, rparolin and rwgk and removed request for leofang March 30, 2026 20:10
Comment on lines +260 to +263
raise RuntimeError(
"Getting the current memory pool for a memory location and "
"allocation type requires CUDA 13.0 or later"
)
Contributor Author


Unrelated fix, but a needed improvement.
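The improved fallback message amounts to a version gate that names the unsupported operation. A minimal sketch, assuming the check is expressed against an integer CUDA version in the cuDriverGetVersion encoding (1000 * major + 10 * minor). The PR does not say whether the real check in _memory_pool.pyx happens at build time or at run time, and require_cuda13 and MIN_CUDA_VERSION are hypothetical names.

```python
# Hypothetical sketch: gate an operation that is only available in CUDA 13.0+.
# Versions use the cuDriverGetVersion encoding, 1000 * major + 10 * minor
# (e.g. 12040 for CUDA 12.4, 13000 for CUDA 13.0).
MIN_CUDA_VERSION = 13000


def require_cuda13(cuda_version: int) -> None:
    """Raise a descriptive RuntimeError when cuda_version predates CUDA 13.0."""
    if cuda_version < MIN_CUDA_VERSION:
        # Same wording as the improved message in MP_init_current_pool.
        raise RuntimeError(
            "Getting the current memory pool for a memory location and "
            "allocation type requires CUDA 13.0 or later"
        )
```

Naming the operation, rather than reporting only a version mismatch, tells the user which call needs the newer CUDA release.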

When ManagedMemoryResource() is called without options on a platform
where the default memory pool does not support managed allocations
(e.g. WSL2), the error from cuMemGetMemPool is now caught and
re-raised as a RuntimeError with actionable guidance.

Made-with: Cursor
Andy-Jost force-pushed the managed-mr-error-message branch from a2dc93d to a986306 on March 30, 2026 20:18



Development

Successfully merging this pull request may close these issues.

ManagedMemoryResource() without options fails when the default pool does not support managed allocations
