Background
PR #1775 lands ManagedBuffer with a property-style advice API (buf.read_mostly = ..., buf.preferred_location = ..., buf.accessed_by.add(...)) for the write side of managed-memory advice. The corresponding read side is uneven:
buf.preferred_location — exists, returns Device | Host | None.
buf.read_mostly — exists as a getter (queries CU_MEM_RANGE_ATTRIBUTE_READ_MOSTLY).
buf.accessed_by — exists as AccessedBySetProxy.
last_prefetch_location — missing.
Because the last-prefetch query isn't exposed, the test suite reaches into cuda.bindings.driver directly:
last = _get_int_attr(buf, driver.CUmem_range_attribute.CU_MEM_RANGE_ATTRIBUTE_LAST_PREFETCH_LOCATION)
As @leofang noted in a comment on PR #1775:
The fact that these are needed at test time rings a bell. cuda.core tries hard to not leak the abstraction. This highlights a problem that we do not expose enough mem-range attributes for ManagedBuffer.
PR #1775 currently works around this with a private _last_prefetch_location(buf) helper in tests/memory/test_managed_ops.py carrying a TODO that points at this issue.
Proposal
Add ManagedBuffer.last_prefetch_location mirroring preferred_location's shape:
    @property
    def last_prefetch_location(self) -> Device | Host | None:
        """Location of the most recent prefetch on this range, or ``None``
        if no prefetch has been issued.
        """
Returns:
Device(i) for i >= 0
Host() for the legacy -1 ordinal
None for the "no prefetch yet" sentinel
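The mapping above can be sketched as a small decoding function. This is a minimal sketch under stated assumptions, not the cuda.core implementation: `Device` and `Host` are stand-in dataclasses, `decode_last_prefetch_location` and `_NO_PREFETCH_ID` are hypothetical names, and the sentinel values (-1 for host, -2 for "no prefetch yet") are assumed to match the driver's `CU_DEVICE_CPU` and `CU_DEVICE_INVALID` ordinals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Stand-in for cuda.core's Device (illustration only)."""

    device_id: int


@dataclass(frozen=True)
class Host:
    """Stand-in for cuda.core's Host (illustration only)."""

    numa_id: int | None = None


# Assumed to mirror the driver's CU_DEVICE_CPU / CU_DEVICE_INVALID ordinals.
_HOST_LOCATION_ID = -1
_NO_PREFETCH_ID = -2  # hypothetical name for the "no prefetch yet" sentinel


def decode_last_prefetch_location(raw: int) -> Device | Host | None:
    """Map the raw legacy attribute value to Device, Host, or None."""
    if raw >= 0:
        return Device(raw)
    if raw == _HOST_LOCATION_ID:
        return Host()
    if raw == _NO_PREFETCH_ID:
        return None
    # Anything else is outside the three documented cases.
    raise ValueError(f"unexpected last-prefetch ordinal: {raw}")
```

Raising on an unknown ordinal, rather than silently returning `None`, keeps an unexpected driver value from being mistaken for "no prefetch has been issued".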
On CUDA 13, verify whether CU_MEM_RANGE_ATTRIBUTE_LAST_PREFETCH_LOCATION_TYPE / _ID exist; if they do, layer a v2 path the same way preferred_location does so Host(numa_id=N) round-trips. Otherwise document the legacy-attribute caveat consistently with preferred_location.
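If the `_TYPE` / `_ID` pair does turn out to be available, the v2 decode might look roughly like the sketch below. Everything here is an assumption for illustration: `LocationType` is a hypothetical local enum whose members and values are assumed to mirror the driver's location-type enum, `Device` and `Host` are stand-in dataclasses, and `decode_location_pair` is not an existing cuda.core function.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class LocationType(IntEnum):
    """Hypothetical mirror of the driver's location-type enum (values assumed)."""

    INVALID = 0
    DEVICE = 1
    HOST = 2
    HOST_NUMA = 3


@dataclass(frozen=True)
class Device:
    """Stand-in for cuda.core's Device (illustration only)."""

    device_id: int


@dataclass(frozen=True)
class Host:
    """Stand-in for cuda.core's Host (illustration only)."""

    numa_id: int | None = None


def decode_location_pair(loc_type: int, loc_id: int) -> Device | Host | None:
    """Combine a (type, id) attribute pair into Device, Host, or None."""
    kind = LocationType(loc_type)  # raises ValueError on an unknown type
    if kind is LocationType.DEVICE:
        return Device(loc_id)
    if kind is LocationType.HOST:
        return Host()
    if kind is LocationType.HOST_NUMA:
        # The id carries the NUMA node, which the legacy ordinal cannot express.
        return Host(numa_id=loc_id)
    return None  # INVALID: no prefetch location recorded
```

Unlike the legacy single-ordinal path, this form can distinguish `Host()` from `Host(numa_id=N)`, which is what lets a NUMA-specific prefetch target round-trip.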
Follow-on cleanup
Once this lands, in cuda_core/tests/memory/test_managed_ops.py:
- Drop the private _last_prefetch_location(buf) helper.
- Replace last == _HOST_LOCATION_ID / last == device.device_id assertions with buf.last_prefetch_location == Host() / ... == device.
- Drop the from cuda.core._memory._managed_buffer import _get_int_attr import and most driver.CUmem_range_attribute.* references.
Scope notes
CUmem_range_attribute (e.g., PAGE_ENGINE_LAST_GPU_USED and friends): file separately if needed.