use pinned device_ptr to init cpu cache tensor #1287
Code Review
This pull request modifies the CPU cache creation logic to use device pointers obtained via cudaHostGetDevicePointer and introduces asynchronous registration. However, several issues were identified: a busy-wait loop in creator.py undermines the non-blocking parameter and is redundant when blocking; an assertion comparing host and device pointers is likely to fail and should be removed; and the device pointer is exposed before all memory chunks are fully registered, which could lead to unsafe memory access.
| while attach_handle.device_ptr is None: | ||
| time.sleep(0.01) |
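One way to avoid the polling loop above is to have the handle signal readiness, and to publish `device_ptr` only after every chunk has been registered. The following is a minimal sketch under stated assumptions, not the PR's implementation: `AttachHandle`, `publish`, `fail`, `wait_device_ptr`, and `register_async` are hypothetical names, and `register_chunk` stands in for whatever per-chunk registration call the real code performs.

```python
import threading
from typing import Callable, Optional, Sequence


class AttachHandle:
    """Hypothetical handle: device_ptr becomes visible only once registration is done."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self._device_ptr: Optional[int] = None
        self._error: Optional[Exception] = None

    def publish(self, device_ptr: int) -> None:
        # Store the pointer first, then wake any waiters.
        self._device_ptr = device_ptr
        self._ready.set()

    def fail(self, exc: Exception) -> None:
        # Record the failure and wake waiters so they do not block forever.
        self._error = exc
        self._ready.set()

    @property
    def ready(self) -> bool:
        # Non-blocking check for callers that must not wait.
        return self._ready.is_set()

    def wait_device_ptr(self, timeout: Optional[float] = None) -> int:
        # Blocks on the event instead of sleeping in a loop.
        if not self._ready.wait(timeout):
            raise TimeoutError("registration did not finish in time")
        if self._error is not None:
            raise self._error
        assert self._device_ptr is not None
        return self._device_ptr


def register_async(
    handle: AttachHandle,
    chunks: Sequence[int],
    register_chunk: Callable[[int], None],  # assumed per-chunk registration call
    base_device_ptr: int,
) -> threading.Thread:
    """Register all chunks in the background, then publish the pointer."""

    def worker() -> None:
        try:
            for chunk in chunks:
                register_chunk(chunk)
            # Publish only after the last chunk is registered.
            handle.publish(base_device_ptr)
        except Exception as exc:
            handle.fail(exc)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

With this shape, a blocking caller uses `handle.wait_device_ptr()`, while a non-blocking caller checks `handle.ready` and returns immediately if registration is still in progress.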
| cpu_cache_tensor = self._build_tensor_view(shm_ptr=attach_handle.device_ptr) | ||
| assert shm_ptr == cpu_cache_tensor.data_ptr() |
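The assertion `assert shm_ptr == cpu_cache_tensor.data_ptr()` (which in effect checks `shm_ptr == attach_handle.device_ptr`) contradicts the purpose of using `device_ptr`. If the device pointer differs from the host pointer, which is the reason for calling `cudaHostGetDevicePointer`, this assertion will crash the program. If the two are always identical, there is no need to switch to `device_ptr`. Since a similar assertion has already been removed in `kv_cache_utils.py`, I suggest removing this line to support systems where the pointers differ.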
| cpu_cache_tensor = self._build_tensor_view(shm_ptr=attach_handle.device_ptr) | |
| assert shm_ptr == cpu_cache_tensor.data_ptr() | |
| cpu_cache_tensor = self._build_tensor_view(shm_ptr=attach_handle.device_ptr) |
| if handle.device_ptr is None: | ||
| # Fetch the pointer objects up front rather than after the wait, to avoid an overly long blocking wait. | ||
| device_ptr = ctypes.c_void_p() | ||
| host_ptr = ctypes.c_void_p(shm_ptr) | ||
| res = cuda.cudaHostGetDevicePointer(ctypes.byref(device_ptr), host_ptr, 0) | ||
| if res != 0: | ||
| raise Exception(f"cudaHostGetDevicePointer failed with error code {res}") | ||
|
|
||
| logger.info(f"cudaHostGetDevicePointer success, host_ptr={host_ptr.value}, device_ptr={device_ptr.value}") | ||
| handle.device_ptr = device_ptr.value |
No description provided.