use pinned device_ptr to init cpu cache tensor by hiworldwzj · Pull Request #1287 · ModelTC/LightLLM

hiworldwzj · 2026-05-06T02:47:55Z

No description provided.

gemini-code-assist

Code Review

This pull request modifies the CPU cache creation logic to use device pointers obtained via cudaHostGetDevicePointer and introduces asynchronous registration. However, several issues were identified: a busy-wait loop in creator.py undermines the non-blocking parameter and is redundant when blocking; an assertion comparing host and device pointers is likely to fail and should be removed; and the device pointer is exposed before all memory chunks are fully registered, which could lead to unsafe memory access.

gemini-code-assist · 2026-05-06T02:49:42Z

+            while attach_handle.device_ptr is None:
+                time.sleep(0.01)


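This busy-wait loop is redundant when pin_no_blocking is False, because attach_handle.wait() already guarantees that registration has finished (and device_ptr is assigned before tasks_finished is set). More importantly, when pin_no_blocking is True, the loop blocks the caller, which defeats the purpose of the parameter. In addition, if the loop is removed and pin_no_blocking is True, line 40 raises a TypeError because device_ptr is None. This shows that, under the current design, asynchronous pinning conflicts with the need to obtain device_ptr immediately.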
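One way to avoid the polling loop described in this comment is to have the handle signal readiness through an event. The following is a minimal sketch, not the PR's implementation: `AttachHandle`, `publish`, `get_device_ptr`, and `fake_register` are hypothetical names, and the registration work is replaced by a placeholder.

```python
import threading


class AttachHandle:
    """Hypothetical stand-in for the PR's attach handle."""

    def __init__(self):
        self.device_ptr = None
        self._ready = threading.Event()

    def publish(self, device_ptr):
        # Assign the pointer before signalling, so waiters never observe None.
        self.device_ptr = device_ptr
        self._ready.set()

    def get_device_ptr(self, timeout=None):
        # Blocks on the event instead of sleeping in a loop;
        # raises rather than handing back None.
        if not self._ready.wait(timeout):
            raise TimeoutError("device_ptr is not available yet")
        return self.device_ptr


def fake_register(handle, ptr):
    # Placeholder for the real registration done in a background thread.
    handle.publish(ptr)


handle = AttachHandle()
worker = threading.Thread(target=fake_register, args=(handle, 0x1000))
worker.start()
ptr = handle.get_device_ptr(timeout=5.0)
worker.join()
```

With this shape, a non-blocking caller can defer `get_device_ptr` until the tensor is actually needed, while a blocking caller calls it right away.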

gemini-code-assist · 2026-05-06T02:49:42Z

+            cpu_cache_tensor = self._build_tensor_view(shm_ptr=attach_handle.device_ptr)
            assert shm_ptr == cpu_cache_tensor.data_ptr()


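The assertion assert shm_ptr == cpu_cache_tensor.data_ptr() (which in effect checks shm_ptr == attach_handle.device_ptr) runs counter to the purpose of using device_ptr. If the device pointer differs from the host pointer (which is the reason for calling cudaHostGetDevicePointer), the assertion crashes the program. If the two are always equal, there is no need to switch to device_ptr. Given that a similar assertion has already been removed from kv_cache_utils.py, I suggest removing this line to support systems where the pointers differ.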

Suggested change

-            cpu_cache_tensor = self._build_tensor_view(shm_ptr=attach_handle.device_ptr)
-            assert shm_ptr == cpu_cache_tensor.data_ptr()
+            cpu_cache_tensor = self._build_tensor_view(shm_ptr=attach_handle.device_ptr)
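To illustrate why the assertion is fragile, the sketch below builds a view over raw memory and compares its base address with the pointer it was built from. It is only an analogy: a plain `ctypes` buffer stands in for pinned shared memory, `build_view` is a hypothetical stand-in for `_build_tensor_view`, and the 16-byte difference between the two pointers is invented for the demonstration (a real device pointer would come from `cudaHostGetDevicePointer`).

```python
import ctypes


def build_view(ptr, nbytes):
    # Hypothetical stand-in for _build_tensor_view: wrap raw memory at `ptr`.
    return (ctypes.c_ubyte * nbytes).from_address(ptr)


backing = ctypes.create_string_buffer(64)
host_ptr = ctypes.addressof(backing)
# Assumption for illustration only: pretend the device-visible address
# differs from the host address.
device_ptr = host_ptr + 16

view = build_view(device_ptr, 32)
view_ptr = ctypes.addressof(view)

# Holds whether or not host and device pointers coincide.
assert view_ptr == device_ptr
# Holds only on systems where both pointers are equal.
pointers_coincide = view_ptr == host_ptr
```

If a sanity check is still wanted, comparing against the pointer the view was built from keeps it valid on both kinds of systems.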

gemini-code-assist · 2026-05-06T02:49:42Z

+            if handle.device_ptr is None:
+                # Fetch the device pointer up front instead of after wait(), to avoid an overly long blocking wait.
+                device_ptr = ctypes.c_void_p()
+                host_ptr = ctypes.c_void_p(shm_ptr)
+                res = cuda.cudaHostGetDevicePointer(ctypes.byref(device_ptr), host_ptr, 0)
+                if res != 0:
+                    raise Exception(f"cudaHostGetDevicePointer failed with error code {res}")
+
+                logger.info(f"cudaHostGetDevicePointer success, host_ptr={host_ptr.value}, device_ptr={device_ptr.value}")
+                handle.device_ptr = device_ptr.value


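Setting handle.device_ptr after only the first chunk has been registered is unsafe. If the caller runs GPU operations on the whole tensor before the background thread has finished registering all chunks, it may access memory that is not yet pinned. On hardware with strict pinning requirements, this can lead to undefined behavior or a severe performance drop. I suggest exposing device_ptr only after all chunks are registered, or making sure the caller knows that the memory is only partially pinned at that point.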
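The first suggestion in this comment, publishing the pointer only after every chunk is registered, could look roughly like the sketch below. `PinHandle`, `register_chunks`, and `register_fn` are hypothetical names, the per-chunk pinning call is replaced by a callback, and the base address is reused as the device pointer purely for illustration.

```python
import threading


class PinHandle:
    """Hypothetical handle; not the class used in the PR."""

    def __init__(self):
        self.device_ptr = None
        self.tasks_finished = threading.Event()
        self.registered_chunks = 0


def register_chunks(handle, base_ptr, total_bytes, chunk_bytes, register_fn):
    # register_fn(ptr, size) stands in for the real per-chunk pinning call.
    offset = 0
    while offset < total_bytes:
        size = min(chunk_bytes, total_bytes - offset)
        register_fn(base_ptr + offset, size)
        handle.registered_chunks += 1
        offset += size
    # Publish the pointer only after every chunk has been registered.
    # Assumption: the real value would come from cudaHostGetDevicePointer.
    handle.device_ptr = base_ptr
    handle.tasks_finished.set()


seen = []
handle = PinHandle()
worker = threading.Thread(
    target=register_chunks,
    args=(handle, 0x10000, 10_000, 4096, lambda p, s: seen.append((p, s))),
)
worker.start()
handle.tasks_finished.wait()
worker.join()
```

Callers that read `device_ptr` after `tasks_finished` is set then never see a partially pinned region.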

use pinned device_ptr to init cpu cache tensor

c7c74ac

gemini-code-assist Bot reviewed May 6, 2026


hiworldwzj added 4 commits May 6, 2026 03:05

use pinned device_ptr to init cpu cache tensor

e1c3583

fix

34013d3

fix assert

fb4ed6f

fix

aeee71e

hiworldwzj merged commit 3d08cba into main May 7, 2026
1 check passed

hiworldwzj deleted the wzj_ptr branch May 7, 2026 02:51
