[lora] allow int64 values for LoRA ID to avoid overflow #1574
Signed-off-by: AlpinDale <alpindale@gmail.com>
Code Review
This pull request updates the dtype of request_lora_mapping from np.int32 to np.int64 in both gpu_input_batch.py and tpu_input_batch.py to prevent potential overflows with LoRA IDs. While this change is necessary to support larger IDs, it introduces a risk of type mismatch with downstream components like CUDA or TPU kernels that might still expect 32-bit integers. I've added critical comments highlighting the need to ensure all consumers of this array are updated to prevent potential data corruption or crashes.
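To illustrate the overflow this change guards against, here is a minimal NumPy sketch. The ID value is made up for illustration and does not come from the PR; the point is only that a value above the int32 maximum wraps when narrowed to 32 bits, while int64 keeps it intact.

```python
import numpy as np

# Hypothetical LoRA ID just above the int32 maximum (2**31 - 1).
lora_id = 2**31 + 5

# int64 holds the value exactly.
ids64 = np.array([lora_id], dtype=np.int64)

# astype uses unsafe casting by default, so narrowing to int32
# silently wraps the value instead of raising.
ids32 = ids64.astype(np.int32)

print(ids64[0])  # 2147483653
print(ids32[0])  # -2147483643
```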
  # lora related
- self.request_lora_mapping = np.zeros((self.max_num_reqs,), dtype=np.int32)
+ self.request_lora_mapping = np.zeros((self.max_num_reqs,), dtype=np.int64)
This change to np.int64 can cause a critical type mismatch. If downstream consumers of this array (e.g., CUDA kernels) still expect 32-bit integers, they may misinterpret the buffer's memory layout, leading to silent data corruption or crashes. It is crucial that all components using request_lora_mapping are also updated to handle 64-bit integers.
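A rough sketch of the failure mode described in this comment, assuming a consumer that reads the buffer as 4-byte elements without checking the dtype. The `view` call only stands in for such a consumer; it is not how any actual kernel in this codebase reads the array.

```python
import numpy as np

# Three request slots with small LoRA IDs, stored as little-endian int64.
mapping = np.array([1, 2, 3], dtype="<i8")

# Reinterpret the same bytes as little-endian int32, the way a consumer
# that still assumes 4-byte elements would see them.
misread = mapping.view("<i4")

print(misread.tolist())  # [1, 0, 2, 0, 3, 0]
```

The misread array has twice as many elements, and every other entry is the zero-filled upper half of a 64-bit value, so indices no longer line up with request slots.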
  # lora related
- self.request_lora_mapping = np.zeros((self.max_num_reqs,), dtype=np.int32)
+ self.request_lora_mapping = np.zeros((self.max_num_reqs,), dtype=np.int64)
This change to np.int64 can cause a critical type mismatch. If downstream consumers of this array (e.g., TPU kernels) still expect 32-bit integers, they may misinterpret the buffer's memory layout, leading to silent data corruption or crashes. It is crucial that all components using request_lora_mapping are also updated to handle 64-bit integers.
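If a consumer genuinely needs 32-bit indices, one option is to narrow explicitly at the boundary and fail loudly when an ID does not fit. The sketch below is only an illustration under that assumption; `to_int32_checked` is a hypothetical helper and is not part of this PR.

```python
import numpy as np

def to_int32_checked(mapping: np.ndarray) -> np.ndarray:
    """Narrow an integer array to int32, raising if any value would wrap."""
    info = np.iinfo(np.int32)
    # Check the range first, because astype would wrap silently.
    if mapping.size and (mapping.min() < info.min or mapping.max() > info.max):
        raise OverflowError("LoRA ID does not fit in int32")
    return mapping.astype(np.int32)

# Hypothetical mapping whose IDs all fit in 32 bits.
small = np.array([0, 7, 42], dtype=np.int64)
print(to_int32_checked(small).dtype)  # int32
```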