[CUDAGraph] [FIX] Fix CUDA error(700): 'cudaErrorIllegalAddress' in CascadeAppendW… #4218
Suggestion: add the [CUDAGraph] tag to the PR title.
gongshaotian
left a comment
LGTM
const uint32_t qkv_bias = bias % hidden_size;
const uint32_t hi = qkv_bias / head_size;
const uint32_t h_bias = qkv_bias % head_size;
const uint32_t ori_bi = batch_id_per_token[token_idx];
Could this be changed to int?
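In general that is not appropriate, because uint32 can hold larger values than int32.
However, within this kernel ori_bi is not assigned to any other uint32 value afterwards, so it is fine here.
I'll change it and try.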
…riteCacheKVQKV cache_kernel(). Continue when batch_id_per_token[token_idx] is default value -1.
e09e898 to db403bd
Thanks for your contribution!
gongshaotian
left a comment
LGTM
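Fixes CUDA error(700): 'cudaErrorIllegalAddress' in cache_kernel.
The entries of int* batch_id_per_token default to -1.
In the line
const uint32_t ori_bi = batch_id_per_token[token_idx];
that -1 is converted to uint32_t and wraps around to 4294967295, which leads to the out-of-bounds error.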
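The wraparound and the guard described above can be illustrated with a minimal host-side C++ sketch. It is not the kernel code from this PR. The helper names `as_unsigned_index` and `count_processed_tokens` are hypothetical, and the loop only stands in for the per-token work, assuming (as the commit message says) that tokens whose batch id is the default -1 are skipped with `continue`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reproduces the implicit conversion from the original
// line `const uint32_t ori_bi = batch_id_per_token[token_idx];`.
// Converting -1 to uint32_t is well defined and yields 2^32 - 1.
inline uint32_t as_unsigned_index(int batch_id) {
    return static_cast<uint32_t>(batch_id);
}

// Hypothetical stand-in for the per-token loop: read the batch id as a
// signed int and skip tokens that still hold the default value -1.
inline int count_processed_tokens(const std::vector<int>& batch_id_per_token) {
    int processed = 0;
    for (std::size_t token_idx = 0; token_idx < batch_id_per_token.size(); ++token_idx) {
        const int ori_bi = batch_id_per_token[token_idx];
        assert(ori_bi >= -1);  // assumption for this sketch: -1 is the only negative value
        if (ori_bi == -1) {
            continue;  // default value: no batch assigned, nothing to write
        }
        ++processed;  // the real kernel would use ori_bi as an index here
    }
    return processed;
}
```

Reading the value as a signed `int` keeps the -1 sentinel recognizable, so the check can run before the value is ever used as an index.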