issue/1042 - fix address misaligned error of rearrange#1044
Conversation
|
服务启动代码 xmake build _infinilm && xmake install _infinilm python python/infinilm/server/inference_server.py 客户端启动代码 vllm bench serve --backend openai-chat --model 9g_8b_thinking |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/utils/rearrange.cc
Outdated
| namespace { | ||
| inline bool is_contiguous(const size_t *shape, const ptrdiff_t *strides_, size_t ndim) { | ||
| ptrdiff_t expected_stride = 1; | ||
| for (size_t i = ndim - 1; i >= 0; --i) { |
There was a problem hiding this comment.
The is_contiguous function has an infinite loop bug. The loop variable i is declared as size_t (an unsigned type), and the condition i >= 0 is always true. When i reaches 0 and is decremented with --i, it wraps around to SIZE_MAX due to unsigned underflow, causing an infinite loop (or out-of-bounds memory access). The loop counter should use a signed type (e.g., ptrdiff_t) to allow the termination condition to work correctly.
| for (size_t i = ndim - 1; i >= 0; --i) { | |
| if (ndim == 0) { | |
| return true; | |
| } | |
| for (ptrdiff_t i = static_cast<ptrdiff_t>(ndim) - 1; i >= 0; --i) { |
src/utils/rearrange.cc
Outdated
| } else { | ||
| break; | ||
|
|
||
| if (is_contiguous(shape, src_strides_, ndim) && is_contiguous(shape, dst_strides_, ndim)) { |
There was a problem hiding this comment.
The is_contiguous function is called at line 68 with ndim that has already been reassigned to dims.size() (at line 64), which is the number of dimensions after filtering out size-1 dimensions. However, the function receives the original shape and src_strides_/dst_strides_ pointer arrays (from the caller), which still have the original, unfiltered length. This mismatch means is_contiguous iterates over only the last dims.size() entries of the original arrays instead of all entries, potentially reporting a false contiguous result. The contiguity check should use the original ndim (before line 64 reassigns it) or iterate over the dims vector instead.
|
@copilot code review[agent] |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for (size_t candidate : candidates) { | ||
| if (current_unit % candidate == 0) { | ||
| for (const size_t &candidate : candidates) { | ||
| if ((current_unit % candidate == 0) && 0 == (dst_strides_0 & (candidate - 1)) && 0 == (src_strides_0 & (candidate - 1))) { |
There was a problem hiding this comment.
The alignment check only validates the first ([0]-th) element of dst_strides and src_strides. For correct alignment, all outer strides must be multiples of the chosen new_unit, not just the first one. When ndim > 1, intermediate strides at indices 1 through ndim-1 are not checked.
For example, if ndim == 2 and dst_strides == [40, 12] with candidate=8, then 40 & 7 == 0 passes the check but 12 & 7 == 4 != 0 means stride index 1 is not 8-byte aligned, leading to misaligned access.
The loop over candidates should check alignment for ALL strides (dst_strides()[0..ndim-1] and src_strides()[0..ndim-1]), not just index 0.
There was a problem hiding this comment.
有必要检查每一个维度的stride么
| const size_t ndim = this->ndim(); | ||
| const ptrdiff_t dst_strides_0 = _meta[2 + ndim]; | ||
| const ptrdiff_t src_strides_0 = _meta[2 + ndim + ndim]; |
There was a problem hiding this comment.
When ndim == 0 (all dimensions were merged into unit during RearrangeMeta::create()), _meta has only 2 elements (indices 0 and 1). In that case, _meta[2 + ndim] = _meta[2] and _meta[2 + ndim + ndim] = _meta[2] are out-of-bounds accesses, causing undefined behavior.
A guard should be added so that when ndim == 0, the alignment check on strides is skipped entirely (there are no strides to check). For example, the alignment checks should only be performed when ndim > 0, and when ndim == 0, only the current_unit % candidate == 0 condition needs to be satisfied.
There was a problem hiding this comment.
在这个函数中,ndim会永远大于0,不会存在ndim == 0的情况
|
resolves #1042 |
`import infinicore
aa = infinicore.empty((4, 3), dtype=infinicore.int64, device=infinicore.device("cuda"))
bb = infinicore.empty((4, 2), dtype=infinicore.int64, device=infinicore.device("cuda"))
cc = aa.narrow(1, 0, 2)
print(aa.shape, aa.stride())
print(bb.shape, bb.stride())
print(cc.shape, cc.stride())
bb.copy_(cc)
infinicore.sync_stream()`
上述样例代码会报错。
测试结果:
修改前

修改后

算子单测

服务推理结果
cuda_graph的推理不再出错