issue/1042 - fix address misaligned error of rearrange by pengcheng888 · Pull Request #1044 · InfiniTensor/InfiniCore

pengcheng888 · 2026-03-02T13:02:46Z

`import infinicore

aa = infinicore.empty((4, 3), dtype=infinicore.int64, device=infinicore.device("cuda"))
bb = infinicore.empty((4, 2), dtype=infinicore.int64, device=infinicore.device("cuda"))
cc = aa.narrow(1, 0, 2)

print(aa.shape, aa.stride())
print(bb.shape, bb.stride())
print(cc.shape, cc.stride())

bb.copy_(cc)

infinicore.sync_stream()`

The sample code above raises an error.
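A minimal sketch of why this case can trip a misaligned-address fault. It rests on two assumptions that the PR does not state: that `narrow` keeps the parent tensor's strides (as PyTorch does), and that rearrange merges each contiguous 2-element int64 row of `cc` into one 16-byte unit. `row_byte_offset`, `is_aligned` and `all_rows_aligned` are hypothetical helpers, not InfiniCore APIs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration; not part of InfiniCore.

// Byte offset of the start of row `row`, given a row stride in elements.
constexpr std::size_t row_byte_offset(std::size_t row, std::size_t stride_elems,
                                      std::size_t elem_size) {
    return row * stride_elems * elem_size;
}

// True when `offset` is a multiple of `unit`; `unit` must be a power of two.
inline bool is_aligned(std::size_t offset, std::size_t unit) {
    assert(unit != 0 && (unit & (unit - 1)) == 0);
    return (offset & (unit - 1)) == 0;
}

// Assumed layout of cc = aa.narrow(1, 0, 2): shape (4, 2), element strides
// (3, 1), int64 (8 bytes). Each row is 16 contiguous bytes, but consecutive
// rows start 24 bytes apart.
inline bool all_rows_aligned(std::size_t unit) {
    for (std::size_t row = 0; row < 4; ++row) {
        if (!is_aligned(row_byte_offset(row, 3, 8), unit)) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions `all_rows_aligned(16)` is false, because row 1 starts at byte 24, while `all_rows_aligned(8)` is true. That is consistent with a fix that rejects a unit size whenever a stride is not a multiple of it.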

Test results:

Before the fix

After the fix

Operator unit tests

Service inference results
Inference with cuda_graph no longer fails.

pengcheng888 · 2026-03-02T13:22:14Z

Server launch command
`export CUDA_VISIBLE_DEVICES=1

xmake build _infinilm && xmake install _infinilm

python python/infinilm/server/inference_server.py \
    --nvidia \
    --model_path=/data-aisoft/mechdancer/models/9g_8b_thinking_llama/ \
    --max_batch_size 6 \
    --num_blocks 8192 \
    --tp 1 \
    --temperature 1.0 \
    --block_size 16 \
    --top_p 0.8 \
    --top_k 1 \
    --port 8102 \
    --max_tokens 256 \
    --enable-graph \
    --cache_type paged
`

Client launch command
`export CUDA_VISIBLE_DEVICES=1
unset http_proxy https_proxy all_proxy ALL_PROXY

vllm bench serve --backend openai-chat --model 9g_8b_thinking \
    --endpoint /chat/completions \
    --tokenizer /data-aisoft/zenghua/models/9g_8b_thinking_llama \
    --max-concurrency 16 \
    --request_rate 10.0 \
    --port 8102 \
    --seed $(date +%s) \
    --ready-check-timeout-sec 3000 \
    --num-prompts 64 \
    --random-input-len 16 \
    --extra-body '{"max_tokens": 64}'`

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


Copilot · 2026-03-03T07:40:55Z

src/utils/rearrange.cc

+namespace {
+inline bool is_contiguous(const size_t *shape, const ptrdiff_t *strides_, size_t ndim) {
+    ptrdiff_t expected_stride = 1;
+    for (size_t i = ndim - 1; i >= 0; --i) {


The is_contiguous function has an infinite loop bug. The loop variable i is declared as size_t (an unsigned type), and the condition i >= 0 is always true. When i reaches 0 and is decremented with --i, it wraps around to SIZE_MAX due to unsigned underflow, causing an infinite loop (or out-of-bounds memory access). The loop counter should use a signed type (e.g., ptrdiff_t) to allow the termination condition to work correctly.

Suggested change

-    for (size_t i = ndim - 1; i >= 0; --i) {
+    if (ndim == 0) {
+        return true;
+    }
+    for (ptrdiff_t i = static_cast<ptrdiff_t>(ndim) - 1; i >= 0; --i) {
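The rest of `is_contiguous` is not shown in this thread. The following is a minimal sketch of how the function might look with the suggested loop, assuming a row-major layout and strides measured in elements; the loop body is an assumption, not the code from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Sketch only: the loop body is assumed, not taken from the PR.
inline bool is_contiguous(const std::size_t *shape, const std::ptrdiff_t *strides,
                          std::size_t ndim) {
    if (ndim == 0) {
        return true; // a zero-dimensional tensor is trivially contiguous
    }
    assert(shape != nullptr && strides != nullptr);
    std::ptrdiff_t expected_stride = 1;
    // Signed counter: `i >= 0` becomes false after i == 0 instead of wrapping.
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(ndim) - 1; i >= 0; --i) {
        if (strides[i] != expected_stride) {
            return false;
        }
        expected_stride *= static_cast<std::ptrdiff_t>(shape[i]);
    }
    return true;
}
```

With this sketch, shape (4, 3) with strides (3, 1) is reported as contiguous, while the narrowed shape (4, 2) with strides (3, 1) is not.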

Copilot · 2026-03-03T07:40:56Z

src/utils/rearrange.cc

-        } else {
-            break;
+
+    if (is_contiguous(shape, src_strides_, ndim) && is_contiguous(shape, dst_strides_, ndim)) {


The is_contiguous function is called at line 68 with ndim that has already been reassigned to dims.size() (at line 64), which is the number of dimensions after filtering out size-1 dimensions. However, the function receives the original shape and src_strides_/dst_strides_ pointer arrays (from the caller), which still have the original, unfiltered length. This mismatch means is_contiguous iterates over only the last dims.size() entries of the original arrays instead of all entries, potentially reporting a false contiguous result. The contiguity check should use the original ndim (before line 64 reassigns it) or iterate over the dims vector instead.

pengcheng888 · 2026-03-04T06:26:40Z

@copilot code review[agent]

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


Copilot · 2026-03-04T06:55:42Z

src/utils/rearrange.cc

-    for (size_t candidate : candidates) {
-        if (current_unit % candidate == 0) {
+    for (const size_t &candidate : candidates) {
+        if ((current_unit % candidate == 0) && 0 == (dst_strides_0 & (candidate - 1)) && 0 == (src_strides_0 & (candidate - 1))) {


The alignment check only validates the first ([0]-th) element of dst_strides and src_strides. For correct alignment, all outer strides must be multiples of the chosen new_unit, not just the first one. When ndim > 1, intermediate strides at indices 1 through ndim-1 are not checked.

For example, if ndim == 2 and dst_strides == [40, 12] with candidate=8, then 40 & 7 == 0 passes the check but 12 & 7 == 4 != 0 means stride index 1 is not 8-byte aligned, leading to misaligned access.

The loop over candidates should check alignment for ALL strides (dst_strides()[0..ndim-1] and src_strides()[0..ndim-1]), not just index 0.

Is it necessary to check the stride of every dimension?
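To make the review comment's example concrete, here is a sketch of the all-strides variant of the check that it proposes. `strides_aligned` is a hypothetical helper, not the InfiniCore implementation, and it assumes strides are given in bytes and that every candidate is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the InfiniCore implementation.
// Returns true when every stride (in bytes) is a multiple of `candidate`.
inline bool strides_aligned(const std::vector<std::ptrdiff_t> &strides,
                            std::size_t candidate) {
    assert(candidate != 0 && (candidate & (candidate - 1)) == 0); // power of two
    const std::ptrdiff_t mask = static_cast<std::ptrdiff_t>(candidate) - 1;
    for (std::ptrdiff_t stride : strides) {
        if ((stride & mask) != 0) {
            return false;
        }
    }
    return true;
}
```

For the strides `[40, 12]` from the comment, `strides_aligned({40, 12}, 8)` is false because of the second entry, whereas checking index 0 alone, `strides_aligned({40}, 8)`, is true.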

Copilot · 2026-03-04T06:55:42Z

src/utils/rearrange.cc

+    const size_t ndim = this->ndim();
+    const ptrdiff_t dst_strides_0 = _meta[2 + ndim];
+    const ptrdiff_t src_strides_0 = _meta[2 + ndim + ndim];


When ndim == 0 (all dimensions were merged into unit during RearrangeMeta::create()), _meta has only 2 elements (indices 0 and 1). In that case, _meta[2 + ndim] = _meta[2] and _meta[2 + ndim + ndim] = _meta[2] are out-of-bounds accesses, causing undefined behavior.

A guard should be added so that when ndim == 0, the alignment check on strides is skipped entirely (there are no strides to check). For example, the alignment checks should only be performed when ndim > 0, and when ndim == 0, only the current_unit % candidate == 0 condition needs to be satisfied.

In this function, ndim is always greater than 0; the ndim == 0 case cannot occur.
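Should the `ndim == 0` case ever become reachable, the guard described in the review could look like the sketch below. `first_strides_allow` is a hypothetical function. The `_meta` layout it assumes (two header entries followed by three arrays of length `ndim`, the last two holding the dst and src strides) is inferred from the indices quoted above and is not confirmed by the source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch. The assumed layout is: 2 header entries, then three
// arrays of length ndim, the last two being the dst and src strides.
inline bool first_strides_allow(const std::vector<std::ptrdiff_t> &meta,
                                std::size_t ndim, std::size_t candidate) {
    assert(candidate != 0 && (candidate & (candidate - 1)) == 0); // power of two
    if (ndim == 0) {
        return true; // no strides stored, so there is nothing to check
    }
    assert(meta.size() >= 2 + 3 * ndim); // keeps both reads below in bounds
    const std::ptrdiff_t mask = static_cast<std::ptrdiff_t>(candidate) - 1;
    const std::ptrdiff_t dst_stride_0 = meta[2 + ndim];
    const std::ptrdiff_t src_stride_0 = meta[2 + ndim + ndim];
    return (dst_stride_0 & mask) == 0 && (src_stride_0 & mask) == 0;
}
```

The early return keeps the two indexed reads from ever touching a two-element `meta`; if `ndim` really is always positive in this function, the guard is simply never taken.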

wooway777 · 2026-03-04T09:07:11Z

resolves #1042

pengcheng888 requested review from a team and Ziminli March 2, 2026 13:02

pengcheng888 self-assigned this Mar 2, 2026

pengcheng888 added the 类型：BUG (Type: BUG) label Mar 2, 2026

pengcheng888 linked an issue Mar 2, 2026 that may be closed by this pull request

[BUG] rearrange fails in some non-contiguous cases: misaligned memory address #1042

Closed

pengcheng888 requested a review from wooway777 March 2, 2026 13:18

wooway777 requested a review from Copilot March 3, 2026 03:29

Copilot AI reviewed Mar 3, 2026


wooway777 requested a review from Copilot March 3, 2026 07:38

Copilot started reviewing on behalf of wooway777 March 3, 2026 07:38

Copilot AI reviewed Mar 3, 2026


issue/1042 - fix address misaligned error of rearrange

75a9b69

pengcheng888 force-pushed the issue/1042 branch from 16df801 to 75a9b69 Compare March 4, 2026 06:16

wooway777 requested a review from Copilot March 4, 2026 06:51

Copilot started reviewing on behalf of wooway777 March 4, 2026 06:51

Copilot AI reviewed Mar 4, 2026


wooway777 approved these changes Mar 4, 2026


wooway777 merged commit a950314 into main Mar 4, 2026
14 checks passed

wooway777 deleted the issue/1042 branch March 4, 2026 09:07

wooway777 merged 1 commit into main from issue/1042

3 participants
