
issue/1042 - fix address misaligned error of rearrange #1044

Merged
wooway777 merged 1 commit into main from issue/1042 on Mar 4, 2026


pengcheng888 (Collaborator) commented Mar 2, 2026

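```python
import infinicore

# Contiguous (4, 3) source and (4, 2) destination, both int64 on CUDA.
aa = infinicore.empty((4, 3), dtype=infinicore.int64, device=infinicore.device("cuda"))
bb = infinicore.empty((4, 2), dtype=infinicore.int64, device=infinicore.device("cuda"))
# Non-contiguous view: the first 2 columns of aa (the row stride stays 3 elements).
cc = aa.narrow(1, 0, 2)

print(aa.shape, aa.stride())
print(bb.shape, bb.stride())
print(cc.shape, cc.stride())

# Copying the strided view into the contiguous tensor is what failed before the fix.
bb.copy_(cc)

infinicore.sync_stream()
```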

The sample code above fails with a misaligned-address error.
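To see why this particular view can trip an alignment fault, here is a small C++ sketch of the byte arithmetic for `cc`. It assumes (the PR does not spell this out) that rearrange merges the contiguous inner run of each row into one access unit and may then choose a wider access width; `widest_safe_unit` is a hypothetical helper for illustration, not an InfiniCore API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of InfiniCore): returns the widest access
// width in bytes from `candidates` that divides both the contiguous inner
// block and the outer row stride, so that every row start stays aligned.
// `candidates` is expected to be sorted from widest to narrowest.
std::size_t widest_safe_unit(std::size_t inner_bytes,
                             std::ptrdiff_t row_stride_bytes,
                             const std::vector<std::size_t> &candidates) {
    for (std::size_t c : candidates) {
        const auto sc = static_cast<std::ptrdiff_t>(c);
        if (inner_bytes % c == 0 && row_stride_bytes % sc == 0) {
            return c;
        }
    }
    return 1;  // single-byte access is always aligned
}

// Layout of cc = aa.narrow(1, 0, 2) for a (4, 3) int64 tensor:
// shape (4, 2), element strides (3, 1).
constexpr std::size_t kElem = sizeof(std::int64_t);   // 8 bytes
constexpr std::size_t kInnerBytes = 2 * kElem;         // 16 contiguous bytes per row
constexpr std::ptrdiff_t kRowStrideBytes =
    static_cast<std::ptrdiff_t>(3 * kElem);            // 24 bytes between row starts
```

Looking only at the inner block, a 16-byte access seems legal, but row 1 starts 24 bytes after row 0, so (with a 16-byte-aligned base) that access would land on an address that is only 8-byte aligned; the helper therefore falls back to 8 bytes.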

Test results:

Before the fix:
Screenshot from 2026-03-02 20-01-11

After the fix:
Screenshot from 2026-03-02 20-01-57

Operator unit tests:
Screenshot from 2026-03-04 14-20-25

Service inference results:
Inference with cuda_graph no longer fails.

Screenshot from 2026-03-02 21-20-59 Screenshot from 2026-03-04 16-43-08 Screenshot from 2026-03-04 16-59-49

pengcheng888 (Collaborator, Author) commented Mar 2, 2026

Server launch command:

```shell
export CUDA_VISIBLE_DEVICES=1

xmake build _infinilm && xmake install _infinilm

python python/infinilm/server/inference_server.py \
    --nvidia \
    --model_path=/data-aisoft/mechdancer/models/9g_8b_thinking_llama/ \
    --max_batch_size 6 \
    --num_blocks 8192 \
    --tp 1 \
    --temperature 1.0 \
    --block_size 16 \
    --top_p 0.8 \
    --top_k 1 \
    --port 8102 \
    --max_tokens 256 \
    --enable-graph \
    --cache_type paged
```

Client launch command:

```shell
export CUDA_VISIBLE_DEVICES=1
unset http_proxy https_proxy all_proxy ALL_PROXY

vllm bench serve --backend openai-chat --model 9g_8b_thinking \
    --endpoint /chat/completions \
    --tokenizer /data-aisoft/zenghua/models/9g_8b_thinking_llama \
    --max-concurrency 16 \
    --request_rate 10.0 \
    --port 8102 \
    --seed $(date +%s) \
    --ready-check-timeout-sec 3000 \
    --num-prompts 64 \
    --random-input-len 16 \
    --extra-body '{"max_tokens": 64}'
```

wooway777 requested a review from Copilot on March 3, 2026 03:29
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.



```cpp
namespace {
inline bool is_contiguous(const size_t *shape, const ptrdiff_t *strides_, size_t ndim) {
    ptrdiff_t expected_stride = 1;
    for (size_t i = ndim - 1; i >= 0; --i) {
```
Copilot AI Mar 3, 2026

The is_contiguous function has an infinite loop bug. The loop variable i is declared as size_t (an unsigned type), and the condition i >= 0 is always true. When i reaches 0 and is decremented with --i, it wraps around to SIZE_MAX due to unsigned underflow, causing an infinite loop (or out-of-bounds memory access). The loop counter should use a signed type (e.g., ptrdiff_t) to allow the termination condition to work correctly.

Suggested change

```diff
-for (size_t i = ndim - 1; i >= 0; --i) {
+if (ndim == 0) {
+    return true;
+}
+for (ptrdiff_t i = static_cast<ptrdiff_t>(ndim) - 1; i >= 0; --i) {
```

Copilot uses AI. Check for mistakes.
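An equivalent fix that keeps the counter unsigned is to test and decrement in the loop condition (`i-- > 0`), which visits `ndim - 1` down to `0` and stops cleanly, including when `ndim == 0`. The sketch below assumes strides are counted in elements and treats size-1 dimensions as contiguous regardless of their stride (a common convention, not confirmed by the diff); it is not the code that was merged.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a row-major contiguity check. Assumptions: strides are in
// elements, and size-1 dimensions may carry any stride.
inline bool is_contiguous(const std::size_t *shape, const std::ptrdiff_t *strides,
                          std::size_t ndim) {
    std::ptrdiff_t expected_stride = 1;
    // `i-- > 0` compares first, then decrements, so the body sees
    // ndim-1, ..., 0 and the loop never relies on `size_t >= 0`.
    for (std::size_t i = ndim; i-- > 0;) {
        if (shape[i] != 1 && strides[i] != expected_stride) {
            return false;
        }
        expected_stride *= static_cast<std::ptrdiff_t>(shape[i]);
    }
    return true;  // ndim == 0: the loop body never runs
}
```

For the narrowed tensor from the PR description (shape (4, 2), strides (3, 1)), this returns false, while a packed (4, 2) tensor with strides (2, 1) returns true.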
```cpp
    } else {
        break;

if (is_contiguous(shape, src_strides_, ndim) && is_contiguous(shape, dst_strides_, ndim)) {
```
Copilot AI Mar 3, 2026

The is_contiguous function is called at line 68 with ndim that has already been reassigned to dims.size() (at line 64), which is the number of dimensions after filtering out size-1 dimensions. However, the function receives the original shape and src_strides_/dst_strides_ pointer arrays from the caller, which still have the original, unfiltered length. Because of this mismatch, is_contiguous iterates over only the first dims.size() entries of the original arrays instead of all of them, and can report a false contiguous result. The contiguity check should either use the original ndim (before line 64 reassigns it) or iterate over the dims vector instead.
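One way to remove the mismatch is to keep the filtered dimensions in a single container and run the contiguity test on that container, so the dimension count and the data always come from the same place. `Dim`, `drop_unit_dims`, and `both_contiguous` are hypothetical names for illustration, and strides are again assumed to be in elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-dimension record; the real RearrangeMeta layout differs.
struct Dim {
    std::size_t len;
    std::ptrdiff_t dst_stride;
    std::ptrdiff_t src_stride;
};

// Drop size-1 dimensions, which never change the addressed element.
std::vector<Dim> drop_unit_dims(const std::size_t *shape, const std::ptrdiff_t *dst,
                                const std::ptrdiff_t *src, std::size_t ndim) {
    std::vector<Dim> dims;
    for (std::size_t i = 0; i < ndim; ++i) {
        if (shape[i] != 1) {
            dims.push_back({shape[i], dst[i], src[i]});
        }
    }
    return dims;
}

// Row-major contiguity of both sides, checked on the filtered list itself.
bool both_contiguous(const std::vector<Dim> &dims) {
    std::ptrdiff_t expected = 1;
    for (auto it = dims.rbegin(); it != dims.rend(); ++it) {
        if (it->dst_stride != expected || it->src_stride != expected) {
            return false;
        }
        expected *= static_cast<std::ptrdiff_t>(it->len);
    }
    return true;
}
```

For shape (2, 1, 4) with strides (1, 1, 5) on both sides, the filtered dims are (2, 4) with strides (1, 5), which this check rejects; a three-argument check given the original arrays and a count of 2 would only inspect shape (2, 1) with strides (1, 1) and wrongly report the tensor as contiguous.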

pengcheng888 (Collaborator, Author) commented Mar 4, 2026

@copilot code review[agent]

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.



```diff
-for (size_t candidate : candidates) {
-    if (current_unit % candidate == 0) {
+for (const size_t &candidate : candidates) {
+    if ((current_unit % candidate == 0) && 0 == (dst_strides_0 & (candidate - 1)) && 0 == (src_strides_0 & (candidate - 1))) {
```
Copilot AI Mar 4, 2026

The alignment check only validates the first ([0]-th) element of dst_strides and src_strides. For correct alignment, all outer strides must be multiples of the chosen new_unit, not just the first one. When ndim > 1, intermediate strides at indices 1 through ndim-1 are not checked.

For example, if ndim == 2 and dst_strides == [40, 12] with candidate=8, then 40 & 7 == 0 passes the check but 12 & 7 == 4 != 0 means stride index 1 is not 8-byte aligned, leading to misaligned access.

The loop over candidates should check alignment for ALL strides (dst_strides()[0..ndim-1] and src_strides()[0..ndim-1]), not just index 0.
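A sketch of the stricter rule this comment proposes: a candidate width is accepted only if it divides the unit and every dst and src stride. It assumes strides are in bytes and that candidates are powers of two (which the `& (candidate - 1)` test in the diff already requires); `pick_unit` is a hypothetical name, not the merged code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pick the widest power-of-two access unit that divides the contiguous unit
// AND every outer stride on both sides. Assumes byte strides and candidates
// sorted from widest to narrowest; for power-of-two c and non-negative x,
// `x & (c - 1)` equals `x % c`.
std::size_t pick_unit(std::size_t current_unit,
                      const std::vector<std::ptrdiff_t> &dst_strides,
                      const std::vector<std::ptrdiff_t> &src_strides,
                      const std::vector<std::size_t> &candidates) {
    for (std::size_t c : candidates) {
        const auto mask = static_cast<std::ptrdiff_t>(c - 1);
        bool ok = current_unit % c == 0;
        for (std::ptrdiff_t s : dst_strides) ok = ok && (s & mask) == 0;
        for (std::ptrdiff_t s : src_strides) ok = ok && (s & mask) == 0;
        if (ok) return c;
    }
    return 1;  // byte-wise access needs no alignment
}
```

With the example strides [40, 12] and a 16-byte unit, checking only the first stride accepts 8, while checking every stride falls back to 4.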

pengcheng888 (Collaborator, Author) replied:

Is it really necessary to check the stride of every dimension?

Comment on lines +147 to +149

```cpp
const size_t ndim = this->ndim();
const ptrdiff_t dst_strides_0 = _meta[2 + ndim];
const ptrdiff_t src_strides_0 = _meta[2 + ndim + ndim];
```
Copilot AI Mar 4, 2026

When ndim == 0 (all dimensions were merged into unit during RearrangeMeta::create()), _meta has only 2 elements (indices 0 and 1). In that case, _meta[2 + ndim] = _meta[2] and _meta[2 + ndim + ndim] = _meta[2] are out-of-bounds accesses, causing undefined behavior.

A guard should be added so that when ndim == 0, the alignment check on strides is skipped entirely (there are no strides to check). For example, the alignment checks should only be performed when ndim > 0, and when ndim == 0, only the current_unit % candidate == 0 condition needs to be satisfied.
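If such a guard were wanted anyway, it could look like the sketch below. The `_meta` layout is inferred only from the indices in the snippet and from this comment (two header slots followed by three length-`ndim` arrays, with dst strides starting at `2 + ndim` and src strides at `2 + 2 * ndim`); `MetaView` and `unit_ok` are hypothetical, and the header values used with them are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the layout implied by the snippet: meta[0..1] are
// header slots, then three length-ndim arrays; dst strides start at
// 2 + ndim and src strides at 2 + 2 * ndim. The real RearrangeMeta may differ.
struct MetaView {
    std::vector<std::ptrdiff_t> meta;
    std::size_t ndim;
};

// True if the power-of-two `candidate` divides the unit and, when outer
// dimensions exist, the first dst/src strides (the merged check).
bool unit_ok(const MetaView &m, std::size_t current_unit, std::size_t candidate) {
    if (current_unit % candidate != 0) return false;
    if (m.ndim == 0) return true;  // no strides stored: never read meta[2]
    const auto mask = static_cast<std::ptrdiff_t>(candidate - 1);
    const std::ptrdiff_t dst0 = m.meta[2 + m.ndim];
    const std::ptrdiff_t src0 = m.meta[2 + 2 * m.ndim];
    return (dst0 & mask) == 0 && (src0 & mask) == 0;
}
```

As the reply below notes, the author considers `ndim == 0` unreachable in this function, so the guard only costs one branch if kept as a defensive check.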

pengcheng888 (Collaborator, Author) replied:

In this function, ndim is always greater than 0; the ndim == 0 case cannot occur.

wooway777 (Collaborator) commented:

resolves #1042

wooway777 merged commit a950314 into main on Mar 4, 2026
14 checks passed
wooway777 deleted the issue/1042 branch on March 4, 2026 09:07

Development

Successfully merging this pull request may close these issues.

[BUG] rearrange fails on some non-contiguous cases: misaligned memory address

3 participants