[Cherry-Pick][BugFix] Fix Async D2H copy bug & flash mash atten cache V out of bound bug(#7221)#7296

Merged
Jiang-Jia-Jun merged 1 commit into PaddlePaddle:release/2.6 from EmmonsCurse:cherry-pick/7221/release/2.6
Apr 10, 2026


@EmmonsCurse
Collaborator

Cherry-pick of #7221 (authored by @ming1753) to release/2.6.

devPR:#7221


Motivation

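  1. Async D2H copy bug: CPU data was read immediately after an asynchronous copy, so the read could observe a copy that had not finished yet, causing a data race.
  2. Flash Mask Attention out of bound: when seq_len is not an integer multiple of kBlockN, the shared-memory access for the last block goes out of bounds.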
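
The hazard in the first item can be illustrated with a host-only toy model. This is a sketch under stated assumptions, not the code touched by this PR: `FakeStream`, `copy_to_host`, and `first_after_copy` are hypothetical names, and the "stream" simply queues work until it is synchronized. On a real GPU the stale read is only possible rather than guaranteed, which is what makes the bug intermittent.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical toy model of a stream: queued work is only guaranteed to have
// run once synchronize() returns. This is NOT a CUDA or Paddle API.
class FakeStream {
public:
    void enqueue(std::function<void()> op) { pending_.push(std::move(op)); }

    // Run every queued operation in order.
    void synchronize() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

private:
    std::queue<std::function<void()>> pending_;
};

// Copy src into dst. With blocking == false the copy is only queued; with
// blocking == true the stream is drained before the call returns.
void copy_to_host(FakeStream& stream, const std::vector<int>& src,
                  std::vector<int>& dst, bool blocking) {
    stream.enqueue([&src, &dst] { dst = src; });
    if (blocking) {
        stream.synchronize();
    }
}

// Return the first element the host sees right after the copy call.
int first_after_copy(bool blocking) {
    FakeStream stream;
    const std::vector<int> device_like = {7, 8, 9};
    std::vector<int> host(device_like.size(), -1);  // -1 marks "not written yet"
    copy_to_host(stream, device_like, host, blocking);
    const int seen = host[0];  // stale if the copy is still pending
    stream.synchronize();      // drain before the locals go out of scope
    return seen;
}
```

In this model `first_after_copy(false)` returns the placeholder `-1`, while `first_after_copy(true)` returns `7`, because the blocking variant waits for the copy before the host reads the buffer.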

Modifications

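  1. Changed the last argument of copy_copy_to from false to true, turning the asynchronous copy into a synchronous one so that the data is complete before it is read:
  • get_block_shape_and_split_kv_block.cu: 4 call sites
  • pre_cache_len_concat.cu: 2 call sites
  2. Added boundary handling in mainloop_attn.hpp: when processing the last block, the part of shared memory beyond valid_k is zeroed.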
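
The boundary handling in the second item can be sketched on the host as follows. This is an illustration under assumptions rather than the kernel code in mainloop_attn.hpp: the tile is modeled as a row-major `block_n x head_dim` buffer, and `valid_rows_in_block` and `zero_tail_rows` are hypothetical helper names. Only `valid_k` and the block size (`kBlockN` in the PR) come from the description above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Number of valid rows in block `block_idx` when seq_len rows are split into
// chunks of block_n rows. Only the last block can be partially filled.
std::size_t valid_rows_in_block(std::size_t seq_len, std::size_t block_n,
                                std::size_t block_idx) {
    const std::size_t start = block_idx * block_n;
    if (start >= seq_len) {
        return 0;
    }
    return std::min(block_n, seq_len - start);
}

// Zero every row at index >= valid_k in a row-major tile of
// block_n x head_dim floats. Assumes tile.size() >= block_n * head_dim.
void zero_tail_rows(std::vector<float>& tile, std::size_t block_n,
                    std::size_t head_dim, std::size_t valid_k) {
    valid_k = std::min(valid_k, block_n);
    const auto first = tile.begin() + static_cast<std::ptrdiff_t>(valid_k * head_dim);
    const auto last = tile.begin() + static_cast<std::ptrdiff_t>(block_n * head_dim);
    std::fill(first, last, 0.0f);
}
```

For example, with `seq_len = 10` and `block_n = 4`, block 2 holds only 2 valid rows, so rows 2 and 3 of that tile are cleared. Zeroing the tail means leftover values in the padded rows cannot take part in later arithmetic on the tile.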

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
  • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented Apr 10, 2026

Thanks for your contribution!


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-10

📋 Review summary

PR overview: cherry-pick that fixes two GPU operator bugs, an asynchronous D2H copy race and a Flash Mask Attention shared-memory out-of-bounds access

Scope of changes: custom_ops/gpu_ops/

Impact tag: [OP]

📝 PR convention check

Typo in the title: "flash mash atten" should be "flash mask attention"

Suggested title (can be copied directly):

  • [Cherry-Pick][BugFix] Fix Async D2H copy bug & flash mask attention cache V out of bound bug(#7221)

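Issues

Severity | File | Summary
🟡 Suggestion | Codebase-wide issue | Many other files show the same pattern of reading CPU data immediately after an asynchronous copy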

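Overall assessment

Both bug fixes in this PR are logically correct:

  • Async D2H copy bug: the asynchronous copy is changed to a synchronous copy so that the data is complete before it is read; all relevant call sites have been updated correctly
  • Flash Mask Attention out of bound bug: the shared-memory boundary condition for the last block is handled correctly, zeroing the out-of-bounds part and using fence_view_async_shared() to ensure memory consistency

Systemic suggestion: the codebase contains many similar asynchronous-copy issues (20+ places). The following files show the same pattern:

  • step.cu:375
  • save_output_msg_with_topk.cc:48-52
  • step_system_cache.cu:160
  • step_reschedule.cu:262,264
  • multiple places under the speculate_decoding/ directory
  • and so on...

A follow-up systematic audit and fix of these potential data races is recommended.

No blocking issues found.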

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (release/2.6@dea9d35). Learn more about missing BASE report.

Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #7296   +/-   ##
==============================================
  Coverage               ?   74.21%           
==============================================
  Files                  ?      376           
  Lines                  ?    52915           
  Branches               ?     8255           
==============================================
  Hits                   ?    39270           
  Misses                 ?    10900           
  Partials               ?     2745           
Flag Coverage Δ
GPU 74.21% <ø> (?)


@Jiang-Jia-Jun Jiang-Jia-Jun merged commit dd0863b into PaddlePaddle:release/2.6 Apr 10, 2026
33 of 37 checks passed
@EmmonsCurse EmmonsCurse deleted the cherry-pick/7221/release/2.6 branch April 12, 2026 11:05

5 participants