[CI] Set high-risk OOM tests for sequential execution #7268
EmmonsCurse merged 2 commits into PaddlePaddle:develop from
/skip-ci ci_iluvatar
Thanks for your contribution!
fastdeploy-bot
left a comment
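🤖 AI Code Review
2026-04-09
📋 Review Summary
PR overview: Marks high-risk OOM tests as the multi_gpu type to force sequential execution, avoiding OOM failures caused by concurrency.
Scope of changes: scripts/coverage_run.sh, scripts/unittest_requirement.txt
Impact tag: [CI]
📝 PR Convention Check
✓ The title contains a valid [CI] tag, and the Motivation and Modifications sections are clearly described, in line with the conventions.
Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | scripts/unittest_requirement.txt:12 | The arctic_inference dependency is unrelated to the topic of this CI change |
| 🟡 Suggestion | scripts/unittest_requirement.txt:12 | The dependency version does not match the version suggested in the code |
| ❓ Question | scripts/coverage_run.sh:45 | Hardcoding the test files is hard to maintain |
Overall Assessment
The test classification logic is sound and effectively avoids the problems caused by running high-risk OOM tests concurrently. However, the newly added arctic_inference dependency is unrelated to this CI change; please confirm whether it should be included in this PR.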
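The difference between the two execution modes can be sketched as follows. This is an illustration only and is not taken from scripts/coverage_run.sh: `run_one_test`, `run_parallel`, `run_sequential`, and the test paths are hypothetical names, and `run_one_test` merely echoes where a real runner would start the test.

```shell
#!/usr/bin/env bash
# Sketch only: all function names and test paths here are hypothetical.

# Stand-in for the real test command.
run_one_test() {
    echo "ran $1"
}

# Start every test in the group as a background job, then wait for all of them.
# Memory-hungry tests launched this way can overlap and exhaust GPU memory.
run_parallel() {
    local t
    for t in "$@"; do
        run_one_test "$t" &
    done
    wait
}

# Run the tests one at a time, so only one of them holds GPU memory at any moment.
run_sequential() {
    local t
    for t in "$@"; do
        run_one_test "$t"
    done
}

run_parallel tests/test_small_a.py tests/test_small_b.py
run_sequential tests/test_big_a.py tests/test_big_b.py
```

Sequential execution trades wall-clock time for predictable peak memory, which is why it would be reserved for the tests known to be at risk.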
aistudio_sdk==0.3.5
pandas
use_triton_in_paddle
arctic_inference @ https://paddle-qa.bj.bcebos.com/ernie/arctic_inference-0.1.3-cp310-cp310-linux_x86_64.whl
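🟡 Suggestion: The arctic_inference dependency is unrelated to the topic of this CI change.
This PR optimizes the test execution strategy and should not introduce a new runtime dependency. Please confirm whether this dependency should be submitted separately or removed from this PR.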
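🟡 Suggestion: Inconsistent dependency version.
The code at fastdeploy/spec_decode/suffix.py:46 suggests installing arctic-inference==0.1.2, but this PR introduces version 0.1.3. Please confirm that the versions are consistent.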
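A mismatch like this can be detected mechanically. The sketch below is a hypothetical check, not part of the PR: `wheel_version` is an illustrative helper that relies on the standard wheel filename layout (`<name>-<version>-<python tag>-<abi tag>-<platform tag>.whl`), and the two values are copied from the requirement line and the install hint quoted in this comment.

```shell
#!/usr/bin/env bash
# Sketch only: wheel_version is a hypothetical helper.

# Extract the version field from a wheel URL or filename.
wheel_version() {
    local wheel_name="${1##*/}"     # strip everything up to the last "/"
    local rest="${wheel_name#*-}"   # drop the distribution name and the first "-"
    echo "${rest%%-*}"              # keep the text up to the next "-"
}

requirement_url="https://paddle-qa.bj.bcebos.com/ernie/arctic_inference-0.1.3-cp310-cp310-linux_x86_64.whl"
hinted_version="0.1.2"  # version named in the install hint quoted by the reviewer

found_version="$(wheel_version "$requirement_url")"
if [ "$found_version" != "$hinted_version" ]; then
    echo "version mismatch: requirement has $found_version, hint says $hinted_version"
fi
# prints: version mismatch: requirement has 0.1.3, hint says 0.1.2
```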
fi
fi

# Rule 5: high-risk OOM tests (treat as multi_gpu for sequential execution)
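❓ Question: Hardcoded test files are hard to maintain.
Nine test files are hardcoded in Rule 5, so adding a high-risk test later requires editing the script by hand. Consider a more flexible approach, such as a configuration file, an environment variable, or a test decorator marker.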
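One possible shape for the configuration-file approach is sketched below. Everything in it is an assumption made for illustration: the list file `high_risk_oom_tests.txt`, the `HIGH_RISK_OOM_LIST` variable, the `classify_test` function, and the `single_gpu` label do not come from scripts/coverage_run.sh, whose actual rules are not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: file, variable, and function names are hypothetical.

# Path to a plain-text file with one high-risk test path per line.
# It can be overridden through the environment.
HIGH_RISK_OOM_LIST="${HIGH_RISK_OOM_LIST:-high_risk_oom_tests.txt}"

# Print "multi_gpu" when the test path is listed in the file, else "single_gpu".
classify_test() {
    local test_file="$1"
    # -F: fixed string, -x: match the whole line, -q: no output
    if [ -f "$HIGH_RISK_OOM_LIST" ] && grep -Fxq -- "$test_file" "$HIGH_RISK_OOM_LIST"; then
        echo "multi_gpu"
    else
        echo "single_gpu"
    fi
}
```

With this layout, adding a high-risk test means appending one line to the list file rather than editing the script.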
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7268 +/- ##
==========================================
Coverage ? 74.16%
==========================================
Files ? 383
Lines ? 53560
Branches ? 8394
==========================================
Hits ? 39722
Misses ? 11149
Partials ? 2689
Motivation
Under single-GPU parallel execution, some test cases have high memory consumption (e.g., large batch sizes, complex operators, or multi-stage inference). When executed concurrently, these tests may trigger OOM or be killed by the system, leading to flaky CI failures and increased debugging cost.
Modifications
Treat high-risk OOM test cases as the multi_gpu execution type so that they run sequentially.
Usage or Command
N/A
Accuracy Tests
N/A
Checklist
[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
For the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.