[Cherry-Pick][CI] Sync dev optimizations to 2.6 (#7602) #7610
Thanks for your contribution!
/skip-ci ci_iluvatar
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## release/2.6 #7610 +/- ##
==============================================
Coverage ? 73.06%
==============================================
Files ? 376
Lines ? 53401
Branches ? 8352
==============================================
Hits ? 39019
Misses ? 11631
Partials ? 2751
PaddlePaddle-bot left a comment
🤖 AI Code Review | 2026-04-24 17:44:34
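📋 Review Summary
PR overview: Cherry-picks CI optimizations to release/2.6, including a git diff fix, an OOM retry mechanism, pytest failure log collection, and noise reduction for the error grep
Scope of changes: CI scripts (scripts/), test configuration (tests/conftest.py)
Impact tag: [CI]
Issues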
| Level | File | Summary |
|---|---|---|
| ❓ Question | tests/conftest.py:152 | pytest_runtest_makereport only captures failures in the call phase; failures in the setup/teardown phases are not written to the error log |
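Change-by-change notes
scripts/check_approval.sh: Changes git diff upstream/$BRANCH to git diff --merge-base upstream/$BRANCH, which is equivalent to diffing against $(git merge-base HEAD upstream/$BRANCH) and avoids flagging unrelated changes when the local branch has not been strictly rebased. The fix is correct, and all 4 call sites are updated consistently ✓
scripts/coverage_run.sh: Adds a retry mechanism for exit code 137 (SIGKILL/OOM Killer), with at most 3 retries (at most 4 executions in total). Non-137 errors and timeouts (exit 124) do not trigger a retry. The logic is sound ✓
scripts/run_golang_router.sh / run_gpu_4cards.sh / run_pre_ce.sh: Adds an --exclude="pytest_*_error.log" filter to the grep -Rni error command, so that the newly added pytest error log files in the CI logs are not scanned again and do not produce noise ✓
tests/conftest.py: Refactoring (imports moved to the top of the file, docstrings added) plus a new pytest_runtest_makereport hook that writes the full traceback of failed tests to $FD_LOG_DIR/pytest_<case_name>_error.log, to ease later CI debugging ✓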
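The retry rule described for scripts/coverage_run.sh can be illustrated with a small sketch. This is not the script from the PR, which is a shell script; it is a Python rendering of the same rule (retry only on 137, at most 3 retries, never on 124 or other codes). The names `run_with_oom_retry`, `run_command`, and `OOM_EXIT_CODES` are hypothetical.

```python
import subprocess

# 137 = 128 + 9 (SIGKILL) as reported by a shell. A child started directly
# from Python and killed by SIGKILL reports -9 instead, so this sketch
# treats both values as "probably OOM-killed" (an assumption of the sketch).
OOM_EXIT_CODES = {137, -9}
MAX_RETRIES = 3  # at most 4 executions in total


def run_with_oom_retry(run_once, max_retries=MAX_RETRIES):
    """Call run_once() until it returns a non-OOM exit code or retries run out.

    run_once is any zero-argument callable returning an exit code.
    Returns (last_exit_code, number_of_executions).
    """
    attempts = 0
    while True:
        attempts += 1
        code = run_once()
        # Timeouts (124) and ordinary failures are returned immediately.
        if code not in OOM_EXIT_CODES or attempts > max_retries:
            return code, attempts


def run_command(cmd):
    """Hypothetical helper: run a command list and return its exit code."""
    return subprocess.run(cmd).returncode
```

A caller could wrap a test command as `run_with_oom_retry(lambda: run_command(["pytest", "tests/"]))`; passing a callable rather than a command keeps the retry rule easy to check in isolation.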
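Overall assessment
The changes are clear and reasonable. The three optimizations work together: error log writing → grep filtering → OOM retry. Both CI stability and debuggability improve overall. No blocking issues.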
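To make the grep filtering step concrete, the sketch below mimics in Python how a glob such as "pytest_*_error.log" is matched against a file's base name, which is how grep's --exclude option selects files to skip during a recursive search. The function name `files_to_scan` and the sample paths are hypothetical; the CI scripts themselves call grep directly.

```python
import os
from fnmatch import fnmatchcase

EXCLUDE_GLOB = "pytest_*_error.log"


def files_to_scan(paths, exclude_glob=EXCLUDE_GLOB):
    """Return the paths whose base name does not match exclude_glob."""
    return [
        p for p in paths
        if not fnmatchcase(os.path.basename(p), exclude_glob)
    ]
```

With this filter, a log such as `log/pytest_test_x_error.log` is skipped while other log files in the same directory are still scanned for "error".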
outcome = yield
report = outcome.get_result()

if report.when == "call" and report.failed:
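❓ Question: This only captures failures in the call (test execution) phase; failures in the setup and teardown phases are not written to the error log file. If fixture initialization fails, the error log will not be saved. Please confirm whether this is the intended behavior, or whether the setup/teardown phases should be covered as well.
To cover all phases, this could be changed to:
if report.failed:  # covers all phases: setup/call/teardown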
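As a rough illustration of the suggestion, the logging step could be factored into a helper that does not look at `report.when` as a condition and only records it. This is a sketch, not the code in tests/conftest.py: the helper names `error_log_path` and `write_failure_log` and the way the case name is sanitised are assumptions. It relies only on the `failed`, `when`, `nodeid`, and `longreprtext` attributes of a pytest test report.

```python
import os
import re


def error_log_path(log_dir, nodeid):
    """Build <log_dir>/pytest_<case_name>_error.log.

    Deriving the case name by replacing unsafe characters in the node id
    is an assumption of this sketch.
    """
    case_name = re.sub(r"[^A-Za-z0-9_.-]+", "_", nodeid).strip("_")
    return os.path.join(log_dir, f"pytest_{case_name}_error.log")


def write_failure_log(report, log_dir):
    """Append the traceback of any failed phase to the case's error log.

    Returns the log path, or None when the phase did not fail.
    """
    if not report.failed:  # setup, call and teardown are all covered
        return None
    path = error_log_path(log_dir, report.nodeid)
    os.makedirs(log_dir, exist_ok=True)
    # Append, so a call failure and a later teardown failure share one file.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"[{report.when}] {report.nodeid}\n{report.longreprtext}\n")
    return path
```

Inside the hook, after `report = outcome.get_result()`, one could then call `write_failure_log(report, os.getenv("FD_LOG_DIR", "."))`; the fallback directory is likewise an assumption. Appending rather than overwriting is a design choice that keeps failures from several phases of the same test together.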
Motivation
Modifications
Cherry-pick of #7602 #7601 #7405 and to release/2.6.
Usage or Command
N/A
Accuracy Tests
N/A
Checklist
[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
pre-commit before commit.
release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.