[Cherry-Pick][CI] Sync dev optimizations to 2.6 (#7335) #7343
EmmonsCurse merged 1 commit into PaddlePaddle:release/2.6
Thanks for your contribution!
/skip-ci ci_iluvatar
fastdeploy-bot left a comment
🤖 AI Code Review | 2026-04-11
📋 Review Summary
PR overview: Cherry-picks the CI optimizations from the dev branch into the release/2.6 branch.
Scope of changes: 13 GitHub workflow files and 1 test script.
Impact tag: [CI]
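Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | .github/workflows/_accuracy_test.yml:227 | The container cleanup logic has a gap: the workspace is not cleaned when the container has already stopped |
| 🟡 Suggestion | .github/workflows/_accuracy_test.yml:163 | The casing of --shm-size=64g is inconsistent with the other workflows |
| 🟡 Suggestion | scripts/run_pre_ce.sh:10 | Missing a detailed comment explaining why prebuilt wheel files are used |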
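Overall assessment
Overall, the CI optimizations in this PR are correct and improve container resource cleanup and error handling. However, there is one P0 container cleanup issue that needs to be fixed; it affects all 13 workflow files that add the cleanup step. Recommend merging once it is fixed.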
Diff context (.github/workflows/_accuracy_test.yml):
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}"
exit ${TEST_EXIT_CODE}
- name: Terminate and delete the container
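🔴 Bug: Gap in the container cleanup logic
When the container has already stopped (for example, because a failing test caused it to exit), docker exec -t ${{ runner.name }} fails, so the workspace contents are never cleaned up, even though docker rm -f still removes the container.
Suggested change: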
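- name: Terminate and delete the container
  if: always()
  run: |
    set +e
    # Clean the workspace (if the container is still running)
    docker exec -t ${{ runner.name }} /bin/bash -c 'find /workspace -mindepth 1 -delete' 2>/dev/null || true
    # Force-remove the container
    docker rm -f ${{ runner.name }}
    # If the container had already stopped and the workspace was left behind, clean it directly on the host.
    # No -maxdepth here: -delete removes contents before their parent, so non-empty subdirectories are deleted too.
    find "$(pwd)" -mindepth 1 -delete 2>/dev/null || true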
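As a hedged alternative to silencing the docker exec error, the step could first ask Docker whether the container is still running and only then clean /workspace from inside it, falling back to the host directory otherwise. This is a minimal sketch, not part of the PR: the function name cleanup_container is hypothetical, and it assumes the host directory passed as the second argument is the one bind-mounted as /workspace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clean a CI container's workspace, then remove the container.
# $1: container name (e.g. the runner name), $2: host directory assumed to be mounted as /workspace.
cleanup_container() {
  local name="$1" host_ws="$2"
  local running
  # docker inspect prints "true" only for a running container; errors (e.g. no such container) are silenced.
  running="$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null)"
  if [ "$running" = "true" ]; then
    # Container is up: delete the workspace contents from inside it.
    docker exec "$name" /bin/bash -c 'find /workspace -mindepth 1 -delete' || true
  else
    # Container stopped or missing: clean the host-side directory instead.
    find "$host_ws" -mindepth 1 -delete 2>/dev/null || true
  fi
  # Force-remove the container whatever its state; ignore "no such container".
  docker rm -f "$name" >/dev/null 2>&1 || true
}
```

In the workflow step this might be called as cleanup_container "${{ runner.name }}" "$(pwd)". Checking the state explicitly makes the fallback path deliberate rather than relying on a swallowed error.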
Diff context (.github/workflows/_accuracy_test.yml):
docker run --rm --ipc=host --pid=host --net=host \
docker run --rm --net=host \
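❓ Inconsistent casing
The --shm-size=64g here uses a lowercase g, while other workflow files (such as _unit_test_coverage.yml) use an uppercase G. Docker accepts either case, but keeping it consistent is recommended.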
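To spot the remaining inconsistencies, a quick search like the one below could list every --shm-size value written with a lowercase single-letter unit. It is a minimal sketch assuming the workflows live under .github/workflows/; the function name find_lowercase_shm_size is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print file:line for each --shm-size value ending in a lowercase unit (b, k, m, g).
# $1: directory to scan (defaults to .github/workflows).
find_lowercase_shm_size() {
  local dir="${1:-.github/workflows}"
  # -r: recurse, -n: line numbers, -E: extended regex; "--" ends options so the pattern may start with "-".
  grep -rnE -- '--shm-size=[0-9]+[bkmg]([^A-Za-z0-9]|$)' "$dir"
}
```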
Diff context (scripts/run_pre_ce.sh):
python -m pip install -r requirements.txt
python -m pip install jsonschema aistudio_sdk==0.3.5
python -m pip install xgrammar==0.1.19 torch==2.6.0
# Use prebuilt wheel files to install xgrammar==0.1.19 and torch==2.6.0 specifically for the CI environment
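🟡 Suggestion: Add a comment explaining why prebuilt wheel files are needed
Consider stating the reason in the comment (for example, the PyPI index may be unavailable, the CI network is restricted, or a specific build is required) to make future maintenance easier.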
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## release/2.6 #7343 +/- ##
==============================================
Coverage ? 73.85%
==============================================
Files ? 376
Lines ? 52960
Branches ? 8268
==============================================
Hits ? 39112
Misses ? 11112
Partials ? 2736
Flags with carried forward coverage won't be shown.
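As a sanity check on the report, assuming Codecov's standard coverage ratio (hits divided by the sum of hits, misses, and partials, so partially covered lines count against coverage), the reported numbers reproduce the 73.85% figure; note that the denominator equals the reported Lines count:

```latex
\[
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
= \frac{39112}{39112 + 11112 + 2736}
= \frac{39112}{52960} \approx 0.7385 = 73.85\%
\]
```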
Motivation
The CI pipeline may leave behind running containers or uncleaned workspaces when jobs are canceled or fail unexpectedly. This can cause resource leakage, workspace conflicts, and instability in subsequent jobs.
Modifications
Cherry-pick of #7198, #7227, #7283, #7268, #7315, and #7335 to release/2.6.
Usage or Command
N/A
Accuracy Tests
N/A
Checklist
PR title tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
If the PR targets a release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.