[CI][XPU] Optimize CI logs and variable names #5025
plusNew001 merged 11 commits into PaddlePaddle:develop
Redirect output of stop_processes to kill.log to capture logs.
Added logging of server.log for failed tests.
Thanks for your contribution!
Pull Request Overview
This PR optimizes CI logs and standardizes variable naming for XPU testing infrastructure by replacing GPU-specific naming with XPU-appropriate terminology.
Key Changes
- Renamed all `GPU_ID` references to `XPU_ID` across Python test files, shell scripts, and GitHub workflows for consistency with XPU hardware
- Optimized CI log output by redirecting `stop_processes` output to `kill.log` and only displaying `server.log` on test failures
- Added real-time health check status messages during service startup and a `paths-ignore` filter to skip CI for documentation changes
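The "only display `server.log` on test failures" behavior could look roughly like the sketch below. It is not the code from `scripts/run_ci_xpu.sh`: the helper name `run_and_report` and its argument order are hypothetical, and it assumes the server writes to a single log file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): run a test command and print the
# server log only when that command fails.
# Usage: run_and_report <log_file> <command> [args...]
run_and_report() {
    local log_file="$1"
    shift
    local exit_code=0
    # Capture the exit code without aborting under `set -e`.
    "$@" || exit_code=$?
    if [ "$exit_code" -eq 0 ]; then
        echo "Test passed: $*"
        return 0
    fi
    echo "Test failed (exit code ${exit_code}): $*"
    if [ -f "$log_file" ]; then
        echo "----- ${log_file} -----"
        cat "$log_file"
    fi
    return "$exit_code"
}
```

A call such as `run_and_report server.log python tests/ci_use/XPU_45T/run_45T.py` would keep passing runs quiet and dump the log only for failing ones.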
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/ci_use/XPU_45T/run_w4a8.py | Updated variable name from gpu_id to xpu_id and environment variable from GPU_ID to XPU_ID |
| tests/ci_use/XPU_45T/run_ep_online.py | Updated variable name from gpu_id to xpu_id and environment variable from GPU_ID to XPU_ID |
| tests/ci_use/XPU_45T/run_ep.py | Updated variable name from gpu_id to xpu_id and environment variable from GPU_ID to XPU_ID |
| tests/ci_use/XPU_45T/run_45vl.py | Updated variable name from gpu_id to xpu_id and environment variable from GPU_ID to XPU_ID |
| tests/ci_use/XPU_45T/run_45T.py | Updated variable name from gpu_id to xpu_id and environment variable from GPU_ID to XPU_ID |
| scripts/run_ci_xpu.sh | Renamed GPU_ID to XPU_ID throughout, fixed bash loop syntax, redirected kill logs, added health check status output, and made server.log conditional |
| .github/workflows/ci_xpu.yml | Renamed gpu_id to xpu_id in workflow variables and added paths-ignore to skip CI for .md and .txt files |
    branches:
      - develop
      - 'release/*'
    paths-ignore:
      - '**.md'
      - '**.txt'
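The PR title is correctly formatted and includes the [CI] and [XPU] tags.
However, the PR description is incomplete: the "Motivation" and "Modifications" sections are empty. Consider adding the following:
Motivation: explain why these changes are being made, for example:
- This is XPU-related CI test code, so the variable name `GPU_ID` is easily confusing
- The CI logs are too verbose and need trimming to improve readability
Modifications: describe the specific changes in detail, for example:
- Rename all `GPU_ID` variables to `XPU_ID` so the naming is accurate
- Optimize log output: redirect the kill-process logs to `kill.log` and print `server.log` only when a test fails
- Add real-time status output to the service health check
- Add `paths-ignore` to the workflow to ignore changes to `.md` and `.txt` files
This helps reviewers and future maintainers better understand the purpose and content of these changes.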
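To make the `paths-ignore` behavior concrete: GitHub Actions skips the workflow only when every changed file matches an ignored pattern, and a single non-matching file is enough to trigger a run. The sketch below approximates that rule locally. `should_skip_ci` is a hypothetical helper for illustration, not something GitHub Actions provides, and the shell `case` patterns only approximate the `'**.md'` and `'**.txt'` filters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate the paths-ignore decision for a list of
# changed files. Returns 0 (skip CI) only if EVERY file is ignored.
should_skip_ci() {
    local file
    # With no changed files there is nothing to ignore, so do not skip.
    [ "$#" -gt 0 ] || return 1
    for file in "$@"; do
        case "$file" in
            # In `case` patterns `*` also matches `/`, which mirrors `**`.
            *.md|*.txt) ;;
            # Any other file means the workflow should run.
            *) return 1 ;;
        esac
    done
    return 0
}
```

For example, `should_skip_ci README.md docs/notes.txt` succeeds, while adding `scripts/run_ci_xpu.sh` to the list makes it fail, meaning CI would run.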
    HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -m 2 "$ENDPOINT" || true)
    echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
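Adding real-time status output to the health-check loop is a good improvement that makes the CI logs easier to follow. However, I suggest dropping `-e "\r"` from the echo statement, because:
- In CI logs, `\r` (carriage return) does not overwrite the previous line; it shows up as a special character or is ignored
- CI logs are recorded line by line, so each iteration should print a new line, which makes the history of wait times easier to trace

Suggested change:

    - echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
    + echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"

This way the CI log keeps a complete record of every check, which makes troubleshooting easier. (The message reads "Service health check in progress... waited ${ELAPSED} seconds, current status code: ${HTTP_CODE}".)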
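For context, a health-check loop that prints one line per probe might be sketched as follows. Only the `curl` invocation mirrors the quoted diff. The function name `wait_for_health`, the timeout and interval parameters, and treating HTTP 200 as "healthy" are assumptions made for illustration and may not match `scripts/run_ci_xpu.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll an endpoint until it returns HTTP 200 or a
# timeout expires, printing one log line per probe.
# Usage: wait_for_health <endpoint> [timeout_seconds] [interval_seconds]
wait_for_health() {
    local endpoint="$1"
    local timeout="${2:-300}"
    local interval="${3:-5}"
    local start elapsed http_code
    start=$(date +%s)
    while true; do
        elapsed=$(( $(date +%s) - start ))
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Health check timed out after ${timeout}s" >&2
            return 1
        fi
        # Same probe as in the diff: print only the status code, cap at 2 s.
        http_code=$(curl -s -o /dev/null -w "%{http_code}" -m 2 "$endpoint" || true)
        # Plain echo, no "\r": every probe leaves its own line in the CI log.
        echo "Health check in progress... waited ${elapsed}s, status code: ${http_code}"
        if [ "$http_code" = "200" ]; then
            return 0
        fi
        sleep "$interval"
    done
}
```

A caller could write `wait_for_health "$ENDPOINT" 600 5 || exit 1`, which matches the `exit 1` on timeout visible in one of the quoted hunks.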
    HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -m 2 "$ENDPOINT" || true)
    echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
Same issue: in a CI environment the `\r` carriage return does not produce the intended "overwrite the previous line" effect. I suggest dropping `-e "\r"` and printing the status of each check directly so it can be traced in the CI log.

Suggested change:

    - echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
    + echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
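If in-place updates are still wanted for local runs, one option is to branch on whether stdout is a terminal. This is an alternative sketch, not something the PR or the review proposes. `status_line` is a hypothetical helper, and it assumes CI runners capture output without a TTY, which is typical but not guaranteed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: redraw the status in place on a terminal, but emit
# one plain line per update when output is redirected or captured.
status_line() {
    if [ -t 1 ]; then
        # stdout is a terminal: return to column 0 and overwrite the line.
        printf '\r%s' "$1"
    else
        # stdout is a pipe or file: newline-terminated, no carriage return.
        printf '%s\n' "$1"
    fi
}
```

Inside the loop this would replace the `echo` call, for example `status_line "waited ${ELAPSED}s, status code: ${HTTP_CODE}"`.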
    HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -m 2 "$ENDPOINT" || true)
    echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
Same issue: in a CI environment the `\r` carriage return does not produce the intended effect. I suggest dropping `-e "\r"`.

Suggested change:

    - echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
    + echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
    exit 1
    fi
    HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -m 2 "$ENDPOINT" || true)
    echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
Same issue: in a CI environment the `\r` carriage return does not produce the intended effect. I suggest dropping `-e "\r"`.

Suggested change:

    - echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
    + echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
    }
    stop_processes
    stop_processes >kill.log 2>&1
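Redirecting the output of `stop_processes` to `kill.log` is a good idea that cuts noise in the CI log. However, I suggest using append mode (`>>`) rather than overwrite mode (`>`), so that the record of every kill operation is kept for troubleshooting.

Suggested change:

    - stop_processes >kill.log 2>&1
    + stop_processes >>kill.log 2>&1

With this change, each call to `stop_processes` appends to `kill.log` instead of overwriting the previous contents. This is especially useful because the script calls `stop_processes` multiple times (for example at lines 64, 130, 206, 285, 383, 469, and 558).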
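The difference between the two redirections can be checked with a small stand-alone sketch. The `stop_processes` function here is a stub that only prints a line, standing in for the real function in `scripts/run_ci_xpu.sh`, and the temporary file replaces `kill.log` so nothing in the working directory is touched.

```bash
#!/usr/bin/env bash
# Stub standing in for the real stop_processes function (assumption).
stop_processes() {
    echo "stopping services"
}

log_file=$(mktemp)

# Overwrite mode: each call truncates the file, so only the last call survives.
stop_processes >"$log_file" 2>&1
stop_processes >"$log_file" 2>&1
overwrite_lines=$(grep -c '' "$log_file")

# Append mode: truncate once up front, then accumulate every call.
: >"$log_file"
stop_processes >>"$log_file" 2>&1
stop_processes >>"$log_file" 2>&1
append_lines=$(grep -c '' "$log_file")

# Prints: overwrite: 1 line(s), append: 2 line(s)
echo "overwrite: ${overwrite_lines} line(s), append: ${append_lines} line(s)"
rm -f "$log_file"
```

If append mode is adopted, truncating `kill.log` once at the start of the script (as `: >"$log_file"` does here) keeps output from a previous CI run out of the current log.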