[CI][XPU] Optimize CI logs and variable names #5025

Merged
plusNew001 merged 11 commits into PaddlePaddle:develop from plusNew001:ci-update
Nov 14, 2025

@plusNew001
Collaborator

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, state the reason in this PR.
  • Provide accuracy results.
  • If the current PR targets a release branch, make sure the PR has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings November 14, 2025 07:49
@paddle-bot

paddle-bot bot commented Nov 14, 2025

Thanks for your contribution!

Contributor

Copilot AI left a comment

Pull Request Overview

This PR optimizes CI logs and standardizes variable naming for XPU testing infrastructure by replacing GPU-specific naming with XPU-appropriate terminology.

Key Changes

  • Renamed all GPU_ID references to XPU_ID across Python test files, shell scripts, and GitHub workflows for consistency with XPU hardware
  • Optimized CI log output by redirecting stop_processes output to kill.log and only displaying server.log on test failures
  • Added real-time health check status messages during service startup and paths-ignore filter to skip CI for documentation changes
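
The "only display server.log on test failures" behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from scripts/run_ci_xpu.sh: the helper name `run_case` and its argument order are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from the PR): run one test command and
# print the given log file only when that command fails.
run_case() {
    local log_file="$1"
    shift
    local status=0
    # Capture the exit code without tripping `set -e` if it is enabled.
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "PASS: $*"
    else
        echo "FAIL (exit ${status}): $*"
        if [ -f "$log_file" ]; then
            echo "----- ${log_file} -----"
            cat "$log_file"
        fi
    fi
    return "$status"
}
```

A call such as `run_case server.log python tests/ci_use/XPU_45T/run_45T.py` would keep a passing run to a single line while still surfacing the server log when the test exits non-zero.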

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/ci_use/XPU_45T/run_w4a8.py | Updated variable name from gpu_id to xpu_id and environment variable from GPU_ID to XPU_ID |
| tests/ci_use/XPU_45T/run_ep_online.py | Updated variable name from gpu_id to xpu_id and environment variable from GPU_ID to XPU_ID |
| tests/ci_use/XPU_45T/run_ep.py | Updated variable name from gpu_id to xpu_id and environment variable from GPU_ID to XPU_ID |
| tests/ci_use/XPU_45T/run_45vl.py | Updated variable name from gpu_id to xpu_id and environment variable from GPU_ID to XPU_ID |
| tests/ci_use/XPU_45T/run_45T.py | Updated variable name from gpu_id to xpu_id and environment variable from GPU_ID to XPU_ID |
| scripts/run_ci_xpu.sh | Renamed GPU_ID to XPU_ID throughout, fixed bash loop syntax, redirected kill logs, added health check status output, and made server.log conditional |
| .github/workflows/ci_xpu.yml | Renamed gpu_id to xpu_id in workflow variables and added paths-ignore to skip CI for .md and .txt files |

Comment on lines 5 to +10
branches:
- develop
- 'release/*'
paths-ignore:
- '**.md'
- '**.txt'

Copilot AI Nov 14, 2025

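The PR title is correctly formatted and includes the [CI] and [XPU] tags.

However, the PR description is incomplete: the "Motivation" and "Modifications" sections are empty. Consider adding the following:

Motivation: explain why these changes are being made, for example:

  • This is XPU-related CI test code, so the variable name GPU_ID is easily confusing
  • The CI logs are too verbose and need trimming to improve readability

Modifications: describe the specific changes in detail, for example:

  • Rename all GPU_ID variables to XPU_ID so the naming is more accurate
  • Optimize log output: redirect the process-kill logs to kill.log and print server.log only when a test fails
  • Add real-time status output for the service health check
  • Add paths-ignore to the workflow to ignore changes to .md and .txt files

This helps reviewers and future maintainers better understand the purpose and content of these changes.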

Copilot generated this review using guidance from repository custom instructions.

HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -m 2 "$ENDPOINT" || true)

echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"

Copilot AI Nov 14, 2025

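Adding real-time status output to the health-check loop is a good improvement that makes the CI logs easier to follow. However, consider dropping -e "\r" from the echo statement, because:

  1. In CI logs, \r (carriage return) does not "overwrite the previous line"; it is either shown as a special character or ignored
  2. CI logs are recorded line by line, so each loop iteration should emit a new line, which makes the history of wait times easier to trace

Suggested replacement:

echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"

That way the CI log contains a complete record of every check, which makes troubleshooting easier.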

Suggested change
echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"

Copilot uses AI. Check for mistakes.
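
To make the point about `\r` concrete, the sketch below writes the same three status lines to two captured logs, once with a leading carriage return and once without, and counts the stray CR bytes. It is an illustration only: the message text, the temporary files, and the `count_cr` helper are assumptions for this example, and `printf` stands in for `echo -e` for portability.

```shell
#!/usr/bin/env bash
# Count carriage-return bytes in a captured log file.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

log_with_cr=$(mktemp)   # stand-in for a CI log fed by: echo -e "\r..."
log_plain=$(mktemp)     # stand-in for a CI log fed by: echo "..."

for elapsed in 1 2 3; do
    # A trailing newline is still emitted, so every iteration is its own
    # line; the leading \r only adds a stray byte to the captured log.
    printf '\rwaiting %ss, status 000\n' "$elapsed" >> "$log_with_cr"
    printf 'waiting %ss, status 000\n' "$elapsed" >> "$log_plain"
done

echo "leading CR: $(count_cr "$log_with_cr") CR bytes in $(wc -l < "$log_with_cr" | tr -d ' ') lines"
echo "plain:      $(count_cr "$log_plain") CR bytes in $(wc -l < "$log_plain" | tr -d ' ') lines"
```

Both logs end up with three lines, so the carriage return buys nothing once output is captured to a file; it only leaves three extra control bytes behind.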

HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -m 2 "$ENDPOINT" || true)

echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"

Copilot AI Nov 14, 2025

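Same issue: in a CI environment, the \r carriage return does not produce the intended "overwrite the previous line" effect. Consider dropping -e "\r" and printing the status of each check directly so that it can be traced in the CI log.

Suggested replacement: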

echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
Suggested change
echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
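
For context, a health-check loop that prints one plain status line per attempt might look like the sketch below. The full loop from scripts/run_ci_xpu.sh is not shown in this thread, so the function name `wait_for_health`, its parameters, and the English message text are all assumptions; the probe is passed in as a command so the loop can be exercised without a running server.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper (not the PR's code).
# Usage: wait_for_health TIMEOUT_SECONDS INTERVAL_SECONDS PROBE_COMMAND...
# PROBE_COMMAND must print an HTTP status code on stdout.
wait_for_health() {
    local timeout="$1" interval="$2"
    shift 2
    local elapsed=0 code
    while [ "$elapsed" -lt "$timeout" ]; do
        # `|| true` keeps a failing probe from aborting under `set -e`.
        code=$("$@" || true)
        # One full line per attempt, with no carriage return.
        echo "health check: waited ${elapsed}s, status code ${code}"
        if [ "$code" = "200" ]; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "health check: timed out after ${timeout}s"
    return 1
}
```

A probe built from the curl call quoted above could be defined as `probe() { curl -s -o /dev/null -w "%{http_code}" -m 2 "$ENDPOINT" || true; }` and passed as `wait_for_health 300 5 probe`; the timeout and interval values here are placeholders.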

HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -m 2 "$ENDPOINT" || true)

echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"

Copilot AI Nov 14, 2025

Same issue: in a CI environment, the \r carriage return does not produce the intended effect. Consider dropping -e "\r" and using:

echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
Suggested change
echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"

exit 1
fi
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -m 2 "$ENDPOINT" || true)
echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"

Copilot AI Nov 14, 2025

Same issue: in a CI environment, the \r carriage return does not produce the intended effect. Consider dropping -e "\r" and using:

echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
Suggested change
echo -e "\r服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"
echo "服务健康检查中... 已等待 ${ELAPSED} 秒,当前状态码:${HTTP_CODE}"

}
stop_processes

stop_processes >kill.log 2>&1

Copilot AI Nov 14, 2025

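Redirecting the output of stop_processes to kill.log is a good idea that reduces noise in the CI log. However, consider using append mode (>>) instead of overwrite mode (>), so that the record of every kill operation is preserved for troubleshooting.

Suggested replacement:

stop_processes >>kill.log 2>&1

This way, each call to stop_processes appends to kill.log instead of overwriting its previous contents. This is especially useful because stop_processes is called multiple times in the script (for example at lines 64, 130, 206, 285, 383, 469, and 558).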

Suggested change
stop_processes >kill.log 2>&1
stop_processes >>kill.log 2>&1
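
The difference between the two redirections can be checked with a small stand-in. In the sketch below, `fake_stop_processes` is a placeholder invented for this example (the real stop_processes body is not shown in this thread), and the log files are temporary files rather than kill.log.

```shell
#!/usr/bin/env bash
# Placeholder for stop_processes: prints one line per call.
fake_stop_processes() {
    echo "stopping processes (call $1)"
}

truncate_log=$(mktemp)   # written with '>'
append_log=$(mktemp)     # written with '>>'

for call in 1 2 3; do
    fake_stop_processes "$call" > "$truncate_log" 2>&1
    fake_stop_processes "$call" >> "$append_log" 2>&1
done

echo "'>'  kept $(wc -l < "$truncate_log" | tr -d ' ') line(s)"   # 1: only the last call survives
echo "'>>' kept $(wc -l < "$append_log" | tr -d ' ') line(s)"     # 3: every call is recorded
```

One trade-off with append mode: if the workspace persists between CI runs, kill.log keeps growing, so truncating it once at the start of the script (for example with `: > kill.log`) keeps each run's record separate.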

@plusNew001 plusNew001 merged commit 0e819cd into PaddlePaddle:develop Nov 14, 2025
13 of 15 checks passed