[fix](regression) Add external regression stage timing summary#61869
[fix](regression) Add external regression stage timing summary#61869xylaaaaa wants to merge 2 commits intoapache:masterfrom
Conversation
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Add a reusable stage timer helper for regression pipeline shell scripts and provide an external regression wrapper that prints five stage durations and total time automatically when the TeamCity command-line step exits.
### Release note
None
### Check List (For Author)
- Test: Regression test / Manual test
- Regression test: python regression-test/pipeline/common/test_stage_timer.py
- Manual test: bash -n regression-test/pipeline/common/stage-timer.sh regression-test/pipeline/external/external-stage-timer.sh
- Manual test: source regression-test/pipeline/external/external-stage-timer.sh and verify the summary is printed on shell exit
- Behavior changed: Yes (external regression can print stage timing summary after pipeline finishes once integrated into the TeamCity script)
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Make the live external regression pipeline print the agreed five stage durations without editing the TeamCity inline step. Detect the external inline shell through github-utils.sh, auto-enable stage timing hooks, and keep regression coverage for the automatic path.
### Release note
None
### Check List (For Author)
- Test: Regression test / Manual test
- Regression test: python regression-test/pipeline/common/test_stage_timer.py
- Manual test: bash -n regression-test/pipeline/common/stage-timer.sh regression-test/pipeline/external/external-stage-timer.sh regression-test/pipeline/common/github-utils.sh
- Behavior changed: Yes (external regression logs now print stage timing summary automatically when the updated checkout is used)
- Does this need documentation: No
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 26886 ms |
TPC-DS: Total hot run time: 168993 ms |
There was a problem hiding this comment.
Pull request overview
Adds a reusable stage-timing helper for the TeamCity “external regression” pipeline and auto-enables it via an existing github-utils.sh sourcing point, so logs include a per-stage duration breakdown for troubleshooting pipeline bottlenecks.
Changes:
- Introduce
stage-timer.shto record named stages and print a timing summary on shell exit. - Add
external-stage-timer.shto map external regression actions into five stages and auto-switch stages via a DEBUG trap. - Update
github-utils.shto detect the external inline shell “main” signature and auto-enable the stage timer; add Python tests to validate behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| regression-test/pipeline/common/stage-timer.sh | New generic stage timing + EXIT-trap summary printer. |
| regression-test/pipeline/external/external-stage-timer.sh | External regression stage mapping + DEBUG-trap auto stage switching. |
| regression-test/pipeline/common/github-utils.sh | Auto-detect TeamCity external inline shell and enable stage timer hooks. |
| regression-test/pipeline/common/test_stage_timer.py | Tests for success/failure summaries and auto-hook behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| local exit_code="${1:-0}" | ||
| stage_timer_finish "${exit_code}" | ||
| if [[ -n "${STAGE_TIMER_PREVIOUS_EXIT_TRAP:-}" ]] && | ||
| [[ "${STAGE_TIMER_PREVIOUS_EXIT_TRAP}" != *"stage_timer__handle_exit"* ]]; then |
There was a problem hiding this comment.
In the EXIT trap chaining, the previous EXIT trap will observe an incorrect $? value because stage_timer_finish runs before evaling the captured trap. If the previous trap uses $? (common for cleanup/reporting), it will see the status of stage_timer_finish (likely 0) instead of the script’s real exit code. Consider restoring $? to exit_code (e.g., via a no-op subshell (exit "$exit_code") before eval) or otherwise ensuring the previous trap runs with the original exit status.
| [[ "${STAGE_TIMER_PREVIOUS_EXIT_TRAP}" != *"stage_timer__handle_exit"* ]]; then | |
| [[ "${STAGE_TIMER_PREVIOUS_EXIT_TRAP}" != *"stage_timer__handle_exit"* ]]; then | |
| # Restore the original exit status so that any `$?` in the previous | |
| # EXIT trap observes the script's real exit code instead of the | |
| # status of stage_timer_finish. | |
| ( exit "${exit_code}" ) |
| external_regression_stage_timer__run_previous_debug_trap() { | ||
| if [[ -n "${EXTERNAL_REGRESSION_STAGE_TIMER_PREVIOUS_DEBUG_TRAP:-}" ]] && | ||
| [[ "${EXTERNAL_REGRESSION_STAGE_TIMER_PREVIOUS_DEBUG_TRAP}" != *"external_regression_stage_timer__debug_hook"* ]]; then | ||
| eval "${EXTERNAL_REGRESSION_STAGE_TIMER_PREVIOUS_DEBUG_TRAP}" | ||
| fi | ||
| } | ||
|
|
||
| external_regression_stage_timer__debug_hook() { | ||
| local current_command="${1:-$BASH_COMMAND}" | ||
| if [[ "${EXTERNAL_REGRESSION_STAGE_TIMER_IN_DEBUG_HOOK:-false}" == "true" ]]; then | ||
| return 0 | ||
| fi | ||
|
|
||
| EXTERNAL_REGRESSION_STAGE_TIMER_IN_DEBUG_HOOK=true | ||
| external_regression_stage_timer__handle_command "${current_command}" | ||
| external_regression_stage_timer__run_previous_debug_trap | ||
| EXTERNAL_REGRESSION_STAGE_TIMER_IN_DEBUG_HOOK=false |
There was a problem hiding this comment.
The DEBUG-trap chaining changes $? before running the previous DEBUG trap: external_regression_stage_timer__handle_command and other helper calls run first, so the previous trap will see their status (typically 0) rather than the original $? value from when the DEBUG trap fired. If an existing DEBUG trap depends on $?, this breaks behavior. Capture the incoming $? at the start of external_regression_stage_timer__debug_hook and restore it immediately before evaling EXTERNAL_REGRESSION_STAGE_TIMER_PREVIOUS_DEBUG_TRAP (e.g., using (exit "$saved_status") as the immediately preceding command).
| @@ -0,0 +1,149 @@ | |||
| #!/usr/bin/env python | |||
There was a problem hiding this comment.
This test script uses tempfile.TemporaryDirectory() which requires Python 3, but the shebang is #!/usr/bin/env python (may resolve to Python 2 on some environments). Update the shebang to python3 (or remove it if the test is always invoked via an explicit interpreter) to avoid running under Python 2 and failing at import/runtime.
| #!/usr/bin/env python | |
| #!/usr/bin/env python3 |
What changed
regression-test/pipeline/common/stage-timer.shto record named stages in bash and print a timing summary on shell exitregression-test/pipeline/external/external-stage-timer.shto map the external regression pipeline into five stages: 前置准备, 启动 Doris, 启动依赖, 执行 Case, 收尾归档regression-test/pipeline/common/github-utils.shto detect the live TeamCity external inline shell and auto-enable the stage timer without editing the TeamCity build step itselfregression-test/pipeline/common/test_stage_timer.pyto verify normal exit, failure exit, auto-hook enablement, and non-external no-op behaviorProblem
The external regression pipeline usually takes more than one hour, but the final log does not show how much time was spent in startup, case execution, and cleanup. That makes it hard to see whether the bottleneck is Doris startup, external dependency startup, or the case run itself.
Root cause
The timer helper could live in the repository, but the real external regression job is executed by a TeamCity inline shell. Without an automatic hook on that live shell path, repository-only helper code would not actually print stage timings in the real pipeline.
How it is fixed
The change uses the existing
github-utils.shsource point that is already loaded by the live external regression shell. When the shell matches the external pipeline markers, it automatically sourcesexternal-stage-timer.sh, starts the timer in the prepare stage, switches stages from existing command/log anchors through aDEBUGtrap, and prints the timing summary from anEXITtrap.Behavior change
Before this change, external regression only printed the normal build log.
After this change, successful and failed external regression runs print a five-stage timing summary and the total duration before the shell exits.
This change only adds timing output to the pipeline log. It does not change regression case selection or execution order.
Validation
python regression-test/pipeline/common/test_stage_timer.pybash -n regression-test/pipeline/common/stage-timer.sh regression-test/pipeline/external/external-stage-timer.sh regression-test/pipeline/common/github-utils.sh