Skip to content

[fix](regression) Add external regression stage timing summary#61869

Open
xylaaaaa wants to merge 2 commits intoapache:masterfrom
xylaaaaa:codex/external-regression-stage-timer
Open

[fix](regression) Add external regression stage timing summary#61869
xylaaaaa wants to merge 2 commits intoapache:masterfrom
xylaaaaa:codex/external-regression-stage-timer

Conversation

@xylaaaaa
Copy link
Copy Markdown
Contributor

@xylaaaaa xylaaaaa commented Mar 30, 2026

What changed

  • add regression-test/pipeline/common/stage-timer.sh to record named stages in bash and print a timing summary on shell exit
  • add regression-test/pipeline/external/external-stage-timer.sh to map the external regression pipeline into five stages: 前置准备, 启动 Doris, 启动依赖, 执行 Case, 收尾归档
  • update regression-test/pipeline/common/github-utils.sh to detect the live TeamCity external inline shell and auto-enable the stage timer without editing the TeamCity build step itself
  • add regression-test/pipeline/common/test_stage_timer.py to verify normal exit, failure exit, auto-hook enablement, and non-external no-op behavior

Problem

The external regression pipeline usually takes more than one hour, but the final log does not show how much time was spent in startup, case execution, and cleanup. That makes it hard to see whether the bottleneck is Doris startup, external dependency startup, or the case run itself.

Root cause

The timer helper could live in the repository, but the real external regression job is executed by a TeamCity inline shell. Without an automatic hook on that live shell path, repository-only helper code would not actually print stage timings in the real pipeline.

How it is fixed

The change uses the existing github-utils.sh source point that is already loaded by the live external regression shell. When the shell matches the external pipeline markers, it automatically sources external-stage-timer.sh, starts the timer in the prepare stage, switches stages from existing command/log anchors through a DEBUG trap, and prints the timing summary from an EXIT trap.

Behavior change

Before this change, external regression only printed the normal build log.
After this change, successful and failed external regression runs print a five-stage timing summary and the total duration before the shell exits.
This change only adds timing output to the pipeline log. It does not change regression case selection or execution order.

Validation

  • python regression-test/pipeline/common/test_stage_timer.py
  • bash -n regression-test/pipeline/common/stage-timer.sh regression-test/pipeline/external/external-stage-timer.sh regression-test/pipeline/common/github-utils.sh

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Add a reusable stage timer helper for regression pipeline shell scripts and provide an external regression wrapper that prints five stage durations and total time automatically when the TeamCity command-line step exits.

### Release note

None

### Check List (For Author)

- Test: Regression test / Manual test
    - Regression test: python regression-test/pipeline/common/test_stage_timer.py
    - Manual test: bash -n regression-test/pipeline/common/stage-timer.sh regression-test/pipeline/external/external-stage-timer.sh
    - Manual test: source regression-test/pipeline/external/external-stage-timer.sh and verify the summary is printed on shell exit
- Behavior changed: Yes (external regression can print stage timing summary after pipeline finishes once integrated into the TeamCity script)
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Make the live external regression pipeline print the agreed five stage durations without editing the TeamCity inline step. Detect the external inline shell through github-utils.sh, auto-enable stage timing hooks, and keep regression coverage for the automatic path.

### Release note

None

### Check List (For Author)

- Test: Regression test / Manual test
    - Regression test: python regression-test/pipeline/common/test_stage_timer.py
    - Manual test: bash -n regression-test/pipeline/common/stage-timer.sh regression-test/pipeline/external/external-stage-timer.sh regression-test/pipeline/common/github-utils.sh
- Behavior changed: Yes (external regression logs now print stage timing summary automatically when the updated checkout is used)
- Does this need documentation: No
@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Mar 30, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@xylaaaaa
Copy link
Copy Markdown
Contributor Author

run buildall

@doris-robot
Copy link
Copy Markdown

TPC-H: Total hot run time: 26886 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 418b0596cb919d1ef3072753786ca481333030dd, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17634	4528	4328	4328
q2	q3	10657	813	523	523
q4	4672	350	253	253
q5	7572	1250	1029	1029
q6	177	176	145	145
q7	789	874	693	693
q8	9744	1488	1335	1335
q9	5613	4812	4738	4738
q10	6335	1935	1669	1669
q11	483	250	238	238
q12	736	587	461	461
q13	18046	2657	1986	1986
q14	230	234	208	208
q15	q16	742	732	670	670
q17	733	873	446	446
q18	5853	5387	5259	5259
q19	1096	1021	623	623
q20	544	491	367	367
q21	4479	1831	1626	1626
q22	437	340	289	289
Total cold run time: 96572 ms
Total hot run time: 26886 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4824	4566	4573	4566
q2	q3	3944	4438	3909	3909
q4	883	1230	827	827
q5	4056	4361	4360	4360
q6	183	181	138	138
q7	1755	1676	1528	1528
q8	2495	2777	2664	2664
q9	7761	7499	7379	7379
q10	3809	4017	3672	3672
q11	511	437	425	425
q12	494	617	456	456
q13	2620	3273	2150	2150
q14	301	309	294	294
q15	q16	729	775	723	723
q17	1172	1425	1411	1411
q18	7217	6884	6613	6613
q19	915	997	1005	997
q20	2083	2229	2007	2007
q21	4058	3524	3384	3384
q22	480	445	390	390
Total cold run time: 50290 ms
Total hot run time: 47893 ms

@doris-robot
Copy link
Copy Markdown

TPC-DS: Total hot run time: 168993 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 418b0596cb919d1ef3072753786ca481333030dd, data reload: false

query5	4328	646	510	510
query6	321	220	206	206
query7	4227	458	261	261
query8	335	239	234	234
query9	8725	2794	2757	2757
query10	534	381	340	340
query11	7020	5131	4849	4849
query12	185	131	125	125
query13	1304	468	351	351
query14	5837	3841	3473	3473
query14_1	2883	2837	2845	2837
query15	203	193	176	176
query16	981	491	475	475
query17	984	736	625	625
query18	2467	471	353	353
query19	222	219	185	185
query20	136	127	127	127
query21	210	137	114	114
query22	13292	14029	14746	14029
query23	16672	16555	15970	15970
query23_1	16270	16097	15601	15601
query24	7053	1614	1213	1213
query24_1	1240	1218	1220	1218
query25	526	475	441	441
query26	1228	259	149	149
query27	2777	495	297	297
query28	4475	1856	1858	1856
query29	855	564	472	472
query30	295	225	188	188
query31	1016	952	892	892
query32	79	70	68	68
query33	512	327	284	284
query34	918	858	518	518
query35	638	699	596	596
query36	1086	1108	995	995
query37	137	91	85	85
query38	2953	2935	2865	2865
query39	863	846	811	811
query39_1	804	779	801	779
query40	226	154	135	135
query41	63	62	58	58
query42	262	254	260	254
query43	248	265	224	224
query44	
query45	200	189	186	186
query46	893	982	611	611
query47	2139	2542	2068	2068
query48	312	324	227	227
query49	624	468	377	377
query50	692	275	211	211
query51	4058	4055	4054	4054
query52	261	272	262	262
query53	290	339	282	282
query54	295	276	277	276
query55	95	92	84	84
query56	310	329	305	305
query57	1940	1679	1641	1641
query58	283	268	265	265
query59	2792	2972	2772	2772
query60	332	326	320	320
query61	156	155	160	155
query62	637	595	545	545
query63	310	279	273	273
query64	5070	1283	1033	1033
query65	
query66	1463	460	358	358
query67	24227	24392	24221	24221
query68	
query69	407	318	289	289
query70	949	985	897	897
query71	330	300	302	300
query72	2946	2879	2638	2638
query73	542	561	315	315
query74	9666	9613	9445	9445
query75	2874	2756	2464	2464
query76	2268	1044	670	670
query77	354	378	311	311
query78	10936	11147	10493	10493
query79	3060	776	580	580
query80	1759	625	529	529
query81	551	266	224	224
query82	1010	149	121	121
query83	325	262	245	245
query84	290	126	112	112
query85	940	510	470	470
query86	417	301	288	288
query87	3187	3105	2964	2964
query88	3593	2677	2666	2666
query89	437	364	343	343
query90	1988	189	179	179
query91	171	162	141	141
query92	81	77	76	76
query93	1219	855	509	509
query94	638	324	279	279
query95	589	404	329	329
query96	642	515	235	235
query97	2456	2467	2370	2370
query98	233	224	226	224
query99	1015	986	910	910
Total cold run time: 252515 ms
Total hot run time: 168993 ms

@xylaaaaa xylaaaaa changed the title [codex] Add external regression stage timing summary [fix](regression) Add external regression stage timing summary Mar 30, 2026
@xylaaaaa xylaaaaa marked this pull request as ready for review April 1, 2026 01:38
Copilot AI review requested due to automatic review settings April 1, 2026 01:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a reusable stage-timing helper for the TeamCity “external regression” pipeline and auto-enables it via an existing github-utils.sh sourcing point, so logs include a per-stage duration breakdown for troubleshooting pipeline bottlenecks.

Changes:

  • Introduce stage-timer.sh to record named stages and print a timing summary on shell exit.
  • Add external-stage-timer.sh to map external regression actions into five stages and auto-switch stages via a DEBUG trap.
  • Update github-utils.sh to detect the external inline shell “main” signature and auto-enable the stage timer; add Python tests to validate behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
regression-test/pipeline/common/stage-timer.sh New generic stage timing + EXIT-trap summary printer.
regression-test/pipeline/external/external-stage-timer.sh External regression stage mapping + DEBUG-trap auto stage switching.
regression-test/pipeline/common/github-utils.sh Auto-detect TeamCity external inline shell and enable stage timer hooks.
regression-test/pipeline/common/test_stage_timer.py Tests for success/failure summaries and auto-hook behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

local exit_code="${1:-0}"
stage_timer_finish "${exit_code}"
if [[ -n "${STAGE_TIMER_PREVIOUS_EXIT_TRAP:-}" ]] &&
[[ "${STAGE_TIMER_PREVIOUS_EXIT_TRAP}" != *"stage_timer__handle_exit"* ]]; then
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the EXIT trap chaining, the previous EXIT trap will observe an incorrect $? value because stage_timer_finish runs before evaling the captured trap. If the previous trap uses $? (common for cleanup/reporting), it will see the status of stage_timer_finish (likely 0) instead of the script’s real exit code. Consider restoring $? to exit_code (e.g., via a no-op subshell (exit "$exit_code") before eval) or otherwise ensuring the previous trap runs with the original exit status.

Suggested change
[[ "${STAGE_TIMER_PREVIOUS_EXIT_TRAP}" != *"stage_timer__handle_exit"* ]]; then
[[ "${STAGE_TIMER_PREVIOUS_EXIT_TRAP}" != *"stage_timer__handle_exit"* ]]; then
# Restore the original exit status so that any `$?` in the previous
# EXIT trap observes the script's real exit code instead of the
# status of stage_timer_finish.
( exit "${exit_code}" )

Copilot uses AI. Check for mistakes.
Comment on lines +70 to +86
external_regression_stage_timer__run_previous_debug_trap() {
if [[ -n "${EXTERNAL_REGRESSION_STAGE_TIMER_PREVIOUS_DEBUG_TRAP:-}" ]] &&
[[ "${EXTERNAL_REGRESSION_STAGE_TIMER_PREVIOUS_DEBUG_TRAP}" != *"external_regression_stage_timer__debug_hook"* ]]; then
eval "${EXTERNAL_REGRESSION_STAGE_TIMER_PREVIOUS_DEBUG_TRAP}"
fi
}

external_regression_stage_timer__debug_hook() {
local current_command="${1:-$BASH_COMMAND}"
if [[ "${EXTERNAL_REGRESSION_STAGE_TIMER_IN_DEBUG_HOOK:-false}" == "true" ]]; then
return 0
fi

EXTERNAL_REGRESSION_STAGE_TIMER_IN_DEBUG_HOOK=true
external_regression_stage_timer__handle_command "${current_command}"
external_regression_stage_timer__run_previous_debug_trap
EXTERNAL_REGRESSION_STAGE_TIMER_IN_DEBUG_HOOK=false
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DEBUG-trap chaining changes $? before running the previous DEBUG trap: external_regression_stage_timer__handle_command and other helper calls run first, so the previous trap will see their status (typically 0) rather than the original $? value from when the DEBUG trap fired. If an existing DEBUG trap depends on $?, this breaks behavior. Capture the incoming $? at the start of external_regression_stage_timer__debug_hook and restore it immediately before evaling EXTERNAL_REGRESSION_STAGE_TIMER_PREVIOUS_DEBUG_TRAP (e.g., using (exit "$saved_status") as the immediately preceding command).

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,149 @@
#!/usr/bin/env python
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test script uses tempfile.TemporaryDirectory() which requires Python 3, but the shebang is #!/usr/bin/env python (may resolve to Python 2 on some environments). Update the shebang to python3 (or remove it if the test is always invoked via an explicit interpreter) to avoid running under Python 2 and failing at import/runtime.

Suggested change
#!/usr/bin/env python
#!/usr/bin/env python3

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants