[Fix](Query Stats) Add QueryStatsRecorder for column-level query and filter - Part3 #63974

Open
nsivarajan wants to merge 1 commit into apache:master from nsivarajan:fix-query-filter-stats-part-3-fix

@nsivarajan
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #63067, #63768

Problem Summary:

Follow-up to Part 2 (#xxx). Extends column-level query/filter hit recording to the constructs that were explicitly deferred from Part 2, plus additional cases discovered during a full plan-node audit:

Deferred from Part 2 (now resolved):

  • UNION / INTERSECT / EXCEPT — PhysicalSetOperation.getRegularChildrenOutputs() maps each child branch's slots to queryHit
  • CTE columns — PhysicalCTEProducer child is walked explicitly before sibling nodes so scan slots are registered; PhysicalCTEConsumer maps consumer-side ExprIds to producer scan slots
  • LATERAL VIEW / EXPLODE — PhysicalGenerate generator input columns recorded as queryHit
  • HAVING SUM(k2) > 0 — single-input aggregate output ExprId is mapped to the underlying scan so a parent PhysicalFilter records filterHit on k2
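
A minimal sketch of the set-operation case from the list above, written against a simplified model rather than the real Nereids classes. It assumes the child outputs are available as one list of ExprIds per branch (standing in for what getRegularChildrenOutputs() returns) and that scan slots were registered earlier in an ExprId-to-column map. SetOperationHitSketch, HitCounters, and recordSetOperation are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the set-operation handling; not the Doris classes.
class SetOperationHitSketch {

    // Per-column queryHit counters, keyed by "table.column".
    static final class HitCounters {
        final Map<String, Integer> queryHit = new HashMap<>();

        void recordQueryHit(String column) {
            queryHit.merge(column, 1, Integer::sum);
        }
    }

    // childrenOutputs: one list of output ExprIds per UNION/INTERSECT/EXCEPT branch.
    // exprIdToScanColumn: ExprIds already registered while visiting the scans.
    static void recordSetOperation(List<List<Integer>> childrenOutputs,
                                   Map<Integer, String> exprIdToScanColumn,
                                   HitCounters counters) {
        for (List<Integer> branchOutput : childrenOutputs) {
            for (Integer exprId : branchOutput) {
                String column = exprIdToScanColumn.get(exprId);
                // Slots that do not resolve to a scan column are skipped.
                if (column != null) {
                    counters.recordQueryHit(column);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Models: SELECT k1 FROM t1 UNION ALL SELECT k1 FROM t2
        Map<Integer, String> scans = new HashMap<>();
        scans.put(1, "t1.k1");
        scans.put(2, "t2.k1");
        HitCounters counters = new HitCounters();
        recordSetOperation(List.of(List.of(1), List.of(2)), scans, counters);
        System.out.println(counters.queryHit);
    }
}
```

Each branch contributes its own columns, so in this example both t1.k1 and t2.k1 receive one queryHit.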

Additional gaps closed by full plan-node audit:

  • HAVING SUM(k2+k3) > 0 — multi-input aggregate outputs populate aggOutputToInputSlots; recordInputSlotsAsFilterHit expands to all contributing columns
  • Mark join conjuncts — AbstractPhysicalJoin.getMarkJoinConjuncts() is a separate field not included in hash or other conjuncts; adds filterHit for IN/EXISTS subquery correlation columns
  • Recursive CTE — PhysicalRecursiveUnion extends PhysicalBinary (not PhysicalSetOperation); dedicated handler calls getRegularChildrenOutputs() so base-case columns get queryHit; recursive-case WorkTableReference
    slots are silently skipped
  • Computed SELECT expressions — SELECT k1+k2 AS result now records queryHit on both k1 and k2 via the multi-input branch of the PhysicalProject alias handler, consistent with how ORDER BY k1+k2 and GROUP BY k1+k2
    already behave
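
The HAVING handling from the list above can be sketched as follows. This is an illustrative model, not the PR's code: the names aggOutputToInputSlots and recordInputSlotsAsFilterHit come from the description, but their signatures here, the AggregateFilterHitSketch class, and the use of plain integers and strings in place of ExprId and slot objects are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model; integers stand in for ExprIds, strings for scan columns.
class AggregateFilterHitSketch {
    // Aggregate output ExprId -> every scan column feeding that aggregate.
    final Map<Integer, Set<String>> aggOutputToInputSlots = new HashMap<>();
    // Per-column filterHit counters (TreeMap only to keep printing deterministic).
    final Map<String, Integer> filterHit = new TreeMap<>();

    // Called when an aggregate is visited, e.g. SUM(k2 + k3) with output ExprId 10.
    void registerAggregateOutput(int outputExprId, Set<String> inputColumns) {
        aggOutputToInputSlots.put(outputExprId, inputColumns);
    }

    // Called for each slot referenced by a parent filter (the HAVING predicate).
    void recordInputSlotsAsFilterHit(int exprId) {
        Set<String> inputs = aggOutputToInputSlots.getOrDefault(exprId, Set.of());
        for (String column : inputs) {
            filterHit.merge(column, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        AggregateFilterHitSketch recorder = new AggregateFilterHitSketch();
        recorder.registerAggregateOutput(10, Set.of("t.k2", "t.k3"));
        recorder.recordInputSlotsAsFilterHit(10); // HAVING SUM(k2 + k3) > 0
        System.out.println(recorder.filterHit);   // prints {t.k2=1, t.k3=1}
    }
}
```

The single-input case (HAVING SUM(k2) > 0) is the same lookup with a one-element set, and an ExprId with no registered aggregate records nothing.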

Out of scope (intentional):

  • External tables (Hive / Iceberg / JDBC / ODBC) — OlapTable-only by design

Release note

Extend QueryStatsRecorder to cover UNION/INTERSECT/EXCEPT, CTE (inline and materialized), LATERAL VIEW/EXPLODE, HAVING on aggregate expressions (single and multi-input), IN/EXISTS mark join predicates, recursive CTEs, and computed SELECT expressions — completing full OlapTable column-level query/filter hit coverage.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@nsivarajan
Contributor Author

run buildall

@nsivarajan force-pushed the fix-query-filter-stats-part-3-fix branch from 143bd88 to 9c84413 on June 2, 2026 02:49
@nsivarajan
Contributor Author

run buildall

@nsivarajan
Contributor Author

run external

@nsivarajan
Contributor Author

run nonConcurrent

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 8.06% (37/459) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

TPC-H: Total hot run time: 29234 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9c8441339eac0ddb223039dbeaeb4746c6ccaa7f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17741	4086	4074	4074
q2	q3	10729	1391	802	802
q4	4688	484	348	348
q5	7598	914	586	586
q6	184	176	137	137
q7	792	852	664	664
q8	9354	1638	1551	1551
q9	5807	4559	4487	4487
q10	6800	1836	1531	1531
q11	443	275	260	260
q12	636	445	304	304
q13	18190	3398	2809	2809
q14	268	270	254	254
q15	q16	835	777	707	707
q17	979	1008	926	926
q18	7046	5934	5485	5485
q19	1309	1353	1084	1084
q20	528	405	261	261
q21	6288	2889	2643	2643
q22	492	376	321	321
Total cold run time: 100707 ms
Total hot run time: 29234 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5113	4747	4835	4747
q2	q3	4895	5375	4710	4710
q4	2142	2195	1400	1400
q5	4847	4836	4740	4740
q6	244	183	129	129
q7	1874	1839	1643	1643
q8	2473	2125	2113	2113
q9	8084	7730	7596	7596
q10	4772	4719	4244	4244
q11	534	388	358	358
q12	776	743	526	526
q13	3024	3380	2806	2806
q14	277	276	257	257
q15	q16	688	702	603	603
q17	1300	1285	1271	1271
q18	7221	6642	6673	6642
q19	1188	1120	1112	1112
q20	2244	2236	1953	1953
q21	5308	4608	4438	4438
q22	516	454	422	422
Total cold run time: 57520 ms
Total hot run time: 51710 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170285 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9c8441339eac0ddb223039dbeaeb4746c6ccaa7f, data reload: false

query5	4337	623	488	488
query6	436	200	184	184
query7	4893	551	307	307
query8	369	224	215	215
query9	8775	4086	4036	4036
query10	443	327	259	259
query11	5884	2360	2187	2187
query12	158	102	102	102
query13	1279	603	461	461
query14	6427	5426	5107	5107
query14_1	4415	4442	4428	4428
query15	208	199	177	177
query16	1006	477	460	460
query17	1153	749	615	615
query18	2479	487	363	363
query19	214	190	149	149
query20	122	108	111	108
query21	220	141	122	122
query22	13618	13721	13499	13499
query23	17624	16503	16121	16121
query23_1	16361	16323	16427	16323
query24	7567	1795	1344	1344
query24_1	1334	1328	1339	1328
query25	593	475	404	404
query26	1308	348	170	170
query27	2629	568	335	335
query28	4455	2029	2034	2029
query29	1072	603	484	484
query30	307	235	201	201
query31	1130	1076	946	946
query32	106	60	56	56
query33	513	309	242	242
query34	1180	1135	664	664
query35	751	792	658	658
query36	1426	1371	1232	1232
query37	147	106	92	92
query38	3234	3125	3052	3052
query39	944	927	905	905
query39_1	860	873	860	860
query40	218	124	113	113
query41	65	63	63	63
query42	96	98	96	96
query43	324	327	277	277
query44	
query45	193	199	180	180
query46	1076	1208	701	701
query47	2377	2403	2289	2289
query48	384	418	301	301
query49	620	465	361	361
query50	1039	369	256	256
query51	4349	4299	4261	4261
query52	86	88	77	77
query53	254	272	191	191
query54	269	213	195	195
query55	79	75	74	74
query56	234	253	209	209
query57	1424	1398	1304	1304
query58	257	221	211	211
query59	1610	1654	1491	1491
query60	283	252	234	234
query61	161	160	159	159
query62	701	653	584	584
query63	238	187	182	182
query64	2581	787	632	632
query65	
query66	1789	467	343	343
query67	29790	29839	29631	29631
query68	
query69	448	312	265	265
query70	974	951	944	944
query71	299	217	218	217
query72	3012	2672	2539	2539
query73	840	760	432	432
query74	5145	4988	4763	4763
query75	2664	2579	2253	2253
query76	2306	1174	739	739
query77	336	407	291	291
query78	12359	12543	12049	12049
query79	1509	1032	783	783
query80	584	490	434	434
query81	466	283	260	260
query82	574	164	124	124
query83	365	285	262	262
query84	278	152	117	117
query85	964	580	445	445
query86	369	295	289	289
query87	3382	3337	3231	3231
query88	3608	2754	2719	2719
query89	435	386	333	333
query90	1969	197	167	167
query91	175	171	140	140
query92	67	61	56	56
query93	1550	1425	861	861
query94	510	351	325	325
query95	698	381	446	381
query96	1093	782	352	352
query97	2697	2706	2577	2577
query98	212	210	209	209
query99	1154	1171	1024	1024
Total cold run time: 252039 ms
Total hot run time: 170285 ms

@nsivarajan marked this pull request as ready for review on June 2, 2026 10:13