@zhangstar333 zhangstar333 commented Oct 15, 2025

What problem does this PR solve?

Problem Summary:

  • In the sort node, a TopN may carry a runtime predicate that filters data during the scan, so fewer rows reach the TopN computation.
  • When no runtime predicate is pushed down, the previous implementation sorted all input data before maintaining the TopN heap.
  • This PR changes the logic to:
    1. Insert incoming rows into the heap until the heap reaches the TopN size.
    2. For subsequent batches, use the current top value of the heap to filter incoming rows before they are sorted, avoiding unnecessary work.
    3. Maintain the heap size by replacing the top when a better row arrives.
  • This avoids a full sort and improves performance for TopN queries without runtime predicate pushdown.
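
The three steps above can be illustrated with a minimal Python sketch. This is not the Doris BE code (which is C++ and works on column blocks with multi-column sort keys); the function name `top_n_smallest`, the use of plain integers as rows, and the ascending order are assumptions made only for illustration.

```python
import heapq
from typing import Iterable, List


def top_n_smallest(batches: Iterable[List[int]], n: int) -> List[int]:
    """Return the n smallest values seen across all batches, in ascending order."""
    if n <= 0:
        return []
    # heapq is a min-heap, so values are stored negated:
    # -heap[0] is then the largest (worst) value currently kept.
    heap: List[int] = []
    for batch in batches:
        if len(heap) == n:
            # Step 2: once the heap is full, drop rows that cannot beat the
            # current heap top before doing any further work on the batch.
            bound = -heap[0]
            batch = [value for value in batch if value < bound]
        for value in batch:
            if len(heap) < n:
                # Step 1: fill the heap until it holds n rows.
                heapq.heappush(heap, -value)
            elif value < -heap[0]:
                # Step 3: a better row replaces the top, keeping the size at n.
                heapq.heapreplace(heap, -value)
    # Only the n surviving rows are sorted, never the full input.
    return sorted(-value for value in heap)


if __name__ == "__main__":
    print(top_n_smallest([[5, 1, 9], [7, 0, 3], [8, 2]], 3))
```

The bound tightens as better rows arrive, so later batches tend to be filtered more aggressively than earlier ones.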
mysql> SELECT murmur_hash3_32(number) AS n FROM numbers("number" = "20000000") ORDER BY n, n + 1, n + 2 LIMIT 10;
+-------------+
| n           |
+-------------+
| -2147483480 |
| -2147483264 |
| -2147482864 |
| -2147482719 |
| -2147482662 |
| -2147482588 |
| -2147482109 |
| -2147482067 |
| -2147481904 |
| -2147481747 |
+-------------+
10 rows in set (1.83 sec)

mysql> SELECT murmur_hash3_32(number) AS n FROM numbers("number" = "20000000") ORDER BY n, n + 1, n + 2 LIMIT 10;
+-------------+
| n           |
+-------------+
| -2147483480 |
| -2147483264 |
| -2147482864 |
| -2147482719 |
| -2147482662 |
| -2147482588 |
| -2147482109 |
| -2147482067 |
| -2147481904 |
| -2147481747 |
+-------------+
10 rows in set (0.67 sec)

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas

Thearas commented Oct 15, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhangstar333
Contributor Author

run buildall

@zhangstar333
Contributor Author

run buildall

@doris-robot

ClickBench: Total hot run time: 29.66 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit fd1cb0c5ba73270de55970e904b3fdd48515e695, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.06	0.06
query3	0.26	0.09	0.08
query4	1.62	0.12	0.13
query5	0.27	0.26	0.25
query6	1.19	0.65	0.64
query7	0.03	0.03	0.03
query8	0.05	0.05	0.05
query9	0.65	0.53	0.53
query10	0.58	0.58	0.59
query11	0.17	0.11	0.12
query12	0.15	0.12	0.12
query13	0.64	0.61	0.60
query14	1.01	1.02	1.01
query15	0.86	0.87	0.86
query16	0.41	0.41	0.41
query17	1.03	1.05	1.02
query18	0.22	0.21	0.21
query19	1.89	1.81	1.88
query20	0.02	0.02	0.01
query21	15.44	0.94	0.58
query22	0.77	1.22	0.85
query23	14.80	1.36	0.64
query24	7.56	0.97	0.44
query25	0.52	0.24	0.07
query26	0.59	0.17	0.13
query27	0.08	0.06	0.06
query28	9.70	1.37	0.95
query29	12.62	3.95	3.28
query30	0.29	0.14	0.14
query31	2.82	0.58	0.38
query32	3.24	0.55	0.49
query33	3.12	3.06	3.13
query34	15.89	5.27	4.59
query35	4.55	4.64	4.57
query36	0.68	0.51	0.48
query37	0.10	0.07	0.08
query38	0.07	0.05	0.05
query39	0.03	0.03	0.04
query40	0.19	0.16	0.15
query41	0.09	0.04	0.03
query42	0.04	0.04	0.04
query43	0.05	0.04	0.03
Total cold run time: 104.45 s
Total hot run time: 29.66 s
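
The report does not label its columns. Assuming the three numbers per query are one cold run followed by two hot runs (an inference, not something the report states), the two totals above can be reproduced by summing the first column and the per-query minimum of the last two. The sketch below, with the rows copied from this report, checks that arithmetic; the helper name `totals` is hypothetical.

```python
from decimal import Decimal

# Rows copied from the ClickBench report above: query name, then three timings in seconds.
REPORT = """
query1 0.06 0.05 0.05
query2 0.10 0.06 0.06
query3 0.26 0.09 0.08
query4 1.62 0.12 0.13
query5 0.27 0.26 0.25
query6 1.19 0.65 0.64
query7 0.03 0.03 0.03
query8 0.05 0.05 0.05
query9 0.65 0.53 0.53
query10 0.58 0.58 0.59
query11 0.17 0.11 0.12
query12 0.15 0.12 0.12
query13 0.64 0.61 0.60
query14 1.01 1.02 1.01
query15 0.86 0.87 0.86
query16 0.41 0.41 0.41
query17 1.03 1.05 1.02
query18 0.22 0.21 0.21
query19 1.89 1.81 1.88
query20 0.02 0.02 0.01
query21 15.44 0.94 0.58
query22 0.77 1.22 0.85
query23 14.80 1.36 0.64
query24 7.56 0.97 0.44
query25 0.52 0.24 0.07
query26 0.59 0.17 0.13
query27 0.08 0.06 0.06
query28 9.70 1.37 0.95
query29 12.62 3.95 3.28
query30 0.29 0.14 0.14
query31 2.82 0.58 0.38
query32 3.24 0.55 0.49
query33 3.12 3.06 3.13
query34 15.89 5.27 4.59
query35 4.55 4.64 4.57
query36 0.68 0.51 0.48
query37 0.10 0.07 0.08
query38 0.07 0.05 0.05
query39 0.03 0.03 0.04
query40 0.19 0.16 0.15
query41 0.09 0.04 0.03
query42 0.04 0.04 0.04
query43 0.05 0.04 0.03
"""


def totals(report: str):
    """Return (cold total, hot total), assuming columns are cold, hot1, hot2."""
    cold = Decimal(0)
    hot = Decimal(0)
    for line in report.split("\n"):
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines
        first, second, third = (Decimal(x) for x in fields[1:])
        cold += first
        hot += min(second, third)  # best of the two assumed hot runs
    return cold, hot


cold_total, hot_total = totals(REPORT)
print(cold_total, hot_total)  # prints: 104.45 29.66
```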

@zhangstar333
Contributor Author

run buildall

@doris-robot

TPC-DS: Total hot run time: 189922 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 80c1cf9b55c3135b849341ed38f72ec7e0b05a14, data reload: false

query1	1097	427	411	411
query2	6554	1705	1688	1688
query3	6764	224	225	224
query4	26441	23732	23514	23514
query5	4876	665	511	511
query6	342	258	236	236
query7	4657	512	310	310
query8	317	273	273	273
query9	8723	2652	2589	2589
query10	496	353	274	274
query11	15759	15315	14964	14964
query12	197	118	114	114
query13	1691	549	427	427
query14	11627	9237	9323	9237
query15	206	228	170	170
query16	7669	678	506	506
query17	1263	755	643	643
query18	2192	443	348	348
query19	237	233	192	192
query20	141	141	139	139
query21	227	147	133	133
query22	4701	4742	4606	4606
query23	34741	34078	33744	33744
query24	8958	2521	2606	2521
query25	651	553	527	527
query26	1265	287	175	175
query27	2747	527	362	362
query28	4419	2295	2245	2245
query29	817	642	510	510
query30	320	236	203	203
query31	986	865	772	772
query32	82	72	63	63
query33	616	415	330	330
query34	832	930	548	548
query35	863	873	816	816
query36	984	1023	918	918
query37	132	121	98	98
query38	3749	3668	3589	3589
query39	1537	1498	1406	1406
query40	229	127	115	115
query41	60	61	58	58
query42	117	112	109	109
query43	490	497	457	457
query44	1335	837	836	836
query45	179	175	177	175
query46	828	995	624	624
query47	1786	1830	1716	1716
query48	401	421	316	316
query49	770	511	414	414
query50	643	706	414	414
query51	3996	3910	3810	3810
query52	115	111	99	99
query53	245	268	201	201
query54	606	613	537	537
query55	94	83	84	83
query56	332	323	304	304
query57	1168	1233	1109	1109
query58	289	282	278	278
query59	2560	2657	2502	2502
query60	346	348	330	330
query61	165	153	153	153
query62	805	724	654	654
query63	232	198	194	194
query64	4422	1130	811	811
query65	4037	3945	3919	3919
query66	1073	432	325	325
query67	15559	15203	14884	14884
query68	8202	967	597	597
query69	491	318	281	281
query70	1357	1254	1290	1254
query71	508	352	322	322
query72	6039	4883	4886	4883
query73	704	585	363	363
query74	9063	9122	8610	8610
query75	4089	3322	2870	2870
query76	3756	1157	740	740
query77	824	414	318	318
query78	9669	9805	8916	8916
query79	2050	841	584	584
query80	723	562	482	482
query81	477	267	224	224
query82	431	160	133	133
query83	298	313	254	254
query84	300	117	100	100
query85	908	478	432	432
query86	337	319	293	293
query87	3727	3744	3628	3628
query88	2921	2222	2212	2212
query89	396	317	290	290
query90	2037	215	217	215
query91	166	161	136	136
query92	84	73	69	69
query93	1118	944	642	642
query94	694	453	346	346
query95	405	322	319	319
query96	494	579	281	281
query97	2939	2966	2895	2895
query98	238	217	209	209
query99	1470	1396	1261	1261
Total cold run time: 279673 ms
Total hot run time: 189922 ms

@doris-robot

ClickBench: Total hot run time: 29.48 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 80c1cf9b55c3135b849341ed38f72ec7e0b05a14, data reload: false

query1	0.06	0.06	0.05
query2	0.09	0.05	0.06
query3	0.25	0.09	0.09
query4	1.61	0.12	0.12
query5	0.30	0.28	0.26
query6	1.20	0.64	0.65
query7	0.03	0.03	0.03
query8	0.06	0.05	0.04
query9	0.61	0.54	0.52
query10	0.57	0.59	0.59
query11	0.21	0.11	0.11
query12	0.16	0.12	0.11
query13	0.62	0.62	0.60
query14	1.00	1.01	1.01
query15	0.86	0.83	0.84
query16	0.40	0.39	0.39
query17	1.01	1.02	1.05
query18	0.22	0.21	0.20
query19	1.88	1.81	1.78
query20	0.01	0.01	0.02
query21	15.44	0.92	0.58
query22	0.77	1.18	0.69
query23	14.97	1.38	0.66
query24	8.10	0.93	0.67
query25	0.45	0.28	0.07
query26	0.62	0.15	0.13
query27	0.07	0.06	0.05
query28	9.65	1.32	0.92
query29	12.56	3.97	3.27
query30	0.29	0.13	0.11
query31	2.82	0.59	0.38
query32	3.23	0.56	0.47
query33	3.11	3.07	3.13
query34	15.92	5.13	4.57
query35	4.62	4.54	4.56
query36	0.69	0.50	0.50
query37	0.11	0.07	0.07
query38	0.06	0.05	0.04
query39	0.04	0.03	0.02
query40	0.18	0.14	0.13
query41	0.09	0.03	0.03
query42	0.03	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 105.01 s
Total hot run time: 29.48 s

@doris-robot

BE UT Coverage Report

Increment line coverage 83.70% (190/227) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.58% (17845/33937)
Line Coverage 37.77% (162081/429134)
Region Coverage 32.23% (123651/383621)
Branch Coverage 33.62% (54254/161375)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 96.92% (220/227) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.20% (23693/33276)
Line Coverage 57.62% (247093/428848)
Region Coverage 52.69% (204632/388361)
Branch Coverage 54.51% (88442/162254)

@zhangstar333
Contributor Author

run buildall

@doris-robot

ClickBench: Total hot run time: 29.66 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c01bed64c1d5e7b0e6709ef79ddee9f4957f7476, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.06	0.06
query3	0.26	0.09	0.09
query4	1.61	0.12	0.12
query5	0.30	0.27	0.26
query6	1.18	0.66	0.67
query7	0.03	0.03	0.03
query8	0.06	0.04	0.04
query9	0.62	0.54	0.53
query10	0.59	0.59	0.58
query11	0.18	0.12	0.12
query12	0.15	0.12	0.13
query13	0.63	0.62	0.60
query14	1.01	1.01	1.01
query15	0.85	0.86	0.85
query16	0.41	0.39	0.39
query17	1.04	1.08	1.04
query18	0.22	0.21	0.21
query19	1.96	1.83	1.78
query20	0.02	0.01	0.01
query21	15.42	0.94	0.59
query22	0.77	1.24	0.63
query23	14.92	1.44	0.64
query24	7.18	1.88	0.64
query25	0.50	0.19	0.12
query26	0.77	0.18	0.16
query27	0.07	0.07	0.05
query28	8.63	1.42	0.94
query29	12.61	3.97	3.29
query30	0.29	0.14	0.12
query31	2.83	0.60	0.39
query32	3.24	0.56	0.48
query33	3.12	3.04	3.21
query34	15.87	5.16	4.60
query35	4.60	4.55	4.57
query36	0.69	0.51	0.49
query37	0.10	0.07	0.08
query38	0.07	0.05	0.05
query39	0.04	0.04	0.03
query40	0.17	0.14	0.14
query41	0.09	0.03	0.03
query42	0.05	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 103.35 s
Total hot run time: 29.66 s

@doris-robot

BE UT Coverage Report

Increment line coverage 80.75% (172/213) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.58% (17849/33946)
Line Coverage 37.77% (162102/429222)
Region Coverage 32.23% (123666/383689)
Branch Coverage 33.62% (54272/161408)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.84% (202/213) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.28% (23726/33285)
Line Coverage 57.71% (247578/428980)
Region Coverage 52.77% (205090/388653)
Branch Coverage 54.60% (88613/162303)

Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 20, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@gavinchou gavinchou self-requested a review October 26, 2025 16:39
Labels

approved Indicates a PR has been approved by one committer. reviewed

6 participants