[improvement](fe) Balance runtime filter coordinator selection #64130

Open
BiteTheDDDDt wants to merge 1 commit into apache:master from BiteTheDDDDt:codex/improve-rf-coordinator-selection

Conversation

@BiteTheDDDDt
Contributor

@BiteTheDDDDt BiteTheDDDDt commented Jun 5, 2026

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Nereids runtime filter merge coordinator selection in FE used the first instance of the top-most distributed plan. The merge coordinator should stay on a top-most fragment instance so its lifetime covers runtime filter merging, but stable plan and worker ordering can still make repeated queries choose the same backend. This change randomly selects the merge worker from the distinct BEs assigned to the top-most distributed plan, so coordinator work is no longer fixed to the first top-most instance while preserving the top-most fragment lifetime requirement. Legacy Coordinator behavior is unchanged.
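
A minimal sketch of the selection described above, assuming backends can be modeled as plain ids. The class and method names here are hypothetical and are not the identifiers used in `RuntimeFiltersThriftBuilder`; the sketch only illustrates deduplicating the top-most instances by BE and then drawing one candidate at random.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch only: names are hypothetical, and backends are modeled
// as plain ids rather than the real FE worker types.
final class MergeWorkerSelectionSketch {

    // Collapse the per-instance backend ids of the top-most plan into
    // distinct candidates, keeping first-seen order.
    static List<Long> distinctBackends(List<Long> topMostInstanceBackendIds) {
        return new ArrayList<>(new LinkedHashSet<>(topMostInstanceBackendIds));
    }

    // Pick one candidate uniformly at random. An empty candidate list is
    // treated as a malformed plan, so the sketch fails fast.
    static long selectMergeBackend(List<Long> topMostInstanceBackendIds) {
        List<Long> candidates = distinctBackends(topMostInstanceBackendIds);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("top-most distributed plan has no instances");
        }
        int index = ThreadLocalRandom.current().nextInt(candidates.size());
        return candidates.get(index);
    }

    public static void main(String[] args) {
        // Three instances on BE 10, one on BE 11, one on BE 12.
        List<Long> instances = Arrays.asList(10L, 10L, 10L, 11L, 12L);
        System.out.println("candidates = " + distinctBackends(instances));
        System.out.println("selected   = " + selectMergeBackend(instances));
    }
}
```

In this sketch, deduplicating before the draw means a BE that hosts several top-most instances is no more likely to be chosen than a BE that hosts one.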

Release note

None

Check List (For Author)

  • Test: Unit Test
    • ./run-fe-ut.sh --run org.apache.doris.qe.runtime.RuntimeFiltersThriftBuilderTest
    • ./build.sh --fe
  • Behavior changed: Yes. Nereids runtime filter merge coordinator selection is now randomized across the distinct BEs of the top-most fragment instances instead of always using the first top-most instance.
  • Does this need documentation: No

Copilot AI review requested due to automatic review settings June 5, 2026 03:02
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

Copilot AI left a comment

Pull request overview

This PR balances runtime filter merge coordinator selection in FE by introducing a shared round-robin selector and ensuring the chosen coordinator is drawn from distinct participating backends rather than always defaulting to the first/top instance. This reduces the chance that repeated queries concentrate coordinator work on a single BE due to stable fragment/worker ordering.

Changes:

  • Add a process-wide round-robin helper (RuntimeFilterCoordinatorSelector) to rotate coordinator selection across queries.
  • Update Nereids runtime filter thrift building to select the merge worker from all participating workers (deduped by BE).
  • Update legacy Coordinator runtime filter merge instance selection to rotate across distinct participating BEs while preserving BE fallback alignment; expand unit tests accordingly.

Review Checkpoints (Part 1.3)

  • Goal & correctness: The code changes implement round-robin selection over deduped candidates in both legacy and Nereids paths, matching the described goal.
  • Concurrency: Selection state is maintained via AtomicLong (getAndIncrement()), which is thread-safe for concurrent queries.
  • Lifecycle/static init: The new static selector is simple and does not introduce cross-class initialization dependencies.
  • Tests: Unit tests are added/expanded to validate the selector’s round-robin behavior and Nereids merge-worker selection.
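
The round-robin approach this review describes can be sketched as follows. This is an assumption-laden illustration, not the reviewed `RuntimeFilterCoordinatorSelector`: the class name, the generic `select` method, and the per-instance counter are hypothetical (the review describes a process-wide selector, which would hold the counter in a static field).

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: a thread-safe round-robin pick over a candidate list.
final class RoundRobinSelectorSketch {
    // Monotonically increasing ticket shared by all callers of this selector.
    private final AtomicLong ticket = new AtomicLong();

    <T> T select(List<T> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates to select from");
        }
        long current = ticket.getAndIncrement();
        // floorMod keeps the index non-negative even after the counter wraps
        // around to negative values.
        int index = (int) Math.floorMod(current, (long) candidates.size());
        return candidates.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSelectorSketch selector = new RoundRobinSelectorSketch();
        List<String> backends = Arrays.asList("be-a", "be-b", "be-c");
        for (int i = 0; i < 4; i++) {
            System.out.println(selector.select(backends));
        }
    }
}
```

Because each call takes its own ticket from `getAndIncrement()`, concurrent queries never share a ticket, although the candidate list may differ from query to query.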

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| fe/fe-core/src/test/java/org/apache/doris/qe/runtime/ThriftPlansBuilderTest.java | Expands FE unit tests to cover round-robin selection and Nereids merge-worker selection behavior. |
| fe/fe-core/src/main/java/org/apache/doris/qe/runtime/RuntimeFiltersThriftBuilder.java | Selects merge worker from all distinct participating workers and applies round-robin rotation. |
| fe/fe-core/src/main/java/org/apache/doris/qe/runtime/RuntimeFilterCoordinatorSelector.java | New shared round-robin selector used across coordinator-selection sites. |
| fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java | Legacy planner path now round-robins the runtime filter merge instance across distinct participating BEs. |

@BiteTheDDDDt BiteTheDDDDt force-pushed the codex/improve-rf-coordinator-selection branch from ce2a27d to eea2548 Compare June 5, 2026 08:16
@BiteTheDDDDt BiteTheDDDDt force-pushed the codex/improve-rf-coordinator-selection branch from eea2548 to c11a45f Compare June 5, 2026 08:31
@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

Review result: no blocking issues found.

Critical checkpoint conclusions:

  • Goal/test: The PR changes Nereids runtime filter merge coordinator selection from the first top-most fragment instance to a random distinct backend from the top-most distributed plan. This satisfies the stated balancing goal while preserving the merge coordinator on a top-most fragment backend. A focused FE unit test covers candidate collection from the top-most plan and verifies selection is within those candidates.
  • Scope/clarity: The change is small and localized to RuntimeFiltersThriftBuilder plus one unit test.
  • Concurrency: No shared mutable state or new locking/concurrency path is introduced; ThreadLocalRandom is used only during per-query thrift-plan construction.
  • Lifecycle/static initialization: No new static/global lifecycle concerns were introduced.
  • Configuration: No configuration items were added.
  • Compatibility: No thrift schema, storage format, function symbol, or rolling-upgrade-sensitive protocol change was introduced; only the selected address value changes among already assigned top-most fragment workers.
  • Parallel paths: Legacy Coordinator behavior is intentionally unchanged; the modified path is the Nereids ThriftPlansBuilder path that uses RuntimeFiltersThriftBuilder.
  • Conditional checks: The new empty-candidate check is an invariant check for malformed distributed plans and is acceptable.
  • Test coverage: The added unit test covers the key helper behavior. I attempted to run ./run-fe-ut.sh --run org.apache.doris.qe.runtime.RuntimeFiltersThriftBuilderTest locally, but this runner is missing thirdparty/installed/bin/protoc, so generated source build failed before the test could execute.
  • Test results: No .out results are involved.
  • Observability: No additional observability appears necessary for this low-level coordinator selection change.
  • Transactions/persistence/data writes: Not applicable.
  • FE-BE variable passing: Existing runtime filter params propagation is preserved; merge params are still populated only for the backend matching runtime_filter_merge_addr, and the selected backend is drawn from the top-most plan workers so it is present in the thrift output.
  • Performance: Candidate collection is linear in the number of top-most instances and negligible compared with plan construction; no hot-path regression found.
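
The FE-BE variable passing checkpoint above can be illustrated with a small sketch. It assumes workers are identified by plain address strings and uses hypothetical names throughout; only `runtime_filter_merge_addr` comes from the review text. The point is that exactly one worker, the one whose address matches the selected merge address, receives merge params, and that the selected address must already be among the workers in the output.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: decides, per worker, whether its runtime filter
// params would carry the merge section. Names and types are hypothetical.
final class MergeParamsPlacementSketch {

    static Map<String, Boolean> placeMergeParams(List<String> workerAddrs, String mergeAddr) {
        // The merge address has to be one of the workers receiving a plan,
        // otherwise no backend would ever perform the merge.
        if (!workerAddrs.contains(mergeAddr)) {
            throw new IllegalStateException("merge address is not among the plan workers: " + mergeAddr);
        }
        Map<String, Boolean> carriesMergeParams = new LinkedHashMap<>();
        for (String addr : workerAddrs) {
            // Only the worker matching the merge address gets merge params.
            carriesMergeParams.put(addr, addr.equals(mergeAddr));
        }
        return carriesMergeParams;
    }

    public static void main(String[] args) {
        List<String> workers = Arrays.asList("10.0.0.1:9060", "10.0.0.2:9060", "10.0.0.3:9060");
        System.out.println(placeMergeParams(workers, "10.0.0.2:9060"));
    }
}
```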

User focus: No additional user-provided review focus was specified.

@hello-stephen
Contributor

TPC-H: Total hot run time: 29251 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c11a45f9c343189d1fa072e83d934d1d15f55010, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17721	4013	4051	4013
q2	q3	10813	1368	795	795
q4	4689	486	347	347
q5	7607	869	592	592
q6	190	172	143	143
q7	763	848	653	653
q8	9709	1609	1514	1514
q9	6608	4532	4474	4474
q10	6844	1827	1540	1540
q11	437	280	251	251
q12	642	425	297	297
q13	18201	3399	2777	2777
q14	273	260	247	247
q15	q16	821	781	715	715
q17	1080	990	891	891
q18	6731	5756	5569	5569
q19	1287	1314	1119	1119
q20	527	414	273	273
q21	5954	2822	2716	2716
q22	462	378	325	325
Total cold run time: 101359 ms
Total hot run time: 29251 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4825	4860	4679	4679
q2	q3	5142	5218	4724	4724
q4	2165	2206	1398	1398
q5	5164	4773	4781	4773
q6	230	184	135	135
q7	1891	1720	1527	1527
q8	2437	2106	1910	1910
q9	7361	7409	7340	7340
q10	4712	4676	4254	4254
q11	526	388	381	381
q12	725	739	520	520
q13	3077	3419	2786	2786
q14	284	282	252	252
q15	q16	687	705	613	613
q17	1267	1238	1232	1232
q18	7569	6969	6889	6889
q19	1110	1104	1125	1104
q20	2219	2233	1951	1951
q21	5259	4579	4443	4443
q22	519	455	415	415
Total cold run time: 57169 ms
Total hot run time: 51326 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 168863 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c11a45f9c343189d1fa072e83d934d1d15f55010, data reload: false

query5	4336	638	477	477
query6	445	207	192	192
query7	4917	564	298	298
query8	371	223	200	200
query9	8787	4011	4009	4009
query10	442	308	262	262
query11	5839	2376	2188	2188
query12	169	114	101	101
query13	1289	596	436	436
query14	6398	5446	5087	5087
query14_1	4439	4457	4425	4425
query15	228	200	182	182
query16	1046	468	435	435
query17	1144	717	607	607
query18	2726	491	352	352
query19	212	189	149	149
query20	110	106	106	106
query21	232	142	117	117
query22	13615	13459	13372	13372
query23	17439	16511	16109	16109
query23_1	16172	16243	16318	16243
query24	7522	1768	1307	1307
query24_1	1277	1283	1296	1283
query25	536	455	381	381
query26	1341	318	170	170
query27	2668	572	327	327
query28	4438	2017	2029	2017
query29	1091	608	484	484
query30	320	238	200	200
query31	1128	1080	959	959
query32	102	64	60	60
query33	511	318	254	254
query34	1182	1119	666	666
query35	765	781	700	700
query36	1411	1407	1217	1217
query37	159	109	91	91
query38	3187	3136	3044	3044
query39	932	922	922	922
query39_1	881	886	859	859
query40	216	120	104	104
query41	65	64	64	64
query42	96	94	93	93
query43	324	321	276	276
query44	
query45	197	187	175	175
query46	1090	1205	761	761
query47	2337	2373	2259	2259
query48	399	398	292	292
query49	630	502	370	370
query50	979	355	258	258
query51	4320	4298	4190	4190
query52	89	89	78	78
query53	246	275	183	183
query54	265	222	200	200
query55	79	75	68	68
query56	232	234	223	223
query57	1423	1413	1315	1315
query58	257	207	218	207
query59	1545	1665	1434	1434
query60	278	248	233	233
query61	160	169	155	155
query62	690	661	573	573
query63	229	184	185	184
query64	2535	817	619	619
query65	
query66	1747	460	342	342
query67	29684	29630	29504	29504
query68	
query69	439	314	275	275
query70	958	915	922	915
query71	278	222	246	222
query72	3029	2645	2533	2533
query73	851	790	441	441
query74	5140	4923	4794	4794
query75	2660	2560	2245	2245
query76	2350	1181	784	784
query77	350	363	284	284
query78	12364	12384	11930	11930
query79	1337	1048	738	738
query80	587	471	381	381
query81	445	274	242	242
query82	568	157	123	123
query83	363	276	250	250
query84	256	137	110	110
query85	871	538	430	430
query86	368	310	293	293
query87	3365	3398	3185	3185
query88	3639	2740	2731	2731
query89	441	376	328	328
query90	1857	184	170	170
query91	172	168	134	134
query92	63	63	58	58
query93	1443	1406	825	825
query94	504	347	267	267
query95	683	474	351	351
query96	1042	768	350	350
query97	2673	2696	2571	2571
query98	207	206	201	201
query99	1165	1173	1029	1029
Total cold run time: 250638 ms
Total hot run time: 168863 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 90.91% (10/11) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 4.81% (10/208) 🎉
Increment coverage report
Complete coverage report
