[fix](asof_join) PhysicalHashJoin's computeUniform method should process asof join properly #62730

Draft
starocean999 wants to merge 3 commits into apache:master from starocean999:master_0423

@starocean999
Contributor

What problem does this PR solve?

Extend PhysicalHashJoin to handle the ASOF join variants correctly, so that trait propagation, equal-set extraction, and functional dependency (FD) calculations work for ASOF joins.
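
To illustrate the kind of change described above, here is a minimal sketch of a join-type switch in which the ASOF variants share the case labels of their regular counterparts. It is a hypothetical toy model, not the Doris source: the class `AsofUniformSketch`, the string-based slot sets, the `OTHER` placeholder type, and the per-side rule for outer joins are all assumptions made for illustration. Only the four `ASOF_*` names come from this thread.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified model of uniform-slot propagation; not the Doris source.
class AsofUniformSketch {
    // The ASOF names appear in the review thread; OTHER is a made-up placeholder
    // for any join type that the switch below does not list.
    enum JoinType {
        INNER_JOIN, LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN,
        ASOF_LEFT_INNER_JOIN, ASOF_RIGHT_INNER_JOIN,
        ASOF_LEFT_OUTER_JOIN, ASOF_RIGHT_OUTER_JOIN,
        OTHER
    }

    // Returns the uniform slots this toy model propagates from the two children.
    static Set<String> computeUniform(JoinType type, Set<String> left, Set<String> right) {
        Set<String> uniform = new LinkedHashSet<>();
        switch (type) {
            case INNER_JOIN:
            case ASOF_LEFT_INNER_JOIN:
            case ASOF_RIGHT_INNER_JOIN:
                uniform.addAll(left);
                uniform.addAll(right);
                break;
            case LEFT_OUTER_JOIN:
            case ASOF_LEFT_OUTER_JOIN:
                // Assumption: this toy model keeps only the preserved (left) side.
                uniform.addAll(left);
                break;
            case RIGHT_OUTER_JOIN:
            case ASOF_RIGHT_OUTER_JOIN:
                // Assumption: this toy model keeps only the preserved (right) side.
                uniform.addAll(right);
                break;
            default:
                // A join type without a case label propagates nothing.
                break;
        }
        return uniform;
    }

    public static void main(String[] args) {
        Set<String> left = Set.of("l.k");
        Set<String> right = Set.of("r.k");
        for (JoinType type : JoinType.values()) {
            System.out.println(type + " -> " + computeUniform(type, left, right));
        }
    }
}
```

In this model, a join type that has no case label falls into `default` and silently loses its uniform slots, which is the failure mode a missing ASOF case would produce.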

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@starocean999
Contributor Author

/review

@Thearas
Contributor

Thearas commented Apr 23, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

@github-actions github-actions Bot left a comment

Found 1 blocking issue.

  1. fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/physical/PhysicalHashJoin.java: this switch update is logically consistent with the existing LogicalJoin handling, but there is still no test that covers the exact runtime path this PR fixes. Current tests cover logical trait derivation and shuffle/output-property derivation; the regression here is specifically in the recomputation of logical properties on the physical plan after implementation, when RecomputeLogicalPropertiesProcessor rebuilds traits from PhysicalHashJoin.computeUniform(). Please add a unit test that drives an ASOF hash join through that physical/post-process path and asserts the expected uniform trait.

Critical checkpoint conclusions:

  • Goal and correctness: The goal is to propagate uniform traits for ASOF hash joins during physical logical-property recomputation. The code change matches the existing logical ASOF handling and looks correct, but there is no test proving this physical path.
  • Scope: The modification is small, clear, and focused.
  • Concurrency: Not involved.
  • Lifecycle: The relevant lifecycle is LogicalJoin -> PhysicalHashJoin -> RecomputeLogicalPropertiesProcessor; this PR addresses a real recomputation path.
  • Configuration / compatibility / persistence / FE-BE protocol: Not applicable.
  • Parallel code paths: I checked the nearby logical join and distribution-property derivation code. Those ASOF paths are already covered; this PR fills the physical computeUniform gap.
  • Special conditions: The added switch cases are straightforward and consistent with existing join-type handling.
  • Test coverage: Incomplete for the bug being fixed.
  • Observability: Not applicable.
  • User focus: No additional focus points were provided.

I did not find another distinct correctness issue in the code change itself.
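
The coverage gap described above can be pictured with a small sketch of a cached trait that is reset and recomputed. This is a hypothetical model, not the Doris API: `RecomputeSketch`, `Node`, `uniform()`, and `reset()` are invented names, and it assumes that a physical node starts out with traits inherited from the logical plan and only calls its own computation after a reset.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical model of a cached trait that can be reset and recomputed; not the Doris API.
class RecomputeSketch {
    static final class Node {
        private final Supplier<Set<String>> computeUniform;
        private Set<String> cached;

        // 'inherited' models traits copied from an earlier plan stage; it may be null.
        Node(Supplier<Set<String>> computeUniform, Set<String> inherited) {
            this.computeUniform = computeUniform;
            this.cached = inherited;
        }

        // Returns the cached trait, computing it only when nothing is cached.
        Set<String> uniform() {
            if (cached == null) {
                cached = computeUniform.get();
            }
            return cached;
        }

        // Drops the cache so the next uniform() call uses this node's own computation.
        void reset() {
            cached = null;
        }
    }

    public static void main(String[] args) {
        Node logical = new Node(() -> Set.of("k"), null);
        // A physical node whose own computation misses the join type and yields nothing.
        Node physical = new Node(Collections::emptySet, logical.uniform());
        System.out.println("before reset: " + physical.uniform());
        physical.reset();
        System.out.println("after reset:  " + physical.uniform());
    }
}
```

Under these assumptions, a test that inspects the node only before the reset sees the inherited trait and passes, so it cannot detect a gap in the physical node's own computation. Only a test that goes through the reset step exercises that code.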

@starocean999
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

I found one blocking issue in the added regression coverage. The production change in PhysicalHashJoin.computeUniform() looks aligned with the existing LogicalJoin handling for ASOF joins, but the new tests do not actually provide a valid physical-plan recompute path.

Critical checkpoints:

  • Goal and correctness: The implementation change appears to accomplish the ASOF uniform-propagation fix, but the PR still lacks a valid passing regression test for the recomputed physical-plan path because the new tests are built on an invalid plan shape.
  • Scope/minimality: The production diff is small and focused. The test addition is larger than necessary, and its current setup is incorrect.
  • Concurrency: Not involved.
  • Lifecycle/static initialization: Not involved.
  • Configuration changes: None.
  • Compatibility / FE-BE protocol / persistence / data writes: Not involved.
  • Parallel code paths: I checked the other relevant physical join path; PhysicalNestedLoopJoin blocks FD propagation, so PhysicalHashJoin is the right implementation site.
  • Special conditional checks: No additional concern beyond the ASOF join-type cases added here.
  • Test coverage: Blocking gap. The existing concern about physical-plan coverage is not fully resolved because the newly added tests fail before they can validate the recompute path.
  • Test results: I could not run ChildOutputPropertyDeriverTest end-to-end in this runner because fe-core depends on org.apache.doris:fe-foundation:1.2-SNAPSHOT, which is unavailable here. Independent code-path inspection shows the new tests will throw ClassCastException when the processor visits GroupPlan.
  • Observability / performance: No concerns from the production change.
  • Other issues: None beyond the invalid tests.

User focus: no additional review focus was provided.
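
As a rough illustration of the ClassCastException mentioned above, the sketch below walks a plan tree bottom-up and casts every node to a physical-plan type. All types here are hypothetical stand-ins: only the name `GroupPlan` is taken from the comment, `resetLogicalProperties` borrows a method name from this thread with an invented signature, and the real processor may be structured differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins; only the name GroupPlan comes from the review comment.
class GroupPlanCastSketch {
    interface Plan {
        List<Plan> children();
    }

    interface PhysicalPlan extends Plan {
        void resetLogicalProperties();
    }

    // Placeholder node: a Plan, but not a PhysicalPlan.
    static final class GroupPlan implements Plan {
        public List<Plan> children() {
            return Collections.emptyList();
        }
    }

    static final class PhysicalNode implements PhysicalPlan {
        private final List<Plan> children;

        PhysicalNode(Plan... children) {
            this.children = Arrays.asList(children);
        }

        public List<Plan> children() {
            return children;
        }

        public void resetLogicalProperties() {
            // No state to reset in this toy model.
        }
    }

    // Bottom-up walk that assumes every node is physical; returns the visited node count.
    static int recompute(Plan plan) {
        int visited = 0;
        for (Plan child : plan.children()) {
            visited += recompute(child);
        }
        // Throws ClassCastException when 'plan' is a GroupPlan placeholder.
        ((PhysicalPlan) plan).resetLogicalProperties();
        return visited + 1;
    }

    public static void main(String[] args) {
        Plan valid = new PhysicalNode(new PhysicalNode(), new PhysicalNode());
        System.out.println("visited " + recompute(valid) + " nodes");

        Plan invalid = new PhysicalNode(new GroupPlan(), new GroupPlan());
        try {
            recompute(invalid);
        } catch (ClassCastException e) {
            System.out.println("invalid plan shape: " + e.getClass().getSimpleName());
        }
    }
}
```

In this model, the fix on the test side is to build the join over fully physical children, so the walk never meets a placeholder.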

@starocean999
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

No additional findings on the current head.

Critical checkpoints:

  • Goal / correctness: The patch aligns PhysicalHashJoin.computeUniform() with the existing ASOF handling in LogicalJoin, so recomputed physical traits now preserve uniform slots for ASOF inner and outer joins. The added unit tests cover all four ASOF variants through the same resetLogicalProperties() + copyStatsAndGroupIdFrom() flow used by RecomputeLogicalPropertiesProcessor.
  • Scope / focus: The change is small and focused: one missing physical-join switch update plus targeted unit coverage.
  • Concurrency: No concurrency-sensitive code paths are introduced or modified.
  • Lifecycle / static init: No special lifecycle or static-initialization concerns are involved.
  • Config / compatibility / persistence: No new config, protocol, persistence, or rolling-upgrade compatibility concerns.
  • Parallel code paths: LogicalJoin and ChildOutputPropertyDeriver already handled ASOF variants; this change correctly brings PhysicalHashJoin back into alignment with those paths.
  • Tests: Unit coverage is present for ASOF_LEFT_INNER_JOIN, ASOF_RIGHT_INNER_JOIN, ASOF_LEFT_OUTER_JOIN, and ASOF_RIGHT_OUTER_JOIN after logical-property recomputation. I did not run FE tests locally because this runner is missing thirdparty/installed/bin/protoc.
  • Observability / performance: No additional observability is needed, and the runtime impact is negligible.
  • Focus points: No additional user-provided focus; nothing extra was outstanding there.

I treated the existing inline review threads as known context and did not find any new blocking issues beyond that context in this review pass.

@starocean999
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100% (0/0) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/45) 🎉
Increment coverage report
Complete coverage report

@starocean999
Contributor Author

run buildall

@starocean999
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29414 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 77042b65705badb955908385e2105c64b92a2853, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17633	3908	4183	3908
q2	
q3	10724	869	597	597
q4	4660	457	340	340
q5	7435	1338	1128	1128
q6	186	170	139	139
q7	916	952	743	743
q8	9296	1407	1267	1267
q9	5546	5325	5330	5325
q10	6314	2054	1810	1810
q11	489	257	255	255
q12	669	415	297	297
q13	18149	3686	2765	2765
q14	297	286	261	261
q15	
q16	900	875	786	786
q17	988	1038	809	809
q18	6388	5684	5528	5528
q19	1199	1300	1022	1022
q20	504	371	264	264
q21	4502	2277	1869	1869
q22	422	350	301	301
Total cold run time: 97217 ms
Total hot run time: 29414 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4152	4052	4045	4045
q2	
q3	4619	4752	4175	4175
q4	2076	2156	1371	1371
q5	4915	4961	5235	4961
q6	191	162	129	129
q7	2008	1755	1696	1696
q8	3552	3205	3182	3182
q9	8425	8383	8400	8383
q10	4459	4483	4240	4240
q11	660	445	407	407
q12	731	750	528	528
q13	3219	3596	2928	2928
q14	307	302	271	271
q15	
q16	767	775	685	685
q17	1351	1321	1338	1321
q18	8060	7189	7051	7051
q19	1137	1185	1155	1155
q20	2263	2260	1945	1945
q21	6217	5388	4810	4810
q22	552	500	420	420
Total cold run time: 59661 ms
Total hot run time: 53703 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170883 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 77042b65705badb955908385e2105c64b92a2853, data reload: false

query5	4346	651	518	518
query6	352	213	197	197
query7	4221	580	316	316
query8	326	222	214	214
query9	8811	4117	4080	4080
query10	514	348	308	308
query11	6022	2396	2187	2187
query12	188	130	130	130
query13	1296	634	443	443
query14	6882	5369	5083	5083
query14_1	4382	4384	4393	4384
query15	215	210	180	180
query16	1011	449	442	442
query17	1370	778	649	649
query18	2753	488	360	360
query19	292	207	179	179
query20	141	134	130	130
query21	220	132	120	120
query22	13557	14020	14505	14020
query23	17527	16530	16230	16230
query23_1	16294	16308	16289	16289
query24	7583	1743	1331	1331
query24_1	1313	1340	1332	1332
query25	538	480	425	425
query26	1276	314	168	168
query27	2693	563	332	332
query28	4314	1989	1947	1947
query29	1004	627	504	504
query30	298	231	196	196
query31	1108	1050	923	923
query32	83	71	72	71
query33	516	334	291	291
query34	1156	1127	650	650
query35	770	774	695	695
query36	1311	1363	1226	1226
query37	151	107	89	89
query38	3197	3121	3076	3076
query39	930	920	907	907
query39_1	881	896	876	876
query40	229	154	134	134
query41	62	63	57	57
query42	108	106	107	106
query43	351	320	281	281
query44	
query45	210	204	192	192
query46	1036	1191	723	723
query47	2309	2245	2128	2128
query48	415	389	310	310
query49	625	547	430	430
query50	704	286	206	206
query51	4299	4271	4209	4209
query52	116	105	95	95
query53	261	274	218	218
query54	312	267	261	261
query55	90	86	83	83
query56	292	315	288	288
query57	1436	1436	1314	1314
query58	302	268	268	268
query59	1535	1639	1421	1421
query60	345	331	318	318
query61	156	158	153	153
query62	659	619	560	560
query63	247	200	199	199
query64	2360	821	693	693
query65	
query66	1702	529	386	386
query67	29960	29947	29781	29781
query68	
query69	477	338	312	312
query70	1026	1001	982	982
query71	304	279	280	279
query72	3072	2700	2458	2458
query73	835	714	423	423
query74	5065	4889	4724	4724
query75	2785	2654	2345	2345
query76	2307	1128	761	761
query77	406	439	346	346
query78	12945	12895	12440	12440
query79	1508	983	751	751
query80	1373	581	491	491
query81	519	276	238	238
query82	883	161	124	124
query83	350	269	244	244
query84	264	137	114	114
query85	927	517	446	446
query86	445	329	330	329
query87	3419	3349	3221	3221
query88	3558	2656	2635	2635
query89	435	383	336	336
query90	1916	178	184	178
query91	176	167	140	140
query92	81	74	71	71
query93	1111	955	557	557
query94	717	347	261	261
query95	680	450	336	336
query96	980	760	334	334
query97	2713	2674	2593	2593
query98	233	231	248	231
query99	1123	1115	971	971
Total cold run time: 255267 ms
Total hot run time: 170883 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100% (0/0) 🎉
Increment coverage report
Complete coverage report

3 participants