
[fix](nereids) Fix query rewrite by mv fail when self join #29227

Merged
morrySnow merged 4 commits into apache:master from seawinde:fix_self_join_mv_rewrite_fail on Dec 29, 2023



@seawinde
Member

@seawinde commented Dec 28, 2023

Proposed changes

Fix query rewrite by materialized view (MV) failing when the query contains a self join. After this fix, a query like the following can be rewritten using the MV:

```groovy
def mv = """
    select 
    a.o_orderkey,
    count(distinct a.o_orderstatus) num1,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate = '2023-12-08' AND b.o_orderdate = '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num2,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate >= '2023-12-01' AND a.o_orderdate <= '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num3,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority in (1,2) AND a.o_orderdate >= '2023-12-08' AND b.o_orderdate <= '2023-12-09' THEN a.o_shippriority-b.o_custkey ELSE 0 END) num4,
    AVG(a.o_totalprice) num5,
    MAX(b.o_totalprice) num6,
    MIN(a.o_totalprice) num7
    from
    orders a
    left outer join orders b
    on a.o_orderkey = b.o_orderkey
    and a.o_custkey = b.o_custkey
    group by a.o_orderkey;
    """

def query = """
    select 
    a.o_orderkey,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate = '2023-12-08' AND b.o_orderdate = '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num2,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate >= '2023-12-01' AND a.o_orderdate <= '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num3,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority in (1,2) AND a.o_orderdate >= '2023-12-08' AND b.o_orderdate <= '2023-12-09' THEN a.o_shippriority-b.o_custkey ELSE 0 END) num4,
    AVG(a.o_totalprice) num5,
    MAX(b.o_totalprice) num6,
    MIN(a.o_totalprice) num7
    from
    orders a
    left outer join orders b
    on a.o_orderkey = b.o_orderkey
    and a.o_custkey = b.o_custkey
    group by a.o_orderkey;
    """
```
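Both sides of the join scan `orders`, so the table name alone cannot tell a rewriter which MV relation plays `a` and which plays `b`. Every same-table assignment has to be considered, and the ones that fail later checks are dropped.

Below is a minimal sketch of that enumeration. It assumes each scan is identified by a relation id plus a table name. The `Relation` record, the method names and the ids #0 to #3 are hypothetical illustrations, not Doris classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SelfJoinRelationMapping {
    // Hypothetical relation: one table scan in a plan, identified by a per-plan relation id.
    public record Relation(int relationId, String table) {}

    // Enumerates every one-to-one mapping from query relations to MV relations in which
    // each query relation is paired with an MV relation over the same table.
    public static List<Map<Integer, Integer>> enumerate(List<Relation> query, List<Relation> mv) {
        List<Map<Integer, Integer>> result = new ArrayList<>();
        extend(query, mv, 0, new LinkedHashMap<>(), new HashSet<>(), result);
        return result;
    }

    private static void extend(List<Relation> query, List<Relation> mv, int index,
                               Map<Integer, Integer> current, Set<Integer> usedMv,
                               List<Map<Integer, Integer>> result) {
        if (index == query.size()) {
            result.add(new LinkedHashMap<>(current)); // a complete candidate mapping
            return;
        }
        Relation q = query.get(index);
        for (Relation m : mv) {
            // Same table, and this MV relation is not already taken by another query relation.
            if (m.table().equals(q.table()) && usedMv.add(m.relationId())) {
                current.put(q.relationId(), m.relationId());
                extend(query, mv, index + 1, current, usedMv, result);
                current.remove(q.relationId());
                usedMv.remove(m.relationId());
            }
        }
    }

    public static void main(String[] args) {
        // Query: orders a (#0) LEFT JOIN orders b (#1); MV: orders a (#2) LEFT JOIN orders b (#3).
        List<Relation> query = List.of(new Relation(0, "orders"), new Relation(1, "orders"));
        List<Relation> mv = List.of(new Relation(2, "orders"), new Relation(3, "orders"));
        for (Map<Integer, Integer> candidate : enumerate(query, mv)) {
            System.out.println(candidate); // {0=2, 1=3} then {0=3, 1=2}
        }
    }
}
```

Only `{0=2, 1=3}` keeps `a` on the preserved side of the LEFT OUTER JOIN. Under this assumed design, the swapped candidate would be rejected by the later join and predicate checks, which are not shown here.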


@seawinde
Copy link
Member Author

run buildall

```diff
 }
 Set<Long> sourceTableKeySet = sourceTableRelationIdMap.keySet();
-List<List<Pair<MappedRelation, MappedRelation>>> mappedRelations = new ArrayList<>();
+List<List<RelationMapping>> mappedRelations = new ArrayList<>();
```
Contributor


Maybe in the future you could refactor this code to use ImmutableEquivalenceSet here.

Contributor


I think we should do it right now.

Member Author

@seawinde Dec 29, 2023


I tried ImmutableEquivalenceSet, but it may not suit this scenario. For example, I want to build the relation mapping
RelationId#1 -> RelationId#2, which must preserve its direction.
After calling ImmutableEquivalenceSet.addEqualPair and then tryToMap, I got the result

RelationId#2 -> RelationId#1
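An equivalence set is symmetric. "#1 equals #2" and "#2 equals #1" are the same fact, so reading a mapping back out of it can come out in either direction.

The toy union-find below illustrates this. It is a hypothetical stand-in, not Doris's ImmutableEquivalenceSet. Its rule for choosing a class representative (lowest string) is an assumption picked only to reproduce the reversed result described above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy union-find standing in for an equivalence set; NOT Doris's ImmutableEquivalenceSet.
public class EquivalenceLosesDirection {
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of x's equivalence class (with path compression).
    public String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (p.equals(x)) {
            return x;
        }
        String root = find(p);
        parent.put(x, root);
        return root;
    }

    // Declares a and b equal. The representative is picked by string order, not by
    // argument order, so the caller's intended direction (a -> b) is not recorded.
    public void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (ra.equals(rb)) {
            return;
        }
        if (ra.compareTo(rb) < 0) {
            parent.put(rb, ra);
        } else {
            parent.put(ra, rb);
        }
    }

    public static void main(String[] args) {
        EquivalenceLosesDirection set = new EquivalenceLosesDirection();
        set.union("RelationId#1", "RelationId#2"); // intended meaning: #1 -> #2

        // Reading a mapping back as "member -> representative" yields the reverse direction.
        for (String member : List.of("RelationId#1", "RelationId#2")) {
            String rep = set.find(member);
            if (!rep.equals(member)) {
                System.out.println(member + " -> " + rep); // RelationId#2 -> RelationId#1
            }
        }

        // A directed map keeps the orientation the caller asked for.
        Map<String, String> directed = new HashMap<>();
        directed.put("RelationId#1", "RelationId#2");
        System.out.println("RelationId#1 -> " + directed.get("RelationId#1")); // RelationId#1 -> RelationId#2
    }
}
```

Calling `union` with the arguments swapped produces the same classes. That symmetry is exactly why a relation mapping, which is directed, needs a directed structure.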

Member Author

@seawinde Dec 29, 2023


I have discarded Pair<MappedRelation, MappedRelation> and RelationMapping and now use HashBiMap directly instead. WDYT?
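Guava's HashBiMap fits a relation mapping because it has two properties:

- It is a directed key -> value map with an `inverse()` view.
- Its `put` throws IllegalArgumentException when the value is already bound to a different key, so two query relations cannot claim the same MV relation.

Below is a minimal plain-Java sketch of those two properties, written so it runs without Guava. The class, its methods and the relation ids are hypothetical. The actual change presumably uses `HashBiMap` itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Guava's HashBiMap<Integer, Integer>, reduced to what a
// query-relation -> MV-relation mapping needs: a directed lookup plus its inverse.
public class RelationBiMap {
    private final Map<Integer, Integer> forward = new HashMap<>();  // query relation id -> MV relation id
    private final Map<Integer, Integer> backward = new HashMap<>(); // MV relation id -> query relation id

    // Like HashBiMap.put: rejects binding an MV relation that another query relation already owns.
    public void put(int queryRelationId, int mvRelationId) {
        Integer owner = backward.get(mvRelationId);
        if (owner != null && owner != queryRelationId) {
            throw new IllegalArgumentException("MV relation " + mvRelationId + " is already mapped");
        }
        Integer previous = forward.put(queryRelationId, mvRelationId);
        if (previous != null) {
            backward.remove(previous); // drop the stale inverse entry when a key is remapped
        }
        backward.put(mvRelationId, queryRelationId);
    }

    public Integer getMvRelation(int queryRelationId) {
        return forward.get(queryRelationId);
    }

    // Equivalent in spirit to bimap.inverse().get(mvRelationId).
    public Integer getQueryRelation(int mvRelationId) {
        return backward.get(mvRelationId);
    }

    public static void main(String[] args) {
        RelationBiMap mapping = new RelationBiMap();
        mapping.put(0, 2); // query orders a -> MV orders a
        mapping.put(1, 3); // query orders b -> MV orders b
        System.out.println(mapping.getMvRelation(0));    // 2
        System.out.println(mapping.getQueryRelation(3)); // 1
        try {
            mapping.put(4, 2); // a second query relation cannot take MV relation 2
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```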

@github-actions
Contributor

PR approved by anyone and no changes requested.

@seawinde
Member Author

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 47.23 seconds
stream load tsv: 581 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17183979502 Bytes
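The stream load rates above appear to be binary megabytes (2^20 bytes) per second, truncated to an integer. For example, 74807831229 bytes / 581 s is about 128.8 decimal MB/s but about 122.8 MiB/s, which matches the reported "about 122 MB/s". The insert rate looks like rows per second in thousands, also truncated.

The sketch below reproduces the reported figures under that assumption. The unit interpretation is inferred from the numbers, not stated by the bot.

```java
public class ClickBenchThroughput {
    // Assumption: the report's "MB/s" is bytes / seconds / 2^20 (MiB/s), truncated toward zero.
    public static long mibPerSecond(long bytes, long seconds) {
        return bytes / (seconds * 1024L * 1024L);
    }

    // Rows per second expressed in thousands, truncated (e.g. 347222 rows/s -> 347).
    public static long kiloRowsPerSecond(long rows, double seconds) {
        return (long) (rows / seconds / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println("tsv: " + mibPerSecond(74807831229L, 581) + " MB/s");          // 122
        System.out.println("json: " + mibPerSecond(2358488459L, 19) + " MB/s");           // 118
        System.out.println("orc: " + mibPerSecond(1101869774L, 66) + " MB/s");            // 15
        System.out.println("parquet: " + mibPerSecond(861443392L, 32) + " MB/s");         // 25
        System.out.println("insert: " + kiloRowsPerSecond(10_000_000L, 28.8) + "K ops/s"); // 347
    }
}
```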

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 68a3b5f313e9151327844c66ad30db044dea9eaf, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	5067	4677	4700	4677
q2	368	159	166	159
q3	1486	1330	1207	1207
q4	1137	932	867	867
q5	3156	3164	3163	3163
q6	255	129	128	128
q7	1028	492	506	492
q8	2265	2256	2259	2256
q9	6725	6675	6675	6675
q10	3199	3276	3272	3272
q11	329	219	217	217
q12	353	211	209	209
q13	4130	3417	3462	3417
q14	237	209	216	209
q15	579	524	526	524
q16	436	388	379	379
q17	1035	802	574	574
q18	7129	6803	7007	6803
q19	1628	1641	1654	1641
q20	518	319	299	299
q21	3142	2697	2767	2697
q22	371	301	309	301
Total cold run time: 44573 ms
Total hot run time: 40166 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	4610	4635	4612	4612
q2	271	177	171	171
q3	3385	3384	3363	3363
q4	2226	2211	2210	2210
q5	5719	5730	5714	5714
q6	245	124	122	122
q7	2364	1893	1852	1852
q8	3622	3633	3635	3633
q9	9060	8964	8948	8948
q10	3804	3914	3903	3903
q11	486	384	376	376
q12	773	598	604	598
q13	3912	3219	3202	3202
q14	288	257	253	253
q15	575	519	527	519
q16	511	445	454	445
q17	1984	1973	1961	1961
q18	8741	8300	8261	8261
q19	1778	1785	1780	1780
q20	2247	1952	1933	1933
q21	6163	5829	5790	5790
q22	562	467	471	467
Total cold run time: 63326 ms
Total hot run time: 60113 ms
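The four numbers per query can be read as cold run, hot run 1, hot run 2 and best hot run. That reading is consistent with the totals for the default-conf run on commit 68a3b5f:

- The cold-run column sums to the reported 44573 ms.
- The minimum of the two hot runs sums to the reported 40166 ms.

The small check below confirms this. The column interpretation is inferred from those sums, not stated by the bot.

```java
public class TpchTotals {
    // Per-query timings in ms from the default-conf run: {cold run, hot run 1, hot run 2}, q1..q22.
    public static final long[][] DEFAULT_CONF_RUNS = {
        {5067, 4677, 4700}, {368, 159, 166}, {1486, 1330, 1207}, {1137, 932, 867},
        {3156, 3164, 3163}, {255, 129, 128}, {1028, 492, 506}, {2265, 2256, 2259},
        {6725, 6675, 6675}, {3199, 3276, 3272}, {329, 219, 217}, {353, 211, 209},
        {4130, 3417, 3462}, {237, 209, 216}, {579, 524, 526}, {436, 388, 379},
        {1035, 802, 574}, {7129, 6803, 7007}, {1628, 1641, 1654}, {518, 319, 299},
        {3142, 2697, 2767}, {371, 301, 309}
    };

    // Returns {total cold run time, total of best (minimum) hot run times}.
    public static long[] totals(long[][] runs) {
        long cold = 0;
        long bestHot = 0;
        for (long[] r : runs) {
            cold += r[0];
            bestHot += Math.min(r[1], r[2]);
        }
        return new long[] {cold, bestHot};
    }

    public static void main(String[] args) {
        long[] t = totals(DEFAULT_CONF_RUNS);
        System.out.println("Total cold run time: " + t[0] + " ms"); // 44573 ms
        System.out.println("Total hot run time: " + t[1] + " ms");  // 40166 ms
    }
}
```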

@seawinde
Member Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 726176941fdab8d8e55c969b44eaec93385640a9, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	5463	5040	5060	5040
q2	416	169	158	158
q3	1496	1270	1237	1237
q4	1092	862	905	862
q5	3196	3044	3083	3044
q6	230	144	139	139
q7	989	581	529	529
q8	2168	2218	2282	2218
q9	6720	6721	6671	6671
q10	3201	3188	3108	3108
q11	337	225	225	225
q12	397	253	257	253
q13	4396	3633	3631	3631
q14	258	226	231	226
q15	637	562	562	562
q16	485	422	399	399
q17	1054	571	547	547
q18	7072	6720	6765	6720
q19	1644	1625	1605	1605
q20	582	370	358	358
q21	2925	2463	2450	2450
q22	380	332	342	332
Total cold run time: 45138 ms
Total hot run time: 40314 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	5058	5054	5047	5047
q2	344	250	259	250
q3	3364	3289	3269	3269
q4	2161	2027	2028	2027
q5	5953	5927	5923	5923
q6	234	135	133	133
q7	2388	1937	1917	1917
q8	3552	3678	3678	3678
q9	9087	9055	9002	9002
q10	3879	3931	3941	3931
q11	593	491	475	475
q12	831	640	675	640
q13	3872	3161	3193	3161
q14	307	270	278	270
q15	619	577	571	571
q16	551	509	513	509
q17	2035	1799	1796	1796
q18	8785	8313	8389	8313
q19	1758	1742	1708	1708
q20	2308	2009	1976	1976
q21	5807	5328	5308	5308
q22	531	499	521	499
Total cold run time: 64017 ms
Total hot run time: 60403 ms

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 47.91 seconds
stream load tsv: 578 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17183840362 Bytes

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Dec 29, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morrySnow merged commit 9fc613d into apache:master on Dec 29, 2023
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024

Labels: approved (Indicates a PR has been approved by one committer.), reviewed


4 participants