
[Fix](Nereids) fix column statistic derive in outer join estimation#25586

Merged
englefly merged 1 commit into apache:master from LiBinfeng-01:fix_nereids_statistic
Oct 24, 2023

@LiBinfeng-01
Contributor

@LiBinfeng-01 LiBinfeng-01 commented Oct 18, 2023

Proposed changes

Problem:
During join estimation, the NDV statistics of a join's output slots, which upper joins consume, could be derived incorrectly.
Example:
We have two tables:
tableA (a1 [ndv = 10.0]) and tableB (b1 [ndv = 0.0], b2 [ndv = 10.0])
tableA LEFT JOIN tableB ON A.a1 = B.b1, where B.b1 has an NDV of zero.
After join estimation, the NDV of B.b2 changed to 1.0.
Reason:
When estimating an outer join, we can assume it behaves like an inner join, but the estimation then also updated the output slot statistics exactly as the inner-join estimation does.
Solved:
When estimating an outer join, the output slot statistics are now updated separately.
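The mechanism can be illustrated with a toy Java model. This is a hypothetical sketch, not the Nereids estimator: the class and method names, the inner-join cardinality formula, the rule that a zero-NDV join key matches nothing, and the row count of 10 per table are all assumptions (the PR states only the NDVs). It shows how capping every output slot by the inner-join row estimate drives B.b2 down to 1.0, and how updating each side's slots separately, against the outer-join row count, keeps it at 10.0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical model of the NDV problem described in this PR. It is NOT the
 * Nereids code: class name, method names and formulas are illustrative assumptions.
 */
public final class OuterJoinNdvSketch {

    // Assumed inner-join row estimate: |L| * |R| / max(ndvL, ndvR).
    // A join key with NDV 0 (e.g. an all-NULL column) is assumed to match
    // nothing; the estimate is floored at 1 row to avoid a zero cardinality.
    static double innerJoinRows(double leftRows, double rightRows,
                                double leftKeyNdv, double rightKeyNdv) {
        if (leftKeyNdv <= 0 || rightKeyNdv <= 0) {
            return 1.0;
        }
        return Math.max(1.0, leftRows * rightRows / Math.max(leftKeyNdv, rightKeyNdv));
    }

    // A column cannot have more distinct values than its relation has rows.
    static Map<String, Double> capNdv(Map<String, Double> ndvs, double rows) {
        Map<String, Double> out = new LinkedHashMap<>();
        ndvs.forEach((col, ndv) -> out.put(col, Math.min(ndv, rows)));
        return out;
    }

    // Buggy variant: every output slot of the left outer join is capped by the
    // inner-join estimate, so B.b2 collapses to 1.0.
    static Map<String, Double> leftJoinNdvBuggy(Map<String, Double> left, Map<String, Double> right,
                                                double leftRows, double rightRows,
                                                double leftKeyNdv, double rightKeyNdv) {
        double innerRows = innerJoinRows(leftRows, rightRows, leftKeyNdv, rightKeyNdv);
        Map<String, Double> all = new LinkedHashMap<>(left);
        all.putAll(right);
        return capNdv(all, innerRows);
    }

    // Fixed variant (assumed policy): a left outer join keeps every left row,
    // so left slots keep their input NDV and right slots are capped by the
    // outer-join row count, which is at least leftRows.
    static Map<String, Double> leftJoinNdvFixed(Map<String, Double> left, Map<String, Double> right,
                                                double leftRows, double rightRows,
                                                double leftKeyNdv, double rightKeyNdv) {
        double innerRows = innerJoinRows(leftRows, rightRows, leftKeyNdv, rightKeyNdv);
        double outerRows = Math.max(innerRows, leftRows);
        Map<String, Double> out = new LinkedHashMap<>(left);
        out.putAll(capNdv(right, outerRows));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> tableA = new LinkedHashMap<>();
        tableA.put("a1", 10.0);
        Map<String, Double> tableB = new LinkedHashMap<>();
        tableB.put("b1", 0.0);
        tableB.put("b2", 10.0);
        // Row counts of 10 per table are assumptions; the PR only gives NDVs.
        System.out.println(leftJoinNdvBuggy(tableA, tableB, 10, 10, 10.0, 0.0));
        System.out.println(leftJoinNdvFixed(tableA, tableB, 10, 10, 10.0, 0.0));
    }
}
```

Running `main` prints the buggy and fixed NDV maps; in this model b2 drops to 1.0 in the first and stays at 10.0 in the second. The point being illustrated is only that a left outer join never produces fewer rows than its left input, so right-side slot statistics should not be bounded by the inner-join estimate.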


@LiBinfeng-01
Contributor Author

run buildall

englefly previously approved these changes Oct 18, 2023
@github-actions github-actions Bot added the approved label (Indicates a PR has been approved by one committer.) Oct 18, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@wm1581066 wm1581066 added the usercase label (Important user case type label) Oct 18, 2023
@wm1581066 wm1581066 requested a review from jackwener October 18, 2023 13:02
@LiBinfeng-01
Contributor Author

run buildall

@LiBinfeng-01 LiBinfeng-01 force-pushed the fix_nereids_statistic branch from ee490e7 to 740123a October 24, 2023 10:06
@LiBinfeng-01
Contributor Author

run buildall

@LiBinfeng-01
Contributor Author

run buildall

@github-actions github-actions Bot removed the approved label (Indicates a PR has been approved by one committer.) Oct 24, 2023
@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.7 seconds
stream load tsv: 552 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17162004858 Bytes

@LiBinfeng-01
Contributor Author

run buildall

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.77 seconds
stream load tsv: 550 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 68 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162273964 Bytes
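Both TeamCity reports give throughput as rounded "about …" figures. The small Java sketch below re-derives them from the raw byte, row and second counts. It rests on assumptions inferred only from the numbers, not stated in the reports: "MB" means 2^20 bytes, "K" means 1000, and values are truncated rather than rounded. The class and method names are hypothetical.

```java
/**
 * Re-derives the rounded throughput figures in the clickbench reports.
 * Assumptions (inferred from the numbers, not stated in the reports):
 * "MB" is 2^20 bytes, "K" is 1000, and results are truncated, not rounded.
 */
public final class ClickbenchThroughput {

    // Bytes per second expressed in 2^20-byte units, truncated toward zero.
    static long mibPerSecond(long bytes, long seconds) {
        return bytes / seconds / (1L << 20);
    }

    // Rows per second in thousands, truncated toward zero.
    static long kRowsPerSecond(long rows, double seconds) {
        return (long) Math.floor(rows / seconds / 1000.0);
    }

    public static void main(String[] args) {
        // First report.
        System.out.println(mibPerSecond(74807831229L, 552)); // stream load tsv
        System.out.println(mibPerSecond(2358488459L, 20));   // stream load json
        System.out.println(mibPerSecond(1101869774L, 64));   // stream load orc
        System.out.println(mibPerSecond(861443392L, 32));    // stream load parquet
        System.out.println(kRowsPerSecond(10000000L, 28.8)); // insert into select
        // Second report.
        System.out.println(mibPerSecond(74807831229L, 550)); // stream load tsv
        System.out.println(mibPerSecond(1101869774L, 68));   // stream load orc
        System.out.println(mibPerSecond(861443392L, 33));    // stream load parquet
        System.out.println(kRowsPerSecond(10000000L, 29.0)); // insert into select
    }
}
```

Under these assumptions every printed value matches the corresponding "about …" figure in both reports; rounding to nearest instead of truncating would give 26 MB/s for the first parquet load and 130 MB/s for the second tsv load, which the reports do not show.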

@github-actions github-actions Bot added the approved label (Indicates a PR has been approved by one committer.) Oct 24, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@englefly englefly merged commit 4403451 into apache:master Oct 24, 2023
xiaokang pushed a commit that referenced this pull request Oct 25, 2023
…25586)
dutyu pushed a commit to dutyu/doris that referenced this pull request Oct 28, 2023
…pache#25586)
gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
…pache#25586)
@xiaokang xiaokang mentioned this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…pache#25586)

Labels

approved (Indicates a PR has been approved by one committer.), dev/2.0.3-merged, reviewed, usercase (Important user case type label)


7 participants