[feature](nereids) adjust min/max of column stats for cast function #21772

Merged
englefly merged 4 commits into apache:master from englefly:vivo
Jul 14, 2023

@englefly
Contributor

@englefly englefly commented Jul 12, 2023

Proposed changes

  1. For cast(A as date), where A is a string column, the min/max of the result column stats should be calculated as follows:
    convert A.minExpr to a date dateA, then take the double value of dateA.

  2. Add "explain memo plan select ..." to print the memo from the MySQL client.

  3. Dump column stats for FileScanNode, which is used in data lake scenarios.
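
The first item can be illustrated with a small sketch. This is not the Doris implementation (which lives in the Java frontend and is not shown on this page); it is a minimal Python illustration under stated assumptions. The function names, the `%Y-%m-%d` input format, the `yyyymmdd` numeric encoding, and the unbounded fallback range are all hypothetical. The PR only says that the string bound is converted to a date, that a double value is taken from that date, and that a conversion failure is caught. The sketch applies the same conversion to the max bound by symmetry.

```python
from datetime import date, datetime
from typing import Optional, Tuple


def date_to_double(d: date) -> float:
    # Hypothetical encoding: yyyymmdd as a float. It is only required to be
    # monotonic in date order; the encoding Doris actually uses is not stated here.
    return float(d.year * 10000 + d.month * 100 + d.day)


def cast_bound_to_date(bound_expr: str) -> Optional[float]:
    # Parse a string bound as a date and return its numeric value,
    # or None when the string is not a valid date.
    try:
        parsed = datetime.strptime(bound_expr, "%Y-%m-%d").date()
    except ValueError:
        return None
    return date_to_double(parsed)


def adjust_min_max_for_date_cast(min_expr: str, max_expr: str) -> Tuple[float, float]:
    # Derive the min/max of cast(A as date) from the string column's bounds.
    low = cast_bound_to_date(min_expr)
    high = cast_bound_to_date(max_expr)
    if low is None or high is None:
        # Assumed fallback: treat the range as unknown (unbounded).
        return (float("-inf"), float("inf"))
    return (low, high)


if __name__ == "__main__":
    print(adjust_min_max_for_date_cast("2023-01-05", "2023-07-14"))
    print(adjust_min_max_for_date_cast("not-a-date", "2023-07-14"))
```

With this encoding, the first call prints `(20230105.0, 20230714.0)` and the second prints `(-inf, inf)`. The point of the conversion is that the resulting bounds are comparable with date literals in predicates, which the raw string bounds are not.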

Issue Number: close #xxx


englefly and others added 2 commits July 12, 2023 20:42
catch exception if minExpr cannot be converted to date
@englefly
Contributor Author

run buildall

@englefly englefly changed the title from "Vivo" to "string to date min max" Jul 12, 2023
@morningman morningman added the dev/2.0.0 2.0.0 release label Jul 12, 2023
@hello-stephen
Contributor

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 51.26 seconds
stream load tsv: 506 seconds loaded 74807831229 Bytes, about 140 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 33.4 seconds inserted 10000000 Rows, about 299K ops/s
storage size: 17162493751 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230713010233_clickbench_pr_177539.html

@englefly
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 52.61 seconds
stream load tsv: 511 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 33.7 seconds inserted 10000000 Rows, about 296K ops/s
storage size: 17165427298 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230713093259_clickbench_pr_177592.html

@englefly englefly marked this pull request as ready for review July 13, 2023 08:44
@englefly englefly changed the title from "string to date min max" to "[feature](nereids) adjust min/max of column stats for cast function" Jul 13, 2023
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 13, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morrySnow
Contributor

run beut feut clickbench

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 51.42 seconds
stream load tsv: 509 seconds loaded 74807831229 Bytes, about 140 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17160485797 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230714115452_clickbench_pr_178322.html

@englefly englefly merged commit 62214cd into apache:master Jul 14, 2023
@englefly englefly deleted the vivo branch July 14, 2023 04:54
BiteTheDDDDt pushed a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Jul 14, 2023
…pache#21772)

cast(A as date), where A is a string column. the min/max of result column stats should be calc like this:
convert A.minExpr to a date dateA, and then get double value from dateA.

add "explain memo plan select ..." to print memo from mysql client

dump column stats for FileScanNode, used in datalake.
@xiaokang xiaokang added dev/2.0.0-merged and removed dev/2.0.0 2.0.0 release labels Jul 14, 2023
xiaokang pushed a commit that referenced this pull request Jul 14, 2023
…21772)

morningman pushed a commit to morningman/doris that referenced this pull request Jul 20, 2023
…pache#21772)

LHG41278 pushed a commit to LHG41278/dorisMine that referenced this pull request Jul 20, 2023
…pache#21772)

Labels

approved Indicates a PR has been approved by one committer. area/nereids dev/2.0.0-merged reviewed

6 participants