[SPARK-39989][SQL][FollowUp] Improve foldable expression stats estimate for string and binary #37532

Closed
linhongliu-db wants to merge 3 commits into apache:master from linhongliu-db:SPARK-39989

@linhongliu-db linhongliu-db commented Aug 16, 2022

What changes were proposed in this pull request?

This PR improves the foldable expression statistics estimation by providing more accurate min, max, and data length for string and binary data types.
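As a rough illustration of the idea (not the PR's actual code), a constant string or binary value has exactly one distinct value and an exactly known byte length, so its statistics can be derived directly from the literal. The `ConstantStats` class and `ConstantStatsSketch` object below are hypothetical stand-ins, not Spark's `ColumnStat` API. How the PR represents min and max for binary values is not shown in this thread, so the sketch leaves them unset.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical, simplified stand-in for per-column statistics; NOT Spark's ColumnStat.
final case class ConstantStats(
    distinctCount: Long,
    min: Option[String],
    max: Option[String],
    avgLen: Long,
    maxLen: Long)

object ConstantStatsSketch {
  // A constant string has one distinct value, so min == max == the value,
  // and the average and maximum lengths both equal its UTF-8 byte length.
  def forString(value: String): ConstantStats = {
    val len = value.getBytes(StandardCharsets.UTF_8).length.toLong
    ConstantStats(1L, Some(value), Some(value), len, len)
  }

  // A constant binary value has an exactly known length. min/max are left
  // unset because this sketch assumes no ordering for byte arrays.
  def forBinary(value: Array[Byte]): ConstantStats = {
    val len = value.length.toLong
    ConstantStats(1L, None, None, len, len)
  }
}
```

The point of the sketch is only that exact lengths replace a type-based default size guess when the expression is foldable.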

Why are the changes needed?

Improve the accuracy of the statistics.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests (UT).

@linhongliu-db linhongliu-db marked this pull request as draft August 16, 2022 05:09
@github-actions github-actions bot added the SQL label Aug 16, 2022
@linhongliu-db linhongliu-db marked this pull request as ready for review September 2, 2022 21:32
@linhongliu-db linhongliu-db changed the title [DRAFT][SPARK-39989][SQL][FollowUp] Improve foldable expression stats estimate [SPARK-39989][SQL][FollowUp] Improve foldable expression stats estimate for string and binary Sep 2, 2022
@linhongliu-db (Contributor Author) commented:

cc @cloud-fan @wangyum

object EstimationUtils {

/** Returns true iff we support column statistics on columns of the given type. */
def supportsType(dataType: DataType): Boolean = dataType match {
Contributor:

is it only for constants?

Contributor Author:

yes.

case DoubleType | FloatType => true
case BooleanType => true
case DateType => true
case TimestampType => true
@cloud-fan (Contributor), Sep 7, 2022:

how about AnsiIntervalType?
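To make the question concrete, here is a minimal sketch of what an extended type check might look like. It uses a hypothetical toy type hierarchy (`SketchType` and its members), not Spark's `DataType` classes, and the interval and string/binary cases are assumptions for illustration rather than what the PR implements. The interval case rests on the fact that Spark's ANSI interval types are stored as fixed-width numbers (months as an int, microseconds as a long), so min and max would be well defined for them.

```scala
// Hypothetical, simplified type hierarchy; these are NOT Spark's DataType classes.
sealed trait SketchType
case object SketchDouble extends SketchType
case object SketchFloat extends SketchType
case object SketchBoolean extends SketchType
case object SketchDate extends SketchType
case object SketchTimestamp extends SketchType
case object SketchYearMonthInterval extends SketchType
case object SketchDayTimeInterval extends SketchType
case object SketchString extends SketchType
case object SketchBinary extends SketchType
case object SketchArray extends SketchType

object SupportsTypeSketch {
  // Mirrors the shape of the supportsType match under review.
  def supportsType(t: SketchType): Boolean = t match {
    case SketchDouble | SketchFloat => true
    case SketchBoolean => true
    case SketchDate => true
    case SketchTimestamp => true
    // Assumed addition prompted by the review question above.
    case SketchYearMonthInterval | SketchDayTimeInterval => true
    // Assumed: constants of these types are covered, per the PR title.
    case SketchString | SketchBinary => true
    // Anything else (for example nested types) is treated as unsupported here.
    case _ => false
  }
}
```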

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 17, 2022
@github-actions github-actions bot closed this Dec 18, 2022