[SPARK-39989][SQL] Support estimate column statistics if it is foldable expression #37421

Closed
wangyum wants to merge 2 commits

@wangyum wangyum commented Aug 5, 2022

What changes were proposed in this pull request?

This PR adds support for estimating the column statistics of a foldable expression. For example, it estimates the column statistics of 'a' AS a in SELECT 'a' AS a FROM tbl.

  1. If the foldable expression is null:
    ColumnStat(Some(0), None, None, Some(rowCount), Some(size), Some(size), None, 2)
  2. If the foldable expression is not null:
    ColumnStat(Some(1), Some(value), Some(value), Some(0), Some(size), Some(size), None, 2)
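
The two cases above can be sketched as a single function. This is a minimal illustration, not the code from the PR: the `ColumnStat` case class below is a simplified stand-in whose field order follows the calls quoted above (distinct count, min, max, null count, average length, maximum length, histogram, version), and `FoldableStatsSketch.foldableColumnStat` is a hypothetical helper name. The sketch assumes the constant has already been evaluated, with `None` modelling SQL NULL, and takes `size` as an input because the description does not say how it is derived.

```scala
// Simplified stand-in for Spark's ColumnStat (assumption: only the field order
// is mirrored from the calls quoted in the PR description).
final case class ColumnStat(
    distinctCount: Option[BigInt] = None,
    min: Option[Any] = None,
    max: Option[Any] = None,
    nullCount: Option[BigInt] = None,
    avgLen: Option[Long] = None,
    maxLen: Option[Long] = None,
    histogram: Option[Nothing] = None, // histograms are not modelled in this sketch
    version: Int = 2)

object FoldableStatsSketch {
  // Hypothetical helper. `value` is the evaluated constant (None models SQL NULL),
  // `rowCount` is the estimated number of rows, `size` is the value size in bytes.
  def foldableColumnStat(value: Option[Any], rowCount: BigInt, size: Long): ColumnStat =
    value match {
      case None =>
        // Constant NULL: no distinct values, no min/max, every row is null.
        ColumnStat(Some(BigInt(0)), None, None, Some(rowCount), Some(size), Some(size), None, 2)
      case Some(v) =>
        // Non-null constant: one distinct value, min == max == value, no nulls.
        ColumnStat(Some(BigInt(1)), Some(v), Some(v), Some(BigInt(0)), Some(size), Some(size), None, 2)
    }
}
```

For instance, `FoldableStatsSketch.foldableColumnStat(Some("a"), BigInt(100), 20L)` gives a distinct count of 1, min and max both equal to "a", and a null count of 0, whatever the row count is.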

Why are the changes needed?

Improve column statistics.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label Aug 5, 2022

@beliefer beliefer left a comment

LGTM

@dongjoon-hyun

Thank you, @wangyum , @beliefer , @HyukjinKwon .
Merged to master for Apache Spark 3.4.

@wangyum wangyum deleted the SPARK-39989 branch August 10, 2022 05:06
wangyum added a commit that referenced this pull request May 26, 2023
…tistics if it is foldable expression (#1031)

Closes #37421 from wangyum/SPARK-39989.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Yuming Wang <wgyumg@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

(cherry picked from commit d77bc70)