[Feature Request][Spark] Optimize Min/Max using Delta metadata #2092

Closed
felipepessoto opened this issue Sep 22, 2023 · 0 comments
Labels
enhancement New feature or request

felipepessoto commented Sep 22, 2023

Feature request

Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

Overview

Follow-up to #1192, which optimized COUNT. Add support for answering MIN and MAX aggregates from Delta file-level statistics only, without scanning the data files.
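For context, here is a minimal sketch, not the proposed implementation, of how the per-file `minValues`/`maxValues` statistics already recorded in the Delta transaction log can be aggregated into table-level MIN/MAX. The table path and the numeric column name `value` are hypothetical, and the sketch naively reads raw commit JSON, ignoring checkpoints, removed files, and stats truncation, all of which a real implementation would handle through the Delta snapshot:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal sketch only: answer table-level MIN/MAX from the per-file
// statistics in the Delta transaction log instead of scanning data files.
// Hypothetical table path and column name ("value", assumed numeric).
object MinMaxFromDeltaStats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("min-max-from-stats").getOrCreate()

    // Each AddFile action carries a JSON "stats" string with numRecords,
    // minValues, maxValues and nullCount per column.
    val addFiles = spark.read
      .json("/path/to/table/_delta_log/*.json")
      .where(col("add").isNotNull)
      .select(col("add.stats").as("stats"))

    val perFile = addFiles.select(
      get_json_object(col("stats"), "$.minValues.value").cast("long").as("fileMin"),
      get_json_object(col("stats"), "$.maxValues.value").cast("long").as("fileMax"))

    // The table MIN is the smallest per-file min; the table MAX is the
    // largest per-file max.
    perFile
      .agg(min("fileMin").as("tableMin"), max("fileMax").as("tableMax"))
      .show()

    spark.stop()
  }
}
```

With such support, a query like `SELECT MIN(value), MAX(value) FROM delta_table` could be answered from the log alone, mirroring what #1192 did for `COUNT(*)`.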

Motivation

Computing MIN/MAX with a full table scan can take dozens of minutes on huge tables; reading the statistics already stored in the Delta log returns the same values in seconds.

Further details

PR: #1525

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

- [x] Yes. I can contribute this feature independently.
- [ ] Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
- [ ] No. I cannot contribute this feature at this time.

cc: @scottsand-db, @Tom-Newton, @keen85, @henlue, @moredatapls, @khwj
