[Feature Request][Spark] Optimize Min/Max using Delta metadata #2092

@felipepessoto

Description
Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

Follow-up of #1192, which optimized COUNT. Add support for answering MIN and MAX queries using table statistics only, without scanning data files.
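As a rough illustration of the idea (not the actual Delta implementation), each AddFile action in the Delta log can carry a `stats` JSON blob with per-column `minValues`/`maxValues`. A global MIN/MAX can then be folded from those per-file stats, provided every file has stats for the queried column. The stats rows and helper below are hypothetical:

```python
import json

# Hypothetical per-file statistics, in the shape Delta writes into each
# AddFile action's `stats` field (JSON with minValues/maxValues per column).
add_file_stats = [
    '{"numRecords": 100, "minValues": {"id": 1},   "maxValues": {"id": 250}}',
    '{"numRecords": 80,  "minValues": {"id": 200}, "maxValues": {"id": 900}}',
    '{"numRecords": 50,  "minValues": {"id": 5},   "maxValues": {"id": 60}}',
]

def min_max_from_stats(stats_rows, column):
    """Answer MIN/MAX for `column` from file-level stats alone.

    Returns None when any file lacks stats for the column, in which
    case the query would have to fall back to scanning the data.
    """
    parsed = [json.loads(s) for s in stats_rows]
    if any(column not in p.get("minValues", {}) for p in parsed):
        return None
    return (min(p["minValues"][column] for p in parsed),
            max(p["maxValues"][column] for p in parsed))

print(min_max_from_stats(add_file_stats, "id"))  # (1, 900)
```

This is the same metadata-only strategy #1192 used for COUNT (via `numRecords`), extended to the min/max columns of the statistics.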

Motivation

Computing MIN/MAX by scanning the data can take dozens of minutes on huge tables. Reading the metadata would return the same answer in seconds.

Further details

PR: #1525

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.

cc: @scottsand-db, @Tom-Newton, @keen85, @henlue, @moredatapls, @khwj

Metadata

Assignees: no one assigned

Labels: enhancement (new feature or request)
