
[WIP] Add declared file scan output partitioning #22657

Draft
gene-bordegaray wants to merge 3 commits into apache:main from gene-bordegaray:gene.bordegaray/2026/05/file-scan-output-partitioning

Conversation

@gene-bordegaray
Contributor

@gene-bordegaray gene-bordegaray commented May 30, 2026

Which issue does this PR close?

Rationale for this change

This follows up on #22607 by replacing range-partitioning sqllogictest boilerplate with a general file/listing scan API for output partitioning.

Related: #22397, #21992, #22607, #22607 (comment), #22607 (comment).

What changes are included in this PR?

  • Add declared output_partitioning to file scan and listing table configuration.
  • Preserve declared partition counts during listing-table file grouping.
  • Carry declared scan output partitioning through proto (adds an output_partitioning field to FileScanExecConf).
  • Refactor range_partitioning.slt to eliminate boilerplate.
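The second bullet, preserving declared partition counts during file grouping, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in this PR: `plan_file_groups`, its parameters, the `Option<usize>` representation of a declared count, and the round-robin fallback are all hypothetical. The idea it shows is that an undeclared scan may regroup files freely up to `target_partitions`, while a declared scan keeps the caller's groups in order and pads with empty groups so the partition count stays exactly as declared.

```rust
/// Hypothetical sketch, not DataFusion code.
///
/// `files_by_partition` holds the files the caller assigned to each partition.
/// `declared` is the partition count the table declares, if any.
fn plan_file_groups(
    files_by_partition: Vec<Vec<String>>,
    target_partitions: usize,
    declared: Option<usize>,
) -> Result<Vec<Vec<String>>, String> {
    match declared {
        Some(n) => {
            // Declared case: never merge or reorder the caller's groups.
            if files_by_partition.len() > n {
                return Err(format!(
                    "{} file groups exceed declared partition count {}",
                    files_by_partition.len(),
                    n
                ));
            }
            let mut groups = files_by_partition;
            // Pad with empty groups so the scan still reports `n` partitions.
            groups.resize(n, Vec::new());
            Ok(groups)
        }
        None => {
            // Undeclared case: flatten and redistribute round-robin.
            let files: Vec<String> = files_by_partition.into_iter().flatten().collect();
            let n = target_partitions.max(1).min(files.len().max(1));
            let mut groups = vec![Vec::new(); n];
            for (i, file) in files.into_iter().enumerate() {
                groups[i % n].push(file);
            }
            Ok(groups)
        }
    }
}

fn main() {
    let input = vec![
        vec!["p0/a.parquet".to_string()],
        vec!["p1/b.parquet".to_string()],
    ];
    let kept = plan_file_groups(input.clone(), 1, Some(3)).unwrap();
    let merged = plan_file_groups(input, 1, None).unwrap();
    println!("declared: {} groups, undeclared: {} groups", kept.len(), merged.len());
}
```

With a declared count of 3 and two input groups, the sketch returns three groups with the last one empty; without a declaration and `target_partitions = 1`, both files land in a single group.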

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. This adds public API for declaring file/listing scan output partitioning. No breaking API changes.

@github-actions github-actions Bot added the core, sqllogictest, catalog, proto, and datasource labels May 30, 2026
@github-actions

github-actions Bot commented May 30, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion v53.1.0 (current)
       Built [ 103.256s] (current)
     Parsing datafusion v53.1.0 (current)
      Parsed [   0.038s] (current)
    Building datafusion v53.1.0 (baseline)
       Built [  99.691s] (baseline)
     Parsing datafusion v53.1.0 (baseline)
      Parsed [   0.037s] (baseline)
    Checking datafusion v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.848s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 206.257s] datafusion
    Building datafusion-catalog-listing v53.1.0 (current)
       Built [  42.926s] (current)
     Parsing datafusion-catalog-listing v53.1.0 (current)
      Parsed [   0.012s] (current)
    Building datafusion-catalog-listing v53.1.0 (baseline)
       Built [  42.467s] (baseline)
     Parsing datafusion-catalog-listing v53.1.0 (baseline)
      Parsed [   0.013s] (baseline)
    Checking datafusion-catalog-listing v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.100s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field ListingOptions.output_partitioning in /home/runner/work/datafusion/datafusion/datafusion/catalog-listing/src/options.rs:77

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  86.729s] datafusion-catalog-listing
    Building datafusion-datasource v53.1.0 (current)
       Built [  34.969s] (current)
     Parsing datafusion-datasource v53.1.0 (current)
      Parsed [   0.033s] (current)
    Building datafusion-datasource v53.1.0 (baseline)
       Built [  34.946s] (baseline)
     Parsing datafusion-datasource v53.1.0 (baseline)
      Parsed [   0.034s] (baseline)
    Checking datafusion-datasource v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.365s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  71.657s] datafusion-datasource
    Building datafusion-proto v53.1.0 (current)
       Built [  57.045s] (current)
     Parsing datafusion-proto v53.1.0 (current)
      Parsed [   0.019s] (current)
    Building datafusion-proto v53.1.0 (baseline)
       Built [  56.221s] (baseline)
     Parsing datafusion-proto v53.1.0 (baseline)
      Parsed [   0.021s] (baseline)
    Checking datafusion-proto v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.372s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 115.682s] datafusion-proto
    Building datafusion-proto-models v53.1.0 (current)
       Built [  23.250s] (current)
     Parsing datafusion-proto-models v53.1.0 (current)
      Parsed [   0.129s] (current)
    Building datafusion-proto-models v53.1.0 (baseline)
       Built [  23.338s] (baseline)
     Parsing datafusion-proto-models v53.1.0 (baseline)
      Parsed [   0.131s] (baseline)
    Checking datafusion-proto-models v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   2.315s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field FileScanExecConf.output_partitioning in /home/runner/work/datafusion/datafusion/datafusion/proto-models/src/generated/prost.rs:1699
  field FileScanExecConf.output_partitioning in /home/runner/work/datafusion/datafusion/datafusion/proto-models/src/generated/prost.rs:1699

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  50.425s] datafusion-proto-models
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 168.321s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.022s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 168.589s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.024s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.108s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 340.287s] datafusion-sqllogictest
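
For readers unfamiliar with the `constructible_struct_adds_field` lint reported above, the following minimal sketch shows why adding a pub field to a struct with all-pub fields is a breaking change for downstream struct literals. `ScanOptions` is a hypothetical stand-in, not `ListingOptions` or `FileScanExecConf`, and the `Option<usize>` field type is an assumption made only for illustration.

```rust
/// Hypothetical stand-in for a public options struct whose fields are all `pub`,
/// so downstream crates can build it with a struct literal.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ScanOptions {
    pub file_extension: String,
    pub target_partitions: usize,
    /// Newly added field (type is an assumption for this sketch).
    pub output_partitioning: Option<usize>,
}

/// Downstream code written before the new field existed would look like:
///
///     ScanOptions { file_extension: ".parquet".to_string(), target_partitions: 4 }
///
/// After the field is added, that exhaustive literal fails to compile with
/// error[E0063] (missing field `output_partitioning`).
/// Struct update syntax keeps compiling when fields are added.
pub fn build_options() -> ScanOptions {
    ScanOptions {
        file_extension: ".parquet".to_string(),
        target_partitions: 4,
        ..Default::default()
    }
}

fn main() {
    let opts = build_options();
    println!("{:?}", opts);
}
```

Callers that already construct such structs through `..Default::default()` or a constructor are unaffected by the new field; only exhaustive literals need updating, which is why the lint asks for a major version bump.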

@github-actions github-actions Bot added the auto detected api change label May 30, 2026

Labels

auto detected api change, catalog, core, datasource, proto, sqllogictest

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support declared output partitioning for file/listing scans

1 participant