fix: Correct output-count stats for partitioned partial aggs #22780

Open
neilconway wants to merge 1 commit into apache:main from neilconway:neilc/fix-stats-partial-agg

@neilconway
Contributor

Which issue does this PR close?

Rationale for this change

A partial aggregate with no GROUP BY expressions emits one row per output partition, even for partitions whose input contained no rows. The stats code gets this wrong: it claims that the partial agg outputs Exact(1) rows in total in this scenario. That undercounts by a factor of partition_count, which can lead to suboptimal planning decisions downstream.
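
The intended row-count behavior can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the code in this PR: the `Precision` enum here is a simplified stand-in for DataFusion's type, and `partial_no_group_by_rows` is a hypothetical helper name.

```rust
/// Simplified stand-in for a row-count estimate (assumption: only the
/// `Exact` variant is needed for this illustration).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Precision {
    Exact(usize),
}

/// Hypothetical helper: output row count of a partial aggregate that has
/// no GROUP BY expressions.
///
/// `partition` is `Some(i)` when stats are requested for one output
/// partition, and `None` when they are requested for the whole operator.
pub fn partial_no_group_by_rows(
    partition: Option<usize>,
    partition_count: usize,
) -> Precision {
    match partition {
        // A single partition always emits exactly one row, even if its
        // input was empty.
        Some(_) => Precision::Exact(1),
        // Across the whole operator, that is one row per partition.
        None => Precision::Exact(partition_count),
    }
}
```

With four partitions, this model reports one row for any single partition and four rows for the operator as a whole.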

What changes are included in this PR?

  • Pass the requested partition into statistics_inner and adjust statistics_inner as described above
  • Tighten optimizer metadata for grouping sets queries (minor correctness fix)
  • Add new unit tests and extend existing ones

Are these changes tested?

Yes; new tests added.

Are there any user-facing changes?

No.

@github-actions github-actions Bot added core Core DataFusion crate physical-plan Changes to the physical-plan crate labels Jun 5, 2026
@neilconway
Contributor Author

cc @asolimando

@github-actions

github-actions Bot commented Jun 5, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion v53.1.0 (current)
       Built [ 101.303s] (current)
     Parsing datafusion v53.1.0 (current)
      Parsed [   0.038s] (current)
    Building datafusion v53.1.0 (baseline)
       Built [  98.181s] (baseline)
     Parsing datafusion v53.1.0 (baseline)
      Parsed [   0.038s] (baseline)
    Checking datafusion v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.913s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 202.201s] datafusion
    Building datafusion-physical-plan v53.1.0 (current)
       Built [  34.876s] (current)
     Parsing datafusion-physical-plan v53.1.0 (current)
      Parsed [   0.133s] (current)
    Building datafusion-physical-plan v53.1.0 (baseline)
       Built [  34.219s] (baseline)
     Parsing datafusion-physical-plan v53.1.0 (baseline)
      Parsed [   0.134s] (baseline)
    Checking datafusion-physical-plan v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.856s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure method_parameter_count_changed: pub method parameter count changed ---

Description:
A publicly-visible method now takes a different number of parameters, not counting the receiver (self) parameter.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/method_parameter_count_changed.ron

Failed in:
  datafusion_physical_plan::aggregates::AggregateExec::compute_properties takes 6 parameters in /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/a5cdb2ad55310fc69c38f794c984085911be093f/datafusion/physical-plan/src/aggregates/mod.rs:1020, but now takes 7 parameters in /home/runner/work/datafusion/datafusion/datafusion/physical-plan/src/aggregates/mod.rs:1021

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  71.606s] datafusion-physical-plan

@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 5, 2026
@neilconway
Contributor Author

(This is a pretty marginal corner-case that I ran across while chasing other stuff, but still seems worth fixing.)

Labels

auto detected api change Auto detected API change core Core DataFusion crate physical-plan Changes to the physical-plan crate

Development

Successfully merging this pull request may close these issues.

Stats for no-grouping agg are wrong for partial output mode

1 participant