AggregateExec drops byte-size statistics causing incorrect join build-side selection and broken dynamic filtering

### Describe the bug

In DataFusion 51.1.0, joins involving a `GROUP BY` may choose the wrong build/probe side.
This happens because `AggregateExec` discards `total_byte_size` statistics, leaving only row-count stats.
The join-selection rule then falls back to comparing `num_rows`, which producing incorrect decisions for tables with uneven byte-size distributions

As a result, dynamic filters may be:
- Built on the large table, or
- Not generated at all.

### To Reproduce

This occurs when joining tables with extreme size asymmetry:

- large_bytes: ~50 rows, ~1 GB total
- many_rows: ~1,000,000 rows, ~50 KB total

Run:
```
SELECT *
FROM large_bytes
JOIN (
  SELECT id, join_key
  FROM many_rows
  GROUP BY ALL
) AS many_rows
ON large_bytes.id = many_rows.join_key;

```

Inspect the optimized plan (`EXPLAIN` or `df.explain(false, true)`).
The optimizer may select `large_bytes` as the build side because the aggregated `many_rows` loses its byte-size stats.

Minimal Rust test reproducing the issue: [repro_aggregate_bytes.rs](https://gist.github.com/Tamar-Posen/758bca567c9a591e0b9e5d0a1fa53633)

### Expected behavior

- AggregateExec should preserve or approximate total_byte_size.
- Join selection should use byte-size when available.
- Dynamic filters should consistently build on the smaller table.

### Additional context

Relevant areas:

- physical-plan/aggregates/mod.rs – statistics propagation
- physical-optimizer/join_selection.rs
- physical-optimizer/filter_pushdown.rs
- physical-plan/joins/hash_join/exec.rs

This behavior appears to be a limitation (design choice) rather than an incidental bug. Preserving or estimating byte-size stats after aggregation would improve join selection for skewed row-size distributions

I’m happy to submit a PR after discussing the desired approach for propagating or estimating byte-size stats through AggregateExec.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AggregateExec drops byte-size statistics causing incorrect join build-side selection and broken dynamic filtering #18850

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

AggregateExec drops byte-size statistics causing incorrect join build-side selection and broken dynamic filtering #18850

Description

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions