Describe the bug
estimate_join_cardinality (datafusion/physical-plan/src/joins/utils.rs) estimates a reduced row count for semi/anti joins but returns the preserved input's column statistics unchanged. The per-column stats then describe the full input rather than the emitted subset:
- null_count, distinct_count, and byte_size become inconsistent with the output num_rows (e.g. null_count > num_rows).
- sum_value still reflects the full input.
- Values marked Exact stay Exact, even though statistics for an estimated subset can only be estimates.
- Join-key columns just copy the input null count, but null keys never match: a semi join drops every row with a null key, and an anti join keeps them all.
The same problem affects joins in general, but that case is more complex; I will file a separate ticket for it.
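To make the inconsistencies listed above concrete, here is a minimal, self-contained sketch of one way per-column stats could be adjusted for a semi/anti join. It is not the DataFusion implementation: Stat, ColumnStats, JoinKind, and adjust_column are hypothetical stand-ins defined only for this example, and the rules (cap counts at the output row count, scale byte_size proportionally, drop sum_value, special-case join-key null counts) are assumptions about a reasonable fix, not a confirmed design. It also assumes null keys never match, as described above.

```rust
/// Hypothetical stand-in for a statistic that is exact, estimated, or unknown.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stat {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Stat {
    fn value(self) -> Option<usize> {
        match self {
            Stat::Exact(v) | Stat::Inexact(v) => Some(v),
            Stat::Absent => None,
        }
    }

    /// Demote to an estimate and cap it at `max`.
    fn capped_estimate(self, max: usize) -> Stat {
        self.value().map_or(Stat::Absent, |v| Stat::Inexact(v.min(max)))
    }
}

/// Hypothetical, simplified per-column statistics (sum_value is a plain count here).
#[derive(Debug, Clone, Copy, PartialEq)]
struct ColumnStats {
    null_count: Stat,
    distinct_count: Stat,
    byte_size: Stat,
    sum_value: Stat,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinKind {
    Semi,
    Anti,
}

/// Adjust one preserved-side column for an estimated semi/anti join output.
/// `join_key` is `Some(kind)` when the column is a join key, `None` otherwise.
fn adjust_column(
    input: ColumnStats,
    input_rows: usize,
    output_rows: usize,
    join_key: Option<JoinKind>,
) -> ColumnStats {
    let null_count = match join_key {
        // Assumption: null keys never match, so a semi join emits none of them.
        Some(JoinKind::Semi) => Stat::Exact(0),
        // Assumption: an anti join keeps every null-key row, so the count carries over.
        Some(JoinKind::Anti) => input.null_count,
        // Non-key column: only an estimate, and never more than the output rows.
        None => input.null_count.capped_estimate(output_rows),
    };

    // Scale byte_size by the estimated selectivity; unknown if there is no ratio.
    let byte_size = match input.byte_size.value() {
        Some(bytes) if input_rows > 0 => {
            let scaled = bytes as u128 * output_rows as u128 / input_rows as u128;
            Stat::Inexact(scaled.min(usize::MAX as u128) as usize)
        }
        _ => Stat::Absent,
    };

    ColumnStats {
        null_count,
        distinct_count: input.distinct_count.capped_estimate(output_rows),
        byte_size,
        // The sum over an unknown subset cannot be derived from the full-input sum.
        sum_value: Stat::Absent,
    }
}
```

For example, under these assumptions a non-key column with 40 nulls in a 100-row input would report at most 25 nulls (as an estimate) when the join is estimated to emit 25 rows, rather than the original exact 40.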
To Reproduce
No response
Expected behavior
No response
Additional context
No response