
Statistics::with_fetch promotes Inexact(0) to Exact(0), causing false COUNT(*) fold on multi-way joins #22413

@sesteves

Description


Describe the bug

Statistics::with_fetch unconditionally returns Precision::Exact(0) when nr <= skip, even when the input num_rows is Precision::Inexact. This precision promotion causes AggregateStatistics to falsely fold COUNT(*) to literal 0 on multi-way joins.

DataFusion version: 53.1.0

The problem

In stats.rs:454-464, both Exact(nr) and Inexact(nr) fall into the same match arm:

Statistics {
    num_rows: Precision::Exact(nr),
    ..
}
| Statistics {
    num_rows: Precision::Inexact(nr),
    ..
} => {
    if nr <= skip {
        Precision::Exact(0)  // ← promotes Inexact(0) to Exact(0)
    }

When nr = 0 and skip = 0, the condition 0 <= 0 is true and the function returns Exact(0) regardless of whether the input was Exact or Inexact.

How this causes incorrect query results

On multi-way joins (4+ tables), estimate_disjoint_inputs can falsely conclude that the join inputs are disjoint when per-partition column min/max ranges appear non-overlapping (this depends on the physical partition layout and varies between runs). The failure chain is:

  1. estimate_disjoint_inputs detects false disjoint ranges on join key columns.
  2. estimate_join_statistics wraps the resulting cardinality as Inexact(0) (line 452; this estimate is always Inexact).
  3. HashJoinExec::partition_statistics calls stats.with_fetch(self.fetch, 0, 1). With nr=0, skip=0, with_fetch promotes Inexact(0) to Exact(0).
  4. Count::value_from_stats (count.rs:371) requires Precision::Exact on num_rows to fold COUNT(*). The false Exact(0) matches, and AggregateStatistics replaces the aggregate with PlaceholderRowExec, returning 0.
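To make steps 2 to 4 concrete, here is a self-contained sketch that models only the row-count part of the chain. The Precision enum, with_fetch_num_rows and count_star_from_stats are hypothetical simplifications written for this issue, not DataFusion's API: the Absent variant is omitted, fetch is assumed to be None and there is a single partition, which corresponds to the with_fetch(self.fetch, 0, 1) call when the join has no fetch.

```rust
// Hypothetical, simplified stand-in for DataFusion's Precision<usize>;
// only the two variants involved in this bug are modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
}

/// Step 3, simplified: row-count logic of `with_fetch` for fetch = None and
/// one partition. Both variants share one arm, as in stats.rs.
fn with_fetch_num_rows(num_rows: Precision, skip: usize) -> Precision {
    match num_rows {
        Precision::Exact(nr) | Precision::Inexact(nr) => {
            if nr <= skip {
                // Every input row is skipped, but the precision level is discarded.
                Precision::Exact(0)
            } else if let Precision::Exact(_) = num_rows {
                Precision::Exact(nr - skip)
            } else {
                Precision::Inexact(nr - skip)
            }
        }
    }
}

/// Step 4, simplified: COUNT(*) is only answered from statistics when the
/// row count is exact.
fn count_star_from_stats(num_rows: Precision) -> Option<usize> {
    match num_rows {
        Precision::Exact(n) => Some(n),
        Precision::Inexact(_) => None,
    }
}

fn main() {
    // Step 2: the join estimate is Inexact, even after a false disjoint detection.
    let join_rows = Precision::Inexact(0);
    // Taken at face value, the estimate is correctly refused for folding.
    println!("{:?}", count_star_from_stats(join_rows)); // None
    // Step 3 promotes it, so step 4 folds COUNT(*) to the literal 0.
    let after_fetch = with_fetch_num_rows(join_rows, 0);
    println!("{:?}", count_star_from_stats(after_fetch)); // Some(0)
}
```

Without the promotion in step 3, the Inexact(0) estimate would never reach the folding rule, so the query would execute normally.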

The bug is flaky because it depends on the partition layout producing non-overlapping min/max ranges for join keys in the merged partition statistics. Different data distributions, partition counts, or memory constraints change the layout and may or may not trigger the false disjoint detection.
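A simplified picture of the disjointness check, assuming each join key is summarized by a closed [min, max] range. KeyRange, ranges_disjoint and merge are hypothetical helpers, not the actual estimate_disjoint_inputs logic, and the numbers are illustrative only. The sketch shows how a range that does not cover every row reaching the join (here, one partition's range standing in for the whole input) can make overlapping inputs look disjoint:

```rust
/// Hypothetical min/max statistics for one join-key column.
#[derive(Debug, Clone, Copy)]
struct KeyRange {
    min: i64,
    max: i64,
}

/// Two closed ranges share no value iff one ends before the other starts.
fn ranges_disjoint(left: KeyRange, right: KeyRange) -> bool {
    left.max < right.min || right.max < left.min
}

/// Merge per-partition ranges into one range covering all of them.
/// Returns None for an empty slice.
fn merge(ranges: &[KeyRange]) -> Option<KeyRange> {
    let min = ranges.iter().map(|r| r.min).min()?;
    let max = ranges.iter().map(|r| r.max).max()?;
    Some(KeyRange { min, max })
}

fn main() {
    // Illustrative values only.
    let right = KeyRange { min: 150, max: 300 };

    // Correctly merged left-side range [1, 200] overlaps [150, 300].
    let left = merge(&[KeyRange { min: 1, max: 100 }, KeyRange { min: 101, max: 200 }]).unwrap();
    assert!(!ranges_disjoint(left, right));

    // A range reflecting only part of the left input, [1, 100], looks disjoint.
    let partial = KeyRange { min: 1, max: 100 };
    assert!(ranges_disjoint(partial, right));
}
```

Because such a verdict rests on bounds that are not guaranteed, it can only justify an Inexact(0) estimate; the wrong result appears only once that estimate is promoted to Exact(0).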

Example: TPC-H 4-table join

The following query over TPC-H data can return cnt = 0 instead of the correct result (~2.4M rows at SF150) when partition statistics happen to produce disjoint min/max ranges on the join keys:

SELECT COUNT(*) AS cnt
FROM part, partsupp, supplier, nation
WHERE p_partkey = ps_partkey
  AND s_suppkey = ps_suppkey
  AND s_nationkey = n_nationkey
  AND p_size = 15;

The conditions that trigger the bug:

  • 4+ tables joined via inner joins (deeper join trees propagate column stats through more layers).
  • Leaf tables with Exact column min/max stats (common with Parquet file metadata).
  • A selective filter (p_size = 15) that narrows the apparent min/max range of join keys in intermediate join outputs.
  • Multiple partitions whose merged statistics can appear disjoint.

This is the join skeleton of TPC-H Q2. The flakiness means it may not reproduce on every run; reducing the memory pool size (forcing different partition layouts) or increasing the number of partitions increases the probability.

To Reproduce

The simplest deterministic reproduction is a unit test against Statistics::with_fetch:

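use datafusion_common::stats::Precision;
use datafusion_common::Statistics;

#[test]
fn test_with_fetch_preserves_inexact_precision() {
    // An estimate of zero rows: it does not guarantee that the input is empty.
    let stats = Statistics {
        num_rows: Precision::Inexact(0),
        total_byte_size: Precision::Absent,
        column_statistics: vec![],
    };
    // fetch = None, skip = 0, n_partitions = 1 (HashJoinExec's arguments when it has no fetch)
    let result = stats.with_fetch(None, 0, 1).unwrap();
    // Should preserve Inexact, not promote to Exact
    assert_eq!(result.num_rows, Precision::Inexact(0));
}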

This test currently fails:

assertion `left == right` failed
  left: Exact(0)
  right: Inexact(0)

Expected behavior

When nr <= skip and the input num_rows is Inexact, with_fetch should return Inexact(0) (preserving the precision level), not Exact(0).

Suggested fix

Preserve the precision when nr <= skip:

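if nr <= skip {
    // All input rows are skipped. Keep the input's precision level: an
    // Inexact estimate does not guarantee that the output is empty.
    if self.num_rows.is_exact().unwrap_or(false) {
        Precision::Exact(0)
    } else {
        Precision::Inexact(0)
    }
}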
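The same simplified model, with the fix applied, can be used to check the intended behavior for both precision levels. As before, Precision, with_value and with_fetch_num_rows_fixed are hypothetical stand-ins (fetch = None, one partition, no Absent variant), not DataFusion code:

```rust
// Hypothetical, simplified stand-in for DataFusion's Precision<usize>.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
}

impl Precision {
    /// Hypothetical helper: build a value with the same precision level as `self`.
    fn with_value(self, value: usize) -> Precision {
        match self {
            Precision::Exact(_) => Precision::Exact(value),
            Precision::Inexact(_) => Precision::Inexact(value),
        }
    }
}

/// Fixed row-count logic for fetch = None and one partition: when every row
/// is skipped the count becomes 0, but the input's precision is kept.
fn with_fetch_num_rows_fixed(num_rows: Precision, skip: usize) -> Precision {
    let nr = match num_rows {
        Precision::Exact(nr) | Precision::Inexact(nr) => nr,
    };
    // saturating_sub yields 0 exactly when nr <= skip.
    num_rows.with_value(nr.saturating_sub(skip))
}

fn main() {
    println!("{:?}", with_fetch_num_rows_fixed(Precision::Inexact(0), 0)); // Inexact(0)
    println!("{:?}", with_fetch_num_rows_fixed(Precision::Exact(5), 10)); // Exact(0)
}
```

Deriving the result from the input variant instead of naming Exact or Inexact in each branch is one way to keep a future branch from silently upgrading an estimate; the is_exact().unwrap_or(false) check in the suggested fix achieves the same for the nr <= skip branch.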

Additional context
