

@gene-bordegaray gene-bordegaray commented Nov 24, 2025

Full Report

Issue 18777 Parallelize Key Partitioned Data.pdf

Which issue does this PR close?

Rationale for this change

Optimize aggregations on Hive-partitioned tables by eliminating unnecessary repartitioning/coalescing when grouping by partition columns. This enables parallel computation of complete results without a merge bottleneck.
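To illustrate the rationale, here is a minimal Rust sketch under simplified assumptions. The `Partition` type and the helper functions are hypothetical, SUM is the only aggregate, and plain strings stand in for Arrow arrays. When no grouping key appears in more than one partition, each partition's aggregate is already final, so concatenating the per-partition results matches a single global aggregation with no repartition or merge step.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical simplified model: a partition is a list of
/// (partition_key, value) rows, e.g. the files under one Hive directory.
type Partition = Vec<(String, i64)>;

/// True if no key appears in more than one partition, i.e. the data is
/// "key partitioned" on the grouping column.
fn keys_are_disjoint(partitions: &[Partition]) -> bool {
    let mut seen: HashSet<&str> = HashSet::new();
    for partition in partitions {
        // Distinct keys within this partition (a key may repeat inside it).
        let local: HashSet<&str> = partition.iter().map(|(k, _)| k.as_str()).collect();
        for key in local {
            if !seen.insert(key) {
                return false; // key already seen in an earlier partition
            }
        }
    }
    true
}

/// Aggregates each partition independently (SUM per key). With disjoint keys,
/// every per-partition result is already complete.
fn sum_per_partition(partitions: &[Partition]) -> Vec<HashMap<String, i64>> {
    partitions
        .iter()
        .map(|partition| {
            let mut sums: HashMap<String, i64> = HashMap::new();
            for (key, value) in partition {
                *sums.entry(key.clone()).or_insert(0) += *value;
            }
            sums
        })
        .collect()
}

/// Reference result: one global aggregation over all rows.
fn sum_global(partitions: &[Partition]) -> HashMap<String, i64> {
    let mut sums: HashMap<String, i64> = HashMap::new();
    for (key, value) in partitions.iter().flatten() {
        *sums.entry(key.clone()).or_insert(0) += *value;
    }
    sums
}
```

If `keys_are_disjoint` returned false, the per-partition maps could contain the same key twice and would need a merge, which is exactly the repartition/final-aggregate step the PR avoids for key-partitioned data.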

What changes are included in this PR?

  • Introduce new partitioning type KeyPartitioned
  • Save and propagate file partition metadata through query plan
  • Change aggregation mode selection in physical planner
  • Update enforce distribution rules to eliminate unnecessary repartitioning
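The aggregation mode change could be sketched as follows. This is a simplified model, not the PR's code: `Partitioning`, `AggPlan`, and `choose_agg_plan` are hypothetical names, and column names stand in for physical expressions. The assumed rule is that a single-stage, per-partition aggregate is safe when the input is key partitioned on a subset of the GROUP BY columns, because rows of the same group then share the key values and therefore land in the same partition.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the input's output partitioning.
#[derive(Debug)]
enum Partitioning {
    /// No known relationship between rows and partitions.
    Unknown,
    /// Each partition holds all rows for a disjoint set of values of these columns.
    KeyPartitioned(Vec<String>),
}

/// Hypothetical plan shapes for the aggregate.
#[derive(Debug, PartialEq)]
enum AggPlan {
    /// Partial aggregate, hash repartition on the group keys, final aggregate.
    TwoPhase,
    /// One complete aggregate per input partition, no repartition in between.
    SinglePerPartition,
}

/// Picks the per-partition plan only if every key column is also a GROUP BY column.
fn choose_agg_plan(input: &Partitioning, group_by: &[&str]) -> AggPlan {
    match input {
        Partitioning::KeyPartitioned(keys) => {
            let groups: HashSet<&str> = group_by.iter().copied().collect();
            if !keys.is_empty() && keys.iter().all(|k| groups.contains(k.as_str())) {
                AggPlan::SinglePerPartition
            } else {
                // Grouping by a subset of the keys could merge groups across partitions.
                AggPlan::TwoPhase
            }
        }
        Partitioning::Unknown => AggPlan::TwoPhase,
    }
}
```

For example, data partitioned by `year` can be aggregated per partition for `GROUP BY year, month`, but not for `GROUP BY month`, since one month's rows are spread across several years.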

Are these changes tested?

  • Unit and integration tests added for all new logic

Benchmarking

For TPC-H, performance was unaffected, as expected, because the TPC-H tables are not partitioned:

[Screenshots: TPC-H benchmark results, 2025-11-24]

I created my own benchmark (hive_partitioned_agg) and saw these results:

Benchmarking hive_partitioned_agg/with_key_partitioned: Collecting 100 samples in estimated 6
hive_partitioned_agg/with_key_partitioned
                        time:   [12.356 ms 12.428 ms 12.505 ms]
                        change: [−1.6022% −0.8538% −0.0780%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking hive_partitioned_agg/without_key_partitioned: Collecting 100 samples in estimate
hive_partitioned_agg/without_key_partitioned
                        time:   [13.179 ms 13.278 ms 13.382 ms]
                        change: [−0.8465% +0.2090% +1.2419%] (p = 0.70 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)

These are not huge improvements, since in-memory hashing is already efficient, but the gain is consistent across many runs: the median time is 12.428 ms with KeyPartitioned versus 13.278 ms without.
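For reference, the relative gain implied by the two Criterion medians can be computed directly. Note that Criterion's `change:` lines compare each benchmark against its own previously saved run, not the two configurations against each other. The function name below is illustrative only.

```rust
/// Returns (speedup ratio, percent of time saved) from two median timings.
/// A ratio above 1.0 means the optimized configuration is faster.
fn relative_speedup(baseline_ms: f64, optimized_ms: f64) -> (f64, f64) {
    let ratio = baseline_ms / optimized_ms;
    let time_saved_pct = (baseline_ms - optimized_ms) / baseline_ms * 100.0;
    (ratio, time_saved_pct)
}
```

With the medians 13.278 ms (without) and 12.428 ms (with), this gives a ratio of about 1.068, i.e. roughly 6.4% less time per iteration.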

Are there any user-facing changes?

  • Yes, a new configuration option: listing_table_preserve_partition_values
  • Query plans change when the option is enabled

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) catalog Related to the catalog crate common Related to common crate proto Related to proto crate datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Nov 24, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 25, 2025
pub preserve_partition_values: bool,
/// Cached result of key_partition_exprs computation to avoid repeated work
#[allow(clippy::type_complexity)]
key_partition_exprs_cache: OnceLock<Option<Vec<Arc<dyn PhysicalExpr>>>>,

@gene-bordegaray gene-bordegaray Nov 26, 2025


Caches the result of compute_key_partition_exprs(), which is expensive because it:

  • loops through the file groups and performs hash set operations
  • is called multiple times (from output_partitioning() and eq_properties())
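A minimal sketch of this caching pattern with `std::sync::OnceLock`. `ScanConfig`, its fields, and the duplicate-value check are hypothetical simplifications (strings instead of `Arc<dyn PhysicalExpr>`); only the shape of the cache field mirrors the diff.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the scan's config, without DataFusion types.
struct ScanConfig {
    partition_cols: Vec<String>,
    /// Partition value of each file in each file group, e.g. ["2024", "2024"].
    file_group_values: Vec<Vec<String>>,
    /// Computed lazily, at most once, like key_partition_exprs_cache.
    key_partition_exprs_cache: OnceLock<Option<Vec<String>>>,
    /// Counts how often the expensive computation runs (illustration only).
    compute_calls: AtomicUsize,
}

impl ScanConfig {
    fn new(partition_cols: Vec<String>, file_group_values: Vec<Vec<String>>) -> Self {
        Self {
            partition_cols,
            file_group_values,
            key_partition_exprs_cache: OnceLock::new(),
            compute_calls: AtomicUsize::new(0),
        }
    }

    /// The expensive part: walks every file group and does hash set work.
    /// Returns None if some partition value spans two file groups.
    fn compute_key_partition_exprs(&self) -> Option<Vec<String>> {
        self.compute_calls.fetch_add(1, Ordering::Relaxed);
        let mut seen: HashSet<&String> = HashSet::new();
        for group in &self.file_group_values {
            // Several files in one group may share a value; dedupe first.
            let distinct: HashSet<&String> = group.iter().collect();
            for value in distinct {
                if !seen.insert(value) {
                    return None;
                }
            }
        }
        Some(self.partition_cols.clone())
    }

    /// Shared entry point for callers such as output_partitioning() and
    /// eq_properties(); get_or_init runs the computation only on the first call.
    fn key_partition_exprs(&self) -> Option<&[String]> {
        self.key_partition_exprs_cache
            .get_or_init(|| self.compute_key_partition_exprs())
            .as_deref()
    }
}
```

`OnceLock::get_or_init` is thread-safe, so the cache can sit behind `&self` even when the plan node is shared across threads.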

}
Distribution::KeyPartitioned(_) => {
// Nothing to do: treated as satisfied upstream
}

This is a no-op because we can already guarantee that the data is correctly distributed.
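A simplified model of that decision, assuming hypothetical `Distribution` and `Action` types. The real pass works on physical expressions and first checks whether the input already satisfies a hash requirement; this sketch skips that check. The `KeyPartitioned` arm adds nothing because, under the assumption above, the requirement is only produced when the scan already guarantees that no key value spans two partitions.

```rust
/// Hypothetical simplified distribution requirements (column names only).
#[derive(Debug)]
enum Distribution {
    UnspecifiedDistribution,
    SinglePartition,
    HashPartitioned(Vec<String>),
    /// Rows with equal key values must share a partition.
    KeyPartitioned(Vec<String>),
}

/// What an EnforceDistribution-like pass would insert below the operator.
#[derive(Debug, PartialEq)]
enum Action {
    Nothing,
    CoalesceToSinglePartition,
    HashRepartition(Vec<String>),
}

fn enforce(required: &Distribution, input_partitions: usize) -> Action {
    match required {
        Distribution::UnspecifiedDistribution => Action::Nothing,
        Distribution::SinglePartition if input_partitions > 1 => {
            Action::CoalesceToSinglePartition
        }
        Distribution::SinglePartition => Action::Nothing,
        // Simplified: always repartitions instead of checking the input first.
        Distribution::HashPartitioned(cols) => Action::HashRepartition(cols.clone()),
        // Satisfied by construction upstream, so nothing to add.
        Distribution::KeyPartitioned(_) => Action::Nothing,
    }
}
```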

02)--AggregateExec: mode=FinalPartitioned, gby=[a@0 as a], aggr=[nth_value(multiple_ordered_table.c,Int64(1)) ORDER BY [multiple_ordered_table.c ASC NULLS LAST]], ordering_mode=Sorted
03)----SortExec: expr=[a@0 ASC NULLS LAST], preserve_partitioning=[true]
04)------CoalesceBatchesExec: target_batch_size=8192
05)--------RepartitionExec: partitioning=Hash([a@0], 4), input_partitions=4

@gene-bordegaray gene-bordegaray Nov 26, 2025


Eliminates this hash repartition, which would otherwise break the ordering guarantees.
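A toy model (not DataFusion's RepartitionExec) of why a hash repartition breaks ordering: each output partition receives rows from several sorted input partitions one after another, so its contents are no longer sorted and a SortExec is needed again. The routing `key % n` is an assumed stand-in for the real hash function, and inputs are drained in order for determinism.

```rust
/// Routes every row of every (sorted) input partition to output `key % n`.
fn hash_repartition(inputs: &[Vec<u64>], n: usize) -> Vec<Vec<u64>> {
    let mut outputs: Vec<Vec<u64>> = vec![Vec::new(); n];
    for input in inputs {
        for &key in input {
            outputs[(key % n as u64) as usize].push(key);
        }
    }
    outputs
}

/// True if the values are in non-decreasing order.
fn is_ascending(values: &[u64]) -> bool {
    values.windows(2).all(|w| w[0] <= w[1])
}
```

With two sorted inputs `[1, 2, 3]` and `[1, 2, 3]` and two outputs, output 1 receives `[1, 3, 1, 3]`, which is no longer sorted even though both inputs were.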

@gene-bordegaray gene-bordegaray marked this pull request as ready for review November 26, 2025 06:09
@gene-bordegaray

cc: @NGA-TRAN @alamb This is the updated solution, along with a report explaining why I chose this approach.



Development

Successfully merging this pull request may close these issues.

Enable Parallel Aggregation for Non-Overlapping Partitioned Data
