Skip to content

Conversation

@jizezhang
Copy link
Contributor

@jizezhang jizezhang commented Nov 30, 2025

Which issue does this PR close?

Rationale for this change

RepartitionExec has two cases: sort-preserving and non sort-preserving. This change integrates LimitedBatchCoalescer with the latter. For the former, it seems that SortPreservingMergeStream that builds on top of PerPartitionStreams has batching logic built in:

if self.advance_cursors(stream_idx) {
self.loser_tree_adjusted = false;
self.in_progress.push_row(stream_idx);
// stop sorting if fetch has been reached
if self.fetch_reached() {
self.done = true;
} else if self.in_progress.len() < self.batch_size {
continue;
}
}
hence I did not include in this change.

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added core Core DataFusion crate physical-plan Changes to the physical-plan crate labels Nov 30, 2025
@jizezhang jizezhang changed the title feat: integrate batch coalescer with repartition when unsorted feat: integrate batch coalescer with repartition Dec 1, 2025
@jizezhang jizezhang changed the title feat: integrate batch coalescer with repartition feat: integrate batch coalescer with repartition exec Dec 1, 2025
Ok(Box::pin(RecordBatchStreamAdapter::new(
self.schema(),
futures::stream::iter(self.batches.clone().into_iter().map(Ok)),
futures::stream::iter(self.batches.clone().into_iter().map(move |batch| {
Copy link
Contributor Author

@jizezhang jizezhang Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #18782 (comment) for my thoughts/reason on updating this test.

@Dandandan
Copy link
Contributor

Dandandan commented Dec 1, 2025

run benchmark tpch tpch_mem tpch10

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

🤖 Hi @Dandandan, thanks for the request (#19002 (comment)). scrape_comments.py only supports whitelisted benchmarks: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, tpch, tpch10, tpch_mem, tpch_mem10. Please choose one of these with run benchmark <name>.

@Dandandan
Copy link
Contributor

run benchmark tpch tpch_mem tpch10

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

🤖 Hi @Dandandan, thanks for the request (#19002 (comment)). scrape_comments.py only supports whitelisted benchmarks: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, tpch, tpch10, tpch_mem, tpch_mem10. Please choose one of these with run benchmark <name>.

@Dandandan
Copy link
Contributor

run benchmark tpch

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing coalesce2 (ad81d5b) to 73562e8 diff using: tpch
Results will be posted here when complete

@Dandandan
Copy link
Contributor

run benchmark tpch10

@Dandandan
Copy link
Contributor

Dandandan commented Dec 1, 2025

run benchmark clickbench_partitioned

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

run benchmark tpch tpch_mem tpch10

I will update my scraper to support this syntax FYI

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

🤖: Benchmark completed

Details

Comparing HEAD and coalesce2
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ coalesce2 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 226.59 ms │ 218.90 ms │    no change │
│ QQuery 2     │  91.33 ms │  88.77 ms │    no change │
│ QQuery 3     │ 126.15 ms │ 121.98 ms │    no change │
│ QQuery 4     │  94.13 ms │  95.54 ms │    no change │
│ QQuery 5     │ 176.02 ms │ 176.76 ms │    no change │
│ QQuery 6     │  61.20 ms │  66.98 ms │ 1.09x slower │
│ QQuery 7     │ 231.29 ms │ 227.95 ms │    no change │
│ QQuery 8     │ 154.72 ms │ 153.52 ms │    no change │
│ QQuery 9     │ 227.34 ms │ 228.20 ms │    no change │
│ QQuery 10    │ 177.50 ms │ 176.62 ms │    no change │
│ QQuery 11    │  73.37 ms │  74.55 ms │    no change │
│ QQuery 12    │ 118.87 ms │ 115.98 ms │    no change │
│ QQuery 13    │ 241.14 ms │ 230.53 ms │    no change │
│ QQuery 14    │  88.72 ms │  90.33 ms │    no change │
│ QQuery 15    │ 120.70 ms │ 119.02 ms │    no change │
│ QQuery 16    │  50.51 ms │  50.34 ms │    no change │
│ QQuery 17    │ 261.03 ms │ 261.01 ms │    no change │
│ QQuery 18    │ 311.54 ms │ 309.19 ms │    no change │
│ QQuery 19    │ 136.34 ms │ 132.75 ms │    no change │
│ QQuery 20    │ 121.35 ms │ 119.11 ms │    no change │
│ QQuery 21    │ 258.04 ms │ 257.21 ms │    no change │
│ QQuery 22    │  55.71 ms │  59.82 ms │ 1.07x slower │
└──────────────┴───────────┴───────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 3403.58ms │
│ Total Time (coalesce2)   │ 3375.06ms │
│ Average Time (HEAD)      │  154.71ms │
│ Average Time (coalesce2) │  153.41ms │
│ Queries Faster           │         0 │
│ Queries Slower           │         2 │
│ Queries with No Change   │        20 │
│ Queries with Failure     │         0 │
└──────────────────────────┴───────────┘

break;
}
let inner_poll = self.poll_next_inner(cx);
let _timer = cloned_time.timer();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be before the poll_next_inner() call ?

}
Poll::Ready(None) => {
completed = true;
coalescer.finish()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
coalescer.finish()?;
if let Err(e) = coalescer.finish() {
self.batch_coalescer = Some(coalescer);
return self.baseline_metrics.record_poll(Poll::Ready(Some(Err(e))));
}

Otherwise in case of an error the self.batch_coalescer won't be restored.

}

impl PerPartitionStream {
#[allow(clippy::too_many_arguments)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[allow(clippy::too_many_arguments)]
#[expect(clippy::too_many_arguments)]

By using expect instead of allow the Clippy rule will fail too once it is no more needed and the developer will have to remove it. Otherwise it may become obsolete.

coalescer.finish()?;
}
Poll::Ready(Some(Ok(batch))) => {
coalescer.push_batch(batch)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
coalescer.push_batch(batch)?;
coalescer.?;
if let Err(e) = coalescer.push_batch(batch) {
self.batch_coalescer = Some(coalescer);
return self.baseline_metrics.record_poll(Poll::Ready(Some(Err(e))));
}

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing coalesce2 (ad81d5b) to 73562e8 diff using: tpch10
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

🤖: Benchmark completed

Details

Comparing HEAD and coalesce2
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃ HEAD ┃ coalesce2 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ FAIL │      FAIL │ incomparable │
│ QQuery 2     │ FAIL │      FAIL │ incomparable │
│ QQuery 3     │ FAIL │      FAIL │ incomparable │
│ QQuery 4     │ FAIL │      FAIL │ incomparable │
│ QQuery 5     │ FAIL │      FAIL │ incomparable │
│ QQuery 6     │ FAIL │      FAIL │ incomparable │
│ QQuery 7     │ FAIL │      FAIL │ incomparable │
│ QQuery 8     │ FAIL │      FAIL │ incomparable │
│ QQuery 9     │ FAIL │      FAIL │ incomparable │
│ QQuery 10    │ FAIL │      FAIL │ incomparable │
│ QQuery 11    │ FAIL │      FAIL │ incomparable │
│ QQuery 12    │ FAIL │      FAIL │ incomparable │
│ QQuery 13    │ FAIL │      FAIL │ incomparable │
│ QQuery 14    │ FAIL │      FAIL │ incomparable │
│ QQuery 15    │ FAIL │      FAIL │ incomparable │
│ QQuery 16    │ FAIL │      FAIL │ incomparable │
│ QQuery 17    │ FAIL │      FAIL │ incomparable │
│ QQuery 18    │ FAIL │      FAIL │ incomparable │
│ QQuery 19    │ FAIL │      FAIL │ incomparable │
│ QQuery 20    │ FAIL │      FAIL │ incomparable │
│ QQuery 21    │ FAIL │      FAIL │ incomparable │
│ QQuery 22    │ FAIL │      FAIL │ incomparable │
└──────────────┴──────┴───────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary        ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)        │ 0.00ms │
│ Total Time (coalesce2)   │ 0.00ms │
│ Average Time (HEAD)      │ 0.00ms │
│ Average Time (coalesce2) │ 0.00ms │
│ Queries Faster           │      0 │
│ Queries Slower           │      0 │
│ Queries with No Change   │      0 │
│ Queries with Failure     │     22 │
└──────────────────────────┴────────┘

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing coalesce2 (ad81d5b) to 73562e8 diff using: clickbench_partitioned
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

🤖: Benchmark completed

Details

Comparing HEAD and coalesce2
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃   coalesce2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.54 ms │     2.28 ms │ +1.11x faster │
│ QQuery 1     │    49.43 ms │    48.26 ms │     no change │
│ QQuery 2     │   141.68 ms │   135.92 ms │     no change │
│ QQuery 3     │   167.84 ms │   169.70 ms │     no change │
│ QQuery 4     │  1109.61 ms │  1109.71 ms │     no change │
│ QQuery 5     │  1564.45 ms │  1460.09 ms │ +1.07x faster │
│ QQuery 6     │     2.10 ms │     2.17 ms │     no change │
│ QQuery 7     │    53.72 ms │    53.86 ms │     no change │
│ QQuery 8     │  1442.77 ms │  1444.50 ms │     no change │
│ QQuery 9     │  1850.44 ms │  1840.30 ms │     no change │
│ QQuery 10    │   386.27 ms │   386.36 ms │     no change │
│ QQuery 11    │   442.59 ms │   443.57 ms │     no change │
│ QQuery 12    │  1362.32 ms │  1371.84 ms │     no change │
│ QQuery 13    │  2156.90 ms │  2179.08 ms │     no change │
│ QQuery 14    │  1297.52 ms │  1286.81 ms │     no change │
│ QQuery 15    │  1261.49 ms │  1270.57 ms │     no change │
│ QQuery 16    │  2720.10 ms │  2724.83 ms │     no change │
│ QQuery 17    │  2714.05 ms │  2730.19 ms │     no change │
│ QQuery 18    │  5680.46 ms │  5035.27 ms │ +1.13x faster │
│ QQuery 19    │   129.55 ms │   129.86 ms │     no change │
│ QQuery 20    │  2024.90 ms │  1977.22 ms │     no change │
│ QQuery 21    │  2352.77 ms │  2312.68 ms │     no change │
│ QQuery 22    │  4211.96 ms │  3934.38 ms │ +1.07x faster │
│ QQuery 23    │ 18000.61 ms │ 13216.69 ms │ +1.36x faster │
│ QQuery 24    │   233.11 ms │   230.93 ms │     no change │
│ QQuery 25    │   499.48 ms │   479.84 ms │     no change │
│ QQuery 26    │   242.46 ms │   216.20 ms │ +1.12x faster │
│ QQuery 27    │  2864.98 ms │  2799.69 ms │     no change │
│ QQuery 28    │ 23602.74 ms │ 23389.19 ms │     no change │
│ QQuery 29    │   949.89 ms │   959.83 ms │     no change │
│ QQuery 30    │  1357.85 ms │  1314.97 ms │     no change │
│ QQuery 31    │  1428.28 ms │  1431.00 ms │     no change │
│ QQuery 32    │  5228.53 ms │  4787.07 ms │ +1.09x faster │
│ QQuery 33    │  6258.64 ms │  5907.67 ms │ +1.06x faster │
│ QQuery 34    │  6085.73 ms │  6056.81 ms │     no change │
│ QQuery 35    │  1966.32 ms │  1866.72 ms │ +1.05x faster │
│ QQuery 36    │   124.12 ms │   123.47 ms │     no change │
│ QQuery 37    │    52.24 ms │    53.08 ms │     no change │
│ QQuery 38    │   122.86 ms │   120.67 ms │     no change │
│ QQuery 39    │   198.44 ms │   199.84 ms │     no change │
│ QQuery 40    │    42.44 ms │    42.58 ms │     no change │
│ QQuery 41    │    41.19 ms │    40.34 ms │     no change │
│ QQuery 42    │    32.82 ms │    32.66 ms │     no change │
└──────────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 102458.18ms │
│ Total Time (coalesce2)   │  95318.71ms │
│ Average Time (HEAD)      │   2382.75ms │
│ Average Time (coalesce2) │   2216.71ms │
│ Queries Faster           │           9 │
│ Queries Slower           │           0 │
│ Queries with No Change   │          34 │
│ Queries with Failure     │           0 │
└──────────────────────────┴─────────────┘

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

run benchmark clickbench_partitioned

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing coalesce2 (ad81d5b) to 73562e8 diff using: clickbench_partitioned
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

🤖: Benchmark completed

Details

Comparing HEAD and coalesce2
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃   coalesce2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.57 ms │     2.18 ms │ +1.18x faster │
│ QQuery 1     │    50.44 ms │    47.69 ms │ +1.06x faster │
│ QQuery 2     │   136.90 ms │   134.81 ms │     no change │
│ QQuery 3     │   162.05 ms │   168.08 ms │     no change │
│ QQuery 4     │  1094.06 ms │  1086.72 ms │     no change │
│ QQuery 5     │  1541.94 ms │  1525.10 ms │     no change │
│ QQuery 6     │     2.34 ms │     2.17 ms │ +1.08x faster │
│ QQuery 7     │    55.91 ms │    54.15 ms │     no change │
│ QQuery 8     │  1451.07 ms │  1456.40 ms │     no change │
│ QQuery 9     │  1908.18 ms │  1836.28 ms │     no change │
│ QQuery 10    │   385.10 ms │   388.50 ms │     no change │
│ QQuery 11    │   441.39 ms │   447.72 ms │     no change │
│ QQuery 12    │  1400.92 ms │  1398.13 ms │     no change │
│ QQuery 13    │  2199.70 ms │  2172.79 ms │     no change │
│ QQuery 14    │  1279.92 ms │  1287.41 ms │     no change │
│ QQuery 15    │  1272.00 ms │  1250.00 ms │     no change │
│ QQuery 16    │  2736.30 ms │  2733.82 ms │     no change │
│ QQuery 17    │  2728.84 ms │  2751.68 ms │     no change │
│ QQuery 18    │  5086.88 ms │  5006.47 ms │     no change │
│ QQuery 19    │   127.12 ms │   127.52 ms │     no change │
│ QQuery 20    │  1977.77 ms │  2022.12 ms │     no change │
│ QQuery 21    │  2336.95 ms │  2263.92 ms │     no change │
│ QQuery 22    │  3955.53 ms │  3911.55 ms │     no change │
│ QQuery 23    │ 13290.93 ms │ 13214.36 ms │     no change │
│ QQuery 24    │   214.56 ms │   208.51 ms │     no change │
│ QQuery 25    │   475.02 ms │   481.53 ms │     no change │
│ QQuery 26    │   221.55 ms │   212.00 ms │     no change │
│ QQuery 27    │  2824.26 ms │  2777.84 ms │     no change │
│ QQuery 28    │ 23475.49 ms │ 23340.85 ms │     no change │
│ QQuery 29    │   962.52 ms │   974.45 ms │     no change │
│ QQuery 30    │  1352.95 ms │  1335.20 ms │     no change │
│ QQuery 31    │  1398.84 ms │  1402.53 ms │     no change │
│ QQuery 32    │  4724.17 ms │  4504.97 ms │     no change │
│ QQuery 33    │  5858.55 ms │  5740.60 ms │     no change │
│ QQuery 34    │  6021.25 ms │  5797.69 ms │     no change │
│ QQuery 35    │  1930.45 ms │  1894.01 ms │     no change │
│ QQuery 36    │   119.47 ms │   124.62 ms │     no change │
│ QQuery 37    │    52.09 ms │    52.31 ms │     no change │
│ QQuery 38    │   119.69 ms │   120.04 ms │     no change │
│ QQuery 39    │   198.26 ms │   197.80 ms │     no change │
│ QQuery 40    │    42.01 ms │    44.22 ms │  1.05x slower │
│ QQuery 41    │    41.56 ms │    40.02 ms │     no change │
│ QQuery 42    │    32.83 ms │    31.95 ms │     no change │
└──────────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 95690.32ms │
│ Total Time (coalesce2)   │ 94570.71ms │
│ Average Time (HEAD)      │  2225.36ms │
│ Average Time (coalesce2) │  2199.32ms │
│ Queries Faster           │          3 │
│ Queries Slower           │          1 │
│ Queries with No Change   │         39 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘

@alamb
Copy link
Contributor

alamb commented Dec 1, 2025

Thank you for this PR @jizezhang 🙏

My reading of the benchmark results so far is that this PR may be slightly faster, and doesn't cause any regressions. I'll try and patch up my ability to run tpch SF10 tests and then give this one a good review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants