@mbutrovich (Contributor) commented Nov 21, 2025

Which issue does this PR close?

Rationale for this change

DataFusion Comet often uses Sort Merge Joins because DataFusion does not have a larger-than-memory Hash Join operator. Performance on TPC-H Q21 is quite bad when run natively, so Comet falls back to Spark for it by default. If you force Comet to use DataFusion's SMJ operator, performance is:

[Image: Screenshot 2025-11-21 at 11 31 18 AM]

Profiling showed that most of the time was spent in concat_batches calls on batches with single-digit row counts:

[Image: Screenshot 2025-11-20 at 6 49 20 PM]

What changes are included in this PR?

Use a BatchCoalescer both internally and to buffer the final output. One of the two made the biggest difference, but I switched both to be consistent. Here are Comet results with the changes, based on DataFusion 50.3 (the version Comet is currently on):

[Image: Screenshot 2025-11-21 at 11 43 57 AM]

I ported this PR to DF 51 afterwards and found that #17193 changed the API around batch coalescing. I might need to update this to support LimitedBatchCoalescer, like #18783.
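
To illustrate the coalescing idea in isolation, here is a minimal Python sketch. It is not the Arrow or DataFusion BatchCoalescer API: the `coalesce` function, the `Batch` alias (a plain list standing in for a RecordBatch), and the target of 1024 rows are all hypothetical names and values chosen for illustration. It only shows how buffering many tiny batches into a few target-sized ones can replace repeated small concatenations.

```python
from typing import Iterable, Iterator, List

# Hypothetical stand-in for an Arrow RecordBatch: just a list of row values.
Batch = List[int]


def coalesce(batches: Iterable[Batch], target_rows: int) -> Iterator[Batch]:
    """Buffer small input batches and emit batches of target_rows rows.

    The final emitted batch may be smaller than target_rows.
    """
    buffer: Batch = []
    for batch in batches:
        buffer.extend(batch)
        # Emit as many full batches as the buffer currently holds.
        while len(buffer) >= target_rows:
            yield buffer[:target_rows]
            buffer = buffer[target_rows:]
    if buffer:
        # Flush whatever is left once the input is exhausted.
        yield buffer


# 1000 tiny batches of 3 rows each, similar in spirit to the
# single-digit-row batches seen in the profile (values are made up).
tiny_batches = [[i, i + 1, i + 2] for i in range(0, 3000, 3)]
coalesced = list(coalesce(tiny_batches, target_rows=1024))
sizes = [len(b) for b in coalesced]
```

With this input, downstream code handles 3 batches instead of 1000 while the row order is unchanged. The real operator works on columnar data and has to respect limits and memory accounting, which this sketch ignores.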

Benchmark results are below. In the new smj benchmark, Q12 is the query most like TPC-H Q21 (left anti join with filter); the second table is TPC-H itself, including Q21, with hash join disabled. I also tried to run TPC-H at SF10, but it looked like it was going to take hours on my machine. It ran successfully on this PR.

./bench.sh compare_detail main smj
Comparing main and smj
--------------------
Benchmark smj.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃                               main ┃                                 smj ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │        2.04 / 3.34 ±2.56 / 8.45 ms │         2.00 / 3.29 ±2.45 / 8.19 ms │     no change │
│ QQuery 2     │     12.91 / 13.47 ±0.79 / 15.03 ms │      12.42 / 14.09 ±1.79 / 17.46 ms │     no change │
│ QQuery 3     │     68.83 / 73.67 ±3.59 / 78.93 ms │      74.27 / 76.41 ±1.69 / 78.58 ms │     no change │
│ QQuery 4     │        2.24 / 2.78 ±0.71 / 4.18 ms │         2.17 / 2.87 ±0.75 / 4.33 ms │     no change │
│ QQuery 5     │     20.46 / 22.31 ±1.18 / 23.99 ms │      18.85 / 19.68 ±0.59 / 20.47 ms │ +1.13x faster │
│ QQuery 6     │     12.25 / 12.97 ±0.57 / 13.60 ms │      12.55 / 12.89 ±0.31 / 13.38 ms │     no change │
│ QQuery 7     │     15.21 / 15.70 ±0.35 / 16.18 ms │      16.77 / 17.24 ±0.30 / 17.62 ms │  1.10x slower │
│ QQuery 8     │        2.47 / 2.59 ±0.11 / 2.78 ms │         2.46 / 2.62 ±0.09 / 2.75 ms │     no change │
│ QQuery 9     │     15.91 / 16.21 ±0.37 / 16.91 ms │      16.23 / 16.90 ±0.68 / 18.20 ms │     no change │
│ QQuery 10    │        5.69 / 5.77 ±0.12 / 6.00 ms │         5.64 / 5.98 ±0.23 / 6.31 ms │     no change │
│ QQuery 11    │        3.36 / 3.49 ±0.11 / 3.69 ms │         2.79 / 3.08 ±0.19 / 3.26 ms │ +1.13x faster │
│ QQuery 12    │     79.53 / 81.43 ±2.38 / 86.11 ms │      30.25 / 30.66 ±0.54 / 31.71 ms │ +2.66x faster │
│ QQuery 13    │        5.60 / 5.78 ±0.13 / 5.99 ms │         5.64 / 5.79 ±0.15 / 6.01 ms │     no change │
│ QQuery 14    │        5.66 / 5.95 ±0.17 / 6.16 ms │         5.60 / 5.80 ±0.25 / 6.28 ms │     no change │
│ QQuery 15    │        5.48 / 5.68 ±0.15 / 5.91 ms │         5.44 / 5.81 ±0.29 / 6.17 ms │     no change │
│ QQuery 16    │        1.73 / 1.84 ±0.07 / 1.92 ms │         1.80 / 1.85 ±0.04 / 1.90 ms │     no change │
│ QQuery 17    │     10.75 / 11.08 ±0.23 / 11.46 ms │       9.56 / 10.59 ±0.56 / 11.26 ms │     no change │
│ QQuery 18    │       9.09 / 9.67 ±0.35 / 10.02 ms │         7.97 / 8.27 ±0.26 / 8.61 ms │ +1.17x faster │
│ QQuery 19    │     24.72 / 25.67 ±0.72 / 26.59 ms │      24.84 / 25.74 ±0.68 / 26.81 ms │     no change │
│ QQuery 20    │ 861.69 / 909.57 ±39.61 / 958.08 ms │ 915.42 / 971.44 ±37.08 / 1030.81 ms │  1.07x slower │
└──────────────┴────────────────────────────────────┴─────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)      │ 1228.97ms │
│ Total Time (smj)       │ 1241.00ms │
│ Average Time (main)    │   61.45ms │
│ Average Time (smj)     │   62.05ms │
│ Queries Faster         │         4 │
│ Queries Slower         │         2 │
│ Queries with No Change │        14 │
│ Queries with Failure   │         0 │
└────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Query        ┃                                           main ┃                               smj ┃           Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ QQuery 1     │                 44.37 / 48.67 ±4.54 / 55.68 ms │    49.10 / 51.78 ±2.55 / 56.50 ms │     1.06x slower │
│ QQuery 2     │                 45.18 / 47.44 ±2.39 / 51.74 ms │    45.98 / 47.07 ±1.10 / 49.15 ms │        no change │
│ QQuery 3     │                 52.59 / 56.15 ±2.65 / 59.79 ms │    55.38 / 60.31 ±5.21 / 66.90 ms │     1.07x slower │
│ QQuery 4     │                 33.06 / 34.46 ±0.97 / 35.88 ms │    32.39 / 33.70 ±0.80 / 34.47 ms │        no change │
│ QQuery 5     │                 84.50 / 87.63 ±2.06 / 90.58 ms │    84.21 / 87.19 ±2.37 / 90.40 ms │        no change │
│ QQuery 6     │                 17.87 / 18.64 ±0.48 / 19.22 ms │    16.78 / 18.06 ±0.96 / 19.37 ms │        no change │
│ QQuery 7     │              111.11 / 113.59 ±1.79 / 116.70 ms │ 108.28 / 116.71 ±7.15 / 127.92 ms │        no change │
│ QQuery 8     │                89.84 / 94.59 ±3.34 / 100.15 ms │  96.64 / 101.78 ±3.39 / 106.00 ms │     1.08x slower │
│ QQuery 9     │              128.36 / 133.12 ±3.46 / 138.00 ms │ 130.89 / 140.20 ±8.09 / 150.47 ms │     1.05x slower │
│ QQuery 10    │                 49.89 / 51.91 ±1.41 / 54.19 ms │    47.69 / 51.04 ±5.55 / 62.09 ms │        no change │
│ QQuery 11    │                 34.19 / 35.30 ±0.59 / 35.84 ms │    32.05 / 33.28 ±0.86 / 34.42 ms │    +1.06x faster │
│ QQuery 12    │                 36.26 / 38.67 ±2.44 / 42.77 ms │    32.70 / 33.54 ±0.55 / 34.16 ms │    +1.15x faster │
│ QQuery 13    │                 31.32 / 34.13 ±2.29 / 38.22 ms │    28.54 / 29.94 ±1.64 / 33.14 ms │    +1.14x faster │
│ QQuery 14    │                 23.54 / 24.79 ±0.92 / 26.00 ms │    22.30 / 23.21 ±0.84 / 24.71 ms │    +1.07x faster │
│ QQuery 15    │                 26.66 / 27.47 ±0.86 / 29.05 ms │    25.81 / 27.60 ±1.14 / 29.37 ms │        no change │
│ QQuery 16    │                 17.63 / 18.94 ±0.97 / 20.20 ms │    16.97 / 17.95 ±0.57 / 18.57 ms │    +1.06x faster │
│ QQuery 17    │                 94.36 / 96.41 ±1.62 / 98.44 ms │    92.20 / 94.14 ±2.07 / 98.03 ms │        no change │
│ QQuery 18    │               99.91 / 108.58 ±5.85 / 117.27 ms │ 103.51 / 107.20 ±2.69 / 111.14 ms │        no change │
│ QQuery 19    │                 35.23 / 36.68 ±1.46 / 39.23 ms │    32.54 / 34.29 ±1.04 / 35.46 ms │    +1.07x faster │
│ QQuery 20    │                 40.66 / 41.84 ±1.20 / 44.05 ms │    37.83 / 38.51 ±0.63 / 39.46 ms │    +1.09x faster │
│ QQuery 21    │ 151142.04 / 246274.24 ±89682.07 / 358766.84 ms │ 209.69 / 210.83 ±1.09 / 212.70 ms │ +1168.11x faster │
│ QQuery 22    │                16.69 / 28.53 ±22.72 / 73.97 ms │    16.37 / 16.89 ±0.31 / 17.35 ms │    +1.69x faster │
└──────────────┴────────────────────────────────────────────────┴───────────────────────────────────┴──────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary      ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (main)      │ 247451.79ms │
│ Total Time (smj)       │   1375.22ms │
│ Average Time (main)    │  11247.81ms │
│ Average Time (smj)     │     62.51ms │
│ Queries Faster         │           9 │
│ Queries Slower         │           4 │
│ Queries with No Change │           9 │
│ Queries with Failure   │           0 │
└────────────────────────┴─────────────┘
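
One way to read the TPC-H summary is that almost all of the total-time difference comes from Q21. The following small Python check is not part of the benchmark tooling; it only uses figures copied from the tables above (the Q21 mean times and the total times), and the variable names are made up for illustration.

```python
# Values copied from the tpch_sf1.json tables above, in milliseconds.
total_main, total_smj = 247451.79, 1375.22  # "Total Time" rows
q21_main, q21_smj = 246274.24, 210.83       # mean time of Query 21

# Total time of the other 21 queries, i.e. everything except Q21.
rest_main = round(total_main - q21_main, 2)
rest_smj = round(total_smj - q21_smj, 2)

# Fraction of the total run time on main that Q21 alone accounts for.
q21_share_main = q21_main / total_main

print(f"other queries: main={rest_main} ms, smj={rest_smj} ms")
print(f"Q21 share of main total: {q21_share_main:.1%}")
```

Excluding Q21, the remaining queries add up to about 1177.55 ms on main and 1164.39 ms on smj, so the two branches are roughly level outside the query this PR targets.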

Are these changes tested?

Yes, by the existing Sort Merge Join unit tests. This PR also adds a new benchmark.

Are there any user-facing changes?

There should not be.

…hes on vector of RecordBatches. Add benchmarks, update tests.
github-actions bot added the physical-plan (Changes to the physical-plan crate) label on Nov 21, 2025
@comphead (Contributor) commented:

+1168.11x faster

@mbutrovich (Contributor, Author) commented:

I have a bug somewhere that the extended tests expose. I'll try to track it down next week.

Development

Successfully merging this pull request may close these issues.

Sort Merge Join is extremely slow on LeftAnti joins
Performance regression after adding support for SMJ with join filter