@Dandandan
Contributor

This commit optimizes the `CollectLeft` execution mode in `HashJoinExec` by parallelizing the build-side processing. Previously, `CollectLeft` required the build side to be coalesced into a single partition, creating a performance bottleneck.

This optimization removes the single-partition requirement and introduces a new `collect_left_input_parallel` function. This function executes all build-side partitions concurrently, collects their batches, and then populates a shared hash map in parallel using a mutex for synchronization.

This change avoids the need for an explicit `CoalescePartitions` operator upstream and leverages available parallelism to significantly speed up the hash join build phase for `CollectLeft` scenarios.
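
As a rough illustration of the approach described above, here is a minimal standard-library sketch, not the PR's actual code. It assumes the batches of each build-side partition have already been collected, and it models only the step where several workers populate one hash map guarded by a mutex. The names `Batch` and `collect_left_parallel_sketch` are hypothetical, and plain `(key, payload)` tuples stand in for Arrow record batches.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a record batch: rows of (join key, payload).
type Batch = Vec<(u64, String)>;

/// Hypothetical sketch: one worker thread per build-side partition, each
/// inserting its rows into a single hash map shared behind a mutex.
fn collect_left_parallel_sketch(partitions: Vec<Vec<Batch>>) -> HashMap<u64, Vec<String>> {
    let shared: Arc<Mutex<HashMap<u64, Vec<String>>>> = Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = partitions
        .into_iter()
        .map(|batches| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for batch in batches {
                    // Take the lock once per batch rather than once per row
                    // to keep contention between workers low.
                    let mut map = shared.lock().unwrap();
                    for (key, payload) in batch {
                        map.entry(key).or_default().push(payload);
                    }
                }
            })
        })
        .collect();

    // Wait for every partition to finish before the map is handed to the probe side.
    for handle in handles {
        handle.join().expect("build thread panicked");
    }

    // All workers are done, so this is the only remaining reference.
    Arc::try_unwrap(shared)
        .expect("all build threads have finished")
        .into_inner()
        .unwrap()
}
```

Because workers interleave, the order of payloads stored under one key is not deterministic in this sketch. A real join would also need stable row indices into the concatenated build side, which is omitted here.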

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@Dandandan
Contributor Author

Run benchmarks

@github-actions bot added the physical-plan (Changes to the physical-plan crate) label on Jan 12, 2026
@Dandandan changed the title from "feat(hashjoin): Parallelize CollectLeft build phase" to "[TEST] Parallelize CollectLeft build phase" on Jan 12, 2026
@alamb-ghbot

🤖 ./gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing bolt-parallel-collect-left-1345809858742614255 (63b2366) to 418f62a diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and bolt-parallel-collect-left-1345809858742614255
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ bolt-parallel-collect-left-1345809858742614255 ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2488.98 ms │                                     2357.63 ms │ +1.06x faster │
│ QQuery 1 │   920.08 ms │                                      949.91 ms │     no change │
│ QQuery 2 │  1935.37 ms │                                     1921.09 ms │     no change │
│ QQuery 3 │  1184.81 ms │                                     1149.87 ms │     no change │
│ QQuery 4 │  2324.90 ms │                                     2268.88 ms │     no change │
│ QQuery 5 │ 28281.66 ms │                                    28575.40 ms │     no change │
│ QQuery 6 │  3868.30 ms │                                     3889.89 ms │     no change │
│ QQuery 7 │  3673.59 ms │                                     3888.50 ms │  1.06x slower │
└──────────┴─────────────┴────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                             │ 44677.68ms │
│ Total Time (bolt-parallel-collect-left-1345809858742614255)   │ 45001.16ms │
│ Average Time (HEAD)                                           │  5584.71ms │
│ Average Time (bolt-parallel-collect-left-1345809858742614255) │  5625.15ms │
│ Queries Faster                                                │          1 │
│ Queries Slower                                                │          1 │
│ Queries with No Change                                        │          6 │
│ Queries with Failure                                          │          0 │
└───────────────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ bolt-parallel-collect-left-1345809858742614255 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     1.42 ms │                                        1.43 ms │     no change │
│ QQuery 1  │    48.78 ms │                                       48.60 ms │     no change │
│ QQuery 2  │   133.03 ms │                                      135.89 ms │     no change │
│ QQuery 3  │   150.79 ms │                                      153.99 ms │     no change │
│ QQuery 4  │  1053.05 ms │                                     1107.02 ms │  1.05x slower │
│ QQuery 5  │  1351.13 ms │                                     1403.70 ms │     no change │
│ QQuery 6  │     1.44 ms │                                        1.43 ms │     no change │
│ QQuery 7  │    54.33 ms │                                       53.70 ms │     no change │
│ QQuery 8  │  1445.77 ms │                                     1478.28 ms │     no change │
│ QQuery 9  │  1818.52 ms │                                     1919.28 ms │  1.06x slower │
│ QQuery 10 │   348.59 ms │                                      364.88 ms │     no change │
│ QQuery 11 │   396.37 ms │                                      401.09 ms │     no change │
│ QQuery 12 │  1254.02 ms │                                     1334.83 ms │  1.06x slower │
│ QQuery 13 │  1897.45 ms │                                     1991.20 ms │     no change │
│ QQuery 14 │  1228.75 ms │                                     1261.53 ms │     no change │
│ QQuery 15 │  1255.08 ms │                                     1296.22 ms │     no change │
│ QQuery 16 │  2495.57 ms │                                     2619.32 ms │     no change │
│ QQuery 17 │  2508.66 ms │                                     2595.44 ms │     no change │
│ QQuery 18 │  5979.41 ms │                                     5069.04 ms │ +1.18x faster │
│ QQuery 19 │   121.71 ms │                                      120.10 ms │     no change │
│ QQuery 20 │  1924.31 ms │                                     1856.58 ms │     no change │
│ QQuery 21 │  2238.43 ms │                                     2158.38 ms │     no change │
│ QQuery 22 │  3866.43 ms │                                     3728.17 ms │     no change │
│ QQuery 23 │ 16078.36 ms │                                    12213.79 ms │ +1.32x faster │
│ QQuery 24 │   221.83 ms │                                      219.12 ms │     no change │
│ QQuery 25 │   478.40 ms │                                      455.10 ms │     no change │
│ QQuery 26 │   232.41 ms │                                      224.13 ms │     no change │
│ QQuery 27 │  2790.54 ms │                                     2701.70 ms │     no change │
│ QQuery 28 │ 24747.70 ms │                                    24449.66 ms │     no change │
│ QQuery 29 │   967.08 ms │                                      983.67 ms │     no change │
│ QQuery 30 │  1374.26 ms │                                     1353.30 ms │     no change │
│ QQuery 31 │  1335.02 ms │                                     1358.06 ms │     no change │
│ QQuery 32 │  5379.98 ms │                                     4895.11 ms │ +1.10x faster │
│ QQuery 33 │  5791.21 ms │                                     5694.71 ms │     no change │
│ QQuery 34 │  6048.21 ms │                                     6004.29 ms │     no change │
│ QQuery 35 │  1959.61 ms │                                     2021.08 ms │     no change │
│ QQuery 36 │    66.10 ms │                                       64.29 ms │     no change │
│ QQuery 37 │    44.04 ms │                                       43.98 ms │     no change │
│ QQuery 38 │    65.49 ms │                                       66.62 ms │     no change │
│ QQuery 39 │   102.42 ms │                                       99.96 ms │     no change │
│ QQuery 40 │    25.58 ms │                                       26.20 ms │     no change │
│ QQuery 41 │    22.39 ms │                                       24.33 ms │  1.09x slower │
│ QQuery 42 │    20.06 ms │                                       19.85 ms │     no change │
└───────────┴─────────────┴────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                             │ 99323.72ms │
│ Total Time (bolt-parallel-collect-left-1345809858742614255)   │ 94019.06ms │
│ Average Time (HEAD)                                           │  2309.85ms │
│ Average Time (bolt-parallel-collect-left-1345809858742614255) │  2186.49ms │
│ Queries Faster                                                │          3 │
│ Queries Slower                                                │          4 │
│ Queries with No Change                                        │         36 │
│ Queries with Failure                                          │          0 │
└───────────────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ bolt-parallel-collect-left-1345809858742614255 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 111.14 ms │                                      118.83 ms │  1.07x slower │
│ QQuery 2  │  28.64 ms │                                       18.34 ms │ +1.56x faster │
│ QQuery 3  │  32.42 ms │                                       28.40 ms │ +1.14x faster │
│ QQuery 4  │  28.83 ms │                                       29.01 ms │     no change │
│ QQuery 5  │  85.30 ms │                                       86.41 ms │     no change │
│ QQuery 6  │  19.50 ms │                                       19.93 ms │     no change │
│ QQuery 7  │ 222.93 ms │                                       32.24 ms │ +6.91x faster │
│ QQuery 8  │  34.33 ms │                                       26.67 ms │ +1.29x faster │
│ QQuery 9  │ 100.29 ms │                                       24.69 ms │ +4.06x faster │
│ QQuery 10 │  63.67 ms │                                       61.57 ms │     no change │
│ QQuery 11 │  17.67 ms │                                       11.55 ms │ +1.53x faster │
│ QQuery 12 │  49.48 ms │                                       51.20 ms │     no change │
│ QQuery 13 │  46.67 ms │                                       47.00 ms │     no change │
│ QQuery 14 │  13.30 ms │                                       13.28 ms │     no change │
│ QQuery 15 │  23.88 ms │                                       24.23 ms │     no change │
│ QQuery 16 │  23.58 ms │                                       24.57 ms │     no change │
│ QQuery 17 │ 150.52 ms │                                      152.61 ms │     no change │
│ QQuery 18 │ 271.35 ms │                                      273.70 ms │     no change │
│ QQuery 19 │  36.60 ms │                                       40.04 ms │  1.09x slower │
│ QQuery 20 │  47.90 ms │                                       49.68 ms │     no change │
│ QQuery 21 │ 312.48 ms │                                       61.67 ms │ +5.07x faster │
│ QQuery 22 │  17.26 ms │                                       17.89 ms │     no change │
└───────────┴───────────┴────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                             ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                             │ 1737.74ms │
│ Total Time (bolt-parallel-collect-left-1345809858742614255)   │ 1213.52ms │
│ Average Time (HEAD)                                           │   78.99ms │
│ Average Time (bolt-parallel-collect-left-1345809858742614255) │   55.16ms │
│ Queries Faster                                                │         7 │
│ Queries Slower                                                │         2 │
│ Queries with No Change                                        │        13 │
│ Queries with Failure                                          │         0 │
└───────────────────────────────────────────────────────────────┴───────────┘
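
The Change column in the tables above appears to be the ratio of the two timings, with differences under roughly 5% reported as "no change". The sketch below reproduces that labelling under this assumption. The 5% threshold is inferred from the rows shown here, not taken from the benchmark tool's source, and `describe_change` is a hypothetical helper name.

```rust
/// Hypothetical helper: label a pair of timings the way the "Change" column
/// appears to. The 1.05 threshold is an assumption inferred from the tables.
fn describe_change(head_ms: f64, branch_ms: f64) -> String {
    const THRESHOLD: f64 = 1.05;
    if head_ms / branch_ms > THRESHOLD {
        // The branch took less time than HEAD.
        format!("+{:.2}x faster", head_ms / branch_ms)
    } else if branch_ms / head_ms > THRESHOLD {
        // The branch took more time than HEAD.
        format!("{:.2}x slower", branch_ms / head_ms)
    } else {
        "no change".to_string()
    }
}
```

For example, tpch_mem query 7 went from 222.93 ms to 32.24 ms, and 222.93 / 32.24 ≈ 6.91, which matches the reported "+6.91x faster".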

@Dandandan
Contributor Author

run benchmark tpch tpch_mem

@github-actions bot added and then removed the functions (Changes to functions implementation) label on Jan 12, 2026
@Dandandan force-pushed the bolt-parallel-collect-left-1345809858742614255 branch from af0d30f to 63b2366 on January 12, 2026 22:56
@Dandandan closed this on Jan 12, 2026
@alamb-ghbot

🤖 ./gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing bolt-parallel-collect-left-1345809858742614255 (63b2366) to 418f62a diff using: tpch
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and bolt-parallel-collect-left-1345809858742614255
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ bolt-parallel-collect-left-1345809858742614255 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 192.35 ms │                                      193.64 ms │     no change │
│ QQuery 2  │  95.76 ms │                                       70.34 ms │ +1.36x faster │
│ QQuery 3  │ 131.30 ms │                                      116.67 ms │ +1.13x faster │
│ QQuery 4  │  78.25 ms │                                       78.83 ms │     no change │
│ QQuery 5  │ 185.08 ms │                                      179.59 ms │     no change │
│ QQuery 6  │  69.65 ms │                                       67.23 ms │     no change │
│ QQuery 7  │ 220.96 ms │                                      151.46 ms │ +1.46x faster │
│ QQuery 8  │ 167.95 ms │                                      166.60 ms │     no change │
│ QQuery 9  │ 242.72 ms │                                      232.35 ms │     no change │
│ QQuery 10 │ 184.82 ms │                                      192.17 ms │     no change │
│ QQuery 11 │  80.84 ms │                                       68.82 ms │ +1.17x faster │
│ QQuery 12 │ 119.30 ms │                                      121.24 ms │     no change │
│ QQuery 13 │ 222.48 ms │                                      231.77 ms │     no change │
│ QQuery 14 │  94.77 ms │                                       96.67 ms │     no change │
│ QQuery 15 │ 125.30 ms │                                      123.07 ms │     no change │
│ QQuery 16 │  59.88 ms │                                       34.15 ms │ +1.75x faster │
│ QQuery 17 │ 281.51 ms │                                       92.68 ms │ +3.04x faster │
│ QQuery 18 │ 332.41 ms │                                      323.68 ms │     no change │
│ QQuery 19 │ 136.01 ms │                                      139.88 ms │     no change │
│ QQuery 20 │ 127.78 ms │                                       42.84 ms │ +2.98x faster │
│ QQuery 21 │ 271.29 ms │                                      178.74 ms │ +1.52x faster │
│ QQuery 22 │  43.01 ms │                                       60.30 ms │  1.40x slower │
└───────────┴───────────┴────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                             ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                             │ 3463.41ms │
│ Total Time (bolt-parallel-collect-left-1345809858742614255)   │ 2962.71ms │
│ Average Time (HEAD)                                           │  157.43ms │
│ Average Time (bolt-parallel-collect-left-1345809858742614255) │  134.67ms │
│ Queries Faster                                                │         8 │
│ Queries Slower                                                │         1 │
│ Queries with No Change                                        │        13 │
│ Queries with Failure                                          │         0 │
└───────────────────────────────────────────────────────────────┴───────────┘

@alamb-ghbot

🤖 ./gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing bolt-parallel-collect-left-1345809858742614255 (63b2366) to 418f62a diff using: tpch_mem
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and bolt-parallel-collect-left-1345809858742614255
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ bolt-parallel-collect-left-1345809858742614255 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 142.82 ms │                                      119.68 ms │ +1.19x faster │
│ QQuery 2  │  33.22 ms │                                       18.39 ms │ +1.81x faster │
│ QQuery 3  │  43.23 ms │                                       29.61 ms │ +1.46x faster │
│ QQuery 4  │  36.46 ms │                                       30.35 ms │ +1.20x faster │
│ QQuery 5  │ 107.86 ms │                                       90.30 ms │ +1.19x faster │
│ QQuery 6  │  25.56 ms │                                       20.38 ms │ +1.25x faster │
│ QQuery 7  │ 247.89 ms │                                       32.64 ms │ +7.59x faster │
│ QQuery 8  │  37.34 ms │                                       26.53 ms │ +1.41x faster │
│ QQuery 9  │ 110.04 ms │                                       25.94 ms │ +4.24x faster │
│ QQuery 10 │  63.90 ms │                                       64.12 ms │     no change │
│ QQuery 11 │  19.11 ms │                                       11.95 ms │ +1.60x faster │
│ QQuery 12 │  52.62 ms │                                       52.38 ms │     no change │
│ QQuery 13 │  48.11 ms │                                       48.34 ms │     no change │
│ QQuery 14 │  14.72 ms │                                       13.90 ms │ +1.06x faster │
│ QQuery 15 │  24.68 ms │                                       25.29 ms │     no change │
│ QQuery 16 │  24.74 ms │                                       25.15 ms │     no change │
│ QQuery 17 │ 159.85 ms │                                      160.37 ms │     no change │
│ QQuery 18 │ 284.65 ms │                                      289.03 ms │     no change │
│ QQuery 19 │  37.85 ms │                                       37.33 ms │     no change │
│ QQuery 20 │  51.36 ms │                                       52.45 ms │     no change │
│ QQuery 21 │ 332.30 ms │                                       55.06 ms │ +6.03x faster │
│ QQuery 22 │  18.23 ms │                                       17.71 ms │     no change │
└───────────┴───────────┴────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                             ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                             │ 1916.52ms │
│ Total Time (bolt-parallel-collect-left-1345809858742614255)   │ 1246.90ms │
│ Average Time (HEAD)                                           │   87.11ms │
│ Average Time (bolt-parallel-collect-left-1345809858742614255) │   56.68ms │
│ Queries Faster                                                │        12 │
│ Queries Slower                                                │         0 │
│ Queries with No Change                                        │        10 │
│ Queries with Failure                                          │         0 │
└───────────────────────────────────────────────────────────────┴───────────┘
