Call take arrays once per repartitioned input batch #22159
gene-bordegaray wants to merge 1 commit
Conversation
run benchmarks

🤖 Benchmark running (GKE): comparing gene.bordegaray/2026/05/repartition-grouped-hash-take (a0a727c) to 937dfda (merge-base) using tpch, clickbench_partitioned, and tpcds.

🤖 Benchmark completed (GKE): tpch, tpcds, and clickbench_partitioned runs finished (base vs. branch resource-usage tables omitted).
This is intended for fanout at larger scale factors. The benchmarks in my description are run with […]. Can this be run with […]?
cc: @gabotechs |
Which issue does this PR close?
Rationale for this change
Hash repartition currently builds one output batch per non-empty target partition by calling `take_arrays` separately for each partition. At high fanout this means an input batch can issue many take kernels, which shows up in repartition-heavy queries.

This PR changes hash repartition to concatenate the per-partition row indices, call `take_arrays` once for the input batch, and then slice the reordered batch back into per-partition output batches.

This is complementary to #22010: that PR reduces channel/gate traffic from many small batches, while this PR reduces the Arrow take-kernel work required to create the repartitioned batches.
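For contrast, here is a minimal sketch of the current per-partition shape, assuming the per-partition row indices have already been computed by hashing each row; the helper name and signature are hypothetical, not the actual DataFusion internals:

```rust
use arrow::array::UInt32Array;
use arrow::compute::take_arrays;
use arrow::record_batch::RecordBatch;

// Hypothetical: one `take_arrays` kernel invocation per non-empty
// target partition, i.e. up to N take kernels per input batch.
fn repartition_per_partition(
    batch: &RecordBatch,
    partition_indices: &[Vec<u32>],
) -> arrow::error::Result<Vec<Option<RecordBatch>>> {
    partition_indices
        .iter()
        .map(|indices| {
            if indices.is_empty() {
                return Ok(None);
            }
            let indices = UInt32Array::from(indices.clone());
            // Each call materializes a fully independent output batch.
            let columns = take_arrays(batch.columns(), &indices, None)?;
            Ok(Some(RecordBatch::try_new(batch.schema(), columns)?))
        })
        .collect()
}
```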
What changes are included in this PR?
- Replaces the per-partition `take_arrays` calls with one grouped `take_arrays` call per input batch.
- Emits `RecordBatch::slice` outputs for each non-empty partition.

How the grouped take works:
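A minimal sketch, assuming per-partition row indices are already built by hashing each row and that `take_arrays` is the arrow-rs kernel named above (the helper below is hypothetical, not the PR's actual code): concatenate the indices, take once, then hand out zero-copy slices.

```rust
use arrow::array::UInt32Array;
use arrow::compute::take_arrays;
use arrow::record_batch::RecordBatch;

// Hypothetical: one grouped `take_arrays` call per input batch,
// followed by a zero-copy `RecordBatch::slice` per non-empty partition.
fn repartition_grouped(
    batch: &RecordBatch,
    partition_indices: &[Vec<u32>],
) -> arrow::error::Result<Vec<Option<RecordBatch>>> {
    // Concatenate per-partition indices, remembering each partition's
    // (offset, length) range within the reordered batch.
    let mut flat = Vec::with_capacity(batch.num_rows());
    let mut ranges = Vec::with_capacity(partition_indices.len());
    for indices in partition_indices {
        ranges.push((flat.len(), indices.len()));
        flat.extend_from_slice(indices);
    }
    let indices = UInt32Array::from(flat);

    // One take kernel for the whole input batch.
    let columns = take_arrays(batch.columns(), &indices, None)?;
    let reordered = RecordBatch::try_new(batch.schema(), columns)?;

    // Slices share the reordered batch's buffers (no copy).
    Ok(ranges
        .into_iter()
        .map(|(offset, len)| (len > 0).then(|| reordered.slice(offset, len)))
        .collect())
}
```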
Are these changes tested?
`cargo test -p datafusion-physical-plan repartition --lib`

Benchmarks:
Default TPCH SF10 summary, with no `--batch-size` override:

TPCH SF10 default batch size, 8 partitions, all queries
TPCH SF10 default batch size, 16 partitions, all queries
TPCH SF10 default batch size, 32 partitions, all queries
TPCH SF10 default batch size, 64 partitions, all queries
TPCH SF10 default batch size, 300 partitions, all queries
Stress cases:
These runs use `--batch-size 1024` to stress the repartition path. They are included to show the mechanism under smaller input batches and higher output fanout, not as the primary end-to-end performance claim.

TPCH SF10, 8 partitions, all queries
TPCH SF10, 16 partitions, all queries
TPCH SF10, 32 partitions, all queries
TPCH SF10, 64 partitions, all queries
TPCH SF10, 300 partitions, targeted high-fanout queries
Yes, this is a real use case for fanout in distributed-datafusion.
TPCH SF10, 300 partitions, peak RSS stress
Measured with `/usr/bin/time -l`, one iteration, `--batch-size 1024`, `--partitions 300`, and no DataFusion memory limit. RSS is the process's peak resident set size as reported by the OS.

Memory concern and follow-up work
This PR changes the output batches from independently materialized per-partition batches to slices of one reordered batch, which means sibling slices can share the same underlying buffers.
Potential concern:
A slow output partition can keep the shared reordered batch's buffers alive until its slice is dropped. Also, `RecordBatch::get_array_memory_size()` may count shared slice buffers repeatedly when repartition reserves memory per output batch.

The peak RSS stress above did not show a process-memory regression in the measured queries. Follow-up work should add buffer-aware accounting.
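To make the accounting concern concrete, here is a small self-contained sketch, assuming standard arrow-rs zero-copy slicing (illustrative only, not code from this PR):

```rust
use std::sync::Arc;
use arrow::array::{ArrayRef, Int64Array};
use arrow::record_batch::RecordBatch;

fn main() -> arrow::error::Result<()> {
    let col: ArrayRef = Arc::new(Int64Array::from_iter_values(0..8192));
    let batch = RecordBatch::try_from_iter([("v", col)])?;

    // Two zero-copy slices of one "reordered" batch share its buffers.
    let a = batch.slice(0, 4096);
    let b = batch.slice(4096, 4096);

    // Each slice can report the full shared buffer size, so summing
    // per-batch reservations may over-count the real allocation, and
    // either slice alone keeps the whole buffer alive.
    println!("parent: {} bytes", batch.get_array_memory_size());
    println!(
        "a: {} bytes, b: {} bytes",
        a.get_array_memory_size(),
        b.get_array_memory_size()
    );
    Ok(())
}
```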
Are there any user-facing changes?
No.