@Dandandan
Contributor

@Dandandan Dandandan commented Jan 3, 2026

Which issue does this PR close?

Rationale for this change

Speed up the accumulator code (sum, avg, count) by specializing on the non-null case.

What changes are included in this PR?

  • Specialize NullState for the case where all values are non-null.
  • Use unchecked indexing.
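
As a rough illustration of the two points above (not the code in this PR), the sketch below splits a grouped sum into a fast path for batches without a validity mask and a slow path that checks validity per row. The function name `accumulate_sum`, the `i64` value type, and the `&[bool]` validity slice are assumptions chosen to keep the example dependency-free; the real accumulators work on Arrow arrays.

```rust
/// Hypothetical stand-in for a grouped sum accumulator; not DataFusion's actual code.
/// `validity` is `None` when the input batch contains no nulls.
fn accumulate_sum(
    sums: &mut [i64],
    group_indices: &[usize],
    values: &[i64],
    validity: Option<&[bool]>,
) {
    assert_eq!(group_indices.len(), values.len());
    match validity {
        // Fast path: no null mask, so the loop has no per-row validity branch.
        None => {
            // Check every group index once up front so the hot loop can
            // skip bounds checks.
            assert!(group_indices.iter().all(|&g| g < sums.len()));
            for (&g, &v) in group_indices.iter().zip(values) {
                // SAFETY: every `g` was checked against `sums.len()` above.
                let slot = unsafe { sums.get_unchecked_mut(g) };
                *slot = slot.wrapping_add(v);
            }
        }
        // Slow path: skip rows whose validity bit is false.
        Some(valid) => {
            assert_eq!(valid.len(), values.len());
            for ((&g, &v), &is_valid) in group_indices.iter().zip(values).zip(valid) {
                if is_valid {
                    sums[g] = sums[g].wrapping_add(v);
                }
            }
        }
    }
}
```

Calling `accumulate_sum(&mut sums, &group_indices, &values, None)` takes the specialized branch; whether the unchecked access pays off in practice depends on what the compiler can already prove, which is what the benchmark runs in this thread are meant to show.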

Are these changes tested?

Are there any user-facing changes?

@Dandandan
Contributor Author

run benchmark tpch

@github-actions github-actions bot added the functions Changes to functions implementation label Jan 3, 2026
@Dandandan
Contributor Author

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (2e70075) to 70daf88 diff using: tpch
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and speedup_accumulate2
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 195.98 ms │           177.99 ms │ +1.10x faster │
│ QQuery 2     │  93.00 ms │            93.10 ms │     no change │
│ QQuery 3     │ 125.87 ms │           129.76 ms │     no change │
│ QQuery 4     │  76.91 ms │            77.23 ms │     no change │
│ QQuery 5     │ 173.24 ms │           172.56 ms │     no change │
│ QQuery 6     │  66.55 ms │            60.69 ms │ +1.10x faster │
│ QQuery 7     │ 213.23 ms │           212.37 ms │     no change │
│ QQuery 8     │ 163.32 ms │           159.22 ms │     no change │
│ QQuery 9     │ 222.59 ms │           225.17 ms │     no change │
│ QQuery 10    │ 183.45 ms │           186.90 ms │     no change │
│ QQuery 11    │  73.45 ms │            73.71 ms │     no change │
│ QQuery 12    │ 119.27 ms │           119.24 ms │     no change │
│ QQuery 13    │ 217.80 ms │           211.53 ms │     no change │
│ QQuery 14    │  88.21 ms │            92.57 ms │     no change │
│ QQuery 15    │ 121.09 ms │           118.33 ms │     no change │
│ QQuery 16    │  55.94 ms │            56.40 ms │     no change │
│ QQuery 17    │ 271.09 ms │           263.09 ms │     no change │
│ QQuery 18    │ 323.32 ms │           307.03 ms │ +1.05x faster │
│ QQuery 19    │ 133.86 ms │           131.78 ms │     no change │
│ QQuery 20    │ 125.14 ms │           125.98 ms │     no change │
│ QQuery 21    │ 258.77 ms │           264.91 ms │     no change │
│ QQuery 22    │  41.10 ms │            43.76 ms │  1.06x slower │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 3343.17ms │
│ Total Time (speedup_accumulate2)   │ 3303.33ms │
│ Average Time (HEAD)                │  151.96ms │
│ Average Time (speedup_accumulate2) │  150.15ms │
│ Queries Faster                     │         3 │
│ Queries Slower                     │         1 │
│ Queries with No Change             │        18 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (2e70075) to 70daf88 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and speedup_accumulate2
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2396.32 ms │          2401.55 ms │     no change │
│ QQuery 1     │   968.67 ms │           948.40 ms │     no change │
│ QQuery 2     │  1904.98 ms │          1904.49 ms │     no change │
│ QQuery 3     │  1189.95 ms │          1082.74 ms │ +1.10x faster │
│ QQuery 4     │  2302.46 ms │          2257.02 ms │     no change │
│ QQuery 5     │ 28185.03 ms │         28318.68 ms │     no change │
│ QQuery 6     │  3989.47 ms │          3954.18 ms │     no change │
│ QQuery 7     │  3607.90 ms │          3402.52 ms │ +1.06x faster │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 44544.78ms │
│ Total Time (speedup_accumulate2)   │ 44269.58ms │
│ Average Time (HEAD)                │  5568.10ms │
│ Average Time (speedup_accumulate2) │  5533.70ms │
│ Queries Faster                     │          2 │
│ Queries Slower                     │          0 │
│ Queries with No Change             │          6 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     1.44 ms │             1.47 ms │     no change │
│ QQuery 1     │    49.61 ms │            51.67 ms │     no change │
│ QQuery 2     │   133.37 ms │           135.39 ms │     no change │
│ QQuery 3     │   153.38 ms │           155.18 ms │     no change │
│ QQuery 4     │  1080.14 ms │          1100.08 ms │     no change │
│ QQuery 5     │  1374.56 ms │          1365.30 ms │     no change │
│ QQuery 6     │     1.44 ms │             1.45 ms │     no change │
│ QQuery 7     │    54.79 ms │            55.54 ms │     no change │
│ QQuery 8     │  1431.19 ms │          1436.26 ms │     no change │
│ QQuery 9     │  1834.80 ms │          1760.94 ms │     no change │
│ QQuery 10    │   343.66 ms │           346.76 ms │     no change │
│ QQuery 11    │   392.50 ms │           406.43 ms │     no change │
│ QQuery 12    │  1250.57 ms │          1282.74 ms │     no change │
│ QQuery 13    │  1948.36 ms │          1901.69 ms │     no change │
│ QQuery 14    │  1245.12 ms │          1252.87 ms │     no change │
│ QQuery 15    │  1253.71 ms │          1240.80 ms │     no change │
│ QQuery 16    │  2604.43 ms │          2638.39 ms │     no change │
│ QQuery 17    │  2588.56 ms │          2543.25 ms │     no change │
│ QQuery 18    │  5552.06 ms │          4873.72 ms │ +1.14x faster │
│ QQuery 19    │   121.28 ms │           118.29 ms │     no change │
│ QQuery 20    │  1924.64 ms │          1852.19 ms │     no change │
│ QQuery 21    │  2220.98 ms │          2110.61 ms │     no change │
│ QQuery 22    │  3838.07 ms │          3633.27 ms │ +1.06x faster │
│ QQuery 23    │ 17788.42 ms │         12106.29 ms │ +1.47x faster │
│ QQuery 24    │   224.44 ms │           211.69 ms │ +1.06x faster │
│ QQuery 25    │   475.26 ms │           443.87 ms │ +1.07x faster │
│ QQuery 26    │   202.16 ms │           209.78 ms │     no change │
│ QQuery 27    │  2850.97 ms │          2618.79 ms │ +1.09x faster │
│ QQuery 28    │ 23800.99 ms │         24347.77 ms │     no change │
│ QQuery 29    │   942.77 ms │          1011.45 ms │  1.07x slower │
│ QQuery 30    │  1307.75 ms │          1248.78 ms │     no change │
│ QQuery 31    │  1366.48 ms │          1289.24 ms │ +1.06x faster │
│ QQuery 32    │  5105.69 ms │          4974.34 ms │     no change │
│ QQuery 33    │  5738.49 ms │          5365.07 ms │ +1.07x faster │
│ QQuery 34    │  5800.86 ms │          5667.67 ms │     no change │
│ QQuery 35    │  1986.02 ms │          1876.39 ms │ +1.06x faster │
│ QQuery 36    │    65.03 ms │            63.81 ms │     no change │
│ QQuery 37    │    44.32 ms │            43.01 ms │     no change │
│ QQuery 38    │    65.12 ms │            64.24 ms │     no change │
│ QQuery 39    │   100.76 ms │           101.10 ms │     no change │
│ QQuery 40    │    24.99 ms │            25.83 ms │     no change │
│ QQuery 41    │    22.47 ms │            22.20 ms │     no change │
│ QQuery 42    │    19.13 ms │            20.06 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 99330.77ms │
│ Total Time (speedup_accumulate2)   │ 91975.66ms │
│ Average Time (HEAD)                │  2310.02ms │
│ Average Time (speedup_accumulate2) │  2138.97ms │
│ Queries Faster                     │          9 │
│ Queries Slower                     │          1 │
│ Queries with No Change             │         33 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 118.98 ms │           102.93 ms │ +1.16x faster │
│ QQuery 2     │  27.10 ms │            27.74 ms │     no change │
│ QQuery 3     │  38.52 ms │            37.98 ms │     no change │
│ QQuery 4     │  28.23 ms │            29.33 ms │     no change │
│ QQuery 5     │  86.26 ms │            87.18 ms │     no change │
│ QQuery 6     │  20.01 ms │            19.86 ms │     no change │
│ QQuery 7     │ 229.88 ms │           223.06 ms │     no change │
│ QQuery 8     │  34.32 ms │            36.07 ms │  1.05x slower │
│ QQuery 9     │ 104.16 ms │            99.28 ms │     no change │
│ QQuery 10    │  62.37 ms │            64.49 ms │     no change │
│ QQuery 11    │  16.19 ms │            17.94 ms │  1.11x slower │
│ QQuery 12    │  49.55 ms │            49.97 ms │     no change │
│ QQuery 13    │  47.28 ms │            48.64 ms │     no change │
│ QQuery 14    │  13.42 ms │            13.30 ms │     no change │
│ QQuery 15    │  23.99 ms │            24.16 ms │     no change │
│ QQuery 16    │  24.17 ms │            24.46 ms │     no change │
│ QQuery 17    │ 148.07 ms │           143.70 ms │     no change │
│ QQuery 18    │ 278.87 ms │           275.03 ms │     no change │
│ QQuery 19    │  39.61 ms │            36.85 ms │ +1.07x faster │
│ QQuery 20    │  48.85 ms │            50.28 ms │     no change │
│ QQuery 21    │ 317.83 ms │           316.43 ms │     no change │
│ QQuery 22    │  17.22 ms │            17.77 ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1774.88ms │
│ Total Time (speedup_accumulate2)   │ 1746.44ms │
│ Average Time (HEAD)                │   80.68ms │
│ Average Time (speedup_accumulate2) │   79.38ms │
│ Queries Faster                     │         2 │
│ Queries Slower                     │         2 │
│ Queries with No Change             │        18 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

@Dandandan Dandandan marked this pull request as ready for review January 3, 2026 16:50
@Dandandan
Contributor Author

│ QQuery 1 │ 118.98 ms │ 102.93 ms │ +1.16x faster │

Looks like it is a nice win.

@Dandandan Dandandan requested a review from alamb January 3, 2026 17:23
@Dandandan
Contributor Author

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (2e70075) to 70daf88 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@Dandandan Dandandan changed the title Speedup accumulators Optimize Nullstate / accumulators Jan 3, 2026
@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and speedup_accumulate2
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2364.93 ms │          2305.51 ms │     no change │
│ QQuery 1     │   910.69 ms │           924.96 ms │     no change │
│ QQuery 2     │  1939.92 ms │          1863.22 ms │     no change │
│ QQuery 3     │  1215.90 ms │          1099.86 ms │ +1.11x faster │
│ QQuery 4     │  2275.35 ms │          2224.26 ms │     no change │
│ QQuery 5     │ 28223.77 ms │         28200.15 ms │     no change │
│ QQuery 6     │  4000.90 ms │          3956.86 ms │     no change │
│ QQuery 7     │  3448.92 ms │          3338.27 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 44380.39ms │
│ Total Time (speedup_accumulate2)   │ 43913.10ms │
│ Average Time (HEAD)                │  5547.55ms │
│ Average Time (speedup_accumulate2) │  5489.14ms │
│ Queries Faster                     │          1 │
│ Queries Slower                     │          0 │
│ Queries with No Change             │          7 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     1.44 ms │             1.45 ms │     no change │
│ QQuery 1     │    49.84 ms │            49.41 ms │     no change │
│ QQuery 2     │   131.95 ms │           135.29 ms │     no change │
│ QQuery 3     │   156.93 ms │           148.66 ms │ +1.06x faster │
│ QQuery 4     │  1067.52 ms │          1035.32 ms │     no change │
│ QQuery 5     │  1345.19 ms │          1345.68 ms │     no change │
│ QQuery 6     │     1.44 ms │             1.42 ms │     no change │
│ QQuery 7     │    53.58 ms │            55.38 ms │     no change │
│ QQuery 8     │  1466.22 ms │          1411.60 ms │     no change │
│ QQuery 9     │  1851.63 ms │          1798.48 ms │     no change │
│ QQuery 10    │   342.01 ms │           349.47 ms │     no change │
│ QQuery 11    │   395.10 ms │           408.47 ms │     no change │
│ QQuery 12    │  1276.70 ms │          1248.48 ms │     no change │
│ QQuery 13    │  1967.30 ms │          1965.99 ms │     no change │
│ QQuery 14    │  1263.45 ms │          1230.43 ms │     no change │
│ QQuery 15    │  1257.20 ms │          1211.84 ms │     no change │
│ QQuery 16    │  2604.95 ms │          2512.42 ms │     no change │
│ QQuery 17    │  2535.78 ms │          2516.91 ms │     no change │
│ QQuery 18    │  5090.06 ms │          4786.19 ms │ +1.06x faster │
│ QQuery 19    │   116.90 ms │           119.93 ms │     no change │
│ QQuery 20    │  1880.12 ms │          1822.01 ms │     no change │
│ QQuery 21    │  2195.23 ms │          2119.26 ms │     no change │
│ QQuery 22    │  3812.42 ms │          3667.21 ms │     no change │
│ QQuery 23    │ 14735.79 ms │         12032.40 ms │ +1.22x faster │
│ QQuery 24    │   215.78 ms │           203.87 ms │ +1.06x faster │
│ QQuery 25    │   469.34 ms │           458.36 ms │     no change │
│ QQuery 26    │   225.23 ms │           210.02 ms │ +1.07x faster │
│ QQuery 27    │  2773.05 ms │          2650.96 ms │     no change │
│ QQuery 28    │ 23602.84 ms │         24485.25 ms │     no change │
│ QQuery 29    │   943.91 ms │           975.03 ms │     no change │
│ QQuery 30    │  1352.11 ms │          1228.35 ms │ +1.10x faster │
│ QQuery 31    │  1384.11 ms │          1275.69 ms │ +1.08x faster │
│ QQuery 32    │  4762.69 ms │          4470.40 ms │ +1.07x faster │
│ QQuery 33    │  5680.36 ms │          5159.97 ms │ +1.10x faster │
│ QQuery 34    │  5672.22 ms │          5672.36 ms │     no change │
│ QQuery 35    │  1944.93 ms │          1911.46 ms │     no change │
│ QQuery 36    │    66.09 ms │            67.65 ms │     no change │
│ QQuery 37    │    44.41 ms │            42.84 ms │     no change │
│ QQuery 38    │    66.74 ms │            65.53 ms │     no change │
│ QQuery 39    │    99.97 ms │           104.32 ms │     no change │
│ QQuery 40    │    25.64 ms │            26.62 ms │     no change │
│ QQuery 41    │    22.72 ms │            23.88 ms │  1.05x slower │
│ QQuery 42    │    18.73 ms │            19.52 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 94969.61ms │
│ Total Time (speedup_accumulate2)   │ 91025.77ms │
│ Average Time (HEAD)                │  2208.60ms │
│ Average Time (speedup_accumulate2) │  2116.88ms │
│ Queries Faster                     │          9 │
│ Queries Slower                     │          1 │
│ Queries with No Change             │         33 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 119.02 ms │           104.49 ms │ +1.14x faster │
│ QQuery 2     │  29.62 ms │            27.01 ms │ +1.10x faster │
│ QQuery 3     │  36.09 ms │            32.81 ms │ +1.10x faster │
│ QQuery 4     │  29.36 ms │            29.39 ms │     no change │
│ QQuery 5     │  87.85 ms │            86.50 ms │     no change │
│ QQuery 6     │  19.97 ms │            19.63 ms │     no change │
│ QQuery 7     │ 219.06 ms │           225.74 ms │     no change │
│ QQuery 8     │  33.16 ms │            33.57 ms │     no change │
│ QQuery 9     │ 102.39 ms │           101.95 ms │     no change │
│ QQuery 10    │  61.38 ms │            61.75 ms │     no change │
│ QQuery 11    │  17.81 ms │            16.50 ms │ +1.08x faster │
│ QQuery 12    │  51.31 ms │            50.94 ms │     no change │
│ QQuery 13    │  46.73 ms │            48.50 ms │     no change │
│ QQuery 14    │  13.41 ms │            13.24 ms │     no change │
│ QQuery 15    │  24.09 ms │            24.25 ms │     no change │
│ QQuery 16    │  24.32 ms │            23.57 ms │     no change │
│ QQuery 17    │ 147.16 ms │           141.32 ms │     no change │
│ QQuery 18    │ 273.15 ms │           275.10 ms │     no change │
│ QQuery 19    │  36.85 ms │            37.84 ms │     no change │
│ QQuery 20    │  50.20 ms │            48.36 ms │     no change │
│ QQuery 21    │ 287.40 ms │           300.98 ms │     no change │
│ QQuery 22    │  17.32 ms │            17.12 ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1727.63ms │
│ Total Time (speedup_accumulate2)   │ 1720.53ms │
│ Average Time (HEAD)                │   78.53ms │
│ Average Time (speedup_accumulate2) │   78.21ms │
│ Queries Faster                     │         4 │
│ Queries Slower                     │         0 │
│ Queries with No Change             │        18 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

@Dandandan
Contributor Author

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (fb249d6) to 70daf88 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

Benchmark script failed with exit code 101.

Last 10 lines of output:

    |            ^^^
help: consider using `Option::expect` to unwrap the `Option<Option<NullBuffer>>` value, panicking if the value is an `Option::None`
    |
880 |         let sums = PrimitiveArray::<T>::new(sums.into(), nulls.expect("REASON")) // zero copy
    |                                                               +++++++++++++++++

Some errors have detailed explanations: E0308, E0599, E0624.
For more information about an error, try `rustc --explain E0308`.
error: could not compile `datafusion-functions-aggregate` (lib) due to 7 previous errors
warning: build failed, waiting for other jobs to finish...

@Dandandan
Contributor Author

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (bdeda6a) to 70daf88 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and speedup_accumulate2
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2373.27 ms │          2395.72 ms │     no change │
│ QQuery 1     │   942.08 ms │           911.66 ms │     no change │
│ QQuery 2     │  1854.86 ms │          1869.03 ms │     no change │
│ QQuery 3     │  1209.39 ms │          1088.92 ms │ +1.11x faster │
│ QQuery 4     │  2263.96 ms │          2234.55 ms │     no change │
│ QQuery 5     │ 28186.85 ms │         28100.66 ms │     no change │
│ QQuery 6     │  4005.89 ms │          3956.68 ms │     no change │
│ QQuery 7     │  3588.24 ms │          2855.47 ms │ +1.26x faster │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 44424.54ms │
│ Total Time (speedup_accumulate2)   │ 43412.69ms │
│ Average Time (HEAD)                │  5553.07ms │
│ Average Time (speedup_accumulate2) │  5426.59ms │
│ Queries Faster                     │          2 │
│ Queries Slower                     │          0 │
│ Queries with No Change             │          6 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     1.41 ms │             1.46 ms │     no change │
│ QQuery 1     │    48.45 ms │            50.70 ms │     no change │
│ QQuery 2     │   134.17 ms │           133.22 ms │     no change │
│ QQuery 3     │   154.16 ms │           153.01 ms │     no change │
│ QQuery 4     │  1054.42 ms │          1114.63 ms │  1.06x slower │
│ QQuery 5     │  1385.25 ms │          1384.91 ms │     no change │
│ QQuery 6     │     1.44 ms │             1.42 ms │     no change │
│ QQuery 7     │    53.88 ms │            55.16 ms │     no change │
│ QQuery 8     │  1429.95 ms │          1522.33 ms │  1.06x slower │
│ QQuery 9     │  1866.99 ms │          1877.97 ms │     no change │
│ QQuery 10    │   345.56 ms │           343.67 ms │     no change │
│ QQuery 11    │   395.16 ms │           391.14 ms │     no change │
│ QQuery 12    │  1269.03 ms │          1310.46 ms │     no change │
│ QQuery 13    │  1948.75 ms │          1965.13 ms │     no change │
│ QQuery 14    │  1237.28 ms │          1297.19 ms │     no change │
│ QQuery 15    │  1237.17 ms │          1297.92 ms │     no change │
│ QQuery 16    │  2577.77 ms │          2590.83 ms │     no change │
│ QQuery 17    │  2535.10 ms │          2569.87 ms │     no change │
│ QQuery 18    │  5506.90 ms │          4897.52 ms │ +1.12x faster │
│ QQuery 19    │   119.36 ms │           123.21 ms │     no change │
│ QQuery 20    │  1900.62 ms │          1844.48 ms │     no change │
│ QQuery 21    │  2215.21 ms │          2152.16 ms │     no change │
│ QQuery 22    │  3830.99 ms │          3677.16 ms │     no change │
│ QQuery 23    │ 12366.74 ms │         12186.44 ms │     no change │
│ QQuery 24    │   212.29 ms │           211.25 ms │     no change │
│ QQuery 25    │   475.00 ms │           456.82 ms │     no change │
│ QQuery 26    │   220.23 ms │           213.57 ms │     no change │
│ QQuery 27    │  2769.03 ms │          2654.80 ms │     no change │
│ QQuery 28    │ 23318.28 ms │         24396.04 ms │     no change │
│ QQuery 29    │   943.58 ms │           973.65 ms │     no change │
│ QQuery 30    │  1318.27 ms │          1278.99 ms │     no change │
│ QQuery 31    │  1344.91 ms │          1306.33 ms │     no change │
│ QQuery 32    │  5032.81 ms │          4454.45 ms │ +1.13x faster │
│ QQuery 33    │  5567.72 ms │          5389.87 ms │     no change │
│ QQuery 34    │  5638.89 ms │          6574.74 ms │  1.17x slower │
│ QQuery 35    │  1940.63 ms │          1933.81 ms │     no change │
│ QQuery 36    │    64.13 ms │            63.61 ms │     no change │
│ QQuery 37    │    44.81 ms │            43.48 ms │     no change │
│ QQuery 38    │    66.09 ms │            64.95 ms │     no change │
│ QQuery 39    │   102.02 ms │           100.71 ms │     no change │
│ QQuery 40    │    26.86 ms │            27.24 ms │     no change │
│ QQuery 41    │    22.40 ms │            22.13 ms │     no change │
│ QQuery 42    │    19.60 ms │            19.54 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 92743.32ms │
│ Total Time (speedup_accumulate2)   │ 93127.94ms │
│ Average Time (HEAD)                │  2156.82ms │
│ Average Time (speedup_accumulate2) │  2165.77ms │
│ Queries Faster                     │          2 │
│ Queries Slower                     │          3 │
│ Queries with No Change             │         38 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 119.22 ms │           103.20 ms │ +1.16x faster │
│ QQuery 2     │  28.93 ms │            29.34 ms │     no change │
│ QQuery 3     │  37.58 ms │            37.68 ms │     no change │
│ QQuery 4     │  28.39 ms │            28.58 ms │     no change │
│ QQuery 5     │  86.40 ms │            85.37 ms │     no change │
│ QQuery 6     │  19.86 ms │            20.07 ms │     no change │
│ QQuery 7     │ 215.03 ms │           223.09 ms │     no change │
│ QQuery 8     │  31.60 ms │            31.79 ms │     no change │
│ QQuery 9     │  96.89 ms │            99.16 ms │     no change │
│ QQuery 10    │  62.40 ms │            63.81 ms │     no change │
│ QQuery 11    │  17.38 ms │            17.70 ms │     no change │
│ QQuery 12    │  48.98 ms │            49.68 ms │     no change │
│ QQuery 13    │  46.67 ms │            47.67 ms │     no change │
│ QQuery 14    │  13.34 ms │            13.30 ms │     no change │
│ QQuery 15    │  24.20 ms │            23.98 ms │     no change │
│ QQuery 16    │  23.64 ms │            24.36 ms │     no change │
│ QQuery 17    │ 148.03 ms │           139.74 ms │ +1.06x faster │
│ QQuery 18    │ 278.92 ms │           267.83 ms │     no change │
│ QQuery 19    │  36.96 ms │            37.55 ms │     no change │
│ QQuery 20    │  49.38 ms │            49.45 ms │     no change │
│ QQuery 21    │ 292.16 ms │           309.29 ms │  1.06x slower │
│ QQuery 22    │  17.03 ms │            17.17 ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1722.99ms │
│ Total Time (speedup_accumulate2)   │ 1719.81ms │
│ Average Time (HEAD)                │   78.32ms │
│ Average Time (speedup_accumulate2) │   78.17ms │
│ Queries Faster                     │         2 │
│ Queries Slower                     │         1 │
│ Queries with No Change             │        19 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

@Dandandan
Contributor Author

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (05414e8) to 70daf88 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@Dandandan Dandandan force-pushed the speedup_accumulate2 branch from fb8f6ac to 05414e8 Compare January 4, 2026 12:53
Contributor

@alamb alamb left a comment

this is looking pretty cool

/// If `seen_values[i]` is false, have not seen any values that
/// pass the filter yet for group `i`
seen_values: BooleanBufferBuilder,
/// If true, all groups seen so far have seen at least one non-null value
Contributor

Maybe we could encode this as a state enum so it is clearer how things are related to BooleanBufferBuilder

Something like

enum SeenValues {
    /// All groups seen so far have seen at least one non-null value
    All {
        num_values: usize,
    },
    /// Some groups have not yet seen a non-null value
    Some {
        values: BooleanBufferBuilder,
    },
}
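
A minimal sketch of how such a state enum might behave, assuming `Vec<bool>` as a stand-in for `BooleanBufferBuilder` so the example needs no dependencies. The method names `record_non_null_batch` and `record_batch_with_nulls` are hypothetical and are not taken from the PR. It also assumes every newly created group appears in the batch that creates it.

```rust
/// Tracks which groups have seen at least one non-null value.
/// `Vec<bool>` stands in for arrow's `BooleanBufferBuilder` (illustration only).
#[derive(Debug, PartialEq)]
enum SeenValues {
    /// All groups seen so far have seen at least one non-null value
    All { num_values: usize },
    /// Some groups have not yet seen a non-null value
    Some { values: Vec<bool> },
}

impl SeenValues {
    /// Hypothetical fast path: the batch has no nulls and no filter, so every
    /// group it touches has now seen a value.
    fn record_non_null_batch(&mut self, group_indices: &[usize], total_num_groups: usize) {
        match self {
            // No bitmap to maintain: only the group count needs updating.
            SeenValues::All { num_values } => *num_values = total_num_groups,
            SeenValues::Some { values } => {
                values.resize(total_num_groups, false);
                for &g in group_indices {
                    values[g] = true;
                }
            }
        }
    }

    /// Hypothetical slow path: the batch may contain nulls, so switch to an
    /// explicit bitmap the first time this is called.
    fn record_batch_with_nulls(
        &mut self,
        group_indices: &[usize],
        valid: &[bool],
        total_num_groups: usize,
    ) {
        // Materialize the bitmap: every group counted so far was seen.
        if let SeenValues::All { num_values } = *self {
            *self = SeenValues::Some { values: vec![true; num_values] };
        }
        if let SeenValues::Some { values } = self {
            values.resize(total_num_groups, false);
            for (&g, &is_valid) in group_indices.iter().zip(valid) {
                if is_valid {
                    values[g] = true;
                }
            }
        }
    }
}
```

The point of the `All` variant is that the common all-non-null case never allocates or touches a bitmap; the `Some` variant is only created once a batch with nulls arrives.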

Contributor Author

done

@alamb alamb added the performance Make DataFusion faster label Jan 5, 2026
@Dandandan
Contributor Author

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (b00e8d8) to 418f62a diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and speedup_accumulate2
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2333.39 ms │          2299.97 ms │     no change │
│ QQuery 1     │   933.95 ms │           953.47 ms │     no change │
│ QQuery 2     │  1893.08 ms │          1909.69 ms │     no change │
│ QQuery 3     │  1139.36 ms │          1116.75 ms │     no change │
│ QQuery 4     │  2264.83 ms │          2182.98 ms │     no change │
│ QQuery 5     │ 28452.43 ms │         28173.69 ms │     no change │
│ QQuery 6     │  3814.54 ms │          3798.49 ms │     no change │
│ QQuery 7     │  3442.31 ms │          2723.35 ms │ +1.26x faster │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 44273.89ms │
│ Total Time (speedup_accumulate2)   │ 43158.38ms │
│ Average Time (HEAD)                │  5534.24ms │
│ Average Time (speedup_accumulate2) │  5394.80ms │
│ Queries Faster                     │          1 │
│ Queries Slower                     │          0 │
│ Queries with No Change             │          7 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     1.44 ms │             1.45 ms │     no change │
│ QQuery 1     │    51.06 ms │            48.54 ms │     no change │
│ QQuery 2     │   133.44 ms │           134.12 ms │     no change │
│ QQuery 3     │   156.66 ms │           151.31 ms │     no change │
│ QQuery 4     │  1082.03 ms │          1064.49 ms │     no change │
│ QQuery 5     │  1390.57 ms │          1335.63 ms │     no change │
│ QQuery 6     │     1.44 ms │             1.42 ms │     no change │
│ QQuery 7     │    54.85 ms │            54.06 ms │     no change │
│ QQuery 8     │  1475.15 ms │          1406.55 ms │     no change │
│ QQuery 9     │  1910.86 ms │          1725.53 ms │ +1.11x faster │
│ QQuery 10    │   360.14 ms │           352.51 ms │     no change │
│ QQuery 11    │   415.98 ms │           430.31 ms │     no change │
│ QQuery 12    │  1297.67 ms │          1271.24 ms │     no change │
│ QQuery 13    │  1978.48 ms │          1947.09 ms │     no change │
│ QQuery 14    │  1253.35 ms │          1218.98 ms │     no change │
│ QQuery 15    │  1232.75 ms │          1195.40 ms │     no change │
│ QQuery 16    │  2581.97 ms │          2488.56 ms │     no change │
│ QQuery 17    │  2536.26 ms │          2468.55 ms │     no change │
│ QQuery 18    │  5080.79 ms │          4745.32 ms │ +1.07x faster │
│ QQuery 19    │   121.82 ms │           125.91 ms │     no change │
│ QQuery 20    │  1938.69 ms │          1890.44 ms │     no change │
│ QQuery 21    │  2214.73 ms │          2166.65 ms │     no change │
│ QQuery 22    │  3791.39 ms │          3663.18 ms │     no change │
│ QQuery 23    │ 12415.40 ms │         11987.61 ms │     no change │
│ QQuery 24    │   209.02 ms │           202.67 ms │     no change │
│ QQuery 25    │   460.76 ms │           453.63 ms │     no change │
│ QQuery 26    │   202.12 ms │           219.90 ms │  1.09x slower │
│ QQuery 27    │  2719.52 ms │          2641.26 ms │     no change │
│ QQuery 28    │ 24372.90 ms │         24377.82 ms │     no change │
│ QQuery 29    │   966.87 ms │           947.41 ms │     no change │
│ QQuery 30    │  1301.18 ms │          1240.14 ms │     no change │
│ QQuery 31    │  1334.30 ms │          1317.26 ms │     no change │
│ QQuery 32    │  5120.67 ms │          4219.10 ms │ +1.21x faster │
│ QQuery 33    │  5327.77 ms │          5352.88 ms │     no change │
│ QQuery 34    │  5514.46 ms │          5419.84 ms │     no change │
│ QQuery 35    │  1925.83 ms │          1846.22 ms │     no change │
│ QQuery 36    │    67.48 ms │            65.00 ms │     no change │
│ QQuery 37    │    44.04 ms │            43.78 ms │     no change │
│ QQuery 38    │    64.92 ms │            63.52 ms │     no change │
│ QQuery 39    │   100.91 ms │            99.93 ms │     no change │
│ QQuery 40    │    25.68 ms │            25.42 ms │     no change │
│ QQuery 41    │    22.87 ms │            21.67 ms │ +1.06x faster │
│ QQuery 42    │    19.02 ms │            19.32 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 93277.26ms │
│ Total Time (speedup_accumulate2)   │ 90451.62ms │
│ Average Time (HEAD)                │  2169.24ms │
│ Average Time (speedup_accumulate2) │  2103.53ms │
│ Queries Faster                     │          4 │
│ Queries Slower                     │          1 │
│ Queries with No Change             │         38 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 110.65 ms │           102.41 ms │ +1.08x faster │
│ QQuery 2     │  29.51 ms │            29.30 ms │     no change │
│ QQuery 3     │  34.18 ms │            32.90 ms │     no change │
│ QQuery 4     │  30.15 ms │            28.56 ms │ +1.06x faster │
│ QQuery 5     │  87.63 ms │            87.93 ms │     no change │
│ QQuery 6     │  19.40 ms │            19.70 ms │     no change │
│ QQuery 7     │ 215.54 ms │           212.90 ms │     no change │
│ QQuery 8     │  34.75 ms │            33.94 ms │     no change │
│ QQuery 9     │ 103.62 ms │           105.30 ms │     no change │
│ QQuery 10    │  62.00 ms │            61.58 ms │     no change │
│ QQuery 11    │  18.55 ms │            17.91 ms │     no change │
│ QQuery 12    │  50.42 ms │            51.13 ms │     no change │
│ QQuery 13    │  49.82 ms │            46.67 ms │ +1.07x faster │
│ QQuery 14    │  13.77 ms │            14.20 ms │     no change │
│ QQuery 15    │  24.92 ms │            23.96 ms │     no change │
│ QQuery 16    │  25.00 ms │            24.12 ms │     no change │
│ QQuery 17    │ 155.74 ms │           143.97 ms │ +1.08x faster │
│ QQuery 18    │ 280.95 ms │           281.10 ms │     no change │
│ QQuery 19    │  36.78 ms │            38.61 ms │     no change │
│ QQuery 20    │  49.87 ms │            48.57 ms │     no change │
│ QQuery 21    │ 287.98 ms │           300.77 ms │     no change │
│ QQuery 22    │  17.58 ms │            18.66 ms │  1.06x slower │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1738.81ms │
│ Total Time (speedup_accumulate2)   │ 1724.19ms │
│ Average Time (HEAD)                │   79.04ms │
│ Average Time (speedup_accumulate2) │   78.37ms │
│ Queries Faster                     │         4 │
│ Queries Slower                     │         1 │
│ Queries with No Change             │        17 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘
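
For reading the Change column: the labels in these tables are consistent with dividing the slower time by the faster one and reporting "no change" when that ratio is below roughly 1.05. The threshold is inferred from the posted rows, not taken from the benchmark script, and `classify` is a hypothetical helper written only to illustrate the arithmetic:

```rust
/// Hypothetical helper: label a timing pair the way the tables above
/// appear to. The 1.05 cutoff is an assumption inferred from the rows.
fn classify(head_ms: f64, branch_ms: f64) -> String {
    const THRESHOLD: f64 = 1.05; // assumed, not read from the script
    if head_ms / branch_ms >= THRESHOLD {
        // Branch is faster: report HEAD time divided by branch time.
        format!("+{:.2}x faster", head_ms / branch_ms)
    } else if branch_ms / head_ms >= THRESHOLD {
        // Branch is slower: report branch time divided by HEAD time.
        format!("{:.2}x slower", branch_ms / head_ms)
    } else {
        "no change".to_string()
    }
}
```

Under this reading, differences of a few percent in either direction are treated as noise, which matters when comparing the repeated tpch_mem runs in this thread.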

@Dandandan
Contributor Author

run benchmark aggregate

@alamb-ghbot

🤖 Hi @Dandandan, thanks for the request (#19625 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, external_aggr, tpcds, tpch, tpch10, tpch_mem, tpch_mem10
  • Criterion: aggregate_query_sql, aggregate_vectorized, case_when, character_length, in_list, range_and_generate_series, sort, sql_planner, strpos, substr_index, with_hashes

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...
Unsupported benchmarks: aggregate.

@Dandandan
Contributor Author

run benchmark aggregate_query_sql aggregate_vectorized

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (b00e8d8) to 418f62a diff
BENCH_NAME=aggregate_query_sql
BENCH_COMMAND=cargo bench --features=parquet --bench aggregate_query_sql
BENCH_FILTER=
BENCH_BRANCH_NAME=speedup_accumulate2
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                         main                                   speedup_accumulate2
-----                                                                         ----                                   -------------------
aggregate_query_approx_percentile_cont_on_f32                                 1.00      3.8±0.14ms        ? ?/sec    1.00      3.8±0.20ms        ? ?/sec
aggregate_query_approx_percentile_cont_on_u64                                 1.00      4.1±0.22ms        ? ?/sec    1.00      4.1±0.21ms        ? ?/sec
aggregate_query_distinct_median                                               1.00      2.7±0.03ms        ? ?/sec    1.02      2.8±0.07ms        ? ?/sec
aggregate_query_group_by                                                      1.00  1551.7±43.47µs        ? ?/sec    1.01  1569.1±52.27µs        ? ?/sec
aggregate_query_group_by_u64 15 12                                            1.00  1442.8±34.13µs        ? ?/sec    1.00  1441.9±42.66µs        ? ?/sec
aggregate_query_group_by_u64_multiple_keys                                    1.01      4.0±0.26ms        ? ?/sec    1.00      3.9±0.25ms        ? ?/sec
aggregate_query_group_by_wide_u64_and_f32_without_aggregate_expressions       1.03      2.1±0.12ms        ? ?/sec    1.00      2.1±0.04ms        ? ?/sec
aggregate_query_group_by_wide_u64_and_string_without_aggregate_expressions    1.01      2.6±0.15ms        ? ?/sec    1.00      2.5±0.16ms        ? ?/sec
aggregate_query_group_by_with_filter                                          1.01  1432.9±51.32µs        ? ?/sec    1.00  1415.1±17.28µs        ? ?/sec
aggregate_query_group_by_with_filter_u64 15 12                                1.00  1389.9±21.06µs        ? ?/sec    1.00  1390.8±27.45µs        ? ?/sec
aggregate_query_no_group_by 15 12                                             1.00   735.3±13.90µs        ? ?/sec    1.00   737.8±12.39µs        ? ?/sec
aggregate_query_no_group_by_count_distinct_narrow                             1.02  1178.6±46.94µs        ? ?/sec    1.00  1154.4±20.87µs        ? ?/sec
aggregate_query_no_group_by_count_distinct_wide                               1.00  1940.4±75.94µs        ? ?/sec    1.00  1936.6±87.09µs        ? ?/sec
aggregate_query_no_group_by_min_max_f64                                       1.00    682.0±9.23µs        ? ?/sec    1.00   678.6±12.65µs        ? ?/sec
first_last_ignore_nulls                                                       1.01      2.2±0.10ms        ? ?/sec    1.00      2.2±0.10ms        ? ?/sec
first_last_many_columns                                                       1.01      2.2±0.11ms        ? ?/sec    1.00      2.2±0.10ms        ? ?/sec
first_last_one_column                                                         1.03  1851.9±82.09µs        ? ?/sec    1.00  1803.9±41.36µs        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (f646fe8) to 418f62a diff
BENCH_NAME=aggregate_vectorized
BENCH_COMMAND=cargo bench --features=parquet --bench aggregate_vectorized
BENCH_FILTER=
BENCH_BRANCH_NAME=speedup_accumulate2
Results will be posted here when complete

@alamb-ghbot

Benchmark script failed with exit code 101.

Last 10 lines of output:

++ BENCH_BRANCH_NAME=speedup_accumulate2
++ rm -f /tmp/comment.txt
++ cat
+++ uname -a
++ gh pr comment -F /tmp/comment.txt https://github.com/apache/datafusion/pull/19625
https://github.com/apache/datafusion/pull/19625#issuecomment-3720299746
++ rm -rf target/criterion/
++ cargo bench --features=parquet --bench aggregate_vectorized -- --save-baseline speedup_accumulate2
error: target `aggregate_vectorized` in package `datafusion-physical-plan` requires the features: `test_utils`
Consider enabling them by passing, e.g., `--features="test_utils"`

@Dandandan requested a review from alamb on January 8, 2026 09:41
@Dandandan
Contributor Author

run benchmark tpch_mem

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (47c46e7) to 418f62a diff using: tpch_mem
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and speedup_accumulate2
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 115.45 ms │           101.17 ms │ +1.14x faster │
│ QQuery 2     │  29.23 ms │            29.10 ms │     no change │
│ QQuery 3     │  34.26 ms │            31.47 ms │ +1.09x faster │
│ QQuery 4     │  28.96 ms │            29.15 ms │     no change │
│ QQuery 5     │  86.94 ms │            86.66 ms │     no change │
│ QQuery 6     │  19.73 ms │            19.61 ms │     no change │
│ QQuery 7     │ 234.34 ms │           220.45 ms │ +1.06x faster │
│ QQuery 8     │  32.59 ms │            34.38 ms │  1.05x slower │
│ QQuery 9     │ 103.88 ms │            89.02 ms │ +1.17x faster │
│ QQuery 10    │  61.48 ms │            62.48 ms │     no change │
│ QQuery 11    │  17.37 ms │            18.28 ms │  1.05x slower │
│ QQuery 12    │  49.02 ms │            49.53 ms │     no change │
│ QQuery 13    │  46.45 ms │            47.65 ms │     no change │
│ QQuery 14    │  13.33 ms │            13.56 ms │     no change │
│ QQuery 15    │  24.16 ms │            24.07 ms │     no change │
│ QQuery 16    │  24.34 ms │            23.88 ms │     no change │
│ QQuery 17    │ 151.75 ms │           144.72 ms │     no change │
│ QQuery 18    │ 277.30 ms │           281.26 ms │     no change │
│ QQuery 19    │  36.91 ms │            37.35 ms │     no change │
│ QQuery 20    │  51.42 ms │            61.27 ms │  1.19x slower │
│ QQuery 21    │ 317.58 ms │           321.96 ms │     no change │
│ QQuery 22    │  17.43 ms │            17.42 ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1773.93ms │
│ Total Time (speedup_accumulate2)   │ 1744.43ms │
│ Average Time (HEAD)                │   80.63ms │
│ Average Time (speedup_accumulate2) │   79.29ms │
│ Queries Faster                     │         4 │
│ Queries Slower                     │         3 │
│ Queries with No Change             │        15 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

Contributor

@alamb left a comment

Thank you @Dandandan -- this looks good to me. I have some comments on documentation and structure, but I don't think anything is needed prior to merge (we could do it in follow-on PRs or never)

// "not seen" valid)
let seen_values =
initialize_builder(&mut self.seen_values, total_num_groups, false);
if let SeenValues::All { num_values } = &mut self.seen_values
Contributor

One possibility is to make this another function to reduce some duplication and have a place where it could be explained

Maybe something like

if let Some(num_values) = self.all_values_mut() && opt_filter.is_none() && values.null_count() == 0 {
    ...
}

?

Though maybe an extra level of indirection would make it harder to follow.
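
A rough sketch of what such a helper and the fast path around it might look like. Everything here is an assumption for illustration: `NullStateSketch` is a hypothetical type, `Vec<bool>` and `&[Option<i64>]` stand in for the arrow buffer and array types, and nested `if` is used instead of a let-chain so the sketch compiles on older editions:

```rust
enum SeenValues {
    All { num_values: usize },
    Some { values: Vec<bool> },
}

struct NullStateSketch {
    seen_values: SeenValues,
}

impl NullStateSketch {
    /// Hypothetical helper: `Some(count)` while every group has seen a value.
    fn all_values_mut(&mut self) -> Option<&mut usize> {
        match &mut self.seen_values {
            SeenValues::All { num_values } => Some(num_values),
            SeenValues::Some { .. } => None,
        }
    }

    /// Calls `value_fn(group, value)` for each non-null, unfiltered row.
    /// Returns true when the null-free fast path was taken.
    fn accumulate(
        &mut self,
        group_indices: &[usize],
        values: &[Option<i64>],
        opt_filter: Option<&[bool]>,
        total_num_groups: usize,
        mut value_fn: impl FnMut(usize, i64),
    ) -> bool {
        let no_nulls = values.iter().all(|v| v.is_some());
        if opt_filter.is_none() && no_nulls {
            if let Some(num_values) = self.all_values_mut() {
                // Every row contributes, so every group has now seen a value.
                *num_values = total_num_groups;
                for (&group, value) in group_indices.iter().zip(values) {
                    if let Some(v) = value {
                        value_fn(group, *v);
                    }
                }
                return true;
            }
        }
        // Slow path: fall back to per-group tracking.
        if let SeenValues::All { num_values } = self.seen_values {
            self.seen_values = SeenValues::Some {
                values: vec![true; num_values],
            };
        }
        let SeenValues::Some { values: seen } = &mut self.seen_values else {
            unreachable!("converted to Some above")
        };
        seen.resize(total_num_groups, false);
        for (row, (&group, value)) in group_indices.iter().zip(values).enumerate() {
            let passes = opt_filter.map_or(true, |f| f[row]);
            if let (true, Some(v)) = (passes, value) {
                seen[group] = true;
                value_fn(group, *v);
            }
        }
        false
    }
}
```

The helper keeps the three fast-path conditions in one place, at the cost of the extra indirection mentioned above.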

// "not seen" valid)
let seen_values =
initialize_builder(&mut self.seen_values, total_num_groups, false);
if let SeenValues::All { num_values } = &mut self.seen_values
Contributor

Suggested change
if let SeenValues::All { num_values } = &mut self.seen_values
// skip null handling if no nulls in input or accumulator
if let SeenValues::All { num_values } = &mut self.seen_values

SeenValues::Some { .. } => {
let mut old_values = match std::mem::replace(
&mut self.seen_values,
SeenValues::All { num_values: 0 },
Contributor

It might be nicer if you implemented SeenValues::new() or maybe even SeenValues::default()

So this could look like

let mut old_values = match std::mem::take(&mut self.seen_values) { ... };
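
A small sketch of that idea, assuming `SeenValues::All { num_values: 0 }` is the natural default and using `Vec<bool>` in place of `BooleanBufferBuilder`. The `take_seen` function is a hypothetical name used only to show `std::mem::take` swapping the default in:

```rust
#[derive(Debug, PartialEq)]
enum SeenValues {
    All { num_values: usize },
    Some { values: Vec<bool> },
}

impl Default for SeenValues {
    /// Assumed default: no groups yet, so trivially all have seen a value.
    fn default() -> Self {
        SeenValues::All { num_values: 0 }
    }
}

/// Hypothetical helper: move the state out, leaving the default behind,
/// and return it as an explicit per-group vector.
fn take_seen(state: &mut SeenValues) -> Vec<bool> {
    match std::mem::take(state) {
        SeenValues::All { num_values } => vec![true; num_values],
        SeenValues::Some { values } => values,
    }
}
```

`std::mem::take` requires `Default`, which is why implementing `Default` (rather than only `new()`) is what enables the shorter call.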

total_num_groups,
|group_index, new_value| {
let value = &mut self.values[group_index];
// SAFETY: group_index is guaranteed to be in bounds
Contributor

I recommend adding safety notes to the docs of GroupsAccumulator in https://github.com/apache/datafusion/blob/36ec9f1de0aeabca60b8f7ebe07d650b8ef03506/datafusion/expr-common/src/groups_accumulator.rs#L114-L113

The notes would explain that all group indexes are guaranteed to be < total_num_groups, and that this guarantee can be relied on for safety
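
As an illustration of the kind of contract such a safety note would document, here is a minimal sketch of unchecked indexing guarded by that invariant. The function `add_to_groups` is hypothetical and not part of DataFusion; it assumes the accumulator state has exactly `total_num_groups` slots:

```rust
/// Adds each value to the slot of its group without bounds checks.
///
/// # Safety
/// Every entry of `group_indices` must be strictly less than `sums.len()`
/// (that is, less than `total_num_groups`).
unsafe fn add_to_groups(sums: &mut [i64], group_indices: &[usize], values: &[i64]) {
    assert_eq!(group_indices.len(), values.len());
    let total_num_groups = sums.len();
    for (&group_index, &value) in group_indices.iter().zip(values) {
        // Checked in debug builds only; release builds rely on the contract.
        debug_assert!(group_index < total_num_groups);
        // SAFETY: the caller guarantees group_index < total_num_groups.
        let slot = unsafe { sums.get_unchecked_mut(group_index) };
        *slot += value;
    }
}
```

The bound has to be strict: an index equal to `total_num_groups` would be one past the end of the state vector.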

@alamb
Contributor

alamb commented Jan 8, 2026

run benchmark tpch_mem

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_accumulate2 (47c46e7) to 418f62a diff using: tpch_mem
Results will be posted here when complete

@alamb
Contributor

alamb commented Jan 8, 2026

(will see if we can see a reproducible speedup)

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and speedup_accumulate2
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ speedup_accumulate2 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 112.48 ms │           101.49 ms │ +1.11x faster │
│ QQuery 2     │  28.91 ms │            27.08 ms │ +1.07x faster │
│ QQuery 3     │  34.88 ms │            33.83 ms │     no change │
│ QQuery 4     │  29.14 ms │            29.03 ms │     no change │
│ QQuery 5     │  86.70 ms │            86.09 ms │     no change │
│ QQuery 6     │  19.85 ms │            19.85 ms │     no change │
│ QQuery 7     │ 221.38 ms │           227.37 ms │     no change │
│ QQuery 8     │  32.84 ms │            35.45 ms │  1.08x slower │
│ QQuery 9     │  95.84 ms │           105.29 ms │  1.10x slower │
│ QQuery 10    │  62.15 ms │            62.78 ms │     no change │
│ QQuery 11    │  17.97 ms │            17.81 ms │     no change │
│ QQuery 12    │  53.11 ms │            51.65 ms │     no change │
│ QQuery 13    │  47.50 ms │            49.66 ms │     no change │
│ QQuery 14    │  13.63 ms │            13.36 ms │     no change │
│ QQuery 15    │  24.06 ms │            24.48 ms │     no change │
│ QQuery 16    │  23.99 ms │            24.11 ms │     no change │
│ QQuery 17    │ 148.75 ms │           143.62 ms │     no change │
│ QQuery 18    │ 278.74 ms │           276.17 ms │     no change │
│ QQuery 19    │  37.07 ms │            37.81 ms │     no change │
│ QQuery 20    │  49.95 ms │            48.42 ms │     no change │
│ QQuery 21    │ 320.43 ms │           322.26 ms │     no change │
│ QQuery 22    │  17.48 ms │            17.28 ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1756.84ms │
│ Total Time (speedup_accumulate2)   │ 1754.87ms │
│ Average Time (HEAD)                │   79.86ms │
│ Average Time (speedup_accumulate2) │   79.77ms │
│ Queries Faster                     │         2 │
│ Queries Slower                     │         2 │
│ Queries with No Change             │        18 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

Labels

functions (Changes to functions implementation)
performance (Make DataFusion faster)

Development

Successfully merging this pull request may close these issues.

Optimize NullState for non-null data

3 participants