Company or project name
No response
Describe the situation
Summary
join_runtime_filter query 10 regressed on ARM/aarch64 for client_time (single_step_reproduced).
Builds and environment
Measured on ARM/aarch64, Neoverse-V2 class CPU, 32 logical CPUs, about 123 GiB RAM. Hostname and private workspace paths are intentionally omitted.
Reproduction
Performance test: join_runtime_filter
Query index: 10
Metric: client_time
SQL:
SELECT avg(o_totalprice)
FROM orders
JOIN (SELECT * FROM customer JOIN nation ON c_nationkey = n_nationkey WHERE n_name = 'WAKANDA') AS cn
ON c_custkey = o_custkey
SETTINGS enable_join_runtime_filters=1
Datasets / inputs:
- Use an idle ARM/aarch64 host with similar CPU class if possible; the measurements below were taken on Neoverse-V2.
- Download the public ARM ClickHouse binaries listed in the build table for the baseline and affected revisions.
- Load the datasets/fixtures listed below using the normal ClickHouse performance-test data setup.
- Run the SQL above at least 101 times for each revision and compare median client_time.
- A valid reproduction should show the affected revision slower by approximately the measured shift, while same-revision reruns stay near zero shift.
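A minimal manual loop, after starting each revision as a local server, loading data, and setting QUERY to the SQL above, is shown here with 9 iterations; raise the count to at least 101 for a median comparable to the measurements: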
for i in $(seq 1 9); do
clickhouse-client --time --query "$QUERY"
done
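The loop above only prints timings. As a hedged sketch of how it could be extended to produce a median, the helpers below are hypothetical (`median` and `collect_times` are not part of the ClickHouse tooling). The sketch assumes `clickhouse-client --time` writes the elapsed seconds to stderr, one value per run; any client error text would arrive on the same stream and would need to be filtered out first.

```shell
# median: read one number per line on stdin and print the median.
median() {
  LC_ALL=C sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# collect_times QUERY [RUNS]: run the query RUNS times (default 101) and
# emit one elapsed-time value per line. Query results are discarded;
# only the stderr timing line is kept.
collect_times() {
  runs=${2:-101}
  for i in $(seq 1 "$runs"); do
    clickhouse-client --time --query "$1" 2>&1 >/dev/null
  done
}
```

With a server for one revision running, `collect_times "$QUERY" 101 | median` would print a single median client time for that revision; repeat against the other revision to compare.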
Measurements
| Comparison | Builds | Runs | Left median | Right median | Shift | Left range | Right range |
|---|---|---|---|---|---|---|---|
| before→after | baseline vs affected | 96 / 96 | 0.009844s | 0.011089s | +12.65% | 0.009132s–0.010607s | 0.010275s–0.011847s |
| before→latest | baseline vs latest | 97 / 97 | 0.009688s | 0.011050s | +14.06% | 0.009182s–0.010309s | 0.010445s–0.011608s |
| before→before | baseline vs baseline | 101 / 101 | 0.009979s | 0.009898s | -0.81% | 0.009134s–0.010670s | 0.009216s–0.011057s |
| after→after | affected vs affected | 91 / 91 | 0.011203s | 0.011019s | -1.65% | 0.010364s–0.012043s | 0.010048s–0.011487s |
| latest→latest | latest vs latest | 92 / 92 | 0.010774s | 0.011068s | +2.73% | 0.010256s–0.011607s | 0.010396s–0.011887s |
Stability checks: before→before -0.81%, after→after -1.65%, latest→latest +2.73%. Same-build comparisons are included above so reviewers can distinguish a regression from benchmark noise.
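To make the Shift column easier to re-derive, here is a small sketch that assumes the shift is the relative change of the right median against the left median, (right - left) / left. That formula is an assumption, though it matches the reported before→after and before→before rows. The function name `shift_pct` is hypothetical.

```shell
# shift_pct LEFT RIGHT: relative change of RIGHT against LEFT, in percent.
shift_pct() {
  awk -v left="$1" -v right="$2" \
    'BEGIN { printf "%+.2f%%\n", (right - left) / left * 100 }'
}

shift_pct 0.009844 0.011089   # before→after: prints +12.65%
shift_pct 0.009979 0.009898   # before→before noise check: prints -0.81%
```

Because the medians in the table are already rounded, recomputed shifts for other rows can differ from the reported value in the last digit.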
Approximate introduction window
The regression is bounded to the revision/time window below. This is a localization aid, not a final root-cause claim.
| Start revision | Start time | Start subject | End revision | End time | End subject | Width | Evidence |
|---|---|---|---|---|---|---|---|
| ad347db | 2026-05-08T10:34:54+00:00 | Merge pull request #103891 from clickgapai/qa-bot/coverage-pr60419 | c7d5efb | 2026-05-08T10:43:10+00:00 | Merge pull request #104136 from Algunenano/parts-metadata-arena | ~0.006 days | baseline_bound at ad347db (baseline); signal at c7d5efb (controlled_window_endpoint) |
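For anyone who wants to inspect the window locally, a minimal sketch using standard git commands follows. The helper name `window_summary` is hypothetical, and the sketch assumes a local ClickHouse checkout that contains both revisions; it is not the tooling that produced this report.

```shell
# window_summary REPO START END: list the first-parent commits between two
# revisions of a local git checkout, then print the diff shortstat.
window_summary() {
  git -C "$1" log --oneline --first-parent "$2..$3"
  git -C "$1" diff --shortstat "$2" "$3"
}
```

A call such as `window_summary /path/to/ClickHouse ad347db c7d5efb` (the path is a placeholder) should list the merges in the window and a shortstat to compare against the one reported in this issue.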
Code areas and mechanism clues
- Files changed in the bounded window: src/Common/AsynchronousMetrics.cpp, src/Common/CurrentMetrics.cpp, src/Common/Jemalloc.cpp, src/Common/Jemalloc.h, src/Common/JemallocMergeTreeArena.cpp, src/Common/JemallocMergeTreeArena.h, src/Storages/MergeTree/DataPartsExchange.cpp, src/Storages/MergeTree/IMergeTreeDataPart.cpp, src/Storages/MergeTree/MergeTask.cpp, src/Storages/MergeTree/MergeTreeData.cpp, src/Storages/MergeTree/MergeTreeDataPartBuilder.cpp, src/Storages/MergeTree/MutateTask.cpp.
- Shortstat for that window: 19 files changed, 436 insertions(+), 9 deletions(-).
- Production files worth checking first: src/Common/AsynchronousMetrics.cpp, src/Common/JemallocMergeTreeArena.h, src/Storages/MergeTree/IMergeTreeDataPart.cpp, src/Storages/MergeTree/MergeTreeData.cpp, src/Storages/MergeTree/registerStorageMergeTree.cpp, src/Common/JemallocMergeTreeArena.cpp, src/Storages/MergeTree/MergeTreeDataPartBuilder.cpp, src/Storages/MergeTree/DataPartsExchange.cpp.
- Static code/query review: Changed files did not strongly map to the benchmark query. Treat any code context as triage-only.
- Suspect area from static review: No direct code-path suspect identified by deterministic rules.

These are investigation leads only; the issue should not assign blame to a change without a validating patch or rollback measurement.

- Probe-level client time: baseline 0.057404s → affected 0.057851s (+0.78%).
- Server query duration: baseline 9 ms → affected 9 ms (+0.00%).
Largest captured ProfileEvents deltas:
| ProfileEvent | Baseline median | Affected median | Delta |
|---|---|---|---|
| LocalThreadPoolLockWaitMicroseconds | 18.0 | 9 | -50.00% |
| OSCPUWaitMicroseconds | 7 | 5 | -28.57% |
| QueryProfilerSignalOverruns | 2 | 1.5 | -25.00% |
| OSCPUVirtualTimeMicroseconds | 18,816 | 23,388 | +24.30% |
| GlobalThreadPoolLockWaitMicroseconds | 17.0 | 21.0 | +23.53% |
| NetworkSendElapsedMicroseconds | 165.0 | 134.0 | -18.79% |
| QueryProfilerRuns | 50.0 | 56.0 | +12.00% |
| FilteringMarksWithPrimaryKeyMicroseconds | 9 | 8 | -11.11% |
Largest captured processor elapsed-time deltas:
| Processor | Baseline µs | Affected µs | Delta |
|---|---|---|---|
| FillingRightJoinSide | 163.0 | 93.0 | -42.94% |
| SimpleSquashingTransform | 12.0 | 8 | -33.33% |
| LazyOutputFormat | 27.0 | 19.0 | -29.63% |
| ConvertingAggregatedToChunksTransform | 15.0 | 13.0 | -13.33% |
| ExpressionTransform | 11.0 | 12.0 | +9.09% |
| MergeTreeSelect(pool: ReadPoolInOrder, algorithm: InOrder) | 538.0 | 496.0 | -7.81% |
Fix or validation status
No validated fix is available yet.
Most useful next patch/revert targets: src/Common/AsynchronousMetrics.cpp, src/Common/JemallocMergeTreeArena.h, src/Storages/MergeTree/IMergeTreeDataPart.cpp, src/Storages/MergeTree/MergeTreeData.cpp, src/Storages/MergeTree/registerStorageMergeTree.cpp, src/Common/JemallocMergeTreeArena.cpp, src/Storages/MergeTree/MergeTreeDataPartBuilder.cpp, src/Storages/MergeTree/DataPartsExchange.cpp.
The current evidence narrows the problem and the candidate code areas, but no patch or rollback has yet been measured to confirm a fix.
Which ClickHouse versions are affected?
latest
Expected performance
No response
Related issues and pull requests
No response
Additional context
No response