Use max_insert_threads for plain INSERTs without materialized views by CheSema · Pull Request #102961 · ClickHouse/ClickHouse

CheSema · 2026-04-17T10:44:25Z

ConcurrencyControl has an issue (#102947) where CC slots are allocated eagerly at query start, and spawnThreads has a race where it spawns threads faster than they can register as idle, resulting in many unnecessary threads being created.

Until lazy CC slot allocation is implemented, we have no choice other than to set strict limits on the thread count for insert queries.

buildInsertPipeline was requesting max_threads CC slots for all INSERTs regardless of pipeline width. For plain INSERTs without MVs the pipeline is 1-wide, so this wasted CC capacity and spawned unnecessary threads.

Now we use max_insert_threads when no MVs are attached, and max_threads only when MVs are involved (their inner SELECTs benefit from full parallelism).

Mitigates #102947

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry

Plain INSERTs without materialized views no longer request excessive ConcurrencyControl slots and threads (max_threads instead of max_insert_threads), preventing CC slot starvation and thread count blowup on clusters with high INSERT throughput.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

buildInsertPipeline was requesting max_threads ConcurrencyControl slots for all INSERTs regardless of pipeline width. For plain INSERTs without MVs the pipeline is 1-wide, so this wasted CC capacity and spawned unnecessary threads (up to ~10 for max_threads=16). Now we use max_insert_threads when no MVs are attached, and max_threads only when MVs are involved (their inner SELECTs benefit from full parallelism). Also adds a trace log in PipelineExecutor::allocateCPU to make CC slot allocation visible, and a regression test covering both cases. Fixes: #102947

clickhouse-gh · 2026-04-17T10:44:59Z

Workflow [PR], commit [c4f5e7e]

Summary: ❌

job_name	test_name	status	info	comment
Upgrade check (amd_release)		FAIL
	Error message in clickhouse-server.log (see upgrade_error_messages.txt)	FAIL	cidb

AI Review

Summary

This PR changes InterpreterInsertQuery::buildInsertPipeline to use max_insert_threads for plain INSERT without dependent materialized views, and max_threads only when views are involved. The core C++ change is small and aligned with the reported concurrency-slot/thread over-allocation problem, but there are two merge-blocking review issues in metadata/test quality that should be addressed before approval.

Findings

⚠️ Majors

[tests/queries/0_stateless/04102_insert_threads_eagerly_spawned.sh:2,37-45,67-76] The test is over-constrained by no-replicated-database and no-async-insert, which narrows CI coverage. Also, system.query_log lookups do not pin is_initial_query = 1 and do not select the latest single row, making the assertion more fragile in mixed-profile runs.
- Suggested fix: remove those restrictive tags and make the query deterministic in-script (--async_insert=0 --wait_for_async_insert=1, plus is_initial_query = 1 and ORDER BY event_time_microseconds DESC LIMIT 1 in both checks).

💡 Nits

[PR description: Changelog entry] The changelog text inverts the settings names and describes the opposite behavior (max_threads vs max_insert_threads) from the code.
- Suggested fix: update the entry to state that plain INSERT without MVs now uses max_insert_threads, while inserts with MVs continue to use max_threads.

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	✅
No test removal	✅
Experimental gate	➖
No magic constants	✅
Backward compatibility	✅
`SettingsChangesHistory.cpp`	➖
PR metadata quality	❌	`Changelog entry` currently contradicts the implemented behavior
Safe rollout	✅
Compilation time	✅
No large/binary files	✅

Final Verdict

Status: ⚠️ Request changes
Minimum required actions:
1. Fix Changelog entry wording to match the actual setting switch (max_insert_threads for plain INSERT, max_threads when MVs are involved).
2. Make the new stateless test deterministic without profile-narrowing tags (or otherwise preserve equivalent CI profile coverage).

ConcurrencyControl has an issue (#102947) where CC slots are allocated eagerly at query start, and spawnThreads has a race where it spawns threads faster than they can register as idle, resulting in many unnecessary threads being created. Until lazy CC slot allocation is implemented, we have no choice other than to set strict limits on the thread count for insert queries. buildInsertPipeline was requesting max_threads CC slots for all INSERTs regardless of pipeline width. For plain INSERTs without MVs the pipeline is 1-wide, so this wasted CC capacity and spawned unnecessary threads. Now we use max_insert_threads when no MVs are attached, and max_threads only when MVs are involved (their inner SELECTs benefit from full parallelism). Fixes: #102947

Style check requires specifying the log table name explicitly.

…ized settings Pin `input_format_parallel_parsing=0` in INSERT commands so that parallel parsing threads do not inflate `peak_threads_usage` beyond the threshold — the test measures INSERT pipeline concurrency, not parser parallelism. Add `SETTINGS optimize_if_transform_strings_to_enum = 0` to diagnostic queries against `system.query_log` to prevent a column name mismatch when randomized settings enable both this optimization and `parallel_replicas_local_plan`. CI report: #102961 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Regression from ClickHouse#92141: server aborts with signal 6 during `loadDataPart` in debug and sanitizer builds when a part directory contains a leftover `txn_version.txt.tmp` file with no corresponding `txn_version.txt`. This state is produced by `VersionMetadataOnDisk::loadMetadata` case 2 (an incomplete write that was not atomically renamed). It returns a `VersionInfo` with `creation_tid = Tx::DummyTID` and `creation_csn = Tx::RolledBackCSN`. `Tx::DummyTID = {NonTransactionalCSN, DummyLocalTID, Nil} = {1, 2, Nil}` has `start_csn == NonTransactionalCSN` but `local_tid == DummyLocalTID` — which fails the invariant asserted in `TransactionID::isNonTransactional`. The new `VersionMetadata::validateInfo` (added by ClickHouse#92141) and `VersionInfo::wasInvolvedInTransaction` both call `isNonTransactional` on the loaded `creation_tid`, so the assertion aborts the server. Two changes: - `TransactionID::isNonTransactional`: widen the assertion to allow the `DummyTID` special case. The function still correctly returns `false` for `DummyTID` because `local_tid == DummyLocalTID != NonTransactionalLocalTID`. - `VersionMetadata::validateInfo`: early-return for rolled-back parts (`creation_csn == RolledBackCSN`). Such parts are marked `Outdated` immediately after loading by `MergeTreeData::loadDataPart` and cleaned up, so the live-part invariants validated here do not apply. CI reports showing the crash: - PR ClickHouse#103047: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=103047&sha=ec0124d77b94c97a4b1b48049b0215b1dddb94a5&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20distributed%20plan%2C%20s3%20storage%2C%20parallel%29 - PR ClickHouse#102961: same `Stateless tests (amd_debug, distributed plan, s3 storage, parallel)` check Regression test `04104_transaction_version_metadata_dummy_tid_load.sh` deterministically creates the scenario by dropping a bogus `txn_version.txt.tmp` into a part directory and running `DETACH` + `ATTACH`. Before the fix the server aborts with signal 6 at `TransactionID.h:90`; after the fix the rolled-back part is marked `Outdated` and invisible. Session: cron:clickhouse-ci-task-worker:20260418-221500

DatabaseReplicated: `query_log` lookup by `query_id` assumes single-node execution; replicated setup may miss entries or return duplicates from other replicas. AsyncInsert: injected `async_insert=1` queues the INSERT and returns immediately, so `peak_threads_usage` reflects only the queuing step, not the MV processing pipeline the test is designed to verify. CI report: #102961 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ParallelReplicas settings alter query execution plans and thread allocation, making `peak_threads_usage` checks unreliable. S3 I/O threads (multi-part uploads, cache connections) inflate `peak_threads_usage` beyond the pipeline thread count the test measures. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102961&sha=43b3cf5f2a16a3ddc9683efa36016d69558b0531&name_0=PR PR: #102961 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-04-20T15:10:21Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.00%	84.00%	+0.00%
Functions	91.10%	91.10%	+0.00%
Branches	76.50%	76.50%	+0.00%

Changed lines: 100.00% (4/4) · Uncovered code

Full report · Diff report

Cherry pick #102961 to 26.1: Use max_insert_threads for plain INSERTs without materialized views

…thout materialized views

Cherry pick #102961 to 26.2: Use max_insert_threads for plain INSERTs without materialized views

…thout materialized views

Cherry pick #102961 to 26.3: Use max_insert_threads for plain INSERTs without materialized views

…thout materialized views

Backport #102961 to 26.1: Use max_insert_threads for plain INSERTs without materialized views

Backport #102961 to 26.2: Use max_insert_threads for plain INSERTs without materialized views

Backport #102961 to 26.3: Use max_insert_threads for plain INSERTs without materialized views

Use max_insert_threads for plain INSERTs without materialized views

Backport #102961 to 25.12: Use max_insert_threads for plain INSERTs without materialized views

Backport #102961 to 25.10: Use max_insert_threads for plain INSERTs without materialized views

…thread-count Use max_insert_threads for plain INSERTs without materialized views

clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 17, 2026

clickhouse-gh Bot reviewed Apr 17, 2026

View reviewed changes

Comment thread src/Interpreters/InterpreterInsertQuery.cpp

CheSema added 2 commits April 17, 2026 13:03

fix: use SYSTEM FLUSH LOGS query_log in test

9e044fa

Style check requires specifying the log table name explicitly.

chmod +x

147b1f8

CheSema force-pushed the sema/fix-insert-thread-count branch from a6bb0a1 to c208281 Compare April 17, 2026 20:00

groeneai mentioned this pull request Apr 18, 2026

Fix assertion in TransactionID::isNonTransactional for DummyTID #103082

Open

1 task

clickhouse-gh Bot reviewed Apr 20, 2026

View reviewed changes

Comment thread tests/queries/0_stateless/04102_insert_threads_eagerly_spawned.sh Outdated

serxa approved these changes Apr 20, 2026

View reviewed changes

serxa self-assigned this Apr 20, 2026

CheSema added this pull request to the merge queue Apr 20, 2026

Merged via the queue into master with commit 3cda729 Apr 20, 2026
159 of 161 checks passed

CheSema deleted the sema/fix-insert-thread-count branch April 20, 2026 17:35

robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 20, 2026

CheSema added v25.10-must-backport v25.11-must-backport v25.12-must-backport v26.2-must-backport v26.3-must-backport labels Apr 21, 2026

robot-ch-test-poll1 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Apr 21, 2026

robot-ch-test-poll4 mentioned this pull request Apr 21, 2026

Cherry pick #102961 to 26.1: Use max_insert_threads for plain INSERTs without materialized views #103236

Merged

robot-ch-test-poll4 added a commit that referenced this pull request Apr 21, 2026

Merge pull request #103236 from ClickHouse/cherrypick/26.1/102961

e864a90

Cherry pick #102961 to 26.1: Use max_insert_threads for plain INSERTs without materialized views

robot-clickhouse added a commit that referenced this pull request Apr 21, 2026

Backport #102961 to 26.1: Use max_insert_threads for plain INSERTs wi…

712d1db

…thout materialized views

This was referenced Apr 21, 2026

Backport #102961 to 26.1: Use max_insert_threads for plain INSERTs without materialized views #103237

Merged

Cherry pick #102961 to 26.2: Use max_insert_threads for plain INSERTs without materialized views #103238

Merged

robot-ch-test-poll4 added a commit that referenced this pull request Apr 21, 2026

Merge pull request #103238 from ClickHouse/cherrypick/26.2/102961

67a8866

Cherry pick #102961 to 26.2: Use max_insert_threads for plain INSERTs without materialized views

robot-clickhouse added a commit that referenced this pull request Apr 21, 2026

Backport #102961 to 26.2: Use max_insert_threads for plain INSERTs wi…

ff6a76e

…thout materialized views

This was referenced Apr 21, 2026

Backport #102961 to 26.2: Use max_insert_threads for plain INSERTs without materialized views #103239

Merged

Cherry pick #102961 to 26.3: Use max_insert_threads for plain INSERTs without materialized views #103240

Merged

robot-ch-test-poll4 added a commit that referenced this pull request Apr 21, 2026

Merge pull request #103240 from ClickHouse/cherrypick/26.3/102961

451492f

Cherry pick #102961 to 26.3: Use max_insert_threads for plain INSERTs without materialized views

robot-clickhouse added a commit that referenced this pull request Apr 21, 2026

Backport #102961 to 26.3: Use max_insert_threads for plain INSERTs wi…

d4a5262

…thout materialized views

robot-ch-test-poll4 mentioned this pull request Apr 21, 2026

Backport #102961 to 26.3: Use max_insert_threads for plain INSERTs without materialized views #103242

Merged

robot-ch-test-poll1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Apr 21, 2026

clickhouse-gh Bot added a commit that referenced this pull request Apr 21, 2026

Merge pull request #103237 from ClickHouse/backport/26.1/102961

4c6437e

Backport #102961 to 26.1: Use max_insert_threads for plain INSERTs without materialized views

clickhouse-gh Bot added a commit that referenced this pull request Apr 21, 2026

Merge pull request #103239 from ClickHouse/backport/26.2/102961

936f0ba

Backport #102961 to 26.2: Use max_insert_threads for plain INSERTs without materialized views

CheSema added a commit that referenced this pull request Apr 22, 2026

Merge pull request #103242 from ClickHouse/backport/26.3/102961

015407a

Backport #102961 to 26.3: Use max_insert_threads for plain INSERTs without materialized views

CheSema removed pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Apr 30, 2026

This was referenced Apr 30, 2026

Backport #102961 to 25.10: Use max_insert_threads for plain INSERTs without materialized views #103748

Merged

Backport #102961 to 25.12: Use max_insert_threads for plain INSERTs without materialized views #103749

Merged

robot-ch-test-poll1 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Apr 30, 2026

robot-ch-test-poll added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Apr 30, 2026

clickhouse-gh Bot pushed a commit that referenced this pull request Apr 30, 2026

Merge pull request #102961 from ClickHouse/sema/fix-insert-thread-count

9d323eb

Use max_insert_threads for plain INSERTs without materialized views

clickhouse-gh Bot added a commit that referenced this pull request Apr 30, 2026

Merge pull request #103749 from ClickHouse/backport/25.12/102961

7852f16

Backport #102961 to 25.12: Use max_insert_threads for plain INSERTs without materialized views

CheSema added a commit that referenced this pull request Apr 30, 2026

Merge pull request #103748 from ClickHouse/backport/25.10/102961

16b240c

Backport #102961 to 25.10: Use max_insert_threads for plain INSERTs without materialized views

kewin-robetti pushed a commit to viasoftkorp/ClickHouse that referenced this pull request May 1, 2026

Merge pull request ClickHouse#102961 from ClickHouse/sema/fix-insert-…

0148bcf

…thread-count Use max_insert_threads for plain INSERTs without materialized views

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use max_insert_threads for plain INSERTs without materialized views#102961

Use max_insert_threads for plain INSERTs without materialized views#102961
CheSema merged 7 commits intomasterfrom
sema/fix-insert-thread-count

CheSema commented Apr 17, 2026 •

edited

Loading

Uh oh!

clickhouse-gh Bot commented Apr 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

clickhouse-gh Bot commented Apr 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

CheSema commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Findings

ClickHouse Rules

Final Verdict

Uh oh!

Uh oh!

Uh oh!

clickhouse-gh Bot commented Apr 20, 2026

LLVM Coverage Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

CheSema commented Apr 17, 2026 •

edited

Loading

clickhouse-gh Bot commented Apr 17, 2026 •

edited

Loading