Try to squash matview concurrent inserts by antonio2368 · Pull Request #87280 · ClickHouse/ClickHouse

antonio2368 · 2025-09-18T12:11:43Z

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Squash data from all threads before inserting into materialized views, according to the settings min_insert_block_size_rows_for_materialized_views and min_insert_block_size_bytes_for_materialized_views.
Previously, if parallel_view_processing was enabled, each thread inserting into a specific materialized view would squash its data independently, which could lead to a higher number of generated parts.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

clickhouse-gh · 2025-09-18T12:12:12Z

Workflow [PR], commit [7deeb3a]

Summary: ❌

| job_name | test_name | status |
|---|---|---|
| Upgrade check (amd_msan) | | failure |
| | Killed by signal (in clickhouse-server.log) | FAIL |
| | Fatal message in clickhouse-server.log (see fatal_messages.txt) | FAIL |
| | Killed by signal (output files) | FAIL |
| | Found signal in gdb.log | FAIL |

den-crane · 2025-09-19T18:29:05Z

src/Interpreters/InsertDependenciesBuilder.cpp

+
+    insert_chains.reserve(sink_stream_size);
+
+    /// Squashing from multiple streams breaks deduplication for now so the optimization will be disabled


How will it work if there are multiple inserts and some of them use transactions and can initiate a rollback after insertion?

antonio2368 · 2025-09-22

Sorry for not being clear; I'll write a proper changelog entry once the PR is ready.
This squashes data from the same insert that was processed by different threads.
Currently, we create a squashing transform per thread, so each one squashes its data independently.
I'm trying to see whether I can squash everything in one place to reduce the number of generated parts, especially when the MV generates much smaller parts than the source insert.
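
To make the difference concrete, here is a minimal sketch that counts how many blocks (and therefore parts) are produced with one squasher per thread versus one shared squasher. It is a toy model, not the ClickHouse implementation: `Squasher`, `blocksWithPerThreadSquashing`, and `blocksWithSharedSquashing` are hypothetical names, only row counts are tracked, and threads are processed sequentially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a squashing transform: it buffers row counts and
// emits one block whenever at least `min_rows` rows have accumulated.
struct Squasher
{
    explicit Squasher(std::size_t min_rows_) : min_rows(min_rows_) {}

    void add(std::size_t rows)
    {
        buffered += rows;
        if (buffered >= min_rows)
        {
            ++emitted_blocks;
            buffered = 0;
        }
    }

    // Flush whatever is left at the end of the insert.
    void finish()
    {
        if (buffered > 0)
        {
            ++emitted_blocks;
            buffered = 0;
        }
    }

    std::size_t min_rows;
    std::size_t buffered = 0;
    std::size_t emitted_blocks = 0;
};

// One squasher per thread: every thread flushes its own (possibly small) remainder.
std::size_t blocksWithPerThreadSquashing(
    const std::vector<std::vector<std::size_t>> & threads, std::size_t min_rows)
{
    std::size_t total = 0;
    for (const auto & chunks : threads)
    {
        Squasher squasher(min_rows);
        for (std::size_t rows : chunks)
            squasher.add(rows);
        squasher.finish();
        total += squasher.emitted_blocks;
    }
    return total;
}

// One squasher shared by all threads: remainders are combined before flushing.
std::size_t blocksWithSharedSquashing(
    const std::vector<std::vector<std::size_t>> & threads, std::size_t min_rows)
{
    Squasher squasher(min_rows);
    for (const auto & chunks : threads)
        for (std::size_t rows : chunks)
            squasher.add(rows);
    squasher.finish();
    return squasher.emitted_blocks;
}
```

In this model, four threads that each produce two 100-row chunks under a 1000-row threshold yield four blocks with per-thread squashing and a single block with shared squashing.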

antonio2368 · 2025-09-22T10:26:09Z

@CheSema do you think we need a setting to keep the old behavior?
Even though this is better in terms of the number of parts, it could increase insert latency, depending on the SELECT queries run by the MVs.

CheSema · 2025-09-23T13:10:42Z

> @CheSema do you think we need a setting to keep the old behavior? Even though this is better in terms of the number of parts, it could increase insert latency, depending on the SELECT queries run by the MVs.

I'm really worried about deduplication here. I know that the order of the resulting chunks in the inner query has to be preserved and they cannot be mixed. So this feature could be incompatible with the way we deduplicate inserted data.

CheSema · 2025-09-23T13:13:28Z

src/Interpreters/InterpreterInsertQuery.cpp

+        query_sample_block,
+        async_insert,
+        /*skip_destination_table*/ no_destination,
+        /*max_insert_threads*/ 1,


What if we need more inserting threads here?

antonio2368 · 2025-09-23T13:15:10Z

> I'm really worried about deduplication here. I know that the order of the resulting chunks in the inner query has to be preserved and they cannot be mixed. So this feature could be incompatible with the way we deduplicate inserted data.

It will be disabled if deduplication is enabled. I'm thinking about adding a setting that could disable this kind of squashing even when deduplication is disabled.

CheSema · 2025-09-23T14:41:01Z

src/Interpreters/InsertDependenciesBuilder.cpp

+        }
+    }
+
+    if (deduplicate_blocks_in_dependent_materialized_views || !has_squashing_transforms)


I see, you do it with respect to deduplication.

CheSema · 2025-09-23T14:42:57Z

src/Processors/Transforms/ExceptionKeepingTransform.cpp

    else if (stage == Stage::Finish)
    {
-        if (auto exception = runStep([this] { onFinish(); }, thread_group))
+        GenerateResult res;


It is not clear why we need these changes.
Why do we have a result at the Finish stage?

antonio2368 · 2025-09-25

PlanSquashingTransform was an IInflatingTransform because we don't create a chunk until we have enough data. To properly handle errors from MVs, it needed to become an ExceptionKeepingTransform.
As the logic of IInflatingTransform is much simpler, the easiest thing for me was to add its logic to ExceptionKeepingTransform.
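
The following sketch illustrates why an accumulating step can still produce output at the finish stage: rows that never reached the threshold have to be flushed when the input ends. It is an assumption-laden illustration rather than the actual ClickHouse interfaces; `AccumulatingStep`, `consume`, `finish`, and `Rows` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

using Rows = std::vector<int>;

// Hypothetical accumulating step: it holds rows back until `min_rows` are
// available, so the final call may still have buffered data to hand over.
class AccumulatingStep
{
public:
    explicit AccumulatingStep(std::size_t min_rows_) : min_rows(min_rows_) {}

    // Returns a chunk only once enough rows have been buffered.
    std::optional<Rows> consume(const Rows & input)
    {
        pending.insert(pending.end(), input.begin(), input.end());
        if (pending.size() < min_rows)
            return std::nullopt;
        return takePending();
    }

    // Called once at the end of input: flushes the leftover rows, if any.
    std::optional<Rows> finish()
    {
        if (pending.empty())
            return std::nullopt;
        return takePending();
    }

private:
    Rows takePending()
    {
        Rows out = std::move(pending);
        pending.clear(); // leave the moved-from buffer in a known empty state
        return out;
    }

    std::size_t min_rows;
    Rows pending;
};
```

With a threshold of 5, feeding `{1, 2}` returns nothing, feeding `{3, 4, 5}` returns a 5-row chunk, and a trailing `{6}` is only emitted by `finish()`.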

CheSema · 2025-09-23T14:59:55Z

As I understand the code, the insert thread count setting was introduced not to speed up inner queries (it does that as a side effect) but mainly to make writes to the destination table concurrent.

Maybe we need more detailed settings here, like:
inner_mv_queries_concurency
insertion_concurency

antonio2368 · 2025-09-24T07:24:52Z

> As I understand the code, the insert thread count setting was introduced not to speed up inner queries (it does that as a side effect) but mainly to make writes to the destination table concurrent.

Yes, but if you have chained MVs, this PR will squash data from different threads, which means that instead of running 4 smaller SELECTs (and creating more parts) you run 1 heavy SELECT (and create only 1 part).
So you end up doing more work during the insert instead of during merges and SELECTs on the destination table.
In general this should be fine, since creating fewer parts is always a win, but it could confuse some users.

And I agree about the settings: it's a bit confusing how max_insert_threads interacts with MVs. Let's do it in a different PR so as not to clutter this one.

clickhouse-gh bot added the pr-not-for-changelog label (This PR should not be mentioned in the changelog) Sep 18, 2025

antonio2368 force-pushed the matview-squashing-in-parallel-processing branch from b943071 to 32bad6b Compare September 18, 2025 12:21

CheSema self-assigned this Sep 18, 2025

antonio2368 force-pushed the matview-squashing-in-parallel-processing branch 2 times, most recently from a7f4efa to f30c774 Compare September 18, 2025 13:25

Try to squash matview concurrent inserts

031a9d5

antonio2368 force-pushed the matview-squashing-in-parallel-processing branch from f30c774 to 031a9d5 Compare September 18, 2025 13:30

antonio2368 added 2 commits September 19, 2025 14:06

Fix error handling

090215b

Merge branch 'master' into matview-squashing-in-parallel-processing

6a98fe3

den-crane reviewed Sep 19, 2025


Add comments

1d7f0bb

antonio2368 marked this pull request as ready for review September 22, 2025 10:14

CheSema reviewed Sep 23, 2025


clickhouse-gh bot added the pr-improvement label (Pull request with some product improvements) and removed the pr-not-for-changelog label (This PR should not be mentioned in the changelog) Sep 23, 2025

CheSema reviewed Sep 23, 2025


CheSema approved these changes Sep 24, 2025


antonio2368 added 2 commits September 29, 2025 12:33

Merge branch 'master' into matview-squashing-in-parallel-processing

74c3ad5

Add setting

7deeb3a

antonio2368 force-pushed the matview-squashing-in-parallel-processing branch from ad1fc8c to 7deeb3a Compare September 29, 2025 11:45

antonio2368 added this pull request to the merge queue Oct 2, 2025

Merged via the queue into master with commit b6d1cd8 Oct 2, 2025
121 of 123 checks passed

antonio2368 deleted the matview-squashing-in-parallel-processing branch October 2, 2025 15:23

robot-ch-test-poll3 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Oct 2, 2025

antonio2368 merged 6 commits into master from matview-squashing-in-parallel-processing

4 participants

