4 changes: 2 additions & 2 deletions docs/cloud/bestpractices/avoidoptimizefinal.md
@@ -20,10 +20,10 @@ It is important to note that using this optimization will force a rewrite of a p
even if merging to a single part has already occurred.

Additionally, use of the `OPTIMIZE TABLE ... FINAL` query may disregard
-setting [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) which controls the maximum size of parts
+setting [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) which controls the maximum size of parts
that ClickHouse will typically merge by itself in the background.

-The [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) setting is by default set to 150 GB.
+The [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) setting is by default set to 150 GB.
When running `OPTIMIZE TABLE ... FINAL`,
the steps outlined above will be performed resulting in a single part after merge.
This remaining single part could exceed the 150 GB specified by the default of this setting.
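
Both hunks above point at the same setting; as a minimal sketch of the behaviour being documented (the database and table names are hypothetical), the merge size limit can be inspected and a forced merge to a single part triggered like this:

```sql
-- Inspect the background merge size limit (150 GB by default).
SELECT name, value
FROM system.merge_tree_settings
WHERE name = 'max_bytes_to_merge_at_max_space_in_pool';

-- Force a rewrite into a single part per partition; as the page notes,
-- this may disregard the limit shown above. Table name is hypothetical.
OPTIMIZE TABLE my_db.my_table FINAL;
```
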
8 changes: 4 additions & 4 deletions docs/guides/developer/deduplicating-inserts-on-retries.md
@@ -15,7 +15,7 @@ When an insert is retried, ClickHouse tries to determine whether the data has al

**Only `*MergeTree` engines support deduplication on insertion.**

-For `*ReplicatedMergeTree` engines, insert deduplication is enabled by default and is controlled by the [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated-deduplication-window) and [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated-deduplication-window-seconds) settings. For non-replicated `*MergeTree` engines, deduplication is controlled by the [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non-replicated-deduplication-window) setting.
+For `*ReplicatedMergeTree` engines, insert deduplication is enabled by default and is controlled by the [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated_deduplication_window) and [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated_deduplication_window_seconds) settings. For non-replicated `*MergeTree` engines, deduplication is controlled by the [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non_replicated_deduplication_window) setting.

The settings above determine the parameters of the deduplication log for a table. The deduplication log stores a finite number of `block_id`s, which determine how deduplication works (see below).

@@ -41,9 +41,9 @@ When a table has one or more materialized views, the inserted data is also inser

You can control this process using the following settings for the source table:

-- [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated-deduplication-window)
-- [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated-deduplication-window-seconds)
-- [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non-replicated-deduplication-window)
+- [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated_deduplication_window)
+- [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated_deduplication_window_seconds)
+- [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non_replicated_deduplication_window)

You can also use the user profile setting [`deduplicate_blocks_in_dependent_materialized_views`](/operations/settings/settings#deduplicate_blocks_in_dependent_materialized_views).

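
The three settings listed above control the deduplication log described on this page; a minimal sketch for a non-replicated table (the table name and window size are illustrative) would look roughly like this:

```sql
-- Hypothetical non-replicated table with an explicit deduplication window.
CREATE TABLE IF NOT EXISTS dedup_demo
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS non_replicated_deduplication_window = 100;

-- A retried, byte-identical insert produces the same block_id and is
-- expected to be skipped rather than stored twice.
INSERT INTO dedup_demo VALUES (1, 'a');
INSERT INTO dedup_demo VALUES (1, 'a');

SELECT count() FROM dedup_demo; -- expected: 1
```
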
8 changes: 4 additions & 4 deletions docs/integrations/data-ingestion/s3/performance.md
@@ -60,13 +60,13 @@ Note that the `min_insert_block_size_bytes` value denotes the uncompressed in-me

#### Be aware of merges {#be-aware-of-merges}

-The smaller the configured insert block size is, the more initial parts get created for a large data load, and the more background part merges are executed concurrently with the data ingestion. This can cause resource contention (CPU and memory) and require additional time (for reaching a [healthy](/operations/settings/merge-tree-settings#parts-to-throw-insert) (3000) number of parts) after the ingestion is finished.
+The smaller the configured insert block size is, the more initial parts get created for a large data load, and the more background part merges are executed concurrently with the data ingestion. This can cause resource contention (CPU and memory) and require additional time (for reaching a [healthy](/operations/settings/merge-tree-settings#parts_to_throw_insert) (3000) number of parts) after the ingestion is finished.

:::important
-ClickHouse query performance will be negatively impacted if the part count exceeds the [recommended limits](/operations/settings/merge-tree-settings#parts-to-throw-insert).
+ClickHouse query performance will be negatively impacted if the part count exceeds the [recommended limits](/operations/settings/merge-tree-settings#parts_to_throw_insert).
:::

-ClickHouse will continuously [merge parts](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) into larger parts until they [reach](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) a compressed size of ~150 GiB. This diagram shows how a ClickHouse server merges parts:
+ClickHouse will continuously [merge parts](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) into larger parts until they [reach](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) a compressed size of ~150 GiB. This diagram shows how a ClickHouse server merges parts:

<Image img={Merges} size="lg" border alt="Background merges in ClickHouse" />

@@ -84,7 +84,7 @@ Go to ①

Note that [increasing](https://clickhouse.com/blog/supercharge-your-clickhouse-data-loads-part1#hardware-size) the number of CPU cores and the size of RAM increases the background merge throughput.

-Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old-parts-lifetime) number of minutes. Over time, this creates a tree of merged parts (hence the name [`MergeTree`](/engines/table-engines/mergetree-family) table).
+Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) number of minutes. Over time, this creates a tree of merged parts (hence the name [`MergeTree`](/engines/table-engines/mergetree-family) table).

### Insert Parallelism {#insert-parallelism}

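
Since both hunks in this file concern part counts and background merge limits, a small monitoring sketch may help (it only reads system tables, so no names need adjusting):

```sql
-- Active part count per table; this should trend back down well below the
-- parts_to_throw_insert limit once ingestion has finished.
SELECT database, table, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC;

-- Background merges currently in flight.
SELECT database, table, elapsed, progress, num_parts
FROM system.merges;
```
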
4 changes: 2 additions & 2 deletions docs/managing-data/core-concepts/merges.md
@@ -28,15 +28,15 @@ ClickHouse [is fast](/concepts/why-clickhouse-is-so-fast) not just for queries b

This makes data writes lightweight and [highly efficient](/concepts/why-clickhouse-is-so-fast#storage-layer-concurrent-inserts-are-isolated-from-each-other).

-To control the number of parts per table and implement ② above, ClickHouse continuously merges ([per partition](/partitions#per-partition-merges)) smaller parts into larger ones in the background until they reach a compressed size of approximately [~150 GB](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool).
+To control the number of parts per table and implement ② above, ClickHouse continuously merges ([per partition](/partitions#per-partition-merges)) smaller parts into larger ones in the background until they reach a compressed size of approximately [~150 GB](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool).

The following diagram sketches this background merge process:

<Image img={merges_01} size="lg" alt='PART MERGES'/>

<br/>

-The `merge level` of a part is incremented by one with each additional merge. A level of `0` means the part is new and has not been merged yet. Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old-parts-lifetime) time (8 minutes by default). Over time, this creates a **tree** of merged parts. Hence the name [merge tree](/engines/table-engines/mergetree-family) table.
+The `merge level` of a part is incremented by one with each additional merge. A level of `0` means the part is new and has not been merged yet. Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) time (8 minutes by default). Over time, this creates a **tree** of merged parts. Hence the name [merge tree](/engines/table-engines/mergetree-family) table.

## Monitoring merges {#monitoring-merges}

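
The merge level and inactive-part lifetime described in this file can both be observed in `system.parts`; a minimal sketch (database and table names are hypothetical):

```sql
-- Level 0 parts are freshly inserted; each merge increments the level.
-- Inactive parts have already been merged away and await deletion.
SELECT name, level, active, rows, formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE database = 'my_db' AND table = 'my_table'
ORDER BY modification_time DESC;
```
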
2 changes: 1 addition & 1 deletion docs/managing-data/core-concepts/parts.md
@@ -55,7 +55,7 @@ Data parts are self-contained, including all metadata needed to interpret their

## Part merges {#part-merges}

-To manage the number of parts per table, a [background merge](/merges) job periodically combines smaller parts into larger ones until they reach a [configurable](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/operations/settings/merge-tree-settings#old-parts-lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:
+To manage the number of parts per table, a [background merge](/merges) job periodically combines smaller parts into larger ones until they reach a [configurable](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:

<Image img={merges} size="lg"/>

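
As a small companion to the `old_parts_lifetime` reference above (names again hypothetical), the inactive parts still awaiting removal can be listed like this:

```sql
-- remove_time records when a part became inactive; such parts are deleted
-- after the old_parts_lifetime interval has elapsed.
SELECT name, remove_time
FROM system.parts
WHERE database = 'my_db' AND table = 'my_table' AND NOT active;
```
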
@@ -242,7 +242,7 @@ Users should consider partitioning a data management technique. It is ideal when

Important: Ensure your partitioning key expression does not result in a high cardinality set i.e. creating more than 100 partitions should be avoided. For example, do not partition your data by high cardinality columns such as client identifiers or names. Instead, make a client identifier or name the first column in the `ORDER BY` expression.

-> Internally, ClickHouse [creates parts](/guides/best-practices/sparse-primary-indexes#clickhouse-index-design) for inserted data. As more data is inserted, the number of parts increases. In order to prevent an excessively high number of parts, which will degrade query performance (because there are more files to read), parts are merged together in a background asynchronous process. If the number of parts exceeds a [pre-configured limit](/operations/settings/merge-tree-settings#parts-to-throw-insert), then ClickHouse will throw an exception on insert as a ["too many parts" error](/knowledgebase/exception-too-many-parts). This should not happen under normal operation and only occurs if ClickHouse is misconfigured or used incorrectly e.g. many small inserts. Since parts are created per partition in isolation, increasing the number of partitions causes the number of parts to increase i.e. it is a multiple of the number of partitions. High cardinality partitioning keys can, therefore, cause this error and should be avoided.
+> Internally, ClickHouse [creates parts](/guides/best-practices/sparse-primary-indexes#clickhouse-index-design) for inserted data. As more data is inserted, the number of parts increases. In order to prevent an excessively high number of parts, which will degrade query performance (because there are more files to read), parts are merged together in a background asynchronous process. If the number of parts exceeds a [pre-configured limit](/operations/settings/merge-tree-settings#parts_to_throw_insert), then ClickHouse will throw an exception on insert as a ["too many parts" error](/knowledgebase/exception-too-many-parts). This should not happen under normal operation and only occurs if ClickHouse is misconfigured or used incorrectly e.g. many small inserts. Since parts are created per partition in isolation, increasing the number of partitions causes the number of parts to increase i.e. it is a multiple of the number of partitions. High cardinality partitioning keys can, therefore, cause this error and should be avoided.

## Materialized views vs projections {#materialized-views-vs-projections}

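
To make the advice in this hunk concrete, here is a sketch of a table definition that follows it (table, column names, and types are hypothetical): a low-cardinality partitioning key, with the high-cardinality client identifier leading the `ORDER BY` instead of the partition key:

```sql
CREATE TABLE events
(
    client_id  String,
    event_time DateTime,
    payload    String
)
ENGINE = MergeTree
-- One partition per month keeps the partition count well under 100.
PARTITION BY toYYYYMM(event_time)
-- The high-cardinality identifier leads the sorting key, not the partition key.
ORDER BY (client_id, event_time);
```
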
2 changes: 1 addition & 1 deletion docusaurus.config.en.js
@@ -59,7 +59,7 @@ const config = {
onBrokenLinks: "throw",
onBrokenMarkdownLinks: "warn",
onDuplicateRoutes: "throw",
onBrokenAnchors: "throw",
onBrokenAnchors: "warn",
favicon: "img/docs_favicon.ico",
organizationName: "ClickHouse",
trailingSlash: false,
3 changes: 2 additions & 1 deletion scripts/settings/autogenerate-settings.sh
@@ -54,11 +54,12 @@ done
# move across files to where they need to be
mv settings-formats.md "$root/docs/operations/settings" || { echo "Failed to move generated settings-format.md"; exit 1; }
mv settings.md "$root/docs/operations/settings" || { echo "Failed to move generated settings.md"; exit 1; }
+cat generated_merge_tree_settings.md >> "$root/docs/operations/settings/merge-tree-settings.md" || { echo "Failed to append MergeTree settings.md"; exit 1; }
mv server_settings.md "$root/docs/operations/server-configuration-parameters/settings.md" || { echo "Failed to move generated server_settings.md"; exit 1; }

echo "[$SCRIPT_NAME] Auto-generation of settings markdown pages completed successfully"

# perform cleanup
-rm -rf "$tmp_dir"/{settings-formats.md,settings.md,FormatFactorySettings.h,Settings.cpp,clickhouse}
+rm -rf "$tmp_dir"/{settings-formats.md,settings.md,FormatFactorySettings.h,Settings.cpp,generated_merge_tree_settings.md,clickhouse}

echo "[$SCRIPT_NAME] Autogenerating settings completed"
15 changes: 15 additions & 0 deletions scripts/settings/mergetree-settings.sql
@@ -0,0 +1,15 @@
WITH
merge_tree_settings AS
(
SELECT format(
'## {} {} \n{}\n{}{}',
name,
'{#'||name||'}',
multiIf(tier == 'Experimental', '\n<ExperimentalBadge/>\n', tier == 'Beta', '\n<BetaBadge/>\n', ''),
if(type != '' AND default != '', format('|Type|Default|\n|---|---|\n|`{}`|`{}`|\n\n',type, default), ''),
replaceRegexpAll(description, '(?m)(^[ \t]+|[ \t]+$)', '')
)
FROM system.merge_tree_settings ORDER BY name
)
SELECT * FROM merge_tree_settings
INTO OUTFILE 'generated_merge_tree_settings.md' TRUNCATE FORMAT LineAsString
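
To preview the output for a single setting without writing the file, a simplified variant of the query above can be run directly (the setting name is just an example):

```sql
SELECT format(
    '## {} {} \n{}\n{}{}',
    name,
    '{#' || name || '}',
    multiIf(tier == 'Experimental', '\n<ExperimentalBadge/>\n', tier == 'Beta', '\n<BetaBadge/>\n', ''),
    if(type != '' AND default != '', format('|Type|Default|\n|---|---|\n|`{}`|`{}`|\n\n', type, default), ''),
    replaceRegexpAll(description, '(?m)(^[ \t]+|[ \t]+$)', '')
)
FROM system.merge_tree_settings
WHERE name = 'max_bytes_to_merge_at_max_space_in_pool'
FORMAT LineAsString;
```

The heading this generates, `## max_bytes_to_merge_at_max_space_in_pool {#max_bytes_to_merge_at_max_space_in_pool}`, uses the setting name verbatim as the anchor, which is why the documentation links in this change switch from hyphenated to underscore anchors.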