diff --git a/docs/cloud/bestpractices/avoidoptimizefinal.md b/docs/cloud/bestpractices/avoidoptimizefinal.md
index 7639f2888df..0cf27f3a7ae 100644
--- a/docs/cloud/bestpractices/avoidoptimizefinal.md
+++ b/docs/cloud/bestpractices/avoidoptimizefinal.md
@@ -20,10 +20,10 @@ It is important to note that using this optimization will force a rewrite of a p
even if merging to a single part has already occurred.
Additionally, use of the `OPTIMIZE TABLE ... FINAL` query may disregard
-setting [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) which controls the maximum size of parts
+the [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) setting, which controls the maximum size of parts
that ClickHouse will typically merge by itself in the background.
-The [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) setting is by default set to 150 GB.
+The [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) setting defaults to 150 GB.
When running `OPTIMIZE TABLE ... FINAL`,
the steps outlined above will be performed resulting in a single part after merge.
This remaining single part could exceed the 150 GB specified by the default of this setting.
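+
+For illustration, the statement below (using a hypothetical table named `my_table`) triggers exactly this behaviour, merging all parts of every partition into a single part even if the result exceeds the limit described above:
+
+```sql
+-- Forces a merge of all parts in each partition into one part,
+-- rewriting the data even if a single part already exists.
+OPTIMIZE TABLE my_table FINAL;
+```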
diff --git a/docs/guides/developer/deduplicating-inserts-on-retries.md b/docs/guides/developer/deduplicating-inserts-on-retries.md
index 06768a6c43e..b75f823dd33 100644
--- a/docs/guides/developer/deduplicating-inserts-on-retries.md
+++ b/docs/guides/developer/deduplicating-inserts-on-retries.md
@@ -15,7 +15,7 @@ When an insert is retried, ClickHouse tries to determine whether the data has al
**Only `*MergeTree` engines support deduplication on insertion.**
-For `*ReplicatedMergeTree` engines, insert deduplication is enabled by default and is controlled by the [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated-deduplication-window) and [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated-deduplication-window-seconds) settings. For non-replicated `*MergeTree` engines, deduplication is controlled by the [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non-replicated-deduplication-window) setting.
+For `*ReplicatedMergeTree` engines, insert deduplication is enabled by default and is controlled by the [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated_deduplication_window) and [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated_deduplication_window_seconds) settings. For non-replicated `*MergeTree` engines, deduplication is controlled by the [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non_replicated_deduplication_window) setting.
The settings above determine the parameters of the deduplication log for a table. The deduplication log stores a finite number of `block_id`s, which determine how deduplication works (see below).
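+
+As a minimal illustration, deduplication for a non-replicated table can be enabled by setting the deduplication window when the table is created (the table and column names below are examples only):
+
+```sql
+-- Keep the block_ids of the last 100 inserted blocks in the deduplication log,
+-- so that a retried insert of an identical block is silently skipped.
+CREATE TABLE dedup_example
+(
+    id UInt64,
+    value String
+)
+ENGINE = MergeTree
+ORDER BY id
+SETTINGS non_replicated_deduplication_window = 100;
+```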
@@ -41,9 +41,9 @@ When a table has one or more materialized views, the inserted data is also inser
You can control this process using the following settings for the source table:
-- [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated-deduplication-window)
-- [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated-deduplication-window-seconds)
-- [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non-replicated-deduplication-window)
+- [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated_deduplication_window)
+- [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated_deduplication_window_seconds)
+- [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non_replicated_deduplication_window)
You can also use the user profile setting [`deduplicate_blocks_in_dependent_materialized_views`](/operations/settings/settings#deduplicate_blocks_in_dependent_materialized_views).
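+
+For example, this setting can be enabled for the current session before retrying an insert (shown here only as an illustration):
+
+```sql
+-- Also deduplicate the blocks written to dependent materialized views
+SET deduplicate_blocks_in_dependent_materialized_views = 1;
+```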
diff --git a/docs/integrations/data-ingestion/s3/performance.md b/docs/integrations/data-ingestion/s3/performance.md
index e64b0b03122..584bd9d35a8 100644
--- a/docs/integrations/data-ingestion/s3/performance.md
+++ b/docs/integrations/data-ingestion/s3/performance.md
@@ -60,13 +60,13 @@ Note that the `min_insert_block_size_bytes` value denotes the uncompressed in-me
#### Be aware of merges {#be-aware-of-merges}
-The smaller the configured insert block size is, the more initial parts get created for a large data load, and the more background part merges are executed concurrently with the data ingestion. This can cause resource contention (CPU and memory) and require additional time (for reaching a [healthy](/operations/settings/merge-tree-settings#parts-to-throw-insert) (3000) number of parts) after the ingestion is finished.
+The smaller the configured insert block size is, the more initial parts get created for a large data load, and the more background part merges are executed concurrently with the data ingestion. This can cause resource contention (CPU and memory) and require additional time (to reach a [healthy](/operations/settings/merge-tree-settings#parts_to_throw_insert) number of parts (3000)) after the ingestion is finished.
:::important
-ClickHouse query performance will be negatively impacted if the part count exceeds the [recommended limits](/operations/settings/merge-tree-settings#parts-to-throw-insert).
+ClickHouse query performance will be negatively impacted if the part count exceeds the [recommended limits](/operations/settings/merge-tree-settings#parts_to_throw_insert).
:::
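+
+You can keep an eye on this while loading data by checking the number of active parts per table, for example (query shown for illustration only):
+
+```sql
+-- Number of active (not yet merged-away) parts per table
+SELECT database, table, count() AS active_parts
+FROM system.parts
+WHERE active
+GROUP BY database, table
+ORDER BY active_parts DESC;
+```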
-ClickHouse will continuously [merge parts](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) into larger parts until they [reach](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) a compressed size of ~150 GiB. This diagram shows how a ClickHouse server merges parts:
+ClickHouse will continuously [merge parts](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) into larger parts until they [reach](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) a compressed size of ~150 GiB. This diagram shows how a ClickHouse server merges parts:
@@ -84,7 +84,7 @@ Go to ①
Note that [increasing](https://clickhouse.com/blog/supercharge-your-clickhouse-data-loads-part1#hardware-size) the number of CPU cores and the size of RAM increases the background merge throughput.
-Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old-parts-lifetime) number of minutes. Over time, this creates a tree of merged parts (hence the name [`MergeTree`](/engines/table-engines/mergetree-family) table).
+Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) number of minutes. Over time, this creates a tree of merged parts (hence the name [`MergeTree`](/engines/table-engines/mergetree-family) table).
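+
+The values of these settings in effect on your server can be inspected directly, for example:
+
+```sql
+-- Current values of the merge-related settings referenced above
+SELECT name, value
+FROM system.merge_tree_settings
+WHERE name IN ('old_parts_lifetime', 'max_bytes_to_merge_at_max_space_in_pool');
+```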
### Insert Parallelism {#insert-parallelism}
diff --git a/docs/managing-data/core-concepts/merges.md b/docs/managing-data/core-concepts/merges.md
index b6ee079ede5..2b975ffd693 100644
--- a/docs/managing-data/core-concepts/merges.md
+++ b/docs/managing-data/core-concepts/merges.md
@@ -28,7 +28,7 @@ ClickHouse [is fast](/concepts/why-clickhouse-is-so-fast) not just for queries b
This makes data writes lightweight and [highly efficient](/concepts/why-clickhouse-is-so-fast#storage-layer-concurrent-inserts-are-isolated-from-each-other).
-To control the number of parts per table and implement ② above, ClickHouse continuously merges ([per partition](/partitions#per-partition-merges)) smaller parts into larger ones in the background until they reach a compressed size of approximately [~150 GB](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool).
+To control the number of parts per table and implement ② above, ClickHouse continuously merges ([per partition](/partitions#per-partition-merges)) smaller parts into larger ones in the background until they reach a compressed size of approximately [~150 GB](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool).
The following diagram sketches this background merge process:
@@ -36,7 +36,7 @@ The following diagram sketches this background merge process:
-The `merge level` of a part is incremented by one with each additional merge. A level of `0` means the part is new and has not been merged yet. Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old-parts-lifetime) time (8 minutes by default). Over time, this creates a **tree** of merged parts. Hence the name [merge tree](/engines/table-engines/mergetree-family) table.
+The `merge level` of a part is incremented by one with each additional merge. A level of `0` means the part is new and has not been merged yet. Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) time (8 minutes by default). Over time, this creates a **tree** of merged parts. Hence the name [merge tree](/engines/table-engines/mergetree-family) table.
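+
+For example, the merge level and active flag of each part can be inspected in the [parts](/operations/system-tables/parts) system table (`my_table` is a placeholder name):
+
+```sql
+-- Parts of a table together with their merge level; inactive parts have
+-- already been merged into a larger part and are awaiting deletion.
+SELECT name, level, active
+FROM system.parts
+WHERE table = 'my_table'
+ORDER BY level DESC, name;
+```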
## Monitoring merges {#monitoring-merges}
diff --git a/docs/managing-data/core-concepts/parts.md b/docs/managing-data/core-concepts/parts.md
index 9cd9e172e6d..dc243a29654 100644
--- a/docs/managing-data/core-concepts/parts.md
+++ b/docs/managing-data/core-concepts/parts.md
@@ -55,7 +55,7 @@ Data parts are self-contained, including all metadata needed to interpret their
## Part merges {#part-merges}
-To manage the number of parts per table, a [background merge](/merges) job periodically combines smaller parts into larger ones until they reach a [configurable](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/operations/settings/merge-tree-settings#old-parts-lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:
+To manage the number of parts per table, a [background merge](/merges) job periodically combines smaller parts into larger ones until they reach a [configurable](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:
diff --git a/docs/migrations/bigquery/migrating-to-clickhouse-cloud.md b/docs/migrations/bigquery/migrating-to-clickhouse-cloud.md
index 1db6f5d9d7f..94a95068bce 100644
--- a/docs/migrations/bigquery/migrating-to-clickhouse-cloud.md
+++ b/docs/migrations/bigquery/migrating-to-clickhouse-cloud.md
@@ -242,7 +242,7 @@ Users should consider partitioning a data management technique. It is ideal when
Important: Ensure your partitioning key expression does not result in a high cardinality set i.e. creating more than 100 partitions should be avoided. For example, do not partition your data by high cardinality columns such as client identifiers or names. Instead, make a client identifier or name the first column in the `ORDER BY` expression.
-> Internally, ClickHouse [creates parts](/guides/best-practices/sparse-primary-indexes#clickhouse-index-design) for inserted data. As more data is inserted, the number of parts increases. In order to prevent an excessively high number of parts, which will degrade query performance (because there are more files to read), parts are merged together in a background asynchronous process. If the number of parts exceeds a [pre-configured limit](/operations/settings/merge-tree-settings#parts-to-throw-insert), then ClickHouse will throw an exception on insert as a ["too many parts" error](/knowledgebase/exception-too-many-parts). This should not happen under normal operation and only occurs if ClickHouse is misconfigured or used incorrectly e.g. many small inserts. Since parts are created per partition in isolation, increasing the number of partitions causes the number of parts to increase i.e. it is a multiple of the number of partitions. High cardinality partitioning keys can, therefore, cause this error and should be avoided.
+> Internally, ClickHouse [creates parts](/guides/best-practices/sparse-primary-indexes#clickhouse-index-design) for inserted data. As more data is inserted, the number of parts increases. To prevent an excessively high number of parts, which will degrade query performance (because there are more files to read), parts are merged together in a background asynchronous process. If the number of parts exceeds a [pre-configured limit](/operations/settings/merge-tree-settings#parts_to_throw_insert), then ClickHouse will throw an exception on insert as a ["too many parts" error](/knowledgebase/exception-too-many-parts). This should not happen under normal operation and only occurs if ClickHouse is misconfigured or used incorrectly, e.g. with many small inserts. Since parts are created per partition in isolation, increasing the number of partitions causes the number of parts to increase, i.e. it is a multiple of the number of partitions. High-cardinality partitioning keys can, therefore, cause this error and should be avoided.
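+
+Following the guidance above, a low-cardinality partitioning scheme might look like this (table and column names are purely illustrative):
+
+```sql
+-- Partition by month (low cardinality) and put the high-cardinality
+-- client identifier first in the ORDER BY instead of the partition key.
+CREATE TABLE client_events
+(
+    client_id String,
+    event_time DateTime,
+    payload String
+)
+ENGINE = MergeTree
+PARTITION BY toYYYYMM(event_time)
+ORDER BY (client_id, event_time);
+```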
## Materialized views vs projections {#materialized-views-vs-projections}
diff --git a/docusaurus.config.en.js b/docusaurus.config.en.js
index 02f742800bd..5a9a9d503c4 100644
--- a/docusaurus.config.en.js
+++ b/docusaurus.config.en.js
@@ -59,7 +59,7 @@ const config = {
onBrokenLinks: "throw",
onBrokenMarkdownLinks: "warn",
onDuplicateRoutes: "throw",
- onBrokenAnchors: "throw",
+ onBrokenAnchors: "warn",
favicon: "img/docs_favicon.ico",
organizationName: "ClickHouse",
trailingSlash: false,
diff --git a/scripts/settings/autogenerate-settings.sh b/scripts/settings/autogenerate-settings.sh
index e34a9706d2f..0ed8484cc51 100755
--- a/scripts/settings/autogenerate-settings.sh
+++ b/scripts/settings/autogenerate-settings.sh
@@ -54,11 +54,12 @@ done
# move across files to where they need to be
mv settings-formats.md "$root/docs/operations/settings" || { echo "Failed to move generated settings-format.md"; exit 1; }
mv settings.md "$root/docs/operations/settings" || { echo "Failed to move generated settings.md"; exit 1; }
+cat generated_merge_tree_settings.md >> "$root/docs/operations/settings/merge-tree-settings.md" || { echo "Failed to append generated_merge_tree_settings.md"; exit 1; }
mv server_settings.md "$root/docs/operations/server-configuration-parameters/settings.md" || { echo "Failed to move generated server_settings.md"; exit 1; }
echo "[$SCRIPT_NAME] Auto-generation of settings markdown pages completed successfully"
# perform cleanup
-rm -rf "$tmp_dir"/{settings-formats.md,settings.md,FormatFactorySettings.h,Settings.cpp,clickhouse}
+rm -rf "$tmp_dir"/{settings-formats.md,settings.md,FormatFactorySettings.h,Settings.cpp,generated_merge_tree_settings.md,clickhouse}
echo "[$SCRIPT_NAME] Autogenerating settings completed"
diff --git a/scripts/settings/mergetree-settings.sql b/scripts/settings/mergetree-settings.sql
new file mode 100644
index 00000000000..c50691679f3
--- /dev/null
+++ b/scripts/settings/mergetree-settings.sql
@@ -0,0 +1,18 @@
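+-- Generates one markdown section ('## <name> {#<name>}') per MergeTree setting from
+-- system.merge_tree_settings; the explicit anchor keeps heading anchors equal to the
+-- snake_case setting names. autogenerate-settings.sh appends the output file to merge-tree-settings.md.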
+WITH
+ merge_tree_settings AS
+ (
+ SELECT format(
+ '## {} {} \n{}\n{}{}',
+ name,
+ '{#'||name||'}',
+ multiIf(tier == 'Experimental', '\n\n', tier == 'Beta', '\n\n', ''),
+ if(type != '' AND default != '', format('|Type|Default|\n|---|---|\n|`{}`|`{}`|\n\n',type, default), ''),
+ replaceRegexpAll(description, '(?m)(^[ \t]+|[ \t]+$)', '')
+ )
+ FROM system.merge_tree_settings ORDER BY name
+ )
+SELECT * FROM merge_tree_settings
+INTO OUTFILE 'generated_merge_tree_settings.md' TRUNCATE FORMAT LineAsString
\ No newline at end of file