Merged
Changes from all commits
29 commits
- `83ff2e6` Table part merges documentation. (tom-clickhouse, Feb 5, 2025)
- `91876cf` Removed some newlines. (tom-clickhouse, Feb 5, 2025)
- `fb72681` Tweaked a sentence. (tom-clickhouse, Feb 5, 2025)
- `c883737` Spelling. (tom-clickhouse, Feb 5, 2025)
- `2936a05` Deduplicate words. (tom-clickhouse, Feb 5, 2025)
- `8a97bd9` Slightly revamped visuals. (tom-clickhouse, Feb 5, 2025)
- `1cd0c87` Spelling issues. (tom-clickhouse, Feb 5, 2025)
- `65963c6` Slight changes in text. (tom-clickhouse, Feb 5, 2025)
- `64fce31` Reference to the new merges docs from elsewhere. (tom-clickhouse, Feb 5, 2025)
- `d17e63e` Slight visuals revamp. (tom-clickhouse, Feb 5, 2025)
- `a5d097f` Slight word massaging. (tom-clickhouse, Feb 5, 2025)
- `774ea02` Mention merges dashboard. (tom-clickhouse, Feb 5, 2025)
- `ea7518a` Slight rephrasing. (tom-clickhouse, Feb 5, 2025)
- `6d12235` Simplify and align some headings. (tom-clickhouse, Feb 5, 2025)
- `527bb9d` Further alignment of some headings. (tom-clickhouse, Feb 5, 2025)
- `98a842f` Final alignment of some headings. (tom-clickhouse, Feb 5, 2025)
- `bca97db` Cross reference merges with partitions docs. (tom-clickhouse, Feb 5, 2025)
- `b288078` Revamping of cross references between merges and partitions docs. (tom-clickhouse, Feb 5, 2025)
- `4d6afaa` Alignment of wording. (tom-clickhouse, Feb 5, 2025)
- `6695c8a` Align headings of all new core concepts docs. (tom-clickhouse, Feb 5, 2025)
- `338209a` Additional cross references. (tom-clickhouse, Feb 5, 2025)
- `aaafb3a` More precision in sentence. (tom-clickhouse, Feb 5, 2025)
- `9cc3038` Newlines after visuals. (tom-clickhouse, Feb 5, 2025)
- `10d1bc7` Revamp of visual 1. (tom-clickhouse, Feb 5, 2025)
- `49a5b02` More clarity for the merges dashboard section. (tom-clickhouse, Feb 6, 2025)
- `3277e10` Give one example of partial state. (tom-clickhouse, Feb 6, 2025)
- `3fcec50` Make merge step descriptions more clear and tight. (tom-clickhouse, Feb 6, 2025)
- `35b4d3f` Mention additional metadata. (tom-clickhouse, Feb 6, 2025)
- `66d6fa2` Resolving of Dale's review. (tom-clickhouse, Feb 6, 2025)
10 changes: 5 additions & 5 deletions docs/en/concepts/why-clickhouse-is-so-fast.md
@@ -13,19 +13,19 @@ From an architectural perspective, databases consist (at least) of a storage layer

## Storage Layer: Concurrent inserts are isolated from each other

-In ClickHouse, each table consists of multiple "table parts". A part is created whenever a user inserts data into the table (INSERT statement). A query is always executed against all table parts that exist at the time the query starts.
+In ClickHouse, each table consists of multiple "table parts". A [part](/docs/en/parts) is created whenever a user inserts data into the table (INSERT statement). A query is always executed against all table parts that exist at the time the query starts.

-To avoid that too many parts accumulate, ClickHouse runs a merge operation in the background which continuously combines multiple (small) parts into a single bigger part.
+To avoid that too many parts accumulate, ClickHouse runs a [merge](/docs/en/merges) operation in the background which continuously combines multiple smaller parts into a single bigger part.

-This approach has several advantages: On the one hand, individual inserts are "local" in the sense that they do not need to update global, i.e. per-table data structures. As a result, multiple simultaneous inserts need no mutual synchronization or synchronization with existing table data, and thus inserts can be performed almost at the speed of disk I/O.
+This approach has several advantages: All data processing can be [offloaded to background part merges](/docs/en/concepts/why-clickhouse-is-so-fast#storage-layer-merge-time-computation), keeping data writes lightweight and highly efficient. Individual inserts are "local" in the sense that they do not need to update global, i.e. per-table data structures. As a result, multiple simultaneous inserts need no mutual synchronization or synchronization with existing table data, and thus inserts can be performed almost at the speed of disk I/O.

## Storage Layer: Concurrent inserts and selects are isolated

-On the other hand, merging parts is a background operation which is invisible to the user, i.e. does not affect concurrent SELECT queries. In fact, this architecture isolates insert and selects so effectively, that many other databases adopted it.
+Inserts are fully isolated from SELECT queries, and merging inserted data parts happens in the background without affecting concurrent queries.

## Storage Layer: Merge-time computation

-Unlike other databases, ClickHouse is also able to perform additional data transformations during the merge operation. Examples of this include:
+Unlike other databases, ClickHouse keeps data writes lightweight and efficient by performing all additional data transformations during the [merge](/docs/en/merges) background process. Examples of this include:

- **Replacing merges** which retain only the most recent version of a row in the input parts and discard all other row versions. Replacing merges can be thought of as a merge-time cleanup operation.

(Eight binary image files, the new diagrams referenced by the docs below, are part of this PR but cannot be displayed in the diff view.)
1 change: 1 addition & 0 deletions docs/en/managing-data/core-concepts/index.md
@@ -12,4 +12,5 @@ you will learn some of the core concepts of how ClickHouse works.
|-------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Table parts](/docs/en/parts) | Learn what table parts are in ClickHouse. |
| [Table partitions](/docs/en/partitions) | Learn what table partitions are and what they are used for. |
+| [Table part merges](/docs/en/merges) | Learn what table part merges are and what they are used for. |
| [Primary indexes](/docs/en/optimize/sparse-primary-indexes) | A deep dive into ClickHouse indexing including how it differs from other DB systems, how ClickHouse builds and uses a table's sparse primary index and what some of the best practices are for indexing in ClickHouse. |
166 changes: 166 additions & 0 deletions docs/en/managing-data/core-concepts/merges.md
@@ -0,0 +1,166 @@
---
slug: /en/merges
title: Part merges
description: What are part merges in ClickHouse
keywords: [merges]
---

## What are part merges in ClickHouse?

<br/>

ClickHouse [is fast](/docs/en/concepts/why-clickhouse-is-so-fast) not just for queries but also for inserts, thanks to its [storage layer](https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf), which operates similarly to [LSM trees](https://en.wikipedia.org/wiki/Log-structured_merge-tree):

① Inserts (into tables from the [MergeTree engine](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family) family) create sorted, immutable [data parts](/docs/en/parts).

② All data processing is offloaded to **background part merges**.

This makes data writes lightweight and [highly efficient](/docs/en/concepts/why-clickhouse-is-so-fast#storage-layer-concurrent-inserts-are-isolated-from-each-other).

To control the number of parts per table and implement ② above, ClickHouse continuously merges ([per partition](/docs/en/partitions#per-partition-merges)) smaller parts into larger ones in the background until they reach a compressed size of approximately [150 GB](/docs/en/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool).
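
Merging runs continuously in the background on its own, but as a sketch (reusing the `uk.uk_price_paid_simple` example table from the [parts docs](/docs/en/parts)), a merge of all active parts into a single part can also be triggered manually:

```sql
-- Force a merge of all active parts of the table into one part.
-- FINAL rewrites the data even if it already sits in a single part,
-- so use it sparingly on large tables.
OPTIMIZE TABLE uk.uk_price_paid_simple FINAL;
```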

The following diagram sketches this background merge process:

<img src={require('./images/merges_01.png').default} alt='PART MERGES' class='image' style={{width: '60%'}} />
<br/>

The `merge level` of a part is incremented by one with each additional merge. A level of `0` means the part is new and has not been merged yet. Parts that were merged into larger parts are marked as [inactive](/docs/en/operations/system-tables/parts) and eventually deleted after a [configurable](/docs/en/operations/settings/merge-tree-settings#old-parts-lifetime) time (8 minutes by default). Over time, this creates a **tree** of merged parts. Hence the name [merge tree](/docs/en/engines/table-engines/mergetree-family) table.
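
This tree is visible in the system tables. As a sketch (again using the example table from the [parts docs](/docs/en/parts)), dropping the `active` filter lists merged-away parts alongside active ones:

```sql
-- Inactive parts (active = 0) were merged into bigger parts and will be
-- deleted after old_parts_lifetime expires.
SELECT
    name,
    level,
    active
FROM system.parts
WHERE (database = 'uk') AND (`table` = 'uk_price_paid_simple')
ORDER BY name ASC;
```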

## Monitoring merges

In the [what are table parts](/docs/en/parts) example, we [showed](/docs/en/parts#monitoring-table-parts) that ClickHouse tracks all table parts in the [parts](/docs/en/operations/system-tables/parts) system table. We used the following query to retrieve the merge level and the number of stored rows per active part of the example table:
```sql
SELECT
    name,
    level,
    rows
FROM system.parts
WHERE (database = 'uk') AND (`table` = 'uk_price_paid_simple') AND active
ORDER BY name ASC;
```

The [previously documented](/docs/en/parts#monitoring-table-parts) query result shows that the example table had four active parts, each created from a single merge of the initially inserted parts:
```
┌─name────────┬─level─┬────rows─┐
1. │ all_0_5_1 │ 1 │ 6368414 │
2. │ all_12_17_1 │ 1 │ 6442494 │
3. │ all_18_23_1 │ 1 │ 5977762 │
4. │ all_6_11_1 │ 1 │ 6459763 │
└─────────────┴───────┴─────────┘
```

[Running](https://sql.clickhouse.com/?query=U0VMRUNUCiAgICBuYW1lLAogICAgbGV2ZWwsCiAgICByb3dzCkZST00gc3lzdGVtLnBhcnRzCldIRVJFIChkYXRhYmFzZSA9ICd1aycpIEFORCAoYHRhYmxlYCA9ICd1a19wcmljZV9wYWlkX3NpbXBsZScpIEFORCBhY3RpdmUKT1JERVIgQlkgbmFtZSBBU0M7&run_query=true&tab=results) the query now shows that the four parts have since merged into a single final part (as long as there are no further inserts into the table):

```
┌─name───────┬─level─┬─────rows─┐
1. │ all_0_23_2 │ 2 │ 25248433 │
└────────────┴───────┴──────────┘
```

In ClickHouse 24.10, a new [merges dashboard](https://presentations.clickhouse.com/2024-release-24.10/index.html#17) was added to the built-in [monitoring dashboards](https://clickhouse.com/blog/common-issues-you-can-solve-using-advanced-monitoring-dashboards). Available in both OSS and Cloud via the `/merges` HTTP handler, it lets us visualize all part merges for our example table:

<img src={require('./images/merges-dashboard.gif').default} alt='PART MERGES' class='image' style={{width: '60%'}} />
<br/>

The recorded dashboard above captures the entire process, from the initial data inserts to the final merge into a single part:

① Number of active parts.

② Part merges, visually represented with boxes (size reflects part size).

③ [Write amplification](https://en.wikipedia.org/wiki/Write_amplification).
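
Besides the dashboard, currently running merges can also be watched in SQL. A minimal sketch using the [system.merges](/docs/en/operations/system-tables/merges) system table (assuming merges for our example table are in flight):

```sql
-- Inspect part merges that are currently executing.
SELECT
    `table`,
    round(100 * progress, 1) AS progress_pct,
    num_parts,
    result_part_name,
    formatReadableSize(memory_usage) AS memory
FROM system.merges
WHERE database = 'uk';
```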

## Concurrent merges

A single ClickHouse server uses several background [merge threads](/docs/en/operations/server-configuration-parameters/settings#background_pool_size) to execute concurrent part merges:

<img src={require('./images/merges_02.png').default} alt='PART MERGES' class='image' style={{width: '60%'}} />
<br/>

Each merge thread executes a loop:

① Decide which parts to merge next, and load these parts into memory.

② Merge the parts in memory into a larger part.

③ Write the merged part to disk.

Go to ①

Note that increasing the number of CPU cores and the size of RAM increases the background merge throughput.
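
As a sketch, the current load on this thread pool can be checked via the [system.metrics](/docs/en/operations/system-tables/metrics) table (metric names assumed from recent ClickHouse versions):

```sql
-- Currently running background merge/mutation tasks vs. the pool capacity.
SELECT metric, value
FROM system.metrics
WHERE metric IN ('BackgroundMergesAndMutationsPoolTask',
                 'BackgroundMergesAndMutationsPoolSize');
```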

## Memory optimized merges

ClickHouse does not necessarily load all parts to be merged into memory at once, as sketched in the [previous example](/docs/en/merges#concurrent-merges). Based on several [factors](https://github.com/ClickHouse/clickhouse-private/blob/68008d83e6c3e8487bbbb7d672d35082f80f9453/src/Storages/MergeTree/MergeTreeSettings.cpp#L208), and to reduce memory consumption (at the cost of merge speed), so-called [vertical merging](https://github.com/ClickHouse/clickhouse-private/blob/68008d83e6c3e8487bbbb7d672d35082f80f9453/src/Storages/MergeTree/MergeTreeSettings.cpp#L207) loads and merges parts in chunks of blocks instead of in one go.
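
The row and column thresholds that activate vertical merging are exposed as MergeTree-level settings. A sketch for inspecting them (setting names assumed from current ClickHouse versions):

```sql
-- Vertical merging activates when parts exceed these thresholds.
SELECT name, value
FROM system.merge_tree_settings
WHERE name LIKE '%vertical_merge%';
```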

## Merge mechanics

The diagram below illustrates how a single background [merge thread](/docs/en/merges#concurrent-merges) in ClickHouse merges parts (by default, without [vertical merging](/docs/en/merges#memory-optimized-merges)):

<img src={require('./images/merges_03.png').default} alt='PART MERGES' class='image' style={{width: '60%'}} />
<br/>

The part merging is performed in several steps:

**① Decompression & Loading**: The [compressed binary column files](/docs/en/parts#what-are-table-parts-in-clickhouse) from the parts to be merged are decompressed and loaded into memory.

**② Merging**: The data is merged into larger column files.

**③ Indexing**: A new [sparse primary index](/docs/en/optimize/sparse-primary-indexes) is generated for the merged column files.

**④ Compression & Storage**: The new column files and index are [compressed](/docs/en/sql-reference/statements/create/table#column_compression_codec) and saved in a new [directory](/docs/en/parts#what-are-table-parts-in-clickhouse) representing the merged data part.

Additional [metadata in data parts](/docs/en/parts), such as secondary data skipping indexes, column statistics, checksums, and min-max indexes, is also recreated based on the merged column files. We omitted these details for simplicity.

The mechanics of step ② depend on the specific [MergeTree engine](/docs/en/engines/table-engines/mergetree-family) used, as different engines handle merging differently. For example, rows may be aggregated or replaced if outdated. As mentioned earlier, this approach **offloads all data processing to background merges**, enabling **super-fast inserts** by keeping write operations lightweight and efficient.

Next, we will briefly outline the merge mechanics of specific engines in the MergeTree family.


### Standard merges

The diagram below illustrates how parts in a standard [MergeTree](/docs/en/engines/table-engines/mergetree-family/mergetree) table are merged:

<img src={require('./images/merges_04.png').default} alt='PART MERGES' class='image' style={{width: '60%'}} />
<br/>

The DDL statement in the diagram above creates a `MergeTree` table with a sorting key `(town, street)`, [meaning](/docs/en/parts#what-are-table-parts-in-clickhouse) data on disk is sorted by these columns, and a sparse primary index is generated accordingly.
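
A sketch of the DDL assumed in the diagram (matching the `uk_price_paid_simple` example table from the [parts docs](/docs/en/parts)):

```sql
CREATE TABLE uk.uk_price_paid_simple
(
    date Date,
    town LowCardinality(String),
    street LowCardinality(String),
    price UInt32
)
ENGINE = MergeTree
-- Rows in each part are sorted by (town, street); merges preserve
-- this order globally in the resulting bigger part.
ORDER BY (town, street);
```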

The ① decompressed, pre-sorted table columns are ② merged while preserving the global sorting order defined by the table's sorting key, ③ a new sparse primary index is generated, and ④ the merged column files and index are compressed and stored as a new data part on disk.

### Replacing merges

Part merges in a [ReplacingMergeTree](/docs/en/engines/table-engines/mergetree-family/replacingmergetree) table work similarly to [standard merges](/docs/en/merges#standard-merges), but only the most recent version of each row is retained, with older versions being discarded:

<img src={require('./images/merges_05.png').default} alt='PART MERGES' class='image' style={{width: '60%'}} />
<br/>

The DDL statement in the diagram above creates a `ReplacingMergeTree` table with a sorting key `(town, street, id)`, meaning data on disk is sorted by these columns, with a sparse primary index generated accordingly.

The ② merging works similarly to a standard `MergeTree` table, combining decompressed, pre-sorted columns while preserving the global sorting order.

However, the `ReplacingMergeTree` removes duplicate rows with the same sorting key, keeping only the most recent row based on the creation timestamp of its containing part.
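
A minimal, self-contained sketch of this behavior, with a hypothetical table and values, using `OPTIMIZE TABLE ... FINAL` to force the merge for demonstration:

```sql
CREATE TABLE example_replacing
(
    town String,
    street String,
    id UInt32,
    price UInt32
)
ENGINE = ReplacingMergeTree
ORDER BY (town, street, id);

INSERT INTO example_replacing VALUES ('London', 'High St', 1, 100000); -- creates part 1
INSERT INTO example_replacing VALUES ('London', 'High St', 1, 150000); -- creates part 2 (newer)

OPTIMIZE TABLE example_replacing FINAL; -- force the replacing merge

SELECT * FROM example_replacing; -- only the row from the newer part remains
```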

<br/>

### Summing merges

Numeric data is automatically summarized during merges of parts from a [SummingMergeTree](/docs/en/engines/table-engines/mergetree-family/summingmergetree) table:

<img src={require('./images/merges_06.png').default} alt='PART MERGES' class='image' style={{width: '60%'}} />
<br/>

The DDL statement in the diagram above defines a `SummingMergeTree` table with `town` as the sorting key, meaning that data on disk is sorted by this column and a sparse primary index is created accordingly.

In the ② merging step, ClickHouse replaces all rows with the same sorting key with a single row, summing the values of numeric columns.
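
A minimal sketch with a hypothetical table, again forcing the merge with `OPTIMIZE TABLE ... FINAL`:

```sql
CREATE TABLE example_summing
(
    town String,
    price UInt32
)
ENGINE = SummingMergeTree
ORDER BY town;

INSERT INTO example_summing VALUES ('London', 100), ('London', 200); -- part 1
INSERT INTO example_summing VALUES ('London', 300);                  -- part 2

OPTIMIZE TABLE example_summing FINAL; -- force the summing merge

SELECT * FROM example_summing; -- one row: ('London', 600)
```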

### Aggregating merges

The `SummingMergeTree` table example from above is a specialized variant of the [AggregatingMergeTree](/docs/en/engines/table-engines/mergetree-family/aggregatingmergetree) table, which allows [automatic incremental data transformation](https://www.youtube.com/watch?v=QDAJTKZT8y4) by applying any of [90+](https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference) aggregation functions during part merges:

<img src={require('./images/merges_07.png').default} alt='PART MERGES' class='image' style={{width: '60%'}} />
<br/>

The DDL statement in the diagram above creates an `AggregatingMergeTree` table with `town` as the sorting key, ensuring data is ordered by this column on disk and a corresponding sparse primary index is generated.

During ② merging, ClickHouse replaces all rows with the same sorting key with a single row storing [partial aggregation states](https://clickhouse.com/blog/clickhouse_vs_elasticsearch_mechanics_of_count_aggregations#-multi-core-parallelization) (e.g. a `sum` and a `count` for `avg()`). These states ensure accurate results through incremental background merges.
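
A minimal sketch of the `avg()` example above, with a hypothetical table that stores partial aggregation states and combines them at merge time:

```sql
CREATE TABLE example_agg
(
    town String,
    avg_price AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree
ORDER BY town;

-- Each insert stores a partial state (internally a sum and a count), not raw values.
INSERT INTO example_agg SELECT 'London', avgState(toUInt32(100000));
INSERT INTO example_agg SELECT 'London', avgState(toUInt32(200000));

OPTIMIZE TABLE example_agg FINAL; -- merge combines the partial states

SELECT town, avgMerge(avg_price) AS avg_price FROM example_agg GROUP BY town; -- 150000
```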
5 changes: 4 additions & 1 deletion docs/en/managing-data/core-concepts/partitions.md
@@ -46,13 +46,16 @@ Then, for each identified partition, the rows are processed as [usual](/docs/en/

Note that with partitioning enabled, ClickHouse automatically creates [MinMax indexes](https://github.com/ClickHouse/ClickHouse/blob/dacc8ebb0dac5bbfce5a7541e7fc70f26f7d5065/src/Storages/MergeTree/IMergeTreeDataPart.h#L341) for each data part. These are simply files for each table column used in the partition key expression, containing the minimum and maximum values of that column within the data part.

-Further note that with partitioning enabled, ClickHouse only [merges](/docs/en/parts) data parts within, but not across partitions. We sketch that for our example table from above:
+### Per partition merges
+
+With partitioning enabled, ClickHouse only [merges](/docs/en/merges) data parts within, but not across, partitions. We sketch that for our example table from above:

<img src={require('./images/merges_with_partitions.png').default} alt='PART MERGES' class='image' style={{width: '100%'}} />
<br/>

As sketched in the diagram above, parts belonging to different partitions are never merged. If a partition key with high cardinality is chosen, then parts spread across thousands of partitions will never be merge candidates, eventually exceeding preconfigured limits and causing the dreaded `Too many parts` error. Addressing this problem is simple: choose a sensible partition key with [cardinality under 1000..10000](https://github.com/ClickHouse/ClickHouse/blob/ffc5b2c56160b53cf9e5b16cfb73ba1d956f7ce4/src/Storages/MergeTree/MergeTreeDataWriter.cpp#L121).
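
A sketch of such a low-cardinality key (assuming the partitioned example table used in these docs): partitioning by month keeps the number of partitions small even for decades of data:

```sql
CREATE TABLE uk.uk_price_paid_simple_partitioned
(
    date Date,
    town LowCardinality(String),
    street LowCardinality(String),
    price UInt32
)
ENGINE = MergeTree
ORDER BY (town, street)
-- At most 12 new partitions per year of data.
PARTITION BY toStartOfMonth(date);
```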

+## Monitoring partitions

You can [query](https://sql.clickhouse.com/?query=U0VMRUNUIERJU1RJTkNUIF9wYXJ0aXRpb25fdmFsdWUgQVMgcGFydGl0aW9uCkZST00gdWsudWtfcHJpY2VfcGFpZF9zaW1wbGVfcGFydGl0aW9uZWQKT1JERVIgQlkgcGFydGl0aW9uIEFTQw&run_query=true&tab=results) the list of all existing unique partitions of our example table by using the [virtual column](/docs/en/engines/table-engines#table_engines-virtual_columns) `_partition_value`:
```
SELECT DISTINCT _partition_value AS partition
FROM uk.uk_price_paid_simple_partitioned
ORDER BY partition ASC;
```