DOCS-439: RU review. EN translation. Data storage policies. #7597

Merged
36 changes: 31 additions & 5 deletions docs/en/operations/system_tables.md
@@ -338,6 +338,7 @@ Columns:
- `table` (`String`) – Name of the table.
- `engine` (`String`) – Name of the table engine without parameters.
- `path` (`String`) – Absolute path to the folder with data part files.
- `disk` (`String`) – Name of a disk that stores the data part.
- `hash_of_all_files` (`String`) – [sipHash128](../query_language/functions/hash_functions.md#hash_functions-siphash128) of compressed files.
- `hash_of_uncompressed_files` (`String`) – [sipHash128](../query_language/functions/hash_functions.md#hash_functions-siphash128) of uncompressed files (files with marks, index file etc.).
- `uncompressed_hash_of_compressed_files` (`String`) – [sipHash128](../query_language/functions/hash_functions.md#hash_functions-siphash128) of data in the compressed files as if they were uncompressed.
@@ -354,11 +355,12 @@ This table contains information about events that occurred with [data parts](tab
The `system.part_log` table contains the following columns:

- `event_type` (Enum) — Type of the event that occurred with the data part. Can have one of the following values:
    - `NEW_PART` — Inserting a new data part.
    - `MERGE_PARTS` — Merging data parts.
    - `DOWNLOAD_PART` — Downloading a data part.
    - `REMOVE_PART` — Removing or detaching a data part using [DETACH PARTITION](../query_language/alter.md#alter_detach-partition).
    - `MUTATE_PART` — Mutating a data part.
    - `MOVE_PART` — Moving a data part from one disk to another.
- `event_date` (Date) — Event date.
- `event_time` (DateTime) — Event time.
- `duration_ms` (UInt64) — Duration.
@@ -761,6 +763,30 @@ If there were problems with mutating some parts, the following columns contain a

## system.disks {#system_tables-disks}

Contains information about disks defined in the [server configuration](table_engines/mergetree.md#table_engine-mergetree-multiple-volumes_configure).

Columns:

- `name` ([String](../data_types/string.md)) — Name of a disk in the server configuration.
- `path` ([String](../data_types/string.md)) — Path to the mount point in the file system.
- `free_space` ([UInt64](../data_types/int_uint.md)) — Free space on disk in bytes.
- `total_space` ([UInt64](../data_types/int_uint.md)) — Total disk space in bytes.
- `keep_free_space` ([UInt64](../data_types/int_uint.md)) — Amount of disk space, in bytes, that should stay free on the disk. Defined in the `keep_free_space_bytes` parameter of the disk configuration.
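As a rough illustration of how these columns relate, the Python sketch below models one `system.disks` row and computes the space that stays usable once `keep_free_space` is set aside. `DiskRow` and its methods are hypothetical helpers, not part of any ClickHouse client. The calculation also ignores space already reserved by in-progress inserts or merges, so treat it as an approximation.

```python
from dataclasses import dataclass


@dataclass
class DiskRow:
    """Hypothetical client-side model of one row of system.disks."""
    name: str
    path: str
    free_space: int       # bytes currently free on the disk
    total_space: int      # total disk size in bytes
    keep_free_space: int  # bytes reserved via keep_free_space_bytes

    def usable_space(self) -> int:
        """Free bytes left after honouring keep_free_space (never negative)."""
        return max(0, self.free_space - self.keep_free_space)

    def free_ratio(self) -> float:
        """Fraction of the disk that is currently free (0.0 if the size is 0)."""
        return self.free_space / self.total_space if self.total_space else 0.0


disk = DiskRow("disk_name_2", "/mnt/hdd1/clickhouse/",
               free_space=50_000_000, total_space=100_000_000,
               keep_free_space=10_485_760)
print(disk.usable_space())  # 39514240
print(disk.free_ratio())    # 0.5
```

The byte figures are made-up sample values, not output from a real server.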


## system.storage_policies {#system_tables-storage_policies}

Contains information about storage policies and volumes defined in the [server configuration](table_engines/mergetree.md#table_engine-mergetree-multiple-volumes_configure).

Columns:

- `policy_name` ([String](../data_types/string.md)) — Name of the storage policy.
- `volume_name` ([String](../data_types/string.md)) — Volume name defined in the storage policy.
- `volume_priority` ([UInt64](../data_types/int_uint.md)) — Volume order number in the configuration.
- `disks` ([Array(String)](../data_types/array.md)) — Disk names defined in the storage policy.
- `max_data_part_size` ([UInt64](../data_types/int_uint.md)) — Maximum size of a data part that can be stored on volume disks (0 — no limit).
- `move_factor` ([Float64](../data_types/float.md)) — Ratio of free disk space. When the free-space ratio falls below this value, ClickHouse starts moving data to the next volume in order.

If the storage policy contains more than one volume, the information for each volume is stored in a separate row of the table.
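Each row describes one volume, so a client that needs a whole policy has to regroup the rows. The Python sketch below does this for rows already fetched from `system.storage_policies`. The `VolumeRow` and `volumes_by_policy` names are hypothetical, and the sample rows are invented values, not output from a real server.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class VolumeRow(NamedTuple):
    """Hypothetical model of one row of system.storage_policies."""
    policy_name: str
    volume_name: str
    volume_priority: int
    disks: List[str]
    max_data_part_size: int  # 0 means "no limit"
    move_factor: float


def volumes_by_policy(rows) -> Dict[str, List[VolumeRow]]:
    """Group rows by policy and order each policy's volumes by volume_priority."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.policy_name].append(row)
    return {name: sorted(vols, key=lambda v: v.volume_priority)
            for name, vols in grouped.items()}


rows = [
    VolumeRow("moving_from_ssd_to_hdd", "cold", 2, ["disk1"], 0, 0.2),
    VolumeRow("moving_from_ssd_to_hdd", "hot", 1, ["fast_ssd"], 1073741824, 0.2),
    VolumeRow("hdd_in_order", "single", 1, ["disk1", "disk2"], 0, 0.1),
]
policies = volumes_by_policy(rows)
print([v.volume_name for v in policies["moving_from_ssd_to_hdd"]])  # ['hot', 'cold']
```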

[Original article](https://clickhouse.yandex/docs/en/operations/system_tables/) <!--hide-->
104 changes: 68 additions & 36 deletions docs/en/operations/table_engines/mergetree.md
@@ -85,7 +85,9 @@ For a description of parameters, see the [CREATE query description](../../query_
- `min_merge_bytes_to_use_direct_io` — The minimum data volume for merge operation that is required for using direct I/O access to the storage disk. When merging data parts, ClickHouse calculates the total storage volume of all the data to be merged. If the volume exceeds `min_merge_bytes_to_use_direct_io` bytes, ClickHouse reads and writes the data to the storage disk using the direct I/O interface (`O_DIRECT` option). If `min_merge_bytes_to_use_direct_io = 0`, then direct I/O is disabled. Default value: `10 * 1024 * 1024 * 1024` bytes.
<a name="mergetree_setting-merge_with_ttl_timeout"></a>
- `merge_with_ttl_timeout` — Minimum delay in seconds before repeating a merge with TTL. Default value: 86400 (1 day).
- `write_final_mark` — Enables or disables writing the final index mark at the end of the data part. Default value: 1. Don't turn it off.
- `storage_policy` — Storage policy. See [Using Multiple Block Devices for Data Storage](#table_engine-mergetree-multiple-volumes).


**Example of Sections Setting**

@@ -462,53 +464,89 @@ If you perform the `SELECT` query between merges, you may get expired data. To a
[Original article](https://clickhouse.yandex/docs/en/operations/table_engines/mergetree/) <!--hide-->


## Using Multiple Block Devices for Data Storage {#table_engine-mergetree-multiple-volumes}

### Introduction

`MergeTree` family table engines can store data on multiple block devices. For example, this can be useful when the data of a certain table is implicitly split into "hot" and "cold". The most recent data is regularly requested but requires only a small amount of space. Conversely, the fat-tailed historical data is requested rarely. If several disks are available, the "hot" data may be located on fast disks (for example, NVMe SSDs or in memory), while the "cold" data is located on relatively slow ones (for example, HDD).

A data part is the minimum movable unit for `MergeTree`-engine tables. All the data belonging to one part is stored on one disk. Data parts can be moved between disks in the background (according to user settings) as well as by means of [ALTER](../../query_language/alter.md#alter_move-partition) queries.

### Terms
- Disk — Block device mounted to the filesystem.
- Default disk — Disk that stores the path specified in the [path](../server_settings/settings.md#server_settings-path) server setting.
- Volume — Ordered set of equal disks (similar to [JBOD](https://en.wikipedia.org/wiki/Non-RAID_drive_architectures)).
- Storage policy — Set of volumes and the rules for moving data between them.

The names given to the described entities can be found in the system tables, [system.storage_policies](../system_tables.md#system_tables-storage_policies) and [system.disks](../system_tables.md#system_tables-disks). To apply one of the configured storage policies for a table, use the `storage_policy` setting of `MergeTree`-engine family tables.

### Configuration {#table_engine-mergetree-multiple-volumes_configure}

Disks, volumes and storage policies should be declared inside the `<storage_configuration>` tag either in the main file `config.xml` or in a distinct file in the `config.d` directory.

Configuration structure:

```xml
<disks>
    <disk_name_1> <!-- disk name -->
        <path>/mnt/fast_ssd/clickhouse/</path>
    </disk_name_1>
    <disk_name_2>
        <path>/mnt/hdd1/clickhouse/</path>
        <keep_free_space_bytes>10485760</keep_free_space_bytes>
    </disk_name_2>
    <disk_name_3>
        <path>/mnt/hdd2/clickhouse/</path>
        <keep_free_space_bytes>10485760</keep_free_space_bytes>
    </disk_name_3>

    ...
</disks>
```

Tags:

- `<disk_name_N>` — Disk name. Names must be different for all disks.
- `path` — Path under which the server stores data (`data` and `shadow` folders). Must end with '/'.
- `keep_free_space_bytes` — Amount of free disk space to be reserved.

The order of the disk definition is not important.
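To check a `<disks>` section against these rules before deploying it, you could parse it with Python's standard `xml.etree.ElementTree` module. The sketch below uses a hypothetical helper, not a ClickHouse tool. It enforces only the two rules stated here: unique disk names and a `path` that ends in '/'. Treating a missing `keep_free_space_bytes` as 0 is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_disks(xml_text: str) -> Dict[str, dict]:
    """Parse a <disks> section into {disk_name: settings}, validating it."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "disks":
        raise ValueError("expected a <disks> root element")
    disks: Dict[str, dict] = {}
    for disk in root:  # XML comments are dropped by the default parser
        name = disk.tag  # the tag name is the disk name
        if name in disks:
            raise ValueError(f"duplicate disk name: {name}")
        path = disk.findtext("path")
        if path is not None and not path.endswith("/"):
            raise ValueError(f"path of disk {name!r} must end with '/'")
        keep = int(disk.findtext("keep_free_space_bytes", default="0"))
        disks[name] = {"path": path, "keep_free_space_bytes": keep}
    return disks


config = """
<disks>
    <disk_name_1> <!-- disk name -->
        <path>/mnt/fast_ssd/clickhouse/</path>
    </disk_name_1>
    <disk_name_2>
        <path>/mnt/hdd1/clickhouse/</path>
        <keep_free_space_bytes>10485760</keep_free_space_bytes>
    </disk_name_2>
</disks>
"""
print(parse_disks(config)["disk_name_2"]["keep_free_space_bytes"])  # 10485760
```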

Storage policies configuration markup:

```xml
<policies>
    <policy_name_1>
        <volumes>
            <volume_name_1>
                <disk>disk_name_from_disks_configuration</disk>
                <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
            </volume_name_1>
            <volume_name_2>
                <!-- configuration -->
            </volume_name_2>
            <!-- more volumes -->
        </volumes>
        <move_factor>0.2</move_factor>
    </policy_name_1>
    <policy_name_2>
        <!-- configuration -->
    </policy_name_2>

    <!-- more policies -->
</policies>
```

Tags:

- `policy_name_N` — Policy name. Policy names must be unique.
- `volume_name_N` — Volume name. Volume names must be unique.
- `disk` — A disk within a volume.
- `max_data_part_size_bytes` — The maximum size of a part that can be stored on any of the volume's disks.
- `move_factor` — When the share of available space gets lower than this factor, data automatically starts moving to the next volume, if any. Default value: 0.1.
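The `move_factor` rule can be read as a threshold on a disk's free-space ratio. The Python sketch below is a simplified model built on several assumptions. It checks one disk at a time, ignores `keep_free_space_bytes` and in-flight reservations, and its function names are hypothetical. With `move_factor = 0.2`, moving starts once a disk is more than 80% full.

```python
import math


def needs_move(free_space: int, total_space: int, move_factor: float = 0.1) -> bool:
    """True when the free-space ratio of a disk has dropped below move_factor."""
    if total_space <= 0:
        raise ValueError("total_space must be positive")
    return free_space < total_space * move_factor


def bytes_to_move(free_space: int, total_space: int, move_factor: float = 0.1) -> int:
    """Bytes that would have to leave the disk to restore the free-space threshold."""
    if not needs_move(free_space, total_space, move_factor):
        return 0
    return math.ceil(total_space * move_factor) - free_space


# A 1000-byte disk under a policy with move_factor = 0.2:
print(needs_move(150, 1000, 0.2), bytes_to_move(150, 1000, 0.2))  # True 50
print(needs_move(300, 1000, 0.2), bytes_to_move(300, 1000, 0.2))  # False 0
```

Because data moves to the next volume, if any, a check like this only makes sense for disks of volumes that have a successor in the policy.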

Configuration examples:

```xml
<policies>
@@ -536,16 +574,9 @@ Storage policies configuration:
</policies>
```

In the given example, the `hdd_in_order` policy implements the [round-robin](https://en.wikipedia.org/wiki/Round-robin_scheduling) approach. Since this policy defines only one volume (`single`), the data parts are stored on all its disks in circular order. Such a policy can be quite useful if several similar disks are mounted to the system but RAID is not configured. Keep in mind that each individual disk drive is not reliable, and you might want to compensate for this with a replication factor of 3 or more.

If there are different kinds of disks available in the system, the `moving_from_ssd_to_hdd` policy can be used instead. The volume `hot` consists of an SSD disk (`fast_ssd`), and the maximum size of a part that can be stored on this volume is 1GB. All parts larger than 1GB are stored directly on the `cold` volume, which contains the HDD disk `disk1`.
Also, once the disk `fast_ssd` is more than 80% full, data is transferred to `disk1` by a background process.

The order of volume enumeration within a storage policy is important. Once a volume is overfilled, data are moved to the next one. The order of disk enumeration is important as well because data are stored on them in turns.
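The effect of volume order and disk order can be illustrated with a small Python model. It is a simplified sketch, not ClickHouse's actual placement code, and it makes three assumptions. Volumes are tried in the order they are listed. A volume is skipped when the part exceeds its `max_data_part_size_bytes`, with 0 meaning no limit. Disks inside a volume are used in turns, and any disk without enough usable space is skipped. The `Volume` class, the `place_part` function, and the byte figures are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Volume:
    name: str
    disks: List[str]
    max_data_part_size: int = 0                  # 0 means "no limit"
    _cursor: int = field(default=0, repr=False)  # round-robin position

    def accepts(self, part_size: int) -> bool:
        return self.max_data_part_size == 0 or part_size <= self.max_data_part_size

    def pick_disk(self, part_size: int, usable: Dict[str, int]) -> Optional[str]:
        """Take disks in turns, skipping disks without enough usable space."""
        for step in range(len(self.disks)):
            i = (self._cursor + step) % len(self.disks)
            if usable.get(self.disks[i], 0) >= part_size:
                self._cursor = (i + 1) % len(self.disks)
                return self.disks[i]
        return None


def place_part(volumes: List[Volume], part_size: int,
               usable: Dict[str, int]) -> Tuple[str, str]:
    """Return (volume, disk) for a new part, trying volumes in order."""
    for volume in volumes:
        if volume.accepts(part_size):
            disk = volume.pick_disk(part_size, usable)
            if disk is not None:
                return volume.name, disk
    raise RuntimeError(f"no volume can store a part of {part_size} bytes")


GiB = 1024 ** 3
usable = {"fast_ssd": 100 * GiB, "disk1": 1000 * GiB, "disk2": 1000 * GiB}

# hdd_in_order-like policy: one volume, disks used in circular order.
single = [Volume("single", ["disk1", "disk2"])]
print([place_part(single, GiB, usable)[1] for _ in range(3)])  # ['disk1', 'disk2', 'disk1']

# moving_from_ssd_to_hdd-like policy: parts above 1 GiB skip the hot volume.
tiered = [Volume("hot", ["fast_ssd"], max_data_part_size=GiB), Volume("cold", ["disk1"])]
print(place_part(tiered, 512 * 1024 ** 2, usable))  # ('hot', 'fast_ssd')
print(place_part(tiered, 2 * GiB, usable))          # ('cold', 'disk1')
```

The `usable` mapping stands in for the free-space checks a real server performs. Those checks also involve `keep_free_space_bytes` and current reservations.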
@@ -568,12 +599,12 @@ The `default` storage policy implies using only one volume, which consists of on

### Details

In the case of `MergeTree` tables, data gets to disk in different ways:

- As a result of an insert (`INSERT` query).
- During background merges and [mutations](../../query_language/alter.md#alter-mutations).
- When downloading from another replica.
- As a result of partition freezing [ALTER TABLE ... FREEZE PARTITION](../../query_language/alter.md#alter_freeze-partition).

In all these cases except for mutations and partition freezing, a part is stored on a volume and a disk according to the given storage policy:

@@ -592,3 +623,4 @@ Moving data does not interfere with data replication. Therefore, different stora
After the completion of background merges and mutations, old parts are removed only after a certain amount of time (`old_parts_lifetime`).
During this time, they are not moved to other volumes or disks. Therefore, until the parts are finally removed, they are still taken into account for evaluation of the occupied disk space.

[Original article](https://clickhouse.yandex/docs/ru/operations/table_engines/mergetree/) <!--hide-->
26 changes: 24 additions & 2 deletions docs/en/query_language/alter.md
@@ -194,7 +194,7 @@ The following operations with [partitions](../operations/table_engines/custom_pa
- [CLEAR INDEX IN PARTITION](#alter_clear-index-partition) - Resets the specified secondary index in a partition.
- [FREEZE PARTITION](#alter_freeze-partition) – Creates a backup of a partition.
- [FETCH PARTITION](#alter_fetch-partition) – Downloads a partition from another server.
- [MOVE PARTITION|PART](#alter_move-partition) – Moves a partition or data part to another disk or volume.

#### DETACH PARTITION {#alter_detach-partition}

```sql
@@ -291,7 +291,7 @@ ALTER TABLE table_name FREEZE [PARTITION partition_expr]

This query creates a local backup of a specified partition. If the `PARTITION` clause is omitted, the query creates the backup of all partitions at once.

!!! note "Note"
    The entire backup process is performed without stopping the server.

Note that for old-style tables, you can specify the prefix of the partition name (for example, '2019'); the query then creates the backup for all the corresponding partitions. Read about setting the partition expression in the section [How to specify the partition expression](#alter-how-to-specify-part-expr).
@@ -301,6 +301,9 @@ At the time of execution, for a data snapshot, the query creates hardlinks to a
- `/var/lib/clickhouse/` is the working ClickHouse directory specified in the config.
- `N` is the incremental number of the backup.

!!! note "Note"
    If you use [a set of disks for data storage in a table](../operations/table_engines/mergetree.md#table_engine-mergetree-multiple-volumes), the `shadow/N` directory appears on every disk and stores the data parts that match the `PARTITION` expression.

The same structure of directories is created inside the backup as inside `/var/lib/clickhouse/`. The query performs 'chmod' for all files, forbidding writing into them.

After creating the backup, you can copy the data from `/var/lib/clickhouse/shadow/` to the remote server and then delete it from the local server. Note that the `ALTER t FREEZE PARTITION` query is not replicated. It creates a local backup only on the local server.
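When a table uses several disks, the copy step has to visit the `shadow/N` directory on each disk, not only `/var/lib/clickhouse/shadow/`. Below is a minimal Python sketch for collecting those locations. It assumes the disk paths were already read from the `path` column of `system.disks`, and `backup_dirs` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def backup_dirs(disk_paths: Iterable[str], backup_number: int) -> List[str]:
    """Return the shadow/N directory expected under each disk path."""
    return [str(PurePosixPath(p) / "shadow" / str(backup_number)) for p in disk_paths]


print(backup_dirs(["/var/lib/clickhouse/", "/mnt/hdd1/clickhouse/"], 1))
# ['/var/lib/clickhouse/shadow/1', '/mnt/hdd1/clickhouse/shadow/1']
```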
@@ -357,6 +360,25 @@ Although the query is called `ALTER TABLE`, it does not change the table structu

#### MOVE PARTITION|PART {#alter_move-partition}

Moves partitions or data parts to another volume or disk for `MergeTree`-engine tables. See [Using Multiple Block Devices for Data Storage](../operations/table_engines/mergetree.md#table_engine-mergetree-multiple-volumes).

```sql
ALTER TABLE table_name MOVE PARTITION|PART partition_expr TO DISK|VOLUME 'disk_name'
```

The `ALTER TABLE t MOVE` query:

- Is not replicated, because different replicas can have different storage policies.
- Returns an error if the specified disk or volume is not configured. It also returns an error if the data-moving conditions specified in the storage policy can't be applied.
- Can return an error if the data to be moved has already been moved by a background process, by a concurrent `ALTER TABLE t MOVE` query, or as a result of background data merging. The user doesn't need to take any additional action in this case.

Example:

```sql
ALTER TABLE hits MOVE PART '20190301_14343_16206_438' TO VOLUME 'slow'
ALTER TABLE hits MOVE PARTITION '2019-09-01' TO DISK 'fast_ssd'
```

#### How To Set Partition Expression {#alter-how-to-specify-part-expr}

You can specify the partition expression in `ALTER ... PARTITION` queries in different ways: