From 80a855cf918340acb0d9816b61113450ef1f59cc Mon Sep 17 00:00:00 2001
From: 0xgouda
Date: Mon, 27 Oct 2025 03:14:06 +0300
Subject: [PATCH] Fix typo

---
 docs/managing-data/core-concepts/academic_overview.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/managing-data/core-concepts/academic_overview.mdx b/docs/managing-data/core-concepts/academic_overview.mdx
index c075303cb13..07e2b1711b8 100644
--- a/docs/managing-data/core-concepts/academic_overview.mdx
+++ b/docs/managing-data/core-concepts/academic_overview.mdx
@@ -97,7 +97,7 @@ Figure 3: Inserts and merges for ^^MergeTree^^*-engine tables.
 Compared to LSM trees [\[58\]](#page-13-7) and their implementation in various databases [\[13,](#page-12-6) [26,](#page-12-7) [56\]](#page-13-8), ClickHouse treats all ^^parts^^ as equal instead of arranging them in a hierarchy. As a result, merges are no longer limited to ^^parts^^ in the same level. Since this also forgoes the implicit chronological ordering of ^^parts^^, alternative mechanisms for updates and deletes not based on tombstones are required (see Section [3.4)](#page-4-0). ClickHouse writes inserts directly to disk while other LSM-treebased stores typically use write-ahead logging (see Section [3.7)](#page-5-1).
-A part corresponds to a directory on disk, containing one file for each column. As an optimization, the columns of a small part (smaller than 10 MB by default) are stored consecutively in a single file to increase the spatial locality for reads and writes. The rows of a part are further logically divided into groups of 8192 records, called granules. A ^^granule^^ represents the smallest indivisible data unit processed by the scan and index lookup operators in ClickHouse. Reads and writes of on-disk data are, however, not performed at the ^^granule^^ level but at the granularity of blocks, which combine multiple neighboring granules within a column. New blocks are formed based on a configurable byte size per ^^block^^ (by default 1 MB), i.e., the number of granules in a ^^block^^ is variable and depends on the column's data type and distribution. Blocks are furthermore compressed to reduce their size and I/O costs. By default, ClickHouse employs LZ4 [\[75\]](#page-13-9) as a general-purpose compression algorithm, but users can also specify specialized codecs like Gorilla [\[63\]](#page-13-10) or FPC [\[12\]](#page-12-8) for floating-point data. Compression algorithms can also be chained. For example, it is possible to first reduce logical redundancy in numeric values using delta coding [\[23\]](#page-12-9), then perform heavy-weight compression, and finally encrypt the data using an AES codec. Blocks are decompressed on-the-fy when they are loaded from disk into memory. To enable fast random access to individual granules despite compression, ClickHouse additionally stores for each column a mapping that associates every ^^granule^^ id with the offset of its containing compressed ^^block^^ in the column file and the offset of the ^^granule^^ in the uncompressed ^^block^^.
+A part corresponds to a directory on disk, containing one file for each column. As an optimization, the columns of a small part (smaller than 10 MB by default) are stored consecutively in a single file to increase the spatial locality for reads and writes. The rows of a part are further logically divided into groups of 8192 records, called granules. A ^^granule^^ represents the smallest indivisible data unit processed by the scan and index lookup operators in ClickHouse. Reads and writes of on-disk data are, however, not performed at the ^^granule^^ level but at the granularity of blocks, which combine multiple neighboring granules within a column. New blocks are formed based on a configurable byte size per ^^block^^ (by default 1 MB), i.e., the number of granules in a ^^block^^ is variable and depends on the column's data type and distribution. Blocks are furthermore compressed to reduce their size and I/O costs. By default, ClickHouse employs LZ4 [\[75\]](#page-13-9) as a general-purpose compression algorithm, but users can also specify specialized codecs like Gorilla [\[63\]](#page-13-10) or FPC [\[12\]](#page-12-8) for floating-point data. Compression algorithms can also be chained. For example, it is possible to first reduce logical redundancy in numeric values using delta coding [\[23\]](#page-12-9), then perform heavy-weight compression, and finally encrypt the data using an AES codec. Blocks are decompressed on-the-fly when they are loaded from disk into memory. To enable fast random access to individual granules despite compression, ClickHouse additionally stores for each column a mapping that associates every ^^granule^^ id with the offset of its containing compressed ^^block^^ in the column file and the offset of the ^^granule^^ in the uncompressed ^^block^^.
 Columns can further be ^^dictionary^^-encoded [\[2,](#page-12-10) [77,](#page-13-11) [81\]](#page-13-12) or made nullable using two special wrapper data types: LowCardinality(T) replaces the original column values by integer ids and thus significantly reduces the storage overhead for data with few unique values. Nullable(T) adds an internal bitmap to column T, representing whether column values are NULL or not.
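The paragraph touched by this patch describes granules, block compression, codec chaining, and the wrapper types. A minimal ClickHouse DDL sketch of those features follows; the table and column names are hypothetical, and the AES codec assumes encryption keys are configured on the server.

```sql
-- Hypothetical table illustrating chained codecs (delta coding, then
-- heavy-weight compression, then AES encryption), a floating-point codec,
-- and the LowCardinality / Nullable wrapper types.
CREATE TABLE sensor_readings
(
    ts     DateTime CODEC(Delta, ZSTD, AES_128_GCM_SIV), -- chained; the AES codec requires encryption keys in the server config
    value  Float64  CODEC(Gorilla),                      -- specialized floating-point codec
    sensor LowCardinality(String),                       -- dictionary-encodes columns with few unique values
    note   Nullable(String)                              -- keeps an internal NULL bitmap alongside the column
)
ENGINE = MergeTree
ORDER BY ts
SETTINGS index_granularity = 8192,           -- rows per granule (the default)
         min_bytes_for_wide_part = 10485760; -- parts below ~10 MB store all columns in a single file
```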