
Column-level compression block sizes #55201

Merged
merged 15 commits into ClickHouse:master on Jan 23, 2024

Conversation

canhld94
Contributor

@canhld94 canhld94 commented Oct 2, 2023

Closes #54821

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Certain settings (currently min_compress_block_size and max_compress_block_size) can now be specified at column-level where they take precedence over the corresponding table-level setting. Example: CREATE TABLE tab (col String SETTINGS (min_compress_block_size = 81920, max_compress_block_size = 163840)) ENGINE = MergeTree ORDER BY tuple();

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@canhld94
Contributor Author

canhld94 commented Oct 6, 2023

@alexey-milovidov @nikitamikhaylov can I get some reviews for this feature :D
I think the syntax may not be the best one yet, but the feature itself is definitely useful.

We have tested it on our production table.

  • The table xxxx_html_local has one big column, xxxx_html, so we previously set the compress block size to 64MB to achieve a high compression ratio. But this results in high memory consumption during SELECT queries, even when we don't select the big column.
  • The new table xxxx_html_local2 has the same schema but no table-level compression block size; only the column xxxx_html has a column-level compress block size (see the sketch after this list).
  • Result: the compression ratio is almost the same, but memory usage is about 100x lower when querying xxxx_html_local2.
  • If we don't tune the table-level compress block size (i.e. use the default values), the compression ratio is only ~5.x.
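For illustration, a minimal hypothetical sketch of the column-level variant, following the per-column SETTINGS syntax from the changelog entry above (the url column is a placeholder and the real production schema is not shown here; 67108864 bytes = 64 MiB):

-- Hypothetical sketch: only the big column gets 64 MiB compress blocks
CREATE TABLE xxxx_html_local2
(
    url String,
    xxxx_html String SETTINGS (min_compress_block_size = 67108864, max_compress_block_size = 67108864)
)
ENGINE = MergeTree
ORDER BY tuple();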
-- Compression ratio
┌─table────────────┬─count()─┬─compressed_sz─┬─uncompressed_sz─┬──────────────ratio─┐
│ xxxx_html_local  │      14 │ 228.20 GiB    │ 3.43 TiB        │ 15.385512604598656 │
│ xxxx_html_local2 │      12 │ 226.07 GiB    │ 3.42 TiB        │ 15.504667251480628 │
└──────────────────┴─────────┴───────────────┴─────────────────┴────────────────────┘
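The query behind this table is not shown in the thread; a sketch of how such per-table numbers can be computed from system.parts (assuming both tables live in the current database) could look like:

-- Sketch: compressed/uncompressed sizes and compression ratio per table, active parts only
SELECT
    table,
    count(),
    formatReadableSize(sum(data_compressed_bytes)) AS compressed_sz,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_sz,
    sum(data_uncompressed_bytes) / sum(data_compressed_bytes) AS ratio
FROM system.parts
WHERE active AND table LIKE 'xxxx_html_local%'
GROUP BY table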

-- SELECT * on the original table with min_compress_block_size = 64MB and max_compress_block_size = 64MB
-- set on the table level
SELECT * EXCEPT xxxx_html
FROM xxxx_html_local
WHERE _partition_id = '9-4-0'
SETTINGS max_threads = 16
FORMAT `Null`

Query id: 4304bfcd-a3e4-4d95-b5fa-96becee33ad0

Ok.

0 rows in set. Elapsed: 1.105 sec. Processed 5.53 million rows, 725.27 MB (5.00 million rows/s., 656.11 MB/s.)
Peak memory usage: 7.68 GiB.

-- SELECT * on the new table with min_compress_block_size = 64MB and max_compress_block_size = 64MB
-- set on the column `xxxx_html` level
SELECT * EXCEPT xxxx_html
FROM xxxx_html_local2
WHERE _partition_id = '9-4-0'
SETTINGS max_threads = 16
FORMAT `Null`

Query id: 55e7290d-a6ef-4a96-badd-7569f30fb409

Ok.

0 rows in set. Elapsed: 0.172 sec. Processed 5.53 million rows, 719.40 MB (32.19 million rows/s., 4.19 GB/s.)
Peak memory usage: 33.01 MiB.


@alexey-milovidov
Member

@canhld94 The default compress block size is from 64 KB to 1 MB, and it is strange to see that it could lead to a difference of 7 GB in memory usage. Is it possible that you have also changed the defaults? I don't think it's ever needed to increase the compress block size. It is one of the "factory" settings that are not expected to be changed.

@canhld94
Contributor Author

canhld94 commented Oct 8, 2023

@canhld94 The default compress block size is from 64 KB to 1 MB, and it is strange to see that it could lead to a difference of 7 GB in memory usage. Is it possible that you have also changed the defaults?

@alexey-milovidov Maybe the example is not clear. With the default compress block size, memory consumption is normal, but the compression ratio is not good. Previously, we needed to increase the table-level min_compress_block_size and max_compress_block_size to 64MB, but that results in high memory consumption during SELECT queries. We tried to tune min_compress_block_size, max_compress_block_size and index_granularity_bytes accordingly, but none of those combinations achieved as good a compression ratio as a fixed block size of 64MB.

I don't think it's ever needed to increase the compress block size. It is one of the "factory" settings that are not expected to be changed.

In our use case, the table has a big string column (e.g. the whole HTML source of a website). If we use the default compression block size, the compression ratio is 5-6, which is too low for our needs.
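For reference, a sketch of that previous table-level approach (placeholder schema; 67108864 bytes = 64 MiB):

-- Sketch: raising the compress block size for the whole table (the old approach)
CREATE TABLE xxxx_html_local
(
    url String,
    xxxx_html String
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS min_compress_block_size = 67108864, max_compress_block_size = 67108864;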

@canhld94
Contributor Author

@alexey-milovidov I've revised the example in my previous comment as well. Hopefully it is clearer now.

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Oct 17, 2023
@robot-ch-test-poll robot-ch-test-poll added the pr-feature Pull request with new product feature label Oct 17, 2023
@robot-ch-test-poll
Contributor

robot-ch-test-poll commented Oct 17, 2023

This is an automated comment for commit 7246655 with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Successful checks
Check name | Description | Status
AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help | ✅ success
ClickBench | Runs ClickBench (https://github.com/ClickHouse/ClickBench/) with an instant-attach table | ✅ success
ClickHouse build check | Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often have enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process | ✅ success
Compatibility check | Checks that the clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success
Docker server and keeper images | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
Docs check | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
Fast tests | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
Flaky tests | Checks if newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc | ✅ success
Install packages | Checks that the built packages are installable in a clear environment | ✅ success
Mergeable Check | Checks if all other necessary checks are successful | ✅ success
Performance Comparison | Measures changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ✅ success
SQLancer | Fuzzing tests that detect logical bugs with the SQLancer tool | ✅ success
Sqllogic | Runs clickhouse on the sqllogic test set against sqlite and checks that all statements pass | ✅ success
Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc. | ✅ success
Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success
Style check | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
Unit tests | Runs the unit tests for different release types | ✅ success

Check name | Description | Status
CI running | A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR | ⏳ pending
Integration tests | The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests | ❌ failure
Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc. | ❌ failure
Upgrade check | Runs stress tests on the server version from the last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully start up without any errors, crashes or sanitizer asserts | ❌ failure

@botsbreeder

@alexey-milovidov we have tables where one column is a big string (100KB on average) and the other columns don't have that much data. Setting min_compress_block_size to 64MB for the string column increases the compression ratio almost twofold. But if min_compress_block_size is applied to the whole table, all other columns also use a 64MB compress block, which slows down SELECT queries and makes them require more memory. The solution is to apply min_compress_block_size to one column only, and it works well in our fork (high compression ratio AND fast queries AND lower memory usage).

@canhld94
Contributor Author

The Upgrade check failure is #57893.
The Stateless failure in 00002_log_and_exception_messages_formatting looks unrelated.

@canhld94
Contributor Author

@alexey-milovidov we changed the syntax to declare the compress block size as parameters of CODEC to be more ClickHouse-friendly. The documentation is also updated.

CREATE TABLE t
(
        ...
        big_column String CODEC(16384, 16384)(ZSTD(9, 24)),
)
ENGINE = MergeTree ORDER BY tuple();

This change is backward incompatible, so I also added a query setting allow_declare_codec_with_parameters that defaults to false for now; we can enable it (or remove it) after a few stable releases.
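A minimal usage sketch of this proposal, assuming the gating setting is enabled per session as described (this CODEC-with-parameters syntax was later replaced by the per-column SETTINGS syntax, see below):

-- Sketch: enable the compatibility setting, then use the CODEC-with-parameters syntax
SET allow_declare_codec_with_parameters = 1;

CREATE TABLE t
(
    id UInt64,
    big_column String CODEC(16384, 16384)(ZSTD(9, 24))
)
ENGINE = MergeTree
ORDER BY tuple();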

@UnamedRus
Contributor

UnamedRus commented Dec 20, 2023

Honestly, I preferred the old syntax: it's more self-describing and also allows supporting more per-column settings (low_cardinality or potential dictionary support for ZSTD, I'm looking at you).

        big_column String CODEC(ZSTD(9, 24)) SETTINGS (min_compress_block_size = 16384, max_compress_block_size = 16384),

BTW, there is another potential syntax option (inspired by YDB):

CREATE TABLE series_with_families (
    series_id Uint64,
    title Utf8,
    series_info Utf8 FAMILY family_large,
    release_date Uint64,
    PRIMARY KEY (series_id),
    FAMILY default (
        DATA = "ssd",
        COMPRESSION = "off"
    ),
    FAMILY family_large (
        DATA = "rot",
        COMPRESSION = "lz4"
    )
);

@canhld94
Contributor Author

canhld94 commented Dec 21, 2023

@UnamedRus yes, the old syntax is more declarative and more generic, but its scope goes beyond the main purpose of this PR (to have an explicit compress block size for each column). For now we want to push this PR upstream first.

Re. column-level settings: it's definitely a needed feature, but different people will prefer different syntaxes and we may need a lot of discussion. I still advocate my previously proposed syntax and will try to push it upstream.

COLUMN TYPE ATTRIBUTES SETTINGS (<list of settings>),

But it'll be in another issue and PR.

@rschu1ze rschu1ze self-assigned this Jan 11, 2024
@rschu1ze
Member

rschu1ze commented Jan 11, 2024

@canhld94 @UnamedRus After reading this PR, #54821 and #36428, I think there is some value in per-column min/max block sizes when the columns have very different average byte sizes per value ("big string column" use case), and I'd like to help get this merged.

Settings in ClickHouse come in global form (configured via config file), session/query form (SET ... = ... or SELECT ... SETTINGS) or as MergeTree settings (CREATE TABLE ... SETTINGS ...). Some specific settings, e.g. min/max_compress_block_size, exist at both the session and the MergeTree level. This PR would make it possible for specific (not all) MergeTree settings to be overridden at the column level. But we should be careful which settings we really expose at column level; it is a balance between code complexity and benefit. E.g. low_cardinality_max_dictionary_size (#36428) sounds too obscure to me.
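To make those levels concrete, a small sketch (values are illustrative; the global level is configured in the server config/profiles rather than in SQL):

-- Session level
SET min_compress_block_size = 65536;

-- Query level
SELECT 1 SETTINGS min_compress_block_size = 65536;

-- MergeTree (table) level
CREATE TABLE t_levels
(
    x String
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS min_compress_block_size = 65536;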

Re syntax: we should strive for maximum consistency. CODEC(min_compress_block_size, max_compress_block_size) isn't consistent with the SETTINGS clause as used elsewhere. I also agree with @UnamedRus that such syntax isn't easily extensible; it is also hard to decipher for casual users.

COLUMN TYPE ATTRIBUTES SETTINGS (<list of settings>)
But it'll be in another issue and PR.

I like that. Is there perhaps a PR already? I am afraid that if we implement CODEC(min_compress_block_size, max_compress_block_size), it will be orthogonal to the better SETTINGS syntax, so we had better start with the SETTINGS syntax right away.

@alexey-milovidov
Member

I like the syntax:

big_column String CODEC(ZSTD(9, 24)) SETTINGS (min_compress_block_size = 16384, max_compress_block_size = 16384),

Let's finish this PR and merge...
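Combining that line with the changelog example, a complete illustrative statement in the agreed syntax would look roughly like:

-- Sketch: per-column codec together with per-column compress block sizes
CREATE TABLE t_final
(
    id UInt64,
    big_column String CODEC(ZSTD(9, 24)) SETTINGS (min_compress_block_size = 16384, max_compress_block_size = 16384)
)
ENGINE = MergeTree
ORDER BY tuple();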

@rschu1ze rschu1ze changed the title [RFC] Column level compress block size Column-level compression block sizes Jan 18, 2024
@rschu1ze rschu1ze merged commit e67076e into ClickHouse:master Jan 23, 2024
240 of 250 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jan 24, 2024