Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Control QPL_DEFLATE by its own cfg flag (default: off) + improve test coverage #50305

Closed

Conversation

jinjunzh
Copy link
Contributor

@jinjunzh jinjunzh commented May 29, 2023

  • Codec DEFLATE_QPL is now controlled via setting "enable_qpl_deflate" (default: off) instead of setting "allow_experimental_codecs". This marks QPL_DEFLATE essentially non-experimental.
  • Improve test coverage of DEFLATE_QPL
  • Show ENABLE_QPL in system.build_options

Just including the first point in the change log as it's the important one for users.

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Codec DEFLATE_QPL is now controlled via server setting "enable_deflate_qpl_codec" (default: false) instead of setting "allow_experimental_codecs". This marks QPL_DEFLATE non-experimental.

@rschu1ze rschu1ze self-assigned this May 29, 2023
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-build Pull request with build/testing/packaging improvement label May 29, 2023
@robot-ch-test-poll1
Copy link
Contributor

robot-ch-test-poll1 commented May 29, 2023

This is an automated comment for commit 037ef19 with description of existing statuses. It's updated for the latest CI running
The full report is available here
The overall status of the commit is 🔴 failure

Check nameDescriptionStatus
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR🔴 failure
Docs CheckBuilds and tests the documentation🟡 pending
Style CheckRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report🔴 failure

tests/performance/codecs_int_insert.xml Outdated Show resolved Hide resolved
tests/integration/test_non_default_compression/test.py Outdated Show resolved Hide resolved
src/Core/Settings.h Outdated Show resolved Hide resolved
@@ -380,7 +380,7 @@ High compression levels are useful for asymmetric scenarios, like compress once,

`DEFLATE_QPL` — [Deflate compression algorithm](https://github.com/intel/qpl) implemented by Intel® Query Processing Library. Some limitations apply:

- DEFLATE_QPL is experimental and can only be used after setting configuration parameter `allow_experimental_codecs=1`.
- DEFLATE_QPL is disabled by default and can only be used after setting configuration parameter `enable_qpl_deflate=1`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two types of settings in ClickHouse: user/session/query-level settings and server-level (aka. global) settings. The former are centrally defined in Core/Settings.h, documented here and exist at runtime in the non-shared part of the Context (Interpreters/Context.h). The latter are not centrally defined, but documented here, and they exist in the shared part of the Context.

The previously used allow_experimental_codecs = 1 and the new parameter enable_qpl_deflate are user settings. The disadvantage with them is that they need to be set anew in every session using SET enable_qpl_deflate = 1; (or alternatively using some unbeautiful syntax in config file users.xml) - this is a bit annoying and unintuitive for something that should generally work. So, a better alternative would be to control QPL on&off at server-level. Perhaps that is something which can be done in a separate PR.

Copy link
Contributor Author

@jinjunzh jinjunzh Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted it. I think I should implement a new setting like below:

image

If "enable_qpl_deflate_codec" to be set 0 or not defined here , the server would throw an exception.
And the code change focus on this file: ./src/Storages/CompressionCodecSelector.h
If my understanding above is correct, let me try to implement it on this PR. @rschu1ze

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yepp, your understanding is correct. It would be great to have this implemented in this PR - in this case, we would avoid an intermediate code state where QPL can be enabled/disabled via user-defined settings.

The rest of the PR looks good (I did some minor adjustments), thanks.

Copy link
Contributor Author

@jinjunzh jinjunzh Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rschu1ze New commits added. But the current situation is that deflate_qpl codec can be controlled from server setting perspective, while user still can use it through customizing codec like below:

create table test_array (`ab_version` Int32 CODEC(DEFLATE_QPL)) Engine = Log();
insert into test_array select * from numbers(1000);

Is that a problem?

Copy link
Member

@rschu1ze rschu1ze Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed still possible to specify the codec manually. The reason is that the new check in Storages/CompressionCodecSelector.h only affects a heuristics which automatically applies compression to data parts with specific properties (i.e. a minimum size, a minimum part-size ratio etc.).

So yes, enablement/disablement would need to be more comprehensive since explicit specification of the codec is common practice in ClickHouse. (sorry if I told you wrong ... this was not clear to me).

What about this:

  • you add a check similar to what we had before (see here) in validateCodecAndGetPreprocessedAST. But instead of passing in a parameter enable_qpl_default as argument (which eventually came from context->getSettingsRef().enable_qpl_deflate_codec), we parse the server configuration file (like done in Storages/CompressionCodecSelector.h). This approach should capture all situations where a codec is created.
  • I think that with this, we no longer need to change CompressionCodecSelector.

I also noticed that <enable_qpl_deflate_codec> is nested in <case>. That makes little sense, it should be defined directly below <compression>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New commits added. Two more questions:

  1. We cannot define enable_qpl_deflate_codec inside the section of, because right now it limit the keyword start with "case" since the logic need loop the conditions start with "case", otherwise exception would happen. So i create a separate server setting named "enable_deflate_qpl_codec" which stand the same level with "compression".
    image

  2. The config context cannot be shared in below function, which actually called from client side, but it validate codec as well. I have to pass the value of enable_deflate_qpl_codec with true, is that a problem?
    Otherwise, we need to make more changes on caller function : Connection::sendQuery
    ./src/Client/Connection.cpp --> Connection::sendQuery
    CompressionCodecFactory::instance().validateCodec(method, level, !settings->allow_suspicious_codecs, settings->allow_experimental_codecs, true);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. isn't a question 😄 but the approach you chose is the right one!
  2. Should be fine to pass true as dummy value (I'll validate locally that everything behaves as expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more question, since we need enable deflate_qpl on server side , how to run function test in CI system?
On local test, i can specify config.xml with enable_deflate_qpl_codec=true, but in CI system, is there any specific server config need to change?

src/Compression/ICompressionCodec.h Show resolved Hide resolved
@rschu1ze rschu1ze changed the title add extensive test for deflate_qpl codec Control QPL_DEFLATE by its own cfg flag (default: off) + improve test coverage May 29, 2023
@rschu1ze
Copy link
Member

rschu1ze commented May 30, 2023

Note to myself: Disable via constraint in CH Cloud.

EDIT: or server setting.

@ClickHouse ClickHouse deleted a comment from clickhouse-ci bot Jun 1, 2023
@ClickHouse ClickHouse deleted a comment from clickhouse-ci bot Jun 1, 2023
@ClickHouse ClickHouse deleted a comment from clickhouse-ci bot Jun 1, 2023
@robot-ch-test-poll robot-ch-test-poll added pr-feature Pull request with new product feature and removed pr-build Pull request with build/testing/packaging improvement labels Jun 1, 2023
docs/en/sql-reference/statements/create/table.md Outdated Show resolved Hide resolved
src/Storages/CompressionCodecSelector.h Outdated Show resolved Hide resolved
src/Storages/CompressionCodecSelector.h Outdated Show resolved Hide resolved
@@ -380,7 +380,7 @@ High compression levels are useful for asymmetric scenarios, like compress once,

`DEFLATE_QPL` — [Deflate compression algorithm](https://github.com/intel/qpl) implemented by Intel® Query Processing Library. Some limitations apply:

- DEFLATE_QPL is experimental and can only be used after setting configuration parameter `allow_experimental_codecs=1`.
- DEFLATE_QPL is disabled by default and can only be used after setting configuration parameter `enable_qpl_deflate=1`.
Copy link
Member

@rschu1ze rschu1ze Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed still possible to specify the codec manually. The reason is that the new check in Storages/CompressionCodecSelector.h only affects a heuristics which automatically applies compression to data parts with specific properties (i.e. a minimum size, a minimum part-size ratio etc.).

So yes, enablement/disablement would need to be more comprehensive since explicit specification of the codec is common practice in ClickHouse. (sorry if I told you wrong ... this was not clear to me).

What about this:

  • you add a check similar to what we had before (see here) in validateCodecAndGetPreprocessedAST. But instead of passing in a parameter enable_qpl_default as argument (which eventually came from context->getSettingsRef().enable_qpl_deflate_codec), we parse the server configuration file (like done in Storages/CompressionCodecSelector.h). This approach should capture all situations where a codec is created.
  • I think that with this, we no longer need to change CompressionCodecSelector.

I also noticed that <enable_qpl_deflate_codec> is nested in <case>. That makes little sense, it should be defined directly below <compression>.

@rschu1ze
Copy link
Member

rschu1ze commented Jun 9, 2023

Thanks - if you don't mind, I'll continue this over here: #50775

@rschu1ze rschu1ze closed this Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature Pull request with new product feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants