New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Control QPL_DEFLATE by its own cfg flag (default: off) + improve test coverage #50305
Control QPL_DEFLATE by its own cfg flag (default: off) + improve test coverage #50305
Conversation
This is an automated comment for commit 037ef19 with description of existing statuses. It's updated for the latest CI running
|
tests/queries/0_stateless/00804_test_alter_compression_codecs.sql
Outdated
Show resolved
Hide resolved
tests/queries/0_stateless/00804_test_alter_compression_codecs.sql
Outdated
Show resolved
Hide resolved
tests/queries/0_stateless/00804_test_custom_compression_codecs.sql
Outdated
Show resolved
Hide resolved
@@ -380,7 +380,7 @@ High compression levels are useful for asymmetric scenarios, like compress once, | |||
|
|||
`DEFLATE_QPL` — [Deflate compression algorithm](https://github.com/intel/qpl) implemented by Intel® Query Processing Library. Some limitations apply: | |||
|
|||
- DEFLATE_QPL is experimental and can only be used after setting configuration parameter `allow_experimental_codecs=1`. | |||
- DEFLATE_QPL is disabled by default and can only be used after setting configuration parameter `enable_qpl_deflate=1`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two types of settings in ClickHouse: user/session/query-level settings and server-level (aka. global) settings. The former are centrally defined in Core/Settings.h, documented here and exist at runtime in the non-shared part of the Context
(Interpreters/Context.h). The latter are not centrally defined, but documented here, and they exist in the shared part of the Context
.
The previously used allow_experimental_codecs = 1
and the new parameter enable_qpl_deflate
are user settings. The disadvantage with them is that they need to be set anew in every session using SET enable_qpl_deflate = 1;
(or alternatively using some unbeautiful syntax in config file users.xml) - this is a bit annoying and unintuitive for something that should generally work. So, a better alternative would be to control QPL on&off at server-level. Perhaps that is something which can be done in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted it. I think I should implement a new setting like below:
If "enable_qpl_deflate_codec" to be set 0 or not defined here , the server would throw an exception.
And the code change focus on this file: ./src/Storages/CompressionCodecSelector.h
If my understanding above is correct, let me try to implement it on this PR. @rschu1ze
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yepp, your understanding is correct. It would be great to have this implemented in this PR - in this case, we would avoid an intermediate code state where QPL can be enabled/disabled via user-defined settings.
The rest of the PR looks good (I did some minor adjustments), thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rschu1ze New commits added. But the current situation is that deflate_qpl codec can be controlled from server setting perspective, while user still can use it through customizing codec like below:
create table test_array (`ab_version` Int32 CODEC(DEFLATE_QPL)) Engine = Log();
insert into test_array select * from numbers(1000);
Is that a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed still possible to specify the codec manually. The reason is that the new check in Storages/CompressionCodecSelector.h only affects a heuristics which automatically applies compression to data parts with specific properties (i.e. a minimum size, a minimum part-size ratio etc.).
So yes, enablement/disablement would need to be more comprehensive since explicit specification of the codec is common practice in ClickHouse. (sorry if I told you wrong ... this was not clear to me).
What about this:
- you add a check similar to what we had before (see here) in
validateCodecAndGetPreprocessedAST
. But instead of passing in a parameterenable_qpl_default
as argument (which eventually came fromcontext->getSettingsRef().enable_qpl_deflate_codec
), we parse the server configuration file (like done in Storages/CompressionCodecSelector.h). This approach should capture all situations where a codec is created. - I think that with this, we no longer need to change
CompressionCodecSelector
.
I also noticed that <enable_qpl_deflate_codec>
is nested in <case>
. That makes little sense, it should be defined directly below <compression>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New commits added. Two more questions:
-
We cannot define enable_qpl_deflate_codec inside the section of, because right now it limit the keyword start with "case" since the logic need loop the conditions start with "case", otherwise exception would happen. So i create a separate server setting named "enable_deflate_qpl_codec" which stand the same level with "compression".
-
The config context cannot be shared in below function, which actually called from client side, but it validate codec as well. I have to pass the value of enable_deflate_qpl_codec with true, is that a problem?
Otherwise, we need to make more changes on caller function : Connection::sendQuery
./src/Client/Connection.cpp --> Connection::sendQuery
CompressionCodecFactory::instance().validateCodec(method, level, !settings->allow_suspicious_codecs, settings->allow_experimental_codecs, true);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- isn't a question 😄 but the approach you chose is the right one!
- Should be fine to pass
true
as dummy value (I'll validate locally that everything behaves as expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more question, since we need enable deflate_qpl on server side , how to run function test in CI system?
On local test, i can specify config.xml with enable_deflate_qpl_codec=true, but in CI system, is there any specific server config need to change?
Note to myself: Disable via constraint in CH Cloud. EDIT: or server setting. |
@@ -380,7 +380,7 @@ High compression levels are useful for asymmetric scenarios, like compress once, | |||
|
|||
`DEFLATE_QPL` — [Deflate compression algorithm](https://github.com/intel/qpl) implemented by Intel® Query Processing Library. Some limitations apply: | |||
|
|||
- DEFLATE_QPL is experimental and can only be used after setting configuration parameter `allow_experimental_codecs=1`. | |||
- DEFLATE_QPL is disabled by default and can only be used after setting configuration parameter `enable_qpl_deflate=1`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed still possible to specify the codec manually. The reason is that the new check in Storages/CompressionCodecSelector.h only affects a heuristics which automatically applies compression to data parts with specific properties (i.e. a minimum size, a minimum part-size ratio etc.).
So yes, enablement/disablement would need to be more comprehensive since explicit specification of the codec is common practice in ClickHouse. (sorry if I told you wrong ... this was not clear to me).
What about this:
- you add a check similar to what we had before (see here) in
validateCodecAndGetPreprocessedAST
. But instead of passing in a parameterenable_qpl_default
as argument (which eventually came fromcontext->getSettingsRef().enable_qpl_deflate_codec
), we parse the server configuration file (like done in Storages/CompressionCodecSelector.h). This approach should capture all situations where a codec is created. - I think that with this, we no longer need to change
CompressionCodecSelector
.
I also noticed that <enable_qpl_deflate_codec>
is nested in <case>
. That makes little sense, it should be defined directly below <compression>
.
Thanks - if you don't mind, I'll continue this over here: #50775 |
system.build_options
Just including the first point in the change log as it's the important one for users.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Codec DEFLATE_QPL is now controlled via server setting "enable_deflate_qpl_codec" (default: false) instead of setting "allow_experimental_codecs". This marks QPL_DEFLATE non-experimental.