New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement support of encrypted elements in configuration file #50986
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This is an automated comment for commit 0af869f with description of existing statuses. It's updated for the latest CI running
|
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
@@ -755,6 +843,11 @@ void ConfigProcessor::savePreprocessedConfig(const LoadedConfig & loaded_config, | |||
{ | |||
LOG_WARNING(log, "Couldn't save preprocessed config to {}: {}", preprocessed_path, e.displayText()); | |||
} | |||
|
|||
#if USE_SSL | |||
if (fs::path(preprocessed_path).filename() == "config.xml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIlly question, but will this work with yaml configuration files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is interesting.
I am trying to figure it out. It seems that in current code it cannot work with yaml files.
But we can add support of yaml if needed.
In the integration test we have:
tests/integration/test_config_decryption/configs/config.xml
<clickhouse>
<encryption_codecs>
<aes_128_gcm_siv>
<key_hex>00112233445566778899aabbccddeeff</key_hex>
</aes_128_gcm_siv>
<aes_256_gcm_siv>
<key_hex>00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff</key_hex>
</aes_256_gcm_siv>
</encryption_codecs>
<max_table_size_to_drop encryption_codec="AES_128_GCM_SIV">96260000000B0000000000E8FE3C087CED2205A5071078B29FD5C3B97F824911DED3217E980C</max_table_size_to_drop>
<max_partition_size_to_drop encryption_codec="AES_256_GCM_SIV">97260000000B0000000000BFFF70C4DA718754C1DA0E2F25FF9246D4783F7FFEC4089EC1CC14</max_partition_size_to_drop>
</clickhouse>
I used ymltoxml to convert it into yaml by command:
ymltoxml -i config.xml -o config.yaml
I got:
clickhouse:
encryption_codecs:
aes_128_gcm_siv:
key_hex: 00112233445566778899aabbccddeeff
aes_256_gcm_siv:
key_hex: 00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff
max_table_size_to_drop:
'@encryption_codec': AES_128_GCM_SIV
'#text': 96260000000B0000000000E8FE3C087CED2205A5071078B29FD5C3B97F824911DED3217E980C
max_partition_size_to_drop:
'@encryption_codec': AES_256_GCM_SIV
'#text': 97260000000B0000000000BFFF70C4DA718754C1DA0E2F25FF9246D4783F7FFEC4089EC1CC14
The problem is that attributes via @
as supported by ClickHouse. See the example of usage here.
But text nodes via #text
are not supported yet. See implementation of YAML parser:
ClickHouse/src/Common/Config/YAMLParser.cpp
Line 103 in bfe349a
bool is_attribute = (key.starts_with(YAML_ATTRIBUTE_PREFIX) && value_node.IsScalar()); |
So we could implement support of
#text
in YAMLParser.cpp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rschu1ze
OK. I added support of text nodes in YAMLParser.cpp. Thus YAML configs can be decrypted too. Also I added a test for YAML configs decryption into my integration test.
In short I see there is no a single/standard approach about attributes and text nodes in YAML.
Consider the fragment of XML config:
<max_table_size_to_drop encryption_codec="AES_128_GCM_SIV">96260000000B0000000000E8FE3C087CED2205A5071078B29FD5C3B97F824911DED3217E980C</max_table_size_to_drop>
In this fragment:
max_table_size_to_drop
is an element node.encryption_codec
is an attribute node.96260000000B0000000000E8FE3C087CED2205A5071078B29FD5C3B97F824911DED3217E980C
is a text node.
I tried different online tools they convert XML to YAML differently.
https://www.site24x7.com/tools/xml-to-yaml.html
max_table_size_to_drop:
"-encryption_codec": AES_128_GCM_SIV
"#text": 96260000000B0000000000E8FE3C087CED2205A5071078B29FD5C3B97F824911DED3217E980C
https://www.atatus.com/tools/xml-to-yaml
max_table_size_to_drop:
encryption_codec: "AES_128_GCM_SIV"
$t: "96260000000B0000000000E8FE3C087CED2205A5071078B29FD5C3B97F824911DED3217E980C"
https://beautifytools.com/xml-to-yaml-converter.php
max_table_size_to_drop:
_encryption_codec: AES_128_GCM_SIV
__text: 96260000000B0000000000E8FE3C087CED2205A5071078B29FD5C3B97F824911DED3217E980C
toString:
https://www.anyjson.in/xml-to-yaml
max_partition_size_to_drop:
'#text': 97260000000B0000000000BFFF70C4DA718754C1DA0E2F25FF9246D4783F7FFEC4089EC1CC14
'@encryption_codec': AES_256_GCM_S
This last result is the same as produced by mentioned about ymltoxml module.
And as far as we already represent attributes as nodes starting with @
I implemented support of text nodes as #text
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the XML/YAML relationship: The authoritative guide (at least as far as I see) is https://clickhouse.com/docs/en/operations/configuration-files . It says about XML attributes:
If you want to write an attribute for a Sequence or Map node, you should use a @ prefix before the attribute key. Note, that @ is reserved by YAML standard, so you should also to wrap it into double quotes:
map:
"@attr1": value1
"@attr2": value2
key: 123
which is close to the last of your examples but not 100% the same. Could you make your YAML test check this syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this code key
is not the text node. But it's the element subnode. Because it's "translated" into this code according to documentation:
<map attr1="value1" attr2="value2">
<key>123</key>
</map>
While we need it to be translated into:
<map attr1="value1" attr2="value2">
123
</map>
According to the documentation probably the sequence can do it the way we need it:
seq:
- "@attr1": value1
- "@attr2": value2
- 123
<seq attr1="value1" attr2="value2">123</seq>
BTW in my correction of YAMLParser.cpp there no exception throwing for the case when several #text
are specified. For example this way:
max_table_size_to_drop:
'#text': SOMETHING1
'@encryption_codec': AES_128_GCM_SIV
'#text': SOMETHING2
- Should we ignore it?
- Or should we throw an exception when we see
#text
the second time in config? - Or should we silently replace old value SOMETHING1 by new value SOMETHING2 when parsing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, according to the docs, the YAML would need to contain a sequence of the form
seq:
- "@attr1": value1
- "@attr2": value2
- 123
instead of a map.
Should we ignore it?
Or should we throw an exception when we see #text the second time in config?
Or should we silently replace old value SOMETHING1 by new value SOMETHING2 when parsing?
Assuming the user writes ClickHouse-specific YAML instead of YAML generated by the XML-to-YAML translators, do we need any modifications in the YAML parser at all or does the existing code work out of the box? (if it doesn't then the docs currently lie).
Besides that, I propose to implement the same behavior which exists in XML if the same tag appears multiple times, e.g.
<some_param>42</some_param>
<some_param>53</some_param>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rschu1ze I think the documentation is incorrect about usage the sequence with YAML.
I tried the following:
If config.yaml file has:
encryption_codecs:
aes_128_gcm_siv:
key_hex: 00112233445566778899aabbccddeeff
aes_256_gcm_siv:
key_hex: 00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff
max_table_size_to_drop:
- '@encryption_codec': AES_128_GCM_SIV
- 96260000000B0000000000E8FE3C087CED2205A5071078B29FD5C3B97F824911DED3217E980C
max_partition_size_to_drop:
- 97260000000B0000000000BFFF70C4DA718754C1DA0E2F25FF9246D4783F7FFEC4089EC1CC14
- '@encryption_codec': AES_256_GCM_SIV
Then it created config-preprocessed.xml with contents:
<encryption_codecs>
<aes_128_gcm_siv>
<key_hex>00112233445566778899aabbccddeeff</key_hex>
</aes_128_gcm_siv>
<aes_256_gcm_siv>
<key_hex>00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff</key_hex>
</aes_256_gcm_siv>
</encryption_codecs>
<max_table_size_to_drop encryption_codec="AES_128_GCM_SIV"/>
<max_table_size_to_drop>96260000000B0000000000E8FE3C087CED2205A5071078B29FD5C3B97F824911DED3217E980C</max_table_size_to_drop>
<max_partition_size_to_drop>97260000000B0000000000BFFF70C4DA718754C1DA0E2F25FF9246D4783F7FFEC4089EC1CC14</max_partition_size_to_drop>
<max_partition_size_to_drop encryption_codec="AES_256_GCM_SIV"/>
You see it's not just about the encryption elements this is the way YAMLParser.cpp works in master. Thus we cannot use the sequences.
What do we do then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now I implemented it this way (the way number two from above's list): YAMLParser will throw an exception if it find second text node for an element node.
tests/integration/test_config_decryption/test_wrong_settings.py
Outdated
Show resolved
Hide resolved
Our company's security analysts and architect had series of meetings with security analysts of our existing clients and prospects (potential clients). They all admit that exposing encryption keys in preprocessed config is OK for them. They validate/approve this solution and they may start using ClickHouse with this feature.
During our meetings with clients and protects we showed our technical papers representing many different combinations of usage Therefore introducing the @rschu1ze We consider You may create a new issue "Implement One technical note about technical implementation of the
This PR is not the ideal solution which definitely/finally solves the encryption problem. But this PR is the new step which improves the security to such level that our prospects may start using ClickHouse with solution proposed in this PR. |
@alexey-milovidov @rschu1ze As @rvasin wrote above, this PR is the first step to make storing of sensitive information more secure. It allows to store encrypted data and encryption keys in different places while "hidden" fields are stored in the single "original" configuration file. The second step will be moving encrypted keys from configuration file to the separate files (probably stored in another storage). And these keys will not be stored in the preprocessed files as well as encrypted configuration fields. By the way, "hidden" fields can be the solution for the this second step. |
Hi all, I'm a bit late to this but as per @alexey-milovidov comment above, it does feel redundant when we have both encryption key and the encrypted value in the preprocessed, however, as per @rvasin 's comment above, I also agree that the "hidden" feature can be complimentary to this if you want to use from_env + hidden for the AES key itself. With that note, I just want to point out that environment variable is only a "so so" solution... at the end of the day, in *nix land, If an attacker manage to find a local file inclusion inside ClickHouse (eg: user able to exploit file table function and somehow bypass the restriction put in place to narrow access to only While this is more obscure and harder to get to, it is still possible - A good thing is it would protect ClickHouse from other users on the machine with no root privilege and helps with security compliance :) Perhaps in the future, beside from_env , we could have something like from_tpm / from_hsm, from_kms that would basically move the decryption outside of ClickHouse altogether and so the decrypted value only ever lives in memory... Overall, I agree, it's a step forward that may enable us with more options to secure these values in the future. |
Good. I opened #52207 for preprocessed config with hidden parts (but GitHub somehow won't let me assign @rvasin to it). No further objections from my side against this PR, let's fix the review comments, then I'll happily merge. |
This is how GitHub works: in order to assign someone the person should write a comment in the issue. So I wrote the comment. You may try to assign me now.
I am working on all the comments. I will try to fix all the remaining questions today. |
…n file" Cf. PR ClickHouse#50986 - rename XML attribute "encryption_codec" to "encrypted_by"
Closes #48291
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implement support of encrypted elements in configuration file
Added possibility to use encrypted text in leaf elements of configuration file. The text is encrypted using encryption codecs from <encryption_codecs> section.
Documentation entry for user-facing changes