Better access validation in ON CLUSTER queries.
#71334
Conversation
This is an automated comment for commit f83a89a with a description of existing statuses. It's updated for the latest CI run: ❌
Successful checks
Also, @vitlibar, what do you think about this?
# Conflicts:
#   src/Core/ServerSettings.cpp
#   src/Interpreters/executeDDLQueryOnCluster.cpp
For asynchronous insert queue we store and later use ... I'm not sure ...
Dear @tavplubix, @vitlibar, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.
# Conflicts:
#   src/Core/ServerSettings.cpp
#   src/Interpreters/executeDDLQueryOnCluster.cpp
Dear @tuanpach, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.
# Conflicts:
#   src/Interpreters/DDLTask.cpp
Workflow [PR], commit [1916d8e] Summary: ❌
    entry.setSettingsIfRequired(context);
    entry.tracing_context = OpenTelemetry::CurrentContext();
    entry.initial_query_id = context->getClientInfo().initial_query_id;
    entry.initiator_user = *context->getUserID();
Can't context->getUserID() be nullopt?
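A minimal sketch of one way to guard against that, reusing the names from the quoted snippet above; the PR's actual handling may differ:

```cpp
// Hypothetical sketch, not the PR's code: avoid dereferencing a nullopt user id.
if (const auto user_id = context->getUserID())
    entry.initiator_user = *user_id;
else
    throw Exception(ErrorCodes::LOGICAL_ERROR,
        "Cannot enqueue an ON CLUSTER query without an authenticated initiator user");
```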
src/Interpreters/DDLTask.cpp (Outdated)
    if (!user)
        LOG_INFO(getLogger("DDLTask"), "Initiator user is not present on the instance. Will use the global user for the query execution.");
    else
        query_context->setUser(entry.initiator_user, entry.initiator_user_roles);
What if the initiator user just hasn't been replicated to the current shard yet?
According to the code the query will be executed on behalf of the global user in that case.
It seems it would be better to fail instead.
I thought about this a lot. Since both replicated storage and the DDL queue use ZooKeeper, which is built on a log-based consensus protocol, it's technically impossible to reach a state where a user is not yet written to the log but that user's DDL queries already are.
On the other hand, clusters without access replication should work as they did before. Otherwise, the incompatibility will be devastating.
At least DDLWorker's queue and ReplicatedAccessStorage's queue work in different threads. They don't have to do things in sync. Also, there can be a configuration that uses multiple ZooKeeper servers, so these queues may even be connected to different servers.
It seems a good enough solution to that would be adding a setting that sets some waiting time for such cases. I mean, if a shard fails to find the initiator by its UUID, it could wait for a while before throwing an exception and thus failing the current query.
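A rough sketch of that wait-before-failing idea; the setting name, timeout handling, and lookup call below are assumptions for illustration, not part of this PR:

```cpp
// Hypothetical sketch: retry the initiator lookup for a bounded time before failing the task.
const auto deadline = std::chrono::steady_clock::now()
    + std::chrono::milliseconds(wait_for_initiator_user_ms);        // assumed setting
auto user = access_control.tryRead<User>(entry.initiator_user);
while (!user && std::chrono::steady_clock::now() < deadline)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));    // let the replication queue catch up
    user = access_control.tryRead<User>(entry.initiator_user);
}
if (!user)
    throw Exception(ErrorCodes::UNKNOWN_USER, "Initiator user was not found on this replica in time");
```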
User creation in replicated storage is an operation with a commit to the Keeper log, so no issue on the second concern.
For the first part, yes, indeed, it can potentially be a problem (mostly on paper: the attacker has to fully control the network to the target instance to exploit this potential vulnerability). An easy fix would be just to throw an error if we use ReplicatedAccessStorage only.
the attacker has to fully control the network to the target instance to exploit this potential vulnerability
If a client executes hundreds or thousands of queries (which is quite normal for our Cloud), then this problem will appear quite often. And throwing an error immediately is secure, but it is not nice to users, so they will complain.
I've got another idea, but it requires changing the interface of IAccessStorage. I think we could modify ReplicatedAccessStorage so it could check more thoroughly that a specified user really doesn't exist (no node in ZooKeeper) before throwing an exception.
I mean that the exception-throwing functions ReplicatedAccessStorage::getID() and ReplicatedAccessStorage::read(const UUID & id, bool throw_if_not_exists), in case there is no entry in memory_storage and throw_if_not_exists == true, could try to read the corresponding nodes from ZooKeeper immediately, without waiting for the queue, and only throw the "User not found" exception after that.
Upd:
What if the initiator user just hasn't been replicated to the current shard yet?
Let's just ignore this problem in this PR. It seems it shouldn't be a very big problem in most configurations. If it becomes a big problem for some customer, they will be able to just turn off the server setting enabling this PR's improvements (see my comment) and ignore the initiator_user & initiator_roles specified in DDLTasks.
According to the code the query will be executed on behalf of the global user in that case. It seems it would be better to fail instead.
Let's just fail if initiator_user or initiator_roles don't exist. Using the global context seems very random; it's better not to do so. If a customer doesn't like this PR, they will have the server setting to turn it off.
I've got another idea but it requires changing the interface of IAccessStorage. I think we could modify ReplicatedAccessStorage so it could check more thoroughly that a specified user doesn't really exist (no node in ZooKeeper) before throwing exception.
Let's not do that: any big change to ReplicatedAccessStorage can cause new troubles, and we don't want to go into that because of this PR.
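A minimal sketch of the "fail instead of falling back to the global user" behaviour settled on above, reusing the identifiers from the quoted diff; the merged code may differ:

```cpp
// Hypothetical sketch, not the merged code: refuse to run the task if the initiator is unknown here.
if (!user)
    throw Exception(ErrorCodes::UNKNOWN_USER,
        "Initiator user of this distributed DDL task does not exist on this instance");
query_context->setUser(entry.initiator_user, entry.initiator_user_roles);
```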
src/Interpreters/DDLTask.h (Outdated)
    std::optional<UUID> parent_table_uuid;

    UUID initiator_user;
    std::vector<UUID> initiator_user_roles;
I suppose it's better to use names just because it's more general (and for ReplicatedAccessStorage it's almost the same). Also we need a way to enable/disable this feature, probably via some configuration setting, because not everyone will want it.
This is disabled by default right now via the distributed_ddl_entry_format_version setting.
And what if someone wants to disable it later, after we increase distributed_ddl_entry_format_version once again?
Then they can set distributed_ddl_entry_format_version to the previous version. That is also why I don't want to throw any exceptions during DDL execution: to keep these changes as backward compatible as possible, so there is no need to turn off this feature.
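For context, DDL log entries carry a format version, which is what makes that rollback path work. A very rough sketch of version-gated serialization; the constant, the field type, and the helper function below are assumptions, not the PR's actual code:

```cpp
// Hypothetical sketch: write the new fields only for a new enough entry format version,
// so servers configured with an older distributed_ddl_entry_format_version keep producing
// and parsing the old layout.
void writeDDLEntry(const DDLLogEntry & entry, WriteBuffer & out)
{
    writeIntText(entry.version, out);
    writeChar('\n', out);
    /* ... existing fields ... */
    if (entry.version >= ENTRY_FORMAT_WITH_INITIATOR)     // assumed constant name
    {
        writeEscapedString(entry.initiator_user, out);    // assumed to be a name (String) here
        writeChar('\n', out);
    }
}
```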
Upd:
And what if someone wants to disable it later, after we increase distributed_ddl_entry_format_version once again?
Let's add a server setting that allows enabling or disabling both sending initiator_user / initiator_roles and using them on replicas/shards.
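A small sketch of how one toggle could gate both sides; only the setting name comes from this PR's description, while the accessor API and surrounding code are assumptions:

```cpp
// Hypothetical sketch: one server setting controls both attaching the initiator info on the
// sender and impersonating the initiator on the executing replica.
const bool use_initiator_context
    = context->getServerSettings()[ServerSetting::distributed_ddl_use_initial_user_and_roles];  // assumed accessor

// Sender side: attach the initiator only when the feature is enabled.
if (use_initiator_context && context->getUserID())
    entry.initiator_user = *context->getUserID();

// Executor side: impersonate the initiator only when the feature is enabled;
// otherwise keep the pre-existing behaviour.
if (use_initiator_context)
    query_context->setUser(entry.initiator_user, entry.initiator_user_roles);
```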
Dear @vitlibar, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.
# Conflicts:
#   src/Interpreters/DDLTask.cpp
Add a server setting.
CI failures are unrelated.
Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

All DDL ON CLUSTER queries now execute with the original query user context for better access validation.

Details

This PR adds two new fields to the DDL entry:
- initiator_user - a user's name from the original request.
- initiator_roles - a user's roles from the original request.

If an initiator_user is not present on the cluster's instance, the request will fail. This behaviour can be controlled by a new server setting distributed_ddl_use_initial_user_and_roles.