feat: add validation of acl users #1743
Conversation
src/server/main_service.cc
Outdated
if (!acl::IsUserAllowedToInvokeCommand(*cntx, *cid)) {
  (*cntx)->SendError(absl::StrCat("User: ", cntx->authed_username,
                                  " does not have the credentials to execute this command"));
  return true;
}
Here's the catch, and it's actually a very dangerous bug. I added the Verify functions specifically for this, but forgot to add a comment here: any validation logic should only be placed in the Verify functions.
The reason is that verification is always performed on a const connection context pointer, so it can be done in parallel. Executing in parallel with a single context is not possible, so I use "stub" contexts just to transfer replies.
This means that if you use this context for verification, its acl_username won't be set under squashing.
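A minimal toy model of the bug (plain Python, not the actual Dragonfly code; the names `ConnectionContext`, `make_stub`, and `is_user_allowed` are stand-ins for the C++ types and for `acl::IsUserAllowedToInvokeCommand`):

```python
from dataclasses import dataclass

@dataclass
class ConnectionContext:
    authed_username: str = ""

def make_stub(parent: ConnectionContext) -> ConnectionContext:
    # Stub contexts in this sketch only carry reply state; auth fields
    # are left at their defaults, mirroring the squashing bug above.
    return ConnectionContext()

def is_user_allowed(cntx: ConnectionContext) -> bool:
    # Hypothetical stand-in for acl::IsUserAllowedToInvokeCommand.
    return cntx.authed_username == "vlad"

parent = ConnectionContext(authed_username="vlad")
stub = make_stub(parent)

print(is_user_allowed(parent))  # True: the original context has the username
print(is_user_allowed(stub))    # False: the stub lost acl_username -> spurious denial
```

The point is that validation against the stub denies a user who is in fact authorized, because only the original (parent) context carries `acl_username`.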
So, first:
Please add a test case with squashing. You can do the following:
1. Enable multi_exec_squash.
2. Execute a 10-command MULTI/EXEC transaction that spans multiple shards (important: use simple single-shard commands).
(I don't know why you actually use pytests for everything. They're easier to use, sure 🙂 But they don't run on every PR.)
Second, there are a few solutions:
1. Use VerifyCommandState to check allowance. It has docs in the header above it. It's fine for all cases except MULTI/EXEC, because with it commands are verified only when they are added to the multi/exec buffer. So if the ACL rights change by the moment EXEC is called, the verification is stale. Does it work differently in Redis?
2. See DispatchMonitor below. It checks conn_context.squashing_info.parent, or something like this, I don't remember 🙂 This way you can get a const pointer to the original context.
3. Copy acl_username into the stub context, but I'm not a fan of this... Better to let validation operate on the original context.
Sorry for the inconvenience with my squashing stuff 😅 😄
PS: If you choose (2), maybe extract the const ConnectionContext* and pass it to VerifyExecution 🙂 so all the validation is nicely grouped together.
Sorry for my late reply; I wanted to go over this in detail and understand what the exact issue is and what I had missed. Even though you explained it, I originally thought I had it covered when I first pushed this PR and added the error checking within InvokeCommand, but what I didn't take into account is, as you mentioned, the stub context. The good thing is that I learned a few things and now have a better understanding of our squashing/multi transaction mechanism.
So, a few things:
- We can't really rely only on VerifyCommandState or only on VerifyCommandExecution; we need both. The reason is that for MULTI transactions we need to check twice, as per your first point: a) when we issue the command, it could be the case that the user does not have the credentials, and therefore we can issue an error immediately; b) a MULTI command was issued and some admin changed the user's acl categories between the call to MULTI and the call to EXEC. For this particular case we need VerifyCommandExecution, which covers that corner case. The best part is that we now also have two different errors: for the first case, the error is that the user does not have the credentials, and for the latter, that the user's acl categories have changed in between. IMO this is far clearer for the user + this is what Redis does.
- There is an issue where we send multiple error responses; I will fix this in another PR (please see my comment on the pytest -- it will become clear what the problem is).
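The two-phase check described above can be sketched as a runnable toy model (plain Python; the function names mirror, but are not, the C++ `VerifyCommandState` / `VerifyCommandExecution`, and the ACL table is invented for illustration):

```python
# Toy ACL table: username -> set of commands the user may run.
acl = {"vlad": {"GET", "SET"}}

def verify_command_state(user: str, cmd: str):
    # Check (a): at queue time, when the command enters the MULTI buffer.
    if cmd not in acl.get(user, set()):
        return f"User: {user} does not have the credentials to execute this command"
    return None

def verify_command_execution(user: str, cmd: str):
    # Check (b): at EXEC time, after the ACLs may have changed.
    if cmd not in acl.get(user, set()):
        return "ACL rules changed between the MULTI and EXEC"
    return None

queued = []
assert verify_command_state("vlad", "SET") is None  # allowed when queued
queued.append(("vlad", "SET"))

acl["vlad"].discard("SET")  # an admin revokes the right mid-transaction

errors = [verify_command_execution(u, c) for u, c in queued]
print(errors)  # the EXEC-time check catches the revocation
```

This is why both checks and both error messages are needed: the queue-time error covers case (a), the EXEC-time error covers case (b).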
Copy acl_username into the stub context, but I'm not a fan of this... Better let validation operate on the original context
Neither am I; we should perform checks on the original context, and that's what my changes do.
(IDK why you actually use pytests for everything. They're easier to use, sure 🙂 But they don't run on every PR)
Because the most important thing for now is to have end-to-end tests; I don't want to change unit tests until I have something stable. I am planning to add them in the future, though.
P.S. I am not a big fan of the regression tests not being run per PR, but I understand the rationale for why we wouldn't want that. I think this will become problematic, especially when the team grows, since we will start having cascading failures (multiple people who pushed within a three-hour window can all break the regression tests), but for now this is a non-issue, so it's not worth discussing.
result = await async_client.execute_command("AUTH default nopass")
assert result == "OK"

# Vlad goes rogue starts giving admin stats to random users
🤣
namespace dfly::acl {

[[nodiscard]] bool IsUserAllowedToInvokeCommand(const ConnectionContext& cntx,
assert res == b"OK"

res = await client.execute_command("EXEC")
# TODO(we need to fix this, basically SQUASHED/MULTI transaction commands
@dranikpg This little code needs some polishing. I will address this in a separate PR. Basically, we need to squash the error messages into one....
optional<ErrorReply> Service::VerifyCommandExecution(const CommandId* cid,
                                                     const ConnectionContext* cntx) {
  // TODO: Move OOM check here
  return VerifyConnectionAclStatus(cid, cntx, "ACL rules changed between the MULTI and EXEC");
}
I forgot: VerifyCommandExecution is actually called for all commands, not just EXEC; I'm not sure why you use this error message here.
Once I implement acl deluser, I will brutally test this and polish.