-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
doc: Add design doc for RBAC #17702
doc: Add design doc for RBAC #17702
Conversation
| `EXPLAIN` | usually `OBJECT(r)` | | ||
| `INSERT INTO ... VALUES` | `OBJECT(a)` | | ||
| `INSERT INTO ... SELECT` | `CLUSTER(U)`,`OBJECT(a)` usually `OBJECT(r)` | | ||
| `{SELECT, SHOW, SUBSCRIBE}` | `CLUSTER(U)`, usually `OBJECT(r)` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Cara pointed out, this could prevent SHOW
and other introspection queries which might not be desired. Do we want to special case those somehow? This one is tricky to get right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one idea that has been shot down before, but might warrant reconsideration in this context. All SHOW
commands are run in the mz_introspection
cluster, and all roles are automatically given CLUSTER(U)
for mz_introspection
and OBJECT(r)
for builtin objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It always seemed a little strange to make users hop clusters to run introspection queries, so that option is worth considering for ux alone.
Flagging that there are some mz_introspection
use cases we missed in the initial spec, which I'm adding there too:
- Support uses
mz_introspection
clusters to debug client environments - They need to be able to to SELECT across all mz_* tables and execute EXPLAIN and DESCRIBE, ideally across clusters
- Current permissions on this cluster are hardcoded in environmentd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about select queries on objects that don't inherently exist on a cluster?
Would you need to be in a cluster that you have CLUSTER(U)
privileges for that cluster when you run the SELECT
?
Aka we're saying that anyone who has access to Materialize should have usage access to at least one cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you need to be in a cluster that you have CLUSTER(U) privileges for that cluster when you run the SELECT?
For this design yes. The rationale is that your select will use some of the compute resources in the current cluster, and therefore you will need usage privileges on that cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it makes sense.
@ggnall FYI - for our comms / as we think about our guidance
I was thinking that we'd be able to leave pre-defined roles out of this initial project, but after a conversation on slack I think that we'll need to include them. I'll update the design to add a section for pre-defined roles. I won't change any of the existing sections, so it shouldn't affect any on-going reviews. |
I made the updates, it did end up touching a couple of existing sections, but only very slightly. I kept it all in a single small commit that you can look at. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Role/public changes lgtm.
I'm planning to review this by the end of the week! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came across this comment by @mjibson on another issue #13430 (comment) about preventing the creation of duplicate indexes
A concern should we implement RBAC: a user could be denied read access to a mv/index due to RBAC, but then this feature might also prevent them from creating an index/mv of their own. We should thus perhaps issue a NOTICE that there was a duplicate, pointing to the first one, and the user can then decide to immediately DROP what they created.
It's a good question about what we should do if someone tries to create an object that already exists but they don't have access to, e.g. a database of schema. Is it leaking information somehow by producing an error? Curious what postgres does.
When a new user logs in, we will create a new role for them with only the `LOGIN` and `INHERIT` attributes. If that user | ||
is a frontegg admin, then the role will also have the `SUPERUSER` attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this scenario play out on every login? Or only on the initial one?
When we eventually decouple Account permissions (or build our own), it probably makes sense to have SQL/RBAC be the system of record on who can do what.
From that perspective, an existing explicitly GRANTed role should take precedence. This would enable a case where a user needs to manage users, billing, etc., but should have limited or no database access.
The simplest thing to explain is: "Account admins are always SUPERUSERS", but it seems reasonable to say "Account admins are always SUPERUSERS by default, unless they are explicitly granted another role".
There will be extra steps in the latter case to upgrade users:
- OrgAdmin -> OrgMember: would retain explicitly granted roles
- OrgMember -> OrgAdmin: would retain explicitly granted roles. Another SUPERUSER would have to remove existing roles or GRANT them SUPERUSER.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From another perspective, I'm not 100% sure how important this scenario is:
This would enable a case where a user needs to manage users, billing, etc., but should have limited or no database access.
If it's simpler for us to implement "Account admins are always SUPERUSERS" and it's reversible, we could start there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggnall The above scenario is about a frontegg user who was initially a non-admin frontegg user, but was then later granted admin in frontegg. Should we bump up their perms in mz? Also applies the other way around: if a frontegg user later gets their admin role revoked, should we also revoke mz superuser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjibson We may have crossed messages but for:
"Account admins are always SUPERUSERS"
...Yes, frontegg admins grants/revokes should probably be paired with mz superuser grants/revokes. It would make sense to keep these in sync, so that users that invite new members to their organization are also able to fully manage access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're never allowed to specify LOGIN or SUPERUSER when you create a role. The role in Materialize always exists with neither of those characteristics.
...
When you log in to a role with Frontegg and your JWT has the "Organization Admin" role, you implicitly get SUPERUSER privilege for that session.
One consequence of this is that we lose the mechanism to revoke SUPERUSER
from someone while they're logged in. So as long as someone remains connected, they retain SUPERUSER
after it's been revoked in Frontegg. Unless, we invent some new syntax for this and make sure to always update this field in the session before executing a command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're never allowed to specify LOGIN or SUPERUSER when you create a role. The role in Materialize always exists with neither of those characteristics.
...
When you log in to a role with Frontegg and your JWT has the "Organization Admin" role, you implicitly get SUPERUSER privilege for that session.One consequence of this is that we lose the mechanism to revoke
SUPERUSER
from someone while they're logged in. So as long as someone remains connected, they retainSUPERUSER
after it's been revoked in Frontegg. Unless, we invent some new syntax for this and make sure to always update this field in the session before executing a command.
Or we have some periodic Frontegg refresh that periodically re-authenticates through Frontegg and updates these attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I've updated the doc with what we've discussed in this thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we have some periodic Frontegg refresh that periodically re-authenticates through Frontegg and updates these attributes.
We've already got a task that periodically refreshes access tokens for pgwire connections!
materialize/src/pgwire/src/protocol.rs
Line 178 in a757249
.and_then(|token| frontegg.continuously_validate_access_token(token, user.clone())) |
Would be pretty easy to stitch in SUPERUSER
updates to that task, I think. Would only apply to pgwire sessions and not HTTP sessions, but I think that's ok; pgwire sessions tend to be long lived while http sessions tend to be short lived.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we have some periodic Frontegg refresh that periodically re-authenticates through Frontegg and updates these attributes.
We've already got a task that periodically refreshes access tokens for pgwire connections!
materialize/src/pgwire/src/protocol.rs
Line 178 in a757249
.and_then(|token| frontegg.continuously_validate_access_token(token, user.clone())) Would be pretty easy to stitch in
SUPERUSER
updates to that task, I think. Would only apply to pgwire sessions and not HTTP sessions, but I think that's ok; pgwire sessions tend to be long lived while http sessions tend to be short lived.
I've updated the doc to include this.
There are actually two scenarios here:
Though this is such a weird scenario we should either not care or outright prevent it. However, after re-reading @mjibson a couple of times I think it was actually about something else.
What I think he's saying is that we should allow duplicate indexes to exist, because certain users may only have access to a single index and not the other. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally made it through this! Thanks very much for writing this up, @jkosh44. Looks really good!
I felt like the material here was vast enough that I couldn't engage with it all, but also that there is quite a bit more to design (e.g., how are privileges actually represented in the catalog). I think it's exactly right to pause around here and start on the implementation, but I'd love to see this design document continually revised + updated during the implementation.
With that said, I think we should add the following two sections up front:
-
Rollout plan
How can we start landing RBAC PRs incrementally without committing to syntax or semantics? I assume we can put
GRANT
,REVOKE
, etc. behind unsafe mode. But how do we start adding privilege checks when users don't have access toGRANT
orREVOKE
? Is it possibly as easy as continuing to give everyoneSUPERUSER
privilege at the moment? Then privilege checks just won't apply. When we're ready to flip the switch, we remove the unsafe mode toggles and also removeSUPERUSER
privileges from "Organization Members".Probably some holes in that braindump, so fleshing a plan out in the design doc would be great. Definitely don't need to flesh out the whole rollout—just enough to describe how we'll land the first batch of PRs.
-
Testing plan
I imagine the plan here is mostly "write lots of SLT and testdrive tests to test the SQL implementation, and write one or two custom Rust/mzcompose tests to test the Frontegg integration." Whatever it is, we should write that down, and make sure @philip-stoev and @def- are aware of this ASAP. I'm sure they'll have no shortage of tests to write.
| role creation | `CREATEROLE` | Can create, alter, drop, grant membership to, and revoke membership from roles. | Yes | | ||
| inheritance of privileges | `INHERIT` | Can inherit the privileges of roles that it is a member of. On by default. For this project we can keep this as a mandatory attribute. | Yes | | ||
| cluster creation | `CREATECLUSTER` | Can create a cluster. | No | | ||
| persisted data creation | `CREATEPERSIST` | Can write data to S3, either directly or indirectly (for example inserting into a table would be directly and creating a source would be indirectly). | No | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is odd to me! Is there something special about persist that means we want to promote this to a per role permission? I would lean for simply not having this permission.
If so, we should find a different name, since “persist” isn’t a user facing concept anywhere else!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind this was to differentiate between being able to create objects like views, which don't persist large amounts of data in S3, and being able to create objects like sources, which may persist large amounts of data in S3.
I personally would also lean towards removing this, so I'll probably just remove it, unless that explanation changed your mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjibson FYI, in case you want to weigh in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If one of the problems users have is "limit who can spend money", and money is spent by creating replicas or writing things to s3, it made sense to me that both replica creation and s3 writing should be things that need granting. I don't care too much either way, and if no users need this for now then no objection to removing it.
When a new user logs in, we will create a new role for them with only the `LOGIN` and `INHERIT` attributes. If that user | ||
is a frontegg admin, then the role will also have the `SUPERUSER` attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, I think the next logical question is, if we don't allow specifying the
LOGIN
attribute via SQL, then should we be allowing any RBAC via SQL? We should at least prevent users from typingSUPERUSER
because that also will be overwritten by Frontegg on login. Though perhaps we should completely removeCREATE ROLE
,ALTER ROLE
,DROP ROLE
,GRANT
, andREVOKE
.
Frontegg's role and permission system is nowhere near powerful enough to be used directly for SQL RBAC, is the problem! You can't have roles that inherit from other roles, and you can't (easily) create privileges associated with objects in SQL. Like, if a user makes a table, something would have to add Frontegg privileges for INSERT
and SELECT
on that source in Frontegg. It would be hard and messy.
What makes sense to me is to tightly couple to Frontegg, but only for the minimum required to map the "Organization Admin" and "Organization Member" roles sensibly to SQL RBAC-land.
Reading through this discussion gives me the following idea:
- You're never allowed to specify
LOGIN
orSUPERUSER
when you create a role. The role in Materialize always exists with neither of those characteristics. - When you log in to a role with Frontegg, you implicitly get
LOGIN
privilege for that session. - When you log in to a role with Frontegg and your JWT has the "Organization Admin" role, you implicitly get
SUPERUSER
privilege for that session. - The
mz_roles
table makes no mention of these privileges because they are not inherent properties of the role, but rather properties of a given session. - We introduce an
mz_has_superuser()
function (or maybe there's some existing PostgreSQL function we can piggyback on) that reports whether the current session hasSUPERUSER
privilege. - In the future, if necessary, we can expose an
mz_sessions
table that shows all live sessions, their roeles, and whether they haveSUPERUSER
privileges.
This avoids a key pitfall with syncing LOGIN
and SUPERUSER
status to mz_roles
when a user logs in: you can't actually be sure the user is logging in with an up-to-date JWT. Suppose a user is downgraded from "Organization Admin" to "Organization Member" while actively using Materialize. They might end up with two JWTs, one saying they're an admin and one saying they're a member. There's no guarantee they use these JWTs in the right order, and they might log in second with the older JWT, and now mz_roles
will sync in stale information.
By instead associating SUPERUSER
privilege on a session-by-session basis, we sidestep this problem entirely. The session privileges will exactly reflect the privileges in the JWT used to start the session.
- Do we want different `SELECT` privileges based on if a new dataflow will be spun up or if we can use an existing one? | ||
- Pros: We can differentiate between using existing compute resources vs creating new ones when reading. | ||
- Cons: Users (and the database) are unable to determine if they're allowed to execute a read until after that read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My take is we should avoid this for now. It's a pretty darn specific privilege, and it's one that we can always introduce later (FAST_PATH_ONLY_USAGE
or something). Would be good to suss out how much of a need for this there actually is.
I'm not sure that it will be common for e.g. one team to say "this is our cluster, feel free to issue fast queries but absolutely no slow queries"—especially since enough fast queries can probably meaningfully impact cluster performance. Would rather encourage folks to create materialized views and say "if you want to query our views, turn on your own cluster and build your own indexes."
For example, an alternative to explore here is a session parameter that prevents you from running non-fast-path peeks. I can imagine you might want to opt into this for some applications where you want to be really sure that you don't accidentally have a regression (either in your application code or in Materialize's optimizer) that causes queries to suddenly start taking the slow path and crash the cluster.
I think the plan is to deprecate unsafe mode and move toward LaunchDarkly flags. @ggnall can confirm LD is ready to go. |
I'm hoping to piggyback off of PostgreSQL for this, I just haven't put in the legwork yet to see how they do it.
I like this plan with the caveat added that users that don't authenticate through Frontegg are not given superuser. Otherwise, we don't have a good way of testing everything if even our test users are given superuser status.
Yeah, I think it will look a lot like this. I'm also hoping to borrow any tests that PostgreSQL has in their repo. I'll add all of this to the doc. |
Now that I'm on the final (non-optional) phase, the implementation details for all phases are filled in, if anyone wants to take another look. Specifically I think the final phase's implementation is of particular interest: a448732. One thing that may be challenging is that the PostgreSQL Yet when displaying
Looking the the It isn't until the This means that the conversion to a string needs access to the catalog. If I'm remembering correctly, Materializes does not have catalog access when formatting a |
That sounds identical to the regtype and regproc types which are OIDs under the covers but implement a cast to string that knows how to print itself, which has access to the catalog. |
Oh very cool, I did not know about these types. However, it looks like we don't follow the same semantics here as PostgreSQL:
|
That's due to some other postgres casting rule we don't yet follow. We are indeed able to convert those to strings, you just have to ask:
But the impl should look something like materialize/src/sql/src/plan/typeconv.rs Line 318 in 3fbb6ed
|
I'm not sure if that's 100% correct. I don't think that PostgreSQL implicitly casts
I'm not sure where the casting is implemented, but the formatting is implemented here: https://github.com/postgres/postgres/blob/8e5eef50c5b41fd39ad60365c9c1b46782f881ca/src/backend/utils/adt/regproc.c#L1223-L1269 The return type of that function is explicitly and not which makes me think that PostgreSQL is not implicitly casting It makes sense to me that the casting works in Materialize because it is implemented on the main coord thread that has access to the catalog, but formatting happens on the pgwire client thread which doesn't have access to the catalog. We probably want to keep the proposed column as a |
All this is just if we want |
This is a part of postgres I don't yet understand. The type of regtype is regtype (==oid), but it always returns a string datum. I don't get how it's correct for pg to announce it's a regtype, but then serialize it as a string always. |
- We will create a new Rust struct called `AclMode` that is a bit flag containing all privileges. | ||
- See https://github.com/postgres/postgres/blob/3aa961378b4e517908a4400cdc476ca299693de9/src/include/nodes/parsenodes.h#L74-L101 | ||
for PostgreSQL's implementation. | ||
- We will create a new datatype called `maclitem` in the `mz_internal` schema to represent a single privilege. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to add a new type, it needs to be described more. Can you list the operators, functions, and casts it supports. Same for the array version of it. I see pg has some details at https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-ACLITEM-OP-TABLE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also how they will be serialized over pgwire in text and binary. (This is probably just: "exactly the same as postgres" but worthwhile to do the investigation for how postgres does it first to make sure we also can.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added these details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simpler approach, that I'm starting to like better, is to add the following system table:
CREATE TABLE mz_privileges (
object_id: Text, -- ID of object.
grantee: Text, -- ID of role that has a certain privilege.
grantor: Text, -- ID of role that granted the privilege.
privileges: Text, -- Letter codes of privileges concatenated together.
);
maclitem
is slightly more complicated than I was hoping for because our repr layer doesn't have access to RoleId
s.
I think this is the case for all non-trivial data types though, right? For example internally an However, when
Even for simple ones like dates and times, which are just a single integer, there is custom formatting when displaying to the user:
|
What does this mean? Like, what's the difference between a regproc displayed to the user and not displayed to the user? How does it determine if something is displayed to the user? Is it anything in the final SELECT projection? It feels like there's some data layer mz lacks here. |
For each datatype in PostgreSQL, there's a Our equivalent of this is when we encode values here: materialize/src/pgrepr/src/value.rs Lines 302 to 311 in 3fbb6ed
The big difference is that when we encode values, we don't have access to the catalog.
I think fundamentally the break down is as follows: |
Though the more I think about it, the more I think that it's probably fine if |
Sorry I don't think I fully understood what was going on at first which is why my language was inexact/vague. I understand what's happening now, so here's a summary that uses more exact language to avoid confusion. Currently in Materialize, our text encoding of the The reason this discrepancy exists is because PostgreSQL has access to the catalog when doing text encoding, while Materialize does not. As a corollary, the text encoding of the proposed type, I'll add this to the design doc. |
I'm going to go ahead and merge this. If there's updates later then I'll open a new PR. |
Part of #11579
Motivation
Design doc.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way) and therefore is tagged with aT-proto
label.