Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: Add design doc for RBAC #17702

Merged
merged 35 commits into from
Apr 19, 2023
Merged

doc: Add design doc for RBAC #17702

merged 35 commits into from
Apr 19, 2023

Conversation

jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Feb 16, 2023

Part of #11579

Motivation

Design doc.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@jkosh44 jkosh44 requested review from benesch, ggnall, maddyblue and a team February 16, 2023 19:52
| `EXPLAIN` | usually `OBJECT(r)` |
| `INSERT INTO ... VALUES` | `OBJECT(a)` |
| `INSERT INTO ... SELECT` | `CLUSTER(U)`,`OBJECT(a)` usually `OBJECT(r)` |
| `{SELECT, SHOW, SUBSCRIBE}` | `CLUSTER(U)`, usually `OBJECT(r)` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Cara pointed out, this could prevent SHOW and other introspection queries which might not be desired. Do we want to special case those somehow? This one is tricky to get right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one idea that has been shot down before, but might warrant reconsideration in this context. All SHOW commands are run in the mz_introspection cluster, and all roles are automatically given CLUSTER(U) for mz_introspection and OBJECT(r) for builtin objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It always seemed a little strange to make users hop clusters to run introspection queries, so that option is worth considering for ux alone.

Flagging that there are some mz_introspection use cases we missed in the initial spec, which I'm adding there too:

  • Support uses mz_introspection clusters to debug client environments
  • They need to be able to to SELECT across all mz_* tables and execute EXPLAIN and DESCRIBE, ideally across clusters
  • Current permissions on this cluster are hardcoded in environmentd?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about select queries on objects that don't inherently exist on a cluster?
Would you need to be in a cluster that you have CLUSTER(U) privileges for that cluster when you run the SELECT?

Aka we're saying that anyone who has access to Materialize should have usage access to at least one cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you need to be in a cluster that you have CLUSTER(U) privileges for that cluster when you run the SELECT?

For this design yes. The rationale is that your select will use some of the compute resources in the current cluster, and therefore you will need usage privileges on that cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it makes sense.
@ggnall FYI - for our comms / as we think about our guidance

@jkosh44 jkosh44 requested a review from chaas February 16, 2023 20:43
@jkosh44 jkosh44 mentioned this pull request Feb 21, 2023
4 tasks
@jkosh44
Copy link
Contributor Author

jkosh44 commented Feb 21, 2023

I was thinking that we'd be able to leave pre-defined roles out of this initial project, but after a conversation on slack I think that we'll need to include them. I'll update the design to add a section for pre-defined roles. I won't change any of the existing sections, so it shouldn't affect any on-going reviews.
pre-defined roles: https://www.postgresql.org/docs/current/predefined-roles.html
slack conversation: https://materializeinc.slack.com/archives/C02FWJ94HME/p1676998963912989

@jkosh44
Copy link
Contributor Author

jkosh44 commented Feb 21, 2023

I was thinking that we'd be able to leave pre-defined roles out of this initial project, but after a conversation on slack I think that we'll need to include them. I'll update the design to add a section for pre-defined roles. I won't change any of the existing sections, so it shouldn't affect any on-going reviews. pre-defined roles: https://www.postgresql.org/docs/current/predefined-roles.html slack conversation: https://materializeinc.slack.com/archives/C02FWJ94HME/p1676998963912989

I made the updates, it did end up touching a couple of existing sections, but only very slightly. I kept it all in a single small commit that you can look at.

Copy link
Contributor

@maddyblue maddyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Role/public changes lgtm.

@benesch
Copy link
Member

benesch commented Feb 22, 2023

I'm planning to review this by the end of the week!

Copy link
Contributor

@chaas chaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came across this comment by @mjibson on another issue #13430 (comment) about preventing the creation of duplicate indexes

A concern should we implement RBAC: a user could be denied read access to a mv/index due to RBAC, but then this feature might also prevent them from creating an index/mv of their own. We should thus perhaps issue a NOTICE that there was a duplicate, pointing to the first one, and the user can then decide to immediately DROP what they created.

It's a good question about what we should do if someone tries to create an object that already exists but they don't have access to, e.g. a database of schema. Is it leaking information somehow by producing an error? Curious what postgres does.

Comment on lines 88 to 89
When a new user logs in, we will create a new role for them with only the `LOGIN` and `INHERIT` attributes. If that user
is a frontegg admin, then the role will also have the `SUPERUSER` attribute.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggnall Do have an opinion on the following related scenario: A user logs in AND they are a Frontegg admin AND they have an existing role in Materialize AND the existing role is not a super user. Should we update that Materialize role to be a super user or leave it alone?

(FYI @mjibson)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this scenario play out on every login? Or only on the initial one?

When we eventually decouple Account permissions (or build our own), it probably makes sense to have SQL/RBAC be the system of record on who can do what.

From that perspective, an existing explicitly GRANTed role should take precedence. This would enable a case where a user needs to manage users, billing, etc., but should have limited or no database access.

The simplest thing to explain is: "Account admins are always SUPERUSERS", but it seems reasonable to say "Account admins are always SUPERUSERS by default, unless they are explicitly granted another role".

There will be extra steps in the latter case to upgrade users:

  • OrgAdmin -> OrgMember: would retain explicitly granted roles
  • OrgMember -> OrgAdmin: would retain explicitly granted roles. Another SUPERUSER would have to remove existing roles or GRANT them SUPERUSER.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From another perspective, I'm not 100% sure how important this scenario is:

This would enable a case where a user needs to manage users, billing, etc., but should have limited or no database access.

If it's simpler for us to implement "Account admins are always SUPERUSERS" and it's reversible, we could start there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggnall The above scenario is about a frontegg user who was initially a non-admin frontegg user, but was then later granted admin in frontegg. Should we bump up their perms in mz? Also applies the other way around: if a frontegg user later gets their admin role revoked, should we also revoke mz superuser?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjibson We may have crossed messages but for:

"Account admins are always SUPERUSERS"

...Yes, frontegg admins grants/revokes should probably be paired with mz superuser grants/revokes. It would make sense to keep these in sync, so that users that invite new members to their organization are also able to fully manage access.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're never allowed to specify LOGIN or SUPERUSER when you create a role. The role in Materialize always exists with neither of those characteristics.
...
When you log in to a role with Frontegg and your JWT has the "Organization Admin" role, you implicitly get SUPERUSER privilege for that session.

One consequence of this is that we lose the mechanism to revoke SUPERUSER from someone while they're logged in. So as long as someone remains connected, they retain SUPERUSER after it's been revoked in Frontegg. Unless, we invent some new syntax for this and make sure to always update this field in the session before executing a command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're never allowed to specify LOGIN or SUPERUSER when you create a role. The role in Materialize always exists with neither of those characteristics.
...
When you log in to a role with Frontegg and your JWT has the "Organization Admin" role, you implicitly get SUPERUSER privilege for that session.

One consequence of this is that we lose the mechanism to revoke SUPERUSER from someone while they're logged in. So as long as someone remains connected, they retain SUPERUSER after it's been revoked in Frontegg. Unless, we invent some new syntax for this and make sure to always update this field in the session before executing a command.

Or we have some periodic Frontegg refresh that periodically re-authenticates through Frontegg and updates these attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I've updated the doc with what we've discussed in this thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we have some periodic Frontegg refresh that periodically re-authenticates through Frontegg and updates these attributes.

We've already got a task that periodically refreshes access tokens for pgwire connections!

.and_then(|token| frontegg.continuously_validate_access_token(token, user.clone()))

Would be pretty easy to stitch in SUPERUSER updates to that task, I think. Would only apply to pgwire sessions and not HTTP sessions, but I think that's ok; pgwire sessions tend to be long lived while http sessions tend to be short lived.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we have some periodic Frontegg refresh that periodically re-authenticates through Frontegg and updates these attributes.

We've already got a task that periodically refreshes access tokens for pgwire connections!

.and_then(|token| frontegg.continuously_validate_access_token(token, user.clone()))

Would be pretty easy to stitch in SUPERUSER updates to that task, I think. Would only apply to pgwire sessions and not HTTP sessions, but I think that's ok; pgwire sessions tend to be long lived while http sessions tend to be short lived.

I've updated the doc to include this.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Feb 24, 2023

It's a good question about what we should do if someone tries to create an object that already exists but they don't have access to, e.g. a database of schema. Is it leaking information somehow by producing an error? Curious what postgres does.

There are actually two scenarios here:

  1. A user does not have any privileges on a schema. In this case the existence of a table is not leaked:
postgres=> CREATE TABLE bar.t1 ();
ERROR:  permission denied for schema bar
LINE 1: CREATE TABLE bar.t1 ();
                     ^
postgres=> SELECT * FROM bar.t1;
ERROR:  permission denied for schema bar
LINE 1: SELECT * FROM bar.t1;
                      ^
  1. A user has create privileges but not usage privileges on a schema. In this case the existence of a table is leaked:
postgres=> CREATE TABLE bar.t1 ();
ERROR:  relation "t1" already exists
postgres=> SELECT * FROM bar.t1;
ERROR:  permission denied for schema bar
LINE 1: SELECT * FROM bar.t1;
                      ^
postgres=> CREATE TABLE bar.t2 ();
CREATE TABLE
postgres=> SELECT * FROM bar.t2;
ERROR:  permission denied for schema bar
LINE 1: SELECT * FROM bar.t2;
                      ^

Though this is such a weird scenario we should either not care or outright prevent it.

However, after re-reading @mjibson a couple of times I think it was actually about something else.

A concern should we implement RBAC: a user could be denied read access to a mv/index due to RBAC, but then this feature might also prevent them from creating an index/mv of their own. We should thus perhaps issue a NOTICE that there was a duplicate, pointing to the first one, and the user can then decide to immediately DROP what they created.

What I think he's saying is that we should allow duplicate indexes to exist, because certain users may only have access to a single index and not the other.

Copy link
Member

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally made it through this! Thanks very much for writing this up, @jkosh44. Looks really good!

I felt like the material here was vast enough that I couldn't engage with it all, but also that there is quite a bit more to design (e.g., how are privileges actually represented in the catalog). I think it's exactly right to pause around here and start on the implementation, but I'd love to see this design document continually revised + updated during the implementation.

With that said, I think we should add the following two sections up front:

  • Rollout plan

    How can we start landing RBAC PRs incrementally without committing to syntax or semantics? I assume we can put GRANT, REVOKE, etc. behind unsafe mode. But how do we start adding privilege checks when users don't have access to GRANT or REVOKE? Is it possibly as easy as continuing to give everyone SUPERUSER privilege at the moment? Then privilege checks just won't apply. When we're ready to flip the switch, we remove the unsafe mode toggles and also remove SUPERUSER privileges from "Organization Members".

    Probably some holes in that braindump, so fleshing a plan out in the design doc would be great. Definitely don't need to flesh out the whole rollout—just enough to describe how we'll land the first batch of PRs.

  • Testing plan

    I imagine the plan here is mostly "write lots of SLT and testdrive tests to test the SQL implementation, and write one or two custom Rust/mzcompose tests to test the Frontegg integration." Whatever it is, we should write that down, and make sure @philip-stoev and @def- are aware of this ASAP. I'm sure they'll have no shortage of tests to write.

| role creation | `CREATEROLE` | Can create, alter, drop, grant membership to, and revoke membership from roles. | Yes |
| inheritance of privileges | `INHERIT` | Can inherit the privileges of roles that it is a member of. On by default. For this project we can keep this as a mandatory attribute. | Yes |
| cluster creation | `CREATECLUSTER` | Can create a cluster. | No |
| persisted data creation | `CREATEPERSIST` | Can write data to S3, either directly or indirectly (for example inserting into a table would be directly and creating a source would be indirectly). | No |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd to me! Is there something special about persist that means we want to promote this to a per role permission? I would lean for simply not having this permission.

If so, we should find a different name, since “persist” isn’t a user facing concept anywhere else!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind this was to differentiate between being able to create objects like views, which don't persist large amounts of data in S3, and being able to create objects like sources, which may persist large amounts of data in S3.

I personally would also lean towards removing this, so I'll probably just remove it, unless that explanation changed your mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjibson FYI, in case you want to weigh in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one of the problems users have is "limit who can spend money", and money is spent by creating replicas or writing things to s3, it made sense to me that both replica creation and s3 writing should be things that need granting. I don't care too much either way, and if no users need this for now then no objection to removing it.

Comment on lines 88 to 89
When a new user logs in, we will create a new role for them with only the `LOGIN` and `INHERIT` attributes. If that user
is a frontegg admin, then the role will also have the `SUPERUSER` attribute.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, I think the next logical question is, if we don't allow specifying the LOGIN attribute via SQL, then should we be allowing any RBAC via SQL? We should at least prevent users from typing SUPERUSER because that also will be overwritten by Frontegg on login. Though perhaps we should completely remove CREATE ROLE, ALTER ROLE, DROP ROLE, GRANT, and REVOKE.

Frontegg's role and permission system is nowhere near powerful enough to be used directly for SQL RBAC, is the problem! You can't have roles that inherit from other roles, and you can't (easily) create privileges associated with objects in SQL. Like, if a user makes a table, something would have to add Frontegg privileges for INSERT and SELECT on that source in Frontegg. It would be hard and messy.

What makes sense to me is to tightly couple to Frontegg, but only for the minimum required to map the "Organization Admin" and "Organization Member" roles sensibly to SQL RBAC-land.

Reading through this discussion gives me the following idea:

  • You're never allowed to specify LOGIN or SUPERUSER when you create a role. The role in Materialize always exists with neither of those characteristics.
  • When you log in to a role with Frontegg, you implicitly get LOGIN privilege for that session.
  • When you log in to a role with Frontegg and your JWT has the "Organization Admin" role, you implicitly get SUPERUSER privilege for that session.
  • The mz_roles table makes no mention of these privileges because they are not inherent properties of the role, but rather properties of a given session.
  • We introduce an mz_has_superuser() function (or maybe there's some existing PostgreSQL function we can piggyback on) that reports whether the current session has SUPERUSER privilege.
  • In the future, if necessary, we can expose an mz_sessions table that shows all live sessions, their roeles, and whether they have SUPERUSER privileges.

This avoids a key pitfall with syncing LOGIN and SUPERUSER status to mz_roles when a user logs in: you can't actually be sure the user is logging in with an up-to-date JWT. Suppose a user is downgraded from "Organization Admin" to "Organization Member" while actively using Materialize. They might end up with two JWTs, one saying they're an admin and one saying they're a member. There's no guarantee they use these JWTs in the right order, and they might log in second with the older JWT, and now mz_roles will sync in stale information.

By instead associating SUPERUSER privilege on a session-by-session basis, we sidestep this problem entirely. The session privileges will exactly reflect the privileges in the JWT used to start the session.

doc/developer/design/20230216_role_based_access_control.md Outdated Show resolved Hide resolved
doc/developer/design/20230216_role_based_access_control.md Outdated Show resolved Hide resolved
doc/developer/design/20230216_role_based_access_control.md Outdated Show resolved Hide resolved
doc/developer/design/20230216_role_based_access_control.md Outdated Show resolved Hide resolved
doc/developer/design/20230216_role_based_access_control.md Outdated Show resolved Hide resolved
doc/developer/design/20230216_role_based_access_control.md Outdated Show resolved Hide resolved
Comment on lines +378 to +380
- Do we want different `SELECT` privileges based on if a new dataflow will be spun up or if we can use an existing one?
- Pros: We can differentiate between using existing compute resources vs creating new ones when reading.
- Cons: Users (and the database) are unable to determine if they're allowed to execute a read until after that read
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My take is we should avoid this for now. It's a pretty darn specific privilege, and it's one that we can always introduce later (FAST_PATH_ONLY_USAGE or something). Would be good to suss out how much of a need for this there actually is.

I'm not sure that it will be common for e.g. one team to say "this is our cluster, feel free to issue fast queries but absolutely no slow queries"—especially since enough fast queries can probably meaningfully impact cluster performance. Would rather encourage folks to create materialized views and say "if you want to query our views, turn on your own cluster and build your own indexes."

For example, an alternative to explore here is a session parameter that prevents you from running non-fast-path peeks. I can imagine you might want to opt into this for some applications where you want to be really sure that you don't accidentally have a regression (either in your application code or in Materialize's optimizer) that causes queries to suddenly start taking the slow path and crash the cluster.

doc/developer/design/20230216_role_based_access_control.md Outdated Show resolved Hide resolved
@chaas
Copy link
Contributor

chaas commented Feb 27, 2023

I assume we can put GRANT, REVOKE, etc. behind unsafe mode.

I think the plan is to deprecate unsafe mode and move toward LaunchDarkly flags. @ggnall can confirm LD is ready to go.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Feb 27, 2023

how are privileges actually represented in the catalog

I'm hoping to piggyback off of PostgreSQL for this, I just haven't put in the legwork yet to see how they do it.

Rollout plan

How can we start landing RBAC PRs incrementally without committing to syntax or semantics? I assume we can put GRANT, REVOKE, etc. behind unsafe mode. But how do we start adding privilege checks when users don't have access to GRANT or REVOKE? Is it possibly as easy as continuing to give everyone SUPERUSER privilege at the moment? Then privilege checks just won't apply. When we're ready to flip the switch, we remove the unsafe mode toggles and also remove SUPERUSER privileges from "Organization Members".

I like this plan with the caveat added that users that don't authenticate through Frontegg are not given superuser. Otherwise, we don't have a good way of testing everything if even our test users are given superuser status.

Testing plan

I imagine the plan here is mostly "write lots of SLT and testdrive tests to test the SQL implementation, and write one or two custom Rust/mzcompose tests to test the Frontegg integration." Whatever it is, we should write that down, and make sure @philip-stoev and @def- are aware of this ASAP. I'm sure they'll have no shortage of tests to write.

Yeah, I think it will look a lot like this. I'm also hoping to borrow any tests that PostgreSQL has in their repo.

I'll add all of this to the doc.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Mar 29, 2023

Now that I'm on the final (non-optional) phase, the implementation details for all phases are filled in, if anyone wants to take another look. Specifically I think the final phase's implementation is of particular interest: a448732.

One thing that may be challenging is that the PostgreSQL aclitem datatype does not store role names, it only stores role IDs:

https://github.com/postgres/postgres/blob/3aa961378b4e517908a4400cdc476ca299693de9/src/include/utils/acl.h#L48-L59

Yet when displaying aclitems, it shows the role names and not the role IDs:

postgres=# SELECT makeaclitem(16384, 10, 'CREATE', false);
  makeaclitem   
----------------
 joe=C/postgres
(1 row)

Looking the the makeaclitem implementation, it only uses role IDs and not role names:

https://github.com/postgres/postgres/blob/3aa961378b4e517908a4400cdc476ca299693de9/src/backend/utils/adt/acl.c#L1604-L1645

It isn't until the aclitem is converted to a string that the role names filled in:

https://github.com/postgres/postgres/blob/3aa961378b4e517908a4400cdc476ca299693de9/src/backend/utils/adt/acl.c#L614-L682

This means that the conversion to a string needs access to the catalog. If I'm remembering correctly, Materializes does not have catalog access when formatting a Datum to a string.

@maddyblue
Copy link
Contributor

That sounds identical to the regtype and regproc types which are OIDs under the covers but implement a cast to string that knows how to print itself, which has access to the catalog.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Mar 29, 2023

That sounds identical to the regtype and regproc types which are OIDs under the covers but implement a cast to string that knows how to print itself, which has access to the catalog.

Oh very cool, I did not know about these types. However, it looks like we don't follow the same semantics here as PostgreSQL:

postgres=# SELECT 16::regtype;
 regtype 
---------
 boolean
(1 row)
materialize=> SELECT 16::regtype;
 regtype 
---------
 16
(1 row)

@maddyblue
Copy link
Contributor

That's due to some other postgres casting rule we don't yet follow. We are indeed able to convert those to strings, you just have to ask:

mjibson=> select 16::regtype;
 regtype 
---------
 16
(1 row)

Time: 59.627 ms
mjibson=> select 16::regtype::text;
 text 
------
 bool
(1 row)

But the impl should look something like

(RegType, String) => Explicit: sql_impl_cast("(

@jkosh44
Copy link
Contributor Author

jkosh44 commented Mar 29, 2023

That's due to some other postgres casting rule we don't yet follow.

I'm not sure if that's 100% correct. I don't think that PostgreSQL implicitly casts regtype to a text, it just has custom display logic for regtype:

postgres=# SELECT 16::regtype;
 regtype 
---------
 boolean
(1 row)

postgres=# SELECT pg_typeof(16::regtype);
 pg_typeof 
-----------
 regtype
(1 row)

postgres=# SELECT 16::regtype::text;
  text   
---------
 boolean
(1 row)

postgres=# SELECT pg_typeof(16::regtype::text);
 pg_typeof 
-----------
 text
(1 row)

I'm not sure where the casting is implemented, but the formatting is implemented here: https://github.com/postgres/postgres/blob/8e5eef50c5b41fd39ad60365c9c1b46782f881ca/src/backend/utils/adt/regproc.c#L1223-L1269

The return type of that function is explicitly cstring:
https://github.com/postgres/postgres/blob/8e5eef50c5b41fd39ad60365c9c1b46782f881ca/src/include/catalog/pg_proc.dat#L7119-L7121

and not text like some other functions that are know to return text, like system_user:
https://github.com/postgres/postgres/blob/8e5eef50c5b41fd39ad60365c9c1b46782f881ca/src/include/catalog/pg_proc.dat#L1520-L1522

which makes me think that PostgreSQL is not implicitly casting regtype to text.

It makes sense to me that the casting works in Materialize because it is implemented on the main coord thread that has access to the catalog, but formatting happens on the pgwire client thread which doesn't have access to the catalog.

We probably want to keep the proposed column as a maclitem[] type and not text[], because maclitem carries more semantic meaning internally than text and is not denormalized. However, then we'd have to somehow force all queries against maclitem to be implicitly cast to text, which is a bit of a strange behavior.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Mar 29, 2023

All this is just if we want maclitem to be formatted nicely like aclitem, which is probably required for certain meta-commands to work properly but not strictly needed for RBAC to work properly. Otherwise we can just format maclitem using raw role IDs and have it look ugly.

@maddyblue
Copy link
Contributor

it just has custom display logic for regtype

This is a part of postgres I don't yet understand. The type of regtype is regtype (==oid), but it always returns a string datum. I don't get how it's correct for pg to announce it's a regtype, but then serialize it as a string always.

- We will create a new Rust struct called `AclMode` that is a bit flag containing all privileges.
- See https://github.com/postgres/postgres/blob/3aa961378b4e517908a4400cdc476ca299693de9/src/include/nodes/parsenodes.h#L74-L101
for PostgreSQL's implementation.
- We will create a new datatype called `maclitem` in the `mz_internal` schema to represent a single privilege.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to add a new type, it needs to be described more. Can you list the operators, functions, and casts it supports. Same for the array version of it. I see pg has some details at https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-ACLITEM-OP-TABLE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also how they will be serialized over pgwire in text and binary. (This is probably just: "exactly the same as postgres" but worthwhile to do the investigation for how postgres does it first to make sure we also can.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added these details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simpler approach, that I'm starting to like better, is to add the following system table:

CREATE TABLE mz_privileges (
    object_id: Text, -- ID of object.
    grantee: Text, -- ID of role that has a certain privilege.
    grantor: Text, -- ID of role that granted the privilege.
    privileges: Text, -- Letter codes of privileges concatenated together.
);

maclitem is slightly more complicated than I was hoping for because our repr layer doesn't have access to RoleIds.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Mar 29, 2023

it just has custom display logic for regtype

This is a part of postgres I don't yet understand. The type of regtype is regtype (==oid), but it always returns a string datum. I don't get how it's correct for pg to announce it's a regtype, but then serialize it as a string always.

I think this is the case for all non-trivial data types though, right? For example internally an interval is one 64-bit integer and two 32-bit integers:

https://github.com/postgres/postgres/blob/8e5eef50c5b41fd39ad60365c9c1b46782f881ca/src/include/datatype/timestamp.h#L44-L53

However, when interval's are displayed to the user they are first formatted as a string:

postgres=# SELECT make_interval(1, 2, 3, 4, 5, 6, 7);
         make_interval          
--------------------------------
 1 year 2 mons 25 days 05:06:07
(1 row)

postgres=# SELECT pg_typeof(make_interval(1, 2, 3, 4, 5, 6, 7));
 pg_typeof 
-----------
 interval
(1 row)

postgres=# SELECT make_interval(1, 2, 3, 4, 5, 6, 7)::text;
         make_interval          
--------------------------------
 1 year 2 mons 25 days 05:06:07
(1 row)

postgres=# SELECT pg_typeof(make_interval(1, 2, 3, 4, 5, 6, 7)::text);
 pg_typeof 
-----------
 text
(1 row)

Even for simple ones like dates and times, which are just a single integer, there is custom formatting when displaying to the user:

https://github.com/postgres/postgres/blob/8e5eef50c5b41fd39ad60365c9c1b46782f881ca/src/include/utils/date.h#L23-L25

postgres=# SELECT make_time(1, 2, 3);
 make_time 
-----------
 01:02:03
(1 row)

postgres=# SELECT pg_typeof(make_time(1, 2, 3));
       pg_typeof        
------------------------
 time without time zone
(1 row)

postgres=# SELECT make_time(1, 2, 3)::text;
 make_time 
-----------
 01:02:03
(1 row)

postgres=# SELECT pg_typeof(make_time(1, 2, 3)::text);
 pg_typeof 
-----------
 text
(1 row)

postgres=# 
postgres=# SELECT make_date(1, 2, 3);
 make_date  
------------
 0001-02-03
(1 row)

postgres=# SELECT pg_typeof(make_date(1, 2, 3));
 pg_typeof 
-----------
 date
(1 row)

postgres=# SELECT make_date(1, 2, 3)::text;
 make_date  
------------
 0001-02-03
(1 row)

postgres=# SELECT pg_typeof(make_date(1, 2, 3)::text);
 pg_typeof 
-----------
 text
(1 row)

@maddyblue
Copy link
Contributor

when displaying to the user

What does this mean? Like, what's the difference between a regproc displayed to the user and not displayed to the user? How does it determine if something is displayed to the user? Is it anything in the final SELECT projection? It feels like there's some data layer mz lacks here.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Mar 29, 2023

when displaying to the user

What does this mean? Like, what's the difference between a regproc displayed to the user and not displayed to the user? How does it determine if something is displayed to the user? Is it anything in the final SELECT projection? It feels like there's some data layer mz lacks here.

For each datatype in PostgreSQL, there's a X_out function that serializes the datatype into a c-string, for example the regtypeout function linked above. I'm not exactly sure at what point this is called in PostgreSQL, but it's my understanding that it's called on all the values in every row right before that row is sent over the wire to the client when sending a query response (there may be some nuances here that I don't understand between the text and binary protocol).

Our equivalent of this is when we encode values here:

/// Serializes this value to `buf` in the specified `format`.
pub fn encode(&self, ty: &Type, format: Format, buf: &mut BytesMut) -> Result<(), io::Error> {
match format {
Format::Text => {
self.encode_text(buf);
Ok(())
}
Format::Binary => self.encode_binary(ty, buf),
}
}

The big difference is that when we encode values, we don't have access to the catalog.

what's the difference between a regproc displayed to the user and not displayed to the user

I think fundamentally the break down is as follows:
Not displayed to the user would be how a regproc is stored on disk and how it's stored in memory when being manipulated by functions and operators.
Displayed to the user would be how a regproc is serialized before being sent over the wire to a client.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Mar 29, 2023

Though the more I think about it, the more I think that it's probably fine if maclitem matches retype and encodes itself as text using raw IDs instead of role names.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Mar 29, 2023

Sorry I don't think I fully understood what was going on at first which is why my language was inexact/vague. I understand what's happening now, so here's a summary that uses more exact language to avoid confusion.

Currently in Materialize, our text encoding of the regtype and regproc for the pgwire protocol does not match PostgreSQL. The text encoding that ProstgreSQL uses for regtype and regproc uses the type and function's human readable name instead of raw OIDs. The text encoding that Materialize uses for regtype and regproc uses the raw OIDs.

The reason this discrepancy exists is because PostgreSQL has access to the catalog when doing text encoding, while Materialize does not.

As a corollary, the text encoding of the proposed type, maclitem will use raw role IDs instead of human readable role names.

I'll add this to the design doc.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Apr 19, 2023

I'm going to go ahead and merge this. If there's updates later then I'll open a new PR.

@jkosh44 jkosh44 merged commit 6a36355 into MaterializeInc:main Apr 19, 2023
@jkosh44 jkosh44 deleted the rbac-design branch April 19, 2023 20:03
@jkosh44 jkosh44 mentioned this pull request Apr 24, 2023
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants