Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to restrict which operators can be used and which props can be used for ordering #2442

Open
adrian-gierakowski opened this issue Aug 22, 2022 · 15 comments
Labels
idea Needs of discussion to become an enhancement, not ready for implementation

Comments

@adrian-gierakowski
Copy link

Environment

  • PostgreSQL version: (if using docker, specify the image)

14.3

  • PostgREST version: (if using docker, specify the image)

9.0.1.20220717

  • Operating system:

linux (nixos, debian)

Description of issue

By default, users of the api served by postgrest can execute queries using all available operators and sort the results based on all available columns. I'd like to be able to restrict which operators can be used (and with which columns) as well as only enable ordering on specific columns. I don't see any mention of such capabilities in the postgrest docs. What's the recommended way of doing this?

Thanks!

@steve-chavez
Copy link
Member

steve-chavez commented Aug 23, 2022

This would be an interesting feature. I don't know the interface yet but Ideally we'd reuse some PostgreSQL mechanism.

which operators can be used (and with which columns) as well as only enable ordering on specific columns.

For example, if we would expose operators based on indexes, I see that CREATE OPERATOR CLASS has a FOR SEARCH | FOR ORDER BY distinction.

The CREATE OPERATOR FAMILY can be schema namespaced, it would be great if we could have an operator family for postgREST.

(Mostly thinking out loud above, those interfaces might not work)

Another idea on #2028 (comment)

What's the recommended way of doing this?

For now you could use functions/views and only expose the columns you want to be searched on. The most restrictive way would be scalar functions(that don't return table valued types) which wouldn't allow any filter or order, those would have to be done internally.


Ideally pg would get the hint from:

grant EXECUTE on FUNCTION pg_catalog.int4eq(test.projects.id%TYPE,integer) TO postgrest_test_anonymous;
grant EXECUTE on FUNCTION pg_catalog.int4eq(test.projects.client_id%TYPE,integer) TO postgrest_test_anonymous;

Tthat =(int4eq) is only allowed on the id and client_id columns.

@steve-chavez steve-chavez added the idea Needs of discussion to become an enhancement, not ready for implementation label Aug 23, 2022
@steve-chavez
Copy link
Member

Another idea on #2028 (comment)

Using the YAML idea above but namespacing the configs to our role, it could be like:

ALTER ROLE authenticator SET pgrst.operators TO $$
available: 
  -lt: '<'
 - gt: '>' 
 - like: 'LIKE'
schema:
  table:
    - column1: lt, gt
    - column2: like
    - order: column3
$$;

@adrian-gierakowski adrian-gierakowski changed the title how to restrict which operators operators can be used and which props can be used for ordering how to restrict which operators can be used and which props can be used for ordering Aug 23, 2022
@steve-chavez
Copy link
Member

@adrian-gierakowski Instead of whitelisting the operations, what do you think of disallowing requests by their plan cost using the pg_plan_filter extension?

Using https://www.postgresql.org/docs/current/pgstatstatements.html, it seems possible to determine the query with the highest cost and set that as a cost limit.

@adrian-gierakowski
Copy link
Author

@adrian-gierakowski Instead of whitelisting the operations, what do you think of disallowing requests by their plan cost using the pg_plan_filter extension?

This does not sound like what I'm looking for.

What I'm looking for is something which would allow me to start by crafting a minimal API for a particular purpose, and then easily expand it as the need arises, making sure that any allowed queries can be server efficiently. The solution I'm looking for should also enable the generated OpenApi spec to reflect exactly what is allowed. And error messages returned for the rejected requests should clearly indicate why the reason for rejection (for example: "ordering by name is not allowed").

I could obviously slap a reverse proxy in front of the postgrest instance and reject unwanted queries, but that would required a re-implementation of the parser for the not so trivial postgrest query language on the proxy, which I think doesn't make sense, given that postgrest does the parsing already.

@steve-chavez
Copy link
Member

steve-chavez commented Aug 23, 2022

Fair points.

Some other ideas. Ideally pg would get the hint from:

grant EXECUTE on FUNCTION pg_catalog.int4eq(test.projects.id%TYPE,integer) TO postgrest_test_anonymous;
grant EXECUTE on FUNCTION pg_catalog.int4eq(test.projects.client_id%TYPE,integer) TO postgrest_test_anonymous;

That eq/=(int4eq) is only allowed on the id and client_id columns for the test.projects relation but that doesn't work.

Another idea is to have a convention on a function like:

CREATE FUNCTION eq(test.projects, integer) RETURNS bool
AS $_$ 
  SELECT $1.id = $2 
$_$ LANGUAGE SQL STABLE;

That wouldnn't work as well because we need the function to apply for integer types coming from different columns. Defining operators in this way could get long too.


Another option would be

CREATE FUNCTION "pgrst.projects.id.eq"(test.projects, test.projects.id%TYPE) RETURNS bool
AS $_$ 
  SELECT $1.id = $2 
$_$ LANGUAGE SQL STABLE;

CREATE FUNCTION "pgrst.projects.client_id.eq"(test.projects, test.projects.client_id%TYPE) RETURNS bool
AS $_$ 
  SELECT $1.client_id = $2 
$_$ LANGUAGE SQL STABLE;

@steve-chavez
Copy link
Member

There's another way that michelp suggested to me.

PostgreSQL has the SECURITY LABEL command, we could use it as:

SECURITY LABEL FOR pgrest ON TABLE foo.bar IS $$
  GRANT ORDER ON col TO anonymous
  GRANT USAGE ON col2.eq TO anonymous
$$;

The IS string is a label and the syntax is arbitrary, we can have a nice DSL there.

The benefit is that it would allow us to restrict ORDER or any other feature, unlike the FUNCTION approach.

The drawback is that it would require installing a postgresql extension but we can fallback to our in-db config or env vars using the same syntax.

@adrian-gierakowski
Copy link
Author

Nice, this sounds really promising! Thank you for giving it such a thorough consideration. Let me know if there is anything I could do to help make it happen.

@wolfgangwalther
Copy link
Member

PostgreSQL has the SECURITY LABEL command

I thought about using SECURITY LABEL to add metadata to database objects a while back, too. The big upside is, that we'd have our own "namespace", so we wouldn't conflict with other metadata, as in the COMMENT ON approach. The downside is, that SECURITY LABEL is currently not supported for all database objects. Examples are CONSTRAINT, which could be useful in embedding contexts, OPERATOR, in this context here, CAST in terms of input/output conversions, etc..

I'm not sure whether we should start attaching metadata to objects in this way, when we're that limited down the road.

@steve-chavez
Copy link
Member

steve-chavez commented Aug 27, 2022

The downside is, that SECURITY LABEL is currently not supported for all database objects. Examples are CONSTRAINT, which could be useful in embedding contexts, OPERATOR, in this context here, CAST in terms of input/output conversions, etc..

Not sure we'll need all of those but we could propose extending SECURITY LABEL in postgres core(feasible now that I have a couple of patches landed) and for past versions we could fallback to our in-db config.

Edit: Additionally, since we'll have to parse the label inside anyway, we could extend the approach this way

SECURITY LABEL FOR pgrest ON TABLE foo.bar IS $$
  HIDE CONSTRAINT mypkey MESSAGE WITH 'duplicate id'
$$;

Meaning we could use the label to refer to other objects of the table. Using SECURITY LABEL .. ON SCHEMA would allow us to annotate all possible db objects I think.

@wolfgangwalther
Copy link
Member

Edit: Additionally, since we'll have to parse the label inside anyway, we could extend the approach this way

SECURITY LABEL FOR pgrest ON TABLE foo.bar IS $$
  HIDE CONSTRAINT mypkey MESSAGE WITH 'duplicate id'
$$;

Meaning we could use the label to refer to other objects of the table. Using SECURITY LABEL .. ON SCHEMA would allow us to annotate all possible db objects I think.

One of the upsides of using COMMENT ON or SECURITY LABEL ON is, that those are tied to the objects they reference - so they will be automatically removed when the object is removed, too. No orphaned comments possible. But this is not the case with those "in-label-references".

The downside is, that SECURITY LABEL is currently not supported for all database objects. Examples are CONSTRAINT, which could be useful in embedding contexts, OPERATOR, in this context here, CAST in terms of input/output conversions, etc..

Not sure we'll need all of those but we could propose extending SECURITY LABEL in postgres core(feasible now that I have a couple of patches landed) and for past versions we could fallback to our in-db config.

Yeah, extending SECURITY LABEL to other objects would be great. I can imagine, however, that this will be rejected, because to make use of security labels e.g. for selinux, there would need to be a sensible way to do some kind of "access check" (via object_access_hook) on the objects to be added...

Another option could be to try to extend COMMENT ON in postgres core to support something like COMMENT FOR pgrst ON ..., similar to security labels FOR, essentially providing namespaces for comments.

@steve-chavez
Copy link
Member

steve-chavez commented Aug 29, 2022

One of the upsides of using COMMENT ON or SECURITY LABEL ON is, that those are tied to the objects they reference - so they will be automatically removed when the object is removed, too. No orphaned comments possible.

Hm, I don't see how's that a dealbreaker for going with SECURITY LABEL. Orphaned metadata can be noticed and cleaned up on git-versioned database code and we could warn or err when trying to apply it. After all we're going to have to parse the label metadata on Haskell code.

Also consider that we'd need to support the in-db config and the other configs as fallback too

ALTER ROLE authenticator SET pgrest.sec.table."foo.bar" IS $$
  HIDE CONSTRAINT mypkey MESSAGE WITH 'duplicate id'
$$;

And those could leave orphaned metadata too

@steve-chavez
Copy link
Member

I've been thinking this could be kept friendlier(less cognitive overload for the user) and conciser if we use plain HTTP syntax for an allowlist.

GET /clients?select=projects(),tasks()&id=eq.*&name=like.*&order=name # users can only use eq on id, like on name, order on name and only embed on projects and tasks

PATCH /clients?id=eq.* # users can only patch by using the id filter

Every other request would fail. This allowlist could limit the Schema Cache and also the OpenAPI output.

(Specific columns in select are not included here because we want users to secure those in the database)


Ideally there would be a mode where PostgREST starts without an allowlist but can generate it based on the requests it receives. So users can start with an unrestricted instance, build a test suite, run it against the instance and have it generate the allowlist(likely with an endpoint on /admin). This would avoid having to create the allowlist manually.

@adrian-gierakowski
Copy link
Author

@steve-chavez this sounds great!

@steve-chavez
Copy link
Member

Just so I don't forget.

This could be more easily implemented with #1710 plus a pre-request function. With that we could filter postgREST query params on SQL.

pre-request might not be as performant but that might change with pipeline mode.

@steve-chavez
Copy link
Member

steve-chavez commented May 16, 2023

So I've been trying hard to restrict operators using pg itself(ref). The interface turns out to be clunky and requires a lot of database objects. And that doesn't even cover restricting order.

Instead of trying to encode HTTP concerns into pg. What if we take an OpenAPI spec as input?

The user can take our openapi output and then store it in a db function, say db-openapi-input-spec. And simply remove all the unwanted query params. This would solve this issue.


Notes:

  • A YAML openapi would be nicer to edit.
  • Seems simpler to implement as it won't require using the schema cache. It would be purely config.
  • We would need to solve Improvements to query language #2066 first to restrict operators?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
idea Needs of discussion to become an enhancement, not ready for implementation
Development

No branches or pull requests

3 participants