feat: add request.param guc #1710

Draft
wants to merge 1 commit into base: main

Conversation

steve-chavez
Member

As discussed on #915 (comment). This adds the request.param GUC.

This is the draft implementation. It's not efficient because it leads to a lot of new prepared statements (no reuse).
Each query param in ?select=id,name&id=eq.1&name=eq.project&order=id will get its own parameter:

select 
  set_config('request.param.id', $1, true),
  set_config('request.param.name', $2, true),
  set_config('request.param.select', $3, true),
  set_config('request.param.order', $4, true);

Ideas

  • Convert all the query parameters to a single JSON, like mentioned in Change SET LOCAL gucs to set_config #1600 (comment). This would lead to reusing prepared statements (a sketch follows this list).

  • How about having a single GUC for the filters only?

    • It can be like request.param.filters. This would be a JSON object: {"id": "eq.1", "name": "eq.project"}. There wouldn't be a request.param.id or request.param.name.
    • With this we could disable update/delete with no filters globally by using pre-request. It'd be easier to validate that filters are present by having them on a single json.
    create or replace function update_delete_restrictions() returns void as $$
    declare
      req_filters text = nullif(current_setting('request.param.filters', true), '{}');
      req_method  text = current_setting('request.method', true);
    begin
      if req_method similar to 'PATCH|DELETE' and req_filters is null then
        raise exception 'UPDATE or DELETE is not allowed without filters'
          using hint = 'Add filters to the request';
      end if;
    end $$ language plpgsql;

    (this would be more flexible than doing it as a config option, as proposed on Cancel query if user cancels request #699 (comment))
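
A minimal sketch of the single-JSON variant from the first idea above (the GUC name request.params is only illustrative):

-- one bind parameter no matter how many query params the request has,
-- so the prepared statement can be reused across requests
select set_config('request.params', $1, true);

-- with $1 bound to the whole query string as a JSON object, e.g.
-- '{"select": "id,name", "id": "eq.1", "name": "eq.project", "order": "id"}'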

@wolfgangwalther wolfgangwalther changed the base branch from master to main December 31, 2020 14:10
@wolfgangwalther
Member

Hm. I'm not entirely convinced that I would want to have all the request params set as GUCs. Every GUC has a performance penalty when setting it: I remember from the SQL tests we did with the SET LOCAL stuff that the number of GUCs had a linear impact on the query time. There is no way to opt out of this.

It'd be easier to validate that filters are present by having them on a single json.

I agree with that.


Taking a step back for a moment, I wonder whether we can design something that would allow me to opt in to all kinds of request GUCs, but have none as the default - for the best-performing case.

What if we did:

  • remove all set_config except role and search_path.
  • pass those settings in to the hook functions we have (currently only db-pre-request, but there might be more later; see Expose raw body in request #1661 (comment)).
  • make it optional by using named arguments on those hooks, similar to how fixtures work with pytest.

Example:

CREATE FUNCTION my_pre_request(method TEXT, filters JSON) RETURNS void
LANGUAGE plpgsql AS $$
  ... similar to the function in the opening post ...
$$; 

Here, we would parse the function definition together with the schema cache and would then know which arguments we need to pass in.
We could easily add an example pre-request function that uses set_config to restore current behaviour to postgrest-contrib.
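
For instance, a contrib function restoring the current behaviour could fan the passed-in JSON back out into per-header GUCs (a sketch; the function name and the request.header.* convention are assumptions):

CREATE FUNCTION contrib_pre_request(headers JSON) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
  -- one set_config per header, local to the transaction
  PERFORM set_config('request.header.' || key, value, true)
    FROM json_each_text(headers);
END
$$;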

This would have the advantage that we could choose which variables to pass / set and could leave a big chunk out in most cases - improving performance.

We would still only need one round-trip to the database, so it should not perform worse.


Thinking more about the naming of arguments, we could support something like a json path in those, to be able to select only a sub-key:

CREATE FUNCTION my_pre_request("headers.my_custom_header" TEXT, filters JSON)

This would provide all filters, but only one of the headers.
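
For illustration, the call PostgREST would build for that definition might look like this (hypothetical; deriving the argument list is the part that would come from the schema cache):

-- only the declared arguments get bound from the current request
SELECT my_pre_request(
  "headers.my_custom_header" := $1, -- e.g. 'some-value'
  filters                    := $2  -- e.g. '{"id": "eq.1"}'
);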

@steve-chavez
Member Author

pass those settings in to hook functions we have (currently only db-pre-request)
make it optional by using named arguments on those hooks, similar to how fixtures work with pytest.

Defining the GUCs in pre-request seems interesting.

A drawback is that if you only want to use restricted views or tables (#915 (comment)), you'd have to define the GUCs on pre-request as well, which makes the feature more complicated to use. Ideally one would only touch pre-request for the global cases.

An advantage is that it could allow more GUCs per endpoint, and it would not be a global config like it would be in the config file.

Not opposed to the idea; I need to give it more thought.


Other stuff that could be interesting for this feature:

@steve-chavez
Member Author

steve-chavez commented Mar 27, 2021

New idea for filter restrictions

How about this.

If we (ab)use our in-db config for defining restrictions, we could make them static checks that don't have to hit the database.

Example:

CREATE TABLE clients(
  id int
, name text
);

-- assuming our postgrest-contrib in place
SELECT pgrst.restrict_filter('clients', 'id');

Then a request that doesn't include a filter on id will get rejected as:

GET /clients

HTTP/1.1 403 Forbidden
{"hint":null,"details":null, "code": null, "message":"id filter must be present"}

To make that work, the pgrst.restrict_filter(..) would be translated to:

ALTER ROLE pgrst_authenticator SET pgrst.parameters_restrictions = $$
{
  "clients": "id"
}
$$;

-- or to a config file as
parameters-restrictions = '{"clients": "id"}'
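
A rough sketch of how the contrib function could do that translation, merging into any previously set value (the function body is hypothetical):

create or replace function pgrst.restrict_filter(route text, filter text)
returns void language plpgsql as $$
declare
  -- previously configured restrictions, if any
  restrictions jsonb = coalesce(
    current_setting('pgrst.parameters_restrictions', true), '{}'
  )::jsonb;
begin
  -- merge the new restriction and persist it on the authenticator role
  execute format(
    'alter role pgrst_authenticator set pgrst.parameters_restrictions = %L',
    restrictions || jsonb_build_object(route, filter)
  );
end $$;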

Advantages

  • We don't have to add GUCs to every request.
  • As I mentioned above, we can reject the request at the PostgREST level; it doesn't have to hit the db. It would be similar to how we reject non-existent embeds.
  • We could use this mechanism not only for tables/views but also for set returning functions.
SELECT pgrst.restrict_filter('rpc/get_clients', 'id');

Possibilities

  • Restricting PATCH/DELETE without filters could also be done in a similar manner.
SELECT pgrst.restrict_method('clients', '{PATCH,DELETE}', '{id,name}');
-- only allow patch/delete if id or name filters are present.
  • Though the functions interface is not yet defined, this could also aid in allowing them in this way:
SELECT pgrst.allow_function('clients', 'avg(salary)');
  • Allow/deny operators (someone asked for this before on gitter, only choice was proxy)
SELECT pgrst.allow_operators('clients', '{eq,gt,lt}');
SELECT pgrst.deny_operators('clients', '{fts,wfts,phfts,like}');
  • Enable group by/distinct globally or per route
SELECT pgrst.allow_group_by(global:=true);
SELECT pgrst.allow_group_by('clients');
SELECT pgrst.allow_distinct('clients');
SELECT pgrst.embedding_depth(5, global:=true);

(Just an idea; the form of these functions could use more work)


@wolfgangwalther What do you think? Do you see any drawbacks with this approach?

@wolfgangwalther
Member

If we (ab)use our in-db config for defining restrictions, we could make them static checks that don't have to hit the database.

Advantages

  • We don't have to add GUCs to every request.
  • As I mentioned above, we can reject the request at the PostgREST level; it doesn't have to hit the db. It would be similar to how we reject non-existent embeds.
  • We could use this mechanism not only for tables/views but also for set returning functions.
  • Restricting PATCH/DELETE without filters could also be done in a similar manner.
  • Allow/deny operators (someone asked for this before on gitter, only choice was proxy)
  • Enable group by/distinct globally or per route

@wolfgangwalther What do you think? Do you see any drawbacks with this approach?

Hm. My first reaction was: Why not do that at the proxy-level? This seems out of scope for PostgREST.

But giving it further thought... I actually like it. When discussing all the content negotiation stuff with Accept headers etc., I briefly had the idea of improving our current "router" situation. Right now we basically have a very basic router hardcoded: It just takes the route, maps it to a table or RPC name and then does a little bit of magic with the Accept header.

We could implement this in a much more generic and flexible way. And then we could load, as additional config per route, the kind of checks you mention. I'm not sure about using all kinds of GUCs / config options for those kinds of settings, though. Maybe we should just have a db-router-config setting that takes a table name. This table would have to follow a predefined schema and would hold all of those settings you mentioned above.

@steve-chavez
Member Author

Why not do that at the proxy-level? This seems out of scope for PostgREST.

Yeah, the line that separates proxy and PostgREST has blurred with time. Even the previous GUC request.param proposal or the whole discussion at #915 (comment) would be meaningless if we decided to do this in the proxy (which we can). But that would keep us from evolving and adding features that are valuable.

Also, we already act as a proxy for pg; after all, we reject some requests (invalid HTTP method, invalid JWT, etc.) without touching pg.

Additionally, doing REST restrictions with our functions seems superior to doing it inside RLS or a security-barrier-like condition on views. They're a PostgREST concern after all, so they should be stored in our config.

I briefly had the idea of improving our current "router" situation
Maybe we should just have a db-router-config setting that takes a table name.

Sounds interesting! But I think we should keep the design where the authenticator doesn't need a privilege on a table or usage on a schema. So perhaps we should keep doing it in our config (as a JSON field), or maybe later in a table inside our extension.

@wolfgangwalther
Member

I briefly had the idea of improving our current "router" situation
Maybe we should just have a db-router-config setting that takes a table name.

Sounds interesting! But I think we should keep the design where the authenticator doesn't need a privilege on a table or usage on a schema. So perhaps we should keep doing it in our config (as a JSON field), or maybe later in a table inside our extension.

Hm. The table approach is kind of a hack to achieve the same thing as schema variables would do. For schema variables, we'd need to have privileges, too. Yes, I was thinking about putting it in our extension. This idea was mainly targeted at cloud-based users, who can't use GUC-based configuration. Maybe we can even extend this idea:

How about making db-config take either a boolean or a string? The meaning of the boolean would stay the same. If it's set to a string, this is the schema-qualified name of a table that the authenticator user needs access to. This table can be a simple key/value store of all the config options we currently read from GUCs, basically replacing GUC loading with table loading. This would allow all cloud-based users to still maintain their config in the database.
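
A sketch of what such a table could look like (schema, table and role names are placeholders):

-- simple key/value store mirroring the config file options
CREATE TABLE postgrest.config (
  key   TEXT PRIMARY KEY, -- e.g. 'db-schemas', 'jwt-secret'
  value TEXT NOT NULL
);

-- the authenticator needs read access to load it
GRANT USAGE ON SCHEMA postgrest TO pgrst_authenticator;
GRANT SELECT ON postgrest.config TO pgrst_authenticator;

-- and the bootstrap config would point at it:
-- db-config = 'postgrest.config'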


In any case, I agree with keeping the filter settings you mentioned above in a single JSON map. This would allow the following generalization of our "routing".

We could create some types roughly along the lines of:

-- maps a list of path segments to a route configuration
type Router = M.Map [Text] Route

data Route = Route {
  rtTarget :: Target,
  -- ... other settings
  rtAllowedMethods :: [Method],
  rtAllowedOperators :: [Operator],
  rtAllowAggregation :: Bool,
  rtAllowedFunctions :: [Text],
  rtRequiredFilters :: [Text]
}

(Those types need to be adjusted to properly allow function overloading for RPCs, but the general idea would be the same.)

When parsing the request, we would match it to one of the routes. This implies that we can't respond to any unknown routes anymore. I remember we discussed this elsewhere already, though.

The default values for each route are generated from the schema cache. And then we add the json config option mentioned above (e.g. router-overlay) on top of the whole Router structure. This would allow us to override some of those values selectively. With some helper functions as mentioned above, this could be easily managed. In the future this could also allow much more customization, e.g. moving target tables/views/rpcs to completely different paths, etc.
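
As an illustration, such an overlay could live in the in-database config just like the restrictions above (the router-overlay option's GUC name and JSON shape are made up here):

ALTER ROLE pgrst_authenticator SET pgrst.router_overlay = $$
{
  "clients": {
    "required_filters": ["id"],
    "allowed_operators": ["eq", "gt", "lt"],
    "allow_aggregation": false
  }
}
$$;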

@wolfgangwalther
Member

Maybe wai-routing can be of help here, too. I haven't fully read it yet, but it sounds like those preconditions could be checked with it: https://hackage.haskell.org/package/wai-routing-0.3/docs/Network-Wai-Routing-Tutorial.html

@steve-chavez
Member Author

Every GUC has a performance penalty when setting it: I remember from the SQL tests we did with the SET LOCAL stuff that the number of GUCs had a linear impact on the query time

The above is still true. However, now that we use a single JSON for our GUCs, request.params would have a lesser impact.
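
For reference, reading one param back out of such a single-JSON GUC would look like this (assuming the name ends up being request.params):

-- NULL when the request carried no id filter
SELECT current_setting('request.params', true)::json ->> 'id';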

@steve-chavez
Member Author

The above ideas could also be implemented with a SECURITY LABEL extension, as discussed on #2442 (comment).

@rdlrt

rdlrt commented Oct 27, 2022

The option to access limit/offset - even if not other columns (controlled via config) - would also help us build RPCs that artificially apply custom pagination techniques. For my use case, modifying the RPC into a view currently works, combined with haproxy URL redirection for consistency. But a more consistent option would be nice, since we predominantly rely on an /rpc => /api map, which works for 80% of endpoints - except in cases where RPCs are not ideal for leveraging keyset pagination.
