Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] HTTP headers filtering - whitelist transaction-scoped settings #1941

Open
dwagin opened this issue Sep 9, 2021 · 23 comments
Labels
enhancement a feature, ready for implementation

Comments

@dwagin
Copy link
Contributor

dwagin commented Sep 9, 2021

It's good to introduce an option for filtering HTTP headers passed to PostgreSQL.

For example, request-headers-allowlist, with default value is * (meaning all headers are passed).

There're many unnecessary set_config() commands in every transaction.

I've gotten 22 params at all, where a 50% of them are HTTP headers, a lot of people doesn't use (or uses small portion) it. We use only two of 11.

@steve-chavez
Copy link
Member

There're many unnecessary set_config() commands in every transaction.

Related to #1857. We'll need to set all the headers in a single set_config for postgres 14.

When we do that, perhaps passing all the headers won't be a problem?

For example, request-headers-allowlist, with default value is * (meaning all headers are passed).

I see no harm in offering such a config option though, it would reduce the passed json payload for #1857 as well.

@steve-chavez steve-chavez added difficulty: beginner Pure Haskell task enhancement a feature, ready for implementation labels Sep 9, 2021
@wolfgangwalther
Copy link
Member

A different approach, compared to another config option, that achieves the same thing, but is much more flexible down the road is mentioned in #1710 (comment).

@dwagin
Copy link
Contributor Author

dwagin commented Sep 16, 2021

I see no harm in offering such a config option though, it would reduce the passed json payload for #1857 as well.

@steve-chavez, of course, proposed enhancement would reduce JSON-payload, that decrease request overhead time.

A different approach, compared to another config option, that achieves the same thing, but is much more flexible down the road is mentioned in #1710 (comment).

@wolfgangwalther, your enhancement doesn't filter headers on PostgREST level, only on PostgreSQL level. Therefore, these features can complement each other, not replace.

@wolfgangwalther
Copy link
Member

@wolfgangwalther, your enhancement doesn't filter headers on PostgREST level, only on PostgreSQL level.

That's not correct. Implementing the full proposal would allow filtering the headers on the PostgREST level already. It would be set up via arguments to a pre-request function - but those headers that are not used, will never be sent to PostgreSQL.

@dwagin
Copy link
Contributor Author

dwagin commented Sep 16, 2021

@wolfgangwalther So I don't understand how it works, please tell me in more detail.

@wolfgangwalther
Copy link
Member

@wolfgangwalther So I don't understand how it works, please tell me in more detail.

The proposal starts by stripping all set_config(...)s - so you'd start from zero. No headers transmitted to PostgreSQL at all.

At schema cache creation time, we'd read the argument list of the pre-request function definition. If that reads something like CREATE FUNCTION your_pre_req ("headers.your_first_header" TEXT, "headers.your_second_header" TEXT) ... only those two headers would be passed as arguments to the pre-request function. If you needed the headers in any other place than the pre-request function body, you could then call set_config(...) yourself in the pre-request function to make it available.

Thereby the filter will apply at schema cache creation time - and therefore at the PostgREST level, before getting to PostgreSQL at all during the actual query.

Does that make sense?

@steve-chavez
Copy link
Member

@wolfgangwalther The pre-request approach would impact performance though(one extra roundtrip to the db), we discussed that somewhere. Unless we solve that part(by joining the pre-request call to the initial set locals), the request-headers-allowlist config is looking much simpler.

@dwagin
Copy link
Contributor Author

dwagin commented Sep 21, 2021

@wolfgangwalther now everything is clear. IMHO main issue of this solution is function parameters, it is not very flexible and difficulty updating under load.

I would prefer something like CREATE FUNCTION your_pre_req (headers json) ... + request-headers-allowlist. This solves the PG14 compatibility issue and provides flexibility.

@steve-chavez of course pre-request must be join to initial statement, one nuance as far as I know the order of the call is not defined in SELECT func1(), func2(), ..., so the pre-request must be defined with schema.

@wolfgangwalther
Copy link
Member

Unless we solve that part(by joining the pre-request call to the initial set locals)

Yes, I assumed that would be part of implementing that. But I can see how we'll face the same problem with SET ROLE here, that we would face in the main query.

So maybe we should join the pre-request function to the main query instead - and keep the SET ROLE (without all the other set_configs) as the first step. After all, we're quite certain that we can run the pre-request function before everything else - just not 100% as in "we can guarantee it will never happen". That's why we can't use it for SET ROLE, as this would impact security potentially.


@wolfgangwalther now everything is clear. IMHO main issue of this solution is function parameters, it is not very flexible and difficulty updating under load.

I would prefer something like CREATE FUNCTION your_pre_req (headers json) ...

Well, the proposal includes the ability to create a headers json argument to get all headers.

I don't really see the difficulty of updating under load, yet, but I haven't thought about that too much either. Could you explain where you see the problem?

@dwagin
Copy link
Contributor Author

dwagin commented Sep 21, 2021

Well, the proposal includes the ability to create a headers json argument to get all headers.

Yes, I understand that needs to be added request-headers-allowlist.

I don't really see the difficulty of updating under load, yet, but I haven't thought about that too much either. Could you explain where you see the problem?

For example:

  1. We have two PostgREST for HA
  2. We want to add additional header handling

Our actions:

  1. Add new function with additional parameter (we must continue to serve requests)
  2. Reload the schema cache of the first PostgREST (which function will PostgREST choose?, functions should be with different names?)
  3. Reload the schema cache of the second PostgREST
  4. Drop old function

Our actions (single JSON):

  1. Add new header to request-headers-allowlist
  2. Reload all PostgREST instances (order is not important)
  3. Replace pre-request function

I think the first case is more difficult than the second. Especially when DevOps and DBA are different people.

@wolfgangwalther
Copy link
Member

Thanks for the walk-through.

2\. Reload the schema cache of the first PostgREST (which function will PostgREST choose?, functions should be with different names?)

I think this is the core of the issue. Once we create pre-request functions with different arguments, we can overload the same function. This will then quickly lead to a situation, where PostgREST needs to decide which of those to call later, while reloading the schema cache.

One way to handle this would be to move away from the config option db-pre-request (or whatever it's name is currently) and make it a CREATE FUNCTION ... SET pgrst.hook='pre-request' or something similar. If multiple functions are registered for the same hook, then all of them would be called.

This would allow to:

  • CREATE FUNCTION the new pre-request with SET pgrst.hook, CREATE OR REPLACE FUNCTION the old pre-request function without SET pgrst.hook - both in one transaction.
  • Then reload all PostgREST instances
  • Then drop the old pre-request function, which is not used anymore.

This would also make the system extensible to support hooks in other places than "pre-request" in the same way.

One side effect of this, that I like: We'd move more of the config right into SQL, where objects (here FUNCTION) are defined.

@dwagin
Copy link
Contributor Author

dwagin commented Sep 22, 2021

One way to handle this would be to move away from the config option db-pre-request (or whatever it's name is currently) and make it a CREATE FUNCTION ... SET pgrst.hook='pre-request' or something similar.

I don't understand how it works, please tell me in more detail.

@wolfgangwalther
Copy link
Member

One way to handle this would be to move away from the config option db-pre-request (or whatever it's name is currently) and make it a CREATE FUNCTION ... SET pgrst.hook='pre-request' or something similar.

I don't understand how it works, please tell me in more detail.

The concept of SET on a function definition is something that we tried out in #1582. We didn't merge that, yet, because of some limitations for inlining those functions, but those limits would not apply in this case here.

The basic idea is that we can set all kinds of "flags" or "parameters" via SET on a function definition. We can then parse those at schema cache creation time and decide how to use those functions. This could replace the db-pre-request option. Every function defined with a SET pgrst.hook='pre-request' would then be a pre-request function.

@steve-chavez
Copy link
Member

So maybe we should join the pre-request function to the main query instead

I tried the above and the query can work like this:

WITH
pgrst_pre_request AS ( SELECT test."pre_request"() ),
pgrst_source AS ( SELECT "test"."clients".*FROM "test"."clients"     )
SELECT
 (select 1 from pgrst_pre_request) AS pre_req, -- it must be before the response_headers below otherwise headers set on pre-req don't get passed to PostgREST
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;

@wolfgangwalther
Copy link
Member

So maybe we should join the pre-request function to the main query instead

I tried the above and the query can work like this:

Don't we have the same problem with pre-request on the main query as with SET ROLE on the main query? Pre request functions are used to implement auth strategies, including a SET ROLE in them (I do!) - and that could possibly break.

I was now finally able to produce an example where joining SET ROLE to the main query would cause a problem:

create role authenticator;
create role alice;
create schema a;
grant create, usage on schema a to alice;
set role alice;
create table a.t ();
select * from a.t;
reset role;

select set_config('role', 'alice', false), current_setting('role');
-- returns: alice | alice

-- BUT:

set role authenticator;
select set_config('role', 'alice', false), current_setting('role') from a.t;
-- returns: permission denied for schema a

reset role;
grant usage on schema a to authenticator;
set role authenticator;
select set_config('role', 'alice', false), current_setting('role') from a.t;
-- returns: permission denied for table t

Since the authenticator role should have no privileges itself at all and since privileges are checked at planning time, this will always fail.

@steve-chavez
Copy link
Member

Pre request functions are used to implement auth strategies, including a SET ROLE in them (I do!) - and that could possibly break.

Good call, my proposed query is a no go then. Then the only way to improve perf for pre-request would be pipeline mode.

@wolfgangwalther
Copy link
Member

If we had pipeline mode, then #1941 (comment) would suddenly be a lot more attractive, I think.

@steve-chavez
Copy link
Member

Now that we have transaction-scoped settings clearly defined, I think we can have a single config for all. Should be like:

# default
tx-settings = '{"*": {"*": ["*"]}}'

# whitelisting some headers and allowing all jwt claims
tx-settings = '{"public.table": {"request.headers": ["header1", "header2"]}, "public.func()": {"request.jwt.claims": ["*"]}}'

And with pre-config:

create or replace function postgrest.pre_config()
returns void as $$
select
  set_config('pgrst.tx_settings',
  , json_build_object('public.table'
    , json_build_object('request.headers', json_build_array('header1', 'header2'))
    )::text
  , true
  )
;
$$ language sql;

@steve-chavez steve-chavez added the difficulty: medium Haskell task involving PostgreSQL IO label Oct 21, 2023
@steve-chavez steve-chavez changed the title [Feature Request] HTTP headers filtering [Feature Request] HTTP headers filtering - whitelist transaction-scoped settings Oct 21, 2023
@taimoorzaeem taimoorzaeem self-assigned this Jan 14, 2024
@taimoorzaeem
Copy link
Collaborator

tx-settings = '{"public.table": {"request.headers": ["header1", "header2"]}, "public.func()": {"request.jwt.claims": ["*"]}}'

@steve-chavez Could you give more detail about what public.table and public.func() are in this?

@steve-chavez
Copy link
Member

Could you give more detail about what public.table and public.func() are in this?

@taimoorzaeem Those were meant to be a table and a function.. but I think we should take a step back. There was a recent change that affects this issue.


At schema cache creation time, we'd read the argument list of the pre-request function definition. If that reads something like CREATE FUNCTION your_pre_req ("headers.your_first_header" TEXT, "headers.your_second_header" TEXT)

@wolfgangwalther How would the above work now that we only have the JSON GUCs (60a9000)?

So if we had:

CREATE FUNCTION your_pre_req ("request.headers" json)

This would mean all the headers would be always passed to the pre-request function, which would still have a perf impact? How could we filter the headers that get passed as an argument?

(the pre-request way looks ideal since it's the most flexible one but not sure if we can make it work)

@wolfgangwalther
Copy link
Member

How would the #1941 (comment) work now that we only have the JSON GUCs (60a9000)?

Still the same. The "new" JSON GUC format was only invented because of a limitation in what characters were allowed for GUC names. There is no such limitation for argument names... except maybe NAMEDATALEN. So it could be useful to think about shortening the required prefixes. I guess we don't need both request and headers, so header.xxx should be enough. Other stuff like request.method could still be prefixed with with request. jwt stuff could be without jwt prefix, too: claims.yyy, or maybe jwt.yyy (is there anything else than claims that we'd need from a jwt?).

@steve-chavez
Copy link
Member

Hm, so the proposal remains in that we should filter the passed headers according to the pre-request parameters. But that caused some problems with deployment as mentioned above. It would be complex to define a function with 20 arguments if the user is interested in 20 headers.

Really this looks to me like a pure proxy issue. So if we had an nginx like directive, we could do it like:

server{
  location{
    pg_pre_request 'pre_req_func' '[$http_user_agent, $http_referer]' '..'
  }
}

We don't have that kind of flexibility, but maybe we can come up with a similar solution.

@wolfgangwalther
Copy link
Member

But that caused some problems with deployment as mentioned #1941 (comment).

This is not a problem with the proposal here - this is a general problem of "how to do HA with PostgREST". The core problem was identified in #1941 (comment). You can easily work around that problem by versioning your pre-request function names to avoid any overloaded collisions:

  • Create new function pre_request_v2
  • reload all PostgREST instances with the new pre-request setting
  • Drop old function pre_request_v1

Yes, we do need to figure out what to do with multiple overloaded pre-request functions which all match our config. But I already made some proposals above.

It would be complex to define a function with 20 arguments if the user is interested in 20 headers.

But that's the opposite of what was the idea of this issue. If you need 20 headers, you are more than likely fine with just adding a catch-all headers argument. We can easily support both: headers for all headers and header.accept etc. for individual headers.

@taimoorzaeem taimoorzaeem removed their assignment Jan 16, 2024
@steve-chavez steve-chavez removed the difficulty: medium Haskell task involving PostgreSQL IO label Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement a feature, ready for implementation
Development

No branches or pull requests

4 participants