[Feature Request] HTTP headers filtering - whitelist transaction-scoped settings #1941

dwagin · 2021-09-09T16:20:42Z

It's good to introduce an option for filtering HTTP headers passed to PostgreSQL.

For example, request-headers-allowlist, with default value is * (meaning all headers are passed).

There're many unnecessary set_config() commands in every transaction.

I've gotten 22 params at all, where a 50% of them are HTTP headers, a lot of people doesn't use (or uses small portion) it. We use only two of 11.

The text was updated successfully, but these errors were encountered:

steve-chavez · 2021-09-09T17:46:00Z

There're many unnecessary set_config() commands in every transaction.

Related to #1857. We'll need to set all the headers in a single set_config for postgres 14.

When we do that, perhaps passing all the headers won't be a problem?

For example, request-headers-allowlist, with default value is * (meaning all headers are passed).

I see no harm in offering such a config option though, it would reduce the passed json payload for #1857 as well.

wolfgangwalther · 2021-09-13T07:06:55Z

A different approach, compared to another config option, that achieves the same thing, but is much more flexible down the road is mentioned in #1710 (comment).

dwagin · 2021-09-16T16:34:31Z

I see no harm in offering such a config option though, it would reduce the passed json payload for #1857 as well.

@steve-chavez, of course, proposed enhancement would reduce JSON-payload, that decrease request overhead time.

A different approach, compared to another config option, that achieves the same thing, but is much more flexible down the road is mentioned in #1710 (comment).

@wolfgangwalther, your enhancement doesn't filter headers on PostgREST level, only on PostgreSQL level. Therefore, these features can complement each other, not replace.

wolfgangwalther · 2021-09-16T17:35:31Z

@wolfgangwalther, your enhancement doesn't filter headers on PostgREST level, only on PostgreSQL level.

That's not correct. Implementing the full proposal would allow filtering the headers on the PostgREST level already. It would be set up via arguments to a pre-request function - but those headers that are not used, will never be sent to PostgreSQL.

dwagin · 2021-09-16T17:50:30Z

@wolfgangwalther So I don't understand how it works, please tell me in more detail.

wolfgangwalther · 2021-09-20T12:27:50Z

@wolfgangwalther So I don't understand how it works, please tell me in more detail.

The proposal starts by stripping all set_config(...)s - so you'd start from zero. No headers transmitted to PostgreSQL at all.

At schema cache creation time, we'd read the argument list of the pre-request function definition. If that reads something like CREATE FUNCTION your_pre_req ("headers.your_first_header" TEXT, "headers.your_second_header" TEXT) ... only those two headers would be passed as arguments to the pre-request function. If you needed the headers in any other place than the pre-request function body, you could then call set_config(...) yourself in the pre-request function to make it available.

Thereby the filter will apply at schema cache creation time - and therefore at the PostgREST level, before getting to PostgreSQL at all during the actual query.

Does that make sense?

steve-chavez · 2021-09-20T22:26:04Z

@wolfgangwalther The pre-request approach would impact performance though(one extra roundtrip to the db), we discussed that somewhere. Unless we solve that part(by joining the pre-request call to the initial set locals), the request-headers-allowlist config is looking much simpler.

dwagin · 2021-09-21T12:05:08Z

@wolfgangwalther now everything is clear. IMHO main issue of this solution is function parameters, it is not very flexible and difficulty updating under load.

I would prefer something like CREATE FUNCTION your_pre_req (headers json) ... + request-headers-allowlist. This solves the PG14 compatibility issue and provides flexibility.

@steve-chavez of course pre-request must be join to initial statement, one nuance as far as I know the order of the call is not defined in SELECT func1(), func2(), ..., so the pre-request must be defined with schema.

wolfgangwalther · 2021-09-21T12:24:36Z

Unless we solve that part(by joining the pre-request call to the initial set locals)

Yes, I assumed that would be part of implementing that. But I can see how we'll face the same problem with SET ROLE here, that we would face in the main query.

So maybe we should join the pre-request function to the main query instead - and keep the SET ROLE (without all the other set_configs) as the first step. After all, we're quite certain that we can run the pre-request function before everything else - just not 100% as in "we can guarantee it will never happen". That's why we can't use it for SET ROLE, as this would impact security potentially.

@wolfgangwalther now everything is clear. IMHO main issue of this solution is function parameters, it is not very flexible and difficulty updating under load.

I would prefer something like CREATE FUNCTION your_pre_req (headers json) ...

Well, the proposal includes the ability to create a headers json argument to get all headers.

I don't really see the difficulty of updating under load, yet, but I haven't thought about that too much either. Could you explain where you see the problem?

dwagin · 2021-09-21T13:03:22Z

Well, the proposal includes the ability to create a headers json argument to get all headers.

Yes, I understand that needs to be added request-headers-allowlist.

I don't really see the difficulty of updating under load, yet, but I haven't thought about that too much either. Could you explain where you see the problem?

For example:

We have two PostgREST for HA
We want to add additional header handling

Our actions:

Add new function with additional parameter (we must continue to serve requests)
Reload the schema cache of the first PostgREST (which function will PostgREST choose?, functions should be with different names?)
Reload the schema cache of the second PostgREST
Drop old function

Our actions (single JSON):

Add new header to request-headers-allowlist
Reload all PostgREST instances (order is not important)
Replace pre-request function

I think the first case is more difficult than the second. Especially when DevOps and DBA are different people.

wolfgangwalther · 2021-09-22T12:55:45Z

Thanks for the walk-through.

2\. Reload the schema cache of the first PostgREST (which function will PostgREST choose?, functions should be with different names?)

I think this is the core of the issue. Once we create pre-request functions with different arguments, we can overload the same function. This will then quickly lead to a situation, where PostgREST needs to decide which of those to call later, while reloading the schema cache.

One way to handle this would be to move away from the config option db-pre-request (or whatever it's name is currently) and make it a CREATE FUNCTION ... SET pgrst.hook='pre-request' or something similar. If multiple functions are registered for the same hook, then all of them would be called.

This would allow to:

CREATE FUNCTION the new pre-request with SET pgrst.hook, CREATE OR REPLACE FUNCTION the old pre-request function without SET pgrst.hook - both in one transaction.
Then reload all PostgREST instances
Then drop the old pre-request function, which is not used anymore.

This would also make the system extensible to support hooks in other places than "pre-request" in the same way.

One side effect of this, that I like: We'd move more of the config right into SQL, where objects (here FUNCTION) are defined.

dwagin · 2021-09-22T13:26:58Z

One way to handle this would be to move away from the config option db-pre-request (or whatever it's name is currently) and make it a CREATE FUNCTION ... SET pgrst.hook='pre-request' or something similar.

I don't understand how it works, please tell me in more detail.

wolfgangwalther · 2021-09-22T13:45:04Z

One way to handle this would be to move away from the config option db-pre-request (or whatever it's name is currently) and make it a CREATE FUNCTION ... SET pgrst.hook='pre-request' or something similar.

I don't understand how it works, please tell me in more detail.

The concept of SET on a function definition is something that we tried out in #1582. We didn't merge that, yet, because of some limitations for inlining those functions, but those limits would not apply in this case here.

The basic idea is that we can set all kinds of "flags" or "parameters" via SET on a function definition. We can then parse those at schema cache creation time and decide how to use those functions. This could replace the db-pre-request option. Every function defined with a SET pgrst.hook='pre-request' would then be a pre-request function.

steve-chavez · 2022-07-28T05:00:57Z

So maybe we should join the pre-request function to the main query instead

I tried the above and the query can work like this:

WITH
pgrst_pre_request AS ( SELECT test."pre_request"() ),
pgrst_source AS ( SELECT "test"."clients".*FROM "test"."clients"     )
SELECT
 (select 1 from pgrst_pre_request) AS pre_req, -- it must be before the response_headers below otherwise headers set on pre-req don't get passed to PostgREST
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;

wolfgangwalther · 2022-07-28T06:12:36Z

So maybe we should join the pre-request function to the main query instead

I tried the above and the query can work like this:

Don't we have the same problem with pre-request on the main query as with SET ROLE on the main query? Pre request functions are used to implement auth strategies, including a SET ROLE in them (I do!) - and that could possibly break.

I was now finally able to produce an example where joining SET ROLE to the main query would cause a problem:

create role authenticator;
create role alice;
create schema a;
grant create, usage on schema a to alice;
set role alice;
create table a.t ();
select * from a.t;
reset role;

select set_config('role', 'alice', false), current_setting('role');
-- returns: alice | alice

-- BUT:

set role authenticator;
select set_config('role', 'alice', false), current_setting('role') from a.t;
-- returns: permission denied for schema a

reset role;
grant usage on schema a to authenticator;
set role authenticator;
select set_config('role', 'alice', false), current_setting('role') from a.t;
-- returns: permission denied for table t

Since the authenticator role should have no privileges itself at all and since privileges are checked at planning time, this will always fail.

steve-chavez · 2022-07-28T14:26:44Z

Pre request functions are used to implement auth strategies, including a SET ROLE in them (I do!) - and that could possibly break.

Good call, my proposed query is a no go then. Then the only way to improve perf for pre-request would be pipeline mode.

wolfgangwalther · 2022-08-11T18:58:10Z

If we had pipeline mode, then #1941 (comment) would suddenly be a lot more attractive, I think.

steve-chavez · 2023-10-21T20:12:26Z

Now that we have transaction-scoped settings clearly defined, I think we can have a single config for all. Should be like:

# default
tx-settings = '{"*": {"*": ["*"]}}'

# whitelisting some headers and allowing all jwt claims
tx-settings = '{"public.table": {"request.headers": ["header1", "header2"]}, "public.func()": {"request.jwt.claims": ["*"]}}'

And with pre-config:

create or replace function postgrest.pre_config()
returns void as $$
select
  set_config('pgrst.tx_settings',
  , json_build_object('public.table'
    , json_build_object('request.headers', json_build_array('header1', 'header2'))
    )::text
  , true
  )
;
$$ language sql;

taimoorzaeem · 2024-01-15T17:31:24Z

tx-settings = '{"public.table": {"request.headers": ["header1", "header2"]}, "public.func()": {"request.jwt.claims": ["*"]}}'

@steve-chavez Could you give more detail about what public.table and public.func() are in this?

steve-chavez · 2024-01-16T05:04:16Z

Could you give more detail about what public.table and public.func() are in this?

@taimoorzaeem Those were meant to be a table and a function.. but I think we should take a step back. There was a recent change that affects this issue.

At schema cache creation time, we'd read the argument list of the pre-request function definition. If that reads something like CREATE FUNCTION your_pre_req ("headers.your_first_header" TEXT, "headers.your_second_header" TEXT)

@wolfgangwalther How would the above work now that we only have the JSON GUCs (60a9000)?

So if we had:

CREATE FUNCTION your_pre_req ("request.headers" json)

This would mean all the headers would be always passed to the pre-request function, which would still have a perf impact? How could we filter the headers that get passed as an argument?

(the pre-request way looks ideal since it's the most flexible one but not sure if we can make it work)

wolfgangwalther · 2024-01-16T07:06:02Z

How would the #1941 (comment) work now that we only have the JSON GUCs (60a9000)?

Still the same. The "new" JSON GUC format was only invented because of a limitation in what characters were allowed for GUC names. There is no such limitation for argument names... except maybe NAMEDATALEN. So it could be useful to think about shortening the required prefixes. I guess we don't need both request and headers, so header.xxx should be enough. Other stuff like request.method could still be prefixed with with request. jwt stuff could be without jwt prefix, too: claims.yyy, or maybe jwt.yyy (is there anything else than claims that we'd need from a jwt?).

steve-chavez · 2024-01-16T15:09:24Z

Hm, so the proposal remains in that we should filter the passed headers according to the pre-request parameters. But that caused some problems with deployment as mentioned above. It would be complex to define a function with 20 arguments if the user is interested in 20 headers.

Really this looks to me like a pure proxy issue. So if we had an nginx like directive, we could do it like:

server{
  location{
    pg_pre_request 'pre_req_func' '[$http_user_agent, $http_referer]' '..'
  }
}

We don't have that kind of flexibility, but maybe we can come up with a similar solution.

wolfgangwalther · 2024-01-16T16:18:29Z

But that caused some problems with deployment as mentioned #1941 (comment).

This is not a problem with the proposal here - this is a general problem of "how to do HA with PostgREST". The core problem was identified in #1941 (comment). You can easily work around that problem by versioning your pre-request function names to avoid any overloaded collisions:

Create new function pre_request_v2
reload all PostgREST instances with the new pre-request setting
Drop old function pre_request_v1

Yes, we do need to figure out what to do with multiple overloaded pre-request functions which all match our config. But I already made some proposals above.

It would be complex to define a function with 20 arguments if the user is interested in 20 headers.

But that's the opposite of what was the idea of this issue. If you need 20 headers, you are more than likely fine with just adding a catch-all headers argument. We can easily support both: headers for all headers and header.accept etc. for individual headers.

steve-chavez added difficulty: beginner Pure Haskell task enhancement a feature, ready for implementation labels Sep 9, 2021

steve-chavez mentioned this issue Jul 11, 2022

Feature Request: ability to call pg 11 stored procedure #1210

Open

steve-chavez mentioned this issue Dec 20, 2022

current_setting can lead to bad performance when used on RLS PostgREST/postgrest-docs#609

Open

steve-chavez removed the difficulty: beginner Pure Haskell task label Jul 23, 2023

steve-chavez added the difficulty: medium Haskell task involving PostgreSQL IO label Oct 21, 2023

steve-chavez changed the title ~~[Feature Request] HTTP headers filtering~~ [Feature Request] HTTP headers filtering - whitelist transaction-scoped settings Oct 21, 2023

steve-chavez mentioned this issue Nov 22, 2023

Add aggregate functions #2925

Merged

taimoorzaeem self-assigned this Jan 14, 2024

steve-chavez mentioned this issue Jan 16, 2024

Verify JWTs without changing role #3002

Open

taimoorzaeem removed their assignment Jan 16, 2024

steve-chavez removed the difficulty: medium Haskell task involving PostgreSQL IO label Jan 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request] HTTP headers filtering - whitelist transaction-scoped settings #1941

[Feature Request] HTTP headers filtering - whitelist transaction-scoped settings #1941

dwagin commented Sep 9, 2021

steve-chavez commented Sep 9, 2021

wolfgangwalther commented Sep 13, 2021

dwagin commented Sep 16, 2021

wolfgangwalther commented Sep 16, 2021

dwagin commented Sep 16, 2021

wolfgangwalther commented Sep 20, 2021

steve-chavez commented Sep 20, 2021

dwagin commented Sep 21, 2021

wolfgangwalther commented Sep 21, 2021

dwagin commented Sep 21, 2021 •

edited

Loading

wolfgangwalther commented Sep 22, 2021

dwagin commented Sep 22, 2021

wolfgangwalther commented Sep 22, 2021

steve-chavez commented Jul 28, 2022

wolfgangwalther commented Jul 28, 2022

steve-chavez commented Jul 28, 2022

wolfgangwalther commented Aug 11, 2022

steve-chavez commented Oct 21, 2023

taimoorzaeem commented Jan 15, 2024

steve-chavez commented Jan 16, 2024

wolfgangwalther commented Jan 16, 2024

steve-chavez commented Jan 16, 2024

wolfgangwalther commented Jan 16, 2024

[Feature Request] HTTP headers filtering - whitelist transaction-scoped settings #1941

[Feature Request] HTTP headers filtering - whitelist transaction-scoped settings #1941

Comments

dwagin commented Sep 9, 2021

steve-chavez commented Sep 9, 2021

wolfgangwalther commented Sep 13, 2021

dwagin commented Sep 16, 2021

wolfgangwalther commented Sep 16, 2021

dwagin commented Sep 16, 2021

wolfgangwalther commented Sep 20, 2021

steve-chavez commented Sep 20, 2021

dwagin commented Sep 21, 2021

wolfgangwalther commented Sep 21, 2021

dwagin commented Sep 21, 2021 • edited Loading

wolfgangwalther commented Sep 22, 2021

dwagin commented Sep 22, 2021

wolfgangwalther commented Sep 22, 2021

steve-chavez commented Jul 28, 2022

wolfgangwalther commented Jul 28, 2022

steve-chavez commented Jul 28, 2022

wolfgangwalther commented Aug 11, 2022

steve-chavez commented Oct 21, 2023

taimoorzaeem commented Jan 15, 2024

steve-chavez commented Jan 16, 2024

wolfgangwalther commented Jan 16, 2024

steve-chavez commented Jan 16, 2024

wolfgangwalther commented Jan 16, 2024

dwagin commented Sep 21, 2021 •

edited

Loading