Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customize URL structure - Confusion about /rpc/ prefix, move stored procedures to root level #1086

Open
omani opened this issue Mar 27, 2018 · 18 comments
Labels
enhancement a feature, ready for implementation

Comments

@omani
Copy link

omani commented Mar 27, 2018

So apparently #561 was made due to the issue of not being able to treat read-only dbs (replica slaves) in a proper way.

another reported issue is #547. OP demonstrates a simple curl to localhost on a replicated psql server. it fails due to having the transaction mode set to read-write. which makes sense, since it was requested on a read-only slave.

impact

the above merge practically makes any rpc which wants to modify the db unusable with a GET request even if it is volatile by default (non stable, non immutable). as seen in https://github.com/begriffs/postgrest/blob/108f3cd651a448fad065faf81a2090cee8a8d9fb/src/PostgREST/ApiRequest.hs#L169.

my proposed solution

  • check if psql server is in replica mode with:
SELECT pg_is_in_recovery();

if false, use read-write transaction. this enabled the GET verb on rpc calls which are volatile.

request for comments.

@omani omani changed the title Treat transactions read-write by default again. Suggestion proposed. Make transactions read-write by default again. Suggestion proposed. Mar 27, 2018
@begriffs
Copy link
Member

the above merge practically makes any rpc which wants to modify the db unusable with a GET request even if it is volatile by default (non stable, non immutable).

That's right, and in fact it's the primary motivation. A GET request is considered safe by the HTTP spec, meaning it should not modify server state. Call the RPC via POST if you want to be able to update the database.

(The only time that this is kind of annoying is when the RPC wants to simply log the GET request internally in the db. The read-only restriction prevents it from taking that harmless action. As a workaround you can use nginx on top of the API server to do the logging.)

@omani
Copy link
Author

omani commented Mar 27, 2018

I understand that it fits and conforms to the HTTP spec defining a GET as safe and idempotent. do we apply the same reasoning on an RPC call? I see a procedure differently than a view/table. A function is an isolated closed-system which can do various things, in fact I am using procedures to do things I am not able to do with only tables and views.

and also: a (transactional) procedure in psql provides its own isolation levels and security mechanisms.

so it's maybe a question of how we want to see and treat a procedure. I told you my way of how I see it.

an example regarding the limitations:
right now I want to send an email to a user with an activation link in it. a click on it should activate his account. the rpc call checks for the email and activationcode combination and sets the user to "active" and deletes the activationcode from the activations table.

the UPDATE and DELETE here is impossible unfortunately.
there is no way to make a POST from within the email client, nor browser (if he clicks on the link). so I am basically dependent on a third party module/library/tool to solve this. whereas having the freedom to treat an RPC differently than a view/table in terms of safeness and idempotence would make this problem disappear.

@begriffs
Copy link
Member

do we apply the same reasoning on an RPC call?

I don't see why not, since some RPCs provide information, and others modify things. It's useful to preserve HTTP semantics there as well as anywhere else.

right now I want to send an email to a user with an activation link in it

Hm, I can see how that's troublesome. Sucks that the browser user-agent has always been too underpowered to support links with arbitrary verbs. In your email you could link users to a page with javascript inside that makes the API call to the RPC.

having the freedom to treat an RPC differently than a view/table in terms of safeness and idempotence would make this problem disappear.

...and new problems to appear for others. If we were to remove the safety guarantees from GET I feel like it lowers the standards of the project, allowing people to build APIs that violate valuable conventions. Maybe the best way to think of it is that PostgREST creates APIs for programs to use, and that it's not always appropriate for direct consumption by a user. Sometimes you have to build a UI layer on top of the API.

@omani
Copy link
Author

omani commented Mar 27, 2018

maybe I should point out more clearly how I see it:

the HTTP spec conformity is a good thing, because views/tables are just bare, naked pieces of elements you can interact with (a sort of gate to the underlying db). by using HTTP in a restful manner we define what a view can do for us. (I hope I could explain myself)

a function on the other hand is something isolated. a closed system. like a mathmatical function it expects an input and provides an output. modification is a must have for this to work. now let's say we apply those HTTP conformity to procedures as well. what is the outcome? we treat functions the same way as if they were an element without its own functionality, like a view is kind of.

so with all the functionality a procedure in psql comes with: isolation levels, security definers, and volatility, we should treat those functionalities as a second layer on top of the HTTP layer.

In fact, when I think about it, we dont apply HTTP in a restful way on procedures like we do it on views/tables. since there is no DELETE or PATCH available for procedures, it is somehow a half baked thing to say that we try to comply with the HTTP specs for one part (a GET should be safe and idempotent) but the other above mentioned two verbs are missing completely because they would make no sense on a function. which is true. so I say, let the function define its own rules (you can make them read only, write only, read writable, isolated, secure, non volatile, etc.) and treat them for what they are:

a remote procedure call.

@omani
Copy link
Author

omani commented Mar 27, 2018

you mentioned few good points, which really point out the limitations I see in this and why I have such a trouble:

In your email you could link users to a page with javascript inside that makes the API call to the RPC.

my application has no web UI or website. it is a pure cli application. yes I can use third party tools and eg. let the user put the activation code inside my cli app and use libs to make a POST request. I see no problem in finding workarounds to this. for me it is more about the principle here.

your second mentioned "limitations" as I see it:

If we were to remove the safety guarantees from GET I feel like it lowers the standards of the project, allowing people to build APIs that violate valuable conventions. Maybe the best way to think of it is that PostgREST creates APIs for programs to use, and that it's not always appropriate for direct consumption by a user. Sometimes you have to build a UI layer on top of the API.

again I can do workarounds here but I really have a hard time to understand why we think this way. by no means I see this as a wrong decision or as a problem per se. it's just that we somehow hide or handwave away the fact that psql is able to do this for us and we would imo not violate valuable conventions. I am not sure to say this but I think people would welcome this.

I guess I see your point and reasoning but I can't understand it.

@omani
Copy link
Author

omani commented Mar 27, 2018

so I say, let the function define its own rules (you can make them read only, write only, read writable, isolated, secure, non volatile, etc.) and treat them for what they are: a remote procedure call.

we then would reduce the HTTP verbs on procedures to the point where we use two verbs, only for defining how we want to pass arguments, like in old days where you had the choice of either using GET with all your params as query strings, or a POST action on a form.

this way we give people the freedom to do whatever they want to do with a procedure and let them choose how they want to pass their arguments to it.

@begriffs
Copy link
Member

begriffs commented Mar 27, 2018

a function on the other hand is something isolated. a closed system.

Except it's not. It can modify the state of the database. If this is acceptable behavior then a user may call it with POST. Otherwise use GET and enjoy the protection elegantly provided by read-only transactions.

we treat functions the same way as if they were an element without its own functionality, like a view is kind of.

Many people resort to RPCs in PostgREST in order to provide views that require more sophisticated queries than are currently possible via the URL query building functionality. The rpc-as-view thing is a common use case.

there is no DELETE or PATCH available for procedures, it is somehow a half baked thing

Sorry, as you mentioned it's impossible to add those verbs for functions so we're doing the best we can in providing the verbs that do make sense, and providing them correctly. IMHO abusing a GET request does not improve the situation.

I think maybe you've run into a limitation of href links and/or are conflating a program-level API with a user interface and now seek to introduce a shortcut to HTTP itself that will weaken the assurances enjoyed by other PostgREST users. I hope this doesn't come across as too dogmatic, like I'm hypnotized by RFC 2616, but I see value in honoring the meaning of the protocol.

I'm curious what other people have to say in this issue.

Edit: I appreciate you debating my position on this though. 😄 Most design decisions on this project have gone through prolonged debate, and perhaps other people have more points to add to this discussion.

@omani
Copy link
Author

omani commented Mar 27, 2018

I think maybe you've run into a limitation of href links and/or are conflating a program-level API with a user interface and now seek to introduce a shortcut to HTTP itself that will weaken the assurances enjoyed by other PostgREST users.

I knew this would come. @begriffs I highly appreciate your effort for making things right and provide us with this great piece of software. by no means I am trying to find a solution to my own problem here. as I've said I can find ways to do a workaround to this. I already have one workaround which works and it's ok and I would stick with it.

the thing is just, the more I thought about this (when I was facing this issue) the more it made sense to me to start this discussion because I somehow felt like we would treat functions in psql in a "wrong" way.

now to your point:

Except it's not. It can modify the state of the database. If this is acceptable behavior then a user may call it with POST. Otherwise use GET and enjoy the protection elegantly provided by read-only transactions.

what I mean by "isolated, closed system" is that a procedure comes with functionalities which makes it behave exactly how you want and what it should do (rather views which are not able to do so, so we outsource that on to the HTTP layer, restful, spec'ed).

let me give you an example:
if I have a function which inserts something into the db, why would I want it to be read only if it is called with a GET anyway? the whole purpose of the function is to manipulate the data. so I - as a user - don't want to separate things to a GET or POST and say: "if it is a GET, don't do anything. oh but if it is a POST, it should do what is was made for".

one way to do this is eg. let the user decide what he expects the function to do:

create or replace function api.delete_something(element text)
returns void as $$
begin
    delete from api.sometable where something = element;
end
$$ security definer language plpgsql;

now it shouldn't make any difference to the user if it was called with a GET or a POST. I mean, what is a user supposed to expect on a GET? that the function shouldn't delete at all? it is the whole purpose of this function to delete something.

Many people resort to RPCs in PostgREST in order to provide views that require more sophisticated queries than are currently possible via the URL query building functionality. The rpc-as-view thing is a common use case.

sure, no problem with that. I do it here and there, too. now, let's say it is a function that exactly does this, mimicing a view (with some more features a normal view can't give you). then I say:

create or replace function api.some_more_advanced_view()
returns void as $$
begin
   set transaction read;
   select * from api.normalview join .... where ... order by ....;
end
$$ security definer language plpgsql;

I made the transaction read-only. a POST in what ever form it comes to manipulate the data (wonder how, because it is just a select), has no effect.

@omani
Copy link
Author

omani commented Mar 27, 2018

maybe worth mentioning:

Why the /rpc prefix? One reason is to avoid name collisions between views and procedures. It also helps emphasize to API consumers that these functions are not normal restful things. The functions can have arbitrary and surprising behavior, not the standard “post creates a resource” thing that users expect from the other routes.

  • from the docs at postgrest.com

.. that these functions are not normal restful things. The functions can have arbitrary and surprising behavior, not the standard “post creates a resource” thing that users expect from the other routes.

so we (you) already realized that an RPC is something different. because it is purely dependent on the user what it does. by saying, we serve you with safetyness on a GET verb is imo restricting a user in his abilities to make things with a procedure. of course a POST to that very same function will make it work, but then I wonder why we restrict it this way. what makes a GET different than a POST. can you give me one example to this? maybe the rpc-as-view you mentioned?

because when I look at the different given examples in our issues list, for example:

now how would a GET differ from a POST to all stored procedures you will find in the above listed issues?

actually, I just looked into my code (hoped to find a good example for an rpc-as-view misuse of a procedure), and I couldn't find one. so no, I don't use a single one as a view replacement.

maybe you can provide one example.

@omani
Copy link
Author

omani commented Mar 27, 2018

ok so to summarize my points:

  • we want restful communication which is fine
  • we can't do it on rpcs, since it makes no sense. but we did our best to comply with the http specs. even on rpc, that said.
  • but we can't use DELETE, PATCH on rpcs, because they make no sense. which is right, so maybe not that restful though
  • but we make GET read-only because refering to the HTTP spec (or restful way) only a POST should manipulate data. because GET is seen as safe and idempotent
  • but there is actually no difference in GETing or POSTing to an rpc when an rpc does what it does
  • but when it does manipulate you can only manipualte on POSTs, not GETs
  • because GET is idempotent. only POST has the intention to change something
  • but the only purpose of a function is there to change something. or maybe not. maybe it fetches only some records and gives them back to the client
  • but then a POST on it would make no sense either. because the intention is to change something with POST (or create a resource)

it somehow confuses me as you can see.

my proposed solution:

  • reduce HTTP on rpcs to only GET and POST (which we already do)
  • reduce it to the way we pass arguments
  • like in old days, either use GET with params in query string or POST to "hide" the params in a json
  • let the user do what it wants to do with a procedure
  • let the user control what the procedure is able to do:
    • if I am sure I will never ever change something set the transaction to read only with set transaction read
    • if I am sure it will change things, it will work on both verbs, since it is read-write by default (when I don't specify a mode with set transaction)
  • dont break the change of Use read-only transaction mode for read requests #561 (had its origin because of replica slaves issue) by checking for replica slave or replica mode in the first place with SELECT pg_is_in_recovery();

thank you for your input on this @begriffs

@ruslantalpa
Copy link
Contributor

@omani if you want to hack your way out of the RPC in regards to GET, you can easily have the proxy rewrite the request verb and everybody is happy.

A function as @begriffs said is not free to do whatever it wants and yet "lie" that it's idempotent/pure.
To eliminate the confusion (look at it as maybe a future implementation) one can not call a "stable/immutable" function using POST, and one can not call a "volatile" function using GET. So you are free to declare the functions however you like but you also have to respect the HTTP semantics that flow out of your function declaration (or resort to hacks with the proxy).

You mentioned the docs in regard to the RPC thing, and indeed there is a conflict between what we are saying here and what the docs say. I think the docs are wrong and the /rpc prefix is a legacy that ideally should be eliminated. The outside world (users of the api) should be "in the dark" about what exactly is the implementation of the endpoint they are calling. Like i start out with a view users in version one, find a limitation and reimplement the endpoint using a stored procedure. Name collision protection should be implemented in another way (i see no valid reason to have a function and a view with the same name in a schema).

@steve-chavez
Copy link
Member

The way I see it is that PostgREST must work with the medium, for exposing a SQL feature we always seek to comply with HTTP semantics, for example, though in SQL you can do a LIMIT 0 you can't do that through PostgREST since we adhere to Range(RFC 2616) semantics.

I agree with Ruslan that the /rpc prefix must be dropped as it's the source of this confusion, and want to point out that we would still be complying with REST by doing this, quoting Roy Fielding:

"Resources are not storage items (or, at least, they aren’t always equivalent to some storage item on the back-end). The same resource state can be overlayed by multiple resources, just as an XML document can be represented as a sequence of bytes or a tree of individually addressable nodes. Likewise, a single resource can be the equivalent of a database stored procedure, with the power to abstract state changes over any number of storage items".

Also, about DELETE/PATCH/PUT not working on stored procedures, from this comment:

"HTTP operations are generic: they are allowed or not, per resource, but they are always valid."

So as I see it, we can move stored procedures to root level and only allow GET/POST on them. Since a function and view/table can have the same name under the same schema in PostgreSQL, the downside is that we'd have to add some logic to differentiate them.

@omani
Copy link
Author

omani commented Jun 4, 2018

@steve-chavez I like your proposal. I think both under root would be a good thing.

@steve-chavez steve-chavez changed the title Make transactions read-write by default again. Suggestion proposed. Confusion about /rpc/ prefix, move stored procedures to root level Jun 5, 2018
@steve-chavez steve-chavez added the enhancement a feature, ready for implementation label Jun 5, 2018
@steve-chavez steve-chavez added hygiene cleanup or refactoring and removed enhancement a feature, ready for implementation labels Aug 4, 2018
@wolfgangwalther
Copy link
Member

Since a function and view/table can have the same name under the same schema in PostgreSQL, the downside is that we'd have to add some logic to differentiate them.

Assume a table and function with the same name. The logic to differentiate them, can not depend on any type of custom request parameter (that makes the api-user aware of whats happening), because that would not be any better than / any different from having the rpc/ prefix in the first place.

Let's assume the very simple (simple as in function definition, but actually the one with the fewest hints available, so the hardest to make a decision about) case of the following:

CREATE TABLE accounts (...);

CREATE FUNCTION accounts () RETURNS SETOF accounts AS $$
  SELECT * FROM accounts WHERE my_custom_filter
$$ LANGUAGE SQL STABLE;

No function arguments, same return type as the table itself. No hints. Since the decision to choose function or table is not about the intent of the api-user, we must guess the intent of the api-developer.

Imho, there is only one valid assumption: The intent must be to replace one of the request methods to the "regular" table endpoint accounts by a custom implementation, while keeping all the other request methods the same. In this case, because it's marked STABLE obviously the GET method.

So in this case PostgREST should choose to call the function for a GET request to the /accounts endpoint, but use regular queries for POST, PUT, PATCH, and DELETE methods directly to the table.

Now imagine a function with the same name, but marked as VOLATILE. Obviously this would mean to replace those mutating requests with a custom implementation. Since we can't guess which one, it should replace all of the 4 in the default case and leave it up to the function itself to check what request.method (does that guc exist already? It should!) is set to and act accordingly.

Now this can be extended to allow replacing only some of the mutating methods, e.g. by using SET on the function definition (I love using SET) and parsing that in the schema cache:

CREATE FUNCTION accounts () RETURNS SETOF accounts AS $$
  [...]
$$ LANGUAGE SQL SET request.method.handler_for = 'POST,PUT';

This would only replace POST and PUT requests, but not PATCH or DELETE - those would still use the regular queries directly to the table.

Now the only thing remaining is: How do we handle replacing e.g. GET and PATCH at the same time, but keeping POST, PUT and DELETE the default? We can't mark a function as both STABLE and VOLATILE at the same time. The only solution is to use function overloading, marking one of them as STABLE and the other one as VOLATILE. Now for overloading, those two functions need different signatures. Introducing an unnamed first argument to the volatile function and calling that with the request body would solve that nicely:

CREATE TABLE accounts (...);

CREATE FUNCTION accounts () RETURNS SETOF accounts AS $$
  SELECT * FROM accounts WHERE my_custom_filter
$$ LANGUAGE SQL STABLE;

CREATE FUNCTION accounts (TEXT) RETURNS SETOF accounts AS $$
  [...] -- $1 will hold the request body in TEXT format now
$$ LANGUAGE SQL STABLE SET request.method.handler_for = 'PATCH';

Since this argument is unnamed and unnamed arguments are not used at all so far by PostgREST for RPCs, it can be parsed and identified by the schema cache as the argument to pass the request body on - while at the same time not creating any confusion with query parameters or parsed json keys from the body, which could still co-exist.

So far, so good, I guess. Thinking this through I imagine this providing exciting new possibilities in addition to the already built-in postgres features like views, triggers and rules.

I think it would be a lot more useful even, with a way to pass in / use the "conditions for horizontal filtering" inside or before the function call. Otherwise calls to PATCH or DELETE don't make that much sense, when they delete the whole table and then horizontally filter the returned result set ;).

A more than vague idea would be to maybe run a SELECT query to the base table with those filters and then provide a REFCURSOR to that query as another unnamed argument. I had not much experiences with cursors so far, so I don't know how feasible that would be.

Admittedly, this now warrants the "enhancement" label a bit more than the "hygiene" label, I guess ;).

@disarticulate
Copy link

ran into this whith the GET.

My use case is having a protected function and a cached table, where the function first checks the table for return.

If there's data to return and within the TTL, it simply returns.

If theres no data, then it triggers another function which queries the data and return sthat.

Finally, if there is data and it's stale, then we delete the cached data, and update the TTL and return a new query.

As far as the client is concerned, this is all invisible and it's just a GET request. The API's i'm confirming to are simply get's for geojson so no real control on those APIs.

@steve-chavez
Copy link
Member

steve-chavez commented Apr 5, 2023

$$ LANGUAGE SQL SET request.method.handler_for = 'POST,PUT';

On #1582, we came up with the idea to represent media types as domain aliases, like CREATE DOMAIN "text/csv" AS text. Basically taking advantage of pg names that contain a slash.

Since this is about URL paths, I think we can do the same to customize them.

  • CREATE FUNCTION "./my/path/to/projects" RETURNS SETOF projects ... -> GET /my/path/to/projects
  • CREATE VIEW "./another/path/projects .. -> GET /another/path/projects

It might even be possible to encode a url parameter inside the name

CREATE FUNCTION "./projects/:id" RETURNS SETOF projects ...

Which then can be obtained on the function or view as current_setting('request.path.id', ...


This would allow to override the rpc prefix without a breaking change. Plus it doesn't require doing CREATE FUNCTION ... SET request.path, which we found that required SUPERUSER on #2701 (comment).

@steve-chavez
Copy link
Member

steve-chavez commented Apr 18, 2023

I think the above is pretty cool as we can specify the more "conventional" REST routes in a declarative way like:

CREATE VIEW "./projects/:id" AS
SELECT * 
FROM projects
WHERE id = current_setting('request.path.id`, false);

Notice the false in current_setting('request.path.id, false), that means if request.path.id is not defined(meaning the /:id is not specified) the query will fail immediately.

We could even fail before the SQL query is ran, at our planning level, since the path in the name would allow us to do a static check.


So the above allows us to make a parameter obligatory. On #915 we discussed the idea of enforcing a range for allowing GROUP BY + aggregates. We could define that as:

CREATE VIEW "./projects/:start/:end" AS
SELECT *
FROM projects
WHERE id BETWEEN current_setting('request.path.start', false) AND current_setting('request.path.end`, false);

And allow doing:

GET /projects/1/1000?select=aggregate(..)&group_by=...

Also related to #1710. One thing good about this way is that we don't need to provide all the query parameters as GUCs, only the path parameters defined in the name of the db object.


One drawback of this approach is that pg identifiers have a max character length:

create or replace function "/projects/:id/clients/:client_id/users/:user_id/subscribers/:subscriber_id"() returns text as $$ select ''::text; $$ language sql;
NOTICE:  identifier "/projects/:id/clients/:client_id/users/:user_id/subscribers/:subscriber_id" will be truncated to "/projects/:id/clients/:client_id/users/:user_id/subscribers/:su"
CREATE FUNCTION

Still useful though and I recall this was an issue pg hackers were interested in solving.

@steve-chavez steve-chavez added enhancement a feature, ready for implementation and removed hygiene cleanup or refactoring labels Apr 22, 2023
@wolfgangwalther
Copy link
Member

I think the above is pretty cool as we can specify the more "conventional" REST routes in a declarative way like:

CREATE VIEW "./projects/:id" AS
SELECT * 
FROM projects
WHERE id = current_setting('request.path.id`, false);

Notice the false in current_setting('request.path.id, false), that means if request.path.id is not defined(meaning the /:id is not specified) the query will fail immediately.

That's neat. We gotta be careful, though: If one transaction / request sets request.path.id, then the value of that setting will be '' in the remainder of the session. That means this would not fail anymore. Example:

postgres=# SELECT current_setting('request.path.id', false);
ERROR:  unrecognized configuration parameter "request.path.id"
postgres=# BEGIN;
BEGIN
postgres=*# SET LOCAL request.path.id TO 'test';
SET
postgres=*# SELECT current_setting('request.path.id', false);
 current_setting
-----------------
 test
(1 row)

postgres=*# ROLLBACK;
ROLLBACK
postgres=# SELECT current_setting('request.path.id', false);
 current_setting
-----------------

(1 row)

If the id field is TEXT, this could even have some unintended consequences.

So we should not encourage the use of current_setting(..., false) here, but rather something like this:

CREATE VIEW "./projects/:id" AS
SELECT * 
FROM projects
WHERE id = nullif(current_setting('request.path.id`, true), '');

This would not throw errors without an id, but at least return consistent results.

@steve-chavez steve-chavez changed the title Confusion about /rpc/ prefix, move stored procedures to root level Customize URL structure - Confusion about /rpc/ prefix, move stored procedures to root level Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement a feature, ready for implementation
Development

No branches or pull requests

6 participants