For discussion: Using a lib.sql to factor out many of the large hasql queries #1511
Conversation
Example for refactoring. This could also be great for debugging and diagnosing user issues, e.g. you can run the queries in psql and see exactly what 'PostgREST sees'.
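A hypothetical psql session illustrating that debugging workflow (the helper name is taken from the proof-of-concept diff below; the exact invocation is an assumption):

```sql
-- Load the function library into the current session's pg_temp schema,
-- then inspect what PostgREST would compute for one table's columns.
\i src/PostgREST/lib.sql

select a.attname,
       pg_temp.postgrest_numeric_precision(a, t)
from pg_attribute a
join pg_type t on t.oid = a.atttypid
where a.attrelid = 'some_table'::regclass;  -- hypothetical table name
```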
@@ -0,0 +1,18 @@
-- Make sure that we can create temporary functions if we are in a transaction.
set transaction read write;
Oh.. your proposal seems like a really nice idea for maintainability, and also for more ergonomic development.
But this one seems like a road block (not sure if there's a workaround). A big use case for PostgREST is being able to run on read-only replicas. This transaction level would reject the create function:
BEGIN ISOLATION LEVEL READ COMMITTED READ ONLY;
create or replace function pg_temp.postgrest_numeric_precision(a pg_attribute, t pg_type)
returns integer
language sql
as $$
select
information_schema._pg_char_max_length(
information_schema._pg_truetypid(a.*, t.*),
information_schema._pg_truetypmod(a.*, t.*)
)
$$;
ERROR: 25006: cannot execute CREATE FUNCTION in a read-only transaction
LOCATION: PreventCommandIfReadOnly, utility.c:246
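One possible (untested) mitigation sketch: detect read-only sessions up front and skip the pg_temp setup, falling back to the inlined queries:

```sql
-- true when connected to a hot-standby replica
select pg_is_in_recovery();

-- true when the session defaults to read-only transactions
-- (e.g. default_transaction_read_only = on)
show transaction_read_only;
```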
Oh, I'm sorry. You mentioned:

> default_transaction_read_only could be set to on, but begin read write; or set transaction read write; seems to override that in all cases

Would set transaction read write work in read replicas? We would have to test that.
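A quick test against a streaming-replication standby would settle this; a sketch of what to try (the expected error is from memory and worth confirming):

```sql
-- run against a hot-standby replica
select pg_is_in_recovery();   -- expected: true on a standby

begin;
set transaction read write;   -- expected to fail on a standby with
                              -- SQLSTATE 25006 (read_only_sql_transaction)
rollback;
```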
Looks really nice for maintainability, but the unknown unknowns scare me. We could invalidate use cases; I'm not sure read replicas would work. We'd have to look into this with much care, and if merged, a new major version would have to be released.

Also, I'd like to mention that I think it's possible to have one single query for the whole schema cache. This query was shared a long time ago: https://gist.github.com/steve-chavez/eae6a67ec81b195c133bcb9ff0c917fb. That could alleviate the need for so many separate sql queries and replace some of the Haskell code. It would likely be faster as well.

There was also some discussion about making that a materialized VIEW, which could speed up start-ups, though our query is fast for now. Mentioning it because it might be another avenue for better maintainability and perf.
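As a rough illustration of the materialized-view idea (the view name and the trivial body are made up; the real view would wrap the full schema-cache query from the gist):

```sql
-- hypothetical: persist a (heavily simplified) schema cache as a matview
create materialized view postgrest_schema_cache as
select n.nspname as table_schema,
       c.relname as table_name,
       c.relkind
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind in ('r', 'v', 'm');

-- refresh after migrations so PostgREST start-up reads precomputed data
refresh materialized view postgrest_schema_cache;
```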
@steve-chavez You're right, for read-only replicas this would likely lead to problems. Too bad, but thank you for looking at this so quickly! Technically it would likely work for those as well, since it's not so different from prepared statements, which also work and create server-side objects. But it still might not work by default, which would create too many issues.

Having everything in one query is definitely doable! I'm not sure that huge query would be more maintainable, but there is definitely also a lot of data wrangling going on on the Haskell side that would probably be easier to express in SQL. That's probably the best direction - maybe splitting up the table/column/pk and procs side. Materialized views would also require write access; I think that might be held up by the same issues as the lib approach, no?
@monacoremo It's a bummer that this likely won't work on read-only replicas. Regarding maintainability, I think the best we can do for now is to use CTEs with meaningful names and comments. The idea of splitting them into files is certainly feasible (without using pg_temp).
Yes, it was mostly an idea: let the user specify a materialized view in a config option and somehow provide them our schema cache query so they can create the mat view manually. But the schema cache query has gotten really fast over the years, so we no longer had a strong need for that. Making it even faster would still be great though, and would make #1512 more feasible.
@steve-chavez just played around with that, and it's not even that difficult to get everything as JSON from one query: https://gist.github.com/monacoremo/baeff41fd7fde418d6684cfe4988c7d0 (pretty much a 1:1 translation of the DbStructure queries). Now if we could use Aeson to automatically derive a decoder for that output based on the type definitions... Should be a fun challenge :-)
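For anyone following along, the single-query/JSON approach boils down to aggregating each catalog section with json_agg and wrapping everything in one json_build_object, roughly like this (a simplified sketch, not the gist's actual query):

```sql
-- sketch: the whole schema cache as one JSON value, one key per section
select json_build_object(
  'tables', (
    select coalesce(json_agg(json_build_object(
             'schema', n.nspname,
             'name',   c.relname)), '[]'::json)
    from pg_class c
    join pg_namespace n on n.oid = c.relnamespace
    where c.relkind = 'r' and n.nspname = 'public'
  )
  -- 'columns', 'primary_keys', 'procs', ... would follow the same pattern
) as schema_cache;
```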
@monacoremo Nice! You made that look easy! Just in case you didn't notice: in https://gist.github.com/monacoremo/baeff41fd7fde418d6684cfe4988c7d0#file-dbstructure-sql-L655-L656 there's a string that's conditionally passed in from Haskell code. You could implement that logic in sql instead:

-- change below
with
pg_version as (
SELECT
current_setting('server_version_num')::integer as server_version_num,
current_setting('server_version') as server_version
),
views as (
select
n.nspname as view_schema,
c.relname as view_name,
r.ev_action as view_definition
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
join pg_rewrite r on r.ev_class = c.oid
where (c.relkind in ('v', 'm')) and n.nspname = 'test'
),
removed_subselects as(
select
view_schema, view_name,
regexp_replace(view_definition,
-- this would be the change
case when (select server_version_num from pg_version) < 100000
then ':subselect {.*?:constraintDeps <>} :location \d+} :res(no|ult)'
else ':subselect {.*?:stmt_len 0} :location \d+} :res(no|ult)'
end,
--
'', 'g') as x
from views
),
...
@steve-chavez working proof of concept here: https://github.com/monacoremo/postgrest/tree/monacoremo-megaquery?files=1 A little bit messy, but it's just a quick experiment and it works! I have yet to plug in your sql solution for the conditional string, which is much nicer!

This could be very promising - getting everything in one query could be faster, but also easier to introspect, debug and improve further. And Aeson could replace all the manual decoders in a robust way (keys instead of order-dependent tuples).
@steve-chavez This could potentially ease development and increase maintainability, and maybe also extensibility - by allowing users to override/replace some of those PostgREST sql functions. All the schema cache stuff could go in there, but potentially also (parts of) the main queries. This extension could even be auto-installed by PostgREST, although a manual installation should also be possible. If those functions are pre-defined in the database, it shouldn't be a problem for the read-replica case etc. Also, conceptually it makes sense to have parts of the code in Postgres itself - because that's exactly what PostgREST is all about ;) I'm sure there is some blocker that I am missing?
@wolfgangwalther I think a pg extension would be a good idea! I also talked about that intention with Remo.
Yes, I believe the schema cache should be the first step. We want to convert all of the DbStructure logic into pure SQL in #1513. I believe it's possible, but we need to put more work into it. If we can't make it a single query, perhaps we can export a set of views in the extension. Making the schema cache an extension would also have the immediate benefit of letting us produce the OpenAPI output in SQL (which would help with many issues, like #1082 (comment)).
You're right about the read-replica case. I think once we have that extension, PostgREST could detect it and call the views (maybe functions later) from there; otherwise, keep working as usual. Advanced users that want to override some behavior would install the extension manually (to not be invasive and require high privileges from the connection role).
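Detection could be as simple as probing the catalog at connection time (the schema name postgrest is an assumption here, not an agreed-on name):

```sql
-- does the optional helper schema exist in this database?
select exists (
  select 1
  from pg_namespace
  where nspname = 'postgrest'
) as has_postgrest_helpers;
```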
What I really like about PostgREST is that it's very simple to use and non-invasive concerning the database. There's no special configuration needed on the database side for simple features. PostgREST is stateless in the sense that it does not have to build up any database state. As far as I understand, there is no technical need (e.g. wanting to save data) for creating the views and functions in the database; it's for performance reasons, simplifications and customization. Here are my thoughts on the custom extension idea: Pro:
Contra:
This looks like a bad trade. |
Fair points @LorenzHenk. I was mostly interested in the pg extension because it could provide a way for users to refine their OpenAPI in SQL. But now I'm thinking we should try to do this at the Haskell level.
I have not given up completely on this idea yet, because imho the upside is a lot bigger than just the OpenAPI stuff. Going through various ideas on how to implement some features / fix bugs, I'm quite often at a point where I would just need to have a small plpgsql function defined to make things possible at all. Focusing on two of your contras, @LorenzHenk:
Yes, I came to the same conclusion regarding a real "extension": that won't work and is too complicated. If we just designated a schema for those pre-defined objects, it would be a lot simpler. However, I agree that we need a good fallback for the case where those privileges are not there. It would be good to still be able to just put PostgREST in front of an existing database and have it work somehow (although I don't really see that happening without any intervention at all - you still need to create those authenticator and anonymous users, right?). I'm not sure about
We could do the following:
This would allow users to create those objects with the same name in a different schema - one which is placed earlier on the search_path.
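A minimal sketch of that override mechanism (all names here are hypothetical):

```sql
-- PostgREST's pre-defined helper lives in its designated schema ...
create schema postgrest;
create function postgrest.row_filter() returns text
language sql as $$ select 'built-in behavior' $$;

-- ... and a user shadows it from a schema placed earlier on the search_path
create schema my_overrides;
create function my_overrides.row_filter() returns text
language sql as $$ select 'custom behavior' $$;

set search_path = my_overrides, postgrest, public;
select row_filter();  -- resolves to my_overrides.row_filter()
```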
This is not actually meant for merging, but I wanted to discuss with you whether this could be a viable approach.

The large queries, e.g. in `DbStructure.hs`, are partly so convoluted because we cannot define our own functions. In PostgreSQL there is a sparsely documented feature, `pg_temp`: a temporary per-session schema that tables, views and also functions can be defined in. `pg_temp` is an alias for a per-session schema named `pg_temp_nnn`, which is automatically created on first use. It's implemented in `namespace.c` and referenced in a few places in the Postgres docs.

The idea is to have a PostgREST `lib.sql` that is loaded in each new session and defines functions (and maybe views?) that are useful for the DbStructure queries and maybe also requests. This PR contains a small proof of concept where a part of the `allColumns` query is factored out into a function.

Benefits of the approach:
- The library can be loaded standalone with `test/with_tmp_db psql -f src/PostgREST/lib.sql`, and it's easy to try out or `explain analyze` individual functions.
- We could add a `libtests.sql` that tests all the functions using e.g. `pgtap`.
- Functions could be shared (e.g. between `dbstructure` and `requests`).

Issues that I think should be possible to fix:
- Where the best place to load `lib.sql` in would be - as a quick hack I plugged it into `getDbStructure` and made the transactions writable (it's a bit scary that this works...). There should be a better way to do this! Is there a way to hook into the creation of new connections with `hasql-pool`?
- Is the `embed-file` approach I picked the right one?

Possibly more fundamental issues:
- Would we be able to use `pg_temp` in all use cases? `default_transaction_read_only` could be set to `on`, but `begin read write;` or `set transaction read write;` seems to override that in all cases.
- `temporary` is a default privilege on databases, see here. Users could have executed `revoke temporary on database [...] from [role];`. This could be fixed with `grant temporary ...`. Would the case for increased maintainability and performance be enough to risk blocking the (presumably few) users who revoked this?

I tested this with `postgrest-test-spec-all`; the proof of concept works for all PostgreSQL versions since 9.4.
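To make the pg_temp mechanics concrete, a minimal self-contained example (the helper name is made up):

```sql
-- define a helper in the per-session temporary schema; no permanent
-- object is created, and it vanishes when the session ends
create function pg_temp.add_one(i integer) returns integer
language sql as $$ select i + 1 $$;

-- unlike temp tables, temp functions must be called schema-qualified,
-- since pg_temp is never searched implicitly for functions
select pg_temp.add_one(41);  -- returns 42
```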