Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedding views with sub-selects #1608

Closed
wolfgangwalther opened this issue Oct 5, 2020 · 1 comment · Fixed by #1632
Closed

Embedding views with sub-selects #1608

wolfgangwalther opened this issue Oct 5, 2020 · 1 comment · Fixed by #1632
Labels
embedding resource embedding

Comments

@wolfgangwalther
Copy link
Member

wolfgangwalther commented Oct 5, 2020

When embedding views, the following breaks the detection algorithm:

-- this works
CREATE VIEW german_films AS
SELECT *,
       (SELECT 'hello') AS world
  FROM films
 WHERE language='de';

-- while this does not work
CREATE VIEW german_films AS
SELECT *,
       COALESCE((SELECT 'hello'), 'bye') AS world
  FROM films
 WHERE language='de';

What happens here is:

  • the SQL query in allSourceColumns tries to parse the view definition in form of a pg_node_tree (https://www.postgresql.org/docs/current/catalog-pg-rewrite.html)
  • to be able to read the embeddable columns, all sub-selects have to be filtered out
  • to filter out the sub-selects a regex is used
  • this regex fails when there is a function call around it (e.g. COALESCE)
  • because the regex fails, no columns are detected at all for embedding

My current solution is to wrap another SELECT around the COALESCE - that works fine.

I have played a bit with changing up the query in allSourceColumns - I think the structure of pg_node_tree does not allow a proper regex solution to really get rid of all sub-selects. However, I was able to apply a couple of transformations to the node tree to bring it to JSON format - which can easily be parsed for the embeddable columns. This was only tested with PG12 so far and also not very extensively, yet. (just noted that it fails on some of the pg_catalog views apparently - so there is definitely some more tweaking needed for the transform_json part)

The query currently looks like this:

with
views as (
select
   n.nspname   as view_schema,
   c.relname   as view_name,
   r.ev_action as view_definition
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
join pg_rewrite r on r.ev_class = c.oid
where c.relkind in ('v', 'm') and n.nspname = 'public' -- replace in haskell with ANY ($1)
),
transform_json as (
   select
     view_schema,
     view_name,
     replace(
      regexp_replace(
       regexp_replace(
        regexp_replace(
         replace(
          replace(
           replace(
            replace(
             view_definition,
            '({', '[{'),
           '})', '}]'),
          '} {', '},{'),
         '"', '\"'),
        '{([A-Z]+)', '{"node":"\1"', 'g'),
       ':([A-Za-z_]+) ([^[{][^:}]*)( |})', ',"\1":"\2"\3', 'g'),
      ':([A-Za-z_]+) ', ',"\1":', 'g'),
     '\ ', ' ')::jsonb AS view_definition
   from views
),
target_lists as (
   select
     view_schema,
     view_name,
     view_definition->0->'targetList' AS target_list
   from transform_json
),
target_entries as (
   select
     view_schema,
     view_name,
     jsonb_array_elements(target_list) as target_entry
   from target_lists
   where jsonb_typeof(target_list) = 'array'
),
results as (
   select
     view_schema,
     view_name,
     target_entry->>'resname' AS view_column_name,
     target_entry->>'resorigtbl' AS resorigtbl,
     target_entry->>'resorigcol' AS resorigcol
   from target_entries
)
select
   sch.nspname as table_schema,
   tbl.relname as table_name,
   col.attname as table_column_name,
   res.view_schema,
   res.view_name,
   res.view_column_name
from results res
join pg_class tbl on tbl.oid::text = res.resorigtbl
join pg_attribute col on col.attrelid = tbl.oid and col.attnum::text = res.resorigcol
join pg_namespace sch on sch.oid = tbl.relnamespace
where resorigtbl <> '0'
order by view_schema, view_name, view_column_name;

In theory this should allow a much more robust parsing of the view definition, but ONLY IF the json transformation works flawlessly.

Just thought I post that here, in case anyone wants to play with this, before I continue to.

@monacoremo
Copy link
Member

This looks super interesting, converting to JSON and leveraging the Postgres JSON parser is a very smart approach!

The tricky part in this query is to get the potentially nested TARGETENTRYs right, converting to JSON solves that beautifully. I was playing with a fast but partially incorrect alternative solution here (on my megaquery2 branch). Your approach seems much more promising.

Some notes:

  • This query is currently one of the performance bottlenecks for getting the DbStructure on large schemas, so we'll need to consider the performance impact carefully.
  • The text format we are working with here is quite stable, e.g. the TARGETENTRY format has not changed in the last 9 years.

@wolfgangwalther wolfgangwalther added the embedding resource embedding label Nov 22, 2020
wolfgangwalther added a commit to wolfgangwalther/postgrest that referenced this issue Dec 10, 2020
resolves PostgREST#1608

Refactors the pfkSourceColumns query to use an intermediate JSON format
for parsing. This allows much more robust extraction of source columns.
wolfgangwalther added a commit to wolfgangwalther/postgrest that referenced this issue Dec 10, 2020
resolves PostgREST#1608

Refactors the pfkSourceColumns query to use an intermediate JSON format
for parsing. This allows much more robust extraction of source columns.
wolfgangwalther added a commit that referenced this issue Dec 10, 2020
resolves #1608

Refactors the pfkSourceColumns query to use an intermediate JSON format
for parsing. This allows much more robust extraction of source columns.
monacoremo pushed a commit to monacoremo/postgrest that referenced this issue Jul 17, 2021
resolves PostgREST#1608

Refactors the pfkSourceColumns query to use an intermediate JSON format
for parsing. This allows much more robust extraction of source columns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
embedding resource embedding
Development

Successfully merging a pull request may close this issue.

2 participants