[Feature request] Add join table attributes on M2M resource embedding #2566

maxime-bus · 2021-01-15T08:06:41Z

Hello PostgREST team,

It would be nice, when using resource embedding, to be able to retrieve the data that come from the join table. Here's an example based on the documentation : https://postgrest.org/en/v7.0.0/api.html#resource-embedding

In the example, there is a relationship between actors and films, thanks to the join table roles. Thanks to that, we can fetch actors and the films they acted in.

GET /actors?select=films(title,year) HTTP/1.1

The JSON response would be something like :

[
    {
        "id": "",
        "first_name": "",
        "last_name": "",
        "films": [
            {
                "title": "", 
                "year": ""
            }
        ]
    }
]

But it would be also nice to get the data character coming from table roles, which could give the JSON response :

[
    {
        "id": "",
        "first_name": "",
        "last_name": "",
        "films": [
            {
                "title": "", 
                "year": "",
                "character": ""
            }
        ]
    }
]

Do you think it is possible and a good idea ?

The text was updated successfully, but these errors were encountered:

wolfgangwalther · 2021-01-15T13:16:34Z

Thanks for opening that issue - I already suggested exactly this in some other places. Good to be able to track it in a separate issue, though.

I'm not certain about the best syntax for this, yet. I remember I made some suggestions somewhere, might need have to dig those out. But I don't think they really "clicked", so far.

wolfgangwalther · 2021-01-15T13:27:01Z

I'm not certain about the best syntax for this, yet. I remember I made some suggestions somewhere, might need have to dig those out. But I don't think they really "clicked", so far.

Actually... having said that, the following idea popped up:

We could just add the same hint syntax we're already using for embedding on the column-level, too:

Basic case with distinct names on the join and target tables: GET /actors?select=films(title,year,character)
Name collision case (assume character exists on both tables): GET /actors?select=films(title,year,character!roles)

From the request perspective this would allow a very natural way of handling the distinct cases, why still allowing the more special cases with name collisions. I didn't think too much about wildcards (e.g. *!roles ??) or the SQL side of things at all, yet. I guess this should be do-able, however.

@steve-chavez, WDYT?

steve-chavez · 2021-01-15T18:38:52Z

Basic case with distinct names on the join and target tables: GET /actors?select=films(title,year,character)

@wolfgangwalther Sounds good! We can bring the junction columns to the target table without special syntax.

I've also experimented a bit with the SQL. Right now, with users - users_tasks - users(on fixtures), the following SQL is generated for an m2m embed:

WITH
pgrst_source AS (
  SELECT
    "test"."users".*,
    COALESCE ((
      SELECT json_agg("users".*)
      FROM (
        SELECT
          "test"."tasks".*
        FROM "test"."tasks", "test"."users_tasks"
        WHERE "test"."users"."id" = "test"."users_tasks"."user_id"
        AND   "test"."tasks"."id" = "test"."users_tasks"."task_id"  ) "users")
      , '[]') AS "tasks"
  FROM "test"."users"
)
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source) _postgrest_t;

To support this, we need to alias the tables on the implicit JOIN(was never fan of implicit, but I couldn't remove it sometime ago). Like this:

WITH
pgrst_source AS (
  SELECT
    "test"."users".*,
    COALESCE ((
      SELECT json_agg("users".*)
      FROM (
        SELECT
          x.*,
          y.created_at
        FROM "test"."tasks" x, "test"."users_tasks" y
        WHERE "test"."users"."id" = y."user_id"
        AND   x."id" = y."task_id"  ) "users")
      , '[]') AS "tasks"
  FROM "test"."users"
)
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source) _postgrest_t;

Name collision case (assume character exists on both tables): GET /actors?select=films(title,year,character!roles)

Reusing the syntax for disambiguation sounds good as well. It's a rare use case(duplicating data?) but it's great we're being correct from the start.

steve-chavez · 2021-01-15T18:51:11Z

I didn't think too much about wildcards (e.g. *!roles ??)

Not sure about wildcards as well. I think most users would like (*) to get all the columns including the junction non-PK columns, but perhaps this would break existing clients as they'd be getting more columns suddenly.

The other option is to require the user to be explicit and do (*,jcol1!junction,jcol2!junction) for getting junction cols. Then the *!junction would make more sense. This one seems it'd be easier to implement, and wouldn't break existing clients.

wolfgangwalther · 2021-01-15T18:51:43Z

To support this, we need to alias the tables on the implicit JOIN

I'm not sure I understand why we need the aliases. In fact I think they make it harder, because we now need to know which of the two table a column that is listed in the request belongs to.

Can't we just remove the schema/table qualification in this part of the query like so?

WITH
pgrst_source AS (
  SELECT
    "test"."users".*,
    COALESCE ((
      SELECT json_agg("users".*)
      FROM (
        SELECT
          "tasks".*,
          "created_at"
        FROM "test"."tasks", "test"."users_tasks"
        WHERE "test"."users"."id" = "test"."users_tasks"."user_id"
        AND   "test"."tasks"."id" = "test"."users_tasks"."task_id"  ) "users")
      , '[]') AS "tasks"
  FROM "test"."users"
)
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source) _postgrest_t;

In this case PG itself would error with an ambiguous column reference error (or whatever it is called) if "created_at" were to exist in both tables. We would just need to catch that and map it to some 3xx response, similar to what we do with the embedding itself. We would not need any information about which table a given column belongs to, however.

Mapping the !roles hint is then basically just adding a table qualification back in?

wolfgangwalther · 2021-01-15T19:07:30Z

Not sure about wildcards as well. I think most users would like (*) to get all the columns including the junction non-PK columns, but perhaps this would break existing clients as they'd be getting more columns suddenly.

I agree with both. Also, getting just the non-PK (non-FK in fact) columns requires more work/knowledge about the schema.

The other option is to require the user to be explicit and do (*,jcol1!junction,jcol2!junction) for getting junction cols. Then the *!junction would make more sense. This one seems it'd be easier to implement, and wouldn't break existing clients.

Ah, that seems overly complicated. I think the default case should allow to select columns just by name from both tables, without any hint.

What about:

(*) - all columns of just the target table (backwards compatible)
(*!<target>) - same
(*!<junction>) - all columns of just the junction table
(<column>) - can refer to either junction or target, will error if the same name exists in both
(<column>!<target>) - just this column from the target table
(<column>!<junction>) - just this column from the junction table

We could use an "empty hint shortcut" like this:

(*!) - the magical select aka "all target columns + non-join-columns from the junction"

steve-chavez · 2021-01-15T19:52:00Z

Also, getting just the non-PK (non-FK in fact) columns requires more work/knowledge about the schema.

Note that something like this is being done for the returning part:

postgrest/src/PostgREST/DbRequestBuilder.hs

Lines 321 to 324 in ba86b47

    
           returningCols :: ReadRequest -> [FieldName] -> [FieldName] 
        
           returningCols rr@(Node _ forest) pkCols 
        
             -- if * is part of the select, we must not add pk or fk columns manually - otherwise those would be selected and output twice 
        
             | "*" `elem` fldNames = ["*"]

So that isn't too bad, and it has been working good.

What about:

Looks good! My only concern about adding more syntax is that it adds complexity and consequently more documentation. The embedding disambiguation is already a bit complicated.

But since the need for disambiguating junctions cols is rare, I think it's all good.

We could use an "empty hint shortcut" like this:
(*!)

Hm, I was hoping to eventually document our query grammar and have a spec(discussed on #1398 (comment)). The *! adds ambiguity(! followed by whitespace?) and it looks too magic. I'd prefer consistency and being explicit about the wanted junction cols.

maxime-bus · 2021-02-15T11:16:31Z

Hi folks !

Any news about the progress of this topic :) ?

wolfgangwalther · 2021-02-15T14:56:54Z

Any news about the progress of this topic :) ?

My feeling is that we are currently still in the "design-phase" of that. I don't really have any open questions right now, but nothing is carved in stone either. My feeling is that we do already have quite a few other issues / features ideas open - with some of those a little bit more planned out already. Therefore, I wouldn't bet on this being implemented soon - we tend to have a lot more good ideas than we can actually work on, so if anyone wants to step up and contribute that would be very much welcomed.

maxime-bus · 2021-02-20T08:32:19Z

Thank you for the reply. I would like to contribute on that one if possible. Have you some information to give in order to dive into the project ?

Thank you !

wolfgangwalther · 2021-02-22T17:43:39Z

I suggest you start off with setting up your local dev environment as described in https://github.com/PostgREST/postgrest/blob/main/nix/README.md.

Then you should add some test-cases, probably to https://github.com/PostgREST/postgrest/blob/main/test/Feature/QuerySpec.hs. I just noticed that we're missing some basic m2m embedding tests there - those seem to be added only for update and rpc right now. So it might be worth adding a few of those, too. In any case your test-cases to request those junction columns should fail at first.

At this point, you might enable statement_logging on the database and check the query that's generated for some of the working m2m relationships to figure out which changes to the SQL query you will need to make to be able to support those requests.

After you got the query down, you'll need to dive into the codebase itself and probably add some stuff to the query parsing and then do the statement building parts.

maxime-bus · 2021-03-08T17:53:21Z

Hi PostgREST team,

I think I over estimated myself for that task, I struggle to complete it.

So I'll stop working on this topic, but I'll continue to learn the codebase :).

wolfgangwalther · 2021-03-09T19:14:04Z

@maxime-bus You're right - the whole embedding is a bit more complex and not too easy to understand when you start with the codebase. We do have a few issues tagged "beginner" - if you want to pick up any of those to start learning the codebase! https://github.com/PostgREST/postgrest/labels/beginner

steve-chavez · 2022-05-03T19:48:23Z

Flattening could solve this one:

GET /actors?select=*,films:roles(..films(*),character) HTTP/1.1

Looks more flexible than the above, though a bit more verbose.

wolfgangwalther · 2022-08-11T19:03:40Z

While this is technically true, that wouldn't really be M2M embedding anymore, but rather nested embedding of O2M + M2O. So I'd still consider this a separate feature.

steve-chavez · 2022-11-17T02:51:21Z

Should be possible with #2564 as described above.

While this is technically true, that wouldn't really be M2M embedding anymore, but rather nested embedding of O2M + M2O. So I'd still consider this a separate feature.

I think the above proposal is not needed anymore because:

spread embedding M2Ms by doing M2O+O2M is much more flexible for disambiguation
- It can solve more complex cases where multiple hints are required as seen on many-to-many to the same table(recursive m2m) #1907 (comment).
- It can pick the junction explicitly without a !hint. It saves a special case in the hint syntax here.
Without the complexity of all the added syntax(see above) it can include or ignore the junction columns.
- Doing GET /users?select=tasks:users_tasks(..tasks(*))' gives the same result as doing GET /users?select=tasks(*), which is not much more work.
- Doing GET /users?select=tasks:users_tasks(*,.tasks(*))' gives exactly the result wanted on this feature request.
Maybe if this syntax had existed from the start of resource embedding we could have gotten away without a M2M shortcut. Much less "magic" and potential for ambiguous embeds errors.

~~So I'm tentatively moving this issue to docs to add a section of how it can be done with the new spread embed.~~

Edit: On second thought, I'll add a test that proves this works with spread embedding and then close the issue.

@wolfgangwalther LMK if you agree.

wolfgangwalther changed the title ~~[Feature request] Add join table attributes on resource embedding~~ [Feature request] Add join table attributes on M2M resource embedding Jan 15, 2021

wolfgangwalther mentioned this issue Feb 1, 2022

Computed relationships - First-class computed/virtual columns - a generalization of foreign-key-based resource embedding #2144

Closed

steve-chavez transferred this issue from PostgREST/postgrest Nov 17, 2022

steve-chavez transferred this issue from PostgREST/postgrest-docs Nov 17, 2022

steve-chavez mentioned this issue Nov 17, 2022

Add new tests for spread embeds #2567

Merged

steve-chavez closed this as completed in #2567 Nov 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature request] Add join table attributes on M2M resource embedding #2566

[Feature request] Add join table attributes on M2M resource embedding #2566

maxime-bus commented Jan 15, 2021

wolfgangwalther commented Jan 15, 2021

wolfgangwalther commented Jan 15, 2021

steve-chavez commented Jan 15, 2021

steve-chavez commented Jan 15, 2021

wolfgangwalther commented Jan 15, 2021

wolfgangwalther commented Jan 15, 2021

steve-chavez commented Jan 15, 2021

maxime-bus commented Feb 15, 2021

wolfgangwalther commented Feb 15, 2021

maxime-bus commented Feb 20, 2021

wolfgangwalther commented Feb 22, 2021

maxime-bus commented Mar 8, 2021

wolfgangwalther commented Mar 9, 2021

steve-chavez commented May 3, 2022 •

edited

wolfgangwalther commented Aug 11, 2022

steve-chavez commented Nov 17, 2022 •

edited

[Feature request] Add join table attributes on M2M resource embedding #2566

[Feature request] Add join table attributes on M2M resource embedding #2566

Comments

maxime-bus commented Jan 15, 2021

wolfgangwalther commented Jan 15, 2021

wolfgangwalther commented Jan 15, 2021

steve-chavez commented Jan 15, 2021

steve-chavez commented Jan 15, 2021

wolfgangwalther commented Jan 15, 2021

wolfgangwalther commented Jan 15, 2021

steve-chavez commented Jan 15, 2021

maxime-bus commented Feb 15, 2021

wolfgangwalther commented Feb 15, 2021

maxime-bus commented Feb 20, 2021

wolfgangwalther commented Feb 22, 2021

maxime-bus commented Mar 8, 2021

wolfgangwalther commented Mar 9, 2021

steve-chavez commented May 3, 2022 • edited

wolfgangwalther commented Aug 11, 2022

steve-chavez commented Nov 17, 2022 • edited

steve-chavez commented May 3, 2022 •

edited

steve-chavez commented Nov 17, 2022 •

edited