Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] Add join table attributes on M2M resource embedding #2566

Closed
maxime-bus opened this issue Jan 15, 2021 · 16 comments · Fixed by #2567
Closed

[Feature request] Add join table attributes on M2M resource embedding #2566

maxime-bus opened this issue Jan 15, 2021 · 16 comments · Fixed by #2567

Comments

@maxime-bus
Copy link

Hello PostgREST team,

It would be nice, when using resource embedding, to be able to retrieve the data that come from the join table. Here's an example based on the documentation : https://postgrest.org/en/v7.0.0/api.html#resource-embedding

In the example, there is a relationship between actors and films, thanks to the join table roles. Thanks to that, we can fetch actors and the films they acted in.

GET /actors?select=films(title,year) HTTP/1.1

The JSON response would be something like :

[
    {
        "id": "",
        "first_name": "",
        "last_name": "",
        "films": [
            {
                "title": "", 
                "year": ""
            }
        ]
    }
]

But it would be also nice to get the data character coming from table roles, which could give the JSON response :

[
    {
        "id": "",
        "first_name": "",
        "last_name": "",
        "films": [
            {
                "title": "", 
                "year": "",
                "character": ""
            }
        ]
    }
]

Do you think it is possible and a good idea ?

@wolfgangwalther wolfgangwalther changed the title [Feature request] Add join table attributes on resource embedding [Feature request] Add join table attributes on M2M resource embedding Jan 15, 2021
@wolfgangwalther
Copy link
Member

Thanks for opening that issue - I already suggested exactly this in some other places. Good to be able to track it in a separate issue, though.

I'm not certain about the best syntax for this, yet. I remember I made some suggestions somewhere, might need have to dig those out. But I don't think they really "clicked", so far.

@wolfgangwalther
Copy link
Member

I'm not certain about the best syntax for this, yet. I remember I made some suggestions somewhere, might need have to dig those out. But I don't think they really "clicked", so far.

Actually... having said that, the following idea popped up:

We could just add the same hint syntax we're already using for embedding on the column-level, too:

  • Basic case with distinct names on the join and target tables: GET /actors?select=films(title,year,character)
  • Name collision case (assume character exists on both tables): GET /actors?select=films(title,year,character!roles)

From the request perspective this would allow a very natural way of handling the distinct cases, why still allowing the more special cases with name collisions. I didn't think too much about wildcards (e.g. *!roles ??) or the SQL side of things at all, yet. I guess this should be do-able, however.

@steve-chavez, WDYT?

@steve-chavez
Copy link
Member

Basic case with distinct names on the join and target tables: GET /actors?select=films(title,year,character)

@wolfgangwalther Sounds good! We can bring the junction columns to the target table without special syntax.

I've also experimented a bit with the SQL. Right now, with users - users_tasks - users(on fixtures), the following SQL is generated for an m2m embed:

WITH
pgrst_source AS (
  SELECT
    "test"."users".*,
    COALESCE ((
      SELECT json_agg("users".*)
      FROM (
        SELECT
          "test"."tasks".*
        FROM "test"."tasks", "test"."users_tasks"
        WHERE "test"."users"."id" = "test"."users_tasks"."user_id"
        AND   "test"."tasks"."id" = "test"."users_tasks"."task_id"  ) "users")
      , '[]') AS "tasks"
  FROM "test"."users"
)
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source) _postgrest_t;

To support this, we need to alias the tables on the implicit JOIN(was never fan of implicit, but I couldn't remove it sometime ago). Like this:

WITH
pgrst_source AS (
  SELECT
    "test"."users".*,
    COALESCE ((
      SELECT json_agg("users".*)
      FROM (
        SELECT
          x.*,
          y.created_at
        FROM "test"."tasks" x, "test"."users_tasks" y
        WHERE "test"."users"."id" = y."user_id"
        AND   x."id" = y."task_id"  ) "users")
      , '[]') AS "tasks"
  FROM "test"."users"
)
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source) _postgrest_t;

Name collision case (assume character exists on both tables): GET /actors?select=films(title,year,character!roles)

Reusing the syntax for disambiguation sounds good as well. It's a rare use case(duplicating data?) but it's great we're being correct from the start.

@steve-chavez
Copy link
Member

I didn't think too much about wildcards (e.g. *!roles ??)

Not sure about wildcards as well. I think most users would like (*) to get all the columns including the junction non-PK columns, but perhaps this would break existing clients as they'd be getting more columns suddenly.

The other option is to require the user to be explicit and do (*,jcol1!junction,jcol2!junction) for getting junction cols. Then the *!junction would make more sense. This one seems it'd be easier to implement, and wouldn't break existing clients.

@wolfgangwalther
Copy link
Member

To support this, we need to alias the tables on the implicit JOIN

I'm not sure I understand why we need the aliases. In fact I think they make it harder, because we now need to know which of the two table a column that is listed in the request belongs to.

Can't we just remove the schema/table qualification in this part of the query like so?

WITH
pgrst_source AS (
  SELECT
    "test"."users".*,
    COALESCE ((
      SELECT json_agg("users".*)
      FROM (
        SELECT
          "tasks".*,
          "created_at"
        FROM "test"."tasks", "test"."users_tasks"
        WHERE "test"."users"."id" = "test"."users_tasks"."user_id"
        AND   "test"."tasks"."id" = "test"."users_tasks"."task_id"  ) "users")
      , '[]') AS "tasks"
  FROM "test"."users"
)
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source) _postgrest_t;

In this case PG itself would error with an ambiguous column reference error (or whatever it is called) if "created_at" were to exist in both tables. We would just need to catch that and map it to some 3xx response, similar to what we do with the embedding itself. We would not need any information about which table a given column belongs to, however.

Mapping the !roles hint is then basically just adding a table qualification back in?

@wolfgangwalther
Copy link
Member

Not sure about wildcards as well. I think most users would like (*) to get all the columns including the junction non-PK columns, but perhaps this would break existing clients as they'd be getting more columns suddenly.

I agree with both. Also, getting just the non-PK (non-FK in fact) columns requires more work/knowledge about the schema.

The other option is to require the user to be explicit and do (*,jcol1!junction,jcol2!junction) for getting junction cols. Then the *!junction would make more sense. This one seems it'd be easier to implement, and wouldn't break existing clients.

Ah, that seems overly complicated. I think the default case should allow to select columns just by name from both tables, without any hint.

What about:

  • (*) - all columns of just the target table (backwards compatible)
  • (*!<target>) - same
  • (*!<junction>) - all columns of just the junction table
  • (<column>) - can refer to either junction or target, will error if the same name exists in both
  • (<column>!<target>) - just this column from the target table
  • (<column>!<junction>) - just this column from the junction table

We could use an "empty hint shortcut" like this:

  • (*!) - the magical select aka "all target columns + non-join-columns from the junction"

@steve-chavez
Copy link
Member

Also, getting just the non-PK (non-FK in fact) columns requires more work/knowledge about the schema.

Note that something like this is being done for the returning part:

returningCols :: ReadRequest -> [FieldName] -> [FieldName]
returningCols rr@(Node _ forest) pkCols
-- if * is part of the select, we must not add pk or fk columns manually - otherwise those would be selected and output twice
| "*" `elem` fldNames = ["*"]

So that isn't too bad, and it has been working good.

What about:

Looks good! My only concern about adding more syntax is that it adds complexity and consequently more documentation. The embedding disambiguation is already a bit complicated.

But since the need for disambiguating junctions cols is rare, I think it's all good.

We could use an "empty hint shortcut" like this:
(*!)

Hm, I was hoping to eventually document our query grammar and have a spec(discussed on #1398 (comment)). The *! adds ambiguity(! followed by whitespace?) and it looks too magic. I'd prefer consistency and being explicit about the wanted junction cols.

@maxime-bus
Copy link
Author

Hi folks !

Any news about the progress of this topic :) ?

@wolfgangwalther
Copy link
Member

Any news about the progress of this topic :) ?

My feeling is that we are currently still in the "design-phase" of that. I don't really have any open questions right now, but nothing is carved in stone either. My feeling is that we do already have quite a few other issues / features ideas open - with some of those a little bit more planned out already. Therefore, I wouldn't bet on this being implemented soon - we tend to have a lot more good ideas than we can actually work on, so if anyone wants to step up and contribute that would be very much welcomed.

@maxime-bus
Copy link
Author

Thank you for the reply. I would like to contribute on that one if possible. Have you some information to give in order to dive into the project ?

Thank you !

@wolfgangwalther
Copy link
Member

I suggest you start off with setting up your local dev environment as described in https://github.com/PostgREST/postgrest/blob/main/nix/README.md.

Then you should add some test-cases, probably to https://github.com/PostgREST/postgrest/blob/main/test/Feature/QuerySpec.hs. I just noticed that we're missing some basic m2m embedding tests there - those seem to be added only for update and rpc right now. So it might be worth adding a few of those, too. In any case your test-cases to request those junction columns should fail at first.

At this point, you might enable statement_logging on the database and check the query that's generated for some of the working m2m relationships to figure out which changes to the SQL query you will need to make to be able to support those requests.

After you got the query down, you'll need to dive into the codebase itself and probably add some stuff to the query parsing and then do the statement building parts.

@maxime-bus
Copy link
Author

Hi PostgREST team,

I think I over estimated myself for that task, I struggle to complete it.

So I'll stop working on this topic, but I'll continue to learn the codebase :).

@wolfgangwalther
Copy link
Member

@maxime-bus You're right - the whole embedding is a bit more complex and not too easy to understand when you start with the codebase. We do have a few issues tagged "beginner" - if you want to pick up any of those to start learning the codebase! https://github.com/PostgREST/postgrest/labels/beginner

@steve-chavez
Copy link
Member

steve-chavez commented May 3, 2022

Flattening could solve this one:

GET /actors?select=*,films:roles(..films(*),character) HTTP/1.1

Looks more flexible than the above, though a bit more verbose.

@wolfgangwalther
Copy link
Member

While this is technically true, that wouldn't really be M2M embedding anymore, but rather nested embedding of O2M + M2O. So I'd still consider this a separate feature.

@steve-chavez
Copy link
Member

steve-chavez commented Nov 17, 2022

Should be possible with #2564 as described above.

While this is technically true, that wouldn't really be M2M embedding anymore, but rather nested embedding of O2M + M2O. So I'd still consider this a separate feature.

I think the above proposal is not needed anymore because:

  • spread embedding M2Ms by doing M2O+O2M is much more flexible for disambiguation
  • Without the complexity of all the added syntax(see above) it can include or ignore the junction columns.
    • Doing GET /users?select=tasks:users_tasks(..tasks(*))' gives the same result as doing GET /users?select=tasks(*), which is not much more work.
    • Doing GET /users?select=tasks:users_tasks(*,.tasks(*))' gives exactly the result wanted on this feature request.
  • Maybe if this syntax had existed from the start of resource embedding we could have gotten away without a M2M shortcut. Much less "magic" and potential for ambiguous embeds errors.

So I'm tentatively moving this issue to docs to add a section of how it can be done with the new spread embed.

Edit: On second thought, I'll add a test that proves this works with spread embedding and then close the issue.

@wolfgangwalther LMK if you agree.

@steve-chavez steve-chavez transferred this issue from PostgREST/postgrest Nov 17, 2022
@steve-chavez steve-chavez transferred this issue from PostgREST/postgrest-docs Nov 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

3 participants