When embedding with top-level filtering (inner join), empty parentheses should be allowed #2340

Pigeo · 2022-06-20T14:43:27Z

Environment

PostgreSQL version: 12.10 (Debian 12.10-1.pgdg100+1)
PostgREST version: 9.0.1 (linux-static-x64)
Operating system: Debian Buster (10.12)

Description of issue

Trying to do :

http://127.0.0.1:3000/film?select=*,film_category!inner()

I get an error message column film.film_category does not exist.

Firstly, the error message is out of touch, since we are performing an INNER JOIN and the columns / relation does exist.

Secondly, it should not trigger an error: this query is perfectly legitimate and feasible ("I want all the films that do have a category, but I'm not interested in knowing what are their precise categories, therefore I would like to have a reduced JSON, not containing the film_category data, i.e. just filtering the top-level on "has a film_category", but not embedding the data)

What I have to do as a work-around: put any dummy column inside the parentheses, so that I don't get an error message anymore:

http://127.0.0.1:3000/film?select=*,film_category!inner(film_id)

And then I get a fat JSON body response, where I'm trashing all this embedded data... (waste of bandwidth + slow web app because of that)

Steps to reproduce

Let's take a sample database from https://www.postgresqltutorial.com/postgresql-sample-database/.
The relevant schema is as follows:

CREATE TABLE public.film (
    film_id integer DEFAULT nextval('public.film_film_id_seq'::regclass) NOT NULL,
    title character varying(255) NOT NULL,
    description text,
    release_year public.year,
    language_id smallint NOT NULL,
    rental_duration smallint DEFAULT 3 NOT NULL,
    rental_rate numeric(4,2) DEFAULT 4.99 NOT NULL,
    length smallint,
    replacement_cost numeric(5,2) DEFAULT 19.99 NOT NULL,
    rating public.mpaa_rating DEFAULT 'G'::public.mpaa_rating,
    last_update timestamp without time zone DEFAULT now() NOT NULL,
    special_features text[],
    fulltext tsvector NOT NULL
);
CREATE TABLE public.category (
    category_id integer DEFAULT nextval('public.category_category_id_seq'::regclass) NOT NULL,
    name character varying(25) NOT NULL,
    last_update timestamp without time zone DEFAULT now() NOT NULL
);
CREATE TABLE public.film_category (
    film_id smallint NOT NULL,
    category_id smallint NOT NULL,
    last_update timestamp without time zone DEFAULT now() NOT NULL
);
ALTER TABLE ONLY public.film_category
    ADD CONSTRAINT film_category_category_id_fkey FOREIGN KEY (category_id) REFERENCES public.category(category_id) ON UPDATE CASCADE ON DELETE RESTRICT;
ALTER TABLE ONLY public.film_category
    ADD CONSTRAINT film_category_film_id_fkey FOREIGN KEY (film_id) REFERENCES public.film(film_id) ON UPDATE CASCADE ON DELETE RESTRICT;

Populate those 3 tables as provided on this page, but then remove the category of some films, for example with:

DELETE FROM film_category WHERE category_id > 1; -- only 64 films will keep their category

Now suppose you're interested to know what are those 64 films which still has a category, but you don't want to embed their categories.

Suggestions / Discussion

If ever it's only a problem of syntax / parsing of the query string, not allowing any empty parentheses, then may I suggest to use a special keyword that would mean "no columns at all". It could be some reserved keyword from the SQL syntax, such as "NULL", so that we're pretty sure nobody would ever have a column with such a name in his/her database:

http://127.0.0.1:3000/film?select=*,film_category!inner(NULL)

(Currently says "column film_category.NULL does not exist", which is true)

The text was updated successfully, but these errors were encountered:

steve-chavez · 2022-06-20T22:19:08Z

And then I get a fat JSON body response, where I'm trashing all this embedded data... (waste of bandwidth + slow web app because of that)

I see your point.

http://127.0.0.1:3000/film?select=*,film_category!inner()

We were planning to change the !inner syntax to exists, this would allow omitting film_category in the select. So how about:

GET /film?exists=film_category

Pigeo · 2022-06-21T09:08:40Z

I also need to do things like that:

http://127.0.0.1:3000/film?select=*,film_category!inner()&film_category.category_id=not.eq.1

(Actually, that's the vast majority of my use cases. The former one, without any additional filter criteria, is rarer, and was cited here for simplification purposes)

As far as I understand how PostgREST works, currently I need to include film_category!inner() in the select=…, otherwise my filter criteria film_category.category_id=not.eq.1 is simply ignored and discarded without any notification or any error message.

How would you do that with your proposed syntax ?

(It would be great if PostgREST could automatically infer the necessity to perform an INNER JOIN from the bare presence of a filter criteria starting with my_other_table_name.column_name=…)

Pigeo · 2022-06-21T09:56:10Z

To give some figures, in my production database, the equivalent of the film table in the above example contains ~1,000 rows, and the equivalent of the film_category table above contains ~1,000,000 rows (because it's not a N-to-N table).

Therefore, when using this work-around:

GET http://127.0.0.1:3000/film?select=*,film_category!inner(film_id) HTTP 1/1

I get a 12 MB JSON response, containing thousands of useless lines...

Whereas with a theoretical:

GET http://127.0.0.1:3000/film?select=*,film_category!inner() HTTP 1/1

The JSON response would be only 80 KB...

So, that makes a great difference!

Pigeo · 2022-06-21T12:52:40Z

And then I get a fat JSON body response, where I'm trashing all this embedded data... (waste of bandwidth + slow web app because of that)

I see your point.

http://127.0.0.1:3000/film?select=*,film_category!inner()

We were planning to change the !inner syntax to exists, this would allow omitting film_category in the select. So how about:
GET /film?exists=film_category

Thinking about it once again: if you try to make an analogy with the SQL syntax, then I understand that the select=… part of the query should not change which rows are retrieved or not, and it should only change which columns are presents or not in the response body, and as such, the INNER JOIN should be located elsewhere than in the select=…. But the same argument can made about the WHERE clause (or what resembles it) : I'm not sure it's a good idea to rely on the content of the WHERE clause to determine if an INNER JOIN should be performed or not (to me, something like exists=film_category clearly is something similar to what can be found in the WHERE clause of an SQL query).

Maybe it is not for nothing that there is also a FROM clause in the SQL syntax... Perhaps you should consider adding some equivalent of the FROM clause for specifying which INNER JOINs should be performed?

steve-chavez · 2022-06-21T17:30:31Z

I also need to do things like that:
http://127.0.0.1:3000/film?select=*,film_category!inner()&film_category.category_id=not.eq.1
How would you do that with your proposed syntax ?

Yes, that would also be possible with exists:

GET /film?exists=film_category&film_category.category_id=not.eq.1

As far as I understand how PostgREST works, currently I need to include film_category!inner() in the select=…, otherwise my filter criteria film_category.category_id=not.eq.1 is simply ignored and discarded without any notification or any error message.

That one was fixed in #2133, v10 will include it.

I'm not sure it's a good idea to rely on the content of the WHERE clause to determine if an INNER JOIN should be performed or not (to me, something like exists=film_category clearly is something similar to what can be found in the WHERE clause of an SQL query).

Check the samples of the generated queries in #1978 (comment), it's actually a LATERAL JOIN plus json_agg. For now the best syntax we could come up is the exists one, but feel free to suggest another one that's clearer.

One other thing we could try to support with exists is doing OR across different tables: #2014 (comment)

steve-chavez · 2022-08-15T20:28:08Z

Just noted that the anti-join mentioned on #1949 (comment) could be supported with a not.exists filter.

We could also add an /tbl?select=*,embed!anti(*) option similar to the !inner but I don't think it would make sense because the embed would always be empty.

Pigeo · 2022-08-17T09:18:05Z

Also considering #1949 (comment), that also suggests to just go with the non-breaking changes for now - and then open another issue where we can discuss a "new query syntax" that could get a few things straight:

I understand you're thinking about a new query syntax, for the long term, that would introduce a breaking change (for ex. by adding a new exists filter)
on the other hand, my initial request was just to relax a little bit the constraints on the current query syntax, so that empty parentheses should be allowed for embeddings. This does not introduce any breaking change, and thus could be done as of now, without having to wait for a new query syntax with breaking changes.

Since both things are independent of each other, would you please consider allowing empty parenthesis in the actual query syntax? (is it difficult to implement? Does it breaks anything?) Or do you expect the breaking new query syntax to be shipped in production so quickly that it is not worth correcting the current syntax?

Thank you for considering this possibility.

steve-chavez · 2022-09-23T18:49:00Z

Since both things are independent of each other, would you please consider allowing empty parenthesis in the actual query syntax?

Yes, using the ideas on #1414 (comment) now I think an empty parenthesis grants us a simpler/better syntax than exists. We could even replace the !inner with:

GET /projects?select=*,clients()&clients=not.is.null

Does it breaks anything?

This would not be a breaking change since empty parenthesis right now gives a syntax error

curl 'localhost:3000/projects?select=*,clients()'

{"code":"42703","details":null,"hint":"Perhaps you meant to reference the column \"projects.client_id\".","message":"column projects.clients does not exist"}

Pigeo · 2022-09-24T13:15:48Z

Great. Then we can go on this.

steve-chavez changed the title ~~When embedding with top-level filtering (inner join), empty parentheses should be allowed~~ When embedding with top-level filtering (inner join), empty parentheses should be allowed(exists filter) Jul 19, 2022

wolfgangwalther added enhancement a feature, ready for implementation embedding resource embedding labels Aug 11, 2022

steve-chavez changed the title ~~When embedding with top-level filtering (inner join), empty parentheses should be allowed(exists filter)~~ exists filter - When embedding with top-level filtering (inner join), empty parentheses should be allowed Aug 15, 2022

This was referenced Sep 21, 2022

Computed relationship changes the shape of main query when selecting it as a computed column #2480

Open

Order parent by child's column (for to-one mapping) #1414

Closed

steve-chavez changed the title ~~exists filter - When embedding with top-level filtering (inner join), empty parentheses should be allowed~~ When embedding with top-level filtering (inner join), empty parentheses should be allowed Oct 12, 2022

steve-chavez mentioned this issue Oct 12, 2022

Is there any way to filter selected data with properties that are not selected? #2510

Closed

steve-chavez self-assigned this Nov 23, 2022

steve-chavez mentioned this issue Nov 25, 2022

feat: Allow embedding without selecting any column #2574

Merged

3 tasks

steve-chavez closed this as completed in #2574 Nov 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When embedding with top-level filtering (inner join), empty parentheses should be allowed #2340

When embedding with top-level filtering (inner join), empty parentheses should be allowed #2340

Pigeo commented Jun 20, 2022

steve-chavez commented Jun 20, 2022 •

edited

Loading

Pigeo commented Jun 21, 2022

Pigeo commented Jun 21, 2022

Pigeo commented Jun 21, 2022

steve-chavez commented Jun 21, 2022 •

edited

Loading

steve-chavez commented Aug 15, 2022

Pigeo commented Aug 17, 2022

steve-chavez commented Sep 23, 2022

Pigeo commented Sep 24, 2022

When embedding with top-level filtering (inner join), empty parentheses should be allowed #2340

When embedding with top-level filtering (inner join), empty parentheses should be allowed #2340

Comments

Pigeo commented Jun 20, 2022

Environment

Description of issue

Steps to reproduce

Suggestions / Discussion

steve-chavez commented Jun 20, 2022 • edited Loading

Pigeo commented Jun 21, 2022

Pigeo commented Jun 21, 2022

Pigeo commented Jun 21, 2022

steve-chavez commented Jun 21, 2022 • edited Loading

steve-chavez commented Aug 15, 2022

Pigeo commented Aug 17, 2022

steve-chavez commented Sep 23, 2022

Pigeo commented Sep 24, 2022

steve-chavez commented Jun 20, 2022 •

edited

Loading

steve-chavez commented Jun 21, 2022 •

edited

Loading