Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When embedding with top-level filtering (inner join), empty parentheses should be allowed #2340

Closed
Pigeo opened this issue Jun 20, 2022 · 9 comments · Fixed by #2574
Closed
Assignees
Labels
embedding resource embedding enhancement a feature, ready for implementation

Comments

@Pigeo
Copy link

Pigeo commented Jun 20, 2022

Environment

  • PostgreSQL version: 12.10 (Debian 12.10-1.pgdg100+1)
  • PostgREST version: 9.0.1 (linux-static-x64)
  • Operating system: Debian Buster (10.12)

Description of issue

Trying to do :

http://127.0.0.1:3000/film?select=*,film_category!inner()

I get an error message column film.film_category does not exist.

Firstly, the error message is out of touch, since we are performing an INNER JOIN and the columns / relation does exist.

Secondly, it should not trigger an error: this query is perfectly legitimate and feasible ("I want all the films that do have a category, but I'm not interested in knowing what are their precise categories, therefore I would like to have a reduced JSON, not containing the film_category data, i.e. just filtering the top-level on "has a film_category", but not embedding the data)

What I have to do as a work-around: put any dummy column inside the parentheses, so that I don't get an error message anymore:

http://127.0.0.1:3000/film?select=*,film_category!inner(film_id)

And then I get a fat JSON body response, where I'm trashing all this embedded data... (waste of bandwidth + slow web app because of that)

Steps to reproduce

Let's take a sample database from https://www.postgresqltutorial.com/postgresql-sample-database/.
The relevant schema is as follows:

CREATE TABLE public.film (
    film_id integer DEFAULT nextval('public.film_film_id_seq'::regclass) NOT NULL,
    title character varying(255) NOT NULL,
    description text,
    release_year public.year,
    language_id smallint NOT NULL,
    rental_duration smallint DEFAULT 3 NOT NULL,
    rental_rate numeric(4,2) DEFAULT 4.99 NOT NULL,
    length smallint,
    replacement_cost numeric(5,2) DEFAULT 19.99 NOT NULL,
    rating public.mpaa_rating DEFAULT 'G'::public.mpaa_rating,
    last_update timestamp without time zone DEFAULT now() NOT NULL,
    special_features text[],
    fulltext tsvector NOT NULL
);
CREATE TABLE public.category (
    category_id integer DEFAULT nextval('public.category_category_id_seq'::regclass) NOT NULL,
    name character varying(25) NOT NULL,
    last_update timestamp without time zone DEFAULT now() NOT NULL
);
CREATE TABLE public.film_category (
    film_id smallint NOT NULL,
    category_id smallint NOT NULL,
    last_update timestamp without time zone DEFAULT now() NOT NULL
);
ALTER TABLE ONLY public.film_category
    ADD CONSTRAINT film_category_category_id_fkey FOREIGN KEY (category_id) REFERENCES public.category(category_id) ON UPDATE CASCADE ON DELETE RESTRICT;
ALTER TABLE ONLY public.film_category
    ADD CONSTRAINT film_category_film_id_fkey FOREIGN KEY (film_id) REFERENCES public.film(film_id) ON UPDATE CASCADE ON DELETE RESTRICT;

Populate those 3 tables as provided on this page, but then remove the category of some films, for example with:

DELETE FROM film_category WHERE category_id > 1; -- only 64 films will keep their category

Now suppose you're interested to know what are those 64 films which still has a category, but you don't want to embed their categories.

Suggestions / Discussion

If ever it's only a problem of syntax / parsing of the query string, not allowing any empty parentheses, then may I suggest to use a special keyword that would mean "no columns at all". It could be some reserved keyword from the SQL syntax, such as "NULL", so that we're pretty sure nobody would ever have a column with such a name in his/her database:

http://127.0.0.1:3000/film?select=*,film_category!inner(NULL)

(Currently says "column film_category.NULL does not exist", which is true)

@steve-chavez
Copy link
Member

steve-chavez commented Jun 20, 2022

And then I get a fat JSON body response, where I'm trashing all this embedded data... (waste of bandwidth + slow web app because of that)

I see your point.

http://127.0.0.1:3000/film?select=*,film_category!inner()

We were planning to change the !inner syntax to exists, this would allow omitting film_category in the select. So how about:

GET /film?exists=film_category

@Pigeo
Copy link
Author

Pigeo commented Jun 21, 2022

I also need to do things like that:

http://127.0.0.1:3000/film?select=*,film_category!inner()&film_category.category_id=not.eq.1

(Actually, that's the vast majority of my use cases. The former one, without any additional filter criteria, is rarer, and was cited here for simplification purposes)

As far as I understand how PostgREST works, currently I need to include film_category!inner() in the select=…, otherwise my filter criteria film_category.category_id=not.eq.1 is simply ignored and discarded without any notification or any error message.

How would you do that with your proposed syntax ?

(It would be great if PostgREST could automatically infer the necessity to perform an INNER JOIN from the bare presence of a filter criteria starting with my_other_table_name.column_name=…)

@Pigeo
Copy link
Author

Pigeo commented Jun 21, 2022

To give some figures, in my production database, the equivalent of the film table in the above example contains ~1,000 rows, and the equivalent of the film_category table above contains ~1,000,000 rows (because it's not a N-to-N table).

Therefore, when using this work-around:

GET http://127.0.0.1:3000/film?select=*,film_category!inner(film_id) HTTP 1/1

I get a 12 MB JSON response, containing thousands of useless lines...

Whereas with a theoretical:

GET http://127.0.0.1:3000/film?select=*,film_category!inner() HTTP 1/1

The JSON response would be only 80 KB...

So, that makes a great difference!

@Pigeo
Copy link
Author

Pigeo commented Jun 21, 2022

And then I get a fat JSON body response, where I'm trashing all this embedded data... (waste of bandwidth + slow web app because of that)

I see your point.

http://127.0.0.1:3000/film?select=*,film_category!inner()

We were planning to change the !inner syntax to exists, this would allow omitting film_category in the select. So how about:

GET /film?exists=film_category

Thinking about it once again: if you try to make an analogy with the SQL syntax, then I understand that the select=… part of the query should not change which rows are retrieved or not, and it should only change which columns are presents or not in the response body, and as such, the INNER JOIN should be located elsewhere than in the select=…. But the same argument can made about the WHERE clause (or what resembles it) : I'm not sure it's a good idea to rely on the content of the WHERE clause to determine if an INNER JOIN should be performed or not (to me, something like exists=film_category clearly is something similar to what can be found in the WHERE clause of an SQL query).

Maybe it is not for nothing that there is also a FROM clause in the SQL syntax... Perhaps you should consider adding some equivalent of the FROM clause for specifying which INNER JOINs should be performed?

@steve-chavez
Copy link
Member

steve-chavez commented Jun 21, 2022

I also need to do things like that:
http://127.0.0.1:3000/film?select=*,film_category!inner()&film_category.category_id=not.eq.1
How would you do that with your proposed syntax ?

Yes, that would also be possible with exists:

GET /film?exists=film_category&film_category.category_id=not.eq.1

As far as I understand how PostgREST works, currently I need to include film_category!inner() in the select=…, otherwise my filter criteria film_category.category_id=not.eq.1 is simply ignored and discarded without any notification or any error message.

That one was fixed in #2133, v10 will include it.

I'm not sure it's a good idea to rely on the content of the WHERE clause to determine if an INNER JOIN should be performed or not (to me, something like exists=film_category clearly is something similar to what can be found in the WHERE clause of an SQL query).

Check the samples of the generated queries in #1978 (comment), it's actually a LATERAL JOIN plus json_agg. For now the best syntax we could come up is the exists one, but feel free to suggest another one that's clearer.


One other thing we could try to support with exists is doing OR across different tables: #2014 (comment)

@steve-chavez steve-chavez changed the title When embedding with top-level filtering (inner join), empty parentheses should be allowed When embedding with top-level filtering (inner join), empty parentheses should be allowed(exists filter) Jul 19, 2022
@wolfgangwalther wolfgangwalther added enhancement a feature, ready for implementation embedding resource embedding labels Aug 11, 2022
@steve-chavez steve-chavez changed the title When embedding with top-level filtering (inner join), empty parentheses should be allowed(exists filter) exists filter - When embedding with top-level filtering (inner join), empty parentheses should be allowed Aug 15, 2022
@steve-chavez
Copy link
Member

Just noted that the anti-join mentioned on #1949 (comment) could be supported with a not.exists filter.

We could also add an /tbl?select=*,embed!anti(*) option similar to the !inner but I don't think it would make sense because the embed would always be empty.

@Pigeo
Copy link
Author

Pigeo commented Aug 17, 2022

Also considering #1949 (comment), that also suggests to just go with the non-breaking changes for now - and then open another issue where we can discuss a "new query syntax" that could get a few things straight:

  • I understand you're thinking about a new query syntax, for the long term, that would introduce a breaking change (for ex. by adding a new exists filter)
  • on the other hand, my initial request was just to relax a little bit the constraints on the current query syntax, so that empty parentheses should be allowed for embeddings. This does not introduce any breaking change, and thus could be done as of now, without having to wait for a new query syntax with breaking changes.

Since both things are independent of each other, would you please consider allowing empty parenthesis in the actual query syntax? (is it difficult to implement? Does it breaks anything?) Or do you expect the breaking new query syntax to be shipped in production so quickly that it is not worth correcting the current syntax?

Thank you for considering this possibility.

@steve-chavez
Copy link
Member

Since both things are independent of each other, would you please consider allowing empty parenthesis in the actual query syntax?

Yes, using the ideas on #1414 (comment) now I think an empty parenthesis grants us a simpler/better syntax than exists. We could even replace the !inner with:

GET /projects?select=*,clients()&clients=not.is.null

Does it breaks anything?

This would not be a breaking change since empty parenthesis right now gives a syntax error

curl 'localhost:3000/projects?select=*,clients()'

{"code":"42703","details":null,"hint":"Perhaps you meant to reference the column \"projects.client_id\".","message":"column projects.clients does not exist"}

@Pigeo
Copy link
Author

Pigeo commented Sep 24, 2022

Great. Then we can go on this.

@steve-chavez steve-chavez changed the title exists filter - When embedding with top-level filtering (inner join), empty parentheses should be allowed When embedding with top-level filtering (inner join), empty parentheses should be allowed Oct 12, 2022
@steve-chavez steve-chavez self-assigned this Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
embedding resource embedding enhancement a feature, ready for implementation
Development

Successfully merging a pull request may close this issue.

3 participants