Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to query language #2066

Open
wolfgangwalther opened this issue Dec 3, 2021 · 12 comments
Open

Improvements to query language #2066

wolfgangwalther opened this issue Dec 3, 2021 · 12 comments
Labels
idea Needs of discussion to become an enhancement, not ready for implementation

Comments

@wolfgangwalther
Copy link
Member

wolfgangwalther commented Dec 3, 2021

We have discussed a few shortcomings of our current query language (expressions in the query string). Taking those suggestions together - and adding a few more points, that seem to be inconsistent to me in the current implementation.

I wonder whether we should try to implement all those improvements as non-breaking separated changes - or just invent a query language v2, where we can chose via config option which language to use. Maybe #1804 can lead us to have a cleanly separated QueryLanguage module that we can replace with a new module.

Here we go:

  • Change operator syntax from col=op.val to col.op=val: Swagger: GET record from table with uuid primary key: failed to parse filter #1970. This will allow to use swagger-ui to send requests. It will also allow plain-html requests without javascript like this:

    <form action="/table?select=html">
       <input type="text" name="title.eq" />
     </form>

    This was not possible before, because the value would have to be constructed dynamically.

  • Separating filter logic from values: Swagger: GET record from table with uuid primary key: failed to parse filter #1970. This will allow doing more complex queries in swagger-ui involving logic conditions, too. It will also simplify a lot of our qouting requirements already. Let's put all logic into a logic=... parameter. Maybe filters= is even better.

  • Replace current in with col.eq.any=<array> as suggested in Normalize double quoting across all operators #1943. I'd go one step further and remove in completely. We are using ANY under the hood anyway and together with logic=, we would remove all manual quoting we'd need to do. We'd rely only on PG array quoting.

  • Move select and order to parameters of the mimetype in the Accept header. Both parameters only shape the response - but the underlying entity is the same. As such, I think they don't really fit in the query string, which should describe which entity to receive. This would also remove the strange interaction we have between the select= and the Accept header already: When you request a custom mimetype, e.g. application/octet-stream or text/plain, the whole select= doesn't make any sense. So the availability of select is already dependent on the mediatype.
    Example: Both of the following requests will return the same person. Returning fewer or more columns is not much different to just returning the same entity in a different format.

    GET /people?uid.eq=123&select=*
    GET /people?uid.eq=123&select=lastname,firstname

    would become

    GET /people?uid.eq=123
    Accept: application/json;select=*
    
    GET /people?uid.eq=123
    Accept: application/json;select=lastname,firstname

    I'm not sure whether we can still use the mimetype application/json here. The registration mentions no parameters at all: https://www.iana.org/assignments/media-types/application/json. It specifically hints that charset is not a valid parameter, but we are returning it anyway. At the same time RFC6838 says:

    New parameters SHOULD NOT be defined as a way to introduce new
    functionality in types registered in the standards tree, although new
    parameters MAY be added to convey additional information that does
    not otherwise change existing functionality.

    As we would still be returning valid json, I think we would not change existing functionality, so adding parameters should be fine. If we end up not wanting to do this, we could still do so with our own custom media type, though.

  • Remove limit and offset parameters entirely - this can be done via the respective range headers.

  • Replace !inner on the embedding with an exists filter as suggested in Inaccurate total record count returned when top-level filtering with embedded resource filters #2009 (comment). Note, this does not really play well with moving the embedding to the Accept header, yet. I don't think referencing a definition that is sent in the header from the query string is a wise idea. But given how the SQL for an exists query actually looks like (EXISTS (... sub query...)), maybe we can do something like exists=<embedding-spec>:

    GET /people?exists=jobs!job_fk

    This means that in some cases the embedding + hint needs to be repeated between query string and header, but this makes thing much more explicit. Later on, we could even support true filtering via EXISTS (...), which could then be combined with other filters in the logic= part. And only if it's part of a top-level AND and requested in the select= part, then we would make it a INNER JOIN.

  • Move columns to a parameter of the mimetype in the Content-Type header, which describes the request body. Same arguments apply as above.

  • Move on_conflict to a parameter of the Prefer: resolution header. POSTing the same request once with on_conflict and once without really does not change the returned entity. Parameters on prefer headers can be given like this, according to the RFC:

    Prefer: resolution=merge-duplicates;col1;col2;col3
  • Change the Accept: application/vnd.pgrst.object+json way to request a single resource to a path parameter something like this:

    GET /people;single?uid.eq=123
    GET /people;one?uid.eq=123

    This is because, the ability to query for a single row instead of multiple rows is really not inherent to the media type requested. Right now, it's not possible to query for a single row in text/csv, application/octet-stream, text/plain and other formats, which is a severe limitation. Changing it like this would also allow to people to rewrite those PK-equality requests easily via nginx from something like /people/123. And since there is no interaction with the mimetype, this will not conflict with custom accept headers that might be sent.

This would leave only filters (identified by at least one . before the =), arguments (without .) and logic= query parameters, reducing the amount of "reserved keywords" by a fair bit.

Using parameters to all kinds of things etc. is also much more consistent with what the RFCs allow for those, instead of cramping everything into the query string.


Not sure whether everything in here can really be done. I haven't found a good solution for filters / limits / offsets on embedded resources, yet. It doesn't make much sense to move those to the accept header - but keeping them in the query string would create a strange interaction between query string and accept header once again.

One idea, that I haven't fully thought through, yet, could be to split the embeddings from the select part. If the embeddings were defined in the query string, but the select= in the accept header would reference those, that would be fine, I think. Having an interaction in this direction between header and query string would make sense, imho.

Embedding other resources really changes the entity we are requesting, too - so they should not be moved to the header. What about using the flexibility the path specification of a URI gives us for embedding?

GET /people,jobs

GET /people,jobs;jobs_fk?age.eq=15&jobs.title.eq=Boss
Accept: application/json;select=name,age,jobs.title

This would also play nicely with the ;single or ;one parameter for requesting a single entity, which is kind of similar to a hint of relationship type to embedd !m2m, !o2m, etc.:

GET /people;one,jobs;o2m?...

They both do kind of the same thing in returning an object instead of an array.

@wolfgangwalther wolfgangwalther added the idea Needs of discussion to become an enhancement, not ready for implementation label Dec 3, 2021
@steve-chavez
Copy link
Member

I like many of the ideas from an HTTP standpoint, but they'd be such a big breaking change(consider all client libraries we have now) that we might as well change our name if we did them. Many users will wonder why they should not just try to migrate to a GraphQL/OData(which are behind standards) solution if we do such a change.

or just invent a query language v2

Yes, this is why it's important to document our REST syntax(ref), maybe we could do a RESTQL/RSQL/RESQL V2 if we'd ever adopt such a change. But even if we were to support both our v1 and v2 at the same time, I think that would still create doubts about our stability.

So I think before doing any major breaking changes to our REST syntax, we should offer extensibility(#1909 (comment)) to provide a way to be compatible with other standards(GraphQL/Odata).


Some comments:

GET /people?uid.eq=123
Accept: application/json;select=*

This looks better from a REST/HTTP standpoint, but there are many crippled http clients out there that can't send headers, we would leave those out or require even more use cases for proxies.

GET /people,jobs;jobs_fk?age.eq=15&jobs.title.eq=Boss

Ah, isn't ; required to be urlencoded? I'm not sure, but it seems too likely that we'd be creating impedance with http clients out there, causing similar problems to the ones our current or/and operators generated. Also, IMHO ; somehow looks bad in the URL path.

GET /people?exists=jobs!job_fk

This one looks good and is actionable, will take it to a new issue.

@wolfgangwalther
Copy link
Member Author

Ah, isn't ; required to be urlencoded? I'm not sure, but it seems too likely that we'd be creating impedance with http clients out there, causing similar problems to the ones our current or/and operators generated. Also, IMHO ; somehow looks bad in the URL path.

No, they are specifically allowed for that purpose: https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

Aside from dot-segments in hierarchical paths, a path segment is
considered opaque by the generic syntax. URI producing applications
often use the reserved characters allowed in a segment to delimit
scheme-specific or dereference-handler-specific subcomponents. For
example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment. The comma (",") reserved character is often used for
similar purposes. For example, one URI producer might use a segment
such as "name;v=1.1" to indicate a reference to version 1.1 of
"name", whereas another might use a segment such as "name,1.1" to
indicate the same. Parameter types may be defined by scheme-specific
semantics, but in most cases the syntax of a parameter is specific to
the implementation of the URI's dereferencing algorithm.

@wolfgangwalther
Copy link
Member Author

I like many of the ideas from an HTTP standpoint, but they'd be such a big breaking change(consider all client libraries we have now) that we might as well change our name if we did them. Many users will wonder why they should not just try to migrate to a GraphQL/OData(which are behind standards) solution if we do such a change.

I certainly don't want to propose a general breaking change in the API here. I think there is value in "providing the most HTTP-standard compliant interface" in itself, but also many of those proposals solve actual problems we have.

Yes, this is why it's important to document our REST syntax(ref), maybe we could do a RESTQL/RSQL/RESQL V2 if we'd ever adopt such a change. But even if we were to support both our v1 and v2 at the same time, I think that would still create doubts about our stability.

So I think before doing any major breaking changes to our REST syntax, we should offer extensibility(#1909 (comment)) to provide a way to be compatible with other standards(GraphQL/Odata).

If we offer an extensible interface for the query language and have both our V1 and V2 to make use of that interface / prove that it's working well - I don't see us creating doubts about stability. It's the other way around, I think. It increases confidence in a stable non-breaking interface, because new features, that would imply breaking changes can much easier be developed while keeping backwards compatibility in other versions of our language.

@steve-chavez

This comment was marked as outdated.

@steve-chavez
Copy link
Member

Once #1970 (comment) is implemented, I think we could support both the old and new filter syntax without a breaking change.

Change operator syntax from col=op.val to col.op=val

Looking at the parsers I think it'd be easier to do op.col=val than col.op=val.

col=val would be eq.col=val implicitly.

@steve-chavez
Copy link
Member

steve-chavez commented May 19, 2023

Some ideas..

Embedding other resources really changes the entity we are requesting, too - so they should not be moved to the header. What about using the flexibility the path specification of a URI gives us for embedding?
GET /people,jobs
GET /people,jobs;jobs_fk?age.eq=15&jobs.title.eq=Boss
Accept: application/json;select=name,age,jobs.title

Doesn't embedding just add attributes to the entity? You still do a FROM base_entity JOIN other on SQL after all. We could just say an embedding is a different representation of a resource(table).

Edit: Totally misread the RFC. It was related to media type parameters not query parameters. Still the new media type below might be worth considering.

Also RFC 2046 says:

Parameters are modifiers of the media subtype, and as such do not fundamentally affect the nature of the content. The set of meaningful parameters depends on the media type and subtype.

So the query parameters depend on the media type, we don't need to put the select on the header. Also, omitting columns from select doesn't fundamentally change the nature of the content - it's still json, we're just leaving some fields as 'undefined'.

Considering the above, it sounds possible to document the embedding possibilities for a resource. In OpenAPI we can list the Accept media types. Using films in resource embedding, we could say it accepts:

paths:
  /films:
    get:
      ...
          content:
            application/vnd.pgrst.related+json; rel=films>-directors
            application/vnd.pgrst.related+json; rel=films-<nominations
            application/vnd.pgrst.related+json; rel=films-<roles
            application/vnd.pgrst.related+json; rel=films>-<actors
            application/vnd.pgrst.related+json; rel=films>-<competitions

>- = many-to-one. -< = one-to-many. >-< many-to-many. one-to-one could be --.

So say we get the following request(with nesting):

GET /films?select=*,roles(*, actors(*)),directors(*)

We would respond with our usual json, but with the added header:

Content-Type: application/vnd.pgrst.related+json; rel=films-<roles,films>-directors,roles>-actors

The gain here is that we could document the relationships a resource has. Which was previously impossible in OpenAPI.

@wolfgangwalther
Copy link
Member Author

Considering the above, it sounds possible to document the embedding possibilities for a resource. In OpenAPI we can list the Accept media types. Using films in resource embedding, we could say it accepts:

This would only make sense if the embedding were to take place via accept header / mediatype. I don't think that would be correct. Adding an embedding will return a new entity - it will not merely change the return format. That means it does not belong in the header.

The gain here is that we could document the relationships a resource has. Which was previously impossible in OpenAPI.

Really what I suggested above about embeddings and the path component would be the way forward. Just requesting people is different from just requesting jobs is different from requesting a combination of both. If we had something like GET /people,jobs, then it would also be possible to add those new paths to the OpenAPI output.

@erickedji
Copy link

Change operator syntax from col=op.val to col.op=val

As another datapoint, this also helps when user input is passed to a stored procedure via GET.

Currently, query=foobar is processed correctly for CREATE FUNCTION search(query TEXT). But with query=in.foobar, Postgrest no longer interprets query as a function param, but as a filter.

There doesn't seem to be a simple way to force the function param interpretation (quoting required preprocessing in the stored procedure to remove the quotes).

@steve-chavez
Copy link
Member

steve-chavez commented Jul 18, 2023

Adding an embedding will return a new entity - it will not merely change the return format. That means it does not belong in the header.

Just noticed that even the Prefer: exact=count hints that the above is true. The count is different when embedding is done (see #2009 and #2846). So indeed it's a new entity and doesn't belong in the header, makes sense because people,jobs can be represented in json, csv or other media types.

GET /people,jobs;jobs_fk?age.eq=15&jobs.title.eq=Boss

This would also play nicely with the ;single or ;one parameter for requesting a single entity, which is kind of similar to a hint of relationship type to embedd !m2m, !o2m, etc.:
GET /people;one,jobs;o2m?...

Now that we're moving on from doing disambiguation on the URL(#2863), the syntax can be much simpler so we don't need the FK or the cardinality in the path. So we could just go with GET /people,jobs.


There's a problem in expressing nested embedding on the path, since it needs to shape the JSON output. For this I believe we should keep the select parameter.

GET /actors,roles,films?select=roles(character,films(title,year))

Or later as a media type parameter too:

GET /actors,roles,films

Accept: application/json;select=roles(character,films(title,year))

If we had something like GET /people,jobs, then it would also be possible to add those new paths to the OpenAPI output.

Yes, and we could do this now without a breaking change.


Another syntax:

GET /users(projects,tasks(subtasks))

That would allow us to express joins on resources.

@steve-chavez

This comment was marked as outdated.

@steve-chavez

This comment was marked as outdated.

@steve-chavez
Copy link
Member

Another improvement for the query language: use ~ for not. It's shorter/clearer (likely easier for the parsers) and it doesn't require urlencoding: https://perishablepress.com/stop-using-unsafe-characters-in-urls/

/a?col.~eq=3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
idea Needs of discussion to become an enhancement, not ready for implementation
Development

No branches or pull requests

3 participants