Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: column to column comparison #1105

Closed
priyank-purohit opened this issue May 9, 2018 · 12 comments
Closed

Feature request: column to column comparison #1105

priyank-purohit opened this issue May 9, 2018 · 12 comments

Comments

@priyank-purohit
Copy link
Contributor

priyank-purohit commented May 9, 2018

Description of issue

Currently PostgREST supports only column-to-value(s) comparison, however column-to-column comparisons are extremely common and too important to not support.

Implementation

This is a somewhat tricky feature to implement because you need to differentiate values from column names. The best solution I came up with is to add a .c. prefix between the operator and the column.

For example, ?and=(first_name.c.eq.last_name) vs ?and=(first_name.eq.last_name) to search for people with same first and last names vs. people whose first name is in the db as literal string 'last_name'.

Another potential solution: duplicate each operator (where it makes sense the user wants to do column comparison), and this duplicate operator (with a slightly different name) will specifically interpret the value supplied by the user as a column name. If the value supplied is not a column name, 404 response. For example, the .eq. operator's column comparison equivalent could be .ceq.

@steve-chavez
Copy link
Member

steve-chavez commented May 9, 2018

Why not define that in the view you're exposing?, say like:

create view api.persons as (
  select id, first_name, last_name, first_name = last_name as is_tautonym 
  from private.persons
);

A computed column will also work:

create or replace function is_tautonym(api.persons) returns boolean as $$
  select $1.first_name = $1.last_name;
$$ language SQL stable;

Then you could do:

/persons?is_tautonym=is.true

For now I'd be against implementing this, as it's more clean to represent that kind of comparisons in the db.

@priyank-purohit
Copy link
Contributor Author

@steve-chavez That works for that one particular query, but not for any other queries (you can create any column-to-value comparison query otherwise). Is there any other reason besides the cleanliness of the PostgREST code?

Alternatively, I could implement something like this in a custom version of PostgREST; but I'm wondering if would this be a difficult undertaking for someone with 0 Haskell knowledge?

@steve-chavez
Copy link
Member

Considering the difficulty required towards implementing this and defining views/computed columns in a few lines of SQL, I'd say the latter is a much more easy option.

I think this kind of comparisons are core business logic, and as so, they should be defined at the db level with an appropriate name/comment on the column so the REST API can have proper documentation, as your project matures you're gonna need this.

Also there needs to be a limit in what kind of operations PostgREST exposes, having this somehow feels like a step closer to exposing custom functions in filters values.

@priyank-purohit
Copy link
Contributor Author

priyank-purohit commented May 10, 2018

I would say that something like 'regular expression comparison operator' is a use-case specific feature, as in not everyone will want/need that (and it increases risk of ReDoS). Hence, that is something you can expose via custom functions if you need.

I think of PostgREST as a way to create API for any PostgreSQL db, and a way to execute some common queries. I would say that SELECT * FROM table WHERE column1 > column2 AND column1 < column3 are examples of basic SQL queries that most people would find themselves executing. For example, I'm using this with sports and biological databases, both need to do those searches. I can create custom functions for each of the 100+ use cases, or have a schema-independent way to compare columns built into this tool!

I'll try to figure out how to implement this, but if someone experienced on contributing to this project can point me in the right direction (i.e. which file should I start at, which files would need to be modified to handle URL parsing and executing queries), I'd appreciate it!

@steve-chavez
Copy link
Member

Better to drop the .c. prefix if you want to do a PR, as this would make your job harder on the Parsers module and looks somewhat confusing on the url, didn't give it much thought but maybe another set of operators like col_{eq,gt,is} would be more appropriate, this needs more discussion.

@steve-chavez
Copy link
Member

Giving it more thought, the only operators that make sense on column to column comparison are: eq, neq, lt, lte, gt, gte.

I think we can have another HashMap of operators here, we'd need another constructor for them, those col_eq, col_neq, col_lt, col_lte, col_gt, col_gte would need to be parsed, and after that they would have to be converted to SQL.

We would also need another set of tests for this, making sure that computed columns comparisons work as well.

@steve-chavez steve-chavez added hygiene cleanup or refactoring enhancement a feature, ready for implementation and removed hygiene cleanup or refactoring labels Jun 5, 2018
@ruslantalpa
Copy link
Contributor

the need to strictly compare column values is very rare. It's even less likely that those two columns are in the same table (and this is what is being suggested/requested).
usually you want to apply some kind of function to a value of a column before comparing it to another and implementing that is complicated and at the same time exposing that is dangerous.

As a test for my theory, let's see here examples of real tables with two columns where people would need to compare them.

@priyank-purohit
Copy link
Contributor Author

priyank-purohit commented Jun 6, 2018

I currently use PostgREST to work with a massive biological database. I thought there was some problem with the way it was created so I needed to do some validation.

There is a PredictedGenes table which has a bunch of columns, including: GeneStart, GeneEnd, Strand. I needed to check that Strand = '+' and GeneStart > GeneEnd and Strand = '-' and GeneStart < GeneEnd

I definitely agree that being able to do complex manipulation of column values for comparison is great and needed pretty frequently. However, there are also other Postgres functions that are commonly used, yet not supported by PostgREST (GROUP BY for example).

That been said, I personally can't work on this at the moment, busy with wrapping up my degree; plus I have 0 knowledge of Haskell (tried to follow Steve's dev guide, but can't make heads or tails of the language atm). :( But if this was implemented, I would definitely implement it in my application!

@steve-chavez
Copy link
Member

@priyank-purohit For validating data, seems more suitable to query your PostgreSQL db directly(with psql or any client) than to go through PostgREST.

I hesitate on including the column to column comparison feature on master for now, would like to see more compelling use cases first.

@steve-chavez steve-chavez removed the enhancement a feature, ready for implementation label Jun 19, 2018
@sinogermany
Copy link

sinogermany commented Jul 4, 2018

Probably not so relevant - we are currently trying to do:

UPDATE <table> SET <col1> = <col2> WHERE <blah>

If we could introduce literals into the PATCH payload, that would've been really great.
I.e. something like:

PATCH /table_name?<blah>
{ "col1": "literal(col2)" }

@wolfgangwalther
Copy link
Member

The conclusion seems to be that this will not be implemented right now, therefore I'm closing this.

For more complex scenarios then what the query string syntax has to offer, we can already use some very powerful tools: Views, Stored Procedures and Computed Columns.

Taking the example mentioned above:

There is a PredictedGenes table which has a bunch of columns, including: GeneStart, GeneEnd, Strand. I needed to check that Strand = '+' and GeneStart > GeneEnd and Strand = '-' and GeneStart < GeneEnd

This can be solved with a computed column function that just returns -1, 0, or 1 for the comparison of GeneStart and GeneEnd. This computed column can then be queried with the regular syntax.

@steve-chavez
Copy link
Member

Just noted that @sinogermany use case above can be solved with JSON Patch copy operation. So relating to #465.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants