Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi column distinct_on for Postgresql #3628

Merged
merged 12 commits into from
May 25, 2023

Conversation

omid
Copy link
Contributor

@omid omid commented May 15, 2023

Copy link
Member

@weiznich weiznich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR 👍
This looks fine for me (beside the minor thing about Self::Output vs the dsl types)
Additionally I would like to see some compile test that verifies that we reject queries with multiple columns in .distinct_on calls without matching order clause.

diesel/src/query_dsl/group_by_dsl.rs Outdated Show resolved Hide resolved
diesel/src/pg/backend.rs Show resolved Hide resolved
diesel/src/pg/backend.rs Show resolved Hide resolved
@omid
Copy link
Contributor Author

omid commented May 15, 2023

Additionally I would like to see some compile test that verifies that we reject queries with multiple columns in .distinct_on calls without matching order clause.

@weiznich, actually it failed with simple correct orders :(
On my local tests, I didn't have any orders and it was fine. But in tests which has orders, it fails.

With simple orders (like .order_by((id, name)) it's fine, but as soon as we have .asc() or .desc() it fails.
I'm not sure how I can solve it. I'm trying but couldn't find the proper way yet :(

@weiznich
Copy link
Member

I do not have a solution for that yet, so I need to find some time to search for a solution. I cannot promise that this happens this week due to other priorities.

@omid
Copy link
Contributor Author

omid commented May 16, 2023

I do not have a solution for that yet, so I need to find some time to search for a solution. I cannot promise that this happens this week due to other priorities.

NP. I'm also working on this.

First try, to generate all impl cases failed. Because it needs to generate a lot of code, like this:

impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Asc<__D1>, __D2)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Desc<__D1>, __D2)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(__D1, crate::helper_types::Asc<__D2>)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(__D1, crate::helper_types::Desc<__D2>)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Asc<__D1>, crate::helper_types::Asc<__D2>)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Asc<__D1>, crate::helper_types::Desc<__D2>)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Desc<__D1>, crate::helper_types::Asc<__D2>)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Desc<__D1>, crate::helper_types::Desc<__D2>)>
{
}

if we have 128-column-tables feature enabled, just for the 128 column scenario, it needs to generate 1.18e61 cases :))

@weiznich
Copy link
Member

What you could try is something like that:

trait OrderDecorator {
    type Plain;
}

impl<C> OrderDecorator for C where C: Column {
   type Plain = C;
}

impl<C> OrderDecorator for Desc<C>  {
    type Plain = C;
}

// implement it for `Asc` as well

impl<__D1,__D2, __O1, __O2> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>> for OrderClause<(__O1, __O2)> 
where __O1: OrderDecorator<Plain = __D1>,
__O2: OrderDecorator<Plain = __D2>,
{}
// implement it for all other tuple sizes

Note that I have not tested that, so there might be problems there.

@omid
Copy link
Contributor Author

omid commented May 16, 2023

Note that I have not tested that, so there might be problems there.

Wow, works! (I tested it for a tuple of two)
I need to learn how you did that, thanks!

@weiznich
Copy link
Member

I need to learn how you did that, thanks!

There are just some patterns that keep coming up again and again in diesel. That's one of them. I should probably start writing more about those internals some day...

@omid
Copy link
Contributor Author

omid commented May 17, 2023

There are just some patterns that keep coming up again and again in diesel. That's one of them. I should probably start writing more about those internals some day...

That would be awesome!

But... regarding the implementation, I found a tiny issue in your code. Here is the code for a tuple of two:

trait OrderDecorator {
    type Plain;
}

impl<C> OrderDecorator for C where C: Column {
   type Plain = C;
}

impl<C> OrderDecorator for Desc<C>  {
    type Plain = C;
}

impl<C> OrderDecorator for Asc<C>  {
    type Plain = C;
}

impl<__D1,__D2, __O1, __O2> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>> for OrderClause<(__O1, __O2)> 
where __O1: OrderDecorator<Plain = __D1>,
__O2: OrderDecorator<Plain = __D2>,
{}

The issue is that it doesn't work for a query like this:

let source = users
        .select((name, hair_color))
        .order((id, name.asc(), hair_color.desc()))
        .distinct_on((id, name));

because order is a tuple of three and distinct_on is tuple of two.

One solution could be to have many impls per DistinctOnClause for example have this also:

impl<__D1,__D2, __O1, __O2,__O3> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>> for OrderClause<(__O1, __O2,__O3)> 
where __O1: OrderDecorator<Plain = __D1>,
__O2: OrderDecorator<Plain = __D2>,
{}

impl<__D1,__D2, __O1, __O2,__O3,__O4> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>> for OrderClause<(__O1, __O2,__O3,__O4)> 
where __O1: OrderDecorator<Plain = __D1>,
__O2: OrderDecorator<Plain = __D2>,
{}

// and so on

@weiznich
Copy link
Member

Yes my implementation would only work with "matching" distinct and order by clauses.

So the basic problem here is that the following combinations are fine:

  • Distinct on clause contains less columns than order by clause and these columns match the begining of the order by clause, so ValidOrderingForDistinct<DistinctOnClause<(__O1::Plain, __O2::Plain)>> for OrderClause<(__O1, __O2,__O3, __O4, …)>
  • Distinct on clause contains more columns than order by clause where the order by clause "matches" the first columns of the distinct clause: ValidOrderingForDistinct<DistinctOnClause<(__O1::Plain, __O2::Plain, D1, D2, …)>> for OrderClause<(__O1, __O2)>
  • The column count matches -> The case in my example above.

So we need way more impls to describe that. For each tuple size n we need m impls where m is the maximal supported tuple size. (That ignores one or two additional impls to allow single column cases without tuples) That means it's just m*m impls. So for a max tuple size of 32 we have 1024 impl, for a max size of 128 we have 16k impls. That does not seem to scale well.

I see the following options here:

  1. Just allow exact matches
  2. Maybe do not use the max_table_columns count to determine the upper limit here, but use something smaller. We could say that 5 (or any other reasonable small number) columns in order by/distinct on clauses should be enough for anyone.

I think variant 1 would make it hard to add support for the unsupported cases later on, because it might require removing these existing impls (which is a breaking change). Variant 2 would allow to add support for the unsupported cases, by just bumping the number. I personally would likely go with variant 2, although that will require some modifications to the diesel_for_each_tuple! macro. I think the easiest route there is to just allow an additional optional argument that specifies the tuple size. If that's not set, just fall back to the "default" size.

@omid
Copy link
Contributor Author

omid commented May 20, 2023

@weiznich I would also do #2.

But I'm not sure what exactly you suggested.

  1. Do you mean, for example, for a distinct on tuple of N members, should we implement order by tuples of 1 to N+5? (actually, min(N+5, MAX_TUPLE_SIZE))?
  2. Or for any distinct on tuple, just order by tuples of 1 to 5 only?
  3. Or just impl to max 5 members, either distinct on tuples and order by tuples? (I think you meant this one, then see other items as new suggestions :D)

impl of the first one is harder :)

@weiznich
Copy link
Member

I've meant to suggest this variant:

Or just impl to max 5 members, either distinct on tuples and order by tuples? (I think you meant this one, then see other items as new suggestions :D)

With the restriction that the max size of the distinct on and order by tuples is 5 (or whatever other limit). So really just a "few" (25) impls for all the cases, but for a low column count.

I think that's the solution that should cover most of the use-cases and that's easy to communicate to the users.

@weiznich weiznich mentioned this pull request May 22, 2023
@omid
Copy link
Contributor Author

omid commented May 22, 2023

@weiznich I need your help. I pushed last changes.

  1. I couldn't make the macro smaller, it's a bit tricky! I need to find the minimum of two lists and define bounds based on that, which makes the recursion too complex, based on my knowledge.
  2. Now there is a conflict between impl<T> ValidOrderingForDistinct<DistinctOnClause<T>> for OrderClause<T> where T: Expression {} and
 #[allow(unused_parens)]
#[cfg(feature = "postgres")]
impl<_OT1, T1,> crate::query_dsl::order_dsl::ValidOrderingForDistinct<crate::pg::DistinctOnClause<(T1,)>,> for crate::query_builder::order_clause::OrderClause<(_OT1,)>
    where
        _OT1: crate::pg::OrderDecorator<Column = T1>,
        T1: Column,
{}

To solve it, I tried to put T1: Column here, but still has the conflict (But I think it makes sense to have Column there, no?). It seems like Expression is really general, I don't know what should I use here to limit the scope?
And if I remove that general impl, I'll get this error:

error[E0277]: the trait bound `SelectStatement<FromClause<schema::users::table>,
query_builder::select_clause::SelectClause<(schema::users::columns::name, schema::users::columns::hair_color)>,
query_builder::distinct_clause::NoDistinctClause, query_builder::where_clause::NoWhereClause,
query_builder::order_clause::OrderClause<schema::users::columns::name>>: DistinctOnDsl<_>` is not satisfied

I'll try to find it, but I think you can lead me and review my changes in the meanwhile 🙏🏼

@weiznich
Copy link
Member

I will have a look at the macro later this week. As for the conflicting impls: You should be able to solve that with the SingleValue bound that was there before.

@omid
Copy link
Contributor Author

omid commented May 22, 2023

There are still conflicts, after I put:

impl<T> ValidOrderingForDistinct<DistinctOnClause<T>> for OrderClause<T>
    where
        T: Expression,
        T::SqlType: SingleValue,
{
}

the conflicts are like before:

`conflicting implementations of trait `ValidOrderingForDistinct<DistinctOnClause<(_,)>>`
for type `order_clause::OrderClause<(_,)>`

(This also uses a different strategy to generate these impls as that
results in more maintainable code)
@weiznich
Copy link
Member

I've pushed a commit that hopefully should workaround these conflicting impls. It also restructures how these impls are generated so that the additional paste dependency is not required anymore. Can you verify that it actually works now?

@omid
Copy link
Contributor Author

omid commented May 24, 2023

@weiznich thanks.
There is still an issue. I get this when I run the distinct_on test:

the trait bound `SelectStatement<FromClause<schema::users::table>,
query_builder::select_clause::SelectClause<(schema::users::columns::name, schema::users::columns::hair_color)>,
query_builder::distinct_clause::NoDistinctClause, query_builder::where_clause::NoWhereClause,
query_builder::order_clause::OrderClause<diesel::expression::operators::Asc<schema::users::columns::name>>>:
DistinctOnDsl<_>` is not satisfied

for this query:

let source = users
        .select((name, hair_color))
        .order(name.asc())
        .distinct_on(name);

@omid
Copy link
Contributor Author

omid commented May 24, 2023

@weiznich

impl<T, O> ValidOrderingForDistinct<DistinctOnClause<T>> for OrderClause<O> where O: OrderDecorator<Column = T> {}

solves the not implemented case.
I'll continue writing more tests.

@omid
Copy link
Contributor Author

omid commented May 24, 2023

@weiznich I think there are no tests to check compile time errors, am I right?
And also, I think we cannot easily show a compile time warning about the limitation of having 5 cases only.

@weiznich
Copy link
Member

Tests to check for compile time error are usually placed here: https://github.com/diesel-rs/diesel/tree/master/diesel_compile_tests%2Ftests
You basically add a file with code that is supposed to cause an compiler error to the fail folder. You execute these tests by running cargo test in the compile_test directory. You need to accept the output after running the new test for the first time (instructions are in the test failure)

@weiznich
Copy link
Member

weiznich commented May 25, 2023

Thanks for finishing this up. I think it's now ready for being merged 🎉

@weiznich weiznich added this pull request to the merge queue May 25, 2023
@omid
Copy link
Contributor Author

omid commented May 25, 2023

I think it's not ready for being merged

It's NOT or it's NOW ready? :D
I'm confused between NOT and 🎉 :))

@weiznich
Copy link
Member

It is supposed to be a "now" (I edit the post above as well)

Merged via the queue into diesel-rs:master with commit d29b7f9 May 25, 2023
@omid omid deleted the multi_column_distinct_on branch May 26, 2023 12:24
@omid
Copy link
Contributor Author

omid commented May 26, 2023

And... there is a problem. The old syntax doesn't work anymore.

.order_by(dsl::sql::<Bool>("col DESC, col2"))

with the following error:

expected `(SqlLiteral<Bool>,)`, found `SqlLiteral<Bool>`

Yet I'm not sure if it's related to this PR or not

@weiznich
Copy link
Member

That's unfortunate :(

It's very likely that this change: https://github.com/diesel-rs/diesel/pull/3628/files#diff-556fca1d890492102d32274fc8da5ebdf8759195ac66e9df5aec8ae849ea2c54R20 caused that issue. (In detail changing the restricting from Expression to Column). We need to somehow change that back.

@weiznich
Copy link
Member

@omid I think what I've written above is the actual issue. I've already tried to work on a fix and got something at works at least partially. See #3643 for that. It would be helpful if you can contribute test case that worked with diesel 2.0 and broke with 2.1 there.
Also ideas how to solve the last missing part are also helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants