Multi column `distinct_on` for Postgresql #3628

omid · 2023-05-15T13:19:07Z

Add ability to define multiple columns in a single distinct_on for PostgreSQL, like: .distinct_on((column_a, column_b)). Main files related to this:
Fixed some tiny typos and grammar issues

diesel/src/pg/backend.rs

weiznich

Thanks for opening this PR 👍
This looks fine for me (beside the minor thing about Self::Output vs the dsl types)
Additionally I would like to see some compile test that verifies that we reject queries with multiple columns in .distinct_on calls without matching order clause.

diesel/src/query_dsl/group_by_dsl.rs

diesel/src/pg/backend.rs

omid · 2023-05-15T14:31:10Z

Additionally I would like to see some compile test that verifies that we reject queries with multiple columns in .distinct_on calls without matching order clause.

@weiznich, actually it failed with simple correct orders :(
On my local tests, I didn't have any orders and it was fine. But in tests which has orders, it fails.

With simple orders (like .order_by((id, name)) it's fine, but as soon as we have .asc() or .desc() it fails.
I'm not sure how I can solve it. I'm trying but couldn't find the proper way yet :(

weiznich · 2023-05-16T15:06:23Z

I do not have a solution for that yet, so I need to find some time to search for a solution. I cannot promise that this happens this week due to other priorities.

omid · 2023-05-16T15:17:47Z

I do not have a solution for that yet, so I need to find some time to search for a solution. I cannot promise that this happens this week due to other priorities.

NP. I'm also working on this.

First try, to generate all impl cases failed. Because it needs to generate a lot of code, like this:

impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Asc<__D1>, __D2)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Desc<__D1>, __D2)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(__D1, crate::helper_types::Asc<__D2>)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(__D1, crate::helper_types::Desc<__D2>)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Asc<__D1>, crate::helper_types::Asc<__D2>)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Asc<__D1>, crate::helper_types::Desc<__D2>)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Desc<__D1>, crate::helper_types::Asc<__D2>)>
{
}
impl<__D1, __D2,> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>>
for OrderClause<(crate::helper_types::Desc<__D1>, crate::helper_types::Desc<__D2>)>
{
}

if we have 128-column-tables feature enabled, just for the 128 column scenario, it needs to generate 1.18e61 cases :))

weiznich · 2023-05-16T15:24:10Z

What you could try is something like that:

trait OrderDecorator {
    type Plain;
}

impl<C> OrderDecorator for C where C: Column {
   type Plain = C;
}

impl<C> OrderDecorator for Desc<C>  {
    type Plain = C;
}

// implement it for `Asc` as well

impl<__D1,__D2, __O1, __O2> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>> for OrderClause<(__O1, __O2)> 
where __O1: OrderDecorator<Plain = __D1>,
__O2: OrderDecorator<Plain = __D2>,
{}
// implement it for all other tuple sizes

Note that I have not tested that, so there might be problems there.

omid · 2023-05-16T15:33:28Z

Note that I have not tested that, so there might be problems there.

Wow, works! (I tested it for a tuple of two)
I need to learn how you did that, thanks!

weiznich · 2023-05-16T17:28:01Z

I need to learn how you did that, thanks!

There are just some patterns that keep coming up again and again in diesel. That's one of them. I should probably start writing more about those internals some day...

omid · 2023-05-17T10:48:02Z

There are just some patterns that keep coming up again and again in diesel. That's one of them. I should probably start writing more about those internals some day...

That would be awesome!

But... regarding the implementation, I found a tiny issue in your code. Here is the code for a tuple of two:

trait OrderDecorator {
    type Plain;
}

impl<C> OrderDecorator for C where C: Column {
   type Plain = C;
}

impl<C> OrderDecorator for Desc<C>  {
    type Plain = C;
}

impl<C> OrderDecorator for Asc<C>  {
    type Plain = C;
}

impl<__D1,__D2, __O1, __O2> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>> for OrderClause<(__O1, __O2)> 
where __O1: OrderDecorator<Plain = __D1>,
__O2: OrderDecorator<Plain = __D2>,
{}

The issue is that it doesn't work for a query like this:

let source = users
        .select((name, hair_color))
        .order((id, name.asc(), hair_color.desc()))
        .distinct_on((id, name));

because order is a tuple of three and distinct_on is tuple of two.

One solution could be to have many impls per DistinctOnClause for example have this also:

impl<__D1,__D2, __O1, __O2,__O3> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>> for OrderClause<(__O1, __O2,__O3)> 
where __O1: OrderDecorator<Plain = __D1>,
__O2: OrderDecorator<Plain = __D2>,
{}

impl<__D1,__D2, __O1, __O2,__O3,__O4> ValidOrderingForDistinct<DistinctOnClause<(__D1, __D2)>> for OrderClause<(__O1, __O2,__O3,__O4)> 
where __O1: OrderDecorator<Plain = __D1>,
__O2: OrderDecorator<Plain = __D2>,
{}

// and so on

weiznich · 2023-05-19T10:07:30Z

Yes my implementation would only work with "matching" distinct and order by clauses.

So the basic problem here is that the following combinations are fine:

Distinct on clause contains less columns than order by clause and these columns match the begining of the order by clause, so ValidOrderingForDistinct<DistinctOnClause<(__O1::Plain, __O2::Plain)>> for OrderClause<(__O1, __O2,__O3, __O4, …)>
Distinct on clause contains more columns than order by clause where the order by clause "matches" the first columns of the distinct clause: ValidOrderingForDistinct<DistinctOnClause<(__O1::Plain, __O2::Plain, D1, D2, …)>> for OrderClause<(__O1, __O2)>
The column count matches -> The case in my example above.

So we need way more impls to describe that. For each tuple size n we need m impls where m is the maximal supported tuple size. (That ignores one or two additional impls to allow single column cases without tuples) That means it's just m*m impls. So for a max tuple size of 32 we have 1024 impl, for a max size of 128 we have 16k impls. That does not seem to scale well.

I see the following options here:

Just allow exact matches
Maybe do not use the max_table_columns count to determine the upper limit here, but use something smaller. We could say that 5 (or any other reasonable small number) columns in order by/distinct on clauses should be enough for anyone.

I think variant 1 would make it hard to add support for the unsupported cases later on, because it might require removing these existing impls (which is a breaking change). Variant 2 would allow to add support for the unsupported cases, by just bumping the number. I personally would likely go with variant 2, although that will require some modifications to the diesel_for_each_tuple! macro. I think the easiest route there is to just allow an additional optional argument that specifies the tuple size. If that's not set, just fall back to the "default" size.

omid · 2023-05-20T17:33:53Z

@weiznich I would also do #2.

But I'm not sure what exactly you suggested.

Do you mean, for example, for a distinct on tuple of N members, should we implement order by tuples of 1 to N+5? (actually, min(N+5, MAX_TUPLE_SIZE))?
Or for any distinct on tuple, just order by tuples of 1 to 5 only?
Or just impl to max 5 members, either distinct on tuples and order by tuples? (I think you meant this one, then see other items as new suggestions :D)

impl of the first one is harder :)

weiznich · 2023-05-21T16:25:15Z

I've meant to suggest this variant:

Or just impl to max 5 members, either distinct on tuples and order by tuples? (I think you meant this one, then see other items as new suggestions :D)

With the restriction that the max size of the distinct on and order by tuples is 5 (or whatever other limit). So really just a "few" (25) impls for all the cases, but for a low column count.

I think that's the solution that should cover most of the use-cases and that's easy to communicate to the users.

omid · 2023-05-22T16:39:38Z

@weiznich I need your help. I pushed last changes.

I couldn't make the macro smaller, it's a bit tricky! I need to find the minimum of two lists and define bounds based on that, which makes the recursion too complex, based on my knowledge.
Now there is a conflict between impl<T> ValidOrderingForDistinct<DistinctOnClause<T>> for OrderClause<T> where T: Expression {} and

 #[allow(unused_parens)]
#[cfg(feature = "postgres")]
impl<_OT1, T1,> crate::query_dsl::order_dsl::ValidOrderingForDistinct<crate::pg::DistinctOnClause<(T1,)>,> for crate::query_builder::order_clause::OrderClause<(_OT1,)>
    where
        _OT1: crate::pg::OrderDecorator<Column = T1>,
        T1: Column,
{}

To solve it, I tried to put T1: Column here, but still has the conflict (But I think it makes sense to have Column there, no?). It seems like Expression is really general, I don't know what should I use here to limit the scope?
And if I remove that general impl, I'll get this error:

error[E0277]: the trait bound `SelectStatement<FromClause<schema::users::table>,
query_builder::select_clause::SelectClause<(schema::users::columns::name, schema::users::columns::hair_color)>,
query_builder::distinct_clause::NoDistinctClause, query_builder::where_clause::NoWhereClause,
query_builder::order_clause::OrderClause<schema::users::columns::name>>: DistinctOnDsl<_>` is not satisfied

I'll try to find it, but I think you can lead me and review my changes in the meanwhile 🙏🏼

weiznich · 2023-05-22T17:38:32Z

I will have a look at the macro later this week. As for the conflicting impls: You should be able to solve that with the SingleValue bound that was there before.

omid · 2023-05-22T17:50:32Z

There are still conflicts, after I put:

impl<T> ValidOrderingForDistinct<DistinctOnClause<T>> for OrderClause<T>
    where
        T: Expression,
        T::SqlType: SingleValue,
{
}

the conflicts are like before:

`conflicting implementations of trait `ValidOrderingForDistinct<DistinctOnClause<(_,)>>`
for type `order_clause::OrderClause<(_,)>`

(This also uses a different strategy to generate these impls as that results in more maintainable code)

weiznich · 2023-05-24T15:01:16Z

I've pushed a commit that hopefully should workaround these conflicting impls. It also restructures how these impls are generated so that the additional paste dependency is not required anymore. Can you verify that it actually works now?

omid · 2023-05-24T15:39:59Z

@weiznich thanks.
There is still an issue. I get this when I run the distinct_on test:

the trait bound `SelectStatement<FromClause<schema::users::table>,
query_builder::select_clause::SelectClause<(schema::users::columns::name, schema::users::columns::hair_color)>,
query_builder::distinct_clause::NoDistinctClause, query_builder::where_clause::NoWhereClause,
query_builder::order_clause::OrderClause<diesel::expression::operators::Asc<schema::users::columns::name>>>:
DistinctOnDsl<_>` is not satisfied

for this query:

let source = users
        .select((name, hair_color))
        .order(name.asc())
        .distinct_on(name);

omid · 2023-05-24T15:52:23Z

@weiznich

impl<T, O> ValidOrderingForDistinct<DistinctOnClause<T>> for OrderClause<O> where O: OrderDecorator<Column = T> {}

solves the not implemented case.
I'll continue writing more tests.

…esel-rs#3552) Resolves diesel-rs#3551

omid · 2023-05-24T17:09:38Z

@weiznich I think there are no tests to check compile time errors, am I right?
And also, I think we cannot easily show a compile time warning about the limitation of having 5 cases only.

weiznich · 2023-05-24T18:27:06Z

Tests to check for compile time error are usually placed here: https://github.com/diesel-rs/diesel/tree/master/diesel_compile_tests%2Ftests
You basically add a file with code that is supposed to cause an compiler error to the fail folder. You execute these tests by running cargo test in the compile_test directory. You need to accept the output after running the new test for the first time (instructions are in the test failure)

weiznich · 2023-05-25T10:21:57Z

Thanks for finishing this up. I think it's now ready for being merged 🎉

omid · 2023-05-25T10:24:29Z

I think it's not ready for being merged

It's NOT or it's NOW ready? :D
I'm confused between NOT and 🎉 :))

weiznich · 2023-05-25T10:25:59Z

It is supposed to be a "now" (I edit the post above as well)

omid · 2023-05-26T12:28:18Z

And... there is a problem. The old syntax doesn't work anymore.

.order_by(dsl::sql::<Bool>("col DESC, col2"))

with the following error:

expected `(SqlLiteral<Bool>,)`, found `SqlLiteral<Bool>`

Yet I'm not sure if it's related to this PR or not

weiznich · 2023-05-26T13:23:15Z

That's unfortunate :(

It's very likely that this change: https://github.com/diesel-rs/diesel/pull/3628/files#diff-556fca1d890492102d32274fc8da5ebdf8759195ac66e9df5aec8ae849ea2c54R20 caused that issue. (In detail changing the restricting from Expression to Column). We need to somehow change that back.

weiznich · 2023-05-26T16:36:59Z

@omid I think what I've written above is the actual issue. I've already tried to work on a fix and got something at works at least partially. See #3643 for that. It would be helpful if you can contribute test case that worked with diesel 2.0 and broke with 2.1 there.
Also ideas how to solve the last missing part are also helpful.

omid added 3 commits May 14, 2023 12:44

Multi-column distinct_on for Pg

c51f11f

a bit of code style and typos

b790154

revert some forgotten bits

ecac494

omid commented May 15, 2023

View reviewed changes

diesel/src/pg/backend.rs Show resolved Hide resolved

omid commented May 15, 2023

View reviewed changes

diesel/src/pg/backend.rs Show resolved Hide resolved

weiznich reviewed May 15, 2023

View reviewed changes

diesel/src/query_dsl/group_by_dsl.rs Outdated Show resolved Hide resolved

diesel/src/pg/backend.rs Show resolved Hide resolved

diesel/src/pg/backend.rs Show resolved Hide resolved

Merge branch 'master' into multi_column_distinct_on

cdc9178

weiznich mentioned this pull request May 22, 2023

Prepare a 2.1 release #3637

Merged

omid and others added 2 commits May 22, 2023 15:32

Merge branch 'master' into multi_column_distinct_on

750f5ff

25 impls, for up to five order bys and distinct ons

47392e7

Provide non-conflicting impls up to a tuple size of 5

20e00a4

(This also uses a different strategy to generate these impls as that results in more maintainable code)

Ten0 and others added 3 commits May 24, 2023 19:06

Implement field length introspection and make available in schema (di…

258d64b

…esel-rs#3552) Resolves diesel-rs#3551

implement missing cases

d31630a

Merge branch 'master' into multi_column_distinct_on

89dc71f

omid added 2 commits May 24, 2023 23:40

Add compile tests and fix CI issues

1de31ad

add more tests + cs

630731d

weiznich approved these changes May 25, 2023

View reviewed changes

weiznich added this pull request to the merge queue May 25, 2023

Merged via the queue into diesel-rs:master with commit d29b7f9 May 25, 2023

omid deleted the multi_column_distinct_on branch May 26, 2023 12:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi column `distinct_on` for Postgresql #3628

Multi column `distinct_on` for Postgresql #3628

omid commented May 15, 2023 •

edited

Loading

weiznich left a comment

omid commented May 15, 2023 •

edited

Loading

weiznich commented May 16, 2023

omid commented May 16, 2023 •

edited

Loading

weiznich commented May 16, 2023

omid commented May 16, 2023

weiznich commented May 16, 2023

omid commented May 17, 2023

weiznich commented May 19, 2023

omid commented May 20, 2023

weiznich commented May 21, 2023

omid commented May 22, 2023 •

edited

Loading

weiznich commented May 22, 2023

omid commented May 22, 2023

weiznich commented May 24, 2023

omid commented May 24, 2023 •

edited

Loading

omid commented May 24, 2023

omid commented May 24, 2023

weiznich commented May 24, 2023

weiznich commented May 25, 2023 •

edited

Loading

omid commented May 25, 2023

weiznich commented May 25, 2023

omid commented May 26, 2023 •

edited

Loading

weiznich commented May 26, 2023

weiznich commented May 26, 2023

Multi column distinct_on for Postgresql #3628

Multi column distinct_on for Postgresql #3628

Conversation

omid commented May 15, 2023 • edited Loading

weiznich left a comment

Choose a reason for hiding this comment

omid commented May 15, 2023 • edited Loading

weiznich commented May 16, 2023

omid commented May 16, 2023 • edited Loading

weiznich commented May 16, 2023

omid commented May 16, 2023

weiznich commented May 16, 2023

omid commented May 17, 2023

weiznich commented May 19, 2023

omid commented May 20, 2023

weiznich commented May 21, 2023

omid commented May 22, 2023 • edited Loading

weiznich commented May 22, 2023

omid commented May 22, 2023

weiznich commented May 24, 2023

omid commented May 24, 2023 • edited Loading

omid commented May 24, 2023

omid commented May 24, 2023

weiznich commented May 24, 2023

weiznich commented May 25, 2023 • edited Loading

omid commented May 25, 2023

weiznich commented May 25, 2023

omid commented May 26, 2023 • edited Loading

weiznich commented May 26, 2023

weiznich commented May 26, 2023

Multi column `distinct_on` for Postgresql #3628

Multi column `distinct_on` for Postgresql #3628

omid commented May 15, 2023 •

edited

Loading

omid commented May 15, 2023 •

edited

Loading

omid commented May 16, 2023 •

edited

Loading

omid commented May 22, 2023 •

edited

Loading

omid commented May 24, 2023 •

edited

Loading

weiznich commented May 25, 2023 •

edited

Loading

omid commented May 26, 2023 •

edited

Loading