left_join makes it possible to try to stuff nullable values into a non-nullable Queryable #2001

icefoxen · 2019-02-28T01:21:31Z

Setup

Pardon the vagueness, my database-fu is too weak to know how to explain this more concisely. This is also slightly simplified so that I can try to explain it at all.

We have two tables, TableA and TableB. We have two Queryable structs, QueryA and QueryB. TableA and TableB have no nullable fields, and QueryA and QueryB contain no Options, and life is good.

Now we add this:

#[derive(Queryable, ...)]
struct QueryC {
    a: QueryA,
    b: QueryB,
}

We can write a diesel query that does an inner join on A and B and stuffs the result into QueryC, and it works: table_a::dsl::table_a.inner_join(table_b::dsl::table_b).select(...).get_results<QueryC>().

However, when we do a left_join() instead, it compiles but panics at runtime with 'Could not get crate list?: DeserializationError(UnexpectedNullError)'. The left_join() produces, basically, (QueryA, Option<QueryB>) but that's not what QueryC expects, and trying to make it contain that fails to compile.

...except now that I test more, it does not-compile, as is correct, but when you do what I'm actually trying to do (which is wrap the whole bloody thing in yet another inner join) it fails. Heck, I'm just going to give you the raw code:

table! {
    crates (id) {
        id -> Int8,
        name -> Varchar,
    }
}

table! {
    crate_versions (id) {
        id -> Int8,
        crate_id -> Int8,
        version -> Varchar,
        checksum -> Varchar,
        yanked -> Bool,
    }
}

table! {
    analyze_cratefile (id) {
        id -> Int8,
        crate_versions_id -> Int8,
        size -> Int8,
        valid -> Bool,
        checksum_matches -> Bool,
    }
}

    // This is the bit that compiles and shouldn't, and panics at runtime
    let crate_versions = c::crates
        .inner_join(cv::crate_versions.left_join(ac::analyze_cratefile))
        .select((
            (cv::id, c::name, cv::version, cv::checksum, cv::yanked),
            (ac::size, ac::valid, ac::checksum_matches),
        ))
        .order_by(c::name)
        .get_results::<CrateWithAnalysis>(conn)
        .expect("Could not get crate list?");

I hope you can make heads or tails of this, 'cause I can't. Please let me know if you need any more information and I'll try to provide, once I've had more than six hours of sleep.

Versions

Rust: 1.32
Diesel: 1.4
Database: PostgreSQL 11
Operating System Linux

Feature Flags

diesel: postgres, extras

Checklist

I have already looked over the issue tracker for similar issues.
This issue can be reproduced on Rust's stable channel. (Your issue will be
closed if this is not the case)

The text was updated successfully, but these errors were encountered:

weiznich · 2019-02-28T15:31:34Z

Seems like you've found the same bug that we've investigated a few hours before that report 🙈
So let me summarize shortly what we've already know:

The problem here is that a left join always returns a nullable field, independently from the schema definition
Diesel enforces that for single left joins. There it's not possible to select such a non nullable field. (To select that field you need to use column.nullable() instead)
There seems to be a bug as soon as more than one join is involved, then the not nullable field is also accepted.

So basically that code above should result in a compiler error but it doesen't. To workaround the runtime error: just call .nullable() on each column coming from the table of the left join. (In your case ac)

icefoxen · 2019-02-28T17:30:53Z

Well thank you for the confirmation that I'm not crazy at least! :D

Is there a way to, instead of having each individual column being nullable, have the entire sub-struct be nullable? So instead of having:

struct Analysis {
    size: Option<...>,
    valid: Option<...>,
    checksum_matches: Option<...>,
}
struct CrateWithAnalysis {
    a: QueryA,
    b: Analysis,
}

I could do:

struct Analysis {
    size: ...,
    valid: ...,
    checksum_matches: ...,
}
struct CrateWithAnalysis {
    a: QueryA,
    b: Option<Analysis>,
}

icefoxen · 2019-02-28T17:48:49Z

Oh nice, got it. Can do (ac::size, ac::valid, ac::checksum_matches).nullable() and it appears to work.

Thank you! You may close this as necessary.

sgrif · 2019-03-01T02:33:34Z

So this problem is beyond left_join. It also happens for inner_join. We can fix this relatively easily for nested left_joins, but for mixed joins I don't see a way we can fix this without overlapping marker traits or disjointness on associated types, neither of which look likely to be stable soon.

https://gist.github.com/sgrif/32710a5f01e1ba36e1ae3fcff2145765 is a compile test demonstrating all the problem cases. We can fix the foo.left_join(bar).left_join(baz) allowing non-nullable columns from bar by changing

diesel/diesel/src/macros/mod.rs

Line 45 in a0c1e29

Left: AppearsInFromClause<$table, Count=Once>,

to be Self: SelectableExpression<Left>.

However, we have the same problem with foo.left_join(bar).inner_join(baz), and I don't see a clear path to solving that case, even with a license to make breaking changes.

sgrif · 2019-03-01T02:36:54Z

We actually might be able to fix this by replacing all of the SelectableExpression impls related to join to be concrete impls for every combination of tables in allow_tables_to_appear_in_same_query!, but I'm worried about the impact that will have on compile times. @aturon @nikomatsakis this is one of those cases I've talked to y'all about in the past, if either of you have some time I'd love to walk you through this issue for us in more detail

Ten0 · 2019-12-05T12:36:07Z

I don't see a way we can fix this without overlapping marker traits or disjointness on associated types, neither of which look likely to be stable soon.

Note that there's this workaround that allows for disjointness based on associated types.
It's pretty verbose but eventually gets the job done (provided it can actually be applied in this case which I haven't been digging enough into to be sure of).

sgrif · 2019-12-05T16:35:08Z

It looks like something along those lines may actually work. Ultimately what we need is for impl SelectableExpression<Join<Left, Right, Inner>> for column to have where column: SelectableExpression<Left> or where column: SelectableExpression<Right>. Since that can only be true for one of those two types today, we can have another trait which selects either Left or Right based on which one the table appears in. E.g. something like this:

impl Select<Left, Right> for (Once, Never) {
    type Selection = Left;
}

impl Select<Left, Right> for (Never, Once) {
    type Selection = Right;
}

impl<Left, Right> SelectableExpression<Join<Left, Right, Inner>> for Column
where
    Left: AppearsInFromClause<table>,
    Right: AppearsInFromClause<table>,
    (Left::Count, Right::Count): Select<Left, Right>,
    Self: SelectableExpression<
         <(Left::Count, Right::Count) as Select<Left, Right>>::Selection,
    >,
{
}

That with the change in my earlier comment fixes the bug, but surfaces a new problem. Our impl SelectableExpression<Join<Left, Right, Kind>> for Nullable<T> actually relies on this bug, so I'll have to figure out a new way to make that work. But I think it's possible.

sgrif · 2019-12-05T17:14:39Z

PR coming shortly

When multiple joins are nested together, the resulting type is essentially a binary tree of each individal join. What this means is that Diesel only has to consider one join at a time in trait implementations. Prior to this commit, when considering nullability, we mistakenly only considered the outermost join. "Considering nullability" in this case means `impl SelectableExpression<TheJoin> for column`. If that trait is implemented, we allow selecting the column directly. If it appears on the right side of a left join, we do not want that trait to be implemented, but we do want it implemented for `Nullable<column>`. The problem code lived in two generated impls in the `table!` macro. For each individual column, we generate two relevant `SelectableExpression` impls: one for inner joins, and one for left joins. Both of these impls were wrong. The impl for left joins was written as: Left: AppearsInFromClause<$table, Count=Once>, Right: AppearsInFromClause<$table, Count=Never>, which in English means "where my table appears once in the left side of the join, and doesn't appear in the right side of the join". This incorrectly assumes that `$table` is the left-most table, or all joins contained in `Left` are inner joins. If `Left == Join<other_table, $table, LeftOuter>`, this constraint incorrectly holds. This first case was much easier to solve. The first line is changed to: Self: SelectableExpression<Left> The inner join case was much harder to solve. The constraint was written as: Join<Left, Right, Inner>: AppearsInFromClause<$table, Count=Once>, In English, this means "my table appears in this join clause exactly once". This fails to consider that the position the table appears in may be the right side of a left join. What we really want to write here is something like this: Self: SelectableExpression<Left> | SelectableExpression<Right>, Where the pipe means "either of these constraints are true". Of course no such impl exists. If we write two separate impls, they are overlapping (and overlapping marker traits are not stable). So we need something different. Since the problem case applies only to columns, we know that the only reason that `SelectableExpression<Left>` and `SelectableExpression<Right>` both hold is if the table appears more than once in the from clause, which we already reject. Because of this, we know that when we want this impl to apply, `(<Left as AppearsInFromClause<$table>>::Count, <Right as AppearsInFromClause<$table>>::Count>)` will either be `(Once, Never)` or `(Never, Once)`. Now that we know we have two disjoint types, we can use that to implement a new trait, which gives us either `Left` or `Right` depending on which of the two counts was `Once`, and use that as the parameter for `SelectableExpression`. In English, the new bounds mean "I am selectable from whichever of these two type parameters contains my table" Since this new trait is only meant as a hack for this single impl, and is not public API, I wanted to avoid having it appear in compiler errors. For this reason, I've added additional impls for all the invalid count combinations. This will lead to an error such as: > table3::id: SelectableExpression<this_table_does_not_appear_in_the_from_clause> instead of > (Never, Never): Select<table1, table2> I did not add explicit compile tests for these errors, as I want to allow for the deletion of those impls if they become a problem in the future. This lead to the final challenge which is that our `impl SelectableExpression<Join<...>> for Nullable<T>` actually relied on this bug. We considered `Nullable<T>: SelectableExpression<AnyJoin>` if `T: SelectableExpression<InnerJoin>`. This of course would break if the actual table is on the right side of a nested left join. To work around this, we've introduced another new trait which recursively converts all joins in a tree to inner joins, which we then use for our check. This also simplified the various impls for `Nullable<T>`, and should correctly ensure that `Nullable<T>` is always usable. Fixes #2001.

sgrif · 2019-12-05T18:14:09Z

#2234 correctly causes the problem code to fail to compile

When multiple joins are nested together, the resulting type is essentially a binary tree of each individal join. What this means is that Diesel only has to consider one join at a time in trait implementations. Prior to this commit, when considering nullability, we mistakenly only considered the outermost join. "Considering nullability" in this case means `impl SelectableExpression<TheJoin> for column`. If that trait is implemented, we allow selecting the column directly. If it appears on the right side of a left join, we do not want that trait to be implemented, but we do want it implemented for `Nullable<column>`. The problem code lived in two generated impls in the `table!` macro. For each individual column, we generate two relevant `SelectableExpression` impls: one for inner joins, and one for left joins. Both of these impls were wrong. The impl for left joins was written as: Left: AppearsInFromClause<$table, Count=Once>, Right: AppearsInFromClause<$table, Count=Never>, which in English means "where my table appears once in the left side of the join, and doesn't appear in the right side of the join". This incorrectly assumes that `$table` is the left-most table, or all joins contained in `Left` are inner joins. If `Left == Join<other_table, $table, LeftOuter>`, this constraint incorrectly holds. This first case was much easier to solve. The first line is changed to: Self: SelectableExpression<Left> The inner join case was much harder to solve. The constraint was written as: Join<Left, Right, Inner>: AppearsInFromClause<$table, Count=Once>, In English, this means "my table appears in this join clause exactly once". This fails to consider that the position the table appears in may be the right side of a left join. What we really want to write here is something like this: Self: SelectableExpression<Left> | SelectableExpression<Right>, Where the pipe means "either of these constraints are true". Of course no such impl exists. If we write two separate impls, they are overlapping (and overlapping marker traits are not stable). So we need something different. Since the problem case applies only to columns, we know that the only reason that `SelectableExpression<Left>` and `SelectableExpression<Right>` both hold is if the table appears more than once in the from clause, which we already reject. Because of this, we know that when we want this impl to apply, `(<Left as AppearsInFromClause<$table>>::Count, <Right as AppearsInFromClause<$table>>::Count>)` will either be `(Once, Never)` or `(Never, Once)`. Now that we know we have two disjoint types, we can use that to implement a new trait, which gives us either `Left` or `Right` depending on which of the two counts was `Once`, and use that as the parameter for `SelectableExpression`. In English, the new bounds mean "I am selectable from whichever of these two type parameters contains my table" Since this new trait is only meant as a hack for this single impl, and is not public API, I wanted to avoid having it appear in compiler errors. For this reason, I've added additional impls for all the invalid count combinations. This will lead to an error such as: > table3::id: SelectableExpression<this_table_does_not_appear_in_the_from_clause> instead of > (Never, Never): Select<table1, table2> I did not add explicit compile tests for these errors, as I want to allow for the deletion of those impls if they become a problem in the future. This lead to the final challenge which is that our `impl SelectableExpression<Join<...>> for Nullable<T>` actually relied on this bug. We considered `Nullable<T>: SelectableExpression<AnyJoin>` if `T: SelectableExpression<InnerJoin>`. This of course would break if the actual table is on the right side of a nested left join. To work around this, we've introduced another new trait which recursively converts all joins in a tree to inner joins, which we then use for our check. This also simplified the various impls for `Nullable<T>`, and should correctly ensure that `Nullable<T>` is always usable. Fixes #2001.

sgrif mentioned this issue Dec 5, 2019

Correctly enforce nullability with nested joins #2234

Merged

weiznich closed this as completed in #2234 Dec 30, 2019

Ten0 mentioned this issue Apr 18, 2020

NonAggregate modifications broke code that doesn't touch aggregation #2368

Closed

This was referenced Jun 18, 2020

left_join incorrectly compiling if additional joins are done afterwards #1459

Closed

Allow to implement a dynamic select clause feature in #2182

Merged

mattdowdell mentioned this issue Nov 18, 2020

Unable to use inner and left join to the same table in a single query #2569

Closed

2 tasks

Ten0 mentioned this issue Dec 19, 2020

Make BoolExpressionMethods not always return a Nullable<bool> #2597

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

left_join makes it possible to try to stuff nullable values into a non-nullable Queryable #2001

left_join makes it possible to try to stuff nullable values into a non-nullable Queryable #2001

icefoxen commented Feb 28, 2019 •

edited

Loading

weiznich commented Feb 28, 2019

icefoxen commented Feb 28, 2019

icefoxen commented Feb 28, 2019

sgrif commented Mar 1, 2019

sgrif commented Mar 1, 2019

Ten0 commented Dec 5, 2019

sgrif commented Dec 5, 2019

sgrif commented Dec 5, 2019

sgrif commented Dec 5, 2019 •

edited

Loading

left_join makes it possible to try to stuff nullable values into a non-nullable Queryable #2001

left_join makes it possible to try to stuff nullable values into a non-nullable Queryable #2001

Comments

icefoxen commented Feb 28, 2019 • edited Loading

Setup

Versions

Feature Flags

Checklist

weiznich commented Feb 28, 2019

icefoxen commented Feb 28, 2019

icefoxen commented Feb 28, 2019

sgrif commented Mar 1, 2019

sgrif commented Mar 1, 2019

Ten0 commented Dec 5, 2019

sgrif commented Dec 5, 2019

sgrif commented Dec 5, 2019

sgrif commented Dec 5, 2019 • edited Loading

icefoxen commented Feb 28, 2019 •

edited

Loading

sgrif commented Dec 5, 2019 •

edited

Loading