Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

left_join makes it possible to try to stuff nullable values into a non-nullable Queryable #2001

Closed
2 tasks done
icefoxen opened this issue Feb 28, 2019 · 9 comments · Fixed by #2234
Closed
2 tasks done

Comments

@icefoxen
Copy link
Contributor

icefoxen commented Feb 28, 2019

Setup

Pardon the vagueness, my database-fu is too weak to know how to explain this more concisely. This is also slightly simplified so that I can try to explain it at all.

We have two tables, TableA and TableB. We have two Queryable structs, QueryA and QueryB. TableA and TableB have no nullable fields, and QueryA and QueryB contain no Options, and life is good.

Now we add this:

#[derive(Queryable, ...)]
struct QueryC {
    a: QueryA,
    b: QueryB,
}

We can write a diesel query that does an inner join on A and B and stuffs the result into QueryC, and it works: table_a::dsl::table_a.inner_join(table_b::dsl::table_b).select(...).get_results<QueryC>().

However, when we do a left_join() instead, it compiles but panics at runtime with 'Could not get crate list?: DeserializationError(UnexpectedNullError)'. The left_join() produces, basically, (QueryA, Option<QueryB>) but that's not what QueryC expects, and trying to make it contain that fails to compile.

...except now that I test more, it does not-compile, as is correct, but when you do what I'm actually trying to do (which is wrap the whole bloody thing in yet another inner join) it fails. Heck, I'm just going to give you the raw code:

table! {
    crates (id) {
        id -> Int8,
        name -> Varchar,
    }
}

table! {
    crate_versions (id) {
        id -> Int8,
        crate_id -> Int8,
        version -> Varchar,
        checksum -> Varchar,
        yanked -> Bool,
    }
}

table! {
    analyze_cratefile (id) {
        id -> Int8,
        crate_versions_id -> Int8,
        size -> Int8,
        valid -> Bool,
        checksum_matches -> Bool,
    }
}

    // This is the bit that compiles and shouldn't, and panics at runtime
    let crate_versions = c::crates
        .inner_join(cv::crate_versions.left_join(ac::analyze_cratefile))
        .select((
            (cv::id, c::name, cv::version, cv::checksum, cv::yanked),
            (ac::size, ac::valid, ac::checksum_matches),
        ))
        .order_by(c::name)
        .get_results::<CrateWithAnalysis>(conn)
        .expect("Could not get crate list?");

I hope you can make heads or tails of this, 'cause I can't. Please let me know if you need any more information and I'll try to provide, once I've had more than six hours of sleep.

Versions

  • Rust: 1.32
  • Diesel: 1.4
  • Database: PostgreSQL 11
  • Operating System Linux

Feature Flags

  • diesel: postgres, extras

Checklist

  • I have already looked over the issue tracker for similar issues.
  • This issue can be reproduced on Rust's stable channel. (Your issue will be
    closed if this is not the case)
@weiznich
Copy link
Member

Seems like you've found the same bug that we've investigated a few hours before that report 🙈
So let me summarize shortly what we've already know:

  • The problem here is that a left join always returns a nullable field, independently from the schema definition
  • Diesel enforces that for single left joins. There it's not possible to select such a non nullable field. (To select that field you need to use column.nullable() instead)
  • There seems to be a bug as soon as more than one join is involved, then the not nullable field is also accepted.

So basically that code above should result in a compiler error but it doesen't. To workaround the runtime error: just call .nullable() on each column coming from the table of the left join. (In your case ac)

@icefoxen
Copy link
Contributor Author

Well thank you for the confirmation that I'm not crazy at least! :D

Is there a way to, instead of having each individual column being nullable, have the entire sub-struct be nullable? So instead of having:

struct Analysis {
    size: Option<...>,
    valid: Option<...>,
    checksum_matches: Option<...>,
}
struct CrateWithAnalysis {
    a: QueryA,
    b: Analysis,
}

I could do:

struct Analysis {
    size: ...,
    valid: ...,
    checksum_matches: ...,
}
struct CrateWithAnalysis {
    a: QueryA,
    b: Option<Analysis>,
}

@icefoxen
Copy link
Contributor Author

Oh nice, got it. Can do (ac::size, ac::valid, ac::checksum_matches).nullable() and it appears to work.

Thank you! You may close this as necessary.

@sgrif
Copy link
Member

sgrif commented Mar 1, 2019

So this problem is beyond left_join. It also happens for inner_join. We can fix this relatively easily for nested left_joins, but for mixed joins I don't see a way we can fix this without overlapping marker traits or disjointness on associated types, neither of which look likely to be stable soon.

https://gist.github.com/sgrif/32710a5f01e1ba36e1ae3fcff2145765 is a compile test demonstrating all the problem cases. We can fix the foo.left_join(bar).left_join(baz) allowing non-nullable columns from bar by changing

Left: AppearsInFromClause<$table, Count=Once>,
to be Self: SelectableExpression<Left>.

However, we have the same problem with foo.left_join(bar).inner_join(baz), and I don't see a clear path to solving that case, even with a license to make breaking changes.

@sgrif
Copy link
Member

sgrif commented Mar 1, 2019

We actually might be able to fix this by replacing all of the SelectableExpression impls related to join to be concrete impls for every combination of tables in allow_tables_to_appear_in_same_query!, but I'm worried about the impact that will have on compile times. @aturon @nikomatsakis this is one of those cases I've talked to y'all about in the past, if either of you have some time I'd love to walk you through this issue for us in more detail

@Ten0
Copy link
Member

Ten0 commented Dec 5, 2019

I don't see a way we can fix this without overlapping marker traits or disjointness on associated types, neither of which look likely to be stable soon.

Note that there's this workaround that allows for disjointness based on associated types.
It's pretty verbose but eventually gets the job done (provided it can actually be applied in this case which I haven't been digging enough into to be sure of).

@sgrif
Copy link
Member

sgrif commented Dec 5, 2019

It looks like something along those lines may actually work. Ultimately what we need is for impl SelectableExpression<Join<Left, Right, Inner>> for column to have where column: SelectableExpression<Left> or where column: SelectableExpression<Right>. Since that can only be true for one of those two types today, we can have another trait which selects either Left or Right based on which one the table appears in. E.g. something like this:

impl Select<Left, Right> for (Once, Never) {
    type Selection = Left;
}

impl Select<Left, Right> for (Never, Once) {
    type Selection = Right;
}

impl<Left, Right> SelectableExpression<Join<Left, Right, Inner>> for Column
where
    Left: AppearsInFromClause<table>,
    Right: AppearsInFromClause<table>,
    (Left::Count, Right::Count): Select<Left, Right>,
    Self: SelectableExpression<
         <(Left::Count, Right::Count) as Select<Left, Right>>::Selection,
    >,
{
}

That with the change in my earlier comment fixes the bug, but surfaces a new problem. Our impl SelectableExpression<Join<Left, Right, Kind>> for Nullable<T> actually relies on this bug, so I'll have to figure out a new way to make that work. But I think it's possible.

@sgrif
Copy link
Member

sgrif commented Dec 5, 2019

PR coming shortly

sgrif added a commit that referenced this issue Dec 5, 2019
When multiple joins are nested together, the resulting type is
essentially a binary tree of each individal join. What this means is
that Diesel only has to consider one join at a time in trait
implementations.

Prior to this commit, when considering nullability, we mistakenly only
considered the outermost join. "Considering nullability" in this case
means `impl SelectableExpression<TheJoin> for column`. If that trait is
implemented, we allow selecting the column directly. If it appears on
the right side of a left join, we do not want that trait to be
implemented, but we do want it implemented for `Nullable<column>`.

The problem code lived in two generated impls in the `table!` macro. For
each individual column, we generate two relevant `SelectableExpression`
impls: one for inner joins, and one for left joins. Both of these impls
were wrong.

The impl for left joins was written as:

    Left: AppearsInFromClause<$table, Count=Once>,
    Right: AppearsInFromClause<$table, Count=Never>,

which in English means "where my table appears once in the left side of
the join, and doesn't appear in the right side of the join". This
incorrectly assumes that `$table` is the left-most table, or all joins
contained in `Left` are inner joins. If `Left == Join<other_table,
$table, LeftOuter>`, this constraint incorrectly holds. This first case
was much easier to solve. The first line is changed to:

    Self: SelectableExpression<Left>

The inner join case was much harder to solve. The constraint was written
as:

    Join<Left, Right, Inner>: AppearsInFromClause<$table, Count=Once>,

In English, this means "my table appears in this join clause exactly
once". This fails to consider that the position the table appears in may
be the right side of a left join. What we really want to write here is
something like this:

    Self: SelectableExpression<Left> | SelectableExpression<Right>,

Where the pipe means "either of these constraints are true". Of course
no such impl exists. If we write two separate impls, they are
overlapping (and overlapping marker traits are not stable). So we need
something different.

Since the problem case applies only to columns, we know that the only
reason that `SelectableExpression<Left>` and
`SelectableExpression<Right>` both hold is if the table appears more
than once in the from clause, which we already reject. Because of this,
we know that when we want this impl to apply, `(<Left as
AppearsInFromClause<$table>>::Count, <Right as
AppearsInFromClause<$table>>::Count>)` will either be `(Once, Never)` or
`(Never, Once)`.

Now that we know we have two disjoint types, we can use that to
implement a new trait, which gives us either `Left` or `Right` depending
on which of the two counts was `Once`, and use that as the parameter for
`SelectableExpression`. In English, the new bounds mean "I am selectable
from whichever of these two type parameters contains my table"

Since this new trait is only meant as a hack for this single impl, and
is not public API, I wanted to avoid having it appear in compiler
errors. For this reason, I've added additional impls for all the invalid
count combinations. This will lead to an error such as:

> table3::id: SelectableExpression<this_table_does_not_appear_in_the_from_clause>

instead of

> (Never, Never): Select<table1, table2>

I did not add explicit compile tests for these errors, as I want to
allow for the deletion of those impls if they become a problem in the
future.

This lead to the final challenge which is that our `impl
SelectableExpression<Join<...>> for Nullable<T>` actually relied on this
bug. We considered `Nullable<T>: SelectableExpression<AnyJoin>` if `T:
SelectableExpression<InnerJoin>`. This of course would break if the
actual table is on the right side of a nested left join. To work around
this, we've introduced another new trait which recursively converts all
joins in a tree to inner joins, which we then use for our check. This
also simplified the various impls for `Nullable<T>`, and should
correctly ensure that `Nullable<T>` is always usable.

Fixes #2001.
sgrif added a commit that referenced this issue Dec 5, 2019
When multiple joins are nested together, the resulting type is
essentially a binary tree of each individal join. What this means is
that Diesel only has to consider one join at a time in trait
implementations.

Prior to this commit, when considering nullability, we mistakenly only
considered the outermost join. "Considering nullability" in this case
means `impl SelectableExpression<TheJoin> for column`. If that trait is
implemented, we allow selecting the column directly. If it appears on
the right side of a left join, we do not want that trait to be
implemented, but we do want it implemented for `Nullable<column>`.

The problem code lived in two generated impls in the `table!` macro. For
each individual column, we generate two relevant `SelectableExpression`
impls: one for inner joins, and one for left joins. Both of these impls
were wrong.

The impl for left joins was written as:

    Left: AppearsInFromClause<$table, Count=Once>,
    Right: AppearsInFromClause<$table, Count=Never>,

which in English means "where my table appears once in the left side of
the join, and doesn't appear in the right side of the join". This
incorrectly assumes that `$table` is the left-most table, or all joins
contained in `Left` are inner joins. If `Left == Join<other_table,
$table, LeftOuter>`, this constraint incorrectly holds. This first case
was much easier to solve. The first line is changed to:

    Self: SelectableExpression<Left>

The inner join case was much harder to solve. The constraint was written
as:

    Join<Left, Right, Inner>: AppearsInFromClause<$table, Count=Once>,

In English, this means "my table appears in this join clause exactly
once". This fails to consider that the position the table appears in may
be the right side of a left join. What we really want to write here is
something like this:

    Self: SelectableExpression<Left> | SelectableExpression<Right>,

Where the pipe means "either of these constraints are true". Of course
no such impl exists. If we write two separate impls, they are
overlapping (and overlapping marker traits are not stable). So we need
something different.

Since the problem case applies only to columns, we know that the only
reason that `SelectableExpression<Left>` and
`SelectableExpression<Right>` both hold is if the table appears more
than once in the from clause, which we already reject. Because of this,
we know that when we want this impl to apply, `(<Left as
AppearsInFromClause<$table>>::Count, <Right as
AppearsInFromClause<$table>>::Count>)` will either be `(Once, Never)` or
`(Never, Once)`.

Now that we know we have two disjoint types, we can use that to
implement a new trait, which gives us either `Left` or `Right` depending
on which of the two counts was `Once`, and use that as the parameter for
`SelectableExpression`. In English, the new bounds mean "I am selectable
from whichever of these two type parameters contains my table"

Since this new trait is only meant as a hack for this single impl, and
is not public API, I wanted to avoid having it appear in compiler
errors. For this reason, I've added additional impls for all the invalid
count combinations. This will lead to an error such as:

> table3::id: SelectableExpression<this_table_does_not_appear_in_the_from_clause>

instead of

> (Never, Never): Select<table1, table2>

I did not add explicit compile tests for these errors, as I want to
allow for the deletion of those impls if they become a problem in the
future.

This lead to the final challenge which is that our `impl
SelectableExpression<Join<...>> for Nullable<T>` actually relied on this
bug. We considered `Nullable<T>: SelectableExpression<AnyJoin>` if `T:
SelectableExpression<InnerJoin>`. This of course would break if the
actual table is on the right side of a nested left join. To work around
this, we've introduced another new trait which recursively converts all
joins in a tree to inner joins, which we then use for our check. This
also simplified the various impls for `Nullable<T>`, and should
correctly ensure that `Nullable<T>` is always usable.

Fixes #2001.
sgrif added a commit that referenced this issue Dec 5, 2019
When multiple joins are nested together, the resulting type is
essentially a binary tree of each individal join. What this means is
that Diesel only has to consider one join at a time in trait
implementations.

Prior to this commit, when considering nullability, we mistakenly only
considered the outermost join. "Considering nullability" in this case
means `impl SelectableExpression<TheJoin> for column`. If that trait is
implemented, we allow selecting the column directly. If it appears on
the right side of a left join, we do not want that trait to be
implemented, but we do want it implemented for `Nullable<column>`.

The problem code lived in two generated impls in the `table!` macro. For
each individual column, we generate two relevant `SelectableExpression`
impls: one for inner joins, and one for left joins. Both of these impls
were wrong.

The impl for left joins was written as:

    Left: AppearsInFromClause<$table, Count=Once>,
    Right: AppearsInFromClause<$table, Count=Never>,

which in English means "where my table appears once in the left side of
the join, and doesn't appear in the right side of the join". This
incorrectly assumes that `$table` is the left-most table, or all joins
contained in `Left` are inner joins. If `Left == Join<other_table,
$table, LeftOuter>`, this constraint incorrectly holds. This first case
was much easier to solve. The first line is changed to:

    Self: SelectableExpression<Left>

The inner join case was much harder to solve. The constraint was written
as:

    Join<Left, Right, Inner>: AppearsInFromClause<$table, Count=Once>,

In English, this means "my table appears in this join clause exactly
once". This fails to consider that the position the table appears in may
be the right side of a left join. What we really want to write here is
something like this:

    Self: SelectableExpression<Left> | SelectableExpression<Right>,

Where the pipe means "either of these constraints are true". Of course
no such impl exists. If we write two separate impls, they are
overlapping (and overlapping marker traits are not stable). So we need
something different.

Since the problem case applies only to columns, we know that the only
reason that `SelectableExpression<Left>` and
`SelectableExpression<Right>` both hold is if the table appears more
than once in the from clause, which we already reject. Because of this,
we know that when we want this impl to apply, `(<Left as
AppearsInFromClause<$table>>::Count, <Right as
AppearsInFromClause<$table>>::Count>)` will either be `(Once, Never)` or
`(Never, Once)`.

Now that we know we have two disjoint types, we can use that to
implement a new trait, which gives us either `Left` or `Right` depending
on which of the two counts was `Once`, and use that as the parameter for
`SelectableExpression`. In English, the new bounds mean "I am selectable
from whichever of these two type parameters contains my table"

Since this new trait is only meant as a hack for this single impl, and
is not public API, I wanted to avoid having it appear in compiler
errors. For this reason, I've added additional impls for all the invalid
count combinations. This will lead to an error such as:

> table3::id: SelectableExpression<this_table_does_not_appear_in_the_from_clause>

instead of

> (Never, Never): Select<table1, table2>

I did not add explicit compile tests for these errors, as I want to
allow for the deletion of those impls if they become a problem in the
future.

This lead to the final challenge which is that our `impl
SelectableExpression<Join<...>> for Nullable<T>` actually relied on this
bug. We considered `Nullable<T>: SelectableExpression<AnyJoin>` if `T:
SelectableExpression<InnerJoin>`. This of course would break if the
actual table is on the right side of a nested left join. To work around
this, we've introduced another new trait which recursively converts all
joins in a tree to inner joins, which we then use for our check. This
also simplified the various impls for `Nullable<T>`, and should
correctly ensure that `Nullable<T>` is always usable.

Fixes #2001.
@sgrif
Copy link
Member

sgrif commented Dec 5, 2019

#2234 correctly causes the problem code to fail to compile

sgrif added a commit that referenced this issue Dec 17, 2019
When multiple joins are nested together, the resulting type is
essentially a binary tree of each individal join. What this means is
that Diesel only has to consider one join at a time in trait
implementations.

Prior to this commit, when considering nullability, we mistakenly only
considered the outermost join. "Considering nullability" in this case
means `impl SelectableExpression<TheJoin> for column`. If that trait is
implemented, we allow selecting the column directly. If it appears on
the right side of a left join, we do not want that trait to be
implemented, but we do want it implemented for `Nullable<column>`.

The problem code lived in two generated impls in the `table!` macro. For
each individual column, we generate two relevant `SelectableExpression`
impls: one for inner joins, and one for left joins. Both of these impls
were wrong.

The impl for left joins was written as:

    Left: AppearsInFromClause<$table, Count=Once>,
    Right: AppearsInFromClause<$table, Count=Never>,

which in English means "where my table appears once in the left side of
the join, and doesn't appear in the right side of the join". This
incorrectly assumes that `$table` is the left-most table, or all joins
contained in `Left` are inner joins. If `Left == Join<other_table,
$table, LeftOuter>`, this constraint incorrectly holds. This first case
was much easier to solve. The first line is changed to:

    Self: SelectableExpression<Left>

The inner join case was much harder to solve. The constraint was written
as:

    Join<Left, Right, Inner>: AppearsInFromClause<$table, Count=Once>,

In English, this means "my table appears in this join clause exactly
once". This fails to consider that the position the table appears in may
be the right side of a left join. What we really want to write here is
something like this:

    Self: SelectableExpression<Left> | SelectableExpression<Right>,

Where the pipe means "either of these constraints are true". Of course
no such impl exists. If we write two separate impls, they are
overlapping (and overlapping marker traits are not stable). So we need
something different.

Since the problem case applies only to columns, we know that the only
reason that `SelectableExpression<Left>` and
`SelectableExpression<Right>` both hold is if the table appears more
than once in the from clause, which we already reject. Because of this,
we know that when we want this impl to apply, `(<Left as
AppearsInFromClause<$table>>::Count, <Right as
AppearsInFromClause<$table>>::Count>)` will either be `(Once, Never)` or
`(Never, Once)`.

Now that we know we have two disjoint types, we can use that to
implement a new trait, which gives us either `Left` or `Right` depending
on which of the two counts was `Once`, and use that as the parameter for
`SelectableExpression`. In English, the new bounds mean "I am selectable
from whichever of these two type parameters contains my table"

Since this new trait is only meant as a hack for this single impl, and
is not public API, I wanted to avoid having it appear in compiler
errors. For this reason, I've added additional impls for all the invalid
count combinations. This will lead to an error such as:

> table3::id: SelectableExpression<this_table_does_not_appear_in_the_from_clause>

instead of

> (Never, Never): Select<table1, table2>

I did not add explicit compile tests for these errors, as I want to
allow for the deletion of those impls if they become a problem in the
future.

This lead to the final challenge which is that our `impl
SelectableExpression<Join<...>> for Nullable<T>` actually relied on this
bug. We considered `Nullable<T>: SelectableExpression<AnyJoin>` if `T:
SelectableExpression<InnerJoin>`. This of course would break if the
actual table is on the right side of a nested left join. To work around
this, we've introduced another new trait which recursively converts all
joins in a tree to inner joins, which we then use for our check. This
also simplified the various impls for `Nullable<T>`, and should
correctly ensure that `Nullable<T>` is always usable.

Fixes #2001.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants