Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No way to express inconsistency when deserializing for Queryable data since #2182 #2523

Closed
Ten0 opened this issue Oct 1, 2020 · 18 comments · Fixed by #2599
Closed

No way to express inconsistency when deserializing for Queryable data since #2182 #2523

Ten0 opened this issue Oct 1, 2020 · 18 comments · Fixed by #2599

Comments

@Ten0
Copy link
Member

Ten0 commented Oct 1, 2020

Setup

Versions

  • Rust: Latest stable (1.46.0)
  • Diesel: Latest master (d7ce43d)
  • Database: PostgreSQL
  • Operating System: Linux

Problem Description

Since #2480 was fixed, I've just been giving another try at updating our codebase to the latest master (#2182).
It turns out I still couldn't because I can't define custom FromSqlRow impls, due to conflicting impls with downstream crates may implement trait, where Rust claims that Diesel could go ahead and implement Queryable or QueryableByName themselves on my type, which would create conflicts because of the following impls:

impl<DB, T> FromSqlRow<Untyped, DB> for T
where
DB: Backend,
T: QueryableByName<DB>,
{
fn build_from_row<'a>(row: &impl Row<'a, DB>) -> Result<Self> {
T::build(row)
}
}

impl<T, ST, DB> FromSqlRow<ST, DB> for T
where
T: Queryable<ST, DB>,
ST: SqlType,
DB: Backend,
T::Row: FromStaticSqlRow<ST, DB>,
{
fn build_from_row<'a>(row: &impl Row<'a, DB>) -> Result<Self> {
let row = <T::Row as FromStaticSqlRow<ST, DB>>::build_from_row(row)?;
Ok(T::build(row))
}
}

Btw, same error when trying to implement StaticallySizedRow because of

impl<T, ST, DB> StaticallySizedRow<ST, DB> for T
where
ST: SqlType + crate::type_impls::tuples::TupleSize,
T: Queryable<ST, DB>,
DB: Backend,
{
const FIELD_COUNT: usize = <ST as crate::type_impls::tuples::TupleSize>::SIZE;
}

What are you trying to accomplish?

I want to customize a FromSqlRowImpl impl because I want to be allowed to fail on deserialization when seeing a data inconsistency, as well as easily reorganize my data in different sub-structures at the same time, while performing validation (which Queryable does not allow me to do).

What is the expected output?

compiles

What is the actual output?

error[E0119]: conflicting implementations of trait `diesel::deserialize::FromSqlRow<diesel::sql_types::Untyped, _>` for type `MyType`:
  --> lib.rs:30:1
   |
30 | / impl<ST, DB> diesel::deserialize::FromSqlRow<ST, DB> for MyType
31 | | where
32 | |     DB: diesel::backend::Backend,
33 | |     (Option<i32>, (Option<i32>, Option<i32>), Option<f64>): diesel::deserialize::FromSqlRow<ST, DB>,
...  |
54 | |     }
55 | | }
   | |_^
   |
   = note: conflicting implementation in crate `diesel`:
           - impl<DB, T> diesel::deserialize::FromSqlRow<diesel::sql_types::Untyped, DB> for T
             where DB: diesel::backend::Backend, T: diesel::QueryableByName<DB>;
   = note: downstream crates may implement trait `diesel::QueryableByName<_>` for type `MyType`

error[E0119]: conflicting implementations of trait `diesel::deserialize::StaticallySizedRow<_, _>` for type `MyType`:
  --> lib.rs:57:1
   |
57 | / impl<ST, DB> diesel::deserialize::StaticallySizedRow<ST, DB> for MyType
58 | | where
59 | |     DB: diesel::backend::Backend,
60 | |     (Option<i32>, (Option<i32>, Option<i32>), Option<f64>): diesel::deserialize::StaticallySizedRow<ST, DB>,
61 | | {
62 | |     const FIELD_COUNT: usize = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as diesel::deserialize::StaticallySizedRow<ST, DB>>::...
63 | | }
   | |_^
   |
   = note: conflicting implementation in crate `diesel`:
           - impl<T, ST, DB> diesel::deserialize::StaticallySizedRow<ST, DB> for T
             where ST: diesel::sql_types::SqlType, ST: diesel::type_impls::tuples::TupleSize, T: diesel::Queryable<ST, DB>, DB: diesel::backend::Backend;
   = note: downstream crates may implement trait `diesel::Queryable<_, _>` for type `MyType`

Steps to reproduce

e.g. (for the above error)

impl<ST, DB> diesel::deserialize::FromSqlRow<ST, DB> for MyType
where
	DB: diesel::backend::Backend,
	(Option<i32>, (Option<i32>, Option<i32>), Option<f64>): diesel::deserialize::FromSqlRow<ST, DB>,
{
	fn build_from_row<'a>(row: &impl diesel::row::Row<'a, DB>) -> diesel::deserialize::Result<Self> {
		let values = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as diesel::deserialize::FromSqlRow<
			ST,
			DB,
		>>::build_from_row(row)?;
		Ok(Self {
			delayed_by_days: values.0.map(|delay| delay.try_into()).transpose()?,
			delayed_by_months: match values.1 {
				(Some(nb_months), day_of_month) => Some(MonthDelay {
					nb_months: nb_months.try_into()?,
					post_rounding_delay: day_of_month.map(<_>::try_into).transpose()?,
				}),
				(None, None) => None,
				(None, Some(_)) => {
					return Err("Failed to deserialize MonthDelay, nb_months null and day_of_month not null".into())
				}
			},
			some_other_field: values.2.filter(|&float| float != 0.),
		})
	}
}



impl<ST, DB> diesel::deserialize::StaticallySizedRow<ST, DB> for MyType
where
	DB: diesel::backend::Backend,
	(Option<i32>, (Option<i32>, Option<i32>), Option<f64>): diesel::deserialize::StaticallySizedRow<ST, DB>,
{
	const FIELD_COUNT: usize = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as diesel::deserialize::StaticallySizedRow<ST, DB>>::FIELD_COUNT;
}
@weiznich
Copy link
Member

weiznich commented Oct 1, 2020

So as first thing I want to write: At least for me it is not that clear that the expected output here should be that this actually compiles. Those impls are in place because otherwise diesel wouldn't work that way it works now. I do not see any way to remove them without causing an even wider breakage. That does not mean that I'm opposed in improving the situation, but at least now I don't have any idea how and I do not have the time anymore to spend another month on figuring out how to write that trait bounds. If someone has an idea, feel free to open a PR.

That written: At least the "ownstream crates may implement trait" error goes away if you use a concrete type for DB.

@Ten0
Copy link
Member Author

Ten0 commented Oct 1, 2020

At least for me it is not that clear that the expected output here should be that this actually compiles

Haha you're right! I probably should have removed the "Expected output" section from the template. ^^
I'm also very happy with the way these now interact with each other in general.
I've been giving this issue more thought, and I think my actual issue isn't that we can't implement a custom FromSqlRow, but more that Queryable (multi-field structure re-composition) does not allow to express an inconsistency when re-composing a structure, which we used to work around by directly implementing FromSqlRow.
An easy way that I see around this would be to allow Queryable::build to fail, just like QueryableByName or FromSql are allowed to fail.

Would you be open to a such PR?

pub trait Queryable<ST, DB>
where
DB: Backend,
{
/// The Rust type you'd like to map from.
///
/// This is typically a tuple of all of your struct's fields.
type Row: FromStaticSqlRow<ST, DB>;
/// Construct an instance of this type
fn build(row: Self::Row) -> Self;
}

pub trait QueryableByName<DB>
where
Self: Sized,
DB: Backend,
{
/// Construct an instance of `Self` from the database row
fn build<'a>(row: &impl NamedRow<'a, DB>) -> Result<Self>;
}

pub trait FromSql<A, DB: Backend>: Sized {
/// See the trait documentation.
fn from_sql(bytes: backend::RawValue<DB>) -> Result<Self>;

impl<T, ST, DB> FromSqlRow<ST, DB> for T
where
T: Queryable<ST, DB>,
ST: SqlType,
DB: Backend,
T::Row: FromStaticSqlRow<ST, DB>,
{
fn build_from_row<'a>(row: &impl Row<'a, DB>) -> Result<Self> {
let row = <T::Row as FromStaticSqlRow<ST, DB>>::build_from_row(row)?;
Ok(T::build(row))
}
}

At least the "downstream crates may implement trait" error goes away if you use a concrete type for DB.

Hmm this workaround doesn't seem to work for me. (Maybe because we're also generic on ST?)

@Ten0 Ten0 changed the title conflicting implementations of trait FromSqlRow: downstream crates may implement trait QueryableByName/Queryable No way to express inconsistency when deserializing for Queryable data since #2182 Oct 1, 2020
@weiznich
Copy link
Member

weiznich commented Oct 1, 2020

Hmm this workaround doesn't seem to work for me. (Maybe because we're also generic on ST?)

I've just checked that. If every involved type is a concrete type, you can implement FromSqlRow. If any type is generic, it will cause a "conflicting implementation" error message. Given this I would say it is not necessary to change Queryable::build to return a Result.

As additional unrelated note: It would be great to get some feedback which kind of changes are required to update from the last diesel release to diesel 2.0/master, as this is definitively something we should somehow mention in our changelog/update guide. (Something that needs to be written…)

@Ten0
Copy link
Member Author

Ten0 commented Oct 1, 2020

Given this I would say it is not necessary to change Queryable::build to return a Result.

So it seems we could have a workaround, making this indeed not completely "necessary".

Also I understand it may sometimes be better to put aside 1% of the cases when the large majority of the time this should never possibly return an error, in order to keep the concepts simpler and save the hassle of error handling to callers of this function.

However,

  • I feel that failing to recompose fields into a structure because data that was stored is inconsistent with what was expected
    • is not a niche scenario, as it's used in several places in our codebase
    • conceptually makes sense
  • Unless I'm mistaken, this function is only ever called (which is not a lot, so wouldn't make error handling really harder)
    • in other Queryable::build impls, where error handling would be reduced to adding an Ok and a ?
    • in the FromSqlRow::build_from_row impl
  • Solving it through Queryable instead of working around it would allow for other great natural extensions, such as:
    • #[diesel(deserialize_as = "SomeType")] could be naturally extended to TryInto, allowing e.g. to write the following:
      #[derive(Queryable)]
      struct S {
          // There is a DB `CHECK` that guarantees this is always positive
          #[diesel(deserialize_as = "i32")]
          some_field: u32,
      }
    • #[diesel(deserialize_as = "SomeType")] could be naturally extended to structures, allowing e.g. to solve my previous problem in the following very diesel-internal-abstract and clean way:
      #[derive(Queryable)]
      #[diesel(deserialize_as = "RawMyType")]
      struct MyType {
          pub delayed_by_days: Option<usize>,
          pub delayed_by_months: Option<MonthDelay>,
          pub some_other_field: Option<f64>,
      }
      #[derive(Queryable)]
      struct RawMyType {
          delayed_by_days: Option<i32>,
          delayed_by_months: Option<i32>,
          post_month_rounding_delayed_by_days: Option<i32>,
          some_other_field: Option<f64>,
      }
      impl TryFrom<RawMyType> for MyType { ... }
  • It would also (even without these last two additional extensions) already naturally reduce the amount of boilerplate: no more of this:
    let values = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as diesel::deserialize::FromSqlRow<
    			ST,
    			DB,
    		>>::build_from_row(row)?;
    as this is already what Queryable allows to abstract and make easy and clean. (When I create a custom FromSqlRow impl, I'm actually always rewriting the impl FromSqlRow for T where T: Queryable, just stripping the Ok).
  • I don't feel like forcing to remove genericity when encountering such a scenario is super comfortable
  • This would probably ease transition to diesel 2.0:
    • No wrapping your head around this precise issue and being forced to make code non-generic
    • May use the occasion to rewrite a few pieces that broke using the above suggested #[diesel(deserialize_as = "SomeType")] improvements
  • Because it's always implemented using the macro, it would require update of a very little amount of code.

So while it may indeed not be absolutely "necessary", I still feel like this use-case is legitimate, and that allowing Queryable::build to return a failure would still be the cleanest and correct approach to solving it.

Given all this, do you think:

  • this is not the right approach? In which case, we'd love to understand why.
  • this is the right approach but you don't have the resources to implement/document it? In which case, my proposal to implement and open the PR still holds, and we'd also totally be willing to help with the documentation.

@weiznich
Copy link
Member

weiznich commented Oct 2, 2020

I will use the weekend to think about this proposal. In the mean time here a few short points that should definitively be addressed:

First of all currently the main contract of Queryable is that it's just a simple map from a tuple returned by the database to a struct, meaning that currently it is conceptually not the place where any type conversation should happen. Changing this is less a technical thing, more a broad conceptual change how things are supposed to work.

#[diesel(deserialize_as = "SomeType")] could be naturally extended to TryInto, allowing e.g. to write the following:

That's a nice idea, but needs much more details. Open questions:

  • What constraints are on the error type?
  • How would we handle the existing impls that use From? I wouldn't like to break them.

#[diesel(deserialize_as = "SomeType")] could be naturally extended to structures, allowing e.g. to solve my previous problem in the following very diesel-internal-abstract and clean way:

I'm honestly not sure if that is that much clearer/shorter/less boilerplate than just implementing FromSqlRow directly.

(When I create a custom FromSqlRow impl, I'm actually always rewriting the impl FromSqlRow for T where T: Queryable, just stripping the Ok).

That's because of the different contracts of the traits. FromSqlRow stands for this type can probably constructed from any row coming from the database, while Queryable says you can map the result of a tuple of the sql type ST to the following datatype. That's semantically different.

Because it's always implemented using the macro, it would require update of a very little amount of code.

At least that assumption is wrong. There are definitively use cases where Queryable is implemented directly. Such a change would have an additional migration impact on them.

This would probably ease transition to diesel 2.0:

  • No wrapping your head around this precise issue and being forced to make code non-generic
  • May use the occasion to rewrite a few pieces that broke using the above suggested #[diesel(deserialize_as = "SomeType")] improvements

This definitively needs more explanation on how this could make a transition easier. To use that code using FromSqlRow must definitively be rewritten, while currently code having a non generic impl only needs some minor fixes for the signature of build_from_row. My general motivation is to keep the breakage of updating to 2.0 as small as possible. If possible normal application code that uses the dsl, the derives and some advanced features should not break at all.

I don't feel like forcing to remove genericity when writing encountering such a scenario is super comfortable

Would you mind giving a example where this genericity is required? I think 99% of the impl can be non-generic or use different concrete impls based on their needs.

That all written: I'm not fundamentally opposed such an idea, but I think we should put a bit more though in the design before just changing things. I do not want to issue a 3.0 release soon after 2.0, just because we messed something up.

@weiznich
Copy link
Member

weiznich commented Oct 2, 2020

Another more fundamental question: If we change Queryable to return a Result, what would be the technical difference between Queryable and FromSqlRow?

From a technical point neither Querable nor QueryableByName are required. Both derives can just generate the corresponding FromSqlRow impl. I did this already as part of the type system rewrite, but added both traits back out of compatibility concerns with existing generic code.

@Ten0
Copy link
Member Author

Ten0 commented Oct 2, 2020

Thanks for your quick answer and for expressing your thoughts in such great detail! 🙂

First of all currently the main contract of Queryable is that it's just a simple map from a tuple returned by the database to a struct, meaning that currently it is conceptually not the place where any type conversation should happen. Changing this is less a technical thing, more a broad conceptual change how things are supposed to work.

From the doc, I read "The purpose of this trait is to convert from a tuple of Rust values that have been deserialized into your struct.", but I'm not sure where there is that conceptual constraint of if being "simple", or "one-to-one" with the fields. Besides, another part of the code seems to say it's "typically a tuple of all of your struct's fields", but could also be any other Rust type that can be deserialized from this SQL type. So the way I understand it, Queryable could be the concept of "defining deserialization through converting from an already-deserializable Rust type to another".

That being said, even if it is a conceptual change, I think changing a concept can not break concept boundaries if both these properties are satisfied:

  • Previous concept is included in new concept
  • New concept still makes sense everywhere it's used which (unless I miscounted the calls when writing my previous message) in our case seems to only be this impl, and recursive use which cannot ever not make sense with itself

#[diesel(deserialize_as = "SomeType")] could be naturally extended to TryInto, allowing e.g. to write the following:

That's a nice idea, but needs much more details. Open questions:

  • What constraints are on the error type?

Error + Send + Sync, so it can be boxed into the error type used there:

pub type Result<T> = result::Result<T, Box<dyn Error + Send + Sync>>;

  • How would we handle the existing impls that use From? I wouldn't like to break them.

Because the standard library implements TryFrom<U> for T where U: Into<T>, and because std::convert::Infaillible is Error + Send + Sync they would naturally not break and keep the same behaviour.

(I actually had these two answers in mind when suggesting this could be done, but I used the idea that the message was heavy enough already as an excuse for myself to not detail them. 😁)

#[diesel(deserialize_as = "SomeType")] could be naturally extended to structures, allowing e.g. to solve my previous problem in the following very diesel-internal-abstract and clean way:

I'm honestly not sure if that is that much clearer/shorter/less boilerplate than just implementing FromSqlRow directly.

Not sure I understand: do you consider

let values = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as diesel::deserialize::FromSqlRow<
			ST,
			DB,
		>>::build_from_row(row)?;

to not be boilerplate, even though it has to be redundant with

where (Option<i32>, (Option<i32>, Option<i32>), Option<f64>): diesel::deserialize::FromSqlRow<ST, DB>,

?

Otherwise I don't see what you mean, as clearly these lines disappearing is by definition less boilerplate and shorter, and the conversion just using TryInto definitely feels easier to grasp (especially for a new user) than all these diesel bounds and traits/types:

impl<ST, DB> diesel::deserialize::FromSqlRow<ST, DB> for MyType
where
	DB: diesel::backend::Backend,
	RawMyType: diesel::deserialize::FromSqlRow<ST, DB>,
{
	fn build_from_row<'a>(row: &impl diesel::row::Row<'a, DB>) -> diesel::deserialize::Result<Self> {

(When I create a custom FromSqlRow impl, I'm actually always rewriting the impl FromSqlRow for T where T: Queryable, just stripping the Ok).

That's because of the different contracts of the traits. FromSqlRow stands for this type can probably constructed from any row coming from the database, while Queryable says you can map the result of a tuple of the sql type ST to the following datatype. That's semantically different.

I'm aware of this, what I was trying to express is that the fact that I'm always rewriting most of what Queryable does when writing custom FromSqlRow impls highlights that this need actually fits very well with the Queryable concept, should we extend it a little.

Because it's always implemented using the macro, it would require update of a very little amount of code.

At least that assumption is wrong. There are definitively use cases where Queryable is implemented directly. Such a change would have an additional migration impact on them.

Oh I might have mis-counted when writing my previous message. I had found (within diesel):

  * in other `Queryable::build` impls, where error handling would be reduced to adding an `Ok` and a `?`

and the code of the macro itself. Where else ?

Or do you mean Queryable is manually implemented in several places in external crates ?
In which case I'm surprised because that is not at all something we've been using precisely because it wouldn't be allowed to fail to build these structures, so it doesn't extend our capabilities compared to deriving Queryable and selecting the appropriate fields in the right place.

This would probably ease transition to diesel 2.0:

This definitively needs more explanation on how this could make a transition easier. To use that code using FromSqlRow must definitively be rewritten, while currently code having a non generic impl only needs some minor fixes for the signature of build_from_row. My general motivation is to keep the breakage of updating to 2.0 as small as possible. If possible normal application code that uses the dsl, the derives and some advanced features should not break at all.

I don't see how the change I suggest changes anything to how FromSqlRow is declared or used, or to its concept. If you meant "code using Queryable" instead, then yes it would require some change, however it would be "as small as possible" as per the question just before, and may be compensated by the fact we provide a clean solution for what used to be the custom FromSqlRow impls. (At least for us, that would have been worth).

I don't feel like forcing to remove genericity when writing encountering such a scenario is super comfortable

Would you mind giving a example where this genericity is required? I think 99% of the impl can be non-generic or use different concrete impls based on their needs.

I'm not saying I have a concrete use-case where genericity is actually required, I'm saying when I read code that mentions "I only work with this specific backend and these specific SQL types", I tend to wonder why it wouldn't work otherwise, question on which I could spend a rather unreasonable amount of time if the answer is "because under the strict condition of non genericity, Rust will not trigger conflicting implementation errors of the kind downstream crates may implement trait", which means we would have to have comments describing this issue instead of having a code that's self-explanatory. Not to mention the fact we'd have to manually match the Rust types with their SQL counterparts everytime we'd work on such an impl.

That all written: I'm not fundamentally opposed such an idea, but I think we should put a bit more though in the design before just changing things.

Which is what we're doing right now, that's great, thanks a lot 😊

I do not want to issue a 3.0 release soon after 2.0, just because we messed something up.

Same applies in the opposite direction. (😜 Joke, I see how this is completely linked to your previous sentence and I completely agree we should always be careful. It's true though.)

Another more fundamental question: If we change Queryable to return a Result, what would be the technical difference between Queryable and FromSqlRow?

Both the technical and conceptual difference would be that FromSqlRow is responsible for the whole process of recomposing the structure from the diesel/database Rows, while Queryable is the part of that FromSqlRow process responsible for the Rust-to-Rust reorganization/validation part of it, independently of how the SQL types are naturally and fail-safely (provided valid schema) converted to Rust types.

Implementing Queryable instead of FromSqlRow would allow (and already allows) to easily safely abstract out of the SQL-to-Rust conversions, provided a valid schema.

From a technical point neither Querable nor QueryableByName are required. Both derives can just generate the corresponding FromSqlRow impl. I did this already as part of the type system rewrite, but added both traits back out of compatibility concerns with existing generic code.

I think both would make even more sense if they allowed for easy extra validation.


Aaaand that's it! 😅 I was already writing answers to your previous message for a while when your last one came in, and it's already been an hour since then. 😄
What do you think ?

@weiznich
Copy link
Member

weiznich commented Oct 5, 2020

From the doc, I read "The purpose of this trait is to convert from a tuple of Rust values that have been deserialized into your struct.", but I'm not sure where there is that conceptual constraint of if being "simple", or "one-to-one" with the fields. Besides, another part of the code seems to say it's "typically a tuple of all of your struct's fields", but could also be any other Rust type that can be deserialized from this SQL type. So the way I understand it, Queryable could be the concept of "defining deserialization through converting from an already-deserializable Rust type to another".

That being said, even if it is a conceptual change, I think changing a concept can not break concept boundaries if both these properties are satisfied:

  • Previous concept is included in new concept
  • New concept still makes sense everywhere it's used which (unless I miscounted the calls when writing my previous message) in our case seems to only be this impl, and recursive use which cannot ever not make sense with itself

My main pain point here is that Queryable currently just maps values, but does not do any deserialization on it's own. Mapping cannot really fail as it just changes the shape of data structure. Deserialisation is currently done by FromSqlRow (for Rows) and FromSql (for single field values). Adding this capability to Queryable feels for me like we have some duplication there. It's not really a new concept, but a concept from another place applied to Queryable. As this other place is literally the outer wrapper/some kind of super trait to Queryable if feel that this is not required.

Or do you mean Queryable is manually implemented in several places in external crates ?
In which case I'm surprised because that is not at all something we've been using precisely because it wouldn't be allowed to fail to build these structures, so it doesn't extend our capabilities compared to deriving Queryable and selecting the appropriate fields in the right place.

There are other reasons that doing actual failable deserialization work why a manual Queryable impl can be useful. To just to name a few:

  • You want to map the results of several queries into a common enum
  • You want to just set some kind of default value for a specific field. (Or any other kind of operation that cannot fail, but adds or removes a field)

Just to link at least one: Example from crates.io

I'm not saying I have a concrete use-case where genericity is actually required, I'm saying when I read code that mentions "I only work with this specific backend and these specific SQL types", I tend to wonder why it wouldn't work otherwise, question on which I could spend a rather unreasonable amount of time if the answer is "because under the strict condition of non genericity, Rust will not trigger conflicting implementation errors of the kind downstream crates may implement trait", which means we would have to have comments describing this issue instead of having a code that's self-explanatory.

Just to write that down as explicitly as possible. Even if those impls seem to be generic, practically they are not generic. (At least from the point of an application, for libraries that build on top of diesel this is a bit different):

  • Beeing generic over ST is just a implementation trick, originally coming from our derive, so that we don't need to list the sql types explicitly (As we don't have a way to get the sql types there). For practical applications there is exactly one type ST that fulfills the where clause saying that the tuple of fields must implement FromSqlRow on it's own.
  • Beeing generic over DB is not really relevant for applications using diesel, as writing code large scale queries that use diesel generically is quite hard. Applications tend to make backend support a compile time options, if they support multiple backends at all.
    That said I feel that at least the error about down stream crates is not correct but I've no motivation to fill a bug for this as those normally don't get fixed in a reasonable amount of time.

Not to mention the fact we'd have to manually match the Rust types with their SQL counterparts everytime we'd work on such an impl.

It's not required to do this match manually. In most cases you can just use diesel::dsl::SqlTypeOf<WhatEver>, where WhatEver is the corresponding query to let rustc figure out that type for you.

Not sure I understand: do you consider

let values = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as diesel::deserialize::FromSqlRow<
  		ST,
  		DB,
  	>>::build_from_row(row)?;

to not be boilerplate, even though it has to be redundant with

where (Option<i32>, (Option<i32>, Option<i32>), Option<f64>): diesel::deserialize::FromSqlRow<ST, DB>,

?

Otherwise I don't see what you mean, as clearly these lines disappearing is by definition less boilerplate and shorter, and the conversion just using TryInto definitely feels easier to grasp (especially for a new user) than all these diesel bounds and traits/types:

impl<ST, DB> diesel::deserialize::FromSqlRow<ST, DB> for MyType
where
  DB: diesel::backend::Backend,
  RawMyType: diesel::deserialize::FromSqlRow<ST, DB>,
{
  fn build_from_row<'a>(row: &impl diesel::row::Row<'a, DB>) -> diesel::deserialize::Result<Self> {

Sure those lines are boilerplate, but I fail to see how this does become better with using TryFrom + a macro attribute, as you just would change one kind of boilerplate into another kind. See the examples below for more details

As we are trying to design a new/better API here, I will just write down different variants how this could be implemented explicitly. This hopefully should make it much clearer what I mean here. Let's start with that variant that's matches your impl above and can be written today with the master branch.

struct MyType {
    pub delayed_by_days: Option<usize>,
    pub delayed_by_months: Option<MonthDelay>,
    pub some_other_field: Option<f64>,
}

impl FromSqlRow<(Nullable<Integer>, (Nullable<Integer>, Nullable<Integer>), Nullable<Double>), diesel::pg::Pg> for MyType
{
	fn build_from_row<'a>(row: &impl diesel::row::Row<'a, diesel::pg::Pg>) -> diesel::deserialize::Result<Self> {
		let values = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as diesel::deserialize::FromSqlRow<
			(Nullable<Integer>, Nullable<Integer>), Nullable<Double>),
			diesel::pg::Pg,
		>>::build_from_row(row)?;
		Ok(Self {
			delayed_by_days: values.0.map(|delay| delay.try_into()).transpose()?,
			delayed_by_months: match values.1 {
				(Some(nb_months), day_of_month) => Some(MonthDelay {
					nb_months: nb_months.try_into()?,
					post_rounding_delay: day_of_month.map(<_>::try_into).transpose()?,
				}),
				(None, None) => None,
				(None, Some(_)) => {
					return Err("Failed to deserialize MonthDelay, nb_months null and day_of_month not null".into())
				}
			},
			some_other_field: values.2.filter(|&float| float != 0.),
		})
	}
}

So the only thing that changes to the original impl is that we removed all generic arguments. On the one hand this skips a part of the complex where clause, on the other hand this removes some genericity. So let's analyze what's exactly is not possible anymore for this impl:

  • The original impl was generic over ST, but that's in fact only "fake" genericity, because the where clause would restrict ST in such a way that only those explicit SQL types are possible. (In the end that was just a way to skip listing the SQL types explicitly for the cost of listing the field types again. Our derives use this trick internally, because we just don't know the SQL types there.).
  • The original impl was generic over DB, to allow to be used with different database backends. In practice is using diesel with multiple different at run time configurable database backends quite hard, so for the waste majority of use cases I would assume that this restriction is fine.

As next version let's have a look on how to write this impl using a possible version of diesel that returns a Result from Queryable:

struct MyType {
    pub delayed_by_days: Option<usize>,
    pub delayed_by_months: Option<MonthDelay>,
    pub some_other_field: Option<f64>,
}

impl<ST, DB> Queryable<ST, DB> for MyType
where (Option<i32>, (Option<i32>, Option<i32>), Option<f64>): Queryable<ST, DB>,
           DB: Backend
{
        type Row = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as Queryable<ST, DB>>::Row;

	fn build(row: Self::Row) -> diesel::deserialize::Result<Self> {
		let values = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as Queryable<
                        ST,
			DB,
		>>::build(row)?;
		Ok(Self {
			delayed_by_days: values.0.map(|delay| delay.try_into()).transpose()?,
			delayed_by_months: match values.1 {
				(Some(nb_months), day_of_month) => Some(MonthDelay {
					nb_months: nb_months.try_into()?,
					post_rounding_delay: day_of_month.map(<_>::try_into).transpose()?,
				}),
				(None, None) => None,
				(None, Some(_)) => {
					return Err("Failed to deserialize MonthDelay, nb_months null and day_of_month not null".into())
				}
			},
			some_other_field: values.2.filter(|&float| float != 0.),
		})
	}
}

While this closely matches your original impl, this does in my opinion not simplify the impl in any meaningful way. It's basically just the same as before. Therefore I think just changing the Queryable::build to return a Result instead does not help much here.

Next let's have a look at a version where we have a struct wide #[deserialize_as ="…"] that uses TryFrom internally:

#[derive(Queryable)]
#[diesel(deserialize_as = "RawMyType")]
struct MyType {
    pub delayed_by_days: Option<usize>,
    pub delayed_by_months: Option<MonthDelay>,
    pub some_other_field: Option<f64>,
}
#[derive(Queryable)]
struct RawMyType {
    delayed_by_days: Option<i32>,
    delayed_by_months: Option<i32>,
    post_month_rounding_delayed_by_days: Option<i32>,
    some_other_field: Option<f64>,
}

impl TryFrom<RawMyType> for MyType { 
    type Error = Box<dyn Error + Send + Sync>;

    fn try_from(v: RawMyType) -> Result<Self, Self::Error> {
        Ok(Self {
		delayed_by_days: v.delayed_by_days.map(|delay| delay.try_into()).transpose()?,
		delayed_by_months: match (v.delayed_by_months, v.post_month_rounding_delayed_by_days) {
			(Some(nb_months), day_of_month) => Some(MonthDelay {
				nb_months: nb_months.try_into()?,
				post_rounding_delay: day_of_month.map(<_>::try_into).transpose()?,
			}),
			(None, None) => None,
			(None, Some(_)) => {
				return Err("Failed to deserialize MonthDelay, nb_months null and day_of_month not null".into())
			}
		},
		some_other_field: v.some_other_field.filter(|&float| float != 0.),
	})
    }
}

While you would not have some "magic" line to construct our initial tuple here, we need to list all field of the struct not only once, but three times, as you need to write the definition for RawMyType and MyType + the final construction of MyType in the TryFrom impl. Compared to the both impls above this does not seem to be much better for me.

Taking a step back at the first impl, as you can write it at least a bit clearer as your example:

struct MyType {
    pub delayed_by_days: Option<usize>,
    pub delayed_by_months: Option<MonthDelay>,
    pub some_other_field: Option<f64>,
}

impl FromSqlRow<dsl::SqlTypeOf<YourQuery>, diesel::pg::Pg> for MyType
{
	fn build_from_row<'a>(row: &impl diesel::row::Row<'a, diesel::pg::Pg>) -> diesel::deserialize::Result<Self> {
                use diesel::row::NamedRow;
		Ok(Self {
			delayed_by_days: row.get::<Nullable<Integer>, _>("first_field_name").map(|delay| delay.try_into()).transpose()?,
			delayed_by_months: match row.get::<Nullable<Integer>, _>("second_field_name") {
				(Some(nb_months), day_of_month) => Some(MonthDelay {
					nb_months: nb_months.try_into()?,
					post_rounding_delay: day_of_month.map(<_>::try_into).transpose()?,
				}),
				(None, None) => None,
				(None, Some(_)) => {
					return Err("Failed to deserialize MonthDelay, nb_months null and day_of_month not null".into())
				}
			},
			some_other_field: row.get::<Nullable<Double>, _>("other_field").filter(|&float| float != 0.),
		})
	}
}

As you can receive all values directly form the row, there is not really a need to use that line you consider as boilerplate to construct the values anymore. (In that context, I probably meaningful addition to our API would probably to have a Row::get like method that does not return a raw database value, but a fully deserialized value, much like NamedRow::get does.)

To summarize that what I've written a bit: I think the underlying problem is that we don't have a great way to express that some additional deserialisation work is required for a specific field (or even for combining multiple fields). That's definitively something that can and should be improved, but at least the current suggestions are in my opinions not really a great fit here. It feels like there must be a better option, that just allows to say how to map this specific subset of fields without touching all the unrelated fields. Unfortunately I do not have any concrete idea how such an API could look like. One possible option would be to fit this somewhere in between calling FromSqlRow for the plain tuple and mapping this tuple to a rust struct, but I do not have a concrete idea how to implement this in a usable way yet.

@Ten0
Copy link
Member Author

Ten0 commented Oct 5, 2020

Thanks again for your quick and detailed answer.

My main pain point here is that Queryable currently just maps values, but does not do any deserialization on it's own. Adding this capability to Queryable feels for me like we have some duplication there. It's not really a new concept, but a concept from another place applied to Queryable. As this other place is literally the outer wrapper/some kind of super trait to Queryable if feel that this is not required.

First, I'd like to underline that I'm very aware that Queryable currently does not do deserialization on its own, and that in my current proposition, it still doesn't: contrarily to the example given in #2523 (comment)

this impl using a possible version of diesel that returns a Result from Queryable:

	fn build(row: Self::Row) -> diesel::deserialize::Result<Self> {
		let values = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as Queryable<
			ST,
			DB,
		>>::build(row)?;
		Ok(Self {

it seems like the part setting values from build(row) does not exist, as row is already Self::Row.

Regarding the point of concepts being the same, I think you had already expressed it in #2523 (comment), and I thought it had been addressed by:

Both the technical and conceptual difference would be that FromSqlRow is responsible for the whole process of recomposing the structure from the diesel/database Rows, while Queryable is the part of that FromSqlRow process responsible for the Rust-to-Rust reorganization/validation part of it, independently of how the SQL types are naturally and fail-safely (provided valid schema) converted to Rust types.

as well as the first paragraph you were answering to, that was coming to the conclusion that even now,

Queryable could be the concept of "defining deserialization through converting from an already-deserializable Rust type to another"

It looks to me that one concept is included in (in the sense that it is meant to be executed as part of, and is never called out of) the other, both on the current master as well as in this suggestion, but that in both cases they still differ.

Just to link at least one: Example from crates.io

I love this example because I could not have given a better one advocating for my suggestion: this is precisely the kind of problem my suggestion is meant to solve. What happened there seems to be:

  • We wanted to recompose a structure from an already-deserializable row tuple, which Queryable is meant for
  • It could fail if the database had unexpected contents, but the function does not allow returning an inconsistency error -> well... I don't really want to query into RawBadge (or (i32, String, serde_json::Value)) instead of Badge everytime I .load::<T>(db) that type of data and then attempt the conversion after, so let's bet it's never going to happen, and we'll .expect("unexpected data in db"). => Now it could panic the server if the database had unexpected contents.

Because we are looking for high availability and reliability even if there happened to be different levels of experience among coders, it's not acceptable for us to panic in these cases. In fact, it looks like a large majority of the case, panicking is not going to be the appropriate answer, while an Err on .load(db) (that could be sent to an error reporting system with additional contextual data for easy debug while not affecting the rest of program, in particular interacting properly with regards to open transactions getting released to connection pool and not breaking worker threads) would be. So in the end it looks like if this suggestion forces users to go and update that kind of piece of code, it's actually going reduce the odds of them (and crates.io!) unexpectedly panicking, which is exactly the outcome I would expect and love to see from it! 😃
In my opinion, a few of these is very quickly sufficient to outweigh a large number of "oh well I'm going to have to add Ok( here".

You want to just set some kind of default value for a specific field. (Or any other kind of operation that cannot fail, but adds or removes a field)

Or any other kind of operation that cannot fail => Then one day you have a new scenario, it can start failing, and you have to rewrite it using a different trait, drop genericity, figure out the SQL types mapping, and add some more boilerplate. My bet is a lot of people are going to just add .expect(). 🙁

Regarding the comments on the amount of boilerplate:

  • It looks like SqlTypeOf<YourQuery> is not something we could use as YourQuery is almost always a much more complex type to express than the SQL types of what I'm selecting, so I'd rather select the tuple of the SQL types.
  • Regarding the version withTryInto, among our most important criteria for production code are (much more than code length):
    • It's as easy as possible to understand what it does (makes interaction with it and reviews easy)
    • There's a very high chance it works (it's as hard as possible to have made a mistake, makes reviews easy and avoids bugs in deployed software)
      So overall we're fine with it taking a bit longer to write if that mostly guarantees us that it works, (which a good part of why we've chosen Rust).
      And regarding these criteria:
    • It requires minimal knowlege on Diesel and its deserialization process to understand what will happen
    • It's much harder to mismatch delayed_by_days: v.delayed_by_days than delayed_by_days: values.0
    • This typo would compile: delayed_by_days: row.get::<Nullable<Integer>, _>("delayed_by_day") while this one wouldn't: delayed_by_days: v.delayed_by_day (plus this approach doesn't work for fields named e.g. id because there may be several of them selected in a single query)

So it looks like with these criteria, the TryInto version may be the best choice.

Even if those impls seem to be generic, practically they are not generic.

This I beleive had been mentionned in #2523 (comment):

I think 99% of the impl can be non-generic or use different concrete impls based on their needs.

The issue I tried to explain at #2523 (comment) wasn't that it couldn't be written or wouldn't work, it was that code mentioning it isn't generic when in fact it could be implies that a reader may wonder "why it couldn't be generic", "whether it relies on some kind of specific postgres internals", etc. Overall I feel like this would raise unnecessary questions when trying to understand the given code than if code just said "this works for whatever as long as I can deserialize these fields already". And unnecessary questions raised is lost time. That being said, I admit it's a relatively minor part of why I feel that code using FromSqlRow would be much harder to understand: the amount of required trait bounds and multi-line generic function calls is the biggest part of it.

Now I'm not sure how

  • Beeing generic over ST is just a implementation trick
  • In practice is using diesel with multiple different at run time configurable database backends quite hard

addresses this point of "it makes code harder to understand", however it does raise a very interesting question: if having different Queryable behaviors depending on SQL type or backend is such a niche use-case (and I would agree), it would probably always be implemented without an additional layer of genericity, so it could probably use FromSqlRow instead of Queryable, as the conflicting impl error wouldn't trigger. Doesn't this mean we could also get rid of the genericity on the SQL type and on the backend on Queryable, and rewrite it like so?

 pub trait Queryable
 { 
     /// The Rust type you'd like to map from. 
     /// 
     /// This is typically a tuple of all of your struct's fields. 
     type Row; 
     fn build(row: Self::Row) -> Self;  // or Result<Self>
 } 

 impl<T, ST, DB> FromSqlRow<ST, DB> for T 
 where 
     T: Queryable, 
     ST: SqlType, 
     DB: Backend, 
     T::Row: FromStaticSqlRow<ST, DB>, 
 { 
     fn build_from_row<'a>(row: &impl Row<'a, DB>) -> Result<Self> { 
         let row = <T::Row as FromStaticSqlRow<ST, DB>>::build_from_row(row)?; 
         Ok(T::build(row)) 
     } 
 } 

This would in turn simplify again:

impl Queryable for MyType
{
        type Row = (Option<i32>, (Option<i32>, Option<i32>), Option<f64>);

	fn build(row: Self::Row) -> diesel::deserialize::Result<Self> {
		Ok(Self {
			delayed_by_days: values.0.map(|delay| delay.try_into()).transpose()?,
			delayed_by_months: match values.1 {
				(Some(nb_months), day_of_month) => Some(MonthDelay {
					nb_months: nb_months.try_into()?,
					post_rounding_delay: day_of_month.map(<_>::try_into).transpose()?,
				}),
				(None, None) => None,
				(None, Some(_)) => {
					return Err("Failed to deserialize MonthDelay, nb_months null and day_of_month not null".into())
				}
			},
			some_other_field: values.2.filter(|&float| float != 0.),
		})
	}
}

while additionally increasing the distinction with from FromSqlRow and still matching the concept of "defining deserialization through converting from an already-deserializable Rust type to another".

It feels like there must be a better option, that just allows to say how to map this specific subset of fields without touching all the unrelated fields.

It looks like this Queryable pattern already allows this:

#[derive(Queryable)]
struct S {
    specific_subset: SpecificSubset,
    unrelated_fields: i32,
}

struct SpecificSubset {
    ...
}

impl Queryable for SpecificSubset {
    type Row = ...;
    fn build(row: Self::Row) -> diesel::deserialize::Result<Self> {
        ...
    }
}

Of course I renew my offer to spend some time implementing this and experimenting with it. 🙂

Thanks again for your great answer.
What do you think ? 🙂

@weiznich
Copy link
Member

weiznich commented Oct 6, 2020

addresses this point of "it makes code harder to understand", however it does raise a very interesting question: if having different Queryable behaviors depending on SQL type or backend is such a niche use-case (and I would agree), it would probably always be implemented without an additional layer of genericity, so it could probably use FromSqlRow instead of Queryable, as the conflicting impl error wouldn't trigger. Doesn't this mean we could also get rid of the genericity on the SQL type and on the backend on Queryable, and rewrite it like so?

So the main reason why Queryable looks like it looks now is: It looked that way since long before diesel 1.0. At least till #2182 ST and DB where required, because Queryable was used as trait bound in Connection to say a query of that type could be used to construct this type. As already stated, the main reason why Queryable exists today is for backward compatibility reasons. Or to word it differently: Either Queryable remains as defined before, or we can just remove that trait completely from the code base and replace it with something more fitting (which should not be named Queryable than). That written I really want to avoid that, because that will cause a quite large breakage for everyone that want's to update to diesel 2.0 as this would imply that we need to deprecate at least #[derive(Queryable)] (because it would be confusing to have a derive that implements a completely different trait)

(Additionally such a change would prevent having more than one Queryable impl for a single type. Again this something I know to exist at list in a non public codebase I have access to)

Of course I renew my offer to spend some time implementing this and experimenting with it. 🙂

Feel free to experiment a bit with this. Here are the constraints from my side for accepting something like this in diesel:

  • Adding this feature should not require large scale changes to application code. (This includes already existing manual Queryable implementations!)
  • The API should be flexible, so that it can handle at least the following cases:
    • Applying a transformation to a single field (basically the existing #[deserialize_as = …])
    • Applying the transformation to multiple fields at once so that we can combine two fields in the database, into one field on rust side
  • Requires a minimal amount of code duplication (I already know that we would get a ton of complains for that struct level #[deserialize_as = …] approach)

I think a potential way to solve this would be a three step approach:

  1. Loading tuple of values from the database.
  2. Having some functionality that does additional type conversations as part of the loading process
  3. Construct the outgoing struct (that implements currently Queryable)

This would require having not one, but two associated Row types somewhere, where the first one could be defined by optional #[deserialize_as =…] annotation and the second one would be defined by the fields. Step 2 would then define a mapping between those row representations. This mapping could be implemented by default for tuples of the same size, where all members implement TryInto/TryFrom to map to the corresponding output tuple member. In that way it would be implemented by default for all cases where both row types are the same or there is an existing From impl. Additionally this would leave the possibility open to define a fully custom mapping that adds or removes fields by implementing the mapping trait directly.

Minor unrelated notes

(Just want to have some correction here, for the case someone looks at this later and uses this as source for whatever…)

it seems like the part setting values from build(row) does not exist, as row is already Self::Row.

Self::Row is not necessarily a tuple here, it's the opaque type <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>) as Queryable<ST, DB,>>::Row

(that's only a minor nit)

It looks like SqlTypeOf is not something we could use as YourQuery is almost always a much more complex type to express than the SQL types of what I'm selecting, so I'd rather select the tuple of the SQL types.

YourQuery does not necessarily need to be a type representing the complete query, only the part that is relevant for the where clause. So in some cases it's as simple as schema::table::all_columns or something like that.

This typo would compile: delayed_by_days: row.get::<Nullable, _>("delayed_by_day") while this one wouldn't: delayed_by_days: v.delayed_by_day (plus this approach doesn't work for fields named e.g. id because there may be several of them selected in a single query)

As written in my original comment: I would happily accept a PR adding a Row::get variant that works with indices instead, that would solve the multiple id problem at least.

@Ten0
Copy link
Member Author

Ten0 commented Oct 6, 2020

I would happily accept a PR adding a Row::get variant that works with indices instead, that would solve the multiple id problem at least

IIRC this was already done as part of #2182:

pub trait Row<'a, DB: Backend>: RowIndex<usize> + for<'b> RowIndex<&'b str> + Sized {

diesel/diesel/src/row.rs

Lines 53 to 55 in d7ce43d

fn get<I>(&self, idx: I) -> Option<Self::Field>
where
Self: RowIndex<I>;

However that seems even more risky to use than the tuple, in terms of "code will not crash at runtime".

@Ten0
Copy link
Member Author

Ten0 commented Oct 6, 2020

I'm having a bit of trouble reading the whole answer #2523 (comment) as there seem to be a lot of quotes from my previous message appearing in seemingly random places (middle of sentences/words).

I think a potential way to solve this would be a three step approach:

  1. Loading tuple of values from the database.
  2. Having some functionality that does additional type conversations as part of the loading process
  3. Construct the outgoing struct (that implements currently Queryable)

I'm not sure why you separate 2 and 3 in your suggestion: often, struct constructors will perform validation (e.g. NonZeroU32::new), and being struct constructors, they are as well type conversions, and the fact the constructors are the ones performing the validation is what guarantees the struct can't be built in an invalid way. If 2 and 3 are merged in the same interface, you get e.g.

mod a_greater_than_b {
    #[derive(Copy, Clone, Error)]
    struct AGreaterThanBBuildError;
    #[derive(Copy, Clone)]
    struct AGreaterThanB {
            a: i32,
            b: i32,
    }
    impl AGreaterThanB {
        fn new(a: i32, b: i32) -> Result<Self, AGreaterThanBBuildError> {
            if a >= b {
                Ok(Self{ a, b })
            } else {
                Err(NotPositive)
            }
        }
        fn into_inner(self) -> (i32, i32) {
            (self.a, self.b)
        }
    }
}
// Where it's db-checked to always be the case for whichever types we select in this
impl Queryable for AGreaterThanB {
    type Row = (i32,i32);
    fn build((a,b): Self::Row) -> diesel::deserialize::Result<Self> {
        Ok(AGreaterThanB::new(a, b)?)
    }
}

What do you mean by them being splitted, why do you think they should be splitted, and how would you write this if they were splitted?

Either Queryable remains as defined before, or we can just remove that trait completely from the code base and replace it with something more fitting; (which should not be named Queryable than). That written I really want to avoid that, because that will cause a quite large breakage for everyone that want's to update to diesel 2.0 as this would imply that we need to deprecate at least #[derive(Queryable)] (because it would be confusing to have a derive that implements a completely different trait)

I don't agree it should not be named Queryable: I feel like the concept as I was defining it still matches with the current spirit and documentation of Queryable (though its behaviour changes a bit in that it tolerates failure when performing the type conversions):

/// Trait indicating that a record can be queried from the database.
///
/// Types which implement `Queryable` represent the result of a SQL query. This
/// does not necessarily mean they represent a single database table.

Additionally such a change would prevent having more than one Queryable impl for a single type. Again this something I know to exist at list in a non public codebase I have access to

Yes, I thought this was addressed by:

it would probably always be implemented without an additional layer of genericity, so it could probably use FromSqlRow instead of Queryable

@weiznich
Copy link
Member

weiznich commented Oct 6, 2020

I'm having a bit of trouble reading the whole answer #2523 (comment) as there seem to be a lot of quotes from my previous message appearing in seemingly random places (middle of sentences/words).

Sorry for that, should be fixed now.

IIRC this was already done as part of #2182:

pub trait Row<'a, DB: Backend>: RowIndex<usize> + for<'b> RowIndex<&'b str> + Sized {

diesel/diesel/src/row.rs

Lines 53 to 55 in d7ce43d

fn get<I>(&self, idx: I) -> Option<Self::Field>
where
Self: RowIndex<I>;

That method returns a Field, I'm talking about a variant that retruns the deserialized value directly.

However that seems even more risky to use than the tuple, in terms of "code will not crash at runtime".

Just to reiterate that: All Queryable impls are doing that internally

I'm not sure why you separate 2 and 3 in your suggestion

I think that will help to reduce the impact on already existing code. If somehow possible I just want to prevent that existing code is broken, which would be the case if we rerurn a Result from Queryable::build. (At least as far as this does not result into a bad API)

@Ten0
Copy link
Member Author

Ten0 commented Oct 6, 2020

Self::Row is not necessarily a tuple here, it's the opaque type <(Option, (Option, Option), Option) as Queryable<ST, DB,>>::Row

Right. It looks like the lines still disappear though if we just write type Row = <(Option<i32>, (Option<i32>, Option<i32>), Option<f64>); (it will have a different path, going through FromSqlRow for MyType -> (FromStaticSqlRow for (Option...) ->Queryable for (Option...)) -> Queryable for MyType instead of FromSqlRow for MyType -> FromStaticSqlRow for (Option...)::Row -> (Queryable for MyType -> Queryable for (Option...)) but eventually have the same outcome, while being simpler.)

Sorry for that, should be fixed now.

Almost, there's still one in the 1st paragraph ^^

That method returns a Field, I'm talking about a variant that retruns the deserialized value directly.

Oh yes it looks like it would be easy to extend this:

diesel/diesel/src/row.rs

Lines 191 to 205 in d7ce43d

impl<'a, R, DB> NamedRow<'a, DB> for R
where
R: Row<'a, DB>,
DB: Backend,
{
fn get<ST, T>(&self, column_name: &str) -> deserialize::Result<T>
where
T: FromSql<ST, DB>,
{
let field = Row::get(self, column_name)
.ok_or_else(|| format!("Column `{}` was not present in query", column_name))?;
T::from_nullable_sql(field.value())
}
}

though it probably shouldn't be called NamedRow anymore then. Maybe just a new method on the standard Row?

Still that would be unsatisfactory for the amount of type-safety I'd like to achieve:

Just to reiterate that: All Queryable impls are doing that internally

It looks to me like the Queryable interface is much safer on that regard: it receives as entry the already-deserialized tuple (or other deserialized), and the provided FromStaticSqlRow impls for the tuples (or other deserialized) provide the type-safety that e.g. we will not attempt to deserialize a Nullable<Integer> into an i32.
It looks like that would be something easy to forget if using that other kind of interface, as nothing seems to prevent you from doing by mistake row.get::<Integer, _>("second_field_name") instead of row.get::<Nullable<Integer>, _>("second_field_name") mismatching what you think you have when writing this code with the actual schema. And it looks like typing in numbers instead of field names would make it even harder to ensure that consistency.

I think that will help to reduce the impact on already existing code. If somehow possible I just want to prevent that existing code is broken, which would be the case if we rerurn a Result from Queryable::build.

I really feel that it's quickly worth forcing people to update their code by adding an Ok and a deserialize::Result when a good part of the time that will also largely encourage them to fix the way their conversion errors are handled (that is, not panicking, cf. #2523 (comment)), and even more so if they can also simplify them by getting rid of previously-implementation-trick generic parameters on the trait.

I think it would be reasonably easy to update with this detailed in the CHANGELOG.

@weiznich
Copy link
Member

weiznich commented Oct 7, 2020

Again, feel free to do some experimentation here. As already expressed above I have a different opinion about changing anything about the definition of Queryable, because from my point of view this trait purely exists because backward compatibility now. If we change something and users need to touch their code anyways we can just remove that trait because it is not needed anymore then and changing from Queryable to FromSqlRow would be also a quite simple change then. (No more conflicting impls and so on …). Everything done by Queryable could be simply done in the same type safe manner inside of FromSqlRow. This is technically a quite easy change, but I think the breakage caused by this would be quite large (at least if we deprecate #[derive(Queryable)] as part of this).

Pinging @diesel-rs/contributors here, maybe someone has other opinions here.

@Ten0
Copy link
Member Author

Ten0 commented Oct 7, 2020

If we change something and users need to touch their code anyways we can just remove that trait because it is not needed anymore then and changing from Queryable to FromSqlRow would be also a quite simple change then

So just to confirm with examples, you think if we choose to allow Diesel 2.0 to force users to update their existing manual implementations of Queryable, it would be better to have users transition from (taking the example from crates.io again):

impl diesel::deserialize::Queryable<(diesel::sql_types::Integer, diesel::sql_types::Text, diesel::pg::types::sql_types::Jsonb), diesel::pg::Pg> for Badge {
    type Row = (i32, String, serde_json::Value);

    fn build((_, badge_type, attributes): Self::Row) -> Self {
        let json = json!({"badge_type": badge_type, "attributes": attributes});
        serde_json::from_value(json).expect("Invalid CI badge in the database")
    }
}

to

impl diesel::deserialize::FromSqlRow<(diesel::sql_types::Integer, diesel::sql_types::Text, diesel::pg::types::sql_types::Jsonb), diesel::pg::Pg> for Badge {
    fn build_from_row<'a>(row: &impl diesel::row::Row<'a, diesel::pg::Pg>) -> diesel::deserialize::Result<Self> {
        let values = <(i32, String, serde_json::Value) as diesel::deserialize::FromSqlRow<
            (diesel::sql_types::Integer, diesel::sql_types::Text, diesel::pg::types::sql_types::Jsonb),
            diesel::pg::Pg,
        >>::build_from_row(row)?;
        let json = json!({"badge_type": badge_type, "attributes": attributes});
        Ok(serde_json::from_value(json).map_err(|e| format!("Invalid CI badge in the database: {}", e))?)
    }
}

(or from

impl<ST, DB> diesel::deserialize::Queryable<ST, DB> for Badge
where
    DB: diesel::backend::Backend,
    (i32, String, serde_json::Value): diesel::deserialize::FromSqlRow<ST, DB>,
{
    type Row = (i32, String, serde_json::Value);

    fn build((_, badge_type, attributes): Self::Row) -> Self {
        let json = json!({"badge_type": badge_type, "attributes": attributes});
        serde_json::from_value(json).expect("Invalid CI badge in the database")
    }
}
impl<ST, DB> diesel::deserialize::FromSqlRow<ST, DB> for Badge
where
    DB: diesel::backend::Backend,
    (i32, String, serde_json::Value): diesel::deserialize::FromSqlRow<ST, DB>,
{
    fn build_from_row<'a>(row: &impl diesel::row::Row<'a, diesel::pg::Pg>) -> diesel::deserialize::Result<Self> {
        let values = <(i32, String, serde_json::Value) as diesel::deserialize::FromSqlRow<
            ST,
            DB,
        >>::build_from_row(row)?;
        let json = json!({"badge_type": badge_type, "attributes": attributes});
        Ok(serde_json::from_value(json).map_err(|e| format!("Invalid CI badge in the database: {}", e))?)
    }
}

)
rather than to:

impl diesel::deserialize::Queryable for Badge {
    type Row = (i32, String, serde_json::Value);

    fn build((_, badge_type, attributes): Self::Row) -> diesel::deserialize::Result<Self> {
        let json = json!({"badge_type": badge_type, "attributes": attributes});
        Ok(serde_json::from_value(json).map_err(|e| format!("Invalid CI badge in the database: {}", e))?)
    }
}

despite the fact that it would probably as you mentioned require changing the idiomatic Queryable derive to FromSqlRow, that the latter is this much simpler to use (notably it abstracts the (sql row)->(rust tuple) part) while still covering a large majority of the use-cases, and that it provides the additional guarantee that there will be no runtime crash caused by a mismatch between the SQL types ST in impl diesel::deserialize::FromSqlRow<ST, DB> and <... as diesel::deserialize::FromSqlRow<ST, DB>>::build_from_row(row)?
If so, what are the elements that make the first solution outweigh the previously mentionned benefits of the second?

Everything done by Queryable could be simply done in the same type safe manner inside of FromSqlRow.

If that is the case then it looks like I'm mistaken in my above assumption that a mismatch between the SQL types ST in impl diesel::deserialize::FromSqlRow<ST, DB> and <... as diesel::deserialize::FromSqlRow<ST, DB>>::build_from_row(row) would not fail to compile. What guarantees that it wouldn't compile in this scenario ?

@weiznich
Copy link
Member

weiznich commented Oct 8, 2020

I just used some time to think about this some more:

(You probably know most of those things already, but I find it important to write this somewhere down, so that other people can understand how diesel works internally)

I should probably first say that I'm not really happy about how the current deserialization structure works. The old (diesel 1.x) implementation required Queryable<> for loading values, while the new (diesel 2.0) one requires FromSqlRow. While this change should not affect most users, it will certainly have an large impact on anyone that has written some generic code as this bound tends to show up in where clause quite often. My main motivation back than for this change was to unify QueryableByName and Queryable behind one common interface. The idea was that each Expression has an SqlType (Expression::SqlType). For queries constructed via sql_query (or similar mechanisms, like the DynamicSelectClause) it's set to Untyped otherwise this will be a type that implements SqlType. All of them implement TypedExpressionType. So the old QueryableByName boils down to FromSqlRow<Untyped, DB> now, while the old Queryable<ST, DB> is now FromSqlRow<ST, DB> where ST: SqlType. Additionally I hoped that this design will allow third party crates to add other custom types that implement TypedExpressionType which provide other guarantees as those two default ones. (#2367 comes in mind, but also maybe some thing like the sqlx::query! that does compile time checking of the provided query via the database itself.)

That all written now some rough overview how the old and new deserialisation implmentation work:

Old implementation (1.x)

We had basically two implementation: QueryableByName for queries where the types are unknown at compile time, and Queryable for queries build with the dsl. For both traits a distinct method in the Connection trait existed. We will start there.

Queryable

Connection::query_by_index executed the query and got back some iterator like type. This type yielded a type implementing Row for each call to next. This row was then passed to <Queryable::Row as FromSqlRow>::build_from_row, which basically constructed a tuple by accessing the column via index out of the row. To construct the tuple the FromSqlRow implementation called FromSqlRow for each tuple element. This enabled the constructed on nested tuples, but on the other hand required that scalar types implement also FromSqlRow. For scalar elements FromSql was called internally. After that the result type was used as argument for Queryable::build to construct the final struct/tuple.overhead if they are actually called (because of the first point)

So it's basically impl Row<DB> -> FromSqlRow::build_from_row (calls FromSql::from_sql internally for all elements) -> Queryable::Row -> Queryable::build

QueryableByName

Connection::query_by_name executed the query and got back som* It should be possible to implement a streaming interface using thee iterator like type (another one as Connection::query_by_id as this type needs to do some different things). This type yielded a type implementing NamedRow for each call to next. Now QueryableByName::build was called, which constructed the output struct, by calling NamedRow::get which called FromSql internally.

So it's basically impl NamedRow<DB> -> QueryableByName::build (calls FromSql::from_sql internally)

This approach had in my opinion several "problems":

  • Code that smells like duplicated code (query_by_name and query_by_index)
  • It's not possible to abstract over both QueryableByName and Queryable easily
  • There was no way to inspect the values returned by the database, so something like diesel_dynamic_schema::DynamicRow was just not possible:
    • No way to receive types from the result set
    • No way to receive column names from the result set
    • No way to even see how many values are returned by the database
  • Problematic null value handling

New Implementation (diesel 2.0)

There is now only one generic implementation. Connection::load is constrained on FromSqlRow<ST, DB> now. We provide the already discussed wild card impls FromSqlRow<ST, DB> for T where T: Queryable<ST, DB> and FromSqlRow<Untyped, DB> for T where T: QueryableByName<DB>. For the first wildcard impl the corresponding tuple is constructed via the static FromStaticSqlRow impl (required because otherwise we get conflicting implementations for the FromSqlRow and Queryable implementations for tuples). The FromStaticSqlRow impl basically work like the old FromSqlRow impl for tuples, but have a better mechanism to handle null values. For the second wild card impl the row provided to FromSqlRow is just passed to QueryableByName::build as we provide a wild card impl of NamedRow for all possible Row types now.

So its basically impl Row<DB> -> FromSqlRow::build_from_rowwhereFromSqlRow::build_from_rowinternally callsQueryable::buildorQueryableByName::build`.

As already mentioned, while this implementation works I'm not sure if that was really a good idea from a backward compatibility point of view. Additionally your point about: "What prevents FromSqlRow::build_from_row to do whatever they want internally" is something I've missed while designing this interface. (Sure, something like this was possible before, but less likely to happen). So this raises some doubt in me if this is the right way to go.

So what do we conceptually want from our deserialization interface:

  • For queries build with the dsl we want to use our type system level information to speed up the deserialization as much as possible, by skipping all information (types, column names, …) that are already present at type system level
  • For queries build with sql_query we need to access columns by name
  • We need some way to inspect information (types, column names) a* It should be possible to implement a streaming interface using thebout the result set. Those functions should only add an overhead if they are actually called (because of the first point)
  • We want this interface to be extensible, so that something like sqlx::query! could be implemented with custom guarantees on top of this interface (or add select_by #2367)
  • We want to minimize breakage for application code and if possible for generic code using diesel 1.4.x

Given those points I think the current implementation is already a improvement over the old 1.4.x implementation, notably the way how Row and the new Field trait provides information about result set and how we generalized Queryable and QueryableByName, but I wonder now if the following is not a better solution:

trait Queryable<ST: TypedExpressionType, DB: Backend> {
    type Row: FromSqlRow<ST, DB>;

    fn build(row: Self::Row) -> deserialize::Result<Self>;
}

trait FromSqlRow<ST: TypedExpressionType, DB: Backend> {
    fn build_from_row(row: &impl Row<DB>) -> deserialize::Result<Self>;
}

impl FromSqlRow<ST: SqlType, DB> for (/* any tuple*/) {}

// Maybe implement this only for `Untyped`?
impl FromSqlRow<ST, DB> for R where R: Row<DB> {}

// FromSql as already defined

trait Row<'a, DB> {

    fn get<ST, T, I>(&self, idx: I) -> Option<deserialize::Result<T>>
        where T: FromSql<ST, DB>,
                   Self: RowIndex<I>,
    
    // field_count and partial_row as already defined
}

impl FromSqlRow<ST, DB> for F where F: Field<DB>

Given those impl's we can switch the constraints in the generic code back to require Queryable<ST, DB> for Connection::load. This would hopefully undo quite a bit of the breakage for generic code caused by this change in the first place. Additionally given the FromSqlRow impls for the rows itself it should be possible to get access to the raw row as part of Queryable impl, which would allow us to kill QueryableByName completely. (I think there much less manual QueryableByName impls out there then Queryable impl, so this should be fine. Otherwise we can also easily provide that trait for compatibility reasons). Fundamentally this approach should be way more type safe given your remark about "What happens if we use a different ST for calling FromSqlRow internally, as this requires that you call the right FromSqlRow impl by having that as constraint on the associated type.

Open questions:

  • Would this even work? Someone needs to sit down and try it.
  • Can custom TypedExpressionType types extent those impls conceptually?

We cannot remove any generic parameter from Queryable:

  • ST is required to allow implement a mapping from more than one query result type. The alternative would be to make Queryable generic over the currently associated type Row, but we cannot get that type from somewhere in our current the corresponding where clause in the implementation of Connection or in the current FromSqlRow impl here
  • DB is required to allow having different behavior for different backends. For example allowing the to use Jsonb as column type on postgresql and Text for Mysql/Sqlite.

@Ten0
Copy link
Member Author

Ten0 commented Nov 9, 2020

So I've experimented with a bit with something in the spirit of your last suggestion:

Additionally given the FromSqlRow impls for the rows itself it should be possible to get access to the raw row as part of Queryable impl, which would allow us to kill QueryableByName completely.

impl FromSqlRow<ST, DB> for R where R: Row<DB> {
    fn build_from_row(row: &impl Row<DB>) -> deserialize::Result<Self> {
        ...
    }
}

Do you have a particular idea on how to implement this part considering that &impl Row<DB> != R? It looks like we'd need to implement it on &dyn Row<DB> instead of R and add lifetimes everywhere (or Rc everything and implement on Rc<dyn Row<DB>>) if we mean to do this, which is probably not what we want as it would kill any sort of compiler optimizations due to dynamic dispatch.

In order to solve this I've spent some time trying to implement something using an associated type to determine whether to use the initial row or a given type as argument to Queryable (type Row = InitialRow for getting the raw Row R, and AlreadyDeserializable<T> to get the type T), but it turned out to not work as it would require either of generic associated types (with types, not only lifetimes), higher-rank trait bounds with types (the for<'a> S: T<'a> but with types, so for<ST> S<ST>: SomeTrait), or specialization.

Because of this, and because it feels simpler conceptually than having a process where we enforce to have several steps, I like a lot the model where FromSqlRow is the sole parent trait for deserialization, takes the &impl Row<DB> and returns the output type. This is also the pattern used by Serde, and is what enables it to be the fastest deserialization framework across all languages.
If our main issue with that model is the question "What prevents FromSqlRow::build_from_row to do whatever they want internally?", I'm satisfied with it being "nothing when using this impl directly, we want full flexibility", as long as when users code, they are provided with a way to easily implement FromSqlRow in such a way that they would have to explicitly opt-out of the typical guarantees that each field is deserialized in something that makes sense for it. It seems that the current Queryable provides that (if you don't use it and you use FromSqlRow directly instead, you're opting-out).

We cannot remove any generic parameter from Queryable:

That is true if Queryable is the type on the end of the chain, however it looks like it's not a constraint if it's used as an intermediate trait to implement FromSqlRow (see experimentation below).

trait Row<'a, DB> {

    fn get<ST, T, I>(&self, idx: I) -> Option<deserialize::Result<T>>

Yes I like a lot the get on Row as well, practical to use and that's what we'd do with the Value 99% of the time.


So I've made an experiment with the following model, available at #2559:

pub trait FromSqlRow<ST, DB: Backend>: Sized {
    /// See the trait documentation.
    fn build_from_row<'a>(row: &impl Row<'a, DB>) -> Result<Self>;
}
pub trait Queryable
{
    /// The Rust type you'd like to map from.
    ///
    /// This is typically a tuple of all of your struct's fields.
    type Row;

    /// Construct an instance of this type
    fn build(row: Self::Row) -> deserialize::Result<Self>;
}
 impl<T, ST, DB> FromSqlRow<ST, DB> for T 
 where 
     T: Queryable, 
     ST: SqlType, 
     DB: Backend, 
     T::Row: FromStaticSqlRow<ST, DB>, 
 { 
     fn build_from_row<'a>(row: &impl Row<'a, DB>) -> Result<Self> { 
         let row = <T::Row as FromStaticSqlRow<ST, DB>>::build_from_row(row)?; 
         Ok(T::build(row)) 
     } 
 }

// Not implemented in the PR yet but it's pretty obvious this would work
trait Row<'a, DB> {
    fn get<ST, T, I>(&self, idx: I) -> Option<deserialize::Result<T>>
        where T: FromSql<ST, DB>,
                   Self: RowIndex<I>,
    
    // field_count and partial_row as already defined
}

I've noticed making Queryable non-generic does not work very well within Diesel itself (so I've had to use FromSqlRow directly in most places there), though given the example you have in crates.io and the usage we would have for it it seems to be sufficient for a large majority of the user cases, so maybe it would make for a better interface, but given this I'd also be happy with keeping it as generic as it is, or making it generic only on the SQL type and not the DB type.

An alternate approach I haven't experimented with but I think I'd be pretty happy with would be to use a macro to enforce implementation constraints on FromSqlRow when creating custom simple implementations (like that of Queryable), getting rid of the Queryable and of the FromStaticSqlRow trait completely (turning it into a macro, where we could make specifying ST, DB optional and auto-specify them with full genericity if not specified). It looks like it would even allow getting rid of the FromStaticSqlRow level, since the documentation mentions it's only here to avoid conflicting impls with Queryable (though I'm not completely sure why):

// This is a distinct trait from `FromSqlRow` because otherwise we
// are getting conflicting implementation errors for our `FromSqlRow`
// implementation for tuples and our wild card impl for all types
// implementing `Queryable`

pub trait FromSqlRow<ST, DB: Backend>: Sized {
    /// See the trait documentation.
    fn build_from_row<'a>(row: &impl Row<'a, DB>) -> Result<Self>;
}
trait Row<'a, DB> {
    fn get<ST, T, I>(&self, idx: I) -> Option<deserialize::Result<T>>
        where T: FromSql<ST, DB>,
                   Self: RowIndex<I>,
    
    // field_count and partial_row as already defined
}

// and e.g.:
queryable! {
    fn build(something: SomeOtherFromSqlRowType) -> diesel::deserialize::Result<WhatWeWantToImplFromSqlRowOn> {
        ...
    }
}

// which would generate aprox. this:
fn build(something: SomeOtherFromSqlRowType) -> diesel::deserialize::Result<WhatWeWantToImplFromSqlRowOn> {
    ...
}
impl<ST, DB> diesel::deserialize::FromSqlRow<ST, DB> for WhatWeWantToImplFromSqlRowOn
where
    DB: diesel::backend::Backend,
    SomeOtherFromSqlRowType: diesel::deserialize::FromSqlRow<ST, DB>,
{
    fn build_from_row<'a>(row: &impl diesel::row::Row<'a, diesel::pg::Pg>) -> diesel::deserialize::Result<Self> {
        let values = <SomeOtherFromSqlRowType as diesel::deserialize::FromSqlRow<
            ST,
            DB,
        >>::build_from_row(row)?;
        Ok(build(values)?)
    }
}

This would make it more complex for people who have custom Queryable impls to migrate, and macros are always a bit annoying to deal with, however it seems to solve all of our issues with conflicting impls, and even allows us to simplify the underlying type system (if we can indeed remove FromStaticSqlRow), so overall it seems pretty good.

Ten0 added a commit to Ten0/diesel that referenced this issue Dec 21, 2020
@Ten0 Ten0 mentioned this issue Dec 21, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants