Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to implement a dynamic select clause feature in #2182

Merged
merged 21 commits into from
Aug 7, 2020

Conversation

weiznich
Copy link
Member

@weiznich weiznich commented Oct 4, 2019

diesel_dynamic_query

This commit contains the following changes:

  • Make SelectClause traits public to allow implementing a dynamic
    select clause variant based on a vector instead of tuples like the
    existing ones. This allows to construct dynamically sized select
    clauses
  • Add two methods to the Row traits to which allows to get more
    information (current column name and column count) of the result
    set. This is required to implement compatible result types for
    dynamically sized select clauses

Copy link
Member

@sgrif sgrif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely clear on what this feature is trying to do, so I'm not really comfortable signing off on it. That said, the code itself generally looks fine. The count methods are more clearly reasonable to me, but I would rename them to field_count, since an item in the result may not map to a column. The name functions are in my opinion misleading, and encourage a common misconception about SQL which is not true.

diesel/src/mysql/connection/stmt/iterator.rs Outdated Show resolved Hide resolved
diesel/src/sqlite/connection/functions.rs Outdated Show resolved Hide resolved
@weiznich
Copy link
Member Author

@sgrif

I'm not entirely clear on what this feature is trying to do, so I'm not really comfortable signing off on it.

To motivate this a bit more: If you currently construct a select clause you need to know how many fields you select at compile time, because result sets are entirely represented at tuples at compile time. (Thanks to #2134 you only need to know the number of arguments not the actual type, at least for postgres). Now there are certain situations where those information are not available at compile time, for example if your application tries to infer the schema at run time. In such situations it would be handy to have a method that gives you some kind of iterator over fields in a row at runtime. I think we've agreed some while ago that the user facing part of this feature belongs to diesel-dynamic-schema, but for implementing it there we need to have some support in diesel itself. There is diesel-rs/diesel-dynamic-schema#10 which implements the user facing part of this feature, but to do so we need to now the number of columns in a row and maybe the name of the current field.

diesel/src/row.rs Outdated Show resolved Hide resolved
diesel/src/mysql/connection/stmt/iterator.rs Outdated Show resolved Hide resolved
};
unsafe {
Some(CStr::from_ptr(field.name).to_str().expect(
"Diesel assumes that your mysql database uses the \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be to use the encoding crate and get message from its real encoding? Or may be it is possibly specify desired encoding per-query basis in MySQL API?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every other impl that deals with stings and is part of the diesel mysql backend assumes the same. It makes no sense to change that in this one case. Changing all impls is clearly out of scope of this PR. If you are interested in this, go for it. I'm not willing to invest time in changing this in a foreseeable future, because that's something that has no value for me.

@Razican
Copy link
Member

Razican commented Dec 12, 2019

Hi, dynamic selects would be a great addition to the framework, and code seems good for me. Not big changes, just exposing some extra information.

@Razican Razican requested review from Razican and removed request for Razican December 12, 2019 21:22
@weiznich
Copy link
Member Author

@Razican Nerveless I would like to have some discussion with the @diesel-rs/contributors about the general design of this feature, so just merging it is not an option.

@weiznich weiznich added this to the 2.0 milestone Jan 10, 2020
@weiznich
Copy link
Member Author

I had some time to think about possible solutions to support that for more than the postgresql backend. So here a short mind dump of the remaining open questions for this feature: (cc @sgrif would like to hear your opinion an this 😉)
The current solution introduces a magic Any sql type as part of the diesel-dynamic-schema crate. Additionally a dynamic select clause implementation is added. This sets the return type of such an query to Any, which enforces that only types having a special tailored FromSql implementation could be used in that context. Additionally a dynamic row type implementing FromSqlRow is provided. This is currently only implemented for postgreql, because there is a clear mapping to Any type on database side. I would like to support the other backends there as well, as they also support dynamic value lookups. The probably best solution would be to generalize this Any type to something that is used by all backends. This would imply that we have to implement HasSqlType<Any> for Sqlite and Mysql, which implies that we need to fine some value in SqliteType and MysqlTypeMetadata to represent that type. One way would be to introduce another variant in that enums, that just says this is a unknown sql type, you can only encounter it in this cases and it requires loading via the dynamic row implementation. In that vain: I would also like to move the definition of the any type to diesel itself, because it's some kind of fundamental type that could (and will) be used by other third party backends to, so they don't need to depend on diesel-dynamic-schema, but the dynamic value lookup will work nerveless.

@sgrif
Copy link
Member

sgrif commented Jan 29, 2020

I think your comment makes sense, but I still don't think it makes sense for Any to be in the main repository, at least not for 2.0. It'd be really weird for us to ship a SQL type and not do anything with it

@lutter
Copy link

lutter commented Feb 5, 2020

Just to add my $0.02: I am really happy to see things moving in the direction of dynamic select lists; I've been meaning to ask about them in gitter forever, but obviously haven't. I have exactly this situation: the schema is only known at runtime (user-supplied) Currently, I work around this restriction by calling to_jsonb a lot, but this feature would allow me to get rid of those.

@weiznich
Copy link
Member Author

For now blocked on #2290

/// Raw sqlite value as received from the database
///
/// Use existing `FromSql` implementations to convert this into
/// rust values:
#[allow(missing_debug_implementations, missing_copy_implementations)]
pub struct SqliteValue {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general note: I'm not sure if this is sound in combination with the pointer casting in line 28, as this relays on the ABI representation. Maybe we should try to add something like #[repr(transparent)] here?

@weiznich
Copy link
Member Author

weiznich commented Jun 10, 2020

@diesel-rs/reviewers I would like to proceed on this one. If someone finds some time a review would be appreciated (especially about the API design and the unsafe code usage).

@Ten0 I think I will find some time to work on this in the next weeks. Would be great to get some inputs regarding to #2275.

Copy link
Member

@pksunkara pksunkara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is technically solid. I don't have any opinions on the change and design though.

@Ten0
Copy link
Member

Ten0 commented Jun 12, 2020

Regarding the "Remove FIELDS_NEEDED" commit fixing #2275, I agree with the approach considering we are getting rid of any possibility to have access to the field count.

Given that advance would call take anyway, and in the new implementation if take() returns None the FromSql implementation will typically return an UnexpectedNullError right away, and given that the case where we expect the Option to be None but some fields inside are not None is rather rare and mostly due to optimizing select(table::some_long_string) into select(table::some_long_string.is_not_null()) (hence becoming a boolean that's pretty much zero cost to deserialize) it seems this overhead mentionned in the commit message is really really close to zero and and this method approach has no disadvantage... if not for the additional Box::new(UnexpectedNullError) as Box<dyn Error> everytime we encounter a NULL :

  • I'm not sure if it's immediately clear for everyone who would go about implementing FromSqlRow or FromSql that this precise kind of boxed error is relied upon for deserializing Options, and they should never throw an error of their own when encountering a None (NULL) when attempting to deserialize their value. Currently they could rely on Option::from_sql_row to check it for them.
  • I'm not 100% sure that Box<dyn Error> does not allocate on zero-sized Errors. It's mentioned in the doc of Box that it does not allocate on zero-sized types, but I'm not sure where I can find doc on how it behaves with Box<dyn Error> that is two pointers, which I can't find in the rust source code. (I mean, I don't think it would make sense that it allocates if the normal Box doesn't, but, If anyone has a reference on how exactly this works I'm interested, because I don't understand how this works just from the Box source code.)
  • Even if it does not allocate, that is some dynamic dispatch on getting the type id.

Regarding these 3 points (and in particular the first one), what do you think of turning the deserialize::Result::E into an enum (which would have UnexpectedNull as variant, as well as Custom(Box<dyn Error>)) ?
I think this would help making it more visible for people implementing FromSql that it is relied upon to handle deserializing Options.

Unrelated to that, I beleive Option was also the only reason why we had next_is_null, so it seems it could be removed as well.

@weiznich
Copy link
Member Author

@Ten0 OK I took some time to understand the nullable boolean problem uncovered by the CI. Seems like this fix will not work in that way.
I'm currently trying out an alternative more radical change that would
a) fix your nullable bug
b) Hopefully allow to mix Queryable and QueryableByName structs in the long run

@Ten0
Copy link
Member

Ten0 commented Jun 18, 2020

Ah yes, good old 26720d7 / #104 hurting again (I say again because #1459 (comment)).

Thinking about #104 more seriously than I did last time I saw it (which was my first interaction with Diesel), it seems we could work out a different way to solve #104 (in what Sean called the "real answer") by, instead of using a NotNull marker trait (and suggested in #104 IsNullable marker trait), doing something similar to what was done with ValidGrouping replacing IsAggregate, and using an IsNullable associated type to the SQL types with something like SqlType<IsNullable=Yes/No>. This would guarantee at compile time that no type may implement (the corresponding equivalent of) both IsNullable and NotNull, which was the main issue blocking #104 from being solved in a clean way. We could then use the associated-types-disjointness trick to write the two impls in #104 in such a way that they do not conflict. (And we would update IntoNullable based on that.)

Once that is done, hopefully there is no more blocker to your initial approach to deserializing Options (which seems much cleaner than what we currently have, in that it allows us to remove Row::next_is_null and Row::advance and their corresponding impls for every backend which were only used for that).

@weiznich weiznich force-pushed the feature/dynamic_queries branch 4 times, most recently from f8baa44 to 137b3e9 Compare July 31, 2020 13:29
@weiznich
Copy link
Member Author

@diesel-rs/reviewers I think this is now ready for a final review (assuming that CI now passes, otherwise I will fix the CI and then it's ready). If there are now comments in the next 7 days I will merge this PR.

cc @adwhit as owner of diesel-derive-enum as this likely requires changes in your crate.

@weiznich weiznich force-pushed the feature/dynamic_queries branch 4 times, most recently from 91c6757 to 197a51f Compare July 31, 2020 23:27
(Sorry for the really large diff but this touches nearly all parts of
diesel)

This commit does in two things:
* Refactor our handling of nullable values. Instead of having a marker
trait for non nullable sql types we now indicate if a sql type is
nullable by using a associated type on a new fundamental trait named
SqlType. This allows us to reason if an type is nullable or not in a
much precise way without running in conflicting implementation
issues. This allows us to address diesel-rs#104 in a much more fundamental
way. (The correct way as mentioned there by sgrif).
* Refactor our handling of typed and untyped sql fragments. Instead of
having two separate code paths for `Queryable` (sql types known) and
`QueryableByName` (sql types not known) we now have only one code path
and indicate if a query is typed or untyped as part of the expression
sql type. This is required to address diesel-rs#2150. `Queryable` and
`QueryableByName` now simple . Additionally we separate the static size component of
`FromSqlRow` to allow dynamically sized rows there (but only at the
last position for tuple impls.)

I should probably have implement those changes in different commits
but as both changes basically requiring touching most of our code base
this would have required much more work...

Both changes are theoretically big major breaking changes. For
application code I expect the actual breakage to be minimal, see the
required changes in `diesel_tests` and `diesel_cli` for examples. For
highly generic code I would expect quite a few required changes.

Additionally there are a few small drive by fixes.

Fixes diesel-rs#104
Fixes diesel-rs#2274
Fixes diesel-rs#2161
Copy link
Member

@Ten0 Ten0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not as soon as I had hoped, but I could finally take the time to analyze this! o/

This is great! There are tons of improvements everywhere which overall make the code conceptually simpler while fixing issues and introducing new functionality.

I have a few questions/remarks at the implementation level, but haven't found any issue with the design. 👍

diesel/src/row.rs Outdated Show resolved Hide resolved

/// Returns a wrapping row that allows only to access fields, where the index is part of
/// the provided range.
#[doc(hidden)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this is #[doc(hidden)]. Isn't there a scenario where someone implementing a dynamic query would ideally use this?

Copy link
Member Author

@weiznich weiznich Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see how this would be relevant for dynamic queries. They normally should use RowIndex<&str> or just load all columns that are queried. In my view this function is just a internal helper to implement FromStaticSqlRow for recursive tuples by allowing to skip the corresponding number of fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could imagine a scenario where there is still some form of known hierarchical structure, but user-defined and unknown at compile time which could use this when building it through a recursive function.
What would be the downside of having it public? (with corresponding documentation explaining this could only be useful when implementing custom parsing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make this function public at any later point if someone comes up with a concrete use case.

diesel/src/pg/connection/result.rs Outdated Show resolved Hide resolved
diesel/src/pg/connection/row.rs Outdated Show resolved Hide resolved
diesel/src/row.rs Outdated Show resolved Hide resolved
diesel/src/sql_types/mod.rs Outdated Show resolved Hide resolved
if self.0.name.is_null() {
None
} else {
unsafe { CStr::from_ptr(self.0.name).to_str().ok() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be different from the others (which expect). Is that because Mysql does not guarantee that they will be UTF8?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's different in that way that we have some way to handle non UTF-8 field names here, we can just return None in that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. On the other backends, we decided to not silent-fail on this and expect, because they should never return non utf-8.
On this backend it seems we decided to return them as non-named fields.
Is it because in the way we're using Mysql now it is expected behaviour that it might return non-utf-8 values?

Oh well nevermind I guess that's probably the correct solution if the previous correct solution was what was previously implemented, that is c_name.to_str().unwrap_or_default() ^^

fn from_nullable_sql(bytes: Option<backend::RawValue<DB>>) -> Result<Self> {
match bytes {
Some(bytes) => Self::from_sql(bytes),
None => Err(Box::new(crate::result::UnexpectedNullError)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have chosen to not apply the suggestion at #2182 (comment).
I'd like to understand what makes the current solution better :)

Copy link
Member Author

@weiznich weiznich Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in my opinion the simpler solution. As for the additional allocation + dynamic dispatch: I'm not sure if the compiler optimizes them away or not but at least in my short benchmark I haven't seen any impact of this. Therefore I've the opinion that we should just choose the simplest option here and not try to optimize things that are not worth being optimized. If someone provides some benchmark showing that this assumption is wrong I will change the implementation 😉
As for making it clear that this impl checks for crate::result::UnexpectedNullError: I changed FromSql in such a way that basically all impls will return this error if they encounter an unexpected null value, only impl that choose to provide their own from_nullable_sql impl have to deal with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok it seems I have indeed underestimated the complexity of that change and overestimated the ease of use of the corresponding interface (which cannot be as nice as I thought due to conflicting impls issues).
Thanks! :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complexity of the change it self is not so much the problem here, it's the potential impact of the change that is in my opinion the problem. If we introduce some enum like

enum DeserialisationError {
    UnexpectedNullError,
    Custom(Box<dyn Error + Send + Sync>)
}

that would change the return type of every function that uses diesel::deserialisation::Result which is a large impact breaking change in my opinion. We can do such changes with diesel 2.0, but I think we should only make major breaking changes where it is really required. At least for me that seems not to be the case here as I cannot measure a negative performance impact of handling null values this way + having a default FromSql::from_nullable_sql method should make it clear how to correctly "handle" unexpected null values.

diesel/src/pg/connection/cursor.rs Outdated Show resolved Hide resolved
diesel/src/pg/connection/cursor.rs Outdated Show resolved Hide resolved
@Razican
Copy link
Member

Razican commented Aug 3, 2020

As a general note a quick benchmark using this code gives the following results:

I have executed the benchmarks myself, and while I was having some inconsistencies in different runs, I can confirm that I didn't see any significant performance regression.

Co-authored-by: Ten0 <9094255+Ten0@users.noreply.github.com>
@weiznich
Copy link
Member Author

weiznich commented Aug 3, 2020

@Ten0 Thanks for the review 👍 I've fixed most things + left some answers.
@Razican Thangs for running the benchmark. Those benchmark only tests the postgres backend, similar tests should be done with the sqlite and the mysql backend to ensure we did not introduce a large regression there.

@weiznich weiznich force-pushed the feature/dynamic_queries branch 3 times, most recently from b95b837 to c4d15b0 Compare August 4, 2020 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants