`sql_query` can't integrate well with generic code #2150

sgrif · 2019-08-18T21:47:01Z

We designed sql_query to be fundamentally separate from the rest of the query builder, as it's designed for when you want to write your whole query in SQL. However, I'm realizing that this plays poorly with the rest of the ecosystem when you're writing generic abstractions that are meant to abstract over a whole query.

Context on why `sql_query` lives in its own world if you need a refresher

There are two reasons this function lives in its own world, and was introduced at all even though sql already existed.

The first reason is that Diesel is fundamentally tuple based, which means that SELECT * can't work -- you need to specify the column list. This is fine when it's generated by the library, but a massive pain when writing the query by hand. When writing raw SQL, it also meant you had to specify the types of each of those columns explicitly, (which you technically still need to do, but it feels much more ergonomic now even if the requirements haven't actually changed much). So with sql_query, columns are loaded by name, and the types are specified either by giving a table name or annotating the fields of a struct.

The second reason was that bind parameters interacted poorly. Before we introduced sql_query, the behavior of sql with bind parameters was similar to how it works with sql_query. You'd put the placeholder in the SQL string, and then call .bind with the value. This is a problem if you're passing it to some arbitrary query builder function on PG or another backend which has indexed bind placeholders, you can't reasonably know the index.

We changed sql so that when you call .bind it also inserts the placeholder SQL, but if you're writing the whole query in SQL, you probably want it in another file (used with include_str!) where you have better indentation, syntax hi-lighting, etc. requiring separate .sql(...).bind(...) calls breaks that flow. So unlike sql, sql_query lets you write the placeholders yourself, so the query can live in one place, because this function is intended to be used only when you're writing the whole query by hand.

For example, crates.io has a type called Paginated, which generates SQL that looks like:

SELECT *, COUNT(*) OVER () FROM ({whole query here }) t LIMIT ? OFFSET ?

We've been running into issues with the fact that offset doesn't scale, so we're preparing to switch to id/cursor based pagination. As part of this, I've been refactoring our code base to better encapsulate pagination overall. The first piece of this was to change "get limit/offset from the request and pass it to .paginate" to "pass the query params to .paginate, and let it determine what query to generate".

This has turned out to be impossible to fully encapsulate, since two of our paginated endpoints are using sql_query. There's two problematic interactions here.

The first is that Paginated needs to use bind parameters. If the query given to it comes from sql_query and that query uses its own bind parameters, the ones used for limit and offset will have the wrong index. Note that there's nothing in our API today that prevents this from being a runtime error, though anything written this way is most likely abstracting over queries or expressions, meaning its constructor most likely has either a T: AsQuery or T: AsExpression<ST> bound, and SqlQuery/UncheckedBind implement neither.

Brief aside, fixing this does leave the API open to misuse

The fix for this is basically to add a (default empty impl for backwards compat) function to QueryBuilder which is informed when push_bind_param_value_only is called, which increments the bind counter and does nothing else. Importantly, this means that it would be valid to add more bind params through the query builder after the SqlQuery is walked, but not before. So SELECT * FROM ({sql_query}) LIMIT $1 is fine, SELECT *, $1 FROM ({sql_query}) is not -- or at least the query passed to it needs to be aware that its indexes need to start from 2.

We could theoretically detect this at runtime by erroring if push_bind_param_value_only is called after push_bind_param. We could even do this only for PG. I opted not to in my (unopened) PR both for simplicity, and because having a sql_query which is expected to be passed to some other combinator like paginate on use is completely valid, so I don't want to make writing these queries with all hand-written SQL starting at $2 impossible.

I also don't think this vector for misuse is particularly common. This interaction is only when writing generic code that abstracts over entire queries. That means by definition it's going to be wrapping that query in a subselect as part of the FROM clause or a CTE. While there are reasons you could have a bind param in front of that, it's substantially less common than having it after.

This one is easy to fix, and I actually have a PR ready to go if we decide that this is a problem worth addressing. However, fixing this first issue is useless on its own, since all it does is let you write a valid QueryFragment impl -- but you still can't call .load on the result.

The second issue is that you can't get a generic impl of LoadQuery that allows both SqlQuery and everything else to be passed to it. Fundamentally, an impl of that trait is either calling conn.query_by_name(self) or conn.query_by_index(self), and you need to know whether the bounds you're writing are T: Queryable<U> or T: QueryableByName.

I believe we're missing a function that is basically internal_load, but takes some other QueryFragment instead of Self, which implies also taking some other SQL type. While I haven't given this more than an hour or so of thought, the latter definitely seems to be the issue, since we need this to be generic over types using QueryableByName, which have no SQL type. I think a good litmus test for whether we can support this generically is wether we can give a default impl to internal_load, since it should be able to be written by calling this new function with self. The issue is the SQL type. What do we even do here? We could give () for the by-name case, but how do we abstract over that? We can't write an impl that says "if this trait is implemented use the SQL type from it, otherwise use ()". If we could have that conditional at all, we wouldn't have this problem in the first place. Specialization will not help here. Maybe we can get away with declaring that anything which uses query_by_name in LoadQuery must implement Query<SqlType = ()>, but right now T: AsQuery is the only way to assert "I don't handle arbitrary SQL, give me something using the query builder and Queryable. I'm not certain we should remove that.

Even if that's solved, I also still have concerns about this trait becoming a footgun. Because ultimately for the non-by-name case it'd be a function that is "here's a query, which isn't you but you can probably assume contains you somewhere in it. Also here's a SQL type, which isn't your SQL type, but probably contains your SQL type somewhere in it. You can't verify either of those things. Please run that query, and deserialize it into this other type however you would have if you were run directly. You can statically assert that the query is valid to be executed, and the type you're deserializing into is valid for the SQL type you were given, but that's it. Crucially, you can't verify that the query you're running actually represents the type you were given, or actually uses the strategy you want to". This just sounds like a recipe for disaster.

I might be over-thinking this. I might also be over-estimating how likely this need is to come up in practice. TBH I can't really think of cases other than pagination that want to a abstract over whole queries like this. We might also be able to get away with adding a const (or probably a runtime function because I don't think associated const defaults are stable?). At least for cases where the type being returned is somehow wrapped in another type, that type already has to consider both Queryable and QueryableByName, so it might be possible to end up with bounds that require both and choose one at runtime that works. I don't think that's possible or broad enough, but we shouldn't rule it out.

The text was updated successfully, but these errors were encountered:

This continues the changes made in rust-lang#1763. Since that PR was merged, one of the non-code steps has been taken care of -- All users hitting any endpoint with `?page=20` (which is an arbitrary search pattern that seemed high enough to give any crawlers going through pagination) have been contacted about the change, with a PR opened against any that included a repo. (Intersting aside, there are *zero* records of this for any endpoint other than search, which perhaps implies we can get rid of a few of these endpoints, but that's a separate discussion). This PR does not change any functionality, but moves some code around to better encapsulate things for upcoming changes. Specifically: - Change our frontend to show "next/prev page" links on the all crates page - Stop returning the "total" meta item when the next/prev page links will be cursor based (which I'd actually just like to start omitting in general) The main goal of this change was to stop having any code outside of `Paginated` (which has been renamed to `PaginatedQuery`, as there's a new type called `Paginated`) care about how pagination occurs. This means other code can't care about *how* pagination happens (with the exception of `reverse_dependencies`, which uses raw SQL, and sorta has to... That was a bit of a rabbit hole, see diesel-rs/diesel#2150 for details. Given the low traffic to that endpoint, I think we can safely ignore it). The distribution of responsibilities is as follows: - `PaginatedQuery<T>`: Given the query params, decides how to paginate things, generates appropriate SQL, loads a `Paginated<T>`. - `Paginated<T>`: Handles providing an iterator to the records, getting the total count (to be removed in the near future), providing the next/prev page params - `Request`: Takes the pagination related query params, turns that into an actual URL (note: Due to jankiness in our router, this will only ever be a query string, we have no way of getting the actual path) The next step from here is to change our UI to stop showing page numbers, and then remove the `total` field. This PR will introduce a performance regression that was addressed by rust-lang#1668. That PR was addressing "this will become a problem in a future", not "this is about to take the site down". Given the intent to remove the `total` field entirely, I think it is fine to cause this regression in the short term. If we aren't confident that the changes to remove this field will land in the near future, or want to be conservative about this, I can add some additional complexity/git churn to retain the previous performance characteristics

Refactor pagination to prepare for cursor-based pagination. This continues the changes made in #1763. Since that PR was merged, one of the non-code steps has been taken care of -- All users hitting any endpoint with `?page=20` (which is an arbitrary search pattern that seemed high enough to give any crawlers going through pagination) have been contacted about the change, with a PR opened against any that included a repo. (Intersting aside, there are *zero* records of this for any endpoint other than search, which perhaps implies we can get rid of a few of these endpoints, but that's a separate discussion). This PR does not change any functionality, but moves some code around to better encapsulate things for upcoming changes. Specifically: - Change our frontend to show "next/prev page" links on the all crates page - Stop returning the "total" meta item when the next/prev page links will be cursor based (which I'd actually just like to start omitting in general) The main goal of this change was to stop having any code outside of `Paginated` (which has been renamed to `PaginatedQuery`, as there's a new type called `Paginated`) care about how pagination occurs. This means other code can't care about *how* pagination happens (with the exception of `reverse_dependencies`, which uses raw SQL, and sorta has to... That was a bit of a rabbit hole, see diesel-rs/diesel#2150 for details. Given the low traffic to that endpoint, I think we can safely ignore it). The distribution of responsibilities is as follows: - `PaginatedQuery<T>`: Given the query params, decides how to paginate things, generates appropriate SQL, loads a `Paginated<T>`. - `Paginated<T>`: Handles providing an iterator to the records, getting the total count (to be removed in the near future), providing the next/prev page params - `Request`: Takes the pagination related query params, turns that into an actual URL (note: Due to jankiness in our router, this will only ever be a query string, we have no way of getting the actual path) The next step from here is to change our UI to stop showing page numbers, and then remove the `total` field. This PR will introduce a performance regression that was addressed by #1668. That PR was addressing "this will become a problem in a future", not "this is about to take the site down". Given the intent to remove the `total` field entirely, I think it is fine to cause this regression in the short term. If we aren't confident that the changes to remove this field will land in the near future, or want to be conservative about this, I can add some additional complexity/git churn to retain the previous performance characteristics

(Sorry for the really large diff but this touches nearly all parts of diesel) This commit does in two things: * Refactor our handling of nullable values. Instead of having a marker trait for non nullable sql types we now indicate if a sql type is nullable by using a associated type on a new fundamental trait named SqlType. This allows us to reason if an type is nullable or not in a much precise way without running in conflicting implementation issues. This allows us to address diesel-rs#104 in a much more fundamental way. (The correct way as mentioned there by sgrif). * Refactor our handling of typed and untyped sql fragments. Instead of having two separate code paths for `Queryable` (sql types known) and `QueryableByName` (sql types not known) we now have only one code path and indicate if a query is typed or untyped as part of the expression sql type. This is required to address diesel-rs#2150. As part of this change we unify `Queryable`, `QueryableByName` and `FromSqlRow` into only `FromSqlRow`. Additionally we separate the static size component of `FromSqlRow` to allow dynamically sized rows there (but only at the last position for tuple impls.) I should probably have implement those changes in different commits but as both changes basically requiring touching most of our code base this would have required much more work... Both changes are theoretically big major breaking changes. For application code I expect the actual breakage to be minimal, see the required changes in `diesel_tests` and `diesel_cli` for examples. For highly generic code I would expect quite a few required changes. Additionally there are a few small drive by fixes. Fixes diesel-rs#104 Fixes diesel-rs#2274 Fixes diesel-rs#2161

(Sorry for the really large diff but this touches nearly all parts of diesel) This commit does in two things: * Refactor our handling of nullable values. Instead of having a marker trait for non nullable sql types we now indicate if a sql type is nullable by using a associated type on a new fundamental trait named SqlType. This allows us to reason if an type is nullable or not in a much precise way without running in conflicting implementation issues. This allows us to address diesel-rs#104 in a much more fundamental way. (The correct way as mentioned there by sgrif). * Refactor our handling of typed and untyped sql fragments. Instead of having two separate code paths for `Queryable` (sql types known) and `QueryableByName` (sql types not known) we now have only one code path and indicate if a query is typed or untyped as part of the expression sql type. This is required to address diesel-rs#2150. `Queryable` and `QueryableByName` now simple . Additionally we separate the static size component of `FromSqlRow` to allow dynamically sized rows there (but only at the last position for tuple impls.) I should probably have implement those changes in different commits but as both changes basically requiring touching most of our code base this would have required much more work... Both changes are theoretically big major breaking changes. For application code I expect the actual breakage to be minimal, see the required changes in `diesel_tests` and `diesel_cli` for examples. For highly generic code I would expect quite a few required changes. Additionally there are a few small drive by fixes. Fixes diesel-rs#104 Fixes diesel-rs#2274 Fixes diesel-rs#2161

weiznich · 2020-10-05T15:40:25Z

The second part of this issue should be fixed after the type system rewrite in #2182. This leaves the first, simple part of the issue. I'm marking this as help wanted as I think this is fairly easy to implement.

The fix for this is basically to add a (default empty impl for backwards compat) function to QueryBuilder which is informed when push_bind_param_value_only is called, which increments the bind counter and does nothing else. Importantly, this means that it would be valid to add more bind params through the query builder after the SqlQuery is walked, but not before. So SELECT * FROM ({sql_query}) LIMIT $1 is fine, SELECT *, $1 FROM ({sql_query}) is not -- or at least the query passed to it needs to be aware that its indexes need to start from 2.

We could theoretically detect this at runtime by erroring if push_bind_param_value_only is called after push_bind_param. We could even do this only for PG. I opted not to in my (unopened) PR both for simplicity, and because having a sql_query which is expected to be passed to some other combinator like paginate on use is completely valid, so I don't want to make writing these queries with all hand-written SQL starting at $2 impossible.

I also don't think this vector for misuse is particularly common. This interaction is only when writing generic code that abstracts over entire queries. That means by definition it's going to be wrapping that query in a subselect as part of the FROM clause or a CTE. While there are reasons you could have a bind param in front of that, it's substantially less common than having it after.

This is the last missing piece to allow code to abstract both, over queries generated by `sql_query` and queries constructed using diesels query dsl. To quote from the original issue: > The fix for this is basically to add a (default empty impl for backwards compat) function to QueryBuilder which is informed when push_bind_param_value_only is called, which increments the bind counter and does nothing else. Importantly, this means that it would be valid to add more bind params through the query builder after the SqlQuery is walked, but not before. So SELECT * FROM ({sql_query}) LIMIT $1 is fine, SELECT *, $1 FROM ({sql_query}) is not -- or at least the query passed to it needs to be aware that its indexes need to start from 2. > We could theoretically detect this at runtime by erroring if push_bind_param_value_only is called after push_bind_param. We could even do this only for PG. I opted not to in my (unopened) PR both for simplicity, and because having a sql_query which is expected to be passed to some other combinator like paginate on use is completely valid, so I don't want to make writing these queries with all hand-written SQL starting at $2 impossible. > I also don't think this vector for misuse is particularly common. This interaction is only when writing generic code that abstracts over entire queries. That means by definition it's going to be wrapping that query in a subselect as part of the FROM clause or a CTE. While there are reasons you could have a bind param in front of that, it's substantially less common than having it after. Fixes diesel-rs#2150

sgrif mentioned this issue Aug 18, 2019

Refactor pagination to prepare for cursor-based pagination. rust-lang/crates.io#1807

Merged

weiznich added this to the 2.0 milestone Oct 5, 2020

weiznich added the help wanted label Oct 5, 2020

weiznich mentioned this issue Oct 8, 2020

Introduce QueryBuilder::push_bind_param_value_only #2534

Merged

weiznich closed this as completed in #2534 Oct 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`sql_query` can't integrate well with generic code #2150

`sql_query` can't integrate well with generic code #2150

sgrif commented Aug 18, 2019

weiznich commented Oct 5, 2020

sql_query can't integrate well with generic code #2150

sql_query can't integrate well with generic code #2150

Comments

sgrif commented Aug 18, 2019

weiznich commented Oct 5, 2020

`sql_query` can't integrate well with generic code #2150

`sql_query` can't integrate well with generic code #2150