This work is very incomplete, but I think I've poked this from enough
directions to be confident that the problem is well understood, and what
the design tradeoffs and implementation hurdles will be. So I'm pushing
up what I've got so far in order to have a discussion about the
potential design choices.
Any version of this feature that we go with must handle these use cases:
- Joining to the same table multiple times
- Joining to a subselect
- Self referential subselects
- I'm not sure we actually need this one. Can anyone come up with a
query that uses a subselect which can't be replaced by a join to
that same subselect?
Ultimately the main questions around design are:
- How do we define the alias?
- How do we reference columns from the alias?
In terms of actual aliasing, I think there's basically only one design
that makes sense. You create an aliased table by doing `let aliased =
table_or_query.aliased(your_alias)`, which generates `table alias` for
tables, and `(SUBSELECT) alias` for subselects. You can then join to
this by doing `whatever.inner_join(aliased.on(...))`. These seem
straightforward and don't have many alternatives.
Even when this is fully implemented, this feature has to cover enough
use cases, and is complex enough that I'd like to have at least one full
version with this feature available before we commit to stability on
this. For that reason I've hidden it behind a cfg gate. This is
specifically not a feature, so that the end user has to explicitly opt
into the use of the unstable feature, similar to the requirements around
nightly.
Defining an alias
==
The design I've gone with requires forward declaring aliases before they
can be used. In the test case I've declared it in the function, but I
expect that the majority of people will define them all in `schema.rs`
and then add them to their `allow_tables_to_appear_in_same_query!` line.
The defining macro creates a unit struct with similar properties to
tables.
Pros
--
This form is flexible and consistent with how we handle tables. It fits
into our type system in the same way, and has the same benefits
(ensuring an alias appears in the from clause no more than once,
disallows referencing columns from that alias unless it's inside the
from clause). It also follows the same "rules" with regards to joins.
Though `allow_tables_to_appear_in_same_query!` is a total hack, but it's
not going away any time soon, and I think keeping the rules simple and
consistent is valuable.
Cons
--
This requires a *lot* of ceremony to ever alias a table. It requires two
lines that are likely going to be in a completely different file before
you can ever even use it. Additionally, the resulting type for an alias
does *not* have an identical structure to a table module. Because of
that, we have to introduce a new macro
`allow_types_to_appear_in_same_query`, which takes a type instead of an
ident. We'll probably need to change `diesel print-schema` to use the
more verbose form.
The fact that we're using unit structs means that you can't use the same
name in the let binding. `let users2 = users.aliased(users2)` would feel
more natural.
Alternatives
--
- Avoid unit structs
- We'd basically stick a single field on the struct with type `()` to
allow let bindings with the same name. This would change the
invocation to `users::table.aliased::<users2>()`. This feels
inconsistent and weird to me.
- Inline declarations `let users2 = users::table.aliased(diesel_alias!(users2))`
- This is what I had originally envisioned, and would be my preferred
form if it is possible. However, we *need* the enumerated
`AppearsInFromClause` impls in order for this to work at all. We can
theoretically generate them specifically for tables, but this still
wouldn't allow multiple aliases to appear together. Perhaps I'm
missing something, but if so it's not clear to me.
- Remove the need for `allow_types_to_appear_in_same_query!`
- Right now our convention is that in `schema.rs` you have a single
`allow_tables_to_appear_in_same_query!` invocation that contains all
your tables. We could decide to make this convention actually
required for this feature. We could define a macro from inside that
macro which you use to define your aliases. You'd have to define
them all in a single macro call, but it'd just be a single
`diesel_define_aliases!` after your table declarations. This
requires a bit too much specific ordering of things, and I think
it's too much hard to understand magic.
- Strings for aliases
- If we don't actually try to enforce type safety, we don't have
problems that arise from proving a query is type safe. This one is a
non-starter for me personally, but I figured I'd mention it here. In
addition to losing type safety, we would also no longer be able to store
these queries in the prepared statement cache.
Unresolved Questions
--
- `allow_types_to_appear_in_same_query!` basically obsoletes
`allow_tables_to_appear_in_same_query!`. Should we deprecate it?
- If we expect folks to be putting this in `schema.rs` anyway, should we
add support for it in `diesel.toml` and generate this code for them?
Accessing the aliased columns
==
The current design has you call `.selection()` to get back the resulting
aliased columns as a tuple.
Pros
--
This is the simplest design, and pretty much the only possible design
for aliased subselects, which have an unknown structure and also require
generating labels for each expression (the unique labels will be
required, even if the expression is just a normal column. We have no way
of knowing if there would be a name conflict otherwise). This means that
what you write is the same, regardless of what's being aliased is a
table or a subselect.
Cons
--
The main issue here is that this is a bit cumbersome, especially if you
want to join on the 17th column in the expression. We can work around
this for specifically tables with an alternate design if we want (see
below), but this restriction is basically always going to exist for
subselects. In the general case this can be worked around by explicitly
calling `.select` with the bits that you want first.
Alternatives
--
- Stick this on the table instead
- So the idea here is basically that we turn tables into structs
containing all columns as fields, and then make both tables and
columns generic over the alias. This means that you can do `let
u2 = users.aliased(users2)` and then access `users.id`, etc. There
are two main reasons I haven't done this. First is just for
consistency, I'd like to have the same thing work for both tables
and subselects. Secondly, this will make our type errors *way*
worse. Every appearance of a table would change from
`schema::foo::table` to `schema::foo::table<schema::foo::__NoAlias>`.
Same for columns. Not to mention the implementation is substantially
more complicated.
- One big upside here is that it would be possible to write code
generic over a table and that table when it's aliased. I suspect
this is niche enough to ignore.
- Forward declare the fields as well
- This would look something like `named_alias! { foo { bar, baz, qux }
}` etc. It's more or less the same thing, but it makes you state the
names far more up front so you can access them with dot syntax. We
could potentially bake the select clause in here as well. I feel
like I actually had a more fleshed out design for this a while back,
but it was in the private channel which doesn't have search. I
remember we all liked the tuple version better though.
It's worth noting that the current design doesn't preclude doing the
others in the future, both of those can be built on the current
structure.
Unresolved Questions
--
- The `selection` method probably will always just be calling
`.default_selection()` -- Should we codify that in the code?
So that's basically it. The actual code included here is just a minimal
skeleton for table aliases, it does not allow subselects. It for sure
needs more tests added, and especially compile tests. I've been writing
this commit message for like an hour now so I'm going to stop.
sgrif commentedJul 3, 2018
This work is very incomplete, but I think I've poked this from enough
directions to be confident that the problem is well understood, and what
the design tradeoffs and implementation hurdles will be. So I'm pushing
up what I've got so far in order to have a discussion about the
potential design choices.
Any version of this feature that we go with must handle these use cases:
query that uses a subselect which can't be replaced by a join to
that same subselect?
Ultimately the main questions around design are:
In terms of actual aliasing, I think there's basically only one design
that makes sense. You create an aliased table by doing
let aliased = table_or_query.aliased(your_alias), which generatestable aliasfortables, and
(SUBSELECT) aliasfor subselects. You can then join tothis by doing
whatever.inner_join(aliased.on(...)). These seemstraightforward and don't have many alternatives.
Even when this is fully implemented, this feature has to cover enough
use cases, and is complex enough that I'd like to have at least one full
version with this feature available before we commit to stability on
this. For that reason I've hidden it behind a cfg gate. This is
specifically not a feature, so that the end user has to explicitly opt
into the use of the unstable feature, similar to the requirements around
nightly.
Defining an alias
The design I've gone with requires forward declaring aliases before they
can be used. In the test case I've declared it in the function, but I
expect that the majority of people will define them all in
schema.rsand then add them to their
allow_tables_to_appear_in_same_query!line.The defining macro creates a unit struct with similar properties to
tables.
Pros
This form is flexible and consistent with how we handle tables. It fits
into our type system in the same way, and has the same benefits
(ensuring an alias appears in the from clause no more than once,
disallows referencing columns from that alias unless it's inside the
from clause). It also follows the same "rules" with regards to joins.
Though
allow_tables_to_appear_in_same_query!is a total hack, but it'snot going away any time soon, and I think keeping the rules simple and
consistent is valuable.
Cons
This requires a lot of ceremony to ever alias a table. It requires two
lines that are likely going to be in a completely different file before
you can ever even use it. Additionally, the resulting type for an alias
does not have an identical structure to a table module. Because of
that, we have to introduce a new macro
allow_types_to_appear_in_same_query, which takes a type instead of anident. We'll probably need to change
diesel print-schemato use themore verbose form.
The fact that we're using unit structs means that you can't use the same
name in the let binding.
let users2 = users.aliased(users2)would feelmore natural.
Alternatives
()toallow let bindings with the same name. This would change the
invocation to
users::table.aliased::<users2>(). This feelsinconsistent and weird to me.
let users2 = users::table.aliased(diesel_alias!(users2))form if it is possible. However, we need the enumerated
AppearsInFromClauseimpls in order for this to work at all. We cantheoretically generate them specifically for tables, but this still
wouldn't allow multiple aliases to appear together. Perhaps I'm
missing something, but if so it's not clear to me.
allow_types_to_appear_in_same_query!schema.rsyou have a singleallow_tables_to_appear_in_same_query!invocation that contains allyour tables. We could decide to make this convention actually
required for this feature. We could define a macro from inside that
macro which you use to define your aliases. You'd have to define
them all in a single macro call, but it'd just be a single
diesel_define_aliases!after your table declarations. Thisrequires a bit too much specific ordering of things, and I think
it's too much hard to understand magic.
problems that arise from proving a query is type safe. This one is a
non-starter for me personally, but I figured I'd mention it here. In
addition to losing type safety, we would also no longer be able to store
these queries in the prepared statement cache.
Unresolved Questions
allow_types_to_appear_in_same_query!basically obsoletesallow_tables_to_appear_in_same_query!. Should we deprecate it?schema.rsanyway, should weadd support for it in
diesel.tomland generate this code for them?Accessing the aliased columns
The current design has you call
.selection()to get back the resultingaliased columns as a tuple.
Pros
This is the simplest design, and pretty much the only possible design
for aliased subselects, which have an unknown structure and also require
generating labels for each expression (the unique labels will be
required, even if the expression is just a normal column. We have no way
of knowing if there would be a name conflict otherwise). This means that
what you write is the same, regardless of what's being aliased is a
table or a subselect.
Cons
The main issue here is that this is a bit cumbersome, especially if you
want to join on the 17th column in the expression. We can work around
this for specifically tables with an alternate design if we want (see
below), but this restriction is basically always going to exist for
subselects. In the general case this can be worked around by explicitly
calling
.selectwith the bits that you want first.Alternatives
containing all columns as fields, and then make both tables and
columns generic over the alias. This means that you can do
let u2 = users.aliased(users2)and then accessusers.id, etc. Thereare two main reasons I haven't done this. First is just for
consistency, I'd like to have the same thing work for both tables
and subselects. Secondly, this will make our type errors way
worse. Every appearance of a table would change from
schema::foo::tabletoschema::foo::table<schema::foo::__NoAlias>.Same for columns. Not to mention the implementation is substantially
more complicated.
generic over a table and that table when it's aliased. I suspect
this is niche enough to ignore.
named_alias! { foo { bar, baz, qux } }etc. It's more or less the same thing, but it makes you state thenames far more up front so you can access them with dot syntax. We
could potentially bake the select clause in here as well. I feel
like I actually had a more fleshed out design for this a while back,
but it was in the private channel which doesn't have search. I
remember we all liked the tuple version better though.
It's worth noting that the current design doesn't preclude doing the
others in the future, both of those can be built on the current
structure.
Unresolved Questions
selectionmethod probably will always just be calling.default_selection()-- Should we codify that in the code?So that's basically it. The actual code included here is just a minimal
skeleton for table aliases, it does not allow subselects. It for sure
needs more tests added, and especially compile tests. I've been writing
this commit message for like an hour now so I'm going to stop.