Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of table aliasing #1773

Open
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@sgrif
Copy link
Member

sgrif commented Jul 3, 2018

This work is very incomplete, but I think I've poked this from enough
directions to be confident that the problem is well understood, and what
the design tradeoffs and implementation hurdles will be. So I'm pushing
up what I've got so far in order to have a discussion about the
potential design choices.

Any version of this feature that we go with must handle these use cases:

  • Joining to the same table multiple times
  • Joining to a subselect
  • Self referential subselects
    • I'm not sure we actually need this one. Can anyone come up with a
      query that uses a subselect which can't be replaced by a join to
      that same subselect?

Ultimately the main questions around design are:

  • How do we define the alias?
  • How do we reference columns from the alias?

In terms of actual aliasing, I think there's basically only one design
that makes sense. You create an aliased table by doing let aliased = table_or_query.aliased(your_alias), which generates table alias for
tables, and (SUBSELECT) alias for subselects. You can then join to
this by doing whatever.inner_join(aliased.on(...)). These seem
straightforward and don't have many alternatives.

Even when this is fully implemented, this feature has to cover enough
use cases, and is complex enough that I'd like to have at least one full
version with this feature available before we commit to stability on
this. For that reason I've hidden it behind a cfg gate. This is
specifically not a feature, so that the end user has to explicitly opt
into the use of the unstable feature, similar to the requirements around
nightly.

Defining an alias

The design I've gone with requires forward declaring aliases before they
can be used. In the test case I've declared it in the function, but I
expect that the majority of people will define them all in schema.rs
and then add them to their allow_tables_to_appear_in_same_query! line.

The defining macro creates a unit struct with similar properties to
tables.

Pros

This form is flexible and consistent with how we handle tables. It fits
into our type system in the same way, and has the same benefits
(ensuring an alias appears in the from clause no more than once,
disallows referencing columns from that alias unless it's inside the
from clause). It also follows the same "rules" with regards to joins.
Though allow_tables_to_appear_in_same_query! is a total hack, but it's
not going away any time soon, and I think keeping the rules simple and
consistent is valuable.

Cons

This requires a lot of ceremony to ever alias a table. It requires two
lines that are likely going to be in a completely different file before
you can ever even use it. Additionally, the resulting type for an alias
does not have an identical structure to a table module. Because of
that, we have to introduce a new macro
allow_types_to_appear_in_same_query, which takes a type instead of an
ident. We'll probably need to change diesel print-schema to use the
more verbose form.

The fact that we're using unit structs means that you can't use the same
name in the let binding. let users2 = users.aliased(users2) would feel
more natural.

Alternatives

  • Avoid unit structs
    • We'd basically stick a single field on the struct with type () to
      allow let bindings with the same name. This would change the
      invocation to users::table.aliased::<users2>(). This feels
      inconsistent and weird to me.
  • Inline declarations let users2 = users::table.aliased(diesel_alias!(users2))
    • This is what I had originally envisioned, and would be my preferred
      form if it is possible. However, we need the enumerated
      AppearsInFromClause impls in order for this to work at all. We can
      theoretically generate them specifically for tables, but this still
      wouldn't allow multiple aliases to appear together. Perhaps I'm
      missing something, but if so it's not clear to me.
  • Remove the need for allow_types_to_appear_in_same_query!
    • Right now our convention is that in schema.rs you have a single
      allow_tables_to_appear_in_same_query! invocation that contains all
      your tables. We could decide to make this convention actually
      required for this feature. We could define a macro from inside that
      macro which you use to define your aliases. You'd have to define
      them all in a single macro call, but it'd just be a single
      diesel_define_aliases! after your table declarations. This
      requires a bit too much specific ordering of things, and I think
      it's too much hard to understand magic.
  • Strings for aliases
    • If we don't actually try to enforce type safety, we don't have
      problems that arise from proving a query is type safe. This one is a
      non-starter for me personally, but I figured I'd mention it here. In
      addition to losing type safety, we would also no longer be able to store
      these queries in the prepared statement cache.

Unresolved Questions

  • allow_types_to_appear_in_same_query! basically obsoletes
    allow_tables_to_appear_in_same_query!. Should we deprecate it?
  • If we expect folks to be putting this in schema.rs anyway, should we
    add support for it in diesel.toml and generate this code for them?

Accessing the aliased columns

The current design has you call .selection() to get back the resulting
aliased columns as a tuple.

Pros

This is the simplest design, and pretty much the only possible design
for aliased subselects, which have an unknown structure and also require
generating labels for each expression (the unique labels will be
required, even if the expression is just a normal column. We have no way
of knowing if there would be a name conflict otherwise). This means that
what you write is the same, regardless of what's being aliased is a
table or a subselect.

Cons

The main issue here is that this is a bit cumbersome, especially if you
want to join on the 17th column in the expression. We can work around
this for specifically tables with an alternate design if we want (see
below), but this restriction is basically always going to exist for
subselects. In the general case this can be worked around by explicitly
calling .select with the bits that you want first.

Alternatives

  • Stick this on the table instead
    • So the idea here is basically that we turn tables into structs
      containing all columns as fields, and then make both tables and
      columns generic over the alias. This means that you can do let u2 = users.aliased(users2) and then access users.id, etc. There
      are two main reasons I haven't done this. First is just for
      consistency, I'd like to have the same thing work for both tables
      and subselects. Secondly, this will make our type errors way
      worse. Every appearance of a table would change from
      schema::foo::table to schema::foo::table<schema::foo::__NoAlias>.
      Same for columns. Not to mention the implementation is substantially
      more complicated.
    • One big upside here is that it would be possible to write code
      generic over a table and that table when it's aliased. I suspect
      this is niche enough to ignore.
  • Forward declare the fields as well
    • This would look something like named_alias! { foo { bar, baz, qux } } etc. It's more or less the same thing, but it makes you state the
      names far more up front so you can access them with dot syntax. We
      could potentially bake the select clause in here as well. I feel
      like I actually had a more fleshed out design for this a while back,
      but it was in the private channel which doesn't have search. I
      remember we all liked the tuple version better though.

It's worth noting that the current design doesn't preclude doing the
others in the future, both of those can be built on the current
structure.

Unresolved Questions

  • The selection method probably will always just be calling
    .default_selection() -- Should we codify that in the code?

So that's basically it. The actual code included here is just a minimal
skeleton for table aliases, it does not allow subselects. It for sure
needs more tests added, and especially compile tests. I've been writing
this commit message for like an hour now so I'm going to stop.

Initial implementation of table aliasing
This work is very incomplete, but I think I've poked this from enough
directions to be confident that the problem is well understood, and what
the design tradeoffs and implementation hurdles will be. So I'm pushing
up what I've got so far in order to have a discussion about the
potential design choices.

Any version of this feature that we go with must handle these use cases:

- Joining to the same table multiple times
- Joining to a subselect
- Self referential subselects
  - I'm not sure we actually need this one. Can anyone come up with a
    query that uses a subselect which can't be replaced by a join to
    that same subselect?

Ultimately the main questions around design are:

- How do we define the alias?
- How do we reference columns from the alias?

In terms of actual aliasing, I think there's basically only one design
that makes sense. You create an aliased table by doing `let aliased =
table_or_query.aliased(your_alias)`, which generates `table alias` for
tables, and `(SUBSELECT) alias` for subselects. You can then join to
this by doing `whatever.inner_join(aliased.on(...))`. These seem
straightforward and don't have many alternatives.

Even when this is fully implemented, this feature has to cover enough
use cases, and is complex enough that I'd like to have at least one full
version with this feature available before we commit to stability on
this. For that reason I've hidden it behind a cfg gate. This is
specifically not a feature, so that the end user has to explicitly opt
into the use of the unstable feature, similar to the requirements around
nightly.

Defining an alias
==

The design I've gone with requires forward declaring aliases before they
can be used. In the test case I've declared it in the function, but I
expect that the majority of people will define them all in `schema.rs`
and then add them to their `allow_tables_to_appear_in_same_query!` line.

The defining macro creates a unit struct with similar properties to
tables.

Pros
--

This form is flexible and consistent with how we handle tables. It fits
into our type system in the same way, and has the same benefits
(ensuring an alias appears in the from clause no more than once,
disallows referencing columns from that alias unless it's inside the
from clause). It also follows the same "rules" with regards to joins.
Though `allow_tables_to_appear_in_same_query!` is a total hack, but it's
not going away any time soon, and I think keeping the rules simple and
consistent is valuable.

Cons
--

This requires a *lot* of ceremony to ever alias a table. It requires two
lines that are likely going to be in a completely different file before
you can ever even use it. Additionally, the resulting type for an alias
does *not* have an identical structure to a table module. Because of
that, we have to introduce a new macro
`allow_types_to_appear_in_same_query`, which takes a type instead of an
ident. We'll probably need to change `diesel print-schema` to use the
more verbose form.

The fact that we're using unit structs means that you can't use the same
name in the let binding. `let users2 = users.aliased(users2)` would feel
more natural.

Alternatives
--

- Avoid unit structs
  - We'd basically stick a single field on the struct with type `()` to
    allow let bindings with the same name. This would change the
    invocation to `users::table.aliased::<users2>()`. This feels
    inconsistent and weird to me.
- Inline declarations `let users2 = users::table.aliased(diesel_alias!(users2))`
  - This is what I had originally envisioned, and would be my preferred
    form if it is possible. However, we *need* the enumerated
    `AppearsInFromClause` impls in order for this to work at all. We can
    theoretically generate them specifically for tables, but this still
    wouldn't allow multiple aliases to appear together. Perhaps I'm
    missing something, but if so it's not clear to me.
- Remove the need for `allow_types_to_appear_in_same_query!`
  - Right now our convention is that in `schema.rs` you have a single
    `allow_tables_to_appear_in_same_query!` invocation that contains all
    your tables. We could decide to make this convention actually
    required for this feature. We could define a macro from inside that
    macro which you use to define your aliases. You'd have to define
    them all in a single macro call, but it'd just be a single
    `diesel_define_aliases!` after your table declarations. This
    requires a bit too much specific ordering of things, and I think
    it's too much hard to understand magic.
- Strings for aliases
  - If we don't actually try to enforce type safety, we don't have
    problems that arise from proving a query is type safe. This one is a
    non-starter for me personally, but I figured I'd mention it here. In
    addition to losing type safety, we would also no longer be able to store
    these queries in the prepared statement cache.

Unresolved Questions
--

- `allow_types_to_appear_in_same_query!` basically obsoletes
  `allow_tables_to_appear_in_same_query!`. Should we deprecate it?
- If we expect folks to be putting this in `schema.rs` anyway, should we
  add support for it in `diesel.toml` and generate this code for them?

Accessing the aliased columns
==

The current design has you call `.selection()` to get back the resulting
aliased columns as a tuple.

Pros
--

This is the simplest design, and pretty much the only possible design
for aliased subselects, which have an unknown structure and also require
generating labels for each expression (the unique labels will be
required, even if the expression is just a normal column. We have no way
of knowing if there would be a name conflict otherwise). This means that
what you write is the same, regardless of what's being aliased is a
table or a subselect.

Cons
--

The main issue here is that this is a bit cumbersome, especially if you
want to join on the 17th column in the expression. We can work around
this for specifically tables with an alternate design if we want (see
below), but this restriction is basically always going to exist for
subselects. In the general case this can be worked around by explicitly
calling `.select` with the bits that you want first.

Alternatives
--

- Stick this on the table instead
  - So the idea here is basically that we turn tables into structs
    containing all columns as fields, and then make both tables and
    columns generic over the alias. This means that you can do `let
    u2 = users.aliased(users2)` and then access `users.id`, etc. There
    are two main reasons I haven't done this. First is just for
    consistency, I'd like to have the same thing work for both tables
    and subselects. Secondly, this will make our type errors *way*
    worse. Every appearance of a table would change from
    `schema::foo::table` to `schema::foo::table<schema::foo::__NoAlias>`.
    Same for columns. Not to mention the implementation is substantially
    more complicated.
  - One big upside here is that it would be possible to write code
    generic over a table and that table when it's aliased. I suspect
    this is niche enough to ignore.
- Forward declare the fields as well
  - This would look something like `named_alias! { foo { bar, baz, qux }
    }` etc. It's more or less the same thing, but it makes you state the
    names far more up front so you can access them with dot syntax. We
    could potentially bake the select clause in here as well. I feel
    like I actually had a more fleshed out design for this a while back,
    but it was in the private channel which doesn't have search. I
    remember we all liked the tuple version better though.

It's worth noting that the current design doesn't preclude doing the
others in the future, both of those can be built on the current
structure.

Unresolved Questions
--

- The `selection` method probably will always just be calling
  `.default_selection()` -- Should we codify that in the code?

So that's basically it. The actual code included here is just a minimal
skeleton for table aliases, it does not allow subselects. It for sure
needs more tests added, and especially compile tests. I've been writing
this commit message for like an hour now so I'm going to stop.

@sgrif sgrif requested a review from diesel-rs/reviewers Jul 3, 2018

@sgrif

This comment has been minimized.

Copy link
Member Author

sgrif commented Jul 10, 2018

@diesel-rs/reviewers This could still use review. I'd like feedback on this overall design before continuing further on this.

@mehcode

This comment has been minimized.

Copy link
Contributor

mehcode commented Jan 11, 2019

Postfix macros would be nice here:

users::table.aliased!(users2)

Though it'd have a similar problem to "inline declarations". I wonder if that could be resolved with both a marker trait for aliases and specialization for a default impl of AppearsInFromClause for types that implement the marker.

@mehcode

This comment has been minimized.

Copy link
Contributor

mehcode commented Jan 24, 2019

@weiznich I'm not qualified to try and understand the inner workings here but I just want to express how much I would greatly appreciate this being considered for the next diesel release. There is such a large amount of SQL that diesel cannot cover because of this feature.

@sgrif

This comment has been minimized.

Copy link
Member Author

sgrif commented Jan 24, 2019

@mehcode Right now we need feedback on the design of the feature more than the implementation. :)

@nsavch

This comment has been minimized.

Copy link

nsavch commented Mar 27, 2019

I gave a try for this branch yesterday as I had a piece of code which required joining same table multiple times. I'm not very experienced diesel programmer, virtually my first week with it. So I beg my pardon if the feedback is somewhat stupid.

  1. I was perfectly fine with defining aliases in schema.rs and replacing allow_tables_in_the_same_query with allow_types_in_the_same_query. Of course I feel bad that this will get overwritten when I write a new migration, so would be really cool to include aliases into diesel.toml.

  2. Why can't the diesel_define_alias macro specify the aliased table straight away? I understand that aliasing isn't exclusively for tables, but it's one of the main use cases. Could we have a diesel_define_aliased_table!(alias, something::table) macro to avoid calling something::table.aliased(alias) later then?

  3. The selection() function was the most painful and kinda breaks the code flow. Would be much better to use just alias::column_name, but I understand the argument about worse type check errors...

  4. I also noticed that you can't join two aliased tables users::table.inner_join(users2.inner_join(users3)) (where users2 and users3 are aliases of users::table, also the real code included valid .on statements for all joins) didn't work for me. Was this intended design or just an early implementation bug?

Anyway, it's a very much needed feature, even this early implementation helped me a lot :) Thanks for your work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.