Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

union with multiple relations at once, quote table name #4

Closed
wants to merge 1 commit into from

Conversation

odedniv
Copy link
Contributor

@odedniv odedniv commented Sep 2, 2015

  • Lose all the unnecessary outer SELECTs when unioning with N queries.
  • Table name needs to be quoted as it may be a reserved database keyword.
  • Minor refactoring.

@scarfacedeb
Copy link

+1

@bughit
Copy link

bughit commented Dec 9, 2015

@odedniv

do you know why union all produces extra, unnecessary grouping?:

User.where(id: 1).union(User.where(id: 2), User.where(id: 3)).to_sql
SELECT `users`.* FROM ( (SELECT `users`.* FROM `users` WHERE `users`.`id` = 1) UNION (SELECT `users`.* FROM `users` WHERE `users`.`id` = 2) UNION (SELECT `users`.* FROM `users` WHERE `users`.`id` = 3) ) `users`
User.where(id: 1).union_all(User.where(id: 2), User.where(id: 3)).to_sql
SELECT `users`.* FROM ( ( (SELECT `users`.* FROM `users` WHERE `users`.`id` = 1) UNION ALL (SELECT `users`.* FROM `users` WHERE `users`.`id` = 2) ) UNION ALL (SELECT `users`.* FROM `users` WHERE `users`.`id` = 3) ) `users`

I think this does not impact the execution of the query, right?

But it would be better without it.

@brianhempel Why aren't you merging this?

@odedniv
Copy link
Contributor Author

odedniv commented Dec 9, 2015

@bughit ha! I didn't believe you at first...

You seem to have stumbled on a special handling of union in Arel: https://github.com/rails/arel/blob/77ec13b46af2926bfcfc3073685711c874b0d272/lib/arel/visitors/mysql.rb#L5.
They actually only do this for MySQL, which seems pretty cheap if you ask me. I would do it for all same-type chaining (which work for And, didn't take the time to look for why).

Both Union and UnionAll inherit from Binary, which inherently gives parentheses to its operations. You can publish a pretty simple pull request there to suppress parentheses for UnionAll, but I wouldn't worry about it too much.

@brianhempel
Copy link
Owner

Thanks for the PR! I've half-merged it, as explained below:

Two things:

As performed in this PR, in the test suite the table names quote "like this" on MySQL, but MySQL uses backticks for quote. This is partly my fault: the database specific tests were actually not running (I forgot a yield statement). I've made the proper modification to get the correct type of quoting.

I don't want to support the a.union(b, c, d) syntax until we can reliably produce a UNION b UNION c UNION d queries without nesting. Also, I realized that active_record_union supports AREL, but this PR didn't extend the a.union(b, c, d) syntax to AREL arguments. As I tried to add it I realized that it was going to be a source of bugs to get it right (namely, telling if we have an AREL argument is actually non-trivial since there's more than one class we could be passed; it's better to just let the battle-tested plumbing downstream handle that). I don't have a lot of time these days for open source maintenance, so I don't want the possibility of more bugs like #5 & #7 just for a minuscule improvement in the DSL.

@brianhempel
Copy link
Owner

I'm confused. Hold on.

@brianhempel brianhempel reopened this Mar 20, 2016
@brianhempel
Copy link
Owner

Oh, this does produce a UNION b UNION c UNION d; you were discussing something else up there.

I'll fight with this some more.

@bughit
Copy link

bughit commented Mar 20, 2016

I don't want to support the a.union(b, c, d) syntax until we can reliably produce a UNION b UNION c UNION d queries without nesting.

Are you referring to what I reported for union all? If so, two points:

  1. it does not happen for union, only union all
  2. it's not really nesting, but associativity grouping so it's only a cosmetic problem (I think)

at any rate, this would be good to have, as it is definitely better than the actual nesting that you currently get:

SELECT "posts".* FROM (
  SELECT "posts".* FROM (
    SELECT "posts".* FROM "posts"  WHERE "posts"."user_id" = 1
    UNION
    SELECT "posts".* FROM "posts"  WHERE "posts"."user_id" = 2
  ) posts
  UNION
  SELECT "posts".* FROM "posts"  WHERE (published_at < '2014-07-19 16:12:45.882648')
) posts

@brianhempel
Copy link
Owner

Yeah, I didn't read carefully. I'll make sure this gets in.

@brianhempel
Copy link
Owner

Okay, so in SQLite this PR changes

SELECT "posts".* FROM (
  SELECT "posts".* FROM (
    SELECT "posts".* FROM (
      SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 1
      UNION
      SELECT "posts".* FROM "posts" WHERE "posts"."id" = 2
    ) "posts"
    UNION
    SELECT "posts".* FROM "posts" WHERE "posts"."id" = 3
  ) "posts"
  UNION
  SELECT "posts".* FROM "posts" WHERE "posts"."id" = 4
) "posts"

to

SELECT "posts".* FROM (
  (
    (
      SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 1
      UNION
      SELECT "posts".* FROM "posts" WHERE "posts"."id" = 2
    )
    UNION
    SELECT "posts".* FROM "posts" WHERE "posts"."id" = 3
  )
  UNION
  SELECT "posts".* FROM "posts" WHERE "posts"."id" = 4
) "posts"

Is this a win?

@brianhempel
Copy link
Owner

The second version above is a syntax error in SQLite. So that settles it: we'll have to be content with nesting.

@bughit
Copy link

bughit commented Mar 20, 2016

Is this a win?

I believe so, ideally there would be no grouping, but there is no additional nesting, all the union sub-queries are at the same level, just grouped

@brianhempel
Copy link
Owner

Yeah, SQLite doesn't allow associativity grouping. If you remove the parens, this works:

SELECT "posts".* FROM (
  SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 1
  UNION
  SELECT "posts".* FROM "posts" WHERE "posts"."id" = 2
  UNION
  SELECT "posts".* FROM "posts" WHERE "posts"."id" = 3
  UNION
  SELECT "posts".* FROM "posts" WHERE "posts"."id" = 4
) "posts"

It's going to be too hack-tastic to do that in this gem though, so I'm going to pass on this.

@bughit
Copy link

bughit commented Mar 20, 2016

The second version above is a syntax error in SQLite.So that settles it: we'll have to be content with nesting.

So this is an arel bug about how unions are materialized for sqlite? Why should an arel bug prevent you from adding an otherwise useful feature? Why not report it to arel, assuming it's different from rails/arel#404, and in the mean time the feature will work for other dbs

@brianhempel
Copy link
Owner

Why should an arel bug prevent you from adding an otherwise useful feature?

I want the databases to all have roughly equal support. It's going to be a huge time sink for me to work around SQLite's limitations just for something that's cosmetic.

If someone has a performance problem, that swings the scales a little, but not a lot.

@bughit
Copy link

bughit commented Mar 20, 2016

It's going to be a huge time sink for me to work around SQLite's limitations just for something that's cosmetic.

  1. extra grouping is a minor cosmetic issue, but extra nesting makes muti-union queries nearly unreadable and possibly less performant
  2. I don't think you need to do anything (at least not in your code), I submitted for sqlite, multiple union sub-queries are emitted grouped (parens) which sqlite can not handle rails/arel#418, that's what needs fixing, perhaps you could provide a more isolated example for them.

I want the databases to all have roughly equal support.

denying functionality to mysql, postgres, etc users, that's temporarily unavailable (due to an arel bug) to sqlite users, does not benefit anyone

@brianhempel
Copy link
Owner

v1.1.1 on Rubygems implements table-name quoting.

@odedniv
Copy link
Contributor Author

odedniv commented Mar 20, 2016

Wow, such a fluid discussion, never been to one of those in github. Guess I still haven't since I just woke up!

Anyway, IMO you can add an NotImplementedError for SQLite, as this is a real performance issue, that makes this trivial usage of union somewhat unusable. The whole reason to use unions is to go around a left join performance issue (you never really have to use union).

Performance issues are not even a real problem in SQLite since no one uses SQLite when there's a lot of data (does it support left join?), so gem users can revert to previous nesting for now.

Adding a "pending rails/arel#418" comment will sort this out.

@kainosnoema
Copy link

With very large numbers of union'd subqueries (dozens+), we're seeing VERY slow PostgreSQL response times (multiple seconds), whereas the flat versions are fast as expected (ms). Whether its query parsing or some other side-affect of the nesting, it's untenable to use the nested versions for our use-case. Hoping this gets figured out! In the mean time we'll write our UNIONs by hand.

@kbrock
Copy link

kbrock commented Nov 14, 2018

Hello, this fix has been applied to rails.
Looks like a great Gem. Thanks for publishing it

@odedniv
Copy link
Contributor Author

odedniv commented Nov 14, 2018

@kbrock what do you mean? Can you link to the commit or PR?

@kbrock
Copy link

kbrock commented Nov 14, 2018

Union all fix merged in rails/rails#34437

@odedniv
Copy link
Contributor Author

odedniv commented Nov 14, 2018

Awesome! So does this mean this PR can be merged now?
(I honestly don't remember why it was closed, was it pending this fix?)

@bughit
Copy link

bughit commented Nov 14, 2018

I believe (was a while ago that I looked at it) that nesting is more than a cosmetic problem, it changes the execution, creates more temp tables and likely slows things down, so this should definitely be merged. I've been using @odedniv fork because it makes no sense to use a union gem with sub-optimal multi-union queries.

@odedniv
Copy link
Contributor Author

odedniv commented Nov 14, 2018

Since you're using it, I thought I'd merge the branch with the current upstream. I think my previous workplace might also be using it.

I didn't rebase so that your lockfile would work against my branch, but if/when this branch is merged I would definitely use the rebase/squash options since the history in the branch is irrelevant.

Note that I may have done something wrong during the merge (there were a lot of changes in upstream), so it needs to be code reviewed again.

Goodnight!

@bughit
Copy link

bughit commented Nov 15, 2018

@odedniv Since this PR was partially merged and closed, it may make sense to do another one with the unmerged portion, assuming there are no further objections, @brianhempel?

@odedniv
Copy link
Contributor Author

odedniv commented Nov 15, 2018

@bughit I believe when the PR is reopened (is it possible? I don't see an option but perhaps @brianhempel does) it will show the current difference between master and the branch, which are only the "new" changes (excluding the partially merged areas). So I see no reason to create another PR unless this one cannot be reopened for some reason.

If you'll git diff the branch with master in your local environment you'll see that the "UNION" -> :union change for example is not visible. The "Files changed" tab is only outdated because the PR is closed, and so I think GH is attempting to show a snapshot of what was the state when the PR was last open.

The fork branch is updated and this PR is for that updated branch.

@bughit
Copy link

bughit commented Feb 12, 2019

@brianhempel Please clarify your position on this pr. The arel bug has been fixed so what is standing in the way of merging?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants