Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate unused UNION tables #6326

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

Eliminate unused UNION tables #6326

monetdb-team opened this issue Nov 30, 2020 · 0 comments

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2017-05-25 16:39:54 +0200
From: Richard Hughes <<richard.monetdb>>
To: SQL devs <>
Version: 11.23.13 (Jun2016-SP2)
CC: @njnes

Last updated: 2019-04-30 12:36:02 +0200

Comment 25349

Date: 2017-05-25 16:39:54 +0200
From: Richard Hughes <<richard.monetdb>>

Created attachment 552
Patch v1 (Jun2016)

The attached patch adds a new relational optimizer pass which removes any unioned selects which cannot contribute any rows to the result set. It applies to Jun2016 7e686221367d; if there's interest from other people then I'll rebase it to something newer.

This is similar to, but a superset of, the merge table functionality implemented in rel_merge_table_rewrite(). I looked at using merge tables in my application, but the following problems exist:

  1. Bug #6235
  2. I'd have to change quite a few bits of my application (not a show-stopper, but it seemed more productive to spend the time in Monet instead)
  3. There's no way to make it work with aligned partitions

That last point needs some explanation. My application partitions tables by date, but has multiple associated tables in each date window, call them 'a' and 'b'. It is much more efficient to execute "(a1 join b1) union (a2 join b2) union (a3 join b3)" than "(a1 union a2 union a3) join (b1 union b2 union b3)" because the row count of the intermediate tables is much smaller. Merge tables, however, can only be used to implement the latter. The attached patch works for either.

A minimal example of this patch at work is:

sql>create table sub1 (i int);
sql>create table sub2 (i int);
sql>insert into sub1 values (10),(20);
sql>insert into sub2 values (30),(40);
sql>alter table sub1 set read only;
sql>alter table sub2 set read only;
sql>analyze sys.sub1;
sql>analyze sys.sub2;
sql>plan select i from sub1 where i between 25 and 50 union all select i from sub2 where i between 25 and 50;
+----------------------------------------+
| rel |
+========================================+
| project ( |
| | select ( |
| | | table(sys.sub2) [ sub2.i ] COUNT |
| | ) [ int "25" <= sub2.i <= int "50" ] |
| ) [ sub2.i ] |
+----------------------------------------+

There clearly need to be a whole bunch of tests written for this change, but I'm putting it out there now as an RFC. This is the first time I've messed around in this area of the MonetDB code base, so I'd appreciate it if somebody could tell me if I've done it right.

Attached file: remove-union-partitions.patch (text/plain, 4238 bytes)
Description: Patch v1 (Jun2016)

Comment 25350

Date: 2017-05-26 14:24:52 +0200
From: Richard Hughes <<richard.monetdb>>

I should mention why partition elimination is important to me. There are two reasons:

  1. Combining indexes. We have a number of queries like "select ... from unionview where t between timestamp '2017-5-14' and timestamp '2017-5-21' and x=42;". In the absence of partition elimination the quickest way to execute this is to filter on 't' first because that gets the row count down fastest and has the best locality in association with other queries; for this reason we actually write the above query as "... and x in (42)" to ensure that the optimizer sees that clause as less selective than the time. With partition elimination we can effectively use indexes on both t and x: out-of-range 't' tables are initially eliminated then the 'x' filter becomes easily the most selective without compromising locality.

  2. MAL interpreter performance. I did this to myself due to Bug #3548: most of our plans are around the 20,000-30,000 instruction mark, and 80,000 instructions is not unusual. Benchmarking suggests that the MAL interpreter can dispatch on the order of 20,000-40,000 instructions per second on our hardware (assuming those instructions have no actual work to do) which means that our queries take upwards of a second to return even if they're easy. Eliminating partitions much earlier saves both instruction execution time and MAL generation/optimization time.

The combination of both of these effects allows us to achieve sub-100ms response times in some common cases and generally saves a second or so on every single query.

Comment 26861

Date: 2019-01-30 12:31:29 +0100
From: MonetDB Mercurial Repository <>

Changeset 2c50b80d80c7 made by Niels Nes niels@cwi.nl in the MonetDB repo, refers to this bug.

For complete details, see https//devmonetdborg/hg/MonetDB?cmd=changeset;node=2c50b80d80c7

Changeset description:

used patch from feature request Bug #6326, added cleanup of the
statistics on changing the access (read only vs read write)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant