constants as MAL function parameters prevent intermediate reuse #3361
Last updated: 2020-09-23 17:31:10 +0200
Date: 2013-09-09 17:11:03 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36
Perhaps I missed something in the recent MonetDB developments, but after noticing unexpected MAL plan explosions (see BUG #3294), I came to this simple query:
Which translates to:
It looks to me that this repeats twice the same select. Is that true?
Date: 2013-09-09 17:41:02 +0200
| X_56 := algebra.subselect(X_10,X_7,A0,A0,true,true,false); | mod,0
| X_57 := algebra.subselect(r1_13,A0,A0,true,true,false); | mod,2
| X_58 := algebra.subselect(X_16,X_7,A0,A0,true,true,false); | mod,1
| X_30 := algebra.subselect(X_29,A1,A1,true,true,false); | projection
Optimizer seems to have squeezed it as it should.
Date: 2013-09-09 17:49:16 +0200
I was aware of the 0,1,2 binds.
What I don't understand is: if the result consists of the same selection appended twice, why does X_36 (the second append) read as:
X_36 := bat.append(X_27,X_35,true);
and not as
X_36 := bat.append(X_27,X_26,true);
Date: 2013-09-09 18:02:30 +0200
To further clarify, this is an explain from an older MonetDB version (Dec2011), where you can clearly see that the result of the same select is appended twice to the final result.
Date: 2013-09-09 18:12:35 +0200
A critical test for common term elimination is that instructions should be side-effect free. Appends are not.
Also the equality of the arguments A0, and A1 can not be determined statically.
Date: 2013-09-09 18:23:16 +0200
A difference that I notice with the Dec2011 plan that I posted is indeed that the two strings are now parameters of the function.
Understanding why this is happening doesn't really satisfy me though. The behaviour of the older version was far more correct, I believe. Notice that in real-life queries I get very long plan portions repeated over and over again, just because of this.
Do you think this is the way it should behave?
Date: 2013-09-09 18:32:44 +0200
I recently had the problem with Feb2013 (or more precisely SciQL-2, but I don't expect that it was related / limited to the SciQL extensions on to of Feb2013) that when using a UNION ALL of several sub-queries in a function declaration, common sub-expressions --- in my specific case an aggregation that was identical in the select- and the having-clause of a group by sub-query --- were not recognized as such, i.e., the identical aggregation was evaluated twice. In my case, introducing a separate sub-function per sub-query and UNION ALL over the sub-function calls "solved" the problem.
(Given to many other obligations, I did not have/find the time to file my problem as a proper bug report ... :-()
While I assumed the problem was related to "too complicated" function definitions, it might actually be that it is related to UNION ALL queries --- the most prominent commonality between Roborto's and my case.
Would this make this bug report "valid", again?
Date: 2013-09-09 18:36:08 +0200
While appends are indeed not side-effect free, selections are.
The same holds for the identical aggregations in my group-by-having case.
Date: 2013-09-09 18:38:22 +0200
Indeed, in this case it all boils down to the handling of constants at the SQL level to derive templates. It is up to the SQL engineer to validate and reassign it.
Date: 2013-09-09 19:23:14 +0200
For what it's worth:
Date: 2013-09-09 19:49:08 +0200
Having said that, with the following simple SQL-only example trying to mimic my (actually more complex SciQL) case, I cannot reproduce the problem, i.e., common subexpression elimination works fine with Feb2013-SP3 & latest Feb2013 from HG (changeset 23a797ab1392):
create table bug_3361_stm ( a int, b int );
Date: 2013-09-09 20:29:27 +0200
And another difference between Roberto's and my case is that in Roberto's case the common sub-expression contains a literal, while mine does not.
Date: 2013-09-11 11:51:48 +0200
How literals are passed as function parameters seems indeed the be the main cause here.
If I rewrite my original query as
Then I get the MAL plan that I would expect:
I would put it this way: rewriting the sql query helps the translator produce better MAL code. Ideally, that should not be needed, as SQL is supposed to be declarative. Especially considering that in many real-life applications query are generated - thus more likely to contain verbose patterns, I would consider this something to try and improve. A good reason is also that it came as an unexpected side effect of other developments, reducing the effectiveness of previously working code optimizations.
Will this report be reconsidered?
Date: 2014-08-29 10:29:45 +0200
A year later, I'd like to reopen this bug report.
It was marked as "INVALID", but as the discussion shows, it was not invalid at all.
Perhaps you want to mark it as "WON'T FIX", but I hope you reconsider, as I find it serious for every non-trivial query scenario.
Date: 2014-12-17 16:55:21 +0100
On the sql side the 2 variables are created because of the query cache. For now you could switch off the query cache (set cache=false). This should push the problem back to the commonterms optimizer. Currently the optimizer fails at finding the duplicates.
Date: 2014-12-17 16:57:10 +0100
For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=9a7952fd5a0c
Date: 2015-06-19 18:23:12 +0200
Setting cache=false does indeed inline the parameters.
However commonTerms cannot find the duplicates yet. I don't really understand why.
Notice that the problem I am describing here is the root of MAL plans longer than 30K lines (where I would expect max 2-3K lines).
On Oct2014-SP4, with cache=false (Jul2015 behaves the same):
Date: 2015-06-19 18:37:22 +0200
Indeed, the window for searching for constants in a MAL plan is limited
In this case we have
None of the arguments pairs are identical and in this simple common term
Date: 2015-06-19 18:49:47 +0200
Martin, do you mean that simply adding the existing aliasRemoval before commonTerms in the current pipeline would do the trick? Or would they need changes anyway?
I am aware of the limited window. We discussed this a few times and I still don't agree with it. ;-)
That's why there is no limit in my code (that's the only change), so the window cannot be a reason in this case.
Yes, that optimizer is expensive, but with the current limitations it's hardly ever useful and its failed application can be way worse than its quadratic execution on long plans (that's my opinion of course).
Date: 2015-06-19 20:38:53 +0200
If you add the alias optimizer:
Date: 2015-06-22 10:46:43 +0200
Thanks, that looks much better indeed :)
From my point of view this bug report can be closed as fixed.
However.... ;) I hope you don't mind if I submit others on the same topic. Re-use of intermediate results remains a nasty issue with generated SQL queries.
Date: 2015-08-28 13:42:22 +0200
Jul2015 has been released.
Date: 2016-04-14 12:07:04 +0200
The conclusion for this issues seemed to be, from the last comments, that that aliases optimizer should have been placed before the commonTerms optimizer, in order to catch instructions that are not identical but equivalent.
The bug has been closed as fixed in Jul2015, but I see the default pipe hasn't changed.
Is there a reason? Was that not a good fix?
Date: 2016-04-14 12:57:05 +0200
Actually: the test case
eventually worked correctly by
Now this seems not to be enough any more.
To me this remains a serious issue. Plans are exploding and repeating 40-50 times the same computations.
Date: 2016-04-14 13:08:01 +0200
Yes, the issue is known and we are considering to make caching default false. Calling the optimizer for each query instead. That would remove this issue.
Date: 2016-04-17 11:41:10 +0200
For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=6db31f60efeb
Date: 2016-04-17 11:42:47 +0200
reordered the optimizers such that the alias gets removed. This solve the commonterms problem. For now still the cache needs to be disabled. We are still thinking/testing if the agressive statement cache could be removed (or only used for a smaller set of statements (is updates)).
Date: 2016-05-06 19:02:04 +0200
Just a reminder that the commit above modifies only default_pipe. The others still have the pushselect optimizer after the alias optimizer.
Date: 2020-09-23 17:31:10 +0200
the oct2020 will no longer have the query cache, ie should run these issues without problems.
The text was updated successfully, but these errors were encountered: