[CALCITE-3531] AggregateProjectPullUpConstantsRule should not remove …#1602
[CALCITE-3531] AggregateProjectPullUpConstantsRule should not remove …#1602danny0405 merged 1 commit intoapache:masterfrom
Conversation
|
|
||
| /** | ||
| * @return true iff it is unsafe to cache query plans referencing this | ||
| * @return true if it is unsafe to cache query plans referencing this |
There was a problem hiding this comment.
iff means if and only if. So it is ok.
There was a problem hiding this comment.
In my opinion, 'return true if...' is more verbose than it needs to be. The following all mean practically the same:
- return true if x is greater than 5
- return true iff x is greater than 5
- return true if and only if x is greater than 5
- return true if it is true that x is greater than 5
- return whether x is greater than 5
I prefer the last one.
'if' is problematic if you're pedantic; the method would not be breaking its contract if it decided to return true because it's a Tuesday
'iff' is jargon
'true if and only if' adds another layer of indirection. Just as a getTemperature method would be adding indirection if it said that it returns an integer value equal to the temperature rather than returns the temperature.
Lastly, I don't think that the @return javadoc tag helps. I think that every method needs a description, and @return and @param are only necessary if they are not clear from the description. I'd write simply
/** Returns whether x is greater than 5. */
| public Boolean visitCall(RexCall call) { | ||
| // Constant if operator is deterministic and all operands are | ||
| // constant. | ||
| return call.getOperator().isDeterministic() |
There was a problem hiding this comment.
Based on the comment of isDeterministic, I am thinking the original idea of using isDeterministic() is still correct.
Basically current_timestamp is not deterministic: it depends on when it's called and not reply on constant parameters.
Instead of changing here, I found that current_timestamp, current_date, etc. didn't override isDeterministic() so it returns true by default. Should those functions be changed to return fasle?
Using isDynamicFunction happens to be right because current_timestamp, current_date, etc. happen to override isDynamicFunction and return true.
There was a problem hiding this comment.
current_timestamp is deterministic, or non-volatile. Because the value doesn't change within a transaction. Functions like random(), timeofday() are volatile, because the value can change in a single query, we need to reevaluate for every tuple.
There was a problem hiding this comment.
Danny mentions in the JIRA that if it's a batch job that runs cross days, the result is wrong.
My understanding is if current_timestamp is called in different day because batch job could be large, then it returns a different current_timestamp, do I misunderstand it? If my understanding is correct, then if we say a batch query is in a transaction, it sounds more like current_timestamp implementation is wrong.
I found that in SqlFunctions, current_timestamp is marked as non-deterministic: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L2235
There was a problem hiding this comment.
It depends on different database vendors. MySQL, Postgres mark it as deterministic, while SQL Server mark it as nondeterministic.
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_current-timestamp
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_now
NOW() returns a constant time that indicates the time at which the statement began to execute.
There was a problem hiding this comment.
There was a problem hiding this comment.
Depends on the definition of "deterministic":
- returns the same value every time you call it? (not true for CURRENT_TIMESTAMP)
- returns the same value every time you call it within a given query execution (true for CURRENT_TIMESTAMP)
- returns the same value every you call it within a given row (also true for CURRENT_TIMESTAMP, also true for the RANDOM function in some databases IIRC)
I don't recall how Calcite defines "deterministic". Different optimizations need different notions of "constant" and "determinism".
…deterministic function group key if the function is dynamic We can decide an operator is constant only if: 1. It is non-dynamic, e.g. it is safe to cache query plans referencing this operator; 2. It is deterministic; 3. All its operands are constant.
4d09b48 to
ac2b440
Compare
…deterministic function group key if the function is dynamic
We can decide an operator is constant only if:
cache query plans referencing this operator;