Skip to content

[CALCITE-3531] AggregateProjectPullUpConstantsRule should not remove …#1602

Merged
danny0405 merged 1 commit intoapache:masterfrom
danny0405:CALCITE-3531
Nov 25, 2019
Merged

[CALCITE-3531] AggregateProjectPullUpConstantsRule should not remove …#1602
danny0405 merged 1 commit intoapache:masterfrom
danny0405:CALCITE-3531

Conversation

@danny0405
Copy link
Contributor

…deterministic function group key if the function is dynamic

We can decide an operator is constant only if:

  1. It is non-dynamic, e.g. it is safe to
    cache query plans referencing this operator;
  2. It is deterministic;
  3. All its operands are constant.


/**
* @return true iff it is unsafe to cache query plans referencing this
* @return true if it is unsafe to cache query plans referencing this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iff means if and only if. So it is ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, 'return true if...' is more verbose than it needs to be. The following all mean practically the same:

  • return true if x is greater than 5
  • return true iff x is greater than 5
  • return true if and only if x is greater than 5
  • return true if it is true that x is greater than 5
  • return whether x is greater than 5

I prefer the last one.

'if' is problematic if you're pedantic; the method would not be breaking its contract if it decided to return true because it's a Tuesday

'iff' is jargon

'true if and only if' adds another layer of indirection. Just as a getTemperature method would be adding indirection if it said that it returns an integer value equal to the temperature rather than returns the temperature.

Lastly, I don't think that the @return javadoc tag helps. I think that every method needs a description, and @return and @param are only necessary if they are not clear from the description. I'd write simply

/** Returns whether x is greater than 5. */

public Boolean visitCall(RexCall call) {
// Constant if operator is deterministic and all operands are
// constant.
return call.getOperator().isDeterministic()
Copy link
Contributor

@amaliujia amaliujia Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the comment of isDeterministic, I am thinking the original idea of using isDeterministic() is still correct.

Basically current_timestamp is not deterministic: it depends on when it's called and not reply on constant parameters.

Instead of changing here, I found that current_timestamp, current_date, etc. didn't override isDeterministic() so it returns true by default. Should those functions be changed to return fasle?

Using isDynamicFunction happens to be right because current_timestamp, current_date, etc. happen to override isDynamicFunction and return true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_timestamp is deterministic, or non-volatile. Because the value doesn't change within a transaction. Functions like random(), timeofday() are volatile, because the value can change in a single query, we need to reevaluate for every tuple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Danny mentions in the JIRA that if it's a batch job that runs cross days, the result is wrong.

My understanding is if current_timestamp is called in different day because batch job could be large, then it returns a different current_timestamp, do I misunderstand it? If my understanding is correct, then if we say a batch query is in a transaction, it sounds more like current_timestamp implementation is wrong.

I found that in SqlFunctions, current_timestamp is marked as non-deterministic: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L2235

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on different database vendors. MySQL, Postgres mark it as deterministic, while SQL Server mark it as nondeterministic.

https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_current-timestamp
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_now
NOW() returns a constant time that indicates the time at which the statement began to execute.

Copy link
Member

@hsyuan hsyuan Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the definition of "deterministic":

  • returns the same value every time you call it? (not true for CURRENT_TIMESTAMP)
  • returns the same value every time you call it within a given query execution (true for CURRENT_TIMESTAMP)
  • returns the same value every you call it within a given row (also true for CURRENT_TIMESTAMP, also true for the RANDOM function in some databases IIRC)

I don't recall how Calcite defines "deterministic". Different optimizations need different notions of "constant" and "determinism".

…deterministic function group key if the function is dynamic

We can decide an operator is constant only if:
1. It is non-dynamic, e.g. it is safe to
   cache query plans referencing this operator;
2. It is deterministic;
3. All its operands are constant.
@danny0405 danny0405 merged commit ed4c3cc into apache:master Nov 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants