Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MINOR][DOCS] Document when current_date and current_timestamp are evaluated #29892

Closed
wants to merge 1 commit into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Sep 28, 2020

What changes were proposed in this pull request?

Explicitly document that current_date and current_timestamp are executed at the start of query evaluation. And all calls of current_date/current_timestamp within the same query return the same value

Why are the changes needed?

Users could expect that current_date and current_timestamp return the current date/timestamp at the moment of query execution but in fact the functions are folded by the optimizer at the start of query evaluation:

/**
* Computes the current date and time to make sure we return the same result in a single query.
*/
object ComputeCurrentTime extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = {
val currentDates = mutable.Map.empty[String, Literal]
val timeExpr = CurrentTimestamp()
val timestamp = timeExpr.eval(EmptyRow).asInstanceOf[Long]
val currentTime = Literal.create(timestamp, timeExpr.dataType)
plan transformAllExpressions {
case CurrentDate(Some(timeZoneId)) =>
currentDates.getOrElseUpdate(timeZoneId, {
Literal.create(
LocalDate.now(DateTimeUtils.getZoneId(timeZoneId)),
DateType)
})
case CurrentTimestamp() | Now() => currentTime
}
}
}

Does this PR introduce any user-facing change?

No

How was this patch tested?

by running ./dev/scalastyle.

@SparkQA
Copy link

SparkQA commented Sep 28, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33797/

@SparkQA
Copy link

SparkQA commented Sep 28, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33797/

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good though do you want to merge this into the PR that fixes this behavior?

@SparkQA
Copy link

SparkQA commented Sep 28, 2020

Test build #129181 has finished for PR 29892 at commit df64fae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 1b60ff5 Sep 29, 2020
cloud-fan pushed a commit that referenced this pull request Sep 29, 2020
…e evaluated

### What changes were proposed in this pull request?
Explicitly document that `current_date` and `current_timestamp` are executed at the start of query evaluation. And all calls of `current_date`/`current_timestamp` within the same query return the same value

### Why are the changes needed?
Users could expect that `current_date` and `current_timestamp` return the current date/timestamp at the moment of query execution but in fact the functions are folded by the optimizer at the start of query evaluation:
https://github.com/apache/spark/blob/0df8dd60733066076967f0525210bbdb5e12415a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala#L71-L91

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
by running `./dev/scalastyle`.

Closes #29892 from MaxGekk/doc-current_date.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 1b60ff5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…e evaluated

### What changes were proposed in this pull request?
Explicitly document that `current_date` and `current_timestamp` are executed at the start of query evaluation. And all calls of `current_date`/`current_timestamp` within the same query return the same value

### Why are the changes needed?
Users could expect that `current_date` and `current_timestamp` return the current date/timestamp at the moment of query execution but in fact the functions are folded by the optimizer at the start of query evaluation:
https://github.com/apache/spark/blob/0df8dd60733066076967f0525210bbdb5e12415a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala#L71-L91

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
by running `./dev/scalastyle`.

Closes apache#29892 from MaxGekk/doc-current_date.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 1b60ff5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@MaxGekk MaxGekk deleted the doc-current_date branch December 11, 2020 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants