-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions #13522
Conversation
@cloud-fan - Sorry about the delay. Rebased SPARK-14321 for master. #12105 had become stale and got little messy in my system. Ended up creating this PR. I will close the earlier one after review. |
@@ -554,14 +561,19 @@ case class FromUnixTime(sec: Expression, format: Expression) | |||
boolean ${ev.isNull} = true; | |||
${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};""") | |||
} else { | |||
val sdfTerm = ctx.freshName("formatter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is trivial but why use a different variable name here from the above one? (which is called "formatter")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatter
looks better
Test build #60042 has finished for PR 13522 at commit
|
LGTM except one naming comment, thanks for working on it! |
Thank you. I have pushed the fixes in the recent commit. |
Test build #60118 has finished for PR 13522 at commit
|
Some more ideas: I think we should create a subclass of |
… date functions ## What changes were proposed in this pull request? The current implementations of `UnixTime` and `FromUnixTime` do not cache their parser/formatter as much as they could. This PR resolved this issue. This PR is a take over from #13522 and further optimizes the re-use of the parser/formatter. It also fixes the improves handling (catching the actual exception instead of `Throwable`). All credits for this work should go to rajeshbalamohan. This PR closes #13522 ## How was this patch tested? Current tests. Author: Herman van Hovell <hvanhovell@databricks.com> Author: Rajesh Balamohan <rbalamohan@apache.org> Closes #13581 from hvanhovell/SPARK-14321. (cherry picked from commit b076853) Signed-off-by: Reynold Xin <rxin@databricks.com>
… date functions ## What changes were proposed in this pull request? The current implementations of `UnixTime` and `FromUnixTime` do not cache their parser/formatter as much as they could. This PR resolved this issue. This PR is a take over from apache#13522 and further optimizes the re-use of the parser/formatter. It also fixes the improves handling (catching the actual exception instead of `Throwable`). All credits for this work should go to rajeshbalamohan. This PR closes apache#13522 ## How was this patch tested? Current tests. Author: Herman van Hovell <hvanhovell@databricks.com> Author: Rajesh Balamohan <rbalamohan@apache.org> Closes apache#13581 from hvanhovell/SPARK-14321.
What changes were proposed in this pull request?
Here is the generated code snippet when executing date functions. SimpleDateFormat is fairly expensive and can show up bottleneck when processing millions of records. It would be better to instantiate it once.
With modified code, here is the generated code
Similarly Calendar.getInstance was used in DateTimeUtils which can be lazily inited.
How was this patch tested?
org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite,org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite