[SPARK-29362][SQL] Move timestamp expressions and tests to separate files#26031
MaxGekk wants to merge 5 commits into apache:master from
Test build #111801 has finished for PR 26031 at commit
@dongjoon-hyun @srowen @cloud-fan @HyukjinKwon Could you take a look at the PR, please?
srowen left a comment:
I'm kind of neutral on it. Yeah, it's a big file, but there are lots of date/time functions, and they're related.
I usually wonder whether this will cause merge conflicts, and how much the change is worth in comparison.
I am just worried that IntelliJ IDEA heats my laptop so much when I edit this file that I could lose it at some point. ;-)
I also agree with @srowen. cc @gatorsmile, since there was a similar discussion about the other big SQL files. IIRC, it ended with a negative conclusion. In any case, this is a case-by-case decision.
A large file is only one reason. Another one is date expressions and timestamp expressions in
It seems this is the main concern, right? As far as I understand, if a private Spark fork has additional date or timestamp expressions, this will cause some conflicts. ... But such conflicts could be easily resolved by keeping an expression in the same place (date expression) or just moving it to
@MaxGekk. Since you were not there, I'll give you more context. In addition to those points, the final blocking concern in the last discussion was not about resolving conflicts. It was about weakening traceability due to the loss of the file's commit history. In Spark 2.0, we did many refactorings like this PR; they removed the line-by-line commit history and increased the chance of bugs. That is why the PMC members are reluctant to do that again. And, basically, I've heard that comment from @gatorsmile.
@ExpressionDescription(
  usage = "_FUNC_() - Returns the current date at the start of query evaluation.",
  since = "1.5.0")
case class CurrentDate(timeZoneId: Option[String] = None)
I agree with @dongjoon-hyun. I have to say -1 for such a PR, as I did for other similar PRs. Reading our change history is the most efficient way to learn our code base. I do it almost every day.
Does it preserve git history for both files? I'd imagine at most one of them can be a rename, so the other appears all new. |
@srowen Git (and IntelliJ IDEA) is able to track history for both files. I see the original history (and blame annotations) in the renamed file (
Here is a related answer: https://stackoverflow.com/a/40466759. The command below shows blame annotations for copied code:
Interesting, so you can get |
Yes, you just need to pass additional options to git
Less than 2 seconds:
$ time git blame -C ./timestampExpressions.scala > /dev/null
git blame -C ./timestampExpressions.scala > /dev/null  1.21s user 0.09s system 99% cpu 1.312 total
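To illustrate what the -C option adds, here is a small sketch that is not part of the PR. It builds a throwaway repository with a hypothetical two-commit history that mimics this split: first one file holding both date and timestamp code, then a commit that moves the timestamp lines into a new file. The file contents are made-up placeholders, and the script assumes git and a POSIX shell are available. Plain git blame attributes the moved lines to the split commit, while git blame -C, which also looks for lines moved from other files modified in the same commit, should trace them back to the original commit.

```shell
#!/bin/sh
# Sketch: compare plain `git blame` with `git blame -C` on a file that was
# split out of another file. Uses a temporary, throwaway repository.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"

git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Commit 1: one file with both "date" and "timestamp" code.
# Hypothetical Scala-like lines, long enough for -C copy detection
# (which by default needs at least 40 alphanumeric characters).
cat > datetimeExpressions.scala <<'EOF'
case class DateExpressionOne(child: Expression) extends UnaryExpression
case class DateExpressionTwo(child: Expression) extends UnaryExpression
case class TimestampExpressionOne(child: Expression) extends UnaryExpression
case class TimestampExpressionTwo(child: Expression) extends UnaryExpression
EOF
git add datetimeExpressions.scala
git commit -q -m "original file"
original=$(git rev-parse HEAD)

# Commit 2: move the timestamp lines into a new file.
cat > datetimeExpressions.scala <<'EOF'
case class DateExpressionOne(child: Expression) extends UnaryExpression
case class DateExpressionTwo(child: Expression) extends UnaryExpression
EOF
cat > timestampExpressions.scala <<'EOF'
case class TimestampExpressionOne(child: Expression) extends UnaryExpression
case class TimestampExpressionTwo(child: Expression) extends UnaryExpression
EOF
git add datetimeExpressions.scala timestampExpressions.scala
git commit -q -m "split file"
split=$(git rev-parse HEAD)

# With --porcelain, the first output line starts with the full commit SHA
# that blame assigns to line 1 of the file; keep only that SHA.
plain=$(git blame --porcelain timestampExpressions.scala | sed -n '1s/ .*//p')
with_c=$(git blame -C --porcelain timestampExpressions.scala | sed -n '1s/ .*//p')

echo "plain blame -> $plain (split commit:    $split)"
echo "blame -C    -> $with_c (original commit: $original)"
```

This only helps on the command line or in tools that pass an equivalent option; as noted later in the thread, GitHub's default file view still shows the moved lines under a single commit.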
That's good. Let me check that too, @MaxGekk.
@MaxGekk. GitHub doesn't support that by default. Please look at your new file on GitHub: everything is under one commit. In this case, I'm still -1.
Yes, and I was also wondering about IJ. I get that you can do this on the command line, which is great. I was also asking what the IJ perf problem is.
I am closing this PR.
Thank you for the decision, @MaxGekk.
This is a tradeoff that we need to evaluate. We should not blindly reject any cross-file code movement simply because it makes commit tracking harder. Users can still find the code movement commit and continue to look at the commits of the old file. It's still doable but harder. For this particular case, I think |
Yeah, I think so too.

What changes were proposed in this pull request?
In this PR, I propose refactoring date and timestamp expressions by:
- Moving timestamp expressions from datetimeExpressions.scala to timestampExpressions.scala
- Renaming datetimeExpressions.scala to dateExpressions.scala
- Moving timestamp tests from DateExpressionsSuite.scala to TimestampExpressionsSuite.scala
- Moving timestamp tests from DateFunctionsSuite.scala to TimestampFunctionsSuite.scala
Why are the changes needed?
The datetimeExpressions.scala file has grown large: it is more than 2000 lines at the moment. It also contains two kinds of expressions, date expressions and timestamp expressions. To make date-time expressions easier to navigate and maintain, it would be nice to separate them. The same reasoning applies to the timestamp expression and function test suites.
Does this PR introduce any user-facing change?
No
How was this patch tested?
By existing tests from the test suites DateExpressionsSuite and TimestampExpressionSuite, and by the new test suites TimestampExpressionsSuite and TimestampFunctionsSuite.