[CALCITE-6226] Wrong ISOWEEK and no ISOYEAR on BigQuery FORMAT_DATE #3653
rubenada merged 1 commit into apache:main
    }
  },
  D("F", "The weekday (Monday as the first day of the week) as a decimal number (1-7)") {
    // TODO: Parsing of weekdays with sunday as first day of the week
Why use Todo?
Can we finish it in the ticket?
I don't think we can. I'm not that familiar with the codebase, but my understanding is that parsing uses Java's DateFormat class, which simply doesn't have a pattern for the weekday with Sunday as the first day of the week.
There are other TODOs related to parsing in this class, and I think making it all work needs a broader, more complex change, mainly because in some cases parsing depends not only on the pattern but also on the calendar.
Note that I only removed the "F" because it is not the pattern for weekday. I know it would be best to fix it all, but I think it's better to point out that it's not implemented yet than to have an implementation that gives the wrong result.
Please remove the TODO, log a jira case.
Created CALCITE-6233 to handle the parsing of ISOWEEK.
It doesn't seem that parsing the weekday with Sunday as the first day (1) is possible with Java's DateFormat.
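To illustrate the point about pattern letters, here is a minimal sketch using `java.text.SimpleDateFormat` directly. In that class, `F` means "day of week in month" and `u` is the numeric weekday with Monday = 1; there is no numeric pattern letter that starts the week on Sunday. The class `WeekdayPatternDemo` and its `format` method are hypothetical helpers for this illustration, not Calcite code.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper for illustration only; not part of Calcite. */
final class WeekdayPatternDemo {
  private WeekdayPatternDemo() {}

  /** Formats an ISO-8601 instant string with a SimpleDateFormat pattern, in UTC. */
  static String format(String pattern, String isoInstant) {
    SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    return format.format(Date.from(Instant.parse(isoInstant)));
  }
}
```

For the timestamp used in the tests, `2014-09-30T10:00:00Z` (a Tuesday), `format("u", ...)` yields the Monday-based weekday number, while `format("F", ...)` yields the ordinal of that weekday within the month, which is a different quantity.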
core/src/main/java/org/apache/calcite/util/format/FormatElementEnum.java
final Calendar calendar = Work.get().calendar;
calendar.setTime(date);
int weekDay = calendar.get(Calendar.DAY_OF_WEEK);
sb.append(String.format(Locale.ROOT, "%d", weekDay == 1 ? 7 : weekDay - 1));
Does this just convert from Sunday being 1 to Monday being 1?
Yes. Maybe change the comparison to weekDay == Calendar.SUNDAY to make it clearer? Do you see another way?
That could work, or a simple comment could help also. It is no big deal either way :)
In my mind, Calcite is complex enough that when we have opportunities to explain little pieces of logic to reduce mental load for the developer, we should.
The DOW function implementation assumes that Sunday = 1.
In Postgres DOW assumes that Sunday = 0.
Added a comment. I don't think my suggestion would make it clearer.
Looking at the Postgres documentation, EXTRACT(DOW ...) assumes Sunday = 0, but the format functions (to_char, to_date, ...) also assume Sunday = 1 for day of week.
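The conversion discussed in this thread can be sketched in isolation. `Calendar.DAY_OF_WEEK` numbers days from `Calendar.SUNDAY` (1) to `Calendar.SATURDAY` (7); the ISO numbering runs from Monday (1) to Sunday (7). The class and method names below are hypothetical and chosen only for this illustration; the second method is an equivalent arithmetic form, not something proposed in the review.

```java
import java.util.Calendar;

/** Hypothetical helper for illustration only; not part of Calcite. */
final class IsoWeekdayDemo {
  private IsoWeekdayDemo() {}

  /**
   * Maps a Calendar.DAY_OF_WEEK value (SUNDAY = 1 ... SATURDAY = 7)
   * to ISO numbering (Monday = 1 ... Sunday = 7).
   */
  static int toIsoWeekday(int calendarDayOfWeek) {
    // Sunday moves from the front of the week to the end; every other day shifts down by one.
    return calendarDayOfWeek == Calendar.SUNDAY ? 7 : calendarDayOfWeek - 1;
  }

  /** Same mapping written as modular arithmetic, without a branch. */
  static int toIsoWeekdayArithmetic(int calendarDayOfWeek) {
    return (calendarDayOfWeek + 5) % 7 + 1;
  }
}
```

Naming the constant (`Calendar.SUNDAY`) rather than comparing against the literal `1` is one way to make the intent visible without a comment.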
  assertFormatElement(FormatElementEnum.IW, "2014-09-30T10:00:00Z", "40");
@Test void testID() {
  assertFormatElement(FormatElementEnum.ID, "2014-09-30T10:00:00Z", "2");
}
I don't understand why some of the tests in this class are parameterized and others are not.
Frankly I don't think the framework is helping make the tests more understandable. I would move to code, and add comments if a test is testing something subtle.
It seems to me that this test class should be very simple.
As there is one method for each format element, I didn't want to add new methods to test the edge cases of ISOWEEK and ISOYEAR, and I thought a good solution was to make those tests parameterized.
But I agree that the way it was done didn't help in making the added test cases understandable. I removed the parameters and made more assertions inside the method (with a comment).
@Test
void testIW() {
  assertFormatElement(FormatElementEnum.IW, "2014-09-30T10:00:00Z", "40");
  // edge case where ISO WEEK != WEEK
I have discovered that the Java Calendar class doesn't correctly handle some operations on dates before the introduction of the Gregorian calendar. Can you please add some tests for dates in that range?
https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-6252
Can you also confirm that the values have been validated against BigQuery?
I added additional test cases, all manually validated against BigQuery.
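As background for the request about pre-Gregorian dates, the sketch below shows one documented behaviour of `java.util.GregorianCalendar`: by default it is a hybrid Julian/Gregorian calendar, so an instant that `java.time` (proleptic ISO) labels as a given date before the cutover is reported with different year/month/day fields. Calling `setGregorianChange(new Date(Long.MIN_VALUE))` makes it purely Gregorian. This is only an illustration of the general issue; whether it is the specific problem tracked in CALCITE-6252 is an assumption, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper for illustration only; not part of Calcite. */
final class GregorianCutoverDemo {
  private GregorianCutoverDemo() {}

  /**
   * Takes a proleptic-Gregorian (ISO) date, converts it to the instant at UTC midnight,
   * and returns the {year, month (1-12), day} that a GregorianCalendar reports for it.
   */
  static int[] calendarFields(int year, int month, int day, boolean proleptic) {
    GregorianCalendar calendar =
        new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
    if (proleptic) {
      // Documented way to obtain a pure Gregorian calendar with no Julian period.
      calendar.setGregorianChange(new Date(Long.MIN_VALUE));
    }
    long millis = LocalDate.of(year, month, day)
        .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    calendar.setTimeInMillis(millis);
    return new int[] {
        calendar.get(Calendar.YEAR),
        calendar.get(Calendar.MONTH) + 1, // Calendar months are zero-based
        calendar.get(Calendar.DAY_OF_MONTH)
    };
  }
}
```

With `proleptic` set to `true`, the fields round-trip for a date such as 1500-01-01; with the default hybrid calendar they do not, which is why week-based results for such dates are worth validating against the reference database.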
Since I had to rebase it onto master to address the last change request, I squashed the commits.
It always struck me as strange that this change is using Calendar. Day of the week is simple if you convert to Julian date.
Julian
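The suggestion to derive the weekday from a day count instead of `Calendar` can be sketched as follows. The sketch assumes the date is available as a count of days since 1970-01-01 (a Thursday); a Julian day number works the same way with a different offset, since the two counts differ by a constant. The class and method names are hypothetical, and `LocalDate` is used here only to obtain a day count for the example.

```java
import java.time.LocalDate;

/** Hypothetical helper for illustration only; not part of Calcite. */
final class EpochDayWeekday {
  private EpochDayWeekday() {}

  /**
   * ISO weekday (Monday = 1 ... Sunday = 7) from days since 1970-01-01.
   * Day 0 was a Thursday, hence the offset of 3 before taking the remainder.
   * floorMod keeps the result non-negative for dates before the epoch.
   */
  static int isoWeekday(long epochDay) {
    return (int) Math.floorMod(epochDay + 3, 7L) + 1;
  }

  /** Convenience overload that derives the day count from a calendar date. */
  static int isoWeekday(int year, int month, int day) {
    return isoWeekday(LocalDate.of(year, month, day).toEpochDay());
  }
}
```

Because the weekday cycle is unaffected by calendar reforms, this arithmetic gives the same answer regardless of how the date is labelled.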
mihaibudiu left a comment:
I trust that the results were all compared against the reference databases; I didn't check myself.
The ISOWEEK format function was not setting minimalDaysInFirstWeek to 4. To avoid setting the calendar's week-definition fields in each format function, a new calendar instance configured with ISO 8601 settings was added. Format elements for ISOYEAR with the century (%G) and without it (%g) were added. The weekday with Monday as the first day of the week (%u) was also fixed.
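A minimal sketch of a calendar configured with the ISO 8601 week definition described in this commit message: weeks start on Monday, and the first week of a year must contain at least four days of that year. The class and method names are hypothetical and do not reflect how the Calcite change itself is structured.

```java
import java.time.Instant;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper for illustration only; not part of Calcite. */
final class Iso8601CalendarDemo {
  private Iso8601CalendarDemo() {}

  /** Creates a UTC calendar that uses the ISO 8601 week definition. */
  static Calendar isoCalendar() {
    Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT);
    calendar.setFirstDayOfWeek(Calendar.MONDAY);
    calendar.setMinimalDaysInFirstWeek(4);
    return calendar;
  }

  /** ISO week number of the given ISO-8601 instant string. */
  static int isoWeek(String isoInstant) {
    Calendar calendar = isoCalendar();
    calendar.setTime(Date.from(Instant.parse(isoInstant)));
    return calendar.get(Calendar.WEEK_OF_YEAR);
  }

  /** ISO week-based year of the given ISO-8601 instant string. */
  static int isoYear(String isoInstant) {
    Calendar calendar = isoCalendar();
    calendar.setTime(Date.from(Instant.parse(isoInstant)));
    return calendar.getWeekYear();
  }
}
```

The week-based year matters at year boundaries: 2014-12-29 is a Monday that belongs to ISO week 1 of 2015, so `isoWeek` and `isoYear` for that date differ from the plain calendar week and year.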
e141648 to 71b5c96
This is a bit old, so there were conflicts with the current master. I did a rebase and fixed the conflicts.
LGTM
core/src/main/java/org/apache/calcite/util/format/FormatElementEnum.java
    "VARCHAR NOT NULL");
f.checkString("to_char(timestamp '2022-06-03 13:15:48.678', 'IW')",
    "23",
    "22",
Yes, same ISOWEEK issue, but for PostgreSQL: #3653 (comment)
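As an independent cross-check of expected ISO week values such as the one in this test, `java.time` exposes the ISO 8601 week fields directly. This sketch deliberately swaps in `java.time.temporal.IsoFields` rather than `Calendar`; it is not the mechanism used in the change, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

/** Hypothetical helper for illustration only; not part of Calcite. */
final class IsoFieldsCheck {
  private IsoFieldsCheck() {}

  /** ISO 8601 week number (1-53) of the given date. */
  static int isoWeek(int year, int month, int day) {
    return LocalDate.of(year, month, day).get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
  }

  /** ISO 8601 week-based year of the given date. */
  static int isoYear(int year, int month, int day) {
    return LocalDate.of(year, month, day).get(IsoFields.WEEK_BASED_YEAR);
  }
}
```

For 2022-06-03, the date in the `to_char(..., 'IW')` test above, `isoWeek` returns 22.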



https://issues.apache.org/jira/browse/CALCITE-6226