Skip to content

[CALCITE-6226] Wrong ISOWEEK and no ISOYEAR on BigQuery FORMAT_DATE#3653

Merged
rubenada merged 1 commit intoapache:mainfrom
rorueda:fix-isoweek
Sep 23, 2024
Merged

[CALCITE-6226] Wrong ISOWEEK and no ISOYEAR on BigQuery FORMAT_DATE#3653
rubenada merged 1 commit intoapache:mainfrom
rorueda:fix-isoweek

Conversation

@rorueda
Copy link
Copy Markdown
Contributor

@rorueda rorueda commented Jan 26, 2024

}
},
D("F", "The weekday (Monday as the first day of the week) as a decimal number (1-7)") {
// TODO: Parsing of weekdays with sunday as first day of the week
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use Todo?
Can we finish it in the ticket?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can. I'm not that familiar with the codebase, but my understanding is that the parsing uses java's DateFormat class that that simply doesn't have a pattern for the weekday with Sunday as first day of the week.

There are other TODOs related to parsing in this class, and I think to make it all work a more broad/complex change is needed, mainly because in some cases not only the pattern is important when parsing, but also the calendar.

Note that I only removed the "F" because it is not the pattern for weekday. I know the best would be to fix it all, but I think it's better to point out it's not implemented yet than to have an implementation that gives us the wrong result.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the TODO, log a jira case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created CALCITE-6233 to handle the parsing of ISOWEEK.

It doesn't seem that parsing the weekday with Sunday as first day (1) is possible with java's DateFormat.

final Calendar calendar = Work.get().calendar;
calendar.setTime(date);
int weekDay = calendar.get(Calendar.DAY_OF_WEEK);
sb.append(String.format(Locale.ROOT, "%d", weekDay == 1 ? 7 : weekDay - 1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this just convert from Sunday being 1 to Monday being 1?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Maybe change the comparison to weekDay == Calendar.SUNDAY to make it clearer? Do you see another way?

Copy link
Copy Markdown
Contributor

@tanclary tanclary Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work, or a simple comment could help also. It is no big deal either way :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind Calcite is complex enough that when we have opportunities to explain little pieces of logic to reduce mental load for the developer, we should

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DOW function implementation assumes that Sunday = 1.
In Postgres DOW assumes that Sunday = 0.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment - don't think my suggestion would make it clearer.

Looking at Postgres documentation, the EXTRACT(DOW... assumes Sunday = 0, but the format functions to_char, to_date,... also assume Sunday = 1 for day of week.

}
},
D("F", "The weekday (Monday as the first day of the week) as a decimal number (1-7)") {
// TODO: Parsing of weekdays with sunday as first day of the week
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the TODO, log a jira case.

assertFormatElement(FormatElementEnum.IW, "2014-09-30T10:00:00Z", "40");
@Test void testID() {
assertFormatElement(FormatElementEnum.ID, "2014-09-30T10:00:00Z", "2");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why some of the tests in this class are parameterized and others are not.

Frankly I don't think the framework is helping make the tests more understandable. I would move to code, and add comments if a test is testing something subtle.

It seems to me that this test class should be very simple.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As there is one method for each format element, I didn't want to add new methods to test the edge cases of ISOWEEK and ISOYEAR, and I thought a good solution was to make those tests parameterized.

But I agree that the way it was done didn't help in making the added test cases understandable. I removed the parameters and made more assertions inside the method (with a comment).

@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

6 New issues
0 Security Hotspots
83.7% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

@Test
void testIW() {
assertFormatElement(FormatElementEnum.IW, "2014-09-30T10:00:00Z", "40");
// edge case where ISO WEEK != WEEK
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have discovered that the Java Calendar class doesn't handle correctly some operations with dates before the Gregorian calendar introduction. Can you please add some tests for dates in that range?
https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-6252
Can you also confirm that the values have been validated against BigQuery?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added additional test cases, all manually validated against BigQuery.

@rorueda
Copy link
Copy Markdown
Contributor Author

rorueda commented Mar 9, 2024

Since I had to rebase it onto master to address the last change request, I squashed the commits.

@julianhyde
Copy link
Copy Markdown
Contributor

julianhyde commented Mar 9, 2024 via email

Copy link
Copy Markdown
Contributor

@mihaibudiu mihaibudiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust that the results were all compared against the reference databases, I didn't check myself.

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Mar 9, 2024

The ISOWEEK format function was not setting the minimalDaysInFirstWeek
to 4.

To avoid having to set the calendar week definition fields in each
format function, a new calendar instance configured with the iso8601
settings was added.

The format elements for the ISOYEAR with the century (%G) and without
it (%g) were added.

Also, the weekday with Monday as first day of week (%u) was fixed.
@rorueda
Copy link
Copy Markdown
Contributor Author

rorueda commented Sep 17, 2024

This is a bit old, so there were conflicts with the current master. I did a rebase and fixed the conflicts.

@sonarqubecloud
Copy link
Copy Markdown

@rubenada
Copy link
Copy Markdown
Contributor

LGTM

@rubenada rubenada added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Sep 18, 2024
"VARCHAR NOT NULL");
f.checkString("to_char(timestamp '2022-06-03 13:15:48.678', 'IW')",
"23",
"22",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a bug?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, same ISOWEEK issue, but for PostgreSQL: #3653 (comment)

@rubenada rubenada merged commit 8ffcc2a into apache:main Sep 23, 2024
@rorueda rorueda deleted the fix-isoweek branch June 4, 2025 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

LGTM-will-merge-soon Overall PR looks OK. Only minor things left.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants