Fix date_trunc() behavior for decades, centuries and millenniums and add the ability to extract() these values out of a timestamp. #5056
Conversation
…havior of PostgreSQL when truncating dates by millennium, century or decade. Negative years are handled, as is the issue of years such as 1900 and 2000 being truncated to 1901 and 2001 respectively.
centuries and decades from timestamps.
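As a hedged illustration of the PostgreSQL semantics this PR targets (the helpers below are hypothetical, not the PR's actual code): decades start at years divisible by 10, while centuries and millennia start at years ending in 1 (1901, 2001, ...), since the Common Era has no year 0.

```rust
// Hypothetical sketch of year truncation for AD years under PostgreSQL
// semantics; illustrative only, not the PR's actual implementation.
fn trunc_decade(year: i32) -> i32 {
    // Decades start at years divisible by 10: 2000-2009 is one decade.
    year.div_euclid(10) * 10
}

fn trunc_century(year: i32) -> i32 {
    // Centuries start at years ending in 1: the 20th century is 1901-2000.
    (year - 1).div_euclid(100) * 100 + 1
}

fn trunc_millennium(year: i32) -> i32 {
    // Millennia likewise: the 2nd millennium is 1001-2000.
    (year - 1).div_euclid(1000) * 1000 + 1
}

fn main() {
    assert_eq!(trunc_decade(2000), 2000);
    assert_eq!(trunc_century(2000), 1901); // 2000 still belongs to the 20th century
    assert_eq!(trunc_millennium(2000), 1001);
    assert_eq!(trunc_century(2001), 2001); // 2001 opens the 21st
}
```

Note the asymmetry: `date_trunc('decade', ...)` on year 2000 stays at 2000, while `date_trunc('century', ...)` goes back to 1901, which is why the two cases need different formulas.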
Thanks very much! @quodlibetor or @sploiselle, can you review? Also, we'll need to get some test cases for the new/changed behavior.
Hey @benesch, thanks for the quick consideration. I added some tests in test/sqllogictest; hopefully that's enough. The additions don't work for the INTERVAL/TIME date_part()/date_trunc() functions yet, but those require more work since they are handled separately from timestamps, dates and date-times, so I think that should go into a separate PR.
    fn extract_decade(&self) -> f64 {
        f64::from(self.year().div_euclid(10))
    }
Euclidean division doesn't produce different results than standard division in the provided test cases. Can you let me know why you chose this method, or provide a test case that standard division fails?
The current version of Materialize returns the error "unsupported timestamp units 'decade'" on the test for decade extraction that I added in the last commit.
PostgreSQL returns the right result, "200 200 199 0 0 -1 -1 -2".
The years obtained from self.year() in the test are [2001, 2000, 1999, 1, 0, -1, -10, -11].
Had I implemented it with normal (truncating) division, as in

    fn extract_decade(&self) -> f64 {
        f64::from(self.year() / 10)
    }

Materialize would return "200 200 199 0 0 0 -1 -1", which is not how PostgreSQL handles it.
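The difference shows up only on the negative years; a minimal standalone sketch comparing the two kinds of division over the same test years:

```rust
fn main() {
    // The test years quoted above; year 0 and below are BC dates in
    // astronomical numbering.
    let years: [i32; 8] = [2001, 2000, 1999, 1, 0, -1, -10, -11];

    // Truncating division (/) rounds toward zero, so -1 and -11 land in
    // decades 0 and -1...
    let truncating: Vec<i32> = years.iter().map(|y| y / 10).collect();
    // ...while Euclidean division rounds toward negative infinity,
    // matching PostgreSQL's "200 200 199 0 0 -1 -1 -2".
    let euclidean: Vec<i32> = years.iter().map(|y| y.div_euclid(10)).collect();

    assert_eq!(truncating, vec![200, 200, 199, 0, 0, 0, -1, -1]);
    assert_eq!(euclidean, vec![200, 200, 199, 0, 0, -1, -1, -2]);
}
```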
Ah, you are correct!
Thanks for this @zRedShift! The code looks reasonable. I'm trying to wrap my head around the exact semantics of this math, and it's kind of comforting that postgres has some of the same thoughts as I do:
Yep, just following their logic for feature parity. If only the CE had started with year 0, most of this math could have been spared, but we're not that lucky. You'd think: why would a real-time streaming database concern itself with timestamps of events that happened over two millennia ago? But a faithful reproduction of the original behavior is still nice to have.
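A hedged sketch of that no-year-zero wrinkle for century extraction (the `century` helper is hypothetical, not this PR's code; chrono-style astronomical numbering is assumed, where year 0 is 1 BC, which PostgreSQL reports as century -1):

```rust
// Hypothetical illustration, not the PR's actual implementation.
fn century(year: i32) -> i32 {
    if year > 0 {
        // Years 1..=100 are century 1, 1901..=2000 are century 20.
        (year + 99) / 100
    } else {
        // There is no century 0: year 0 (1 BC) through -99 (100 BC)
        // all fall in century -1.
        year / 100 - 1
    }
}

fn main() {
    assert_eq!(century(2001), 21);
    assert_eq!(century(2000), 20);
    assert_eq!(century(1), 1);
    assert_eq!(century(0), -1); // 1 BC
    assert_eq!(century(-100), -2); // 101 BC
}
```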
That's high-volume computational archaeology 😂. I agree doing this right is important!
LGTM!
Thanks again! Sorry this took a while; I wanted to make sure that I understood it!
I've got one nit, but if you'd rather not handle it I'm happy to merge as-is, just let me know.
src/repr/src/adt/datetime.rs
Outdated
@@ -96,7 +96,9 @@ impl FromStr for DateTimeUnits {
     "isodow" => Ok(Self::IsoDayOfWeek),
     "isodoy" => Ok(Self::IsoDayOfYear),
     "h" | "hour" | "hours" | "hr" | "hrs" => Ok(Self::Hour),
-    "microsecond" | "microseconds" => Ok(Self::Microseconds),
+    "us" | "usec" | "microsecond" | "microseconds" | "useconds" | "usecs" => {
nit: could you reorganize this slightly:
"us" | "usec" | "microsecond" | "microseconds" | "useconds" | "usecs" => { | |
"us" | "usec" | "usecs" | "useconds" | "microsecond" | "microseconds" => { |
I agree, that's a good change. I just modelled it on the same order as milliseconds down below, so I'll change the order for both of those.
BTW, I've seen other codebases use burntsushi's fst when there are this many strings to match, since the code generated from a match expression doesn't turn the strings into a trie; but since the statements only get parsed once, I don't think there's any benefit to using it here.
Indeed, in this case it's basically part of the query compilation step, so hopefully it isn't often in a hot path. We generally like to have benchmarks or other evidence for optimizations that make the code meaningfully more complex (even if only under the hood). We should keep it in mind, though; there might be some places where it would help!
Thanks again for this!
Fixes #5057
Postgres implementation of truncation and extraction