
Issue #2779: DATE_PART structs #2822

Merged
merged 12 commits into duckdb:master on Dec 29, 2021

Conversation

hawkfish (Contributor)

Implement for TIMESTAMP, DATE, TIME, INTERVAL and TIMESTAMPTZ.
Tweak promotion weights to prefer VARCHAR to LIST.
Some minor code cleanup and templatisation.
Expand test data.
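
For context, here is a minimal sketch of the call shape this PR enables (the table and column names are hypothetical): passing a LIST of part names to date_part returns a STRUCT with one field per requested part.

```sql
-- Hypothetical table 'timestamps' with a TIMESTAMP column 'ts'.
-- The LIST argument makes date_part return a STRUCT of the requested parts.
SELECT date_part(['year', 'month', 'day'], ts) AS parts
FROM timestamps;
-- e.g. {'year': 2021, 'month': 12, 'day': 18}
```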

Richard Wesley added 3 commits December 18, 2021 16:09
Implement for TIMESTAMP, DATE, TIME and INTERVAL.
Tweak promotion weights to prefer VARCHAR to LIST.
Implement for TIMESTAMPTZ.
Some minor code cleanup and templatisation.
Expand test data.
@Mytherin (Collaborator) left a comment:

Thanks for the PR! Looks good. A request for more tests below. Also, since the purpose of this is to improve performance when multiple extractions are requested, could we add some benchmarks comparing multiple calls to date_part to a single call to date_part with multiple components? Particularly I'm interested in the cross-over point (is it already faster with two parts, or three parts, etc), since there does seem to be some extra work performed per value.
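
For illustration, these are the two query shapes such a benchmark would compare (names hypothetical): one date_part call per component versus a single struct-producing call.

```sql
-- N separate scalar extractions, one call per component.
SELECT date_part('year', ts), date_part('month', ts), date_part('day', ts)
FROM timestamps;

-- A single call extracting the same components into a STRUCT.
SELECT date_part(['year', 'month', 'day'], ts)
FROM timestamps;
```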

Review comment on test/sql/function/timestamp/test_icu_datepart.test (outdated, resolved)
Handle hostile date part LISTs.
@hawkfish (Contributor Author)

Filling in a STRUCT is a row-ish operation, so there is definitely a trade off. I was expecting that the multiplexing was less costly than all the arithmetic, but I should measure it.

@hawkfish (Contributor Author)

I see parity at about 5 parts (['year', 'month', 'day', 'decade', 'week']), but both approaches seem to plateau there. Profiling is indicated.

It's also worth mentioning that ICU computes all parts when you ask for any of them, but the built-in code only computes a few. Is it possible to benchmark ICU calls?

@Mytherin (Collaborator)

You can add `require icu` to a benchmark script, similar to a test script.
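
As a rough sketch, a benchmark script gated on ICU might look like the following; the file name, header fields and data setup are assumptions for illustration, not the scripts actually added in this PR.

```
# name: benchmark/micro/date/icu_datepart_struct.benchmark
# description: date_part with a LIST of parts, evaluated via ICU
# group: [micro]

require icu

load
CREATE TABLE timestamps AS
    SELECT TIMESTAMP '2021-01-01 00:00:00' + i * INTERVAL 1 HOUR AS ts
    FROM range(100000) tbl(i);

run
SELECT date_part(['year', 'month', 'day', 'decade'], ts::TIMESTAMPTZ) FROM timestamps;
```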

@hawkfish (Contributor Author)

hawkfish commented Dec 22, 2021

The crossover depends on which calculations are being performed. In particular, the week calculations are expensive and should only be performed if needed. After filtering the computations down to what the struct actually requests, I see the crossover for the YMD family at around 4 values (year, month, day and decade are on par, but adding century puts the struct ahead).

The ICU part extraction always benefits because it always computes all parts.

Crossover Dashboard

@Mytherin (Collaborator)

Mytherin commented Dec 22, 2021

Perhaps it could be sped up by having a `bool part_required[X]` array that is used to figure out which values are actually required? I imagine certain components are also much more common than others (e.g. I imagine decade, century and millennium are very rarely used).

Richard Wesley added 7 commits December 22, 2021 12:25
@Mytherin (Collaborator)

Thanks for the updates! This looks great now. One minor nit: there are several combinations of date parts that appear not to be tested, e.g. the blocks with `mask & DOW`, `mask & EPOCH` and `mask & DOY`. Could you add some tests that trigger these scenarios? After that I think this is ready to merge.
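
For illustration, here are hypothetical queries that could exercise those blocks; exactly which part names map to the DOW, EPOCH and DOY masks is an assumption.

```sql
SELECT date_part(['dow', 'isodow', 'year'], DATE '2021-12-25');        -- DOW mask
SELECT date_part(['epoch', 'year'], TIMESTAMP '2021-12-25 01:02:03');  -- EPOCH mask
SELECT date_part(['doy', 'year'], DATE '2021-12-25');                  -- DOY mask
```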

Improve coverage and fix a few small bugs.
@hawkfish (Contributor Author)

Good call - found three small bugs!

@Mytherin merged commit ef3070e into duckdb:master on Dec 29, 2021
@Mytherin (Collaborator)

Excellent! Thanks for the updates. Everything looks good now.

@hawkfish deleted the hawkfish-datepart-struct branch on December 29, 2021 at 14:31