Initial implementation of DateTime64 #5187

Closed
wants to merge 15 commits

Conversation

@Gladdy (Contributor) commented May 4, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Request for feedback - not for merging yet.

#4860
There is still some functionality that should be added; however, I believe this is a valid first prototype in terms of functionality, and feedback would be very much appreciated.

For the changelog. Remove if this is a non-significant change.
Add DateTime64

Category (leave one):

  • New Feature

Short description (up to a few sentences):
Adds a DateTime64 column type. As the name implies, it consists of 64 bits. Currently the only supported interpretation is nanoseconds since the epoch. It supports formatting, input as a string, and some basic transformations as described in DateTimeTransforms.h.

Detailed description (optional):
The interpretation of the 64-bit value has been factored out, so it should be straightforward to also add support for milliseconds/microseconds since the epoch (this would work in a similar fashion to how the timezone changes how the field is interpreted).

The reason for initially picking nanoseconds since the epoch is that this is what the Python data processing ecosystem has settled on (e.g. pandas uses int64_t nanoseconds since the epoch for its timestamps). The test below outlines the currently supported functionality. As the column is based on Int64, joins work fine; what is currently lacking is arithmetic.

CREATE TABLE A(t DateTime64) ENGINE = MergeTree() ORDER BY t;
INSERT INTO A(t) VALUES (1556879125123456789);
INSERT INTO A(t) VALUES ('2019-05-03 11:25:25.123456789');

SELECT toString(t, 'UTC'), toDate(t), toStartOfDay(t), toStartOfQuarter(t), toTime(t), toStartOfMinute(t) FROM A ORDER BY t;
2019-05-03 10:25:25.123456789	2019-05-03	2019-05-03 00:00:00	2019-04-01	1970-01-02 11:25:25	2019-05-03 11:25:00
2019-05-03 10:25:25.123456789	2019-05-03	2019-05-03 00:00:00	2019-04-01	1970-01-02 11:25:25	2019-05-03 11:25:00
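
Joins on the Int64-backed column can be sketched as follows (table B and its contents are hypothetical, not part of this PR's tests):

CREATE TABLE B(t DateTime64, v UInt32) ENGINE = MergeTree() ORDER BY t;
INSERT INTO B(t, v) VALUES (1556879125123456789, 42);

SELECT t, v FROM A INNER JOIN B USING (t);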

@alexey-milovidov alexey-milovidov added can be tested pr-feature Pull request with new product feature labels May 4, 2019
@filimonov (Contributor) commented May 5, 2019

This implementation does not extend the supported datetime range, and it makes nanoseconds the 'default choice' for users who need (let's say) only milliseconds.

In my opinion:

  1. Precision should be configurable (as in the Decimal data type), because nanoseconds are needed quite rarely, while micro-/milliseconds are needed quite often. (A syntax sketch follows this list.)
  2. Introducing 'wider' DateTime fields should also cover wider time ranges than 1970..2105 (this can simply fall back to traditional, slower calendar calculations for times not covered by the lookup tables).
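
A syntax sketch of what configurable precision could look like, borrowing the Decimal convention (the DateTime64(N) parameter is a proposal here, not something this PR implements):

CREATE TABLE events
(
    t_ms DateTime64(3),  -- milliseconds
    t_us DateTime64(6)   -- microseconds
) ENGINE = MergeTree() ORDER BY t_ms;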

@alexey-milovidov (Member) commented May 5, 2019

> Precision should be configurable (as in the Decimal data type), because nanoseconds are needed quite rarely, while micro-/milliseconds are needed quite often.

Even milliseconds force the use of at least a 64-bit data type instead of a 32-bit one. Consequently, there is not much difference between millisecond and nanosecond resolution.

> Introducing 'wider' DateTime fields should also cover wider time ranges than 1970..2105 (this can simply fall back to traditional, slower calendar calculations for times not covered by the lookup tables).

+1.

In a straightforward implementation (a number of nanoseconds since the epoch), the Int64 data type gives us about 292 years around 1970:

example.yandex.net :) SELECT (0x7FFFFFFFFFFFFFFF / 1000000000) / 86400 / 365

SELECT ((9223372036854775807 / 1000000000) / 86400) / 365

┌─divide(divide(divide(9223372036854775807, 1000000000), 86400), 365)─┐
│                                                    292.471208677536 │
└─────────────────────────────────────────────────────────────────────┘

It is not obvious whether it's enough.

Another implementation approach is to store the fractional component in a separate subcolumn and data stream (the way Tuple, Array, and Nullable data types are stored and processed).
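
For illustration only, the subcolumn layout can be approximated with existing types: seconds plus a fractional part in a Tuple, whose elements are already stored as separate streams (not a proposed storage format, and the table is hypothetical):

CREATE TABLE C(t Tuple(DateTime, UInt32)) ENGINE = MergeTree() ORDER BY tuple();
INSERT INTO C(t) VALUES ((toDateTime('2019-05-03 11:25:25'), 123456789));
SELECT tupleElement(t, 1) AS seconds, tupleElement(t, 2) AS nanos FROM C;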

@filimonov (Contributor) commented May 5, 2019

> Precision should be configurable (as in the Decimal data type), because nanoseconds are needed quite rarely, while micro-/milliseconds are needed quite often.

> Even milliseconds force the use of at least a 64-bit data type instead of a 32-bit one. Consequently, there is not much difference between millisecond and nanosecond resolution.

If you use microseconds instead of nanoseconds, there is room to store 1000x more seconds, which gives about 292471 years in both directions; that is enough for all possible cases.
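
The same back-of-envelope check as in the earlier comment, this time dividing by 10^6 for microseconds:

SELECT ((9223372036854775807 / 1000000) / 86400) / 365
-- ≈ 292471 years on each side of the epoch, since Int64 is signed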

Or you can decrease subsecond precision to zero and store the age of the universe in ClickHouse :)

Another (more realistic) scenario: I have microseconds in the data stream, but want to store only 1/10-second precision in the DB. So the insert comes in as '2019-05-05 23:02:00.123141203' and I want '2019-05-05 23:02:00.1' to be stored.
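
Under the configurable-precision sketch above, that scenario might look like this (both the syntax and the rounding behavior are hypothetical):

CREATE TABLE ticks(t DateTime64(1)) ENGINE = MergeTree() ORDER BY t;
INSERT INTO ticks(t) VALUES ('2019-05-05 23:02:00.123141203');
SELECT t FROM ticks;  -- would return 2019-05-05 23:02:00.1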

Also, you could use bit shifts instead of decimal division to remove subsecond precision, i.e. give the whole lower 31 bits to nanoseconds, the lower 21 bits to microseconds, or the lower 10 bits to milliseconds, with the rest going to seconds (but that would make it incompatible with plain UInt64, and a direct typecast would give strange results).
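
The bit layout can be illustrated with plain UInt64 arithmetic and ClickHouse's bit functions (illustrative only, not a proposed API):

WITH
    toUInt64(1556879125) AS sec,
    toUInt64(123456789) AS nanos
SELECT
    bitOr(bitShiftLeft(sec, 31), nanos) AS packed,  -- seconds in the upper bits, nanos in the lower 31
    bitShiftRight(packed, 31) AS seconds_back,      -- recover the seconds
    bitAnd(packed, 0x7FFFFFFF) AS nanos_back        -- recover the nanoseconds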

If DateTime64 gets some fixed precision, I think it should be fixed at microseconds, not nanoseconds. I don't know of any database natively supporting nanoseconds.

> Introducing 'wider' DateTime fields should also cover wider time ranges than 1970..2105 (this can simply fall back to traditional, slower calendar calculations for times not covered by the lookup tables).

+1.

> In a straightforward implementation (a number of nanoseconds since the epoch), the Int64 data type gives us about 292 years around 1970:
> It is not obvious whether it's enough.

  • SQL Server: January 1, 1753 through December 31, 9999
  • MySQL: '1000-01-01 00:00:00' to '9999-12-31 23:59:59'
  • Postgres: 4713 BC..294276 AD
  • Oracle: '0001-01-01-00.00.00.000000'..'9999-12-31-23.59.59.999999'

@filimonov (Contributor) commented May 5, 2019

> In a straightforward implementation (a number of nanoseconds since the epoch), the Int64 data type gives us about 292 years around 1970:

BTW: DateLUT should be adjusted to support that anyway, right? Because it will start receiving Int64 instead of UInt32.

@alexey-milovidov (Member) commented

> which gives about 292471 years in both directions; that is enough for all possible cases.

Yes. Some calendar issues will arise (for calculations on historical events), but we can just ignore them.

> Another (more realistic) scenario: I have microseconds in the data stream, but want to store only 1/10-second precision in the DB. So the insert comes in as '2019-05-05 23:02:00.123141203' and I want '2019-05-05 23:02:00.1' to be stored.

Ok, but it will be doable nevertheless.

> Also, you could use bit shifts instead of decimal division to remove subsecond precision, i.e. give the whole lower 31 bits to nanoseconds, the lower 21 bits to microseconds, or the lower 10 bits to milliseconds, with the rest going to seconds (but that would make it incompatible with plain UInt64, and a direct typecast would give strange results).

The compiler will translate the division into a multiplication (latency 3 clock cycles) and a bit shift (latency 1 clock cycle). As all operations are done in a loop, the loop will be unrolled and vectorized. SSE4.1 has packed multiplication of two 64-bit integers. It will be slower than a plain bit shift, but not by much (about two to three times).

> If DateTime64 gets some fixed precision, I think it should be fixed at microseconds, not nanoseconds. I don't know of any database natively supporting nanoseconds.

As far as I know, InfluxDB uses nanosecond precision by default.

> BTW: DateLUT should be adjusted to support that anyway, right? Because it will start receiving Int64 instead of UInt32.

We can introduce (probably via a template) another DateLUT for Int64 and keep the existing one (for UInt32) to avoid any performance penalty. The existing DateLUT has a very nice memory layout (but the difference should be measured).

@suxw8813 commented

Does the windowFunnel function support DateTime64?

@vitlibar vitlibar self-assigned this May 31, 2019
@filimonov (Contributor) commented

IMHO it should have dynamic precision, like Decimal, and like MySQL's fractional seconds: https://dev.mysql.com/doc/refman/5.6/en/fractional-seconds.html

@filimonov filimonov assigned filimonov and unassigned filimonov Sep 3, 2019
@stale stale bot added the st-wontfix Known issue, no plans to fix it currenlty label Oct 20, 2019
@blinkov blinkov removed the st-wontfix Known issue, no plans to fix it currenlty label Oct 20, 2019
@ClickHouse ClickHouse deleted a comment from stale bot Oct 29, 2019
@stavrolia (Contributor) commented

The continuation of this PR is here.

@stavrolia stavrolia closed this Nov 27, 2019
@Enmk Enmk mentioned this pull request Dec 11, 2019