Initial implementation of DateTime64 #5187
…oDate for some reason, maybe the data is unaligned?
That implementation does not extend the supported datetime range, and makes nanoseconds the 'default choice' for users who need (let's say) milliseconds only. In my opinion:
Even milliseconds force the use of at least a 64-bit data type instead of 32-bit. Consequently, there is not much difference between millisecond and nanosecond resolution.
+1. In a straightforward implementation (a number of nanoseconds since the epoch), the Int64 data type will give us about 292 years around 1970.
It is not obvious whether that is enough. Another way to implement it is to store the fractional component in a separate subcolumn and data stream (the way the Tuple, Array, and Nullable data types are stored and processed).
If you use microseconds instead of nanoseconds, it leaves room to store 1000x more seconds, which gives 292471 years in both directions — 100% enough for all possible cases. Or you can decrease subsecond precision to zero and store the age of the universe in ClickHouse :)

Another (more realistic) scenario: I have microseconds in the data stream, but want to store only 1/10-second precision in the DB. So the insert comes in as '2019-05-05 23:02:00.123141203' and I want '2019-05-05 23:02:00.1' to be stored.

Also, you could use bit shifts instead of decimal division to remove subsecond precision, i.e. give the whole lower 31 bits to nanoseconds, or the lower 21 bits to microseconds, or the lower 10 bits to milliseconds, and the rest to seconds. (But that would make it incompatible with plain UInt64, and a direct typecast would give strange results.)

If fixing DateTime64 to a single precision, I think it should be fixed to microseconds, not nanoseconds. I don't know any database natively supporting nanoseconds.
SQL Server: January 1, 1753, through December 31, 9999
BTW: DateLUT should be adjusted to support that anyway, right? Because it will start getting Int64 instead of UInt32.
Yes. Some calendar issues will arise (for calculations on historical events), but we can just ignore them.
Ok, but it will be doable nevertheless.
The compiler will translate the division into a multiplication (latency 3 clock cycles) and a bit shift (latency 1 clock cycle). As all operations are done in a loop, the loop will be unrolled and vectorized. SSE4.1 has packed multiplication of two 64-bit integers. It will be slower than a plain bit shift, but not by much (about two to three times).
As far as I know, InfluxDB uses nanosecond precision by default.
We could introduce (probably via a template) another DateLUT for Int64 and keep the existing one (for UInt32) to avoid any performance penalty. The existing DateLUT has a very nice memory layout (but the difference should be measured).
Does the windowFunnel function support DateTime64?
IMHO it should have dynamic precision — like Decimal, and like MySQL: https://dev.mysql.com/doc/refman/5.6/en/fractional-seconds.html
The continuation of this PR is here.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Request for feedback - not for merging yet.
#4860
There is still some functionality that should be added; however, I believe this is a valid first prototype implementation in terms of functionality. Feedback would be very much appreciated.
For changelog. Remove if this is non-significant change.
Add DateTime64
Category (leave one):
Short description (up to few sentences):
Adding a DateTime64 column. As the name implies, it consists of 64 bits. Currently the only supported way of interpreting it is as nanos since the epoch. It supports formatting, parsing from string, and some basic transformations as described in DateTimeTransforms.h.
Detailed description (optional):
The interpretation of the 64 bit value has been factored out, so it should be straightforward to also include functionality for millis/micros since epoch (this would work in a similar fashion to how the timezone changes how the field is interpreted).
The reason for picking nanos since the epoch initially is that this is also what the Python data processing ecosystem has settled on (e.g. pandas uses int64_t nanos since epoch for its timestamps). The test below outlines the currently supported functionality. As the column is based on Int64, joins work fine; what is currently lacking is arithmetic.