Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Dates #41

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from
Draft

feat: add Dates #41

wants to merge 8 commits into from

Conversation

ianna
Copy link
Member

@ianna ianna commented Oct 19, 2023

No description provided.

@ianna ianna marked this pull request as draft October 19, 2023 14:42
Project.toml Outdated Show resolved Hide resolved
Copy link
Member Author

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, I'll need to add some tests for Datetime.

Project.toml Outdated Show resolved Hide resolved
@ianna ianna changed the title test: add Dates and other dependencies feat: add Dates Nov 15, 2023
@ianna ianna marked this pull request as ready for review November 15, 2023 17:03
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a high-priority PR (as I said, "nice to have, because Awkward can handle dates"), so you can come back to it, but it's not currently ready.

In NumPy, and therefore Awkward Array, the datetime and timedelta primitives are parameterized by units. If unspecified, NumPy has a default.

https://github.com/scikit-hep/awkward/blob/062deb3ab39525c741907893dca402aa480b8b0b/src/awkward/types/numpytype.py#L28-L65

Specifically, the primitive strings can be

_primitive_to_dtype_datetime = re.compile(
    r"datetime64\[(\s*-?[0-9]*)?(Y|M|W|D|h|m|s|ms|us|\u03bc|ns|ps|fs|as)\]"
)
_primitive_to_dtype_timedelta = re.compile(
    r"timedelta64\[(\s*-?[0-9]*)?(Y|M|W|D|h|m|s|ms|us|\u03bc|ns|ps|fs|as)\]"
)

To fully implement dates, the Julia code would have to parse the string and multiply the UInt64 values in the buffer by an appropriate amount.

NumPy's default units are not the same as Julia's units, so this PR will always give wrong results:

>>> import numpy as np
>>> [int(x) for x in np.array(["2023-11-15T13:10:00"], dtype="datetime64").tobytes()]
[40, 195, 84, 101, 0, 0, 0, 0]
>>> [int(x) for x in np.array([123], dtype="timedelta64").tobytes()]
[123, 0, 0, 0, 0, 0, 0, 0]
julia> reinterpret(Dates.DateTime, Vector{UInt8}([56, 194, 84, 101, 0, 0, 0, 0]))
1-element reinterpret(Dates.DateTime, ::Vector{UInt8}):
 0001-01-19T16:14:13.560

julia> reinterpret(Dates.TimePeriod, Vector{UInt8}([123, 0, 0, 0, 0, 0, 0, 0]))
ERROR: ArgumentError: cannot reinterpret `UInt8` as `Dates.TimePeriod`, type `Dates.TimePeriod` is not a bits type
Stacktrace:
 [1] (::Base.var"#throwbits#323")(S::Type, T::Type, U::Type)
   @ Base ./reinterpretarray.jl:16
 [2] reinterpret(#unused#::Type{Dates.TimePeriod}, a::Vector{UInt8})
   @ Base ./reinterpretarray.jl:62
 [3] top-level scope
   @ REPL[4]:1

Furthermore, it doesn't look like reinterpret(TimePeriod, buffer) is even compilable. The tests probably pass because this code isn't being compiled.

@ianna ianna marked this pull request as draft November 16, 2023 08:28
@Moelf
Copy link
Member

Moelf commented Nov 16, 2023

julia> dump(now())
DateTime
  instant: Dates.UTInstant{Millisecond}
    periods: Millisecond
      value: Int64 63835823961576

help?> DateTime()
  DateTime


  DateTime wraps a UTInstant{Millisecond} and interprets it according to the proleptic Gregorian calendar.

shouldn't be too hard to convert between things but yeah they don't work out of the box

@jpivarski
Copy link
Member

Even if Julia's DateTime does all of the non-linear time units (e.g. months), leap-seconds, and such, a proper implementation would at least require us to check to be sure that Julia and NumPy do the same set of things. Like, suppose that Julia does leap-seconds and NumPy does not—we'd have to adjust for NumPy's naivete in the conversion (or vice-versa).

Incidentally, this is just as much of a problem going from NumPy to Pandas to Arrow to...

(Dates are hard.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants