New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-5855: [Python] Support for Duration (timedelta) type #5566
ARROW-5855: [Python] Support for Duration (timedelta) type #5566
Conversation
@@ -449,6 +449,63 @@ class TimestampConverter | |||
TimeUnit::type unit_; | |||
}; | |||
|
|||
template <NullCoding null_coding> | |||
class DurationConverter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost literally a copy paste of the TimestampConverter above with timestamp-things replaced with duration-things. Is there a way to deduplicate some code?
Templating this might be difficult as I also need to switch Python / numpy C API calls? (PyDateTime_DateTime
-> PyDateTime_Delta
, PyDatetimeScalarObject
-> PyTimedeltaScalarObject
, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use something like this:
template <typename ArrowType>
struct PyDateTimeTraits {};
template <>
struct PyDateTimeTraits<TimestampType> {
using PyTypeCheck = PyDateTime_Check;
using PyDateTimeObject = PyDateTime_DateTime;
using py_to_s = internal::PyDateTime_to_s;
using py_to_ms = internal::PyDateTime_to_ms;
using py_to_us = internal::PyDateTime_to_us;
using py_to_ns = internal::PyDateTime_to_ns;
using np_traits = internal::npy_traits<NPY_DATETIME>;
using NpScalarObject = PyDatetimeScalarObject;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that using
can only be used with type names, and not function names?
(I could define an inline function that wraps the correct function for each, but that gets a bit verbose?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline functions sound fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I pushed a version with the template (I am personally not fully sure the end result is easier / better than some duplicated code, but ok ..)
725c49c
to
c38a20a
Compare
Also opened a few follow-up items that I ran into: https://issues.apache.org/jira/browse/ARROW-6778 (casting), overflow in conversion to ns (https://issues.apache.org/jira/browse/ARROW-6779) and parquet support (https://issues.apache.org/jira/browse/ARROW-6780). Another question is if we want to call this type "timedelta" instead of "duration" in the python interface (like we also have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks globally fine, but some questions?
@@ -230,6 +231,26 @@ inline void set_numpy_metadata(int type, const DataType* datatype, PyArray_Descr | |||
// datatype->type == Type::DATE64 | |||
date_dtype->meta.base = NPY_FR_D; | |||
} | |||
} else if (type == NPY_TIMEDELTA) { | |||
auto date_dtype = reinterpret_cast<PyArray_DatetimeDTypeMetaData*>(out->c_metadata); | |||
if (datatype->id() == Type::DURATION) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That shouldn't happen (as NPY_TIMEDELTA should only be passed in case of Type::DURATION). So I can just leave out the if check? (and maybe add a debug assert?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, add a debug assert.
@@ -449,6 +449,63 @@ class TimestampConverter | |||
TimeUnit::type unit_; | |||
}; | |||
|
|||
template <NullCoding null_coding> | |||
class DurationConverter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use something like this:
template <typename ArrowType>
struct PyDateTimeTraits {};
template <>
struct PyDateTimeTraits<TimestampType> {
using PyTypeCheck = PyDateTime_Check;
using PyDateTimeObject = PyDateTime_DateTime;
using py_to_s = internal::PyDateTime_to_s;
using py_to_ms = internal::PyDateTime_to_ms;
using py_to_us = internal::PyDateTime_to_us;
using py_to_ns = internal::PyDateTime_to_ns;
using np_traits = internal::npy_traits<NPY_DATETIME>;
using NpScalarObject = PyDatetimeScalarObject;
};
@@ -863,6 +867,34 @@ def test_datetime_subclassing(): | |||
assert arr_us[0].as_py() == datetime.datetime(2007, 7, 13, 1, | |||
23, 34, 123456) | |||
|
|||
data = [ | |||
MyTimedelta(1, 1, 1000), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to test the conversion with more interesting values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What values would you find interesting? (thinking about specific corner cases?)
(it's mainly the fact that the subclass is working that is being tested here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something less trivial than days and seconds having the same value.
@pytest.mark.parametrize('unit', ['s', 'ms', 'us', 'ns']) | ||
def test_sequence_duration_with_unit(unit): | ||
data = [ | ||
datetime.timedelta(1, 1, 1001), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to test with more interesting values here as well...
def test_duration(self): | ||
arr = np.array([0, 3600000000000], dtype='timedelta64[ns]') | ||
|
||
units = ['us', 'ms', 's'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not test nanoseconds as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nanoseconds are tested below separately, as they have a different behaviour for with or without pandas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! Will update for your suggestions later
@@ -230,6 +231,26 @@ inline void set_numpy_metadata(int type, const DataType* datatype, PyArray_Descr | |||
// datatype->type == Type::DATE64 | |||
date_dtype->meta.base = NPY_FR_D; | |||
} | |||
} else if (type == NPY_TIMEDELTA) { | |||
auto date_dtype = reinterpret_cast<PyArray_DatetimeDTypeMetaData*>(out->c_metadata); | |||
if (datatype->id() == Type::DURATION) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That shouldn't happen (as NPY_TIMEDELTA should only be passed in case of Type::DURATION). So I can just leave out the if check? (and maybe add a debug assert?)
@@ -926,6 +931,16 @@ def test_cast_date32_to_int(): | |||
assert result2.equals(arr) | |||
|
|||
|
|||
@pytest.mark.xfail(strict=True) # TODO implement duration cast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already opened https://issues.apache.org/jira/browse/ARROW-6778 for that (but I also already wrote the test when I found out, so thought to just keep it)
@@ -863,6 +867,34 @@ def test_datetime_subclassing(): | |||
assert arr_us[0].as_py() == datetime.datetime(2007, 7, 13, 1, | |||
23, 34, 123456) | |||
|
|||
data = [ | |||
MyTimedelta(1, 1, 1000), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What values would you find interesting? (thinking about specific corner cases?)
(it's mainly the fact that the subclass is working that is being tested here)
def test_duration(self): | ||
arr = np.array([0, 3600000000000], dtype='timedelta64[ns]') | ||
|
||
units = ['us', 'ms', 's'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nanoseconds are tested below separately, as they have a different behaviour for with or without pandas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated for your feedback! Added some additional questions
@@ -449,6 +449,63 @@ class TimestampConverter | |||
TimeUnit::type unit_; | |||
}; | |||
|
|||
template <NullCoding null_coding> | |||
class DurationConverter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that using
can only be used with type names, and not function names?
(I could define an inline function that wraps the correct function for each, but that gets a bit verbose?)
@ursabot crossbow submit docker-python-3.6-nopandas |
AMD64 Conda Crossbow Submit (#67612) builder has been succeeded. Revision: f35aa25 Submitted crossbow builds: ursa-labs/crossbow @ ursabot-238
|
Codecov Report
@@ Coverage Diff @@
## master #5566 +/- ##
==========================================
+ Coverage 88.79% 89.35% +0.55%
==========================================
Files 983 791 -192
Lines 132170 117358 -14812
Branches 1501 0 -1501
==========================================
- Hits 117362 104865 -12497
+ Misses 14443 12493 -1950
+ Partials 365 0 -365
Continue to review full report at Codecov.
|
9b3f61d
to
7458000
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, just a nit
static constexpr const char* const np_name = "timedelta64"; | ||
}; | ||
|
||
template <NullCoding null_coding, typename Type> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: call this ArrowType
?
@pitrou updated, and all green! |
Thanks @jorisvandenbossche ! |
And thanks a lot for your review and help! |
https://issues.apache.org/jira/browse/ARROW-5855