[C++] `duration` add missing kernels #39233

randolf-scholz · 2023-12-14T18:41:35Z

Describe the enhancement requested

Many kernels are missing for the duration type compared with, e.g., numpy.timedelta64. Since duration types are essentially wrapped integers, most of the integer kernels should be transferable, with some exceptions (e.g. prod does not make sense, since it would change the physical units).

I ran a script to figure out what is currently supported.

import pyarrow as pa
import pyarrow.compute  # ensures that pa.compute is available
from pyarrow.lib import ArrowNotImplementedError

i8 = pa.int8()
i64 = pa.int64()
i32 = pa.int32()
f64 = pa.float64()
td64 = pa.duration("s")
b = pa.bool_()

duration_arr = pa.array([-3, 2, -1, 1], type=td64)
int_arr = pa.array([-3, 2, -1, 1], type=i64)
float_arr = pa.array([-3, 2, -1, 1], type=f64)

# td64 = pa.int64()
# duration_arr = pa.array([-3, 2, -1, 1], type=i64)

unary_ops = [
    (pa.compute.negate, duration_arr, td64),
    (pa.compute.negate_checked, duration_arr, td64),
    (pa.compute.abs, duration_arr, td64),
    (pa.compute.abs_checked, duration_arr, td64),
    (pa.compute.sign, duration_arr, i8),
    # tests
    (pa.compute.is_null, duration_arr, b),
    (pa.compute.is_valid, duration_arr, b),
    (pa.compute.is_finite, duration_arr, b),
    (pa.compute.is_inf, duration_arr, b),
    (pa.compute.is_nan, duration_arr, b),
    (pa.compute.true_unless_null, duration_arr, b),
    # aggregations
    (pa.compute.min_max, duration_arr, pa.struct([('min', td64), ('max', td64)])),
    (pa.compute.max, duration_arr, td64),
    (pa.compute.min, duration_arr, td64),
    (pa.compute.sum, duration_arr, td64),
    (pa.compute.mode, duration_arr, pa.struct([('mode', td64), ('count', i64)])),
    # cumulative aggregations
    (pa.compute.cumulative_sum, duration_arr, td64),
    (pa.compute.cumulative_sum_checked, duration_arr, td64),
    (pa.compute.cumulative_min, duration_arr, td64),
    (pa.compute.cumulative_max, duration_arr, td64),
]

binary_ops = [
    # arithmetic
    (pa.compute.add, duration_arr, duration_arr, td64),
    (pa.compute.add_checked, duration_arr, duration_arr, td64),
    (pa.compute.subtract, duration_arr, duration_arr, td64),
    (pa.compute.subtract_checked, duration_arr, duration_arr, td64),
    (pa.compute.multiply, duration_arr, int_arr, td64),
    (pa.compute.multiply_checked, duration_arr, int_arr, td64),
    (pa.compute.divide, duration_arr, duration_arr, f64),
    (pa.compute.divide, duration_arr, int_arr, td64),
    (pa.compute.divide_checked, duration_arr, duration_arr, f64),
    (pa.compute.divide_checked, duration_arr, int_arr, td64),
    # comparisons
    (pa.compute.less, duration_arr, duration_arr, b),
    (pa.compute.less_equal, duration_arr, duration_arr, b),
    (pa.compute.greater, duration_arr, duration_arr, b),
    (pa.compute.greater_equal, duration_arr, duration_arr, b),
    (pa.compute.equal, duration_arr, duration_arr, b),
    (pa.compute.not_equal, duration_arr, duration_arr, b),
    # min/max
    (pa.compute.max_element_wise, duration_arr, duration_arr, td64),
    (pa.compute.min_element_wise, duration_arr, duration_arr, td64),
    # containment
    (pa.compute.is_in, duration_arr, duration_arr, b),
    (pa.compute.index_in, duration_arr, duration_arr, i32),
]

rounding_ops = [
    # operations that require rounding
    (pa.compute.mean, duration_arr, td64),
    (pa.compute.quantile, duration_arr, td64),
    (pa.compute.approximate_median, duration_arr, td64),
    (pa.compute.multiply, duration_arr, float_arr, td64),
    (pa.compute.multiply_checked, duration_arr, float_arr, td64),
    (pa.compute.divide, duration_arr, float_arr, td64),
    (pa.compute.divide_checked, duration_arr, float_arr, td64),
]

for op, *operands, dtype in unary_ops + binary_ops:
    try:
        result = op(*operands)
    except ArrowNotImplementedError:
        x = " "  # kernel not implemented for these input types
    else:
        x = "x"
        assert result.type == dtype, f"{op.__name__}: got {result.type} expected {dtype}"

    # one column per operand type, padded for alignment
    formatted_ops = ", ".join(f"{arg.type!s:<11}" for arg in operands)
    print(f" [{x}] {op.__name__:<24}({formatted_ops}) -> {dtype}")

EDIT: updated with `pyarrow` 16.0

Unary Ops

Binary Ops

Additional Ops

These are somewhat questionable, as they require rounding. They are supported by numpy.timedelta64 arrays.

mean(duration[s]) -> duration[s]
multiply(duration[s], double) -> duration[s]
multiply_checked(duration[s], double) -> duration[s]
divide(duration[s], double) -> duration[s]
divide_checked(duration[s], double) -> duration[s]
quantile(duration[s], double) -> duration[s]
approximate_median(duration[s]) -> duration[s]

Component(s)

C++


js8544 · 2023-12-15T05:28:50Z

Thanks for checking! Most of the missing ones should be trivial to implement. I'll take a look soon.

js8544 · 2023-12-22T07:16:55Z

take

### Rationale for this change

Add kernels for durations.

### What changes are included in this PR?

In this PR I added the ones that require only registration and unit tests. More complicated ones will be in another PR for readability.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* Closes: #39233

Authored-by: Jin Shang <shangjin1997@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>

randolf-scholz · 2024-04-22T19:27:39Z

There are 2 more operations supported by numpy that are missing in the list above:

multiply(duration, double/float) -> duration
divide(duration, double/float) -> duration

In both cases, numpy performs automatic rounding (towards zero) to the specified time resolution:

import numpy as np
td = np.timedelta64(5, "D")
print(td * 1.5)  # 7 days (rounded towards 0 from 7.5)
print(td / 1.5)  # 3 days (rounded towards 0 from 3.3333....)
print(td * -1.55)  # -7 days (rounded towards 0 from -7.75)

The multiply kernel would be especially useful, because it is the inverse of the divide(duration, duration) -> double kernel and is needed to reverse pre-processing transformations.
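As a rough sketch of the semantics being requested, the following pure-Python code models a duration as an integer tick count and truncates toward zero, as in the numpy examples above. The function names `scale_ticks` and `unscale_ticks` are hypothetical and are not pyarrow or numpy APIs.

```python
import math


def scale_ticks(ticks: int, factor: float) -> int:
    """Model of multiply(duration, double): truncate toward zero."""
    return math.trunc(ticks * factor)


def unscale_ticks(ticks: int, divisor: float) -> int:
    """Model of divide(duration, double): truncate toward zero."""
    return math.trunc(ticks / divisor)


days = 5
up = scale_ticks(days, 1.5)          # 7.5 is truncated to 7
down = unscale_ticks(days, 1.5)      # 3.33... is truncated to 3
negative = scale_ticks(days, -1.55)  # -7.75 is truncated to -7

# Reversing a pre-processing step: normalise by a reference duration,
# then multiply the ratio back to recover the original tick count.
reference = 4
original = 10
ratio = original / reference             # models divide(duration, duration) -> double
restored = scale_ticks(reference, ratio)
print(up, down, negative, restored)
```

The round trip is only exact when the product lands on a whole tick; otherwise truncation loses the fractional part.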

randolf-scholz · 2024-04-24T10:20:15Z

@js8544 I updated the list with the kernels available in the 16.0 release.

randolf-scholz added the Type: enhancement label Dec 14, 2023

github-actions bot added the Component: C++ label Dec 14, 2023

randolf-scholz changed the title ~~[C++] duration add missing arithmetic kernels~~ [C++] duration add missing kernels Dec 14, 2023

github-actions bot assigned js8544 Dec 22, 2023

github-actions bot mentioned this issue Dec 23, 2023

GH-39233: [Compute] Add some duration kernels #39358

Merged

pitrou closed this as completed in #39358 Jan 11, 2024

pitrou added this to the 16.0.0 milestone Jan 11, 2024
