New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-16741: [C++] Add Benchmarks for Binary Temporal Operations #13302
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. It is very much needed!
@@ -214,5 +240,19 @@ BENCHMARK_TEMPLATE(BenchmarkStrptime, non_zoned)->Apply(SetArgs); | |||
BENCHMARK_TEMPLATE(BenchmarkStrptime, zoned)->Apply(SetArgs); | |||
BENCHMARK(BenchmarkAssumeTimezone)->Apply(SetArgs); | |||
|
|||
// binary temporal benchmarks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could also add add
, add_checked
, subtract
, subtract_checked
, multiply
, multiply_checked
, divide
and divide_checked
here since they are binary?
These kernels are defined here. However they are a bit more heterogeneous type wise so they will require some care (e.g. add(timestamp, timestamp)
is invalid, while subtract(timestamp, timestamp)
isn't)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are binary, but I believe these kernels are already benchmarked in scalar_arithmetic_benchmark.cc
. Does this mean adding support for adding, subtracting, (multiplying and dividing!) timestamps, and then benchmarking those implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scalar_arithmetic_benchmark.cc
benchmarks non-temporal arrays right now. It would be useful to also benchmark for temporal zoned/nonzoned temporal types as computation would follow a different codepath for those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe the codepaths are not that different. Feel free to ignore this.
#define DECLARE_TEMPORAL_BINARY_BENCHMARKS(OP) \ | ||
BENCHMARK_TEMPLATE(BenchmarkTemporalBinary, OP, non_zoned)->Apply(SetArgs); \ | ||
BENCHMARK_TEMPLATE(BenchmarkTemporalBinary, OP, zoned)->Apply(SetArgs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some cases you might want other time types here (date32 date64, time32, time64, duration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rok, thanks for the timely review!
I was a bit curious about this, since it seems for BenchmarkTemporalBinary
and BenchmarkTemporal
(the original unary version), we are picking random int64_t
-- which I assume is date64
. How do we achieve the other time types, and is there some helpful example code I can follow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you're referring to this part?
auto array = rand.Numeric<Int64Type>(array_size, kInt64Min, kInt64Max, args.null_proportion);
std::shared_ptr<DataType> timestamp_type = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
EXPECT_OK_AND_ASSIGN(auto timestamp_array, array->View(timestamp_type));
So int64_t
here represents an integer array. Before throwing it into the kernel we interpret it as a timestamp array which does not change the underlaying buffer. I believe 32 types use Int32Type
and 64 use Int64Type
. Perhaps you can just use RandomArrayGenerator
to generate needed arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rok, thank you for the clarification!
I refactored the code to work for the other data types, however it seems like (many of) these temporal binary functions currently do not support any other datatype than Int64
.
When running my code with something like date32
or time64
with random.ArrayOf
, I get an NotImplemented: Function years_between has no kernel matching input types (array[time64[ns]], array[time64[ns]])
,
Let me know if I've missed a detail!
EDIT: Ah, I see my error now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not all kernels will support all types.
cpp/src/arrow/compute/api_scalar.h
Outdated
/// \since 8.0.0 | ||
/// \note API not yet finalized | ||
ARROW_EXPORT Result<Datum> MonthDayNanoBetween(const Datum& left, const Datum& right, | ||
ExecContext* ctx = NULLPTR); | ||
|
||
/// \brief DayTime Between finds the number of days and milliseconds between two values | ||
/// | ||
/// \param[in] left input treated as the start time | ||
/// \param[in] right input treated as the end time | ||
/// \param[in] ctx the function execution context, optional | ||
/// \return the resulting datum | ||
/// | ||
/// \since 8.0.0 | ||
/// \note API not yet finalized | ||
ARROW_EXPORT Result<Datum> DayTimeBetween(const Datum& left, const Datum& right, | ||
ExecContext* ctx = NULLPTR); | ||
|
||
/// \brief Days Between finds the number of days between two values | ||
/// | ||
/// \param[in] left input treated as the start time | ||
/// \param[in] right input treated as the end time | ||
/// \param[in] ctx the function execution context, optional | ||
/// \return the resulting datum | ||
/// | ||
/// \since 8.0.0 | ||
/// \note API not yet finalized | ||
ARROW_EXPORT Result<Datum> DaysBetween(const Datum& left, const Datum& right, | ||
ExecContext* ctx = NULLPTR); | ||
|
||
/// \brief Hours Between finds the number of hours between two values | ||
/// | ||
/// \param[in] left input treated as the start time | ||
/// \param[in] right input treated as the end time | ||
/// \param[in] ctx the function execution context, optional | ||
/// \return the resulting datum | ||
/// | ||
/// \since 8.0.0 | ||
/// \note API not yet finalized | ||
ARROW_EXPORT Result<Datum> HoursBetween(const Datum& left, const Datum& right, | ||
ExecContext* ctx = NULLPTR); | ||
|
||
/// \brief Minutes Between finds the number of minutes between two values | ||
/// | ||
/// \param[in] left input treated as the start time | ||
/// \param[in] right input treated as the end time | ||
/// \param[in] ctx the function execution context, optional | ||
/// \return the resulting datum | ||
/// | ||
/// \since 8.0.0 | ||
/// \note API not yet finalized | ||
ARROW_EXPORT Result<Datum> MinutesBetween(const Datum& left, const Datum& right, | ||
ExecContext* ctx = NULLPTR); | ||
|
||
/// \brief Seconds Between finds the number of hours between two values | ||
/// | ||
/// \param[in] left input treated as the start time | ||
/// \param[in] right input treated as the end time | ||
/// \param[in] ctx the function execution context, optional | ||
/// \return the resulting datum | ||
/// | ||
/// \since 8.0.0 | ||
/// \note API not yet finalized | ||
ARROW_EXPORT Result<Datum> SecondsBetween(const Datum& left, const Datum& right, | ||
ExecContext* ctx = NULLPTR); | ||
|
||
/// \brief Milliseconds Between finds the number of milliseconds between two values | ||
/// | ||
/// \param[in] left input treated as the start time | ||
/// \param[in] right input treated as the end time | ||
/// \param[in] ctx the function execution context, optional | ||
/// \return the resulting datum | ||
/// | ||
/// \since 8.0.0 | ||
/// \note API not yet finalized | ||
ARROW_EXPORT Result<Datum> MillisecondsBetween(const Datum& left, const Datum& right, | ||
ExecContext* ctx = NULLPTR); | ||
|
||
/// \brief Microseconds Between finds the number of microseconds between two values | ||
/// | ||
/// \param[in] left input treated as the start time | ||
/// \param[in] right input treated as the end time | ||
/// \param[in] ctx the function execution context, optional | ||
/// \return the resulting datum | ||
/// | ||
/// \since 8.0.0 | ||
/// \note API not yet finalized | ||
ARROW_EXPORT Result<Datum> MicrosecondsBetween(const Datum& left, const Datum& right, | ||
ExecContext* ctx = NULLPTR); | ||
|
||
/// \brief Nanoseconds Between finds the number of nanoseconds between two values | ||
/// | ||
/// \param[in] left input treated as the start time | ||
/// \param[in] right input treated as the end time | ||
/// \param[in] ctx the function execution context, optional | ||
/// \return the resulting datum | ||
/// | ||
/// \since 8.0.0 | ||
/// \note API not yet finalized | ||
ARROW_EXPORT Result<Datum> NanoesecondsBetween(const Datum& left, const Datum& right, | ||
ExecContext* ctx = NULLPTR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lidavidm was this not mapped out for a reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there was any reason, I just neglected to add those.
Hi @rok, unfortunately I can't re-request review as a first-time contributor but I was wondering if you could answer some of my clarification questions from a few days ago when you get the chance! Excited to get this merged in. |
Hey @iChauster sorry for the delay! |
@rok, excellent, I think the binary benchmarks now support different datatypes for their respective kernels. Is this ready to be merged now? |
I have write access yes was just waiting for CI to pass/it was the end of the day for me :) |
@ursabot please benchmark command=cpp-micro --suite-filter=scalar-temporal |
Benchmark runs are scheduled for baseline = fc082c5 and contender = dc9cd20. Results will be available as each benchmark for each run completes. |
@iChauster here are the temporal benchmarks for this PR if you're curious. EDIT: This list is more relevant. |
Add all binary temporal benchmarks and documentation to
api_scalar.h