ARROW-16741: [C++] Add Benchmarks for Binary Temporal Operations #13302

iChauster · 2022-06-02T17:56:08Z

Add all binary temporal benchmarks and documentation to api_scalar.h

github-actions · 2022-06-02T18:03:40Z

https://issues.apache.org/jira/browse/ARROW-16741

github-actions · 2022-06-02T18:03:41Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

rok

Thanks for doing this. It is very much needed!

rok · 2022-06-02T18:12:15Z

cpp/src/arrow/compute/kernels/scalar_temporal_benchmark.cc

@@ -214,5 +240,19 @@ BENCHMARK_TEMPLATE(BenchmarkStrptime, non_zoned)->Apply(SetArgs);
 BENCHMARK_TEMPLATE(BenchmarkStrptime, zoned)->Apply(SetArgs);
 BENCHMARK(BenchmarkAssumeTimezone)->Apply(SetArgs);

+// binary temporal benchmarks


Perhaps we could also benchmark add, add_checked, subtract, subtract_checked, multiply, multiply_checked, divide and divide_checked here, since they are binary?
These kernels are defined here. However, they are a bit more heterogeneous type-wise, so they will require some care (e.g. add(timestamp, timestamp) is invalid, while subtract(timestamp, timestamp) isn't).

They are binary, but I believe these kernels are already benchmarked in scalar_arithmetic_benchmark.cc. Does this mean adding support for adding, subtracting, (multiplying and dividing!) timestamps, and then benchmarking those implementations?

scalar_arithmetic_benchmark.cc only benchmarks non-temporal arrays right now. It would be useful to also benchmark zoned and non-zoned temporal types, as computation would follow a different codepath for those.

Actually, maybe the codepaths are not that different. Feel free to ignore this.
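
The type asymmetry mentioned in this thread (adding two timestamps is invalid, subtracting them is fine) has a close analogue in the standard library's std::chrono, which may make the rule easier to picture. The sketch below is only an analogy and does not use Arrow; the names CanAdd, CanSubtract, Timestamp, Duration and Elapsed are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <type_traits>
#include <utility>

// Compile-time probes: is `L + R` (or `L - R`) a well-formed expression?
template <typename L, typename R, typename = void>
struct CanAdd : std::false_type {};
template <typename L, typename R>
struct CanAdd<L, R, decltype(void(std::declval<L>() + std::declval<R>()))>
    : std::true_type {};

template <typename L, typename R, typename = void>
struct CanSubtract : std::false_type {};
template <typename L, typename R>
struct CanSubtract<L, R, decltype(void(std::declval<L>() - std::declval<R>()))>
    : std::true_type {};

// Hypothetical aliases standing in for "timestamp" and "duration" types.
using Duration = std::chrono::nanoseconds;
using Timestamp = std::chrono::time_point<std::chrono::system_clock, Duration>;

// Two points in time cannot be added, but their difference is a duration.
static_assert(!CanAdd<Timestamp, Timestamp>::value, "timestamp + timestamp is invalid");
static_assert(CanSubtract<Timestamp, Timestamp>::value, "timestamp - timestamp is valid");
// Shifting a point in time by a duration is valid in both directions.
static_assert(CanAdd<Timestamp, Duration>::value, "timestamp + duration is valid");
static_assert(CanSubtract<Timestamp, Duration>::value, "timestamp - duration is valid");

// The difference of two timestamps, expressed as a duration.
inline Duration Elapsed(Timestamp start, Timestamp end) { return end - start; }
```

A benchmark covering temporal arithmetic would need to pick input type pairs along the same lines, so that each kernel is only fed combinations it accepts.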

rok · 2022-06-02T18:27:10Z

cpp/src/arrow/compute/kernels/scalar_temporal_benchmark.cc

+#define DECLARE_TEMPORAL_BINARY_BENCHMARKS(OP)                                \
+  BENCHMARK_TEMPLATE(BenchmarkTemporalBinary, OP, non_zoned)->Apply(SetArgs); \
+  BENCHMARK_TEMPLATE(BenchmarkTemporalBinary, OP, zoned)->Apply(SetArgs);


For some cases you might want other time types here (date32 date64, time32, time64, duration).

Hi @rok, thanks for the timely review!

I was a bit curious about this, since it seems for BenchmarkTemporalBinary and BenchmarkTemporal (the original unary version), we are picking random int64_t -- which I assume is date64. How do we achieve the other time types, and is there some helpful example code I can follow?

I assume you're referring to this part?

auto array = rand.Numeric<Int64Type>(array_size, kInt64Min, kInt64Max, args.null_proportion);
std::shared_ptr<DataType> timestamp_type = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
EXPECT_OK_AND_ASSIGN(auto timestamp_array, array->View(timestamp_type));

So int64_t here represents an integer array. Before passing it to the kernel we reinterpret it as a timestamp array, which does not change the underlying buffer. I believe the 32-bit types (date32, time32) use Int32Type and the 64-bit types use Int64Type. Perhaps you can just use RandomArrayGenerator to generate the needed arrays.
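
To make the "view does not change the buffer" point concrete, here is a minimal library-free sketch. It is not Arrow code: LogicalType and Int64BackedArray are hypothetical stand-ins, and the only thing it tries to show is that re-labelling the logical type leaves the 64-bit storage untouched and shared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical logical type tag (not an Arrow class).
struct LogicalType {
  std::string name;  // e.g. "int64" or "timestamp[ns]"
};

// Hypothetical array: a logical type plus a non-owning pointer into storage.
// The caller must keep the storage alive for as long as the array is used.
struct Int64BackedArray {
  LogicalType type;
  const std::int64_t* values;
  std::size_t length;

  // Reinterpret the same storage under a different 64-bit logical type.
  // No element is copied or modified; only the type label changes.
  Int64BackedArray View(LogicalType new_type) const {
    return Int64BackedArray{std::move(new_type), values, length};
  }
};
```

Constructing an Int64BackedArray over a std::vector<std::int64_t> and calling View with a timestamp label yields a second array whose values pointer is identical to the first, which is the property the benchmark relies on when it turns random integers into temporal inputs.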

A more complete example.

Hi @rok, thank you for the clarification!

I refactored the code to work for the other data types, however it seems like (many of) these temporal binary functions currently do not support any other datatype than Int64.

When running my code with something like date32 or time64 with random.ArrayOf, I get a NotImplemented: Function years_between has no kernel matching input types (array[time64[ns]], array[time64[ns]]).

Let me know if I've missed a detail!

EDIT: Ah, I see my error now!

Yeah, not all kernels will support all types.
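
The error above comes from dispatch: a function only runs if one of its registered kernels matches the input types. The toy model below sketches that idea in plain C++. It is an assumption-laden illustration, not Arrow's dispatch code; Signature, KernelRegistry and CheckDispatch are hypothetical names, and whatever signatures a caller registers are illustrative rather than Arrow's actual support matrix.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model: each function name maps to the (left, right) type pairs it accepts.
using Signature = std::pair<std::string, std::string>;
using KernelRegistry = std::map<std::string, std::vector<Signature>>;

// Returns an empty string when a kernel matches, otherwise an error message
// shaped like the one quoted in this thread. For brevity, an unknown function
// name is treated the same as a known function with no matching signature.
std::string CheckDispatch(const KernelRegistry& registry, const std::string& function,
                          const std::string& left, const std::string& right) {
  auto it = registry.find(function);
  if (it != registry.end()) {
    for (const auto& sig : it->second) {
      if (sig.first == left && sig.second == right) return "";
    }
  }
  return "NotImplemented: Function " + function +
         " has no kernel matching input types (array[" + left + "], array[" +
         right + "])";
}
```

A benchmark that iterates over many temporal types could run a check like this first and skip unsupported combinations instead of failing.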

rok · 2022-06-02T18:28:23Z

cpp/src/arrow/compute/api_scalar.h

+/// \since 8.0.0
+/// \note API not yet finalized
+ARROW_EXPORT Result<Datum> MonthDayNanoBetween(const Datum& left, const Datum& right,
+                                               ExecContext* ctx = NULLPTR);
+
+/// \brief DayTime Between finds the number of days and milliseconds between two values
+///
+/// \param[in] left input treated as the start time
+/// \param[in] right input treated as the end time
+/// \param[in] ctx the function execution context, optional
+/// \return the resulting datum
+///
+/// \since 8.0.0
+/// \note API not yet finalized
+ARROW_EXPORT Result<Datum> DayTimeBetween(const Datum& left, const Datum& right,
+                                          ExecContext* ctx = NULLPTR);
+
+/// \brief Days Between finds the number of days between two values
+///
+/// \param[in] left input treated as the start time
+/// \param[in] right input treated as the end time
+/// \param[in] ctx the function execution context, optional
+/// \return the resulting datum
+///
+/// \since 8.0.0
+/// \note API not yet finalized
+ARROW_EXPORT Result<Datum> DaysBetween(const Datum& left, const Datum& right,
+                                       ExecContext* ctx = NULLPTR);
+
+/// \brief Hours Between finds the number of hours between two values
+///
+/// \param[in] left input treated as the start time
+/// \param[in] right input treated as the end time
+/// \param[in] ctx the function execution context, optional
+/// \return the resulting datum
+///
+/// \since 8.0.0
+/// \note API not yet finalized
+ARROW_EXPORT Result<Datum> HoursBetween(const Datum& left, const Datum& right,
+                                        ExecContext* ctx = NULLPTR);
+
+/// \brief Minutes Between finds the number of minutes between two values
+///
+/// \param[in] left input treated as the start time
+/// \param[in] right input treated as the end time
+/// \param[in] ctx the function execution context, optional
+/// \return the resulting datum
+///
+/// \since 8.0.0
+/// \note API not yet finalized
+ARROW_EXPORT Result<Datum> MinutesBetween(const Datum& left, const Datum& right,
+                                          ExecContext* ctx = NULLPTR);
+
+/// \brief Seconds Between finds the number of seconds between two values
+///
+/// \param[in] left input treated as the start time
+/// \param[in] right input treated as the end time
+/// \param[in] ctx the function execution context, optional
+/// \return the resulting datum
+///
+/// \since 8.0.0
+/// \note API not yet finalized
+ARROW_EXPORT Result<Datum> SecondsBetween(const Datum& left, const Datum& right,
+                                          ExecContext* ctx = NULLPTR);
+
+/// \brief Milliseconds Between finds the number of milliseconds between two values
+///
+/// \param[in] left input treated as the start time
+/// \param[in] right input treated as the end time
+/// \param[in] ctx the function execution context, optional
+/// \return the resulting datum
+///
+/// \since 8.0.0
+/// \note API not yet finalized
+ARROW_EXPORT Result<Datum> MillisecondsBetween(const Datum& left, const Datum& right,
+                                               ExecContext* ctx = NULLPTR);
+
+/// \brief Microseconds Between finds the number of microseconds between two values
+///
+/// \param[in] left input treated as the start time
+/// \param[in] right input treated as the end time
+/// \param[in] ctx the function execution context, optional
+/// \return the resulting datum
+///
+/// \since 8.0.0
+/// \note API not yet finalized
+ARROW_EXPORT Result<Datum> MicrosecondsBetween(const Datum& left, const Datum& right,
+                                               ExecContext* ctx = NULLPTR);
+
+/// \brief Nanoseconds Between finds the number of nanoseconds between two values
+///
+/// \param[in] left input treated as the start time
+/// \param[in] right input treated as the end time
+/// \param[in] ctx the function execution context, optional
+/// \return the resulting datum
+///
+/// \since 8.0.0
+/// \note API not yet finalized
+ARROW_EXPORT Result<Datum> NanosecondsBetween(const Datum& left, const Datum& right,
+                                              ExecContext* ctx = NULLPTR);


@lidavidm was this not mapped out for a reason?

I don't think there was any reason, I just neglected to add those.

iChauster · 2022-06-06T15:40:28Z

Hi @rok, unfortunately I can't re-request review as a first-time contributor but I was wondering if you could answer some of my clarification questions from a few days ago when you get the chance! Excited to get this merged in.

rok · 2022-06-08T16:36:03Z

Hey @iChauster sorry for the delay!

iChauster · 2022-06-08T20:33:54Z

@rok, excellent, I think the binary benchmarks now support different datatypes for their respective kernels. Is this ready to be merged now?

rok · 2022-06-08T21:02:21Z

@rok, excellent, I think the binary benchmarks now support different datatypes for their respective kernels. Is this ready to be merged now?

Looks good to me. Do benchmarks run ok for you locally?

We need a committer to merge this. @lidavidm probably knows these functions best.

iChauster · 2022-06-09T13:14:13Z

@rok the benchmarks run great locally and fall into a similar range as the TemporalRounding statistics.

Thanks @lidavidm for approving, I'm assuming someone with write access will merge eventually?

Thanks everyone for all the help on this PR!

lidavidm · 2022-06-09T13:23:03Z

I have write access, yes. I was just waiting for CI to pass, and it was the end of the day for me :)

rok · 2022-06-09T13:23:25Z

@ursabot please benchmark command=cpp-micro --suite-filter=scalar-temporal

ursabot · 2022-06-09T13:23:28Z

Benchmark runs are scheduled for baseline = fc082c5 and contender = dc9cd20. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Only ['lang', 'name'] filters are supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Only ['lang', 'name'] filters are supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Only ['lang', 'name'] filters are supported on ursa-i9-9960x] ursa-i9-9960x
[Finished ⬇️0.0% ⬆️0.07%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] dc9cd205 ursa-thinkcentre-m75q
[Finished] fc082c5e ec2-t3-xlarge-us-east-2
[Finished] fc082c5e test-mac-arm
[Failed] fc082c5e ursa-i9-9960x
[Finished] fc082c5e ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

rok · 2022-06-09T16:32:40Z

@iChauster here are the temporal benchmarks for this PR if you're curious.

EDIT: This list is more relevant.

Ivan Chau added 2 commits June 2, 2022 09:53

add years between

4e21d82

add remaining temporal binary benchmarks

be8c216

github-actions bot added the Component: C++ label Jun 2, 2022

apply ninja format, lint

f5d4ca0

rok reviewed Jun 2, 2022

iChauster mentioned this pull request Jun 8, 2022

ARROW-16716: [C++] Add Benchmarks for ProjectNode #13314

Merged

Ivan Chau added 2 commits June 8, 2022 15:40

add date64 types to benchmark

d96998b

add time, date to temporal benchmarks, fix typo in api_scalar

dc9cd20

lidavidm approved these changes Jun 8, 2022

lidavidm merged commit 32054f7 into apache:master Jun 9, 2022

iChauster deleted the temporal_binary_benchmarks branch June 9, 2022 16:00
