Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-16741: [C++] Add Benchmarks for Binary Temporal Operations #13302

Merged
merged 5 commits into from Jun 9, 2022

Conversation

iChauster
Copy link
Contributor

@iChauster iChauster commented Jun 2, 2022

Add all binary temporal benchmarks and documentation to api_scalar.h

@github-actions
Copy link

github-actions bot commented Jun 2, 2022

@github-actions
Copy link

github-actions bot commented Jun 2, 2022

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

Copy link
Member

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. It is very much needed!

@@ -214,5 +240,19 @@ BENCHMARK_TEMPLATE(BenchmarkStrptime, non_zoned)->Apply(SetArgs);
BENCHMARK_TEMPLATE(BenchmarkStrptime, zoned)->Apply(SetArgs);
BENCHMARK(BenchmarkAssumeTimezone)->Apply(SetArgs);

// binary temporal benchmarks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could also add add, add_checked, subtract, subtract_checked, multiply, multiply_checked, divide and divide_checked here since they are binary?
These kernels are defined here. However they are a bit more heterogeneous type wise so they will require some care (e.g. add(timestamp, timestamp) is invalid, while subtract(timestamp, timestamp) isn't)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are binary, but I believe these kernels are already benchmarked in scalar_arithmetic_benchmark.cc. Does this mean adding support for adding, subtracting, (multiplying and dividing!) timestamps, and then benchmarking those implementations?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scalar_arithmetic_benchmark.cc benchmarks non-temporal arrays right now. It would be useful to also benchmark for temporal zoned/nonzoned temporal types as computation would follow a different codepath for those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe the codepaths are not that different. Feel free to ignore this.

Comment on lines 197 to 199
#define DECLARE_TEMPORAL_BINARY_BENCHMARKS(OP) \
BENCHMARK_TEMPLATE(BenchmarkTemporalBinary, OP, non_zoned)->Apply(SetArgs); \
BENCHMARK_TEMPLATE(BenchmarkTemporalBinary, OP, zoned)->Apply(SetArgs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some cases you might want other time types here (date32 date64, time32, time64, duration).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rok, thanks for the timely review!

I was a bit curious about this, since it seems for BenchmarkTemporalBinary and BenchmarkTemporal (the original unary version), we are picking random int64_t -- which I assume is date64. How do we achieve the other time types, and is there some helpful example code I can follow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you're referring to this part?

auto array = rand.Numeric<Int64Type>(array_size, kInt64Min, kInt64Max, args.null_proportion);
std::shared_ptr<DataType> timestamp_type = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
EXPECT_OK_AND_ASSIGN(auto timestamp_array, array->View(timestamp_type));

So int64_t here represents an integer array. Before throwing it into the kernel we interpret it as a timestamp array which does not change the underlaying buffer. I believe 32 types use Int32Type and 64 use Int64Type. Perhaps you can just use RandomArrayGenerator to generate needed arrays.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@iChauster iChauster Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rok, thank you for the clarification!

I refactored the code to work for the other data types, however it seems like (many of) these temporal binary functions currently do not support any other datatype than Int64.

When running my code with something like date32 or time64 with random.ArrayOf, I get an NotImplemented: Function years_between has no kernel matching input types (array[time64[ns]], array[time64[ns]]),

Let me know if I've missed a detail!

EDIT: Ah, I see my error now!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not all kernels will support all types.

Comment on lines 1449 to 1604
/// \since 8.0.0
/// \note API not yet finalized
ARROW_EXPORT Result<Datum> MonthDayNanoBetween(const Datum& left, const Datum& right,
ExecContext* ctx = NULLPTR);

/// \brief DayTime Between finds the number of days and milliseconds between two values
///
/// \param[in] left input treated as the start time
/// \param[in] right input treated as the end time
/// \param[in] ctx the function execution context, optional
/// \return the resulting datum
///
/// \since 8.0.0
/// \note API not yet finalized
ARROW_EXPORT Result<Datum> DayTimeBetween(const Datum& left, const Datum& right,
ExecContext* ctx = NULLPTR);

/// \brief Days Between finds the number of days between two values
///
/// \param[in] left input treated as the start time
/// \param[in] right input treated as the end time
/// \param[in] ctx the function execution context, optional
/// \return the resulting datum
///
/// \since 8.0.0
/// \note API not yet finalized
ARROW_EXPORT Result<Datum> DaysBetween(const Datum& left, const Datum& right,
ExecContext* ctx = NULLPTR);

/// \brief Hours Between finds the number of hours between two values
///
/// \param[in] left input treated as the start time
/// \param[in] right input treated as the end time
/// \param[in] ctx the function execution context, optional
/// \return the resulting datum
///
/// \since 8.0.0
/// \note API not yet finalized
ARROW_EXPORT Result<Datum> HoursBetween(const Datum& left, const Datum& right,
ExecContext* ctx = NULLPTR);

/// \brief Minutes Between finds the number of minutes between two values
///
/// \param[in] left input treated as the start time
/// \param[in] right input treated as the end time
/// \param[in] ctx the function execution context, optional
/// \return the resulting datum
///
/// \since 8.0.0
/// \note API not yet finalized
ARROW_EXPORT Result<Datum> MinutesBetween(const Datum& left, const Datum& right,
ExecContext* ctx = NULLPTR);

/// \brief Seconds Between finds the number of hours between two values
///
/// \param[in] left input treated as the start time
/// \param[in] right input treated as the end time
/// \param[in] ctx the function execution context, optional
/// \return the resulting datum
///
/// \since 8.0.0
/// \note API not yet finalized
ARROW_EXPORT Result<Datum> SecondsBetween(const Datum& left, const Datum& right,
ExecContext* ctx = NULLPTR);

/// \brief Milliseconds Between finds the number of milliseconds between two values
///
/// \param[in] left input treated as the start time
/// \param[in] right input treated as the end time
/// \param[in] ctx the function execution context, optional
/// \return the resulting datum
///
/// \since 8.0.0
/// \note API not yet finalized
ARROW_EXPORT Result<Datum> MillisecondsBetween(const Datum& left, const Datum& right,
ExecContext* ctx = NULLPTR);

/// \brief Microseconds Between finds the number of microseconds between two values
///
/// \param[in] left input treated as the start time
/// \param[in] right input treated as the end time
/// \param[in] ctx the function execution context, optional
/// \return the resulting datum
///
/// \since 8.0.0
/// \note API not yet finalized
ARROW_EXPORT Result<Datum> MicrosecondsBetween(const Datum& left, const Datum& right,
ExecContext* ctx = NULLPTR);

/// \brief Nanoseconds Between finds the number of nanoseconds between two values
///
/// \param[in] left input treated as the start time
/// \param[in] right input treated as the end time
/// \param[in] ctx the function execution context, optional
/// \return the resulting datum
///
/// \since 8.0.0
/// \note API not yet finalized
ARROW_EXPORT Result<Datum> NanoesecondsBetween(const Datum& left, const Datum& right,
ExecContext* ctx = NULLPTR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidavidm was this not mapped out for a reason?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there was any reason, I just neglected to add those.

@iChauster
Copy link
Contributor Author

Hi @rok, unfortunately I can't re-request review as a first-time contributor but I was wondering if you could answer some of my clarification questions from a few days ago when you get the chance! Excited to get this merged in.

@rok
Copy link
Member

rok commented Jun 8, 2022

Hey @iChauster sorry for the delay!

@iChauster
Copy link
Contributor Author

@rok, excellent, I think the binary benchmarks now support different datatypes for their respective kernels. Is this ready to be merged now?

@rok
Copy link
Member

rok commented Jun 8, 2022

@rok, excellent, I think the binary benchmarks now support different datatypes for their respective kernels. Is this ready to be merged now?

Looks good to me. Do benchmarks run ok for you locally?

We need a committer to merge this. @lidavidm probably knows these functions best.

@iChauster
Copy link
Contributor Author

iChauster commented Jun 9, 2022

@rok the benchmarks run great locally and fall into a similar range as the TemporalRounding statistics.

Thanks @lidavidm for approving, I'm assuming someone with write access will merge eventually?

Thanks everyone for all the help on this PR!

@lidavidm
Copy link
Member

lidavidm commented Jun 9, 2022

I have write access yes was just waiting for CI to pass/it was the end of the day for me :)

@rok
Copy link
Member

rok commented Jun 9, 2022

@ursabot please benchmark command=cpp-micro --suite-filter=scalar-temporal

@ursabot
Copy link

ursabot commented Jun 9, 2022

Benchmark runs are scheduled for baseline = fc082c5 and contender = dc9cd20. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Only ['lang', 'name'] filters are supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Only ['lang', 'name'] filters are supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Only ['lang', 'name'] filters are supported on ursa-i9-9960x] ursa-i9-9960x
[Finished ⬇️0.0% ⬆️0.07%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] dc9cd205 ursa-thinkcentre-m75q
[Finished] fc082c5e ec2-t3-xlarge-us-east-2
[Finished] fc082c5e test-mac-arm
[Failed] fc082c5e ursa-i9-9960x
[Finished] fc082c5e ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@lidavidm lidavidm merged commit 32054f7 into apache:master Jun 9, 2022
@iChauster iChauster deleted the temporal_binary_benchmarks branch June 9, 2022 16:00
@rok
Copy link
Member

rok commented Jun 9, 2022

@iChauster here are the temporal benchmarks for this PR if you're curious.

EDIT: This list is more relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants