Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-15029: [C++] Split compute/kernels/scalar_string.cc #12084

Conversation

jvanstraten
Copy link
Contributor

This PR splits compute/kernels/scalar_string.cc into a file for ASCII/binary-based kernels and a file for UTF-8-based kernels to speed up compilation and improve readability. Kernel definitions within the files are also grouped and ordered more consistently for readability.

Note: some kernels used to exist regardless of ARROW_WITH_UTF8PROC,
but now only exist when ARROW_WITH_UTF8PROC is enabled. The change
is not reflected in the tests, so the tests should currently fail
for -DARROW_WITH_UTF8PROC=OFF.
@github-actions
Copy link

github-actions bot commented Jan 5, 2022

@github-actions
Copy link

github-actions bot commented Jan 5, 2022

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@@ -422,7 +422,9 @@ if(ARROW_COMPUTE)
compute/kernels/scalar_nested.cc
compute/kernels/scalar_random.cc
compute/kernels/scalar_set_lookup.cc
compute/kernels/scalar_string.cc
compute/kernels/scalar_string_common.cc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current convention is to suffix file names with _internal when its contents are local common utilities. Based on this, I suggest scalar_string_common.cc --> scalar_string_internal.cc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, done in 36e134b. Based on other *_internal.h files, I guess that will also satisfy the linter w.r.t. RETURN_NOT_OK macro usage in the corresponding header file?

namespace compute {
namespace internal {

FunctionDoc StringPredicateDoc(std::string summary, std::string description) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new file looks not necessary. Move these two functions to scalar_string_ascii.cc? Or to the header (static)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two functions are used by both scalar_string_ascii.cc and scalar_string_utf8.cc equally, so putting them in the header seems more suitable to me. I tried that initially by means of putting all the code in the header in an anonymous namespace, mimicking how the code used to work, but the linter and/or clang-tidy wouldn't have it. I'll give static a try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, scalar_string_internal.cc should contain the definition of all the methods/functions in scalar_string_internal.h, instead of having them in embedded in *.h files. It makes it easier to understand the API of all those classes, but this is not followed systematically in Arrow and, if we want to do this, then it can be done in a follow-up PR.

@cyb70289
Copy link
Contributor

@ursabot please benchmark

@ursabot
Copy link

ursabot commented Jan 11, 2022

Benchmark runs are scheduled for baseline = ec38aeb and contender = 3d77c90. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Failed ⬇️0.84% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@cyb70289
Copy link
Contributor

RTools CI failures are not related.

Copy link
Contributor

@cyb70289 cyb70289 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@cyb70289 cyb70289 closed this in 540dbf6 Jan 11, 2022
@cyb70289
Copy link
Contributor

Thanks @jvanstraten @edponce !

@ursabot
Copy link

ursabot commented Jan 11, 2022

Benchmark runs are scheduled for baseline = 2d1bd96 and contender = 540dbf6. 540dbf6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] ursa-i9-9960x
[Finished ⬇️0.31% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants