Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-15665: [C++] Add error handling option to StrptimeOptions #12464

Closed
wants to merge 8 commits into from

Conversation

rok
Copy link
Member

@rok rok commented Feb 18, 2022

This is to resolve ARROW-15665.

@github-actions
Copy link

@rok
Copy link
Member Author

rok commented Feb 23, 2022

@pitrou I'm not sure what's causing this (Check failed: out.array()->buffers[0] Null bitmap deleted by kernel but can_write_into_slices = true). Any insight?
Would kernel preallocation settings help this or am I missing something else?

@pitrou
Copy link
Member

pitrou commented Feb 28, 2022

can_write_into_slices means the kernel exec machinery won't let you redefine buffer addresses: you can write into the buffers given to you by the machinery but you are not allowed to point them elsewhere.

@rok rok force-pushed the ARROW-15665 branch 6 times, most recently from 4df261b to a71ed14 Compare March 17, 2022 03:06
@rok rok marked this pull request as ready for review March 17, 2022 03:52
@rok
Copy link
Member Author

rok commented Mar 17, 2022

@pitrou I think this is ready for review. I'm not sure why can_write_into_slices=true is required here, could you please check that I'm not doing something ill advised.

@rok
Copy link
Member Author

rok commented Mar 17, 2022

@lidavidm I moved Strptime to scalar_temporal_unary.cc. Do you think that makes sense or better keep it in scalar_string_ascii.cc?

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to remove it from scalar_string_ascii.cc.

TimeUnit::type unit;
/// Raise on parsing errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior if we don't raise parse errors?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we may want to make this more descriptive, e.g. bool error_is_null or something. Opinions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to error_is_null. How about error_returns_null?

cpp/src/arrow/compute/kernels/scalar_string_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/scalar_string_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/scalar_string_test.cc Outdated Show resolved Hide resolved
size_t cur = 0;
std::string zone = "";
while (cur < format.size() - 1) {
if (format[cur] == '%') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to respect %% for escaping?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to regex replace to do escaping. Feels a bit over engineered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex is a bit much, how about just checking if the next char is '%' and advancing the string if so? (I think we do such things elsewhere)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to a sequence counting approach.

cpp/src/arrow/compute/kernels/scalar_temporal_unary.cc Outdated Show resolved Hide resolved
python/pyarrow/_compute.pyx Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/scalar_temporal_unary.cc Outdated Show resolved Hide resolved
@rok rok force-pushed the ARROW-15665 branch 2 times, most recently from 33ce31a to 70d6b28 Compare March 17, 2022 20:37
"""

def __init__(self, format, unit):
self._set_options(format, unit)
def __init__(self, format, unit, error_is_null=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be False for the default?

Can you also add a test for this default? (a test that checks an error is raised for the default options without explicitly passing error_is_null)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, it was a remainder from the previous raise_errors parameter. Changed and added a test.

@rok rok force-pushed the ARROW-15665 branch 3 times, most recently from e721a39 to 04202c9 Compare March 22, 2022 19:14
int64_t* out_data = out->GetMutableValues<int64_t>(1);

if (self.error_is_null) {
arrow::internal::BitmapWriter out_writer(out->buffers[0]->mutable_data(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we might need to test this more. I thought we merged an optimization where INTERSECTION wouldn't allocate a null bitmap if all inputs were non-null.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what would you recommend we test? Bigger arrays?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arrays with no input nulls

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Added bitmap allocation and tests. Could you please check if we're still missing something?

@rok
Copy link
Member Author

rok commented Mar 23, 2022

@lidavidm thanks for the review. Could you take another look?

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM overall, just a couple comments

int64_t* out_data = out->GetMutableValues<int64_t>(1);

if (self.error_is_null) {
if (!in.MayHaveNulls()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it might be better to directly check out->buffers[0] instead of relying on the optimization being performed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to if (out->buffers[0] == nullptr).

@rok rok requested a review from lidavidm March 24, 2022 15:18
@lidavidm lidavidm closed this in ddb663b Mar 24, 2022
@ursabot
Copy link

ursabot commented Mar 24, 2022

Benchmark runs are scheduled for baseline = 623a15e and contender = ddb663b. ddb663b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.34% ⬆️0.08%] test-mac-arm
[Failed ⬇️0.36% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.51% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants