New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-15665: [C++] Add error handling option to StrptimeOptions #12464
Conversation
da7c0b5
to
6352488
Compare
|
4df261b
to
a71ed14
Compare
@pitrou I think this is ready for review. I'm not sure why |
@lidavidm I moved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to remove it from scalar_string_ascii.cc.
cpp/src/arrow/compute/api_scalar.h
Outdated
TimeUnit::type unit; | ||
/// Raise on parsing errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the behavior if we don't raise parse errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, we may want to make this more descriptive, e.g. bool error_is_null
or something. Opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to error_is_null
. How about error_returns_null
?
size_t cur = 0; | ||
std::string zone = ""; | ||
while (cur < format.size() - 1) { | ||
if (format[cur] == '%') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to respect %%
for escaping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to regex replace to do escaping. Feels a bit over engineered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regex is a bit much, how about just checking if the next char is '%' and advancing the string if so? (I think we do such things elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to a sequence counting approach.
33ce31a
to
70d6b28
Compare
python/pyarrow/_compute.pyx
Outdated
""" | ||
|
||
def __init__(self, format, unit): | ||
self._set_options(format, unit) | ||
def __init__(self, format, unit, error_is_null=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be False for the default?
Can you also add a test for this default? (a test that checks an error is raised for the default options without explicitly passing error_is_null
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, it was a remainder from the previous raise_errors
parameter. Changed and added a test.
e721a39
to
04202c9
Compare
int64_t* out_data = out->GetMutableValues<int64_t>(1); | ||
|
||
if (self.error_is_null) { | ||
arrow::internal::BitmapWriter out_writer(out->buffers[0]->mutable_data(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we might need to test this more. I thought we merged an optimization where INTERSECTION
wouldn't allocate a null bitmap if all inputs were non-null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what would you recommend we test? Bigger arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arrays with no input nulls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Added bitmap allocation and tests. Could you please check if we're still missing something?
@lidavidm thanks for the review. Could you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM overall, just a couple comments
int64_t* out_data = out->GetMutableValues<int64_t>(1); | ||
|
||
if (self.error_is_null) { | ||
if (!in.MayHaveNulls()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it might be better to directly check out->buffers[0]
instead of relying on the optimization being performed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to if (out->buffers[0] == nullptr)
.
Benchmark runs are scheduled for baseline = 623a15e and contender = ddb663b. ddb663b is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
This is to resolve ARROW-15665.