-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10990: [Rust] Refactor simd comparison kernels to avoid out of bounds reads #8975
ARROW-10990: [Rust] Refactor simd comparison kernels to avoid out of bounds reads #8975
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8975 +/- ##
==========================================
- Coverage 82.61% 82.60% -0.02%
==========================================
Files 202 202
Lines 50055 50008 -47
==========================================
- Hits 41355 41308 -47
Misses 8700 8700
Continue to review full report at Codecov.
|
@jhorstmann -- shall we convert this PR into "ready for review"? It seems important to me but I figured I would wait until you were done with your planned changes before looking into it more carefully. |
@alamb I finished rewriting the tests and it's ready for review now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jhorstmann!
I went through this PR carefully and I understand the logic and it all makes sense to me. I think it is worth considering removing value_slice
as a separate PR but otherwise I think this is ready to go.
cc @nevi-me
@@ -67,19 +67,6 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> { | |||
self.data.is_empty() | |||
} | |||
|
|||
/// Returns a slice for the given offset and length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a breaking change in the API, it might be worth doing as part of a separate PR. What do you think @jorgecarleitao and @nevi-me ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My rationale was that this would have already been removed as part of ARROW-10989, if not for this one usage in the comparison kernels. The straight-forward replacement would be values()[offset..offset+len]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this one, sorry.
I am fine removing it altogether. The change of signature from safe
to unsafe
in a previous commit already made this backward incompatible, anyways. As a user, I will prefer to migrate it to values()[offset..offset+len]
instead of introducing an unsafe
in my code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last merge reintroduced the method, I created a followup ticket to remove it and also from BooleanArray
so we have the api change documented in a ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it https://issues.apache.org/jira/browse/ARROW-11099 👍 thanks @jhorstmann
|
||
let data = ArrayData::new( | ||
DataType::Boolean, | ||
left.len(), | ||
len, | ||
None, | ||
null_bit_buffer, | ||
0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you didn't change this but I don't understand null_count
here being zero -- it probably can be non-zero as we are just cloning the null buffer from left. In any event we don't need to handle that in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, no need to recalculate the null count when comparing against a scalar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API was deprecated altogether to remove the risk of a wrong count, which I think addresses this altogether :)
The full set of Rust CI tests did not run on this PR :( Can you please rebase this PR against apache/master to pick up the changes in #9056 so that they do? I apologize for the inconvenience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. a couple of less unsafe
🎉 :)
3fd84b1
to
5c5a55b
Compare
Rebased. The clippy warnings are all in the datafusion logical plan and don't seem related to this PR. |
…re-kernels-out-of-bounds
…bounds reads - [x] Adjust tests so the input data is bigger than one vector lane - [x] Remove `value_slice` function when ARROW-10989 gets merged Closes apache#8975 from jhorstmann/ARROW-10990-compare-kernels-out-of-bounds Authored-by: Jörn Horstmann <git@jhorstmann.net> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
value_slice
function when ARROW-10989 gets merged