Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10990: [Rust] Refactor simd comparison kernels to avoid out of bounds reads #8975

Conversation

jhorstmann
Copy link
Contributor

@jhorstmann jhorstmann commented Dec 20, 2020

  • Adjust tests so the input data is bigger than one vector lane
  • Remove value_slice function when ARROW-10989 gets merged

@codecov-io
Copy link

codecov-io commented Dec 20, 2020

Codecov Report

Merging #8975 (4835d17) into master (d4b2ad8) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8975      +/-   ##
==========================================
- Coverage   82.61%   82.60%   -0.02%     
==========================================
  Files         202      202              
  Lines       50055    50008      -47     
==========================================
- Hits        41355    41308      -47     
  Misses       8700     8700              
Impacted Files Coverage Δ
rust/arrow/src/datatypes.rs 76.45% <ø> (ø)
rust/arrow/src/compute/kernels/comparison.rs 95.91% <100.00%> (-0.38%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4b2ad8...4835d17. Read the comment docs.

@github-actions
Copy link

@github-actions github-actions bot added needs-rebase A PR that needs to be rebased by the author and removed needs-rebase A PR that needs to be rebased by the author labels Dec 24, 2020
@alamb
Copy link
Contributor

alamb commented Dec 27, 2020

@jhorstmann -- shall we convert this PR into "ready for review"? It seems important to me but I figured I would wait until you were done with your planned changes before looking into it more carefully.

@jhorstmann jhorstmann marked this pull request as ready for review December 27, 2020 18:58
@jhorstmann
Copy link
Contributor Author

@alamb I finished rewriting the tests and it's ready for review now

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jhorstmann!

I went through this PR carefully and I understand the logic and it all makes sense to me. I think it is worth considering removing value_slice as a separate PR but otherwise I think this is ready to go.

cc @nevi-me

@@ -67,19 +67,6 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
self.data.is_empty()
}

/// Returns a slice for the given offset and length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a breaking change in the API, it might be worth doing as part of a separate PR. What do you think @jorgecarleitao and @nevi-me ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale was that this would have already been removed as part of ARROW-10989, if not for this one usage in the comparison kernels. The straight-forward replacement would be values()[offset..offset+len].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this one, sorry.

I am fine removing it altogether. The change of signature from safe to unsafe in a previous commit already made this backward incompatible, anyways. As a user, I will prefer to migrate it to values()[offset..offset+len] instead of introducing an unsafe in my code base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last merge reintroduced the method, I created a followup ticket to remove it and also from BooleanArray so we have the api change documented in a ticket.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


let data = ArrayData::new(
DataType::Boolean,
left.len(),
len,
None,
null_bit_buffer,
0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you didn't change this but I don't understand null_count here being zero -- it probably can be non-zero as we are just cloning the null buffer from left. In any event we don't need to handle that in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, no need to recalculate the null count when comparing against a scalar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API was deprecated altogether to remove the risk of a wrong count, which I think addresses this altogether :)

@alamb
Copy link
Contributor

alamb commented Dec 31, 2020

The full set of Rust CI tests did not run on this PR :(

Can you please rebase this PR against apache/master to pick up the changes in #9056 so that they do?

I apologize for the inconvenience.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. a couple of less unsafe 🎉 :)

@jhorstmann jhorstmann force-pushed the ARROW-10990-compare-kernels-out-of-bounds branch from 3fd84b1 to 5c5a55b Compare December 31, 2020 15:59
@jhorstmann
Copy link
Contributor Author

Rebased. The clippy warnings are all in the datafusion logical plan and don't seem related to this PR.

@alamb alamb closed this in 2d28778 Jan 1, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…bounds reads

- [x] Adjust tests so the input data is bigger than one vector lane
- [x] Remove `value_slice` function when ARROW-10989 gets merged

Closes apache#8975 from jhorstmann/ARROW-10990-compare-kernels-out-of-bounds

Authored-by: Jörn Horstmann <git@jhorstmann.net>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants