-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance reading ByteViewArray
from parquet by removing an implicit copy
#6031
Conversation
One thing we could do is to remove the explclt Specifically, remove https://github.com/XiangpengHao/arrow-rs/blob/view-buffer/arrow-buffer/src/buffer/immutable.rs#L361-L370 and make it a function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @XiangpengHao -- this makes a lot of sense to me
I think adding some more comments would help future readers, but I also don't think it is required for merge
@@ -316,7 +315,8 @@ impl ByteViewArrayDecoderPlain { | |||
} | |||
|
|||
pub fn read(&mut self, output: &mut ViewBuffer, len: usize) -> Result<usize> { | |||
let block_id = output.append_block(self.buf.clone().into()); | |||
let buf = arrow_buffer::Buffer::from_bytes(self.buf.clone().into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should at least add a comment the rationale for this non obvious code.
Maybe it would make sense to pull it into its a function (that could be commented, and more easily discoverable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about creating a from_arrow_bytes(...)
and from_bytes(...)
, and then remove the impl<T: AsRef<[u8]>> From<T> for Buffer
as it is too easy to misuse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend we split the code into multiple PRs -- this one to improve performance of the parquet reader and one to make it harder to misuse the API (which I suspect will be a breaking API change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the breakage is much larger than I thought, will continue on this tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #6033 to track the idea
@@ -71,7 +71,6 @@ struct ByteViewArrayReader { | |||
} | |||
|
|||
impl ByteViewArrayReader { | |||
#[allow(unused)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I think we can justify non intuitive code with sufficient performance gains (justified by benchmarks) and sufficient comments to
Thak you @aoli-al 🙏 |
Here is what I think we should do for next steps:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
ByteViewArray
from parquet by removing an implicit copy
Filed #6034 to track this issue |
Which issue does this PR close?
Closes #.
Rationale for this change
I made a mistake in a previous pr which writes:
I thought the
into()
will convert theBytes
intoarray_buffer::Buffer()
without copying the data.But
self.buf
isbytes::Bytes
, notarrow_buffer::Bytes
(they confusingly having the same name). The consequence is that the code above will run into this procedure: https://github.com/XiangpengHao/arrow-rs/blob/view-buffer/arrow-buffer/src/buffer/immutable.rs#L361-L370, which implicitly copies data.(I think we should do something to prevent future similar mistakes, but I don't know how).
What changes are included in this PR?
This PR explicitly converts
bytes::Bytes
intoarrow_buffer::Bytes
and then converts toarrow_buffer::Buffer
; I have manually verified that each of the conversions requires no copy.This further improves the benchmark performance:
Bonus (not related to this PR)
You may wonder how it is possible to be faster than StringArray if the current implementation still copies data. TLDR: bc
memcpy
inlining.In our benchmark, every string is larger than 12 bytes, this means that when making the views, we always fall into this branch. The specialty about this branch is that it only reads the first 4 bytes of the string, and LLVM is smart enough to inline loading the values (i.e., prevent calling into
copy_non_overlapping
).Is that a big deal?
Unfortunately, yes. If you change the benchmark string to very small, e.g.,
"test"
, you should be able to see (using the same benchmark script above) the performance of loadingViewArray
drops significantly, and it doesn't make sense -- how could build shorter string much slower than building longer string?How to fix the short string regression?
The root cause is that when string is short, we need to
memcpy
the bytes to theview
, as the compiler has no idea how long the string is, it can not inline the bytes and has to call tomemcpy
, which is slow.To convince you more, here's a special variant of
make_view
, you can replace this function with the following new implementation, and the performance will back to normal. This new implementation makes 13 copies of make_view_inner; each copy is a specialized length known to the compiler (LEN
is const), and then the compiler can optimize each of the implementations as needed. Looking at the assembly code, there is indeed no call tomemcpy
.(special thanks to @aoli-al for triangulating the root cause and prototype the fix)
What should we do?
I don't know, I'm not sure if we want to merge that special
make_view
, as it is very unintuitive. I'll keep a local copy of thismake_view
and think about it in the background, maybe I'll have a better idea in a month.Are there any user-facing changes?