ARROW-3347: [Rust] Implement PrimitiveArrayBuilder #2858
Codecov Report
@@ Coverage Diff @@
## master #2858 +/- ##
==========================================
+ Coverage 87.55% 88.49% +0.93%
==========================================
Files 410 361 -49
Lines 63486 61702 -1784
==========================================
- Hits 55586 54602 -984
+ Misses 7828 7100 -728
+ Partials 72 0 -72
rust/src/builder.rs
Outdated
    T: ArrowPrimitiveType,
{
    values_builder: BufferBuilder<T>,
    bitmap_builder: Option<BufferBuilder<bool>>,
Not sure how much we can save by making this an option, and having multiple checks in `push*` doesn't look good - can we start with an allocated `bitmap_builder` and optimize later if this is indeed an issue?
Yea, I was on the fence about this. Sounds good, I'll remove the Option
rust/src/builder.rs
Outdated
/// Pushes a value of type T into the builder
pub fn push(&mut self, v: $native_ty) -> Result<()> {
    if self.bitmap_builder.is_some() {
We can use `if let` syntax - it looks nicer :)
+1
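To illustrate the `if let` suggestion, here is a minimal sketch. It is not the code from this PR: `SketchBuilder` is a hypothetical stand-in that uses plain `Vec`s in place of `BufferBuilder` and `i32` in place of `$native_ty`. Both methods behave the same; the second tests and binds the optional bitmap in one step.

```rust
// Hypothetical, simplified stand-in for the builder under review:
// Vec replaces BufferBuilder, and i32 replaces $native_ty.
struct SketchBuilder {
    values: Vec<i32>,
    bitmap: Option<Vec<bool>>,
}

impl SketchBuilder {
    // Style under review: check with is_some(), then unwrap separately.
    fn push_is_some(&mut self, v: i32) {
        if self.bitmap.is_some() {
            self.bitmap.as_mut().unwrap().push(true);
        }
        self.values.push(v);
    }

    // Suggested style: `if let` checks and binds in a single step,
    // so no unwrap is needed.
    fn push_if_let(&mut self, v: i32) {
        if let Some(bitmap) = self.bitmap.as_mut() {
            bitmap.push(true);
        }
        self.values.push(v);
    }
}
```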
}

/// Pushes an Option<T> into the builder
pub fn push_option(&mut self, v: Option<$native_ty>) -> Result<()> {
Like C++, perhaps we can have a version with the following interface: `push_values(&mut self, values: &[$native_ty], is_valid: &[u8])` which can allow us to efficiently `memcpy` the values and null array.
+1
@sunchao ok with you if we add this in a follow up PR?
Sure. SGTM.
Created ticket https://issues.apache.org/jira/browse/ARROW-3688
Thanks @kszucs, was waiting until it was merged...
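For reference, the bulk `push_values` interface proposed in this thread could look roughly like the sketch below. This is an illustration only, not the follow-up implementation: `BulkBuilder` is hypothetical, `Vec` stands in for `BufferBuilder`, `i32` stands in for `$native_ty`, and it assumes `is_valid` carries one byte per slot with non-zero meaning valid.

```rust
// Hypothetical sketch; Vec stands in for the PR's BufferBuilder types.
struct BulkBuilder {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl BulkBuilder {
    fn new() -> Self {
        BulkBuilder { values: Vec::new(), validity: Vec::new() }
    }

    /// Appends a whole slice of values and validity flags in one call.
    /// Assumption: `is_valid` has one byte per slot, non-zero = valid.
    fn push_values(&mut self, values: &[i32], is_valid: &[u8]) -> Result<(), String> {
        if values.len() != is_valid.len() {
            return Err(format!(
                "length mismatch: {} values, {} validity bytes",
                values.len(),
                is_valid.len()
            ));
        }
        // Bulk copy of the values instead of one push per element.
        self.values.extend_from_slice(values);
        // Convert each validity byte to a flag.
        self.validity.extend(is_valid.iter().map(|&b| b != 0));
        Ok(())
    }
}
```

The length check runs before anything is appended, so a mismatched call leaves the builder unchanged.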
rust/src/builder.rs
Outdated
    }
}

/// Returns the capacity of this builder measured in slots of type T
nit: can we use `T` (with backticks) in all the places?
Sure, agreed.
rust/src/builder.rs
Outdated
    Ok(())
}

/// Pushes an Option<T> into the builder
nit: also `Option<T>`.
+1
@sunchao, thanks for the review, appreciated. Will update soon
The underlying |
@crepererum I'm not sure about this, the spec says that lengths should be But different implementations can provide support up to |
Thanks @paddyhoran. LGTM except one nit.
Implementations are free to support larger than |
@crepererum my thinking is that as Does this seem right to you or do you still see an issue? |
I still don't see how "array slots" can be negative though. Same goes for everything that @xhochy said. How can a length be negative? Not matter how we decide, we have basically two options: |
These lengths cannot be negative. We use signed types due to some weird language behaviours in Java and C++. Sometimes, e.g. for the null count, we use -1 as a placeholder meaning that the value still has to be computed. Thus feel free to use one of the things @crepererum proposed in the Rust implementation.
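As a rough illustration of the point about signed lengths and the -1 placeholder, one way this can be expressed in Rust is sketched below. It is not taken from this PR and is not necessarily one of the options proposed in the thread: `ArrayMeta` and its method are hypothetical names, and a `&[bool]` slice stands in for a real validity bitmap.

```rust
// Hypothetical sketch, not code from this PR.
struct ArrayMeta {
    // Unsigned, because a length cannot be negative.
    len: usize,
    // None plays the role of the -1 "not yet computed" placeholder.
    null_count: Option<usize>,
}

impl ArrayMeta {
    /// Returns the null count, computing and caching it on first use.
    /// `validity` holds one flag per slot (true = valid).
    fn null_count_or_compute(&mut self, validity: &[bool]) -> usize {
        if let Some(n) = self.null_count {
            return n;
        }
        // Count the invalid slots among the first `len` entries.
        let n = validity.iter().take(self.len).filter(|&&v| !v).count();
        self.null_count = Some(n);
        n
    }
}
```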
Thanks @xhochy. @crepererum the jira is here. Is this PR ready to merge then?
Great. Thanks @paddyhoran for the work, patience and issue creation. Thanks @xhochy for the clarifications. IMHO the PR is now ready to merge.
Rebased on top of #2868
@xhochy Is there any existing example on this? I assume if we choose |
Thanks for the PR and the reviews!
@sunchao We should add rust to the integration testing suite: https://github.com/apache/arrow/tree/master/integration
Sure. Filed ARROW-3688.
Adds builder for `PrimitiveArray`.

@sunchao has mentioned that it's unfortunate that we have to rely on macros to define the `impl` block for types implementing `ArrowPrimitiveType`. When specialization lands in stable we can remove much/all of this but for now we have to rely on macros.

This implementation mostly focuses on being correct. However, maybe we should add `push_value_raw` and `push_null_raw` and allow the caller to handle updating the bitmap (i.e. avoid checking if the bitmap `is_some` on every `push`)? If so, I can add this as a separate PR (along with other optimizations).
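A minimal sketch of what the `push_value_raw` / `push_null_raw` idea might look like follows. The description does not define their semantics, so everything here is an assumption: `RawBuilder` and `push_validity` are hypothetical, `Vec` stands in for `BufferBuilder`, `i32` stands in for `$native_ty`, and a null is assumed to occupy a default-valued slot.

```rust
// Hypothetical sketch; semantics are assumed, not taken from the PR.
struct RawBuilder {
    values: Vec<i32>,
    bitmap: Vec<bool>,
}

impl RawBuilder {
    fn new() -> Self {
        RawBuilder { values: Vec::new(), bitmap: Vec::new() }
    }

    // Writes a value without touching the bitmap.
    fn push_value_raw(&mut self, v: i32) {
        self.values.push(v);
    }

    // Reserves a slot for a null (assumed: default value), again
    // without touching the bitmap.
    fn push_null_raw(&mut self) {
        self.values.push(i32::default());
    }

    // The caller updates validity separately, e.g. once per batch.
    fn push_validity(&mut self, flags: &[bool]) {
        self.bitmap.extend_from_slice(flags);
    }
}
```

With this split, the caller is responsible for keeping the values and the bitmap the same length.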