ARROW-11149: [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty #9114
rust/arrow/src/array/array_list.rs
Outdated
$list_builder
)
}
_ => Err(ArrowError::Unsupported(format!(
Which `ArrowError` variant should I return for an unimplemented case? Use `todo!`? Or should we declare a new one?
Thanks
I believe that we can make this with generics, now that we have
@jorgecarleitao You are right, but I was thinking about modifying Something like:
But I am having a problem with constructing it,
Do you have any suggestions for how to do this with macros? And is it a good idea? Thanks!
Last time I tried something similar, I had to implement the
Yes, better to implement it differently
@jorgecarleitao and @nevi-me 👍 I've moved build_empty_list_array to function Thanks
In apache#9114, I've prepared support for List/LargeList/FixedSizeList, but it would be great to support more types.
@andygrove Can you take a look? Thanks
3da157f to f0251d7
I went through this, and it looks good and really useful.
I would just prefer that we have at least 1 test for this, for verification. I left some other minor comments, all optional.
Great work, @ovr
($type_builder:ident, $offset_type:ident) => {{
let values_builder = $type_builder::new(0);
let mut builder =
GenericListBuilder::<$offset_type, $type_builder>::new(values_builder);
Note that for an empty list, we know that the offset buffer will be a single entry, `0`, and the values buffer will be an empty buffer (`len = 0`). Therefore, this code could be simplified by just passing the buffers directly instead of using builders.
The Arrow API is complex, and I still don't know how to do it as simply as you suggested with `Buffer`. I don't think using a builder here has a big performance impact.
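The layout described in this thread can be illustrated without the Arrow API. The sketch below is a std-only model, not the `arrow` crate: `ListColumn`, `Offset`, and `empty` are hypothetical names, and plain `Vec`s stand in for Arrow buffers. Under those assumptions, it shows why an empty list array needs only an offsets buffer holding a single zero and a values buffer of length 0, for both 32-bit (List) and 64-bit (LargeList) offsets.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for an offset width: i32 models List, i64 models LargeList.
trait Offset: Copy + Default + Debug + PartialEq {}
impl Offset for i32 {}
impl Offset for i64 {}

/// Std-only model of a variable-size list column (not the arrow crate's types).
/// List `i` spans `values[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, PartialEq)]
struct ListColumn<O: Offset, T> {
    offsets: Vec<O>,
    values: Vec<T>,
}

impl<O: Offset, T> ListColumn<O, T> {
    /// An empty column: one offset entry (zero) and no values.
    fn empty() -> Self {
        ListColumn {
            offsets: vec![O::default()],
            values: Vec::new(),
        }
    }

    /// Number of lists in the column: one fewer than the number of offsets.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}
```

Because the offset width is a type parameter, `ListColumn::<i32, u8>::empty()` and `ListColumn::<i64, u8>::empty()` share one implementation, which is the kind of reuse the generics suggestion earlier in the thread points at.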
))),
}
}

#[cfg(test)]
mod tests {
use crate::{
Could you add a test just to verify?
Thanks, added.
rust/arrow/src/error.rs
Outdated
@@ -90,6 +91,9 @@ impl From<serde_json::Error> for ArrowError {
impl Display for ArrowError {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
match self {
ArrowError::Unsupported(source) => {
What do you think of `Unimplemented` instead of `Unsupported`? Just to be consistent with the `unimplemented!` macro that Rust already offers.
I found a better name from reading the parquet sources. It's called `NYI` (Not yet implemented). I think it's better to use similar names across packages.
Renamed.
Thanks
849d8d3 to c898107
Codecov Report
@@            Coverage Diff             @@
##           master    #9114      +/-   ##
==========================================
- Coverage   81.61%   81.59%   -0.02%
==========================================
  Files         215      215
  Lines       51867    51928      +61
==========================================
+ Hits        42329    42370      +41
- Misses       9538     9558      +20
rust/arrow/src/error.rs
Outdated
@@ -24,6 +24,9 @@ use std::error::Error;
/// Many different operations in the `arrow` crate return this error type.
#[derive(Debug)]
pub enum ArrowError {
/// "Not yet implemented" Arrow error.
/// Returned when functionality is not yet available.
NYI(String),
I think that the error name should be explicit, like all others. I can't understand what `NYI` means without having to go to the docs, and in an error message, the person often has no access to the docs (at least not in a 1 click thing). `NotYetImplemented`?
I agree that `NotYetImplemented` is a better name
Thanks @ovr -- I think this PR is looking good to go.
Ideally we would get the rename of `NYI` to `NotYetImplemented`, but I also think we could merge this PR and rename the enum in a follow-on PR.
However, since this PR needs a rebase, sadly, perhaps we can do the rename as part of the rebase.
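To make the naming discussion concrete, here is a minimal std-only sketch of an error enum with an explicitly named variant. `SketchError`, `SketchType`, and `empty_column_kind` are hypothetical and are not the `arrow` crate's definitions; only the variant name `NotYetImplemented` comes from this thread. The point is that a spelled-out name and message stay readable without consulting the docs, and that returning an error lets the caller decide what to do instead of panicking.

```rust
use std::fmt;

/// Hypothetical error type modelled on the discussion; not arrow's `ArrowError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    /// Returned when functionality is not yet available.
    NotYetImplemented(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::NotYetImplemented(what) => {
                write!(f, "Not yet implemented: {}", what)
            }
        }
    }
}

impl std::error::Error for SketchError {}

/// Hypothetical subset of data types, for illustration only.
#[derive(Debug)]
enum SketchType {
    List,
    LargeList,
    Utf8,
}

/// Returns a label for supported list types and an explicit error otherwise,
/// instead of panicking with `todo!()` or `unimplemented!()`.
fn empty_column_kind(data_type: &SketchType) -> Result<&'static str, SketchError> {
    match data_type {
        SketchType::List => Ok("list with 32-bit offsets"),
        SketchType::LargeList => Ok("list with 64-bit offsets"),
        other => Err(SketchError::NotYetImplemented(format!(
            "empty column for {:?}",
            other
        ))),
    }
}
```

In this sketch, calling `empty_column_kind(&SketchType::Utf8)` yields an `Err` whose `Display` output names both the condition and the offending type.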
@alamb Rebased + renamed NYI ->
Awesome -- thanks @ovr !
The travis CI run is backed up -- https://github.com/apache/arrow/pull/9114/checks?check_run_id=1735587739 hasn't finished -- and this PR has no non-rust changes. I think it is good to go
…_batch_empty

Previously `build_empty_list_array` was declared inside Parquet (`array_reader`), but I will use this function inside DataFusion's `create_batch_empty` (it's used inside hash_aggregate to make an empty batch from the provided schema that contains the types for the columns). I moved it to Arrow (because it's common and useful) and made `build_empty_large_list_array` (for large lists) on top of macros, with a different implementation than `build_empty_list_array`.

Closes #9114 from ovr/issue-11149

Authored-by: Dmitry Patsura <zaets28rus@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>