Is it possible to run try_into_collection on a Chunk instead of an Array? #82

Open
AlJohri opened this issue Oct 30, 2022 · 1 comment


AlJohri commented Oct 30, 2022

Starting with the parquet_read_parallel example from arrow2, I am trying to deserialize a Chunk into a Vec of structs.

Using the deserialize_parallel function as defined in the above example, the following code currently works for me:

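// Target row type for deserialization.
pub struct Document {
    content: String,
}

...
let chunk = deserialize_parallel(&mut columns)?;
// Wrap the chunk's columns in a StructArray so arrow2-convert can read them as rows.
let array = StructArray::new(
    DataType::Struct(fields.clone()),
    chunk.arrays().to_vec(),
    None, // no validity bitmap
);
// to_boxed() erases the type to Box<dyn Array>, which try_into_collection accepts.
let documents: Vec<Document> = array.to_boxed().try_into_collection().unwrap();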

Questions:

  1. With the currently exposed APIs in arrow2 and arrow2-convert, is there a better way to convert the Chunk into a Vec of structs? I suspect the extra conversion from Chunk to StructArray, followed by to_boxed, is not the most efficient approach.
  2. Would it be possible to expose TryIntoCollection::try_into_collection directly on the Chunk as well?

ncpenke (Collaborator) commented Oct 30, 2022

Hi @AlJohri, thanks for the ticket:

  1. The way you've described is the best way to do it right now. You're absolutely right that the extra to_boxed is unnecessary, and we can certainly improve that by providing a method directly on StructArray.
  2. Yes

Just a heads up: we're currently refactoring the crate to take advantage of some of the new features that will be introduced in Rust 1.65, so we can fix this as part of those changes.
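
For illustration, here is a minimal sketch of what question 2 could look like as an extension trait, so that callers write `chunk.try_into_collection()` without building an intermediate array themselves. Everything in it is an assumption made for the example: `Chunk` is a simplified stand-in holding string columns (not arrow2's `Chunk`), and `FromRow` and `ChunkExt` are hypothetical names that do not exist in arrow2 or arrow2-convert. It only shows the shape of the API, not how the crate would implement it.

```rust
// Simplified stand-in, NOT arrow2's Chunk: a set of equal-length string columns.
#[derive(Debug)]
struct Chunk {
    columns: Vec<Vec<String>>,
}

#[derive(Debug, PartialEq)]
struct Document {
    content: String,
}

/// Hypothetical trait: builds one struct from one row of column values.
trait FromRow: Sized {
    fn from_row(row: Vec<String>) -> Result<Self, String>;
}

impl FromRow for Document {
    fn from_row(mut row: Vec<String>) -> Result<Self, String> {
        if row.len() != 1 {
            return Err(format!("expected 1 column, got {}", row.len()));
        }
        Ok(Document { content: row.remove(0) })
    }
}

/// Hypothetical extension trait exposing the conversion directly on the chunk.
trait ChunkExt {
    fn try_into_collection<T: FromRow>(self) -> Result<Vec<T>, String>;
}

impl ChunkExt for Chunk {
    fn try_into_collection<T: FromRow>(self) -> Result<Vec<T>, String> {
        // All columns must have the same number of rows.
        let len = self.columns.first().map_or(0, |c| c.len());
        if self.columns.iter().any(|c| c.len() != len) {
            return Err("columns have different lengths".to_string());
        }
        // Consume each column through its own iterator, taking one value
        // per column to assemble each row.
        let mut iters: Vec<_> = self.columns.into_iter().map(|c| c.into_iter()).collect();
        (0..len)
            .map(|_| {
                let row: Vec<String> = iters.iter_mut().map(|it| it.next().unwrap()).collect();
                T::from_row(row)
            })
            .collect()
    }
}
```

With this in scope, a call such as `let docs: Vec<Document> = chunk.try_into_collection()?;` reads the same as the array-based version above, and a chunk whose columns differ in length returns an error instead of panicking.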
