[Rust] Add Builder interface for adding Arrays to record batches #18622

Closed
asfimport opened this issue Apr 15, 2021 · 1 comment

Comments

@asfimport

Use case:

While writing tests (both in IOx and in DataFusion) where I need a single RecordBatch, I often find myself doing something like this:

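        use std::sync::Arc;

        use arrow::array::{ArrayRef, Float64Array, Int64Array};
        use arrow::datatypes::{DataType as ArrowDataType, Field as ArrowField, Schema};
        use arrow::record_batch::RecordBatch;

        // The column types are spelled out once in the schema...
        let schema = Arc::new(Schema::new(vec![
            ArrowField::new("float_field", ArrowDataType::Float64, true),
            ArrowField::new("time", ArrowDataType::Int64, true),
        ]));

        // ...and again in the concrete array types.
        let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
        let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

        let batch = RecordBatch::try_new(schema, vec![float_array, timestamp_array])
            .expect("created new record batch");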

This is annoying because the fact that float_field is a float is encoded both in the Schema and in the Float64Array.

I would much rather be able to construct RecordBatches in a builder style, to avoid the redundancy and reduce the amount of typing:


        let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
        let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

        // Proposed API: each field's type is taken from the array itself
        let batch = RecordBatch::empty()
            .append("float_field", float_array).unwrap()
            .append("time", timestamp_array).unwrap();

The proposal is to add a method to RecordBatch like:

        impl RecordBatch {
            ...
            fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self>
        }

That would add field_name as a new field in the current schema (with field_values as its column), returning an error if field_name is already present.

The nullability of the new field would be set based on the actual null count of field_values.
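A minimal, dependency-free sketch of the proposed semantics follows. It deliberately uses a toy model instead of the real arrow crate: Column, Field, Batch and AppendError are hypothetical stand-ins for ArrayRef, Field, RecordBatch and ArrowError. It shows append consuming self so calls can be chained, rejecting a duplicate field name, and marking a field nullable only when its values contain nulls. The equal-length check is an added assumption, mirroring the requirement that all columns of a record batch have the same length.

```rust
/// Hypothetical stand-in for an Arrow array: a column of optional f64 or i64 values.
#[derive(Debug, Clone)]
enum Column {
    Float64(Vec<Option<f64>>),
    Int64(Vec<Option<i64>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Float64(v) => v.len(),
            Column::Int64(v) => v.len(),
        }
    }

    /// Number of missing (None) values, playing the role of an array's null count.
    fn null_count(&self) -> usize {
        match self {
            Column::Float64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Int64(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }

    /// The type comes from the column itself, so it is never repeated in a schema.
    fn data_type(&self) -> &'static str {
        match self {
            Column::Float64(_) => "Float64",
            Column::Int64(_) => "Int64",
        }
    }
}

/// Hypothetical schema entry.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: &'static str,
    nullable: bool,
}

#[derive(Debug)]
enum AppendError {
    DuplicateField(String),
    // Assumption: mirrors the equal-length requirement of a record batch.
    LengthMismatch { expected: usize, actual: usize },
}

/// Hypothetical stand-in for RecordBatch: a schema plus one column per field.
#[derive(Debug, Default)]
struct Batch {
    fields: Vec<Field>,
    columns: Vec<Column>,
}

impl Batch {
    fn empty() -> Self {
        Self::default()
    }

    /// Consumes the batch and returns it with one more column, so calls chain.
    fn append(mut self, field_name: &str, values: Column) -> Result<Self, AppendError> {
        if self.fields.iter().any(|f| f.name == field_name) {
            return Err(AppendError::DuplicateField(field_name.to_string()));
        }
        if let Some(first) = self.columns.first() {
            if first.len() != values.len() {
                return Err(AppendError::LengthMismatch { expected: first.len(), actual: values.len() });
            }
        }
        // Nullability is inferred from the data rather than declared up front.
        self.fields.push(Field {
            name: field_name.to_string(),
            data_type: values.data_type(),
            nullable: values.null_count() > 0,
        });
        self.columns.push(values);
        Ok(self)
    }
}

fn main() -> Result<(), AppendError> {
    // Values adapted from the issue; the None is illustrative, to show nullability inference.
    let batch = Batch::empty()
        .append("float_field", Column::Float64(vec![Some(10.1), Some(20.1), None, Some(40.1)]))?
        .append("time", Column::Int64(vec![Some(1000), Some(2000), Some(3000), Some(4000)]))?;

    assert_eq!(batch.fields[0].data_type, "Float64");
    assert!(batch.fields[0].nullable); // contains a None
    assert!(!batch.fields[1].nullable); // no nulls
    assert_eq!(batch.columns.len(), 2);

    // Appending a name that is already present is rejected.
    let dup = batch.append("time", Column::Int64(vec![Some(1), Some(2), Some(3), Some(4)]));
    assert!(matches!(dup, Err(AppendError::DuplicateField(name)) if name == "time"));

    // Columns of different lengths are rejected.
    let short = Batch::empty()
        .append("a", Column::Int64(vec![Some(1)]))?
        .append("b", Column::Int64(vec![Some(1), Some(2)]));
    assert!(matches!(short, Err(AppendError::LengthMismatch { expected: 1, actual: 2 })));
    Ok(())
}
```

Taking self by value and returning Result<Self> is what lets each step use ? (or unwrap in tests) and continue the chain without a separate builder type.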

Reporter: Andrew Lamb / @alamb
Assignee: Andrew Lamb / @alamb

PRs and other links:

Note: This issue was originally created as ARROW-12411. Please see the migration documentation for further details.

@asfimport

Andrew Lamb / @alamb:
Migrated to github: apache/arrow-rs#210
