[C++] Allow RecordBatch.length to be less than array lengths #22326

Closed
asfimport opened this issue Jul 12, 2019 · 2 comments
Comments

@asfimport

Arrow 0.13 ignored RecordBatch.length; 0.14 requires that RecordBatch.length equal the length of each array. As per https://lists.apache.org/thread.html/2692dd8fe09c92aa313bded2f4c2d4240b9ef75a8604ec214eb02571@%3Cdev.arrow.apache.org%3E, we discussed changing this so that RecordBatch.length may take any value in [0, array length].

If RecordBatch.length is less than the array length, the reader should ignore the portion of the array(s) beyond RecordBatch.length. This would allow partially populated batches to be read in the scenarios identified in that discussion.

Proposed change to `GetFieldMetadata`:

```cpp
Status GetFieldMetadata(int field_index, ArrayData* out) {
  auto nodes = metadata_->nodes();
  // pop off a field
  if (field_index >= static_cast<int>(nodes->size())) {
    return Status::Invalid("Ran out of field metadata, likely malformed");
  }
  const flatbuf::FieldNode* node = nodes->Get(field_index);

  // Proposed: take the length from the record batch instead of the field node
  // out->length = node->length();
  out->length = metadata_->length();
  out->null_count = node->null_count();
  out->offset = 0;
  return Status::OK();
}
```

Attached is a test IPC file containing a batch with length 1 whose arrays have length 3.
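The requested reader behavior can be sketched without depending on Arrow itself. The following is a minimal, self-contained illustration, assuming a hypothetical `FlatColumn` type (a plain vector standing in for a primitive array as decoded from the IPC body), a hypothetical `ColumnView`, and a hypothetical `ApplyBatchLength` helper; none of these are part of the Arrow C++ API. It accepts any batch length in [0, array length] and exposes only the leading values, matching the attached file's shape of a length-1 batch over length-3 arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a primitive column as decoded from an IPC body:
// its buffer may hold more values than the record batch claims.
struct FlatColumn {
  std::vector<int32_t> values;
};

// Hypothetical read-only view exposing only the first `length` values;
// anything past RecordBatch.length is ignored.
struct ColumnView {
  const FlatColumn* column;
  int64_t length;

  int32_t At(int64_t i) const {
    assert(i >= 0 && i < length);  // the tail beyond the batch length is off-limits
    return column->values[static_cast<std::size_t>(i)];
  }
};

// Truncates every column to batch_length. Returns std::nullopt when
// batch_length lies outside [0, array length] for any column, which would
// still indicate a malformed batch under the proposal.
std::optional<std::vector<ColumnView>> ApplyBatchLength(
    const std::vector<FlatColumn>& columns, int64_t batch_length) {
  std::vector<ColumnView> views;
  views.reserve(columns.size());
  for (const FlatColumn& col : columns) {
    const auto array_length = static_cast<int64_t>(col.values.size());
    if (batch_length < 0 || batch_length > array_length) {
      return std::nullopt;
    }
    views.push_back(ColumnView{&col, batch_length});
  }
  return views;
}
```

With two columns of length 3 and a batch length of 1, `ApplyBatchLength` returns views of length 1 that expose only each column's first value; a batch length of 4 is still rejected because it exceeds the array length.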

Reporter: John Muehlhausen / @jgm-ktg

Original Issue Attachments:

PRs and other links:

Note: This issue was originally created as ARROW-5916. Please see the migration documentation for further details.

@asfimport

Wes McKinney / @wesm:
I think that the fields' length needs to be respected in GetFieldMetadata (since this can vary for nested fields). The appropriate fix is probably here:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/reader.cc#L368

If the arrays returned by LoadArray are longer than num_rows, the record batch should be constructed first and then sliced.
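The construct-then-slice approach, and the reason field node lengths must be kept, can be illustrated with a small stand-in for a list column. This is a minimal sketch under stated assumptions: `ListColumn`, `SlicedList`, and `SliceToNumRows` are hypothetical names rather than Arrow classes, and slicing is modeled on Arrow's zero-copy style of recording an offset and length instead of copying buffers. A `list<int32>` column with 3 rows can have a child array of length 6, so each field's length has to come from its own node; only the top level is trimmed to num_rows afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical list<int32> column loaded at full length: each field keeps the
// length recorded in its own field node, so parent and child lengths differ.
struct ListColumn {
  std::vector<int32_t> offsets;                 // parent length + 1 entries
  std::shared_ptr<std::vector<int32_t>> child;  // child length = offsets.back()

  int64_t length() const { return static_cast<int64_t>(offsets.size()) - 1; }
};

// Hypothetical zero-copy slice: shares the loaded buffers and only records
// which parent rows are visible.
struct SlicedList {
  const ListColumn* base;
  int64_t offset;
  int64_t length;

  // Values of visible row i, resolved through the shared offsets buffer.
  std::vector<int32_t> Row(int64_t i) const {
    const auto start = base->offsets[static_cast<std::size_t>(offset + i)];
    const auto end = base->offsets[static_cast<std::size_t>(offset + i + 1)];
    return std::vector<int32_t>(base->child->begin() + start,
                                base->child->begin() + end);
  }
};

// Construct-then-slice: the column was loaded with its own lengths, and the
// batch is trimmed to num_rows at the top level only.
SlicedList SliceToNumRows(const ListColumn& column, int64_t num_rows) {
  // Precondition: a batch may be shorter than its arrays, never longer.
  assert(num_rows >= 0 && num_rows <= column.length());
  return SlicedList{&column, 0, num_rows};
}
```

For offsets `{0, 2, 3, 6}` over child values `{1, 2, 3, 4, 5, 6}`, slicing to one row yields row 0 = `{1, 2}` while the child buffer is shared untouched. Had the child's length been overwritten with the batch length of 1, as in the patch in the original report, row 0 (which spans child values 0 and 1) would reach past the child's declared length.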

@asfimport

Wes McKinney / @wesm:
We didn't reach a conclusion on this, so closing for now.
