ARROW-969: [C++] Add add/remove field functions for RecordBatch#1574
ARROW-969: [C++] Add add/remove field functions for RecordBatch#1574xuepanchen wants to merge 4 commits intoapache:masterfrom
Conversation
|
Will review when I can, thanks! |
wesm
left a comment
There was a problem hiding this comment.
I spotted some possible issues / bugs
cpp/src/arrow/record_batch.cc
Outdated
| Status AddColumn(int i, const std::shared_ptr<Field>& field, | ||
| const std::shared_ptr<ArrayData>& column, | ||
| std::shared_ptr<RecordBatch>* out) const override { | ||
| if (i < 0 || i > num_columns() + 1) { |
There was a problem hiding this comment.
I think this should be i > num_columns(). This is also a bug in SimpleTable::AddColumn. Can you add a test where i == num_columns()?
cpp/src/arrow/record_batch.cc
Outdated
| std::stringstream ss; | ||
| ss << "Column " << i << " was null"; | ||
| return Status::Invalid(ss.str()); | ||
| } |
There was a problem hiding this comment.
I think these should both be DCHECK, since null would indicate a problem with application logic, so should be a "can't fail"
There was a problem hiding this comment.
I took a look at SimpleTable::AddColumn; there col is being null-checked -- I think that should also be a DCHECK
cpp/src/arrow/record_batch.cc
Outdated
| std::shared_ptr<ArrayData> column_data(int i) const override { return columns_[i]; } | ||
|
|
||
| Status AddColumn(int i, const std::shared_ptr<Field>& field, | ||
| const std::shared_ptr<ArrayData>& column, |
There was a problem hiding this comment.
Pass Array here instead, since that's more likely to be what the user has
| } | ||
|
|
||
| std::shared_ptr<Schema> new_schema; | ||
| RETURN_NOT_OK(schema_->AddField(i, field, &new_schema)); |
There was a problem hiding this comment.
We could leave the boundschecking above to Schema::AddField -- could you also check whether that function has the issues described above?
| std::shared_ptr<RecordBatch>* out) const = 0; | ||
|
|
||
| /// \brief Remove column from the record batch, producing a new RecordBatch | ||
| virtual Status RemoveColumn(int i, std::shared_ptr<RecordBatch>* out) const = 0; |
There was a problem hiding this comment.
Can you document the parameters for these functions?
cpp/src/arrow/table-test.cc
Outdated
| std::shared_ptr<RecordBatch> result; | ||
|
|
||
| // Negative tests with invalid index | ||
| Status status = batch.AddColumn(5, field1, array1->data(), &result); |
There was a problem hiding this comment.
Add a test for batch.AddColumn(2, ...) to address the edge case in the implementation. We probably need a corresponding test for Table (and maybe also Schema).
| ASSERT_TRUE(result->Equals(*batch3)); | ||
|
|
||
| ASSERT_OK(batch.RemoveColumn(2, &result)); | ||
| ASSERT_TRUE(result->Equals(*batch4)); |
There was a problem hiding this comment.
Add a test for removing an out of bounds index
wesm
left a comment
There was a problem hiding this comment.
+1. I am going to quickly add a convenience method AddColumn that accepts a string name instead of a Field
… of a Field Change-Id: I628c4d3a66f4f7b5f3946bf0047b551efceb077f
|
Merging, thanks @xuepanchen! @trxcllnt @TheNeuralBit odd unrelated JS failure here, in case you want to have a look |
|
@wesm looks like noise, but ping me if you see it again. |
Add AddColumn and RemoveColumn methods for RecordBatch, as well as test cases