-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-37012: [MATLAB] Remove the private property ArrowArrays
from arrow.tabular.RecordBatch
#37012
Comments
take |
arrow::Array
objects within proxy objects.arrow::Array
objects.
arrow::Array
objects.ArrowArrays
property from arrow.tabular.RecordBatch
ArrowArrays
property from arrow.tabular.RecordBatch
ArrowArrays
property from arrow.tabular.RecordBatch
ArrowArrays
property from arrow.tabular.RecordBatch
ArrowArrays
from arrow.tabular.RecordBatch
kevingurney
pushed a commit
that referenced
this issue
Aug 4, 2023
…row.tabular.RecordBatch` (#37015) ### Rationale for this change In our initial implementation of the MATLAB class `arrow.tabular.RecordBatch`, we included a property property called `ArrowArrays`, which is a `cell` array whose elements are scalar `arrow.array.Array` objects. These arrays correspond to the columns in the `arrow::RecordBatch` that the MATLAB class wraps. The purpose of `ArrowArrays` was to enable zero-copy construction of the `arrow::Array` objects backing the `arrow::RecordBatch` from MATLAB arrays. The `ArrowArrays` property was necessary to ensure the MATLAB data from which the `arrow::Array` columns were constructed don't get deallocated before they are done being used. However, we no longer need the `ArrowArrays` property on `arrow.tabular.RecordBatch` because of #36615, in which we implemented `arrow::matlab::buffer::MatlabBuffer`. This class inherits from `arrow::Buffer` and stores a reference to the MATLAB data it wraps, ensuring that the wrapped MATLAB data is kept alive as long as the buffer is around. We now only create `arrow::Array` objects from `arrow::matlab::buffer::MatlabBuffer` objects - instead of `arrow::Buffer` objects. As a result, the `ArrowArrays` property is no longer necessary on `arrow.tabular.RecordBatch` because the `arrow::Array` columns within the `arrow::RecordBatch` are all backed by `arrow::matlab::buffer::MatlabBuffer` objects. The backing MATLAB data is kept alive as long as the arrays and their buffers are kept alive. ### What changes are included in this PR? 1. Removed the `ArrowArrays` property from `arrow.tabular.RecordBatch`. 2. Because `ArrowArrays` is no longer a property, we had to add a new method to `arrow::matlab::tabular::proxy::RecordBatch` called `getColumnByIndex()`. This method creates a proxy object around the specified `arrrow::Array`. The MATLAB method `RecordBatch/column(index)` uses `getColumnByIndex()`. 3. Added a new function called `wrap`, which accepts an `arrow::Array` and returns an `arrow::matlab::array::proxy::Array`. While working on `getColumnByIndex()`, we realized there would be multiple places in the interface where we will need to create proxies around `arrow::Array` objects. We wrote the `wrap` utility function to reduce duplicated code in the future. Currently, only `getColumnByIndex()` utilizes `wrap`. ### Are these changes tested? Yes. 1. The existing unit tests in `tRecordBatch.m` cover these changes. 2. Added a few more test cases to `tRecordBatch.m` to cover error conditions when a bad `index` value is provided to `arrow.tabular.RecordBatch/column(index)`. ### Are there any user-facing changes? No. ### Future Directions 1. In a followup PR, we will add support for indexing columns by names (similar to `Schema.field(fieldName)` - #37013). 2. We also plan on adding an index validation function within the `arrow.internal.validate` package. 3. We also plan on adding a convenience constructor `arrow.recordbatch()` and changing the constructor of `arrow.tabular.RecordBatch` to expect a `libmexclass.proxy.Proxy` object. * Closes: #37012 Authored-by: Sarah Gilmore <sgilmore@mathworks.com> Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
loicalleyne
pushed a commit
to loicalleyne/arrow
that referenced
this issue
Nov 13, 2023
…om `arrow.tabular.RecordBatch` (apache#37015) ### Rationale for this change In our initial implementation of the MATLAB class `arrow.tabular.RecordBatch`, we included a property property called `ArrowArrays`, which is a `cell` array whose elements are scalar `arrow.array.Array` objects. These arrays correspond to the columns in the `arrow::RecordBatch` that the MATLAB class wraps. The purpose of `ArrowArrays` was to enable zero-copy construction of the `arrow::Array` objects backing the `arrow::RecordBatch` from MATLAB arrays. The `ArrowArrays` property was necessary to ensure the MATLAB data from which the `arrow::Array` columns were constructed don't get deallocated before they are done being used. However, we no longer need the `ArrowArrays` property on `arrow.tabular.RecordBatch` because of apache#36615, in which we implemented `arrow::matlab::buffer::MatlabBuffer`. This class inherits from `arrow::Buffer` and stores a reference to the MATLAB data it wraps, ensuring that the wrapped MATLAB data is kept alive as long as the buffer is around. We now only create `arrow::Array` objects from `arrow::matlab::buffer::MatlabBuffer` objects - instead of `arrow::Buffer` objects. As a result, the `ArrowArrays` property is no longer necessary on `arrow.tabular.RecordBatch` because the `arrow::Array` columns within the `arrow::RecordBatch` are all backed by `arrow::matlab::buffer::MatlabBuffer` objects. The backing MATLAB data is kept alive as long as the arrays and their buffers are kept alive. ### What changes are included in this PR? 1. Removed the `ArrowArrays` property from `arrow.tabular.RecordBatch`. 2. Because `ArrowArrays` is no longer a property, we had to add a new method to `arrow::matlab::tabular::proxy::RecordBatch` called `getColumnByIndex()`. This method creates a proxy object around the specified `arrrow::Array`. The MATLAB method `RecordBatch/column(index)` uses `getColumnByIndex()`. 3. Added a new function called `wrap`, which accepts an `arrow::Array` and returns an `arrow::matlab::array::proxy::Array`. While working on `getColumnByIndex()`, we realized there would be multiple places in the interface where we will need to create proxies around `arrow::Array` objects. We wrote the `wrap` utility function to reduce duplicated code in the future. Currently, only `getColumnByIndex()` utilizes `wrap`. ### Are these changes tested? Yes. 1. The existing unit tests in `tRecordBatch.m` cover these changes. 2. Added a few more test cases to `tRecordBatch.m` to cover error conditions when a bad `index` value is provided to `arrow.tabular.RecordBatch/column(index)`. ### Are there any user-facing changes? No. ### Future Directions 1. In a followup PR, we will add support for indexing columns by names (similar to `Schema.field(fieldName)` - apache#37013). 2. We also plan on adding an index validation function within the `arrow.internal.validate` package. 3. We also plan on adding a convenience constructor `arrow.recordbatch()` and changing the constructor of `arrow.tabular.RecordBatch` to expect a `libmexclass.proxy.Proxy` object. * Closes: apache#37012 Authored-by: Sarah Gilmore <sgilmore@mathworks.com> Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the enhancement requested
In our initial implementation of the MATLAB class
arrow.tabular.RecordBatch
, we included a property property calledArrowArrays
, which is acell
array whose elements are scalararrow.array.Array
objects. These arrays are the columns in thearrow::RecordBatch
that the MATLAB class wraps. The purpose ofArrowArrays
was to enable zero-copy construction of thearrow::Array
objects backing thearrow::RecordBatch
. TheArrowArrays
property was necessary to ensure the MATLAB data from which thearrow::Array
columns were constructed didn't go out of scope.However, we no longer need the
ArrowArrays
property onarrow.tabular.RecordBatch
because of #36615, in which we implementedarrow::matlab::buffer::MatlabBuffer
. This class inherits fromarrow::Buffer
and stores a reference to the MATLAB data it wraps. Doing so ensures the wrapped MATLAB data is kept alive as long as the buffer is around.We now only create
arrow::Array
objects fromarrow::matlab::buffer::MatlabBuffer
objects - instead ofarrow::Buffer
objects. As a result, theArrowArrays
property is no longer necessary onarrow.tabular.RecordBatch
because thearrow::Array
columns within thearrow::RecordBatch
are all backed byarrow::matlab::buffer::MatlabBuffer
objects. The backing MATLAB data is kept alive as long as the arrays and their buffers are alive.Component(s)
MATLAB
The text was updated successfully, but these errors were encountered: