[MATLAB] Add support for indexing `RecordBatch` columns by `Field` name #37473
Comments
kevingurney added a commit that referenced this issue on Aug 30, 2023
…`Field` name (#37475)

### Rationale for this change

Currently, `arrow.tabular.Schema` supports indexing by `Field` name. However, `arrow.tabular.RecordBatch` does not. This pull request adds the ability to index columns in a `RecordBatch` by `Field` name.

### What changes are included in this PR?

1. Added support for indexing columns in a `RecordBatch` by `Field` name via the `column` method.

**Example**

```matlab
>> recordBatch = arrow.tabular.RecordBatch.fromArrays(...
       arrow.array([1, 2, 3]), ...
       arrow.array(["A", "B", "C"]), ...
       arrow.array([true, false, true]), ...
       ColumnNames=["A", "B", "C"] ...
   )

recordBatch =

A: [ 1, 2, 3 ]
B: [ "A", "B", "C" ]
C: [ true, false, true ]

>> recordBatch.column("B")

ans =

[ "A", "B", "C" ]

>> recordBatch.column("C")

ans =

[ true, false, true ]
```

2. Removed comments about vectorizing `field` method of `Schema` and `column` method of `RecordBatch`. After further consideration, we believe it would make more sense to only allow these methods to accept scalar inputs. We could revisit support for vectorization if we overload the parenthesis operator (e.g. `recordBatch(rows, columns)`) in the future to return another `RecordBatch`/`Schema` that only includes the selected columns/fields.
3. Fixed typo in `tSchema.m`.

### Are these changes tested?

Yes.

1. Added tests for indexing by column name using the `column` method to `tRecordBatch.m`.

### Are there any user-facing changes?

Yes.

1. Users can now index `RecordBatch` columns by name using the syntax `column(name)`.

### Future Directions

1. Consider overloading parentheses-based indexing on `RecordBatch` and `Schema`.

* Closes: #37473

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue on Nov 13, 2023
…ns by `Field` name (apache#37475)
dgreiss pushed a commit to dgreiss/arrow that referenced this issue on Feb 19, 2024
…ns by `Field` name (apache#37475)
Describe the enhancement requested

Currently, `arrow.tabular.Schema` supports indexing by `Field` name. However, `arrow.tabular.RecordBatch` does not.

The ability to index columns in a `RecordBatch` by `Field` name would be a helpful usability improvement.

Example
Component(s)
MATLAB