[Python] Add select method to pyarrow.RecordBatch #34359

Closed
AlenkaF opened this issue Feb 27, 2023 · 0 comments · Fixed by #34360

AlenkaF (Member) commented Feb 27, 2023

Describe the enhancement requested

There is a `select` method defined for `pa.Table`

arrow/python/pyarrow/table.pxi

Lines 3107 to 3156 in db60be2

```cython
def select(self, object columns):
    """
    Select columns of the Table.

    Returns a new Table with the specified columns, and metadata
    preserved.

    Parameters
    ----------
    columns : list-like
        The column names or integer indices to select.

    Returns
    -------
    Table

    Examples
    --------
    >>> import pyarrow as pa
    >>> import pandas as pd
    >>> df = pd.DataFrame({'year': [2020, 2022, 2019, 2021],
    ...                    'n_legs': [2, 4, 5, 100],
    ...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
    >>> table = pa.Table.from_pandas(df)
    >>> table.select([0,1])
    pyarrow.Table
    year: int64
    n_legs: int64
    ----
    year: [[2020,2022,2019,2021]]
    n_legs: [[2,4,5,100]]
    >>> table.select(["year"])
    pyarrow.Table
    year: int64
    ----
    year: [[2020,2022,2019,2021]]
    """
    cdef:
        shared_ptr[CTable] c_table
        vector[int] c_indices

    for idx in columns:
        idx = self._ensure_integer_index(idx)
        idx = _normalize_index(idx, self.num_columns)
        c_indices.push_back(<int> idx)

    with nogil:
        c_table = GetResultValue(self.table.SelectColumns(move(c_indices)))

    return pyarrow_wrap_table(c_table)
```

and we should add the same for `pa.RecordBatch`.
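The loop in the `Table.select` snippet converts each entry of `columns` into a non-negative integer before calling `SelectColumns`, and a `RecordBatch` version would presumably need the same step. Below is a minimal pure-Python sketch of that resolution, assuming a column name must match exactly one field and that negative indices count from the end. `resolve_column_indices` is a hypothetical helper, not pyarrow API; pyarrow's internal `_ensure_integer_index` and `_normalize_index` may differ in error types and messages.

```python
from typing import Iterable, List, Sequence, Union

# A requested column is either a field name or an integer position.
Column = Union[int, str]


def resolve_column_indices(field_names: Sequence[str],
                           columns: Iterable[Column]) -> List[int]:
    """Hypothetical helper: map column names/indices to non-negative ints.

    Mirrors the two steps in the Cython loop: turn a name into an integer
    index, then normalize negative indices and bounds-check the result.
    """
    num_columns = len(field_names)
    indices = []
    for col in columns:
        # Step 1 (assumption): a name must match exactly one field.
        if isinstance(col, str):
            matches = [i for i, name in enumerate(field_names) if name == col]
            if len(matches) != 1:
                raise KeyError(
                    f"Field {col!r} appears {len(matches)} times; expected exactly one")
            col = matches[0]
        # Step 2: Python-style negative indices, then a bounds check.
        if col < 0:
            col += num_columns
        if not 0 <= col < num_columns:
            raise IndexError("column index out of bounds")
        indices.append(col)
    return indices


print(resolve_column_indices(["year", "n_legs", "animals"], [0, "animals", -2]))
# → [0, 2, 1]
```

Resolving everything to integers up front lets the C++ side work with a single `vector[int]`, which is why names and positions can be mixed in one call.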

Component(s)

Python

AlenkaF self-assigned this Feb 27, 2023
jorisvandenbossche pushed a commit that referenced this issue Feb 27, 2023
### Rationale for this change
There is a `select` method defined for `pa.Table` and we should add the same for `pa.RecordBatch`.

### What changes are included in this PR?
Added a `select` method to the `RecordBatch` class in `table.pxi`.

### Are these changes tested?
Yes, tests are added to _python/pyarrow/tests/test_table.py_.

### Are there any user-facing changes?
No.
* Closes: #34359

Authored-by: Alenka Frim <frim.alenka@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
jorisvandenbossche added this to the 12.0.0 milestone Feb 27, 2023