Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Consolidate shared methods of RecordBatch and Table #30559

Open
asfimport opened this issue Dec 9, 2021 · 2 comments
Open

[Python] Consolidate shared methods of RecordBatch and Table #30559

asfimport opened this issue Dec 9, 2021 · 2 comments

Comments

@asfimport
Copy link

asfimport commented Dec 9, 2021

RecordBatch and Table have a bunch of similar methods that don't directly interact with the C++ pointer, and thus that could be shared in a common base class.

In addition, we also have some methods on Table that would also be useful for RecordBatch (eg cast, group_by, drop, select, sort_by, rename_columns), which could also be shared with a common mixin.

Reporter: Joris Van den Bossche / @jorisvandenbossche

Related issues:

Other ideas without dedicated issues:

  • Make pa.record_batch(..) constructor consistent with pa.table(..) (eg accepting a dict of column names -> column values)

Note: This issue was originally created as ARROW-15042. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Suhail Razzak:
Hey @jorisvandenbossche, in this case, would we want something like an _NDFrame base class that stores these kinds of methods? Would that approach make sense?

@asfimport
Copy link
Author

Todd Farmer / @toddfarmer:
This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

jorisvandenbossche pushed a commit that referenced this issue Apr 27, 2023
)

### Rationale for this change

This is an incremental first step towards #30559

### What changes are included in this PR?

Introduce `class _Table` in `table.pxi`.

### Are these changes tested?

Existing pytests will check for regression.

### Are there any user-facing changes?

No
* Closes: #34979

Authored-by: Dane Pitkin <dane@voltrondata.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
apache#34980)

### Rationale for this change

This is an incremental first step towards apache#30559

### What changes are included in this PR?

Introduce `class _Table` in `table.pxi`.

### Are these changes tested?

Existing pytests will check for regression.

### Are there any user-facing changes?

No
* Closes: apache#34979

Authored-by: Dane Pitkin <dane@voltrondata.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
apache#34980)

### Rationale for this change

This is an incremental first step towards apache#30559

### What changes are included in this PR?

Introduce `class _Table` in `table.pxi`.

### Are these changes tested?

Existing pytests will check for regression.

### Are there any user-facing changes?

No
* Closes: apache#34979

Authored-by: Dane Pitkin <dane@voltrondata.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
apache#34980)

### Rationale for this change

This is an incremental first step towards apache#30559

### What changes are included in this PR?

Introduce `class _Table` in `table.pxi`.

### Are these changes tested?

Existing pytests will check for regression.

### Are there any user-facing changes?

No
* Closes: apache#34979

Authored-by: Dane Pitkin <dane@voltrondata.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
jorisvandenbossche added a commit that referenced this issue Mar 15, 2024
### Rationale for this change

These methods are present on `Table` but missing on `RecordBatch`:

* `add_column`
* `append_column`
* `remove_column`
* `set_column`
* `drop_columns`
* `rename_columns`
* `cast`

We also should probably accept a `dict` as input to `pa.record_batch` like we do for `pa.table`.

### What changes are included in this PR?

Add the methods.

### Are these changes tested?

Yes.

* Parent issue: #36399
* Related: #30559
* Closes #30915
* GitHub Issue: #30915

Lead-authored-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant