[Python] How to add one level of nesting to flat table? #38912
Comments
If you can work with record batches, I would suggest using RecordBatch.to_struct_array():
import pyarrow as pa
batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": [3, 4, 5]})
struct_array = batch.to_struct_array()
batch_result = pa.RecordBatch.from_arrays([struct_array], names=["c"])
# pyarrow.RecordBatch
# c: struct<a: int64, b: int64>
# child 0, a: int64
# child 1, b: int64
# ----
# c: -- is_valid: all not null
# -- child 0 type: int64
# [1,2,3]
# -- child 1 type: int64
# [3,4,5]
If you need to work with tables, then you can do the same for each individual chunk:
# I think this should work
table = pa.table({"a": [1, 2, 3], "b": [3, 4, 5]})
batches = []
for b in table.to_batches():
    batches.append(pa.RecordBatch.from_arrays([b.to_struct_array()], names=["c"]))
table_result = pa.Table.from_batches(batches)
# pyarrow.Table
# c: struct<a: int64, b: int64>
# child 0, a: int64
# child 1, b: int64
# ----
# c: [
# -- is_valid: all not null
# -- child 0 type: int64
# [1,2,3]
# -- child 1 type: int64
# [3,4,5]]
Thanks a lot @AlenkaF! Am I right that such Table <-> Batches transformations cost close to zero, according to this passage from the docs? "However, a table can be converted to and built from a sequence of record batches easily without needing to copy the underlying array buffers. A table can be streamed as an arbitrary number of record batches using a arrow::TableBatchReader. Conversely, a logical sequence of record batches can be assembled to form a table using one of the arrow::Table::FromRecordBatches() factory function overloads."
We might want to add
Oh, forgot the issue already exists with an open PR! :) #38520
Describe the usage question you have. Please include as many useful details as possible.
I have a flat pa.Table:
How can I create a new table from this one by adding one level of nesting?
So I want a new table with only one column "c" of type struct with two fields "a" and "b", keeping the data from the original table.
Component(s)
Python