Unblocking [RUST-INT][TPCH]: Add a call to .combine_chunks() when creating a Table from arrow (#700)

* When we create Tables from PyArrow tables, we can currently handle only
one batch
* This causes issues when running our TPC-H unit tests, ostensibly
because the Parquet files contain multiple row groups
* This PR adds a temporary fix to unblock running our new Rust code on
TPC-H by calling `.combine_chunks()` before creating the table so
that we only have one batch
jaychia committed Mar 15, 2023
1 parent 78b9cc8 commit 095a12f
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions daft/table/table.py
@@ -70,6 +70,9 @@ def _from_pytable(pyt: _PyTable) -> Table:
     @staticmethod
     def from_arrow(arrow_table: pa.Table) -> Table:
         assert isinstance(arrow_table, pa.Table)
+        # TODO: [RUST-INT][TPCH] _PyTable.from_arrow_record_batches only supports single-batch inputs at the moment
+        # so we hack around it by combining the chunks first. We should fix this and remove arrow_table.combine_chunks() here.
+        arrow_table = arrow_table.combine_chunks()
         pyt = _PyTable.from_arrow_record_batches(arrow_table.to_batches())
         return Table._from_pytable(pyt)
