Skip to content

Conversation

@nuno-faria
Copy link
Contributor

Which issue does this PR close?

Closes #1314.

Rationale for this change

Allow the conversion of batches to pyarrow when there are inconsistencies with the DataFrame's schema (i.e., on the nullability of columns).

What changes are included in this PR?

Are there any user-facing changes?

No.

Comment on lines +1870 to +1881
def test_parquet_non_null_column_to_pyarrow(ctx, tmp_path):
path = tmp_path.joinpath("t.parquet")

ctx.sql("create table t_(a int not null)").collect()
ctx.sql("insert into t_ values (1), (2), (3)").collect()
ctx.sql(f"copy (select * from t_) to '{path}'").collect()

ctx.register_parquet("t", path)
pyarrow_table = ctx.sql("select max(a) as m from t").to_arrow_table()
assert pyarrow_table.to_pydict() == {"m": [3]}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also add

  • a regression test for the empty-batch case to ensure when there are zero record batches the DataFrame schema is still applied (because that’s the behavior the code has preserved).
  • a test covering an edge-case where an aggregation yields a NULL (e.g., max on an empty input) and ensure to_arrow_table correctly represents that ensures we didn't hide other nullability issues

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I've added those two new tests.

Copy link
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistencies between RecordBatch and DataFrame schemas cause to_arrow_table to fail

2 participants