[Python] Cannot read empty DataFrame Interchange object #37050
Thank you for submitting the issue @stinodego! This is true, the error is reproducible. After some research I found that the underlying issue is in `Table.to_batches()`, which returns an empty list for a table with no rows instead of a single empty batch:

```python
>>> import pyarrow as pa
>>> # Using a schema when constructing the table here, otherwise
>>> # we get a Null array which is not supported by the protocol
>>> my_schema = pa.schema([pa.field('col1', pa.int64())])
>>> df = pa.table([[]], schema=my_schema)
>>> df
pyarrow.Table
col1: int64
----
col1: [[]]
>>> df.to_batches()
[]
>>> # Should result in
>>> batch = pa.record_batch([[]], schema=my_schema)
>>> batch
pyarrow.RecordBatch
col1: int64
----
col1: []
```

Because we use `to_batches()` in the interchange protocol implementation, the resulting interchange object reports 0 chunks. I have opened a new issue to fix the behaviour of `to_batches()`.
Added a workaround for the case of empty dataframes with 0 chunks, as it is more general (other libraries might also create interchange objects without chunks). PR: #38037
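The idea behind such a workaround can be sketched as follows. This is a minimal illustration using hypothetical stub classes, not the actual PyArrow code from PR #38037: a consumer that iterates over chunks must treat a dataframe reporting 0 chunks as valid and empty, rather than failing.

```python
# Hypothetical stub of an interchange-style dataframe (illustration only;
# not the actual PyArrow implementation from PR #38037).
class StubDataFrame:
    """A toy interchange-style dataframe that may expose zero chunks."""

    def __init__(self, chunks):
        self._chunks = chunks  # list of row-lists

    def num_chunks(self):
        return len(self._chunks)

    def get_chunks(self):
        return iter(self._chunks)


def collect_rows(dfi):
    """Consume a stub dataframe chunk by chunk."""
    # Workaround: a dataframe reporting 0 chunks is still valid --
    # return an empty result instead of erroring.
    if dfi.num_chunks() == 0:
        return []
    rows = []
    for chunk in dfi.get_chunks():
        rows.extend(chunk)
    return rows


print(collect_rows(StubDataFrame([])))        # -> []
print(collect_rows(StubDataFrame([[1, 2]])))  # -> [1, 2]
```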
…ataframes (#38037)

### Rationale for this change

The implementation of the DataFrame Interchange Protocol does not currently support consumption of dataframes with 0 chunks (empty dataframes).

### What changes are included in this PR?

Add a workaround so that this case no longer errors.

### Are these changes tested?

Yes, added `test_empty_dataframe` in `python/pyarrow/tests/interchange/test_conversion.py`.

### Are there any user-facing changes?

No.

* Closes: #37050

Authored-by: AlenkaF <frim.alenka@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Describe the bug, including details regarding any error messages, version, and platform.
Creating an empty table, converting to the interchange format, then reading it back, gives an error:
I believe the reason for this is that `dfi.num_chunks()` is 0, when it should be 1 (a single, empty chunk).

### Component(s)

Python