Arrow support #21
I think the reason for the failing test is a bug in the Vaex implementation. It seems to create a bitmask where a missing value should be defined with
>>> import pyarrow as pa
>>> table = pa.table([[False]], names=["A"])
>>> table
pyarrow.Table
A: bool
----
A: [[false]]
>>> import vaex
>>> from vaex.dataframe_protocol import from_dataframe_to_vaex
>>> vaex_df = from_dataframe_to_vaex(table)
>>> vaex_df
# A
0 False
>>> vaex_df_p = vaex_df.__dataframe__()
>>> vaex_col_p = vaex_df_p.get_column_by_name("A")
>>> vaex_col_p.describe_null
(3, 0)
>>> vaex_col_p.get_buffers()
{'data': (VaexBuffer({'bufsize': 1, 'ptr': 105553125834912, 'device': 'CPU'}), (<_DtypeKind.BOOL: 20>, 8, '|b1', '|')), 'validity': (VaexBuffer({'bufsize': 1, 'ptr': 105553125865376, 'device': 'CPU'}), (<_DtypeKind.BOOL: 20>, 8, '|b1', '|')), 'offsets': None}
>>> vaex_col_p._get_validity_buffer()[0]._x
array([False])
In the above example
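For reference, the interchange protocol reports `describe_null` as a `(kind, value)` pair. Kind `3` is `USE_BITMASK`, and for masks `value` is the mask value that marks a missing entry. The stdlib-only sketch below mirrors the protocol's `ColumnNullType` values; `is_missing` and `to_validity` are hypothetical helpers, not Vaex or pyarrow APIs. It shows that `(3, 0)` combined with a validity entry of `False` reads as "missing", even though the `pa.table` above has no nulls. One possible explanation, assumed here and not confirmed, is that a NumPy-style mask (where `True` means missing) was exposed without being inverted.

```python
import enum
from typing import List, Optional, Tuple


class ColumnNullType(enum.IntEnum):
    # Values as defined by the dataframe interchange protocol
    NON_NULLABLE = 0
    USE_NAN = 1
    USE_SENTINEL = 2
    USE_BITMASK = 3
    USE_BYTEMASK = 4


def is_missing(describe_null: Tuple[int, Optional[int]], validity_value: bool) -> bool:
    """Interpret one validity entry given a column's describe_null (hypothetical helper)."""
    kind, null_value = describe_null
    if kind in (ColumnNullType.USE_BITMASK, ColumnNullType.USE_BYTEMASK):
        # null_value (0 or 1) is the mask value that marks a missing entry
        return int(validity_value) == null_value
    raise ValueError(f"kind {kind} does not use a validity mask")


def to_validity(missing_mask: List[bool]) -> List[bool]:
    """Turn a 'True means missing' mask into 'True means valid' (null value 0)."""
    return [not m for m in missing_mask]


# What Vaex reports for the non-null column above: describe_null (3, 0), validity [False]
print(is_missing((3, 0), False))  # True: the value would be treated as missing

# The column has no nulls, so its missing-mask is [False]; inverted, it marks the value valid
validity = to_validity([False])
print(validity, is_missing((3, 0), validity[0]))  # [True] False
```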
FWIW, I am not sure it is very important to test both Table and RecordBatch, in case that complicates the testing code: the implementation is almost fully shared under the hood, and a Table created in a simple way will typically consist of a single batch, in which case the two are fully the same.
Can't seem to install `pandas` nightlies without Python `>=3.9`.
In the end it looks like
Good shout, fortunately it's been pretty simple to test.
Great news! 🎉
Should resolve #20. Wrapped `pa.Table`, but haven't gone through all the current failures yet, and will need to do the same with `pa.RecordBatch`.
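Wrapping `pa.Table` amounts to putting an object that exposes `__dataframe__` in front of it. The sketch below is hypothetical: `InterchangeWrapper` and `StandInTable` are illustrative names, not code from this PR, and only a few methods of the interchange protocol's dataframe object are implemented. It assumes the wrapped object offers `column_names`, `columns` and `num_rows`, as `pa.Table` does; a dict-backed stand-in is used so the example runs without pyarrow. Because the wrapper relies only on those attributes, the same class could plausibly serve `pa.RecordBatch` too, provided the installed pyarrow exposes the same attributes on it (otherwise `schema.names` could supply the names).

```python
from typing import Any, List, Sequence


class StandInTable:
    """Hypothetical stand-in with the attributes the wrapper reads from pa.Table."""

    def __init__(self, data: dict):
        self.column_names: List[str] = list(data)
        self.columns: List[Sequence[Any]] = [data[name] for name in self.column_names]
        self.num_rows: int = len(self.columns[0]) if self.columns else 0


class InterchangeWrapper:
    """Hypothetical wrapper exposing a subset of the interchange DataFrame API."""

    def __init__(self, table: Any, nan_as_null: bool = False, allow_copy: bool = True):
        self._table = table
        self._nan_as_null = nan_as_null
        self._allow_copy = allow_copy

    def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True):
        # Protocol entry point: returns an interchange object over the same table
        return InterchangeWrapper(self._table, nan_as_null, allow_copy)

    def num_columns(self) -> int:
        return len(self._table.column_names)

    def num_rows(self) -> int:
        return self._table.num_rows

    def column_names(self) -> List[str]:
        return list(self._table.column_names)

    def get_column_by_name(self, name: str) -> Sequence[Any]:
        # A real implementation would return a protocol Column object;
        # the raw column is returned here to keep the sketch short
        index = self._table.column_names.index(name)
        return self._table.columns[index]


df = InterchangeWrapper(StandInTable({"A": [False], "B": [1]})).__dataframe__()
print(df.num_columns(), df.num_rows(), df.column_names())  # 2 1 ['A', 'B']
print(df.get_column_by_name("A"))  # [False]
```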