Here's a corner case: suppose that I have data with type null, but it can have missing values so the whole array consists of nothing but nulls. In real life, this might only happen inside a nested data structure, at some level where an untyped data source (e.g. nested Python lists) had no entries so a type could not be determined. We expect to be able to write and read this data to and from Parquet, and we can—as long as it doesn't have an ExtensionType.
Here's an example that works, without ExtensionType:
If "nullable type null" were outside the set of types that should be writable to Parquet, then either it would not work for the non-ExtensionType case or it would fail on writing, not reading, so I'm quite sure this is a bug.
While trying to replicate this with simplified code (you can create a null array more easily, and an ExtensionArray from its storage array), it seems just creating this array already fails that way:
Also doing a full validation of the null extension array you created with from_buffers seems to fail (although without full=True it does not fail, which is a bit strange):
Joris Van den Bossche / @jorisvandenbossche:
There is a second issue here: the way you are creating the null array uses a buffer, while a NullArray is expected to have no buffers allocated at all (it only stores the length). That trips up ValidateFull (the second code example in the comment above).
The continuation of that example, which doesn't work, differs only in that the type `pa.null()` is replaced by `AnnotatedType(pa.null(), {"cool": "beans"})`.
Reporter: Jim Pivarski / @jpivarski
Assignee: Joris Van den Bossche / @jorisvandenbossche
Note: This issue was originally created as ARROW-14522. Please see the migration documentation for further details.