[C++][Parquet] Support writing lists that have null elements that are non-empty. #24425

Open
asfimport opened this issue Mar 26, 2020 · 4 comments

Comments

asfimport commented Mar 26, 2020

With the new V2 level-writing engine we can detect this case, but the write fails with a not-implemented error. Fixing this will require changes to the "core" Parquet API.

Reporter: Micah Kornfield / @emkornfield

Note: This issue was originally created as ARROW-8228. Please see the migration documentation for further details.

Antoine Pitrou / @pitrou:
@emkornfield Can you elaborate on what's needed here?

The following is working at least:

>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> table = pa.table({"lists": [[1,2,None], None, [], [4,5]]})
>>> pq.write_table(table, "tt.pq")
>>> pq.read_table("tt.pq")
pyarrow.Table
lists: list<item: int64>
  child 0, item: int64
>>> pq.read_table("tt.pq").to_pandas()
             lists
0  [1.0, 2.0, nan]
1             None
2               []
3       [4.0, 5.0]
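
A possible reason the example above round-trips is that a list array built from Python objects gives each null entry zero length. The following is a minimal pure-Python sketch of that layout, not pyarrow's actual construction code; `build_list_layout` is a hypothetical helper introduced only for illustration.

```python
def build_list_layout(rows):
    """Flatten rows (lists or None) into validity, offsets, and child values."""
    validity, offsets, values = [], [0], []
    for row in rows:
        validity.append(row is not None)
        if row is not None:
            values.extend(row)
        # A null row adds no child values, so its end offset repeats the
        # previous offset and the entry has length zero.
        offsets.append(len(values))
    return validity, offsets, values


validity, offsets, values = build_list_layout([[1, 2, None], None, [], [4, 5]])
print(validity)  # [True, False, True, True]
print(offsets)   # [0, 3, 3, 3, 5]
print(values)    # [1, 2, None, 4, 5]
```

In this layout the null entry at index 1 spans offsets 3 to 3, so no child values sit underneath it.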

Micah Kornfield / @emkornfield:
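The case that isn't covered is a list array whose null entry still spans child values, for example:

Null bitmap: [Not Null, Null, Not Null]

List_offset = [0, 2, 5, 7]

Data = [0, 1, 2, 3, 4, 5, 6]

Here the null entry at index 1 covers offsets 2 to 5, so three child values (2, 3, 4) sit underneath a null list.

Nullable FixedSizeList also runs into this issue (any nulls in the middle can't be written).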
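
To make the offsets example concrete, here is a small pure-Python sketch that finds null entries with non-zero length and shows one caller-side normalization that drops the hidden child values. Both `nonempty_null_slots` and `compact_nulls` are hypothetical helpers written for illustration; this is not the Parquet writer's logic and not the API change the issue asks for.

```python
def nonempty_null_slots(validity, offsets):
    """Return indices of null entries whose offset range is not empty."""
    return [
        i
        for i, valid in enumerate(validity)
        if not valid and offsets[i + 1] > offsets[i]
    ]


def compact_nulls(validity, offsets, values):
    """Rebuild offsets and values so that every null entry has length zero."""
    new_offsets, new_values = [0], []
    for i, valid in enumerate(validity):
        if valid:
            new_values.extend(values[offsets[i]:offsets[i + 1]])
        # Null entries contribute no values, so their end offset repeats.
        new_offsets.append(len(new_values))
    return new_offsets, new_values


validity = [True, False, True]
offsets = [0, 2, 5, 7]
values = [0, 1, 2, 3, 4, 5, 6]

print(nonempty_null_slots(validity, offsets))     # [1]
print(compact_nulls(validity, offsets, values))   # ([0, 2, 2, 4], [0, 1, 5, 6])
```

After compaction the null entry spans offsets 2 to 2, which matches the shape of the example that already writes successfully.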

Joris Van den Bossche / @jorisvandenbossche:
Example with fixed size list:

>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> arr = pa.array([[1, 2], None, [3, 4]], pa.list_(pa.int64(), 2))
>>> arr.values
<pyarrow.lib.Int64Array object at 0x7f64aeebb820>
[
  1,
  2,
  null,
  null,
  3,
  4
]

>>> pq.write_table(pa.table({"col": arr}), "test.parquet")
ArrowNotImplementedError: Lists with non-zero length null components are not supported
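
The fixed-size case can be sketched without offsets at all: entry i always covers child slots i * list_size up to (i + 1) * list_size, so a null entry necessarily has non-zero length. The helper below, `null_slot_child_ranges`, is a hypothetical name used only for this illustration and assumes a list size greater than zero.

```python
def null_slot_child_ranges(validity, list_size):
    """Return the (start, stop) child ranges covered by null entries."""
    return [
        (i * list_size, (i + 1) * list_size)
        for i, valid in enumerate(validity)
        if not valid
    ]


# Matches the array above: [[1, 2], None, [3, 4]] with list_size 2.
ranges = null_slot_child_ranges([True, False, True], 2)
print(ranges)  # [(2, 4)]
```

The range (2, 4) corresponds to the two null placeholders at positions 2 and 3 in `arr.values` above.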
