Parquet column array<struct<>> with null value is read in as empty list #251

Open
puchengy opened this issue Jan 4, 2024 · 3 comments · Fixed by #252
@puchengy
Contributor

puchengy commented Jan 4, 2024

Apache Iceberg version

main (development)

Please describe the bug 🐞

An Iceberg table column of type array that contains a null value is read back as an empty list; it should be read as None instead.

Reproduction script: puchengy@3fd6d3d

The test fails with:

>       assert pyberg_val == direct_val
E       assert [] == None
@puchengy puchengy changed the title from "[BUG] parquet col array<int> with null value is read in as empty list but should but none" to "[BUG] parquet col array<int> with null value is read in as empty list but should be none" Jan 4, 2024
@Fokko Fokko self-assigned this Jan 5, 2024
@Fokko
Contributor

Fokko commented Jan 5, 2024

Great catch @puchengy, let me see what's needed to fix this

Fokko added a commit to Fokko/iceberg-python that referenced this issue Jan 5, 2024
Wrapping the list seems to introduce an odd behavior where
`null` values are converted to an empty list `[]`.

Resolves apache#251
Fokko added a commit to Fokko/iceberg-python that referenced this issue Jan 5, 2024
Wrapping the list seems to introduce an odd behavior where
`null` values are converted to an empty list `[]`.

Resolves apache#251
@Fokko
Contributor

Fokko commented Jan 5, 2024

I've found the issue. We don't respect the null count when fetching the array through the accessor:

[image]

We just return the array and then create a new array with offset 1, and then it just injects a []
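
The mechanism described above can be illustrated with a small pure-Python model of an Arrow-style list layout (offsets, child values, and a validity mask). This is a sketch under assumptions, not the PyIceberg or PyArrow code: `rebuild_lists` and the variable names are hypothetical and only show why dropping the validity information turns a null slot into `[]`.

```python
def rebuild_lists(offsets, values, validity=None):
    """Rebuild Python lists from an Arrow-style (offsets, values) pair.

    If `validity` is omitted, every slot is treated as non-null, so a null
    slot (which spans zero child values) is rebuilt as an empty list.
    """
    out = []
    for i in range(len(offsets) - 1):
        if validity is not None and not validity[i]:
            out.append(None)
        else:
            out.append(values[offsets[i]:offsets[i + 1]])
    return out


# Layout for the rows [[1, 2], None, []]:
# the null row and the empty row both span zero child values.
offsets = [0, 2, 2, 2]
values = [1, 2]
validity = [True, False, True]

without_validity = rebuild_lists(offsets, values)           # null becomes []
with_validity = rebuild_lists(offsets, values, validity)    # null stays None
```

In this model the offsets alone cannot distinguish a null row from an empty one, so any code path that rebuilds the list array has to carry the validity information along.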

@HonahX
Contributor

HonahX commented Jan 26, 2024

There is still an edge case unfixed. We need to wait for an upstream fix: apache/arrow#38809

ref: #252 (comment)

@HonahX HonahX reopened this Jan 26, 2024
@Fokko Fokko changed the title from "[BUG] parquet col array<int> with null value is read in as empty list but should be none" to "Parquet column array<struct<>> with null value is read in as empty list" Jan 26, 2024