
Parquet reader fails to read null list. #1399

Closed
novemberkilo opened this issue Mar 5, 2022 · 3 comments · Fixed by #1448 or #1481
Labels
bug parquet Changes to the parquet crate

Comments

@novemberkilo
Contributor

novemberkilo commented Mar 5, 2022

Describe the bug
Parquet reader fails to read a parquet file that is generated from this json:

{"emptylist":[]}

To Reproduce
I have written a test on a branch to reproduce this failure.

  1. Generated a Parquet file from the aforementioned JSON. See WIP - Fix bug JSON input barfs on {"emptylist":[]} #1063 (comment) and the related issue for context. Verified that pyarrow can read this Parquet file as the following Arrow table:
>>> empty_table = pq.read_table("empty_table.parquet")
>>> empty_table
pyarrow.Table
emptylist: list<item: null>
  child 0, item: null
----
emptylist: [[0 nulls]]
  2. Added this Parquet file to a branch of my fork of parquet-testing: novemberkilo/parquet-testing@a827b12

  3. Wrote a test that reads this file.

Test: novemberkilo@e5952ae
Failure: https://github.com/novemberkilo/arrow-rs/runs/5431110483?check_suite_focus=true#step:4:1999
Failure: https://github.com/novemberkilo/arrow-rs/runs/5431134675?check_suite_focus=true#step:7:1944

Expected behavior
The reader should read the file, as pyarrow can.

Additional context
This is all related to the original issue #1036. // @alamb

@alamb alamb added the parquet Changes to the parquet crate label Mar 5, 2022
@alamb
Contributor

alamb commented Mar 7, 2022

@novemberkilo notes that this is a blocker for #1036.

I don't fully understand how this bug report relates to the test failure, which seems to indicate a missing file rather than an error reading null lists:

https://github.com/novemberkilo/arrow-rs/runs/5431110483?check_suite_focus=true#step:4:1999

---- arrow::arrow_reader::tests::test_read_null_list stdout ----
thread 'arrow::arrow_reader::tests::test_read_null_list' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', parquet/src/arrow/arrow_reader.rs:1036:62
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
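The trace above shows `Result::unwrap()` panicking on a `std::io::Error` with OS error code 2 (`ENOENT` on Unix-like systems), which Rust maps to `ErrorKind::NotFound`: the test could not open its input file, so the failure happens before any Parquet decoding. The sketch below, using only the Rust standard library, shows one way a test helper might report which path it actually tried instead of a bare `NotFound`. The `PARQUET_TEST_DATA` variable, the `../parquet-testing/data` default, the helper names `test_data_path`/`open_test_file` and the file name `null_list.parquet` are all hypothetical and chosen for illustration; they are not claimed to be the arrow-rs test utilities.

```rust
use std::env;
use std::fs::File;
use std::io::{self, ErrorKind};
use std::path::PathBuf;

/// Hypothetical helper: resolve a file inside a test-data directory.
/// The env var name and the default directory are assumptions.
fn test_data_path(file_name: &str) -> PathBuf {
    let dir = env::var("PARQUET_TEST_DATA")
        .unwrap_or_else(|_| "../parquet-testing/data".to_string());
    PathBuf::from(dir).join(file_name)
}

/// Hypothetical helper: open a test file, replacing a bare `NotFound`
/// with an error message that names the path that was tried.
fn open_test_file(file_name: &str) -> io::Result<File> {
    let path = test_data_path(file_name);
    File::open(&path).map_err(|e| {
        if e.kind() == ErrorKind::NotFound {
            io::Error::new(
                ErrorKind::NotFound,
                format!("test file not found: {}", path.display()),
            )
        } else {
            e
        }
    })
}

fn main() {
    // `null_list.parquet` is a hypothetical file name for illustration.
    match open_test_file("null_list.parquet") {
        Ok(_) => println!("found test file"),
        Err(e) => println!("{:?}: {}", e.kind(), e),
    }
}
```

With a message like this, a missing or mislocated test file (for example, one that exists only on a fork of parquet-testing) is distinguishable at a glance from a genuine failure to decode a null list.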

@alamb
Contributor

alamb commented Mar 7, 2022

I wonder if this could be related to #993 as well

@novemberkilo
Contributor Author

Sorry, I linked the wrong test run in the original bug report.

The correct one is here: https://github.com/novemberkilo/arrow-rs/runs/5431134675?check_suite_focus=true#step:7:1944

---- arrow::arrow_reader::tests::test_read_null_list stdout ----
thread 'arrow::arrow_reader::tests::test_read_null_list' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', parquet/src/arrow/arrow_reader.rs:1036:62
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

2 participants