cannot read parquet file #1515

Closed
zhaoyanggh opened this issue Apr 1, 2022 · 6 comments · Fixed by #1517
Labels
bug parquet Changes to the parquet crate

Comments

@zhaoyanggh

Describe the bug
I want to read a parquet file that I generated, but when I use the "get_row_iter" API I get this error:
thread 'main' panicked at 'called Option::unwrap() on a None value', /home/yzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-11.0.0/src/record/reader.rs:132:52

To Reproduce
Steps to reproduce the behavior:
This is my schema:
message table {
  REPEATED group table_info {
    REQUIRED BYTE_ARRAY name;
    REPEATED group cols {
      REQUIRED BYTE_ARRAY name;
      REQUIRED INT32 type;
      OPTIONAL INT32 length;
    }
    REPEATED group tags {
      REQUIRED BYTE_ARRAY name;
      REQUIRED INT32 type;
      OPTIONAL INT32 length;
    }
  }
}
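
For readers less familiar with nested Parquet schemas, here is a minimal sketch of the leaf columns this schema flattens into, with their maximum definition and repetition levels. It assumes the usual Dremel-style rule (OPTIONAL adds one definition level, REPEATED adds one definition level and one repetition level). The `Field`, `Leaf`, and helper names are hypothetical and written only for this illustration; they are not the parquet crate's API.

```rust
// Sketch only: hypothetical stand-in types, not the parquet crate's schema API.
#[derive(Clone, Copy)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

struct Field {
    name: &'static str,
    repetition: Repetition,
    children: Vec<Field>, // empty for a leaf (primitive) column
}

#[derive(Debug, PartialEq)]
struct Leaf {
    path: String,
    max_def: u16,
    max_rep: u16,
}

fn leaf(name: &'static str, repetition: Repetition) -> Field {
    Field { name, repetition, children: Vec::new() }
}

// The `cols` and `tags` groups share the same three fields.
fn name_type_length(name: &'static str) -> Field {
    Field {
        name,
        repetition: Repetition::Repeated,
        children: vec![
            leaf("name", Repetition::Required),
            leaf("type", Repetition::Required),
            leaf("length", Repetition::Optional),
        ],
    }
}

// Walk the tree depth-first, accumulating levels and the dotted path.
fn collect(field: &Field, prefix: &str, def: u16, rep: u16, out: &mut Vec<Leaf>) {
    let (def, rep) = match field.repetition {
        Repetition::Required => (def, rep),
        Repetition::Optional => (def + 1, rep),
        Repetition::Repeated => (def + 1, rep + 1),
    };
    let path = if prefix.is_empty() {
        field.name.to_string()
    } else {
        format!("{}.{}", prefix, field.name)
    };
    if field.children.is_empty() {
        out.push(Leaf { path, max_def: def, max_rep: rep });
    } else {
        for child in &field.children {
            collect(child, &path, def, rep, out);
        }
    }
}

// Leaves of the failing schema; the root message name is not part of the path.
fn issue_schema_leaves() -> Vec<Leaf> {
    let table_info = Field {
        name: "table_info",
        repetition: Repetition::Repeated,
        children: vec![
            leaf("name", Repetition::Required),
            name_type_length("cols"),
            name_type_length("tags"),
        ],
    };
    let mut out = Vec::new();
    collect(&table_info, "", 0, 0, &mut out);
    out
}

fn main() {
    for l in issue_schema_leaves() {
        println!("{} def={} rep={}", l.path, l.max_def, l.max_rep);
    }
}
```

Under these assumptions the schema has seven leaf columns, and the `tags` leaves differ from the `cols` leaves only in their path.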

I can successfully read the file if I change the schema to:
message table {
  REPEATED group table_info {
    REQUIRED BYTE_ARRAY name;
    REPEATED group cols {
      REQUIRED BYTE_ARRAY name;
      REQUIRED INT32 type;
      OPTIONAL INT32 length;
    }
  }
}

Expected behavior
The parquet file I generated can be read successfully with parquet-tools on macOS:
[Screenshot: Screen Shot 2022-04-01 at 10 50 44 AM]

Additional context

@zhaoyanggh zhaoyanggh added the bug label Apr 1, 2022
@zhaoyanggh
Author

And this is my parquet file.

@jhorstmann
Contributor

Can confirm the issue with the given file:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', parquet/src/record/reader.rs:134:52
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:116:14
   2: core::panicking::panic
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:48:5
   3: core::option::Option<T>::unwrap
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/option.rs:729:21
   4: parquet::record::reader::TreeBuilder::reader_tree
             at ./parquet/src/record/reader.rs:134:31
   5: parquet::record::reader::TreeBuilder::reader_tree
             at ./parquet/src/record/reader.rs:301:38
   6: parquet::record::reader::TreeBuilder::reader_tree
             at ./parquet/src/record/reader.rs:281:34
   7: parquet::record::reader::TreeBuilder::reader_tree
             at ./parquet/src/record/reader.rs:301:38
   8: parquet::record::reader::TreeBuilder::reader_tree
             at ./parquet/src/record/reader.rs:281:34
   9: parquet::record::reader::TreeBuilder::build
             at ./parquet/src/record/reader.rs:79:26
  10: parquet::record::reader::TreeBuilder::as_iter
             at ./parquet/src/record/reader.rs:102:25
  11: <parquet::record::reader::RowIter as core::iter::traits::iterator::Iterator>::next
             at ./parquet/src/record/reader.rs:774:32
  12: parquet_read::main
             at ./parquet/src/bin/parquet-read.rs:84:15
  13: core::ops::function::FnOnce::call_once
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

The column path that can't be found is ["tags", "name"] but should be ["table_info", "tags", "name"].
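
To illustrate the failure mode, here is a minimal sketch of a path-to-column index and the two lookups. It is not the actual code in parquet/src/record/reader.rs; `Node`, `leaf_paths`, `build_index`, and `path` are hypothetical names used only for this example, and the index is assumed to be keyed by the full path from the top-level field down to the leaf.

```rust
use std::collections::HashMap;

// Sketch only: a simplified schema node, not the parquet crate's `Type`.
enum Node {
    Leaf(&'static str),
    Group(&'static str, Vec<Node>),
}

// Collect the full path of every leaf, keeping all ancestor group names.
fn leaf_paths(node: &Node, prefix: &mut Vec<String>, out: &mut Vec<Vec<String>>) {
    match node {
        Node::Leaf(name) => {
            prefix.push(name.to_string());
            out.push(prefix.clone());
            prefix.pop();
        }
        Node::Group(name, children) => {
            prefix.push(name.to_string());
            for child in children {
                leaf_paths(child, prefix, out);
            }
            prefix.pop();
        }
    }
}

// The `cols` and `tags` groups share the same three leaves.
fn name_type_length(group: &'static str) -> Node {
    Node::Group(
        group,
        vec![Node::Leaf("name"), Node::Leaf("type"), Node::Leaf("length")],
    )
}

// Map each full column path of the issue's schema to its column index.
fn build_index() -> HashMap<Vec<String>, usize> {
    // Top-level fields of the message; the message name is not part of the path.
    let fields = vec![Node::Group(
        "table_info",
        vec![
            Node::Leaf("name"),
            name_type_length("cols"),
            name_type_length("tags"),
        ],
    )];
    let mut paths = Vec::new();
    let mut prefix = Vec::new();
    for field in &fields {
        leaf_paths(field, &mut prefix, &mut paths);
    }
    paths.into_iter().enumerate().map(|(i, p)| (p, i)).collect()
}

fn path(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

fn main() {
    let index = build_index();
    // Full path: found.
    println!("{:?}", index.get(&path(&["table_info", "tags", "name"])));
    // Path with the "table_info" prefix dropped: not found.
    println!("{:?}", index.get(&path(&["tags", "name"])));
}
```

In this sketch the lookup with the truncated path returns `None`, so calling `unwrap()` on it would panic in the same way as the backtrace above.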

@viirya
Member

viirya commented Apr 1, 2022

The current List logic in get_arrow_field is not correct. I have proposed a fix in #1517.

But after the fix, you still get:

panicked at 'Failed to read into array!: ArrowError("Reading repeated field (\"cols\") is not supported yet!")'

I think that is a separate issue.

@zhaoyanggh
Author

Thank you so much for the help. For the new problem, should I or someone else create a new GitHub issue, or should it be fixed under this one?

@viirya
Member

viirya commented Apr 2, 2022

It seems to be a known limitation, but I can't find a related issue. I think it is a separate issue, so maybe you can create a new one.

@alamb alamb added the parquet Changes to the parquet crate label Apr 15, 2022