Parquet array reader panics #72

Closed
alamb opened this issue Apr 26, 2021 · 1 comment
Labels
parquet Changes to the parquet crate

Comments

alamb commented Apr 26, 2021

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-8737

I'm trying to read some parquet files produced by Apache Spark 3.0.0-preview2 and the parquet crate is panicking. It should at least fail with an Err rather than panic.
{code:java}
thread '' panicked at 'index out of bounds: the len is 1024 but the index is 1087', /home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-0.17.0/src/arrow/record_reader.rs:415:21
stack backtrace:
0: 0x564dbc25a9d4 - backtrace::backtrace::libunwind::trace::hfcd33194db0151d4
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
1: 0x564dbc25a9d4 - backtrace::backtrace::trace_unsynchronized::hfd1904bbbd5335b5
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
2: 0x564dbc25a9d4 - std::sys_common::backtrace::_print_fmt::h8476c57b177b254e
at src/libstd/sys_common/backtrace.rs:78
3: 0x564dbc25a9d4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h73acbc5f6d4b1044
at src/libstd/sys_common/backtrace.rs:59
4: 0x564dbc28727c - core::fmt::write::hdf236390fbd68d3d
at src/libcore/fmt/mod.rs:1069
5: 0x564dbc2536c3 - std::io::Write::write_fmt::h5722fa40bb2afafd
at src/libstd/io/mod.rs:1532
6: 0x564dbc25d2d5 - std::sys_common::backtrace::_print::ha468e873aada7c78
at src/libstd/sys_common/backtrace.rs:62
7: 0x564dbc25d2d5 - std::sys_common::backtrace::print::h149365a2f029de62
at src/libstd/sys_common/backtrace.rs:49
8: 0x564dbc25d2d5 - std::panicking::default_hook::{{closure}}::hb4a33f9e05934a52
at src/libstd/panicking.rs:198
9: 0x564dbc25d012 - std::panicking::default_hook::hc4535d7b0c743abd
at src/libstd/panicking.rs:218
10: 0x564dbc25d918 - std::panicking::rust_panic_with_hook::haa34a96a6dbd5a2e
at src/libstd/panicking.rs:477
11: 0x564dbc25d51b - rust_begin_unwind
at src/libstd/panicking.rs:385
12: 0x564dbc285071 - core::panicking::panic_fmt::hd101a87121fa411f
at src/libcore/panicking.rs:89
13: 0x564dbc285032 - core::panicking::panic_bounds_check::ha0668dcff6357ef4
at src/libcore/panicking.rs:65
14: 0x564dbbcdbf46 - parquet::arrow::record_reader::RecordReader::read_records::hc8f50faae4afaae7
15: 0x564dbbc4da98 - <parquet::arrow::array_reader::PrimitiveArrayReader as parquet::arrow::array_reader::ArrayReader>::next_batch::hb4e5b687cd08ee46
16: 0x564dbbcca3c9 - <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold::h4206004da76eb745
17: 0x564dbbc51c51 - <parquet::arrow::array_reader::StructArrayReader as parquet::arrow::array_reader::ArrayReader>::next_batch::hf1c89300e65c72e8
18: 0x564dbbcacaba - <parquet::arrow::arrow_reader::ParquetRecordBatchReader as arrow::record_batch::RecordBatchReader>::next_batch::ha906d7eb32c7238a
19: 0x564dbbbe33b8 - std::sys_common::backtrace::__rust_begin_short_backtrace::hc2fd908045ecbee0
20: 0x564dbbb4a7ff - core::ops::function::FnOnce::call_once{{vtable.shim}}::h58c848a35fea035b
21: 0x564dbc264f7a - <alloc::boxed::Box as core::ops::function::FnOnce>::call_once::ha26a994a135d55de
at /rustc/1836e3b42a5b2f37fd79104eedbe8f48a5afdee6/src/liballoc/boxed.rs:1034
22: 0x564dbc264f7a - <alloc::boxed::Box as core::ops::function::FnOnce>::call_once::h677072ad3ba2806b
at /rustc/1836e3b42a5b2f37fd79104eedbe8f48a5afdee6/src/liballoc/boxed.rs:1034
23: 0x564dbc264f7a - std::sys::unix::thread::Thread::new::thread_start::h7c46ce580f54dd0e
at src/libstd/sys/unix/thread.rs:87
24: 0x7f332cf79669 - start_thread
at /build/glibc-t7JzpG/glibc-2.30/nptl/pthread_create.c:479
25: 0x7f332ce85323 - clone
26: 0x0 -
Error: DataFusionError(General("Error receiving batch: RecvError"))
{code}

alamb added the "arrow" label (Changes to the arrow crate) on Apr 26, 2021

alamb commented Apr 26, 2021

Comment from Andy Grove(andygrove) @ 2020-05-08T02:52:44.177+0000:

I was able to work around the issue by increasing the batch size from 1024 to 4096, but it seems like there is a missing bounds check in this code.
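
To illustrate the kind of bounds check meant here, the following is a minimal sketch and not the actual `record_reader.rs` code. The names `write_values` and `ReadError` are hypothetical and exist only for this example; the parquet crate has its own error type. The idea is to use a checked slice lookup so that writing past the end of a fixed-size batch buffer yields an `Err` instead of an index-out-of-bounds panic.

```rust
use std::fmt;

/// Hypothetical error type for this sketch (not the parquet crate's error type).
#[derive(Debug, PartialEq)]
enum ReadError {
    /// The write would have ended at `end`, but the buffer only holds `len` values.
    OutOfBounds { end: usize, len: usize },
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReadError::OutOfBounds { end, len } => {
                write!(f, "write would end at {} but the buffer length is {}", end, len)
            }
        }
    }
}

impl std::error::Error for ReadError {}

/// Copies `values` into `buffer` starting at `offset`.
/// Returns the offset just past the last value written, or an error
/// (rather than panicking) if the values do not fit in the buffer.
fn write_values(buffer: &mut [i32], offset: usize, values: &[i32]) -> Result<usize, ReadError> {
    let len = buffer.len();
    // saturating_add avoids an overflow panic in debug builds for huge offsets.
    let end = offset.saturating_add(values.len());
    // get_mut returns None instead of panicking when the range is out of bounds.
    let dest = buffer
        .get_mut(offset..end)
        .ok_or(ReadError::OutOfBounds { end, len })?;
    dest.copy_from_slice(values);
    Ok(end)
}
```

With a 1024-element buffer, a call such as `write_values(&mut buf, 1000, &values)` with 88 values would return `Err(ReadError::OutOfBounds { end: 1088, len: 1024 })`, which a caller can propagate with `?` instead of crashing the reader thread.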

alamb added the "parquet" label (Changes to the parquet crate) on Apr 26, 2021
jorgecarleitao removed the "arrow" label (Changes to the arrow crate) on Apr 29, 2021
jorgecarleitao changed the title from "[Parquet] Parquet array reader panics" to "Parquet array reader panics" on Apr 29, 2021