[Parquet] Throw Better Exception with Vectorized Parquet V2 Format #2740
RussellSpitzer wants to merge 1 commit into apache:master
Previously, when vectorized reading was enabled and a Parquet V2 file was being read, our reader would throw a NullPointerException because the inputStream would not have the expected bytes. To give a better error for now, we rethrow any NPE raised while attempting to read the encoding as a ParquetDecodingException whose message explains that vectorized reading is only compatible with V1 encodings.
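A minimal sketch of the rethrow pattern described above, not the actual diff. Everything here is an illustrative assumption: `DecodingException` stands in for Parquet's `ParquetDecodingException`, `readEncodingHeader` is a hypothetical reader step, and a `null` stream is used to model the missing bytes.

```java
import java.io.ByteArrayInputStream;

class V2PageErrorSketch {
  // Hypothetical stand-in for ParquetDecodingException so the sketch
  // compiles without the Parquet dependency.
  static class DecodingException extends RuntimeException {
    DecodingException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Hypothetical reader step. Assumption: a null stream models the
  // "inputStream would not have the expected bytes" situation.
  static int readEncodingHeader(ByteArrayInputStream in) {
    return in.read(); // throws NullPointerException when in is null
  }

  // Wraps the NPE in an exception that tells the user what went wrong,
  // keeping the original NPE as the cause for debugging.
  static int readEncodingHeaderOrExplain(ByteArrayInputStream in) {
    try {
      return readEncodingHeader(in);
    } catch (NullPointerException e) {
      throw new DecodingException(
          "Cannot read encoding: vectorized reads only support Parquet V1 encodings", e);
    }
  }

  public static void main(String[] args) {
    // A well-formed stream is read normally.
    System.out.println(readEncodingHeaderOrExplain(new ByteArrayInputStream(new byte[] {7})));

    // A missing stream now produces a descriptive error instead of a bare NPE.
    try {
      readEncodingHeaderOrExplain(null);
    } catch (DecodingException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The trade-off raised in the review still applies to a sketch like this: the catch block cannot tell a V2-related NPE apart from an unrelated bug in the same code path.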
|
Fixes #2692 |
|
@kbendick + @samarthjain Could you take a look at this? |
|
@RussellSpitzer - I am hoping we can find a better solution here. I am generally not a fan of catching NPEs :) There are a few other approaches possible here:
I can take up looking into 2). |
|
@samarthjain I'm fine with just disabling vectorized reads if we detect that files are in V2 format. I just wanted to get this fix in for now, since the current behavior is to throw an NPE that doesn't contain any information. I'd be fine removing it once we have a better fix, or if you know of a fast way we can detect if a file is V2. |
|
Will current Iceberg not work for v2 files at all, for example if the columns all wind up with plain encoding (as a trivial example)? I agree that catching the NPE isn't the most elegant solution, but if there's a possibility that v2 pages can be read when they only use the existing encodings, that would be nice. Otherwise, explicitly failing on encountering a v2 page would be the best way to start. Please let me know if I can help somehow @samarthjain |
|
For now, a better solution would be to throw a similar exception like The place to make this change would be: |
|
I submitted a PR that adds limited support for Parquet V2. Like Spark vectorized reads, we fail if the encoding of non-dictionary encoded data is not plain. @RussellSpitzer, @kbendick - would be good to get your eyes on it. Thanks! |
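The rule described in the comment above (accept dictionary-encoded data, otherwise require plain encoding) could be sketched roughly as follows. This is not the code from that PR: the `Encoding` enum is a local, hypothetical subset that only mirrors Parquet's encoding names, and `checkSupported` is an assumed helper name.

```java
class PlainOnlyCheckSketch {
  // Hypothetical local enum mirroring a few Parquet encoding names.
  enum Encoding {
    PLAIN,
    PLAIN_DICTIONARY,
    RLE_DICTIONARY,
    DELTA_BINARY_PACKED,
    DELTA_BYTE_ARRAY;

    boolean usesDictionary() {
      return this == PLAIN_DICTIONARY || this == RLE_DICTIONARY;
    }
  }

  // Fails fast when non-dictionary data uses anything other than PLAIN.
  static void checkSupported(Encoding dataEncoding) {
    if (!dataEncoding.usesDictionary() && dataEncoding != Encoding.PLAIN) {
      throw new UnsupportedOperationException(
          "Cannot support vectorized reads for column encoding " + dataEncoding
              + "; disable vectorized reads to read this file");
    }
  }

  public static void main(String[] args) {
    checkSupported(Encoding.PLAIN);          // accepted
    checkSupported(Encoding.RLE_DICTIONARY); // accepted, dictionary encoded
    try {
      checkSupported(Encoding.DELTA_BINARY_PACKED); // rejected
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Checking the encoding up front keeps the failure explicit and avoids relying on a downstream NPE to signal an unsupported page.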