Skip to content

Comments

Parquet: Fix NPE in value reader building for projections#1164

Merged
rdblue merged 3 commits intoapache:masterfrom
rdblue:fix-parquet-projection-bug
Jul 6, 2020
Merged

Parquet: Fix NPE in value reader building for projections#1164
rdblue merged 3 commits intoapache:masterfrom
rdblue:fix-parquet-projection-bug

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Jul 3, 2020

The visitor that builds Parquet value readers for Iceberg generics did not handle cases where the expected type was null because a field was is not projected. This was not a problem for most primitive types that did not use the expected type, but cases that did would throw a NullPointerException.

Fixes #1117.

@rdblue rdblue changed the title Parquet: Fix bug building value readers Parquet: Fix NPE in value reader building for projections Jul 3, 2020
@rdblue rdblue force-pushed the fix-parquet-projection-bug branch from 93b8f5e to 775c74d Compare July 3, 2020 22:39
@rdblue rdblue requested a review from rdsr July 3, 2020 22:42
@rdsr
Copy link
Contributor

rdsr commented Jul 6, 2020

I was trying to look into why this was only a failure for Parquet. I might be wrong here, but It seems that for ORC and Avro were are building the readerFuncs in TestReadProjection.writeAndRead using Iceberg expected schema and format specific read schemas, whereas in Parquet we are using Iceberg expected schemas and the format specific file schema. Any thoughts on why that is the case?

@rdsr
Copy link
Contributor

rdsr commented Jul 6, 2020

Also should this be addressed for other formats?

Copy link
Contributor

@rdsr rdsr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@rdblue
Copy link
Contributor Author

rdblue commented Jul 6, 2020

@rdsr, the test applies to all formats and only Parquet was broken.

@rdblue rdblue merged commit 7ecf672 into apache:master Jul 6, 2020
@rdblue
Copy link
Contributor Author

rdblue commented Jul 6, 2020

Any thoughts on why that is the case?

The equivalent Avro builder doesn't use the expected primitive at all. The variable name is ignored. But looking at the ORC one, I think there would be a similar problem. I assumed it would have been caught by the test, but I guess we need to fix this there as well.

@rdsr
Copy link
Contributor

rdsr commented Jul 6, 2020

I assumed it would have been caught by the test, but I guess we need to fix this there as well.

I believe it is not getting caught because in ORC (as well as Avro) the reader func are utilizing the read schema which completely match the Iceberg expected schema so there will not be a case where the iceberg type is null and the format type is not, whereas in Parquet the reader func is using the file schema.

cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request Aug 19, 2020
Co-authored-by: 范欣欣 <hzfanxinxin@corp.netease.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NullPointerException is throwed when function GenericParquetReaders.primitive is called and the parameter 'expected' is null

3 participants