Hive: Fix NPE when reading a struct field with null value #4283

tprelle · 2022-03-07T22:29:18Z

kbendick · 2022-03-08T01:25:14Z

Thanks for this @tprelle. Would you mind copying over information from the issue to the PR description so that people can find it properly (e.g. if they're looking at git history in their IDE etc).

Additionally, I noticed you said the behavior of Spark was to return null if a null was encountered in this situation. I'm not a Hive user... is this the same behavior that's expected in Hive proper?

kbendick

Thanks @tprelle! This looks good to me outside of a few nits.

I'm not much of a Hive user, so it would be nice to have an end to end test where we read in null records that were written (and not just check the result of the Object Inspector).

.../java/org/apache/iceberg/mr/hive/serde/objectinspector/TestIcebergRecordObjectInspector.java

...main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java

kbendick

This looks great to me. Thank you @tprelle!

The test gives me a lot more confidence that we won't have a regression. I just have one or two small nits.

Can you please change the name of the PR to be more specific (like the issue is)? Maybe Hive: Fix NPE when reading a struct field with null value? Will leave up to you the best explanation of what is being fixed, not just how it's fixed.

Also, adding a summary to the PR (vs just the issue link) would be helpful. Many people review git history in their IDE and don't necessarily see the issues or it's much less convenient to have to move out of their workflow for further context.

Again, really appreciate you reporting and fixing this issue!

kbendick · 2022-03-13T04:06:29Z

mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java

+    Assume.assumeTrue("Failed on vectorized parquet for a bug in parquet vectorization",
+        !("PARQUET".equals(fileFormat.name()) && isVectorized));


Nit: Instead of negating on assumeTrue, why not just use assumeFalse? Either that or distributing the not operator, but I think the following is much easier to read.

Assume.assumeFalse("...", "PARQUET".equals(fileFormat.name()) && isVectorized);

Also, can you please update the Assume statement to provide more details on what the bug is and why we can't run this test under these conditions?

The assume statement should provide enough details and context to know how to follow up if possible, or at least what the bug is. For example, some of the other Assume statements in this file say Tez is not implemented yet, which is straightforward. But I don't know what the vectorized parquet bug is when seeing this and I'm left with more questions. Is it something that's not yet implemented, or maybe there's an issue that can be referenced from the parquet project? A short high level summary of what would happen and possibly a link to an existing parquet-ticket or something would be really appreciated. Maybe this is something that can be fixed?

In general, the assume statements and error messages should be written in a way that we have enough context to understand why we're skipping the test and where to follow up if need be.
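To make the nit above concrete, here is a small sketch in plain Java (no JUnit dependency) that checks the two forms agree on when the test runs. The class and method names are hypothetical and exist only for illustration; only the boolean expressions come from the diff and the suggestion.

```java
public class AssumeConditionSketch {

  // Expression passed to Assume.assumeTrue in the diff: the test runs when this is true.
  static boolean runsWithAssumeTrue(String fileFormatName, boolean isVectorized) {
    return !("PARQUET".equals(fileFormatName) && isVectorized);
  }

  // Expression passed to Assume.assumeFalse in the suggestion: the test is skipped when this is true.
  static boolean skipsWithAssumeFalse(String fileFormatName, boolean isVectorized) {
    return "PARQUET".equals(fileFormatName) && isVectorized;
  }

  // The two forms are equivalent when "runs" is always the opposite of "skips".
  static boolean formsAgree(String fileFormatName, boolean isVectorized) {
    return runsWithAssumeTrue(fileFormatName, isVectorized)
        != skipsWithAssumeFalse(fileFormatName, isVectorized);
  }

  public static void main(String[] args) {
    // The format names other than PARQUET are illustrative values, not taken from the test.
    for (String format : new String[] {"PARQUET", "ORC", "AVRO"}) {
      for (boolean vectorized : new boolean[] {true, false}) {
        System.out.printf("%s vectorized=%b runs=%b agree=%b%n",
            format, vectorized,
            runsWithAssumeTrue(format, vectorized),
            formsAgree(format, vectorized));
      }
    }
  }
}
```

Only the vectorized Parquet combination is skipped under either form, so the choice between them is purely about readability.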

Hi @kbendick, sorry for the delay.
I created an issue for this (#4403) and added the info to the test.

Fixes apache#4282. If a column is a map whose values are structs, and the value for a key is null or the key does not exist, Hive should return null, as required by the parent class org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector, instead of throwing a NullPointerException.
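The contract described in the PR summary can be illustrated with a minimal sketch. It deliberately avoids the real Hive and Iceberg classes: `NullSafeStructSketch` and `SimpleRecord` are hypothetical stand-ins, and the actual change in IcebergRecordObjectInspector may differ in detail. The point is only that both struct accessors return null when handed a null struct rather than dereferencing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullSafeStructSketch {

  // Hypothetical stand-in for a record whose fields are addressed by position.
  static final class SimpleRecord {
    private final List<Object> values;

    SimpleRecord(Object... values) {
      this.values = Arrays.asList(values);
    }

    Object get(int position) {
      return values.get(position);
    }

    int size() {
      return values.size();
    }
  }

  // Returns null for a null struct instead of throwing a NullPointerException.
  static Object getStructFieldData(Object data, int position) {
    if (data == null) {
      return null;
    }
    return ((SimpleRecord) data).get(position);
  }

  // Same guard for the list accessor: a null struct yields null, not an exception.
  static List<Object> getStructFieldsDataAsList(Object data) {
    if (data == null) {
      return null;
    }
    SimpleRecord record = (SimpleRecord) data;
    List<Object> fields = new ArrayList<>(record.size());
    for (int i = 0; i < record.size(); i++) {
      fields.add(record.get(i));
    }
    return fields;
  }

  public static void main(String[] args) {
    // A map of structs where one key is absent, mirroring the case in the PR summary.
    Map<String, SimpleRecord> column = new HashMap<>();
    column.put("present", new SimpleRecord("a", 1));

    System.out.println(getStructFieldData(column.get("present"), 0));
    System.out.println(getStructFieldData(column.get("missing"), 0));
    System.out.println(getStructFieldsDataAsList(column.get("missing")));
  }
}
```

Without the null checks, the cast and field access on the missing entry would be the point where a NullPointerException surfaces.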

rdblue · 2022-07-06T21:27:17Z

Thanks, @tprelle!

github-actions bot added the MR label Mar 7, 2022

kbendick reviewed Mar 8, 2022

tprelle force-pushed the protectIcebergRecordObjectInspectorFromNullStruct branch 7 times, most recently from ea5482b to 56bc3ec Compare March 8, 2022 21:04

pvary approved these changes Mar 9, 2022

kbendick approved these changes Mar 13, 2022

kbendick added this to the Iceberg 0.14.0 Release milestone Mar 13, 2022

tprelle changed the title ~~Return null if hive inspect a null record~~ Hive: Fix NPE when reading a struct field with null value Mar 16, 2022

tprelle mentioned this pull request Mar 25, 2022

Hive : Tez parquet vectorization throw a ClassCastException when accessing a null struct inside a map #4403

Closed

tprelle force-pushed the protectIcebergRecordObjectInspectorFromNullStruct branch from 56bc3ec to cf7622c Compare March 25, 2022 14:21

rdblue approved these changes Jul 6, 2022

rdblue merged commit 3959e2f into apache:master Jul 6, 2022
