Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hive: Fix NPE when reading a struct field with null value #4283

Conversation

tprelle
Copy link
Contributor

@tprelle tprelle commented Mar 7, 2022

Fix #4282

@github-actions github-actions bot added the MR label Mar 7, 2022
@kbendick
Copy link
Contributor

kbendick commented Mar 8, 2022

Thanks for this @tprelle. Would you mind copying over information from the issue to the PR description so that people can find it properly (e.g. if they're looking at git history in their IDE etc).

Additionally, I noticed you said the behavior of Spark was to return null if a null was encountered in this situation. I'm not a Hive user... is this the same behavior that's expected in Hive proper?

Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tprelle! This looks good to me outside of a few nits.

I'm not much of a Hive user, so it would be nice to have an end to end test where we read in null records that were written (and not just check the result of the Object Inspector).

@tprelle tprelle force-pushed the protectIcebergRecordObjectInspectorFromNullStruct branch 7 times, most recently from ea5482b to 56bc3ec Compare March 8, 2022 21:04
Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. Thank you @tprelle!

The test gives me a lot more confidence that we won't have a regression. I just have one or two small nits.

Can you please change the name of the PR to be more specific (like the issue is)? Maybe Hive: Fix NPE when reading a struct field with null value? Will leave up to you the best explanation of what is being fixed, not just how it's fixed.

Also, adding a summary to the PR (vs just the issue link) would be helpful. Many people review git history in their IDE and don't necessarily see the issues or it's much less convenient to have to move out of their workflow for further context.

Again, really appreciate you reporting and fixing this issue!

Comment on lines 713 to 718
Assume.assumeTrue("Failed on vectorized parquet for a bug in parquet vectorization",
!("PARQUET".equals(fileFormat.name()) && isVectorized));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Instead of negating on assumeTrue, why not just use assumeFalse? Either that or distributing the not operator, but I think the following is much easier to read.

    Assume.assumeFalse("...", "PARQUET".equals(fileFormat.name()) && isVectorized);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can you please update the Assume statement to provide more details on what the bug is and why we can't run this test under these conditions?

The assume statement should provide enough details and context to know how to follow up if possible, or at least what the bug is. For example, some of the other Asssume statements in this file say Tez is not implemented yet which is straightforward. But I don't know what the vectorized parquet bug is when seeing this and I'm left with more questions. Is it something that's not yet implemented, or maybe there's an issue that can be referenced from the parquet project? A short high level summary of what would happen and possibly a link to an existing parquet-ticket or something would be really appreciated. Maybe this is something that can be fixed?

In general, the assume statements and error messages should be written in a way that we have enough context to understand why we're skipping the test and where to follow up if need be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kbendick, sorry for the delay,
I create an issue for this #4403 and add the info inside the test.

@kbendick kbendick added this to the Iceberg 0.14.0 Release milestone Mar 13, 2022
@tprelle tprelle changed the title Return null if hive inspect a null record Hive: Fix NPE when reading a struct field with null value Mar 16, 2022
Fix apache#4282

If we have a column which is a map of a struct, if the value for the key
is null or the key not existing, hive should return null as ask in
parent class
org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector
and not throwing an NullPointerException
@tprelle tprelle force-pushed the protectIcebergRecordObjectInspectorFromNullStruct branch from 56bc3ec to cf7622c Compare March 25, 2022 14:21
@rdblue rdblue merged commit 3959e2f into apache:master Jul 6, 2022
@rdblue
Copy link
Contributor

rdblue commented Jul 6, 2022

Thanks, @tprelle!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hive : Hive throw a NullPointerException when accessing a null struct inside a map.
4 participants