-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hive: Fix NPE when reading a struct field with null value #4283
Hive: Fix NPE when reading a struct field with null value #4283
Conversation
Thanks for this @tprelle. Would you mind copying over information from the issue to the PR description so that people can find it properly (e.g. if they're looking at git history in their IDE etc). Additionally, I noticed you said the behavior of Spark was to return null if a null was encountered in this situation. I'm not a Hive user... is this the same behavior that's expected in Hive proper? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tprelle! This looks good to me outside of a few nits.
I'm not much of a Hive user, so it would be nice to have an end to end test where we read in null
records that were written (and not just check the result of the Object Inspector).
.../java/org/apache/iceberg/mr/hive/serde/objectinspector/TestIcebergRecordObjectInspector.java
Show resolved
Hide resolved
.../java/org/apache/iceberg/mr/hive/serde/objectinspector/TestIcebergRecordObjectInspector.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
Show resolved
Hide resolved
ea5482b
to
56bc3ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me. Thank you @tprelle!
The test gives me a lot more confidence that we won't have a regression. I just have one or two small nits.
Can you please change the name of the PR to be more specific (like the issue is)? Maybe Hive: Fix NPE when reading a struct field with null value
? Will leave up to you the best explanation of what is being fixed, not just how it's fixed.
Also, adding a summary to the PR (vs just the issue link) would be helpful. Many people review git history in their IDE and don't necessarily see the issues or it's much less convenient to have to move out of their workflow for further context.
Again, really appreciate you reporting and fixing this issue!
Assume.assumeTrue("Failed on vectorized parquet for a bug in parquet vectorization", | ||
!("PARQUET".equals(fileFormat.name()) && isVectorized)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Instead of negating on assumeTrue
, why not just use assumeFalse
? Either that or distributing the not
operator, but I think the following is much easier to read.
Assume.assumeFalse("...", "PARQUET".equals(fileFormat.name()) && isVectorized);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can you please update the Assume
statement to provide more details on what the bug is and why we can't run this test under these conditions?
The assume statement should provide enough details and context to know how to follow up if possible, or at least what the bug is. For example, some of the other Asssume
statements in this file say Tez is not implemented yet
which is straightforward. But I don't know what the vectorized parquet bug is when seeing this and I'm left with more questions. Is it something that's not yet implemented, or maybe there's an issue that can be referenced from the parquet project? A short high level summary of what would happen and possibly a link to an existing parquet-ticket or something would be really appreciated. Maybe this is something that can be fixed?
In general, the assume statements and error messages should be written in a way that we have enough context to understand why we're skipping the test and where to follow up if need be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix apache#4282 If we have a column which is a map of a struct, if the value for the key is null or the key not existing, hive should return null as ask in parent class org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector and not throwing an NullPointerException
56bc3ec
to
cf7622c
Compare
Thanks, @tprelle! |
Fix #4282