Write null byte when indexing numeric dimensions with Hadoop #7020
Conversation
Thanks for the contribution! (and apologies it has taken so long for a review)
I think this is reasonable, it's the same approach being used for the metrics columns.
I think it would be nice to add a test in InputRowSerdeTest to cover this. All of the tests in Travis are run with and without SQL null compatibility, so you can probably just write one test that asserts that null-valued input columns serialize to either the null byte or zero, depending on NullHandling.replaceWithDefault().
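One way to sketch such a test outside the Druid codebase: the class and constants below (NullByteSketch, IS_NULL_BYTE) are hypothetical stand-ins for Druid's InputRowSerde and NullHandling, which aren't available standalone; the logic just mimics the PR's approach of replacing nulls with a default when SQL null compatibility is off.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NullByteSketch {
  // Stand-ins for Druid's NullHandling marker bytes (hypothetical values).
  static final byte IS_NULL_BYTE = 1;
  static final byte IS_NOT_NULL_BYTE = 0;

  // Serialize a possibly-null long the way the PR does: when default
  // replacement is on, nulls become 0; otherwise only a null byte is written.
  static byte[] serializeLong(Long value, boolean replaceWithDefault) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      if (value == null && replaceWithDefault) {
        value = 0L; // default-value mode: nulls are replaced with zero
      }
      if (value == null) {
        out.writeByte(IS_NULL_BYTE); // null marker, no payload follows
      } else {
        out.writeByte(IS_NOT_NULL_BYTE);
        out.writeLong(value);
      }
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Default-value mode: null serializes as marker byte + long zero (9 bytes).
    System.out.println(serializeLong(null, true).length);  // 9
    // SQL-compatible mode: only the null byte is written (1 byte).
    System.out.println(serializeLong(null, false).length); // 1
  }
}
```

A real test in InputRowSerdeTest would branch on NullHandling.replaceWithDefault() and assert on the bytes InputRowSerde actually produces.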
Force-pushed from 392f3ae to 7ad1d91
Thanks for taking a look at this! I added a test to InputRowSerdeTest.
LGTM, thanks for adding a test 👍
// Write the null byte only if the default numeric value is still null.
if (ret == null) {
  out.writeByte(NullHandling.IS_NULL_BYTE);
Please remove the extra blank line.
Will do.
@@ -190,7 +215,7 @@ public void serialize(ByteArrayDataOutput out, Object value)
  @Override
  public Long deserialize(ByteArrayDataInput in)
  {
-   return in.readLong();
+   return isNullByteSet(in) ? null : in.readLong();
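The serialize/deserialize pair can be illustrated with a self-contained round trip using only java.io; the class name NullableLongSerde and the marker constants here are hypothetical stand-ins for the Druid internals.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NullableLongSerde {
  static final byte IS_NULL_BYTE = 1;     // stand-in for NullHandling.IS_NULL_BYTE
  static final byte IS_NOT_NULL_BYTE = 0;

  static byte[] serialize(Long value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      if (value == null) {
        out.writeByte(IS_NULL_BYTE);      // null marker, no payload
      } else {
        out.writeByte(IS_NOT_NULL_BYTE);
        out.writeLong(value);
      }
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static Long deserialize(byte[] data) {
    try {
      DataInput in = new DataInputStream(new ByteArrayInputStream(data));
      // Mirrors the ternary in the diff: consume the marker byte first,
      // and read the value only when the column was non-null.
      return in.readByte() == IS_NULL_BYTE ? null : in.readLong();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(deserialize(serialize(42L)));  // 42
    System.out.println(deserialize(serialize(null))); // null
  }
}
```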
Perhaps it would be better to use a functional programming style here.
return Optional.ofNullable(in)
.filter(InputRowSerde::isNotNullByteSet)
.map(ByteArrayDataInput::readLong)
.get();
Eh, I sort of prefer it the way it currently is; it seems clearer to me. Is there any reason it would be better, other than preference?
Alright, I prefer the functional style because it makes the code more readable. If we don't use Optional, then we need to add the @Nullable annotation to this method. It's up to you. 😅
I think I prefer the non-functional style. Also, maybe I'm misunderstanding, but wouldn't the get() cause the code to throw if the null byte is set?
I'll add @Nullable annotations to these deserialize methods.
> but wouldn't the get() cause the code to throw if the null byte is set?

@ferristseng If the null byte is set, then get will return a null value. What you describe should be the orElseThrow function. Thanks for your contribution.
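For reference on this point: in the standard library, Optional.get() on an empty Optional throws NoSuchElementException; to get null back from an empty Optional one would use orElse(null) instead. A quick self-contained check (the class name OptionalGetCheck is illustrative):

```java
import java.util.NoSuchElementException;
import java.util.Optional;

public class OptionalGetCheck {
  public static void main(String[] args) {
    // An Optional emptied by a failing filter, as the suggested chain
    // would be when the null byte is set.
    Optional<Long> empty = Optional.of(42L).filter(v -> false);
    try {
      empty.get();
      System.out.println("get() returned a value");
    } catch (NoSuchElementException e) {
      System.out.println("get() threw NoSuchElementException");
    }
    // orElse(null) is the call that actually yields null here.
    System.out.println(empty.orElse(null)); // null
  }
}
```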
@@ -229,7 +249,7 @@ public void serialize(ByteArrayDataOutput out, Object value)
  @Override
  public Float deserialize(ByteArrayDataInput in)
  {
-   return in.readFloat();
+   return isNullByteSet(in) ? null : in.readFloat();
Same.
return Optional.ofNullable(in)
.filter(InputRowSerde::isNotNullByteSet)
.map(ByteArrayDataInput::readFloat)
.get();
@@ -268,7 +283,7 @@ public void serialize(ByteArrayDataOutput out, Object value)
  @Override
  public Double deserialize(ByteArrayDataInput in)
  {
-   return in.readDouble();
+   return isNullByteSet(in) ? null : in.readDouble();
Same.
return Optional.ofNullable(in)
.filter(InputRowSerde::isNotNullByteSet)
.map(ByteArrayDataInput::readDouble)
.get();
Overall LGTM 👍 I also left a few suggestions.
This has two approvals -- merging it.
* Write null byte in hadoop indexing for numeric dimensions
* Add test case to check output serializing null numeric dimensions
* Remove extra line
* Add @Nullable annotations
I noticed a couple of comments that hadn't been addressed in the Hadoop Indexing project regarding serializing and deserializing null numeric values, so I figured I would try to tackle it. I'm not super familiar with the internals of Druid, so let me know if I need to change code elsewhere.
Also, I ran the existing tests in the Hadoop Indexing project with -Ddruid.generic.useDefaultValueForNull=false, and they still passed. Let me know if I need to add additional ones!