Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write null byte when indexing numeric dimensions with Hadoop #7020

Merged
merged 4 commits into from
Mar 12, 2019

Conversation

ferristseng
Copy link
Contributor

I noticed a couple of comments that hadn't been addressed in the Hadoop Indexing project regarding serializing and deserializing null numeric values, so I figured I would try to tackle it. I'm not super familiar with the internals of Druid, so let me know if I need to change code elsewhere.

Also, I ran the existing tests in the Hadoop Indexing project with -Ddruid.generic.useDefaultValueForNull=false, and they still passed. Let me know if I need to add additional ones!

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! (and apologies it has taken so long for a review)

I think this is reasonable, it's the same approach being used for the metrics columns.

I think it would be nice to add a test in InputRowSerdeTest to cover this. All of the tests in travis are run with and without sql null compatibility, so you can probably just write one test that can assert that null valued input columns are either the null byte or zero depending on the NullHandling.replaceWithDefault().

@ferristseng ferristseng force-pushed the feature-hadoop-index-numeric-dim branch from 392f3ae to 7ad1d91 Compare February 28, 2019 15:08
@ferristseng
Copy link
Contributor Author

Thanks for taking a look at this!

I added a test to InputRowSerdeTest that just makes sure the default value or null byte are written in the expected positions in the serialized array.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding a test 👍

@jon-wei jon-wei added the Bug label Feb 28, 2019
@jon-wei jon-wei modified the milestone: 0.14.0 Feb 28, 2019
// Write the null byte only if the default numeric value is still null.
if (ret == null) {
out.writeByte(NullHandling.IS_NULL_BYTE);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the extra blank line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@@ -190,7 +215,7 @@ public void serialize(ByteArrayDataOutput out, Object value)
@Override
public Long deserialize(ByteArrayDataInput in)
{
return in.readLong();
return isNullByteSet(in) ? null : in.readLong();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would be better to use a functional programming style here.

return Optional.ofNullable(in)
               .filter(InputRowSerde::isNotNullByteSet)
               .map(ByteArrayDataInput::readLong)
               .get();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, I sort of prefer it the way it currently is, seems clearer to me, is there any reason it would be better other than preference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I prefer the functional style because it makes the code more readable. If we don't use Optional, then we need add the @Nullable annotation for this method. It's up to you. 😅

Copy link
Contributor Author

@ferristseng ferristseng Mar 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer the non-functional style. Also, maybe I'm misunderstanding, but wouldn't the get() cause the code to throw if the null byte is set?

I'll add @Nullable annotations to these deserialize methods

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but wouldn't the get() cause the code to throw if the null byte is set?

@ferristseng If the null byte is set, then get will return a null value. What you describe should be the orElseThrow function. Thanks for your contribution.

@@ -229,7 +249,7 @@ public void serialize(ByteArrayDataOutput out, Object value)
@Override
public Float deserialize(ByteArrayDataInput in)
{
return in.readFloat();
return isNullByteSet(in) ? null : in.readFloat();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

return Optional.ofNullable(in)
               .filter(InputRowSerde::isNotNullByteSet)
               .map(ByteArrayDataInput::readFloat)
               .get();

@@ -268,7 +283,7 @@ public void serialize(ByteArrayDataOutput out, Object value)
@Override
public Double deserialize(ByteArrayDataInput in)
{
return in.readDouble();
return isNullByteSet(in) ? null : in.readDouble();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

return Optional.ofNullable(in)
               .filter(InputRowSerde::isNotNullByteSet)
               .map(ByteArrayDataInput::readDouble)
               .get();

Copy link
Member

@asdf2014 asdf2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM 👍 Also I left a few suggestions.

@gianm
Copy link
Contributor

gianm commented Mar 12, 2019

This has two approvals -- merging it.

@gianm gianm merged commit c503ba9 into apache:master Mar 12, 2019
clintropolis pushed a commit that referenced this pull request Apr 24, 2019
* write null byte in hadoop indexing for numeric dimensions

* Add test case to check output serializing null numeric dimensions

* Remove extra line

* Add @nullable annotations
@clintropolis clintropolis added this to the 0.14.1 milestone Apr 24, 2019
gianm pushed a commit to implydata/druid-public that referenced this pull request May 10, 2019
…7020)

* write null byte in hadoop indexing for numeric dimensions

* Add test case to check output serializing null numeric dimensions

* Remove extra line

* Add @nullable annotations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants