Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write null byte when indexing numeric dimensions with Hadoop #7020

Merged
merged 4 commits into from Mar 12, 2019

Conversation

Projects
None yet
5 participants
@ferristseng
Copy link
Contributor

commented Feb 6, 2019

I noticed a couple of comments that hadn't been addressed in the Hadoop Indexing project regarding serializing and deserializing null numeric values, so I figured I would try to tackle it. I'm not super familiar with the internals of Druid, so let me know if I need to change code elsewhere.

Also, I ran the existing tests in the Hadoop Indexing project with -Ddruid.generic.useDefaultValueForNull=false, and they still passed. Let me know if I need to add additional ones!

@clintropolis
Copy link
Member

left a comment

Thanks for the contribution! (and apologies it has taken so long for a review)

I think this is reasonable, it's the same approach being used for the metrics columns.

I think it would be nice to add a test in InputRowSerdeTest to cover this. All of the tests in travis are run with and without sql null compatibility, so you can probably just write one test that can assert that null valued input columns are either the null byte or zero depending on the NullHandling.replaceWithDefault().

@ferristseng ferristseng force-pushed the ferristseng:feature-hadoop-index-numeric-dim branch from 392f3ae to 7ad1d91 Feb 28, 2019

@ferristseng

This comment has been minimized.

Copy link
Contributor Author

commented Feb 28, 2019

Thanks for taking a look at this!

I added a test to InputRowSerdeTest that just makes sure the default value or null byte are written in the expected positions in the serialized array.

@clintropolis
Copy link
Member

left a comment

LGTM, thanks for adding a test 👍

@jon-wei jon-wei added the Bug label Feb 28, 2019

@jon-wei jon-wei modified the milestone: 0.14.0 Feb 28, 2019

// Write the null byte only if the default numeric value is still null.
if (ret == null) {
out.writeByte(NullHandling.IS_NULL_BYTE);

This comment has been minimized.

Copy link
@asdf2014

asdf2014 Mar 6, 2019

Member

Please remove the extra blank line.

This comment has been minimized.

Copy link
@ferristseng

ferristseng Mar 7, 2019

Author Contributor

Will do.

@@ -190,7 +215,7 @@ public void serialize(ByteArrayDataOutput out, Object value)
@Override
public Long deserialize(ByteArrayDataInput in)
{
return in.readLong();
return isNullByteSet(in) ? null : in.readLong();

This comment has been minimized.

Copy link
@asdf2014

asdf2014 Mar 6, 2019

Member

Perhaps it would be better to use a functional programming style here.

return Optional.ofNullable(in)
               .filter(InputRowSerde::isNotNullByteSet)
               .map(ByteArrayDataInput::readLong)
               .get();

This comment has been minimized.

Copy link
@clintropolis

clintropolis Mar 6, 2019

Member

Eh, I sort of prefer it the way it currently is, seems clearer to me, is there any reason it would be better other than preference?

This comment has been minimized.

Copy link
@asdf2014

asdf2014 Mar 6, 2019

Member

Alright, I prefer the functional style because it makes the code more readable. If we don't use Optional, then we need add the @Nullable annotation for this method. It's up to you. 😅

This comment has been minimized.

Copy link
@ferristseng

ferristseng Mar 7, 2019

Author Contributor

I think I prefer the non-functional style. Also, maybe I'm misunderstanding, but wouldn't the get() cause the code to throw if the null byte is set?

I'll add @Nullable annotations to these deserialize methods

This comment has been minimized.

Copy link
@asdf2014

asdf2014 Mar 8, 2019

Member

but wouldn't the get() cause the code to throw if the null byte is set?

@ferristseng If the null byte is set, then get will return a null value. What you describe should be the orElseThrow function. Thanks for your contribution.

@@ -229,7 +249,7 @@ public void serialize(ByteArrayDataOutput out, Object value)
@Override
public Float deserialize(ByteArrayDataInput in)
{
return in.readFloat();
return isNullByteSet(in) ? null : in.readFloat();

This comment has been minimized.

Copy link
@asdf2014

asdf2014 Mar 6, 2019

Member

Same.

return Optional.ofNullable(in)
               .filter(InputRowSerde::isNotNullByteSet)
               .map(ByteArrayDataInput::readFloat)
               .get();
@@ -268,7 +283,7 @@ public void serialize(ByteArrayDataOutput out, Object value)
@Override
public Double deserialize(ByteArrayDataInput in)
{
return in.readDouble();
return isNullByteSet(in) ? null : in.readDouble();

This comment has been minimized.

Copy link
@asdf2014

asdf2014 Mar 6, 2019

Member

Same.

return Optional.ofNullable(in)
               .filter(InputRowSerde::isNotNullByteSet)
               .map(ByteArrayDataInput::readDouble)
               .get();
@asdf2014
Copy link
Member

left a comment

Overall LGTM 👍 Also I left a few suggestions.

@gianm

This comment has been minimized.

Copy link
Contributor

commented Mar 12, 2019

This has two approvals -- merging it.

@gianm gianm merged commit c503ba9 into apache:master Mar 12, 2019

2 checks passed

Inspections: pull requests (Druid) TeamCity build finished
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.