build v9 directly #2138
flattener.write(StupidResourceHolder.create(endBuffer));
}
endBuffer = null;
flattener.close();
Can we move close() into a finally block?
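A minimal sketch of what the request amounts to, assuming a hypothetical `RecordingFlattener` stand-in rather than the real writer class touched by this PR: the close call goes into `finally` so it runs even when a write fails.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the flattener in the diff; not the real Druid class.
final class RecordingFlattener implements Closeable {
    final List<String> written = new ArrayList<>();
    boolean closed = false;

    void write(String chunk) {
        if (chunk == null) {
            throw new IllegalArgumentException("cannot write a null chunk");
        }
        written.add(chunk);
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class FinallyCloseSketch {
    // Writes every chunk and closes the flattener even if a write throws.
    static void writeAll(RecordingFlattener flattener, List<String> chunks) {
        try {
            for (String chunk : chunks) {
                flattener.write(chunk);
            }
        } finally {
            flattener.close();
        }
    }
}
```

Because the stand-in implements `Closeable`, a try-with-resources statement would give the same guarantee; whether that fits the real code depends on where the flattener is created.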
@KurtYoung This is an awesome PR and one that I know several folks want to code review. Everything is going to be slow during the Christmas holidays, but this will get attention after New Year's.
👍, seems good to go after a minor nitpick about closing in a finally block.
Added IndexMergerV9 and changed some low-level interfaces, but it is fully compatible with the old way. Some explanation and thoughts here:
And here are some points I think should be discussed with you guys while writing the code:
54725a7 to 8daf783
if (!endBuffer.hasRemaining()) {
endBuffer.rewind();
flattener.write(StupidResourceHolder.create(endBuffer));
endBuffer.rewind();
Allocate two IntBuffers to reuse, and switch between them after each write to make sure GenericIndexedWriter can check the sort order correctly.
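The suggestion can be sketched as follows. This assumes the writer keeps a reference to the previously written buffer in order to check ordering, which is an inference from the comment and not confirmed by the thread; `OrderCheckingWriter` and `DoubleBufferSketch` are hypothetical names, not Druid classes.

```java
import java.nio.IntBuffer;

// Hypothetical writer that keeps a reference to the previously written
// buffer so it can verify that buffers arrive in sorted order.
final class OrderCheckingWriter {
    private IntBuffer previous;
    int count = 0;

    void write(IntBuffer current) {
        // IntBuffer.compareTo compares the remaining elements lexicographically.
        if (previous != null && previous.compareTo(current) > 0) {
            throw new IllegalStateException("buffers written out of order");
        }
        previous = current;
        count++;
    }
}

final class DoubleBufferSketch {
    // Writes full chunks of chunkSize ints, alternating between two buffers so
    // the buffer still referenced by the writer is never overwritten.
    // Trailing values that do not fill a chunk are ignored in this sketch.
    static int writeChunks(int[] values, int chunkSize, OrderCheckingWriter writer) {
        IntBuffer active = IntBuffer.allocate(chunkSize);
        IntBuffer spare = IntBuffer.allocate(chunkSize);
        for (int v : values) {
            active.put(v);
            if (!active.hasRemaining()) {
                active.rewind();
                writer.write(active);
                // Swap: the buffer just written stays intact while the other is refilled.
                IntBuffer tmp = active;
                active = spare;
                spare = tmp;
                active.clear();
            }
        }
        return writer.count;
    }
}
```

With a single reused buffer, the writer's "previous" reference would alias the buffer being refilled, so the comparison would always see identical contents and never detect an ordering problem.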
Hmm... I see why the null dictionary value and the null row set are both handled in IndexMaker and in IndexIO's conversion. My previous decision about skippedDimension & nullSet was wrong, just ignore it. Working on this now...
Found a bug in merge & maker concerning dimension order:
Is it OK to insert null into every dimension's dictionary even if the dimension does not contain any null values? Update: I found a way to deal with null values now, but it's a little tricky (there are comments in IncrementalIndexAdapter) and can easily create inconsistency (IncrementalIndexStorageAdapter also relies on IncrementalIndex but does not have this logic right now, or just does not need it right now). I think the proposal above is a possible and easy solution. What do you guys think?
Ideally, -1 could be used for null and the number of distinct values would be specified for each dimension in the metadata. Would that be possible?
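To make the -1 idea concrete, here is a rough sketch of a dictionary in which null never occupies a slot and the distinct count is tracked separately. It illustrates the proposal only; it is not what the PR implements, and `NullAwareDictionarySketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

final class NullAwareDictionarySketch {
    // Reserved id for null, so null never takes a position in the dictionary.
    static final int NULL_ID = -1;

    private final List<String> sorted;

    NullAwareDictionarySketch(List<String> rawValues) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String v : rawValues) {
            if (v != null) {
                distinct.add(v);
            }
        }
        sorted = new ArrayList<>(distinct);
    }

    // Returns the sorted position of a non-null value, or NULL_ID for null.
    int idOf(String value) {
        if (value == null) {
            return NULL_ID;
        }
        int idx = Collections.binarySearch(sorted, value);
        if (idx < 0) {
            throw new IllegalArgumentException("unknown value: " + value);
        }
        return idx;
    }

    String valueOf(int id) {
        return id == NULL_ID ? null : sorted.get(id);
    }

    // Number of distinct non-null values; this would be stored per dimension in metadata.
    int cardinality() {
        return sorted.size();
    }
}
```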
@KurtYoung: are there any corresponding changes in the filters/query path for null handling in case we add null to every dimension dictionary?
@nishantmonu51 I'm also aware of this. The current implementation does not add null to each dimension, but handles null values in both IncrementalIndexAdapter and IndexMergerV9.
@KurtYoung there have been a lot of optimizations in the old index merger over the last few months. Are those optimizations incorporated in building the v9 segment directly?
@@ -67,11 +65,10 @@ public Object extractValue(InputRow inputRow, String metricName)
 }

 @Override
-public ColumnPartSerde deserializeColumn(ByteBuffer buffer, ColumnBuilder builder)
+public void deserializeColumn(ByteBuffer buffer, ColumnBuilder builder)
since we are changing the behavior of this method, can we please add a comment on the interface about how the method is supposed to be used?
@KurtYoung I can't find the logic where you actually use index merger v9 instead of index merger
@KurtYoung I think the best way to think about how to handle nulls/empty strings in Druid is described in this PR: #995
I did a first pass over this PR but didn't go into detail for IndexMergerV9. Will look into it more once we know that it works reasonably well. At a high level I'm on board with the changes.
@fjy Actually, I have not changed any logic to use IndexMergerV9 yet, but I switched the current IndexMerger's logic to IndexMergerV9's and made all the test cases pass.
@KurtYoung can't seem to make any comments for IndexMergerV9
ComplexColumnPartSerde.legacySerializerBuilder()
.withTypeName(complexType)
.withDelegate(metricColumn)
.build(),
metBuilder,
metric
);
I am not able to make any comments for IndexMergerV9 below this line.
is IndexMergerV9 just a rename of IndexMerger.java?
If that's the case, did you remove the v8 to v9 conversion step in IndexMerger?
IndexMergerV9 makes v9 index files directly. The main steps are very similar to IndexMerger's, except that the v8 to v9 conversion step is no longer necessary.
@himanshug Any more comments? @KurtYoung Can we add some way to switch between IndexMerger and IndexMergerV9 in the configuration? IndexMerger should be the default.
@fjy I am 👍 once the configuration to switch to IndexMergerV9 is in place.
I believe @cheddar should have a look at this one. He had a lot of opinions about the format when I made changes to introduce dimension compression, and this one introduces even more changes. I also agree with @gianm that we should be able to switch between implementations until it has been verified to be production ready.
5d4423f to 75fdd58
Added "buildV9Directly" option to TuningConfig, docs are updated
@himanshug @xvrl we good to move forward?
👍 for me
👍 for me too. Will leave this open until tomorrow to see if anyone else has comments. @KurtYoung have you filled out the CLA: http://druid.io/community/cla.html You guys might consider a corporate CLA.
@fjy @KurtYoung pls squash the commits / clean up the history, very useful contribution.
@fjy @KurtYoung I might be wrong, but I still see some comments outstanding. Can we respond or address them?
@fjy I have filled out the individual CLA; I don't know whether I have the right to fill out a corporate CLA.
@KurtYoung thanks @xvrl any more comments?
@xvrl All these comments have been addressed and resolved.
add unit tests for IndexMergerV9 and fix some bugs
add more unit tests and fix bugs
handle null values and add more tests
minor changes & use LoggingProgressIndicator in IndexGeneratorReducer
make some static class public from IndexMerger
minor changes and add some comments
changes for comments
75fdd58 to 82ff98c
squashed into 3 commits.
@@ -191,6 +196,11 @@ public boolean getUseCombiner()
return useCombiner;
}

@JsonProperty
public Boolean getBuildV9Directly() {
since buildV9Directly is never null, this should probably be boolean instead of Boolean
and also renamed to isBuildV9Directly
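A small sketch of the two review suggestions taken together, without the Jackson annotations and with a hypothetical `TuningConfigSketch` class in place of the real config. The default of false is an assumption based on the earlier request that IndexMerger stay the default.

```java
final class TuningConfigSketch {
    // Assumed default: the existing IndexMerger path stays in use unless enabled.
    private static final boolean DEFAULT_BUILD_V9_DIRECTLY = false;

    private final boolean buildV9Directly;

    // The constructor may still receive a nullable Boolean (for example when the
    // option is absent from the spec); it is normalized once, here.
    TuningConfigSketch(Boolean buildV9Directly) {
        this.buildV9Directly =
            buildV9Directly == null ? DEFAULT_BUILD_V9_DIRECTLY : buildV9Directly;
    }

    // Primitive return type and "is" prefix, as suggested in the review.
    boolean isBuildV9Directly() {
        return buildV9Directly;
    }
}
```

Returning the primitive means callers can write `if (config.isBuildV9Directly())` without risking a NullPointerException from unboxing.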
This PR tracks the feature of building v9 directly, which was discussed in https://groups.google.com/forum/#!topic/druid-development/0CxhljSGeeo
We can divide this PR into 3 main parts:
Here are the classes that do the real work:
VSizeIndexedIntsWriter (single value, vsize encoded, not compressed)
CompressedIntsIndexedWriter (single value, not vsize encoded, compressed)
CompressedVSizeIntsIndexedWriter (single value, vsize encoded, compressed)
VSizeIndexedWriter (multi value, both offset and values are vsized, not compressed)
CompressedVSizeIndexedV3Writer (multi value, only values are vsized, compressed)
More details can be found here: https://groups.google.com/forum/#!topic/druid-development/0CxhljSGeeo
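As a rough illustration of what "vsize encoded" means in the list above, the sketch below stores every value in the smallest whole number of bytes that fits the column's maximum value. This is an assumption about the general technique; the byte order, padding, and headers used by the actual writers may differ, and `VSizeSketch` is a hypothetical name.

```java
final class VSizeSketch {
    // Smallest number of bytes (1 to 4) able to hold values in [0, maxValue].
    static int numBytesForMax(int maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative");
        }
        if (maxValue <= 0xFF) {
            return 1;
        }
        if (maxValue <= 0xFFFF) {
            return 2;
        }
        if (maxValue <= 0xFFFFFF) {
            return 3;
        }
        return 4;
    }

    // Encodes each value big-endian using a fixed width derived from maxValue.
    static byte[] encode(int[] values, int maxValue) {
        int numBytes = numBytesForMax(maxValue);
        byte[] out = new byte[values.length * numBytes];
        int pos = 0;
        for (int v : values) {
            if (v < 0 || v > maxValue) {
                throw new IllegalArgumentException("value out of range: " + v);
            }
            for (int shift = (numBytes - 1) * 8; shift >= 0; shift -= 8) {
                out[pos++] = (byte) (v >>> shift);
            }
        }
        return out;
    }

    // Reads the value at the given index back from the fixed-width encoding.
    static int decode(byte[] encoded, int numBytes, int index) {
        int start = index * numBytes;
        int v = 0;
        for (int i = 0; i < numBytes; i++) {
            v = (v << 8) | (encoded[start + i] & 0xFF);
        }
        return v;
    }
}
```

A dictionary-encoded column whose ids never exceed 65535, for instance, would take 2 bytes per row here instead of 4, before any block compression is applied.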
LongColumnSerializer (write long metrics)
FloatColumnSerializer (write float metrics)
ComplexColumnSerializer (write complex metrics)