Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#4317 Feature/variable length bytes offline dictionary for indexing bytes and string dicts. #4321

Merged

Conversation

buchireddy
Copy link
Contributor

@buchireddy buchireddy commented Jun 14, 2019

Implementation of the feature mentioned in #4317

Copy link
Member

@kishoreg kishoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Minor comments

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with a template implementation like this #4348
instead of duplicate logic?
IDE will re-factor out Buffer, rename, etc. to our heart's content.

@mcvsubbu
Copy link
Contributor

@buchireddy , @kishoreg please provide your comments on #4348. I need a ship-it from one of you

I have not addressed most of the comments by @mayankshriv which will be in the implementation of VarLengthBytesValueReaderWriter. Basically, my proposal is to build mutable and immutable versions of OffHeapByteArrayStore. Either we can derive two classes out of this and use them, or we can play around with the ctor of OffHeapByteArraystore to make it independent of memory manager etc. (caller allocates) and provide only a pnot-databuffer that is mutable (vs the immutable, which could then becoime an empty constructor with an init call). We can work around these things. All I want is one way to encode strings in a compact fashion. thanks.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please re-format the files with Pinot format introduced in #3705? Also, for member variables, we prefix the value with '_' and do not use 'this.' when accessing them.

@buchireddy
Copy link
Contributor Author

FTR, I was waiting to address review comments on this PR until now because @mcvsubbu was working on pulling out Buffer as a common class (PR #4348) which can be reused in this PR as well. I personally think that's a nice-to-have change and will help to reuse the byte array store code for both mutable and immutable dictionaries.
However, @Jackie-Jiang is working on refactoring/cleaning the mutable byte array store related code so #4348 isn't going to be merged soon. Also, @Jackie-Jiang and I discussed that the byte array store used for VariableLength dictionary is different enough from the regular mutable store that it's okay for VarLength dictionary to have it's own immutable bytes store. We can definitely find better optimizations for immutable byte store that may not be applicable for mutable store.
Hence, I am now going to work on addressing the review comments on this PR and will get it ready to be merged. @mcvsubbu @Jackie-Jiang Please reply if you guys think otherwise.
cc @kishoreg

@kishoreg
Copy link
Member

kishoreg commented Jul 2, 2019

@buchireddy Have you addressed all the comments? @Jackie-Jiang, can you take another look and approve.

@buchireddy
Copy link
Contributor Author

@kishoreg @Jackie-Jiang @mayankshriv I've addressed the review comments and pushed the latest code. Please take a look and LMK if you have any more comments.

since we don't expect it to be ever called.
we can't really reuse it so removing the ThreadLocal byte[].

Thanks to the unit test which caught this issue 👍
@codecov-io
Copy link

codecov-io commented Jul 3, 2019

Codecov Report

Merging #4321 into master will increase coverage by 9.12%.
The diff coverage is 71.42%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #4321      +/-   ##
============================================
+ Coverage     56.23%   65.35%   +9.12%     
  Complexity       20       20              
============================================
  Files          1061     1067       +6     
  Lines         54980    55526     +546     
  Branches       7824     7906      +82     
============================================
+ Hits          30916    36289    +5373     
+ Misses        21677    16665    -5012     
- Partials       2387     2572     +185
Impacted Files Coverage Δ Complexity Δ
...ot/core/segment/store/SegmentLocalFSDirectory.java 65.73% <0%> (+1.68%) 0 <0> (ø) ⬇️
.../core/segment/creator/ColumnIndexCreationInfo.java 93.54% <100%> (+7.34%) 0 <0> (ø) ⬇️
...manager/realtime/HLRealtimeSegmentDataManager.java 81.21% <100%> (+81.21%) 0 <0> (ø) ⬇️
...ment/creator/impl/SegmentColumnarIndexCreator.java 87.5% <100%> (+0.8%) 0 <0> (ø) ⬇️
...t/creator/impl/SegmentIndexCreationDriverImpl.java 89.71% <100%> (+1.74%) 0 <0> (ø) ⬇️
...loader/defaultcolumn/BaseDefaultColumnHandler.java 91.89% <100%> (+6.75%) 0 <0> (ø) ⬇️
...manager/realtime/LLRealtimeSegmentDataManager.java 71.65% <100%> (+28.13%) 0 <0> (ø) ⬇️
...gment/index/readers/ImmutableDictionaryReader.java 90.47% <33.33%> (-3.79%) 0 <0> (ø)
...org/apache/pinot/common/config/IndexingConfig.java 53.57% <50%> (+0.3%) 0 <0> (ø) ⬇️
...indexsegment/generator/SegmentGeneratorConfig.java 61.96% <57.14%> (+9.95%) 0 <0> (ø) ⬇️
... and 394 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e543e9...ea66a11. Read the comment docs.

Copy link
Member

@kishoreg kishoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @mayankshriv, please review.

so that we don't need to make any assumptions about where the data starts.
Copy link
Contributor

@mayankshriv mayankshriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mayankshriv mayankshriv merged commit 49d8fa7 into apache:master Jul 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants