Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding reader settings for moving fst offheap #601

Closed
wants to merge 0 commits into from

Conversation

jainankitk
Copy link

This change adds setting for moving fst offheap, specifically address JIRA-8671

Copy link
Member

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some question

/**
* Reader only attributes for this directory.
*/
private Map<String, String> readerAttributes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Directory is the right abstraction to hold this. I suggest we add an overload to DirectoryReader.open that looks like this:

  public static DirectoryReader open(final Directory directory, final Map<String, String> attributes) throws IOException;

//
  public static DirectoryReader open(final IndexCommit commit, final Map<String, String> attributes) throws IOException;

that allows us to have a dedicated map of attributes per open call. Subsequent reopens then inherit the attributes form the reader. You can also then make this a property of DirecotryReader I guess.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Directory is the right abstraction to hold this.

Is that because single directory can be opened in multiple contexts?

Subsequent reopens then inherit the attributes form the reader. You can also then make this a property of DirecotryReader I guess.

How is the attribute map passed to SegmentReadState in this approach? Also, IndexWriterConfig need not be changed with this approach, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that because single directory can be opened in multiple contexts?
I think there are multiple reasons I think it should be either bound to a single reader or writer. ie two calls to DirectoryReader.open can return readers with different fields on-heap even if they share the same directory.

How is the attribute map passed to SegmentReadState in this approach? Also, IndexWriterConfig need not be changed with this approach, right?

I don't think IWC needs to change. I think such a map can be passed down to SegmentCoreReaders and then passed on via SegmentReadState I am not sure why you think we need to change a lot of things?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think such a map can be passed down to SegmentCoreReaders and then passed on via SegmentReadState I am not sure why you think we need to change a lot of things?

Maybe, better way is to add readerAttributes in SegmentInfo, and populate that during StandardDirectoryReader.open?

* opening the index
*/
public IndexWriterConfig setReaderAttributes(Map<String, String> readerAttributes) {
this.readerAttributes = readerAttributes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one should be an unmodifiable map, you should also make a copy for the attributes passed in.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will make the change. Thanks!

// docCount != sumDocFreq implying field is not primary key
if (clone instanceof ByteBufferIndexInput && this.docCount != this.sumDocFreq) {

final boolean fstOffHeap = readerAttributes != null && "true".equals(readerAttributes.get(FST_OFFHEAP_ATTRIBUTE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want to have a per-field setting? I mean something like:

"true".equals(readerAttributes.get("blocktree.fst." + fieldInfo.name + ".offheap"));

this would be more flexible no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the plan initially, but given that non-PK eligible fields are offheap by default. Having single setting instead of per field setting makes more sense.

If false:
Only non-PK fields offheap
If true:
All fields offheap

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this? I mean there might be fields that can easily be off-heap while another one should be on-heap?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this? I mean there might be fields that can easily be off-heap while another one should be on-heap?

Having per field setting is unnecessary cognitive load for the user, given that non PK eligible fields are offheap by default. So, users want PK eligible fields either offheap or onheap.
Initially, I did try it with this patch, but found it to be unnecessary. There can be dynamic fields as well, for which you need special keyword, which needs to account for all combinations of offheap and onheap.

@s1monw
Copy link
Member

s1monw commented Mar 12, 2019

I tired a different way today and I wonder what you think of this #606 (note I am currently getting a 404 on it - seems a github issue)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants