New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduced the notion of a FixedBitSetFilter that guarantees to produce a FixedBitSet #7037
Conversation
|
||
/** | ||
* Indicates that this filter returns a {@link org.apache.lucene.search.DocIdSet} implementation that supports | ||
* random access. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it guarantees more than that, it guarantees that a FixedBitSet is returned, so maybe this class should even be called FixedBitSetFilter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, make sense. I will then rename the service to FixedBitSetFilterService.
Left some comments, but I really like the idea of having a different caching infrastructure for these filters with eager loading, no evictions and the warranty to get fixed bit sets! |
@jpountz I've updated the PR and addressed your comments. In addition I moved the FixedBitSetCache to the index.cache package and add |
|
||
@Override | ||
public void onClose(Object ownerCoreCacheKey) { | ||
loadedFilters.invalidate(ownerCoreCacheKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work? (given that loadedFilters
uses a Key object as key while you are invalidating with a reader core key here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops... that wouldn't work
@martijnvg Left a couple more comments. I think it would be nice to also check the size of this cache after integration tests finish to make sure it gets cleaned when indices are closed? |
…uce a FixedBitSet. Nested and parent/child rely on the fact that type filters produce a FixedBitSet, the RandomAccessFilter does this. Also if nested and parent/child is configured the type filters are eagerly loaded by default. Closes elastic#7037 Closes elastic#7031
@jpountz Thanks for looking into this. I update the PR to address your comments. |
LGTM Maybe @kimchy should look at the caching logic before getting this in? |
LGTM, I would add some docs to the FixedBitSet cache that its intentionally not bounded (by size or time), and that it should be used very carefully only for components that require a FixedBitSet that is always there. |
@martijnvg I think it would be nice to merge this one as it would help clean up a couple of other nested and parent/child related changes? |
@jpountz I totally agree and I'll merge it in this week before I work on any other major PR. |
…itSet and does evict based on size or time. Only when segments are merged away due to merging then entries in this cache are cleaned up. Nested and parent/child rely on the fact that type filters produce a FixedBitSet, the FixedBitSetFilterCache does this. Also if nested and parent/child is configured the type filters are eagerly loaded by default via the FixedBitSetFilterCache. Closes elastic#7037 Closes elastic#7031
0d919d6
to
531d6ab
Compare
…itSet and does evict based on size or time. Only when segments are merged away due to merging then entries in this cache are cleaned up. Nested and parent/child rely on the fact that type filters produce a FixedBitSet, the FixedBitSetFilterCache does this. Also if nested and parent/child is configured the type filters are eagerly loaded by default via the FixedBitSetFilterCache. Closes elastic#7037 Closes elastic#7031
…itSet and does evict based on size or time. Only when segments are merged away due to merging then entries in this cache are cleaned up. Nested and parent/child rely on the fact that type filters produce a FixedBitSet, the FixedBitSetFilterCache does this. Also if nested and parent/child is configured the type filters are eagerly loaded by default via the FixedBitSetFilterCache. Closes #7037 Closes #7031
…cIdSet. A few months ago, Lucene switched from FixedBitSet to WAH8DocIdSet in order to cache filters. WAH8DocIdSet is especially better when dealing with sparse sets: iteration is faster, memory usage is lower and there is an index that helps keep advance fast. This doesn't break existing code since elastic#6280 already made sure that there was no abusive cast from DocIdSet to Bits or FixedBitSet and elastic#7037 moved consumers of the filter cache that absolutely need to get fixed bitsets to their own cache. Since cached filters will be more memory-efficient, the filter cache size has been decreased from 10 to 5%. Although smaller, this might still allow to cache more filters.
…itSet and does evict based on size or time. Only when segments are merged away due to merging then entries in this cache are cleaned up. Nested and parent/child rely on the fact that type filters produce a FixedBitSet, the FixedBitSetFilterCache does this. Also if nested and parent/child is configured the type filters are eagerly loaded by default via the FixedBitSetFilterCache. Closes #7037 Closes #7031
Nested and parent/child rely on the fact that type filters produce a FixedBitSet, the RandomAccessFilter does this. By moving away from filter cache this filters will also never be evicted because of LRU reasons, which is what is desired for nested and parent/child.
Also if nested and parent/child is configured the type filters are eagerly loaded by default.
PR for #7031