-
Notifications
You must be signed in to change notification settings - Fork 975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure LeafCollector#finish is only called once on the main collector during drill-sideways #12642
Conversation
… during drill-sideways
|
||
@Override | ||
public void finish() throws IOException { | ||
assertFalse(finished); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe assert to make sure this method get called?
new CollectorManager<>() { | ||
@Override | ||
public Collector newCollector() throws IOException { | ||
return new Collector() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make AssertingCollector public and use it here? It gives additional checks, like making sure finish() is called on a leaf before moving to the next one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I like that suggestion. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you are using AssertingLeafCollector
but there are some more interesting checks that only happen if using AssertingCollector
, were there any challenges with using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have only one segment here so maybe AssertingSearcher is also needed to guarantee#finish called :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good thoughts, thanks! I'll see if I can leverage AssertingCollector
here. I don't think AssertingSearcher
will be easy to use since drill-sideways requires bulk scoring all docs at once (i.e., 0 - INT_MAX) and AssertingBulkScorer
may randomly try to score ranges (which results in an assertion error in DrillSidewaysScorer). I'll dig a bit more to see what's possible though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gsmiller for digging!
It is a bit pity that we can not introduce AssertingIndexSearcher
here as we need it to ensure #finish
called on the last LeafCollector
. And if we have only one segment, the check could be lost :(
if we can accept the expose of AssertingCollector#hasFinishedCollectingPreviousLeaf
, maybe tweak the asserting search logic like:
IndexSearcher searcher = new IndexSearcher(r) {
@Override
protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector)
throws IOException {
AssertingCollector assertingCollector = AssertingCollector.wrap(collector);
super.search(leaves, weight, assertingCollector);
assert assertingCollector.hasFinishedCollectingPreviousLeaf();
}
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm kind of conflicted here to be honest. I'm hesitant to expose something like hasFinishedCollectingPreviousLeaf
for this one testing use-case. Another option would be to somehow make AssertingBulkScorer
"aware" of when it's allowed to score ranges of documents. That's the fundamental issue here is that we have this drill-sideways bulk scorer DrillSidewaysScorer
that must score all docs at once, and cannot score ranges. This really just exposes some fundamental design frictions with drill-sideways, but I don't want to spiral this change into something much bigger than it needs to be (i.e., I acknowledge that we might benefit from a rethinking of how we expose drill-sideways functionality, but that's a much bigger task).
So... things we can do here without turning this into much more work:
- Expose
hasFinishedCollectingPreviousLeaf
as you suggest - Skip using this "asserting" family of classes and just implement our own wrapper classes specific to this test that are meant to just test the calling of
finish
- Add a new method to the
BulkScorer
API that communicates whether-or-not doc range scoring is legal, then makeAssertingBulkScorer
aware of that new method and have it check whether-or-not it can try scoring a range of docs - Have
AssertingBulkScorer
check the instance type of the bulk scorer it's wrapping, which would require exposing the definition ofDrillSidewaysScorer
beyond the facets package (i.e., make it public)
I don't like #3; adding a new method to the BulkScorer
definition feels like overkill to solve this. I also don't really like #1 or #4 since they both require visibility modifications just for this test, but I think #1 is better than #4 since it only impacts testing code and not production code. I also don't love #2 since we miss out on other nice checks.
I think I'm convinced that your suggestion of exposing hasFinishedCollectingPreviousLeaf
is the least of all evils here. I'll give that a try and let's see what we think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gsmiller , conflicted +1 :)
I think the root cause here is that AssertingCollector
can not be perfectly self-contained —— it requires AssertingIndexSeacher
to do some additional check. As we are considering to make AssertingCollector
public, we should at least expose something to allow users that want to use public AssertingCollector
directly without AssertingIndexSearcher
to do the additional check. Maybe there's a more elegant way than exposing hasFinishedCollectingPreviousLeaf
directly, but I'm not sure what it is.
@Override | ||
public LeafCollector getLeafCollector(LeafReaderContext context) | ||
throws IOException { | ||
return new LeafCollector() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe simplify the layer a bit with org.apache.lucene.search.SimpleCollector
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah fair. I think I'll actually use a CollectorManager
that's already defined for drill-sideways testing to simplify even further. We don't really need to collect anything for this test, but it's easier to just use it and not do all this custom setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
… during drill-sideways (#12642)
Small bug fix where
#finish
can be called multiple times on the base collector during drill-sideways