Ensure LeafCollector#finish is only called once on the main collector during drill-sideways #12642

gsmiller · 2023-10-09T18:12:10Z

Small bug fix where #finish can be called multiple times on the base collector during drill-sideways

gf2121 · 2023-10-10T07:32:59Z

lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java

+
+    @Override
+    public void finish() throws IOException {
+      assertFalse(finished);


Maybe add an assertion to make sure this method actually gets called?
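A minimal sketch of what such a check could look like, in Java to match the test code. `LeafSink` and `FinishOnceSink` are hypothetical stand-ins written for illustration; they are not Lucene's `LeafCollector` or the classes used in this PR. The idea is that the wrapper fails fast on a second `finish()` and also records that `finish()` ran, so the test can assert on it afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-leaf collector; NOT Lucene's LeafCollector.
interface LeafSink {
  void collect(int doc);

  void finish();
}

// Records docs, rejects a second finish(), and remembers whether finish() ran.
final class FinishOnceSink implements LeafSink {
  private final List<Integer> docs = new ArrayList<>();
  private boolean finished = false;

  @Override
  public void collect(int doc) {
    if (finished) {
      throw new IllegalStateException("collect() called after finish()");
    }
    docs.add(doc);
  }

  @Override
  public void finish() {
    if (finished) {
      throw new IllegalStateException("finish() called more than once");
    }
    finished = true;
  }

  boolean isFinished() {
    return finished;
  }

  int collectedCount() {
    return docs.size();
  }
}

final class FinishOnceDemo {
  public static void main(String[] args) {
    FinishOnceSink sink = new FinishOnceSink();
    sink.collect(1);
    sink.collect(2);
    sink.finish();
    // The extra check suggested above: finish() must actually have run.
    if (!sink.isFinished()) {
      throw new AssertionError("finish() was never called");
    }
    System.out.println("collected=" + sink.collectedCount());
  }
}
```

Checking only for a double call would let a search that never calls `finish()` pass silently, which is why the sketch tracks both conditions.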

jpountz · 2023-10-10T08:08:31Z

lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java

+            new CollectorManager<>() {
+              @Override
+              public Collector newCollector() throws IOException {
+                return new Collector() {


Maybe make AssertingCollector public and use it here? It gives additional checks, like making sure finish() is called on a leaf before moving to the next one.

Yeah, I like that suggestion. Thanks!

I see you are using AssertingLeafCollector, but there are some more interesting checks that only happen when using AssertingCollector. Were there any challenges with using it?

We have only one segment here, so maybe AssertingSearcher is also needed to guarantee that #finish is called :)

Good thoughts, thanks! I'll see if I can leverage AssertingCollector here. I don't think AssertingSearcher will be easy to use since drill-sideways requires bulk scoring all docs at once (i.e., 0 - INT_MAX) and AssertingBulkScorer may randomly try to score ranges (which results in an assertion error in DrillSidewaysScorer). I'll dig a bit more to see what's possible though.
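To make the conflict concrete, here is a rough sketch in Java. Both classes are hypothetical stand-ins, not Lucene's `DrillSidewaysScorer` or `AssertingBulkScorer`: one scorer accepts only the full range (0 to `Integer.MAX_VALUE`), and a checking wrapper that splits the request into two windows trips that constraint.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a bulk scorer that only supports one full pass.
final class AllAtOnceScorer {
  private final int maxDoc;

  AllAtOnceScorer(int maxDoc) {
    this.maxDoc = maxDoc;
  }

  // Returns the doc IDs scored; rejects anything other than the full range.
  List<Integer> score(int min, int max) {
    if (min != 0 || max != Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          "must score all docs at once, got range " + min + ".." + max);
    }
    List<Integer> docs = new ArrayList<>();
    for (int doc = 0; doc < maxDoc; doc++) {
      docs.add(doc);
    }
    return docs;
  }
}

// Hypothetical stand-in for a checking wrapper that scores in two windows.
final class RangeSplittingWrapper {
  static List<Integer> scoreInTwoWindows(AllAtOnceScorer in, int split) {
    List<Integer> docs = new ArrayList<>(in.score(0, split));
    docs.addAll(in.score(split, Integer.MAX_VALUE));
    return docs;
  }
}

final class RangeScoringDemo {
  public static void main(String[] args) {
    AllAtOnceScorer scorer = new AllAtOnceScorer(5);
    System.out.println("full pass scored " + scorer.score(0, Integer.MAX_VALUE).size());
    try {
      RangeSplittingWrapper.scoreInTwoWindows(scorer, 2);
    } catch (IllegalArgumentException e) {
      System.out.println("windowed scoring rejected: " + e.getMessage());
    }
  }
}
```

The wrapper is doing something legitimate for most scorers; the failure comes purely from the all-at-once requirement of the wrapped scorer.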

Thanks @gsmiller for digging!

It is a bit of a pity that we cannot introduce AssertingIndexSearcher here, since we need it to ensure that #finish is called on the last LeafCollector. And if we have only one segment, the check could be lost :(

If we can accept exposing AssertingCollector#hasFinishedCollectingPreviousLeaf, maybe we could tweak the asserting search logic like this:

IndexSearcher searcher =
    new IndexSearcher(r) {
      @Override
      protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector)
          throws IOException {
        AssertingCollector assertingCollector = AssertingCollector.wrap(collector);
        super.search(leaves, weight, assertingCollector);
        assert assertingCollector.hasFinishedCollectingPreviousLeaf();
      }
    };

Yeah, I'm kind of conflicted here to be honest. I'm hesitant to expose something like hasFinishedCollectingPreviousLeaf for this one testing use-case. Another option would be to somehow make AssertingBulkScorer "aware" of when it's allowed to score ranges of documents. The fundamental issue here is that the drill-sideways bulk scorer, DrillSidewaysScorer, must score all docs at once and cannot score ranges. This really just exposes some fundamental design frictions with drill-sideways, but I don't want to spiral this change into something much bigger than it needs to be (i.e., I acknowledge that we might benefit from rethinking how we expose drill-sideways functionality, but that's a much bigger task).

So... things we can do here without turning this into much more work:

1. Expose hasFinishedCollectingPreviousLeaf as you suggest

2. Skip using this "asserting" family of classes and just implement our own wrapper classes specific to this test that are meant to just test the calling of finish

3. Add a new method to the BulkScorer API that communicates whether-or-not doc range scoring is legal, then make AssertingBulkScorer aware of that new method and have it check whether-or-not it can try scoring a range of docs

4. Have AssertingBulkScorer check the instance type of the bulk scorer it's wrapping, which would require exposing the definition of DrillSidewaysScorer beyond the facets package (i.e., make it public)

I don't like #3; adding a new method to the BulkScorer definition feels like overkill to solve this. I also don't really like #1 or #4 since they both require visibility modifications just for this test, but I think #1 is better than #4 since it only impacts testing code and not production code. I also don't love #2 since we miss out on other nice checks.

I think I'm convinced that your suggestion of exposing hasFinishedCollectingPreviousLeaf is the least of all evils here. I'll give that a try and let's see what we think.

Thanks @gsmiller, conflicted +1 :)

I think the root cause here is that AssertingCollector cannot be perfectly self-contained: it relies on AssertingIndexSearcher to do some additional checks. As we are considering making AssertingCollector public, we should at least expose something that lets users of the public AssertingCollector perform the additional check themselves, without AssertingIndexSearcher. Maybe there's a more elegant way than exposing hasFinishedCollectingPreviousLeaf directly, but I'm not sure what it is.
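The single-segment gap can be sketched in Java as follows. `LeafTrackingCollector` is a hypothetical stand-in, not Lucene's `AssertingCollector`; only the method name `hasFinishedCollectingPreviousLeaf` is borrowed from the discussion. The check on the previous leaf fires when the next leaf starts, so with a single leaf nothing ever triggers it unless the caller asks explicitly once the search is done.

```java
// Hypothetical stand-in for a collector that tracks per-leaf finish calls.
final class LeafTrackingCollector {
  // Vacuously true before the first leaf has started.
  private boolean previousLeafFinished = true;
  private int finishedLeaves = 0;

  // Starts a new leaf and returns the callback that marks that leaf finished.
  Runnable newLeaf() {
    if (!previousLeafFinished) {
      throw new IllegalStateException("previous leaf was never finished");
    }
    previousLeafFinished = false;
    boolean[] done = {false}; // per-leaf flag so a repeated finish is detected
    return () -> {
      if (done[0]) {
        throw new IllegalStateException("finish() called twice on the same leaf");
      }
      done[0] = true;
      previousLeafFinished = true;
      finishedLeaves++;
    };
  }

  // The final check the caller must run after the search completes.
  boolean hasFinishedCollectingPreviousLeaf() {
    return previousLeafFinished;
  }

  int finishedLeaves() {
    return finishedLeaves;
  }
}

final class LeafTrackingDemo {
  public static void main(String[] args) {
    // Single "segment": the leaf is started but its finish callback never runs.
    LeafTrackingCollector collector = new LeafTrackingCollector();
    collector.newLeaf();
    // No second leaf exists, so only the explicit final check sees the problem.
    System.out.println(
        "last leaf finished? " + collector.hasFinishedCollectingPreviousLeaf());
  }
}
```

With two or more leaves, `newLeaf()` would catch an unfinished predecessor on its own; the explicit accessor only matters for the last leaf.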

gf2121 · 2023-10-11T05:24:05Z

lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java

+                      @Override
+                      public LeafCollector getLeafCollector(LeafReaderContext context)
+                          throws IOException {
+                        return new LeafCollector() {


Maybe simplify the layer a bit with org.apache.lucene.search.SimpleCollector :)

Yeah fair. I think I'll actually use a CollectorManager that's already defined for drill-sideways testing to simplify even further. We don't really need to collect anything for this test, but it's easier to just use it and not do all this custom setup.

gf2121

Thank you!

gsmiller added 2 commits October 9, 2023 15:06

Ensure LeafCollector#finish is only called once on the main collector…

4d1f118

… during drill-sideways

changes

e4a2aa5

gf2121 approved these changes Oct 10, 2023

jpountz reviewed Oct 10, 2023

gsmiller added 4 commits October 10, 2023 09:12

use AssertingLeafCollector for testing

c5dad6b

more asserting

db73b1e

back out asserting leaf collector vis changes

602a1f7

no asserting index searcher

12cca1c

gf2121 reviewed Oct 11, 2023

gsmiller added 2 commits October 12, 2023 09:36

own final finish check

a008c35

simplify testing

185d3f3

gf2121 approved these changes Oct 12, 2023

gsmiller merged commit 7b7b0d2 into apache:main Oct 13, 2023
4 checks passed

gsmiller deleted the GH/ds-idempotent-finish branch October 13, 2023 14:24

gsmiller added a commit that referenced this pull request Oct 13, 2023

Ensure LeafCollector#finish is only called once on the main collector…

fd126bb

… during drill-sideways (#12642)

gsmiller added this to the 9.9.0 milestone Oct 13, 2023
