Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure LeafCollector#finish is only called once on the main collector during drill-sideways #12642

Merged
merged 8 commits into from
Oct 13, 2023

Conversation

gsmiller
Copy link
Contributor

@gsmiller gsmiller commented Oct 9, 2023

Small bug fix where #finish can be called multiple times on the base collector during drill-sideways


@Override
public void finish() throws IOException {
assertFalse(finished);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe assert to make sure this method get called?

new CollectorManager<>() {
@Override
public Collector newCollector() throws IOException {
return new Collector() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make AssertingCollector public and use it here? It gives additional checks, like making sure finish() is called on a leaf before moving to the next one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like that suggestion. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you are using AssertingLeafCollector but there are some more interesting checks that only happen if using AssertingCollector, were there any challenges with using it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have only one segment here so maybe AssertingSearcher is also needed to guarantee#finish called :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thoughts, thanks! I'll see if I can leverage AssertingCollector here. I don't think AssertingSearcher will be easy to use since drill-sideways requires bulk scoring all docs at once (i.e., 0 - INT_MAX) and AssertingBulkScorer may randomly try to score ranges (which results in an assertion error in DrillSidewaysScorer). I'll dig a bit more to see what's possible though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gsmiller for digging!

It is a bit pity that we can not introduce AssertingIndexSearcher here as we need it to ensure #finish called on the last LeafCollector. And if we have only one segment, the check could be lost :(

if we can accept the expose of AssertingCollector#hasFinishedCollectingPreviousLeaf, maybe tweak the asserting search logic like:

IndexSearcher searcher = new IndexSearcher(r) {
  @Override
  protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector)
      throws IOException {
    AssertingCollector assertingCollector = AssertingCollector.wrap(collector);
    super.search(leaves, weight, assertingCollector);
    assert assertingCollector.hasFinishedCollectingPreviousLeaf();
  }
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm kind of conflicted here to be honest. I'm hesitant to expose something like hasFinishedCollectingPreviousLeaf for this one testing use-case. Another option would be to somehow make AssertingBulkScorer "aware" of when it's allowed to score ranges of documents. That's the fundamental issue here is that we have this drill-sideways bulk scorer DrillSidewaysScorer that must score all docs at once, and cannot score ranges. This really just exposes some fundamental design frictions with drill-sideways, but I don't want to spiral this change into something much bigger than it needs to be (i.e., I acknowledge that we might benefit from a rethinking of how we expose drill-sideways functionality, but that's a much bigger task).

So... things we can do here without turning this into much more work:

  1. Expose hasFinishedCollectingPreviousLeaf as you suggest
  2. Skip using this "asserting" family of classes and just implement our own wrapper classes specific to this test that are meant to just test the calling of finish
  3. Add a new method to the BulkScorer API that communicates whether-or-not doc range scoring is legal, then make AssertingBulkScorer aware of that new method and have it check whether-or-not it can try scoring a range of docs
  4. Have AssertingBulkScorer check the instance type of the bulk scorer it's wrapping, which would require exposing the definition of DrillSidewaysScorer beyond the facets package (i.e., make it public)

I don't like #3; adding a new method to the BulkScorer definition feels like overkill to solve this. I also don't really like #1 or #4 since they both require visibility modifications just for this test, but I think #1 is better than #4 since it only impacts testing code and not production code. I also don't love #2 since we miss out on other nice checks.

I think I'm convinced that your suggestion of exposing hasFinishedCollectingPreviousLeaf is the least of all evils here. I'll give that a try and let's see what we think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gsmiller , conflicted +1 :)

I think the root cause here is that AssertingCollector can not be perfectly self-contained —— it requires AssertingIndexSeacher to do some additional check. As we are considering to make AssertingCollector public, we should at least expose something to allow users that want to use public AssertingCollector directly without AssertingIndexSearcher to do the additional check. Maybe there's a more elegant way than exposing hasFinishedCollectingPreviousLeaf directly, but I'm not sure what it is.

@Override
public LeafCollector getLeafCollector(LeafReaderContext context)
throws IOException {
return new LeafCollector() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe simplify the layer a bit with org.apache.lucene.search.SimpleCollector :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah fair. I think I'll actually use a CollectorManager that's already defined for drill-sideways testing to simplify even further. We don't really need to collect anything for this test, but it's easier to just use it and not do all this custom setup.

Copy link
Contributor

@gf2121 gf2121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@gsmiller gsmiller merged commit 7b7b0d2 into apache:main Oct 13, 2023
4 checks passed
@gsmiller gsmiller deleted the GH/ds-idempotent-finish branch October 13, 2023 14:24
gsmiller added a commit that referenced this pull request Oct 13, 2023
@gsmiller gsmiller added this to the 9.9.0 milestone Oct 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants