Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-15595: Partial results from shard queries needlessly discarded for queries without sort fields #267

Merged
merged 7 commits into from
Sep 3, 2021

Conversation

makosten
Copy link
Contributor

@makosten makosten commented Aug 18, 2021

https://issues.apache.org/jira/browse/SOLR-15595

Description

Partial shard results due to timeouts are not merged into the final query results. This is unnecessary for queries sorted only by score and/or docid.

Solution

No longer exit the shard merge early when partial results are returned for queries sorted just by score or docid.

Tests

Additional tests added for SortSpecParser for detecting that the sort only includes the score and/or docid.
Added unit test class for QueryComponent.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

sortFieldValues = new NamedList<>();
}
// skip merging results for this shard if the sortSpec includes a non-scoredoc field but the sortFieldValues is empty.
if (thisResponseIsPartial && sortFieldValues.size() == 0 && ss.includesNonScoreOrDocField()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's interesting to me that we do a similar check later on line 1122, not sure if we can combine or remove some of this logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unmarshalSortValues method just returns an empty list if the sort if by score and/or docid and no other field. I could revise the mergeIds method to not call it for this case. The only thing that gives me pause is that unmarshalSortValues is protected, so presumably its being overridden someplace? It doesn't appear to be anywhere in the code base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding skipping calling unmarshalSortValues if it is a no-op.

@epugh
Copy link
Contributor

epugh commented Aug 24, 2021

The test that @makosten added looks good to me, thoughts @madrob ?

import org.junit.Test;
import org.mockito.Mockito;

import java.util.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no wildcard imports please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed import wildcard.

Comment on lines 209 to 210
solrDocument.addField("score", id * 1.1F);
solrDocument.addField(SORT_FIELD_NAME, id * 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these values actually matter? What if we made them constant. Or equal to id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used just the id since the values don't matter,

assertEquals((shouldIncludePartialShardResult ? shard1Size : 0) + shard2Size, responseBuilder.getResponseDocs().size());
}

private static class MockResponseBuilder extends ResponseBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost feel like all the Mock* classes shouldn't be in here, but could have their own classes. It looks like the convention we use is to do inner Mock classes when they are nearly trivial and give them their own files otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted the Mock classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mock implementations are very bespoke to the new tests, but this way others can extend them.

Comment on lines 89 to 98
// shard 1 is marked partial
NamedList<Object> responseHeader1 = new NamedList<>();
responseHeader1.add(SolrQueryResponse.RESPONSE_HEADER_PARTIAL_RESULTS_KEY, Boolean.TRUE);

// shard 2 is not partial
NamedList<Object> responseHeader2 = new NamedList<>();

MockShardRequest shardRequest = MockShardRequest.create()
.withShardResponse(responseHeader1, createSolrDocumentList(shard1Size))
.withShardResponse(responseHeader2, createSolrDocumentList(shard2Size));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be pulled out into a field that only gets created once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 961 to 966
boolean needsUnmarshalling = ss.includesNonScoreOrDocField();
// skip merging results for this shard if the sortSpec sort values need to be marshalled but the sortFieldValues is empty.
if (thisResponseIsPartial && sortFieldValues.size() == 0 && needsUnmarshalling) {
continue;
}
NamedList<List<Object>> unmarshalledSortFieldValues = needsUnmarshalling ? unmarshalSortValues(ss, sortFieldValues, schema) : new NamedList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this needs more comments, or I'm being dense here. I have to mentally trace the code every time I come back to it and convince myself that it's right (which I'm sure it is because I trust you've done the testing, not because I actually understand it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments. Hope they help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do! I am super happy with these, and they help me understand it much better than I thought I got it before.

…st and use a singleton for test ShardRequest.
@@ -0,0 +1,36 @@
package org.apache.solr.handler.component;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh!

@@ -0,0 +1,54 @@
package org.apache.solr.handler.component;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license header

Comment on lines 961 to 966
boolean needsUnmarshalling = ss.includesNonScoreOrDocField();
// skip merging results for this shard if the sortSpec sort values need to be marshalled but the sortFieldValues is empty.
if (thisResponseIsPartial && sortFieldValues.size() == 0 && needsUnmarshalling) {
continue;
}
NamedList<List<Object>> unmarshalledSortFieldValues = needsUnmarshalling ? unmarshalSortValues(ss, sortFieldValues, schema) : new NamedList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do! I am super happy with these, and they help me understand it much better than I thought I got it before.

Copy link
Contributor

@madrob madrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending the two missing headers, and precommit/tests coming back happy

@epugh
Copy link
Contributor

epugh commented Aug 26, 2021

I just triggered the precommit check ;-)

@epugh
Copy link
Contributor

epugh commented Aug 27, 2021

What do we think @madrob @makosten ? Ready for merging? It would be nice to get this into 8.10! Do we need to update the ref guide?

@madrob
Copy link
Contributor

madrob commented Sep 2, 2021

LGTM, yes!

epugh@opensourceconnections.com and others added 2 commits September 3, 2021 17:57
@epugh epugh merged commit 812de21 into apache:main Sep 3, 2021
epugh pushed a commit to epugh/solr that referenced this pull request Oct 22, 2021
…or queries without sort fields (apache#267)

* No longer skip merging partial shard results for non-sort queries

* Add unit test for QueryComponent. Add javadoc for new SortSpec method. Refactor QueryComponent changes.

* Improve javadocs. Fix parameter name in unit test class method.

* Add comments to QueryComponent. Extract inner test class for added test and use a singleton for test ShardRequest.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants