Solr 8127 backport distributed luke #4472
* Fans out to one replica per shard by default when in Solr Cloud mode as well as with `shards` explicitly specified in non-Cloud mode
* Any index information that can't be aggregated, i.e. directory, version, indexCommit, etc., will be placed for every *responding* shard in a new shards response field. This only gets returned when shards.info=true
* docs and docCount were widened to long as they can now overflow. For javabin codec compatibility the server will narrow these to int for old calling SolrClients (when it is safe to do so)
* Previously show=doc mode would error if it couldn't find a matching doc but now returns an empty response and a 200 status code
* show=doc in distributed mode works only with Solr document Id but not with lucene docId, i.e. "id=..." works but "docId=..." does not.
* When in distributed mode Luke handler will validate index and schema flags of each field for consistency and error with an informative message in case of any mismatch.
* You can go back to the old, non-distributed behavior in Cloud mode by specifying distrib=false
* For single-sharded Solr Clouds there is no behavior change (this is a special case).

Co-authored-by: David Smiley <dsmiley@apache.org>
(cherry picked from commit cac69ae)
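To make the widening and narrowing described in the commit message concrete, here is a minimal sketch. It is not the code from this PR: the class `ShardDocCounts` and its methods are hypothetical names, and the only assumption carried over from the commit message is that counts are summed across shards and narrowed to int for old clients only when the value fits.

```java
import java.util.List;

/** Hypothetical helper for illustration only; not Solr's actual implementation. */
final class ShardDocCounts {

  /** Sums per-shard counts into a long, since the total can exceed Integer.MAX_VALUE. */
  static long sum(List<Integer> perShardDocs) {
    long total = 0L;
    for (int docs : perShardDocs) {
      total += docs;
    }
    return total;
  }

  /**
   * Returns an Integer for an old client when the total fits in an int,
   * and a Long in every other case.
   */
  static Number forClient(long total, boolean oldClient) {
    if (oldClient && total >= Integer.MIN_VALUE && total <= Integer.MAX_VALUE) {
      return (int) total; // safe to narrow: no information is lost
    }
    return total; // new client, or the value does not fit in an int
  }
}
```

With two shards that each hold `Integer.MAX_VALUE` documents the sum no longer fits in an int, so even an old client would receive a long in this sketch.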
HttpSolrCall call = req.getHttpSolrCall();
if (call == null) return false;
SolrVersion clientVersion = call.getUserAgentSolrVersion();
return clientVersion != null && clientVersion.lessThan(DISTRIB_LONG_COUNTS_MIN_VERSION);
One issue here is that this only works for javabin clients, but a client requesting wt=xml may still get back an unexpected type, i.e. <long name="docs">10000</long> instead of <int name="docs">10000</int>. You can of course go back to the old, undistributed behavior via distrib=false. Btw I found it amusing that you can hijack the solrj-version headers for the xml wt so it respects the narrowing while writing out to xml instead of javabin, i.e.:
curl -s -H 'User-Agent: Solr[org.apache.solr.client.solrj.impl.Http2SolrClient] 9.10.0' \
  'http://localhost:8983/solr/luke_test/admin/luke?wt=xml&shards.info=true'
vs
curl -s -H 'User-Agent: Solr[org.apache.solr.client.solrj.impl.Http2SolrClient] 9.11.0' \
  'http://localhost:8983/solr/luke_test/admin/luke?wt=xml&shards.info=true'
But obviously that is an odd thing to recommend to users so I'd stick to recommending reverting to old behavior via distrib=false as is documented. Perhaps it is worth rewriting this more explicitly:
To revert to old, pre-distributed behavior just pass distrib=false
I just worry about backwards compat and breaking people going from a popular and stable 9.10 version.
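For readers following the two curl requests, the version gate can be sketched roughly as below. This is an illustration under assumptions, not Solr's parsing code: `UserAgentVersion`, its regex, and the `minVersion` argument are all hypothetical, and the real threshold is whatever `DISTRIB_LONG_COUNTS_MIN_VERSION` is set to in the diff.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a SolrJ-style User-Agent version check. */
final class UserAgentVersion {

  // Matches headers shaped like "Solr[<client class>] 9.10.0" (assumed format).
  private static final Pattern SOLRJ_UA =
      Pattern.compile("^Solr\\[[^\\]]*\\]\\s+(\\d+)\\.(\\d+)\\.(\\d+)");

  /** Returns {major, minor, patch}, or null when the header is not SolrJ-style. */
  static int[] parse(String userAgent) {
    if (userAgent == null) return null;
    Matcher m = SOLRJ_UA.matcher(userAgent);
    if (!m.find()) return null;
    return new int[] {
      Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))
    };
  }

  /** Component-wise comparison: true when a is strictly older than b. */
  static boolean lessThan(int[] a, int[] b) {
    for (int i = 0; i < 3; i++) {
      if (a[i] != b[i]) return a[i] < b[i];
    }
    return false;
  }

  /** Narrow only for recognised clients older than the given minimum version. */
  static boolean shouldNarrow(String userAgent, int[] minVersion) {
    int[] version = parse(userAgent);
    return version != null && lessThan(version, minVersion);
  }
}
```

A header that does not look like a SolrJ client (plain curl, a browser) yields no version in this sketch, so no narrowing happens, which is the backwards-compatibility gap raised in this comment.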
It's not clear to me why the response writer (wt) would foil our attempts to return a compatible response.
I didn't mean it was an issue with the wt itself, just that to test the old client narrowing behavior I hardcoded the java client version string into a regular curl request header with wt=xml because I realized that the xml writer response includes long/int datatype in the xml tag of the field. But this of course can break callers requesting xml with a non-java client who expect docs to be in an int tag and not a long tag. So I'm just calling this out as a backwards incompatible flow.
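To show why the value's Java type leaks into an XML response, here is a small sketch of a writer that picks the element name from the runtime type. `TypedXmlTag` is a hypothetical stand-in for illustration, not Solr's XML response writer.

```java
/** Hypothetical sketch: element name follows the value's runtime type. */
final class TypedXmlTag {

  /** Writes <long name="..."> for Long values and <int name="..."> otherwise. */
  static String write(String name, Number value) {
    String tag = (value instanceof Long) ? "long" : "int";
    return "<" + tag + " name=\"" + name + "\">" + value + "</" + tag + ">";
  }
}
```

Passing the same count as an `Integer` or as a `Long` produces different element names, so a non-Java client that parses `<int name="docs">` strictly would see a changed response.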
== Distributed Mode (multiple shards)

When running in SolrCloud, the Luke handler automatically distributes requests across all shards in the collection, the same as search requests.
To inspect only the receiving shard's index set `distrib=false`.
As mentioned previously, perhaps it is worth rewriting this more explicitly:
To revert to old, pre-distributed behavior just pass distrib=false
I should say, to support older clients without updating them we could also recommend a Solr deployment configure luke in solrconfig.xml with a default of distrib=false. But I don't think that belongs on this page; that's an upgrade page matter.
This is not a bad idea, I will look to mention this on the upgrade page as part of this change.
@dsmiley I added a section to 9.11 major changes to incorporate this idea. I apologize about the force pushes but I pushed the wrong branch by accident.
b767b6e to d72a3f8
d72a3f8 to f94f4a2
Description
Backport #4149
Tests
Aside from the automated tests already included I manually built this and sanity checked some requests against a locally built, multi-shard collection. I manually tested the narrowing logic by tweaking the headers and requesting wt=xml, which passes along the type.