SOLR-17289: OrderedNodePlacementPlugin: optimize don't loop collections #2459

Draft
wants to merge 1 commit into
base: main

Conversation

dsmiley
Contributor

@dsmiley dsmiley commented May 14, 2024

https://issues.apache.org/jira/browse/SOLR-17289

Not sure if the first version here is right; maybe there are untested issues where this won't work? And maybe withCollection/withShards can be made scalable too, e.g. by initializing the replicas of only those collections (not the whole cluster!).

@dsmiley
Contributor Author

dsmiley commented May 14, 2024

Separately, I wonder if we make it too easy to loop the state of every collection. I'm looking at Cluster added by @murblanc which contains not only an Iterator<SolrCollection> iterator(); method, but also Iterable<SolrCollection> collections(); to make it that much easier. Instead, let's just have a method to list collection names. Then if the caller is hell bent on looping everything in the cluster, it's going to be that much more obvious to that code that it's looking up collection info for each and every one.
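To make the "names first, then resolve" idea concrete, here is a minimal sketch. Everything in it is hypothetical: `NamesFirstCluster`, `CollectionState`, and `InMemoryCluster` are illustration-only names, not the real `org.apache.solr.cluster.Cluster` API. The lookup counter exists only to show that each state fetch becomes an explicit, visible call.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for per-collection state; not the real SolrCollection.
final class CollectionState {
  private final String name;
  private final int shardCount;

  CollectionState(String name, int shardCount) {
    this.name = name;
    this.shardCount = shardCount;
  }

  String name() { return name; }
  int shardCount() { return shardCount; }
}

// Hypothetical names-first view of a cluster (an assumption for illustration).
interface NamesFirstCluster {
  // Cheap: returns names only, no per-collection state.
  Set<String> collectionNames();

  // Potentially expensive: resolves one name to its state.
  Optional<CollectionState> lookup(String name);
}

// In-memory implementation that counts how many state lookups were made.
final class InMemoryCluster implements NamesFirstCluster {
  private final Map<String, CollectionState> states = new LinkedHashMap<>();
  private int lookups = 0;

  InMemoryCluster add(String name, int shardCount) {
    states.put(name, new CollectionState(name, shardCount));
    return this;
  }

  @Override
  public Set<String> collectionNames() {
    return Collections.unmodifiableSet(states.keySet());
  }

  @Override
  public Optional<CollectionState> lookup(String name) {
    lookups++;
    return Optional.ofNullable(states.get(name));
  }

  int lookupCount() { return lookups; }
}
```

With this shape, a caller that wants every collection's state has to write the loop over `collectionNames()` and call `lookup` once per name, so the per-collection cost is visible at the call site.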

@murblanc
Member

> Separately, I wonder if we make it too easy to loop the state of every collection. I'm looking at Cluster added by @murblanc which contains not only an Iterator<SolrCollection> iterator(); method, but also Iterable<SolrCollection> collections(); to make it that much easier. Instead, let's just have a method to list collection names. Then if the caller is hell bent on looping everything in the cluster, it's going to be that much more obvious to that code that it's looking up collection info for each and every one.

org.apache.solr.cluster.Cluster is made to present the internal cluster abstraction to plugin writers in order to decouple plugins from the internal implementation (so we can change the abstractions without breaking plugins).

The existing internal cluster abstraction does allow listing all collections, as does the Collection API BTW. Instead of shooting the messenger (org.apache.solr.cluster.Cluster) we could reconsider if listing all collections in general makes sense. Obviously listing all collections does not scale, but most SolrCloud deployments do not have such scaling issues.

If we do think that listing all collections is useful (which you seem to agree to given the proposal to return all names), I'd rather have the API we offer to plugin writers be easy to use. Returning names and forcing the caller to go fetch the collection one by one is not convenient.

@aparnasuresh85
Contributor

We discovered a severe issue during QA for this change, where although the fix placed replicas faster by 90% on avg, replicas were consistently placed on only a few nodes. In our case, they were always placed on the same two nodes, likely due to the replication factor.

@dsmiley
Contributor Author

dsmiley commented May 17, 2024

Yeah the results were disappointing from a placement diversity standpoint, which is a total deal-breaker. Perhaps a bit of randomness layered onto the placement would help with placement diversity? But I confess this really is just a draft PR; I didn't try to deeply understand why all replicas get weighted. I was encouraged to see all tests pass, so clearly there's a test gap that would allow this change to go in yet be quite flawed. I found no tests specific to SimplePlacementPlugin, the one we used with the change here.
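One way the "bit of randomness" idea could look is random tie-breaking among equally weighted nodes. This is only a sketch under a loud assumption: that the poor diversity comes from ties always resolving in the same order, which this thread does not establish. `WeightedNode` and `RandomTieBreak` are hypothetical names and are not part of OrderedNodePlacementPlugin.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical node with a precomputed weight (lower is better); illustration only.
final class WeightedNode {
  final String name;
  final int weight;

  WeightedNode(String name, int weight) {
    this.name = name;
    this.weight = weight;
  }
}

final class RandomTieBreak {
  // Returns the nodes ordered by ascending weight, with equal-weight nodes
  // in random relative order instead of their fixed input order.
  static List<WeightedNode> order(List<WeightedNode> nodes, Random random) {
    List<WeightedNode> copy = new ArrayList<>(nodes);
    // Shuffle first so that the input order carries no information.
    Collections.shuffle(copy, random);
    // List.sort is stable, so nodes with equal weight keep their shuffled order.
    copy.sort(Comparator.comparingInt((WeightedNode n) -> n.weight));
    return copy;
  }
}
```

Passing a seeded `Random` keeps a test reproducible while still avoiding a fixed preference for whichever nodes happen to come first in the input.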

We're going a different direction that does not use an OrderedNodePlacementPlugin foundation; we will not return to this matter to fix it, unfortunately.

RE Collection listing: IMO it should definitely continue to be supported. My objection is producing java.util.Collection or Iterable or Map of basically any aggregate of the state of a collection (e.g. SolrCollection, DocCollection, etc.). List collections by name, then force the caller to resolve a name to a state if it must. A bit of API friction can be a good thing where we know there are performance issues. I suppose we shall agree to disagree as usual, Ilan.

@epugh
Contributor

epugh commented May 22, 2024

One thought I had the other day is that I've seen plenty of APIs that take a range when you list things.... Without the range or size parameter, you get X, but you can control that by specifying some other counter.... Would that allow folks with just a small number of collections to have simplicity, but if you have 1000s, well then you want to use a range to work your way through? Kind of like rows and start ;-)
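A range-based listing in the spirit of `start` and `rows` might look like the sketch below. `CollectionPager` and its default page size of 10 are assumptions made for illustration; nothing here is an existing Solr API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pager over a stable, sorted list of collection names.
final class CollectionPager {
  // Assumed default page size, used when the caller gives no range.
  static final int DEFAULT_ROWS = 10;

  // Returns at most `rows` names starting at offset `start`.
  // An offset past the end yields an empty page rather than an error.
  static List<String> page(List<String> sortedNames, int start, int rows) {
    if (start < 0 || rows < 0) {
      throw new IllegalArgumentException("start and rows must be non-negative");
    }
    int from = Math.min(start, sortedNames.size());
    // Widen to long so that start + rows cannot overflow.
    int to = (int) Math.min((long) from + rows, (long) sortedNames.size());
    return new ArrayList<>(sortedNames.subList(from, to));
  }

  // Simple case: no range given, so the caller gets the first default-sized page.
  static List<String> page(List<String> sortedNames) {
    return page(sortedNames, 0, DEFAULT_ROWS);
  }
}
```

A small cluster is fully covered by the no-argument form whenever it has no more collections than the default page size, while a large one is walked by advancing `start` by `rows` until an empty page comes back.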

4 participants