New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-14680: Provide simple interfaces to our cloud classes (only API) #1694
Conversation
Can we start simple by defining the external interface without the required changes to internal classes first? Will make a smaller PR easier to discuss. |
I also feel this PR has a lot of overlap with the one I started to define an API for Autoscaling plugins (#1684). Unless we want to pursue two distinct options for these interfaces and decide later (perfectly OK if that’s the idea) I suggest we use the other PR that has also examples of how these interfaces are used (makes it easier to reason about them and decide what goes where). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that this specifies a read-only view. Made a comment regarding naming conventions elsewhere.
|
||
String baseUrl(boolean isV2); | ||
|
||
SimpleMap<ShardReplica> cores(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly good to also answer the question of "what collections are participating in this node" and what shards of Collection X are on this node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say, that is something that can be easily computed from the cores()
method and can be provided as a helper method elsewhere
*/ | ||
public interface SolrCluster { | ||
/** collections in the cluster */ | ||
SimpleMap<SolrCollection> collections(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also aliases... and the Alias class returned should list the collections provided and routing info if routed. (also law of Demeter etc...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I wanted to include aliases. Where do you propose it to be there?
I would say , There should be no Alias
class/interface.
It should be as simple as
SolrCluster#collections#get(String name , boolean includeAlias)
and
SolrCollection#aliases() returns List<String>
import org.apache.solr.common.util.SimpleMap; | ||
|
||
/** Represents a collection in Solr */ | ||
public interface SolrCollection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I've definitely had cases where I wanted a list of nodes where this collection is hosted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be a utility method outside
/** | ||
* Represents a Solr cluster | ||
*/ | ||
public interface SolrCluster { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have our own package namespace, prepending solr isn't really needed unless we think we might also model non-solr clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. Imagine this will also be used outside of Solr as a part of SolrJ. So if you read some client code,
Cluster cluster;
is less readable compared to
SolrCluster solrCluster
;
Eventually, I would wish to replace a lot of SolrJ code/API with a standard set of interfaces.
* Solr node | ||
*/ | ||
String thisNode(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've wondered sometimes if clusters should have name or id of some sort but that's probably another topic.
|
||
import java.util.LinkedHashMap; | ||
|
||
public class LinkedSimpleHashMap<T> extends LinkedHashMap<CharSequence, T> implements SimpleMap<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this class appears unused...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this was to demonstrate how to implement a default impl of SimpleMap
@@ -38,7 +43,7 @@ | |||
* {@link ZkStateReader#getClusterState()}. | |||
* @lucene.experimental | |||
*/ | |||
public class ClusterState implements JSONWriter.Writable { | |||
public class ClusterState implements JSONWriter.Writable , SolrCluster { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Little worried that this design allows for casting of the SolrCluster reference...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have multiple implementations of SolrCluster
.The idea of existing classes implementing these interfaces is to have readily available implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename Solr luster to Cluster.
Snitches have a target (at least node or replica, but one could imagine Cluster, collection or shard if such snitches do not already exist). (See in https://github.com/apache/lucene-solr/pull/1684/files#diff-fc8bc5eb94b9e48a24c2ec768733e72b) |
As I suggested in the other PR on autoscaling, I feel we should change the package name: Also, WDYT about marking the concrete classes DocCollection, Replica etc. with @lucene.internal to make it clear that they shouldn't be used directly from the plugins? Alternatively, we can mark these new interfaces in this PR with a new annotation (e.g. @solr.external) that signals to plugin writers that these will be stable for their use across versions. |
While marking things internal and external is good, a stronger possibility is to code more defensively and make "new plugins" get instantiated/injected with implementations of a facade that return instances of the interfaces. If those implementations can't just be cast to internal types we then have a much higher barrier to abuse (reflection). As for package names I agree we need to think carefully about them. It is a question what the goal is too. If we're reworking the internals, sdk won't sound right at all. If we are building something only meant to be consumed by an outer layer of plugins to insulate the plugins from internal refactoring, something in that vein sounds fine (or maybe o.a.s.cloud.plugin.api?) Settling on what our goals are will help us decide what name is best. |
Feel free to ignore the implementing classes. That was to demonstrate one way of implementing these interfaces. I'm happy to focus on the interfaces inside the I have updated the description. Only the SDK is important. Implementation can/will be removed |
Yes, totally.That ticket is trying to do something more than the scope of that ticket. I want to limit the scope of that ticket and have wider discussion on how a simple set of interfaces to represent the existing Solr cluster/cloud |
There is no way to have a "machine" abstraction in Solr. We can start with a |
Yeah machine is a difficult notion as it's a real world thing entirely outside java. It probably has to be optional and provided by sysprop or env var. and even so might be reducible to a property on node. Certainly detecting it would be fraught with peril. But machine designations (and rack designations) might be important to some folks. |
Many methods returning other objects are returning names of other abstractions being defined here (Shard, Collection, Node etc.). |
I see two cases where we need interfaces such as defined here:
What’s the intention? If only the later, then there’s really no need to have the internal classes implement these interfaces, a wrapper is ok. |
+1 to |
|
||
String name(); | ||
|
||
/**set of files in the config */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General : Please format Javadocs properly with text starting on line 2 and capital letter, even if it takes more lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A one-line variant is acceptable: https://google.github.io/styleguide/javaguide.html#s7.1.1-javadoc-multi-line
The only problem I see with the javadoc here is that it's missing a leading space, and it should capitalize the first letter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, agree
|
||
import org.apache.solr.common.SolrException; | ||
|
||
public interface Config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ConfigSet
instead? Config
is soooo overloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like both names , Config
& ConfigSet
configset
suggests it is a set of configs? in reality it's a single configuration which has multiple files
Machine vs Node: environment variables (rack, availability zone etc) are "machine" abstractions but can be considered at the node level. Free disk is clearly a machine abstraction (two nodes on same machine might be sharing the same free disk and skewing decisions based on it) but I don't see how this can be solved (even a single node might be sharing space with other processes). |
Would love to see sample plug-in code accessing cluster config. How the instances are obtained and how the cluster is explored (nodes, collections, shards, replicas). |
@murblanc I shall try to provide some sample implementation in another 2-3 days |
Opening another PR |
I intend to merge this soon |
A few notes before anyone who starts reviewing this
o.a.s.cluster.api
package is optional and will be removed