SOLR-18179: Better highlight and expand upon our Cluster concepts in the Ref Guide#4246
epugh wants to merge 6 commits into apache:main from
Co-authored-by: epugh <22395+epugh@users.noreply.github.com>
…and be more explicit
| @@ -96,7 +96,7 @@ The Files screen in the Admin UI lets you browse & view configuration files (suc | |||
| .The Files Screen | |||
| image::configuration-files/files-screen.png[Files screen,height=400] | |||
| If you are using xref:deployment-guide:cluster-types.adoc#solrcloud-mode[SolrCloud], the files displayed are the configuration files for this collection stored in ZooKeeper. | |||
| If you are using xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud], the files displayed are the configuration files for this collection stored in ZooKeeper. | |||
| In user-managed clusters or single-node installations, all files in the `conf` directory are displayed. | |||
Let's use "standalone" instead of "single-node" installations.
| @@ -21,7 +21,7 @@ This screen provides status information about each collection & node in your clu | |||
| .Only Visible When using SolrCloud | |||
| [NOTE] | |||
| ==== | |||
| The "Cloud" menu option is only available when Solr is running xref:cluster-types.adoc#solrcloud-mode[SolrCloud]. | |||
| The "Cloud" menu option is only available when Solr is running xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud]. | |||
| User-managed clusters or single-node installations will not display this option. | |||
| There are two general modes of operating a cluster of Solr nodes. | ||
| One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). | ||
| TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. |
Not just in source code but very much publicly as well. I would remove the "especially" part.
I would further state we don't need to talk about "single node" as if it's some kind of Solr term or deployment type. We already have a word for this, and we all know what that word is ;-)
| A _server_ is the hardware or virtual machine that hosts Solr software. | ||
| A _node_ is an instance of a running Solr process that services search and indexing requests. | ||
| Large servers may run multiple Solr nodes, though typically one node per server is most common. |
| Large servers may run multiple Solr nodes, though typically one node per server is most common. | |
| Although servers may run multiple nodes, it makes more sense to avoid that. |
Or just be specific?
In special cases where oversized pre-existing hardware must be utilized, a server might host two or more nodes. Note that such configurations are typically sub-optimal.
| === Shards | ||
| In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. |
| In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. | |
| In both cluster modes, a logical collection of documents can be divided into logical _shards_. |
I don't think the word "logical" actually adds clarity.
FWIW I disagree: I find the word "logical" and its opposite, "physical", useful for distinguishing the indirect, almost abstract, from the more tangible and real.
I feel like we should be saying something along the lines of "shards are logical divisions of collections", and not saying "logical shards" since that is like saying wet water...
| Solr's user-managed mode requires that cluster coordination activities that SolrCloud normally uses ZooKeeper for be performed manually or with local scripts. | ||
| If the corpus of documents is too large for a single shard, the logic to create multiple shards is entirely left to the user. | ||
| There are no automated or programmatic ways for Solr to create shards during indexing. |
This sentence could be dropped. The "during indexing" addition kind of confuses me... is a point trying to be made about Solr specifically during indexing?
| A follower replica could continue to serve queries if the queries were specifically directed to it. | ||
| Promoting a follower replica to serve as the leader would require changing `solrconfig.xml` configurations on all replicas and reloading each core. | ||
| User-managed mode has no concept of a collection as a managed entity, so for all intents and purposes each Solr core is configured and managed independently. |
This sentence is absolutely key, and the next one is good too. Let's elevate them to the top of the "User-Managed Clusters" section.
| As long as one replica of each relevant shard is available, a user query or indexing request can still be satisfied when running in SolrCloud mode. | ||
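The availability rule in the line above can be modeled in a few lines. This is a hypothetical, simplified sketch, not Solr code: `can_serve`, the `cluster` dict, and the node names are illustrative assumptions, and "available" is reduced to "the hosting node is live", ignoring replica states and leader requirements for indexing.

```python
def can_serve(shard_replicas, live_nodes, shards_needed):
    """Return True if every shard the request touches has at least one live replica.

    shard_replicas maps a shard name to the nodes hosting its replicas
    (a hypothetical, simplified view of cluster state).
    """
    return all(
        any(node in live_nodes for node in shard_replicas[shard])
        for shard in shards_needed
    )


# Hypothetical collection: 2 shards, 2 replicas per shard.
cluster = {
    "shard1": ["node1", "node2"],
    "shard2": ["node3", "node4"],
}

print(can_serve(cluster, {"node1", "node3"}, ["shard1", "shard2"]))  # True
print(can_serve(cluster, {"node1", "node2"}, ["shard1", "shard2"]))  # False: no live replica of shard2
print(can_serve(cluster, {"node1", "node2"}, ["shard1"]))            # True: only shard1 is relevant
```

Losing any single node in this layout still leaves every shard with a live replica, which is the point the line makes about redundancy.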
| == User-Managed Mode |
| == User-Managed Mode | |
| == User-Managed Clusters |
| In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk. | ||
| == SolrCloud Mode |
| == SolrCloud Mode | |
| == SolrCloud Clusters |
| == SolrCloud Mode | ||
| SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. |
| SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. | |
| A SolrCloud cluster (or simply "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. |
gus-asf left a comment
Overall I like it but I think we can do more to unify the terminology on replica.
| === Shards | ||
| In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. | ||
| Each shard represents a logical slice of the overall collection and contains a subset of the documents. |
Or maybe:
Shards slice a collection of documents into discrete, non-overlapping subsets, based either on data values you specify or on ranges of a hash of the document ID.
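The hash-range option in the suggested wording can be illustrated with a toy model. It is a sketch under stated assumptions: `zlib.crc32` stands in for whatever hash Solr's document router actually uses, and `shard_ranges` and `shard_for` are hypothetical helpers, not Solr APIs. It only shows why each document ID lands in exactly one shard: the ranges are contiguous, cover the whole hash space, and never overlap.

```python
import zlib

HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit integer


def shard_ranges(num_shards):
    """Split the hash space into contiguous, non-overlapping (start, end) ranges."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def shard_for(doc_id, ranges):
    """Return the index of the single range containing the hash of doc_id."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash not covered by any range")


ranges = shard_ranges(2)
print(ranges)  # [(0, 2147483647), (2147483648, 4294967295)]
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "-> shard", shard_for(doc_id, ranges))
```

Routing on "data values you specify" would replace the hash lookup with a mapping from a chosen field value to a shard; the non-overlap property is the same.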
| A shard is a logical concept—a slice of your collection. | ||
| A _replica_ is the physical manifestation of that logical shard. | ||
| It is the actual running instance that holds and serves the documents belonging to that shard. |
Using "likely" raises the question: when does it not have an update log? Can we clarify when, or omit it?
Discussion of what a node does seems better placed in the node section (with the word "replica" as a hyperlink to this section)?
"SolrCore" is a class in the code; details like that are developer documentation, not relevant to the user. Is there any need to say anything other than "replica" here?
I do like the idea of noting here that there is one Lucene index per replica, but it seems better (to me) to stay focused on the idea behind "replica", not the implementation.
| The replicas ARE how the shard exists. | ||
| This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. | ||
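The counting rule in the lines above, applied to the example of a collection with 2 shards of 2 replicas each, can be sketched as a small data model. The classes, collection name, and node names are hypothetical illustrations, not Solr's internal classes; the point is that a shard is only a label grouping its replicas, so the number of physical copies equals the number of replicas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shard:
    name: str
    # Each entry is one replica: a physical copy, with its own index, on some node.
    replicas: List[str] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    shards: List[Shard] = field(default_factory=list)

    def physical_copies(self):
        """Total number of replicas (physical indexes) across all shards."""
        return sum(len(shard.replicas) for shard in self.shards)


# 2 shards x 2 replicas; the shards hold no data beyond their replicas.
products = Collection("products", [
    Shard("shard1", ["node1", "node2"]),
    Shard("shard2", ["node2", "node3"]),
])
print(products.physical_copies())  # 4 (2 shards x 2 replicas), not 6
```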
| All replicas of the same shard contain the same subset of documents and share the same configuration. |
+1 and hyperlink "collection" to the section below
| @@ -96,6 +105,11 @@ The arrangement of search results into categories based on indexed terms. | |||
| [[field]]Field:: | |||
| The content to be indexed/searched along with metadata defining how the content should be processed by Solr. | |||
| [[follower]]Follower:: | |||
| A <<replica,Replica>> that is not the <<leader,Leader>> for its <<shard,Shard>>. | |||
This is replica-level in SolrCloud and node-level in standalone mode, which should probably be called out and clarified.
| @@ -114,8 +132,8 @@ Since users search using terms they expect to be in documents, finding the term | |||
| === L | |||
| [[leader]]Leader:: | |||
| A single <<replica,Replica>> for each <<shard,Shard>> that takes charge of coordinating index updates (document additions or deletions) to other replicas in the same shard. | |||
| This is a transient responsibility assigned to a node via an election, if the current Shard Leader goes down, a new node will automatically be elected to take its place. | |||
| A single <<replica,Replica>> for each <<shard,Shard>> that serves as the source-of-truth and coordinates index updates (document additions or deletions) to the <<follower,follower>> replicas in the same shard. | |||
Again, cloud/standalone differences.
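The election described in the Leader definition above can be illustrated with a toy model loosely based on the common ZooKeeper leader-election recipe: each candidate replica registers with an increasing sequence number, and the live candidate with the lowest number leads. This is a hedged sketch; `elect_leader`, the replica names, and the sequence numbers are hypothetical, and it omits how SolrCloud actually registers candidates in ZooKeeper and any checks it performs before a replica takes over.

```python
def elect_leader(candidates, live):
    """Pick the live candidate that registered first (lowest sequence number).

    candidates maps a replica name to the sequence number it registered with.
    Returns None if no candidate is live.
    """
    live_candidates = [name for name in candidates if name in live]
    if not live_candidates:
        return None
    return min(live_candidates, key=lambda name: candidates[name])


candidates = {"replica1": 3, "replica2": 1, "replica3": 2}
live = {"replica1", "replica2", "replica3"}

print(elect_leader(candidates, live))  # replica2
live.discard("replica2")               # the current leader goes down
print(elect_leader(candidates, live))  # replica3 takes over
```

Because leadership is recomputed from whoever is still live, it is a transient role rather than a fixed property of any replica, which is what the definition emphasizes.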
| @@ -163,7 +181,11 @@ The ability of a search engine to retrieve _all_ of the possible matches to a us | |||
| The appropriateness of a document to the search conducted by the user. | |||
| [[replica]]Replica:: | |||
| A <<core,Core>> that acts as a physical copy of a <<shard,Shard>> in a <<solrclouddef,SolrCloud>> <<collection,Collection>>. | |||
| The physical manifestation of a logical <<shard,Shard>>. | |||
| A replica is the actual running instance (represented as a <<core,Core>>) that holds and serves the documents belonging to that shard. | |||
No need to mention core here, we should promote one favored name for each entity.
| In SolrCloud, a logical partition of a single <<collection,Collection>>. | ||
| Every shard consists of at least one physical <<replica,Replica>>, but there may be multiple Replicas distributed across multiple <<node,Nodes>> for fault tolerance. | ||
| A logical slice of a <<collection,Collection>>. | ||
| Each shard represents a logical partition containing a subset of the collection's documents. |
| Each shard represents a logical partition containing a subset of the collection's documents. | |
| Each shard represents a partition containing a subset of the collection's documents. |
| @@ -213,6 +240,12 @@ Synonyms generally are terms which are near to each other in meaning and may sub | |||
| In a search engine implementation, synonyms may be abbreviations as well as words, or terms that are not consistently hyphenated. | |||
| Examples of synonyms in this context would be "Inc." and "Incorporated" or "iPod" and "i-pod". | |||
| [[standalone]]Standalone:: | |||
| An informal term referring to Solr deployments that do not use <<solrclouddef,SolrCloud>> mode. | |||
| An informal term referring to Solr deployments that do not use <<solrclouddef,SolrCloud>> mode. | |
| An informal term referring to Solr deployments that do not use Apache ZooKeeper and thus do not provide the centralized configuration management that is available in <<solrclouddef,SolrCloud>> mode. |
"deployments" -> "nodes", to reinforce that this is a mode of a Solr process. It is not a characterization of an entire "deployment", which suggests a cluster. I'm cool with the term "user managed" existing to describe such a cluster.
https://issues.apache.org/jira/browse/SOLR-18179
Description
Naming is hard, and our Ref Guide treats "standalone versus cloud" as an "oh, time to deploy" decision. But it's an up-front decision, since you use different APIs and different mental models for defining how your schema evolves and works, so let's move it to the top.
If we ever manage to unify standalone and cloud mode into just one "Solr mode", then the whole "how do I scale up" question could perhaps become a deployment decision... But today it's your biggest up-front decision.
Solution
Input from Gus's rant email ;-) plus some Claude to help me understand the gaps and differences. Move the cluster types page into Getting Started and add more glossary entries.
Tests
Manual reading of the Ref Guide.