SOLR-18179: Better highlight and expand upon our Cluster concepts in the Ref Guide by epugh · Pull Request #4246 · apache/solr

epugh · 2026-03-29T13:17:33Z

https://issues.apache.org/jira/browse/SOLR-18179

Description

Naming is hard, and our Ref Guide treats "standalone versus cloud" as an "oh, time to deploy" decision. But it's an up-front decision, because you use different APIs and different mental models for defining how your schema evolves and works, so let's move it to the top.

If we ever manage to unify standalone and cloud mode into just one "Solr mode", then the whole "how do I scale up" question could perhaps become a deployment decision... But today it's your biggest up-front decision.

Solution

Input from Gus's rant email ;-) plus some Claude to help me understand the gaps and differences. Move the cluster types page into Getting Started, update it, and add more glossary entries.

Tests

manual ref guide reading.

Co-authored-by: epugh <22395+epugh@users.noreply.github.com>


dsmiley · 2026-03-29T13:40:13Z

solr/solr-ref-guide/modules/configuration-guide/pages/configuration-files.adoc

@@ -96,7 +96,7 @@ The Files screen in the Admin UI lets you browse & view configuration files (suc
 .The Files Screen
 image::configuration-files/files-screen.png[Files screen,height=400]

-If you are using xref:deployment-guide:cluster-types.adoc#solrcloud-mode[SolrCloud], the files displayed are the configuration files for this collection stored in ZooKeeper.
+If you are using xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud], the files displayed are the configuration files for this collection stored in ZooKeeper.
 In user-managed clusters or single-node installations, all files in the `conf` directory are displayed.


Let's use "standalone" instead of "single-node" installations.

dsmiley · 2026-03-29T13:40:56Z

solr/solr-ref-guide/modules/deployment-guide/pages/cloud-screens.adoc

@@ -21,7 +21,7 @@ This screen provides status information about each collection & node in your clu
 .Only Visible When using SolrCloud
 [NOTE]
 ====
-The "Cloud" menu option is only available when Solr is running xref:cluster-types.adoc#solrcloud-mode[SolrCloud].
+The "Cloud" menu option is only available when Solr is running xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud].
 User-managed clusters or single-node installations will not display this option.


Again, use "standalone".

dsmiley · 2026-03-29T13:43:14Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+There are two general modes of operating a cluster of Solr nodes.
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>).
+
+TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code.


Not just in source code but very much publicly as well. I would remove the "especially" part.

I would further state we don't need to talk about "single node" as if it's some kind of Solr term or deployment type. We already have a word for this, and we all know what that word is ;-)

dsmiley · 2026-03-29T13:46:54Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per server is most common.


Suggested change

Large servers may run multiple Solr nodes, though typically one node per server is most common.

Although servers may run multiple nodes, it makes more sense to avoid that.

Or just be specific?

In special cases where oversized pre-existing hardware must be utilized, a server might host two or more nodes. Note that such configurations are typically sub-optimal.

dsmiley · 2026-03-29T13:48:29Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.


Suggested change

In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.

In both cluster modes, a logical collection of documents can be divided into logical _shards_.

I don't think the word logical actually adds clarity.

FWIW I disagree and find that the word "logical" and its opposite, "physical", are useful for distinguishing the indirect, almost abstract from the more tangible and real.

I feel like we should be saying something along the lines of "shards are logical divisions of collections", and not saying "logical shards" since that is like saying wet water...

dsmiley · 2026-03-29T14:30:51Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+Solr's user-managed mode requires that cluster coordination activities that SolrCloud normally uses ZooKeeper for be performed manually or with local scripts.
+
+If the corpus of documents is too large for a single shard, the logic to create multiple shards is entirely left to the user.
+There are no automated or programmatic ways for Solr to create shards during indexing.


This sentence could be dropped. The "during indexing" addition kind of confuses me... is a point trying to be made about Solr specifically during indexing?

dsmiley · 2026-03-29T14:34:30Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+A follower replica could continue to serve queries if the queries were specifically directed to it.
+Promoting a follower replica to serve as the leader would require changing `solrconfig.xml` configurations on all replicas and reloading each core.
+
+User-managed mode has no concept of a collection as a managed entity, so for all intents and purposes each Solr core is configured and managed independently.


This sentence is absolutely key, and the next is good too. Let's elevate them to the top of "user managed clusters".
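To illustrate the "different APIs" point, here is a minimal sketch (not part of the PR) that builds the create request for each mode. It assumes the v1 HTTP endpoints as I understand them from the Ref Guide: the Collections API (`/admin/collections`) in SolrCloud and the CoreAdmin API (`/admin/cores`) in user-managed mode. The base URL, the names `films` and `films_shard1`, and both helper functions are hypothetical; check the parameter names against your Solr version.

```python
from urllib.parse import urlencode

# Assumed default base URL of a local Solr node (hypothetical for this sketch).
BASE = "http://localhost:8983/solr"


def create_collection_url(base, name, num_shards, replication_factor):
    """SolrCloud: one request describes the whole collection; Solr places the replicas."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def create_core_url(base, name, instance_dir):
    """User-managed: each core is created and configured on its own node, one at a time."""
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    return f"{base}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    print(create_collection_url(BASE, "films", 2, 2))
    print(create_core_url(BASE, "films_shard1", "films_shard1"))
```

In the first case the collection is the managed entity; in the second there is only the core, which is the distinction the highlighted sentence draws.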

dsmiley · 2026-03-29T14:34:40Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+
+As long as one replica of each relevant shard is available, a user query or indexing request can still be satisfied when running in SolrCloud mode.
+
+== User-Managed Mode


Suggested change

== User-Managed Mode

== User-Managed Clusters

dsmiley · 2026-03-29T14:34:54Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+
+In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk.
+
+== SolrCloud Mode


Suggested change

== SolrCloud Mode

== SolrCloud Clusters

dsmiley · 2026-03-29T14:35:30Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+
+== SolrCloud Mode
+
+SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature.


Suggested change

SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature.

A SolrCloud cluster (or simply "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature.

gus-asf

Overall I like it but I think we can do more to unify the terminology on replica.

gus-asf · 2026-03-29T16:08:09Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per server is most common.


Or just be specific?

In special cases where oversized pre-existing hardware must be utilized, a server might host two or more nodes. Note that such configurations are typically sub-optimal.

gus-asf · 2026-03-29T16:09:46Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.


I don't think the word logical actually adds clarity.

gus-asf · 2026-03-29T18:14:07Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.
+Each shard represents a logical slice of the overall collection and contains a subset of the documents.


Or maybe:

Shards slice a collection of documents into discrete non-overlapping subsets, and may be based on data values you specify or ranges of a hash on the document ID.
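As a rough illustration of "ranges of a hash on the document ID", the sketch below splits a 32-bit hash space into contiguous, non-overlapping ranges and routes each document to the one range that contains its hash. This is only a sketch: `zlib.crc32` is a stand-in chosen for portability, not the hash Solr uses, and `shard_ranges` and `route` are hypothetical helper names.

```python
import zlib


def shard_ranges(num_shards, hash_space=2**32):
    """Split [0, hash_space) into num_shards contiguous, non-overlapping ranges."""
    step = hash_space // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = hash_space if i == num_shards - 1 else (i + 1) * step
        ranges.append((start, end))
    return ranges


def route(doc_id, ranges):
    """Return the index of the single shard whose range contains hash(doc_id)."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash, unsigned 32-bit
    for idx, (start, end) in enumerate(ranges):
        if start <= h < end:
            return idx
    raise ValueError("hash fell outside every range")


if __name__ == "__main__":
    ranges = shard_ranges(2)
    for doc_id in ("doc-1", "doc-2", "doc-3"):
        print(doc_id, "-> shard", route(doc_id, ranges))
```

Because the ranges do not overlap and together cover the whole hash space, every document ID lands in exactly one shard, and the same ID always lands in the same shard.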

gus-asf · 2026-03-29T19:23:06Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+
+A shard is a logical concept—a slice of your collection.
+A _replica_ is the physical manifestation of that logical shard.
+It is the actual running instance that holds and serves the documents belonging to that shard.


"likely" raises the question when does it not have an update log? Can we clarify when or omit?

Discussion of what a node does seems better placed in the node section? (with the word replica as a hyperlink to this section).

"SolrCore" is a class in the code, details like that are developer documentation, not relevant to the user. No need to say anything other than "replica" here?

I do like the idea of noting that there is one Lucene index per replica here, but it seems better (to me) to remain focused on the idea behind "replica", not the implementation.

gus-asf · 2026-03-29T19:25:07Z

solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc

+The replicas ARE how the shard exists.
+This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies.
+
+All replicas of the same shard contain the same subset of documents and share the same configuration.


+1 and hyperlink "collection" to the section below
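The counting rule in the quoted text ("a shard with 2 replicas" means 2 physical copies in total) can be made concrete with a toy model. This is an illustrative sketch only: the `Shard` class, the helper functions, and the replica names are hypothetical and do not reflect Solr's actual naming or internals.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """A logical slice of a collection; it exists only through its replicas."""
    name: str
    replicas: list = field(default_factory=list)  # names of the physical copies


def build_collection(num_shards, replicas_per_shard):
    """Model a collection as a list of shards, each holding its replica names."""
    shards = []
    for s in range(1, num_shards + 1):
        shard = Shard(name=f"shard{s}")
        for r in range(1, replicas_per_shard + 1):
            shard.replicas.append(f"shard{s}_replica{r}")
        shards.append(shard)
    return shards


def total_indexes(shards):
    """Each replica maintains its own index, so count replicas, not shards."""
    return sum(len(shard.replicas) for shard in shards)


if __name__ == "__main__":
    collection = build_collection(num_shards=2, replicas_per_shard=2)
    print(total_indexes(collection))  # 2 shards x 2 replicas each
```

There is no separate "original" in the model: removing every replica of a shard leaves nothing that holds that shard's documents.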

gus-asf · 2026-03-29T20:55:10Z

solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc

@@ -96,6 +105,11 @@ The arrangement of search results into categories based on indexed terms.
 [[field]]Field::
 The content to be indexed/searched along with metadata defining how the content should be processed by Solr.

+[[follower]]Follower::
+A <<replica,Replica>> that is not the <<leader,Leader>> for its <<shard,Shard>>.


This is replica level in cloud and node level in standalone, which probably should be called out and clarified.

gus-asf · 2026-03-29T20:55:42Z

solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc

@@ -114,8 +132,8 @@ Since users search using terms they expect to be in documents, finding the term
 === L

 [[leader]]Leader::
-A single <<replica,Replica>> for each <<shard,Shard>> that takes charge of coordinating index updates (document additions or deletions) to other replicas in the same shard.
-This is a transient responsibility assigned to a node via an election, if the current Shard Leader goes down, a new node will automatically be elected to take its place.
+A single <<replica,Replica>> for each <<shard,Shard>> that serves as the source-of-truth and coordinates index updates (document additions or deletions) to the <<follower,follower>> replicas in the same shard.


Again cloud/standalone differences

gus-asf · 2026-03-29T20:57:28Z

solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc

@@ -163,7 +181,11 @@ The ability of a search engine to retrieve _all_ of the possible matches to a us
 The appropriateness of a document to the search conducted by the user.

 [[replica]]Replica::
-A <<core,Core>> that acts as a physical copy of a <<shard,Shard>> in a <<solrclouddef,SolrCloud>> <<collection,Collection>>.
+The physical manifestation of a logical <<shard,Shard>>.
+A replica is the actual running instance (represented as a <<core,Core>>) that holds and serves the documents belonging to that shard.


No need to mention core here, we should promote one favored name for each entity.

gus-asf · 2026-03-29T20:58:53Z

solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc

-In SolrCloud, a logical partition of a single <<collection,Collection>>.
-Every shard consists of at least one physical <<replica,Replica>>, but there may be multiple Replicas distributed across multiple <<node,Nodes>> for fault tolerance.
+A logical slice of a <<collection,Collection>>.
+Each shard represents a logical partition containing a subset of the collection's documents.


Suggested change

Each shard represents a logical partition containing a subset of the collection's documents.

Each shard represents a partition containing a subset of the collection's documents.

gus-asf · 2026-03-29T21:04:27Z

solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc

@@ -213,6 +240,12 @@ Synonyms generally are terms which are near to each other in meaning and may sub
 In a search engine implementation, synonyms may be abbreviations as well as words, or terms that are not consistently hyphenated.
 Examples of synonyms in this context would be "Inc." and "Incorporated" or "iPod" and "i-pod".

+[[standalone]]Standalone::
+An informal term referring to Solr deployments that do not use <<solrclouddef,SolrCloud>> mode.


Suggested change

An informal term referring to Solr deployments that do not use <<solrclouddef,SolrCloud>> mode.

An informal term referring to Solr deployments that do not utilize Apache ZooKeeper and thus do not provide the centralized configuration management that is available in <<solrclouddef,SolrCloud>> mode.

"deployments" -> "nodes" to reinforce that this is a mode of a Solr process. It is not a characterization of an entire "deployment", which is suggestive of a cluster. I'm cool with the term "user managed" existing to describe such a cluster.

Copilot AI and others added 6 commits March 13, 2026 02:57

Initial plan

89a86a6

Move Cluster Concepts section under Solr Concepts in reference guide

d2f4246

Co-authored-by: epugh <22395+epugh@users.noreply.github.com>

place holder changelog, will update if we make any big moves

c807443

Bring in new concept of servers and nodes, rework definitions to try and be more explicit

6c87297

expand/update/add to glossary based on the solr cluster concepts

a7735c7

add in common terms standalone and user managed.

48e1f9d

epugh requested a review from gus-asf March 29, 2026 13:17

github-actions bot added the documentation Improvements or additions to documentation label Mar 29, 2026

epugh requested review from dsmiley, gerlowskija and janhoy March 29, 2026 13:17

dsmiley reviewed Mar 29, 2026


gus-asf reviewed Mar 29, 2026



4 participants