SOLR-18179: Better highlight and expand upon our Cluster concepts in the Ref Guide #4246

Open
epugh wants to merge 6 commits into apache:main from epugh:copilot/move-cluster-concepts-section

@epugh epugh commented Mar 29, 2026

https://issues.apache.org/jira/browse/SOLR-18179

Description

Naming is hard, and our ref guide treats "standalone versus cloud" as an "oh, time to deploy" decision. But it's an up-front decision, as you use different APIs and different mental models for defining how your schema evolves and works, so let's move it to the top.

If we ever manage to unify standalone and cloud mode into just one "solr mode", then the whole "how do I scale up" thing could maybe be a deployment decision... But today it's your biggest up-front decision.

Solution

Input from Gus's rant email ;-) plus some Claude to help me understand the gaps and differences. Update the cluster types page, move it into Getting Started, and add more glossary entries.

Tests

manual ref guide reading.

@epugh epugh requested a review from gus-asf March 29, 2026 13:17
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 29, 2026
@epugh epugh requested review from dsmiley, gerlowskija and janhoy March 29, 2026 13:17
@@ -96,7 +96,7 @@ The Files screen in the Admin UI lets you browse & view configuration files (suc
.The Files Screen
image::configuration-files/files-screen.png[Files screen,height=400]

-If you are using xref:deployment-guide:cluster-types.adoc#solrcloud-mode[SolrCloud], the files displayed are the configuration files for this collection stored in ZooKeeper.
+If you are using xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud], the files displayed are the configuration files for this collection stored in ZooKeeper.
In user-managed clusters or single-node installations, all files in the `conf` directory are displayed.
Contributor

Let's use "standalone" instead of "single-node" installations.

@@ -21,7 +21,7 @@ This screen provides status information about each collection & node in your clu
.Only Visible When using SolrCloud
[NOTE]
====
-The "Cloud" menu option is only available when Solr is running xref:cluster-types.adoc#solrcloud-mode[SolrCloud].
+The "Cloud" menu option is only available when Solr is running xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud].
User-managed clusters or single-node installations will not display this option.
Contributor

again; use "standalone"

There are two general modes of operating a cluster of Solr nodes.
One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>).

TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code.
Contributor

Not just in source code but very much publicly as well. I would remove the "especially" part.

Contributor

I would further state we don't need to talk about "single node" as if it's some kind of Solr term or deployment type. We already have a word for this, and we all know what that word is ;-)


A _server_ is the hardware or virtual machine that hosts Solr software.
A _node_ is an instance of a running Solr process that services search and indexing requests.
Large servers may run multiple Solr nodes, though typically one node per server is most common.
Contributor

Suggested change
-Large servers may run multiple Solr nodes, though typically one node per server is most common.
+Although servers may run multiple nodes, it makes more sense to avoid that.

Contributor

Or just be specific?

In special cases where oversized pre-existing hardware must be utilized, a server might host two or more nodes. Note that such configurations are typically sub-optimal.


=== Shards

In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.
Contributor

Suggested change
-In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.
+In both cluster modes, a logical collection of documents can be divided into logical _shards_.

Contributor

I don't think the word logical actually adds clarity.

Contributor

FWIW I disagree and find that the word "logical" and its opposite, "physical", are useful for distinguishing the indirect / almost abstract from the more tangible / real.

Contributor

I feel like we should be saying something along the lines of "shards are logical divisions of collections", and not saying "logical shards" since that is like saying wet water...

Solr's user-managed mode requires that cluster coordination activities that SolrCloud normally uses ZooKeeper for be performed manually or with local scripts.

If the corpus of documents is too large for a single shard, the logic to create multiple shards is entirely left to the user.
There are no automated or programmatic ways for Solr to create shards during indexing.
Contributor

This sentence could be dropped. The "during indexing" addition kind of confuses me... is a point trying to be made about Solr specifically during indexing?

A follower replica could continue to serve queries if the queries were specifically directed to it.
Promoting a follower replica to serve as the leader would require changing `solrconfig.xml` configurations on all replicas and reloading each core.

User-managed mode has no concept of a collection as a managed entity, so for all intents and purposes each Solr core is configured and managed independently.
Contributor

This sentence is absolutely key, and the next good too. Let's elevate them to the top of "user managed clusters".


As long as one replica of each relevant shard is available, a user query or indexing request can still be satisfied when running in SolrCloud mode.

== User-Managed Mode
Contributor

Suggested change
-== User-Managed Mode
+== User-Managed Clusters


In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk.

== SolrCloud Mode
Contributor

Suggested change
-== SolrCloud Mode
+== SolrCloud Clusters


== SolrCloud Mode

SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature.
Contributor

Suggested change
-SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature.
+A SolrCloud cluster (or simply "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature.

Contributor

@gus-asf gus-asf left a comment

Overall I like it but I think we can do more to unify the terminology on replica.


=== Shards

In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.
Each shard represents a logical slice of the overall collection and contains a subset of the documents.
Contributor

Or maybe:

Shards slice a collection of documents into discrete non-overlapping subsets, and may be based on data values you specify or ranges of a hash on the document ID.


A shard is a logical concept—a slice of your collection.
A _replica_ is the physical manifestation of that logical shard.
It is the actual running instance that holds and serves the documents belonging to that shard.
Contributor

"likely" raises the question when does it not have an update log? Can we clarify when or omit?

Discussion of what a node does seems better placed in the node section? (with the word replica as a hyperlink to this section).

"SolrCore" is a class in the code, details like that are developer documentation, not relevant to the user. No need to say anything other than "replica" here?

I do like the idea of noting that there is one Lucene index per replica here, but it seems better (to me) to remain focused on the idea, behind "replica" not the implementation.

The replicas ARE how the shard exists.
This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies.

All replicas of the same shard contain the same subset of documents and share the same configuration.
Contributor

+1 and hyperlink "collection" to the section below

@@ -96,6 +105,11 @@ The arrangement of search results into categories based on indexed terms.
[[field]]Field::
The content to be indexed/searched along with metadata defining how the content should be processed by Solr.

[[follower]]Follower::
A <<replica,Replica>> that is not the <<leader,Leader>> for its <<shard,Shard>>.
Contributor

This is replica level in cloud and node level in standalone, which probably should be called out and clarified.

@@ -114,8 +132,8 @@ Since users search using terms they expect to be in documents, finding the term
=== L

[[leader]]Leader::
-A single <<replica,Replica>> for each <<shard,Shard>> that takes charge of coordinating index updates (document additions or deletions) to other replicas in the same shard.
-This is a transient responsibility assigned to a node via an election, if the current Shard Leader goes down, a new node will automatically be elected to take its place.
+A single <<replica,Replica>> for each <<shard,Shard>> that serves as the source-of-truth and coordinates index updates (document additions or deletions) to the <<follower,follower>> replicas in the same shard.
Contributor

Again cloud/standalone differences

@@ -163,7 +181,11 @@ The ability of a search engine to retrieve _all_ of the possible matches to a us
The appropriateness of a document to the search conducted by the user.

[[replica]]Replica::
-A <<core,Core>> that acts as a physical copy of a <<shard,Shard>> in a <<solrclouddef,SolrCloud>> <<collection,Collection>>.
+The physical manifestation of a logical <<shard,Shard>>.
+A replica is the actual running instance (represented as a <<core,Core>>) that holds and serves the documents belonging to that shard.
Contributor

No need to mention core here, we should promote one favored name for each entity.

-In SolrCloud, a logical partition of a single <<collection,Collection>>.
-Every shard consists of at least one physical <<replica,Replica>>, but there may be multiple Replicas distributed across multiple <<node,Nodes>> for fault tolerance.
+A logical slice of a <<collection,Collection>>.
+Each shard represents a logical partition containing a subset of the collection's documents.
Contributor

Suggested change
-Each shard represents a logical partition containing a subset of the collection's documents.
+Each shard represents a partition containing a subset of the collection's documents.

@@ -213,6 +240,12 @@ Synonyms generally are terms which are near to each other in meaning and may sub
In a search engine implementation, synonyms may be abbreviations as well as words, or terms that are not consistently hyphenated.
Examples of synonyms in this context would be "Inc." and "Incorporated" or "iPod" and "i-pod".

[[standalone]]Standalone::
An informal term referring to Solr deployments that do not use <<solrclouddef,SolrCloud>> mode.
Contributor

Suggested change
-An informal term referring to Solr deployments that do not use <<solrclouddef,SolrCloud>> mode.
+An informal term referring to Solr deployments that do not utilize Apache ZooKeeper and thus do not provide the centralized configuration management that is available in <<solrclouddef,SolrCloud>> mode.

Contributor

"deployments" -> "nodes" to reinforce that this is a mode of a Solr process. It is not a characterization of an entire "deployment", which is suggestive of a cluster. I'm cool with the term "user managed" existing to describe such a cluster.

Labels

documentation Improvements or additions to documentation

4 participants