Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support running embedded-zk in "ensemble" mode #2391

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

gerlowskija
Copy link
Contributor

Description

Prior to this commit, Solr only supported running embedded-ZK in "standalone" mode (which cannot take part in any larger ZK ensemble or quorum). But there are usecases that would benefit from being able to do this, both on the development and testing side, and event for adventurous users who might want the benefits of a small multi-node SolrCloud cluster without the headache of also deploying ZK.

Solution

This commit augments our embedded-ZK code to support running embedded-ZK
in "quorum" or ensemble mode. Multiple Solr nodes can now all have
their embedded-ZK's join a multi-node quorum upon startup. Other than
Solr and ZK sharing a process, the embedded- ZK ensemble behaves
identically to one formed of independent processes: nodes can join or
leave the cluster, etc.

Embedded-ensemble-ZK is enabled any time the zkQuorumRun system
property is present, along with an explicitly specified ZK host string.
On startup, Solr will identify which host in the zk-conn-string it
should be (based on admittedly hacky heuristics), and then spins up a
'ZooKeeperServerEmbedded' instance in-process to join the ensemble. e.g.

export LH="localhost"

# Run each below in a separate terminal, one for each host in the zk-conn string
bin/solr start -p 8983 -z $LH:9983,$LH:9984,$LH:9985 -DzkQuorumRun
bin/solr start -p 8984 -z $LH:9983,$LH:9984,$LH:9985 -DzkQuorumRun
bin/solr start -p 8985 -z $LH:9983,$LH:9984,$LH:9985 -DzkQuorumRun

Some notes:
- this doesn't (yet) work with ZK's dynamic-ensemble feature, so all
ZK nodes must be specified in a static ZK conn string provided at
startup
- this appears to run best when the security-manager is disabled.
- the interface in particular for how we expose this is pretty rough, and there's a lot of room for improvement.

Tests

None, yet.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

…found" with an out-dated ClusterState (apache#2363)"

This reverts commit 5c399dd.
This commit augments our embedded-ZK code to support running embedded-ZK
in "quorum" or ensemble mode.  Multiple Solr nodes can now all have
their embedded-ZK's join a multi-node quorum upon startup.  Other than
Solr and ZK sharing a process, the embedded- ZK ensemble behaves
identically to one formed of independent processes: nodes can join or
leave the cluster, etc.

Embedded-ensemble-ZK is enabled any time the `zkQuorumRun` system
property is present, along with an explicitly specified ZK host string.
On startup, Solr will identify which host in the zk-conn-string it
should be (based on admittedly hacky heuristics), and then spins up a
'ZooKeeperServerEmbedded' instance in-process to join the ensemble. e.g.

```
export LH="localhost"
bin/solr start -p 8983 -z $LH:9983,$LH:9984,$LH:9985 -DzkQuorumRun
bin/solr start -p 8984 -z $LH:9983,$LH:9984,$LH:9985 -DzkQuorumRun
bin/solr start -p 8985 -z $LH:9983,$LH:9984,$LH:9985 -DzkQuorumRun
```

Some notes:
  - this doesn't (yet) work with ZK's dynamic-ensemble feature, so all
    ZK nodes must be specified in a static ZK conn string provided at
    startup
  - this appears to run best when the security-manager is disabled.
@gerlowskija
Copy link
Contributor Author

FYI: https://cwiki.apache.org/confluence/display/SOLR/SIP-14+Embedded+Zookeeper for context on others' interest in moving this direction.

@janhoy
Copy link
Contributor

janhoy commented Apr 29, 2024

Great work. I think we should rip out the existing embedded ZK, which is a hack.

Do you want to proceed in a PR or perhaps in a central feature branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants