Support disruption free rolling restart #529

Closed
janhoy opened this issue Mar 7, 2023 · 4 comments · Fixed by #530
Labels
enhancement New feature or request

Comments

@janhoy
Contributor

janhoy commented Mar 7, 2023

As discussed in Slack: https://apachesolr.slack.com/archives/C022UMAPZ0V/p1676970790552379

When the operator restarts the cluster, e.g. during a version upgrade, there is no guarantee that a Solr pod is marked as not ready before `solr stop` is called. Clients may therefore experience connection errors during the restart.

@HoustonPutman suggests we can implement a custom readiness gate https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-readiness-gate to control this better.
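
To make the readiness-gate idea concrete, here is a minimal sketch of the rule Kubernetes applies: a pod is Ready only when its containers are ready and every condition listed as a readiness gate has status "True" (a missing condition counts as "False"). The `Pod` class, the `drain_then_stop` helper and the condition type `example.org/solr-not-stopped` are hypothetical names for illustration, not the operator's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical condition type, used only for illustration.
STOP_GATE = "example.org/solr-not-stopped"


@dataclass
class Pod:
    containers_ready: bool
    readiness_gates: List[str] = field(default_factory=list)
    # Maps condition type to its status string ("True" or "False").
    conditions: Dict[str, str] = field(default_factory=dict)


def is_ready(pod: Pod) -> bool:
    """Ready only if containers are ready and every gate condition is "True".

    A gate with no matching condition counts as "False".
    """
    gates_ok = all(pod.conditions.get(g) == "True" for g in pod.readiness_gates)
    return pod.containers_ready and gates_ok


def drain_then_stop(pod: Pod, stop_solr: Callable[[], None]) -> None:
    """Flip the gate first so the pod is no longer Ready, then stop Solr."""
    pod.conditions[STOP_GATE] = "False"
    stop_solr()
```

The sketch only models the ordering. A real controller would presumably also have to wait for the endpoint change to propagate before stopping Solr.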

@HoustonPutman
Contributor

@janhoy we should also create a Solr JIRA issue for this, to fix Cloud-aware clients and internal shard requests.

More info: We can fix this for simple use cases, where every collection in the cloud is single-sharded and has a replica on every node. In that case, Solr never needs to forward a request to another node internally.
If a collection is multi-sharded, or does not have a replica on every node, then Solr might have to forward requests across the cluster. Solr is not aware of the podConditions we are using to solve this in Kubernetes, so we need to think of another solution to fix this inside of Solr.

In the meantime #530 is a great start.
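
The "simple use case" condition described above can be written as a small check. This is only a sketch: the function name and the input shape (collection name to shard name to the set of nodes hosting a replica) are assumptions made for illustration, not Solr's cluster-state format.

```python
from typing import Dict, Set


def every_node_serves_locally(
    collections: Dict[str, Dict[str, Set[str]]], nodes: Set[str]
) -> bool:
    """True if any node can answer any query without forwarding it.

    collections maps collection name -> shard name -> nodes hosting a replica.
    """
    for shards in collections.values():
        # A multi-sharded collection always needs a distributed request.
        if len(shards) != 1:
            return False
        (replica_nodes,) = shards.values()
        # Every node must host a replica of the single shard.
        if not nodes <= replica_nodes:
            return False
    return True
```

When this returns `False`, removing a pod from the Service endpoints is not enough on its own, because other Solr nodes may still route internal requests to it.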

@janhoy
Contributor Author

janhoy commented Mar 28, 2023

> @janhoy we should also create a Solr JIRA issue for this, to fix Cloud-aware clients and internal shard requests.

Sure, I can create one. Do you have a clear idea of how it would work? Currently, SolrJ combines collection state with live_nodes to decide which replicas to query. Would we need some new per-node-state znode in ZooKeeper to flag a node as "draining", and then let SolrJ act on that?
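
A rough sketch of what the question above is asking for, not SolrJ's actual code: replicas are picked from collection state and live_nodes, with one extra filter for nodes flagged as "draining". The `draining_nodes` set is hypothetical (it stands in for the proposed znode), and the replica dictionaries are a simplified stand-in for the entries in the cluster state.

```python
from typing import Dict, FrozenSet, Iterable, List


def candidate_replicas(
    replicas: Iterable[Dict[str, str]],
    live_nodes: FrozenSet[str],
    draining_nodes: FrozenSet[str] = frozenset(),  # hypothetical new state
) -> List[Dict[str, str]]:
    """Keep active replicas on live nodes that are not flagged as draining."""
    return [
        r
        for r in replicas
        if r["state"] == "active"
        and r["node_name"] in live_nodes
        and r["node_name"] not in draining_nodes
    ]
```

With an empty `draining_nodes` set the selection is unchanged, so a node would only stop receiving new requests once something flags it.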

@HoustonPutman
Contributor

Not a clear idea yet.

> Would we need some new per-node-state znode in ZooKeeper to flag a node as "draining", and then let SolrJ act on that?

That would work, but I'm not sure we'd want to restrict it to just "draining". We might want to send requests elsewhere for other reasons too.

@janhoy
Contributor Author

janhoy commented Mar 28, 2023

https://issues.apache.org/jira/browse/SOLR-16722

HoustonPutman added this to the v0.7.0 milestone Apr 24, 2023