
MongoDB cluster issue - No replica set members found yet #139

Closed
pkankar opened this issue Jul 2, 2020 · 10 comments
Labels
bug · help wanted · status:need more info

Comments

@pkankar

pkankar commented Jul 2, 2020

I have followed the instructions on the official StackStorm page. It shows the "ST2 HA OK" message, but my pods go into the "CrashLoopBackOff" state.


kubectl exec -it $(kubectl get --namespace default pod -l app=st2client,release=test -o jsonpath="{.items[0].metadata.name}") -- /opt/stackstorm/st2/bin/st2api --config-file=/etc/st2/st2.conf --config-file=/etc/st2/st2.docker.conf --config-file=/etc/st2/st2.user.conf
2020-07-02 22:00:05,080 DEBUG [-] Using Python: 3.6.9 (/opt/stackstorm/st2/bin/python)
2020-07-02 22:00:05,080 DEBUG [-] Using config files: /etc/st2/st2.conf,/etc/st2/st2.docker.conf,/etc/st2/st2.user.conf
2020-07-02 22:00:05,081 DEBUG [-] Using logging config: /etc/st2/logging.api.gunicorn.conf
2020-07-02 22:00:05,098 INFO [-] Connecting to database "st2" @ "test-mongodb-ha-0.test-mongodb-ha:27017,test-mongodb-ha-1.test-mongodb-ha:27017,test-mongodb-ha-2.test-mongodb-ha:27017 (replica set)" as user "admin".
2020-07-02 22:00:08,109 ERROR [-] Failed to connect to database "st2" @ "test-mongodb-ha-0.test-mongodb-ha:27017,test-mongodb-ha-1.test-mongodb-ha:27017,test-mongodb-ha-2.test-mongodb-ha:27017 (replica set)" as user "admin": No replica set members found yet
2020-07-02 22:00:08,109 ERROR [-] (PID=110) ST2 API quit due to exception.
Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2api/cmd/api.py", line 84, in main
    _setup()
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2api/cmd/api.py", line 58, in _setup
    service_registry=True, capabilities=capabilities)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/service_setup.py", line 160, in setup
    db_setup()
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/database_setup.py", line 56, in db_setup
    connection = db_init.db_setup_with_retry(**db_cfg)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/persistence/db_init.py", line 75, in db_setup_with_retry
    ssl_match_hostname=ssl_match_hostname)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/persistence/db_init.py", line 58, in db_func_with_retry
    return retrying_obj.call(db_func, *args, **kwargs)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/six.py", line 696, in reraise
    raise value
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 169, in db_setup
    ssl_match_hostname=ssl_match_hostname)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 151, in _db_connect
    raise e
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 144, in _db_connect
    connection.admin.command('ismaster')
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/database.py", line 730, in command
    read_preference, session) as (sock_info, slave_ok):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1298, in _socket_for_reads
    server = self._select_server(read_preference, session)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1253, in _select_server
    server = topology.select_server(server_selector)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 235, in select_server
    address))
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 193, in select_servers
    selector, server_timeout, address)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 209, in _select_servers_loop
    self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: No replica set members found yet
command terminated with exit code 1

@arms11
Contributor

arms11 commented Jul 2, 2020

@Prudhveer-Reddy - could you please format this error as well? It's quite verbose as you can see and hard to find something meaningful in the error stack. Thanks in advance!

@pkankar
Author

pkankar commented Jul 2, 2020

@Prudhveer-Reddy - could you please format this error as well? It's quite verbose as you can see and hard to find something meaningful in the error stack. Thanks in advance!

Sorry, I guess I applied a single ` rather than ```

@arm4b added the bug and help wanted labels Jul 3, 2020
@arm4b
Member

arm4b commented Jul 3, 2020

This was raised in Slack by @Prudhveer-Reddy, who installed stackstorm-ha in minikube.
The fact that I couldn't reproduce it in any other environment makes it hard to debug this MongoDB HA cluster issue.

If anyone has run into a similar problem, please try to dig deeper, investigate, and provide more info.
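For example, the replica set status and DNS resolution from inside the MongoDB pods would be useful data points. A rough sketch (assuming a release named test as in the report above, and that the mongo shell and getent are available in the image):

# Check whether the replica set was ever initialized
kubectl exec -it test-mongodb-ha-0 -- mongo --quiet --eval 'rs.status()'

# Check that the peer hostnames resolve from inside the pod
kubectl exec -it test-mongodb-ha-0 -- getent hosts test-mongodb-ha-1.test-mongodb-ha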

@arms11
Contributor

arms11 commented Jul 4, 2020

Today I went ahead and tried to set up a brand new cluster using docker-desktop on my Mac.

I ran into an issue where multiple pods were failing with pymongo.errors.ServerSelectionTimeoutError: st2ha-mongodb-ha-0.st2ha-mongodb-ha:27017: [Errno 111] ECONNREFUSED,st2ha-mongodb-ha-1.st2ha-mongodb-ha:27017: [Errno 111] ECONNREFUSED,st2ha-mongodb-ha-2.st2ha-mongodb-ha:27017: [Errno 111] ECONNREFUSED

I had to add .{{ $.Release.Namespace }}.svc.cluster.local to the connection-string portion of helpers.tpl before I could get all pods running successfully. So, although I wanted to replicate the issue, I could not replicate it exactly.
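For anyone checking the same thing, a quick way to compare how the short service name and the full FQDN resolve from inside a pod (sketch using pod names from my output below; the default namespace is an assumption, and getent may not exist in every image):

kubectl exec -it st2ha-st2client-8686648c6d-dt2dr -- getent hosts st2ha-mongodb-ha-0.st2ha-mongodb-ha
kubectl exec -it st2ha-st2client-8686648c6d-dt2dr -- getent hosts st2ha-mongodb-ha-0.st2ha-mongodb-ha.default.svc.cluster.local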

Please note that I had modified my values.yaml to reduce the replica counts.

kubectl get nodes
NAME             STATUS   ROLES    AGE   VERSION
docker-desktop   Ready    master   47d   v1.16.6-beta.0
helm version
version.BuildInfo{Version:"v3.0.3", GitCommit:"ac925eb7279f4a6955df663a0128044a8a6b7593", GitTreeState:"clean", GoVersion:"go1.13.7"}
kubectl get pods
NAME                                                 READY   STATUS      RESTARTS   AGE
etcd-cluster-fd7fg7bpzn                              1/1     Running     0          4h36m
etcd-cluster-hbpvnwp9b2                              1/1     Running     0          4h36m
etcd-cluster-l6m6s45wkc                              1/1     Running     0          4h37m
st2ha-etcd-operator-etcd-operator-5fc86f779d-qgwrl   1/1     Running     1          4h38m
st2ha-job-st2-apikey-load-9sgjs                      0/1     Completed   0          26m
st2ha-job-st2-key-load-qqgmn                         0/1     Completed   0          26m
st2ha-job-st2-register-content-drgzf                 0/1     Completed   0          26m
st2ha-mongodb-ha-0                                   1/1     Running     0          40m
st2ha-mongodb-ha-1                                   1/1     Running     0          39m
st2ha-mongodb-ha-2                                   1/1     Running     0          39m
st2ha-rabbitmq-ha-0                                  1/1     Running     1          4h39m
st2ha-rabbitmq-ha-1                                  1/1     Running     0          4h33m
st2ha-rabbitmq-ha-2                                  1/1     Running     0          4h30m
st2ha-st2actionrunner-8664fb6c94-6m6h4               1/1     Running     0          30m
st2ha-st2actionrunner-8664fb6c94-pqwfh               1/1     Running     0          30m
st2ha-st2api-5d4db59678-dklgk                        1/1     Running     0          30m
st2ha-st2auth-8645cc49db-cvfd7                       1/1     Running     0          30m
st2ha-st2client-8686648c6d-dt2dr                     1/1     Running     0          30m
st2ha-st2garbagecollector-785484d699-bl6th           1/1     Running     0          30m
st2ha-st2notifier-57fbd6559f-9t28j                   1/1     Running     0          30m
st2ha-st2rulesengine-579b786fbb-kr9xh                1/1     Running     0          30m
st2ha-st2scheduler-7f7866ff98-99thc                  1/1     Running     0          30m
st2ha-st2sensorcontainer-5f756c748d-pkzng            1/1     Running     0          30m
st2ha-st2stream-85fcdc6944-qg6lc                     1/1     Running     0          30m
st2ha-st2timersengine-659668d56f-kchrz               1/1     Running     0          30m
st2ha-st2web-7d696d497-pvsmj                         1/1     Running     0          4h39m
st2ha-st2workflowengine-57cdd8f4b7-h77bd             1/1     Running     0          30m

@arms11
Contributor

arms11 commented Jul 10, 2020

Found this on a different thread...
kubernetes/minikube#7828
Hope it helps!

@arm4b
Member

arm4b commented Jul 10, 2020

@pkankar1 As I understand from Slack, it's fixed for you now.
Can you confirm this was specific to the minikube version and that it's resolved?

Please share your findings so the community can benefit from the potential solution.

@pkankar
Author

pkankar commented Jul 10, 2020

The issue got solved for me. In my case, I did two things:

1. I changed the mongodb-replicaset version to 3.12.0. (It now also works fine with 3.16.2.)
2. If you are using minikube with driver=none, make sure that the StorageClass, PV, and PVC are working correctly. I personally deleted the "standard" StorageClass that minikube provides and created a new one. Also check that the storage-provisioner pod for minikube is installed and running in the kube-system namespace (a couple of commands are sketched below). I was first using docker-desktop as a driver for minikube, but switching to the none driver also helped me.
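A rough sketch of the storage checks from point 2 (plain kubectl; the storage-provisioner pod name is minikube's default, adjust if yours differs):

# Verify the StorageClass exists and the MongoDB PVCs are Bound
kubectl get storageclass
kubectl get pv,pvc

# Confirm minikube's storage provisioner is running
kubectl -n kube-system get pods | grep storage-provisioner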
StackStorm HA works completely fine unless there is a problem with your system.
If you have any further questions, do join StackStorm on stackstorm-community.slack.com; they are quite helpful.
Thank you for your help @armab @arms11

@arm4b
Member

arm4b commented Jul 10, 2020

Thanks for more info!

Closing this Issue as resolved then.

@ajitkumartanwade

Hi All,

We are getting an issue while installing StackStorm HA on Kubernetes. No pod other than st2client is able to connect to MongoDB.

Error:
kubectl logs stackstorm-ha-1595399628-st2actionrunner-5768ddf56c-cnc57
2020-07-22 08:17:18,737 DEBUG [-] Using Python: 3.6.9 (/opt/stackstorm/st2/bin/python)
2020-07-22 08:17:18,737 DEBUG [-] Using config files: /etc/st2/st2.conf,/etc/st2/st2.docker.conf,/etc/st2/st2.user.conf
2020-07-22 08:17:18,738 DEBUG [-] Using logging config: /etc/st2/logging.actionrunner.conf
2020-07-22 08:17:18,767 INFO [-] Connecting to database "st2" @ "stackstorm-ha-1595399628-mongodb-ha-0.stackstorm-ha-1595399628-mongodb-ha:27017,stackstorm-ha-1595399628-mongodb-ha-1.stackstorm-ha-1595399628-mongodb-ha:27017,stackstorm-ha-1595399628-mongodb-ha-2.stackstorm-ha-1595399628-mongodb-ha:27017 (replica set)" as user "admin".
2020-07-22 08:17:21,780 ERROR [-] Failed to connect to database "st2" @ "stackstorm-ha-1595399628-mongodb-ha-0.stackstorm-ha-1595399628-mongodb-ha:27017,stackstorm-ha-1595399628-mongodb-ha-1.stackstorm-ha-1595399628-mongodb-ha:27017,stackstorm-ha-1595399628-mongodb-ha-2.stackstorm-ha-1595399628-mongodb-ha:27017 (replica set)" as user "admin": No replica set members found yet
2020-07-22 08:17:21,781 ERROR [-] (PID=1) Worker quit due to exception.
Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2actions/cmd/actionrunner.py", line 95, in main
    _setup()
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2actions/cmd/actionrunner.py", line 56, in _setup
    register_signal_handlers=True, service_registry=True, capabilities=capabilities)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/service_setup.py", line 160, in setup
    db_setup()
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/database_setup.py", line 56, in db_setup
    connection = db_init.db_setup_with_retry(**db_cfg)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/persistence/db_init.py", line 75, in db_setup_with_retry
    ssl_match_hostname=ssl_match_hostname)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/persistence/db_init.py", line 58, in db_func_with_retry
    return retrying_obj.call(db_func, *args, **kwargs)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/six.py", line 696, in reraise
    raise value
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 169, in db_setup
    ssl_match_hostname=ssl_match_hostname)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 151, in _db_connect
    raise e
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 144, in _db_connect
    connection.admin.command('ismaster')
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/database.py", line 730, in command
    read_preference, session) as (sock_info, slave_ok):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1298, in _socket_for_reads
    server = self._select_server(read_preference, session)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1253, in _select_server
    server = topology.select_server(server_selector)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 235, in select_server
    address))
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 193, in select_servers
    selector, server_timeout, address)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 209, in _select_servers_loop

Using Kubernetes version 1.18.6
SS-ha version - 0.31.0

Also, we have already tried downgrading mongodb-replicaset to 3.12.0, and it is still not working. Please help.

Thanks in advance

@manisha-tanwar
Contributor

Hi @ajitkumartanwade, this should have been resolved now under StackStorm/st2#4997.
