
MongoDB cluster issue - No replica set members found yet #139

Closed
pkankar opened this issue Jul 2, 2020 · 10 comments
Labels
bug · help wanted · status:need more info

Comments

@pkankar

pkankar commented Jul 2, 2020

I have followed the instructions on the official StackStorm page. It shows the "ST2 HA OK" message, but my pods go into the "CrashLoopBackOff" state.


kubectl exec -it $(kubectl get --namespace default pod -l app=st2client,release=test -o jsonpath="{.items[0].metadata.name}") -- /opt/stackstorm/st2/bin/st2api --config-file=/etc/st2/st2.conf --config-file=/etc/st2/st2.docker.conf --config-file=/etc/st2/st2.user.conf
2020-07-02 22:00:05,080 DEBUG [-] Using Python: 3.6.9 (/opt/stackstorm/st2/bin/python)
2020-07-02 22:00:05,080 DEBUG [-] Using config files: /etc/st2/st2.conf,/etc/st2/st2.docker.conf,/etc/st2/st2.user.conf
2020-07-02 22:00:05,081 DEBUG [-] Using logging config: /etc/st2/logging.api.gunicorn.conf
2020-07-02 22:00:05,098 INFO [-] Connecting to database "st2" @ "test-mongodb-ha-0.test-mongodb-ha:27017,test-mongodb-ha-1.test-mongodb-ha:27017,test-mongodb-ha-2.test-mongodb-ha:27017 (replica set)" as user "admin".
2020-07-02 22:00:08,109 ERROR [-] Failed to connect to database "st2" @ "test-mongodb-ha-0.test-mongodb-ha:27017,test-mongodb-ha-1.test-mongodb-ha:27017,test-mongodb-ha-2.test-mongodb-ha:27017 (replica set)" as user "admin": No replica set members found yet
2020-07-02 22:00:08,109 ERROR [-] (PID=110) ST2 API quit due to exception.
Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2api/cmd/api.py", line 84, in main
    _setup()
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2api/cmd/api.py", line 58, in _setup
    service_registry=True, capabilities=capabilities)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/service_setup.py", line 160, in setup
    db_setup()
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/database_setup.py", line 56, in db_setup
    connection = db_init.db_setup_with_retry(**db_cfg)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/persistence/db_init.py", line 75, in db_setup_with_retry
    ssl_match_hostname=ssl_match_hostname)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/persistence/db_init.py", line 58, in db_func_with_retry
    return retrying_obj.call(db_func, *args, **kwargs)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/six.py", line 696, in reraise
    raise value
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 169, in db_setup
    ssl_match_hostname=ssl_match_hostname)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 151, in _db_connect
    raise e
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 144, in _db_connect
    connection.admin.command('ismaster')
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/database.py", line 730, in command
    read_preference, session) as (sock_info, slave_ok):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1298, in _socket_for_reads
    server = self._select_server(read_preference, session)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1253, in _select_server
    server = topology.select_server(server_selector)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 235, in select_server
    address))
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 193, in select_servers
    selector, server_timeout, address)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 209, in _select_servers_loop
    self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: No replica set members found yet
command terminated with exit code 1

@arms11
Contributor

arms11 commented Jul 2, 2020

@Prudhveer-Reddy - could you please format this error as well? It's quite verbose as you can see and hard to find something meaningful in the error stack. Thanks in advance!

@pkankar
Author

pkankar commented Jul 2, 2020

@Prudhveer-Reddy - could you please format this error as well? It's quite verbose as you can see and hard to find something meaningful in the error stack. Thanks in advance!

Sorry, I guess I applied a single ` rather than ```

@arm4b added the bug and help wanted labels Jul 3, 2020
@arm4b
Member

arm4b commented Jul 3, 2020

This was raised in Slack by @Prudhveer-Reddy, who installed stackstorm-ha in minikube.
The fact that I couldn't reproduce it in any other environment makes it hard to debug this MongoDB HA cluster issue.

If anyone has run into a similar problem, please try to dig deeper, investigate, and provide more info.
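For example, the replica set status and DNS resolution from inside the MongoDB pods would be useful data points. A rough sketch (assuming a release named test as in the report above, and that the mongo shell and getent are available in the image):

# Check whether the replica set was ever initialized
kubectl exec -it test-mongodb-ha-0 -- mongo --quiet --eval 'rs.status()'

# Check that the peer hostnames resolve from inside the pod
kubectl exec -it test-mongodb-ha-0 -- getent hosts test-mongodb-ha-1.test-mongodb-ha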

@arms11
Contributor

arms11 commented Jul 4, 2020

Today I went ahead and tried to set up a brand new cluster using docker-desktop on my Mac.

I ran into an issue where multiple pods were failing with pymongo.errors.ServerSelectionTimeoutError: st2ha-mongodb-ha-0.st2ha-mongodb-ha:27017: [Errno 111] ECONNREFUSED,st2ha-mongodb-ha-1.st2ha-mongodb-ha:27017: [Errno 111] ECONNREFUSED,st2ha-mongodb-ha-2.st2ha-mongodb-ha:27017: [Errno 111] ECONNREFUSED

I had to add .{{ $.Release.Namespace }}.svc.cluster.local to the connection-string portion of helpers.tpl before I could get all pods running successfully. So, although I wanted to replicate the issue, I could not replicate it exactly.
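For anyone checking the same thing, a quick way to compare how the short service name and the full FQDN resolve from inside a pod (sketch using pod names from my output below; the default namespace is an assumption, and getent may not exist in every image):

kubectl exec -it st2ha-st2client-8686648c6d-dt2dr -- getent hosts st2ha-mongodb-ha-0.st2ha-mongodb-ha
kubectl exec -it st2ha-st2client-8686648c6d-dt2dr -- getent hosts st2ha-mongodb-ha-0.st2ha-mongodb-ha.default.svc.cluster.local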

Please note that I had modified my values.yaml to reduce the replica counts.

kubectl get nodes
NAME             STATUS   ROLES    AGE   VERSION
docker-desktop   Ready    master   47d   v1.16.6-beta.0
helm version
version.BuildInfo{Version:"v3.0.3", GitCommit:"ac925eb7279f4a6955df663a0128044a8a6b7593", GitTreeState:"clean", GoVersion:"go1.13.7"}
kubectl get pods
NAME                                                 READY   STATUS      RESTARTS   AGE
etcd-cluster-fd7fg7bpzn                              1/1     Running     0          4h36m
etcd-cluster-hbpvnwp9b2                              1/1     Running     0          4h36m
etcd-cluster-l6m6s45wkc                              1/1     Running     0          4h37m
st2ha-etcd-operator-etcd-operator-5fc86f779d-qgwrl   1/1     Running     1          4h38m
st2ha-job-st2-apikey-load-9sgjs                      0/1     Completed   0          26m
st2ha-job-st2-key-load-qqgmn                         0/1     Completed   0          26m
st2ha-job-st2-register-content-drgzf                 0/1     Completed   0          26m
st2ha-mongodb-ha-0                                   1/1     Running     0          40m
st2ha-mongodb-ha-1                                   1/1     Running     0          39m
st2ha-mongodb-ha-2                                   1/1     Running     0          39m
st2ha-rabbitmq-ha-0                                  1/1     Running     1          4h39m
st2ha-rabbitmq-ha-1                                  1/1     Running     0          4h33m
st2ha-rabbitmq-ha-2                                  1/1     Running     0          4h30m
st2ha-st2actionrunner-8664fb6c94-6m6h4               1/1     Running     0          30m
st2ha-st2actionrunner-8664fb6c94-pqwfh               1/1     Running     0          30m
st2ha-st2api-5d4db59678-dklgk                        1/1     Running     0          30m
st2ha-st2auth-8645cc49db-cvfd7                       1/1     Running     0          30m
st2ha-st2client-8686648c6d-dt2dr                     1/1     Running     0          30m
st2ha-st2garbagecollector-785484d699-bl6th           1/1     Running     0          30m
st2ha-st2notifier-57fbd6559f-9t28j                   1/1     Running     0          30m
st2ha-st2rulesengine-579b786fbb-kr9xh                1/1     Running     0          30m
st2ha-st2scheduler-7f7866ff98-99thc                  1/1     Running     0          30m
st2ha-st2sensorcontainer-5f756c748d-pkzng            1/1     Running     0          30m
st2ha-st2stream-85fcdc6944-qg6lc                     1/1     Running     0          30m
st2ha-st2timersengine-659668d56f-kchrz               1/1     Running     0          30m
st2ha-st2web-7d696d497-pvsmj                         1/1     Running     0          4h39m
st2ha-st2workflowengine-57cdd8f4b7-h77bd             1/1     Running     0          30m

@arms11
Contributor

arms11 commented Jul 10, 2020

Found this on a different thread...
kubernetes/minikube#7828
Hope it helps!

@arm4b
Member

arm4b commented Jul 10, 2020

@pkankar1 As I understand from Slack, it's fixed for you now.
Can you confirm this was specific to the minikube version and that it's resolved?

Please share your findings so the community can benefit from the potential solution.

@pkankar
Author

pkankar commented Jul 10, 2020

The issue got solved for me. In my case, I did two things:

1. I changed the mongodb-replicaset version to 3.12.0. (It now also works fine with 3.16.2.)
2. If you are using minikube with driver=none, make sure that the StorageClass, PV, and PVC are working correctly. I personally deleted the "standard" StorageClass that minikube provides and created a new one. Also check that the storage-provisioner pod for minikube is installed and running in the kube-system namespace (a couple of commands are sketched below). I was first using docker-desktop as a driver for minikube, but switching to the none driver also helped me.
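A rough sketch of the storage checks from point 2 (plain kubectl; the storage-provisioner pod name is minikube's default, adjust if yours differs):

# Verify the StorageClass exists and the MongoDB PVCs are Bound
kubectl get storageclass
kubectl get pv,pvc

# Confirm minikube's storage provisioner is running
kubectl -n kube-system get pods | grep storage-provisioner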
StackStorm HA works completely fine unless there is a problem with your system.
If you have any further questions, do join StackStorm on stackstorm-community.slack.com; they are quite helpful.
Thank you for your help @armab @arms11

@arm4b
Member

arm4b commented Jul 10, 2020

Thanks for more info!

Closing this Issue as resolved then.

@ajitkumartanwade

Hi All,

We are getting an issue while installing StackStorm HA on Kubernetes. No pod other than st2client is able to connect to MongoDB.

Error:
kubectl logs stackstorm-ha-1595399628-st2actionrunner-5768ddf56c-cnc57
2020-07-22 08:17:18,737 DEBUG [-] Using Python: 3.6.9 (/opt/stackstorm/st2/bin/python)
2020-07-22 08:17:18,737 DEBUG [-] Using config files: /etc/st2/st2.conf,/etc/st2/st2.docker.conf,/etc/st2/st2.user.conf
2020-07-22 08:17:18,738 DEBUG [-] Using logging config: /etc/st2/logging.actionrunner.conf
2020-07-22 08:17:18,767 INFO [-] Connecting to database "st2" @ "stackstorm-ha-1595399628-mongodb-ha-0.stackstorm-ha-1595399628-mongodb-ha:27017,stackstorm-ha-1595399628-mongodb-ha-1.stackstorm-ha-1595399628-mongodb-ha:27017,stackstorm-ha-1595399628-mongodb-ha-2.stackstorm-ha-1595399628-mongodb-ha:27017 (replica set)" as user "admin".
2020-07-22 08:17:21,780 ERROR [-] Failed to connect to database "st2" @ "stackstorm-ha-1595399628-mongodb-ha-0.stackstorm-ha-1595399628-mongodb-ha:27017,stackstorm-ha-1595399628-mongodb-ha-1.stackstorm-ha-1595399628-mongodb-ha:27017,stackstorm-ha-1595399628-mongodb-ha-2.stackstorm-ha-1595399628-mongodb-ha:27017 (replica set)" as user "admin": No replica set members found yet
2020-07-22 08:17:21,781 ERROR [-] (PID=1) Worker quit due to exception.
Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2actions/cmd/actionrunner.py", line 95, in main
    _setup()
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2actions/cmd/actionrunner.py", line 56, in _setup
    register_signal_handlers=True, service_registry=True, capabilities=capabilities)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/service_setup.py", line 160, in setup
    db_setup()
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/database_setup.py", line 56, in db_setup
    connection = db_init.db_setup_with_retry(**db_cfg)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/persistence/db_init.py", line 75, in db_setup_with_retry
    ssl_match_hostname=ssl_match_hostname)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/persistence/db_init.py", line 58, in db_func_with_retry
    return retrying_obj.call(db_func, *args, **kwargs)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/six.py", line 696, in reraise
    raise value
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 169, in db_setup
    ssl_match_hostname=ssl_match_hostname)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 151, in _db_connect
    raise e
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/models/db/__init__.py", line 144, in _db_connect
    connection.admin.command('ismaster')
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/database.py", line 730, in command
    read_preference, session) as (sock_info, slave_ok):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1298, in _socket_for_reads
    server = self._select_server(read_preference, session)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1253, in _select_server
    server = topology.select_server(server_selector)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 235, in select_server
    address))
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 193, in select_servers
    selector, server_timeout, address)
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/pymongo/topology.py", line 209, in _select_servers_loop

Using Kubernetes version 1.18.6
SS-ha version - 0.31.0

Also, we have already tried downgrading mongodb-replicaset to 3.12.0, and it is still not working. Please help.

Thanks in advance

@manisha-tanwar
Contributor

Hi @ajitkumartanwade, this should have been resolved now under StackStorm/st2#4997.
