Cannot deploy on kind because of couchdb-init failure #673

Closed
grussorusso opened this issue Feb 12, 2021 · 11 comments

@grussorusso

I am trying to deploy OpenWhisk on my Arch Linux machine using kind.
I have 2 worker nodes in the cluster and I have labelled them according to the official guide. I deploy OW using the official Helm chart.

This is the output of kubectl get pods -n openwhisk:

NAME                                   READY   STATUS      RESTARTS   AGE
owdev-alarmprovider-5b86cb64ff-rm498   0/1     Init:0/1    0          22m
owdev-apigateway-bccbbcd67-pd79z       1/1     Running     0          22m
owdev-controller-0                     0/1     Init:1/2    0          22m
owdev-couchdb-584676b956-vctzv         1/1     Running     0          22m
owdev-gen-certs-xmxh7                  0/1     Completed   0          22m
owdev-init-couchdb-7fwnl               0/1     Error       0          20m
owdev-init-couchdb-d5dsv               0/1     Error       0          17m
owdev-init-couchdb-sqqhp               0/1     Error       0          22m
owdev-init-couchdb-wmcfj               0/1     Error       0          19m
owdev-install-packages-hqsg5           0/1     Init:0/1    0          22m
owdev-invoker-0                        0/1     Init:0/1    0          22m
owdev-kafka-0                          1/1     Running     0          22m
owdev-kafkaprovider-5574d4bf5f-ghdtk   0/1     Init:0/1    0          22m
owdev-nginx-86749d59cb-54c6l           0/1     Init:0/1    0          22m
owdev-redis-d65649c5b-xg6gh            1/1     Running     0          22m
owdev-wskadmin                         1/1     Running     0          22m
owdev-zookeeper-0                      1/1     Running     0          22m

and kubectl logs -n openwhisk owdev-init-couchdb-sqqhp:

Cloning into '/openwhisk'...
/openwhisk /
Note: checking out '1.0.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 2c621c07 fix start.sh to work on macos (#5019)
/
/openwhisk/ansible /
 [WARNING]: Unable to parse /openwhisk/ansible/environments/local as an
inventory source
 [WARNING]: No inventory was parsed, only implicit localhost is available
 [WARNING]: provided hosts list is empty, only localhost is available. Note
that the implicit localhost does not match 'all'

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
Friday 12 February 2021  14:41:10 +0000 (0:00:00.120)       0:00:00.120 ******* 
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TimeoutError: Timer expired after 60 seconds
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "/bin/findmnt --list --noheadings --notruncate", "msg": "Timer expired after 60 seconds", "rc": 257}

[FAILED]
> /bin/findmnt --list --noheadings --notruncate
Timer expired after 60 seconds

PLAY RECAP *********************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1   

Friday 12 February 2021  14:42:10 +0000 (0:01:00.469)       0:01:00.589 ******* 
=============================================================================== 
Gathering Facts -------------------------------------------------------- 60.47s

I verified that the issue does not appear when using Minikube (using both Docker and containerd as container runtime), so I think the issue is somehow related to kind.

My whisk.yml configuration is identical to that shown in the guide for deploying OW on kind (except for the apiHostName, which I set as indicated).

Thanks in advance for any hint

dgrove-oss self-assigned this Feb 15, 2021
@dgrove-oss
Member

Hi. What Kubernetes version (v1.18, v1.17, etc.) are you running using kind? Our automated testing currently covers v1.16, v1.17, and v1.18. We need to enable testing for v1.19 and v1.20 in travis-ci, but haven't gotten around to it yet...

@grussorusso
Author

Thanks for your reply. Indeed, kind automatically picked v1.20.
Unfortunately, I get the same error with v1.18.15.

@dgrove-oss
Member

It worked for me last night using kind 0.10 on MacOS Docker Desktop (aka my laptop) and Kubernetes v1.18.5. But I realized that I deployed the latest chart from git, not the 1.0.0 chart from the helm repo. I will try that later tonight just to make sure it isn't some problem with the chart itself.

@dgrove-oss
Member

Probably not surprising, but on my MacOS / Docker Desktop, installing the 1.0.0 helm chart on kind 0.10 works. Here's the beginning snippet of the log from the init-couchdb job.

Daves-MacBook-Pro:kar dgrove$ kubectl logs jobs/owdev-init-couchdb -n openwhisk
Cloning into '/openwhisk'...
/openwhisk /
Note: checking out '1.0.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 2c621c07 fix start.sh to work on macos (#5019)
/
/openwhisk/ansible /
 [WARNING]: Unable to parse /openwhisk/ansible/environments/local as an
inventory source
 [WARNING]: No inventory was parsed, only implicit localhost is available
 [WARNING]: provided hosts list is empty, only localhost is available. Note
that the implicit localhost does not match 'all'

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
Wednesday 17 February 2021  01:37:08 +0000 (0:00:00.175)       0:00:00.176 **** 
ok: [localhost]

TASK [gen hosts if 'local' env is used] ****************************************
Wednesday 17 February 2021  01:37:09 +0000 (0:00:01.093)       0:00:01.269 **** 
changed: [localhost -> localhost]

TASK [find the ip of docker-machine] *******************************************
Wednesday 17 February 2021  01:37:09 +0000 (0:00:00.752)       0:00:02.022 **** 
skipping: [localhost]

TASK [get the docker-machine ip] ***********************************************
Wednesday 17 February 2021  01:37:09 +0000 (0:00:00.053)       0:00:02.076 **** 
skipping: [localhost]

TASK [gen hosts for docker-machine] ********************************************
Wednesday 17 February 2021  01:37:10 +0000 (0:00:00.068)       0:00:02.144 **** 
skipping: [localhost]

TASK [gen hosts for Jenkins] ***************************************************
Wednesday 17 February 2021  01:37:10 +0000 (0:00:00.082)       0:00:02.226 **** 
skipping: [localhost]

TASK [check if db_local.ini exists?] *******************************************
Wednesday 17 February 2021  01:37:10 +0000 (0:00:00.084)       0:00:02.311 **** 
ok: [localhost]

I'm not sure exactly what ansible does in its initial gathering facts stage, but probably the thing to do is to try to run that pod interactively, execute the commands manually, and see if you can get a better error message.

@grussorusso
Author

Thanks again for checking. I followed your suggestion and tried executing the command on which Ansible fails (/bin/findmnt --list --noheadings --notruncate) on the container via kubectl exec. And it works without issues...
However, the pod eventually enters the Failed state with the same output.

At this point, I am even more confused about the problem. It is probably related to my own environment (maybe Docker version? I am using Docker 20.10.3 on Linux). I will verify if the same thing happens on a different Linux machine, as soon as I have some time to do so.
Anyway, although annoying, the issue is not blocking for me as I managed to deploy OpenWhisk on Minikube.

@grussorusso
Author

I confirm that everything works on a different Linux machine. So the issue is caused by something in my own configuration, although I haven't figured out what exactly.

@s117

s117 commented Jun 26, 2021

> Thanks again for checking. I followed your suggestion and tried executing the command on which Ansible fails (/bin/findmnt --list --noheadings --notruncate) on the container via kubectl exec. And it works without issues...
> However, the pod eventually enters the Failed state with the same output.
>
> At this point, I am even more confused about the problem. It is probably related to my own environment (maybe Docker version? I am using Docker 20.10.3 on Linux). I will verify if the same thing happens on a different Linux machine, as soon as I have some time to do so.
> Anyway, although annoying, the issue is not blocking for me as I managed to deploy OpenWhisk on Minikube.

I just came across the same problem. It turns out the timeout is caused by Python rather than /bin/findmnt. Below are some related upstream tickets:
ansible/ansible#24228 (comment)
https://bugs.python.org/issue1663329
https://bugs.python.org/issue11284

My system also runs Arch Linux, and inside the container ulimit -n returns a large value. My workaround is to modify helm/openwhisk/configMapFiles/initCouchDB/initdb.sh to apply this patch to /usr/local/lib/python2.7/dist-packages/ansible/plugins/shell/__init__.py before using ansible.
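
For anyone wanting to check whether they are in the same situation, here is a minimal diagnostic sketch, meant to be run in a shell inside the init-couchdb container. As I understand the Python tickets linked above, they concern Python 2's subprocess handling becoming slow when the open-file limit is very high, so a large value here would be consistent with the timeout. The THRESHOLD value and the variable names are illustrative assumptions, not something taken from the chart or from Ansible.

```shell
# Report the open-file limits that a process started from this shell
# (such as Ansible's Python 2 interpreter) would inherit.
# THRESHOLD is a hypothetical cut-off chosen only for illustration.
THRESHOLD=65536

soft="$(ulimit -S -n)"   # soft limit: the value the process actually sees
hard="$(ulimit -H -n)"   # hard limit: the ceiling the soft limit may be raised to
echo "soft=$soft hard=$hard"

# "unlimited" is not a number, so test for it before comparing numerically.
if [ "$soft" = "unlimited" ] || [ "$soft" -gt "$THRESHOLD" ]; then
    verdict="high"
else
    verdict="ok"
fi
echo "open-file limit looks $verdict"
```

A "high" verdict does not prove this is the cause, but it is a cheap first check before patching anything.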

@Reylak

Reylak commented Oct 11, 2022

It seems that the "bug" also comes from using Python 2: OpenWhisk uses a Docker image of CouchDB 2.3.1, which is based on Debian Buster (slim), where the only variant of Python is Python 2...

I won't try to run OpenWhisk with CouchDB 3 because I have no idea what that would imply.

However, it means another valid workaround is to "fix" the environment where Ansible runs (i.e., the CouchDB container when initializing the DB) by adding ulimit -n 4096 to the init script at "helm/openwhisk/configMapFiles/initCouchDB/initdb.sh". Tested and approved on roughly up-to-date Arch Linux.

Do you think this could be a valid fix to this problem, that could be merged? As a way to deal with a wart from the obsolete Python 2. I find it cleaner, clearer, and easier to implement than to patch some Ansible plugin file.
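
The workaround above can be sketched as a few lines of shell. This is a guarded variant of the plain `ulimit -n 4096` call, not the exact line that was tested: it only lowers the limit when the current one is higher, so it never tries to raise a limit it is not allowed to raise. It assumes the lines are placed in initdb.sh before Ansible is invoked; MAX_FDS is a hypothetical name used only for illustration.

```shell
# Cap the open-file limit before Ansible (running on Python 2) starts.
# MAX_FDS is an illustrative name; 4096 is the value from the workaround.
MAX_FDS=4096

current="$(ulimit -n)"

# Lower the limit only when it is currently higher (or unlimited).
# Lowering is always permitted; raising may not be, so a smaller
# existing limit is left untouched.
if [ "$current" = "unlimited" ] || [ "$current" -gt "$MAX_FDS" ]; then
    ulimit -n "$MAX_FDS"
fi

echo "open-file limit is now $(ulimit -n)"
```

The limit set this way applies to the script's own shell and to the processes it starts afterwards, which is why it has to run before the Ansible playbook is launched.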

@rabbah
Member

rabbah commented Oct 14, 2022

If still using Python 2, it's def better to move to v3 instead.

@Reylak

Reylak commented Oct 14, 2022

Sure, but as I said, using Python 2 comes from using CouchDB 2.3.1 Docker image. I don't know if OpenWhisk can work with v3, which hopefully is based on a more up-to-date Debian image.

@Samxamnom

> However, it means another valid workaround is to "fix" the environment where Ansible run (i.e., the CouchDB container when initializing the DB) by adding ulimit -n 4096 to the init script at "helm/openwhisk/configMapFiles/initCouchDB/initdb.sh". Tested and approved on roughly up-to-date Arch Linux.
>
> Do you think this could be a valid fix to this problem, that could be merged? As a way to deal with a wart from the obsolete Python 2. I find it cleaner, clearer, and easier to implement than to patch some Ansible plugin file

I came across the same issue on Arch Linux; this fix also worked for me.
