Can't deploy on EKS #766

Closed
diegombeltran opened this issue Jun 17, 2023 · 1 comment

diegombeltran commented Jun 17, 2023

Hello,

I have spent many hours trying to deploy the chart to an EKS cluster for testing purposes.

The cluster uses the default AWS CNI, and connectivity between pods works. Pods can clone from GitHub (for the CouchDB init pod and the Ansible role).

However, CouchDB becomes ready but remains empty, and the init-couchdb pod gets stuck waiting for CouchDB to become available.
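
To check from inside the cluster whether CouchDB actually answers HTTP requests, a polling loop like the one below may help. This is a minimal sketch, not the chart's own wait script: the function names `wait_until` and `couchdb_is_up` are hypothetical, and the URL you pass in (service name and port) is an assumption that must match your deployment.

```python
import time
import urllib.error
import urllib.request


def wait_until(check, timeout=60.0, interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or `timeout` seconds have passed.

    Returns True on success and False on timeout. `sleep` and `clock`
    are injectable so the loop can be exercised without real delays.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def couchdb_is_up(base_url):
    """Return True if a GET on base_url answers with HTTP 200.

    base_url is an assumption: use the CouchDB service address and
    port of your own deployment.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or HTTP error status.
        return False
```

For example, `wait_until(lambda: couchdb_is_up(url), timeout=30)` returns `False` if nothing answers at `url` within roughly 30 seconds, which would match the symptom described above.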

CouchDB pod logs

[info] 2023-06-17T18:56:17.011839Z couchdb@couchdb0 <0.9.0> -------- Application couch_log started on node couchdb@couchdb0                                                                                                                                                      
[info] 2023-06-17T18:56:17.016652Z couchdb@couchdb0 <0.9.0> -------- Application folsom started on node couchdb@couchdb0                                                                                                                                                         
[info] 2023-06-17T18:56:17.056156Z couchdb@couchdb0 <0.9.0> -------- Application couch_stats started on node couchdb@couchdb0                                                                                                                                                    
[info] 2023-06-17T18:56:17.056327Z couchdb@couchdb0 <0.9.0> -------- Application khash started on node couchdb@couchdb0                                                                                                                                                          
[info] 2023-06-17T18:56:17.063368Z couchdb@couchdb0 <0.9.0> -------- Application couch_event started on node couchdb@couchdb0                                                                                                                                                    
[info] 2023-06-17T18:56:17.063521Z couchdb@couchdb0 <0.9.0> -------- Application hyper started on node couchdb@couchdb0                                                                                                                                                          
[info] 2023-06-17T18:56:17.071432Z couchdb@couchdb0 <0.9.0> -------- Application ibrowse started on node couchdb@couchdb0                                                                                                                                                        
[info] 2023-06-17T18:56:17.078125Z couchdb@couchdb0 <0.9.0> -------- Application ioq started on node couchdb@couchdb0                                                                                                                                                            
[info] 2023-06-17T18:56:17.078284Z couchdb@couchdb0 <0.9.0> -------- Application mochiweb started on node couchdb@couchdb0                                                                                                                                                       
[info] 2023-06-17T18:56:17.086877Z couchdb@couchdb0 <0.212.0> -------- Apache CouchDB 2.3.1 is starting.                                                                                                                                                                         
                                                                                                                                                                                                                                                                                 
[info] 2023-06-17T18:56:17.086958Z couchdb@couchdb0 <0.213.0> -------- Starting couch_sup                                                                                                                                                                                        
[notice] 2023-06-17T18:56:17.096614Z couchdb@couchdb0 <0.96.0> -------- config: [features] pluggable-storage-engines set to true for reason nil                                                                                                                                  
[notice] 2023-06-17T18:56:17.104671Z couchdb@couchdb0 <0.96.0> -------- config: [admins] whisk_admin set to -pbkdf2-13d37107c7df3c10734110831868f5fe7559358f,273cbf620346f36e5cb0b982068bc4e0,10 for reason nil                                                                  
[notice] 2023-06-17T18:56:17.132239Z couchdb@couchdb0 <0.96.0> -------- config: [couchdb] uuid set to 44d3007dd9acb050c4a72194f28045f4 for reason nil                                                                                                                            
[info] 2023-06-17T18:56:17.169199Z couchdb@couchdb0 <0.218.0> -------- open_result error {not_found,no_db_file} for _users                                                                                                                                                       
[info] 2023-06-17T18:56:17.242079Z couchdb@couchdb0 <0.212.0> -------- Apache CouchDB has started. Time to relax.                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
[info] 2023-06-17T18:56:17.242203Z couchdb@couchdb0 <0.212.0> -------- Apache CouchDB has started on http://any:5986/                                                                                                                                                            
[info] 2023-06-17T18:56:17.242329Z couchdb@couchdb0 <0.9.0> -------- Application couch started on node couchdb@couchdb0                                                                                                                                                          
[info] 2023-06-17T18:56:17.242425Z couchdb@couchdb0 <0.9.0> -------- Application ets_lru started on node couchdb@couchdb0                                                                                                                                                        
[notice] 2023-06-17T18:56:17.296037Z couchdb@couchdb0 <0.283.0> -------- rexi_server : started servers                                                                                                                                                                           
[notice] 2023-06-17T18:56:17.301428Z couchdb@couchdb0 <0.288.0> -------- rexi_buffer : started servers                                                                                                                                                                           
[info] 2023-06-17T18:56:17.301602Z couchdb@couchdb0 <0.9.0> -------- Application rexi started on node couchdb@couchdb0                                                                                                                                                           
[info] 2023-06-17T18:56:17.315957Z couchdb@couchdb0 <0.218.0> -------- open_result error {not_found,no_db_file} for _nodes                                                                                                                                                       
[warning] 2023-06-17T18:56:17.316010Z couchdb@couchdb0 <0.297.0> -------- creating missing database: _nodes                                                                                                                                                                      
[warning] 2023-06-17T18:56:17.347553Z couchdb@couchdb0 <0.312.0> -------- creating missing database: _dbs                                                                                                                                                                        
[info] 2023-06-17T18:56:17.347556Z couchdb@couchdb0 <0.218.0> -------- open_result error {not_found,no_db_file} for _dbs                                                                                                                                                         
[warning] 2023-06-17T18:56:17.347603Z couchdb@couchdb0 <0.311.0> -------- creating missing database: _dbs                                                                                                                                                                        
[info] 2023-06-17T18:56:17.357304Z couchdb@couchdb0 <0.9.0> -------- Application mem3 started on node couchdb@couchdb0                                                                                                                                                           
[info] 2023-06-17T18:56:17.357411Z couchdb@couchdb0 <0.9.0> -------- Application fabric started on node couchdb@couchdb0                                                                                                                                                         
[info] 2023-06-17T18:56:17.378853Z couchdb@couchdb0 <0.9.0> -------- Application chttpd started on node couchdb@couchdb0                                                                                                                                                         
[notice] 2023-06-17T18:56:17.392567Z couchdb@couchdb0 <0.350.0> -------- chttpd_auth_cache changes listener died database_does_not_exist at mem3_shards:load_shards_from_db/6(line:395) <= mem3_shards:load_shards_from_disk/1(line:370) <= mem3_shards:load_shards_from_disk/2(line:399) <= mem3_shards:for_docid/3(line:86) <= fabric_doc_open:go/3(line:39) <= chttpd_auth_cache:ensure_auth_ddoc_exists/2(line:195) <= chttpd_auth_cache:listen_for_changes/1(line:142)
[error] 2023-06-17T18:56:17.392621Z couchdb@couchdb0 emulator -------- Error in process <0.351.0> on node couchdb@couchdb0 with exit value:                                                                                                                                      
{database_does_not_exist,[{mem3_shards,load_shards_from_db,"_users",[{file,"src/mem3_shards.erl"},{line,395}]},{mem3_shards,load_shards_from_disk,1,[{file,"src/mem3_shards.erl"},{line,370}]},{mem3_shards,load_shards_from_disk,2,[{file,"src/mem3_shards.erl"},{line,399}]},{mem3_shards,for_docid,3,[{file,"src/mem3_shards.erl"},{line,86}]},{fabric_doc_open,go,3,[{file,"src/fabric_doc_open.erl"},{line,39}]},{chttpd_auth_cache,ensure_auth_ddoc_exists,2,[{file,"src/chttpd_auth_cache.erl"},{line,195}]},{chttpd_auth_cache,listen_for_changes,1,[{file,"src/chttpd_auth_cache.erl"},{line,142}]}]}
                                                                                                                                                                                                                                                                                 
[info] 2023-06-17T18:56:17.395519Z couchdb@couchdb0 <0.9.0> -------- Application couch_index started on node couchdb@couchdb0                                                                                                                                                    
[info] 2023-06-17T18:56:17.395765Z couchdb@couchdb0 <0.9.0> -------- Application couch_mrview started on node couchdb@couchdb0                                                                                                                                                   
[info] 2023-06-17T18:56:17.395805Z couchdb@couchdb0 <0.9.0> -------- Application couch_plugins started on node couchdb@couchdb0                                                                                                                                                  
[notice] 2023-06-17T18:56:17.420127Z couchdb@couchdb0 <0.96.0> -------- config: [features] scheduler set to true for reason nil                                                                                                                                                  
[info] 2023-06-17T18:56:17.434126Z couchdb@couchdb0 <0.218.0> -------- open_result error {not_found,no_db_file} for _replicator                                                                                                                                                  
[notice] 2023-06-17T18:56:17.438762Z couchdb@couchdb0 <0.368.0> -------- creating replicator ddoc <<"_replicator">>                                                                                                                                                              
[info] 2023-06-17T18:56:17.447682Z couchdb@couchdb0 <0.9.0> -------- Application couch_replicator started on node couchdb@couchdb0                                                                                                                                               
[info] 2023-06-17T18:56:17.457670Z couchdb@couchdb0 <0.9.0> -------- Application couch_peruser started on node couchdb@couchdb0                                                                                                                                                  
[info] 2023-06-17T18:56:17.467642Z couchdb@couchdb0 <0.9.0> -------- Application ddoc_cache started on node couchdb@couchdb0                                                                                                                                                     
[info] 2023-06-17T18:56:17.482205Z couchdb@couchdb0 <0.9.0> -------- Application global_changes started on node couchdb@couchdb0                                                                                                                                                 
[info] 2023-06-17T18:56:17.482230Z couchdb@couchdb0 <0.9.0> -------- Application jiffy started on node couchdb@couchdb0
[info] 2023-06-17T18:56:17.491913Z couchdb@couchdb0 <0.9.0> -------- Application mango started on node couchdb@couchdb0
[info] 2023-06-17T18:56:17.497995Z couchdb@couchdb0 <0.9.0> -------- Application setup started on node couchdb@couchdb0
[info] 2023-06-17T18:56:17.498038Z couchdb@couchdb0 <0.9.0> -------- Application snappy started on node couchdb@couchdb0
[notice] 2023-06-17T18:56:22.393805Z couchdb@couchdb0 <0.350.0> -------- chttpd_auth_cache changes listener died database_does_not_exist at mem3_shards:load_shards_from_db/6(line:395) <= mem3_shards:load_shards_from_disk/1(line:370) <= mem3_shards:load_shards_from_disk/2(line:399) <= mem3_shards:for_docid/3(line:86) <= fabric_doc_open:go/3(line:39) <= chttpd_auth_cache:ensure_auth_ddoc_exists/2(line:195) <= chttpd_auth_cache:listen_for_changes/1(line:142)
[error] 2023-06-17T18:56:22.393843Z couchdb@couchdb0 emulator -------- Error in process <0.490.0> on node couchdb@couchdb0 with exit value:
{database_does_not_exist,[{mem3_shards,load_shards_from_db,"_users",[{file,"src/mem3_shards.erl"},{line,395}]},{mem3_shards,load_shards_from_disk,1,[{file,"src/mem3_shards.erl"},{line,370}]},{mem3_shards,load_shards_from_disk,2,[{file,"src/mem3_shards.erl"},{line,399}]},{mem3_shards,for_docid,3,[{file,"src/mem3_shards.erl"},{line,86}]},{fabric_doc_open,go,3,[{file,"src/fabric_doc_open.erl"},{line,39}]},{chttpd_auth_cache,ensure_auth_ddoc_exists,2,[{file,"src/chttpd_auth_cache.erl"},{line,195}]},{chttpd_auth_cache,listen_for_changes,1,[{file,"src/chttpd_auth_cache.erl"},{line,142}]}]}
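
The CouchDB log above reports several databases as missing, and only some of them are followed by a "creating missing database" line. A small script can make that difference explicit. This is a rough sketch that assumes the exact log wording shown above (CouchDB 2.3.1); the function name `missing_not_created` is hypothetical, and the result only reflects what the log lines say, not the actual state of the server.

```python
import re

# Patterns copied from the wording of the log lines in this issue;
# other CouchDB versions may phrase these messages differently.
MISSING = re.compile(r"open_result error \{not_found,no_db_file\} for (\S+)")
CREATED = re.compile(r"creating missing database: (\S+)")


def missing_not_created(log_text):
    """List databases reported missing with no 'creating missing database' line.

    Returns a sorted list of database names.
    """
    missing = set(MISSING.findall(log_text))
    created = set(CREATED.findall(log_text))
    return sorted(missing - created)
```

Run against the log above, the list would include `_users`, the same database named in the repeated `database_does_not_exist` error.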
init-couchdb pod logs

Cloning into '/openwhisk'...
/openwhisk /                                                                                                                                                                                   
Note: checking out 'ef725a653ab112391f79c274d8e3dcfb915d59a3'.                                                                                                                                 
                                                                                                                                                                                               
You are in 'detached HEAD' state. You can look around, make experimental                                                                                                                       
changes and commit them, and you can discard any commits you make in this                                                                                                                      
state without impacting any branches by performing another checkout.                                                                                                                           
                                                                                                                                                                                               
If you want to create a new branch to retain commits you create, you may                                                                                                                       
do so (now or later) by using -b with the checkout command again. Example:                                                                                                                     
                                                                                                                                                                                               
  git checkout -b <new-branch-name>                                                                                                                                                                                                            
                                                                                                                                                                                               
HEAD is now at ef725a65 Prevent cycle in the QueueManager (#5332)                                                                                              
/                                                                                                                                                                                                                                              
/openwhisk/ansible /                                                                                                                                                                                                                           
 [WARNING]: Unable to parse /openwhisk/ansible/environments/local as an                                                                                                                                                                        
inventory source                                                                                                                                                                                                                               
 [WARNING]: No inventory was parsed, only implicit localhost is available                                                                                                                                                                      
 [WARNING]: provided hosts list is empty, only localhost is available. Note                                                                                                                                                                    
that the implicit localhost does not match 'all'                                                                                                                                                                                               
                                                                                                                                                                                                                                               
PLAY [localhost] ***************************************************************                                                                                                                                                               
 [WARNING]: While constructing a mapping from                                                                                                                                                                                                  
/openwhisk/ansible/group_vars/all, line 494, column 3, found a duplicate dict                                                                                                                                                                  
key (dataManagementService). Using last defined value only.                                                                                                                                                                                    
                                                                                                                                                                                                                                               
TASK [Gathering Facts] *********************************************************                                                                                                                                                               
Saturday 17 June 2023  18:56:21 +0000 (0:00:00.041)       0:00:00.041 *********                                                                                                                                                                
ok: [localhost]                                                                                                                                                                                                                                
                                                                                                                                                                                                                                               
TASK [gen hosts if 'local' env is used] ****************************************                                                                                                                                                               
Saturday 17 June 2023  18:56:23 +0000 (0:00:01.517)       0:00:01.558 *********                                                                                                                                                                
changed: [localhost -> localhost]                                                                                                                              
                                                                                                                                                                                                                                               
TASK [find the ip of docker-machine] *******************************************                                                                                                                                                               
Saturday 17 June 2023  18:56:23 +0000 (0:00:00.463)       0:00:02.022 *********                                                                                                                                                                
skipping: [localhost]                                                                                                                                                                                                                          
                                                                                                                                                                                                                                               
TASK [get the docker-machine ip] ***********************************************                                                                                                                                                               
Saturday 17 June 2023  18:56:23 +0000 (0:00:00.023)       0:00:02.045 *********                                                                                                                                                                
skipping: [localhost]                                                                                                                                                                                                                          
                                                                                                                                                                                                                                               
TASK [gen hosts for docker-machine] ********************************************                                                                                                                                                               
Saturday 17 June 2023  18:56:23 +0000 (0:00:00.029)       0:00:02.075 *********                                                                                                                                                                
skipping: [localhost]                                                                                                                                                                                                                          
                                                                                                                                                                                                                                               
TASK [gen hosts for Jenkins] ***************************************************                                                                                                                                                               
Saturday 17 June 2023  18:56:23 +0000 (0:00:00.029)       0:00:02.104 *********                                                                                                                                                                
skipping: [localhost]                                                                                                                                                                                                                          
                                                                                                                                                                                                                                               
TASK [check if db_local.ini exists?] *******************************************                                                                                                                                                               
Saturday 17 June 2023  18:56:23 +0000 (0:00:00.027)       0:00:02.132 *********                                                                                                                                                                
ok: [localhost]                                                                                                                                                
                                                                                                                                                                                                                                               
TASK [prepare db_local.ini] ****************************************************                                                                               
Saturday 17 June 2023  18:56:23 +0000 (0:00:00.149)       0:00:02.281 *********                                                                                
changed: [localhost -> localhost]                                                                                                                              
                                                                                                                                                               
TASK [gen untrusted server certificate for host] *******************************                                                                               
Saturday 17 June 2023  18:56:24 +0000 (0:00:00.268)       0:00:02.550 *********                                                                                
changed: [localhost -> localhost]                                                                                                                              

TASK [gen untrusted client certificate for host] *******************************                                                                               
Saturday 17 June 2023  18:56:24 +0000 (0:00:00.830)       0:00:03.380 *********                                                                                
changed: [localhost -> localhost]                                                                                                                              
                                                                                                                                                               
TASK [clean up old kafka keystore] *********************************************                                                                               
Saturday 17 June 2023  18:56:25 +0000 (0:00:00.819)       0:00:04.200 *********                                                                                
skipping: [localhost]                                                                                                                                          
                                                                                                                                                               
TASK [ensure kafka files directory exists] *************************************                                                                               
Saturday 17 June 2023  18:56:25 +0000 (0:00:00.025)       0:00:04.225 *********                                                                                
skipping: [localhost]                                                                                                                                          

TASK [generate kafka certificates] *********************************************                                                                               
Saturday 17 June 2023  18:56:25 +0000 (0:00:00.031)       0:00:04.257 *********                                                                                
skipping: [localhost]                                                                                                                                          

TASK [ensure controller files directory exists] ********************************                                                                               
Saturday 17 June 2023  18:56:25 +0000 (0:00:00.022)       0:00:04.280 *********                                                                                
changed: [localhost]                                                                                                                                           

TASK [generate controller certificates] ****************************************                                                                               
Saturday 17 June 2023  18:56:26 +0000 (0:00:00.567)       0:00:04.847 *********                                                                                
changed: [localhost -> localhost]                                                                                                                              

TASK [ensure invoker files directory exists] ***********************************                                                                               
Saturday 17 June 2023  18:56:27 +0000 (0:00:01.002)       0:00:05.849 *********                                                                                
changed: [localhost]                                                                                                                                           

TASK [generate invoker certificates] *******************************************                                                                               
Saturday 17 June 2023  18:56:27 +0000 (0:00:00.480)       0:00:06.330 *********                                                                                
changed: [localhost -> localhost]                                                                                                                              

PLAY RECAP *********************************************************************                                                                               
localhost                  : ok=10   changed=8    unreachable=0    failed=0                                                                                    

Saturday 17 June 2023  18:56:28 +0000 (0:00:01.040)       0:00:07.371 *********                                                                                
===============================================================================                                                                                
Gathering Facts --------------------------------------------------------- 1.52s                                                                                
generate invoker certificates ------------------------------------------- 1.04s                                                                                
generate controller certificates ---------------------------------------- 1.00s                                                                                
gen untrusted server certificate for host ------------------------------- 0.83s                                                                                
gen untrusted client certificate for host ------------------------------- 0.82s                                                                                
ensure controller files directory exists -------------------------------- 0.57s                                                                                
ensure invoker files directory exists ----------------------------------- 0.48s                                                                                
gen hosts if 'local' env is used ---------------------------------------- 0.46s                                                                                
prepare db_local.ini ---------------------------------------------------- 0.27s                                                                                
check if db_local.ini exists? ------------------------------------------- 0.15s                                                                                
ensure kafka files directory exists ------------------------------------- 0.03s                                                                                
get the docker-machine ip ----------------------------------------------- 0.03s                                                                                
gen hosts for docker-machine -------------------------------------------- 0.03s                                                                                
gen hosts for Jenkins --------------------------------------------------- 0.03s                                                                                
clean up old kafka keystore --------------------------------------------- 0.03s                                                                                
find the ip of docker-machine ------------------------------------------- 0.02s                                                                                
generate kafka certificates --------------------------------------------- 0.02s                                                                                

PLAY [db] **********************************************************************                                                                               
 [WARNING]: While constructing a mapping from                                                                                                                  
/openwhisk/ansible/group_vars/all, line 494, column 3, found a duplicate dict                                                                                  
key (dataManagementService). Using last defined value only.                                                                                                    

TASK [Gathering Facts] *********************************************************                                                                               
Saturday 17 June 2023  18:56:29 +0000 (0:00:00.044)       0:00:00.044 *********                                                                                
ok: [172.17.0.1]                                                                                                                                               

PLAY RECAP *********************************************************************                                                                               
172.17.0.1                 : ok=1    changed=0    unreachable=0    failed=0                                                                                    

Saturday 17 June 2023  18:56:31 +0000 (0:00:01.499)       0:00:01.543 *********                                                                                
===============================================================================                                                                                
Gathering Facts --------------------------------------------------------- 1.50s                                                                                
/                                                                                                                                                              
waiting for CouchDB to be available                                                                                                                            
waiting for CouchDB to be available

As a result, the other pods are stuck in the Init state:

wsk           wsk-alarmprovider-7c4d6df59d-b7k9q   0/1     Init:0/1    0          6m13s
wsk           wsk-apigateway-7b985fcb9d-kl2rs      1/1     Running     0          6m13s
wsk           wsk-controller-0                     0/1     Init:0/2    0          6m13s
wsk           wsk-couchdb-7f866c6b59-hds6j         1/1     Running     0          6m13s
wsk           wsk-gen-certs-lf458                  0/1     Completed   0          6m13s
wsk           wsk-grafana-5ccc75c949-zz82h         1/1     Running     0          6m13s
wsk           wsk-init-couchdb-xlr88               1/1     Running     0          6m13s
wsk           wsk-install-packages-brncp           0/1     Init:0/1    0          6m13s
wsk           wsk-invoker-0                        0/1     Init:0/1    0          6m13s
wsk           wsk-kafka-0                          0/1     Init:0/1    0          6m13s
wsk           wsk-kafkaprovider-745d9f46fb-4w7z7   0/1     Init:0/1    0          6m13s
wsk           wsk-nginx-787f646fff-xbgsd           0/1     Init:0/1    0          6m13s
wsk           wsk-prometheus-server-0              1/1     Running     0          6m13s
wsk           wsk-redis-6bfbc9956f-6m9kz           1/1     Running     0          6m13s
wsk           wsk-wskadmin                         1/1     Running     0          6m13s
wsk           wsk-zookeeper-0                      1/1     Running     0          6m13s
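
A listing like the one above can be filtered to show only the pods still waiting on init containers. This is a minimal sketch, assuming the six-column layout shown here (namespace, name, ready, status, restarts, age). The function name `pods_stuck_in_init` is hypothetical and not part of any OpenWhisk or Kubernetes tooling.

```python
from typing import List


def pods_stuck_in_init(listing: str) -> List[str]:
    """Return the names of pods whose STATUS column starts with 'Init:'.

    Assumes each line has the columns:
    namespace, name, ready, status, restarts, age.
    """
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank or malformed lines that do not have all six columns.
        if len(fields) >= 6 and fields[3].startswith("Init:"):
            stuck.append(fields[1])
    return stuck


if __name__ == "__main__":
    sample = """
    wsk  wsk-controller-0              0/1  Init:0/2  0  6m13s
    wsk  wsk-couchdb-7f866c6b59-hds6j  1/1  Running   0  6m13s
    wsk  wsk-invoker-0                 0/1  Init:0/1  0  6m13s
    """
    for name in pods_stuck_in_init(sample):
        print(name)
```

In the listing above, every pod in an `Init:` state depends directly or indirectly on the database being initialized, which is consistent with `wsk-init-couchdb` never finishing.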

I read a lot of issues and docs here about problems like these, which mention network connectivity issues.
I also tried the latest ow-utils, as suggested in another issue from 2022. I changed my k8s/domain field as seen in another issue. I even tried persistence: false, at least to rule out storageClass issues or something similar.

Seriously, the deployment shouldn't be that hard.

@diegombeltran
Author

Well, it was indeed a connectivity issue between the init-couchdb pod and the CouchDB pod. The Security Groups and nodes were deployed with Terraform.

Sorry for the inconvenience!
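
For anyone hitting the same symptom, a quick way to confirm this kind of problem is to test plain TCP reachability from one pod to the CouchDB service. The following is a minimal sketch, not part of the chart: the function name `tcp_reachable` is hypothetical, the service name `wsk-couchdb` is an assumption based on the pod names above, and 5984 is CouchDB's default port (the chart may be configured differently).

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Assumed values: adjust to the actual service name and port in your release.
    host, port = "wsk-couchdb", 5984
    if tcp_reachable(host, port):
        print(f"{host}:{port} is reachable")
    else:
        print(f"{host}:{port} is NOT reachable (check Security Groups / network policies)")
```

If this fails from a pod on one node but succeeds from a pod on the same node as CouchDB, the node-to-node rules (for example, Security Groups) are a likely place to look.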
