
Pods are not initializing for Che and keycloak kubernetes #13838

Closed
SDAdham opened this issue Jul 12, 2019 · 43 comments
Labels
kind/bug Outline of a bug - must adhere to the bug report template. severity/P1 Has a major impact to usage or development of the system.

Comments

@SDAdham

SDAdham commented Jul 12, 2019

Describe the bug

Keycloak and Che are not initializing; they stay in the following state:

che-6f5989dcc8-cs9k2        0/1     Init:0/2   0          37m
keycloak-6fdbdf45f6-mlmml   0/1     Init:0/1   0          37m
postgres-6c4d6c764c-m9qrn   1/1     Running    0          37m

Meanwhile, if I deploy single-user Che on the very same Kubernetes environment, it works perfectly!

Che version

  • latest
  • nightly
  • other: please specify
    6.19.0, 6.19.2, 7.0.0-rc-3.x, 6.19.5

Steps to reproduce

kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
kubectl create serviceaccount tiller --namespace kube-system
git clone https://github.com/eclipse/che.git
cd che/deploy/kubernetes/helm/che
kubectl apply -f ./tiller-rbac.yaml
helm init --service-account tiller --wait
helm dependency update
helm upgrade --install che --namespace dev --set cheImage=eclipse/che-server:<version> --set global.multiuser=true,global.cheDomain=<domain> ./
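
The stuck state can then be observed with something like the following (a sketch; the pod name is taken from the listing above and will differ per deployment):

kubectl get pods --namespace dev --watch
kubectl describe pod che-6f5989dcc8-cs9k2 --namespace dev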

Expected behavior

Pods should be initialized and che environment should be deployed

Runtime

  • kubernetes (include output of kubectl version)
  • Openshift (include output of oc version)
  • minikube (include output of minikube version and kubectl version)
  • minishift (include output of minishift version and oc version)
  • docker-desktop + K8S (include output of docker version and kubectl version)
  • other: (please specify)
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-21T13:09:06Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-21T13:07:26Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}

Screenshots

Screenshots of one of the pods:
Image Pasted at 2019-7-13 01-46
Image Pasted at 2019-7-13 01-48
Image Pasted at 2019-7-13 01-49

Events of the entire namespace

$ kubectl get events --namespace dev --sort-by='.metadata.creationTimestamp'
LAST SEEN   TYPE      REASON              OBJECT                           MESSAGE
15m         Normal    Scheduled           pod/che-6f5989dcc8-5wq2d         Successfully assigned dev/che-6f5989dcc8-5wq2d to x
15m         Normal    SuccessfulCreate    replicaset/postgres-6c4d6c764c   Created pod: postgres-6c4d6c764c-jlzpz
15m         Normal    Scheduled           pod/postgres-6c4d6c764c-jlzpz    Successfully assigned dev/postgres-6c4d6c764c-jlzpz to x
15m         Normal    ScalingReplicaSet   deployment/keycloak              Scaled up replica set keycloak-6fdbdf45f6 to 1
15m         Normal    CREATE              ingress/keycloak-ingress         Ingress dev/keycloak-ingress
15m         Normal    SuccessfulCreate    replicaset/che-6f5989dcc8        Created pod: che-6f5989dcc8-5wq2d
15m         Normal    CREATE              ingress/che-ingress              Ingress dev/che-ingress
15m         Normal    CREATE              ingress/che-ingress              Ingress dev/che-ingress
15m         Normal    CREATE              ingress/keycloak-ingress         Ingress dev/keycloak-ingress
15m         Normal    SuccessfulCreate    replicaset/keycloak-6fdbdf45f6   Created pod: keycloak-6fdbdf45f6-q8vd6
15m         Normal    ScalingReplicaSet   deployment/che                   Scaled up replica set che-6f5989dcc8 to 1
15m         Normal    Scheduled           pod/keycloak-6fdbdf45f6-q8vd6    Successfully assigned dev/keycloak-6fdbdf45f6-q8vd6 to x
61s         Warning   DNSConfigForming    pod/keycloak-6fdbdf45f6-q8vd6    Search Line limits were exceeded, some search paths have been omitted, the applied search line is
15m         Normal    ScalingReplicaSet   deployment/postgres              Scaled up replica set postgres-6c4d6c764c to 1
15m         Normal    Pulled              pod/keycloak-6fdbdf45f6-q8vd6    Container image "alpine:3.5" already present on machine
49s         Warning   DNSConfigForming    pod/che-6f5989dcc8-5wq2d         Search Line limits were exceeded, some search paths have been omitted, the applied search line is
16s         Warning   DNSConfigForming    pod/postgres-6c4d6c764c-jlzpz    Search Line limits were exceeded, some search paths have been omitted, the applied search line is
15m         Normal    Created             pod/che-6f5989dcc8-5wq2d         Created container wait-for-postgres
15m         Normal    Pulling             pod/postgres-6c4d6c764c-jlzpz    Pulling image "eclipse/che-postgres:nightly"
15m         Normal    Pulled              pod/che-6f5989dcc8-5wq2d         Container image "alpine:3.5" already present on machine
15m         Normal    Created             pod/keycloak-6fdbdf45f6-q8vd6    Created container wait-for-postgres
15m         Normal    Started             pod/keycloak-6fdbdf45f6-q8vd6    Started container wait-for-postgres
15m         Normal    Started             pod/che-6f5989dcc8-5wq2d         Started container wait-for-postgres
15m         Normal    Pulled              pod/postgres-6c4d6c764c-jlzpz    Successfully pulled image "eclipse/che-postgres:nightly"
15m         Normal    Started             pod/postgres-6c4d6c764c-jlzpz    Started container postgres
15m         Normal    Created             pod/postgres-6c4d6c764c-jlzpz    Created container postgres
15m         Warning   Unhealthy           pod/postgres-6c4d6c764c-jlzpz    Readiness probe failed: psql: could not connect to server: Connection refused
            Is the server running on host "127.0.0.1" and accepting
            TCP/IP connections on port 5432?
14m         Normal    UPDATE              ingress/keycloak-ingress         Ingress dev/keycloak-ingress
14m         Normal    UPDATE              ingress/keycloak-ingress         Ingress dev/keycloak-ingress
14m         Normal    UPDATE              ingress/che-ingress              Ingress dev/che-ingress
14m         Normal    UPDATE              ingress/che-ingress              Ingress dev/che-ingress

Describe of one of the pods:

Name:           keycloak-6fdbdf45f6-p2rnc
Namespace:      dev
Node:           x/x.x.x.x
Start Time:     Sat, 13 Jul 2019 01:54:40 +1000
Labels:         io.kompose.service=keycloak
                pod-template-hash=6fdbdf45f6
Annotations:    <none>
Status:         Pending
IP:             x.y.z.e
Controlled By:  ReplicaSet/keycloak-6fdbdf45f6
Init Containers:
  wait-for-postgres:
    Container ID:  containerd://edb9932d7cf3a56a3e85580f159a5c99ccd35dba81f098dc3b1a0c9a0092f267
    Image:         alpine:3.5
    Image ID:      docker.io/library/alpine@sha256:66952b313e51c3bd1987d7c4ddf5dba9bc0fb6e524eed2448fa660246b3e76ec
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      apk --no-cache add curl jq ; adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for postgres to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/postgres`; adresses_length=`echo $endpoints | jq -r ".subsets[]?.addresses // [] | length"`; done;
    State:          Running
      Started:      Sat, 13 Jul 2019 01:54:42 +1000
    Ready:          False
    Restart Count:  0
    Environment:
      POD_NAMESPACE:  dev (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from che-keycloak-token-k2rl9 (ro)
Containers:
  keycloak:
    Container ID:
    Image:         eclipse/che-keycloak:nightly
    Image ID:
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /scripts/kc_realm_user.sh
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  1536Mi
    Requests:
      memory:   1Gi
    Liveness:   tcp-socket :8080 delay=5s timeout=30s period=5s #success=1 #failure=11
    Readiness:  http-get http://:8080/auth/js/keycloak.js delay=10s timeout=1s period=3s #success=1 #failure=10
    Environment:
      POSTGRES_PORT_5432_TCP_ADDR:  postgres
      POSTGRES_PORT_5432_TCP_PORT:  5432
      POSTGRES_DATABASE:            keycloak
      POSTGRES_USER:                keycloak
      POSTGRES_PASSWORD:            keycloak
      KEYCLOAK_USER:                admin
      KEYCLOAK_PASSWORD:            admin
      CHE_HOST:                     che-dev.192.168.99.100.nip.io
      ROUTING_SUFFIX:               192.168.99.100.nip.io
      NAMESPACE:                    dev
      PROTOCOL:                     http
    Mounts:
      /opt/jboss/keycloak/standalone/data from keycloak-data (rw)
      /opt/jboss/keycloak/standalone/log from keycloak-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from che-keycloak-token-k2rl9 (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  keycloak-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  keycloak-data
    ReadOnly:   false
  keycloak-log:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  keycloak-log
    ReadOnly:   false
  che-keycloak-token-k2rl9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  che-keycloak-token-k2rl9
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From                  Message
  ----     ------            ----                ----                  -------
  Normal   Scheduled         2m3s                default-scheduler     Successfully assigned dev/keycloak-6fdbdf45f6-p2rnc to x
  Normal   Pulled            2m2s                kubelet, x  Container image "alpine:3.5" already present on machine
  Normal   Created           2m2s                kubelet, x  Created container wait-for-postgres
  Normal   Started           2m1s                kubelet, x  Started container wait-for-postgres
  Warning  DNSConfigForming  53s (x5 over 2m2s)  kubelet, x  Search Line limits were exceeded, some search paths have been omitted, the applied search line is: xyx

validated postgres readiness:

$ kubectl get --namespace dev pod/postgres-6c4d6c764c-m9qrn -o jsonpath='{.spec.containers[0].readinessProbe.exec.command}'
[bash -c psql -h 127.0.0.1 -U ${POSTGRESQL_USER} -q -d $POSTGRESQL_DATABASE -c "SELECT 1"]

And also ran the following command from the pod's exec:

sh-4.2$ psql -h 127.0.0.1 -U pgche -q -d dbche -c 'SELECT 1'
 ?column? 
----------
        1
(1 row)
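
For reference, the wait-for-postgres init container does not run psql; it polls the postgres endpoints object through the Kubernetes API. A quick way to check that object directly (a sketch, assuming kubectl access to the dev namespace):

kubectl get endpoints postgres --namespace dev -o jsonpath='{.subsets[*].addresses[*].ip}'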

Installation method

  • chectl
  • che-operator
  • minishift-addon
  • I don't know
Installed via the Helm chart. helm version output:

Client: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}

Install command:

helm upgrade --install che --namespace dev --set cheImage=eclipse/che-server:<version> --set global.multiuser=true --set global.cheDomain=<domain> ./

In the logs of either the che or keycloak pod there is a single line, referring to the pod itself, saying that it is waiting to be initialized, for example:

container "keycloak" in pod "keycloak-6fdbdf45f6-q8vd6" is waiting to start: PodInitializing

Environment

  • my computer
    • Windows
    • Linux
    • macOS
  • Cloud
    • Amazon
    • Azure
    • GCE
    • other (please specify)
  • other: please specify Ubuntu server 18.04

Additional context

@SDAdham
Author

SDAdham commented Jul 12, 2019

Not sure if #13625 is related to this one; the title sounds the same, but the outcome and issue details look different.

@rhopp rhopp added team/platform status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach labels Jul 12, 2019
@rhopp
Contributor

rhopp commented Jul 12, 2019

I think I was able to reproduce this issue.

Could you please take a look at logs of the init containers? Like this:
kubectl logs <keycloak_pod_name> -n <namespace> -c <container_name>

So in my case it was:

$ kubectl logs keycloak-7666c648f5-lr7p5 -n che -c wait-for-postgres
ERROR: Unable to lock database: Permission denied
ERROR: Failed to open apk database: Permission denied
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand

which seems pretty wrong....
I've reproduced this on minikube 1.2.0 (k8s 1.15).
@skabashnyuk Something to take a look at?

@SDAdham
Author

SDAdham commented Jul 13, 2019

Could you please take a look at logs of the init containers? Like this:
kubectl logs <keycloak_pod_name> -n <namespace> -c <container_name>

Yes, I see the same as you:

waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
waiting for postgres to be ready...
sh: 0: unknown operand
sh: curl: not found
sh: jq: not found

@SDAdham
Author

SDAdham commented Jul 13, 2019

btw @rhopp, this is not only for the keycloak container; I can imagine it would happen with the Che container as well. I believe this because I tried to set up only the che & postgres containers using global.cheDedicatedKeycloak=false,customOidcProvider=<oidc-url>,cheKeycloakClientId=<oidc_clientId>,customOidcUsernameClaim=<user_name_claim> from https://www.eclipse.org/che/docs/che-6/kubernetes-multi-user.html, and it still didn't work.
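
For reference, a sketch of the helm invocation this implies (hedged; the value names come from the kubernetes-multi-user docs linked above, and the <...> placeholders are deployment-specific):

helm upgrade --install che --namespace dev \
  --set global.multiuser=true \
  --set global.cheDomain=<domain> \
  --set global.cheDedicatedKeycloak=false \
  --set customOidcProvider=<oidc-url> \
  --set cheKeycloakClientId=<oidc_clientId> \
  --set customOidcUsernameClaim=<user_name_claim> ./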

@SDAdham
Author

SDAdham commented Jul 13, 2019

Updated the description with 6.19.5, as I also tested against it and the issue is still happening.

@skabashnyuk skabashnyuk added the severity/P1 Has a major impact to usage or development of the system. label Jul 13, 2019
@SDAdham
Author

SDAdham commented Jul 15, 2019

@skabashnyuk
I'm not an expert in Kubernetes pod creation and deployments; however, I did some digging, and here is what I came up with:

      - name: wait-for-postgres
        image: alpine:3.5
        command: ["sh", "-c", "apk --no-cache add curl jq ; adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for postgres to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/postgres`; adresses_length=`echo $endpoints | jq -r \".subsets[]?.addresses // [] | length\"`; done;"]
      - name: wait-for-keycloak
        image: alpine:3.5
        command: ["sh", "-c", "apk --no-cache add curl jq ; adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for keycloak to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/keycloak`; adresses_length=`echo $endpoints | jq -r \".subsets[]?.addresses // [] | length\"`; done;"]
      - name: wait-for-postgres
        image: alpine:3.5
        command: ["sh", "-c", "apk --no-cache add curl jq ; adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for postgres to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/postgres`; adresses_length=`echo $endpoints | jq -r \".subsets[]?.addresses // [] | length\"`; done;"]

I did some research and came up with this answer: https://stackoverflow.com/a/53308881/5224768

So I tried changing the image from alpine:3.5 to byrnedo/alpine-curl, which solved the issue with curl, but not the other commands:

waiting for postgres to be ready...
sh: jq: not found
sh: 0: unknown operand

So this put me on the track that there is nothing wrong with the sh command in the yamls; I'd expect it to be an issue with the image, right?
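
One way (a sketch, assuming Docker is available locally and the image ships a shell) to check up front whether a candidate image already contains both tools, before editing the chart:

docker run --rm --entrypoint sh byrnedo/alpine-curl -c 'command -v curl; command -v jq'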

@SDAdham
Author

SDAdham commented Jul 15, 2019

@skabashnyuk tried this image gempesaw/curl-jq from https://hub.docker.com/r/gempesaw/curl-jq
Here is the output:

fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz: No such file or directory
WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz: No such file or directory
OK: 7 MiB in 17 packages
waiting for postgres to be ready...

And now it's pulling the image eclipse/che-keycloak:nightly; can you please advise whether that image is good to go?


If what I did looks right, or you have any recommendations, please let me know and I will create a branch and commit the changes after I test them locally.

@skabashnyuk
Contributor

eclipse/che-keycloak:nightly - should be ready at midnight I think.
Can you try https://github.com/che-incubator/chectl to install che?

@SDAdham
Author

SDAdham commented Jul 15, 2019

@skabashnyuk sure, I could try; I'll let you know what happens. Btw, keycloak is deployed now and I'm waiting for che. Shall I cancel testing this change?

Should I also wait till tomorrow to use https://github.com/che-incubator/chectl ?

@nickboldt nickboldt added the status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. label Jul 15, 2019
@nickboldt nickboldt added this to the 7.1.0 milestone Jul 15, 2019
@SDAdham
Author

SDAdham commented Jul 16, 2019

@skabashnyuk: I tested chectl and here is the output:

~$ chectl server:start --installer=helm --multiuser --platform=k8s --tls --self-signed-cert
✔ ✈️  Kubernetes preflight checklist
✔ Verify if kubectl is installed
✔ Verify remote kubernetes status...done.
✔ Verify domain is set...set to <domain>.
❯ 🏃‍  Running Helm to install Che
✔ Verify if helm is installed
✔ Create Tiller Role Binding...it already exist.
✔ Create Tiller Service Account...it already exist.
✔ Create Tiller RBAC
✔ Create Tiller Service...it already exist.
✔ Preparing Che Helm Chart...done.
✔ Updating Helm Chart dependencies...done.
✖ Deploying Che Helm Chart
→ Error: release: "che" not found Error: Command failed: /bin/sh -c helm history che --output json
Error: release: "che" not found
    at makeError (/snapshot/chectl/node_modules/execa/index.js:174:9)
    at module.exports.Promise.all.then.arr (/snapshot/chectl/node_modules/execa/index.js:278:16)

Seems related to che-incubator/chectl#108, as my helm version is 2.14.1.

@SDAdham
Author

SDAdham commented Jul 16, 2019

I am using snap for the helm client, and I'm not sure if it supports downgrading; I'm also not sure whether it is a client-only issue or affects both client and server.

@SDAdham
Author

SDAdham commented Jul 16, 2019

Hmm, but it's also actually right: there are no che releases. Isn't that what it's supposed to check?

@skabashnyuk
Contributor

@l0rd @benoitf @davidfestal have you seen this issue with helm && chectl before? Can you assist?

@l0rd
Contributor

l0rd commented Jul 16, 2019

@SDAdham what version of chectl are you using? This looks related to che-incubator/chectl#18, which was fixed a couple of weeks ago.

If that doesn't work, please copy/paste the output of helm list --all --debug here.

And as a workaround, if you are ok to delete your local Che instance and all its data:

chectl server:delete
chectl server:start --installer=helm --multiuser --platform=k8s --tls --self-signed-cert

@SDAdham
Author

SDAdham commented Jul 16, 2019

@l0rd I am using the latest version of chectl https://github.com/che-incubator/chectl/releases (freshly installed today)

I don't have any Che installed because I can't install it at all in the first place, and even though all installations fail, I run chectl server:delete every time to confirm nothing is installed.

Here is the output of helm list

$helm list --all --debug
[debug] Created tunnel using local port: '36823'

[debug] SERVER: "127.0.0.1:36823"

Edit:
Reference:

~$ chectl
Eclipse Che CLI

VERSION chectl/0.0.2-b508feb linux-x64 node-v10.4.1

@SDAdham
Author

SDAdham commented Jul 16, 2019

New update: after downloading another release from today, chectl/0.0.2-68f8872 linux-x64 node-v10.4.1, and testing with it, I am getting a different error:

Error: Unable to execute helm command helm upgrade --install che --force --namespace che --set global.ingressDomain=<domain> --set global.cheDomain=<domain> --set cheImage=eclipse/che-server:nightly --set global.cheWorkspacesNamespace=che -f /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/values/multi-user.yaml -f /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/values/tls.yaml /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/ / Error: validation failed: [unable to recognize "": no matches for kind "Certificate" in version "certmanager.k8s.io/v1alpha1", unable to recognize "": no matches for kind "ClusterIssuer" in version "certmanager.k8s.io/v1alpha1"]
    at HelmHelper.<anonymous> (/snapshot/chectl/lib/installers/helm.js:0:0)
    at Generator.next (<anonymous>)
    at fulfilled (/snapshot/chectl/node_modules/tslib/tslib.js:107:62)

@SDAdham
Author

SDAdham commented Jul 16, 2019

Alright, I was able to get past that stage after installing the certificate; however, it seems the chectl command puts us back to the main issue again, so it made no progress and no difference:

waiting for postgres to be ready...
sh: 0: unknown operand
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
waiting for postgres to be ready...
sh: 0: unknown operand
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...

Details:
Running the chectl command fails with this:
[screenshot of the chectl failure]
Checking the logs of the wait-for-postgres container for keycloak, it seems to be the same issue as described in the description.

@SDAdham
Author

SDAdham commented Jul 16, 2019

@l0rd @skabashnyuk @benoitf no progress at all, we're back to the original issue

@nickboldt nickboldt removed the status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. label Jul 16, 2019
@SDAdham
Author

SDAdham commented Jul 16, 2019

According to @benoitf, it seems that it could be related to k8s version 1.15.

@skabashnyuk
Contributor

@SDAdham: at this moment, on a clean k8s, with the chectl server:start --installer=helm --multiuser --platform=k8s --tls --self-signed-cert command, you have an issue with postgresql. Am I right?

@SDAdham
Author

SDAdham commented Jul 16, 2019

@skabashnyuk yes, that's correct.

And everything is done as per the instructions and documentation; the only undocumented step, done with the help of @benoitf, is installing cert-manager with the following:

$ kubectl create namespace cert-manager
$ kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.8.1/cert-manager.yaml --validate=false

That was so I could bypass the following error, which required me to manually install cert-manager:

Error: Unable to execute helm command helm upgrade --install che --force --namespace che --set global.ingressDomain=<domain> --set global.cheDomain=<domain> --set cheImage=eclipse/che-server:nightly --set global.cheWorkspacesNamespace=che -f /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/values/multi-user.yaml -f /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/values/tls.yaml /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/ / Error: validation failed: [unable to recognize "": no matches for kind "Certificate" in version "certmanager.k8s.io/v1alpha1", unable to recognize "": no matches for kind "ClusterIssuer" in version "certmanager.k8s.io/v1alpha1"]
    at HelmHelper.<anonymous> (/snapshot/chectl/lib/installers/helm.js:0:0)
    at Generator.next (<anonymous>)
    at fulfilled (/snapshot/chectl/node_modules/tslib/tslib.js:107:62)

@skabashnyuk
Contributor

Can you try the same without --tls --self-signed-cert?

@SDAdham
Author

SDAdham commented Jul 16, 2019

@skabashnyuk I can do that, but I don't really think it's going to make any difference, as the whole problem is that the commands are not found; the step you're asking for moves further away from the actual problem.

Please refer to this comment #13838 (comment)

@skabashnyuk
Contributor

Do you have internet available for your k8s cluster? I see no other reason why the init container command might fail.
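
One way to check that (a sketch; alpine:3.5 is the same image the init containers use and dev is the namespace from the report) is to run a throwaway pod and see whether apk can reach the Alpine package repositories:

kubectl run apk-net-test --rm -it --restart=Never --image=alpine:3.5 --namespace dev -- sh -c 'apk --no-cache add curl jq && curl --version'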

@benoitf
Contributor

benoitf commented Jul 16, 2019

The issue on keycloak is due to the latest changes to the keycloak image and the helm templates.

The init container is trying to install curl and jq and receives:

/ $  apk --no-cache add curl jq
ERROR: Unable to lock database: Permission denied
ERROR: Failed to open apk database: Permission denied

so the subsequent commands do not work.

I didn't hit these errors because I was using cached helm templates with the latest image.
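
Since apk needs root to modify its database, one way (a sketch; the pod name is from the earlier reproduction in the che namespace) to confirm which user the init container runs as:

kubectl exec keycloak-7666c648f5-lr7p5 -n che -c wait-for-postgres -- id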

@SDAdham
Author

SDAdham commented Jul 16, 2019

Thanks @benoitf. So this explains why the issue arises but wasn't seen before? And is it still related to k8s 1.15?

@SDAdham
Author

SDAdham commented Jul 16, 2019

Also, I'm not sure how I can see:

ERROR: Unable to lock database: Permission denied
ERROR: Failed to open apk database: Permission denied

@SDAdham
Author

SDAdham commented Jul 16, 2019

@skabashnyuk I tried your suggestion and, as I mentioned, it made no difference.

@benoitf I tried to see whether I have the same issue you pointed out, but it turns out I don't:

$ kubectl exec pod/keycloak-5c79d755db-qh26f -n che -ti -c wait-for-postgres sh
/ # apk --no-cache add curl jq ; adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for postgres to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/postgres`; adresses_length=`echo $endpoints | jq -r ".subsets[]?.addresses // [] | length"`; done;
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz: No such file or directory
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz: No such file or directory
ERROR: unsatisfiable constraints:
  curl (missing):
    required by: world[curl]
  jq (missing):
    required by: world[jq]
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...

@benoitf
Contributor

benoitf commented Jul 16, 2019

@SDAdham
could you edit helm/che/custom-charts/che-keycloak/templates/deployment.yaml

and use

    spec:
      initContainers:
      - name: wait-for-postgres
        image: everpeace/curl-jq
        command: ["sh", "-c", "adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for postgres to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/postgres`; adresses_length=`echo $endpoints | jq -r \".subsets[]?.addresses // [] | length\"`; done;"]

as a workaround (it uses an image with curl/jq pre-installed instead of installing them at runtime)

@l0rd
Contributor

l0rd commented Jul 16, 2019

@benoitf it looks like we can split this issue in two:

Is that correct?

@benoitf
Contributor

benoitf commented Jul 16, 2019

@l0rd yes, but cert-manager is documented in the "remote install of che" doc.
I will probably add an extra check in chectl to ensure cert-manager is present when TLS is enabled (like we already check that the che-tls secret is there).

@l0rd
Contributor

l0rd commented Jul 16, 2019

I will probably add an extra check in chectl if tls is enabled (like checking if che-tls secret is there as well)

Makes sense

@l0rd l0rd removed this from the 7.1.0 milestone Jul 16, 2019
@SDAdham
Author

SDAdham commented Jul 16, 2019

@benoitf As per this comment: #13838 (comment) and this comment: #13838 (comment)
I did update the yamls; yes, it's a different image than the one you mentioned, but it worked. One thing I noticed, though: updating the yamls doesn't take effect, and I had to update the image at runtime. Can you please advise how I can update the yamls so that the change also takes effect at runtime afterwards?

@benoitf
Contributor

benoitf commented Jul 16, 2019

@SDAdham drop the helm cache
$ helm delete che --purge
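
A sketch of the full cycle this implies, using the release name, namespace and values from the reproduce steps above (adjust to your own deployment):

helm delete che --purge
helm upgrade --install che --namespace dev --set cheImage=eclipse/che-server:<version> --set global.multiuser=true,global.cheDomain=<domain> ./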

@SDAdham
Author

SDAdham commented Jul 16, 2019

How can I drop the helm cache? I always use $ helm delete che --purge, but it doesn't seem to stop helm from using the old image.

@benoitf benoitf added kind/bug Outline of a bug - must adhere to the bug report template. and removed status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach labels Jul 16, 2019
@skabashnyuk
Contributor

@SDAdham I'm going to ask @mkuznyetsov to handle this task. The plan is to create an alpine-based image under the eclipse org that we can use as the init container.
CC @vparfonov

@SDAdham
Author

SDAdham commented Jul 17, 2019

@skabashnyuk sounds like a good plan; I'll be around if any help is needed. I'd prefer to wait for chectl, since it is the most preferred/supported way to go and I'm aiming for a production server.

@davidwindell
Contributor

Here we are using Rancher 2, where chectl isn't useful for us; we can just launch the Helm chart via the GUI. So it would be good if chectl is not made a required element for getting fixes to bugs like this (which I'm sure will be the case; I just wanted to highlight that there are users out there deploying on K8s in different ways).

@SDAdham
Author

SDAdham commented Jul 17, 2019

@davidwindell I do agree, but setting a standard is the best way to move; that way you know what's coming for you, and if you do something different then you're on your own. That's why consistency always succeeds.

I already solved the issue with a different image and could go with that, but I prefer to stick with the solution that will be provided for this issue.

Deploying with helm was my initial approach, but since chectl is working, I prefer it more.

@SDAdham
Author

SDAdham commented Jul 17, 2019

@skabashnyuk @nickboldt: can we add this issue to the V7.0 milestone? I'm not sure we should release 7.0 with deployment issues.

@l0rd
Contributor

l0rd commented Jul 17, 2019

@SDAdham actually #13870 is already in 7.0.0. If you are OK with it, we should close this issue in favor of #13870.

@davidwindell chectl server:start --installer helm is just a helm upgrade wrapper that does some pre-flight checks and automated configuration based on the platform you are deploying Che to. If something can be fixed at the helm chart level, we will fix it at that level. I agree with you: it doesn't make sense to do the fix at the chectl level.

In this case the fix consists of modifying the permissions of a folder in the container image. Every installer will benefit from it.

@SDAdham
Author

SDAdham commented Jul 17, 2019

@l0rd, if nothing is needed from my side, please feel free to close this one. I'll track the other issue until it's resolved.

@SDAdham SDAdham closed this as completed Jul 17, 2019
@SDAdham SDAdham reopened this Jul 17, 2019
@l0rd
Contributor

l0rd commented Jul 17, 2019

Closing in favor of #13870, which has a clear description of the main issue reported here. Thank you @SDAdham, and see you on the other side :-)

@l0rd l0rd closed this as completed Jul 17, 2019