
Pods are not initializing for Che and keycloak kubernetes #13838

Closed
SDAdham opened this issue Jul 12, 2019 · 43 comments
Labels
kind/bug Outline of a bug - must adhere to the bug report template. severity/P1 Has a major impact to usage or development of the system.

Comments

@SDAdham

SDAdham commented Jul 12, 2019

Describe the bug

Keycloak and Che are not initializing; they stay in the following state:

che-6f5989dcc8-cs9k2        0/1     Init:0/2   0          37m
keycloak-6fdbdf45f6-mlmml   0/1     Init:0/1   0          37m
postgres-6c4d6c764c-m9qrn   1/1     Running    0          37m

Meanwhile, if I deploy single-user Che on the very same Kubernetes environment, it works perfectly!

Che version

  • latest
  • nightly
  • other: please specify
    6.19.0, 6.19.2, 7.0.0-rc-3.x, 6.19.5

Steps to reproduce

kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
kubectl create serviceaccount tiller --namespace kube-system
git clone https://github.com/eclipse/che.git
cd che/deploy/kubernetes/helm/che
kubectl apply -f ./tiller-rbac.yaml
helm init --service-account tiller --wait
helm dependency update
helm upgrade --install che --namespace dev --set cheImage=eclipse/che-server:<version> --set global.multiuser=true,global.cheDomain=<domain> ./
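
The stuck state can then be observed with something like the following (a sketch; the pod name is taken from the listing above and will differ per deployment):

kubectl get pods --namespace dev --watch
kubectl describe pod che-6f5989dcc8-cs9k2 --namespace dev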

Expected behavior

Pods should be initialized and che environment should be deployed

Runtime

  • kubernetes (include output of kubectl version)
  • Openshift (include output of oc version)
  • minikube (include output of minikube version and kubectl version)
  • minishift (include output of minishift version and oc version)
  • docker-desktop + K8S (include output of docker version and kubectl version)
  • other: (please specify)
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-21T13:09:06Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-21T13:07:26Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}

Screenshots

Screenshots of one of the pods:
Image Pasted at 2019-7-13 01-46
Image Pasted at 2019-7-13 01-48
Image Pasted at 2019-7-13 01-49

Events of the entire namespace

$ kubectl get events --namespace dev --sort-by='.metadata.creationTimestamp'
LAST SEEN   TYPE      REASON              OBJECT                           MESSAGE
15m         Normal    Scheduled           pod/che-6f5989dcc8-5wq2d         Successfully assigned dev/che-6f5989dcc8-5wq2d to x
15m         Normal    SuccessfulCreate    replicaset/postgres-6c4d6c764c   Created pod: postgres-6c4d6c764c-jlzpz
15m         Normal    Scheduled           pod/postgres-6c4d6c764c-jlzpz    Successfully assigned dev/postgres-6c4d6c764c-jlzpz to x
15m         Normal    ScalingReplicaSet   deployment/keycloak              Scaled up replica set keycloak-6fdbdf45f6 to 1
15m         Normal    CREATE              ingress/keycloak-ingress         Ingress dev/keycloak-ingress
15m         Normal    SuccessfulCreate    replicaset/che-6f5989dcc8        Created pod: che-6f5989dcc8-5wq2d
15m         Normal    CREATE              ingress/che-ingress              Ingress dev/che-ingress
15m         Normal    CREATE              ingress/che-ingress              Ingress dev/che-ingress
15m         Normal    CREATE              ingress/keycloak-ingress         Ingress dev/keycloak-ingress
15m         Normal    SuccessfulCreate    replicaset/keycloak-6fdbdf45f6   Created pod: keycloak-6fdbdf45f6-q8vd6
15m         Normal    ScalingReplicaSet   deployment/che                   Scaled up replica set che-6f5989dcc8 to 1
15m         Normal    Scheduled           pod/keycloak-6fdbdf45f6-q8vd6    Successfully assigned dev/keycloak-6fdbdf45f6-q8vd6 to x
61s         Warning   DNSConfigForming    pod/keycloak-6fdbdf45f6-q8vd6    Search Line limits were exceeded, some search paths have been omitted, the applied search line is
15m         Normal    ScalingReplicaSet   deployment/postgres              Scaled up replica set postgres-6c4d6c764c to 1
15m         Normal    Pulled              pod/keycloak-6fdbdf45f6-q8vd6    Container image "alpine:3.5" already present on machine
49s         Warning   DNSConfigForming    pod/che-6f5989dcc8-5wq2d         Search Line limits were exceeded, some search paths have been omitted, the applied search line is
16s         Warning   DNSConfigForming    pod/postgres-6c4d6c764c-jlzpz    Search Line limits were exceeded, some search paths have been omitted, the applied search line is
15m         Normal    Created             pod/che-6f5989dcc8-5wq2d         Created container wait-for-postgres
15m         Normal    Pulling             pod/postgres-6c4d6c764c-jlzpz    Pulling image "eclipse/che-postgres:nightly"
15m         Normal    Pulled              pod/che-6f5989dcc8-5wq2d         Container image "alpine:3.5" already present on machine
15m         Normal    Created             pod/keycloak-6fdbdf45f6-q8vd6    Created container wait-for-postgres
15m         Normal    Started             pod/keycloak-6fdbdf45f6-q8vd6    Started container wait-for-postgres
15m         Normal    Started             pod/che-6f5989dcc8-5wq2d         Started container wait-for-postgres
15m         Normal    Pulled              pod/postgres-6c4d6c764c-jlzpz    Successfully pulled image "eclipse/che-postgres:nightly"
15m         Normal    Started             pod/postgres-6c4d6c764c-jlzpz    Started container postgres
15m         Normal    Created             pod/postgres-6c4d6c764c-jlzpz    Created container postgres
15m         Warning   Unhealthy           pod/postgres-6c4d6c764c-jlzpz    Readiness probe failed: psql: could not connect to server: Connection refused
            Is the server running on host "127.0.0.1" and accepting
            TCP/IP connections on port 5432?
14m         Normal    UPDATE              ingress/keycloak-ingress         Ingress dev/keycloak-ingress
14m         Normal    UPDATE              ingress/keycloak-ingress         Ingress dev/keycloak-ingress
14m         Normal    UPDATE              ingress/che-ingress              Ingress dev/che-ingress
14m         Normal    UPDATE              ingress/che-ingress              Ingress dev/che-ingress

Describe of one of the pods:

Name:           keycloak-6fdbdf45f6-p2rnc
Namespace:      dev
Node:           x/x.x.x.x
Start Time:     Sat, 13 Jul 2019 01:54:40 +1000
Labels:         io.kompose.service=keycloak
                pod-template-hash=6fdbdf45f6
Annotations:    <none>
Status:         Pending
IP:             x.y.z.e
Controlled By:  ReplicaSet/keycloak-6fdbdf45f6
Init Containers:
  wait-for-postgres:
    Container ID:  containerd://edb9932d7cf3a56a3e85580f159a5c99ccd35dba81f098dc3b1a0c9a0092f267
    Image:         alpine:3.5
    Image ID:      docker.io/library/alpine@sha256:66952b313e51c3bd1987d7c4ddf5dba9bc0fb6e524eed2448fa660246b3e76ec
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      apk --no-cache add curl jq ; adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for postgres to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/postgres`; adresses_length=`echo $endpoints | jq -r ".subsets[]?.addresses // [] | length"`; done;
    State:          Running
      Started:      Sat, 13 Jul 2019 01:54:42 +1000
    Ready:          False
    Restart Count:  0
    Environment:
      POD_NAMESPACE:  dev (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from che-keycloak-token-k2rl9 (ro)
Containers:
  keycloak:
    Container ID:
    Image:         eclipse/che-keycloak:nightly
    Image ID:
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /scripts/kc_realm_user.sh
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  1536Mi
    Requests:
      memory:   1Gi
    Liveness:   tcp-socket :8080 delay=5s timeout=30s period=5s #success=1 #failure=11
    Readiness:  http-get http://:8080/auth/js/keycloak.js delay=10s timeout=1s period=3s #success=1 #failure=10
    Environment:
      POSTGRES_PORT_5432_TCP_ADDR:  postgres
      POSTGRES_PORT_5432_TCP_PORT:  5432
      POSTGRES_DATABASE:            keycloak
      POSTGRES_USER:                keycloak
      POSTGRES_PASSWORD:            keycloak
      KEYCLOAK_USER:                admin
      KEYCLOAK_PASSWORD:            admin
      CHE_HOST:                     che-dev.192.168.99.100.nip.io
      ROUTING_SUFFIX:               192.168.99.100.nip.io
      NAMESPACE:                    dev
      PROTOCOL:                     http
    Mounts:
      /opt/jboss/keycloak/standalone/data from keycloak-data (rw)
      /opt/jboss/keycloak/standalone/log from keycloak-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from che-keycloak-token-k2rl9 (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  keycloak-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  keycloak-data
    ReadOnly:   false
  keycloak-log:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  keycloak-log
    ReadOnly:   false
  che-keycloak-token-k2rl9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  che-keycloak-token-k2rl9
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From                  Message
  ----     ------            ----                ----                  -------
  Normal   Scheduled         2m3s                default-scheduler     Successfully assigned dev/keycloak-6fdbdf45f6-p2rnc to x
  Normal   Pulled            2m2s                kubelet, x  Container image "alpine:3.5" already present on machine
  Normal   Created           2m2s                kubelet, x  Created container wait-for-postgres
  Normal   Started           2m1s                kubelet, x  Started container wait-for-postgres
  Warning  DNSConfigForming  53s (x5 over 2m2s)  kubelet, x  Search Line limits were exceeded, some search paths have been omitted, the applied search line is: xyx

validated postgres readiness:

$ kubectl get --namespace dev pod/postgres-6c4d6c764c-m9qrn -o jsonpath='{.spec.containers[0].readinessProbe.exec.command}'
[bash -c psql -h 127.0.0.1 -U ${POSTGRESQL_USER} -q -d $POSTGRESQL_DATABASE -c "SELECT 1"]

And also ran the following command from the pod's exec:

sh-4.2$ psql -h 127.0.0.1 -U pgche -q -d dbche -c 'SELECT 1'
 ?column? 
----------
        1
(1 row)
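
For reference, the wait-for-postgres init container does not run psql; it polls the postgres endpoints object through the Kubernetes API. A quick way to check that object directly (a sketch, assuming kubectl access to the dev namespace):

kubectl get endpoints postgres --namespace dev -o jsonpath='{.subsets[*].addresses[*].ip}'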

Installation method

  • chectl
  • che-operator
  • minishift-addon
  • I don't know
Installed via the Helm chart. helm version output:

Client: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}

Install command:

helm upgrade --install che --namespace dev --set cheImage=eclipse/che-server:<version> --set global.multiuser=true --set global.cheDomain=<domain> ./

In the logs of either the che or keycloak pod there is a single line, referring to the pod itself, saying that it is waiting to be initialized, for example:

container "keycloak" in pod "keycloak-6fdbdf45f6-q8vd6" is waiting to start: PodInitializing

Environment

  • my computer
    • Windows
    • Linux
    • macOS
  • Cloud
    • Amazon
    • Azure
    • GCE
    • other (please specify)
  • other: please specify Ubuntu server 18.04

Additional context

@SDAdham
Author

SDAdham commented Jul 12, 2019

Not sure if #13625 is related to this one; the title sounds the same, but the outcome and issue details look different.

@rhopp rhopp added team/platform status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach labels Jul 12, 2019
@rhopp
Contributor

rhopp commented Jul 12, 2019

I think I was able to reproduce this issue.

Could you please take a look at logs of the init containers? Like this:
kubectl logs <keycloak_pod_name> -n <namespace> -c <container_name>

So in my case it was:

$ kubectl logs keycloak-7666c648f5-lr7p5 -n che -c wait-for-postgres
ERROR: Unable to lock database: Permission denied
ERROR: Failed to open apk database: Permission denied
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand

which seems pretty wrong....
I've reproduced this on minikube 1.2.0 (k8s 1.15).
@skabashnyuk Something to take a look at?

@SDAdham
Author

SDAdham commented Jul 13, 2019

Could you please take a look at logs of the init containers? Like this:
kubectl logs <keycloak_pod_name> -n <namespace> -c <container_name>

Yes, I see the same as you:

waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
waiting for postgres to be ready...
sh: 0: unknown operand
sh: curl: not found
sh: jq: not found

@SDAdham
Author

SDAdham commented Jul 13, 2019

btw @rhopp, this is not only for the keycloak container; I can imagine it would happen with the Che container as well. I believe this because I tried to set up only the che & postgres containers using global.cheDedicatedKeycloak=false,customOidcProvider=<oidc-url>,cheKeycloakClientId=<oidc_clientId>,customOidcUsernameClaim=<user_name_claim> from https://www.eclipse.org/che/docs/che-6/kubernetes-multi-user.html, and it still didn't work.
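
For reference, a sketch of the helm invocation this implies (hedged; the value names come from the kubernetes-multi-user docs linked above, and the <...> placeholders are deployment-specific):

helm upgrade --install che --namespace dev \
  --set global.multiuser=true \
  --set global.cheDomain=<domain> \
  --set global.cheDedicatedKeycloak=false \
  --set customOidcProvider=<oidc-url> \
  --set cheKeycloakClientId=<oidc_clientId> \
  --set customOidcUsernameClaim=<user_name_claim> ./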

@SDAdham
Author

SDAdham commented Jul 13, 2019

Updated the description with 6.19.5, as I also tested against it and the issue is still happening.

@skabashnyuk skabashnyuk added the severity/P1 Has a major impact to usage or development of the system. label Jul 13, 2019
@SDAdham
Author

SDAdham commented Jul 15, 2019

@skabashnyuk
I'm not an expert in Kubernetes pod creation and deployments; however, I did some digging, and here is what I came up with:

      - name: wait-for-postgres
        image: alpine:3.5
        command: ["sh", "-c", "apk --no-cache add curl jq ; adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for postgres to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/postgres`; adresses_length=`echo $endpoints | jq -r \".subsets[]?.addresses // [] | length\"`; done;"]
      - name: wait-for-keycloak
        image: alpine:3.5
        command: ["sh", "-c", "apk --no-cache add curl jq ; adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for keycloak to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/keycloak`; adresses_length=`echo $endpoints | jq -r \".subsets[]?.addresses // [] | length\"`; done;"]
      - name: wait-for-postgres
        image: alpine:3.5
        command: ["sh", "-c", "apk --no-cache add curl jq ; adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for postgres to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/postgres`; adresses_length=`echo $endpoints | jq -r \".subsets[]?.addresses // [] | length\"`; done;"]

I did some research and came up with this answer: https://stackoverflow.com/a/53308881/5224768

So I tried changing the image from alpine:3.5 to byrnedo/alpine-curl, which solved the issue with curl, but not the other commands:

waiting for postgres to be ready...
sh: jq: not found
sh: 0: unknown operand

So this put me on the track that there is nothing wrong with the sh command in the yamls; I'd expect it to be an issue with the image, right?
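
One way (a sketch, assuming Docker is available locally and the image ships a shell) to check up front whether a candidate image already contains both tools, before editing the chart:

docker run --rm --entrypoint sh byrnedo/alpine-curl -c 'command -v curl; command -v jq'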

@SDAdham
Author

SDAdham commented Jul 15, 2019

@skabashnyuk tried this image gempesaw/curl-jq from https://hub.docker.com/r/gempesaw/curl-jq
Here is the output:

fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz: No such file or directory
WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz: No such file or directory
OK: 7 MiB in 17 packages
waiting for postgres to be ready...

And now it's pulling the image eclipse/che-keycloak:nightly; can you please advise whether that image is good to go?


If what I did looks right, or you have any recommendations, please let me know and I will create a branch and commit the changes after I test them locally.

@skabashnyuk
Contributor

eclipse/che-keycloak:nightly - should be ready at midnight I think.
Can you try https://github.com/che-incubator/chectl to install che?

@SDAdham
Author

SDAdham commented Jul 15, 2019

@skabashnyuk sure, I could try; I'll let you know what happens. Btw, keycloak is deployed now and I'm waiting for che. Shall I cancel testing this change?

Should I also wait till tomorrow to use https://github.com/che-incubator/chectl ?

@nickboldt nickboldt added the status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. label Jul 15, 2019
@nickboldt nickboldt added this to the 7.1.0 milestone Jul 15, 2019
@SDAdham
Author

SDAdham commented Jul 16, 2019

@skabashnyuk: I tested chectl and here is the output:

~$ chectl server:start --installer=helm --multiuser --platform=k8s --tls --self-signed-cert
✔ ✈️  Kubernetes preflight checklist
✔ Verify if kubectl is installed
✔ Verify remote kubernetes status...done.
✔ Verify domain is set...set to <domain>.
❯ 🏃‍  Running Helm to install Che
✔ Verify if helm is installed
✔ Create Tiller Role Binding...it already exist.
✔ Create Tiller Service Account...it already exist.
✔ Create Tiller RBAC
✔ Create Tiller Service...it already exist.
✔ Preparing Che Helm Chart...done.
✔ Updating Helm Chart dependencies...done.
✖ Deploying Che Helm Chart
→ Error: release: "che" not found Error: Command failed: /bin/sh -c helm history che --output json
Error: release: "che" not found
    at makeError (/snapshot/chectl/node_modules/execa/index.js:174:9)
    at module.exports.Promise.all.then.arr (/snapshot/chectl/node_modules/execa/index.js:278:16)

Seems related to che-incubator/chectl#108, as my helm version is 2.14.1.

@SDAdham
Author

SDAdham commented Jul 16, 2019

I am using snap for the helm client, and I'm not sure if it supports downgrading; I'm also not sure whether it is a client-only issue or affects both client and server.

@SDAdham
Author

SDAdham commented Jul 16, 2019

Hmm, but it's also actually right: there are no che releases. Isn't that what it's supposed to check?

@skabashnyuk
Contributor

@l0rd @benoitf @davidfestal have you seen this issue with helm && chectl before? Can you assist?

@l0rd
Contributor

l0rd commented Jul 16, 2019

@SDAdham what version of chectl are you using? This looks related to che-incubator/chectl#18, which was fixed a couple of weeks ago.

If that doesn't work, please copy/paste the output of helm list --all --debug here.

And as a workaround, if you are ok to delete your local Che instance and all its data:

chectl server:delete
chectl server:start --installer=helm --multiuser --platform=k8s --tls --self-signed-cert

@SDAdham
Author

SDAdham commented Jul 16, 2019

@l0rd I am using the latest version of chectl https://github.com/che-incubator/chectl/releases (freshly installed today)

I don't have any Che installed because I can't install it at all in the first place, and even though all installations fail, I run chectl server:delete every time to confirm nothing is installed.

Here is the output of helm list

$helm list --all --debug
[debug] Created tunnel using local port: '36823'

[debug] SERVER: "127.0.0.1:36823"

Edit:
Reference:

~$ chectl
Eclipse Che CLI

VERSION chectl/0.0.2-b508feb linux-x64 node-v10.4.1

@SDAdham
Author

SDAdham commented Jul 16, 2019

New update: after downloading another release from today, chectl/0.0.2-68f8872 linux-x64 node-v10.4.1, and testing with it, I am getting a different error:

Error: Unable to execute helm command helm upgrade --install che --force --namespace che --set global.ingressDomain=<domain> --set global.cheDomain=<domain> --set cheImage=eclipse/che-server:nightly --set global.cheWorkspacesNamespace=che -f /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/values/multi-user.yaml -f /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/values/tls.yaml /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/ / Error: validation failed: [unable to recognize "": no matches for kind "Certificate" in version "certmanager.k8s.io/v1alpha1", unable to recognize "": no matches for kind "ClusterIssuer" in version "certmanager.k8s.io/v1alpha1"]
    at HelmHelper.<anonymous> (/snapshot/chectl/lib/installers/helm.js:0:0)
    at Generator.next (<anonymous>)
    at fulfilled (/snapshot/chectl/node_modules/tslib/tslib.js:107:62)

@SDAdham
Author

SDAdham commented Jul 16, 2019

Alright, I was able to get past that stage after installing the certificate; however, it seems the chectl command puts us back to the main issue again, so it made no progress and no difference:

waiting for postgres to be ready...
sh: 0: unknown operand
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
waiting for postgres to be ready...
sh: 0: unknown operand
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...

Details:
Running the chectl command fails with this:
[screenshot of the chectl failure]
Checking the logs of the wait-for-postgres container for keycloak, it seems to be the same issue as described in the description.

@SDAdham
Author

SDAdham commented Jul 16, 2019

@l0rd @skabashnyuk @benoitf no progress at all, we're back to the original issue

@nickboldt nickboldt removed the status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. label Jul 16, 2019
@SDAdham
Author

SDAdham commented Jul 16, 2019

According to @benoitf, it seems that it could be related to k8s version 1.15.

@skabashnyuk
Contributor

@SDAdham: at this moment, on a clean k8s, with the chectl server:start --installer=helm --multiuser --platform=k8s --tls --self-signed-cert command, you have an issue with postgresql. Am I right?

@SDAdham
Author

SDAdham commented Jul 16, 2019

@skabashnyuk yes, that's correct.

And everything is done as per the instructions and documentation; the only undocumented step, done with the help of @benoitf, is installing cert-manager with the following:

$ kubectl create namespace cert-manager
$ kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.8.1/cert-manager.yaml --validate=false

That was so I could bypass the following error, which required me to manually install cert-manager:

Error: Unable to execute helm command helm upgrade --install che --force --namespace che --set global.ingressDomain=<domain> --set global.cheDomain=<domain> --set cheImage=eclipse/che-server:nightly --set global.cheWorkspacesNamespace=che -f /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/values/multi-user.yaml -f /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/values/tls.yaml /home/MyUser/.cache/chectl/templates/kubernetes/helm/che/ / Error: validation failed: [unable to recognize "": no matches for kind "Certificate" in version "certmanager.k8s.io/v1alpha1", unable to recognize "": no matches for kind "ClusterIssuer" in version "certmanager.k8s.io/v1alpha1"]
    at HelmHelper.<anonymous> (/snapshot/chectl/lib/installers/helm.js:0:0)
    at Generator.next (<anonymous>)
    at fulfilled (/snapshot/chectl/node_modules/tslib/tslib.js:107:62)

@skabashnyuk
Contributor

Can you try the same without --tls --self-signed-cert?

@SDAdham
Author

SDAdham commented Jul 16, 2019

@skabashnyuk I can do that, but I don't really think it's going to make any difference, as the whole problem is that the commands are not found; the step you're asking for moves further away from the actual problem.

Please refer to this comment #13838 (comment)

@skabashnyuk
Contributor

Do you have internet available for your k8s cluster? I see no other reason why the init container command might fail.
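
One way to check that (a sketch; alpine:3.5 is the same image the init containers use and dev is the namespace from the report) is to run a throwaway pod and see whether apk can reach the Alpine package repositories:

kubectl run apk-net-test --rm -it --restart=Never --image=alpine:3.5 --namespace dev -- sh -c 'apk --no-cache add curl jq && curl --version'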

@benoitf
Contributor

benoitf commented Jul 16, 2019

The issue on keycloak is due to the latest changes to the keycloak image and the helm templates.

The init container is trying to install curl and jq and receives:

/ $  apk --no-cache add curl jq
ERROR: Unable to lock database: Permission denied
ERROR: Failed to open apk database: Permission denied

so the subsequent commands do not work.

I didn't hit these errors because I was using cached helm templates with the latest image.
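
Since apk needs root to modify its database, one way (a sketch; the pod name is from the earlier reproduction in the che namespace) to confirm which user the init container runs as:

kubectl exec keycloak-7666c648f5-lr7p5 -n che -c wait-for-postgres -- id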

@SDAdham
Author

SDAdham commented Jul 16, 2019

Thanks @benoitf. So this explains why the issue arises but wasn't seen before? And is it still related to k8s 1.15?

@SDAdham
Author

SDAdham commented Jul 16, 2019

Also, I'm not sure how I can see:

ERROR: Unable to lock database: Permission denied
ERROR: Failed to open apk database: Permission denied

@SDAdham
Author

SDAdham commented Jul 16, 2019

@skabashnyuk I tried your suggestion and, as I mentioned, it made no difference.

@benoitf I tried to see whether I have the same issue you pointed out, but it turns out I don't:

$ kubectl exec pod/keycloak-5c79d755db-qh26f -n che -ti -c wait-for-postgres sh
/ # apk --no-cache add curl jq ; adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for postgres to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/postgres`; adresses_length=`echo $endpoints | jq -r ".subsets[]?.addresses // [] | length"`; done;
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz: No such file or directory
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz: No such file or directory
ERROR: unsatisfiable constraints:
  curl (missing):
    required by: world[curl]
  jq (missing):
    required by: world[jq]
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...
sh: curl: not found
sh: jq: not found
sh: 0: unknown operand
waiting for postgres to be ready...

@benoitf
Contributor

benoitf commented Jul 16, 2019

@SDAdham
could you edit helm/che/custom-charts/che-keycloak/templates/deployment.yaml

and use

    spec:
      initContainers:
      - name: wait-for-postgres
        image: everpeace/curl-jq
        command: ["sh", "-c", "adresses_length=0; until [ $adresses_length -gt 0 ]; do echo waiting for postgres to be ready...; sleep 2; endpoints=`curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\"     https://kubernetes.default/api/v1/namespaces/$POD_NAMESPACE/endpoints/postgres`; adresses_length=`echo $endpoints | jq -r \".subsets[]?.addresses // [] | length\"`; done;"]

as a workaround (it uses an image with curl/jq pre-installed instead of installing them at runtime)

@l0rd
Contributor

l0rd commented Jul 16, 2019

@benoitf it looks like we can split this issue in two:

Is that correct?

@benoitf
Contributor

benoitf commented Jul 16, 2019

@l0rd yes, but cert-manager is documented in the "remote install of che" doc.
I will probably add an extra check in chectl to ensure cert-manager is present when TLS is enabled (like we already check that the che-tls secret is there).

@l0rd
Contributor

l0rd commented Jul 16, 2019

I will probably add an extra check in chectl if tls is enabled (like checking if che-tls secret is there as well)

Makes sense

@l0rd l0rd removed this from the 7.1.0 milestone Jul 16, 2019
@SDAdham
Author

SDAdham commented Jul 16, 2019

@benoitf As per this comment: #13838 (comment) and this comment: #13838 (comment)
I did update the yamls; yes, it's a different image than the one you mentioned, but it worked. One thing I noticed, though: updating the yamls doesn't take effect, and I had to update the image at runtime. Can you please advise how I can update the yamls so that the change also takes effect at runtime afterwards?

@benoitf
Contributor

benoitf commented Jul 16, 2019

@SDAdham drop the helm cache
$ helm delete che --purge
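
A sketch of the full cycle this implies, using the release name, namespace and values from the reproduce steps above (adjust to your own deployment):

helm delete che --purge
helm upgrade --install che --namespace dev --set cheImage=eclipse/che-server:<version> --set global.multiuser=true,global.cheDomain=<domain> ./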

@SDAdham
Author

SDAdham commented Jul 16, 2019

How can I drop the helm cache? I always use $ helm delete che --purge, but it doesn't seem to stop helm from using the old image.

@benoitf benoitf added kind/bug Outline of a bug - must adhere to the bug report template. and removed status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach labels Jul 16, 2019
@skabashnyuk
Contributor

@SDAdham I'm going to ask @mkuznyetsov to handle this task. The plan is to create an alpine-based image under the eclipse org that we can use as the init container.
CC @vparfonov

@SDAdham
Author

SDAdham commented Jul 17, 2019

@skabashnyuk sounds like a good plan; I'll be around if any help is needed. I'd prefer to wait for chectl, since it is the most preferred/supported way to go and I'm aiming for a production server.

@davidwindell
Contributor

Here we are using Rancher 2, where chectl isn't useful for us; we can just launch the Helm chart via the GUI. So it would be good if chectl is not made a required element for getting fixes to bugs like this (which I'm sure will be the case; I just wanted to highlight that there are users out there deploying on K8s in different ways).

@SDAdham
Author

SDAdham commented Jul 17, 2019

@davidwindell I do agree, but setting a standard is the best way to move; that way you know what's coming for you, and if you do something different then you're on your own. That's why consistency always succeeds.

I already solved the issue with a different image and could go with that, but I prefer to stick with the solution that will be provided for this issue.

Deploying with helm was my initial approach, but since chectl is working, I prefer it more.

@SDAdham
Author

SDAdham commented Jul 17, 2019

@skabashnyuk @nickboldt: can we add this issue to the V7.0 milestone? I'm not sure we should release 7.0 with deployment issues.

@l0rd
Contributor

l0rd commented Jul 17, 2019

@SDAdham actually #13870 is already in 7.0.0. If you are OK with it, we should close this issue in favor of #13870.

@davidwindell chectl server:start --installer helm is just a helm upgrade wrapper that does some pre-flight checks and automated configuration based on the platform you are deploying Che to. If something can be fixed at the helm chart level, we will fix it at that level. I agree with you: it doesn't make sense to do the fix at the chectl level.

In this case the fix consists of modifying the permissions of a folder in the container image. Every installer will benefit from it.

@SDAdham
Author

SDAdham commented Jul 17, 2019

@l0rd, if nothing is needed from my side, please feel free to close this one. I'll track the other issue until it's resolved.

@SDAdham SDAdham closed this as completed Jul 17, 2019
@SDAdham SDAdham reopened this Jul 17, 2019
@l0rd
Contributor

l0rd commented Jul 17, 2019

Closing in favor of #13870, which has a clear description of the main issue reported here. Thank you @SDAdham, and see you on the other side :-)

@l0rd l0rd closed this as completed Jul 17, 2019