Eclipse Che pod bootstrap timeout on chectl install, when using Che operator with TLS and unsigned certificate on non-OpenShift kube #16280

Closed
jgwest opened this issue Mar 6, 2020 · 14 comments
Labels: area/chectl (Issues related to chectl, the CLI of Che), kind/bug (Outline of a bug - must adhere to the bug report template), severity/P1 (Has a major impact to usage or development of the system)

@jgwest

jgwest commented Mar 6, 2020

Describe the bug

When attempting to install Che 7.9.0 on generic Kubernetes, with TLS enabled and a self-signed certificate, using chectl via the Che operator, the Che pod fails to start due to an inability to connect to Keycloak.

chectl server:start --platform=k8s --installer=operator --domain=(cluster ip).nip.io --che-operator-cr-yaml=./codewind-checluster.yaml --che-operator-image=quay.io/eclipse/che-operator:7.9.0 --tls --self-signed-cert

The Che pod appears not to allow connecting to Keycloak via a self-signed certificate.

As per the attached Che pod logs, the Che pod is failing to start due to the following exception:

Caused by: java.lang.RuntimeException: Exception while retrieving OpenId configuration from endpoint: https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/.well-known/openid-configuration
  at org.eclipse.che.multiuser.keycloak.server.KeycloakSettings.<init>(KeycloakSettings.java:104)
  at org.eclipse.che.multiuser.keycloak.server.KeycloakSettings$$FastClassByGuice$$e0d0786b.newInstance(<generated>)
  at com.google.inject.internal.DefaultConstructionProxyFactory$FastClassProxy.newInstance(DefaultConstructionProxyFactory.java:89)
  (... edit ...)
  at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4699)
  at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5165)
  at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
  at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:743)
  at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:719)
  at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:714)
  at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:970)
  at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1841)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative DNS name matching keycloak-che.9.42.80.171.nip.io found.
  at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
  (... edit ...)
  at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268)
  at java.net.URL.openStream(URL.java:1067)
  at org.eclipse.che.multiuser.keycloak.server.KeycloakSettings.<init>(KeycloakSettings.java:97)
  ... 124 more
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching keycloak-che.9.42.80.171.nip.io found.
  at sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:214)
  at sun.security.util.HostnameChecker.match(HostnameChecker.java:96)
  at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:462)
  (... edit ...)
  at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1621)
  ... 138 more

The Che pod appears to be attempting to access the URL https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/.well-known/openid-configuration, which I am able to access successfully both from my browser (albeit behind a self-signed cert browser warning) and from curl:

jgw@pulse-orange$ curl https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/.well-known/openid-configuration --insecure

{"issuer":"https://keycloak-che.9.42.80.171.nip.io/auth/realms/che","authorization_endpoint":"https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/protocol/openid-connect/auth","token_endpoint":"https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/protocol/openid-connect/token","token_introspection_endpoint":"https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/protocol/openid-connect/token/introspect","userinfo_endpoint":"https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/protocol/openid-connect/userinfo","end_session_endpoint":"https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/protocol/openid-connect/logout","jwks_uri":"https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/protocol/openid-connect/certs","check_session_iframe":"https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/protocol/openid-connect/login-status-iframe.html","grant_types_supported":["authorization_code","implicit","refresh_token","password","client_credentials"],"response_types_supported":["code","none","id_token","token","id_token token","code id_token","code token","code id_token token"],"subject_types_supported":["public","pairwise"],"id_token_signing_alg_values_supported":["PS384","ES384","RS384","HS256","HS512","ES256","RS256","HS384","ES512","PS256","PS512","RS512"],"userinfo_signing_alg_values_supported":["PS384","ES384","RS384","HS256","HS512","ES256","RS256","HS384","ES512","PS256","PS512","RS512","none"],"request_object_signing_alg_values_supported":["PS384","ES384","RS384","ES256","RS256","ES512","PS256","PS512","RS512","none"],"response_modes_supported":["query","fragment","form_post"],"registration_endpoint":"https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/clients-registrations/openid-connect","token_endpoint_auth_methods_supported":["private_key_jwt","client_secret_basic","client_secret_post","client_secret_jwt"],"token_endpoint_auth_signing_alg_values_supported":["RS256"],"claims_supported":["aud","sub","iss","auth_time","name","given_name","family_name","preferred_username","email"],"claim_types_supported":["normal"],"claims_parameter_supported":false,"scopes_supported":["openid","microprofile-jwt","web-origins","roles","phone","address","email","profile","offline_access"],"request_parameter_supported":true,"request_uri_parameter_supported":true,"code_challenge_methods_supported":["plain","S256"],"tls_client_certificate_bound_access_tokens":true,"introspection_endpoint":"https://keycloak-che.9.42.80.171.nip.io/auth/realms/che/protocol/openid-connect/token/introspect"}

A Helm install against the same cluster, using the following install command, does not exhibit this problem:

chectl server:start --platform=k8s --installer=helm --domain=9.42.80.171.nip.io --multiuser --tls  --self-signed-cert

Che version

7.9.0

Steps to reproduce

  1. Generate Che self-signed certs and create them as secrets in the che namespace:
export CLUSTER_IP=(cluster ip)

CA_CN=eclipse-che-signer
DOMAIN="*.$CLUSTER_IP.nip.io"
OPENSSL_CNF="/usr/lib/ssl/openssl.cnf"

OUT_DIR="`cd ~;pwd`"

openssl genrsa -out rootCA.key 4096

openssl req -x509 \
  -new -nodes \
  -key rootCA.key \
  -sha256 \
  -days 1024 \
  -out rootCA.crt \
  -subj /CN=${CA_CN} \
  -reqexts SAN \
  -extensions SAN \
  -config <(cat ${OPENSSL_CNF} \
      <(printf '[SAN]\nbasicConstraints=critical, CA:TRUE\nkeyUsage=keyCertSign, cRLSign, digitalSignature, keyEncipherment'))

openssl genrsa -out domain.key 2048

openssl req -new -sha256 \
    -key domain.key \
    -subj "/O=EclipseChe/CN=${DOMAIN}" \
    -reqexts SAN \
    -config <(cat ${OPENSSL_CNF} \
        <(printf "\n[SAN]\nsubjectAltName=DNS:${DOMAIN}\nbasicConstraints=critical, CA:FALSE\nkeyUsage=keyCertSign, digitalSignature, keyEncipherment\nextendedKeyUsage=serverAuth")) \
    -out domain.csr

openssl x509 \
        -req \
        -sha256 \
        -extfile <(printf "subjectAltName=DNS:${DOMAIN}\nbasicConstraints=critical, CA:FALSE\nkeyUsage=keyCertSign, digitalSignature, keyEncipherment\nextendedKeyUsage=serverAuth") \
        -days 365 \
        -in domain.csr \
        -CA rootCA.crt \
        -CAkey rootCA.key \
        -CAcreateserial -out "$OUT_DIR/domain.crt"

cp rootCA.crt "$OUT_DIR/ca.crt"


kubectl create namespace che
kubectl create secret tls che-tls --key=domain.key "--cert=$OUT_DIR/domain.crt" -n che
kubectl create secret generic self-signed-cert "--from-file=$OUT_DIR/ca.crt" -n che
  2. Apply the custom clusterrole, which will be referenced in the next step
kubectl apply -f https://raw.githubusercontent.com/eclipse/codewind-che-plugin/master/setup/install_che/codewind-clusterrole.yaml
  3. Download CheCluster operator resource YAML for use by chectl
wget https://raw.githubusercontent.com/eclipse/codewind-che-plugin/master/setup/install_che/che-operator/codewind-checluster.yaml
  • Edit the file and replace ingressDomain: '' with your ingress domain (e.g. ingressDomain: '9.42.80.171.nip.io')
  4. On a non-OpenShift Kubernetes distribution, attempt to install Che using operator install from chectl, using a self-signed certificate.
chectl server:start --platform=k8s --installer=operator --domain=(cluster ip).nip.io --che-operator-cr-yaml=./codewind-checluster.yaml --che-operator-image=quay.io/eclipse/che-operator:7.9.0 --tls --self-signed-cert

Output

chectl server:start --platform=k8s --installer=operator --domain=9.42.80.171.nip.io --che-operator-cr-yaml=/home/ibmadmin/codewind-checluster.yaml --che-operator-image=quay.io/eclipse/che-operator:7.9.0 --tls --self-signed-cert
  ✔ Verify Kubernetes API...OK
  ✔ 👀  Looking for an already existing Eclipse Che instance
    ✔ Verify if Eclipse Che is deployed into namespace "che"...it is not
  ✔ ✈️  Kubernetes preflight checklist
    ✔ Verify if kubectl is installed
    ✔ Verify remote kubernetes status...done.
    ✔ Check Kubernetes version: Found v1.17.3+k3s1.
    ✔ Verify domain is set...set to 9.42.80.171.nip.io.
Eclipse Che logs will be available in '/tmp/chectl-logs/1583510159056'
  ✔ Start following logs
    ✔ Start following Eclipse Che logs...done
    ✔ Start following Postgres logs...done
    ✔ Start following Keycloak logs...done
    ✔ Start following Plugin registry logs...done
    ✔ Start following Devfile registry logs...done
  ✔ Start following events
    ✔ Start following namespace events...done
 ›   Warning: Eclipse Che will be deployed in Multi-User mode as 'operator' installer supports only that mode.
  ✔ 🏃‍  Running the Che Operator
    ✔ Copying operator resources...done.
    ✔ Create Namespace (che)...It already exists.
    ✔ Create ServiceAccount che-operator in namespace che...done.
    ✔ Create Role che-operator in namespace che...done.
    ✔ Create ClusterRole che-operator...done.
    ✔ Create RoleBinding che-operator in namespace che...done.
    ✔ Create ClusterRoleBinding che-operator...done.
    ✔ Create CRD checlusters.org.eclipse.che...done.
    ✔ Waiting 5 seconds for the new Kubernetes resources to get flushed...done.
    ✔ Create deployment che-operator in namespace che...done.
    ✔ Create Eclipse Che Cluster eclipse-che in namespace che...done.
  ❯ ✅  Post installation checklist
    ✔ PostgreSQL pod bootstrap
      ✔ scheduling...done.
      ✔ downloading images...done.
      ✔ starting...done.
    ✔ Keycloak pod bootstrap
      ✔ scheduling...done.
      ✔ downloading images...done.
      ✔ starting...done.
    ✔ Devfile registry pod bootstrap
      ✔ scheduling...done.
      ✔ downloading images...done.
      ✔ starting...done.
    ✔ Plugin registry pod bootstrap
      ✔ scheduling...done.
      ✔ downloading images...done.
      ✔ starting...done.
    ❯ Eclipse Che pod bootstrap
      ✔ scheduling...done.
      ✔ downloading images...done.
      ✖ starting
        → ERR_TIMEOUT: Timeout set to pod ready timeout 130000
      Retrieving Eclipse Che Server URL
      Eclipse Che status check
 ›   Error: Error: ERR_TIMEOUT: Timeout set to pod ready timeout 130000
 ›   Installation failed, check logs in '/tmp/chectl-logs/1583510159056'

See attached logs below.

Expected behavior

The Che pod should start successfully after connecting to the Keycloak endpoint.

Runtime

Kubernetes:

  • Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3+k3s1", GitCommit:"5b17a175ce333dfb98cb8391afeb1f34219d9275", GitTreeState:"clean", BuildDate:"2020-02-27T07:28:53Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3+k3s1", GitCommit:"5b17a175ce333dfb98cb8391afeb1f34219d9275", GitTreeState:"clean", BuildDate:"2020-02-27T07:28:53Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}

Installation method

chectl --platform=k8s --installer=operator, see above for more info.

Environment

Ubuntu 18.04 LTS server

Eclipse Che Logs

ZIP of /tmp/chectl-logs/1583510159056
chectl-logs.zip

@ibuziuk
Member

ibuziuk commented Mar 9, 2020

@tolusha could you please take a look?

@sleshchenko
Member

@jgwest It would be useful if you provided details of the generated certificate. You can get them in your browser [1] or via openssl [2].
1: Click certificate (see screenshot Screenshot_20200310_112850)
2: You can find details here: https://serverfault.com/questions/215606/how-do-i-view-the-details-of-a-digital-certificate-cer-file
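
For anyone following along: the check that fails in the stack trace is hostname verification against the certificate's subject alternative names. Below is a minimal sketch of that comparison in shell. It assumes only a leading `*.` wildcard that covers exactly one DNS label, and it compares case-sensitively (real DNS matching does not). The function name `san_matches` is made up for illustration and is not part of Che or chectl.

```shell
#!/bin/sh
# Return 0 if hostname $1 matches the SAN DNS entry $2, non-zero otherwise.
# Simplified: a leading "*." wildcard stands for exactly one label.
san_matches() {
  host=$1
  pattern=$2
  case $pattern in
    '*.'*)
      suffix=${pattern#\*}          # e.g. ".9.42.80.171.nip.io"
      first_label=${host%%.*}       # e.g. "keycloak-che"
      rest=${host#"$first_label"}   # everything after the first label
      [ -n "$first_label" ] && [ "$rest" = "$suffix" ]
      ;;
    *)
      [ "$host" = "$pattern" ]
      ;;
  esac
}

# To see which SAN entries the endpoint actually serves (needs cluster access):
#   host=keycloak-che.9.42.80.171.nip.io
#   openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
#     | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
```

If the served certificate lists no DNS entry that matches the Keycloak host under a comparison like this, the Java client reports the "No subject alternative DNS name matching" error seen in the logs.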

@elavicount

Hi, any way to fix this issue before the PR is ready?

@tolusha
Contributor

tolusha commented Mar 10, 2020

@eder-santos
We are investigating

@dmytro-ndp
Contributor

The issue has been reproduced for installation on minishift 3.11 using custom-resource.yaml with

      selfSignedCert: true
      tlsSupport: true

https://ci.centos.org/view/Devtools/job/devtools-che-pullrequests-java-selenium-tests/186/consoleFull

@jgwest
Author

jgwest commented Mar 11, 2020

@jgwest It would be useful if you provide details of generated certificate, you can do it into your browser[1] or via open-ssl[2]

Sounds like it has been reproduced, but here are example self-signed certs generated by the reproduction steps, if additional information is needed from them: certs.zip

@mmorhun
Contributor

mmorhun commented Mar 11, 2020

@jgwest I've started investigating the issue.
First, and this is probably a typo, --domain=(cluster ip).nip.io should be --domain=$(cluster ip).nip.io. And if one omits the domain flag, Che will try to autodetect it.
Second, the correct name for the self-signed certificate secret is self-signed-certificate. So the command

kubectl create secret generic self-signed-cert "--from-file=$OUT_DIR/ca.crt" -n che

should be

kubectl create secret generic self-signed-certificate "--from-file=$OUT_DIR/ca.crt" -n che

But that alone still doesn't help... I'm continuing the investigation.
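
A quick way to catch a secret name mismatch like this is to compare the secrets in the namespace against the two names mentioned in this thread. This is only a sketch: the helper name `missing_secrets` is hypothetical, and it assumes the list on stdin is in the `secret/<name>` form that `kubectl get secrets -o name` prints.

```shell
#!/bin/sh
# Read secret names in "secret/<name>" form from stdin and print each
# secret expected in this thread that does not appear in that list.
missing_secrets() {
  present=$(cat)
  for name in che-tls self-signed-certificate; do
    printf '%s\n' "$present" | grep -qxF "secret/$name" || printf '%s\n' "$name"
  done
}

# Usage against the namespace from the reproduction steps:
#   kubectl get secrets -n che -o name | missing_secrets
```

An empty result means both secrets exist under the expected names. It says nothing about whether their contents are correct.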

@jgwest
Author

jgwest commented Mar 11, 2020

Thanks @mmorhun, re: self-signed-cert, looks like I used the example from the Che docs: https://www.eclipse.org/che/docs/che-7/setup-che-in-tls-mode-with-self-signed-certificate/#procedure-2

@mmorhun
Contributor

mmorhun commented Mar 12, 2020

Another thing I've found is a wrong ingress-to-secret binding. All Che ingresses should have secretName set to che-tls.
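
The binding can be checked by listing each ingress with its TLS secret name. A rough sketch follows, assuming the input is one ingress per line with the name and the secretName separated by a tab. The helper name `wrong_tls_bindings` is hypothetical, and an ingress with several TLS entries would also be flagged by this simple comparison.

```shell
#!/bin/sh
# Read "<ingress name><TAB><tls secretName>" lines from stdin and print the
# names of ingresses whose secretName is anything other than che-tls
# (including ingresses with no TLS secret at all).
wrong_tls_bindings() {
  awk -F '\t' '$2 != "che-tls" { print $1 }'
}

# Usage against the namespace from the reproduction steps:
#   kubectl get ingress -n che \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.tls[*].secretName}{"\n"}{end}' \
#     | wrong_tls_bindings
```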

@mmorhun
Contributor

mmorhun commented Mar 13, 2020

@jgwest with all the changes from the PRs above it should work (tested on minikube though).
And don't forget about the self-signed-certificate secret name.
P.S. The certificate generated in the steps to reproduce is not accepted by Chrome, but ok for Firefox, curl, openssl

@eder-santos

Hi, any way to fix this issue before the PR is ready?

One may explicitly set k8s.tlsSecretName to che-tls.
And, of course, the self-signed certificate secret name should be self-signed-certificate, not self-signed-cert.
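
As a sketch of that workaround: the snippet below builds a JSON merge patch that sets the field. It assumes that k8s.tlsSecretName sits under spec in the CheCluster custom resource, and that the resource is named eclipse-che in namespace che, as in the install output earlier in this issue. The helper name `tls_secret_patch` is made up for illustration.

```shell
#!/bin/sh
# Print a JSON merge patch that sets spec.k8s.tlsSecretName to $1.
# Assumption: the field lives under "spec" in the CheCluster resource.
tls_secret_patch() {
  printf '{"spec":{"k8s":{"tlsSecretName":"%s"}}}\n' "$1"
}

# Usage on an already created CheCluster (names taken from the output above):
#   kubectl patch checluster eclipse-che -n che --type merge \
#     -p "$(tls_secret_patch che-tls)"
```

Setting the same field in the YAML passed via --che-operator-cr-yaml before running chectl should have the same effect and avoids patching after the fact.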

@mmorhun
Contributor

mmorhun commented Mar 13, 2020

@themr0c @boczkowska we have a mistake in our docs, please see this comment and the one above.

@jgwest
Author

jgwest commented Mar 13, 2020

@mmorhun - Looks good, with those two changes I am able to install Che as expected. 👍

Re: docs, it's this page that still suggests using self-signed-cert as the secret name: https://github.com/eclipse/che-docs/blame/master/src/main/pages/che-7/contributor-guide/proc_deploy-che-with-self-signed-tls-on-kubernetes.adoc#L41

@mmorhun
Contributor

mmorhun commented Mar 16, 2020

I am going to create a PR into docs.

@mmorhun
Contributor

mmorhun commented Mar 16, 2020

I think the problem is resolved, so closing this issue.
