Eclipse Che installation on aks with DevWorkspace #21023

Closed
martinelli-francesco opened this issue Jan 14, 2022 · 19 comments
Labels
area/install: Issues related to installation, including offline/air gap and initial setup
kind/bug: Outline of a bug - must adhere to the bug report template.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
severity/P1: Has a major impact to usage or development of the system.

Comments

@martinelli-francesco

martinelli-francesco commented Jan 14, 2022

Describe the bug

Installation fails on AKS when DevWorkspace is enabled.

Che version

7.41@latest

Steps to reproduce

patch.yaml must contain the following:

spec:
  devWorkspace:
    enable: true

Expected behavior

Installation completes fine.

Runtime

Kubernetes (vanilla)

Screenshots

No response

Installation method

chectl/latest

Environment

Azure

Eclipse Che Logs

> chectl server:deploy --che-operator-cr-patch-yaml=patch.yaml --platform=k8s --domain=my.domain.azr
› Installer type is set to: 'operator'
› Current Kubernetes context: 'eclipse-che'
  √ Verify Kubernetes API...OK
  √ 👀  Looking for an already existing Eclipse Che instance
    √ Verify if Eclipse Che is deployed into namespace "eclipse-che"...it is not
  √ 🧪  DevWorkspace engine (experimental / technology preview) 🚨
    √ Verify cert-manager installation
      √ Check Cert Manager deployment...already deployed
      √ Wait for Cert Manager...ready
  √ ✈️  Kubernetes preflight checklist
    √ Verify if kubectl is installed
    √ Verify remote kubernetes status...done.
    √ Check Kubernetes version: Found v1.19.13.
    √ Verify domain is set...set to my.domain.azr.
    ↓ Check if cluster accessible [skipped]
  √ Following Eclipse Che logs
    √ Start following Operator logs...done
    √ Start following Eclipse Che Server logs...done
    √ Start following PostgreSQL logs...done
    √ Start following Keycloak logs...done
    √ Start following Plug-in Registry logs...done
    √ Start following Devfile Registry logs...done
    √ Start following Eclipse Che Dashboard logs...done
    √ Start following namespace events...done
  √ Create Namespace eclipse-che...[Exists]
  √ Create Namespace eclipse-che...[Exists]
  √ 🏃‍  Running the Eclipse Che operator
    √ Create ServiceAccount che-operator in namespace eclipse-che...done.
    √ Read Roles and Bindings...done.
    √ Creating Roles and Bindings...done.
    √ Create CRD checlusters.org.eclipse.che...done.
    √ Create backup and restore CRDs...done.
    √ Waiting 5 seconds for the new Kubernetes resources to get flushed...done.
    √ Create deployment che-operator in namespace eclipse-che...done.
    √ Operator pod bootstrap
      √ Scheduling...done
      √ Downloading images...done
      √ Starting...done
    √ Prepare Eclipse Che cluster CR...Done.
    √ Create the Custom Resource of type checlusters.org.eclipse.che in the namespace eclipse-che...done.
  > ✅  Post installation checklist
    √ PostgreSQL pod bootstrap
      √ Scheduling...done
      √ Downloading images...done
      √ Starting...done
    √ Keycloak pod bootstrap...skipped
    √ Devfile Registry pod bootstrap
      √ Scheduling...done
      √ Downloading images...done
      √ Starting...done
    √ Plug-in Registry pod bootstrap
      √ Scheduling...done
      √ Downloading images...done
      √ Starting...done
    √ Eclipse Che Dashboard pod bootstrap
      √ Scheduling...done
      √ Downloading images...done
      √ Starting...done
    > Eclipse Che Server pod bootstrap
      √ Scheduling...done
      √ Downloading images...done
      × Starting...failed
        → Failed to start a pod, reason: Error, exitCode: 137
      Eclipse Che status check
    Retrieving Che self-signed CA certificate
    Prepare post installation output
    Error: Command server:deploy failed.
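
For readers decoding the failure above: exit code 137 follows the usual convention of 128 plus the number of the signal that ended the process, so it corresponds to signal 9 (SIGKILL). The sketch below only illustrates that arithmetic; the function name `exit_code_to_signal` is hypothetical and not part of chectl. In Kubernetes a SIGKILL is often the result of an out-of-memory kill or a failed liveness probe, but the log above does not say which applies here.

```shell
# Decode a container exit code into the signal number that ended the process.
# Exit codes above 128 mean "killed by signal (code - 128)"; otherwise print 0.
exit_code_to_signal() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    echo $((code - 128))
  else
    echo 0
  fi
}

exit_code_to_signal 137   # prints 9, i.e. SIGKILL
```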

eclipse-che-logs.zip

Logs of %LocalAppData%\chectl\error.log.

2022-01-17T09:15:02.879Z Warning: Consider using the more reliable 'OLM' installer when deploying a stable release of Eclipse Che (--installer=olm).
2022-01-17T09:15:02.879Z     at Object.warn (C:\Users\framar\AppData\Local\chectl\client\7.41.2\node_modules\@oclif\errors\lib\index.js:49:15)
2022-01-17T09:15:02.879Z     at Deploy.warn (C:\Users\framar\AppData\Local\chectl\client\7.41.2\node_modules\@oclif\command\lib\command.js:57:16)
2022-01-17T09:15:02.879Z     at OperatorTasks.<anonymous> (C:\Users\framar\AppData\Local\chectl\client\7.41.2\lib\tasks\installers\operator.js:151:25)
2022-01-17T09:15:02.879Z     at Generator.next (<anonymous>)
2022-01-17T09:15:02.879Z     at fulfilled (C:\Users\framar\AppData\Local\chectl\client\7.41.2\node_modules\tslib\tslib.js:114:62)
2022-01-17T09:15:02.879Z     at processTicksAndRejections (internal/process/task_queues.js:97:5)
2022-01-17T09:26:20.631Z Error: Command server:deploy failed. Error log: C:/Users/framar/AppData/Local/chectl/error.log.
2022-01-17T09:26:20.631Z     at newError (C:/Users/framar/AppData/Local/chectl/client/7.41.2/lib/util.js:199:19)
2022-01-17T09:26:20.631Z     at Object.wrapCommandError (C:/Users/framar/AppData/Local/chectl/client/7.41.2/lib/util.js:195:12)
2022-01-17T09:26:20.631Z     at Deploy.<anonymous> (C:/Users/framar/AppData/Local/chectl/client/7.41.2/lib/commands/server/deploy.js:228:35)
2022-01-17T09:26:20.631Z     at Generator.throw (<anonymous>)
2022-01-17T09:26:20.631Z     at rejected (C:/Users/framar/AppData/Local/chectl/client/7.41.2/node_modules/tslib/tslib.js:115:69)
2022-01-17T09:26:20.631Z     at runMicrotasks (<anonymous>)
2022-01-17T09:26:20.631Z Cause: Error: Failed to start a pod, reason: Error, exitCode: 137
2022-01-17T09:26:20.631Z     at KubeTasks.<anonymous> (C:/Users/framar/AppData/Local/chectl/client/7.41.2/lib/tasks/kube.js:134:35)
2022-01-17T09:26:20.631Z     at Generator.next (<anonymous>)
2022-01-17T09:26:20.631Z     at fulfilled (C:/Users/framar/AppData/Local/chectl/client/7.41.2/node_modules/tslib/tslib.js:114:62)
2022-01-17T09:26:20.631Z     at runMicrotasks (<anonymous>)

Additional context

@martinelli-francesco martinelli-francesco added the kind/bug Outline of a bug - must adhere to the bug report template. label Jan 14, 2022
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Jan 14, 2022
@tolusha
Contributor

tolusha commented Jan 14, 2022

Please run chectl server:logs and attach the resulting logs so we can figure out the issue with che-server startup.

@martinelli-francesco
Author

Just added. Thank you.

@tolusha
Contributor

tolusha commented Jan 17, 2022

@martinelli-francesco
Unfortunately, the che-server logs end with

------------------------------------------------------------------
GMS: address=che-85cd76dc78-mrhvm-34251, cluster=EclipseLinkCommandChannel, physical address=10.244.0.41:7803
-------------------------------------------------------------------
2022-01-14 15:48:18,870[main]             [INFO ] [o.jgroups.protocols.pbcast.GMS 125]  - che-85cd76dc78-mrhvm-34251: no members discovered after 3096 ms: creating cluster as coordinator
2022-01-14 15:48:18,930[main]             [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 182]   - Configured factories for environments: '[kubernetes, no-environment]'
2022-01-14 15:48:18,930[main]             [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 183]   - Registered infrastructure 'kubernetes'
2022-01-14 15:48:18,996[main]             [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 705]   - Infrastructure is tracking 0 active runtimes that need to be stopped
2022-01-14 15:48:19,003[main]             [INFO ] [o.e.c.m.oidc.OIDCInfoProvider 71]    - Retrieving OpenId configuration from endpoint: http://vista.qplatform.it:5050/auth/.well-known/openid-configuration

and don't contain any stack traces.
Could you grab them one more time and make sure the stack trace is there?

@themr0c themr0c added area/install Issues related to installation, including offline/air gap and initial setup severity/P1 Has a major impact to usage or development of the system. and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Jan 17, 2022
@martinelli-francesco
Author

Reinstalled and re-uploaded the zip file containing the logs.
I also added the logs of %LocalAppData%\chectl\error.log.
Let me know

@tolusha
Contributor

tolusha commented Jan 17, 2022

The logs don't contain any stack traces.
Could you check whether auth.nativeUserMode is set to true in the CheCluster CR? It is important when DevWorkspace is enabled.
kubectl get checluster/eclipse-che -n eclipse-che -o jsonpath='{.spec.auth.nativeUserMode}'
If it is false, set it to true.
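
One way to flip the field, sketched under assumptions: the CR name `eclipse-che` and namespace `eclipse-che` are taken from the command above, and the helper names `native_user_mode_patch` and `set_native_user_mode` are hypothetical. The sketch uses `kubectl patch` with a JSON merge patch; whether the operator reconciles the change without a redeploy is not confirmed in this thread.

```shell
# Build the JSON merge patch for spec.auth.nativeUserMode.
# Pure string construction, no cluster access needed.
native_user_mode_patch() {
  printf '{"spec":{"auth":{"nativeUserMode":%s}}}' "$1"
}

# Apply the patch to the CheCluster CR. Requires kubectl access to the cluster,
# so it is wrapped in a function and only runs when called explicitly.
set_native_user_mode() {
  kubectl patch checluster/eclipse-che -n eclipse-che \
    --type=merge -p "$(native_user_mode_patch "$1")"
}
```

Calling `set_native_user_mode true` would apply the change; rerunning the `kubectl get ... -o jsonpath` command above then shows the new value.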

@martinelli-francesco
Author

Sorry, I tried three times and the stacktraces were never present.
nativeUserMode is already set to true.

@tolusha
Contributor

tolusha commented Jan 18, 2022

So it is stuck at "Retrieving OpenId configuration from endpoint".
Could you try:

kubectl exec deploy/che -n eclipse-che  -- bash -c "curl --verbose http://vista.qplatform.it:5050/auth/.well-known/openid-configuration"

@martinelli-francesco
Author

Also note that from the Azure dashboard you can see that the oauth-proxy container (pod che-gateway) is in CrashLoopBackOff. Its logs are:

[2022/01/18 09:02:56] [main.go:54] invalid configuration:
  oidc provider requires an oidc issuer URL

I tried the above command with both http and https.
HTTP:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 20.73.205.9:5050...
* TCP_NODELAY set
  0     0    0     0    0     0      0      0 --:--:--  0:02:09 --:--:--     0* connect to 20.73.205.9 port 5050 failed: Connection timed out
* Failed to connect to vista.qplatform.it port 5050: Connection timed out
  0     0    0     0    0     0      0      0 --:--:--  0:02:09 --:--:--     0
* Closing connection 0
curl: (28) Failed to connect to vista.qplatform.it port 5050: Connection timed out
command terminated with exit code 28

HTTPS:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 20.73.205.9:5050...
* TCP_NODELAY set
  0     0    0     0    0     0      0      0 --:--:--  0:01:48 --:--:--     0command terminated with exit code 137
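
Both attempts above ran for about two minutes before ending. A bounded variant of the same check can be sketched with curl's `--max-time` option so the probe gives up quickly; the 10-second limit and the helper names `oidc_discovery_url` and `probe_oidc` are assumptions for illustration, while the deployment name and namespace are the ones used earlier in the thread.

```shell
# Derive the OIDC discovery URL from an issuer URL, tolerating a trailing slash.
oidc_discovery_url() {
  printf '%s/.well-known/openid-configuration\n' "${1%/}"
}

# Probe the discovery endpoint from inside the che pod with a hard time limit.
# Prints only the HTTP status code. Requires kubectl access, so it only runs
# when called explicitly.
probe_oidc() {
  kubectl exec deploy/che -n eclipse-che -- \
    curl --silent --show-error --max-time 10 \
      --output /dev/null --write-out '%{http_code}\n' \
      "$(oidc_discovery_url "$1")"
}
```

For example, `probe_oidc http://vista.qplatform.it:5050/auth` would test the same endpoint as above.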

@tolusha
Contributor

tolusha commented Jan 18, 2022

That might be the cause:

[2022/01/18 09:02:56] [main.go:54] invalid configuration:
  oidc provider requires an oidc issuer URL

If DevWorkspace is enabled, an OIDC provider must be configured on the cluster (mandatory since 7.42.0).

@martinelli-francesco
Author

Do you mean I need to configure an external identity provider? Shouldn't it be Keycloak? By the way, I am installing 7.41.2.

I have a parallel installation in Minikube without any additional configuration for an OIDC provider.

@tolusha
Contributor

tolusha commented Jan 18, 2022

It can be Keycloak, but then spec.devWorkspace.enable must be false.

@martinelli-francesco
Author

So you mean that if I want to use DevWorkspace I must have an external identity provider, correct?
If so, where can I find documentation to configure the OIDC provider?

@martinelli-francesco
Author

I tried to configure a custom OIDC provider (Auth0) with the following command:
chectl server:deploy --che-operator-cr-patch-yaml=patch.yaml --platform=k8s --domain=vista.qplatform.it
where patch.yaml is:

spec:
  devWorkspace:
    enable: true
  auth:
    externalIdentityProvider: true
    identityProviderURL: https://xxx.auth0.com
    identityProviderRealm: ExampleId
    identityProviderClientId: my-client-id

Now, the oauth-proxy container shows another error:

[2022/01/18 16:49:17] [main.go:54] invalid configuration:
  provider missing setting: client-id
  missing setting: client-secret or client-secret-file

Looking at the ConfigMap che-gateway-config-oauth-proxy, the oidc_issuer_url field now has the correct value, but client_id is empty even though we set identityProviderClientId.
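
A quick way to spot such empty settings can be sketched as a small filter. It assumes the oauth-proxy configuration stored in the ConfigMap uses `key = "value"` lines (consistent with the oidc_issuer_url and client_id names mentioned above, but not shown in this thread) and that the ConfigMap lives in the eclipse-che namespace; the function name `empty_settings` is hypothetical.

```shell
# Read text on stdin and print the names of settings whose value is an
# empty string, for lines of the form: key = ""
empty_settings() {
  grep -E '^[[:space:]]*[A-Za-z_]+[[:space:]]*=[[:space:]]*""[[:space:]]*$' \
    | sed -E 's/[[:space:]]*=.*$//; s/^[[:space:]]*//'
}
```

With cluster access, `kubectl get configmap che-gateway-config-oauth-proxy -n eclipse-che -o yaml | empty_settings` would list the settings that were rendered empty.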

@tolusha
Contributor

tolusha commented Jan 19, 2022

cc @sparkoo, do we have any docs to point to?

To configure Eclipse Che, you need to set the following fields:

spec:
  auth:
    identityProviderURL:
    oAuthClientName:
    oAuthSecret:
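
A filled-in patch file based on the three fields above might look like the sketch below. All values are placeholders: the issuer URL and client ID are the example values from the earlier patch.yaml in this thread, `my-client-secret` is invented for illustration, and the helper name `write_patch` is hypothetical. Whether externalIdentityProvider or other auth fields are also needed is not stated here, so they are left out.

```shell
# Write a CheCluster patch file with DevWorkspace enabled and the three
# auth fields listed above. Every value is a placeholder to be replaced.
write_patch() {
  cat > "$1" <<'EOF'
spec:
  devWorkspace:
    enable: true
  auth:
    identityProviderURL: https://xxx.auth0.com
    oAuthClientName: my-client-id
    oAuthSecret: my-client-secret
EOF
}
```

After `write_patch patch.yaml`, the file can be passed to the same deploy command used earlier: `chectl server:deploy --che-operator-cr-patch-yaml=patch.yaml --platform=k8s --domain=vista.qplatform.it`.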

@martinelli-francesco
Author

In the documentation (https://www.eclipse.org/che/docs/che-7/installation-guide/configuring-the-che-installation/) these fields refer to the "OpenShift OAuthClient resource used to set up identity federation on the OpenShift side".

I am deploying the cluster on Azure (Kubernetes), not OpenShift, and I need to use an external OIDC provider (Auth0: https://auth0.com/). Should I still configure the oAuthClientName and oAuthSecret fields?

@tolusha
Contributor

tolusha commented Jan 21, 2022

Hi.
Yes, you have to configure the oAuthClientName and oAuthSecret fields.
We've switched to a new workspace engine, and the docs have not been updated yet.

@martinelli-francesco
Author

Unfortunately, AKS does not support a custom OIDC provider other than Active Directory for native Kubernetes users.
So we moved on to ews.

@che-bot
Contributor

che-bot commented Jul 27, 2022

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2022
@l0rd
Contributor

l0rd commented Jul 27, 2022

There is a blog post with instructions to do that now:
https://che.eclipseprojects.io/2022/07/25/@karatkep-installing-eclipse-che-on-aks.html
