Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the restart of che-code based workspaces #21085

Closed
nils-mosbach opened this issue Jan 28, 2022 · 19 comments
Closed

Fix the restart of che-code based workspaces #21085

nils-mosbach opened this issue Jan 28, 2022 · 19 comments
Labels
area/dashboard kind/bug Outline of a bug - must adhere to the bug report template. new&noteworthy For new and/or noteworthy issues that deserve a blog post, new docs, or emphasis in release notes severity/P1 Has a major impact to usage or development of the system.
Milestone

Comments

@nils-mosbach
Copy link

nils-mosbach commented Jan 28, 2022

Describe the bug

Starting a devworkspace with che-code enabled fails to restart or start (if stopped).

image

Code was enabled with url parameter. e.g.
https://company.dev/dashboard/#/load-factory?url=https%3A%2F%2Fgit.company.dev%2FMosbachN%2Fdevfile-test-vscode.git&che-editor=che-incubator/che-code/insiders&policies.create=perclick

Che version

next (development version)

Steps to reproduce

  1. Start a new Workspace with Che-Code.
  2. Stop workspace
  3. Start workspace

Expected behavior

Should not throw an error.

Runtime

Kubernetes (vanilla)

Screenshots

No response

Installation method

chectl/next

Environment

Linux

Eclipse Che Logs

$ kubectl get devworkspace devfile-test-vscode-pkyn -o yaml     
            
apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspace
metadata:
  annotations:
    che.eclipse.org/che-editor: che-incubator/che-code/insiders
    che.eclipse.org/last-updated-timestamp: "2022-01-28T13:29:08.757Z"
  creationTimestamp: "2022-01-28T13:27:48Z"
  finalizers:
  - storage.controller.devfile.io
  generation: 5
  labels:
    controller.devfile.io/creator: ""
  managedFields:
  - apiVersion: workspace.devfile.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:che.eclipse.org/che-editor: {}
          f:che.eclipse.org/last-updated-timestamp: {}
      f:spec:
        .: {}
        f:routingClass: {}
        f:template:
          .: {}
          f:components: {}
          f:projects: {}
    manager: unknown
    operation: Update
    time: "2022-01-28T13:27:48Z"
  - apiVersion: workspace.devfile.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .: {}
          v:"storage.controller.devfile.io": {}
      f:spec:
        f:started: {}
      f:status:
        .: {}
        f:conditions: {}
        f:devworkspaceId: {}
        f:mainUrl: {}
        f:message: {}
        f:phase: {}
    manager: devworkspace-controller
    operation: Update
    time: "2022-01-28T13:29:10Z"
  name: devfile-test-vscode-pkyn
  namespace: dev-studio-workspace-n-m-com-zsobsb
  resourceVersion: "323737583"
  uid: 84440f1d-230b-4ef9-aa09-37e1347841f2
spec:
  routingClass: che
  started: false
  template:
    components:
    - attributes:
        che-code.eclipse.org/contribute-cpuLimit: true
        che-code.eclipse.org/contribute-cpuRequest: true
        che-code.eclipse.org/contribute-endpoint/che-code: 3100
        che-code.eclipse.org/contribute-endpoint/code-redirect-1: 13131
        che-code.eclipse.org/contribute-endpoint/code-redirect-2: 13132
        che-code.eclipse.org/contribute-endpoint/code-redirect-3: 13133
        che-code.eclipse.org/contribute-entry-point: true
        che-code.eclipse.org/contribute-memoryLimit: true
        che-code.eclipse.org/contribute-memoryRequest: true
        che-code.eclipse.org/contribute-volume-mount/checode: /checode
        che-code.eclipse.org/contributed-container: nodejsdev
        che-code.eclipse.org/original-memoryLimit: 1G
      container:
        command:
        - /checode/entrypoint-volume.sh
        cpuLimit: 500m
        cpuRequest: 30m
        endpoints:
        - exposure: internal
          name: nodejs
          protocol: http
          targetPort: 3000
        - attributes:
            contributed-by: che-code.eclipse.org
            cookiesAuthEnabled: true
            discoverable: false
            type: main
            urlRewriteSupported: true
          exposure: public
          name: che-code
          path: ?tkn=eclipse-che
          protocol: https
          secure: false
          targetPort: 3100
        - attributes:
            contributed-by: che-code.eclipse.org
            discoverable: false
            urlRewriteSupported: true
          exposure: public
          name: code-redirect-1
          protocol: http
          targetPort: 13131
        - attributes:
            contributed-by: che-code.eclipse.org
            discoverable: false
            urlRewriteSupported: true
          exposure: public
          name: code-redirect-2
          protocol: http
          targetPort: 13132
        - attributes:
            contributed-by: che-code.eclipse.org
            discoverable: false
            urlRewriteSupported: true
          exposure: public
          name: code-redirect-3
          protocol: http
          targetPort: 13133
        env:
        - name: DEVWORKSPACE_CREATOR
          value: mosbachn
        - name: _BUILDAH_STARTED_IN_USERNS
          value: test
        image: quay.io/devfile/universal-developer-image:ubi8-b452131
        memoryLimit: 2.07G
        memoryRequest: 256Mi
        mountSources: true
        sourceMapping: /projects
        volumeMounts:
        - name: checode
          path: /checode
      name: nodejsdev
    - name: che-code-workspace84440f1d230b4ef9
      plugin:
        kubernetes:
          name: che-code-workspace84440f1d230b4ef9
          namespace: dev-studio-workspace-n-m-com-zsobsb
    projects:
    - git:
        remotes:
          origin: https://git.company.dev/MN/devfile-test-vscode.git
      name: devfile-test-vscode
status:
  conditions:
  - lastTransitionTime: "2022-01-28T13:29:04Z"
    status: Unknown
    type: DevWorkspaceWarning
  - lastTransitionTime: "2022-01-28T13:29:04Z"
    message: "8 errors occurred:\n\t* devfile contains multiple endpoint entries with
      same name: che-code\n\t* devfile contains multiple endpoint entries with same
      name: code-redirect-1\n\t* devfile contains multiple endpoint entries with same
      name: code-redirect-2\n\t* devfile contains multiple endpoint entries with same
      name: code-redirect-3\n\t* devfile contains multiple containers with same TargetPort:
      3100\n\t* devfile contains multiple containers with same TargetPort: 13131\n\t*
      devfile contains multiple containers with same TargetPort: 13132\n\t* devfile
      contains multiple containers with same TargetPort: 13133\n\n"
    reason: BadRequest
    status: "True"
    type: FailedStart
  - lastTransitionTime: "2022-01-28T13:29:04Z"
    message: Workspace stopped due to error
    status: "False"
    type: Started
  - lastTransitionTime: "2022-01-28T13:29:04Z"
    status: Unknown
    type: DevWorkspaceResolved
  devworkspaceId: workspace84440f1d230b4ef9
  mainUrl: https://company.dev/workspace84440f1d230b4ef9/nodejsdev/3100/?tkn=eclipse-che
  message: "8 errors occurred:\n\t* devfile contains multiple endpoint entries with
    same name: che-code\n\t* devfile contains multiple endpoint entries with same
    name: code-redirect-1\n\t* devfile contains multiple endpoint entries with same
    name: code-redirect-2\n\t* devfile contains multiple endpoint entries with same
    name: code-redirect-3\n\t* devfile contains multiple containers with same TargetPort:
    3100\n\t* devfile contains multiple containers with same TargetPort: 13131\n\t*
    devfile contains multiple containers with same TargetPort: 13132\n\t* devfile
    contains multiple containers with same TargetPort: 13133\n\n"
  phase: Failed

Additional context

Restarting theia workspaces don't throw errors.

Release Notes Text

Workspaces that use che-code as an editor failed to restart or start (if stopped). This has been fixed in this release.

@nils-mosbach nils-mosbach added the kind/bug Outline of a bug - must adhere to the bug report template. label Jan 28, 2022
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Jan 28, 2022
@benoitf
Copy link
Contributor

benoitf commented Jan 28, 2022

@nils-mosbach could you get as well DeWorkspaceRouting objects ?

@benoitf
Copy link
Contributor

benoitf commented Jan 28, 2022

based on the provided devfile yaml I don't see how we can have multiple endpoints with the same name ?
cc @amisevsk

@amisevsk
Copy link
Contributor

It depends on what the DevWorkspaceTemplate che-code-workspace84440f1d230b4ef9 in namespace dev-studio-workspace-n-m-com-zsobsb looks like and what the dashboard is doing internally.

My hunch is that somewhere along the line, endpoints are being added a second time -- either the DWT contains them as well as the main container, or they're injected into the main container somewhere and then reinjected, causing duplicates.

@nils-mosbach could you share the DevWorkspaceTemplate (i.e. kubectl get dwt che-code-workspace84440f1d230b4ef9 -n dev-studio-workspace-n-m-com-zsobsb). Also what does the original devfile (i.e. the one contained in https://git.company.dev/MosbachN/devfile-test-vscode.git), if any, look like?

@amisevsk
Copy link
Contributor

amisevsk commented Jan 28, 2022

To shortcut the discussion a little, looking at what I think is the definition of the devworkspacetemplate mentioned above here, I see that the plugin also includes the endpoints che-code and code-redirect-[1,2,3], which is likely where the conflict is coming from.

The next question would be how the nodejsdev component ended up with the same endpoints attached to it.

@nils-mosbach
Copy link
Author

nils-mosbach commented Jan 28, 2022

If a workspace is started with che-code as the editor, che-code will get injected to the developer image (sidecarless-mode, #20435). I guess the operator copies endpoints, etc. defined by the che-code plugin to the developer image that has mountSources: true. Since theia isn't causing any issues this might be related to the sidecarless-mode. Is there a check that validates if containers don't collide with plugins in v1?

I've created a simple devfile and extracted devworkspace, devworkspaceroutings and devworkspacetemplates information.
https://github.com/nils-mosbach/devfile.io-demo-che-21085/blob/df1757f4387afdd01e036642626a408c949dc97c/devfile.yaml

Workspace was started with:
http://company.dev/#https://github.com/nils-mosbach/devfile.io-demo-che-21085.git?che-editor=che-incubator/che-code/insiders

I've dropped all information in a repo for better readability:
https://github.com/nils-mosbach/devfile.io-demo-che-21085

@Kasturi1820 Kasturi1820 added severity/P1 Has a major impact to usage or development of the system. and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Jan 31, 2022
@amisevsk
Copy link
Contributor

amisevsk commented Feb 1, 2022

@nils-mosbach thanks for the repo, that's very helpful in testing.

Looking at the DevWorkspaceTemplate and DevWorkspace as applied to the cluster (from the given devfile), it looks like whatever is injecting those endpoints into the dev container is doing it every time, without checking whether they're already added. However, I don't know which component is responsible for doing this -- the DevWorkspace Operator plays no role here.

@nils-mosbach
Copy link
Author

nils-mosbach commented Feb 2, 2022

I wasn't able to find the area where che-code gets injected to the users main devcontainer. The only difference in the editor config seems to be skipMetaYaml. (che-editors.yaml) @benoitf is that the check where the magic happens?

Since that error pops up in the devfile status I guess devworkspace-operator might at least throw the error.

If I look at the devfile that's generated there's an additional plugin che-code-workspace40d5c314bea044ab defined.

devworkspace.yaml#L130

    - name: che-code-workspace40d5c314bea044ab
      plugin:
        kubernetes:
          name: che-code-workspace40d5c314bea044ab
          namespace: dev-studio-workspace-nm-company-com-zsobsb

I guess that's linking to the devworkspacetemplate che-code-workspace40d5c314bea044ab. If so, that template also contains the che-code-runtime-description which already got injected. From a devworkspace-operators view that error makes sense, since two independet plugins share the same endpoint definition.
If that's the case, the question would be if its better to get rid of the devworkspacetemplate che-code-runtime-description by the component that injects that definition in the user container or if devworkspace operator should ignore that component.

@benoitf
Copy link
Contributor

benoitf commented Feb 2, 2022

I'm not reproducing the issue (I'm stopping/restarting che-code workspace) all the time and I'm using che/next as well.

Is it a fresh 'chectl/next' install or did you reinstall/restart che containers recently ? (else you might have old containers)

Magic happens there:https://github.com/che-incubator/che-code/tree/main/tools/devworkspace-handler

but magic is happening before a workspace is started.

So starting a workspace or restarting it should involve the same devWorkspaceTemplate/devWorkspace objects

@nils-mosbach
Copy link
Author

@benoitf All right, thanks. I'll check the status of all tagged next images.

@nils-mosbach
Copy link
Author

nils-mosbach commented Feb 2, 2022

@benoitf all right. cleaned and redeployed everything with latest current chectl:next. Pull policy is Always for all images, so images should be on all next tags of today. Issue still exists.

First of all, I set che-code as the default editor for all workspaces (CHE_FACTORY_DEFAULT__EDITOR, CHE_WORKSPACE_DEVFILE_DEFAULT__EDITOR). Otherwise all getting-started examples will run theia.

While all our devfiles won't work, or at least don't restart, Java Spring Boot examples defined by the devfile registry do restart fine. Digging a little deeper that only applies to devfiles that are part of the devfile-registry and basically started through e.g. Create workspace > Select a sample > Java Spring Boot. The same project/devfile https://github.com/che-samples/java-spring-petclinic started using Import from Git > Git Repo URL does not work/ restart.

So I looked at the devfile registry and there seems to happen a little bit of magic.

image

It seems that java-web-spring/devworkspace-che-code-insiders.yaml contains the injected result that is therefore used as the devworkspace/devworkspacetemplate.

apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspaceTemplate
metadata:
  name: che-code-spring-petclinic
spec:
  commands:
    - id: init-container-command
      apply:
        component: che-code-injector
  events:
    preStart:
      - init-container-command
  components:
    - name: checode
      volume: {}
    - name: che-code-injector
      container:
        image: quay.io/che-incubator/che-code:insiders
        command:
          - /entrypoint-init-container.sh
        volumeMounts:
          - name: checode
            path: /checode
        memoryLimit: 128Mi
        memoryRequest: 32Mi
        cpuLimit: 500m
        cpuRequest: 30m
---
apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspace
metadata:
  name: spring-petclinic
spec:
  started: true
  template:
    components:
      - name: tools
        container:
          image: quay.io/devfile/universal-developer-image:ubi8-d433ed6
          memoryLimit: 4Gi
          endpoints:
            - exposure: none
              name: debug
              protocol: tcp
              targetPort: 5005
            - exposure: public
              name: 8080-tcp
              protocol: http
              targetPort: 8080
            - name: che-code
              attributes:
                type: main
                cookiesAuthEnabled: true
                discoverable: false
                urlRewriteSupported: true
                contributed-by: che-code.eclipse.org
              targetPort: 3100
              exposure: public
              path: '?tkn=eclipse-che'
              secure: false
              protocol: https
            - name: code-redirect-1
              attributes:
                discoverable: false
                urlRewriteSupported: true
                contributed-by: che-code.eclipse.org
              targetPort: 13131
              exposure: public
              protocol: http
.....

Che Dashboard seems to recognize this and adds a url parameter to the factory:

https://che.company.dev/dashboard/#/load-factory?url=https://github.com/che-samples/java-spring-petclinic/devfilev2&devWorkspace=https://che.company.dev/devfile-registry/devfiles/java-web-spring/devworkspace-che-code-insiders.yaml

Removing this parameter leads to the issue of workspaces that can't be restarted since DevworkspaceTemplate contains che-codes' runtime description is this case.

https://che.company.dev/dashboard/#/load-factory?url=https://github.com/che-samples/java-spring-petclinic/devfilev2

Bottom lined: I guess DevworkspaceTemplates should not contain che-code-runtime-description. That's the case if started a factory link or using Import from Git. The step which is done to prebuild Devworkspace/DevworkspaceTemplate files from a given Devfile in the registry is missing here, or does not remove the runtime-description properly.

@nils-mosbach
Copy link
Author

nils-mosbach commented Feb 2, 2022

Looking at che-code / che-code-devfile-resolver-impl.ts this doesn't work in my case. At least the modified DevworkspaceTemplate is not used. This is triggered here dashboard / DevWorkspaceEditorProcessCode.ts, right?

In our che-dashboard logs there's an error message Could not resolve the kubeconfig env variable in dev-studio-workspace-nm-company-com-plzm2u/workspace05092c6a90594233-864b4f6846-qzkr6/tools. Could that point to the issue?

@nils-mosbach
Copy link
Author

@benoitf While digging a little deeper it seems that che-code devworkspace-handler works as expected and removes the runtime-description from the devworkspace-template before passing to the dashboard-backend.

image

after the command is executed, devworkspace template is the same on kubernetes with only 2 components. So this looks good. Workspace can be stopped and restarted without any issues.

The issue must happen on dashboard start. If I reload the dashboard page it patches the devworkspacetemplate with the original che-code template provided by the plugin-registry.

image

Since this is only called once, I guess there's something like an update routine, that allows keeping existing templates in line with updates in the plugin registry. I'll try to find out where this happens.

It took me a while to get che-dashboard dev environment running on kubernetes. If you'd like I submit a PR that adds a command yarn start:k8s that handles OAuth on kubernetes. (basically a customized version of local-run.sh without non-native-auth mode.

@benoitf
Copy link
Contributor

benoitf commented Feb 4, 2022

thanks for the heads up

On the first startup of the workspace, is that DevWorkspaceTemplate contains the che-code-runtime-description or not ?

maybe devWorkspaceHandler cleanup some entries but it's not the one that is sent to the DevWorkspace Operator

so on the first startup, DevWorkspaceTemplate should be without "che-code-runtime-description" (the handler should have remove the entry and adds all options into the existing devContainer (first component found in the devfile))

And then, do we go into DevWorkspaceHandler on the restart ?

I'll investigate next week

@nils-mosbach
Copy link
Author

The first startup is fine. As long as che-dashboard is not restarted, everything keeps working.

The issue is caused in the editor update routine of the dashboard:

https://github.com/eclipse-che/che-dashboard/blob/f9c2a7c087871c6095b033729c51729f779f252c/packages/dashboard-frontend/src/services/bootstrap/index.ts#L190-L212

  private async updateDevWorkspaceTemplates(settings: che.WorkspaceSettings): Promise<void> {
    if (!isDevworkspacesEnabled(settings)) {
      return;
    }
    const defaultKubernetesNamespace = selectDefaultNamespace(this.store.getState());
    const defaultNamespace = defaultKubernetesNamespace.name;
    try {
      const pluginsByUrl: { [url: string]: devfileApi.Devfile } = {};
      const state = this.store.getState();
      selectDwEditorsPluginsList(state.dwPlugins.defaultEditorName)(state).forEach(dwEditor => {
        pluginsByUrl[dwEditor.url] = dwEditor.devfile;
      });
      const updates = await this.devWorkspaceClient.checkForTemplatesUpdate(
        defaultNamespace,
        pluginsByUrl,
      );
      if (Object.keys(updates).length > 0) {
        await this.devWorkspaceClient.updateTemplates(defaultNamespace, updates);
      }
    } catch (e) {
      console.error('Failed to update templates.', e);
    }
  }

This will override all template files by their default settings as defined by the plugin registry. As far as I can see, DevworkspaceOperator is not involved.

Easiest fix would be to skip conversions in case of sidecarless-editors. But that would eventually break exisiting workspaces if editor runs on next or latest tags. Based on the original devfile we could use the same routine that generate Devworkspace and DevworkspaceTemplates in case of new Workspaces, but it seems that the original Devfile isn't persisted after the creation.

@benoitf
Copy link
Contributor

benoitf commented Feb 4, 2022

ok I think this logic was added to handle "automatic updates" of the DevWorkspace templates when upgrading to a newer che version.

@nils-mosbach
Copy link
Author

@benoitf I'm happy to submit a PR that skips updates in case of vscode and intellij idea (sidecarless). What do you think?

I would add a check, somewhere in here, where templates that have changed are identified: https://github.com/eclipse-che/che-dashboard/blob/f9c2a7c087871c6095b033729c51729f779f252c/packages/dashboard-frontend/src/services/workspace-client/devworkspace/devWorkspaceClient.ts#L906

@benoitf
Copy link
Contributor

benoitf commented Feb 8, 2022

@nils-mosbach yes it may work to skip other editors for now

@nils-mosbach
Copy link
Author

Solved by PR. Closing.

@l0rd l0rd changed the title che-code - restart workspace fails Fix the restart of che-code based workspaces Feb 17, 2022
@l0rd l0rd added new&noteworthy For new and/or noteworthy issues that deserve a blog post, new docs, or emphasis in release notes status/release-notes-review-needed Issues that needs to be reviewed by the doc team for the Release Notes wording labels Feb 17, 2022
@l0rd l0rd added this to the 7.44 milestone Feb 17, 2022
@devstudio-release
Copy link

sync'd to Red Hat JIRA https://issues.redhat.com/browse/CRW-2771

@max-cx max-cx removed the status/release-notes-review-needed Issues that needs to be reviewed by the doc team for the Release Notes wording label Jan 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dashboard kind/bug Outline of a bug - must adhere to the bug report template. new&noteworthy For new and/or noteworthy issues that deserve a blog post, new docs, or emphasis in release notes severity/P1 Has a major impact to usage or development of the system.
Projects
None yet
Development

No branches or pull requests

8 participants