Terminal disconnect when another workspace is starting or stopping on kubernetes #9943

Closed
liyanliang1994 opened this issue Jun 5, 2018 · 15 comments
Labels
kind/question Questions that haven't been identified as being feature requests or bugs.

Comments

@liyanliang1994

Description

When I start several workspaces, the terminal sometimes stops working.

Reproduction Steps

After one workspace has started, I start another one. While the second workspace is starting, the terminal in the first workspace stops working and cannot reconnect; the only workaround is to open a new terminal. The same thing happens when another workspace is stopping.
OS and version:
che 6.6.0
kubernetes 1.9.2
nginx-ingress-controller 0.15.0
Diagnostics:
default

@liyanliang1994
Author

I guess the problem is caused by the nginx ingress controller. When a workspace is starting, new ingresses are generated and nginx reloads, which breaks the WebSocket connection.

@ghost

ghost commented Jun 5, 2018

@liyanliang1994 You are most probably right. However, the IDE should reconnect.

@sleshchenko can this be caused by nginx reload?

@ghost ghost added the kind/question Questions that haven't been identified as being feature requests or bugs. label Jun 5, 2018
@sleshchenko
Member

@eivantsov I think the cause is the nginx reload. I've described it here: #8675 (comment)

@liyanliang1994
Author

@sleshchenko #8675 fixed the WebSocket bug in the dashboard, but the terminal uses a separate WebSocket in the IDE.

@sleshchenko
Member

sleshchenko commented Jun 8, 2018

@liyanliang1994 Well, when I shared that issue I just wanted to point to a description of the WebSocket and nginx problem.
As far as I can see, the IDE reconnects to the terminal WebSocket endpoint when a connection is lost. It tries to reconnect 2 times, 2 seconds apart.
@vparfonov @AndrienkoAleksandr Am I right? Could you take a look at this issue? Maybe the terminal in the IDE can be improved and reconnection can be fixed for such cases.

@liyanliang1994 Still, it can be quite annoying that all clients lose their WebSocket connections whenever a workspace starts or stops. You can try to work around it by using another ingress backend, such as Traefik.
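
To make the reconnection behaviour described above concrete, here is a minimal sketch of a bounded retry loop: a fixed number of attempts with a fixed pause before each one. It is not the Che IDE's actual code. The `reconnect` function, the `connect` callable, and the choice to pause before the first attempt are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def reconnect(connect: Callable[[], object],
              attempts: int = 2,
              delay_seconds: float = 2.0,
              sleep: Callable[[float], None] = time.sleep) -> Optional[object]:
    """Try to re-establish a connection a fixed number of times.

    `connect` is a hypothetical callable that returns a connection object
    or raises ConnectionError. Returns None once every attempt has failed.
    """
    for _ in range(attempts):
        sleep(delay_seconds)  # pause before each attempt (assumed ordering)
        try:
            return connect()
        except ConnectionError:
            continue  # this attempt failed, move on to the next one
    return None
```

With the defaults, the client gives up after about four seconds, so a proxy reload that takes longer than that would leave the terminal disconnected, which would match the symptom reported in this issue.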

@liubin10

liubin10 commented Jun 8, 2018

@sleshchenko I found that if the socket is closed or an error has occurred, the terminal's send method can still be triggered. I think a check should be added before sending data through the WebSocket.
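
The check suggested above could look roughly like the following sketch. The state names mirror the `readyState` values of the browser WebSocket API (CONNECTING, OPEN, CLOSING, CLOSED); `GuardedSender` and the socket object with `.state` and `.send()` are hypothetical stand-ins, not the terminal's real classes.

```python
from enum import Enum


class SocketState(Enum):
    CONNECTING = 0
    OPEN = 1
    CLOSING = 2
    CLOSED = 3


class GuardedSender:
    """Wraps a socket-like object and refuses to send unless it is open."""

    def __init__(self, socket):
        self._socket = socket  # hypothetical object with .state and .send()

    def send(self, data: str) -> bool:
        if self._socket.state is not SocketState.OPEN:
            # Nothing is written; the caller can reconnect or warn the user.
            return False
        self._socket.send(data)
        return True
```

Returning a boolean instead of raising keeps the keystroke handler simple: the caller decides whether a failed send should trigger a reconnect.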

@AndrienkoAleksandr
Contributor

About reconnection: we need to improve this. On reconnection, the terminal-agent server side creates a new terminal process instead of connecting to the previous one. So this seems to be a matter of improving the terminal agent's server-side API and adapting the client side to the improved API.

@liyanliang1994
Author

Thanks for your reply.
@sleshchenko I tried to replace nginx with Traefik. Che creates the ingresses and service for the workspace successfully, but the pod is not created. Is there anything I forgot to configure?

traefik deployment

kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: traefik-ingress-controller
  namespace: kube-system
  labels:
    k8s-app: traefik-ingress-lb
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: traefik-ingress-lb
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress-lb
        name: traefik-ingress-lb
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - image: traefik
        name: traefik-ingress-lb
        ports:
        - name: http
          containerPort: 80
          hostPort: 80
          protocol: TCP
        args:
        - --api
        - --kubernetes
        - --logLevel=INFO

configmap
CHE_INFRA_KUBERNETES_INGRESS_ANNOTATIONS__JSON: '{"kubernetes.io/ingress.class": "traefik", "traefik.ingress.kubernetes.io/rewrite-target": "/","traefik.ingress.kubernetes.io/ssl-redirect": "false","traefik.ingress.kubernetes.io/proxy-connect-timeout": "3600","traefik.ingress.kubernetes.io/proxy-read-timeout": "3600"}'

che-ingress

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: che-ingress
  annotations:
    kubernetes.io/ingress.class: "traefik"
    traefik.ingress.kubernetes.io/proxy-read-timeout: "3600"
    traefik.ingress.kubernetes.io/proxy-connect-timeout: "3600"
    traefik.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
  - host: <domain>
    http:
      paths:
      - path: /
        backend:
          serviceName: che-host
          servicePort: 8080

@liyanliang1994
Author

liyanliang1994 commented Jun 12, 2018

@AndrienkoAleksandr That's true. Sometimes the terminal reconnects successfully, but the working directory is reset to '/projects', so earlier work cannot be continued.
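
The improvement discussed here, reattaching to the previous terminal instead of spawning a new one, can be sketched as a server-side registry keyed by a session id. This only illustrates the idea under stated assumptions: `TerminalSession`, `TerminalRegistry`, and the session-id handshake are hypothetical and are not part of the terminal agent's API.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TerminalSession:
    session_id: str
    cwd: str = "/projects"  # directory a fresh terminal starts in, per this thread


class TerminalRegistry:
    """Keeps sessions alive across client disconnects (illustrative only)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, TerminalSession] = {}

    def attach(self, session_id: Optional[str] = None) -> TerminalSession:
        # Reuse the existing session when the client presents a known id.
        if session_id is not None and session_id in self._sessions:
            return self._sessions[session_id]
        # Otherwise start a fresh session, which is the behaviour reported
        # above for every reconnect.
        session = TerminalSession(session_id=uuid.uuid4().hex)
        self._sessions[session.session_id] = session
        return session
```

A client that remembers its session id and sends it on reconnect would get the same session back, including its working directory.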

@sleshchenko
Member

@liyanliang1994 If Che creates the ingresses and service for the workspace successfully but the pod is not created, I guess the workspace start hangs in the WaitReady phase: https://github.com/eclipse/che/blob/0b30ca1d9bf3a35b4997e1fac8004ac5a8fa5eea/infrastructures/kubernetes/src/main/java/org/eclipse/che/workspace/infrastructure/kubernetes/KubernetesInternalRuntime.java#L569
This needs debugging to confirm, and to find out why the load balancer list is empty. Maybe it is expected to be empty with the Traefik ingress controller, in which case the K8s infrastructure should be improved.
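
As an illustration of the waiting step described above, the sketch below polls an ingress for load balancer addresses and fails with a timeout instead of hanging when the list stays empty. It is not the code in KubernetesInternalRuntime; `wait_for_load_balancer` and `get_ingress_addresses` are hypothetical names, and the injected clock and sleep exist only to make the function testable.

```python
import time
from typing import Callable, List


def wait_for_load_balancer(get_ingress_addresses: Callable[[], List[str]],
                           timeout_seconds: float,
                           poll_seconds: float = 1.0,
                           clock: Callable[[], float] = time.monotonic,
                           sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll until the ingress reports an address or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        addresses = get_ingress_addresses()
        if addresses:
            return addresses
        if clock() >= deadline:
            # An ingress controller that never updates the ingress status
            # would end up here rather than blocking workspace start forever.
            raise TimeoutError("ingress status has no load balancer address")
        sleep(poll_seconds)
```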

@liyanliang1994
Author

@sleshchenko Finally, I replaced nginx with Traefik successfully, and the terminal bug is fixed. There are 2 things worth exploring.

  1. Traefik doesn't update the ingress status (this may be fixed in v1.7.0, #3324), so Che keeps waiting for the ingress to become ready and never creates the workspace pod. I added a Thread.sleep of several seconds and skipped the waiting process by changing the Predicate below to true. It may be a bad solution, but it works.
    https://github.com/eclipse/che/blob/0b30ca1d9bf3a35b4997e1fac8004ac5a8fa5eea/infrastructures/kubernetes/src/main/java/org/eclipse/che/workspace/infrastructure/kubernetes/KubernetesInternalRuntime.java#L585

  2. CHE_INFRA_KUBERNETES_INGRESS_ANNOTATIONS__JSON in Che ConfigMap.
    '{"kubernetes.io/ingress.class": "traefik", "traefik.ingress.kubernetes.io/rule-type": "PathPrefixStrip", "traefik.ingress.kubernetes.io/ssl-redirect": "false", "traefik.ingress.kubernetes.io/proxy-connect-timeout": "3600", "traefik.ingress.kubernetes.io/proxy-read-timeout": "3600"}'
    For Traefik, "rewrite-target": "/" rewrites the entire URL to "/", so the workspace cannot receive the expected requests from che-server.
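
The difference between the two annotations in point 2 can be modelled with two small functions. This is a simplified model of the behaviour reported above, not Traefik's implementation, and the `/server-4401` prefix in the usage note is a made-up example path. The last lines build the annotations value with `json.dumps` so that the quoting is valid JSON.

```python
import json


def rewrite_target(path: str, target: str = "/") -> str:
    """Model of the reported behaviour: the whole path becomes `target`."""
    return target


def path_prefix_strip(path: str, prefix: str) -> str:
    """Remove the matched ingress prefix and keep the rest of the path."""
    if not path.startswith(prefix):
        return path
    rest = path[len(prefix):]
    return rest if rest.startswith("/") else "/" + rest


# Annotation values copied from the comment above.
annotations = {
    "kubernetes.io/ingress.class": "traefik",
    "traefik.ingress.kubernetes.io/rule-type": "PathPrefixStrip",
    "traefik.ingress.kubernetes.io/ssl-redirect": "false",
    "traefik.ingress.kubernetes.io/proxy-connect-timeout": "3600",
    "traefik.ingress.kubernetes.io/proxy-read-timeout": "3600",
}
annotations_json = json.dumps(annotations)
```

For a request to `/server-4401/api/ws`, the first model forwards `/` and loses the rest of the path, while the second forwards `/api/ws`, which is what the workspace server needs to see.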

@sleshchenko
Member

@liyanliang1994 Great to hear that it finally works.
@eivantsov Should we create bugs for issues that @liyanliang1994 described?

@ghost

ghost commented Sep 10, 2018

I think we should; those are legitimate use cases.

@sleshchenko
Member

@liyanliang1994

  1. There is an issue to improve Che Server so that it doesn't wait for the load balancer IP when it is not needed: "Do not wait loadBalancer of ingresses if servers host is known" #10767. Once it is fixed, it will be possible to use an ingress backend that doesn't update the ingress status, provided the single-host or multi-host server strategy is configured.
  2. @eivantsov Could you please add a small note somewhere in our documentation (or just create an issue) about which value should be configured for the CHE_INFRA_KUBERNETES_INGRESS_ANNOTATIONS__JSON property when the Traefik ingress backend is used on Kubernetes infrastructure?

@liyanliang1994 Could you close this issue if Che Server finally works for you?

@skabashnyuk
Contributor

Closing. Feel free to reopen it if you think it's still relevant for you.
