Workspace-related k8s resources (pods, services, ...) are not removed after inactivity timeout #15312

Closed
5 of 22 tasks
lautou opened this issue Nov 26, 2019 · 20 comments
Labels
kind/bug Outline of a bug - must adhere to the bug report template. severity/P1 Has a major impact to usage or development of the system.

Comments

lautou commented Nov 26, 2019

Describe the bug

Che version

  • latest
  • nightly
  • other: 7.3.0, 7.4.0

Steps to reproduce

  1. Create a new OpenShift project: eclipse-che
  2. Install Eclipse Che operator using OperatorHub:
    • Install Mode: A specific namespace on the cluster: eclipse-che
    • Update Channel: stable
    • Approval Strategy: Automatic
  3. Create a Che Cluster: eclipse-che
  4. Log into Eclipse Che using OAuth login
  5. Authorize Access user:full
  6. Update Account Information
  7. Create a Java Maven stack workspace selecting console-java-simple project sample
  8. Wait for workspace creation and open workspace
  9. Close web browser and wait 30 minutes for workspace timeout.
  10. Watch the Che pod log. The Che server is unable to delete the workspace pods because the system:serviceaccount:eclipse-che:che service account does not have access to the workspace namespace.

Expected behavior

Workspace pods should be terminated

Runtime

  • kubernetes (include output of kubectl version)
  • Openshift (include output of oc version)
  • minikube (include output of minikube version and kubectl version)
  • minishift (include output of minishift version and oc version)
  • docker-desktop + K8S (include output of docker version and kubectl version)
  • other: (please specify)

Screenshots

2019-11-25 20:56:24,399[nio-8080-exec-3]  [INFO ] [o.e.c.a.w.s.WorkspaceManager 569]    - Workspace 'karla/wksp-bmah' with id 'workspace4v17d6er9cojt3ml' created by user 'karla'
2019-11-25 20:56:30,449[nio-8080-exec-7]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 432]   - Starting workspace 'karla/wksp-bmah' with id 'workspace4v17d6er9cojt3ml' by user 'karla'
2019-11-25 20:59:50,946[aceSharedPool-0]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 835]   - Workspace 'karla:wksp-bmah' with id 'workspace4v17d6er9cojt3ml' started by user 'karla'
2019-11-25 21:34:24,006[ted-scheduler-8]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 493]   - Workspace 'karla/wksp-bmah' with id 'workspace4v17d6er9cojt3ml' is stopping by user 'activity-checker'
2019-11-25 21:34:24,206[aceSharedPool-1]  [ERROR] [o.e.c.a.w.s.WorkspaceRuntimes 923]   - Error occurred during stopping of runtime 'workspace4v17d6er9cojt3ml:default:987583aa-ae88-4eec-9aa6-6ed216526e07' by user 'activity-checker'. Error: Error(s) occurs while cleaning up the namespace. Failure executing: GET at: https://172.30.0.1/apis/route.openshift.io/v1/namespaces/karla-che/routes?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. routes.route.openshift.io is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "routes" in API group "route.openshift.io" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/services?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. services is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "services" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/apis/apps/v1/namespaces/karla-che/deployments?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "deployments" in API group "apps" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. 
Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/secrets?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. secrets is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "secrets" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/configmaps?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. configmaps is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "configmaps" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password.
org.eclipse.che.api.workspace.server.spi.InfrastructureException: Error(s) occurs while cleaning up the namespace. Failure executing: GET at: https://172.30.0.1/apis/route.openshift.io/v1/namespaces/karla-che/routes?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. routes.route.openshift.io is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "routes" in API group "route.openshift.io" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/services?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. services is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "services" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/apis/apps/v1/namespaces/karla-che/deployments?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "deployments" in API group "apps" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/secrets?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. 
secrets is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "secrets" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/configmaps?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. configmaps is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "configmaps" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password.
	at org.eclipse.che.workspace.infrastructure.kubernetes.namespace.KubernetesNamespace.doRemove(KubernetesNamespace.java:205)
	at org.eclipse.che.workspace.infrastructure.openshift.project.OpenShiftProject.cleanUp(OpenShiftProject.java:120)
	at org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInternalRuntime.internalStop(KubernetesInternalRuntime.java:571)
	at org.eclipse.che.api.workspace.server.spi.InternalRuntime.stop(InternalRuntime.java:177)
	at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StopRuntimeTask.run(WorkspaceRuntimes.java:893)
	at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalRunnable.run(CopyThreadLocalRunnable.java:38)
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Installation method

  • chectl
  • che-operator 7.4.0
  • minishift-addon
  • I don't know

Environment

  • my computer
    • Windows
    • Linux
    • macOS
  • Cloud
    • Amazon
    • Azure
    • GCE
    • other (please specify)
  • other: please specify

Additional context

@lautou lautou added the kind/bug Outline of a bug - must adhere to the bug report template. label Nov 26, 2019
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Nov 26, 2019
@skabashnyuk
Contributor

@lautou can you please provide more information about the OpenShift version you are using and how you installed and set up Che?

@lautou
Author

lautou commented Nov 26, 2019

@skabashnyuk
Openshift 4.2.7 on AWS.
I installed it through operator.

@skabashnyuk
Contributor

> I installed it through operator.

  • Can you provide all sets of parameters that you use?
  • Can you provide CheCluster CR?
  • Can you provide Che-server pod yaml?

@sleshchenko sleshchenko changed the title from "Error occurred during stopping of runtime workspace due that configured service account doesn't have access" to "Workspace-related k8s resources (pods, services, ...) are not removed after inactivity timeout" Nov 26, 2019
@skabashnyuk
Contributor

I assume you are using OAuth. In this case the activity checker uses the system:serviceaccount:eclipse-che:che service account to list routes, and it does not have enough permissions to do that. As a solution, I suggest either turning workspace idling off or granting more permissions to the system:serviceaccount:eclipse-che:che service account.

@lautou
Author

lautou commented Nov 26, 2019

> I installed it through operator.
>
>   • Can you provide all sets of parameters that you use?
>   • Can you provide CheCluster CR?
>   • Can you provide Che-server pod yaml?

@skabashnyuk
Please find attached the CheCluster YAML and the pod YAMLs.
Yes, I am using OAuth for Che authentication.

checluster.yaml.txt
pod-workspace4v17d6er9cojt3ml.che-jwtproxy-546c9fd9bf-vn54x.yaml.txt
pod-workspace4v17d6er9cojt3ml.gradle-5bd4bd9b5d-srkrw.yaml.txt

@skabashnyuk
Contributor

Can you try to set CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT=-1 to disable workspace idling?
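
One way to try this is to set the variable directly on the Che server container of the che Deployment. The fragment below is only a sketch: the container name `che` is an assumption, and all other fields of the Deployment are omitted.

```yaml
# Fragment of the che Deployment (under spec.template.spec); other fields omitted.
containers:
  - name: che                  # assumed container name
    env:
      - name: CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT
        value: "-1"            # env values must be strings, hence the quotes
```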

@l0rd l0rd added severity/P1 Has a major impact to usage or development of the system. team/platform and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Nov 26, 2019
@l0rd
Contributor

l0rd commented Nov 26, 2019

Setting sev/P1 because not being able to idle workspaces when OAuth is activated has a critical impact.

@lautou
Author

lautou commented Nov 27, 2019

@skabashnyuk
I have created a RoleBinding that grants the edit role to the service account eclipse-che/che in the karla-che namespace.
It works:
2019-11-26 20:44:26,193[nio-8080-exec-4] [INFO ] [o.e.c.a.w.s.WorkspaceManager 569] - Workspace 'karla/wksp-f54e' with id 'workspaceesljlhg2flqua4lu' created by user 'karla'
2019-11-26 20:44:30,076[nio-8080-exec-2] [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 432] - Starting workspace 'karla/wksp-f54e' with id 'workspaceesljlhg2flqua4lu' by user 'karla'
2019-11-26 20:46:08,892[ceSharedPool-22] [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 835] - Workspace 'karla:wksp-f54e' with id 'workspaceesljlhg2flqua4lu' started by user 'karla'
2019-11-26 21:16:28,903[ted-scheduler-7] [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 493] - Workspace 'karla/wksp-f54e' with id 'workspaceesljlhg2flqua4lu' is stopping by user 'activity-checker'
2019-11-26 21:16:29,937[ceSharedPool-23] [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 900] - Workspace 'karla/wksp-f54e' with id 'workspaceesljlhg2flqua4lu' is stopped by user 'activity-checker'

Please find the RoleBinding YAML attached.

rolebinding-che-edit.yaml.txt
I will check CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT.
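
For readers without access to the attachment, a RoleBinding of the kind described in this comment might look roughly like the sketch below. This is not the attached file: the binding name `che-edit` is only guessed from the attachment's filename, and the rest is assembled from names mentioned in this thread (service account `che` in `eclipse-che`, namespace `karla-che`, the built-in `edit` ClusterRole).

```yaml
# Hypothetical reconstruction, not the attached rolebinding-che-edit.yaml.txt.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: che-edit             # assumed name
  namespace: karla-che       # the workspace namespace from the error log
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit                 # built-in "edit" role
subjects:
  - kind: ServiceAccount
    name: che
    namespace: eclipse-che   # namespace where the Che server runs
```

Because a RoleBinding is namespaced, a binding like this would presumably have to be created in each user's workspace namespace.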

@skabashnyuk
Contributor

> Setting sev/P1 because not being able to idle workspaces when OAuth is activated has a critical impact.

@l0rd
We have two options here: either disable idling or provide more permissions. I'm not sure that providing more permissions is a good thing. Can you confirm your intentions?

@skabashnyuk skabashnyuk added the status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach label Nov 27, 2019
@lautou
Author

lautou commented Nov 27, 2019

@skabashnyuk I have set CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT=-1 in the che Deployment resource. Unfortunately, it doesn't seem to work: my workspace was stopped after 30 minutes.

See logs + eclipse che pod YAML (containing CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT)
che-787579f549-6dsmf-che.log
pod-che-787579f549-6dsmf.yaml.txt

@lautou
Author

lautou commented Nov 27, 2019

By the way, I restarted the Che deployment while the workspace pods were still running. Maybe CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT was not taken into account for the workspace that was already running. Let me test again after restarting the workspace.

@lautou
Author

lautou commented Nov 27, 2019

OK, setting CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT=-1 works. My workspace is still up and running.
Nevertheless, it is a workaround. Since the Deployment resource is managed by the CheCluster custom resource, this setting should rather be applied in the CheCluster CR YAML, but I don't know whether that is possible.

@skabashnyuk
Contributor

@lautou you can add this to the CheCluster CR:

 customCheProperties:
     CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT: '-1'
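
In context, the fragment above would presumably sit under `spec.server` of the CheCluster resource. A minimal sketch, assuming the `org.eclipse.che/v1` API used by the che-operator at the time and the `eclipse-che` cluster name and namespace from the reproduction steps; all other fields of the CR are omitted:

```yaml
apiVersion: org.eclipse.che/v1   # assumed API version
kind: CheCluster
metadata:
  name: eclipse-che
  namespace: eclipse-che
spec:
  server:
    customCheProperties:
      # Quoted so the value stays a string; -1 disables workspace idling.
      CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT: "-1"
```

Note the space after the colon: without it, YAML does not parse the line as a key/value pair.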

@l0rd
Contributor

l0rd commented Nov 27, 2019

> Setting sev/P1 because not being able to idle workspaces when OAuth is activated has a critical impact.
>
> @l0rd
> We have two options here: either disable idling or provide more permissions. I'm not sure that providing more permissions is a good thing. Can you confirm your intentions?

Disabling idling is not an option. I don't know if we need more permissions, or we need to use another context (user token vs workspaces service account), or we should renew an expired token, or something else, but we need to find a solution to make idling work.

@skabashnyuk
Contributor

> we need to use another context (user token vs workspaces service account)

Can you explain what you mean by that, given that the "Activity checker" has no user context?

> we should renew an expired token

The difficulty here is that we don't have a token at all, because we don't have a user.

The only working solution I know of at the moment is to identify the correct permissions, make sure they are requested by Che's installation method (chectl, che-operator, helm), and adjust the configuration accordingly.
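
As a starting point for identifying the correct permissions, the Forbidden errors in the log at the top of this issue name five resources that the `che` service account could not list in the workspace namespace. A narrower alternative to the `edit` role might look like the sketch below. The Role name is hypothetical, the `delete` verb is an assumption (the log only shows `list` being denied, but the cleanup presumably also deletes what it finds), and the list is probably incomplete, since pods and other resources do not appear in the log.

```yaml
# Sketch only: resources are taken from the Forbidden messages in the log.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: che-workspace-cleanup    # hypothetical name
  namespace: karla-che           # workspace namespace from the log
rules:
  - apiGroups: [""]
    resources: ["services", "secrets", "configmaps"]
    verbs: ["list", "delete"]    # "delete" is assumed, not confirmed by the log
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["list", "delete"]
  - apiGroups: ["route.openshift.io"]
    resources: ["routes"]
    verbs: ["list", "delete"]
```

Such a Role would still have to be bound to the `che` service account from the `eclipse-che` namespace with a RoleBinding in each workspace namespace.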

@lautou
Author

lautou commented Nov 28, 2019

> @lautou you can add this to the CheCluster CR:
>
>  customCheProperties:
>      CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT: '-1'

I confirm this setting works.

@lautou
Author

lautou commented Dec 11, 2019

I disagree with closing this ticket.
Setting CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT is a workaround.
After a timeout, the pods should be terminated gracefully.

@sleshchenko
Member

@lautou I totally agree with you. I believe the same issue is already registered elsewhere, but let's keep this one open until it is linked.

@sleshchenko sleshchenko reopened this Dec 11, 2019
@lautou
Author

lautou commented Jan 10, 2020

I confirm that on 7.6.0 there is no default timeout anymore. My workspaces keep running.

@skabashnyuk skabashnyuk removed the status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach label Jan 10, 2020
@skabashnyuk
Contributor

Duplicate #15906
