Workspace-related k8s resources (pods, services, ...) are not removed after inactivity timeout #15312

lautou · 2019-11-26T04:35:34Z

Describe the bug

Che version

latest
nightly
other: 7.3.0, 7.4.0

Steps to reproduce

Create a new OpenShift project: eclipse-che
Install Eclipse Che operator using OperatorHub:
- Install Mode: A specific namespace on the cluster: eclipse-che
- Update Channel: stable
- Approval Strategy: Automatic
Create a Che Cluster: eclipse-che
Log into Eclipse Che using OAuth login
Authorize Access user:full
Update Account Information
Create a Java Maven stack workspace selecting console-java-simple project sample
Wait for workspace creation and open workspace
Close web browser and wait 30 minutes for workspace timeout.
Watch the che pod log. The Che server is unable to delete the workspace pods because the system:serviceaccount:eclipse-che:che service account does not have access to the workspace namespace.

Expected behavior

Workspace pods should be terminated

Runtime

kubernetes (include output of kubectl version)
Openshift (include output of oc version)
minikube (include output of minikube version and kubectl version)
minishift (include output of minishift version and oc version)
docker-desktop + K8S (include output of docker version and kubectl version)
other: (please specify)

Screenshots

2019-11-25 20:56:24,399[nio-8080-exec-3]  [INFO ] [o.e.c.a.w.s.WorkspaceManager 569]    - Workspace 'karla/wksp-bmah' with id 'workspace4v17d6er9cojt3ml' created by user 'karla'
2019-11-25 20:56:30,449[nio-8080-exec-7]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 432]   - Starting workspace 'karla/wksp-bmah' with id 'workspace4v17d6er9cojt3ml' by user 'karla'
2019-11-25 20:59:50,946[aceSharedPool-0]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 835]   - Workspace 'karla:wksp-bmah' with id 'workspace4v17d6er9cojt3ml' started by user 'karla'
2019-11-25 21:34:24,006[ted-scheduler-8]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 493]   - Workspace 'karla/wksp-bmah' with id 'workspace4v17d6er9cojt3ml' is stopping by user 'activity-checker'
2019-11-25 21:34:24,206[aceSharedPool-1]  [ERROR] [o.e.c.a.w.s.WorkspaceRuntimes 923]   - Error occurred during stopping of runtime 'workspace4v17d6er9cojt3ml:default:987583aa-ae88-4eec-9aa6-6ed216526e07' by user 'activity-checker'. Error: Error(s) occurs while cleaning up the namespace. Failure executing: GET at: https://172.30.0.1/apis/route.openshift.io/v1/namespaces/karla-che/routes?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. routes.route.openshift.io is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "routes" in API group "route.openshift.io" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/services?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. services is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "services" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/apis/apps/v1/namespaces/karla-che/deployments?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "deployments" in API group "apps" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. 
Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/secrets?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. secrets is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "secrets" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/configmaps?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. configmaps is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "configmaps" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password.
org.eclipse.che.api.workspace.server.spi.InfrastructureException: Error(s) occurs while cleaning up the namespace. Failure executing: GET at: https://172.30.0.1/apis/route.openshift.io/v1/namespaces/karla-che/routes?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. routes.route.openshift.io is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "routes" in API group "route.openshift.io" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/services?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. services is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "services" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/apis/apps/v1/namespaces/karla-che/deployments?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "deployments" in API group "apps" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/secrets?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. 
secrets is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "secrets" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/karla-che/configmaps?labelSelector=che.workspace_id%3Dworkspace4v17d6er9cojt3ml. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. configmaps is forbidden: User "system:serviceaccount:eclipse-che:che" cannot list resource "configmaps" in API group "" in the namespace "karla-che". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password.
	at org.eclipse.che.workspace.infrastructure.kubernetes.namespace.KubernetesNamespace.doRemove(KubernetesNamespace.java:205)
	at org.eclipse.che.workspace.infrastructure.openshift.project.OpenShiftProject.cleanUp(OpenShiftProject.java:120)
	at org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInternalRuntime.internalStop(KubernetesInternalRuntime.java:571)
	at org.eclipse.che.api.workspace.server.spi.InternalRuntime.stop(InternalRuntime.java:177)
	at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StopRuntimeTask.run(WorkspaceRuntimes.java:893)
	at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalRunnable.run(CopyThreadLocalRunnable.java:38)
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Installation method

chectl
che-operator 7.4.0
minishift-addon
I don't know

Environment

Additional context

skabashnyuk · 2019-11-26T08:00:53Z

@lautou can you please provide more information about the OpenShift you are using and how you installed and set up Che?

lautou · 2019-11-26T08:47:33Z

@skabashnyuk
OpenShift 4.2.7 on AWS.
I installed it through the operator.

skabashnyuk · 2019-11-26T13:26:56Z

> I installed it through the operator.

Can you provide the full set of parameters that you used?
Can you provide the CheCluster CR?
Can you provide the Che server pod YAML?

skabashnyuk · 2019-11-26T13:41:16Z

I assume that you are using OAuth. In this case the activity checker uses the system:serviceaccount:eclipse-che:che service account to list routes, and that account does not have enough permissions to do so. As a workaround, I suggest either turning workspace idling off or granting more permissions to the system:serviceaccount:eclipse-che:che service account.

lautou · 2019-11-26T17:55:56Z

> I installed it through the operator.
>
> Can you provide the full set of parameters that you used?
>
> Can you provide the CheCluster CR?
>
> Can you provide the Che server pod YAML?

@skabashnyuk
Please find attached the CheCluster YAML and the pod YAMLs.
Yes, I am using OAuth for Che authentication.

checluster.yaml.txt
pod-workspace4v17d6er9cojt3ml.che-jwtproxy-546c9fd9bf-vn54x.yaml.txt
pod-workspace4v17d6er9cojt3ml.gradle-5bd4bd9b5d-srkrw.yaml.txt

skabashnyuk · 2019-11-26T18:33:24Z

Can you try to set CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT=-1 to disable workspace idling?

l0rd · 2019-11-26T23:29:33Z

Setting sev/P1 because not being able to idle workspaces when OAuth is activated has a critical impact.

lautou · 2019-11-27T05:21:44Z

@skabashnyuk
I have created a RoleBinding that grants the edit role to the eclipse-che/che service account in the karla-che namespace.
It works:
2019-11-26 20:44:26,193[nio-8080-exec-4] [INFO ] [o.e.c.a.w.s.WorkspaceManager 569] - Workspace 'karla/wksp-f54e' with id 'workspaceesljlhg2flqua4lu' created by user 'karla'
2019-11-26 20:44:30,076[nio-8080-exec-2] [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 432] - Starting workspace 'karla/wksp-f54e' with id 'workspaceesljlhg2flqua4lu' by user 'karla'
2019-11-26 20:46:08,892[ceSharedPool-22] [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 835] - Workspace 'karla:wksp-f54e' with id 'workspaceesljlhg2flqua4lu' started by user 'karla'
2019-11-26 21:16:28,903[ted-scheduler-7] [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 493] - Workspace 'karla/wksp-f54e' with id 'workspaceesljlhg2flqua4lu' is stopping by user 'activity-checker'
2019-11-26 21:16:29,937[ceSharedPool-23] [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 900] - Workspace 'karla/wksp-f54e' with id 'workspaceesljlhg2flqua4lu' is stopped by user 'activity-checker'

The RoleBinding YAML is attached.

rolebinding-che-edit.yaml.txt
I will check CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT.
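
The attachment itself is not reproduced in this thread. For readers following along, a RoleBinding of the kind described above might look like the sketch below. It assumes the built-in `edit` ClusterRole and the namespaces named in the logs (`eclipse-che` for the Che server, `karla-che` for the workspace); the binding name `che-edit` is a hypothetical choice, not taken from the attached file.

```yaml
# Sketch only: grants the Che server service account the "edit" role
# inside a single workspace namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: che-edit            # hypothetical name
  namespace: karla-che      # workspace namespace from the error log
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit                # built-in aggregated role
subjects:
  - kind: ServiceAccount
    name: che               # service account named in the "forbidden" messages
    namespace: eclipse-che
```

A binding like this covers one workspace namespace only, so a similar binding would presumably be needed in every other user namespace.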

skabashnyuk · 2019-11-27T07:11:18Z

> Setting sev/P1 because not being able to idle workspaces when OAuth is activated has a critical impact.

@l0rd
We have two options here: either disable idling or provide more permissions. I'm not sure that providing more permissions is a good thing. Can you confirm your intentions?

lautou · 2019-11-27T07:37:17Z

@skabashnyuk I have set CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT=-1 in the che deployment resource. Unfortunately, it doesn't seem to work: my workspace was stopped after 30 minutes.

See the logs and the Eclipse Che pod YAML (containing CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT):
che-787579f549-6dsmf-che.log
pod-che-787579f549-6dsmf.yaml.txt

lautou · 2019-11-27T07:40:05Z

By the way, I restarted the Che deployment while the workspace pods were still running. Maybe CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT was not taken into account for the workspace that was already running. Let me test again after restarting the workspace.

lautou · 2019-11-27T09:54:51Z

OK, setting CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT=-1 works. My workspace is still up and running.
Nevertheless, it is a workaround. Since the deployment resource is managed by the CheCluster custom resource, this setting should rather be applied in the CheCluster CR YAML, but I don't know whether that is possible.

skabashnyuk · 2019-11-27T12:57:58Z

@lautou you can add this to the CheCluster CR:

 customCheProperties:
     CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT: '-1'
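
For context, here is a minimal sketch of where that fragment might sit in a complete CheCluster resource. It assumes the `org.eclipse.che/v1` API version and that `customCheProperties` lives under `spec.server`, as in the operator releases of that period; the name and namespace `eclipse-che` come from the reproduction steps. Check the CR your operator version actually generates before applying anything.

```yaml
# Sketch only: disables workspace idling through the CheCluster CR
# instead of editing the operator-managed deployment directly.
apiVersion: org.eclipse.che/v1   # assumed API version
kind: CheCluster
metadata:
  name: eclipse-che
  namespace: eclipse-che
spec:
  server:
    customCheProperties:
      # The value must be a quoted string, and YAML needs the space after the colon.
      CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT: '-1'
```

Setting the property in the CR rather than in the deployment means the operator should not overwrite it on its next reconciliation.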

l0rd · 2019-11-27T13:22:15Z

> > Setting sev/P1 because not being able to idle workspaces when OAuth is activated has a critical impact.
>
> @l0rd
> We have two options here: either disable idling or provide more permissions. I'm not sure that providing more permissions is a good thing. Can you confirm your intentions?

Disabling idling is not an option. I don't know whether we need more permissions, need to use another context (user token vs. workspace service account), should renew an expired token, or something else, but we need to find a solution that makes idling work.

skabashnyuk · 2019-11-27T13:36:34Z

> we need to use another context (user token vs workspaces service account)

Can you explain what you mean by that, given that the "Activity checker" has no user context?

> we should renew an expired token

The difficulty here is that we don't have a token at all, because we don't have a user.

The only working solution I know of at the moment is to identify the correct permissions, make sure they are requested by Che's installation method (chectl, che-operator, helm), and adjust the configuration accordingly.
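
As a starting point for identifying those permissions, the resources named in the "forbidden" messages of the original log could be collected into a narrower Role, sketched below. Only the `list` verb is confirmed by the log; `delete` is an assumption based on the fact that the failing step is a cleanup, and the Role name is hypothetical. The full set of permissions the Che server needs is not established in this thread.

```yaml
# Sketch only: a Role limited to the resources that appear in the
# "cannot list resource" errors from the karla-che namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: che-workspace-cleanup    # hypothetical name
  namespace: karla-che
rules:
  - apiGroups: [""]
    resources: ["services", "secrets", "configmaps"]
    verbs: ["list", "delete"]    # "delete" is assumed, not shown in the log
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["list", "delete"]
  - apiGroups: ["route.openshift.io"]
    resources: ["routes"]
    verbs: ["list", "delete"]
```

Such a Role would still have to be bound to the system:serviceaccount:eclipse-che:che service account with a RoleBinding in each workspace namespace.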

lautou · 2019-11-28T05:42:56Z

> @lautou you can add this to the CheCluster CR:
>
>      customCheProperties:
>          CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT: '-1'

I confirm this setting works.

lautou · 2019-12-11T09:03:50Z

I disagree with closing this ticket.
Setting CHE_LIMITS_WORKSPACE_IDLE_TIMEOUT is a workaround.
After a timeout, the pods should be terminated gracefully.

sleshchenko · 2019-12-11T09:44:46Z

@lautou I totally agree with you. I believe there is another issue registered for the same problem, but let's keep this one open until the two are linked.

lautou · 2020-01-10T16:15:35Z

I confirm on 7.6.0 there is no default timeout anymore. My workspaces are kept running.

skabashnyuk · 2020-03-04T10:19:53Z

Duplicate #15906

lautou added the kind/bug Outline of a bug - must adhere to the bug report template. label Nov 26, 2019

che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Nov 26, 2019

lautou mentioned this issue Nov 26, 2019

Rare error on start workspace. #14859

Closed

23 tasks

sleshchenko changed the title ~~Error occurred during stopping of runtime workspace due that configured service account doesn't have access~~ Workspace-related k8s resources (pods, services, ...) are not removed after inactivity timeout Nov 26, 2019

l0rd added severity/P1 Has a major impact to usage or development of the system. team/platform and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Nov 26, 2019

skabashnyuk added the status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach label Nov 27, 2019

skabashnyuk closed this as completed Dec 10, 2019

sleshchenko reopened this Dec 11, 2019

skabashnyuk removed the status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach label Jan 10, 2020

skabashnyuk added team/platform and removed team/platform labels Jan 10, 2020

skabashnyuk closed this as completed Mar 4, 2020
