Description
In WorkspaceRuntimes#recover(), the loop over infra identities is inside the try block. If any exception is thrown, an error is logged and the method returns. As a result, the recovery process skips every workspace that comes after the failing one, even if those workspaces could have been recovered without error.
Instead, the try/catch block should be inside the loop, so that recovery continues even if some workspaces cannot be recovered. This would also log every workspace that cannot be recovered, rather than only the first.
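To illustrate the proposed change, here is a minimal, self-contained sketch of the "try/catch inside the loop" pattern. It is not the actual Che code: `RecoverySketch`, `RuntimeRecoverer`, and `recoverAll` are hypothetical names, identities are plain strings, and logging is reduced to stdout/stderr.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the Che code base.
class RecoverySketch {

    /** Stand-in for whatever recovers a single runtime (assumption). */
    interface RuntimeRecoverer {
        void recover(String identity) throws Exception;
    }

    /**
     * Attempts to recover every identity. A failure is logged and recorded,
     * and the loop then moves on to the next identity.
     *
     * @return the identities that could not be recovered
     */
    static List<String> recoverAll(List<String> identities, RuntimeRecoverer recoverer) {
        List<String> failed = new ArrayList<>();
        for (String identity : identities) {
            // The try/catch is scoped to one iteration, so an exception
            // here does not abort recovery of the remaining identities.
            try {
                recoverer.recover(identity);
                System.out.println("Successfully recovered workspace runtime '" + identity + "'");
            } catch (Exception e) {
                System.err.println(
                    "Couldn't recover runtime '" + identity + "'. Error: " + e.getMessage());
                failed.add(identity);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // The second identity fails; the third is still attempted.
        List<String> failed = recoverAll(
            List.of("ws-a", "ws-b", "ws-c"),
            id -> {
                if (id.equals("ws-b")) {
                    throw new IllegalStateException("simulated failure");
                }
            });
        System.out.println("Failed: " + failed);
    }
}
```

Returning the list of failed identities is a design choice of this sketch only; it would let a caller decide what to do with runtimes that were not recovered, such as cleaning up their resources.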
Skipping workspaces after the first error can also leave workspace resources in an untracked and uncleaned state, which we've seen when running rh-che on OpenShift (Deployments are left running while Che lists the workspace as "stopped").
Reproduction Steps
The error can occur whenever the Che server is restarted, for any reason: if any exception is thrown while recovering runtimes, the remaining runtimes are not recovered. Server logs show, for example:
October 2nd 2018, 04:36:35.628 Successfully recovered workspace runtime 'workspacetymylu68u6l6qwgp'
October 2nd 2018, 04:36:35.751 Successfully recovered workspace runtime 'workspacegj73ipi3nn3jjm3e'
October 2nd 2018, 04:36:35.903 Successfully recovered workspace runtime 'workspaceop853grq9fo01pb7'
October 2nd 2018, 04:36:36.055 Successfully recovered workspace runtime 'workspace37t5fhzb9erijycf'
October 2nd 2018, 04:36:36.132 An error occurred while attempted to recover runtimes using infrastructure 'openshift'. Reason: 'Couldn't recover runtime 'workspacefdz4vn59o1myqi1a:default'. Error: The anonymous subject is used, and won't be able to perform this action'
[no more recovered messages]