Workspaces cannot be started after changing the image in the "Machine" section #9542

Ohrimenko1988 · 2018-04-25T14:35:17Z

Reproduction Steps

Create a workspace from the default "Java" stack.
Open the "Workspaces" page.
In the workspaces list, select the newly created workspace and click on it.
On the opened "workspace details" page, click the "Machine" button.
Click the "Edit" button.
In the "Recipe" text field, delete several letters from the image name so that it becomes "eclipse/ubuntu_jd".
Click the "Save" button.
Click the "Run" button.
Observe the error message.
Try to create and run a workspace with any stack.
Observe the errors.

Expected behavior:
The workspace with the wrong image does not start, but the test workspace starts successfully.

Observed behavior:
The workspace with the wrong image does not start, and the test workspace fails with the same error.

Attachment:

OS and version:
Che 6.5.0; Fedora 25; Chrome 63.0

olexii4 · 2018-04-26T09:18:46Z

In fact, no workspace can be started after the first workspace fails with an image pull error.
It is reproducible on the OpenShift infrastructure.
@sleshchenko WDYT?

sleshchenko · 2018-04-26T09:28:14Z

I think it is a server-side issue. It is related to #9426.
There is an issue with container watching when the objects of all workspaces are created in one namespace: in that case, pods must be filtered so that only the events related to a particular workspace are propagated.
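The per-workspace filtering described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Che's code: `WorkspaceEventFilter` and `PodEvent` are hypothetical types, and the sketch assumes the names of the workspace's pods are known when the watch is set up, so that an event whose involved object is another workspace's pod in the same namespace is simply not forwarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not a Che or Fabric8 class): when several workspaces
 * share one namespace, forward only the events whose involved object is a pod
 * that belongs to this workspace.
 */
public class WorkspaceEventFilter {

  /** Hypothetical, simplified view of a Kubernetes event. */
  public record PodEvent(String involvedObjectKind, String involvedObjectName, String reason) {}

  private final Set<String> workspacePodNames;
  private final Consumer<PodEvent> delegate;

  public WorkspaceEventFilter(Set<String> workspacePodNames, Consumer<PodEvent> delegate) {
    // Defensive, immutable copy of the pod names owned by this workspace.
    this.workspacePodNames = Set.copyOf(workspacePodNames);
    this.delegate = delegate;
  }

  /** Propagates the event only if it concerns one of this workspace's pods. */
  public void onEvent(PodEvent event) {
    String name = event.involvedObjectName();
    // Immutable sets reject contains(null), so check for a missing name first.
    boolean ownPod = "Pod".equals(event.involvedObjectKind())
        && name != null
        && workspacePodNames.contains(name);
    if (ownPod) {
      delegate.accept(event);
    }
  }

  public static void main(String[] args) {
    List<PodEvent> forwarded = new ArrayList<>();
    WorkspaceEventFilter filter = new WorkspaceEventFilter(Set.of("ws-a-pod"), forwarded::add);
    // A failure of another workspace's pod in the same namespace is ignored.
    filter.onEvent(new PodEvent("Pod", "ws-b-pod", "Failed"));
    // An event for this workspace's own pod is propagated.
    filter.onEvent(new PodEvent("Pod", "ws-a-pod", "Started"));
    System.out.println(forwarded.size()); // prints 1
  }
}
```

Matching on the involved object's name keeps the filter independent of how events are delivered; the real fix may identify workspace pods differently (for example by labels).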

A similar scenario with a single workspace reveals two issues:

Reproduction Steps

Create a workspace from the default "Java" stack.
Open the "Workspaces" page.
In the workspaces list, select the newly created workspace and click on it.
On the opened "workspace details" page, click the "Machine" button.
Click the "Edit" button.
In the "Recipe" text field, delete several letters from the image name so that it becomes "eclipse/ubuntu_jd".
Click the "Save" button.
Click the "Run" button.
Observe the error message.
Fix the machine's image so that it is correct again.
Rerun the workspace.
Observe the errors.

The OpenShift infrastructure doesn't support start interruption, which leaves the database in an inconsistent state. Because of that, the second start attempt fails with the error message "Runtime is already started", and the failed workspace cannot be deleted. This will be fixed by "Implement interruption of start for OpenShift workspaces" #5918.
OpenShift sends events that happened earlier (e.g. 20 minutes ago). This means that even after the image reference is fixed, the workspace start will fail because of an old event from the history: Che Server considers the workspace as failed to start when it receives such old events. The REST API allows specifying a resource version or a time (I'm not sure which exactly) for filtering out old events, but the Fabric8 client's `watch(String resourceVersion, W watcher)` method is deprecated. Need to investigate how to fix it.
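One way to avoid reacting to historical events is the approach named in the later commits of this thread: process only the events that happened after the watch was initialized. The following is a minimal sketch of that idea, not the merged implementation. `FreshEventFilter` and `TimestampedEvent` are hypothetical names, and the sketch assumes each event carries its `lastTimestamp` as an RFC 3339 string such as `2018-05-18T13:12:25Z`, which `java.time.Instant.parse` accepts.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not a Che or Fabric8 class): drop events that happened
 * before the watch was set up, so that a stale failure from an earlier start
 * attempt cannot fail the current one.
 */
public class FreshEventFilter {

  /** Hypothetical, simplified event carrying only what this filter needs. */
  public record TimestampedEvent(String reason, String lastTimestamp) {}

  private final Instant watchStartedAt;
  private final Consumer<TimestampedEvent> delegate;

  public FreshEventFilter(Instant watchStartedAt, Consumer<TimestampedEvent> delegate) {
    this.watchStartedAt = watchStartedAt;
    this.delegate = delegate;
  }

  public void onEvent(TimestampedEvent event) {
    String raw = event.lastTimestamp();
    if (raw == null) {
      return; // No timestamp: the event cannot be shown to be fresh, so ignore it.
    }
    Instant happenedAt;
    try {
      happenedAt = Instant.parse(raw);
    } catch (DateTimeParseException e) {
      return; // Unparseable timestamp: ignore it rather than fail the workspace.
    }
    // Events stamped at or after the watch start are treated as current.
    if (!happenedAt.isBefore(watchStartedAt)) {
      delegate.accept(event);
    }
  }

  public static void main(String[] args) {
    List<TimestampedEvent> forwarded = new ArrayList<>();
    FreshEventFilter filter =
        new FreshEventFilter(Instant.parse("2018-05-18T13:00:00Z"), forwarded::add);
    // A failure recorded 20 minutes before the watch started is dropped.
    filter.onEvent(new TimestampedEvent("Failed", "2018-05-18T12:40:00Z"));
    // An event recorded after the watch started is forwarded.
    filter.onEvent(new TimestampedEvent("Started", "2018-05-18T13:05:00Z"));
    System.out.println(forwarded.size()); // prints 1
  }
}
```

A timestamp comparison like this depends on the Che server clock and the cluster clock agreeing closely. Event timestamps also have only one-second granularity. The resource-version filtering mentioned above would avoid both problems but is not shown here.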

@ibuziuk I think it would be better to comment out the unrecoverable events listener until these issues are fixed.

garagatyi · 2018-04-26T09:31:44Z

@ibuziuk @l0rd I think we should not include the unrecoverable events commit in 6.4.1, to avoid a blocker bug on OSIO prod.

ghost · 2018-04-30T05:12:44Z

Will these issues be taken care of in upcoming sprints? The ability to show exactly why a pod failed to start is a very helpful feature for both Che admins and users on OpenShift.

garagatyi · 2018-05-07T18:23:03Z

@eivantsov it is not easy to fix: it requires changes in the k8s workspace infrastructure, investigation of a fix in the Fabric8 client, and an upgrade on the Che side.
I would suggest discussing the issue with @skabashnyuk and @l0rd to fit it into our sprints.

garagatyi · 2018-05-08T08:53:12Z

And since it is a blocker, should we fix it right away, even if that means stopping work on other issues?

ghost · 2018-05-08T09:01:28Z

I agree, we should at least disable the feature @ibuziuk added until we investigate a proper fix.

@l0rd @skabashnyuk can you agree on whose sprint it is going to land in?

Disable handling of unrecoverable events because they prevent start of workspaces in certain cases after a workspace failed to start. See eclipse-che#9542 Signed-off-by: Oleksandr Garagatyi <ogaragat@redhat.com>

garagatyi · 2018-05-10T12:18:47Z

I've disabled handling of unrecoverable events until we have a proper fix. Downgrading the priority from blocker to P1.

SkorikSergey · 2018-05-10T14:17:15Z

Still reproducible in Che 6.5.0.

ibuziuk · 2018-05-15T12:28:04Z

PR is sent - #9703


ibuziuk · 2018-05-18T13:12:25Z

Fix is merged to master


Ohrimenko1988 · 2018-06-01T12:00:49Z

This issue can be closed because the main problem has been resolved. However, the bug with deleting the workspace still exists. It is described in the following issue:
#9905


Ohrimenko1988 added kind/bug severity/blocker labels Apr 25, 2018

ashumilova assigned olexii4 Apr 26, 2018

ashumilova added team/ide2 sprint/current labels Apr 26, 2018

olexii4 added the status/in-progress label Apr 26, 2018

olexii4 removed the status/in-progress label Apr 26, 2018

olexii4 removed their assignment Apr 26, 2018

ashumilova removed sprint/current team/ide2 labels May 2, 2018

garagatyi mentioned this issue May 10, 2018

CHE-9542: Disable unrecoverable k8s events handling #9661

Merged

garagatyi added severity/P1 and removed severity/blocker labels May 10, 2018

l0rd mentioned this issue May 11, 2018

Properly fix how OS failure events are handled #9542 redhat-developer/rh-che#669

Closed

ibuziuk mentioned this issue May 15, 2018

#9542 Proper handling of unrecoverable events. Processing only events which are related to current workspace pods #9703

Merged

ibuziuk added a commit to ibuziuk/che that referenced this issue May 16, 2018

eclipse-che#9542 Adding filtering of unrecoverable events by lastTimestamp

c2a9655

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>

ibuziuk added a commit to ibuziuk/che that referenced this issue May 17, 2018

eclipse-che#9542 Adding proper event filtering

6914139

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>

ibuziuk added a commit to ibuziuk/che that referenced this issue May 18, 2018

eclipse-che#9542 Processing only events that happened after 'watchContainers' initialization

68f148b

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>

ibuziuk added a commit to ibuziuk/che that referenced this issue May 22, 2018

che eclipse-che#9542 Processing only the events that happened after `watchContainers` initialization

aaa9fec

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>

ibuziuk added a commit that referenced this issue May 22, 2018

che #9542 Processing only the events that happened after `watchContainers` initialization

72e350b

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>

riuvshin pushed a commit that referenced this issue May 31, 2018

che #9542 Processing only the events that happened after `watchContainers` initialization

e2e9b05

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>

Ohrimenko1988 closed this as completed Jun 1, 2018

hbhargav pushed a commit to hbhargav/che that referenced this issue Dec 5, 2018

che eclipse-che#9542 Processing only the events that happened after `watchContainers` initialization

a415f08

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>
