Workspaces can not be started after change image in the "Machine" section #9542

Closed
Ohrimenko1988 opened this issue Apr 25, 2018 · 12 comments
Labels
kind/bug Outline of a bug - must adhere to the bug report template. severity/P1 Has a major impact to usage or development of the system.

Comments

@Ohrimenko1988
Contributor

Ohrimenko1988 commented Apr 25, 2018

Reproduction Steps

  • Create a workspace from the default "Java" stack
  • Open the "Workspaces" page
  • In the workspaces list, select the newly created workspace and click it
  • On the "workspace details" page that opens, click the "Machine" button
  • Click the "Edit" button
  • In the "Recipe" text field, delete several letters from the image name so that it reads "eclipse/ubuntu_jd"
  • Click the "Save" button
  • Click the "Run" button
  • Note the error message
  • Try to create and run another (test) workspace with any stack
  • Note the errors

Expected behavior:
The workspace with the wrong image is not started, but the test workspace starts successfully.

Observed behavior:
The workspace with the wrong image is not started, and the test workspace fails with the same error.

Attachment:
screencast-6_1

OS and version:
Che 6.5.0; Fedora 25; Chrome 63.0

@Ohrimenko1988 Ohrimenko1988 added kind/bug Outline of a bug - must adhere to the bug report template. severity/blocker Causes system to crash and be non-recoverable or prevents Che developers from working on Che code. labels Apr 25, 2018
@olexii4 olexii4 added the status/in-progress This issue has been taken by an engineer and is under active development. label Apr 26, 2018
@olexii4
Contributor

olexii4 commented Apr 26, 2018

In fact, no workspace can be started after the first image pull error occurs.
It is reproducible on the OpenShift infrastructure.
@sleshchenko WDYT?

@olexii4 olexii4 removed the status/in-progress This issue has been taken by an engineer and is under active development. label Apr 26, 2018
@olexii4 olexii4 removed their assignment Apr 26, 2018
@sleshchenko
Member

sleshchenko commented Apr 26, 2018

I think it is a server-side issue. It is related to #9426.
There is an issue with container watching when all workspaces' objects are created in one namespace: in that case, it is required to filter pods and propagate only the events that are related to a particular workspace.
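
The filtering described above could look roughly like the following. This is a minimal sketch rather than the Che server code: `PodEvent`, `WorkspaceEventFilter`, and the label key `che.workspace_id` are hypothetical names introduced for illustration, and it assumes every workspace pod carries a label with its workspace ID.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for an event produced by a namespace-wide pod watch. */
final class PodEvent {
  final String podName;
  final Map<String, String> podLabels;
  final String reason;

  PodEvent(String podName, Map<String, String> podLabels, String reason) {
    this.podName = podName;
    this.podLabels = podLabels;
    this.reason = reason;
  }
}

/** Passes an event to the delegate only if its pod is labeled with the expected workspace ID. */
final class WorkspaceEventFilter implements Consumer<PodEvent> {
  // Assumed label key for illustration; the label the infrastructure really uses may differ.
  static final String WORKSPACE_ID_LABEL = "che.workspace_id";

  private final String workspaceId;
  private final Consumer<PodEvent> delegate;

  WorkspaceEventFilter(String workspaceId, Consumer<PodEvent> delegate) {
    this.workspaceId = workspaceId;
    this.delegate = delegate;
  }

  @Override
  public void accept(PodEvent event) {
    // Pods of other workspaces, and pods without the label, are silently ignored.
    if (workspaceId.equals(event.podLabels.get(WORKSPACE_ID_LABEL))) {
      delegate.accept(event);
    }
  }
}
```

With a wrapper like this, each workspace start would register its own filter on the shared watch, so a failed image pull in one workspace would not be reported to the others.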

A similar scenario with a single workspace exposes two issues:

Reproduction Steps
  1. Create a workspace from the default "Java" stack.
  2. Open the "Workspaces" page.
  3. In the workspaces list, select the newly created workspace and click it.
  4. On the "workspace details" page that opens, click the "Machine" button.
  5. Click the "Edit" button.
  6. In the "Recipe" text field, delete several letters from the image name so that it reads "eclipse/ubuntu_jd".
  7. Click the "Save" button.
  8. Click the "Run" button.
  9. Note the error message.
  10. Change the machine's image back to a correct one.
  11. Rerun the workspace.
  12. Note the errors.

Issues
  1. The OpenShift infrastructure doesn't support start interruption, which leaves the DB in an inconsistent state. Because of that, the second start attempt fails with the error message "Runtime is already started", and it is not possible to delete the failed workspace. This will be fixed by "Implement interruption of start for OpenShift workspaces" #5918.
  2. OpenShift sends events that happened earlier (for example, 20 minutes ago). This means that even after the image reference is fixed, the workspace start will fail because of an old event from the history: Che Server considers the workspace as failed to start when it receives old events. The REST API allows specifying a resource version or time (I'm not sure which exactly) for filtering old events, but the Fabric8 client has deprecated the method watch(String resourceVersion, W watcher). We need to investigate how to fix it.
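
One way to guard against stale events on the client side, independent of what the watch API supports, is to remember when the workspace start began and drop every event whose timestamp is older. The snippet below is only a sketch under that assumption: `TimestampedEvent` and `FreshEventPredicate` are hypothetical names, and it is not presented as the fix that was eventually merged.

```java
import java.time.Instant;
import java.util.function.Predicate;

/** Hypothetical, simplified event carrying the time at which it last occurred. */
final class TimestampedEvent {
  final String reason;
  final Instant lastTimestamp;

  TimestampedEvent(String reason, Instant lastTimestamp) {
    this.reason = reason;
    this.lastTimestamp = lastTimestamp;
  }
}

/** Accepts only events that occurred at or after the moment the workspace start began. */
final class FreshEventPredicate implements Predicate<TimestampedEvent> {
  private final Instant startedAt;

  FreshEventPredicate(Instant startedAt) {
    this.startedAt = startedAt;
  }

  @Override
  public boolean test(TimestampedEvent event) {
    // Events strictly older than the start are history replayed by the watch and are rejected.
    return !event.lastTimestamp.isBefore(startedAt);
  }
}
```

Note that comparing a timestamp taken on the Che server with event timestamps assigned by the cluster assumes reasonably synchronized clocks; a small tolerance may be needed in practice.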

@ibuziuk I think it would be better to comment out the unrecoverable events listener until these issues are fixed.

@garagatyi

@ibuziuk @l0rd I think we should not include the unrecoverable events commit in 6.4.1, to avoid a blocker bug on OSIO prod.

@ghost

ghost commented Apr 30, 2018

Will these issues be taken care of in upcoming sprints? The ability to show exactly why a pod failed to start is a very helpful feature for both Che admins and users on OpenShift.

@garagatyi

@eivantsov it is not easy to fix: it requires changes in the k8s workspace infrastructure, an investigation of a fix on the fabric8 client side, and a client upgrade on the Che side.
I would suggest discussing the issue with @skabashnyuk and @l0rd to fit it into our sprints.

@garagatyi

And since it is a blocker, should we fix it right away, even if that means stopping work on other issues?

@ghost

ghost commented May 8, 2018

I agree, we should at least disable the feature @ibuziuk added until we investigate a proper fix.

@l0rd @skabashnyuk can you agree on whose sprint it is going to land in?

garagatyi pushed a commit to garagatyi/che that referenced this issue May 10, 2018
Disable handling of unrecoverable events because they prevent
start of workspaces in certain cases after a workspace failed to
start. See eclipse-che#9542
Signed-off-by: Oleksandr Garagatyi <ogaragat@redhat.com>
@garagatyi

I've disabled unrecoverable events handling until we have a proper fix. Downgrading the priority from blocker to P1.

@garagatyi garagatyi added severity/P1 Has a major impact to usage or development of the system. and removed severity/blocker Causes system to crash and be non-recoverable or prevents Che developers from working on Che code. labels May 10, 2018
@SkorikSergey
Contributor

Still reproducible on Che 6.5.0.

@ibuziuk
Member

ibuziuk commented May 15, 2018

PR is sent - #9703

ibuziuk added a commit to ibuziuk/che that referenced this issue May 16, 2018
…stamp

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>
ibuziuk added a commit to ibuziuk/che that referenced this issue May 17, 2018
Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>
@ibuziuk
Member

ibuziuk commented May 18, 2018

Fix is merged to master

ibuziuk added a commit to ibuziuk/che that referenced this issue May 18, 2018
…tainers' initialization

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>
ibuziuk added a commit to ibuziuk/che that referenced this issue May 22, 2018
…watchContainers` initialization

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>
ibuziuk added a commit that referenced this issue May 22, 2018
…ners` initialization

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>
riuvshin pushed a commit that referenced this issue May 31, 2018
…ners` initialization

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>
@Ohrimenko1988
Contributor Author

This issue can be closed because the main problem has been resolved. However, the bug with deleting the workspace is still present. It is described in the following issue:
#9905

hbhargav pushed a commit to hbhargav/che that referenced this issue Dec 5, 2018
Disable handling of unrecoverable events because they prevent
start of workspaces in certain cases after a workspace failed to
start. See eclipse-che#9542
Signed-off-by: Oleksandr Garagatyi <ogaragat@redhat.com>
hbhargav pushed a commit to hbhargav/che that referenced this issue Dec 5, 2018
…watchContainers` initialization

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>