Make start of OpenShift machines parallel #8836
Conversation
@@ -28,6 +28,7 @@
 * @author Sergii Leshchenko
 */
public abstract class AbstractBootstrapper {

  private final String machineName;
  private final int bootstrappingTimeoutMinutes;
Since it's used only by one bootstrapping method (#bootstrap but not #bootstrapAsync), I would move it to the parameters of the #bootstrap method. That would make it clear that bootstrapping asynchronously is not limited by a timeout out of the box.
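For illustration, the refactoring could look roughly like this (a hypothetical sketch: the constructor and exception handling are assumed, and a plain Exception stands in for the project's InfrastructureException to keep the snippet self-contained):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public abstract class AbstractBootstrapper {

  private final String machineName;

  protected AbstractBootstrapper(String machineName) {
    this.machineName = machineName;
  }

  // The timeout now travels with the only method that enforces it, making it
  // explicit that bootstrapAsync() is not limited by a timeout out of the box.
  public void bootstrap(int bootstrappingTimeoutMinutes) throws Exception {
    try {
      bootstrapAsync().get(bootstrappingTimeoutMinutes, TimeUnit.MINUTES);
    } catch (TimeoutException e) {
      throw new Exception("Bootstrapping of machine '" + machineName + "' timed out", e);
    } catch (ExecutionException e) {
      throw new Exception(e.getCause().getMessage(), e.getCause());
    }
  }

  public abstract CompletableFuture<Void> bootstrapAsync();
}
```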
final CompletableFuture<Void> allDone =
    CompletableFuture.allOf(
        machinesFutures.toArray(new CompletableFuture[machinesFutures.size()]));
CompletableFuture.anyOf(allDone, failure).get(machineStartTimeoutMin, TimeUnit.MINUTES);
As far as I understand, this was previously used as the timeout for waiting for a single machine to become running (the Pod should be running, but installers and servers are not launched yet). Now, within this time, all machines must become running, installers must be launched, and servers must reach the RUNNING status. Please consider renaming the configuration parameter; it now effectively acts as a Kubernetes workspace start timeout.
Looks very interesting, but requires careful review. Good job!
@@ -113,4 +114,11 @@ public void waitRunning(int timeoutMin) throws InfrastructureException {
      timeoutMin,
      p -> (KUBERNETES_POD_STATUS_RUNNING.equals(p.getStatus().getPhase())));
}

public CompletableFuture<Void> watchReadinessAsync() {
Do we need to ensure that we won't have connection leaks because of that? @akorneta @sleshchenko WDYT?
I'll write a doc comment for this method describing how it should be used, to prevent leaks.
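For example, the usage pattern such a doc could prescribe, mirroring the toCancelFutures list quoted below (a sketch with placeholder names, not the PR's actual code):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class CancelAbandonedFuturesSketch {

  static void awaitAll(List<CompletableFuture<Void>> toCancelFutures) {
    try {
      CompletableFuture.allOf(toCancelFutures.toArray(new CompletableFuture[0])).join();
    } catch (RuntimeException e) {
      // Cancel whatever has not reached a final state so the underlying
      // watches are released; cancel() is a no-op for completed futures.
      for (CompletableFuture<Void> future : toCancelFutures) {
        future.cancel(true);
      }
      throw e;
    }
  }
}
```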
toCancelFutures.add(machineRunningFuture);
final CompletableFuture<Void> machineBootChain =
    machineRunningFuture
        .thenComposeAsync(checkFailure(failure), executor)
The executor used here creates non-daemon threads, but in this case it seems to me that daemon threads would be a better fit. WDYT?
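For instance (a sketch assuming Guava's ThreadFactoryBuilder is available; the pool size and thread-name format are placeholders):

```java
import com.google.common.util.concurrent.ThreadFactoryBuilder;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class DaemonExecutorSketch {
  // Daemon threads won't keep the JVM alive if a boot chain is still blocked.
  static final ExecutorService EXECUTOR =
      Executors.newFixedThreadPool(
          10, // placeholder pool size
          new ThreadFactoryBuilder()
              .setNameFormat("MachineBoot-%d") // placeholder name format
              .setDaemon(true)
              .build());
}
```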
Makes sense, I'll change that.
try {
  final PodResource<Pod, DoneablePod> podResource =
      clientFactory.create().pods().inNamespace(namespace).withName(name);
  final Watch watch =
Do we need to ensure that we won't have connection leaks because of that? @akorneta @sleshchenko WDYT?
I think there should not be any problems with WebSocket connection leaks, because the method's documentation indicates that the future must be explicitly cancelled if it is no longer needed and has not reached a final state.
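One way to back that contract up structurally (a sketch, not the PR's exact code) is to tie the watch lifecycle to the returned future, so that completion, failure, and explicit cancellation all close the connection:

```java
import java.util.concurrent.CompletableFuture;

final class WatchLifecycleSketch {

  // Minimal stand-in for the Kubernetes client's Watch resource.
  interface Watch extends AutoCloseable {
    @Override
    void close();
  }

  static CompletableFuture<Void> watchReadinessAsync(Watch watch) {
    CompletableFuture<Void> readiness = new CompletableFuture<>();
    // whenComplete fires on normal completion, exceptional completion, and
    // cancellation alike, so a cancelled future still closes the watch.
    readiness.whenComplete((ok, err) -> watch.close());
    return readiness;
  }
}
```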
@@ -33,8 +32,7 @@ public MachineTokenProviderImpl(MachineTokenRegistry tokenRegistry) {
}

@Override
public String getToken(String workspaceId) {
I would rather leave this method here and call getToken() with two params within it, instead of adding EnvironmentContext.getCurrent()... in 15 other places.
Yes, it makes sense. The method was removed to force developers to explicitly call EnvironmentContext.getCurrent() in cases where they are sure the user won't be anonymous, but I think it might be reworked in a better way, so I'll consider bringing this method back and adding a check (that the user is not anonymous) before token retrieval.
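A sketch of what the restored method might look like (the two-parameter order, the anonymity check, and the exception type are all assumptions, not the final code):

```java
import org.eclipse.che.commons.env.EnvironmentContext;
import org.eclipse.che.commons.subject.Subject;

abstract class MachineTokenProviderSketch {

  // Existing two-parameter variant (assumed parameter order).
  abstract String getToken(String userId, String workspaceId);

  // Restored convenience overload: resolves the user once here instead of
  // making 15 call sites fetch the current subject themselves.
  String getToken(String workspaceId) {
    Subject subject = EnvironmentContext.getCurrent().getSubject();
    if (subject == Subject.ANONYMOUS) { // assumed anonymity check
      throw new IllegalStateException(
          "Machine token cannot be retrieved for an anonymous user");
    }
    return getToken(subject.getUserId(), workspaceId);
  }
}
```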
Good job! 👍
It's a mega cool feature, especially for agents that are launched as parallel machines.
Cool PR! Looks good.
What does this PR do?
Parallelizes the start of OpenShift machines.
Schematically, the machine boot process is described by the steps s1–s7 below.
The failure context is a piece of state shared across all machine threads in the scope of one workspace; it is used as an indicator of an error that might appear during the start. This context holds only the first exception that was thrown (and it is visible to all the threads).
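In CompletableFuture terms, "holds only the first exception" can be modelled with a single shared future, since completeExceptionally() succeeds only for the first caller (a minimal illustration, not the PR's exact code):

```java
import java.util.concurrent.CompletableFuture;

final class FailureContextDemo {
  public static void main(String[] args) {
    CompletableFuture<Void> failure = new CompletableFuture<>();
    failure.completeExceptionally(new RuntimeException("machine A failed")); // recorded
    failure.completeExceptionally(new RuntimeException("machine B failed")); // ignored
    // Every machine thread observing the context sees the first exception.
    failure.whenComplete((ok, err) -> System.out.println(err.getMessage())); // machine A failed
  }
}
```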
s1. Creates an OpenShift pod watcher that listens to pod events; if the state satisfies the predicate, this step is considered finished. If the connection is closed or the listening is cancelled, this step is considered failed;
s2, s4, s6. These steps break the chain of invocation if there is an exception in the failure context (so if one machine fails, we prevent the launching of the other machines);
s3. Sets the machine's running status and propagates the machine-running event through the event service;
s5. Asynchronously starts bootstrapping of installers and listens to bootstrapper status events; when an event with status done is received, this step is considered finished; if the received event has status failed, or listening is cancelled, this step is considered failed;
s7. Asynchronously starts checking the readiness of the machine's servers; when the servers are available, this step is considered finished; when this step is cancelled, it is considered failed.
A sketch of how these steps compose into one machine's boot chain is shown below.
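In this sketch, watchRunningAsync, bootstrapAsync, and checkServersReadiness are hypothetical placeholders for the real infrastructure calls quoted in the diffs above; only the composition follows the description:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class MachineBootChainSketch {

  static CompletableFuture<Void> bootMachine(CompletableFuture<Void> failure, Executor executor) {
    return watchRunningAsync()                                // s1: pod becomes RUNNING
        .thenComposeAsync(checkFailure(failure), executor)    // s2: stop if another machine failed
        .thenRun(MachineBootChainSketch::publishRunningEvent) // s3: status + event
        .thenComposeAsync(checkFailure(failure), executor)    // s4
        .thenCompose(v -> bootstrapAsync())                   // s5: installers bootstrap
        .thenComposeAsync(checkFailure(failure), executor)    // s6
        .thenCompose(v -> checkServersReadiness());           // s7: servers readiness
  }

  // Breaks the chain once the shared failure context holds an exception.
  static Function<Void, CompletableFuture<Void>> checkFailure(CompletableFuture<Void> failure) {
    return ignored ->
        failure.isCompletedExceptionally()
            ? failure // already failed: propagate the recorded exception
            : CompletableFuture.completedFuture(null);
  }

  // Placeholder steps; the real ones watch pods, run installers, ping servers.
  static CompletableFuture<Void> watchRunningAsync() {
    return CompletableFuture.completedFuture(null);
  }

  static CompletableFuture<Void> bootstrapAsync() {
    return CompletableFuture.completedFuture(null);
  }

  static CompletableFuture<Void> checkServersReadiness() {
    return CompletableFuture.completedFuture(null);
  }

  static void publishRunningEvent() {}

  public static void main(String[] args) throws Exception {
    CompletableFuture<Void> failure = new CompletableFuture<>();
    List<CompletableFuture<Void>> machines = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      machines.add(bootMachine(failure, Runnable::run));
    }
    CompletableFuture<Void> allDone =
        CompletableFuture.allOf(machines.toArray(new CompletableFuture[0]));
    // Fail-fast overall wait, mirroring the anyOf(allDone, failure) fragment above.
    CompletableFuture.anyOf(allDone, failure).get(8, TimeUnit.MINUTES); // placeholder timeout
  }
}
```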
What issues does this PR fix or reference?
fixes #7067
Release Notes
n/a