-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added an ability to interrupt the start of a Kubernetes/OpenShift runtime #9816
Added an ability to interrupt the start of a Kubernetes/OpenShift runtime #9816
Conversation
ac53757
to
0ba805f
Compare
a7f0170
to
c00881b
Compare
ci-test |
afbca11
to
ebfda55
Compare
ci-test build report: |
ccb8c11
to
0d30c1f
Compare
@@ -139,22 +140,30 @@ public KubernetesInternalRuntime( | |||
this.executor = sharedPool.getExecutor(); | |||
this.runtimeStates = runtimeStates; | |||
this.machines = machines; | |||
this.startSynchronizer = new StartSynchronizer(context.getIdentity(), eventService); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: looks like event service is just a dependency of StartSynchronizer and KubernetesInternalRuntime doesn't have to have this dependency too. So it might be cleaner to create some factory for this class and hide its dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, and while there is only one injected field creating one more class looks a bit overkill. I would prefer to leave it as it is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
I have concerns regarding the complexity of KubernetesinternalRuntime and WorkspaceRuntimes classes but it is not concern about this PR but general concern about their design. I encourage you and will try to do that eventually too, to simplify them in future PRs.
I also have some question/comments about this PR which I inlined. Please, elaborate.
import org.eclipse.che.api.core.model.workspace.WorkspaceStatus; | ||
|
||
/** | ||
* Listener interface for being notified about workspace status changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you specify in javadocs why it might be used instead of subscribing to events from EventsService?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added note about clustering.
} catch (ExecutionException e) { | ||
throw new InfrastructureException( | ||
"Error occured while executing command in pod: " + e.getMessage(), e); | ||
} catch (TimeoutException ignored) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception is not ignored, so can you fix the variable name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not ignored by not used =) Renamed to e
)
* | ||
* @throws InternalInfrastructureException when {@link #startThread} is already set. | ||
*/ | ||
public synchronized void setStartThread() throws InternalInfrastructureException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can include this operation into start method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I afraid I can't. Because start
method is invoked in two places: #internalStart
and #markStopping
. While #setStartThread
is invoked only by #internalStart
method
|
||
/** Registers a runtime start. */ | ||
public synchronized void start() { | ||
if (!isStarting) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate what will happen if 2 servers will try to simultaneously start a runtime? Is there a protection from that on this or some another level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WorkspaceRuntimes
class should synchronize invocation of InternalRuntime#start
.
So 2 server must not simultaneously start a runtime.
And it's possible that two threads will invoke StartSynchronizer#start
method on start interruption because start
method is invoked in two places: #internalStart and #markStopping.
Then the first call will set state to STARTING with performing the corresponding actions and the second call will do nothing.
* @throws RuntimeStartInterruptedException when start was interrupted | ||
* @throws InfrastructureException when any other exception occurs during start | ||
*/ | ||
public synchronized void complete() throws InfrastructureException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that methods are synchronized but fields might not be synced between threads caches. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand using synchronized
method ensures synchronization shared data between threads, am I wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But those fields are not shared between threads. They can be cached by each thread separately.
@@ -106,6 +105,7 @@ | |||
private final Executor executor; | |||
private final KubernetesRuntimeStateCache runtimeStates; | |||
private final KubernetesMachineCache machines; | |||
private final StartSynchronizer startSynchronizer; | |||
|
|||
@Inject | |||
public KubernetesInternalRuntime( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems OK, but, to be honest, I no longer can validate the logic of this class as it gets too complicated. I encourage you, and myself, to refactor this class as soon as possible to simplify it drastically.
* @author Sergii Leshchenko | ||
*/ | ||
@Listeners(MockitoTestNGListener.class) | ||
public class StartSynchronizerTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand these tests don't check multi-threaded use cases that are covered by the StartSynchronizer design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
5956e1f
to
a1bbeb4
Compare
ci-test |
f0a800b
to
7887c24
Compare
Signed-off-by: Sergii Leshchenko <sleshche@redhat.com>
It is needed to avoid '403 Pod doesn't exists' errors. It happens when start is interrupted when any of machines is on bootstrapping phase. As result connection leak happens TODO Create an issue for fabric8-client
…blishing a WebSocket connection
7887c24
to
0b30ca1
Compare
ci-test build report: |
@eclipse/eclipse-che-qa Please check test report. |
In the functionality which is covered by selenium tests - new regression was not found |
What does this PR do?
Adds an ability to interrupt the start of a Kubernetes/OpenShift runtime.
It is done by adding Utility class
StartSynchronizer
that is designed to synchronize start/stop operations. Synchronization is performed by registering all start/stop operations (like start is launched, an error occurs during start, start is interrupted, etc.)So,
KubernetesInternalRuntime
registers when the runtime is starting and then checks failure on each phase of start and interrupts start is a failure is registered.In turn,
KubernetesInternalRuntime
checks that start is in progress on runtime stopping, if yes - then start failure will be registered. So, Thread which is performing start will receive the registered exception and interrupt start.It also works in cluster mode, now there can be two Che Servers during Rolling Update. Synchronization between Che Servers is done by using JGroups Replicated Cache listeners. Each Che Server receives the corresponding event when a runtime stop is launched/completed.
There are 3 more changes:
Rework
UnrecoverableEventHandler
to fail workspace start instead interruption (earlier ) since there is such ability.Improve executing of commands in
KubernetesPods
class to throw if occurs while a WebSocket connection establishing.KubernetesBootstrapper now checks start interruption before execution each of commands. Without that there is much more probability that commands fails with the following error:
And it leads to connection was leaked warning
What issues does this PR fix or reference?
#5918
It also fixes #9905
Release Notes
Added an ability to interrupt the start of a Kubernetes/OpenShift runtime.
Docs PR