Watch connection manager never closed when trying to delete a non-existing POD #9932

Merged
merged 6 commits into from Jun 5, 2018
@@ -554,22 +554,36 @@ public void delete() throws InfrastructureException {
   }
 
   private CompletableFuture<Void> doDelete(String name) throws InfrastructureException {
+    Watch toCloseOnException = null;
Member

I'd suggest rewriting this method as follows:

@VisibleForTesting
  CompletableFuture<Void> doDelete(String name) {
    final CompletableFuture<Void> deleteFuture = new CompletableFuture<>();
    try {
      final PodResource<Pod, DoneablePod> podResource =
          clientFactory.create(workspaceId).pods().inNamespace(namespace).withName(name);

      Watch watch = podResource.watch(new DeleteWatcher(deleteFuture));
      //watch is opened - register callback to close it
      deleteFuture.whenComplete((v, e) -> watch.close());

      Boolean deleteSucceeded = podResource.delete();
      if (deleteSucceeded == null || !deleteSucceeded) {
        deleteFuture.complete(null);
      }
    } catch (KubernetesClientException ex) {
      deleteFuture.completeExceptionally(new KubernetesInfrastructureException(ex));
    } catch (Exception e) {
      deleteFuture.completeExceptionally(new InternalInfrastructureException(e.getMessage(), e));
    }

    return deleteFuture;
  }

Why this can be better: if an exception occurs while removing many pods, removal of the remaining pods will still be attempted. With the current approach, once an exception occurs no more pods are removed. I think it would be better to clean up as many resources as we can.
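
The difference can be sketched without any Kubernetes dependency. In this minimal example, `deleteOne` is a hypothetical stand-in for `doDelete`: it never throws and reports failure only through the returned future, so a caller looping over many names still attempts every deletion. The class and method names, and the rule that names starting with `bad` fail, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class BestEffortDelete {
  /** Names for which a deletion was attempted, in call order. */
  static final List<String> attempted = new ArrayList<>();

  /** Hypothetical stand-in for doDelete: failures go into the future, nothing is thrown. */
  static CompletableFuture<Void> deleteOne(String name) {
    CompletableFuture<Void> future = new CompletableFuture<>();
    attempted.add(name);
    try {
      if (name.startsWith("bad")) {
        // Simulated failure of a single deletion.
        throw new IllegalStateException("cannot delete " + name);
      }
      future.complete(null);
    } catch (Exception e) {
      future.completeExceptionally(e);
    }
    return future;
  }

  /** Attempts every deletion; the combined future fails if any single deletion failed. */
  static CompletableFuture<Void> deleteAll(List<String> names) {
    List<CompletableFuture<Void>> futures = new ArrayList<>();
    for (String name : names) {
      futures.add(deleteOne(name));
    }
    return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
  }
}
```

If `deleteOne` threw instead, the loop in `deleteAll` would stop at the first failing name and the later names would never be attempted.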

Contributor Author

Do you mind if we make this change in another PR?
I'd prefer to keep the behavior as close as possible to what it was previously, and only fix the bug of the non-closed watch.

Member

@davidfestal Makes sense. I'm OK with that

     try {
       final PodResource<Pod, DoneablePod> podResource =
           clientFactory.create(workspaceId).pods().inNamespace(namespace).withName(name);
-      final CompletableFuture<Void> deleteFuture = new CompletableFuture<>();
+      CompletableFuture<Void> deleteFuture = new CompletableFuture<>();
       final Watch watch = podResource.watch(new DeleteWatcher(deleteFuture));
-
-      podResource.delete();
-      return deleteFuture.whenComplete(
-          (v, e) -> {
-            if (e != null) {
-              LOG.warn("Failed to remove pod {} cause {}", name, e.getMessage());
-            }
-            watch.close();
-          });
+      toCloseOnException = watch;
+      deleteFuture =
+          deleteFuture.whenComplete(
+              (v, e) -> {
+                if (e != null) {
+                  LOG.warn("Failed to remove pod {} cause {}", name, e.getMessage());
+                }
+                watch.close();
+              });
+      Boolean deleteSucceeded = podResource.delete();
+      if (deleteSucceeded == null || !deleteSucceeded) {
+        watch.close();
Member

I think that when a pod doesn't exist, the delete future will be completed exceptionally with the message "Webscoket connection is closed. But event about removing is not received." I would say that this is not an exceptional situation, so maybe deleteFuture.complete(null) would be better than closing the Watch.

Contributor Author

@sleshchenko It appears (confirmed through debugging) that this is clearly not the case: when the POD doesn't exist, the DeleteWatcher isn't called at all by the WatchConnectionManager, and as a result the WatchConnectionManager is never closed.

This is precisely the bug I'm fixing here.

This is why I have to explicitly close the WatchConnectionManager when the delete() call returns false because no such POD exists.
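
The situation can be modelled with a small, dependency-free sketch. `FakeWatch` and `FakePodResource` are hypothetical stand-ins, not fabric8 types; they only encode the behaviour reported above, namely that `delete()` returns false for a missing pod and the watcher is never notified. What the real client does to the future when a watch is closed is not modelled here.

```java
import java.util.concurrent.CompletableFuture;

class WatchLeakSketch {
  /** Hypothetical stand-in for a watch; it only records whether close() was called. */
  static final class FakeWatch implements AutoCloseable {
    boolean closed;

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Hypothetical stand-in for a pod resource whose pod may or may not exist. */
  static final class FakePodResource {
    final boolean podExists;
    final FakeWatch openedWatch = new FakeWatch();
    private CompletableFuture<Void> watcherFuture;

    FakePodResource(boolean podExists) {
      this.podExists = podExists;
    }

    FakeWatch watch(CompletableFuture<Void> future) {
      watcherFuture = future;
      return openedWatch;
    }

    /** Modelled behaviour: the watcher is notified only if a pod was actually deleted. */
    boolean delete() {
      if (podExists) {
        watcherFuture.complete(null);
      }
      return podExists;
    }
  }

  /** Same shape as the patched method: optionally close the watch when nothing was deleted. */
  static CompletableFuture<Void> doDelete(FakePodResource resource, boolean closeWhenNotDeleted) {
    CompletableFuture<Void> deleteFuture = new CompletableFuture<>();
    FakeWatch watch = resource.watch(deleteFuture);
    CompletableFuture<Void> result = deleteFuture.whenComplete((v, e) -> watch.close());
    if (!resource.delete() && closeWhenNotDeleted) {
      watch.close();
    }
    return result;
  }
}
```

With `closeWhenNotDeleted` set to false and a missing pod, nothing ever completes the future in this model, so the callback never runs and the watch stays open.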

Member

I understand the bug being fixed, and it's a nice catch =)
As far as I understand, if you complete the future, the whenComplete callback will be called and the watch will be closed. Completing the future without an exception would be more correct here.
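
This relies on standard `CompletableFuture` behaviour, which can be checked in isolation. In the sketch below an `AtomicBoolean` stands in for the watch, and the class and method names are made up for illustration; the callback registered with `whenComplete` runs on normal completion as well as on exceptional completion.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

class CompleteToCloseSketch {
  /** Returns true if the close callback ran after the future was completed normally. */
  static boolean closesOnNormalCompletion() {
    AtomicBoolean closed = new AtomicBoolean(false);
    CompletableFuture<Void> deleteFuture = new CompletableFuture<>();
    // Stand-in for: deleteFuture.whenComplete((v, e) -> watch.close());
    deleteFuture.whenComplete((v, e) -> closed.set(true));
    deleteFuture.complete(null);
    // The future finished without an exception and the callback has already run.
    return closed.get() && !deleteFuture.isCompletedExceptionally();
  }

  /** Returns true if the close callback ran after the future was completed exceptionally. */
  static boolean closesOnExceptionalCompletion() {
    AtomicBoolean closed = new AtomicBoolean(false);
    CompletableFuture<Void> deleteFuture = new CompletableFuture<>();
    deleteFuture.whenComplete((v, e) -> closed.set(true));
    deleteFuture.completeExceptionally(new IllegalStateException("simulated failure"));
    return closed.get() && deleteFuture.isCompletedExceptionally();
  }
}
```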

Contributor Author

Well, we would still have to close the watch in case an exception occurs in the podResource.delete() call and no DeleteWatcher method is called. So why not take the same action in both cases where deletion didn't occur?

+      }
+      return deleteFuture;
     } catch (KubernetesClientException ex) {
+      if (toCloseOnException != null) {
+        toCloseOnException.close();
+      }
       throw new KubernetesInfrastructureException(ex);
+    } catch (Exception e) {
+      if (toCloseOnException != null) {
+        toCloseOnException.close();
+      }
+      throw e;
Member

Maybe it would be better to wrap this exception in an InternalInfrastructureException.

Contributor Author

Well, honestly I prefer re-throwing exactly the same exception, so that there is no behavior change compared to the previous implementation, which only caught KubernetesClientException.

Member

I agree that it's the same as before. But only InfrastructureException is declared in the throws clause, and it would be safer to wrap other exceptions in InternalInfrastructureException.
InternalInfrastructureException is a special kind of exception that should wrap unexpected exceptions, such as a NullPointerException, that happen because of a developer's mistake.
I would recommend wrapping, but you can leave it as it is since it doesn't change the current approach.
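
A minimal sketch of the wrapping being recommended. The two exception classes here are simplified local stand-ins that only reuse the names from this thread; they are not the real Che classes, and `run` is a hypothetical helper.

```java
class WrapUnexpectedSketch {
  /** Simplified stand-in for the checked infrastructure exception. */
  static class InfrastructureException extends Exception {
    InfrastructureException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Simplified stand-in for the wrapper reserved for unexpected failures. */
  static class InternalInfrastructureException extends InfrastructureException {
    InternalInfrastructureException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Runs the action and converts any unchecked exception into the declared checked type. */
  static void run(Runnable action) throws InfrastructureException {
    try {
      action.run();
    } catch (RuntimeException e) {
      // Callers only need to handle InfrastructureException; the original is kept as the cause.
      throw new InternalInfrastructureException(e.getMessage(), e);
    }
  }
}
```

With this shape, a caller that catches only the declared checked type also sees unexpected failures, and the original exception remains available through `getCause()`.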

     }
   }
