[cherry-pick] prevent dynamic service deletion during upgrade#16153
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates KubeDiscoveryService to skip service registration if the current pod is terminating, and to trigger service updates if the owner references of a service have changed. The review feedback identifies a potential NullPointerException when handling nullable expected owner references, and suggests making the pod termination check more robust by handling non-404 ApiExceptions (such as RBAC or transient errors) and guarding against a null podName.
| private boolean hasOwnerReferencesChanged(V1Service currentService, | ||
| @Nullable List<V1OwnerReference> expectedOwners) { | ||
| List<V1OwnerReference> currentOwners = Optional.ofNullable(currentService.getMetadata()) | ||
| .map(V1ObjectMeta::getOwnerReferences).orElse(Collections.emptyList()); | ||
|
|
||
| return !new HashSet<>(currentOwners).equals(new HashSet<>(expectedOwners)); | ||
| } |
There was a problem hiding this comment.
The expectedOwners parameter is annotated with @Nullable, but passing a null value to new HashSet<>(expectedOwners) will throw a NullPointerException. We should safely handle the case where expectedOwners is null by defaulting to an empty list.
| private boolean hasOwnerReferencesChanged(V1Service currentService, | |
| @Nullable List<V1OwnerReference> expectedOwners) { | |
| List<V1OwnerReference> currentOwners = Optional.ofNullable(currentService.getMetadata()) | |
| .map(V1ObjectMeta::getOwnerReferences).orElse(Collections.emptyList()); | |
| return !new HashSet<>(currentOwners).equals(new HashSet<>(expectedOwners)); | |
| } | |
| private boolean hasOwnerReferencesChanged(V1Service currentService, | |
| @Nullable List<V1OwnerReference> expectedOwners) { | |
| List<V1OwnerReference> currentOwners = Optional.ofNullable(currentService.getMetadata()) | |
| .map(V1ObjectMeta::getOwnerReferences).orElse(Collections.emptyList()); | |
| List<V1OwnerReference> expected = expectedOwners == null ? Collections.emptyList() : expectedOwners; | |
| return !new HashSet<>(currentOwners).equals(new HashSet<>(expected)); | |
| } |
| private boolean isPodTerminating(CoreV1Api api) throws ApiException { | ||
| try { | ||
| V1Pod pod = api.readNamespacedPod(podName, namespace, null); | ||
| return Optional.ofNullable(pod.getMetadata()).map(V1ObjectMeta::getDeletionTimestamp) | ||
| .isPresent(); | ||
|
|
||
| } catch (ApiException e) { | ||
| // If it is a 404 ApiException, the pod is already gone. Treat as terminating. | ||
| if (e.getCode() == HttpURLConnection.HTTP_NOT_FOUND) { | ||
| return true; | ||
| } | ||
| throw e; | ||
| } | ||
| } |
There was a problem hiding this comment.
If the Kubernetes service account lacks permissions to read pods (e.g., due to RBAC restrictions returning 403 Forbidden) or if there is a transient API error, readNamespacedPod will throw an ApiException. Currently, this exception is rethrown, which will cause the entire service registration to fail. To make this more robust, we should catch ApiException, log a warning, and return false (assuming the pod is not terminating) so that service registration can still proceed. Additionally, we should guard against podName being null.
private boolean isPodTerminating(CoreV1Api api) {
if (podName == null) {
return false;
}
try {
V1Pod pod = api.readNamespacedPod(podName, namespace, null);
return Optional.ofNullable(pod.getMetadata()).map(V1ObjectMeta::getDeletionTimestamp)
.isPresent();
} catch (ApiException e) {
// If it is a 404 ApiException, the pod is already gone. Treat as terminating.
if (e.getCode() == HttpURLConnection.HTTP_NOT_FOUND) {
return true;
}
LOG.warn("Failed to read pod status for {}. Assuming pod is not terminating.", podName, e);
return false;
}
}d25bff8 to
312e3cd
Compare
|


Cherry Pick of #16151