
Upgrade the fabric client to support newer versions of k8s #13804

Closed
wants to merge 16 commits

Conversation

churromorales
Contributor

This PR contains 2 items.

  1. Upgrade the k8s client to support newer versions of k8s.
  2. Ability to override specific metrics monitors.

We found that when we used the TaskCountStatsMonitor in the Overlord config, peon tasks would not start, because peons inherit the monitors of their parent process. That parent used to be the MiddleManager, which would never have had that monitor configured. So now you can override this value with the following config:

druid.indexer.runner.peonMonitors
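
For illustration, a rough Overlord runtime.properties sketch of how this is meant to be used (the monitor class names below are standard Druid monitors chosen only as examples; they are not part of this PR):

# The Overlord emits task-count metrics (illustrative):
druid.monitoring.monitors=["org.apache.druid.server.metrics.TaskCountStatsMonitor"]

# Peons launched by the k8s task runner should not inherit that monitor,
# so override the monitor list handed to them (illustrative):
druid.indexer.runner.peonMonitors=["org.apache.druid.java.util.metrics.JvmMonitor"]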

@@ -66,6 +66,7 @@ Additional Configuration
|`druid.indexer.runner.javaOptsArray`|`JsonArray`|java opts for the task.|`-Xmx1g`|No|
|`druid.indexer.runner.labels`|`JsonObject`|Additional labels you want to add to peon pod|`{}`|No|
|`druid.indexer.runner.annotations`|`JsonObject`|Additional annotations you want to add to peon pod|`{}`|No|
|`druid.indexer.runner.peonMonitors`|`JsonArray`|Overrides `druid.monitoring.monitors`. Use this property if you have monitors set up on the Overlord and do not want the peons to inherit them.|`[]`|No|
Contributor

Nit: It would be useful to mention this property in the extension-specific docs as well.

import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

class LogWatchInputStreamTest

Check notice (Code scanning / CodeQL): Unused classes and interfaces

Unused class: LogWatchInputStreamTest is not referenced within this codebase. If not used as an external API it should be removed.
@@ -106,13 +104,15 @@ public JobResponse waitForJobCompletion(K8sTaskId taskId, long howLong, TimeUnit
.inNamespace(namespace)
.withName(taskId.getK8sTaskId())
.waitUntilCondition(
x -> x != null && x.getStatus() != null && x.getStatus().getActive() == null,
x -> x != null && x.getStatus() != null && x.getStatus().getActive() == null
&& (x.getStatus().getFailed() != null || x.getStatus().getSucceeded() !=null),
Contributor

Why is this change needed?

Contributor Author

This was a tricky one. Here is some background.

The Overlord launches a task, waits for it to complete, and then returns the status. In k8s < 1.25, that just meant: once the pod is not active, go grab the status.
Now the contract has changed a bit, because 1.25 introduced finalizer-based job tracking:
kubernetes/kubernetes#110948

We noticed that when we ran tasks on a Druid cluster running k8s 1.25, they would complete fine but were marked as failures.
We printed the job status and noticed that the job was not active, yet neither success nor failure was set; instead there was this field:

uncountedTerminatedPods=UncountedTerminatedPods(failed=[], succeeded=[e916cbf9-467a-45f3-86a7-3767145d6384], additionalProperties={})

which, according to the docs:

UncountedTerminatedPods holds the UIDs of Pods that have terminated but the job controller hasn't yet accounted for in the status counters. The job controller creates pods with a finalizer. When a pod terminates (succeeded or failed), the controller does three steps to account for it in the job status: (1) Add the pod UID to the arrays in this field. (2) Remove the pod finalizer. (3) Remove the pod UID from the arrays while increasing the corresponding counter. This field is beta-level. The job controller only makes use of this field when the feature gate JobTrackingWithFinalizers is enabled (enabled by default). Old jobs might not be tracked using this field, in which case the field remains null.

So now the job goes from a state where it is not active, to having uncountedTerminatedPods, to finally having a status with success or failure. I will push up a one-line fix to make this work, but for those of you running on k8s 1.25, I'm sure you will be affected as well.

Basically, add another check to wait on.
Right now we wait for this:

// block until
job.getStatus() != null && job.getStatus().getActive() == null
// then return 
return job.getStatus().getSucceeded() != null

So the change has to become:

// block until
job.getStatus() != null && job.getStatus().getActive() == null && (job.getStatus().getFailed() != null || job.getStatus().getSucceeded() != null)
// then return 
return job.getStatus().getSucceeded() != null

This should keep things backward compatible and working on all versions of k8s.
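
Put together, here is a rough, self-contained sketch of what the wait ends up looking like with the fabric8 client (the helper name waitForJobDone and its signature are illustrative, not the exact code in this PR):

import io.fabric8.kubernetes.api.model.batch.v1.Job;
import io.fabric8.kubernetes.client.KubernetesClient;
import java.util.concurrent.TimeUnit;

class JobCompletionSketch
{
  // Illustrative helper: block until the job reaches a terminal state, then report success.
  static boolean waitForJobDone(KubernetesClient client, String namespace, String jobName, long howLong, TimeUnit unit)
  {
    Job job = client.batch().v1().jobs()
        .inNamespace(namespace)
        .withName(jobName)
        .waitUntilCondition(
            // On k8s 1.25+ a job can briefly be "not active" while only
            // uncountedTerminatedPods is populated, so also wait for a
            // terminal counter (failed or succeeded) before returning.
            x -> x != null && x.getStatus() != null && x.getStatus().getActive() == null
                && (x.getStatus().getFailed() != null || x.getStatus().getSucceeded() != null),
            howLong,
            unit
        );
    return job != null && job.getStatus() != null && job.getStatus().getSucceeded() != null;
  }
}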

Contributor

So for older k8s versions, we can expect that when job.getActive() is null, one of the success or failure fields must be set. Question: do we need to check the getActive() result at all? Can we just block until the job has either succeeded or failed?

Contributor Author

That could work. Want me to change it? The approach above definitely works; I don't have access to many different k8s environments, and someone in the community helped me test it on 1.25. I could definitely make this change, but I unfortunately have no way of testing it out myself.

Contributor

Please make this change; we can test it out on a v1.25 cluster.

Contributor Author

@nlippis I believe you can test by taking this commit, making that small change, and applying it onto the 25 branch. If it works for you, I'll resubmit this with the change.

@churromorales
Contributor Author

@abhishekagarwal87 what do you think? This should bring the client up to date as well as fix things for those Druid users running on k8s 1.25+ who were having issues.


{

@Test
void makingCodeCoverageHappy()
Contributor

Hmm. This is a bit of a bummer. Is there no way to avoid this?

Contributor Author

So I had tests for a class; then I upgraded a client library and had to change some things, but the functions were all tested before. I would expect that if the tests pass I'd be good, since the code was already tested. But it turns out that if you have to make any changes to make the new library APIs work, code coverage complains. It is unfortunate here, because I didn't change any behavior.

Contributor

Aren't there any tests covering that path?

Contributor Author

I added a getter to a class, and the code coverage tool complained that I added new code without a test. So I added a test for the getter. That was new code, so I guess it's okay to add a test if that's what Druid wants, although I feel tests like this are not really that useful and make the whole suite slower in general.

@@ -54,6 +54,7 @@
import static org.junit.jupiter.api.Assertions.assertEquals;

// must have a kind / minikube cluster installed and the image pushed to your repository
@Disabled
Contributor

We are using K3S now. Do you still see the same issues?

Contributor Author

K3s? This test was for anyone who had an existing k8s cluster: they could just run it and point it there. The k8s integration tests are definitely not flexible enough for me to test MM-less without rewriting a lot of them.

Contributor

How was this running before?

Contributor Author

I had a kind cluster running locally; this test was in the original patch. I left it in the codebase for anyone who wants to test against a local k8s cluster. If you want me to remove it, I'll be happy to get rid of it. I find it useful when I make changes, since I can't integration-test this with the way the current druid-it tests are set up.

Contributor

@ektravel ektravel left a comment

Left one suggestion for the doc.

churromorales and others added 2 commits February 28, 2023 12:45
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
@abhishekagarwal87
Contributor

@churromorales - can you address the build failures?

Contributor

@techdocsmith techdocsmith left a comment

Small stylistic change for doc.

docs/development/extensions-contrib/k8s-jobs.md (outdated)
@churromorales
Contributor Author

@churromorales - can you address the build failures?
I addressed them in a comment above, but maybe it was missed.

So I had tests for the DruidKubernetesPeonClient class; then I upgraded a client library and had to change some things, but the functions were all tested before. I would expect that if the tests pass I'd be good, since the code was already tested. But it turns out that if you have to make any changes to get the new library APIs to work, the code coverage check complains. It is unfortunate here, because I didn't change any behavior. I can add more tests, but they won't be meaningful here; I believe the build problems are all because I updated the library and had to make some code changes to make it work. There are no logic changes here, except for the issue in 1.25, which I did test.

Contributor

@jwitko jwitko left a comment

Very much needed changes. Thank you for making them.

@churromorales
Contributor Author

@abhishekagarwal87 are you good with this now? I explained the test coverage situation in an earlier comment.

@mindreader

Do you also need to make this change?

--- a/extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
+++ b/extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
@@ -180,7 +180,7 @@ public class KubernetesTaskRunner implements TaskLogStreamer, TaskRunner
                     completedPhase = monitorJob(peonPod, k8sTaskId);
                   } else {
                     Job job = existingJob.get();
-                    if (job.getStatus().getActive() == null) {
+                    if (job.getStatus() != null && job.getStatus().getActive() == null && (job.getStatus().getFailed() != null || job.getStatus().getSucceeded() !=null)) {
                       if (job.getStatus().getSucceeded() != null) {
                         completedPhase = new JobResponse(job, PeonPhase.SUCCEEDED);
                       } else {

@churromorales
Contributor Author

Do you also need to make this change?

--- a/extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
+++ b/extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
@@ -180,7 +180,7 @@ public class KubernetesTaskRunner implements TaskLogStreamer, TaskRunner
                     completedPhase = monitorJob(peonPod, k8sTaskId);
                   } else {
                     Job job = existingJob.get();
-                    if (job.getStatus().getActive() == null) {
+                    if (job.getStatus() != null && job.getStatus().getActive() == null && (job.getStatus().getFailed() != null || job.getStatus().getSucceeded() !=null)) {
                       if (job.getStatus().getSucceeded() != null) {
                         completedPhase = new JobResponse(job, PeonPhase.SUCCEEDED);
                       } else {

I believe that is the crux of the change. I explain why here: #13804 (comment). Or let me know if I misunderstood you.

@a2l007
Contributor

a2l007 commented Mar 21, 2023

@abhishekagarwal87 Are there any additional concerns with this PR apart from the code coverage failure? From the discussion it looks like there isn't really a way to make the coverage tool pass without adding non-meaningful tests.

@mindreader

Or let me know if I misunderstood you.

Do you also need to make this change?

--- a/extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
+++ b/extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
@@ -180,7 +180,7 @@ public class KubernetesTaskRunner implements TaskLogStreamer, TaskRunner
                     completedPhase = monitorJob(peonPod, k8sTaskId);
                   } else {
                     Job job = existingJob.get();
-                    if (job.getStatus().getActive() == null) {
+                    if (job.getStatus() != null && job.getStatus().getActive() == null && (job.getStatus().getFailed() != null || job.getStatus().getSucceeded() !=null)) {
                       if (job.getStatus().getSucceeded() != null) {
                         completedPhase = new JobResponse(job, PeonPhase.SUCCEEDED);
                       } else {

I believe that is the crux of the change. I explain why here: #13804 (comment) Or let me know if I misunderstood you.

Sorry, I wasn't clear. There are two spots in the code that have that same success/failure logic, and I only see one changed in this PR, after the job is monitored. But I also see one before the job is monitored, just after it is spawned in KubernetesTaskRunner. If the logic is not necessary at that point, please feel free to ignore me.

@abhishekagarwal87
Contributor

@a2l007 - Sorry if I am missing something, but if code coverage is failing anyway, there is no need to add the makingCodeCoverageHappy() test in this PR. That's the only comment I have. LGTM otherwise.

@churromorales
Contributor Author

It is a good question. I think the way I coded it was not very clear; I apologize for that.
I tried to make KubernetesTaskRunnerTest cover the different scenarios.

Let's look at this case:

Start up the Overlord. The first thing it does is go to k8s and see whether there are any running tasks. If there are, it watches them and does the bookkeeping when they finish.

The conditional block you are referring to handles that case. It first checks whether the task is no longer active and already has a terminal state; if so, it just grabs the status and returns, with no need to make more k8s calls. If the job is still running, it falls through to monitorJob(), which applies the check you suggested in the following lines, so there is no need to do it twice. It's just a short circuit for the case where the job has already finished (roughly sketched below).

Does that help make sense of how things are working? These are good questions, and I want to make sure the code is doing the correct thing.
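
For reference, a paraphrased sketch of the restore path described above (not the exact source; the jobExists lookup and the PeonPhase.FAILED branch are assumptions filled in around the diff quoted earlier in this thread):

Optional<Job> existingJob = client.jobExists(k8sTaskId);  // hypothetical lookup helper
if (!existingJob.isPresent()) {
  // Normal path: launch the job and monitor it; monitorJob() waits on the terminal-state condition.
  completedPhase = monitorJob(peonPod, k8sTaskId);
} else {
  Job job = existingJob.get();
  if (job.getStatus() != null && job.getStatus().getActive() == null
      && (job.getStatus().getFailed() != null || job.getStatus().getSucceeded() != null)) {
    // Short circuit: the job finished while the Overlord was away, so just read its status
    // instead of making more k8s calls.
    completedPhase = job.getStatus().getSucceeded() != null
        ? new JobResponse(job, PeonPhase.SUCCEEDED)
        : new JobResponse(job, PeonPhase.FAILED);
  } else {
    // Still running: fall through to monitorJob(), which applies the same terminal-state check.
    completedPhase = monitorJob(peonPod, k8sTaskId);
  }
}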

@abhishekagarwal87
Contributor

@churromorales - Thank you for resolving the conflicts. There are build failures. Can you take a look?

@churromorales
Contributor Author

@abhishekagarwal87 I pushed up the fix; there were some conflicts due to something else getting merged before this.
Now there are issues downloading Node. I just want to let you know that I don't have any more cycles to follow up on this patch. If there are any more changes, feel free to take over and make whatever changes are needed to merge it, or if you don't want to, that is fine too. I just won't be following up on this anymore.

@churromorales
Contributor Author

@abhishekagarwal87 The failing tests are from the other PR that was merged. I don't have cycles to look into and fix whatever is wrong, and honestly we forked our Druid, so whether this goes upstream is not a big deal for us.

I think the issue is that the other PR uses all the old libraries, and it can't deserialize the pod template anymore. So if you want this feature to work with newer k8s versions, that will have to be sorted out at some point, or you can just close off this PR and worry about upgrades when the time comes. As it stands, MM-less won't work in newer versions of k8s, so I'll leave it up to you all to decide what you want to do.

@abhishekagarwal87
Contributor

I understand @churromorales. Would you be open to giving write access on your branch to someone who wants to take this PR forward?

@abhishekagarwal87 abhishekagarwal87 added this to the 26.0 milestone Apr 4, 2023
@nlippis nlippis mentioned this pull request Apr 4, 2023
@clintropolis
Member

Closing since these changes were added to #14028.

@clintropolis clintropolis removed this from the 26.0 milestone Apr 6, 2023