Upgrade the fabric client to support newer versions of k8s #13804
Conversation
…ty to override peon monitors
@@ -66,6 +66,7 @@ Additional Configuration
|`druid.indexer.runner.javaOptsArray`|`JsonArray`|java opts for the task.|`-Xmx1g`|No|
|`druid.indexer.runner.labels`|`JsonObject`|Additional labels you want to add to the peon pod|`{}`|No|
|`druid.indexer.runner.annotations`|`JsonObject`|Additional annotations you want to add to the peon pod|`{}`|No|
|`druid.indexer.runner.peonMonitors`|`JsonArray`|Overrides `druid.monitoring.monitors`. Use this when you have monitors configured on the overlord that you do not want the peons to inherit.|`[]`|No|
Nit: It would be useful to mention this property in the extension specific docs as well.
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

class LogWatchInputStreamTest
Code scanning / CodeQL notice: Unused classes and interfaces.
@@ -106,13 +104,15 @@ public JobResponse waitForJobCompletion(K8sTaskId taskId, long howLong, TimeUnit
         .inNamespace(namespace)
         .withName(taskId.getK8sTaskId())
         .waitUntilCondition(
-            x -> x != null && x.getStatus() != null && x.getStatus().getActive() == null,
+            x -> x != null && x.getStatus() != null && x.getStatus().getActive() == null
+                && (x.getStatus().getFailed() != null || x.getStatus().getSucceeded() != null),
Why is this change needed?
This was a tricky one. For some background: the overlord launches a task, waits for it to complete, and then returns the status. In k8s < 1.25 this just meant: once the pod is not active, go grab the status.
Now the contract has changed a bit, because 1.25 introduced finalizers for job tracking:
kubernetes/kubernetes#110948
We noticed that when we ran tasks on a 1.25 k8s Druid cluster, they would complete fine but be marked as failures.
We logged the job status and noticed that the job was not active, but neither success nor failure was set; instead there was this field:
uncountedTerminatedPods=UncountedTerminatedPods(failed=[], succeeded=[e916cbf9-467a-45f3-86a7-3767145d6384], additionalProperties={})
which, from the docs:
UncountedTerminatedPods holds the UIDs of Pods that have terminated but the job controller hasn't yet accounted for in the status counters. The job controller creates pods with a finalizer. When a pod terminates (succeeded or failed), the controller does three steps to account for it in the job status: (1) Add the pod UID to the arrays in this field. (2) Remove the pod finalizer. (3) Remove the pod UID from the arrays while increasing the corresponding counter. This field is beta-level. The job controller only makes use of this field when the feature gate JobTrackingWithFinalizers is enabled (enabled by default). Old jobs might not be tracked using this field, in which case the field remains null.
So now the job goes from a state where it is not active, to having uncountedTerminatedPods, to then having a status with success or failure set. I will push up a one-line fix to make this work, but for those of you working with the 1.25 version of k8s, I'm sure you will be affected as well.
Basically, add another check to wait on. Right now we wait for this:

```java
// block until
job.getStatus() != null && job.getStatus().getActive() == null
// then return
return job.getStatus().getSucceeded() != null
```

So the change has to become:

```java
// block until
job.getStatus() != null && job.getStatus().getActive() == null
    && (job.getStatus().getFailed() != null || job.getStatus().getSucceeded() != null)
// then return
return job.getStatus().getSucceeded() != null
```

This should keep things backwards compatible and working in all versions of k8s.
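To make the difference concrete, here is a small self-contained sketch (not the actual fabric8 model classes — `FakeJobStatus` is a stand-in for the relevant fields of the k8s `JobStatus`) contrasting the old and new wait conditions against the intermediate state that k8s 1.25 introduces:

```java
// Stand-in for the relevant fields of the fabric8 JobStatus model:
// each counter is null until the job controller records it.
class FakeJobStatus {
    Integer active;     // number of actively running pods, null when none
    Integer succeeded;  // null until the controller counts the pod
    Integer failed;     // null until the controller counts the pod

    FakeJobStatus(Integer active, Integer succeeded, Integer failed) {
        this.active = active;
        this.succeeded = succeeded;
        this.failed = failed;
    }
}

public class WaitConditionSketch {
    // Old condition: treat the job as terminal once no pods are active.
    static boolean oldCondition(FakeJobStatus s) {
        return s != null && s.active == null;
    }

    // New condition: additionally wait until a success or failure count is
    // recorded, which on k8s 1.25+ happens only after the pod finalizer is
    // removed (i.e. after the uncountedTerminatedPods phase).
    static boolean newCondition(FakeJobStatus s) {
        return s != null && s.active == null
            && (s.failed != null || s.succeeded != null);
    }

    public static void main(String[] args) {
        // k8s 1.25 intermediate state: not active, but succeeded/failed not
        // yet set (pod UID still sitting in uncountedTerminatedPods).
        FakeJobStatus uncounted = new FakeJobStatus(null, null, null);
        // Final state once the controller removes the finalizer.
        FakeJobStatus done = new FakeJobStatus(null, 1, null);

        System.out.println(oldCondition(uncounted)); // true: returns too early
        System.out.println(newCondition(uncounted)); // false: keeps waiting
        System.out.println(newCondition(done));      // true: safe to read status
    }
}
```

The old predicate fires during the finalizer window, reads a status where `succeeded` is still null, and reports a failure; the new predicate simply waits out that window.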
So for the older k8s versions, we can expect that when job.getActive() is null, one of the success or failure fields must be set. Question: do we need to check the getActive() result at all? Can we just block till the job either succeeded or failed?
That could work; want me to change it? The version above definitely works, but I don't have access to many different k8s environments, and someone in the community helped me test it on 1.25. I could definitely make the change, but I unfortunately have no way of testing it myself.
Please make this change; we can test it out on a v1.25 cluster.
@nlippis I believe you can test by taking this commit, making that small change, and applying it onto the 25 branch. If it works for you, I'll resubmit this with the change.
@abhishekagarwal87 what do you think? This should bring the client up to date as well as fix things for those Druid users running on k8s 1.25+ that were having issues.
@Test
void makingCodeCoverageHappy()
Hmm. This is a bit of a bummer. Is there no way to avoid this?
So I had tests for a class; then I upgraded a client library and had to change some things, but the functions were all tested before. I would expect that if the tests pass I'd be good, since the behavior was already tested. But it turns out that if you have to make any changes to adapt to the new library APIs, code coverage complains. It is unfortunate here, because I didn't change any behavior.
Aren't there any tests covering that path?
I added a getter to a class, and the code coverage tool complained that I added new code without a test. So I added a test for the getter. It was new code, so I guess it's okay to add a test if that's what Druid wants, although I feel tests like this are not really useful and make the whole suite slower in general.
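For illustration, this is the kind of coverage-only test being discussed: a new accessor added during the client upgrade, plus a test that merely invokes it so the coverage gate passes. The class and field names here are made up, not taken from the actual patch:

```java
// Hypothetical class with a newly added getter (names are illustrative).
class TaskHandle {
    private final String k8sJobName;

    TaskHandle(String k8sJobName) {
        this.k8sJobName = k8sJobName;
    }

    String getK8sJobName() {
        return k8sJobName;
    }
}

public class TaskHandleTest {
    public static void main(String[] args) {
        TaskHandle handle = new TaskHandle("peon-job-abc");
        // The coverage-satisfying "test": the getter round-trips its input.
        if (!"peon-job-abc".equals(handle.getK8sJobName())) {
            throw new AssertionError("getter did not round-trip value");
        }
        System.out.println("ok");
    }
}
```

Such a test exercises no real logic, which is the commenter's point: it exists only to satisfy the line-coverage threshold.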
@@ -54,6 +54,7 @@
 import static org.junit.jupiter.api.Assertions.assertEquals;

+// must have a kind / minikube cluster installed and the image pushed to your repository
 @Disabled
We are using K3S now. Do you still see the same issues?
K3S? This was for anyone who had an existing k8s cluster: they could just run this test and point it there. The k8s integration tests are definitely not flexible enough for me to test MM-less without rewriting a lot of them.
How was this running before?
I had a kind cluster running locally; it was in the original patch. I left this test in the codebase for anyone who wants to test with a local k8s cluster. If you want me to remove it, I'll be happy to get rid of it. I find it useful when I make changes, since I can't integration-test this with the way the current druid-it tests are set up.
Left one suggestion for the doc.
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
@churromorales - Can you address the build failures?
Small stylistic change for doc.
Very much needed changes. Thank you for making them.
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
…ests caused by a merge conflict
@abhishekagarwal87 you good with this now? I explained the test coverage situation in an earlier comment.
Do you also need to make this change?

```diff
--- a/extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
+++ b/extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
@@ -180,7 +180,7 @@ public class KubernetesTaskRunner implements TaskLogStreamer, TaskRunner
   completedPhase = monitorJob(peonPod, k8sTaskId);
 } else {
   Job job = existingJob.get();
-  if (job.getStatus().getActive() == null) {
+  if (job.getStatus() != null && job.getStatus().getActive() == null && (job.getStatus().getFailed() != null || job.getStatus().getSucceeded() != null)) {
     if (job.getStatus().getSucceeded() != null) {
       completedPhase = new JobResponse(job, PeonPhase.SUCCEEDED);
     } else {
```
I believe that is the crux of the change. I explain why here: #13804 (comment). Or let me know if I misunderstood you.
@abhishekagarwal87 Are there any additional concerns with this PR apart from the code coverage failure? From the discussion it looks like there isn't really a way to make the coverage tool pass without adding non-meaningful tests.
> Or let me know if I misunderstood you.

Sorry I wasn't clear. There are two spots in the code with that same success/failure logic, and I only see one in this PR, after the job is monitored. But I also see one before the job is monitored, just after it is spawned in KubernetesTaskRunner. If the logic is not necessary at that point, please feel free to ignore me.
@a2l007 - Sorry if I am missing something, but if CC is failing anyway, there is no need to add
It is a good question. I think the way I coded it was not very clear; I apologize for that. Let's look at this case: you start up the overlord while tasks are already running. The conditional block you are referring to handles that case. It first checks whether the task is not active, then checks the state. If it is still running, it monitors the job. Does that help make sense of how things are working? These are good questions, and I want to make sure the code is doing the correct thing.
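A minimal self-contained sketch of that restart-recovery branch may help. The types here are stand-ins, not the real fabric8 models, and `monitorJob` is stubbed out; the point is only the branching: on overlord restart, an existing job is resolved directly if it is already terminal and counted, otherwise the runner falls back to monitoring it:

```java
// Stand-in for the relevant JobStatus fields (null means "not set").
class StatusSnapshot {
    Integer active;
    Integer succeeded;
    Integer failed;

    StatusSnapshot(Integer active, Integer succeeded, Integer failed) {
        this.active = active;
        this.succeeded = succeeded;
        this.failed = failed;
    }
}

public class RestartRecoverySketch {
    // Mirrors the conditional from the diff in this thread: the job is
    // terminal only when it is inactive AND a success/failure count has
    // actually been recorded by the job controller.
    static String resolve(StatusSnapshot status) {
        if (status != null && status.active == null
                && (status.failed != null || status.succeeded != null)) {
            return status.succeeded != null ? "SUCCEEDED" : "FAILED";
        }
        // Still running, or counters not yet settled (the k8s 1.25
        // finalizer window): re-attach and monitor the job instead.
        return monitorJob();
    }

    // Stub standing in for the runner's monitorJob blocking call.
    static String monitorJob() {
        return "MONITORING";
    }

    public static void main(String[] args) {
        System.out.println(resolve(new StatusSnapshot(null, 1, null)));    // SUCCEEDED
        System.out.println(resolve(new StatusSnapshot(1, null, null)));    // MONITORING
        System.out.println(resolve(new StatusSnapshot(null, null, null))); // MONITORING
    }
}
```

Note the third case: a job that is inactive but not yet counted falls through to monitoring rather than being misread as failed, which is exactly what the stricter condition buys on k8s 1.25+.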
@churromorales - Thank you for resolving the conflicts. There are build failures. Can you take a look?
@abhishekagarwal87 I pushed up the fix; there were some conflicts due to something else getting merged before this.
@abhishekagarwal87 The failing tests are from the other PR that was merged. I don't have cycles to look into and fix whatever is wrong, and honestly we forked our Druid, so whether this goes upstream is not a big deal for us. I think the issue is that the other PR uses all the old libraries, and it can't deserialize the pod template anymore. So if you want this feature to work with newer k8s versions, that will have to be sorted out at some point; or you can just close off this PR and worry about upgrades when the time comes. As it stands, MM-less won't work in newer versions of k8s, so I'll leave it up to the maintainers to decide what they want to do.
I understand @churromorales. Would you be open to giving write access on your branch to someone who wants to take this PR forward?
Closing since these changes were added to #14028
This PR contains 2 items.

We found that when we used the `TaskCountStatsMonitor` in the overlord config, the peon tasks would not start, because they inherit the monitors from the parent process. That parent used to be the MiddleManager, which would never have had that monitor. So now you can override this value with the following config: `druid.indexer.runner.peonMonitors`
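A hypothetical overlord runtime.properties fragment illustrating the scenario (monitor class paths and values here are assumptions for illustration, not taken from this PR):

```properties
# Overlord-side monitor that peons cannot run.
druid.monitoring.monitors=["org.apache.druid.server.metrics.TaskCountStatsMonitor"]

# Without an override, peons inherit the list above and fail to start.
# The new property replaces the inherited monitors for peon pods only;
# set it to [] to strip monitors entirely.
druid.indexer.runner.peonMonitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
```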