Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-2637] finalizePods should ignore pods like registerPods #847

Closed
wants to merge 3 commits into from

Conversation

wilfred-s
Copy link
Contributor

What is this PR for?

If a pod was in a terminal state during registration it is skipped. The same principal should apply to finalisePods:

  • only check pods that were added in registerPods.
  • remove finished pods in the finalizePods call if they were registered.
  • new pods are not added in finalizePods

What type of PR is it?

  • - Bug Fix

What is the Jira issue?

How should this be tested?

Unit tests now cover the code path
e2e tests coverage is not possible as we cannot time the pod removal well enough

If a pod was in a terminal state during registration it is skipped. The
same principal should apply to finalisePods:
* only check pods that were added in registerPods.
* remove finished pods in the finalizePods call if they were registered.
* new pods are not added in finalizePods
@wilfred-s wilfred-s self-assigned this May 23, 2024
@pbacsko
Copy link
Contributor

pbacsko commented May 23, 2024

Unit test failure is due to YUNIKORN-2629.

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 pending e2e

A related question: Context doesn't have the best unit test coverage. I do believe we need to enhance it. Eg bindPodVolumes() is completely uncovered, but there are a lot of missing paths, too. Thoughts?

@wilfred-s
Copy link
Contributor Author

+1 pending e2e

A related question: Context doesn't have the best unit test coverage. I do believe we need to enhance it. Eg bindPodVolumes() is completely uncovered, but there are a lot of missing paths, too. Thoughts?

I would completely agree. That is why I wrote tests that cover the two functions that I changed.

@wilfred-s
Copy link
Contributor Author

I think we need to get a fix for YUNIKORN-2629 created, reviewed and committed to get the unit tests to pass on the k8shim and get things stable again. Until that point I think we need to wait with commits in the master.

if _, ok := podMap[pod.UID]; !ok {
// node no longer exists, delete it
// pod no longer exists, delete it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious. In this phase we have added event handler, so those pods may be removed already from context already. It seems to me that is a race condition but it is fine as ctx.DeletePod(pod) is no-op for repeated delete. If this description is valid, maybe we should add comments for it. Also, we should add tests for DeletePod to make sure it is no-op in repeated deletes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to enhance test coverage for Context, we can have a separate test case for DeletePod() to make sure that's idempotent. But we can do this in this PR as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to address that in separate PR :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so those pods may be removed already

The maybe is what we're trying to catch. It could be that the pod was removed between the list in registerPods and the event handler add later on. It will not show here but the event handler has not seen it being removed. Those we want to catch. Not the ones that have already been deleted by the handlers. We should not have to describe that in this function.

Also, we should add tests for DeletePod to make sure it is no-op in repeated deletes.

See above for the comment made by Peter around the unit test coverage of the context code. It needs to be extended. Any code path or function that declares to be idempotent should have tests for it.

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
All green, I'm merging this.

@pbacsko pbacsko closed this in 8e680af Jun 7, 2024
pbacsko pushed a commit that referenced this pull request Jun 7, 2024
If a pod was in a terminal state during registration it is skipped. The
same principal should apply to finalisePods:
* only check pods that were added in registerPods.
* remove finished pods in the finalizePods call if they were registered.
* new pods are not added in finalizePods

Closes: #847

Signed-off-by: Peter Bacsko <pbacsko@cloudera.com>
(cherry picked from commit 8e680af)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants