Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When using labels with PodGC, completion label is not added to the pod #7159

Closed
alexec opened this issue Nov 4, 2021 · 3 comments · Fixed by #7176
Closed

When using labels with PodGC, completion label is not added to the pod #7159

alexec opened this issue Nov 4, 2021 · 3 comments · Fixed by #7176

Comments

@alexec
Copy link
Contributor

alexec commented Nov 4, 2021

No description provided.

@alexec alexec self-assigned this Nov 4, 2021
@alexec alexec added this to To do in Run The Business (incl. bugs) via automation Nov 4, 2021
@alexec alexec added good first issue Good for newcomers and removed triage labels Nov 4, 2021
@alexec alexec removed this from To do in Run The Business (incl. bugs) Nov 4, 2021
@alexec
Copy link
Contributor Author

alexec commented Nov 4, 2021

@ziv-codefresh @roi-codefresh @pasha-codefresh @kostis-codefresh @plakyda-codefresh if you're looking for any issues to start work on, issues like this one, i.e. labelled with "good first issue" are the best places to start. When you want something more substantial look for "help wanted".

@alexec
Copy link
Contributor Author

alexec commented Nov 4, 2021

Cause: feat(controller): Support pod GC strategy based on label selector on pods (#5090)

@terrytangyuan you authored this feature in Feb 2021. It is minor (no reports).

@alexec alexec removed the good first issue Good for newcomers label Nov 4, 2021
@alexec alexec added this to To do in Run The Business (incl. bugs) via automation Nov 4, 2021
@alexec
Copy link
Contributor Author

alexec commented Nov 4, 2021

Having analysed this, it is not a good first issue. There are at least two bugs in this code, one related to the matching, at least one other, that result in cases where completed pods are never labelled and sit consuming memory they do not need to.

I don't think this is a big problem, because no one has reported it yet.

I think the podGC code should be centralised and we can remove completePods and use woc.getAllWorkflowPods() to list the workflows pods. This func did not exist in 2018.

	// It is important that we *never* label pods as completed until we successfully updated the workflow
	// Failing to do so means we can have inconsistent state.
	// Pods may be be labeled multiple times.
	// We MUST not need to label the pod if we might delete it later for GC. 
	delay := woc.controller.Config.GetPodGCDeleteDelayDuration()
	podGC := woc.execWf.Spec.PodGC
	strategy := podGC.GetStrategy()
	pods, err := woc.getAllWorkflowPods()
	if err != nil {
		woc.log.WithError(err).Error("failed to get pods")
		return
	}

	for _, pod := range pods {
		podCompleted := pod.Status.Phase == apiv1.PodSucceeded || pod.Status.Phase == apiv1.PodFailed
		matches, _ := podGC.Matches(pod.Labels)
		if matches && (
			strategy == wfv1.PodGCOnPodCompletion && podCompleted ||
				strategy == wfv1.PodGCOnPodSuccess && pod.Status.Phase == apiv1.PodSucceeded ||
				strategy == wfv1.PodGCOnWorkflowCompletion && woc.wf.Status.Phase.Completed() ||
				strategy == wfv1.PodGCOnWorkflowSuccess && woc.wf.Status.Phase == wfv1.WorkflowSucceeded) {
			woc.controller.queuePodForCleanupAfter(woc.wf.Namespace, pod.Name, deletePod, delay)
		} else if woc.wf.Status.Fulfilled() || podCompleted {
			woc.controller.queuePodForCleanup(woc.wf.Namespace, pod.Name, labelPodCompleted)
		}
	}

@alexec alexec removed their assignment Nov 4, 2021
@alexec alexec moved this from To do to In progress in Run The Business (incl. bugs) Nov 5, 2021
@alexec alexec self-assigned this Nov 5, 2021
alexec added a commit to alexec/argo-workflows that referenced this issue Nov 5, 2021
Signed-off-by: Alex Collins <alex_collins@intuit.com>
@alexec alexec moved this from In progress to Done in Run The Business (incl. bugs) Nov 16, 2021
@alexec alexec closed this as completed Nov 16, 2021
alexec added a commit that referenced this issue Nov 18, 2021
Signed-off-by: Alex Collins <alex_collins@intuit.com>
@sarabala1979 sarabala1979 mentioned this issue Dec 15, 2021
73 tasks
sarabala1979 pushed a commit that referenced this issue Dec 15, 2021
Signed-off-by: Alex Collins <alex_collins@intuit.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

1 participant