
fix: return failed instead of success when no container status #12197

Merged 1 commit into argoproj:main on Dec 5, 2023

Conversation

shuangkun
Member

We have a product named ManagedArgoWorkflows on Alibaba Cloud that uses ECI (virtual-kubelet). Sometimes a pod fails for a reason such as no stock (insufficient capacity), a timeout, or a scheduling failure. In these cases the pod may have no container statuses, conditions, or messages, and the pod is then marked as succeeded, which is a mistake.

So I think we should return Failed when the pod is in phase Failed and we cannot find a message or container status in it.

This problem may also occur on other public cloud platforms.
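
To make the scenario concrete, here is a hypothetical sketch (the variable name and the `Reason` value are made up for illustration) of the kind of pod status such a provider can report:

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// examplePod (a made-up name) illustrates the kind of status an ECI /
// virtual-kubelet provider can report when a pod fails before any container
// starts (no stock, timeout, scheduling failure): phase Failed, but no
// message, no conditions, and no container statuses for the controller to
// inspect, which is what previously led to the pod being marked succeeded.
var examplePod = &corev1.Pod{
	Status: corev1.PodStatus{
		Phase:             corev1.PodFailed,
		Reason:            "ProviderFailed", // placeholder; the real reason varies
		Message:           "",
		Conditions:        nil,
		ContainerStatuses: nil,
	},
}
```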

@Joibel
Member

Joibel commented Nov 17, 2023

This feels like a very dangerous assumption to make. How do we know that the status won't get updated later to something more successful?

@shuangkun
Member Author

This feels like a very dangerous assumption to make. How do we know that the status won't get updated later to something more successful?

I think if the status gets updated later from failed to succeeded, the workflow will reconcile. If the pod failed, it may be retried. But if it is marked succeeded when it actually never ran, we have made a mistake. This is what I encountered.

@shuangkun
Member Author

This feels like a very dangerous assumption to make. How do we know that the status won't get updated later to something more successful?

I think in the function "inferFailedReason", we should lean towards returning Failed instead of Succeeded if we really could not determine the failure reason.
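
For illustration only, here is a minimal sketch of that fallback, assuming the Kubernetes and Argo Workflows API types; the real inferFailedReason in workflow/controller/operator.go has a different signature and handles more cases:

```go
package sketch

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"

	wfv1 "github.com/argoproj/argo-workflows/v3/pkg/apis/workflow/v1alpha1"
)

// inferFailedReasonSketch is a hypothetical, simplified version of the
// fallback discussed in this thread, not the code merged in this PR: when a
// pod has reached phase Failed but exposes no message and no container
// statuses (as can happen with virtual-kubelet/ECI), report the node as
// Failed instead of letting it look like a success.
func inferFailedReasonSketch(pod *corev1.Pod) (wfv1.NodePhase, string) {
	if pod.Status.Message != "" {
		// The provider reported a top-level failure message; use it directly.
		return wfv1.NodeFailed, pod.Status.Message
	}
	if len(pod.Status.ContainerStatuses) == 0 {
		// No container ever reported a status, so nothing actually ran.
		// Returning Failed here avoids marking a never-run pod as succeeded.
		return wfv1.NodeFailed, fmt.Sprintf(
			"Pod failed before any container started (reason: %q)", pod.Status.Reason)
	}
	// ...otherwise inspect the individual container statuses for a reason...
	return wfv1.NodeFailed, "failed for an unknown reason"
}
```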

Member

@Joibel Joibel left a comment


OK, I follow your reasoning now and see why this is a reasonable change.

Review comment on workflow/controller/operator.go (outdated, resolved)
@shuangkun shuangkun force-pushed the fix/returnFailed branch 2 times, most recently from 340ba38 to 219c610 on November 21, 2023 13:17
Member

@Joibel Joibel left a comment


Thanks, looks good to me.

Signed-off-by: shuangkun <tsk2013uestc@163.com>
auto-merge was automatically disabled December 4, 2023 06:16

Head branch was pushed to by a user without write access

@juliev0 juliev0 enabled auto-merge (squash) December 5, 2023 04:55
@juliev0 juliev0 merged commit 7bcf908 into argoproj:main Dec 5, 2023
27 checks passed
sarabala1979 pushed a commit that referenced this pull request Jan 9, 2024
@agilgur5 agilgur5 added the area/controller Controller issues, panics label Jan 19, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request Mar 12, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request May 6, 2024
Labels
area/controller Controller issues, panics