
fix: make sure Finalizers has chance to be removed. Fixes: #12836 #12831

Merged
merged 7 commits into argoproj:main from the fix/LetGCCanExecute branch on Apr 2, 2024

Conversation

shuangkun
Member

@shuangkun shuangkun commented Mar 21, 2024

Fixes: #12836

When cluster pressure is high or execution is slow, TestStoppedWorkflow often fails because artifact GC doesn't get a chance to execute.

The likely cause is that after the last reconciliation marks the workflow Failed, the controller never gets another chance to operate on this workflow.
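
For readers skimming, a minimal self-contained sketch of the check this PR adds to persistUpdates() (condensed from the diff reviewed below; the constant value and helper name here are illustrative stand-ins for common.FinalizerArtifactGC and woc.requeue()):

package main

import (
	"fmt"
	"slices"
)

// Assumed value of workflow/common.FinalizerArtifactGC, shown here only for illustration.
const finalizerArtifactGC = "workflows.argoproj.io/artifact-gc"

// shouldRequeueForArtifactGC mirrors the new check in persistUpdates(): once a
// workflow is fulfilled (Succeeded/Failed/Errored), requeue it while it still
// carries the artifact GC finalizer so a later reconciliation can run artifact
// GC and remove the finalizer.
func shouldRequeueForArtifactGC(fulfilled bool, finalizers []string) bool {
	return fulfilled && slices.Contains(finalizers, finalizerArtifactGC)
}

func main() {
	fmt.Println(shouldRequeueForArtifactGC(true, []string{finalizerArtifactGC})) // true  -> requeue
	fmt.Println(shouldRequeueForArtifactGC(true, nil))                           // false -> nothing left to do
}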

Motivation

Modifications

Verification

Signed-off-by: shuangkun <tsk2013uestc@163.com>
Signed-off-by: shuangkun <tsk2013uestc@163.com>
@shuangkun shuangkun changed the title fix: artifact GC. fix: make sure artifact GC has a chance to execute. Mar 21, 2024
@shuangkun shuangkun changed the title fix: make sure artifact GC has a chance to execute. fix: make sure artifact GC has chance to execute. Mar 21, 2024
@shuangkun shuangkun closed this Mar 21, 2024
@shuangkun shuangkun reopened this Mar 21, 2024
@shuangkun shuangkun closed this Mar 21, 2024
@shuangkun shuangkun reopened this Mar 21, 2024
@shuangkun shuangkun closed this Mar 21, 2024
@shuangkun shuangkun reopened this Mar 21, 2024
@shuangkun shuangkun closed this Mar 21, 2024
@shuangkun shuangkun reopened this Mar 21, 2024
@shuangkun
Member Author

@juliev0 Hi, could you take a look? This might help developers. Thanks!

Member

@tczhao tczhao left a comment

Could you paste a link to the failed TestStoppedWorkflow run? That would help us better understand the problem.

@@ -806,6 +809,10 @@ func (woc *wfOperationCtx) persistUpdates(ctx context.Context) {
woc.log.WithError(err).Warn("failed to delete task-results")
}
}
// If FinalizerArtifactGC exists, requeue to make sure artifact GC can execute.
if woc.wf.Status.Fulfilled() && slices.Contains(wf.GetFinalizers(), common.FinalizerArtifactGC) {
woc.requeue()
Member

would this result in an infinite loop where the workflow is always in the wfqueue?

Member Author

FinalizerArtifactGC should be removed after GC is completed.

Contributor

maybe we should theoretically requeue if there's any Finalizer, not just Artifact GC?

Member Author

I agree

Contributor

also, how do you feel about adding this as a nested if statement inside the if woc.wf.Status.Fulfilled() { block above?

Contributor

On version 3.5.5, I've noticed that when I stop & delete workflows in the UI before they are complete, finalizers aren't removed and the workflow gets stuck (but artifact gc succeeds). I believe this is the same issue.

@juliev0 Looking at some of the test failures linked, I'm noticing the following:

Waiting 1m30s for workflows {{ } workflows.argoproj.io/test metadata.name=artgc-dag-wf-stopped-pod-gc-on-pod-completion-qrlp5 false false   <nil> 0 }
    when.go:356: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

This indicates a rate limit issue with the wfList, err := w.client.List(ctx, listOptions) call. If we're getting a rate limit error, and execution moves on to the artifact presence check too soon, then yes, it could be that the controller didn't have a chance to finish PodGC yet.

@shuangkun @juliev0 per my first comment here, I still think there's an issue that needs to be solved (and I think your proposed solution could be sufficient). As @tczhao mentioned, requeueing could be an issue, but only if finalizers are never removed for some reason.

If adding rate limiting for WaitForWorkflowList resolves these transient test issues, then we still need to modify the TestStoppedWorkflow test to ensure that finalizers are removed.
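
As an aside (not what this PR ends up doing), the limit behind that error is client-side, so it could also be loosened on the test client's rest.Config. A generic client-go sketch with arbitrary values, purely for illustration:

package testutil // hypothetical package for this sketch

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// newFasterClient is illustrative only: raising QPS/Burst above client-go's
// defaults (roughly 5 QPS / 10 burst) makes "rate: Wait(n=1) would exceed
// context deadline" far less likely when a test polls the API in a tight loop.
func newFasterClient(kubeconfigPath string) (*kubernetes.Clientset, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return nil, fmt.Errorf("load kubeconfig: %w", err)
	}
	cfg.QPS = 50
	cfg.Burst = 100
	return kubernetes.NewForConfig(cfg)
}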

Contributor

@juliev0 juliev0 Mar 31, 2024

(Sorry, I just realized my last comment said "PodGC" - I meant to say "ArtifactGC" (which I've since edited))

If we're getting a rate limit error, and execution moves on to the artifact presence check too soon, then yes, it could be that the controller didn't have a chance to finish PodGC yet. <-- I assume you also mean "ArtifactGC"

@Garett-MacGowan Thanks for tying that together. Actually, in my original statement, I didn't realize that this WaitForWorkflowDeletion() call essentially waits for the finalizer to have been removed, so the logic does make sense, except that rate limiting seems to defeat that. :(

Contributor

On version 3.5.5, I've noticed that when I stop & delete workflows in the UI before they are complete, finalizers aren't removed and the workflow gets stuck (but artifact gc succeeds).

That's interesting; it sounds like there's also a real issue. It's notable that, as @tczhao pointed out, the WorkflowArtifactGCTaskInformer should requeue when the ArtifactGCTask has changed, which should cause the Workflow to be processed, the WorkflowArtifactGCTasks listed and read, and then the Finalizer removed. Maybe there's some race condition in there that could prevent that?

I guess at the end of the day we need to determine if there's any harm in this change. If we change woc.requeue() to woc.requeueAfter(delay) and reduce the immediacy of the requeue, then maybe it's generally a good thing that we regularly revisit Workflows that still have Finalizers just in case?
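
For concreteness, the requeueAfter variant under discussion would look roughly like this inside persistUpdates() (an approximate sketch, not the exact merged code; woc.requeueAfter and Status.Fulfilled() already exist in the controller, the surrounding lines are paraphrased, and the 5-second delay matches what was later proposed in this thread):

if woc.wf.Status.Fulfilled() {
	// ... existing cleanup of completed pods and task results ...

	// Revisit the workflow after a short delay while any finalizer remains,
	// so whatever owns the finalizer (artifact GC here) gets another chance
	// to finish and the finalizer can eventually be removed.
	if len(woc.wf.GetFinalizers()) > 0 {
		woc.requeueAfter(5 * time.Second)
	}
}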

Contributor

I assume you also mean "ArtifactGC"

Yes, I do.

Actually, in my original statement, I didn't realize that this WaitForWorkflowDeletion() call essentially waits for the finalizer to have been removed, so the logic does make sense, except that rate limiting seems to defeat that. :(

Ahh, yes, it does handle that properly already. @shuangkun maybe we could add time.Sleep(time.Second), or whatever the kubernetes API QPS rate limit is, to WaitForWorkflowList()?

maybe it's generally a good thing that we regularly revisit Workflows that still have Finalizers just in case?

I think it's not a terrible idea. If it's requeued after a reasonable delay, it should prevent resource hogging to a degree. I can imagine an issue where a cron workflow continuously fails to GC, leading to a workflow build up, and a large amount of requeueing workflows, though. This would bog down other workflows eventually, right?

Contributor

Yes, true. I guess there is something broken as far as finalizer logic if we get into that scenario.

@shuangkun
Member Author

After the workflow is set to Failed, the workflow is never in the queue, resulting in no artifact gc.

@shuangkun
Member Author

Could you paste a link to the failed TestStoppedWorkflow run? That would help us better understand the problem.

I'll post it next time, as this thing is not easy to reproduce. But I've encountered it several times recently.

@shuangkun
Member Author

Could you paste a link to the failed TestStoppedWorkflow run? That would help us better understand the problem.

https://github.com/argoproj/argo-workflows/actions/runs/8374128100/job/22928794428?pr=12780 Here!

@shuangkun
Member Author

When the workflow is marked Failed, there is no artifact GC:

2024-03-21T11:39:27.9575692Z controller: time="2024-03-21T11:37:57.358Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=3221 namespace=argo workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz
2024-03-21T11:39:27.9576502Z controller: time="2024-03-21T11:37:57.359Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=4 workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz
2024-03-21T11:39:27.9577647Z controller: time="2024-03-21T11:37:57.359Z" level=info msg="task-result changed" namespace=argo nodeID=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz-4010160274 workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz
2024-03-21T11:39:27.9578782Z controller: time="2024-03-21T11:37:57.359Z" level=info msg="task-result changed" namespace=argo nodeID=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz-3197474861 workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz
2024-03-21T11:39:27.9579911Z controller: time="2024-03-21T11:37:57.359Z" level=info msg="task-result changed" namespace=argo nodeID=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz-3993382655 workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz
2024-03-21T11:39:27.9581042Z controller: time="2024-03-21T11:37:57.359Z" level=info msg="task-result changed" namespace=argo nodeID=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz-554465011 workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz
2024-03-21T11:39:27.9581817Z controller: time="2024-03-21T11:37:57.359Z" level=info msg="Updated phase Running -> Failed" namespace=argo workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz
2024-03-21T11:39:27.9582791Z controller: time="2024-03-21T11:37:57.359Z" level=info msg="Updated message  -> Stopped with strategy 'Stop'" namespace=argo workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz
2024-03-21T11:39:27.9583656Z controller: time="2024-03-21T11:37:57.359Z" level=info msg="Marking workflow completed" namespace=argo workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz
2024-03-21T11:39:27.9584121Z controller: time="2024-03-21T11:37:57.359Z" level=info msg="Workflow to be dehydrated" Workflow Size=7879
2024-03-21T11:39:27.9585013Z controller: time="2024-03-21T11:37:57.365Z" level=info msg="cleaning up pod" action=deletePod key=argo/artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz-1340600742-agent/deletePod
2024-03-21T11:39:27.9585931Z controller: time="2024-03-21T11:37:57.367Z" level=info msg="Workflow update successful" namespace=argo phase=Failed resourceVersion=3225 workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz
2024-03-21T11:39:27.9587435Z controller: time="2024-03-21T11:37:57.367Z" level=warning msg="failed to clean-up pod" action=deletePod error="pods \"artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz-1340600742-agent\" not found" key=argo/artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz-1340600742-agent/deletePod
2024-03-21T11:39:27.9588296Z controller: time="2024-03-21T11:37:57.367Z" level=warning msg="Non-transient error: pods \"artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz-1340600742-agent\" not found"
2024-03-21T11:39:27.9588475Z port-forward: Handling connection for 9000
2024-03-21T11:39:27.9589786Z controller: time="2024-03-21T11:37:58.569Z" level=info msg="cleaning up pod" action=killContainers key=argo/artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz-artgc-dag-workflow-stopper-554465011/killContainers
2024-03-21T11:39:27.9590915Z controller: time="2024-03-21T11:37:58.599Z" level=info msg="cleaning up pod" action=killContainers key=argo/artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz-artgc-dag-artifact-creator-2-3993382655/killContainers
2024-03-21T11:39:27.9591099Z port-forward: Handling connection for 9000
2024-03-21T11:39:27.9591264Z port-forward: Handling connection for 9000
2024-03-21T11:39:27.9591429Z port-forward: Handling connection for 9000
2024-03-21T11:39:27.9591601Z port-forward: Handling connection for 9000
2024-03-21T11:39:27.9591765Z port-forward: Handling connection for 9000
2024-03-21T11:39:27.9591933Z port-forward: Handling connection for 9000

@shuangkun shuangkun changed the title fix: make sure artifact GC has chance to execute. fix: make sure artifact GC has chance to execute. Fixes: #12836 Mar 22, 2024
Signed-off-by: shuangkun <tsk2013uestc@163.com>
@shuangkun shuangkun force-pushed the fix/LetGCCanExecute branch 2 times, most recently from b6ee60c to d2c6ea0 Compare March 22, 2024 10:39
@shuangkun shuangkun added area/controller Controller issues, panics area/artifacts S3/GCP/OSS/Git/HDFS etc labels Mar 22, 2024
@juliev0
Contributor

juliev0 commented Mar 22, 2024

I will definitely take a look at this.

@juliev0 juliev0 self-assigned this Mar 22, 2024
@juliev0
Contributor

juliev0 commented Mar 22, 2024

Are you saying that in the current code in master, that we do requeue in the case of the Workflow being in Succeeded state but not Failed state? Or that we don't necessarily requeue for any Completed state? I'm curious where the logic for this is.

@shuangkun
Member Author

Are you saying that in the current code in master, that we do requeue in the case of the Workflow being in Succeeded state but not Failed state? Or that we don't necessarily requeue for any Completed state? I'm curious where the logic for this is.

The workflow is Failed. In the current code, sometimes during the last reconciliation the workflow is set to Failed but the wf queue is emptied. At that point there is no chance for another round of reconciliation and garbage collection.

@shuangkun
Member Author

When the workflow is marked Failed, there is no artifact GC: (controller log quoted above)

but the wf queue is emptied

Do you by chance know in the code where that's happening?

This cannot be reproduced locally; I inferred it from the logs in the test, because there are no artifact GC logs after the workflow Failed. The earlier logs contain an "adding artifact GC finalizer" entry, but there is no "removing artifact GC finalizer" entry, so I guess the wf queue was empty at this point.

2024-03-21T11:39:27.9277151Z controller: time="2024-03-21T11:37:34.876Z" level=info msg="adding artifact GC finalizer" namespace=argo workflow=artgc-dag-wf-stopped-pod-gc-on-pod-completion-vwxxz

Signed-off-by: shuangkun <tsk2013uestc@163.com>
@shuangkun shuangkun changed the title fix: make sure artifact GC has chance to execute. Fixes: #12836 fix: make sure Finalizer has chance to be removed. Fixes: #12836 Mar 24, 2024
@shuangkun shuangkun changed the title fix: make sure Finalizer has chance to be removed. Fixes: #12836 fix: make sure Finalizers has chance to be removed. Fixes: #12836 Mar 24, 2024
@agilgur5 agilgur5 added the area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more label Mar 24, 2024
@@ -806,6 +809,10 @@ func (woc *wfOperationCtx) persistUpdates(ctx context.Context) {
woc.log.WithError(err).Warn("failed to delete task-results")
}
}
// If FinalizerArtifactGC exists, requeue to make sure artifact GC can execute.
if woc.wf.Status.Fulfilled() && slices.Contains(wf.GetFinalizers(), common.FinalizerArtifactGC) {
woc.requeue()
Member

@tczhao tczhao Mar 25, 2024

Based on the log and my understanding of how artifactGC works, I think the issue is something else.

Currently, the finalizer does have a chance to be removed; see if the pseudocode below makes sense:

// operate()
if wf.status.fulfilled
   garbageCollectArtifacts
      create WorkflowArtifactGCTask
      create artifact gc pod
          loop through WorkflowArtifactGCTask
          patch WorkflowArtifactGCTask for each deletion
// controller()
WorkflowArtifactGCTaskInformer.
   on WorkflowArtifactGCTask Update
      wfqueue.AddRateLimited(key) // this requeues the wf so the finalizer can be removed
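
To make the last two lines of that flow concrete, here is a generic client-go-style sketch of an informer update handler that re-enqueues the owning Workflow (the function name and key derivation are illustrative, not the controller's actual wiring):

package gcinformer // hypothetical package for this sketch

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// addArtifactGCTaskHandler re-enqueues a workflow key whenever a
// WorkflowArtifactGCTask is updated (e.g. patched after each artifact
// deletion), giving the controller another chance to reconcile the Workflow
// and remove the artifact GC finalizer.
func addArtifactGCTaskHandler(informer cache.SharedIndexInformer, wfQueue workqueue.RateLimitingInterface) {
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) {
			// The real controller derives the Workflow key from the task's
			// labels/owner reference; MetaNamespaceKeyFunc stands in for that here.
			key, err := cache.MetaNamespaceKeyFunc(newObj)
			if err != nil {
				return
			}
			wfQueue.AddRateLimited(key)
		},
	})
}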

Signed-off-by: shuangkun <tsk2013uestc@163.com>
@shuangkun shuangkun force-pushed the fix/LetGCCanExecute branch 2 times, most recently from 354016f to a8077e5 Compare March 31, 2024 05:34
@shuangkun shuangkun closed this Mar 31, 2024
@shuangkun shuangkun reopened this Mar 31, 2024
Signed-off-by: shuangkun <tsk2013uestc@163.com>
@shuangkun
Member Author

Is requeueAfter 5s OK?

@shuangkun shuangkun requested a review from juliev0 April 1, 2024 06:48
Signed-off-by: shuangkun <tsk2013uestc@163.com>
@@ -347,6 +347,7 @@ func (w *When) WaitForWorkflowList(listOptions metav1.ListOptions, condition fun
return w
}
}
time.Sleep(time.Second)
Contributor

@juliev0 juliev0 Apr 1, 2024

Is the idea here that we are causing the rate limiting problem ourselves with too many consecutive queries? As far as I know, all of these tests run in parallel as part of CI so they can all affect each other.

Contributor

Ahh, never mind. It's probably per-client rate limiting, and each test would have its own client, I guess?
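
For readers following along, a simplified stand-in for the polling helper after this change (the real WaitForWorkflowList lives in the E2E test fixtures; this only illustrates why a one-second pause between checks keeps the per-client rate limiter happy):

package e2eutil // hypothetical package for this sketch

import (
	"context"
	"fmt"
	"time"
)

// waitFor polls check once per second until it returns true or the timeout
// expires. Pacing the loop keeps repeated List calls under the API client's
// default QPS, avoiding "would exceed context deadline" rate-limiter errors.
func waitFor(ctx context.Context, timeout time.Duration, check func(context.Context) (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		ok, err := check(ctx)
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("timed out after %s", timeout)
}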

@juliev0
Contributor

juliev0 commented Apr 1, 2024

I think I'm good with this. If there are no objections from the other reviewers I can merge it. Thanks as always for the iterations @shuangkun !

Member

@tczhao tczhao left a comment

Sounds fair to me

@juliev0 juliev0 merged commit fb6c3d0 into argoproj:main Apr 2, 2024
27 checks passed
@shuangkun
Member Author

Thank you everyone for reviews!

@agilgur5 agilgur5 added this to the v3.5.x patches milestone Apr 19, 2024
agilgur5 pushed a commit that referenced this pull request Apr 19, 2024
…2831)

Signed-off-by: shuangkun <tsk2013uestc@163.com>
(cherry picked from commit fb6c3d0)
@agilgur5
Member

agilgur5 commented Apr 19, 2024

Backported cleanly into release-3.5 as ce7cad3
