ArtifactGC not executing on archiveLogs when specified in Workflow level #13421

Open
3 of 4 tasks
segues opened this issue Jul 31, 2024 · 1 comment
Labels
  • area/archive-logs (Archive Logs feature)
  • area/artifacts (S3/GCP/OSS/Git/HDFS etc)
  • area/gc (Garbage collection, such as TTLs, retentionPolicy, delays, and more)
  • P3 (Low priority)
  • solution/workaround (There's a workaround, might not be great, but exists)
  • type/bug

Comments

segues commented Jul 31, 2024

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

The artifactGC pod is not created after a workflow is deleted when the workflow's only artifacts are archived logs. We reproduced this with CronWorkflows as well.
We deploy argo-workflows with Helm.

artifactRepositoryRef:
  artifact-repositories:
    annotations:
      workflows.argoproj.io/default-artifact-repository: default
    default:
      archiveLogs: true
      s3:
        bucket: bucket-name
        endpoint: s3.amazonaws.com
        keyFormat: workflows-logs/{{workflow.name}}/{{workflow.creationTimestamp}}
        useSDKCreds: true

We use archiveLogs: true and AWS IRSA to authenticate from the service accounts (workflow and Argo Server). Our workflow defaults are configured as follows:

workflowDefaults:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 8737
      artifactRepositoryRef:
        configMap: artifact-repositories
        key: default
      serviceAccountName: argo-workflows
      artifactGC:
        strategy: OnWorkflowDeletion
        serviceAccountName: argo-workflows
        forceFinalizerRemoval: true

With the workflow below, both the archived logs and the files saved as output artifacts are correctly removed from the S3 bucket.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion # the overall strategy, which can be overridden
    serviceAccountName: argo-workflows
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "hello world" > /tmp/on-completion.txt
            echo "hello world" > /tmp/on-deletion.txt
      outputs:
        artifacts:
          - name: on-completion
            path: /tmp/on-completion.txt
            s3:
              key: on-completion.txt
            artifactGC:
              strategy: OnWorkflowCompletion # overriding the default strategy for this artifact
          - name: on-deletion
            path: /tmp/on-deletion.txt
            s3:
              key: on-deletion.txt

However, when we comment out the output artifacts in the workflow templates, so that the archived logs are the only artifacts, the logs are not removed from the bucket.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: artifact-gc
  generateName: artifact-gc-
  namespace: workflows
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion # the overall strategy, which can be overridden
    serviceAccountName: argo-workflows
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "hello world" > /tmp/on-completion.txt
            echo "hello world" > /tmp/on-deletion.txt
      # outputs:
        # artifacts:
          # - name: on-completion
          #   path: /tmp/on-completion.txt
          #   s3:
          #     key: workflows-logs/on-completion.txt
          #   artifactGC:
          #     strategy: OnWorkflowCompletion # overriding the default strategy for this artifact
          #     serviceAccountName: argo-workflows
          # - name: on-deletion
          #   path: /tmp/on-deletion.txt
          #   s3:
          #     key: workflows-logs/on-deletion.txt

After this configuration change, the workflow status shows:

artifactGCStatus:
    notSpecified: true

We found a workaround, which is to add this finalizer to the metadata of both Workflows and CronWorkflows:

metadata:
    finalizers:
      - workflows.argoproj.io/artifact-gc

After adding it, the logs are deleted from the S3 bucket, but the underlying issue still needs to be fixed.
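
For reference, here is a minimal sketch of the workaround applied to the log-only reproduction above. It only combines snippets already shown in this report: the namespace, service account, and finalizer name are taken from them. It assumes that archiveLogs: true still comes from the artifact-repositories ConfigMap through the workflow defaults. It has not been verified beyond what is described in this issue.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
  namespace: workflows
  finalizers:
    # Workaround from this report: add the artifact GC finalizer manually,
    # since the controller reports artifactGCStatus.notSpecified: true
    # when the only artifacts are archived logs.
    - workflows.argoproj.io/artifact-gc
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion
    serviceAccountName: argo-workflows
  templates:
    - name: main
      # No output artifacts are declared; only the archived logs exist.
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "hello world"
```

With this manifest, deleting the workflow should trigger the artifact GC pod, as described above for the workaround.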

Version(s)

v3.5.8

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: artifact-gc
  generateName: artifact-gc-
  namespace: workflows
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion # the overall strategy, which can be overridden
    serviceAccountName: argo-workflows
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "hello world" > /tmp/on-completion.txt
            echo "hello world" > /tmp/on-deletion.txt
      outputs:
        artifacts:
          - name: on-completion
            path: /tmp/on-completion.txt
            s3:
              key: workflows-logs/on-completion.txt
            artifactGC:
              strategy: OnWorkflowCompletion # overriding the default strategy for this artifact
              serviceAccountName: argo-workflows
          - name: on-deletion
            path: /tmp/on-deletion.txt
            s3:
              key: workflows-logs/on-deletion.txt

Logs from the workflow controller

kubectl logs -n argo deploy/argo-workflows-workflow-controller | grep artifact-gc
time="2024-07-31T12:48:46.359Z" level=info msg="Processing workflow" Phase= ResourceVersion=1459135652 namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:46.369Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=0 workflow=artifact-gc
time="2024-07-31T12:48:46.369Z" level=info msg="Updated phase  -> Running" namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:46.369Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:46.369Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:46.369Z" level=info msg="Pod node artifact-gc initialized Pending" namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:46.453Z" level=info msg="Created pod: artifact-gc (artifact-gc)" namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:46.453Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:46.453Z" level=info msg=reconcileAgentPod namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:46.464Z" level=info msg="Workflow update successful" namespace=workflows phase=Running resourceVersion=1459135656 workflow=artifact-gc
time="2024-07-31T12:48:56.436Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=1459135656 namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:56.437Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=1 workflow=artifact-gc
time="2024-07-31T12:48:56.437Z" level=info msg="task-result changed" namespace=workflows nodeID=artifact-gc workflow=artifact-gc
time="2024-07-31T12:48:56.437Z" level=info msg="node changed" namespace=workflows new.message= new.phase=Succeeded new.progress=0/1 nodeID=artifact-gc old.message= old.phase=Pending old.progress=0/1 workflow=artifact-gc
time="2024-07-31T12:48:56.437Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:56.437Z" level=info msg=reconcileAgentPod namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:56.437Z" level=info msg="Updated phase Running -> Succeeded" namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:56.438Z" level=info msg="Marking workflow completed" namespace=workflows workflow=artifact-gc
time="2024-07-31T12:48:56.444Z" level=info msg="cleaning up pod" action=deletePod key=workflows/artifact-gc-1340600742-agent/deletePod
time="2024-07-31T12:48:56.447Z" level=info msg="Workflow update successful" namespace=workflows phase=Succeeded resourceVersion=1459135849 workflow=artifact-gc
time="2024-07-31T12:48:56.464Z" level=info msg="cleaning up pod" action=labelPodCompleted key=workflows/artifact-gc/labelPodCompleted
time="2024-07-31T12:50:32.699Z" level=info msg="reconciling artifact-gc pod" message= namespace=workflows phase=Succeeded pod=hello-world-1722430080-artgc-wfdel-3857365535 workflow=hello-world-1722430080
time="2024-07-31T12:52:33.673Z" level=info msg="reconciling artifact-gc pod" message= namespace=workflows phase=Succeeded pod=hello-world-1722430200-artgc-wfdel-3857365535 workflow=hello-world-1722430200

Logs from your workflow's wait container

kubectl logs -n workflows -c wait -l workflows.argoproj.io/workflow=artifact-gc,workflow.argoproj.io/phase!=Succeeded
time="2024-07-31T12:58:18.288Z" level=info msg="No output artifacts"
time="2024-07-31T12:58:18.288Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: workflows-logs/artifact-gc/2024-07-31T12:58:14Z/main.log"
time="2024-07-31T12:58:18.301Z" level=info msg="Creating minio client using AWS SDK credentials"
time="2024-07-31T12:58:18.365Z" level=info msg="Saving file to s3" bucket=sgs-argo-prod-eu endpoint=s3.amazonaws.com key="workflows-logs/artifact-gc/2024-07-31T12:58:14Z/main.log" path=/tmp/argo/outputs/logs/main.log
time="2024-07-31T12:58:18.455Z" level=info msg="Save artifact" artifactName=main-logs duration=167.080514ms error="<nil>" key="workflows-logs/artifact-gc/2024-07-31T12:58:14Z/main.log"
time="2024-07-31T12:58:18.455Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2024-07-31T12:58:18.455Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2024-07-31T12:58:18.471Z" level=info msg="Alloc=10018 TotalAlloc=17754 Sys=24677 NumGC=5 Goroutines=12"
time="2024-07-31T12:58:18.477Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2024-07-31T12:58:18.477Z" level=info msg="Deadline monitor stopped"

agilgur5 changed the title from "ArtifactGC not executing when specified in Workflow level" to "ArtifactGC not executing on archiveLogs when specified in Workflow level" on Jul 31, 2024
agilgur5 added the P3, area/archive-logs, area/gc, and area/artifacts labels on Jul 31, 2024

@agilgur5
Member

This sounds similar if not identical to #13338, cc @juliev0. Also as I wrote there:

Since we don't recommend archive logs in the docs, I'm not sure it makes sense to fix or change this; we may very well remove the archive logs feature entirely

agilgur5 added the solution/workaround label on Jul 31, 2024

2 participants