Output artifact gets deleted even when it resides on a volume mount #4676
Comments
Interesting. Could be pretty nasty. We'll discuss.
Perhaps there is no need to delete the artifact files after uploading them.
This scenario seems pretty strange. If the output data is generated by the component, then it's usually located on the local disk, not the mounted volume.
Perhaps it would be better to make uploading explicit and add an …
It's actually quite common, and the recommended way to pass data between steps without an artifact upload followed by a download. We are talking about "heavy" artifacts, multiple gigabytes.
As an optimization we typically do that, so that the upload can happen concurrently with other workflow steps (otherwise the step that creates the output data doesn't end until the artifact upload is complete, preventing the dependent steps from starting). However, this upload template is just a dummy busybox …
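The concurrent-upload pattern described above might be sketched roughly as follows (hypothetical template, volume, and artifact names; an illustration of the idea, not the actual workflow from this thread):

```yaml
# Hypothetical sketch: a dedicated step exposes a file that an earlier
# step already wrote to the shared volume as an output artifact, so the
# upload runs concurrently with other steps that read the same volume.
- name: upload-from-volume
  container:
    image: busybox
    command: [sh, -c]
    # Does no real work; the data is already on the shared volume.
    args: ["ls -l /mnt/vol/data.bin"]
    volumeMounts:
      - name: workdir
        mountPath: /mnt/vol
  outputs:
    artifacts:
      - name: data
        path: /mnt/vol/data.bin
```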
…proj#4676 Signed-off-by: Alex Collins <alex_collins@intuit.com>
@antoniomo, I've created a dev build for you to test. Can you please test …
Thanks, I can give it a go over the weekend!
Hi! I used this modified example workflow for testing:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: volumes-existing-
spec:
  entrypoint: volumes-existing-example
  volumes:
    # Pass my-existing-volume as an argument to the volumes-existing-example template
    # Same syntax as k8s Pod spec
    - name: workdir
      persistentVolumeClaim:
        claimName: my-existing-volume
  templates:
    - name: volumes-existing-example
      steps:
        - - name: generate
            template: whalesay
        - - name: print
            template: print-message
    - name: whalesay
      container:
        image: docker/whalesay:latest
        command: [sh, -c]
        args: ["echo generating message in volume; cowsay hello world | tee /mnt/vol/hello_world.txt"]
        volumeMounts:
          - name: workdir
            mountPath: /mnt/vol
      outputs:
        artifacts:
          - name: hello
            path: /mnt/vol/hello_world.txt
    - name: print-message
      container:
        image: alpine:latest
        command: [sh, -c]
        args: ["echo getting message from volume; find /mnt/vol; cat /mnt/vol/hello_world.txt"]
        volumeMounts:
          - name: workdir
            mountPath: /mnt/vol
```

It completes just fine, including the expected output on the second step, which reads from the volume after the output artifact upload (to the default artifact storage here). The logs of the wait container on the first step correctly show:
So I think it works as expected and is good to go :) Thank you!
Summary
If an output artifact resides on a mounted PVC or some other volume mount, it shouldn't be deleted after upload.
The code at https://github.com/argoproj/argo/blob/master/workflow/executor/executor.go#L343-L348 assumes that the artifact is on the container's ephemeral storage, but that might not be the case.
We use a volume to share data quickly between workflow steps. However, we want to upload some of the outputs produced in intermediate steps as output artifacts right away, so we can also use them elsewhere. We see that Argo removes the files of those output artifacts, so we can't use them in the next steps, which goes against the advice recommended here: https://github.com/argoproj/argo/blob/master/docs/cost-optimisation.md#consider-trying-volume-claim-templates-or-volumes-instead-of-artifacts
Diagnostics
What Kubernetes provider are you using? 1.18
What version of Argo Workflows are you running? 2.9.3
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.