
Argo tries to untar artifacts, even if they aren't tarballs (when archive is none) #3012

Closed
4 tasks done
youngjoon-lee opened this issue May 12, 2020 · 5 comments · Fixed by #3014
youngjoon-lee commented May 12, 2020

Checklist:

  • I've included the version.
  • I've included reproduction steps.
  • I've included the workflow YAML.
  • I've included the logs.

What happened:
When I generate an artifact that is a .gz file and set archive to none, the artifact is uploaded as-is, without archiving.
But when it is used as an input artifact, Argo tries to untar it, even though it is not a tarball.
This happens only when the file is a .gz and is very small.
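To illustrate the mismatch (a local sketch, not Argo's code): the small artifact is a perfectly ordinary gzip stream, so neither the extension nor the magic bytes tell the executor that untarring is safe.

```shell
# Hypothetical check, not Argo's actual logic: inspect the first two bytes.
# Every gzip stream starts with the magic bytes 1f 8b, whether or not a
# tarball is inside, so the header alone cannot justify an untar.
echo "hello" > small.csv
gzip -c small.csv > small.csv.gz
magic=$(od -An -tx1 -N2 small.csv.gz | tr -d ' \n')
echo "magic=${magic}"
# The payload is a plain CSV line, not a tar archive:
zcat small.csv.gz
```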

What you expected to happen:
I expect Argo not to try to untar it.

How to reproduce it (as minimally and precisely as possible):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: generate-small
        template: generate-small
      - name: generate-big
        template: generate-big
    - - name: consume-artifact
        template: print-contents
        arguments:
          artifacts:
          - name: small
            from: "{{steps.generate-small.outputs.artifacts.csv-gz}}"
          - name: big
            from: "{{steps.generate-big.outputs.artifacts.csv-gz}}"

  - name: generate-small
    script:
      image: debian:latest
      command: [bash]
      source: |
        echo "hello" > small.csv
        gzip -c small.csv > /tmp/small.csv.gz
    outputs:
      artifacts:
      - name: csv-gz
        path: /tmp/small.csv.gz
        archive:
          none: {}

  - name: generate-big
    script:
      image: debian:latest
      command: [bash]
      source: |
        touch big.csv
        for i in {1..100}; do echo "helloworld-${i}" >> big.csv; done
        gzip -c big.csv > /tmp/big.csv.gz
    outputs:
      artifacts:
      - name: csv-gz
        path: /tmp/big.csv.gz
        archive:
          none: {}

  - name: print-contents
    inputs:
      artifacts:
      - name: small
        path: /tmp/small.csv.gz
      - name: big
        path: /tmp/big.csv.gz
    script:
      image: debian:latest
      command: [bash]
      source: |
        zcat /tmp/small.csv.gz 
        zcat /tmp/big.csv.gz 

Anything else we need to know?:
I used S3 as the artifact repository.

Environment:

  • Argo version:
$ argo version
argo: v2.7.2
  BuildDate: 2020-04-10T19:24:53Z
  GitCommit: c52a65aa62426f5e874e1d3f1058af15c43eb35f
  GitTreeState: clean
  GitTag: v2.7.2
  GoVersion: go1.13.4
  Compiler: gc
  Platform: linux/amd64
  • Kubernetes version :
$ kubectl version -o yaml
clientVersion:
  buildDate: "2020-01-18T23:30:10Z"
  compiler: gc
  gitCommit: 59603c6e503c87169aea6106f57b9f242f64df89
  gitTreeState: clean
  gitVersion: v1.17.2
  goVersion: go1.13.5
  major: "1"
  minor: "17"
  platform: linux/amd64
serverVersion:
  buildDate: "2020-04-03T15:20:08Z"
  compiler: gc
  gitCommit: 1fc58d90cc4bc569394d9a882522adcb99fbaa57
  gitTreeState: clean
  gitVersion: v1.16.7
  goVersion: go1.13.6
  major: "1"
  minor: "16"
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow result:
big.csv.gz was loaded correctly, but small.csv.gz was extracted into a directory because of the untar.
$ argo logs test-qp8r2 | head -n 5
test-qp8r2-886468555: gzip: /tmp/small.csv.gz is a directory -- ignored
test-qp8r2-886468555: helloworld-1
test-qp8r2-886468555: helloworld-2
test-qp8r2-886468555: helloworld-3
test-qp8r2-886468555: helloworld-4
  • executor logs:

From the logs, we can see that Argo recognized small.csv.gz as a tarball.

$ kubectl -n argo logs test-qp8r2-886468555 -c init
time="2020-05-12T18:29:26Z" level=info msg="Getting from s3 (endpoint: argo-artifacts.argo:9000, bucket: argo-artifacts, key: test-qp8r2/test-qp8r2-4025503258/small.csv.gz) to /argo/inputs/artifacts/small.tmp"
time="2020-05-12T18:29:26Z" level=info msg="[tar -tf /argo/inputs/artifacts/small.tmp]"
time="2020-05-12T18:29:26Z" level=info msg="tar -xf /argo/inputs/artifacts/small.tmp -C /argo/inputs/artifacts/small.tmpdir"
time="2020-05-12T18:29:26Z" level=info msg="Successfully download file: /argo/inputs/artifacts/small"
time="2020-05-12T18:29:26Z" level=info msg="Downloading artifact: big"
time="2020-05-12T18:29:26Z" level=info msg="S3 Load path: /argo/inputs/artifacts/big.tmp, key: test-qp8r2/test-qp8r2-3282682115/big.csv.gz"
time="2020-05-12T18:29:26Z" level=info msg="Creating minio client argo-artifacts.argo:9000 using static credentials"
time="2020-05-12T18:29:26Z" level=info msg="Getting from s3 (endpoint: argo-artifacts.argo:9000, bucket: argo-artifacts, key: test-qp8r2/test-qp8r2-3282682115/big.csv.gz) to /argo/inputs/artifacts/big.tmp"
time="2020-05-12T18:29:26Z" level=info msg="[tar -tf /argo/inputs/artifacts/big.tmp]"
time="2020-05-12T18:29:26Z" level=info msg="Successfully download file: /argo/inputs/artifacts/big"

Message from the maintainers:

If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@alexec
Contributor

alexec commented May 12, 2020

Thank you for your bug report. Would you be interested in helping test or even develop the fix?

@youngjoon-lee
Contributor Author

> Thank you for your bug report. Would you be interested in helping test or even develop the fix?

Hi @alexec
Thank you for your quick reply! :) I actually made a PR #3014.
I hope it can be reviewed when you have time.

@alexec alexec self-assigned this May 12, 2020
@jdu
Contributor

jdu commented Jun 17, 2020

Is there a workaround for this currently until a new stable release is out? I'm hitting this problem pretty hard right now with .json.gz files.

@youngjoon-lee
Contributor Author

youngjoon-lee commented Jun 17, 2020

@jdu I recommend generating output artifacts without the following option until a new stable release is out.

archive:
  none: {}

Then Argo will tar/untar the artifact automatically, even when it consists of only one file.
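The round trip under that default behaviour looks roughly like this (a local sketch, assuming the default archive format is a gzipped tarball; this is not Argo's actual code path):

```shell
# Upload side (wait container, sketched): the single file is wrapped in a
# real tarball, so the download side's untar is always correct.
echo "hello" > small.csv
tar -czf artifact.tgz small.csv
# Download side (init container, sketched): untarring recovers the file.
mkdir -p out
tar -xzf artifact.tgz -C out
cat out/small.csv
```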

@jdu
Contributor

jdu commented Jun 17, 2020

@youngjoon-lee Ah yes, the issue is the artifact is an input, and the input is coming from an S3 bucket where argo wasn't the one that populated the S3 bucket.

Will have to rework things a bit and pull the file inside of the task so it doesn't get untarred for now.
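That rework can be sketched like this (hypothetical; in a real template the copy line would be an S3 fetch such as `aws s3 cp s3://BUCKET/KEY ...` with placeholder names, run inside the main container so argoexec never probes the file; here a local copy stands in for the fetch):

```shell
# Stand-in for the remote bucket (placeholder paths, local simulation only).
mkdir -p /tmp/fake-bucket
printf '{"ok":true}\n' | gzip > /tmp/fake-bucket/data.json.gz
# Stands in for: aws s3 cp s3://BUCKET/data.json.gz /tmp/data.json.gz
cp /tmp/fake-bucket/data.json.gz /tmp/data.json.gz
# The file arrives untouched, so zcat works as expected.
zcat /tmp/data.json.gz
```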

Thanks.
