
fix: oss list bucket return all records #12084

Merged 1 commit on Oct 27, 2023

Conversation

shuangkun
Member

When listing a bucket with more than 100 objects, the response returns only 100 records, so this PR adds a loop to return all records.

Fixes #TODO

Motivation

Modifications

Verification

Member

@terrytangyuan left a comment


Thank you. Can you add some notes in the code? Is there any reference to the 100 records limit?

@shuangkun
Member Author

Thank you. Can you add some notes in the code? Is there any reference to the 100 records limit?

https://help.aliyun.com/zh/oss/user-guide/list-objects-18?spm=a2c4g.11186623.0.0.275f7711UTgi3H
There are some official examples:

    continueToken := ""
    for {
        lsRes, err := bucket.ListObjectsV2(oss.ContinuationToken(continueToken))
        if err != nil {
            HandleError(err)
        }
        // Print the listing results. By default, 100 records are returned at a time.
        for _, object := range lsRes.Objects {
            fmt.Println(object.Key, object.Type, object.Size, object.ETag, object.LastModified, object.StorageClass)
        }
        if lsRes.IsTruncated {
            continueToken = lsRes.NextContinuationToken
        } else {
            break
        }
    }

Member

@terrytangyuan left a comment


Should we also bump max-keys to 1000?

@shuangkun
Member Author

Should we also bump max-keys to 1000?

Maybe we need a configuration option to control it, because some customers' OSS buckets are indeed quite large, with more than 1000 objects.

@shuangkun
Member Author

/CI

@@ -236,12 +236,21 @@ func (ossDriver *ArtifactDriver) ListObjects(artifact *wfv1.Artifact) ([]string,
if err != nil {
return !isTransientOSSErr(err), err
}
results, err := bucket.ListObjectsV2(oss.Prefix(artifact.OSS.Key))
Member


Was oss.Prefix removed accidentally?

Member Author


I forgot it. Thanks.

Comment on lines 249 to 255
continueToken = results.NextContinuationToken
if !results.IsTruncated {
break

}
Member


We should check whether it's truncated before accessing the next continuation token.

Member Author


Yes.
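The ordering the reviewer asks for can be sketched as a self-contained pagination loop. Note this is an illustrative sketch, not the PR's actual code: `pager`, `page`, and `fakePager` below are hypothetical stand-ins for the OSS bucket client and its `ListObjectsV2` response, so the loop can run without the SDK.

```go
package main

import "fmt"

// page mirrors the shape of an OSS ListObjectsV2 response: a batch of
// keys, a truncation flag, and a continuation token that is only
// meaningful when IsTruncated is true.
type page struct {
	Keys                  []string
	IsTruncated           bool
	NextContinuationToken string
}

// pager is a hypothetical stand-in for the OSS bucket client.
type pager interface {
	List(continueToken string) (page, error)
}

// listAll collects every key by paging until IsTruncated is false.
// Note the order: check IsTruncated first, and only then read
// NextContinuationToken, as requested in the review.
func listAll(p pager) ([]string, error) {
	var keys []string
	token := ""
	for {
		res, err := p.List(token)
		if err != nil {
			return nil, err
		}
		keys = append(keys, res.Keys...)
		if !res.IsTruncated {
			break
		}
		token = res.NextContinuationToken
	}
	return keys, nil
}

// fakePager serves 100 keys per call over a fixed key space,
// mimicking the default OSS page size.
type fakePager struct{ total int }

func (f fakePager) List(token string) (page, error) {
	start := 0
	if token != "" {
		fmt.Sscanf(token, "%d", &start)
	}
	end := start + 100
	if end > f.total {
		end = f.total
	}
	keys := make([]string, 0, end-start)
	for i := start; i < end; i++ {
		keys = append(keys, fmt.Sprintf("obj-%03d", i))
	}
	return page{
		Keys:                  keys,
		IsTruncated:           end < f.total,
		NextContinuationToken: fmt.Sprintf("%d", end),
	}, nil
}

func main() {
	keys, err := listAll(fakePager{total: 250})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(keys)) // all 250 keys, not just the first 100-record page
}
```

Reading the token only inside the truncated branch also avoids carrying a stale or empty token into the next request when the listing is already complete.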

if err != nil {
return !isTransientOSSErr(err), err
}
// Add to files. By default, 100 records are returned at a time.
Member Author


OK.

@terrytangyuan
Member

Can you address my comments and then run some tests if you have OSS available?

@terrytangyuan
Member

Can you test it against actual OSS environment and see if everything is working correctly?

@shuangkun
Member Author

Can you test it against actual OSS environment and see if everything is working correctly?

OK. I'll run some tests later.

@shuangkun
Member Author

Can you test it against actual OSS environment and see if everything is working correctly?

I ran a test using the workflow below. It works well.
After fixing this problem, more than 100 steps can be triggered when the directory being listed has more than 100 files (previously only 100 were triggered).

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: data-transformations-
  labels:
    workflows.argoproj.io/test: "true"
  annotations:
    workflows.argoproj.io/description: |
      This workflow demonstrates using a data template to list in an S3 bucket
      and then process those log files.
    workflows.argoproj.io/version: ">= 3.1.0"
spec:
  entrypoint: data-transformations
  templates:
    - name: data-transformations
      steps:
        - - name: list-log-files
            template: list-log-files
        - - name: process-logs
            template: process-logs
            withParam: "{{steps.list-log-files.outputs.result}}"
            arguments:
              artifacts:
                - name: file
                  oss:
                    endpoint: http://oss-cn-hangzhou.aliyuncs.com
                    bucket: bucket-workflow-artifect
                    accessKeySecret:
                      name: bucket-workflow-artifect-credentials
                      key: accessKey
                    secretKeySecret:
                      name: bucket-workflow-artifect-credentials
                      key: secretKey
                    key: "{{item}}"

              parameters:
                - name: file-name
                  value: "{{item}}"

    - name: list-log-files
      data:
        source:
          artifactPaths:
            name: test-bucket
            oss:
              endpoint: http://oss-cn-hangzhou.aliyuncs.com
              bucket: bucket-workflow-artifect
              key: artifact-passing-wsl88 # this is path in the bucket
              accessKeySecret:
                name: bucket-workflow-artifect-credentials
                key: accessKey
              secretKeySecret:
                name: bucket-workflow-artifect-credentials
                key: secretKey

        transformation:
          - expression: "filter(data, {# startsWith \"arti\"})"
      outputs:
        artifacts:
          - name: file
            path: /tmp

    - name: process-logs
      inputs:
        parameters:
          - name: file-name
        artifacts:
          - name: file
            path: /tmp
      container:
        image: argoproj/argosay:v2
        command: [ sh, -c ]
        args:
          - |
            echo {{inputs.parameters.file-name}}
            head /tmp

@shuangkun changed the title fix: oss list bucket return all records Oct 27, 2023
Signed-off-by: shuangkun <tsk2013uestc@163.com>
Signed-off-by: zjgemi <liuxin_zijian@163.com>
Member

@terrytangyuan left a comment


Thanks!

@terrytangyuan enabled auto-merge (squash) October 27, 2023 14:58
@terrytangyuan merged commit fa15743 into argoproj:master Oct 27, 2023
26 of 27 checks passed
dpadhiar pushed a commit to dpadhiar/argo-workflows that referenced this pull request May 9, 2024
Signed-off-by: Dillen Padhiar <dillen_padhiar@intuit.com>
Labels
area/artifacts S3/GCP/OSS/Git/HDFS etc
3 participants