Upload Folder to s3 Bucket #97

Open

jaibhavaya opened this issue Apr 9, 2018 · 11 comments

@jaibhavaya
Hello,

I see that there is currently no way to upload a folder, rather than a single file, to S3. Is there a reason this wasn't implemented? From some googling, the Go AWS SDK doesn't seem to offer a straightforward way to do this, in stark contrast to the Java SDK and even the Node SDK. I would love to contribute this if the methods exist, but wanted to make sure there wasn't a particular reason it hadn't been implemented yet.

Thanks!

Jacob


marco-m commented May 31, 2018

I think it is because an S3 bucket is not a filesystem, so uploading a directory (which might be recursive) is tricky to get right. The usual Concourse approach is simply to create an archive file containing the directory you want to upload (perhaps filtering down to only the files you are really interested in). This works fine. There are also third-party resources that do what you want, for example https://github.com/18F/s3-resource-simple

See also #66


marco-m commented Sep 28, 2018

@vito are you open to this possibility or do you consider this out of scope for s3-resource ?


vito commented Sep 28, 2018

Sounds related to #55 too. As long as the resource can sanely version everything that it's uploading (perhaps by having a version number in the folder name or all filenames), it could make sense, but I'd still like to see the real-world use case for this (where uploading a .tgz would also not suffice) prior to someone implementing it. So far the ask has just been to upload a directory, but no one's given the context as to why. 🙂


marco-m commented Sep 28, 2018

I can give one use case: from a Concourse pipeline, we upload two types of artifacts to S3: a .tgz with s3-resource, and a directory with s3-resource-simple. We upload a directory because we use an S3 bucket to publish a project's HTML documentation. Uploading the directory directly is nice because the bucket can then be "served" as-is. The advantage is simplicity: you don't need to find a way to "unpack" the .tgz on S3. Granted, I am not sure this justifies the added complexity in s3-resource, so I don't think this single use case is enough on its own.


vito commented Sep 28, 2018

Makes sense, with the caveat being that it would only make sense to use this resource for that if you also intend to version the documentation. If you're just looking to upload (and possibly replace) the docs each time you publish, that'd be better served by a publishing task or something.


marco-m commented Sep 28, 2018

Yes, I forgot to mention that. The documentation is versioned; it is associated with a given commit.

@EduardoAC

We have another use case for this kind of resource: serving static assets from S3. We want to upload a folder containing all our static assets, to be served from the CDN connected to the bucket.

Currently, we are doing this through the aws s3 sync command line. I feel that using a resource would be much cleaner.

@jchampio

@EduardoAC +1, that's our use case as well.

@kallisti5

Why isn't this a feature? It seems pretty basic to upload "a directory of objects" into object storage.
We don't even care about versioning; we handle versioning on our own via the file names.
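Versioning via file names is essentially what the resource's existing regexp option does: each object key is matched against a pattern and the first capture group is treated as the version. A rough, illustrative stdlib sketch of that idea (not the resource's actual code; plain string comparison stands in for real version ordering):

```go
package main

import (
	"fmt"
	"regexp"
)

// latestVersion loosely mimics a regexp-based check: match each key
// against the pattern and keep the highest first capture group. Real
// version comparison is more involved; lexical string comparison is
// used here for brevity.
func latestVersion(keys []string, pattern string) (string, bool) {
	re := regexp.MustCompile(pattern)
	best := ""
	for _, k := range keys {
		m := re.FindStringSubmatch(k)
		if m == nil {
			continue // key doesn't carry a version, skip it
		}
		if m[1] > best {
			best = m[1]
		}
	}
	return best, best != ""
}

func main() {
	keys := []string{"pkg/app-0.1.0.tgz", "pkg/app-0.2.0.tgz", "pkg/readme.txt"}
	v, ok := latestVersion(keys, `app-([\d.]+)\.tgz`)
	fmt.Println(v, ok)
	// prints: 0.2.0 true
}
```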


vito commented Aug 13, 2019

@kallisti5 It's not a feature because no one has PR'd it. 🙂 Our team isn't big enough to handle everything ourselves.


kallisti5 commented Aug 19, 2019

@vito fair. I started hacking away at things. The theory at the moment is that "disabling versioning" results in the ability to "just upload things":

diff --git a/README.md b/README.md
index 078532b..ff11366 100644
--- a/README.md
+++ b/README.md
@@ -35,6 +35,8 @@ version numbers.
 * `disable_ssl`: *Optional.* Disable SSL for the endpoint, useful for S3
   compatible providers without SSL.
 
+* `skip_versioning`: *Optional* Don't version artifacts, any previous artifacts will be overwritten.
+
 * `skip_ssl_verification`: *Optional.* Skip SSL verification for S3 endpoint. Useful for S3 compatible providers using self-signed SSL certificates.
 
 * `skip_download`: *Optional.* Skip downloading object from S3. Useful only trigger the pipeline without using the object.
@@ -51,7 +53,7 @@ version numbers.
 
 ### File Names
 
-One of the following two options must be specified:
+For versioning, one of the following two options must be specified:
 
 * `regexp`: *Optional.* The pattern to match filenames against within S3. The first
   grouped match is used to extract the version, or if a group is explicitly
diff --git a/check/command.go b/check/command.go
index b7d302d..965c184 100644
--- a/check/command.go
+++ b/check/command.go
@@ -24,6 +24,8 @@ func (command *Command) Run(request Request) (Response, error) {
 
        if request.Source.Regexp != "" {
                return command.checkByRegex(request), nil
+       } else if request.Source.SkipVersioning {
+               return command.checkByPath(request), nil
        } else {
                return command.checkByVersionedFile(request), nil
        }
diff --git a/models.go b/models.go
index c2f4adf..6225a1c 100644
--- a/models.go
+++ b/models.go
@@ -15,6 +15,7 @@ type Source struct {
        ServerSideEncryption string `json:"server_side_encryption"`
        SSEKMSKeyId          string `json:"sse_kms_key_id"`
        UseV2Signing         bool   `json:"use_v2_signing"`
+       SkipVersioning       bool   `json:"skip_versioning"`
        SkipSSLVerification  bool   `json:"skip_ssl_verification"`
        SkipDownload         bool   `json:"skip_download"`
        InitialVersion       string `json:"initial_version"`
@@ -25,14 +26,18 @@ type Source struct {
 }
 
 func (source Source) IsValid() (bool, string) {
-       if source.Regexp != "" && source.VersionedFile != "" {
-               return false, "please specify either regexp or versioned_file"
+       if !source.SkipVersioning && (source.Regexp != "" && source.VersionedFile != "") {
+               return false, "please specify either regexp or versioned_file for versioning"
        }
 
        if source.Regexp != "" && source.InitialVersion != "" {
                return false, "please use initial_path when regexp is set"
        }
 
+       if source.SkipVersioning && source.InitialPath != "" {
+               return false, "please use initial_path when not using versioning"
+       }
+
        if source.VersionedFile != "" && source.InitialPath != "" {
                return false, "please use initial_version when versioned_file is set"
        }

The logic is a bit weird, though, given how this resource works. I'm thinking "initial path" would be the local path that gets recursively uploaded?

Haiku is running short on time... we need to get our package repository uploads working, so for now we're just using an s3 client as a workaround, as others have done.
