Add optional parameter to specify a cloud bucket as an output location? #85

Open
gotdan opened this issue Sep 3, 2020 · 1 comment

Comments

Collaborator

gotdan commented Sep 3, 2020

Open Questions:

Auth: Do we want to target only servers that have pre-configured write permissions to a bucket, or do we need a way to pass auth credentials to the server? If the latter, what would this look like?

Path: Do we need information beyond the bucket name, such as a file prefix (e.g., to support a "folder" within the bucket that incorporates a timestamp) or a service provider (to support use cases where the server writes to a bucket hosted by a different cloud vendor)?

Completion: Should we require that the output manifest file be written to the bucket last, so it can serve as an event that triggers follow-up actions (e.g., a de-id or db load), or would we expect clients to use job polling to determine when all files have been written?
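To make the Path and Completion questions concrete, here is a rough sketch of the "manifest last" ordering. Everything in it is an assumption for discussion: the `OutputLocation` fields (`bucket`, `prefix`, `provider`), the `write_export` helper, and the `manifest.json` name are hypothetical, and a local directory stands in for the bucket so the example runs without any cloud SDK.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class OutputLocation:
    # Hypothetical parameter shape; none of these field names are in the spec.
    bucket: str
    prefix: str = ""    # e.g. a timestamped "folder" inside the bucket
    provider: str = ""  # e.g. which cloud vendor hosts the bucket

    def key(self, filename: str) -> str:
        # Join prefix and filename into an object key, skipping an empty prefix.
        return "/".join(p for p in (self.prefix.strip("/"), filename) if p)


def write_export(root: Path, loc: OutputLocation, files: dict) -> list:
    """Write all data files first, then the manifest; return keys in write order."""
    bucket_dir = root / loc.bucket  # local directory standing in for the bucket
    written = []
    for name, body in files.items():
        target = bucket_dir / loc.key(name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
        written.append(loc.key(name))

    # The manifest goes last, so its arrival can signal that the export is complete.
    manifest_key = loc.key("manifest.json")
    manifest_path = bucket_dir / manifest_key
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps({"output": [{"url": k} for k in written]}))
    written.append(manifest_key)
    return written


if __name__ == "__main__":
    demo_root = Path(tempfile.mkdtemp())
    demo_loc = OutputLocation(bucket="example-bucket", prefix="exports/2020-09-03")
    print(write_export(demo_root, demo_loc, {"Patient.ndjson": "{}\n"}))
```

With this ordering, a bucket notification filtered on the manifest key could kick off the follow-up job, while clients that prefer polling would be unaffected.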

Collaborator

jmandel commented Sep 3, 2020

Re: auth and paths, I've been super impressed with the open-source rclone project, which has thought carefully and comprehensively about authorization for bucket access.

They have a JSON file describing their schema for cloud storage services, including provider types (e.g., s3) and the providers that offer endpoints (e.g., AWS, DigitalOcean, or Wasabi, all of which offer s3-compatible APIs). So I might have a remote configured like:

{
    "access_key_id": "redacted",
    "acl": "private",
    "endpoint": "s3.wasabisys.com",
    "env_auth": "false",
    "provider": "Wasabi",
    "secret_access_key": "redacted",
    "type": "s3"
}

Anyway, if we wanted to standardize on how to convey access, the rclone config format is a great place to look.
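As a rough illustration of how a server might consume such a description, the sketch below renders the JSON above as a section of an rclone config file, which uses an INI-style layout with one `[name]` section per remote. The remote name `wasabi` and the helper `to_rclone_conf` are made up for this example, and nothing here is a proposed wire format.

```python
import configparser
import io
import json

# The remote description from above, as it might arrive in a request.
REMOTE_JSON = """
{
    "access_key_id": "redacted",
    "acl": "private",
    "endpoint": "s3.wasabisys.com",
    "env_auth": "false",
    "provider": "Wasabi",
    "secret_access_key": "redacted",
    "type": "s3"
}
"""


def to_rclone_conf(name: str, remote: dict) -> str:
    """Render one remote as an INI section (hypothetical helper)."""
    # Interpolation is disabled so '%' characters in secrets are kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser[name] = {key: str(value) for key, value in remote.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_rclone_conf("wasabi", json.loads(REMOTE_JSON)))
```

A server that accepts credentials this way would still have to decide how to protect them in transit and at rest, which is part of the auth question above.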

We might also try to profile some "common denominator" of shared access signatures / signed URLs at the bucket level.


... but even if we leave authorization out of band, I think having a way to point to a bucket would be lovely.
