
bulkio: support passing key in query param for google cloud storage #31602

Closed
dt opened this issue Oct 18, 2018 · 6 comments
Labels
A-disaster-recovery C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

dt (Member) commented Oct 18, 2018

Motivation/Context

Currently, interacting with files on Google Cloud Storage in BACKUP/RESTORE/IMPORT/EXPORT requires using either the implicit "instance" auth (where file access is done using the VM's credentials) or a key stored in the cloudstorage.gs.default.key cluster setting (selected with auth=default). This setting was devised and named with the intent that we could later define other settings to store multiple keys, though support for arbitrary, user-defined setting names does not yet exist. We went with this approach because the "key" for GCS access is a large, cumbersome JSON blob that we judged would be clunky to put in a query parameter.

However, using settings has some drawbacks: only the root user can read and write them, and they are global to a cluster, so anyone who can use the cluster can use them. This is less than ideal; a user who can access a specific GCS bucket should be able to back up or restore to it independently of what a different user on the same cluster can do. Making auth per-statement avoids this issue. Most of the time a human isn't running a BACKUP by hand anyway; a script composes the query from variables, so it doesn't matter how large a key is (up to the maximum URL length).

Plan

  1. Allow supplying the GCS key JSON bundle as a base64-encoded blob in a query param to bulk IO ops.
  2. Deprecate the GCS key setting.
  3. Add a setting, per SDK, that enables "implicit"/"instance" auth, defaulting to disabled, so that simply being allowed to run bulk IO ops doesn't automatically grant access to the buckets that the machine's account can reach.
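As a sketch of step 1, the key JSON could be base64-encoded and attached as a query parameter when composing the statement. The parameter names `AUTH` and `CREDENTIALS` below are illustrative assumptions for this plan, not a final interface:

```python
import base64
import json
from urllib.parse import urlencode

def gcs_uri_with_key(bucket, path, key_json):
    """Compose a bulk-IO storage URI carrying a base64-encoded GCS key.

    `AUTH` and `CREDENTIALS` are hypothetical parameter names used for
    illustration; the exact names are up to the implementation.
    """
    # URL-safe base64 avoids characters that would need percent-encoding.
    encoded = base64.urlsafe_b64encode(key_json.encode("utf-8")).decode("ascii")
    params = urlencode({"AUTH": "specified", "CREDENTIALS": encoded})
    return f"gs://{bucket}/{path}?{params}"

# A script composes the statement from variables, so key size is not a
# problem for a human operator:
key = json.dumps({"type": "service_account", "project_id": "my-project"})
uri = gcs_uri_with_key("my-bucket", "backups/2018-10-18", key)
statement = f"BACKUP DATABASE bank TO '{uri}';"
```

Because the key travels with the statement rather than living in a cluster setting, each user supplies credentials for exactly the bucket they can access.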
dt (Member, Author) commented Oct 18, 2018

rolandcrosby commented

This seems like a good plan. Does 3) mean you're also considering implementing implicit auth for AWS S3 when CockroachDB is running on AWS EC2? We got some questions about this recently from a prospective customer, and it sounded like a relatively simple addition using the ec2rolecreds package in the Go SDK for S3.

rolandcrosby commented

We should also consider how to allow users to pass additional parameters to our various cloud storage options. For example, S3 allows users to configure server-side encryption for PutObject requests via three x-amz-server-side-encryption headers; I'm sure other cloud storage services have similar options that customers will ask us to implement at some point. Do you have any thoughts about how to extensibly support these kinds of cloud-service-specific options?
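One extensible approach to such service-specific options (purely a sketch of the idea raised above, not anything CockroachDB has implemented) would be to reserve a query-parameter prefix that is forwarded verbatim as request headers to the storage service:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical convention for illustration: any query parameter prefixed
# with "header_" is stripped of the prefix and forwarded as a request
# header to the cloud storage service; everything else is an auth/config
# parameter consumed by the bulk IO layer itself.
HEADER_PREFIX = "header_"

def extract_storage_options(uri):
    """Split a storage URI's query string into bulk-IO params and
    pass-through request headers (hypothetical scheme)."""
    query = parse_qs(urlparse(uri).query)
    params, headers = {}, {}
    for key, values in query.items():
        if key.startswith(HEADER_PREFIX):
            headers[key[len(HEADER_PREFIX):]] = values[0]
        else:
            params[key] = values[0]
    return params, headers

params, headers = extract_storage_options(
    "s3://bucket/path?AUTH=implicit&header_x-amz-server-side-encryption=AES256"
)
```

This keeps the core URI grammar small while letting users reach provider-specific features (like the S3 server-side-encryption headers mentioned above) without a new code change per option.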

maddyblue (Contributor) commented

We can add more of these options as requested. I think they are pretty straightforward.

dt (Member, Author) commented Oct 18, 2018

Let's file them individually though -- I'd like to be able to close this one once we implement the described auth params.

petermattis added the C-enhancement label Oct 19, 2018
thoszhang pushed a commit to thoszhang/cockroach that referenced this issue Nov 21, 2018
Previously, the options for GCS authentication for bulk IO were:

* set the value of `cloudstorage.gs.default.key` (`auth=default`)
* use the GCP SDK's implicit auth (`auth=implicit`)

This change adds the option `auth=specified`, allowing the JSON key object to
be sent in the new `credentials` query param, enabling authentication on a
per-statement basis.

Fixes cockroachdb#31602

Release note (enterprise change): Adds the option to supply Google Cloud
Storage credentials on a per-statement basis with the query parameter
`credentials`.
craig bot pushed a commit that referenced this issue Dec 3, 2018
32544: storageccl: support per-statement credentials param for GCS r=lucy-zhang a=lucy-zhang


Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>
craig bot closed this as completed in #32544 on Dec 3, 2018

5 participants