[Python] GCS FileSystem - support federated identity #34595

Closed
martin-traverse opened this issue Mar 16, 2023 · 5 comments · Fixed by #34707

Comments

@martin-traverse

Describe the enhancement requested

Hello,

It seems that federated / external identities are not supported in the Arrow GcsFileSystem implementation? It would be great to support this. I'm using it in CI with GitHub as an IdP as per the instructions and it works great for the Google CLI tools.

I'm not sure how this is implemented, is there a standard library from Google that can just be linked / updated? The federated auth mechanism creates a regular Google application creds file, with the "type" set as "external_account". Is there an easy (ish) way to bring this in if the Google libs handle the different standard credential types? Or is that wishful thinking on my part?
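As an illustration of the point about credential types, the `"type"` field of a credentials file can be inspected before it is handed to Arrow. This is only a minimal sketch: the helper name `credential_type` and the trimmed sample file are hypothetical, and only the `"type"` key and the `external_account` value come from the description above.

```python
import json
import os
import tempfile


def credential_type(path):
    """Return the "type" field of a Google credentials JSON file, or None."""
    with open(path, encoding="utf-8") as f:
        return json.load(f).get("type")


# Hypothetical, heavily trimmed stand-in for a federated credentials file.
sample = {"type": "external_account"}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)

sample_type = credential_type(tmp.name)
os.remove(tmp.name)  # clean up the temporary sample file
print(sample_type)
```

A real federated credentials file carries more fields than this sample; the sketch only reads the one field discussed here.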

I'm interested in the Python component, but I guess this would also reach the other languages that use the same underlying code if it were added.

Here is the stack trace from Arrow:

File "pyarrow/_fs.pyx", line 571, in pyarrow._fs.FileSystem.get_file_info
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: google::cloud::Status(INVALID_ARGUMENT: Permanent error GetObjectMetadata: Could not create a OAuth2 access token to authenticate the request. The request was not sent, as such an access token is required to complete the request successfully. Learn more about Google Cloud authentication at https://cloud.google.com/docs/authentication. The underlying error message was: Unsupported credential type (external_account) when reading Application Default Credentials file from [/path/to/credentials.json].). Detail: [errno 22] Invalid argument
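
For anyone who wants to detect this specific failure in CI logs, the rejected credential type can be pulled out of the error text. This is a rough sketch, assuming the message wording stays as shown in the trace above; the helper name `unsupported_credential_type` is hypothetical and not part of pyarrow or google-cloud-cpp.

```python
import re

# Error text copied from the stack trace above (path left as reported).
ERROR_TEXT = (
    "The underlying error message was: Unsupported credential type "
    "(external_account) when reading Application Default Credentials "
    "file from [/path/to/credentials.json]."
)


def unsupported_credential_type(message):
    """Return the rejected credential type named in the message, or None."""
    match = re.search(r"Unsupported credential type \(([^)]+)\)", message)
    return match.group(1) if match else None


rejected = unsupported_credential_type(ERROR_TEXT)
print(rejected)
```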

Component(s)

Python

@kou changed the title from "GCS FileSystem - support federated identity" to "[Python] GCS FileSystem - support federated identity" on Mar 17, 2023
@westonpace
Member

CC @coryan perhaps?

@coryan
Contributor

coryan commented Mar 21, 2023

Yup, I am probably the right person to ask about this. Workload identity federation is supported starting with v2.6.0:

https://github.com/googleapis/google-cloud-cpp/releases/tag/v2.6.0

We actually use the integration with GitHub actions in our own testing:

https://github.com/googleapis/google-cloud-cpp/actions/workflows/external-account-integration.yml

I am not sure how one goes about updating pyarrow to require google-cloud-cpp >= v2.6.0, I assume it depends on how it is getting installed?

HTH

@westonpace
Member

> I am not sure how one goes about updating pyarrow to require google-cloud-cpp >= v2.6.0, I assume it depends on how it is getting installed?

In theory we just need to update this line to cover most users (e.g. pyarrow):

ARROW_GOOGLE_CLOUD_CPP_BUILD_VERSION=v1.42.0

In practice things typically aren't that simple (e.g. breaking changes or changes in the way the gcs library is built).
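
To make the version gap concrete, here is a small sketch that parses the pinned line quoted above and compares it against the v2.6.0 minimum mentioned earlier in the thread. The helper `parse_pinned_version` and the constant `MIN_FEDERATION_VERSION` are hypothetical names for illustration, not part of Arrow's build tooling.

```python
import re

# Minimum taken from the earlier comment: supported starting with v2.6.0.
MIN_FEDERATION_VERSION = (2, 6, 0)


def parse_pinned_version(line):
    """Parse a 'NAME=vX.Y.Z' line into (name, (X, Y, Z))."""
    name, _, value = line.strip().partition("=")
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", value)
    if match is None:
        raise ValueError(f"unrecognized version: {value!r}")
    return name, tuple(int(part) for part in match.groups())


name, version = parse_pinned_version("ARROW_GOOGLE_CLOUD_CPP_BUILD_VERSION=v1.42.0")
supports_federation = version >= MIN_FEDERATION_VERSION
print(name, version, supports_federation)
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `v2.10.0` as older than `v2.6.0`.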

@coryan
Contributor

coryan commented Mar 23, 2023

I can give this a try later this month. FWIW, the only backwards-incompatible change between 1.42.0 and 2.x is the requirement for C++14. Since Arrow already requires C++17, that should be trivial. I do not recall any changes to the build requirements, but that is always a risk.

I assume there is no way for @martin-traverse to upgrade before then?

kou pushed a commit that referenced this issue Mar 24, 2023
Newer versions of `google-cloud-cpp` include support for workload identity federation (aka BYOID, aka federated identity). There are many other features and improvements too.  FWIW, this library increased its major version (from 1.x to 2.x) simply because it dropped support for C++11. Since Arrow already requires C++17, this is a non-issue.

### Rationale for this change

This fixes #34595.  Newer versions of `google-cloud-cpp` support federated identity (and other features).

### What changes are included in this PR?

### Are these changes tested?

N/A. The existing tests should cover this.

### Are there any user-facing changes?

No. There are no public-facing changes and no breaks to public APIs.

* Closes: #34595

Authored-by: Carlos O'Ryan <coryan@google.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kou added this to the 12.0.0 milestone on Mar 24, 2023
@martin-traverse
Author

This is great, thanks so much for the quick turn-around! I look forward to simplifying our workflows when Arrow 12 is released :-)

rtpsw pushed a commit to rtpsw/arrow that referenced this issue Mar 27, 2023
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023

4 participants