[Python] GCS FileSystem - support federated identity #34595
Comments
CC @coryan perhaps?
Yup, I am probably the right person to ask about this. Workload identity federation is supported starting with v2.6.0: https://github.com/googleapis/google-cloud-cpp/releases/tag/v2.6.0
We actually use the integration with GitHub actions in our own testing: https://github.com/googleapis/google-cloud-cpp/actions/workflows/external-account-integration.yml
I am not sure how one goes about updating pyarrow to require google-cloud-cpp >= v2.6.0, I assume it depends on how it is getting installed? HTH
In theory we just need to update this line to cover most users (e.g. pyarrow): arrow/cpp/thirdparty/versions.txt Line 52 in f10f5cf
In practice things typically aren't that simple (e.g. breaking changes or changes in the way the gcs library is built).
I can give this a try later this month. FWIW, the only backwards incompatible change between 1.42.0 and 2.x is the requirement for C++14. Since arrow already requires C++17 that should be trivial. I do not recall any changes to the build requirements, but that is always risky. I assume there is no way for @martin-traverse to upgrade before then?
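To make the version requirement discussed above concrete, here is a minimal sketch of a check that a given `google-cloud-cpp` version string is at least v2.6.0, the release named in this thread as the first with workload identity federation. The helper names `parse_version` and `supports_federation` are hypothetical, and the sketch assumes plain `vX.Y.Z` or `X.Y.Z` version strings; it does not read Arrow's `versions.txt` or inspect an installed library.

```python
import re

# First google-cloud-cpp release with workload identity federation,
# according to the comment in this thread.
MIN_FEDERATION_VERSION = (2, 6, 0)


def parse_version(text):
    """Parse 'v2.6.0' or '2.6.0' into a tuple of ints (hypothetical helper)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_federation(version):
    """Return True if the version is at least MIN_FEDERATION_VERSION."""
    return parse_version(version) >= MIN_FEDERATION_VERSION
```

Comparing integer tuples rather than raw strings matters here: as strings, `"2.10.0"` would sort before `"2.6.0"`.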
Newer versions of `google-cloud-cpp` include support for workload identity federation (aka BYOID, aka federated identity). There are many other features and improvements too.

FWIW, this library increased its major version (from 1.x to 2.x) simply because it dropped support for C++11. Since Arrow already requires C++17, this is a non-issue.

### Rationale for this change

This fixes #34595. Newer versions of `google-cloud-cpp` support federated identity (and other features).

### What changes are included in this PR?

### Are these changes tested?

N/A. The existing tests should cover this.

### Are there any user-facing changes?

No. There are no public facing changes, nor breaks to public APIs.

* Closes: #34595

Authored-by: Carlos O'Ryan <coryan@google.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
This is great, thanks so much for the quick turn-around! I look forward to simplifying our workflows when Arrow 12 is released :-)
Describe the enhancement requested
Hello,
It seems that federated / external identities are not supported in the Arrow GcsFileSystem implementation? It would be great to support this. I'm using it in CI with GitHub as an IdP as per the instructions and it works great for the Google CLI tools.
I'm not sure how this is implemented, is there a standard library from Google that can just be linked / updated? The federated auth mechanism creates a regular Google application creds file, with the "type" set as "external_account". Is there an easy (ish) way to bring this in if the Google libs handle the different standard credential types? Or is that wishful thinking on my part?
I'm interested in the Python component but guess this would come to other languages that use the same underlying code if it was added.
Here is the stack trace from Arrow:
File "pyarrow/_fs.pyx", line 571, in pyarrow._fs.FileSystem.get_file_info
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: google::cloud::Status(INVALID_ARGUMENT: Permanent error GetObjectMetadata: Could not create a OAuth2 access token to authenticate the request. The request was not sent, as such an access token is required to complete the request successfully. Learn more about Google Cloud authentication at https://cloud.google.com/docs/authentication. The underlying error message was: Unsupported credential type (external_account) when reading Application Default Credentials file from [/path/to/credentials.json].). Detail: [errno 22] Invalid argument
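For anyone hitting the error above, a quick way to confirm the cause is to read the `"type"` field of the credentials file before handing it to Arrow. The following is a minimal sketch using only the Python standard library; the helper names `credential_type` and `is_external_account` are hypothetical, and the only assumption about the file format is the one stated in this issue, namely that federated credentials carry `"type": "external_account"`.

```python
import json
from pathlib import Path


def credential_type(path):
    """Return the "type" field of a Google credentials JSON file, or None."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    return data.get("type")


def is_external_account(path):
    """True if the file declares a federated (external_account) credential."""
    return credential_type(path) == "external_account"
```

In a CI job this could be called with the path stored in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, so the job can fail early with a clearer message when the bundled GCS library is too old to handle that credential type.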
Component(s)
Python