-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Docs] Document S3FileSystem
behavior in EC2
#35409
Comments
pyarrow.fs.S3FileSystem
behavior in EC2 pyarrow.fs.S3FileSystem
behavior in EC2
pyarrow.fs.S3FileSystem
behavior in EC2 S3FileSystem
behavior in EC2
@kou hello, can I get some eyes on the PR? |
kou
added a commit
that referenced
this issue
Aug 1, 2023
…C2 (#35312) ### Rationale for this change When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service. I want to document this behavior for `pyarrow`. The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mention this specific case for EC2. ### What changes are included in this PR? Documentation for the behavior described above. #### Technical Details `S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor. It utilizes the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213) The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads the environment variable](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at profile config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html) and the [EC2 Instance Metadata Service](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html). ### Are these changes tested? No, just documentation changes ### Are there any user-facing changes? Yes, changing public documentation * Closes: #35409 ### Render Changes Render the changes locally via [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs): `docs/source/python/filesystems.rst`: ![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6) `python/pyarrow/_s3fs.pyx`: ![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d) Lead-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
R-JunmingChen
pushed a commit
to R-JunmingChen/arrow
that referenced
this issue
Aug 20, 2023
… for EC2 (apache#35312) ### Rationale for this change When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service. I want to document this behavior for `pyarrow`. The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mention this specific case for EC2. ### What changes are included in this PR? Documentation for the behavior described above. #### Technical Details `S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor. It utilizes the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213) The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads the environment variable](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at profile config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html) and the [EC2 Instance Metadata Service](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html). ### Are these changes tested? No, just documentation changes ### Are there any user-facing changes? Yes, changing public documentation * Closes: apache#35409 ### Render Changes Render the changes locally via [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs): `docs/source/python/filesystems.rst`: ![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6) `python/pyarrow/_s3fs.pyx`: ![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d) Lead-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
loicalleyne
pushed a commit
to loicalleyne/arrow
that referenced
this issue
Nov 13, 2023
… for EC2 (apache#35312) ### Rationale for this change When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service. I want to document this behavior for `pyarrow`. The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mention this specific case for EC2. ### What changes are included in this PR? Documentation for the behavior described above. #### Technical Details `S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor. It utilizes the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213) The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads the environment variable](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at profile config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html) and the [EC2 Instance Metadata Service](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html). ### Are these changes tested? No, just documentation changes ### Are there any user-facing changes? Yes, changing public documentation * Closes: apache#35409 ### Render Changes Render the changes locally via [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs): `docs/source/python/filesystems.rst`: ![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6) `python/pyarrow/_s3fs.pyx`: ![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d) Lead-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the enhancement requested
Context
While working with both
s3fs
andpyarrow
, I noticed the difference in documentation regarding establishing AWS credentials.s3fs
documentation explicitly mentions this scenario for using EC2pyarrow
documentation does not mention this behavior.After digging into
pyarrow
code and the C++ implementation, I was able to find thatpyarrow
will use IAM metadata provide in EC2 to establish credentials.Here's the PR to explain in depth and the associated change to the
pyarrow.fs.S3FileSystem
documentation:Component(s)
Documentation
The text was updated successfully, but these errors were encountered: