[Docs] Document S3FileSystem behavior in EC2 #35409

Closed
kevinjqliu opened this issue May 3, 2023 · 1 comment · Fixed by #35312

Comments

@kevinjqliu
Contributor

Describe the enhancement requested

Context

While working with both s3fs and pyarrow, I noticed that their documentation differs in how it describes the way AWS credentials are established.

The s3fs documentation explicitly mentions the EC2 scenario:

Boto will try the following methods, in order:
...
* for nodes on EC2, the IAM metadata provider

The pyarrow documentation does not mention this behavior.

After digging into the pyarrow code and the C++ implementation, I found that pyarrow also uses the IAM metadata provider on EC2 to establish credentials.
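
To illustrate the distinction, here is a minimal sketch, assuming `pyarrow` is installed. The helper names `s3_kwargs` and `open_s3` are hypothetical and not part of pyarrow; only `pyarrow.fs.S3FileSystem` and its `access_key`, `secret_key`, and `region` arguments come from the library. When no keys are passed, credential resolution is left to the AWS SDK, which on EC2 can fall back to the instance metadata provider.

```python
from typing import Optional


def s3_kwargs(access_key: Optional[str] = None,
              secret_key: Optional[str] = None,
              region: Optional[str] = None) -> dict:
    """Build keyword arguments for pyarrow.fs.S3FileSystem (hypothetical helper).

    Keys are included only when both are given. Otherwise they are left out,
    so the AWS SDK's default credential resolution is used.
    """
    kwargs = {}
    if access_key is not None and secret_key is not None:
        kwargs["access_key"] = access_key
        kwargs["secret_key"] = secret_key
    if region is not None:
        kwargs["region"] = region
    return kwargs


def open_s3(**kwargs):
    """Create an S3FileSystem from the given keyword arguments."""
    # Imported lazily so s3_kwargs can be used without pyarrow installed.
    from pyarrow import fs
    return fs.S3FileSystem(**kwargs)
```

On an EC2 instance with an attached IAM role, something like `open_s3(**s3_kwargs(region="us-east-1"))` would be expected to pick up the role's credentials without any keys in the code.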

Here's the PR to explain in depth and the associated change to the pyarrow.fs.S3FileSystem documentation:

Component(s)

Documentation

@jorisvandenbossche jorisvandenbossche changed the title Document pyarrow.fs.S3FileSystem behavior in EC2 [Docs] Document pyarrow.fs.S3FileSystem behavior in EC2 May 4, 2023
@jorisvandenbossche jorisvandenbossche changed the title [Docs] Document pyarrow.fs.S3FileSystem behavior in EC2 [Docs] Document S3FileSystem behavior in EC2 May 4, 2023
@kevinjqliu kevinjqliu removed their assignment Jul 30, 2023
@kevinjqliu
Contributor Author

@kou hello, can I get some eyes on the PR?

@kou kou closed this as completed in #35312 Aug 1, 2023
@kou kou added this to the 14.0.0 milestone Aug 1, 2023
kou added a commit that referenced this issue Aug 1, 2023
…C2 (#35312)

### Rationale for this change

When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service. 

I want to document this behavior for `pyarrow`. The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mentions this specific case for EC2.

### What changes are included in this PR?

Documentation for the behavior described above. 

#### Technical Details
`S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor. This option uses the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213).

The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads environment variables](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at the profile config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html) and the [EC2 Instance Metadata Service](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html).
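
The fallback idea behind a provider chain can be modeled in a few lines of Python. This is only an illustrative sketch, not the AWS SDK's implementation: all function names are hypothetical, the real chain contains additional providers, and the instance metadata provider is replaced here by a stub that makes no network call. The order simply follows the order in which the providers are listed above.

```python
import configparser
from typing import Callable, Mapping, Optional, Sequence, Tuple

Credentials = Tuple[str, str]  # (access key id, secret access key)
Provider = Callable[[], Optional[Credentials]]


def from_environment(env: Mapping[str, str]) -> Optional[Credentials]:
    """Read credentials from environment-style variables, if both are set."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    return (key, secret) if key and secret else None


def from_profile(text: str, profile: str = "default") -> Optional[Credentials]:
    """Read credentials from the text of an INI-style credentials file."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        return None
    key = parser.get(profile, "aws_access_key_id", fallback=None)
    secret = parser.get(profile, "aws_secret_access_key", fallback=None)
    return (key, secret) if key and secret else None


def resolve(providers: Sequence[Tuple[str, Provider]]) -> Optional[Tuple[str, Credentials]]:
    """Return the name and credentials of the first provider that succeeds."""
    for name, provider in providers:
        creds = provider()
        if creds is not None:
            return name, creds
    return None


def build_chain(env: Mapping[str, str], profile_text: str,
                instance_metadata: Provider) -> Sequence[Tuple[str, Provider]]:
    """Assemble the three providers in the order they are tried in this sketch."""
    return [
        ("environment", lambda: from_environment(env)),
        ("profile", lambda: from_profile(profile_text)),
        # Stand-in for the EC2 Instance Metadata Service lookup.
        ("instance_metadata", instance_metadata),
    ]
```

With an empty environment and no profile, `resolve` falls through to the instance metadata stand-in, which mirrors why `S3FileSystem()` with no arguments can still authenticate on an EC2 host.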

### Are these changes tested?

No, just documentation changes

### Are there any user-facing changes?

Yes, changing public documentation

* Closes: #35409

### Render Changes
Render the changes locally via [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs): 
`docs/source/python/filesystems.rst`:
![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6)

`python/pyarrow/_s3fs.pyx`:
![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d)

Lead-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this issue Aug 20, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023