
DefaultAWSCredentialsProviderChain does not work with credential_process and named profiles #2584

@stevenmanton

Description

Describe the bug

Using named profiles along with credential_process appears to be incompatible with the DefaultAWSCredentialsProviderChain class.

Expected Behavior

Named AWS profiles should work with credential_process.

Current Behavior

While using the PyArrow library, I came across an issue where authenticating via credential_process fails only when a named AWS profile is used. PyArrow simply calls DefaultAWSCredentialsProviderChain, so the problem most likely lies in the AWS SDK itself.

In short, suppose that I have a config file as follows:

[default]
region = us-east-1
credential_process = /path/to/get-creds.sh

[dev]
region = us-east-1
credential_process = /path/to/get-creds.sh
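
For context, the credential_process contract requires the external program (here, get-creds.sh, whose path is a placeholder from the config above) to print a single JSON document on stdout. A minimal Python stand-in, with placeholder values rather than real credentials, looks like this:

```python
# Minimal stand-in for a credential_process program such as get-creds.sh.
# The SDK expects a JSON document of this shape on stdout; the key values
# below are placeholders, not real credentials.
import json

payload = {
    "Version": 1,                          # must be 1
    "AccessKeyId": "AKIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "SessionToken": "example-token",       # optional for long-term keys
    "Expiration": "2024-01-01T00:00:00Z",  # optional; ISO 8601 timestamp
}
print(json.dumps(payload))
```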

These commands both work, which validates that both accounts have access to S3:

aws s3 ls s3://bucket/path --profile default
aws s3 ls s3://bucket/path --profile dev
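
As a quick sanity check (a sketch, using the config text quoted above rather than a real ~/.aws/config), both profiles parse to the identical credential_process setting, so the failure is not a typo in the config file:

```python
# Parse the config from the issue and confirm that [default] and [dev]
# declare the same region and credential_process. Paths are placeholders.
import configparser

CONFIG = """\
[default]
region = us-east-1
credential_process = /path/to/get-creds.sh

[dev]
region = us-east-1
credential_process = /path/to/get-creds.sh
"""

cfg = configparser.ConfigParser()
cfg.read_string(CONFIG)
assert cfg["default"]["credential_process"] == cfg["dev"]["credential_process"]
assert cfg["default"]["region"] == cfg["dev"]["region"]
```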

Since PyArrow uses DefaultAWSCredentialsProviderChain under the hood, we should be able to set the AWS_PROFILE environment variable to control which profile is used. Let's use a simple script to test the behavior:

# script.py
import pyarrow.dataset as ds

dataset = ds.dataset("s3://bucket/path/")  # authenticates with `DefaultAWSCredentialsProviderChain`

Now, here's the strangeness:

# Check that no environment variables are set in this shell:
env | grep AWS

# This works:
python ./script.py
# So does this:
AWS_PROFILE=default python ./script.py
# But this fails:
AWS_PROFILE=dev python ./script.py

The error is:

Traceback (most recent call last):
  File "./script.py", line 3, in <module>
    dataset = ds.dataset("s3://bucket/path/")
  File "/home/antonstv/miniconda3/envs/pdna/lib/python3.8/site-packages/pyarrow/dataset.py", line 763, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/home/antonstv/miniconda3/envs/pdna/lib/python3.8/site-packages/pyarrow/dataset.py", line 446, in _filesystem_dataset
    fs, paths_or_selector = _ensure_single_source(source, filesystem)
  File "/home/antonstv/miniconda3/envs/pdna/lib/python3.8/site-packages/pyarrow/dataset.py", line 413, in _ensure_single_source
    file_info = filesystem.get_file_info(path)
  File "pyarrow/_fs.pyx", line 571, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: When getting information for key 'path' in bucket 'bucket': AWS Error ACCESS_DENIED during HeadObject operation: No response body.

I've also confirmed that the script above works if, instead of credential_process, I use fixed credentials that were generated by the credential process.
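
That workaround can be sketched as follows. This is an illustration only: the stub command stands in for /path/to/get-creds.sh, and the JSON shape assumed is the standard credential_process output. The helper runs the credential process once and exports its output as the static AWS_* environment variables, which every provider chain reads before consulting profiles:

```python
import json
import os
import subprocess

def export_fixed_creds(command: str) -> dict:
    """Run a credential_process-style command and export its JSON output
    as static AWS_* environment variables, bypassing credential_process."""
    out = subprocess.run(command, shell=True, check=True,
                         capture_output=True, text=True).stdout
    creds = json.loads(out)
    os.environ["AWS_ACCESS_KEY_ID"] = creds["AccessKeyId"]
    os.environ["AWS_SECRET_ACCESS_KEY"] = creds["SecretAccessKey"]
    if "SessionToken" in creds:
        os.environ["AWS_SESSION_TOKEN"] = creds["SessionToken"]
    return creds

# Demo with a stub command standing in for /path/to/get-creds.sh:
stub = ("""echo '{"Version": 1, "AccessKeyId": "AKIAEXAMPLE", """
        """"SecretAccessKey": "s", "SessionToken": "t"}'""")
creds = export_fixed_creds(stub)
```

After calling this, the PyArrow script picks up the static credentials regardless of AWS_PROFILE, which matches the behavior I observed with fixed credentials.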

Reproduction Steps

See above.

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

I'm unsure; pyarrow is 12.0.1

Compiler and Version used

I'm unsure

Operating System and version

I'm unsure

Labels

bug: This issue is a bug.
response-requested: Waiting on additional info and feedback. Will move to "closing-soon" in 10 days.
third-party: This issue is related to third-party libraries or applications.
