
Unable to Work with Temporary AWS Security Credentials #1514

Open
omad opened this issue Nov 24, 2023 · 6 comments

Comments

@omad
Member

omad commented Nov 24, 2023

Expected behaviour

We should be able to use temporary/expiring/refreshing AWS security credentials while running ODC code, e.g. via the AssumeRoleWithWebIdentity call of the AWS Security Token Service (STS) API.

This can be handled automatically by boto3.

When you do this, Boto3 will automatically make the corresponding AssumeRoleWithWebIdentity calls to AWS STS on your behalf. It will handle in-memory caching as well as refreshing credentials, as needed.
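For reference, this boto3 behaviour is driven by standard AWS SDK environment variables, roughly like the following (the role ARN, token path, and session name below are placeholders):

```shell
# Placeholder values. With these set, boto3 calls AssumeRoleWithWebIdentity
# itself and transparently refreshes the resulting temporary credentials.
export AWS_ROLE_ARN="arn:aws:iam::123456789012:role/odc-processing"
export AWS_WEB_IDENTITY_TOKEN_FILE="/var/run/secrets/oidc/token"
export AWS_ROLE_SESSION_NAME="odc-job"
```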

Actual behaviour

ODC code accessing AWS APIs (such as S3) works initially when the correct environment variables are set, but starts failing, and keeps failing, once the credentials expire, which for OIDC/WebIdentityProvider defaults to 2 hours. The credentials are never renewed.

This is inadequate for long processing jobs and for server applications.

More details

There is a comment at https://github.com/opendatacube/datacube-core/blob/develop/datacube/utils/aws/__init__.py#L468-L472 indicating that this is known behaviour when using datacube.utils.aws.configure_s3_access().

Fixing this may be as simple as removing most of the custom AWS setup code we have... as I believe some of it is no longer required with better support of AWS in GDAL and rasterio.

Environment information

  • Which datacube --version are you using?
    1.8.17

  • What datacube deployment/environment are you running against?

@benjimin

@SpacemanPaul
Contributor

Is there a reason you can't use IAM credentials (which auto-renew)? (They can be configured in the datacube.conf config file or via environment variables.)

@Kirill888
Member

They only auto-renew when using the boto3 library; once inside GDAL they no longer do. What's more, it's really tricky to tell why a read failed: there is no clear "expired credentials" error. One can solve this by running a service thread that copies frozen credentials from boto3 to GDAL at a regular interval. Since we don't really have a place to put "IO driver state for a given dc.load", we use globals and hacky Dask code injections to make authentication work at all.
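The service-thread workaround described above might be sketched roughly like this (a hedged sketch only: `set_option` stands in for something like `osgeo.gdal.SetConfigOption`, boto3 is imported lazily, and the function names and 15-minute interval are illustrative, not ODC's actual code):

```python
import threading
import time


def push_frozen_creds(set_option, session=None):
    """Copy the current (possibly auto-refreshed) boto3 credentials into GDAL.

    `set_option` is typically `osgeo.gdal.SetConfigOption`. Returns True if
    credentials were found and pushed.
    """
    if session is None:
        import boto3  # imported lazily so the sketch has no hard dependency
        session = boto3.Session()
    creds = session.get_credentials()
    if creds is None:
        return False
    # Freeze a point-in-time snapshot; boto3 keeps refreshing its own copy.
    frozen = creds.get_frozen_credentials()
    set_option("AWS_ACCESS_KEY_ID", frozen.access_key)
    set_option("AWS_SECRET_ACCESS_KEY", frozen.secret_key)
    if frozen.token:
        set_option("AWS_SESSION_TOKEN", frozen.token)
    return True


def start_cred_refresher(set_option, interval=15 * 60):
    """Daemon thread that re-pushes frozen credentials every `interval` seconds."""
    def _loop():
        while True:
            push_frozen_creds(set_option)
            time.sleep(interval)

    t = threading.Thread(target=_loop, daemon=True)
    t.start()
    return t
```

Note this still suffers from the problems described above: the GDAL-side copy is frozen between refreshes, and it is process-global, so it cannot express per-bucket credentials.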

A proper solution will require introducing "shared state" for the dc.load IO driver; right now we can't even access two different buckets with two different sets of credentials from the same process or on the same Dask cluster.

@omad
Member Author

omad commented Nov 28, 2023

There's several types of AWS Credentials. What I'm interested in now is using the AWS AssumeRoleWithWebIdentity, which is similar to OIDC. Support was added to GDAL in 3.6 (November 2022).

KK:

once inside GDAL they no longer do

This used to be the case, but I think it was fixed in 3.1.0.

KK:

right now we can't even access two different buckets with two different sets of credentials from the same process

I'm not sure how rasterio or ODC fit in, but GDAL since 3.5 has supported using a configuration file to define per-path-prefix credentials or options.
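As I understand the GDAL docs, that configuration file (pointed to by the GDAL_CONFIG_FILE environment variable) looks roughly like this, with one dotted subsection per path prefix (bucket names and values below are made up):

```ini
[credentials]

[.private_bucket]
path=/vsis3/my-private-bucket
AWS_ACCESS_KEY_ID=AKIAXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxx

[.requester_pays_bucket]
path=/vsis3/some-requester-pays-bucket
AWS_REQUEST_PAYER=requester
```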

@Kirill888
Member

I should have said "the way we put those into GDAL using rasterio cannot be refreshed and cannot use multiple sets of credentials". Does rasterio support talking to GDAL in a way that allows AssumeRoleWithWebIdentity with auto-refresh? That would be the first thing to figure out.

@Kirill888
Member

Looks like rasterio's AWSSession is not aware of the AWS_WEB_IDENTITY_TOKEN_FILE environment variable, and just pushes frozen credentials obtained with boto3 into GDAL. You'll need a custom AWSSession class that doesn't use boto3 to obtain a set of frozen creds, and instead allows GDAL to deal with it internally.

https://github.com/rasterio/rasterio/blob/b8911b29001c0a2e67320741770bc35b260ed88e/rasterio/session.py#L318-L335

Maybe raise an issue in rasterio, something like "Support delegating AWS credentialization to GDAL".

In datacube, the custom Session would be plugged in here, based on some config setting:

session = AWSSession(**aws)
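A custom session along those lines might look roughly like this (a hypothetical sketch: `GDALDelegatedSession` is a made-up name, and the stand-in base class exists only so the snippet runs without rasterio installed; the real thing would subclass `rasterio.session.Session`):

```python
try:
    # Real base class when rasterio is available.
    from rasterio.session import Session
except ImportError:
    class Session:  # minimal stand-in so the sketch is self-contained
        pass


class GDALDelegatedSession(Session):
    """Hypothetical session that pushes no frozen credentials into GDAL,
    leaving GDAL free to read AWS_WEB_IDENTITY_TOKEN_FILE / AWS_ROLE_ARN
    itself and perform AssumeRoleWithWebIdentity with auto-refresh."""

    @property
    def credentials(self):
        # No boto3-frozen credentials held on the rasterio side.
        return {}

    def get_credential_options(self):
        # Returning no options means rasterio sets no AWS_* GDAL config,
        # so GDAL's own credential chain applies.
        return {}
```

The plug-in point above would then become something like `session = GDALDelegatedSession() if delegate_aws_to_gdal else AWSSession(**aws)`, where `delegate_aws_to_gdal` is a made-up config flag.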

@omad
Member Author

omad commented Nov 29, 2023

Thanks for the pointers!
