Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] Refresh credentials from file #1993

Closed
mikkeloscar opened this issue Jun 14, 2018 · 19 comments
Closed

[Feature request] Refresh credentials from file #1993

mikkeloscar opened this issue Jun 14, 2018 · 19 comments
Labels
feature-request A feature should be added or improved.

Comments

@mikkeloscar
Copy link

mikkeloscar commented Jun 14, 2018

This is a feature request to support reading temporary credentials from a file i.e. the shared credentials file, and refresh those credentials when they expire assuming the file has been updated by some external source.

Other credential providers like the assume role provider implements refresh of credentials and the docstring for the SharedCredentialsProvider also suggests to support this (at least how I read it) but unfortunately it only reads the credentials once and implements a "fake" IsExpired() method.

Given that you can store temporary credentials in the shared credentials file already it would make sense if the SDK could automatically reload the file once the credentials expire. This way you could have some external tool that refreshes the credentials in the file and the SDK would not have to be restarted to reload them.

My specific use case is to solve a problem of distributing AWS IAM credentials to Pods in Kubernetes clusters hopefully helping a lot of EKS users in the future!
I have written a controller that can do this (README explains how it solves some problems not fixed in other solutions).
However it relies on the fact that the AWS SDKs can refresh credentials from a file, which unfortunately isn't the case at the moment and the reason why I'm reaching out.

I realize that this fix needs to be done in every AWS SDK, but Go is what I'm most familiar with so my hope is I can start the discussion here and maybe you can help me get in contact with the right people in case this needs coordination across different AWS SDK implementations.

I have already talked with the EKS team about this, but they did not have bandwidth to focus on this topic at the moment.

I have done my own quick implementation in the Go SDK. Feedback very welcome also if I should approach it differently. E.g. I have not considered any possible backwards compatible issues with this change.

master...mikkeloscar:file-refresh

For reference here is a community driven comparison of solutions to the problem of distributing AWS IAM credentials in Kubernetes clusters.

@xibz xibz added the feature-request A feature should be added or improved. label Jun 15, 2018
@xibz
Copy link
Contributor

xibz commented Jun 15, 2018

Hello @mikkeloscar, thank you for reaching out to us and proposing this. We have a very similar concept proposed. I'll mark this as a feature request and bring this up in our next sprint.

@mikkeloscar
Copy link
Author

@xibz awesome! If there's anything I can do to help out, just let me know!

@mikkeloscar
Copy link
Author

@xibz Any news on this? I'll be happy to help in any way I can to get this going.

@jasdel
Copy link
Contributor

jasdel commented Jul 18, 2018

@mikkeloscar thanks again for reaching out to us with this feature request. We decided that while this feature can be useful, bundling it into the SDKs introduces the potential for race conditions, and cross platform compatibility support. Performing atomic file locking and change notification do not have feature parity across platforms, and can introduce unexpected instability.

I think there are a couple alternative ways your IAM Controller for Kubernetes could solve the EC2 Metadata request timeout (<1s) issue you're seeing.

Generic HTTP Credential provider

The SDK includes a Generic HTTP Credential Provider which will attempt to retrieve credentials from an HTTP endpoint on the 169.254.170.2 IP. This credential provider supports expiration just like EC2 Instance roles. Your application(s) can use this credential provider directly via code, or rely on the SDK's default credential chain using the environment variable, AWS_CONTAINER_CREDENTIALS_RELATIVE_URI to retrieve credentials from an HTTP endpoint similar to what your Kubernetes Controller is doing with the EC2 Metadata proxy. The AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable's value is a URI path that the SDK will make an HTTP request to. Even better, this credential provider is supported by the default chain across all of the AWS SDKs.

In the following example the SDK will make an HTTP request to http://169.254.170.2/MyCreds, if no other credentials are found in the credential chain, (i.e no environment, nor file credentials)

AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/MyCreds ./MyApp

The HTTP response body is a JSON value containing the credentials the SDK should use. See the IAM Roles for Tasks documentation for more details of this JSON value.

{
    "AccessKeyId": "ACCESS_KEY_ID",
    "Expiration": "EXPIRATION_DATE",
    "RoleArn": "TASK_ROLE_ARN",
    "SecretAccessKey": "SECRET_ACCESS_KEY",
    "Token": "SECURITY_TOKEN_STRING"
}

aws.Config.EC2MetadataDisableTimeoutOverride

Alternatively, setting the aws.Config.EC2MetadataDisableTimeoutOverride configuration value to true within your Go application will prevent the SDK from using a significantly shorter request timeout, in comparison to the <1s timeout used by default when making EC2 Metadata requests.

For this solution I think we can also consider adding support for an environment variable which uses non-shortened EC2 Metadata request timeout by default without modifying the application(s) running in the pod directly.

@jasdel jasdel added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. closing-soon This issue will automatically close in 4 days unless further comments are made. and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Jul 18, 2018
@mikkeloscar
Copy link
Author

@jasdel Thanks for getting back to me on this.

While I really appreciate you giving a reason for not wanting to go forward with this feature request, I'm sad that you will not consider it because of the reasons you give.

I don't see what file locking issues there would be. In my mind the implementation should just be a simple read of the file, and if the credentials, at the time of reading, are up to date then fine, otherwise it would just fail to get credentials. It would be up to some external tool to keep the files up to date.

As an anecdote we implemented something similar for our IAM infrastructure in 2016 at Zalando and are currently using this in 85 Kubernetes clusters. Our experience is that this approach is much more stable than anything involving the network.

I have ofc. considered what you propose, and it has also been discussed multiple times in the Kubernetes community. However I still think that the approach is so much more complicated to get right, that I really don't see the reason for not having the simple file read support in the SDKs. If you have something that calls an HTTP endpoint, you need to ensure that this endpoint is available at all times, this mean you need to run an HA setup in order to do updates of you component. Additionally you need to have retries in your clients to account for network issues, something you most likely won't need reading from a local file as you would have bigger problems if this started to fail.

If I have to ask users to set aws.Config.EC2MetadataDisableTimeoutOverride in the applications then I might as well ask them to use a custom SDK with file refresh support, so I don't think this is so helpful. Having an environment variable would be helpful, but I still consider this a hack rather than a robust solution.

I wish you would reconsider this feature request, but if not, there is nothing I can do about it. Either way, thanks for looking at my request.

@jasdel
Copy link
Contributor

jasdel commented Aug 24, 2018

Thanks for the feedback and update @mikkeloscar.

In the context of Kubernetes our concerns with the refreshing from a file are probably manageable. Given that Kubernetes would own the environment be responsible for writing the file as needed. Our biggest concern with this file being refreshed is two factor. Usage outside of Kubernetes, and that the shared configuration and shared credential files are interchangeable for the most part. This means that there are non-credential related information that would be refreshed. The additional information in the files doesn't make sense to be refreshed during the lifetime of an application.

A http endpoint for retrieving credentials per pod provides the best flexibility for applications of users. The issue with only being able share a single environment variable for AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is problematic. Especially, if pods should have different roles they assume. But if all Pods are able to use the same environment variable using this method of setting up the HTTP endpoint is the best. In addition SDKs will not apply short timeouts for the endpoint since it must be explicitly enabled unlike EC2 Instance roles which is automatic.

I think further discussing EC2MetadataDisableTimeoutOverride as an environment variable makes absolute sense. Ideally this environment variable would be usable across all the SDKs.

@mikkeloscar
Copy link
Author

@jasdel thanks for providing an update on this!

Our biggest concern with this file being refreshed is two factor. Usage outside of Kubernetes, and that the shared configuration and shared credential files are interchangeable for the most part. This means that there are non-credential related information that would be refreshed. The additional information in the files doesn't make sense to be refreshed during the lifetime of an application.

If this is such a big concern, then let me propose a new credentials provider for this use case. We could call it RefreshableFileCredentials or something like that and it could rely on a file with the same content as what you get from an assume role i.e.:

{
    "AccessKeyId": "ACCESS_KEY_ID",
    "Expiration": "EXPIRATION_DATE",
    "RoleArn": "TASK_ROLE_ARN",
    "SecretAccessKey": "SECRET_ACCESS_KEY",
    "Token": "SECURITY_TOKEN_STRING"
}

It could even be enabled ONLY when a certain environment variable is set, e.g.: AWS_REFRESHABLE_FILE_CREDENTIALS=/path/to/file. This way it would not affect users outside of Kubernetes (or other environments where this could be useful).

I don't mind writing a proper proposal for how this would work, I just don't want to spend time on it if there's no chance it could be considered.

Btw. today I looked into the code of the AWS SDK for Java and found out that it does support some form of file refreshing. It does so with a fixed refresh interval of 5 min. and reloads the file if it was changed since last check. While this is not the nicest solution, it actually solves the problem. I made a simple example application to verify that it does indeed work. This basically means the problem is solved for all JVM users which is nice.

I don't suggest that it should be done the same way for the Go SDK, but considering that the Java SDK has support, would you be more willing to add support (in some form) to the Go SDK?

A http endpoint for retrieving credentials per pod provides the best flexibility for applications of users.

I don't agree. Users don't care how it's implemented. They care that it just works and they don't have to jump through a million hoops. To make an HTTP endpoint implementation "just work" is much more complex than what I propose. For reference this is the current HTTP endpoint based proposal from the community. It's very complex..

I think further discussing EC2MetadataDisableTimeoutOverride as an environment variable makes absolute sense. Ideally this environment variable would be usable across all the SDKs.

I would really prefer to spend effort on fixing the root cause instead of implementing hacks.
For me this is equivalent to putting sleeps in your code to work around race conditions.
That's why I don't like it because it moves the problem away from the SDKs to other components that becomes harder to implement in a good way because of that. When a change is needed to all SDKs anyway, why not make a change that can solve the problem at the lowest level, and solves the problem for good?

@jasdel jasdel removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Sep 24, 2018
@xibz
Copy link
Contributor

xibz commented Oct 5, 2018

Hello @mikkeloscar, thank you for pointing out some various points on this feature. I've went ahead and started working with EKS to see what solutions may be possible. Once we land on something concrete, I'll update this issue and hopefully have a PR with the implementation.

@mikkeloscar
Copy link
Author

@xibz That sounds like great news!

Do you have any rough estimation on the time frame for this? It's hurting us a lot not having a robust solution and we are able to help any way you need. We have 100 Kubernetes clusters on AWS we can use for testing this! I can also write a full proposal describing the idea described above if this would be helpful, just let me know!

Thanks for picking this up again!!

@jasdel
Copy link
Contributor

jasdel commented Oct 8, 2018

Thanks @mikkeloscar we'll be glad to review changes when we get closer to having a design for this feature. We'll be working with the other AWS SDKs in our designs to create a consistent experience and support.

Using a JSON file instead of the ini-like shared config file addresses the majority of our concerns about partial reads. The Go structure with a JSON file most likely would be similar to the endpointcreds.getCredentialsOutput type. The Expiration member would be serialized from a string formatted as RFC3339 time, e.g. (2006-01-02T15:04:05Z).

type RefreshableCredentials struct {
	Expiration      *time.Time
	AccessKeyID     string
	SecretAccessKey string
	Token           string
}

In you're example you included a RoleArn member. Could you go into details more about the usage of this value? In general SDKs load this value from the shared config file (~/.aws/config) when loading configuration. Currently most SDKs don't support assuming a role with credentials which are not defined in the shared config/credential file (#1019), but I think also updating the SDK to support assuming a role with additional credentials would be the best way to support the Assume Role use case the RoleArn looks to facilitate.

@mikkeloscar
Copy link
Author

In you're example you included a RoleArn member. Could you go into details more about the usage of this value?

Sorry, this I copied verbatim from your comment (comes from here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html). I didn't actually have anything in mind for the RoleArn. The struct you suggest covers everything that would be needed and is actually what I had in mind.

I did also not consider the Assume Role use case. I would expect whatever provides the credentials would already have performed the assuming of a role if needed. This I would simply consider out of scope.

@jasdel
Copy link
Contributor

jasdel commented Oct 8, 2018

Thanks for the clarification. I missed that when making my post. Agreed, assume role case would be a nice to have, but would probably not be a blocker.

@idcmp
Copy link

idcmp commented Aug 22, 2019

Following this; I'm a bit confused on the technical impediment.

Calls to rename(2) are atomic; so external tools refreshing AWS credential files need only to write them to temporary file and rename them into place. The Go SDK needs only to call os.Stat periodically (possibly 5 minutes like the Java SDK) to see if the ModTime has changed and return true in IsExpired if the modification time has changed.

@mikkeloscar
Copy link
Author

The Go SDK needs only to call os.Stat periodically (possibly 5 minutes like the Java SDK) to see if the ModTime has changed and return true in IsExpired if the modification time has changed.

You don't need to do it periodically, you already read credentials which is valid for a certain period the first time the file was read, you only need to refresh those once they expire so you can simply read the file every time the credentials expire, and it could be changed a million times in between without you having to care about it :)

@jtlisi
Copy link

jtlisi commented Jul 7, 2020

Hey, is there any headway on this feature?

jaylim-crl added a commit to jaylim-crl/aws-sdk-go that referenced this issue Nov 16, 2023
…edentials

Previously, we were using the StaticProvider for all credentials resolved
while reading shared config files. There is logic in the AfterRetryHandler that
marks a credential object as expired whenever an expired token exception
occurs. When that happens, the ideal workflow is for the provider to trigger
a credentials refresh by re-reading the profile file, but this is not possible
with the StaticProvider.

This commit addresses that problem by using the SharedCredentialsProvider
instead of the StaticProvider when resolving such credentials. With this
change, the SDK should automatically refreshes the credentials whenever we get
a expired token exception from AWS. This should also implicitly address the
feature requested in aws#1993, but instead of refreshing explicitly, the trigger
here would be an exception from AWS.
@jaylim-crl
Copy link

jaylim-crl commented Nov 16, 2023

It appears that there's logic in the AfterRetryHandler that marks a credential as expired whenever we receive an expired exception from AWS:

// when the expired token exception occurs the credentials
// need to be expired locally so that the next request to
// get credentials will trigger a credentials refresh.
if r.IsErrorExpired() {
r.Config.Credentials.Expire()
}

The current code uses a StaticProvider, which means that such refresh mechanisms would not work with a shared credential file. I have submitted a PR addressing that issue: #5073. It should also indirectly addresses this feature request. With that change, one could just periodically update the shared credentials file. As long as the file gets updated before it expires, the SDK should automatically re-read the credential file when it receives an expired token exception.

jaylim-crl added a commit to jaylim-crl/aws-sdk-go that referenced this issue Nov 17, 2023
…edentials

Previously, we were using the StaticProvider for all credentials resolved
while reading shared config files. There is logic in the AfterRetryHandler that
marks a credential object as expired whenever an expired token exception
occurs. When that happens, the ideal workflow is for the provider to trigger
a credentials refresh by re-reading the profile file, but this is not possible
with the StaticProvider.

This commit addresses that problem by using the SharedCredentialsProvider
instead of the StaticProvider when resolving such credentials. With this
change, the SDK should automatically refreshes the credentials whenever we get
a expired token exception from AWS. This should also implicitly address the
feature requested in aws#1993, but instead of refreshing explicitly, the trigger
here would be an exception from AWS.
jaylim-crl added a commit to cockroachdb/aws-sdk-go that referenced this issue Nov 22, 2023
…edentials

Previously, we were using the StaticProvider for all credentials resolved
while reading shared config files. There is logic in the AfterRetryHandler that
marks a credential object as expired whenever an expired token exception
occurs. When that happens, the ideal workflow is for the provider to trigger
a credentials refresh by re-reading the profile file, but this is not possible
with the StaticProvider.

This commit addresses that problem by using the SharedCredentialsProvider
instead of the StaticProvider when resolving such credentials. With this
change, the SDK should automatically refreshes the credentials whenever we get
a expired token exception from AWS. This should also implicitly address the
feature requested in aws#1993, but instead of refreshing explicitly, the trigger
here would be an exception from AWS.
@GarrettBeatty
Copy link

is there any update on this feature request? We have similar use case for using open telemetry and reading temporary credentials from shared credentials file.

@lucix-aws
Copy link
Contributor

We're not going to implement this, this type of credential setup isn't present anywhere in the AWS ecosystem to my knowledge (creds that will need refreshed are always vended from an explicitly dynamic source e.g. IMDS, eks/ecs endpointcreds, etc.) and none of the SDKs are implemented to refresh file-based credentials accordingly.

Our recommendation as of right now is to either implement a local credentials service to be retrieved with endpointcreds / IMDS or roll your own logic to load credentials from the ~/.aws/credentials file.

@lucix-aws lucix-aws closed this as not planned Won't fix, can't repro, duplicate, stale Jan 2, 2024
Copy link

github-actions bot commented Jan 2, 2024

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved.
Projects
None yet
Development

No branches or pull requests

9 participants