Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding persistent caching of temporary credentials with serialization #1157

Conversation

gene1wood
Copy link
Contributor

@gene1wood gene1wood commented Feb 24, 2017

Adding persistent caching of temporary credentials from aws/aws-cli@22932e5 with serialization

This PR addresses #1148 by porting the code present in awscli into botocore to make it available to boto3. Having the code in both awscli and botocore until it's removed from awscli doesn't appear to have any negative impact.

One thing I'd like guidance on is:

  • The RefreshableCredentials class stores the _expiry_time natively in datetime format.
  • The AssumeRoleProvider class expects the possibility that the response['Credentials']['Expiration'] value, which may come from the cache, may need to be parsed from a string back into a datetime
  • In my searches of the awscli code it appears to me that awscli doesn't have any special logic (logic not present in botocore) to ensure that the expiration datetime is serialized to a string before it's passed to JSONFileCache
  • When I use the existing assumerole customization in awscli the expiration datetime is correctly serialized to a string when stored in the JSONFileCache
  • When I use this new functionality that I've moved into botocore in this PR without an explicit serialization call the expiration datetime is passed to JSONFileCache un-serialized (which JSONFileCache rightly rejects). As a result I've set json.dumps to serialize the datetime but I don't understand why this step isn't also required for awscli to work.

I've implemented the serialization by adding _serialize_if_needed as the default argument to the json.dumps which works but seems to be in opposition to the intent of the JSONFileCache based on the test_only_accepts_json_serializable_data test.

@jamesls Can you provide any insight on this question about serialization? If there's any way I can remove that default argument to json.dumps which calls _serialize_if_needed and have the expiration datetime serialized through whatever method awscli accomplishes it I'd prefer it.

Note : I originally submitted PR #1156 with serialization outside of JSONFileCache but I implemented it in a way that didn't work so I closed out that PR.

To try this PR out you can install it by running

pip install git+https://github.com/gene1wood/botocore.git@persistent-credential-cache-with-serialization

@codecov-io
Copy link

codecov-io commented Feb 24, 2017

Codecov Report

Merging #1157 into develop will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1157      +/-   ##
===========================================
+ Coverage    98.11%   98.12%   +<.01%     
===========================================
  Files           46       46              
  Lines         7649     7685      +36     
===========================================
+ Hits          7505     7541      +36     
  Misses         144      144
Impacted Files Coverage Δ
botocore/session.py 98.67% <ø> (ø) ⬆️
botocore/credentials.py 98.4% <100%> (+0.08%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e5dc5e9...e5dbeba. Read the comment docs.

@andrewkrug
Copy link

This works great! Please merge.

@et304383
Copy link

Please merge this and upload to pip!

with os.fdopen(os.open(full_key,
os.O_WRONLY | os.O_CREAT, 0o600), 'w') as f:
f.truncate()
f.write(file_content)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, should this be written to a temporary location and then renamed into place so that it is written in one atomic action?

Copy link
Contributor Author

@gene1wood gene1wood Mar 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesls Can you speak to @dstufft 's question about this section of your commit aws/aws-cli@22d4cea aws/aws-cli@a508d66 and aws/aws-cli@22932e5 specifically about opening the file and writing to it in two separate steps?

Copy link
Contributor Author

@gene1wood gene1wood Mar 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dstufft I brought this code block over directly from @jamesls 's code in aws-cli. I would imagine if we wanted to implement your suggestion (which I think is good) we should do so in aws-cli and boto. If we defer that change it could potentially be made only in botocore once aws-cli no longer duplicates this cache persistence logic. @dstufft Do you think we should change this here in this PR now or wait on fixing it in both botocore and aws-cli?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dstufft Could you let me know if you'd like me to implement your suggestion in light of my note in the comment above . Really, any thoughts you have would be helpful here.

@dstufft dstufft added pr/needs-review This PR needs a review from a member of the team. needs-additional-reviewer labels Mar 23, 2017
@dstufft
Copy link
Contributor

dstufft commented Mar 23, 2017

One implementation comment, but otherwise this looks ~ok to me. I'd like another reviewer though!

@et304383
Copy link

Can we please oh pretty please merge this?

@gene1wood
Copy link
Contributor Author

@dstufft

Any thoughts on my comment above?

@dstufft or @stealthycoin

Anything you can think of that I can do to enable/encourage @jamesls @JordonPhillips or @kyleknap to review this PR?

@et304383
Copy link

I know everyone is probably very busy, but please can we prioritize merging this? We've stopped writing Python scripts for AWS CLI calls because of how annoying having to put MFA in every single time has become.

@et304383
Copy link

Guys this is getting ridiculous. This is an insanely needed feature and there's been no activity on it. Why can't this be merged?

@jamesls
Copy link
Member

jamesls commented May 26, 2017

I don't mind adding the json file cache class here in botocore, but I'm hesitant to change the defaults to start writing out to ~/.aws/cli/cache in botocore, which isn't actually tied to the CLI. In botocore, the default implementations are typically in memory representations. I'd prefer to make this an opt-in configurable thing (likely via client config objs), in addition to allowing a user to specify the cache directory so it can be shared with the CLI.

I think that should be a relatively small change. Thoughts?

@et304383
Copy link

If the issue is putting it in botocore then why not put it into boto3?

@et304383
Copy link

I've resorted to applying a patch based on this pull request to my local botocore package folder. It's been working without issue for weeks. I just don't see why anyone would not want this behaviour. Am I missing something? Windows support issues?

@dstufft
Copy link
Contributor

dstufft commented Jun 16, 2017

@eric-tucker I think the comment by @jamesls stands even if we moved to it boto3 instead of botocore, since I don't think that boto3 has any default implementations that aren't in memory either. Would having a default in memory implementation with the way to configure it work for you? Does it need to be on by default?

@et304383
Copy link

@dstufft I think there should be some synergy between boto3 and the CLI since they're both powered by the same engine (sort of?)

Considering that all SDKs (I think) are cable of using ~/.aws contents I see no reason that by default the SDKs shouldn't also utilize the exact same mechanisms for caching MFA credentials as the CLI. Consistency is key.

I believe that it's because this wasn't all done in a consistent manner across SDKs from day one that we're in the situation we're in now. Consider that the SDKs are only catching up recently to supporting the same techniques that the CLI does, such as assumed role profiles.

I could be way off here, of course.

@nskitch
Copy link

nskitch commented Jul 12, 2017

I'm just here as a cheerleader. Almost a month since last comment. I'd love to see this functionality restored. Even if it means getting it working again, and assigning future work to follow the best practice suggestions.
typing an mfa code for each iteration of my python/boto script testing is slowing me down. appreciate everybody's work!

@rvandegrift
Copy link
Contributor

@gene1wood are you still interested in this PR? If not, I'm happy to pick it up and implement the changes that @jamesls suggested.

This adds two new configuration settings
* `credential_cache`
* `credential_cache_directory`

which allow the user to change the default cache system from an in-memory cache to a filesystem cache

It also changes the default filesystem cache directory to one unrelated to the awscli cache directory
@gene1wood
Copy link
Contributor Author

74a061e implements the suggestion above

This change necessarily requires a PR against boto3 to update the configuration documentation.

This uses two configuration settings, one to set the type of cache to be used, and one to set the directory to store the cache in the filesystem if a file based cache is used. This seeks to follow the intent of the cache argument to credentials.AssumeRoleProvider by leaving it open to additional cache systems being added down the road.

The default behavior is to function as botocore does currently by using an in memory cache.

The default file cache directory is a new directory, ~/.aws/cache. This is different from the cache directory of awscli (~/.aws/cli/cache) but the user can set it to be the same so that botocore and awscli share cache files.

Feel free to suggest different configuration setting names or environment variable names, suggest words other than memory and file to select the cache system, suggest a different default directory or anything else.

I did not add a unit test for the session conversion_func values for credential_cache and credential_cache_directory which use string.lower and os.path.exanduser respectively because it felt like overkill but I'm happy to if it' needed.

@gene1wood
Copy link
Contributor Author

gene1wood commented Jul 23, 2017

Well, that didn't work, there's a problem with my unit tests. I've tested and confirmed the new code work and that the issue is with my unit test. I'll keep looking for what I'm doing wrong but if someone else sees the mistake feel free to speak up. I'm not very knowledgeable about the botocore unit test framework

@et304383
Copy link

I see no reason to not use the same cache directory by default. If I'm switching between cli and boto I would prefer not to have to enter MFA twice.

I'm in favour of defaults that support making user's lives easier.

@gene1wood
Copy link
Contributor Author

@et304383 I think @jamesls 's point was that the default behavior should be distinct from awscli and as a consequence shouldn't use the same directory but that the user should be able to set the directory to be the same.

@jamesls I'd like some clarification on your thought on this.

@gene1wood
Copy link
Contributor Author

@jamesls Does this update satisfy your concerns

@rvandegrift
Copy link
Contributor

awscli uses botocore to get temp creds, and both will race to update the cache

I don't think this is the case in that awscli doesn't write to the cache itself, it merely tells boto to do so

Is that the right link? The code there isn't asking boto to cache, it has a local copy of JSONFileCache that directly modifies the json files in ~/.aws/cli/cache. Am I missing something?

@gene1wood
Copy link
Contributor Author

gene1wood commented Aug 31, 2017

I could be mistaken but I think that the existing awscli code

In between those two steps I don't see any race condition. On line 39 the CredentialProvider would set self.cache to the botocore JSONFileCache, then on the subsequent line it would change self.cache to use the awscli JSONFileCache

@rvandegrift to help me understand, can you walk through a scenario where the race condition you're thinking of would be triggered?

@rvandegrift
Copy link
Contributor

Ah, now I get it - I missed that the cache provider is replaced. Makes perfect sense now, sorry for being dense!

@et304383
Copy link

OK so if the concerns around race conditions on file cache are satisfied, can we please merge this PR with the option to enable file cache in a persistent manner? I don't want to have to set an additional option in code to enable this. It should be a user environment setting.

@gene1wood
Copy link
Contributor Author

@et304383 This PR allows a user to enable file cache in a persistent manner by setting the credential_cache and credential_cache_directory config settings. No in-code settings are needed.

@jamesls Does this update satisfy your concerns

@stealthycoin stealthycoin added feature-request This issue requests a feature. large labels Sep 14, 2017
@gene1wood
Copy link
Contributor Author

@jamesls Does this update satisfy your concerns

@gwkunze
Copy link

gwkunze commented Oct 16, 2017

While we were eagerly anticipating the merging of this PR, the workaround we now use in our code is to use the caching mechanism of the awscli itself. This does require making the awscli a dependency of your app but at least it alleviates the problem of constantly having to wait for (and enter) new MFA codes.

        session = boto3.session.Session()
        # UGLY Hack to use credentials caching from awscli in this application
        # Awaiting https://github.com/boto/botocore/pull/1157 to be merged
        try:
            from awscli.customizations.assumerole import inject_assume_role_provider_cache
            from awscli.customizations.scalarparse import add_scalar_parsers
            inject_assume_role_provider_cache(session._session)
            add_scalar_parsers(session._session)
        except:
            pass

@gene1wood
Copy link
Contributor Author

@jamesls Does this update satisfy your concerns

@et304383
Copy link

et304383 commented Nov 23, 2017

Why on earth can't we merge this? :(

Looks like we have conflicts now.

@joguSD
Copy link
Contributor

joguSD commented Nov 24, 2017

The conflicts seem pretty minimal, mostly just imports and a declaration moved.

@gene1wood
Copy link
Contributor Author

gene1wood commented Nov 25, 2017

Oh, didn't even notice that master had moved causing conflicts. I've fixed that and it can be merged.

…ntial-cache-with-serialization

# Conflicts:
#	botocore/credentials.py
#	tests/unit/test_credentials.py
@JordonPhillips
Copy link
Contributor

Thanks for opening the PR and driving it as far as you have! I've talked with the rest of the team about this, and here's the interface we'd like to go for:

from botocore.session import Session
from botocore.credentials import create_credential_resolver
from botocore.credentials import JSONFileCache


sess = Session()
cache = JSONFileCache()
resolver = create_credential_resolver(session, cache)
sess.register_component('credential_resolver', resolver)

There are several reasons for this. This allows a customer to specify whatever cache implementation they want instead of specifically tying them to our built-in cache implementations. We didn't simply add a credential_cache session component because that would only ever be used once. The way the session works, the credential chain is generated exactly once. So if you set credential_cache after the chain had already been generated, your new cache would never be used. We can't simply re-generate the chain because customers can provide their own, so we would effectively be overwriting their custom chain with our default chain without warning.

With this approach customers can use whatever cache they want and the process for setting it will result in the behavior they would expect out of the Session whenever they decide to set it. It still leaves us the ability to add config file settings later on.

I'll be taking this on unless you really want to do it yourself. Let me know what you think.

@et304383
Copy link

@JordonPhillips are you suggesting we'd have to enable proper caching of MFA on a script by script basis?

I would highly suggest providing a flexible way for users to set this on a system level. MFA is for interactive execution of scripts. Please tell me a scenario where I, as a user, would want to cache MFA for one script and not for another.

@JordonPhillips
Copy link
Contributor

@et304383 I'm not saying we should rule out external configuration, just that I would like to keep that out of the scope of this PR. The reason for that is that the AWS SDKs have an initiative to add features around configuration in a consistent manner. I'll have conversations with other SDK engineers to see how the changes will affect them and if they have feedback on how it is exposed.

This process will take some time, so in the meantime I would like to enable a minimal workaround inside botocore that will also make it easier to add external configuration down the line.

@gene1wood
Copy link
Contributor Author

@JordonPhillips Thanks!

@et304383
Copy link

OK how do we restore the functionality of this PR? Having to type MFA in every time I run a python script is driving me nuts.

@gene1wood
Copy link
Contributor Author

Here's a summary of where the session caching functionality in botocore is at

In order to use this new session caching functionality, you'll need to both

  • enable it in your code instead of your user's config
  • set the working_dir where the on disk cache files are stored to be the same as where awscli stores it's cache. This is optional. With it awscli and botocore share a single set of cached sessions reducing the frequency with which you have to type your MFA code.

Here's a code snippet from @mixja showing how to use the new functionality

from botocore import credentials
import botocore.session
import boto3
import os

# Change the cache path from the default of ~/.aws/boto/cache to the one used by awscli
working_dir = os.path.join(os.path.expanduser('~'),'.aws/cli/cache')

# Construct botocore session with cache
session = botocore.session.get_session()
provider = session.get_component('credential_provider').get_provider('assume-role')
provider.cache = credentials.JSONFileCache(working_dir)

# Create boto3 client from session
client = boto3.Session(botocore_session=session).client('s3')

print(client.list_buckets())

adamnovak added a commit to DataBiosphere/toil that referenced this pull request Sep 19, 2019
See boto/botocore#1157 and also
boto/botocore#1338 which actually merged.

There's no global way to say that we want Boto3 to cache credentials;
we need to do a dance for every session.

TODO: Should we use the same credentials cache as awscli? Right
now this uses the default boto3 cache location which is different.

TODO: Should we remove the custom caching at the boto2 monkey
patch level?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request This issue requests a feature. pr/needs-review This PR needs a review from a member of the team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet