RFC: parameter retrieval utility #94
Comments
Thanks for creating the RFC, Nicolas :)

Two questions for you:

1. Do we want to support multiple parameters? E.g. fetch all parameters by path.
   Why - Helps those fetching more than one parameter with a single call to SSM.
2. Do we want to have a decorator for that too? E.g. @fetch/get_parameter.
   Why - Mostly syntactic sugar. It'd be a no brainer given we have a decorator factory, though I haven't put much thought into it as to naming conventions.
On Mon, 27 Jul 2020 at 18:07, Nicolas Moutschen wrote:
Key information
- RFC PR: (leave this empty)
- Related issue(s), if known:
- Area: Utilities
- Meet tenets <https://awslabs.github.io/aws-lambda-powertools-python/#tenets>: Yes
Summary
Add a utility to facilitate retrieval of parameters from the SSM Parameter Store or Secrets Manager. This would have support for caching parameter values, preventing retrieving the value at every execution.
Motivation
Many Lambda users use either the Parameter Store or Secrets Manager to store things such as feature flags, third-party API keys, etc. In a serverless context, a simple approach is to either fetch at every invocation (which might be costly/run into limits) or fetch at initialisation (meaning no control over expiration and refresh). This would simplify that experience for customers.
Proposal
This utility should provide a very simple to use interface for customers that don't want to deep-dive into how this utility works. To prevent hitting the throughput limit for the Parameter Store, this should have a default cache value in the single-digit seconds (e.g. 5).
*Basic usage*

```python
# For SSM Parameter
from aws_lambda_powertools.utilities import get_parameter
# For Secrets Manager
from aws_lambda_powertools.utilities import get_secret

def handler(event, context):
    param = get_parameter("my-parameter")
    secret = get_secret("my-secret")
```
*Advanced usage*

```python
from aws_lambda_powertools.utilities import get_parameter

def handler(event, context):
    # Only refresh after 300 seconds
    param = get_parameter("my-parameter", max_age=300)
```
Drawbacks
- This would add a dependency on boto3. Many functions probably use it in some form, but the Powertools don't require it at the moment.
- Many problems around parameters can be solved using environment variables, thus the usefulness is limited to cases where the value could change on short notice.
Rationale and alternatives
- *What other designs have been considered? Why not them?* Replicating the ssm-cache-python <https://github.com/alexcasalboni/ssm-cache-python> feature set, however this might be too feature-rich for this use case.
- *What is the impact of not doing this?* Users who want to retrieve dynamic parameters will have to think about the expiration logic if they don't want to risk getting throttled at scale.
Unresolved questions
Optional, stash area for topics that need further development, e.g. TBD
I lack data here to know how frequently this is used. In the context of an API key parameter or feature flags, this could be done through a single parameter. I'm also thinking that only supporting a single parameter would encourage, by design, the use of a single parameter because of the Parameter Store API limits <https://docs.aws.amazon.com/general/latest/gr/ssm.html#limits_ssm>. The default throughput limit is 40 RPS, which means a maximum concurrency of 200 assuming a timeout of 5 seconds. Users could encounter unexpected behaviour or more API calls than expected because of how GetParametersByPath <https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_GetParametersByPath.html> works: *If the service reaches an internal limit while processing the results, it stops the operation and returns the matching values up to that point and a NextToken.*

How would developers access the parameter value within the function? I'm not sure how to design this so it looks convenient from a developer perspective.
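As an aside on the NextToken behaviour mentioned above, here is a minimal sketch, assuming plain boto3 rather than the utility's eventual API (the helper name is illustrative), of what fetching all parameters under a path has to handle:

```python
import boto3

ssm = boto3.client("ssm")

def get_all_parameters_by_path(path: str, recursive: bool = True) -> dict:
    """Collect every parameter under `path`, following NextToken until exhausted."""
    values = {}
    next_token = None
    while True:
        kwargs = {"Path": path, "Recursive": recursive}
        if next_token:
            kwargs["NextToken"] = next_token
        response = ssm.get_parameters_by_path(**kwargs)
        for parameter in response.get("Parameters", []):
            values[parameter["Name"]] = parameter["Value"]
        next_token = response.get("NextToken")
        if not next_token:
            return values
```

Each call in that loop counts against the 40 RPS throughput limit, which is part of why fetching by path multiplies the cost of every invocation.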
That makes sense on 1 - Let's begin with a single parameter. If we hear otherwise from customers we could always extend that afterwards.

On 2, I was thinking of either injecting into the context, or introducing a shared state object within Powertools that can be retrieved anywhere in their code (default dict + Borg).

There are pros and cons to both, but if we want to do that, then I think the latter gives us flexibility to allow other customers who are using Chalice, like Michael expressed on the other RFC.
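As a purely hypothetical illustration of the "default dict + Borg" idea above (the SharedState class and its methods are made up for this sketch, not the Powertools implementation):

```python
from collections import defaultdict

class SharedState:
    # Borg pattern: every instance shares the same __dict__, so state written
    # by a decorator/utility is visible anywhere else in the user's code.
    _shared_state: dict = {"store": defaultdict(dict)}

    def __init__(self):
        self.__dict__ = self._shared_state

    def set(self, namespace: str, key: str, value) -> None:
        self.store[namespace][key] = value

    def get(self, namespace: str, key: str, default=None):
        return self.store[namespace].get(key, default)

# e.g. a parameter utility could write a value...
SharedState().set("parameters", "my-parameter", "some-value")
# ...and the handler could read it back from a brand-new instance
assert SharedState().get("parameters", "my-parameter") == "some-value"
```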
I agree on the shared state; there's potential usefulness there beyond the parameter store.
cc @bahrmichael @Nr18 @keithrozario @michaelbrewer - Would love to get your inputs on this as we scope this new utility.
As we will depend on boto3 and possibly other libraries as we grow our utilities for Powertools, it's worth thinking about whether customers would be fine with having another PyPI package just for utilities, e.g. from aws_lambda_powertools_utilities import get_parameter, validate, boto_tuner, requests_tuner...

UPDATE: An alternative could be bringing a few kilobytes into Powertools with new utilities, but in order to use some of them there would be a conscious, explicit decision of bringing in extra dependencies.

UPDATE 2: Well, the X-Ray SDK brings botocore (47M) in anyway. This means we can do: from aws_lambda_powertools.utilities import ...
@heitorlessa in the project that I am currently working on we are investigating if we could use the Powertools. We have the following:

```python
import json

import boto3
from botocore.exceptions import ClientError

client = boto3.client("secretsmanager")

def get_secret(secret_name: str) -> dict:
    if not secret_name:
        raise ValueError("Invalid secret_name given!")
    try:
        get_secret_value_response = client.get_secret_value(SecretId=secret_name)
    except ClientError as e:
        if e.response["Error"]["Code"] == "DecryptionFailureException":
            raise e
        elif e.response["Error"]["Code"] == "InternalServiceErrorException":
            raise e
        elif e.response["Error"]["Code"] == "InvalidParameterException":
            raise e
        elif e.response["Error"]["Code"] == "InvalidRequestException":
            raise e
        elif e.response["Error"]["Code"] == "ResourceNotFoundException":
            raise e
    else:
        if "SecretString" not in get_secret_value_response:
            raise ValueError("Expected to find a SecretString!")
        return json.loads(get_secret_value_response["SecretString"])
```

It's not that fancy; I believe it's largely the code that Secrets Manager supplies as a sample, but the reason for us to put it in a separate file and method was so that we could use it like this:

```python
import os

def handler(event: dict, context: LambdaContext) -> None:
    secret = get_secret(os.getenv("MY_SECRET", ""))
```

```python
# The following is a snippet from our unit tests
from unittest.mock import MagicMock

from src.my_lambda.function import index

def test_handler(monkeypatch) -> None:
    monkeypatch.setenv("MY_SECRET", "my-secret-key")
    mock_get_secret = MagicMock(return_value={})
    monkeypatch.setattr(index, "get_secret", mock_get_secret)
    index({}, LambdaContext())
    mock_get_secret.assert_called_with("my-secret-key")
```

Previously we used … We only implemented the …

```python
from aws_lambda_powertools.utilities import get_secret, get_json_secret, get_binary_secret

# or

from aws_lambda_powertools.utilities import get_secret

get_secret("MySecret").json
get_secret("MySecret").text
get_secret("MySecret").binary
```

Because you typically will have one Lambda function doing one thing, I don't necessarily see the need to fetch multiple secrets, but there could be a use case where a Lambda is invoked and needs one set of credentials to read something from an API and then a different set of credentials to write to another API. (I think this is typically needed when you use a scheduled event rule to keep something in sync, for example.)

For the Parameter Store, however, I do see the need to fetch all values recursively. Let's say you would have a parameter store structure as follows: … So when you run …, from a usage perspective it would be nice to be able to do something like:

```python
application_id = "foo"
application_params = get_parameters(f"/Services/{application_id}", recursive=True)
print(application_params.Name)
print(application_params.get("Name", "No name found"))
```
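As a side note on the usage shown above, one hypothetical shape for the returned object that supports both attribute access and dict-style .get() with a default could look like this (ParameterGroup is an invented name, not part of the RFC):

```python
class ParameterGroup(dict):
    """Dict of parameter names to values that also exposes keys as attributes."""

    def __getattr__(self, name: str):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc

params = ParameterGroup({"Name": "my-service", "Owner": "team-x"})
print(params.Name)                             # attribute access -> "my-service"
print(params.get("Missing", "No name found"))  # dict-style access with a default
```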
Hey @Nr18! Thanks a lot for the (very detailed) feedback here. 😄

On the return type, I have questions if someone is doing something like this:

```python
value = get_parameter("some-parameter")
print(value.json["a"])
print(value.json["b"])
```

In this case, should we cache the output from json.loads()?

```python
get_parameter("some-parameter", format="text")    # Default, returns a string
get_parameter("some-parameter", format="json")    # json.loads() the value
get_parameter("some-parameter", format="binary")  # base64.b64decode() the value
```

Agree on the params.Dependencies.OtherAppId … I quite like using …
Another question that popped up in the PR (ping @jplock): should we decrypt data or not? I'm tempted to say yes for ease of use, but I'm worried about the security implications of having something that decrypts data without explicit action. E.g. developers could forget that this is sensitive information and accidentally leak it because they didn't explicitly decrypt it themselves.
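For illustration, one possible opt-in shape (the decrypt flag mirrors the kwarg that appears later in the PR snippet; everything else here is an assumption rather than a decided behaviour):

```python
import boto3

ssm = boto3.client("ssm")

def get_parameter(name: str, decrypt: bool = False) -> str:
    # WithDecryption=False returns the ciphertext for SecureString parameters,
    # so callers must opt in explicitly to get the plaintext value.
    return ssm.get_parameter(Name=name, WithDecryption=decrypt)["Parameter"]["Value"]

# The explicit flag keeps the sensitive read visible at the call site:
api_key = get_parameter("/prod/third-party/api-key", decrypt=True)
```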
I did something very similar to this a few months back: https://github.com/keithrozario/lambda-cache. It looks something like this:

```python
from lambda_cache import ssm

@ssm.cache(parameter='/production/app/var', max_age_in_seconds=300)
def handler(event, context):
    var = getattr(context, 'var')
    response = do_something(var)
    return response
```

I used a decorator with parameters to inject the parameter/secret into the context (because unlike the event, the context is the same regardless of what triggers the function). But I'm not sure if it's 100% the best way to do this. It does allow us to do interesting things though, like decorator stacking, and even multiple parameter caching. My thoughts: …
Not sure if this should be part of this RFC or if it's more of a project-wide discussion, but since Python is moving to a more type-hinted language with every release, it would help to provide typing that supports the developer both in the actual functions and, maybe more importantly, when writing unit tests. If a decorator changes the context object, it would be great if your IDE helps you figure out what it does (no expert in this area) instead of having to read through samples that are potentially outdated, if they exist at all. Love the caching idea for a secret/parameter by the way; it saves a few calls and would reduce invocation times.
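A small, hypothetical illustration of the typing point using typing.Protocol (none of these names come from Powertools): an explicitly typed interface gives the IDE and mypy something to check against, and test doubles can satisfy it structurally without inheriting from anything.

```python
from typing import Protocol

class SecretProvider(Protocol):
    def get_secret(self, name: str) -> dict: ...

def load_config(provider: SecretProvider) -> dict:
    # The IDE/mypy know get_secret returns a dict, so dict methods autocomplete here
    return provider.get_secret("my-app/config")

class FakeSecrets:
    """Structural typing: no inheritance needed to satisfy SecretProvider."""
    def get_secret(self, name: str) -> dict:
        return {"api_key": "test-key"}

assert load_config(FakeSecrets()) == {"api_key": "test-key"}
```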
I'm not a huge fan of using the Lambda context for this. This has a few issues: …

On making a generic caching system out of this, that could be a good idea! That'd drastically increase the scope and potentially help way more use cases; I just have a few concerns about expanding the scope too much. People could use DynamoDB or a relational DB to store parameters and want to retrieve them. However, when thinking about S3, some people might want to pull 100s of MBs and put that into /tmp, which I feel is out of scope for this.

We could make a generic parameter retrieval system, accepting parameter store providers. This way, people could make their own parameter store providers if they want. Then we could provide ones for common cases (e.g. SSM Parameter Store, Secrets Manager, DynamoDB, etc.). I'd keep it to things that we can pull from boto3, though, to not add dependencies on Postgres/MySQL/etc. libraries.

By the way, on boto3: we are already pulling botocore indirectly through the X-Ray SDK. boto3 doesn't add much there compared to botocore, so I think it's fine to have a direct dependency.
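A hedged sketch of that pluggable-provider idea, using DynamoDB (mentioned above) as the example; the class and method names are illustrative stand-ins, not the PR's actual interface:

```python
import boto3

class BaseProvider:
    def get(self, name: str, **kwargs) -> str:
        # Shared caching/transformation logic would live here,
        # wrapping the provider-specific retrieval below.
        return self._get(name, **kwargs)

    def _get(self, name: str, **kwargs) -> str:
        raise NotImplementedError

class DynamoDBProvider(BaseProvider):
    """Example user-defined provider storing parameters as {id, value} items."""

    def __init__(self, table_name: str):
        self.table = boto3.resource("dynamodb").Table(table_name)

    def _get(self, name: str, **kwargs) -> str:
        return self.table.get_item(Key={"id": name})["Item"]["value"]

# Usage: every provider exposes the same get() interface
# provider = DynamoDBProvider("my-parameters")
# value = provider.get("my-parameter")
```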
@nmoutschen I thought about it and realized that we typically use SQS in front of the Lambda function, and we have some plumbing (which might be a good candidate to be done by the Powertools 💪, so if you agree I can try to write an RFC for that) that looks something like this:

```python
sqs = SQS(os.getenv("EVENT_SOURCE"))

def handler(event: dict, context: LambdaContext) -> None:
    # Run the given callback method for each message
    sqs.process_messages(event["Records"], sqs_callback)

def sqs_callback(message: str, attributes: dict) -> bool:
    # get_secret() or get_parameter()
    # Process your message here
    return True  # False or an Exception would not delete the specific message
```

So the reason why we built this is that we had a lot of duplicate code for handling the messages and making sure they get deleted, and that made the tests of the function somewhat complex and harder to maintain. With ☝️ you typically test that, if the handler is called, the records are passed to the callback. That solution would not work if the secrets would come in the context.

I like the idea of the generic parameter retrieval system, as long as the Parameter Store and Secrets Manager are included so you don't need to write your own. Regarding boto3 being included: I have not seen a clear use case where I do not include it, so the Powertools might not use it but the business logic in the function typically does, so I am definitely fine with pulling it in by default, especially when you consider: botocore (47M) vs boto3 (1M). (We always include boto3 in a Lambda layer anyway.)
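A guess at what the process_messages plumbing described above might look like; the SQS class is the commenter's internal code, so everything here beyond the usage shown above is an assumption:

```python
import boto3

class SQS:
    def __init__(self, queue_url: str):
        self.queue_url = queue_url
        self.client = boto3.client("sqs")

    def process_messages(self, records: list, callback) -> None:
        """Run the callback per record; delete only messages acknowledged with True."""
        for record in records:
            try:
                success = callback(record["body"], record.get("messageAttributes", {}))
            except Exception:
                success = False  # leave the message on the queue for a retry
            if success:
                self.client.delete_message(
                    QueueUrl=self.queue_url,
                    ReceiptHandle=record["receiptHandle"],
                )
```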
Following the discussion here, I've done a few things in the PR: …

The implementation for a specific provider is fairly straightforward, and much of the caching/transformation logic is handled by the BaseProvider. For example, for the SSM Parameter Store:

```python
def _get(self, name: str, **kwargs) -> str:
    # Load kwargs
    decrypt = kwargs.get("decrypt", False)

    return self.client.get_parameter(Name=name, WithDecryption=decrypt)["Parameter"]["Value"]
```
@nmoutschen great discussion and the PR looks great! Have you considered adding an "automatic retry" decorator as well? I implemented that here: https://github.com/alexcasalboni/ssm-cache-python/blob/master/ssm_cache/cache.py#L143

The idea was to simplify the invalidation of parameters that might change at run-time without forcing a short cache TTL, based on a specific exception/error. So your cache is valid forever, until you get the expected error. Something like this:

```python
from ssm_cache import SSMParameter
from my_db_lib import Client, InvalidCredentials  # pseudo-code

param = SSMParameter('my_db_password')
my_db_client = Client(password=param.value)

# this callback is called when InvalidCredentials is raised
# it will just re-initialize the db client with the new password
def on_error_callback():
    global my_db_client
    my_db_client = Client(password=param.value)

@param.refresh_on_error(InvalidCredentials, on_error_callback)
def read_record(is_retry=False):
    return my_db_client.read_record()

def lambda_handler(event, context):
    return {
        'record': read_record(),
    }
```

I know that for this specific use case we'd rather recommend using AWS Secrets Manager to handle db host/password & automatic rotation, but there are other cases where you want to be able to automatically re-fetch a parameter and retry (instead of a lot of try-catch and manual invalidation).
Closing as we released this feature in 1.3.0.