RFC: Collection of tiny utilities #30

Closed
heitorlessa opened this issue May 5, 2020 · 22 comments
Labels
help wanted Could use a second pair of eyes/hands RFC

Comments

@heitorlessa
Contributor

heitorlessa commented May 5, 2020

Background

At GA, Powertools will offer Tracer, Metrics, Logger, and Middleware factory as the core utilities.

Optionally, I'm pondering the idea of providing a set of tiny, handy utilities that can be used either as standalone functions or as part of a custom middleware - for example, JSON serialization, retry detection, fetching secrets, etc.

The benefit of providing utilities as functions is two-fold: a) ease of maintenance, and b) you can pick and choose utilities and create a single custom middleware instead of nesting a myriad of decorators.

Proposed solution

Use case: Custom middleware using a handful of utilities

from aws_lambda_powertools.middleware_factory import lambda_handler_decorator
from aws_lambda_powertools.utilities import validate_event, cors, decode_json, detect_retry

@lambda_handler_decorator
def custom_middleware_name(handler, event, context):
    # Before
    detect_retry(event)
    event = decode_json(event)
    validate_event(event)

    response = handler(event, context)

    # After
    response = cors(response)
    return response

@custom_middleware_name
def lambda_handler(event, context):
    ...

Use case: Using utilities standalone

from aws_lambda_powertools.utilities import validate_event, cors, decode_json, detect_retry

# No middleware decorator here: the utilities are called directly
def lambda_handler(event, context):
    detect_retry(event)
    event = decode_json(event)
    validate_event(event)

    result = ...  # business logic goes here

    return cors(result)
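
To make the idea more concrete, here is a minimal, standard-library-only sketch of what two such utilities might look like as plain functions, and how they could be combined. The bodies of `decode_json` and `cors`, the `origin` parameter, and the `custom_middleware` decorator are all assumptions for illustration; none of this is an existing Powertools API.

```python
import json
from functools import wraps


def decode_json(event):
    """Hypothetical helper: return a copy of the event with a parsed JSON body."""
    body = event.get("body")
    if isinstance(body, str):
        return {**event, "body": json.loads(body)}
    return event


def cors(response, origin="*"):
    """Hypothetical helper: add a CORS header without mutating the input."""
    headers = {**response.get("headers", {}), "Access-Control-Allow-Origin": origin}
    return {**response, "headers": headers}


def custom_middleware(handler):
    """Plain-decorator stand-in for a middleware built from the helpers above."""
    @wraps(handler)
    def wrapper(event, context):
        event = decode_json(event)          # before the handler runs
        response = handler(event, context)
        return cors(response)               # after the handler returns
    return wrapper


@custom_middleware
def lambda_handler(event, context):
    return {"statusCode": 200, "body": json.dumps({"name": event["body"]["name"]})}
```

Because each helper is an ordinary function that returns a new dict, the same helpers can be called directly inside a handler or wrapped once in a single middleware.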

Request for comments

As part of this RFC, I'd like to know which utilities would be the most useful to have upfront - leave a comment, and vote using 👍 on existing comments instead of adding a new one.

For ideas, here are some utilities as decorators created by other awesome authors: Lambda Decorators by Grid smarter cities, and Lambda Decorators by Daniel Schep.

Tenets

  • AWS Lambda only – We optimise for AWS Lambda function environments only. Utilities might work with web frameworks and non-Lambda environments, though they are not officially supported.
  • Eases the adoption of best practices – The main priority of the utilities is to facilitate best practices adoption, as defined in the AWS Well-Architected Serverless Lens; all other functionality is optional.
  • Keep it lean – Additional dependencies are carefully considered for security and ease of maintenance, and prevent negatively impacting startup time.
  • We strive for backwards compatibility – New features and changes should keep backwards compatibility. If a breaking change cannot be avoided, the deprecation and migration process should be clearly defined.
  • We work backwards from the community – We aim to strike a balance of what would work best for 80% of customers. Emerging practices are considered and discussed via Requests for Comment (RFCs).
  • Idiomatic – Utilities follow programming language idioms and language-specific best practices.

* Core utilities are Tracer, Logger and Metrics. Optional utilities may vary across languages.

@heitorlessa heitorlessa added enhancement help wanted Could use a second pair of eyes/hands labels May 5, 2020
@heitorlessa heitorlessa changed the title Feature: Collection of tiny utilities for custom middlewares [RFC] Feature: Collection of tiny utilities for custom middlewares May 5, 2020
@heitorlessa heitorlessa added the RFC label May 5, 2020
@nmoutschen
Contributor

One question I have here is the balance regarding the number of utility functions. This should provide utilities that are useful for most use cases out of the box (e.g. CORS, validate authentication for APIGW, etc.) and not add additional dependencies to this project. Having strong guidelines here would prevent bloat/scope creep.

Some are hard to implement without external dependencies/prerequisites. E.g. detecting retries in a case where a Lambda function has multiple execution environments means fetching it from an external data source.

Here are some I'd find useful:

  • Around common Lambda use-cases (processing events from API Gateway proxy integration, creating API Gateway responses, SNS, SQS, Kinesis, etc.).
  • Validating an event against a JSON schema (if it doesn't add too many dependencies).

@heitorlessa
Contributor Author

Absolutely - I wasn't explicit as I haven't published the tenets yet, but we'd favour, first, utilities that don't need additional dependencies and, second, those that cover the most useful cases.

We use fastjsonschema, so a schema validator isn't a problem. Detecting retries would mean detecting a retried Lambda invocation, where the request_id from the context would be the same - there are other, more sophisticated ways to do it, that's for sure.
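
A rough sketch of the request ID idea, under the assumption that a retried invocation carries the same `aws_request_id` on the Lambda context. The module-level set and the `detect_retry` signature are hypothetical. As noted above, this only catches retries that land in the same execution environment; anything else needs an external data source.

```python
from types import SimpleNamespace

# Lives only as long as this execution environment, and grows without bound
_seen_request_ids = set()


def detect_retry(context):
    """Return True if this request ID was already seen in this environment."""
    request_id = context.aws_request_id
    if request_id in _seen_request_ids:
        return True
    _seen_request_ids.add(request_id)
    return False


# Stand-in for the Lambda context object, for local experimentation only
fake_context = SimpleNamespace(aws_request_id="abc-123")
first = detect_retry(fake_context)
second = detect_retry(fake_context)
```

Here `first` is `False` and `second` is `True`, since the second call reuses the same request ID.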

@keithrozario
Contributor

Would definitely love to see a native download to memory functionality for S3. Current method of doing it is too clunky.

Something like:

data = aws_lambda_powertools.json_from_s3('bucket', '/path/to/file.json')
some_config = data['some']['config']
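
One way such a helper could be sketched, assuming a boto3 S3 client and its `get_object` call. The function name mirrors the suggestion above, but the `client` parameter and the `FakeS3Client` stand-in are hypothetical, added so the snippet can be tried without AWS credentials.

```python
import io
import json


def json_from_s3(bucket, key, client=None):
    """Download an S3 object and parse it as JSON without touching disk."""
    if client is None:
        import boto3  # imported lazily so the sketch can be tried without boto3
        client = boto3.client("s3")
    response = client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read())


class FakeS3Client:
    """Hypothetical stand-in for a boto3 S3 client, for local testing only."""

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(b'{"some": {"config": 42}}')}


data = json_from_s3("bucket", "path/to/file.json", client=FakeS3Client())
some_config = data["some"]["config"]
```

The whole object is read into memory, so this is only a fit for small files such as configuration documents.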

@bahrmichael

I had trouble with the Decimal type from DynamoDB. I'm now using import simplejson as json everywhere as a workaround. I'm curious whether that workaround can be replaced with something better.
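
For reference, a standard-library alternative is a custom `json.JSONEncoder`. This is a minimal sketch, not an existing Powertools helper; the class name `DecimalEncoder` is an assumption, and converting to `float` can lose precision for values that a binary float cannot represent exactly.

```python
import json
from decimal import Decimal


class DecimalEncoder(json.JSONEncoder):
    """Serialize Decimal values as JSON numbers."""

    def default(self, o):
        if isinstance(o, Decimal):
            # Whole numbers become int, everything else float (may lose precision)
            return int(o) if o == o.to_integral_value() else float(o)
        return super().default(o)


item = {"id": Decimal("7"), "price": Decimal("19.99")}
encoded = json.dumps(item, cls=DecimalEncoder)
```

Any type other than `Decimal` still falls through to the default encoder, so unsupported objects raise `TypeError` as usual.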

@bahrmichael

To auto enable CORS with SPAs I wrote my own class to convert dict to APIGW compatible payloads.

import json
import logging

logger = logging.getLogger(__name__)


def deep_render(body, to_json_method):
    # Strings, ints and dicts are assumed to be JSON-serializable already
    if isinstance(body, (str, int, dict)):
        return body
    if isinstance(body, list):
        return [deep_render(el, to_json_method) for el in body]
    # Anything else is converted by the caller-supplied function
    return to_json_method(body)


class ApiResponse:

    status_code = 200
    body = None
    headers = {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Credentials": "true"
    }

    def __init__(self, status_code=None, body=None, headers=None):
        if status_code is not None:
            self.status_code = status_code
        if body is not None:
            self.body = body
        # Copy the class-level defaults so custom headers don't leak
        # into every other ApiResponse instance
        self.headers = {**self.headers, **(headers or {})}

    def render(self, to_json_method=None):
        response = {'statusCode': self.status_code, 'headers': self.headers}
        if self.body is not None:
            self.body = deep_render(self.body, to_json_method)
            response['body'] = json.dumps(self.body)
        logger.debug('Response: %s', json.dumps(response))
        return response

Note that this code is more than a year old and was written when I started out with Lambda+Python. I hope there's something better out there that I just was too lazy to search for yet. Having some help to convert payloads to CORS compatible responses plus fully configurable status codes (I might want to return a payload and a certain http error code) would be nice.

@keithrozario
Contributor

+1 for the DynamoDB decimal helper. I paste a helper function in my code I copied from AWS, but better if that is in a proper library somewhere.

@heitorlessa
Contributor Author

heitorlessa commented May 12, 2020

Hey everyone - I've just published the Tenets to help us prioritize suggestions from this RFC.

Adding them below to facilitate discussions

Tenets

  • AWS Lambda only – We optimise for AWS Lambda function environments only. Utilities might work with web frameworks and non-Lambda environments, though they are not officially supported.
  • Eases the adoption of best practices – The main priority of the utilities is to facilitate best practices adoption, as defined in the AWS Well-Architected Serverless Lens; all other functionality is optional.
  • Keep it lean – Additional dependencies are carefully considered for security and ease of maintenance, and prevent negatively impacting startup time.
  • We strive for backwards compatibility – New features and changes should keep backwards compatibility. If a breaking change cannot be avoided, the deprecation and migration process should be clearly defined.
  • We work backwards from the community – We aim to strike a balance of what would work best for 80% of customers. Emerging practices are considered and discussed via Requests for Comment (RFCs).
  • Idiomatic – Utilities follow programming language idioms and language-specific best practices.

* Core utilities are Tracer, Logger and Metrics. Optional utilities may vary across languages.

@keithrozario
Contributor

I fully agree with this, great job @heitorlessa. Especially on the AWS Lambda only, and keeping it lean.

  1. I always thought this was Python specific, but the tenets seem to suggest a broader language base. Are we planning something for other languages?

  2. Should there be a tenet for us to continue supporting Python 3.6 (and above), or at least all officially supported runtimes (i.e. stop supporting Python 3.6 only when support for it expires on Lambda)?

  3. I think backwards compatibility is a really important one, as everybody loves that aspect of AWS. That being said, should a breaking change be necessary, because this is a Python package, it probably isn't that big a deal, since consumers can upgrade their packages whenever they're ready and continue using the older one until then. I'm on the fence over whether this should be a tenet. If it is, we should be more specific, something along the lines of "we will only break backwards compatibility if there are security considerations", etc.

@heitorlessa
Contributor Author

Hey @keithrozario - Your suggestion is spot on, we do plan to have other languages after Python is GA :)

On 2., that's a very good point - let me think this through as to whether this should be a tenet or whether to state it more openly in the project's description.

On 3., I'm planning to create GitHub Issue and PR templates that make the reasoning for a breaking change, and the migration process for it, explicit. Breaking changes are inevitable, but as you said, I want to make sure this is a tenet so that all contributions, new features, or changes, by AWS or from outside, have this in mind.

The APIs themselves are now stable, so I'll focus on creating a docs website using portray, and on creating Issue templates to make this easier to maintain as we think of other languages to work on - Feature Design, Feature Request, RFC, Breaking Change checklist, etc.

It goes without saying that I appreciate the support :)

@heitorlessa heitorlessa pinned this issue May 16, 2020
@michaelbrewer
Contributor

A standard method for handling secrets and/or configuration management would be great. Whether via Parameter store or Secrets manager, it should be fast and cost effective to use.

@heitorlessa
Contributor Author

A standard method for handling secrets and/or configuration management would be great. Whether via Parameter store or Secrets manager, it should be fast and cost effective to use.

hey Michael - Would you be able to draft an RFC for the developer experience you'd like on this?

https://github.com/awslabs/aws-lambda-powertools-python/issues/new?assignees=&labels=RFC%2C+triage&template=rfc.md&title=RFC%3A+

We think this is great. An RFC would help us answer some pending questions on developer experience, whether to support multiple params/secrets, where to expose a fetched param/secret (lambda context?), etc.
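
As a starting point for that discussion, here is a small sketch of one way to keep lookups fast and cheap: cache each value for a limited time so that warm invocations skip the network call. The `CachedParameters` class, its parameters, and `fake_fetch` are all hypothetical; a real fetch function might wrap an SSM Parameter Store or Secrets Manager call.

```python
import time


class CachedParameters:
    """Cache values returned by a fetch function for a limited time."""

    def __init__(self, fetch, max_age_seconds=5.0, clock=time.monotonic):
        self._fetch = fetch              # e.g. a function wrapping an SSM call
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so expiry can be tested
        self._cache = {}                 # name -> (value, expiry time)

    def get(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]
        value = self._fetch(name)
        self._cache[name] = (value, now + self._max_age)
        return value


calls = []


def fake_fetch(name):
    """Hypothetical fetcher that records how often it is called."""
    calls.append(name)
    return f"value-for-{name}"


params = CachedParameters(fake_fetch, max_age_seconds=60)
first = params.get("/my/param")
second = params.get("/my/param")  # served from the cache, no second fetch
```

Creating the `CachedParameters` instance at module level, outside the handler, is what would let the cache survive across warm invocations.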

@heitorlessa
Contributor Author

To auto enable CORS with SPAs I wrote my own class to convert dict to APIGW compatible payloads.

hey Michael - Would you be able to draft an RFC for the developer experience you'd like on this?

https://github.com/awslabs/aws-lambda-powertools-python/issues/new?assignees=&labels=RFC%2C+triage&template=rfc.md&title=RFC%3A+

CORS is a classic one and we could quickly do that after GA. HTTP API kinda handles that well now with the new payload format, so I'd like to discuss this with the team too

@bahrmichael

@heitorlessa I'll have a look tomorrow.

@bahrmichael

I have to postpone this a bit :/

@michaelbrewer
Contributor

@heitorlessa - I will look at what we do for configuration management and see what fits for this type of framework.

I am wondering how Chalice would fit into this, or more specifically, whether we would want a simple, lightweight method of offering routes to an AWS Lambda proxy function linked to API Gateway or an Application Load Balancer.

@heitorlessa
Contributor Author

heitorlessa commented Jul 10, 2020 via email

@michaelbrewer
Contributor

If CDK & Powertools could work nicely with Chalice then maybe that is all we need.

I was thinking along these lines: when you need an API Gateway proxy Lambda using Python (which is probably the most common use case in that area), it would be nice to be able to set up routes from the event and not worry about how to get the headers and request parameters from the JSON.

Having used aws-serverless-java-container, you could do this by simply parsing the JSON event as an HTTP request and then treating it as if it were a real request.

All these other features are great, especially if there is zero overhead when you don't need them.
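
To illustrate, a very small router could look like the sketch below. It assumes the API Gateway REST API proxy event fields `httpMethod`, `path` and `queryStringParameters`; the `Router` class and its methods are hypothetical, and it only supports exact path matches (no path parameters).

```python
import json


class Router:
    """Map (HTTP method, path) pairs to handler functions."""

    def __init__(self):
        self._routes = {}

    def get(self, path):
        def register(func):
            self._routes[("GET", path)] = func
            return func
        return register

    def resolve(self, event, context):
        key = (event.get("httpMethod"), event.get("path"))
        func = self._routes.get(key)
        if func is None:
            return {"statusCode": 404, "body": json.dumps({"message": "Not found"})}
        return {"statusCode": 200, "body": json.dumps(func(event))}


app = Router()


@app.get("/hello")
def hello(event):
    # queryStringParameters can be None when no query string is sent
    params = event.get("queryStringParameters") or {}
    return {"hello": params.get("name", "world")}


def lambda_handler(event, context):
    return app.resolve(event, context)
```

Since `lambda_handler` stays a regular function with the usual `(event, context)` signature, decorators that expect a standard handler could still be applied to it.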

@michaelbrewer
Contributor

@heitorlessa

An example would be how Powertools would fit into a library like https://github.com/vincentsarago/lambda-proxy where you don't have a standard def lambda_handler(event, _): to which you can add the @tracer.capture_lambda_handler decorator.

import json
from typing import Tuple

from lambda_proxy.proxy import API

APP = API(name="example-app", debug=True)


@APP.get("/example/<id_str>")
def example(id_str: str) -> Tuple[str, str, str]:
    return "OK", "application/json", json.dumps({"id": id_str})

So setting up routes like above is really handy when you have a couple of Lambdas linked to API Gateway using the proxy integration.

@heitorlessa
Contributor Author

Hey @michaelbrewer,

Could you open an RFC issue so we can discuss this at length?

The API Proxy lib example supports passing the context (under the Advanced section), so that should work.

As regards Chalice, I'm not sure if they've added an option to include the context or if it continues to be under "app.lambda_context" -- in the case of Chalice, it wouldn't be a clean cut, though doable.

As of now, we've consciously decided to support these on a best-effort basis. However, I'm keen to have an RFC in place to hear from customers what a seamless experience could look like without bloating Powertools.

PS: I'm on PTO until 27th, so I'll experiment with this in the first week of August.

cc @nmoutschen @cakepietoast for visibility and ideas

@heitorlessa heitorlessa unpinned this issue Jul 28, 2020
@heitorlessa heitorlessa changed the title [RFC] Feature: Collection of tiny utilities for custom middlewares [RFC] Feature: Collection of tiny utilities -- We need your idea Aug 5, 2020
@heitorlessa heitorlessa pinned this issue Aug 22, 2020
@heitorlessa heitorlessa changed the title [RFC] Feature: Collection of tiny utilities -- We need your idea RFC: Collection of tiny utilities Aug 30, 2020
@heitorlessa
Contributor Author

I'm closing this now as we have a public email to capture feedback that we can associate with companies to help us prioritize utilities more efficiently ;)

📧 aws-lambda-powertools-feedback@amazon.com

Some updates on this so far based on your feedback - We launched:

  • Chalice integration with Lambda Powertools
  • Batch processing utility (https://awslabs.github.io/aws-lambda-powertools-python/utilities/batch/) to handle partial failures when processing messages from SQS, including a generic processor you can use for any batch processing
  • Parameters utility to fetch parameters from SSM, Secrets Manager, DynamoDB, and a generic means to bring your own repository
  • Lambda Context type to auto-complete what's available out there
  • JSON Schema validation utility to quickly validate incoming event against JSON Schema Draft 4 and 7, including JMESPath expressions to validate a selected portion of your data structure
  • Data classes utility to self-document multiple event sources, and handy methods to manipulate data and commonly sought keys
  • Parser utility to easily design and validate Pydantic models after your domain events

Next

  • Idempotency - A utility to help you detect duplicated requests, respond to them, handle exceptions and cache invalidation, inspect or combine idempotency tokens from your event, and use any control database you want to store them (or not)
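
To give a feel for the idea, here is a minimal sketch of duplicate detection, not the design of the upcoming utility. It assumes the whole event is the idempotency token, hashes it with SHA-256, and uses a plain dict as a stand-in for the control database; the `idempotent` decorator and all other names are hypothetical.

```python
import hashlib
import json
from functools import wraps


def idempotent(store):
    """Return the stored result when the same event payload is seen again."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(event, context):
            # sort_keys makes the hash independent of key order in the event
            payload = json.dumps(event, sort_keys=True).encode()
            key = hashlib.sha256(payload).hexdigest()
            if key in store:
                return store[key]
            result = handler(event, context)
            store[key] = result
            return result
        return wrapper
    return decorator


invocations = []
memory_store = {}  # stand-in for a persistent store


@idempotent(memory_store)
def lambda_handler(event, context):
    invocations.append(event)
    return {"order_id": event["order_id"], "status": "created"}


first = lambda_handler({"order_id": 1}, None)
second = lambda_handler({"order_id": 1}, None)  # duplicate: handler is not re-run
```

A real implementation would also need record expiry, handling of in-progress and failed executions, and a store shared across execution environments.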

HUGE thanks to everyone who contributed to it in all shapes and forms ;)

@mwarkentin

mwarkentin commented Nov 25, 2020

@heitorlessa this is the first time I've seen that email - is it documented anywhere (I did a quick search but just came across this and #84 (comment))? Maybe readme or contributing.md would be a good place to put it? Maybe a small description about what types of things should be opened as issues vs. feedback in email?

@heitorlessa
Copy link
Contributor Author

hey @mwarkentin we've just created that email today. I will update the README to share it more broadly. For now, I'd recommend using the email on only two occasions: a) to share sensitive information such as company or project info, and b) when we explicitly ask for feedback that we know might include sensitive info.

As we've already finalized our decisions on this issue and #84, it felt prudent to ask for sensitive info to be sent via email. Right now, we're lacking feedback that we can associate with AWS customers to help us prioritize, and GitHub isn't the best place for this atm.

Hope that makes sense :)

@heitorlessa heitorlessa unpinned this issue Nov 27, 2020
6 participants