Merge pull request #241 from andrestoll/doc-input-readme

README restructuring (input-output)
alexcasalboni · Apr 5, 2024 · acbcf36
2 parents 97bf068 + f11dbb9

Showing 6 changed files with 212 additions and 236 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,50 @@
+## CHANGELOG (SAR versioning)
+
+From most recent to oldest, with major releases in bold:
+
+* *4.3.4* (2024-02-26): upgrade to Nodejs20, custom state machine prefix, SDKv3 migration, new includeOutputResults input parameter, JSON logging support
+* *4.3.3* (2023-10-30): parametrized currency for visualization URL (USD|CNY)
+* *4.3.2* (2023-08-16): new disablePayloadLogs flag, updated documentation
+* *4.3.1* (2023-05-09): update dependencies, add VPC Configuration support, use Billed Duration instead of Duration from logs, update state machine with ItemSelector
+* ***4.3.0*** (2023-03-06): SnapStart support (alias waiter)
+* *4.2.3* (2023-03-01): fix layer runtime (nodejs16.x)
+* *4.2.2* (2023-02-15): configurable sleep parameter, bump runtime to nodejs16.x, docs updates, GH Actions, and minor bug fixes
+* *4.2.1* (2022-08-02): customizable SDK layer name and logs retention value
+* ***4.2.0*** (2022-01-03): support S3 payloads
+* *4.1.4* (2022-01-03): sorting bugfix and updated dependencies
+* *4.1.3* (2021-12-16): support simple strings as event payload
+* *4.1.2* (2021-10-12): add x86_64 fallback when Graviton is not supported yet
+* *4.1.1* (2021-10-12): fixed connection timeout for long-running functions
+* ***4.1.0*** (2021-10-11): support Lambda functions powered by Graviton2
+* ***4.0.0*** (2021-08-16): support AWS Lambda states expansion to all functions
+* *3.4.2* (2020-12-03): permissions boundary bugfix (Step Functions role)
+* *3.4.1* (2020-12-02): permissions boundary support
+* ***3.4.0*** (2020-12-01): 1ms billing
+* *3.3.3* (2020-07-17): payload logging bugfix for pre-processors
+* *3.3.2* (2020-06-17): weighted payloads bugfix (for real)
+* *3.3.1* (2020-06-16): weighted payloads bugfix
+* ***3.3.0*** (2020-06-10): Pre/Post-processing functions, correct regional pricing, customizable execution timeouts, and other internal improvements
+* *3.2.5* (2020-05-19): improved logging for weighted payloads and in case of invocation errors
+* *3.2.4* (2020-03-11): dryRun bugfix
+* *3.2.3* (2020-02-25): new dryRun input parameter
+* *3.2.2* (2020-01-30): upgraded runtime to Node.js 12.x
+* *3.2.1* (2020-01-27): improved scripts and SAR template reference
+* ***3.2.0*** (2020-01-17): support for weighted payloads
+* *3.1.2* (2020-01-17): improved optimal selection when same speed/cost
+* *3.1.1* (2019-10-24): customizable least-privilege (lambdaResource CFN param)
+* ***3.1.0*** (2019-10-24): $LATEST power reset and optional auto-tuning (new Optimizer step)
+* ***3.0.0*** (2019-10-22): dynamic parallelism (powerValues as execution parameter)
+* *2.1.3* (2019-10-22): upgraded runtime to Node.js 10.x
+* *2.1.2* (2019-10-17): new balanced optimization strategy
+* *2.1.1* (2019-10-10): custom domain for visualization URL
+* ***2.1.0*** (2019-10-10): average statistics visualization (URL in state machine output)
+* ***2.0.0*** (2019-07-28): multiple optimization strategies (cost and speed), new output format with AWS Step Functions and AWS Lambda cost
+* *1.3.1* (2019-07-23): retry policies and failed invocations management
+* ***1.3.0*** (2019-07-22): implemented error handling
+* *1.2.1* (2019-07-22): Node.js refactor and updated IAM permissions (added lambda:UpdateAlias)
+* ***1.2.0*** (2019-05-24): updated IAM permissions (least privilege for actions)
+* *1.1.1* (2019-05-15): updated docs
+* ***1.1.0*** (2019-05-15): cross-region invocation support
+* *1.0.1* (2019-05-13): new README for SAR
+* ***1.0.0*** (2019-05-13): AWS SAM refactor (published on SAR)
+* *0.0.1* (2017-03-27): previous project (serverless framework)
diff --git a/README-ADVANCED.md b/README-ADVANCED.md
@@ -59,3 +59,79 @@ The AWS Step Functions state machine is composed of five Lambda functions:
 
 Initializer, cleaner, analyzer, and optimizer are executed only once, while the executor is used by N parallel branches of the state machine - one for each configured power value. By default, the executor will execute the given Lambda function `num` consecutive times, but you can enable parallel invocation by setting `parallelInvocation` to `true`.
 
+## Weighted Payloads
+
+> [!IMPORTANT]
+> Your payload will only be treated as a weighted payload if it adheres to the JSON structure that follows. Otherwise, it's assumed to be an array-shaped payload.
+
+Weighted payloads can be used in scenarios where the payload structure and the corresponding performance/speed could vary a lot in production and you'd like to include multiple payloads in the tuning process.
+
+You may also want to use weighted payloads for functions with side effects that would be hard or impossible to test with the very same payload (for example, a function that deletes records from a database).
+
+You configure weighted payloads as follows:
+
+```json
+{
+    ...
+    "num": 50,
+    "payload": [
+        { "payload": {...}, "weight": 5 },
+        { "payload": {...}, "weight": 15 },
+        { "payload": {...}, "weight": 30 }
+    ]
+}
+```
+
+In the example above, the weights `5`, `15`, and `30` are used as relative weights. They will correspond to `10%` (5 out of 50), `30%` (15 out of 50), and `60%` (30 out of 50) respectively - meaning that the corresponding payload will be used 10%, 30% and 60% of the time.
+
+For example, if `num=100` the first payload will be used 10 times, the second 30 times, and the third 60 times.
+
+To simplify these calculations, you could use weights that sum up to 100.
+
+Note: the number of weighted payloads must always be less than or equal to `num` (that is, `num >= count(payloads)`). For example, if you have 50 weighted payloads, you'll need to set at least `num: 50` so that each payload will be used at least once.
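
A minimal sketch (not part of this diff) of how relative weights could translate into invocation counts for a given `num`. The function name `distribute_invocations` and the rounding rule (floor each share, give the remainder to the last payload) are assumptions for illustration, not necessarily what the tool does internally.

```python
def distribute_invocations(weights, num):
    """Split `num` invocations across payloads in proportion to `weights`."""
    if len(weights) > num:
        raise ValueError("num must be >= the number of weighted payloads")
    total = sum(weights)
    # Floor each proportional share (assumed rounding rule).
    counts = [num * weight // total for weight in weights]
    # Hand any rounding remainder to the last payload so the counts sum to num.
    counts[-1] += num - sum(counts)
    return counts


print(distribute_invocations([5, 15, 30], 100))  # [10, 30, 60]
```

With this particular rounding rule a very small weight can still round down to zero invocations, so it does not by itself guarantee that every payload is used at least once.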
+
+
+## Pre/Post-processing functions
+
+Sometimes you need to power-tune Lambda functions that have side effects such as creating or deleting records in a database. In these cases, you may need to execute some pre-processing or post-processing logic before and/or after each function invocation.
+
+For example, imagine that you are power-tuning a function that deletes one record from a downstream database. Since you want to execute this function `num` times, you'd need to insert some records in advance and then find a way to delete all of them with a dynamic payload. Alternatively, you could configure a pre-processing function (using the `preProcessorARN` input parameter) that creates a brand new record before each execution of the actual function.
+
+Here's the flow in pseudo-code:
+
+```
+function Executor:
+  iterate from 0 to num:
+    [payload = execute Pre-processor (payload)]
+    results = execute Main Function (payload)
+    [execute Post-processor (results)]
+```
+
+Please also keep in mind the following:
+
+* You can configure a pre-processor and/or a post-processor independently
+* The pre-processor will receive the original payload
+* If the pre-processor returns a non-empty output, it will overwrite the original payload
+* The post-processor will receive the main function's output as payload
+* If a pre-processor or post-processor fails, the whole power-tuning state machine will fail
+* Pre/post-processors don't have to be in the same region as the main function
+* Pre/post-processors don't alter the statistics related to cost and performance
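
A minimal sketch (not part of this diff) of the executor flow and the rules listed above, with plain Python callables standing in for the three Lambda functions. The names `run_executor`, `main_fn`, `pre_fn`, and `post_fn` are hypothetical, and real invocations would go through the Lambda API rather than local calls.

```python
def run_executor(num, payload, main_fn, pre_fn=None, post_fn=None):
    """Invoke main_fn `num` times, wrapping each call with optional processors."""
    results = []
    for _ in range(num):
        current = payload
        if pre_fn is not None:
            # The pre-processor always receives the original payload.
            output = pre_fn(payload)
            # A non-empty output overwrites the original payload.
            if output:
                current = output
        result = main_fn(current)
        if post_fn is not None:
            # The post-processor receives the main function's output.
            post_fn(result)
        results.append(result)
    return results
```

Any exception raised by `pre_fn` or `post_fn` propagates out of the loop, which mirrors the rule that a failing processor fails the whole run.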
+
+## S3 payloads
+
+For very large payloads (above 256KB), you can provide an S3 object reference (`s3://bucket/key`) instead of an inline payload.
+
+Your state machine input will look like this:
+
+```json
+{
+    "lambdaARN": "your-lambda-function-arn",
+    "powerValues": [128, 256, 512, 1024],
+    "num": 50,
+    "payloadS3": "s3://your-bucket/your-object.json"
+}
+```
+
+Please note that the state machine will require IAM access to your S3 bucket, so you might need to redeploy the Lambda Power Tuning application and configure the `payloadS3Bucket` parameter at deployment time. This will automatically generate a custom IAM managed policy to grant read-only access to that bucket. If you want to narrow down the read-only policy to a specific object or pattern, use the `payloadS3Key` parameter (which is `*` by default).
+
+S3 payloads work fine with weighted payloads too.
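
A minimal sketch (not part of this diff) of choosing between an inline payload and an S3 reference when building the execution input. The helper name `build_input` is hypothetical, and treating 256KB as an exact byte limit on the serialized payload is an assumption made for illustration.

```python
import json

# Assumed threshold, taken from the "above 256KB" guidance.
INLINE_LIMIT_BYTES = 256 * 1024


def build_input(lambda_arn, power_values, num, payload, s3_uri):
    """Use the inline payload when it fits, otherwise reference the S3 object."""
    execution_input = {
        "lambdaARN": lambda_arn,
        "powerValues": power_values,
        "num": num,
    }
    size = len(json.dumps(payload).encode("utf-8"))
    if size > INLINE_LIMIT_BYTES:
        execution_input["payloadS3"] = s3_uri
    else:
        execution_input["payload"] = payload
    return execution_input
```

For example, `build_input("your-lambda-function-arn", [128, 256, 512, 1024], 50, {"id": 1}, "s3://your-bucket/your-object.json")` keeps the small payload inline, while a payload that serializes to more than 256KB would be replaced by the `payloadS3` reference.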
diff --git a/README-DEPLOY.md b/README-DEPLOY.md
@@ -9,8 +9,7 @@ There are 5 deployment options for deploying the tool using Infrastructure as Co
 1. [Using Terraform by Hashicorp and SAR](#option4)
 1. [Using native Terraform](#option5)
 
-
-Read more about the [deployment parameters here](README-INPUT-OUTPUT.md#state-machine-configuration-at-deployment-time).
+Read more about the [deployment parameters here](README.md#state-machine-configuration-at-deployment-time).
 
 ## Option 1: AWS Serverless Application Repository<a name="option1"></a>
 

diff --git a/README-EXECUTE.md b/README-EXECUTE.md
@@ -10,6 +10,23 @@ Feel free to customize the `scripts/sample-execution-input.json`, and then run `
 
 The script will start a state machine execution, wait for the execution to complete (polling), and then show the execution results.
 
+### Usage in CI/CD pipelines
+
+If you want to run the state machine as part of your continuous integration pipeline and automatically fine-tune your functions at every deployment, you can execute it with the script `scripts/execute.sh` (or similar) by providing the following input parameters:
+
+```json
+{
+    "lambdaARN": "...",
+    "num": 10,
+    "payload": {},
+    "powerValues": [128, 256, 512, ...],
+    "autoOptimize": true,
+    "autoOptimizeAlias": "prod"
+}
+```
+
+You can use different alias names such as `dev`, `test`, `production`, etc. If you don't configure any alias name, the state machine will only update the `$LATEST` version.
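
A minimal sketch (not part of this diff) of generating that input from a pipeline step, so the alias can vary per environment. The function name `ci_execution_input` is hypothetical and the power values in the usage line are placeholders. How the resulting JSON is handed to `scripts/execute.sh` depends on your setup and is not shown here.

```python
import json


def ci_execution_input(lambda_arn, power_values, num=10, alias=None):
    """Build the execution input with auto-optimization enabled."""
    data = {
        "lambdaARN": lambda_arn,
        "num": num,
        "payload": {},
        "powerValues": list(power_values),
        "autoOptimize": True,
    }
    # Without an alias, only $LATEST is updated, so the key is omitted.
    if alias:
        data["autoOptimizeAlias"] = alias
    return json.dumps(data, indent=4)


print(ci_execution_input("your-lambda-function-arn", [128, 256, 512], alias="prod"))
```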
+
 ## Option 2: Execute the state machine manually (web console)
 
 Once the state machine is deployed, you can execute it and provide an input object.
@@ -18,7 +35,7 @@ You will find the new state machine in the [Step Functions Console](https://cons
 
 The state machine name will depend on the stack name (default: `aws-lambda-power-tuning`). Find it and click "**Start execution**".
 
-You'll be able to provide the execution input (check the [full documentation here](README-INPUT-OUTPUT.md)]), which will look like this:
+You'll be able to provide the execution input (check the [full documentation here](README.md#state-machine-input-at-execution-time)), which will look like this:
 
 ```json
 {