Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(commons): aws sdk util + identity check #1708

Conversation

dreamorosi
Copy link
Contributor

@dreamorosi dreamorosi commented Sep 24, 2023

Description of your changes

This PR introduces changes that tweak the logic used to identify an object passed to a Powertools utility as an AWS SDK client. This work is needed so that we can include Parameters, Idempotency, and Batch to the Lambda Layers as described in the linked issue.

AWS SDK util

Since all of a sudden several Powertools utilities have the necessity of knowing whether an unknown object they received is an instance of an AWS SDK client, I have added a new set of utility dedicated to interacting with AWS SDK as part of the commons package/utility. This set of utility will include also the already existing UA middleware as well as the new ones being introduced here.

The main components of this util are:

  • Types to express a bare-bone AWS SDK client & some minimal user agent arguments
  • Type guard that receives an object and asserts its shape at runtime based on the type described above
  • User Agent middleware (already existing, simply refactored to use the new things introduced above, and moved into the folder)

For the purpose of our utilities, we will consider a viable AWS SDK client any object that has a function called send, a config object, and a middlewareStack object that contains two functions named identify and addRelativeTo respectively.

interface SdkClient {
  send: (args: unknown) => Promise<unknown>;
  config: {};
  middlewareStack: {
    identify: () => string[];
    addRelativeTo: (middleware: unknown, options: unknown) => void;
  };
}

The interface above is not enough for the client to function properly (aka make requests to AWS APIs) nor for the middleware stack to actually work end to end, however if these keys are not present then there's no point for us in continuing and we can discard the object as invalid.

User Agent Middleware

With the introduction of the SDK util and the type guard just mentioned, we can also finally improve (& simplify) the implementation of the UA middleware. When we first implemented the middleware we decided to not include any type from the AWS SDK, in hindsight this was the correct choice since it would have suffered from the same issue we are trying to solve.

In doing so however, we weren't able to narrow down the type of the parameters/objects in the middleware. This lead to using several // @ts-ignore comments to suppress issues. At the time we experimented with creating a custom type and a type guard similar to the one introduced in this PR but we decided to not use it to keep things simple.

With the type guard available however, we can reliably assert the type of the object passeed to the middleware factory and remove all the comments that suppressed TS issues. The liberal usage of these comments was also something that @sthulb raised during a review, so the change should be a net positive across the board.

With the types and arguments being less ambigous I have also refactored the unit tests for the middleware so that we are testing only the inputs/outputs and number of calls, rather than the AWS SDK middleware engine behavior (aka we test our own logic, no theirs)..

Determining the client type

After introducing the isSdkClient helper function described above I initially tried to find a reliable way of determining the client type. Essentially I wanted to ensure that not only an object was a valid AWS SDK v3 client, but also that the client passed belonged to the correct service (i.e. DynamoDBPersistenceLayer should receive an instance of DynamoDBClient and not SSMClient).

As discussed in the linked issue, class instance (i.e. instanceof DynamoDBClient) and prototype checks (i.e. client.prototype.name === 'DynamoDBClient') are out of the question. So the next thing I tried to do was to try finding some properties/attributes within the instantiated class that would allow us to reliably do this.

Option 1: serviceId

All clients appear to have a serviceId string within the config whose value is set to the service name by default. This was promising, however after looking at the docs I realized that we cannot rely on it since customers can modify its value by passing a custom one when instantiating a new client. This is useful in those cases where customizing the labels in the CloudWatch Service Map is desirable:

const client = new DynamoDBClient({});
console.log(client.config.serviceId);
// 'DynamoDB' <- label used in Service Map to represent this node

const clientWithCustomService = new DynamoDBClient({
  serviceId: 'PersistenceStore',
});
console.log(client.config.serviceId);
// 'PersistenceStore' <- label used in Service Map to represent this node
Option 2: defaultSigningName

Another candidate that I found was another property called defaultSigningName also found within the config object. This string represents the value used when signing a request with Sigv4 (see "service name" string in diagram here). This property is not configurable so it appeared to be the perfect candidate for this purpose.

After running a few tests however I found out that this property was introduced somewhat recently or anyway in a version that postdates the one present in the Lambda runtime (3.188.0).

With the above in mind we cannot rely on this field either since it wouldn't work with clients instantiated using the SDK version bundled in the runtime.

Option 3: endpoint

The next, and last, option would be using the endpoint method, which is an async function that returns the API endpoint that the client will send request to. Given that all AWS endpoints have unique hostnames we could have used this value to determine somewhat reliably the type of client.

This option however had two problems: 1/ it's an async method, 2/ endpoint is also user-configurable.

In the current implementation the logic for checking the client is found in the constructor of the Powertools utilities. This means we cannot make asynchronous calls since JS allows only synchronous code to be run in a class constructor (for a good reason).

We could have circumvented the issue by refactoring the client check & instantiation to happen asynchronously, kind of like lazy-loading the client only when the first call is made, however this would have complicated the overall implementation for little benefit since the endpoint itself is not a silver bullet.

When creating an SDK client customers can specify a custom endpoint, this is useful mainly for testing but can be used also for advanced use cases that require proxies or custom endpoints. In all these cases we wouldn't be able to make assumptions on the service type based on the hostname.

Given that the use cases described above are also some of the primary reasons why someone would want to use a custom SDK client with the Powertools utilities, I decided to discard this option and refactor the client check/instantiation.

Option 4: do nothing

Before starting this refactor, relaxing the checks was a last resort in my view. After investigating all the previous ones however I am now convinced that it's the most sensible option to avoid getting in the way of customers.

The reasons for this change of mind are:

  • passing a custom SDK client is an advanced and opt-in feature, customers using this feature have likely read the documentation or IDE docstrings and know what to pass
  • the parameter is strongly typed, while this doesn't prevent runtime errors in itself both IDE and compile steps will fail unless all safety measures are explicitly disabled (i.e. // @ts-ignore)
  • follow-up of the above, we don't perform runtime checks in other similar places like when the customer passes us client configurations
  • the AWS SDK itself doesn't do this type of checks. For instance, doing this is allowed: DynamoDBDocumentClient.from(new SSMClient({})) since they also don't perform runtime checks on this type of thing

For the reasons above, at least for the time being, I suggest we simply try to check whether the object looks like an AWS SDK client and if so, trust that it is. If the provided object has the shape of a client but ends up not being one, this will result in a runtime error.

So to sum up, the current implementation looks like this:

if (config?.awsSdkv3Client && isSdkClient(config.awsSdkv3Client)) {
  // use this client
} else {
  // create a new client
}

I also expect that if the logic above is too coarse we'll listen to customer feedback and refine it based on their usage.

Unit Tests

Additionally, the unit tests for all providers have been modified to use scoped module mocking when testing the user agent middleware. Prior to this PR the whole module was imported and then a spy was placed on the specific function. This is not bad in itself, however with the export having now changed, importing everything with * would have been wasteful. This method is also more in line with other tests (i.e. Tracer Provider test) and doesn't require any extra spy to be created.

Other minor changes

The PR also includes the following minor changes:

  • Remove the now redundant devDependencies from the commons package as result of the change
  • Add two AWS SDK related devDependencies to the testing package since it's the package that directly uses them
  • Remove the example of install command from the docstring of the Parameters utility since it was breaking intellicode tooltips in VSCode
  • Apply native scoped mock of addUserAgentMiddleware function to Tracer unit tests

Related issues, RFCs

Issue number: #1707

Checklist

  • My changes meet the tenets criteria
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in areas that should be flagged with a TODO, or hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my change is effective and works
  • The PR title follows the conventional commit semantics

Breaking change checklist

Is it a breaking change?: NO

  • I have documented the migration process
  • I have added, implemented necessary warnings (if it can live side by side)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@dreamorosi dreamorosi requested a review from a team as a code owner September 24, 2023 15:13
@dreamorosi dreamorosi linked an issue Sep 24, 2023 that may be closed by this pull request
2 tasks
@boring-cyborg boring-cyborg bot added commons This item relates to the Commons Utility idempotency This item relates to the Idempotency Utility parameters This item relates to the Parameters Utility dependencies Changes that touch dependencies, e.g. Dependabot, etc. tests PRs that add or change tests labels Sep 24, 2023
@pull-request-size pull-request-size bot added the size/XL PRs between 500-999 LOC, often PRs that grown with feedback label Sep 24, 2023
@dreamorosi dreamorosi self-assigned this Sep 24, 2023
@github-actions github-actions bot added the enhancement PRs that introduce minor changes, usually to existing features label Sep 24, 2023
@dreamorosi
Copy link
Contributor Author

dreamorosi commented Sep 24, 2023

@dreamorosi
Copy link
Contributor Author

dreamorosi commented Sep 24, 2023

I have also published a test Lambda Layer to our beta account, this is the partial ARN arn:aws:lambda:eu-west-1:************:layer:1707-pr:5 (replace the Account ID).

The Layer was created using the content of this branch using the script below (see details).

#!/usr/bin/env bash

set -e

# Clean up tmp folder and create nodejs folder
rm -rf tmp/*
mkdir -p tmp/nodejs

# Build and package all utilities
## Commons
npm run build -w packages/commons
npm pack -w packages/commons
mv aws-lambda-powertools-commons-*.tgz tmp/nodejs
## Logger
npm run build -w packages/logger
npm pack -w packages/logger
mv aws-lambda-powertools-logger-*.tgz tmp/nodejs
## Metrics
npm run build -w packages/metrics
npm pack -w packages/metrics
mv aws-lambda-powertools-metrics-*.tgz tmp/nodejs
## Tracer
npm run build -w packages/tracer
npm pack -w packages/tracer
mv aws-lambda-powertools-tracer-*.tgz tmp/nodejs
## Parameters
npm run build -w packages/parameters
npm pack -w packages/parameters
mv aws-lambda-powertools-parameters-*.tgz tmp/nodejs
## Batch
npm run build -w packages/batch
npm pack -w packages/batch
mv aws-lambda-powertools-batch-*.tgz tmp/nodejs
## Idempotency
npm run build -w packages/idempotency
npm pack -w packages/idempotency
mv aws-lambda-powertools-idempotency-*.tgz tmp/nodejs

cd tmp/nodejs
npm init -y

# Install utilities & dependencies
npm i \
  aws-lambda-powertools-commons-*.tgz \
  aws-lambda-powertools-logger-*.tgz \
  aws-lambda-powertools-metrics-*.tgz \
  aws-lambda-powertools-tracer-*.tgz \
  aws-lambda-powertools-idempotency-*.tgz \
  aws-lambda-powertools-parameters-*.tgz \
  aws-lambda-powertools-batch-*.tgz

# Remove unnecessary files
rm -rf node_modules/@types \
  aws-lambda-powertools-*.tgz \
  node_modules/@aws-lambda-powertools/*/lib/*.d.ts \
  node_modules/@aws-lambda-powertools/*/lib/*.d.ts.map \
  node_modules/**/README.md \
  node_modules/@smithy \
  node_modules/@aws-sdk \
  node_modules/.bin/semver \
  node_modules/async-hook-jl/test \
  node_modules/shimmer/test \
  node_modules/jmespath/artifacts \
  package.json \
  package-lock.json

# Restore to root
cd ../..

# Restore changes to package.json
git restore packages/*/package.json

You can attach it to a Lambda function that is bundled with esbuild and that either brings its own AWS SDK or uses the bundled one and verify that the client instantiated and passed in the user code is accepted as valid. Conversely, if you pass another exotic object (aka not an SDK) it'll still work but use the default one (see logged serviceId which corresponds to the correct service).

Here's a suggested handler:

import { Logger } from "@aws-lambda-powertools/logger";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBPersistenceLayer } from "@aws-lambda-powertools/idempotency/dynamodb";
import { DynamoDBProvider } from "@aws-lambda-powertools/parameters/dynamodb";

const logger = new Logger({ serviceName: "my-service" });

const client = new DynamoDBClient({
  serviceId: "my-service",
});
const wrongClient = {};

const persistenceStoreWithCustom = new DynamoDBPersistenceLayer({
  awsSdkV3Client: client,
  tableName: "my-table",
});
const persistenceStoreWithDefault = new DynamoDBPersistenceLayer({
  awsSdkV3Client: wrongClient as unknown as DynamoDBClient,
  tableName: "my-table",
});

const providerWithCustom = new DynamoDBProvider({
  awsSdkV3Client: client,
  tableName: "my-table",
});
const providerWithDefault = new DynamoDBProvider({
  awsSdkV3Client: wrongClient as unknown as DynamoDBClient,
  tableName: "my-table",
});

export const handler = async () => {
  logger.info("It works with custom", {
    details: {
      shouldBe: "my-service",
      actual: {
        // @ts-ignore-next-line
        persistenceStore: persistenceStoreWithCustom.client.config.serviceId,
        provider: providerWithCustom.client.config.serviceId,
      },
    },
  });

  logger.info("It falls back with wrong", {
    details: {
      shouldBe: "DynamoDB",
      actual: {
        // @ts-ignore-next-line
        persistenceStore: persistenceStoreWithDefault.client.config.serviceId,
        provider: providerWithDefault.client.config.serviceId,
      },
    },
  });
};

Logs:

START RequestId: 1cef7b8f-a1b6-432a-9f23-d4c80c57fb4c Version: $LATEST
{"level":"INFO","message":"It works with custom","service":"my-service","timestamp":"2023-09-25T09:31:54.138Z","xray_trace_id":"1-65115389-69a64255751730775215ed93","details":{"shouldBe":"my-service","actual":{"persistenceStore":"my-service","provider":"my-service"}}}
{"level":"INFO","message":"It falls back with wrong","service":"my-service","timestamp":"2023-09-25T09:31:54.237Z","xray_trace_id":"1-65115389-69a64255751730775215ed93","details":{"shouldBe":"DynamoDB","actual":{"persistenceStore":"DynamoDB","provider":"DynamoDB"}}}
END RequestId: 1cef7b8f-a1b6-432a-9f23-d4c80c57fb4c
REPORT RequestId: 1cef7b8f-a1b6-432a-9f23-d4c80c57fb4c  Duration: 192.00 ms     Billed Duration: 193 ms Memory Size: 128 MB     Max Memory Used: 97 MB  Init Duration: 563.93 ms
XRAY TraceId: 1-65115389-69a64255751730775215ed93       SegmentId: 0157cc160c071188     Sampled: true

@dreamorosi dreamorosi marked this pull request as draft September 24, 2023 22:33
@dreamorosi
Copy link
Contributor Author

dreamorosi commented Sep 24, 2023

Apparently the version of the AWS SDK (3.188.0) present in the Lambda runtime (nodejs18x) doesn't have the defaultSigningName key in the config object as it was introduced sometime afterwards.

I need to find another way to identify the client so I'm moving this back to a draft.

EDIT: I wend in a different direction, read OP

@dreamorosi
Copy link
Contributor Author

@dreamorosi dreamorosi marked this pull request as ready for review September 25, 2023 09:34
Copy link
Contributor

@am29d am29d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely incredible PR which reads like a good thriller. I was pondering the ideas over the weekend and ended up with similar conclusion: either check the shape or remove the check.

The PR looks great, thanks a lot for putting so much work into it and documenting the decision process. I have just one small suggestion.

In case customers pass a wrong client with the assumption that their client is actually correct and the utility should accept it. Right now, we silently default to a valid SDK client, without emitting any information about this step. Can we add console.warn in the constructor to surface this information and reduce potential time to troubleshoot?

@sonarcloud
Copy link

sonarcloud bot commented Sep 25, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@dreamorosi
Copy link
Contributor Author

Thanks for the good feedback, it's nice to see you appreciated it :D

I have added the warning log as you suggested. Initially I opted not to do it to avoid complicating the implementation but I agree it's worth it.

To avoid a double nested if/else statement I've changed the logic to optimistically instantiate a new client and replace it with the provided one only when a valid one is passed. As far as I can tell the client instantiation is cheap so the impact should be negligible.

@am29d am29d merged commit 02aae6d into main Sep 25, 2023
11 checks passed
@am29d am29d deleted the 1707-maintenance-refactor-aws-sdk-client-check-in-parameters-and-idempotency branch September 25, 2023 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
commons This item relates to the Commons Utility dependencies Changes that touch dependencies, e.g. Dependabot, etc. enhancement PRs that introduce minor changes, usually to existing features idempotency This item relates to the Idempotency Utility parameters This item relates to the Parameters Utility size/XL PRs between 500-999 LOC, often PRs that grown with feedback tests PRs that add or change tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Maintenance: refactor AWS SDK client check in Parameters and Idempotency
2 participants