feat: add support for function logs streaming to sandbox #1492

Amplifiyer · 2024-05-10T16:03:07Z

Changes

Add support for streaming function logs in sandbox

CLI
- Disabled by default, with an explicit opt-in via --stream-function-logs
- Allow customers to specify a list of function names for which logs should be streamed
- Sandbox with --once should not stream function logs
- Allow specifying a file path to stream all the functions' execution log events to

Display
- Display log events with the "friendly name" of the function in a specific color. A total of 5 colors are used in round robin
- Display a timestamp for each log event
- Colors are turned off when writing to a file

Functionality
- While sandbox is idle, function logs are streamed in parallel for all the functions as they are executed.
- Log streaming is paused during a deployment and resumes from the time the deployment was initiated, to avoid intermingling the logs
- New and deleted functions are picked up automatically after a deployment, i.e. if a customer adds a new function, its logs will be automatically streamed (if no filter was used)

New permissions need to be added to the managed policy AmplifyBackendDeployFullAccess:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "FunctionLogsStreaming",
            "Effect": "Allow",
            "Action": [
                "lambda:ListTags",
                "logs:FilterLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:log-group:/aws/lambda/amplify-*:*",
                "arn:aws:lambda:*:*:function:amplify-*"
            ]
        }
    ]
}
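
For illustration, the round-robin coloring described under Display could look roughly like the sketch below. This is not the code from this PR: the palette entries and the `ColorAssigner` class are hypothetical names, and only the "5 colors in round robin" rule comes from the description above.

```typescript
// Hypothetical palette; the description only says 5 colors are used in round robin.
const PALETTE = ['cyan', 'magenta', 'yellow', 'green', 'blue'] as const;
type Color = (typeof PALETTE)[number];

class ColorAssigner {
  private readonly assigned = new Map<string, Color>();

  // Returns a stable color per function name, cycling through the palette
  // in the order in which new function names are first seen.
  colorFor = (friendlyName: string): Color => {
    const existing = this.assigned.get(friendlyName);
    if (existing !== undefined) {
      return existing;
    }
    const color = PALETTE[this.assigned.size % PALETTE.length] ?? PALETTE[0];
    this.assigned.set(friendlyName, color);
    return color;
  };
}

const assigner = new ColorAssigner();
// Six functions: the sixth wraps around and reuses the first color.
const colors = ['f1', 'f2', 'f3', 'f4', 'f5', 'f6'].map((name) =>
  assigner.colorFor(name)
);
```

Keeping the mapping in a `Map` means a function keeps its color across polling cycles, even when other functions are added later.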

Validation

Unit tests added

Checklist

If this PR includes a functional change to the runtime behavior of the code, I have added or updated automated test coverage for this change.
If this PR requires a change to the Project Architecture README, I have included that update in this PR.
If this PR requires a docs update, I have linked to that docs PR above.
If this PR modifies E2E tests, makes changes to resource provisioning, or makes SDK calls, I have run the PR checks with the run-e2e label set.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

changeset-bot · 2024-05-10T16:03:12Z

🦋 Changeset detected

Latest commit: b4b1506

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages

Name	Type
@aws-amplify/cli-core	Minor
@aws-amplify/sandbox	Minor
@aws-amplify/backend-cli	Minor
create-amplify	Patch


packages/sandbox/src/sandbox_singleton_factory.ts

edwardfoyle · 2024-05-10T16:49:28Z

packages/sandbox/src/lambda_function_log_streamer.ts

+    const backendOutput: BackendOutput =
+      await this.backendOutputClient.getOutput(sandboxBackendId);
+
+    const definedFunctionsPayload =
+      backendOutput[functionOutputKey]?.payload.definedFunctions;
+    const deployedFunctionNames = definedFunctionsPayload
+      ? (JSON.parse(definedFunctionsPayload) as string[])
+      : [];


Can we use DeployedBackendClient.getBackendMetadata() instead? That returns a list of FunctionConfiguration objects which should eliminate the need to depend on backend-output-schemas directly in this package

We shouldn't. DeployedBackendClient.getBackendMetadata() is a pretty heavy command as it loads all the nested stacks and all resources.

We need something as lightweight as possible since this method gets called every time sandbox gets idle.

Maybe we need some sort of "filter" prop for getBackendMetadata where it can be instructed to only load metadata from certain verticals? So in this case, we could use it to only load function metadata

We don't even need the function stack, as technically what we are loading is not metadata unless we start adding the "friendly name" in the metadata or outputs section. In that case we can continue to use getOutput here.

I also don't see any issues with depending on backend-output-schemas here. backendOutputClient.getOutput is the right abstraction here.

Taking a closer look at this, it looks like we're only using the functionOutputKey from that package. However, getOutput should already have typed keys so we can just use the string directly here and we'll get typechecking without having to import the schema package.

I will update it to use the key "AWS::Amplify::Function" directly, but wouldn't that just add duplication? Not that we are ever going to change it, but I don't understand the drawback of this dependency.

It's not a huge deal. In my head, the deployed-backend-client is the "entry point" by which all readers should get info about the backend. The schema package is meant to be a contract between the backend-client package and the places where the output is written (the constructs). Ideally the schema shouldn't be needed directly by consumers of deployed-backend-client.

It makes sense to have one writer, but I believe it's fine to have multiple readers. We already have client-config and model-generator

That's true but they both use deployed-backend-client to get backend output. Neither of them goes directly to CFN

This is also coming from deployed-backend-client

const backendOutput: BackendOutput = await this.backendOutputClient.getOutput(sandboxBackendId);

Same as

amplify-backend/packages/cli/src/commands/generate/forms/generate_forms_command.ts

Line 69 in 20bf679

const output = await backendOutputClient.getOutput(backendIdentifier);

and

amplify-backend/packages/client-config/src/generate_client_config.ts

Line 52 in 20bf679

backendOutputClient.getOutput(backendIdentifier)
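
To make the "use the string key directly" suggestion from this thread concrete, here is a minimal sketch. It does not use the real `BackendOutput` type: `BackendOutputLike` and `parseDeployedFunctionNames` are hypothetical stand-ins that mirror only the shape read in the diff above (`payload.definedFunctions` holding a JSON-encoded string array).

```typescript
// Local stand-ins for the output shape read in the diff; not the real types.
type OutputEntry = { version: string; payload: Record<string, string> };
type BackendOutputLike = Partial<Record<'AWS::Amplify::Function', OutputEntry>>;

const parseDeployedFunctionNames = (output: BackendOutputLike): string[] => {
  // The key is typed, so a typo in the string literal fails type checking.
  const raw = output['AWS::Amplify::Function']?.payload.definedFunctions;
  if (!raw) {
    return [];
  }
  const parsed: unknown = JSON.parse(raw);
  // Validate at runtime instead of trusting a bare `as string[]` cast.
  if (!Array.isArray(parsed) || !parsed.every((v) => typeof v === 'string')) {
    throw new Error('definedFunctions is not a JSON array of strings');
  }
  return parsed as string[];
};

const exampleOutput: BackendOutputLike = {
  'AWS::Amplify::Function': {
    version: '1',
    payload: { definedFunctions: '["auth-pre-signup","resolver-todo"]' },
  },
};
const exampleNames = parseDeployedFunctionNames(exampleOutput);
```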

packages/cli-core/src/format/format.ts

edwardfoyle · 2024-05-10T17:06:06Z

packages/cli-core/src/printer/printer.ts

    private readonly refreshRate: number = 500
-  ) {}
+  ) {
+    // if not running in tty, turn off colors


why is this class manipulating colors?

Since this is a central place for performing logging, it's better to turn colors on or off here rather than in each consumer. Especially since $ is a global variable, the colors would otherwise change whenever a single consumer changes it, making the colors inconsistent.

Aren't the methods in format already a noop if colors are not enabled? Why do we need additional logic for it here?

format uses supportColors which only looks at the environment/terminal you are running in and decides whether to enable or disable colors. E.g. in CI/CD the colors might be disabled. See https://github.com/isaacs/color-support

In the printer, we might be piping the data somewhere else as well (e.g. writing to a file); supportColors doesn't know that and will keep the colors enabled.

I'm not sure this is right.

If $.enabled is a global setting but there could be two Printer instances, one writing to console and the other to file, then how do we decide if colors should be enabled or not?

So either color control should not be in this class,
OR color control should be implemented differently. For example, given that Node is single-threaded, try-finally blocks at each printX call could be used.

I removed it from printer so that printer will print whatever is given to it. Instead of fiddling with kleur's $, the caller can then choose not to use format or colors.
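
A rough sketch of the resolution described in this thread: the printer holds no color state, and each caller decides whether its text is colored. Everything here is hypothetical illustration (`VerbatimPrinter`, `cyan`, `stripColors`, and the in-memory sinks are not names from this PR or from kleur).

```typescript
type Sink = (text: string) => void;

// A printer that writes exactly what it is given and keeps no global color flag.
class VerbatimPrinter {
  private readonly sink: Sink;

  constructor(sink: Sink) {
    this.sink = sink;
  }

  print = (message: string): void => {
    this.sink(`${message}\n`);
  };
}

// Stand-in for a format helper: wraps text in the ANSI cyan escape codes.
const cyan = (text: string): string => `\u001b[36m${text}\u001b[39m`;

// Removes ANSI SGR (color/style) escape sequences from a string.
const stripColors = (text: string): string =>
  text.replace(/\u001b\[[0-9;]*m/g, '');

// Two printers with different destinations can coexist without sharing state.
const consoleLines: string[] = [];
const fileLines: string[] = [];
const consolePrinter = new VerbatimPrinter((t) => {
  consoleLines.push(t);
});
const filePrinter = new VerbatimPrinter((t) => {
  fileLines.push(t);
});

const line = `${cyan('auth-handler')} hello`;
consolePrinter.print(line); // caller keeps colors for the terminal
filePrinter.print(stripColors(line)); // caller drops colors for the file
```

Because neither printer touches a shared flag, the two-instance scenario raised above cannot produce inconsistent output.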

packages/cli/src/commands/sandbox/sandbox_command.ts

packages/sandbox/src/cloudwatch_logs_monitor.ts

packages/sandbox/src/file_watching_sandbox.ts

packages/sandbox/src/lambda_function_log_streamer.ts

packages/cli-core/src/format/format.ts

sobolk

Looks good overall.

sobolk · 2024-06-06T15:21:45Z

.github/workflows/health_checks.yml

+      - name: Install and build baseline version
+        run: |
+          npm ci
+          npm run build


It took me a moment to recall what this was for.
Perhaps this could go into main in small dedicated PR?

I'll remove this here once #1624 is merged, otherwise the e2e fails.

sobolk · 2024-06-06T16:54:47Z

packages/cli-core/src/printer/printer.ts

packages/cli/src/commands/sandbox/sandbox_command.test.ts

packages/cli/src/commands/sandbox/sandbox_command.ts

packages/sandbox/src/cloudwatch_logs_monitor.ts

sobolk · 2024-06-06T17:09:58Z

packages/sandbox/src/cloudwatch_logs_monitor.ts

+    for (const logGroup of this.allLogGroups) {
+      promises.push(this.readEventsFromLogGroup(logGroup));
+    }
+    return (await Promise.all(promises)).flat();


What happens if these calls start to fail or get throttled?
Would sandbox stop streaming or would it keep trying until the problem goes away?

Should sandbox try for a predefined period of time or number of attempts and give up at some point?

Sandbox will keep trying at its polling frequency. Sandbox won't get interrupted though, since log watching runs asynchronously.

Let me look into adding more error handling to perhaps stop streaming if too many unrecoverable errors happen.

Fixed the error handling. We now pause the streaming when an exception happens. Subsequent deployments, if any, will resume the streaming if the problem has been fixed.
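
The behavior described in the last reply (pause on an exception, resume after a later deployment) could be sketched as below. This is an assumption-laden illustration, not the PR's code: `PausableLogPoller`, `ReadEvents`, `pollOnce` and `resume` are hypothetical names, and the reader function stands in for the CloudWatch call.

```typescript
type LogEvent = { timestamp: number; message: string };
// Stand-in for the per-log-group read; it may reject, e.g. when throttled.
type ReadEvents = (logGroup: string) => Promise<LogEvent[]>;

class PausableLogPoller {
  private paused = false;
  private readonly logGroups: string[];
  private readonly readEvents: ReadEvents;

  constructor(logGroups: string[], readEvents: ReadEvents) {
    this.logGroups = logGroups;
    this.readEvents = readEvents;
  }

  get isPaused(): boolean {
    return this.paused;
  }

  // Intended to be called after a deployment completes.
  resume = (): void => {
    this.paused = false;
  };

  // Reads all log groups in parallel. If any read rejects, Promise.all
  // rejects, and streaming is paused instead of retrying on every poll.
  pollOnce = async (): Promise<LogEvent[]> => {
    if (this.paused) {
      return [];
    }
    try {
      const perGroup = await Promise.all(
        this.logGroups.map((group) => this.readEvents(group))
      );
      return ([] as LogEvent[]).concat(...perGroup);
    } catch {
      this.paused = true;
      return [];
    }
  };
}
```

One consequence of `Promise.all` in this shape is that a single failing log group discards the events from the groups that succeeded in the same poll.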

sobolk · 2024-06-06T17:11:43Z

packages/sandbox/src/cloudwatch_logs_monitor.ts

+      // As long as there are _any_ events in the log group `filterLogEvents` will return a nextToken.
+      // This is true even if these events are before `startTime`. So if we have 100 events and a nextToken
+      // then assume that we have hit the limit and let the user know some messages have been suppressed.
+      // We are essentially showing them a sampling (10000 events printed out is not very useful)
+      if (filteredEvents.length === 100 && response.nextToken) {


I think that this shouldn't apply when we're streaming to file.

File is a different case and somebody may want to grep it/search it and missing entries will be surprising there.

Also, for console: should sampling be configurable through command parameters?

> Also, for console: should sampling be configurable through command parameters?

This feature is exclusively for sandbox, so it won't apply to console at all.

> I think that this shouldn't apply when we're streaming to file.

There is a security aspect here as well: preventing the machine from being overwhelmed with too many logs (DoS).

> This feature is exclusively for sandbox, so it won't apply to console at all.

I meant the console.log vs file.write scenario on a local computer.

Oh I see, that is not part of the requirements, but can be easily added in the future if needed.

I think this is going to be the most wanted feature for people who stream logs to a local file.
Mind creating a GH issue?
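
As a sketch of the distinction requested in this thread, the sampling check from the diff could take the destination into account. This is hypothetical and not implemented in this PR: the `Destination` type and `isSampled` helper are made up for illustration, and a real file mode would additionally have to follow `nextToken` to fetch the remaining pages.

```typescript
type LogEvent = { timestamp: number; message: string };
type Page = { events: LogEvent[]; nextToken?: string };
type Destination = 'console' | 'file';

const PAGE_LIMIT = 100; // same literal as in the diff above

// Returns true when the page should be treated as a sample and the user
// told that some messages were suppressed.
const isSampled = (
  page: Page,
  startTime: number,
  destination: Destination
): boolean => {
  const filtered = page.events.filter((e) => e.timestamp >= startTime);
  // Hypothetical rule: only sample when printing to the console.
  return (
    destination === 'console' &&
    filtered.length === PAGE_LIMIT &&
    page.nextToken !== undefined
  );
};

// A full page of 100 recent events with more available.
const fullPage: Page = {
  events: Array.from({ length: 100 }, (_, i) => ({
    timestamp: 1000 + i,
    message: `m${i}`,
  })),
  nextToken: 'token',
};
```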

packages/sandbox/src/file_watching_sandbox.ts

packages/cli-core/src/format/format.ts

packages/cli/src/commands/sandbox/sandbox_command.ts

edwardfoyle · 2024-06-07T20:25:48Z

packages/cli/src/commands/sandbox/sandbox_command.ts

+          group: 'Logs streaming',
+        })
+        .option('logs-filter', {
+          describe: `Regex pattern to filter logs from only matched functions. E.g. to stream logs for a function, specify it's name, and to stream logs from all functions starting with auth specify 'auth' Default: Stream all logs`,


Shouldn't a regex for "functions starting with auth" be ^auth.*? Also, why are we supporting regex and array input? You can make a regex to support multiple disjoint strings with (foo|bar|baz)

IMO if we are going to support regex, it should just be a single arg

For a full regex match yes, but I'm going for a partial string match as in my opinion it's easier to specify and also a default for pretty much all regex matchers. I don't see why we would prevent partial matching.

Similarly, providing an array just makes it easier to provide input instead of creating a complex regex pattern. @josefaidt to provide more input here.

👋 yes the important piece of regex is the wildcard match (e.g. auth* picks up all functions named with auth like auth-post-confirmation and auth-pre-signup) but it is much easier to specify multiple patterns than it is to craft complex regex when writing the command

--logs-filter auth* --logs-filter resolver*

I think regex can be a bit of a hurdle though compared to a glob pattern

Do we want to support glob patterns or regex then? The current impl is using regex, which would require --logs-filter auth.* --logs-filter resolver.*

glob patterns don't make sense for non pathlike strings. Globs are typically (or only?) used for pathname expansions.

Yeah fair enough. Just want to make sure we're aligned that customers will have to specify auth.*, not auth*

There are use cases for glob patterns outside filenames
https://www.sqlitetutorial.net/sqlite-glob/
https://duckdb.org/docs/sql/functions/pattern_matching.html#glob

like minimatch without the globstar

Minimatch is basically a glob implementation, still only relevant for pathlike string matching, not for any arbitrary matches.

In the SQL world, the glob is basically implemented as a regex, e.g. there is no such thing as **.
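
The difference between the readings discussed in this thread is easy to see in a few lines. This sketch assumes each filter is applied as an unanchored JavaScript regex via `RegExp.prototype.test`; the function names and the `select` helper are made up, and the PR's actual matching code may differ.

```typescript
const functionNames = [
  'auth-post-confirmation',
  'auth-pre-signup',
  'resolver-todo',
  'my-auth-hook',
  'automation',
];

// A name is selected if any of the patterns matches somewhere in it.
const matchesAny = (patterns: string[], name: string): boolean =>
  patterns.some((pattern) => new RegExp(pattern).test(name));

const select = (patterns: string[]): string[] =>
  functionNames.filter((name) => matchesAny(patterns, name));

// Partial match: anything containing "auth".
const partial = select(['auth']);
// → auth-post-confirmation, auth-pre-signup, my-auth-hook

// Anchored: only names starting with "auth".
const anchored = select(['^auth']);
// → auth-post-confirmation, auth-pre-signup

// Glob-looking input read as a regex: "aut" followed by zero or more "h".
const globLike = select(['auth*']);
// → auth-post-confirmation, auth-pre-signup, my-auth-hook, automation

// Several simple patterns are equivalent to one alternation.
const multiple = select(['auth', 'resolver']);
const alternation = select(['(auth|resolver)']);
```

Note that `auth*` read as a regex also picks up `automation`, which is probably not what someone typing a glob expects.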

edwardfoyle · 2024-06-07T20:37:51Z

packages/sandbox/src/file_watching_sandbox.ts

+    if (options.functionStreamingOptions?.logsOutFile) {
+      this.functionsLogStreamer.setOutputLocation(
+        options.functionStreamingOptions.logsOutFile
+      );
+    }


This seems like something that should be set in the ctor of the LambdaFunctionLogStreamer

None of the sandbox options, such as outDir, are provided to the SandboxSingletonFactory. I have changed it to pass this option in start/activate, as is done for the BackendDeployer, instead of having a separate setOutputLocation method.

packages/sandbox/src/file_watching_sandbox.ts

packages/sandbox/src/lambda_function_log_streamer.ts

edwardfoyle · 2024-06-07T20:44:44Z

packages/sandbox/src/sandbox.ts

@@ -37,6 +37,13 @@ export type SandboxOptions = {
  format?: ClientConfigFormat;
  profile?: string;
  watchForChanges?: boolean;
+  functionStreamingOptions?: SandboxFunctionStreamingOptions;


related to a comment above, I would remove this from the SandboxOptions and instead inject it directly into the LambdaFunctionLogStreamer

edwardfoyle

Looks good, just left a few comments regarding what props get passed in to methods vs ctors

edwardfoyle · 2024-06-18T16:45:10Z

packages/sandbox/src/lambda_function_log_streamer.ts

+    sandboxBackendId: BackendIdentifier,
+    streamingOptions?: SandboxFunctionStreamingOptions


Since these parameters don't change over the lifecycle of this class, it seems like they should be part of the ctor

See this #1492 (comment)

This is continuing the current DI pattern used everywhere. We are not passing the arguments to the factory for other commands as well. I'd like to keep this consistent and if needed have a refactor when required.

This is where we are instantiating these classes https://github.com/aws-amplify/amplify-backend/pull/1492/files#diff-2df4a084e13b31a8ac7a58aa826a88f39368636f33c58791294e267be36ac291R53

edwardfoyle · 2024-06-18T16:50:42Z

packages/sandbox/src/cloudwatch_logs_monitor.ts

+   * If the file doesn't exist it will be created.
+   * @param outputLocation file location
+   */
+  activate = (outputLocation?: string): void => {


Should outputLocation be part of the ctor?
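
The two shapes debated in these threads can be compared side by side. The sketch below is illustrative only: `StreamingOptions`, both classes and `describeDestination` are hypothetical and do not reflect the actual class signatures in this PR.

```typescript
type StreamingOptions = { enabled: boolean; logsOutFile?: string };

const describeDestination = (options: StreamingOptions): string =>
  options.logsOutFile ? `file:${options.logsOutFile}` : 'console';

// Shape 1: options fixed at construction time. Whoever builds the
// instance (e.g. a factory) must already know the options.
class CtorConfiguredStreamer {
  private readonly options: StreamingOptions;

  constructor(options: StreamingOptions) {
    this.options = options;
  }

  start = (): string => describeDestination(this.options);
}

// Shape 2: options passed when streaming starts. A factory can build
// the instance before the command-line options are known.
class MethodConfiguredStreamer {
  start = (options?: StreamingOptions): string =>
    describeDestination(options ?? { enabled: false });
}

const options: StreamingOptions = { enabled: true, logsOutFile: 'logs.txt' };
const viaCtor = new CtorConfiguredStreamer(options).start();
const viaMethod = new MethodConfiguredStreamer().start(options);
```

Shape 1 makes the "does not change over the lifecycle" property explicit in the type, while shape 2 matches the factory pattern the author wants to keep consistent.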

Amplifiyer added 3 commits May 10, 2024 17:44

feat: add support for function logs streaming to sandbox

74ccb5f

Merge branch 'main' into function_logs

4d30c77

update package lock

9141649

Amplifiyer added the run-e2e label (includes e2e tests in the PR checks workflow) May 10, 2024

Amplifiyer added 2 commits May 10, 2024 19:09

update package lock

11bc7b6

update package lock

3dacebd

edwardfoyle reviewed May 10, 2024

PR feedback updates

841a4c3

edwardfoyle reviewed May 13, 2024

packages/cli-core/src/format/format.ts (outdated)

Amplifiyer added 5 commits May 14, 2024 11:14

PR feedback updates

376de18

Merge branch 'main' into function_logs

6fd0a63

PR feedback updates

e8bbf91

try this

e15bcbf

try this

8fbeb3b

Amplifiyer force-pushed the function_logs branch from b50699a to 8fbeb3b Compare May 15, 2024 16:17

Amplifiyer added 7 commits May 15, 2024 18:34

Merge branch 'main' into function_logs

b589fdb

try this

c4d423e

Merge branch 'main' into function_logs

79223a6

Updates to cli options

ffb9791

add more tests

b5bcb12

Merge branch 'main' into function_logs

717698d

fix lint

6187471

Amplifiyer marked this pull request as ready for review June 6, 2024 13:00

Amplifiyer requested review from a team as code owners June 6, 2024 13:00

sobolk reviewed Jun 6, 2024

Amplifiyer added 2 commits June 6, 2024 20:21

PR feedback updates

18fc156

remove colors suppressions from printer

775b543

Amplifiyer added 3 commits June 7, 2024 15:21

move ArnParser from cdk to sdk

a6499ae

fix error handling

4b4b6ec

Merge branch 'main' into function_logs

d141d30

Amplifiyer mentioned this pull request Jun 7, 2024

[Sandbox] Log *all* events from cloudwatch when streaming to file #1625

Open

sobolk previously approved these changes Jun 7, 2024

edwardfoyle reviewed Jun 7, 2024

Amplifiyer added 2 commits June 10, 2024 15:43

PR updates

1639594

Merge branch 'main' into function_logs

b4b1506

Amplifiyer dismissed sobolk’s stale review via b4b1506 June 10, 2024 13:43

Amplifiyer requested review from edwardfoyle and sobolk June 12, 2024 18:01

sobolk approved these changes Jun 18, 2024

edwardfoyle reviewed Jun 18, 2024

edwardfoyle approved these changes Jun 18, 2024
