Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.1.1: Execution output: {} after running execute.sh #59

Closed
jcollum opened this issue Jan 8, 2020 · 15 comments
Closed

3.1.1: Execution output: {} after running execute.sh #59

jcollum opened this issue Jan 8, 2020 · 15 comments

Comments

@jcollum
Copy link

jcollum commented Jan 8, 2020

Steps:

  1. ./deploy.sh
  2. Result (already ran the deploy, just wanted to make sure it was deployed):
Waiting for changeset to be created..
Error: No changes to deploy. Stack lambda-power-tuning is up to date
  1. ./execute.sh
  2. Result:
-n .
// etc
SUCCEEDED
Execution output:
{}

Checked the logs: aws stepfunctions get-execution-history --profile default --execution-arn $EXECUTION_ARN

There are 96 entries in the logs like:

        {
            "timestamp": 1578505766.01,
            "type": "LambdaFunctionFailed",
            "id": 124,
            "previousEventId": 96,
            "lambdaFunctionFailedEventDetails": {
                "error": "Error",
                "cause": "{\"errorType\":\"Error\",\"errorMessage\":\"Invocation error: {\\\"errorType\\\":\\\"string\\\",\\\"errorMessage\\\":\\\"{\\\\\\\"statusCode\\\\\\\":\\\\\\\"500\\\\\\\",\\\\\\\"message\\\\\\\":\\\\\\\"An unexpected error occurred\\\\\\\"}\\\",\\\"trace\\\":[]}\",\"trace\":[\"Error: Invocation error: {\\\"errorType\\\":\\\"string\\\",\\\"errorMessage\\\":\\\"{\\\\\\\"statusCode\\\\\\\":\\\\\\\"500\\\\\\\",\\\\\\\"message\\\\\\\":\\\\\\\"An unexpected error occurred\\\\\\\"}\\\",\\\"trace\\\":[]}\",\"    at utils.range.map (/var/task/executor.js:67:19)\",\"    at process._tickCallback (internal/process/next_tick.js:68:7)\"]}"
            }
        },

So it appears to have failed but not caught it and then given an empty output.

Config:

{
    "lambdaARN": "arn:aws:lambda:us-west-2:etcetc",
    "powerValues": "ALL",
    "num": 5,
    "parallelInvocation": true,
    "strategy": "speed",
    "payload": [ {...} ]
}

Tried with explicit powerValues and parallelInvocation: false as well.

@alexcasalboni
Copy link
Owner

Thanks for reporting, @jcollum!

Here are the possible execution status values: RUNNING | SUCCEEDED | FAILED | TIMED_OUT | ABORTED (doc here).

The execute.sh script is only checking for FAILED, though. I admit the script was originally intended as just an example of how to poll for execution completion. Everything other than RUNNING should be considered a failed execution.

Can you please double-check what is the actual status of that execution?

Would you like to help me improve the script to cover those cases too?

@jcollum
Copy link
Author

jcollum commented Jan 14, 2020

Everything other than RUNNING should be considered a failed execution.

So SUCCEEDED is considered a failed execution?

Can you please double-check what is the actual status of that execution?

It appears to be getting to these lines:

 else
        # print execution output if succeeded
        echo $STATUS
        echo "Execution output: "

So the actual status is SUCCEEDED. But when it retrieves the output text with the describe-execution function it gets an empty object.

Would you like to help me improve the script to cover those cases too?

I don't think that's the issue. It seems like I need to see what the script actually did but I can't find that output because I don't know where to look.

I grabbed the full output, this is it:

{
    "executionArn": "arn:aws:states:us-west-2:xxx:execution:powerTuningStateMachine-BDxR787EO3SA:8a385921-2d77-4369-ba50-1b2af17bbff7",
    "stateMachineArn": "arn:aws:states:us-west-2:xxx:stateMachine:powerTuningStateMachine-BDxR787EO3SA",
    "name": "8a385921-2d77-4369-ba50-1b2af17bbff7",
    "status": "SUCCEEDED",
    "startDate": 1578510800.158,
    "stopDate": 1578510820.035,
    "input": "{\n    \"lambdaARN\": \"arn:aws:lambda:us-west-2:xxx:function:dev-testResult-student-testresults-get-lambda\",\n    \"powerValues\": [128, 256, 512, 1024, 2048, 3008],\n    \"num\": 5,\n    \"parallelInvocation\": false,\n    \"strategy\": \"speed\",\n    \"payload\": "(redacted)"
 }",
    "output": "{}"
}

That might not be correct JSON because I cut out the payload and some account info.

What can I do next to troubleshoot this?

@alexcasalboni
Copy link
Owner

So SUCCEEDED is considered a failed execution?

Good point :) Of course not.

So the actual status is SUCCEEDED. But when it retrieves the output text with the describe-execution function it gets an empty object.

I see. So the execution succeeds but the output is empty. I think this is due to the state machine error handling.

Could you inspect the execution in the AWS Management Console (web)? I'm quite sure it'll be much easier to debug there. Did the execution go through the CleanUpOnError state?

@jcollum
Copy link
Author

jcollum commented Jan 23, 2020

Took me a bit of time to get back to this one and figure out how to kick off the state machine execution from the control panel. I found there are many many errors like this one in the output:

2020-01-23T21:19:41.464Z	7a3cac89-9efb-4e57-9057-c41c14a0fcb3	ERROR	Invoke Error	
{
    "errorType": "Error",
    "errorMessage": "Invocation error: {\"errorType\":\"string\",\"errorMessage\":\"{\\\"statusCode\\\":\\\"500\\\",\\\"message\\\":\\\"An unexpected error occurred\\\"}\",\"trace\":[]}",
    "stack": [
        "Error: Invocation error: {\"errorType\":\"string\",\"errorMessage\":\"{\\\"statusCode\\\":\\\"500\\\",\\\"message\\\":\\\"An unexpected error occurred\\\"}\",\"trace\":[]}",
        "    at runInSeries (/var/task/executor.js:83:19)",
        "    at process._tickCallback (internal/process/next_tick.js:68:7)"
    ]
}

What else can I add to help here?

@alexcasalboni
Copy link
Owner

I'm sorry the experience wasn't super smooth :)

It looks like your function execution crashed (or threw an Exception/Error).

At the moment, the state machine assumes that your invocations are successful. If you are trying to fine-tune a failing Lambda function, it's not supported yet.

Please make sure that the payload you configured results in a non-failing function execution. You can easily verify this by testing your function in the Lambda management console, with the Test button.

@jcollum
Copy link
Author

jcollum commented Jan 27, 2020

It looks like I'm getting an access error when it tries to retrieve the power values:

2020-01-27T20:45:49.894Z	ab2c1583-e26d-420e-acaf-cb03878c7f8b	INFO	Getting current power value

Next entry:

2020-01-27T20:45:51.054Z	ab2c1583-e26d-420e-acaf-cb03878c7f8b	ERROR	Invoke Error	
{
    "errorType": "AccessDeniedException",
    "errorMessage": "User: arn:aws:sts::831967359561:assumed-role/serverlessrepo-aws-lambda-power-tu-initializerRole-KBWW5V1IY08N/serverlessrepo-aws-lambda-power-tuning-initializer-1D8H4M62LFR4T is not authorized to perform: lambda:GetFunctionConfiguration on resource: arn:aws:lambda:us-west-2:725462913924:function:dev-testResult-student-testresults-get-lambda",
    "code": "AccessDeniedException",
    "message": "User: arn:aws:sts::831967359561:assumed-role/serverlessrepo-aws-lambda-power-tu-initializerRole-KBWW5V1IY08N/serverlessrepo-aws-lambda-power-tuning-initializer-1D8H4M62LFR4T is not authorized to perform: lambda:GetFunctionConfiguration on resource: arn:aws:lambda:us-west-2:725462913924:function:dev-testResult-student-testresults-get-lambda",
    "time": "2020-01-27T20:45:51.016Z",
    "requestId": "bd67d8aa-829e-4af5-bc94-e9118818b2df",
    "statusCode": 403,
    "retryable": false,
    "retryDelay": 59.52802560907495,
    "stack": [
        "AccessDeniedException: User: arn:aws:sts::831967359561:assumed-role/serverlessrepo-aws-lambda-power-tu-initializerRole-KBWW5V1IY08N/serverlessrepo-aws-lambda-power-tuning-initializer-1D8H4M62LFR4T is not authorized to perform: lambda:GetFunctionConfiguration on resource: arn:aws:lambda:us-west-2:725462913924:function:dev-testResult-student-testresults-get-lambda",
        "    at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:51:27)",
        "    at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:55:8)",
        "    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)",
        "    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
        "    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:683:14)",
        "    at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)",
        "    at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)",
        "    at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10",
        "    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)",
        "    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:685:12)"
    ]
}

Here's the payload I'm sending:

{
    "lambdaARN": "arn:aws:lambda:us-west-2:725462913924:function:dev-testResult-student-testresults-get-lambda",
    "powerValues": [
        128,
        256,
        512,
        1024,
        2048,
        3008
    ],
    "num": 5,
    "parallelInvocation": false,
    "strategy": "speed",
    "payload": [
        {...}
    ]
}

@alexcasalboni
Copy link
Owner

Ok, this looks like the actual error. Thanks for sharing!

Maybe you configured a specific lambdaResource parameter when deploying the SAR app? Can you have a look at the CloudFormation stack's input parameters or at the IAM Policy used by the initializer function?

The default value for lambdaResource is "*", which grants the state machine access to any function in your account. But if you configured another value, it will be used as the Resource in the IAM role definition (see here).

Let's try to find out why the error is occurring in the first place, then I'll spend some time figuring a better way to debug these cases in the future.

@jcollum
Copy link
Author

jcollum commented Jan 28, 2020

Maybe you configured a specific lambdaResource parameter when deploying the SAR app?

Pretty sure I didn't change any of the defaults.

Can you have a look at the CloudFormation stack's input parameters

This?

image

the IAM Policy used by the initializer function?

lambda-power-tuning-initializerRole-12Q3CYVHBXCAZ

Under Permissions I see this: AWSLambdaExecute with a warning:

 This policy defines some actions, resources, or conditions that do not provide permissions. To grant access, policies must have an action that has an applicable resource or condition. For details, choose Show remaining Learn more

Also AWSLambdaBasicExecutionRole (no warnings) and initializerRolePolicy1 (no warnings)

Does that help?

@alexcasalboni
Copy link
Owner

Yes, can you please share the JSON policy of initializerRolePolicy1?

It's odd since you didn't change the default lambdaResource ("*"). Let's see if anything went wrong with that policy.

Also, when did you deploy the SAR app? It might be worth re-deploying it with the latest version to make sure it's not a bug already solved.

@jcollum
Copy link
Author

jcollum commented Jan 29, 2020 via email

@jcollum
Copy link
Author

jcollum commented Jan 29, 2020

Re-deployed the SAR, might be making progress, checking now...

@jcollum
Copy link
Author

jcollum commented Jan 29, 2020

image

Looks like whatever bug I hit on the first run was fixed between now and then.

Feedback:

  • the section about kicking off test events in the serverless app could use more detail; felt like I was guessing on what to click next a lot of the time. It's as simple as "Services > Step Functions", select power tuner, "Start Execution" button.
  • I'd stress that the user needs to test their lambda with a payload before starting with the tuner. It turns out the lambda I wanted to tune had been deleted a few days ago so I had to redeploy it. Even after redeploying I found that the test data I had wouldn't work on the deployed version, so I needed to do some digging (with help) to find a known-good input to the lambda. After that it was easy to run the tuner. So I'd say something like "Step 0: find a known-good working payload!"

@jcollum jcollum closed this as completed Jan 29, 2020
@jcollum
Copy link
Author

jcollum commented Jan 29, 2020

Oh yeah, one more suggestion: add the ARN of the lambda to the results visualization.

@alexcasalboni
Copy link
Owner

Thanks for sharing this, @jcollum!

I'll try to simplify the documentation and maybe add a few screenshots of the web console as well.

Oh yeah, one more suggestion: add the ARN of the lambda to the results visualization.

This is intentionally missing to avoid privacy concerns, as there isn't any data sharing between the state machine and the visualization tool (besides the numbers). Could you elaborate on why it'd make the visualization better?

@jcollum
Copy link
Author

jcollum commented Jan 30, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants