Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(stepfunctions-tasks): in-flight stepfunction executions fail on update of versioned lambdas #21233

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
32 changes: 31 additions & 1 deletion packages/@aws-cdk/aws-stepfunctions-tasks/lib/lambda/invoke.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import * as iam from '@aws-cdk/aws-iam';
import * as lambda from '@aws-cdk/aws-lambda';
import { IAlias, IFunction, IVersion } from '@aws-cdk/aws-lambda';
import * as sfn from '@aws-cdk/aws-stepfunctions';
import * as cdk from '@aws-cdk/core';
import { Construct } from 'constructs';
Expand Down Expand Up @@ -120,7 +121,7 @@ export class LambdaInvoke extends sfn.TaskStateBase {

this.taskPolicies = [
new iam.PolicyStatement({
resources: this.props.lambdaFunction.resourceArnsForGrantInvoke,
resources: this.determineResourceArnsForGrantInvoke(props.lambdaFunction),
actions: ['lambda:InvokeFunction'],
}),
];
Expand Down Expand Up @@ -161,6 +162,35 @@ export class LambdaInvoke extends sfn.TaskStateBase {
};
}
}

/**
* Determine the ARN(s) to put into the resource field of the generated
* IAM policy based on the type of the provided lambda function.
*
* When invoking Lambda Versions, we need to grant permissions to all
* qualifiers. Otherwise in-flight StepFunction executions will fail with
* due to missing permissions. This is because a change of the referenced
* Version will cause the Policy to remove permissions for the previous
* Version - which is currently in-flight.
*
* @see https://github.com/aws/aws-cdk/issues/17515
*/
private determineResourceArnsForGrantInvoke(lambdaFunction: IFunction): string[] {
if (isVersion(lambdaFunction)) {
return lambdaFunction.lambda.resourceArnsForGrantInvoke;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is version.resourceArnsForGrantInvoke ever correct, in the face of changing versions?

Isn't it easier to just update the behavior of Version?

Copy link
Contributor Author

@mrgrain mrgrain Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantically, I'd expect lambdaVersion.grantInvoke(role) to grant permissions for the specified version and not all qualifiers. Otherwise I would do lambdaVersion.grantInvoke(role). That's a far reaching change to make.

The idiomatic way to solve this problem in AWS is to use an Alias. I'm on the fence, but because of this I'm more inclined to argue this "works as intended", than to change the behavior of Version.


Maybe we're going about this the wrong way and we need to solve the undeniably bad customer experience differently.

Copy link
Contributor Author

@mrgrain mrgrain Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rix0rrr @madeline-k #20177 seems to be of a similar nature. Specifically @kaizencc made the point that if users want to "[..] grant less permissible policies, grant them on the version or alias you've created, not the function itself.". Changing Version to also grant permissions to all versions, would contradict this.

You've all clearly thought about this in-depth, so I'm very happy to follow your lead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Kaizen is right. Users should have created an alias, and invoked that. Invoking by currentVersion seems always incorrect, given the code that CDK generates.

In fact we should probably discourage it in the docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rix0rrr So close this PR, update the docs, and then close the related issue with explanation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rix0rrr Can you quickly confirm if I understood you correctly regarding this PR and issue?

So close this PR, update the docs, and then close the related issue with explanation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @rix0rrr that this PR does not address the root cause of the issue.

}

return lambdaFunction.resourceArnsForGrantInvoke;
}
}

/**
* Type guard to determine if a given `IFunction` implements IVersion
*/
function isVersion(lambdaFunction: IFunction | IAlias | IVersion): lambdaFunction is IVersion {
return !(lambdaFunction as IAlias).aliasName
&& (lambdaFunction as IVersion).lambda
&& Boolean((lambdaFunction as IVersion).version);
}

/**
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
import { Code, Function, Runtime } from '@aws-cdk/aws-lambda';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is copied from integ.invoke.ts and adjusted to use a Version and an Alias.

import * as sfn from '@aws-cdk/aws-stepfunctions';
import * as cdk from '@aws-cdk/core';
import * as integ from '@aws-cdk/integ-tests';
import { LambdaInvoke } from '../../lib';


/*
* Creates a state machine with a task state to invoke a Lambda function
* via Alias and Version qualifiers.
*
* The state machine creates a couple of Lambdas that pass results forward
* and into a Choice state that validates the output.
*/
const app = new cdk.App();
const stack = new cdk.Stack(app, 'aws-stepfunctions-tasks-lambda-invoke-integ');

const submitJobLambda = new Function(stack, 'submitJobLambda', {
code: Code.fromInline(`exports.handler = async () => {
return {
statusCode: '200',
body: 'hello, world!'
};
};`),
runtime: Runtime.NODEJS_14_X,
handler: 'index.handler',
});

const submitJob = new LambdaInvoke(stack, 'Invoke Handler', {
lambdaFunction: submitJobLambda.currentVersion,
outputPath: '$.Payload',
});

const checkJobStateLambda = new Function(stack, 'checkJobStateLambda', {
code: Code.fromInline(`exports.handler = async function(event, context) {
return {
status: event.statusCode === '200' ? 'SUCCEEDED' : 'FAILED'
};
};`),
runtime: Runtime.NODEJS_14_X,
handler: 'index.handler',
});
const checkJobStateLambdaAlias = checkJobStateLambda.addAlias('IntegTest');

const checkJobState = new LambdaInvoke(stack, 'Check the job state', {
lambdaFunction: checkJobStateLambdaAlias,
resultSelector: {
status: sfn.JsonPath.stringAt('$.Payload.status'),
},
});

const isComplete = new sfn.Choice(stack, 'Job Complete?');
const jobFailed = new sfn.Fail(stack, 'Job Failed', {
cause: 'Job Failed',
error: 'Received a status that was not 200',
});
const finalStatus = new sfn.Pass(stack, 'Final step');

const chain = sfn.Chain.start(submitJob)
.next(checkJobState)
.next(
isComplete
.when(sfn.Condition.stringEquals('$.status', 'FAILED'), jobFailed)
.when(sfn.Condition.stringEquals('$.status', 'SUCCEEDED'), finalStatus),
);

const sm = new sfn.StateMachine(stack, 'StateMachine', {
definition: chain,
timeout: cdk.Duration.seconds(30),
});

new cdk.CfnOutput(stack, 'stateMachineArn', {
value: sm.stateMachineArn,
});

new integ.IntegTest(app, 'StepFunctionsInvokeLambdaQualifiersTest', {
testCases: [stack],
});

app.synth();
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{}