
[AwsCustomResource]: (assumeRoleArn defined in non-opt-in region while assume in opt-in region cause permission issue) #26562

Closed
chensy-aws opened this issue Jul 28, 2023 · 18 comments · Fixed by #26917
Labels
@aws-cdk/custom-resources Related to AWS CDK Custom Resources bug This issue is a bug. effort/medium Medium work item – several days of effort p1 sdk-v3-upgrade Tag issues that are associated to SDK V3 upgrade. Not limited to CR usage of SDK only.


@chensy-aws

chensy-aws commented Jul 28, 2023

Describe the bug

For AwsCustomResource, the AwsSdkCall has an assumedRoleArn property that specifies a role to assume before making the SDK call.

However, the default STS endpoint is set to regional, and the default region in our case is an opt-in region. The role is created/defined under a root account in a non-opt-in region (we cannot enable every opt-in region for that account). Because the STS endpoint is in the wrong region, the AssumeRole call fails with a permission error.

Expected Behavior

The STS AssumeRole call succeeds and the AwsSdkCall proceeds with a successful response.

Current Behavior

Error [CredentialsError]: Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1
    at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/query.js:50:29)
    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)
    at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12)
    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
  code: 'CredentialsError',
  time: 2023-07-19T01:26:20.624Z,
  requestId: '***********',
  statusCode: 403,
  retryable: false,
  retryDelay: 30.07614769919884,
  originalError: {
    message: 'Could not load credentials from ChainableTemporaryCredentials',
    code: 'CredentialsError',
    time: 2023-07-19T01:26:20.624Z,
    requestId: '*****************',
    statusCode: 403,
    retryable: false,
    retryDelay: 30.07614769919884,
    originalError: {
      message: 'User: arn:aws:sts::*********:assumed-role/********* is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::**********:role/*******',
      code: 'AccessDenied',
      time: 2023-07-19T01:26:20.565Z,
      requestId: '******',
      statusCode: 403,
      retryable: false,
      retryDelay: 30.07614769919884
    }
  }
}

Reproduction Steps

Create an AwsCustomResource in an opt-in region that assumes a role defined in an account that has not enabled this opt-in region.

Possible Solution

We tried every combination of region and STS endpoint:

case 1, default region (opt-in region) with global STS endpoint -> FAILED
case 2, non-opt-in region with global STS endpoint -> SUCCEEDED
case 3, default region (opt-in region) with regional STS endpoint -> FAILED
case 4, non-opt-in region with regional STS endpoint -> SUCCEEDED

So in either case, we need to override the default region with a non-opt-in region. We are therefore requesting that this STS region option be exposed to the user. The AwsSdkCall does have a region option, but that region is NOT used for the STS AssumeRole call.

Additional Information/Context

No response

CDK CLI Version

2.73.0

Framework Version

No response

Node.js Version

18

OS

AL2

Language

Typescript

Language Version

No response

Other information

No response

@chensy-aws chensy-aws added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jul 28, 2023
@github-actions github-actions bot added the @aws-cdk/aws-iam Related to AWS Identity and Access Management label Jul 28, 2023
@pahud
Contributor

pahud commented Aug 1, 2023

I am afraid this is related to JS SDK rather than CDK.

Which opt-in region is that in your case?

Let me rephrase it.

Account A, in an opt-in region, was deploying a CDK custom resource that assumes a role in Account B, which has not opted in to this region, and the custom resource failed with the Lambda log you provided above. Is that correct?

@pahud pahud added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Aug 1, 2023
@pahud
Contributor

pahud commented Aug 3, 2023

Are you sure you are using CDK 2.1.4?

Can you confirm your CDK version?

@pahud pahud added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Aug 3, 2023
@mrgrain
Contributor

mrgrain commented Aug 3, 2023

The stack trace indicates this is using SDK v2: /var/runtime/node_modules/aws-sdk/lib/protocol/query.js:50:2
Note the aws-sdk path rather than @aws-sdk/whatever.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Aug 3, 2023
@chensy-aws
Author

chensy-aws commented Aug 3, 2023

> I am afraid this is related to JS SDK rather than CDK.
>
> Which opt-in region is that in your case?
>
> Let me rephrase it.
>
> Account A in an opt-in region was deploying a CDK custom resource that assumes to another role from Account B which does not opt-in this region, and the custom resource just failed with lambda log as you provided above. Is it correct?

Yes. The opt-in regions we tested are ap-east-1 and me-south-1.

This could be either a CDK problem or an SDK issue. If we want to force (hardcode) the AssumeRole region to be one of Account B's non-opt-in regions, it would be an SDK issue, but that may only matter for some special cases.

If we want to provide an override option for the AssumeRole region in this case, CDK also needs an update to support it. We are not sure what the CDK team will decide for AwsCustomResource.

@chensy-aws
Author

chensy-aws commented Aug 3, 2023

> Are you sure you are using CDK 2.1.4 ?
>
> Can you confirm your CDK version?

Actually, we are using:

"aws-cdk-lib": "2.73.0",
"aws-cdk": "2.73.0"

@chensy-aws
Author

> Stack trace indicates this is using SDK v2: /var/runtime/node_modules/aws-sdk/lib/protocol/query.js:50:2 Note the aws-sdk and not @aws-sdk/whatever

Exact SDK version is:
AWS SDK VERSION: 2.1374.0

@chensy-aws
Author

chensy-aws commented Aug 4, 2023

The latest SDK (v3) provides a way to override the region, because fromTemporaryCredentials spreads options.clientConfig into the STS client it creates:
https://github.com/aws/aws-sdk-js-v3/blob/main/packages/credential-providers/src/fromTemporaryCredentials.ts

if (!stsClient) stsClient = new STSClient({ ...options.clientConfig, credentials: options.masterCredentials });

SDK v2 also provides an override option (stsConfig):

https://github.com/aws/aws-sdk-js/blob/master/lib/credentials/chainable_temporary_credentials.js#L127

Is it possible to expose this as an stsRegion option on AwsCustomResourceProps? It is blocking one of our production region builds.

@chensy-aws
Author

Is there a timeline for fixing this bug? It is blocking our region build and we have another opt-in region build coming this week.

@MrArnoldPalmer
Contributor

So, to be clear @chensy-aws, you're asking for a new feature that hasn't been supported in this custom resource previously? You want to be able to set the region with which the SDK client is initiated?

Can you not achieve this using environment variables?

@chensy-aws
Author

chensy-aws commented Aug 9, 2023

> So, to be clear @chensy-aws, you're asking for a new feature that hasn't been supported in this custom resource previously? You want to be able to set the region with which the SDK client is initiated?
>
> Can you not achieve this using environment variables?

Hi, I am not an expert on custom resources.
https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.custom_resources.AwsCustomResource.html
https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.custom_resources.AwsSdkCall.html

Neither interface provides an easy way to set environment variables in CDK. Am I missing anything here?

> you're asking for a new feature that hasn't been supported in this custom resource previously?

Yes, if this is a situation AwsCustomResource has not considered before.

@chensy-aws
Author

To summarize the discussion:

We are using this construct:
https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.custom_resources.AwsCustomResource.html

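From our CDK code:

import { PolicyStatement } from 'aws-cdk-lib/aws-iam';
import { AwsCustomResource, AwsCustomResourcePolicy } from 'aws-cdk-lib/custom-resources';

// Inside the constructor of a construct deployed to the opt-in region.
// role_arn and phy_res_id come from our code; nonOptInRegion is a placeholder for the non-opt-in region.
new AwsCustomResource(this, "someStack", {
  installLatestAwsSdk: false,
  onCreate: {
    assumedRoleArn: role_arn,
    region: nonOptInRegion, // the non-opt-in region where the stack is created
    service: 'CloudFormation',
    action: 'createStack',
    parameters: {
      // some stack template
    },
    physicalResourceId: phy_res_id,
  },
  policy: AwsCustomResourcePolicy.fromStatements([
    PolicyStatement.fromJson({
      Effect: "Allow",
      Action: "sts:AssumeRole",
      Resource: role_arn,
    }),
  ]),
});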

The role we are trying to assume (role_arn) is defined in a non-opt-in region. This custom resource construct is created in an opt-in region, and the region parameter on the onCreate call creates the stack in the non-opt-in region. However, the STS client in AwsCustomResource uses the default region, i.e. the opt-in region where the construct is created, whereas we want it to use the non-opt-in region (where the role is defined) to assume the role.

As far as we understand, there are two ways to solve this issue:

  1. Use the region option for both the service/action call and the assumedRoleArn AssumeRole call.
  2. Expose a parameter (stsRegion) directly to customers to override the STS region.
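The two options above can be compared with a minimal sketch of the region choice the handler would make before assuming the role. It assumes a hypothetical stsRegion field (the override proposed in this thread, not an existing CDK property) and a hypothetical helper resolveStsRegion; the region names are examples only.

```typescript
// Hypothetical subset of an SDK call's region settings. `region` mirrors the
// existing AwsSdkCall option; `stsRegion` is only the override proposed in this
// thread and is not part of the real CDK API.
interface SdkCallRegionOptions {
  region?: string;
  stsRegion?: string;
}

// Hypothetical helper: choose the region for the sts:AssumeRole call.
// Option 2: an explicit stsRegion wins.
// Option 1: otherwise reuse the region of the service/action call.
// Last resort: the Lambda's own region (the only choice before the fix).
function resolveStsRegion(call: SdkCallRegionOptions, lambdaRegion: string): string {
  return call.stsRegion ?? call.region ?? lambdaRegion;
}

// The Lambda runs in the opt-in region ap-east-1; the other regions are examples.
console.log(resolveStsRegion({ region: "us-west-2" }, "ap-east-1")); // us-west-2
console.log(resolveStsRegion({ region: "us-west-2", stsRegion: "us-east-1" }, "ap-east-1")); // us-east-1
console.log(resolveStsRegion({}, "ap-east-1")); // ap-east-1
```

Option 1 alone already covers the reported case, since the stack is created in a region the role's account has enabled; option 2 only matters when the service call must target a region that account has not enabled.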

@kaizencc
Contributor

So, in the CustomResource for CrossAccountZoneDelegationRecord, we provide the user with a way to specify non-regional sts endpoints:

const useRegionalStsEndpoint = this.node.tryGetContext(USE_REGIONAL_STS_ENDPOINT_CONTEXT_KEY);

However, in the more opinionated AwsCustomResource handler, we hardcode the sts endpoint to regional, so users cannot have non-regional sts endpoints and use AwsCustomResource:

credentials = new AWS.ChainableTemporaryCredentials({
  params: params,
  stsConfig: { stsRegionalEndpoints: 'regional' },
});

The ask is to honor the context flag this.node.tryGetContext(USE_REGIONAL_STS_ENDPOINT_CONTEXT_KEY); and use non-regional sts endpoints in AwsCustomResource as well.

@kaizencc kaizencc added @aws-cdk/custom-resources Related to AWS CDK Custom Resources and removed @aws-cdk/aws-iam Related to AWS Identity and Access Management labels Aug 10, 2023
@MrArnoldPalmer
Contributor

MrArnoldPalmer commented Aug 11, 2023

So the v3 sdk uses the sts regional endpoints by default. I don't think regional endpoints are the issue here.

The client needs to instantiate the STS client using a non-opt-in region. I guess it doesn't matter which one, as long as it is enabled in the account where the role being assumed is defined. According to the docs, you can't assume a role using an opt-in regional endpoint unless the account where that role is defined also has that opt-in region enabled.

Also tokens granted using the global endpoint are not compatible with opt-in regions by default. You can make them compatible but you have to set a flag within the IAM console for your account. Basically I think this means we shouldn't use the global endpoint.

Relevant docs: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_enable-regions.html
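The rule described above can be written as a small predicate over the region of the STS endpoint. This is a simplified model, not AWS's actual authorization logic: it ignores the global endpoint and IAM permissions, the opt-in list holds only the two regions mentioned in this thread (it is not exhaustive), and isRegionEnabled and canAssumeRoleVia are hypothetical names.

```typescript
// Example opt-in regions taken from this thread; the real list is longer.
const OPT_IN_REGIONS: ReadonlySet<string> = new Set(["ap-east-1", "me-south-1"]);

// A non-opt-in region is enabled in every account; an opt-in region only
// in accounts that have opted in to it.
function isRegionEnabled(region: string, optedInRegions: ReadonlySet<string>): boolean {
  return !OPT_IN_REGIONS.has(region) || optedInRegions.has(region);
}

// Simplified model: AssumeRole through a regional STS endpoint only succeeds if
// that region is enabled in the account that owns the role being assumed.
function canAssumeRoleVia(stsRegion: string, roleAccountOptedIn: ReadonlySet<string>): boolean {
  return isRegionEnabled(stsRegion, roleAccountOptedIn);
}

// Account B owns the role and has not opted in to any opt-in region.
const accountBOptedIn: ReadonlySet<string> = new Set<string>();
console.log(canAssumeRoleVia("ap-east-1", accountBOptedIn)); // false
console.log(canAssumeRoleVia("us-west-2", accountBOptedIn)); // true
```

In this model, the only way to succeed without touching Account B's settings is to change the STS region, which is what the rest of the thread converges on.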

@chensy-aws
Author

chensy-aws commented Aug 11, 2023

#26593 reports the same problem for SDK v3.

> I don't think regional endpoints are the issue here.

The main issue is that the STS AssumeRole call uses the default region (which is opt-in), while the service/action call uses the region provided in the parameters.

As I mentioned earlier, either we use the same region for STS and the service/action call, or we expose the STS region as a parameter.

@MrArnoldPalmer
Contributor

@chensy-aws yeah I think we are in agreement, my message was with regards to the usage of USE_REGIONAL_STS_ENDPOINT_CONTEXT_KEY, which won't change the sdk v3 behavior and therefore won't fix the issue.

I think using the same region as the aws api call in the clientConfig for fromTemporaryCredentials is the best solution here.
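The suggestion of reusing the API call's region in clientConfig can be sketched as an options builder for SDK v3's fromTemporaryCredentials, whose clientConfig is spread into the STS client it creates. To keep the example self-contained it does not import the SDK: TemporaryCredentialsOptions is a local, partial mirror of the real option type, and buildAssumeRoleOptions, the role ARN, and the session name are hypothetical. This is not necessarily how #26917 implements the fix.

```typescript
// Local mirror of the subset of fromTemporaryCredentials options used here
// (params and clientConfig). In a real handler you would pass the returned
// object to fromTemporaryCredentials from '@aws-sdk/credential-providers'.
interface TemporaryCredentialsOptions {
  params: { RoleArn: string; RoleSessionName?: string };
  clientConfig?: { region?: string };
}

// Hypothetical helper: create the STS client used for AssumeRole in the same
// region as the service call, falling back to the Lambda's region.
function buildAssumeRoleOptions(
  roleArn: string,
  callRegion: string | undefined,
  lambdaRegion: string,
): TemporaryCredentialsOptions {
  return {
    params: { RoleArn: roleArn, RoleSessionName: "aws-custom-resource" }, // example session name
    clientConfig: { region: callRegion ?? lambdaRegion },
  };
}

const opts = buildAssumeRoleOptions("arn:aws:iam::123456789012:role/ExampleRole", "us-west-2", "ap-east-1");
console.log(opts.clientConfig?.region); // us-west-2
// With the SDK installed: const credentials = fromTemporaryCredentials(opts);
```

Tying the STS region to the call region avoids a new property: users who hit this problem pass a region enabled in the role's account, as the diagram below illustrates.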

@rix0rrr
Contributor

rix0rrr commented Aug 17, 2023

I had to think about this a lot, but in the end I came up with this diagram and it basically confirms what @MrArnoldPalmer was already saying 🤣

                                                               
                     Account A                  Account B      
                                                               
                                                               
                                               Target Role     
                                                               
               ┌───────────────────┐      ┌───────────────────┐
               │                   │      │                   │
   Non opt-in  │Perform AssumeRole │      │   Make service    │
     region    │     call here     │      │     call here     │
               │                ▲ ─┼──────┼─────▶             │
               │                │  │      │                   │
               └────────────────┼──┘      └───────────────────┘
                                │                              
                                │                              
               ┌────────────────┼──┐      ┌───────────────────┐
               │                │  │      │                   │
               │                │  │      │                   │
opt-in region  │    Lambda here    │      │   (not enabled)   │
               │                   │      │                   │
               │                   │      │                   │
               └───────────────────┘      └───────────────────┘

I was wondering whether this would work, but it should: Lambda will be using regional opt-in credentials, but they should be valid in Account A to perform the AssumeRole call in the non-opt-in region (since Account A has the opt-in region opted-in, otherwise the Lambda couldn't be running there).

And then afterwards, those credentials are valid in the non-opt-in region.

@chensy-aws
Author

chensy-aws commented Aug 17, 2023

> I was wondering whether this would work, but it should: Lambda will be using regional opt-in credentials, but they should be valid in Account A to perform the AssumeRole call in the non-opt-in region (since Account A has the opt-in region opted-in, otherwise the Lambda couldn't be running there).
>
> And then afterwards, those credentials are valid in the non-opt-in region.

Yes, that should definitely work in our case.

@scanlonp scanlonp added p1 and removed p2 labels Aug 29, 2023
@udaypant udaypant added the sdk-v3-upgrade Tag issues that are associated to SDK V3 upgrade. Not limited to CR usage of SDK only. label Aug 30, 2023
@mergify mergify bot closed this as completed in #26917 Aug 31, 2023
mergify bot pushed a commit that referenced this issue Aug 31, 2023
…26917)

Currently, the region parameter in `AwsCustomResource` only controls where the action is performed. If a role needs to be assumed, the `assumeRole` call is made from the region the stack is deployed into. This presents a problem if the stack is deployed into an opt-in region, and the role being assumed lives in a separate stack in an account without the opt-in region enabled. 

This change makes the `assumeRole` call and the SDK call run in the same region. Therefore, to solve the above problem, pass any region that is enabled in the account that owns the role to be assumed.

Closes #26562.



----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

mikewrighton pushed a commit that referenced this issue Sep 14, 2023
…26917)

Currently, the region parameter in `AwsCustomResource` only controls where the action is performed. If a role needs to be assumed, the `assumeRole` call is made from the region the stack is deployed into. This presents a problem if the stack is deployed into an opt-in region, and the role being assumed lives in a separate stack in an account without the opt-in region enabled. 

This change makes the `assumeRole` call and the sdk call performed in the same region. Therefore, to solve the above problem, pass any region that is enabled for the account that owns the role to be assumed.

Closes #26562.



----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
8 participants