creds.sh An error occurred (ValidationError) when calling the AssumeRole operation arn::iam::XXXXXXXXXXX:role/AWSAFTAdmin is invalid #219

Closed
abhishek-sorenson opened this issue Aug 4, 2022 · 13 comments
Labels
bug Something isn't working

Comments

@abhishek-sorenson

Terraform Version >= 0.15.1 & Provider: >= 3.72, < 4.0.0

AFT Version: 1.3.3
(Can be found in the AFT Management Account in the SSM Parameter /aft/config/aft/version)

Terraform Version & Provider Versions
N/A

terraform version

N/A

terraform providers

N/A

Bug Description
This bug is present as of 08/03/2022.

When running the aft-create-pipeline CodeBuild project, it fails with the error:

[Container] 2022/08/03 21:52:46 Running command ./aws-aft-core-framework/sources/scripts/creds.sh --aft-mgmt

Generating credentials for AWSAFTAdmin in aft-management account: XXXXXXXXXXX

An error occurred (ValidationError) when calling the AssumeRole operation: arn::iam::XXXXXXXXXXX:role/AWSAFTAdmin is invalid

[Container] 2022/08/03 21:52:50 Command did not exit successfully ./aws-aft-core-framework/sources/scripts/creds.sh --aft-mgmt exit status 255

Note that the ARN for the AWS AFT Admin role is missing the AWS partition and should read "arn:aws:iam::XXXXXXXXXXX:role/AWSAFTAdmin". This is what causes the ValidationError on the AssumeRole operation in the creds.sh script: creds.sh now builds the ARN from the ${AWS_PARTITION} environment variable rather than hard-coding "aws", and that variable is not set in our build environment. Our parameter store was pointing to the "main" branch of the "aws-ia/terraform-aws-control_tower_account_factory" repository (via the repo URL and repo branch parameters), so we picked up the July 19th commit in which "aws" was replaced with "${AWS_PARTITION}", and our builds started failing soon after that commit.

creds.sh is in terraform-aws-control_tower_account_factory/sources/scripts/creds.sh

A workaround for this issue is to fork the repo and replace references to the ${AWS_PARTITION} environment variable with "aws".

For example:

CREDENTIALS=$(aws sts assume-role --role-arn "arn:${AWS_PARTITION}:iam::${AFT_MGMT_ACCOUNT}:role/${AFT_MGMT_ROLE}" --role-session-name "${ROLE_SESSION_NAME}")

SHOULD BE CHANGED TO:

CREDENTIALS=$(aws sts assume-role --role-arn "arn:aws:iam::${AFT_MGMT_ACCOUNT}:role/${AFT_MGMT_ROLE}" --role-session-name "${ROLE_SESSION_NAME}")

There are other spots where ${AWS_PARTITION} would need to be changed as well. This is just a workaround until the environment variable issue is fixed.
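For context, a minimal shell sketch of the failure mode, including a defensive fallback that is my own suggestion and not part of the upstream creds.sh:

# If AWS_PARTITION is unset or empty, the role ARN collapses to "arn::iam::...",
# which STS rejects with the ValidationError shown above.
AWS_PARTITION="${AWS_PARTITION:-aws}"   # hypothetical fallback default, not in upstream creds.sh

CREDENTIALS=$(aws sts assume-role \
  --role-arn "arn:${AWS_PARTITION}:iam::${AFT_MGMT_ACCOUNT}:role/${AFT_MGMT_ROLE}" \
  --role-session-name "${ROLE_SESSION_NAME}")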

To Reproduce
Steps to reproduce the behavior:

  1. Check the parameter store for /aft/config/aft-pipeline-code-source/repo-url and ensure its value is "https://github.com/aws-ia/terraform-aws-control_tower_account_factory.git"
  2. Check the parameter store for /aft/config/aft-pipeline-code-source/repo-git-ref and ensure its value is "main" (the same checks can be run from the CLI; see the sketch after this list)
  3. Go to CodeBuild > Build Projects > aft-create-pipeline > Start Build
  4. This should produce the error in the description above (as of 08/03/2022)
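The parameter checks from steps 1 and 2 can also be done from the CLI (run against the AFT management account; parameter names as listed above):

# Confirm the AFT pipeline source repo URL and git ref
aws ssm get-parameter --name /aft/config/aft-pipeline-code-source/repo-url \
  --query 'Parameter.Value' --output text
aws ssm get-parameter --name /aft/config/aft-pipeline-code-source/repo-git-ref \
  --query 'Parameter.Value' --output text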

Expected behavior
The aft-create-pipeline should run successfully

Additional context
To find the error:

  1. Go to CodeBuild > Build Projects > aft-create-pipeline > Build History
  2. Click on the latest failed build
  3. Check for the error listed in the description (or pull the build status and logs with the CLI sketch below)
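A read-only CLI alternative for step 3 (project name as above):

# Grab the ID of the most recent aft-create-pipeline build and inspect its status and log link
BUILD_ID=$(aws codebuild list-builds-for-project --project-name aft-create-pipeline \
  --sort-order DESCENDING --query 'ids[0]' --output text)
aws codebuild batch-get-builds --ids "$BUILD_ID" \
  --query 'builds[0].{status:buildStatus,logs:logs.deepLink}'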

Related Logs
[Container] 2022/08/03 21:52:46 Running command ./aws-aft-core-framework/sources/scripts/creds.sh --aft-mgmt

Generating credentials for AWSAFTAdmin in aft-management account: XXXXXXXXXXX

An error occurred (ValidationError) when calling the AssumeRole operation: arn::iam::XXXXXXXXXXX:role/AWSAFTAdmin is invalid

[Container] 2022/08/03 21:52:50 Command did not exit successfully ./aws-aft-core-framework/sources/scripts/creds.sh --aft-mgmt exit status 255

@abhishek-sorenson added the bug and pending investigation labels on Aug 4, 2022
@snebhu3
Collaborator

snebhu3 commented Aug 4, 2022

@abhishek-sorenson there is a known bug with AFT version 1.3.5 and older, which causes AFT components to use the latest code in the AFT source repository instead of the version of AFT that was deployed.

I would recommend updating to the latest version of AFT (v1.6.2), which should fix your issue.

@snebhu3 removed the pending investigation label on Aug 4, 2022
@andrewkruse

> @abhishek-sorenson there is a known bug with AFT version 1.3.5 and older, which causes AFT components to use the latest code in the AFT source repository instead of the version of AFT that was deployed.
>
> I would recommend updating to the latest version of AFT (v1.6.2), which should fix your issue.

Is there guidance on the order of operations for updating? I started this process this morning after encountering this issue. I ran the updated Terraform for Account Factory, then kicked off the aft-invoke-customizations step function with:

{
  "include": [
    {
      "type": "all"
    }
  ]
}

which proceeded to fail. Was there an order of operations I missed somewhere?
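In case it helps with reproducing this step, an invocation of that state machine from the CLI might look roughly like the following (the state machine ARN is a placeholder for the one in your AFT management account; the input is the payload quoted above):

# Kick off customizations for all accounts via the aft-invoke-customizations state machine
aws stepfunctions start-execution \
  --state-machine-arn "arn:aws:states:us-east-1:111111111111:stateMachine:aft-invoke-customizations" \
  --input '{"include": [{"type": "all"}]}'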

@gabrielibagon

gabrielibagon commented Aug 13, 2022

@andrewkruse @snebhu3 I also couldn't get that fix to work.

What ended up working was hardcoding the AWS_PARTITION environment variable on the aft-create-pipeline CodeBuild project's environment (this was the pipeline that was failing for me). I edited it via the console by going to CodeBuild > Build Projects > aft-create-pipeline > Edit Environment.

Not sure if this will fix all scenarios, but it allowed me to successfully run the other components of the project to provision/customize an account.
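Before editing, you can confirm whether the variable is present on the project with a read-only check (project name as referenced above):

# List the environment variables currently configured on the aft-create-pipeline project
aws codebuild batch-get-projects --names aft-create-pipeline \
  --query 'projects[0].environment.environmentVariables'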

@abhishek-sorenson
Author

> @andrewkruse @snebhu3 I also couldn't get that fix to work.
>
> What ended up working was hardcoding the AWS_PARTITION environment variable on the aft-create-pipeline CodeBuild project's environment (this was the pipeline that was failing for me). I edited it via the console by going to CodeBuild > Build Projects > aft-create-pipeline > Edit Environment.
>
> Not sure if this will fix all scenarios, but it allowed me to successfully run the other components of the project to provision/customize an account.

Yup, this is probably a better workaround than what I'm doing currently as it explicitly defines the AWS_PARTITION environment variable rather than replacing that bit in the ARN altogether. I think the ultimate solution is:

  1. Use the workaround to make your code work as is
  2. Upgrade to the latest AFT version (make sure you test a deployment first!)
  3. Pull down the latest code from the aws-ia/terraform-aws-control_tower_account_factory repository
  4. Re-test the AFT version with a test deployment

If this doesn't work, then please report a bug with the latest version. I will try this myself sometime soon and provide an update.

@abhishek-sorenson
Author

> > @abhishek-sorenson there is a known bug with AFT version 1.3.5 and older, which causes AFT components to use the latest code in the AFT source repository instead of the version of AFT that was deployed.
> > I would recommend updating to the latest version of AFT (v1.6.2), which should fix your issue.
>
> Is there guidance on the order of operations for updating? I started this process this morning after encountering this issue. I ran the updated Terraform for Account Factory, then kicked off the aft-invoke-customizations step function with:
>
> {
>   "include": [
>     {
>       "type": "all"
>     }
>   ]
> }
>
> which proceeded to fail. Was there an order of operations I missed somewhere?

Hey Andrew,

I'm not sure what customizations you kicked off, but please check if you're seeing the same error. The error in this thread relates to the aft-create-pipeline project, and I believe the customizations step function calls global-customizations. Can you double-check the error you're getting? If it is the same, you just need to hardcode the environment variable as a workaround, as @gabrielibagon suggested. For a full fix, follow the steps suggested in my prior post and see if it works.

@andrewkruse

> Hey Andrew,
>
> I'm not sure what customizations you kicked off, but please check if you're seeing the same error. The error in this thread relates to the aft-create-pipeline project, and I believe the customizations step function calls global-customizations. Can you double-check the error you're getting? If it is the same, you just need to hardcode the environment variable as a workaround, as @gabrielibagon suggested. For a full fix, follow the steps suggested in my prior post and see if it works.

It appears it was failing to schedule some of the pipelines because it had exceeded the number allowed to run at once. Apparently my cap is set to 20, not 25. But after getting through all of them, the CodeBuild projects have been updated to include the AWS_PARTITION variable, and the pipelines are using a working CodeBuild project.

@balltrev

We've got a separate issue to address the aft-create-pipeline concurrency throttling mentioned here: #223.

We do not recommend hard-coding the AWS_PARTITION variable on the customization CodeBuild jobs, as the account-specific customization pipelines should be updated as a side effect of the aft-invoke-customizations Step Function, which would resolve the missing AWS_PARTITION environment variable issue.

@abhishek-sorenson
Author

@balltrev, okay, we will give this a try. We recently tried upgrading AFT to the latest version using the most up-to-date repository for AFT, and we still encountered this error. Do we need to run the account-specific customizations pipelines to resolve this?

@andrewkruse

> @balltrev, okay, we will give this a try. We recently tried upgrading AFT to the latest version using the most up-to-date repository for AFT, and we still encountered this error. Do we need to run the account-specific customizations pipelines to resolve this?

@abhishek-sorenson After updating the AFT module via Terraform, I had to run aft-invoke-customizations for each individual account ID to make sure I didn't exceed the concurrency limit of 25. It ends up updating some CodePipeline pipelines and CodeBuild projects, and then the next pipeline kickoffs should work normally.
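For reference, a per-account invocation from the CLI might look roughly like this (the state machine ARN and account ID are placeholders, and the "accounts"/"target_value" payload shape is my understanding of the aft-invoke-customizations input, so verify it against your AFT version):

# Re-run customizations for a single account to stay under the concurrency limit
aws stepfunctions start-execution \
  --state-machine-arn "arn:aws:states:us-east-1:111111111111:stateMachine:aft-invoke-customizations" \
  --input '{"include": [{"type": "accounts", "target_value": ["222222222222"]}]}'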

@abhishek-sorenson
Author

@andrewkruse Okay got it, we will give that a try. Thanks!

@stumins

stumins commented Oct 3, 2022

I'm closing this issue as we've reached resolution on the original report - please track the pipeline creation concurrency issues via #223

@stumins stumins closed this as completed Oct 3, 2022
@abhishek-sorenson
Author

Hi,

This bug is still active; please do not close it.

What is the resolution for this bug?

@stumins

stumins commented Oct 3, 2022

Which bug are you referring to?

The pipeline creation concurrency issue has not been resolved but should be tracked via #223.

As mentioned above, the symptoms in this ticket stem from an improper partial upgrade of components due to a bug in AFT 1.3.5; later versions are not exposed to this issue. Customers should not directly hardcode or configure the AWS_PARTITION environment variable, but should instead upgrade to the latest AFT version and re-invoke the customization pipelines to ensure components are properly upgraded.
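A quick way to confirm which AFT version is actually deployed is to read the SSM parameter mentioned at the top of this issue (run against the AFT management account):

# Read the deployed AFT version
aws ssm get-parameter --name /aft/config/aft/version \
  --query 'Parameter.Value' --output text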
