Frequent ConcurrentModificationException on running SCP updates #175

@Cihl28

Describe the bug
We're using Control Tower with CfCT for deploying various SCPs to Organizations.
About 50% of the time, the pipeline fails. Example step output:
{
"errorMessage": "An error occurred (ConcurrentModificationException) when calling the EnablePolicyType operation: AWS Organizations can't complete your request because it conflicts with another attempt to modify the same entity. Try again later.",
"errorType": "ConcurrentModificationException",
"stackTrace": [
" File \"/var/task/state_machine_router.py\", line 218, in lambda_handler\n return service_control_policy(event, function_name)\n",
" File \"/var/task/state_machine_router.py\", line 113, in service_control_policy\n response = scp.enable_policy_type()\n",
" File \"/var/task/cfct/state_machine_handler.py\", line 1027, in enable_policy_type\n scp.enable_policy_type(root_id)\n",
" File \"/var/task/cfct/aws/services/scp.py\", line 127, in enable_policy_type\n self.org_client.enable_policy_type(\n",
" File \"/var/task/botocore/client.py\", line 391, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File \"/var/task/botocore/client.py\", line 719, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
]
}

To Reproduce
Run the pipeline. Fails frequently.

Expected behavior
SCPs deployed without errors.

Please complete the following information about the solution:

  • [ v2.6.0 ] Version: [e.g. v1.0.0]

To get the version of the solution, you can look at the description of the created CloudFormation stack. For example, "(SO0089) - customizations-for-aws-control-tower Solution. Version: v1.0.0". You can also find the version from releases

  • [ eu-central-1 ] Region: [e.g. us-east-1]
  • [ no ] Was the solution modified from the version published on this repository?
  • If the answer to the previous question was yes, are the changes available on GitHub?
  • [ yes ] Have you checked your service quotas for the services this solution uses?
  • [ no, in step function output ] Were there any errors in the CloudWatch Logs?

Additional context
Case number with AWS support: 14155855571
Response from engineer:

Hello!

Thanks for providing the error from the step function log output!

I researched this issue internally and found that the Control Tower service team is aware of it. The team will prioritize a fix in an upcoming release. The fix requires a code change to the Python file "scp.py", whose "enable_policy_type" function (lines 125-139) makes the "EnablePolicyType" API call [1][2]. The proposed change will handle the "ConcurrentModificationException" error better and retry the "EnablePolicyType" API call when the error is encountered.

As I understand it, the CfCT pipeline is currently designed to call the "EnablePolicyType" API to enable the service control policy type for the organization before creating each policy. Per the documentation, the "EnablePolicyType" API call enables a policy type in a root [2]. It's a one-time operation: after you enable a policy type in a root, you can attach policies of that type to the root or to any organizational unit (OU) or account in that root. This means you do not have to call this API every time you create a new SCP. Once the SCP policy type is enabled for the root of the organization, you can create SCPs and attach them to the root, OUs, or accounts.

As the documentation outlines, this is an "asynchronous" request that AWS performs in the background. AWS recommends first calling "ListRoots" to check the status of policy types for a specified root, and then calling "EnablePolicyType" only if the desired policy type (e.g. Service Control Policy) is not already enabled for that root.
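For anyone wanting to try the "ListRoots first" approach described above, here is a minimal sketch. It is not the CfCT code. It assumes `org_client` is a boto3 Organizations client (`boto3.client("organizations")`), and the helper names `policy_type_status` and `ensure_policy_type_enabled` are made up for illustration. Only `list_roots` and `enable_policy_type` are real API calls.

```python
def policy_type_status(org_client, root_id, policy_type="SERVICE_CONTROL_POLICY"):
    """Return the status of policy_type on the given root, or None if it is not listed.

    org_client is assumed to behave like boto3.client("organizations").
    """
    kwargs = {}
    while True:
        page = org_client.list_roots(**kwargs)
        for root in page.get("Roots", []):
            if root["Id"] == root_id:
                for entry in root.get("PolicyTypes", []):
                    if entry["Type"] == policy_type:
                        return entry["Status"]
                return None
        token = page.get("NextToken")
        if not token:
            return None
        kwargs = {"NextToken": token}


def ensure_policy_type_enabled(org_client, root_id, policy_type="SERVICE_CONTROL_POLICY"):
    """Call EnablePolicyType only when ListRoots says it is needed.

    Returns True if the enable call was made, False if it was skipped.
    """
    status = policy_type_status(org_client, root_id, policy_type)
    if status in ("ENABLED", "PENDING_ENABLE"):
        return False  # already enabled, or an earlier request is still in progress
    org_client.enable_policy_type(RootId=root_id, PolicyType=policy_type)
    return True
```

Note that this check-then-call sequence is not atomic. Two parallel executions can both see the type as not enabled and both call the API, so the check reduces the number of conflicting calls but does not remove the need to handle the exception.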

In your case, because multiple policies are being created in parallel, the "EnablePolicyType" API is called for each of them, which leads to concurrent actions on the same root. On some occasions, calling "EnablePolicyType" can also return the error "The specified policy type is already enabled.", which is expected because a policy type only needs to be enabled once. This exception is already handled in "scp.py" on lines 130-135.

However, "EnablePolicyType" is an asynchronous request, so it takes some time to process one request and return a success or error code. As a result, when another request is made for the same policy type at the same time, it can fail with "ConcurrentModificationException", which means that one request is already in progress and you should try again later.
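Until a fixed release is available, the retry behaviour described above could look roughly like the sketch below. This is an assumption about what such a fix might do, not the actual change planned for "scp.py". The function name, return values, attempt count, and delays are all illustrative. The error code is read from the `response` attribute that botocore's `ClientError` carries, and the two codes used are the Organizations error names `ConcurrentModificationException` and `PolicyTypeAlreadyEnabledException`.

```python
import random
import time

RETRYABLE_CODE = "ConcurrentModificationException"
ALREADY_ENABLED_CODE = "PolicyTypeAlreadyEnabledException"


def error_code(exc):
    """Extract the AWS error code from an exception shaped like botocore's ClientError."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return None
    return response.get("Error", {}).get("Code")


def enable_policy_type_with_retry(
    org_client,
    root_id,
    policy_type="SERVICE_CONTROL_POLICY",
    max_attempts=5,      # illustrative value
    base_delay=1.0,      # seconds, illustrative value
    sleep=time.sleep,    # injectable so the function can be tested without waiting
):
    """Call EnablePolicyType, retrying with exponential backoff on concurrent modification."""
    for attempt in range(1, max_attempts + 1):
        try:
            org_client.enable_policy_type(RootId=root_id, PolicyType=policy_type)
            return "enabled"
        except Exception as exc:
            code = error_code(exc)
            if code == ALREADY_ENABLED_CODE:
                return "already_enabled"  # another caller got there first
            if code != RETRYABLE_CODE or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))  # jitter spreads out parallel retries
```

The jitter matters here because the parallel branches hit the conflict at the same moment. Without it they would all retry at the same moment too.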

Overall, the internal team plans to resolve this issue in their next release, which is targeted for the end of the year. In the meantime, they recommend retrying the pipeline stage when it fails. You can watch the AWS Control Tower GitHub Releases page for the version that includes the change [3]. Please feel free to raise this concern on the GitHub Issues page so you can track the issue publicly as well [4].

I hope the above provides some valuable information. I'm located in Seattle, WA and am available Mon.-Fri. (9:00AM-6:00PM PST). If you have any additional questions or concerns, please feel free to contact us again and we would be happy to help.

Thanks again!
