WaitUntil* functions don't fail early if AccessDenied #4983

Closed
dannyrandall opened this issue Sep 12, 2023 · 3 comments

dannyrandall commented Sep 12, 2023

Describe the bug

When using the cloudformation.WaitUntilChangeSetCreateCompleteWithContext function with an IAM role that has an explicit DENY for cloudformation:DescribeChangeSet, I expected that the function would return early (instead of waiting until MaxAttempts) with an error like:

AccessDenied: User: arn:aws:sts::12345:myRole is not authorized to perform: cloudformation:DescribeChangeSet on resource: arn:aws:cloudformation:us-west-2:12345:stack/my-stack/change-set-id with an explicit deny in an identity-based policy

However, an AccessDenied error doesn't match any of the defined Acceptors for that API. Because of this, the function loops up to the defined MaxAttempts, with Delay between each call before the error is finally returned.

Usage example here: https://github.com/aws/copilot-cli/blob/dc6d9535b9fb1cd5cc87975f158b980b85fdd88f/internal/pkg/aws/cloudformation/changeset.go#L119-L124

Expected Behavior

I expected that an AccessDenied error would be returned to the caller immediately upon receiving that response.

Current Behavior

The request.Waiter makes the request repeatedly for an hour (by default), getting the same error every time.
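
(As a stopgap only, not a fix: the hour comes from the waiter's default Delay/MaxAttempts, which a caller can override per call with the request.WaiterOption helpers, roughly like the sketch below. The function name is illustrative, and this only bounds the wait; it still doesn't fail fast on AccessDenied.)

// Stopgap sketch: bound the wait by overriding the waiter's Delay/MaxAttempts.
// Assumes imports of "time", "github.com/aws/aws-sdk-go/aws",
// "github.com/aws/aws-sdk-go/aws/request", and
// "github.com/aws/aws-sdk-go/service/cloudformation".
func waitBriefly(ctx aws.Context, cfn *cloudformation.CloudFormation, in *cloudformation.DescribeChangeSetInput) error {
	return cfn.WaitUntilChangeSetCreateCompleteWithContext(ctx, in,
		request.WithWaiterDelay(request.ConstantWaiterDelay(5*time.Second)), // poll every 5s instead of the default
		request.WithWaiterMaxAttempts(10),                                   // stop after 10 attempts instead of the default
	)
}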

Reproduction Steps

Using credentials for an IAM role with the policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "cloudformation:DescribeChangeSet"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
} 

Code:

package main
import (
	"context"
	"log"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

func main() {
	ctx := context.Background()
	cfn := cloudformation.New(session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-west-2"),
	})))

	err := cfn.WaitUntilChangeSetCreateCompleteWithContext(ctx, &cloudformation.DescribeChangeSetInput{
		ChangeSetName: aws.String("arn:aws:cloudformation:us-west-2:12345:stack/stack-name/change-set-id"), // this is invalid input (not a change set), but irrelevant to the example
	})
	log.Printf("WaitUntilChangeSetCreateComplete error: %s", err)
}

Possible Solution

There should be an acceptor to catch 403 responses defined here:

"acceptors": [
  {
    "argument": "Status",
    "expected": "CREATE_COMPLETE",
    "matcher": "path",
    "state": "success"
  },
  {
    "argument": "Status",
    "expected": "FAILED",
    "matcher": "path",
    "state": "failure"
  },
  {
    "expected": "ValidationError",
    "matcher": "error",
    "state": "failure"
  }
]

For example, testing with this worked for me:

{
  "matcher": "status",
  "expected": 403,
  "state": "failure"
},

Additional Information/Context

After glancing at other WaitUntil* functions, I see this issue extends to them as well.

SDK version used

v1

Environment details (Version of Go (go version)? OS name and version, etc.)

go version go1.21.1 darwin/amd64

@dannyrandall dannyrandall added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 12, 2023
@RanVaknin RanVaknin self-assigned this Sep 18, 2023
@RanVaknin RanVaknin removed the needs-triage This issue or PR still needs to be triaged. label Sep 18, 2023
@RanVaknin
Contributor

Hi @dannyrandall,

The C2J API model you linked indeed specifies all the data the SDK's code generator needs for creating conditions and values for each waiter.

Since all of the AWS SDKs' code is auto-generated from these API model files, the SDK team can't make direct changes to them. This data is defined upstream by the respective AWS services themselves (in this case, CloudFormation), not at the SDK level.

Regarding terminating the waiter on a 403 error, there are some counterarguments to consider. Although 403 errors are generally not retryable, in distributed systems there can be valid use cases for forcing retries on 403s, mainly around propagation delays during IAM role creation or credential acquisition. These delays can lead to transient conditions where the system might actually benefit from retrying a 403 error.

For example, in an EKS environment, a pod might be in the process of scaling down or restarting. During this transient state, the SDK may not have the correct data to make a successful request, resulting in a 403. In such cases, automatically terminating the waiter on a 403 error could be counterproductive, as the error may resolve itself once the pod is back to a working state.

FWIW, I think you are making a reasonable argument; however, this should be raised with the CloudFormation service team rather than the SDK team. You can use your AWS console to file a support ticket and raise this issue internally with them.

In the meantime, your best option would be to handwrite your own waiter (you can use the generated one as a template) and add logic to abort on 403s, roughly as sketched below.
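
A minimal sketch of such a hand-written waiter, modeled on the generated WaitUntilChangeSetCreateCompleteWithContext (the function name and the extra 403 acceptor are illustrative additions, not SDK code):

// Hand-rolled copy of the generated change-set waiter with one extra acceptor
// that treats an HTTP 403 as a terminal failure instead of retrying.
// Assumes the same imports as the reproduction plus "time" and
// "github.com/aws/aws-sdk-go/aws/request".
func waitUntilChangeSetCreateCompleteAbortOn403(ctx aws.Context, cfn *cloudformation.CloudFormation, input *cloudformation.DescribeChangeSetInput) error {
	w := request.Waiter{
		Name:        "WaitUntilChangeSetCreateComplete",
		MaxAttempts: 120,
		Delay:       request.ConstantWaiterDelay(30 * time.Second),
		Acceptors: []request.WaiterAcceptor{
			{State: request.SuccessWaiterState, Matcher: request.PathWaiterMatch, Argument: "Status", Expected: "CREATE_COMPLETE"},
			{State: request.FailureWaiterState, Matcher: request.PathWaiterMatch, Argument: "Status", Expected: "FAILED"},
			{State: request.FailureWaiterState, Matcher: request.ErrorWaiterMatch, Expected: "ValidationError"},
			// The added acceptor: a 403 response aborts the wait immediately.
			{State: request.FailureWaiterState, Matcher: request.StatusWaiterMatch, Expected: 403},
		},
		NewRequest: func(opts []request.Option) (*request.Request, error) {
			req, _ := cfn.DescribeChangeSetRequest(input)
			req.SetContext(ctx)
			req.ApplyOptions(opts...)
			return req, nil
		},
	}
	return w.WaitWithContext(ctx)
}

With that acceptor in place, the waiter returns as soon as the AccessDenied response comes back instead of looping up to MaxAttempts.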

Thanks again,
Ran~

@RanVaknin RanVaknin closed this as not planned (won't fix, can't repro, duplicate, stale) Sep 18, 2023
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@dannyrandall
Author

@RanVaknin thanks for the response! Makes sense that since this logic is shared across all the SDKs, the issue belongs with the CloudFormation team. I'll open a ticket from the AWS Console with them!😊
