Invalid ParameterGroup takes 45 minutes to fail #1014

Closed
pplu opened this issue Dec 21, 2021 · 2 comments

Comments


pplu commented Dec 21, 2021

Name of the resource

AWS::RDS::DBClusterParameterGroup

Resource Name

No response

Issue Description

When creating an invalid ParameterGroup, CloudFormation takes about 45 minutes to report an error:

Timestamp                     LogicalId                   Status              StatusReason
2021-12-20 23:30:07 UTC+0100  CustomersClusterParamGroup  CREATE_FAILED       An internal error has occurred. Please try your query again at a later time. (Service: AmazonRDS; Status Code: 500; Error Code: InternalFailure; Request ID: REDACTED; Proxy: null)
2021-12-20 22:46:09 UTC+0100  CustomersClusterParamGroup  CREATE_IN_PROGRESS  Resource creation initiated
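The two event timestamps put the failure just under 44 minutes after creation started. A quick check with Python's standard library, using the UTC+0100 offset shown in the log:

```python
from datetime import datetime, timedelta, timezone

# Both CloudFormation events from the log above, recorded in UTC+01:00.
CET = timezone(timedelta(hours=1))
FMT = "%Y-%m-%d %H:%M:%S"

started = datetime.strptime("2021-12-20 22:46:09", FMT).replace(tzinfo=CET)  # CREATE_IN_PROGRESS
failed = datetime.strptime("2021-12-20 23:30:07", FMT).replace(tzinfo=CET)   # CREATE_FAILED

elapsed = failed - started
minutes, seconds = divmod(int(elapsed.total_seconds()), 60)
print(f"{minutes} min {seconds} s")  # 43 min 58 s
```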

Expected Behavior

CloudFormation should fail immediately if the ParameterGroup is invalid, with an error that clearly states the ParameterGroup is invalid.

Observed Behavior

It looks like CloudFormation is retrying because the RDS API returns HTTP 500s when setting the secure_auth parameter. I'm not sure whether other parameters are affected, or whether non-cluster parameter groups are affected too.
I've been bitten by this bug multiple times and have had to wait a long time just to get the stack rolled back, when the rollback could have happened almost immediately.

Also, the error message in CloudFormation encourages you to take the wrong path (resubmitting the template), which adds to the frustration of the long wait. Narrowing the failure down to the parameter that causes it is hard work.

This might be an undesired interaction between CloudFormation and the RDS API (it seems strange that the RDS API returns a 500 error and tells you to try again later).

You can reproduce this from the CLI with:

aws rds --region eu-west-1 modify-db-cluster-parameter-group --db-cluster-parameter-group-name test --parameters ParameterName=secure_auth,ParameterValue=1,ApplyMethod=pending-reboot

An error occurred (InternalFailure) when calling the ModifyDBClusterParameterGroup operation (reached max retries: 4): An internal error has occurred. Please try your query again at a later time.
[
  {
    "ApplyMethod": "pending-reboot",
    "Description": "Blocks connections from all accounts that have passwords stored in the old (pre-4.1) format.",
    "DataType": "boolean",
    "IsModifiable": true,
    "AllowedValues": "1",
    "SupportedEngineModes": [
      "provisioned"
    ],
    "Source": "engine-default",
    "ParameterValue": "1",
    "ParameterName": "secure_auth",
    "ApplyType": "dynamic"
  }
]
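One way to read this output: the metadata marks secure_auth as modifiable with an allowed value of "1", so by the API's own description the request looks valid. A minimal sketch of that check against the JSON above, trimmed to the fields it uses. The helper name `value_looks_valid` is hypothetical, and it assumes AllowedValues is a comma-separated list (ranges such as "0-1" are not handled).

```python
import json

# Parameter metadata for secure_auth, copied (trimmed) from the output above.
METADATA = json.loads("""
[
  {
    "ParameterName": "secure_auth",
    "ParameterValue": "1",
    "AllowedValues": "1",
    "IsModifiable": true,
    "DataType": "boolean",
    "SupportedEngineModes": ["provisioned"]
  }
]
""")


def value_looks_valid(params, name, value):
    """Hypothetical helper: True if `name` is modifiable and `value` is listed in AllowedValues.

    Assumes AllowedValues is a comma-separated list; range syntax is not parsed.
    """
    for p in params:
        if p["ParameterName"] != name:
            continue
        allowed = [v.strip() for v in p.get("AllowedValues", "").split(",")]
        return p["IsModifiable"] and value in allowed
    return False  # parameter not present in the metadata


print(value_looks_valid(METADATA, "secure_auth", "1"))  # True
```

Under this reading, a client-side check like this would not have flagged the template, which fits the failure only surfacing as a server-side InternalFailure.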

Test Cases

AWSTemplateFormatVersion: "2010-09-09"
Description: 'ParamError'
Resources:
  CustomersClusterParamGroup:
    Type: AWS::RDS::DBClusterParameterGroup
    Properties:
      Description: 'Invalid Parameter Group handling'
      Family: aurora-mysql5.7
      Parameters: 
        secure_auth: 1

This template alone is enough to trigger the error.

Other Details

No response


osdrv commented Sep 21, 2022

Thanks very much for this report @pplu. I know it's been a while. We are looking into this issue.

cfn-github-issues-bot moved this from We're working on it to Researching in coverage-roadmap Oct 7, 2022
cfn-github-issues-bot moved this from Researching to We're working on it in coverage-roadmap Feb 21, 2023
cfn-github-issues-bot moved this from We're working on it to Coming Soon in coverage-roadmap Feb 21, 2023

osdrv commented Feb 28, 2023

The issue has been fixed. It was taking so long because the RDS API was returning an internal error, which makes CFN retry the call a few times before giving up. The server-side issue has been addressed.
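The mechanism described here, where a 500 is treated as transient and retried while a validation error would fail right away, can be sketched as a capped exponential-backoff loop. The policy below (the retryable status set, `max_attempts`, `base_delay`, `max_delay`) is hypothetical and is not CloudFormation's or the AWS SDK's actual configuration; it only shows why an error reported as a server fault gets retried instead of surfaced.

```python
# Hypothetical retry policy for illustration only; CloudFormation's real
# retry count and delays are not stated in this issue.
RETRYABLE_STATUS = {500, 502, 503, 504}


def should_retry(status_code: int) -> bool:
    """In this sketch, server-side (5xx) errors are transient; client (4xx) errors fail fast."""
    return status_code in RETRYABLE_STATUS


def call_with_retries(call, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=lambda seconds: None):
    """Invoke `call()` (which returns an HTTP status) until success, a non-retryable error,
    or the attempt limit. Returns (final_status, total_delay_seconds).

    Delays use capped exponential backoff without jitter so results are deterministic.
    `sleep` is a no-op by default so the sketch runs instantly; pass time.sleep to wait for real.
    """
    total_delay = 0.0
    status = None
    for attempt in range(max_attempts):
        status = call()
        if status < 400 or not should_retry(status):
            return status, total_delay
        if attempt < max_attempts - 1:  # no wait after the final attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay)
            total_delay += delay
    return status, total_delay


# A persistent InternalFailure (500) uses up every attempt...
print(call_with_retries(lambda: 500))  # (500, 15.0)
# ...while a validation error (400) is reported immediately.
print(call_with_retries(lambda: 400))  # (400, 0.0)
```

With longer per-attempt timeouts or larger delays, the same structure is how a misclassified error can stretch a failure out over many minutes.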

cfn-github-issues-bot moved this from Coming Soon to Shipped in coverage-roadmap Feb 28, 2023
Projects: coverage-roadmap (Shipped)
3 participants