(rds): AWS::RDS::DBInstance should not have EngineVersion property set for Aurora clusters #21758
Comments
It seems like this issue impacts a significant number of customers, and I've tagged it as P1, which means it should be on our near-term roadmap. We welcome community contributions! If you are able, we encourage you to [contribute](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md) a bug fix or new feature to the CDK. If you decide to contribute, please start an engineering discussion in this issue to ensure there is a commonly understood design before submitting code. This will minimize the number of review cycles and get your code merged faster.

Thanks all for raising this issue. We are hitting this in production as well.
IMPORTANT: please note that I had an issue in the workaround I previously posted above. I've edited the description so the workaround there is now correct. The corrected workaround:

```ts
const cfnInstances = cluster.node.children.filter((child) => child instanceof CfnDBInstance);
if (cfnInstances.length === 0) {
  throw new Error("Couldn't pull CfnDBInstances from the L1 constructs!");
}
cfnInstances.forEach((cfnInstance) => delete (cfnInstance as CfnDBInstance).engineVersion);
```
We are observing the same bug even when upgrading between minor versions (#21899). I wonder if an Aspect would be a better fit for the workaround.
Thanks @rittneje, that behavior you saw makes sense. I updated the ticket to remove the language about minor versions being OK.
@corymhall Not sure if any bug has been filed against CloudFormation, but it should have rejected templates with an `AWS::RDS::DBInstance` that has both `EngineVersion` and `DBClusterIdentifier` set.
@corymhall I moved my PR over to this issue. Can you give the PR a look, please?
…s that were part of a DBCluster (#22185)

Engine version should not be set on instances that are part of a cluster. The cluster is responsible for this setting and throws an API error when an update is attempted on them.

closes #21758 #22180

----

### All Submissions:
* [x] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:
* [x] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features
* [x] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)?
* [x] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Is it correct that the only way out of this state is to delete the stack and the database? Or can the "skip resources" technique be used to complete the rollback into an `UPDATE_ROLLBACK_COMPLETE` state?

I believe you can resolve this by completing the rollback while skipping resources in the UI. Then deploy a new template with the cluster version set to whatever it actually is in RDS right now, and `EngineVersion` removed from each DBInstance. I highly recommend testing this against some test stack first.
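The same "complete the rollback, skipping the stuck resources" step can also be driven from the AWS CLI instead of the console. A sketch follows; the stack name and logical resource IDs are placeholders, so substitute the ones from your own template.

```shell
# Complete the stuck rollback, skipping the DB instance resources that
# cannot be rolled back (their logical IDs come from *your* template).
aws cloudformation continue-update-rollback \
  --stack-name MyDatabaseStack \
  --resources-to-skip ClusterInstance1 ClusterInstance2

# Wait until the stack reaches UPDATE_ROLLBACK_COMPLETE before redeploying.
aws cloudformation wait stack-rollback-complete --stack-name MyDatabaseStack
```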
Okay, thanks. I was able to recover, starting at the bad state of `UPDATE_ROLLBACK_FAILED` caused by the problem detailed in this issue. Some of the text in these steps might be different because I no longer have access to that state. The basic steps are also outlined here:
That seems to have worked, and the CloudFormation stack is back in the correct state. I then tried removing the patch code and redeploying using CDK version 2.61.1, but it failed with the original error, and the diff shows an attempt to add `EngineVersion` to both instances.
Describe the bug
Currently, when upgrading RDS Aurora between versions, a runtime error occurs that leaves the CloudFormation stack in an unrecoverable `UPDATE_ROLLBACK_FAILED` state. This happens because CDK sets the `EngineVersion` property on `AWS::RDS::DBInstance`, which the CloudFormation documentation states you should not set when using an Aurora cluster. When upgrading between versions that require downtime for the upgrade, it causes this error:

Here's a test stack with the event log:
The worst part about this bug is that the database stack is left in the unrecoverable `UPDATE_ROLLBACK_FAILED` state, which means that you either have to:

a) Attempt to complete the rollback. This is impossible because the cluster actually upgrades even though the CloudFormation steps fail, and a rollback from the new target major version to the old version is not allowed by RDS.

b) Delete the stack. This is obviously not ideal, because databases should not be deleted in most cases. Worse, you can't update the `DeletionPolicy` to try to retain the database while deleting the stack.

Expected Behavior
I expected to be able to update a DatabaseCluster between supported major versions via the `engine` property on `DatabaseCluster`.

Current Behavior
As mentioned above, if you try to upgrade an Aurora cluster between major versions, you'll encounter this error, which leaves the stack in an unrecoverable state.
Reproduction Steps

1. `cdk deploy` the stack above
2. Bump the cluster's engine version to a new major version
3. `cdk deploy` again
4. Observe: the stack update will fail with the aforementioned error, leaving the database stack in an unrecoverable state.
Possible Solution
Short Term Workaround
As a short-term solution, users can reach into the L1 construct to remove the `EngineVersion` property.

I've tested removing the `EngineVersion` property from an existing CloudFormation stack/DBInstance. It shows the following diff:

And the change applied with (what appears to be) no changes to the actual instances:

I verified in the RDS console that the instance didn't shut down or have any other events during this `cdk deploy`.

So, I believe it should be safe for all Aurora DatabaseCluster users to use this workaround without downtime. However, I've only tested this on my test cluster, so your mileage may vary.

Note that this should only be done for Aurora clusters, per the `AWS::RDS::DBInstance` CFN documentation.
CFN documentationLonger Term Fix
CDK should detect when the
engine
passed is an Aurora RDS engine. In this case, it should not set theAWS::RDS::DBInstance
EngineVersion
propertyAdditional Information/Context
I confirmed this issue by testing in my AWS accounts. I also have an internal ticket (case #10630169951) opened with AWS Support about this issue. In addition to a CDK fix, it seems this could be handled more elegantly on the CloudFormation side.
CDK CLI Version
2.38.1 (build a5ced21)
Framework Version
No response
Node.js Version
16 LTS
OS
MacOS
Language
Typescript
Language Version
No response
Other information
Because of the unrecoverable state in which the DatabaseCluster stack is left, I'd highly recommend this being a P1 bug.