[aws-rds] Minimize downtime during DBCluster updates #10595

Closed
1 of 2 tasks
hixi-hyi opened this issue Sep 29, 2020 · 10 comments · Fixed by #20054

Labels
  • @aws-cdk/aws-rds: Related to Amazon Relational Database
  • effort/large: Large work item – several weeks of effort
  • feature-request: A feature should be added or improved.
  • p2

Comments

@hixi-hyi
Contributor

hixi-hyi commented Sep 29, 2020

Minimize downtime during DB Cluster updates

Current Status

The CfnDBInstance resources of a DBCluster are currently loosely coupled.
That is, if there are multiple CfnDBInstances, they are all updated at the same time because CloudFormation sees no dependency between them.
Therefore, the cluster is unavailable until the DBInstance updates are complete.

Proposal

Add a dependency between the CfnDBInstances.
As a result, the instances are updated one by one (a rolling update), and the only downtime is the primary failover.
In other words, with two instances, an update costs only two failovers. (A)
If we can create the dependency dynamically, an update costs only one failover. (B)

I think a primary failover is faster than an instance update, so it would be useful to include this feature.
However, the stack update time and the maintenance time for offline updates will increase.

What do you think about this proposal?
I'd like to hear your opinion.

Proposal Solution (A)

Existing code:

instance.node.addDependency(internetConnected);

Next to it, add instance.node.addDependency(previous_instance);
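
A minimal sketch of how proposal (A) could chain the instances. It uses a simplified stand-in class rather than the real CDK constructs: InstanceSketch, chainInstances, and the instance IDs are hypothetical names for illustration only. In the real code the dependency would be added with instance.node.addDependency(...) as quoted above.

```typescript
// Hypothetical stand-in for a construct that can declare dependencies.
// It is NOT the CDK CfnDBInstance; it only records what it depends on.
class InstanceSketch {
  readonly id: string;
  readonly dependsOn: string[] = [];

  constructor(id: string) {
    this.id = id;
  }

  addDependency(other: InstanceSketch): void {
    this.dependsOn.push(other.id);
  }
}

// Create the instances in order and make each one depend on the previous one,
// so an engine that honors the dependencies updates them one at a time.
function chainInstances(ids: string[]): InstanceSketch[] {
  const instances: InstanceSketch[] = [];
  let previous: InstanceSketch | undefined;
  for (const id of ids) {
    const instance = new InstanceSketch(id);
    if (previous !== undefined) {
      instance.addDependency(previous);
    }
    instances.push(instance);
    previous = instance;
  }
  return instances;
}

const chained = chainInstances(["Instance1", "Instance2", "Instance3"]);
for (const instance of chained) {
  console.log(`${instance.id} depends on [${instance.dependsOn.join(", ")}]`);
}
```

The first instance has no dependency, so it is free to update first; every later instance waits for the one before it.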

Proposal Solution (B)

I think we need to use aws-sdk to determine if the current Instance is primary or replica, but I haven't thought about it in detail.

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

@hixi-hyi hixi-hyi added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Sep 29, 2020
@github-actions github-actions bot added the @aws-cdk/aws-rds Related to Amazon Relational Database label Sep 29, 2020
@hixi-hyi
Contributor Author

hixi-hyi commented Sep 29, 2020

p.s. I thought it might be a good idea to create an instanceUpdateBehavior argument and change the behavior accordingly. (ROLLING, BULK)

@skinny85
Contributor

skinny85 commented Dec 5, 2020

Hey @hixi-hyi,

thanks for opening the issue. This is a very interesting proposal. Pinging @jogold as well, for visibility.

> p.s. I thought it might be a good idea to create an instanceUpdateBehavior argument and change the behavior accordingly. (ROLLING, BULK)

Where does this instanceUpdateBehavior live? On the Cluster itself, or somewhere else?

@skinny85 skinny85 added effort/large Large work item – several weeks of effort p2 and removed needs-triage This issue or PR still needs to be triaged. labels Dec 5, 2020
@hixi-hyi
Contributor Author

hixi-hyi commented Mar 8, 2021

@skinny85

> Where does this instanceUpdateBehavior live? On the Cluster itself, or somewhere else?

I think the best place to define it is

interface DatabaseClusterBaseProps {

The developer sets this attribute when creating a DatabaseCluster.

@skinny85
Contributor

skinny85 commented Mar 8, 2021

@hixi-hyi I think I see where you're going with this. So we would add a property to DatabaseClusterProps, called something like instanceUpdateBehavior, whose type would be an enum with 2 members, with names like BULK (the current behavior, and so the default) and ROLLING (update the instances one-by-one by adding dependencies between them)?

Did I understand your suggestion correctly?
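
A rough sketch of the property described above, under stated assumptions. The two-member type is modelled here as a const object with a union type instead of a TypeScript enum, and ClusterPropsSketch, planDependencies, and the instance IDs are hypothetical names, not taken from the CDK source. BULK is the default and adds no dependencies; ROLLING adds one dependency per consecutive pair of instances.

```typescript
// Hypothetical names modelled on the discussion; not the CDK API.
const InstanceUpdateBehavior = {
  BULK: "BULK", // update all instances at once (current behavior, default)
  ROLLING: "ROLLING", // update instances one by one
} as const;
type InstanceUpdateBehavior =
  (typeof InstanceUpdateBehavior)[keyof typeof InstanceUpdateBehavior];

// Simplified stand-in for the cluster props under discussion.
interface ClusterPropsSketch {
  instanceIds: string[];
  instanceUpdateBehavior?: InstanceUpdateBehavior;
}

// Returns [dependent, dependency] pairs that would become DependsOn entries.
function planDependencies(props: ClusterPropsSketch): Array<[string, string]> {
  const behavior = props.instanceUpdateBehavior ?? InstanceUpdateBehavior.BULK;
  const edges: Array<[string, string]> = [];
  let previous: string | undefined;
  for (const id of props.instanceIds) {
    if (behavior === InstanceUpdateBehavior.ROLLING && previous !== undefined) {
      edges.push([id, previous]);
    }
    previous = id;
  }
  return edges;
}

// Omitting the property keeps the existing bulk behavior.
console.log(planDependencies({ instanceIds: ["a", "b", "c"] }));
console.log(
  planDependencies({
    instanceIds: ["a", "b", "c"],
    instanceUpdateBehavior: InstanceUpdateBehavior.ROLLING,
  }),
);
```

Making the property optional with BULK as the fallback keeps existing stacks unchanged unless the developer opts in to ROLLING.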

@hixi-hyi
Contributor Author

@skinny85 Yes, you understood exactly what I mean.

@skinny85
Contributor

I'm glad @hixi-hyi 🙂.

Any chance of opening us a PR implementing this? Should only require adding a property here (or perhaps InstanceProps are a better place for it...?), and adding the DependsOn somewhere here.

Here's our Contributing guide: https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md.

Thanks,
Adam

spanierm42 added a commit to spanierm42/aws-cdk that referenced this issue Apr 23, 2022
Support defining the instance update behaviour of RDS instances. This allows switching between bulk updates (all instances at once) and rolling updates (one instance after another). While bulk updates are faster, they carry a higher risk of longer downtime, as all instances might be unreachable simultaneously during the update. Rolling updates take longer but ensure that only one instance is updated at a time, so downtime is limited to the (at most two) changes of the primary instance.

We keep the current behaviour, namely a bulk update, as the default.

This implementation follows proposal A by hixi-hyi in issue aws#10595.

Fixes aws#10595
@spanierm42
Contributor

@skinny85 I added a PR for this issue some weeks ago. How long does it usually take for the CDK maintainers to provide feedback on it? Is there anything I can do to speed up the process?

@skinny85
Contributor

@mod-enter apologies for the bad experience! Unfortunately, I'm no longer with the CDK team, so I can't review your Pull Request.

Perhaps @TheRealAmazonKendra can help with this one?

@spanierm42
Contributor

No worries, thanks for helping me anyway :)

spanierm42 added a commit to spanierm42/aws-cdk that referenced this issue Jul 12, 2022
@mergify mergify bot closed this as completed in #20054 Jul 12, 2022
mergify bot pushed a commit that referenced this issue Jul 12, 2022
Fixes #10595
