
reopen #192 #213

Merged: 9 commits, Jun 7, 2024

Conversation

finchr
Contributor

@finchr finchr commented May 21, 2024

what
I implemented create_before_destroy on the aws_rds_cluster_instance default instances.
Originally in #192 but that was closed for reasons we won't go into here.

why
Making a change to any parameter that triggers a replace on an aws_rds_cluster_instance results in all instances being destroyed before a new instance is created, which causes an outage. This is a faster (and safer) alternative to #191.

references
This closes #190 and is an alternative to #191
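For context, the change amounts to adding a lifecycle block to the instance resource. A minimal sketch (attribute names and values here are illustrative, not the module's actual defaults):

```hcl
resource "aws_rds_cluster_instance" "default" {
  count              = local.enabled ? var.cluster_size : 0
  cluster_identifier = aws_rds_cluster.default[0].id
  instance_class     = var.instance_type
  engine             = var.engine

  # Create the replacement instance before destroying the old one,
  # so the cluster always has at least one instance serving traffic.
  lifecycle {
    create_before_destroy = true
  }
}
```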

@finchr finchr requested review from a team as code owners May 21, 2024 19:40
@mergify mergify bot added the triage Needs triage label May 21, 2024
@joe-niland joe-niland added the minor New features that do not break anything label May 22, 2024
@joe-niland
Sponsor Member

joe-niland commented May 22, 2024

These changes were released in v1.10.0.

@osterman
Member

Sadly, it looks like this triggers a Terraform bug. Have you run the terratest tests locally?

[screenshot of the Terraform error output]

@osterman
Member

/terratest

@finchr finchr force-pushed the finchr_create_before_destroy branch from a8f8eaa to f16cb42 on May 22, 2024 at 22:22
@finchr
Contributor Author

finchr commented May 22, 2024

Hi @osterman, I found the issue and did another push. The random_pet did not handle the enabled == false condition.

@osterman
Member

/terratest

@osterman
Member

osterman commented Jun 4, 2024

/terratest

@Benbentwo
Member

It looks like the tests are now failing on the "disabled (enabled = false) should not create any resources" check. I believe this should be fixed by:

resource "random_pet" "..." {
  count = local.enabled ? 1 : 0
  ...
}

Note that you then need to update references to the random pet to use an array, such as:

one(random_pet.instance[*].keepers.instance_class)
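Putting the two pieces of the suggestion together, a sketch of the count-gated random_pet plus the splat reference might look like the following (resource and variable names are assumed for illustration, not necessarily the module's actual ones):

```hcl
resource "random_pet" "instance" {
  # Create nothing when the module is disabled.
  count = local.enabled ? 1 : 0

  # Changing any keeper forces a new pet name, which in turn
  # forces replacement of the cluster instance.
  keepers = {
    instance_class = var.instance_type
  }
}

resource "aws_rds_cluster_instance" "default" {
  count = local.enabled ? var.cluster_size : 0

  # one() yields the single element when enabled, and null when the
  # pet list is empty (the disabled case), so no index errors occur.
  instance_class = one(random_pet.instance[*].keepers.instance_class)

  lifecycle {
    create_before_destroy = true
  }
}
```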

@finchr
Contributor Author

finchr commented Jun 4, 2024

> It looks like the tests are now failing on the "disabled (enabled = false) should not create any resources" check. I believe this should be fixed by:
>
> resource "random_pet" "..." {
>   count = local.enabled ? 1 : 0
>   ...
> }
>
> Note that you then need to update references to the random pet to use an array, such as:
>
> one(random_pet.instance[*].keepers.instance_class)

Hi @Benbentwo, that makes sense and is better than the hack I had in there. I just pushed a fix for this.

@GabisCampana

@Benbentwo @osterman this workflow is awaiting approval: https://github.com/cloudposse/terraform-aws-rds-cluster/actions/runs/9372514860

@Benbentwo
Member

/terratest

@Benbentwo
Member

/terratest

@Benbentwo Benbentwo enabled auto-merge (squash) June 6, 2024 23:16
main.tf: review comment resolved
@Benbentwo Benbentwo merged commit 54be61f into cloudposse:main Jun 7, 2024
21 checks passed
@mergify mergify bot removed the triage Needs triage label Jun 7, 2024
@morremeyer

Hey everyone, with the upgrade to 1.10.0, terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.

Is this intentional?

I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

@syphernl
Contributor

> Hey everyone, with the upgrade to 1.10.0, terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.
>
> Is this intentional?
>
> I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

Based on @finchr's comment in #192 (comment), it should only replace the DB instances.
It does feel a bit scary, though, and a remark about this in the changelog would probably have been a good idea 😅

I ran this update on one of our test envs and what I see that happens is:

  • New instance gets created (as Reader Instance)
  • Existing instance (Writer Instance) is being kept during that process
  • The old instance is deleted
  • The new instance is promoted to Writer Instance (this happens during the delete process)

It looked a bit scary, as the AWS console stated it was deleting the "Writer Instance" before the new one had been promoted to writer, so it appears this could potentially cause brief downtime for writes.

@kevcube
Contributor

kevcube commented Jun 10, 2024

> It does feel a bit scary and a remark about this in the changelog would've probably been a good idea 😅

@syphernl @Benbentwo agreed; while this isn't a breaking change, some sort of "WARNING:" line in the changelog would be a good addition in the future.

But running this fully hands-off in our dev environment, we did not see any interruption in our application's connection to the database. So good work @finchr!

@finchr
Contributor Author

finchr commented Jun 11, 2024

> Hey everyone, with the upgrade to 1.10.0, terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.
>
> Is this intentional?
>
> I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

Hi @morremeyer, the intent was to be able to update a cluster with near-zero downtime. We ran into several scenarios where Terraform was replacing instances anyway, without the benefit of create_before_destroy. Plus, this is a lot faster than in-place updates of existing nodes.

Labels
minor New features that do not break anything
Development

Successfully merging this pull request may close these issues.

Implement rolling update for instances.
8 participants