
reopen #192 #213

Merged: 9 commits, Jun 7, 2024

Conversation

finchr
Contributor

@finchr finchr commented May 21, 2024

what
I implemented create_before_destroy on the aws_rds_cluster_instance default instances.
Originally in #192 but that was closed for reasons we won't go into here.

why
Making a change to any parameter that triggers a replace on an aws_rds_cluster_instance results in all instances being destroyed before a new instance is created, which causes an outage. This is a faster (and safer) alternative to #191.

references
This closes #190 and is an alternative to #191
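For context, the change amounts to adding a lifecycle block to the instance resource. A minimal sketch (attribute names and values here are illustrative, not the module's actual defaults):

```hcl
resource "aws_rds_cluster_instance" "default" {
  count              = local.enabled ? var.cluster_size : 0
  cluster_identifier = aws_rds_cluster.default[0].id
  instance_class     = var.instance_type
  engine             = var.engine

  # Create the replacement instance before destroying the old one,
  # so the cluster always has at least one instance serving traffic.
  lifecycle {
    create_before_destroy = true
  }
}
```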

@finchr finchr requested review from a team as code owners May 21, 2024 19:40
@mergify mergify bot added the triage Needs triage label May 21, 2024
@joe-niland joe-niland added the minor New features that do not break anything label May 22, 2024
@joe-niland
Sponsor Member

joe-niland commented May 22, 2024

These changes were released in v1.10.0.

@osterman
Member

Sadly, it looks like this triggers a Terraform bug. Have you run the terratest tests locally?

[screenshot of the Terraform error output]

@osterman
Member

/terratest

@finchr finchr force-pushed the finchr_create_before_destroy branch from a8f8eaa to f16cb42 on May 22, 2024 at 22:22
@finchr
Contributor Author

finchr commented May 22, 2024

Hi @osterman, I found the issue and did another push. The random_pet did not handle the enabled == false condition.

@osterman
Member

/terratest

@osterman
Member

osterman commented Jun 4, 2024

/terratest

@Benbentwo
Member

It looks like the tests are now failing on the "disabled (enabled = false) should not create any resources" check. I believe this should be fixed by:

resource "random_pet" "..." {
  count = local.enabled ? 1 : 0
  ...
}

Note that you then need to update references to the random pet to use an array, such as:

one(random_pet.instance[*].keepers.instance_class)
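Putting the two pieces of the suggestion together, a sketch of the count-gated random_pet plus the splat reference might look like the following (resource and variable names are assumed for illustration, not necessarily the module's actual ones):

```hcl
resource "random_pet" "instance" {
  # Create nothing when the module is disabled.
  count = local.enabled ? 1 : 0

  # Changing any keeper forces a new pet name, which in turn
  # forces replacement of the cluster instance.
  keepers = {
    instance_class = var.instance_type
  }
}

resource "aws_rds_cluster_instance" "default" {
  count = local.enabled ? var.cluster_size : 0

  # one() yields the single element when enabled, and null when the
  # pet list is empty (the disabled case), so no index errors occur.
  instance_class = one(random_pet.instance[*].keepers.instance_class)

  lifecycle {
    create_before_destroy = true
  }
}
```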

@finchr
Contributor Author

finchr commented Jun 4, 2024

> It looks like the tests are now failing on the "disabled (enabled = false) should not create any resources" check. I believe this should be fixed by:
>
> resource "random_pet" "..." {
>   count = local.enabled ? 1 : 0
>   ...
> }
>
> Note that you then need to update references to the random pet to use an array, such as:
>
> one(random_pet.instance[*].keepers.instance_class)

Hi @Benbentwo, that makes sense and is better than the hack I had in there. I just pushed a fix for this.

@GabisCampana

@Benbentwo @osterman this workflow is awaiting approval: https://github.com/cloudposse/terraform-aws-rds-cluster/actions/runs/9372514860

@Benbentwo
Member

/terratest

@Benbentwo
Member

/terratest

@Benbentwo Benbentwo enabled auto-merge (squash) June 6, 2024 23:16
main.tf: review comment resolved
@Benbentwo Benbentwo merged commit 54be61f into cloudposse:main Jun 7, 2024
21 checks passed
@mergify mergify bot removed the triage Needs triage label Jun 7, 2024
@morremeyer

Hey everyone, with the upgrade to 1.10.0, terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.

Is this intentional?

I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

@syphernl
Contributor

> Hey everyone, with the upgrade to 1.10.0, terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.
>
> Is this intentional?
>
> I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

Based on @finchr's comment in #192 (comment), it should only replace the DB instances.
It does feel a bit scary, though, and a remark about this in the changelog would probably have been a good idea 😅

I ran this update on one of our test envs and what I see that happens is:

  • New instance gets created (as Reader Instance)
  • Existing instance (Writer Instance) is being kept during that process
  • The old instance is deleted
  • The new instance is promoted to Writer Instance (this happens during the delete process)

It looked a bit scary, as the AWS console stated it was deleting the "Writer Instance" before the new one had been promoted to writer, so it appears this could potentially cause brief downtime for writes.

@kevcube
Contributor

kevcube commented Jun 10, 2024

> It does feel a bit scary and a remark about this in the changelog would've probably been a good idea 😅

@syphernl @Benbentwo agreed; while this isn't a breaking change, some sort of "WARNING:" line in the changelog would be a good addition in the future.

But running this fully hands-off in our dev environment, we did not see any interruption in our application's connection to the database. So good work @finchr!

@finchr
Contributor Author

finchr commented Jun 11, 2024

> Hey everyone, with the upgrade to 1.10.0, terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.
>
> Is this intentional?
>
> I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

Hi @morremeyer, the intent was to be able to update a cluster with near-zero downtime. We ran into several scenarios where Terraform was replacing instances anyway, without the benefit of create_before_destroy. Plus, this is a lot faster than in-place updates of existing nodes.

Labels
minor New features that do not break anything
Development

Successfully merging this pull request may close these issues.

Implement rolling update for instances.
8 participants