Rerunning terraform - can't update node #336

Closed
thesutex opened this issue Sep 9, 2020 · 16 comments

@thesutex

thesutex commented Sep 9, 2020

Hi

I am setting up a basic VIP-pool-node config using Terraform and Azure App Services. The node IP is fetched from previous Terraform code that sets up a Private Link to Azure, so the IP is an input to this Terraform script and is not known beforehand. This works like a charm the first time, but if I rerun the deploy it fails, because Terraform wants to recreate the node (the IP is not known before the run) and recreating the node fails since it is attached to a pool:

{"code":400,"message":"01070110:3: Node address '/Web-Applications/web-xyz_node' is referenced by a member of pool '/Web-Applications/web-xyz'.","errorStack":[],"apiError":3}

Is there a way to solve this using the current module?

As for why I am rerunning: this code is part of the application deploy, which gets updated regularly.

@focrensh
Collaborator

Please provide the repro steps including examples of the resources in question. This will help narrow down the exact issue here. You may need some logic for removing the member before the node can be re-created.

@thesutex
Author

resource "bigip_ltm_node" "node" {
name = local.nodename
address = var.privateip
description = "Azure privatelink node"
monitor = ""
depends_on = [bigip_ltm_pool.pool]
}

resource "bigip_ltm_pool" "pool" {
name = local.poolname
load_balancing_mode = "round-robin"
description = "azure appservice pool"
}

resource "bigip_ltm_pool_attachment" "attach_node" {
pool = bigip_ltm_pool.pool.name
node = "${bigip_ltm_node.node.name}:443"
depends_on = [bigip_ltm_node.node]
}

So in the example above, "var.privateip" is set by a previous module that creates a private endpoint in Azure. Since this value is not known until apply, Terraform tries to recreate the node on every deploy, and on the second and subsequent runs, because the node already exists, it fails due to being attached to a pool.

@RavinderReddyF5
Collaborator

@thesutex,
I used the data source data.azurerm_app_service.example.default_site_hostname to get the site name of the App Service and used this as the node address, and I was able to create the node without any issues. Even a second terraform apply didn't throw any errors (of course, there is no change in the App Service resource).

Please let me know if I missed anything.

Resource snippet:

resource "bigip_ltm_monitor" "monitor" {
  name     = "/Common/terraform_monitor"
  parent   = "/Common/http"
  send     = "GET /some/path\r\n"
  timeout  = "999"
  interval = "998"
}
resource "bigip_ltm_pool" "pool" {
  name                = "/Common/terraform-pool"
  load_balancing_mode = "round-robin"
  monitors            = ["${bigip_ltm_monitor.monitor.name}"]
  allow_snat          = "yes"
  allow_nat           = "yes"
}

resource "bigip_ltm_node" "node" {
  name    = "/Common/terraform_node"
  address = data.azurerm_app_service.example.default_site_hostname
}

resource "bigip_ltm_pool_attachment" "attach_node" {
  pool = bigip_ltm_pool.pool.name
  node = "${bigip_ltm_node.node.name}:443"
}

@thesutex
Author

thesutex commented Sep 25, 2020

Hi @RavinderReddyF5, as I wrote above, the problem is that the address is the IP address from the Azure Private Link service.

Module one outputs this:

output "webapp_data_private_ip" {
  value = data.azurerm_private_endpoint_connection.privateendpointdeployed.private_service_connection[0].private_ip_address
}

The F5 module gets the value:

privateip = module.privateendpoint.webapp_data_private_ip

and then creates the node:

resource "bigip_ltm_node" "node" {
  name             = local.nodename
  address          = var.privateip
  description      = "Azure privatelink node"
  monitor = ""
  depends_on = [bigip_ltm_pool.pool]

Terraform output during plan:

  # module.bigip.bigip_ltm_node.node must be replaced
-/+ resource "bigip_ltm_node" "node" {
      ~ address          = "10.40.0.4" -> (known after apply) # forces replacement
      ~ connection_limit = 0 -> (known after apply)
        description      = "Azure privatelink node"
      ~ dynamic_ratio    = 1 -> (known after apply)
      ~ id               = "/Web-Applications/web-nzoth.azurewebsites.net_node" -> (known after apply)
        name             = "/Web-Applications/web-nzoth.azurewebsites.net_node"
      ~ rate_limit       = "disabled" -> (known after apply)
      ~ ratio            = 1 -> (known after apply)
    }

10.40.0.4 is the IP set by the first apply, and it is still the correct IP, but Terraform still forces the replacement and fails with:

Error: HTTP 400 :: {"code":400,"message":"01070110:3: Node address '/Web-Applications/web-nzoth.azurewebsites.net_node' is referenced by a member of pool '/Web-Applications/web-nzoth.azurewebsites.net'.","errorStack":[],"apiError":3}

If I need some logic to remove this node from the pool before rerunning, how would that work on the first run? And how would one do that?

@whume

whume commented Sep 28, 2020

We ran into this same issue and seem blocked. Our issue is that the node comes up with the same name but a new IP address, and the attachment does not seem to be deleted first to clear out the existing node. Removing a node or adding a new one is not an issue; it is only updating an existing one, which forces a recreate of that node. Is it possible to add a recreate for the attachment as well?

@nmenant

nmenant commented Sep 30, 2020

I've done the following:

resource "bigip_ltm_monitor" "monitor" {
  name     = "/Common/terraform_monitor"
  parent   = "/Common/http"
  send     = "GET /some/path\r\n"
  timeout  = "997"
  interval = "996"
}

resource "bigip_ltm_pool" "pool" {
  name                = "/Common/terraform-pool"
  load_balancing_mode = "round-robin"
  monitors            = ["${bigip_ltm_monitor.monitor.name}"]
  allow_snat          = "yes"
  allow_nat           = "yes"
}
resource "bigip_ltm_node" "node" {
  name    = "/Common/terraform_node"
  address = "192.168.30.2"
}

resource "bigip_ltm_pool_attachment" "attach_node" {
  pool = bigip_ltm_pool.pool.name
  node = "${bigip_ltm_node.node.name}:80"
}

This runs successfully.

When updating the node and running it again, it fails:

resource "bigip_ltm_monitor" "monitor" {
  name     = "/Common/terraform_monitor"
  parent   = "/Common/http"
  send     = "GET /some/path\r\n"
  timeout  = "997"
  interval = "996"
}

resource "bigip_ltm_pool" "pool" {
  name                = "/Common/terraform-pool"
  load_balancing_mode = "round-robin"
  monitors            = ["${bigip_ltm_monitor.monitor.name}"]
  allow_snat          = "yes"
  allow_nat           = "yes"
}
resource "bigip_ltm_node" "node" {
  name    = "/Common/terraform_node"
  address = "192.168.30.3"
}

resource "bigip_ltm_pool_attachment" "attach_node" {
  pool = bigip_ltm_pool.pool.name
  node = "${bigip_ltm_node.node.name}:80"
}
PAR-ML-00026375:bigip_ltm_node menant$ terraform apply --auto-approve
bigip_ltm_node.node: Refreshing state... [id=/Common/terraform_node]
bigip_ltm_monitor.monitor: Refreshing state... [id=/Common/terraform_monitor]
bigip_ltm_pool.pool: Refreshing state... [id=/Common/terraform-pool]
bigip_ltm_pool_attachment.attach_node: Refreshing state... [id=/Common/terraform-pool-/Common/terraform_node:80]
bigip_ltm_monitor.monitor: Modifying... [id=/Common/terraform_monitor]
bigip_ltm_monitor.monitor: Modifications complete after 1s [id=/Common/terraform_monitor]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
PAR-ML-00026375:bigip_ltm_node menant$ terraform apply --auto-approve
bigip_ltm_monitor.monitor: Refreshing state... [id=/Common/terraform_monitor]
bigip_ltm_node.node: Refreshing state... [id=/Common/terraform_node]
bigip_ltm_pool.pool: Refreshing state... [id=/Common/terraform-pool]
bigip_ltm_pool_attachment.attach_node: Refreshing state... [id=/Common/terraform-pool-/Common/terraform_node:80]
bigip_ltm_node.node: Destroying... [id=/Common/terraform_node]

Error: HTTP 400 :: {"code":400,"message":"01070110:3: Node address '/Common/terraform_node' is referenced by a member of pool '/Common/terraform-pool'.","errorStack":[],"apiError":3}

Updating the node on its own works UNTIL it's tied to a pool as a pool member. When updating a node's IP, you can see that we are deleting and creating the resource again:

This is the terraform output when updating an existing node:

bigip_ltm_node.node: Refreshing state... [id=/Common/terraform_node1]
bigip_ltm_node.node: Destroying... [id=/Common/terraform_node1]
bigip_ltm_node.node: Destruction complete after 0s
bigip_ltm_node.node: Creating...
bigip_ltm_node.node: Creation complete after 0s [id=/Common/terraform_node1]

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

This behaviour cannot work since you aren't allowed to delete a node that has been assigned to a pool. You have the same thing via the GUI:
[screenshot: the same error when attempting to delete the node via the GUI]

We need to review how pool / pool members are created and managed via terraform without introducing breaking changes.

Tracking this internally with TER-477

@papineni87
Collaborator

Issue fixed in 1.3.3 release

@RavinderReddyF5
Collaborator

@thesutex please use the pool attachment resource as outlined in: https://registry.terraform.io/providers/F5Networks/bigip/latest/docs/resources/bigip_ltm_pool_attachment

We modified the pool attachment resource to remove the dependency on the ltm_node resource.
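
A minimal sketch of that pattern, assuming an existing bigip_ltm_pool.pool resource and an illustrative member address (10.40.0.4:443 is borrowed from the plan output earlier in this thread):

resource "bigip_ltm_pool_attachment" "attach_node" {
  pool = bigip_ltm_pool.pool.name
  # The member is referenced directly as address:port; no bigip_ltm_node
  # resource is required.
  node = "10.40.0.4:443"
}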

@whume

whume commented Oct 15, 2020

Maybe I am wrong here, but I tested and still see this failing. Other than removing the dependency, did anything else change with the resource?

resource "bigip_ltm_node" "nodes" {
  count       = var.node_count
  name        = "/${local.partition}/${upper(element(local.node_names, count.index))}"
  address     = element(local.node_ips, count.index)
  description = "Deployed by Terraform"
}

resource "bigip_ltm_pool" "pool" {
  name                = "/${local.partition}/${var.vip_fqdn}_k8s_${local.pool_port}_pool"
  monitors            = local.monitor
  allow_nat           = "yes"
  allow_snat          = "yes"
  load_balancing_mode = "round-robin"
  description         = "Deployed by Terraform"
}

resource "bigip_ltm_pool_attachment" "attach" {
  count      = var.node_count
  pool       = bigip_ltm_pool.pool.name
  node       = "${bigip_ltm_node.nodes[count.index].name}:${local.pool_port}"
}

This results in an error when changing a node's IP address.

The plan shows it will recreate the node but not the attachment:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.master_vips.bigip_ltm_node.nodes[2] must be replaced
-/+ resource "bigip_ltm_node" "nodes" {
      ~ address          = "10.129.82.192" -> "10.129.82.193" # forces replacement
      ~ connection_limit = 0 -> (known after apply)
        description      = "Deployed by Terraform"
      ~ dynamic_ratio    = 1 -> (known after apply)
      ~ id               = "/AUTO-CONTAINERS-DEVTST/TMP01TMPD1VM302" -> (known after apply)
        monitor          = "/Common/icmp"
        name             = "/AUTO-CONTAINERS-DEVTST/TMP01TMPD1VM302"
      ~ rate_limit       = "disabled" -> (known after apply)
      ~ ratio            = 1 -> (known after apply)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

And the apply fails with

module.master_vips.bigip_ltm_node.nodes[2]: Destroying... [id=/AUTO-CONTAINERS-DEVTST/TMP01TMPD1VM302]

Error: HTTP 400 :: {"code":400,"message":"01070110:3: Node address '/AUTO-CONTAINERS-DEVTST/TMP01TMPD1VM302' is referenced by a member of pool '/AUTO-CONTAINERS-DEVTST/tmp01-int-test-dev-xxxxxxxxxx.com_k8s_6443_pool'.","errorStack":[],"apiError":3}

@RavinderReddyF5
Collaborator

RavinderReddyF5 commented Oct 15, 2020

@whume
You are still using the node resource to attach pool members. Remove the ltm_node resource and attach the member to the pool directly using the pool attachment resource.

And make sure your node is disassociated from the pool:

resource "bigip_ltm_pool_attachment" "attach_node" {
  pool                  = bigip_ltm_pool.pool.name
  node                  = "1.1.1.1:80"
  ratio                 = 2
  connection_limit      = 2
  connection_rate_limit = 2
  priority_group        = 2
  dynamic_ratio         = 3
}

@whume

whume commented Oct 15, 2020

So I tried to update to what you suggested.

resource "bigip_ltm_pool_attachment" "attach" {
  count      = var.node_count
  pool       = bigip_ltm_pool.pool.name
  node       = "/${local.partition}/${element(local.node_ips, count.index)}:${local.pool_port}"
}

I added the partition to the node as well, since I am not working in the Common partition.

I destroyed the whole VIP and tried to recreate it from scratch, and when I do, I get an inconsistent result error:

Error: Provider produced inconsistent result after apply

When applying changes to
module.master_vips.bigip_ltm_pool_attachment.attach[0], provider
"registry.terraform.io/-/bigip" produced an unexpected new value for was
present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent result after apply

When applying changes to
module.master_vips.bigip_ltm_pool_attachment.attach[2], provider
"registry.terraform.io/-/bigip" produced an unexpected new value for was
present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent result after apply

When applying changes to
module.master_vips.bigip_ltm_pool_attachment.attach[1], provider
"registry.terraform.io/-/bigip" produced an unexpected new value for was
present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

@whume

whume commented Oct 15, 2020

OK, so I did figure out what you were saying on this, and I am still not sure this is a good solution.

This change introduces what I would call a breaking bug in the provider on a minor release version. We ended up pinning a bunch of our deployments to the previous version to get around the issue.
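
A minimal sketch of that kind of pin, assuming the Terraform 0.13+ required_providers syntax (the constraint shown is illustrative, not the exact version that was pinned):

terraform {
  required_providers {
    bigip = {
      source  = "F5Networks/bigip"
      # Stay on a release prior to 1.3.3, where the pool attachment
      # behaviour changed.
      version = "< 1.3.3"
    }
  }
}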

Additionally, while this works, it creates the node in a way that is not managed in state or by Terraform. So if you change the IP address, it will remove the member from the pool and create a new node, but the old node still exists in the F5 and is orphaned. For ephemeral workloads like ours, that could leave hundreds if not thousands of stale entries in the F5. This could be mitigated by running cleanup scripts periodically, but I think this should be handled in Terraform.

The last issue I see is that this change requires you to remove the nodes from the pool, creating downtime. While that is not a huge deal and can be mitigated, there is no clear upgrade path, and you can't just run terraform apply to have it delete the old nodes and replace them with the new ones.

While I am glad to see a fix come in and appreciate the quick turnaround, I think this should maybe be reverted and either released in a major version or reworked.

Thanks

@focrensh
Collaborator

Thanks for the feedback @whume. Can you elaborate a bit more on the last issue mentioned above? I am not clear how the new workflow introduces this issue.

@whume

whume commented Oct 21, 2020

The last comment was mostly just the fact that it's not an in-place change. If you try to replace the node so it uses the new naming convention of IP:Port on the attachment, it tries to create a node with the same IP as the old node and errors out with an already-in-use error.
So the only way to apply this new change is to destroy the VIPs / pool / nodes so it can recreate them, since you can't remove the nodes without decoupling them from the pool. Due to the amount of manual effort this creates, we decided to pin our version back for now, since even if you try to run an apply using the old method of node naming (currently our hostnames), the provider throws an error saying the attachment needs to be IP:Port. Hope that makes sense!

@focrensh
Collaborator

It does, thanks. What I have seen in a different migration was to remove the NODE resources from the config, delete their state out of the state file, and then reference the IP:port of the nodes in the attachment resource. The attachment resource will find the existing nodes and use them within the pools.
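
A minimal sketch of that migration, reusing names from the snippets above (bigip_ltm_node.nodes, local.node_ips and local.pool_port are carried over from @whume's config; whether the partition prefix is needed on the member is not settled in this thread). After removing each node from state, e.g. terraform state rm 'bigip_ltm_node.nodes[0]', the attachment references the address directly:

resource "bigip_ltm_pool_attachment" "attach" {
  count = var.node_count
  pool  = bigip_ltm_pool.pool.name
  # Existing nodes are found by address:port and reused rather than re-created.
  node  = "${element(local.node_ips, count.index)}:${local.pool_port}"
}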

We are actively working on fixing the patch release issue.

Thanks,

@bcorner13

@whume what version(s) did you pin to?
