Rerunning terraform - can't update node #336

Closed
thesutex opened this issue Sep 9, 2020 · 16 comments

@thesutex

thesutex commented Sep 9, 2020

Hi

I am setting up a basic VIP-pool-node config using Terraform and Azure App Services. The node IP is fetched from previous Terraform code that sets up a Private Link to Azure, so the IP is an input to this Terraform script and is not known beforehand. This works like a charm the first time, but if I rerun the deploy it fails, because Terraform wants to recreate the node (the IP is not known before the run) and recreating the node fails since it is attached to a pool:

{"code":400,"message":"01070110:3: Node address '/Web-Applications/web-xyz_node' is referenced by a member of pool '/Web-Applications/web-xyz'.","errorStack":[],"apiError":3}

Is there a way to solve this using the current module?

As for why I am rerunning: this code is part of the application deploy, which gets updated regularly.

@focrensh
Collaborator

Please provide the repro steps including examples of the resources in question. This will help narrow down the exact issue here. You may need some logic for removing the member before the node can be re-created.

@thesutex
Author

resource "bigip_ltm_node" "node" {
name = local.nodename
address = var.privateip
description = "Azure privatelink node"
monitor = ""
depends_on = [bigip_ltm_pool.pool]
}

resource "bigip_ltm_pool" "pool" {
name = local.poolname
load_balancing_mode = "round-robin"
description = "azure appservice pool"
}

resource "bigip_ltm_pool_attachment" "attach_node" {
pool = bigip_ltm_pool.pool.name
node = "${bigip_ltm_node.node.name}:443"
depends_on = [bigip_ltm_node.node]
}

So in the example above, "var.privateip" is set by a previous module that creates a private endpoint in Azure. Since this value is not known until apply, Terraform tries to recreate the node on every deploy, and on the second and subsequent runs, because the node already exists, it fails due to being attached to a pool.

@RavinderReddyF5
Collaborator

@thesutex,
I used the data source data.azurerm_app_service.example.default_site_hostname to get the site name of the App Service and used this as the node address, and I was able to create the node without any issues. Even a second terraform apply didn't throw any errors (of course, there is no change in the App Service resource).

Please let me know if I missed anything.

Resource snippet:

resource "bigip_ltm_monitor" "monitor" {
  name     = "/Common/terraform_monitor"
  parent   = "/Common/http"
  send     = "GET /some/path\r\n"
  timeout  = "999"
  interval = "998"
}
resource "bigip_ltm_pool" "pool" {
  name                = "/Common/terraform-pool"
  load_balancing_mode = "round-robin"
  monitors            = ["${bigip_ltm_monitor.monitor.name}"]
  allow_snat          = "yes"
  allow_nat           = "yes"
}

resource "bigip_ltm_node" "node" {
  name    = "/Common/terraform_node"
  address = data.azurerm_app_service.example.default_site_hostname
}

resource "bigip_ltm_pool_attachment" "attach_node" {
  pool = bigip_ltm_pool.pool.name
  node = "${bigip_ltm_node.node.name}:443"
}

@thesutex
Author

thesutex commented Sep 25, 2020

Hi @RavinderReddyF5, as I wrote above, the problem is that the address is the IP address from the Azure Private Link service.

Module one outputs this:

output "webapp_data_private_ip" {
  value = data.azurerm_private_endpoint_connection.privateendpointdeployed.private_service_connection[0].private_ip_address
}

The F5 module gets the value:

privateip = module.privateendpoint.webapp_data_private_ip

and then creates the node:

resource "bigip_ltm_node" "node" {
  name             = local.nodename
  address          = var.privateip
  description      = "Azure privatelink node"
  monitor = ""
  depends_on = [bigip_ltm_pool.pool]

Terraform output during plan:

  # module.bigip.bigip_ltm_node.node must be replaced
-/+ resource "bigip_ltm_node" "node" {
      ~ address          = "10.40.0.4" -> (known after apply) # forces replacement
      ~ connection_limit = 0 -> (known after apply)
        description      = "Azure privatelink node"
      ~ dynamic_ratio    = 1 -> (known after apply)
      ~ id               = "/Web-Applications/web-nzoth.azurewebsites.net_node" -> (known after apply)
        name             = "/Web-Applications/web-nzoth.azurewebsites.net_node"
      ~ rate_limit       = "disabled" -> (known after apply)
      ~ ratio            = 1 -> (known after apply)
    }

10.40.0.4 is the IP set by the first apply, and it is still the correct IP, but Terraform still forces the replacement and fails with:

Error: HTTP 400 :: {"code":400,"message":"01070110:3: Node address '/Web-Applications/web-nzoth.azurewebsites.net_node' is referenced by a member of pool '/Web-Applications/web-nzoth.azurewebsites.net'.","errorStack":[],"apiError":3}

If I need some logic to remove this node from the pool before rerunning, how would that work on the first run? And how would one do that?

@whume

whume commented Sep 28, 2020

We ran into this same issue and seem blocked. Our issue is that the node comes up with the same name but a new IP address, and the attachment does not seem to be deleted first to clear out the existing node. Removing a node or adding a new one is not an issue; it is only updating an existing one, which forces a recreate of that node. Is it possible to add a recreate for the attachment as well?

@nmenant

nmenant commented Sep 30, 2020

I've done the following:

resource "bigip_ltm_monitor" "monitor" {
  name     = "/Common/terraform_monitor"
  parent   = "/Common/http"
  send     = "GET /some/path\r\n"
  timeout  = "997"
  interval = "996"
}

resource "bigip_ltm_pool" "pool" {
  name                = "/Common/terraform-pool"
  load_balancing_mode = "round-robin"
  monitors            = ["${bigip_ltm_monitor.monitor.name}"]
  allow_snat          = "yes"
  allow_nat           = "yes"
}
resource "bigip_ltm_node" "node" {
  name    = "/Common/terraform_node"
  address = "192.168.30.2"
}

resource "bigip_ltm_pool_attachment" "attach_node" {
  pool = bigip_ltm_pool.pool.name
  node = "${bigip_ltm_node.node.name}:80"
}

This runs successfully.

When updating the node and running it again, it fails:

resource "bigip_ltm_monitor" "monitor" {
  name     = "/Common/terraform_monitor"
  parent   = "/Common/http"
  send     = "GET /some/path\r\n"
  timeout  = "997"
  interval = "996"
}

resource "bigip_ltm_pool" "pool" {
  name                = "/Common/terraform-pool"
  load_balancing_mode = "round-robin"
  monitors            = ["${bigip_ltm_monitor.monitor.name}"]
  allow_snat          = "yes"
  allow_nat           = "yes"
}
resource "bigip_ltm_node" "node" {
  name    = "/Common/terraform_node"
  address = "192.168.30.3"
}

resource "bigip_ltm_pool_attachment" "attach_node" {
  pool = bigip_ltm_pool.pool.name
  node = "${bigip_ltm_node.node.name}:80"
}
PAR-ML-00026375:bigip_ltm_node menant$ terraform apply --auto-approve
bigip_ltm_node.node: Refreshing state... [id=/Common/terraform_node]
bigip_ltm_monitor.monitor: Refreshing state... [id=/Common/terraform_monitor]
bigip_ltm_pool.pool: Refreshing state... [id=/Common/terraform-pool]
bigip_ltm_pool_attachment.attach_node: Refreshing state... [id=/Common/terraform-pool-/Common/terraform_node:80]
bigip_ltm_monitor.monitor: Modifying... [id=/Common/terraform_monitor]
bigip_ltm_monitor.monitor: Modifications complete after 1s [id=/Common/terraform_monitor]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
PAR-ML-00026375:bigip_ltm_node menant$ terraform apply --auto-approve
bigip_ltm_monitor.monitor: Refreshing state... [id=/Common/terraform_monitor]
bigip_ltm_node.node: Refreshing state... [id=/Common/terraform_node]
bigip_ltm_pool.pool: Refreshing state... [id=/Common/terraform-pool]
bigip_ltm_pool_attachment.attach_node: Refreshing state... [id=/Common/terraform-pool-/Common/terraform_node:80]
bigip_ltm_node.node: Destroying... [id=/Common/terraform_node]

Error: HTTP 400 :: {"code":400,"message":"01070110:3: Node address '/Common/terraform_node' is referenced by a member of pool '/Common/terraform-pool'.","errorStack":[],"apiError":3}

Updating the node on its own works UNTIL it's tied to a pool as a pool member. When updating a node's IP, you can see that we are deleting and creating the resource again:

This is the terraform output when updating an existing node:

bigip_ltm_node.node: Refreshing state... [id=/Common/terraform_node1]
bigip_ltm_node.node: Destroying... [id=/Common/terraform_node1]
bigip_ltm_node.node: Destruction complete after 0s
bigip_ltm_node.node: Creating...
bigip_ltm_node.node: Creation complete after 0s [id=/Common/terraform_node1]

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

This behaviour cannot work since you aren't allowed to delete a node that has been assigned to a pool. You have the same thing via the GUI:
[screenshot: the same error when attempting to delete the node via the GUI]

We need to review how pool / pool members are created and managed via terraform without introducing breaking changes.

Tracking this internally with TER-477

@papineni87
Collaborator

Issue fixed in 1.3.3 release

@RavinderReddyF5
Collaborator

@thesutex please use the pool attachment resource as outlined in: https://registry.terraform.io/providers/F5Networks/bigip/latest/docs/resources/bigip_ltm_pool_attachment

We modified the pool attachment resource to remove the dependency on the ltm_node resource.
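
A minimal sketch of that pattern, assuming an existing bigip_ltm_pool.pool resource and an illustrative member address (10.40.0.4:443 is borrowed from the plan output earlier in this thread):

resource "bigip_ltm_pool_attachment" "attach_node" {
  pool = bigip_ltm_pool.pool.name
  # The member is referenced directly as address:port; no bigip_ltm_node
  # resource is required.
  node = "10.40.0.4:443"
}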

@whume

whume commented Oct 15, 2020

Maybe I am wrong here, but I tested and still see this failing. Other than removing the dependency, did anything else change with the resource?

resource "bigip_ltm_node" "nodes" {
  count       = var.node_count
  name        = "/${local.partition}/${upper(element(local.node_names, count.index))}"
  address     = element(local.node_ips, count.index)
  description = "Deployed by Terraform"
}

resource "bigip_ltm_pool" "pool" {
  name                = "/${local.partition}/${var.vip_fqdn}_k8s_${local.pool_port}_pool"
  monitors            = local.monitor
  allow_nat           = "yes"
  allow_snat          = "yes"
  load_balancing_mode = "round-robin"
  description         = "Deployed by Terraform"
}

resource "bigip_ltm_pool_attachment" "attach" {
  count      = var.node_count
  pool       = bigip_ltm_pool.pool.name
  node       = "${bigip_ltm_node.nodes[count.index].name}:${local.pool_port}"
}

This results in an error when changing a node's IP address.

The plan shows it will recreate the node but not the attachment:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.master_vips.bigip_ltm_node.nodes[2] must be replaced
-/+ resource "bigip_ltm_node" "nodes" {
      ~ address          = "10.129.82.192" -> "10.129.82.193" # forces replacement
      ~ connection_limit = 0 -> (known after apply)
        description      = "Deployed by Terraform"
      ~ dynamic_ratio    = 1 -> (known after apply)
      ~ id               = "/AUTO-CONTAINERS-DEVTST/TMP01TMPD1VM302" -> (known after apply)
        monitor          = "/Common/icmp"
        name             = "/AUTO-CONTAINERS-DEVTST/TMP01TMPD1VM302"
      ~ rate_limit       = "disabled" -> (known after apply)
      ~ ratio            = 1 -> (known after apply)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

And the apply fails with

module.master_vips.bigip_ltm_node.nodes[2]: Destroying... [id=/AUTO-CONTAINERS-DEVTST/TMP01TMPD1VM302]

Error: HTTP 400 :: {"code":400,"message":"01070110:3: Node address '/AUTO-CONTAINERS-DEVTST/TMP01TMPD1VM302' is referenced by a member of pool '/AUTO-CONTAINERS-DEVTST/tmp01-int-test-dev-xxxxxxxxxx.com_k8s_6443_pool'.","errorStack":[],"apiError":3}

@RavinderReddyF5
Collaborator

RavinderReddyF5 commented Oct 15, 2020

@whume
You are still using the node resource to attach pool members. Remove the ltm_node resource and attach the member to the pool directly using the pool attachment resource.

And make sure your node is disassociated from the pool:

resource "bigip_ltm_pool_attachment" "attach_node" {
  pool                  = bigip_ltm_pool.pool.name
  node                  = "1.1.1.1:80"
  ratio                 = 2
  connection_limit      = 2
  connection_rate_limit = 2
  priority_group        = 2
  dynamic_ratio         = 3
}

@whume

whume commented Oct 15, 2020

So I tried to update to what you suggested.

resource "bigip_ltm_pool_attachment" "attach" {
  count      = var.node_count
  pool       = bigip_ltm_pool.pool.name
  node       = "/${local.partition}/${element(local.node_ips, count.index)}:${local.pool_port}"
}

I added the partition to the node as well, since I am not working in the Common partition.

I destroyed the whole VIP and tried to recreate it from scratch, and when I do, I get an inconsistent result error:

Error: Provider produced inconsistent result after apply

When applying changes to
module.master_vips.bigip_ltm_pool_attachment.attach[0], provider
"registry.terraform.io/-/bigip" produced an unexpected new value for was
present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent result after apply

When applying changes to
module.master_vips.bigip_ltm_pool_attachment.attach[2], provider
"registry.terraform.io/-/bigip" produced an unexpected new value for was
present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent result after apply

When applying changes to
module.master_vips.bigip_ltm_pool_attachment.attach[1], provider
"registry.terraform.io/-/bigip" produced an unexpected new value for was
present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

@whume

whume commented Oct 15, 2020

OK, so I did figure out what you were saying on this, and I am still not sure this is a good solution.

This change introduces what I would call a breaking bug in the provider on a minor release version. We ended up pinning a bunch of our deployments to the previous version to get around the issue.
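
A minimal sketch of that kind of pin, assuming the Terraform 0.13+ required_providers syntax (the constraint shown is illustrative, not the exact version that was pinned):

terraform {
  required_providers {
    bigip = {
      source  = "F5Networks/bigip"
      # Stay on a release prior to 1.3.3, where the pool attachment
      # behaviour changed.
      version = "< 1.3.3"
    }
  }
}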

Additionally, while this works, it creates the node in a way that is not managed in state or by Terraform. So if you change the IP address, it will remove the member from the pool and create a new node, but the old node still exists in the F5 and is orphaned. For ephemeral workloads like ours, that could leave hundreds if not thousands of stale entries in the F5. This could be mitigated by running cleanup scripts periodically, but I think this should be handled in Terraform.

The last issue I see is that this change requires you to remove the nodes from the pool, creating downtime. While that is not a huge deal and can be mitigated, there is no clear upgrade path, and you can't just run terraform apply to have it delete the old nodes and replace them with the new ones.

While I am glad to see a fix come in and appreciate the quick turnaround, I think this should maybe be reverted and either released in a major version or reworked.

Thanks

@focrensh
Collaborator

Thanks for the feedback @whume. Can you elaborate a bit more on the last issue mentioned above? I am not clear how the new workflow introduces this issue.

@whume

whume commented Oct 21, 2020

The last comment was mostly just the fact that it's not an in-place change. If you try to replace the node so it uses the new naming convention of IP:Port on the attachment, it tries to create a node with the same IP as the old node and errors out with an already-in-use error.
So the only way to apply this new change is to destroy the VIPs / pool / nodes so it can recreate them, since you can't remove the nodes without decoupling them from the pool. Due to the amount of manual effort this creates, we decided to pin our version back for now, since even if you try to run an apply using the old method of node naming (currently our hostnames), the provider throws an error saying the attachment needs to be IP:Port. Hope that makes sense!

@focrensh
Collaborator

It does, thanks. What I have seen in a different migration was to remove the NODE resources from the config, delete their state out of the state file, and then reference the IP:port of the nodes in the attachment resource. The attachment resource will find the existing nodes and use them within the pools.
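
A minimal sketch of that migration, reusing names from the snippets above (bigip_ltm_node.nodes, local.node_ips and local.pool_port are carried over from @whume's config; whether the partition prefix is needed on the member is not settled in this thread). After removing each node from state, e.g. terraform state rm 'bigip_ltm_node.nodes[0]', the attachment references the address directly:

resource "bigip_ltm_pool_attachment" "attach" {
  count = var.node_count
  pool  = bigip_ltm_pool.pool.name
  # Existing nodes are found by address:port and reused rather than re-created.
  node  = "${element(local.node_ips, count.index)}:${local.pool_port}"
}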

We are actively working on fixing the patch release issue.

Thanks,

@bcorner13

@whume what version(s) did you pin to?
