2.9.13 - Terraform hangs on "still creating" when deploying multiple VMs #705

Closed
deep-blue-pulsar opened this issue Feb 27, 2023 · 14 comments

@deep-blue-pulsar

Hi folks. Apologies if I'm missing any information, as I'm still learning the ropes with Terraform. I've been trying for the past day to deploy 4 cloud-init VMs to Proxmox, but no matter how much I tweak things I can't get the provider to progress past the first VM.

This is what my main.tf looks like:

resource "proxmox_vm_qemu" "control_plane" {
  count             = 1
  name              = "control-plane-${count.index}"
  target_node       = var.pm_node

  clone             = "ubuntu-2004-cloudinit-template"
  full_clone        = true

  os_type           = "cloud-init"
  cores             = 4
  sockets           = 1
  cpu               = "host"
  memory            = 2048
  scsihw            = "virtio-scsi-pci"
  bootdisk          = "scsi0"
  agent             = 1

  disk {
    slot            = 0
    size            = "20G"
    type            = "scsi"
    storage         = "local-lvm"
    iothread        = 1
  }

  network {
    model           = "virtio"
    bridge          = "vmbr0"
    tag             = 20
  }

  # cloud-init settings
  ipconfig0         = "ip=10.10.20.3${count.index}/24,gw=10.10.20.1"
  nameserver        = "10.10.20.52,10.10.20.53"
  sshkeys           = file(var.ssh_key_file)
}

resource "proxmox_vm_qemu" "worker_nodes" {
  count             = 3
  name              = "worker-${count.index}"
  target_node       = var.pm_node

  clone             = "ubuntu-2004-cloudinit-template"
  full_clone        = true

  os_type           = "cloud-init"
  cores             = 4
  sockets           = 1
  cpu               = "host"
  memory            = 4096
  scsihw            = "virtio-scsi-pci"
  bootdisk          = "scsi0"

  disk {
    slot            = 0
    size            = "20G"
    type            = "scsi"
    storage         = "local-lvm"
    iothread        = 1
  }

  network {
    model           = "virtio"
    bridge          = "vmbr0"
    tag             = 20
  }

  # cloud-init settings
  ipconfig0         = "ip=10.10.20.4${count.index}/24,gw=10.10.20.1"
  nameserver        = "10.10.20.52,10.10.20.53"
  sshkeys           = file(var.ssh_key_file)
}

I've provisioned the template with the qemu-guest-agent installed. Whenever I run the apply, it starts creating the first VM but never gets past it. Also, when it tries to boot the VM it sends the start request 3 times and errors out because the VM is already running from the first request:

2023-02-27T18:16:35.110-0500 [ERROR] provider.terraform-provider-proxmox_v2.9.13: Response contains error diagnostic: @caller=github.com/hashicorp/terraform-plugin-go@v0.14.3/tfprotov5/internal/diag/diagnostics.go:55 diagnostic_summary="VM 104 already running" tf_req_id=5907450b-2183-b8a3-9f60-8ed35e8aa6fb tf_provider_addr=registry.terraform.io/telmate/proxmox tf_resource_type=proxmox_vm_qemu @module=sdk.proto diagnostic_detail= diagnostic_severity=ERROR tf_proto_version=5.3 tf_rpc=ApplyResourceChange timestamp=2023-02-27T18:16:35.110-0500
2023-02-27T18:16:35.120-0500 [ERROR] vertex "proxmox_vm_qemu.worker_nodes[0]" error: VM 104 already running

Anyone else having a similar issue?

@mantony9000

mantony9000 commented Mar 5, 2023

Yep, this is a huge blocker for both LXC and VM creation at the moment.
What versions of Proxmox and Terraform are you using?
I'm using Proxmox 7.1-7 and Terraform 1.3.3.

module.proxmox.proxmox_lxc.pihole: Still creating... [10s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [20s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [30s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [40s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [50s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [1m0s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [1m10s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [1m20s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [1m30s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [1m40s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [1m50s elapsed]
module.proxmox.proxmox_lxc.pihole: Still creating... [2m0s elapsed]


Getting the same for VM creation.

@mantony9000

mantony9000 commented Mar 5, 2023

I'm a moron; change the template and try again and it'll work.
Mine was failing because I used the Alpine template (the provisioners failed due to no SSH).

Use an updated template: https://github.com/TechByTheNerd/cloud-image-for-proxmox/tree/main/ubuntu

@rterbush

rterbush commented Mar 22, 2023

This actually appears to be a real problem. I am unable to deploy more than one container resource. It appears to be because Terraform uses parallelism by default and PVE cannot handle multiple requests at the same time. Perhaps a lock issue on the clone?

Here is an interesting thread, putting the responsibility on the provider to manage this.
hashicorp/terraform-plugin-sdk#67

I've tried both for_each and count to do this. The only way to get this to work, at least partially, is by setting --parallelism=1 for Terraform.

Would love to be proved wrong here.
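
For anyone else hitting this, that flag is just passed at apply time, e.g.:

terraform apply --parallelism=1

Here is the config I'm working with: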

resource "proxmox_lxc" "data_lxc" {
  count         = 2
  hostname      = "data-${count.index + 1}"
  target_node   = "pve3"
  password      = var.container_password
  clone         = 501
  full          = true
  cores         = 4
  memory        = 2048
  swap          = 1024
  start         = true
  onboot        = true
  unprivileged  = true
  hastate       = "ignored"
  vmid          = count.index + 221

  rootfs {
    storage = "containers"
    size = "10G"
  }

  mountpoint {
    key     = "0"
    slot    = 0
    storage = "salt-states"
    mp      = "/srv"
    size    = "100M"
    shared  = true
  }

  mountpoint {
    key     = "1"
    slot    = 1
    storage = "data-library"
    mp      = "/mnt/data"
    size    = "100G"
    shared  = true
  }

  network {
    name = "eth0"
    bridge = "vmbr0"
    ip = "10.10.9.${count.index + 221}/24"
    gw = "10.10.9.254"
  }

  provisioner "remote-exec" {
    script = "provision.sh"

    connection {
      type = "ssh"
      user = "root"
      host = "10.10.9.${count.index + 221}"
      private_key = var.ssh_private_key
    }
  }
}

@rterbush

Just discovered the pm_parallel parameter. This avoids the need to set the limit on the command line, but the bigger issue is not being able to run these container deployments in parallel. Not sure whether a delay might solve this, or a workaround that gets away from using clone.
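
For reference, that parameter goes on the provider block, something like this (the API URL and credentials here are placeholders, not my actual config):

provider "proxmox" {
  pm_api_url  = "https://pve.example.com:8006/api2/json" # placeholder
  pm_user     = "terraform@pve"                          # placeholder
  pm_password = var.pm_password                          # placeholder
  pm_parallel = 1                                        # limit the provider's concurrent operations
}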

@rterbush

rterbush commented Mar 22, 2023

And I can confirm that pm_parallel does not work as expected. #310
Also related #173

@mantony9000

mantony9000 commented Mar 26, 2023

@rterbush you cannot clone in parallel: https://forum.proxmox.com/threads/parallel-cloning.75902/
It needs to lock the storage. Might be a Proxmox API limitation?

@rterbush

rterbush commented Mar 26, 2023

@CaptainPizzaPirate thanks for the link. It does explain the issue on the Proxmox side.

The point of my earlier comment is that setting pm_parallel in the Terraform config does not limit parallel processing of the deployment. It seems the --parallelism=1 command line flag is required to get this to work, so I too (as did the OP) question whether the provider setting works as intended.

Actually, it was the poster in one of the other referenced issues who stated that they did not believe this works as intended.

@mantony9000

@rterbush & @deep-blue-pulsar
I understand pm_parallel is having issues;
however, can you confirm whether the qemu-guest-agent is installed in your disk image?
Please add these vars to your provider config to enable trace debugging:

  pm_log_file         = "terraform-provider-proxmox.log"
  pm_debug            = true
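
For context, a full provider block with those set would look roughly like this (the API URL and credentials are placeholders):

provider "proxmox" {
  pm_api_url   = "https://pve.example.com:8006/api2/json" # placeholder
  pm_user      = "terraform@pve"                          # placeholder
  pm_password  = var.pm_password                          # placeholder
  pm_log_file  = "terraform-provider-proxmox.log"         # write the provider log here
  pm_debug     = true                                     # enable debug-level logging
}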

Can you post the log trace from the file terraform-provider-proxmox.log?

@rterbush

@CaptainPizzaPirate in my case, I am deploying LXC containers, so that's not applicable.

@mantony9000

mantony9000 commented Apr 30, 2023

@rterbush I understand that; however, if the clone template does not have the agent, it will also stall. You can verify this by enabling the logs to confirm it's not actually an image issue.
It can also stall if your image does not have OpenSSH. We have no idea what container template you are using; for example, Alpine does not ship with SSH and will stall too, because it can't run the provisioners.

clone = 501

@rterbush

@CaptainPizzaPirate maybe I am not understanding you completely.

I am not running QEMU in these deployments. The containers are cloned from LXC templates, so there is no QEMU involved to install and run the qemu-guest-agent.

A lot of water has gone under the bridge since I reported/confirmed this behavior of not being able to control parallelism in the Proxmox provider config. I acknowledge that there is a limitation in the Proxmox API with regard to locking clones. However, setting pm_parallel=1 does not work around this issue; I must set --parallelism=1 at the terraform command line. https://registry.terraform.io/providers/Telmate/proxmox/2.7.4/docs#pm_parallel

My config has also changed a lot in order to work around other bugs such as #753, so I am now also specifying hwaddr for each container provisioned (sketched below). It's not clear whether I can easily recreate this without substantially reverting my config. I will give it a try without the command line flag and report back if I can recreate it. But again, this has less to do with the container config and more to do with an upstream issue in the Proxmox provider itself.
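
For reference, the per-container network block with hwaddr pinned now looks roughly like this (the MAC prefix and addressing are placeholders):

  network {
    name   = "eth0"
    bridge = "vmbr0"
    hwaddr = format("DE:AD:BE:EF:00:%02X", count.index + 1)  # placeholder, locally-administered MAC per container
    ip     = "10.10.9.${count.index + 221}/24"
    gw     = "10.10.9.254"
  }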

@mantony9000

@rterbush
I'm saying there's something wrong with the LXC template, and the logs will confirm that.

@github-actions

This issue is stale because it has been open for 60 days with no activity. Please update the provider to the latest version and, if the issue persists, provide the full configuration and debug logs.

github-actions bot added the stale label Jun 30, 2023
@github-actions

github-actions bot commented Jul 6, 2023

This issue was closed because it has been inactive for 5 days since being marked as stale.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jul 6, 2023