VMs created without explicit bootDevices are silently bricked by a second virtio disk attach #7

@ddemlow

Summary

Worker and control-plane VMs provisioned by this playbook are created without an explicit bootDevices field on the HC VirDomain. They boot fine on first power-on because HC's BIOS automatically falls back to the only attached VIRTIO_DISK. But once a second VIRTIO_DISK is attached to the VM (by any HC API consumer: manual disk add, Terraform provider, dynamic-storage workflows, etc.) and the VM is then rebooted, the BIOS gives up with "Boot failed: not a bootable disk / No bootable device". The VM stays unbootable until bootDevices is set manually via the HC UI (or the REST API, as in the workaround below).

This was reproduced on an HC 9.6.25.224460 cluster: all 8 VMs (3 servers + 4 agents + 1 nfs-server) provisioned by this playbook ~728 days ago had bootDevices = []. A second VIRTIO_DISK was attached to one agent VM, and on the next STOP/START it bricked.

The root cause is an HC platform bug (a separate issue has been filed internally against the HC platform team). The workaround belongs here regardless: the playbook is the source of these VMs and is in the best position to set bootDevices correctly at provision time. That covers every existing and future deployment without waiting for the HC platform fix to land.

Suggested fix

When provisioning a VM via the HC API, include the primary disk UUID in the request's bootDevices list, or PATCH the VM right after disk attach. Something like:

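- name: Set explicit boot order on the VM
  ansible.builtin.uri:
    url: "https://{{ hc_host }}/rest/v1/VirDomain/{{ vm_uuid }}"
    method: PATCH
    user: "{{ hc_user }}"
    password: "{{ hc_pass }}"
    force_basic_auth: true
    # Certificate validation is disabled here; enable it if the cluster has a trusted cert.
    validate_certs: false
    body_format: json
    body:
      # Only the OS disk goes in the list, so a later virtio attach cannot change the boot target.
      bootDevices:
        - "{{ os_disk_uuid }}"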

Or, if the scale_computing.hypercore Ansible collection's vm module exposes the field, set it there at create time.

Workaround for existing deployments

curl -sk -u '<hc-user>:<hc-password>' -X PATCH "https://<hc-host>/rest/v1/VirDomain/<vm-uuid>" \
  -H 'Content-Type: application/json' \
  -d '{"bootDevices":["<os-disk-uuid>"]}'
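For anyone who would rather script this across several VMs than run curl by hand, here is a minimal Python sketch of the same PATCH. It only builds the request and does not send it. The function name build_boot_order_patch and its parameters are hypothetical. The endpoint path and body shape are copied from the curl command above.

```python
import base64
import json
import urllib.request


def build_boot_order_patch(hc_host, vm_uuid, os_disk_uuid, user, password):
    """Build (but do not send) the PATCH that pins bootDevices to the OS disk.

    Hypothetical helper: the URL path and JSON body mirror the curl
    workaround in this issue; nothing else about the HC API is assumed.
    """
    body = json.dumps({"bootDevices": [os_disk_uuid]}).encode("utf-8")
    # HTTP basic auth header, equivalent to curl's -u user:password.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{hc_host}/rest/v1/VirDomain/{vm_uuid}",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

Sending the request is left to the caller, e.g. urllib.request.urlopen(req, context=ctx), where ctx is an ssl.SSLContext configured for the cluster's certificate (curl's -k skips verification entirely, which is a choice to make deliberately).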

Evidence

A worker VM before manual fix:  bootDevices=[]
  blockDevs:
    f8b7960b VIRTIO_DISK 100GB slot=1  <- OS disk
    16e1edfa IDE_CDROM   1MB   slot=0  <- cloud-init
    93969048 VIRTIO_DISK 1GB   slot=0  <- second virtio (any source — manual add, dynamic storage, etc.)

After STOP+START: BIOS → "Boot failed: not a bootable disk / No bootable device"

All other VMs in the cluster: bootDevices=[] — boot fine today because they still
have only one VIRTIO_DISK, but vulnerable to the same brick the moment a second
virtio disk is attached AND the VM is rebooted.
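To find which VMs are exposed, the check described above (empty bootDevices, plus a count of VIRTIO_DISK devices) can be sketched as a small Python function over already-fetched VirDomain records. This is a rough sketch, not a verified tool: the name find_vulnerable_vms is hypothetical, and the JSON keys "name", "uuid" and "type" are assumptions about the API response shape. Only bootDevices, blockDevs and VIRTIO_DISK appear in the evidence above.

```python
def find_vulnerable_vms(vir_domains):
    """Return (name, uuid, virtio_disk_count) for every VM with empty bootDevices.

    vir_domains: list of dicts, one per VirDomain record.
    Assumed keys (not confirmed by this issue): "name", "uuid", and
    "type" on each entry of "blockDevs".
    """
    report = []
    for vm in vir_domains:
        if vm.get("bootDevices"):
            continue  # explicit boot order already set, not affected
        virtio_disks = [
            dev for dev in vm.get("blockDevs", [])
            if dev.get("type") == "VIRTIO_DISK"
        ]
        report.append((vm.get("name"), vm.get("uuid"), len(virtio_disks)))
    return report
```

A count of 1 means the VM still boots today but breaks on the next virtio attach plus reboot; a count of 2 or more means it is likely already unbootable after its next STOP/START.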

Suggested labels

bug, priority:high (silent + recovery-blocking failure mode)
