Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Virt define improvements #142

Merged
merged 3 commits into from Oct 12, 2022

Conversation

mlow
Copy link
Contributor

@mlow mlow commented Sep 16, 2022

SUMMARY

This pull request effectively rewrites how domains are defined.

To summarize its changes:

  • Diff info is now supplied, showing the changes between previously and
    newly defined domain XML.
  • If a domain with the same name already exists and the incoming UUID is
    missing, apply the existing UUID to the incoming XML.
  • If a UUID is defined in the incoming XML and it doesn't match that of
    an existing domain with the same name, exit with an error
  • For any incoming XML missing a element, we will
    attempt to fetch MAC of the matching interface from the existing
    domain's XML
ISSUE TYPE
  • Improvement Pull Request
COMPONENT NAME

virt

ADDITIONAL INFORMATION

Before these changes, if a definition was given without a UUID, the
appropriate domain would be created with a libvirt-generated UUID. If
the definition was modified (with UUID still left out), the module would
quietly exit with changes: False and make no changes (masking a
libvirt error that a domain with the same name already existed).

If a UUID was given, the module would work (mostly) as expected and
update the domain's definition. However, if the did not match the
of an existing domain with the same name, the module would
quietly exit without making changes again.

I say (mostly) as expected, because it was still possible to create
non-idempotent XML definitions. For example, if network interface MAC
addresses were left out of the XML definition, libvirt would generate
entirely new ones; probably not what most users would expect.

The parsed XML will be useful for future features.

Also change domain name handling. Allow specifying from either xml or
the `name` parameter, but raise an error if they're both present and
not equal.
@mlow mlow force-pushed the virt-define-improvements branch 3 times, most recently from 92b927a to 3402732 Compare September 16, 2022 04:35
@csmart
Copy link
Collaborator

csmart commented Sep 16, 2022

Hi @mlow, thanks for this contribution! I will go over it carefully and do some testing, then provide some more specific feedback but here are some initial thoughts.

Diff info is now supplied, showing the changes between previously and newly defined domain XML.

That's great, thanks!

If a domain with the same name already exists and the incoming UUID is missing, apply the existing UUID to the incoming XML.

Great, I think this makes sense, too.

If a UUID is defined in the incoming XML and it doesn't match that of an existing domain with the same name, exit with an error

Agree, makes sense.

For any incoming XML missing a element, we will attempt to fetch MAC of the matching interface from the existing domain's XML

I do like the idea of being more helpful, but i think it can also introduce dangers if we get it wrong... Overall, it might be reasonable to stick to how virsh works so that the experience and expectations are similar whether using ansible or command line tools, but keen to hash it out a bit and see what might be best.

In my experience using libvirt outside of Ansible, it has been expected to auto-generate elements such as UUID and MAC when they are left out when defining a new VM (I don't think that's changing with your patches).

When using virsh define to update an existing VM, both UUID and name are required. If only one is defined and it matches an existing VM, then it expects to create a new VM but won't because one exists with a conflicting option.

Similarly, if you modify an interface but leave out the MAC, it will fail.

With your change, I'm curious how will you handle the detection - is it based on network? What if a machine has two interfaces on the one network? What if a user wants to change the network of an interface but left off the MAC (normally you'd need MAC for this).

If something new is defined that also missing an element (say for example you define a new network interface), will this add the new interface and also generate a new MAC as there was nothing existing? This might get tricky if a new interface is defined on the same network and the original one is either missing or missing a MAC address?

Also, some changes will require a reboot of the machine to activate, might it be a good idea to include a message to that effect in the output? That'd be great!

@mlow
Copy link
Contributor Author

mlow commented Sep 16, 2022

Thanks for having a look, @csmart

Similarly, if you modify an interface but leave out the MAC, it will fail.

In my testing, when you leave out the MAC address, libvirt will silently generate a new one.

For example, if you define a domain with an interface like this:

<interface type="network">
  <source network="default"/>
  <model type="virtio"/>
</interface>

Then, libvirt will insert a <mac> element. If you later re-define an identical XML (which this module would do as part of a normal execution), libvirt will generate an entirely new MAC for that interface. I'm testing on 8.5.0 currently.

With your change, I'm curious how will you handle the detection - is it based on network? What if a machine has two interfaces on the one network?

Hmm, yes it's based on the network (or bridge device in the case of bridge interface, or physical host interface in the case of macvtap) the interface is attached to.

There is potential for issues if a domain has multiple interfaces attached to the same source, which I haven't thought about...

I have two thoughts about this:

  • If an incoming interface matches an existing interface with a MAC - remove that existing interface from the candidates to search for the remaining incoming interfaces.
  • Make UUID/MAC lookups (modifying the incoming XML in general) an opt-in feature so there's no risk of someone unwittingly having their definitions mangled

What if a user wants to change the network of an interface but left off the MAC (normally you'd need MAC for this).

This was something I thought about when working on this... I think an opt-in feature is probably the ideal way to handle this. A user could run the module with incoming XML mutation disabled, libvirt would generate the new MAC, then the user could re-enable XML mutation.

This begs the question - would individual mutation flags (one to enable/disable bringing over UUID, another for MAC addresses) make sense, or an overall flag? Or both?

Also, some changes will require a reboot of the machine to activate, might it be a good idea to include a message to that effect in the output? That'd be great!

This could be useful - it would be nice if there's libvirt way to check if a domain should be rebooted for changes to be applied. Otherwise we may be outputting this message even if no reboot is required

Side note:

I originally implemented this as an action plugin, which worked, but since action plugins run on the controller node, then it would need to be able to connect to remote libvirt instances to perform these checks. I currently don't know another way around this issue besides statically defining UUID's/MAC addresses in my domain XMLs, which I'd really like to avoid.

@mlow
Copy link
Contributor Author

mlow commented Sep 16, 2022

It just occurred to me that I could implement XML mutation parts as a separate module whose output is consumed by community.libvirt.virt 🤦

I'm happy to remove the network MAC address copying while keeping the less-tricky parts (diff, validation changes, UUID copying) if you agree.

@csmart
Copy link
Collaborator

csmart commented Sep 17, 2022

I think the idea of defaulting to standard libvirt (or existing) behaviour but extending it with flags sounds good. I am all for improving the way it works and making the module more flexible and useful, just gotta balance that with existing expected behaviour 👍

@Andersson007
Copy link
Contributor

@mlow hello, thanks for the PR, could you also please add a changelog fragment. Thanks!

@mlow
Copy link
Contributor Author

mlow commented Sep 19, 2022

I've added a changelog fragment (hopefully conforming to the conventions correctly)

I've changed the functionality of this PR since the last big discussion by adding the mutate_flags parameter and performing MAC address matching based on alias by default (with an additional ADD_MAC_ADDRESSES_FUZZY flag to also allow matching based on interface type and source).

If desired I will go ahead and shorten the commit log.

@csmart
Copy link
Collaborator

csmart commented Sep 22, 2022

Thanks @mlow, I will try to do some testing of this code over the next few days and get back to you.

@csmart
Copy link
Collaborator

csmart commented Sep 25, 2022

@mlow looking pretty good, thanks! I've done some test as you've outlined with the following playbook (as mutate_flags has ADD_UUID enabled by default I did not specify for initial testing). The ADD_MAC_ADDRESSES flag didn't copy the MAC address for an existing interface when re-defining the VM, but adding ADD_MAC_ADDRESSES_FUZZY worked - what did I misunderstand there?..

Here are the tests.

---
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Define a VM
      community.libvirt.virt:
        command: define
        xml: "{{ lookup('template', '/tmp/example-centos-8.xml') }}"
      register: result

    - name: Write old XML
      copy:
        content: "{{ result.diff.before }}"
        dest: /tmp/example-centos-8-before.xml
      delegate_to: localhost
      when: result.diff.before is defined

    - name: Write new XML
      copy:
        content: "{{ result.diff.after }}"
        dest: /tmp/example-centos-8-after.xml
      delegate_to: localhost
      when: result.diff.after is defined

Then playing with dropping the UUID and changing CPUs and it added the existing UUID and made the change correctly 👍

$ ansible-playbook /tmp/define-vm.yml -bK
BECOME password: 

PLAY [localhost] *******************

TASK [Define a VM] ************************
changed: [localhost]

TASK [Write old XML] *************************
changed: [localhost]

TASK [Write new XML] **************************
changed: [localhost]

PLAY RECAP ****************
localhost                  : ok=3    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

$ diff -Nurd /tmp/example-centos-8-before.xml /tmp/example-centos-8-after.xml 
--- /tmp/example-centos-8-before.xml    2022-09-25 17:46:56.491211553 +1000
+++ /tmp/example-centos-8-after.xml     2022-09-25 17:46:56.840218469 +1000
@@ -8,7 +8,7 @@
   </metadata>
   <memory unit='KiB'>2048576</memory>
   <currentMemory unit='KiB'>2048576</currentMemory>
-  <vcpu placement='static'>2</vcpu>
+  <vcpu placement='static'>4</vcpu>
   <resource>
     <partition>/machine</partition>
   </resource>

$ sed -i 's/>4</>2</g' /tmp/example-centos-8.xml 

$ ansible-playbook /tmp/define-vm.yml -bK
BECOME password: 

PLAY [localhost] *******************

TASK [Define a VM] ***********************
changed: [localhost]

TASK [Write old XML] *************************
changed: [localhost]

TASK [Write new XML] **************************
changed: [localhost]

PLAY RECAP ****************
localhost                  : ok=3    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

$ diff -Nurd /tmp/example-centos-8-before.xml /tmp/example-centos-8-after.xml 
--- /tmp/example-centos-8-before.xml    2022-09-25 17:47:16.684615331 +1000
+++ /tmp/example-centos-8-after.xml     2022-09-25 17:47:17.030622292 +1000
@@ -8,7 +8,7 @@
   </metadata>
   <memory unit='KiB'>2048576</memory>
   <currentMemory unit='KiB'>2048576</currentMemory>
-  <vcpu placement='static'>4</vcpu>
+  <vcpu placement='static'>2</vcpu>
   <resource>
     <partition>/machine</partition>
   </resource>

When specifying mutate_flags and leaving ADD_UUID off, it failed to add UUID from existing machine, which is expected behaviour 👍

  • Diff info is now supplied, showing the changes between previously and newly defined domain XML.

Confirmed that diff works as expected 👍

$ ansible-playbook /tmp/define-vm.yml -bK --diff
BECOME password: 

PLAY [localhost] *******************

TASK [Define a VM] ***********************
--- before
+++ after
@@ -8,7 +8,7 @@
   </metadata>
   <memory unit='KiB'>2048576</memory>
   <currentMemory unit='KiB'>2048576</currentMemory>
-  <vcpu placement='static'>2</vcpu>
+  <vcpu placement='static'>4</vcpu>
   <resource>
     <partition>/machine</partition>
   </resource>

changed: [localhost]

TASK [Write old XML] **************************
--- before: /tmp/example-centos-8-before.xml
+++ after: /home/csmart/.ansible/tmp/ansible-local-2519086s07m_gi5/tmppmnd3_kc
@@ -8,7 +8,7 @@
   </metadata>
   <memory unit='KiB'>2048576</memory>
   <currentMemory unit='KiB'>2048576</currentMemory>
-  <vcpu placement='static'>4</vcpu>
+  <vcpu placement='static'>2</vcpu>
   <resource>
     <partition>/machine</partition>
   </resource>

changed: [localhost]

TASK [Write new XML] **************************
--- before: /tmp/example-centos-8-after.xml
+++ after: /home/csmart/.ansible/tmp/ansible-local-2519086s07m_gi5/tmpxcmgn237
@@ -8,7 +8,7 @@
   </metadata>
   <memory unit='KiB'>2048576</memory>
   <currentMemory unit='KiB'>2048576</currentMemory>
-  <vcpu placement='static'>2</vcpu>
+  <vcpu placement='static'>4</vcpu>
   <resource>
     <partition>/machine</partition>
   </resource>

changed: [localhost]

PLAY RECAP ****************
localhost                  : ok=3    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

$ ansible-playbook /tmp/define-vm.yml -bK --diff
BECOME password: 

PLAY [localhost] ********************

TASK [Define a VM] ***********************
ok: [localhost]

TASK [Write old XML] *************************
skipping: [localhost]

TASK [Write new XML] ***************************
skipping: [localhost]

PLAY RECAP ****************
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0   
  • If a domain with the same name already exists and the incoming UUID is missing, apply the existing UUID to the incoming XML.

Confirmed by taking uuid out of the xml definition and re-running with a change worked as expected 👍

$ grep uuid /tmp/example-centos-8.xml

$ ansible-playbook /tmp/define-vm.yml -bK 
BECOME password: 

PLAY [localhost] *******************

TASK [Define a VM] ***********************
changed: [localhost]

TASK [Write old XML] *************************
changed: [localhost]

TASK [Write new XML] **************************
changed: [localhost]

PLAY RECAP ****************
localhost                  : ok=3    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

$ ansible-playbook /tmp/define-vm.yml -bK 
BECOME password: 

PLAY [localhost] *******************

TASK [Define a VM] ***********************
ok: [localhost]

TASK [Write old XML] *************************
skipping: [localhost]

TASK [Write new XML] **************************
skipping: [localhost]

PLAY RECAP ****************
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0   
  • If a UUID is defined in the incoming XML and it doesn't match that of an existing domain with the same name, exit with an error

Confirmed by adding a new uuid to xml and re-running.

$ sudo virsh dumpxml example-centos-8 |grep uuid
  <uuid>4bb6e384-9274-4531-8757-4b93f06a7360</uuid>

$ grep uuid /tmp/example-centos-8.xml
  <uuid>81de101c-23dc-4a50-94e5-f3c26e79b951</uuid>

$ ansible-playbook /tmp/define-vm.yml -bK 
BECOME password: 

PLAY [localhost] *******************

TASK [Define a VM] ***********************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "attempting to re-define domain example-centos-8/4bb6e384-9274-4531-8757-4b93f06a7360 with a different UUID: 81de101c-23dc-4a50-94e5-f3c26e79b951"}

PLAY RECAP ****************
localhost                  : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
  • For any incoming XML missing a element, we will attempt to fetch MAC of the matching interface from the existing domain's XML

Tested without mutate flags, confirm is as existing behaviour where the MAC is re-generated 👍

$ grep -A4 "type='network'" /tmp/example-centos-8.xml
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

$ ansible-playbook /tmp/define-vm.yml -bK 
BECOME password: 

PLAY [localhost] *******************

TASK [Define a VM] ***********************
changed: [localhost]

TASK [Write old XML] *************************
changed: [localhost]

TASK [Write new XML] ***************************
changed: [localhost]

PLAY RECAP ****************
localhost                  : ok=3    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

$ diff -Nurd /tmp/example-centos-8-before.xml /tmp/example-centos-8-after.xml 
--- /tmp/example-centos-8-before.xml    2022-09-25 18:05:30.484642645 +1000
+++ /tmp/example-centos-8-after.xml     2022-09-25 18:05:30.872650570 +1000
@@ -172,7 +172,7 @@
       <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
     </controller>
     <interface type='network'>
-      <mac address='52:54:00:ea:8a:35'/>
+      <mac address='52:54:00:65:ac:65'/>
       <source network='default'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>

Tested again with mutate flags:

        mutate_flags:
          - ADD_UUID
          - ADD_MAC_ADDRESSES

It did not seem to match the network interface.

$ sudo virsh dumpxml example-centos-8 |grep 'mac add'
      <mac address='52:54:00:0a:a2:ff'/>

$ grep -A4 "type='network'" /tmp/example-centos-8.xml
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

$ ansible-playbook /tmp/define-vm.yml -bK 
BECOME password: 

PLAY [localhost] *******************

TASK [Define a VM] ***********************
changed: [localhost]

TASK [Write old XML] *************************
changed: [localhost]

TASK [Write new XML] ***************************
changed: [localhost]

PLAY RECAP ****************
localhost                  : ok=3    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

$ diff -Nurd /tmp/example-centos-8-before.xml /tmp/example-centos-8-after.xml 
--- /tmp/example-centos-8-before.xml    2022-09-25 18:28:59.728155749 +1000
+++ /tmp/example-centos-8-after.xml     2022-09-25 18:29:00.110164172 +1000
@@ -172,7 +172,7 @@
       <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
     </controller>
     <interface type='network'>
-      <mac address='52:54:00:0a:a2:ff'/>
+      <mac address='52:54:00:7a:96:ba'/>
       <source network='default'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>

I tested with fuzzy enabled via ADD_MAC_ADDRESSES_FUZZY flag:

        mutate_flags:
          - ADD_UUID
          - ADD_MAC_ADDRESSES
          - ADD_MAC_ADDRESSES_FUZZY

It works as expected, the MAC is copied across 👍

$ sudo virsh dumpxml example-centos-8 |grep 'mac add'
      <mac address='52:54:00:7a:96:ba'/>

$ grep -A4 "type='network'" /tmp/example-centos-8.xml
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

$ ansible-playbook /tmp/define-vm.yml -bK 
BECOME password: 

PLAY [localhost] *******************

TASK [Define a VM] ***********************
ok: [localhost]

TASK [Write old XML] *************************
skipping: [localhost]

TASK [Write new XML] ***************************
skipping: [localhost]

PLAY RECAP ****************
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0   

@mlow
Copy link
Contributor Author

mlow commented Sep 25, 2022

@csmart
Thanks for the testing!

I had changed ADD_MAC_ADDRESSES since it was introduced to match interfaces based on a user-provided alias, such as:

<alias name='ua-my-alias'/>

(user aliases must have the ua- prefix)

And moved interface type/source matching to ADD_MAC_ADDRESSES_FUZZY (your test shows expected behavior here).

This is mentioned in the doc fragment. One thing that isn't, however, is that ADD_MAC_ADDRESSES_FUZZY implies alias-based matching as well (I felt it safe to assume that behavior would be expected but am questioning myself now...)

@csmart
Copy link
Collaborator

csmart commented Sep 27, 2022

@mlow ahh right! I feel like the alias testing might be a bit specific, perhaps fuzzy matching should be the default? If we do have alias matching, perhaps the prefix needs to be configurable?.. Let me do some more testing of prefix then and also some more complex network situations for fuzzy...

@mlow
Copy link
Contributor Author

mlow commented Sep 27, 2022

@mlow ahh right! I feel like the alias testing might be a bit specific, perhaps fuzzy matching should be the default? If we do have alias matching, perhaps the prefix needs to be configurable?.. Let me do some more testing of prefix then and also some more complex network situations for fuzzy...

I'm alright with either option!

Small note, the prefix is enforced by libvirt, not us.

Cheers

@csmart
Copy link
Collaborator

csmart commented Sep 27, 2022

OK that makes sense, thanks for that. If you can give me a bit more time to do some more testing and read over the code, that'd be great. We really appreciate the contribution, thanks!

@csmart
Copy link
Collaborator

csmart commented Sep 30, 2022

@mlow so doing a test with multiple NICs, each on a different network, and re-defining without the MAC specific works great!

However, testing two NICs on the same network and redefining it without MAC addresses (with all mutate_flags) causes a new MAC to be generated by libvirt.

While having two interfaces on the same network might be a less common scenario, I still think it's something we should consider. How do you think it might be best manage that?..

@mlow
Copy link
Contributor Author

mlow commented Oct 3, 2022

@csmart Hmm, I haven't been able to reproduce that. Testing with the playbook below, MAC addresses are not re-generated as expected:

---
- hosts: localhost
  become: true
  gather_facts: false
  tasks:
    - community.libvirt.virt:
        command: define
        mutate_flags: [ ADD_UUID, ADD_MAC_ADDRESSES_FUZZY ]
        xml: |
          <domain type="kvm">
            <name>test_vm</name>
            <memory unit="MiB">1024</memory>
            <vcpu placement="static">1</vcpu>
            <os>
              <type arch="x86_64" machine="q35">hvm</type>
            </os>
            <devices>
              <interface type="network">
                <source network="default"/>
                <model type="virtio"/>
              </interface>
              <interface type="network">
                <source network="default"/>
                <model type="virtio"/>
              </interface>
            </devices>
          </domain>

When ADD_MAC_ADDRESSES is used instead, MAC addresses are regenerated, unless <alias>es are added to the interfaces, e.g.:

---
- hosts: localhost
  become: true
  gather_facts: false
  tasks:
    - community.libvirt.virt:
        command: define
        mutate_flags: [ ADD_UUID, ADD_MAC_ADDRESSES ]
        xml: |
          <domain type="kvm">
            <name>test_vm</name>
            <memory unit="MiB">1024</memory>
            <vcpu placement="static">1</vcpu>
            <os>
              <type arch="x86_64" machine="q35">hvm</type>
            </os>
            <devices>
              <interface type="network">
                <source network="default"/>
                <model type="virtio"/>
                <alias name="ua-net1"/>
              </interface>
              <interface type="network">
                <source network="default"/>
                <model type="virtio"/>
                <alias name="ua-net2"/>
              </interface>
            </devices>
          </domain>

It's worth mentioning if you execute the two playbooks above back-to-back, you will see MAC addresses regenerate: on the second playbook invocation, the aliases do not yet exist on the domain, so matching cannot be performed (this can be avoided by doing a run with both ADD_MAC_ADDRESSES and ADD_MAC_ADDRESSES_FUZZY after adding aliases).

@csmart
Copy link
Collaborator

csmart commented Oct 3, 2022

Hey @mlow, thanks for your patience working through this! It's a good improvement but as it's changing behaviour I just want to make sure we cover as many bases as we can think of and get it as right as we can 🤞

I think it happens when I have one interface with a MAC address defined and the second one not. Looks like it will regenerate the second mac, but actually it takes the MAC of the first interface. I'm wondering if we might need to refactor that code a little to support multiple NICs? Maybe you can see if you can reproduce it?

Set up VM specifying the MACs, all OK:

$ cat ../test-vm.yml
---
- hosts: localhost
  become: true
  gather_facts: false
  tasks:
    - community.libvirt.virt:
        command: define
        mutate_flags:
          - ADD_UUID
          - ADD_MAC_ADDRESSES
          - ADD_MAC_ADDRESSES_FUZZY
        xml: |
          <domain type="kvm">
            <name>test_vm</name>
            <memory unit="MiB">1024</memory>
            <vcpu placement="static">1</vcpu>
            <os>
              <type arch="x86_64" machine="q35">hvm</type>
            </os>
            <devices>
              <interface type='network'>
                <mac address='52:54:00:02:48:1c'/>
                <source network='default'/>
                <model type='virtio'/>
              </interface>
              <interface type='network'>
                <mac address='52:54:00:9c:b2:c3'/>
                <source network='default'/>
                <model type='virtio'/>
              </interface>
            </devices>
          </domain>

$ ansible-playbook -i localhost, -c local ../test-vm.yml -bK 
BECOME password: 

PLAY [localhost] *******************

TASK [community.libvirt.virt] ********************************
ok: [localhost]

PLAY RECAP *****************
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

Check the MAC addresses on the interfaces.

$ sudo virsh dumpxml test_vm |grep 'mac '
      <mac address='52:54:00:02:48:1c'/>
      <mac address='52:54:00:9c:b2:c3'/>

Repeat test but with MAC missing from second interface, the MAC is duplicated from the first interface:

$ cat ../test-vm.yml
---
- hosts: localhost
  become: true
  gather_facts: false
  tasks:
    - community.libvirt.virt:
        command: define
        mutate_flags:
          - ADD_UUID
          - ADD_MAC_ADDRESSES
          - ADD_MAC_ADDRESSES_FUZZY
        xml: |
          <domain type="kvm">
            <name>test_vm</name>
            <memory unit="MiB">1024</memory>
            <vcpu placement="static">1</vcpu>
            <os>
              <type arch="x86_64" machine="q35">hvm</type>
            </os>
            <devices>
              <interface type='network'>
                <mac address='52:54:00:02:48:1c'/>
                <source network='default'/>
                <model type='virtio'/>
              </interface>
              <interface type='network'>
                <source network='default'/>
                <model type='virtio'/>
              </interface>
            </devices>
          </domain>


$ ansible-playbook -i localhost, -c local ../test-vm.yml -bK 
BECOME password: 

PLAY [localhost] *******************

TASK [community.libvirt.virt] ********************************
changed: [localhost]

PLAY RECAP ****************
localhost                  : ok=1    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

Check the interfaces:

$ sudo virsh dumpxml test_vm |grep 'mac '
      <mac address='52:54:00:02:48:1c'/>
      <mac address='52:54:00:02:48:1c'/>

I think it'd be great it we could either try and fix that up (perhaps counting the number of interfaces on that network?) or if we can't grab the right MAC, then we either re-generate with a new MAC (default-like behaviour) or error?

Which makes me wonder whether the flags should be hard? For example, if we have ADD_MAC_ADDRESSES_FUZZY set and we can't safely get it, do we error?

I guess I'm hesitant to take a path where we might have unexpected consequences - a user expects it to find the right MAC but it silently generates a new one, or such...

What do you think?

-c

@mlow
Copy link
Contributor Author

mlow commented Oct 3, 2022

@csmart Ahh, nice catch! I didn't consider a user might supply some but not all MAC addresses. What's happening is when we search for a match to the second interface, we fuzzy match against the first already-defined interface, which corresponds to the first interface in our supplied XML (thus we end up re-using its MAC)

The fix I've comitted is to avoid matching against any already-defined interfaces that have a MAC address contained in our user-supplied XML (by simply removing them as candidates before we perform the matching).

FWIW I appreciate your effort reviewing and testing this! I don't mind things taking a while if it leads to a better quality end result :)

@mlow
Copy link
Contributor Author

mlow commented Oct 3, 2022

There's an interesting behavior present now, but I'm not sure how much it's worth trying to fix given how strange of an edge case is:

Assume domain test_vm already exists with two interfaces on the same network with unique MAC addresses already.

If you re-define the domain with the an explicit MAC for the first interface (that differs from the already defined MAC), then the existing MAC address of the first interface will end up getting used for the second interface. E.g.:

$ sudo virsh dumpxml test_vm --xpath '//devices/interface/mac'
<mac address="52:54:00:65:6b:07"/>
<mac address="52:54:00:b9:6d:fe"/>

$ cat ./test_vm.yml 
---
- hosts: localhost
  become: true
  gather_facts: false
  tasks:
    - community.libvirt.virt:
        command: define
        mutate_flags:
          - ADD_UUID
          - ADD_MAC_ADDRESSES
          - ADD_MAC_ADDRESSES_FUZZY
        xml: |
          <domain type="kvm">
            <name>test_vm</name>
            <memory unit="MiB">1024</memory>
            <vcpu placement="static">1</vcpu>
            <os>
              <type arch="x86_64" machine="q35">hvm</type>
            </os>
            <devices>
              <interface type="network">
                <mac address="52:54:00:f0:36:05"/>
                <source network="default"/>
                <model type="virtio"/>
              </interface>
              <interface type="network">
                <source network="default"/>
                <model type="virtio"/>
              </interface>
            </devices>
          </domain>

$ ansible-playbook test_vm.yml -KD
BECOME password: 

PLAY [localhost] ***************************************************************************************************

TASK [community.libvirt.virt] **************************************************************************************
--- before
+++ after
@@ -50,13 +50,13 @@
       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
     </controller>
     <interface type='network'>
-      <mac address='52:54:00:65:6b:07'/>
+      <mac address='52:54:00:f0:36:05'/>
       <source network='default'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
     </interface>
     <interface type='network'>
-      <mac address='52:54:00:b9:6d:fe'/>
+      <mac address='52:54:00:65:6b:07'/>
       <source network='default'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>

changed: [127.0.0.1]

PLAY RECAP *********************************************************************************************************
127.0.0.1                  : ok=1    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

$ sudo virsh dumpxml test_vm --xpath '//devices/interface/mac'
<mac address="52:54:00:f0:36:05"/>
<mac address="52:54:00:65:6b:07"/>

(What used to be the MAC for the first interface is now the MAC for the second interface).

This is only possible if both of of these conditions are true:

  • you're attaching multiple interfaces to the same network
  • explicitly defining MACs for some but not all of your interfaces

It does not lead to duplicate MAC assignments (which may actually cause issues), but it does lead to potentially unexpected changes in the MAC address of an interface.

My feeling is that if a user is seriously concerned about the MAC addresses their interfaces have, they will explicitly define them (or at the very least use the more explicit alias matching). But I think it's worth adding a note to the documentation to take care when using ADD_MAC_ADDRESSES_FUZZY - essentially that it could lead to MAC-address reassignment where not expected.

@csmart
Copy link
Collaborator

csmart commented Oct 5, 2022

@mlow my feeling is that it would be better if we could try and fix these issues, as we can see the behaviour isn't right and we don't know what it might affect down the line... Do you think we could we do something like count the interfaces as we're looping through them? Becasue we're defining the VM with XML, it's all or nothing. I.e. if the XML we define has one network, the resulting VM will end up with one network, any others will be deleted and the first one kept. If we have 4, three mode will be added. If we remove the third network, it's just going to drop the 4th one to make 3.

Taking your VM definition above, if we had this:

$ sudo virsh dumpxml test_vm --xpath '//devices/interface/mac'
<mac address="52:54:00:65:6b:07"/>
<mac address="52:54:00:b9:6d:fe"/>

And you define networks like this:

            <devices>
              <interface type="network">
                <mac address="52:54:00:f0:36:05"/>
                <source network="default"/>
                <model type="virtio"/>
              </interface>
              <interface type="network">
                <source network="default"/>
                <model type="virtio"/>
              </interface>

Then as it's looping it would make the first NIC 52:54:00:f0:36:05 as it's defined in the XML, then when it gets to the second NIC in the XML interface it have no MAC defined in the XML, but the second NIC in the already defined VM will make it 52:54:00:b9:6d:fe.

I feel like this would probably cover all bases on expected behaviour... What do you think?

@mlow
Copy link
Contributor Author

mlow commented Oct 5, 2022

@csmart

Counting interfaces (trying to match nth interface in incoming XML with nth interface in existing domain) indeed will solve the case I demonstrated above - the second interface would end up keeping its MAC address, which is what a user would expect.

Considering this seems relatively straightforward to implement, I'll have a go at it 👍

Some "unexpected" behavior may still occur; consider a domain having 4 interfaces, and we remove the 3rd interface from the incoming XML (which had no explicit <mac>s or <alias>es). There's no way for us to know that it was the 3rd interface removed. When we perform the matching, we'd end up scraping the first 3 MAC addresses from the domain, and ultimately the 4th interface would be dropped. Some users might consider that unexpected, some might not (IMO the FUZZY should get most users to not expect miracles)

@csmart
Copy link
Collaborator

csmart commented Oct 6, 2022

@csmart

Counting interfaces (trying to match nth interface in incoming XML with nth interface in existing domain) indeed will solve the case I demonstrated above - the second interface would end up keeping its MAC address, which is what a user would expect.

Considering this seems relatively straightforward to implement, I'll have a go at it +1

That would be great, thanks @mlow! 👌

Some "unexpected" behavior may still occur; consider a domain having 4 interfaces, and we remove the 3rd interface from the incoming XML (which had no explicit <mac>s or <alias>es). There's no way for us to know that it was the 3rd interface removed. When we perform the matching, we'd end up scraping the first 3 MAC addresses from the domain, and ultimately the 4th interface would be dropped. Some users might consider that unexpected, some might not (IMO the FUZZY should get most users to not expect miracles)

I think the scenario you laid out there is probably OK - I'm pretty sure that's how the current libvirt will work anyway (in terms of how it drops interfaces).

I agree with you - I think that we should probably make it clear in the docs that using FUZZY is not guaranteed to be perfect, so that it's clear if people want to be sure, they should use aliases or always specify MACs. Maybe we need to flesh that out a little bit more in the docs to make it clear?

Anyway, I'm pretty excited for your changes, I think they'll make the module much better to use, so thanks again! I'm looking forward to merging it 😃

@mlow
Copy link
Contributor Author

mlow commented Oct 9, 2022

@csmart

Made the attempt! I'm not quite as happy with the outcome as what I was hoping for, but I think even I am trying to make FUZZY work better than is reasonable. Things can still get weird when "some but not all" interface MACs are defined, and you start adding/removing/re-ordering interfaces. But I think it works "well enough", and am certainly at a point where I don't want to look at this code anymore... [Edit: to clarify, the code for fuzzy MAC address matching]

I look forward to hearing what you think :)

From libvirt docs (https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainDefineXML):
-- A previous definition for this domain with the same UUID and name would
be overridden if it already exists.

Before this commit, if a definition  was given without a UUID, the
appropriate domain would be created with a libvirt-generated UUID. If
the definition was modified (with UUID still left out), the module would
quietly exit with `changes: False` and make no changes (masking a
libvirt error that a domain with the same name already existed).

If a UUID was given, the module would work (mostly) as expected and
update the domain's definition. However, if the <uuid> did not match the
<uuid> of an existing domain with the same name, the module would
quietly exit without making changes again.

I say (mostly) as expected, because it was still possible to create
non-idempotent XML definitions. For example, if network interface MAC
addresses were left out of the XML definition, libvirt would generate
entirely new ones; probably not what most users would expect.

To summarize the changes of this commit:
- Incoming XML is checked for a UUID
- A domain with the same name checked for
- If a UUID is defined in the incoming XML and it doesn't match that of
  an existing domain with the same name, exit with an error
- Add a `mutate_flags` parameter to `virt`:
  - Copy the <uuid> of an existing domain when the `ADD_UUID` mutate
    flag is supplied and the incoming XML is missing a <uuid>.
  - Attempt to reuse the MAC address of an existing interface having the
    same <alias>, when the `ADD_MAC_ADDRESSES` mutate flag is supplied.
  - Attempt to reuse the MAC address of an existing interface of similar
    type and source when the `ADD_MAC_ADDRESSES_FUZZY` mutate flag is
    supplied.
- Diff info is supplied, showing the changes between previously and
  newly defined domain XML.

Note: it is still possible to apply non-idempotent libvirt definitions
with this module. However, the support for diffing makes it easier to
find those and fix the problems your definitions (or request support in
this module to handle those cases where feasible).
@csmart
Copy link
Collaborator

csmart commented Oct 10, 2022

@mlow I've re-done my testing and I think you've done a great job 👍 Thanks so much for this contribution! Unless you can think of anything else, I'm happy to merge it.

@mlow
Copy link
Contributor Author

mlow commented Oct 12, 2022

Thanks , @csmart
I think I'm happy where this PR is. I'd like to contribute some more to community.libvirt in other PRs, mainly adding some tests for this new behaviour and towards splitting up virt as I talk about in #144 (which leads in to another new feature, ability to fetch domain interface IPs)

@csmart
Copy link
Collaborator

csmart commented Oct 12, 2022

Sounds great 👍 I will merge this code in, thanks again!

@csmart csmart merged commit c4fe158 into ansible-collections:main Oct 12, 2022
@mlow mlow deleted the virt-define-improvements branch October 13, 2022 05:06
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Oct 12, 2023
8.5.0

amazon.aws
~~~~~~~~~~

- ec2_ami - add support for ``org_arns`` and ``org_unit_arns`` in launch_permissions (ansible-collections/amazon.aws#1690).
- elb_application_lb_info - drop redundant ``describe_load_balancers`` call fetching ``ip_address_type`` (ansible-collections/amazon.aws#1768).

community.general
~~~~~~~~~~~~~~~~~

- cargo - add option ``executable``, which allows user to specify path to the cargo binary (ansible-collections/community.general#7352).
- cargo - add option ``locked`` which allows user to specify install the locked version of dependency instead of latest compatible version (ansible-collections/community.general#6134).
- dig lookup plugin - add TCP option to enable the use of TCP connection during DNS lookup (ansible-collections/community.general#7343).
- gitlab_group - add option ``force_delete`` (default: false) which allows delete group even if projects exists in it (ansible-collections/community.general#7364).
- ini_file - add ``ignore_spaces`` option (ansible-collections/community.general#7273).
- newrelic_deployment - add option ``app_name_exact_match``, which filters results for the exact app_name provided (ansible-collections/community.general#7355).
- onepassword lookup plugin - introduce ``account_id`` option which allows specifying which account to use (ansible-collections/community.general#7308).
- onepassword_raw lookup plugin - introduce ``account_id`` option which allows specifying which account to use (ansible-collections/community.general#7308).
- parted - on resize, use ``--fix`` option if available (ansible-collections/community.general#7304).
- pnpm - set correct version when state is latest or version is not mentioned. Resolves previous idempotency problem (ansible-collections/community.general#7339).
- proxmox - add ``vmid`` (and ``taskid`` when possible) to return values (ansible-collections/community.general#7263).
- random_string - added new ``ignore_similar_chars`` and ``similar_chars`` option to ignore certain chars (ansible-collections/community.general#7242).
- redfish_command - add new option ``update_oem_params`` for the ``MultipartHTTPPushUpdate`` command (ansible-collections/community.general#7331).
- redfish_config - add ``CreateVolume`` command to allow creation of volumes on servers (ansible-collections/community.general#6813).
- redfish_config - adding ``SetSecureBoot`` command (ansible-collections/community.general#7129).
- redfish_info - add support for ``GetBiosRegistries`` command (ansible-collections/community.general#7144).
- redfish_info - adds ``LinkStatus`` to NIC inventory (ansible-collections/community.general#7318).
- redis_info - refactor the redis_info module to use the redis module_utils enabling to pass TLS parameters to the Redis client (ansible-collections/community.general#7267).
- supervisorctl - allow to stop matching running processes before removing them with ``stop_before_removing=true`` (ansible-collections/community.general#7284).

community.libvirt
~~~~~~~~~~~~~~~~~

- virt - add `mutate_flags` parameter to enable XML mutation (add UUID, MAC addresses from existing domain) (ansible-collections/community.libvirt#142).
- virt - support ``--diff`` for ``define`` command (ansible-collections/community.libvirt#142).

community.routeros
~~~~~~~~~~~~~~~~~~

- api_info - add new ``include_read_only`` option to select behavior for read-only values. By default these are not returned (ansible-collections/community.routeros#213).
- api_info, api_modify - add support for ``address-list`` and ``match-subdomain`` introduced by RouterOS 7.7 in the ``ip dns static`` path (ansible-collections/community.routeros#197).
- api_info, api_modify - add support for ``user``, ``time`` and ``gmt-offset`` under the ``system clock`` path (ansible-collections/community.routeros#210).
- api_info, api_modify - add support for the ``interface ppp-client`` path (ansible-collections/community.routeros#199).
- api_info, api_modify - add support for the ``interface wireless`` path (ansible-collections/community.routeros#195).
- api_info, api_modify - add support for the ``iot modbus`` path (ansible-collections/community.routeros#205).
- api_info, api_modify - add support for the ``ip dhcp-server option`` and ``ip dhcp-server option sets`` paths (ansible-collections/community.routeros#223).
- api_info, api_modify - add support for the ``ip upnp interfaces``, ``tool graphing interface``, ``tool graphing resource`` paths (ansible-collections/community.routeros#227).
- api_info, api_modify - add support for the ``ipv6 firewall nat`` path (ansible-collections/community.routeros#204).
- api_info, api_modify - add support for the ``mode`` property in ``ip neighbor discovery-settings`` introduced in RouterOS 7.7 (ansible-collections/community.routeros#198).
- api_info, api_modify - add support for the ``port remote-access`` path (ansible-collections/community.routeros#224).
- api_info, api_modify - add support for the ``routing filter rule`` and ``routing filter select-rule`` paths (ansible-collections/community.routeros#200).
- api_info, api_modify - add support for the ``routing table`` path in RouterOS 7 (ansible-collections/community.routeros#215).
- api_info, api_modify - add support for the ``tool netwatch`` path in RouterOS 7 (ansible-collections/community.routeros#216).
- api_info, api_modify - add support for the ``user settings`` path (ansible-collections/community.routeros#201).
- api_info, api_modify - add support for the ``user`` path (ansible-collections/community.routeros#211).
- api_info, api_modify - finalize fields for the ``interface wireless security-profiles`` path and enable it (ansible-collections/community.routeros#203).
- api_info, api_modify - finalize fields for the ``ppp profile`` path and enable it (ansible-collections/community.routeros#217).
- api_modify - add new ``handle_read_only`` and ``handle_write_only`` options to handle the module's behavior for read-only and write-only fields (ansible-collections/community.routeros#213).
- api_modify, api_info - support API paths ``routing id``, ``routing bgp connection`` (ansible-collections/community.routeros#220).

community.vmware
~~~~~~~~~~~~~~~~

- add moid property in the return value for the module(ansible-collections/community.vmware#1855).
- add new snapshot_id option to the vmware_guest_snapshot module(ansible-collections/community.vmware#1847).

dellemc.powerflex
~~~~~~~~~~~~~~~~~

- Added Ansible role to support installation and uninstallation of Gateway.
- Added Ansible role to support installation and uninstallation of SDR.
- Added Ansible role to support installation and uninstallation of Web UI.

grafana.grafana
~~~~~~~~~~~~~~~

- Add check for Curl and failure step if Agent Version is not retrieved
- Allow alert resource provisioning in Grafana Role
- Bump cryptography from 39.0.2 to 41.0.3
- Bump cryptography from 41.0.3 to 41.0.4
- Bump semver from 5.7.1 to 5.7.2
- Bump word-wrap from 1.2.3 to 1.2.5
- Create local dashboard directory in check mode
- Create missing notification directory in Grafana Role
- Remove check_mode from create local directory task in Grafana Role
- Remove dependency on local-fs.target from Grafana Agent role
- Update CI Testing
- Update Cloud Stack Module failures
- Use 'ansible_system' env variable to detect os typ in Grafana Agent Role
- hange grafana Agent Wal and Positions Directory in Grafana Agent Role

ovirt.ovirt
~~~~~~~~~~~

- ovirt_vm - Add tpm_enabled (oVirt/ovirt-ansible-collection#722).
- storage_error_resume_behaviour - Support VM storage error resume behaviour "auto_resume", "kill", "leave_paused". (oVirt/ovirt-ansible-collection#721)
- vm_infra - Support boot disk renaming and resizing. (oVirt/ovirt-ansible-collection#705)

purestorage.flashblade
~~~~~~~~~~~~~~~~~~~~~~

- purefb_bucket_replica - Added support for cascading replica links
- purefb_info - New fields to display free space (remaining quota) for Accounts and Buckets. Space used by destroyed buckets is split out from virtual field to new destroyed_virtual field
- purefb_info - Report encryption state in SMB client policy rules
- purefb_info - Report more detailed space data from Purity//FB 4.3.0
- purefb_policy - Add deny effect for object store policy rules. Requires Purity//FB 4.3.0+
- purefb_policy - Added parameter to define object store policy description

vultr.cloud
~~~~~~~~~~~

- inventory - Added VPC/VPC 2.0 support by adding ``internal_ip`` to the attributes (vultr/ansible-collection-vultr#86).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants