
Libvirt dhcp lease fix #613

Closed
wants to merge 15 commits into from


NotBrianZach
Contributor

@NotBrianZach NotBrianZach commented Feb 27, 2017

Allows mutating the global state of a deployment once before stopping multiple machines (which could otherwise run into multithreading problems), by defining a method with the fixed name "_globalPreDestroyHook" in any given backend. In this particular case it is useful with libvirt to restart the virtual network your VMs were running on before destroying them, since the only way to quickly reassign an old hostname to a new IP in DHCP was to mutate the network's state with virsh:

    ["virsh", "-c", "qemu:///system",
     "net-update", net[0], "add",
     "ip-dhcp-host",
     "<host mac='{0}' name='{1}' ip='{2}' />".format(
         net[1], self.name, ip),
     "--live"
     ]),

An example of the problem this solves is here.
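A minimal sketch of the ordering this PR aims for, written in Python 3 (so `values()` stands in for Python 2's `itervalues()`). The `Machine` class and `destroy_all` function are hypothetical stand-ins, not nixops code, and `ThreadPoolExecutor` is only an assumption about how the parallel destroys might be run. The point is that the network-wide hook runs once, serially, before the per-machine destroys start in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class Machine:
    """Hypothetical stand-in for a nixops libvirt machine state object."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def _globalPreDestroyHook(self):
        # Network-wide mutation (e.g. restarting the libvirt network).
        # Running it once, before any threads start, avoids races on it.
        self.log.append("hook")

    def destroy(self):
        self.log.append("destroy " + self.name)
        return True


def destroy_all(machines):
    """Run the backend's global hook once, then destroy machines in parallel."""
    first = next(iter(machines.values()), None)
    hook = getattr(first, "_globalPreDestroyHook", None)
    if callable(hook):
        hook()
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda m: m.destroy(), machines.values()))
```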

@danbst
Contributor

danbst commented Mar 5, 2017

I accidentally stopped the network in virsh, then ran deploy and got:

...
backup> starting...
live..> starting...
backup> error: Failed to create domain from /tmp/nixops-tmp8OYpHW/backup-domain.xml
backup> error: Requested operation is not valid: network 'default' is not active
backup>
live..> error: Failed to create domain from /tmp/nixops-tmp8OYpHW/live-domain.xml
live..> error: Requested operation is not valid: network 'default' is not active
live..>
error: Multiple exceptions: command ‘['virsh', '-c', 'qemu:///system', 'create', '/tmp/nixops-tmp8OYpHW/backup-domain.xml']’ failed on machine ‘backup’ (exit code 1), command ‘['virsh', '-c', 'qemu:///system', 'create', '/tmp/nixops-tmp8OYpHW/live-domain.xml']’ failed on machine ‘live’ (exit code 1)

Then I enabled the network again and ran destroy:

$ ./nixops/scripts/nixops destroy
live..> running globalPreStopHook
error: failed to get domain 'nixops-ad4ce430-01a2-11e7-a68b-0a2c52343f13-live'
error: Domain not found: no domain with matching name 'nixops-ad4ce430-01a2-11e7-a68b-0a2c52343f13-live'
error: Command '['virsh', '-c', 'qemu:///system', 'dumpxml', u'nixops-ad4ce430-01a2-11e7-a68b-0a2c52343f13-live']' returned non-zero exit status 1

Update: this may be unrelated to this PR.

@NotBrianZach
Contributor Author

I looked at it for a couple of seconds and, without further info, can't see how this PR would cause it.

For the first phase, I don't believe this PR makes any assumption about the network name.

For the second phase:

        globalPreDestroyHook = getattr(next(self.active.itervalues(), None), "_globalPreDestroyHook", None)
        if callable(globalPreDestroyHook):
            globalPreDestroyHook()

If self.active.itervalues() yields no values, next() returns its default None, so getattr() also returns None; None is not callable, so the hook shouldn't be triggered.
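The reasoning above can be checked with a small Python 3 sketch of the same lookup; `find_global_hook`, `run_global_hook`, `WithHook` and `WithoutHook` are hypothetical names used only for illustration.

```python
class WithHook:
    def _globalPreDestroyHook(self):
        return "ran"


class WithoutHook:
    pass


def find_global_hook(active):
    # next() falls back to None when `active` is empty, and
    # getattr(None, ..., None) falls back to None as well.
    first = next(iter(active.values()), None)
    return getattr(first, "_globalPreDestroyHook", None)


def run_global_hook(active):
    hook = find_global_hook(active)
    if callable(hook):  # callable(None) is False, so nothing runs
        return hook()
    return None
```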

Certainly, mutating state outside of nixops with virsh could get you into trouble fairly quickly, and nixops still seems pretty "under development" for this backend anyway ;P

@danbst
Contributor

danbst commented Mar 5, 2017

I ended up wrapping the globalPreDestroyHook call in try ... except to work around this.
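A sketch of that kind of wrapper, assuming the hook raises `subprocess.CalledProcessError` when a virsh call fails (as in the `dumpxml` error above). `run_hook_safely` is a hypothetical helper, and catching only `CalledProcessError` is a choice made here, not necessarily what the workaround did.

```python
import logging
import subprocess


def run_hook_safely(hook, log=logging.getLogger("nixops")):
    """Run a pre-destroy hook without letting a failed virsh call abort destroy."""
    if not callable(hook):
        return False
    try:
        hook()
        return True
    except subprocess.CalledProcessError as e:
        # e.g. `virsh dumpxml` failing because the domain no longer exists
        log.warning("global pre-destroy hook failed: %s", e)
        return False
```

Catching a narrow exception type keeps unrelated bugs in the hook visible instead of silently swallowing them.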

        self._logged_exec(["virsh", "-c", "qemu:///system", "net-start", net])

    def _globalPreDestroyHook(self):
        self.log("running globalPreStopHook")
Contributor


This should say "running globalPreDestroyHook".

@danbst
Contributor

danbst commented Mar 6, 2017

Although you do destroy the network, it doesn't seem to fix the problem for me. I have:

  ...
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
      <host mac='52:54:00:6d:87:28' name='live' ip='192.168.122.52'/>
      <host mac='52:54:00:5c:2f:8c' name='backup' ip='192.168.122.169'/>
    </dhcp>
  </ip>

and running nixops destroy doesn't remove those custom <host> lines (which, by the way, were added by NixOps before #586, not manually). Is this the kind of problem you wanted to address, or is this intended in #586?
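Leftover `<host>` entries like these can be listed from the network definition (for example the output of `virsh -c qemu:///system net-dumpxml default`, or with `--inactive` for the persistent config). The sketch below uses Python's standard `xml.etree.ElementTree`; `dhcp_hosts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def dhcp_hosts(network_xml):
    """Return (mac, name, ip) for each static DHCP <host> in a libvirt network."""
    root = ET.fromstring(network_xml)
    return [
        (h.get("mac"), h.get("name"), h.get("ip"))
        for h in root.findall("./ip/dhcp/host")
    ]
```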

@NotBrianZach
Contributor Author

NotBrianZach commented Mar 6, 2017

If the virsh command were this:

    ["virsh", "-c", "qemu:///system",
     "net-update", net[0], "add",
     "ip-dhcp-host",
     "<host mac='{0}' name='{1}' ip='{2}' />".format(
         net[1], self.name, ip),
     "--live", "--config"
     ]),

then the changes would persist after the network was destroyed.

With only --live, the changes do not persist after the network has been destroyed:

    ["virsh", "-c", "qemu:///system",
     "net-update", net[0], "add",
     "ip-dhcp-host",
     "<host mac='{0}' name='{1}' ip='{2}' />".format(
         net[1], self.name, ip),
     "--live"
     ]),

This pull request uses the second method. However, if you have added lines via the first method (or a similar technique that isn't a live-only update), they will persist unless you delete them with virsh net-update ... delete.
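The difference between the two variants comes down to one flag, so a hypothetical helper that builds the argument list might look like the sketch below. It also covers the `delete` form of `virsh net-update` for removing entries that were persisted with `--config`; the function name and signature are assumptions, not nixops code.

```python
def net_update_host_cmd(network, mac, name, ip, action="add", persist=False):
    """Build a `virsh net-update` argv for one ip-dhcp-host entry.

    With only --live the change affects the running network and is lost when
    the network is destroyed; --config also writes it to the persistent
    definition, so it survives a network restart.
    """
    if action not in ("add", "delete"):
        raise ValueError("action must be 'add' or 'delete'")
    host = "<host mac='{0}' name='{1}' ip='{2}' />".format(mac, name, ip)
    cmd = ["virsh", "-c", "qemu:///system",
           "net-update", network, action, "ip-dhcp-host", host, "--live"]
    if persist:
        cmd.append("--config")
    return cmd
```

The resulting list could be passed to `subprocess.check_call`, or inside a backend to the `self._logged_exec` method used elsewhere in this PR.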

@danbst
Copy link
Contributor

danbst commented Mar 6, 2017

So, this doesn't work with multiple deployments? If I destroy one deployment, network access to VMs in another deployment will be lost.

@NotBrianZach
Copy link
Contributor Author

NotBrianZach commented Mar 6, 2017 via email

@NotBrianZach
Copy link
Contributor Author

NotBrianZach commented Mar 6, 2017

I did a little testing: if you use the same physical specification (infrastructure-libvirt.nix), it won't be able to create two deployments (since they'll have the same hostnames). Also, it did lose connection (I only verified that it lost the SSH connection) after destroying a deployment on the same network.

I haven't tested the network workaround.

I'll probably still use this personally, but I think it would be unexpected behavior in the broader tool.

That said, I do think this feature could be provided in a reliable, compatible way by automatically generating network names or some similar scheme.
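One possible naming scheme, sketched under assumptions: derive the libvirt network name from the deployment UUID (the same UUID that appears in domain names such as `nixops-ad4ce430-01a2-11e7-a68b-0a2c52343f13-live` above) and give each deployment its own NAT subnet. `deployment_network_name` and `network_xml` are hypothetical helpers; truncating the UUID to 8 hex characters keeps names short but makes collisions possible, and picking a free subnet is left out.

```python
import uuid


def deployment_network_name(deployment_uuid, prefix="nixops"):
    """Derive a libvirt network name unique to one deployment (hypothetical scheme)."""
    u = uuid.UUID(str(deployment_uuid))
    return "{0}-{1}".format(prefix, u.hex[:8])


def network_xml(name, third_octet):
    """Minimal NAT network definition on 192.168.<third_octet>.0/24."""
    if not 0 <= third_octet <= 255:
        raise ValueError("third_octet must be in 0..255")
    base = "192.168.{0}".format(third_octet)
    return (
        "<network>"
        "<name>{0}</name>"
        "<forward mode='nat'/>"
        "<ip address='{1}.1' netmask='255.255.255.0'>"
        "<dhcp><range start='{1}.2' end='{1}.254'/></dhcp>"
        "</ip>"
        "</network>"
    ).format(name, base)
```

Such a definition could then be registered with `virsh net-define` and started with `virsh net-start`, so destroying one deployment's network would not touch another's.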

@danbst
Copy link
Contributor

danbst commented Mar 7, 2017

Strangely enough, if I define

    networking.bridges.br1.interfaces = [];
    networking.interfaces.br1.ip4 = [ { address = "192.168.5.1"; prefixLength = 24; } ];

in the guest's configuration, it publishes its hostname to libvirt's dnsmasq. If no bridges are present, DHCP leases are not updated (the hostname "nixos" is assigned to all machines in the network).

If you could verify that with these client configs you don't need to patch the XML, I can dig deeper into the issue.

I tested this with nixops destroy && nixops deploy and monitored DHCP leases with:

$ journalctl SYSLOG_FACILITY=3 -pdebug -f | grep dnsmasq

I used nixos-17.03.

@NotBrianZach
Copy link
Contributor Author

NotBrianZach commented Mar 7, 2017

I added these to the logical specification (network.nix) and commented out the live patching of the network XML:

    networking.bridges.br1.interfaces = [];
    networking.interfaces.br1.ip4 = [ { address = "192.168.122.68"; prefixLength = 24; } ];
    networking.bridges.br1.interfaces = [];
    networking.interfaces.br1.ip4 = [ { address = "192.168.122.57"; prefixLength = 24; } ];

Following that, when deploying, the new leases did have the appropriate names (instead of nixos), as you say:

sudo journalctl SYSLOG_FACILITY=3 -pdebug -f | grep dnsmasq
Mar 07 12:21:49 zachMothership dnsmasq-dhcp[9009]: DHCPREQUEST(virbr0) 192.168.122.189 52:54:00:a7:a7:b9
Mar 07 12:21:49 zachMothership dnsmasq-dhcp[9009]: DHCPACK(virbr0) 192.168.122.189 52:54:00:a7:a7:b9 postgresqlServer
Mar 07 12:21:50 zachMothership dnsmasq-dhcp[9009]: DHCPREQUEST(virbr0) 192.168.122.7 52:54:00:9a:65:02
Mar 07 12:21:50 zachMothership dnsmasq-dhcp[9009]: DHCPACK(virbr0) 192.168.122.7 52:54:00:9a:65:02 apiServer

virsh -c qemu:///system net-dhcp-leases default

Expiry Time          MAC address        Protocol  IP address                Hostname        Client ID or DUID
-------------------------------------------------------------------------------------------------------------------
2017-03-07 12:34:44  52:54:00:09:06:8d  ipv4      192.168.122.227/24        -               -
2017-03-07 12:49:44  52:54:00:0c:2a:c4  ipv4      192.168.122.220/24        nixos           -
2017-03-07 12:34:44  52:54:00:0f:1a:af  ipv4      192.168.122.188/24        -               -
2017-03-07 13:04:21  52:54:00:15:bd:16  ipv4      192.168.122.138/24        nixos           -
2017-03-07 13:19:00  52:54:00:19:95:72  ipv4      192.168.122.52/24         apiServer       -
2017-03-07 13:11:41  52:54:00:27:b0:97  ipv4      192.168.122.105/24        apiServer       -
2017-03-07 13:11:40  52:54:00:33:53:16  ipv4      192.168.122.121/24        nixos           -
2017-03-07 12:45:09  52:54:00:52:1c:73  ipv4      192.168.122.117/24        postgresqlServer -
2017-03-07 12:43:41  52:54:00:6f:b2:9f  ipv4      192.168.122.64/24         -               -
2017-03-07 12:49:34  52:54:00:7e:f8:0a  ipv4      192.168.122.34/24         nixos           -
2017-03-07 13:21:50  52:54:00:9a:65:02  ipv4      192.168.122.7/24          apiServer       -
2017-03-07 13:04:21  52:54:00:a5:04:ba  ipv4      192.168.122.5/24          postgresqlServer -
2017-03-07 13:21:49  52:54:00:a7:a7:b9  ipv4      192.168.122.189/24        nixos           -
2017-03-07 12:59:36  52:54:00:b9:6e:47  ipv4      192.168.122.33/24         apiServer       -
2017-03-07 13:18:58  52:54:00:bb:55:69  ipv4      192.168.122.124/24        nixos           -
2017-03-07 12:57:36  52:54:00:e1:3a:89  ipv4      192.168.122.99/24         nixos           -
2017-03-07 12:46:24  52:54:00:e8:b6:c7  ipv4      192.168.122.29/24         apiServer       -
2017-03-07 12:43:52  52:54:00:f0:2e:5e  ipv4      192.168.122.90/24         -               -
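The table shows several leases per hostname, left over from earlier deploys. A hypothetical helper like the one below keeps, for each hostname, the lease with the latest expiry time. It assumes the whitespace-separated column layout shown above and skips rows whose hostname is `-`; treating the latest expiry as the current lease is also an assumption.

```python
def latest_leases(leases_output):
    """Map hostname -> (expiry, mac, ip), keeping the lease with the latest expiry."""
    latest = {}
    for line in leases_output.splitlines():
        fields = line.split()
        # Data rows start with a "YYYY-MM-DD HH:MM:SS" expiry; skip header/rule lines.
        if len(fields) < 7 or not fields[0][:4].isdigit():
            continue
        expiry = fields[0] + " " + fields[1]  # sorts correctly as a string
        mac, ip, hostname = fields[2], fields[4].split("/")[0], fields[5]
        if hostname == "-":
            continue
        if hostname not in latest or expiry > latest[hostname][0]:
            latest[hostname] = (expiry, mac, ip)
    return latest
```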

However, the deploy hangs when starting nscd.service.
Also, I am on NixOS version 16.09.663.3dc0897 (Flounder).

@danbst
Copy link
Contributor

danbst commented Mar 7, 2017

networking.interfaces.br1.ip4 = [ { address = "192.168.122.68"; prefixLength = 24; } ];

Maybe I was a bit unclear here, but "192.168.5.1" was an arbitrary IP chosen so as not to clash with libvirtd's 192.168.122.0/24. I don't have problems with nscd on 16.09.
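Whether a bridge address clashes with libvirt's default 192.168.122.0/24 network can be checked with Python 3's standard `ipaddress` module; `clashes_with` is a hypothetical helper.

```python
import ipaddress


def clashes_with(address_with_prefix, network="192.168.122.0/24"):
    """True if an interface address/prefix overlaps the given libvirt network."""
    iface = ipaddress.ip_interface(address_with_prefix)
    return iface.network.overlaps(ipaddress.ip_network(network))
```

With the addresses from this thread, 192.168.5.1/24 does not overlap the default network, while 192.168.122.68/24 does.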

Though I'm stuck again: DHCP leases expire after an hour and are not renewed until the machine reboots, so this solution is still bad. Perhaps some DHCP refresh daemon on the client could solve this problem? I don't know.

@NotBrianZach
Copy link
Contributor Author

NotBrianZach commented Mar 7, 2017

Hmm, whoops. Yeah, I'm not exactly a networking war veteran myself. I was trying to go for a fix internal to nixops/libvirt, though for my personal needs it's not currently very high on the list of priorities. I do still think the XML hack could work if you forced each deployment to have a different network name. I'm not sure how bad a tradeoff that is in terms of configurability for more complex setups.

3 participants