Allowing autostart setting for KVM deployed VMs with persistent images, to restart them upon node reboot #599
Comments
Original Redmine Comment Sounds great, but isn't this KVM-specific? Although I am using KVM, I like the way OpenNebula tries to be as "cross-hypervisor" as possible.
Original Redmine Comment jordan pittier wrote:
I don't think this is specific to KVM (although I mentioned KVM in the title of this ticket), as it is a libvirt option (see http://libvirt.org/sources/virshcmdref/html/sect-autostart.html). Still, it may happen that libvirt only supports this for KVM; I haven't tested with others. But I couldn't imagine other hypervisors wouldn't have such a feature :-/
Original Redmine Comment You are correct. The thing is, OpenNebula uses libvirt only to manage KVM hosts.
Original Redmine Comment +1 It would be great to have a checkbox option in the template definition and/or at VM instantiation time. Some tests should be performed before enabling "autostart", as I'm not sure it will work for non-persistent disks. Thanks.
Original Redmine Comment Daniel Dehennin wrote:
We could also define a name prefix to use when ONE runs @onetemplate instantiate@ automatically.
I thought a little about this issue as we need it, and I wonder if it could not be implemented with existing features of ONE instead of using the libvirt feature (for KVM). In fact, I'm quite sure we should not use libvirt to manage this even for KVM VMs; on my own system, service dependencies between libvirt and Open vSwitch are problematic. Instead we could use the @suspend@, @stop@ and @resume@ mechanisms, making it work with non-persistent storage if I understood these commands correctly. We must face different use cases, depending on what happens and whether we are on a single node or on multiple nodes.

For a single node:
- when the ONE node is shut down, we could just run @onevm suspend@ on all the running VMs and put them in an @autoboot@ state
- when the ONE node is booted, run @onevm resume@ on each VM in @autoboot@ state
- when the ONE node is booted, run @onevm boot@ on each VM in @unknown@ state using an auto-start enabled template
- search for auto-start enabled templates and, for each one, if no @running@ VMs use it, run @onetemplate instantiate@

For multiple nodes:
I'm not sure about the best thing to do on hardware failure; in fact, as I'm missing test machines for now, I don't even know what ONE does in such a situation. Regards.
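A minimal frontend-side sketch of the single-node idea above, using only the stock CLI. The @AUTOBOOT@ marker, the short state codes read from @onevm list@, and the update-via-temp-file step are assumptions to verify against the installed release, not an existing OpenNebula feature.

```bash
#!/bin/bash
# Sketch only: mark and suspend running VMs before a node shutdown, then
# resume the marked ones after boot. AUTOBOOT is a made-up user-template
# attribute standing in for the "autoboot" state from the comment above.

suspend_all_running() {
    onevm list --no-header --list ID,STAT |
      awk '$2 == "runn" {print $1}' |
      while read -r ID; do
          TMP=$(mktemp)
          echo 'AUTOBOOT="YES"' > "$TMP"
          onevm update "$ID" "$TMP" --append   # tag the VM so we know to resume it
          rm -f "$TMP"
          onevm suspend "$ID"
      done
}

resume_marked() {
    onevm list --no-header --list ID,STAT |
      awk '$2 == "susp" {print $1}' |
      while read -r ID; do
          # Only resume VMs that this script suspended itself.
          onevm show "$ID" | grep -q 'AUTOBOOT="YES"' && onevm resume "$ID"
      done
}

case "$1" in
  shutdown) suspend_all_running ;;
  boot)     resume_marked ;;
  *)        echo "usage: $0 {shutdown|boot}" >&2; exit 1 ;;
esac
```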
Original Redmine Comment The ideal would be to have a trusted synchronous communication from the node to the frontend to report a shutdown/reboot. My idea is to use the monitoring system with an init script on the node like the @libvirt-guests@ one.
This will not work with pull-based monitoring. Another option is to hit the frontend directly by RPC, but this requires authentication/authorization of nodes on the frontend. Is there a way for nodes to notify the frontend of a shutdown/reboot?
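A rough sketch of what a node-side script in the spirit of @libvirt-guests@ could look like, under the big assumption that the node only saves its local domains and leaves the "report to the frontend" part to whatever monitoring or RPC mechanism is chosen; the @one-@ domain-name prefix is the usual KVM driver naming convention.

```bash
#!/bin/bash
# Sketch only: sysvinit-style stop/start handler for the node. On shutdown it
# checkpoints the local OpenNebula-managed domains (named one-*); how the
# frontend is informed of the reboot is intentionally left open, as discussed above.
case "$1" in
  stop)
      for DOM in $(virsh list --name | grep '^one-'); do
          virsh managedsave "$DOM"    # save guest state before the host goes down
      done
      ;;
  start)
      # Nothing done locally: the frontend decides what to resume and where.
      ;;
esac
```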
Original Redmine Comment
In a shutdown/reboot cycle:
So I'd suggest
I really like this approach as it is hypervisor independent.
Original Redmine Comment Ruben S. Montero wrote:
Could we make a difference between VMs in @power-off@ because of the reboot and VMs in @power-off@ because the user wants them powered off? I'm not sure we can blindly boot VMs after a reboot.
Yes, me too, even if I personally only use KVM ;-) Regards.
Original Redmine Comment EOLE Team wrote:
Yes, I think we can use the REASON field of the history. Simply add a new reason for automatic transitions (vs. user requested). This, together with the uptime of the host, should be enough... Cheers
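For illustration, a one-liner that reads the REASON of the last history record, assuming an OpenNebula release whose VM history records still expose a @REASON@ element (later releases changed these fields); the XPath would need checking against the running version.

```bash
# Sketch: why did this VM leave its host? Assumes the history records carry a
# REASON element (older releases); verify the element name before relying on it.
VM_ID=42   # example VM ID
onevm show "$VM_ID" -x |
  xmllint --xpath 'string(//HISTORY_RECORDS/HISTORY[last()]/REASON)' -
```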
Original Redmine Comment FWIW, a discussion about this issue: https://forum.opennebula.org/t/automatically-restart-vms-after-host-restart/454 Any progress to expect?
Original Redmine Comment Olivier Berger wrote:
Btw, the list archive is gone, but one can still find it at https://www.mail-archive.com/users%40lists.opennebula.org/msg06649.html HTH
Original Redmine Comment Would it be possible to at least have KVM VMs created as non-transient, i.e. using virsh define + virsh start instead of just virsh create, so that it is possible to manually perform a virsh autostart if needed (virsh autostart won't work on transient VMs), like in https://gist.github.com/anonymous/2776202 but without line 34? I'm not sure there would be any side effects, and that would be a first improvement for KVM until a more generic solution is found.
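A hedged sketch of the non-transient flow described above. The deployment-file handling is an assumption; OpenNebula's KVM driver normally issues a single @virsh create@ on the deployment file, so this only illustrates the define/start/autostart sequence rather than a drop-in replacement for the driver.

```bash
#!/bin/bash
# Sketch only: define the domain persistently, start it, then mark it for
# autostart. The deployment XML path is passed in as an assumption about how
# the driver would hand it over.
DEPLOYMENT_XML=${1:?usage: $0 <deployment.xml>}

# Extract the domain name from the deployment file.
DOMAIN=$(xmllint --xpath 'string(/domain/name)' "$DEPLOYMENT_XML")

virsh define "$DEPLOYMENT_XML"   # persistent definition instead of "virsh create"
virsh start "$DOMAIN"            # boot the now-persistent domain
virsh autostart "$DOMAIN"        # only possible because the domain is no longer transient
```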
Original Redmine Comment Yes, the main reason for holding this back is the side effects on all the operations. I agree the idea would be to define+start, and when the VM is removed from the host it needs to be undefined. We need to review all the operations to check when we need to do the undefine, e.g. poweroff, migrations, etc.; right now it is assumed that the VM is not defined.
Original Redmine Comment Ruben S. Montero wrote:
Could we open a new issue for this point? This could solve the "host crashed" case:
Then, we should add the possibility to have a @HOST_HOOK@ executed when a host enters the @on@ state, in which case we list all VMs on that host in @PowerOff@ state without any @Reason@ and @resume@ them. Regards.
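A minimal sketch of such a hook script, assuming the host name arrives as the first argument; a real version would also need the @Reason@ check discussed above so that user-requested power-offs are left untouched.

```bash
#!/bin/bash
# Sketch only: when a host comes back "on", resume its VMs that are in
# POWEROFF. The hook wiring and the missing Reason filter are assumptions.
HOST=${1:?usage: $0 <hostname>}

onevm list --no-header --list ID,HOST,STAT |
  awk -v h="$HOST" '$2 == h && $3 == "poff" {print $1}' |
  while read -r VM_ID; do
      onevm resume "$VM_ID"   # POWEROFF -> RUNNING on the same host
  done
```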
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. The OpenNebula Dev Team
Thanks for keeping this in backlog 👍
Does that mean that OpenNebula does not support restarting VMs after their host has been rebooted?
Yes .. this is really a good and important question ..
But this will not work on OpenNebula-controlled VMs --> error: Failed to mark domain 2 as autostarted
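This error is consistent with the domain being transient: libvirt refuses to set autostart on transient domains. A quick way to check (the domain name below is just an example):

```bash
virsh dominfo one-2 | grep -E 'Persistent|Autostart'
# Persistent:     no        <- created with "virsh create", so transient
# Autostart:      disable   <- and therefore cannot be marked for autostart
```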
Hi, There are 2 considerations here:
https://docs.opennebula.io/5.12/advanced_components/ha/ftguide.html#host-failures Maybe we can extend this hook to not recreate VMs on other hosts, but rather wait for the host to be back online. This could be executed by a simple hook (restart the host's VMs in UNKNOWN when the host goes back to "online"). What do you think?
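A possible shape for that hook, sketched under the assumptions that the hook receives the host name and that @onevm resume@ is the right recovery call for VMs in UNKNOWN on the installed release:

```bash
#!/bin/bash
# Sketch only: once the failed host is monitored again, recover its VMs left
# in UNKNOWN instead of recreating them elsewhere.
HOST=${1:?usage: $0 <hostname>}

# Only act once the host is back online.
if onehost show "$HOST" | grep -q 'STATE *: *MONITORED'; then
    onevm list --no-header --list ID,HOST,STAT |
      awk -v h="$HOST" '$2 == h && $3 == "unkn" {print $1}' |
      xargs -r -n1 onevm resume
fi
```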
Yes, Ruben .. it would probably make sense (as an addition) to extend the hook so that it is possible to NOT recreate VMs on other hosts (I also simply thought of creating my own script for that). On the other hand, I would need such a solution as well: when one host goes down or is not working properly, a hook would migrate the VMs to another host. But I never tested that .. I'm using this kind of design: important data are on host1 (host1:/datastore_ssd); per Sunstone all DS are NFS-shared, but only Sunstone will see all (NFS-shared) DS. What would happen with an automatic VM-migrate hook from host1 to host2?

[root@sunstone ~]# onedatastore list
101 oneadmin oneadmin data_hd_srv 16T 99% 0 1 img fs qcow2 on

[root@sunstone ~]# df -h
host1:/home/datastores 4.0T 297G 3.8T 8% /mnt/datastores/nfs-dsnvme-srv
Author Name: Olivier Berger (Olivier Berger)
Original Redmine Issue: 1290, https://dev.opennebula.org/issues/1290
Original Date: 2012-05-23
As discussed in http://lists.opennebula.org/pipermail/users-opennebula.org/2012-May/008959.html I think it would be great to offer some support for automatically restarting (the libvirt autostart setting) VMs that have persistent images, in case a node is rebooted (or in case of power outages and other restarts).
One main change involved is the need to define the domains, instead of creating them as transient domains.
See the discussion of proposed changes