[RFC 0012] Declarative virtual machines #12

Open

wants to merge 15 commits into base: master

Conversation

8 participants
@Ekleog

Ekleog commented Apr 8, 2017

Here is a proposal for a NixOS module to declaratively handle virtual machines.

Rendered: https://github.com/Ekleog/nixos-rfcs/blob/virtual-machines/rfcs/0012-declarative-virtual-machines.md

Having started work on it quite a while ago (2017-02-25 according to my git history), I have something working that almost fits this RFC in a ~600-line module file, plus a few functions added to the nixpkgs lib (for auto-generating the IPs, as they could be used elsewhere), so concerns like "this will be hard to implement" should not be too much of an issue :) (That's not to say I'd drop the support if things needed to change; I'm perfectly willing to rework it so that it fits the bill.)

Hope this may help!

cc @Nadrieril, and thank you for review and harsh criticism whenever I did something wrong 👍

Edit: Adding a link to the current state of the implementation: NixOS/nixpkgs@master...Ekleog:vms
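
For readers who want a concrete picture before opening the rendered RFC, here is a minimal sketch of the kind of declaration being proposed. The diskSize, memorySize and vcpus options appear in the RFC excerpt quoted later in this thread; the vms attribute path and the example values are assumptions made for illustration only.

# Hypothetical sketch; the vms attribute path and example values are assumed, not final.
{
  vms.mailserver = {
    diskSize = 10240;    # Size (in MiB) of the disk image, excluding shared paths and store
    memorySize = 1024;   # Size (in MiB) of the RAM allocated to the VM
    vcpus = 1;           # Number of virtual CPUs the VM will see
    # How the guest's NixOS configuration is attached is discussed later in the thread.
  };
}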

Ekleog added some commits Apr 1, 2017

@Ekleog Ekleog changed the title from Virtual machines to [RFC 0012] Virtual machines Apr 8, 2017

@Ekleog Ekleog changed the title from [RFC 0012] Virtual machines to [RFC 0012] Declarative virtual machines Apr 10, 2017

@copumpkin

Member

copumpkin commented May 8, 2017

I haven't read it in depth, but love the idea so far. Have you considered trying to abstract over containers and VMs using libvirt?

@Ekleog

Ekleog commented May 8, 2017

That's a really good point: we considered using libvirt to abstract over different kinds of virtualizers, but figured that only one was actually needed, and that the module in itself was already a "virtualizer abstraction layer". As it also adds one more potential point of failure and, mostly, as I don't know anything about libvirt, I went with qemu when the project started.

Now, also abstracting over containers would actually bring net value, by more or less merging the two modules together; and as they have different objectives (lightweight vs. heavyweight virtualization), it makes sense to have both.

However, I'm not sure the implementations could share much, exactly because the goals are different. I'll take the features described in the RFC and try to see how they could be merged with a containers implementation:

  • memorySize, vcpus: same as for containers
  • diskSize doesn't seem to make sense for containers [1]
  • Managing the disk of the guest (mostly its /nix/store): containers being lightweight, it makes sense for them to mount directly over the host's store; however, doing the same for VMs would leak information between them, so the process for mounting the store must differ (and it is one of the major pain points, at least in my current implementation [2])
  • Shared directories: same as for containers
  • Networking (the second major pain point) looks to me quite different from what is currently proposed for containers, especially given containers.<name>.privateNetwork, which would be very hard, if not impossible, to replicate with libvirt
  • The security part (i.e. running qemu as non-root) is irrelevant for containers
  • Nix support is unimplemented for the time being for VMs; I don't know its exact state for containers, so it could be a great way to share code (but I was thinking nix 1.12, and especially nixos-prepare-root, would bring the same improvements)

Besides, I just learned while writing this answer that imperative containers rely on the inner workings of the containers module [3], so I guess replacing the containers module with something based on libvirt would be highly nontrivial.

Just to get an idea, I counted the lines of code in [2] that could be shared with containers using libvirt (assuming it is equally hard to write modules for qemu and libvirt). The answer is, excluding option definitions, 24: 2 for memorySize and vcpus, and 22 for shared directories.

Obviously, this count is biased towards lower values: the current implementation does only the strict minimum, and future additional features may be more factorizable than the current ones.

However, I think there is a wide enough gap between containers and VMs that factoring the two might be even harder than maintaining a separate implementation for each. Do you see another place for factorization that I missed?

[1] https://libvirt.org/drvlxc.html

[2] NixOS/nixpkgs@master...Ekleog:vms

[3] NixOS/nixpkgs#3021
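
To make the "shared directories" overlap concrete, here is how the two declarations might line up. The containers side uses the existing containers.<name>.bindMounts option; the vms side is hypothetical and only serves to show where code could plausibly be shared.

# containers: existing NixOS option.
containers.webapp.bindMounts."/data" = {
  hostPath = "/srv/webapp-data";
  isReadOnly = false;
};

# vms: hypothetical equivalent, with illustrative option names.
vms.webapp.sharedDirectories."/data" = {
  hostPath = "/srv/webapp-data";
  readOnly = false;
};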


@Ekleog

Ekleog commented May 8, 2017

Jumping to another subject while I'm at it: I brought this link to the mailing list [1].

From there two major points emerged.

The first one: there are not enough tools to share code between different VM appliances. In my opinion, this will in large part be solved by nixos-prepare-root after nix 1.12, the remainder being mostly virtualization-system-specific. It then becomes the same point as the one you are raising, and I wonder whether I'm simply not seeing possibilities for code factorization.

The second one was about the format for disk management and boot control. I will borrow elements from [2]. The basic requirement is that the host be able to control the guest's boot, mostly for upgrades and downgrades. This can be achieved in three ways:

  1. Having the guest boot, then wait in the initrd for its configuration, have the host push it, then the guest can continue booting. This requires nothing special on the guest's FS.
  2. Having the guest's /nix on a separate .qcow2 image. This way, the host can decide to stop the guest, upgrade the store, then restart the guest.
  3. Having the guest's /nix on a virtfs. This way, the host can upgrade the guest's store in-place.

There are drawbacks to all of these options:

  1. makes the boot scheme complex to understand, and risks duplicating some behaviour between the dropbear in the initrd and the ssh daemon outside of it, in order to handle online upgrades
  2. makes it really complex to handle online upgrades
  3. may be slower and less stable (even though I haven't experienced instability during development), and makes it hard to control the guest's store size.

Besides, options 1 and 2 would, I think, be made much easier by the presence of nixos-prepare-root, which argues for waiting for it to be released with nix 1.12.

As you will have understood, my current favorite is option 3; once nixos-prepare-root arrives, option 1 would, I think, have my vote.

Do you see another decision argument? Another scheme for handling upgrades and downgrades effectively? What would be your choice?

[1] https://www.mail-archive.com/nix-dev@lists.science.uu.nl/msg36307.html

[2] https://www.mail-archive.com/nix-dev@lists.science.uu.nl/msg36354.html
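
For concreteness, here is a minimal sketch of what option 3 (the guest's store on a virtfs) could look like with qemu, assuming each guest gets its own store directory on the host. The host path, mount tag and the qemuFlags name are made up for the example; only the -virtfs/9p mechanism itself and the fileSystems option are standard qemu/NixOS.

# Sketch under the assumptions above, not the RFC's actual implementation.

# Host side: extra qemu flags the module would generate to export the
# guest's store directory over virtfs/9p (qemuFlags is a made-up name).
qemuFlags = [
  "-virtfs local,path=/var/lib/vms/mailserver/store,security_model=none,mount_tag=guest-store"
];

# Guest side: mount the exported directory as /nix/store
# (fileSystems is the standard NixOS option).
fileSystems."/nix/store" = {
  device = "guest-store";                    # matches mount_tag above
  fsType = "9p";
  options = [ "trans=virtio" "version=9p2000.L" ];
  neededForBoot = true;
};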


@danbst

danbst commented May 26, 2017

What is probably lacking is support for multiple disks. For example, the NixOps project allows this for the virtualbox backend, and the same has been requested for the libvirtd backend.

@Ekleog

Ekleog commented May 26, 2017

Hmm, you are right, and this feature can be useful/needed (to reuse the example you gave on IRC: having multiple disks on multiple VMs for redundancy with openstack).

However, I think it is more important to have something agreed upon, on which we can later build additional features, than to try to have something perfect from the beginning -- the important point being that all features included in an RFC should have a very low probability of changing, as they will eventually enter a stable version, and stable versions should be as backwards-compatible as possible.

So I think your point is a great one, and we should definitely investigate handling multiple disks, but maybe after the basics are agreed on? :)

Unless other people want it right now, I'm keeping this as an unresolved question for the time being. This way it will stay in the "future work" step after this RFC, so that choosing the correct interface doesn't slow down having a first working version.
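
If multiple disks do get picked up later, one possible shape for the interface (purely hypothetical, noted here only to record the idea next to this unresolved question) would be a list of per-disk attribute sets alongside the existing diskSize option:

# Hypothetical future extension, not part of this RFC.
vms.storage-node = {
  diskSize = 10240;      # primary disk, as in the current proposal (MiB)
  extraDisks = [         # illustrative option name
    { size = 20480; }    # sizes in MiB, matching diskSize's unit
    { size = 20480; }
  ];
};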


@joepie91

joepie91 commented May 26, 2017

Just throwing in my two cents concerning libvirt - in my experience of attempting to use it, it's rather temperamental, and has a habit of producing utterly incomprehensible, difficult-to-debug error messages whenever anything goes wrong.

I can't speak for others, but personally I've started just ignoring the existence of libvirt - in most cases it seems to add more work figuring out why it's breaking than it saves me anywhere else.

@teh

teh commented May 28, 2017

Another data point from me: I had some bad experiences with libvirt - daemon deadlocking and becoming unresponsive, weird error messages, VM corruption. The outstanding bugs are supporting evidence. Abstracting over virtualization is a hard problem, so maybe I'm being too hard on libvirt :)

@Ekleog

Ekleog commented Jun 13, 2017

OK, so maybe I'm being biased, but this has been open for more than two months and has seen, as far as I can tell, mostly positive support (with some propositions for refactoring that don't seem to have gained much traction).

Maybe it is time to move to a final comment period (an idea borrowed from Rust's RFCs), during which last comments could be raised, with the decision announced after a given time if no game-changing comment turns up? (cc @zimbatm)

@0xABAB

0xABAB commented Jun 30, 2017

@Ekleog I am not sure what level of quality we are aiming for, but the "Use case" section would never ever be accepted under my watch in a project where I was in charge, because there is no substance. You should not take this personally, but your use case doesn't actually describe a use case; it just describes some qualities of a human (i.e. someone who cares about security). I didn't even read more of the proposal because of that.

Having said that, as long as you don't break anything pre-existing, I am not against adding features, since such developments need to be used and refined over periods of years anyway. No first design is going to be great, unless someone has already implemented the exact same thing before, but given the ever-changing technology space, that seems unlikely.

In short, my opinion is that we should basically go for a "throw code at the wall and see what sticks" approach, under the condition that the code is documented in the source as well as in the manual, and with the understanding that it is always possible to create a new system based on whatever is learned. I.e., deprecation could happen at some point.

@Ekleog

Ekleog commented Jun 30, 2017

@0xABAB: Thanks for the comment!

For the "Use case" section, given the summary section I don't really know how to make it more explicit that the described module is "like containers, but with security over speed"... Would you have a better idea of wording for this section? I've pushed a tentative change; how do you feel about it?

@Ekleog Ekleog referenced this pull request Jun 30, 2017

Closed

[RFC 0009] Nix rapid release #9

@zimbatm

👍 overall.

I would like to see in more detail how the qemu VM gets passed to the VM. Potentially it could be a path, so that arbitrary VMs can be run, leaving the incremental NixOS updates for later.

Also, one thing that comes to mind: would it be possible to take the libvirt configuration options as an inspiration? My guess is that they have already played the common-denominator game.

diskSize = 10240; # Size (in MiB) of the disk image excluding shared paths
# and store
memorySize = 1024; # Size (in MiB) of the RAM allocated to the VM
vcpus = 1; # Number of virtual CPUs the VM will see

@zimbatm

zimbatm Jul 2, 2017

Member

These are VM properties. How is the VM configuration selected?

@Ekleog

Ekleog Jul 27, 2017

Wow, I was sure I had a paragraph about it. Added it now, thanks!

@Ekleog

Ekleog commented Jul 27, 2017

Thanks for the comments! 👍

I'm assuming the renames you proposed were renames to match libvirt's configuration options? I must say I have never configured a VM using libvirt, only ever used VMs configured by others for libvirt, so... :)

@Ekleog

Ekleog commented Feb 10, 2018

So, with RFC 0018 failing to make the RFC process smoother, I give up on this PR. I've sent my current implementation of it as a PR to nixpkgs.

I'll leave this RFC open in case someone else wants to take it over, but feel free to close if you don't think it brings any good :)

@FRidh

Member

FRidh commented Feb 11, 2018

I have not looked at this RFC before because I have not had the need for it. There has been mostly positive feedback on the RFC, and some negative feedback on the underlying tool. I agree with an earlier comment that the motivation section could be improved, e.g. by considering emulation, but that's a minor improvement. Aside from that, I think we should proceed with this. As long as the new module has maintainers and does not break NixOS (it's just an addition), I see no reason not to include it.

Regarding the RFC process: as long as there has been a long enough period to give feedback, and there are no objections, I see no reason why it cannot be accepted. Not every maintainer/contributor needs to comment or approve, although I suppose that is something for #18. In any case, I'd say this has been out here long enough that it should go in. cc @zimbatm @domenkozar
