[RFC 0012] Declarative virtual machines #12

Closed
wants to merge 15 commits into from

@Ekleog
Member

@Ekleog Ekleog commented Apr 8, 2017

Here is a proposition of a NixOS module to declaratively handle virtual machines.

Rendered: https://github.com/Ekleog/nixos-rfcs/blob/virtual-machines/rfcs/0012-declarative-virtual-machines.md

Having started work on it quite a while ago (2017-02-25 acc. to my git history), I have something working that almost fits this RFC in a ~600 lines module file and a few additions of functions to the nixpkgs lib (for auto-generating the IPs, as they could be used elsewhere), so concerns like "this will be hard to implement" should not be too much of an issue :) (that's not saying I'd drop the support if things needed to change, I'm perfectly willing to rework it so that it fits the bill)

Hope this may help!

cc @Nadrieril, and thank you for review and harsh criticism whenever I did something wrong 👍

Edit: Adding a link to the current state of the implementation: NixOS/nixpkgs@master...Ekleog:vms

Ekleog added 11 commits Apr 1, 2017
@Ekleog Ekleog changed the title Virtual machines [RFC 0012] Virtual machines Apr 8, 2017
@Ekleog Ekleog changed the title [RFC 0012] Virtual machines [RFC 0012] Declarative virtual machines Apr 10, 2017
@copumpkin
Member

@copumpkin copumpkin commented May 8, 2017

I haven't read it in depth, but love the idea so far. Have you considered trying to abstract over containers and VMs using libvirt?

@Ekleog
Member Author

@Ekleog Ekleog commented May 8, 2017

That's a really good point: we considered using libvirt to abstract over different kinds of virtualizers, but figured out that only one was actually needed, and the module in itself was already a "virtualizer abstraction layer". As it also adds one level of potential failure and, mostly, I don't know anything about libvirt, at the time the project started I went with qemu.

Now, abstracting also over containers would actually bring net value, by more or less merging the two modules together, and as they have different objectives (lightweight vs. heavyweight virtualization) it makes sense to have both.

However, I'm not sure the implementation could share much, exactly because the goals are different. I'll take the features described in the RFC and try to see how it could be merged with a containers implementation:

  • memorySize, vcpus: same as for containers
  • diskSize doesn't seem to make sense for containers [1]
  • Managing the disk of the guest (mostly its /nix/store): containers being lightweight, it makes sense for them to mount directly over the host's store, however this would leak information between VMs if used for VMs, hence the process to mount the store must differ (and is one of the major pain points, at least in my current implementation [2])
  • Shared directories: same as for containers
  • Networking (the second major pain point) appears to me quite different from what containers currently offer, especially given containers.<name>.privateNetwork, which would be very hard or even impossible to replicate with libvirt
  • Security part (ie. running qemu as non-root) is irrelevant for containers
  • Nix support is unimplemented for the time being for VMs, I don't know its exact state for containers, could be a great way to share code (but I was thinking nix 1.12 and especially nixos-prepare-root would bring the same improvements)
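
To make the comparison above a bit more concrete, here is a rough sketch that puts a declarative container next to a VM declaration. The `containers.<name>` options used (`autoStart`, `privateNetwork`, `hostAddress`, `localAddress`, `config`) are the existing NixOS ones. Everything under `vms` is an assumption: `vms.machines.<name>` and its `config` attribute are hypothetical names chosen for illustration, only `memorySize`, `vcpus` and `diskSize` come from the RFC, and the addresses are made up. The second half only evaluates once a module declaring such options is imported.

```nix
{ ... }:
{
  # Existing NixOS container: shares the host's kernel and mounts the host's store.
  containers.web = {
    autoStart = true;
    privateNetwork = true;
    hostAddress = "10.231.136.1";  # example address
    localAddress = "10.231.136.2"; # example address
    config = { pkgs, ... }: {
      services.nginx.enable = true;
    };
  };

  # Hypothetical VM counterpart (attribute path and `config` are assumptions).
  vms.machines.web = {
    memorySize = 1024; # MiB of RAM allocated to the VM
    vcpus = 1;         # number of virtual CPUs
    diskSize = 10240;  # MiB, has no equivalent for containers
    config = { pkgs, ... }: {
      services.nginx.enable = true;
    };
  };
}
```

The guest configuration is identical in both halves; what differs is everything around it (store, disk, networking), which is where the two implementations diverge.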

Besides, I just learned while writing this answer that imperative containers rely on the inner working of the containers module [3], so replacing the containers module with something based on libvirt would be highly nontrivial, I guess.

Just to get an idea, I counted the lines of code in [2] that could be shared with containers using libvirt (assuming it's equally hard to write modules for qemu and libvirt). The answer is (excluding option definitions) 24: 2 for memorySize and vcpus and 22 for shared.

Obviously, this count is biased towards lower values: the current implementation does only the strict minimum and future additional features may be more factorizable than the current ones.

However, I think there is a wide enough gap between containers and VMs to make it maybe even harder to factor the two than to maintain a separate implementation for each. Do you see another place for factorization I missed?

[1] https://libvirt.org/drvlxc.html

[2] NixOS/nixpkgs@master...Ekleog:vms

[3] NixOS/nixpkgs#3021

@Ekleog
Member Author

@Ekleog Ekleog commented May 8, 2017

Jumping to another subject while I'm at it: I brought this link to the ML [1].

From there two major points emerged.

First one, there are not enough tools to share code between different VM appliances. In my opinion, this one will be in a big part solved by nixos-prepare-root after nix 1.12, the remainder being mostly virtualization-system-specific. Then, it's the same point as the one you are raising, and I wonder if I'm not just not seeing possibilities for code factorization.

The second one was about the format for disk management and boot control. I will take elements back from [2]. The basic requirement is that the host be able to control the guest's boot, mostly for upgrades and downgrades. This can be achieved in three ways:

  1. Having the guest boot, then wait in the initrd for its configuration, have the host push it, then the guest can continue booting. This requires nothing special on the guest's FS.
  2. Having the guest's /nix on a separate .qcow2 image. This way, the host can decide to stop the guest, upgrade the store, then restart the guest.
  3. Having the guest's /nix on a virtfs. This way, the host can upgrade the guest's store in-place.

There are drawbacks to all of these options:

  1. makes the boot scheme complex to understand, and risks duplicating some behaviour between the dropbear in the initrd and the ssh daemon outside of it, in order to handle online upgrades
  2. makes it really complex to handle online upgrades
  3. may be slower and less stable (even though I haven't experienced instability during development), and makes it hard to control the guest's store size.
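
For illustration, options 2 and 3 differ mainly in how the guest's /nix reaches qemu. The sketch below writes the corresponding qemu arguments as plain Nix lists. `-drive` and `-virtfs` are standard qemu flags, but the state directory, the image names and the `nixstore` mount tag are hypothetical and not taken from the implementation.

```nix
let
  # Hypothetical per-VM state directory on the host.
  stateDir = "/var/lib/vms/example";

  # Option 2: the guest's /nix lives on its own qcow2 image.
  # The host has to stop the guest before it can modify that image.
  storeOnImage = [
    "-drive" "file=${stateDir}/root.qcow2,if=virtio,format=qcow2"
    "-drive" "file=${stateDir}/nix.qcow2,if=virtio,format=qcow2"
  ];

  # Option 3: the guest's store is a host directory exported over virtfs (9p),
  # which the host can update while the guest is running.
  storeOnVirtfs = [
    "-drive" "file=${stateDir}/root.qcow2,if=virtio,format=qcow2"
    "-virtfs" "local,path=${stateDir}/store,mount_tag=nixstore,security_model=none"
  ];
in
{
  inherit storeOnImage storeOnVirtfs;
}
```

With option 3, the guest would mount the `nixstore` tag as a 9p filesystem on /nix/store, which is also why the host has no simple way to cap the guest's store size.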

Besides, I think options 1 and 2 would be made much easier by nixos-prepare-root, which argues for waiting until it is released with nix 1.12.

As you will have understood, my current favorite is option 3, and after the coming of nixos-prepare-root, option 1 would I think have my vote.

Do you see another decision argument? Another scheme for handling upgrades and downgrades effectively? What would be your choice?

[1] https://www.mail-archive.com/nix-dev@lists.science.uu.nl/msg36307.html

[2] https://www.mail-archive.com/nix-dev@lists.science.uu.nl/msg36354.html

@danbst

@danbst danbst commented May 26, 2017

What probably is lacking, is support for multiple disks. For example, NixOps project allows this for virtualbox backend, and same is requested for libvirtd backend.

@Ekleog
Member Author

@Ekleog Ekleog commented May 26, 2017

Hmm, you are right and this feature can be useful/needed (to take back the example you gave on IRC, for having multiple disks on multiple VMs for redundancy for openstack).

However, I think it is more important to agree on something we can later build additional features on than to try to have something perfect from the start. The important point is that every feature included in an RFC should have a very low probability of changing, as it will eventually enter a stable version, and stable versions should be as backwards-compatible as possible.

So I think your point is a great one, and we should definitely investigate handling multiple disks, but maybe after the basics are agreed on? :)

Unless other people would want it right now, I'm putting this as an unresolved question for the time being. This way it will stay on the "future works" step after this RFC, so that choice of the correct interface doesn't slow down having a first working version.

@joepie91

@joepie91 joepie91 commented May 26, 2017

Just throwing in my two cents concerning libvirt - in my experience of attempting to use it, it's rather temperamental, and has a habit of producing utterly incomprehensible, difficult-to-debug error messages whenever anything goes wrong.

I can't speak for others, but personally I've started just ignoring the existence of libvirt - since for most cases it seems to add more work trying to figure out why it's breaking, than it's saving me effort on any other point.

@teh

@teh teh commented May 28, 2017

Another data point from me: I had some bad experiences with libvirt - daemon deadlocking and becoming unresponsive, weird error messages, VM corruption. The outstanding bugs are supporting evidence. Abstracting over virtualization is a hard problem, so maybe I'm too hard on libvirt :)

@Ekleog
Member Author

@Ekleog Ekleog commented Jun 13, 2017

OK, so maybe I'm being biased, but it has been open for more than two months and has seen, as far as I can see, mostly positive support (with some propositions for refactoring that don't seem to have caught much traction).

Maybe it would be time to get to a final comments period (idea inspired from Rust's RFCs), where last comments could be raised and the decision would be announced after a given time if no game-changing comment turns up? (cc @zimbatm)

@0xABAB

@0xABAB 0xABAB commented Jun 30, 2017

@Ekleog I am not sure what level of quality we are aiming for, but the "Use case" section would never ever be accepted under my watch in a project where I was in charge, because there is no substance. You should not take this personally, but your use case doesn't actually describe a use case; it just describes some qualities of a human (i.e. someone who cares about security). I didn't even read more of the proposal because of that.

Having said that, as long as you don't break anything pre-existing, I am not against adding features, since such developments need to be used and refined over periods of years anyway. No first design is going to be great, unless someone has implemented the exact same thing before, but given the ever-changing technology space, that seems unlikely.

In short, my opinion is that we should basically go for a "let's throw code at the wall and see what sticks" approach, on the condition that the code is documented in the source code as well as in the manual, with the understanding that it is always possible to create a new system based on whatever is learned. I.e., deprecation could happen at some point.

@Ekleog
Member Author

@Ekleog Ekleog commented Jun 30, 2017

@0xABAB: Thanks for the comment!

For the “Use Case” section, given the summary section I don't really know how to make it more explicit that the described module is “like containers., but with security over speed”... would you have any better idea of wording for this section? I've pushed a tentative change, how do you feel about it?

Member

@zimbatm zimbatm left a comment

👍 overall.

I would like to see more in detail how the qemu VM gets passed to the VM. Potentially it could be a path so arbitrary VMs can be run, and leave the incremental NixOS updates for later.

Also one thing that comes to mind is, would it be possible to take the libvirt configuration options as an inspiration? My guess is that they already played the game of common denominator.

management](#disk-management))
* Each VM is run as a (set of) systemd service, and can be rebooted using
`systemctl restart vm-${name}.service`
* `qemu` fetches the kernel and initrd directly from the guests' store

@zimbatm

zimbatm Jul 2, 2017
Member

isn't it fetched from the host's store?

@Ekleog

Ekleog Jul 27, 2017
Author Member

You're right, my current implementation fetches from the host's store. Then the kernel in the guest's store is a copy from this kernel, so I hope it doesn't have a big impact? Anyway, changing for host's store, as it's easier to implement and should not have any visible side-effect (I guess) :)
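
As a rough illustration of "one systemd service per VM, with the kernel and initrd taken from the host's store", a module could look like the sketch below. This is not the actual implementation: the VM name, the empty guest configuration and the fixed memory and CPU values are placeholders, and evaluating the guest through `import (pkgs.path + "/nixos")` is just one assumed way of getting its kernel and initrd into the host's store.

```nix
{ pkgs, ... }:
let
  name = "example"; # hypothetical VM name

  # Evaluate the guest's NixOS configuration on the host, so that its kernel
  # and initrd end up as ordinary paths in the host's /nix/store.
  guest = import (pkgs.path + "/nixos") {
    configuration = { ... }: {
      # The guest's own configuration would go here.
    };
  };

  kernel = "${guest.config.system.build.kernel}/${guest.config.system.boot.loader.kernelFile}";
  initrd = "${guest.config.system.build.initialRamdisk}/${guest.config.system.boot.loader.initrdFile}";
in
{
  # One unit per VM: `systemctl restart vm-example.service` reboots the guest.
  systemd.services."vm-${name}" = {
    description = "Virtual machine ${name}";
    wantedBy = [ "multi-user.target" ];
    # Disks, networking and the kernel command line are deliberately left out.
    script = ''
      exec ${pkgs.qemu_kvm}/bin/qemu-kvm \
        -nographic \
        -m 1024 \
        -smp 1 \
        -kernel ${kernel} \
        -initrd ${initrd}
    '';
  };
}
```

Because `-kernel` and `-initrd` point into the host's store, the host decides what the guest boots, which is what makes host-driven upgrades and rollbacks possible.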

```nix
{
  vms = {
    path = "/path/to/dir"; # Path into which to store persistent data (disk
```

@zimbatm

zimbatm Jul 2, 2017
Member

naming: usually named stateDir and pointing to /var/lib/something as a default

@Ekleog

Ekleog Jul 27, 2017
Author Member

Renamed; and I had completely forgotten to write about default values :)

path = "/path/to/dir"; # Path into which to store persistent data (disk
# images and per-vm store)
rpath = "/runtime/dir"; # Path for temporary non-user-facing low-size data,
# like IPC sockets

@zimbatm

zimbatm Jul 2, 2017
Member

naming: socketDir or runtimeDir?

@Ekleog

Ekleog Jul 27, 2017
Author Member

Renamed socketDir as in the current implementation it contains only sockets :)

diskSize = 10240; # Size (in MiB) of the disk image excluding shared paths
# and store
memorySize = 1024; # Size (in MiB) of the RAM allocated to the VM
vcpus = 1; # Number of virtual CPUs the VM will see

@zimbatm

zimbatm Jul 2, 2017
Member

These are VM properties. How is the VM configuration selected?

@Ekleog

Ekleog Jul 27, 2017
Author Member

Wow, I was sure I had a paragraph about it. Added it now, thanks!

In order to do this, a possible way to do so is to mount:
* `/` as a filesystem on a qcow2 image
* `/nix/store` as a virtfs onto a directory on the host, in order to easily
handle setup and upgrades from the host

@zimbatm

zimbatm Jul 2, 2017
Member

wouldn't that defeat the aforementioned purpose of using VMs over containers?

@Ekleog

Ekleog Jul 27, 2017
Author Member

I didn't mention explicitly that the directory on the host it's mapped to is not the host's /nix/store, fixed now, thanks!

addHostNames = true; # Whether to add the VMs in /etc/hosts of each other,
# under the vm-${name}.localhost name and
# host.localhost

@zimbatm

zimbatm Jul 2, 2017
Member

I would suggest grouping the networking options into their own vms.networking for clarity.

@Ekleog

Ekleog Jul 27, 2017
Author Member

Sounds like a good idea, thanks :)

@Ekleog
Member Author

@Ekleog Ekleog commented Jul 27, 2017

Thanks for the comments! 👍

I'm assuming the renames you proposed were the renaming to libvirt's configuration options? I must say I have never configured a VM using libvirt, only ever used VMs configured by others for libvirt, so... :)

@Ekleog
Member Author

@Ekleog Ekleog commented Feb 10, 2018

So, with RFC 0018 failing to make the RFC process smoother, I give up on this PR. I've sent my current implementation of it as a PR to nixpkgs.

I'll leave this RFC open in case someone else wants to take it over, but feel free to close if you don't think it brings any good :)

@FRidh
Member

@FRidh FRidh commented Feb 11, 2018

I have not looked at this RFC before because I have not had the need for it. There has been mostly positive feedback on the RFC and some negative feedback on the underlying tool. I agree with an earlier comment that the motivation section may be improved by e.g. considering emulation. But that's just a minor improvement. Aside from that, I think we should proceed with this. As long as the new module has maintainers and does not break NixOS (it's just an addition), I see no reason not to include it.

Regarding the RFC process: as long as there has been a long enough period to give feedback, and there are no objections, I see no reason why it cannot be accepted. Not every maintainer/contributor needs to comment or approve, although I suppose that is something for #18. In any case, I'd say this has been out here long enough that it should go in. cc @zimbatm @domenkozar

@Ekleog
Member Author

@Ekleog Ekleog commented Dec 14, 2018

As #36 is moving forward, before too much energy is invested here I want to point out that I am no longer happy with the design of the options in general, and with vms.networking in particular.

Basically, I now think that the module should be a much thinner wrapper around qemu (or similar) than it currently is, and leave the user able to define whatever networking scheme and whatever disk configuration scheme they want.

However, I do not have much time currently to review the option set and adapt/re-do the implementation work (which has by now bit-rotted) and this has become quite low-priority for me, so I will not be actively pushing this forward for a while. Will add in patches people would throw at me, though.

@zimbatm
Member

@zimbatm zimbatm commented Dec 14, 2018

In that case I will close the PR. I am sure this will all be useful for the next person who wants to tackle this.

@zimbatm zimbatm closed this Dec 14, 2018