Using Virtio (QEMU)

Dmitry Golubovsky edited this page Dec 29, 2016 · 1 revision

Using Virtio

With the recent merges, Harvey is now capable of using the Virtio capabilities provided by QEMU. Here is a brief guide to setting it up on both the guest and host sides.

Virtio Devices Identification

Harvey uses the PCI representation of Virtio devices. During kernel initialization, all PCI devices are scanned and enumerated, and a separate enumeration is maintained for virtio devices. Each virtio device is given an internal name consisting of the word "virtio", the device class (9p, console, etc.), and the number assigned during the initial enumeration. These names can be seen in the file /dev/irqalloc, as the example below shows:

          3           0                    0                    0 trap     #BP
          7           0                    1                 1975 trap     #NM
          8           0                    0                    0 trap     #DF
         14           0                 1552           2642646005 trap     #PF
         15           0                    0                    0 trap     #15
         16           0                    0                    0 trap     #MF
         19           0                    0                    0 trap     #XF
         50          50                65193          10844815152 lapic    APIC timer
         65           1                  440            312778444 ioapic   keyb
         73          12                  956            338362551 ioapic   mouse
         89          11                 3526           1351727073 ioapic   virtio-9p-0
         89          11                 3526           1351727073 ioapic   ether0
        113          10                    1               187675 ioapic   virtio-console-2
        113          10                    1               187675 ioapic   virtio-9p-1
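As an illustration of the naming scheme, the following Python sketch picks the virtio entries out of text in the /dev/irqalloc layout shown above. It is a minimal sketch, not part of Harvey: the function name virtio_devices and the regular expression are assumptions made for this example, and the pattern "virtio-<class>-<number>" is inferred only from the names in the listing.

```python
import re

# Assumed name shape, inferred from the listing above: virtio-<class>-<number>.
VIRTIO_NAME = re.compile(r"^virtio-(?P<cls>.+)-(?P<index>\d+)$")


def virtio_devices(irqalloc_text):
    """Return sorted (number, class, name) tuples for the virtio entries.

    virtio_devices is a hypothetical helper written for this example.
    The device name is taken to be the last whitespace-separated field.
    """
    found = set()
    for line in irqalloc_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        match = VIRTIO_NAME.match(fields[-1])
        if match:
            found.add((int(match.group("index")), match.group("cls"), fields[-1]))
    return sorted(found)


# Lines copied from the example output above.
sample = """\
         89          11                 3526           1351727073 ioapic   virtio-9p-0
         89          11                 3526           1351727073 ioapic   ether0
        113          10                    1               187675 ioapic   virtio-console-2
        113          10                    1               187675 ioapic   virtio-9p-1
"""

for number, cls, name in virtio_devices(sample):
    print(number, cls, name)
```

Sorting by the enumeration number makes it easy to compare the result against the order of the -device options on the QEMU command line.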

The order in which PCI devices are presented to the guest appears to be defined by the order of the device definitions on the QEMU command line, so it is important to remember that the internal names of virtio devices as seen by the Harvey kernel are not persistent. The QEMU command line fragment corresponding to the output above is presented below:

-device virtio-9p-pci,fsdev=hrvtmp,mount_tag=harvtmp \                     [0]
-fsdev local,path=$HARVEY/usr/harvey/tmp,id=hrvtmp,security_model=none \
-device virtio-9p-pci,fsdev=fstmp,mount_tag=hosttmp \                      [1]
-fsdev local,path=/tmp,id=fstmp,security_model=none \
-device virtio-serial-pci,max_ports=1 \                                    [2]
-device virtconsole,chardev=vc02 \
-chardev socket,id=vc02,path=/tmp/vc02 \

Virtio devices are not all presented to userland programs in the same way. For example, the native 9p mount device operates "under the hood" and does not expose itself directly; device identification is done via mount tags. In contrast, virtio-serial devices expose their raw virtqueues for direct file read and write operations, and are addressed by their internal names. The details of each representation are discussed below for each device type.

Native 9p

A well-known capability of QEMU is to provide a guest with access to any host directory via 9p; the guest has to use a specialized virtio device in order to access it. The actual implementation is based on the 9p2000.u (or .L) flavors of 9p, but the special Harvey kernel-level driver performs the necessary protocol translation by stripping the extra message elements, so the rest of the Harvey kernel is not confused.

Each host share should be properly tagged so that guests can distinguish between them. For use with Harvey, mount tags should not contain spaces or colons.

Setup on the host side

See this page for general information. An example setup is shown below (a fragment from a typical shell script that runs Harvey in QEMU):

qemu-system-x86_64 \
-append "..."
...

-device virtio-9p-pci,fsdev=hrvtmp,mount_tag=harvtmp \
-fsdev local,path=$HARVEY/usr/harvey/tmp,id=hrvtmp,security_model=none \

...

-kernel $HARVEY/sys/src/9/amd64/harvey.32bit $*

Here, $HARVEY should point to the root directory of the Harvey source tree, and /usr/harvey/tmp will be visible in Harvey as /tmp (just as an example).

Setup on the Guest Side

The list of available mount tags is provided via the file /dev/mtags (served by the console driver):

--r--r--r-- P 0 harvey harvey 0 Feb  7  2006 /dev/mtags

term% cat /dev/mtags
harvtmp:-
hosttmp:-

For each tag, a colon-separated list is provided, the first token being the tag name. Unmounted tags have a hyphen after the first colon.

To mount a tag, use a command of the following form:

mount -d '#9' /dev/null /mnt/xxx harvtmp

What matters here: use the '#9' mount device in order to get proper translation of the protocol (the standard '#M' device does not support 9p2000.u, and '#9' is a "phantom" device on top of '#M' which takes care of it). The first command parameter, /dev/null, can be any existing file; it will not be accessed, and is needed here only as a placeholder (cf. mount none -t tmpfs... in Linux). The next parameter is the mount point location, provided as usual. The last parameter, harvtmp, is mandatory. It is the "spec" in mount parlance, and must contain the mount tag that identifies the host share to be mounted.

After the command above, the contents of /dev/mtags are presented differently:

harvtmp:9P2000.u:131096:2:2
hosttmp:-

The first tag is now mounted, so its line displays the tag name, the protocol version, the message size, and the PID cache use and PID cache hit counts (the PID cache is needed to provide the proper ordering of 9p messages transmitted over the virtqueue, as the QEMU native 9p implementation requires).
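The two line shapes of /dev/mtags can be decoded with a few lines of Python. This is a minimal sketch for scripts that process a copy of the file outside Harvey; the MountTag class and the parse_mtags_line function are hypothetical names introduced for this example, and the field order follows the description above. Splitting on colons is safe only because mount tags must not contain colons.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MountTag:
    """One line of /dev/mtags (hypothetical container for this example)."""
    tag: str
    mounted: bool
    version: Optional[str] = None
    msize: Optional[int] = None
    pid_cache_use: Optional[int] = None
    pid_cache_hits: Optional[int] = None


def parse_mtags_line(line):
    """Parse 'tag:-' (unmounted) or 'tag:version:msize:use:hits' (mounted)."""
    fields = line.strip().split(":")
    tag = fields[0]
    if len(fields) == 2 and fields[1] == "-":
        return MountTag(tag, mounted=False)
    if len(fields) != 5:
        raise ValueError("unexpected /dev/mtags line: %r" % line)
    version, msize, use, hits = fields[1:]
    return MountTag(tag, True, version, int(msize), int(use), int(hits))


# The two lines from the example above.
for text in ("harvtmp:9P2000.u:131096:2:2", "hosttmp:-"):
    print(parse_mtags_line(text))
```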

Known bugs

  • Unmounting of a tag is not properly detected, so even after unmount /mnt/xxx the contents of /dev/mtags remain unchanged.

  • If a host directory contains "non-regular" files (e.g. sockets), reading the directory contents causes a "malformed stat buffer" error.

Virtio-serial-pci Raw Virtqueues

The Virtio Serial Port device is provided by QEMU for arbitrary stream-like exchange of information. It can be connected to a pipe-like resource (e.g. a Unix socket or a TCP connection) on the host side. The guest provides a buffer to be written to the pipe, and it will be read on the host side; conversely, the information written by the host program into the pipe will be returned to the guest in the read buffer. The device has one limitation: it is not possible to know how much of the read buffer was filled by the host program. Whatever the size of the read buffer provided by the guest, the read operation always returns that same size, even if the host modified nothing in the buffer.

The virtio-serial-pci devices are presented under /dev/virtcon, one file per virtqueue:

--rw-rw-rw- C 0 harvey harvey 0 Feb  7  2006 /dev/virtcon/virtio-console-2

Note the file name: it matches the internal PCI device name as shown in the /dev/irqalloc example earlier. The file can be opened, read, written, and closed as usual. Seek is not supported, and the offset parameter of read/write is ignored.

The host setup looks like this:

-device virtio-serial-pci,max_ports=1 \   [line 1]
-device virtconsole,chardev=vc02 \        [line 2]
-chardev socket,id=vc02,path=/tmp/vc02 \  [line 3]

Even though QEMU provides a multi-port feature for virtio serial ports (multiple virtqueues per controller), Harvey does not use it. The interrupt bit is set per device rather than per virtqueue, so if a device has multiple virtqueues, an additional step is required to find which queue caused the interrupt, which makes interrupt processing longer. It is therefore more practical to have N serial devices with one port each than one serial device with N ports.

Line 1 in the example above defines the controller. The property max_ports=1 is recommended, as it limits the number of virtqueues created per controller and reduces both PCI scan time and interrupt processing time. Line 2 names the serial device and its associated host pipe resource. It is necessary to use virtconsole rather than virtio-serial-port because max_ports=1 sets the upper limit for the port index to 0, and QEMU associates port 0 specifically with the virtual console. Line 3 defines the pipe resource on the host (chardev= on line 2 should be the same as id= on line 3). A socket can be in either client mode (as shown) or server mode (refer to the QEMU documentation for details).

Write operations on a virtio serial port whose host resource is not connected will lose information; read operations on such a resource will block. Beware that a malfunctioning host program connected to the resource (one that does not return from a read or write operation) may block the whole of QEMU, but this is not a Harvey limitation.
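To experiment with a raw virtqueue, something has to listen on the host socket before QEMU starts, because the chardev in the example is in client mode. The Python sketch below is one hypothetical host-side endpoint that echoes back whatever the guest writes. The function names serve_once and echo are inventions of this example, and the path /tmp/vc02 is taken from the command line fragment above.

```python
import os
import socket


def echo(conn):
    """Send every chunk received on conn back to the peer until it closes."""
    while True:
        data = conn.recv(4096)
        if not data:
            break
        conn.sendall(data)


def serve_once(path, handle):
    """Listen on a Unix socket at path, accept one connection, run handle().

    Hypothetical helper: QEMU (client mode) is expected to connect to the
    socket once it is started with a matching -chardev option.
    """
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file, like rm -f
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            handle(conn)


# Typical use, started before QEMU:
# serve_once("/tmp/vc02", echo)
```

Because the guest cannot tell how much of its read buffer was filled, an echo endpoint like this is useful only for checking connectivity; a real exchange needs a protocol that carries its own message lengths.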

9p over Virtio-serial-pci

Native 9p under QEMU has the limitation that it cannot work with an arbitrary host program that serves 9p on its standard input/output. Using virtio serial ports for 9p makes this possible. In this example we use ufs to serve a host directory via 9p over a virtio serial port, but any other compliant program can be used. Host file system access over virtio serial ports is generally slower and less stable than over native 9p.

Setup on the host side

Define a Unix socket in client mode, as shown in the example above. Before starting QEMU with Harvey, make sure that ufs is serving the desired host directory:

rm -f  /tmp/vc02

$HARVEY/util/ufs -ntype=unix -addr=/tmp/vc02 -root=/tmp -debug=1 &
nppid=$!

sleep 1

Setup on the guest side

The virtio serial port device limitation (there is no way to know the actual amount of data placed in the read buffer) makes it impossible, at least with the existing devmnt, to mount a virtqueue file directly: devmnt validates the 9p messages it receives, and at least for Rversion it rejects messages whose actual length (as returned by read) differs from the length encoded in the first 4 octets.

A very simple userland program, 9pvpxy (9p virtio proxy), was added to Harvey to enable proper 9p message handling that works around this device limitation. The program takes a single parameter, the name of the virtio serial port virtqueue file, and no other options. From the kernel's standpoint, this program forms a 9p server over its standard input/output, which makes it possible to work with srv and mount as shown below:
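The length mismatch can be illustrated outside Harvey. Every 9p message begins with a 4-octet little-endian size that counts the size field itself, followed by a 1-octet type and a 2-octet tag. The Python sketch below shows one way to recover a message from a read buffer that is always returned full: trust the encoded size rather than the read length. It is an illustration of the framing only and is not claimed to be how any Harvey program handles it; trim_9p_message is a hypothetical name, and the 8192-octet buffer size is an arbitrary choice for the example.

```python
import struct

HEADER_LEN = 7  # size[4] + type[1] + tag[2]


def trim_9p_message(buf):
    """Return only the octets that belong to the 9p message at the start of buf.

    Hypothetical helper: the read length is ignored and the size field
    encoded in the first 4 octets decides where the message ends.
    """
    if len(buf) < HEADER_LEN:
        raise ValueError("buffer too short for a 9p header")
    (size,) = struct.unpack_from("<I", buf, 0)
    if size < HEADER_LEN or size > len(buf):
        raise ValueError("implausible 9p message size: %d" % size)
    return bytes(buf[:size])


# Build an Rversion message: size[4] Rversion(101) tag[2] msize[4] version[s].
version = b"9P2000"
body = struct.pack("<BHI", 101, 0xFFFF, 8192)        # type, NOTAG, msize
body += struct.pack("<H", len(version)) + version    # string = length[2] + text
rversion = struct.pack("<I", 4 + len(body)) + body

# Simulate the device: the read always returns the full buffer size.
padded = rversion + bytes(8192 - len(rversion))

print(len(padded), len(trim_9p_message(padded)))
```

A message trimmed this way has an actual length equal to its encoded length, which is the property devmnt checks.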

srv -e '9pvpxy /dev/virtcon/virtio-console-2' vc2
mount -c -n /srv/vc2 /mnt/xxx

After these commands, the host directory served by ufs on the host side is visible under the mountpoint chosen on the guest side.

Discussion

It might be technically possible to modify the driver for the '#9' device to work with virtio serial port virtqueues as well, keeping both mechanisms under the kernel hood. On the other hand, virtio serial ports can be viewed as generic guest-host pipes usable for various purposes (e.g. the Spice remote guest viewer uses them as control channels). It is more logical to expect userland programs on both sides to communicate over such pipes than to restrict the pipes to filesystem services within the kernel. Future usage will show whether this assumption is correct.