Kimberlize

bgilbert edited this page Feb 19, 2013 · 3 revisions

Kimberlize System

The Kimberlize system provides the ability for a user to execute applications on computers with no prior installation through the use of virtual machine technology. A virtual machine running the desired application is created beforehand by the user in anticipation of future usage on other machines. Then at runtime it is assembled and resumed on-the-fly on the target machine.

We chose to use the VirtualBox virtual machine monitor as our virtualization platform. VirtualBox is an open-source virtualization product of the German software company Innotek, which was recently purchased by Sun Microsystems.

Since transferring up to tens of gigabytes of virtual machine state at runtime may be infeasible, the Kimberlize preparation process separates the much smaller delta created by the application's installation and subsequent execution from the large base virtual machine state. It does this using the snapshot and differencing disk features of VirtualBox. A user creating a Kimberlized application must first prepare a base virtual machine. He then runs the kimberlize script, which installs the desired application in this virtual machine, preserves the resulting virtual machine state, returns the virtual machine to its base state, and outputs a file representing the binary difference in state after application installation and execution.

Administrative Setup

From the user's or administrator's perspective, preparing for the Kimberlize process involves the following steps:

  1. Load VirtualBox's GUI and create a new Virtual Machine.
    1. Begin by clicking the "New" button and launching the wizard.
    2. Choose a unique name and any type of operating system for the VM.
    3. Create a new boot hard disk for the virtual machine, or reuse an existing boot disk from a past Kimberlize execution. If creating a new boot hard disk, a dynamically expanding image is recommended: it allocates space only as the guest writes data, which keeps the disk image and the overall state size small.
    4. Finish the virtual machine creation wizard.
  2. Resume the virtual machine.
    1. Install an operating system on the boot disk, if necessary (see Fedora Install).
    2. Add a user "kimberley" with password "kimberley". Kimberlize logs in as this user over ssh to execute scripts remotely. The password's strength does not matter, since the virtual machine sits behind a masquerading (NAT) firewall.
    3. Using sudo and visudo, add user "kimberley" to /etc/sudoers with the NOPASSWD option so that it can update the virtual machine through its package manager without a password prompt. If the guest is not Linux, give the kimberley user Administrator access instead.
    4. Update the operating system and applications from the base install, using the appropriate update tool.
    5. Install openssh-server in the virtual machine.
    6. (optional) Add any public SSH keys of the host to the authorized keys of the virtual machine to remove password prompts during the kimberlize process.
    7. (optional) Install VirtualBox "Guest Additions" for various enhancements, such as improved mouse interaction.
    8. (optional) Disable the screensaver and power-saving features in the guest.
    9. Configure the operating system to use internet sources for updates rather than the DVD. In Ubuntu, this involves changing Synaptic's update sources.
    10. (optional) Set the default screen resolution. This is unnecessary if the guest automatically resizes to the host resolution, but that feature is not yet fully supported in VirtualBox and may not work in all cases.
    11. (optional) Add icons that mount and unmount the floppy disk.
    12. Pause the virtual machine, saving state.
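
Before a first kimberlize run, it may be worth confirming that the guest's SSH server answers. The following Python sketch is a minimal check and not part of Kimberlize: guest_ssh_ready is a hypothetical helper, and its default port 2222 assumes the host-to-guest forward that kimberlize sets up during a run (outside a run, a similar forward would have to be configured by hand). It only checks that the server sends an SSH identification string; it does not log in.

```python
import socket


def guest_ssh_ready(host="127.0.0.1", port=2222, timeout=3.0):
    """Return True if the service at host:port greets with an SSH
    identification string ("SSH-2.0-...", RFC 4253 section 4.2).

    Hypothetical helper: port 2222 assumes the host-to-guest forward
    that kimberlize configures. Servers may send other lines before
    the identification string, so a False result is only a hint.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(256)
    except OSError:  # refused, unreachable, or timed out
        return False
    return banner.startswith(b"SSH-")


if __name__ == "__main__":
    print("guest ssh reachable:", guest_ssh_ready())
```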

Next, a user wishing to Kimberlize a new application must write two scripts. The first contains the commands, executed in the guest, that set up repositories, fetch data, and install the packages the application needs; the second contains the commands that launch the application. Both scripts are copied into and executed within the virtual machine. They may contain any commands except those that would terminate the remote script execution as a side effect, such as rebooting the virtual machine.
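
Because the scripts must not end the ssh session, a quick pre-flight check can catch obvious mistakes. The Python sketch below is a heuristic, not something Kimberlize is documented to do: the helper name find_forbidden_commands and its deny-list (reboot, shutdown, halt, poweroff, init 0 and init 6, typical of Linux guests) are assumptions.

```python
import re

# Hypothetical deny-list: commands that reboot or halt a Linux guest and
# would therefore cut off the ssh session running the script. A command
# counts if it starts a line, follows ; & or |, or follows sudo.
_FORBIDDEN = re.compile(
    r"(?:^|[;&|]|\bsudo)\s*(reboot|shutdown|halt|poweroff|init\s+[06])\b"
)


def find_forbidden_commands(script_text):
    """Return (line_number, command) pairs for commands that would likely
    terminate remote script execution. Heuristic only: it does not
    understand quoting, functions, or commands hidden in variables."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naively strip shell comments
        for match in _FORBIDDEN.finditer(code):
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    install = "sudo yum -y install gimp\nsudo reboot\n"
    print(find_forbidden_commands(install))  # [(2, 'reboot')]
```

Running such a check on both the install and the execute script before calling kimberlize would flag a trailing reboot before it costs a full run.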

kimberlize Command

Syntax

The Kimberlize process can then be run. The user invokes it from the command line, specifying the base virtual machine to use and the installation and execution scripts to run inside the guest. The command syntax is:

kimberlize [-e [-k encryption-key-file]] [-l username] [-n] <vm-name> <install-script> <execute-script>

The -e option encrypts the patch, optionally with the key file given by -k; -n disables LZMA (Lempel-Ziv-Markov chain algorithm) compression of the Kimberlize patch; and -l selects a different username for ssh logins into the virtual machine.
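
The option relationships in this syntax can be made explicit by modelling it with Python's argparse. This is a hypothetical sketch, not the kimberlize implementation: the function name is invented, the default login name kimberley comes from the setup steps, and rejecting -k without -e is an assumption based on how the syntax nests the two options.

```python
import argparse


def parse_kimberlize_args(argv=None):
    """Hypothetical argparse model of:
    kimberlize [-e [-k encryption-key-file]] [-l username] [-n]
               <vm-name> <install-script> <execute-script>
    """
    parser = argparse.ArgumentParser(prog="kimberlize")
    parser.add_argument("-e", dest="encrypt", action="store_true",
                        help="encrypt the patch with AES-128")
    parser.add_argument("-k", dest="key_file",
                        help="passphrase file (first line), only with -e")
    parser.add_argument("-l", dest="username", default="kimberley",
                        help="ssh login name inside the guest")
    parser.add_argument("-n", dest="no_compress", action="store_true",
                        help="skip LZMA compression of the patch")
    parser.add_argument("vm_name")
    parser.add_argument("install_script")
    parser.add_argument("execute_script")
    args = parser.parse_args(argv)
    # The syntax nests -k inside -e, so -k alone is rejected (assumption).
    if args.key_file is not None and not args.encrypt:
        parser.error("-k requires -e")
    return args


if __name__ == "__main__":
    print(parse_kimberlize_args())
```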

Internals

The Kimberlize script performs the following actions:

  1. Create a checkpoint. This checkpoint logically divides the base virtual machine state from the application state by forcing all disk updates into new differencing files rather than back to the dynamically expanding boot disk image.
  2. Configure VirtualBox to forward port 2222 in the host to port 22 in the guest, to allow ssh access to the virtual machine.
  3. Resume the virtual machine.
  4. Run the install script with ssh, waiting for it to complete.
  5. Run the application execution script with "ssh -f" in the background.
  6. Wait a few seconds (5 at the moment) as the application initializes.
  7. Suspend the virtual machine.
  8. Create a directory /tmp/vm_name/ and copy into it the VirtualBox.xml global configuration file and the vm_name.xml VM-specific configuration file. This directory has the same structure as a VirtualBox virtual machine directory, which allows the dekimberlize process to untar it directly over a base virtual machine.
  9. Create a directory /tmp/vm_name/Snapshots/, in the same manner that VirtualBox has a Snapshots/ subdirectory of virtual machines.
  10. Run xdelta, a binary differencing tool, to compute the difference between the post-install in-memory state and the checkpointed in-memory state. This difference represents the new in-memory state generated by the application's installation. xdelta is unnecessary for the disk state, since VirtualBox already keeps a compact differencing disk. The output is stored under the current memory state's UUID as /tmp/vm_name/Snapshots/{memory_state's_uuid}.diff. Note: xdelta's internal compression is disabled for this step.
  11. Copy the differencing disk created since the snapshot into /tmp/vm_name/Snapshots/.
  12. Revert the virtual machine to its snapshot, effectively discarding all new state (including the snapshot). The virtual machine is now identical to its state before performing Kimberlize.
  13. Remove the port forwarding to the virtual machine.
  14. Create a tarball of /tmp/vm_name and write it as app_state.tar in the working directory. The file can be renamed freely as long as its .tar or .tar.lzma extension is preserved, because dekimberlize uses the extension to decide whether the patch is compressed.
  15. (optional) Compress the tarball using lzma (the Lempel-Ziv-Markov Chain algorithm).
  16. (optional) Encrypt the compressed tarball with an AES-128 cipher (the -e option to kimberlize), using a key either supplied on the command line (with -k) or generated from random data in /dev/urandom. If the key is generated, it is written to a file and the user is told where. openssl performs the encryption and can deterministically derive AES keys from a user-supplied passphrase, so the file passed with -k should contain the passphrase on its first line.
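
Steps 8, 9, 11, 14 and 15 amount to building a directory shaped like a VirtualBox machine directory and archiving it. The Python sketch below illustrates that packaging under stated assumptions: package_patch and its parameters are hypothetical; the xdelta .diff and the differencing .vdi are assumed to exist already; archive members are stored relative to the staging directory so they can be extracted directly into $HOME/.VirtualBox/Machines/vm_name/; and Python's lzma module with the legacy FORMAT_ALONE container stands in for the lzma tool. Encryption (step 16) is left out.

```python
import lzma
import shutil
import tarfile
import tempfile
from pathlib import Path


def package_patch(vm_name, global_xml, vm_xml, snapshot_files, out_dir,
                  compress=True, staging_root=None):
    """Hypothetical sketch of kimberlize steps 8-9, 11 and 14-15.

    snapshot_files: the xdelta output ({uuid}.diff) and the differencing
    disk (*.vdi), assumed to have been produced already.
    Returns the path of app_state.tar or app_state.tar.lzma.
    """
    # Steps 8-9: /tmp/vm_name/ and /tmp/vm_name/Snapshots/ by default.
    stage = Path(staging_root or tempfile.gettempdir()) / vm_name
    snapshots = stage / "Snapshots"
    snapshots.mkdir(parents=True, exist_ok=True)
    shutil.copy2(global_xml, stage / "VirtualBox.xml")
    shutil.copy2(vm_xml, stage / f"{vm_name}.xml")

    # Step 11 (and the step 10 .diff): copy the snapshot state files.
    for path in snapshot_files:
        shutil.copy2(path, snapshots / Path(path).name)

    # Step 14: store members relative to the staging directory, so the
    # archive can be extracted straight into the VM's directory.
    tar_path = Path(out_dir) / "app_state.tar"
    with tarfile.open(tar_path, "w") as tar:
        for child in sorted(stage.iterdir()):
            tar.add(child, arcname=child.name)

    if not compress:  # corresponds to kimberlize -n
        return tar_path

    # Step 15: legacy .lzma container (FORMAT_ALONE) rather than .xz.
    lzma_path = tar_path.with_name(tar_path.name + ".lzma")
    with open(tar_path, "rb") as src, \
            lzma.open(lzma_path, "wb", format=lzma.FORMAT_ALONE) as dst:
        shutil.copyfileobj(src, dst)
    tar_path.unlink()
    return lzma_path
```

Keeping compression behind a flag mirrors the -n option, and the resulting .tar or .tar.lzma extension is what lets the consumer decide whether to decompress.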

A user would then store the resulting tarball on the machine where it will be applied, or make it publicly available through a web server. To use the patch, he calls the dekimberlize script, supplying either the filename of the kimberlize patch or the URL where it can be found. dekimberlize performs the inverse operation of kimberlize: it takes a kimberlize patch and applies it to a base virtual machine to produce one with the desired application installed.

dekimberlize Command

Syntax

The syntax of the dekimberlize command is:

dekimberlize [-a floppy_image_path] [-d encryption_key_file] <-f || -i> <kimberlize_path> <vm_name>

The dekimberlize command takes one of two options specifying where the VM overlay is: -f if the file is stored locally on the machine, or -i if it must be fetched from a URL. Additionally, the -a option attaches a floppy disk image to the VM once it is running, and the -d option indicates that the VM overlay is encrypted and can be decrypted with the key in the supplied file.
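
The same argparse approach can model dekimberlize's syntax, in which exactly one of -f and -i is required. As with the other sketches, this is hypothetical: the function name is invented, and reading -f and -i as switches that describe how to interpret kimberlize_path (rather than as options taking it as their argument) is an assumption.

```python
import argparse


def parse_dekimberlize_args(argv=None):
    """Hypothetical argparse model of:
    dekimberlize [-a floppy_image_path] [-d encryption_key_file]
                 <-f || -i> <kimberlize_path> <vm_name>
    """
    parser = argparse.ArgumentParser(prog="dekimberlize")
    parser.add_argument("-a", dest="floppy_image",
                        help="floppy image to attach once the VM is running")
    parser.add_argument("-d", dest="key_file",
                        help="key file for an encrypted VM overlay")
    # Exactly one of -f / -i says how to interpret kimberlize_path.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", dest="source", action="store_const",
                        const="file", help="kimberlize_path is a local file")
    source.add_argument("-i", dest="source", action="store_const",
                        const="url", help="kimberlize_path is a URL")
    parser.add_argument("kimberlize_path")
    parser.add_argument("vm_name")
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_dekimberlize_args())
```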

Internals

The dekimberlize script performs the following actions:

  1. Fetch the VM overlay, if indicated as a URL via the -i command-line option.
  2. Decrypt the VM overlay, if a decryption key was supplied with the -d command-line option. openssl is called to perform this task.
  3. Uncompress the VM overlay, if the filename's extension indicates it is compressed.
  4. Rename the virtual machine's in-memory state file to its checkpointed filename so that it is not overwritten when the patch is untarred in the next step. Remember, this file represents the base virtual machine's in-memory state.
  5. Untar the kimberlize patch over the virtual machine's directory: $HOME/.VirtualBox/Machines/vm_name/
  6. Apply the binary difference file {memory_state_uuid}.diff against the base in-memory state to recreate the file {memory_state_uuid}.sav that contains the running application. xdelta is used for this.
  7. Resume the virtual machine.
  8. Wait until VirtualBox indicates the virtual machine is running.
  9. If a floppy disk image file was supplied with -a, attach it as a floppy disk in the VM.
  10. Wait until the virtual machine is no longer in use and the user has suspended or powered it down. Alternatively, creating the file /tmp/dekimberlize_finished indicates to dekimberlize that usage has completed.
  11. Power off the machine.
  12. Revert the machine to its checkpoint, as in kimberlize, restoring the base virtual machine state.
  13. Delete {memory_state_uuid}.diff
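
Steps 8 and 10 form a simple polling protocol: dekimberlize waits until the virtual machine stops or until /tmp/dekimberlize_finished exists. A minimal Python sketch follows. wait_for_finish is a hypothetical helper; the is_running callback stands in for however the real script queries VirtualBox (for example, through VBoxManage), which is not specified here; and the poll interval and optional timeout are assumptions.

```python
import os
import time


def wait_for_finish(is_running, finish_file="/tmp/dekimberlize_finished",
                    poll_interval=1.0, timeout=None):
    """Block until the VM stops running or finish_file appears.

    is_running: caller-supplied callable returning True while the VM
    is in use (hypothetical; e.g. a wrapper around a VirtualBox query).
    Returns "finish_file", "stopped" or "timeout".
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        # The finish file lets the user end the session explicitly.
        if os.path.exists(finish_file):
            return "finish_file"
        # Otherwise stop once the user has suspended or powered off the VM.
        if not is_running():
            return "stopped"
        if deadline is not None and time.monotonic() >= deadline:
            return "timeout"
        time.sleep(poll_interval)
```

Checking the finish file before the VM state means an explicit "done" signal wins even if the VM is still running, which matches its role as an alternative way to end the session.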

Since the dekimberlize process discards new state and reverts to a checkpoint stored in the kimberlize patch, it is an idempotent operation and can be applied successively with no side effects.

If the user would like to run the virtual machine on a different host from the one where the kimberlize script was run, he can move the base virtual disk file and the virtual machine directory to the corresponding directories on the other host. However, due to technical limitations in VirtualBox, a virtual machine with snapshots cannot currently be moved safely between machines. Innotek has targeted this limitation for future resolution.

Appendix: VirtualBox Directory Structure

Kimberlize depends on VirtualBox's directory structure representation for virtual machines and their snapshots.

Path                                                Purpose
$HOME/.VirtualBox/                                  Contains user-specific VirtualBox data.
$HOME/.VirtualBox/VirtualBox.xml                    VirtualBox's global configuration file, which contains information about virtual disks.
$HOME/.VirtualBox/VDI/*.vdi                         Virtual disks used in creating new virtual machines.
$HOME/.VirtualBox/Machines/                         Virtual machine-specific data, excluding the virtual disks.
$HOME/.VirtualBox/Machines/vm_name/                 A specific virtual machine's data, including checkpoints.
$HOME/.VirtualBox/Machines/vm_name/vm_name.xml      VM-specific configuration file.
$HOME/.VirtualBox/Machines/vm_name/Snapshots/       Stores in-memory states and differencing virtual disks created by taking snapshots.
$HOME/.VirtualBox/Machines/vm_name/Snapshots/*.sav  In-memory state of a checkpoint (or of the current state).
$HOME/.VirtualBox/Machines/vm_name/Snapshots/*.vdi  Virtual disk differences of a checkpoint (or of the current state).
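
The layout above can be captured in a small path helper, for example to locate the .sav and .vdi files that kimberlize and dekimberlize manipulate. This Python sketch assumes the $HOME/.VirtualBox layout described in this appendix; virtualbox_paths is a hypothetical name, and other VirtualBox versions may use different default locations.

```python
from pathlib import Path


def virtualbox_paths(vm_name, home=None):
    """Map the VirtualBox locations described in this appendix for one VM.

    Assumes the $HOME/.VirtualBox layout; other VirtualBox versions may
    place these files elsewhere. The .sav and .vdi lists are whatever
    currently exists in the Snapshots/ directory (empty if none).
    """
    base = Path(home) if home is not None else Path.home()
    vbox = base / ".VirtualBox"
    machine = vbox / "Machines" / vm_name
    snapshots = machine / "Snapshots"
    return {
        "global_config": vbox / "VirtualBox.xml",
        "disk_dir": vbox / "VDI",
        "machine_dir": machine,
        "vm_config": machine / f"{vm_name}.xml",
        "snapshots_dir": snapshots,
        "saved_states": sorted(snapshots.glob("*.sav")),
        "diff_disks": sorted(snapshots.glob("*.vdi")),
    }
```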