3. The structure of the Hypervisor

# 3.1. The Hypervisor Kernel

## 3.1.1. Booting into the Kernel

The hypervisor is built into a binary that is loaded into memory and booted through the Pre-boot Execution Environment (PXE). With this method the binary is deployed over the network: the client communicates through its Network Interface Controller (NIC) with the PXE server, and the server, previously configured to deploy the binary, sends it to the virtual machine, where the Basic Input Output System (BIOS) loads it into memory. The PXE boot path uses the Multiboot standard, which specifies the boot sequence and hands control to the Hypervisor entry point found in the Hypervisor binary. The Multiboot Specification requires a header structure to be filled in within the binary in order to boot correctly. A binary that boots through Multiboot must therefore place the following structure in its *“.boot”* section, at the offset from the beginning of the section given by a variable defined in the binary and named MULTIBOOT\_BASE. The layout is the one given in section 3.1.1 of the Multiboot Specification:

| Offset | Type | Field Name |
| --- | --- | --- |
| 0 | DWORD | magic |
| 4 | DWORD | flags |
| 8 | DWORD | checksum |
| 12 | DWORD | header\_addr |
| 16 | DWORD | load\_addr |
| 20 | DWORD | load\_end\_addr |
| 24 | DWORD | bss\_end\_addr |
| 28 | DWORD | entry\_addr |
| 32 | DWORD | mode\_type |
| 36 | DWORD | width |
| 40 | DWORD | height |
| 44 | DWORD | depth |

Two groups of fields in the Multiboot header are optional. The address fields *header\_addr, load\_addr, load\_end\_addr, bss\_end\_addr, entry\_addr* are valid only when bit 16 of the *flags* field is set, and the video fields *mode\_type, width, height, depth* are valid only when bit 2 of the *flags* field is set (bits are counted from bit 0). The *magic*, *flags* and *checksum* fields are mandatory and must be filled in correctly in order to boot through the Multiboot standard. The *magic* field must hold the fixed hexadecimal value 0x1BADB002. To boot the Hypervisor, bit 16 is set, because the *load\_addr* field must hold the base of the Hypervisor binary so that the binary is placed in memory at a fixed physical base. The *entry\_addr* field must also be set, in order to give the Multiboot Loader the address of the entry point in the Hypervisor code, so that the Hypervisor code is executed directly once the Multiboot boot sequence finishes. The *checksum* field verifies the Multiboot header: it must equal the 32-bit negation of the sum of the first two fields, *magic* and *flags*, so that *magic*, *flags* and *checksum* add up to zero as an unsigned 32-bit value.
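The header layout and the checksum rule described above can be sketched in C. This is a minimal illustration, not the Hypervisor's actual source: the names `struct multiboot_header`, `MULTIBOOT_FLAG_ADDRS`, `MULTIBOOT_FLAG_VIDEO`, `multiboot_checksum`, `multiboot_header_make` and `multiboot_header_is_valid` are assumptions introduced here, and the address fields are left at zero for the caller to fill in.

```c
#include <assert.h>
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
/* Flag bits, counted from bit 0 (names are hypothetical). */
#define MULTIBOOT_FLAG_VIDEO   (1u << 2)   /* mode_type..depth are valid      */
#define MULTIBOOT_FLAG_ADDRS   (1u << 16)  /* header_addr..entry_addr valid   */

/* Twelve consecutive 32-bit fields, in the order given by the table. */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
    uint32_t header_addr;
    uint32_t load_addr;
    uint32_t load_end_addr;
    uint32_t bss_end_addr;
    uint32_t entry_addr;
    uint32_t mode_type;
    uint32_t width;
    uint32_t height;
    uint32_t depth;
};

/* The last field (depth) sits at offset 44, so the header is 48 bytes. */
static_assert(sizeof(struct multiboot_header) == 48,
              "multiboot header must be 48 bytes");

/* Checksum = -(magic + flags) in unsigned 32-bit arithmetic. */
uint32_t multiboot_checksum(uint32_t flags)
{
    return (uint32_t)0 - (MULTIBOOT_HEADER_MAGIC + flags);
}

/* Build a header with the three mandatory fields filled in;
 * all optional fields start at zero. */
struct multiboot_header multiboot_header_make(uint32_t flags)
{
    struct multiboot_header h = {0};
    h.magic = MULTIBOOT_HEADER_MAGIC;
    h.flags = flags;
    h.checksum = multiboot_checksum(flags);
    return h;
}

/* A header is acceptable when the magic matches and the three
 * mandatory fields sum to zero modulo 2^32. */
int multiboot_header_is_valid(const struct multiboot_header *h)
{
    return h->magic == MULTIBOOT_HEADER_MAGIC &&
           (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}
```

With only bit 16 set, the sum of *magic* and *flags* is 0x1BAEB002, so the checksum works out to 0xE4514FFE. In a real build the header is usually emitted by the assembler or linker script into the boot section rather than constructed at run time; the C form is used here only to make the arithmetic explicit.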

Before giving control to the code at the address specified in the header, the Multiboot Loader fills in a structure with boot information gathered from the BIOS, such as the RAM memory map and other data needed in the early stage of booting. When the Multiboot Loader has completed this structure and is ready to hand control to the booted code, it places the pointer to this structure in the EBX register and, in the EAX register, a value called MULTIBOOT\_LOADER\_MAGIC, equal to 0x2BADB002 in hexadecimal. The loaded code should check this value to verify that the boot was indeed performed by the Multiboot Loader and that it was successful.
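The check of the values received in EAX and EBX can be sketched as follows. This is an illustrative fragment under stated assumptions: the function name `multiboot_handoff_is_valid` is hypothetical, and it assumes that a 32-bit assembly stub at the entry point has saved EAX and EBX and passed them on as arguments.

```c
#include <stdint.h>

#define MULTIBOOT_LOADER_MAGIC 0x2BADB002u

/* eax_magic:     value found in EAX at the entry point
 * ebx_info_addr: value found in EBX, the physical address of the
 *                boot information structure
 * Returns 1 when the hand-off looks like a Multiboot boot, 0 otherwise. */
int multiboot_handoff_is_valid(uint32_t eax_magic, uint32_t ebx_info_addr)
{
    if (eax_magic != MULTIBOOT_LOADER_MAGIC) {
        return 0;   /* not started by a Multiboot-compliant loader */
    }
    if (ebx_info_addr == 0u) {
        return 0;   /* no boot information structure to read */
    }
    return 1;
}
```

Rejecting a null pointer is an extra defensive step chosen for this sketch; the text above only requires the comparison against MULTIBOOT\_LOADER\_MAGIC.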

The Multiboot Loader gives control to the code after first putting the CPU in 32-bit protected mode, so the code at the address given by *entry\_addr* must be written in 32-bit assembly. The following subchapters cover all the transitions the CPU makes: from 32-bit code into 16-bit code, then back into 32-bit code, and finally into 64-bit mode, where the Hypervisor code executes most of the time.

## 3.1.2. The E820 Memory Map

Physical memory is a system of interconnected, addressable semiconductor devices, the most common being silicon-based transistors. It is almost always volatile, meaning that it does not retain information after the power is turned off: its contents are lost unless they were previously saved to non-volatile storage, such as a flash memory disk or a hard disk. Physical memory (or RAM) comes in two types: dynamic random-access memory, which is the primary storage in a computer system, and static random-access memory, which is used for fast CPU caches. RAM typically holds the data used by the current process or by the kernel of the operating system, and stores the results of operations, for example in variables, because the registers of a CPU have a very limited size. RAM is also where code is executed from: the CPU always fetches instructions from RAM, or from an instruction cache on modern CPUs, so without physical memory there would be nothing to execute and the CPU would do nothing. Managing memory is therefore an essential facility of any operating system.

When a computer system is powered on, one of the first “programs” to run is the BIOS. The BIOS detects the type of RAM, the number of RAM chips and their size. A good question is: if the BIOS is a “program”, and we concluded above that all programs are executed from RAM, how can the BIOS execute if it does not yet know anything about the physical memory? The answer is that the BIOS is a special type of “program”, called “firmware” in the literature. This code resides in a special area of Read Only Memory (ROM) from which it can be executed directly, so the BIOS does not need any knowledge of RAM in order to run. By the time the BIOS code has executed and given control to a boot loader, however, the BIOS knows about the physical memory and the different PCI devices present in the system. The BIOS can be queried for this information through interrupt 0x15, with the EAX register holding the code of the information that the caller requests. For memory management, the code is 0xE820, hence the name “E820 Memory Map”.

The 0xE820 function is considered the best way to detect the memory in a computer system: it is available in all computers built since 2002, and in most computers built before that year. This function detects all physical memory areas, even those above the 4-gigabyte limit of 32-bit architectures. Current operating systems, along with various boot loaders, including the Multiboot Loader, load the E820 memory map in order to know the boundaries of the physical memory in the early stages of booting. A method to get the E820 memory map is presented below, and the next subchapter shows how to query the memory map directly from the BIOS, without a boot loader.
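To make the content of the map concrete, the sketch below walks a list of E820 entries and adds up the usable memory, including the part above 4 gigabytes. It relies on two facts about the BIOS interface: each basic record holds a 64-bit base address, a 64-bit length and a 32-bit type, and type 1 marks usable RAM. Everything else is an assumption made for illustration: the names `struct e820_entry`, `e820_usable_bytes` and `e820_usable_bytes_above_4g` are hypothetical, and the struct is a decoded in-memory copy, not the raw 20-byte BIOS record (a plain C struct would be padded to 24 bytes).

```c
#include <stddef.h>
#include <stdint.h>

#define E820_TYPE_USABLE 1u                 /* RAM available to the OS */
#define FOUR_GIB         0x100000000ULL

/* Decoded copy of one E820 record (hypothetical representation). */
struct e820_entry {
    uint64_t base;     /* physical start address of the range */
    uint64_t length;   /* size of the range in bytes          */
    uint32_t type;     /* 1 = usable, other values = reserved */
};

/* Total number of usable bytes described by the map. */
uint64_t e820_usable_bytes(const struct e820_entry *map, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (map[i].type == E820_TYPE_USABLE) {
            total += map[i].length;
        }
    }
    return total;
}

/* Usable bytes located at or above the 4 GiB boundary.
 * Assumes base + length does not overflow 64 bits. */
uint64_t e820_usable_bytes_above_4g(const struct e820_entry *map, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t end = map[i].base + map[i].length;
        if (map[i].type != E820_TYPE_USABLE || end <= FOUR_GIB) {
            continue;
        }
        /* Clip ranges that straddle the boundary. */
        uint64_t start = map[i].base < FOUR_GIB ? FOUR_GIB : map[i].base;
        total += end - start;
    }
    return total;
}
```

The 64-bit base and length fields are what allow the map to describe ranges beyond the reach of a 32-bit address, which is why the sums are kept in 64-bit integers even when the calling code runs in 32-bit mode.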

The initial kernel code is booted, through the Multiboot Loader, directly into 32-bit protected mode. As presented before, the Multiboot Loader passes to the loaded code, in the EBX register, a pointer to a structure of early boot information that operating system kernels need, and that our Hypervisor Kernel needs as well. This structure has the following form: