x86 Bare Metal Examples
Dozens of minimal operating systems to learn x86 system programming. Tested on Ubuntu 18.04 host in QEMU 2.11 and real hardware. Userland cheat at: https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly ARM baremetal setup at: https://github.com/cirosantilli/linux-kernel-module-cheat#baremetal-setup
- 1. Getting started
- 2. Minimal examples
- 3. BIOS
- 4. No BIOS
- 5. Modes of operation
- 5.1. Legacy modes
- 5.2. Real mode
- 5.3. Protected mode
- 5.3.1. Intel protected mode example
- 5.3.2. Protected mode draw pixel
- 5.3.3. Protected mode segmentation
- 5.3.4. IDT
- 5.3.5. SMP
- 5.3.6. Paging
- 5.4. IA-32e mode
- 5.5. 64-bit mode
- 5.6. Compatibility mode
- 6. in and out instructions
- 7. Video mode
- 8. Power
- 9. UEFI
- 10. Coreboot
- 11. GRUB
- 12. Multiboot
- 13. Tests
- 14. About
- 15. Bibliography
- 15.1. Intel manual
- 15.2. Small educational projects
- 15.3. Tutorials
- 15.4. Multi collaborator websites
- 15.5. Progressive tutorials
- 15.6. Actually useful
- 15.7. ARM
- 16. LICENSE
First read this introduction: https://stackoverflow.com/questions/22054578/how-to-run-a-program-without-an-operating-system/32483545#32483545
Then on Ubuntu:
Each .S file on the top-level is an operating system! It gets compiled to a corresponding .img file.
Run the default OS on QEMU:
Run a given OS:
./run bios_hello_world
./run bios_putc
Examples described at:
Extensions are ignored for perfect tab completion, so all the following are equivalent:
./run min
./run min.
./run min.S
./run min.img
Use Bochs instead of QEMU:
./run bios_hello_world bochs
Then on the terminal start the simulation with:
To quit Bochs either:
press the poweroff button inside its GUI
Ctrl + C on the terminal, then type quit and hit enter
TODO: automate this step.
Insert a USB drive and determine its device (/dev/sdX) with:
sudo lsblk
sudo fdisk -l
Pick the .img file that you want to run and:
sudo dd if=bios_hello_world.img of=/dev/sdX
insert the USB in a computer
during boot, hit some special hardware-dependent key, usually F12 or Esc
choose to boot from the USB
When you are done, just hit the power button to shut down.
For example, on my T430 I see the following.
After turning on, this is when I have to press Enter to enter the boot menu:
Then, here I have to press F12 to select the USB as the boot device:
From there, I can select the USB as the boot device like this:
Alternatively, to change the boot order and choose the USB to have higher precedence so I don’t have to manually select it every time, I would hit F1 on the "Startup Interrupt Menu" screen, and then navigate to:
See also: Test hardware
big.img that contains all examples that can be booted from GRUB:
Now if you do:
sudo dd if=big.img of=/dev/sdX
you can test several examples with a single USB burn, which is much faster.
You can also try out the big image on QEMU for fun with:
qemu-system-i386 -hda big.img
You will also want to change the boot order to put the USB first from the F12 BIOS menu. This way you don’t have to hit F12 like a madman every time.
TODO: boot sectors that load STAGE2 are not working with the big image chainloader. TODO why?
If you don’t have an Ubuntu box, this is an easy alternative, for the first run:
sudo docker run --interactive --tty --name xbme --net=host ubuntu:18.04 bash
and the following runs:
sudo docker start xbme
sudo docker exec --interactive --tty xbme bash
sudo docker stop xbme
and to nuke the container later on:
# sudo docker rm xbme
Then proceed normally in the guest: install packages, and build:
apt-get update && \
apt-get install -y git && \
git clone https://github.com/cirosantilli/x86-bare-metal-examples && \
cd x86-bare-metal-examples && \
./configure -y && \
make
To overcome the lack of GUI, we can use QEMU’s VNC implementation instead of the default SDL, which is visible on the host due to --net=host:
./run bios_hello_world run -vnc :0
and then on host:
sudo apt-get install vinagre
vinagre localhost:5900
TODO: get sound working from docker: PC speaker: https://stackoverflow.com/questions/41083436/how-to-play-sound-in-a-docker-container
It should also be possible to run a GUI inside the container, but I haven’t tested: https://stackoverflow.com/questions/40658095/how-to-open-ubuntu-gui-inside-a-docker-image/57636624#57636624
To GDB step debug the program, run it with:
./run bios_hello_world debug
This will leave you at the very first instruction executed by our program, which is the beginning of our
Note however that this is not the very first instruction QEMU executes: that will actually be BIOS setup code that runs before our program itself.
You can then basically debug as you would a normal userland program, notably:
I then highly recommend that you use GDB Dashboard to see what is going on.
n skips over macros
ni steps within macros. But you will need to enable the printing of assembly code on GDB Dashboard to see where you are at
With this God-like GDB Dashboard setup, at 89cbe7be83f164927caebc9334bc42990e499cb1 I see a perfect program view such as:
 1 /* https://github.com/cirosantilli/x86-bare-metal-examples#bios-hello-world */
 2
 3 #include "common.h"
 4 BEGIN
 5     mov $msg, %si
 6     mov $0x0e, %ah
 7 loop:
 8     lodsb
 9     or %al, %al
10     jz halt
11     int $0x10
12     jmp loop
─── Assembly ────────────────────────────────────────────────────────────────────────────
0x00007c00 __start+0  cli
0x00007c01 __start+1  ljmp $0xc031,$0x7c06
0x00007c08 __start+8  mov %eax,%ds
0x00007c0a __start+10 mov %eax,%es
0x00007c0c __start+12 mov %eax,%fs
0x00007c0e __start+14 mov %eax,%gs
0x00007c10 __start+16 mov %eax,%ebp
0x00007c12 __start+18 mov %eax,%ss
0x00007c14 __start+20 mov %ebp,%esp
─── Registers ───────────────────────────────────────────────────────────────────────────
eax 0x0000aa55 ecx 0x00000000 edx 0x00000080 ebx 0x00000000 esp 0x00006f04
ebp 0x00000000 esi 0x00000000 edi 0x00000000 eip 0x00007c00 eflags [ IF ]
cs 0x00000000 ss 0x00000000 ds 0x00000000 es 0x00000000 fs 0x00000000 gs 0x00000000
─── Stack ───────────────────────────────────────────────────────────────────────────────
 from 0x00007c00 in __start+0 at bios_hello_world.S:4
(no arguments)
─────────────────────────────────────────────────────────────────────────────────────────
>>>
Debug symbols are obtained by first linking ELF files, and then using
objcopy on them to generate the final image. We then pass the ELF files with the debug information to GDB: https://stackoverflow.com/questions/32955887/how-to-disassemble-16-bit-x86-boot-sector-code-in-gdb-with-x-i-pc-it-gets-tr/32960272#32960272
Single stepping until a given opcode can be helpful sometimes: https://stackoverflow.com/questions/14031930/break-on-instruction-with-specific-opcode-in-gdb/31249378#31249378
TODO: detect if we are on 16 or 32 bit automatically from control registers. Now I’m using 2 functions
32 to switch manually, but that sucks. The problem is that it’s not possible to read them directly: http://stackoverflow.com/a/31340294/895245 If we had
cr0, it would be easy to do with an
if cr0 & 1 inside a hook-stop.
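The cr0 & 1 test mentioned above checks the PE (protection enable) bit, which is bit 0 of CR0. Here is a minimal sketch of that check in plain Python, assuming the CR0 value has already been obtained as an integer, which is precisely the part that GDB does not make easy. The function name is hypothetical, and a set PE bit only tells us that we left real mode, not by itself that the current code segment is 32-bit.

```python
CR0_PE = 1 << 0  # Protection Enable: bit 0 of CR0


def is_protected_mode(cr0):
    """Return True if the PE bit is set in the given CR0 value."""
    return bool(cr0 & CR0_PE)
```

For example, is_protected_mode(0x11) is true, while is_protected_mode(0x10) is false.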
TODO: Take segmentation offsets into account: http://stackoverflow.com/questions/10354063/how-to-use-a-logical-address-in-gdb
make doc
xdg-open README.html
These are the first ones you should look at.
make -C printf run
Outcome: QEMU window opens up, prints a few boot messages, and hangs.
Our program does not print anything to the screen itself, it just makes the CPU halt.
This example is generated with
printf byte by byte: you can’t get more minimal than this!
It basically consists of:
byte 0: a
bytes 1 through 509: zeroes, could be anything
bytes 510 and 511: mandatory magic bytes
0xAA55, which are required for BIOS to consider our disk.
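The same byte layout can be sketched in Python instead of printf. This is only an illustration: it assumes that byte 0 is the hlt opcode (0xF4), since the example just halts the CPU, and the function name is made up for this sketch. The magic word 0xAA55 is stored little-endian, so byte 510 is 0x55 and byte 511 is 0xAA.

```python
HLT = 0xF4  # x86 opcode of the hlt instruction (assumed content of byte 0)


def make_boot_sector(first_byte=HLT):
    """Build a 512 byte boot sector: one opcode, zero padding, magic bytes."""
    sector = bytearray(512)   # bytes 1 through 509 stay zero
    sector[0] = first_byte
    sector[510] = 0x55        # low byte of the 0xAA55 magic word
    sector[511] = 0xAA        # high byte of the 0xAA55 magic word
    return bytes(sector)
```

Writing the result of make_boot_sector() to a file in binary mode should give an image equivalent in layout to the one described above.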
Minimal example that just halts the CPU without using our mini-library common.h:
Outcome: QEMU window opens up, prints a few firmware messages, and hangs.
Here is an equivalent example using our mini-library:
You can use that file as a quick template to start new tests.
Go into an infinite loop instead of using
The outcome is visibly the same, but TODO: it likely wastes more energy in real hardware?
This hello world, and most of our OSes use the linker script: linker.ld
This critical file determines the memory layout of our assembly, take some time to read the comments in that file and familiarize yourself with it.
The Linux kernel also uses linker scripts to set up its image memory layout, see for example: https://github.com/torvalds/linux/blob/v4.2/arch/x86/boot/setup.ld
hello world after the firmware messages:
Same output as BIOS hello world, but written in C:
cd c_hello_world ./run
But keep in mind the following limitations and difficulties:
single stage, so still limited to 512 bytes of code + data! TODO: it should be easy to solve that with BIOS disk load, send a pull request :-) Here is a full example that we could also adapt: http://3zanders.co.uk/2017/10/18/writing-a-bootloader3
we use GCC’s
-m which does not produce "real" 16 bit code, but rather 32-bit code with
setting up the initial state and the linker script is much harder and error prone than with assembly
Therefore, for most applications, you will just want to use Multiboot instead, which overcomes all of those problems.
To disassemble the generated C code, try:
objdump -D -m i8086 main.elf
but note that it still contains references to 32-bit registers, e.g.:
00007c17 <main>:
    7c17: 66 55       push %ebp
    7c19: 66 89 e5    mov %esp,%ebp
    7c1c: 66 83 ec 10 sub $0x10,%esp
This is because those instructions are modified by the prefix
0x66, which makes them behave like 32-bit.
hello world without using an explicit linker script:
make -C no-linker-script run
Uses the default host
ld script, not an explicit one set with
.org inside each assembly file
_start must be present to avoid a warning, since the default linker script expects it
This is a hack, it can be more convenient for quick and dirty tests, but just don’t use it.
The BIOS is one of the most well known firmwares in existence.
Firmware is software that:
runs before the OS / bootloader to do very low level setup
usually closed source, provided by the vendor, and interacts with undocumented hardware APIs
offers an API to the OS / bootloader, that allows you to do things like quick and dirty IO
indistinguishable from an OS, except that it is usually smaller
BIOS is old, non-standardized, x86 omnipresent and limited.
UEFI is the shiny new overbloated thing.
If you are making a serious OS, use it as little as possible.
BIOS can only be used in Real mode.
BIOS functions are all accessed through the int instruction:
mov <function-id>, %ah
int <interrupt-id>
Function arguments are stored in other registers.
The interrupt IDs are traditionally in hex as:
which is the same as
Each interrupt-id groups multiple functions with similar functionality, e.g.
10h groups functions with video related functionality.
Does any official documentation or standardization exist?
http://www.ctyme.com/intr/int.htm Ralf Brown’s Interrupt List. Everyone says that this is the ultimate unofficial compilation.
https://en.wikipedia.org/wiki/INT_10H good quick summary
http://www.scs.stanford.edu/nyu/04fa/lab/specsbbs101.pdf says little about interrupts, I don’t understand its scope.
Print a single
Print a newline:
Carriage returns are needed just like in old days:
Change the current cursor position:
Color codes: https://en.wikipedia.org/wiki/BIOS_color_attributes
Write a character N times with given color:
c has red foreground, and green background
d has the default color (gray on black)
Change the background color to red for the entire screen and print an
Scroll the screen:
a c GG d
G are empty green squares.
How it works:
a b c d
We then choose to act on the rectangle with corners (1, 1) and (2, 2) given by
a XX YY d
and scroll that rectangle up by one line.
Y is then filled with the fill color green
Make the pixel at position (1, 1) clear red color (0Ch) in Video mode 13h:
You may have to look a bit hard to see it.
Draw a line of such pixels:
Get one character from the user via the keyboard, increment it by one, and print it to the screen, then halt:
Type a bunch of characters and see them appear on the screen:
Load a stage 2 from disk with
int 13h and run it:
This character was printed from stage 2.
Load two sectors instead of just one:
a was printed from code on the first block, and
b from code on the second block.
This shows that each sector is 512 bytes long.
GRUB 2.0 makes several calls to it under
TODO: not working on Bochs:
BOUND_GdMa: fails bounds test.
But it does work on QEMU and ThinkPad T400.
TODO failed attempt at detecting how big our memory is with
Seems to output trash currently.
This is important in particular so that you can start your stack there when you enter Protected mode, since the stack grows down.
In 16-bit mode, it does not matter much, since most modern machines have all addressable memory there, but in 32-bit protected mode it does, as our emulator usually does not have all 4 GiB. And of course, the 64-bit address space is currently larger than the total RAM in the world.
int 15 returns a list: each time you call it a new memory region is returned.
The format is not too complicated, and documented at: http://wiki.osdev.org/Detecting_Memory_%28x86%29#Detecting_Upper_Memory
8 bytes: base address of region.
8 bytes: length of region.
4 bytes: type of region. 1 for usable RAM.
4 bytes: some ACPI stuff that no one uses?
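To make the entry layout concrete, here is a small sketch that decodes one such 24 byte entry from a raw buffer. It is only an illustration of the field layout listed above: the helper name and the dictionary keys are made up, and it assumes the little-endian field order base, length, type, ACPI attributes.

```python
import struct

# base (8 bytes), length (8 bytes), type (4 bytes), ACPI attributes (4 bytes)
E820_ENTRY = struct.Struct("<QQII")


def parse_memory_map_entry(raw):
    """Decode one 24 byte memory map entry into a dictionary."""
    base, length, region_type, acpi = E820_ENTRY.unpack(raw[:E820_ENTRY.size])
    return {
        "base": base,
        "length": length,
        "type": region_type,
        "acpi": acpi,
        "usable": region_type == 1,  # type 1 means usable RAM
    }
```

A stack for protected mode could then be placed at base + length of a usable region, since the stack grows down.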
int 15h can detect low or high memory. How are they different?
Count to infinity, sleep one second between each count:
Polls time counter that BIOS keeps up to date at
0x046C with frequency 18.2Hz eighteen times.
Check the initial state the firmware leaves us by printing the contents of several registers:
ax = 00 00
bx = 00 00
cx = 00 00
dx = 80 00
cs = 00 00
ds = 00 00
es = 00 00
fs = 00 00
gs = 00 00
ss = 00 00
cr0 = 53 FF 00 F0
dx seems to be the only interesting regular register: the firmware stores the value of the current disk number there to help with
int 13h. Thus it usually contains 0x80.
Get BIOS information. On host:
Standardized by: https://en.wikipedia.org/wiki/Distributed_Management_Task_Force
TODO: how is it obtained at the low level?
Here we will collect some examples that do stuff without using the BIOS!
These tend to be less portable, not sure they will work on real hardware.
Also they might need to rely on undocumented features.
But they were verified in QEMU.
If you are serious about this, study Coreboot.
with red foreground and blue background shows on the top left of the cleared screen.
This example uses the fact that BIOS maps video memory to address 0xB8000.
We can then move 0xB800 to a segment register and use segment:offset addressing to access this memory.
Then we can show characters by treating
0xB800:0000 as a
uint16_t array, where the low 8 bits are the ASCII character, and the high 8 bits are the color attribute of this character.
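The cell encoding can be sketched as plain arithmetic. This is an illustration only: the helper names are hypothetical, it assumes the standard 80 column text mode, and it uses the usual BIOS color numbers (1 for blue, 4 for red).

```python
VGA_TEXT_BASE = 0xB8000  # linear address of 0xB800:0000
COLUMNS = 80             # assumes 80x25 text mode


def vga_cell(char, fg, bg):
    """Return the 16 bit cell value: attribute in the high byte, ASCII in the low byte."""
    # Bit 7 of the attribute may mean blink instead of a bright background,
    # depending on how the hardware is configured.
    attribute = ((bg & 0x0F) << 4) | (fg & 0x0F)
    return (attribute << 8) | (ord(char) & 0xFF)


def vga_cell_address(row, col):
    """Return the linear address of the cell at (row, col), 2 bytes per cell."""
    return VGA_TEXT_BASE + 2 * (row * COLUMNS + col)
```

With these definitions, a red on blue A is vga_cell("A", 4, 1), which is 0x1441, and the top left cell lives at vga_cell_address(0, 0), which is 0xB8000.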
The x86 processor has a few modes, which have huge impact on how the processor works.
Covered in the Intel manual Volume 3. Especially useful is the "Figure 2-3. Transitions Among the Processor’s Operating Modes" diagram.
The modes are:
Real-address, usually known just as "real mode"
IA-32e. Has two sub modes:
     (all modes)
          |
          | Reset
          |
          v
+---------------------+
| Real address (PE=0) |
+---------------------+
          ^
          |
          | PE
          |
          v
+------------------------+
| Protected (PE=1, VM=0) |
+------------------------+
  ^                ^
  |                |
  |                | VM
  |                |
  v                v
+--------------+ +---------------------+
| IA-32e       | | Virtual-8086 (VM=1) |
+--------------+ +---------------------+

+------------------------+
| System management mode |
+------------------------+
  |          ^
  |          |
  | RSM      | SMI#
  |          |
  v          |
(All other modes)
The IA-32e transition is trickier, but clearly described on the Intel manual Volume 3 - 9.8.5 "Initializing IA-32e Mode":
Operating systems should follow this sequence to initialize IA-32e mode:
1. Starting from protected mode, disable paging by setting CR0.PG = 0. Use the MOV CR0 instruction to disable paging (the instruction must be located in an identity-mapped page).
2. Enable physical-address extensions (PAE) by setting CR4.PAE = 1. Failure to enable PAE will result in a #GP fault when an attempt is made to initialize IA-32e mode.
3. Load CR3 with the physical base address of the Level 4 page map table (PML4).
4. Enable IA-32e mode by setting IA32_EFER.LME = 1.
5. Enable paging by setting CR0.PG = 1. This causes the processor to set the IA32_EFER.LMA bit to 1. The MOV CR0 instruction that enables paging and the following instructions must be located in an identity-mapped page (until such time that a branch to non-identity mapped pages can be effected).
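The order of the register writes in that sequence can be modeled with a toy simulation. This is not bare metal code, just a sketch of which bits change and when: the regs dictionary and the function name are invented for the illustration, while the bit positions (CR0.PE bit 0, CR0.PG bit 31, CR4.PAE bit 5, EFER.LME bit 8, EFER.LMA bit 10) are the architectural ones.

```python
CR0_PE = 1 << 0     # protected mode enable
CR0_PG = 1 << 31    # paging enable
CR4_PAE = 1 << 5    # physical-address extensions
EFER_LME = 1 << 8   # IA-32e mode enable
EFER_LMA = 1 << 10  # IA-32e mode active (set by the processor)


def enter_ia32e(regs, pml4_base):
    """Apply the initialization sequence to a dict of register values."""
    if not regs["cr0"] & CR0_PE:
        raise ValueError("must start from protected mode")
    regs["cr0"] &= ~CR0_PG     # disable paging
    regs["cr4"] |= CR4_PAE     # enable PAE
    regs["cr3"] = pml4_base    # point CR3 at the PML4
    regs["efer"] |= EFER_LME   # enable IA-32e mode
    regs["cr0"] |= CR0_PG      # enable paging again
    # Model the hardware side effect: LMA gets set once paging is on with LME=1.
    regs["efer"] |= EFER_LMA
    return regs
```

Starting from {"cr0": 0x1, "cr3": 0, "cr4": 0, "efer": 0}, the call ends with CR0 = 0x80000001, CR4 = 0x20 and EFER = 0x500.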
The term defined in the Intel manual Volume 3 - CHAPTER 2 "SYSTEM ARCHITECTURE OVERVIEW":
Real mode, protected mode, virtual 8086 mode, and system management mode. These are sometimes referred to as legacy modes.
In other words: anything except IA-32e and System management mode.
This further suggests that real, protected and virtual mode are not the main intended modes of operation.
The CPU starts in this mode after power up.
All our BIOS examples are in real mode.
It is possible to use 32-bit registers in this mode with the "Operand Size Override Prefix"
TODO is it possible to access memory above 1M like this:
mov $1, 0xF0000000
mov $1, (%eax)
We access the character
A with segments in 6 different ways:
ds, with explicit and implicit segment syntax
Segment registers modify the addresses that instructions actually use as:
<segment> * 16 + <original-address>
This implies that:
we can address 20 bits of memory (1MB) instead of the 16 bits (64kB) that normally fit into registers. E.g., to address:
we can use:
0x8000  (segment)
0x 4000 (address)
-------
0x84000
most addresses can be encoded in multiple ways, e.g.:
can be encoded as either of:
0x10, address =
0, address =
0x1, address =
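The address calculation is easy to check with a few lines of Python. The function name is made up for this sketch, and the address 0x100 in the usage note is just an illustrative value, not necessarily the one used in the example above.

```python
def linear_address(segment, offset):
    """Real mode address translation: segment * 16 + offset."""
    return (segment & 0xFFFF) * 16 + (offset & 0xFFFF)
```

For instance, linear_address(0x8000, 0x4000) gives 0x84000, and the three pairs (0x10, 0x0), (0x0, 0x100) and (0x1, 0xF0) all give the same linear address 0x100.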
gs are general purpose: they are not affected implicitly by any instructions. All others will be further exemplified.
Affects the code address pointer:
00 01 02
CS is set with the
ljmp instruction, and we use it to skip
.skip zero gaps in the code.
The second byte is 16 bytes after the first, and is accessed with
SP = 1.
SS affects instructions that use
SP such as
POP: those will actually use
16 * SS + SP as the actual address.
TODO: this does seem to have special properties as used by string instructions.
objdump -D -b binary -m i8086 segment_registers.img
shows that non
ds encodings are achieved through a prefix:
20: a0 63 7c mov 0x7c63,%al 34: 26 a0 63 7c mov %es:0x7c63,%al 40: 64 a0 63 7c mov %fs:0x7c63,%al 4c: 65 a0 63 7c mov %gs:0x7c63,%al 58: 36 a0 63 7c mov %ss:0x7c63,%al
ds the most efficient one for data access, and thus a good default.
Create an interrupt handler and handle an interrupt:
It works like this:
aan interrupt handler
jump back to main code
TODO: is STI not needed because this interrupt is not maskable?
Same with interrupt handler
TODO understand: attempt to create an infinite loop that calls the interrupt from the handler:
QEMU exits with:
Trying to execute code outside RAM or ROM at 0x000a0000
Handle a division by zero:
expected outcome: prints values from 0 to
0xFFFF in an infinite loop.
actual outcome: stops at
Apparently when there is an exception,
iret jumps back to the line that threw the exception itself, not the one after, which leads to the loop:
But then why does it stop at
0081? And if we set the initial value to
0x0090, it just runs once.
long jumps to the CS : IP found in the corresponding interrupt vector.
pushes EFLAGS to let them be restored by iret?
Jumps back to the next instruction to be executed before the interrupt came in.
Restores EFLAGS and other registers TODO which?
Fancy name for the handler: http://wiki.osdev.org/Interrupt_Service_Routines
Interrupt vector table: https://wiki.osdev.org/IVT
The real mode in-memory table that stores the address for the handler for each interrupt.
The base address is set in the interrupt descriptor table register (IDTR), which can be modified with the lidt instruction.
The default address is
The format of the table is:
IDTR -> +-----------------------+
0       |Address (2 bytes)      |
2       |Code segment (2 bytes) |
        +-----------------------+
        +-----------------------+
4 ----> |Address (2 bytes)      |
6       |Code segment (2 bytes) |
        +-----------------------+
        +-----------------------+
8 ----> |Address (2 bytes)      |
A       |Code segment (2 bytes) |
        +-----------------------+
...     ...
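Given that layout, finding the handler for a given interrupt is a simple table lookup. The sketch below works on a Python bytes-like object standing in for physical memory. The function names are hypothetical, and the default base of 0 is the usual real mode IVT location.

```python
import struct


def ivt_entry(memory, vector, ivt_base=0):
    """Return (segment, offset) of the handler stored for the given vector."""
    # Each entry is 4 bytes: 2 byte address (offset) followed by 2 byte code segment.
    offset, segment = struct.unpack_from("<HH", memory, ivt_base + 4 * vector)
    return segment, offset


def handler_linear_address(memory, vector, ivt_base=0):
    """Return the linear address the CPU would jump to for the given vector."""
    segment, offset = ivt_entry(memory, vector, ivt_base)
    return segment * 16 + offset
```

If the entry for vector 0x10 holds offset 0x1234 and segment 0xF000, then handler_linear_address(memory, 0x10) is 0xF1234.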
Set the value of the IDTR, and therefore set the base address of the IVT:
./run lidt
./run lidt2
./run lidt0
TODO not working.
Actual outcome: infinite reboot loop.
Actual outcome if we comment out the
lidt: still infinite reboot loop
lidt0: halt apparently
I think I understand that
lidt takes as input a memory address, and the memory at that address must contain:
2 bytes: total size of the IVT in bytes
4 bytes: base address of the IVT. Higher byte is ignored in real mode, since addresses are not 4 bytes long.
hello world in protected mode:
Major changes from real mode:
http://stackoverflow.com/questions/28645439/how-do-i-enter-32-bit-protected-mode-in-nasm-assembly Initially adapted from this.
https://thiscouldbebetter.wordpress.com/2011/03/17/entering-protected-mode-from-assembly/ FASM based. Did not work on first try, but looks real clean.
Linux kernel v4.12
The Intel manual Volume 3 - 9.10 "INITIALIZATION AND MODE SWITCHING EXAMPLE" does contain an official example of how to go into protected mode.
the code is inside the PDF, which breaks all the formatting, so we have copied it here to this repo
TODO there is no known tool that can actually compile that syntax… although MASM should be close:
How can those guys be in business? >:-)
TODO do it.
Things get much more involved than in real mode: http://stackoverflow.com/questions/14419088/how-to-draw-a-pixel-on-the-screen-in-protected-mode-in-x86-assembly
TODO: get working:
x a b
Example of the effect on a memory access of changing the segment base address.
Without segment manipulation, the output would be just: TODO
First read the paging tutorial, and in particular: https://cirosantilli.com/x86-paging#segmentation to get a feel for the type of register and data structure manipulation required to configure the CPU, and how segmentation compares to paging.
Segmentation modifies every memory access of a given segment by:
adding an offset to it
limiting how big the segment is
If an access is made at an offset larger than allowed an exception happens, which is like an interrupt, and gets handled by a previously registered handler.
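The two effects can be modeled in a few lines. This is a deliberately simplified sketch: it ignores granularity, expand-down segments and access size, and the exception class and function name are invented for the illustration.

```python
class SegmentLimitError(Exception):
    """Stands in for the CPU exception raised on an out-of-bounds access."""


def translate(base, limit, offset):
    """Return base + offset, or raise if the offset is beyond the segment limit."""
    if offset > limit:
        raise SegmentLimitError(hex(offset))
    return base + offset
```

With base 0x1000 and limit 0xFF, offset 0x10 translates to 0x1010, while offset 0x100 raises the exception, which is where the previously registered handler would take over on real hardware.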
Segmentation could be used to implement virtual memory by assigning one segment per program:
+-----------+--------+--------------------------+ | Program 1 | Unused | Program 2 | +-----------+--------+--------------------------+ ^ ^ ^ ^ | | | | Start1 End1 Start2 End2
Besides address translation, the segmentation system also managed other features such as Protection rings. TODO: how are those done in 64-bit mode?
In Linux 32-bit for example, only two segments are used at all times: one at ring 0 for the kernel, and another one at ring 3 for all user processes.
In protected mode, the segment registers
GS contain a data structure more complex than a simple address as in real mode, which contains a single number.
This 2 byte data structure is called a segment selector:
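|Position (bits)||Size (bits)||Name||Description|
|0||2||Request Privilege Level (RPL)||Protection ring level, from 0 to 3.|
|2||1||Table Indicator (TI)||0 to use the global descriptor table, 1 to use the local descriptor table.|
|3||13||Index||Index of the Segment descriptor to be used from the descriptor table.|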
Like in real mode, this data structure is loaded on the registers with a regular
mov mnemonic instruction.
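The selector fields can be packed and unpacked with shifts and masks. The following sketch assumes the standard layout (RPL in bits 0 to 1, TI in bit 2, index in bits 3 to 15). The function names are hypothetical.

```python
def encode_selector(index, ti=0, rpl=0):
    """Build a 16 bit segment selector from its three fields."""
    return ((index & 0x1FFF) << 3) | ((ti & 1) << 2) | (rpl & 0x3)


def decode_selector(selector):
    """Split a 16 bit segment selector into its three fields."""
    return {
        "rpl": selector & 0x3,
        "ti": (selector >> 2) & 1,
        "index": (selector >> 3) & 0x1FFF,
    }
```

For example, selector 0x08 is index 1 of the GDT at ring 0, and selector 0x1B is index 3 of the GDT at ring 3.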
Bibliography: Intel manual Volume 3 - 3.4.5 "Segment Descriptors".
Global descriptor table.
An in-memory array of Segment descriptor data structures:
Index field of the Segment selector chooses which one of those segment descriptors is to be used.
The base address is set with the
lgdt instruction, which loads from memory a 6 byte structure:
|Position (bytes)||Size (bytes)||Description|
|0||2||Size of the table in bytes minus 1 (the table limit)|
|2||4||Base address of the table|
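As an illustration, the 6 byte structure can be built with struct.pack. This sketch follows the Intel convention that the 2 byte field holds the table limit, which is the table size in bytes minus one, with 8 bytes per descriptor. The function name and the example base address are made up.

```python
import struct

DESCRIPTOR_SIZE = 8  # each segment descriptor is 8 bytes long


def gdtr_bytes(table_base, num_entries):
    """Return the 6 byte structure that lgdt reads: 2 byte limit, 4 byte base."""
    limit = num_entries * DESCRIPTOR_SIZE - 1
    return struct.pack("<HI", limit, table_base)
```

A GDT with 3 entries (null, code, data) at the hypothetical address 0x7C40 gives the bytes 17 00 40 7C 00 00.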
TODO vs global?
Intel manual Volume 3 - 3.4.2 "Segment Selectors" says that we can’t use the first entry of the GDT:
The first entry of the GDT is not used by the processor. A segment selector that points to this entry of the GDT (that is, a segment selector with an index of 0 and the TI flag set to 0) is used as a “null segment selector.” The processor does not generate an exception when a segment register (other than the CS or SS registers) is loaded with a null selector. It does, however, generate an exception when a segment register holding a null selector is used to access memory. A null selector can be used to initialize unused segment registers. Loading the CS or SS register with a null segment selector causes a general-protection exception (#GP) to be generated.
A data structure that is stored in the GDT.
Clearly described on the Intel manual Volume 3 - 3.4.5 "Segment Descriptors" and in particular Figure 3-8 "Segment Descriptor".
The Linux kernel v4.2 encodes it at:
TODO example. Jump to userspace, do something naughty, handler interrupt in kernel land.
Interrupt descriptor table.
Protected mode analogue to the IVT:
int 0 handled
Handle interrupt 1 instead of 0:
int 1 handled
18.2 Hz with the PIT:
The first 32 handlers are reserved by the processor and have predefined meanings, as specified in the Intel manual Volume 3 Table 3-3. "Intel 64 and IA-32 General Exceptions".
In the Linux kernel, https://github.com/torvalds/linux/blob/v4.2/arch/x86/entry/entry_64.S sets them all up: each
idtentry divide_error call sets up a new one.
Handle a division by zero:
division by zero handled
Division by zero causes a Divide Error which Intel notes as
It is then handled by IDT 0.
DEs are not only for division by zero: they also happen on overflow. TODO example.
Start multiple processors and make them interact:
Implies that SMP worked because a spinlock was unlocked by the second processor.
Try commenting out waking up the second processor and see it not get printed.
Verbose beginner’s tutorial: https://cirosantilli.com/x86-paging
Change page tables and observe how that affects memory accesses:
Implies that paging worked because we printed and modified the same physical address with two different virtual addresses.
Requires Protected mode.
Wikipedia seems to call it long mode: https://en.wikipedia.org/wiki/Long_mode
This is controlled by the
CS.L bit of the segment descriptor.
It appears that it is possible for user programs to modify that during execution from userland: http://stackoverflow.com/questions/12716419/can-you-enter-x64-32-bit-long-compatibility-sub-mode-outside-of-kernel-mode
TODO vs Protected mode.
64-bit is the major mode of operation, and enables the full 64 bit instructions.
There are currently no examples in this repo because I was too lazy to make them.
As someone once brilliantly put it: https://twitter.com/garybernhardt/status/1106255947138125824
Watching an x86-64 CPU boot is like watching an amoeba slowly evolve into a dog.
The backward compatibility of x86 is mind boggling.
Compatibility mode emulates IA-32 and allows running 32 and 16 bit code.
But 64 bit Linux and Windows don’t seem to allow 16 bit code anymore?
Compatibility vs protected: https://stackoverflow.com/questions/20848412/modes-of-intel-64-cpu
x86 has dedicated instructions for certain IO operations:
These instructions take an IO address which identifies which hardware they will communicate to.
The IO ports don’t seem to be standardized, like everything else: http://stackoverflow.com/questions/14194798/is-there-a-specification-of-x86-i-o-port-assignment
The Linux kernel wraps those instructions with the
outb family of instructions:
man inb
man outb
Not all instruction sets have dedicated instructions such as
out for IO.
In ARM for example, everything is done by writing to magic memory addresses.
The in / out approach is called "port mapped IO", and the approach of the magic addresses "memory mapped IO".
From an interface point of view, I feel that memory mapped is more elegant: port IO simply creates a second address space.
TODO: are there performance considerations when designing CPUs?
Whenever you press a key down or up, the keyboard hex scancode is printed to the screen:
Uses the PS/2 keyboard controller on
in 60h: http://wiki.osdev.org/%228042%22_PS/2_Controller
in always returns immediately with the last keyboard keycode: we then just poll for changes and print only the changes.
Scancode tables: TODO: official specs?
TODO do this with the interrupt table instead of
in. Failed attempt at: interrupt_keyboard.S
TODO create an example:
Random threads with source code, ah those OS devs:
I am so going to make a pixel drawing program with this.
Real Time Clock: https://en.wikipedia.org/wiki/Real-time_clock
Get wall time with precision of seconds every second:
00 01 02 03 04 10
3rd April 2010, 02 hours 01 minute and 00 seconds.
out 70h and
in 71h to query the hardware.
This hardware must therefore use a separate battery to keep going when we turn off the computer or remove the laptop battery.
We can control the initial value in QEMU with the option:
qemu-system-x86_64 -rtc base='2010-04-03T02:01:00'
Programmable Interval Timer: https://en.wikipedia.org/wiki/Programmable_interval_timer
Superseded by the HPET.
a\n with the minimal frequency possible of
0x1234DD / 0xFFFF = 18.2 Hz:
Make the PIT generate a single interrupt instead of a frequency:
TODO I think this counts down from the value in channel 0, and therefore allows scheduling a single event in the future.
The PIT can generate periodic interrupts (or sound!) with a given frequency to
IRQ0, which on real mode maps to interrupt 8 by default.
Major application: interrupt the running process to allow the OS to schedule processes.
The PIT has 3 channels that can generate 3 independent signals
channel 0 at port
40h: generates interrupts
channel 1 at port
41h: not to be used for some reason
channel 2 at port
42h: linked to the speaker to generate sounds
43h is used to control signal properties except frequency, which goes in the channel ports, for the 3 channels.
https://en.wikipedia.org/wiki/Intel_8253 that is the circuit ID for the PIT.
We don’t control the frequency of the PIT directly, which is fixed at
Instead, we control a frequency divisor. This is a classic type of discrete electronic circuit: https://en.wikipedia.org/wiki/Frequency_divider
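The relation between the divisor and the resulting frequency is a plain division of the base frequency 0x1234DD Hz. A small sketch follows. The function names are hypothetical, and for simplicity the divisor is clamped to the range 1 to 0xFFFF used in the examples here.

```python
PIT_BASE_HZ = 0x1234DD  # 1193181 Hz, the fixed input frequency


def pit_frequency(divisor):
    """Return the output frequency in Hz for a given 16 bit divisor."""
    return PIT_BASE_HZ / divisor


def pit_divisor(target_hz):
    """Return the divisor that best approximates the wanted frequency."""
    divisor = round(PIT_BASE_HZ / target_hz)
    return max(1, min(0xFFFF, divisor))
```

pit_frequency(0xFFFF) is about 18.2 Hz, the slowest rate mentioned above, and pit_divisor(1000) is 1193, which is what one would program for a tick of roughly one millisecond.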
The magic frequency comes from historical reasons to reuse television hardware according to https://wiki.osdev.org/Programmable_Interval_Timer, which in turn is likely influenced by some physical properties of crystal oscillators.
1193181 == 0x1234DD has 2 occurrences on Linux 4.16.
Outcome: produces a foul noisy noise using the PC speaker hardware on
QEMU only plays the sound if we give it the option:
The beep just uses the PIT Channel 2 to generate the frequency.
Extracted from: https://github.com/torvalds/linux/blob/v4.2/arch/x86/realmode/rm/wakemain.c#L38 The kernel has a Morse code encoder using it!
There are several video modes.
Modes determine what interrupt functions can be used.
There are 2 main types of modes:
text, where we operate character-wise
video, operate byte-wise
Modes can be set with
int 0x10 and
AH = 0x00, and get with
AH = 0x0F
The most common modes seem to be:
0x01: 40x25 Text, 16 colors, 8 pages
0x03: 80x25 Text, 16 colors, 8 pages
0x13: 320x200 Graphics, 256 colors, 1 page
You can add 128 to the modes to prevent them from clearing the screen.
A larger list: http://www.columbia.edu/~em36/wpdos/videomodes.txt
Example at: BIOS draw pixel
13h has: 320 x 200 Graphics, 256 colors, 1 page.
The color encoding is just an arbitrary palette that fits 1 byte, it is not split colors like R R R G G G B B or anything mentioned at: https://en.wikipedia.org/wiki/8-bit_color. Related: http://stackoverflow.com/questions/14233437/convert-normal-256-color-to-mode-13h-version-color
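Since mode 13h uses one byte per pixel, locating a pixel is a single multiplication. The sketch below assumes the conventional framebuffer location for this mode, linear address 0xA0000. The function name is made up for the illustration.

```python
MODE_13H_BASE = 0xA0000  # conventional framebuffer address for mode 13h
WIDTH, HEIGHT = 320, 200


def pixel_address(x, y):
    """Return the linear address of the byte holding the pixel at (x, y)."""
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        raise ValueError("pixel out of the 320x200 screen")
    return MODE_13H_BASE + y * WIDTH + x
```

The pixel at (1, 1) is therefore at 0xA0141, and writing the palette index 0x0C to that byte would make it light red.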
TODO: what is it exactly?
BIOS cannot be used when we move into Protected mode, but we can use the VGA interface to get output out of our programs.
Have a look at the macros prefixed with VGA_ inside common.h.
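As a rough illustration of what such output involves, here is a sketch in C of how a character cell is usually encoded in VGA text mode. It assumes the common 80 column layout with 2 bytes per cell (character in the low byte, color attribute in the high byte); the names are hypothetical and are not the VGA_ macros from common.h.

```c
#include <stdint.h>

#define TEXT_COLS 80u

/* Hypothetical helper: 16-bit cell value for character c with the
 * given 4-bit foreground and background colors. */
static uint16_t text_cell(char c, uint8_t fg, uint8_t bg)
{
    uint8_t attr = (uint8_t)(((bg & 0x0F) << 4) | (fg & 0x0F));

    return (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
}

/* Hypothetical helper: byte offset of the cell at (row, col) from the
 * start of the text buffer. */
static unsigned text_cell_offset(unsigned row, unsigned col)
{
    return (row * TEXT_COLS + col) * 2u;
}
```

The text buffer is conventionally mapped at physical address 0xB8000, which is also an assumption here. Depending on the VGA configuration, the top bit of the background may mean blink rather than a brighter color.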
Infinite reboot loop on emulator!
TODO why does it work?
Turn on and immediately shut down the system, closing QEMU:
Fancier version copied from http://wiki.osdev.org/APM (TODO why is that better):
Older than ACPI and simpler.
By Microsoft in 1995. Spec seems to be in RTF format…
Can’t find the URL. A Google cache: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CB0QFjAAahUKEwj7qpLN_4XIAhWCVxoKHa_nAxY&url=http%3A%2F%2Fdownload.microsoft.com%2Fdownload%2F1%2F6%2F1%2F161ba512-40e2-4cc9-843a-923143f3456c%2FAPMV12.rtf&usg=AFQjCNHoCx8gHv-w08Dn_Aoy6Q3K3DLWRg&sig2=D_66xvI7Y2n1cvyB8d2Mmg
Newer and better.
Now managed by the same group that manages UEFI.
Successor for BIOS.
All laptops I tested BIOS with had UEFI, so UEFI must have a BIOS emulation mode for backwards compatibility: https://www.howtogeek.com/56958/htg-explains-how-uefi-will-replace-the-bios/
Made by Intel, mostly MIT open source, which likely implies that vendors will hack away closed source versions.
Matthew Garrett says it is huge: larger than Linux without drivers.
Since it is huge, it inevitably contains bugs. Garrett says that Intel sometimes does not feel like updating the firmware with bugfixes.
UEFI offers a large API comparable to what most people would call an operating system:
https://software.intel.com/en-us/articles/uefi-application mentions a POSIX C library port
https://lwn.net/Articles/641244/ mentions a Python interpreter port!
ARM is considering an implementation https://wiki.linaro.org/ARM/UEFI
make -C uefi run
TODO get a hello world program working:
http://www.rodsbooks.com/efi-programming/hello.html Best source so far: allowed me to compile the hello world! TODO: how to run it now on QEMU and real hardware?
Running without image gives the UEFI shell, and a Linux kernel image booted fine with it: http://unix.stackexchange.com/a/228053/32558, so we just need to generate the image.
uefi/ovmf.fd IA32 r15214 was downloaded from: https://sourceforge.net/projects/edk2/files/OVMF/OVMF-IA32-r15214.zip/download TODO: automate building it from source instead, get rid of the blob, and force push it away from history. Working build setup sketch: https://github.com/cirosantilli/linux-cheat/blob/b1c3740519eff18a7707de981ee3afea2051ba10/ovmf.sh
It seems that they have moved to GitHub at last: https://github.com/tianocore/tianocore.github.io/wiki/How-to-build-OVMF/e372aa54750838a7165b08bb02b105148e2c4190
https://www.youtube.com/watch?v=V2aq5M3Q76U hardcore kernel dev Matthew Garrett saying how bad UEFI is
grub/README.adoc TODO cleanup and exemplify everything in that file. Some hosty stuff needs to go out maybe.
make -C grub/chainloader run
Outcome: you are left in an interactive GRUB menu with two choices:
hello-world: go into a hello world OS
self +1: reload ourselves, and almost immediately reload GRUB and fall on the same menu as before
This example illustrates the chainloader GRUB command, which just loads a boot sector and runs it: https://www.gnu.org/software/grub/manual/grub/html_node/chainloader.html
This is what you need to boot systems like Windows which GRUB does not know anything about: just point to their partition and let them do the job.
Both of the menu options are implemented with
Loads a given image file within the partition.
grub-mkrescue creates a few filesystems, and grub/chainloader/iso/boot/main.img is placed inside one of those filesystems.
This illustrates GRUB’s awesome ability to understand certain filesystem formats, and fetch files from them, thus allowing us to pick between multiple operating systems with a single filesystem.
It is educational to open up the generated grub/chainloader/main.img with the techniques described at https://askubuntu.com/questions/69363/mount-single-partition-from-image-of-entire-disk-device/673257#673257 to observe that the third partition of the image is a VFAT filesystem, and that it contains the boot/main.img image as a regular file.
self +1: uses the syntax:
which reloads the first sector of the current partition, and therefore ourselves.
TODO: why does it fail for hybrid ISO images? http://superuser.com/questions/154134/grub-how-to-boot-into-iso-partition#comment1337357_154271
TODO get working.
OK, let’s have some fun and do the real thing!
make -C grub/linux run
Expected outcome: GRUB menu with a single Buildroot entry. When you select it, a tiny pre-built Linux image boots from: https://github.com/cirosantilli/linux-kernel-module-cheat
Actual outcome: after selecting the entry, nothing shows on the screen. Even if we fix this, we will then also need to provide a rootfs somehow: the initrd GRUB command would be a simple method, and that repo can also generate initrd images: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/c06476bfc821659a4731d49e808f45e8c509c5e1#initrd Maybe have a look under Buildroot boot/grub2 and copy what they are doing there.
The GRUB command is of form:
linux /boot/bzImage root=/dev/sda1 console=tty1
so we see that the kernel boot parameters are passed right there, for example try to change the value of the
and see how the dmesg times do not get printed anymore.
Standard created by GRUB for booting OSes.
Multiboot files are an extension of ELF files with a special header.
Advantages: GRUB does housekeeping magic for you:
you can store the OS as a regular file inside a filesystem
your program starts in 32-bit mode already, not 16 bit real mode
it gets the available memory ranges for you
GRUB leaves the application in a well-defined starting state.
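The core of the special header is small. The following C sketch assumes the Multiboot 1 layout, where the header begins with three 32-bit fields (magic, flags, checksum) whose sum must be zero modulo 2^32. The function names are hypothetical and the code is not taken from this repository.

```c
#include <stdint.h>

/* Magic value that starts a Multiboot 1 header. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u

/* Hypothetical helper: checksum that makes
 * magic + flags + checksum wrap around to 0 in 32 bits. */
static uint32_t multiboot_checksum(uint32_t flags)
{
    return (uint32_t)(0u - (MULTIBOOT_HEADER_MAGIC + flags));
}

/* Hypothetical helper: returns 1 if the three fields form a valid
 * header start, 0 otherwise. */
static int multiboot_header_ok(uint32_t magic, uint32_t flags,
                               uint32_t checksum)
{
    if (magic != MULTIBOOT_HEADER_MAGIC)
        return 0;
    return (uint32_t)(magic + flags + checksum) == 0u;
}
```

In an assembly source the same three values would typically be emitted as consecutive 32-bit words near the start of the image, so that the bootloader can find them.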
It seems that Linux does not implement Multiboot natively, but GRUB supports it as an exception: http://stackoverflow.com/questions/17909429/booting-a-non-multiboot-kernel-with-grub2
QEMU supports multiboot natively https://stackoverflow.com/questions/25469396/how-to-use-qemu-properly-with-multi-boot-headers/32550281#32550281:
make -C multiboot/hello-world run
which actually runs:
qemu-system-i386 -kernel 'main.elf'
main.elf is the multiboot file we generated.
Or you can use grub-mkrescue to make a multiboot file into a bootable ISO or disk:
qemu-system-x86_64 -drive file=main.img,format=raw
The main.img file can also be burned to a USB and run on real hardware.
Example originally minimized from https://github.com/programble/bare-metal-tetris
This example illustrates the multiboot GRUB command: https://www.gnu.org/software/grub/manual/grub/html_node/multiboot.html
We also track here the code from: http://wiki.osdev.org/Bare_Bones:
make -C multiboot/osdev run
This is interesting as it uses C as much as possible with some GAS where needed.
This should serve as a decent basis for starting a pet OS. But please don’t, there are enough out there already :-)
Tests for utilities defined in this repo, as opposed to x86 or external firmware concepts.
TODO: implement the function and enable this test: test_vga_print_bytes.S
Print several bytes in human readable form:
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50
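The output format above can be sketched in hosted C as follows. This only illustrates the formatting, not the repository's assembly implementation (which is still a TODO), and the function name format_bytes is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write n bytes as uppercase hex pairs separated
 * by single spaces, followed by a terminating NUL.
 * `out` must have room for 3 * n bytes when n >= 1, or 1 byte when
 * n == 0. Returns `out`. */
static char *format_bytes(const uint8_t *bytes, size_t n, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i, pos = 0;

    for (i = 0; i < n; i++) {
        if (i > 0)
            out[pos++] = ' ';               /* separator between bytes */
        out[pos++] = digits[bytes[i] >> 4];   /* high nibble */
        out[pos++] = digits[bytes[i] & 0x0F]; /* low nibble */
    }
    out[pos] = '\0';
    return out;
}
```

A bare metal version would replace the output buffer with whatever character output routine is available, for example writes to the display or the serial port.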
Most of this repo was originally tested on a Lenovo ThinkPad T400.
Unfortunately it broke and I threw it away, and I didn’t write down the exact specs before doing so, notably the bootloader version.
This repository covers only things that can only be done from ring 0 (system) and not ring 3 (userland).
Ring 3 is covered at: https://github.com/cirosantilli/x86-assembly-cheat
An overview of rings 0 and 3 can be found at: https://stackoverflow.com/questions/18717016/what-are-ring-0-and-ring-3-in-the-context-of-operating-systems/44483439#44483439
There are a few tutorials that explain how to make an operating system and give examples of increasing complexity with more and more functionality added: Progressive tutorials.
This is not one of them.
The goal of this repository is to use the minimal setup possible to be able to observe a single low-level programming concept for each minimal operating system we create.
This is not meant to provide a template from which you can write a real OS, but instead to illustrate how those low-level concepts work in isolation, so that you can use that knowledge to implement operating systems or drivers.
Minimal examples are useful because they make it easier to see exactly what a given concept requires in order to work.
Another advantage is that it is easier to DRY up minimal examples with macros or functions, which is much harder on progressive OS template tutorials, which tend to repeat big chunks of code between the examples.
Using C or not is a hard choice.
It does make it much easier to express higher level ideas, and gives portability.
However, it increases the complexity that one has to understand a bit, so I decided to stay away from it when I wrote this tutorial.
But I have since changed my mind, and if I ever touch this again seriously, I would rewrite it in C based on C hello world and Newlib: https://electronics.stackexchange.com/questions/223929/c-standard-libraries-on-bare-metal/400077#400077
If this is done, this repo should then be merged into: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/87e846fc1f9c57840e143513ebd69c638bd37aa8#baremetal-setup together with the ARM Newlib baremetal setups present there.
What the heck is a serial in the real world: https://unix.stackexchange.com/questions/307390/what-is-the-difference-between-ttys0-ttyusb0-and-ttyama0-in-linux/367882#367882
Currently all text output is done on the display, and that was a newbie design choice from before I knew the serial existed. The serial is just generally more minimal and elegant than the display, and should have been used instead.
./run bios_serial
cat bios_serial.tmp.serial
On QEMU, we see the serial output on the host terminal:
and on Bochs we redirect it to a file:
./run bios_serial bochs
cat bios_serial.tmp.serial
TODO: failed attempt without BIOS:
Like all such old hardware, its documentation is impossible to find: it must be rotting on some IBM mainframe that is not connected to the internet, so we go for:
Samy likely just copied that from OSDev for his: https://github.com/SamyPesse/How-to-Make-a-Computer-Operating-System/blob/eb30f8802fac9f0f1c28d3a96bb3d402bdfc4687/src/kernel/modules/x86serial.cc#L38
This would open up:
gem5 benchmarking and exploration, currently blocked on https://stackoverflow.com/questions/50364863/how-to-get-graphical-gui-output-and-user-touch-keyboard-mouse-input-in-a-ful/50364864#50364864
the output stays persistently on the host terminal. So we can run QEMU without a GUI, immediately shut down the machine at the end, and not have to close QEMU manually all the time.
automated unit tests. Ha, like I’m gonna be that diligent!
easily working on ARM in a more uniform way to prepare for the move into https://github.com/cirosantilli/linux-kernel-module-cheat/tree/87e846fc1f9c57840e143513ebd69c638bd37aa8#baremetal-setup
Using macros for now in common.h instead of functions because it simplifies the linker script.
But the downsides are severe:
no symbols to help debugging. TODO: I think there are assembly constructs for that.
impossible to step over method calls: you have to step into everything. TODO:
larger output, supposing I can get linker gc for unused functions working, see --gc-sections, which is for now uncertain.
If I can get this working, I’ll definitely move to function calls.
The problem is that if I don’t, every image will need a stage 2 loader. That is not too serious though, it could be added to the
It seems that ld can only remove sections, not individual symbols: http://stackoverflow.com/questions/6687630/c-c-gcc-ld-remove-unused-symbols With GCC we can use -ffunction-sections -fdata-sections to quickly generate a ton of sections, but I don’t think GAS supports that…
We should just rewrite the whole thing to use functions instead…
cd nasm/
./run bios_hello_world
While NASM is a bit more convenient than GAS to write a boot sector, I think it is just not worth it.
When writing an OS in C, we are going to use GCC, which already uses GAS. So it’s better to reduce the number of assemblers to one and stick to GAS only.
Right now, this directory is not very DRY since NASM is secondary to me, so it contains mostly some copy / paste examples.
On top of that, GAS also supports other architectures besides x86, so learning it is more useful in that sense.
Always try looking into the Linux kernel to find how those CPU capabilities are used in a "real" OS.
OS dev is one of the most insanely hard programming tasks a person can undertake, and will push your knowledge of several domains to the limit.
Knowing the following will help a lot:
userland x86 assembly: https://github.com/cirosantilli/assembly-cheat
compilation, linking and ELF format basics
While it is possible to learn those topics as you go along, and it is almost certain that you will end up learning more about them, we will not explain them here in detail.
We are interested mostly in the "Intel Manual Volume 3 System Programming Guide", where system programming basically means "OS stuff" or "bare metal" as opposed to userland present in the other manuals.
This repository quotes by default the following revision: 325384-056US September 2015 https://web.archive.org/web/20151025081259/http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf
Fun, educational and useless:
The following did not work on my machine out of the box:
https://github.com/programble/bare-metal-tetris tested on Ubuntu 14.04. Just works.
Has Multiboot and El Torito. Uses custom linker script.
Almost entirely in C -nostdlib, with very few inline asm commands, and a small assembly entry point. So a good tutorial in how to do the bridge.
https://github.com/daniel-e/tetros Tetris that fits into bootloader.
https://github.com/nanochess/fbird Flappy bird in the 512-byte boot sector.
https://github.com/tsoding/pinpog Pong / Breakout
https://github.com/io12/bootmine Minesweeper game in a 512-byte boot sector.
One complexity order above the minimal tutorials, one below actual kernels
osdev.org is a major source for this.
https://courses.engr.illinois.edu/ece390/books/labmanual/index.html Illinois course from 2004
The classic tutorial. Highly recommended.
Multiboot based kernels of increasing complexity, one example builds on the last one. Non DRY as a result.
Cleaned up source code: https://github.com/cirosantilli/jamesmolloy-kernel-development-tutorials
Well known bugs: http://wiki.osdev.org/James_Molloy's_Tutorial_Known_Bugs That’s what happens when you don’t use GitHub.
Good tutorials, author seems to master the subject.
But he could learn more about version control and build automation: source code inside ugly tar.gz with output files.
Ubuntu 18.04 usage: apply this patch https://github.com/cfenollosa/os-tutorial/pull/100 and then:
cd 23-fixes
make run
Starts with raw assembly + includes, moves to C midway.
Raw stage-2 loader. No task scheduling yet, but the feature is… "scheduled" ;-)
Explains how to use the QEMU GDB stub and automates it in the Makefile, kudos.
Reviewed at: 7aff64740e1e3fba9a64c30c5cead0f88514eb62
Has one big source tree that goes up to multitasking and a stdlib. Kernel written in C++ and stdlib in C. TODO check: 64-bit, ring 0 vs ring 3?
git grep rax has no hits, so I’m guessing no 64-bit.
Build failed on Ubuntu 18.04 with: https://github.com/SamyPesse/How-to-Make-a-Computer-Operating-System/issues/127 and I didn’t bother to investigate.
Does have a lucid32 Vagrant file for the host, but lazy to try it out.
Reviewed at: eb30f8802fac9f0f1c28d3a96bb3d402bdfc4687
Several examples of increasing complexity. Found at: http://stackoverflow.com/questions/7130726/writing-a-hello-world-kernel
Just works, but examples are non-minimal, lots of code duplication and blobs. There must be around 20 El Torito blobs in that repo.
Cleaned up version: https://github.com/cirosantilli/skelix-os
Not tested yet.
GAS based, no multiboot used.
These are not meant as learning resources but rather as useful programs:
https://github.com/scanlime/metalkit A more automated / general bare metal compilation system. Untested, but looks promising.
Python without an "OS": https://us.pycon.org/2015/schedule/presentation/378/
A list of ARM bare metal resources can be found at: https://github.com/cirosantilli/arm-assembly-cheat/tree/117f5d7d3458c028275ce112725f2e36f594f13c#bare-metal
Copyright Ciro Santilli https://cirosantilli.com
GPL v3 for executable computer program usage.
CC BY-SA v4 for human consumption usage in learning material, e.g. .md files, source code comments, using source code excerpts in tutorials. Recommended attribution:
Single file adaptations:
Based on https://github.com/cirosantilli/x86-bare-metal-examples/blob/<commit-id>/path/to/file.md under CC BY-SA v4
Based on https://github.com/cirosantilli/x86-bare-metal-examples/tree/<commit-id> under CC BY-SA v4
If you want to use this work under a different license, contact the copyright owner, and he might make a good price.