Dozens of minimal operating systems to learn x86 system programming. Userland cheat at: https://github.com/cirosantilli/x86-assembly-cheat Keywords: hello world, bare bones, boot sector, MBR, BIOS, UEFI, VGA, GRUB, Multiboot, QEMU.

x86 Bare Metal Examples

Dozens of minimal operating systems to learn x86 system programming. Tested on Ubuntu 17.10 host. Userland cheat at: https://github.com/cirosantilli/x86-assembly-cheat

1. Getting started

On Ubuntu:

./configure
make

Each .S file on the top-level is an operating system! It gets compiled to a corresponding .img file.

Run the default OS on QEMU:

./run

Run a given OS:

./run min
./run bios_one_char

Extensions are ignored for perfect tab completion, so all the following are equivalent:

./run min
./run min.
./run min.S
./run min.img

Use Bochs instead of QEMU:

./run bios_hello_world bochs

Then on the terminal start the simulation with:

c

1.1. Getting started with real hardware

Insert a USB stick and determine its device (/dev/sdX) with:

sudo lsblk
sudo fdisk -l

Pick the .img file that you want to run and:

sudo dd if=bios_hello_world.img of=/dev/sdX

Then:

  • insert the USB in a computer

  • during boot, hit the hardware-dependent boot menu key, usually F12 or Esc

  • choose to boot from the USB

When you are done, just hit the power button to shutdown.

1.1.1. Getting started with the big image

Create a big.img that contains all examples that can be booted from GRUB:

make big.img

Now if you do:

sudo dd if=big.img of=/dev/sdX

you can test several examples with a single USB burn, which is much faster.

You can also try out the big image on QEMU for fun with:

qemu-system-i386 -hda big.img

You will also want to change the boot order to put the USB first from the F12 BIOS menu. This way you don’t have to hit F12 like a madman every time.

TODO: boot sectors that load STAGE2 are not working with the big image chainloader. TODO why?

1.2. Getting started with Docker

If you don’t have an Ubuntu box, this is an easy alternative:

sudo docker run -it --net=host ubuntu:14.04 bash

Then proceed normally in the guest: install packages, and build:

apt-get update
apt-get install git
git clone https://github.com/cirosantilli/x86-bare-metal-examples
cd x86-bare-metal-examples
./configure
make

To overcome the lack of GUI, we can use QEMU’s VNC implementation instead of the default SDL, which is visible on the host due to --net=host:

qemu-system-i386 -hda main.img -vnc :0

and then on host:

sudo apt-get install vinagre
vinagre localhost:5900

1.3. GDB step debug

TODO get it working nicely:

./run bios_hello_world debug

This section only covers specifics: you are expected to know GDB debugging already.

How to have debug symbols: https://stackoverflow.com/questions/32955887/how-to-disassemble-16-bit-x86-boot-sector-code-in-gdb-with-x-i-pc-it-gets-tr/32960272#32960272 TODO implement here. Needs to point GDB to an ELF file in addition to the remote listen.

TODO: detect if we are on 16 or 32 bit automatically from control registers. Now I’m using 2 functions 16 and 32 to switch manually, but that sucks. The problem is that it’s not possible to read them directly: http://stackoverflow.com/a/31340294/895245 If we had cr0, it would be easy to do with an if cr0 & 1 inside a hook-stop.

2. Minimal examples

These are the first ones you should look at.

2.1. Create a minimal image with printf

make -C printf run

Outcome: QEMU window opens up, prints a few boot messages, and hangs.

Our program does not print anything to the screen itself, it just makes the CPU halt.

This example is generated with printf byte by byte: you can’t get more minimal than this!

It basically consists of:

  • byte 0: a hlt instruction

  • bytes 1 through 509: zeroes, could be anything

  • bytes 510 and 511: mandatory magic bytes 0x55 and 0xAA (the little-endian word 0xAA55), which are required for BIOS to consider our disk.
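The layout above can be sketched in a few lines of Python instead of printf. This is only an illustration of the byte layout, not the method used by the printf directory; the function name is made up for this sketch.

```python
# Sketch: build the 512-byte boot sector described above, byte by byte.
HLT = 0xF4  # x86 opcode of the hlt instruction


def make_boot_sector() -> bytes:
    """Return a 512-byte sector: hlt, zero padding, then the boot signature."""
    sector = bytearray(512)  # bytes 1 through 509 stay zero
    sector[0] = HLT          # byte 0: hlt
    sector[510] = 0x55       # on disk the signature is 0x55 then 0xAA,
    sector[511] = 0xAA       # which reads as the little-endian word 0xAA55
    return bytes(sector)
```

Writing the returned bytes to a file in binary mode should give an image equivalent to the one described in this section.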

2.2. Minimal GAS example

Minimal example that just halts the CPU without using our mini-library common.h:

./run min

Source: min.S

Outcome: QEMU window opens up, prints a few firmware messages, and hangs.

2.2.1. Infinite loop

Go into an infinite loop instead of using hlt:

./run infinite_loop

The outcome is visibly the same, but TODO: it likely wastes more energy in real hardware?

2.2.2. Linker script

This hello world, and most of our OSes, use the linker script: linker.ld

This critical file determines the memory layout of our assembly. Take some time to read the comments in that file and familiarize yourself with it.

The Linux kernel also uses linker scripts to setup its image memory layout, see for example: https://github.com/torvalds/linux/blob/v4.2/arch/x86/boot/setup.ld

2.3. BIOS hello world

Print hello world after the firmware messages:

./run bios_hello_world

2.4. No linker script

Print hello world without using an explicit linker script:

make -C no-linker-script run

Sources:

Uses the default host ld script, not an explicit one set with -T. Uses:

  • -Ttext

  • .org inside each assembly file

  • _start must be present to avoid a warning, since the default linker script expects it

This is a hack, it can be more convenient for quick and dirty tests, but just don’t use it.

3. BIOS

The BIOS is one of the most well known firmwares in existence.

A firmware is a piece of software that:

  • runs before the OS / bootloader to do very low level setup

  • usually closed source, provided by the vendor, and interacts with undocumented hardware APIs

  • offers an API to the OS / bootloader, that allows you to do things like quick and dirty IO

  • indistinguishable from an OS, except that it is usually smaller

BIOS is old, non-standardized, x86 omnipresent and limited.

UEFI is the shiny new overbloated thing.

If you are making a serious OS, use the BIOS as little as possible.

BIOS can only be used in Real mode.

BIOS functions are all accessed through the int instruction:

mov <function-id>, %ah
int <interrupt-id>

Function arguments are stored in other registers.

The interrupt IDs are traditionally in hex as:

10h

which is the same as 0x10.

Each interrupt-id groups multiple functions with similar purposes, e.g. 10h groups functions with video related functionality.

Bibliography:

3.1. BIOS documentation

Does any official documentation or standardization exist?

3.2. BIOS examples

Print a single @ character:

./run bios_putc

Source: bios_putc.S

Print a newline:

./run bios_newline

Source: bios_newline.S

Outcome:

hello
     world

Carriage returns are needed just like in the old days:

./run bios_carriage_return

Outcome:

hello
world

Change the current cursor position:

./run bios_cursor_position

Outcome:

cb

3.2.1. BIOS color

Write a character N times with given color:

./run bios_color

Source: bios_color.S

Outcome:

bcd

where:

  • b and c have red foreground, and green background

  • d has the default color (gray on black)

Change the background color to red for the entire screen and print an a character:

./run bios_background

3.2.2. BIOS scroll

Scroll the screen:

./run bios_scroll

Source: bios_scroll.S

Outcome:

a
  c
 GG
   d

where G are empty green squares.

How it works:

Before scroll:

a
 b
  c
   d

We then choose to act on the rectangle with corners (1, 1) and (2, 2) given by cx and dx:

a
 XX
 YY
   d

and scroll that rectangle up by one line.

The Y line is then filled with the fill color, green.
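The scroll can be simulated on a tiny text grid to check the expected outcome. This is a sketch of the behaviour described above, not of the BIOS call itself: the function name and the 4x4 grid are made up, corners are inclusive (row, column) pairs, and G stands for a green fill cell.

```python
def scroll_up(rows, top, left, bottom, right, lines=1, fill="G"):
    """Scroll the rectangle (top, left)-(bottom, right) up by `lines` rows.

    Corners are inclusive. Rows that scroll in from below the rectangle
    are replaced by the fill character.
    """
    grid = [list(row) for row in rows]
    for y in range(top, bottom + 1):
        src = y + lines  # row whose content moves up into row y
        for x in range(left, right + 1):
            grid[y][x] = grid[src][x] if src <= bottom else fill
    return ["".join(row) for row in grid]


before = ["a   ",
          " b  ",
          "  c ",
          "   d"]
after = scroll_up(before, top=1, left=1, bottom=2, right=2)
```

Here the b inside the rectangle is overwritten by the row below it, and the last row of the rectangle is filled, which matches the outcome shown at the start of this section.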

3.2.2.1. BIOS clear screen

Subset of scroll:

./run bios_clear_screen

Outcome:

b

on red foreground, and the entire screen in green background, without any initial SeaBIOS messages.

3.2.3. BIOS draw pixel

Make the pixel at position (1, 1) light red (color 0Ch) in Video mode 13h:

./run bios_pixel

Source: bios_pixel.S

You may have to look a bit hard to see it.

Draw a line of such pixels:

./run bios_pixel_line

Advanced graphics!

3.2.4. BIOS keyboard

Get one character from the user via the keyboard, increment it by one, and print it to the screen, then halt:

./run bios_keyboard

Source: bios_keyboard.S

Type a bunch of characters and see them appear on the screen:

./run bios_keyboard_loop

Do try Ctrl-key combinations.

3.2.5. BIOS disk load

Load a stage 2 from disk with int 13h and run it:

./run bios_disk_load

Outcome:

a

This character was printed from stage 2.

Load two sectors instead of just one:

./run bios_disk_load2

Outcome:

ab

where a was printed from code on the first block, and b from code on the second block.

This shows that each sector is 512 bytes long.

GRUB 2.0 makes several calls to int 13h under grub-core/boot/i386/pc.

TODO: not working on Bochs: BOUND_GdMa: fails bounds test.

But it does work on QEMU and ThinkPad T400.

Bibliography:

3.2.6. BIOS detect memory

TODO failed attempt at detecting how big our memory is with int 15h:

./run bios_detect_memory

Seems to output trash currently.

This is important in particular so that you can start your stack there when you enter Protected mode, since the stack grows down.

In 16-bit mode, it does not matter much, since most modern machines have all addressable memory there, but in 32-bit protected mode it does, as our emulator usually does not have all 4 GiB. And of course, the 64-bit address space is currently larger than the total RAM in the world.

int 15h returns a list: each time you call it a new memory region is returned.

The format is not too complicated, and documented at: http://wiki.osdev.org/Detecting_Memory_%28x86%29#Detecting_Upper_Memory

  • 8 bytes: base address of region.

  • 8 bytes: length of region.

  • 4 bytes: type of region. 1 for usable RAM.

  • 4 bytes: some ACPI stuff that no one uses?
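The 24-byte entry layout listed above can be decoded as follows. This is a sketch that assumes little-endian fields in the listed order; the function name and the sample region are made up for illustration and do not come from bios_detect_memory.S.

```python
import struct

# base (8 bytes), length (8 bytes), type (4 bytes), ACPI attributes (4 bytes)
ENTRY = struct.Struct("<QQII")


def parse_entry(raw: bytes) -> dict:
    """Decode one 24-byte memory map entry as returned in the int 15h buffer."""
    base, length, kind, acpi = ENTRY.unpack(raw)
    return {
        "base": base,
        "length": length,
        "end": base + length,   # first address past the region
        "usable": kind == 1,    # type 1 means usable RAM
        "acpi": acpi,
    }


# Hypothetical region: 127 MiB of usable RAM starting at 1 MiB.
sample = ENTRY.pack(0x100000, 127 * 1024 * 1024, 1, 0)
region = parse_entry(sample)
```

The end field is what you would use to pick an initial stack pointer at the top of a usable region.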

3.2.6.1. Low vs high memory

TODO example.

int 15h can detect low or high memory. How are they different?

3.2.7. BIOS sleep

Count to infinity, sleep one second between each count:

./run bios_sleep

Source: bios_sleep.S

Polls the time counter that the BIOS keeps up to date at 0x046C with a frequency of 18.2 Hz, and waits for eighteen ticks, which is roughly one second.

3.2.8. BIOS initial state

Check the initial state the firmware leaves us by printing the contents of several registers:

./run bios_initial_state

Outcome:

ax = 00 00
bx = 00 00
cx = 00 00
dx = 80 00
cs = 00 00
ds = 00 00
es = 00 00
fs = 00 00
gs = 00 00
ss = 00 00
cr0 = 53 FF 00 F0

dx seems to be the only interesting regular register: the firmware stores the current disk number there to help with int 13h. Thus it usually contains 0x80.

3.4. SeaBIOS

Open source x86 BIOS implementation.

Default BIOS for QEMU and KVM.

4. Modes of operation

The x86 processor has a few modes, which have a huge impact on how the processor works.

Covered in the Intel manual Volume 3. Especially useful is the "Figure 2-3. Transitions Among the Processor’s Operating Modes" diagram.

The modes are:

  • Real-address, usually known just as "real mode"

  • Protected

  • System management

  • IA-32e. Has two sub modes:

    • Compatibility

    • 64-bit

  • Virtual-8086 Mode

Transition tables:

(all modes)
|
| Reset
|
v
+---------------------+
| Real address (PE=0) |
+---------------------+
^
|
| PE
|
v
+------------------------+
| Protected (PE=1, VM=0) |
+------------------------+
^                   ^
|                   |
|                   | VM
|                   |
v                   v
+--------------+    +---------------------+
| IA-32e       |    | Virtual-8086 (VM=1) |
+--------------+    +---------------------+

and:

+------------------------+
| System management mode |
+------------------------+
|          ^
|          |
| RSM      | SMI#
|          |
v          |
(All other modes)

The IA-32e transition is trickier, but clearly described in the Intel manual Volume 3 - 9.8.5 "Initializing IA-32e Mode":

Operating systems should follow this sequence to initialize IA-32e mode:

  1. Starting from protected mode, disable paging by setting CR0.PG = 0. Use the MOV CR0 instruction to disable paging (the instruction must be located in an identity-mapped page).

  2. Enable physical-address extensions (PAE) by setting CR4.PAE = 1. Failure to enable PAE will result in a #GP fault when an attempt is made to initialize IA-32e mode.

  3. Load CR3 with the physical base address of the Level 4 page map table (PML4).

  4. Enable IA-32e mode by setting IA32_EFER.LME = 1.

  5. Enable paging by setting CR0.PG = 1. This causes the processor to set the IA32_EFER.LMA bit to 1. The MOV CR0 instruction that enables paging and the following instructions must be located in an identity-mapped page (until such time that a branch to non-identity mapped pages can be effected).

4.1. Legacy modes

The term is defined in the Intel manual Volume 3 - CHAPTER 2 "SYSTEM ARCHITECTURE OVERVIEW":

Real mode, protected mode, virtual 8086 mode, and system management mode. These are sometimes referred to as legacy modes.

In other words: anything except IA-32e.

This further suggests that real, protected and virtual-8086 modes are not the main intended modes of operation.

4.2. Real mode

The CPU starts in this mode after power up.

All our BIOS examples are in real mode.

It is possible to use 32-bit registers in this mode with the "Operand Size Override Prefix" 0x66.

TODO is it possible to access memory above 1M like this:

mov $1, 0xF0000000
mov $1, (%eax)

4.2.1. Real mode segmentation

./run real_segmentation

Outcome:

AAAAAA

We access the character A with segments in 6 different ways:

  • ds, with explicit and implicit segment syntax

  • es, fs, gs, ss

Segment registers modify the addresses that instructions actually use as:

<segment> * 16 + <original-address>

This implies that:

  • we can address 20 bits of memory (1 MiB) instead of the 16 bits (64 KiB) that normally fit into registers. E.g., to address:

    0x84000

    we can use:

    0x8000  (segment)
    0x 4000 (address)
    -------
    0x84000
  • most addresses can be encoded in multiple ways, e.g.:

    0x100

    can be encoded as either of:

    • segment = 0x10, address = 0

    • segment = 0, address = 0x100

    • segment = 0x1, address = 0xF0
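The address calculation and the examples above can be checked with a short Python sketch. The function name is made up for illustration.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real mode address translation: <segment> * 16 + <original-address>."""
    return segment * 16 + offset


# The 0x84000 example from above.
example = linear_address(0x8000, 0x4000)

# Three different segment:address pairs that all encode 0x100.
encodings_of_0x100 = [
    linear_address(0x10, 0x0),
    linear_address(0x0, 0x100),
    linear_address(0x1, 0xF0),
]
```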

fs and gs are general purpose: they are not affected implicitly by any instructions. All others will be further exemplified.

4.2.1.1. CS

Affects the code address pointer:

./run cs

Source: cs.S

Outcome:

00
01
02

CS is set with the ljmp instruction, and we use it to skip .skip zero gaps in the code.

4.2.1.2. SS

./run ss

Source: ss.S

Outcome:

0102

The second byte is 16 bytes after the first, and is accessed with SS = 1.

SS affects instructions that use SP such as PUSH and POP: those will actually use 16 * SS + SP as the actual address.

4.2.1.3. ES

TODO: this does seem to have special properties as used by string instructions.

4.2.1.4. Segment register encoding

objdump -D -b binary -m i8086 segment_registers.img

shows that non ds encodings are achieved through a prefix:

20:   a0 63 7c                mov    0x7c63,%al
34:   26 a0 63 7c             mov    %es:0x7c63,%al
40:   64 a0 63 7c             mov    %fs:0x7c63,%al
4c:   65 a0 63 7c             mov    %gs:0x7c63,%al
58:   36 a0 63 7c             mov    %ss:0x7c63,%al

This makes ds the most efficient one for data access, and thus a good default.
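The prefix bytes in the disassembly above are the standard x86 segment override prefixes. As a sketch, this Python snippet maps the first byte of an encoded instruction back to the segment it selects; the function name is made up, and falling back to ds is an assumption that holds for plain data accesses like the mov above.

```python
# x86 segment override prefix bytes.
SEGMENT_PREFIXES = {
    0x26: "es",
    0x2E: "cs",
    0x36: "ss",
    0x3E: "ds",
    0x64: "fs",
    0x65: "gs",
}


def segment_of(instruction: bytes) -> str:
    """Return the segment used by a data access, judging by its first byte.

    Without an override prefix, plain data accesses default to ds.
    """
    return SEGMENT_PREFIXES.get(instruction[0], "ds")


no_prefix = segment_of(bytes.fromhex("a0637c"))    # mov 0x7c63,%al
with_es = segment_of(bytes.fromhex("26a0637c"))    # mov %es:0x7c63,%al
```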

4.2.2. Interrupts

Create an interrupt handler and handle an interrupt:

./run interrupt

Source: interrupt.S

Outcome:

ab

It works like this:

  • print a from interrupt handler 0

  • jump back to main code

  • print b

TODO: is STI not needed because this interrupt is not maskable?

Same with interrupt handler 1:

./run interrupt1

Source: interrupt1.S

TODO understand: attempt to create an infinite loop that calls the interrupt from the handler:

./run interrupt_loop

QEMU exits with:

Trying to execute code outside RAM or ROM at 0x000a0000

Handle a division by zero:

./run interrupt_zero_divide

TODO understand:

  • expected outcome: prints values from 0 to 0xFFFF in an infinite loop.

  • actual outcome: stops at 0081

Apparently when there is an exception, iret jumps back to the line that threw the exception itself, not the one after, which leads to the loop.

But then why does it stop at 0081? And if we set the initial value to 0x0090, it just runs once.

4.2.2.1. int

  • long jumps to the CS : IP found in the corresponding interrupt vector.

  • pushes EFLAGS to let them be restored by iret?

4.2.2.2. iret

Jumps back to the next instruction to be executed before the interrupt came in.

Restores EFLAGS and other registers TODO which?

4.2.2.4. IVT

Interrupt vector table: https://wiki.osdev.org/IVT

The real mode in-memory table that stores the address for the handler for each interrupt.

In Protected mode, the analogous structure is the IDT.

The base address is set in the interrupt descriptor table register (IDTR), which can be modified with the lidt instruction.

The default address is 0x0.

The format of the table is:

IDTR -> +-----------------------+
0       |Address      (2 bytes) |
2       |Code segment (2 bytes) |
        +-----------------------+
        +-----------------------+
4 ----> |Address      (2 bytes) |
6       |Code segment (2 bytes) |
        +-----------------------+
        +-----------------------+
8 ----> |Address      (2 bytes) |
A       |Code segment (2 bytes) |
        +-----------------------+

...     ...
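The table format above means that entry N lives at offset 4 * N from the base, with the handler address in the first 2 bytes and the code segment in the next 2. A minimal Python sketch, assuming little-endian fields and the default base of 0x0; all function names and the handler address are made up for illustration.

```python
import struct


def ivt_entry_address(n: int, base: int = 0) -> int:
    """Address of the IVT entry for interrupt n: 4 bytes per entry."""
    return base + 4 * n


def pack_ivt_entry(address: int, segment: int) -> bytes:
    """Encode one entry: 2 bytes of address, then 2 bytes of code segment."""
    return struct.pack("<HH", address, segment)


def handler_linear_address(entry: bytes) -> int:
    """Where the CPU ends up jumping, using real mode segmentation."""
    address, segment = struct.unpack("<HH", entry)
    return segment * 16 + address


# Hypothetical handler placed at 0x7C20 inside the boot sector, segment 0.
entry = pack_ivt_entry(0x7C20, 0)
```

For example, IRQ0 maps to interrupt 8 in real mode by default, so its entry sits at 0x20.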
4.2.2.4.1. lidt

Set the value of the IDTR, and therefore set the base address of the IVT:

./run lidt
./run lidt2
./run lidt0

Sources:

TODO not working.

Expected outcome:

ab

Actual outcome: infinite reboot loop.

Actual outcome if we comment out the PUTC:

  • lidt: still infinite reboot loop

  • lidt2 and lidt0: halt apparently

I think I understand that lidt takes as input a memory address, and the memory at that address must contain:

  • 2 bytes: total size of the IVT in bytes

  • 4 bytes: base address of the IVT. Higher byte is ignored in real mode, since addresses are not 4 bytes long.

4.3. Protected mode

Print hello world in protected mode:

./run protected_mode

Major changes from real mode:

  • VGA must be used for output since BIOS is not available in protected mode.

  • segmentation takes effect immediately, so we have to set the GDT up

  • we have to encode instructions differently, thus a .code32 is needed. In 16-bit mode, 32-bit instructions are encodable with a special prefix.

Bibliography:

4.3.1. Intel protected mode example

The Intel manual Volume 3 - 9.10 "INITIALIZATION AND MODE SWITCHING EXAMPLE" does contain an official example of how to go into protected mode.

However:

How can those guys be in business? >:-)

4.3.3. Protected mode segmentation

TODO: get working:

./run segmentation

Source: segmentation.S

Expected outcome:

x
a
b

Actual outcome:

x
a

Example of the effect on a memory access of changing the segment base address.

Without segment manipulation, the output would be just: TODO

4.3.3.1. Segmentation introduction

First read the paging tutorial, and in particular: http://www.cirosantilli.com/x86-paging/#segmentation to get a feel for the type of register and data structure manipulation required to configure the CPU, and how segmentation compares to paging.

Segmentation modifies every memory access of a given segment by:

  • adding an offset to it

  • limiting how big the segment is

If an access is made at an offset larger than allowed an exception happens, which is like an interrupt, and gets handled by a previously registered handler.

Segmentation could be used to implement virtual memory by assigning one segment per program:

+-----------+--------+--------------------------+
| Program 1 | Unused | Program 2                |
+-----------+--------+--------------------------+
^           ^        ^                          ^
|           |        |                          |
Start1      End1     Start2                     End2

Besides address translation, the segmentation system also manages other features such as Protection rings. TODO: how are those done in 64-bit mode?

In Linux 32-bit for example, only two segments are used at all times: one at ring 0 for the kernel, and another at ring 3 for all user processes.

4.3.3.2. Segment selector

In protected mode, the segment registers CS, DS, SS, ES, FS and GS contain a data structure more complex than a simple address as in real mode, which contains a single number.

This 2 byte data structure is called a segment selector:

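+-----------------+-------------+-------------------------------+------------------------------------------+
| Position (bits) | Size (bits) | Name                          | Description                              |
+-----------------+-------------+-------------------------------+------------------------------------------+
| 0               | 2           | Request Privilege Level (RPL) | Protection ring level, from 0 to 3.      |
| 2               | 1           | Table Indicator (TI)          | 0: global descriptor table               |
|                 |             |                               | 1: local descriptor table                |
| 3               | 13          | Index                         | Index of the Segment descriptor to be    |
|                 |             |                               | used from the descriptor table.          |
+-----------------+-------------+-------------------------------+------------------------------------------+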

Like in real mode, this data structure is loaded on the registers with a regular mov mnemonic instruction.
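The bit layout of the segment selector can be sketched in Python as a pair of encode and decode helpers. The names are made up for illustration.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # bits 3-15: index into the descriptor table
    ti: int     # bit 2: 0 = GDT, 1 = LDT
    rpl: int    # bits 0-1: requested privilege level (ring)


def decode_selector(value: int) -> Selector:
    """Split a 16-bit segment selector into its three fields."""
    return Selector(index=value >> 3, ti=(value >> 2) & 1, rpl=value & 3)


def encode_selector(index: int, ti: int = 0, rpl: int = 0) -> int:
    """Build the 16-bit value that gets loaded into a segment register."""
    return (index << 3) | (ti << 2) | rpl
```

With this layout, selector 0x08 means GDT entry 1 at ring 0, and selector 0x10 means GDT entry 2 at ring 0, which is why those two values show up so often in small protected mode examples.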

Bibliography: Intel manual Volume 3 - 3.4.2 "Segment Selectors".

4.3.3.3. GDT

Global descriptor table.

An in-memory array of Segment descriptor data structures.

The Index field of the Segment selector chooses which one of those segment descriptors is to be used.

The base address is set with the lgdt instruction, which loads from memory a 6 byte structure:

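+------------------+--------------+--------------------------------------------+
| Position (bytes) | Size (bytes) | Description                                |
+------------------+--------------+--------------------------------------------+
| 0                | 2            | Limit: size of the table in bytes, minus 1 |
| 2                | 4            | Base address of the table                  |
+------------------+--------------+--------------------------------------------+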
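As a sketch, the 6 byte structure that lgdt reads can be built like this in Python. It assumes that the first field is the limit as the Intel manual defines it, that is the table size in bytes minus one, with 8 bytes per segment descriptor. The function name and the base address are made up for illustration.

```python
import struct

DESCRIPTOR_SIZE = 8  # each segment descriptor is 8 bytes long


def pack_gdtr(base: int, entries: int) -> bytes:
    """Build the 6 byte operand of lgdt: 2 byte limit, then 4 byte base."""
    limit = entries * DESCRIPTOR_SIZE - 1  # last valid byte offset
    return struct.pack("<HI", limit, base)


# Hypothetical table of 3 descriptors (null, code, data) located at 0x7C80.
gdtr = pack_gdtr(0x7C80, 3)
```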

Bibliography:

4.3.3.3.2. Null segment selector

Intel manual Volume 3 - 3.4.2 "Segment Selectors" says that we can’t use the first entry of the GDT:

The first entry of the GDT is not used by the processor. A segment selector that points to this entry of the GDT (that is, a segment selector with an index of 0 and the TI flag set to 0) is used as a “null segment selector.” The processor does not generate an exception when a segment register (other than the CS or SS registers) is loaded with a null selector. It does, however, generate an exception when a segment register holding a null selector is used to access memory. A null selector can be used to initialize unused segment registers. Loading the CS or SS register with a null segment selector causes a general-protection exception (#GP) to be generated.

4.3.3.4. Segment descriptor

A data structure that is stored in the GDT.

Clearly described in the Intel manual Volume 3 - 3.4.5 "Segment Descriptors" and in particular Figure 3-8 "Segment Descriptor".

The Linux kernel v4.2 encodes it at: arch/x86/include/asm/desc_defs.h in struct desc_struct
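To get a feel for how scattered the fields of a segment descriptor are, here is a Python sketch that packs one into its 8 byte value following the layout of that figure. The function and parameter names are made up; access is the byte holding the type, S, DPL and P bits, and flags is the nibble holding the AVL, L, D/B and G bits.

```python
def encode_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack a segment descriptor into a 64-bit integer.

    base:   32-bit segment base address
    limit:  20-bit segment limit
    access: 8-bit access byte (P, DPL, S, type)
    flags:  4-bit flags nibble (G, D/B, L, AVL)
    """
    return (
        (limit & 0xFFFF)                # limit bits 0-15  -> bits 0-15
        | (base & 0xFFFFFF) << 16       # base bits 0-23   -> bits 16-39
        | (access & 0xFF) << 40         # access byte      -> bits 40-47
        | ((limit >> 16) & 0xF) << 48   # limit bits 16-19 -> bits 48-51
        | (flags & 0xF) << 52           # flags nibble     -> bits 52-55
        | ((base >> 24) & 0xFF) << 56   # base bits 24-31  -> bits 56-63
    )


# Flat 4 GiB ring 0 segments, as commonly used right after entering protected mode.
flat_code = encode_descriptor(base=0, limit=0xFFFFF, access=0x9A, flags=0xC)
flat_data = encode_descriptor(base=0, limit=0xFFFFF, access=0x92, flags=0xC)
```

Calling flat_code.to_bytes(8, "little") gives the bytes in the order they would sit in memory inside the GDT.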

4.3.4. IDT

Interrupt descriptor table.

Protected mode analogue to the IVT:

./run idt

Source: idt.S

Outcome:

int 0 handled

Handle interrupt 1 instead of 0:

./run idt1

Source: idt1.S

Outcome:

int 1 handled

Print 00000020\n at 18.2 Hz with the PIT:

./run pit_protected

Source: pit_protected.S

Bibliography:

The first 32 handlers are reserved by the processor and have predefined meanings, as specified in the Intel manual Volume 3 Table 3-3. "Intel 64 and IA-32 General Exceptions".

In the Linux kernel, https://github.com/torvalds/linux/blob/v4.2/arch/x86/entry/entry_64.S sets them all up: each idtentry divide_error call sets up a new one.

4.3.4.1. IDT divide by zero

Handle a division by zero:

./run idt_zero_divide

Outcome:

division by zero handled

Division by zero causes a Divide Error which Intel notes as #DE.

It is then handled by IDT 0.

#DE is not only raised on division by zero: it also happens when the quotient overflows the destination register. TODO example.

4.3.5. SMP

Start multiple processors and make them interact:

./run smp

Source: smp.S

Outcome:

SMP started

Implies that SMP worked because a spinlock was unlocked by the second processor.

Try commenting out waking up the second processor and see it not get printed.

4.3.6. Paging

Verbose beginner’s tutorial: http://www.cirosantilli.com/x86-paging/

Change page tables and observe how that affects memory accesses:

./run paging

Source: paging.S

Outcome:

00001234
00005678

Implies that paging worked because we printed and modified the same physical address with two different virtual addresses.

Requires Protected mode.

4.3.6.1. Page fault

Generate and handle a page fault:

./run page_fault

Source: page_fault.S

Outcome:

Page fault handled. Error code:
00000002

This is printed from a page fault handler that we set up, and that is triggered by writing to an unmapped address.
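The error code that the handler prints can be decoded bit by bit. A minimal Python sketch covering only the three lowest bits of the page fault error code; the function name is made up for illustration.

```python
def decode_page_fault(error_code: int) -> dict:
    """Decode the three lowest bits of a page fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "present": bool(error_code & 0x1),
        # bit 1: 0 = the access was a read, 1 = a write
        "write": bool(error_code & 0x2),
        # bit 2: 0 = fault happened in supervisor mode, 1 = in user mode
        "user": bool(error_code & 0x4),
    }


fault = decode_page_fault(0x00000002)
```

For the value 00000002 this gives a write to a page that is not present, done from supervisor mode, which is consistent with how the example triggers the fault.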

4.4. IA-32e mode

Wikipedia seems to call it long mode: https://en.wikipedia.org/wiki/Long_mode

Contains two sub-modes: 64-bit mode and Compatibility mode.

This is controlled by the CS.L bit of the segment descriptor.

It appears that it is possible for user programs to modify that during execution from userland: http://stackoverflow.com/questions/12716419/can-you-enter-x64-32-bit-long-compatibility-sub-mode-outside-of-kernel-mode

TODO vs Protected mode.

4.5. 64-bit mode

64-bit is the major mode of operation, and enables the full 64 bit instructions.

5. in and out instructions

x86 has dedicated instructions for certain IO operations: in and out.

These instructions take an IO address which identifies which hardware they will communicate to.

The IO ports don’t seem to be standardized, like everything else: http://stackoverflow.com/questions/14194798/is-there-a-specification-of-x86-i-o-port-assignment

The Linux kernel wraps those instructions with the inb and outb family of functions:

man inb
man outb

5.1. Memory mapped vs port mapped IO

Not all instruction sets have dedicated instructions such as in and out for IO.

In ARM for example, everything is done by writing to magic memory addresses.

The dedicated in and out approach is called "port mapped IO", and the approach of the magic addresses "memory mapped IO".

From an interface point of view, I feel that memory mapped is more elegant: port IO simply creates a second address space.

TODO: are there performance considerations when designing CPUs?

5.2. PS/2 keyboard

Whenever you press a key down or up, the keyboard hex scancode is printed to the screen:

./run ps2_keyboard

Source: ps2_keyboard.S

Uses the PS/2 keyboard controller on in 60h: http://wiki.osdev.org/%228042%22_PS/2_Controller

The in always returns immediately with the last keyboard keycode: we then just poll for changes and print only the changes.

Scancode tables: TODO: official specs?

TODO do this with the interrupt table instead of in. Failed attempt at: interrupt_keyboard.S

5.4. RTC

Get wall time with precision of seconds every second:

./run rtc

Source: rtc.S

Sample outcome:

00 01 02 03 04 10

which means:

3rd April 2010, 02 hours 01 minute and 00 seconds.
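The sample line can be turned into a date with a few lines of Python. This is a sketch under several assumptions that are not taken from rtc.S: the printed bytes are in seconds, minutes, hours, day, month, year order, the RTC is in its default BCD and 24 hour formats, and the century is 2000. The function names are made up.

```python
from datetime import datetime


def bcd(byte: int) -> int:
    """Convert one binary coded decimal byte, e.g. 0x59 -> 59."""
    return (byte >> 4) * 10 + (byte & 0x0F)


def decode_rtc(line: str, century: int = 2000) -> datetime:
    """Decode 'SS MM HH DD MM YY' hex bytes as printed by the example."""
    sec, minute, hour, day, month, year = (
        bcd(int(token, 16)) for token in line.split()
    )
    return datetime(century + year, month, day, hour, minute, sec)


when = decode_rtc("00 01 02 03 04 10")
```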

Uses out 70h and in 71h to query the hardware.

This hardware must therefore use a separate battery to keep going when we turn off the computer or remove the laptop battery.

We can control the initial value in QEMU with the option:

qemu-system-x86_64 -rtc base='2010-04-03T02:01:00'

The RTC cannot give a precision finer than one second. For that, consider the PIT, or the HPET.

Bibliography:

5.5. PIT

Superseded by the HPET.

Print a\n with the minimal frequency possible of 0x1234DD / 0xFFFF = 18.2 Hz:

./run pit

Source: pit.S

Make the PIT generate a single interrupt instead of a frequency:

./run pit_once

Source: pit_once.S

Outcome:

a

TODO I think this counts down from the value in channel 0, and therefore allows us to schedule a single event in the future.

The PIT can generate periodic interrupts (or sound!) with a given frequency to IRQ0, which on real mode maps to interrupt 8 by default.

Major application: interrupt the running process to allow the OS to schedule processes.

The PIT has 3 channels that can generate 3 independent signals:

  • channel 0 at port 40h: generates interrupts

  • channel 1 at port 41h: not to be used: historically it drove DRAM refresh, and it may not exist on newer hardware

  • channel 2 at port 42h: linked to the speaker to generate sounds

Port 43h controls the signal properties of all 3 channels, except for the frequency, which goes in the channel ports.
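
As a rough C sketch, the byte written to port 43h can be assembled as follows. The bit layout is the one documented for the 8253/8254 chips. The function name is hypothetical and is not one of the macros in common.h.

```c
#include <stdint.h>

/* Build the control byte for port 43h.
 * channel: 0 to 2
 * access:  1 = low byte only, 2 = high byte only, 3 = low byte then high byte
 * mode:    0 to 5, e.g. 0 = interrupt on terminal count,
 *          2 = rate generator, 3 = square wave
 * bcd:     0 = 16-bit binary counter, 1 = four-digit BCD counter */
uint8_t pit_command(unsigned channel, unsigned access, unsigned mode,
                    unsigned bcd) {
    return (uint8_t)(((channel & 3u) << 6) | ((access & 3u) << 4) |
                     ((mode & 7u) << 1) | (bcd & 1u));
}
```

For example, channel 0 as a rate generator with a two-byte divisor gives 0x34, and channel 2 as a square wave gives 0xB6.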

Bibliography:

5.5.1. PIT frequency

We don’t control the frequency of the PIT directly: its input clock is fixed at 0x1234DD Hz (1193181 Hz).

Instead, we control a frequency divisor. This is a classic type of discrete electronic circuit: https://en.wikipedia.org/wiki/Frequency_divider
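
The arithmetic can be sketched in C as below. The function names are hypothetical, and rounding to the nearest divisor is a choice made for this sketch rather than something the hardware requires.

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193181u /* 0x1234DD, the fixed input clock */

/* Divisor that gets closest to the wanted frequency, clamped to what
 * fits in the 16-bit counter. This sketch never returns 0. */
uint16_t pit_divisor_for(uint32_t wanted_hz) {
    if (wanted_hz == 0)
        return 0xFFFF;
    uint64_t divisor = ((uint64_t)PIT_INPUT_HZ + wanted_hz / 2) / wanted_hz;
    if (divisor < 1)
        divisor = 1;
    if (divisor > 0xFFFF)
        divisor = 0xFFFF;
    return (uint16_t)divisor;
}

/* Output frequency in millihertz for a given divisor.
 * In binary counting mode the hardware treats a divisor of 0 as 65536. */
uint32_t pit_output_millihertz(uint16_t divisor) {
    uint32_t effective = divisor ? divisor : 65536u;
    return (uint32_t)(((uint64_t)PIT_INPUT_HZ * 1000u) / effective);
}
```

With the divisor 0xFFFF this gives 18206 mHz, which is the 18.2 Hz mentioned above, and a 100 Hz tick needs a divisor of 11932.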

The magic frequency exists for historical reasons, to reuse television hardware, according to https://wiki.osdev.org/Programmable_Interval_Timer, which in turn is likely influenced by some physical properties of crystal oscillators.

The constant 1193181 == 0x1234DD has 2 occurrences in Linux 4.16.

5.5.3. PC speaker

./run pc_speaker

Source: pc_speaker.S

Outcome: produces a foul noise using the PC speaker hardware, which is controlled through port 61h.

QEMU only plays the sound if we give it the option:

-soundhw pcspk

The beep just uses the PIT Channel 2 to generate the frequency.
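
On the port 61h side, only two bits matter for the beep. Here is a C sketch of the read-modify-write values, with hypothetical names. The actual in and out instructions are left out, so this only computes the byte to write back.

```c
#include <stdint.h>

#define SPEAKER_GATE 0x01u /* bit 0: gate input of PIT channel 2 */
#define SPEAKER_DATA 0x02u /* bit 1: connect channel 2 output to the speaker */

/* Value to write to port 61h to start the sound,
 * given the value just read from it. Other bits are preserved. */
uint8_t speaker_on(uint8_t port61) {
    return (uint8_t)(port61 | (SPEAKER_GATE | SPEAKER_DATA));
}

/* Value to write to port 61h to stop the sound. */
uint8_t speaker_off(uint8_t port61) {
    return (uint8_t)(port61 & ~(SPEAKER_GATE | SPEAKER_DATA));
}
```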

Extracted from: https://github.com/torvalds/linux/blob/v4.2/arch/x86/realmode/rm/wakemain.c#L38 The kernel has a Morse code encoder using it!

Bibliography:

6. Video mode

There are several video modes.

Modes determine what interrupt functions can be used.

There are 2 main types of modes:

  • text, where we operate character-wise

  • video, where we operate byte-wise

Modes can be set with int 0x10 and AH = 0x00, and read back with AH = 0x0F.

The most common modes seem to be:

  • 0x01: 40x25 Text, 16 colors, 8 pages

  • 0x03: 80x25 Text, 16 colors, 8 pages

  • 0x13: 320x200 Graphics, 256 colors, 1 page

You can add 128 to the mode number (i.e. set bit 7) to prevent the mode switch from clearing the screen.

6.1. Video mode 13h

Example at: BIOS draw pixel

Video Mode 13h has: 320 x 200 Graphics, 256 colors, 1 page.

The color encoding is just an arbitrary palette that fits in 1 byte. It is not split into color fields like R R R G G G B B or anything else mentioned at: https://en.wikipedia.org/wiki/8-bit_color. Related: http://stackoverflow.com/questions/14233437/convert-normal-256-color-to-mode-13h-version-color
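
Besides the BIOS call, pixels in this mode can also be written directly to memory: the framebuffer is linear, one byte per pixel, starting at physical address 0xA0000. The C sketch below shows the offset arithmetic. The names are hypothetical, and it takes the buffer as a parameter so that it can also run on a host against a plain array.

```c
#include <stdint.h>

#define MODE13H_WIDTH 320u
#define MODE13H_HEIGHT 200u
#define MODE13H_BASE 0xA0000u /* physical address of the framebuffer */

/* Byte offset of pixel (x, y) from the start of the framebuffer. */
uint32_t mode13h_offset(uint32_t x, uint32_t y) {
    return y * MODE13H_WIDTH + x;
}

/* Set one pixel to a palette index in a buffer of 320 * 200 bytes.
 * On bare metal the buffer would be the memory at MODE13H_BASE.
 * Out-of-range coordinates are ignored. */
void mode13h_put_pixel(uint8_t *fb, uint32_t x, uint32_t y, uint8_t color) {
    if (x < MODE13H_WIDTH && y < MODE13H_HEIGHT)
        fb[mode13h_offset(x, y)] = color;
}
```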

6.2. VGA

TODO: what is it exactly?

BIOS calls cannot be used once we move into protected mode, but we can use the VGA interface to get output out of our programs.

Have a look at the macros prefixed with VGA_ inside common.h.
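
For orientation, here is a C sketch of how a character cell is laid out in the standard 80x25 VGA text mode, whose buffer starts at physical address 0xB8000. The names below are hypothetical and are not the VGA_ macros from common.h.

```c
#include <stdint.h>

#define VGA_TEXT_BASE 0xB8000u /* physical address of the text buffer */
#define VGA_TEXT_COLS 80u

/* One cell is 16 bits: the low byte is the character, the high byte is
 * the attribute (low nibble foreground color, high nibble background). */
uint16_t vga_text_cell(char c, uint8_t fg, uint8_t bg) {
    uint8_t attribute = (uint8_t)(((bg & 0x0F) << 4) | (fg & 0x0F));
    return (uint16_t)(((uint16_t)attribute << 8) | (uint8_t)c);
}

/* Byte offset of the cell at (row, col) from VGA_TEXT_BASE. */
uint32_t vga_text_offset(uint32_t row, uint32_t col) {
    return (row * VGA_TEXT_COLS + col) * 2u;
}
```

For example, a light grey A on black is the 16-bit value 0x0741, and the last cell of the screen is at offset 3998.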

7. Power

7.1. Reboot

Infinite reboot loop on emulator!

./run reboot

Source: reboot.S

TODO why does it work?

7.2. APM

Turn on and immediately shut down the system, closing QEMU:

./run apm_shutdown

Source: apm_shutdown.S

Fancier version copied from http://wiki.osdev.org/APM (TODO why is that better):

./run apm_shutdown2

Source: apm_shutdown2.S

Older than ACPI and simpler.

Developed by Intel and Microsoft, first released in 1992. Spec seems to be in RTF format…

Bibliography:

7.3. ACPI

TODO example

Newer and better.

Now managed by the UEFI Forum, the same group that manages UEFI.

Spec:

8. UEFI

Successor to BIOS.

Made by Intel, mostly MIT open source, which likely implies that vendors will hack away closed source versions.

Matthew Garrett says it is huge: larger than Linux without drivers.

Since it is huge, it inevitably contains bugs. Garrett says that Intel sometimes does not feel like updating the firmware with bugfixes.

UEFI offers a large API comparable to what most people would call an operating system:

ARM is considering an implementation https://wiki.linaro.org/ARM/UEFI

8.1. UEFI example

make -C uefi run

TODO get a hello world program working:

Running without an image gives the UEFI shell, and a Linux kernel image booted fine with it: http://unix.stackexchange.com/a/228053/32558, so we just need to generate the image.

The blob uefi/ovmf.fd IA32 r15214 was downloaded from: https://sourceforge.net/projects/edk2/files/OVMF/OVMF-IA32-r15214.zip/download TODO: automate building it from source instead, get rid of the blob, and force push it away from history. It seems that they have moved to GitHub at last: https://github.com/tianocore/tianocore.github.io/wiki/How-to-build-OVMF/e372aa54750838a7165b08bb02b105148e2c4190

9. Coreboot

TODO minimal examples.

Open source hippie freedom loving cross platform firmware that attempts to replace BIOS and UEFI for the greater good of mankind.

10. GRUB

grub/README.adoc TODO cleanup and exemplify everything in that file. Some hosty stuff needs to go out maybe.

10.1. GRUB chainloader

make -C grub/chainloader run

Outcome: you are left in an interactive GRUB menu with two choices:

  • hello-world: go into a hello world OS

  • self +1: reload ourselves, and almost immediately reload GRUB and fall on the same menu as before

This example illustrates the chainloader GRUB command, which just loads a boot sector and runs it: https://www.gnu.org/software/grub/manual/grub/html_node/chainloader.html

This is what you need to boot systems like Windows which GRUB does not know anything about: just point to their partition and let them do the job.

Both of the menu options are implemented with chainloader:

  • hello-world:

    Loads a given image file within the partition.

    After build, grub-mkrescue creates a few filesystems, and grub/chainloader/iso/boot/main.img is placed inside one of those filesystems.

    This illustrates GRUB’s awesome ability to understand certain filesystem formats, and fetch files from them, thus allowing us to pick between multiple operating systems with a single filesystem.

    It is educational to open up the generated grub/chainloader/main.img with the techniques described at https://askubuntu.com/questions/69363/mount-single-partition-from-image-of-entire-disk-device/673257#673257 to observe that the third partition of the image is a VFAT filesystem, and that it contains the boot/main.img image as a regular file.

  • self +1: uses the syntax:

    chainloader +1

    which reloads the first sector of the current partition, and therefore ourselves.

10.2. GRUB linux

TODO get working.

OK, let’s have some fun and do the real thing!

make -C grub/linux run

Expected outcome: GRUB menu with a single Buildroot entry. When you select it, a tiny pre-built Linux image boots from: https://github.com/cirosantilli/linux-kernel-module-cheat

Actual outcome: after selecting the entry, nothing shows on the screen. Even if we fix this, we will then also need to provide a rootfs somehow: the initrd GRUB command would be a simple method, and that repo can also generate initrd images: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/c06476bfc821659a4731d49e808f45e8c509c5e1#initrd Maybe have a look under Buildroot boot/grub2 and copy what they are doing there.

The GRUB command is of form:

linux /boot/bzImage root=/dev/sda1 console=tty1

so we see that the kernel boot parameters are passed right there, for example try to change the value of the printk.time parameter:

printk.time=y

and see whether the dmesg timestamps get printed or not: y enables them and n disables them.

11. Multiboot

Standard created by GRUB for booting OSes.

Multiboot files are typically ELF files that carry a special header.
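
The required part of that header, as defined by version 1 of the Multiboot specification, is three 32-bit fields that must add up to zero modulo 2^32. The C sketch below computes and checks the checksum. The struct and function names are hypothetical. In the actual examples the header would be emitted from assembly or through the linker script rather than built at run time.

```c
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u

/* The three mandatory fields at the start of a Multiboot 1 header. */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* Fill in a header for the given flags.
 * The checksum makes magic + flags + checksum wrap around to 0. */
struct multiboot_header multiboot_make_header(uint32_t flags) {
    struct multiboot_header h;
    h.magic = MULTIBOOT_HEADER_MAGIC;
    h.flags = flags;
    h.checksum = (uint32_t)(0u - (MULTIBOOT_HEADER_MAGIC + flags));
    return h;
}

/* The check a bootloader is expected to perform on those fields. */
int multiboot_header_is_valid(const struct multiboot_header *h) {
    return h->magic == MULTIBOOT_HEADER_MAGIC &&
           (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}
```

The bootloader looks for this header in the first 8192 bytes of the file, aligned on a 4 byte boundary.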

Advantages: GRUB does housekeeping magic for you:

  • you can store the OS as a regular file inside a filesystem

  • your program starts in 32-bit protected mode already, not 16-bit real mode

  • it gets the available memory ranges for you

Disadvantages:

  • more boilerplate

GRUB leaves the application in a well-defined starting state.

It seems that Linux does not implement Multiboot natively, but GRUB supports it as an exception: http://stackoverflow.com/questions/17909429/booting-a-non-multiboot-kernel-with-grub2

11.1. Multiboot hello world

make -C multiboot/hello-world run

which actually runs:

qemu-system-i386 -kernel 'main.elf'

where main.elf is the multiboot file we generated.

Outcome:

hello world

Or you can use grub-mkrescue to make a multiboot file into a bootable ISO or disk:

qemu-system-x86_64 -drive file=main.img,format=raw

The main.img file can also be burned to a USB and run on real hardware.

Example originally minimized from https://github.com/programble/bare-metal-tetris

This example illustrates the multiboot GRUB command: https://www.gnu.org/software/grub/manual/grub/html_node/multiboot.html

11.2. osdev multiboot hello world

We also track here the code from: http://wiki.osdev.org/Bare_Bones:

make -C multiboot/osdev run

Outcome:

hello world

This is interesting as it uses C as much as possible with some GAS where needed.

This should serve as a decent basis for starting a pet OS. But please don’t, there are enough out there already :-)

12. Tests

12.1. Unit tests

Tests for utilities defined in this repo, as opposed to x86 or external firmware concepts.

TODO: implement the function and enable this test: test_vga_print_bytes.S

12.1.1. PRINT_BYTES

Print several bytes in human readable form:

./run test_print_bytes

Outcome:

40 41 42 43 44 45 46 47
48 49 4A 4B 4C 4D 4E 4F
50
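
The output format can be mimicked on a host with the C sketch below. It is not the PRINT_BYTES macro itself, which is written in assembly: the function name is hypothetical, and the rule of 8 bytes per line is inferred from the sample output above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write bytes as two-digit uppercase hex, space separated, 8 per line.
 * Returns the number of characters written, not counting the final NUL,
 * or -1 if the output buffer is too small. */
int format_bytes(char *out, size_t out_size, const uint8_t *bytes, size_t n) {
    size_t pos = 0;
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* Separator before every byte except the first one:
         * a newline after each group of 8, a space otherwise. */
        const char *sep = (i == 0) ? "" : (i % 8 == 0) ? "\n" : " ";
        int written = snprintf(out + pos, out_size - pos, "%s%02X", sep,
                               (unsigned)bytes[i]);
        if (written < 0 || (size_t)written >= out_size - pos)
            return -1;
        pos += (size_t)written;
    }
    return (int)pos;
}
```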

12.1.2. PIT_SLEEP_TICKS

Print a\n with a frequency of 2 Hz:

./run test_pit_sleep_ticks

Same but in protected mode:

./run test_pit_sleep_protected

12.2. Test hardware

12.2.1. ThinkPad T400

Most of this repo was originally tested on a ThinkPad T400.

Unfortunately it broke and I threw it away, and I didn’t write down the exact specs before doing so, notably the bootloader version.

13. About

13.1. System vs userland

This repository covers only things that can only be done from ring 0 (system) and not ring 3 (userland).

13.2. One minimal concept per OS

There are a few tutorials that explain how to make an operating system and give examples of increasing complexity with more and more functionality added.

This is not one of them.

The goal of this repository is to use the minimal setup possible to be able to observe a single low-level programming concept for each minimal operating system we create.

This is not meant to provide a template from which you can write a real OS, but instead to illustrate how those low-level concepts work in isolation, so that you can use that knowledge to implement operating systems or drivers.

Minimal examples are useful because they make it easier to see exactly what a given concept requires.

Another advantage is that it is easier to DRY up minimal examples (here done simply through #include and macros), which is much harder on progressive OS template tutorials, which tend to repeat big chunks of code between the examples.

13.3. To C or not to C

Using C or not is a hard choice.

It does make it much easier to express higher level ideas, and gives portability.

But in the end, it increases the complexity that one has to understand, so we’ve stayed away from it.

13.4. NASM

cd nasm/
./run bios_hello_world

While NASM is a bit more convenient than GAS to write a boot sector, I think it is just not worth it.

When writing an OS in C, we are going to use GCC, which already uses GAS. So it’s better to reduce the number of assemblers to one and stick to GAS only.

Right now, this directory is not very DRY since NASM is secondary to me, so it contains mostly some copy / paste examples.

On top of that, GAS also supports other architectures besides x86, so learning it is more useful in that sense.

13.5. Macros vs functions

Using macros for now in common.h instead of functions because it simplifies the linker script.

But the downsides are severe:

  • no symbols to help debugging. TODO: I think there are assembly constructs for that.

  • impossible to step over method calls: you have to step into everything. TODO: until?

  • larger output, supposing I can get linker garbage collection of unused functions working (see --gc-sections), which is uncertain for now.

    If I can get this working, I’ll definitely move to function calls.

    The problem is that if I don’t, every image will need a stage 2 loader. That is not too serious though, it could be added to the BEGIN.

    It seems that ld can only remove sections, not individual symbols: http://stackoverflow.com/questions/6687630/c-c-gcc-ld-remove-unused-symbols With GCC we can use -ffunction-sections -fdata-sections to quickly generate a ton of sections, but I don’t think GAS supports that…

13.5.1. Macro conventions

Every "function-like macro" in common.h must maintain the state of general purpose registers.

Flags are currently not maintained.

%sp cannot be used to pass most arguments.

We don’t care about setting %bp properly at the moment.

13.6. Linux is open source

Always try looking into the Linux kernel to find how those CPU capabilities are used in a "real" OS.

13.7. Pre-requisites

OS dev is one of the most insanely hard programming tasks a person can undertake, and will push your knowledge of several domains to the limit.

Knowing the following will help a lot:

While it is possible to learn those topics as you go along, and it is almost certain that you will end up learning more about them, we will not explain them here in detail.

15. Bibliography

15.1. Intel manual

We are interested mostly in the "Intel Manual Volume 3 System Programming Guide", where system programming basically means "OS stuff" or "bare metal" as opposed to userland present in the other manuals.

15.2. Small educational projects

Fun, educational and useless:

The following did not work on my machine out of the box:

15.5. Progressive tutorials

15.6. Actually useful

These are not meant as learning resources but rather as useful programs:

16. LICENSE

Copyright Ciro Santilli http://www.cirosantilli.com/

GPL v3 for executable computer program usage.

CC BY-SA v4 for human consumption usage in learning material, e.g. .md files, source code comments, using source code excerpts in tutorials. Recommended attribution:

  • Single file adaptations:

    Based on https://github.com/cirosantilli/x86-bare-metal-examples/blob/<commit-id>/path/to/file.md under CC BY-SA v4

  • Multi-file adaptations:

    Based on https://github.com/cirosantilli/x86-bare-metal-examples/tree/<commit-id> under CC BY-SA v4

If you want to use this work under a different license, contact the copyright owner, and he might make a good price.