SAMOS is a minimal bare-metal AArch64 (ARMv8-A) kernel built from scratch for the QEMU virt machine with a cortex-a57 CPU. It implements all the fundamental pieces of an operating system kernel: boot from EL3 down to EL1, virtual memory via stage-1 page tables, exception and interrupt handling, a generic timer driver, heap and page allocators, a preemptive round-robin scheduler, system calls, ELF loading, a virtio block driver, an initramfs filesystem, and a user-mode demo program.
The project was built incrementally: start with a boot.S that drops from EL3 to EL2 to EL1, then add UART output, then MMU, then exceptions, then a timer, then a heap allocator, then a scheduler, then a shell, then GIC, then system calls, then ELF loading, then virtio and filesystem support, and finally user-mode execution.
The cross-compiler toolchain is bundled in the downloads/ directory:
downloads/
  arm-toolchain/                     <-- extracted aarch64-none-elf-gcc 15.2.1
    bin/
      aarch64-none-elf-gcc.exe
      aarch64-none-elf-ld.exe
      aarch64-none-elf-objcopy.exe
      ...
You also need:
- QEMU: `qemu-system-aarch64` (install from https://www.qemu.org or use the bundled installer)
- make: `mingw32-make` on Windows, `make` on Linux/Mac
cd samos
make # uses ../downloads/arm-toolchain by default

The Makefile defaults TOOLCHAIN to ../downloads/arm-toolchain, so it picks up the bundled cross-compiler automatically. To override:

make TOOLCHAIN=/path/to/cross-compiler

| File | Description |
|---|---|
| `build/*.o` | Compiled object files (one per source file) |
| `samos.elf` | ELF executable with DWARF debug info, loaded by QEMU |
| `samos.bin` | Raw binary image (`objcopy -O binary`) |
| Target | Description |
|---|---|
| `make` or `make all` | Build samos.elf and samos.bin |
| `make run` | Launch in QEMU (`-machine virt -cpu cortex-a57 -nographic`) |
| `make debug` | Same but with `-s -S` (wait for GDB connection on port 1234) |
| `make clean` | Remove object files and output binaries |
build.bat # builds
run.bat # runs in QEMU
clean.bat # removes build artifacts
All scripts use relative paths (%~dp0) so the project is fully portable on a USB drive.
app/
├── downloads/
│ ├── arm-toolchain/ # Cross-compiler (aarch64-none-elf-gcc)
│ ├── qemu-w64-setup-20260501.exe # QEMU installer
│ └── arm-gnu-toolchain-*.zip # Original toolchain archive
├── samos/
│ ├── build/ # Object files (generated)
│ ├── src/ # Source code (36 files)
│ │ ├── kernel.c # Entry point, banner, init sequence
│ │ ├── boot.S # Boot asm, vector table, context switch
│ │ ├── uart.c / uart.h # PL011 UART driver
│ │ ├── printf.c / printf.h # Formatted output (stdarg-based)
│ │ ├── mmu.c / mmu.h # Stage-1 page tables, identity mapping
│ │ ├── exceptions.c / exceptions.h # EL1/EL0 exception handlers
│ │ ├── gic.c / gic.h # GICv3 driver (v2 MMIO fallback)
│ │ ├── timer.c / timer.h # ARM Generic Timer (CNTP)
│ │ ├── allocator.c / allocator.h # Bitmap-based 4KB page allocator
│ │ ├── heap.c / heap.h # Free-list heap (kmalloc/kfree)
│ │ ├── spinlock.h # Header-only spinlock (ldaxr/stlxr)
│ │ ├── scheduler.c / scheduler.h # Round-robin scheduler, 16 tasks
│ │ ├── shell.c / shell.h # Interactive shell (7 commands)
│ │ ├── keyboard.c / keyboard.h # Line-editing keyboard input
│ │ ├── syscall.c / syscall.h # 5 syscalls (write/read/sleep/yield/exit)
│ │ ├── elf.c / elf.h # ELF64 loader (static segments)
│ │ ├── virtio.c / virtio.h # Virtio block device driver
│ │ ├── fs.c / fs.h # Initramfs (ustar TAR reader)
│ │ ├── user_process.c / user_process.h # User task creation helper
│ │ └── user_demo.S # User-mode demo (prints [user EL0]!)
│ ├── linker.ld # Linker script (0x40080000)
│ ├── Makefile # Build system
│ ├── virt.dtb # Device tree blob (QEMU virt)
│ ├── samos.elf / samos.bin # Built binaries
│ └── README.md # This file
├── x.html # Web-based boot visualization
├── build.bat # Windows build script
├── run.bat # Windows run script
└── clean.bat # Windows clean script
An ARMv8-A core comes out of reset in the highest exception level it implements (EL3 here). SAMOS drops through EL2 to EL1 in three stages, all in boot.S:
_start (EL3)
│ Detect current exception level
│
├─ EL3 path (QEMU default):
│ ├─ Configure GIC distributor at 0x08000000 for Non-Secure access
│ │ - GICD_CTLR: enable GICv3 mode (ARE_S=1, ARE_NS=1, EnableGrp1=1)
│ │ - ICC_SRE_EL3: enable system register interface
│ ├─ SCR_EL3.NS=1 → Non-Secure state for EL2
│ ├─ HCR_EL2.RW=1 → AArch64 for EL1
│ ├─ SCTLR_EL2=0 → MMU off at EL2
│ ├─ SPSR_EL3=0x349 → EL2h, all interrupts masked
│ ├─ ELR_EL3=el2_entry
│ └─ ERET → EL2
│
├─ EL2 path:
│ ├─ ICC_SRE_EL2: enable system register access at EL1
│ ├─ SCTLR_EL1=0 → MMU off at EL1
│ ├─ SPSR_EL2=0x345 → EL1h, all interrupts masked
│ ├─ ELR_EL2=el1_entry
│ └─ ERET → EL1
│
└─ EL1 entry (el1_entry):
├─ CPACR_EL1 → FPEN=3 (enable FP/SIMD at EL0/EL1)
├─ SP = __stack_top (64KB stack from linker.ld)
├─ Clear BSS (__bss_start → __bss_end)
└─ BL kernel_main()
The kernel is loaded at physical address 0x40080000 (QEMU virt convention):
0x40080000
├─ .text
│ ├─ .text.boot (_start entry point)
│ ├─ .align 2048 (2KB gap for vector table alignment)
│ ├─ .text.vector (_vector_table - must be 2KB-aligned for VBAR_EL1)
│ └─ .text* (all other code)
├─ .rodata
├─ .data
├─ .bss (zero-initialized, __bss_start/__bss_end)
└─ stack space (64KB, __stack_top at the top)
0x00000000 - 0x07FFFFFF Unmapped
0x08000000 - 0x080FFFFF GIC (Distributor 0x08000000, Redistributor 0x080A0000, CPU IF 0x08010000)
0x09000000 - 0x09000FFF PL011 UART
0x0C000000 - 0x0C000FFF Virtio block MMIO
0x40000000 - 0x4007FFFF First 512KB RAM (unused before kernel)
0x40080000 Kernel loaded here
├─ 0x40080000 - 0x401FFFFF Kernel .text, .rodata, .data, .bss
├─ 0x40200000 - 0x403FFFFF Page allocator pool (32MB)
├─ 0x40400000 - 0x404FFFFF Heap (1MB)
└─ 0x40500000 - 0x5FFFFFFF Free memory from page allocator
0x60000000 End of 512MB RAM
The MMU uses AArch64 stage-1 translation tables with 4KB granules and 2MB block mappings (2-level page table: L1 + L2).
Configuration:
| Register | Value | Meaning |
|---|---|---|
| TCR_EL1 | T0SZ=25, TG0=4KB, SH=Inner, ORGN/IRGN=WBRAWA | 512GB address space |
| MAIR_EL1 | Attr0=0xFF (Normal WB), Attr1=0x04 (Device nGnRE), Attr2=0x00 (Device nGnRnE) | 3 memory types |
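The register values in the table can be derived mechanically. The following is a minimal sketch, not code from the SAMOS sources: the helper names `mair_pack` and `va_bits_from_t0sz` are hypothetical, and only the attribute bytes and T0SZ value come from the table.

```c
#include <stdint.h>

/* Attribute encodings quoted in the table above. */
#define MAIR_NORMAL_WB      0xFFu  /* Attr0: Normal, write-back */
#define MAIR_DEVICE_nGnRE   0x04u  /* Attr1 */
#define MAIR_DEVICE_nGnRnE  0x00u  /* Attr2 */

/* Hypothetical helper: MAIR_EL1 holds eight 8-bit slots; slot i sits at bit 8*i. */
static inline uint64_t mair_pack(uint8_t attr0, uint8_t attr1, uint8_t attr2)
{
    return (uint64_t)attr0 | ((uint64_t)attr1 << 8) | ((uint64_t)attr2 << 16);
}

/* Hypothetical helper: T0SZ is the number of unused upper address bits,
 * so the TTBR0 region spans 2^(64 - T0SZ) bytes. */
static inline unsigned va_bits_from_t0sz(unsigned t0sz)
{
    return 64u - t0sz;
}
```

With T0SZ=25 this gives a 39-bit address space, which is where the 512GB figure comes from.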
Mappings (identity-mapped, virtual == physical):
| Region | Size | Attributes |
|---|---|---|
| 0x40000000 - 0x5FFFFFFF | 512MB RAM | Normal, RW at EL1 |
| 0x08000000 - 0x08FFFFFF | 2MB (GIC) | Device nGnRnE |
| 0x09000000 - 0x09FFFFFF | 2MB (UART) | Device nGnRE |
Key design decisions:
- Identity mapping (virtual == physical) — keeps things simple, no need to relocate the kernel
- 2MB blocks only — uses block entries at L2, never allocates L3 tables for 4KB pages
- No VM for kernel heap — the kernel operates on physical addresses directly
- User processes get separate page tables: code at virtual `0x10000000`, stack at `0x7FE00000`
Page tables (global):
| Table | Level | Coverage |
|---|---|---|
| `l1_tbl` | L1 | 512 entries of 1GB each (512GB); entries 0 and 1 cover 0x00000000 - 0x7FFFFFFF |
| `l2_tbl_lo` | L2 | 0x00000000 - 0x3FFFFFFF (the 1GB behind L1 entry 0) |
| `l2_tbl_hi` | L2 | 0x40000000 - 0x7FFFFFFF (the 1GB behind L1 entry 1) |
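To make the two-level layout concrete, here is a hedged sketch of how a virtual address selects its L1 and L2 entries with a 4KB granule, and how a 2MB block descriptor could be assembled. The macro and function names are illustrative assumptions, not the identifiers used in mmu.c, and the shareability choice is a guess.

```c
#include <stdint.h>

#define L1_SHIFT 30       /* each L1 entry covers 1GB */
#define L2_SHIFT 21       /* each L2 entry covers 2MB */
#define IDX_MASK 0x1FFu   /* 512 entries per table */

#define DESC_BLOCK    0x1ull                  /* bits[1:0] = 0b01: block entry */
#define DESC_ATTR(i)  ((uint64_t)(i) << 2)    /* AttrIndx: selects a MAIR_EL1 slot */
#define DESC_SH_INNER (3ull << 8)             /* inner shareable (assumed choice) */
#define DESC_AF       (1ull << 10)            /* access flag, avoids a fault on first use */

static inline unsigned l1_index(uint64_t va) { return (unsigned)((va >> L1_SHIFT) & IDX_MASK); }
static inline unsigned l2_index(uint64_t va) { return (unsigned)((va >> L2_SHIFT) & IDX_MASK); }

/* Identity-mapped descriptor for the 2MB block containing pa.
 * AP bits are left at 0, which means read/write at EL1 only. */
static inline uint64_t block_desc(uint64_t pa, unsigned attr_idx)
{
    uint64_t base = pa & ~((1ull << L2_SHIFT) - 1);   /* align down to 2MB */
    return base | DESC_AF | DESC_SH_INNER | DESC_ATTR(attr_idx) | DESC_BLOCK;
}
```

For example, the kernel load address 0x40080000 falls in L1 entry 1, L2 entry 0, while the UART at 0x09000000 falls in L1 entry 0, L2 entry 72.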
A bitmap-based 4KB page allocator managing a 32MB pool starting at 0x40200000.
pool: 0x40200000 - 0x403FFFFF (32MB = 8192 pages)
bitmap: stored in the first pages of the pool (8192 bits = 1024 bytes, which fits in 1 page)

- `page_alloc()`: linear scan for the first free bit, marks it, returns the address
- `page_free(addr)`: clears the corresponding bit
- Initialization marks the bitmap pages as allocated so they aren't handed out
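The allocator described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the contents of allocator.c: the bitmap lives in a static array instead of inside the pool, `page_alloc_init` and its `reserved` parameter are hypothetical, and the pool constants are the ones quoted in the text.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define POOL_BASE  0x40200000ull  /* pool start address quoted above */
#define POOL_PAGES 8192u          /* page count quoted above */

/* One bit per page, 1 = allocated. Kept in a static array here (assumption). */
static uint8_t bitmap[POOL_PAGES / 8];

/* Clear the bitmap, then reserve the first `reserved` pages (the bitmap's own). */
static void page_alloc_init(unsigned reserved)
{
    for (size_t i = 0; i < sizeof bitmap; i++)
        bitmap[i] = 0;
    for (unsigned p = 0; p < reserved && p < POOL_PAGES; p++)
        bitmap[p / 8] |= (uint8_t)(1u << (p % 8));
}

/* Linear scan for the first clear bit; returns 0 when the pool is exhausted. */
static uint64_t page_alloc(void)
{
    for (unsigned p = 0; p < POOL_PAGES; p++) {
        if (!(bitmap[p / 8] & (1u << (p % 8)))) {
            bitmap[p / 8] |= (uint8_t)(1u << (p % 8));
            return POOL_BASE + (uint64_t)p * PAGE_SIZE;
        }
    }
    return 0;
}

/* Clear the bit for addr; out-of-range or unaligned addresses are ignored. */
static void page_free(uint64_t addr)
{
    if (addr < POOL_BASE || (addr - POOL_BASE) % PAGE_SIZE != 0)
        return;
    uint64_t p = (addr - POOL_BASE) / PAGE_SIZE;
    if (p < POOL_PAGES)
        bitmap[p / 8] &= (uint8_t)~(1u << (p % 8));
}
```

Because the scan always starts at bit 0, a freed page is the first candidate for the next allocation, which keeps low addresses densely used at the cost of O(n) allocation time.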
A free-list-based heap (kmalloc/kfree) sitting on top of page-allocated memory, with spinlock protection for thread safety.
Allocation strategy:
- Align request to 8 bytes, add header (`size_t` storing total block size)
- First-fit walk of the free list
- Split block if remainder ≥ minimum block size (`sizeof(free_block_t)`)
- Return pointer past the size field
Free strategy:
- Insert into free list sorted by address
- Coalesce with adjacent free blocks (both forward and backward)
Configuration: 1MB heap starting at the first page after the page allocator pool.
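The allocation and free strategies above can be sketched as follows. This is a simplified model, not heap.c itself: the heap is a small static buffer (`heap_mem`, `HEAP_SIZE` and `heap_init` are hypothetical), the layout of `free_block_t` is assumed, and the spinlock is omitted.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct free_block {
    size_t size;               /* total block size, header included */
    struct free_block *next;   /* next free block, sorted by address */
} free_block_t;

#define HEAP_SIZE 4096
static _Alignas(16) unsigned char heap_mem[HEAP_SIZE];
static free_block_t *free_list;

static void heap_init(void)
{
    free_list = (free_block_t *)heap_mem;
    free_list->size = HEAP_SIZE;
    free_list->next = NULL;
}

static void *kmalloc(size_t n)
{
    if (n > HEAP_SIZE) return NULL;                      /* also guards the rounding below */
    size_t need = ((n + 7) & ~(size_t)7) + sizeof(size_t);
    if (need < sizeof(free_block_t)) need = sizeof(free_block_t);

    for (free_block_t **pp = &free_list; *pp; pp = &(*pp)->next) {   /* first fit */
        free_block_t *b = *pp;
        if (b->size < need) continue;
        if (b->size - need >= sizeof(free_block_t)) {    /* split off the remainder */
            free_block_t *rest = (free_block_t *)((unsigned char *)b + need);
            rest->size = b->size - need;
            rest->next = b->next;
            *pp = rest;
            b->size = need;
        } else {
            *pp = b->next;                               /* hand out the whole block */
        }
        return (unsigned char *)b + sizeof(size_t);      /* pointer past the size field */
    }
    return NULL;
}

static void kfree(void *p)
{
    if (!p) return;
    free_block_t *b = (free_block_t *)((unsigned char *)p - sizeof(size_t));
    free_block_t **pp = &free_list, *prev = NULL;
    while (*pp && *pp < b) { prev = *pp; pp = &(*pp)->next; }   /* address-sorted insert */
    b->next = *pp;
    *pp = b;
    if (b->next && (unsigned char *)b + b->size == (unsigned char *)b->next) {
        b->size += b->next->size;                        /* merge with the block after */
        b->next = b->next->next;
    }
    if (prev && (unsigned char *)prev + prev->size == (unsigned char *)b) {
        prev->size += b->size;                           /* merge with the block before */
        prev->next = b->next;
    }
}
```

Keeping the free list sorted by address is what makes coalescing cheap: a freed block only needs to be compared with its immediate list neighbours.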
The vector table (_vector_table in boot.S) is 2KB-aligned and contains 16 entries per the ARMv8-A spec:
Offset | Exception Type | Handler
--------|------------------------|---------------------------
0x000 | EL1h Sync | exc_sync_el1h (exc_stub 0)
0x080 | EL1h IRQ | exc_irq_el1h (exc_stub 1)
0x100 | EL1h FIQ | exc_fiq_el1h (exc_stub 2)
0x180 | EL1h SError | exc_serror_el1h (exc_stub 3)
0x200 | EL0 Sync (AArch64) | exc_sync_el0 (full ctx_save)
0x280 | EL0 IRQ (AArch64) | exc_irq_el0 (full ctx_save)
0x300 | EL0 FIQ (AArch64) | exc_fiq_el0 (full ctx_save)
0x380 | EL0 SError (AArch64) | exc_serror_el0 (full ctx_save)
EL1h exceptions: Use exc_stub macro — saves x0-x7 (64 bytes), calls C handler, restores, eret.
EL0 exceptions: Use ctx_save/ctx_restore — saves all 35 registers (280 bytes), calls C handler. On return, checks reschedule_pending; if set, branches to ctx_resched for task switch.
Exception Class (ESR) decoding:
| EC | Meaning | Handler Action |
|---|---|---|
| 0x15 | SVC (syscall) | handle_svc_el1() or syscall_handler() |
| 0x20/0x21 | Instruction Abort | Print fault info, halt |
| 0x24/0x25 | Data Abort | Print fault info, halt |
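The exception class lives in bits [31:26] of ESR_EL1. A minimal decoding sketch follows; the enum and function names are hypothetical and only mirror the three rows of the table, they are not taken from exceptions.c.

```c
#include <stdint.h>

#define ESR_EC_SHIFT 26
#define ESR_EC_MASK  0x3Fu

typedef enum { EXC_SYSCALL, EXC_INSN_ABORT, EXC_DATA_ABORT, EXC_OTHER } exc_kind_t;

/* Extract the 6-bit exception class field. */
static inline unsigned esr_ec(uint64_t esr)
{
    return (unsigned)((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
}

/* Map the EC values from the table onto a coarse category. */
static exc_kind_t esr_classify(uint64_t esr)
{
    switch (esr_ec(esr)) {
    case 0x15:            return EXC_SYSCALL;      /* SVC from AArch64 */
    case 0x20: case 0x21: return EXC_INSN_ABORT;   /* lower EL / same EL */
    case 0x24: case 0x25: return EXC_DATA_ABORT;   /* lower EL / same EL */
    default:              return EXC_OTHER;
    }
}
```

A C handler would typically read ESR_EL1 with `mrs`, pass the value to a function like `esr_classify`, and branch on the result.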
Despite QEMU virt using GICv3, the driver explicitly falls back to GICv2 MMIO compatibility mode because system register access may be trapped.
Register layout:
| Region | Address | Purpose |
|---|---|---|
| GICD | 0x08000000 | Distributor (global settings, SPI enable) |
| GICR | 0x080A0000 | Redistributor (PPI/SGI enable per-core) |
| GICC | 0x08010000 | CPU interface (GICv2 MMIO fallback) |
Key interrupt IDs:
| ID | Name | Source |
|---|---|---|
| 30 | IRQ_TIMER_EL1 | PPI 14 (16 + 14) — ARM Generic Timer |
| 33 | IRQ_UART | SPI 1 (32 + 1) — PL011 UART |
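The ID arithmetic in the table follows the GIC numbering scheme (SGIs 0-15, PPIs 16-31, SPIs from 32). A small sketch, with hypothetical helper names, that also shows which distributor enable register and bit a given ID maps to:

```c
#include <stdint.h>

#define GIC_PPI_BASE       16u     /* PPIs occupy INTIDs 16..31 */
#define GIC_SPI_BASE       32u     /* SPIs start at INTID 32 */
#define GICD_ISENABLER_OFF 0x100u  /* offset of GICD_ISENABLER0 */

static inline unsigned ppi_to_intid(unsigned ppi) { return GIC_PPI_BASE + ppi; }
static inline unsigned spi_to_intid(unsigned spi) { return GIC_SPI_BASE + spi; }

/* Each ISENABLER register covers 32 interrupt IDs, one bit per ID. */
static inline uint32_t isenabler_offset(unsigned intid)
{
    return GICD_ISENABLER_OFF + 4u * (intid / 32u);
}

static inline uint32_t isenabler_bit(unsigned intid)
{
    return 1u << (intid % 32u);
}
```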
Initialization:
- Read `GICD_TYPER` to get IRQ line count
- Disable all SPI interrupts
- Set up CPU interface (GICv2 MMIO if sysregs not available)
- Enable distributor
Interrupt flow:
- `gic_acknowledge()`: reads `ICC_IAR1_EL1` (or GICC_IAR) for pending IRQ ID
- Handle the IRQ
- `gic_eoi(irq)`: writes `ICC_EOIR1_EL1` (or GICC_EOIR) to signal end-of-interrupt
Uses the ARM Generic Timer's physical counter and physical timer (CNTP):
| Register | Function |
|---|---|
| `CNTFRQ_EL0` | Timer frequency (read at init, typically 62.5MHz on QEMU) |
| `CNTPCT_EL0` | Physical count register (free-running counter) |
| `CNTP_TVAL_EL0` | Timer value (counts down, IRQ fires when 0 is reached) |
| `CNTP_CTL_EL0` | Control (bit 0: enable, bit 1: IMASK, bit 2: ISTATUS) |
Timer starts at 20ms intervals (50Hz) using timer_irq_start(20). Each IRQ rearms via timer_irq_ack().
sleep is implemented as a busy-wait loop (timer_sleep(ms) counts ticks).
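Converting between milliseconds and counter ticks only needs the frequency read from CNTFRQ_EL0. The helpers below are a hedged sketch with hypothetical names; on the target, the result of `ms_to_ticks` would be the value written to CNTP_TVAL_EL0.

```c
#include <stdint.h>

/* Ticks to program for an interval of `ms` milliseconds. */
static inline uint64_t ms_to_ticks(uint64_t freq_hz, uint64_t ms)
{
    return freq_hz * ms / 1000u;
}

/* Elapsed milliseconds for a CNTPCT_EL0 delta of `ticks`. */
static inline uint64_t ticks_to_ms(uint64_t freq_hz, uint64_t ticks)
{
    return ticks * 1000u / freq_hz;
}
```

At 62.5MHz, a 20ms interval corresponds to 1,250,000 ticks.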
A preemptive round-robin scheduler with a static pool of 16 tasks.
Task states:
| State | Meaning |
|---|---|
| RUNNING | Currently on CPU |
| READY | In ready queue, waiting for CPU |
| WAITING | Blocked (not used in current implementation) |
| ZOMBIE | Exited, slot available for new task |
Task control block:
typedef struct task {
    unsigned long sp;           // saved stack pointer (context frame)
    unsigned long *stack;       // kernel stack base
    task_state_t state;         // RUNNING/READY/WAITING/ZOMBIE
    int pid;
    char name[16];
    struct task *next;          // ready queue linked list
    int is_user;                // 1 = user-mode (EL0) task
    uint64_t ttbr0;             // page table for user task
    unsigned long user_entry;   // virtual entry point
    unsigned long user_sp;      // user stack pointer
    unsigned long *user_stack;  // user stack base
} task_t;

Context frame layout (stack grows down, SP at the end):
SP+0: x0 (offset 0)
SP+8: x1
...
SP+232: x29 (offset 232)
SP+240: x30 (LR) (offset 240)
SP+248: SP_EL0 (offset 248)
SP+256: ELR_EL1 (offset 256)
SP+264: SPSR_EL1 (offset 264)
SP+272: TTBR0_EL1 (offset 272)
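One way to keep the assembly offsets and C code in agreement is to describe the frame as a struct and check the offsets at compile time. This is a sketch; `ctx_frame_t` is a hypothetical name and SAMOS may well use raw offsets only.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct ctx_frame {
    uint64_t x[31];       /* x0..x30, x30 is the link register */
    uint64_t sp_el0;      /* user stack pointer */
    uint64_t elr_el1;     /* return address */
    uint64_t spsr_el1;    /* saved processor state */
    uint64_t ttbr0_el1;   /* page table base for the task */
} ctx_frame_t;

/* Compile-time checks against the layout listed above. */
_Static_assert(offsetof(ctx_frame_t, x) + 30 * sizeof(uint64_t) == 240, "x30 at 240");
_Static_assert(offsetof(ctx_frame_t, sp_el0) == 248, "SP_EL0 at 248");
_Static_assert(offsetof(ctx_frame_t, elr_el1) == 256, "ELR_EL1 at 256");
_Static_assert(offsetof(ctx_frame_t, spsr_el1) == 264, "SPSR_EL1 at 264");
_Static_assert(offsetof(ctx_frame_t, ttbr0_el1) == 272, "TTBR0_EL1 at 272");
_Static_assert(sizeof(ctx_frame_t) == 280, "35 registers, 280 bytes");
```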
How preemption works:
Timer IRQ (20ms)
│
▼
exc_irq_el0_handler()
│ gic_acknowledge() → ID=30 (timer)
│ timer_irq_ack() → re-arm
│ sched_try_yield() → dequeue head, enqueue current at tail
│ → sets reschedule_pending = 1
│ gic_eoi(irq)
│
▼
Return from exception
│ check reschedule_pending
│ if set:
│ ctx_resched:
│ *current_task_sp_ptr = SP (save current)
│ SP = next_task_sp (load next)
│ reschedule_pending = 0
│ ctx_restore + eret
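The queue manipulation in the `sched_try_yield()` step can be modelled on its own. The sketch below uses a trimmed-down task struct and hypothetical function names (`ready_enqueue`, `ready_dequeue`, `pick_next`); it shows the round-robin rotation only, not the stack-pointer switch done in assembly.

```c
#include <stddef.h>

/* Trimmed-down stand-in for the real task control block. */
typedef struct task {
    int pid;
    struct task *next;   /* ready queue linked list */
} task_t;

static task_t *ready_head, *ready_tail;

/* Append a task at the tail of the ready queue. */
static void ready_enqueue(task_t *t)
{
    t->next = NULL;
    if (ready_tail) ready_tail->next = t;
    else            ready_head = t;
    ready_tail = t;
}

/* Remove and return the head of the ready queue, or NULL if it is empty. */
static task_t *ready_dequeue(void)
{
    task_t *t = ready_head;
    if (!t) return NULL;
    ready_head = t->next;
    if (!ready_head) ready_tail = NULL;
    t->next = NULL;
    return t;
}

/* Round-robin step: the head runs next, the current task goes to the tail.
 * With an empty queue the current task simply keeps running. */
static task_t *pick_next(task_t *current)
{
    task_t *next = ready_dequeue();
    if (!next) return current;
    ready_enqueue(current);
    return next;
}
```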
Task creation:
- Kernel tasks (`sched_create_task`): 4KB stack, context frame with `SPSR=0x3C5` (EL1h, IRQs masked), ELR → `task_trampoline`, x19 = entry function
- User tasks (`sched_create_user_task`): kernel stack + user stack (4KB each), separate page table, code at `0x10000000`, stack at `0x7FE00000`, `SPSR=0` (EL0t, all exceptions unmasked)
Five syscalls, invoked via SVC from EL0:
| # | Name | Args | Implementation |
|---|---|---|---|
| 0 | `SYS_WRITE` | `buf, len` | Writes len bytes from buf to UART |
| 1 | `SYS_SLEEP` | `ms` | Busy-wait for ms milliseconds |
| 2 | `SYS_YIELD` | none | Calls sched_yield() |
| 3 | `SYS_EXIT` | none | Frees stacks, marks ZOMBIE, switches away |
| 4 | `SYS_READ` | `buf, max` | Reads keyboard input with echo/backspace |
The syscall number is extracted from ESR_EL1 bits 0-15 (the immediate field of the SVC instruction). Arguments come from x0-x2.
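A table-driven dispatcher is one common way to route the extracted number to a handler. The sketch below is illustrative only: the handlers are stubs, `syscall_dispatch` and `syscall_table` are hypothetical names, and the real `syscall_handler()` may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

enum { SYS_WRITE = 0, SYS_SLEEP = 1, SYS_YIELD = 2, SYS_EXIT = 3, SYS_READ = 4, SYS_COUNT };

typedef long (*syscall_fn)(uint64_t a0, uint64_t a1, uint64_t a2);

/* Stub handlers standing in for the real implementations. */
static long stub_write(uint64_t buf, uint64_t len, uint64_t unused)
{
    (void)buf; (void)unused;
    return (long)len;                 /* pretend every byte was written */
}

static long stub_ok(uint64_t a0, uint64_t a1, uint64_t a2)
{
    (void)a0; (void)a1; (void)a2;
    return 0;
}

static const syscall_fn syscall_table[SYS_COUNT] = {
    [SYS_WRITE] = stub_write,
    [SYS_SLEEP] = stub_ok,
    [SYS_YIELD] = stub_ok,
    [SYS_EXIT]  = stub_ok,
    [SYS_READ]  = stub_ok,
};

/* esr: saved ESR_EL1 value; regs: saved x0..x2 of the calling task. */
static long syscall_dispatch(uint64_t esr, const uint64_t regs[3])
{
    unsigned nr = (unsigned)(esr & 0xFFFFu);   /* SVC immediate, ISS bits [15:0] */
    if (nr >= SYS_COUNT || syscall_table[nr] == NULL)
        return -1;                              /* unknown syscall */
    return syscall_table[nr](regs[0], regs[1], regs[2]);
}
```

The return value would normally be written back into the saved x0 slot of the context frame so the caller sees it after `eret`.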
The interactive shell runs as the kernel's main task after boot. It reads commands via keyboard_gets() with full line editing (backspace, echo, CR/LF handling).
Commands:
| Command | Function | What it does |
|---|---|---|
| `help` | `cmd_help()` | Lists all commands |
| `ps` | `cmd_ps()` | Prints task table via sched_list_tasks() |
| `kmalloc` | `cmd_kmalloc()` | Tests heap: allocates 8 blocks (16-128 bytes), frees every other, reallocates, reports total heap usage |
| `timer` | `cmd_timer()` | Tests timer: sleeps 1 second, reports elapsed ticks vs expected |
| `info` | `cmd_info()` | Shows timer frequency, page size, heap bounds, MMU status |
| `uptime` | `cmd_uptime()` | Reads tick count, converts to minutes:seconds |
| `task` | `cmd_task()` | Creates two kernel test tasks (testA prints every 2s, testB every 3s) |
PL011 UART at 0x09000000 (QEMU virt standard address). Polled mode — no interrupts.
- `uart_init()`: sets baud (IBRD=13, FBRD=1), 8-bit FIFO mode, enables TX/RX
- `uart_putc(c)`: polls `FR_TXFF` (bit 5), writes to `DR`
- `uart_puts(s)`: writes string, converts `\n` to `\r\n`
- `uart_getc()`: polls `FR_RXFE` (bit 4), reads from `DR`
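The divisor pair IBRD=13, FBRD=1 is consistent with a 24MHz UART clock and 115200 baud. Both figures are assumptions here, since the text states neither. A sketch of the PL011 divisor calculation, with a hypothetical helper name:

```c
#include <stdint.h>

typedef struct {
    uint32_t ibrd;   /* integer part of the baud divisor */
    uint32_t fbrd;   /* fractional part, 6 bits */
} pl011_div_t;

/* BAUDDIV = clk / (16 * baud), stored as fixed point with 6 fractional bits.
 * Scaling by 64 gives 4 * clk / baud; adding baud/2 rounds to nearest. */
static pl011_div_t pl011_divisors(uint32_t uartclk_hz, uint32_t baud)
{
    uint32_t div64 = (uint32_t)((4ull * uartclk_hz + baud / 2u) / baud);
    pl011_div_t d = { div64 >> 6, div64 & 0x3Fu };
    return d;
}
```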
Sits on top of UART RX:
- `keyboard_getc()`: blocking read from `uart_getc()`
- `keyboard_getc_nonblock()`: checks `UART_FR` bit 4; returns -1 if empty
- `keyboard_gets(buf, max)`: line-editing input: echoes chars, handles backspace (DEL/`\b`), stops on CR/LF
Used by the shell and by the SYS_READ syscall.
A minimal virtio block device driver for virtio-mmio at 0x0C000000.
Status sequence on init:
- Check magic (`0x74726976`)
- Verify device ID 2 (block device)
- Reset → ACK → DRIVER → DRIVER_OK
Read operation:
- Allocate request header (16 bytes), status byte, and data page
- Program queue num = 1
- Poll `QUEUE_READY` for completion
- If status == 0, copy 512 bytes to output buffer
Limitation: processes one request at a time (queue_num=1), no descriptor ring. Works for QEMU's legacy virtio-mmio.
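The probe checks and the 16-byte request header can be written down compactly. The sketch below follows the virtio specification for the constants and header layout; the function names are hypothetical, and the status register is passed in as a pointer so the sequence can be exercised without real MMIO.

```c
#include <stdint.h>

#define VIRTIO_MMIO_MAGIC        0x74726976u  /* "virt" read as a little-endian word */
#define VIRTIO_DEV_BLOCK         2u
#define VIRTIO_STATUS_ACK        1u
#define VIRTIO_STATUS_DRIVER     2u
#define VIRTIO_STATUS_DRIVER_OK  4u
#define VIRTIO_BLK_T_IN          0u           /* read request */

/* Block request header as laid out by the virtio specification. */
struct virtio_blk_req_hdr {
    uint32_t type;       /* VIRTIO_BLK_T_IN for reads */
    uint32_t reserved;
    uint64_t sector;     /* 512-byte sector number */
};

/* Returns 1 when the magic and device ID identify a block device. */
static int virtio_probe_ok(uint32_t magic, uint32_t device_id)
{
    return magic == VIRTIO_MMIO_MAGIC && device_id == VIRTIO_DEV_BLOCK;
}

/* Reset, then raise the status bits in the order given above. */
static void virtio_status_sequence(volatile uint32_t *status_reg)
{
    *status_reg = 0;                          /* reset */
    *status_reg |= VIRTIO_STATUS_ACK;
    *status_reg |= VIRTIO_STATUS_DRIVER;
    *status_reg |= VIRTIO_STATUS_DRIVER_OK;
}
```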
Read-only filesystem using a ustar TAR archive embedded in the kernel.
Header parsing:
- Validates magic `"ustar"` at offset 257
- Parses file size via octal conversion
- Each file padded to 512-byte boundary
Functions:
- `fs_init(addr)`: validates archive, calculates size
- `fs_list(files, max)`: enumerates files
- `fs_read(name, buf, max)`: finds and reads a file
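The header-parsing steps can be illustrated with three small helpers. This is a sketch with hypothetical names rather than the code in fs.c; the size-field position (offset 124, 12 bytes) comes from the ustar format, and the octal parser is simplified to stop at the first non-octal character.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAR_BLOCK     512u
#define TAR_SIZE_OFF  124u   /* size field: 12 bytes of octal text */
#define TAR_SIZE_LEN  12u
#define TAR_MAGIC_OFF 257u

/* Returns nonzero when the header carries the "ustar" magic. */
static int tar_is_ustar(const unsigned char *hdr)
{
    return memcmp(hdr + TAR_MAGIC_OFF, "ustar", 5) == 0;
}

/* Convert an octal text field to an integer; stops at NUL, space, or any non-octal byte. */
static uint64_t tar_parse_octal(const unsigned char *field, size_t len)
{
    uint64_t v = 0;
    for (size_t i = 0; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        v = (v << 3) | (uint64_t)(field[i] - '0');
    return v;
}

/* Distance from one header to the next: the header block plus the
 * file data rounded up to a whole number of 512-byte blocks. */
static uint64_t tar_next_header_off(uint64_t file_size)
{
    return TAR_BLOCK + ((file_size + TAR_BLOCK - 1) / TAR_BLOCK) * TAR_BLOCK;
}
```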
Note: Not currently wired into kernel_main() — fs_init() is never called.
Loads static ELF64 executables for AArch64. No relocation support.
Process:
- Validate magic (`\x7fELF`) and machine type (`0xB7` = EM_AARCH64)
- For each `PT_LOAD` segment:
  - Allocate 4KB-aligned pages via `page_alloc()`
  - Zero `p_memsz` bytes
  - Copy `p_filesz` bytes from file offset
- Return entry point and load address
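The per-segment work (zero `p_memsz` bytes, then copy `p_filesz` bytes) can be sketched independently of the ELF header parsing. `load_segment` and `pages_for` are hypothetical names, and the destination is an ordinary buffer here rather than pages obtained from the page allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Number of 4KB pages needed to hold a segment of memsz bytes. */
static size_t pages_for(uint64_t memsz)
{
    return (size_t)((memsz + PAGE_SIZE - 1) / PAGE_SIZE);
}

/* Load one PT_LOAD segment into dst, which must hold at least p_memsz bytes.
 * Returns 0 on success, -1 if the program header is inconsistent with the file. */
static int load_segment(unsigned char *dst,
                        const unsigned char *file, size_t file_len,
                        uint64_t p_offset, uint64_t p_filesz, uint64_t p_memsz)
{
    if (p_filesz > p_memsz)
        return -1;                                    /* file part larger than memory image */
    if (p_offset > file_len || p_filesz > file_len - p_offset)
        return -1;                                    /* segment runs past end of file */
    memset(dst, 0, (size_t)p_memsz);                  /* zero fill covers .bss */
    memcpy(dst, file + p_offset, (size_t)p_filesz);   /* then copy the file-backed bytes */
    return 0;
}
```

Zeroing the full `p_memsz` before copying is what gives `.bss` its zero-initialized contents, since those bytes have no backing data in the file.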
A tiny assembly program linked into the kernel that runs at EL0:
user_demo_start:
    adr x0, msg          // buf
    mov x1, #12          // len (11 characters plus newline)
    svc #0               // SYS_WRITE
    ... busy-wait ...
    b loop               // loop forever
msg: .ascii "[user EL0]!\n"

User mode is opt-in: the usermode_init() call in kernel.c is commented out. To enable:
- Uncomment `usermode_init()` in `kernel_main()`
- The demo task will print "[user EL0]!" periodically
Minimal printf using UART. Supports a subset of the standard format specifiers, with width/padding:
- `%d`, `%u`, `%x`, `%ld`, `%lu`, `%lx`, `%p` (with 0x prefix), `%s`, `%c`, `%%`
- Width/padding: `%08x`, `%5d`, etc.
- Uses `<stdarg.h>` for variadic arguments
Header-only, uses ARMv8 exclusive access instructions:
| Operation | Instruction | Semantics |
|---|---|---|
| `spin_lock(s)` | `ldaxr` / `stxr` loop | Load-acquire exclusive, store-release exclusive |
| `spin_unlock(s)` | `stlr` | Store-release |
| `spin_trylock(s)` | `ldaxr` / `stxr` | Single attempt, returns 0 on success |
Uses acquire/release semantics for correct memory ordering.
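The same acquire/release contract can be expressed portably with C11 atomics. Note that this sketch swaps in `<stdatomic.h>` for the exclusive-access instructions used by spinlock.h, so it illustrates the semantics rather than the actual implementation; the struct layout and the `SPINLOCK_INIT` macro are assumptions.

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Single attempt. Returns 0 on success, 1 if the lock was already held.
 * test_and_set returns the previous value, so "false" means we took it. */
static inline int spin_trylock(spinlock_t *s)
{
    return atomic_flag_test_and_set_explicit(&s->locked, memory_order_acquire) ? 1 : 0;
}

/* Spin until the lock is acquired. */
static inline void spin_lock(spinlock_t *s)
{
    while (spin_trylock(s)) {
        /* busy-wait */
    }
}

/* Release: earlier writes become visible before the lock appears free. */
static inline void spin_unlock(spinlock_t *s)
{
    atomic_flag_clear_explicit(&s->locked, memory_order_release);
}
```

The acquire on lock and release on unlock ensure that memory accesses inside the critical section cannot be reordered outside it.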
- Identity mapping — Virtual == physical. Sim