HackHarbor-max/SamOs

SAMOS — AArch64 Bare Metal Kernel

Overview

SAMOS is a minimal bare-metal AArch64 (ARMv8-A) kernel built from scratch for the QEMU virt machine with a cortex-a57 CPU. It implements all the fundamental pieces of an operating system kernel: boot from EL3 down to EL1, virtual memory via stage-1 page tables, exception and interrupt handling, a generic timer driver, heap and page allocators, a preemptive round-robin scheduler, system calls, ELF loading, a virtio block driver, an initramfs filesystem, and a user-mode demo program.

The project was built incrementally: start with a boot.S that drops from EL3 to EL2 to EL1, then add UART output, then MMU, then exceptions, then a timer, then a heap allocator, then a scheduler, then a shell, then GIC, then system calls, then ELF loading, then virtio and filesystem support, and finally user-mode execution.


How to Build

Prerequisites

The cross-compiler toolchain is bundled in the downloads/ directory:

downloads/
  arm-toolchain/              <-- extracted aarch64-none-elf-gcc 15.2.1
    bin/
      aarch64-none-elf-gcc.exe
      aarch64-none-elf-ld.exe
      aarch64-none-elf-objcopy.exe
      ...

You also need:

  • QEMU: qemu-system-aarch64 (install from https://www.qemu.org or use the bundled installer)
  • make: mingw32-make on Windows, make on Linux/Mac

Build Steps

cd samos
make                         # uses ../downloads/arm-toolchain by default

The Makefile defaults TOOLCHAIN to ../downloads/arm-toolchain, so it picks up the bundled cross-compiler automatically. To override:

make TOOLCHAIN=/path/to/cross-compiler

Build Output

File       | Description
-----------|------------------------------------------------------
build/*.o  | Compiled object files (one per source file)
samos.elf  | ELF executable with DWARF debug info, loaded by QEMU
samos.bin  | Raw binary image (objcopy -O binary)

Targets

Target           | Description
-----------------|------------------------------------------------------------
make or make all | Build samos.elf and samos.bin
make run         | Launch in QEMU (-machine virt -cpu cortex-a57 -nographic)
make debug       | Same as make run, plus -s -S (QEMU waits for a GDB connection on port 1234)
make clean       | Remove object files and output binaries

Windows Quick Start (no make)

build.bat    # builds
run.bat      # runs in QEMU
clean.bat    # removes build artifacts

All scripts use relative paths (%~dp0) so the project is fully portable on a USB drive.


Project Structure

app/
├── downloads/
│   ├── arm-toolchain/                  # Cross-compiler (aarch64-none-elf-gcc)
│   ├── qemu-w64-setup-20260501.exe     # QEMU installer
│   └── arm-gnu-toolchain-*.zip         # Original toolchain archive
├── samos/
│   ├── build/                          # Object files (generated)
│   ├── src/                            # Source code (36 files)
│   │   ├── kernel.c                    # Entry point, banner, init sequence
│   │   ├── boot.S                      # Boot asm, vector table, context switch
│   │   ├── uart.c / uart.h             # PL011 UART driver
│   │   ├── printf.c / printf.h         # Formatted output (stdarg-based)
│   │   ├── mmu.c / mmu.h               # Stage-1 page tables, identity mapping
│   │   ├── exceptions.c / exceptions.h # EL1/EL0 exception handlers
│   │   ├── gic.c / gic.h               # GICv3 driver (v2 MMIO fallback)
│   │   ├── timer.c / timer.h           # ARM Generic Timer (CNTP)
│   │   ├── allocator.c / allocator.h   # Bitmap-based 4KB page allocator
│   │   ├── heap.c / heap.h             # Free-list heap (kmalloc/kfree)
│   │   ├── spinlock.h                  # Header-only spinlock (ldaxr/stlxr)
│   │   ├── scheduler.c / scheduler.h   # Round-robin scheduler, 16 tasks
│   │   ├── shell.c / shell.h           # Interactive shell (7 commands)
│   │   ├── keyboard.c / keyboard.h     # Line-editing keyboard input
│   │   ├── syscall.c / syscall.h       # 5 syscalls (write/read/sleep/yield/exit)
│   │   ├── elf.c / elf.h               # ELF64 loader (static segments)
│   │   ├── virtio.c / virtio.h         # Virtio block device driver
│   │   ├── fs.c / fs.h                 # Initramfs (ustar TAR reader)
│   │   ├── user_process.c / user_process.h  # User task creation helper
│   │   └── user_demo.S                # User-mode demo (prints [user EL0]!)
│   ├── linker.ld                       # Linker script (0x40080000)
│   ├── Makefile                        # Build system
│   ├── virt.dtb                        # Device tree blob (QEMU virt)
│   ├── samos.elf / samos.bin           # Built binaries
│   └── README.md                       # This file
├── x.html                              # Web-based boot visualization
├── build.bat                           # Windows build script
├── run.bat                             # Windows run script
└── clean.bat                           # Windows clean script

Architecture Deep Dive

1. Boot Flow (EL3 → EL2 → EL1)

An ARMv8-A core leaves reset in the highest exception level it implements (EL3 when present). SAMOS drops through EL2 to EL1 in three stages, all in boot.S:

_start (EL3)
  │  Detect current exception level
  │
  ├─ EL3 path (QEMU default):
  │    ├─ Configure GIC distributor at 0x08000000 for Non-Secure access
  │    │  - GICD_CTLR: enable GICv3 mode (ARE_S=1, ARE_NS=1, EnableGrp1=1)
  │    │  - ICC_SRE_EL3: enable system register interface
  │    ├─ SCR_EL3.NS=1  → Non-Secure state for EL2
  │    ├─ HCR_EL2.RW=1  → AArch64 for EL1
  │    ├─ SCTLR_EL2=0   → MMU off at EL2
  │    ├─ SPSR_EL3=0x349 → EL2h; D, A, F masked (IRQ mask bit 7 is clear in 0x349)
  │    ├─ ELR_EL3=el2_entry
  │    └─ ERET → EL2
  │
  ├─ EL2 path:
  │    ├─ ICC_SRE_EL2: enable system register access at EL1
  │    ├─ SCTLR_EL1=0   → MMU off at EL1
  │    ├─ SPSR_EL2=0x345 → EL1h; D, A, F masked (IRQ mask bit 7 is clear in 0x345)
  │    ├─ ELR_EL2=el1_entry
  │    └─ ERET → EL1
  │
  └─ EL1 entry (el1_entry):
       ├─ CPACR_EL1 → FPEN=3 (enable FP/SIMD at EL0/EL1)
       ├─ SP = __stack_top (64KB stack from linker.ld)
       ├─ Clear BSS (__bss_start → __bss_end)
       └─ BL kernel_main()
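
The SPSR values in the flow above pack the target mode and the DAIF mask bits into one word. The following is a minimal sketch of how such a value decodes, assuming the architectural SPSR_ELx layout (M[3:0] in bits 3:0; F, I, A, D in bits 6 to 9). The helper names are hypothetical and are not taken from boot.S.

```c
#include <stdint.h>

/* AArch64 SPSR_ELx fields (architectural layout). */
#define SPSR_F (1u << 6)   /* FIQ masked */
#define SPSR_I (1u << 7)   /* IRQ masked */
#define SPSR_A (1u << 8)   /* SError masked */
#define SPSR_D (1u << 9)   /* debug exceptions masked */

/* Hypothetical helper: target exception level, taken from M[3:2]. */
static unsigned spsr_target_el(uint64_t spsr) {
    return (unsigned)((spsr >> 2) & 0x3u);
}

/* Hypothetical helper: 1 = dedicated SP_ELx ("h"), 0 = SP_EL0 ("t"). */
static unsigned spsr_uses_sp_elx(uint64_t spsr) {
    return (unsigned)(spsr & 0x1u);
}

/* Hypothetical helper: nonzero when D, A, I and F are all set. */
static int spsr_all_masked(uint64_t spsr) {
    const uint64_t daif = SPSR_D | SPSR_A | SPSR_I | SPSR_F;
    return (spsr & daif) == daif;
}
```

Under these definitions 0x3C5 decodes as EL1h with all four mask bits set, and 0 decodes as EL0t with nothing masked.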

2. Linker Script — Memory Layout (linker.ld)

The kernel is loaded at physical address 0x40080000 (QEMU virt convention):

0x40080000
├─ .text
│  ├─ .text.boot      (_start entry point)
│  ├─ .align 2048     (2KB gap for vector table alignment)
│  ├─ .text.vector    (_vector_table - must be 2KB-aligned for VBAR_EL1)
│  └─ .text*          (all other code)
├─ .rodata
├─ .data
├─ .bss               (zero-initialized, __bss_start/__bss_end)
└─ stack space        (64KB, __stack_top at the top)

3. Overall Memory Map

0x00000000 - 0x07FFFFFF   Unmapped
0x08000000 - 0x080FFFFF   GIC (Distributor 0x08000000, Redistributor 0x080A0000, CPU IF 0x08010000)
0x09000000 - 0x09000FFF   PL011 UART
0x0C000000 - 0x0C000FFF   Virtio block MMIO
0x40000000 - 0x4007FFFF   First 512KB RAM (unused before kernel)
0x40080000                 Kernel loaded here
   ├─ 0x40080000 - 0x401FFFFF   Kernel .text, .rodata, .data, .bss
   ├─ 0x40200000 - 0x403FFFFF   Page allocator pool (32MB)
   ├─ 0x40400000 - 0x404FFFFF   Heap (1MB)
   └─ 0x40500000 - 0x5FFFFFFF   Free memory from page allocator
0x60000000                 End of 512MB RAM

4. MMU — Stage-1 Translation (mmu.c)

The MMU uses AArch64 stage-1 translation tables with 4KB granules and 2MB block mappings (2-level page table: L1 + L2).

Configuration:

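Register | Value                                                                          | Meaning
---------|--------------------------------------------------------------------------------|---------------------
TCR_EL1  | T0SZ=25, TG0=4KB, SH=Inner, ORGN/IRGN=WBRAWA                                   | 512GB address space
MAIR_EL1 | Attr0=0xFF (Normal WB), Attr1=0x04 (Device nGnRE), Attr2=0x00 (Device nGnRnE)  | 3 memory types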

Mappings (identity-mapped, virtual == physical):

Region                  | Size        | Attributes
------------------------|-------------|-------------------
0x40000000 - 0x5FFFFFFF | 512MB (RAM) | Normal, RW at EL1
0x08000000 - 0x081FFFFF | 2MB (GIC)   | Device nGnRnE
0x09000000 - 0x091FFFFF | 2MB (UART)  | Device nGnRE

Key design decisions:

  • Identity mapping (virtual == physical) — keeps things simple, no need to relocate the kernel
  • 2MB blocks only — uses block entries at L2, never allocates L3 tables for 4KB pages
  • No VM for kernel heap — the kernel operates on physical addresses directly
  • User processes get separate page tables: code at virtual 0x10000000, stack at 0x7FE00000

Page tables (global):

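Table     | Level | Coverage
----------|-------|----------------------------------------------------------
l1_tbl    | L1    | 0x0000000000 - 0x7FFFFFFFFF (512GB, 512 entries of 1GB)
l2_tbl_lo | L2    | 0x00000000 - 0x3FFFFFFF (first 1GB, L1 entry 0)
l2_tbl_hi | L2    | 0x40000000 - 0x7FFFFFFF (second 1GB, L1 entry 1)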
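
To make the table lookup concrete, here is a minimal sketch of how a virtual address splits into L1 and L2 indices with a 4KB granule and T0SZ=25, and how a 2MB block descriptor could be assembled. The macro and function names are hypothetical; only the bit positions come from the architecture, and the shareability choice shown suits Normal memory.

```c
#include <stdint.h>

/* 4KB granule, 39-bit VA: L1 index = VA[38:30], L2 index = VA[29:21]. */
static unsigned l1_index(uint64_t va) { return (unsigned)((va >> 30) & 0x1FF); }
static unsigned l2_index(uint64_t va) { return (unsigned)((va >> 21) & 0x1FF); }

/* Stage-1 block descriptor bits. */
#define DESC_BLOCK    0x1ull                  /* bits[1:0] = 0b01: block entry */
#define DESC_ATTR(i)  ((uint64_t)(i) << 2)    /* AttrIndx: index into MAIR_EL1 */
#define DESC_SH_INNER (0x3ull << 8)           /* inner shareable */
#define DESC_AF       (1ull << 10)            /* access flag, avoids a first-touch fault */

/* Hypothetical helper: identity-mapped 2MB block descriptor for pa.
   AP[2:1] is left at 0, which means read/write at EL1 only. */
static uint64_t make_block(uint64_t pa, unsigned attr_index) {
    uint64_t base = pa & ~((1ull << 21) - 1);  /* align down to 2MB */
    return base | DESC_AF | DESC_SH_INNER | DESC_ATTR(attr_index) | DESC_BLOCK;
}
```

For example, the kernel base 0x40080000 lands in L1 entry 1, L2 entry 0, while the UART at 0x09000000 lands in L1 entry 0, L2 entry 72.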

5. Page Allocator (allocator.c)

A bitmap-based 4KB page allocator managing a 32MB pool starting at 0x40200000.

pool: 0x40200000 - 0x403FFFFF (stated as 32MB = 8192 pages; the address range as written spans 2MB = 512 pages)
bitmap: stored at the start of the pool (8192 bits = 1024 bytes, which fits in 1 page)
  • page_alloc() — linear scan for first free bit, marks it, returns address
  • page_free(addr) — clears the corresponding bit
  • Initialization marks the bitmap pages as allocated so they aren't handed out
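
The scan-and-mark behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code in allocator.c: the pool is shrunk to 64 pages, the bitmap lives in a separate static array instead of the first pool pages, and pool_init is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define POOL_PAGES 64u   /* small pool for illustration only */

static uint8_t bitmap[POOL_PAGES / 8];   /* one bit per page, 1 = allocated */
static uintptr_t pool_base;              /* address of page 0 */

static void pool_init(uintptr_t base) {
    pool_base = base;
    for (size_t i = 0; i < sizeof bitmap; i++) bitmap[i] = 0;
}

/* Linear scan for the first clear bit; returns 0 when the pool is full. */
static uintptr_t page_alloc(void) {
    for (unsigned i = 0; i < POOL_PAGES; i++) {
        if (!(bitmap[i / 8] & (1u << (i % 8)))) {
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            return pool_base + (uintptr_t)i * PAGE_SIZE;
        }
    }
    return 0;
}

/* Clear the bit for the page containing addr; out-of-range addresses are ignored. */
static void page_free(uintptr_t addr) {
    if (addr < pool_base) return;
    uintptr_t i = (addr - pool_base) / PAGE_SIZE;
    if (i < POOL_PAGES) bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

Because the scan always starts at bit 0, a freed page is the first candidate for the next allocation.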

6. Heap Allocator (heap.c)

A free-list-based heap (kmalloc/kfree) sitting on top of page-allocated memory, with spinlock protection for thread safety.

Allocation strategy:

  1. Align request to 8 bytes, add header (size_t storing total block size)
  2. First-fit walk of the free list
  3. Split block if remainder ≥ minimum block size (sizeof(free_block_t))
  4. Return pointer past the size field

Free strategy:

  1. Insert into free list sorted by address
  2. Coalesce with adjacent free blocks (both forward and backward)

Configuration: 1MB heap starting at the first page after the page allocator pool.
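
The allocation and free strategies above can be sketched in a few dozen lines. This is a simplified illustration, not the contents of heap.c: the spinlock is omitted, heap_init is a hypothetical name, and the arena is supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct free_block {
    size_t size;               /* total block size, header included */
    struct free_block *next;   /* next free block, in ascending address order */
} free_block_t;

static free_block_t *free_list;

static void heap_init(void *base, size_t size) {
    free_list = (free_block_t *)base;
    free_list->size = size;
    free_list->next = NULL;
}

static void *kmalloc(size_t n) {
    size_t need = ((n + 7) & ~(size_t)7) + sizeof(size_t);   /* align to 8, add header */
    if (need < sizeof(free_block_t)) need = sizeof(free_block_t);
    for (free_block_t **pp = &free_list; *pp; pp = &(*pp)->next) {
        free_block_t *b = *pp;
        if (b->size < need) continue;                    /* first fit */
        if (b->size - need >= sizeof(free_block_t)) {    /* split off the remainder */
            free_block_t *rest = (free_block_t *)((char *)b + need);
            rest->size = b->size - need;
            rest->next = b->next;
            *pp = rest;
            b->size = need;
        } else {
            *pp = b->next;                               /* hand out the whole block */
        }
        return (char *)b + sizeof(size_t);               /* pointer past the size field */
    }
    return NULL;
}

static void kfree(void *p) {
    if (!p) return;
    free_block_t *b = (free_block_t *)((char *)p - sizeof(size_t));
    free_block_t *prev = NULL, *cur = free_list;
    while (cur && cur < b) { prev = cur; cur = cur->next; }  /* keep address order */
    b->next = cur;
    if (prev) prev->next = b; else free_list = b;
    if (cur && (char *)b + b->size == (char *)cur) {         /* merge forward */
        b->size += cur->size;
        b->next = cur->next;
    }
    if (prev && (char *)prev + prev->size == (char *)b) {    /* merge backward */
        prev->size += b->size;
        prev->next = b->next;
    }
}
```

Keeping the list sorted by address is what makes coalescing cheap: a freed block only ever needs to compare itself against its immediate list neighbours.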

7. Exception Handling (exceptions.c + boot.S)

The vector table (_vector_table in boot.S) is 2KB-aligned and contains 16 entries of 0x80 bytes each, per the ARMv8-A spec. The eight entries listed here are the EL1h group (current EL using SP_ELx) and the lower-EL AArch64 group; the current-EL-with-SP_EL0 group (0x000-0x180) and the lower-EL AArch32 group (0x600-0x780) are not listed:

Offset  | Exception Type         | Handler
--------|------------------------|---------------------------
0x200   | EL1h Sync              | exc_sync_el1h (exc_stub 0)
0x280   | EL1h IRQ               | exc_irq_el1h (exc_stub 1)
0x300   | EL1h FIQ               | exc_fiq_el1h (exc_stub 2)
0x380   | EL1h SError            | exc_serror_el1h (exc_stub 3)
0x400   | EL0 Sync (AArch64)     | exc_sync_el0 (full ctx_save)
0x480   | EL0 IRQ (AArch64)      | exc_irq_el0 (full ctx_save)
0x500   | EL0 FIQ (AArch64)      | exc_fiq_el0 (full ctx_save)
0x580   | EL0 SError (AArch64)   | exc_serror_el0 (full ctx_save)

EL1h exceptions: Use exc_stub macro — saves x0-x7 (64 bytes), calls C handler, restores, eret.

EL0 exceptions: Use ctx_save/ctx_restore — saves all 35 registers (280 bytes), calls C handler. On return, checks reschedule_pending; if set, branches to ctx_resched for task switch.

Exception Class (ESR) decoding:

EC        | Meaning           | Handler Action
----------|-------------------|---------------------------------------
0x15      | SVC (syscall)     | handle_svc_el1() or syscall_handler()
0x20/0x21 | Instruction Abort | Print fault info, halt
0x24/0x25 | Data Abort        | Print fault info, halt
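
As an illustration of the decoding above, the exception class sits in ESR_EL1 bits 31:26. The sketch below classifies a raw ESR value into the three outcomes from the table; the enum and function names are hypothetical and not taken from exceptions.c.

```c
#include <stdint.h>

#define ESR_EC(esr) (((esr) >> 26) & 0x3Fu)   /* exception class, bits 31:26 */

enum {
    EC_SVC64      = 0x15,   /* SVC executed in AArch64 state */
    EC_IABT_LOWER = 0x20,   /* instruction abort from a lower EL */
    EC_IABT_SAME  = 0x21,   /* instruction abort from the same EL */
    EC_DABT_LOWER = 0x24,   /* data abort from a lower EL */
    EC_DABT_SAME  = 0x25    /* data abort from the same EL */
};

typedef enum { ACT_SYSCALL, ACT_FAULT, ACT_UNKNOWN } esr_action_t;

/* Hypothetical helper: map an ESR value to the action a handler would take. */
static esr_action_t classify_esr(uint64_t esr) {
    switch (ESR_EC(esr)) {
    case EC_SVC64:
        return ACT_SYSCALL;
    case EC_IABT_LOWER: case EC_IABT_SAME:
    case EC_DABT_LOWER: case EC_DABT_SAME:
        return ACT_FAULT;
    default:
        return ACT_UNKNOWN;
    }
}
```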

8. Interrupt Controller — GICv3 (gic.c)

Despite QEMU virt using GICv3, the driver explicitly falls back to GICv2 MMIO compatibility mode because system register access may be trapped.

Register layout:

Region | Address    | Purpose
-------|------------|--------------------------------------------
GICD   | 0x08000000 | Distributor (global settings, SPI enable)
GICR   | 0x080A0000 | Redistributor (PPI/SGI enable per-core)
GICC   | 0x08010000 | CPU interface (GICv2 MMIO fallback)

Key interrupt IDs:

ID | Name          | Source
---|---------------|---------------------------------------
30 | IRQ_TIMER_EL1 | PPI 14 (16 + 14), ARM Generic Timer
33 | IRQ_UART      | SPI 1 (32 + 1), PL011 UART

Initialization:

  1. Read GICD_TYPER to get IRQ line count
  2. Disable all SPI interrupts
  3. Set up CPU interface (GICv2 MMIO if sysregs not available)
  4. Enable distributor

Interrupt flow:

  1. gic_acknowledge() — reads ICC_IAR1_EL1 (or GICC_IAR) for pending IRQ ID
  2. Handle the IRQ
  3. gic_eoi(irq) — writes ICC_EOIR1_EL1 (or GICC_EOIR) to signal end-of-interrupt

9. Timer (timer.c)

Uses the ARM Generic Timer's physical count and timer (CNTP):

Register      | Function
--------------|-------------------------------------------------------------
CNTFRQ_EL0    | Timer frequency (read at init; typically 62.5MHz on QEMU)
CNTPCT_EL0    | Physical count register (free-running counter)
CNTP_TVAL_EL0 | Timer value (counts down, IRQ fires when 0 is reached)
CNTP_CTL_EL0  | Control (bit 0: enable, bit 1: IMASK, bit 2: ISTATUS)

Timer starts at 20ms intervals (50Hz) using timer_irq_start(20). Each IRQ rearms via timer_irq_ack().

sleep is implemented as a busy-wait loop (timer_sleep(ms) counts ticks).
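
The interval and busy-wait arithmetic can be sketched as below. At 62.5MHz, a 20ms interval works out to 1,250,000 counter ticks. The function names are hypothetical; on the target the inputs would come from CNTFRQ_EL0 and CNTPCT_EL0, which this host-side sketch takes as plain arguments.

```c
#include <stdint.h>

/* Counter ticks for ms milliseconds at freq_hz (the CNTFRQ_EL0 value).
   Split into quotient and remainder so the product cannot overflow early. */
static uint64_t ms_to_ticks(uint64_t freq_hz, uint64_t ms) {
    return (freq_hz / 1000u) * ms + ((freq_hz % 1000u) * ms) / 1000u;
}

/* Busy-wait predicate: have ticks elapsed since start?
   Unsigned subtraction keeps the result correct across counter wrap. */
static int ticks_elapsed(uint64_t start, uint64_t now, uint64_t ticks) {
    return (now - start) >= ticks;
}
```

A busy-wait sleep would record the counter once, then poll ticks_elapsed with fresh counter reads until it returns nonzero.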

10. Scheduler (scheduler.c)

A preemptive round-robin scheduler with a static pool of 16 tasks.

Task states:

State   | Meaning
--------|----------------------------------------------
RUNNING | Currently on CPU
READY   | In ready queue, waiting for CPU
WAITING | Blocked (not used in current implementation)
ZOMBIE  | Exited, slot available for new task

Task control block:

typedef struct task {
    unsigned long sp;           // saved stack pointer (context frame)
    unsigned long *stack;       // kernel stack base
    task_state_t state;         // RUNNING/READY/WAITING/ZOMBIE
    int pid;
    char name[16];
    struct task *next;          // ready queue linked list
    int is_user;                // 1 = user-mode (EL0) task
    uint64_t ttbr0;             // page table for user task
    unsigned long user_entry;   // virtual entry point
    unsigned long user_sp;      // user stack pointer
    unsigned long *user_stack;  // user stack base
} task_t;

Context frame layout (the stack grows down; the saved SP points at the lowest address of the frame, offset 0):

SP+0:   x0          (offset 0)
SP+8:   x1
...
SP+232: x29         (offset 232)
SP+240: x30 (LR)    (offset 240)
SP+248: SP_EL0      (offset 248)
SP+256: ELR_EL1     (offset 256)
SP+264: SPSR_EL1    (offset 264)
SP+272: TTBR0_EL1   (offset 272)
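
One way to keep C code and the assembly save/restore macros in agreement on this layout is to mirror the frame as a struct and check its offsets at compile time. The struct, its field names, and frame_init_user are hypothetical; only the offsets and the SPSR=0 value for user tasks come from this README.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C view of the 280-byte context frame. */
typedef struct ctx_frame {
    uint64_t x[31];      /* x0..x30 at offsets 0..240 */
    uint64_t sp_el0;     /* offset 248 */
    uint64_t elr_el1;    /* offset 256 */
    uint64_t spsr_el1;   /* offset 264 */
    uint64_t ttbr0_el1;  /* offset 272 */
} ctx_frame_t;

_Static_assert(offsetof(ctx_frame_t, sp_el0) == 248, "SP_EL0 offset");
_Static_assert(offsetof(ctx_frame_t, elr_el1) == 256, "ELR_EL1 offset");
_Static_assert(offsetof(ctx_frame_t, spsr_el1) == 264, "SPSR_EL1 offset");
_Static_assert(offsetof(ctx_frame_t, ttbr0_el1) == 272, "TTBR0_EL1 offset");
_Static_assert(sizeof(ctx_frame_t) == 280, "35 registers of 8 bytes each");

/* Hypothetical helper: build the initial frame for a user (EL0) task. */
static void frame_init_user(ctx_frame_t *f, uint64_t entry,
                            uint64_t user_sp, uint64_t ttbr0) {
    memset(f, 0, sizeof *f);   /* general-purpose registers start at zero */
    f->elr_el1 = entry;        /* eret jumps here */
    f->sp_el0 = user_sp;
    f->spsr_el1 = 0;           /* EL0t, all exceptions unmasked */
    f->ttbr0_el1 = ttbr0;
}
```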

How preemption works:

Timer IRQ (20ms)
  │
  ▼
exc_irq_el0_handler()
  │  gic_acknowledge() → ID=30 (timer)
  │  timer_irq_ack()   → re-arm
  │  sched_try_yield() → dequeue head, enqueue current at tail
  │                     → sets reschedule_pending = 1
  │  gic_eoi(irq)
  │
  ▼
Return from exception
  │  check reschedule_pending
  │  if set:
  │    ctx_resched:
  │      *current_task_sp_ptr = SP  (save current)
  │      SP = next_task_sp          (load next)
  │      reschedule_pending = 0
  │      ctx_restore + eret
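
The queue rotation at the heart of the flow above (dequeue the head, enqueue the current task at the tail) can be sketched on its own. This is a simplified model with a cut-down task struct; sched_rotate is a hypothetical name, and the actual stack-pointer swap done by ctx_resched is not modelled.

```c
#include <stddef.h>

typedef struct task {
    int pid;
    struct task *next;   /* ready queue link */
} task_t;

static task_t *ready_head, *ready_tail;   /* singly linked ready queue */
static task_t *current;                   /* task that owns the CPU */

static void enqueue(task_t *t) {
    t->next = NULL;
    if (ready_tail) ready_tail->next = t; else ready_head = t;
    ready_tail = t;
}

static task_t *dequeue(void) {
    task_t *t = ready_head;
    if (t) {
        ready_head = t->next;
        if (!ready_head) ready_tail = NULL;
        t->next = NULL;
    }
    return t;
}

/* Rotate: the head of the queue becomes current, the old current goes to
   the tail. Returns 1 when a switch is needed (the point at which the
   kernel would set reschedule_pending), 0 when nothing else is ready. */
static int sched_rotate(void) {
    task_t *next = dequeue();
    if (!next) return 0;
    if (current) enqueue(current);
    current = next;
    return 1;
}
```

With three tasks, repeated calls cycle through them in a fixed order, which is the round-robin property.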

Task creation:

  • Kernel tasks (sched_create_task): 4KB stack, context frame with SPSR=0x3C5 (EL1h, IRQs masked), ELR → task_trampoline, x19 = entry function
  • User tasks (sched_create_user_task): kernel stack + user stack (4KB each), separate page table, code at 0x10000000, stack at 0x7FE00000, SPSR=0 (EL0t, all exceptions unmasked)

11. System Calls (syscall.c)

Five syscalls, invoked via SVC from EL0:

# | Name      | Args     | Implementation
--|-----------|----------|--------------------------------------------
0 | SYS_WRITE | buf, len | Writes len bytes from buf to UART
1 | SYS_SLEEP | ms       | Busy-wait for ms milliseconds
2 | SYS_YIELD | (none)   | Calls sched_yield()
3 | SYS_EXIT  | (none)   | Frees stacks, marks ZOMBIE, switches away
4 | SYS_READ  | buf, max | Reads keyboard input with echo/backspace

The syscall number is extracted from ESR_EL1 bits 0-15 (the immediate field of the SVC instruction). Arguments come from x0-x2.
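
A minimal sketch of that dispatch path: mask the low 16 bits of ESR_EL1 to get the syscall number, bounds-check it, and index a handler table. The handlers here are stand-ins that only return a value, and every name in the block is hypothetical; the real handlers in syscall.c drive the UART, timer and scheduler.

```c
#include <stddef.h>
#include <stdint.h>

typedef long (*syscall_fn)(uint64_t a0, uint64_t a1, uint64_t a2);

/* Stand-in for SYS_WRITE: reports the length it was asked to write. */
static long demo_write(uint64_t buf, uint64_t len, uint64_t unused) {
    (void)buf; (void)unused;
    return (long)len;
}

/* Stand-in for the remaining syscalls: does nothing and reports success. */
static long demo_nop(uint64_t a0, uint64_t a1, uint64_t a2) {
    (void)a0; (void)a1; (void)a2;
    return 0;
}

/* Index = syscall number: WRITE, SLEEP, YIELD, EXIT, READ. */
static const syscall_fn syscall_table[] = {
    demo_write, demo_nop, demo_nop, demo_nop, demo_nop
};

/* Returns the handler's result, or -1 for an unknown syscall number. */
static long syscall_dispatch(uint64_t esr, uint64_t x0, uint64_t x1, uint64_t x2) {
    uint64_t nr = esr & 0xFFFFu;   /* SVC immediate, ESR_EL1 bits 15:0 */
    if (nr >= sizeof syscall_table / sizeof syscall_table[0]) return -1;
    return syscall_table[nr](x0, x1, x2);
}
```

Rejecting out-of-range numbers before indexing matters because the immediate is fully controlled by EL0 code.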

12. Shell (shell.c)

The interactive shell runs as the kernel's main task after boot. It reads commands via keyboard_gets() with full line editing (backspace, echo, CR/LF handling).

Commands:

Command | Function      | What it does
--------|---------------|----------------------------------------------------------------
help    | cmd_help()    | Lists all commands
ps      | cmd_ps()      | Prints task table via sched_list_tasks()
kmalloc | cmd_kmalloc() | Tests heap: allocates 8 blocks (16-128 bytes), frees every other, reallocates, reports total heap usage
timer   | cmd_timer()   | Tests timer: sleeps 1 second, reports elapsed ticks vs expected
info    | cmd_info()    | Shows timer frequency, page size, heap bounds, MMU status
uptime  | cmd_uptime()  | Reads tick count, converts to minutes:seconds
task    | cmd_task()    | Creates two kernel test tasks (testA prints every 2s, testB every 3s)

13. UART (uart.c)

PL011 UART at 0x09000000 (QEMU virt standard address). Polled mode — no interrupts.

  • uart_init(): sets baud (IBRD=13, FBRD=1), 8-bit FIFO mode, enables TX/RX
  • uart_putc(c): polls FR_TXFF (bit 5), writes to DR
  • uart_puts(s): writes string, converts \n to \r\n
  • uart_getc(): polls FR_RXFE (bit 4), reads from DR
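
The divisor values and the polled transmit can be sketched as follows. The 24MHz UART clock and 115200 baud rate are assumptions (they are not stated in this README) chosen because they reproduce IBRD=13, FBRD=1. The functions take an explicit register base, a hypothetical signature, so the logic can be exercised against a plain array on a host.

```c
#include <stdint.h>

/* PL011 baud divisor = clk / (16 * baud), held as 16.6 fixed point.
   (8 * clk / baud + 1) / 2 is 64 * divisor, rounded to nearest. */
static void pl011_divisors(uint32_t clk_hz, uint32_t baud,
                           uint32_t *ibrd, uint32_t *fbrd) {
    uint32_t div64 = (uint32_t)((8ull * clk_hz / baud + 1) / 2);
    *ibrd = div64 >> 6;      /* integer part */
    *fbrd = div64 & 0x3Fu;   /* 6-bit fractional part */
}

#define UART_DR 0            /* data register, word index (byte offset 0x00) */
#define UART_FR 6            /* flag register, word index (byte offset 0x18) */
#define FR_TXFF (1u << 5)    /* transmit FIFO full */

/* Polled transmit: spin while the TX FIFO is full, then write the byte. */
static void uart_putc_at(volatile uint32_t *base, char c) {
    while (base[UART_FR] & FR_TXFF) { }
    base[UART_DR] = (uint32_t)(unsigned char)c;
}
```

On the target the base would be the UART's MMIO address cast to a volatile pointer.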

14. Keyboard (keyboard.c)

Sits on top of UART RX:

  • keyboard_getc() — blocking read from uart_getc()
  • keyboard_getc_nonblock() — checks UART_FR bit 4; returns -1 if empty
  • keyboard_gets(buf, max) — line-editing input: echoes chars, handles backspace (DEL/\b), stops on CR/LF

Used by the shell and by the SYS_READ syscall.

15. Virtio Block Driver (virtio.c)

A minimal virtio block device driver for virtio-mmio at 0x0C000000.

Status sequence on init:

  1. Check magic (0x74726976)
  2. Verify device ID 2 (block device)
  3. Reset → ACK → DRIVER → DRIVER_OK

Read operation:

  1. Allocate request header (16 bytes), status byte, and data page
  2. Program queue num = 1
  3. Poll QUEUE_READY for completion
  4. If status == 0, copy 512 bytes to output buffer

Limitation: processes one request at a time (queue_num=1), no descriptor ring. Works for QEMU's legacy virtio-mmio.

16. Initramfs / Filesystem (fs.c)

Read-only filesystem using a ustar TAR archive embedded in the kernel.

Header parsing:

  • Validates magic "ustar" at offset 257
  • Parses file size via octal conversion
  • Each file padded to 512-byte boundary

Functions:

  • fs_init(addr) — validates archive, calculates size
  • fs_list(files, max) — enumerates files
  • fs_read(name, buf, max) — finds and reads a file
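
The header parsing steps can be sketched with three small helpers: a magic check at offset 257, an octal field parser, and the arithmetic for stepping to the next header. The function names are hypothetical; the 512-byte block size and field offsets follow the ustar format.

```c
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK       512u
#define TAR_MAGIC_OFF   257u   /* "ustar" magic */
#define TAR_SIZE_OFF    124u   /* 12-byte octal size field */
#define TAR_SIZE_LEN    12u

/* Nonzero when the 512-byte header carries the ustar magic. */
static int tar_is_ustar(const unsigned char *hdr) {
    return memcmp(hdr + TAR_MAGIC_OFF, "ustar", 5) == 0;
}

/* Parse an octal field: skips leading spaces, stops at the first
   character that is not an octal digit (NUL or space terminator). */
static size_t tar_octal(const char *field, size_t len) {
    size_t i = 0, v = 0;
    while (i < len && field[i] == ' ') i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        v = v * 8 + (size_t)(field[i] - '0');
    return v;
}

/* Offset of the next header: one header block plus the file data
   rounded up to a whole number of 512-byte blocks. */
static size_t tar_next(size_t hdr_off, size_t file_size) {
    size_t data_blocks = (file_size + TAR_BLOCK - 1) / TAR_BLOCK;
    return hdr_off + TAR_BLOCK + data_blocks * TAR_BLOCK;
}
```

A 1000-byte file, for instance, occupies two data blocks, so the next header sits 1536 bytes after the current one.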

Note: Not currently wired into kernel_main(); fs_init() is never called.

17. ELF Loader (elf.c)

Loads static ELF64 executables for AArch64. No relocation support.

Process:

  1. Validate magic (\x7fELF) and machine type (0xB7 = EM_AARCH64)
  2. For each PT_LOAD segment:
    • Allocate 4KB-aligned pages via page_alloc()
    • Zero p_memsz bytes
    • Copy p_filesz bytes from file offset
  3. Return entry point and load address
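
The validation step and the page-count arithmetic can be sketched as below. The struct mirrors the ELF64 file header from the ELF specification, where EM_AARCH64 is 183 (0xB7). The type and function names are hypothetical, and the class and endianness checks are additions for illustration beyond the steps listed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELFCLASS64  2
#define ELFDATA2LSB 1
#define EM_AARCH64  183   /* 0xB7 */
#define PAGE_SIZE   4096u

/* ELF64 file header (64 bytes), laid out per the ELF specification. */
typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
} elf64_hdr_t;

/* Returns 0 when the header looks like a little-endian AArch64 ELF64 file. */
static int elf_validate(const elf64_hdr_t *h) {
    if (memcmp(h->e_ident, "\x7f" "ELF", 4) != 0) return -1;
    if (h->e_ident[4] != ELFCLASS64) return -1;
    if (h->e_ident[5] != ELFDATA2LSB) return -1;
    if (h->e_machine != EM_AARCH64) return -1;
    return 0;
}

/* Number of 4KB pages needed to hold a segment of p_memsz bytes. */
static size_t pages_for(uint64_t p_memsz) {
    return (size_t)((p_memsz + PAGE_SIZE - 1) / PAGE_SIZE);
}
```

Zeroing p_memsz bytes before copying p_filesz bytes is what gives a segment's .bss portion its zero contents, since p_memsz can exceed p_filesz.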

18. User Process Demo (user_demo.S + user_process.c)

A tiny assembly program linked into the kernel that runs at EL0:

user_demo_start:
    adr x0, msg               // buf
    mov x1, #12               // len: "[user EL0]!\n" is 12 bytes
    svc #0                    // SYS_WRITE
    ... busy-wait ...
    b loop                    // loop forever
msg: .ascii "[user EL0]!\n"

User mode is opt-in: the usermode_init() call in kernel.c is commented out. To enable:

  1. Uncomment usermode_init() in kernel_main()
  2. The demo task will print "[user EL0]!" periodically

19. Printf (printf.c)

Minimal printf using UART. Supports the standard format specifiers with width/padding:

  • %d, %u, %x, %ld, %lu, %lx, %p (with 0x prefix), %s, %c, %%
  • Width/padding: %08x, %5d, etc.
  • Uses <stdarg.h> for variadic arguments

20. Spinlock (spinlock.h)

Header-only, uses ARMv8 exclusive access instructions:

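Operation       | Instruction          | Semantics
----------------|----------------------|-------------------------------------------------
spin_lock(s)    | ldaxr / stlxr loop   | Load-acquire exclusive, store-release exclusive
spin_unlock(s)  | stlr                 | Store-release
spin_trylock(s) | ldaxr / stlxr        | Single attempt, returns 0 on success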

Uses acquire/release semantics for correct memory ordering.
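
For readers who want to try the same semantics on a host, here is a portable stand-in using C11 atomics in place of the exclusive-access instructions. It is not the contents of spinlock.h: atomic_flag replaces the ldaxr/stlxr pair, and the type and initializer names are hypothetical. The return convention (0 on success) follows the table above.

```c
#include <stdatomic.h>

typedef struct { atomic_flag locked; } spinlock_t;
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Single attempt: returns 0 if the lock was taken, 1 if it was already held.
   Acquire ordering keeps the critical section from moving above the lock. */
static int spin_trylock(spinlock_t *s) {
    return atomic_flag_test_and_set_explicit(&s->locked, memory_order_acquire) ? 1 : 0;
}

/* Spin until a trylock attempt succeeds. */
static void spin_lock(spinlock_t *s) {
    while (spin_trylock(s)) { }
}

/* Release ordering publishes the critical section's writes before the unlock. */
static void spin_unlock(spinlock_t *s) {
    atomic_flag_clear_explicit(&s->locked, memory_order_release);
}
```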


Design Decisions & Rationale

  1. Identity mapping — Virtual == physical. Sim
