Process Management
ModuOS implements a modern preemptive multitasking system with support for both kernel-mode and user-mode processes, complete with process isolation, scheduler fairness, and POSIX-compatible process lifecycle management.
The process management subsystem provides:
- Preemptive multitasking with CFS (Completely Fair Scheduler)
- User and kernel mode processes with memory isolation
- POSIX-style process lifecycle (fork, exec, waitpid, exit)
- Lazy FPU context switching for performance
- Per-process file descriptors and filesystem context
- User identity management (UID/GID with supplementary groups)
- Priority scheduling with nice values (-20 to +19)
Each process is represented by a process_t structure containing:
typedef struct process {
uint32_t pid; // Process ID (0 = idle, 1-255 = user/kernel processes)
uint32_t parent_pid; // Parent process ID
int32_t pgid; // Process group ID (for waitpid)
char name[64]; // Process name
process_state_t state; // Current state
int exit_code; // Exit status (when zombie)
// User identity
uint32_t uid; // User ID (0 = mdman/root)
uint32_t gid; // Group ID
uint16_t groups[32]; // Supplementary groups
uint8_t group_count; // Number of supplementary groups
// Scheduling
cpu_state_t cpu_state; // Saved register state
uint8_t fpu_state[512] __attribute__((aligned(16))); // FPU/SSE state
// Memory management
uint64_t page_table; // CR3 value (page table physical address)
void *kernel_stack; // Kernel-mode stack (8KB)
void *user_stack; // User-mode stack base
// User memory regions
uint64_t user_stack_top; // Stack top (grows down)
uint64_t user_stack_low; // Lowest mapped stack address
uint64_t user_stack_limit; // Maximum stack growth
uint64_t user_heap_base; // Heap start (for sbrk/brk)
uint64_t user_heap_end; // Current heap end
uint64_t user_heap_limit; // Maximum heap size
uint64_t user_mmap_base; // mmap region start
uint64_t user_mmap_end; // Current mmap end
uint64_t user_mmap_limit; // Maximum mmap size
uint64_t user_image_base; // ELF image start
uint64_t user_image_end; // ELF image end
// User-mode entry (for exec)
uint64_t user_rip; // User entry point
uint64_t user_rsp; // User stack pointer
int is_user; // 1 = user mode, 0 = kernel mode
// File descriptors
void *fd_table[16]; // Open file handles
// CFS scheduler fields
uint64_t vruntime; // Virtual runtime (for fairness)
uint64_t sum_exec_runtime; // Total CPU time used
uint64_t exec_start; // When current timeslice started
int nice; // Nice value (-20 to +19)
uint32_t weight; // Scheduling weight (derived from nice)
// Arguments and environment
int argc; // Argument count
char **argv; // Argument vector
int envc; // Environment count
char **envp; // Environment vector
// Filesystem context
char cwd[256]; // Current working directory
int current_slot; // Active mount slot
// Linked list
struct process *next; // Ready queue link
} process_t;

| State | Description |
|---|---|
| PROCESS_STATE_READY | Ready to run, in scheduler queue |
| PROCESS_STATE_RUNNING | Currently executing on CPU |
| PROCESS_STATE_BLOCKED | Waiting for I/O or event |
| PROCESS_STATE_SLEEPING | Sleeping for time period |
| PROCESS_STATE_ZOMBIE | Exited, waiting for parent to reap |
| PROCESS_STATE_TERMINATED | Terminated (being cleaned up) |
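
The states in the table can be read as a small state machine. The sketch below is illustrative only: the enumerator order and the helper process_state_can_transition() are assumptions, and the allowed transitions are inferred from the descriptions in the table rather than taken from the ModuOS sources.

```c
#include <stdbool.h>

/* Names come from the state table; the numeric order is an assumption. */
typedef enum {
    PROCESS_STATE_READY,
    PROCESS_STATE_RUNNING,
    PROCESS_STATE_BLOCKED,
    PROCESS_STATE_SLEEPING,
    PROCESS_STATE_ZOMBIE,
    PROCESS_STATE_TERMINATED
} process_state_t;

/* Hypothetical helper: returns true if `from` -> `to` is a legal move
 * under the assumed transition set. */
static bool process_state_can_transition(process_state_t from,
                                         process_state_t to)
{
    switch (from) {
    case PROCESS_STATE_READY:
        /* A ready process can only be picked by the scheduler. */
        return to == PROCESS_STATE_RUNNING;
    case PROCESS_STATE_RUNNING:
        /* Preempted, blocked on I/O, put to sleep, or exited. */
        return to == PROCESS_STATE_READY ||
               to == PROCESS_STATE_BLOCKED ||
               to == PROCESS_STATE_SLEEPING ||
               to == PROCESS_STATE_ZOMBIE;
    case PROCESS_STATE_BLOCKED:
    case PROCESS_STATE_SLEEPING:
        /* Woken up: back to the ready queue. */
        return to == PROCESS_STATE_READY;
    case PROCESS_STATE_ZOMBIE:
        /* Reaped by the parent (or the scheduler, for orphans). */
        return to == PROCESS_STATE_TERMINATED;
    case PROCESS_STATE_TERMINATED:
        return false;
    }
    return false;
}
```

A check like this is mostly useful as a debug assertion wherever the kernel writes the state field.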
The cpu_state_t structure saves callee-saved registers per SysV ABI:
typedef struct {
uint64_t r15; // +0
uint64_t r14; // +8
uint64_t r13; // +16
uint64_t r12; // +24
uint64_t rbx; // +32
uint64_t rbp; // +40
uint64_t rip; // +48 - Instruction pointer
uint64_t rsp; // +56 - Stack pointer
uint64_t rflags; // +64 - Flags (includes interrupt enable flag)
} cpu_state_t;

This layout must match context_switch.asm exactly. The rflags field preserves the interrupt enable flag across context switches.
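
One way to keep the C layout and the assembly offsets from drifting apart is to pin the offsets at compile time. This is a minimal sketch, assuming a C11 compiler. The struct is repeated here only to make the example self-contained, and the assertions are not claimed to exist in the ModuOS sources.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t r15;
    uint64_t r14;
    uint64_t r13;
    uint64_t r12;
    uint64_t rbx;
    uint64_t rbp;
    uint64_t rip;
    uint64_t rsp;
    uint64_t rflags;
} cpu_state_t;

/* Offsets taken from the comments in the struct definition.
 * If a field is added or reordered, the build fails here instead of
 * the assembly silently reading the wrong slot. */
_Static_assert(offsetof(cpu_state_t, r15)    == 0,  "r15 offset");
_Static_assert(offsetof(cpu_state_t, r14)    == 8,  "r14 offset");
_Static_assert(offsetof(cpu_state_t, r13)    == 16, "r13 offset");
_Static_assert(offsetof(cpu_state_t, r12)    == 24, "r12 offset");
_Static_assert(offsetof(cpu_state_t, rbx)    == 32, "rbx offset");
_Static_assert(offsetof(cpu_state_t, rbp)    == 40, "rbp offset");
_Static_assert(offsetof(cpu_state_t, rip)    == 48, "rip offset");
_Static_assert(offsetof(cpu_state_t, rsp)    == 56, "rsp offset");
_Static_assert(offsetof(cpu_state_t, rflags) == 64, "rflags offset");
_Static_assert(sizeof(cpu_state_t) == 72, "cpu_state_t must be 9 qwords");
```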
ModuOS uses a Linux-inspired CFS algorithm for fair CPU time distribution.
- vruntime: Virtual runtime tracking how much CPU time a process has used (adjusted by weight)
- Weight: Derived from nice value - lower nice = higher weight = more CPU time
- Time slices: Calculated proportionally based on weight
- Red-black tree simulation: The ready queue is kept sorted by vruntime (lowest first), standing in for the red-black tree that Linux CFS uses
| Nice | Weight | Weight relative to nice 0 |
|---|---|---|
| -20 | 88761 | ~87x normal |
| -10 | 9548 | ~9x normal |
| 0 | 1024 | Normal (100%) |
| +10 | 110 | ~11% of normal |
| +19 | 15 | ~1.5% of normal |
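
The five weights in the table match the corresponding entries of the Linux kernel's nice-to-weight table. The sketch below assumes ModuOS reuses that full 40-entry table. The array name nice_weight_table and the helper nice_to_weight() are hypothetical.

```c
#include <stdint.h>

/* Weights for nice -20 .. +19 (values from the Linux CFS table).
 * Each step of one nice level changes the weight by roughly 1.25x. */
static const uint32_t nice_weight_table[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* Hypothetical helper: clamp nice to [-20, +19] and look up its weight. */
static uint32_t nice_to_weight(int nice)
{
    if (nice < -20) nice = -20;
    if (nice > 19)  nice = 19;
    return nice_weight_table[nice + 20];
}
```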
- Tick update: Update current process vruntime based on time used and weight
- Time slice check: If current process exceeded its fair share, request reschedule
- Schedule: Pick process with lowest vruntime from ready queue
- Enqueue old: If old process is still runnable, add back to queue sorted by vruntime
- Context switch: Save old context, load new context, switch CR3 and TSS
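
The scheduling steps above can be sketched with a singly linked ready queue kept sorted by vruntime. This is an illustration under stated assumptions, not the ModuOS scheduler itself: task_t is a reduced stand-in for process_t, the helper names are hypothetical, and the values chosen for SCHED_LATENCY and MIN_GRANULARITY (20 ms and 1 ms, in nanoseconds) are guesses.

```c
#include <stddef.h>
#include <stdint.h>

#define NICE_0_LOAD      1024u       /* weight of a nice-0 process */
#define SCHED_LATENCY    20000000u   /* assumed value: 20 ms in ns */
#define MIN_GRANULARITY  1000000u    /* assumed value: 1 ms in ns */

/* Reduced stand-in for process_t: only the fields the scheduler needs. */
typedef struct task {
    uint64_t vruntime;
    uint32_t weight;
    struct task *next;
} task_t;

/* Tick update: charge `time_used` to the task, scaled by its weight.
 * Heavier (lower nice) tasks accumulate vruntime more slowly. */
static void task_account(task_t *t, uint64_t time_used)
{
    t->vruntime += (time_used * NICE_0_LOAD) / t->weight;
}

/* Time slice: the task's share of the latency period, with a floor. */
static uint64_t task_time_slice(const task_t *t, uint64_t total_weight)
{
    uint64_t slice = ((uint64_t)SCHED_LATENCY * t->weight) / total_weight;
    return slice < MIN_GRANULARITY ? MIN_GRANULARITY : slice;
}

/* Enqueue: insert so the list stays sorted by ascending vruntime.
 * Ties go behind existing entries, which keeps the queue FIFO-fair. */
static void ready_enqueue(task_t **head, task_t *t)
{
    task_t **link = head;
    while (*link && (*link)->vruntime <= t->vruntime)
        link = &(*link)->next;
    t->next = *link;
    *link = t;
}

/* Schedule: the head of the sorted list has the lowest vruntime. */
static task_t *ready_pick(task_t **head)
{
    task_t *t = *head;
    if (t) {
        *head = t->next;
        t->next = NULL;
    }
    return t;
}
```

With these definitions, 1 ms of CPU time advances a nice-0 task's vruntime by 1 ms but a nice +10 task's (weight 110) by about 9.3 ms, so the lower-priority task falls behind in the queue much sooner.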
vruntime_delta = (time_used * NICE_0_LOAD) / process_weight;
vruntime += vruntime_delta;
time_slice = (SCHED_LATENCY * process_weight) / total_weight;
if (time_slice < MIN_GRANULARITY) time_slice = MIN_GRANULARITY;

Kernel processes are created with process_create() or process_create_with_args():
process_t *proc = process_create_with_args("myproc", entry_func, priority, argc, argv);

- Allocates a PID and an 8KB kernel stack
- Uses global kernel page table (shared CR3)
- Runs in ring 0 with full privileges
- Used for: kernel threads, init process, built-in shell
User processes are created by the fork() and exec() syscalls:
- fork(): Duplicates the current process. It currently performs a full copy of user memory; copy-on-write is not implemented yet.
- exec(): Loads ELF binary, replaces process image
User processes:
- Run in ring 3 (unprivileged)
- Have separate page table (per-process CR3)
- Memory isolated: user space spans 0x0000000000400000-0x00007FFFFFFFFFFF
- Kernel stack (8KB) mapped in both user and kernel page tables for IRQ handling
| Range | Purpose |
|---|---|
| 0x0000000000400000 - 0x00007FFFFFFFFFFF | User space |
| 0x0000000000400000 - ... | ELF image (text, data, bss) |
| 0x0000005000000000 - 0x0000005040000000 | Heap (sbrk/brk, 64MB limit) |
| 0x0000006000000000 - 0x0000006010000000 | mmap region (256MB limit) |
| 0x00007FFFFE000000 - 0x00007FFFFFF00000 | User stack (64KB, grows down) |
| 0xFFFF800000000000 - 0xFFFFFFFFFFFFFFFF | Kernel space (higher half) |
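
A layout like this is typically consulted when the kernel validates pointers passed in by user code. The sketch below classifies an address against the ranges in the table, copied as written. The macro names, region_t, region_of() and user_range_ok() are all hypothetical, and the upper bound of each window is treated as exclusive, which is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds copied from the memory layout table. */
#define USER_SPACE_START   0x0000000000400000ULL
#define USER_SPACE_END     0x00007FFFFFFFFFFFULL  /* inclusive */
#define USER_HEAP_START    0x0000005000000000ULL
#define USER_HEAP_END      0x0000005040000000ULL  /* assumed exclusive */
#define USER_MMAP_START    0x0000006000000000ULL
#define USER_MMAP_END      0x0000006010000000ULL  /* assumed exclusive */
#define USER_STACK_LOW     0x00007FFFFE000000ULL
#define USER_STACK_TOP     0x00007FFFFFF00000ULL  /* assumed exclusive */
#define KERNEL_SPACE_START 0xFFFF800000000000ULL

typedef enum {
    REGION_INVALID,     /* below user space or non-canonical */
    REGION_USER_OTHER,  /* user space outside the named windows (e.g. ELF image) */
    REGION_HEAP,
    REGION_MMAP,
    REGION_STACK,
    REGION_KERNEL
} region_t;

/* Classify a single virtual address. */
static region_t region_of(uint64_t addr)
{
    if (addr >= KERNEL_SPACE_START)
        return REGION_KERNEL;
    if (addr < USER_SPACE_START || addr > USER_SPACE_END)
        return REGION_INVALID;
    if (addr >= USER_HEAP_START && addr < USER_HEAP_END)
        return REGION_HEAP;
    if (addr >= USER_MMAP_START && addr < USER_MMAP_END)
        return REGION_MMAP;
    if (addr >= USER_STACK_LOW && addr < USER_STACK_TOP)
        return REGION_STACK;
    return REGION_USER_OTHER;
}

/* True if [addr, addr + len) lies entirely in user space.
 * Written to avoid overflow in addr + len. */
static bool user_range_ok(uint64_t addr, uint64_t len)
{
    if (addr < USER_SPACE_START || addr > USER_SPACE_END)
        return false;
    return len <= USER_SPACE_END - addr + 1;
}
```

A range check of this kind only says the buffer is in user space, not that it is mapped; the page tables still have the final word.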
Kernel processes use the global kernel page table:
- All kernel code/data in higher half (0xFFFF800000000000+)
- Identity mapping for physical RAM
- MMIO regions mapped via ioremap()
- process_create_with_args() allocates PID, stacks, sets up context
- Initial state: PROCESS_STATE_READY
- Added to scheduler ready queue
- Scheduler picks process with lowest vruntime
- Context switch loads CPU state, switches CR3, updates TSS RSP0
- Process runs until:
  - Time slice expires → preempted
  - Blocks on I/O → PROCESS_STATE_BLOCKED
  - Calls yield() → voluntary preemption
  - Exits → PROCESS_STATE_ZOMBIE
When a process calls exit(code):
- State → PROCESS_STATE_ZOMBIE
- Exit code saved
- Parent process notified (if waiting in waitpid())
- Process remains in zombie state until parent calls waitpid() to reap it
Parent calls waitpid(pid, &status, options):
- Waits for child to exit
- Retrieves exit code from zombie
- Calls process_destroy() to free resources
- Zombie removed from process table
If parent exits before child:
- Orphan zombies are auto-reaped by scheduler
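
The exit and reap sequence can be modelled with a small table. This is a simplified sketch, not the ModuOS code: proc_t, procs, proc_spawn(), proc_exit() and proc_reap() are hypothetical, the reap is non-blocking, the status is the raw exit code, and clearing the slot stands in for process_destroy().

```c
#include <stddef.h>

#define MAX_PROCS 256

typedef enum { P_UNUSED, P_READY, P_ZOMBIE } pstate_t;  /* reduced state set */

typedef struct {
    int pid;
    int parent_pid;
    int exit_code;
    pstate_t state;
} proc_t;

static proc_t procs[MAX_PROCS];  /* index == PID in this sketch */

static void proc_spawn(int pid, int parent_pid)
{
    procs[pid].pid = pid;
    procs[pid].parent_pid = parent_pid;
    procs[pid].exit_code = 0;
    procs[pid].state = P_READY;
}

/* exit(code): keep the slot, remember the code, become a zombie. */
static void proc_exit(int pid, int code)
{
    procs[pid].exit_code = code;
    procs[pid].state = P_ZOMBIE;
}

/* Non-blocking reap for `parent`.
 * pid > 0 selects one child, pid == -1 selects any child.
 * Returns the reaped PID, 0 if a matching child exists but has not
 * exited yet, or -1 if there is no matching child at all. */
static int proc_reap(int parent, int pid, int *status)
{
    int have_child = 0;
    for (int i = 1; i < MAX_PROCS; i++) {
        proc_t *p = &procs[i];
        if (p->state == P_UNUSED || p->parent_pid != parent)
            continue;
        if (pid > 0 && p->pid != pid)
            continue;
        have_child = 1;
        if (p->state == P_ZOMBIE) {
            if (status)
                *status = p->exit_code;
            p->state = P_UNUSED;  /* stands in for process_destroy() */
            return i;
        }
    }
    return have_child ? 0 : -1;
}
```

A blocking waitpid() would put the parent into PROCESS_STATE_BLOCKED on the 0 result and retry when a child exits.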
Context switching is implemented in assembly (context_switch.asm) with a C wrapper:
void context_switch(cpu_state_t *old_state, cpu_state_t *new_state,
void *old_fpu_state, void *new_fpu_state);

- Disable interrupts (cli) to prevent IRQs during switch
- Save old context: Push r15, r14, r13, r12, rbx, rbp, rip, rsp, rflags
- Switch CR3: Load new process page table
- Update TSS RSP0: Set kernel stack for syscall/IRQ entry
- Lazy FPU: Set TS flag if FPU not owned by new process
- Restore new context: Pop rflags, rsp, rip, rbp, rbx, r12, r13, r14, r15
- Return: Jump to new RIP with interrupts restored by rflags
To avoid expensive FPU state saves on every context switch, ModuOS uses lazy FPU switching:
- On context switch: Set CR0.TS (Task Switched) bit
- First FPU use: #NM (Device Not Available) exception fires
- Exception handler:
  - Save current FPU state (if another process owns it)
  - Restore new process FPU state
  - Clear CR0.TS
  - Return to user code
This way FPU state is only saved/restored for processes that actually use FPU instructions.
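
The ownership logic can be simulated in plain C. The sketch below is a model, not kernel code: cr0_ts, fpu_regs and fpu_owner stand in for CR0.TS, the hardware FPU registers and the kernel's owner pointer, memcpy replaces fxsave/fxrstor, and every function name is hypothetical. It sets TS only when the incoming process does not already own the FPU, as described in the context switch steps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int pid;
    uint8_t fpu_state[512];  /* per-process save area */
} proc_t;

static bool     cr0_ts;         /* simulated CR0.TS bit */
static uint8_t  fpu_regs[512];  /* simulated hardware FPU state */
static proc_t  *fpu_owner;      /* whose state is live in fpu_regs */
static unsigned fpu_saves;      /* how many saves were actually needed */

/* Context switch: no save or restore, just arm the trap if needed. */
static void fpu_on_context_switch(proc_t *next)
{
    cr0_ts = (fpu_owner != next);
}

/* #NM handler: runs on the first FPU instruction after a switch. */
static void fpu_on_nm_fault(proc_t *current)
{
    if (fpu_owner && fpu_owner != current) {
        memcpy(fpu_owner->fpu_state, fpu_regs, sizeof fpu_regs); /* fxsave */
        fpu_saves++;
    }
    memcpy(fpu_regs, current->fpu_state, sizeof fpu_regs);       /* fxrstor */
    fpu_owner = current;
    cr0_ts = false;                                              /* clts */
}

/* Simulated FPU instruction: traps first if TS is set. */
static void fpu_use(proc_t *current)
{
    if (cr0_ts)
        fpu_on_nm_fault(current);
    fpu_regs[0]++;  /* pretend the instruction changed FPU state */
}
```

In this model, switching away from a process and back without anyone else touching the FPU costs no save or restore at all.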
int fork(void) {
// Allocate new PID
// Duplicate parent process structure
// Copy page table (full copy for now, COW would be better)
// Copy user memory (image, heap, stack)
// Copy FPU state
// Copy file descriptors
// Return 0 to child, child PID to parent
}

int execve(const char *path, char *argv[], char *envp[]) {
// Load ELF binary into memory
// Parse PT_LOAD segments, map into user space
// Free old user memory regions
// Set up new stack with argc, argv, envp
// Set user_rip to ELF entry point
// Return to user mode at entry point
}

pid_t waitpid(pid_t pid, int *status, int options);

Arguments:
- pid > 0: Wait for the child with that specific PID
- pid == 0: Wait for any child in the caller's process group
- pid == -1: Wait for any child
- pid < -1: Wait for any child whose process group ID is -pid (the absolute value of pid)
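
The four cases reduce to a single predicate that decides whether a given child is a candidate for the call. The helper below is a hypothetical sketch following the POSIX meaning of the pid argument. It assumes the caller has already established that the process is a child of the caller.

```c
#include <stdbool.h>

/* Returns true if a child with (child_pid, child_pgid) is selected by
 * waitpid(pid_arg, ...) issued from a process in group caller_pgid. */
static bool waitpid_selects(int pid_arg, int caller_pgid,
                            int child_pid, int child_pgid)
{
    if (pid_arg > 0)
        return child_pid == pid_arg;       /* one specific child */
    if (pid_arg == 0)
        return child_pgid == caller_pgid;  /* any child in caller's group */
    if (pid_arg == -1)
        return true;                       /* any child */
    return child_pgid == -pid_arg;         /* any child in group |pid| */
}
```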
Behavior:
- If child is already zombie: return immediately with exit code
- Otherwise: block parent until child exits
- When child exits: wake parent, return PID and status
Process groups (PGID) allow waiting for groups of related processes:
- Shell sets PGID for pipeline: cat file.txt | grep foo
- Parent can waitpid(0, ...) to wait for any child in the group
Each process has a file descriptor table fd_table[16]:
- 0: stdin
- 1: stdout
- 2: stderr
- 3-15: Open files
File descriptors are inherited across fork() but closed/reset on exec().
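
A fixed 16-slot table keeps the bookkeeping simple. The sketch below shows one plausible shape for it. The helper names are hypothetical, and two behaviours are assumptions rather than documented facts: new descriptors take the lowest free slot from 3 upward, and exec() clears slots 3-15 while leaving stdin, stdout and stderr in place.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FDS 16

/* Reduced stand-in for the fd_table member of process_t. */
typedef struct {
    void *fd_table[MAX_FDS];
} fd_owner_t;

/* Install `handle` in the lowest free slot >= 3.
 * Returns the descriptor number, or -1 if the table is full. */
static int fd_alloc(fd_owner_t *p, void *handle)
{
    for (int fd = 3; fd < MAX_FDS; fd++) {
        if (p->fd_table[fd] == NULL) {
            p->fd_table[fd] = handle;
            return fd;
        }
    }
    return -1;
}

/* fork(): the child starts with a copy of the parent's table.
 * This is a shallow copy; a real kernel would also bump a reference
 * count on each shared handle. */
static void fd_inherit(fd_owner_t *child, const fd_owner_t *parent)
{
    memcpy(child->fd_table, parent->fd_table, sizeof child->fd_table);
}

/* exec(): drop everything except the three standard streams (assumed). */
static void fd_reset_on_exec(fd_owner_t *p)
{
    for (int fd = 3; fd < MAX_FDS; fd++)
        p->fd_table[fd] = NULL;
}
```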
Processes interact with the kernel via syscalls (INT 0x80):
| Syscall | Description |
|---|---|
| SYS_EXIT | Exit process with status code |
| SYS_FORK | Create child process (duplicate) |
| SYS_EXECVE | Replace process image with new program |
| SYS_WAITPID | Wait for child process to exit |
| SYS_GETPID | Get current process ID |
| SYS_GETPPID | Get parent process ID |
| SYS_GETUID | Get user ID |
| SYS_SETUID | Set user ID (requires root) |
| SYS_GETGID | Get group ID |
| SYS_SETGID | Set group ID (requires root) |
| SYS_SBRK | Grow/shrink heap |
| SYS_MMAP | Map memory region |
| SYS_YIELD | Voluntarily yield CPU |
| SYS_SLEEP | Sleep for milliseconds |
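
On the kernel side, an INT 0x80 handler usually ends in a table lookup. The following is a sketch of that dispatch step only. The syscall numbers, the three-argument handler signature, the stub handlers and the -38 error value are all assumptions made for illustration; the actual ModuOS numbering and register convention are not given here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numbering; only the names come from the syscall table. */
enum {
    SYS_EXIT = 0,
    SYS_FORK,
    SYS_EXECVE,
    SYS_WAITPID,
    SYS_GETPID,
    SYS_GETPPID,
    SYS_COUNT
};

#define ERR_NOSYS 38  /* assumed "no such syscall" error code */

typedef int64_t (*syscall_fn)(uint64_t a0, uint64_t a1, uint64_t a2);

/* Stand-ins for the current process's identity. */
static int current_pid = 42;
static int current_ppid = 1;

static int64_t sys_getpid(uint64_t a0, uint64_t a1, uint64_t a2)
{
    (void)a0; (void)a1; (void)a2;
    return current_pid;
}

static int64_t sys_getppid(uint64_t a0, uint64_t a1, uint64_t a2)
{
    (void)a0; (void)a1; (void)a2;
    return current_ppid;
}

/* Unlisted entries stay NULL and are rejected by the dispatcher. */
static const syscall_fn syscall_table[SYS_COUNT] = {
    [SYS_GETPID]  = sys_getpid,
    [SYS_GETPPID] = sys_getppid,
};

/* Validate the number before indexing: it comes from user space. */
static int64_t syscall_dispatch(uint64_t num,
                                uint64_t a0, uint64_t a1, uint64_t a2)
{
    if (num >= SYS_COUNT || syscall_table[num] == NULL)
        return -ERR_NOSYS;
    return syscall_table[num](a0, a1, a2);
}
```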
The global process table process_table[256] stores all active processes:
- PID 0: Idle process (runs when no other process is ready)
- PID 1-255: User/kernel processes
- Protected by RWLock for concurrent access
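
PID allocation over a fixed table can be as simple as a linear scan that skips slot 0. The sketch below is hypothetical: pid_alloc() and pid_free() are illustrative names, the table holds opaque pointers instead of process_t, and the RWLock that would guard the table is omitted.

```c
#include <stddef.h>

#define MAX_PROCESSES 256

/* Slot index doubles as the PID; slot 0 is reserved for the idle process. */
static void *process_table[MAX_PROCESSES];

/* Claim the lowest free PID in 1..255 for `proc`.
 * Returns the PID, or -1 if the table is full.
 * A real kernel would hold the write lock around this scan. */
static int pid_alloc(void *proc)
{
    for (int pid = 1; pid < MAX_PROCESSES; pid++) {
        if (process_table[pid] == NULL) {
            process_table[pid] = proc;
            return pid;
        }
    }
    return -1;
}

/* Release a PID so it can be reused; PID 0 is never freed. */
static void pid_free(int pid)
{
    if (pid > 0 && pid < MAX_PROCESSES)
        process_table[pid] = NULL;
}
```

Reusing the lowest free PID keeps the scan short, at the cost of PIDs being recycled quickly after a process is reaped.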
ModuOS process management provides:
✅ Modern CFS scheduler for fair CPU distribution
✅ User/kernel mode isolation with separate address spaces
✅ POSIX-compatible fork/exec/waitpid
✅ Lazy FPU switching for performance
✅ Per-process file descriptors and filesystem context
✅ User identity management (UID/GID)
✅ Zombie process handling with proper reaping
This enables ModuOS to run complex multi-process applications with proper isolation and fairness!