# Process Management

ModuOS implements a modern preemptive multitasking system that supports both kernel-mode and user-mode processes, with process isolation, fair scheduling, and POSIX-compatible process lifecycle management.

---

## Overview

The process management subsystem provides:

- **Preemptive multitasking** with CFS (Completely Fair Scheduler)
- **User and kernel mode** processes with memory isolation
- **POSIX-style process lifecycle** (fork, exec, waitpid, exit)
- **Lazy FPU context switching** for performance
- **Per-process file descriptors** and filesystem context
- **User identity management** (UID/GID with supplementary groups)
- **Priority scheduling** with nice values (-20 to +19)

---

## Process Structure

Each process is represented by a `process_t` structure containing:

### Core Fields

```c
typedef struct process {
    uint32_t pid;                // Process ID (0 = idle, 1-255 = user/kernel processes)
    uint32_t parent_pid;         // Parent process ID
    int32_t  pgid;               // Process group ID (for waitpid)
    char name[64];               // Process name
    process_state_t state;       // Current state
    int exit_code;               // Exit status (when zombie)

    // User identity
    uint32_t uid;                // User ID (0 = mdman/root)
    uint32_t gid;                // Group ID
    uint16_t groups[32];         // Supplementary groups
    uint8_t  group_count;        // Number of supplementary groups

    // Scheduling
    cpu_state_t cpu_state;       // Saved register state
    uint8_t fpu_state[512] __attribute__((aligned(16)));  // FPU/SSE state

    // Memory management
    uint64_t page_table;         // CR3 value (page table physical address)
    void *kernel_stack;          // Kernel-mode stack (8KB)
    void *user_stack;            // User-mode stack base

    // User memory regions
    uint64_t user_stack_top;     // Stack top (grows down)
    uint64_t user_stack_low;     // Lowest mapped stack address
    uint64_t user_stack_limit;   // Maximum stack growth
    uint64_t user_heap_base;     // Heap start (for sbrk/brk)
    uint64_t user_heap_end;      // Current heap end
    uint64_t user_heap_limit;    // Maximum heap size
    uint64_t user_mmap_base;     // mmap region start
    uint64_t user_mmap_end;      // Current mmap end
    uint64_t user_mmap_limit;    // Maximum mmap size
    uint64_t user_image_base;    // ELF image start
    uint64_t user_image_end;     // ELF image end

    // User-mode entry (for exec)
    uint64_t user_rip;           // User entry point
    uint64_t user_rsp;           // User stack pointer
    int is_user;                 // 1 = user mode, 0 = kernel mode

    // File descriptors
    void *fd_table[16];          // Open file handles

    // CFS scheduler fields
    uint64_t vruntime;           // Virtual runtime (for fairness)
    uint64_t sum_exec_runtime;   // Total CPU time used
    uint64_t exec_start;         // When current timeslice started
    int nice;                    // Nice value (-20 to +19)
    uint32_t weight;             // Scheduling weight (derived from nice)

    // Arguments and environment
    int argc;                    // Argument count
    char **argv;                 // Argument vector
    int envc;                    // Environment count
    char **envp;                 // Environment vector

    // Filesystem context
    char cwd[256];               // Current working directory
    int current_slot;            // Active mount slot

    // Linked list
    struct process *next;        // Ready queue link
} process_t;
```

### Process States

| State | Description |
|-------|-------------|
| `PROCESS_STATE_READY` | Ready to run, in scheduler queue |
| `PROCESS_STATE_RUNNING` | Currently executing on CPU |
| `PROCESS_STATE_BLOCKED` | Waiting for I/O or event |
| `PROCESS_STATE_SLEEPING` | Sleeping for time period |
| `PROCESS_STATE_ZOMBIE` | Exited, waiting for parent to reap |
| `PROCESS_STATE_TERMINATED` | Terminated (being cleaned up) |

---

## CPU Context Structure

The `cpu_state_t` structure saves the SysV ABI callee-saved registers (r12-r15, rbx, rbp) plus the instruction pointer, stack pointer, and flags:

```c
typedef struct {
    uint64_t r15;     // +0
    uint64_t r14;     // +8
    uint64_t r13;     // +16
    uint64_t r12;     // +24
    uint64_t rbx;     // +32
    uint64_t rbp;     // +40
    uint64_t rip;     // +48 - Instruction pointer
    uint64_t rsp;     // +56 - Stack pointer
    uint64_t rflags;  // +64 - Flags (includes interrupt enable flag)
} cpu_state_t;
```

This layout **must match** `context_switch.asm` exactly. The `rflags` field preserves the interrupt enable flag across context switches.
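A mismatch between `cpu_state_t` and `context_switch.asm` silently corrupts registers, so one option is to let the compiler check the offsets. This is a sketch under stated assumptions, not code from the ModuOS sources: the `CTX_*` constants are hypothetical names for values that an assembly include file would also have to define, and the checks rely on C11 `static_assert` with `offsetof`.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>  /* uint64_t */

/* Same layout as cpu_state_t above. */
typedef struct {
    uint64_t r15;    /* +0  */
    uint64_t r14;    /* +8  */
    uint64_t r13;    /* +16 */
    uint64_t r12;    /* +24 */
    uint64_t rbx;    /* +32 */
    uint64_t rbp;    /* +40 */
    uint64_t rip;    /* +48 */
    uint64_t rsp;    /* +56 */
    uint64_t rflags; /* +64 */
} cpu_state_t;

/* Hypothetical offset constants; the assembly side would need the same values. */
#define CTX_R15     0
#define CTX_R14     8
#define CTX_R13    16
#define CTX_R12    24
#define CTX_RBX    32
#define CTX_RBP    40
#define CTX_RIP    48
#define CTX_RSP    56
#define CTX_RFLAGS 64
#define CTX_SIZE   72

/* Each check fails at compile time if the C layout drifts from the constants. */
static_assert(offsetof(cpu_state_t, r15)    == CTX_R15,    "r15 offset");
static_assert(offsetof(cpu_state_t, r14)    == CTX_R14,    "r14 offset");
static_assert(offsetof(cpu_state_t, r13)    == CTX_R13,    "r13 offset");
static_assert(offsetof(cpu_state_t, r12)    == CTX_R12,    "r12 offset");
static_assert(offsetof(cpu_state_t, rbx)    == CTX_RBX,    "rbx offset");
static_assert(offsetof(cpu_state_t, rbp)    == CTX_RBP,    "rbp offset");
static_assert(offsetof(cpu_state_t, rip)    == CTX_RIP,    "rip offset");
static_assert(offsetof(cpu_state_t, rsp)    == CTX_RSP,    "rsp offset");
static_assert(offsetof(cpu_state_t, rflags) == CTX_RFLAGS, "rflags offset");
static_assert(sizeof(cpu_state_t)           == CTX_SIZE,   "cpu_state_t size");
```

Keeping the constants in one place shared by both the C and assembly builds means a field reorder breaks the build instead of the running kernel.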
---

## Scheduler: CFS (Completely Fair Scheduler)

ModuOS uses a Linux-inspired CFS algorithm to distribute CPU time fairly.

### Key Concepts

- **vruntime**: Virtual runtime, the CPU time a process has used scaled by its weight
- **Weight**: Derived from the nice value; lower nice means higher weight, which means more CPU time
- **Time slices**: Calculated in proportion to weight
- **Red-black tree simulation**: Instead of a real red-black tree, the ready queue is kept sorted by vruntime (lowest first)

### Nice Values and Weights

| Nice | Weight | Relative weight (vs. nice 0) |
|------|--------|------------------------------|
| -20 | 88761 | ~87x normal |
| -10 | 9548 | ~9x normal |
| 0 | 1024 | Normal (100%) |
| +10 | 110 | ~11% of normal |
| +19 | 15 | ~1.5% of normal |

The actual CPU share depends on which processes compete: two CPU-bound processes at nice 0 and nice +10 split the CPU in the ratio 1024:110, roughly 90% and 10%.

### Scheduling Algorithm

1. **Tick update**: Update the current process's vruntime based on time used and weight
2. **Time slice check**: If the current process has exceeded its fair share, request a reschedule
3. **Schedule**: Pick the process with the lowest vruntime from the ready queue
4. **Enqueue old**: If the old process is still runnable, add it back to the queue sorted by vruntime
5. **Context switch**: Save the old context, load the new context, switch CR3 and TSS

```c
// On each tick: charge time_used to the process, scaled by its weight
vruntime_delta = (time_used * NICE_0_LOAD) / process_weight;
vruntime += vruntime_delta;

// Time slice: this process's weight share of the scheduling latency period
time_slice = (SCHED_LATENCY * process_weight) / total_weight;
if (time_slice < MIN_GRANULARITY)
    time_slice = MIN_GRANULARITY;  // never go below the minimum granularity
```

---

## Process Creation

### Kernel-Mode Processes

Created with `process_create()` or `process_create_with_args()`:

```c
process_t *proc = process_create_with_args("myproc", entry_func, priority, argc, argv);
```

- Allocates a PID and a kernel stack (8KB)
- Uses the global kernel page table (shared CR3)
- Runs in ring 0 with full privileges
- Used for: kernel threads, init process, built-in shell

### User-Mode Processes

Created by the `fork()` and `exec()` syscalls:

1. **fork()**: Duplicates the current process (currently a full copy; copy-on-write would be preferable)
2. **exec()**: Loads an ELF binary and replaces the process image

User processes:

- Run in ring 3 (unprivileged)
- Have a separate page table (per-process CR3)
- Are memory-isolated within user space (`0x0000000000400000` - `0x00007FFFFFFFFFFF`)
- Have their kernel stack (8KB) mapped in both the user and kernel page tables for IRQ handling

---

## Memory Layout

### User-Mode Address Space

| Range | Purpose |
|-------|---------|
| `0x0000000000400000` - `0x00007FFFFFFFFFFF` | User space |
| `0x0000000000400000` - `...` | ELF image (text, data, bss) |
| `0x0000005000000000` - `0x0000005040000000` | Heap (sbrk/brk, 64MB limit) |
| `0x0000006000000000` - `0x0000006010000000` | mmap region (256MB limit) |
| `0x00007FFFFE000000` - `0x00007FFFFFF00000` | User stack (64KB, grows down) |
| `0xFFFF800000000000` - `0xFFFFFFFFFFFFFFFF` | Kernel space (higher half) |

### Kernel-Mode Address Space

Kernel processes use the global kernel page table:

- All kernel code/data in the higher half (`0xFFFF800000000000+`)
- Identity mapping for physical RAM
- MMIO regions mapped via `ioremap()`

---

## Process Lifecycle

### 1. Creation

- `process_create_with_args()` allocates a PID and stacks, and sets up the initial context
- Initial state: `PROCESS_STATE_READY`
- Added to the scheduler ready queue

### 2. Execution

- The scheduler picks the process with the lowest vruntime
- The context switch loads the CPU state, switches CR3, and updates TSS RSP0
- The process runs until it:
  - Exhausts its time slice → preempted
  - Blocks on I/O → `PROCESS_STATE_BLOCKED`
  - Calls `yield()` → gives up the CPU voluntarily
  - Exits → `PROCESS_STATE_ZOMBIE`

### 3. Exit and Zombie State

When a process calls `exit(code)`:

1. State → `PROCESS_STATE_ZOMBIE`
2. The exit code is saved
3. The parent process is notified (if it is waiting in `waitpid()`)
4. The process remains a zombie until the parent calls `waitpid()` to reap it

### 4. Reaping

The parent calls `waitpid(pid, &status, options)`, which:

- Waits for the child to exit
- Retrieves the exit code from the zombie
- Calls `process_destroy()` to free its resources
- Removes the zombie from the process table

If the parent exits before the child, the orphaned zombies are reaped automatically by the scheduler.

---

## Context Switching

Context switching is implemented in assembly (`context_switch.asm`) with a C wrapper:

```c
void context_switch(cpu_state_t *old_state, cpu_state_t *new_state,
                    void *old_fpu_state, void *new_fpu_state);
```

### Steps

1. **Disable interrupts** (`cli`) to prevent IRQs during the switch
2. **Save old context**: Store r15, r14, r13, r12, rbx, rbp, rip, rsp, rflags into `*old_state`
3. **Switch CR3**: Load the new process's page table
4. **Update TSS RSP0**: Set the kernel stack used on syscall/IRQ entry
5. **Lazy FPU**: Set CR0.TS if the new process does not own the FPU
6. **Restore new context**: Load rflags, rsp, rip, rbp, rbx, r12, r13, r14, r15 from `*new_state`
7. **Return**: Jump to the new RIP; the restored rflags re-enable interrupts if they were enabled

---

## Lazy FPU Switching

To avoid saving and restoring FPU state on every context switch, ModuOS uses lazy FPU switching:

1. **On context switch**: Set the CR0.TS (Task Switched) bit if the new process does not own the FPU
2. **First FPU use**: The #NM (Device Not Available) exception fires
3. **Exception handler**:
   - Saves the current FPU state (if another process owns it)
   - Restores the new process's FPU state
   - Clears CR0.TS
   - Returns to user code

This way, FPU state is saved and restored only for processes that actually execute FPU/SSE instructions.
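The ownership logic of lazy FPU switching can be modeled in ordinary C. This is a toy simulation under stated assumptions, not ModuOS code: `hw_fpu` stands in for the live FPU/SSE registers, `cr0_ts` for the CR0.TS bit, and `memcpy` for FXSAVE/FXRSTOR; `task_t`, `on_context_switch()`, `on_device_not_available()`, and `use_fpu()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical task record holding only what the model needs. */
typedef struct {
    int pid;
    unsigned char fpu_state[512];
} task_t;

static unsigned char hw_fpu[512];  /* stand-in for the live FPU/SSE registers */
static bool cr0_ts = false;        /* stand-in for CR0.TS */
static task_t *fpu_owner = NULL;   /* task whose state is currently in hw_fpu */
static int fpu_saves = 0, fpu_restores = 0;

/* Context switch: arm the trap only if the incoming task does not own the FPU. */
static void on_context_switch(task_t *next) {
    cr0_ts = (fpu_owner != next);
}

/* #NM handler: move FPU state only now that the task actually uses the FPU. */
static void on_device_not_available(task_t *current) {
    if (fpu_owner != NULL && fpu_owner != current) {
        memcpy(fpu_owner->fpu_state, hw_fpu, sizeof hw_fpu);  /* like FXSAVE */
        fpu_saves++;
    }
    if (fpu_owner != current) {
        memcpy(hw_fpu, current->fpu_state, sizeof hw_fpu);    /* like FXRSTOR */
        fpu_restores++;
        fpu_owner = current;
    }
    cr0_ts = false;  /* like CLTS */
}

/* An FPU instruction traps first if TS is set, then touches the registers. */
static void use_fpu(task_t *current, unsigned char value) {
    if (cr0_ts)
        on_device_not_available(current);
    hw_fpu[0] = value;
}
```

In the sequence a → b → a, where b never touches the FPU, no save or restore happens on the way back to a; the first save occurs only when b executes an FPU instruction while a still owns the FPU.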
---

## Fork and Exec

### fork() Implementation

```c
int fork(void) {
    // Allocate new PID
    // Duplicate parent process structure
    // Copy page table (full copy for now, COW would be better)
    // Copy user memory (image, heap, stack)
    // Copy FPU state
    // Copy file descriptors
    // Return 0 to child, child PID to parent
}
```

### exec() Implementation

```c
int execve(const char *path, char *argv[], char *envp[]) {
    // Load ELF binary into memory
    // Parse PT_LOAD segments, map into user space
    // Free old user memory regions
    // Set up new stack with argc, argv, envp
    // Set user_rip to ELF entry point
    // Return to user mode at entry point
}
```

---

## Waitpid and Process Groups

### waitpid() Syscall

```c
pid_t waitpid(pid_t pid, int *status, int options);
```

**Arguments:**

- `pid > 0`: Wait for the child with that specific PID
- `pid == 0`: Wait for any child in the caller's process group
- `pid == -1`: Wait for any child
- `pid < -1`: Wait for any child in process group `-pid`

**Behavior:**

- If a matching child is already a zombie: return immediately with its PID and exit status
- Otherwise: block the parent until a matching child exits
- When the child exits: wake the parent and return the child's PID, storing the exit status in `*status`

### Process Groups

Process groups (PGID) let a parent wait for a set of related processes:

- The shell places all processes of a pipeline such as `cat file.txt | grep foo` in one process group
- The parent can call `waitpid(-pgid, ...)` to wait for any member of that group, or `waitpid(0, ...)` to wait for any child in its own group

---

## File Descriptors

Each process has a 16-entry file descriptor table, `fd_table[16]`:

- **0**: stdin
- **1**: stdout
- **2**: stderr
- **3-15**: Open files

File descriptors are inherited across `fork()` but closed/reset on `exec()`.
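The four `pid` cases for `waitpid()` reduce to a single matching predicate that the kernel can apply to each candidate child. A minimal sketch, assuming a hypothetical `proc_ids_t` that holds only the `pid`, `parent_pid`, and `pgid` fields of `process_t`; the actual ModuOS lookup code may be organized differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of process_t used for matching. */
typedef struct {
    uint32_t pid;
    uint32_t parent_pid;
    int32_t  pgid;
} proc_ids_t;

/* Returns true if `child` satisfies the waitpid() pid argument for `parent`. */
static bool waitpid_matches(const proc_ids_t *parent, const proc_ids_t *child,
                            int32_t pid_arg) {
    if (child->parent_pid != parent->pid)
        return false;                              /* only direct children */
    if (pid_arg > 0)
        return child->pid == (uint32_t)pid_arg;    /* one specific PID */
    if (pid_arg == 0)
        return child->pgid == parent->pgid;        /* caller's own group */
    if (pid_arg == -1)
        return true;                               /* any child */
    /* pid_arg < -1: group -pid_arg; widen first so negating cannot overflow */
    return (int64_t)child->pgid == -(int64_t)pid_arg;
}
```

For a pipeline whose processes share PGID 5, `waitpid(-5, ...)` matches either member, while `waitpid(0, ...)` matches neither unless the caller is also in group 5.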
---

## Syscalls

Processes interact with the kernel via syscalls (`INT 0x80`):

| Syscall | Description |
|---------|-------------|
| `SYS_EXIT` | Exit process with status code |
| `SYS_FORK` | Create child process (duplicate) |
| `SYS_EXECVE` | Replace process image with new program |
| `SYS_WAITPID` | Wait for child process to exit |
| `SYS_GETPID` | Get current process ID |
| `SYS_GETPPID` | Get parent process ID |
| `SYS_GETUID` | Get user ID |
| `SYS_SETUID` | Set user ID (requires root) |
| `SYS_GETGID` | Get group ID |
| `SYS_SETGID` | Set group ID (requires root) |
| `SYS_SBRK` | Grow/shrink heap |
| `SYS_MMAP` | Map memory region |
| `SYS_YIELD` | Voluntarily yield CPU |
| `SYS_SLEEP` | Sleep for milliseconds |

---

## Process Table

The global process table `process_table[256]` stores all active processes:

- **PID 0**: Idle process (runs when no other process is ready)
- **PID 1-255**: User/kernel processes
- Protected by an RWLock for concurrent access

---

## Summary

ModuOS process management provides:

- ✅ Modern CFS scheduler for fair CPU distribution
- ✅ User/kernel mode isolation with separate address spaces
- ✅ POSIX-compatible fork/exec/waitpid
- ✅ Lazy FPU switching for performance
- ✅ Per-process file descriptors and filesystem context
- ✅ User identity management (UID/GID)
- ✅ Zombie process handling with proper reaping

This enables ModuOS to run complex multi-process applications with proper isolation and fairness!