Process Management

NtinosTheGamer2324 edited this page Feb 14, 2026 · 2 revisions

ModuOS implements a modern preemptive multitasking system with support for both kernel-mode and user-mode processes, complete with process isolation, scheduler fairness, and POSIX-compatible process lifecycle management.


Overview

The process management subsystem provides:

  • Preemptive multitasking with CFS (Completely Fair Scheduler)
  • User and kernel mode processes with memory isolation
  • POSIX-style process lifecycle (fork, exec, waitpid, exit)
  • Lazy FPU context switching for performance
  • Per-process file descriptors and filesystem context
  • User identity management (UID/GID with supplementary groups)
  • Priority scheduling with nice values (-20 to +19)

Process Structure

Each process is represented by a process_t structure containing:

Core Fields

typedef struct process {
    uint32_t pid;                    // Process ID (0 = idle, 1-255 = user/kernel processes)
    uint32_t parent_pid;             // Parent process ID
    int32_t pgid;                    // Process group ID (for waitpid)
    char name[64];                   // Process name
    
    process_state_t state;           // Current state
    int exit_code;                   // Exit status (when zombie)
    
    // User identity
    uint32_t uid;                    // User ID (0 = mdman/root)
    uint32_t gid;                    // Group ID
    uint16_t groups[32];             // Supplementary groups
    uint8_t group_count;             // Number of supplementary groups
    
    // Scheduling
    cpu_state_t cpu_state;           // Saved register state
    uint8_t fpu_state[512] __attribute__((aligned(16)));  // FPU/SSE state
    
    // Memory management
    uint64_t page_table;             // CR3 value (page table physical address)
    void *kernel_stack;              // Kernel-mode stack (8KB)
    void *user_stack;                // User-mode stack base
    
    // User memory regions
    uint64_t user_stack_top;         // Stack top (grows down)
    uint64_t user_stack_low;         // Lowest mapped stack address
    uint64_t user_stack_limit;       // Maximum stack growth
    
    uint64_t user_heap_base;         // Heap start (for sbrk/brk)
    uint64_t user_heap_end;          // Current heap end
    uint64_t user_heap_limit;        // Maximum heap size
    
    uint64_t user_mmap_base;         // mmap region start
    uint64_t user_mmap_end;          // Current mmap end
    uint64_t user_mmap_limit;        // Maximum mmap size
    
    uint64_t user_image_base;        // ELF image start
    uint64_t user_image_end;         // ELF image end
    
    // User-mode entry (for exec)
    uint64_t user_rip;               // User entry point
    uint64_t user_rsp;               // User stack pointer
    int is_user;                     // 1 = user mode, 0 = kernel mode
    
    // File descriptors
    void *fd_table[16];              // Open file handles
    
    // CFS scheduler fields
    uint64_t vruntime;               // Virtual runtime (for fairness)
    uint64_t sum_exec_runtime;       // Total CPU time used
    uint64_t exec_start;             // When current timeslice started
    int nice;                        // Nice value (-20 to +19)
    uint32_t weight;                 // Scheduling weight (derived from nice)
    
    // Arguments and environment
    int argc;                        // Argument count
    char **argv;                     // Argument vector
    int envc;                        // Environment count
    char **envp;                     // Environment vector
    
    // Filesystem context
    char cwd[256];                   // Current working directory
    int current_slot;                // Active mount slot
    
    // Linked list
    struct process *next;            // Ready queue link
} process_t;

Process States

| State | Description |
|---|---|
| PROCESS_STATE_READY | Ready to run, in scheduler queue |
| PROCESS_STATE_RUNNING | Currently executing on CPU |
| PROCESS_STATE_BLOCKED | Waiting for I/O or event |
| PROCESS_STATE_SLEEPING | Sleeping for time period |
| PROCESS_STATE_ZOMBIE | Exited, waiting for parent to reap |
| PROCESS_STATE_TERMINATED | Terminated (being cleaned up) |

CPU Context Structure

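The cpu_state_t structure holds the registers that must survive a context switch: the callee-saved registers of the System V AMD64 ABI (r15, r14, r13, r12, rbx, rbp) plus rip, rsp, and rflags: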

typedef struct {
    uint64_t r15;      // +0
    uint64_t r14;      // +8
    uint64_t r13;      // +16
    uint64_t r12;      // +24
    uint64_t rbx;      // +32
    uint64_t rbp;      // +40
    uint64_t rip;      // +48  - Instruction pointer
    uint64_t rsp;      // +56  - Stack pointer
    uint64_t rflags;   // +64  - Flags (includes interrupt enable flag)
} cpu_state_t;

This layout must match context_switch.asm exactly. The rflags field preserves the interrupt enable flag across context switches.
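One way to catch drift between the C structure and the assembly is a set of compile-time checks. The sketch below restates the structure and asserts the offsets given in its comments. Using C11 static_assert for this is an assumption; this page does not say how ModuOS enforces the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Restated from the definition above so the sketch is self-contained. */
typedef struct {
    uint64_t r15;
    uint64_t r14;
    uint64_t r13;
    uint64_t r12;
    uint64_t rbx;
    uint64_t rbp;
    uint64_t rip;
    uint64_t rsp;
    uint64_t rflags;
} cpu_state_t;

/* Compilation fails if a field is moved, added, or resized. */
static_assert(offsetof(cpu_state_t, r15)    == 0,  "r15 offset");
static_assert(offsetof(cpu_state_t, r14)    == 8,  "r14 offset");
static_assert(offsetof(cpu_state_t, r13)    == 16, "r13 offset");
static_assert(offsetof(cpu_state_t, r12)    == 24, "r12 offset");
static_assert(offsetof(cpu_state_t, rbx)    == 32, "rbx offset");
static_assert(offsetof(cpu_state_t, rbp)    == 40, "rbp offset");
static_assert(offsetof(cpu_state_t, rip)    == 48, "rip offset");
static_assert(offsetof(cpu_state_t, rsp)    == 56, "rsp offset");
static_assert(offsetof(cpu_state_t, rflags) == 64, "rflags offset");
static_assert(sizeof(cpu_state_t) == 72, "nine 8-byte registers");
```

Because the checks run at compile time, they add no cost to the context-switch path.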


Scheduler: CFS (Completely Fair Scheduler)

ModuOS uses a Linux-inspired CFS algorithm for fair CPU time distribution.

Key Concepts

  • vruntime: Virtual runtime tracking how much CPU time a process has used (adjusted by weight)
  • Weight: Derived from nice value - lower nice = higher weight = more CPU time
  • Time slices: Calculated proportionally based on weight
  • Sorted ready queue: Runnable processes are kept ordered by vruntime (lowest first), standing in for the red-black tree used by Linux CFS
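A ready queue ordered by vruntime can be sketched as a singly linked list with sorted insertion. The names task_t, rq_enqueue, and rq_pick_next are hypothetical, and the tie-breaking rule (equal vruntime keeps arrival order) is an assumption rather than documented ModuOS behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for process_t. */
typedef struct task {
    uint32_t pid;
    uint64_t vruntime;
    struct task *next;   /* ready queue link */
} task_t;

/* Insert t so the list stays in ascending vruntime order.
 * Tasks with equal vruntime keep their arrival order. */
static void rq_enqueue(task_t **head, task_t *t) {
    assert(head != NULL && t != NULL);
    task_t **link = head;
    while (*link != NULL && (*link)->vruntime <= t->vruntime)
        link = &(*link)->next;
    t->next = *link;
    *link = t;
}

/* Remove and return the task with the lowest vruntime, or NULL if empty. */
static task_t *rq_pick_next(task_t **head) {
    assert(head != NULL);
    task_t *t = *head;
    if (t != NULL) {
        *head = t->next;
        t->next = NULL;
    }
    return t;
}
```

Insertion is O(n) and picking the next task is O(1). A red-black tree makes both O(log n), which only starts to matter with many runnable tasks.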

Nice Values and Weights

| Nice | Weight | CPU share relative to nice 0 |
|---|---|---|
| -20 | 88761 | ~87x normal |
| -10 | 9548 | ~9x normal |
| 0 | 1024 | Normal (100%) |
| +10 | 110 | ~11% of normal |
| +19 | 15 | ~1.5% of normal |
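The five weights listed match the corresponding entries of the Linux kernel's nice-to-weight table, so a full lookup can be sketched by assuming ModuOS uses that same 40-entry table. The function name nice_to_weight is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Linux nice-to-weight table, indexed by nice + 20.
 * Assumption: ModuOS uses the same values for the entries this page
 * does not list. Each step changes the weight by roughly 1.25x. */
static const uint32_t nice_weight_table[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* Map a nice value in [-20, 19] to its scheduling weight. */
uint32_t nice_to_weight(int nice) {
    assert(nice >= -20 && nice <= 19);
    return nice_weight_table[nice + 20];
}
```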

Scheduling Algorithm

  1. Tick update: Update current process vruntime based on time used and weight
  2. Time slice check: If current process exceeded its fair share, request reschedule
  3. Schedule: Pick process with lowest vruntime from ready queue
  4. Enqueue old: If old process is still runnable, add back to queue sorted by vruntime
  5. Context switch: Save old context, load new context, switch CR3 and update TSS RSP0

The vruntime update and the time-slice calculation are:

vruntime_delta = (time_used * NICE_0_LOAD) / process_weight;
vruntime += vruntime_delta;

time_slice = (SCHED_LATENCY * process_weight) / total_weight;
if (time_slice < MIN_GRANULARITY) time_slice = MIN_GRANULARITY;
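To make the two formulas concrete, the sketch below wraps them in functions and works in nanoseconds. NICE_0_LOAD is taken as 1024 (the nice 0 weight). The SCHED_LATENCY and MIN_GRANULARITY values are placeholders chosen for illustration, since this page does not state the values ModuOS uses.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD      1024u         /* weight of a nice 0 process */
#define SCHED_LATENCY    20000000ull   /* assumed: 20 ms, in ns */
#define MIN_GRANULARITY  1000000ull    /* assumed: 1 ms, in ns */

/* Virtual time charged for time_used ns of real CPU time.
 * Heavier (lower nice) processes are charged less. */
uint64_t vruntime_delta(uint64_t time_used, uint32_t weight) {
    assert(weight > 0);
    return (time_used * NICE_0_LOAD) / weight;
}

/* Share of the scheduling period for one process, never below the
 * minimum granularity. total_weight is the sum over runnable tasks. */
uint64_t time_slice(uint32_t weight, uint64_t total_weight) {
    assert(weight > 0 && total_weight >= weight);
    uint64_t slice = (SCHED_LATENCY * weight) / total_weight;
    return slice < MIN_GRANULARITY ? MIN_GRANULARITY : slice;
}
```

Under these assumptions, 1 ms of CPU time costs a nice 0 process 1 ms of vruntime, a nice +10 process about 9.3 ms, and a nice -20 process about 0.012 ms. That difference is what makes low-nice processes reach the front of the queue more often.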

Process Creation

Kernel-Mode Processes

Created with process_create() or process_create_with_args():

process_t *proc = process_create_with_args("myproc", entry_func, priority, argc, argv);

  • Allocates PID, kernel stack (8KB)
  • Uses global kernel page table (shared CR3)
  • Runs in ring 0 with full privileges
  • Used for: kernel threads, init process, built-in shell

User-Mode Processes

Created by fork() + exec() syscalls:

  1. fork(): Duplicates the current process. The address space is currently copied in full; copy-on-write is not implemented yet
  2. exec(): Loads ELF binary, replaces process image

User processes:

  • Run in ring 3 (unprivileged)
  • Have separate page table (per-process CR3)
  • Memory isolated: user space spans 0x0000000000400000 - 0x00007FFFFFFFFFFF
  • Kernel stack (8KB) mapped in both user and kernel page tables for IRQ handling

Memory Layout

User-Mode Address Space

| Range | Purpose |
|---|---|
| 0x0000000000400000 - 0x00007FFFFFFFFFFF | User space |
| 0x0000000000400000 - ... | ELF image (text, data, bss) |
| 0x0000005000000000 - 0x0000005040000000 | Heap (sbrk/brk, 64MB limit inside a 1GB window) |
| 0x0000006000000000 - 0x0000006010000000 | mmap region (256MB limit) |
| 0x00007FFFFE000000 - 0x00007FFFFFF00000 | User stack (64KB, grows down inside a 31MB window) |
| 0xFFFF800000000000 - 0xFFFFFFFFFFFFFFFF | Kernel space (higher half) |
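The user/kernel boundaries in this table lend themselves to simple address checks, for example when a syscall receives a pointer from user space. The sketch below uses only the boundaries listed above. The function names and the idea of validating ranges this way are illustrative assumptions, not the documented ModuOS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define USER_SPACE_START    0x0000000000400000ULL
#define USER_SPACE_END      0x00007FFFFFFFFFFFULL   /* inclusive */
#define KERNEL_SPACE_START  0xFFFF800000000000ULL

static_assert(USER_SPACE_END < KERNEL_SPACE_START,
              "user and kernel halves must not overlap");

/* True if addr lies in the user half of the address space. */
bool is_user_addr(uint64_t addr) {
    return addr >= USER_SPACE_START && addr <= USER_SPACE_END;
}

/* True if addr lies in the higher-half kernel region. */
bool is_kernel_addr(uint64_t addr) {
    return addr >= KERNEL_SPACE_START;
}

/* True if [addr, addr + len) lies entirely in user space.
 * Written so that addr + len cannot wrap around. */
bool user_range_ok(uint64_t addr, uint64_t len) {
    if (!is_user_addr(addr))
        return false;
    if (len == 0)
        return true;
    return len - 1 <= USER_SPACE_END - addr;
}
```

Comparing len - 1 against the remaining space avoids computing addr + len, which could overflow for a hostile length.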

Kernel-Mode Address Space

Kernel processes use the global kernel page table:

  • All kernel code/data in higher half (0xFFFF800000000000+)
  • Identity mapping for physical RAM
  • MMIO regions mapped via ioremap()

Process Lifecycle

1. Creation

  • process_create_with_args() allocates PID, stacks, sets up context
  • Initial state: PROCESS_STATE_READY
  • Added to scheduler ready queue

2. Execution

  • Scheduler picks process with lowest vruntime
  • Context switch loads CPU state, switches CR3, updates TSS RSP0
  • Process runs until:
    • Time slice expires → preempted
    • Blocks on I/O → PROCESS_STATE_BLOCKED
    • Calls yield() → gives up the CPU voluntarily
    • Exits → PROCESS_STATE_ZOMBIE

3. Exit and Zombie State

When a process calls exit(code):

  1. State → PROCESS_STATE_ZOMBIE
  2. Exit code saved
  3. Parent process notified (if waiting in waitpid())
  4. Process remains in zombie state until parent calls waitpid() to reap it

4. Reaping

Parent calls waitpid(pid, &status, options):

  • Waits for child to exit
  • Retrieves exit code from zombie
  • Calls process_destroy() to free resources
  • Zombie removed from process table

If parent exits before child:

  • Orphaned zombies are reaped automatically by the scheduler
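The exit and reap steps can be modeled on a small process table. This is a simplified sketch: mini_proc_t, mini_exit, mini_reap, and the PROCESS_STATE_UNUSED marker for a free slot are hypothetical, and blocking, wake-up, and resource cleanup are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    PROCESS_STATE_UNUSED,   /* hypothetical marker for a free slot */
    PROCESS_STATE_READY,
    PROCESS_STATE_ZOMBIE
} process_state_t;

typedef struct {
    uint32_t pid;
    uint32_t parent_pid;
    process_state_t state;
    int exit_code;
} mini_proc_t;

/* exit(code): record the status and become a zombie. */
void mini_exit(mini_proc_t *p, int code) {
    assert(p != NULL);
    p->exit_code = code;
    p->state = PROCESS_STATE_ZOMBIE;
}

/* Non-blocking reap of any child of `parent`.
 * Returns the reaped PID, 0 if children exist but none has exited,
 * or -1 if `parent` has no children at all. */
int mini_reap(mini_proc_t table[], size_t n, uint32_t parent, int *status) {
    assert(table != NULL && status != NULL);
    int has_child = 0;
    for (size_t i = 0; i < n; i++) {
        mini_proc_t *p = &table[i];
        if (p->state == PROCESS_STATE_UNUSED || p->parent_pid != parent)
            continue;
        has_child = 1;
        if (p->state == PROCESS_STATE_ZOMBIE) {
            *status = p->exit_code;
            p->state = PROCESS_STATE_UNUSED;   /* slot is free again */
            return (int)p->pid;
        }
    }
    return has_child ? 0 : -1;
}
```

A blocking waitpid() would put the parent to sleep in the "children exist but none has exited" case and retry once a child calls exit.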

Context Switching

Context switching is implemented in assembly (context_switch.asm) and called from C through this prototype:

void context_switch(cpu_state_t *old_state, cpu_state_t *new_state,
                   void *old_fpu_state, void *new_fpu_state);

Steps

  1. Disable interrupts (cli) to prevent IRQs during switch
  2. Save old context: Push r15, r14, r13, r12, rbx, rbp, rip, rsp, rflags
  3. Switch CR3: Load new process page table
  4. Update TSS RSP0: Set kernel stack for syscall/IRQ entry
  5. Lazy FPU: Set TS flag if FPU not owned by new process
  6. Restore new context: Pop rflags, rsp, rip, rbp, rbx, r12, r13, r14, r15
  7. Return: Jump to new RIP with interrupts restored by rflags

Lazy FPU Switching

To avoid expensive FPU state saves on every context switch, ModuOS uses lazy FPU:

  1. On context switch: Set the CR0.TS (Task Switched) bit if the incoming process does not own the FPU
  2. First FPU use: #NM (Device Not Available) exception fires
  3. Exception handler:
    • Save current FPU state (if another process owns it)
    • Restore new process FPU state
    • Clear CR0.TS
    • Return to user code

This way FPU state is only saved/restored for processes that actually use FPU instructions.
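The ownership logic can be simulated in plain C without touching real control registers. In this sketch an int stands in for the 512-byte FPU state and a bool stands in for CR0.TS; all type and function names are hypothetical. A real handler would save and restore the state with FXSAVE/FXRSTOR and clear the flag with CLTS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int pid;
    int saved_fpu;        /* stands in for the per-process fpu_state buffer */
} fpu_proc_t;

typedef struct {
    fpu_proc_t *owner;    /* process whose state is live in the FPU */
    int live_fpu;         /* stands in for the FPU registers */
    bool ts;              /* simulated CR0.TS */
    unsigned saves;       /* number of state saves performed */
} fpu_sim_t;

/* Context switch: trap on the next FPU use unless `next` owns the FPU. */
void fpu_on_switch(fpu_sim_t *cpu, fpu_proc_t *next) {
    cpu->ts = (cpu->owner != next);
}

/* #NM handler: save the previous owner's state, load ours, clear TS. */
void fpu_on_nm(fpu_sim_t *cpu, fpu_proc_t *current) {
    assert(cpu->ts);
    if (cpu->owner != NULL && cpu->owner != current) {
        cpu->owner->saved_fpu = cpu->live_fpu;
        cpu->saves++;
    }
    cpu->live_fpu = current->saved_fpu;
    cpu->owner = current;
    cpu->ts = false;
}

/* An FPU instruction executed by `current`; faults first if TS is set. */
void fpu_use(fpu_sim_t *cpu, fpu_proc_t *current, int value) {
    if (cpu->ts)
        fpu_on_nm(cpu, current);
    cpu->live_fpu = value;
}
```

In this model a process that never calls fpu_use never triggers a save or restore, and switching back to the current owner costs nothing.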


Fork and Exec

fork() Implementation

int fork(void) {
    // Allocate new PID
    // Duplicate parent process structure
    // Copy page table (full copy for now, COW would be better)
    // Copy user memory (image, heap, stack)
    // Copy FPU state
    // Copy file descriptors
    // Return 0 to child, child PID to parent
}

exec() Implementation

int execve(const char *path, char *argv[], char *envp[]) {
    // Load ELF binary into memory
    // Parse PT_LOAD segments, map into user space
    // Free old user memory regions
    // Set up new stack with argc, argv, envp
    // Set user_rip to ELF entry point
    // Return to user mode at entry point
}
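From user space, the two calls are normally combined with waitpid() into a spawn-and-wait pattern. The sketch below uses the standard POSIX functions on a POSIX host. Whether ModuOS exposes them under exactly these names and headers is an assumption, and run_and_wait is a hypothetical helper.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run the program at `path` with `argv` and wait for it to finish.
 * Returns the child's exit status, or -1 on failure or abnormal exit. */
int run_and_wait(const char *path, char *const argv[]) {
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */

    if (pid == 0) {
        /* Child: replace the process image. execve only returns on error. */
        execve(path, argv, environ);
        _exit(127);
    }

    /* Parent: block until this specific child exits, then reap it. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls _exit() rather than exit() after a failed execve so that it does not flush stdio buffers inherited from the parent.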

Waitpid and Process Groups

waitpid() Syscall

pid_t waitpid(pid_t pid, int *status, int options);

Arguments:

  • pid > 0: Wait for specific PID
  • pid == 0: Wait for any child in same process group
  • pid == -1: Wait for any child
  • pid < -1: Wait for any child in process group -pid
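The four cases reduce to a small predicate that decides whether a given child satisfies the pid argument. The function name and parameter layout are hypothetical; the sketch assumes the caller only passes its own children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does the child (child_pid, child_pgid) match the waitpid() pid argument
 * of a caller whose process group is caller_pgid? */
bool waitpid_matches(int32_t pid_arg, int32_t caller_pgid,
                     int32_t child_pid, int32_t child_pgid) {
    assert(child_pid > 0);
    if (pid_arg > 0)
        return child_pid == pid_arg;       /* one specific child */
    if (pid_arg == 0)
        return child_pgid == caller_pgid;  /* any child in caller's group */
    if (pid_arg == -1)
        return true;                       /* any child */
    /* pid_arg < -1: any child in group -pid_arg. Widen before negating
     * so that INT32_MIN does not overflow. */
    return (int64_t)child_pgid == -(int64_t)pid_arg;
}
```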

Behavior:

  • If child is already zombie: return immediately with exit code
  • Otherwise: block parent until child exits
  • When child exits: wake parent, return PID and status

Process Groups

Process groups (PGID) allow waiting for groups of related processes:

  • Shell sets PGID for pipeline: cat file.txt | grep foo
  • Parent can waitpid(0, ...) to wait for any in group

File Descriptors

Each process has a file descriptor table fd_table[16]:

  • 0: stdin
  • 1: stdout
  • 2: stderr
  • 3-15: Open files

File descriptors are inherited across fork() but closed/reset on exec().
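A fixed 16-entry table makes descriptor allocation a linear scan. The sketch below assumes the POSIX convention of handing out the lowest free index and models fork() inheritance as a shallow copy of the table. Both choices, and the function names, are assumptions about ModuOS rather than documented behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FD_TABLE_SIZE 16

/* Store `handle` in the lowest free slot.
 * Returns the descriptor number, or -1 if the table is full. */
int fd_alloc(void *fd_table[FD_TABLE_SIZE], void *handle) {
    assert(fd_table != NULL && handle != NULL);
    for (int fd = 0; fd < FD_TABLE_SIZE; fd++) {
        if (fd_table[fd] == NULL) {
            fd_table[fd] = handle;
            return fd;
        }
    }
    return -1;
}

/* fork(): the child starts with the same handles as the parent.
 * This is a shallow copy; a real kernel would also bump reference counts. */
void fd_inherit(void *child[FD_TABLE_SIZE], void *const parent[FD_TABLE_SIZE]) {
    assert(child != NULL && parent != NULL);
    memcpy(child, parent, FD_TABLE_SIZE * sizeof(void *));
}
```

With stdin, stdout, and stderr occupying slots 0 to 2, the first file a process opens gets descriptor 3, and the fourteenth open without a close fails.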


Syscalls

Processes interact with the kernel via syscalls (INT 0x80):

| Syscall | Description |
|---|---|
| SYS_EXIT | Exit process with status code |
| SYS_FORK | Create child process (duplicate) |
| SYS_EXECVE | Replace process image with new program |
| SYS_WAITPID | Wait for child process to exit |
| SYS_GETPID | Get current process ID |
| SYS_GETPPID | Get parent process ID |
| SYS_GETUID | Get user ID |
| SYS_SETUID | Set user ID (requires root) |
| SYS_GETGID | Get group ID |
| SYS_SETGID | Set group ID (requires root) |
| SYS_SBRK | Grow/shrink heap |
| SYS_MMAP | Map memory region |
| SYS_YIELD | Voluntarily yield CPU |
| SYS_SLEEP | Sleep for milliseconds |

Process Table

The global process table process_table[256] stores all active processes:

  • PID 0: Idle process (runs when no other process is ready)
  • PID 1-255: User/kernel processes
  • Protected by RWLock for concurrent access
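PID allocation over such a table can be sketched as a scan for the first free slot, skipping PID 0. The sketch assumes the table is an array of pointers indexed by PID with NULL marking a free slot, and that the caller already holds the write side of the lock. Neither detail is confirmed by this page, and pid_alloc is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCESSES 256

/* Return the lowest free PID in 1..255, or -1 if the table is full.
 * PID 0 is reserved for the idle process and is never handed out.
 * Assumption: the caller holds the process table write lock. */
int pid_alloc(void *const process_table[MAX_PROCESSES]) {
    assert(process_table != NULL);
    for (int pid = 1; pid < MAX_PROCESSES; pid++) {
        if (process_table[pid] == NULL)
            return pid;
    }
    return -1;
}
```

Reusing the lowest free PID keeps the scan simple, at the cost of recycling PIDs quickly after a process is reaped.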

Summary

ModuOS process management provides:

✅ Modern CFS scheduler for fair CPU distribution
✅ User/kernel mode isolation with separate address spaces
✅ POSIX-compatible fork/exec/waitpid
✅ Lazy FPU switching for performance
✅ Per-process file descriptors and filesystem context
✅ User identity management (UID/GID)
✅ Zombie process handling with proper reaping

This enables ModuOS to run complex multi-process applications with proper isolation and fairness!
