# **Virtualization**

## **"The Abstraction: The Process."**

The **process** is the fundamental abstraction: a running program. The OS provides the illusion of many **virtual CPUs** on a few physical CPUs through **time sharing**; **mechanisms** are low-level (e.g., context switch), and **policies** decide which process to run.
The **machine state** of a process includes its **address space** (code, static data, heap, stack), **registers** (PC/IP, SP, frame pointer), and **I/O information** (open files).

## Process API (Typical Interface)
* **create** (start a new process)
* **destroy/kill** (forcibly stop a process)
* **wait** (wait for termination)
* Various controls (**suspend/resume**)
* **status** (CPU time, state)
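
As a rough illustration, the operations above map onto Python's `subprocess` and `signal` modules as sketched below. The function name `api_tour` is made up for this example, and the suspend/resume step assumes a POSIX system, since `SIGSTOP` and `SIGCONT` are not available on Windows.

```python
import os
import signal
import subprocess
import sys

def api_tour():
    """Walk one child through create, status, suspend/resume, destroy, wait."""
    # create: start a child that would otherwise sleep for 30 seconds
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

    # status: poll() returns None while the child has not terminated
    alive = child.poll() is None

    # suspend / resume: POSIX job-control signals (assumption: not on Windows)
    os.kill(child.pid, signal.SIGSTOP)
    os.kill(child.pid, signal.SIGCONT)

    # destroy: on POSIX, kill() sends SIGKILL
    child.kill()

    # wait: block until the child is gone and collect its return code
    return alive, child.wait()

alive, returncode = api_tour()
print(alive, returncode)
```

On POSIX the return code of a child killed by a signal is the negated signal number, so the second value identifies `SIGKILL`.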

## From Executable to Process
The OS loads the code and static data into memory (**eager** or **lazy**), allocates and initializes the **stack** (e.g., *argc/argv*), prepares the **heap** (for *malloc()*), sets up the standard **file descriptors** (*stdin/out/err*), and starts execution at *main()*.
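
Two of these setup steps can be observed from inside a freshly started process. The sketch below launches a second Python interpreter and asks it to report its arguments and the descriptor numbers of its standard streams; `inspect_new_process` and `child_code` are names invented for this example.

```python
import subprocess
import sys

# Code run by the child: report its arguments and standard descriptor numbers.
child_code = (
    "import sys\n"
    "print(sys.argv[1:])\n"
    "print(sys.stdin.fileno(), sys.stdout.fileno(), sys.stderr.fileno())\n"
)

def inspect_new_process(*args):
    """Launch a fresh interpreter and return what it reports about itself."""
    result = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        stdin=subprocess.DEVNULL,   # give the child a real descriptor 0
        capture_output=True,        # pipes become descriptors 1 and 2
        text=True,
        check=True,
    )
    return result.stdout.splitlines()

print(inspect_new_process("alpha", "beta"))
```

The child sees the arguments it was started with and finds its standard streams already open on descriptors 0, 1, and 2, before any of its own code has opened a file.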

## Process States
* **Running** (on the CPU)
* **Ready** (ready but not executing)
* **Blocked** (waiting for an event, e.g., I/O)

**Transitions**: **schedule** (ready → running), **deschedule** (running → ready); starting I/O: running → blocked; upon I/O completion: blocked → ready. The chapter's examples show how the OS keeps **CPU utilization** high by running another process while one is blocked on I/O.
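
The three states and their legal transitions fit in a small lookup table. This is a minimal sketch: the event names (`schedule`, `deschedule`, `io_start`, `io_done`) are labels chosen for the example, not identifiers from any real kernel.

```python
from enum import Enum

class State(Enum):
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"

# Legal transitions, keyed by (current state, event).
TRANSITIONS = {
    (State.READY, "schedule"): State.RUNNING,
    (State.RUNNING, "deschedule"): State.READY,
    (State.RUNNING, "io_start"): State.BLOCKED,
    (State.BLOCKED, "io_done"): State.READY,
}

def step(state, event):
    """Return the next state, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state.name}") from None

def run_events(events, start=State.READY):
    """Apply a sequence of events and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state

print(run_events(["schedule", "io_start", "io_done"]).name)
```

Note that there is no entry from blocked straight to running: a process whose I/O completes must pass through ready and be scheduled again.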

## OS Data Structures
The OS uses a **process list** with a **PCB** (Process Control Block) for each process. In *xv6*, the *struct proc* contains: register context for the context switch, state, PID, pointers to open files, current directory, trapframe, etc. xv6 tracks more states than the three above: **UNUSED/EMBRYO/SLEEPING/RUNNABLE/RUNNING/ZOMBIE**. **ZOMBIE** marks a process that has exited but has not yet been cleaned up, which lets the parent call *wait()* and collect the exit code.
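
To make the idea concrete, here is a toy process list in Python. It is loosely modeled on the fields listed above and is not the xv6 definition; the names `PCB`, `create`, `exit_process`, and `reap` are invented for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional

class ProcState(Enum):
    UNUSED = auto()
    EMBRYO = auto()
    SLEEPING = auto()
    RUNNABLE = auto()
    RUNNING = auto()
    ZOMBIE = auto()

@dataclass
class PCB:
    pid: int
    parent: Optional[int] = None                     # parent's PID
    state: ProcState = ProcState.EMBRYO
    context: dict = field(default_factory=dict)      # saved registers
    open_files: list = field(default_factory=list)
    cwd: str = "/"
    exit_code: Optional[int] = None

process_list = {}  # pid -> PCB

def create(pid, parent=None):
    """Register a new process and mark it ready to run."""
    pcb = PCB(pid=pid, parent=parent)
    pcb.state = ProcState.RUNNABLE
    process_list[pid] = pcb
    return pcb

def exit_process(pid, code):
    """The process is done, but its PCB stays until the parent collects it."""
    pcb = process_list[pid]
    pcb.state = ProcState.ZOMBIE
    pcb.exit_code = code

def reap(parent_pid):
    """Toy wait(): remove one zombie child and return (pid, exit_code), else None."""
    for pcb in list(process_list.values()):
        if pcb.parent == parent_pid and pcb.state is ProcState.ZOMBIE:
            del process_list[pcb.pid]
            return pcb.pid, pcb.exit_code
    return None
```

The zombie entry exists only so that `reap()` still has somewhere to read the exit code from; once the parent has collected it, the entry is dropped.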

## Key Principles
* **Separation of policy/mechanism** (changing the scheduling algorithm without touching the context switch)
* **Time sharing vs. space sharing** (CPU is shared over time; disk is typically shared over space).
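
The first principle can be sketched in a few lines: the switching code stays fixed while the policy is passed in as a function. Everything here (`context_switch`, `fifo_policy`, `shortest_job_policy`, `run_all`, and the job tuples) is hypothetical and exists only to show the shape of the separation.

```python
def context_switch(current, nxt, log):
    """Mechanism: record the switch and hand the CPU to `nxt`. Knows no policy."""
    log.append((current, nxt))
    return nxt

def fifo_policy(ready):
    """Policy: pick whichever job arrived first."""
    return ready[0]

def shortest_job_policy(ready):
    """Policy: pick the job with the smallest run time."""
    return min(ready, key=lambda job: job[1])

def run_all(jobs, policy):
    """Run every (name, run_time) job to completion in the order the policy picks."""
    ready = list(jobs)
    order, log, current = [], [], None
    while ready:
        chosen = policy(ready)
        ready.remove(chosen)
        current = context_switch(current, chosen[0], log)
        order.append(current)
    return order

jobs = [("A", 8), ("B", 2), ("C", 5)]
print(run_all(jobs, fifo_policy))
print(run_all(jobs, shortest_job_policy))
```

Swapping `fifo_policy` for `shortest_job_policy` changes the order in which jobs run without touching `context_switch` at all.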

In summary, the chapter defines what a process is, how the OS creates and manages it, which states it transitions through, and what data the OS maintains, laying the groundwork for understanding **CPU virtualization** and **scheduling** in subsequent chapters.

---

## Thread Scheduling Explained Simply

| Role | Analogy |
| :--- | :--- |
| **CPU** | The **Stage** |
| **Threads** | The **Actors** |
| **Scheduler** | The **Director** |
| **Ready Queue** | The line of ready actors |

* **Stack:** The actor's open **script** (the functions called). This is used *by* the thread, but doesn't decide the running order.

### Thread States

1.  **Running:** Currently using the CPU (on the Stage).
2.  **Ready:** Ready to run (in the line), waiting for the CPU.
3.  **Blocked/Waiting:** Waiting for something (I/O, a lock, a timer, a GUI event).

### Typical Transitions

* **Running $\rightarrow$ Ready:** Time slice (quantum) ends or the thread voluntarily **yields**.
* **Running $\rightarrow$ Blocked:** Needs to wait (e.g., I/O request, needs a lock, calls `sleep()`).
* **Blocked $\rightarrow$ Ready:** The waiting event is complete (I/O finishes, lock is released).

### Quick Examples

* **Typing in VS Code:** The editor is **Blocked** waiting for input. Keypress arrives $\rightarrow$ Ready $\rightarrow$ Running, updates the screen, and goes back to **Blocked**.
* **Saving a File:** The `write()` call may block: Running $\rightarrow$ Blocked. When the I/O finishes: Blocked $\rightarrow$ Ready.
* **Parent calls `wait()`:** The parent process is **Blocked** until its child process terminates.
* **Multi-core:** Multiple actors can be **Running** simultaneously (one per core).

**In Summary:** The **stack** tracks *where* you are in the code; the **ready queue** and **scheduler** decide *who* runs and *when*.
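
A toy round-robin simulation shows the ready queue and the quantum at work. It is a simplified sketch: the Blocked state is left out, time is counted in abstract units, and `round_robin` is a name made up for this example.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate round-robin scheduling.

    burst_times maps a thread name to the time units it still needs.
    Returns the timeline as a list of (name, units_run) slices.
    """
    ready = deque(burst_times.items())   # the line of ready actors
    timeline = []
    while ready:
        name, remaining = ready.popleft()        # Ready -> Running
        ran = min(quantum, remaining)
        timeline.append((name, ran))
        remaining -= ran
        if remaining > 0:
            ready.append((name, remaining))      # quantum ended: Running -> Ready
    return timeline

print(round_robin({"A": 3, "B": 5, "C": 2}, quantum=2))
```

With a quantum of 2, thread A needs two turns on the stage and thread B needs three, while C finishes in its first slice.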

---

# **What is a process?**

A **process** is not "an executing resource," but a **running instance** of a program with its **execution context**: memory (stack/heap), registers, file descriptors, environment variables, permissions, state, etc. It is the unit that the operating system schedules on the CPU.

# `fork()`, `exec()`, `wait()` — what are they?

They are **POSIX system calls**, i.e., functions exposed by the **operating system** that a process can invoke.

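* **`fork()`** – *duplicates the process*
  Creates a **child** that is almost identical to the **parent**: same code, with its own copy of the parent's context. The return value differs: the parent receives the **child's PID**, the child receives **0** (on failure, the parent receives **-1** and no child is created). Side effect: without coordination, the order in which parent and child print or make progress is **non-deterministic**.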

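* **`exec()`** – *replaces the program in the process*
  Does not create a new process: it **replaces** the code and memory segments of the **current process** with a **new program** (executable). `exec()` is shorthand for a family of calls (`execve`, `execvp`, and others). On success it **does not return**, because execution continues inside the new program; it returns only on failure.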

* **`wait()` / `waitpid()`** – *synchronizes and cleans up*
  The **parent** blocks until a **child** exits. This makes the order **deterministic** (the parent continues only after the child has finished) and “collects” the child's exit status, so the child does not linger as a **zombie**.
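
Python exposes these calls almost directly through the `os` module on POSIX systems. The sketch below shows the two return values of `fork()` and the parent collecting the child's exit status; `fork_and_wait` is a name chosen for this example, and the code assumes a platform where `os.fork()` exists (so not Windows).

```python
import os

def fork_and_wait(exit_code=7):
    """Fork a child that exits with `exit_code`; return what the parent observes."""
    if not hasattr(os, "fork"):
        raise RuntimeError("os.fork() requires a POSIX system")

    pid = os.fork()
    if pid == 0:
        # Child branch: fork() returned 0. Exit at once, skipping Python cleanup.
        os._exit(exit_code)

    # Parent branch: fork() returned the child's PID.
    reaped_pid, status = os.waitpid(pid, 0)      # blocks until the child exits
    exited_normally = os.WIFEXITED(status)
    code = os.WEXITSTATUS(status) if exited_normally else None
    return reaped_pid == pid, code

print(fork_and_wait(7))
```

Because the parent calls `waitpid()` on that specific PID, the child is reaped as soon as it exits and never remains a zombie.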

# Why separate `fork()` and `exec()`

To make room for the **setup** between the two steps. This is the heart of how **shells** (bash, zsh) work:

1. The shell calls `fork()`, so it now has a **child** it can “tweak” without affecting itself.
2. In the child, it makes the **preparations**: redirections (`dup2` onto `STDIN/STDOUT/STDERR`), environment variables, pipe management, `chdir`, `setpgid`, resource limits, etc.
3. The child then calls `exec()` to **load the actual command** (e.g., `ls`, `python`, `ffmpeg`).
4. In the **parent** (the shell), `wait()` blocks until that child finishes and returns its exit code.

This separation allows:

* **Redirections**: `cmd > out.txt` (the child points `STDOUT` to the file, then `exec()`).
* **Pipe**: `A | B` (the shell creates a pipe, connecting `stdout` of A to `stdin` of B between `fork()` and `exec()`).
* **Isolation**: changes happen only in the child; the shell stays clean for the next command.
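
The redirection case can be sketched with the low-level `os` calls, roughly the way a shell would handle `cmd > out.txt`. This is an illustration under assumptions, not how any particular shell is implemented: it needs a POSIX system, `run_redirected` is a made-up name, and exit code 127 for a failed launch follows the usual shell convention for "command not found".

```python
import os
import sys
import tempfile

def run_redirected(argv, out_path):
    """Run argv with stdout sent to out_path, roughly like `cmd > out_path`."""
    pid = os.fork()
    if pid == 0:
        # Child: do the setup between fork() and exec().
        try:
            fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)             # descriptor 1 (stdout) now refers to the file
            os.close(fd)
            os.execvp(argv[0], argv)   # replaces this process; returns only on failure
        finally:
            os._exit(127)              # reached only if setup or exec failed

    # Parent: its own stdout is untouched; wait for the child and read its status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.txt")
    code = run_redirected([sys.executable, "-c", "print('hello')"], target)
    with open(target) as f:
        print(code, repr(f.read()))
```

The `dup2()` call happens in the child only, which is the isolation point from the list above: the parent keeps printing to its original stdout.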

These are system calls that a process invokes. They are not "process functions" in the OO sense; they are kernel APIs for creating, replacing, and synchronizing processes.

# Mini mental flow (one line)

`fork()` (child is created) → [setup in child] → `exec()` (runs real program) → `wait()` in parent (synchronizes and reads exit code).


1. `fork()` = creates a worker (the child).

2. Between the two steps = the worker gets the right tools (redirections, pipes, environment).

3. `exec()` = the worker becomes the specialist who does the job (the actual program).

4. `wait()` = the boss waits for the report (or, for a background job, moves on to the next order and collects the report later).

Starting a program = creating a new instance (process) that loads and runs that program; this instance is what actually does the work, not the original file on disk.


---


## *UNIX-style process patterns*

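```python
# 1) “fork + wait” feel with multiprocessing
import os, time
from multiprocessing import Process

def child_job(tag="child"):
    print(f"[{tag}] pid={os.getpid()} | parent={os.getppid()}")
    time.sleep(0.2)

if __name__ == "__main__":
    print(f"[parent] pid={os.getpid()}")
    p = Process(target=child_job, kwargs={"tag": "child"})
    p.start()                  # conceptual “fork”
    p.join()                   # conceptual “wait”
    print("[parent] child finished, continue...")
```

```
[parent] pid=13548
[parent] child finished, continue...
```

The child's `[child] pid=...` line is missing from this captured output. The run appears to come from a notebook on Windows, where `multiprocessing` starts children with the spawn method and a function defined in an interactive session generally cannot be imported by the child. Saved as a script and run from a terminal, the child's line should appear between the two parent lines.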


```python
# 2) “exec & wait” with subprocess.run
import sys, subprocess

code = 'import os; print(f"[child proc] hello from pid={os.getpid()}")'
result = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
print(result.stdout, end="")           # child’s stdout
print(f"[parent] exit code: {result.returncode}")
```

```
[child proc] hello from pid=5396
[parent] exit code: 0
```


```python
# 3) Redirection: STDOUT → file, then execute
import sys, subprocess, pathlib

out = pathlib.Path("out.txt")
code = 'print("hello redirected"); print("line 2")'
with out.open("w", encoding="utf-8") as f:
    subprocess.run([sys.executable, "-c", code], stdout=f, text=True)

print("[parent] wrote to out.txt:")
print(out.read_text(encoding="utf-8"))
```

```
[parent] wrote to out.txt:
hello redirected
line 2
```



```python
# 4) Pipe between two processes (producer | consumer)
import sys, subprocess

producer = 'for i in range(5): print(f"num:{i}")'
consumer = r'''import sys
data = sys.stdin.read().strip().splitlines()
print("lines:", len(data))
print("first:", data[0] if data else "<empty>")'''

p1 = subprocess.Popen([sys.executable, "-c", producer], stdout=subprocess.PIPE, text=True)
p2 = subprocess.Popen([sys.executable, "-c", consumer], stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
p1.stdout.close()  # important: drop the parent's copy so p2 sees EOF when p1 exits
out, _ = p2.communicate()
p1.wait()          # reap the producer as well
print(out, end="")
```

```
lines: 5
first: num:0
```


```python
# 5) Tweak the environment before launch (like between fork & exec)
import os, sys, subprocess

env = os.environ.copy()        # start from the parent's environment
env["MY_VAR"] = "42"           # the change is visible only to the child
code = 'import os; print("MY_VAR =", os.getenv("MY_VAR"))'
result = subprocess.run([sys.executable, "-c", code], env=env, capture_output=True, text=True)
print(result.stdout, end="")   # show the child's stdout in the parent
```

```
MY_VAR = 42
```

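```python
# 6) Politely terminate a long-running child
import sys, subprocess, time

long_code = 'import time; [time.sleep(0.5) for _ in range(10)]'
p = subprocess.Popen([sys.executable, "-c", long_code])
time.sleep(1.2)        # child is “working”
p.terminate()          # SIGTERM on UNIX; TerminateProcess on Windows
rc = p.wait()
print("terminated with code:", rc)
```

```
terminated with code: 1
```

The value `1` is what Windows reports after `TerminateProcess`. On UNIX the same code reports `-15`, because `returncode` is the negated signal number when a child is killed by a signal.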


---

# Mini-Glossary of libraries used

* **`os`**
  Low-level OS info/utilities (PIDs, parent PID, environment). Used for `os.getpid()`, `os.getppid()`, `os.environ`.

* **`time`**
  Simple sleeping/timing to simulate work and ordering (`time.sleep`).

* **`multiprocessing`**
  High-level process API that feels like `fork()` + `wait()` but portable; `Process.start()` creates a child, `Process.join()` waits.

* **`sys`**
  Access to the current Python executable (`sys.executable`) and standard streams; helpful when spawning another Python.

* **`subprocess`**
  Spawn and control external processes (or another Python). `run()` waits and returns a result; `Popen()` gives fine-grained control, pipes, `communicate()`, and `terminate()`.

* **`pathlib`**
  Comfortable, cross-platform path handling and file I/O (`Path("out.txt").open(...)`, `read_text()`).


