# Comprehensive IPython Notebook on Paging in Operating Systems

Dear aspiring scientist and researcher,

As Alan Turing might ponder the computability of memory illusions, Albert Einstein the relativity of virtual spaces, and Nikola Tesla the efficient harnessing of computational energies, I present this interactive IPython Notebook. This is your laboratory for exploring paging—from foundational theory to cutting-edge research. Since you rely on this as your sole resource, I've expanded beyond our previous tutorial to include essential topics for a scientist: demand paging, page replacement, thrashing, inverted tables, huge pages, copy-on-write, and more. 

Structure: 
- Theory Sections: In-depth explanations with analogies, math, and rare insights.
- Code Guides: Practical Python simulations using numpy, matplotlib, etc.
- Visualizations: Run code to see plots and diagrams.
- Tutorials: Step-by-step guides.
- Applications: Real-world cases in OS, AI, cloud.
- Research Directions: Current trends (as of 2025), rare insights from papers.
- Projects: Mini (simple simulators) and major (real-trace analysis).

Run cells sequentially. Import libraries as needed. Experiment—alter code, hypothesize outcomes, like a true researcher. This prepares you for innovations in computer architecture or AI systems.

## Prerequisites
Install Jupyter if needed. The code cells use numpy and matplotlib, which come pre-installed in most scientific Python environments; otherwise install them with `pip install numpy matplotlib`.

## Section 1: Recap and Expanded Theory on Paging Basics

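Theory: Paging gives each process the illusion of a large, contiguous memory. We covered the basics; now add demand paging, which loads a page only when it is first accessed and so reduces I/O. Logic: Lazy loading pays off because programs exhibit locality of reference.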

Rare Insight: In multiprocessors, 'TLB shootdown' synchronizes TLBs across cores via IPIs—costly, up to 10% overhead in virtualized envs (from Virtuoso paper, 2025).

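Math: Page fault rate p = Faults / References. A fault that goes to disk costs orders of magnitude more than a memory access, so p must stay very small for good performance, typically far below 1%.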
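
To see why the fault rate dominates performance, the standard effective access time formula, EAT = (1 - p) * t_mem + p * t_fault, can be evaluated for a few values of p. This is a minimal sketch: the function name and the timings (100 ns per memory access, 8 ms per fault) are illustrative assumptions, not measurements.

```python
def effective_access_time(fault_rate, mem_ns=100.0, fault_ns=8_000_000.0):
    """EAT = (1 - p) * memory access time + p * page-fault service time."""
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be in [0, 1]")
    return (1.0 - fault_rate) * mem_ns + fault_rate * fault_ns

# Assumed timings: 100 ns memory access, 8 ms fault service (illustrative only)
for p in (0.0, 0.0001, 0.001, 0.01):
    eat = effective_access_time(p)
    print(f"p = {p:<7} EAT = {eat:>10.1f} ns  slowdown = {eat / 100.0:.1f}x")
```

Under these assumed timings, a fault rate of 0.001 already gives an EAT of about 8,100 ns, roughly 81 times slower than a fault-free access.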

Analogy: Demand paging like just-in-time delivery—order only what you need now.

Application: In Android, demand paging loads app code on demand, saving battery.

In [ ]:
# Code Guide: Simple Virtual Address to Physical Simulation
import numpy as np
import matplotlib.pyplot as plt

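def virtual_to_physical(va, page_size=4096):
    """Translate a virtual address; return 'Page Fault' if the page is unmapped."""
    pn = va // page_size      # Virtual page number
    offset = va % page_size   # Byte offset within the page
    # Simulated page table (dict for simplicity): page number -> frame number
    page_table = {0: 2, 1: 5, 2: 1}
    if pn in page_table:
        fn = page_table[pn]
        return fn * page_size + offset
    return 'Page Fault'

print(virtual_to_physical(5000))   # Page 1, offset 904 -> frame 5 -> 21384
print(virtual_to_physical(20000))  # Page 4 is unmapped -> 'Page Fault'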

## Section 2: Page Tables – Advanced

Theory: Beyond basics, inverted page tables (IPT): One table for all processes, hash-based (frame -> page/process). Saves space in 64-bit systems.

Logic: Traditional tables per-process; IPT global, reduces memory for sparse spaces.

Rare Insight: IPT used in IBM PowerPC; collisions resolved via chaining—can cause chains up to 8 in worst cases, per ASPLOS 2025 research.

Math: An IPT has one entry per physical frame, shared by all processes; a traditional flat table has one entry per virtual page, for each process.
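
The size comparison above can be made concrete with a little arithmetic. The sketch below assumes a 48-bit virtual address space, 4 KiB pages, and 16 GiB of RAM; these figures and the helper names are chosen only for illustration.

```python
PAGE_SIZE = 4096  # 4 KiB pages (assumption)

def linear_table_entries(virtual_bits, page_size=PAGE_SIZE):
    """Entries in a flat per-process table: one per virtual page."""
    return (1 << virtual_bits) // page_size

def inverted_table_entries(physical_bytes, page_size=PAGE_SIZE):
    """Entries in an inverted table: one per physical frame."""
    return physical_bytes // page_size

virt = linear_table_entries(48)            # 2**36 entries per process
phys = inverted_table_entries(16 * 2**30)  # 2**22 entries for the whole system
print(f"flat table entries per process: {virt:,}")
print(f"inverted table entries (total): {phys:,}")
print(f"ratio: {virt // phys:,}x")
```

Real systems use multi-level tables rather than a flat one, so the flat figure is an upper bound for a fully populated address space rather than a typical cost.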

Tutorial: Simulate IPT below.

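Application: Hashed and inverted designs appear on IBM POWER systems. KVM on x86, by contrast, relies on nested paging (Intel EPT, AMD NPT) rather than an IPT, and shares identical pages across VMs through Kernel Samepage Merging (KSM).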

In [ ]:
# Code Guide: Inverted Page Table Simulation
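class InvertedPageTable:
    def __init__(self, num_frames):
        self.table = {}     # Frame -> (Process, Page)
        self.hash_map = {}  # (Process, Page) -> Frame; the dict hashes the key
        self.num_frames = num_frames

    def allocate(self, process, page, frame):
        if frame < 0 or frame >= self.num_frames:
            raise ValueError('Frame out of range')
        # Drop stale mappings so both directions stay consistent
        old_frame = self.hash_map.pop((process, page), None)
        if old_frame is not None:
            del self.table[old_frame]
        old_owner = self.table.pop(frame, None)
        if old_owner is not None:
            del self.hash_map[old_owner]
        self.table[frame] = (process, page)
        self.hash_map[(process, page)] = frame

    def lookup(self, process, page):
        # Keying on the tuple itself lets the dict resolve hash collisions
        return self.hash_map.get((process, page), 'Miss')

ipt = InvertedPageTable(10)
ipt.allocate(1, 0, 3)
print(ipt.lookup(1, 0))  # 3
print(ipt.lookup(2, 0))  # Miss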

## Section 3: TLB – In-Depth with Visualizations

Theory: TLB caches PTEs. Add: Associativity (direct-mapped vs. fully associative), multi-level TLBs (L1/L2).
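
To illustrate how associativity affects conflict misses, here is a toy set-associative TLB. It is a sketch under simplifying assumptions (LRU within each set, the low bits of the page number select the set), and the class and parameter names are hypothetical. Setting `ways=1` gives a direct-mapped TLB; setting `num_sets=1` gives a fully associative one.

```python
from collections import OrderedDict

class SetAssociativeTLB:
    """Toy TLB: `num_sets` sets of `ways` entries, LRU replacement per set."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, vpn, frame_on_miss=None):
        """Look up a virtual page number; on a miss, install the mapping."""
        entries = self.sets[vpn % self.num_sets]  # Low VPN bits pick the set
        if vpn in entries:
            entries.move_to_end(vpn)  # Mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # Evict the least recently used entry
        entries[vpn] = frame_on_miss
        return False

# Pages 0 and 4 map to the same set when there are 4 sets.
pattern = [0, 4] * 8
direct = SetAssociativeTLB(num_sets=4, ways=1)  # Direct-mapped
full = SetAssociativeTLB(num_sets=1, ways=4)    # Fully associative
for vpn in pattern:
    direct.access(vpn)
    full.access(vpn)
print("direct-mapped hits:", direct.hits)
print("fully associative hits:", full.hits)
```

Both configurations hold four entries, yet the direct-mapped one never hits on this pattern because the two pages keep evicting each other from the same slot.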

Rare Insight: In GPUs, TLB misses can be 100x costlier due to massive parallelism—DREAM (2025) proposes device-driven access to mitigate.

Visualization: Plot hit rates below.

Application: In AI training (e.g., PyTorch), large tensors span many pages, so TLB misses can slow gradient computation; huge pages help by covering more memory per TLB entry.

In [ ]:
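# Visualization: TLB Hit Rate Simulation (fully associative, LRU replacement)
# Uniform random accesses have no locality, so the hit rate settles near tlb_size / 100
references = np.random.randint(0, 100, 1000)  # Simulated accesses over 100 pages
tlb_size = 16
tlb = []   # Ordered from least to most recently used
hits = []
for ref in references:
    if ref in tlb:
        hits.append(1)
        tlb.remove(ref)
        tlb.append(ref)  # Move to the most recently used position (LRU)
    else:
        hits.append(0)
        if len(tlb) >= tlb_size:
            tlb.pop(0)   # Evict the least recently used entry
        tlb.append(ref)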

plt.plot(np.cumsum(hits) / np.arange(1, len(hits)+1))
plt.xlabel('References')
plt.ylabel('Hit Rate')
plt.title('TLB Hit Rate Over Time')
plt.show()

## Section 4: Multi-Level Paging and Huge Pages

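Theory: Multi-level paging shrinks the page table by allocating lower-level tables only for the regions of the address space that are in use. Add huge pages (2MB/1GB): each TLB entry covers more memory, so fewer entries are needed and misses drop.

Logic: Transparent Huge Pages (THP) in Linux automatically promotes runs of 4KB pages to 2MB huge pages.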
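
The benefit of huge pages can be quantified as TLB reach, the amount of memory a full TLB can map at once (entries times page size). The sketch below assumes a hypothetical 64-entry TLB; real TLB sizes vary by processor and often differ per page size.

```python
def tlb_reach_bytes(entries, page_size):
    """TLB reach: memory covered when every entry maps a distinct page."""
    return entries * page_size

KIB, MIB, GIB = 2**10, 2**20, 2**30
ENTRIES = 64  # Hypothetical TLB size, chosen only for illustration

for label, size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
    reach = tlb_reach_bytes(ENTRIES, size)
    print(f"{label} pages: reach = {reach / MIB:,.2f} MiB")
```

With the same number of entries, moving from 4 KiB to 2 MiB pages multiplies the reach by 512, which is why workloads with large working sets see fewer TLB misses.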

Rare Insight: In LLMs, vAttention (2025) uses paging-inspired blocks for dynamic KV cache, reducing fragmentation by 50%.

Tutorial: Simulate multi-level access.

Application: Databases (PostgreSQL) use huge pages for faster queries in scientific data analysis.

In [ ]:
# Code Guide: Multi-Level Page Table Simulation
def multi_level_access(va, levels=(10, 10, 12)):  # Bits: Dir, Table, Offset
    """Split a virtual address into per-level indices plus the page offset."""
    fields = []
    temp = va
    for bits in reversed(levels):  # Peel fields off from the low-order end
        mask = (1 << bits) - 1
        fields.append(temp & mask)
        temp >>= bits
    return fields[::-1]  # [Dir idx, Table idx, Offset]

print(multi_level_access(0x12345678, (10, 10, 12)))  # [72, 837, 1656]

## Section 5: Page Replacement Algorithms

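Theory: When a fault occurs and no frame is free, the OS must evict a page. Algorithms: FIFO (simple), LRU (evicts the least recently used page; stack-based), Optimal (requires future knowledge, so it serves as a benchmark).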
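
The Optimal policy cannot run in a real OS, but it is easy to simulate offline when the whole reference string is known. The following is a minimal, unoptimized sketch (the function name is my own) that evicts the resident page whose next use lies farthest in the future.

```python
def optimal_replacement(references, frames):
    """Belady's optimal policy: evict the page whose next use is farthest away."""
    memory = []
    faults = 0
    for i, ref in enumerate(references):
        if ref in memory:
            continue
        faults += 1
        if len(memory) < frames:
            memory.append(ref)
            continue
        future = references[i + 1:]

        # Pages never used again get distance infinity, so they are evicted first.
        def next_use(page):
            return future.index(page) if page in future else float("inf")

        victim = max(memory, key=next_use)
        memory[memory.index(victim)] = ref
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print("Optimal faults (3 frames):", optimal_replacement(refs, 3))
print("Optimal faults (4 frames):", optimal_replacement(refs, 4))
```

On this reference string the sketch reports 7 faults with 3 frames and 6 with 4, a lower bound against which FIFO and LRU can be compared.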

Math: Belady's Anomaly—more frames can increase faults in FIFO.

Rare Insight: In Rust OS (ToyOS, 2025), LRU approximated via clock algorithm for efficiency.
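
For reference, here is a generic sketch of the clock (second-chance) algorithm, which approximates LRU with one reference bit per frame. It is a textbook variant written for this notebook, not code from any particular OS; the choice to set the bit when a page is first loaded is one common convention among several.

```python
def clock_replacement(references, frames):
    """Second-chance (clock) replacement: a one-bit approximation of LRU."""
    pages = [None] * frames   # Page held by each frame
    ref_bit = [0] * frames    # Reference bit per frame
    hand = 0                  # Next frame the clock hand examines
    faults = 0
    for ref in references:
        if ref in pages:
            ref_bit[pages.index(ref)] = 1  # Hit: give the page a second chance
            continue
        faults += 1
        # Skip frames whose bit is set, clearing each bit as the hand passes.
        while ref_bit[hand]:
            ref_bit[hand] = 0
            hand = (hand + 1) % frames
        pages[hand] = ref
        ref_bit[hand] = 1
        hand = (hand + 1) % frames
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print("Clock faults (3 frames):", clock_replacement(refs, 3))
```

The appeal of this design is that a hit only sets a bit, whereas exact LRU must reorder a list or update a timestamp on every access.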

Visualization: Compare faults below.

Application: Overcommitted cloud hosts rely on LRU-style approximations to choose which pages to swap out, keeping swap traffic low.

In [ ]:
# Mini Project: Page Replacement Simulator
def fifo_replacement(references, frames):
    memory = []
    faults = 0
    for ref in references:
        if ref not in memory:
            faults += 1
            if len(memory) >= frames:
                memory.pop(0)
            memory.append(ref)
    return faults

refs = [1,2,3,4,1,2,5,1,2,3,4,5]
print('FIFO Faults:', fifo_replacement(refs, 3))  # 9 faults; try 4 frames, then modify for LRU

## Section 6: Thrashing and Working Set Model

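Theory: Thrashing occurs when a process's working set exceeds the memory available to it, so it spends more time servicing faults than executing. The working set W(t, τ) is the set of pages referenced in the most recent window of length τ.

Logic: Prevent thrashing via admission control: run a process only if its working set fits in the free frames, and suspend processes otherwise.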
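
The working set definition translates directly into code. In this sketch the window τ is measured in references rather than in time, and the helper names and the short trace are made up for illustration.

```python
def working_set(references, t, tau):
    """Distinct pages among the last `tau` references ending at index t."""
    start = max(0, t - tau + 1)
    return set(references[start:t + 1])

def is_thrashing_risk(references, tau, frames):
    """True if the working set ever exceeds the available frames."""
    return any(len(working_set(references, t, tau)) > frames
               for t in range(len(references)))

# Hypothetical trace with a phase change from pages {1, 2, 3} to {7, 8, 9}
trace = [1, 2, 1, 3, 2, 1, 7, 8, 9, 7, 8, 9]
for t in (5, 8, 11):
    print(f"t={t}: W(t, tau=4) = {sorted(working_set(trace, t, 4))}")
print("risk with 3 frames:", is_thrashing_risk(trace, 4, 3))
print("risk with 4 frames:", is_thrashing_risk(trace, 4, 4))
```

The working set briefly grows during the phase change, which is exactly when a process with too few frames is most likely to thrash.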

Rare Insight: In simulations (Virtuoso, 2025), model thrashing to test new VM hardware.

Application: HPC clusters monitor working sets for job scheduling in simulations.

In [ ]:
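# Visualization: FIFO Faults vs. Frames (few frames cause heavy faulting; 3 -> 4 frames shows Belady's anomaly)
frames = np.arange(1,10)
faults = [fifo_replacement(refs, f) for f in frames]
plt.plot(frames, faults)
plt.xlabel('Frames')
plt.ylabel('Faults')
plt.title("FIFO Faults vs. Frames (Belady's Anomaly at 4 Frames)")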
plt.show()

## Section 7: Copy-on-Write and Shared Pages

Theory: fork() uses copy-on-write (COW): parent and child share pages marked read-only, and the kernel copies a page only when one of them writes to it.
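
The sharing and copying behavior can be modeled with reference-counted frames. This is a simplified sketch with hypothetical `Frame` and `Process` classes: a real kernel detects the write through a protection fault on a read-only page, whereas here `write` checks the reference count directly.

```python
class Frame:
    """A physical frame with a reference count for sharing."""
    def __init__(self, data):
        self.data = bytearray(data)
        self.refcount = 1

class Process:
    def __init__(self, pages=None):
        self.pages = pages if pages is not None else {}  # Page number -> Frame

    def fork(self):
        """Child shares every frame; nothing is copied yet."""
        for frame in self.pages.values():
            frame.refcount += 1
        return Process(dict(self.pages))

    def write(self, page, offset, value):
        """Copy the frame first if another process still shares it."""
        frame = self.pages[page]
        if frame.refcount > 1:
            frame.refcount -= 1
            frame = Frame(frame.data)  # Private copy made at write time
            self.pages[page] = frame
        frame.data[offset] = value

    def read(self, page, offset):
        return self.pages[page].data[offset]

parent = Process({0: Frame(b"abcd")})
child = parent.fork()
shared_before = parent.pages[0] is child.pages[0]
child.write(0, 0, ord("X"))
print("shared before write:", shared_before)
print("shared after write:", parent.pages[0] is child.pages[0])
print("parent sees:", chr(parent.read(0, 0)), "child sees:", chr(child.read(0, 0)))
```

After the child's write, the parent still reads its original byte, and the page it holds is no longer shared.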

Rare Insight: In containers (Docker), COW with overlayfs saves storage for research envs.

Application: fork() on Unix lets parallel simulations start from shared state without duplicating memory up front.

## Section 8: Research Directions and Rare Insights

- GPU Paging: DREAM for programmable GPU VM (2025).
- AI/LLMs: vAttention for dynamic memory in serving (2025).
- Simulations: Virtuoso for fast VM prototyping.
- Rust OS: Paging in ToyOS for secure memory.
- ASPLOS 2025: Advances in architecture/OS integration.

Direction: Explore persistent memory (e.g., Intel Optane, since discontinued) paging hybrids for non-volatile VM.

## Section 9: Major Project – Virtual Memory Simulator with Real-World Traces

Description: Simulate paging with synthetic traces. For real traces, record a program's memory accesses with a tool such as Valgrind's Lackey (`valgrind --tool=lackey --trace-mem=yes <program>`).

Steps: 1. Generate trace. 2. Implement LRU. 3. Plot faults vs. memory size.

Research Tie: Model after Virtuoso for your papers.

In [ ]:
# Major Project Starter: Extend fifo_replacement to LRU, use larger refs
def lru_replacement(references, frames):
    memory = []
    faults = 0
    for ref in references:
        if ref in memory:
            memory.remove(ref)
            memory.append(ref)
        else:
            faults += 1
            if len(memory) >= frames:
                memory.pop(0)
            memory.append(ref)
    return faults

# Synthetic trace: Locality
trace = np.concatenate([np.random.randint(0,10,500), np.random.randint(50,60,500)])
print('LRU Faults:', lru_replacement(trace.tolist(), 20))
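
Step 3 of the project (faults vs. memory size) can be sketched as a sweep over frame counts. This version uses only the standard library so it runs anywhere; `lru_faults`, the seed, and the two-phase trace are assumptions made for illustration, and the resulting dictionary can be passed to `plt.plot` if matplotlib is available.

```python
import random
from collections import OrderedDict

def lru_faults(references, frames):
    """Count LRU faults; the OrderedDict keeps pages in recency order."""
    memory = OrderedDict()
    faults = 0
    for ref in references:
        if ref in memory:
            memory.move_to_end(ref)          # Mark as most recently used
        else:
            faults += 1
            if len(memory) >= frames:
                memory.popitem(last=False)   # Evict the least recently used page
            memory[ref] = True
    return faults

rng = random.Random(0)  # Fixed seed so the curve is reproducible
# Hypothetical trace: two phases of 500 references, each over 10 distinct pages
phase_trace = ([rng.randrange(0, 10) for _ in range(500)]
               + [rng.randrange(50, 60) for _ in range(500)])

curve = {frames: lru_faults(phase_trace, frames) for frames in (1, 2, 5, 10, 20)}
for frames, faults in curve.items():
    print(f"{frames:>2} frames -> {faults} faults")
```

Because LRU is a stack algorithm, the fault count never rises as frames are added, and once each phase fits in memory only the cold misses remain.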

End of Notebook. Experiment, hypothesize, publish—become the next Turing!