# **Task 3: Post-Training Quantization with SmoothQuant**

## **Overview**
Implements and evaluates **SmoothQuant**, an advanced **post-training quantization (PTQ)** method for large language models.  
The notebook diagnoses why naive quantization struggles, applies activation/weight smoothing, and measures the impact using **Perplexity (PPL)** on the **Wikitext** dataset.

---

## **Step 1: Environment and Data Preparation**

- **Baseline model:**  
  Load a full-precision **BF16** model as the gold-standard reference.

- **Dataset split:**  
  Load **Wikitext** and create two subsets:
  - **Calibration:** A small portion of the training set used to analyze activations and compute SmoothQuant scaling factors.  
  - **Evaluation:** The test set held out for fair PPL evaluation.

---

## **Step 2: Diagnosing the Quantization Challenge**

- **Activation capture:**  
  Use **forward hooks** to record input activations of selected linear layers on calibration batches.

- **Distribution plots:**  
  For each target layer, show side-by-side histograms of  
  (a) **weights** and  
  (b) **input activations**  
  to illustrate why standard quantization is difficult.

---

## **Step 3: Implementing the SmoothQuant Toolkit**

- **Basic quantizers:**  
  Implement **per-channel weight quantization (int8)** and **per-token activation quantization (int8)**.

- **Quantized layer:**  
  Define **`WnAnLinear`**, a drop-in `nn.Linear` replacement that stores quantized weights and dynamically quantizes activations in its forward pass.

- **Smoothing core:**  
  Implement **`smooth_ln_fcs`**, which scales activations down and weights up using factors derived from their distributions—shifting quantization difficulty from activations to weights.

- **Model wrappers:**  
  - **`smooth_model`** applies smoothing across the model.  
  - **`quantize_model`** replaces eligible linear layers with the quantized variant.

---

## **Step 4: Calibration and Evaluation Workflow**

- **Activation scaling:**  
  **`get_act_scales`** runs the calibration set with hooks to compute per-channel activation maxima for smoothing.

- **Perplexity evaluator:**  
  **`Evaluator`** tokenizes data, computes loss, and reports **PPL** (lower is better) on the **Wikitext** test set.

---

## **Step 5: Main Experiments**

- **Configurations:**  
  For example define runs for:
  - **BF16 baseline**
  - **Naive W8A8**
  - **W8A8 + SmoothQuant**

- **Orchestration:**  
  **`run_experiment`** loads the model, optionally smooths, quantizes, and evaluates PPL.

- **Models:**  
  Execute across multiple LLMs (e.g., **Llama-3-8B** and **Llama-2-7B**) and record results.

---

## **Step 6: Results and Conclusions**

- **Aggregation:**  
  Collect PPLs into a **pandas DataFrame** and print a concise summary table to compare **SmoothQuant**, **naive W8A8**, and the **BF16 baseline**.


In [3]:
### Cell 2: Environment Setup and Dependency Installation
import os
import random
import time
from functools import partial
from typing import Optional, Tuple, Callable
import types

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch
import torch.nn as nn
import torch.nn.functional as F
from datasets import load_dataset
from scipy.stats import linregress
from tqdm.auto import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation.stopping_criteria import (
    StoppingCriteria,
    StoppingCriteriaList,
)
from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS
from transformers.models.llama.modeling_llama import (
    LlamaAttention,
    rotate_half,
    repeat_kv,
)
from transformers.utils import logging

RESULTS_DIR = "./results"
FIGURES_DIR = "./figures"
os.makedirs(RESULTS_DIR, exist_ok=True)
os.makedirs(FIGURES_DIR, exist_ok=True)

if torch.cuda.is_available():
    DEVICE = torch.device("cuda")
else:
    DEVICE = torch.device("cpu")

def set_seed(seed=42):
    """Set random seeds for reproducibility across Python, NumPy, and PyTorch."""
    random.seed(seed)
    np.random.seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
    torch.manual_seed(seed)

set_seed(42)
print("\n   Environment setup and dependency installation complete.")


  from .autonotebook import tqdm as notebook_tqdm



   Environment setup and dependency installation complete.


In [4]:
# ### Cell 3: Hugging Face Login
from huggingface_hub import login, HfFolder
from getpass import getpass

# Check if a Hugging Face token is already set in the environment.
if not os.getenv("HUGGING_FACE_HUB_TOKEN"):
    try:
        # Prompt user for Hugging Face access token if not found.
        hf_token = getpass("Please enter your Hugging Face access token: ")
        login(token=hf_token, add_to_git_credential=True)
        print("   Hugging Face login successful!")
    except Exception as e:
        print(f"Login failed: {e}. Model loading may fail later.")
else:
    print("   Hugging Face token detected.")

Token has not been saved to git credential helper.


[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 'store' credential helper as default.

git config --global credential.helper store

Read https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage for more details.[0m
   Hugging Face login successful!


In [5]:
### Cell 4: Model, Tokenizer, and Dataset Loading
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

def load_model_and_tokenizer(model_id):
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto" if torch.cuda.is_available() else None,
    )
    model.eval()
    return model, tokenizer

# Task 3, Step 1: Load the baseline BF16 model for quantization experiments
print("\nLoading bf16 model...")
model_fp16, tokenizer = load_model_and_tokenizer(MODEL_ID)

# Task 3, Step 1: Load the Wikitext dataset for calibration and evaluation
print("\nLoading Wikitext dataset...")
raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")

# Very small train subset for calibration and full test for eval (can shrink if OOM)
calibration_dataset = raw_datasets["train"].select(range(512))
eval_dataset = raw_datasets["test"]
print("   Dataset loaded successfully.")


Loading bf16 model...


`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00,  1.36it/s]



Loading Wikitext dataset...
   Dataset loaded successfully.


In [6]:
### Cell 5: Visualization of Weight and Activation Distributions

from collections import defaultdict
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401

def visualize_distributions(model, tokenizer):

    CALIBRATION_SAMPLES = 64
    SEQ_LEN = 128
    NUM_BINS = 80

    LAYERS_TO_VISUALIZE = [
        "model.layers.0.self_attn.o_proj",
        "model.layers.0.mlp.up_proj",
        "model.layers.10.self_attn.o_proj",
        "model.layers.10.mlp.up_proj",
        "model.layers.20.self_attn.o_proj",
        "model.layers.20.mlp.up_proj",
        "model.layers.30.self_attn.o_proj",
        "model.layers.30.mlp.up_proj",
    ]


    modules = dict(model.named_modules())
    activations = defaultdict(list)
    handles = []

    def make_hook(name):
        def hook(mod, inputs, output):
            x = inputs[0].detach()
            if x.dim() == 2:
                x = x.unsqueeze(1)
            activations[name].append(x.cpu())
        return hook

    for lname in LAYERS_TO_VISUALIZE:
        if lname in modules and isinstance(modules[lname], nn.Linear):
            h = modules[lname].register_forward_hook(make_hook(lname))
            handles.append(h)

    model.eval()
    with torch.no_grad():
        seen = 0
        for example in calibration_dataset:
            if seen >= CALIBRATION_SAMPLES:
                break
            text = example["text"].strip()
            if not text:
                continue
            enc = tokenizer(
                text,
                return_tensors="pt",
                truncation=True,
                max_length=SEQ_LEN,
            )
            enc = {k: v.to(DEVICE) for k, v in enc.items()}
            _ = model(**enc)
            seen += 1

    for h in handles:
        h.remove()

    for lname in LAYERS_TO_VISUALIZE:
        if lname not in modules or lname not in activations:
            continue

        layer = modules[lname]

        w = layer.weight.detach().float().cpu().numpy().reshape(-1)
        al = [t.reshape(-1, t.shape[-1]) for t in activations[lname]]
        a = torch.cat(al, dim=0).float().cpu().numpy().reshape(-1)
        
        fig, axes = plt.subplots(1, 2, figsize=(10, 4))
        axes[0].hist(w, bins=NUM_BINS, log=True)
        axes[0].set_title("Weights")
        axes[0].set_xlabel("Value")
        axes[0].set_ylabel("Count (log)")

        axes[1].hist(a, bins=NUM_BINS, log=True)
        axes[1].set_title("Activations")
        axes[1].set_xlabel("Value")
        axes[1].set_ylabel("Count (log)")

        fig.suptitle(f"Weight vs Activation Distributions\n{lname}")
        fig.tight_layout()
        fig.savefig(f"{FIGURES_DIR}/task3_step1__{lname.replace('.', '_')}.png", dpi=120)
        plt.close(fig)


def visualize_distributions_3d(model, tokenizer):
    
    CALIBRATION_SAMPLES = 1
    SEQ_LEN = 128
    LAYERS_TO_VISUALIZE = [
        "model.layers.0.self_attn.o_proj",
        "model.layers.0.mlp.up_proj",
    ]

    modules = dict(model.named_modules())
    act_snapshot = {}
    handles = []

    def make_hook(name):
        def hook(mod, inputs, output):
            if name in act_snapshot:
                return
            x = inputs[0].detach()
            if x.dim() == 2:
                x = x.unsqueeze(1)
            act_snapshot[name] = x[0].cpu()  # (L, H) from batch 0
        return hook

    for lname in LAYERS_TO_VISUALIZE:
        if lname in modules and isinstance(modules[lname], nn.Linear):
            h = modules[lname].register_forward_hook(make_hook(lname))
            handles.append(h)

    model.eval()
    with torch.no_grad():
        seen = 0
        for example in calibration_dataset:
            if seen >= CALIBRATION_SAMPLES:
                break
            text = example["text"].strip()
            if not text:
                continue
            enc = tokenizer(
                text,
                return_tensors="pt",
                truncation=True,
                max_length=SEQ_LEN,
            )
            enc = {k: v.to(DEVICE) for k, v in enc.items()}
            _ = model(**enc)
            seen += 1

    for h in handles:
        h.remove()

    for lname, act in act_snapshot.items():
        L, H = act.shape
        max_h = min(64, H)
        act_sub = act[:, :max_h]

        X = np.arange(L)
        Y = np.arange(max_h)
        X, Y = np.meshgrid(X, Y)
        Z = act_sub.T.float().cpu().numpy()

        fig = plt.figure(figsize=(10, 6))
        ax = fig.add_subplot(111, projection="3d")
        ax.plot_surface(X, Y, Z, linewidth=0, antialiased=True)
        ax.set_title(f"3D Activation Surface – {lname}")
        ax.set_xlabel("Token position")
        ax.set_ylabel("Hidden dim (subset)")
        ax.set_zlabel("Activation")

        fig.savefig(f"{FIGURES_DIR}/task3_step1__{lname.replace('.', '_')}_3d.png", dpi=120)
        plt.close(fig)

# Task 3, Step 2: Visualize weight and activation distributions to motivate SmoothQuant
visualize_distributions(model_fp16, tokenizer)
visualize_distributions_3d(model_fp16, tokenizer)


In [7]:
### Cell 6: Core Implementation of SmoothQuant

# --------------------------------------------------------------------------------
# Part 1: Quantizers
# --------------------------------------------------------------------------------

@torch.no_grad()
def quantize_weight_per_channel_absmax(w, n_bits=8):
    """
    Quantizes weights per output channel using absolute max scaling.
    Assumes w is (out_features, in_features).
    """
    qmax = 2 ** (n_bits - 1) - 1
    max_vals = w.abs().amax(dim=1, keepdim=True)  # (out, 1)
    max_vals = torch.clamp(max_vals, min=1e-8)
    scales = max_vals / qmax
    w_int = torch.round(w / scales).clamp(-qmax - 1, qmax).to(torch.int8)
    return w_int, scales.squeeze(1)


@torch.no_grad()
def quantize_activation_per_token_absmax(t, n_bits=8):
    """
    Quantizes activations per token using absolute max scaling.
    Supports (B, L, H) or (B, H) tensors.
    """
    qmax = 2 ** (n_bits - 1) - 1
    if t.dim() == 2:
        t = t.unsqueeze(1)  # (B, 1, H)
        squeeze_back = True
    else:
        squeeze_back = False

    max_vals = t.abs().amax(dim=-1, keepdim=True)  # (..., 1)
    max_vals = torch.clamp(max_vals, min=1e-8)
    scales = max_vals / qmax
    t_int = torch.round(t / scales).clamp(-qmax - 1, qmax).to(torch.int8)

    if squeeze_back:
        t_int = t_int.squeeze(1)
        scales = scales.squeeze(1)

    return t_int, scales

# --------------------------------------------------------------------------------
# Part 2: Quantized Linear Layer
# --------------------------------------------------------------------------------

class WnAnLinear(nn.Module):
    """
    Quantized Linear Layer with per-channel weight and per-token activation quantization.
    """
    def __init__(self, in_features, out_features, bias=True, w_bits=8, a_bits=8):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.w_bits = w_bits
        self.a_bits = a_bits

        self.register_buffer(
            "weight_int",
            torch.zeros(out_features, in_features, dtype=torch.int8),
        )
        self.register_buffer(
            "weight_scale",
            torch.ones(out_features, dtype=torch.float32),
        )
        if bias:
            self.bias = nn.Parameter(torch.zeros(out_features, dtype=torch.float32))
        else:
            self.bias = None

    def forward(self, x):
        x_int, x_scale = quantize_activation_per_token_absmax(x, self.a_bits)
        x_deq = x_int.float() * x_scale
        w_deq = self.weight_int.float() * self.weight_scale.unsqueeze(1)
        out = F.linear(x_deq, w_deq, self.bias)
        return out.to(x.dtype)

    @classmethod
    @torch.no_grad()
    def from_float(cls, module, w_bits=8, a_bits=8):
        assert isinstance(module, nn.Linear)
        qmod = cls(
            module.in_features,
            module.out_features,
            bias=module.bias is not None,
            w_bits=w_bits,
            a_bits=a_bits,
        )
        w = module.weight.detach()
        w_int, w_scale = quantize_weight_per_channel_absmax(w, n_bits=w_bits)
        qmod.weight_int.copy_(w_int)
        qmod.weight_scale.copy_(w_scale.to(qmod.weight_scale.dtype))
        if module.bias is not None:
            qmod.bias.data.copy_(module.bias.detach().to(qmod.bias.dtype))
        return qmod

# --------------------------------------------------------------------------------
# Part 3: Smoothing Function (SmoothQuant)
# --------------------------------------------------------------------------------

@torch.no_grad()
def smooth_ln_fcs(ln, fcs, act_scales, alpha=0.5):
    hidden_dim = ln.normalized_shape[0]
    assert act_scales.numel() == hidden_dim
    weight_max = torch.zeros(hidden_dim, device=act_scales.device)
    for fc in fcs:
        w = fc.weight.detach()  # (out, hidden_dim)
        weight_max = torch.maximum(weight_max, w.abs().amax(dim=0))
    weight_max = torch.clamp(weight_max, min=1e-8)
    act_max = torch.clamp(act_scales.to(weight_max.device), min=1e-8)
    s = (act_max ** alpha) / (weight_max ** (1 - alpha))
    s = torch.clamp(s, min=1e-4, max=1e4)
    if ln.weight is not None:
        ln.weight.data /= s
    if ln.bias is not None:
        ln.bias.data /= s
    for fc in fcs:
        fc.weight.data *= s.unsqueeze(0)
    return s


def find_layers(module, layers=(nn.Linear,), name=""):
    out = {}
    for child_name, child in module.named_children():
        full = f"{name}.{child_name}" if name else child_name
        if isinstance(child, layers):
            out[full] = child
        else:
            out.update(find_layers(child, layers, full))
    return out


@torch.no_grad()
def smooth_model(model, act_scales, alpha=0.5):
    for i, layer in enumerate(model.model.layers):
        ln1_name = f"model.layers.{i}.input_layernorm"
        ln2_name = f"model.layers.{i}.post_attention_layernorm"

        if ln1_name in act_scales:
            ln = layer.input_layernorm
            fcs = [
                layer.self_attn.q_proj,
                layer.self_attn.k_proj,
                layer.self_attn.v_proj,
                layer.self_attn.o_proj,
            ]
            smooth_ln_fcs(ln, fcs, act_scales[ln1_name], alpha=alpha)

        if ln2_name in act_scales:
            ln = layer.post_attention_layernorm
            fcs = [
                layer.mlp.gate_proj,
                layer.mlp.up_proj,
                layer.mlp.down_proj,
            ]
            smooth_ln_fcs(ln, fcs, act_scales[ln2_name], alpha=alpha)

    return model


def quantize_model(model, w_bits=8, a_bits=8):
    """
    Replaces Llama linear layers with WnAnLinear.
    """
    for module_name, module in list(model.named_modules()):
        for child_name, child in list(module.named_children()):
            if isinstance(child, nn.Linear):
                setattr(
                    module,
                    child_name,
                    WnAnLinear.from_float(child, w_bits=w_bits, a_bits=a_bits),
                )
    return model


In [8]:
### Cell 7: Activation Scale Calibration & Perplexity Evaluation

# --------------------------------------------------------------------------------
# Part 1: Activation Scale Calibration
# --------------------------------------------------------------------------------

import math

@torch.no_grad()
def get_act_scales(model, tokenizer, dataset, num_samples=256, seq_len=512):
    ln_modules = {
        name: m for name, m in model.named_modules() if isinstance(m, nn.LayerNorm)
    }
    act_scales = {
        name: torch.zeros(m.normalized_shape[0], device=DEVICE)
        for name, m in ln_modules.items()
    }
    handles = []
    def make_hook(name):
        def hook(mod, inputs, output):
            x = output.detach()
            if x.dim() == 2:
                x = x.unsqueeze(1)
            x = x.reshape(-1, x.shape[-1])  # (B*L, H)
            max_abs = x.abs().amax(dim=0)
            act_scales[name] = torch.maximum(act_scales[name], max_abs)
        return hook
    for name, ln in ln_modules.items():
        h = ln.register_forward_hook(make_hook(name))
        handles.append(h)
    model.eval()
    seen = 0
    for ex in tqdm(dataset, total=min(num_samples, len(dataset)), desc="Calibrating"):
        if seen >= num_samples:
            break
        text = ex["text"].strip()
        if not text:
            continue
        enc = tokenizer(
            text,
            return_tensors="pt",
            truncation=True,
            max_length=seq_len,
        )
        enc = {k: v.to(DEVICE) for k, v in enc.items()}
        _ = model(**enc)
        seen += 1
    for h in handles:
        h.remove()
    act_scales = {k: v.cpu() for k, v in act_scales.items()}
    return act_scales

# --------------------------------------------------------------------------------
# Part 2: Perplexity Evaluator
# --------------------------------------------------------------------------------

class Evaluator:
    def __init__(self, dataset, tokenizer, device, n_samples=128):
        texts = []
        for ex in dataset:
            if len(texts) >= n_samples:
                break
            t = ex["text"].strip()
            if t:
                texts.append(t)
        enc = tokenizer("\n\n".join(texts), return_tensors="pt")
        self.input_ids = enc["input_ids"][0].to(device)
        self.tokenizer = tokenizer
        self.device = device

    @torch.no_grad()
    def evaluate(self, model, seq_len=2048):
        nlls = []
        ids = self.input_ids
        for i in tqdm(
            range(0, ids.size(0) - 1, seq_len),
            desc="Evaluating PPL",
        ):
            input_ids = ids[i : i + seq_len].unsqueeze(0)
            labels = input_ids.clone()
            outputs = model(
                input_ids.to(self.device),
                labels=labels.to(self.device),
            )
            nlls.append(outputs.loss.item())
        mean_nll = sum(nlls) / len(nlls)
        ppl = math.exp(mean_nll)
        return ppl


In [None]:
### Cell 8: Main Experiment - Apply SmoothQuant and Evaluate

import gc

def run_experiment(model_id, quant_config, calibration_ds, evaluation_ds, model=None, tokenizer=None):
    print(f"\n=== Experiment: {model_id} | {quant_config} ===")
    if model is None or tokenizer is None:
        model, tokenizer = load_model_and_tokenizer(model_id)

    w_bits = quant_config.get("w_bits", None)
    a_bits = quant_config.get("a_bits", None)
    smooth = quant_config.get("smooth", False)
    alpha = quant_config.get("alpha", 0.5)

    if w_bits is not None and a_bits is not None and smooth:
        act_scales = get_act_scales(
            model,
            tokenizer,
            calibration_ds,
            num_samples=quant_config.get("calib_samples", 256),
            seq_len=quant_config.get("calib_seq_len", 512),
        )
        model = smooth_model(model, act_scales, alpha=alpha)

    if w_bits is not None and a_bits is not None:
        model = quantize_model(model, w_bits=w_bits, a_bits=a_bits)

    evaluator = Evaluator(
        evaluation_ds,
        tokenizer,
        DEVICE,
        n_samples=quant_config.get("eval_samples", 128),
    )
    ppl = evaluator.evaluate(model, seq_len=quant_config.get("eval_seq_len", 2048))
    print(f"Perplexity: {ppl:.3f}")
    return ppl

# --- Experiment Configurations ---
experiment_configs = {
    "Llama-3-8B": {
        "bf16_baseline": {
            "w_bits": None,
            "a_bits": None,
            "smooth": False,
            "eval_samples": 128,
        },
        "naive_W8A8": {
            "w_bits": 8,
            "a_bits": 8,
            "smooth": False,
            "calib_samples": 256,
            "eval_samples": 128,
        },
        "W8A8_SmoothQuant_alpha0.5": {
            "w_bits": 8,
            "a_bits": 8,
            "smooth": True,
            "alpha": 0.5,
            "calib_samples": 256,
            "eval_samples": 128,
        },
    },
    "Llama-2-7B": {
        "bf16_baseline": {
            "w_bits": None,
            "a_bits": None,
            "smooth": False,
            "eval_samples": 128,
        },
        "naive_W8A8": {
            "w_bits": 8,
            "a_bits": 8,
            "smooth": False,
            "calib_samples": 256,
            "eval_samples": 128,
        },
        "W8A8_SmoothQuant_alpha0.5": {
            "w_bits": 8,
            "a_bits": 8,
            "smooth": True,
            "alpha": 0.5,
            "calib_samples": 256,
            "eval_samples": 128,
        },
    },
}

MODEL_MAPPING = {
    "Llama-3-8B": "meta-llama/Meta-Llama-3-8B",
    "Llama-2-7B": "meta-llama/Llama-2-7b-hf",
}

# --- Run all experiments and collect results ---
results = {}
for model_name, configs in experiment_configs.items():
    model_id = MODEL_MAPPING[model_name]
    results[model_name] = {}
    for config_name, config in configs.items():
        model, tokenizer = load_model_and_tokenizer(model_id)
        results[model_name][config_name] = run_experiment(
            model_id, config, calibration_dataset, eval_dataset, model, tokenizer
        )


Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00,  1.56it/s]
Some parameters are on the meta device because they were offloaded to the cpu.
You shouldn't move a model that is dispatched using accelerate hooks.


RuntimeError: You can't move a model that has some modules offloaded to cpu or disk.

In [17]:
print(torch.cuda.memory_summary())

|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 1            |        cudaMalloc retries: 1         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  13906 MiB |  23009 MiB | 158286 MiB | 144379 MiB |
|       from large pool |  13906 MiB |  23008 MiB | 137009 MiB | 123103 MiB |
|       from small pool |      0 MiB |     39 MiB |  21276 MiB |  21276 MiB |
|---------------------------------------------------------------------------|
| Active memory         |  13906 MiB |  23009 MiB | 158286 MiB | 144379 MiB |
|       from large pool |  13906 MiB |  23008 MiB | 137009 MiB | 123103 MiB |
|       from small pool |      0 MiB |     39 MiB |  21276 MiB |  21276 MiB |
|---------------------------------------------------------------

In [None]:
### Cell 9: Results Summary and Analysis

# --- 1. Format results as a table for easy comparison ---
results_df = pd.DataFrame(results)
print("\n" + "=" * 50)
print(" " * 15 + "Experiment Results Summary")
print("=" * 50)
# TODO: Format and display results (e.g., Markdown table).
print("=" * 50)

# TODO: Persist results if needed (e.g., CSV export).


In [None]:
### Cell 10: List All Generated Artifacts
print("Task 3 complete. Generated artifacts:")
if os.path.isdir(FIGURES_DIR):
    print("Figures:")
    # TODO: List figure artifacts that were generated.
if os.path.isdir(RESULTS_DIR):
    print("Results:")
    # TODO: List result artifacts that were generated.
