# I/O Patterns and Bottlenecks in Deep Learning Workloads

**Author:** Pablo Alessandro Santos Hugen  
**Institution:** Institute of Informatics -- UFRGS  
**Course:** Computer Systems Performance Analysis 2025/2

---

This notebook serves as the **central controller** for the entire experimental workflow:
1. Environment setup and configuration
2. Loading the experimental design
3. Running DLIO benchmarks
4. Collecting and analyzing results

**Prerequisites:**
- Allocate an interactive node: `salloc --partition=<partition> --nodes=1 --ntasks=8 --time=4:00:00`
- Launch Jupyter from the allocated node

## 1. Introduction

### 1.1 Context

Recent years have seen growing interest in optimizations for Machine Learning and Deep Learning training and inference methods. These techniques are now used across various fields, including Large Language Models (LLMs), image recognition and classification, and many other applications.

Large models often require substantial HPC infrastructures to process the enormous amounts of training data involved. In this context, **the performance of the storage and I/O subsystem is critical**.

#### Traditional HPC vs. ML Workloads

| Aspect | Traditional HPC | ML Workloads |
|--------|-----------------|---------------|
| Access Pattern | Large, sequential reads/writes | Small, random reads across numerous files |
| Typical Use Case | Simulations with periodic checkpoints | Iterative training over dataset epochs |
| I/O Characteristics | Predictable, burst-oriented | Continuous, irregular access patterns |
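
The contrast in the table can be made concrete with a small sketch. It is illustrative only: the function names and the tiny sizes are assumptions chosen for readability, not part of the benchmark. The first function lists the offsets of a front-to-back read of one large file; the second lists the order in which one shuffled epoch would touch samples spread over many files.

```python
import random


def sequential_offsets(file_size: int, block_size: int) -> list[int]:
    """Byte offsets for reading one large file front to back in fixed-size blocks."""
    return list(range(0, file_size, block_size))


def shuffled_sample_order(num_files: int, samples_per_file: int, seed: int = 0) -> list[tuple[int, int]]:
    """(file, sample) pairs in the order one shuffled training epoch would visit them."""
    order = [(f, s) for f in range(num_files) for s in range(samples_per_file)]
    random.Random(seed).shuffle(order)  # real training draws a new permutation each epoch
    return order


print(sequential_offsets(file_size=16, block_size=4))
print(shuffled_sample_order(num_files=3, samples_per_file=2)[:4])
```

The sequential case produces monotonically increasing offsets in a single file, while the shuffled case jumps between files on almost every access.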

### 1.2 The I/O Bottleneck Problem

In large-scale distributed DL workloads:
- **I/O can take roughly 85% of the training time** (Mohan et al., 2021)
- Training is often one of the most expensive parts of the ML pipeline (Chowdhury et al., 2023)
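
To see why such a share matters, the arithmetic can be sketched as follows. This is a simplified model under a stated assumption: I/O and compute do not overlap (real data loaders prefetch, so actual behavior may be better). The function names are hypothetical.

```python
def accelerator_busy_fraction(io_fraction: float) -> float:
    """Share of training time left for compute when I/O is not overlapped with it."""
    if not 0.0 < io_fraction <= 1.0:
        raise ValueError("io_fraction must be in (0, 1]")
    return 1.0 - io_fraction


def max_speedup_from_faster_compute(io_fraction: float) -> float:
    """Amdahl-style bound: even infinitely fast compute leaves the I/O time in place."""
    if not 0.0 < io_fraction <= 1.0:
        raise ValueError("io_fraction must be in (0, 1]")
    return 1.0 / io_fraction


io_share = 0.85  # the rough figure quoted above, taken at face value
print(f"Accelerator busy: {accelerator_busy_fraction(io_share):.0%}")
print(f"Best-case speedup from faster compute alone: {max_speedup_from_faster_compute(io_share):.2f}x")
```

Under this model, buying faster accelerators barely helps an I/O-dominated job; the storage path has to improve.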

## 2. Objectives

### 2.1 General Objective

Understand **patterns in I/O operations and possible bottlenecks** in common Machine Learning workloads.

### 2.2 Specific Objectives

1. **Disk Throughput:** Understand how disk throughput varies during training: across epochs, around checkpoints, and as the number of training processes changes.

2. **GPU Usage:** Analyze how GPU usage (%) behaves in those scenarios.

## 3. Environment Setup

### 3.1 Configuration

Configure the environment variables for your cluster below.

In [None]:
import os
import subprocess
import json
import glob
import shutil
import tempfile
from pathlib import Path
from datetime import datetime
from IPython.display import display, HTML, clear_output

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Configure plotting style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 12

In [None]:
#==============================================================================
# CONFIGURATION - Modify these for your cluster
#==============================================================================

# Environment modules to load (same as bench.slurm)
MODULES = "arch_gpu_sc/current openmpi/4.1.6.15.1"

# Paths
BASE_DIR = Path("..").resolve()
CONFIG_DIR = BASE_DIR / "config"
RESULTS_DIR = BASE_DIR / "results"
EXPERIMENT_FILE = Path("experimental_design.csv")

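# Scratch directory for benchmark data (created later by setup_environment()).
# The PID suffix changes whenever the kernel restarts, so a new kernel gets a new directory.
SCRATCH_DIR = BASE_DIR / f"dlio_data_{os.getpid()}"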

print(f"Modules: {MODULES}")
print(f"Base directory: {BASE_DIR}")
print(f"Config directory: {CONFIG_DIR}")
print(f"Results directory: {RESULTS_DIR}")
print(f"Scratch directory: {SCRATCH_DIR}")

### 3.2 Helper Functions

Functions for running shell commands and managing the benchmark environment.

In [None]:
def run_command(cmd: str, cwd: Path = None, verbose: bool = True, load_modules: bool = True) -> tuple[int, str, str]:
    """
    Execute a shell command and return the result.
    
    Parameters
    ----------
    cmd : str
        Command to execute
    cwd : Path, optional
        Working directory
    verbose : bool
        Print output in real-time
    load_modules : bool
        Prefix command with module load (default True)
        
    Returns
    -------
    tuple
        (return_code, output, ""). stderr is merged into the captured output,
        so the third element is always an empty string.
    """
    # Prefix with module load if configured
    if load_modules and MODULES:
        cmd = f"module load {MODULES} && {cmd}"
    
    if verbose:
        print(f"$ {cmd}")
        print("-" * 60)
    
    process = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        cwd=cwd,
        text=True
    )
    
    output_lines = []
    for line in process.stdout:
        output_lines.append(line)
        if verbose:
            print(line, end="")
    
    process.wait()
    stdout = "".join(output_lines)
    
    if verbose:
        print("-" * 60)
        print(f"Return code: {process.returncode}")
    
    return process.returncode, stdout, ""

In [None]:
def setup_environment():
    """
    Setup the benchmark environment:
    - Create directories
    - Sync uv dependencies
    - Install dlio-benchmark (without pydftracer)
    """
    print("=" * 60)
    print("ENVIRONMENT SETUP")
    print("=" * 60)
    
    # Create directories
    print("\n[1/3] Creating directories...")
    SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
    RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    print(f"  Created: {SCRATCH_DIR}")
    print(f"  Created: {RESULTS_DIR}")
    
    # Sync analysis dependencies
    print("\n[2/3] Syncing dependencies...")
    ret, _, _ = run_command("uv sync", cwd=BASE_DIR)
    if ret != 0:
        return False
    
    # Install dlio-benchmark without pydftracer (which fails to build on ARM)
    # Use --python to target the project's venv
    print("\n[3/3] Installing dlio-benchmark...")
    venv_python = BASE_DIR / ".venv" / "bin" / "python"
    ret, _, _ = run_command(
        f"uv pip install --python {venv_python} dlio-benchmark --no-deps && "
        f"uv pip install --python {venv_python} hydra-core omegaconf pyyaml numpy mpi4py h5py",
        cwd=BASE_DIR
    )
    
    if ret != 0:
        print("\nERROR: dlio-benchmark installation failed.")
        return False
    
    print("\n" + "=" * 60)
    print("Environment setup complete!")
    print("=" * 60)
    return True


def cleanup_scratch():
    """Remove the scratch directory."""
    if SCRATCH_DIR.exists():
        shutil.rmtree(SCRATCH_DIR)
        print(f"Cleaned up: {SCRATCH_DIR}")

### 3.3 Initialize Environment

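Run this cell to set up the environment. This will:
- Create the scratch and results directories
- Sync uv dependencies
- Install `dlio-benchmark` (without `pydftracer`) and its runtime dependencies

Environment modules are not loaded once up front; `run_command` prefixes each command with `module load`.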

In [None]:
setup_environment()

### 3.4 System Information

Dynamically collect system specifications from the current node.

In [None]:
import socket
import platform
import re

def get_system_info() -> dict:
    """
    Collect system information dynamically.
    
    Returns
    -------
    dict
        Dictionary with system specifications
    """
    info = {
        "hostname": socket.gethostname(),
        "platform": platform.platform(),
        "processor": platform.processor(),
    }
    
    # CPU info
    try:
        with open("/proc/cpuinfo", "r") as f:
            cpuinfo = f.read()
        
        # Count sockets and logical CPUs (one "processor" entry per hardware thread)
        physical_ids = set(re.findall(r"physical id\s*:\s*(\d+)", cpuinfo))
        total_cores = len(re.findall(r"^processor\s*:", cpuinfo, re.MULTILINE))
        
        # Get CPU model name
        model_match = re.search(r"model name\s*:\s*(.+)", cpuinfo)
        if model_match:
            info["cpu_model"] = model_match.group(1).strip()
        else:
            # For ARM, try to get CPU part
            cpu_part = re.search(r"CPU part\s*:\s*(.+)", cpuinfo)
            info["cpu_model"] = f"ARM {cpu_part.group(1).strip()}" if cpu_part else "Unknown"
        
        info["cpu_sockets"] = len(physical_ids) if physical_ids else 1
        info["cpu_cores_total"] = total_cores
    except Exception as e:
        info["cpu_error"] = str(e)
    
    # Memory info
    try:
        with open("/proc/meminfo", "r") as f:
            meminfo = f.read()
        
        mem_match = re.search(r"MemTotal:\s*(\d+)\s*kB", meminfo)
        if mem_match:
            mem_kb = int(mem_match.group(1))
            info["memory_gb"] = round(mem_kb / 1024 / 1024, 1)
    except Exception as e:
        info["memory_error"] = str(e)
    
    # GPU info (nvidia-smi)
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total,count", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=10
        )
        if result.returncode == 0:
            gpu_lines = result.stdout.strip().split("\n")
            gpu_count = len(gpu_lines)
            if gpu_lines and gpu_lines[0]:
                parts = gpu_lines[0].split(", ")
                info["gpu_model"] = parts[0].strip()
                info["gpu_memory_mb"] = int(parts[1].strip()) if len(parts) > 1 else 0
                info["gpu_count"] = gpu_count
    except Exception:
        # nvidia-smi is missing, timed out, or printed something unexpected
        info["gpu_info"] = "Not available"
    
    # Storage info
    try:
        result = subprocess.run(
            # Report the filesystem that holds the benchmark data, not just "/"
            ["df", "-h", "--output=size,avail,target", str(BASE_DIR)],
            capture_output=True, text=True, timeout=10
        )
        if result.returncode == 0:
            lines = result.stdout.strip().split("\n")
            if len(lines) > 1:
                parts = lines[1].split()
                info["storage_total"] = parts[0]
                info["storage_available"] = parts[1]
    except Exception as e:
        info["storage_error"] = str(e)
    
    return info


def display_system_info(info: dict):
    """Display system information as a formatted table."""
    print("=" * 60)
    print("SYSTEM INFORMATION")
    print("=" * 60)
    print(f"Hostname: {info.get('hostname', 'Unknown')}")
    print(f"Platform: {info.get('platform', 'Unknown')}")
    print()
    
    # CPU
    print("CPU:")
    print(f"  Model: {info.get('cpu_model', 'Unknown')}")
    print(f"  Sockets: {info.get('cpu_sockets', 'Unknown')}")
    print(f"  Total Cores: {info.get('cpu_cores_total', 'Unknown')}")
    print()
    
    # Memory
    print("Memory:")
    print(f"  Total: {info.get('memory_gb', 'Unknown')} GiB")
    print()
    
    # GPU
    if "gpu_model" in info:
        print("GPU:")
        print(f"  Model: {info.get('gpu_model', 'Unknown')}")
        print(f"  Count: {info.get('gpu_count', 'Unknown')}")
        print(f"  Memory per GPU: {info.get('gpu_memory_mb', 0) / 1024:.0f} GiB")
    else:
        print("GPU: Not available")
    print()
    
    # Storage
    print("Storage:")
    print(f"  Total: {info.get('storage_total', 'Unknown')}")
    print(f"  Available: {info.get('storage_available', 'Unknown')}")
    print("=" * 60)
    
    return info


# Collect and display system info
system_info = get_system_info()
display_system_info(system_info)

In [None]:
# Create a summary table for the report
def system_info_table(info: dict) -> pd.DataFrame:
    """Create a DataFrame with system specifications."""
    
    gpu_spec = "Not available"
    if "gpu_model" in info:
        gpu_mem_gb = info.get('gpu_memory_mb', 0) / 1024
        gpu_spec = f"{info.get('gpu_count', 1)}x {info.get('gpu_model', 'Unknown')} ({gpu_mem_gb:.0f} GiB each)"
    
    data = [
        ("CPU", f"{info.get('cpu_sockets', 1)}x {info.get('cpu_model', 'Unknown')} ({info.get('cpu_cores_total', 'Unknown')} cores total)"),
        ("Memory (RAM)", f"{info.get('memory_gb', 'Unknown')} GiB"),
        ("GPU", gpu_spec),
        ("Storage", f"{info.get('storage_total', 'Unknown')} (Available: {info.get('storage_available', 'Unknown')})"),
    ]
    
    return pd.DataFrame(data, columns=["Component", "Specification"])


# Display as table
system_table = system_info_table(system_info)
display(system_table.style.hide(axis='index').set_properties(**{'text-align': 'left'}))

## 4. Experimental Design

### 4.1 Load Experiment Configuration

The experimental design defines all benchmark runs to execute. The `run` column indicates completion status:
- `N` = Not yet run (pending)
- `Y` = Completed
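
For reference, a full-factorial design with these columns could be generated as sketched below. The column names `model`, `processes`, and `run` match what the notebook reads; the helper names and the process counts `[1, 2, 4, 8]` are assumptions for illustration, and the actual `experimental_design.csv` may contain different levels.

```python
import csv
import io
from itertools import product


def build_design_rows(models: list[str], process_counts: list[int]) -> list[dict]:
    """One pending row per (model, processes) combination."""
    return [{"model": m, "processes": p, "run": "N"} for m, p in product(models, process_counts)]


def design_to_csv(rows: list[dict]) -> str:
    """Serialize the design with the header the notebook expects."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["model", "processes", "run"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = build_design_rows(["default_custom"], [1, 2, 4, 8])  # hypothetical levels
print(design_to_csv(rows))
```

Writing the returned string to `experimental_design.csv` would give a file in which every run starts as pending.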

In [None]:
def load_experiment_design() -> pd.DataFrame:
    """Load the experimental design CSV."""
    df = pd.read_csv(EXPERIMENT_FILE)
    return df


def save_experiment_design(df: pd.DataFrame):
    """Save the experimental design CSV."""
    df.to_csv(EXPERIMENT_FILE, index=False)
    print(f"Saved experiment design to {EXPERIMENT_FILE}")


def get_pending_experiments(df: pd.DataFrame) -> pd.DataFrame:
    """Get experiments that haven't been run yet."""
    return df[df['run'] == 'N'].copy()


def mark_experiment_complete(df: pd.DataFrame, model: str, processes: int) -> pd.DataFrame:
    """Mark an experiment as completed."""
    mask = (df['model'] == model) & (df['processes'] == processes)
    df.loc[mask, 'run'] = 'Y'
    return df


# Load experiment design
experiment_df = load_experiment_design()

print("=" * 60)
print("EXPERIMENTAL DESIGN")
print("=" * 60)
print(f"Total experiments: {len(experiment_df)}")
print(f"Completed: {(experiment_df['run'] == 'Y').sum()}")
print(f"Pending: {(experiment_df['run'] == 'N').sum()}")
print("\nExperiment matrix:")
experiment_df

In [None]:
def plot_experiment_status(df: pd.DataFrame):
    """
    Visualize experiment completion status.
    """
    pivot = df.pivot(index='model', columns='processes', values='run')
    # Compare instead of replace() so the result is always an integer frame
    pivot_numeric = pivot.eq('Y').astype(int)  # 1 = completed, 0 = pending
    
    fig, ax = plt.subplots(figsize=(10, 5))
    
    sns.heatmap(
        pivot_numeric,
        annot=pivot.values,
        fmt='',
        cmap=['#ffcccc', '#ccffcc'],
        vmin=0, vmax=1,  # fixed range so an all-'Y' or all-'N' matrix keeps its colors
        cbar=False,
        linewidths=2,
        linecolor='white',
        ax=ax
    )
    
    ax.set_title('Experiment Status Matrix\n(Y = Completed, N = Pending)', 
                 fontsize=14, fontweight='bold')
    ax.set_xlabel('Number of Processes', fontsize=12)
    ax.set_ylabel('Model', fontsize=12)
    
    plt.tight_layout()
    plt.show()


plot_experiment_status(experiment_df)

## 5. Benchmark Execution

### 5.1 Benchmark Runner

Functions to execute DLIO benchmarks based on the experimental design.
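
Both runner functions build the same `mpirun` command and differ only in the workflow overrides. A minimal sketch of that shared step is shown here, assuming a hypothetical helper `build_dlio_command`; it uses only the flags and overrides that appear in this notebook and relies on `shlex.join` so that paths containing spaces survive the shell. The paths in the usage example are placeholders.

```python
import shlex
from pathlib import Path


def build_dlio_command(model: str, num_procs: int, base_dir: Path, config_dir: Path,
                       *, generate_data: bool, train: bool, evaluation: bool) -> str:
    """Assemble the mpirun/dlio_benchmark command line as one shell-safe string."""
    args = [
        "mpirun", "-np", str(num_procs),
        "uv", "run", "--project", str(base_dir), "dlio_benchmark",
        "--config-dir", str(config_dir),
        f"workload={model}",
        f"++workload.workflow.generate_data={generate_data}",
        f"++workload.workflow.train={train}",
        f"++workload.workflow.evaluation={evaluation}",
    ]
    return shlex.join(args)  # quotes only the arguments that need it


cmd = build_dlio_command(
    "default_custom", 4, Path("/tmp/my project"), Path("/tmp/my project/config"),
    generate_data=False, train=True, evaluation=True,
)
print(cmd)
```

Keeping the arguments in a list until the last moment also makes it easy to switch to `subprocess.run(args)` without a shell if module loading is handled elsewhere.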

In [None]:
def generate_data(model: str, num_procs: int = 8) -> bool:
    """
    Generate synthetic data for a workload.
    
    Parameters
    ----------
    model : str
        Workload name (e.g., 'default_custom')
    num_procs : int
        Number of MPI processes for data generation
        
    Returns
    -------
    bool
        True if successful
    """
    print(f"\n{'='*60}")
    print(f"GENERATING DATA: {model}")
    print(f"{'='*60}")
    
    cmd = f"""
    mpirun -np {num_procs} \
        uv run --project {BASE_DIR} dlio_benchmark \
        --config-dir {CONFIG_DIR} \
        workload={model} \
        ++workload.workflow.generate_data=True \
        ++workload.workflow.train=False \
        ++workload.workflow.evaluation=False
    """.strip()
    
    ret, _, _ = run_command(cmd, cwd=SCRATCH_DIR)
    return ret == 0


def run_benchmark(model: str, num_procs: int) -> bool:
    """
    Run a benchmark for a specific model and process count.
    
    Parameters
    ----------
    model : str
        Workload name
    num_procs : int
        Number of MPI processes
        
    Returns
    -------
    bool
        True if successful
    """
    print(f"\n{'='*60}")
    print(f"RUNNING BENCHMARK: {model} with {num_procs} processes")
    print(f"{'='*60}")
    
    cmd = f"""
    mpirun -np {num_procs} \
        uv run --project {BASE_DIR} dlio_benchmark \
        --config-dir {CONFIG_DIR} \
        workload={model} \
        ++workload.workflow.generate_data=False \
        ++workload.workflow.train=True \
        ++workload.workflow.evaluation=True
    """.strip()
    
    ret, _, _ = run_command(cmd, cwd=SCRATCH_DIR)
    
    if ret == 0:
        # Copy results
        result_src = SCRATCH_DIR / "hydra_log" / model
        result_dst = RESULTS_DIR / model / str(num_procs)
        
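        if result_src.exists():
            # copytree also handles nested run directories, which a per-entry
            # shutil.copy2 would reject with IsADirectoryError
            shutil.copytree(result_src, result_dst, dirs_exist_ok=True)
            print(f"\nResults copied to: {result_dst}")
            
            # Clean up hydra_log
            shutil.rmtree(result_src)
        else:
            print(f"\nWARNING: no results found at {result_src}")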
    
    return ret == 0

### 5.2 Run Single Experiment

Use this cell to run a single experiment. Modify the parameters as needed.

In [None]:
# ============================================================
# SINGLE EXPERIMENT - Modify these parameters
# ============================================================
RUN_MODEL = "default_custom"  # Model to run
RUN_PROCS = 1                  # Number of processes
GENERATE_DATA_FIRST = True     # Generate data before benchmark?
# ============================================================

# Check if already completed
exp_row = experiment_df[(experiment_df['model'] == RUN_MODEL) & 
                        (experiment_df['processes'] == RUN_PROCS)]

if not exp_row.empty and exp_row.iloc[0]['run'] == 'Y':
    print(f"Experiment {RUN_MODEL} with {RUN_PROCS} procs already completed.")
    print("Set run='N' in the CSV to re-run, or modify parameters above.")
else:
    print(f"Will run: {RUN_MODEL} with {RUN_PROCS} processes")
    print(f"Generate data first: {GENERATE_DATA_FIRST}")
    print("\nExecute the next cell to start the benchmark.")

In [None]:
# Execute the single experiment
if GENERATE_DATA_FIRST:
    if not generate_data(RUN_MODEL):
        raise RuntimeError(f"Data generation failed for {RUN_MODEL}")

if run_benchmark(RUN_MODEL, RUN_PROCS):
    print(f"\n{'='*60}")
    print("SUCCESS!")
    print(f"{'='*60}")
    
    # Update experiment status
    experiment_df = mark_experiment_complete(experiment_df, RUN_MODEL, RUN_PROCS)
    save_experiment_design(experiment_df)
    
    print(f"\nUpdated experiment status for {RUN_MODEL} ({RUN_PROCS} procs)")
else:
    print(f"\n{'='*60}")
    print("FAILED!")
    print(f"{'='*60}")

### 5.3 Run All Pending Experiments

This will run all experiments marked as `N` (pending) in the experimental design.
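
The scheduling idea is to generate data once per model, using the largest pending process count, and then run every pending process count for that model. A standalone sketch of that planning step follows; `plan_runs` and `other_model` are hypothetical names, and sorting the runs is a choice made here for readability rather than something the notebook does.

```python
from collections import defaultdict


def plan_runs(pending: list[tuple[str, int]]) -> dict[str, dict]:
    """Group pending (model, processes) pairs into one plan entry per model."""
    by_model: dict[str, list[int]] = defaultdict(list)
    for model, procs in pending:
        by_model[model].append(procs)
    return {
        model: {"generate_with": max(procs_list), "runs": sorted(procs_list)}
        for model, procs_list in by_model.items()
    }


plan = plan_runs([("default_custom", 1), ("default_custom", 8), ("other_model", 4)])
for model, entry in plan.items():
    print(f"{model}: generate with {entry['generate_with']} procs, then run {entry['runs']}")
```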

In [None]:
def run_all_pending_experiments(df: pd.DataFrame) -> pd.DataFrame:
    """
    Run all pending experiments from the experimental design.
    
    Parameters
    ----------
    df : pd.DataFrame
        Experimental design DataFrame
        
    Returns
    -------
    pd.DataFrame
        Updated DataFrame with completion status
    """
    pending = get_pending_experiments(df)
    
    if pending.empty:
        print("No pending experiments to run!")
        return df
    
    print(f"Found {len(pending)} pending experiments")
    print("\nPending experiments:")
    display(pending)
    
    # Group by model to generate data once per model
    models = pending['model'].unique()
    
    for model in models:
        print(f"\n{'#'*60}")
        print(f"# Processing model: {model}")
        print(f"{'#'*60}")
        
        # Generate data for this model
        model_pending = pending[pending['model'] == model]
        max_procs = model_pending['processes'].max()
        
        if not generate_data(model, num_procs=max_procs):
            print(f"ERROR: Data generation failed for {model}")
            continue
        
        # Run benchmarks for each process count
        for _, row in model_pending.iterrows():
            procs = row['processes']
            
            if run_benchmark(model, procs):
                df = mark_experiment_complete(df, model, procs)
                save_experiment_design(df)
                print(f"Marked {model} ({procs} procs) as complete")
            else:
                print(f"ERROR: Benchmark failed for {model} ({procs} procs)")
    
    return df


# Show pending experiments
pending = get_pending_experiments(experiment_df)
print(f"Pending experiments: {len(pending)}")
if not pending.empty:
    display(pending)
    print("\nExecute the next cell to run all pending experiments.")

In [None]:
# Run all pending experiments
# WARNING: This may take a long time!

experiment_df = run_all_pending_experiments(experiment_df)

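remaining = int((experiment_df['run'] == 'N').sum())

print("\n" + "="*60)
if remaining == 0:
    print("ALL EXPERIMENTS COMPLETE")
else:
    print(f"FINISHED WITH {remaining} EXPERIMENT(S) STILL PENDING")
print("="*60)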
plot_experiment_status(experiment_df)

## 6. Results Analysis

### 6.1 Load Benchmark Results
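
The loader below reads a handful of keys from each `summary.json`. The sketch here shows the shape it relies on and one way to flatten it. The key names are the ones the loader reads; the numeric values are placeholders, not measurements, and a real file contains more fields. Unlike the loader, this variant hypothetically records a missing metric as `None` rather than `0`, so that absent data is not mistaken for a measured zero.

```python
import json

# Placeholder content: key names as read by the loader, values invented for shape only
SAMPLE_SUMMARY = """
{
  "num_accelerators": 4,
  "metric": {
    "train_au_mean_percentage": 50.0,
    "train_au_stdev_percentage": 1.0,
    "train_io_mean_MB_per_second": 100.0,
    "train_io_stdev_MB_per_second": 2.0
  }
}
"""

# Output column -> key inside summary["metric"]
COLUMNS = {
    "accelerator_usage": "train_au_mean_percentage",
    "accelerator_usage_std": "train_au_stdev_percentage",
    "io_throughput": "train_io_mean_MB_per_second",
    "io_throughput_std": "train_io_stdev_MB_per_second",
}


def flatten_summary(summary: dict, model: str) -> dict:
    """Turn one parsed summary.json into a flat row for a results table."""
    metrics = summary.get("metric", {})
    row = {"model": model, "processes": summary.get("num_accelerators")}
    for column, key in COLUMNS.items():
        row[column] = metrics.get(key)  # None marks a missing metric
    return row


print(flatten_summary(json.loads(SAMPLE_SUMMARY), "default_custom"))
```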

In [None]:
def load_benchmark_results(results_dir: Path) -> pd.DataFrame:
    """
    Load all benchmark summary.json files.
    """
    data = []
    
    for file_path in glob.glob(str(results_dir / "**/summary.json"), recursive=True):
        with open(file_path, "r") as f:
            summary = json.load(f)
        
        path_parts = Path(file_path).parts
        try:
            results_idx = path_parts.index('results')
            model_name = path_parts[results_idx + 1]
        except (ValueError, IndexError):
            model_name = Path(file_path).parent.parent.name
        
        num_accelerators = summary.get("num_accelerators", 0)
        metrics = summary.get("metric", {})
        
        data.append([
            model_name,
            num_accelerators,
            metrics.get("train_au_mean_percentage", 0),
            metrics.get("train_au_stdev_percentage", 0),
            metrics.get("train_io_mean_MB_per_second", 0),
            metrics.get("train_io_stdev_MB_per_second", 0)
        ])
    
    df = pd.DataFrame(
        data,
        columns=["model", "processes", "accelerator_usage", 
                 "accelerator_usage_std", "io_throughput", "io_throughput_std"]
    )
    
    return df.sort_values(by=["model", "processes"]).reset_index(drop=True)


# Load results
results_df = load_benchmark_results(RESULTS_DIR)
print(f"Loaded {len(results_df)} benchmark results")
results_df

### 6.2 Accelerator Usage vs. Number of Processes

In [None]:
def plot_accelerator_usage(df: pd.DataFrame):
    if df.empty:
        print("No data to plot")
        return
    
    fig, ax = plt.subplots(figsize=(12, 7))
    colors = sns.color_palette("husl", n_colors=len(df["model"].unique()))
    
    for i, model in enumerate(df["model"].unique()):
        model_df = df[df["model"] == model].sort_values("processes")
        ax.errorbar(
            model_df["processes"],
            model_df["accelerator_usage"],
            yerr=model_df["accelerator_usage_std"],
            marker='o', markersize=8, linewidth=2,
            label=model, capsize=5, capthick=2, color=colors[i]
        )
    
    ax.set_xlabel("Number of Processes", fontsize=14)
    ax.set_ylabel("Accelerator Usage (%)", fontsize=14)
    ax.set_title("Accelerator Usage vs. Number of Processes", fontsize=16, fontweight='bold')
    ax.legend(title="Model", loc="best")
    ax.grid(True, alpha=0.3)
    ax.set_xticks(sorted(df["processes"].unique()))
    plt.tight_layout()
    plt.savefig("accelerator_usage.png", dpi=150, bbox_inches='tight')
    plt.show()


plot_accelerator_usage(results_df)

### 6.3 I/O Throughput vs. Number of Processes
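
When reading the throughput plot, it can help to normalize by process count. The sketch below computes a scaling efficiency relative to the smallest process count measured: 1.0 means throughput grew in proportion to the number of processes, and lower values suggest a shared bottleneck. The function name is hypothetical and the input numbers are placeholders, not results from this study.

```python
def scaling_efficiency(throughput_by_procs: dict[int, float]) -> dict[int, float]:
    """Throughput per process, relative to the smallest process count measured."""
    if not throughput_by_procs:
        return {}
    base_procs = min(throughput_by_procs)
    per_proc_baseline = throughput_by_procs[base_procs] / base_procs
    return {
        procs: throughput / (procs * per_proc_baseline)
        for procs, throughput in sorted(throughput_by_procs.items())
    }


# Placeholder MB/s values keyed by process count
print(scaling_efficiency({1: 100.0, 2: 200.0, 4: 300.0}))
```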

In [None]:
def plot_io_throughput(df: pd.DataFrame):
    if df.empty:
        print("No data to plot")
        return
    
    fig, ax = plt.subplots(figsize=(12, 7))
    colors = sns.color_palette("husl", n_colors=len(df["model"].unique()))
    
    for i, model in enumerate(df["model"].unique()):
        model_df = df[df["model"] == model].sort_values("processes")
        ax.errorbar(
            model_df["processes"],
            model_df["io_throughput"],
            yerr=model_df["io_throughput_std"],
            marker='s', markersize=8, linewidth=2,
            label=model, capsize=5, capthick=2, color=colors[i]
        )
    
    ax.set_xlabel("Number of Processes", fontsize=14)
    ax.set_ylabel("I/O Throughput (MB/s)", fontsize=14)
    ax.set_title("I/O Throughput vs. Number of Processes", fontsize=16, fontweight='bold')
    ax.legend(title="Model", loc="best")
    ax.grid(True, alpha=0.3)
    ax.set_xticks(sorted(df["processes"].unique()))
    plt.tight_layout()
    plt.savefig("io_throughput.png", dpi=150, bbox_inches='tight')
    plt.show()


plot_io_throughput(results_df)

### 6.4 Summary Table

In [None]:
if not results_df.empty:
    summary = results_df.copy()
    summary['accelerator_usage'] = summary['accelerator_usage'].round(2)
    summary['io_throughput'] = summary['io_throughput'].round(4)
    summary.columns = ['Model', 'Procs', 'AU (%)', 'AU Std', 'I/O (MB/s)', 'I/O Std']
    display(summary.style.background_gradient(subset=['AU (%)'], cmap='RdYlGn', vmin=0, vmax=100))
else:
    print("No results to display")

## 7. Cleanup

Run this cell to clean up the scratch directory when done.

In [None]:
cleanup_scratch()

## 8. Conclusions

### Key Findings

Based on the analysis of the benchmark results:

1. **Accelerator Usage Patterns:** [Add observations]

2. **I/O Throughput Scaling:** [Add observations]

3. **Model-Specific Characteristics:** [Add observations]

---

**Repository:** https://github.com/HpcResearchLaboratory/perf_2025