# Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device: Research Artifact (Experimental Evaluation)

This notebook contains all experiments from the MICRO'25 paper _Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device_. Each section corresponds to one experiment that can be run independently or as part of the complete evaluation.

## Overview of Experiments

1. **Binary Matrix Multiplication (1-bmatmul)**: Performance breakdown analysis of binary matrix multiplication with different optimization levels
2. **Phoenix Benchmark Suite (2-phoenix)**: Speedup evaluation across multiple benchmarks comparing CPU, GPU, and APU implementations
3. **Analytical Model Validation (3-analytical)**: Validation of analytical performance models against measured results
4. **RAG End-to-End Inference (4-rag-e2e)**: End-to-end inference time analysis for Retrieval-Augmented Generation workloads
5. **RAG Energy Analysis (5-rag-energy)**: Energy consumption comparison between GPU and compute-in-SRAM approaches
6. **RAG Latency Breakdown (6-rag-latency-breakdown)**: Detailed latency breakdown analysis for RAG components

## Instructions

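- Run the setup cell first; it defines the `run_experiment` helper that every experiment cell calls
- Run individual experiment cells to execute specific experiments, or run all cells to execute the complete evaluation suite
- Experiment 3 uses the measured Phoenix results from Experiment 2, and Experiment 6 uses the measured execution times from Experiment 4
- Experiments 1, 2, 4, and 5 save PDF and PNG figures to their respective directories; the PNG is displayed inline after a successful run
- Terminal output is streamed inline in the notebook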


In [None]:
import os
import sys
import subprocess
from pathlib import Path
from IPython.display import Image, display
import matplotlib.pyplot as plt

# Get the absolute path of the artifact root directory
artifact_root = Path.cwd().absolute()
print(f"Artifact root directory: {artifact_root}")

# Mapping of experiments to their expected output PNG files
EXPERIMENT_FIGURES = {
    "1-bmatmul": "bmatmul.png",
    "2-phoenix": "phoenix-speedup.png", 
    "4-rag-e2e": "e2e_inference_time.png",
    "5-rag-energy": "energy_comparison.png"
}

def display_experiment_figure(experiment_dir):
    """Display the PNG figure if it exists for this experiment"""
    if experiment_dir in EXPERIMENT_FIGURES:
        png_file = EXPERIMENT_FIGURES[experiment_dir]
        exp_path = artifact_root / experiment_dir
        png_path = exp_path / png_file
        
        if png_path.exists():
            print("\n📊 Generated Figure:")
            display(Image(filename=str(png_path)))
            return True
        else:
            print(f"⚠️  Expected figure {png_file} not found in {experiment_dir}")
            return False
    return True

# Function to run experiment in its directory
def run_experiment(experiment_dir, description=""):
    """Run an experiment in its specific directory and return to root"""
    print(f"\n{'='*80}")
    print(f"🚀 Running Experiment: {experiment_dir}")
    if description:
        print(f"📝 Description: {description}")
    print('='*80)
    
    # Change to experiment directory
    original_dir = os.getcwd()
    exp_path = artifact_root / experiment_dir
    
    if not exp_path.exists():
        print(f"❌ Error: Directory {experiment_dir} not found!")
        return False
    
    os.chdir(exp_path)
    print(f"📁 Changed to directory: {exp_path}")
    
    try:
        # Run the experiment
        process = subprocess.Popen(
            [sys.executable, "-u", "run.py"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1
        )
        
        # Stream and filter
        for line in process.stdout:
            if "E: No packages found" in line:
                continue  # filter it out
            print(line, end="")  # preserve real-time behavior
        
        process.wait()
        
        if process.returncode == 0:
            print(f"✅ Experiment {experiment_dir} completed successfully!")
            success = True
        else:
            print(f"❌ Experiment {experiment_dir} failed with return code {process.returncode}")
            success = False
            
    except Exception as e:
        print(f"❌ Error running experiment {experiment_dir}: {e}")
        success = False
        
    finally:
        # Always return to original directory
        os.chdir(original_dir)
    
    # Display figure if experiment was successful and produces one
    if success:
        display_experiment_figure(experiment_dir)
        
    return success

print("🔧 Setup completed. Ready to run experiments!")

## Experiment 1: Binary Matrix Multiplication Performance Analysis

This experiment evaluates the performance of binary matrix multiplication across different optimization levels on the APU architecture. It runs five configurations:

- **Baseline**: No optimizations
- **Opt1**: First optimization level  
- **Opt2**: Second optimization level
- **Opt3**: Third optimization level
- **Optimized**: All optimizations combined

The experiment produces a performance breakdown chart showing the execution time distribution across different components:
- **LD LHS**: Left-hand side matrix loading time
- **LD RHS**: Right-hand side matrix loading time  
- **VR Op**: Vector register operations time
- **ST**: Store operations time

**Output**: 
- Generates `bmatmul.pdf` and `bmatmul.png` with stacked bar chart showing performance breakdown
- PNG figure displayed inline below after execution
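
As a rough illustration of what the stacked bars encode, the sketch below computes each component's share of a configuration's total time. The latencies are made-up placeholders, and `breakdown_ms` and `component_shares` are hypothetical names; none of this is taken from the artifact's `run.py`.

```python
# Hypothetical per-component latencies in milliseconds (placeholders, not measured data).
breakdown_ms = {
    "Baseline": {"LD LHS": 40.0, "LD RHS": 30.0, "VR Op": 20.0, "ST": 10.0},
    "Optimized": {"LD LHS": 5.0, "LD RHS": 5.0, "VR Op": 8.0, "ST": 2.0},
}


def component_shares(components):
    """Return each component's fraction of the configuration's total time."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


for config, components in breakdown_ms.items():
    total = sum(components.values())
    shares = component_shares(components)
    summary = ", ".join(f"{name}: {share:.0%}" for name, share in shares.items())
    print(f"{config:<10} total={total:6.1f} ms  ({summary})")
```

Each bar in the chart stacks the four component times, so its height is the configuration's total and the segment sizes correspond to these shares.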


In [None]:
# Run Binary Matrix Multiplication Experiment
run_experiment("1-bmatmul", 
               "Binary matrix multiplication performance breakdown across optimization levels")


## Experiment 2: Phoenix Benchmark Suite Evaluation

This experiment runs the Phoenix benchmark suite to compare performance across different computing platforms and optimization levels. It evaluates seven benchmarks:

- **Histogram**: Data frequency analysis
- **Linear Regression**: Statistical modeling  
- **Matrix Multiply**: Dense matrix operations
- **K-means**: Clustering algorithm
- **Reverse Index**: Text processing
- **String Match**: Pattern matching
- **Word Count**: Text analysis

The experiment compares performance across platforms:
- **CPU single-thread**: Baseline single-threaded CPU execution
- **CPU multi-thread**: Multi-threaded CPU execution
- **APU configurations**: No optimization, Opt1, Opt2, Opt3, and all optimizations

**Output**: 
- Generates `phoenix-speedup.pdf` and `phoenix-speedup.png` with speedup comparison chart
- Produces `ablation.json` with detailed performance data
- Terminal output showing performance statistics
- PNG figure displayed inline below after execution
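
For readers unfamiliar with how speedup figures are typically derived, here is a minimal sketch that normalizes latencies to the single-threaded CPU baseline. The numbers are placeholders, the dictionary layout is hypothetical (the actual schema of `ablation.json` is not described here), and aggregating with a geometric mean is an assumption rather than the paper's stated method.

```python
from statistics import geometric_mean

# Hypothetical latencies in seconds (placeholders, not results from the paper).
latency_s = {
    "histogram": {"cpu_1t": 8.0, "cpu_mt": 2.0, "apu_all_opts": 1.0},
    "word_count": {"cpu_1t": 9.0, "cpu_mt": 3.0, "apu_all_opts": 4.0},
}


def speedups(times, baseline="cpu_1t"):
    """Speedup of every platform relative to the baseline (higher is faster)."""
    return {platform: times[baseline] / t for platform, t in times.items()}


per_benchmark = {name: speedups(times) for name, times in latency_s.items()}
for name, result in per_benchmark.items():
    print(name, {platform: round(s, 2) for platform, s in result.items()})

# Assumption: ratios such as speedups are averaged with a geometric mean.
apu_geomean = geometric_mean(
    [result["apu_all_opts"] for result in per_benchmark.values()]
)
print(f"APU geomean speedup: {apu_geomean:.2f}x")
```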


In [None]:
# Run Phoenix Benchmark Suite Experiment  
run_experiment("2-phoenix",
               "Phoenix benchmark suite speedup evaluation across CPU, GPU, and APU platforms")


## Experiment 3: Analytical Model Validation

This experiment validates the accuracy of analytical performance models by comparing predicted latencies against actual measured results from the Phoenix benchmark suite. It evaluates the analytical models for all seven Phoenix benchmarks.

The experiment:
1. **Extracts measured latencies** from the optimized Phoenix benchmark results (from Experiment 2)
2. **Runs analytical prediction scripts** for each benchmark  
3. **Compares predicted vs. measured values** to calculate error percentages
4. **Reports overall model accuracy** across all benchmarks

For each benchmark, the analytical model considers:
- Memory access patterns and latencies
- Compute operation costs  
- Data movement overheads
- APU architecture-specific optimizations

**Output**: 
- Terminal table showing measured vs. predicted latencies with error percentages
- Overall error rate calculation across all benchmarks
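
The comparison step can be sketched as follows. The latencies are placeholders, `percent_error` is a hypothetical helper, and taking the overall error as the mean of the per-benchmark absolute percentage errors is an assumption; the artifact's script may aggregate differently.

```python
def percent_error(measured, predicted):
    """Absolute prediction error as a percentage of the measured value."""
    return abs(predicted - measured) / measured * 100.0


# Hypothetical (measured_ms, predicted_ms) pairs; placeholders, not paper results.
latencies_ms = {
    "histogram": (10.0, 9.0),
    "kmeans": (20.0, 23.0),
}

errors = {
    name: percent_error(measured, predicted)
    for name, (measured, predicted) in latencies_ms.items()
}

print(f"{'benchmark':<12}{'measured':>10}{'predicted':>11}{'error %':>9}")
for name, (measured, predicted) in latencies_ms.items():
    print(f"{name:<12}{measured:>10.2f}{predicted:>11.2f}{errors[name]:>9.1f}")

# Assumption: overall error is the mean of the per-benchmark errors.
overall_error = sum(errors.values()) / len(errors)
print(f"overall error: {overall_error:.1f}%")
```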


In [None]:
# Run Analytical Model Validation Experiment
run_experiment("3-analytical", 
               "Validation of analytical performance models against measured Phoenix benchmark results")


## Experiment 4: RAG End-to-End Inference Analysis

This experiment evaluates end-to-end inference time for Retrieval-Augmented Generation (RAG) workloads across different platforms and corpus sizes. RAG combines retrieval of relevant documents with language model generation.

The experiment tests three corpus sizes:
- **10GB**: Small corpus for lightweight workloads
- **50GB**: Medium corpus for moderate workloads  
- **200GB**: Large corpus for enterprise-scale workloads

Platforms evaluated:
- **CPU**: Traditional CPU-based retrieval
- **GPU**: GPU-accelerated retrieval
- **In-SRAM configurations**: APU with different optimization levels (No Opt, Opt1, Opt2, Opt3, All Opts)

Components measured:
- **Generation**: Language model inference time (consistent across platforms)
- **Retrieval**: Document retrieval and similarity computation time (varies by platform)

**Output**:
- Generates `e2e_inference_time.pdf` and `e2e_inference_time.png` showing time-to-interactive comparison
- Terminal output with speedup analysis and performance data
- Detailed timing breakdown for each configuration
- PNG figure displayed inline below after execution
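
Because generation time is the same on every platform, a faster retrieval stage yields a smaller end-to-end gain. The sketch below illustrates this with placeholder numbers, assuming end-to-end time is simply generation plus retrieval; `GENERATION_S`, `retrieval_s`, and `end_to_end` are hypothetical names.

```python
# Hypothetical generation time in seconds, identical on every platform.
GENERATION_S = 2.0

# Hypothetical retrieval times in seconds (placeholders, not paper results).
retrieval_s = {"CPU": 8.0, "GPU": 2.0, "In-SRAM (All Opts)": 0.5}


def end_to_end(retrieval, generation=GENERATION_S):
    """Assumed model: end-to-end time is retrieval plus generation."""
    return retrieval + generation


baseline_total = end_to_end(retrieval_s["CPU"])
e2e_speedup = {}
for platform, retrieval in retrieval_s.items():
    total = end_to_end(retrieval)
    e2e_speedup[platform] = baseline_total / total
    retrieval_speedup = retrieval_s["CPU"] / retrieval
    print(
        f"{platform:<20} total={total:5.2f} s  "
        f"retrieval speedup={retrieval_speedup:5.2f}x  "
        f"end-to-end speedup={e2e_speedup[platform]:5.2f}x"
    )
```

With these placeholder values, a 16x faster retrieval stage translates into a 4x end-to-end speedup, since the fixed generation time dominates once retrieval becomes cheap.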


In [None]:
# Run RAG End-to-End Inference Experiment
run_experiment("4-rag-e2e",
               "RAG end-to-end inference time analysis across platforms and corpus sizes")


## Experiment 5: RAG Energy Consumption Analysis

This experiment analyzes the energy consumption of RAG workloads, comparing the compute-in-SRAM (APU) approach against traditional GPU acceleration. Energy efficiency is critical for sustainable AI deployment, especially for large-scale retrieval workloads.

The experiment evaluates energy consumption across:
- **Three corpus sizes**: 10GB, 50GB, 200GB
- **Two platforms**: GPU vs. Compute-in-SRAM (APU)

Energy breakdown components for APU:
- **Static**: Base power consumption  
- **DRAM**: Memory access energy (using theoretical HBM energy)
- **L3,L2,L1**: Cache hierarchy energy
- **Compute**: Processing unit energy
- **Other**: Miscellaneous system energy

The analysis:
1. **Measures APU power consumption** using power profiling data
2. **Calculates energy breakdown** for different RAG components  
3. **Compares total energy consumption** between GPU and APU
4. **Reports energy efficiency gains** from the compute-in-SRAM approach

**Output**:
- Generates `energy_comparison.pdf` and `energy_comparison.png` with energy consumption comparison chart
- Terminal table showing energy efficiency metrics (GPU energy / APU energy)
- Detailed energy breakdown analysis
- PNG figure displayed inline below after execution
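
The efficiency metric in the terminal table is the ratio of GPU energy to APU energy. A minimal sketch of that arithmetic, assuming energy is average power multiplied by runtime and that APU energy is the sum of its breakdown components, looks like this. All values are placeholders and `energy_joules` is a hypothetical helper.

```python
def energy_joules(avg_power_w, runtime_s):
    """Energy in joules from average power (watts) and runtime (seconds)."""
    return avg_power_w * runtime_s


# Hypothetical GPU measurement (placeholder values, not results from the paper).
gpu_energy_j = energy_joules(avg_power_w=300.0, runtime_s=2.0)

# Hypothetical APU energy per breakdown component, in joules.
apu_breakdown_j = {
    "Static": 40.0,
    "DRAM": 30.0,
    "L3,L2,L1": 10.0,
    "Compute": 15.0,
    "Other": 5.0,
}
apu_energy_j = sum(apu_breakdown_j.values())

# Efficiency metric as described above: GPU energy / APU energy.
efficiency = gpu_energy_j / apu_energy_j
print(f"GPU: {gpu_energy_j:.1f} J, APU: {apu_energy_j:.1f} J, ratio: {efficiency:.2f}x")
```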


In [None]:
# Run RAG Energy Consumption Analysis
run_experiment("5-rag-energy",
               "RAG energy consumption comparison between GPU and compute-in-SRAM approaches")


## Experiment 6: RAG Latency Breakdown Analysis

This experiment provides a detailed latency breakdown for RAG workloads, decomposing the total execution time into individual pipeline components. The breakdown helps identify performance bottlenecks and optimization opportunities.

The experiment analyzes latency breakdown for:
- **Two optimization levels**: No optimization vs. All optimizations
- **Three corpus sizes**: 10GB, 50GB, 200GB
- **Five RAG components**: the pipeline stages listed below

**RAG Pipeline Components:**
1. **Load Embedding**: Loading document embeddings from HBM
2. **Load Query**: Loading query vectors for similarity computation  
3. **Calc Distance**: Computing similarity/distance between query and document embeddings
4. **Top-K Aggregation**: Finding and aggregating the K most similar documents
5. **Return Top-K**: Transferring the top-K results back to the host
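
To make the five stages concrete, here is a plain-Python reference sketch of exhaustive top-K retrieval. It is not the APU kernel: it assumes inner-product similarity (the metric used by the artifact is not stated here), uses tiny placeholder vectors, and `top_k` is a hypothetical helper.

```python
import heapq


def top_k(query, embeddings, k):
    """Return the k (doc_id, score) pairs with the highest inner product."""
    # "Calc Distance": score every document embedding against the query.
    scores = (
        (doc_id, sum(q * e for q, e in zip(query, emb)))
        for doc_id, emb in enumerate(embeddings)
    )
    # "Top-K Aggregation": keep only the k best-scoring documents.
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


# "Load Embedding" and "Load Query": placeholder data standing in for the corpus.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query = [1.0, 0.2]

# "Return Top-K": the result handed back to the caller.
results = top_k(query, embeddings, k=2)
print([doc_id for doc_id, _ in results])
```

`heapq.nlargest` avoids sorting the full score list, which mirrors why top-K aggregation is treated as its own stage rather than as part of the distance computation.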

**Analysis Details:**
- Uses measured execution times from Experiment 4 (RAG E2E)
- Incorporates HBM access times based on theoretical models
- Compares optimized vs. unoptimized implementations
- Reports latencies in appropriate units (milliseconds for major components, microseconds for smaller ones)

**Output**:
- Terminal table showing detailed latency breakdown for all configurations
- Comparison between optimization levels across different corpus sizes


In [None]:
# Run RAG Latency Breakdown Analysis  
run_experiment("6-rag-latency-breakdown",
               "Detailed latency breakdown analysis for RAG pipeline components")
