# HPA Scaling Experiment Analysis (Enhanced with TTFT/ITL)

This notebook analyzes data from Horizontal Pod Autoscaler (HPA) scaling experiments with vLLM workloads, including performance metrics.

## Metrics Collected

The experiment monitors:
- **Replica counts** - Current and desired pod replicas
- **Waiting requests** - Number of requests in queue (vLLM metric)
- **KV cache usage** - GPU memory cache utilization percentage
- **TTFT (Time to First Token)** - Latency to first token in ms
- **ITL (Inter-Token Latency)** - Average latency between tokens in ms
- **Request Rate** - Throughput in requests/minute
- **Job status** - Active and completed load generation jobs
- **Scaling events** - When and why HPA scaled up/down

## Setup

Run this notebook after collecting experiment data with the enhanced monitor script.

## Quick Start Guide

1. **Set up environment**: Make sure you have the required packages installed:
   ```bash
   pip install pandas matplotlib jupyter
   ```

2. **Update the experiment directory path** in Section 2 (below)

3. **Run all cells**: Use "Run All" from the menu or execute cells sequentially

4. **Review the results**: 
   - Summary statistics
   - Visual plots
   - Scaling event details
   - Correlation analysis

---

## 1. Import Libraries

In [2]:
import json
import sys
from pathlib import Path
from datetime import datetime

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.patches import Rectangle

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

print("✓ Libraries imported successfully")

✓ Libraries imported successfully


## 2. Configure Experiment Directory

Set the path to your experiment data directory. This should be the output from `monitor-hpa-experiment.sh`.

In [3]:
# Set this to your experiment directory
# Example: './experiment-data/hpa-experiment-20251124-120000'
EXPERIMENT_DIR = './experiment-data/high-load-hpa-20251124-155733'

# Convert to Path object
exp_path = Path(EXPERIMENT_DIR)

if not exp_path.exists():
    print(f"❌ Error: Experiment directory not found: {EXPERIMENT_DIR}")
    print("\nAvailable experiments:")
    data_dir = Path('./experiment-data')
    if data_dir.exists():
        experiments = sorted(data_dir.glob('hpa-experiment-*'), reverse=True)
        for exp in experiments[:5]:
            print(f"  - {exp.name}")
    else:
        print("  No experiment-data directory found")
else:
    print(f"✓ Using experiment directory: {EXPERIMENT_DIR}")

❌ Error: Experiment directory not found: ./experiment-data/hpa-experiment-CHANGE-THIS

Available experiments:
  No experiment-data directory found
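
Experiment directories do not always share the `hpa-experiment-` prefix (the path configured above is `high-load-hpa-20251124-155733`), so a prefix-based listing can miss runs. A minimal sketch that instead lists every subdirectory containing a `metrics.csv`. The helper name `list_experiments` and its `limit` parameter are illustrative and not part of the monitor script, and the demo builds a throwaway directory tree with made-up names so it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_experiments(data_dir, limit=5):
    """Return names of subdirectories of data_dir that contain metrics.csv.

    Names are sorted in reverse alphabetical order and cut to `limit` entries.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    names = [p.name for p in root.iterdir()
             if p.is_dir() and (p / "metrics.csv").is_file()]
    return sorted(names, reverse=True)[:limit]


# Demo on a temporary tree with made-up directory names
with tempfile.TemporaryDirectory() as tmp:
    for name in ["hpa-experiment-a", "high-load-hpa-b", "not-an-experiment"]:
        (Path(tmp) / name).mkdir()
    for name in ["hpa-experiment-a", "high-load-hpa-b"]:
        (Path(tmp) / name / "metrics.csv").write_text("timestamp,replicas\n")
    found = list_experiments(tmp)

for name in found:
    print(f"  - {name}")
```

In the notebook, `list_experiments('./experiment-data')` could feed the "Available experiments" listing.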


## 3. Load Experiment Data

Load the metadata, metrics CSV, and scaling events from the experiment directory.

In [4]:
# Load metadata
metadata_file = exp_path / "metadata.json"
if metadata_file.exists():
    with open(metadata_file) as f:
        metadata = json.load(f)
    print("✓ Loaded metadata")
else:
    metadata = {}
    print("⚠ No metadata file found")

# Load metrics CSV
metrics_file = exp_path / "metrics.csv"
if not metrics_file.exists():
    raise FileNotFoundError(f"Metrics file not found: {metrics_file}")

df = pd.read_csv(metrics_file)
df['timestamp'] = pd.to_datetime(df['timestamp'])
print(f"✓ Loaded {len(df)} data points from metrics.csv")

# Load scaling events
scaling_log = exp_path / "scaling.log"
scaling_events = []
if scaling_log.exists():
    with open(scaling_log) as f:
        content = f.read()
        # Parse scaling events
        for block in content.split("========================================"):
            if "SCALING EVENT" in block:
                lines = block.strip().split('\n')
                scaling_events.append({
                    'text': block.strip(),
                    'lines': lines
                })
    print(f"✓ Found {len(scaling_events)} scaling events")
else:
    print("⚠ No scaling log found")

print("\nData loaded successfully!")

⚠ No metadata file found


FileNotFoundError: Metrics file not found: experiment-data/hpa-experiment-CHANGE-THIS/metrics.csv
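
The later cells index `df` by fixed column names, so a `metrics.csv` written by a different monitor version would fail with a `KeyError` partway through the notebook. A small sketch of an upfront check: the column list is collected from the cells in this notebook, while the helper name `missing_columns` and the sample header are illustrative.

```python
# Columns that the summary, plot, and correlation cells read from df
REQUIRED_COLUMNS = [
    "timestamp",
    "replicas",
    "desired_replicas",
    "num_requests_waiting_current",
    "num_requests_waiting_target",
    "kv_cache_usage_current",
    "kv_cache_usage_target",
    "active_jobs",
    "completed_jobs",
]


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required names absent from `columns`, in listed order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Made-up header that lacks the two job columns
sample_header = REQUIRED_COLUMNS[:7]
missing = missing_columns(sample_header)
print(f"Missing columns: {missing}")
```

Right after `pd.read_csv`, calling `missing_columns(df.columns)` and raising on a non-empty result would surface the problem before any plotting starts.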

## 4. Experiment Summary

Display key information about the experiment.

In [None]:
print("=" * 70)
print("EXPERIMENT SUMMARY")
print("=" * 70)
print(f"Experiment Name: {metadata.get('experiment_name', 'N/A')}")
print(f"Start Time:      {metadata.get('start_time', 'N/A')}")
print(f"End Time:        {metadata.get('end_time', 'N/A')}")
print(f"Namespace:       {metadata.get('namespace', 'N/A')}")
print(f"HPA Name:        {metadata.get('hpa_name', 'N/A')}")
print(f"Deployment:      {metadata.get('deployment_name', 'N/A')}")
print()

print("=" * 70)
print("SCALING STATISTICS")
print("=" * 70)
print(f"Initial Replicas:  {df['replicas'].iloc[0]}")
print(f"Final Replicas:    {df['replicas'].iloc[-1]}")
print(f"Max Replicas:      {df['replicas'].max()}")
print(f"Min Replicas:      {df['replicas'].min()}")
print(f"Scaling Events:    {len(scaling_events)}")
print()

print("=" * 70)
print("METRIC STATISTICS")
print("=" * 70)
print("Waiting Requests:")
print(f"  Mean:   {df['num_requests_waiting_current'].mean():.2f}")
print(f"  Max:    {df['num_requests_waiting_current'].max():.2f}")
print(f"  Target: {df['num_requests_waiting_target'].iloc[0]}")
print()
print("KV Cache Usage (%):")
print(f"  Mean:   {df['kv_cache_usage_current'].mean():.2f}")
print(f"  Max:    {df['kv_cache_usage_current'].max():.2f}")
print(f"  Target: {df['kv_cache_usage_target'].iloc[0]}")
print()

print("=" * 70)
print("JOB STATISTICS")
print("=" * 70)
print(f"Max Active Jobs:    {df['active_jobs'].max()}")
print(f"Total Completed:    {df['completed_jobs'].iloc[-1]}")
print("=" * 70)
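
The metrics list at the top of the notebook includes TTFT, ITL, and request rate, which the summary cell does not print. A sketch of percentile statistics for those metrics, assuming the CSV exposes them as `ttft_ms`, `itl_ms`, and `request_rate_per_min`. These column names are hypothetical, so check the header of your own `metrics.csv`; the demo values are made up.

```python
import pandas as pd

# Hypothetical column names; the real ones depend on the monitor script.
LATENCY_COLUMNS = ["ttft_ms", "itl_ms", "request_rate_per_min"]


def latency_summary(frame, columns=LATENCY_COLUMNS):
    """Mean, median, 95th percentile, and max for each column that exists."""
    rows = {}
    for col in columns:
        if col not in frame.columns:
            continue  # skip metrics this file does not contain
        values = pd.to_numeric(frame[col], errors="coerce").dropna()
        rows[col] = {
            "mean": values.mean(),
            "p50": values.quantile(0.50),
            "p95": values.quantile(0.95),
            "max": values.max(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Made-up samples; the request-rate column is left out on purpose
demo = pd.DataFrame({
    "ttft_ms": [100, 200, 300, 400],
    "itl_ms": [10, 20, 30, 40],
})
summary = latency_summary(demo)
print(summary.round(1))
```

Columns that are absent are skipped rather than raising, so the same call works on files from the older monitor.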

## 5. Data Preview

View the first few rows of the metrics data.

In [None]:
df.head(10)

## 6. Visualization: 4-Panel Time Series Plot

Create a comprehensive visualization showing:
1. **Replica count** - Current vs desired replicas over time
2. **Waiting requests** - Current value vs target threshold
3. **KV cache usage** - Current percentage vs target threshold
4. **Job activity** - Active and completed jobs

In [None]:
fig, axes = plt.subplots(4, 1, figsize=(16, 12), sharex=True)
fig.suptitle(f"HPA Scaling Experiment: {metadata.get('experiment_name', 'Unknown')}", 
             fontsize=16, fontweight='bold')

timestamps = df['timestamp']

# Plot 1: Replica count
ax1 = axes[0]
ax1.plot(timestamps, df['replicas'], marker='o', label='Current Replicas', 
         linewidth=2, color='blue', markersize=4)
ax1.plot(timestamps, df['desired_replicas'], marker='s', label='Desired Replicas', 
         linewidth=1, linestyle='--', color='orange', alpha=0.7, markersize=3)
ax1.set_ylabel('Replica Count', fontweight='bold', fontsize=12)
ax1.legend(loc='upper left', fontsize=10)
ax1.grid(True, alpha=0.3)
ax1.set_ylim(bottom=0)

# Highlight scaling events with vertical lines.
# diff() is NaN for the first sample, so the start of the run is not marked as a change.
scaling_times = df.loc[df['replicas'].diff().fillna(0) != 0, 'timestamp']
for st in scaling_times:
    ax1.axvline(x=st, color='red', linestyle=':', alpha=0.5, linewidth=1.5)

# Plot 2: Number of waiting requests
ax2 = axes[1]
ax2.plot(timestamps, df['num_requests_waiting_current'], marker='o', 
         label='Current Waiting Requests', linewidth=2, color='green', markersize=4)
ax2.axhline(y=df['num_requests_waiting_target'].iloc[0], color='red', 
            linestyle='--', label='Target Threshold', linewidth=2, alpha=0.7)
ax2.fill_between(timestamps, 0, df['num_requests_waiting_target'].iloc[0], 
                 color='green', alpha=0.1, label='Safe Zone')
ax2.set_ylabel('Waiting Requests', fontweight='bold', fontsize=12)
ax2.legend(loc='upper left', fontsize=10)
ax2.grid(True, alpha=0.3)
ax2.set_ylim(bottom=0)

# Plot 3: KV Cache Usage
ax3 = axes[2]
ax3.plot(timestamps, df['kv_cache_usage_current'], marker='o', 
         label='Current KV Cache Usage', linewidth=2, color='purple', markersize=4)
ax3.axhline(y=df['kv_cache_usage_target'].iloc[0], color='red', 
            linestyle='--', label='Target Threshold', linewidth=2, alpha=0.7)
ax3.fill_between(timestamps, 0, df['kv_cache_usage_target'].iloc[0], 
                 color='purple', alpha=0.1, label='Safe Zone')
ax3.set_ylabel('KV Cache Usage (%)', fontweight='bold', fontsize=12)
ax3.legend(loc='upper left', fontsize=10)
ax3.grid(True, alpha=0.3)
ax3.set_ylim(bottom=0, top=100)

# Plot 4: Active Jobs
ax4 = axes[3]
width = 0.002  # Bar width in days (matplotlib date units), about 173 seconds
ax4.bar(timestamps, df['active_jobs'], label='Active Jobs', 
        color='teal', alpha=0.7, width=width)
ax4.bar(timestamps, df['completed_jobs'], label='Completed Jobs', 
        color='gray', alpha=0.5, width=width, bottom=df['active_jobs'])
ax4.set_ylabel('Job Count', fontweight='bold', fontsize=12)
ax4.set_xlabel('Time', fontweight='bold', fontsize=12)
ax4.legend(loc='upper left', fontsize=10)
ax4.grid(True, alpha=0.3)
ax4.set_ylim(bottom=0)

# Format x-axis
ax4.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
ax4.xaxis.set_major_locator(mdates.MinuteLocator(interval=2))
plt.xticks(rotation=45, ha='right')

plt.tight_layout()
plt.show()

print("\n✓ Plot generated successfully")
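
The red dotted lines in the first panel mark samples where the replica count changed. A sketch that tabulates those changes with their size and direction, which is easier to compare across runs than the plot alone. The helper name `replica_changes` is illustrative and the demo data is made up; only the `timestamp` and `replicas` columns from `metrics.csv` are assumed.

```python
import pandas as pd


def replica_changes(frame):
    """Rows where `replicas` differs from the previous sample."""
    delta = frame["replicas"].diff()
    # The first sample has no predecessor (NaN) and is not a change.
    mask = delta.fillna(0) != 0
    changed = frame.loc[mask, ["timestamp", "replicas"]].copy()
    changed["delta"] = delta.loc[changed.index].astype(int)
    changed["direction"] = changed["delta"].map(
        lambda d: "up" if d > 0 else "down"
    )
    return changed.reset_index(drop=True)


# Made-up run: scale 1 -> 3, then 3 -> 2
demo = pd.DataFrame({
    "timestamp": pd.date_range("2025-11-24 15:57:33", periods=6,
                               freq=pd.Timedelta(seconds=30)),
    "replicas": [1, 1, 3, 3, 2, 2],
})
changes = replica_changes(demo)
print(changes)
```

Calling `replica_changes(df)` in the notebook gives one row per observed change; because the monitor samples at intervals, several HPA steps between two samples appear as a single row.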

## 7. Save Plot to File (Optional)

Uncomment and run to save the plot to a PNG file.

In [None]:
# Uncomment to save the plot
# output_file = f"{metadata.get('experiment_name', 'experiment')}-results.png"
# fig.savefig(output_file, dpi=300, bbox_inches='tight')
# print(f"✓ Plot saved to: {output_file}")

## 8. Scaling Events Details

View detailed information about each scaling event.

In [None]:
if scaling_events:
    print("=" * 70)
    print(f"SCALING EVENTS ({len(scaling_events)} total)")
    print("=" * 70)
    print()
    
    for i, event in enumerate(scaling_events, 1):
        print(f"Event {i}:")
        print("-" * 70)
        # Print first 600 characters of each event
        event_text = event['text']
        if len(event_text) > 600:
            print(event_text[:600] + "...")
        else:
            print(event_text)
        print()
else:
    print("No scaling events detected in this experiment.")

## 9. Additional Analysis: Metric Correlation

Analyze the relationship between metrics and scaling decisions.

In [None]:
# Calculate when each metric exceeded its threshold
df['waiting_requests_exceeded'] = df['num_requests_waiting_current'] > df['num_requests_waiting_target']
df['kv_cache_exceeded'] = df['kv_cache_usage_current'] > df['kv_cache_usage_target']
df['any_threshold_exceeded'] = df['waiting_requests_exceeded'] | df['kv_cache_exceeded']

# Correlation between metrics
print("Metric Correlation Matrix:")
print("-" * 50)
correlation_cols = ['replicas', 'num_requests_waiting_current', 'kv_cache_usage_current', 'active_jobs']
print(df[correlation_cols].corr().round(3))
print()

# Count threshold violations
waiting_violations = df['waiting_requests_exceeded'].sum()
kv_violations = df['kv_cache_exceeded'].sum()
total_samples = len(df)

print("Threshold Violations:")
print(f"  Waiting Requests > {df['num_requests_waiting_target'].iloc[0]}: {waiting_violations}/{total_samples} samples ({100*waiting_violations/total_samples:.1f}%)")
print(f"  KV Cache > {df['kv_cache_usage_target'].iloc[0]}%: {kv_violations}/{total_samples} samples ({100*kv_violations/total_samples:.1f}%)")
print()

# Summarize scaling events; the metric that triggered each one is recorded in scaling.log
if len(scaling_events) > 0:
    print("Scaling Trigger Analysis:")
    print(f"  Total scaling events: {len(scaling_events)}")
    print("  Note: Check scaling.log for detailed trigger information")
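
Per-sample violation counts do not show how long a threshold stayed exceeded or whether the replica count moved during that time. A sketch that groups consecutive violating samples into episodes and records the replica count at the start and end of each. The helper name `violation_episodes` is illustrative and the demo data is made up; the flag column matches the `waiting_requests_exceeded` column computed above.

```python
import pandas as pd


def violation_episodes(frame, flag_col):
    """Group consecutive True samples of `flag_col` into episodes."""
    flag = frame[flag_col].astype(bool)
    # A new run starts wherever the flag value differs from the previous one.
    run_id = flag.ne(flag.shift()).cumsum()
    episodes = []
    for _, run in frame[flag].groupby(run_id[flag]):
        episodes.append({
            "start": run["timestamp"].iloc[0],
            "end": run["timestamp"].iloc[-1],
            "samples": len(run),
            "replicas_start": int(run["replicas"].iloc[0]),
            "replicas_end": int(run["replicas"].iloc[-1]),
        })
    return pd.DataFrame(
        episodes,
        columns=["start", "end", "samples", "replicas_start", "replicas_end"],
    )


# Made-up run with two separate violation episodes
demo = pd.DataFrame({
    "timestamp": pd.date_range("2025-11-24 15:57:33", periods=6,
                               freq=pd.Timedelta(seconds=30)),
    "replicas": [1, 1, 2, 2, 2, 3],
    "waiting_requests_exceeded": [False, True, True, False, True, False],
})
episodes = violation_episodes(demo, "waiting_requests_exceeded")
print(episodes)
```

An episode whose `replicas_end` equals `replicas_start` is a violation during which no scale-up was observed, which is a reasonable place to start reading `scaling.log`.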

## 10. Export Data for Further Analysis

Export processed data to CSV for use in other tools.

In [None]:
# df already carries the threshold columns added in Section 9, so a copy is enough
df_export = df.copy()

# Uncomment to export
# output_csv = exp_path / "analysis_results.csv"
# df_export.to_csv(output_csv, index=False)
# print(f"✓ Data exported to: {output_csv}")

print("Data ready for export. Uncomment the code above to save.")

## Summary

This notebook covers the following steps for an HPA scaling experiment:

✅ **Data loaded** from experiment directory  
✅ **Summary statistics** calculated  
✅ **4-panel visualization** showing all key metrics  
✅ **Scaling events** extracted and displayed  
✅ **Correlation analysis** between metrics  
✅ **Export capability** for further analysis

### Next Steps

1. **Compare experiments**: Run this notebook on multiple experiment directories to compare different HPA configurations
2. **Tune thresholds**: Use the insights to adjust HPA target values
3. **Share results**: Export the plot and summary for documentation
4. **Deep dive**: Examine `scaling.log` for detailed event information
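
For the first step, comparing experiments, a side-by-side table can be less work than rerunning the whole notebook per directory. A sketch under the assumption that each experiment directory holds a `metrics.csv` with the columns used above. The helper name `compare_experiments`, the choice of summary columns, and the demo directories and values are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd


def compare_experiments(exp_dirs):
    """One summary row per experiment directory that has a metrics.csv."""
    rows = []
    for exp_dir in map(Path, exp_dirs):
        metrics_file = exp_dir / "metrics.csv"
        if not metrics_file.is_file():
            continue  # skip directories without metrics
        m = pd.read_csv(metrics_file)
        rows.append({
            "experiment": exp_dir.name,
            "samples": len(m),
            "max_replicas": m["replicas"].max(),
            "mean_waiting": m["num_requests_waiting_current"].mean(),
            "max_kv_cache": m["kv_cache_usage_current"].max(),
        })
    columns = ["experiment", "samples", "max_replicas",
               "mean_waiting", "max_kv_cache"]
    return pd.DataFrame(rows, columns=columns).set_index("experiment")


# Demo with two made-up runs written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name, replicas in [("run-a", [1, 2, 4]), ("run-b", [1, 1, 2])]:
        run_dir = Path(tmp) / name
        run_dir.mkdir()
        pd.DataFrame({
            "replicas": replicas,
            "num_requests_waiting_current": [0, 6, 3],
            "kv_cache_usage_current": [10.0, 80.0, 55.0],
        }).to_csv(run_dir / "metrics.csv", index=False)
    table = compare_experiments(
        [Path(tmp) / "run-a", Path(tmp) / "run-b", Path(tmp) / "missing"]
    )

print(table)
```

Directories without a `metrics.csv` are skipped silently, so passing a glob of everything under `./experiment-data` is safe.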

### Files in Experiment Directory

- `metrics.csv` - Time-series data (loaded in this notebook)
- `scaling.log` - Detailed scaling event information
- `events.log` - Kubernetes HPA events
- `jobs.log` - Load job status over time
- `metadata.json` - Experiment configuration
- `monitor.log` - Monitor script output