# Lab 2.4.2 - Advanced Dynamic Batching and Intelligent Scheduling

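## 🎯 Lab Objectives

In this lab you will learn how to:
1. Implement an intelligent dynamic batch scheduling algorithm
2. Design a priority-based request handling mechanism
3. Balance latency against throughput
4. Implement adaptive batch size adjustment
5. Build request queuing and load balancing strategies

## 📋 Prerequisites

- Completed Lab 2.1 (Triton basic setup)
- Understanding of basic batching concepts
- Familiarity with performance monitoring and tuning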

---

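## 📚 Theoretical Background

### Challenges of Dynamic Batching

**1. Latency vs. throughput trade-off**
- Large batches: high throughput, high latency
- Small batches: low latency, low throughput
- Dynamic adjustment: balance the two according to current load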
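
The trade-off can be made concrete with a toy cost model. The sketch below assumes a fixed 20 ms cost per batch plus 1 ms per sample, the same simulated figures this lab uses later in `_process_batch`. Real GPU workloads will not follow this curve exactly, and `batch_cost` is a hypothetical helper name used only for illustration.

```python
def batch_cost(batch_size: int, base_s: float = 0.020, per_sample_s: float = 0.001):
    """Return (latency_s, throughput_rps) under an assumed linear cost model."""
    latency = base_s + per_sample_s * batch_size
    return latency, batch_size / latency


for size in (1, 8, 32):
    latency, throughput = batch_cost(size)
    print(f"batch={size:>2}  latency={latency * 1000:.0f} ms  "
          f"throughput={throughput:.0f} req/s")
```

Under these assumptions, growing the batch from 1 to 32 raises latency from 21 ms to 52 ms while throughput rises from roughly 48 to roughly 615 req/s, which is the tension the scheduler has to manage.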

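**2. Request priority management**
- VIP users are served first
- Urgent requests take a fast path
- Batch jobs run at low priority

**3. Resource utilization optimization**
- Efficient use of GPU memory
- Dynamic allocation of compute resources
- Parallel execution of multiple models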
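### Intelligent Scheduling Architecture

```mermaid
graph TD
    A[Receive request] --> B[Classify priority]
    B --> C{Load check}
    C -->|High load| D[Large-batch strategy]
    C -->|Low load| E[Small-batch strategy]
    C -->|Medium load| F[Dynamic adjustment]
    
    D --> G[Batch scheduler]
    E --> G
    F --> G
    
    G --> H[GPU execution]
    H --> I[Return results]
    I --> J[Performance monitoring]
    J --> C
```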
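
The load-check branch in the diagram can be sketched as a small pure function. The load metric (queue depth relative to one full batch) and the 0.25 / 1.0 thresholds are assumptions chosen for illustration; the scheduler built later in this lab uses recent latency measurements instead of fixed thresholds.

```python
def choose_strategy(queue_size: int, max_batch_size: int = 32) -> str:
    """Map the observed queue depth to one of the three branches in the diagram."""
    # Assumed load metric: queue depth relative to one full batch
    load = queue_size / max_batch_size
    if load >= 1.0:
        return "large_batch"   # high load: enough requests to fill a batch
    if load <= 0.25:
        return "small_batch"   # low load: favor latency
    return "dynamic"           # medium load: adjust from recent metrics


for depth in (4, 16, 40):
    print(depth, choose_strategy(depth))
```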

## 🛠️ Environment Setup

In [None]:
import os
import json
import time
import random
import asyncio
import threading
import queue
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple, Any
from dataclasses import dataclass, field
from concurrent.futures import ThreadPoolExecutor, as_completed
from collections import deque, defaultdict
import heapq

# Performance monitoring
import psutil
from threading import Lock, Event

# Triton client
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.animation import FuncAnimation
from IPython.display import clear_output

# Plot styling
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print(f"🚀 Dynamic Batching Lab initialized at {datetime.now()}")
print(f"📊 Working directory: {os.getcwd()}")

In [None]:
# Set up the experiment environment
BASE_DIR = "/opt/tritonserver"
MODEL_REPO = f"{BASE_DIR}/models"
EXPERIMENT_DIR = f"{BASE_DIR}/experiments/dynamic_batching"

# Create the experiment directories
os.makedirs(EXPERIMENT_DIR, exist_ok=True)
os.makedirs(f"{EXPERIMENT_DIR}/metrics", exist_ok=True)
os.makedirs(f"{EXPERIMENT_DIR}/configs", exist_ok=True)
os.makedirs(f"{EXPERIMENT_DIR}/logs", exist_ok=True)

print(f"📁 Experiment directory: {EXPERIMENT_DIR}")
print(f"📁 Model repository: {MODEL_REPO}")

## 🎯 Experiment 1: Designing the Intelligent Batch Scheduler

### 1.1 Request and Batch Data Structures

In [None]:
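from typing import Callable  # not part of the typing imports in the setup cell


@dataclass
class InferenceRequest:
    """An inference request."""
    request_id: str
    user_id: str
    priority: int  # 1 = highest, 5 = lowest
    data: Any
    created_at: datetime
    timeout: float  # seconds
    callback: Optional[Callable[..., Any]] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    
    def __lt__(self, other):
        """Ordering for the priority queue: priority first, then arrival time."""
        if self.priority != other.priority:
            return self.priority < other.priority
        return self.created_at < other.created_at
    
    @property
    def age(self) -> float:
        """Age of the request in seconds."""
        return (datetime.now() - self.created_at).total_seconds()
    
    @property
    def is_expired(self) -> bool:
        """Whether the request has exceeded its timeout."""
        return self.age > self.timeout


@dataclass
class BatchConfig:
    """Batching configuration."""
    min_batch_size: int = 1
    max_batch_size: int = 32
    max_wait_time: float = 0.1  # seconds
    target_latency: float = 0.05  # seconds
    # Multipliers applied to max_wait_time per priority level.
    # Note: with these defaults priority 2 gets a shorter wait threshold than
    # priority 1; Experiment 2 overrides them with monotonic values.
    priority_boost: Dict[int, float] = field(default_factory=lambda: {
        1: 0.8,  # VIP: 80% of max_wait_time (20% shorter)
        2: 0.6,  # High priority: 60% of max_wait_time (40% shorter)
        3: 1.0,  # Normal priority: unchanged
        4: 1.2,  # Low priority: 20% longer
        5: 1.5   # Lowest priority: 50% longer
    })


@dataclass
class BatchMetrics:
    """Performance metrics for one batch."""
    batch_id: str
    batch_size: int
    processing_time: float
    wait_time: float
    total_latency: float
    throughput: float
    priority_distribution: Dict[int, int]
    timestamp: datetime
    gpu_utilization: float = 0.0
    memory_usage: float = 0.0


print("✅ Data structures defined")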
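
To see how the `__lt__` rule orders requests inside a heap, the sketch below uses `MiniRequest`, a stripped-down stand-in for `InferenceRequest` (a hypothetical class kept small so the example is self-contained). It applies the same comparison: lower priority number first, and earlier arrival first within a priority.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MiniRequest:
    """Stripped-down stand-in for InferenceRequest (illustration only)."""
    request_id: str
    priority: int
    created_at: datetime

    def __lt__(self, other: "MiniRequest") -> bool:
        # Same rule as InferenceRequest.__lt__: priority, then arrival time
        if self.priority != other.priority:
            return self.priority < other.priority
        return self.created_at < other.created_at


t0 = datetime(2024, 1, 1)
heap = []
for rid, prio, offset_ms in [("a", 3, 0), ("b", 1, 10), ("c", 3, 5), ("d", 1, 20)]:
    heapq.heappush(heap, MiniRequest(rid, prio, t0 + timedelta(milliseconds=offset_ms)))

order = [heapq.heappop(heap).request_id for _ in range(len(heap))]
print(order)  # ['b', 'd', 'a', 'c']
```

Both priority-1 requests leave the heap before the priority-3 requests, even though `a` and `c` arrived earlier. Under sustained high-priority load this ordering can starve low-priority requests until they time out.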

### 1.2 Implementing the Intelligent Batch Scheduler

In [None]:
class SmartBatchScheduler:
    """Intelligent batch scheduler."""
    
    def __init__(self, config: BatchConfig, model_name: str = "text_classifier"):
        self.config = config
        self.model_name = model_name
        
        # Request queue (a heap ordered by priority, then arrival time)
        self.priority_queue = []
        self.queue_lock = Lock()
        
        # Performance history
        self.metrics_history: List[BatchMetrics] = []
        self.metrics_lock = Lock()
        
        # Scheduler state
        self.running = False
        self.scheduler_thread = None
        self.stop_event = Event()
        
        # Adaptive parameters
        self.adaptive_config = {
            "current_batch_size": config.min_batch_size,
            "avg_latency": 0.0,
            "avg_throughput": 0.0,
            "load_factor": 0.0
        }
        
        # Counters
        self.stats = {
            "total_requests": 0,
            "processed_requests": 0,
            "expired_requests": 0,
            "total_batches": 0,
            "avg_batch_size": 0.0
        }
    
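    def add_request(self, request: InferenceRequest) -> bool:
        """Enqueue an inference request; return False if it has already expired."""
        with self.queue_lock:
            # The scheduler thread also updates stats under queue_lock,
            # so the expired counter is guarded here as well
            if request.is_expired:
                self.stats["expired_requests"] += 1
                return False
            
            heapq.heappush(self.priority_queue, request)
            self.stats["total_requests"] += 1
        
        return True
    
    def get_queue_status(self) -> Dict[str, Any]:
        """Return a snapshot of the queue."""
        with self.queue_lock:
            queue_size = len(self.priority_queue)
            priority_counts = defaultdict(int)
            
            for req in self.priority_queue:
                priority_counts[req.priority] += 1
            
            # heap[0] is the highest-priority request, not necessarily the
            # oldest one, so scan the queue for the maximum age
            oldest_age = max((req.age for req in self.priority_queue), default=0)
            
            return {
                "queue_size": queue_size,
                "priority_distribution": dict(priority_counts),
                "oldest_request_age": oldest_age
            }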
    
    def _calculate_optimal_batch_size(self) -> int:
        """Compute the target batch size."""
        with self.queue_lock:
            queue_size = len(self.priority_queue)
        
        if queue_size == 0:
            return self.config.min_batch_size
        
        # Adapt based on the ten most recent batches
        with self.metrics_lock:
            recent_metrics = self.metrics_history[-10:]
        
        if recent_metrics:
            avg_latency = np.mean([m.total_latency for m in recent_metrics])
            
            # Latency too high: shrink the batch
            if avg_latency > self.config.target_latency * 1.5:
                target_size = max(self.config.min_batch_size, 
                                self.adaptive_config["current_batch_size"] - 2)
            # Latency on target and a long queue: grow the batch
            elif avg_latency <= self.config.target_latency and queue_size > 10:
                target_size = min(self.config.max_batch_size,
                                self.adaptive_config["current_batch_size"] + 1)
            else:
                target_size = self.adaptive_config["current_batch_size"]
        else:
            # No history yet: size the batch from the queue length
            if queue_size >= self.config.max_batch_size:
                target_size = self.config.max_batch_size
            elif queue_size >= self.config.min_batch_size:
                target_size = min(queue_size, self.config.max_batch_size)
            else:
                target_size = self.config.min_batch_size
        
        # Clamp in case the config was swapped at runtime and the stored
        # size now lies outside the new bounds
        target_size = max(self.config.min_batch_size,
                          min(self.config.max_batch_size, target_size))
        
        self.adaptive_config["current_batch_size"] = target_size
        return target_size
    
    def _calculate_wait_time(self, priority: int) -> float:
        """Compute the wait threshold for a priority level."""
        base_wait = self.config.max_wait_time
        priority_factor = self.config.priority_boost.get(priority, 1.0)
        return base_wait * priority_factor
    
    def _form_batch(self) -> List[InferenceRequest]:
        """Pop up to the target number of live requests from the queue."""
        batch = []
        target_size = self._calculate_optimal_batch_size()
        
        with self.queue_lock:
            # Drop expired requests at the head of the queue
            expired_count = 0
            while self.priority_queue and self.priority_queue[0].is_expired:
                heapq.heappop(self.priority_queue)
                expired_count += 1
            
            self.stats["expired_requests"] += expired_count
            
            # Fill the batch, skipping any request that expired in the meantime
            while len(batch) < target_size and self.priority_queue:
                request = heapq.heappop(self.priority_queue)
                if not request.is_expired:
                    batch.append(request)
                else:
                    self.stats["expired_requests"] += 1
        
        return batch
    
    def _should_process_batch(self, batch: List[InferenceRequest]) -> bool:
        """Decide whether the batch should be processed now."""
        if not batch:
            return False
        
        # The batch has reached the minimum size
        if len(batch) >= self.config.min_batch_size:
            return True
        
        # Any request has waited longer than its priority-scaled threshold
        for request in batch:
            wait_threshold = self._calculate_wait_time(request.priority)
            if request.age >= wait_threshold:
                return True
        
        return False
    
    def _process_batch(self, batch: List[InferenceRequest]) -> Optional[BatchMetrics]:
        """Process a batch (simulated); return None for an empty batch."""
        if not batch:
            return None
        
        batch_id = f"batch_{int(time.time() * 1000)}"
        batch_size = len(batch)
        
        # Record the start time
        start_time = time.time()
        
        # Wait time of the batch = age of its oldest request
        wait_time = max(req.age for req in batch)
        
        # Simulated processing time: fixed base cost plus a per-sample cost
        base_processing_time = 0.02  # 20 ms base cost
        batch_overhead = batch_size * 0.001  # 1 ms per sample
        processing_time = base_processing_time + batch_overhead + random.uniform(0, 0.01)
        
        # Simulate the actual work
        time.sleep(processing_time)
        
        # Compute metrics
        total_latency = time.time() - start_time + wait_time
        throughput = batch_size / processing_time
        
        # Priority distribution
        priority_dist = defaultdict(int)
        for req in batch:
            priority_dist[req.priority] += 1
        
        # Build the batch metrics
        metrics = BatchMetrics(
            batch_id=batch_id,
            batch_size=batch_size,
            processing_time=processing_time,
            wait_time=wait_time,
            total_latency=total_latency,
            throughput=throughput,
            priority_distribution=dict(priority_dist),
            timestamp=datetime.now(),
            gpu_utilization=random.uniform(0.7, 0.95),  # simulated GPU utilization
            memory_usage=random.uniform(0.4, 0.8)        # simulated memory usage
        )
        
        # Update statistics
        with self.metrics_lock:
            self.metrics_history.append(metrics)
            self.stats["processed_requests"] += batch_size
            self.stats["total_batches"] += 1
            
            # Keep only the most recent 1000 records
            if len(self.metrics_history) > 1000:
                self.metrics_history = self.metrics_history[-1000:]
        
        return metrics
    
    def _scheduler_loop(self):
        """Main scheduler loop."""
        print("🚀 Intelligent batch scheduler started")
        
        while not self.stop_event.is_set():
            try:
                # Form a batch
                batch = self._form_batch()
                
                # Check whether it should be processed now
                if self._should_process_batch(batch):
                    metrics = self._process_batch(batch)
                    if metrics:
                        print(f"📊 Processed batch {metrics.batch_id}: "
                              f"size={metrics.batch_size}, "
                              f"latency={metrics.total_latency:.3f}s, "
                              f"throughput={metrics.throughput:.1f} req/s")
                else:
                    # The batch is not ready yet: put its requests back in the queue
                    if batch:
                        with self.queue_lock:
                            for req in batch:
                                heapq.heappush(self.priority_queue, req)
                
                # Brief pause
                time.sleep(0.001)  # 1 ms
                
            except Exception as e:
                print(f"❌ Scheduler error: {str(e)}")
                time.sleep(0.1)
        
        print("⏹️  Intelligent batch scheduler stopped")
    
    def start(self):
        """Start the scheduler."""
        if not self.running:
            self.running = True
            self.stop_event.clear()
            # daemon=True so a forgotten stop() does not keep the kernel alive
            self.scheduler_thread = threading.Thread(target=self._scheduler_loop, daemon=True)
            self.scheduler_thread.start()
    
    def stop(self):
        """Stop the scheduler and wait for its thread to exit."""
        if self.running:
            self.running = False
            self.stop_event.set()
            if self.scheduler_thread:
                self.scheduler_thread.join()
    
    def get_performance_summary(self) -> Dict[str, Any]:
        """Return a performance summary."""
        with self.metrics_lock:
            if not self.metrics_history:
                return {"error": "no performance data"}
            
            recent_metrics = self.metrics_history[-100:]  # last 100 batches
            
            avg_batch_size = np.mean([m.batch_size for m in recent_metrics])
            avg_latency = np.mean([m.total_latency for m in recent_metrics])
            avg_throughput = np.mean([m.throughput for m in recent_metrics])
            avg_gpu_util = np.mean([m.gpu_utilization for m in recent_metrics])
            
            return {
                "total_requests": self.stats["total_requests"],
                "processed_requests": self.stats["processed_requests"],
                "expired_requests": self.stats["expired_requests"],
                "total_batches": self.stats["total_batches"],
                "avg_batch_size": avg_batch_size,
                "avg_latency": avg_latency,
                "avg_throughput": avg_throughput,
                "avg_gpu_utilization": avg_gpu_util,
                "queue_status": self.get_queue_status()
            }


print("✅ Intelligent batch scheduler implemented")
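
The adjustment rule inside `_calculate_optimal_batch_size` is easier to reason about as a pure function. The sketch below restates it outside the class: shrink by 2 when average latency exceeds 1.5x the target, grow by 1 when latency is on target and more than 10 requests are queued, otherwise hold. `next_batch_size` is a hypothetical name, and the final clamp to the configured bounds is a defensive addition for this sketch.

```python
def next_batch_size(current: int, avg_latency: float, target_latency: float,
                    queue_size: int, min_size: int = 1, max_size: int = 32) -> int:
    """One step of the batch-size adjustment rule used by the scheduler."""
    if avg_latency > 1.5 * target_latency:
        proposed = current - 2          # latency too high: back off quickly
    elif avg_latency <= target_latency and queue_size > 10:
        proposed = current + 1          # healthy latency and a backlog: grow slowly
    else:
        proposed = current              # in between: hold steady
    return max(min_size, min(max_size, proposed))


size = 8
for latency in (0.04, 0.04, 0.10, 0.06):
    size = next_batch_size(size, latency, target_latency=0.05, queue_size=20)
    print(f"avg_latency={latency * 1000:.0f} ms -> batch size {size}")
```

Growing by one and shrinking by two makes the scheduler react faster to latency violations than to spare capacity, which favors the latency target over throughput.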

## 🎯 Experiment 2: Scheduler Testing and Performance Analysis

### 2.1 Creating a Scheduler Instance

In [None]:
# Create the batch configuration
batch_config = BatchConfig(
    min_batch_size=2,
    max_batch_size=16,
    max_wait_time=0.05,  # 50 ms
    target_latency=0.03,  # 30 ms
    priority_boost={
        1: 0.5,  # VIP users: wait threshold halved
        2: 0.7,  # High-priority users
        3: 1.0,  # Regular users
        4: 1.3,  # Low-priority users
        5: 1.8   # Batch-processing users
    }
)

# Create the scheduler
scheduler = SmartBatchScheduler(batch_config)

print("✅ Intelligent batch scheduler created")
print(f"📊 Config: min_batch={batch_config.min_batch_size}, "
      f"max_batch={batch_config.max_batch_size}, "
      f"max_wait={batch_config.max_wait_time}s")

### 2.2 Request Generator

In [None]:
class RequestGenerator:
    """Synthetic request generator."""
    
    def __init__(self):
        self.user_types = {
            "vip": {"priority": 1, "ratio": 0.05, "timeout": 1.0},
            "premium": {"priority": 2, "ratio": 0.15, "timeout": 2.0},
            "regular": {"priority": 3, "ratio": 0.70, "timeout": 5.0},
            "batch": {"priority": 4, "ratio": 0.08, "timeout": 30.0},
            "background": {"priority": 5, "ratio": 0.02, "timeout": 60.0}
        }
        self.request_counter = 0
    
    def generate_request(self) -> InferenceRequest:
        """Generate a single request."""
        self.request_counter += 1
        
        # Pick a user type according to the configured ratios
        # (falls back to "regular" if rounding leaves the cumulative sum below rand)
        rand = random.random()
        cumulative = 0
        selected_type = "regular"
        
        for user_type, config in self.user_types.items():
            cumulative += config["ratio"]
            if rand <= cumulative:
                selected_type = user_type
                break
        
        user_config = self.user_types[selected_type]
        
        # Build the request
        request = InferenceRequest(
            request_id=f"req_{self.request_counter:06d}",
            user_id=f"{selected_type}_user_{random.randint(1000, 9999)}",
            priority=user_config["priority"],
            data=np.random.randn(224, 224, 3),  # simulated image data
            created_at=datetime.now(),
            timeout=user_config["timeout"],
            metadata={"user_type": selected_type}
        )
        
        return request
    
    def generate_burst_requests(self, count: int, 
                              priority_bias: Optional[int] = None) -> List[InferenceRequest]:
        """Generate a burst of requests."""
        requests = []
        
        for _ in range(count):
            request = self.generate_request()
            
            # If a priority bias is given, override the priority only;
            # user_id, timeout and metadata keep the sampled user type
            if priority_bias is not None:
                request.priority = priority_bias
            
            requests.append(request)
        
        return requests
    
    def simulate_traffic_pattern(self, duration: int, 
                                pattern: str = "normal") -> List[InferenceRequest]:
        """Generate requests for a traffic pattern.

        `duration` is in simulated seconds. All requests are created up front;
        the caller decides how fast they are actually submitted.
        """
        requests = []
        
        if pattern == "normal":
            # Normal traffic: 10-20 requests per simulated second
            for _ in range(duration):
                count = random.randint(10, 20)
                requests.extend(self.generate_burst_requests(count))
        
        elif pattern == "spike":
            # Spiky traffic: a large burst in a short time
            for second in range(duration):
                if second % 10 == 0:  # one peak every 10 simulated seconds
                    count = random.randint(50, 100)
                else:
                    count = random.randint(5, 15)
                requests.extend(self.generate_burst_requests(count))
        
        elif pattern == "priority_flood":
            # Surge of high-priority requests
            for second in range(duration):
                if second < duration // 2:
                    # First half: normal traffic
                    count = random.randint(10, 20)
                    requests.extend(self.generate_burst_requests(count))
                else:
                    # Second half: normal traffic plus a priority-1 surge
                    normal_count = random.randint(10, 20)
                    priority_count = random.randint(20, 40)
                    requests.extend(self.generate_burst_requests(normal_count))
                    requests.extend(self.generate_burst_requests(priority_count, priority_bias=1))
        
        return requests


# Create the request generator
request_generator = RequestGenerator()
print("✅ Request generator created")
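
The cumulative-ratio loop in `generate_request` is a hand-written weighted draw. As an alternative sketch, the standard library's `random.choices` does the same job in one call; the ratios below are copied from `user_types`, while `USER_TYPE_RATIOS` and `sample_user_types` are hypothetical names introduced for this example.

```python
import random
from collections import Counter

# Ratios copied from RequestGenerator.user_types
USER_TYPE_RATIOS = {
    "vip": 0.05,
    "premium": 0.15,
    "regular": 0.70,
    "batch": 0.08,
    "background": 0.02,
}


def sample_user_types(n: int, seed: int = 0) -> list:
    """Draw n user types with probability proportional to their ratio."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    return rng.choices(list(USER_TYPE_RATIOS),
                       weights=list(USER_TYPE_RATIOS.values()), k=n)


counts = Counter(sample_user_types(10_000))
for user_type, count in counts.most_common():
    print(f"{user_type:<10} {count / 10_000:.3f}")
```

Using a seeded `random.Random` instance also makes a traffic trace repeatable, which helps when comparing two scheduler configurations on identical input.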

### 2.3 Starting the Scheduler and Running the Tests

In [None]:
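# Start the scheduler
scheduler.start()

print("🚀 Starting performance tests...")

# Test 1: normal traffic pattern
print("\n📊 Test 1: normal traffic pattern (30 simulated seconds)")
normal_requests = request_generator.simulate_traffic_pattern(30, "normal")

# Add requests one at a time (simulating real-time arrival)
for i, request in enumerate(normal_requests):
    # All requests were generated up front, so stamp the arrival time here;
    # otherwise the generation-to-arrival gap counts as queueing delay
    request.created_at = datetime.now()
    scheduler.add_request(request)
    
    # Report progress every 50 requests
    if (i + 1) % 50 == 0:
        status = scheduler.get_queue_status()
        print(f"   📈 Added {i + 1} requests, queue length: {status['queue_size']}")
    
    # Simulated inter-arrival gap
    time.sleep(0.001)

# Wait for processing to finish
time.sleep(2)
performance_summary = scheduler.get_performance_summary()
print(f"✅ Normal traffic test finished:")
print(f"   📊 Processed requests: {performance_summary['processed_requests']}")
print(f"   📊 Average latency: {performance_summary['avg_latency']:.3f}s")
print(f"   📊 Average throughput: {performance_summary['avg_throughput']:.1f} req/s")
print(f"   📊 Average batch size: {performance_summary['avg_batch_size']:.1f}")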

In [None]:
# Test 2: spike traffic pattern
print("\n📊 Test 2: spike traffic pattern (20 simulated seconds)")
spike_requests = request_generator.simulate_traffic_pattern(20, "spike")

for request in spike_requests:
    request.created_at = datetime.now()  # stamp arrival time, not generation time
    scheduler.add_request(request)
    time.sleep(0.0001)  # faster arrival rate

# Wait for processing to finish
time.sleep(3)
spike_performance = scheduler.get_performance_summary()
print(f"✅ Spike traffic test finished:")
print(f"   📊 Processed requests: {spike_performance['processed_requests'] - performance_summary['processed_requests']}")
print(f"   📊 Current average latency: {spike_performance['avg_latency']:.3f}s")
print(f"   📊 Current average throughput: {spike_performance['avg_throughput']:.1f} req/s")
print(f"   📊 Current average batch size: {spike_performance['avg_batch_size']:.1f}")

In [None]:
# Test 3: priority stress test
print("\n📊 Test 3: surge of high-priority requests (15 simulated seconds)")
priority_requests = request_generator.simulate_traffic_pattern(15, "priority_flood")

for request in priority_requests:
    request.created_at = datetime.now()  # stamp arrival time, not generation time
    scheduler.add_request(request)
    time.sleep(0.0005)

# Wait for processing to finish
time.sleep(2)
priority_performance = scheduler.get_performance_summary()
print(f"✅ Priority test finished:")
print(f"   📊 Total processed requests: {priority_performance['processed_requests']}")
print(f"   📊 Expired requests: {priority_performance['expired_requests']}")
print(f"   📊 Total batches: {priority_performance['total_batches']}")

# Final state
final_status = scheduler.get_queue_status()
print(f"\n📈 Final queue status:")
print(f"   📊 Remaining requests: {final_status['queue_size']}")
print(f"   📊 Priority distribution: {final_status['priority_distribution']}")
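
The tests above pace arrivals with a fixed `time.sleep` between requests. If more realistic, irregular arrivals are wanted, one common modeling choice is a Poisson process, where gaps between requests are exponentially distributed. The sketch below only computes arrival offsets; `poisson_arrival_offsets` is a hypothetical helper, and the 15 req/s rate is an assumption matching the midpoint of the "normal" pattern.

```python
import random


def poisson_arrival_offsets(rate_per_s: float, duration_s: float, seed: int = 0) -> list:
    """Return sorted arrival times (seconds from start) for a Poisson process."""
    rng = random.Random(seed)
    offsets, t = [], 0.0
    while True:
        # Exponential inter-arrival gap with mean 1 / rate_per_s
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return offsets
        offsets.append(t)


offsets = poisson_arrival_offsets(rate_per_s=15, duration_s=10)
print(f"{len(offsets)} arrivals, first at {offsets[0]:.3f}s, last at {offsets[-1]:.3f}s")
```

A driver loop could then sleep until each offset before calling `scheduler.add_request`, setting `created_at` at that moment so the measured wait time reflects queueing only.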

## 🎯 Experiment 3: Performance Monitoring and Visualization

### 3.1 Real-Time Performance Monitoring

In [None]:
def plot_scheduler_performance(scheduler: SmartBatchScheduler, window_size: int = 50):
    """Visualize scheduler performance."""
    metrics = scheduler.metrics_history[-window_size:] if scheduler.metrics_history else []
    
    if not metrics:
        print("❌ No performance data to visualize")
        return
    
    # Prepare the data
    timestamps = [m.timestamp for m in metrics]
    batch_sizes = [m.batch_size for m in metrics]
    latencies = [m.total_latency * 1000 for m in metrics]  # convert to ms
    throughputs = [m.throughput for m in metrics]
    wait_times = [m.wait_time * 1000 for m in metrics]  # convert to ms
    gpu_utils = [m.gpu_utilization * 100 for m in metrics]  # convert to percent
    
    # Create the subplots
    fig, axes = plt.subplots(2, 3, figsize=(18, 10))
    fig.suptitle('Intelligent Batch Scheduler Performance', fontsize=16, fontweight='bold')
    
    # Batch size over time
    axes[0, 0].plot(timestamps, batch_sizes, 'b-', linewidth=2, marker='o', markersize=4)
    axes[0, 0].set_title('Batch Size', fontweight='bold')
    axes[0, 0].set_ylabel('Batch size')
    axes[0, 0].grid(True, alpha=0.3)
    axes[0, 0].axhline(y=np.mean(batch_sizes), color='r', linestyle='--', alpha=0.7, label=f'Mean: {np.mean(batch_sizes):.1f}')
    axes[0, 0].legend()
    
    # Total latency over time
    axes[0, 1].plot(timestamps, latencies, 'g-', linewidth=2, marker='s', markersize=4)
    axes[0, 1].set_title('Total Latency', fontweight='bold')
    axes[0, 1].set_ylabel('Latency (ms)')
    axes[0, 1].grid(True, alpha=0.3)
    axes[0, 1].axhline(y=scheduler.config.target_latency * 1000, color='r', linestyle='--', alpha=0.7, label='Target latency')
    axes[0, 1].legend()
    
    # Throughput over time
    axes[0, 2].plot(timestamps, throughputs, 'purple', linewidth=2, marker='^', markersize=4)
    axes[0, 2].set_title('Throughput', fontweight='bold')
    axes[0, 2].set_ylabel('Throughput (req/s)')
    axes[0, 2].grid(True, alpha=0.3)
    axes[0, 2].axhline(y=np.mean(throughputs), color='r', linestyle='--', alpha=0.7, label=f'Mean: {np.mean(throughputs):.1f}')
    axes[0, 2].legend()
    
    # Wait time over time
    axes[1, 0].plot(timestamps, wait_times, 'orange', linewidth=2, marker='d', markersize=4)
    axes[1, 0].set_title('Request Wait Time', fontweight='bold')
    axes[1, 0].set_ylabel('Wait time (ms)')
    axes[1, 0].set_xlabel('Time')
    axes[1, 0].grid(True, alpha=0.3)
    
    # GPU utilization (simulated values)
    axes[1, 1].plot(timestamps, gpu_utils, 'red', linewidth=2, marker='h', markersize=4)
    axes[1, 1].set_title('GPU Utilization', fontweight='bold')
    axes[1, 1].set_ylabel('Utilization (%)')
    axes[1, 1].set_xlabel('Time')
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].set_ylim(0, 100)
    
    # Latency vs. batch size scatter plot
    scatter = axes[1, 2].scatter(batch_sizes, latencies, c=throughputs, 
                                cmap='viridis', s=50, alpha=0.7)
    axes[1, 2].set_title('Latency vs. Batch Size', fontweight='bold')
    axes[1, 2].set_xlabel('Batch size')
    axes[1, 2].set_ylabel('Latency (ms)')
    axes[1, 2].grid(True, alpha=0.3)
    
    # Add the color bar
    cbar = plt.colorbar(scatter, ax=axes[1, 2])
    cbar.set_label('Throughput (req/s)')
    
    # Rotate the timestamp labels on the five time-series panels
    for ax in axes.ravel()[:5]:
        ax.tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()
    
    # Summary statistics
    print(f"\n📊 Performance summary (last {len(metrics)} batches):")
    print(f"   📈 Average batch size: {np.mean(batch_sizes):.2f} ± {np.std(batch_sizes):.2f}")
    print(f"   ⏱️  Average latency: {np.mean(latencies):.2f}ms ± {np.std(latencies):.2f}ms")
    print(f"   🚀 Average throughput: {np.mean(throughputs):.1f} ± {np.std(throughputs):.1f} req/s")
    print(f"   ⏳ Average wait time: {np.mean(wait_times):.2f}ms ± {np.std(wait_times):.2f}ms")
    print(f"   💻 Average GPU utilization: {np.mean(gpu_utils):.1f}% ± {np.std(gpu_utils):.1f}%")


# Visualize performance
plot_scheduler_performance(scheduler, window_size=100)
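
Means and standard deviations can hide tail behavior, which is often what a latency target is really about. As a complementary sketch, the helper below reports latency percentiles with `numpy.percentile`; `latency_percentiles` is a hypothetical name, and the input is assumed to be a list of latencies in seconds such as `[m.total_latency for m in scheduler.metrics_history]`.

```python
import numpy as np


def latency_percentiles(latencies_s, percentiles=(50, 95, 99)) -> dict:
    """Return selected latency percentiles in milliseconds."""
    values_ms = np.asarray(latencies_s, dtype=float) * 1000.0
    return {f"p{p}": float(np.percentile(values_ms, p)) for p in percentiles}


# Synthetic sample: 100 latencies from 10 ms to 1000 ms in 10 ms steps
sample = [0.01 * i for i in range(1, 101)]
result = latency_percentiles(sample)
print({name: round(value, 1) for name, value in result.items()})
```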

### 3.2 Priority Handling Performance Analysis

In [None]:
def analyze_priority_performance(scheduler: SmartBatchScheduler):
    """Analyze how each priority level was served."""
    metrics = scheduler.metrics_history
    
    if not metrics:
        print("❌ No data to analyze")
        return
    
    # Collect per-priority statistics
    priority_stats = defaultdict(lambda: {
        'count': 0,
        'total_latency': 0,
        'total_wait_time': 0,
        'batches': []
    })
    
    for metric in metrics:
        for priority, count in metric.priority_distribution.items():
            priority_stats[priority]['count'] += count
            priority_stats[priority]['total_latency'] += metric.total_latency * count
            priority_stats[priority]['total_wait_time'] += metric.wait_time * count
            priority_stats[priority]['batches'].append(metric)
    
    # Display names for the priority levels
    priority_names = {1: 'VIP', 2: 'Premium', 3: 'Regular', 4: 'Batch', 5: 'Background'}
    
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('Priority Handling Performance', fontsize=16, fontweight='bold')
    
    # Prepare the data (request-weighted averages)
    priorities = sorted(priority_stats.keys())
    priority_labels = [priority_names.get(p, f'P{p}') for p in priorities]
    request_counts = [priority_stats[p]['count'] for p in priorities]
    avg_latencies = [priority_stats[p]['total_latency'] / max(priority_stats[p]['count'], 1) * 1000 
                    for p in priorities]  # convert to ms
    avg_wait_times = [priority_stats[p]['total_wait_time'] / max(priority_stats[p]['count'], 1) * 1000 
                     for p in priorities]  # convert to ms
    
    # Request count per priority
    colors = ['gold', 'lightcoral', 'lightblue', 'lightgreen', 'lightgray']
    bars1 = ax1.bar(priority_labels, request_counts, color=colors[:len(priorities)], alpha=0.8)
    ax1.set_title('Requests per Priority', fontweight='bold')
    ax1.set_ylabel('Request count')
    ax1.grid(True, alpha=0.3)
    
    # Add value labels
    for bar, count in zip(bars1, request_counts):
        ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(request_counts)*0.01,
                f'{count}', ha='center', va='bottom', fontweight='bold')
    
    # Average latency per priority
    bars2 = ax2.bar(priority_labels, avg_latencies, color=colors[:len(priorities)], alpha=0.8)
    ax2.set_title('Average Latency per Priority', fontweight='bold')
    ax2.set_ylabel('Average latency (ms)')
    ax2.grid(True, alpha=0.3)
    
    for bar, latency in zip(bars2, avg_latencies):
        ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(avg_latencies)*0.01,
                f'{latency:.1f}', ha='center', va='bottom', fontweight='bold')
    
    # Average wait time per priority
    bars3 = ax3.bar(priority_labels, avg_wait_times, color=colors[:len(priorities)], alpha=0.8)
    ax3.set_title('Average Wait Time per Priority', fontweight='bold')
    ax3.set_ylabel('Average wait time (ms)')
    ax3.grid(True, alpha=0.3)
    
    for bar, wait_time in zip(bars3, avg_wait_times):
        ax3.text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(avg_wait_times)*0.01,
                f'{wait_time:.1f}', ha='center', va='bottom', fontweight='bold')
    
    # Latency improvement relative to the baseline priority
    if len(avg_latencies) > 1:
        # Use Regular (priority 3) as the baseline when it is present;
        # otherwise fall back to the lowest priority observed
        baseline_idx = priorities.index(3) if 3 in priorities else -1
        baseline_latency = avg_latencies[baseline_idx]
        latency_improvements = [(baseline_latency - lat) / baseline_latency * 100 for lat in avg_latencies]
        
        bars4 = ax4.bar(priority_labels, latency_improvements, 
                       color=['green' if x > 0 else 'red' for x in latency_improvements], alpha=0.8)
        ax4.set_title('Latency Improvement (vs Regular)', fontweight='bold')
        ax4.set_ylabel('Latency improvement (%)')
        ax4.grid(True, alpha=0.3)
        ax4.axhline(y=0, color='black', linestyle='-', alpha=0.5)
        
        for bar, improvement in zip(bars4, latency_improvements):
            ax4.text(bar.get_x() + bar.get_width()/2, 
                    bar.get_height() + (5 if improvement > 0 else -8),
                    f'{improvement:+.1f}%', ha='center', va='bottom' if improvement > 0 else 'top', 
                    fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    # Detailed report
    print(f"\n📊 Priority handling report:")
    print("=" * 70)
    
    for i, priority in enumerate(priorities):
        stats = priority_stats[priority]
        name = priority_names.get(priority, f'Priority {priority}')
        
        print(f"\n🏷️  {name} (priority {priority}):")
        print(f"   📊 Processed requests: {stats['count']}")
        print(f"   ⏱️  Average latency: {avg_latencies[i]:.2f}ms")
        print(f"   ⏳ Average wait time: {avg_wait_times[i]:.2f}ms")
        print(f"   📈 Batches participated in: {len(stats['batches'])}")
        
        if i > 0:  # compare with the next-higher priority
            latency_diff = avg_latencies[i] - avg_latencies[i-1]
            wait_diff = avg_wait_times[i] - avg_wait_times[i-1]
            print(f"   📉 vs higher priority: latency {latency_diff:+.2f}ms, wait time {wait_diff:+.2f}ms")


# Analyze per-priority performance
analyze_priority_performance(scheduler)
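
The per-priority averages above are request-weighted means of batch-level metrics. The sketch below isolates that calculation on plain tuples so it can be checked by hand; `mean_latency_by_priority` is a hypothetical helper, and its input format (latency in seconds plus a priority-to-count mapping per batch) is an assumption mirroring `BatchMetrics.total_latency` and `BatchMetrics.priority_distribution`.

```python
from collections import defaultdict


def mean_latency_by_priority(batches) -> dict:
    """batches: iterable of (total_latency_s, {priority: request_count})."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for latency, distribution in batches:
        for priority, count in distribution.items():
            # Every request in a batch is charged the batch's latency
            totals[priority] += latency * count
            counts[priority] += count
    return {p: totals[p] / counts[p] for p in sorted(counts) if counts[p]}


example = [
    (0.04, {1: 2, 3: 2}),  # mixed batch: two VIP and two regular requests
    (0.08, {3: 4}),        # regular-only batch
]
by_priority = mean_latency_by_priority(example)
print({p: round(v * 1000, 1) for p, v in by_priority.items()})
```

Because a batch's latency is attributed to every request in it, requests of different priorities that share a batch get identical figures. Recording latency per request would be needed to measure how much each priority level actually gains.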

## 🎯 Experiment 4: Adaptive Batch Size Optimization

### 4.1 Adaptive Tuning Algorithm

In [None]:
class AdaptiveBatchOptimizer:
    """Adaptive batch size optimizer."""
    
    def __init__(self, scheduler: SmartBatchScheduler):
        self.scheduler = scheduler
        self.optimization_history = []
        self.best_config = None
        self.best_score = float('-inf')
        
        # Search ranges for the tunable parameters
        self.param_ranges = {
            'min_batch_size': (1, 8),
            'max_batch_size': (8, 64),
            'max_wait_time': (0.01, 0.2),
            'target_latency': (0.01, 0.1)
        }
        
        # Scoring weights (sum to 1.0)
        self.score_weights = {
            'throughput': 0.4,      # throughput weight
            'latency': 0.3,         # latency weight (lower latency scores higher)
            'utilization': 0.2,     # resource utilization weight
            'fairness': 0.1         # fairness weight
        }
    
    def calculate_performance_score(self, window_size: int = 20) -> float:
        """Compute a weighted performance score in the range 0-1."""
        recent_metrics = self.scheduler.metrics_history[-window_size:]
        
        if not recent_metrics:
            return 0.0
        
        # Compute the individual metrics
        avg_throughput = np.mean([m.throughput for m in recent_metrics])
        avg_latency = np.mean([m.total_latency for m in recent_metrics])
        avg_utilization = np.mean([m.gpu_utilization for m in recent_metrics])
        
        # Fairness: standard deviation of mean latency across priorities
        # (a smaller spread means fairer treatment)
        priority_latencies = defaultdict(list)
        for metric in recent_metrics:
            for priority in metric.priority_distribution.keys():
                priority_latencies[priority].append(metric.total_latency)
        
        if len(priority_latencies) > 1:
            priority_avg_latencies = [np.mean(latencies) for latencies in priority_latencies.values()]
            fairness_score = 1.0 / (1.0 + np.std(priority_avg_latencies))
        else:
            fairness_score = 1.0
        
        # Normalize each component to 0-1
        throughput_score = min(avg_throughput / 1000.0, 1.0)  # assumes a ceiling of 1000 req/s
        latency_score = max(0, 1.0 - avg_latency / 0.2)  # latency of 200 ms or more scores 0
        utilization_score = avg_utilization  # GPU utilization is already 0-1
        
        # Weighted sum
        total_score = (
            self.score_weights['throughput'] * throughput_score +
            self.score_weights['latency'] * latency_score +
            self.score_weights['utilization'] * utilization_score +
            self.score_weights['fairness'] * fairness_score
        )
        
        return total_score
    
    def generate_config_variant(self, base_config: BatchConfig, mutation_rate: float = 0.2) -> BatchConfig:
        """生成配置變體"""
        new_config = BatchConfig(
            min_batch_size=base_config.min_batch_size,
            max_batch_size=base_config.max_batch_size,
            max_wait_time=base_config.max_wait_time,
            target_latency=base_config.target_latency,
            priority_boost=base_config.priority_boost.copy()
        )
        
        # 隨機變異參數
        if random.random() < mutation_rate:
            min_val, max_val = self.param_ranges['min_batch_size']
            new_config.min_batch_size = random.randint(min_val, max_val)
        
        if random.random() < mutation_rate:
            min_val, max_val = self.param_ranges['max_batch_size']
            new_config.max_batch_size = random.randint(min_val, max_val)
            # 確保 max >= min
            new_config.max_batch_size = max(new_config.max_batch_size, new_config.min_batch_size)
        
        if random.random() < mutation_rate:
            min_val, max_val = self.param_ranges['max_wait_time']
            new_config.max_wait_time = random.uniform(min_val, max_val)
        
        if random.random() < mutation_rate:
            min_val, max_val = self.param_ranges['target_latency']
            new_config.target_latency = random.uniform(min_val, max_val)
        
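        # Enforce max >= min even when only min_batch_size was mutated
        new_config.max_batch_size = max(new_config.max_batch_size, new_config.min_batch_size)
        
        return new_config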
    
    def optimize_batch_config(self, iterations: int = 5, test_duration: int = 10):
        """優化批次配置"""
        print(f"🔧 開始自適應批次配置優化（{iterations} 次迭代）")
        
        current_config = self.scheduler.config
        
        for iteration in range(iterations):
            print(f"\n🔄 迭代 {iteration + 1}/{iterations}")
            
            # 生成新配置
            if iteration == 0:
                test_config = current_config
            else:
                test_config = self.generate_config_variant(current_config)
            
            print(f"   🔧 測試配置: min_batch={test_config.min_batch_size}, "
                  f"max_batch={test_config.max_batch_size}, "
                  f"max_wait={test_config.max_wait_time:.3f}s, "
                  f"target_latency={test_config.target_latency:.3f}s")
            
            # 應用新配置
            old_config = self.scheduler.config
            self.scheduler.config = test_config
            
            # 清空歷史記錄以獲得純淨測試
            metrics_backup = self.scheduler.metrics_history.copy()
            self.scheduler.metrics_history.clear()
            
            try:
                # 生成測試流量
                test_requests = request_generator.simulate_traffic_pattern(test_duration, "normal")
                
                # 添加請求
                for request in test_requests:
                    self.scheduler.add_request(request)
                    time.sleep(0.001)
                
                # 等待處理完成
                time.sleep(2)
                
                # 計算性能分數
                score = self.calculate_performance_score()
                
                print(f"   📊 性能分數: {score:.4f}")
                
                # 記錄結果
                self.optimization_history.append({
                    'iteration': iteration + 1,
                    'config': test_config,
                    'score': score,
                    'metrics_count': len(self.scheduler.metrics_history)
                })
                
                # 更新最佳配置
                if score > self.best_score:
                    self.best_score = score
                    self.best_config = test_config
                    current_config = test_config
                    print(f"   ✅ 發現更佳配置！新最佳分數: {score:.4f}")
                else:
                    print(f"   📉 配置未改善，保持當前配置")
                
            except Exception as e:
                print(f"   ❌ 測試失敗: {str(e)}")
                # 恢復原配置
                self.scheduler.config = old_config
            
            # 恢復部分歷史記錄
            self.scheduler.metrics_history = metrics_backup + self.scheduler.metrics_history
        
        # 應用最佳配置
        if self.best_config:
            self.scheduler.config = self.best_config
            print(f"\n🎯 優化完成！應用最佳配置:")
            print(f"   📊 最佳分數: {self.best_score:.4f}")
            print(f"   🔧 最佳配置: min_batch={self.best_config.min_batch_size}, "
                  f"max_batch={self.best_config.max_batch_size}, "
                  f"max_wait={self.best_config.max_wait_time:.3f}s, "
                  f"target_latency={self.best_config.target_latency:.3f}s")
    
    def plot_optimization_history(self):
        """可視化優化歷史"""
        if not self.optimization_history:
            print("❌ 沒有優化歷史數據")
            return
        
        iterations = [h['iteration'] for h in self.optimization_history]
        scores = [h['score'] for h in self.optimization_history]
        min_batches = [h['config'].min_batch_size for h in self.optimization_history]
        max_batches = [h['config'].max_batch_size for h in self.optimization_history]
        wait_times = [h['config'].max_wait_time * 1000 for h in self.optimization_history]  # 轉換為毫秒
        
        fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
        fig.suptitle('自適應批次配置優化歷史', fontsize=16, fontweight='bold')
        
        # 性能分數趨勢
        ax1.plot(iterations, scores, 'b-o', linewidth=2, markersize=8)
        ax1.set_title('性能分數變化', fontweight='bold')
        ax1.set_xlabel('迭代次數')
        ax1.set_ylabel('性能分數')
        ax1.grid(True, alpha=0.3)
        
        # 標記最佳點
        best_idx = scores.index(max(scores))
        ax1.scatter(iterations[best_idx], scores[best_idx], color='red', s=100, zorder=5)
        ax1.annotate(f'最佳: {scores[best_idx]:.4f}', 
                    xy=(iterations[best_idx], scores[best_idx]),
                    xytext=(10, 10), textcoords='offset points',
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='yellow', alpha=0.7))
        
        # 批次大小變化
        ax2.plot(iterations, min_batches, 'g-s', label='最小批次', linewidth=2, markersize=6)
        ax2.plot(iterations, max_batches, 'r-^', label='最大批次', linewidth=2, markersize=6)
        ax2.set_title('批次大小參數變化', fontweight='bold')
        ax2.set_xlabel('迭代次數')
        ax2.set_ylabel('批次大小')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        # 等待時間變化
        ax3.plot(iterations, wait_times, 'purple', marker='d', linewidth=2, markersize=6)
        ax3.set_title('最大等待時間變化', fontweight='bold')
        ax3.set_xlabel('迭代次數')
        ax3.set_ylabel('等待時間 (ms)')
        ax3.grid(True, alpha=0.3)
        
        # 參數相關性熱圖
        param_data = pd.DataFrame({
            'score': scores,
            'min_batch': min_batches,
            'max_batch': max_batches,
            'wait_time': wait_times
        })
        
        correlation_matrix = param_data.corr()
        im = ax4.imshow(correlation_matrix, cmap='RdYlBu', aspect='auto', vmin=-1, vmax=1)
        ax4.set_title('參數相關性', fontweight='bold')
        ax4.set_xticks(range(len(correlation_matrix.columns)))
        ax4.set_yticks(range(len(correlation_matrix.columns)))
        ax4.set_xticklabels(correlation_matrix.columns, rotation=45)
        ax4.set_yticklabels(correlation_matrix.columns)
        
        # 添加相關係數文字
        for i in range(len(correlation_matrix)):
            for j in range(len(correlation_matrix.columns)):
                text = ax4.text(j, i, f'{correlation_matrix.iloc[i, j]:.2f}',
                               ha="center", va="center", color="black", fontweight='bold')
        
        plt.colorbar(im, ax=ax4)
        plt.tight_layout()
        plt.show()


# 創建優化器
optimizer = AdaptiveBatchOptimizer(scheduler)
print("✅ 自適應批次優化器創建完成")
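
To make the scoring formula in `calculate_performance_score` concrete, the sketch below recomputes the weighted score for one set of inputs. The weight values and the `composite_score` helper are hypothetical and chosen only for illustration; the lab's real weights live in `self.score_weights`, and the rule that a single priority class scores 1.0 for fairness is an assumption here.

```python
import numpy as np

# Hypothetical weights for illustration only; not the lab's actual score_weights.
EXAMPLE_WEIGHTS = {"throughput": 0.3, "latency": 0.4, "utilization": 0.2, "fairness": 0.1}


def composite_score(avg_throughput, avg_latency, avg_utilization,
                    priority_avg_latencies, weights=EXAMPLE_WEIGHTS):
    """Weighted score using the same normalization as the optimizer above."""
    # Throughput saturates at an assumed ceiling of 1000 req/s
    throughput_score = min(avg_throughput / 1000.0, 1.0)
    # A latency of 200 ms or more scores 0
    latency_score = max(0.0, 1.0 - avg_latency / 0.2)
    # Fairness drops as the per-priority average latencies spread apart
    if len(priority_avg_latencies) > 1:
        fairness_score = 1.0 / (1.0 + float(np.std(priority_avg_latencies)))
    else:
        fairness_score = 1.0  # assumption: a single priority class counts as fair
    return (weights["throughput"] * throughput_score
            + weights["latency"] * latency_score
            + weights["utilization"] * avg_utilization
            + weights["fairness"] * fairness_score)


score = composite_score(500.0, 0.05, 0.8, [0.05, 0.05])
print(f"{score:.2f}")  # prints 0.71
```

Because latencies are measured in seconds, the standard deviation across priorities is usually far below 1, so the fairness term stays close to 1.0 and has limited influence on the total unless the latencies are rescaled.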

### 4.2 Running the Adaptive Optimization

In [None]:
# Run the optimization
optimizer.optimize_batch_config(iterations=6, test_duration=8)

# Visualize the optimization process
optimizer.plot_optimization_history()
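
The loop in `optimize_batch_config` is a mutate-and-keep-the-best search (random hill climbing). The standalone sketch below shows the same idea against a synthetic objective, so it runs without a scheduler. `ToyConfig`, `toy_score`, and the parameter ranges are hypothetical stand-ins, not the lab's `BatchConfig` or its measured score.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for BatchConfig with three tunable fields."""
    min_batch_size: int
    max_batch_size: int
    max_wait_time: float


def mutate(cfg, rng, rate=0.2):
    """Re-draw each parameter with probability `rate`, then enforce max >= min."""
    min_b = rng.randint(1, 8) if rng.random() < rate else cfg.min_batch_size
    max_b = rng.randint(4, 64) if rng.random() < rate else cfg.max_batch_size
    wait = rng.uniform(0.005, 0.1) if rng.random() < rate else cfg.max_wait_time
    return ToyConfig(min_b, max(max_b, min_b), wait)


def toy_score(cfg):
    """Synthetic objective that peaks at max_batch_size=16, max_wait_time=0.03."""
    return 1.0 / (1.0 + abs(cfg.max_batch_size - 16) / 16
                  + abs(cfg.max_wait_time - 0.03) * 10)


def hill_climb(start, iterations, seed=0):
    """Mutate the best-known config and keep the candidate only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = start, toy_score(start)
    history = [best_score]
    for _ in range(iterations):
        candidate = mutate(best, rng)
        candidate_score = toy_score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, best_score, history


start = ToyConfig(min_batch_size=1, max_batch_size=64, max_wait_time=0.1)
best, best_score, history = hill_climb(start, iterations=50)
print(best, round(best_score, 3))
```

With a real scheduler each score is a noisy measurement, so a single short test window can promote a configuration by chance; repeating the measurement or lengthening `test_duration` reduces that risk.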

### 4.3 Validating Performance After Optimization

In [None]:
# Run a final validation test with the optimized configuration
print("🧪 Validation test with the optimized configuration")

# Generate a large batch of test requests
validation_requests = request_generator.simulate_traffic_pattern(30, "spike")

print(f"📊 Validation test started with {len(validation_requests)} requests")

# Record where the validation metrics begin (earlier metrics are kept, not cleared)
validation_start_metrics = len(scheduler.metrics_history)

# Submit the validation requests
for i, request in enumerate(validation_requests):
    scheduler.add_request(request)
    time.sleep(0.0005)
    
    if (i + 1) % 100 == 0:
        print(f"   📈 Submitted {i + 1}/{len(validation_requests)} requests")

# Wait for processing to finish
print("⏳ Waiting for processing to finish...")
time.sleep(5)

# Collect the validation results
final_performance = scheduler.get_performance_summary()
validation_metrics = scheduler.metrics_history[validation_start_metrics:]

# Requests processed during validation: sum of the batch sizes recorded since the start index
validation_processed = sum(m.batch_size for m in validation_metrics)

print(f"\n✅ Post-optimization validation results:")
print(f"   📊 Requests submitted: {len(validation_requests)}")
print(f"   📊 Requests processed: {validation_processed}")
print(f"   📊 Expired requests: {final_performance['expired_requests']}")
print(f"   ⏱️  Average latency: {final_performance['avg_latency']:.3f}s")
print(f"   🚀 Average throughput: {final_performance['avg_throughput']:.1f} req/s")
print(f"   📏 Average batch size: {final_performance['avg_batch_size']:.1f}")
print(f"   💻 Average GPU utilization: {final_performance['avg_gpu_utilization']:.1f}%")

# Compute the final performance score
final_score = optimizer.calculate_performance_score(50)
print(f"   🎯 Final performance score: {final_score:.4f}")

# Compare against the first optimization iteration
if optimizer.optimization_history:
    initial_score = optimizer.optimization_history[0]['score']
    if initial_score > 0:
        improvement = ((final_score - initial_score) / initial_score) * 100
        print(f"   📈 Performance improvement: {improvement:+.1f}%")
    else:
        print("   📈 Performance improvement: n/a (initial score was 0)")

# Visualize the final performance
plot_scheduler_performance(scheduler, window_size=min(50, len(validation_metrics)))
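
The validation cell waits a fixed 5 seconds before reading metrics, which can be too short under a spike or needlessly long on an idle system. A minimal alternative, sketched below, polls a queue-size callable until it reports zero or a timeout expires. `wait_until_drained` and its parameters are hypothetical helpers, not part of the lab's scheduler.

```python
import time
from typing import Callable


def wait_until_drained(get_queue_size: Callable[[], int],
                       timeout: float = 10.0,
                       poll_interval: float = 0.05) -> bool:
    """Poll until the queue reports size 0; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_queue_size() == 0:
            return True
        time.sleep(poll_interval)
    return False


# Stand-in for a scheduler queue that empties over four polls
sizes = iter([3, 2, 1, 0])
drained = wait_until_drained(lambda: next(sizes), timeout=1.0, poll_interval=0.0)
print(drained)
```

With the lab's scheduler this could be called as `wait_until_drained(lambda: scheduler.get_queue_status()['queue_size'])`. An empty queue does not guarantee that the last batch has finished executing, so a short extra wait may still be needed before reading the metrics.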

## 🎯 Experiment 5: Load Balancing and Fault Recovery

### 5.1 Load Balancing Across Multiple Schedulers

In [None]:
class LoadBalancer:
    """負載均衡器"""
    
    def __init__(self, schedulers: List[SmartBatchScheduler]):
        self.schedulers = schedulers
        self.scheduler_weights = [1.0] * len(schedulers)  # 初始權重相等
        self.request_counts = [0] * len(schedulers)
        self.health_status = [True] * len(schedulers)
        
        # 負載均衡策略
        self.strategies = {
            'round_robin': self._round_robin,
            'least_connections': self._least_connections,
            'weighted_performance': self._weighted_performance,
            'priority_aware': self._priority_aware
        }
        
        self.current_strategy = 'weighted_performance'
        self.round_robin_index = 0
    
    def _round_robin(self, request: InferenceRequest) -> int:
        """輪詢策略"""
        # 跳過不健康的調度器
        attempts = 0
        while attempts < len(self.schedulers):
            if self.health_status[self.round_robin_index]:
                selected = self.round_robin_index
                self.round_robin_index = (self.round_robin_index + 1) % len(self.schedulers)
                return selected
            self.round_robin_index = (self.round_robin_index + 1) % len(self.schedulers)
            attempts += 1
        return 0  # 備用方案
    
    def _least_connections(self, request: InferenceRequest) -> int:
        """最少連接策略"""
        min_connections = float('inf')
        selected_idx = 0
        
        for i, scheduler in enumerate(self.schedulers):
            if not self.health_status[i]:
                continue
            
            queue_size = scheduler.get_queue_status()['queue_size']
            if queue_size < min_connections:
                min_connections = queue_size
                selected_idx = i
        
        return selected_idx
    
    def _weighted_performance(self, request: InferenceRequest) -> int:
        """基於性能權重的策略"""
        # 更新權重基於最近性能
        self._update_weights()
        
        # 加權隨機選擇
        healthy_indices = [i for i, healthy in enumerate(self.health_status) if healthy]
        if not healthy_indices:
            return 0
        
        healthy_weights = [self.scheduler_weights[i] for i in healthy_indices]
        total_weight = sum(healthy_weights)
        
        if total_weight == 0:
            return random.choice(healthy_indices)
        
        rand_val = random.uniform(0, total_weight)
        cumulative = 0
        
        for i, idx in enumerate(healthy_indices):
            cumulative += healthy_weights[i]
            if rand_val <= cumulative:
                return idx
        
        return healthy_indices[-1]
    
    def _priority_aware(self, request: InferenceRequest) -> int:
        """Priority-aware strategy."""
        # High-priority requests go to the best-performing scheduler
        if request.priority <= 2:
            best_scheduler = -1
            # Scores can be negative when a queue is long, so start from -inf
            best_score = float('-inf')
            
            for i, scheduler in enumerate(self.schedulers):
                if not self.health_status[i]:
                    continue
                
                # Scheduler score: favors low latency and a short queue
                recent_metrics = scheduler.metrics_history[-5:]
                if recent_metrics:
                    avg_latency = np.mean([m.total_latency for m in recent_metrics])
                    queue_size = scheduler.get_queue_status()['queue_size']
                    score = 1.0 / (avg_latency + 0.001) - queue_size * 0.01
                    
                    if score > best_score:
                        best_score = score
                        best_scheduler = i
            
            if best_scheduler >= 0:
                return best_scheduler
            # No healthy scheduler has metrics yet: fall back to the shortest queue
            return self._least_connections(request)
        else:
            # Lower-priority requests use plain load balancing
            return self._least_connections(request)
    
    def _update_weights(self):
        """更新調度器權重"""
        for i, scheduler in enumerate(self.schedulers):
            if not self.health_status[i]:
                self.scheduler_weights[i] = 0.0
                continue
            
            recent_metrics = scheduler.metrics_history[-10:]
            if recent_metrics:
                avg_throughput = np.mean([m.throughput for m in recent_metrics])
                avg_latency = np.mean([m.total_latency for m in recent_metrics])
                
                # 權重 = 吞吐量 / 延遲
                weight = avg_throughput / max(avg_latency, 0.001)
                self.scheduler_weights[i] = weight
            else:
                self.scheduler_weights[i] = 1.0
    
    def route_request(self, request: InferenceRequest) -> bool:
        """路由請求到合適的調度器"""
        strategy_func = self.strategies.get(self.current_strategy, self._round_robin)
        selected_idx = strategy_func(request)
        
        if 0 <= selected_idx < len(self.schedulers) and self.health_status[selected_idx]:
            success = self.schedulers[selected_idx].add_request(request)
            if success:
                self.request_counts[selected_idx] += 1
            return success
        
        return False
    
    def check_health(self):
        """檢查調度器健康狀態"""
        for i, scheduler in enumerate(self.schedulers):
            try:
                # 簡單健康檢查：檢查佇列狀態
                status = scheduler.get_queue_status()
                
                # 如果佇列過長或最老請求過久，標記為不健康
                queue_too_long = status['queue_size'] > 1000
                oldest_too_old = status['oldest_request_age'] > 60  # 60秒
                
                self.health_status[i] = not (queue_too_long or oldest_too_old)
                
            except Exception:
                self.health_status[i] = False
    
    def get_load_distribution(self) -> Dict[str, Any]:
        """獲取負載分布狀態"""
        total_requests = sum(self.request_counts)
        
        distribution = []
        for i, scheduler in enumerate(self.schedulers):
            queue_status = scheduler.get_queue_status()
            performance = scheduler.get_performance_summary()
            
            distribution.append({
                'scheduler_id': i,
                'healthy': self.health_status[i],
                'weight': self.scheduler_weights[i],
                'requests_routed': self.request_counts[i],
                'request_percentage': (self.request_counts[i] / max(total_requests, 1)) * 100,
                'queue_size': queue_status['queue_size'],
                'avg_latency': performance.get('avg_latency', 0),
                'avg_throughput': performance.get('avg_throughput', 0)
            })
        
        return {
            'strategy': self.current_strategy,
            'total_requests': total_requests,
            'healthy_schedulers': sum(self.health_status),
            'distribution': distribution
        }


print("✅ 負載均衡器實現完成")
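
The cumulative-sum loop in `_weighted_performance` implements weighted random selection. The sketch below shows an equivalent formulation using the standard library's `random.choices`; `pick_weighted` is a hypothetical helper, and returning `None` when no scheduler is healthy is a design choice made for this example rather than the lab's behavior (which returns index 0).

```python
import random
from collections import Counter


def pick_weighted(weights, healthy, rng):
    """Pick a healthy index with probability proportional to its weight."""
    candidates = [i for i, ok in enumerate(healthy) if ok]
    if not candidates:
        return None  # assumption: signal "nothing available" explicitly
    candidate_weights = [weights[i] for i in candidates]
    if sum(candidate_weights) <= 0:
        # All weights are zero: fall back to a uniform choice
        return rng.choice(candidates)
    return rng.choices(candidates, weights=candidate_weights, k=1)[0]


rng = random.Random(42)
counts = Counter(pick_weighted([3.0, 1.0, 5.0], [True, True, False], rng)
                 for _ in range(4000))
print(dict(counts))
```

Scheduler 2 never appears because it is marked unhealthy, and scheduler 0 is drawn roughly three times as often as scheduler 1.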

### 5.2 Creating the Multi-Scheduler Environment

In [None]:
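# Stop the original scheduler
scheduler.stop()

# Create multiple scheduler instances
print("🔧 Creating the multi-scheduler environment...")

# Scheduler configurations (each slightly different, to mimic a heterogeneous deployment)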
scheduler_configs = [
    BatchConfig(min_batch_size=1, max_batch_size=8, max_wait_time=0.03, target_latency=0.025),
    BatchConfig(min_batch_size=2, max_batch_size=16, max_wait_time=0.05, target_latency=0.035),
    BatchConfig(min_batch_size=1, max_batch_size=12, max_wait_time=0.04, target_latency=0.030),
]

# Create and start the scheduler instances
schedulers = []
for i, config in enumerate(scheduler_configs):
    sched = SmartBatchScheduler(config, model_name=f"text_classifier_replica_{i}")
    sched.start()
    schedulers.append(sched)
    print(f"   ✅ Scheduler {i} started (max_batch={config.max_batch_size})")

# Create the load balancer
load_balancer = LoadBalancer(schedulers)
print(f"✅ Load balancer created, managing {len(schedulers)} schedulers")

### 5.3 Load Balancing Test

In [None]:
# Test the different load balancing strategies
strategies_to_test = ['round_robin', 'least_connections', 'weighted_performance', 'priority_aware']

strategy_results = {}

for strategy in strategies_to_test:
    print(f"\n🧪 Testing load balancing strategy: {strategy}")
    
    # Reset the routing counters only. Each scheduler's metrics_history is not
    # cleared here, so latency and throughput figures may include earlier strategies.
    load_balancer.current_strategy = strategy
    load_balancer.request_counts = [0] * len(schedulers)
    
    # Generate test requests
    test_requests = request_generator.simulate_traffic_pattern(20, "normal")
    
    # Route the requests
    successful_routes = 0
    for request in test_requests:
        if load_balancer.route_request(request):
            successful_routes += 1
        time.sleep(0.001)
    
    # Wait for processing
    time.sleep(3)
    
    # Refresh the health status
    load_balancer.check_health()
    
    # Collect the load distribution
    distribution = load_balancer.get_load_distribution()
    strategy_results[strategy] = distribution
    
    print(f"   📊 Routed successfully: {successful_routes}/{len(test_requests)} requests")
    print(f"   🏥 Healthy schedulers: {distribution['healthy_schedulers']}/{len(schedulers)}")
    
    for i, sched_info in enumerate(distribution['distribution']):
        print(f"   📈 Scheduler {i}: {sched_info['requests_routed']} requests "
              f"({sched_info['request_percentage']:.1f}%), "
              f"queue: {sched_info['queue_size']}, "
              f"healthy: {'✅' if sched_info['healthy'] else '❌'}")

### 5.4 Visualizing the Load Balancing Results

In [None]:
def plot_load_balancing_results(strategy_results: Dict[str, Dict]):
    """可視化負載均衡結果"""
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('負載均衡策略效果對比', fontsize=16, fontweight='bold')
    
    strategies = list(strategy_results.keys())
    num_schedulers = len(strategy_results[strategies[0]]['distribution'])
    
    # 請求分布對比
    ax1 = axes[0, 0]
    x = np.arange(num_schedulers)
    width = 0.2
    
    for i, strategy in enumerate(strategies):
        percentages = [sched['request_percentage'] for sched in strategy_results[strategy]['distribution']]
        ax1.bar(x + i * width, percentages, width, label=strategy, alpha=0.8)
    
    ax1.set_title('請求分布比例 (%)', fontweight='bold')
    ax1.set_xlabel('調度器 ID')
    ax1.set_ylabel('請求比例 (%)')
    ax1.set_xticks(x + width * 1.5)
    ax1.set_xticklabels([f'Sched {i}' for i in range(num_schedulers)])
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # 平均延遲對比
    ax2 = axes[0, 1]
    for i, strategy in enumerate(strategies):
        latencies = [sched['avg_latency'] * 1000 for sched in strategy_results[strategy]['distribution']]
        ax2.bar(x + i * width, latencies, width, label=strategy, alpha=0.8)
    
    ax2.set_title('平均延遲對比 (ms)', fontweight='bold')
    ax2.set_xlabel('調度器 ID')
    ax2.set_ylabel('平均延遲 (ms)')
    ax2.set_xticks(x + width * 1.5)
    ax2.set_xticklabels([f'Sched {i}' for i in range(num_schedulers)])
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    # 吞吐量對比
    ax3 = axes[1, 0]
    for i, strategy in enumerate(strategies):
        throughputs = [sched['avg_throughput'] for sched in strategy_results[strategy]['distribution']]
        ax3.bar(x + i * width, throughputs, width, label=strategy, alpha=0.8)
    
    ax3.set_title('平均吞吐量對比 (req/s)', fontweight='bold')
    ax3.set_xlabel('調度器 ID')
    ax3.set_ylabel('吞吐量 (req/s)')
    ax3.set_xticks(x + width * 1.5)
    ax3.set_xticklabels([f'Sched {i}' for i in range(num_schedulers)])
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    # 負載均衡度分析（標準差）
    ax4 = axes[1, 1]
    balance_scores = []
    strategy_names = []
    
    for strategy in strategies:
        percentages = [sched['request_percentage'] for sched in strategy_results[strategy]['distribution']]
        # 計算分布的標準差（越小越均衡）
        balance_score = np.std(percentages)
        balance_scores.append(balance_score)
        strategy_names.append(strategy)
    
    colors = ['skyblue', 'lightcoral', 'lightgreen', 'gold']
    bars = ax4.bar(strategy_names, balance_scores, color=colors[:len(strategies)], alpha=0.8)
    ax4.set_title('負載均衡度 (標準差)', fontweight='bold')
    ax4.set_ylabel('分布標準差 (越小越均衡)')
    ax4.tick_params(axis='x', rotation=45)
    ax4.grid(True, alpha=0.3)
    
    # 添加數值標籤
    for bar, score in zip(bars, balance_scores):
        ax4.text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(balance_scores)*0.01,
                f'{score:.2f}', ha='center', va='bottom', fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    # Strategy summary
    print(f"\n📊 Load balancing strategy summary:")
    print("=" * 60)
    
    for strategy in strategies:
        result = strategy_results[strategy]
        percentages = [sched['request_percentage'] for sched in result['distribution']]
        latencies = [sched['avg_latency'] * 1000 for sched in result['distribution']]
        throughputs = [sched['avg_throughput'] for sched in result['distribution']]
        
        balance_score = np.std(percentages)
        avg_latency = np.mean(latencies)
        total_throughput = np.sum(throughputs)
        
        print(f"\n🔧 {strategy.replace('_', ' ').title()}:")
        print(f"   ⚖️  Balance (std dev): {balance_score:.2f} (lower is better)")
        print(f"   ⏱️  Average latency: {avg_latency:.2f}ms")
        print(f"   🚀 Total throughput: {total_throughput:.1f} req/s")
        # Use the result itself rather than the global `schedulers` list
        print(f"   🏥 Healthy schedulers: {result['healthy_schedulers']}/{len(result['distribution'])}")


# 可視化結果
plot_load_balancing_results(strategy_results)
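
The plot above measures balance as the standard deviation of the request percentages, which depends on the scale of the values. As a complementary view, the sketch below computes Jain's fairness index, a normalized measure that equals 1.0 for a perfectly even split and 1/n when a single scheduler receives everything. This is an alternative metric, not the one the lab uses, and `jain_index` is a hypothetical helper.

```python
import numpy as np


def jain_index(loads):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    x = np.asarray(loads, dtype=float)
    if x.size == 0 or not np.any(x):
        return 0.0  # assumption: report 0.0 when there is no load to compare
    return float(x.sum() ** 2 / (x.size * np.sum(x ** 2)))


even = [34.0, 33.0, 33.0]
skewed = [70.0, 20.0, 10.0]
for name, loads in (("even", even), ("skewed", skewed)):
    print(f"{name}: std={np.std(loads):.2f}, jain={jain_index(loads):.3f}")
```

Both metrics rank the two distributions the same way here; the index is easier to compare across runs with different numbers of schedulers.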

## 📊 Experiment Summary and Best Practices

In [None]:
# Stop all schedulers
for i, sched in enumerate(schedulers):
    sched.stop()
    print(f"⏹️  Scheduler {i} stopped")

# Export the experiment results
def export_dynamic_batching_results():
    """Export the dynamic batching experiment results."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    
    # Combined results
    experiment_results = {
        "experiment_name": "Dynamic Batching Advanced",
        "timestamp": timestamp,
        "optimization_history": optimizer.optimization_history,
        "best_config": {
            "min_batch_size": optimizer.best_config.min_batch_size if optimizer.best_config else None,
            "max_batch_size": optimizer.best_config.max_batch_size if optimizer.best_config else None,
            "max_wait_time": optimizer.best_config.max_wait_time if optimizer.best_config else None,
            "target_latency": optimizer.best_config.target_latency if optimizer.best_config else None,
        },
        "best_score": optimizer.best_score,
        "load_balancing_results": strategy_results,
        "scheduler_metrics": [
            {
                "scheduler_id": i,
                "total_batches": len(sched.metrics_history),
                "performance_summary": sched.get_performance_summary()
            }
            for i, sched in enumerate(schedulers)
        ]
    }
    
    # Write the JSON file (default=str stringifies objects such as BatchConfig)
    results_file = f"{EXPERIMENT_DIR}/dynamic_batching_results_{timestamp}.json"
    with open(results_file, 'w') as f:
        json.dump(experiment_results, f, indent=2, default=str)
    
    print(f"📄 Experiment results exported: {results_file}")
    return results_file


# Export the results
results_file = export_dynamic_batching_results()

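# Best practices summary
best_practices_summary = """
🎯 Advanced Dynamic Batching: Best Practices Summary

🔧 Intelligent scheduling:
   ✅ Adaptive batch size adjustment
   ✅ Priority-aware wait time policy
   ✅ Real-time performance monitoring with feedback-driven adjustment
   ✅ Load-based dynamic configuration tuning

⚖️ Choosing a load balancing strategy:
   📊 Round robin: simple and even, suits homogeneous environments
   📊 Least connections: balances dynamically, suits heterogeneous load
   📊 Weighted performance: routes by measured performance, suits replicas of differing speed
   📊 Priority aware: protects quality of service, suits SLA-bound workloads

🎯 Key performance indicators:
   ⏱️  Latency: target P99 latency < 50ms
   🚀 Throughput: maximize GPU utilization
   ⚖️  Fairness: keep latency differences between priorities reasonable
   💻 Resource efficiency: GPU utilization > 80%

🛡️ Fault recovery mechanisms:
   ✅ Health checks with automatic failover
   ✅ Request timeout and expiry handling
   ✅ Queue length monitoring and limits
   ✅ Performance degradation detection and recovery

📈 Tuning recommendations:
   🔄 Re-evaluate and retune batch parameters periodically
   📊 Adjust the priority policy to match business patterns
   🎯 Set the target latency from SLA requirements
   💡 Validate tuning results with A/B tests

🚀 Production deployment considerations:
   📦 Containerized deployment that supports horizontal scaling
   📊 A complete monitoring and alerting stack
   🔧 Hot-reloadable configuration
   📝 Detailed performance logging
"""

print(best_practices_summary)

print(f"\n✅ Lab 2.4.2 advanced dynamic batching experiment complete!")
print(f"📊 Experiment data saved to: {EXPERIMENT_DIR}")
print(f"🎯 Best configuration identified and ready to be evaluated for production use")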
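
The latency target above is stated as a P99 value, while the experiments in this lab report averages. A minimal sketch of a percentile check follows; `latency_percentiles` and its `slo_p99_s` parameter are hypothetical names, and the sample latencies are synthetic.

```python
import numpy as np


def latency_percentiles(latencies_s, slo_p99_s=0.050):
    """Return P50/P95/P99 in seconds plus whether P99 is under the given target."""
    arr = np.asarray(latencies_s, dtype=float)
    p50, p95, p99 = np.percentile(arr, [50, 95, 99])
    return {
        "p50": float(p50),
        "p95": float(p95),
        "p99": float(p99),
        "meets_slo": bool(p99 < slo_p99_s),
    }


# Synthetic sample: 99 fast requests at 10 ms and one slow outlier at 200 ms
stats = latency_percentiles([0.010] * 99 + [0.200])
print({k: round(v, 4) if isinstance(v, float) else v for k, v in stats.items()})
```

With the lab's metrics the input could be `[m.total_latency for m in scheduler.metrics_history]`, although those entries are per-batch latencies rather than per-request, so large batches would be under-weighted.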

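## 📖 Summary

This lab built a complete implementation of advanced dynamic batching with intelligent scheduling:

### 🎯 Results
1. **Intelligent scheduler** - adaptive batch sizing and priority-aware scheduling
2. **Performance optimization** - automated tuning to balance latency against throughput
3. **Load balancing** - request routing with several selectable strategies
4. **Fault recovery** - health checks and automatic failover

### 🔧 Key Technical Features
- Priority-based dynamic wait time adjustment
- Real-time performance monitoring with feedback-driven parameter tuning
- Multiple schedulers working together to spread load
- Automated fault detection and recovery

### 🚀 Practical Value
1. **Higher system throughput** - intelligent batching raises GPU utilization
2. **Lower service latency** - priority scheduling protects response times for critical requests
3. **Better system stability** - load balancing and fault recovery improve availability
4. **Simpler operations** - automated tuning reduces manual intervention

### 💡 Key Takeaways
- Dynamic batching requires balancing latency against throughput
- Priority scheduling is a key feature of enterprise-grade services
- The choice of load balancing strategy affects overall performance
- Monitoring and observability are the foundation of optimization

---

**🎉 Congratulations on completing Lab 2.4.2!**

You have now worked through advanced dynamic batching techniques and can build an efficient, reliable inference scheduling system.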