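# Lab 2.4.1 - Model Version Management and A/B Testing

## 🎯 Lab Objectives

This lab teaches you how to:
1. Implement an enterprise-grade model version management strategy
2. Design and run an A/B testing framework
3. Roll out models progressively (canary deployment)
4. Build model performance monitoring and rollback mechanisms
5. Configure traffic splitting and routing policies

## 📋 Prerequisites

- Completed Lab 2.1 (Triton basic setup)
- Familiarity with container technology and Kubernetes
- Understanding of CI/CD workflows and version control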

---

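## 📚 Background

### Challenges of Enterprise Model Management

**1. Version management complexity**
- Multiple model versions coexist
- Versions must stay in sync across environments
- Rollback strategy and data consistency

**2. A/B testing requirements**
- Evaluating business metrics
- Comparing user experience
- Risk control and gradual rollout

**3. Production stability**
- Zero-downtime deployment
- Performance monitoring and alerting
- Automatic failure recovery

### Triton Version Management Architecture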

```mermaid
graph TD
    A[Model Repository] --> B[Version 1]
    A --> C[Version 2]
    A --> D[Version 3]
    
    B --> E[Production 80%]
    C --> F[Canary 15%]
    D --> G[Shadow 5%]
    
    E --> H[Load Balancer]
    F --> H
    G --> I[Metrics Only]
    
    H --> J[User Traffic]
```

## 🛠️ Environment Setup

In [None]:
import os
import json
import time
import random
import hashlib
import requests
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor, as_completed

# Triton client
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Plot styling (the 'seaborn-v0_8' style name requires matplotlib >= 3.6)
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print(f"Environment ready at {datetime.now()}")
print(f"Working directory: {os.getcwd()}")

In [None]:
# Set up the lab environment
BASE_DIR = "/opt/tritonserver"
MODEL_REPO = f"{BASE_DIR}/models"
EXPERIMENT_DIR = f"{BASE_DIR}/experiments/ab_testing"

# Create the experiment directories
os.makedirs(EXPERIMENT_DIR, exist_ok=True)
os.makedirs(f"{EXPERIMENT_DIR}/metrics", exist_ok=True)
os.makedirs(f"{EXPERIMENT_DIR}/configs", exist_ok=True)
os.makedirs(f"{EXPERIMENT_DIR}/logs", exist_ok=True)

print(f"📁 Experiment directory: {EXPERIMENT_DIR}")
print(f"📁 Model repository: {MODEL_REPO}")

## 🎯 Experiment 1: Model Version Management System

### 1.1 Version Manager Class Design

In [None]:
@dataclass
class ModelVersion:
    """模型版本信息"""
    name: str
    version: int
    created_at: datetime
    status: str  # "active", "inactive", "testing", "deprecated"
    traffic_percentage: float
    performance_metrics: Dict[str, float]
    metadata: Dict[str, str]


class ModelVersionManager:
    """模型版本管理器"""
    
    def __init__(self, model_name: str, triton_url: str = "localhost:8000"):
        self.model_name = model_name
        self.triton_url = triton_url
        self.client = httpclient.InferenceServerClient(url=triton_url)
        self.versions: Dict[int, ModelVersion] = {}
        self.traffic_rules = {}
        
        # 加載現有版本
        self._discover_versions()
    
    def _discover_versions(self):
        """發現現有模型版本"""
        try:
            model_config = self.client.get_model_config(self.model_name)
            print(f"✅ 發現模型: {self.model_name}")
            
            # 模擬版本發現（在實際環境中會從模型倉庫讀取）
            for version in [1, 2, 3]:
                self.versions[version] = ModelVersion(
                    name=self.model_name,
                    version=version,
                    created_at=datetime.now() - timedelta(days=version*10),
                    status="active" if version == 2 else "inactive",
                    traffic_percentage=100.0 if version == 2 else 0.0,
                    performance_metrics={
                        "latency_p99": random.uniform(50, 200),
                        "throughput": random.uniform(100, 1000),
                        "error_rate": random.uniform(0, 0.05)
                    },
                    metadata={
                        "framework": "pytorch",
                        "precision": "fp16" if version > 1 else "fp32"
                    }
                )
        
        except Exception as e:
            print(f"❌ 模型發現失敗: {str(e)}")
    
    def register_version(self, version: int, metadata: Dict[str, str] = None) -> bool:
        """註冊新模型版本"""
        try:
            new_version = ModelVersion(
                name=self.model_name,
                version=version,
                created_at=datetime.now(),
                status="inactive",
                traffic_percentage=0.0,
                performance_metrics={},
                metadata=metadata or {}
            )
            
            self.versions[version] = new_version
            
            print(f"✅ 版本 {version} 註冊成功")
            return True
            
        except Exception as e:
            print(f"❌ 版本註冊失敗: {str(e)}")
            return False
    
    def get_version_info(self, version: int) -> Optional[ModelVersion]:
        """獲取版本信息"""
        return self.versions.get(version)
    
    def list_versions(self) -> List[ModelVersion]:
        """列出所有版本"""
        return list(self.versions.values())
    
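    def set_traffic_split(self, traffic_config: Dict[int, float]):
        """Set the traffic split (percentages keyed by version number)."""
        unknown = [v for v in traffic_config if v not in self.versions]
        if unknown:
            raise ValueError(f"Unknown versions in traffic config: {unknown}")

        if any(p < 0 for p in traffic_config.values()):
            raise ValueError("Traffic percentages must be non-negative")

        total_percentage = sum(traffic_config.values())
        if abs(total_percentage - 100.0) > 0.001:
            raise ValueError(f"Traffic percentages must sum to 100%, got {total_percentage}%")

        # Update per-version traffic and status
        for version_num, version in self.versions.items():
            version.traffic_percentage = traffic_config.get(version_num, 0.0)
            version.status = "active" if version.traffic_percentage > 0 else "inactive"

        # Store a copy so later changes to the caller's dict do not alter routing
        self.traffic_rules = dict(traffic_config)

        print(f"✅ Traffic split updated: {traffic_config}")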
    
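    def get_version_for_request(self, request_id: str = None) -> int:
        """Pick a version according to the traffic rules.

        request_id is accepted but not used yet: selection is random per call,
        so the same caller is not guaranteed to get the same version twice.
        """
        if not self.traffic_rules:
            if not self.versions:
                raise RuntimeError("No model versions are registered")
            # Default to the newest active version
            active_versions = [v for v in self.versions.values() if v.status == "active"]
            if active_versions:
                return max(active_versions, key=lambda x: x.version).version
            return max(self.versions.keys())

        # Weighted random choice; rand_val is in [0, 100)
        rand_val = random.random() * 100
        cumulative = 0.0

        for version, percentage in sorted(self.traffic_rules.items()):
            cumulative += percentage
            # Strict comparison so a version with 0% traffic is never selected
            if rand_val < cumulative:
                return version

        # Rounding fallback: highest-numbered version that receives traffic
        return max(v for v, p in self.traffic_rules.items() if p > 0)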
    
    def export_config(self, filepath: str):
        """導出配置到文件"""
        config = {
            "model_name": self.model_name,
            "versions": {},
            "traffic_rules": self.traffic_rules,
            "exported_at": datetime.now().isoformat()
        }
        
        for version_num, version in self.versions.items():
            config["versions"][str(version_num)] = {
                "status": version.status,
                "traffic_percentage": version.traffic_percentage,
                "performance_metrics": version.performance_metrics,
                "metadata": version.metadata,
                "created_at": version.created_at.isoformat()
            }
        
        with open(filepath, 'w') as f:
            json.dump(config, f, indent=2)
        
        print(f"✅ 配置已導出到: {filepath}")


# 創建版本管理器實例
print("🔧 創建模型版本管理器...")
version_manager = ModelVersionManager("text_classifier")
print(f"📊 發現版本數量: {len(version_manager.versions)}")
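The `_discover_versions` method above simulates discovery with a hard-coded list of versions. In a Triton model repository, each model directory holds one numerically named subdirectory per version, so real discovery could scan for those. The following is a minimal sketch under that assumption; the helper name `discover_repo_versions` is hypothetical and not part of the lab code.

```python
from pathlib import Path
from typing import List


def discover_repo_versions(model_repo: str, model_name: str) -> List[int]:
    """Return the version numbers found under <model_repo>/<model_name>/.

    Hypothetical helper: assumes the standard Triton layout in which each
    version lives in a subdirectory named with a positive integer.
    """
    model_dir = Path(model_repo) / model_name
    if not model_dir.is_dir():
        return []

    versions = []
    for entry in model_dir.iterdir():
        name = entry.name
        # Skip files such as config.pbtxt, non-numeric directories,
        # and names with a leading zero.
        if (entry.is_dir() and name.isascii() and name.isdigit()
                and not name.startswith("0")):
            versions.append(int(name))
    return sorted(versions)
```

A manager could call this with `MODEL_REPO` and its model name, then build one `ModelVersion` record per returned number instead of looping over `[1, 2, 3]`.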

### 1.2 Displaying Version Information

In [None]:
# 顯示所有版本信息
def display_version_summary(manager: ModelVersionManager):
    """顯示版本摘要"""
    print(f"\n📋 模型 '{manager.model_name}' 版本摘要")
    print("=" * 80)
    
    for version in sorted(manager.list_versions(), key=lambda x: x.version):
        print(f"\n🏷️  版本 {version.version} ({version.status.upper()})")
        print(f"   📅 創建時間: {version.created_at.strftime('%Y-%m-%d %H:%M')}")
        print(f"   🚦 流量比例: {version.traffic_percentage:.1f}%")
        print(f"   📊 性能指標:")
        for metric, value in version.performance_metrics.items():
            if metric == "error_rate":
                print(f"      • {metric}: {value:.3f}")
            else:
                print(f"      • {metric}: {value:.1f}")
        print(f"   🏷️  元數據: {version.metadata}")


display_version_summary(version_manager)

In [None]:
# 可視化版本性能對比
def plot_version_performance(manager: ModelVersionManager):
    """可視化版本性能"""
    versions = manager.list_versions()
    
    if not versions:
        print("❌ 沒有版本數據可供分析")
        return
    
    # 準備數據
    version_nums = [v.version for v in versions]
    latencies = [v.performance_metrics.get("latency_p99", 0) for v in versions]
    throughputs = [v.performance_metrics.get("throughput", 0) for v in versions]
    error_rates = [v.performance_metrics.get("error_rate", 0) * 100 for v in versions]
    traffic = [v.traffic_percentage for v in versions]
    
    # 創建子圖
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
    
    # 延遲對比
    bars1 = ax1.bar(version_nums, latencies, alpha=0.7, color='skyblue')
    ax1.set_title('P99 延遲對比 (ms)', fontsize=14, fontweight='bold')
    ax1.set_xlabel('版本')
    ax1.set_ylabel('延遲 (ms)')
    ax1.grid(True, alpha=0.3)
    
    # 添加數值標籤
    for bar, val in zip(bars1, latencies):
        ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1,
                f'{val:.1f}', ha='center', va='bottom')
    
    # 吞吐量對比
    bars2 = ax2.bar(version_nums, throughputs, alpha=0.7, color='lightgreen')
    ax2.set_title('吞吐量對比 (QPS)', fontsize=14, fontweight='bold')
    ax2.set_xlabel('版本')
    ax2.set_ylabel('吞吐量 (QPS)')
    ax2.grid(True, alpha=0.3)
    
    for bar, val in zip(bars2, throughputs):
        ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 10,
                f'{val:.0f}', ha='center', va='bottom')
    
    # 錯誤率對比
    bars3 = ax3.bar(version_nums, error_rates, alpha=0.7, color='salmon')
    ax3.set_title('錯誤率對比 (%)', fontsize=14, fontweight='bold')
    ax3.set_xlabel('版本')
    ax3.set_ylabel('錯誤率 (%)')
    ax3.grid(True, alpha=0.3)
    
    for bar, val in zip(bars3, error_rates):
        ax3.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1,
                f'{val:.2f}%', ha='center', va='bottom')
    
    # 流量分配
    colors = ['gold', 'lightcoral', 'lightblue']
    wedges, texts, autotexts = ax4.pie(traffic, labels=[f'V{v}' for v in version_nums],
                                      autopct='%1.1f%%', colors=colors[:len(version_nums)])
    ax4.set_title('流量分配', fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.show()


plot_version_performance(version_manager)

## 🎯 Experiment 2: A/B Testing Framework

### 2.1 A/B Test Manager

In [None]:
@dataclass
class ABTestConfig:
    """A/B 測試配置"""
    test_name: str
    model_name: str
    control_version: int
    treatment_version: int
    traffic_split: float  # treatment 版本的流量比例
    start_time: datetime
    end_time: datetime
    success_metrics: List[str]
    min_sample_size: int
    significance_level: float
    status: str  # "planned", "running", "completed", "stopped"


@dataclass
class TestMetrics:
    """測試指標數據"""
    version: int
    request_count: int
    success_count: int
    total_latency: float
    error_count: int
    timestamp: datetime


class ABTestManager:
    """A/B 測試管理器"""
    
    def __init__(self, version_manager: ModelVersionManager):
        self.version_manager = version_manager
        self.active_tests: Dict[str, ABTestConfig] = {}
        self.test_metrics: Dict[str, List[TestMetrics]] = {}
        self.test_results: Dict[str, Dict] = {}
    
    def create_test(self, config: ABTestConfig) -> bool:
        """創建新的 A/B 測試"""
        try:
            # 驗證版本存在
            control_version = self.version_manager.get_version_info(config.control_version)
            treatment_version = self.version_manager.get_version_info(config.treatment_version)
            
            if not control_version or not treatment_version:
                raise ValueError("指定的版本不存在")
            
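            # Validate the time window
            if config.start_time >= config.end_time:
                raise ValueError("End time must be later than start time")

            # The treatment share must leave traffic for both groups
            if not 0 < config.traffic_split < 100:
                raise ValueError("traffic_split must be between 0 and 100 (exclusive)")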
            
            # 添加到活躍測試
            self.active_tests[config.test_name] = config
            self.test_metrics[config.test_name] = []
            
            print(f"✅ A/B 測試 '{config.test_name}' 創建成功")
            print(f"   📊 控制組: V{config.control_version} ({100-config.traffic_split:.1f}%)")
            print(f"   🧪 實驗組: V{config.treatment_version} ({config.traffic_split:.1f}%)")
            print(f"   ⏰ 測試期間: {config.start_time.strftime('%Y-%m-%d')} - {config.end_time.strftime('%Y-%m-%d')}")
            
            return True
            
        except Exception as e:
            print(f"❌ A/B 測試創建失敗: {str(e)}")
            return False
    
    def start_test(self, test_name: str) -> bool:
        """啟動 A/B 測試"""
        if test_name not in self.active_tests:
            print(f"❌ 測試 '{test_name}' 不存在")
            return False
        
        test_config = self.active_tests[test_name]
        
        if datetime.now() < test_config.start_time:
            print(f"❌ 測試尚未到達開始時間")
            return False
        
        # 更新流量分配
        traffic_config = {
            test_config.control_version: 100 - test_config.traffic_split,
            test_config.treatment_version: test_config.traffic_split
        }
        
        self.version_manager.set_traffic_split(traffic_config)
        test_config.status = "running"
        
        print(f"🚀 A/B 測試 '{test_name}' 已啟動")
        return True
    
    def stop_test(self, test_name: str, reason: str = "Manual stop") -> bool:
        """停止 A/B 測試"""
        if test_name not in self.active_tests:
            print(f"❌ 測試 '{test_name}' 不存在")
            return False
        
        test_config = self.active_tests[test_name]
        test_config.status = "stopped"
        
        # 恢復到控制組版本
        self.version_manager.set_traffic_split({test_config.control_version: 100.0})
        
        print(f"⏹️  A/B 測試 '{test_name}' 已停止")
        print(f"   📝 原因: {reason}")
        return True
    
    def record_metrics(self, test_name: str, version: int, 
                      latency: float, success: bool):
        """記錄測試指標"""
        if test_name not in self.test_metrics:
            self.test_metrics[test_name] = []
        
        # 尋找或創建該版本的指標記錄
        current_time = datetime.now()
        
        # 找到當前分鐘的指標記錄
        minute_key = current_time.replace(second=0, microsecond=0)
        
        # 查找現有記錄
        existing_metric = None
        for metric in self.test_metrics[test_name]:
            if (metric.version == version and 
                metric.timestamp.replace(second=0, microsecond=0) == minute_key):
                existing_metric = metric
                break
        
        if existing_metric:
            # 更新現有記錄
            existing_metric.request_count += 1
            existing_metric.total_latency += latency
            if success:
                existing_metric.success_count += 1
            else:
                existing_metric.error_count += 1
        else:
            # 創建新記錄
            new_metric = TestMetrics(
                version=version,
                request_count=1,
                success_count=1 if success else 0,
                total_latency=latency,
                error_count=0 if success else 1,
                timestamp=current_time
            )
            self.test_metrics[test_name].append(new_metric)
    
    def get_test_summary(self, test_name: str) -> Dict:
        """獲取測試摘要"""
        if test_name not in self.active_tests:
            return {"error": "測試不存在"}
        
        test_config = self.active_tests[test_name]
        metrics = self.test_metrics.get(test_name, [])
        
        # 按版本分組統計
        control_metrics = [m for m in metrics if m.version == test_config.control_version]
        treatment_metrics = [m for m in metrics if m.version == test_config.treatment_version]
        
        def calculate_stats(metric_list):
            if not metric_list:
                return {
                    "requests": 0,
                    "success_rate": 0.0,
                    "avg_latency": 0.0,
                    "error_rate": 0.0
                }
            
            total_requests = sum(m.request_count for m in metric_list)
            total_success = sum(m.success_count for m in metric_list)
            total_latency = sum(m.total_latency for m in metric_list)
            total_errors = sum(m.error_count for m in metric_list)
            
            return {
                "requests": total_requests,
                "success_rate": (total_success / total_requests * 100) if total_requests > 0 else 0,
                "avg_latency": (total_latency / total_requests) if total_requests > 0 else 0,
                "error_rate": (total_errors / total_requests * 100) if total_requests > 0 else 0
            }
        
        control_stats = calculate_stats(control_metrics)
        treatment_stats = calculate_stats(treatment_metrics)
        
        return {
            "test_name": test_name,
            "status": test_config.status,
            "control_version": test_config.control_version,
            "treatment_version": test_config.treatment_version,
            "control_stats": control_stats,
            "treatment_stats": treatment_stats,
            "total_requests": control_stats["requests"] + treatment_stats["requests"],
            "test_duration": (datetime.now() - test_config.start_time).total_seconds() / 3600
        }


# 創建 A/B 測試管理器
ab_test_manager = ABTestManager(version_manager)
print("✅ A/B 測試管理器已創建")
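`get_version_for_request` accepts a `request_id` but chooses a version at random on every call, so the same user can move between control and treatment from one request to the next. A/B tests usually need sticky assignment. One common approach, sketched here as an assumption rather than as part of the lab code, is to hash the ID into a bucket in [0, 100) and compare it against the cumulative traffic percentages. The function name `assign_version` and the `salt` parameter are hypothetical.

```python
import hashlib
from typing import Dict


def assign_version(request_id: str, traffic_rules: Dict[int, float],
                   salt: str = "ab-test") -> int:
    """Map a request or user ID to a version deterministically.

    Hypothetical helper: the same ID always gets the same version as long as
    traffic_rules and salt stay unchanged.
    """
    if not traffic_rules:
        raise ValueError("traffic_rules must not be empty")

    digest = hashlib.sha256(f"{salt}:{request_id}".encode("utf-8")).hexdigest()
    # Use the first 8 hex digits (32 bits) to get a bucket in [0, 100).
    bucket = int(digest[:8], 16) / 0x100000000 * 100.0

    cumulative = 0.0
    for version, percentage in sorted(traffic_rules.items()):
        cumulative += percentage
        if bucket < cumulative:
            return version
    # Only reached if the percentages sum to less than 100 (e.g. rounding).
    return max(traffic_rules)
```

Changing the salt per experiment reshuffles the buckets, which keeps one test's assignment from correlating with the next.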

### 2.2 Creating and Starting an A/B Test

In [None]:
# Create the A/B test configuration
ab_test_config = ABTestConfig(
    test_name="model_v2_vs_v3_performance",
    model_name="text_classifier",
    control_version=2,
    treatment_version=3,
    traffic_split=20.0,  # send 20% of traffic to the new version
    start_time=datetime.now(),
    end_time=datetime.now() + timedelta(hours=24),
    success_metrics=["latency", "accuracy", "error_rate"],
    min_sample_size=1000,
    significance_level=0.05,
    status="planned"
)

# Create the test
ab_test_manager.create_test(ab_test_config)

# Start the test
ab_test_manager.start_test("model_v2_vs_v3_performance")

### 2.3 Simulating Test Data Collection

In [None]:
# 模擬請求和數據收集
def simulate_ab_test_traffic(ab_manager: ABTestManager, test_name: str, 
                           num_requests: int = 500):
    """模擬 A/B 測試流量"""
    if test_name not in ab_manager.active_tests:
        print(f"❌ 測試 '{test_name}' 不存在")
        return
    
    test_config = ab_manager.active_tests[test_name]
    print(f"📊 開始模擬 {num_requests} 個請求的 A/B 測試流量...")
    
    for i in range(num_requests):
        # 根據流量分配選擇版本
        if random.uniform(0, 100) < test_config.traffic_split:
            version = test_config.treatment_version
            # Treatment 版本通常有不同的性能特性
            base_latency = 80
            base_success_rate = 0.96
        else:
            version = test_config.control_version
            # Control 版本的基準性能
            base_latency = 100
            base_success_rate = 0.95
        
        # 模擬請求延遲（添加隨機變化）
        latency = base_latency + random.gauss(0, 20)
        latency = max(10, latency)  # 確保延遲為正數
        
        # 模擬成功率
        success = random.random() < base_success_rate
        
        # 記錄指標
        ab_manager.record_metrics(test_name, version, latency, success)
        
        # 每100個請求顯示進度
        if (i + 1) % 100 == 0:
            print(f"   📈 已處理 {i + 1}/{num_requests} 請求")
        
        # 模擬請求間隔
        time.sleep(0.001)
    
    print(f"✅ 模擬完成，共處理 {num_requests} 個請求")


# 執行模擬
simulate_ab_test_traffic(ab_test_manager, "model_v2_vs_v3_performance", 1000)

### 2.4 Analyzing A/B Test Results

In [None]:
# 顯示測試摘要
def display_ab_test_results(ab_manager: ABTestManager, test_name: str):
    """顯示 A/B 測試結果"""
    summary = ab_manager.get_test_summary(test_name)
    
    if "error" in summary:
        print(f"❌ {summary['error']}")
        return
    
    print(f"\n📊 A/B 測試結果報告: {test_name}")
    print("=" * 80)
    
    print(f"🔬 測試狀態: {summary['status'].upper()}")
    print(f"⏱️  測試時長: {summary['test_duration']:.1f} 小時")
    print(f"📈 總請求數: {summary['total_requests']}")
    
    print("\n🅰️  控制組 (Version {}):".format(summary['control_version']))
    control = summary['control_stats']
    print(f"   📊 請求數: {control['requests']}")
    print(f"   ✅ 成功率: {control['success_rate']:.2f}%")
    print(f"   ⏱️  平均延遲: {control['avg_latency']:.1f}ms")
    print(f"   ❌ 錯誤率: {control['error_rate']:.2f}%")
    
    print("\n🅱️  實驗組 (Version {}):".format(summary['treatment_version']))
    treatment = summary['treatment_stats']
    print(f"   📊 請求數: {treatment['requests']}")
    print(f"   ✅ 成功率: {treatment['success_rate']:.2f}%")
    print(f"   ⏱️  平均延遲: {treatment['avg_latency']:.1f}ms")
    print(f"   ❌ 錯誤率: {treatment['error_rate']:.2f}%")
    
    # Compute the improvement of treatment over control
    if control['avg_latency'] > 0 and treatment['avg_latency'] > 0:
        latency_improvement = ((control['avg_latency'] - treatment['avg_latency']) / 
                              control['avg_latency']) * 100
        success_improvement = treatment['success_rate'] - control['success_rate']
        
        print("\n📈 Performance comparison:")
        print(f"   ⚡ Latency improvement: {latency_improvement:+.1f}%")
        print(f"   ✅ Success rate change: {success_improvement:+.2f} percentage points")
        
        # Rough effect-size labels. This is NOT a statistical significance test:
        # it only applies a per-group sample floor and fixed thresholds, and it
        # does not use ABTestConfig.min_sample_size or significance_level.
        min_group_size = 100
        if (control['requests'] >= min_group_size and 
            treatment['requests'] >= min_group_size):
            
            if abs(latency_improvement) > 5:
                magnitude = "large" if abs(latency_improvement) > 10 else "moderate"
                print(f"   🔬 Latency difference: {magnitude}")
            
            if abs(success_improvement) > 1:
                magnitude = "large" if abs(success_improvement) > 2 else "moderate"
                print(f"   🔬 Success rate difference: {magnitude}")
        else:
            print(f"   ⚠️  Too few samples; collect more data before drawing conclusions")


# 顯示測試結果
display_ab_test_results(ab_test_manager, "model_v2_vs_v3_performance")
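The report above labels differences using fixed thresholds, which is a heuristic rather than a hypothesis test, and the `significance_level` field of `ABTestConfig` is never consulted. As one possible refinement, the sketch below runs a two-sided two-proportion z-test on the success counts using only the standard library. The function name `two_proportion_z_test` is hypothetical, and the normal approximation assumes each group has a reasonably large number of requests.

```python
import math
from typing import Tuple


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for the difference between two success rates.

    Returns (z, p_value), where z > 0 means group B has the higher rate.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one request")

    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Every request succeeded (or failed) in both groups: no difference.
        return 0.0, 1.0

    z = (p_b - p_a) / std_err
    # Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

The inputs can be obtained by summing `success_count` and `request_count` over the `TestMetrics` records of each version; the difference would be called significant only when `p_value` is below the configured `significance_level`.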

In [None]:
# 可視化 A/B 測試結果
def plot_ab_test_comparison(ab_manager: ABTestManager, test_name: str):
    """可視化 A/B 測試對比"""
    summary = ab_manager.get_test_summary(test_name)
    
    if "error" in summary:
        print(f"❌ {summary['error']}")
        return
    
    control = summary['control_stats']
    treatment = summary['treatment_stats']
    
    # 創建對比圖
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
    
    # 延遲對比
    versions = ['Control (V{})'.format(summary['control_version']), 
               'Treatment (V{})'.format(summary['treatment_version'])]
    latencies = [control['avg_latency'], treatment['avg_latency']]
    colors = ['#3498db', '#e74c3c']
    
    bars1 = ax1.bar(versions, latencies, color=colors, alpha=0.7)
    ax1.set_title('平均延遲對比 (ms)', fontsize=14, fontweight='bold')
    ax1.set_ylabel('延遲 (ms)')
    ax1.grid(True, alpha=0.3)
    
    for bar, val in zip(bars1, latencies):
        ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1,
                f'{val:.1f}', ha='center', va='bottom', fontweight='bold')
    
    # 成功率對比
    success_rates = [control['success_rate'], treatment['success_rate']]
    bars2 = ax2.bar(versions, success_rates, color=colors, alpha=0.7)
    ax2.set_title('成功率對比 (%)', fontsize=14, fontweight='bold')
    ax2.set_ylabel('成功率 (%)')
    ax2.set_ylim(90, 100)  # 聚焦在相關範圍
    ax2.grid(True, alpha=0.3)
    
    for bar, val in zip(bars2, success_rates):
        ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1,
                f'{val:.2f}%', ha='center', va='bottom', fontweight='bold')
    
    # 請求量分布
    request_counts = [control['requests'], treatment['requests']]
    bars3 = ax3.bar(versions, request_counts, color=colors, alpha=0.7)
    ax3.set_title('請求量分布', fontsize=14, fontweight='bold')
    ax3.set_ylabel('請求數')
    ax3.grid(True, alpha=0.3)
    
    for bar, val in zip(bars3, request_counts):
        ax3.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 5,
                f'{val}', ha='center', va='bottom', fontweight='bold')
    
    # 錯誤率對比
    error_rates = [control['error_rate'], treatment['error_rate']]
    bars4 = ax4.bar(versions, error_rates, color=['#e67e22', '#e67e22'], alpha=0.7)
    ax4.set_title('錯誤率對比 (%)', fontsize=14, fontweight='bold')
    ax4.set_ylabel('錯誤率 (%)')
    ax4.grid(True, alpha=0.3)
    
    for bar, val in zip(bars4, error_rates):
        ax4.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.05,
                f'{val:.2f}%', ha='center', va='bottom', fontweight='bold')
    
    plt.suptitle(f'A/B 測試結果對比: {test_name}', fontsize=16, fontweight='bold')
    plt.tight_layout()
    plt.show()


plot_ab_test_comparison(ab_test_manager, "model_v2_vs_v3_performance")

## 🎯 Experiment 3: Progressive Deployment (Canary Deployment)

### 3.1 Canary Deployment Manager

In [None]:
@dataclass
class CanaryConfig:
    """Canary 部署配置"""
    deployment_name: str
    model_name: str
    stable_version: int
    canary_version: int
    initial_traffic: float
    target_traffic: float
    increment_step: float
    step_duration: int  # 分鐘
    success_threshold: Dict[str, float]
    rollback_threshold: Dict[str, float]
    auto_promote: bool


class CanaryDeploymentManager:
    """Canary 部署管理器"""
    
    def __init__(self, version_manager: ModelVersionManager):
        self.version_manager = version_manager
        self.active_deployments: Dict[str, CanaryConfig] = {}
        self.deployment_metrics: Dict[str, List] = {}
        self.deployment_history: Dict[str, List] = {}
    
    def create_canary_deployment(self, config: CanaryConfig) -> bool:
        """創建 Canary 部署"""
        try:
            # 驗證版本
            stable_version = self.version_manager.get_version_info(config.stable_version)
            canary_version = self.version_manager.get_version_info(config.canary_version)
            
            if not stable_version or not canary_version:
                raise ValueError("指定的版本不存在")
            
            # 設置初始流量分配
            initial_traffic_config = {
                config.stable_version: 100 - config.initial_traffic,
                config.canary_version: config.initial_traffic
            }
            
            self.version_manager.set_traffic_split(initial_traffic_config)
            self.active_deployments[config.deployment_name] = config
            self.deployment_metrics[config.deployment_name] = []
            self.deployment_history[config.deployment_name] = []
            
            # 記錄初始狀態
            self.deployment_history[config.deployment_name].append({
                "timestamp": datetime.now(),
                "action": "deployment_started",
                "canary_traffic": config.initial_traffic,
                "status": "active"
            })
            
            print(f"🚀 Canary 部署 '{config.deployment_name}' 已創建")
            print(f"   📊 穩定版本: V{config.stable_version} ({100-config.initial_traffic:.1f}%)")
            print(f"   🐤 Canary 版本: V{config.canary_version} ({config.initial_traffic:.1f}%)")
            
            return True
            
        except Exception as e:
            print(f"❌ Canary 部署創建失敗: {str(e)}")
            return False
    
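    def monitor_and_scale(self, deployment_name: str, 
                         current_metrics: Dict[str, float]) -> str:
        """Monitor metrics and adjust canary traffic automatically"""
        if deployment_name not in self.active_deployments:
            return "deployment_not_found"
        
        # A rolled-back or promoted deployment is finished; never ramp it again
        history = self.deployment_history.get(deployment_name, [])
        if history and history[-1]["status"] == "rolled_back":
            return "rolled_back"
        if history and history[-1]["status"] == "completed":
            return "promoted"
        
        config = self.active_deployments[deployment_name]
        current_traffic = self.version_manager.get_version_info(
            config.canary_version
        ).traffic_percentage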
        
        # 檢查回滾條件
        for metric, threshold in config.rollback_threshold.items():
            if metric in current_metrics:
                if metric == "error_rate" and current_metrics[metric] > threshold:
                    return self._rollback_deployment(deployment_name, 
                                                   f"High {metric}: {current_metrics[metric]:.3f}")
                elif metric == "latency" and current_metrics[metric] > threshold:
                    return self._rollback_deployment(deployment_name, 
                                                   f"High {metric}: {current_metrics[metric]:.1f}ms")
        
        # 檢查成功條件
        success_criteria_met = True
        for metric, threshold in config.success_threshold.items():
            if metric in current_metrics:
                if metric == "error_rate" and current_metrics[metric] > threshold:
                    success_criteria_met = False
                elif metric == "latency" and current_metrics[metric] > threshold:
                    success_criteria_met = False
        
        # 如果成功條件滿足，增加流量
        if success_criteria_met and current_traffic < config.target_traffic:
            new_traffic = min(current_traffic + config.increment_step, 
                            config.target_traffic)
            
            new_traffic_config = {
                config.stable_version: 100 - new_traffic,
                config.canary_version: new_traffic
            }
            
            self.version_manager.set_traffic_split(new_traffic_config)
            
            # 記錄歷史
            self.deployment_history[deployment_name].append({
                "timestamp": datetime.now(),
                "action": "traffic_increased",
                "canary_traffic": new_traffic,
                "metrics": current_metrics.copy(),
                "status": "scaling"
            })
            
            print(f"📈 Canary 流量增加到 {new_traffic:.1f}%")
            
            # 檢查是否達到目標
            if new_traffic >= config.target_traffic and config.auto_promote:
                return self._promote_canary(deployment_name)
            
            return "traffic_increased"
        
        return "stable"
    
    def _rollback_deployment(self, deployment_name: str, reason: str) -> str:
        """回滾部署"""
        config = self.active_deployments[deployment_name]
        
        # 恢復到穩定版本
        self.version_manager.set_traffic_split({config.stable_version: 100.0})
        
        # 記錄回滾
        self.deployment_history[deployment_name].append({
            "timestamp": datetime.now(),
            "action": "rollback",
            "reason": reason,
            "canary_traffic": 0.0,
            "status": "rolled_back"
        })
        
        print(f"🔙 Canary 部署已回滾: {reason}")
        return "rolled_back"
    
    def _promote_canary(self, deployment_name: str) -> str:
        """提升 Canary 為穩定版本"""
        config = self.active_deployments[deployment_name]
        
        # 將 Canary 版本設為 100% 流量
        self.version_manager.set_traffic_split({config.canary_version: 100.0})
        
        # 記錄提升
        self.deployment_history[deployment_name].append({
            "timestamp": datetime.now(),
            "action": "promoted",
            "canary_traffic": 100.0,
            "status": "completed"
        })
        
        print(f"🎉 Canary 版本已提升為穩定版本")
        return "promoted"
    
    def get_deployment_status(self, deployment_name: str) -> Dict:
        """獲取部署狀態"""
        if deployment_name not in self.active_deployments:
            return {"error": "部署不存在"}
        
        config = self.active_deployments[deployment_name]
        history = self.deployment_history.get(deployment_name, [])
        
        current_canary_traffic = self.version_manager.get_version_info(
            config.canary_version
        ).traffic_percentage
        
        latest_status = "unknown"
        if history:
            latest_status = history[-1]["status"]
        
        return {
            "deployment_name": deployment_name,
            "stable_version": config.stable_version,
            "canary_version": config.canary_version,
            "current_canary_traffic": current_canary_traffic,
            "target_traffic": config.target_traffic,
            "status": latest_status,
            "steps_completed": len(history),
            "progress": (current_canary_traffic / config.target_traffic) * 100
        }


# 創建 Canary 部署管理器
canary_manager = CanaryDeploymentManager(version_manager)
print("✅ Canary 部署管理器已創建")
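`monitor_and_scale` uses two sets of limits, which split each metric into three zones: within the success threshold the rollout advances, above the rollback threshold it aborts, and in between it holds at the current traffic level. The sketch below isolates that decision as a pure function. The name `evaluate_canary` is hypothetical, and unlike the lab code it treats a missing success metric as a reason to hold rather than as a pass, which is an assumption made here for safety.

```python
from typing import Dict


def evaluate_canary(metrics: Dict[str, float],
                    success_threshold: Dict[str, float],
                    rollback_threshold: Dict[str, float]) -> str:
    """Classify a metrics snapshot as 'rollback', 'advance', or 'hold'.

    Hypothetical helper: every threshold is treated as an upper bound
    (lower is better), which fits error_rate and latency but not throughput.
    """
    # Any metric above its rollback limit aborts the rollout.
    for name, limit in rollback_threshold.items():
        if name in metrics and metrics[name] > limit:
            return "rollback"

    # Advance only if every success metric is present and within its limit.
    for name, limit in success_threshold.items():
        if name not in metrics or metrics[name] > limit:
            return "hold"
    return "advance"
```

Keeping the decision separate from the traffic update makes it straightforward to unit-test the thresholds without a running version manager.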

### 3.2 Starting a Canary Deployment

In [None]:
# 創建 Canary 部署配置
canary_config = CanaryConfig(
    deployment_name="v3_canary_rollout",
    model_name="text_classifier",
    stable_version=2,
    canary_version=3,
    initial_traffic=5.0,
    target_traffic=100.0,
    increment_step=15.0,
    step_duration=5,  # 5 minutes per step
    success_threshold={
        "error_rate": 0.05,  # at or below 5%
        "latency": 120.0     # at or below 120 ms
    },
    rollback_threshold={
        "error_rate": 0.10,  # roll back at or above 10%
        "latency": 200.0     # roll back at or above 200 ms
    },
    auto_promote=True
)

# Create and start the canary deployment
canary_manager.create_canary_deployment(canary_config)
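
To see how many stages this configuration implies, the traffic schedule can be worked out directly. The sketch below assumes that every healthy step adds `increment_step` to the canary share and that the result is capped at `target_traffic`. `canary_traffic_schedule` is a hypothetical helper, not part of `CanaryDeploymentManager`, whose actual stepping logic may differ.

```python
from typing import List


def canary_traffic_schedule(initial: float, step: float, target: float) -> List[float]:
    """Return the canary traffic percentage at each stage, capped at target.

    Hypothetical helper: assumes one fixed increment per healthy step.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    schedule = [min(initial, target)]
    while schedule[-1] < target:
        schedule.append(min(schedule[-1] + step, target))
    return schedule


# Values taken from the configuration above
schedule = canary_traffic_schedule(initial=5.0, step=15.0, target=100.0)
increases = len(schedule) - 1
print(schedule)
print(f"{increases} increases, at least {increases * 5} minutes at 5 minutes per step")
```

With these values the schedule has eight stages (seven increases), so a simulation needs more than the default six steps to have a chance of reaching 100%.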

### 3.3 Simulating the Canary Deployment Process

In [None]:
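# Simulate monitoring and automatic traffic adjustment during a canary deployment
def simulate_canary_deployment(canary_manager: CanaryDeploymentManager, 
                             deployment_name: str, steps: int = 6):
    """Simulate a canary deployment."""
    print(f"🐤 Starting canary deployment simulation: {deployment_name}")
    
    for step in range(steps):
        print(f"\n--- Step {step + 1}/{steps} ---")
        
        # Simulate current metrics as random noise around a healthy baseline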
        base_error_rate = 0.03 + random.uniform(-0.01, 0.02)
        base_latency = 85 + random.uniform(-15, 25)
        
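        # Occasionally simulate a performance problem
        if step == 3 and random.random() < 0.3:  # 30% chance of a fault at the fourth step (index 3)
            base_error_rate = 0.12  # above the 0.10 rollback threshold
            base_latency = 220      # above the 200 ms rollback threshold
            print("⚠️  Performance problem detected...")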
        
        current_metrics = {
            "error_rate": base_error_rate,
            "latency": base_latency,
            "throughput": random.uniform(800, 1200)
        }
        
        print("📊 Current metrics:")
        print(f"   Error rate: {current_metrics['error_rate']:.3f}")
        print(f"   Latency: {current_metrics['latency']:.1f}ms")
        print(f"   Throughput: {current_metrics['throughput']:.0f} QPS")
        
        # Monitor and adjust traffic
        result = canary_manager.monitor_and_scale(deployment_name, current_metrics)
        
        # Show the deployment status
        status = canary_manager.get_deployment_status(deployment_name)
        if "error" not in status:
            print(f"🎯 Progress: {status['progress']:.1f}% complete")
            print(f"📈 Canary traffic: {status['current_canary_traffic']:.1f}%")
        
        # Check the outcome of this step
        if result == "rolled_back":
            print("🔙 Deployment rolled back, stopping simulation")
            break
        elif result == "promoted":
            print("🎉 Canary version promoted, deployment complete")
            break
        
        # Simulated interval between steps
        time.sleep(1)
    
    print("\n🏁 Canary deployment simulation finished")


# Run the simulation
simulate_canary_deployment(canary_manager, "v3_canary_rollout", 8)
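
The simulation feeds metrics to `monitor_and_scale`, and the configuration defines two threshold sets, which leaves a band between "healthy" and "roll back". One plausible reading is a three-way decision, sketched below. `classify_canary_health` and its return values are hypothetical and assume that lower values are better for every metric; the real `monitor_and_scale` may decide differently.

```python
from typing import Dict


def classify_canary_health(metrics: Dict[str, float],
                           success: Dict[str, float],
                           rollback: Dict[str, float]) -> str:
    """Return 'rollback', 'advance', or 'hold' for lower-is-better metrics."""
    # Any metric at or above its rollback threshold forces a rollback.
    if any(metrics[name] >= limit for name, limit in rollback.items()):
        return "rollback"
    # Every metric must be at or below its success threshold to advance.
    if all(metrics[name] <= limit for name, limit in success.items()):
        return "advance"
    # Otherwise stay at the current traffic level and keep observing.
    return "hold"


# Thresholds copied from the canary configuration
success = {"error_rate": 0.05, "latency": 120.0}
rollback = {"error_rate": 0.10, "latency": 200.0}

healthy = classify_canary_health({"error_rate": 0.03, "latency": 85.0}, success, rollback)
borderline = classify_canary_health({"error_rate": 0.07, "latency": 110.0}, success, rollback)
faulty = classify_canary_health({"error_rate": 0.12, "latency": 220.0}, success, rollback)
print(healthy, borderline, faulty)
```

The injected fault in the simulation (error rate 0.12, latency 220 ms) exceeds both rollback thresholds, while an error rate of 0.07 falls in the band where neither threshold applies.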

### 3.4 Canary Deployment History Analysis

In [None]:
# Visualize the canary deployment history
def plot_canary_deployment_history(canary_manager: CanaryDeploymentManager, 
                                 deployment_name: str):
    """Plot the traffic history and event timeline of a canary deployment."""
    if deployment_name not in canary_manager.deployment_history:
        print(f"❌ No history found for deployment '{deployment_name}'")
        return
    
    history = canary_manager.deployment_history[deployment_name]
    
    if not history:
        print("❌ No history data to analyze")
        return
    
    # Prepare the data
    timestamps = []
    traffic_percentages = []
    actions = []
    
    for record in history:
        timestamps.append(record["timestamp"])
        traffic_percentages.append(record["canary_traffic"])
        actions.append(record["action"])
    
    # Create the figure
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
    
    # Canary traffic over time
    ax1.plot(timestamps, traffic_percentages, marker='o', linewidth=2, 
            markersize=8, color='#3498db')
    ax1.set_title(f'Canary traffic over time: {deployment_name}', 
                 fontsize=14, fontweight='bold')
    ax1.set_ylabel('Canary traffic (%)', fontsize=12)
    ax1.grid(True, alpha=0.3)
    ax1.set_ylim(0, 105)
    
    # Annotate key events
    for ts, traffic, action in zip(timestamps, traffic_percentages, actions):
        if action == "rollback":
            ax1.annotate('Rollback', xy=(ts, traffic), xytext=(ts, traffic + 10),
                        arrowprops=dict(arrowstyle='->', color='red'),
                        color='red', fontweight='bold')
        elif action == "promoted":
            ax1.annotate('Promoted', xy=(ts, traffic), xytext=(ts, traffic - 10),
                        arrowprops=dict(arrowstyle='->', color='green'),
                        color='green', fontweight='bold')
    
    # Event timeline
    action_colors = {
        'deployment_started': '#3498db',
        'traffic_increased': '#2ecc71',
        'rollback': '#e74c3c',
        'promoted': '#f39c12'
    }
    
    for i, action in enumerate(actions):
        color = action_colors.get(action, '#95a5a6')
        ax2.barh(i, 1, color=color, alpha=0.7)
        ax2.text(0.5, i, action.replace('_', ' ').title(), 
                ha='center', va='center', fontweight='bold')
    
    ax2.set_title('Deployment event timeline', fontsize=14, fontweight='bold')
    ax2.set_xlabel('Progress over time', fontsize=12)
    ax2.set_yticks(range(len(actions)))
    ax2.set_yticklabels([f'{i+1}' for i in range(len(actions))])
    ax2.set_xlim(0, 1)
    
    plt.tight_layout()
    plt.show()
    
    # Print summary statistics
    print(f"\n📊 Canary deployment summary: {deployment_name}")
    print("=" * 50)
    print(f"📅 Start time: {timestamps[0].strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"📅 End time: {timestamps[-1].strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"⏱️  Total duration: {(timestamps[-1] - timestamps[0]).total_seconds():.1f} s")
    print(f"🔄 Steps: {len(history)}")
    print(f"🎯 Final status: {history[-1]['status']}")
    print(f"📈 Final traffic: {history[-1]['canary_traffic']:.1f}%")


# Plot the deployment history
plot_canary_deployment_history(canary_manager, "v3_canary_rollout")
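
The same history records can also be summarized without plotting, which is handy in logs or automated checks. The following is a minimal sketch that assumes each record carries the `timestamp`, `action`, `canary_traffic`, and `status` keys used above. `summarize_history`, the sample records, and the `in_progress` status value are illustrative and not taken from `CanaryDeploymentManager`.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List


def summarize_history(history: List[Dict]) -> Dict:
    """Summarize deployment records (timestamp, action, canary_traffic, status)."""
    if not history:
        return {}
    timestamps = [record["timestamp"] for record in history]
    # Seconds elapsed between consecutive records
    gaps = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    return {
        "action_counts": dict(Counter(record["action"] for record in history)),
        "peak_canary_traffic": max(record["canary_traffic"] for record in history),
        "longest_gap_seconds": max(gaps) if gaps else 0.0,
        "final_status": history[-1]["status"],
    }


# Illustrative records only; real ones would come from canary_manager.deployment_history
start = datetime(2024, 1, 1, 12, 0, 0)
sample_history = [
    {"timestamp": start, "action": "deployment_started",
     "canary_traffic": 5.0, "status": "in_progress"},
    {"timestamp": start + timedelta(seconds=1), "action": "traffic_increased",
     "canary_traffic": 20.0, "status": "in_progress"},
    {"timestamp": start + timedelta(seconds=3), "action": "traffic_increased",
     "canary_traffic": 35.0, "status": "in_progress"},
]
summary = summarize_history(sample_history)
print(summary)
```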

## 🎯 Experiment 4: Configuration Export and Persistence

In [None]:
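# Export the full experiment configuration and results
def export_experiment_results(version_manager, ab_test_manager, canary_manager):
    """Export experiment results to JSON files."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    
    # Version management configuration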
    version_config_file = f"{EXPERIMENT_DIR}/configs/version_config_{timestamp}.json"
    version_manager.export_config(version_config_file)
    
    # A/B test results
    ab_results = {}
    for test_name in ab_test_manager.active_tests.keys():
        ab_results[test_name] = ab_test_manager.get_test_summary(test_name)
    
    ab_results_file = f"{EXPERIMENT_DIR}/configs/ab_test_results_{timestamp}.json"
    with open(ab_results_file, 'w') as f:
        json.dump(ab_results, f, indent=2, default=str)
    
    # Canary deployment results
    canary_results = {}
    for deployment_name in canary_manager.active_deployments.keys():
        canary_results[deployment_name] = {
            "status": canary_manager.get_deployment_status(deployment_name),
            "history": canary_manager.deployment_history.get(deployment_name, [])
        }
    
    canary_results_file = f"{EXPERIMENT_DIR}/configs/canary_results_{timestamp}.json"
    with open(canary_results_file, 'w') as f:
        json.dump(canary_results, f, indent=2, default=str)
    
    # Build the combined report
    comprehensive_report = {
        "experiment_timestamp": timestamp,
        "model_name": version_manager.model_name,
        "total_versions": len(version_manager.versions),
        "ab_tests_count": len(ab_test_manager.active_tests),
        "canary_deployments_count": len(canary_manager.active_deployments),
        "files_generated": {
            "version_config": version_config_file,
            "ab_test_results": ab_results_file,
            "canary_results": canary_results_file
        },
        "summary": {
            "current_traffic_split": version_manager.traffic_rules,
            "active_ab_tests": list(ab_test_manager.active_tests.keys()),
            "active_canary_deployments": list(canary_manager.active_deployments.keys())
        }
    }
    
    report_file = f"{EXPERIMENT_DIR}/experiment_report_{timestamp}.json"
    with open(report_file, 'w') as f:
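        # default=str matches the other exports and guards against non-JSON values such as datetime
        json.dump(comprehensive_report, f, indent=2, default=str)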
    
    print("📊 Experiment results exported:")
    print(f"   📄 Combined report: {report_file}")
    print(f"   ⚙️  Version config: {version_config_file}")
    print(f"   🧪 A/B tests: {ab_results_file}")
    print(f"   🐤 Canary deployments: {canary_results_file}")


# Export the experiment results
export_experiment_results(version_manager, ab_test_manager, canary_manager)
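
One consequence of exporting with `default=str` is that `datetime` values are written as plain strings and come back as strings when the file is loaded. The sketch below shows the round trip and one way to restore the timestamps. `restore_timestamps` is a hypothetical helper, and the record is illustrative.

```python
import json
from datetime import datetime

record = {
    "timestamp": datetime(2024, 1, 1, 12, 0, 0),
    "action": "promoted",
    "canary_traffic": 100.0,
}

# default=str calls str() on the datetime, so JSON stores it as text
text = json.dumps([record], default=str)
loaded = json.loads(text)
print(type(loaded[0]["timestamp"]).__name__)


def restore_timestamps(records: list) -> list:
    """Parse the 'timestamp' field of each record back into a datetime."""
    return [
        {**r, "timestamp": datetime.fromisoformat(r["timestamp"])}
        for r in records
    ]


restored = restore_timestamps(loaded)
print(restored[0]["timestamp"] == record["timestamp"])
```

This works because `str()` on a `datetime` produces an ISO-style string that `datetime.fromisoformat` accepts. Other non-JSON types stringified by `default=str` would need their own parsing.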

## 📊 Best Practices Summary

In [None]:
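# Best-practice guide
best_practices = """
🎯 Best practices for enterprise model version management and A/B testing

📋 Version management:
   ✅ Semantic versioning
   ✅ Complete version metadata records
   ✅ Automated version discovery and registration
   ✅ Compatibility checks between versions

🧪 A/B test design principles:
   ✅ Clearly defined success metrics
   ✅ Appropriate sample size calculation
   ✅ Statistical significance validation
   ✅ Multi-dimensional performance evaluation

🐤 Canary deployment:
   ✅ Gradual traffic increases
   ✅ Real-time monitoring and automatic rollback
   ✅ Multi-level health checks
   ✅ Continuous monitoring of business metrics

⚠️ Risk controls:
   ✅ Fast rollback mechanism
   ✅ Multi-level alerting
   ✅ Business impact assessment
   ✅ Disaster recovery plan

📈 Monitoring and observability:
   ✅ Comprehensive metrics collection
   ✅ Real-time performance dashboards
   ✅ Anomaly detection and alerting
   ✅ Historical trend analysis

🔧 Operations automation:
   ✅ CI/CD pipeline integration
   ✅ Automated testing
   ✅ Intelligent decision engine
   ✅ Automated failure recovery

💡 Success factors:
   🎯 Alignment with business goals
   📊 Data-driven decisions
   🚀 Fast iteration
   🛡️ Risk awareness
   👥 Cross-team collaboration
"""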

print(best_practices)
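
The "appropriate sample size calculation" item can be made concrete. The following is a rough sketch using the standard normal-approximation formula for comparing two proportions (for example, the error rates of two model versions) with a two-sided test. `sample_size_per_variant` and the example rates are illustrative assumptions, not values from this lab.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_control: float, p_treatment: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate requests needed per variant for a two-sided two-proportion z-test."""
    if p_control == p_treatment:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Example: detect an error-rate drop from 5% to 4%
n = sample_size_per_variant(p_control=0.05, p_treatment=0.04)
print(f"Requests needed per variant: {n}")
```

Small differences require many requests: detecting a one-point change in a 5% error rate takes several thousand requests per variant, which matters when a variant receives only a small share of traffic.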

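## 📖 Summary

This lab implemented enterprise-grade model version management and A/B testing end to end:

### 🎯 Results
1. **Version management system** - manages the full lifecycle of model versions
2. **A/B testing framework** - automates experiment design and analysis
3. **Canary deployment** - provides gradual rollout with automatic rollback
4. **Monitoring and visualization** - offers performance analysis and decision-support tools

### 🔧 Key Techniques
- Enterprise-grade version management strategy
- Statistically driven A/B testing
- Automated canary deployment
- Real-time monitoring and automated decisions

### 🚀 Practical Value
1. **Lower deployment risk** - gradual rollout reduces production incidents
2. **Better decisions** - a data-driven, systematic decision process
3. **Faster iteration** - automated workflows improve deployment efficiency
4. **Greater stability** - multiple safeguards protect service availability

### 💡 Key Takeaways
- Enterprise deployments must account for risk control and business continuity
- A/B testing requires both statistical foundations and business understanding
- Canary deployment is an effective way to balance innovation and stability
- Monitoring and observability are critical to a successful deployment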

---

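**🎉 Congratulations on completing Lab 2.4.1!**

You now have the core techniques of enterprise-grade model version management and A/B testing, and can build a safe, reliable model deployment process.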