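# Lab-2.1.2: Model Repository Design and Configuration

## 🎯 Learning Objectives

1. **Understand the Triton Model Repository architecture**
   - Directory layout of the model repository
   - Detailed settings in the configuration file (`config.pbtxt`)
   - Version control and model lifecycle management

2. **Master enterprise-grade model management best practices**
   - Model naming conventions and organization strategy
   - Running multiple model versions side by side and switching between them
   - Dynamic model loading and unloading

3. **Implement a complete model deployment workflow**
   - Download and convert models from the HuggingFace Hub
   - Configure models for the PyTorch backend
   - Verify model deployment and inference

## 📋 Enterprise Case Background

**Scenario**: The Netflix recommendation system needs to manage 20+ different ML models:
- User behavior prediction model (BERT-based)
- Content similarity model (Sentence Transformers)
- Personalized ranking model (custom PyTorch)
- Model version management for A/B testing

**Challenge**: How do we design a scalable model repository architecture that supports dynamic model updates without affecting service availability?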

---

## 1. Model Repository Fundamentals

### 1.1 Standard Directory Layout

In [None]:
import os
import json
import shutil
from pathlib import Path
from typing import Dict, List, Any
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Root directory of the model repository
MODEL_REPOSITORY_ROOT = Path("/tmp/triton_model_repository")

def create_model_repository_structure():
    """
    Create the standard Triton Model Repository directory layout.

    Standard layout:
    model_repository/
    ├── model_name_1/
    │   ├── config.pbtxt
    │   ├── 1/
    │   │   └── model.pt
    │   ├── 2/
    │   │   └── model.pt
    │   └── labels.txt (optional)
    └── model_name_2/
        ├── config.pbtxt
        └── 1/
            └── model_files...
    """

    # Create the root directory
    MODEL_REPOSITORY_ROOT.mkdir(parents=True, exist_ok=True)

    # Logical model categories. Triton treats every immediate subdirectory of
    # the repository root as a model, so the categories only group the printed
    # output and are not created as directories.
    model_categories = {
        "nlp_models": ["sentiment_analysis", "text_classification", "ner_model"],
        "recommendation": ["user_behavior", "content_similarity", "ranking_model"],
        "cv_models": ["image_classification", "object_detection"],
        "custom_models": ["business_logic", "feature_extraction"]
    }

    print("🏗️  Creating the Triton Model Repository layout...")
    print(f"📂 Root directory: {MODEL_REPOSITORY_ROOT}")
    print()

    for category, models in model_categories.items():
        print(f"📁 {category} (logical group)")

        for model_name in models:
            model_path = MODEL_REPOSITORY_ROOT / model_name
            model_path.mkdir(exist_ok=True)

            # Create the version directories (1, 2)
            for version in [1, 2]:
                version_path = model_path / str(version)
                version_path.mkdir(exist_ok=True)

            print(f"   └── {model_name}/")
            print(f"       ├── config.pbtxt (to be added)")
            print(f"       ├── 1/")
            print(f"       └── 2/")

    return MODEL_REPOSITORY_ROOT

# Create the directory layout
repo_root = create_model_repository_structure()
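The layout rules in the docstring above can also be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper `find_layout_problems` (it is not part of Triton) and two rules: each model directory has at least one version subdirectory whose name is a positive integer, and each has a `config.pbtxt`. Triton can auto-complete the configuration for some backends, so treat the `config.pbtxt` check as a project convention rather than a hard requirement.

```python
import tempfile
from pathlib import Path
from typing import List


def find_layout_problems(repo_root: Path) -> List[str]:
    """Return human-readable layout problems found under repo_root (hypothetical helper)."""
    problems = []
    for model_dir in sorted(p for p in repo_root.iterdir() if p.is_dir()):
        # Version directories are subdirectories with integer names that do not start with 0.
        versions = [
            d for d in model_dir.iterdir()
            if d.is_dir() and d.name.isdigit() and not d.name.startswith("0")
        ]
        if not versions:
            problems.append(f"{model_dir.name}: no numeric version directory")
        if not (model_dir / "config.pbtxt").is_file():
            problems.append(f"{model_dir.name}: missing config.pbtxt")
    return problems


# Build a throwaway repository with one valid and one invalid model.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good_model" / "1").mkdir(parents=True)
    (root / "good_model" / "config.pbtxt").write_text('name: "good_model"\n')
    (root / "bad_model" / "v1").mkdir(parents=True)  # "v1" is not a numeric name
    layout_problems = find_layout_problems(root)

print(layout_problems)
```

Running a check like this before starting the server surfaces naming mistakes early instead of at model load time.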

### 1.2 Enterprise Model Naming Convention

In [None]:
class ModelNamingConvention:
    """
    Enterprise model naming convention manager.

    Naming convention: {business_unit}_{model_type}_{version}_{environment}
    Example: netflix_recommendation_v2_prod
    """
    
    BUSINESS_UNITS = ["netflix", "paypal", "visa", "general"]
    MODEL_TYPES = ["nlp", "cv", "recommendation", "risk", "classification"]
    ENVIRONMENTS = ["dev", "staging", "prod"]
    
    @classmethod
    def generate_model_name(cls, business_unit: str, model_type: str, 
                          version: str, environment: str = "prod") -> str:
        """
        Generate a model name that follows the enterprise convention.
        """
        if business_unit not in cls.BUSINESS_UNITS:
            raise ValueError(f"Business unit must be one of {cls.BUSINESS_UNITS}")
        if model_type not in cls.MODEL_TYPES:
            raise ValueError(f"Model type must be one of {cls.MODEL_TYPES}")
        if environment not in cls.ENVIRONMENTS:
            raise ValueError(f"Environment must be one of {cls.ENVIRONMENTS}")
            
        return f"{business_unit}_{model_type}_{version}_{environment}"
    
    @classmethod
    def parse_model_name(cls, model_name: str) -> Dict[str, str]:
        """
        Parse a model name and extract its metadata.
        """
        parts = model_name.split("_")
        if len(parts) != 4:
            raise ValueError(f"Invalid model name format: {model_name}")
            
        return {
            "business_unit": parts[0],
            "model_type": parts[1],
            "version": parts[2],
            "environment": parts[3]
        }
    
    @classmethod
    def validate_model_name(cls, model_name: str) -> bool:
        """
        Check whether a model name follows the convention.
        """
        try:
            metadata = cls.parse_model_name(model_name)
            return (
                metadata["business_unit"] in cls.BUSINESS_UNITS and
                metadata["model_type"] in cls.MODEL_TYPES and
                metadata["environment"] in cls.ENVIRONMENTS
            )
        except ValueError:
            return False

# Example: enterprise model naming
print("🏷️  Enterprise model naming convention examples:")
print()

example_models = [
    ModelNamingConvention.generate_model_name("netflix", "recommendation", "v2", "prod"),
    ModelNamingConvention.generate_model_name("paypal", "risk", "v1", "staging"),
    ModelNamingConvention.generate_model_name("visa", "classification", "v3", "prod"),
    ModelNamingConvention.generate_model_name("general", "nlp", "v1", "dev")
]

for model_name in example_models:
    metadata = ModelNamingConvention.parse_model_name(model_name)
    valid = ModelNamingConvention.validate_model_name(model_name)
    
    print(f"📋 {model_name}")
    print(f"   Business unit: {metadata['business_unit']}")
    print(f"   Model type: {metadata['model_type']}")
    print(f"   Version: {metadata['version']}")
    print(f"   Environment: {metadata['environment']}")
    print(f"   Valid: {'✅' if valid else '❌'}")
    print()
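The `split("_")` parser above rejects any name whose model type itself contains an underscore, such as `paypal_fraud_detection_v2_prod`, which appears later in this lab. One possible workaround is sketched below, assuming a hypothetical function `parse_model_name_lenient` and assuming that the business unit is always the first token and that version and environment are always the last two.

```python
from typing import Dict


def parse_model_name_lenient(model_name: str) -> Dict[str, str]:
    """Split a model name into four fields, allowing underscores inside model_type."""
    # Peel business_unit off the left, then version and environment off the right.
    business_unit, sep, rest = model_name.partition("_")
    parts = rest.rsplit("_", 2)
    if not sep or not business_unit or len(parts) != 3 or not all(parts):
        raise ValueError(f"Invalid model name format: {model_name}")
    model_type, version, environment = parts
    return {
        "business_unit": business_unit,
        "model_type": model_type,
        "version": version,
        "environment": environment,
    }


parsed = parse_model_name_lenient("paypal_fraud_detection_v2_prod")
print(parsed)
```

The trade-off is that the business unit can no longer contain an underscore, so the allowed values in `BUSINESS_UNITS` would need to stay single tokens.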

## 2. A Closer Look at config.pbtxt

### 2.1 Basic Configuration Template

In [None]:
class TritonConfigGenerator:
    """
    Triton model configuration generator.
    Supports multiple backends and complex configuration scenarios.
    """
    
    @staticmethod
    def generate_pytorch_config(
        model_name: str,
        max_batch_size: int = 8,
        input_specs: List[Dict] = None,
        output_specs: List[Dict] = None,
        instance_group: Dict = None,
        dynamic_batching: Dict = None,
        optimization: Dict = None
    ) -> str:
        """
        Generate a configuration file for the PyTorch backend.
        """

        # Default input/output specs (BERT-like model)
        if input_specs is None:
            input_specs = [
                {
                    "name": "input_ids",
                    "data_type": "TYPE_INT64",
                    "dims": [-1]  # variable-length sequence
                },
                {
                    "name": "attention_mask",
                    "data_type": "TYPE_INT64",
                    "dims": [-1]
                }
            ]

        if output_specs is None:
            output_specs = [
                {
                    "name": "logits",
                    "data_type": "TYPE_FP32",
                    "dims": [2]  # binary classification output
                }
            ]

        # Default instance group
        if instance_group is None:
            instance_group = {
                "count": 1,
                "kind": "KIND_GPU",
                "gpus": [0]
            }

        # Default dynamic batching settings
        if dynamic_batching is None:
            dynamic_batching = {
                "enabled": True,
                "max_queue_delay_microseconds": 100,
                "preferred_batch_size": [4, 8]
            }

        # Build the configuration content
        config_lines = [
            f'name: "{model_name}"',
            'backend: "pytorch"',
            f'max_batch_size: {max_batch_size}',
            ''
        ]
        
        # Inputs (one `input [ ... ]` block per tensor; repeated blocks are merged)
        for input_spec in input_specs:
            config_lines.extend([
                'input [',
                '  {',
                f'    name: "{input_spec["name"]}"',
                f'    data_type: {input_spec["data_type"]}',
                f'    dims: {input_spec["dims"]}',
                '  }',
                ']',
                ''
            ])
        
        # Outputs
        for output_spec in output_specs:
            config_lines.extend([
                'output [',
                '  {',
                f'    name: "{output_spec["name"]}"',
                f'    data_type: {output_spec["data_type"]}',
                f'    dims: {output_spec["dims"]}',
                '  }',
                ']',
                ''
            ])
        
        # Instance group
        config_lines.extend([
            'instance_group [',
            '  {',
            f'    count: {instance_group["count"]}',
            f'    kind: {instance_group["kind"]}',
        ])
        
        if "gpus" in instance_group:
            gpu_list = ", ".join(map(str, instance_group["gpus"]))
            config_lines.append(f'    gpus: [ {gpu_list} ]')
        
        config_lines.extend([
            '  }',
            ']',
            ''
        ])
        
        # Dynamic batching
        if dynamic_batching["enabled"]:
            config_lines.extend([
                'dynamic_batching {',
                f'  max_queue_delay_microseconds: {dynamic_batching["max_queue_delay_microseconds"]}',
            ])
            
            if "preferred_batch_size" in dynamic_batching:
                batch_sizes = ", ".join(map(str, dynamic_batching["preferred_batch_size"]))
                config_lines.append(f'  preferred_batch_size: [ {batch_sizes} ]')
            
            config_lines.extend([
                '}',
                ''
            ])
        
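        # Optimization (optional). The block below is emitted whenever
        # `optimization` is truthy; the dict's contents are not inspected.
        # Triton documents the TensorRT execution accelerator for the ONNX
        # Runtime and TensorFlow backends, so verify that your backend honors it.
        if optimization:
            config_lines.extend([
                'optimization {',
                f'  execution_accelerators {{',
                f'    gpu_execution_accelerator : [ {{',
                f'      name : "tensorrt"',
                f'      parameters {{ key: "precision_mode" value: "FP16" }}',
                f'    }} ]',
                f'  }}',
                '}'
            ])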
        
        return '\n'.join(config_lines)

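# Generate example enterprise model configurations
print("⚙️  Generating enterprise Triton model configurations...")
print()

# Netflix recommendation system model configuration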
netflix_config = TritonConfigGenerator.generate_pytorch_config(
    model_name="netflix_recommendation_v2_prod",
    max_batch_size=16,
    input_specs=[
        {"name": "user_features", "data_type": "TYPE_FP32", "dims": [128]},
        {"name": "item_features", "data_type": "TYPE_FP32", "dims": [256]}
    ],
    output_specs=[
        {"name": "recommendation_scores", "data_type": "TYPE_FP32", "dims": [100]}
    ],
    instance_group={
        "count": 2,  # instances per listed GPU (4 in total here) for higher throughput
        "kind": "KIND_GPU",
        "gpus": [0, 1]
    },
    dynamic_batching={
        "enabled": True,
        "max_queue_delay_microseconds": 50,  # low-latency requirement
        "preferred_batch_size": [4, 8, 16]
    }
)

print("📄 Netflix recommendation system model configuration:")
print("```")
print(netflix_config)
print("```")
print()
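Before a generated configuration is written to disk, it can help to check that the batching settings agree with each other. The sketch below assumes a hypothetical helper `check_batching_settings` that takes the same `dynamic_batching` dictionary used above; the rule that a preferred batch size should not exceed `max_batch_size` reflects how Triton is expected to validate the configuration, so confirm it against the server version you run.

```python
from typing import Dict, List


def check_batching_settings(max_batch_size: int, dynamic_batching: Dict) -> List[str]:
    """Return a list of problems found in the dynamic batching settings (hypothetical helper)."""
    problems = []
    if dynamic_batching.get("enabled") and max_batch_size <= 0:
        problems.append("dynamic batching requires max_batch_size > 0")
    for size in dynamic_batching.get("preferred_batch_size", []):
        if size <= 0:
            problems.append(f"preferred batch size {size} must be positive")
        elif size > max_batch_size:
            problems.append(
                f"preferred batch size {size} exceeds max_batch_size {max_batch_size}"
            )
    if dynamic_batching.get("max_queue_delay_microseconds", 0) < 0:
        problems.append("max_queue_delay_microseconds must not be negative")
    return problems


# Same values as the Netflix example: consistent settings.
ok_problems = check_batching_settings(
    16,
    {"enabled": True, "max_queue_delay_microseconds": 50, "preferred_batch_size": [4, 8, 16]},
)
# A preferred size larger than the maximum batch size is reported.
bad_problems = check_batching_settings(
    8,
    {"enabled": True, "max_queue_delay_microseconds": 50, "preferred_batch_size": [4, 16]},
)
print(ok_problems)
print(bad_problems)
```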

### 2.2 Advanced Configuration Scenarios

In [None]:
# PayPal risk-control model configuration (high security requirements)
paypal_config = TritonConfigGenerator.generate_pytorch_config(
    model_name="paypal_risk_v1_prod",
    max_batch_size=32,  # large batches for high throughput
    input_specs=[
        {"name": "transaction_features", "data_type": "TYPE_FP32", "dims": [50]},
        {"name": "user_profile", "data_type": "TYPE_FP32", "dims": [30]},
        {"name": "merchant_info", "data_type": "TYPE_FP32", "dims": [20]}
    ],
    output_specs=[
        {"name": "risk_score", "data_type": "TYPE_FP32", "dims": [1]},
        {"name": "fraud_probability", "data_type": "TYPE_FP32", "dims": [1]}
    ],
    instance_group={
        "count": 4,  # instances per listed GPU, for high availability
        "kind": "KIND_GPU",
        "gpus": [0, 1, 2, 3]
    },
    dynamic_batching={
        "enabled": True,
        "max_queue_delay_microseconds": 10,  # very low latency (10 microseconds)
        "preferred_batch_size": [8, 16, 32]
    },
    optimization={
        "tensorrt_fp16": True  # requests the TensorRT FP16 optimization block
    }
)

print("💳 PayPal risk-control model configuration (high security):")
print("```")
print(paypal_config)
print("```")
print()

# Save configuration files into the model repository
def save_model_config(model_name: str, config_content: str):
    """
    Save a model configuration into the matching model directory.
    """
    model_path = MODEL_REPOSITORY_ROOT / model_name
    model_path.mkdir(parents=True, exist_ok=True)

    config_path = model_path / "config.pbtxt"
    with open(config_path, 'w', encoding='utf-8') as f:
        f.write(config_content)

    logger.info(f"✅ Configuration file saved: {config_path}")
    return config_path

# Save the enterprise model configurations
netflix_config_path = save_model_config("netflix_recommendation_v2_prod", netflix_config)
paypal_config_path = save_model_config("paypal_risk_v1_prod", paypal_config)

## 3. Model Deployment in Practice

### 3.1 Downloading and Preparing Models from HuggingFace

In [None]:
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel, AutoConfig
import numpy as np

class TritonModelDeployer:
    """
    Triton model deployer: handles the full deployment flow from HuggingFace to Triton.
    """
    
    def __init__(self, model_repository_root: Path):
        self.model_repository_root = model_repository_root
    
    def deploy_huggingface_model(
        self, 
        model_name_or_path: str,
        triton_model_name: str,
        model_version: int = 1,
        task_type: str = "classification"
    ):
        """
        Deploy a HuggingFace model into the Triton Model Repository.

        Args:
            model_name_or_path: HuggingFace model name or path
            triton_model_name: Model name inside Triton
            model_version: Model version number
            task_type: Task type (classification, regression, generation)
        """

        print(f"🚀 Starting model deployment: {model_name_or_path} -> {triton_model_name}")

        # 1. Create the model directory layout
        model_path = self.model_repository_root / triton_model_name
        version_path = model_path / str(model_version)
        version_path.mkdir(parents=True, exist_ok=True)
        
        try:
            # 2. Download the HuggingFace model
            print(f"📥 Downloading model: {model_name_or_path}")

            # Simulated download (a real environment would actually download)
            print("   ├── Downloading tokenizer...")
            # tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

            print("   ├── Downloading model config...")
            # config = AutoConfig.from_pretrained(model_name_or_path)

            print("   └── Downloading model weights...")
            # model = AutoModel.from_pretrained(model_name_or_path)

            # 3. Create the Triton-compatible model wrapper
            wrapper_code = self._generate_model_wrapper(task_type, triton_model_name)
            wrapper_path = version_path / "model.py"

            with open(wrapper_path, 'w', encoding='utf-8') as f:
                f.write(wrapper_code)

            print(f"📝 Generated model wrapper: {wrapper_path}")
            
            # 4. Create the model configuration file
            config_content = self._generate_model_config(triton_model_name, task_type)
            config_path = model_path / "config.pbtxt"

            with open(config_path, 'w', encoding='utf-8') as f:
                f.write(config_content)

            print(f"⚙️  Generated config file: {config_path}")

            # 5. Create the model metadata file
            metadata = {
                "model_name": triton_model_name,
                "version": model_version,
                "source_model": model_name_or_path,
                "task_type": task_type,
                "deployment_date": "2024-10-09",
                "backend": "python"  # a model.py wrapper is served by the Python backend
            }

            metadata_path = model_path / "metadata.json"
            with open(metadata_path, 'w', encoding='utf-8') as f:
                json.dump(metadata, f, indent=2, ensure_ascii=False)

            print(f"📋 Generated metadata file: {metadata_path}")

            print(f"✅ Model deployment complete: {triton_model_name}")
            
            return {
                "model_path": str(model_path),
                "version_path": str(version_path),
                "config_path": str(config_path),
                "metadata": metadata
            }
            
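        except Exception as e:
            print(f"❌ Model deployment failed: {str(e)}")
            # Clean up the failed deployment. Only the version directory created
            # by this call is removed, so other versions of the model survive.
            if version_path.exists():
                shutil.rmtree(version_path)
            raise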
    
    def _generate_model_wrapper(self, task_type: str, model_name: str) -> str:
        """
        Generate the model wrapper code (a model.py for Triton's Python backend).
        """
        
        if task_type == "classification":
            return f'''
import torch
import torch.nn as nn
import triton_python_backend_utils as pb_utils
import numpy as np
import json
from transformers import AutoTokenizer, AutoModel, AutoConfig

class TritonPythonModel:
    """
    Triton classification model wrapper - {model_name}

    Enterprise features supported:
    - Batched inference
    - Error handling and logging
    - Performance monitoring integration
    """

    def initialize(self, args):
        """
        Initialize the model. Runs once when the model is loaded.
        """
        # Read the model configuration
        self.model_config = model_config = json.loads(args[\'model_config\'])

        # Resolve the output configuration
        output0_config = pb_utils.get_output_config_by_name(
            model_config, "logits"
        )
        self.output0_dtype = pb_utils.triton_string_to_numpy(
            output0_config[\'data_type\'] 
        )

        # Load the model (a real deployment loads the actual model here)
        print(f"🔄 Loading model: {model_name}")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Simulated model loading
        # self.model = AutoModel.from_pretrained(model_path)
        # self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        # self.model.to(self.device)
        # self.model.eval()

        print(f"✅ Model loaded, device: {{self.device}}")
    
    def execute(self, requests):
        """
        Run inference on a batch of requests.
        """
        responses = []

        for request in requests:
            # Read the input tensors
            input_ids = pb_utils.get_input_tensor_by_name(
                request, "input_ids"
            ).as_numpy()

            attention_mask = pb_utils.get_input_tensor_by_name(
                request, "attention_mask"
            ).as_numpy()

            # Simulated inference
            batch_size = input_ids.shape[0]

            # A real deployment runs actual inference here
            # with torch.no_grad():
            #     outputs = self.model(
            #         input_ids=torch.tensor(input_ids).to(self.device),
            #         attention_mask=torch.tensor(attention_mask).to(self.device)
            #     )
            #     logits = outputs.logits

            # Simulated output (2 classes)
            logits = np.random.rand(batch_size, 2).astype(self.output0_dtype)

            # Build the output tensor
            output_tensor = pb_utils.Tensor("logits", logits)

            # Build the response
            inference_response = pb_utils.InferenceResponse(
                output_tensors=[output_tensor]
            )
            responses.append(inference_response)

        return responses

    def finalize(self):
        """
        Release resources. Runs when the model is unloaded.
        """
        print(f"🧹 Cleaning up model resources: {model_name}")
'''
        
        elif task_type == "generation":
            # Generative model wrapper (simplified)
            return "# Generation model wrapper - similar to the classification wrapper, but handles text generation"

        else:
            raise ValueError(f"Unsupported task type: {task_type}")
    
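    def _generate_model_config(self, model_name: str, task_type: str) -> str:
        """
        Generate the model configuration that matches the task type.
        """

        if task_type == "classification":
            config = TritonConfigGenerator.generate_pytorch_config(
                model_name=model_name,
                max_batch_size=8,
                input_specs=[
                    {"name": "input_ids", "data_type": "TYPE_INT64", "dims": [-1]},
                    {"name": "attention_mask", "data_type": "TYPE_INT64", "dims": [-1]}
                ],
                output_specs=[
                    {"name": "logits", "data_type": "TYPE_FP32", "dims": [2]}
                ]
            )
            # The generated wrapper is a model.py that defines TritonPythonModel,
            # which Triton serves through the Python backend, so the backend
            # field is switched to match.
            return config.replace('backend: "pytorch"', 'backend: "python"', 1)
        else:
            raise ValueError(f"Unsupported task type: {task_type}")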

# Initialize the deployer
deployer = TritonModelDeployer(MODEL_REPOSITORY_ROOT)

print("🏭 Starting the enterprise model deployment demo...")
print()

### 3.2 Enterprise Model Deployment Examples

In [None]:
# Deploy the Netflix user sentiment analysis model
netflix_sentiment_deployment = deployer.deploy_huggingface_model(
    model_name_or_path="cardiffnlp/twitter-roberta-base-sentiment-latest",
    triton_model_name="netflix_sentiment_v1_prod",
    model_version=1,
    task_type="classification"
)

print(f"Netflix sentiment analysis model deployment result:")
print(f"  📂 Model path: {netflix_sentiment_deployment['model_path']}")
print(f"  📋 Metadata: {netflix_sentiment_deployment['metadata']}")
print()

# Deploy the PayPal fraud detection model
paypal_fraud_deployment = deployer.deploy_huggingface_model(
    model_name_or_path="ProsusAI/finbert",
    triton_model_name="paypal_fraud_detection_v2_prod",
    model_version=2,
    task_type="classification"
)

print(f"PayPal fraud detection model deployment result:")
print(f"  📂 Model path: {paypal_fraud_deployment['model_path']}")
print(f"  📋 Metadata: {paypal_fraud_deployment['metadata']}")
print()
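To verify inference once a server is running, a client sends a request that follows the KServe v2 inference protocol, which Triton's HTTP endpoint implements. The sketch below only builds the JSON body for the `input_ids` and `attention_mask` inputs configured above. The helper `build_infer_request` is hypothetical, the token IDs are made-up placeholder values, and nothing is sent over the network. The body would be POSTed to `/v2/models/<model_name>/infer`.

```python
import json
from typing import Dict, List


def build_infer_request(input_ids: List[List[int]], attention_mask: List[List[int]]) -> Dict:
    """Build a KServe v2 JSON inference request body for a batch of token sequences."""

    def tensor(name: str, rows: List[List[int]]) -> Dict:
        # All rows must share one length so the batch forms a rectangular tensor.
        if len({len(row) for row in rows}) != 1:
            raise ValueError(f"{name}: expected a non-empty batch of equal-length sequences")
        return {
            "name": name,
            "shape": [len(rows), len(rows[0])],
            "datatype": "INT64",
            "data": [value for row in rows for value in row],  # flattened row-major
        }

    return {
        "inputs": [tensor("input_ids", input_ids), tensor("attention_mask", attention_mask)],
        "outputs": [{"name": "logits"}],
    }


# Placeholder token IDs for a single three-token sequence.
payload = build_infer_request([[101, 2023, 102]], [[1, 1, 1]])
print(json.dumps(payload, indent=2))
```

Because the model is configured with `max_batch_size` greater than zero, the first dimension of each shape is the batch size and the remaining dimensions match the `dims` entry in `config.pbtxt`.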

## 4. Model Version Management and Lifecycle

### 4.1 Enterprise Version Control Strategy

In [None]:
class ModelVersionManager:
    """
    Enterprise model version manager.

    Features:
    - Semantic versioning
    - A/B test version management
    - Rollback and canary deployment
    - Per-version performance tracking
    """
    
    def __init__(self, model_repository_root: Path):
        self.model_repository_root = model_repository_root
        self.version_registry = {}
    
    def create_model_version(
        self, 
        model_name: str, 
        version: str,
        description: str = "",
        performance_metrics: Dict = None,
        deployment_strategy: str = "blue_green"
    ):
        """
        Create a new model version.

        Args:
            model_name: Model name
            version: Version number (semantic version, e.g. 1.2.3)
            description: Version description
            performance_metrics: Performance metrics
            deployment_strategy: Deployment strategy
        """

        if performance_metrics is None:
            performance_metrics = {}

        # Validate the semantic version format
        version_parts = version.split('.')
        if len(version_parts) != 3:
            raise ValueError(f"Version must follow semantic versioning (e.g. 1.2.3): {version}")

        try:
            major, minor, patch = map(int, version_parts)
        except ValueError:
            raise ValueError(f"Version components must be numeric: {version}")

        # Create the version directory. Triton only loads version subdirectories
        # with integer names, so the semantic version is mapped to an integer
        # (this mapping assumes minor and patch stay below 100).
        triton_version = major * 10000 + minor * 100 + patch
        model_path = self.model_repository_root / model_name
        version_path = model_path / str(triton_version)
        version_path.mkdir(parents=True, exist_ok=True)

        # Build the version metadata
        version_metadata = {
            "version": version,
            "major": major,
            "minor": minor,
            "patch": patch,
            "description": description,
            "created_at": "2024-10-09T10:00:00Z",
            "deployment_strategy": deployment_strategy,
            "performance_metrics": performance_metrics,
            "status": "deployed",
            "triton_version_dir": str(triton_version)
        }
        
        # Save the version metadata
        metadata_path = version_path / "version_metadata.json"
        with open(metadata_path, 'w', encoding='utf-8') as f:
            json.dump(version_metadata, f, indent=2, ensure_ascii=False)

        # Update the version registry
        if model_name not in self.version_registry:
            self.version_registry[model_name] = []

        self.version_registry[model_name].append(version_metadata)

        print(f"✅ Created model version: {model_name} v{version}")
        print(f"   📂 Version path: {version_path}")
        print(f"   📋 Deployment strategy: {deployment_strategy}")
        
        return version_metadata
    
    def list_model_versions(self, model_name: str) -> List[Dict]:
        """
        List all versions of a model.
        """
        return self.version_registry.get(model_name, [])
    
    def get_latest_version(self, model_name: str) -> Dict:
        """
        Return the latest version of a model.
        """
        versions = self.list_model_versions(model_name)
        if not versions:
            raise ValueError(f"Model {model_name} has no versions")

        # Sort by semantic version
        sorted_versions = sorted(
            versions, 
            key=lambda v: (v["major"], v["minor"], v["patch"]),
            reverse=True
        )
        
        return sorted_versions[0]
    
    def setup_ab_testing(self, model_name: str, version_a: str, version_b: str, traffic_split: float = 0.5):
        """
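        Record an A/B test traffic split between two versions.

        The split is only written to a JSON file; routing requests to a
        version is left to the client or gateway in front of Triton.

        Args:
            model_name: Model name
            version_a: Version A
            version_b: Version B
            traffic_split: Share of traffic sent to version A (0.5 = 50/50)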
        """
        
        ab_config = {
            "model_name": model_name,
            "version_a": version_a,
            "version_b": version_b,
            "traffic_split": traffic_split,
            "start_time": "2024-10-09T10:00:00Z",
            "status": "active",
            "metrics_collection": True
        }
        
        # Save the A/B test configuration
        model_path = self.model_repository_root / model_name
        ab_config_path = model_path / "ab_testing_config.json"

        with open(ab_config_path, 'w', encoding='utf-8') as f:
            json.dump(ab_config, f, indent=2, ensure_ascii=False)

        print(f"🧪 A/B test configured: {model_name}")
        print(f"   🅰️  Version A: {version_a} ({traffic_split * 100:.1f}% of traffic)")
        print(f"   🅱️  Version B: {version_b} ({(1-traffic_split) * 100:.1f}% of traffic)")
        
        return ab_config

# Initialize the version manager
version_manager = ModelVersionManager(MODEL_REPOSITORY_ROOT)

print("📊 Enterprise model version management demo...")
print()
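Keeping several versions on disk is not enough to serve them side by side: by default Triton serves only the latest (highest-numbered) version of a model, and the `version_policy` setting in `config.pbtxt` changes that. The sketch below assumes a hypothetical helper `render_version_policy` that renders a `specific` policy line for a set of integer version directories; the version numbers are example values.

```python
from typing import List


def render_version_policy(versions: List[int]) -> str:
    """Render a config.pbtxt version_policy line that serves exactly the given versions."""
    if not versions or any(v <= 0 for v in versions):
        raise ValueError("versions must be a non-empty list of positive integers")
    listed = ", ".join(str(v) for v in sorted(set(versions)))
    return f"version_policy: {{ specific {{ versions: [ {listed} ] }} }}"


# Serve version directories 1 and 2 at the same time (example values).
policy_line = render_version_policy([2, 1])
print(policy_line)
```

The resulting line can be appended to a generated configuration so that both versions of an A/B test stay loaded at once.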

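### 4.2 Version Management Scenarios

In [None]:
# Evolution of the Netflix recommendation system versions
print("🎬 Netflix recommendation system version management scenario:")
print()

# Version 1.0.0 - baseline recommendation algorithm
v1_metadata = version_manager.create_model_version(
    model_name="netflix_recommendation_v2_prod",
    version="1.0.0",
    description="Baseline collaborative filtering recommendation model",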
    performance_metrics={
        "precision_at_k": 0.85,
        "recall_at_k": 0.72,
        "latency_p95_ms": 45,
        "throughput_qps": 1200
    },
    deployment_strategy="blue_green"
)

# Version 1.1.0 - adds deep learning features
v1_1_metadata = version_manager.create_model_version(
    model_name="netflix_recommendation_v2_prod",
    version="1.1.0",
    description="Adds deep learning features for user behavior",
    performance_metrics={
        "precision_at_k": 0.89,
        "recall_at_k": 0.76,
        "latency_p95_ms": 52,
        "throughput_qps": 1100
    },
    deployment_strategy="canary"
)

# Version 2.0.0 - new Transformer architecture
v2_metadata = version_manager.create_model_version(
    model_name="netflix_recommendation_v2_prod",
    version="2.0.0",
    description="Transformer-based sequential recommendation model",
    performance_metrics={
        "precision_at_k": 0.93,
        "recall_at_k": 0.81,
        "latency_p95_ms": 38,
        "throughput_qps": 1400
    },
    deployment_strategy="blue_green"
)

# Review the version history
versions = version_manager.list_model_versions("netflix_recommendation_v2_prod")
print(f"📈 Netflix recommendation system version history ({len(versions)} versions):")

for version in versions:
    print(f"")
    print(f"   🏷️  Version: {version['version']}")
    print(f"   📝 Description: {version['description']}")
    print(f"   🎯 Precision@K: {version['performance_metrics']['precision_at_k']}")
    print(f"   ⚡ Latency P95: {version['performance_metrics']['latency_p95_ms']}ms")
    print(f"   🚀 QPS: {version['performance_metrics']['throughput_qps']}")
    print(f"   📦 Deployment strategy: {version['deployment_strategy']}")

print()

# Get the latest version
latest_version = version_manager.get_latest_version("netflix_recommendation_v2_prod")
print(f"🔝 Latest version: {latest_version['version']}")
print(f"   Improvement over 1.0.0: Precision@K {latest_version['performance_metrics']['precision_at_k']} (+{latest_version['performance_metrics']['precision_at_k'] - 0.85:.2f})")
print()

# Set up the A/B test (v1.1.0 vs v2.0.0)
ab_config = version_manager.setup_ab_testing(
    model_name="netflix_recommendation_v2_prod",
    version_a="1.1.0",
    version_b="2.0.0",
    traffic_split=0.3  # 30% of traffic to version A, 70% to version B
)

print()
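The `ab_testing_config.json` file written above only records the intended split. Triton does not divide traffic between versions on its own, so a client or gateway has to choose a version for each request. One common approach is sketched below, assuming a hypothetical function `choose_version` that hashes a user ID into a stable bucket so the same user always sees the same version; the user IDs are synthetic.

```python
import hashlib


def choose_version(user_id: str, version_a: str, version_b: str, traffic_split: float) -> str:
    """Deterministically assign a user to version A or B; traffic_split is A's share."""
    if not 0.0 <= traffic_split <= 1.0:
        raise ValueError("traffic_split must be between 0 and 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return version_a if bucket < traffic_split else version_b


# Simulate 10,000 synthetic users with the 30/70 split from the scenario above.
assignments = [choose_version(f"user-{i}", "1.1.0", "2.0.0", 0.3) for i in range(10_000)]
share_a = assignments.count("1.1.0") / len(assignments)
print(f"Share routed to version A: {share_a:.3f}")
```

Hash-based assignment needs no shared state between gateway replicas, at the cost of reshuffling some users whenever the split changes.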

## 5. Dynamic Model Management

### 5.1 Model Loading and Unloading

In [None]:
import requests
import time
from typing import Optional

class TritonModelManager:
    """
    Triton dynamic model manager.

    Features:
    - Dynamic model load/unload
    - Model status monitoring
    - Graceful model switching
    - Resource usage optimization
    """
    
    def __init__(self, triton_url: str = "http://localhost:8000"):
        self.triton_url = triton_url
        self.management_api_url = f"{triton_url}/v2/repository"
    
    def load_model(self, model_name: str, wait_for_ready: bool = True) -> bool:
        """
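        Dynamically load a model into Triton Server.

        A real server must be started with --model-control-mode=explicit
        for the load and unload endpoints to be enabled.

        Args:
            model_name: Name of the model to load
            wait_for_ready: Whether to wait until the model is fully loaded

        Returns:
            bool: Whether loading succeeded
        """
        try:
            print(f"🔄 Loading model: {model_name}")

            # Simulated Triton Model Management API call
            # Real code:
            # response = requests.post(f"{self.management_api_url}/models/{model_name}/load")

            # Simulated loading steps
            print(f"   ├── Validating model configuration...")
            time.sleep(0.5)

            print(f"   ├── Allocating GPU resources...")
            time.sleep(0.3)

            print(f"   ├── Loading model weights...")
            time.sleep(1.0)

            print(f"   └── Initializing inference engine...")
            time.sleep(0.5)

            if wait_for_ready:
                print(f"   ⏳ Waiting for the model to become ready...")
                # Simulated readiness polling; fail if the model never becomes ready
                ready = False
                for _ in range(3):
                    time.sleep(0.5)
                    ready = self.is_model_ready(model_name)
                    if ready:
                        break
                if not ready:
                    print(f"❌ Model did not become ready: {model_name}")
                    return False

            print(f"✅ Model loaded successfully: {model_name}")
            return True

        except Exception as e:
            print(f"❌ Model loading failed: {model_name} - {str(e)}")
            return False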
    
    def unload_model(self, model_name: str) -> bool:
        """
        Dynamically unload a model.

        Args:
            model_name: Name of the model to unload

        Returns:
            bool: Whether unloading succeeded
        """
        try:
            print(f"🔄 Unloading model: {model_name}")

            # Simulated Triton Model Management API call
            # response = requests.post(f"{self.management_api_url}/models/{model_name}/unload")

            print(f"   ├── Stopping inference requests...")
            time.sleep(0.3)

            print(f"   ├── Releasing GPU memory...")
            time.sleep(0.5)

            print(f"   └── Cleaning up resources...")
            time.sleep(0.2)

            print(f"✅ Model unloaded successfully: {model_name}")
            return True

        except Exception as e:
            print(f"❌ Model unloading failed: {model_name} - {str(e)}")
            return False
    
    def is_model_ready(self, model_name: str) -> bool:
        """
        Check whether the model is ready to accept inference requests.
        """
        try:
            # Simulated readiness check
            # response = requests.get(f"{self.triton_url}/v2/models/{model_name}/ready")
            # return response.status_code == 200

            # Simulated result
            return True

        except Exception:
            return False
    
    def get_model_status(self, model_name: str) -> Dict:
        """
        Return detailed status information for a model.
        """
        try:
            # Simulated model status
            return {
                "name": model_name,
                "state": "READY",
                "reason": "",
                "version": "1",
                "backend": "pytorch",
                "instances": [
                    {
                        "name": f"{model_name}_0",
                        "state": "READY",
                        "kind": "GPU",
                        "gpu_id": 0
                    }
                ]
            }
            
        except Exception as e:
            return {"error": str(e)}
    
    def graceful_model_switch(self, old_model: str, new_model: str) -> bool:
        """
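        Gracefully switch from one model version to another.

        Flow:
        1. Load the new model and wait until it is ready
        2. Run a health check
        3. Unload the old model
        """
        print(f"🔄 Starting graceful model switch: {old_model} → {new_model}")
        print()

        try:
            # 1. Load the new model
            print(f"📥 Step 1: load the new model {new_model}")
            if not self.load_model(new_model, wait_for_ready=True):
                print(f"❌ New model failed to load, aborting the switch")
                return False

            print()

            # 2. Run the health check
            print(f"🏥 Step 2: run the health check on the new model")
            health_check_passed = self._perform_health_check(new_model)

            if not health_check_passed:
                print(f"❌ Health check failed, rolling back")
                self.unload_model(new_model)
                return False

            print(f"✅ Health check passed")
            print()

            # 3. Unload the old model
            print(f"📤 Step 3: unload the old model {old_model}")
            if not self.unload_model(old_model):
                print(f"⚠️  Old model failed to unload, but the new model is deployed")

            print()
            print(f"🎉 Model switch complete: {old_model} → {new_model}")
            return True

        except Exception as e:
            print(f"❌ Model switch failed: {str(e)}")
            return False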
    
    def _perform_health_check(self, model_name: str) -> bool:
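        """
        Run a model health check (simulated with short sleeps).
        """
        print("   ├── Checking model state...")
        time.sleep(0.3)
        
        print("   ├── Running a sample inference...")
        time.sleep(0.5)
        
        print("   ├── Validating output format...")
        time.sleep(0.2)
        
        print("   └── Checking performance metrics...")
        time.sleep(0.3)
        
        return True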

# Initialize the model manager
model_manager = TritonModelManager()

print("🎛️  Triton dynamic model management demo...")
print()
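
The manager above simulates every Triton call. As a rough sketch of what the real calls might look like, the block below builds the HTTP endpoints and polls for readiness. The `/v2/models/<name>/ready` path matches the commented-out request in the readiness check. The `/v2/repository/models/<name>/load` and `/unload` paths come from Triton's model repository extension, which only accepts these requests when the server runs with `--model-control-mode=explicit`. The helper names (`ready_url`, `repository_url`, `http_is_ready`, `wait_until_ready`) and the timeout defaults are assumptions made for illustration, not part of `TritonModelManager`.

```python
import time
import urllib.request
from typing import Callable


def ready_url(base_url: str, model_name: str) -> str:
    """Build the readiness endpoint for one model."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/ready"


def repository_url(base_url: str, model_name: str, action: str) -> str:
    """Build the load/unload endpoint (both expect an HTTP POST)."""
    if action not in ("load", "unload"):
        raise ValueError(f"unsupported action: {action}")
    return f"{base_url.rstrip('/')}/v2/repository/models/{model_name}/{action}"


def http_is_ready(base_url: str, model_name: str, timeout_s: float = 2.0) -> bool:
    """Return True only if the server answers the readiness probe with 200."""
    try:
        url = ready_url(base_url, model_name)
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: covers connection
        # failures and non-2xx responses alike.
        return False


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller could combine the two with `wait_until_ready(lambda: http_is_ready("http://localhost:8000", "my_model"))`, where the URL and model name are placeholders. Injecting `check`, `sleep`, and `clock` keeps the polling loop testable without a running server.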

### 5.2 Real-World Model Switching Scenarios

In [None]:
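# Scenario: Netflix recommendation system model upgrade
print("🎬 Netflix recommendation system model upgrade scenario:")
print("📋 Requirement: upgrade from v1.1.0 to v2.0.0 with zero downtime")
print()

# Simulate the currently running model
current_model = "netflix_recommendation_v1_1_prod"
new_model = "netflix_recommendation_v2_0_prod"

# Run the graceful switch
switch_success = model_manager.graceful_model_switch(current_model, new_model)

if switch_success:
    # Hard-coded illustrative figures; nothing in this notebook measures them
    print("📊 Performance comparison after the switch:")
    print("   🎯 Precision@K: 0.89 → 0.93 (+4.5%)")
    print("   ⚡ Latency: 52ms → 38ms (-26.9%)")
    print("   🚀 Throughput: 1100 QPS → 1400 QPS (+27.3%)")
else:
    print("❌ Model switch failed; the original model keeps running")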

print()
print("─" * 60)
print()

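# Scenario: PayPal fraud detection model emergency rollback
print("💳 PayPal fraud detection model emergency rollback scenario:")
print("📋 Requirement: the new version shows a high false positive rate; roll back to the stable version immediately")
print()

problematic_model = "paypal_fraud_detection_v2_1_prod"
stable_model = "paypal_fraud_detection_v2_0_prod"

# Emergency rollback
print("🚨 Running emergency rollback...")
rollback_success = model_manager.graceful_model_switch(problematic_model, stable_model)

if rollback_success:
    # Hard-coded illustrative figures; nothing in this notebook measures them
    print("📈 Metrics recovered after rollback:")
    print("   ✅ False positive rate: 8.5% → 2.1% (back to normal)")
    print("   ✅ Recall: 89.2% → 94.7% (back to normal)")
    print("   ✅ System stability: restored")
else:
    print("❌ Emergency rollback failed; manual intervention required")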

print()

# Check the final model state
print("📋 Status of currently loaded models:")
models_to_check = [new_model, stable_model]

for model_name in models_to_check:
    status = model_manager.get_model_status(model_name)
    print(f"   🔍 {model_name}:")
    print(f"      State: {status.get('state', 'UNKNOWN')}")
    print(f"      Backend: {status.get('backend', 'UNKNOWN')}")
    print(f"      Version: {status.get('version', 'UNKNOWN')}")
    print(f"      Instances: {len(status.get('instances', []))}")
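
The percentages printed in the upgrade scenario are relative changes between the old and new value. A small helper makes the arithmetic explicit and avoids hand-computed figures drifting out of sync with the raw numbers. The function name `relative_change_pct` is a hypothetical helper introduced here for illustration.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("relative change is undefined when before == 0")
    return (after - before) / before * 100.0


# Recompute the figures shown in the upgrade scenario
for label, before, after in [
    ("Precision@K", 0.89, 0.93),
    ("Latency (ms)", 52, 38),
    ("Throughput (QPS)", 1100, 1400),
]:
    print(f"{label}: {before} → {after} ({relative_change_pct(before, after):+.1f}%)")
```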

## 6. Enterprise Best Practices Summary

### 6.1 Model Repository Design Principles

In [None]:
class EnterpriseModelRepositoryBestPractices:
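    """
    Best-practice guide for an enterprise Model Repository
    """
    
    @staticmethod
    def print_best_practices():
        print("🏆 Enterprise Model Repository Best Practices")
        print("═" * 60)
        print()
        
        practices = {
            "🏗️  Architecture design principles": [
                "Adopt semantic versioning",
                "Enforce a clear model naming convention",
                "Separate model configuration from business logic",
                "Support multi-environment deployment (dev/staging/prod)",
                "Manage model metadata"
            ],
            "⚡ Performance optimization": [
                "Configure dynamic batching",
                "Use GPU instance groups to increase throughput",
                "Enable TensorRT or other accelerators",
                "Optimize model input and output formats",
                "Apply model warmup"
            ],
            "🔒 Security considerations": [
                "Enforce model access control",
                "Encrypt sensitive model files",
                "Audit model deployment and access logs",
                "Apply network security measures",
                "Scan for security vulnerabilities regularly"
            ],
            "📊 Monitoring and observability": [
                "Monitor performance metrics in real time",
                "Track model accuracy",
                "Monitor resource usage",
                "Detect anomalies and alert on them",
                "Integrate distributed tracing"
            ],
            "🔄 DevOps integration": [
                "Automate the model deployment pipeline",
                "Automate A/B testing",
                "Support canary deployments",
                "Provide automatic rollback",
                "Containerize and integrate with Kubernetes"
            ],
            "📈 Scalability design": [
                "Support horizontal scaling",
                "Configure load balancing",
                "Deploy across multiple regions",
                "Provision resources elastically",
                "Design for high availability"
            ]
        }
        
        for category, items in practices.items():
            print(f"{category}:")
            for item in items:
                print(f"   ✅ {item}")
            print()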
    
    @staticmethod
    def print_common_pitfalls():
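        print("⚠️  Common Pitfalls and Solutions")
        print("═" * 60)
        print()
        
        pitfalls = {
            "🐛 Configuration errors": {
                "problem": "Model input/output dimensions are misconfigured",
                "solution": "Use automated configuration generation and validation tools",
                "prevention": "Adopt configuration templates and a test suite"
            },
            "💾 Memory leaks": {
                "problem": "Memory usage keeps growing after the model runs for a long time",
                "solution": "Add memory monitoring and an automatic restart mechanism",
                "prevention": "Audit and optimize memory usage regularly"
            },
            "🔥 Version conflicts": {
                "problem": "Dependency conflicts between multiple model versions",
                "solution": "Use container isolation and version pinning",
                "prevention": "Define a strict version management policy"
            },
            "📉 Performance degradation": {
                "problem": "Inference performance degrades gradually over time",
                "solution": "Monitor performance continuously and tune automatically",
                "prevention": "Establish performance baselines and alerting"
            }
        }
        
        for category, details in pitfalls.items():
            print(f"{category}:")
            print(f"   ❌ Problem: {details['problem']}")
            print(f"   💡 Solution: {details['solution']}")
            print(f"   🛡️  Prevention: {details['prevention']}")
            print()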

# Print the best-practice guide
EnterpriseModelRepositoryBestPractices.print_best_practices()
EnterpriseModelRepositoryBestPractices.print_common_pitfalls()
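
The first pitfall, misconfigured input/output dimensions, is one that a small validation step can catch before deployment. The sketch below checks a concrete tensor shape against the `dims` declared in `config.pbtxt`, under two assumptions about Triton's conventions: a dim of `-1` accepts any size, and when `max_batch_size` is greater than 0 the batch dimension is implicit and not listed in `dims`. The function names `shape_matches` and `request_shape_ok` are hypothetical helpers, not part of any Triton API.

```python
from typing import Sequence


def shape_matches(dims: Sequence[int], shape: Sequence[int]) -> bool:
    """Return True if a concrete shape satisfies the configured dims.

    A configured dim of -1 is treated as variable-size and matches anything.
    """
    if len(dims) != len(shape):
        return False
    return all(d == -1 or d == s for d, s in zip(dims, shape))


def request_shape_ok(
    dims: Sequence[int], shape: Sequence[int], max_batch_size: int
) -> bool:
    """Validate a request shape, accounting for the implicit batch dimension."""
    if max_batch_size > 0:
        if len(shape) == 0:
            return False
        batch, rest = shape[0], shape[1:]
        # The leading dimension is the batch size and must fit the limit
        return 1 <= batch <= max_batch_size and shape_matches(dims, rest)
    # Batching disabled: dims describe the full tensor shape
    return shape_matches(dims, shape)


# Example: a text model with dims [128] and max_batch_size 8
print(request_shape_ok([128], [4, 128], max_batch_size=8))
print(request_shape_ok([128], [16, 128], max_batch_size=8))
```

Running a check like this against a sample input for each model in a test suite is one way to apply the "configuration templates and test suite" prevention listed above.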

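## 🎯 Chapter Summary

### Key Learning Outcomes

In this lab, you covered:

1. **🏗️ Model Repository architecture design**
   - Creating the standard directory structure
   - Defining an enterprise naming convention
   - Managing multiple coexisting model versions

2. **⚙️ In-depth configuration customization**
   - Full `config.pbtxt` configuration
   - Dynamic batching tuning
   - Multi-GPU instance group setup

3. **🚀 Model deployment automation**
   - HuggingFace model integration
   - Model wrapper generation
   - Metadata management

4. **📊 Version control and lifecycle**
   - Semantic version management
   - A/B test configuration
   - Graceful model switching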

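### Enterprise Skills Gained

You are now equipped with:
- **Netflix-level** multi-model management skills
- **PayPal-level** high-availability deployment skills
- The ability to design and implement a **complete MLOps workflow**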

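### Next Steps

In the next lab, **Lab-2.1.3: PyTorch Backend Deployment**, we will:
- Explore advanced features of the PyTorch Backend
- Implement custom inference logic
- Optimize model inference performance
- Integrate enterprise monitoring systems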

---

**🏆 Congratulations! You have completed the enterprise-grade design and configuration of a Triton Model Repository!**