# Lab 2.5.1 - Kubernetes-Native Deployment

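## 🎯 Lab Objectives

In this lab you will learn how to:
1. Design a cloud-native Triton deployment architecture
2. Schedule and manage GPU resources
3. Configure service discovery and load balancing
4. Set up autoscaling
5. Apply rolling updates and rollback strategies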

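## 📋 Prerequisites

- Kubernetes cluster (v1.24+)
- NVIDIA GPU Operator or NVIDIA Device Plugin
- Helm 3.x
- `kubectl` configured for the target cluster
- Python packages used by the notebook cells: `kubernetes`, `pyyaml`, `requests`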

---

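## 📚 Background

### Challenges of AI Inference on Kubernetes

**1. GPU resource management**
- GPUs are scarce, expensive compute resources
- Scheduling and allocation must be precise
- Multi-tenancy and resource isolation must be supported

**2. Model lifecycle management**
- Model files are usually large (several GB to tens of GB)
- Models need version control and fast deployment
- A/B testing and progressive rollouts must be supported

**3. Service high availability**
- Model updates with zero downtime
- Automatic failover and recovery
- Elastic scaling in response to traffic changes

### Cloud-Native AI Inference Architecture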

```
┌─────────────────────────────────────────────────────────────┐
│                    Ingress Controller                       │
│                  (NGINX/Traefik/Istio)                     │
└─────────────────┬───────────────────────────────────────────┘
                  │
      ┌───────────▼─────────────────────────────────┐
      │            Service Mesh (Optional)          │
      │              (Istio/Linkerd)                │
      └─────────────────┬───────────────────────────┘
                        │
    ┌───────────────────▼───────────────────────────┐
    │              LoadBalancer Service             │
    └─────┬─────────────────────────────────┬───────┘
          │                                 │
    ┌─────▼──────┐                   ┌─────▼──────┐
    │   Pod-1    │                   │   Pod-2    │
    │ ┌────────┐ │                   │ ┌────────┐ │
    │ │ Triton │ │                   │ │ Triton │ │
    │ │ Server │ │                   │ │ Server │ │
    │ └────────┘ │                   │ └────────┘ │
    │ ┌────────┐ │                   │ ┌────────┐ │
    │ │  GPU   │ │                   │ │  GPU   │ │
    │ └────────┘ │                   │ └────────┘ │
    └────────────┘                   └────────────┘
          │                                 │
    ┌─────▼─────────────────────────────────▼───────┐
    │           Persistent Volume (Models)          │
    │              (NFS/GlusterFS/S3)               │
    └───────────────────────────────────────────────┘
```

## 🛠️ Environment Setup

In [None]:
import os
import yaml
import json
import subprocess
import time
from datetime import datetime
from typing import Dict, List, Optional, Any
from pathlib import Path

import requests
from kubernetes import client, config
from kubernetes.client.rest import ApiException

print(f"Environment ready at {datetime.now()}")
print(f"Working directory: {os.getcwd()}")

In [None]:
# 設置 Kubernetes 部署環境
K8S_DIR = "/tmp/triton-k8s"
MANIFESTS_DIR = f"{K8S_DIR}/manifests"
HELM_DIR = f"{K8S_DIR}/helm"
CONFIGS_DIR = f"{K8S_DIR}/configs"

# 創建目錄結構
directories = [
    MANIFESTS_DIR,
    HELM_DIR,
    CONFIGS_DIR,
    f"{HELM_DIR}/triton-inference",
    f"{HELM_DIR}/triton-inference/templates"
]

for directory in directories:
    os.makedirs(directory, exist_ok=True)

print("📁 Kubernetes 部署目錄結構:")
for directory in directories:
    print(f"   {directory}")

# 全域配置
NAMESPACE = "triton-inference"
APP_NAME = "triton-server"
IMAGE = "nvcr.io/nvidia/tritonserver:24.10-py3"
MODEL_REPOSITORY = "/models"

print(f"\n⚙️  部署配置:")
print(f"   命名空間: {NAMESPACE}")
print(f"   應用名稱: {APP_NAME}")
print(f"   容器映像: {IMAGE}")
print(f"   模型倉庫: {MODEL_REPOSITORY}")

## 🎯 Exercise 1: Cluster Verification and Preparation

### 1.1 Kubernetes Cluster Check

In [None]:
def check_kubernetes_cluster():
    """檢查 Kubernetes 集群狀態"""
    try:
        # 載入 kubeconfig
        config.load_kube_config()
        v1 = client.CoreV1Api()
        
        # 檢查節點狀態
        print("🔍 檢查集群節點:")
        nodes = v1.list_node()
        for node in nodes.items:
            name = node.metadata.name
            status = "Ready" if any(condition.type == "Ready" and condition.status == "True" 
                                  for condition in node.status.conditions) else "NotReady"
            
            # 檢查 GPU 資源
            gpu_capacity = node.status.capacity.get('nvidia.com/gpu', '0')
            gpu_allocatable = node.status.allocatable.get('nvidia.com/gpu', '0')
            
            print(f"   節點: {name}")
            print(f"   狀態: {status}")
            print(f"   GPU 容量: {gpu_capacity}")
            print(f"   GPU 可分配: {gpu_allocatable}")
            print()
        
        return True
        
    except Exception as e:
        print(f"❌ 集群檢查失敗: {e}")
        return False

cluster_ready = check_kubernetes_cluster()
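
The per-node report above can be reduced to a cluster-wide GPU total before any Pods are scheduled. The following is a minimal sketch, not part of the original lab: `total_gpus` is a hypothetical helper, and it works on plain dictionaries shaped like `node.status.capacity` and `node.status.allocatable`, so it can be tried without a cluster.

```python
from typing import Dict, Iterable, Mapping


def total_gpus(nodes: Iterable[Mapping[str, Mapping[str, str]]],
               resource: str = "nvidia.com/gpu") -> Dict[str, int]:
    """Sum GPU capacity and allocatable counts over plain node dictionaries."""
    totals = {"capacity": 0, "allocatable": 0}
    for node in nodes:
        for field in totals:
            # Quantities arrive as strings; a missing key means the node has no GPUs.
            totals[field] += int(node.get(field, {}).get(resource, "0"))
    return totals


# Hypothetical sample data: one GPU node and one CPU-only node.
sample_nodes = [
    {"capacity": {"nvidia.com/gpu": "4"}, "allocatable": {"nvidia.com/gpu": "3"}},
    {"capacity": {"cpu": "16"}, "allocatable": {"cpu": "16"}},
]
print(total_gpus(sample_nodes))
```

With a live cluster, the same dictionaries could be built from `node.status.capacity` and `node.status.allocatable` for each item returned by `v1.list_node()`.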

### 1.2 GPU Operator Verification

In [None]:
def check_gpu_operator():
    """檢查 NVIDIA GPU Operator 狀態"""
    try:
        # 檢查 GPU Operator namespace
        v1 = client.CoreV1Api()
        apps_v1 = client.AppsV1Api()
        
        print("🔍 檢查 GPU Operator:")
        
        # Check that the gpu-operator namespace exists
        try:
            v1.read_namespace(name="gpu-operator")
            print("   ✅ GPU Operator namespace exists")
        except ApiException as e:
            if e.status == 404:
                print("   ⚠️  GPU Operator namespace not found")
                return False
            raise  # any other API error is reported by the outer except block
        
        # 檢查 DaemonSet
        daemonsets = apps_v1.list_namespaced_daemon_set(namespace="gpu-operator")
        for ds in daemonsets.items:
            name = ds.metadata.name
            desired = ds.status.desired_number_scheduled or 0
            ready = ds.status.number_ready or 0
            print(f"   DaemonSet: {name} ({ready}/{desired} ready)")
        
        # 檢查 Device Plugin
        pods = v1.list_namespaced_pod(namespace="gpu-operator", 
                                     label_selector="app=nvidia-device-plugin-daemonset")
        
        if pods.items:
            print(f"   ✅ NVIDIA Device Plugin 運行中 ({len(pods.items)} pods)")
            return True
        else:
            print(f"   ⚠️  NVIDIA Device Plugin 未找到")
            return False
            
    except Exception as e:
        print(f"❌ GPU Operator 檢查失敗: {e}")
        return False

gpu_ready = check_gpu_operator()

### 1.3 Create the Namespace and Base Resources

In [None]:
# 創建命名空間 YAML
namespace_yaml = f"""
apiVersion: v1
kind: Namespace
metadata:
  name: {NAMESPACE}
  labels:
    name: {NAMESPACE}
    purpose: ai-inference
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: triton-quota
  namespace: {NAMESPACE}
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 32Gi
    requests.nvidia.com/gpu: "4"
    limits.cpu: "16"
    limits.memory: 64Gi
    # Extended resources such as GPUs accept only the requests.* quota key
    persistentvolumeclaims: "2"
"""

# 寫入文件
with open(f"{MANIFESTS_DIR}/namespace.yaml", "w") as f:
    f.write(namespace_yaml)

print("📝 已創建命名空間配置")
print(f"   文件位置: {MANIFESTS_DIR}/namespace.yaml")

In [None]:
def apply_kubernetes_manifest(manifest_path: str) -> bool:
    """應用 Kubernetes manifest"""
    try:
        result = subprocess.run(
            ["kubectl", "apply", "-f", manifest_path],
            capture_output=True,
            text=True,
            check=True
        )
        print(f"✅ 成功應用: {manifest_path}")
        print(f"   輸出: {result.stdout.strip()}")
        return True
    except subprocess.CalledProcessError as e:
        print(f"❌ 應用失敗: {manifest_path}")
        print(f"   錯誤: {e.stderr}")
        return False

# 應用命名空間配置
if cluster_ready:
    apply_kubernetes_manifest(f"{MANIFESTS_DIR}/namespace.yaml")
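
The `triton-quota` ResourceQuota caps the total requests in the namespace, which in turn caps how many Triton Pods can run at once. The sketch below works through that arithmetic. `max_pods_under_quota` is a hypothetical helper, and the per-Pod figures (2 CPU, 8 Gi memory, 1 GPU) are the requests used by the Deployment defined later in this lab.

```python
from typing import Dict


def max_pods_under_quota(quota: Dict[str, float], per_pod: Dict[str, float]) -> int:
    """Largest Pod count whose summed requests stay within every quota entry."""
    return min(int(quota[name] // amount)
               for name, amount in per_pod.items() if amount > 0)


# requests.* values from the triton-quota ResourceQuota (memory in Gi)
quota_requests = {"cpu": 8, "memory_gi": 32, "gpu": 4}
# Per-Pod requests of the Triton container (assumed from the Deployment in this lab)
pod_requests = {"cpu": 2, "memory_gi": 8, "gpu": 1}

print(max_pods_under_quota(quota_requests, pod_requests))
```

Under these numbers the quota admits four Pods. A rolling update with `maxSurge: 1` temporarily needs one extra Pod, so running at the quota ceiling would leave no room for the surge Pod.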

## 🎯 Exercise 2: Model Storage and PV Configuration

### 2.1 Design the Model Storage Strategy

In [None]:
# PersistentVolume 和 PersistentVolumeClaim 配置
storage_yaml = f"""
apiVersion: v1
kind: PersistentVolume
metadata:
  name: triton-models-pv
  labels:
    type: models
    app: triton-server
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: models-storage
  hostPath:
    path: /data/triton/models
    type: DirectoryOrCreate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: triton-models-pvc
  namespace: {NAMESPACE}
  labels:
    app: triton-server
    component: storage
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: models-storage
  selector:
    matchLabels:
      type: models
      app: triton-server
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: triton-cache-pv
  labels:
    type: cache
    app: triton-server
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  storageClassName: cache-storage
  hostPath:
    path: /data/triton/cache
    type: DirectoryOrCreate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: triton-cache-pvc
  namespace: {NAMESPACE}
  labels:
    app: triton-server
    component: cache
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: cache-storage
  selector:
    matchLabels:
      type: cache
      app: triton-server
"""

# 寫入文件
with open(f"{MANIFESTS_DIR}/storage.yaml", "w") as f:
    f.write(storage_yaml)

print("📝 已創建儲存配置")
print(f"   文件位置: {MANIFESTS_DIR}/storage.yaml")
print("\n💾 儲存策略:")
print("   - 模型儲存: 100Gi (ReadOnlyMany)")
print("   - 快取儲存: 50Gi (ReadWriteMany)")
print("   - 回收策略: Retain (models), Delete (cache)")
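
A PVC binds to a PV only when storage class, access modes, capacity, and any label selector all line up. The sketch below checks those four conditions on simplified dictionaries. It is a rough approximation of the binding rules rather than the controller's actual logic, and `binding_problems` plus the flattened field names (`capacityGi`, `requestGi`, `matchLabels`) are assumptions made for illustration.

```python
from typing import Dict, List


def binding_problems(pv: Dict, pvc: Dict) -> List[str]:
    """Return the reasons a PVC cannot bind to a PV (empty list if compatible)."""
    problems = []
    if pv["storageClassName"] != pvc["storageClassName"]:
        problems.append("storageClassName mismatch")
    missing = set(pvc["accessModes"]) - set(pv["accessModes"])
    if missing:
        problems.append(f"PV lacks access modes: {sorted(missing)}")
    if pv["capacityGi"] < pvc["requestGi"]:
        problems.append("PV capacity is smaller than the PVC request")
    # Every selector label must be present on the PV with the same value.
    for key, value in pvc.get("matchLabels", {}).items():
        if pv.get("labels", {}).get(key) != value:
            problems.append(f"PV label {key!r} does not match the selector")
    return problems


# Values copied from the storage manifest above, in flattened form.
models_pv = {"storageClassName": "models-storage", "accessModes": ["ReadOnlyMany"],
             "capacityGi": 100, "labels": {"type": "models", "app": "triton-server"}}
models_pvc = {"storageClassName": "models-storage", "accessModes": ["ReadOnlyMany"],
              "requestGi": 100, "matchLabels": {"type": "models", "app": "triton-server"}}
cache_pvc = {"storageClassName": "cache-storage", "accessModes": ["ReadWriteMany"],
             "requestGi": 50, "matchLabels": {"type": "cache", "app": "triton-server"}}

print(binding_problems(models_pv, models_pvc))
print(binding_problems(models_pv, cache_pvc))
```

The first call reports no problems; the second lists why the cache claim cannot land on the models volume.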

### 2.2 Configuration Management with ConfigMap

In [None]:
# ConfigMap 用於 Triton 配置
configmap_yaml = f"""
apiVersion: v1
kind: ConfigMap
metadata:
  name: triton-config
  namespace: {NAMESPACE}
  labels:
    app: triton-server
    component: config
data:
  # Triton 服務器配置
  triton-config.json: |
    {{
      "backend_config": {{
        "pytorch": {{
          "cmdline": {{
            "auto-complete-config": "true",
            "backend-directory": "/opt/tritonserver/backends",
            "min-compute-capability": "6.0"
          }}
        }}
      }},
      "model_config_name": "config.pbtxt",
      "log_level": 1,
      "log_verbose": 1,
      "metrics": {{
        "allow_metrics": true,
        "allow_gpu_metrics": true,
        "allow_cpu_metrics": true,
        "metrics_interval_ms": 1000
      }}
    }}
  
  # 健康檢查腳本
  health-check.sh: |
    #!/bin/bash
    set -e
    
    # 檢查 Triton 服務器狀態
    curl -f http://localhost:8000/v2/health/ready || exit 1
    
    # Check model status via the repository index, which returns a JSON array
    MODEL_COUNT=$(curl -s -X POST http://localhost:8000/v2/repository/index | jq 'length')
    if [ "$MODEL_COUNT" -eq 0 ]; then
      echo "No models loaded"
      exit 1
    fi
    
    echo "Health check passed: $MODEL_COUNT models loaded"
  
  # 模型載入腳本
  model-loader.sh: |
    #!/bin/bash
    set -e
    
    echo "Starting model loader..."
    
    # 等待模型儲存掛載
    while [ ! -d "{MODEL_REPOSITORY}" ]; do
      echo "Waiting for model repository to be mounted..."
      sleep 5
    done
    
    # 檢查模型文件
    MODEL_COUNT=$(find {MODEL_REPOSITORY} -name "config.pbtxt" | wc -l)
    echo "Found $MODEL_COUNT models in repository"
    
    if [ "$MODEL_COUNT" -eq 0 ]; then
      echo "Warning: No models found in repository"
    fi
    
    echo "Model loader completed"

---
apiVersion: v1
kind: Secret
metadata:
  name: triton-secrets
  namespace: {NAMESPACE}
  labels:
    app: triton-server
    component: security
type: Opaque
data:
  # Base64 編碼的憑證 (範例)
  model-access-key: bW9kZWwtYWNjZXNzLWtleQ==
  registry-token: cmVnaXN0cnktdG9rZW4=
"""

# 寫入文件
with open(f"{MANIFESTS_DIR}/configmap.yaml", "w") as f:
    f.write(configmap_yaml)

print("📝 已創建配置管理")
print(f"   文件位置: {MANIFESTS_DIR}/configmap.yaml")
print("\n⚙️  配置內容:")
print("   - Triton 服務器配置")
print("   - 健康檢查腳本")
print("   - 模型載入腳本")
print("   - 安全憑證")
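
Values under a Secret's `data` field are Base64-encoded, not encrypted. The short sketch below shows how the two example values in the manifest can be produced and read back. The helper names are hypothetical, and the plaintext strings are placeholders, just as in the manifest.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Base64-encode a string for use under a Secret's `data` field."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a Base64 value taken from a Secret's `data` field."""
    return base64.b64decode(encoded).decode("utf-8")


print(encode_secret_value("model-access-key"))
print(decode_secret_value("cmVnaXN0cnktdG9rZW4="))
```

Because the encoding is trivially reversible, real credentials should not be committed in a manifest like this one.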

## 🎯 Exercise 3: Triton Server Deployment

### 3.1 Design a Highly Available Deployment

In [None]:
# Triton Server Deployment
deployment_yaml = f"""
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {APP_NAME}
  namespace: {NAMESPACE}
  labels:
    app: {APP_NAME}
    version: v1
    component: inference-server
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: {APP_NAME}
  template:
    metadata:
      labels:
        app: {APP_NAME}
        version: v1
        component: inference-server
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8002"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: triton-service-account
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      
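      # Node selector: schedule only onto GPU nodes.
      # The label below must be applied to the GPU nodes beforehand, e.g.
      #   kubectl label node <node-name> accelerator=nvidia-tesla-gpu
      nodeSelector:
        accelerator: nvidia-tesla-gpu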
      
      # 容忍度 - 允許在有 GPU 污點的節點上調度
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      
      # 反親和性 - 避免多個 Pod 調度到同一節點
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: {APP_NAME}
              topologyKey: kubernetes.io/hostname
      
      # Init Container - 模型預載入
      initContainers:
      - name: model-loader
        image: curlimages/curl:latest
        command: ["/bin/sh"]
        args: ["/scripts/model-loader.sh"]
        volumeMounts:
        - name: triton-models
          mountPath: {MODEL_REPOSITORY}
          readOnly: true
        - name: triton-config
          mountPath: /scripts
          readOnly: true
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
      
      containers:
      - name: triton-server
        image: {IMAGE}
        imagePullPolicy: IfNotPresent
        
        command: ["tritonserver"]
        args:
        - --model-repository={MODEL_REPOSITORY}
        - --allow-http=true
        - --allow-grpc=true
        - --allow-metrics=true
        - --allow-gpu-metrics=true
        - --strict-model-config=false
        - --strict-readiness=false
        - --http-port=8000
        - --grpc-port=8001
        - --metrics-port=8002
        - --log-verbose=1
        - --model-control-mode=poll
        - --repository-poll-secs=30
        
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        - containerPort: 8001
          name: grpc
          protocol: TCP
        - containerPort: 8002
          name: metrics
          protocol: TCP
        
        # 資源請求和限制
        resources:
          requests:
            cpu: 2
            memory: 8Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 4
            memory: 16Gi
            nvidia.com/gpu: 1
        
        # 健康檢查
        livenessProbe:
          httpGet:
            path: /v2/health/live
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 30
          timeoutSeconds: 10
          failureThreshold: 3
        
        readinessProbe:
          httpGet:
            path: /v2/health/ready
            port: 8000
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        
        startupProbe:
          httpGet:
            path: /v2/health/ready
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 30
        
        # 環境變數
        env:
        - name: CUDA_VISIBLE_DEVICES
          value: "0"
        - name: TRITON_MODEL_REPOSITORY
          value: {MODEL_REPOSITORY}
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: compute,utility
        
        # 掛載點
        volumeMounts:
        - name: triton-models
          mountPath: {MODEL_REPOSITORY}
          readOnly: true
        - name: triton-cache
          mountPath: /opt/tritonserver/cache
        - name: triton-config
          mountPath: /opt/tritonserver/config
          readOnly: true
        - name: triton-secrets
          mountPath: /opt/tritonserver/secrets
          readOnly: true
        - name: shm
          mountPath: /dev/shm
        
        # 安全上下文
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: false
          capabilities:
            drop:
            - ALL
      
      # 存儲卷
      volumes:
      - name: triton-models
        persistentVolumeClaim:
          claimName: triton-models-pvc
      - name: triton-cache
        persistentVolumeClaim:
          claimName: triton-cache-pvc
      - name: triton-config
        configMap:
          name: triton-config
          defaultMode: 0755
      - name: triton-secrets
        secret:
          secretName: triton-secrets
          defaultMode: 0600
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
      
      # 重啟策略
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
"""

# 寫入文件
with open(f"{MANIFESTS_DIR}/deployment.yaml", "w") as f:
    f.write(deployment_yaml)

print("📝 已創建部署配置")
print(f"   文件位置: {MANIFESTS_DIR}/deployment.yaml")
print("\n🚀 部署特性:")
print("   - 高可用性: 2 副本 + 反親和性")
print("   - 滾動更新: 零停機部署")
print("   - GPU 調度: 節點選擇器 + 容忍度")
print("   - 健康檢查: 存活/就緒/啟動探針")
print("   - 資源管理: CPU/Memory/GPU 限制")
print("   - 安全性: 非root用戶 + 安全上下文")
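
The startup probe decides how long Triton may spend loading models before the kubelet restarts the container. The sketch below estimates that budget from the probe fields. `probe_budget_seconds` is a hypothetical helper, and the result is an approximation because it ignores per-attempt timeouts.

```python
def probe_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time a probe may keep failing before Kubernetes acts on it."""
    return initial_delay + period * failure_threshold


# startupProbe from the Deployment: initialDelaySeconds=10, periodSeconds=10, failureThreshold=30
startup_budget = probe_budget_seconds(10, 10, 30)
# livenessProbe once the container has started: periodSeconds=30, failureThreshold=3
liveness_budget = probe_budget_seconds(0, 30, 3)

print(f"Model loading budget: about {startup_budget} s")
print(f"Time to restart an unresponsive server: about {liveness_budget} s")
```

If large models take longer to load than the startup budget, raising `failureThreshold` on the startup probe is usually the gentler adjustment, since it leaves liveness detection unchanged.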

### 3.2 Service Account and RBAC

In [None]:
# ServiceAccount 和 RBAC 配置
rbac_yaml = f"""
apiVersion: v1
kind: ServiceAccount
metadata:
  name: triton-service-account
  namespace: {NAMESPACE}
  labels:
    app: triton-server
    component: security
automountServiceAccountToken: true

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: triton-role
  namespace: {NAMESPACE}
  labels:
    app: triton-server
    component: security
rules:
# 允許讀取 ConfigMaps 和 Secrets
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list", "watch"]

# 允許讀取 PVC 狀態
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list"]

# 允許讀取自己的 Pod 資訊
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]

# 允許創建 Events
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: triton-role-binding
  namespace: {NAMESPACE}
  labels:
    app: triton-server
    component: security
subjects:
- kind: ServiceAccount
  name: triton-service-account
  namespace: {NAMESPACE}
roleRef:
  kind: Role
  name: triton-role
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: triton-monitoring
  namespace: {NAMESPACE}
  labels:
    app: triton-server
    component: monitoring

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: triton-monitoring-cluster-role
  labels:
    app: triton-server
    component: monitoring
rules:
# 允許讀取節點和 Pod 指標
- apiGroups: [""]
  resources: ["nodes", "nodes/metrics", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]

# 允許讀取 Deployment 和 ReplicaSet
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch"]

# 允許讀取指標
- apiGroups: ["metrics.k8s.io"]
  resources: ["nodes", "pods"]
  verbs: ["get", "list"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: triton-monitoring-cluster-role-binding
  labels:
    app: triton-server
    component: monitoring
subjects:
- kind: ServiceAccount
  name: triton-monitoring
  namespace: {NAMESPACE}
roleRef:
  kind: ClusterRole
  name: triton-monitoring-cluster-role
  apiGroup: rbac.authorization.k8s.io
"""

# 寫入文件
with open(f"{MANIFESTS_DIR}/rbac.yaml", "w") as f:
    f.write(rbac_yaml)

print("📝 已創建 RBAC 配置")
print(f"   文件位置: {MANIFESTS_DIR}/rbac.yaml")
print("\n🔐 安全配置:")
print("   - 服務帳戶: triton-service-account")
print("   - 最小權限原則: 僅必要的資源訪問")
print("   - 監控權限: 獨立的監控服務帳戶")
print("   - 命名空間隔離: 權限限制在命名空間內")

## 🎯 Exercise 4: Service Discovery and Load Balancing

### 4.1 Design the Service Tiers

In [None]:
# Service 配置
service_yaml = f"""
apiVersion: v1
kind: Service
metadata:
  name: {APP_NAME}-service
  namespace: {NAMESPACE}
  labels:
    app: {APP_NAME}
    component: load-balancer
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
    prometheus.io/scrape: "true"
    prometheus.io/port: "8002"
spec:
  type: LoadBalancer
  selector:
    app: {APP_NAME}
  ports:
  - name: http
    port: 8000
    targetPort: 8000
    protocol: TCP
  - name: grpc
    port: 8001
    targetPort: 8001
    protocol: TCP
  - name: metrics
    port: 8002
    targetPort: 8002
    protocol: TCP
  sessionAffinity: None
  loadBalancerSourceRanges:
  - 10.0.0.0/8
  - 172.16.0.0/12
  - 192.168.0.0/16

---
apiVersion: v1
kind: Service
metadata:
  name: {APP_NAME}-headless
  namespace: {NAMESPACE}
  labels:
    app: {APP_NAME}
    component: discovery
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: {APP_NAME}
  ports:
  - name: http
    port: 8000
    targetPort: 8000
  - name: grpc
    port: 8001
    targetPort: 8001
  - name: metrics
    port: 8002
    targetPort: 8002

---
apiVersion: v1
kind: Service
metadata:
  name: {APP_NAME}-internal
  namespace: {NAMESPACE}
  labels:
    app: {APP_NAME}
    component: internal
spec:
  type: ClusterIP
  selector:
    app: {APP_NAME}
  ports:
  - name: http
    port: 80
    targetPort: 8000
  - name: grpc
    port: 8001
    targetPort: 8001
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 300
"""

# 寫入文件
with open(f"{MANIFESTS_DIR}/service.yaml", "w") as f:
    f.write(service_yaml)

print("📝 已創建服務配置")
print(f"   文件位置: {MANIFESTS_DIR}/service.yaml")
print("\n🌐 服務層級:")
print("   - LoadBalancer: 外部訪問 (HTTP/gRPC/Metrics)")
print("   - Headless: 服務發現 (DNS)")
print("   - ClusterIP: 內部訪問 (會話親和性)")
print("   - 安全性: 私有網路來源限制")
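
Two details of the Service manifest are easy to check offline: which client addresses fall inside `loadBalancerSourceRanges`, and what in-cluster DNS name each Service receives. The sketch below covers both. The helper names are hypothetical, and the DNS helper assumes the default cluster domain `cluster.local`.

```python
import ipaddress
from typing import Sequence

# CIDR blocks copied from loadBalancerSourceRanges in the Service manifest
ALLOWED_RANGES = ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")


def is_source_allowed(client_ip: str, ranges: Sequence[str] = ALLOWED_RANGES) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR blocks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(block) for block in ranges)


def service_dns_name(service: str, namespace: str,
                     cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Service (cluster domain is an assumption)."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(is_source_allowed("10.1.2.3"))
print(is_source_allowed("8.8.8.8"))
print(service_dns_name("triton-server-internal", "triton-inference"))
```

For the headless Service, a DNS lookup of the same form returns the individual Pod IPs instead of a single virtual IP, which is what client-side load balancing for gRPC typically relies on.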

### 4.2 Ingress Configuration

In [None]:
# Ingress 配置
ingress_yaml = f"""
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {APP_NAME}-ingress
  namespace: {NAMESPACE}
  labels:
    app: {APP_NAME}
    component: ingress
  annotations:
    # NGINX Ingress Controller settings
    # No rewrite-target: Triton expects the original /v2/... request path
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    
    # Rate limiting (requests per minute per client IP)
    nginx.ingress.kubernetes.io/limit-rpm: "100"
    
    # 連接和請求超時
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    
    # Health checking: ingress-nginx relies on each Pod's readinessProbe
    # (/v2/health/ready) and offers no upstream health-check annotation
    
    # 負載均衡
    nginx.ingress.kubernetes.io/load-balance: "round_robin"
    nginx.ingress.kubernetes.io/upstream-hash-by: "$remote_addr"
    
    # 安全標頭
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "X-Content-Type-Options: nosniff";
      more_set_headers "X-Frame-Options: DENY";
      more_set_headers "X-XSS-Protection: 1; mode=block";
      more_set_headers "Referrer-Policy: strict-origin-when-cross-origin";
      
    # TLS 配置
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    
spec:
  tls:
  - hosts:
    - triton.example.com
    - api.triton.example.com
    secretName: triton-tls-secret
  
  rules:
  # 主要 API 端點
  - host: triton.example.com
    http:
      paths:
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: {APP_NAME}-service
            port:
              number: 8000
      - path: /metrics
        pathType: Prefix
        backend:
          service:
            name: {APP_NAME}-service
            port:
              number: 8002
  
  # API 專用域名
  - host: api.triton.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: {APP_NAME}-service
            port:
              number: 8000

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: {APP_NAME}-network-policy
  namespace: {NAMESPACE}
  labels:
    app: {APP_NAME}
    component: security
spec:
  podSelector:
    matchLabels:
      app: {APP_NAME}
  
  policyTypes:
  - Ingress
  - Egress
  
  ingress:
  # 允許來自 Ingress Controller 的流量
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8000
    - protocol: TCP
      port: 8001
    - protocol: TCP
      port: 8002
  
  # 允許來自相同命名空間的流量
  - from:
    - podSelector: {{}}
    ports:
    - protocol: TCP
      port: 8000
    - protocol: TCP
      port: 8001
    - protocol: TCP
      port: 8002
  
  # 允許來自監控命名空間的流量
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
    ports:
    - protocol: TCP
      port: 8002
  
  egress:
  # 允許 DNS 查詢
  - to: []
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  
  # 允許 HTTPS 出站 (模型下載)
  - to: []
    ports:
    - protocol: TCP
      port: 443
  
  # 允許與 Kubernetes API 通信
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: TCP
      port: 6443
"""

# 寫入文件
with open(f"{MANIFESTS_DIR}/ingress.yaml", "w") as f:
    f.write(ingress_yaml)

print("📝 已創建 Ingress 配置")
print(f"   文件位置: {MANIFESTS_DIR}/ingress.yaml")
print("\n🌍 Ingress 特性:")
print("   - TLS 終止: Let's Encrypt 自動證書")
print("   - 速率限制: 100 req/min")
print("   - 健康檢查: 自動後端檢測")
print("   - 安全標頭: 防護常見攻擊")
print("   - 網路政策: 精細流量控制")

## 🎯 Exercise 5: Autoscaling Configuration

### 5.1 HPA (Horizontal Pod Autoscaler)

In [None]:
# HPA 配置
hpa_yaml = f"""
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {APP_NAME}-hpa
  namespace: {NAMESPACE}
  labels:
    app: {APP_NAME}
    component: autoscaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {APP_NAME}
  
  minReplicas: 2
  maxReplicas: 10
  
  metrics:
  # CPU 使用率
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  
  # 記憶體使用率
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  
  # GPU 使用率 (自定義指標)
  - type: Pods
    pods:
      metric:
        name: nvidia_gpu_utilization
      target:
        type: AverageValue
        averageValue: "75"
  
  # 請求速率 (自定義指標)
  - type: Object
    object:
      metric:
        name: triton_request_rate
      describedObject:
        apiVersion: v1
        kind: Service
        name: {APP_NAME}-service
      target:
        type: AverageValue
        averageValue: "100"
  
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0  # scale up without delay; scale-down stays slow
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Max
    
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Percent
        value: 50
        periodSeconds: 300
      - type: Pods
        value: 1
        periodSeconds: 300
      selectPolicy: Min

---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: {APP_NAME}-vpa
  namespace: {NAMESPACE}
  labels:
    app: {APP_NAME}
    component: autoscaling
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {APP_NAME}
  
  updatePolicy:
    updateMode: "Off"  # 僅建議，不自動應用
  
  resourcePolicy:
    containerPolicies:
    - containerName: triton-server
      minAllowed:
        cpu: 1
        memory: 4Gi
      maxAllowed:
        cpu: 8
        memory: 32Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits

---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {APP_NAME}-pdb
  namespace: {NAMESPACE}
  labels:
    app: {APP_NAME}
    component: availability
spec:
  selector:
    matchLabels:
      app: {APP_NAME}
  
  # 至少保持 50% 的 Pod 可用
  minAvailable: 50%
  
  # 或者最多允許 1 個 Pod 不可用
  # maxUnavailable: 1
"""

# 寫入文件
with open(f"{MANIFESTS_DIR}/autoscaling.yaml", "w") as f:
    f.write(hpa_yaml)

print("📝 已創建自動擴縮容配置")
print(f"   文件位置: {MANIFESTS_DIR}/autoscaling.yaml")
print("\n📈 擴縮容策略:")
print("   - HPA: CPU/Memory/GPU/RPS 多指標")
print("   - VPA: 垂直擴展建議 (手動應用)")
print("   - PDB: 50% 最小可用性保證")
print("   - 擴展行為: 快速擴展 + 緩慢收縮")
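
The HPA derives a replica count per metric from the ratio of the observed value to the target, then takes the largest result across metrics. The sketch below reproduces that documented formula for a single metric. `desired_replicas` is a hypothetical helper, the default bounds mirror `minReplicas` and `maxReplicas` above, and the 10% tolerance is the controller's default, assumed unchanged here.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int = 2, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Single-metric HPA estimate: ceil(current * observed / target), clamped to bounds."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance the controller leaves the replica count unchanged.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(2, 90, 70))    # CPU at 90% against a 70% target
print(desired_replicas(4, 35, 70))    # load dropped to half the target
print(desired_replicas(8, 140, 70))   # result is capped at maxReplicas
```

The `behavior` section of the manifest then limits how quickly the controller may move toward this number; it does not change the number itself.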

### 5.2 Custom Metrics Configuration

In [None]:
# Prometheus Adapter 配置
metrics_config_yaml = f"""
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: {NAMESPACE}
  labels:
    app: prometheus-adapter
    component: metrics
data:
  config.yaml: |
    rules:
    # Triton request rate, derived from the success counter
    - seriesQuery: 'nv_inference_request_success{{namespace!="",pod!=""}}'
      resources:
        overrides:
          namespace: {{resource: "namespace"}}
          pod: {{resource: "pod"}}
      name:
        matches: "^nv_inference_request_success$"
        as: "triton_request_rate"
      metricsQuery: 'sum(rate(<<.Series>>{{<<.LabelMatchers>>}}[2m])) by (<<.GroupBy>>)'
    
    # GPU 使用率指標
    - seriesQuery: 'DCGM_FI_DEV_GPU_UTIL{{namespace!="",pod!=""}}'
      resources:
        overrides:
          namespace: {{resource: "namespace"}}
          pod: {{resource: "pod"}}
      name:
        matches: "^DCGM_FI_DEV_GPU_UTIL"
        as: "nvidia_gpu_utilization"
      metricsQuery: 'avg(<<.Series>>{{<<.LabelMatchers>>}}) by (<<.GroupBy>>)'
    
    # 模型推理延遲
    - seriesQuery: 'nv_inference_request_duration_us{{namespace!="",pod!=""}}'
      resources:
        overrides:
          namespace: {{resource: "namespace"}}
          pod: {{resource: "pod"}}
      name:
        matches: "^nv_inference_request_duration_us"
        as: "triton_inference_latency"
      metricsQuery: 'avg(rate(<<.Series>>{{<<.LabelMatchers>>}}[2m])) by (<<.GroupBy>>)'
    
    # 模型記憶體使用量
    - seriesQuery: 'nv_gpu_memory_used_bytes{{namespace!="",pod!=""}}'
      resources:
        overrides:
          namespace: {{resource: "namespace"}}
          pod: {{resource: "pod"}}
      name:
        matches: "^nv_gpu_memory_used_bytes"
        as: "triton_gpu_memory_usage"
      metricsQuery: 'avg(<<.Series>>{{<<.LabelMatchers>>}}) by (<<.GroupBy>>)'
    
    # Queue wait time
    - seriesQuery: 'nv_inference_queue_duration_us{{namespace!="",pod!=""}}'
      resources:
        overrides:
          namespace: {{resource: "namespace"}}
          pod: {{resource: "pod"}}
      name:
        matches: "^nv_inference_queue_duration_us"
        as: "triton_queue_time"
      metricsQuery: 'avg(rate(<<.Series>>{{<<.LabelMatchers>>}}[2m])) by (<<.GroupBy>>)'

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {APP_NAME}-metrics
  namespace: {NAMESPACE}
  labels:
    app: {APP_NAME}
    component: monitoring
spec:
  selector:
    matchLabels:
      app: {APP_NAME}
  
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s
    honorLabels: true
    
    # 指標重新標記
    metricRelabelings:
    - sourceLabels: [__name__]
      regex: 'nv_(.*)'
      targetLabel: triton_metric
      replacement: '${{1}}'
    
  # Per-scrape sample limit (a spec-level field, not an endpoint field)
  sampleLimit: 10000
    
  namespaceSelector:
    matchNames:
    - {NAMESPACE}
"""

# 寫入文件
with open(f"{MANIFESTS_DIR}/metrics-config.yaml", "w") as f:
    f.write(metrics_config_yaml)

print("📝 已創建自定義指標配置")
print(f"   文件位置: {MANIFESTS_DIR}/metrics-config.yaml")
print("\n📊 自定義指標:")
print("   - triton_request_rate: 請求速率")
print("   - nvidia_gpu_utilization: GPU 使用率")
print("   - triton_inference_latency: 推理延遲")
print("   - triton_gpu_memory_usage: GPU 記憶體")
print("   - triton_queue_time: 佇列等待時間")
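
Triton's `nv_inference_request_duration_us` and `nv_inference_request_success` are cumulative counters, so an average latency has to be computed from the change in both between two scrapes. The sketch below shows that calculation. `average_latency_us` is a hypothetical helper and the sample numbers are made up for illustration.

```python
from typing import Optional


def average_latency_us(duration_start: float, duration_end: float,
                       count_start: float, count_end: float) -> Optional[float]:
    """Average per-request latency (microseconds) between two counter samples."""
    completed = count_end - count_start
    if completed <= 0:
        # No requests finished in the window (or a counter reset): no meaningful average.
        return None
    return (duration_end - duration_start) / completed


# Hypothetical scrapes: cumulative duration grew by 600,000 us over 200 requests.
print(average_latency_us(1_000_000, 1_600_000, 100, 300))
```

In PromQL the equivalent idea is a ratio of two `rate()` expressions, one over the duration counter and one over the success counter.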

## 🎯 Exercise 6: Deployment Verification and Testing

### 6.1 Deploy All Resources in Batch

In [None]:
def deploy_all_manifests():
    """批量部署所有 Kubernetes 資源"""
    
    # 定義部署順序 (依賴關係)
    deployment_order = [
        "namespace.yaml",
        "rbac.yaml", 
        "storage.yaml",
        "configmap.yaml",
        "deployment.yaml",
        "service.yaml",
        "ingress.yaml",
        "autoscaling.yaml",
        "metrics-config.yaml"
    ]
    
    deployment_results = []
    
    print("🚀 Starting batch deployment...")
    print("=" * 50)
    
    for i, manifest in enumerate(deployment_order, 1):
        manifest_path = f"{MANIFESTS_DIR}/{manifest}"
        
        if not os.path.exists(manifest_path):
            print(f"⚠️  File not found, skipping: {manifest}")
            continue
            
        print(f"\n[{i}/{len(deployment_order)}] Applying {manifest}...")
        
        success = apply_kubernetes_manifest(manifest_path)
        deployment_results.append((manifest, success))
        
        if success:
            print(f"✅ {manifest} applied successfully")
            # Give the API server a moment to create the resources
            time.sleep(5)
        else:
            print(f"❌ {manifest} failed to apply")
    
    print("\n" + "=" * 50)
    print("📊 Deployment summary:")
    
    success_count = 0
    for manifest, success in deployment_results:
        status = "✅ OK" if success else "❌ Failed"
        print(f"   {manifest:<25} {status}")
        if success:
            success_count += 1
    
    print(f"\nTotal: {success_count}/{len(deployment_results)} succeeded")
    
    return deployment_results

# Run the deployment (if the cluster is reachable)
if cluster_ready:
    deployment_results = deploy_all_manifests()
else:
    print("⚠️  Skipping deployment - Kubernetes cluster unavailable")
    print("\n📝 Generated manifests:")
    for file in sorted(os.listdir(MANIFESTS_DIR)):
        if file.endswith('.yaml'):
            print(f"   {MANIFESTS_DIR}/{file}")
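
`deploy_all_manifests` pauses for a fixed five seconds after each manifest, which can be too short for slow resources and needlessly long for fast ones. A common alternative is to poll a readiness condition until a deadline. The helper below is a minimal sketch of that idea; `wait_until` and the stand-in predicate `resource_ready` are hypothetical names, not part of this lab's helper functions. In the notebook, the predicate would wrap a real Kubernetes API call.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout_s: float = 60.0,
               interval_s: float = 2.0) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Demo with a stand-in predicate that becomes true on the third poll.
attempts = {"count": 0}


def resource_ready() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


ok = wait_until(resource_ready, timeout_s=5.0, interval_s=0.1)
print(f"ready={ok} after {attempts['count']} polls")
```

Using `time.monotonic()` keeps the deadline unaffected by wall-clock adjustments during the wait.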

### 6.2 Checking Deployment Status

In [None]:
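def check_deployment_status():
    """Check the status of the deployed resources."""
    try:
        v1 = client.CoreV1Api()
        apps_v1 = client.AppsV1Api()
        
        print("🔍 Checking deployment status...")
        print("=" * 50)
        
        # Check the namespace
        try:
            namespace = v1.read_namespace(name=NAMESPACE)
            print(f"✅ Namespace: {NAMESPACE} ({namespace.status.phase})")
        except ApiException as e:
            # Distinguish a missing namespace from other API errors
            reason = "not found" if e.status == 404 else f"API error {e.status}"
            print(f"❌ Namespace: {NAMESPACE} ({reason})")
            return False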
        
        # Check PVCs
        print("\n📦 Persistent storage:")
        pvcs = v1.list_namespaced_persistent_volume_claim(namespace=NAMESPACE)
        for pvc in pvcs.items:
            name = pvc.metadata.name
            status = pvc.status.phase
            capacity = pvc.status.capacity.get('storage', 'Unknown') if pvc.status.capacity else 'Unknown'
            print(f"   {name}: {status} ({capacity})")
        
        # Check Deployments
        print("\n🚀 Deployment status:")
        deployments = apps_v1.list_namespaced_deployment(namespace=NAMESPACE)
        for deployment in deployments.items:
            name = deployment.metadata.name
            replicas = deployment.spec.replicas
            ready_replicas = deployment.status.ready_replicas or 0
            available_replicas = deployment.status.available_replicas or 0
            
            print(f"   {name}:")
            print(f"     Desired replicas: {replicas}")
            print(f"     Ready replicas: {ready_replicas}")
            print(f"     Available replicas: {available_replicas}")
            
            # Check Deployment conditions
            if deployment.status.conditions:
                for condition in deployment.status.conditions:
                    if condition.type == "Available":
                        status_icon = "✅" if condition.status == "True" else "❌"
                        print(f"     {status_icon} Available: {condition.status}")
        
        # Check Pods
        print("\n🏃 Pod status:")
        pods = v1.list_namespaced_pod(namespace=NAMESPACE)
        for pod in pods.items:
            name = pod.metadata.name
            phase = pod.status.phase
            node = pod.spec.node_name or "unscheduled"
            
            # Count ready containers
            ready_containers = 0
            total_containers = len(pod.spec.containers)
            
            if pod.status.container_statuses:
                ready_containers = sum(1 for status in pod.status.container_statuses if status.ready)
            
            status_icon = "✅" if phase == "Running" and ready_containers == total_containers else "⚠️"
            print(f"   {status_icon} {name}:")
            print(f"     Phase: {phase}")
            print(f"     Node: {node}")
            print(f"     Containers: {ready_containers}/{total_containers} ready")
        
        # Check Services
        print("\n🌐 Service status:")
        services = v1.list_namespaced_service(namespace=NAMESPACE)
        for service in services.items:
            name = service.metadata.name
            service_type = service.spec.type
            cluster_ip = service.spec.cluster_ip
            
            print(f"   {name}:")
            print(f"     Type: {service_type}")
            print(f"     Cluster IP: {cluster_ip}")
            
            if service_type == "LoadBalancer":
                lb_status = service.status.load_balancer
                ingress = lb_status.ingress if lb_status else None
                if ingress:
                    # Some providers report a hostname instead of an IP
                    external_address = ingress[0].ip or ingress[0].hostname
                    print(f"     External address: {external_address}")
                else:
                    print("     External address: <pending>")
        
        # Check HPAs
        print("\n📈 Autoscaling:")
        try:
            autoscaling_v2 = client.AutoscalingV2Api()
            hpas = autoscaling_v2.list_namespaced_horizontal_pod_autoscaler(namespace=NAMESPACE)
            for hpa in hpas.items:
                name = hpa.metadata.name
                current_replicas = hpa.status.current_replicas or 0
                desired_replicas = hpa.status.desired_replicas or 0
                min_replicas = hpa.spec.min_replicas
                max_replicas = hpa.spec.max_replicas
                
                print(f"   {name}:")
                print(f"     Current replicas: {current_replicas}")
                print(f"     Desired replicas: {desired_replicas}")
                print(f"     Range: {min_replicas}-{max_replicas}")
        except Exception as e:
            print(f"   ⚠️  Unable to check HPA: {e}")
        
        return True
        
    except Exception as e:
        print(f"❌ Status check failed: {e}")
        return False

# Check deployment status
if cluster_ready:
    check_deployment_status()
else:
    print("⚠️  Skipping status check - Kubernetes cluster unavailable")
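
`check_deployment_status` prints replica counts but leaves the reader to judge whether a rollout has finished. The sketch below turns those counts into a single verdict, roughly following the conditions that `kubectl rollout status` waits on. `RolloutSnapshot` and `rollout_state` are hypothetical names introduced for illustration; the fields are assumed to be copied from a Deployment's `metadata.generation`, `spec.replicas`, and `status` values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RolloutSnapshot:
    generation: int                     # metadata.generation
    observed_generation: Optional[int]  # status.observed_generation
    desired: int                        # spec.replicas
    total: Optional[int]                # status.replicas
    updated: Optional[int]              # status.updated_replicas
    available: Optional[int]            # status.available_replicas


def rollout_state(s: RolloutSnapshot) -> str:
    """Return 'complete' or a short reason why the rollout is still in progress."""
    total = s.total or 0
    updated = s.updated or 0
    available = s.available or 0

    if (s.observed_generation or 0) < s.generation:
        return "waiting: controller has not observed the latest spec"
    if updated < s.desired:
        return f"waiting: {updated}/{s.desired} replicas updated"
    if total > updated:
        return f"waiting: {total - updated} old replicas pending termination"
    if available < updated:
        return f"waiting: {available}/{updated} updated replicas available"
    return "complete"


# Hypothetical snapshots of a 2-replica Deployment during a rolling update.
mid_rollout = RolloutSnapshot(generation=3, observed_generation=3,
                              desired=2, total=3, updated=2, available=2)
finished = RolloutSnapshot(generation=3, observed_generation=3,
                           desired=2, total=2, updated=2, available=2)

print(rollout_state(mid_rollout))
print(rollout_state(finished))
```

The `None` handling mirrors the `or 0` pattern used in the status check, since the API omits zero-valued status fields.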

### 6.3 Functional Tests

In [None]:
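def test_triton_deployment():
    """Run functional tests against the Triton deployment."""
    print("🧪 Starting functional tests...")
    print("=" * 50)
    
    # Collected (test_name, passed) tuples
    test_results = []
    
    try:
        v1 = client.CoreV1Api()
        
        # Resolve the service endpoint. A ClusterIP is only reachable from inside
        # the cluster network, so run this cell from a pod or a cluster node.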
        services = v1.list_namespaced_service(namespace=NAMESPACE)
        service_endpoint = None
        
        for service in services.items:
            if service.metadata.name == f"{APP_NAME}-service":
                cluster_ip = service.spec.cluster_ip
                http_port = None
                
                for port in service.spec.ports:
                    if port.name == "http":
                        http_port = port.port
                        break
                
                if cluster_ip and http_port:
                    service_endpoint = f"http://{cluster_ip}:{http_port}"
                break
        
        if not service_endpoint:
            print("❌ Unable to resolve the service endpoint")
            return False
        
        print(f"🎯 Test endpoint: {service_endpoint}")
        
        # Test 1: health checks
        print("\n1️⃣ Health check test:")
        try:
            response = requests.get(f"{service_endpoint}/v2/health/live", timeout=10)
            if response.status_code == 200:
                print("   ✅ Liveness check: passed")
                test_results.append(('health_live', True))
            else:
                print(f"   ❌ Liveness check: failed ({response.status_code})")
                test_results.append(('health_live', False))
        except Exception as e:
            print(f"   ❌ Liveness check: connection failed ({e})")
            test_results.append(('health_live', False))
        
        try:
            response = requests.get(f"{service_endpoint}/v2/health/ready", timeout=10)
            if response.status_code == 200:
                print("   ✅ Readiness check: passed")
                test_results.append(('health_ready', True))
            else:
                print(f"   ❌ Readiness check: failed ({response.status_code})")
                test_results.append(('health_ready', False))
        except Exception as e:
            print(f"   ❌ Readiness check: connection failed ({e})")
            test_results.append(('health_ready', False))
        
        # Test 2: model repository
        print("\n2️⃣ Model repository test:")
        try:
            # Triton lists repository contents via POST /v2/repository/index,
            # which returns a JSON array of model entries
            response = requests.post(f"{service_endpoint}/v2/repository/index", timeout=10)
            if response.status_code == 200:
                models_data = response.json()
                model_count = len(models_data)
                print(f"   ✅ Model repository: reachable ({model_count} models)")
                test_results.append(('models_api', True))
                
                if model_count > 0:
                    print("   📋 Available models:")
                    for model in models_data[:5]:  # show only the first 5
                        model_name = model.get('name', 'Unknown')
                        model_version = model.get('version', 'Unknown')
                        model_state = model.get('state', 'Unknown')
                        print(f"      - {model_name} (v{model_version}, {model_state})")
                else:
                    print("   ⚠️  No models available")
            else:
                print(f"   ❌ Model repository: API error ({response.status_code})")
                test_results.append(('models_api', False))
        except Exception as e:
            print(f"   ❌ Model repository: connection failed ({e})")
            test_results.append(('models_api', False))
        
        # Test 3: metrics endpoint
        print("\n3️⃣ Metrics test:")
        try:
            # Derive the metrics endpoint; this assumes the Service exposes HTTP on
            # port 8000 and metrics on port 8002
            metrics_endpoint = service_endpoint.replace(':8000', ':8002')
            response = requests.get(f"{metrics_endpoint}/metrics", timeout=10)
            
            if response.status_code == 200:
                metrics_text = response.text
                # Look for key metrics
                key_metrics = [
                    'nv_inference_request_success',
                    'nv_inference_request_failure', 
                    'nv_inference_queue_duration_us',
                    'nv_gpu_utilization'
                ]
                
                found_metrics = []
                for metric in key_metrics:
                    if metric in metrics_text:
                        found_metrics.append(metric)
                
                print("   ✅ Metrics endpoint: reachable")
                print(f"   📊 Metrics found: {len(found_metrics)}/{len(key_metrics)}")
                test_results.append(('metrics_api', True))
                
                if found_metrics:
                    print("   📈 Available metrics:")
                    for metric in found_metrics:
                        print(f"      - {metric}")
            else:
                print(f"   ❌ Metrics endpoint: HTTP error ({response.status_code})")
                test_results.append(('metrics_api', False))
        except Exception as e:
            print(f"   ❌ Metrics endpoint: connection failed ({e})")
            test_results.append(('metrics_api', False))
        
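        # Test 4: repeated requests through the Service
        print("\n4️⃣ Load balancing test:")
        try:
            # Send several requests and record any server identity the response exposes
            server_ids = set()
            success_count = 0
            
            for i in range(5):
                response = requests.get(f"{service_endpoint}/v2/health/live", timeout=5)
                if response.status_code == 200:
                    success_count += 1
                    # Only count a backend when the response actually identifies one;
                    # a per-request placeholder would inflate the instance count
                    server_id = response.headers.get('Server')
                    if server_id:
                        server_ids.add(server_id)
            
            lb_ok = success_count >= 4
            lb_icon = "✅" if lb_ok else "❌"
            print(f"   {lb_icon} Load balancing: {success_count}/5 requests succeeded")
            if server_ids:
                print(f"   🔄 Distinct server identities: {len(server_ids)}")
            else:
                print("   🔄 Server identity not exposed in response headers")
            test_results.append(('load_balancing', lb_ok))
        except Exception as e:
            print(f"   ❌ Load balancing: test failed ({e})")
            test_results.append(('load_balancing', False))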
        
        # Summarize test results
        print("\n" + "=" * 50)
        print("📊 Test summary:")
        
        test_display_names = {
            'health_live': 'Liveness check',
            'health_ready': 'Readiness check', 
            'models_api': 'Model repository API',
            'metrics_api': 'Metrics endpoint',
            'load_balancing': 'Load balancing'
        }
        
        success_count = 0
        for test_name, success in test_results:
            status = "✅ Passed" if success else "❌ Failed"
            display_name = test_display_names.get(test_name, test_name)
            print(f"   {display_name:<22} {status}")
            if success:
                success_count += 1
        
        overall_success = success_count >= len(test_results) * 0.8  # 80% pass rate
        overall_status = "✅ Overall: passed" if overall_success else "❌ Overall: failed"
        print(f"\n{overall_status} ({success_count}/{len(test_results)})")
        
        return overall_success
        
    except Exception as e:
        print(f"❌ Test run failed: {e}")
        return False

# Run the functional tests
if cluster_ready:
    test_success = test_triton_deployment()
else:
    print("⚠️  Skipping functional tests - Kubernetes cluster unavailable")
    test_success = False
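
The metrics test looks for metric names with a plain substring match, which also matches `# HELP` and `# TYPE` comment lines even when no sample has been emitted. A stricter check parses the Prometheus text exposition format and keeps only sample lines. The parser below is a simplified sketch: `parse_samples` is a hypothetical helper, the model and GPU label values are made up, and label values containing `}` are not handled.

```python
from typing import Dict, List


def parse_samples(metrics_text: str) -> Dict[str, List[float]]:
    """Map each metric name to the values of its sample lines (comments skipped)."""
    samples: Dict[str, List[float]] = {}
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # name{labels} value [timestamp]
            name, _, rest = line.partition("{")
            _, _, rest = rest.rpartition("}")
        else:
            # name value [timestamp]
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        try:
            value = float(fields[0])
        except ValueError:
            continue
        samples.setdefault(name.strip(), []).append(value)
    return samples


# Made-up scrape output for illustration.
example_text = """\
# HELP nv_inference_request_success Number of successful inference requests
# TYPE nv_inference_request_success counter
nv_inference_request_success{model="resnet",version="1"} 42
nv_inference_request_success{model="bert",version="1"} 8
# HELP nv_inference_queue_duration_us Cumulative inference queuing duration
# TYPE nv_inference_queue_duration_us counter
# HELP nv_gpu_utilization GPU utilization rate
# TYPE nv_gpu_utilization gauge
nv_gpu_utilization{gpu_uuid="GPU-0"} 0.25
"""

parsed = parse_samples(example_text)
total_success = sum(parsed["nv_inference_request_success"])
print(sorted(parsed))
print(f"total successful requests: {total_success}")
```

In this example the queue-duration metric appears only in comment lines, so a substring match reports it as present while the parsed result does not.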

## 📊 Lab Summary

In [None]:
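# Generate the deployment report
def generate_deployment_report():
    """Generate the deployment report."""
    
    report = f"""
# Kubernetes Deployment Report

## 📋 Deployment Overview

**Time**: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
**Namespace**: {NAMESPACE}
**Application**: {APP_NAME}
**Image**: {IMAGE}

## 🏗️ Resource Configuration

### Compute Resources
- **CPU**: 2-4 cores per pod
- **Memory**: 8-16Gi per pod
- **GPU**: 1x NVIDIA GPU per pod
- **Replicas**: 2 (min) - 10 (max)

### Storage Resources
- **Model storage**: 100Gi (ReadOnlyMany)
- **Cache storage**: 50Gi (ReadWriteMany)
- **Shared memory**: 1Gi per pod

### Network Configuration
- **HTTP port**: 8000
- **gRPC port**: 8001
- **Metrics port**: 8002
- **TLS**: supported (Let's Encrypt)

## 🔧 Features

### High Availability
- ✅ Multi-replica deployment (anti-affinity)
- ✅ Rolling updates (zero downtime)
- ✅ Health checks (liveness/readiness/startup)
- ✅ Pod disruption budget (50% minimum available)

### Autoscaling
- ✅ HPA (CPU/Memory/GPU/RPS)
- ✅ VPA in recommendation mode
- ✅ Custom metrics support
- ✅ Tuned scale-up and scale-down policies

### Security
- ✅ RBAC with least privilege
- ✅ Network policies
- ✅ Non-root user
- ✅ Security context

### Monitoring and Observability
- ✅ Prometheus metrics
- ✅ Custom metrics
- ✅ ServiceMonitor
- ✅ Metric relabeling

## 📁 Generated Manifests
"""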
    
    # List all generated manifest files
    manifest_files = []
    for file in os.listdir(MANIFESTS_DIR):
        if file.endswith('.yaml'):
            file_path = f"{MANIFESTS_DIR}/{file}"
            file_size = os.path.getsize(file_path)
            manifest_files.append((file, file_size))
    
    for file, size in sorted(manifest_files):
        report += f"\n- `{file}` ({size:,} bytes)"
    
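    report += f"""

## 🚀 Deployment Commands

```bash
# Create the namespace first, then apply the remaining manifests
kubectl apply -f {MANIFESTS_DIR}/namespace.yaml
kubectl apply -f {MANIFESTS_DIR}/

# Check deployment status
kubectl get all -n {NAMESPACE}

# Follow Pod logs
kubectl logs -f deployment/{APP_NAME} -n {NAMESPACE}

# Port forwarding for HTTP and metrics (testing only)
kubectl port-forward svc/{APP_NAME}-service 8000:8000 8002:8002 -n {NAMESPACE}
```

## 🧪 Verification Steps

```bash
# Health checks
curl http://localhost:8000/v2/health/live
curl http://localhost:8000/v2/health/ready

# Model repository index
curl -X POST http://localhost:8000/v2/repository/index

# Prometheus metrics
curl http://localhost:8002/metrics
```

## 📚 Next Steps

1. **Model deployment**: upload models to the PV and configure config.pbtxt
2. **Monitoring setup**: configure Prometheus and Grafana
3. **CI/CD integration**: set up an automated deployment pipeline
4. **Load testing**: verify autoscaling behavior
5. **Disaster recovery**: test the failover mechanism

---
**Generated at**: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
**Lab**: Lab 2.5.1 - Kubernetes Native Deployment
"""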
    
    # Save the report
    report_path = f"{K8S_DIR}/deployment-report.md"
    with open(report_path, "w", encoding="utf-8") as f:
        f.write(report)
    
    print("📊 Deployment report generated")
    print(f"   File: {report_path}")
    
    return report_path

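# Generate the report
report_path = generate_deployment_report()

print("\n" + "=" * 60)
print("🎉 Lab 2.5.1 complete!")
print("=" * 60)
print("\n✅ Completed items:")
print("   - Kubernetes cluster environment preparation")
print("   - GPU resource scheduling configuration")
print("   - Highly available Triton deployment")
print("   - Service discovery and load balancing")
print("   - Autoscaling")
print("   - Security and network policies")
print("   - Monitoring metrics integration")
print("\n📁 Output files:")
print(f"   - Manifests: {MANIFESTS_DIR}/")
print(f"   - Deployment report: {report_path}")
print("\n🚀 Ready for the next stage: CI/CD and MLOps integration")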