# Lab 2.3.4 - Custom Python Backend Development

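## 🎯 Lab Objectives

In this lab you will learn how to:
1. Develop a custom Python Backend that handles complex logic
2. Implement a multi-step inference flow
3. Integrate external services and APIs
4. Handle dynamic batching and streaming output
5. Monitor and debug a Python Backend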

## 📋 Prerequisites

- Completed Lab 2.1 (basic Triton setup)
- Familiarity with Python programming and asynchronous processing
- Understanding of REST APIs and the gRPC protocol

---

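## 📚 Background

### Advantages of the Python Backend

**1. Flexibility**
- Can implement arbitrarily complex inference logic
- Supports dynamic model loading and switching
- Integrates easily with external libraries and services

**2. Rapid development**
- Rich Python ecosystem
- Convenient debugging and testing
- Fast prototyping

**3. Typical use cases**
- Multi-step inference pipelines
- Complex pre- and post-processing
- Integration with external APIs and databases
- A/B testing and experimental features

### Python Backend Architecture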

```mermaid
graph TD
    A[Client Request] --> B[Triton Server]
    B --> C[Python Backend]
    C --> D[Model Instance]
    D --> E[Execute Function]
    E --> F[External API]
    E --> G[Database]
    E --> H[Other Models]
    E --> I[Response Processing]
    I --> B
    B --> A
```
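
The diagram's path through "Model Instance" and "Execute Function" corresponds to three methods on the model class: `initialize` runs once at load time, `execute` runs once per batch of requests, and `finalize` runs once at unload. The sketch below mimics that call order in plain Python. `EchoModel` and `run_lifecycle` are hypothetical names used only for illustration, and plain strings stand in for the request objects that Triton would pass.

```python
import json


class EchoModel:
    """Stand-in with the same three methods as a Python backend model."""

    def initialize(self, args):
        # The model configuration arrives as a JSON string.
        self.config = json.loads(args["model_config"])
        self.calls = 0

    def execute(self, requests):
        # Return one response per request, in the same order.
        self.calls += 1
        return [f"echo:{r}" for r in requests]

    def finalize(self):
        self.config = None


def run_lifecycle(model, batches):
    """Hypothetical driver: initialize once, execute per batch, finalize once."""
    model.initialize({"model_config": json.dumps({"name": "echo"})})
    outputs = [model.execute(batch) for batch in batches]
    model.finalize()
    return outputs


results = run_lifecycle(EchoModel(), [["a", "b"], ["c"]])
print(results)
```

The key contract to keep in mind is that `execute` returns exactly one response per request, in request order.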

## 🛠️ Environment Setup

In [None]:
import os
import json
import time
import asyncio
import requests
import numpy as np
import pandas as pd
from datetime import datetime
from typing import List, Dict, Any, Optional

# Triton client. triton_python_backend_utils is only importable inside the
# Triton Python backend process, so it is not imported in this notebook.
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

# Machine learning
import torch
import transformers
from transformers import AutoTokenizer, AutoModel

# Check the environment
print(f"Python version: {__import__('sys').version}")
print(f"PyTorch version: {torch.__version__}")
print(f"Transformers version: {transformers.__version__}")
print(f"Working directory: {os.getcwd()}")

In [None]:
# Set up the lab paths
BASE_DIR = "/opt/tritonserver"
MODEL_REPO = f"{BASE_DIR}/models"
PYTHON_BACKEND_DIR = f"{MODEL_REPO}/custom_pipeline"

# Create the directory structure
os.makedirs(f"{PYTHON_BACKEND_DIR}/1", exist_ok=True)

print(f"Model repository: {MODEL_REPO}")
print(f"Python backend directory: {PYTHON_BACKEND_DIR}")

## 🎯 Experiment 1: Basic Python Backend

### 1.1 Create the Model Configuration

In [None]:
# Basic Python Backend configuration
config_pbtxt = '''
name: "custom_pipeline"
backend: "python"
max_batch_size: 8

input [
  {
    name: "text_input"
    data_type: TYPE_STRING
    dims: [ 1 ]
  },
  {
    name: "parameters"
    data_type: TYPE_STRING
    dims: [ 1 ]
    optional: true
  }
]

output [
  {
    name: "processed_text"
    data_type: TYPE_STRING
    dims: [ 1 ]
  },
  {
    name: "metadata"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

instance_group [
  {
    count: 2
    kind: KIND_CPU
  }
]

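# Optional: run the model in a packaged Python environment. This lab does not
# create python_env.tar.gz, so the block is commented out; enable it only if
# that archive exists in the model directory.
# parameters: {
#   key: "EXECUTION_ENV_PATH",
#   value: {string_value: "$$TRITON_MODEL_DIRECTORY/python_env.tar.gz"}
# }
'''

with open(f"{PYTHON_BACKEND_DIR}/config.pbtxt", "w") as f:
    f.write(config_pbtxt)

print("✅ Basic configuration created")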
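
Because the configuration sets `max_batch_size` above zero, `dims` describes a single item and the full tensor shape gains a leading batch dimension. The helper below is a minimal sketch of that rule; `full_shape` is a hypothetical name and not part of any Triton API.

```python
from typing import List


def full_shape(max_batch_size: int, dims: List[int], batch: int = 1) -> List[int]:
    """Return the shape a client should send for one input.

    With max_batch_size > 0 a batch dimension is prepended to `dims`.
    With max_batch_size == 0, `dims` is already the full shape.
    """
    if max_batch_size == 0:
        return list(dims)
    if not 1 <= batch <= max_batch_size:
        raise ValueError(f"batch must be in [1, {max_batch_size}], got {batch}")
    return [batch, *dims]


print(full_shape(8, [1]))      # [1, 1]
print(full_shape(8, [1], 4))   # [4, 1]
print(full_shape(0, [1]))      # [1]
```

For `custom_pipeline` this means a single text item is sent with shape `[1, 1]`, not `[1]`.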

### 1.2 Implement the Python Backend Logic

In [None]:
# Basic Python Backend implementation
python_backend_code = '''
import json
import time
import asyncio
import numpy as np
from typing import List, Dict, Any
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """
    Custom Python Backend model
    """

    def initialize(self, args):
        """
        Initialize the model
        """
        self.model_config = model_config = json.loads(args['model_config'])
        
        # Look up the input and output configuration
        input_configs = pb_utils.get_input_config_by_name(
            model_config, "text_input"
        )
        output_configs = pb_utils.get_output_config_by_name(
            model_config, "processed_text"
        )
        
        # Initialize the processing pipeline components
        self.text_processors = {
            "uppercase": lambda x: x.upper(),
            "lowercase": lambda x: x.lower(),
            "reverse": lambda x: x[::-1],
            "word_count": lambda x: f"Words: {len(x.split())}",
        }
        
        # Statistics
        self.request_count = 0
        self.start_time = time.time()
        
        print("✅ Python Backend initialized")
        print(f"📊 Available processors: {list(self.text_processors.keys())}")
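    def execute(self, requests):
        """
        Handle inference requests
        """
        responses = []
        
        for request in requests:
            # Parse the input. With max_batch_size > 0 the tensor has shape
            # [batch, 1]; this lab processes the first element only.
            # BYTES elements arrive as bytes, so decode them explicitly.
            raw_text = pb_utils.get_input_tensor_by_name(
                request, "text_input"
            ).as_numpy().flatten()[0]
            text_input = (
                raw_text.decode("utf-8")
                if isinstance(raw_text, bytes) else str(raw_text)
            )
            
            # Parse the optional parameters
            parameters = {}
            param_tensor = pb_utils.get_input_tensor_by_name(
                request, "parameters"
            )
            if param_tensor is not None:
                try:
                    raw_params = param_tensor.as_numpy().flatten()[0]
                    if isinstance(raw_params, bytes):
                        raw_params = raw_params.decode("utf-8")
                    parameters = json.loads(raw_params)
                except (ValueError, TypeError):
                    # Malformed JSON falls back to the defaults
                    parameters = {}
            
            # Run the processing
            processed_text, metadata = self._process_text(
                text_input, parameters
            )
            
            # Create the output tensors with shape [1, 1] (batch dimension first)
            processed_tensor = pb_utils.Tensor(
                "processed_text",
                np.array([[processed_text]], dtype=np.object_)
            )
            
            metadata_tensor = pb_utils.Tensor(
                "metadata",
                np.array([[json.dumps(metadata)]], dtype=np.object_)
            )
            
            # Create the response
            response = pb_utils.InferenceResponse(
                output_tensors=[processed_tensor, metadata_tensor]
            )
            responses.append(response)
            
            # Update the statistics
            self.request_count += 1
        
        return responses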

    def _process_text(self, text: str, parameters: dict) -> tuple:
        """
        Text processing logic
        """
        start_time = time.time()
        
        # Select the processing type
        process_type = parameters.get("type", "uppercase")
        
        # Apply the processor; unknown types return the text unchanged
        if process_type in self.text_processors:
            processed = self.text_processors[process_type](text)
        else:
            processed = text
        
        # Build the metadata
        metadata = {
            "original_length": len(text),
            "processed_length": len(processed),
            "process_type": process_type,
            "processing_time_ms": (time.time() - start_time) * 1000,
            "request_id": self.request_count + 1,
            "timestamp": time.time()
        }
        
        return processed, metadata

    def finalize(self):
        """
        Clean up the model
        """
        total_time = time.time() - self.start_time
        print("🏁 Python Backend finished")
        print(f"📈 Total requests: {self.request_count}")
        print(f"⏱️  Total uptime: {total_time:.2f}s")
        print(f"📊 Average QPS: {self.request_count / max(total_time, 1e-9):.2f}")
'''

with open(f"{PYTHON_BACKEND_DIR}/1/model.py", "w", encoding="utf-8") as f:
    f.write(python_backend_code)

print("✅ Python Backend code created")
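
The optional `parameters` input carries a JSON object as a BYTES element, and the backend falls back to defaults when it is missing or malformed. That parsing step can be isolated and tested on its own, as sketched below. `parse_parameters` is a hypothetical helper name, not part of `triton_python_backend_utils`.

```python
import json
from typing import Any, Dict, Union


def parse_parameters(raw: Union[bytes, str, None]) -> Dict[str, Any]:
    """Decode an optional JSON parameter string, falling back to {}."""
    if raw is None:
        return {}
    try:
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        value = json.loads(text)
    except (UnicodeDecodeError, ValueError):
        # Invalid UTF-8 or invalid JSON: use the defaults.
        return {}
    # Only a JSON object is a valid parameter set.
    return value if isinstance(value, dict) else {}


print(parse_parameters(b'{"type": "reverse"}'))  # {'type': 'reverse'}
print(parse_parameters(b"not json"))             # {}
print(parse_parameters(None))                    # {}
```

Catching only decoding and JSON errors, rather than using a bare `except`, keeps unrelated failures visible.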

## 🎯 Experiment 2: Advanced Pipeline Backend

In [None]:
# Create the advanced pipeline model directory
ADVANCED_PIPELINE_DIR = f"{MODEL_REPO}/advanced_pipeline"
os.makedirs(f"{ADVANCED_PIPELINE_DIR}/1", exist_ok=True)

# Advanced pipeline configuration
advanced_config = '''
name: "advanced_pipeline"
backend: "python"
max_batch_size: 4

input [
  {
    name: "query"
    data_type: TYPE_STRING
    dims: [ 1 ]
  },
  {
    name: "context"
    data_type: TYPE_STRING
    dims: [ -1 ]
    optional: true
  },
  {
    name: "config"
    data_type: TYPE_STRING
    dims: [ 1 ]
    optional: true
  }
]

output [
  {
    name: "answer"
    data_type: TYPE_STRING
    dims: [ 1 ]
  },
  {
    name: "confidence"
    data_type: TYPE_FP32
    dims: [ 1 ]
  },
  {
    name: "sources"
    data_type: TYPE_STRING
    dims: [ -1 ]
  },
  {
    name: "pipeline_info"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

dynamic_batching {
  max_queue_delay_microseconds: 100
}
'''

with open(f"{ADVANCED_PIPELINE_DIR}/config.pbtxt", "w") as f:
    f.write(advanced_config)

print("✅ Advanced pipeline configuration created")

In [None]:
# Advanced pipeline implementation
advanced_pipeline_code = '''
import json
import time
import asyncio
import threading
import numpy as np
import requests
from typing import List, Dict, Any, Optional
from concurrent.futures import ThreadPoolExecutor
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """
    Advanced pipeline model
    Implements multi-step inference: retrieve -> rerank -> generate -> post-process
    """

    def initialize(self, args):
        """
        Initialize the model
        """
        self.model_config = json.loads(args['model_config'])
        
        # Initialize the components
        self.retriever = SimpleRetriever()
        self.reranker = SimpleReranker()
        self.generator = SimpleGenerator()
        self.post_processor = PostProcessor()
        
        # Thread pool
        self.executor = ThreadPoolExecutor(max_workers=4)
        
        # Statistics
        self.stats = {
            "total_requests": 0,
            "successful_requests": 0,
            "failed_requests": 0,
            "avg_pipeline_time": 0.0,
            "start_time": time.time()
        }
        
        print("🚀 Advanced pipeline initialized")
        print("📋 Pipeline steps: retrieve -> rerank -> generate -> post-process")

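    @staticmethod
    def _to_text(value):
        """Decode a BYTES tensor element to str (UTF-8)."""
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    def execute(self, requests):
        """
        Run pipeline inference
        """
        responses = []
        
        for request in requests:
            try:
                # Parse the query (first element of the [batch, 1] tensor)
                query = self._to_text(
                    pb_utils.get_input_tensor_by_name(
                        request, "query"
                    ).as_numpy().flatten()[0]
                )
                
                # Parse the optional context. Later steps expect documents as
                # dicts with a "title", so each string is wrapped; the
                # generated ids and titles are placeholders for this lab.
                context = []
                context_tensor = pb_utils.get_input_tensor_by_name(
                    request, "context"
                )
                if context_tensor is not None:
                    context = [
                        {"id": i, "title": f"context_{i}", "content": self._to_text(c)}
                        for i, c in enumerate(
                            context_tensor.as_numpy().flatten(), start=1
                        )
                    ]
                
                # Parse the optional config
                config = {}
                config_tensor = pb_utils.get_input_tensor_by_name(
                    request, "config"
                )
                if config_tensor is not None:
                    try:
                        config = json.loads(
                            self._to_text(config_tensor.as_numpy().flatten()[0])
                        )
                    except (ValueError, TypeError):
                        config = {}
                
                # Run the pipeline
                result = self._execute_pipeline(query, context, config)
                
                # Build the response
                response = self._create_response(result)
                responses.append(response)
                
                self.stats["successful_requests"] += 1
                
            except Exception as e:
                print(f"❌ Request failed: {str(e)}")
                
                # Build an error response
                error_response = self._create_error_response(str(e))
                responses.append(error_response)
                
                self.stats["failed_requests"] += 1
            
            self.stats["total_requests"] += 1
        
        return responses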

    def _execute_pipeline(self, query: str, context: List[str], config: dict) -> dict:
        """
        Run the full inference pipeline
        """
        start_time = time.time()
        pipeline_info = {
            "steps": [],
            "total_time": 0.0,
            "query": query,
            "config": config
        }
        
        try:
            # Step 1: retrieve relevant documents
            step_start = time.time()
            if not context:
                retrieved_docs = self.retriever.retrieve(query, config.get("top_k", 5))
            else:
                retrieved_docs = context
            
            step_time = (time.time() - step_start) * 1000
            pipeline_info["steps"].append({
                "name": "retrieve",
                "time_ms": step_time,
                "output_count": len(retrieved_docs)
            })
            
            # Step 2: rerank the documents
            step_start = time.time()
            reranked_docs = self.reranker.rerank(query, retrieved_docs)
            
            step_time = (time.time() - step_start) * 1000
            pipeline_info["steps"].append({
                "name": "rerank",
                "time_ms": step_time,
                "rerank_changes": len(reranked_docs)
            })
            
            # Step 3: generate the answer
            step_start = time.time()
            generation_result = self.generator.generate(
                query, reranked_docs, config
            )
            
            step_time = (time.time() - step_start) * 1000
            pipeline_info["steps"].append({
                "name": "generate",
                "time_ms": step_time,
                "tokens_generated": len(generation_result["answer"].split())
            })
            
            # Step 4: post-process
            step_start = time.time()
            final_result = self.post_processor.process(
                generation_result, config
            )
            
            step_time = (time.time() - step_start) * 1000
            pipeline_info["steps"].append({
                "name": "post_process",
                "time_ms": step_time,
                "final_length": len(final_result["answer"])
            })
            
            # Compute the total time
            pipeline_info["total_time"] = (time.time() - start_time) * 1000
            
            # Attach the source documents
            final_result["sources"] = [doc["title"] for doc in reranked_docs[:3]]
            final_result["pipeline_info"] = pipeline_info
            
            return final_result
            
        except Exception as e:
            pipeline_info["error"] = str(e)
            pipeline_info["total_time"] = (time.time() - start_time) * 1000
            raise

    def _create_response(self, result: dict):
        """
        Build the response tensors (batch dimension first)
        """
        answer_tensor = pb_utils.Tensor(
            "answer",
            np.array([[result["answer"]]], dtype=np.object_)
        )
        
        confidence_tensor = pb_utils.Tensor(
            "confidence",
            np.array([[result["confidence"]]], dtype=np.float32)
        )
        
        # Shape [1, n]: one row holding the n source titles
        sources_tensor = pb_utils.Tensor(
            "sources",
            np.array([result["sources"]], dtype=np.object_)
        )
        
        pipeline_info_tensor = pb_utils.Tensor(
            "pipeline_info",
            np.array([[json.dumps(result["pipeline_info"])]], dtype=np.object_)
        )
        
        return pb_utils.InferenceResponse(
            output_tensors=[
                answer_tensor, confidence_tensor,
                sources_tensor, pipeline_info_tensor
            ]
        )

    def _create_error_response(self, error_msg: str):
        """
        Build an error response
        """
        return pb_utils.InferenceResponse(
            output_tensors=[
                pb_utils.Tensor("answer", np.array([[f"Error: {error_msg}"]], dtype=np.object_)),
                pb_utils.Tensor("confidence", np.array([[0.0]], dtype=np.float32)),
                pb_utils.Tensor("sources", np.empty((1, 0), dtype=np.object_)),
                pb_utils.Tensor("pipeline_info", np.array([[json.dumps({"error": error_msg})]], dtype=np.object_))
            ]
        )

    def finalize(self):
        """
        Clean up resources
        """
        self.executor.shutdown(wait=True)
        
        total_time = time.time() - self.stats["start_time"]
        print("🏁 Advanced pipeline finished")
        print("📊 Statistics:")
        print(f"   Total requests: {self.stats['total_requests']}")
        print(f"   Succeeded: {self.stats['successful_requests']}")
        print(f"   Failed: {self.stats['failed_requests']}")
        print(f"   Success rate: {self.stats['successful_requests']/max(self.stats['total_requests'], 1)*100:.1f}%")


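# Helper classes
class SimpleRetriever:
    def __init__(self):
        # Mock document store
        self.documents = [
            {"id": 1, "title": "Python Basics", "content": "Python is an interpreted programming language..."},
            {"id": 2, "title": "Introduction to Machine Learning", "content": "Machine learning is a subfield of artificial intelligence..."},
            {"id": 3, "title": "Deep Learning Frameworks", "content": "PyTorch and TensorFlow are popular frameworks..."},
            {"id": 4, "title": "Natural Language Processing", "content": "NLP covers computational methods for processing human language..."},
            {"id": 5, "title": "Transformer Architecture", "content": "The attention mechanism is the core of the Transformer..."}
        ]
    
    def retrieve(self, query: str, top_k: int = 5):
        # Simple keyword matching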
        scored_docs = []
        query_lower = query.lower()
        
        for doc in self.documents:
            score = 0
            if query_lower in doc["title"].lower():
                score += 2
            if query_lower in doc["content"].lower():
                score += 1
            
            scored_docs.append((doc, score))
        
        # Sort and return the top_k
        scored_docs.sort(key=lambda x: x[1], reverse=True)
        return [doc for doc, score in scored_docs[:top_k]]


class SimpleReranker:
    def rerank(self, query: str, documents: List[dict]):
        # Simple rerank: by title relevance
        query_words = set(query.lower().split())
        
        scored_docs = []
        for doc in documents:
            title_words = set(doc["title"].lower().split())
            overlap = len(query_words & title_words)
            scored_docs.append((doc, overlap))
        
        scored_docs.sort(key=lambda x: x[1], reverse=True)
        return [doc for doc, score in scored_docs]


class SimpleGenerator:
    def generate(self, query: str, documents: List[dict], config: dict):
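        # Simple generation: template based
        if not documents:
            return {
                "answer": "Sorry, no relevant information was found.",
                "confidence": 0.1
            }
        
        # Generate the answer from the top document
        top_doc = documents[0]
        answer = f"According to '{top_doc['title']}', {top_doc['content'][:100]}..."
        
        # Compute the confidence (grows with the number of documents, capped at 0.9)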
        confidence = min(0.9, 0.3 + 0.1 * len(documents))
        
        return {
            "answer": answer,
            "confidence": confidence
        }


class PostProcessor:
    def process(self, generation_result: dict, config: dict):
        # Post-processing: formatting and cleanup
        answer = generation_result["answer"]
        
        # Apply the config
        if config.get("format") == "markdown":
            answer = f"**Answer:** {answer}"
        
        if config.get("max_length"):
            max_len = config["max_length"]
            if len(answer) > max_len:
                answer = answer[:max_len-3] + "..."
        
        return {
            "answer": answer,
            "confidence": generation_result["confidence"]
        }
'''

with open(f"{ADVANCED_PIPELINE_DIR}/1/model.py", "w", encoding="utf-8") as f:
    f.write(advanced_pipeline_code)

print("✅ Advanced pipeline code created")
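
`_execute_pipeline` repeats the same timing code around each of its four steps. One possible way to factor that out is a small context manager that appends a record to the `steps` list when the step finishes. This is a sketch only: `timed_step` is a hypothetical helper, and the two steps shown are placeholders rather than the lab's retriever and reranker.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


@contextmanager
def timed_step(steps: List[Dict[str, Any]], name: str) -> Iterator[Dict[str, Any]]:
    """Append {"name", "time_ms", ...} to `steps` when the block exits."""
    record: Dict[str, Any] = {"name": name}
    start = time.perf_counter()
    try:
        # The caller may attach extra fields to the record.
        yield record
    finally:
        record["time_ms"] = (time.perf_counter() - start) * 1000
        steps.append(record)


steps: List[Dict[str, Any]] = []
with timed_step(steps, "retrieve") as info:
    docs = ["doc-a", "doc-b"]
    info["output_count"] = len(docs)
with timed_step(steps, "rerank"):
    docs = sorted(docs, reverse=True)

print([s["name"] for s in steps])
```

Because the record is appended in a `finally` clause, a step that raises still leaves its timing in the list, which helps when diagnosing a failed pipeline.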

## 🎯 Experiment 3: Asynchronous Processing Backend

In [None]:
# Create the async processor model directory
ASYNC_BACKEND_DIR = f"{MODEL_REPO}/async_processor"
os.makedirs(f"{ASYNC_BACKEND_DIR}/1", exist_ok=True)

# Async processor configuration
async_config = '''
name: "async_processor"
backend: "python"
max_batch_size: 16

input [
  {
    name: "requests"
    data_type: TYPE_STRING
    dims: [ 1 ]
  },
  {
    name: "async_config"
    data_type: TYPE_STRING
    dims: [ 1 ]
    optional: true
  }
]

output [
  {
    name: "results"
    data_type: TYPE_STRING
    dims: [ 1 ]
  },
  {
    name: "status"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]

dynamic_batching {
  max_queue_delay_microseconds: 500
  default_queue_policy {
    timeout_action: DELAY
    default_timeout_microseconds: 1000
  }
}
'''

with open(f"{ASYNC_BACKEND_DIR}/config.pbtxt", "w") as f:
    f.write(async_config)

print("✅ Async processor configuration created")

In [None]:
# Async processor implementation
async_backend_code = '''
import json
import time
import asyncio
import aiohttp
import numpy as np
from typing import List, Dict, Any, Optional
from concurrent.futures import ThreadPoolExecutor, as_completed
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """
    Asynchronous processing Backend
    Supports concurrent API calls and batch processing
    """

    def initialize(self, args):
        """
        Initialize the model
        """
        self.model_config = json.loads(args['model_config'])
        
        # Create the event loop (in a new thread)
        self.loop = None
        self.loop_thread = None
        self._start_event_loop()
        
        # HTTP client session
        self.session = None
        
        # Thread pool
        self.executor = ThreadPoolExecutor(max_workers=8)
        
        # Statistics
        self.stats = {
            "total_requests": 0,
            "async_requests": 0,
            "batch_requests": 0,
            "avg_batch_size": 0.0,
            "total_async_time": 0.0
        }
        
        print("🔄 Async processor Backend initialized")
        print("⚡ Supports concurrent API calls and batch processing")

    def _start_event_loop(self):
        """
        Start the event loop in a new thread
        """
        import threading
        
        ready = threading.Event()
        
        def run_loop():
            self.loop = asyncio.new_event_loop()
            asyncio.set_event_loop(self.loop)
            # Signal readiness once the loop is actually running
            self.loop.call_soon(ready.set)
            self.loop.run_forever()
        
        self.loop_thread = threading.Thread(target=run_loop, daemon=True)
        self.loop_thread.start()
        
        # Wait for the event loop to start instead of sleeping a fixed time
        ready.wait(timeout=5)

    def execute(self, requests):
        """
        Run asynchronous processing
        """
        batch_size = len(requests)
        self.stats["total_requests"] += batch_size
        self.stats["batch_requests"] += 1
        self.stats["avg_batch_size"] = self.stats["total_requests"] / self.stats["batch_requests"]
        
        # Parse all requests
        parsed_requests = []
        for request in requests:
            try:
                # BYTES elements arrive as bytes, so decode them explicitly
                raw_request = pb_utils.get_input_tensor_by_name(
                    request, "requests"
                ).as_numpy().flatten()[0]
                request_data = (
                    raw_request.decode("utf-8")
                    if isinstance(raw_request, bytes) else str(raw_request)
                )
                
                config_data = {}
                config_tensor = pb_utils.get_input_tensor_by_name(
                    request, "async_config"
                )
                if config_tensor is not None:
                    try:
                        raw_config = config_tensor.as_numpy().flatten()[0]
                        if isinstance(raw_config, bytes):
                            raw_config = raw_config.decode("utf-8")
                        config_data = json.loads(raw_config)
                    except (ValueError, TypeError):
                        config_data = {}
                
                parsed_requests.append({
                    "data": json.loads(request_data),
                    "config": config_data
                })
                
            except Exception as e:
                parsed_requests.append({
                    "data": {"error": str(e)},
                    "config": {}
                })
        
        # Run the asynchronous processing
        start_time = time.time()
        
        if self.loop and not self.loop.is_closed():
            # Use the asynchronous path
            future = asyncio.run_coroutine_threadsafe(
                self._async_process_batch(parsed_requests), self.loop
            )
            results = future.result(timeout=30)  # 30-second timeout
            self.stats["async_requests"] += batch_size
        else:
            # Fall back to synchronous processing
            results = self._sync_process_batch(parsed_requests)
        
        async_time = time.time() - start_time
        self.stats["total_async_time"] += async_time
        
        # Build the responses (shape [1, 1]: batch dimension first)
        responses = []
        for result in results:
            result_tensor = pb_utils.Tensor(
                "results",
                np.array([[json.dumps(result["result"])]], dtype=np.object_)
            )
            
            status_tensor = pb_utils.Tensor(
                "status",
                np.array([[result["status"]]], dtype=np.object_)
            )
            
            response = pb_utils.InferenceResponse(
                output_tensors=[result_tensor, status_tensor]
            )
            responses.append(response)
        
        return responses

    async def _async_process_batch(self, requests: List[dict]) -> List[dict]:
        """
        Asynchronous batch processing
        """
        # Create the HTTP session
        if self.session is None:
            connector = aiohttp.TCPConnector(limit=100)
            timeout = aiohttp.ClientTimeout(total=30)
            self.session = aiohttp.ClientSession(
                connector=connector, timeout=timeout
            )
        
        # Create the asynchronous tasks
        tasks = []
        for req in requests:
            if req["data"].get("type") == "api_call":
                task = self._async_api_call(req["data"], req["config"])
            elif req["data"].get("type") == "computation":
                task = self._async_computation(req["data"], req["config"])
            else:
                task = self._async_default_process(req["data"], req["config"])
            
            tasks.append(task)
        
        # Run all tasks concurrently
        results = await asyncio.gather(*tasks, return_exceptions=True)
        
        # Collect the results
        processed_results = []
        for i, result in enumerate(results):
            if isinstance(result, Exception):
                processed_results.append({
                    "result": {"error": str(result)},
                    "status": "error"
                })
            else:
                processed_results.append({
                    "result": result,
                    "status": "success"
                })
        
        return processed_results

    async def _async_api_call(self, data: dict, config: dict) -> dict:
        """
        Asynchronous API call
        """
        url = data.get("url", "https://httpbin.org/delay/1")
        method = data.get("method", "GET").upper()
        payload = data.get("payload", {})
        
        try:
            if method == "GET":
                async with self.session.get(url, params=payload) as response:
                    result = await response.json()
            else:
                async with self.session.post(url, json=payload) as response:
                    result = await response.json()
            
            return {
                "api_result": result,
                "status_code": response.status,
                "url": url
            }
            
        except Exception as e:
            return {
                "error": str(e),
                "url": url
            }

    async def _async_computation(self, data: dict, config: dict) -> dict:
        """
        Asynchronous computation task
        """
        operation = data.get("operation", "sum")
        numbers = data.get("numbers", [1, 2, 3, 4, 5])
        
        # Simulate computation latency
        await asyncio.sleep(0.1)
        
        if operation == "sum":
            result = sum(numbers)
        elif operation == "product":
            result = 1
            for n in numbers:
                result *= n
        elif operation == "average":
            result = sum(numbers) / len(numbers) if numbers else 0
        else:
            result = len(numbers)
        
        return {
            "operation": operation,
            "result": result,
            "input_count": len(numbers)
        }

    async def _async_default_process(self, data: dict, config: dict) -> dict:
        """
        Default asynchronous processing
        """
        # Simulate processing time
        delay = config.get("delay", 0.05)
        await asyncio.sleep(delay)
        
        return {
            "processed_data": data,
            "processing_time": delay,
            "timestamp": time.time()
        }

    def _sync_process_batch(self, requests: List[dict]) -> List[dict]:
        """
        Synchronous batch processing (fallback path)
        """
        results = []
        
        with ThreadPoolExecutor(max_workers=4) as executor:
            futures = [
                executor.submit(self._sync_process_single, req)
                for req in requests
            ]
            
            # Collect results in submission order so that results[i] matches
            # requests[i]; as_completed() would yield them in completion order.
            for future in futures:
                try:
                    result = future.result(timeout=10)
                    results.append({
                        "result": result,
                        "status": "success"
                    })
                except Exception as e:
                    results.append({
                        "result": {"error": str(e)},
                        "status": "error"
                    })
        
        return results

    def _sync_process_single(self, request: dict) -> dict:
        """
        Process a single request synchronously
        """
        data = request["data"]
        config = request["config"]
        
        # Simulate processing
        time.sleep(config.get("delay", 0.1))
        
        return {
            "sync_processed": data,
            "timestamp": time.time()
        }

    def finalize(self):
        """
        Clean up resources
        """
        # Close the HTTP session
        if self.session and not self.session.closed:
            asyncio.run_coroutine_threadsafe(
                self.session.close(), self.loop
            ).result(timeout=5)
        
        # Stop the event loop and wait for its thread to exit
        if self.loop and not self.loop.is_closed():
            self.loop.call_soon_threadsafe(self.loop.stop)
            if self.loop_thread is not None:
                self.loop_thread.join(timeout=5)
        
        # Shut down the thread pool
        self.executor.shutdown(wait=True)
        
        print("🏁 Async processor Backend finished")
        print("📊 Processing statistics:")
        print(f"   Total requests: {self.stats['total_requests']}")
        print(f"   Async requests: {self.stats['async_requests']}")
        print(f"   Batches: {self.stats['batch_requests']}")
        print(f"   Average batch size: {self.stats['avg_batch_size']:.1f}")
        if self.stats['async_requests'] > 0:
            avg_async_time = self.stats['total_async_time'] / self.stats['batch_requests']
            print(f"   Average async processing time: {avg_async_time:.3f}s")
'''

with open(f"{ASYNC_BACKEND_DIR}/1/model.py", "w", encoding="utf-8") as f:
    f.write(async_backend_code)

print("✅ Async processor code created")
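
The core pattern in this backend is a synchronous `execute` that hands coroutines to an event loop running in a background thread. The standalone sketch below shows that pattern with only the standard library. `BackgroundLoop`, `square_after`, and `process_batch` are hypothetical names, and the sleeps stand in for real I/O. It also shows that `asyncio.gather` returns results in input order even when later tasks finish first, which is what keeps responses aligned with requests.

```python
import asyncio
import threading
from typing import List


class BackgroundLoop:
    """Run an asyncio event loop in a daemon thread (illustrative helper)."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        self._ready.wait()  # Block until the loop is actually running.

    def _run(self) -> None:
        asyncio.set_event_loop(self.loop)
        self.loop.call_soon(self._ready.set)
        self.loop.run_forever()

    def run(self, coro, timeout: float = 5.0):
        """Submit a coroutine from synchronous code and wait for its result."""
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result(timeout)

    def close(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


async def square_after(delay: float, n: int) -> int:
    await asyncio.sleep(delay)
    return n * n


async def process_batch(items: List[int]) -> List[int]:
    # Later items get shorter delays and finish first,
    # yet gather() still returns results in input order.
    tasks = [square_after(0.03 - 0.01 * i, n) for i, n in enumerate(items)]
    return await asyncio.gather(*tasks)


bg = BackgroundLoop()
results = bg.run(process_batch([1, 2, 3]))
bg.close()
print(results)  # [1, 4, 9]
```

Waiting on a `threading.Event` avoids guessing how long the loop needs to start, and joining the thread before closing the loop gives a clean shutdown.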

## 🧪 Testing and Validation

### Test Client Code

In [None]:
# Python Backend test client
class PythonBackendTester:
    def __init__(self, server_url="localhost:8000"):
        self.server_url = server_url
        self.client = httpclient.InferenceServerClient(
            url=server_url, verbose=False
        )
    
    def test_basic_backend(self):
        """Test the basic Python Backend"""
        print("🧪 Testing the basic Python Backend...")
        
        # Prepare the test data
        test_cases = [
            {
                "text": "Hello World",
                "params": {"type": "uppercase"}
            },
            {
                "text": "Python Programming",
                "params": {"type": "reverse"}
            },
            {
                "text": "Machine Learning is Amazing",
                "params": {"type": "word_count"}
            }
        ]
        
        for i, case in enumerate(test_cases):
            try:
                # Create the inputs. The model sets max_batch_size > 0, so
                # the shape is [batch, 1] rather than [1].
                text_input = httpclient.InferInput(
                    "text_input", [1, 1], "BYTES"
                )
                text_input.set_data_from_numpy(
                    np.array([[case["text"]]], dtype=np.object_)
                )
                
                params_input = httpclient.InferInput(
                    "parameters", [1, 1], "BYTES"
                )
                params_input.set_data_from_numpy(
                    np.array([[json.dumps(case["params"])]], dtype=np.object_)
                )
                
                # Request the outputs
                outputs = [
                    httpclient.InferRequestedOutput("processed_text"),
                    httpclient.InferRequestedOutput("metadata")
                ]
                
                # Send the request
                response = self.client.infer(
                    "custom_pipeline",
                    inputs=[text_input, params_input],
                    outputs=outputs
                )
                
                # Parse the results (flatten handles the batch dimension)
                processed_text = response.as_numpy("processed_text").flatten()[0].decode("utf-8")
                metadata = json.loads(
                    response.as_numpy("metadata").flatten()[0].decode("utf-8")
                )
                
                print(f"📝 Test case {i+1}:")
                print(f"   Input: '{case['text']}'")
                print(f"   Process type: {case['params']['type']}")
                print(f"   Result: '{processed_text}'")
                print(f"   Processing time: {metadata['processing_time_ms']:.2f}ms")
                print("")
                
            except Exception as e:
                print(f"❌ Test case {i+1} failed: {str(e)}")
    
    def test_advanced_pipeline(self):
        """Test the advanced pipeline"""
        print("🧪 Testing the advanced pipeline...")
        
        try:
            # Prepare the test data
            query = "What is machine learning?"
            config = {
                "top_k": 3,
                "format": "markdown",
                "max_length": 200
            }
            
            # Create the inputs with shape [batch, 1]
            query_input = httpclient.InferInput("query", [1, 1], "BYTES")
            query_input.set_data_from_numpy(
                np.array([[query]], dtype=np.object_)
            )
            
            config_input = httpclient.InferInput("config", [1, 1], "BYTES")
            config_input.set_data_from_numpy(
                np.array([[json.dumps(config)]], dtype=np.object_)
            )
            
            # Request the outputs
            outputs = [
                httpclient.InferRequestedOutput("answer"),
                httpclient.InferRequestedOutput("confidence"),
                httpclient.InferRequestedOutput("sources"),
                httpclient.InferRequestedOutput("pipeline_info")
            ]
            
            # Send the request
            start_time = time.time()
            response = self.client.infer(
                "advanced_pipeline",
                inputs=[query_input, config_input],
                outputs=outputs
            )
            request_time = (time.time() - start_time) * 1000
            
            # Parse the results (flatten handles the batch dimension)
            answer = response.as_numpy("answer").flatten()[0].decode("utf-8")
            confidence = float(response.as_numpy("confidence").flatten()[0])
            sources = [
                s.decode("utf-8") for s in response.as_numpy("sources").flatten()
            ]
            pipeline_info = json.loads(
                response.as_numpy("pipeline_info").flatten()[0].decode("utf-8")
            )
            
            print(f"❓ Query: '{query}'")
            print(f"💡 Answer: {answer}")
            print(f"🎯 Confidence: {confidence:.2f}")
            print(f"📚 Sources: {', '.join(sources)}")
            print(f"⏱️  Total request time: {request_time:.2f}ms")
            print(f"🔧 Pipeline total time: {pipeline_info['total_time']:.2f}ms")
            print("")
            print("📊 Pipeline step details:")
            for step in pipeline_info['steps']:
                print(f"   {step['name']}: {step['time_ms']:.2f}ms")
                
        except Exception as e:
            print(f"❌ Advanced pipeline test failed: {str(e)}")
    
    def test_async_processor(self):
        """Test the async processor."""
        print("🧪 Testing the async processor...")
        
        # Exercise different kinds of asynchronous tasks
        test_requests = [
            {
                "type": "api_call",
                "url": "https://httpbin.org/delay/0.5",
                "method": "GET"
            },
            {
                "type": "computation",
                "operation": "sum",
                "numbers": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
            },
            {
                "type": "computation",
                "operation": "product",
                "numbers": [2, 3, 4]
            }
        ]
        
        for i, req_data in enumerate(test_requests):
            try:
                # Create the inputs
                request_input = httpclient.InferInput(
                    "requests", [1], "BYTES"
                )
                request_input.set_data_from_numpy(
                    np.array([json.dumps(req_data)], dtype=np.object_)
                )
                
                config_input = httpclient.InferInput(
                    "async_config", [1], "BYTES"
                )
                config_input.set_data_from_numpy(
                    np.array([json.dumps({"delay": 0.1})], dtype=np.object_)
                )
                
                # Create the requested outputs
                outputs = [
                    httpclient.InferRequestedOutput("results"),
                    httpclient.InferRequestedOutput("status")
                ]
                
                # Send the request and measure the client-side round trip
                start_time = time.time()
                response = self.client.infer(
                    "async_processor",
                    inputs=[request_input, config_input],
                    outputs=outputs
                )
                request_time = (time.time() - start_time) * 1000
                
                # Parse the results
                results = json.loads(response.as_numpy("results")[0].decode())
                status = response.as_numpy("status")[0].decode()
                
                print(f"🔄 Async task {i+1} ({req_data['type']}):")
                print(f"   Status: {status}")
                print(f"   Request time: {request_time:.2f}ms")
                print(f"   Result: {results}")
                print("")
                
            except Exception as e:
                print(f"❌ Async task {i+1} failed: {str(e)}")


# The tester is not instantiated here because it needs a running Triton server
print("🧪 Python Backend test client is ready")
print("💡 Usage:")
print("   tester = PythonBackendTester()")
print("   tester.test_basic_backend()")
print("   tester.test_advanced_pipeline()")
print("   tester.test_async_processor()")
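
The tests above pass strings to Triton as `BYTES` tensors by wrapping them in `np.object_` arrays. As a rough illustration of what the client does with such a tensor when it sends it in binary form, the sketch below mimics the length-prefixed layout (a 4-byte little-endian length followed by the raw bytes of each element) using only the standard library. `pack_bytes_tensor` and `unpack_bytes_tensor` are hypothetical helper names, not part of `tritonclient`; in real code `set_data_from_numpy` and `as_numpy` take care of this.

```python
import json
import struct
from typing import List


def pack_bytes_tensor(items: List[str]) -> bytes:
    """Serialize strings as <uint32 little-endian length><UTF-8 bytes> records."""
    out = bytearray()
    for item in items:
        raw = item.encode("utf-8")
        out += struct.pack("<I", len(raw))
        out += raw
    return bytes(out)


def unpack_bytes_tensor(buf: bytes) -> List[str]:
    """Inverse of pack_bytes_tensor: walk the buffer record by record."""
    items = []
    offset = 0
    while offset < len(buf):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        items.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return items


# Round-trip the same kind of payload the basic test sends
params = {"type": "uppercase"}
wire = pack_bytes_tensor(["Hello World", json.dumps(params)])
text, raw_params = unpack_bytes_tensor(wire)
print(text, json.loads(raw_params))
```

This is why every output in the tests is decoded with `.decode()` before use: the server returns raw bytes, and JSON-carrying tensors need an extra `json.loads` on top.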

## 📊 Performance Monitoring and Debugging

In [None]:
# Monitoring and debugging helper for Python Backend models
class PythonBackendMonitor:
    def __init__(self, server_url="localhost:8000"):
        self.server_url = server_url
        self.client = httpclient.InferenceServerClient(url=server_url)
    
    def get_model_status(self, model_name):
        """Print the model configuration and inference statistics."""
        try:
            model_config = self.client.get_model_config(model_name)
            model_stats = self.client.get_inference_statistics(model_name)
            
            # Each instance_group entry carries its own `count`, so sum them
            # instead of counting the groups
            instance_count = sum(
                int(group.get('count', 1))
                for group in model_config.get('instance_group', [])
            )
            
            print(f"📋 Model: {model_name}")
            print(f"🏷️  Backend: {model_config['backend']}")
            print(f"📈 Max batch size: {model_config['max_batch_size']}")
            print(f"🔢 Instance count: {instance_count}")
            
            if model_stats.get('model_stats'):
                stats = model_stats['model_stats'][0]
                success = stats['inference_stats']['success']
                success_count = int(success['count'])
                print("📊 Inference statistics:")
                print(f"   Inference count: {stats['inference_count']}")
                print(f"   Successful requests: {success_count}")
                if success_count > 0:
                    # `ns` is the cumulative time of successful requests in nanoseconds
                    avg_time = int(success['ns']) / success_count / 1_000_000
                    print(f"   Average latency: {avg_time:.2f}ms")
            
            return True
            
        except Exception as e:
            print(f"❌ Could not get model status: {str(e)}")
            return False
    
    def benchmark_model(self, model_name, test_data, num_requests=10, concurrency=1):
        """Run a simple latency/throughput benchmark against one model."""
        print(f"🚀 Starting benchmark: {model_name}")
        print(f"📋 Configuration: {num_requests} requests, concurrency {concurrency}")
        
        results = {
            "successful_requests": 0,
            "failed_requests": 0,
            "response_times": [],
            "total_time": 0
        }
        
        start_time = time.time()
        
        # Requests are sent serially; `concurrency` is reported above but not
        # used yet (this loop can be extended to run concurrently)
        for i in range(num_requests):
            try:
                request_start = time.time()
                
                # Prepare inputs according to the model
                if model_name == "custom_pipeline":
                    inputs = self._prepare_basic_inputs(test_data)
                    outputs = ["processed_text", "metadata"]
                elif model_name == "advanced_pipeline":
                    inputs = self._prepare_advanced_inputs(test_data)
                    outputs = ["answer", "confidence", "sources", "pipeline_info"]
                elif model_name == "async_processor":
                    inputs = self._prepare_async_inputs(test_data)
                    outputs = ["results", "status"]
                else:
                    # Unknown model: report it once instead of skipping silently
                    print(f"❌ Unsupported model: {model_name}")
                    break
                
                # Create the requested-output objects
                output_objects = [httpclient.InferRequestedOutput(name) for name in outputs]
                
                # Send the request
                response = self.client.infer(
                    model_name, inputs=inputs, outputs=output_objects
                )
                
                request_time = (time.time() - request_start) * 1000
                results["response_times"].append(request_time)
                results["successful_requests"] += 1
                
            except Exception as e:
                print(f"❌ Request {i+1} failed: {str(e)}")
                results["failed_requests"] += 1
        
        results["total_time"] = (time.time() - start_time) * 1000
        
        # Compute summary statistics
        if results["response_times"]:
            avg_latency = np.mean(results["response_times"])
            p95_latency = np.percentile(results["response_times"], 95)
            p99_latency = np.percentile(results["response_times"], 99)
            throughput = results["successful_requests"] / (results["total_time"] / 1000)
            
            print("\n📊 Benchmark results:")
            print(f"   Successful requests: {results['successful_requests']}/{num_requests}")
            print(f"   Success rate: {results['successful_requests']/num_requests*100:.1f}%")
            print(f"   Average latency: {avg_latency:.2f}ms")
            print(f"   P95 latency: {p95_latency:.2f}ms")
            print(f"   P99 latency: {p99_latency:.2f}ms")
            print(f"   Throughput: {throughput:.2f} QPS")
            print(f"   Total test time: {results['total_time']:.2f}ms")
        
        return results
    
    def _prepare_basic_inputs(self, test_data):
        """Build the inputs for the basic model."""
        text_input = httpclient.InferInput("text_input", [1], "BYTES")
        text_input.set_data_from_numpy(
            np.array([test_data.get("text", "Hello World")], dtype=np.object_)
        )
        
        params = test_data.get("params", {"type": "uppercase"})
        params_input = httpclient.InferInput("parameters", [1], "BYTES")
        params_input.set_data_from_numpy(
            np.array([json.dumps(params)], dtype=np.object_)
        )
        
        return [text_input, params_input]
    
    def _prepare_advanced_inputs(self, test_data):
        """Build the inputs for the advanced pipeline."""
        query_input = httpclient.InferInput("query", [1], "BYTES")
        query_input.set_data_from_numpy(
            # Default query: "What is machine learning?"
            np.array([test_data.get("query", "什麼是機器學習？")], dtype=np.object_)
        )
        
        config = test_data.get("config", {"top_k": 3})
        config_input = httpclient.InferInput("config", [1], "BYTES")
        config_input.set_data_from_numpy(
            np.array([json.dumps(config)], dtype=np.object_)
        )
        
        return [query_input, config_input]
    
    def _prepare_async_inputs(self, test_data):
        """Build the inputs for the async processor."""
        default_request = {
            "type": "computation", "operation": "sum", "numbers": [1, 2, 3, 4, 5]
        }
        request_input = httpclient.InferInput("requests", [1], "BYTES")
        request_input.set_data_from_numpy(
            np.array([json.dumps(test_data.get("request", default_request))], dtype=np.object_)
        )
        
        async_config = test_data.get("async_config", {"delay": 0.05})
        config_input = httpclient.InferInput("async_config", [1], "BYTES")
        config_input.set_data_from_numpy(
            np.array([json.dumps(async_config)], dtype=np.object_)
        )
        
        return [request_input, config_input]


print("📊 Python Backend monitoring tool is ready")
print("💡 Usage:")
print("   monitor = PythonBackendMonitor()")
print("   monitor.get_model_status('custom_pipeline')")
print("   monitor.benchmark_model('custom_pipeline', {'text': 'Test'}, num_requests=10)")
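
`benchmark_model` accepts a `concurrency` argument but sends its requests one after another. One possible way to extend it is sketched below with a thread pool. `send_request` stands in for any callable that issues a single inference (for example a closure around `client.infer`); `run_concurrent_benchmark`, `percentile`, and `fake_request` are hypothetical names used only for this illustration. The `tritonclient` documentation describes client objects as not thread-safe, so a real version would probably give each worker its own client.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def run_concurrent_benchmark(
    send_request: Callable[[int], None],
    num_requests: int = 20,
    concurrency: int = 4,
) -> Dict[str, float]:
    """Issue num_requests calls from `concurrency` worker threads and time each one."""
    latencies: List[float] = []
    failures = 0

    def timed_call(index: int) -> float:
        start = time.perf_counter()
        send_request(index)
        return (time.perf_counter() - start) * 1000.0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, i) for i in range(num_requests)]
        for future in futures:
            try:
                latencies.append(future.result())
            except Exception:
                failures += 1
    wall_seconds = time.perf_counter() - wall_start

    return {
        "successful_requests": len(latencies),
        "failed_requests": failures,
        "avg_ms": sum(latencies) / len(latencies) if latencies else 0.0,
        "p95_ms": percentile(latencies, 95) if latencies else 0.0,
        "throughput_qps": len(latencies) / wall_seconds if wall_seconds > 0 else 0.0,
    }


def fake_request(index: int) -> None:
    """Stand-in for a real inference call: sleeps, and fails every fifth request."""
    time.sleep(0.01)
    if index % 5 == 4:
        raise RuntimeError("simulated failure")


report = run_concurrent_benchmark(fake_request, num_requests=10, concurrency=2)
print(report)
```

Latencies are measured per request inside the worker, while throughput uses wall-clock time for the whole run, so the two numbers stay meaningful when requests overlap.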

## 🎯 Experiment 4: Deployment and Validation

In [None]:
# Create the deployment script.
# A raw string keeps the shell line continuations (backslash + newline) intact,
# and the shebang has to be the very first line of the file to take effect.
deployment_script = r'''#!/bin/bash

echo "🚀 Deploying Python Backend models..."

# Set environment variables
export MODEL_REPOSITORY="/opt/tritonserver/models"
export TRITON_LOG_LEVEL="INFO"

# Inspect the model repository
echo "📋 Checking the model repository layout..."
ls -la "$MODEL_REPOSITORY/"

# Check each Python Backend model
for model in "custom_pipeline" "advanced_pipeline" "async_processor"; do
    if [ -d "$MODEL_REPOSITORY/$model" ]; then
        echo "✅ Found model: $model"
        ls -la "$MODEL_REPOSITORY/$model/"
        
        # Check the model files
        if [ -f "$MODEL_REPOSITORY/$model/config.pbtxt" ]; then
            echo "  📄 Config file present"
        else
            echo "  ❌ Config file missing"
        fi
        
        if [ -f "$MODEL_REPOSITORY/$model/1/model.py" ]; then
            echo "  🐍 Python model present"
        else
            echo "  ❌ Python model missing"
        fi
        echo ""
    else
        echo "❌ Model directory does not exist: $model"
    fi
done

# Start the Triton server
echo "🖥️  Starting the Triton server..."
tritonserver \
    --model-repository="$MODEL_REPOSITORY" \
    --backend-config=python,shm-default-byte-size=134217728 \
    --log-verbose=1 \
    --allow-http=true \
    --allow-grpc=true \
    --allow-metrics=true
'''

with open("/tmp/deploy_python_backends.sh", "w") as f:
    f.write(deployment_script)

# Make the script executable
os.chmod("/tmp/deploy_python_backends.sh", 0o755)

print("✅ Deployment script created: /tmp/deploy_python_backends.sh")
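
The deployment script checks that every model directory contains a `config.pbtxt` and a `1/model.py`. The same check can be run from the notebook before starting the server. The following is a minimal sketch using `pathlib`; `check_model_layout` is a hypothetical helper, and the demo builds a throwaway repository in a temporary directory rather than touching `/opt/tritonserver/models`.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def check_model_layout(
    repo: Path, models: Iterable[str], version: str = "1"
) -> Dict[str, List[str]]:
    """Return, for each model, the required entries that are missing."""
    problems: Dict[str, List[str]] = {}
    for name in models:
        model_dir = repo / name
        if not model_dir.is_dir():
            problems[name] = ["<model directory>"]
            continue
        required = [Path("config.pbtxt"), Path(version) / "model.py"]
        problems[name] = [
            rel.as_posix() for rel in required if not (model_dir / rel).is_file()
        ]
    return problems


# Demo on a temporary repository: one complete model, one empty, one absent
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "custom_pipeline" / "1").mkdir(parents=True)
    (repo / "custom_pipeline" / "config.pbtxt").write_text('backend: "python"\n')
    (repo / "custom_pipeline" / "1" / "model.py").write_text("# placeholder\n")
    (repo / "async_processor").mkdir()
    report = check_model_layout(
        repo, ["custom_pipeline", "async_processor", "advanced_pipeline"]
    )

for model, missing in report.items():
    print(model, "OK" if not missing else f"missing: {missing}")
```

An empty list means the layout looks complete; it says nothing about whether `model.py` or `config.pbtxt` are themselves valid, which only the server log can confirm.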

In [None]:
# Create the validation script.
# The shebang has to be the very first line of the file to take effect.
validation_script = '''#!/usr/bin/env python3

import time
import json
import requests

def wait_for_server(url="http://localhost:8000/v2/health/ready", timeout=60):
    """Wait until the Triton server reports ready."""
    print("⏳ Waiting for the Triton server to become ready...")
    
    start_time = time.time()
    while time.time() - start_time < timeout:
        try:
            response = requests.get(url, timeout=5)
            if response.status_code == 200:
                print("✅ Triton server is ready")
                return True
        except requests.RequestException:
            # Server not reachable yet; retry after a short pause
            pass
        
        time.sleep(2)
    
    print("❌ Timed out waiting for the server")
    return False

def check_models():
    """Check the readiness of each model."""
    models = ["custom_pipeline", "advanced_pipeline", "async_processor"]
    
    for model in models:
        try:
            response = requests.get(
                f"http://localhost:8000/v2/models/{model}/ready", timeout=5
            )
            if response.status_code == 200:
                print(f"✅ Model ready: {model}")
            else:
                print(f"❌ Model not ready: {model}")
        except Exception as e:
            print(f"❌ Failed to check model {model}: {str(e)}")

def test_basic_inference():
    """Run one inference against the basic model."""
    print("🧪 Testing basic inference...")
    
    payload = {
        "inputs": [
            {
                "name": "text_input",
                "shape": [1],
                "datatype": "BYTES",
                "data": ["Hello Python Backend"]
            },
            {
                "name": "parameters",
                "shape": [1],
                "datatype": "BYTES",
                "data": [json.dumps({"type": "uppercase"})]
            }
        ],
        "outputs": [
            {"name": "processed_text"},
            {"name": "metadata"}
        ]
    }
    
    try:
        response = requests.post(
            "http://localhost:8000/v2/models/custom_pipeline/infer",
            json=payload,
            timeout=30
        )
        
        if response.status_code == 200:
            result = response.json()
            # Look the output up by name rather than relying on its position
            outputs = {o["name"]: o for o in result["outputs"]}
            processed_text = outputs["processed_text"]["data"][0]
            print(f"✅ Basic inference succeeded: '{processed_text}'")
        else:
            print(f"❌ Basic inference failed: {response.status_code}")
            print(response.text)
            
    except Exception as e:
        print(f"❌ Basic inference raised an exception: {str(e)}")

if __name__ == "__main__":
    print("🔍 Python Backend validation started")
    
    # Wait for the server
    if wait_for_server():
        # Check the models
        check_models()
        
        # Test inference
        test_basic_inference()
        
        print("🏁 Validation finished")
    else:
        print("❌ Server not ready, validation aborted")
'''

with open("/tmp/validate_python_backends.py", "w") as f:
    f.write(validation_script)

os.chmod("/tmp/validate_python_backends.py", 0o755)

print("✅ Validation script created: /tmp/validate_python_backends.py")
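
The validation script builds its inference request as a plain JSON document with `name`, `shape`, `datatype`, and `data` fields per input. The sketch below factors that into two small helpers, one to build a `BYTES` input and one to pick an output out of the response by name. `bytes_input` and `output_by_name` are hypothetical names, and `sample_response` is a hand-written stand-in used only to exercise the helper, not real server output.

```python
import json
from typing import Any, Dict, List


def bytes_input(name: str, values: List[str]) -> Dict[str, Any]:
    """Describe a 1-D BYTES input in the JSON form used by the script above."""
    return {"name": name, "shape": [len(values)], "datatype": "BYTES", "data": values}


def output_by_name(result: Dict[str, Any], name: str) -> List[Any]:
    """Return the data of the named output, regardless of its position."""
    for output in result.get("outputs", []):
        if output.get("name") == name:
            return output.get("data", [])
    raise KeyError(f"output {name!r} not present in response")


payload = {
    "inputs": [
        bytes_input("text_input", ["Hello Python Backend"]),
        bytes_input("parameters", [json.dumps({"type": "uppercase"})]),
    ],
    "outputs": [{"name": "processed_text"}, {"name": "metadata"}],
}

# Hand-written stand-in for a response, with the outputs deliberately reordered
sample_response = {
    "outputs": [
        {"name": "metadata", "datatype": "BYTES", "shape": [1], "data": ["{}"]},
        {"name": "processed_text", "datatype": "BYTES", "shape": [1],
         "data": ["HELLO PYTHON BACKEND"]},
    ]
}

print(json.dumps(payload["inputs"][1]))
print(output_by_name(sample_response, "processed_text")[0])
```

Looking outputs up by name avoids depending on the order in which the server happens to return them.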

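## 📚 Best Practices and Troubleshooting

### 🎯 Python Backend Development Best Practices

#### 1. Code Structure
- **Modular design**: break the logic into reusable components
- **Error handling**: catch exceptions and return a clear error response instead of crashing the model
- **Logging**: log enough detail during execution to make debugging practical
- **Configuration management**: make behavior configurable through parameters

#### 2. Performance Optimization
- **Batching**: take advantage of dynamic batching
- **Asynchronous processing**: use coroutines for I/O-bound work
- **Resource pooling**: reuse connections and compute resources
- **Memory management**: release objects as soon as they are no longer needed

#### 3. Security Considerations
- **Input validation**: strictly validate all input data
- **Resource limits**: set reasonable timeouts and resource limits
- **Error messages**: avoid exposing sensitive information
- **Dependency management**: keep dependencies up to date with security fixes

#### 4. Monitoring and Debugging
- **Performance metrics**: collect key performance data
- **Health checks**: implement model health checks
- **Debug mode**: support verbose debugging during development
- **Version management**: keep model versioning clear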
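
As one illustration of the input-validation and error-message points above, the sketch below validates the JSON `parameters` string before it reaches any processing logic. It is only a sketch under assumptions: `parse_parameters`, `ALLOWED_TYPES`, and `MAX_PARAM_BYTES` are hypothetical names, `"uppercase"` is the only processing type taken from this lab's tests, and `"lowercase"` is added purely as an example.

```python
import json
from typing import Any, Dict

# Assumed whitelist; only "uppercase" appears in this lab's tests
ALLOWED_TYPES = {"uppercase", "lowercase"}
# Assumed upper bound on the size of the parameters payload
MAX_PARAM_BYTES = 4096


def parse_parameters(raw: bytes) -> Dict[str, Any]:
    """Decode and validate the parameters tensor, raising ValueError on bad input."""
    if len(raw) > MAX_PARAM_BYTES:
        raise ValueError("parameters payload too large")
    try:
        params = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Keep the message generic so internal details are not leaked to clients
        raise ValueError("parameters must be UTF-8 encoded JSON") from exc
    if not isinstance(params, dict):
        raise ValueError("parameters must be a JSON object")
    kind = params.get("type")
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {kind!r}")
    return params


print(parse_parameters(b'{"type": "uppercase"}'))

for bad in (b"not json", b"[1, 2]", b'{"type": "delete_everything"}'):
    try:
        parse_parameters(bad)
    except ValueError as err:
        print(f"rejected: {err}")
```

Inside a model's `execute` function, a `ValueError` like this would typically be turned into an error response for that one request rather than being allowed to propagate.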

In [None]:
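# Troubleshooting guide
troubleshooting_guide = """
🔧 Python Backend Troubleshooting Guide

Common problems and solutions:

1. Model fails to load
   Problem: the model cannot be loaded or fails during initialization
   Solutions:
   - Check model.py for syntax errors
   - Confirm that all dependencies are installed
   - Check that config.pbtxt is correct
   - Read the Triton server log

2. Inference requests time out
   Problem: inference requests take too long to respond
   Solutions:
   - Optimize the model's execution logic
   - Increase the number of instances
   - Adjust the timeout configuration
   - Use asynchronous processing

3. High memory usage
   Problem: the Python Backend consumes a large amount of memory
   Solutions:
   - Release large objects promptly
   - Avoid memory leaks
   - Optimize data structures
   - Limit the batch size

4. Concurrency problems
   Problem: race conditions appear under high concurrency
   Solutions:
   - Use thread-safe data structures
   - Avoid modifying global variables
   - Use locks where they are needed
   - Design the processing logic to be stateless

5. Dependency conflicts
   Problem: different models need conflicting library versions
   Solutions:
   - Isolate Python environments
   - Create a dedicated execution environment
   - Manage dependency versions consistently
   - Deploy in containers

Debugging tips:
- Use print() or logging to emit debug information
- Set breakpoints (development environments only)
- Check the Triton server log
- Use profiling tools
- Monitor system resource usage

Log commands (which one applies depends on how Triton is deployed):
- docker logs <triton_container_id>
- tail -f /var/log/triton/triton.log   (if the server is configured to log to this file)
- journalctl -u triton-server -f       (if Triton runs as a systemd unit with this name)
"""

print(troubleshooting_guide)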
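
For the concurrency item in the guide, the usual pattern is to guard any shared mutable state with a lock. The sketch below assumes that several threads within one process update the same counters; `RequestStats` is a hypothetical class, not part of the Triton Python Backend API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


class RequestStats:
    """Lock-protected request counter and running latency total."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._count = 0
        self._total_ms = 0.0

    def record(self, elapsed_ms: float) -> None:
        # The read-modify-write on both fields happens under one lock
        with self._lock:
            self._count += 1
            self._total_ms += elapsed_ms

    def snapshot(self) -> Dict[str, float]:
        # Read both fields under the lock so they are mutually consistent
        with self._lock:
            avg = self._total_ms / self._count if self._count else 0.0
            return {"count": self._count, "avg_ms": avg}


stats = RequestStats()
with ThreadPoolExecutor(max_workers=8) as pool:
    # 1000 concurrent updates of 2.0 ms each
    list(pool.map(stats.record, [2.0] * 1000))

print(stats.snapshot())
```

Where possible, the stateless design recommended in the guide is still simpler: if `execute` only reads shared objects and keeps per-request data in local variables, no locking is needed at all.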

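## 📖 Summary

This lab walked through the full development workflow for a custom Python Backend:

### 🎯 Lab Outcomes
1. **Basic Python Backend** - implemented a text-processing pipeline
2. **Advanced pipeline Backend** - built a multi-step inference flow
3. **Async processing Backend** - added concurrent processing
4. **Monitoring and debugging tools** - provided tooling for status checks and benchmarking

### 🔧 Key Technical Points
- Using the Triton Python Backend API
- Asynchronous programming and concurrency
- Performance monitoring and optimization
- Error handling and troubleshooting

### 🚀 Next Steps
1. Deploy to a production environment
2. Integrate more complex business logic
3. Implement A/B testing
4. Add hot model updates

### 💡 Takeaways
- The Python Backend offers the most flexibility among Triton backends, since arbitrary Python logic can run inside the model
- Asynchronous processing can improve throughput for I/O-bound work such as external API calls
- Monitoring and debugging are key to a successful deployment
- Good error handling improves system stability

---

**🎉 Congratulations on completing Lab 2.3.4!**

You have now worked through advanced development techniques for the Triton Python Backend and can build complex inference pipelines and processing logic.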