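# Advanced RAG Techniques (Advanced RAG Fundamentals)

**Corresponding course**: Hung-yi Lee, 2025 Spring ML HW1 (advanced part)

This notebook covers advanced techniques for RAG systems, including Reranking, HyDE, and Query Transformation.

## Learning Objectives
1. Understand the bottlenecks of a RAG system and where it can be optimized
2. Implement Reranking (Cross-encoder)
3. Implement HyDE (Hypothetical Document Embeddings)
4. Learn Query Transformation techniques
5. Learn how to evaluate a RAG system end to end

## Part 1: Bottleneck Analysis of RAG Systems

### 1.1 Problems with Basic RAG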

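```
┌─────────────────────────────────────────────────────────────┐
│                  Common problems in basic RAG               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Retrieval quality                                       │
│     • Bi-encoder semantic matching is not precise enough    │
│     • Queries and documents use different vocabulary        │
│     • Top-K may miss important documents                    │
│                                                             │
│  2. Context quality                                         │
│     • Retrieved documents may be irrelevant                 │
│     • Document order affects how the LLM reads them         │
│     • Context window limits                                 │
│                                                             │
│  3. Generation quality                                      │
│     • The LLM may ignore the retrieved content              │
│     • Multiple sources may not be integrated correctly      │
│     • Hallucinations are still possible                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```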

### 1.2 Advanced RAG Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                      進階 RAG Pipeline                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────── Pre-Retrieval optimization ─────────────┐        │
│  │                                                         │        │
│  │   Original         Rewriting        Multi-query         │        │
│  │  ┌─────────┐     ┌─────────┐     ┌─────────────┐       │        │
│  │  │  Query  │ →   │Rewrite/ │ →   │Multi-Query  │       │        │
│  │  │         │     │ Expand  │     │(HyDE/Step)  │       │        │
│  │  └─────────┘     └─────────┘     └──────┬──────┘       │        │
│  │                                         │               │        │
│  └─────────────────────────────────────────┼───────────────┘        │
│                                            ↓                        │
│  ┌────────────────── Retrieval stage ──────┼───────────────┐        │
│  │                                         │               │        │
│  │                                    ┌────┴────┐          │        │
│  │                                    │ Initial │          │        │
│  │                                    │Retrieval│          │        │
│  │                                    │ (Top-K) │          │        │
│  │                                    └────┬────┘          │        │
│  │                                         │               │        │
│  └─────────────────────────────────────────┼───────────────┘        │
│                                            ↓                        │
│  ┌─────────── Post-Retrieval optimization ─┼───────────────┐        │
│  │                                         │               │        │
│  │   Reranking    Filter/Compress    Context integration    │        │
│  │  ┌─────────┐     ┌─────────┐     ┌─────────────┐       │        │
│  │  │ Cross-  │ →   │ Filter/ │ →   │  Context    │       │        │
│  │  │ Encoder │     │Compress │     │ Formatting  │       │        │
│  │  └─────────┘     └─────────┘     └──────┬──────┘       │        │
│  │                                         │               │        │
│  └─────────────────────────────────────────┼───────────────┘        │
│                                            ↓                        │
│                                       ┌──────────┐                  │
│                                       │   LLM    │                  │
│                                       │ Generate │                  │
│                                       └──────────┘                  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
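
The Post-Retrieval stage in the diagram (Filter/Compress, then Context Formatting) can be read in several ways. The following is a minimal sketch of one possible reading, assuming passages arrive as `(text, score)` pairs. The function name `filter_and_format`, the `min_score` threshold, and the `max_chars` budget are illustrative assumptions, not part of the course material.

```python
from typing import List, Tuple

def filter_and_format(scored_passages: List[Tuple[str, float]],
                      min_score: float = 0.3,
                      max_chars: int = 200) -> str:
    """Drop low-scoring passages, then pack the rest into a numbered context block."""
    # Filter: keep only passages at or above the (assumed) score threshold
    kept = [(text, score) for text, score in scored_passages if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)

    # Compress/format: add passages in score order until the character budget is used up
    lines, used = [], 0
    for rank, (text, _) in enumerate(kept, 1):
        if used + len(text) > max_chars:
            break
        lines.append(f"[{rank}] {text}")
        used += len(text)
    return "\n".join(lines)

passages = [
    ("Deep learning uses multi-layer neural networks.", 0.82),
    ("Pizza is a popular Italian dish.", 0.05),
    ("Machine learning learns patterns from data.", 0.61),
]
context = filter_and_format(passages)
print(context)
```

Numbering the passages gives the LLM stable handles to cite, and putting the highest-scoring passage first is one simple answer to the "document order" problem listed in section 1.1.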

In [None]:
# Environment setup
import torch
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass
import re

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Data structure
@dataclass
class Document:
    content: str
    metadata: Optional[Dict] = None
    
    def __post_init__(self):
        if self.metadata is None:
            self.metadata = {}

## Part 2: Reranking

### 2.1 Bi-encoder vs Cross-encoder

```
┌────────────────────────────────────────────────────────────────┐
│         Bi-encoder vs Cross-encoder comparison                 │
├────────────────────────────┬───────────────────────────────────┤
│        Bi-encoder          │         Cross-encoder            │
├────────────────────────────┼───────────────────────────────────┤
│                            │                                   │
│   Query    Document        │      Query + Document             │
│     │         │            │            │                      │
│     ▼         ▼            │            ▼                      │
│  ┌─────┐  ┌─────┐         │      ┌───────────┐                │
│  │Enc 1│  │Enc 2│         │      │ Encoder   │                │
│  └──┬──┘  └──┬──┘         │      │ (Joint)   │                │
│     │        │            │      └─────┬─────┘                │
│     ▼        ▼            │            │                      │
│   [Vec]    [Vec]          │            ▼                      │
│     └───┬────┘            │        Relevance                  │
│         │                 │         Score                     │
│     Similarity            │                                   │
├────────────────────────────┼───────────────────────────────────┤
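│ + Doc vectors precomputed  │ + More accurate relevance scores  │
│ + Fast (O(1) per document) │ + Models query-doc interaction    │
│ - Less accurate            │ - N forward passes per query      │
│                            │ - Slower                          │
├────────────────────────────┼───────────────────────────────────┤
│ Use: first-stage recall    │ Use: reranking (precision)        │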
└────────────────────────────┴───────────────────────────────────┘
```
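
The cost trade-off in the table can be made concrete with a toy sketch. It assumes random vectors in place of real embeddings and a dot product in place of a real cross-encoder forward pass; `toy_cross_encoder` and the `calls` counter are hypothetical helpers used only to count how often the expensive model would run.

```python
import numpy as np

rng = np.random.default_rng(0)
num_docs, dim = 1000, 64

# Bi-encoder side: document vectors are computed once, offline, and normalized
doc_vectors = rng.normal(size=(num_docs, dim))
doc_vectors /= np.linalg.norm(doc_vectors, axis=1, keepdims=True)

def bi_encoder_search(query_vector: np.ndarray, k: int) -> np.ndarray:
    """A single matrix-vector product scores every document in the corpus."""
    query_vector = query_vector / np.linalg.norm(query_vector)
    scores = doc_vectors @ query_vector
    return np.argsort(scores)[::-1][:k]

calls = {"cross_encoder": 0}

def toy_cross_encoder(query_vector: np.ndarray, doc_vector: np.ndarray) -> float:
    """Stand-in for one joint forward pass over a (query, document) pair."""
    calls["cross_encoder"] += 1
    return float(query_vector @ doc_vector)

query_vector = rng.normal(size=dim)
candidates = bi_encoder_search(query_vector, k=20)   # cheap recall over 1000 docs
reranked = sorted(candidates,
                  key=lambda i: toy_cross_encoder(query_vector, doc_vectors[i]),
                  reverse=True)                      # expensive scoring on 20 docs only
print(f"cross-encoder calls: {calls['cross_encoder']} (corpus size: {num_docs})")
```

The expensive scorer runs once per candidate, so shrinking the candidate set with the bi-encoder first is what keeps the cross-encoder affordable.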

In [None]:
# Cross-encoder reranker implementation
class CrossEncoderReranker:
    """Rerank candidates with a cross-encoder"""
    
    def __init__(self, model_name: str = 'cross-encoder/ms-marco-MiniLM-L-6-v2'):
        try:
            from sentence_transformers import CrossEncoder
            self.model = CrossEncoder(model_name)
            self.available = True
        except ImportError:
            print("Warning: sentence-transformers is not installed; falling back to a simplified implementation")
            self.model = None
            self.available = False
    
    def rerank(self, query: str, documents: List[Document], 
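               top_k: int = 5) -> List[Tuple[Document, float]]:
        """
        Rerank documents
        
        Args:
            query: query text
            documents: list of candidate documents
            top_k: number of documents to return
        
        Returns:
            list of (document, score) pairs sorted by score, highest first
        """
        if not self.available or not documents:
            # Simplified fallback: fraction of query words that appear in the document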
            results = []
            query_words = set(query.lower().split())
            for doc in documents:
                doc_words = set(doc.content.lower().split())
                overlap = len(query_words & doc_words) / len(query_words) if query_words else 0
                results.append((doc, overlap))
            results.sort(key=lambda x: x[1], reverse=True)
            return results[:top_k]
        
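        # Score every (query, document) pair with the cross-encoder
        pairs = [(query, doc.content) for doc in documents]
        scores = self.model.predict(pairs)
        
        # Pair each document with its score (cast to a plain float) and sort
        results = [(doc, float(score)) for doc, score in zip(documents, scores)]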
        results.sort(key=lambda x: x[1], reverse=True)
        
        return results[:top_k]

# Test the reranker
reranker = CrossEncoderReranker()

# Prepare test documents
test_docs = [
    Document(content="Machine learning is a subset of artificial intelligence that focuses on building systems that learn from data."),
    Document(content="The weather forecast predicts sunny skies for the weekend."),
    Document(content="Deep learning uses neural networks with multiple layers to process complex patterns in data."),
    Document(content="Pizza is a popular Italian dish made with dough, tomato sauce, and cheese."),
    Document(content="Natural language processing enables computers to understand human language."),
]

query = "What is deep learning and how does it work?"
reranked = reranker.rerank(query, test_docs, top_k=3)

print(f"Query: {query}\n")
print("Reranked results:")
for i, (doc, score) in enumerate(reranked, 1):
    print(f"  [{i}] Score: {score:.4f}")
    print(f"      {doc.content[:80]}...")

### 2.2 Two-stage Retrieval Pipeline

Two-stage retrieval combines a bi-encoder (recall) with a cross-encoder (precise reranking).

In [None]:
class TwoStageRetriever:
    """Two-stage retriever: bi-encoder recall + cross-encoder reranking"""
    
    def __init__(self, bi_encoder_name: str = 'all-MiniLM-L6-v2',
                 cross_encoder_name: str = 'cross-encoder/ms-marco-MiniLM-L-6-v2'):
        try:
            from sentence_transformers import SentenceTransformer
            self.bi_encoder = SentenceTransformer(bi_encoder_name)
        except ImportError:
            self.bi_encoder = None
        
        self.reranker = CrossEncoderReranker(cross_encoder_name)
        self.documents: List[Document] = []
        self.embeddings: np.ndarray = None
    
    def add_documents(self, documents: List[Document]):
        """Build the index"""
        self.documents.extend(documents)
        
        if self.bi_encoder:
            new_embeddings = self.bi_encoder.encode(
                [doc.content for doc in documents],
                show_progress_bar=False
            )
            if self.embeddings is None:
                self.embeddings = new_embeddings
            else:
                self.embeddings = np.vstack([self.embeddings, new_embeddings])
    
    def retrieve(self, query: str, 
                 initial_k: int = 20, 
                 final_k: int = 5) -> List[Tuple[Document, float]]:
        """
        Two-stage retrieval
        
        Args:
            query: query text
            initial_k: number of candidates recalled in stage 1
            final_k: number of results returned after reranking
        """
        # Stage 1: bi-encoder recall
        if self.bi_encoder and self.embeddings is not None:
            query_embedding = self.bi_encoder.encode([query])[0]
            
            # Cosine similarity between the query and every document
            similarities = np.dot(self.embeddings, query_embedding) / (
                np.linalg.norm(self.embeddings, axis=1) * np.linalg.norm(query_embedding)
            )
            
            # Keep the top-K candidates
            top_indices = np.argsort(similarities)[::-1][:initial_k]
            candidates = [self.documents[i] for i in top_indices]
        else:
            candidates = self.documents[:initial_k]
        
        # Stage 2: cross-encoder reranking
        results = self.reranker.rerank(query, candidates, top_k=final_k)
        
        return results

# Demo
print("Two-stage retriever defined")
print("Stage 1: the bi-encoder quickly recalls a large candidate set")
print("Stage 2: the cross-encoder reranks the candidates precisely")

## Part 3: HyDE (Hypothetical Document Embeddings)

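### 3.1 The HyDE Concept

The core idea of HyDE: have an LLM first generate a "hypothetical answer document", then retrieve with that document instead of the raw query.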

```
┌─────────────────────────────────────────────────────────────────┐
│                         HyDE Pipeline                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Original query                                                │
│  "What causes climate change?"                                 │
│           │                                                     │
│           ▼                                                     │
│   ┌───────────────┐                                            │
│   │     LLM       │ ← "Generate a passage that answers..."     │
│   │  (Zero-shot)  │                                            │
│   └───────┬───────┘                                            │
│           │                                                     │
│           ▼                                                     │
│   Hypothetical Document                                         │
│  "Climate change is primarily caused by greenhouse gas         │
│   emissions from human activities such as burning fossil       │
│   fuels, deforestation..."                                     │
│           │                                                     │
│           ▼                                                     │
│   ┌───────────────┐                                            │
│   │   Embedding   │                                            │
│   └───────┬───────┘                                            │
│           │                                                     │
│           ▼                                                     │
│   ┌───────────────┐    ┌───────────────────────┐               │
│   │   Retrieval   │ ←  │  Document Embeddings  │               │
│   └───────────────┘    └───────────────────────┘               │
│                                                                 │
│   Advantage: the hypothetical doc embeds closer to real docs    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

In [None]:
class HyDERetriever:
    """HyDE: Hypothetical Document Embeddings"""
    
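    def __init__(self, embedding_model=None, llm_generator=None):
        """
        Args:
            embedding_model: model used to embed text (must provide `.encode`)
            llm_generator: LLM callable used to generate the hypothetical document
        """
        if embedding_model is not None:
            # Use the caller-supplied model even when sentence-transformers is absent
            self.embedding_model = embedding_model
        else:
            try:
                from sentence_transformers import SentenceTransformer
                self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
            except ImportError:
                self.embedding_model = None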
        
        self.llm_generator = llm_generator
        self.documents: List[Document] = []
        self.doc_embeddings: np.ndarray = None
        
        # HyDE prompt template
        self.hyde_prompt = """Please write a passage that answers the following question.

Question: {query}

Passage:"""
    
    def add_documents(self, documents: List[Document]):
        """Build the document index"""
        self.documents.extend(documents)
        
        if self.embedding_model:
            new_embeddings = self.embedding_model.encode(
                [doc.content for doc in documents],
                show_progress_bar=False
            )
            if self.doc_embeddings is None:
                self.doc_embeddings = new_embeddings
            else:
                self.doc_embeddings = np.vstack([self.doc_embeddings, new_embeddings])
    
    def generate_hypothetical_document(self, query: str) -> str:
        """Generate a hypothetical document"""
        if self.llm_generator:
            prompt = self.hyde_prompt.format(query=query)
            return self.llm_generator(prompt)
        else:
            # Simplified version: expand the query with a fixed template
            return f"This passage discusses {query}. It provides detailed information about the topic, including relevant facts, explanations, and examples."
    
    def retrieve(self, query: str, k: int = 5, 
                 use_hyde: bool = True) -> List[Tuple[Document, float]]:
        """
        Retrieve with HyDE
        
        Args:
            query: original query
            k: number of documents to return
            use_hyde: whether to use HyDE
        """
        if not self.embedding_model or self.doc_embeddings is None:
            return [(doc, 0.0) for doc in self.documents[:k]]
        
        if use_hyde:
            # Generate the hypothetical document and search with it
            hypothetical_doc = self.generate_hypothetical_document(query)
            query_text = hypothetical_doc
        else:
            query_text = query
        
        # Embed the query text
        query_embedding = self.embedding_model.encode([query_text])[0]
        
        # Cosine similarity against every document
        similarities = np.dot(self.doc_embeddings, query_embedding) / (
            np.linalg.norm(self.doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
        )
        
        # Take the top-K
        top_indices = np.argsort(similarities)[::-1][:k]
        results = [(self.documents[i], float(similarities[i])) for i in top_indices]
        
        return results

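# Demo
print("HyDE Retriever defined")
print("\nAdvantages of HyDE:")
print("1. Queries are usually short, while documents are usually long")
print("2. The hypothetical document bridges this gap")
print("3. Especially well suited to question-answering queries")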
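
The gap-bridging effect can be seen in a deliberately contrived, self-contained sketch. It assumes a bag-of-words "embedding" in place of a neural encoder and a hard-coded `stub_llm` in place of a real LLM call; the two-document corpus is invented for illustration. With this data the raw query shares more words with the unrelated document, while the hypothetical passage shares vocabulary with the relevant one.

```python
import math
import re
from collections import Counter

def bow(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a neural encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

corpus = [
    "Greenhouse gas emissions from burning fossil fuels trap heat in the atmosphere.",
    "Why cats purr is still debated by researchers.",
]

def stub_llm(query: str) -> str:
    """Hypothetical LLM stub: returns a fixed passage regardless of the query."""
    return ("Global warming is driven by greenhouse gas emissions, "
            "mainly from burning fossil fuels.")

def retrieve(text: str):
    """Return (index of the best-matching document, all similarity scores)."""
    vec = bow(text)
    scores = [cosine(vec, bow(doc)) for doc in corpus]
    return max(range(len(corpus)), key=scores.__getitem__), scores

query = "Why is the planet getting warmer?"
plain_best, plain_scores = retrieve(query)            # raw query
hyde_best, hyde_scores = retrieve(stub_llm(query))    # hypothetical document

print("raw query picks:", corpus[plain_best])
print("HyDE picks:     ", corpus[hyde_best])
```

The hypothetical passage does not need to be factually right to help; it only needs to use the vocabulary and style that real answer documents use. A wrong passage can also pull retrieval in the wrong direction, so the result still depends on the quality of the generator.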

In [None]:
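# Visualize the effect of HyDE
def visualize_hyde_effect():
    """Show how HyDE improves the semantic match between query and documents"""
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Simulated vector space (simplified to 2D)
    np.random.seed(42)
    
    # Document vectors (clustered in one region)
    doc_vectors = np.random.randn(10, 2) * 0.3 + np.array([2, 2])
    
    # Original query vector (far from the documents)
    query_vector = np.array([0.5, 0.5])
    
    # HyDE hypothetical-document vector (closer to the documents)
    hyde_vector = np.array([1.8, 1.9])
    
    # Left panel: standard retrieval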
    ax1 = axes[0]
    ax1.scatter(doc_vectors[:, 0], doc_vectors[:, 1], c='blue', s=100, alpha=0.6, label='Documents')
    ax1.scatter(query_vector[0], query_vector[1], c='red', s=200, marker='*', label='Query')
    
    # Draw a line to the nearest document
    distances_standard = np.linalg.norm(doc_vectors - query_vector, axis=1)
    nearest_idx = np.argmin(distances_standard)
    ax1.plot([query_vector[0], doc_vectors[nearest_idx, 0]], 
             [query_vector[1], doc_vectors[nearest_idx, 1]], 'r--', alpha=0.5)
    
    ax1.set_title('Standard Retrieval\n(Query far from documents)', fontsize=12)
    ax1.set_xlabel('Dimension 1')
    ax1.set_ylabel('Dimension 2')
    ax1.legend()
    ax1.set_xlim(-0.5, 3.5)
    ax1.set_ylim(-0.5, 3.5)
    ax1.grid(True, alpha=0.3)
    ax1.text(0.5, 0.2, f'Distance: {distances_standard[nearest_idx]:.2f}', fontsize=10)
    
    # Right panel: HyDE retrieval
    ax2 = axes[1]
    ax2.scatter(doc_vectors[:, 0], doc_vectors[:, 1], c='blue', s=100, alpha=0.6, label='Documents')
    ax2.scatter(query_vector[0], query_vector[1], c='red', s=100, marker='*', alpha=0.3, label='Original Query')
    ax2.scatter(hyde_vector[0], hyde_vector[1], c='green', s=200, marker='*', label='HyDE Document')
    
    # Draw the transformation arrow
    ax2.annotate('', xy=hyde_vector, xytext=query_vector,
                arrowprops=dict(arrowstyle='->', color='purple', lw=2))
    ax2.text(1.0, 1.0, 'LLM\nGeneration', fontsize=9, color='purple')
    
    # Draw a line to the nearest document
    distances_hyde = np.linalg.norm(doc_vectors - hyde_vector, axis=1)
    nearest_idx_hyde = np.argmin(distances_hyde)
    ax2.plot([hyde_vector[0], doc_vectors[nearest_idx_hyde, 0]], 
             [hyde_vector[1], doc_vectors[nearest_idx_hyde, 1]], 'g--', alpha=0.5)
    
    ax2.set_title('HyDE Retrieval\n(Hypothetical doc closer to documents)', fontsize=12)
    ax2.set_xlabel('Dimension 1')
    ax2.set_ylabel('Dimension 2')
    ax2.legend(loc='upper left')
    ax2.set_xlim(-0.5, 3.5)
    ax2.set_ylim(-0.5, 3.5)
    ax2.grid(True, alpha=0.3)
    ax2.text(1.8, 1.6, f'Distance: {distances_hyde[nearest_idx_hyde]:.2f}', fontsize=10)
    
    plt.tight_layout()
    plt.show()

visualize_hyde_effect()

## Part 4: Query Transformation

### 4.1 Multi-Query

Generate several queries from different angles to increase recall.

In [None]:
class MultiQueryRetriever:
    """Multi-query retriever"""
    
    def __init__(self, base_retriever, query_generator=None):
        self.base_retriever = base_retriever
        self.query_generator = query_generator
        
        # Query-generation prompt; {n} is filled in by generate_queries
        self.multi_query_prompt = """Your task is to generate {n} different versions of the given user question 
to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, 
your goal is to help the user overcome some of the limitations of distance-based similarity search.

Provide these alternative questions separated by newlines.

Original question: {query}

Alternative questions:"""
    
    def generate_queries(self, query: str, num_queries: int = 3) -> List[str]:
        """Generate query variants: the original plus up to num_queries - 1 alternatives"""
        if self.query_generator:
            # Ask for exactly as many alternatives as will be kept
            prompt = self.multi_query_prompt.format(query=query, n=num_queries - 1)
            response = self.query_generator(prompt)
            queries = [q.strip() for q in response.split('\n') if q.strip()]
            return [query] + queries[:num_queries-1]
        else:
            # Simplified version: rule-based variants
            variants = [query]
            
            # Variant 1: rephrase
            if '?' in query:
                variants.append(query.replace('?', '').strip() + " explanation")
            
            # Variant 2: add a synonym-style hint
            variants.append(f"detailed information about {query}")
            
            # Variant 3: more specific
            variants.append(f"examples and use cases of {query}")
            
            return variants[:num_queries]
    
    def retrieve(self, query: str, k: int = 5) -> List[Tuple[Document, float]]:
        """Retrieve with multiple queries"""
        # Generate several queries
        queries = self.generate_queries(query)
        
        # Retrieve for each query
        all_results = {}
        for q in queries:
            results = self.base_retriever.retrieve(q, k=k)
            for doc, score in results:
                # Deduplicate by object identity; this assumes the base retriever
                # returns the same Document objects across calls
                doc_id = id(doc)
                if doc_id not in all_results:
                    all_results[doc_id] = (doc, score)
                else:
                    # Keep the highest score
                    _, existing_score = all_results[doc_id]
                    all_results[doc_id] = (doc, max(score, existing_score))
        
        # Sort and return
        results = list(all_results.values())
        results.sort(key=lambda x: x[1], reverse=True)
        
        return results[:k]

# Test multi-query generation
class DummyRetriever:
    def retrieve(self, query, k=5):
        return [(Document(content=f"Result for: {query}"), 0.5)]

multi_query = MultiQueryRetriever(DummyRetriever())
test_query = "How does attention mechanism work in transformers?"
generated = multi_query.generate_queries(test_query)

print(f"Original query: {test_query}\n")
print("Generated query variants:")
for i, q in enumerate(generated, 1):
    print(f"  {i}. {q}")
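
`MultiQueryRetriever.retrieve` merges the per-query results by keeping each document's maximum score. When scores from different queries are not directly comparable, a rank-based merge is a common alternative. The sketch below uses Reciprocal Rank Fusion (RRF), a technique swapped in here for illustration and not used by the class above; the document IDs are made up, and `k=60` is the commonly used smoothing constant.

```python
from collections import defaultdict
from typing import Dict, Hashable, List

def reciprocal_rank_fusion(rankings: List[List[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# One ranked list per query variant (hypothetical document IDs)
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d5", "d1"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Here `d2` comes out ahead of `d1` because it is ranked first by two of the three variants, even though both appear in every list. Documents that only one variant finds still make it into the fused list, which is the recall benefit multi-query is after.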

### 4.2 Step-back Prompting

Abstract a specific question into a more general one, then retrieve broader background knowledge.

In [None]:
class StepBackRetriever:
    """Step-back Prompting retriever"""
    
    def __init__(self, base_retriever, abstractor=None):
        self.base_retriever = base_retriever
        self.abstractor = abstractor
        
        self.stepback_prompt = """Given a specific question, generate a more abstract, higher-level question 
that would help provide background knowledge for answering the original question.

Original question: {query}

Abstract question:"""
    
    def generate_stepback_query(self, query: str) -> str:
        """Generate the abstracted query"""
        if self.abstractor:
            prompt = self.stepback_prompt.format(query=query)
            return self.abstractor(prompt)
        else:
            # Simplified version: extract the core concept by
            # stripping specific details and keeping the main topic
            abstract = query
            
            # Rule-based abstraction
            specific_patterns = [
                (r'in (PyTorch|TensorFlow|Keras)', 'in deep learning frameworks'),
                (r'for (GPT-\d|BERT|LLaMA)', 'for language models'),
                (r'(\d+) layer', 'multi-layer'),
                (r'how to implement', 'what is'),
            ]
            
            for pattern, replacement in specific_patterns:
                abstract = re.sub(pattern, replacement, abstract, flags=re.IGNORECASE)
            
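            # Strip the trailing "?" and a leading question phrase so the whole
            # abstracted topic (not just its last word) goes into the template
            topic = re.sub(r'^(what is|what are|how does|how do)\s+', '',
                           abstract.strip().rstrip('?').strip(), flags=re.IGNORECASE)
            return f"What are the fundamental concepts of {topic}?" if topic else query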
    
    def retrieve(self, query: str, k: int = 5, include_stepback: bool = True) -> List[Tuple[Document, float]]:
        """Retrieve with step-back prompting"""
        results = {}
        
        # Retrieve with the original query
        original_results = self.base_retriever.retrieve(query, k=k)
        for doc, score in original_results:
            results[id(doc)] = (doc, score, 'original')
        
        # Retrieve with the step-back query
        if include_stepback:
            stepback_query = self.generate_stepback_query(query)
            # max(1, ...) keeps at least one step-back result even when k == 1
            stepback_results = self.base_retriever.retrieve(stepback_query, k=max(1, k // 2))
            for doc, score in stepback_results:
                doc_id = id(doc)
                if doc_id not in results:
                    results[doc_id] = (doc, score * 0.8, 'stepback')  # slightly down-weight
        
        # Sort
        final_results = [(doc, score) for doc, score, _ in results.values()]
        final_results.sort(key=lambda x: x[1], reverse=True)
        
        return final_results[:k]

# Test
stepback = StepBackRetriever(DummyRetriever())
specific_query = "How to implement multi-head attention in PyTorch for GPT-2?"
abstract_query = stepback.generate_stepback_query(specific_query)

print(f"Specific query: {specific_query}")
print(f"Abstract query: {abstract_query}")

## Part 5: RAG Evaluation Framework

### 5.1 Evaluation Dimensions

```
┌─────────────────────────────────────────────────────────────┐
│                  RAG evaluation dimensions                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Retrieval Quality                                       │
│     • Precision@K: share of the top K that is relevant      │
│     • Recall@K: share of all relevant docs in the top K     │
│     • MRR: reciprocal rank of the first relevant result     │
│     • NDCG: relevance score that accounts for rank          │
│                                                             │
│  2. Generation Quality                                      │
│     • Faithfulness: is the output grounded in retrieval?    │
│     • Answer Relevance: does it answer the question?        │
│     • Fluency: how fluent the language is                   │
│                                                             │
│  3. End-to-End                                              │
│     • Accuracy: share of correct answers                    │
│     • F1 Score: token overlap with a reference answer       │
│     • Human Evaluation: manual assessment                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
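
The retrieval metrics can be written as formulas. The notation is an assumption made here for compactness: $\mathrm{Rel}$ is the set of relevant documents, $\mathrm{Ret}_K$ the top-$K$ retrieved documents, $rel_i$ the graded relevance of the result at rank $i$, and $Q$ a set of evaluation queries.

$$
\mathrm{Precision@}K = \frac{|\mathrm{Rel} \cap \mathrm{Ret}_K|}{K}
\qquad
\mathrm{Recall@}K = \frac{|\mathrm{Rel} \cap \mathrm{Ret}_K|}{|\mathrm{Rel}|}
$$

$$
\mathrm{RR}(q) = \frac{1}{\text{rank of the first relevant result for } q}
\qquad
\mathrm{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{RR}(q)
$$

$$
\mathrm{DCG@}K = \sum_{i=1}^{K} \frac{rel_i}{\log_2(i + 1)}
\qquad
\mathrm{NDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K}
$$

Here $\mathrm{IDCG@}K$ is the DCG of the same relevance scores sorted in descending order. As a worked case, take five retrieved documents with relevance pattern (relevant, irrelevant, relevant, irrelevant, relevant) and three relevant documents in total: $\mathrm{Precision@}3 = 2/3$, $\mathrm{Recall@}3 = 2/3$, and $\mathrm{RR} = 1$ because the first result is relevant.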

In [None]:
class RAGEvaluator:
    """RAG system evaluator"""
    
    def __init__(self):
        pass
    
    # === Retrieval quality metrics ===
    
    @staticmethod
    def precision_at_k(retrieved: List[Document], relevant: List[Document], k: int) -> float:
        """Precision@K"""
        retrieved_ids = {id(doc) for doc in retrieved[:k]}
        relevant_ids = {id(doc) for doc in relevant}
        
        relevant_retrieved = len(retrieved_ids & relevant_ids)
        return relevant_retrieved / k if k > 0 else 0.0
    
    @staticmethod
    def recall_at_k(retrieved: List[Document], relevant: List[Document], k: int) -> float:
        """Recall@K"""
        retrieved_ids = {id(doc) for doc in retrieved[:k]}
        relevant_ids = {id(doc) for doc in relevant}
        
        relevant_retrieved = len(retrieved_ids & relevant_ids)
        return relevant_retrieved / len(relevant_ids) if relevant_ids else 0.0
    
    @staticmethod
    def mrr(retrieved: List[Document], relevant: List[Document]) -> float:
        """Reciprocal rank for one query (average over queries to get Mean Reciprocal Rank)"""
        relevant_ids = {id(doc) for doc in relevant}
        
        for i, doc in enumerate(retrieved, 1):
            if id(doc) in relevant_ids:
                return 1.0 / i
        return 0.0
    
    @staticmethod
    def ndcg_at_k(retrieved: List[Document], relevance_scores: List[float], k: int) -> float:
        """NDCG@K"""
        def dcg(scores, k):
            return sum(score / np.log2(i + 2) for i, score in enumerate(scores[:k]))
        
        dcg_score = dcg(relevance_scores, k)
        ideal_scores = sorted(relevance_scores, reverse=True)
        idcg_score = dcg(ideal_scores, k)
        
        return dcg_score / idcg_score if idcg_score > 0 else 0.0
    
    # === Generation quality metrics ===
    
    @staticmethod
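    def faithfulness(answer: str, context: str) -> float:
        """
        Faithfulness: does the information in the answer come from the context?
        Simplified implementation: fraction of answer words that also appear in the context
        """
        # Tokenize with a regex so trailing punctuation ("layers.") does not block matches
        answer_words = set(re.findall(r'\w+', answer.lower()))
        context_words = set(re.findall(r'\w+', context.lower()))
        
        # Remove stopwords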
        stopwords = {'the', 'a', 'an', 'is', 'are', 'was', 'were', 'be', 'been', 
                    'being', 'have', 'has', 'had', 'do', 'does', 'did', 'will',
                    'would', 'could', 'should', 'may', 'might', 'must', 'shall',
                    'to', 'of', 'in', 'for', 'on', 'with', 'at', 'by', 'from',
                    'as', 'into', 'through', 'during', 'before', 'after',
                    'above', 'below', 'between', 'under', 'again', 'further',
                    'then', 'once', 'here', 'there', 'when', 'where', 'why',
                    'how', 'all', 'each', 'few', 'more', 'most', 'other',
                    'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same',
                    'so', 'than', 'too', 'very', 'can', 'just', 'and', 'but',
                    'or', 'if', 'because', 'until', 'while', 'this', 'that'}
        
        answer_words = answer_words - stopwords
        
        if not answer_words:
            return 1.0
        
        overlap = len(answer_words & context_words)
        return overlap / len(answer_words)
    
    @staticmethod
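    def answer_relevance(answer: str, question: str) -> float:
        """
        Answer relevance: does the answer address the question?
        Simplified implementation: fraction of question keywords that appear in the answer
        """
        # Extract keywords from the question
        question_words = set(re.findall(r'\w+', question.lower()))
        answer_words = set(re.findall(r'\w+', answer.lower()))
        
        # Remove question words and common words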
        question_words -= {'what', 'how', 'why', 'when', 'where', 'who', 'which',
                          'is', 'are', 'the', 'a', 'an', 'do', 'does', 'can'}
        
        if not question_words:
            return 1.0
        
        overlap = len(question_words & answer_words)
        return overlap / len(question_words)

# Test the evaluator
evaluator = RAGEvaluator()

# Simulated retrieval results
retrieved_docs = [
    Document(content="Relevant doc 1"),
    Document(content="Irrelevant doc"),
    Document(content="Relevant doc 2"),
    Document(content="Irrelevant doc 2"),
    Document(content="Relevant doc 3"),
]

relevant_docs = [retrieved_docs[0], retrieved_docs[2], retrieved_docs[4]]

print("Retrieval quality:")
print(f"  Precision@3: {evaluator.precision_at_k(retrieved_docs, relevant_docs, 3):.3f}")
print(f"  Recall@3: {evaluator.recall_at_k(retrieved_docs, relevant_docs, 3):.3f}")
print(f"  MRR: {evaluator.mrr(retrieved_docs, relevant_docs):.3f}")

# Simulated generation evaluation
context = "Deep learning is a subset of machine learning that uses neural networks with multiple layers."
answer = "Deep learning uses neural networks with many layers to learn complex patterns."
question = "What is deep learning?"

print("\nGeneration quality:")
print(f"  Faithfulness: {evaluator.faithfulness(answer, context):.3f}")
print(f"  Answer Relevance: {evaluator.answer_relevance(answer, question):.3f}")
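
The end-to-end "F1 Score" dimension (token overlap with a reference answer) has no counterpart in `RAGEvaluator`. A minimal sketch of a token-level F1 follows. The function name `token_f1` and the regex tokenization are assumptions made here; unlike some QA benchmark scripts, it does not strip articles.

```python
import re
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = re.findall(r"\w+", prediction.lower())
    ref_tokens = re.findall(r"\w+", reference.lower())
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often as it occurs in both
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = token_f1("Deep learning uses neural networks",
                 "Deep learning uses multi-layer neural networks")
print(f"Token F1: {score:.3f}")
```

In this example all 5 predicted tokens appear in the 7-token reference, so precision is 1 and recall is 5/7. Like the other lexical metrics above, this rewards word overlap rather than correctness, so a paraphrased but correct answer can score low.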

## Part 6: A Combined RAG System

Integrate all of the advanced techniques.

In [None]:
class AdvancedRAG:
    """Advanced RAG system: combines HyDE, Multi-Query, and Reranking"""
    
    def __init__(self, 
                 embedding_model_name: str = 'all-MiniLM-L6-v2',
                 use_hyde: bool = True,
                 use_multi_query: bool = True,
                 use_reranking: bool = True):
        
        self.use_hyde = use_hyde
        self.use_multi_query = use_multi_query
        self.use_reranking = use_reranking
        
        # Initialize components
        try:
            from sentence_transformers import SentenceTransformer
            self.embedding_model = SentenceTransformer(embedding_model_name)
        except ImportError:
            self.embedding_model = None
        
        self.reranker = CrossEncoderReranker() if use_reranking else None
        self.documents: List[Document] = []
        self.embeddings: np.ndarray = None
    
    def add_documents(self, documents: List[Document]):
        """Build the index"""
        self.documents.extend(documents)
        
        if self.embedding_model:
            new_embeddings = self.embedding_model.encode(
                [doc.content for doc in documents],
                show_progress_bar=False
            )
            if self.embeddings is None:
                self.embeddings = new_embeddings
            else:
                self.embeddings = np.vstack([self.embeddings, new_embeddings])
    
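    def _generate_queries(self, query: str) -> List[str]:
        """Generate query variants (Multi-Query rewrites and a HyDE-style pseudo-document)."""
        queries = [query]
        
        if self.use_multi_query:
            # Simplified multi-query: fixed templates rather than generated paraphrases
            queries.append(f"explain {query}")
            queries.append(f"details about {query}")
        
        if self.use_hyde:
            # Simplified HyDE: a fixed template stands in for a generated
            # hypothetical document
            hyde_query = f"This document provides comprehensive information about {query}. It covers key concepts, methodologies, and applications."
            queries.append(hyde_query)
        
        return queries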
    
    def _initial_retrieval(self, query: str, k: int) -> List[Tuple[Document, float]]:
        """Initial retrieval with the bi-encoder."""
        if self.embedding_model is None or self.embeddings is None:
            # No embeddings available: return the first k documents with a placeholder score
            return [(doc, 0.5) for doc in self.documents[:k]]
        
        query_embedding = self.embedding_model.encode([query])[0]
        
        # Cosine similarity between the query and every indexed document
        similarities = np.dot(self.embeddings, query_embedding) / (
            np.linalg.norm(self.embeddings, axis=1) * np.linalg.norm(query_embedding)
        )
        
        top_indices = np.argsort(similarities)[::-1][:k]
        return [(self.documents[i], float(similarities[i])) for i in top_indices]
    
    def retrieve(self, query: str, 
                 initial_k: int = 20,
                 final_k: int = 5) -> List[Tuple[Document, float]]:
        """
        Advanced retrieval pipeline.
        
        1. Query Transformation (Multi-Query / HyDE)
        2. Initial Retrieval (Bi-encoder)
        3. Reranking (Cross-encoder)
        """
        # Step 1: Query Transformation
        queries = self._generate_queries(query)
        
        # Step 2: Initial Retrieval (run for every query variant)
        all_candidates = {}
        for q in queries:
            results = self._initial_retrieval(q, initial_k)
            for doc, score in results:
                # Deduplicate by object identity: every variant retrieves from
                # the same self.documents list, so id(doc) is a stable key here
                doc_id = id(doc)
                if doc_id not in all_candidates:
                    all_candidates[doc_id] = (doc, score)
                else:
                    # Fuse scores by keeping the maximum
                    _, existing_score = all_candidates[doc_id]
                    all_candidates[doc_id] = (doc, max(score, existing_score))
        
        # Convert to a list
        candidates = list(all_candidates.values())
        
        # Step 3: Reranking
        if self.use_reranking and self.reranker:
            candidate_docs = [doc for doc, _ in candidates]
            results = self.reranker.rerank(query, candidate_docs, top_k=final_k)
        else:
            candidates.sort(key=lambda x: x[1], reverse=True)
            results = candidates[:final_k]
        
        return results

print("AdvancedRAG system defined")
print("\nSupported features:")
print("  - HyDE (Hypothetical Document Embeddings)")
print("  - Multi-Query Retrieval")
print("  - Cross-encoder Reranking")

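Step 2 of `retrieve` merges the candidates from every query variant and keeps each document's best score. The sketch below isolates that fusion step so it can run without downloading a model. It is a minimal illustration under stated assumptions: the 2-D vectors are made up, documents are identified by their row index rather than by `id(doc)`, and the helper names `cosine_top_k` and `fuse_max` are hypothetical.

```python
from typing import Dict, List, Tuple

import numpy as np


def cosine_top_k(doc_embs: np.ndarray, query_emb: np.ndarray, k: int) -> List[Tuple[int, float]]:
    """Return (doc_index, cosine_similarity) for the k most similar documents."""
    sims = doc_embs @ query_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb)
    )
    top = np.argsort(sims)[::-1][:k]
    return [(int(i), float(sims[i])) for i in top]


def fuse_max(result_lists: List[List[Tuple[int, float]]]) -> List[Tuple[int, float]]:
    """Merge per-query results, keeping the maximum score seen for each document."""
    best: Dict[int, float] = {}
    for results in result_lists:
        for doc_idx, score in results:
            best[doc_idx] = max(score, best.get(doc_idx, float("-inf")))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Toy embeddings (made up): three documents and two query variants.
doc_embs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_variants = np.array([[1.0, 0.1], [0.1, 1.0]])

per_query = [cosine_top_k(doc_embs, q, k=2) for q in query_variants]
fused = fuse_max(per_query)

for doc_idx, score in fused:
    print(f"doc {doc_idx}: {score:.3f}")
```

Each variant alone returns only two of the three documents; after fusion all three are candidates, which is the recall benefit that multi-query retrieval aims for. Max fusion is only one option. Because cosine scores from different query variants are not strictly comparable, rank-based fusion may behave differently.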
## Part 7: Exercises

### Exercise 1: Implement Contextual Compression

Before passing documents to the LLM, compress them by filtering out the parts that are irrelevant to the query.

In [None]:
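import re  # used by compress(); harmless if already imported earlier in the notebook


class ContextualCompressor:
    """
    Contextual compressor: filters out the parts of a document that are irrelevant to the query.
    """
    
    def __init__(self, embedding_model=None, similarity_threshold: float = 0.3):
        # Use the caller's model if one is given; otherwise try to load a default
        self.embedding_model = embedding_model
        if self.embedding_model is None:
            try:
                from sentence_transformers import SentenceTransformer
                self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
            except ImportError:
                self.embedding_model = None
        
        self.similarity_threshold = similarity_threshold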
    
    def compress(self, query: str, document: Document) -> Document:
        """
        Compress a document, keeping only the sentences relevant to the query.
        """
        # TODO: implement the compression logic (a reference implementation follows)
        # 1. Split the document into sentences
        # 2. Compute each sentence's similarity to the query
        # 3. Keep only the sentences whose similarity exceeds the threshold
        
        # Strip first and drop empty pieces so that leading/trailing newlines
        # do not produce blank "sentences"
        sentences = [s for s in re.split(r'(?<=[.!?])\s+', document.content.strip()) if s]
        
        if self.embedding_model is None:
            # Simplified fallback: lexical overlap
            query_words = set(query.lower().split())
            relevant_sentences = []
            for sent in sentences:
                sent_words = set(sent.lower().split())
                overlap = len(query_words & sent_words) / len(query_words) if query_words else 0
                if overlap > self.similarity_threshold:
                    relevant_sentences.append(sent)
        else:
            # Embedding-based cosine similarity
            query_emb = self.embedding_model.encode([query])[0]
            sent_embs = self.embedding_model.encode(sentences)
            
            relevant_sentences = []
            for sent, sent_emb in zip(sentences, sent_embs):
                sim = np.dot(query_emb, sent_emb) / (
                    np.linalg.norm(query_emb) * np.linalg.norm(sent_emb)
                )
                if sim > self.similarity_threshold:
                    relevant_sentences.append(sent)
        
        # If no sentence is relevant, fall back to the first two sentences of the document
        if not relevant_sentences:
            compressed_content = ' '.join(sentences[:2]) if len(sentences) > 2 else document.content
        else:
            compressed_content = ' '.join(relevant_sentences)
        
        return Document(
            content=compressed_content,
            metadata={**document.metadata, 'compressed': True, 'original_length': len(document.content)}
        )

# Test the compressor
compressor = ContextualCompressor(similarity_threshold=0.2)

test_doc = Document(content="""
Machine learning is a field of artificial intelligence. It allows computers to learn from data.
The weather today is sunny with a high of 25 degrees. Perfect for outdoor activities.
Deep learning is a subset of machine learning using neural networks. It has revolutionized many fields.
Pizza is a popular Italian dish. It comes with various toppings like cheese and pepperoni.
Neural networks are inspired by the human brain. They consist of layers of interconnected nodes.
""")

query = "How does machine learning use neural networks?"
compressed = compressor.compress(query, test_doc)

print(f"Query: {query}")
print(f"\nOriginal length: {len(test_doc.content)} chars")
print(f"Compressed length: {len(compressed.content)} chars")
print(f"\nCompressed content:\n{compressed.content}")

### Exercise 2: Implement Self-RAG

Let the model decide for itself when retrieval is needed.

In [None]:
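class SelfRAG:
    """
    Self-RAG: the model decides on its own whether retrieval is needed.
    
    Flow:
    1. Decide whether retrieval is needed
    2. If so, run retrieval
    3. Judge whether the retrieved results are useful
    4. Generate the final answer
    """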
    
    def __init__(self, retriever, generator=None):
        self.retriever = retriever
        self.generator = generator
        
        # Trigger phrases that signal retrieval is needed
        self.retrieval_triggers = [
            'what is', 'how does', 'explain', 'define', 'describe',
            'who is', 'when did', 'where is', 'why does',
            'tell me about', 'information about'
        ]
    
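    def needs_retrieval(self, query: str) -> bool:
        """
        Decide whether the query needs retrieval.
        
        Simplified implementation: rule-based.
        In practice, an LLM should make this decision.
        """
        query_lower = query.lower()
        
        # Check for trigger phrases
        for trigger in self.retrieval_triggers:
            if trigger in query_lower:
                return True
        
        # Treat any question mark as a signal to retrieve
        if '?' in query:
            return True
        
        return False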
    
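    def is_relevant(self, query: str, document: Document, threshold: float = 0.3) -> bool:
        """
        Judge whether a retrieved document is relevant to the query.
        """
        # Split on whitespace and strip surrounding punctuation so that
        # "learning?" in the query still matches "learning" in the document
        punctuation = '.,!?;:"\'()'
        query_words = {w.strip(punctuation) for w in query.lower().split()}
        doc_words = {w.strip(punctuation) for w in document.content.lower().split()}
        
        # Remove stopwords (and any empty tokens left after stripping)
        stopwords = {'the', 'a', 'an', 'is', 'are', 'what', 'how', 'why', 'when', 'where', 'who'}
        query_words -= stopwords | {''}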
        
        if not query_words:
            return True
        
        overlap = len(query_words & doc_words) / len(query_words)
        return overlap >= threshold
    
    def query(self, query: str, k: int = 3) -> Dict:
        """
        Run a Self-RAG query.
        """
        result = {
            'query': query,
            'retrieval_needed': False,
            'retrieved_docs': [],
            'relevant_docs': [],
            'answer': None
        }
        
        # Step 1: Decide whether retrieval is needed
        if self.needs_retrieval(query):
            result['retrieval_needed'] = True
            
            # Step 2: Run retrieval. The retriever is assumed to expose
            # retrieve(query, k=...) and to return (Document, score) pairs;
            # note that AdvancedRAG.retrieve takes initial_k/final_k instead.
            retrieved = self.retriever.retrieve(query, k=k)
            result['retrieved_docs'] = retrieved
            
            # Step 3: Keep only the relevant results
            for doc, score in retrieved:
                if self.is_relevant(query, doc):
                    result['relevant_docs'].append((doc, score))
        
        # Step 4: Generate the answer (simplified here)
        if result['relevant_docs']:
            context = ' '.join([doc.content for doc, _ in result['relevant_docs']])
            result['answer'] = f"Based on retrieved information: {context[:200]}..."
        else:
            result['answer'] = "I can answer this based on my knowledge."
        
        return result

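# Test
print("Self-RAG system defined")
print("\nFeatures:")
print("1. Automatically decides whether retrieval is needed")
print("2. Filters out irrelevant retrieved results")
print("3. Decides whether to use the results based on their quality")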

## Summary

```
┌─────────────────────────────────────────────────────────────┐
│              Advanced RAG Techniques: Summary               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Pre-Retrieval Optimization                                 │
│  ├─ Multi-Query: multiple query angles improve recall       │
│  ├─ HyDE: hypothetical docs narrow the query-doc gap        │
│  └─ Step-back: abstract the query for background knowledge  │
│                                                             │
│  Retrieval Optimization                                     │
│  ├─ Two-stage: Bi-encoder recall + Cross-encoder rerank     │
│  └─ Hybrid: vector search + BM25                            │
│                                                             │
│  Post-Retrieval Optimization                                │
│  ├─ Reranking: Cross-encoder reordering                     │
│  ├─ Compression: filter out irrelevant content              │
│  └─ Self-RAG: decide autonomously when to retrieve          │
│                                                             │
│  Evaluation Dimensions                                      │
│  ├─ Retrieval: P@K, R@K, MRR, NDCG                          │
│  ├─ Generation: Faithfulness, Relevance                     │
│  └─ End-to-End: Accuracy, F1                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
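The summary lists NDCG among the retrieval metrics. As a rough sketch, NDCG@k can be computed as follows, assuming the common formulation with linear gain and a log2 rank discount. The ideal ranking here is built only from the relevance labels of the retrieved list, which is a simplification when relevant documents were never retrieved. The function names are illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Binary relevance labels in ranked order: relevant items at ranks 1, 3, and 5.
relevances = [1, 0, 1, 0, 1]
score = ndcg_at_k(relevances, k=3)
print(f"NDCG@3: {score:.3f}")
```

Unlike Precision@k, NDCG rewards placing relevant documents earlier in the list, so it also reflects the quality of the reranking step.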

### Next Steps

- **AI Agent**: `ai_agents/agent_tools.ipynb`
- **LLM Fine-tuning**: `language_models/llm_finetuning.ipynb`
- **RLHF**: `reinforcement_learning/rlhf_alignment.ipynb`

## References

### Papers
- [Precise Zero-Shot Dense Retrieval without Relevance Labels](https://arxiv.org/abs/2212.10496) - HyDE
- [Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection](https://arxiv.org/abs/2310.11511)
- [Take a Step Back: Evoking Reasoning via Abstraction](https://arxiv.org/abs/2310.06117)

### Tools
- [LangChain](https://python.langchain.com/)
- [LlamaIndex](https://www.llamaindex.ai/)
- [RAGAS](https://github.com/explodinggradients/ragas) - RAG evaluation framework