[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/reranking_with_llamaindex.ipynb)

# Reranking Methods in RAG Systems

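## Overview
Reranking is a key step in Retrieval-Augmented Generation (RAG) systems that improves the relevance and quality of the retrieved documents. It re-scores and reorders the initially retrieved documents so that the most relevant information is prioritized in later processing or presentation.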

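## Motivation
The main motivation for reranking in RAG systems is to overcome the limitations of initial retrieval methods, which usually rely on simpler similarity measures. Reranking allows a more sophisticated relevance assessment that accounts for subtle relationships between the query and the documents that conventional retrieval techniques can miss. The goal is to improve the overall performance of the RAG system by ensuring that the generation stage uses the most relevant information.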

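## Key Components
A reranking system typically includes the following components:

1. Initial retriever: usually a vector store that uses embedding-based similarity search.
2. Reranking model, which can be either:
   - A large language model (LLM) used to score relevance
   - A cross-encoder model trained specifically for relevance assessment
3. Scoring mechanism: a method for assigning a relevance score to each document
4. Sorting and selection logic: reorders the documents according to the new scores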

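## Method Details
The reranking process typically follows these steps:

1. Initial retrieval: fetch a set of potentially relevant documents.
2. Pair creation: form a query-document pair for each retrieved document.
3. Scoring:
   - LLM approach: use a prompt that asks the LLM to assess each document's relevance.
   - Cross-encoder approach: feed each query-document pair directly into the model.
4. Score interpretation: parse and normalize the relevance scores.
5. Reordering: sort the documents by their new relevance scores.
6. Selection: keep the top K documents from the reordered list.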
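The steps above can be sketched in plain Python. This is a minimal illustration rather than the notebook's implementation: `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in for a real LLM or cross-encoder. The sketch assumes the initial retrieval (step 1) has already produced `retrieved`.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score each (query, document) pair, sort by score, and keep the top_k."""
    if not documents:
        return []
    # Step 2: build one query-document pair per retrieved document.
    pairs = [(query, doc) for doc in documents]
    # Step 3: score every pair with the supplied relevance model.
    raw_scores = [score_fn(q, d) for q, d in pairs]
    # Step 4: min-max normalize the scores into [0, 1].
    lowest, highest = min(raw_scores), max(raw_scores)
    span = highest - lowest
    scores = [(s - lowest) / span if span else 1.0 for s in raw_scores]
    # Step 5: sort documents by normalized score, best first.
    ranked = sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
    # Step 6: keep only the top_k documents.
    return ranked[:top_k]


def toy_overlap_score(query: str, document: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words found in the document."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words)


# Pretend these came back from the initial retriever (step 1).
retrieved = [
    "Sea levels are rising.",
    "Climate change threatens coral reefs.",
    "Coral reefs host many fish species.",
]
top = rerank("climate change coral reefs", retrieved, toy_overlap_score, top_k=2)
for text, score in top:
    print(f"{score:.2f}  {text}")
```

Passing the scorer in as a function keeps the ordering and selection logic independent of whether the scores come from an LLM prompt or a cross-encoder.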

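## Benefits of This Approach
Reranking offers several benefits:

1. Improved relevance: more sophisticated models can capture subtle relevance signals.
2. Flexibility: different reranking methods can be applied depending on specific needs and available resources.
3. Better context quality: supplying more relevant documents to the RAG system improves the quality of the generated responses.
4. Reduced noise: reranking helps filter out less relevant information so that generation focuses on the most relevant content.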

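## Conclusion
Reranking is a powerful technique in RAG systems that can substantially improve the quality of the retrieved information. Whether it uses LLM-based scoring or a dedicated cross-encoder model, reranking provides a more nuanced and accurate assessment of document relevance. This improved relevance translates directly into better performance on downstream tasks, which makes reranking an essential component of advanced RAG implementations.

The choice between LLM-based and cross-encoder reranking depends on factors such as the required accuracy, the available compute, and the needs of the specific application. Both approaches offer a substantial improvement over basic retrieval and contribute to the overall effectiveness of a RAG system.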

<div style="text-align: center;">

<img src="../images/reranking-visualization.svg" alt="rerank llm" style="width:100%; height:auto;">
</div>

<div style="text-align: center;">

<img src="../images/reranking_comparison.svg" alt="rerank llm" style="width:100%; height:auto;">
</div>

# Package Installation and Imports

The cell below installs all the packages needed to run this notebook.


In [None]:
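# Install required packages (the FAISS vector store integration and
# sentence-transformers, used by the cross-encoder reranker, are separate installs)
!pip install faiss-cpu llama-index llama-index-vector-stores-faiss sentence-transformers python-dotenv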

In [9]:
import os
import sys
from dotenv import load_dotenv
from typing import List
from llama_index.core import Document
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.readers import SimpleDirectoryReader
from llama_index.vector_stores.faiss import FaissVectorStore
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import SentenceTransformerRerank, LLMRerank
from llama_index.core import QueryBundle
import faiss


# Load environment variables from a .env file
load_dotenv()

# Make sure the OpenAI API key is available (os.environ rejects None values)
api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your environment or .env file.")
os.environ["OPENAI_API_KEY"] = api_key

# LlamaIndex global settings for the LLM and the embedding model
EMBED_DIMENSION = 512
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small", dimensions=EMBED_DIMENSION)

### Reading the documents

In [None]:
# Download required data files
import os
os.makedirs('data', exist_ok=True)

# Download the PDF document used in this notebook
!wget -O data/Understanding_Climate_Change.pdf https://raw.githubusercontent.com/NirDiamant/RAG_TECHNIQUES/main/data/Understanding_Climate_Change.pdf


In [2]:
path = "data/"
reader = SimpleDirectoryReader(input_dir=path, required_exts=['.pdf'])
documents = reader.load_data()

### Creating the vector store

In [3]:
# Create a FaissVectorStore to store embeddings
faiss_index = faiss.IndexFlatL2(EMBED_DIMENSION)
vector_store = FaissVectorStore(faiss_index=faiss_index)
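
To make the role of `faiss.IndexFlatL2` concrete, the following pure-Python sketch mimics what a flat L2 index does conceptually: it compares the query against every stored vector and returns the closest ones by squared Euclidean distance. The function names and the 2-dimensional vectors are illustrative assumptions; the real index runs optimized native code over 512-dimensional embeddings.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(
    stored: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Exhaustively compare the query to every stored vector; return the k nearest."""
    distances = [(squared_l2(vec, query), idx) for idx, vec in enumerate(stored)]
    distances.sort()  # smallest distance first
    return [(idx, dist) for dist, idx in distances[:k]]


# Tiny hypothetical "embeddings" for illustration only.
vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
hits = flat_l2_search(vectors, query=[0.9, 0.1], k=2)
print(hits)  # list of (vector index, squared distance), nearest first
```

Because every vector is compared, the search is exact, which is a reasonable fit for a small corpus such as the single PDF used here.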

## Ingestion pipeline

In [4]:
base_pipeline = IngestionPipeline(
    transformations=[SentenceSplitter()],
    vector_store=vector_store,
    documents=documents
)

nodes = base_pipeline.run()

## Querying

### Method 1: LLM-based reranking of the retrieved documents

<div style="text-align: center;">

<img src="../images/rerank_llm.svg" alt="rerank llm" style="width:40%; height:auto;">
</div>

In [5]:
# Create vector index from base nodes
index = VectorStoreIndex(nodes)

query_engine_w_llm_rerank = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[
        LLMRerank(
            top_n=5
        )
    ],
)

In [None]:
resp = query_engine_w_llm_rerank.query("What are the impacts of climate change on biodiversity?")
print(resp)
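
An LLM-based reranker receives its relevance judgments as text, so the response has to be parsed and normalized before the documents can be sorted. The sketch below shows one way to do that, assuming the model answers with one line per document in the form `Doc: <number>, Relevance: <score>` on a 1 to 10 scale. Both the answer format and the helper `parse_relevance_lines` are assumptions made for illustration; they are not presented as what `LLMRerank` does internally.

```python
import re
from typing import Dict

# Matches lines such as "Doc: 2, Relevance: 9" (assumed answer format).
_LINE_PATTERN = re.compile(
    r"Doc:\s*(\d+)\s*,\s*Relevance:\s*(\d+(?:\.\d+)?)", re.IGNORECASE
)


def parse_relevance_lines(llm_output: str, max_score: float = 10.0) -> Dict[int, float]:
    """Map each document number to a relevance score normalized into [0, 1]."""
    scores: Dict[int, float] = {}
    for line in llm_output.splitlines():
        match = _LINE_PATTERN.search(line)
        if match is None:
            continue  # ignore any line that does not follow the expected format
        doc_id = int(match.group(1))
        raw = float(match.group(2))
        # Clamp so that an out-of-range answer cannot dominate the ranking.
        scores[doc_id] = min(max(raw / max_score, 0.0), 1.0)
    return scores


# Hypothetical model response, including one chatty line and one out-of-range score.
sample_output = """Doc: 2, Relevance: 9
Doc: 1, Relevance: 4
Sure, here is my reasoning...
Doc: 3, Relevance: 12"""

parsed = parse_relevance_lines(sample_output)
ranking = sorted(parsed, key=parsed.get, reverse=True)
print(ranking)
```

Skipping malformed lines and clamping the scores keeps the ranking usable even when the model does not follow the requested format exactly.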

#### Example demonstrating why reranking is useful

In [10]:
chunks = [
    "The capital of France is great.",
    "The capital of France is huge.",
    "The capital of France is beautiful.",
    """Have you ever visited Paris? It is a beautiful city where you can eat delicious food and see the Eiffel Tower. I really enjoyed all the cities in france, but its capital with the Eiffel Tower is my favorite city.""", 
    "I really enjoyed my trip to Paris, France. The city is beautiful and the food is delicious. I would love to visit again. Such a great capital city."
]
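# LlamaIndex Document objects take their content through the `text` argument
docs = [Document(text=sentence) for sentence in chunks]


def compare_rag_techniques(query: str, docs: List[Document] = docs) -> None:
    """Print the top baseline results next to the LLM-reranked results for a query."""
    # Index the documents that were passed in, so the `docs` argument is honored
    index = VectorStoreIndex.from_documents(docs)
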
    print("Comparison of Retrieval Techniques")
    print("==================================")
    print(f"Query: {query}\n")
    
    print("Baseline Retrieval Result:")
    baseline_docs = index.as_retriever(similarity_top_k=5).retrieve(query)
    for i, doc in enumerate(baseline_docs[:2]): # Get only the first two retrieved docs
        print(f"\nDocument {i+1}:")
        print(doc.text)

    print("\nAdvanced Retrieval Result:")
    reranker = LLMRerank(top_n=2)
    advanced_docs = reranker.postprocess_nodes(baseline_docs, QueryBundle(query))
    for i, doc in enumerate(advanced_docs):
        print(f"\nDocument {i+1}:")
        print(doc.text)


query = "what is the capital of france?"
compare_rag_techniques(query, docs)

Comparison of Retrieval Techniques
Query: what is the capital of france?

Baseline Retrieval Result:

Document 1:
The capital of France is great.

Document 2:
The capital of France is huge.

Advanced Retrieval Result:

Document 1:
Have you ever visited Paris? It is a beautiful city where you can eat delicious food and see the Eiffel Tower. I really enjoyed all the cities in france, but its capital with the Eiffel Tower is my favorite city.

Document 2:
I really enjoyed my trip to Paris, France. The city is beautiful and the food is delicious. I would love to visit again. Such a great capital city.
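
The baseline result above favors chunks that look like the query even though they do not answer it. The sketch below reproduces that effect with a deliberately crude stand-in for embedding similarity: cosine similarity over bag-of-words counts. This is an assumption chosen for illustration, since real embedding models are far richer, but it shows how surface similarity can rank a short, uninformative sentence above a passage that actually names Paris.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


query = "what is the capital of france?"
candidates = [
    "The capital of France is great.",
    "I really enjoyed my trip to Paris, France. The city is beautiful and the "
    "food is delicious. I would love to visit again. Such a great capital city.",
]
scores = [cosine(bag_of_words(query), bag_of_words(c)) for c in candidates]
for text, score in zip(candidates, scores):
    print(f"{score:.3f}  {text[:50]}")
```

The first candidate shares almost all of its words with the query and scores higher, yet only the second mentions the answer. A reranker that reads the query and the passage together can correct this kind of ordering.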


### Method 2: Cross Encoder models

<div style="text-align: center;">

<img src="../images/rerank_cross_encoder.svg" alt="rerank cross encoder" style="width:40%; height:auto;">
</div>

LlamaIndex has built-in support for [SBERT](https://www.sbert.net/index.html) models, which can be used directly as a node postprocessor.

In [None]:
query_engine_w_cross_encoder = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[
        SentenceTransformerRerank(
            model='cross-encoder/ms-marco-MiniLM-L-6-v2',
            top_n=5
        )
    ],
)

resp = query_engine_w_cross_encoder.query("What are the impacts of climate change on biodiversity?")
print(resp)
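
Cross-encoders often emit unbounded raw scores (logits) rather than probabilities. Where a bounded score is convenient, for example to apply a relevance threshold, a sigmoid can map the raw values into (0, 1) without changing their order. The sketch below assumes that setup; the passages and logit values are hypothetical, and `top_n_by_score` is an illustrative helper rather than part of LlamaIndex or sentence-transformers.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def top_n_by_score(
    texts: List[str], raw_scores: List[float], top_n: int
) -> List[Tuple[str, float]]:
    """Normalize raw scores with a sigmoid and return the top_n (text, score) pairs."""
    if len(texts) != len(raw_scores):
        raise ValueError("texts and raw_scores must have the same length")
    scored = [(text, sigmoid(score)) for text, score in zip(texts, raw_scores)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


passages = ["passage A", "passage B", "passage C"]
hypothetical_logits = [-2.0, 4.5, 0.0]  # made-up values for illustration
best = top_n_by_score(passages, hypothetical_logits, top_n=2)
print(best)
```

Because the sigmoid is monotonic, the normalization only changes the scale of the scores, so the selected passages are the same as when sorting by the raw logits.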