## Presteps to Load llama3.2 On Colab

In [2]:
import tensorflow as tf
from psutil import virtual_memory

# Check GPU
gpu_info = tf.config.list_physical_devices('GPU')
print(f"GPU Info: {gpu_info}")

# Check RAM
ram_info = virtual_memory()
print(f"Total RAM: {ram_info.total / (1024**3)} GB")

GPU Info: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Total RAM: 15.489391326904297 GB


In [None]:
# !sudo apt-get install -y pciutils
# !curl -fsSL https://ollama.com/install.sh | sh # download ollama api
# from IPython.display import clear_output

# # Create a Python script to start the Ollama API server in a separate thread

# import os
# import threading
# import subprocess
# import requests
# import json

# def ollama():
#     os.environ['OLLAMA_HOST'] = '0.0.0.0:11434'
#     os.environ['OLLAMA_ORIGINS'] = '*'
#     subprocess.Popen(["ollama", "serve"])

# ollama_thread = threading.Thread(target=ollama)
# ollama_thread.start()


In [None]:
# from IPython.display import clear_output
# !ollama pull llama3.2:3b  & ollama pull nomic-embed-text

## Presteps to Load llama3.2 Locally

**Hardware Requirements** <br>
**CPU**: Multicore processor<br>
**RAM**: Minimum of 16 GB recommended<br>
**GPU**: NVIDIA RTX series (for optimal performance), at least 8 GB VRAM<br>

**Step 1**:<br>
Download Ollama for your operating system from this site:<br>
https://ollama.com/download/linux<br>
<br>
**Step 2**:<br>
Open your terminal.<br>
<br>
**Step 3**:<br>
Run the following commands. `ollama serve` keeps running in the foreground, so run the pull commands in a second terminal:<br>
\$ ollama serve<br>
\$ ollama pull llama3.2:3b && ollama pull nomic-embed-text<br>
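
Before loading the model from Python, it can help to confirm that the server is reachable and that both models were pulled. The sketch below is a minimal check that uses only the standard library. It assumes the default address `http://localhost:11434` and that Ollama's `/api/tags` endpoint returns a JSON object with a `models` list; the function name `ollama_models` is hypothetical.

```python
import json
import urllib.error
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=3.0):
    """Return the names of locally available models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply all count as "not reachable"
        return None
    return [m.get("name") for m in payload.get("models", [])]


if __name__ == "__main__":
    names = ollama_models()
    if names is None:
        print("Ollama server not reachable; is `ollama serve` running?")
    else:
        print("Available models:", names)
```

If the list does not include `llama3.2:3b` and `nomic-embed-text`, rerun the pull commands from Step 3.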

## Load LlaMA3.2

In [None]:
# !pip install -r requirements.txt

In [3]:
from langchain_ollama.llms import OllamaLLM
from langchain_ollama.embeddings import OllamaEmbeddings

MODEL = "llama3.2:3b"

# Initialize the Llama model
model = OllamaLLM(model=MODEL)

# Create an embedding model
embeddings = OllamaEmbeddings(model="nomic-embed-text")



In [4]:
print(model.invoke("Hi. Are you LlaMA, the language model?"))

Hello! I am indeed a variant of the Llama model, specifically designed to assist and communicate with users like you. I'm here to provide information, answer questions, and engage in conversations to the best of my abilities based on my training data. How can I help you today?


# Setup

In [5]:
from dotenv import load_dotenv
import os

# load_dotenv() copies the variables defined in .env into os.environ
load_dotenv()

# Fail early with a clear message if a LangSmith variable is missing
# (assigning None to os.environ would raise an opaque TypeError).
for var in ("LANGCHAIN_TRACING_V2", "LANGCHAIN_ENDPOINT", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT"):
    if not os.getenv(var):
        raise EnvironmentError(f"{var} is not set; add it to your .env file")

## Part1 Standard RAG

In [6]:
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_chroma.vectorstores import Chroma

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from chromadb.errors import InvalidDimensionException
from langchain.chains import LLMChain
#### INDEXING ####

loader = PyPDFLoader("RAG_survey.pdf")
docs = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
splits = text_splitter.split_documents(docs)

# Embed
## NOTE: run Chroma().delete_collection() before building the Chroma vectorstore
## to delete previously loaded documents.
Chroma().delete_collection()
vectorstore = Chroma.from_documents(documents = splits, embedding=embeddings)

retriever = vectorstore.as_retriever()

# Prompt
prompt = hub.pull(
    "rlm/rag-prompt", 
    api_key=os.environ["LANGCHAIN_API_KEY"],
    api_url=os.environ["LANGCHAIN_ENDPOINT"],
)

# LLM
llm = model




### (a) Chain the Components:

In [7]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Chain

output_parser = StrOutputParser()

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | output_parser
)

In [8]:
rag_chain.invoke("what is this paper about?")

"This paper appears to be about Research on Active Retrieval Architectures (RAG), discussing its evolution into three stages: Naive RAG, Advanced RAG, and Modular RAG, with a focus on addressing limitations and optimizing retrieval methods. It explores core components of RAG, including retrieval, generation, and augmentation, as well as post-retrieval process optimization. The paper aims to improve RAG's resistance to adversarial inputs and evaluate its performance metrics."

### (b) Explain TextSplitter Settings

In [44]:
import os
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_chroma.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from chromadb.errors import InvalidDimensionException
from langchain.chains import LLMChain

# Load the PDF file
loader = PyPDFLoader("RAG_survey.pdf")
docs = loader.load()

# Define the chunk_size and chunk_overlap settings to compare
settings = [
    {"chunk_size": 500, "chunk_overlap": 100},
    {"chunk_size": 1000, "chunk_overlap": 200},
    {"chunk_size": 200, "chunk_overlap": 50},
]

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Chain

output_parser = StrOutputParser()

# Initialize the embedding model
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Initialize the LLM
llm = OllamaLLM(model="llama3.2:3b")

# Prompt template
prompt = hub.pull(
    "rlm/rag-prompt", 
    api_key=os.environ["LANGCHAIN_API_KEY"],
    api_url=os.environ["LANGCHAIN_ENDPOINT"],
)

question = "what is this paper about?"

# Loop over the settings
for setting in settings:
    print(f"Results with chunk_size={setting['chunk_size']} and chunk_overlap={setting['chunk_overlap']}:")

    # Split the text
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=setting['chunk_size'], chunk_overlap=setting['chunk_overlap'])
    splits = text_splitter.split_documents(docs)

    # Embed the chunks (clear the previous collection first)
    Chroma().delete_collection()
    vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)

    retriever = vectorstore.as_retriever()

    # Build the RAG chain
    rag_chain = (
        {"context": retriever|format_docs, "question": RunnablePassthrough()}|
        prompt | llm | output_parser
    )

    # Run the RAG chain
    res = rag_chain.invoke(question)
    print(res)

Results with chunk_size=500 and chunk_overlap=100:
This paper is about the Research Agenda on Generative Models (RAG), specifically its evolution into three stages (Naive RAG, Advanced RAG, and Modular RAG) and the limitations of the RAG method compared to native Large Language Models (LLMs). The paper explores various components and optimization methods in RAG, as well as the importance of improving its resistance to adversarial inputs. It also discusses recent research on augmentation processes and retrieval methods for RAG.
Results with chunk_size=1000 and chunk_overlap=200:
I don't know, but based on the context provided, it appears that this paper is about a research paradigm called RAG (Research Artifact Generation), which is a method for generating answers to questions using large language models. The paper discusses the development of Advanced and Modular RAG as responses to limitations in Naive RAG, with an emphasis on its adaptability, efficiency, and evaluation abilities.
Results with chunk_size=200 

Discussion:

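During retrieval, the TextSplitter settings strongly affect both the accuracy and the efficiency of the results. The main parameters are the chunk size (`chunk_size`), the chunk overlap (`chunk_overlap`), and the splitting strategy.

First, the chunk size determines the length of each text chunk (measured in characters by default for `RecursiveCharacterTextSplitter`). Larger chunks carry more context, which can improve retrieval accuracy. Chunks that are too large, however, increase processing time and may include too much irrelevant information. Smaller chunks speed up retrieval but may lower accuracy because they lack context. Choosing a chunk size is therefore a trade-off between accuracy and efficiency.

Second, the chunk overlap is the portion of text shared by adjacent chunks. A suitable overlap keeps important information from being lost when it straddles a chunk boundary, which improves the completeness and accuracy of the retrieved results. Too much overlap adds redundancy and hurts efficiency, while too little overlap risks losing information. Setting the overlap sensibly is therefore an important factor in retrieval quality.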
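
To make the two parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification written for illustration, not the algorithm `RecursiveCharacterTextSplitter` uses (which also tries to split on separators such as paragraph and line breaks); the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk.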

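Beyond chunk size and overlap, the splitting strategy (for example by sentence, by paragraph, or by fixed length) also affects retrieval. Splitting by sentence or paragraph preserves the natural structure of the language, which improves the readability and relevance of the retrieved text, whereas fixed-length splitting may be better suited to structured data.

With different `chunk_size` and `chunk_overlap` settings the results differ noticeably, and these differences reflect how the splitting strategy influences retrieval.

With `chunk_size=500` and `chunk_overlap=100`, the answer is fairly detailed and accurate. The moderate chunk size and suitable overlap provide enough context while avoiding excessive redundancy, so the text stays intact and the information can still be processed and retrieved effectively.

With `chunk_size=1000` and `chunk_overlap=200`, a single chunk may contain too much context, which hurts retrieval accuracy. Overly large chunks can make it hard for the model to focus on the key information, so the answer becomes vague and uncertain. The larger overlap also adds redundancy, which further reduces retrieval efficiency.

With `chunk_size=200` and `chunk_overlap=50`, the chunks may not carry enough context, which hurts the completeness and accuracy of the retrieved results. Overly small chunks can cut through important context, so the model cannot fully understand the text and the answer becomes less accurate and less specific.

In summary, the TextSplitter settings should be tuned to the application, because `chunk_size` and `chunk_overlap` strongly influence retrieval. A moderate chunk size with a suitable overlap balances preserving context against avoiding redundancy, which improves both accuracy and efficiency. Adjusting these parameters to the use case can noticeably improve the performance of a retrieval system.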
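
As a rough worked example of the efficiency side of this trade-off, the sketch below estimates how many chunks each of the three settings would produce. It assumes simple fixed-size windows and a hypothetical document of 10,000 characters; the real splitter respects separators, so actual counts will differ.

```python
import math


def estimated_chunks(text_length, chunk_size, chunk_overlap):
    """Approximate chunk count for fixed-size windows with overlap."""
    if text_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap  # characters of new text per chunk
    return math.ceil((text_length - chunk_overlap) / step)


TEXT_LENGTH = 10_000  # hypothetical document length in characters
for size, overlap in [(500, 100), (1000, 200), (200, 50)]:
    print(f"chunk_size={size}, chunk_overlap={overlap}: "
          f"about {estimated_chunks(TEXT_LENGTH, size, overlap)} chunks")
```

Under these assumptions the settings yield about 25, 13, and 67 chunks respectively, so the smallest chunk size creates roughly five times as many vectors to embed and search as the largest.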

### (c) Experiment with Retriever Settings

In [9]:
# Example 1: set the number of returned documents, k
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
retrieved_docs = retriever.invoke("what is this paper about?")
print("Setting: k=3")
for doc in retrieved_docs:
    print(doc)
    print()

# Example 2: set the search type explicitly ("similarity" is the default)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
retrieved_docs = retriever.invoke("what is this paper about?")
print("Setting: search_type=similarity, k=3")
for doc in retrieved_docs:
    print(doc)
    print()

# Example 3: retrieve a larger number of documents
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
retrieved_docs = retriever.invoke("what is this paper about?")
print("Setting: k=5")
for doc in retrieved_docs:
    print(doc)
    print()


Setting: k=3
page_content='question, form a comprehensive prompt that empowers LLMs
to generate a well-informed answer.
The RAG research paradigm is continuously evolving, and
we categorize it into three stages: Naive RAG, Advanced
RAG, and Modular RAG, as showed in Figure 3. Despite
RAG method are cost-effective and surpass the performance
of the native LLM, they also exhibit several limitations.
The development of Advanced RAG and Modular RAG is
a response to these specific shortcomings in Naive RAG.
A. Naive RAG' metadata={'page': 1, 'source': 'RAG_survey.pdf'}

page_content='irrelevant context,” in International Conference on Machine Learning .
PMLR, 2023, pp. 31 210–31 227.
[88] R. Teja, “Evaluating the ideal chunk size for a rag
system using llamaindex,” https://www.llamaindex.ai/blog/
evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5,
2023.' metadata={'page': 17, 'source': 'RAG_survey.pdf'}

page_content='The paper unfolds as follows: Section II intr

Discussion:
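In this experiment we tried different retriever settings, including the `k` value (the number of top documents returned) and the search type (such as similarity search), and inspected the documents retrieved under each setting. The settings and the scenarios they suit are discussed below.

### Setting: k=3
With this setting the retriever returns the top three documents. They cover key content such as the evolution of the RAG research paradigm, its main components, and optimization methods. This setting suits scenarios that need key information quickly, because fewer documents are faster to process and analyze. It can also miss valuable information, especially when there are many documents and the information is scattered.

### Setting: search_type=similarity, k=3
With this setting the retriever returns the top three documents by similarity. The returned content is similar to the `k=3` setting, which is expected because `similarity` is the default search type of `as_retriever`. Similarity search is effective at finding the documents most relevant to the query, so this setting suits scenarios that need highly relevant results, such as answering a specific question precisely or finding information on a particular topic.

### Setting: k=5
With this setting the retriever returns the top five documents. Compared with `k=3`, it provides more information and covers a broader range of content, including the challenges and future directions of RAG. This setting suits scenarios that need a comprehensive view of a topic, because more documents provide richer context. It also takes more time to process and analyze the documents.

### Discussion
Different retriever settings suit different scenarios. A smaller `k` (such as `k=3`) suits scenarios that need key information quickly, while a larger `k` (such as `k=5`) suits scenarios that need a comprehensive view of a topic. Similarity search keeps the returned documents closely related to the query, so it suits scenarios that need highly relevant results.

In summary, the retriever settings should be chosen according to the needs of the application. In practice, `k` and the search type can be adjusted flexibly to the requirements of the query and the scenario to get the best retrieval results.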
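
The effect of `k` can be illustrated without a vector store. The sketch below ranks a few hypothetical two-dimensional embeddings by cosine similarity and returns the top `k`. Cosine similarity is an assumption made for illustration; the metric a real store uses depends on its configuration, and the names `cosine`, `top_k`, and `doc_vecs` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query, best first."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]


# Hypothetical 2-D embeddings; real embeddings have hundreds of dimensions.
doc_vecs = {
    "intro": [0.9, 0.1],
    "methods": [0.7, 0.3],
    "references": [0.1, 0.9],
    "conclusion": [0.8, 0.2],
}
query_vec = [1.0, 0.0]

print(top_k(query_vec, doc_vecs, k=2))  # ['intro', 'conclusion']
print(top_k(query_vec, doc_vecs, k=3))  # ['intro', 'conclusion', 'methods']
```

Raising `k` only appends lower-ranked documents to the same ordering, which is why a larger `k` adds breadth but also admits less relevant chunks.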

## Part2 Multi-Query RAG

### (a) Prompt Template for Multi-Query:

In [14]:
from langchain.prompts import ChatPromptTemplate

# Design a prompt template that instructs the language model to respond to questions from multiple perspectives.
template = """
You are an expert in various fields. Please provide answers to the following question from different perspectives:
1. As a scientist
2. As a historian
3. As a philosopher
4. As an economist

Original question: {question}
"""

prompt_perspectives = ChatPromptTemplate.from_template(template)
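
Because each non-empty line of the model's output is later sent to the retriever as a separate query, one variation worth trying is a template that asks for rephrased questions rather than answers. The wording below is a hypothetical sketch, not part of the original assignment; the name `QUERY_REWRITE_TEMPLATE` is an assumption, and the string could be passed to `ChatPromptTemplate.from_template` in the same way as `template` above.

```python
# Hypothetical alternative template: asks for reworded questions, one per line.
QUERY_REWRITE_TEMPLATE = """You are an AI language model assistant. Generate four different
versions of the user question below to retrieve relevant documents from a vector database.
Write one question per line and output nothing else.

Original question: {question}
"""

# Fill in the placeholder to preview what the model would receive
print(QUERY_REWRITE_TEMPLATE.format(question="What is this paper about?"))
```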

In [21]:
generate_queries = (
    prompt_perspectives 
    | llm
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
    | (lambda x : list(filter(lambda x : x != '', x)))
)
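
The two lambdas at the end of the chain turn the model's text into a list of queries. As a plain-Python sketch, they behave roughly like the hypothetical helper below. One difference is labeled in the code: this version also strips surrounding whitespace and drops whitespace-only lines, whereas the chain above removes only exactly empty strings.

```python
def split_into_queries(text):
    """Split model output into one query per non-blank line.

    Unlike the chain's lambdas, this also strips whitespace around each line.
    """
    return [line.strip() for line in text.split("\n") if line.strip()]


sample = "first line\n\n  second line  \n"
print(split_into_queries(sample))  # ['first line', 'second line']
```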

In [22]:
# You may generate some queries here to see whether they are diverse enough
question = "What is this paper about?"
generate_queries.invoke({"question": question})

["I'll respond from four different perspectives:",
 '**As a Scientist**',
 'This paper appears to be a study on the effects of environmental pollution on human health and ecosystems. Based on the title, "Impact of Human Activity on Ecological Balance," I would expect the paper to discuss the scientific methods used to collect data on pollutants in various environments, such as air quality, water contamination, and soil degradation. The researchers may also explore the biological consequences of these pollutants on human populations and the natural world.',
 '**As a Historian**',
 'From a historical perspective, this paper seems to be an analysis of the relationship between human activity and ecological balance throughout time. I would expect the authors to have conducted extensive research on past events, such as industrial revolutions, colonialism, and environmental disasters, to understand how these activities impacted the natural world. The paper may also touch on the development of

In [23]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

# Retrieve
question = "What is this paper about?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question":question})
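
Note that `set()` does not preserve order, so the union above comes back in arbitrary order. If the first-seen order matters, a dictionary can deduplicate while keeping it. The sketch below shows the idea on plain strings that stand in for the serialized documents; `unique_union` and the chunk ids are hypothetical.

```python
def unique_union(result_lists):
    """Flatten lists of results and drop duplicates, keeping first-seen order."""
    flattened = [item for sublist in result_lists for item in sublist]
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(flattened))


# Hypothetical chunk ids standing in for serialized Document strings.
results = [["c1", "c2"], ["c2", "c3"], ["c1", "c4"]]
print(unique_union(results))  # ['c1', 'c2', 'c3', 'c4']
```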

In [24]:
docs

[Document(metadata={'page': 15, 'source': 'RAG_survey.pdf'}, page_content='VIII. C ONCLUSION\nThe summary of this paper, as depicted in Figure 6, empha-\nsizes RAG’s significant advancement in enhancing the capa-\nbilities of LLMs by integrating parameterized knowledge from\nlanguage models with extensive non-parameterized data from\nexternal knowledge bases. The survey showcases the evolution\nof RAG technologies and their application on many different\ntasks. The analysis outlines three developmental paradigms\nwithin the RAG framework: Naive, Advanced, and Modu-'),
 Document(metadata={'page': 16, 'source': 'RAG_survey.pdf'}, page_content='search,” arXiv preprint arXiv:2311.03758 , 2023.\n[10] H. S. Zheng, S. Mishra, X. Chen, H.-T. Cheng, E. H. Chi, Q. V . Le,\nand D. Zhou, “Take a step back: Evoking reasoning via abstraction in\nlarge language models,” arXiv preprint arXiv:2310.06117 , 2023.\n[11] L. Gao, X. Ma, J. Lin, and J. Callan, “Precise zero-shot dense retrieval\nwithout rele

### (b) Multi-Query RAG Chain: 

In [25]:
from operator import itemgetter
# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

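# Construct a Multi-Query RAG Chain.
# The question is read from the chain input via itemgetter rather than from a global variable.
multi_query_rag_chain = (
    {
        "context": retrieval_chain | (lambda docs: "\n\n".join(doc.page_content for doc in docs)),
        "question": itemgetter("question"),
    }
    | prompt
    | llm
    | output_parser
)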

### (c) Example Comparisons:

In [30]:
# Define the question
question = "What are the key challenges discussed in the paper?"

# Standard RAG output
standard_rag_output = rag_chain.invoke(question)

# Multi-query RAG output
multi_query_rag_output = multi_query_rag_chain.invoke({"question": question})

# Print the outputs
print("Standard RAG Output:")
print(standard_rag_output)
print("\nMulti-Query RAG Output:")
print(multi_query_rag_output)

Standard RAG Output:
The key challenges discussed in the paper include overcoming the limitations of Naive RAG, managing increasing context sizes that surpass the capabilities of traditional LLMs, and addressing specific shortcomings in the retrieval-augmentation process. These challenges are a response to the limitations of Naive RAG and warrant further research. The paper explores potential avenues for research and development to address these challenges.

Multi-Query RAG Output:
The paper discusses several key challenges related to Retrieval-Augmented Generation (RAG) models, including:

1. Misinformation and counterfactual inputs: Improving RAG's resistance to adversarial or counterfactual inputs is a major challenge.
2. Context relevance and noise robustness: Evaluating the quality of retrieval and generation requires assessing context relevance and noise robustness.
3. External knowledge requirements: The type of retrieval source and granularity of retrieval units affect the fina

### Discussion:

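In this experiment we compared the outputs of standard RAG and multi-query RAG to show how the multi-query approach can give more comprehensive or more nuanced answers to complex or ambiguous queries.

#### Standard RAG output:
The standard RAG output focuses on overcoming the limitations of Naive RAG, managing context sizes that exceed the capabilities of traditional LLMs, and addressing specific shortcomings in the retrieval-augmentation process. It describes these challenges as a response to the limitations of Naive RAG that warrant further research, and says the paper explores potential avenues of research and development to address them.

#### Multi-query RAG output:
The multi-query RAG output is more detailed and covers several key challenges related to Retrieval-Augmented Generation (RAG) models, including:

1. Misinformation and counterfactual inputs: improving RAG's resistance to adversarial or counterfactual inputs is a major challenge.
2. Context relevance and noise robustness: evaluating the quality of retrieval and generation requires assessing context relevance and noise robustness.
3. External knowledge requirements: the type of retrieval source and the granularity of retrieval units affect the final generation results.
4. Semantic discontinuity and accumulation of irrelevant information: iterative retrieval may suffer from semantic discontinuity and the accumulation of irrelevant information.
5. Evaluation metrics: developing effective evaluation metrics for RAG models is challenging, and existing metrics such as accuracy, recall, precision, and BLEU need to be adapted or combined to address this.


#### Comparison and discussion:
The results show that the multi-query RAG approach gives a more comprehensive and nuanced answer. Standard RAG identifies some key challenges, but its output is brief and does not examine specific problems in depth. Multi-query RAG identifies more challenges and explains them in more detail, which gives it an advantage on complex or ambiguous queries.

The strength of the multi-query approach is that it retrieves and generates from several angles, which provides richer context and broader insight. This is especially useful where deep understanding and analysis are needed, such as research reports, technical documentation, or decision-support systems that must weigh several factors.

In summary, multi-query RAG can give more comprehensive and nuanced answers to complex or ambiguous queries, which is a clear advantage in many applications. Future work could explore how to optimize the multi-query approach to improve its efficiency and accuracy.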

## Part3 RAG Fusion

In [32]:
template = """
You are an expert in various fields. Please provide answers to the following question from different perspectives:
1. As a scientist
2. As a historian
3. As a philosopher
4. As an economist

Original question: {question}
"""

prompt_rag_fusion = ChatPromptTemplate.from_template(template)

In [35]:
generate_queries = (
    prompt_rag_fusion 
    | llm
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
    | (lambda x : list(filter(lambda x : x != '', x)))
)

In [36]:
generate_queries.invoke({"question": question})

['I\'d be happy to provide answers to the question "What are the key challenges discussed in the paper?" from different perspectives:',
 '**As a Scientist:**',
 'The key challenges discussed in the paper appear to be related to the impact of climate change on human health and well-being. From a scientific perspective, the paper likely highlights the following challenges:',
 '1. Rising temperatures and extreme weather events: The paper may discuss the increasing frequency and severity of heatwaves, droughts, and storms, which can have devastating effects on human health.',
 '2. Air and water pollution: The paper may also highlight the challenges posed by air and water pollution, particularly in urban areas where high population densities and industrial activities contribute to environmental degradation.',
 '3. Vector-borne diseases: The paper likely addresses the risks of vector-borne diseases such as malaria, dengue fever, and Zika virus, which are exacerbated by climate change.',
 '4.

### (a) Implement Reciprocal Rank Fusion (RRF)

In [38]:
def reciprocal_rank_fusion(results: list[list], c=60):
    """ Reciprocal rank fusion that takes multiple lists of ranked documents
        and an optional smoothing constant c used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            doc_str = dumps(doc)
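            # Calculate the RRF score: 1 / (c + rank), where rank + 1 converts
            # the 0-based position from enumerate() into a 1-based rank
            score = 1 / (c + rank + 1)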
            # Update the fused score for the document
            if doc_str in fused_scores:
                fused_scores[doc_str] += score
            else:
                fused_scores[doc_str] = score

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
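
To see the fusion step in isolation, the sketch below applies the same scoring to plain string ids instead of `Document` objects. The rankings are hypothetical, and `rrf_ids` is an illustrative helper rather than part of the chain above.

```python
def rrf_ids(ranked_lists, c=60):
    """Fuse ranked lists of ids with reciprocal rank fusion; return (id, score) pairs, best first."""
    scores = {}
    for ranking in ranked_lists:
        for position, doc_id in enumerate(ranking):
            # position is 0-based, so position + 1 is the 1-based rank
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + position + 1)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical rankings returned for three generated queries.
rankings = [
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["b", "a", "d"],
]
for doc_id, score in rrf_ids(rankings):
    print(doc_id, round(score, 4))
```

Document `b` ends up first because it ranks at or near the top of every list, while `d`, which appears once at rank 3, ends up last.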

### (b) RRF Example and k-Value Discussion: 

In [39]:
# Define the question
question = "What are the key challenges discussed in the paper?"

# Retrieve and re-rank documents using RRF
reranked_docs = retrieval_chain_rag_fusion.invoke({"question": question})

# Print the re-ranked documents
print("Re-ranked Documents using RRF:")
for doc, score in reranked_docs:
    print(f"Score: {score}")
    print(doc.page_content)
    print()

Re-ranked Documents using RRF:
Score: 0.20744532492450257
question, form a comprehensive prompt that empowers LLMs
to generate a well-informed answer.
The RAG research paradigm is continuously evolving, and
we categorize it into three stages: Naive RAG, Advanced
RAG, and Modular RAG, as showed in Figure 3. Despite
RAG method are cost-effective and surpass the performance
of the native LLM, they also exhibit several limitations.
The development of Advanced RAG and Modular RAG is
a response to these specific shortcomings in Naive RAG.
A. Naive RAG

Score: 0.15954739136091595
(FT) and prompt engineering. Each method has distinct charac-
teristics as illustrated in Figure 4. We used a quadrant chart to
illustrate the differences among three methods in two dimen-
sions: external knowledge requirements and model adaption
requirements. Prompt engineering leverages a model’s inherent
capabilities with minimum necessity for external knowledge
and model adaption. RAG can be likened to providing 

### Discussion:

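In this experiment we show an example of re-ranking documents with RRF (Reciprocal Rank Fusion), discuss how the c value in RRF affects the results, and describe the situations in which RRF is advantageous.

#### Documents re-ranked with RRF:
The fused scores of the re-ranked documents are:

1. **Score: 0.20744532492450257**
   - This document discusses the evolution of the RAG research paradigm through its three stages (Naive RAG, Advanced RAG, and Modular RAG). It notes the cost-effectiveness and performance advantages of RAG methods and also mentions their limitations.

2. **Score: 0.15954739136091595**
   - This document describes the characteristics of the different methods and uses a quadrant chart to show how three methods differ along two dimensions: external knowledge requirements and model adaptation requirements.

3. **Score: 0.1446904405160619**
   - This document describes different ways the model answers questions and discusses the ability to incorporate existing dialogue history in multi-turn conversations.

4. **Score: 0.11192758101966194**
   - This document mainly covers the challenges RAG faces and future research directions, in particular the relationship between RAG and long context.

5. **Score: 0.11017433690787737**
   - This document discusses how hallucination, irrelevant content, and bias affect RAG output, as well as challenges in the augmentation process.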

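#### Effect of the c value in RRF:
The c value in RRF is a smoothing constant in the per-list score 1 / (c + rank). A larger c flattens the score differences between ranks, so a top position in any single list matters less and documents that appear in many lists gain relative weight. A smaller c makes the score fall off sharply with rank, so top-ranked documents dominate the fused ordering.

In this experiment, the documents with the highest fused scores concentrate on content that is closely related to the query, which suggests that RRF can improve the relevance and quality of the retrieved results.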
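
The influence of c can be checked with a few lines of arithmetic. The sketch below compares the score a single list contributes at rank 1 and at rank 10 for a small and a large constant; the values `c=1` and `c=60` are chosen only for illustration, and `rrf_score` is a hypothetical helper.

```python
def rrf_score(rank, c):
    """RRF contribution of a document at a 1-based rank."""
    return 1.0 / (c + rank)


for c in (1, 60):
    first, tenth = rrf_score(1, c), rrf_score(10, c)
    print(f"c={c}: rank 1 -> {first:.4f}, rank 10 -> {tenth:.4f}, ratio {first / tenth:.2f}")
```

With `c=1` a first-place document scores 5.5 times as much as a tenth-place one, while with `c=60` the ratio is only about 1.15, so a large c rewards appearing in many lists more than ranking first in one.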

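#### Where RRF is useful:
RRF is especially useful in the following situations:

1. **Fusing multiple retrievers**:
   - When several retrievers are used (or, as here, several queries are sent to one retriever), RRF can merge their ranked results effectively, which improves the accuracy and relevance of the final ordering.

2. **Handling complex queries**:
   - For complex or ambiguous queries, RRF re-ranks the results so that the most relevant documents move to the top, which gives more accurate and comprehensive answers.

3. **Improving retrieval quality**:
   - RRF balances the influence of the individual result lists and avoids the bias of any single one, which improves the overall quality of the retrieved results.

In summary, RRF is a powerful re-ranking technique whose constant c controls how strongly top ranks are favored when result lists are combined, which can improve the relevance and quality of the retrieved results. It is particularly useful for fusing multiple result lists and for handling complex queries, where it helps provide more accurate and comprehensive answers.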

### (c) RAG Fusion Chain:

In [41]:
# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

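# Implement the RAG Fusion chain.
# The question is read from the chain input via itemgetter rather than from a global variable.
rag_fusion_chain = (
    {
        "context": retrieval_chain_rag_fusion | (lambda docs: "\n\n".join(doc.page_content for doc, _ in docs)),
        "question": itemgetter("question"),
    }
    | prompt
    | llm
    | output_parser
)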

res = rag_fusion_chain.invoke({"question": question})

In [42]:
print(res)

The paper discusses several key challenges related to Retrieval-Augmented Generation (RAG) models. These include:

1. **Retrieval Quality**: Evaluating the effectiveness of the context sourced by RAG, which is crucial for determining its overall quality.
2. **Hybrid Approaches**: Combining RAG with fine-tuning is becoming increasingly popular as a way to enhance model performance.
3. **Optimization Process**: The optimization process involving RAG and fine-tuning may require multiple iterations to achieve satisfactory results.
4. **Noise Robustness**: RAG models need to be able to handle noise and irrelevant information in the retrieved documents, which can impact their accuracy.
5. **Context Relevance**: Ensuring that the model is relevantly informative and focused on answering specific questions.
6. **Hybrid Approaches**: Combining RAG with fine-tuning is emerging as a leading approach for enhancing model performance.
7. **Incorporating Irrelevant Documents**: Including irrelevant do