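# Feedback Loops in RAG

A RAG system with a feedback loop can keep improving itself. By collecting and incorporating user feedback, the system learns to return more relevant, higher-quality responses with each interaction.

Traditional RAG systems are static: they retrieve information purely on embedding similarity. A feedback loop turns this into a dynamic system that can:

-   Remember what worked (and what did not)
-   Adjust document relevance scores over time
-   Fold successful question-answer (Q&A) pairs into its knowledge base
-   Become more useful with every user interaction

Import the required libraries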

In [55]:
import pymupdf
import os
import numpy as np
import json
import openai
from tqdm import tqdm
import re
from datetime import datetime

Extract text from the PDF

In [34]:
def extract_text_from_pdf(pdf_path):
    """
    Extract the text of every page in a PDF file.

    Args:
        pdf_path (str): Path to the PDF file.

    Returns:
        str: The text extracted from the PDF.
    """
    # Open the PDF file
    mypdf = pymupdf.open(pdf_path)
    all_text = ""  # Accumulates the extracted text

    # Iterate over every page in the PDF
    for page_num in range(mypdf.page_count):
        page = mypdf[page_num]  # Get the page
        text = page.get_text("text")  # Extract the page's plain text
        all_text += text  # Append it to the running string

    return all_text  # Return the extracted text

pdf_path = "data/AI_Information.pdf"


extracted_text = extract_text_from_pdf(pdf_path)

print(extracted_text[:500])

Understanding Artificial Intelligence 
Chapter 1: Introduction to Artificial Intelligence 
Artificial intelligence (AI) refers to the ability of a digital computer or computer-controlled robot 
to perform tasks commonly associated with intelligent beings. The term is frequently applied to 
the project of developing systems endowed with the intellectual processes characteristic of 
humans, such as the ability to reason, discover meaning, generalize, or learn from past 
experience. Over the past f


Chunking

In [35]:
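def chunk_text(text, n, overlap):
    """
    Split text into chunks of n characters, with `overlap` characters
    shared between consecutive chunks.

    Args:
        text (str): The input text.
        n (int): Size of each chunk, in characters.
        overlap (int): Number of characters shared by adjacent chunks.

    Returns:
        List[str]: The text chunks.
    """
    if overlap >= n:
        # A non-positive step would make range() raise or yield no chunks
        raise ValueError("overlap must be smaller than n")

    chunks = []
    # Advance by n - overlap so each chunk repeats the tail of the previous one
    for i in range(0, len(text), n - overlap):
        chunks.append(text[i:i + n])

    return chunks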
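
To make the overlap concrete, here is a small self-contained sketch that runs the same sliding-window logic on a ten-character string. The helper name `chunk_text_demo` is used only for this illustration. With `n=4` and `overlap=2` the window advances two characters at a time, so each chunk repeats the last two characters of the previous one.

```python
def chunk_text_demo(text, n, overlap):
    """Same sliding-window logic as chunk_text, repeated so the example runs on its own."""
    step = n - overlap  # how far the window advances on each iteration
    return [text[i:i + n] for i in range(0, len(text), step)]

demo_chunks = chunk_text_demo("abcdefghij", n=4, overlap=2)
print(demo_chunks)
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Note that the final chunk can be shorter than `n` and may be fully contained in the chunk before it, as `'ij'` is here.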

Configure the client

In [36]:
client = openai.OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If the environment variable is not set, put your API key here instead
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"  # base_url of the Bailian (DashScope) service
)

A simple vector store

In [37]:
class SimpleVectorStore:
    """
    A minimal in-memory vector store.
    """
    def __init__(self):
        self.vectors = []   # Embedding vectors (np.ndarray)
        self.texts = []     # Text of each item
        self.metadata = []  # Metadata dict of each item

    def add_item(self, text, embedding, metadata=None):
        """
        Add an item to the store.

        Args:
            text (str): The text content.
            embedding (List[float]): The embedding vector of the text.
            metadata (Dict, optional): Metadata associated with the text.
        """
        self.vectors.append(np.array(embedding))
        self.texts.append(text)
        self.metadata.append(metadata or {})

    def similarity_search(self, query_embedding, k=5):
        """
        Find the texts most similar to the query embedding.

        Args:
            query_embedding (List[float]): The embedding vector of the query.
            k (int, optional): Number of results to return.

        Returns:
            List[Dict]: The most similar texts with their metadata and similarity.
        """
        if not self.vectors:
            return []

        query_vector = np.array(query_embedding)

        # Cosine similarity between the query and every stored vector
        similarities = []
        for i, vector in enumerate(self.vectors):
            similarity = np.dot(query_vector, vector) / (np.linalg.norm(query_vector) * np.linalg.norm(vector))
            similarities.append((i, similarity))

        # Sort by similarity, highest first
        similarities.sort(key=lambda x: x[1], reverse=True)

        # Collect the top-k results
        results = []
        for i in range(min(k, len(similarities))):
            idx, score = similarities[i]
            results.append({
                "text": self.texts[idx],
                "metadata": self.metadata[idx],
                "similarity": score
            })

        return results
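
The per-vector loop in `similarity_search` is easy to read, but the same cosine ranking can be computed in one matrix operation. The sketch below is an alternative formulation, not part of the notebook's class; `top_k_cosine` and the toy vectors are names made up for this example.

```python
import numpy as np

def top_k_cosine(query_embedding, vectors, k=5):
    """Return (index, similarity) pairs for the k vectors closest to the query."""
    matrix = np.asarray(vectors, dtype=float)          # shape: (num_items, dim)
    query = np.asarray(query_embedding, dtype=float)   # shape: (dim,)
    # Cosine similarity of the query against every row at once
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:k]                      # indices by descending similarity
    return [(int(i), float(sims[i])) for i in order]

toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top = top_k_cosine([1.0, 0.0], toy_vectors, k=2)
print(top)  # index 0 (same direction) ranks first, then index 2 (45 degrees away)
```

For a store with a few dozen chunks the difference is negligible; the vectorized form starts to matter when the store holds many thousands of items.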

Generating embeddings

In [38]:
def create_embeddings_in_batches(text_chunks, model="text-embedding-v3", batch_size_limit=10):
    """
    Create embeddings for a list of texts through the OpenAI-compatible
    embeddings API, respecting the API's batch size limit.

    Args:
        text_chunks (List[str]): The texts to embed.
        model (str): The embedding model to use.
        batch_size_limit (int): Maximum batch size the API accepts (10 here, based on the API's error message).

    Returns:
        List[List[float]]: The embedding vectors of all texts.
    """
    all_embeddings = []
    if not text_chunks:
        return []

    if not isinstance(text_chunks, list):  # Make sure the input is a list
        text_chunks = [text_chunks]

    for i in range(0, len(text_chunks), batch_size_limit):
        batch = text_chunks[i:i + batch_size_limit]
        try:
            response = client.embeddings.create(
                input=batch,
                model=model,
                encoding_format="float"
            )
            # Extract this batch's embedding vectors from the response
            batch_embeddings = [item.embedding for item in response.data]
            all_embeddings.extend(batch_embeddings)

        except Exception as e:
            print(f"Error processing batch starting with chunk: '{batch[0][:50]}...'")
            print(f"API Error: {e}")
            raise  # Re-raise with the original traceback

    return all_embeddings

def create_embeddings(text, model="text-embedding-v3"):
    """
    Create the embedding of a single string.

    Args:
        text (str): The text to embed.
        model (str): The embedding model to use.

    Returns:
        List[float]: The embedding vector of the text.
    """
    response = client.embeddings.create(
        model=model,
        input=text
    )

    return response.data[0].embedding
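
The batching logic can be exercised without network access by passing in a stand-in for the API call. The sketch below assumes nothing about the real service: `embed_in_batches` and `fake_embed` are hypothetical helpers written for this illustration, and the "embedding" is just the text length.

```python
def embed_in_batches(texts, embed_fn, batch_size=10):
    """Call embed_fn on slices of at most batch_size texts and concatenate the results."""
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        embeddings.extend(embed_fn(batch))
    return embeddings

batch_sizes = []  # records how many texts each call received

def fake_embed(batch):
    """Stand-in for the API call: one 1-dimensional 'embedding' per text."""
    batch_sizes.append(len(batch))
    return [[float(len(text))] for text in batch]

vectors = embed_in_batches([f"chunk {i}" for i in range(23)], fake_embed, batch_size=10)
print(batch_sizes)   # [10, 10, 3]
print(len(vectors))  # 23
```

Injecting the embedding function this way also makes it straightforward to unit-test the rest of the pipeline offline.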

The feedback system

In [39]:
def get_user_feedback(query, response, relevance, quality, comments=""):
    """
    Format user feedback as a dictionary.

    Args:
        query (str): The user's query.
        response (str): The model's response.
        relevance (int): Relevance rating of the response, from 1 to 5.
        quality (int): Quality rating of the response, from 1 to 5.
        comments (str): Optional free-text comments.

    Returns:
        dict: The formatted feedback record.
    """
    return {
        "query": query,
        "response": response,
        "relevance": int(relevance),
        "quality": int(quality),
        "comments": comments,
        "timestamp": datetime.now().isoformat()
    }

In [40]:
def store_feedback(feedback, feedback_file="feedback_data.json"):
    """
    Append a feedback record to a file, one JSON object per line.

    Args:
        feedback (Dict): The feedback record.
        feedback_file (str): Name of the file that stores feedback.
    """
    with open(feedback_file, "a") as f:
        json.dump(feedback, f)
        f.write("\n")

In [41]:
def load_feedback_data(feedback_file="feedback_data.json"):
    """
    Load all feedback records from a file (one JSON object per line).

    Returns an empty list if the file does not exist.
    """
    feedback_data = []
    try:
        with open(feedback_file, "r") as f:
            for line in f:
                if line.strip():
                    feedback_data.append(json.loads(line.strip()))
    except FileNotFoundError:
        print("No feedback data file found. Starting with empty feedback.")

    return feedback_data
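
Although the default file name ends in `.json`, the two functions above read and write one JSON object per line (the JSON Lines layout), so calling `json.load` on the whole file would fail once it holds more than one record. The round trip below is a self-contained sketch of that layout using a temporary directory; the sample records are made up.

```python
import json
import os
import tempfile

records = [
    {"query": "q1", "relevance": 5, "quality": 4},
    {"query": "q2", "relevance": 2, "quality": 3},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "feedback_demo.json")

    # Append one JSON object per line, as store_feedback does
    for record in records:
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    # Read the file back line by line, as load_feedback_data does
    with open(path, "r", encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == records)  # True
```

Appending line by line means a new record never requires rewriting the existing file.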

Document processing pipeline

In [42]:
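def process_document(pdf_path, chunk_size=1000, chunk_overlap=200):
    """
    Process a document for RAG with a feedback loop.

    This function runs the full document-processing pipeline:
    1. Extract text from the PDF
    2. Split the text into overlapping chunks
    3. Create embeddings for the chunks
    4. Store the chunks, with metadata, in the vector store

    Args:
        pdf_path (str): Path to the PDF file.
        chunk_size (int): Size of each chunk, in characters.
        chunk_overlap (int): Overlap between adjacent chunks, in characters.

    Returns:
        Tuple[List[str], SimpleVectorStore]: The chunks and the populated store.
    """
    print("Extracting text from PDF...")
    extracted_text = extract_text_from_pdf(pdf_path)

    print("Chunking text...")
    chunks = chunk_text(extracted_text, chunk_size, chunk_overlap)
    print(f"Created {len(chunks)} text chunks")

    print("Creating embeddings for chunks...")
    chunk_embeddings = create_embeddings_in_batches(chunks)

    store = SimpleVectorStore()

    for i, (chunk, embedding) in enumerate(zip(chunks, chunk_embeddings)):
        store.add_item(
            text=chunk,
            embedding=embedding,
            metadata={
                "index": i,               # Position of the chunk in the document
                "source": pdf_path,       # Source file
                "relevance_score": 1.0,   # Initial relevance score
                "feedback_count": 0       # No feedback received yet
            }
        )

    print(f"Added {len(chunks)} chunks to the vector store")
    return chunks, store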

Feedback-based relevance adjustment

In [43]:
def assess_feedback_relevance(query, doc_text, feedback):
    """
    Use an LLM to judge whether a past feedback record is relevant.

    Args:
        query (str): The current query.
        doc_text (str): Text content of the document.
        feedback (dict): Feedback record holding the past query, response, comments and ratings.

    Returns:
        bool: True if the feedback is relevant to the current query and document, otherwise False.
    """
    system_prompt = """You are an AI system that determines if a past feedback is relevant to a current query and document.
    Answer with ONLY 'yes' or 'no'. Your job is strictly to determine relevance, not to provide explanations."""

    user_prompt = f"""
    Current query: {query}
    Past query that received feedback: {feedback['query']}
    Document content: {doc_text[:500]}... [truncated]
    Past response that received feedback: {feedback['response'][:500]}... [truncated]

    Is this past feedback relevant to the current query and document? (yes/no)
    """

    response = client.chat.completions.create(
        model="qwen-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0,  # Deterministic output for a yes/no decision
        extra_body={"enable_thinking": False}  # Non-standard option, sent as part of the request body
    )

    answer = response.choices[0].message.content.strip().lower()
    return 'yes' in answer

In [44]:
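def adjust_relevance_scores(query, results, feedback_data):
    """
    Adjust document relevance scores using historical feedback to improve retrieval quality.

    This function analyzes past user feedback to dynamically adjust the scores
    of the retrieved documents. It identifies feedback relevant to the current
    query context, computes a score modifier from the relevance ratings, and
    re-ranks the results accordingly.

    Args:
        query (str): The user's query text.
        results (List[Dict]): Retrieved results with document text and similarity scores.
        feedback_data (List[Dict]): Historical user feedback records.

    Returns:
        List[Dict]: The results with adjusted scores, re-sorted.
    """
    if not feedback_data:
        return results

    print("Adjusting relevance scores based on feedback history...")

    for i, result in enumerate(results):
        document_text = result["text"]
        relevant_feedback = []

        # One LLM call per (document, feedback) pair
        for feedback in feedback_data:
            is_relevant = assess_feedback_relevance(query, document_text, feedback)
            if is_relevant:
                relevant_feedback.append(feedback)

        if relevant_feedback:
            # Average relevance rating (1-5) of the matching feedback
            avg_relevance = sum(f['relevance'] for f in relevant_feedback) / len(relevant_feedback)

            # Map the average rating to a multiplier between 0.7 and 1.5
            modifier = 0.5 + (avg_relevance / 5.0)

            original_score = result["similarity"]
            adjusted_score = original_score * modifier

            result["original_similarity"] = original_score     # Keep the unadjusted score
            result["similarity"] = adjusted_score              # Score used for ranking
            result["relevance_score"] = adjusted_score         # Updated relevance score
            result["feedback_applied"] = True                  # Mark that feedback was applied
            result["feedback_count"] = len(relevant_feedback)  # Number of feedback records used

            print(f"  Document {i+1}: Adjusted score from {original_score:.4f} to {adjusted_score:.4f} based on {len(relevant_feedback)} feedback(s)")

    # Re-rank by the adjusted score
    results.sort(key=lambda x: x["similarity"], reverse=True)

    return results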
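
The multiplier in `adjust_relevance_scores` is `0.5 + avg_relevance / 5.0`. The sketch below works through that arithmetic and shows how it can reorder two hits; the `hits` data and the `feedback_modifier` helper are invented for illustration, and no LLM call is involved.

```python
def feedback_modifier(avg_relevance):
    """Score multiplier used in adjust_relevance_scores: 0.5 + avg / 5."""
    return 0.5 + avg_relevance / 5.0

# Lowest rating, neutral point, highest rating
for rating in (1, 2.5, 5):
    print(rating, round(feedback_modifier(rating), 2))

# Re-ranking two hypothetical hits
hits = [
    {"text": "chunk A", "similarity": 0.80, "avg_relevance": 1.0},
    {"text": "chunk B", "similarity": 0.60, "avg_relevance": 5.0},
]
for hit in hits:
    hit["adjusted"] = hit["similarity"] * feedback_modifier(hit["avg_relevance"])
hits.sort(key=lambda h: h["adjusted"], reverse=True)
print([h["text"] for h in hits])  # ['chunk B', 'chunk A']
```

An average rating of 1 shrinks a score by 30%, an average of 5 boosts it by 50%, and the neutral point is 2.5. Because ratings are integers from 1 to 5, a middling rating of 3 already produces a small boost (a multiplier of 1.1).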

Using feedback to fine-tune the index

In [45]:
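def fine_tune_index(current_store, chunks, feedback_data):
    """
    Enrich the vector store with high-quality feedback to improve retrieval over time.

    This function implements a continuous-learning step by:
    1. Identifying high-quality feedback (highly rated Q&A pairs)
    2. Creating new retrieval items from those successful interactions
    3. Adding them to the vector store with a boosted relevance weight

    Args:
        current_store (SimpleVectorStore): The current store holding the original document chunks.
        chunks (List[str]): The original document chunks.
        feedback_data (List[Dict]): Historical user feedback with relevance and quality ratings.

    Returns:
        SimpleVectorStore: A store holding the original chunks plus items derived from successful interactions.
    """
    print("Fine-tuning index with high-quality feedback...")

    # Keep only feedback rated 4 or higher on both relevance and quality
    good_feedback = [f for f in feedback_data if f['relevance'] >= 4 and f['quality'] >= 4]

    if not good_feedback:
        print("No high-quality feedback found for fine-tuning.")
        return current_store

    new_store = SimpleVectorStore()

    # Copy every original item into the new store
    for i in range(len(current_store.texts)):
        new_store.add_item(
            text=current_store.texts[i],
            embedding=current_store.vectors[i],
            metadata=current_store.metadata[i].copy()
        )

    for feedback in good_feedback:
        # Combine the question and its answer into one retrievable text
        enhanced_text = f"Question: {feedback['query']}\nAnswer: {feedback['response']}"

        embedding = create_embeddings(enhanced_text)

        new_store.add_item(
            text=enhanced_text,
            embedding=embedding,
            metadata={
                "type": "feedback_enhanced",   # Marks items derived from feedback
                "query": feedback["query"],    # The original query
                "relevance_score": 1.2,        # Boosted initial relevance
                "feedback_count": 1,           # Derived from one feedback record
                "original_feedback": feedback  # Full feedback record
            }
        )

        print(f"Added enhanced content from feedback: {feedback['query'][:50]}...")
    print(f"Fine-tuned index now has {len(new_store.texts)} items (original: {len(chunks)})")
    return new_store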
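
One detail worth noticing: `SimpleVectorStore.similarity_search` ranks purely by cosine similarity and never reads the `relevance_score` stored in metadata, so the 1.2 assigned to feedback-derived items does not change the ranking by itself. If that boost is meant to take effect, one possible approach is to multiply it in after retrieval. The sketch below is an assumption about how that could look, not something the notebook does; `apply_metadata_boost` and the sample results are hypothetical.

```python
def apply_metadata_boost(results):
    """Multiply each similarity by the item's metadata relevance_score (default 1.0), then re-sort."""
    boosted = []
    for result in results:
        weight = result.get("metadata", {}).get("relevance_score", 1.0)
        boosted.append({**result, "similarity": result["similarity"] * weight})
    return sorted(boosted, key=lambda r: r["similarity"], reverse=True)

sample_results = [
    {"text": "original chunk", "similarity": 0.70,
     "metadata": {"relevance_score": 1.0}},
    {"text": "Question: ...\nAnswer: ...", "similarity": 0.65,
     "metadata": {"type": "feedback_enhanced", "relevance_score": 1.2}},
]
reranked = apply_metadata_boost(sample_results)
print([r["metadata"].get("type", "chunk") for r in reranked])
# ['feedback_enhanced', 'chunk']
```

Here the feedback-derived item overtakes the original chunk because 0.65 × 1.2 = 0.78 exceeds 0.70.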

In [46]:
def generate_response(query, context, model="qwen3-4b"):
    """
    Generate an answer to the query using only the supplied context.

    Args:
        query (str): The user's query.
        context (str): Retrieved text to ground the answer in.
        model (str): The LLM used to generate the response.

    Returns:
        str: The generated response.
    """
    system_prompt = """You are a helpful AI assistant. Answer the user's question based only on the provided context. If you cannot find the answer in the context, state that you don't have enough information."""

    user_prompt = f"""
        Context:
        {context}

        Question: {query}

        Please provide a comprehensive answer based only on the context above.
    """

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0,  # Deterministic output
        extra_body={"enable_thinking": False}  # Non-standard option, sent as part of the request body
    )

    # Return the generated response content
    return response.choices[0].message.content

In [47]:
def rag_with_feedback_loop(query, vector_store, feedback_data, k=5, model="qwen3-4b"):
    """
    The complete RAG pipeline, including the feedback loop.

    Args:
        query (str): The user's query.
        vector_store (SimpleVectorStore): Vector store holding the document chunks.
        feedback_data (List[Dict]): Historical user feedback with queries, responses, and relevance and quality ratings.
        k (int): Number of chunks to consider in the initial retrieval.
        model (str): The LLM used to generate the response.

    Returns:
        Dict: The query, the retrieved chunks, and the generated response.
    """
    print("\n=== Processing query with feedback-enhanced RAG ===")
    print(f"Query: {query}")

    # Embed the query and retrieve the top-k chunks
    query_embedding = create_embeddings(query)

    results = vector_store.similarity_search(query_embedding, k=k)

    # Re-rank the retrieved chunks using past feedback
    adjusted_results = adjust_relevance_scores(query, results, feedback_data)

    retrieved_texts = [result["text"] for result in adjusted_results]

    # Join the chunks into a single context string
    context = "\n\n---\n\n".join(retrieved_texts)

    print("Generating response...")
    response = generate_response(query, context, model)

    result = {
        "query": query,
        "retrieved_documents": adjusted_results,
        "response": response
    }

    print("\n=== Response ===")
    print(response)

    return result

Complete workflow: from initialization to feedback collection

In [48]:
def full_rag_workflow(pdf_path, query, feedback_data=None, feedback_file="feedback_data.json", fine_tune=False):
    """
    Run the complete RAG workflow with feedback integration for continuous improvement.

    This function orchestrates the whole retrieval-augmented generation process:
    1. Load historical feedback data
    2. Process and chunk the document
    3. Optionally fine-tune the vector index with prior feedback
    4. Retrieve and generate using feedback-adjusted relevance scores
    5. Collect new user feedback for future improvement
    6. Store the feedback so the system can learn over time
    """
    if feedback_data is None:
        feedback_data = load_feedback_data(feedback_file)
        print(f"Loaded {len(feedback_data)} feedback entries from {feedback_file}")

    chunks, vector_store = process_document(pdf_path)

    if fine_tune and feedback_data:
        vector_store = fine_tune_index(vector_store, chunks, feedback_data)

    result = rag_with_feedback_loop(query, vector_store, feedback_data)

    print("\n=== Would you like to provide feedback on this response? ===")
    print("Rate relevance (1-5, with 5 being most relevant):")
    relevance = input()

    print("Rate quality (1-5, with 5 being highest quality):")
    quality = input()

    print("Any comments? (optional, press Enter to skip)")
    comments = input()

    # int() raises ValueError if a rating is not a whole number
    feedback = get_user_feedback(
        query=query,
        response=result["response"],
        relevance=int(relevance),
        quality=int(quality),
        comments=comments
    )

    store_feedback(feedback, feedback_file)
    print("Feedback recorded. Thank you!")

    return result
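
`full_rag_workflow` passes the raw `input()` strings straight to `int()`, so an empty line or a word raises `ValueError`, and a value such as 7 is accepted as is. A small validation helper could guard against both. The sketch below is one way to do it; `parse_rating` is a hypothetical name and is not part of the notebook.

```python
def parse_rating(raw, low=1, high=5):
    """Return the rating as an int in [low, high], or None if the input is not usable."""
    try:
        value = int(raw.strip())
    except ValueError:
        return None
    return value if low <= value <= high else None

print(parse_rating("4"))     # 4
print(parse_rating(" 5 "))   # 5
print(parse_rating("7"))     # None
print(parse_rating("good"))  # None
print(parse_rating(""))      # None
```

A caller could re-prompt while the helper returns `None`, or skip storing the feedback altogether.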

Evaluation

In [49]:
def evaluate_feedback_loop(pdf_path, test_queries, reference_answers=None):
    """
    Evaluate the impact of the feedback loop on RAG quality by comparing
    performance before and after feedback is integrated.

    This function runs a controlled experiment to measure how incorporating
    feedback affects retrieval and generation:
    1. Round 1: run all test queries with no feedback
    2. Generate synthetic feedback from the reference answers (if provided)
    3. Round 2: run the same queries with feedback-enhanced retrieval
    4. Compare the results of the two rounds to quantify the effect of feedback
    """
    print("=== Evaluating Feedback Loop Impact ===")

    # Temporary file for the synthetic feedback; removed at the end
    temp_feedback_file = "temp_evaluation_feedback.json"

    feedback_data = []

    print("\n=== ROUND 1: NO FEEDBACK ===")
    round1_results = []

    for i, query in enumerate(test_queries):
        print(f"\nQuery {i+1}: {query}")
        chunks, vector_store = process_document(pdf_path)

        result = rag_with_feedback_loop(query, vector_store, [])
        round1_results.append(result)

        if reference_answers and i < len(reference_answers):
            # Score the response against the reference answer
            similarity_to_ref = calculate_similarity(result["response"], reference_answers[i])

            # Convert the similarity to a 1-5 rating
            relevance = max(1, min(5, int(similarity_to_ref * 5)))
            quality = max(1, min(5, int(similarity_to_ref * 5)))

            feedback = get_user_feedback(
                query=query,
                response=result["response"],
                relevance=relevance,
                quality=quality,
                comments=f"Synthetic feedback based on reference similarity: {similarity_to_ref:.2f}"
            )

            feedback_data.append(feedback)
            store_feedback(feedback, temp_feedback_file)

    print("\n=== ROUND 2: WITH FEEDBACK ===")
    round2_results = []

    # Rebuild the store and enrich it with the synthetic feedback
    chunks, vector_store = process_document(pdf_path)
    vector_store = fine_tune_index(vector_store, chunks, feedback_data)

    for i, query in enumerate(test_queries):
        print(f"\nQuery {i+1}: {query}")

        result = rag_with_feedback_loop(query, vector_store, feedback_data)
        round2_results.append(result)

    comparison = compare_results(test_queries, round1_results, round2_results, reference_answers)

    # Clean up the temporary feedback file
    if os.path.exists(temp_feedback_file):
        os.remove(temp_feedback_file)

    return {
        "round1_results": round1_results,
        "round2_results": round2_results,
        "comparison": comparison
    }
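
The synthetic ratings in `evaluate_feedback_loop` come from `max(1, min(5, int(similarity * 5)))`. The sketch below isolates that mapping so the thresholds are easy to see; `similarity_to_rating` is a name introduced only for this example.

```python
def similarity_to_rating(similarity):
    """Map a similarity to a 1-5 rating the way evaluate_feedback_loop does."""
    return max(1, min(5, int(similarity * 5)))

for sim in (0.15, 0.79, 0.80, 1.0):
    print(sim, similarity_to_rating(sim))
# 0.15 1
# 0.79 3
# 0.8 4
# 1.0 5
```

Since `int()` truncates, a response needs a similarity of at least 0.8 to its reference answer to receive a rating of 4, which is also the minimum that `fine_tune_index` accepts when it selects Q&A pairs to add to the store.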

In [50]:
def calculate_similarity(text1, text2):
    """
    Compute the cosine similarity between the embeddings of two texts.
    """
    embedding1 = create_embeddings(text1)
    embedding2 = create_embeddings(text2)

    vec1 = np.array(embedding1)
    vec2 = np.array(embedding2)

    similarity = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

    return similarity

In [58]:
def compare_results(queries, round1_results, round2_results, reference_answers=None):
    """
    Compare the responses of the two RAG rounds with an LLM judge.

    Returns:
        List[Dict]: One entry per query, holding the query and the judge's analysis.
    """
    print("\n=== COMPARING RESULTS ===")

    system_prompt = """You are an expert evaluator of RAG systems. Compare responses from two versions:
        1. Standard RAG: No feedback used
        2. Feedback-enhanced RAG: Uses a feedback loop to improve retrieval

        Analyze which version provides better responses in terms of:
        - Relevance to the query
        - Accuracy of information
        - Completeness
        - Clarity and conciseness
    """

    comparisons = []

    for i, (query, r1, r2) in enumerate(zip(queries, round1_results, round2_results)):

        comparison_prompt = f"""
        Query: {query}

        Standard RAG Response:
        {r1["response"]}

        Feedback-enhanced RAG Response:
        {r2["response"]}
        """

        # Include the reference answer when one is available
        if reference_answers and i < len(reference_answers):
            comparison_prompt += f"""
            Reference Answer:
            {reference_answers[i]}
            """

        comparison_prompt += """
        Compare these responses and explain which one is better and why.
        Focus specifically on how the feedback loop has (or hasn't) improved the response quality.
        """

        response = client.chat.completions.create(
            model="qwen-plus",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": comparison_prompt}
            ],
        )

        comparisons.append({
            "query": query,
            "analysis": response.choices[0].message.content
        })

        print(f"\nQuery {i+1}: {query}")
        print(f"Analysis: {response.choices[0].message.content}")

    return comparisons

In [59]:

pdf_path = "data/AI_Information.pdf"


test_queries = [
    "What is a neural network and how does it function?",

    #################################################################################
    ### Commented out queries to reduce the number of queries for testing purposes ###
    
    # "Describe the process and applications of reinforcement learning.",
    # "What are the main applications of natural language processing in today's technology?",
    # "Explain the impact of overfitting in machine learning models and how it can be mitigated."
]


reference_answers = [
    "A neural network is a series of algorithms that attempt to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. It consists of layers of nodes, with each node representing a neuron. Neural networks function by adjusting the weights of connections between nodes based on the error of the output compared to the expected result.",

    ############################################################################################
    #### Commented out reference answers to reduce the number of queries for testing purposes ###

#     "Reinforcement learning is a type of machine learning where an agent learns to make decisions by performing actions in an environment to maximize cumulative reward. It involves exploration, exploitation, and learning from the consequences of actions. Applications include robotics, game playing, and autonomous vehicles.",
#     "The main applications of natural language processing in today's technology include machine translation, sentiment analysis, chatbots, information retrieval, text summarization, and speech recognition. NLP enables machines to understand and generate human language, facilitating human-computer interaction.",
#     "Overfitting in machine learning models occurs when a model learns the training data too well, capturing noise and outliers. This results in poor generalization to new data, as the model performs well on training data but poorly on unseen data. Mitigation techniques include cross-validation, regularization, pruning, and using more training data."
]

evaluation_results = evaluate_feedback_loop(
    pdf_path=pdf_path,
    test_queries=test_queries,
    reference_answers=reference_answers
)

=== Evaluating Feedback Loop Impact ===

=== ROUND 1: NO FEEDBACK ===

Query 1: What is a neural network and how does it function?
Extracting text from PDF...
Chunking text...
Created 42 text chunks
Creating embeddings for chunks...
Added 42 chunks to the vector store

=== Processing query with feedback-enhanced RAG ===
Query: What is a neural network and how does it function?
Generating response...

=== Response ===
The context provided does not explicitly define what a neural network is or explain how it functions. However, it does mention deep learning as a subfield of machine learning that uses artificial neural networks with multiple layers (deep neural networks) to analyze data, inspired by the structure and function of the human brain. It also refers to Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), which are types of neural networks. 

From the context, we can infer that a neural network is a computational model inspired by the human brain, consistin