# RAG Agent
First, install the required libraries.

In [None]:
# pip install langchain langchain-openai langchain-chroma langchain-text-splitters langchain-community dashscope bs4

## LangSmith
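Many applications built with LangChain contain multiple steps and make multiple LLM calls. As these applications grow more complex, it becomes important to be able to inspect exactly what is running inside your agent or chain. This is where LangSmith comes in.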

Setting up the API key and related configuration beforehand is not covered in detail here.
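
As a brief pointer only, LangSmith tracing is usually switched on through environment variables. The sketch below assumes the `LANGSMITH_TRACING` and `LANGSMITH_API_KEY` variable names and assumes the key itself is supplied outside the notebook (for example, exported in the shell); it never writes a key into the code.

```python
import os

# Turn on LangSmith tracing for LangChain calls made in this process.
os.environ["LANGSMITH_TRACING"] = "true"

# Assumption: the API key is provided outside the notebook, not hard-coded here.
langsmith_key_present = bool(os.environ.get("LANGSMITH_API_KEY"))
if not langsmith_key_present:
    print("LANGSMITH_API_KEY is not set; set it before relying on LangSmith traces.")
```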
## Importing the models and the vector store
Configure the chat model, the embedding model, and the vector store.

In [2]:
import os

from langchain_chroma import Chroma
from langchain_community.embeddings import DashScopeEmbeddings
from langchain_openai import ChatOpenAI

# Configure the chat model
model = ChatOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    model="qwen-plus",  # qwen-plus is used as an example; swap in another model name as needed. Model list: https://help.aliyun.com/zh/model-studio/getting-started/models
    # other params...
)
# Configure the embedding model
embeddings = DashScopeEmbeddings(
    model="text-embedding-v2",
    # other params...
)
# Configure the vector store

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

## 1. Indexing
This step is largely the same as for semantic search.

In [3]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

assert len(docs) == 1
print(f"Total characters: {len(docs[0].page_content)}")

USER_AGENT environment variable not set, consider setting it to identify your requests.


Total characters: 43047


In [4]:
print(docs[0].page_content[:500])



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In


In [5]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)

print(f"Split blog post into {len(all_splits)} sub-documents.")

Split blog post into 63 sub-documents.
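
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch. It is a plain fixed-size sliding window, not the separator-aware algorithm that `RecursiveCharacterTextSplitter` uses, and the helper name `sliding_window_chunks` is hypothetical. It only shows how overlapping chunks and their start indices relate to the two parameters.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Return (start_index, chunk) pairs using a fixed-size window with overlap.

    Simplified illustration only: the real splitter also prefers to break
    on separators such as paragraph and line boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


demo_chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
for start_index, chunk in demo_chunks:
    print(start_index, chunk)
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, which is what helps keep a sentence that straddles a boundary retrievable from either side.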


In [6]:
document_ids = vector_store.add_documents(documents=all_splits)

print(document_ids[:3])

['f708c48a-741c-434e-934e-77882926be89', '7c83bc5b-1784-42f9-b5e6-8f6895ae00ff', '678673b4-181b-44f0-9659-19a1f2b09aa1']


## 2. Retrieval and Generation
![](./Imaegs/rag_retrieval_generation.avif)
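1. Retrieval: given the user input, fetch the most relevant document chunks from the knowledge base.
2. Generation: build a prompt from the retrieved chunks and the user's question, then generate the answer the user needs.

Next we write the actual application logic. We want a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and the original question to the model, and returns an answer.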
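
The retrieval step can be pictured with a toy example. The sketch below is not how Chroma or the DashScope embedding model work internally: it swaps in a bag-of-words vector and cosine similarity as a stand-in, and the names `embed_bag_of_words`, `cosine_similarity`, and `top_k_chunks` are hypothetical.

```python
import math
from collections import Counter


def embed_bag_of_words(text: str) -> Counter:
    """Toy 'embedding': word counts. A real embedding model returns dense vectors."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and keep the k best."""
    query_vec = embed_bag_of_words(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed_bag_of_words(chunk)),
        reverse=True,
    )
    return ranked[:k]


toy_chunks = [
    "task decomposition with chain of thought",
    "memory stores past observations",
    "tool use extends the agent",
]
print(top_k_chunks("task decomposition", toy_chunks, k=1))
```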

One option is to build a simple RAG agent equipped with a tool that searches the vector store.

In [7]:
from langchain.tools import tool

@tool(response_format="content_and_artifact")
def retrieve_context(query: str):
    """Retrieve information to help answer a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

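Here we use the tool decorator to configure the tool so that the raw documents are attached to each ToolMessage as artifacts. This gives the application access to the document metadata, separately from the stringified representation that is sent to the model.

Retrieval tools are not limited to a single string `query` argument. You can force the LLM to specify additional search parameters by adding arguments, for example a category, as in the signature below: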

In [None]:
#from typing import Literal

#def retrieve_context(query: str, section: Literal["beginning", "middle", "end"]):
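
As a rough, library-free illustration of what such an extra parameter could do, the sketch below labels chunks by their position in the document and filters on that label before matching. The thirds-based labelling, the keyword match, and the names `label_sections` and `retrieve_in_section` are all assumptions made for this example; in the notebook the filtering would instead be applied to the vector store search.

```python
from typing import Literal

Section = Literal["beginning", "middle", "end"]


def label_sections(chunks: list[str]) -> list[dict]:
    """Tag each chunk with the third of the document it falls in (assumed scheme)."""
    third = len(chunks) // 3
    labeled = []
    for i, text in enumerate(chunks):
        if i < third:
            section = "beginning"
        elif i < 2 * third:
            section = "middle"
        else:
            section = "end"
        labeled.append({"text": text, "section": section})
    return labeled


def retrieve_in_section(query: str, section: Section, labeled: list[dict]) -> list[str]:
    """Keep chunks from the requested section that share a word with the query."""
    words = query.lower().split()
    return [
        item["text"]
        for item in labeled
        if item["section"] == section
        and any(word in item["text"].lower() for word in words)
    ]


labeled_chunks = label_sections(
    ["intro to agents", "planning and agents", "memory", "tools", "summary of agents", "references"]
)
print(retrieve_in_section("agents", "end", labeled_chunks))
```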

Build the agent:


In [8]:
from langchain.agents import create_agent


tools = [retrieve_context]
# If desired, specify custom instructions
prompt = (
    "You have access to a tool that retrieves context from a blog post. "
    "Use the tool to help answer user queries."
)
agent = create_agent(model, tools, system_prompt=prompt)

Now let's test it. We construct a question that would typically require an iterative sequence of retrieval steps to answer:

In [9]:
query = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()


What is the standard method for Task Decomposition?

Once you get the answer, look up common extensions of that method.
Tool Calls:
  retrieve_context (call_1b7e5b659144467a85ba2e)
 Call ID: call_1b7e5b659144467a85ba2e
  Args:
    query: standard method for Task Decomposition
Name: retrieve_context

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1638}
Content: Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
Tree of Thoughts (Yao et al. 2023) extends Co

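The agent's steps:

1. It generates a query to search for the standard method for task decomposition.
2. After receiving the answer, it generates a second query to search for common extensions of that method.
3. Having received all the necessary context, it answers the question.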
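
The loop behind these steps can be sketched without any library. This is not `create_agent`'s implementation: the message dictionaries, the `scripted_model` stand-in (which replays two fixed queries instead of calling an LLM), and the stub tool are all hypothetical, and the second query text is invented for illustration.

```python
def run_agent_loop(model_step, tools, user_query, max_turns=5):
    """Call the model repeatedly; run any requested tool; stop on a plain answer."""
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_turns):
        reply = model_step(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:
            break  # no tool requested, so this reply is the final answer
        result = tools[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})
    return messages


def scripted_model(messages):
    """Stand-in for an LLM: issues two planned searches, then answers."""
    planned_queries = [
        "standard method for Task Decomposition",
        "common extensions of Chain of Thought",  # hypothetical follow-up query
    ]
    seen = sum(1 for m in messages if m["role"] == "tool")
    if seen < len(planned_queries):
        return {
            "role": "assistant",
            "content": "",
            "tool_call": {"name": "retrieve_context", "args": {"query": planned_queries[seen]}},
        }
    return {"role": "assistant", "content": f"Answer composed from {seen} retrieved results."}


def fake_retrieve_context(query: str) -> str:
    """Stub tool that echoes the query instead of searching a vector store."""
    return f"(stub result for: {query})"


history = run_agent_loop(
    scripted_model,
    {"retrieve_context": fake_retrieve_context},
    "What is the standard method for Task Decomposition?",
)
print([m["role"] for m in history])
```

Every search costs an extra model call before the final one, which is the trade-off discussed in the next section.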

## RAG chains
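The approach above has some drawbacks:
1. Two inference calls: whenever a search is performed, one call is needed to generate the query and another to produce the final response.
2. Reduced control: the LLM may skip searches that are actually needed, or run extra searches that are not.

Another common approach is a two-step chain, in which we always run a search (potentially using the raw user query) and incorporate the results as context for a single LLM query. This results in a single inference call per query, trading flexibility for lower latency.

In this approach we no longer call the model in a loop; we make a single pass instead.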
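
The single-pass shape can be sketched in plain Python, under stated assumptions: `retrieve` and `generate` are hypothetical callables standing in for the vector store search and the model call, and the counters exist only to show that each runs exactly once per question.

```python
def two_step_chain(question, retrieve, generate):
    """Always retrieve first, then make exactly one generation call."""
    docs = retrieve(question)  # step 1: search with the raw user question
    context = "\n\n".join(docs)
    system_message = (
        "You are a helpful assistant. Use the following context in your response:"
        f"\n\n{context}"
    )
    return generate(system_message, question)  # step 2: single model call


call_counts = {"retrieve": 0, "generate": 0}


def fake_retrieve(question):
    """Stub retriever returning fixed chunks."""
    call_counts["retrieve"] += 1
    return ["chunk A", "chunk B"]


def fake_generate(system_message, question):
    """Stub model that returns the system prompt it was given."""
    call_counts["generate"] += 1
    return system_message


chain_output = two_step_chain("What is task decomposition?", fake_retrieve, fake_generate)
print(call_counts)
```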

We can implement this chain by removing the tools from the agent and folding the retrieval step into a custom prompt:

In [10]:
from langchain.agents.middleware import dynamic_prompt, ModelRequest

@dynamic_prompt
def prompt_with_context(request: ModelRequest) -> str:
    """Inject context into state messages."""
    last_query = request.state["messages"][-1].text
    retrieved_docs = vector_store.similarity_search(last_query)

    docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)

    system_message = (
        "You are a helpful assistant. Use the following context in your response:"
        f"\n\n{docs_content}"
    )

    return system_message


agent = create_agent(model, tools=[], middleware=[prompt_with_context])

Test:

In [11]:
query = "What is task decomposition?"
for step in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()


What is task decomposition?

Task decomposition is the process of breaking down a complex task into smaller, more manageable steps or subtasks. This approach allows for better planning and execution, as each simpler step can be addressed individually, making the overall task easier to handle.

In the context of AI and language models, task decomposition can be achieved through methods like:

- **Chain of Thought (CoT)**: The model is prompted to "think step by step," generating intermediate reasoning steps that lead to the final solution.
- **Tree of Thoughts (ToT)**: An extension of CoT where multiple reasoning paths are explored at each step, forming a tree-like structure of possible solutions, which can be searched using strategies like breadth-first or depth-first search.
- **LLM+P**: Uses an external classical planner with Planning Domain Definition Language (PDDL) to generate a formal plan, leveraging structured planning tools beyond the LLM’s internal reasoning.

These techniqu