In [None]:
!python --version

Python 3.10.12


# Claude 3 RAG Agents with LangChain v1

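LangChain v1 brought many changes: comparing LangChain versions `0.0.3xx` and `0.1.x`, the preferred way of doing things has changed significantly. This is especially true for agents.

The way we initialize and use agents is generally clearer than it was in the past. There are still many abstractions, but we can (and are encouraged to) get closer to the agent logic itself. This can be confusing at first, but once understood, the new logic is clearer than in previous versions.

In this example, we will build a RAG agent with LangChain v1. We will use Claude 3 as our LLM, Voyage AI for knowledge embeddings, and Pinecone to power our knowledge retrieval.

To begin, let's install the prerequisites: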

In [2]:
!pip install -qU \
    langchain==0.1.11 \
    langchain-core==0.1.30 \
    langchain-community==0.0.27 \
    langchain-anthropic==0.1.4 \
    langchainhub==0.1.15 \
    anthropic==0.19.1 \
    voyageai==0.2.1 \
    pinecone-client==3.1.0 \
    datasets==2.16.1

Then get the required API keys. We will need API keys for [Claude](https://docs.claude.com/claude/reference/getting-started-with-the-api), [Voyage AI](https://docs.voyageai.com/install/), and [Pinecone](https://docs.pinecone.io/docs/quickstart).

In [None]:
# Insert your API keys here
ANTHROPIC_API_KEY = "<YOUR_ANTHROPIC_API_KEY>"
PINECONE_API_KEY = "<YOUR_PINECONE_API_KEY>"
VOYAGE_API_KEY = "<YOUR_VOYAGE_API_KEY>"
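
Pasting keys directly into a notebook makes them easy to leak when the notebook is shared. As a minimal sketch of an alternative, the helper below reads a key from an environment variable and prompts for it only if the variable is unset. The helper name `load_api_key` is hypothetical and not part of any of the libraries used here.

```python
import os
from getpass import getpass


def load_api_key(name: str) -> str:
    """Return the key from the environment, prompting only if it is unset."""
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed value so it does not end up in cell output
        value = getpass(f"Enter {name}: ")
    return value


# Example usage (commented out so the cell does not prompt unexpectedly):
# ANTHROPIC_API_KEY = load_api_key("ANTHROPIC_API_KEY")
# PINECONE_API_KEY = load_api_key("PINECONE_API_KEY")
# VOYAGE_API_KEY = load_api_key("VOYAGE_API_KEY")
```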

## Finding Knowledge

The first thing an agent using RAG needs is a data source from which we can retrieve knowledge. We will use v2 of the AI ArXiv dataset, available on Hugging Face Datasets at [`jamescalam/ai-arxiv2-chunks`](https://huggingface.co/datasets/jamescalam/ai-arxiv2-chunks).

_Note: we are using the prechunked dataset. For the raw version, see [`jamescalam/ai-arxiv2`](https://huggingface.co/datasets/jamescalam/ai-arxiv2)._

In [4]:
from datasets import load_dataset

dataset = load_dataset("jamescalam/ai-arxiv2-chunks", split="train[:20000]")
dataset

Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 20000
})

In [5]:
dataset[1]

{'doi': '2401.09350',
 'chunk-id': 1,
 'chunk': 'These neural networks and their training algorithms may be complex, and the scope of their impact broad and wide, but nonetheless they are simply functions in a high-dimensional space. A trained neural network takes a vector as input, crunches and transforms it in various ways, and produces another vector, often in some other space. An image may thereby be turned into a vector, a song into a sequence of vectors, and a social network as a structured collection of vectors. It seems as though much of human knowledge, or at least what is expressed as text, audio, image, and video, has a vector representation in one form or another.\nIt should be noted that representing data as vectors is not unique to neural networks and deep learning. In fact, long before learnt vector representations of pieces of dataâ\x80\x94what is commonly known as â\x80\x9cembeddingsâ\x80\x9dâ\x80\x94came along, data was often encoded as hand-crafted feature vectors. E

## Building the Knowledge Base

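To build our knowledge base we need **two things**:

1. Embeddings, for which we will use `VoyageEmbeddings`, backed by Voyage AI's embedding models. This requires an [API key](https://dash.voyageai.com/api-keys).
2. A vector database in which to store our embeddings and query them. We use Pinecone, which also requires a [free API key](https://app.pinecone.io).

First we initialize our connection to Voyage AI and define an `embed` object for embedding: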

In [7]:
from langchain_community.embeddings import VoyageEmbeddings

embed = VoyageEmbeddings(voyage_api_key=VOYAGE_API_KEY, model="voyage-2")

Then we initialize our connection to Pinecone:

In [None]:
from pinecone import Pinecone

# configure the client
pc = Pinecone(api_key=PINECONE_API_KEY)

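Now we set up our index specification, which lets us define the cloud provider and the region where we want to deploy our index. You can find a list of all available providers and regions [here](https://docs.pinecone.io/docs/projects).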

In [9]:
from pinecone import ServerlessSpec

spec = ServerlessSpec(cloud="aws", region="us-west-2")

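Before creating an index, we need the dimensionality of our Voyage AI embedding model, which we can easily find by creating an embedding and checking its length: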

In [10]:
vec = embed.embed_documents(["ello"])
len(vec[0])

1024

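Now we create the index using our embedding dimensionality and a metric compatible with the model (this can be either cosine or dotproduct). We also pass our spec to the index initialization.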

In [None]:
import time

index_name = "claude-3-rag"

# check whether the index already exists (it shouldn't if this is the first run)
if index_name not in pc.list_indexes().names():
    # if it does not exist, create the index
    pc.create_index(
        index_name,
        dimension=len(vec[0]),  # dimensionality of the voyage model
        metric="dotproduct",
        spec=spec,
    )
    # wait for the index to be initialized
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)

# connect to the index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()
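
On the choice of metric: cosine similarity and dot product give the same score whenever both vectors have unit length, which is one reason the two can be interchangeable for some embedding models. The sketch below illustrates this with plain Python lists. Whether a particular embedding model returns unit-length vectors is an assumption worth checking, for example by computing the norm of `vec[0]`.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: the dot product divided by both vector norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


raw_a, raw_b = [3.0, 4.0], [1.0, 2.0]
a, b = normalize(raw_a), normalize(raw_b)

# For unit-length vectors the two metrics agree
print(math.isclose(dot(a, b), cosine(a, b)))

# For unnormalized vectors they generally differ
print(dot(raw_a, raw_b), cosine(raw_a, raw_b))
```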

### Populating our Index

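Now our knowledge base is ready to be populated with data. We will use the `embed` object to embed our documents and then add them to the index.

We will also include metadata for each record.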

In [None]:
from tqdm.auto import tqdm

# a pandas dataframe is easier to work with
data = dataset.to_pandas()

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i + batch_size)
    # get the batch of data
    batch = data.iloc[i:i_end]
    # generate a unique ID for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get the text to embed
    texts = [x["chunk"] for _, x in batch.iterrows()]
    # embed the text
    embeds = embed.embed_documents(texts)
    # get the metadata to store in Pinecone
    metadata = [
        {"text": x["chunk"], "source": x["source"], "title": x["title"]}
        for _, x in batch.iterrows()
    ]
    # add to Pinecone
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
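
The batching and record-building logic in the loop above can be tried without any API calls. The sketch below uses made-up records and a fake embedding function as stand-ins. The helper names `iter_batches` and `to_vectors` are hypothetical, and the `(id, values, metadata)` tuple shape simply mirrors what the loop passes to `index.upsert`.

```python
def iter_batches(records, batch_size):
    """Yield consecutive slices of `records`, each at most `batch_size` long."""
    for start in range(0, len(records), batch_size):
        yield records[start : start + batch_size]


def to_vectors(batch, embeddings):
    """Pair each record with its embedding as an (id, values, metadata) tuple."""
    return [
        (
            f"{rec['doi']}-{rec['chunk-id']}",
            emb,
            {"text": rec["chunk"], "source": rec["source"], "title": rec["title"]},
        )
        for rec, emb in zip(batch, embeddings)
    ]


# Stand-in records (invented for illustration only)
records = [
    {
        "doi": "0000.00000",
        "chunk-id": i,
        "chunk": f"text {i}",
        "source": "http://example.com",
        "title": "Example",
    }
    for i in range(5)
]


def fake_embed(texts):
    """Stand-in for an embedding call: one constant 2-d vector per text."""
    return [[0.0, 1.0] for _ in texts]


for batch in iter_batches(records, batch_size=2):
    vectors = to_vectors(batch, fake_embed([r["chunk"] for r in batch]))
    print([v[0] for v in vectors])
```

Building the ID from `doi` and `chunk-id` means that re-running the loop writes to the same IDs rather than creating new records, assuming each `(doi, chunk-id)` pair is unique in the dataset.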

Next we create a tool for our agent to use when searching ArXiv papers:

In [None]:
from langchain.agents import tool


@tool
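def arxiv_search(query: str) -> str:
    """Use this tool when answering questions about AI, machine learning, data
    science, or other technical questions that may be answered using ArXiv
    papers.
    """
    # create the query vector
    xq = embed.embed_query(query)
    # perform the search
    out = index.query(vector=xq, top_k=5, include_metadata=True)
    # reformat the results into a string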
    results_str = "\n\n".join([x["metadata"]["text"] for x in out["matches"]])
    return results_str


tools = [arxiv_search]

When our agent uses this tool, it executes it like this:

In [14]:
print(arxiv_search.run(tool_input={"query": "can you tell me about llama 2?"}))

Model Llama 2 Code Llama Code Llama - Python Size FIM LCFT Python CPP Java PHP TypeScript C# Bash Average 7B â 13B â 34B â 70B â 7B â 7B â 7B â 7B â 13B â 13B â 13B â 13B â 34B â 34B â 7B â 7B â 13B â 13B â 34B â 34B â â â â â 14.3% 6.8% 10.8% 9.9% 19.9% 13.7% 15.8% 13.0% 24.2% 23.6% 22.2% 19.9% 27.3% 30.4% 31.6% 34.2% 12.6% 13.2% 21.4% 15.1% 6.3% 3.2% 8.3% 9.5% 3.2% 12.6% 17.1% 3.8% 18.9% 25.9% 8.9% 24.8% â â â â â â â â â â 37.3% 31.1% 36.1% 30.4% 29.2% 29.8% 38.0%

Ethical Considerations and Limitations (Section 5.2) Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios. For these reasons, as with all LLMs, Llama 2âs potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or objectionable responses to user prompts. Therefore, before deploying any applications of L
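
The result-formatting step inside `arxiv_search` can be exercised on its own with a stand-in response. The sketch below reproduces the join on the `text` metadata and, as one possible variation that the notebook does not use, optionally prefixes each chunk with its stored `title`. The names `format_matches` and `with_titles` are hypothetical, and the response dict is invented.

```python
def format_matches(response: dict, with_titles: bool = False) -> str:
    """Join the text of each match, optionally prefixed by its title."""
    parts = []
    for match in response["matches"]:
        meta = match["metadata"]
        text = meta["text"]
        if with_titles and meta.get("title"):
            text = f"[{meta['title']}]\n{text}"
        parts.append(text)
    # A blank line separates chunks, as in the tool above
    return "\n\n".join(parts)


# Stand-in for a query response; ids, scores and texts are invented
fake_response = {
    "matches": [
        {"id": "a-0", "score": 0.91, "metadata": {"text": "first chunk", "title": "Paper A"}},
        {"id": "b-3", "score": 0.87, "metadata": {"text": "second chunk", "title": "Paper B"}},
    ]
}

print(format_matches(fake_response))
print(format_matches(fake_response, with_titles=True))
```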

## Defining the XML Agent

The XML agent was built primarily to support Anthropic models. Anthropic models have been trained to use XML tags such as `<input>{some input}</input>` or, when using a tool:

```
<tool>{tool name}</tool>
<tool_input>{tool input}</tool_input>
```

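This differs considerably from the format produced by a typical ReAct agent, and Anthropic models do not support the ReAct format as well as they support XML.

To create an XML agent we need a `prompt`, an `llm`, and a list of `tools`. We can download a prebuilt prompt for conversational XML agents from the LangChain hub.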

In [15]:
from langchain import hub

prompt = hub.pull("hwchase17/xml-agent-convo")
prompt

ChatPromptTemplate(input_variables=['agent_scratchpad', 'input', 'tools'], partial_variables={'chat_history': ''}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['agent_scratchpad', 'chat_history', 'input', 'tools'], template="You are a helpful assistant. Help the user answer any questions.\n\nYou have access to the following tools:\n\n{tools}\n\nIn order to use a tool, you can use <tool></tool> and <tool_input></tool_input> tags. You will then get back a response in the form <observation></observation>\nFor example, if you have a tool called 'search' that could run a google search, in order to search for the weather in SF you would respond:\n\n<tool>search</tool><tool_input>weather in SF</tool_input>\n<observation>64 degrees</observation>\n\nWhen you are done, respond with a final answer between <final_answer></final_answer>. For example:\n\n<final_answer>The weather in SF is 64 degrees</final_answer>\n\nBegin!\n\nPrevious Conversation:\n{chat_history}\n\n

We can see the XML format used throughout the prompt to explain to the LLM how it should use tools.

Next we initialize our connection to Anthropic, for which we need a [Claude API key](https://console.anthropic.com/).

In [None]:
from langchain_anthropic import ChatAnthropic

# chat completion LLM
llm = ChatAnthropic(
    anthropic_api_key=ANTHROPIC_API_KEY,
    model_name="claude-opus-4-1",  # switch to a Sonnet model for faster responses
    temperature=0.0,
)

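When the agent runs we give it a single `input`, the input text from the user. Within the agent logic, however, an *agent_scratchpad* object is also passed, which contains tool information. To feed this information into our LLM we need to convert it into the XML format described above, so we define the `convert_intermediate_steps` function to handle that.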

In [None]:
def convert_intermediate_steps(intermediate_steps):
    log = ""
    for action, observation in intermediate_steps:
        log += (
            f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
            f"</tool_input><observation>{observation}</observation>"
        )
    return log
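
To see what this function produces, it can be called with stand-in data. In the sketch below `FakeAction` is a hypothetical substitute for the action object LangChain passes in, and it exposes only the two attributes the function reads. The function body is repeated so the block runs on its own.

```python
from collections import namedtuple

# Stand-in for the agent action object; only the two fields read below
FakeAction = namedtuple("FakeAction", ["tool", "tool_input"])


def convert_intermediate_steps(intermediate_steps):
    log = ""
    for action, observation in intermediate_steps:
        log += (
            f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
            f"</tool_input><observation>{observation}</observation>"
        )
    return log


# One (action, observation) pair, as if the tool had been called once
steps = [(FakeAction("arxiv_search", "llama 2"), "Llama 2 is ...")]
print(convert_intermediate_steps(steps))
```

On the first pass through the agent loop no tool has been called yet, so the list is empty and the scratchpad is an empty string.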

We must also parse the tools into a string of `tool_name: tool_description` lines, which we handle with the `convert_tools` function.

In [None]:
def convert_tools(tools):
    return "\n".join([f"{tool.name}: {tool.description}" for tool in tools])
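
As a quick illustration of the resulting string, the sketch below calls the same function on stand-in objects that expose only `name` and `description`. The second tool and both descriptions are invented. The `description` of a real decorated tool may contain more than the docstring, so the real output can look different.

```python
from types import SimpleNamespace


def convert_tools(tools):
    return "\n".join([f"{tool.name}: {tool.description}" for tool in tools])


# Stand-ins exposing only the attributes that convert_tools reads
fake_tools = [
    SimpleNamespace(name="arxiv_search", description="Search ArXiv paper chunks."),
    SimpleNamespace(name="calculator", description="Evaluate arithmetic."),
]

print(convert_tools(fake_tools))
```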

With everything ready, we can initialize our agent object using [**L**ang**C**hain **E**xpression **L**anguage (LCEL)](https://www.pinecone.io/learn/series/langchain/langchain-expression-language/). We add instructions for when the LLM should _stop_ generating with `llm.bind(stop=[...])`, and finally we parse the agent's output with an `XMLAgentOutputParser` object.

In [None]:
from langchain.agents.output_parsers import XMLAgentOutputParser

agent = (
    {
        "input": lambda x: x["input"],
        # without "chat_history", tool usage has no context of prior interactions
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: convert_intermediate_steps(x["intermediate_steps"]),
    }
    | prompt.partial(tools=convert_tools(tools))
    | llm.bind(stop=["</tool_input>", "</final_answer>"])
    | XMLAgentOutputParser()
)
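
Because `</tool_input>` and `</final_answer>` are stop sequences, the text that reaches the parser normally ends before the closing tag. The sketch below shows one way such truncated output could be classified as a tool call or a final answer. It is a simplified illustration under that assumption, not the implementation of `XMLAgentOutputParser`, and the name `parse_agent_output` is hypothetical.

```python
import re


def parse_agent_output(text: str):
    """Classify model output as a tool call or a final answer.

    Returns ("tool", name, tool_input) or ("final", answer).
    """
    tool_match = re.search(r"<tool>(.*?)</tool>", text, re.DOTALL)
    if tool_match:
        # The closing </tool_input> tag is a stop sequence, so it is usually absent
        after = text.split("<tool_input>", 1)
        tool_input = after[1].split("</tool_input>")[0] if len(after) > 1 else ""
        return ("tool", tool_match.group(1).strip(), tool_input.strip())
    if "<final_answer>" in text:
        # Likewise, </final_answer> is normally cut off by the stop sequence
        answer = text.split("<final_answer>", 1)[1].split("</final_answer>")[0]
        return ("final", answer.strip())
    raise ValueError("output contains neither a tool call nor a final answer")


print(parse_agent_output("<tool>arxiv_search</tool>\n<tool_input>llama 2"))
print(parse_agent_output("<final_answer>Llama 2 is a family of LLMs."))
```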

Now that we have initialized the `agent` object, we pass it to an `AgentExecutor` object along with our original `tools` list:

In [None]:
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

Now we can use the agent via the `invoke` method:

In [25]:
user_msg = "can you tell me about llama 2?"

out = agent_executor.invoke({"input": user_msg, "chat_history": ""})

print(out["output"])



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m<tool>arxiv_search</tool>
<tool_input>llama 2[0m[36;1m[1;3mModel Llama 2 Code Llama Code Llama - Python Size FIM LCFT Python CPP Java PHP TypeScript C# Bash Average 7B â 13B â 34B â 70B â 7B â 7B â 7B â 7B â 13B â 13B â 13B â 13B â 34B â 34B â 7B â 7B â 13B â 13B â 34B â 34B â â â â â 14.3% 6.8% 10.8% 9.9% 19.9% 13.7% 15.8% 13.0% 24.2% 23.6% 22.2% 19.9% 27.3% 30.4% 31.6% 34.2% 12.6% 13.2% 21.4% 15.1% 6.3% 3.2% 8.3% 9.5% 3.2% 12.6% 17.1% 3.8% 18.9% 25.9% 8.9% 24.8% â â â â â â â â â â 37.3% 31.1% 36.1% 30.4% 29.2% 29.8% 38.0%

2
Cove Liama Long context (7B =, 13B =, 34B) + fine-tuning ; Lrama 2 Code training 20B oes Cope Liama - Instruct Foundation models â> nfilling code training = eee.â (7B =, 13B =, 34B) â 5B (7B, 13B, 348) 5008 Python code Long context Cove Liama - PyrHon (7B, 13B, 34B) > training Â» Fine-tuning > 1008 208
Figure 2: The Code Llama

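That looks pretty good, but right now our agent is *stateless*, which makes it hard to hold a conversation with. We can give it memory in many different ways, but one of the simplest is to use `ConversationBufferWindowMemory`.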

In [None]:
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

# conversational memory
conversational_memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=5, return_messages=True
)

We haven't attached the conversational memory to our agent yet, so the `conversational_memory` object remains empty:

In [27]:
conversational_memory.chat_memory.messages

[]

We must manually add the interactions between ourselves and the agent to our memory.

In [28]:
conversational_memory.chat_memory.add_user_message(user_msg)
conversational_memory.chat_memory.add_ai_message(out["output"])

conversational_memory.chat_memory.messages

[HumanMessage(content='can you tell me about llama 2?'),
 AIMessage(content='\n- Llama 2 is a large language model developed by Meta AI. It comes in sizes ranging from 7B to 70B parameters.\n\n- Code Llama is a version of Llama 2 that has been specialized for code generation through fine-tuning on code datasets. Code Llama models are available in Python, C++, Java, PHP, TypeScript, C#, and Bash.\n\n- The Code Llama specialization pipeline involves foundation model pre-training, long context training, code infilling training, and fine-tuning on specific programming languages. \n\n- Code Llama significantly outperforms the base Llama 2 models on code generation benchmarks like HumanEval and MBPP. For example, the 34B parameter Code Llama - Python achieves 48.8% pass@1 on HumanEval compared to 34.1% for the 34B Llama 2.\n\n- As with all large language models, Llama 2 has limitations and potential risks that need to be considered before deploying it in applications. Meta provides a respons

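We can now see that *two* messages have been added: our `HumanMessage` and the agent's `AIMessage` response. Unfortunately, we cannot send these messages to our XML agent directly. Instead, we need to pass a string in the following format: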

```
Human: {human message}
AI: {AI message}
```

Let's write a quick `memory2str` helper function to handle this for us:

In [29]:
from langchain_core.messages.human import HumanMessage


def memory2str(memory: ConversationBufferWindowMemory):
    messages = memory.chat_memory.messages
    memory_list = [
        f"Human: {mem.content}" if isinstance(mem, HumanMessage) else f"AI: {mem.content}"
        for mem in messages
    ]
    memory_str = "\n".join(memory_list)
    return memory_str

In [30]:
print(memory2str(conversational_memory))

Human: can you tell me about llama 2?
AI: 
- Llama 2 is a large language model developed by Meta AI. It comes in sizes ranging from 7B to 70B parameters.

- Code Llama is a version of Llama 2 that has been specialized for code generation through fine-tuning on code datasets. Code Llama models are available in Python, C++, Java, PHP, TypeScript, C#, and Bash.

- The Code Llama specialization pipeline involves foundation model pre-training, long context training, code infilling training, and fine-tuning on specific programming languages. 

- Code Llama significantly outperforms the base Llama 2 models on code generation benchmarks like HumanEval and MBPP. For example, the 34B parameter Code Llama - Python achieves 48.8% pass@1 on HumanEval compared to 34.1% for the 34B Llama 2.

- As with all large language models, Llama 2 has limitations and potential risks that need to be considered before deploying it in applications. Meta provides a responsible use guide with recommendations for safe
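
One detail worth noting: `memory2str` reads `chat_memory.messages` directly, which, as far as we can tell, holds every stored message rather than only the last `k=5` exchanges. If the history should be capped before it goes into the prompt, the trimming can be done explicitly. The sketch below shows the idea on plain `(role, content)` tuples. The helper names `last_k_exchanges` and `history_to_str` are hypothetical, and the sketch assumes messages strictly alternate between human and AI.

```python
def last_k_exchanges(messages, k):
    """Keep only the final k (human, AI) pairs from a flat message list."""
    return messages[-2 * k :] if k > 0 else []


def history_to_str(messages):
    """Render (role, content) pairs in the Human:/AI: format shown earlier."""
    return "\n".join(f"{role}: {content}" for role, content in messages)


# Invented conversation with three exchanges
history = [
    ("Human", "q1"), ("AI", "a1"),
    ("Human", "q2"), ("AI", "a2"),
    ("Human", "q3"), ("AI", "a3"),
]

# Only the two most recent exchanges are rendered
print(history_to_str(last_k_exchanges(history, k=2)))
```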

Now let's put together another helper function called `chat` to help us handle the _state_ part of our agent.

In [31]:
def chat(text: str):
    out = agent_executor.invoke({"input": text, "chat_history": memory2str(conversational_memory)})
    conversational_memory.chat_memory.add_user_message(text)
    conversational_memory.chat_memory.add_ai_message(out["output"])
    return out["output"]

Now we can simply chat with our agent and it will remember the context of previous interactions.

In [33]:
print(chat("was any red teaming done with the model?"))



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m<tool>arxiv_search</tool>
<tool_input>llama 2 red teaming[0m[36;1m[1;3mAfter conducting red team exercises, we asked participants (who had also participated in Llama 2 Chat exercises) to also provide qualitative assessment of safety capabilities of the model. Some participants who had expertise in offensive security and malware development questioned the ultimate risk posed by âmalicious code generationâ through LLMs with current capabilities.
One red teamer remarked, âWhile LLMs being able to iteratively improve on produced source code is a risk, producing source code isnât the actual gap. That said, LLMs may be risky because they can inform low-skill adversaries in production of scripts through iteration that perform some malicious behavior.â
According to another red teamer, â[v]arious scripts, program code, and compiled binaries are readily available on mainstream public websites, hacking forums or on âthe

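We can ask follow-up questions that omit key information, and thanks to the conversational history the LLM understands the context and uses it to adjust the search query. For example, we asked about `red teaming` without mentioning `llama 2`; based on the chat history, Claude 3 added that context to the search query, `"llama 2 red teaming"`.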

---