<a href="https://colab.research.google.com/github/weedge/doraemon-nb/blob/main/Langchain_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval-augmented generation (RAG)

https://python.langchain.com/docs/use_cases/question_answering/

## Overview

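### What is RAG?
RAG is a technique for augmenting LLM knowledge with additional data, which is often private or real-time.

LLMs can reason about a wide range of topics, but their knowledge is limited to the public data available up to the point in time they were trained on. If you want to build AI applications that can reason about private data, or about data introduced after a model's cutoff date, you need to augment the model's knowledge with the specific information it needs. The process of bringing in the appropriate information and inserting it into the model prompt is known as Retrieval-Augmented Generation (RAG).

### What's in this guide?
LangChain has a number of components specifically designed to help build RAG applications. To familiarize ourselves with these, we'll build a simple question-answering application over a text data source. Specifically, we'll build a QA bot over the LLM Powered Autonomous Agents blog post by Lilian Weng. Along the way we'll go over a typical QA architecture, discuss the relevant LangChain components, and highlight additional resources for more advanced QA techniques. We'll also see how LangSmith can help us trace and understand our application. LangSmith becomes increasingly helpful as our application grows in complexity.

**Note**: Here we focus on RAG for unstructured data. Two RAG use cases which we cover elsewhere are: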
- [QA over structured data](/docs/use_cases/qa_structured/sql) (e.g., SQL)
- [QA over code](/docs/use_cases/question_answering/code_understanding) (e.g., Python)

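## Architecture
A typical RAG application has two main components:

**Indexing**: a pipeline for ingesting data from a source and indexing it. This usually happens offline.

**Retrieval and generation**: the actual RAG chain, which takes the user query at run time, retrieves the relevant data from the index, and then passes it to the model.

The most common full sequence from raw data to answer looks like this:

### Indexing
1. **Load**: First we need to load our data. We'll use [DocumentLoaders](https://python.langchain.com/docs/modules/data_connection/document_loaders/) for this.
2. **Split**: [Text splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) break large `Documents` into smaller chunks. This is useful both for indexing data and for passing it to a model, since large chunks are harder to search over and won't fit in a model's finite context window.
3. **Store**: We need somewhere to store and index our splits so that they can be searched later. This is often done using a [VectorStore](https://python.langchain.com/docs/modules/data_connection/vectorstores/) and an [Embeddings](https://python.langchain.com/docs/modules/data_connection/text_embedding/) model.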

![index_diagram](https://github.com/langchain-ai/langchain/blob/master/docs/static/img/rag_indexing.png?raw=1)

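### Retrieval and generation
4. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a [Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/).
5. **Generate**: A [ChatModel](https://python.langchain.com/docs/modules/model_io/chat_models) / [LLM](/docs/modules/model_io/llms/) produces an answer using a prompt that includes the question and the retrieved data.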

![retrieval_diagram](https://github.com/langchain-ai/langchain/blob/master/docs/static/img/rag_retrieval_generation.png?raw=1)
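
The five steps above can be sketched end to end in plain Python. This is only a toy illustration and not how LangChain implements them: the `Doc` class, the word-overlap `score` function, and the sample text are assumptions made up for the example, and the "generate" step stops at building the prompt that would be sent to a model.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load(source: str, text: str) -> list[Doc]:
    # 1. Load: wrap raw text in a document object.
    return [Doc(text, {"source": source})]


def split(docs: list[Doc], size: int) -> list[Doc]:
    # 2. Split: cut each document into fixed-size chunks.
    return [
        Doc(d.page_content[i : i + size], dict(d.metadata))
        for d in docs
        for i in range(0, len(d.page_content), size)
    ]


def score(query: str, chunk: Doc) -> int:
    # Toy relevance: number of shared lowercase words (real systems use embeddings).
    return len(set(query.lower().split()) & set(chunk.page_content.lower().split()))


def retrieve(query: str, store: list[Doc], k: int = 1) -> list[Doc]:
    # 4. Retrieve: return the k best-scoring chunks.
    return sorted(store, key=lambda c: score(query, c), reverse=True)[:k]


def build_prompt(query: str, chunks: list[Doc]) -> str:
    # 5. Generate: build the prompt that would be passed to the model.
    context = "\n\n".join(c.page_content for c in chunks)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"


# 3. Store: here the "index" is just a list of chunks held in memory.
store = split(load("memo", "Cats sleep a lot. Dogs love to fetch balls."), size=18)
prompt_text = build_prompt("what do dogs love", retrieve("what do dogs love", store))
```

The rest of this guide replaces each toy function with the corresponding LangChain component.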

## Setup

### Install dependencies

We'll use an OpenAI chat model and embeddings and a Chroma vector store in this walkthrough, but everything shown here works with any [ChatModel](https://python.langchain.com/docs/integrations/chat/) or [LLM](https://python.langchain.com/docs/integrations/llms/), [Embeddings](https://python.langchain.com/docs/integrations/text_embedding/), and [VectorStore](https://python.langchain.com/docs/integrations/vectorstores/) or [Retriever](https://python.langchain.com/docs/integrations/retrievers).

We'll use the following packages:

In [5]:
!pip install -U tiktoken



In [1]:
!pip install -U langchain openai chromadb langchainhub bs4


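Read the OpenAI API key from the `OPENAI_API_KEY` secret configured in Colab:

In [None]:
import os
from google.colab import userdata

# Read the key from Colab's secret store and expose it to the OpenAI client.
# Do not print the key: notebook outputs are saved and shared with the file.
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')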

# import dotenv

# dotenv.load_dotenv()
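
Outside Colab, one possible way to supply the key without echoing it is to prompt with the standard-library `getpass` module. The helper names `ensure_env_key` and `mask` below are hypothetical, introduced only for this sketch.

```python
import getpass
import os


def ensure_env_key(name: str) -> None:
    """Prompt for a secret only if it is not already set; the input is not echoed."""
    if not os.environ.get(name):
        os.environ[name] = getpass.getpass(f"{name}: ")


def mask(secret: str, visible: int = 4) -> str:
    """Return a redacted form that is safer to print while debugging."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

Calling `ensure_env_key("OPENAI_API_KEY")` would prompt once per session, and `mask(os.environ["OPENAI_API_KEY"])` confirms that a key is present without revealing it.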

### LangSmith

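Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com).

Note that LangSmith is not required, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces: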

In [None]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
from google.colab import userdata
os.environ["LANGCHAIN_API_KEY"] = userdata.get('LANGCHAIN_API_KEY')


## Quickstart

- https://docs.smith.langchain.com/cookbook/hub-examples

Suppose we want to build a QA app over the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng. We can create a simple pipeline for this in about 20 lines of code:

In [3]:
import bs4
from langchain import hub
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import StrOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_core.runnables import RunnablePassthrough

In [6]:
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

#https://smith.langchain.com/hub/rlm/rag-prompt?organizationId=37c38d2b-ad01-59a2-a7ab-48f1fcc0cf57
prompt = hub.pull("rlm/rag-prompt")
print(prompt)
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

input_variables=['context', 'question'] messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))]


In [7]:
rag_chain.invoke("What is Task Decomposition?")

"Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through various methods such as using prompting techniques, task-specific instructions, or human inputs. The goal is to make the task more manageable and facilitate the interpretation of the model's thinking process."

In [8]:
# cleanup
vectorstore.delete_collection()

:::tip

Check out the [LangSmith trace](https://smith.langchain.com/public/1c6ca97e-445b-4d00-84b4-c7befcbc59fe/r)

:::

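## Detailed walkthrough

Let's go through the above code step by step to really understand what's going on.

## Step 1. Load

We first need to load the blog post contents. We can use `DocumentLoader`s for this, which are objects that load data from a source as `Documents`. A `Document` is an object with `page_content` (str) and `metadata` (dict) attributes.

In this case we'll use the `WebBaseLoader`, which uses `urllib` and `BeautifulSoup` to load and parse the web URLs passed in, returning one `Document` per URL. We can customize the HTML-to-text parsing by passing parameters to the `BeautifulSoup` parser via `bs_kwargs` (see the [BeautifulSoup docs](https://beautiful-soup-4.readthedocs.io/en/latest/#beautifulsoup)). In this case only HTML tags with the class "post-content", "post-title", or "post-header" are relevant, so we'll remove all others.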
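
To make the class-based filtering concrete, here is a minimal standard-library sketch that keeps only the text inside elements with selected classes. It is not how `WebBaseLoader` or `SoupStrainer` is implemented; `ClassFilterParser`, `VOID_TAGS`, and the sample HTML are assumptions invented for illustration.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class ClassFilterParser(HTMLParser):
    """Collect text only from elements whose class is in `keep` (nested tags included)."""

    def __init__(self, keep: set[str]):
        super().__init__()
        self.keep = keep
        self.depth = 0  # greater than 0 while we are inside a kept element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth or classes & self.keep:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


html = """
<nav class="menu">Home | About</nav>
<h1 class="post-title">LLM Powered Autonomous Agents</h1>
<div class="post-content"><p>Agents <b>plan</b> ahead.</p></div>
<footer>Copyright</footer>
"""
parser = ClassFilterParser({"post-content", "post-title", "post-header"})
parser.feed(html)
text = " ".join(parser.parts)
```

The navigation and footer text are dropped, which is the same effect the `parse_only` filter aims for on the real page.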

In [9]:
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    },
)
docs = loader.load()

In [10]:
len(docs[0].page_content)

42824

In [11]:
print(docs[0].page_content[:500])



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In


### Go deeper
`DocumentLoader`: Object that loads data from a source as `Documents`.
- [Docs](https://python.langchain.com/docs/modules/data_connection/document_loaders/): Further documentation on how to use `DocumentLoader`s.
- [Integrations](https://python.langchain.com/docs/integrations/document_loaders/): Find the relevant `DocumentLoader` integration (of more than 160) for your use case.

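## Step 2. Split

Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. Even for models that could fit the full post in their context window, empirically models struggle to find the relevant context in very long prompts.

So we'll split the `Document` into chunks for embedding and vector storage. This should help us retrieve only the most relevant parts of the blog post at run time.

In this case we'll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps reduce the chance of separating a statement from important context related to it. We use the `RecursiveCharacterTextSplitter`, which will (recursively) split the document using common separators (like new lines) until each chunk is the appropriate size.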

In [12]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

In [13]:
len(all_splits)

66

In [14]:
len(all_splits[0].page_content)

969

In [15]:
all_splits[10].metadata

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'start_index': 7056}
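
To see what `chunk_size`, `chunk_overlap`, and `start_index` mean, here is a simplified fixed-window splitter. It is only a sketch: unlike `RecursiveCharacterTextSplitter`, it ignores separators and cuts at exact character offsets, and the function name `split_with_overlap` is hypothetical. The real splitter prefers to break at separators, which is why the first chunk above has 969 characters rather than exactly 1000.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Cut `text` into windows of at most `chunk_size` characters, each sharing
    `chunk_overlap` characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"page_content": text[start : start + chunk_size], "start_index": start})
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
```

Each chunk starts 7 characters after the previous one, so the last 3 characters of one chunk reappear at the start of the next.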

### Go deeper

`DocumentSplitter`: Object that splits a list of `Document`s into smaller chunks. Subclass of `DocumentTransformer`.
- Explore context-aware splitters, which keep the location ("context") of each split in the original `Document`:
  - [Markdown files](https://python.langchain.com/docs/use_cases/question_answering/document-context-aware-QA)
  - [Code (py or js)](https://python.langchain.com/docs/integrations/document_loaders/source_code)
  - [Scientific papers](https://python.langchain.com/docs/integrations/document_loaders/grobid)

`DocumentTransformer`: Object that performs a transformation on a list of `Document`s.
- [Docs](https://python.langchain.com/docs/modules/data_connection/document_transformers/): Further documentation on how to use `DocumentTransformer`s.
- [Integrations](https://python.langchain.com/docs/integrations/document_transformers/)

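## Step 3. Store

Now that we have 66 text chunks in memory, we need to store and index them so that we can search them later in our RAG app. The most common way to do this is to embed the contents of each document split and upload those embeddings to a vector store.

Then, when we want to search over our splits, we embed the search query as well and perform some sort of "similarity" search to identify the stored splits whose embeddings are most similar to our query embedding. The simplest similarity measure is cosine similarity: we measure the cosine of the angle between each pair of embeddings (which are just very high-dimensional vectors).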
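
As a small illustration of the measure described above, cosine similarity between vectors $a$ and $b$ is $\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. The plain-Python sketch below uses tiny hand-written vectors as stand-ins for real embeddings; vector stores typically compute this with optimized native code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # perpendicular vectors
```

Parallel vectors score close to 1 and perpendicular vectors score 0, regardless of vector length.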

We can embed and store all of our document splits in a single command using the `Chroma` vector store and the `OpenAIEmbeddings` model.

In [16]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())

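### Go deeper
`Embeddings`: Wrapper around a text embedding model, used for converting text to embeddings.
- [Docs](https://python.langchain.com/docs/modules/data_connection/text_embedding): Further documentation on the interface.
- [Integrations](https://python.langchain.com/docs/integrations/text_embedding/): Browse the more than 30 text embedding integrations.

`VectorStore`: Wrapper around a vector database, used for storing and querying embeddings.
- [Docs](https://python.langchain.com/docs/modules/data_connection/vectorstores/): Further documentation on the interface.
- [Integrations](https://python.langchain.com/docs/integrations/vectorstores/): Browse the more than 40 `VectorStore` integrations.

This completes the **Indexing** portion of the pipeline. At this point we have a queryable vector store containing the chunked contents of our blog post. Given a user question, we should ideally be able to return the snippets of the blog post that answer the question.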

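## Step 4. Retrieve

Now let's write the actual application logic. We want to create a simple application that lets the user ask a question, searches for documents relevant to that question, passes the retrieved documents and the initial question to a model, and finally returns an answer.

LangChain defines a `Retriever` interface which wraps an index that can return relevant documents given a string query. All retrievers implement a common method `get_relevant_documents()` (and its asynchronous variant `aget_relevant_documents()`).

The most common type of `Retriever` is the `VectorStoreRetriever`, which uses the similarity search capabilities of a vector store to facilitate retrieval. Any `VectorStore` can easily be turned into a `Retriever`: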

In [17]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

In [21]:
retrieved_docs = retriever.get_relevant_documents(
    "What are the approaches to Task Decomposition?"
)

retrieved_docs

[Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 2192}),
 Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decompositi

In [19]:
len(retrieved_docs)

6

In [20]:
print(retrieved_docs[0].page_content)

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
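
The retrieval step above can be pictured as "embed the query, rank stored chunks by similarity, keep the top k". The sketch below is an assumption-laden toy: `ToyRetriever` is a hypothetical class, its `get_relevant_documents` method only mirrors the name used above, and word counts stand in for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyRetriever:
    """Hypothetical in-memory retriever: embed the query, return the k most similar texts."""

    def __init__(self, texts: list[str], k: int = 2):
        self.k = k
        self.index = [(text, embed(text)) for text in texts]

    def get_relevant_documents(self, query: str) -> list[str]:
        q = embed(query)
        ranked = sorted(self.index, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[: self.k]]


toy_retriever = ToyRetriever(
    [
        "task decomposition splits a big task into steps",
        "memory stores past observations",
        "tree of thoughts explores many reasoning steps",
    ],
    k=2,
)
hits = toy_retriever.get_relevant_documents("how does task decomposition work")
```

The `k` argument plays the same role as `search_kwargs={"k": 6}` in the cell above: it caps how many chunks are returned.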


### Go deeper
Vector stores are commonly used for retrieval, but there are plenty of other ways to do retrieval.

`Retriever`: An object that returns `Document`s given a text query.
- [Docs](https://python.langchain.com/docs/modules/data_connection/retrievers/): Further documentation on the interface and built-in retrieval techniques. Some of these include:
  - `MultiQueryRetriever` [generates variants of the input question](https://python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever) to improve the retrieval hit rate.
  - `MultiVectorRetriever` instead [generates variants of embeddings](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector), also to improve the retrieval hit rate.
  - `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents to avoid passing in duplicate context.
  - Documents can be filtered during vector store retrieval using [`metadata` filters](https://python.langchain.com/docs/use_cases/question_answering/document-context-aware-QA).
- [Integrations](https://python.langchain.com/docs/integrations/retrievers/): Integrations with retrieval services.
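
As an illustration of the max marginal relevance idea mentioned in the list above, here is a greedy sketch over small hand-written vectors. The function names, the `lambda_mult` parameter name, and the sample vectors are assumptions for this example; it follows the general formulation in the linked paper rather than any particular library's implementation.

```python
import math


def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Greedy maximal marginal relevance.

    Repeatedly pick the candidate c that maximizes
    lambda_mult * sim(query, c) - (1 - lambda_mult) * max(sim(c, s) for s in selected).
    Returns indices into `candidates`.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            redundancy = max((cos(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * cos(query, candidates[i]) - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.0]
vectors = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]  # vectors 0 and 1 are near-duplicates
diverse = mmr(query_vec, vectors, k=2, lambda_mult=0.3)         # favors diversity
relevance_only = mmr(query_vec, vectors, k=2, lambda_mult=1.0)  # plain similarity ranking
```

With `lambda_mult=0.3` the second pick skips the near-duplicate and takes the more distinct vector; with `lambda_mult=1.0` the ranking reduces to ordinary similarity search.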

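## Step 5. Generate

Let's put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes the prompt to a model, and parses the output.

We'll use the gpt-3.5-turbo OpenAI chat model, but any LangChain `LLM` or `ChatModel` could be substituted.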

In [23]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
llm

ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x78b03147e4d0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x78b0314e56c0>, temperature=0.0, openai_api_key='sk-...(redacted)', openai_proxy='')

We'll use a prompt for RAG that is checked into the LangChain prompt hub ([here](https://smith.langchain.com/hub/rlm/rag-prompt)).

In [24]:
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

In [25]:
print(
    prompt.invoke(
        {"context": "filler context", "question": "filler question"}
    ).to_string()
)

Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:


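We'll use the [**LCEL Runnable**](https://python.langchain.com/docs/expression_language/) protocol to define the chain, which allows us to
- pipe together components and functions in a transparent way
- automatically trace our chain in LangSmith
- get streaming, async, and batched calling out of the box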
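
To build some intuition for how `|` can pipe components together, here is a toy sketch of the idea. It is not LCEL itself: the `Step` class, the `parallel` helper, and the fake model are hypothetical names invented for illustration, and the real Runnable protocol does considerably more (streaming, async, tracing).

```python
class Step:
    """Hypothetical minimal runnable: wraps a function and composes with `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def batch(self, values):
        return [self.invoke(v) for v in values]


def parallel(**steps):
    # Run every step on the same input and collect the results in a dict,
    # similar in spirit to the {"context": ..., "question": ...} mapping
    # used in the quickstart chain.
    return Step(lambda value: {name: step.invoke(value) for name, step in steps.items()})


retrieve = Step(lambda q: f"[docs about {q}]")
passthrough = Step(lambda q: q)
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda p: p.upper())  # stands in for a model call

chain = parallel(context=retrieve, question=passthrough) | fill_prompt | fake_llm
result = chain.invoke("rag")
```

Because every piece exposes the same `invoke` method, any step can be swapped for another without changing the rest of the chain, which is the property the list above relies on.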

In [26]:
from langchain.schema import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [29]:
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)

Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through methods like Chain of Thought (CoT) or Tree of Thoughts, which involve dividing the task into manageable subtasks and exploring multiple reasoning possibilities at each step. Task decomposition can be performed by AI models with prompting, task-specific instructions, or human inputs.

:::tip

Check out the [LangSmith trace](https://smith.langchain.com/public/1799e8db-8a6d-4eb2-84d5-46e8d7d5a99b/r)

:::

### Go deeper

#### Choosing LLMs
`ChatModel`: An LLM-backed chat model wrapper. Takes in a sequence of messages and returns a message.
- [Docs](https://python.langchain.com/docs/modules/model_io/chat/)
- [Integrations](https://python.langchain.com/docs/integrations/chat/): Explore over 25 `ChatModel` integrations.

`LLM`: A text-in, text-out LLM. Takes in a string and returns a string.
- [Docs](https://python.langchain.com/docs/modules/model_io/llms)
- [Integrations](https://python.langchain.com/docs/integrations/llms): Explore over 75 `LLM` integrations.

See a guide on RAG with locally running models [here](https://python.langchain.com/docs/use_cases/question_answering/local_retrieval_qa).

#### Customizing the prompt

As shown above, we can load prompts (e.g., [this RAG prompt](https://smith.langchain.com/hub/rlm/rag-prompt)) from the prompt hub. The prompt can also be easily customized:

In [31]:
from langchain.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
rag_prompt_custom = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt_custom
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")

'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for a more systematic and organized approach to problem-solving. Thanks for asking!'

:::tip

Check out the [LangSmith trace](https://smith.langchain.com/public/da23c4d8-3b33-47fd-84df-a3a582eedf84/r)

:::

### Adding sources

With LCEL it's easy to return the retrieved documents, or certain source metadata from the documents:

In [32]:
from operator import itemgetter

from langchain_core.runnables import RunnableParallel

rag_chain_from_docs = (
    {
        "context": lambda input: format_docs(input["documents"]),
        "question": itemgetter("question"),
    }
    | rag_prompt_custom
    | llm
    | StrOutputParser()
)
rag_chain_with_source = RunnableParallel(
    {"documents": retriever, "question": RunnablePassthrough()}
) | {
    "documents": lambda input: [doc.metadata for doc in input["documents"]],
    "answer": rag_chain_from_docs,
}

rag_chain_with_source.invoke("What is Task Decomposition")

{'documents': [{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 1585},
  {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 2192},
  {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 17804},
  {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 17414},
  {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 29630},
  {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 19373}],
 'answer': 'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for a more systematic and organized approach to problem-solving. Thanks for asking!'}
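
Once the chain returns metadata alongside the answer, the sources can be post-processed in ordinary Python. The helper below is a hypothetical sketch that de-duplicates and sorts the metadata into short citation strings; the sample `result` dict imitates the output shape above with made-up entries, including a deliberate duplicate.

```python
def unique_sources(documents: list[dict]) -> list[str]:
    """Turn retrieved metadata into sorted, de-duplicated 'source @ char offset' strings."""
    seen = {(d["source"], d.get("start_index", 0)) for d in documents}
    return [f"{source} @ char {start}" for source, start in sorted(seen)]


# Sample data shaped like the chain output above (values chosen for illustration).
result = {
    "documents": [
        {"source": "https://lilianweng.github.io/posts/2023-06-23-agent/", "start_index": 2192},
        {"source": "https://lilianweng.github.io/posts/2023-06-23-agent/", "start_index": 1585},
        {"source": "https://lilianweng.github.io/posts/2023-06-23-agent/", "start_index": 2192},
    ],
    "answer": "(model answer)",
}
citations = unique_sources(result["documents"])
```

The `start_index` values are available here only because the splitter was created with `add_start_index=True`.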

:::tip

Check out the [LangSmith trace](https://smith.langchain.com/public/007d7e01-cb62-4a84-8b71-b24767f953ee/r)

:::

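### Adding memory

Suppose we want to create a stateful application that remembers past user inputs. There are two main things we need to do to support this.
1. Add a messages placeholder to our chain, which allows us to pass in historical messages.
2. Add a chain that takes the latest user query and reformulates it, in the context of the chat history, into a standalone question that can be passed to our retriever.

Let's start with 2. We can build a "condense question" chain that looks something like this: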

In [33]:
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

condense_q_system_prompt = """Given a chat history and the latest user question \
which might reference the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
condense_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", condense_q_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)
condense_q_chain = condense_q_prompt | llm | StrOutputParser()

In [34]:
from langchain_core.messages import AIMessage, HumanMessage

condense_q_chain.invoke(
    {
        "chat_history": [
            HumanMessage(content="What does LLM stand for?"),
            AIMessage(content="Large language model"),
        ],
        "question": "What is meant by large",
    }
)

'What is the definition of "large" in the context of a language model?'

In [35]:
condense_q_chain.invoke(
    {
        "chat_history": [
            HumanMessage(content="What does LLM stand for?"),
            AIMessage(content="Large language model"),
        ],
        "question": "How do transformers work",
    }
)

'How do transformer models function?'

Now we can build our full QA chain. Notice that we add some routing functionality so that the "condense question chain" runs only when the chat history is not empty.
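
The routing rule can be sketched in plain Python before wiring it into the chain. Everything here is a stand-in: `make_retrieval_query` and `fake_condense` are hypothetical names, and `fake_condense` replaces the LLM-backed condense chain with simple string formatting.

```python
def make_retrieval_query(inputs: dict, condense) -> str:
    """Return the question unchanged on the first turn; otherwise ask
    `condense` (a stand-in for the condense-question chain) to rewrite it."""
    if inputs.get("chat_history"):
        return condense(inputs)
    return inputs["question"]


def fake_condense(inputs: dict) -> str:
    # Stand-in for an LLM call: naively attach the previous user turn as context.
    previous = inputs["chat_history"][-2]
    return f"{inputs['question']} (in the context of: {previous})"


first = make_retrieval_query(
    {"question": "What is Task Decomposition?", "chat_history": []}, fake_condense
)
followup = make_retrieval_query(
    {
        "question": "What are common ways of doing it?",
        "chat_history": ["What is Task Decomposition?", "It breaks tasks into steps."],
    },
    fake_condense,
)
```

Skipping the rewrite on the first turn saves one model call, since a question with no history is already standalone.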

In [36]:
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)


def condense_question(input: dict):
    if input.get("chat_history"):
        return condense_q_chain
    else:
        return input["question"]


rag_chain = (
    RunnablePassthrough.assign(context=condense_question | retriever | format_docs)
    | qa_prompt
    | llm
)

In [37]:
chat_history = []

question = "What is Task Decomposition?"
ai_msg = rag_chain.invoke({"question": question, "chat_history": chat_history})
chat_history.extend([HumanMessage(content=question), ai_msg])

second_question = "What are common ways of doing it?"
rag_chain.invoke({"question": second_question, "chat_history": chat_history})

AIMessage(content='Common ways of task decomposition include:\n\n1. Using Chain of Thought (CoT): CoT is a prompting technique that instructs a model to "think step by step" and decompose complex tasks into smaller and simpler steps. This approach utilizes more computation at test-time and helps in interpreting the model\'s thinking process.\n\n2. Prompting with LLM: Language Model (LLM) can be used to prompt the model with simple instructions like "Steps for XYZ" or "What are the subgoals for achieving XYZ?" This allows the model to generate a sequence of subtasks or steps for completing the main task.\n\n3. Task-specific instructions: For certain tasks, specific instructions can be provided to guide the model in decomposing the task. For example, in the context of writing a novel, an instruction like "Write a story outline" can help in breaking down the task into manageable components.\n\n4. Human inputs: In some cases, human inputs can be used to assist in task decomposition. Humans

:::tip

Check out the [LangSmith trace](https://smith.langchain.com/public/b3001782-bb30-476a-886b-12da17ec258f/r)

:::

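Here we've gone over how to add chain logic for incorporating historical outputs. But how do we actually store and retrieve historical outputs for different sessions? For that, check out the LCEL [How to add message history (memory)](https://python.langchain.com/docs/expression_language/how_to/message_history) page.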
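
As a rough picture of per-session storage, the sketch below keeps one message list per session id in memory. `SessionHistoryStore` is a hypothetical class for illustration only; it is not the mechanism described on the linked page, and a real application would likely persist histories outside the process.

```python
from collections import defaultdict


class SessionHistoryStore:
    """Hypothetical in-memory store: one message list per session id."""

    def __init__(self):
        self._histories = defaultdict(list)

    def get(self, session_id: str) -> list[tuple[str, str]]:
        return self._histories[session_id]

    def add_turn(self, session_id: str, question: str, answer: str) -> None:
        self._histories[session_id].extend([("human", question), ("ai", answer)])


store = SessionHistoryStore()
store.add_turn("alice", "What is Task Decomposition?", "Breaking a task into steps.")
store.add_turn("bob", "What does LLM stand for?", "Large language model.")
alice_history = store.get("alice")
```

Each session's list could be passed as `chat_history` on that user's next call, so one user's questions never leak into another's context.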

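## Next steps

That's a lot of content covered in a short amount of time. There are plenty of nuances, features, integrations, and more to explore in each of the sections above. Aside from the sources mentioned above, good next steps include:

- Reading up on more advanced retrieval techniques in the [Retrievers](https://python.langchain.com/docs/modules/data_connection/retrievers/) section.
- Learning about the LangChain [Indexing API](https://python.langchain.com/docs/modules/data_connection/indexing), which helps repeatedly sync data sources and vector stores without redundant computation or storage.
- Exploring RAG [LangChain Templates](https://python.langchain.com/docs/templates/#-advanced-retrieval), which are reference applications that can easily be deployed with [LangServe](https://python.langchain.com/docs/langserve).
- Learning about [evaluating RAG applications with LangSmith](https://github.com/langchain-ai/langsmith-cookbook/blob/main/testing-examples/qa-correctness/qa-correctness.ipynb).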

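# Summary

- This CoT-style RAG approach can combine different LLMs. If you have a strong in-house model for your own vertical domain, you can cover its weak areas by `wrapping` another model that is strong in those areas (wrapping can happen at the model-serving layer or at the application layer). Filling gaps this way is probably how future assistant interactions will work: a personal assistant, a kind of Super Mario.
- The main idea is to design applications around neural networks, chiefly Transformer-based large models, chaining them together so that the interaction mimics the way the human brain works.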