# LangChain Application Development Guide (Advanced): Use Cases 👨‍🍳👩‍🍳

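*This document is based on the [LangChain Conceptual Documentation](https://docs.langchain.com/docs/)*

**Goals:**

1. Inspire you to build
2. Give an introductory understanding of LangChain's main use cases through [ELI5](https://www.dictionary.com/e/slang/eli5/#:~:text=ELI5%20is%20short%20for%20%E2%80%9CExplain,a%20complicated%20question%20or%20problem.) examples and code snippets. For the *fundamentals* of LangChain, see Part 1 of this series: Fundamentals.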

**LangChain Links:**
* [LC Conceptual Documentation](https://docs.langchain.com/docs/)
* [LC Python Documentation](https://python.langchain.com/en/latest/)
* [LC Javascript/Typescript Documentation](https://js.langchain.com/docs/)
* [LC Discord](https://discord.gg/6adMQxSpJS)
* [www.langchain.com](https://langchain.com/)
* [LC Twitter](https://twitter.com/LangChainAI)


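### **What is LangChain?**
> LangChain is a framework for developing applications powered by language models.
*[Source](https://blog.langchain.dev/announcing-our-10m-seed-round-led-by-benchmark/#:~:text=LangChain%20is%20a%20framework%20for%20developing%20applications%20powered%20by%20language%20models)*

**TLDR**: LangChain makes the complicated parts of working and building with AI models easier. It helps in two ways:

1. **Integration** - Bring external data, such as your files, other applications, and API data, to your LLMs
2. **Agency** - Let your LLMs interact with their environment through decision making. Use LLMs to help decide which action to take next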

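### **Why LangChain?**
1. **Components** - LangChain makes it easy to swap out the abstractions and components needed to work with language models.

2. **Customized Chains** - LangChain provides out-of-the-box support for using and customizing "chains" (a series of actions strung together).

3. **Speed 🚢** - The team ships extremely fast, so you can stay up to date with the latest LLM features.

4. **Community 👥** - A wonderful [discord](https://discord.gg/6adMQxSpJS) and community support, meetups, hackathons, etc.

Though LLMs can be straightforward (text in, text out), once you develop more complicated applications you'll quickly run into friction points that LangChain helps with.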

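### **Main Use Cases**

* **Summarization** - Express the most important facts about a body of text or chat interaction
* **Question and Answering Over Documents** - Use information held within documents to answer questions or queries
* **Extraction** - Pull structured data from a body of text or a user query
* **Evaluation** - Understand the quality of output from your application
* **Querying Tabular Data** - Pull data from databases or other tabular sources
* **Code Understanding** - Reason about and digest code
* **Interacting with APIs** - Query APIs and interact with the outside world
* **Chatbots** - A framework for back-and-forth interaction with a user in a chat interface, combined with memory
* **Agents** - Use LLMs to decide what to do next, and use tools to carry out those decisions.

Want to see live examples of these use cases? Head over to the [LangChain Project Gallery](https://github.com/gkamradt/langchain-tutorials)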

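#### **Author's Note:**

* This document does not cover every aspect of LangChain. Its contents have been curated to get you building and making an impact as quickly as possible. For more, check out the [LangChain Technical Documentation](https://python.langchain.com/en/latest/index.html)
* This notebook assumes you have seen Part 1 of this series, [Fundamentals](https://github.com/gkamradt/langchain-tutorials/blob/main/LangChain%20Cookbook%20Part%201%20-%20Fundamentals.ipynb). This notebook focuses on how to apply those fundamentals.
* You'll notice I repeat import statements throughout the notebook. My intention is to lean toward clarity and help you see the full code block in one spot, with no need to scroll back to find where a package was imported.
* We use the default models throughout the notebook; at the time of writing they were davinci-003 and gpt-3.5-turbo. You would no doubt get better results with GPT4.

Let's get started

Throughout this tutorial we will use OpenAI's various [models](https://platform.openai.com/docs/models/overview). LangChain makes it easy to [substitute LLMs](https://langchain.com/integrations.html#:~:text=integrations%20LangChain%20provides.-,LLMs,-LLM%20Provider), so you can bring your own LLM if you want.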

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()

openai_api_key = os.getenv('OPENAI_API_KEY', 'YourAPIKeyIfNotSet')

In [2]:
# Run this cell if you want to make your display wider
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

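# LangChain Use Cases

## Summarization

One of the most common use cases for LangChain and LLMs is summarization. You can summarize any piece of text, but use cases range from summarizing calls, articles, books, academic papers, legal documents, and user history to tables and financial documents. It's very helpful to have a tool that can summarize information quickly.

* **Examples** - [Summarizing B2B Sales Calls](https://www.youtube.com/watch?v=DIw4rbpI9ic)
* **Use Cases** - Summarize articles, transcripts, chat history, Slack/Discord, customer interactions, medical papers, legal documents, podcasts, tweet threads, code bases, product reviews, financial documents

### Summaries of Short Text

Summarizing short text is straightforward. In fact, you don't need to do anything fancy beyond simple prompting with instructions.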

In [3]:
from langchain.llms import OpenAI
from langchain import PromptTemplate

# Note, the default model is already 'text-davinci-003' but I call it out here explicitly so you know where to change it later if you want
llm = OpenAI(temperature=0, model_name='text-davinci-003', openai_api_key=openai_api_key)

# Create our template
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

# Create a LangChain prompt template that we can insert values to later
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

Let's pick a confusing piece of text from the web. *[Source](https://www.smithsonianmag.com/smart-news/long-before-trees-overtook-the-land-earth-was-covered-by-giant-mushrooms-13709647/)*

In [4]:
confusing_text = """
For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
“The problem is that when you look up close at the anatomy, it’s evocative of a lot of different things, but it’s diagnostic of nothing,” says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
“And it’s so damn big that when whenever someone says it’s something, everyone else’s hackles get up: ‘How could you have a lichen 20 feet tall?’”
"""

Let's take a look at the prompt that will be sent to the LLM

In [5]:
print ("------- Prompt Begin -------")

final_prompt = prompt.format(text=confusing_text)
print(final_prompt)

print ("------- Prompt End -------")

------- Prompt Begin -------

%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:

For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
“The problem is that when you look up close at the anatomy, it’s evocative of a lot of different things, but it’s diagnostic of nothing,” says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
“And it’s so damn big that when whenever someone says it’s something, everyone else’s hackles get up: ‘How could you have a lichen 20 feet tall?’”


------- Prompt End -------


Finally, let's pass it through the LLM

In [6]:
output = llm(final_prompt)
print (output)


For 130 years, people argued about what Prototaxites was. Some thought it was a lichen, some thought it was a fungus, and some thought it was a tree. But no one could agree. It was so big that it was hard to figure out what it was.


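This method works fine for short text, but for longer text it becomes a pain to manage and you'll run into token limits. Luckily, LangChain has out-of-the-box support for different summarization methods through its [load_summarize_chain](https://python.langchain.com/en/latest/use_cases/summarization.html).

### Summaries of Longer Text

*Note: This method also works for short text*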

In [7]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

Let's load up a longer document

In [8]:
with open('data/PaulGrahamEssays/good.txt', 'r') as file:
    text = file.read()

# Printing the first 285 characters as a preview
print (text[:285])

April 2008(This essay is derived from a talk at the 2008 Startup School.)About a month after we started Y Combinator we came up with the
phrase that became our motto: Make something people want.  We've
learned a lot since then, but if I were choosing now that's still
the one I'd pick.


Then let's check how many tokens are in this document. [get_num_tokens](https://python.langchain.com/en/latest/reference/modules/llms.html#langchain.llms.OpenAI.get_num_tokens) is a nice method for this.

In [9]:
num_tokens = llm.get_num_tokens(text)

print (f"There are {num_tokens} tokens in your file")

There are 3970 tokens in your file


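While you could probably stuff this text into your prompt, let's pretend it's too big and needs another method.

First we need to split it up. This process is called "chunking" or "splitting" your text into smaller pieces. I like the [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html) because it's easy to control, but there are a bunch of [others](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html) you can try.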

In [10]:
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=350)
docs = text_splitter.create_documents([text])

print (f"You now have {len(docs)} docs instead of 1 piece of text")

You now have 4 docs instead of 1 piece of text
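
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that class first tries to split on the separators you pass in). The function name `chunk_text` is made up for this illustration.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last 3 characters of the previous one, so a sentence that straddles a boundary still shows up whole in at least one chunk.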


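Next we need to load up a chain that will make successive calls to the LLM for us. Want to see the prompt used in the chain below? Check out the [LangChain documentation](https://github.com/hwchase17/langchain/blob/master/langchain/chains/summarize/map_reduce_prompt.py)

For the differences between chain types, check out this video on [token limit workarounds](https://youtu.be/f9_BWhCI4Zo).

*Note: You could also run the first 4 calls of the map_reduce in parallel*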

In [11]:
# Get your chain ready to use
chain = load_summarize_chain(llm=llm, chain_type='map_reduce') # verbose=True optional to see what is getting sent to the LLM

In [12]:
# Use it. This will run through the 4 documents, summarize the chunks, then get a summary of the summary.
output = chain.run(docs)
print (output)

 This essay looks at the idea of benevolence in startups, and how it can help them succeed. It explains how benevolence can improve morale, make people want to help, and help startups be decisive. It also looks at how markets have evolved to value potential dividends and potential earnings, and how users dislike their new operating system. The author argues that starting a company with benevolent aims is currently undervalued, and that Y Combinator's motto of "Make something people want" is a powerful concept.
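
Conceptually, `map_reduce` summarizes each chunk on its own (map) and then summarizes the combined partial summaries (reduce). Here is a rough sketch of that pattern. `fake_summarize` is a stand-in I made up so the example runs without an API key; in the real chain each of these calls goes to the LLM using LangChain's own prompts, and the sample chunks are toy text.

```python
from typing import Callable


def fake_summarize(text: str) -> str:
    """Stand-in for an LLM call: keep only the first sentence."""
    first_sentence = text.strip().split(". ")[0]
    return first_sentence.rstrip(".") + "."


def map_reduce_summarize(chunks: list[str], summarize: Callable[[str], str]) -> str:
    # Map step: one independent call per chunk (these could run in parallel)
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: one more call over the combined partial summaries
    return summarize(" ".join(partial_summaries))


toy_chunks = [
    "Startups should make something people want. Everything else is secondary.",
    "Benevolence improves morale. It also makes people want to help you.",
]
final_summary = map_reduce_summarize(toy_chunks, fake_summarize)
print(final_summary)
```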


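## Question & Answering Using Documents As Context

*[LangChain Question & Answer Docs](https://python.langchain.com/en/latest/use_cases/question_answering.html)*

In order to use LLMs for question answering we must:

1. Pass the LLM the relevant context it needs to answer a question
2. Pass it the question we want answered

Simplified, this process looks like "llm(your context + your question) = your answer"

* **Deep Dive** - [Question A Book](https://youtu.be/h0DHDp1FbmQ), [Ask Questions To Your Custom Files](https://youtu.be/EnT-ZTrcPrg), [Chat Your Data JS (1000 pages of Financial Reports)](https://www.youtube.com/watch?v=Ix9WIZpArm0&t=1051s), [LangChain Q&A webinar](https://www.crowdcast.io/c/rh66hcwivly0)
* **Examples** - [ChatPDF](https://www.chatpdf.com/)
* **Use Cases** - Chat with your documents, ask questions of academic papers, create study guides, reference medical information

### Simple Q&A Example

Here let's review the convention of "llm(your context + your question) = your answer"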

In [13]:
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [14]:
context = """
Rachel is 30 years old
Bob is 45 years old
Kevin is 65 years old
"""

question = "Who is under 40 years old?"

Then combine them.

In [15]:
output = llm(context + question)

# I strip the text to remove the leading and trailing whitespace
print (output.strip())

Rachel is under 40 years old.


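As we ramp up our sophistication, we'll take advantage of this convention more.

The hard part comes when you need to be selective about *which* data you put in your context. This field of study is called "[document retrieval](https://python.langchain.com/en/latest/modules/indexes/retrievers.html)" and is tightly coupled with AI memory.

### Using Embeddings

I informally call the process we are about to go through "The VectorStore Dance". It's the process of splitting your text, embedding the chunks, putting the embeddings in a database, and then querying them. For a full video on this, check out [How To Question A Book](https://www.youtube.com/watch?v=h0DHDp1FbmQ)

The goal is to select the relevant chunks of our long text, but which chunks do we pull? The most popular method is to pull *similar* texts by comparing vector embeddings.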
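
To give a feel for what "similar" means here, this is a minimal sketch of ranking chunks by cosine similarity. The 3-dimensional vectors and chunk labels are toy values I made up; real embeddings come from an embedding model and have far more dimensions, and a vector store such as FAISS does this search much more efficiently.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" for three chunks and one query
chunk_vectors = {
    "chunk about painting": [0.9, 0.1, 0.0],
    "chunk about startups": [0.1, 0.9, 0.1],
    "chunk about Lisp": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

best = top_k(query_vector, chunk_vectors, k=2)
print(best)
```

Only the top-ranked chunks are then passed to the LLM as context alongside the question.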

In [16]:
from langchain import OpenAI

# The vectorstore we'll be using
from langchain.vectorstores import FAISS

# The LangChain component we'll use to get the documents
from langchain.chains import RetrievalQA

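# The easy document loader for text, plus the splitter used a few cells below
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter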

# The embedding engine that will convert our text to vectors
from langchain.embeddings.openai import OpenAIEmbeddings

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

Let's load up a longer document

In [17]:
loader = TextLoader('data/PaulGrahamEssays/worked.txt')
doc = loader.load()
print (f"You have {len(doc)} document")
print (f"You have {len(doc[0].page_content)} characters in that document")

You have 1 document
You have 74663 characters in that document


Now let's split our long doc into smaller pieces

In [18]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

In [19]:
# Get the total number of characters so we can see the average later
num_total_characters = sum([len(x.page_content) for x in docs])

print (f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")

Now you have 29 documents that have an average of 2,930 characters (smaller pieces)


In [20]:
# Get your embeddings engine ready
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Embed your documents and combine with the raw text in a pseudo db. Note: This will make an API call to OpenAI
docsearch = FAISS.from_documents(docs, embeddings)

Create your retrieval engine

In [21]:
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

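Now it's time to ask a question. The retriever fetches the similar documents and combines them with your question for the LLM to reason over.

Note: It may not seem like much, but the magic here is that we didn't have to pass in our full original document.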

In [22]:
query = "What does the author describe as good work?"
qa.run(query)

' The author describes painting as good work.'

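If you want to do more, you could hook this up to a cloud vector database, use a tool like metal, and start managing your documents alongside external data sources

## Extraction
*[LangChain Extraction Docs](https://python.langchain.com/en/latest/use_cases/extraction.html)*

Extraction is the process of parsing data out of a piece of text. It is commonly used together with output parsing in order to *structure* our data.

* **Deep Dive** - [Use LLMs to Extract Data From Text (Expert Level Text Extraction)](https://youtu.be/xZzvwR9jdPA), [Structured Output From OpenAI (Clean Dirty Data)](https://youtu.be/KwAXfey-xQk)
* **Examples** - [OpeningAttributes](https://twitter.com/GregKamradt/status/1646500373837008897)
* **Use Cases:** Extract a structured row from a sentence to insert into a database, extract multiple rows from a long document to insert into a database, extract parameters from a user query to make an API call

A popular library for extraction is [Kor](https://eyurtsev.github.io/kor/). We won't cover it today, but I highly suggest checking it out for advanced extraction.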

In [23]:
# To help construct our Chat Messages
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate

# We will be using a chat model, defaults to gpt-3.5-turbo
from langchain.chat_models import ChatOpenAI

# To parse outputs and get structured data back
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

chat_model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)

### Vanilla Extraction

Let's start with an easy example. Here I simply supply a prompt with instructions for the type of output I want.

In [24]:
instructions = """
You will be given a sentence with fruit names, extract those fruit names and assign an emoji to them
Return the fruit name and emojis in a python dictionary
"""

fruit_names = """
Apple, Pear, this is an kiwi
"""

In [25]:
# Make your prompt which combines the instructions w/ the fruit names
prompt = (instructions + fruit_names)

# Call the LLM
output = chat_model([HumanMessage(content=prompt)])

print (output.content)
print (type(output.content))

{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}
<class 'str'>


Let's turn this into a proper Python dictionary

In [26]:
output_dict = eval(output.content)

print (output_dict)
print (type(output_dict))

{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}
<class 'dict'>


While this worked this time, it's not a reliable long-term method for more advanced use cases
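
One reason it's fragile: `eval` executes whatever text the model returns. A more defensive sketch (an alternative, not what the cell above does) uses `ast.literal_eval`, which only accepts Python literals, and turns parse failures into an explicit error. `parse_dict_output` is a hypothetical helper name.

```python
import ast


def parse_dict_output(raw: str) -> dict:
    """Parse model output that should be a Python dict literal, without executing code."""
    try:
        value = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError) as err:
        raise ValueError(f"Model output is not a valid Python literal: {raw!r}") from err
    if not isinstance(value, dict):
        raise ValueError(f"Expected a dict, got {type(value).__name__}")
    return value


good = parse_dict_output("{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}")
print(good)

# A chatty response that wraps the dict in extra text fails loudly instead of silently
try:
    parse_dict_output("Sure! Here is your dictionary: {'Apple': '🍎'}")
    chatty_output_rejected = False
except ValueError as err:
    chatty_output_rejected = True
    print("Could not parse:", err)
```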

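### Using LangChain's Response Schema

LangChain's response schema will do two things for us:

1. Autogenerate a prompt with bona fide format instructions. This is great because I don't need to worry about the prompt engineering side; I'll leave that up to LangChain!

2. Read the output from the LLM and turn it into a proper Python object

Here I define the schema I want. I'm going to pull out the song and artist that a user wants to play from a pseudo chat message.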

In [27]:
# The schema I want out
response_schemas = [
    ResponseSchema(name="artist", description="The name of the musical artist"),
    ResponseSchema(name="song", description="The name of the song that the artist plays")
]

# The parser that will look for the LLM output in my schema and return it back to me
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [28]:
# The format instructions that LangChain makes. Let's look at them
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
	"artist": string  // The name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```


In [29]:
# The prompt template that brings it all together
# Note: This is a different prompt template than before because we are using a Chat Model

prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("Given a command from the user, extract the artist and song names \n \
                                                    {format_instructions}\n{user_prompt}")  
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions}
)

In [30]:
fruit_query = prompt.format_prompt(user_prompt="I really like So Young by Portugal. The Man")
print (fruit_query.messages[0].content)

Given a command from the user, extract the artist and song names 
                                                     The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
	"artist": string  // The name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```
I really like So Young by Portugal. The Man


In [31]:
fruit_output = chat_model(fruit_query.to_messages())
output = output_parser.parse(fruit_output.content)

print (output)
print (type(output))

{'artist': 'Portugal. The Man', 'song': 'So Young'}
<class 'dict'>


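Awesome, now we have a dictionary that we can use later down the line.

<span style="background:#fff5d6">Warning:</span> The parser looks for LLM output in a specific format. Your model may not produce the same format every time, so make sure to handle errors when you use this method. GPT4 and future iterations will be more reliable.

For more advanced parsing, check out [Kor](https://eyurtsev.github.io/kor/)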
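
As a sketch of the kind of error handling the warning above calls for: the snippet below pulls the JSON out of a json code fence and checks that the expected keys are present. It does not reproduce the internals of `StructuredOutputParser`; `extract_json_block` and its `required_keys` parameter are hypothetical names for this illustration.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example can sit inside a fenced block
JSON_BLOCK = re.compile(re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_json_block(raw: str, required_keys: list[str]) -> dict:
    """Find a json code fence in the model output, parse it, and validate its keys."""
    match = JSON_BLOCK.search(raw)
    if match is None:
        raise ValueError("No json code fence found in model output")
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError as err:
        raise ValueError(f"Fenced block is not valid JSON: {err}") from err
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"Missing keys: {missing}")
    return data


model_output = FENCE + 'json\n{"artist": "Portugal. The Man", "song": "So Young"}\n' + FENCE
parsed = extract_json_block(model_output, ["artist", "song"])
print(parsed)

# Output that ignores the format instructions raises instead of returning bad data
try:
    extract_json_block("Portugal. The Man - So Young", ["artist", "song"])
    bad_format_caught = False
except ValueError:
    bad_format_caught = True  # this is where you might retry the LLM call or log the failure
```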

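## Evaluation

*[LangChain Evaluation Docs](https://python.langchain.com/en/latest/use_cases/evaluation.html)*

Evaluation is the process of running quality checks on the output of your application. Normal, deterministic code has tests we can run, but judging LLM output is harder because natural language is unpredictable and variable. LangChain provides tools that help with this.

* **Deep Dive** - Coming Soon
* **Examples** - [Lance Martin's Advanced](https://twitter.com/RLanceMartin) [Auto-Evaluator](https://github.com/rlancemartin/auto-evaluator)
* **Use Cases:** Run quality checks on your summarization or question & answer pipelines, check the output of your summarization pipeline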

In [32]:
# Embeddings, store, and retrieval
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Model, doc loader, and the text splitter used below
from langchain import OpenAI
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Eval!
from langchain.evaluation.qa import QAEvalChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [33]:
# Our long essay from before
loader = TextLoader('data/PaulGrahamEssays/worked.txt')
doc = loader.load()

print (f"You have {len(doc)} document")
print (f"You have {len(doc[0].page_content)} characters in that document")

You have 1 document
You have 74663 characters in that document


First let's do the vector store dance so we can do question & answering

In [34]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

# Get the total number of characters so we can see the average later
num_total_characters = sum([len(x.page_content) for x in docs])

print (f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")

Now you have 29 documents that have an average of 2,930 characters (smaller pieces)


In [35]:
# Embeddings and docstore
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
docsearch = FAISS.from_documents(docs, embeddings)

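Create your retrieval engine. Notice that I now have an `input_key` parameter. It tells the chain which key in the dictionary I supply holds my prompt/query. I specify `question` to match the question key in the dicts below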

In [36]:
chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever(), input_key="question")

Now I'll pass the LLM a list of questions along with ground truth answers that I know are correct (I validated them as a human).

In [37]:
question_answers = [
    {'question' : "Which company sold the microcomputer kit that his friend built himself?", 'answer' : 'Healthkit'},
    {'question' : "What was the small city he talked about in the city that is the financial capital of USA?", 'answer' : 'Yorkville, NY'}
]

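I'll use `chain.apply` to run my two questions separately.

One of the cool parts is that I get my list of question and answer dictionaries back, but each dictionary has an extra key, `result`, which holds the output from the LLM.

Note: I deliberately made my second question ambiguous and hard to answer in one pass so that the LLM would get it wrong.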

In [38]:
predictions = chain.apply(question_answers)
predictions

[{'question': 'Which company sold the microcomputer kit that his friend built himself?',
  'answer': 'Healthkit',
  'result': ' The microcomputer kit was sold by Heathkit.'},
 {'question': 'What was the small city he talked about in the city that is the financial capital of USA?',
  'answer': 'Yorkville, NY',
  'result': ' The small city he talked about is New York City, which is the financial capital of the United States.'}]

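We then have the LLM compare my ground truth answer (the `answer` key) with the LLM's result (the `result` key).

Put simply, we are asking the LLM to grade itself. What a wild world we live in.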

In [39]:
# Start your eval chain
eval_chain = QAEvalChain.from_llm(llm)

# Have it grade itself. The code below helps the eval_chain know where the different parts are
graded_outputs = eval_chain.evaluate(question_answers,
                                     predictions,
                                     question_key="question",
                                     prediction_key="result",
                                     answer_key='answer')

In [40]:
graded_outputs

[{'text': ' CORRECT'}, {'text': ' INCORRECT'}]

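This is correct! Notice that the answer to question #1 was "Healthkit" and the prediction was "The microcomputer kit was sold by Heathkit." The LLM knew that the answer and the result were the same and gave us a "CORRECT" label. Awesome.

For question #2, it knew they were not the same and gave us an "INCORRECT" label.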
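
If you grade more than a couple of questions, you'll probably want a single number. Here is a small sketch that turns the graded labels into an accuracy score; it assumes the `{'text': ' CORRECT'}` shape shown in the output above, and `accuracy` is a helper name I made up.

```python
def accuracy(graded: list[dict]) -> float:
    """Fraction of graded items labeled CORRECT."""
    if not graded:
        raise ValueError("No graded outputs to score")
    labels = [item["text"].strip().upper() for item in graded]
    # Compare for equality: "INCORRECT" contains "CORRECT" as a substring,
    # so a naive `"CORRECT" in label` check would count every item as correct.
    correct = sum(1 for label in labels if label == "CORRECT")
    return correct / len(labels)


graded_outputs = [{'text': ' CORRECT'}, {'text': ' INCORRECT'}]
score = accuracy(graded_outputs)
print(f"{score:.0%}")
```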

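## Querying Tabular Data

*[LangChain Querying Tabular Data Docs](https://python.langchain.com/en/latest/use_cases/tabular.html)*

The most common type of data in the world sits in tabular form (ok, ok, besides unstructured data). Being able to query this data with LangChain and pass it through to an LLM is very powerful.

* **Deep Dive** - Coming Soon
* **Examples** - TBD
* **Use Cases:** Use LLMs to query data about users, do data analysis, get real-time information from your databases

For further reading, check out "Agents + Tabular Data" ([Pandas](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/pandas.html), [SQL](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/sql_database.html), [CSV](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/csv.html))

Let's query a SQLite database with natural language. We'll look at the [San Francisco Trees](https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq) dataset.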

In [41]:
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

We'll start by specifying where our data lives and getting the connection ready.

In [42]:
sqlite_db_path = 'data/San_Francisco_Trees.db'
db = SQLDatabase.from_uri(f"sqlite:///{sqlite_db_path}")

Then we'll create a chain that combines our LLM and the database. I'm setting `verbose=True` so you can see what is happening under the hood.

In [43]:
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)



In [44]:
db_chain.run("How many Species of trees are there in San Francisco?")



> Entering new SQLDatabaseChain chain...
How many Species of trees are there in San Francisco?
SQLQuery: SELECT COUNT(DISTINCT "qSpecies") FROM "SFTrees";
SQLResult: [(578,)]
Answer: There are 578 Species of trees in San Francisco.
> Finished chain.


'There are 578 Species of trees in San Francisco.'

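Awesome! There are actually several steps going on here.

**Steps:**
1. Find which table to use
2. Find which column to use
3. Construct the correct SQL query
4. Execute that query
5. Get the result
6. Return a natural language response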
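
The steps above can be sketched by hand with the standard-library `sqlite3` module. This is only an illustration of the flow, not how `SQLDatabaseChain` is implemented: the table here is a tiny in-memory toy (the `TreeID` column and the three rows are made up), and the query and final sentence are hard-coded where the real chain would ask the LLM.

```python
import sqlite3

# Toy stand-in for the real database
connection = sqlite3.connect(":memory:")
connection.execute('CREATE TABLE "SFTrees" ("TreeID" INTEGER, "qSpecies" TEXT)')
connection.executemany(
    'INSERT INTO "SFTrees" VALUES (?, ?)',
    [(1, "Acer rubrum"), (2, "Acer rubrum"), (3, "Ginkgo biloba")],
)

# Steps 1-2: discover tables and columns (schema info like this is what the LLM gets to see)
tables = [row[0] for row in connection.execute("SELECT name FROM sqlite_master WHERE type='table'")]
columns = [row[1] for row in connection.execute('PRAGMA table_info("SFTrees")')]

# Step 3: in the real chain the LLM writes this query; here it is hard-coded
sql_query = 'SELECT COUNT(DISTINCT "qSpecies") FROM "SFTrees"'

# Steps 4-5: execute the query and fetch the result
(species_count,) = connection.execute(sql_query).fetchone()
connection.close()

# Step 6: the real chain asks the LLM to phrase the answer; here it is a plain f-string
answer = f"There are {species_count} species of trees in the toy table."
print(answer)
```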

Let's confirm via pandas

In [45]:
import sqlite3
import pandas as pd

# Connect to the SQLite database
connection = sqlite3.connect(sqlite_db_path)

# Define your SQL query
query = "SELECT count(distinct qSpecies) FROM SFTrees"

# Read the SQL query into a Pandas DataFrame
df = pd.read_sql_query(query, connection)

# Close the connection
connection.close()

In [46]:
# Display the result in the first column first cell
print(df.iloc[0,0])

578


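Nice! The answers match.

## Code Understanding

*[LangChain Code Understanding Docs](https://python.langchain.com/en/latest/use_cases/code.html)*

One of the most exciting abilities of LLMs is code understanding. People around the world are leveling up both the speed and quality of their output thanks to AI help. A big part of this is having an LLM that can understand code and help you with a particular task.

* **Deep Dive** - Coming Soon
* **Examples** - TBD
* **Use Cases:** Co-Pilot-esque functionality that can help answer questions about a specific library, help you generate new code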

In [47]:
# Helper to read local files
import os

# Vector Support
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

# Model and chain
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Text splitters
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader

llm = ChatOpenAI(model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)

We will do the vector store dance again.

In [48]:
embeddings = OpenAIEmbeddings(disallowed_special=(), openai_api_key=openai_api_key)

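I put a small Python package, [The Fuzz](https://github.com/seatgeek/thefuzz) (a personal indie favorite), in the data folder of this repo.

The loop below goes through each file in the library and loads it up as a document.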

In [49]:
root_dir = 'data/thefuzz'
docs = []

# Go through each folder
for dirpath, dirnames, filenames in os.walk(root_dir):
    
    # Go through each file
    for file in filenames:
        try: 
            # Load up the file as a doc and split
            loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')
            docs.extend(loader.load_and_split())
        except Exception as e: 
            pass

Let's look at an example of a document. It's just code!

In [50]:
print (f"You have {len(docs)} documents\n")
print ("------ Start Document ------")
print (docs[0].page_content[:300])

You have 175 documents

------ Start Document ------
import unittest
import re
import pycodestyle

from thefuzz import fuzz
from thefuzz import process
from thefuzz import utils
from thefuzz.string_processing import StringProcessor


class StringProcessingTest(unittest.TestCase):
    def test_replace_non_letters_non_numbers_with_whitespace(self):
    


Embed them and store them in a docstore. This will make an API call to OpenAI.

In [51]:
docsearch = FAISS.from_documents(docs, embeddings)

In [52]:
# Get our retriever ready
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

In [53]:
query = "What function do I use if I want to find the most similar item in a list of items?"
output = qa.run(query)

In [54]:
print (output)

You can use the `process.extractOne()` function from `thefuzz` package to find the most similar item in a list of items. Here's an example:

```
from thefuzz import process

choices = ["apple", "banana", "orange", "pear"]
query = "pineapple"

best_match = process.extractOne(query, choices)
print(best_match)
```

This would output `(u'apple', 36)`, which means that the most similar item to "pineapple" in the list of choices is "apple", with a similarity score of 36.


In [55]:
query = "Can you write the code to use the process.extractOne() function? Only respond with code. No other text or explanation"
output = qa.run(query)

In [56]:
print (output)

import fuzzywuzzy.process as process

choices = [
    "new york mets vs chicago cubs",
    "chicago cubs at new york mets",
    "atlanta braves vs pittsbugh pirates",
    "new york yankees vs boston red sox"
]

query = "new york mets at chicago cubs"

best = process.extractOne(query, choices)
print(best[0])


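## Interacting with APIs

*[LangChain API Interaction Docs](https://python.langchain.com/en/latest/use_cases/apis.html)*

If the data or action you need is behind an API, you'll need your LLM to interact with that API.

* **Deep Dive** - Coming Soon
* **Examples** - TBD
* **Use Cases:** Understand a request from a user and carry out an action, be able to automate more real-world workflows

This topic is closely related to Agents and Plugins, though in this section we'll only look at a simple use case. For more information, check out the [LangChain + plugins](https://python.langchain.com/en/latest/use_cases/agents/custom_agent_with_plugin_retrieval_using_plugnplai.html) documentation.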

In [57]:
from langchain.chains import APIChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

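LangChain's APIChain can read API documentation and work out which endpoint it needs to call.

In this case, I wrote (purposefully sloppy) API documentation to demonstrate how this works.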

In [58]:
api_docs = """

BASE URL: https://restcountries.com/

API Documentation:

The API endpoint /v3.1/name/{name} Used to find informatin about a country. All URL parameters are listed below:
    - name: Name of country - Ex: italy, france
    
The API endpoint /v3.1/currency/{currency} Uesd to find information about a region. All URL parameters are listed below:
    - currency: 3 letter currency. Example: USD, COP
    
Woo! This is my documentation
"""

chain_new = APIChain.from_llm_and_api_docs(llm, api_docs, verbose=True)

Let's try to make an API call that is meant for the country endpoint.

In [59]:
chain_new.run('Can you tell me information about france?')



> Entering new APIChain chain...
 https://restcountries.com/v3.1/name/france
[{"name":{"common":"France","official":"French Republic","nativeName":{"fra":{"official":"République française","common":"France"}}},"tld":[".fr"],"cca2":"FR","ccn3":"250","cca3":"FRA","cioc":"FRA","independent":true,"status":"officially-assigned","unMember":true,"currencies":{"EUR":{"name":"Euro","symbol":"€"}},"idd":{"root":"+3","suffixes":["3"]},"capital":["Paris"],"altSpellings":["FR","French Republic","République française"],"region":"Europe","subregion":"Western Europe","languages":{"fra":"French"},"translations":{"ara":{"official":"الجمهورية الفرنسية","common":"فرنسا"},"bre":{"official":"Republik Frañs","common":"Frañs"},"ces":{"official":"Francouzská republika","common":"Francie"},"cym":{"official":"French Republic","common":"France"},"deu":{"official":"Französische Republik","common":"Frankreich"},"est":{"official":"Prantsuse Vabariik","common":"Prantsusmaa"},"f

' France is an officially-assigned, independent country located in Western Europe. Its capital is Paris and its official language is French. Its currency is the Euro (€). It has a population of 67,391,582 and its borders are with Andorra, Belgium, Germany, Italy, Luxembourg, Monaco, Spain, and Switzerland.'

Let's try to make an API call that is meant for the currency endpoint.

In [60]:
chain_new.run('Can you tell me about the currency COP?')



> Entering new APIChain chain...
 https://restcountries.com/v3.1/currency/COP
[{"name":{"common":"Colombia","official":"Republic of Colombia","nativeName":{"spa":{"official":"República de Colombia","common":"Colombia"}}},"tld":[".co"],"cca2":"CO","ccn3":"170","cca3":"COL","cioc":"COL","independent":true,"status":"officially-assigned","unMember":true,"currencies":{"COP":{"name":"Colombian peso","symbol":"$"}},"idd":{"root":"+5","suffixes":["7"]},"capital":["Bogotá"],"altSpellings":["CO","Republic of Colombia","República de Colombia"],"region":"Americas","subregion":"South America","languages":{"spa":"Spanish"},"translations":{"ara":{"official":"جمهورية كولومبيا","common":"كولومبيا"},"bre":{"official":"Republik Kolombia","common":"Kolombia"},"ces":{"official":"Kolumbijská republika","common":"Kolumbie"},"cym":{"official":"Gweriniaeth Colombia","common":"Colombia"},"deu":{"official":"Republik Kolumbien","common":"Kolumbien"},"est":{"official":"Colom

' The currency of Colombia is the Colombian peso (COP), symbolized by the "$" sign.'

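In both cases the APIChain read the instructions and understood which API call it needed to make.

Once the response came back, it was parsed and then my question was answered. Awesome 🐒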
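
For a sense of what "which API call it needed to make" boils down to, here is a sketch that fills the two endpoint templates from the docs above and produces the same URLs the chain requested. Choosing the endpoint is the LLM's job in the real chain; here the choice is passed in by hand. `ENDPOINTS` and `build_url` are names I made up, and nothing in this snippet touches the network.

```python
from urllib.parse import quote, urljoin

BASE_URL = "https://restcountries.com/"

# Endpoint templates copied from the api_docs string
ENDPOINTS = {
    "country": "/v3.1/name/{name}",
    "currency": "/v3.1/currency/{currency}",
}


def build_url(endpoint: str, **params: str) -> str:
    """Fill an endpoint template with URL-escaped parameters and join it to the base URL."""
    escaped = {key: quote(value, safe="") for key, value in params.items()}
    path = ENDPOINTS[endpoint].format(**escaped)
    return urljoin(BASE_URL, path)


france_url = build_url("country", name="france")
cop_url = build_url("currency", currency="COP")
print(france_url)
print(cop_url)
```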

## Chatbots

*[LangChain Chatbot Docs](https://python.langchain.com/en/latest/use_cases/chatbots.html)*

Chatbots use many of the tools we've already looked at, plus one important new topic: memory. There are many different [types of memory](https://python.langchain.com/en/latest/modules/memory/how_to_guides.html); tinker to see which works best for you.

* **Deep Dive** - Coming Soon
* **Examples** - [ChatBase](https://www.chatbase.co/?via=greg) (Affiliate link), [NexusGPT](https://twitter.com/achammah1/status/1649482899253501958?s=20), [ChatPDF](https://www.chatpdf.com/)
* **Use Cases:** Have a real-time interaction with a user, provide an approachable UI for users to ask natural language questions

In [61]:
from langchain.llms import OpenAI
from langchain import LLMChain
from langchain.prompts.prompt import PromptTemplate

# Chat specific components
from langchain.memory import ConversationBufferMemory

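For this use case I'm going to show you how to customize the context that is given to a chatbot.

You can pass instructions on how the bot should respond, along with any other relevant information it needs.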

In [62]:
template = """
You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], 
    template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")

In [63]:
llm_chain = LLMChain(
    llm=OpenAI(openai_api_key=openai_api_key), 
    prompt=prompt, 
    verbose=True, 
    memory=memory
)

In [64]:
llm_chain.predict(human_input="Is an pear a fruit or vegetable?")



> Entering new LLMChain chain...
Prompt after formatting:

You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it


H: Is an pear a fruit or vegetable?
Chatbot:

> Finished chain.


' Yes, an pear is a fruit of confusion!'

In [65]:
llm_chain.predict(human_input="What was one of the fruits I first asked you about?")



> Entering new LLMChain chain...
Prompt after formatting:

You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

H: Is an pear a fruit or vegetable?
AI:  Yes, an pear is a fruit of confusion!
Human: What was one of the fruits I first asked you about?
Chatbot:

> Finished chain.


' I think it was the fruit of knowledge!'

Notice how my first interaction was put into the prompt of my second interaction. This is the memory piece at work.
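
Here is a bare-bones sketch of that idea: keep every turn, render the turns as a transcript, and drop the transcript into the `{chat_history}` slot of the next prompt. It mimics what the trace shows but is not the actual `ConversationBufferMemory` implementation; `SimpleBufferMemory` is a made-up class and the template is shortened.

```python
class SimpleBufferMemory:
    """Keeps every turn and renders it as a transcript for the next prompt."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, human_input: str, ai_output: str) -> None:
        self.turns.append((human_input, ai_output))

    def render(self) -> str:
        lines = []
        for human_input, ai_output in self.turns:
            lines.append(f"Human: {human_input}")
            lines.append(f"AI: {ai_output}")
        return "\n".join(lines)


TEMPLATE = "{chat_history}\nHuman: {human_input}\nChatbot:"

memory = SimpleBufferMemory()

# First turn: the history is still empty
first_prompt = TEMPLATE.format(
    chat_history=memory.render(),
    human_input="Is an pear a fruit or vegetable?",
)
memory.save("Is an pear a fruit or vegetable?", "Yes, an pear is a fruit of confusion!")

# Second turn: the first exchange is now part of the prompt
second_prompt = TEMPLATE.format(
    chat_history=memory.render(),
    human_input="What was one of the fruits I first asked you about?",
)
print(second_prompt)
```

A buffer like this grows with every turn, which is one reason other memory types (summaries, sliding windows) exist.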

There are many ways to structure a conversation; check out the different options in the [docs](https://python.langchain.com/en/latest/use_cases/chatbots.html).

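## Agents

*[LangChain Agent Docs](https://python.langchain.com/en/latest/modules/agents.html)*

Agents are one of the hottest topics in LLMs. Agents are decision makers that can look at data, reason about what the next action should be, and execute that action for you via tools.

* **Deep Dive** - [Introduction to agents](https://youtu.be/2xxziIWmaSA?t=1972), [LangChain Agents Webinar](https://www.crowdcast.io/c/46erbpbz609r), much deeper dive coming soon
* **Examples** - TBD
* **Use Cases:** Run programs autonomously without the need for human input

Examples of advanced uses of agents appear in [BabyAGI](https://github.com/yoheinakajima/babyagi) and [AutoGPT](https://github.com/Significant-Gravitas/Auto-GPT)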

In [66]:
# Helpers
import os
import json

from langchain.llms import OpenAI

# Agent imports
from langchain.agents import load_tools
from langchain.agents import initialize_agent

# Tool imports
from langchain.agents import Tool
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.utilities import TextRequestsWrapper

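For this example I'm going to pull Google search results. You may want to do this if you need a list of websites for a research project.

You can sign up for both of these keys at the URLs below

[GOOGLE_API_KEY](https://console.cloud.google.com/apis/credentials)
[GOOGLE_CSE_ID](https://programmablesearchengine.google.com/controlpanel/create)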

In [67]:
GOOGLE_CSE_ID = os.getenv('GOOGLE_CSE_ID', 'YourAPIKeyIfNotSet')
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY', 'YourAPIKeyIfNotSet')

In [68]:
llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

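Initialize both of the tools you'll be using. For this example we'll search Google and also give the LLM the ability to make HTTP requests to a URL.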

In [69]:
search = GoogleSearchAPIWrapper(google_api_key=GOOGLE_API_KEY, google_cse_id=GOOGLE_CSE_ID)

requests = TextRequestsWrapper()

Put both of your tools in a toolkit

In [70]:
toolkit = [
    Tool(
        name = "Search",
        func=search.run,
        description="useful for when you need to search google to answer questions about current events"
    ),
    Tool(
        name = "Requests",
        func=requests.get,
        description="Useful for when you need to make a request to a URL"
    ),
]

Create your agent by giving it the tools, the LLM, and the type of agent it should be

In [71]:
agent = initialize_agent(toolkit, llm, agent="zero-shot-react-description", verbose=True, return_intermediate_steps=True)

Now ask it a question. I'm going to give it one that it should go to Google for

In [72]:
response = agent({"input":"What is the capital of canada?"})
response['output']



> Entering new AgentExecutor chain...
 I need to find out what the capital of Canada is.
Action: Search
Action Input: "capital of Canada"
Observation: Looking to build credit or earn rewards? Compare our rewards, Guaranteed secured and other Guaranteed credit cards. Canada's capital is Ottawa and its three largest metropolitan areas are Toronto, Montreal, and Vancouver. Canada. A vertical triband design (red, white, red) ... Browse available job openings at Capital One - CA. ... Together, we will build one of Canada's leading information-based technology companies – join us, ... Ottawa is the capital city of Canada. It is located in the southern portion of the province of Ontario, at the confluence of the Ottawa River and the Rideau ... Shopify Capital offers small business funding in the form of merchant cash advances to eligible merchants in Canada. If you live in Canada and need ... Download Capital One Canada and enjoy it on your iPhone, iPad

'Ottawa is the capital of Canada.'
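
The trace above shows the pattern this agent type follows: the LLM writes an `Action:` line naming a tool and an `Action Input:` line, the tool runs, and its result is fed back as the `Observation:`. Here is a rough sketch of the parse-and-dispatch step. It is an illustration only, not LangChain's parser, and the two tools are stand-in lambdas rather than real search or HTTP calls.

```python
import re

ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def parse_action(llm_text: str) -> tuple[str, str]:
    """Pull the tool name and tool input out of the LLM's text."""
    match = ACTION_PATTERN.search(llm_text)
    if match is None:
        raise ValueError("No Action / Action Input found")
    tool_name = match.group(1).strip()
    tool_input = match.group(2).strip().strip('"')
    return tool_name, tool_input


# Stand-in tools; the real ones call Google Search and fetch URLs
tools = {
    "Search": lambda query: f"(pretend search results for: {query})",
    "Requests": lambda url: f"(pretend page body of: {url})",
}

llm_text = (
    " I need to find out what the capital of Canada is.\n"
    "Action: Search\n"
    'Action Input: "capital of Canada"'
)
tool_name, tool_input = parse_action(llm_text)
observation = tools[tool_name](tool_input)
print(observation)
```

In a full agent loop, the observation is appended to the prompt and the LLM is called again until it emits a final answer.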

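Great, that's correct. Now let's ask a question that requires reading a web page. Note that in the run below the agent picks the Search tool rather than Requests, so its answer is based on search snippets and not on the page's actual comments.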

In [73]:
response = agent({"input":"Tell me what the comments are about on this webpage https://news.ycombinator.com/item?id=34425779"})
response['output']



> Entering new AgentExecutor chain...
 I need to find out what the comments are about
Action: Search
Action Input: "comments on https://news.ycombinator.com/item?id=34425779"
Observation: About a month after we started Y Combinator we came up with the phrase that ... Action Input: "comments on https://news.ycombinator.com/item?id=34425779" .
Thought: I now know the comments are about Y Combinator
Final Answer: The comments on the webpage are about Y Combinator.

> Finished chain.


'The comments on the webpage are about Y Combinator.'

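## End

Wow! You made it all the way to the bottom.

Where do you go from here?

The world of AI is massive and use cases will continue to grow. I'm personally most excited about the use cases we don't know about yet.

What else should we add to this list?

Check out this [repo's ReadMe](https://github.com/gkamradt/langchain-tutorials) for more inspiration
Check out more tutorials on [YouTube](https://www.youtube.com/@DataIndependent)

I'd love to see what projects you build. Tag me on [Twitter](https://twitter.com/GregKamradt)!

Have something you'd like to edit? See our [contribution guide](https://github.com/gkamradt/langchain-tutorials) and throw up a PR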