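# Question Answering with Langchain, Tair and OpenAI

This notebook shows how to implement a question answering system with Langchain, using Tair as the knowledge base and OpenAI embeddings. If you are not familiar with Tair, it is better to check out the [Getting_started_with_Tair_and_OpenAI.ipynb](Getting_started_with_Tair_and_OpenAI.ipynb) notebook first.

This notebook presents an end-to-end process of:

- Calculating the embeddings with the OpenAI API.
- Storing the embeddings in a Tair instance to build a knowledge base.
- Converting a raw text query to an embedding with the OpenAI API.
- Using Tair to perform a nearest neighbour search in the created collection to find some context.
- Asking the LLM to find the answer in the given context.

All the steps are reduced to calls to the corresponding Langchain methods.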

## Prerequisites

For the purposes of this exercise we need to prepare a couple of things:

- A [Tair cloud instance](https://www.alibabacloud.com/help/en/tair/latest/what-is-tair).
- [Langchain](https://github.com/hwchase17/langchain) as a framework.
- An OpenAI API key.

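### Install requirements

This notebook requires the following Python packages: `openai`, `tiktoken`, `langchain` and `tair`.

- `openai` provides convenient access to the OpenAI API.
- `tiktoken` is a fast BPE tokeniser for use with OpenAI models.
- `langchain` helps us build applications with LLMs more easily.
- `tair` is the library used to interact with the Tair vector database.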

In [1]:
! pip install openai tiktoken langchain tair 

Looking in indexes: http://sg.mirrors.cloud.aliyuncs.com/pypi/simple/
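Before moving on, it can help to confirm that the four packages listed above are importable in the current kernel. The following is a minimal sketch using only the standard library; the helper name `find_missing` is hypothetical and not part of the original notebook.

```python
import importlib.util

# Package names taken from the requirements list above.
REQUIRED_PACKAGES = ["openai", "tiktoken", "langchain", "tair"]


def find_missing(packages):
    """Return the names in `packages` that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_PACKAGES)
print("Missing packages:", missing or "none")
```

If the list is not empty, rerunning the `pip install` cell (and restarting the kernel if needed) is usually enough.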

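### Prepare your OpenAI API key

The OpenAI API key is used to vectorise the documents and queries. If you don't have an OpenAI API key, you can get one from [https://platform.openai.com/account/api-keys](https://platform.openai.com/account/api-keys). Once you have your key, add it with `getpass`.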

In [1]:
import getpass

openai_api_key = getpass.getpass("Input your OpenAI API key:")

Input your OpenAI API key:········
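When the notebook is rerun often, typing the key every time gets tedious. One possible variation, sketched below, reads the key from an environment variable and only falls back to the `getpass` prompt when the variable is unset. The helper `load_secret` is hypothetical, and the variable name `OPENAI_API_KEY` is an assumption about how your environment is configured.

```python
import getpass
import os


def load_secret(env_name, prompt, env=None):
    """Return the value of `env_name` if it is set, otherwise prompt for it."""
    # `env` is injectable for testing; by default the process environment is used.
    env = os.environ if env is None else env
    value = env.get(env_name)
    if value:
        return value
    return getpass.getpass(prompt)


# Usage (commented out so the notebook does not prompt a second time):
# openai_api_key = load_secret("OPENAI_API_KEY", "Input your OpenAI API key:")
```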


### Prepare your Tair URL

To establish a Tair connection, you need to have a `TAIR_URL`.

In [2]:
# The format of the URL: redis://[[username]:[password]]@localhost:6379/0
TAIR_URL = getpass.getpass("Input your tair url:")

Input your tair url:········
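A malformed URL typically only surfaces later as a connection error, so a quick structural check can save time. The sketch below parses a URL with the standard library and reports its parts without printing the password. It assumes the `redis://` scheme shown in the format comment above; the helper `describe_tair_url` and the sample URL are illustrative only.

```python
from urllib.parse import urlparse


def describe_tair_url(url):
    """Parse a redis:// URL and return its parts without exposing the password."""
    parsed = urlparse(url)
    if parsed.scheme != "redis":
        raise ValueError("expected a URL starting with redis://")
    if not parsed.hostname:
        raise ValueError("the URL has no host")
    return {
        "host": parsed.hostname,
        "port": parsed.port,
        "db": parsed.path.lstrip("/") or None,
        "has_username": bool(parsed.username),
        "has_password": parsed.password is not None,
    }


# Illustrative URL, not a real instance.
info = describe_tair_url("redis://user:secret@localhost:6379/0")
print(info)
```

The same function can be called on `TAIR_URL` right after the prompt above.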


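## Load data

In this section we are going to load the data containing some natural questions and answers to them. All the data will be used to create a Langchain application with Tair as the knowledge base.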

In [4]:
import wget

# All the examples come from https://ai.google.com/research/NaturalQuestions
# This is a sample of the training set that we download and extract for some
# further processing.
wget.download("https://storage.googleapis.com/dataset-natural-questions/questions.json")
wget.download("https://storage.googleapis.com/dataset-natural-questions/answers.json")

100% [..............................................................................] 95372 / 95372

'answers (2).json'

In [5]:
import json

with open("questions.json", "r") as fp:
    questions = json.load(fp)

with open("answers.json", "r") as fp:
    answers = json.load(fp)

In [6]:
print(questions[0])

when is the last episode of season 8 of the walking dead


In [7]:
print(answers[0])

No . overall No. in season Title Directed by Written by Original air date U.S. viewers ( millions ) 100 `` Mercy '' Greg Nicotero Scott M. Gimple October 22 , 2017 ( 2017 - 10 - 22 ) 11.44 Rick , Maggie , and Ezekiel rally their communities together to take down Negan . Gregory attempts to have the Hilltop residents side with Negan , but they all firmly stand behind Maggie . The group attacks the Sanctuary , taking down its fences and flooding the compound with walkers . With the Sanctuary defaced , everyone leaves except Gabriel , who reluctantly stays to save Gregory , but is left behind when Gregory abandons him . Surrounded by walkers , Gabriel hides in a trailer , where he is trapped inside with Negan . 101 `` The Damned '' Rosemary Rodriguez Matthew Negrete & Channing Powell October 29 , 2017 ( 2017 - 10 - 29 ) 8.92 Rick 's forces split into separate parties to attack several of the Saviors ' outposts , during which many members of the group are killed ; Eric is critically injure

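## Chain definition

Langchain is already integrated with Tair and performs all the indexing for a given list of documents. In our case we are going to store the set of answers we have.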

In [7]:
from langchain.vectorstores import Tair
from langchain.embeddings import OpenAIEmbeddings
from langchain import VectorDBQA, OpenAI

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
doc_store = Tair.from_texts(
    texts=answers, embedding=embeddings, tair_url=TAIR_URL,
)

At this stage all the possible answers are already stored in Tair, so we can define the whole QA chain.

In [8]:
llm = OpenAI(openai_api_key=openai_api_key)
qa = VectorDBQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    vectorstore=doc_store,
    return_source_documents=False,
)
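Depending on the installed LangChain version, `VectorDBQA` may emit a deprecation warning, since later releases point users to `RetrievalQA` instead. The sketch below shows what an equivalent chain might look like under that assumption. The wrapper `build_retrieval_qa` is hypothetical, and whether `RetrievalQA` is available depends on your LangChain version.

```python
def build_retrieval_qa(llm, doc_store):
    """Build a `stuff` QA chain on top of a vector store retriever."""
    # Imported lazily so this cell can be defined even if the installed
    # LangChain version does not provide RetrievalQA.
    from langchain.chains import RetrievalQA

    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=doc_store.as_retriever(),
        return_source_documents=False,
    )


# Usage, reusing the objects defined above:
# qa = build_retrieval_qa(llm, doc_store)
```

The rest of the notebook keeps using `VectorDBQA`, which is what the recorded outputs were produced with.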



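## Search data

Once the data is put into Tair we can start asking some questions. A question will be automatically vectorised by the OpenAI model, and the created vector will be used to find some possibly matching answers in Tair. Once retrieved, the most similar answers will be incorporated into the prompt sent to the OpenAI Large Language Model.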
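To make the retrieval step concrete, here is a toy sketch of nearest neighbour search that runs without any API calls. It is not how Tair or OpenAI embeddings work internally: `toy_embed` is a hypothetical stand-in that uses word counts instead of a learned embedding, and the search is a brute-force cosine similarity scan over a small in-memory list.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the k documents most similar to the query, best match first."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


documents = [
    "the root of a red black tree is black",
    "paris is the capital of france",
    "every red node in a red black tree has black children",
]
context = top_k("red black tree properties", documents, k=2)
print(context)
```

In the real chain, the embedding comes from the OpenAI API and the similarity search is delegated to Tair; only the overall shape (embed the query, rank stored vectors, keep the top matches) carries over.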

In [9]:
import random

random.seed(52)
selected_questions = random.choices(questions, k=5)

In [12]:
import time

for question in selected_questions:
    print(">", question)
    print(qa.run(question), end="\n\n")
    # wait 20 seconds because of the rate limit
    time.sleep(20)

> where do frankenstein and the monster first meet
 Frankenstein and the monster first meet in the mountains.

> who are the actors in fast and furious
 The actors in Fast & Furious are Vin Diesel ( Dominic Toretto ), Paul Walker ( Brian O'Conner ), Michelle Rodriguez ( Letty Ortiz ), Jordana Brewster ( Mia Toretto ), Tyrese Gibson ( Roman Pearce ), Ludacris ( Tej Parker ), Lucas Black ( Sean Boswell ), Sung Kang ( Han Lue ), Gal Gadot ( Gisele Yashar ), and Dwayne Johnson ( Luke Hobbs ).

> properties of red black tree in data structure
 The properties of a red-black tree in data structure are that each node is either red or black, the root is black, if a node is red then both its children must be black, and every path from a given node to any of its descendant NIL nodes contains the same number of black nodes.

> who designed the national coat of arms of south africa
 Iaan Bekker

> caravaggio's death of the virgin pamela askew
 I don't know.
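The loop above sleeps a fixed 20 seconds between questions to stay under the rate limit. An alternative, sketched below, is to retry a failed call with exponentially growing delays. The helper `call_with_backoff` is hypothetical, and catching every `Exception` is a simplifying assumption; in practice you would likely catch only the rate limit error raised by your client version.

```python
import time


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `func()`; on an exception, wait and retry, doubling the delay each time."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Assumption: every exception is treated as retryable.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Usage with the chain defined above:
# answer = call_with_backoff(lambda: qa.run(question))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; the default behaves like a normal `time.sleep`.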



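### Custom prompt templates

The `stuff` chain type in Langchain uses a specific prompt with the question and the context documents incorporated. This is what the default prompt looks like:

```text
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:
```

We can, however, provide our own prompt template and change the behaviour of the OpenAI LLM, while still using the `stuff` chain type. It is important to keep `{context}` and `{question}` as placeholders.

#### Experimenting with custom prompts

We can try using a different prompt template, so the model:

1. Responds with a single-sentence answer if it knows it.
2. Suggests a random song title if it doesn't know the answer to our question.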
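To see what "stuffing" amounts to, the sketch below fills a `{context}`/`{question}` template with plain string formatting. It is an illustration rather than LangChain's own code: the helper `fill_stuff_prompt` is hypothetical, and joining the retrieved documents with a blank line is an assumption about the separator.

```python
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def fill_stuff_prompt(template, documents, question, separator="\n\n"):
    """Join the retrieved documents and substitute both placeholders."""
    context = separator.join(documents)
    return template.format(context=context, question=question)


prompt = fill_stuff_prompt(STUFF_TEMPLATE, ["Doc A.", "Doc B."], "What is in doc A?")
print(prompt)
```

Because every retrieved document is pasted into a single prompt, the total length grows with the number and size of the documents, which is the main constraint of this chain type.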

In [13]:
from langchain.prompts import PromptTemplate

custom_prompt = """Use the following pieces of context to answer the question at the end. Please provide
a short single-sentence summary answer only. If you don't know the answer or if it's
not present in given context, don't try to make up an answer, but suggest me a random
unrelated song title I could listen to.
Context: {context}
Question: {question}
Helpful Answer:"""

custom_prompt_template = PromptTemplate(
    template=custom_prompt, input_variables=["context", "question"]
)
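Since the chain relies on the `{context}` and `{question}` placeholders, it can be worth checking a custom template before passing it on. The following sketch uses the standard library's `string.Formatter` to list the named fields in a template; the helpers `template_fields` and `check_template` are hypothetical names introduced for this example.

```python
from string import Formatter


def template_fields(template):
    """Return the set of named placeholders in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def check_template(template, required=("context", "question")):
    """Return (missing, unexpected) placeholder names, each sorted."""
    fields = template_fields(template)
    missing = sorted(set(required) - fields)
    unexpected = sorted(fields - set(required))
    return missing, unexpected


missing, unexpected = check_template(
    "Context: {context}\nQuestion: {question}\nHelpful Answer:"
)
print("missing:", missing, "unexpected:", unexpected)
```

Running `check_template(custom_prompt)` on the template defined above should report no missing and no unexpected fields.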

In [14]:
custom_qa = VectorDBQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    vectorstore=doc_store,
    return_source_documents=False,
    chain_type_kwargs={"prompt": custom_prompt_template},
)

In [15]:
random.seed(41)
for question in random.choices(questions, k=5):
    print(">", question)
    print(custom_qa.run(question), end="\n\n")
    # wait 20 seconds because of the rate limit
    time.sleep(20)

> what was uncle jesse's original last name on full house
Uncle Jesse's original last name on Full House was Cochran.

> when did the volcano erupt in indonesia 2018
The given context does not mention any volcanic eruption in Indonesia in 2018. Suggested song title: "The Heat Is On" by Glenn Frey.

> what does a dualist way of thinking mean
Dualism means the belief that there is a distinction between the mind and the body, and that the mind is a non-extended, non-physical substance.

> the first civil service commission in india was set up on the basis of recommendation of
The first Civil Service Commission in India was not set up on the basis of the recommendation of the Election Commission of India's Model Code of Conduct.

> how old do you have to be to get a tattoo in utah
You must be at least 18 years old to get a tattoo in Utah.

