Introduction

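This is the final lesson in this minimalist `LangChain` primer series. We will use what we learned in the previous nine lessons to build a fully featured LLM application. Built on the `LangChain` framework, the application uses the contents of a `PDF` file as its knowledge base and lets users ask questions that are answered from that file.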

We use `LangChain`'s QA chain together with `Chroma` to implement semantic search over the PDF document. The sample code references the [AWS Serverless Developer Guide](https://docs.aws.amazon.com/pdfs/serverless/latest/devguide/serverless-core.pdf), a PDF document of 84 pages.

For the complete code for this lesson, see [10_Example.ipynb](./10_Example.ipynb).

Install the required `Python` packages

In [None]:
!pip install langchain==0.0.235 openai chromadb pymupdf tiktoken

Set up the OpenAI environment
Set the API key in an environment variable.
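As a minimal sketch of this step: the OpenAI client and the `LangChain` OpenAI wrappers read the key from the `OPENAI_API_KEY` environment variable. The helper name `require_openai_key` below is hypothetical and is only meant to fail early with a clear message when the key is missing.

```python
import os


def require_openai_key(env=os.environ):
    """Return the OpenAI API key from the environment, or raise a clear error.

    `require_openai_key` is a hypothetical helper for this notebook; only the
    variable name OPENAI_API_KEY comes from the OpenAI tooling itself.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the notebook."
        )
    return key
```

Calling `require_openai_key()` at the top of the notebook surfaces a missing key before any embedding or completion request is made.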

Download the PDF file (AWS Serverless Developer Guide)

In [13]:
PDF_NAME = './file/demo.pdf'
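The notebook assumes the PDF already exists at `PDF_NAME`. One possible way to fetch it is sketched below with the standard library; the helper name `download_pdf` is hypothetical, and the URL in the usage comment is the guide linked in the introduction.

```python
import os
import urllib.request


def download_pdf(url, dest):
    """Download `url` to `dest` unless the file already exists; return `dest`."""
    parent = os.path.dirname(dest)
    if parent:
        # Create the target folder (for example ./file) if it is missing.
        os.makedirs(parent, exist_ok=True)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


# Example usage (performs a network request, so it is left commented out):
# download_pdf(
#     "https://docs.aws.amazon.com/pdfs/serverless/latest/devguide/serverless-core.pdf",
#     "./file/demo.pdf",
# )
```

Skipping the download when the file is already present keeps repeated notebook runs from fetching the same document again.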

Load the PDF file

In [14]:
from langchain.document_loaders import PyMuPDFLoader

docs = PyMuPDFLoader(PDF_NAME).load()

print(f'There are {len(docs)} document(s) in {PDF_NAME}.')
print(f'There are {len(docs[0].page_content)} characters in the first page of your document.')

There are 45 document(s) in ./file/demo.pdf.
There are 67 characters in the first page of your document.


Split the documents and store the text embeddings as vectors

In [15]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)

embeddings = OpenAIEmbeddings()

vectorstore = Chroma.from_documents(split_docs, embeddings, collection_name="serverless_guide")
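To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch that cuts text into fixed windows where each chunk repeats the tail of the previous one. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break at separators such as paragraphs and line breaks); the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

A larger overlap reduces the chance of losing context at chunk boundaries, at the cost of storing and embedding more text.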

Create a QA chain based on OpenAI

In [26]:
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

llm = OpenAI(temperature=0)
chain = load_qa_chain(llm, chain_type="stuff")
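The `"stuff"` chain type places all retrieved documents into a single prompt and sends it to the LLM in one call. The sketch below illustrates that idea only; the function name `build_stuff_prompt` and the prompt wording are assumptions made for illustration, not the template `LangChain` actually uses.

```python
def build_stuff_prompt(doc_texts, question):
    """Concatenate every document into one context block, then add the question.

    Illustrative only: the wording here is made up and differs from the
    prompt template shipped with LangChain.
    """
    context = "\n\n".join(doc_texts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_stuff_prompt(["First chunk.", "Second chunk."], "What is covered?")
print(prompt)
```

Because everything goes into one prompt, this approach is simple and needs a single LLM call, but the combined chunks must fit within the model's context window.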

Run a similarity search based on the question

In [27]:
query = "报价义务概览是什么?"
# Retrieve the 5 chunks most similar to the query.
similar_docs = vectorstore.similarity_search(query, k=5)

In [28]:
similar_docs

[Document(page_content='3\n一、做市商报价义务豁免........................................................................................23\n二、做市服务激励.................................................................................................. 24\n三、终止做市商做市服务........................................................................................24\n四、终止开展做市业务........................................................................................... 24\n五、日常监管......................................................................................................... 25\n六、违规处理......................................................................................................... 25\n第五章\n风险管理......................................................................................................... 26\n一、做市业务部门风险管理机制............................................................................. 26\n二、公司层面风险管理机制.................................................................................... 27\n第六
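Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows that ranking with plain cosine similarity on toy vectors. It is an illustration under stated assumptions: `Chroma` uses its own index, and its distance metric may differ from cosine similarity; the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the indices of the k vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-dimensional "embeddings"; real embedding vectors have far more dimensions.
print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2))
# [1, 2]
```

The indices returned correspond to the chunks that would be passed on to the QA chain as context.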

Answer the question with the QA chain, based on the relevant documents

In [29]:
chain.run(input_documents=similar_docs, question=query)

' 报价义务概览是本所根据上市基金资产类别设置做市服务报价义务的要求，具体要求如表1所示。'