# RAG Search Example

Build a simple RAG (retrieval-augmented generation) system with LangChain.

First, set up the LLM environment.

In [1]:
from langchain_openai import ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import HuggingFaceEmbeddings


# LLM: ChatOpenAI pointed at a local OpenAI-compatible server
llm = ChatOpenAI(
    openai_api_base="http://localhost:1234/v1",
    openai_api_key="not-needed"
)

# Embedding Model
modelPath = "BAAI/bge-base-en-v1.5"

# Model configuration options: run on the GPU (switch to the 'cpu' line if CUDA is unavailable)
# model_kwargs = {'device':'cpu'}
model_kwargs = {'device':'cuda'}

# Encoding options: normalize embeddings to unit length
encode_kwargs = {'normalize_embeddings': True}

# Initialize an instance of HuggingFaceEmbeddings with the specified parameters
embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,     # Provide the pre-trained model's path
    model_kwargs=model_kwargs, # Pass the model configuration options
    encode_kwargs=encode_kwargs # Pass the encoding options
)


# load the data from web
loader = WebBaseLoader("https://python.langchain.com/docs/get_started/introduction")
docs = loader.load()

# Split into chunks, vectorize, and store in a FAISS index
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)

retriever = vector.as_retriever()
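
To make the retrieval step more concrete, here is a minimal pure-Python sketch of similarity search over normalized vectors. It is an illustration only, not how FAISS is implemented: the three-dimensional toy vectors and the helper names `normalize`, `dot`, and `top_k` are hypothetical. For unit-length vectors the dot product equals cosine similarity, and ranking by Euclidean distance gives the same order, which is one reason to set `normalize_embeddings` to `True`.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (the effect of normalize_embeddings=True)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query, chunks, k=2):
    """Return the k chunk texts whose vectors score highest against the query."""
    q = normalize(query)
    scored = [(dot(q, normalize(vec)), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy "embeddings" standing in for real model output (hypothetical values)
toy_chunks = [
    ("chunk about chains", [0.9, 0.1, 0.0]),
    ("chunk about agents", [0.1, 0.9, 0.0]),
    ("chunk about pricing", [0.0, 0.1, 0.9]),
]

results = top_k([1.0, 0.2, 0.0], toy_chunks, k=2)
print(results)
```

The retriever in the notebook plays the role of `top_k`: it embeds the question, compares it with the stored chunk vectors, and returns the closest chunks.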

Define the prompt template.

In [2]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

output_parser = StrOutputParser()
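
As a rough picture of what the template does at run time, the sketch below fills the same two placeholders with plain `str.format`. This is a stand-in for illustration, not `ChatPromptTemplate` itself; the helper `fill_prompt` and the sample chunks are hypothetical, and joining chunk texts with blank lines is just one assumed way of formatting the context.

```python
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

def fill_prompt(context_chunks, question):
    """Join retrieved chunk texts and substitute them into the template."""
    context = "\n\n".join(context_chunks)
    return template.format(context=context, question=question)

# Hypothetical retrieved chunks and question
filled = fill_prompt(
    ["LangChain is a framework.", "LCEL composes chains."],
    "What is LangChain?",
)
print(filled)
```

The model therefore sees the retrieved text and the question in a single message, with an instruction to rely only on that text.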

Define the LangChain processing pipeline.

In [3]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

setup_and_retrieval = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
)

chain = setup_and_retrieval | prompt | llm | output_parser
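
The `|` composition above can be pictured with a toy model. The sketch below is not LangChain's implementation: `Step`, `parallel`, and the `fake_*` objects are hypothetical stand-ins, written only to show how data flows from a parallel retrieval-plus-passthrough stage through a prompt, a model, and a parser.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new Step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})

# Fake components that mimic the shape of the real ones
fake_retriever = Step(lambda q: ["doc mentioning " + q])
passthrough = Step(lambda q: q)
fake_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda text: "ANSWER<" + text + ">")
fake_parser = Step(lambda text: text.strip())

toy_chain = (
    parallel(context=fake_retriever, question=passthrough)
    | fake_prompt
    | fake_llm
    | fake_parser
)
answer = toy_chain.invoke("LCEL")
print(answer)
```

In this toy version, as in the notebook's chain, the single input string reaches both branches of the first stage: one branch turns it into retrieved context, the other passes it through unchanged as the question.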

Retrieve the relevant documents and hand them to the LLM for processing.

In [4]:
# Query in Chinese: "What are the features and advantages of LangChain?"
chain.invoke("langchain 有什么特点与优势?")

'LangChain 是一个框架，用于开发由语言模型驱动的应用程序。它有以下特点和优势：\n\n1. **应用程序功能**：LangChain 允许应用程序具备上下文感知能力，通过连接语言模型到各种源（如提示说明、示例或内容来支持响应）进行推理。\n\n2. **组件与集成**：框架包含 LangChain 库，提供了可组合的工具和集成，用于语言模型的工作。这些组件是模块化的，易于使用，并且可以独立于整个 LangChain 框架使用。此外，还包含预构建的链和代理，简化了高级任务的实现。\n\n3. **快速入门**：有 Quickstart 指南帮助用户熟悉框架，通过模板快速构建第一个 LangChain 应用程序。\n\n4. **生产友好**：LangSmith 提供了调试、测试、评估和监控功能，确保在生产环境中可以持续改进和自信地部署应用。\n\n5. **易用性与定制**：LCEL 是一个声明式的链编排方式，设计用于支持将原型直接部署到生产中，无需更改代码。它提供了标准接口和模块，如模型I/O、检索和代理工具。\n\n6. **生态系统**：LangChain 链接了一个丰富的生态系统，与其他工具无缝集成，并且易于扩展。\n\n7. **安全**：文档建议遵循最佳实践来确保开发过程的安全。\n\n8. **跨语言支持**：除了Python库外，还有JavaScript LangChain库供选择。\n\n总之，LangChain 提供了一整套工具和平台，简化了构建基于语言模型的应用程序的过程，并提供了强大的功能和灵活性。'

In [5]:
chain.invoke("What is the benefit of using langchain?")

'The LangChain packages provide components and integrations for working with language models in a modular and easy-to-use way, supporting both simple chains like "prompt + LLM" and complex cognitive architectures. This allows developers to build applications that are context-aware and can reason based on provided context. Additionally, LangChain offers off-the-shelf chains and agents for various tasks, making it easy to get started and customize existing chains. The framework also includes a developer platform (LangSmith) for debugging, testing, evaluation, and monitoring chains, and a range of resources for common end-to-end use cases and integrations with other tools in the ecosystem. Overall, LangChain simplifies the application lifecycle by providing a comprehensive set of tools for building applications powered by language models.'