# Retrieval Chain

Date: 2024/08/12

The book "LangChain完全入門" was good, but the LangChain API changes so quickly that I am studying by following the tutorial on the LangChain site instead.

Reference: https://python.langchain.com/v0.1/docs/get_started/quickstart/

In [8]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")

docs = loader.load()

In [9]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In [13]:
# I have been using ChromaDB so far, but it turns out there is also FAISS...
#!pip3 install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0.post1-cp311-cp311-macosx_11_0_arm64.whl (6.0 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.8.0.post1

[notice] A new release of pip is available: 23.1.2 -> 24.2
[notice] To update, run: pip3 install --upgrade pip


In [15]:
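# RecursiveCharacterTextSplitter splits text on a list of separators and measures chunk size in characters.
# SpacyTextSplitter splits text on sentence boundaries detected by spaCy.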

from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the documents into chunks, embed them, and store the embeddings in FAISS
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector_store = FAISS.from_documents(documents, embeddings)
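
To make the splitting step concrete, here is a minimal pure-Python sketch of the idea behind recursive character splitting: try the coarsest separator first, merge adjacent pieces while they still fit, and fall back to finer separators only for pieces that are too long. The function name `recursive_split`, the separator list, and the chunk size of 40 are illustrative assumptions. This is not LangChain's implementation, which among other things also supports chunk overlap.

```python
def recursive_split(text, chunk_size=40, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters (hypothetical helper)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = (
    "LangSmith helps with tracing.\n\n"
    "It also supports datasets and evaluation of LLM applications."
)
chunks = recursive_split(sample)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

The first paragraph fits in one chunk and is kept whole; the second is too long, so it is broken at word boundaries instead.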

In [21]:
# Try searching the vector store
# Reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.faiss.FAISS.html
results = vector_store.similarity_search(query="generative ai", k=5)
for doc in results:
    print(f"* {doc.page_content[:200]} [{doc.metadata}]")

* Skip to main contentGo to API DocsSearchRegionUSEUGo to AppQuick StartUser GuideTracingEvaluationProduction Monitoring & AutomationsPrompt HubProxyPricingSelf-HostingCookbookThis is outdated documenta [{'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | 🦜️🛠️ LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.', 'language': 'en'}]
* applications are multi-turn, meaning that they involve a series of interactions between the user and the application. LangSmith provides a threads view that groups traces from a single conversation to [{'source': 'https://docs.smith.langchain.c
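
As a rough mental model of what a vector-store similarity search does, the sketch below ranks a handful of stored vectors by their distance to a query vector and returns the `k` closest texts. The three-dimensional vectors, the texts, and the helper name `nearest_texts` are made up for illustration, and Euclidean distance is an assumed metric; a real store embeds the query with the embedding model and uses an optimized index rather than a full sort.

```python
import math

# Toy 3-dimensional "embeddings"; real ones come from the embedding model.
index = {
    "LangSmith supports tracing": [0.9, 0.1, 0.0],
    "Datasets can be used for testing": [0.1, 0.9, 0.1],
    "Pricing and self-hosting": [0.0, 0.2, 0.9],
}


def nearest_texts(query_vector, k=2):
    """Return the k texts whose vectors are closest to query_vector (hypothetical helper)."""
    ranked = sorted(index, key=lambda text: math.dist(index[text], query_vector))
    return ranked[:k]


top = nearest_texts([0.2, 0.8, 0.0], k=2)
print(top)
```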

In [24]:
# LLM

from langchain_openai import ChatOpenAI
llm = ChatOpenAI()

In [25]:
# Prompt template

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)
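
The "stuff" strategy can be pictured as plain string formatting: all retrieved document texts are concatenated and placed into the `{context}` slot of the prompt, and the question goes into `{input}`. The sketch below reproduces that idea without any LLM call. The helper name `stuff_prompt`, the sample texts, and the blank-line separator are assumptions for illustration, not the library's internals.

```python
TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def stuff_prompt(doc_texts, question, separator="\n\n"):
    """Join all document texts into one context block and fill the template (hypothetical helper)."""
    context = separator.join(doc_texts)
    return TEMPLATE.format(context=context, input=question)


prompt_text = stuff_prompt(
    ["LangSmith lets you create datasets.", "LangSmith can run tests on LLM applications."],
    "how can langsmith help with testing?",
)
print(prompt_text)
```

Because every document lands in a single prompt, this approach only works while the combined text fits in the model's context window.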

In [31]:
# Retrieval chain
from langchain.chains import create_retrieval_chain

retriever = vector_store.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)
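
Conceptually, the retrieval chain wires two steps together: fetch documents relevant to the `input`, then hand them to the document chain as `context` and return the result under `answer`. The sketch below mimics that flow with stand-in functions. `make_retrieval_chain`, `fake_retrieve`, and `fake_answer` are hypothetical names, and no vector store or LLM is involved.

```python
def make_retrieval_chain(retrieve, answer_from_docs):
    """Compose a retriever and a document chain into one callable (hypothetical helper)."""
    def invoke(inputs):
        question = inputs["input"]
        docs = retrieve(question)  # step 1: fetch relevant documents
        answer = answer_from_docs({"input": question, "context": docs})  # step 2: answer from them
        return {"input": question, "context": docs, "answer": answer}
    return invoke


# Stand-ins for the real retriever and the LLM-backed document chain.
def fake_retrieve(query):
    corpus = ["LangSmith supports datasets for testing.", "LangSmith has a pricing page."]
    return [text for text in corpus if "testing" in text]


def fake_answer(payload):
    return f"Based on {len(payload['context'])} document(s): " + " ".join(payload["context"])


chain = make_retrieval_chain(fake_retrieve, fake_answer)
result = chain({"input": "how can langsmith help with testing?"})
print(result["answer"])
```

Keeping the retrieved documents in the result alongside the answer makes it easy to inspect which sources the answer was based on.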

In [32]:
response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(response["answer"])

# LangSmith offers several features that can help with testing:...

LangSmith can help with testing by allowing developers to create datasets, run tests on LLM applications, upload test cases in bulk, create custom evaluations, compare different configurations, use a playground environment for rapid iteration and experimentation, conduct beta testing to collect real-world performance data, capture feedback from users, annotate traces for evaluation criteria, add runs to datasets for refining and improving performance, closely inspect key data points, monitor key metrics over time, perform A/B testing, set up automations for real-time actions on traces, and group traces from multi-turn conversations for easier tracking and annotation.
