# Retrieval

Retrieval Augmented Generation (RAG) is the process of including data in the prompt to the LLM that was not part of the language model's training data. The overall vector store process looks like:

![flow](https://python.langchain.com/assets/images/data_connection-95ff2033a8faa5f3ba41376c0f6dd32a.jpg)

The components of the vector store RAG process:
1. Document Loaders
2. Document Transformers
3. Text Embedding Models
4. Vector Stores
5. Retrievers

## Document Loaders

These load documents from many different sources. The simplest loader is `TextLoader`, which loads an entire file as a single document. The others shown here are loaders you will commonly need:

In [5]:
from langchain.document_loaders import (
  TextLoader, 
  DirectoryLoader, 
  UnstructuredHTMLLoader, 
  JSONLoader,
  UnstructuredMarkdownLoader,
  PyPDFLoader,
  AsyncHtmlLoader,
  WebBaseLoader
)
from langchain.document_loaders.csv_loader import CSVLoader

In [None]:
TextLoader("./data/data.txt").load()

In [None]:
CSVLoader("./data/data.csv").load()

In [None]:
DirectoryLoader("./data/", glob="*.txt", loader_cls=TextLoader).load() # you can pick the loader class

In [None]:
UnstructuredHTMLLoader("./data/data.html").load() # will strip markup and load just the text

In [None]:
JSONLoader(file_path="./data/data.json", jq_schema=".data[].name").load() # returns text not json
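To make the `jq_schema` expression above concrete, here is a minimal sketch in plain Python of what `.data[].name` selects. The JSON payload is hypothetical, since the real `./data/data.json` is not shown in this notebook.

```python
import json

# Hypothetical file contents; the real ./data/data.json is not shown here.
raw = '{"data": [{"name": "apple", "price": 1}, {"name": "pear", "price": 2}]}'

payload = json.loads(raw)

# `.data[].name` reads as: take the "data" array, iterate it, pick each "name".
names = [item["name"] for item in payload["data"]]
print(names)
```

Each selected value becomes the text of its own document, which is why the loader returns text rather than JSON.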

In [None]:
UnstructuredMarkdownLoader("./data/data.md").load()

In [None]:
PyPDFLoader("./data/data.pdf").load()

In [None]:
PyPDFLoader("./data/data_long.pdf").load_and_split() # uses RecursiveCharacterTextSplitter to split the document

In [None]:
url = "https://shop.deere.com/us/product/Sherpa-Full-Zip-Jacket/p/SCUALC0593"
loader = AsyncHtmlLoader([url])
loader.load()

In [None]:
loader = WebBaseLoader(url) # fetches the page and extracts its text (roughly AsyncHtmlLoader plus an HTML-to-text step)
docs = loader.load() # there is also a loader.aload() for asynchronous loading
print(docs[0].metadata)
print(docs[0])

## Document Transformers

After loading your documents, you will often want to transform them. The most common transformation is splitting the document into smaller chunks. Here are some common transformations:

1. Text splitting
2. Content Transformation
3. Extract Metadata

In [6]:
from typing import Any, Sequence
from doctran import Doctran
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter
from langchain.document_transformers.openai_functions import create_metadata_tagger
from langchain.schema import Document
from langchain.llms.openai import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import AnalyzeDocumentChain
from langchain_core.documents import BaseDocumentTransformer
from langchain.document_transformers import (
  Html2TextTransformer, 
  BeautifulSoupTransformer, 
  LongContextReorder
)

### Text Splitting

The most common text splitter is `RecursiveCharacterTextSplitter`, which splits larger documents into smaller ones. There are a few things to keep in mind when deciding how big each document (chunk) should be:

1. You may want to have small documents, so that their embeddings can most accurately reflect their meaning. If too long, then the embeddings can lose meaning.
2. You want to have long enough documents that the context of each chunk is retained.

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
  chunk_size=100, # maximum number of characters in a chunk (default: 4000)
  chunk_overlap=20 # overlap between chunks to maintain context in each chunk (default: 200)
)
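To see how `chunk_size` and `chunk_overlap` interact, here is a simplified sketch that cuts fixed-width character windows. It is only an illustration: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, which this hypothetical `chunk_text` helper does not attempt.

```python
from typing import List

def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Fixed-width character chunks where neighbours share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```

A larger overlap repeats more text across neighbouring chunks, which preserves context at the cost of storing and embedding more characters.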

In [None]:
text_splitter.split_documents(TextLoader("./data/sotu.txt").load())[:4]

In [None]:
# can also be used directly on text
RecursiveCharacterTextSplitter(chunk_size=3, chunk_overlap=1).split_text("this is some text 1 2 3 4")

`CharacterTextSplitter` is a simpler splitter that splits on a single separator instead of trying multiple separators:

In [None]:
CharacterTextSplitter(
  separator="#ENTRY",
  chunk_size=10, # you will notice that the entries are still in their own document 
  chunk_overlap=5
).split_documents(TextLoader("./data/data_split.txt").load())

### Content Transformation

#### HTML to Text

In [None]:
url = "https://shop.deere.com/us/product/Sherpa-Full-Zip-Jacket/p/SCUALC0593"
loader = AsyncHtmlLoader(url)
docs = loader.load()
Html2TextTransformer().transform_documents(docs)

In [None]:
loader = AsyncHtmlLoader(url)
docs = loader.load()
BeautifulSoupTransformer().transform_documents(
  docs,
  tags_to_extract=["main"]
)

#### Summarize

There are a few ways to summarize multiple documents into one:

1. `stuff`: All the documents will be put together and summarized by the LLM.
2. `map_reduce`: Each document will be summarized by the LLM, and then the LLM will summarize the summaries.
3. `refine`: Will iterate over each document refining the summary until there are no more documents.

In [None]:
sherpa = "https://shop.deere.com/us/product/Sherpa-Full-Zip-Jacket/p/SCUALC0593"
puffer = "https://shop.deere.com/us/product/Puffer-Jacket/p/SCUFLC0082"
loader = AsyncHtmlLoader([sherpa, puffer])
docs = loader.load()
transformed_docs = BeautifulSoupTransformer().transform_documents(
  docs,
  tags_to_extract=["main"]
)
llm = ChatOpenAI(temperature=0)

In [None]:
chain = load_summarize_chain(llm, chain_type="stuff")
chain.run(transformed_docs)

In [None]:
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(transformed_docs)

In [None]:
chain = load_summarize_chain(llm, chain_type="refine")
chain.run(transformed_docs)
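The three strategies differ mainly in how they call the LLM. The sketch below mimics their control flow in plain Python, with a hypothetical `make_fake_llm` stand-in that just keeps the first word of every line and records each call. It is a simplification under those assumptions, not LangChain's implementation (for example, the real `refine` chain uses separate initial and refine prompts).

```python
from typing import Callable, List, Tuple

Summarizer = Callable[[str], str]

def stuff(docs: List[str], summarize: Summarizer) -> str:
    # One call over all documents joined together.
    return summarize("\n\n".join(docs))

def map_reduce(docs: List[str], summarize: Summarizer) -> str:
    # Summarize each document on its own, then summarize the summaries.
    partial = [summarize(d) for d in docs]
    return summarize("\n\n".join(partial))

def refine(docs: List[str], summarize: Summarizer) -> str:
    # Carry a running summary forward and fold in one document at a time.
    summary = ""
    for d in docs:
        summary = summarize((summary + "\n\n" + d).strip())
    return summary

def make_fake_llm() -> Tuple[Summarizer, List[str]]:
    """Hypothetical stand-in for an LLM: keeps the first word of every non-empty line."""
    calls: List[str] = []

    def summarize(text: str) -> str:
        calls.append(text)  # record each "LLM call" so the strategies can be compared
        return " ".join(line.split()[0] for line in text.splitlines() if line.strip())

    return summarize, calls

docs = ["alpha beta gamma delta", "epsilon zeta eta theta"]
for strategy in (stuff, map_reduce, refine):
    summarize, calls = make_fake_llm()
    print(strategy.__name__, "->", strategy(docs, summarize), f"({len(calls)} calls)")
```

With N documents, `stuff` makes one call with a large prompt, `map_reduce` makes N + 1 smaller calls (the first N can run in parallel), and `refine` makes N sequential calls.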

If you need more control over the summary prompt, you can provide your own like this:

In [None]:
summary_template = """
Write a concise summary of the following, but DO NOT explicitly say it is a summary:
"{text}"

SUMMARY:
"""
prompt = PromptTemplate.from_template(summary_template)
stuff_chain = load_summarize_chain(llm, "stuff", prompt=prompt)
stuff_chain.run(transformed_docs)

OR

In [None]:
summary_template = """
Write a concise summary of the following, but DO NOT explicitly say it is a summary:
"{text}"

SUMMARY:
"""
prompt = PromptTemplate.from_template(summary_template)
summary_chain = LLMChain(llm=llm, prompt=prompt)
stuff_chain = StuffDocumentsChain(llm_chain=summary_chain, document_variable_name="text")
stuff_chain.run(transformed_docs)

There is also a special chain that can summarize some arbitrary text:

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100,chunk_overlap=20)
chain = load_summarize_chain(llm, chain_type="stuff")
analyze_chain = AnalyzeDocumentChain(combine_docs_chain=chain, text_splitter=text_splitter)
analyze_chain.run(transformed_docs[0].page_content)

There is also a library called `doctran` (short for Document Transformations) that uses OpenAI to analyze documents and perform special tasks, one of which is summarization. Currently, LangChain does not offer a wrapper for this doctran task. Here is how it is done:

In [22]:
llm = OpenAI(max_tokens=1000)
output = llm("Provide details about George Washington's life.")
len(output)

2394

In [23]:
doctran = Doctran()
doc = doctran.parse(content=output)
doc_summary = doc.summarize(token_limit=500).execute()

In [24]:
summary = doc_summary.transformed_content
len(summary)

1004

We can also create our own custom document transformer that wraps `doctran`:

In [28]:
class SummarizeTransformer(BaseDocumentTransformer):
  def transform_documents(
    self, 
    documents: Sequence[Document], 
    token_limit: int = 200, 
    **kwargs: Any
  ) -> Sequence[Document]:
    doctran = Doctran()
    transformed_docs = []
    for doc in documents:
      doc_summary = doctran.parse(content=doc.page_content).summarize(token_limit=token_limit).execute()
      new_doc = Document(page_content=doc_summary.transformed_content)
      transformed_docs.append(new_doc)
    return transformed_docs

In [None]:
SummarizeTransformer().transform_documents([Document(page_content=output)])

### Reordering (Lost-in-the-Middle)

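If there is too much context, some of it can get lost during inference, particularly content in the middle of the prompt. There are a couple of ways to deal with that. One is to return less context. Another is to reorder the context so the most relevant documents sit at the beginning and end, which is what `LongContextReorder` does: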

In [72]:
texts = [
  "Call of the Wild (COTW) is a great, vast hunting game.",
  "Way of the Hunter (WOTH) is a modern and good looking hunting game.",
  "Metallica is a great heavy metal band.",
  "VSCode is the best text editor for programming.",
  "God of War is a great single player game."
]

docs = [Document(page_content=x) for x in texts]

In [73]:
docs

[Document(page_content='Call of the Wild (COTW) is a great, vast hunting game.'),
 Document(page_content='Way of the Hunter (WOTH) is a modern and good looking hunting game.'),
 Document(page_content='Metallica is a great heavy metal band.'),
 Document(page_content='VSCode is the best text editor for programming.'),
 Document(page_content='God of War is a great single player game.')]

In [74]:
LongContextReorder().transform_documents(docs)

[Document(page_content='Call of the Wild (COTW) is a great, vast hunting game.'),
 Document(page_content='Metallica is a great heavy metal band.'),
 Document(page_content='God of War is a great single player game.'),
 Document(page_content='VSCode is the best text editor for programming.'),
 Document(page_content='Way of the Hunter (WOTH) is a modern and good looking hunting game.')]
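The ordering in the output above can be reproduced with a few lines of plain Python. This is a sketch that assumes the input is sorted from most to least relevant; the function name `reorder_for_long_context` and the shortened titles are illustrative only.

```python
from typing import List, Sequence

def reorder_for_long_context(docs: Sequence[str]) -> List[str]:
    """Place the highest-ranked items at both ends and the lowest-ranked in the middle.

    Assumes `docs` is sorted from most to least relevant.
    """
    reordered: List[str] = []
    # Walk from least to most relevant, alternating between front and back.
    for i, doc in enumerate(reversed(docs)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered

ranked = ["COTW", "WOTH", "Metallica", "VSCode", "God of War"]
print(reorder_for_long_context(ranked))
```

The two best-ranked items end up first and last, where models tend to attend most reliably, and the weakest item lands in the middle.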

### Extract Metadata

Searching for relevant documents can be greatly improved with the proper metadata associated with the text itself. There are a couple ways that you can add metadata with LangChain.

The first way is to use `OpenAI` functions, which try to extract the metadata for you:

In [32]:
# you can define the schema with a dict or a pydantic model
schema = {
  "properties": {
      "category": {
        "type": "string",
        "description": "the high-level category of the food being discussed like 'fruit' or 'vegetable' or snack"
      },
      "tone": {
        "type": "string",
        "enum": ["positive", "negative"],
        "description": "the tone of the text like 'positive' or 'negative'"
      }
  },
  "required": ["category", "tone"],
}
llm = ChatOpenAI()
document_transformer = create_metadata_tagger(llm=llm, metadata_schema=schema) # can provide prompt if you want

In [33]:
org_documents = [
  Document(page_content="The apples I ate were delicious."),
  Document(page_content="The brocolli was bad."),
]
enhanced_documents = document_transformer.transform_documents(org_documents)

In [34]:
enhanced_documents

[Document(page_content='The apples I ate were delicious.', metadata={'category': 'fruit', 'tone': 'positive'}),
 Document(page_content='The brocolli was bad.', metadata={'category': 'vegetable', 'tone': 'negative'})]

## Vector Store of Embeddings

It is very common for unstructured text to be stored as embeddings in a vector database, since this makes saving, searching, and retrieving text much easier. This vector store then becomes the primary data source for RAG. It is not the only source of data, but it is a common one.

### Text Embedding Models

These models produce numerical embeddings of text. The purpose is to make retrieval fast by searching in a much simpler vector space, which is a compressed representation of the text. There are a number of text embedding models that can be used. Here are a few:

In [7]:
from langchain.embeddings import OpenAIEmbeddings, HuggingFaceBgeEmbeddings
from langchain.vectorstores.chroma import Chroma

In [4]:
openai_model = OpenAIEmbeddings()

In [None]:
hug_model = HuggingFaceBgeEmbeddings(
  model_name="BAAI/bge-small-en",
  model_kwargs={"device": "cpu"},
  encode_kwargs={"normalize_embeddings": True}
)

In [13]:
embeddings = openai_model.embed_documents([
  "this is a test",
  "another test"
])
print(len(embeddings), len(embeddings[0]))

2 1536


In [20]:
embedding = openai_model.embed_query("this is a test") # single string
len(embedding)

1536

In [19]:
embedding = hug_model.embed_query("this is a test")
len(embedding)

384
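Once text is embedded, relevance is commonly measured by comparing vectors, for example with cosine similarity. The sketch below uses hypothetical 3-dimensional vectors in place of real embeddings, which have hundreds or thousands of dimensions as shown above.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", not produced by any real model.
query = [1.0, 0.0, 1.0]
close = [0.9, 0.1, 0.8]
far = [0.0, 1.0, 0.0]

print(cosine_similarity(query, close), cosine_similarity(query, far))
```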

### Vector Stores

There are a number of possible vector stores, but a few common local ones are `Chroma` and `FAISS`. A common online option is `Pinecone`. All of them implement the `VectorStore` interface. Let's walk through an example using `Chroma`:

In [40]:
docs = [Document(page_content="Michael Jordan is the best basketball player of all time.")]
chroma = Chroma(embedding_function=openai_model)

In [41]:
record_ids = chroma.add_documents(docs)

In [42]:
chroma.get(record_ids) # has embedding but not returned in response by default

{'ids': ['816fd5d2-9abd-11ee-aa64-825fc74ed04f'],
 'embeddings': None,
 'metadatas': [None],
 'documents': ['Michael Jordan is the best basketball player of all time.'],
 'uris': None,
 'data': None}

In [43]:
chroma.similarity_search("jordan", k=1)

[Document(page_content='Michael Jordan is the best basketball player of all time.')]

In [44]:
chroma.similarity_search_with_relevance_scores("jordan", k=1)

[(Document(page_content='Michael Jordan is the best basketball player of all time.'),
  0.7929736264432757)]

In [52]:
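# maximal marginal relevance (MMR) balances similarity to the query with diversity among the results
# note: this cell was run after the update cell below (In [48]), so the output shows the updated document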
chroma.max_marginal_relevance_search("jordan", k=1)

Number of requested results 20 is greater than number of elements in index 1, updating n_results = 1


[Document(page_content='Michael Jordan is one of the best basketball player of all time.', metadata={'type': 'quote'})]
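With a single document in the store, MMR has nothing to diversify. The sketch below shows the usual greedy MMR selection on hypothetical 3-dimensional vectors, where a near-duplicate is skipped in favour of a less similar but more novel document. The function `mmr` and the `lambda_mult` weighting are written from the standard definition, not copied from Chroma or LangChain.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]],
        k: int, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick k indices, trading off query relevance against redundancy."""
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the similarity to the closest already-selected document.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical vectors: doc 0 and doc 1 are near-duplicates, doc 2 is different.
query_vec = [1.0, 0.9, 0.0]
doc_vecs = [[1.0, 0.0, 0.0], [1.0, 0.05, 0.0], [0.0, 1.0, 0.0]]

print(mmr(query_vec, doc_vecs, k=2))                   # balanced relevance and diversity
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # relevance only
```

Setting `lambda_mult` closer to 1 favours relevance, while values closer to 0 favour diversity.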

In [48]:
# can update one or more documents
# for some reason requires metadata in document
chroma.update_document(
  record_ids[0], 
  Document(page_content="Michael Jordan is one of the best basketball player of all time.", metadata={"type": "quote"})
)
chroma.get(record_ids[0])

{'ids': ['816fd5d2-9abd-11ee-aa64-825fc74ed04f'],
 'embeddings': None,
 'metadatas': [{'type': 'quote'}],
 'documents': ['Michael Jordan is one of the best basketball player of all time.'],
 'uris': None,
 'data': None}

In [None]:
# it is also possible to create the VectorStore client and add documents at the same time
Chroma.from_texts(["this is a test"], openai_model) # from_documents also available

In [54]:
chroma.delete_collection()

## Retrievers

Retrievers are able to query and retrieve documents. Often a retriever uses a `VectorStore`, but not always. Moreover, retrievers are `Runnable`, which means they can be used as part of a `Chain`.

Once we have the data loaded, transformed, and saved into a VectorStore, then we can retrieve relevant documents:

In [8]:
import logging, uuid, json
from dotenv import load_dotenv
from langchain_core.runnables import RunnablePassthrough
from langchain.prompts import ChatPromptTemplate
from langchain.storage import InMemoryStore
from langchain.schema import StrOutputParser, BaseOutputParser
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.retrievers.document_compressors import (
  LLMChainExtractor, 
  LLMChainFilter, 
  EmbeddingsFilter,
  CohereRerank
)
from langchain.retrievers import (
  BM25Retriever, 
  EnsembleRetriever, 
  MultiQueryRetriever,
  ContextualCompressionRetriever,
  ParentDocumentRetriever,
  MultiVectorRetriever
)

In [19]:
load_dotenv()

True

In [14]:
# load
docs = TextLoader("./data/sotu.txt").load()
# transform
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
docs = splitter.split_documents(docs)
# embed and save
chroma = Chroma.from_documents(docs, openai_model)

In [84]:
# retrieve directly
retriever = chroma.as_retriever(search_kwargs = {"k": 2}) # top 2 sources
matching_docs = retriever.invoke("What did the president say about the economy?")

We can also retrieve the relevant documents as part of a chain, injecting the context into the prompt:

In [None]:
# retrieve as part of a chain
prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context:

{context}                                         

Question: {question}                                          
""")

def format_docs(docs: Sequence[Document]) -> str:
  return "\n\n".join([x.page_content for x in docs])

llm = ChatOpenAI()
chain = (
  {"context": retriever | format_docs, "question": RunnablePassthrough()}
  | prompt | llm
)
chain.invoke("What did the president say about the economy?")

### Multi-query Retriever

Instead of relying directly on the original query to retrieve the results, it can be beneficial to ask slightly different queries. The `MultiQueryRetriever` will provide that by doing multiple retrieval calls with subtle query changes and joining them all together.

In [20]:
llm = ChatOpenAI()
chroma = Chroma(embedding_function=openai_model).as_retriever()
retriever = MultiQueryRetriever.from_llm(chroma, llm)

In [None]:
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
retriever.get_relevant_documents("What did the president say about the economy?")

Original Query: "What did the president say about the economy?"

Variations:
1. Can you provide any information on the president's statements regarding the economy?
2. What are the president's comments about the current state of the economy?
3. I'm interested in knowing the president's views on the economy. Could you share any details?
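The joining step can be sketched in plain Python: run each query variation, then keep the unique union of the results. The corpus and the `keyword_retrieve` function below are hypothetical stand-ins for a real vector store retriever.

```python
from typing import Callable, Iterable, List

def multi_query_retrieve(queries: Iterable[str],
                         retrieve: Callable[[str], List[str]]) -> List[str]:
    """Run every query variant and return the de-duplicated union in first-seen order."""
    seen = set()
    merged: List[str] = []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

# Hypothetical corpus and a toy keyword retriever standing in for a vector store.
corpus = [
    "The economy added jobs this year.",
    "Inflation is a concern for families.",
    "We will secure the border.",
]

def tokens(text: str) -> set:
    return {w.strip("?.,").lower() for w in text.split()}

def keyword_retrieve(query: str) -> List[str]:
    return [d for d in corpus if tokens(query) & tokens(d)]

variants = ["economy jobs", "inflation families", "economy"]
merged_docs = multi_query_retrieve(variants, keyword_retrieve)
print(merged_docs)
```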

### Contextual Compression

Sometimes the documents retrieved from a vector store contain information that is unrelated to the query, or a document itself may not be relevant at all. Compression refers to the process of keeping only the relevant documents and the relevant parts of each document. This is often needed since LLMs have input size limits.

First, let's look at the common compressor `LLMChainExtractor`, which extracts the important parts of each document:

In [6]:
content = [
  "We’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees.",
   "Let’s pass the Paycheck Fairness Act and paid leave."
]
docs = [Document(page_content=x) for x in content]

In [None]:
compressor = LLMChainExtractor.from_llm(llm)
compressed_docs = compressor.compress_documents(docs, "fairness act")

In [16]:
len(compressed_docs)

2

A slightly different version of the compressor is the `LLMChainFilter`, which will remove the documents that are not important to the query:

In [None]:
compressor = LLMChainFilter.from_llm(llm)
compressed_docs = compressor.compress_documents(docs, "paid leave")

In [14]:
len(compressed_docs)

1

Thus far we have seen compressors that rely upon the LLM to perform the work. There is also the `EmbeddingsFilter` which will use the embedding model to compare the documents and remove those that fall under a threshold. This approach is much faster than using the LLM:

In [115]:
compressor = EmbeddingsFilter(embeddings=openai_model, similarity_threshold=0.80)
compressed_docs = compressor.compress_documents(docs, "fairness act")
len(compressed_docs)

1
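Conceptually, an embeddings-based filter compares the query vector with each document vector and drops documents below the threshold. The sketch below assumes cosine similarity and uses made-up 2-dimensional vectors for shortened versions of the two documents, so the numbers are illustrative only.

```python
import math
from typing import Dict, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filter_by_similarity(query_vec: Sequence[float],
                         doc_vecs: Dict[str, Sequence[float]],
                         threshold: float) -> List[str]:
    """Keep only documents whose similarity to the query is at or above the threshold."""
    return [text for text, vec in doc_vecs.items() if cosine(query_vec, vec) >= threshold]

# Hypothetical vectors; a real embedding model would produce these.
query_vec = [1.0, 0.0]
doc_vecs = {
    "Let's pass the Paycheck Fairness Act and paid leave.": [0.9, 0.3],
    "We'll also cut costs and keep the economy going strong.": [0.3, 0.9],
}

kept = filter_by_similarity(query_vec, doc_vecs, threshold=0.80)
print(kept)
```

No LLM call is involved, only vector arithmetic, which is where the speed advantage comes from.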

These methods can produce good results, but the best way to compress is usually to rerank the documents using a dedicated reranking model, like the one provided by `Cohere` or an open source model. This is faster than using an LLM, and it maintains full semantic context, unlike embeddings, which lose fidelity during the embedding process. Here is how to do it with Cohere:

In [None]:
rerank = CohereRerank()
rerank.compress_documents(docs, "fairness act")

Now, let's combine the compressor with a retriever to see how it works. This is done using the `ContextualCompressionRetriever`:

In [None]:
chroma = Chroma(embedding_function=openai_model).as_retriever()
retriever = ContextualCompressionRetriever(base_compressor=rerank, base_retriever=chroma)
retriever.get_relevant_documents("What did the president say about the economy?")

### Ensemble Retriever

At a high level, this retriever simply uses multiple retrievers, combines their results, reranks them using Reciprocal Rank Fusion, and returns a single list of documents. In practice, this approach is normally used with diverse types of retrievers. For example, one can use a sparse retriever like OpenSearch or any BM25 algorithm together with a dense retriever like the vector stores we have been using. This type of strategy is called a "hybrid search". We will use BM25 and Chroma as our retrievers in this example.

> Note: The Reciprocal Rank Fusion (RRF) algorithm is a simple but effective way to combine lists from multiple sources, where each source has its own way of scoring. In other words, RRF creates a new scoring strategy by combining disparate scores.  

In [22]:
docs = [
  "I like apples",
  "I like oranges",
  "Apples and oranges are fruits"
]
bm25 = BM25Retriever.from_texts(docs)
bm25.k = 2
chroma = Chroma.from_texts(docs, openai_model).as_retriever(search_kwargs = {"k": 2})

In [23]:
retriever = EnsembleRetriever(
  retrievers=[bm25, chroma],
  weights=[0.5, 0.5]
)

In [24]:
retriever.get_relevant_documents("apples")

[Document(page_content='I like apples'),
 Document(page_content='Apples and oranges are fruits')]
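The Reciprocal Rank Fusion step mentioned in the note above can be sketched as follows. Each retriever contributes `weight / (rank + c)` for every document it returns, and the contributions are summed. The constant `c = 60` is a commonly used default and is an assumption here, as are the two example rankings.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           weights: Sequence[float],
                           c: int = 60) -> List[str]:
    """Fuse several ranked lists into one using weighted reciprocal ranks."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)

# Hypothetical rankings from a sparse and a dense retriever.
sparse_ranking = ["a", "b", "c"]
dense_ranking = ["b", "c", "d"]

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Because only ranks are used, the retrievers' raw scores never need to be on the same scale, and a document found by both retrievers tends to rise above one found by only a single retriever.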

### Parent Document Retriever

When storing documents into a vector store, you will usually have multiple smaller documents (chunks). This makes search faster and more effective; however, during retrieval, you may want to have the original document the chunk originated from. That is what the `ParentDocumentRetriever` is used for. Keep in mind, if you intend on retrieving the original documents, then you need to save them using this retriever also.

In [68]:
# from a single document, two documents get created due to the splitting
splitter = CharacterTextSplitter(
  separator="#ENTRY",
  chunk_size=10,
  chunk_overlap=5
)
docs = TextLoader("./data/data_split.txt").load()

In [69]:
len(docs[0].page_content) # original document size

144

In [70]:
# remove collection if exists already
Chroma(collection_name="full_documents").delete_collection()

In [71]:
chroma = Chroma(collection_name="full_documents", embedding_function=openai_model)
store = InMemoryStore()
retriever = ParentDocumentRetriever(
  vectorstore=chroma, 
  docstore=store, 
  child_splitter=splitter,
  search_kwargs={"k": 2}
)

In [None]:
# retriever splits and stores with document metadata
retriever.add_documents(docs)

In [73]:
list(store.yield_keys()) # only one source document

['87e834e7-be05-4ca1-a684-2ba26189c4d8']

In [74]:
docs = chroma.similarity_search("entry", k=2)
print(len(docs)) # result is two documents
docs[0].metadata

2


{'doc_id': '87e834e7-be05-4ca1-a684-2ba26189c4d8',
 'source': './data/data_split.txt'}

In [75]:
# returns one document even though there are two documents in vector store
found_docs = retriever.get_relevant_documents("entry")
(len(found_docs), len(found_docs[0].page_content))

(1, 144)
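The lookup behind this result can be sketched with two plain dictionaries: child chunks carry a `doc_id` in their metadata, and the docstore maps that id back to the full parent. The ids and texts below are hypothetical, and `retrieve_parents` is an illustrative helper rather than the library's code.

```python
from typing import Dict, List

# Hypothetical data: one parent document that was split into two child chunks.
docstore: Dict[str, str] = {
    "parent-1": "#ENTRY first entry text #ENTRY second entry text",
}
child_chunks = [
    {"text": "first entry text", "doc_id": "parent-1"},
    {"text": "second entry text", "doc_id": "parent-1"},
]

def retrieve_parents(matched_children: List[dict], docstore: Dict[str, str]) -> List[str]:
    """Map matched child chunks back to their parents, returning each parent once."""
    parent_ids: List[str] = []
    for child in matched_children:
        if child["doc_id"] not in parent_ids:
            parent_ids.append(child["doc_id"])
    return [docstore[pid] for pid in parent_ids if pid in docstore]

parents = retrieve_parents(child_chunks, docstore)
print(len(child_chunks), len(parents))
```

Both chunks match the search, but they share one `doc_id`, so only a single parent document comes back.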

### Multi-Vector Retriever

We actually just saw a multi-vector retriever: `ParentDocumentRetriever`. In fact, that class extends the `MultiVectorRetriever` class. We'll use this base class to satisfy a couple of use cases that benefit from multiple vectors.

#### Summary
The first use-case is the desire to search the summary of a document, but still return the original document when retrieved.

In [86]:
splitter = RecursiveCharacterTextSplitter(
  chunk_size=200,
  chunk_overlap=10
)
docs = splitter.split_documents(TextLoader("./data/sotu.txt").load())

In [100]:
summary_prompt = ChatPromptTemplate.from_template("Summarize the following document:\n\n{doc}")
chain = (
  {"doc": lambda x: x.page_content}
  | summary_prompt
  | ChatOpenAI(max_retries=0)
  | StrOutputParser()
)

In [101]:
summaries = chain.batch(docs, {"max_concurrency": 5})

In [102]:
print("Org:", len(docs[0].page_content), "Summary:", len(summaries[0]))

Org: 200 Summary: 121


In [117]:
Chroma(collection_name="summaries").delete_collection()

In [118]:
chroma = Chroma(collection_name="summaries", embedding_function=openai_model)
store = InMemoryStore() # storage for parent documents
id_key = "doc_id"
retriever = MultiVectorRetriever(
  vectorstore=chroma, 
  docstore=store, 
  id_key=id_key
)

In [119]:
# give each summary document a unique document ID
doc_ids = [str(uuid.uuid4()) for _ in docs]
summary_docs = [
  Document(page_content=s, metadata={id_key: doc_ids[i]})
  for i, s in enumerate(summaries)
]

In [120]:
vector_ids = retriever.vectorstore.add_documents(summary_docs)
chroma.get(vector_ids[0])

{'ids': ['fdc7e216-9b92-11ee-9647-825fc74ed051'],
 'embeddings': None,
 'metadatas': [{'doc_id': 'c6de8e6b-6323-4ba6-a7ec-a217e5cc9bb9'}],
 'documents': ['On February 4, 2020, President Donald J. Trump delivered a State of the Union address before a joint session of Congress.'],
 'uris': None,
 'data': None}

In [143]:
# this saves the original documents to the docstore, keyed by the doc_id stored in each summary's metadata
doc_ids_to_doc = list(zip(doc_ids, docs))
retriever.docstore.mset(doc_ids_to_doc)

In [141]:
# this is the summarized version of the document
result = chroma.similarity_search("economic")[0]
summary_doc_id = result.metadata["doc_id"]
(result.metadata, len(result.page_content))

({'doc_id': '39f921df-7703-4d09-872a-7b17d2632758'}, 937)

In [142]:
# this is the original document before summarizing
result = retriever.get_relevant_documents("economic")[0]
(summary_doc_id, result.metadata, len(result.page_content))

('39f921df-7703-4d09-872a-7b17d2632758', {'source': './data/sotu.txt'}, 199)

#### Hypothetical Queries

An LLM chat conversation is usually a series of questions from the human and answers from the LLM. Therefore, it can be beneficial to understand what types of questions the document in a RAG setup can answer. In other words, during RAG we want to search for relevant documents based off of the potential questions the document can answer. That is what this example use-case of `MultiVectorRetriever` will demonstrate:

In [178]:
class JsonOutputParser(BaseOutputParser):
  def parse(self, text: str) -> list:
    try:
      return json.loads(text)
    except json.JSONDecodeError:
      # fall back to an empty list when the LLM response is not valid JSON
      return []

In [180]:
question_template = """
Respond in JSON format. The response should only be the array and should not be an object. 

Here is an example response:
["Question 1?", "Question 2?", "Question 3?"]

Generate a JSON list of exactly 3 hypothetical questions that the below document could be used to answer:
{doc}
"""
questions_prompt = ChatPromptTemplate.from_template(question_template)
chain = (
    {"doc": lambda x: x.page_content}
    | questions_prompt
    | ChatOpenAI(max_retries=0, model="gpt-3.5-turbo")
    | JsonOutputParser()
)

In [181]:
chain.invoke(docs[0])

["What was the date of Donald J. Trump's State of the Union address in 2020?",
 "What is the title of Donald J. Trump's speech before a Joint Session of Congress in 2020?",
 'How did Donald J. Trump open his speech before a Joint Session of Congress in 2020?']

In [183]:
questions = chain.batch(docs, {"max_concurrency": 5})

In [185]:
(len(docs), len(questions), 3 * len(docs))

(279, 279, 837)

In [182]:
Chroma(collection_name="questions").delete_collection()

In [186]:
chroma = Chroma(collection_name="questions", embedding_function=openai_model)
store = InMemoryStore()
id_key = "doc_id"
retriever = MultiVectorRetriever(
  vectorstore=chroma, 
  docstore=store, 
  id_key=id_key
)

In [187]:
doc_ids = [str(uuid.uuid4()) for _ in docs]
# every question becomes its own document associated with the original document
question_docs = []
for i, question_list in enumerate(questions):
  question_docs.extend([
    Document(page_content=q, metadata={id_key: doc_ids[i]})
    for q in question_list
  ])

In [189]:
len(question_docs)

794

In [190]:
retriever.vectorstore.add_documents(question_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))

In [196]:
# shows the question that is actually being matched
result = chroma.similarity_search("economic")[0]
summary_doc_id = result.metadata["doc_id"]
(result.metadata["doc_id"], len(result.page_content), result.page_content)

('f46ebe1a-0af1-463a-90d3-84d3df38572e',
 41,
 'What is the current state of the economy?')

In [194]:
# shows the original document the question was generated from
result = retriever.get_relevant_documents("economic")[0]
(summary_doc_id, result.metadata, len(result.page_content))

('f46ebe1a-0af1-463a-90d3-84d3df38572e', {'source': './data/sotu.txt'}, 189)

### Self-Querying Vector Stores

There is another powerful feature of many vector stores like `Chroma` that we haven't used: metadata. Metadata can be used to filter the vector store and improve search results. You could filter on metadata yourself, but the `SelfQueryRetriever` makes the process easier by having the LLM build the filter from the query. The `SelfQueryRetriever` class requires the package `lark` to work.

In [6]:
# some test documents about movie reviews
docs = [
  Document(
    page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
    metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
  ),
  Document(
    page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
    metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
  ),
  Document(
    page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
    metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
  ),
  Document(
    page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
    metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
  ),
  Document(
    page_content="Toys come alive and have a blast doing so",
    metadata={"year": 1995, "genre": "animated"},
  ),
  Document(
    page_content="Three men walk into the Zone, three men walk out of the Zone",
    metadata={
      "year": 1979,
      "director": "Andrei Tarkovsky",
      "genre": "thriller",
      "rating": 9.9,
    },
  ),
]
chroma = Chroma.from_documents(docs, openai_model)

In [7]:
# descriptions of the metadata fields
metadata_field_info = [
  AttributeInfo(
    name="genre",
    description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
    type="string"
  ),
  AttributeInfo(
    name="year",
    description="The year the movie was released",
    type="integer"
  ),
  AttributeInfo(
    name="director",
    description="The name of the movie director",
    type="string"
  ),
  AttributeInfo(
    name="rating", description="A 1-10 rating for the movie", type="float"
  )
]
document_content_description = "Brief summary of a movie"

In [8]:
llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
  llm,
  chroma,
  document_content_description,
  metadata_field_info
)

In [10]:
# querying only by filter
results = retriever.invoke("I want to watch a movie rated higher than 8.5")
[r.metadata for r in results]

[{'director': 'Andrei Tarkovsky',
  'genre': 'thriller',
  'rating': 9.9,
  'year': 1979},
 {'director': 'Satoshi Kon', 'rating': 8.6, 'year': 2006}]

In [11]:
# querying by document content and filter
results = retriever.invoke("Has Greta Gerwig directed any movies about women")
[r.metadata for r in results]

[{'director': 'Greta Gerwig', 'rating': 8.3, 'year': 2019}]
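To make the filtering step concrete, the sketch below applies a structured filter to the movie metadata from this section. The `(attribute, comparator, value)` form and the `apply_filter` helper are illustrative assumptions, not LangChain's internal representation. In the real retriever the LLM derives the filter from the question and the vector store applies it.

```python
import operator
from typing import Any, Dict, List

# Metadata copied from the example documents above.
movies: List[Dict[str, Any]] = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
    {"year": 1979, "director": "Andrei Tarkovsky", "genre": "thriller", "rating": 9.9},
]

# Hypothetical comparator names mapped to Python operators.
COMPARATORS = {"gt": operator.gt, "lt": operator.lt, "eq": operator.eq}

def apply_filter(items: List[Dict[str, Any]], attribute: str,
                 comparator: str, value: Any) -> List[Dict[str, Any]]:
    """Keep items that have the attribute and satisfy the comparison."""
    compare = COMPARATORS[comparator]
    return [m for m in items if attribute in m and compare(m[attribute], value)]

# "I want to watch a movie rated higher than 8.5"
highly_rated = apply_filter(movies, "rating", "gt", 8.5)
print([m["director"] for m in highly_rated])

# "Has Greta Gerwig directed any movies about women"
by_gerwig = apply_filter(movies, "director", "eq", "Greta Gerwig")
print(by_gerwig)
```

Documents that lack the filtered attribute, such as the animated movie with no rating, are excluded rather than treated as a match.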

### Web Research Retriever

This is a cool but somewhat restrictive retriever that takes a question, asks the LLM to generate 3 questions optimized for search, searches Google, stores the results in a vector store, searches the vector store for the original question, and passes the relevant documents (RAG) along with the original question to the LLM.

This requires the `google-api-python-client` and `html2text` packages since it uses Google Search directly.

> Note: returning the sources does not appear to work; `sources` comes back empty in the result below.
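
The steps described above can be sketched in plain Python without any API keys. This is a toy illustration under stated assumptions, not the retriever's actual implementation: `FAKE_WEB`, `generate_search_queries`, `search`, `tokenize`, and `web_research` are hypothetical names, the "LLM" is a fixed template, the "search engine" is a dictionary of made-up pages, and word overlap stands in for embedding similarity.

```python
from typing import Dict, List

# Tiny in-memory "web": URL -> page text (made-up pages, for illustration only).
FAKE_WEB: Dict[str, str] = {
    "https://example.com/agents": "LLM agents plan tasks, keep memory, and call tools.",
    "https://example.com/cooking": "Slow cooking makes stews tender.",
}


def tokenize(text: str) -> set:
    """Lowercase the words and strip basic punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}


def generate_search_queries(question: str, n: int = 3) -> List[str]:
    """Stand-in for the LLM step that rewrites the question into search queries."""
    templates = ["{q}", "{q} explained", "{q} overview"]
    return [t.format(q=question) for t in templates[:n]]


def search(query: str) -> List[str]:
    """Stand-in for the search engine: URLs whose text shares a word with the query."""
    words = tokenize(query)
    return [url for url, text in FAKE_WEB.items() if words & tokenize(text)]


def web_research(question: str, store: Dict[str, str], k: int = 1) -> List[str]:
    # Steps 1 and 2: generate search queries, then search and load each
    # hit into the store (deduplicated by URL).
    for query in generate_search_queries(question):
        for url in search(query):
            store.setdefault(url, FAKE_WEB[url])
    # Step 3: retrieve from the store using the ORIGINAL question.
    q_words = tokenize(question)
    ranked = sorted(store, key=lambda url: len(q_words & tokenize(store[url])), reverse=True)
    return ranked[:k]


store: Dict[str, str] = {}
top = web_research("How do LLM agents work?", store)
print(top)
```

Note that the store persists across calls, which is why the real cells below clear the `web_research` collection before reusing it.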

In [54]:
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.utilities.google_search import GoogleSearchAPIWrapper
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chains.qa_with_sources import stuff_prompt

In [36]:
openai_model = OpenAIEmbeddings()
Chroma(collection_name="web_research").delete_collection()
chroma = Chroma(embedding_function=openai_model, collection_name="web_research")

In [23]:
llm = ChatOpenAI(temperature=0)
search = GoogleSearchAPIWrapper()

In [38]:
web_research_retriever = WebResearchRetriever.from_llm(
  vectorstore=chroma,
  llm=llm,
  search=search
)

In [None]:
# see how the retriever fetches documents from the web
import logging

user_input = "How do LLM Powered Autonomous Agents work?"
logging.basicConfig()
# temporarily raise the log level to see the generated search queries and fetched URLs
logging.getLogger("langchain.retrievers.web_research").setLevel(logging.INFO)
web_docs = web_research_retriever.get_relevant_documents(user_input)
logging.getLogger("langchain.retrievers.web_research").setLevel(logging.ERROR)

In [59]:
web_docs

[Document(page_content="# Agent System Overview#\n\nIn a LLM-powered autonomous agent system, LLM functions as the agent's brain,\ncomplemented by several key components:\n\n  * **Planning**\n    * Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\n    * Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n  * **Memory**\n    * Short-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\n    * Long-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\n  * **Tool use**\n    * The agent learns to call external APIs for extra information 

In [48]:
# see how retriever stores documents in the vector store
chroma.get(chroma.get()["ids"][0])

{'ids': ['373fb67c-9c97-11ee-87ab-825fc74ed051'],
 'embeddings': None,
 'metadatas': [{'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:',
   'language': 'en',
   'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'title': "LLM Powered Autonomous Agents | Lil'Log"}],
 'documents': ["Lil'Log\n\n  * Posts\n  * Archive\n  * Search\n  * Tags\n  * FAQ\n  * emojisearch.app\n\n#  LLM Powered Autonomous Agents\n\nDate: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng\n\nTable of Contents\n\n  * Agent System

In [60]:
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
  llm=llm,
  retriever=web_research_retriever
)

In [51]:
result = qa_chain({"question": user_input})

In [29]:
result

{'question': 'How do LLM Powered Autonomous Agents work?',
 'answer': "LLM-powered autonomous agents work by using LLM as the agent's brain, complemented by several key components such as planning, memory, and tool use. The planning component involves task decomposition and self-reflection. The memory component includes short-term memory and long-term memory. The tool use component allows the agent to call external APIs for additional information. LLM-powered agents have been used in various applications, including scientific discovery and generative agents simulation. They have the potential to be powerful general problem solvers. \n",
 'sources': ''}
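
The empty `sources` value above can be reproduced without calling any model. The sketch below assumes the chain splits the raw model output on `SOURCES:` and keeps only the first line of what follows, which is the same parsing the debugging cells below apply. The sample text is a shortened copy of the raw output shown there, and `all_sources` is a hypothetical, more lenient alternative rather than anything the library does.

```python
import re

# Shortened copy of the raw model output: the URL sits on the line AFTER "SOURCES:".
combined = (
    "LLM-powered autonomous agents work by using LLM as the agent's brain. \n"
    "SOURCES: \n"
    "- https://lilianweng.github.io/posts/2023-06-23-agent/"
)

# Split into the answer and everything after "SOURCES:".
answer, sources = re.split(r"SOURCES?:|QUESTION:\s", combined, flags=re.IGNORECASE)[:2]

# Keeping only the first line drops the URL, because that line is blank.
first_line = re.split(r"\n", sources)[0].strip()
print(repr(first_line))

# Hypothetical alternative: collect every non-empty line and drop the list bullet.
all_sources = [
    line.strip().lstrip("-").strip()
    for line in sources.splitlines()
    if line.strip()
]
print(all_sources)
```

If this assumption holds, the model is returning the source correctly and the value is lost only because it was formatted as a bulleted list on a new line.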

Debugging what is happening inside of `RetrievalQAWithSourcesChain`:

In [57]:
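from langchain.chains import LLMChain, StuffDocumentsChain

# build the "stuff" chain by hand to inspect the raw LLM output before sources are parsed
stuff_chain = StuffDocumentsChain(
  llm_chain=LLMChain(llm=llm, prompt=stuff_prompt.PROMPT),
  document_variable_name="summaries",
  document_prompt=stuff_prompt.EXAMPLE_PROMPT
)
combined_docs = stuff_chain.run(input_documents=web_docs, question=user_input)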

In [79]:
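import re

# split the raw output into the answer and everything after "SOURCES:"
answer, sources = re.split(r"SOURCES?:|QUESTION:\s", combined_docs, flags=re.IGNORECASE)[:2]
# keep only the first line after "SOURCES:" as the source
source = re.split(r"\n", sources)[0].strip()
(combined_docs, answer, sources, source)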

("LLM-powered autonomous agents work by using LLM as the agent's brain, complemented by several key components such as planning, memory, and tool use. The planning component involves task decomposition and self-reflection. The memory component includes short-term memory and long-term memory. The tool use component allows the agent to call external APIs for additional information. LLM-powered agents have been used in various applications, including scientific discovery and generative agents simulation. They have the potential to be powerful general problem solvers. \nSOURCES: \n- https://lilianweng.github.io/posts/2023-06-23-agent/",
 "LLM-powered autonomous agents work by using LLM as the agent's brain, complemented by several key components such as planning, memory, and tool use. The planning component involves task decomposition and self-reflection. The memory component includes short-term memory and long-term memory. The tool use component allows the agent to call external APIs for 

### Retriever Tool within Agent

What if we want an agent to be able to use RAG as well? We can create a tool that wraps a retriever:

In [82]:
from langchain.agents.agent_toolkits import create_retriever_tool, create_conversational_retrieval_agent

In [86]:
# expose the "web_research" collection as a retriever
retriever = Chroma(embedding_function=openai_model, collection_name="web_research").as_retriever()
# a helper function to create a retriever Tool
tool = create_retriever_tool(retriever, "search_internet", "Will search the internet for relevant documents.")
tools = [tool]

In [83]:
agent = create_conversational_retrieval_agent(llm, tools, verbose=True)

In [None]:
result = agent({"input": "What is an LLM agent?"})

In [None]:
result
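
Conceptually, a retriever tool is a named, described function that takes a query string and returns the retrieved text, so the agent can decide when to call it. The sketch below shows that idea in plain Python. `SimpleTool`, `make_retriever_tool`, `toy_retrieve`, and `DOCS` are hypothetical names; joining documents with a blank line is an assumption of this sketch, and word overlap stands in for a vector search.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical minimal tool: a name, a description, and a string -> string function."""
    name: str
    description: str
    func: Callable[[str], str]


def make_retriever_tool(
    retrieve: Callable[[str], List[str]], name: str, description: str
) -> SimpleTool:
    """Wrap a retrieval function so it returns one string an agent can read."""
    def run(query: str) -> str:
        # Join the retrieved documents into a single block of text.
        return "\n\n".join(retrieve(query))
    return SimpleTool(name=name, description=description, func=run)


# Toy document store and retriever (word overlap instead of embeddings).
DOCS = [
    "An LLM agent uses a language model to plan and call tools.",
    "Chroma is a vector store.",
]


def toy_retrieve(query: str) -> List[str]:
    words = set(query.lower().split())
    return [d for d in DOCS if words & set(d.lower().split())]


demo_tool = make_retriever_tool(
    toy_retrieve, "search_internet", "Will search the internet for relevant documents."
)
print(demo_tool.func("agent"))
```

The agent only sees the tool's name and description when choosing what to call, so the description should say accurately what the underlying collection contains.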