# Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.

Interface:
- Input: A query (string)
- Output: A list of documents (standardized LangChain Document objects)
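
The contract above can be sketched in plain Python using only the standard library. `SimpleDocument` and `KeywordRetriever` are hypothetical stand-ins, not LangChain classes, and word overlap is an assumed scoring rule chosen for brevity. The point is only that a retriever maps a query string to a list of documents and does not need embeddings or its own storage layer.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they contain."""

    def __init__(self, documents):
        self.documents = documents

    def invoke(self, query: str) -> list:
        words = set(query.lower().split())
        scored = []
        for doc in self.documents:
            overlap = len(words & set(doc.page_content.lower().split()))
            if overlap:
                scored.append((overlap, doc))
        # Highest overlap first; the sort is stable, so ties keep input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored]


corpus = [
    SimpleDocument("Retrievers return documents for a query", {"source": "a"}),
    SimpleDocument("Vector stores index embeddings", {"source": "b"}),
]
results = KeywordRetriever(corpus).invoke("what do retrievers return")
print([d.metadata["source"] for d in results])
```

Only the first document shares words with the query, so it is the only one returned.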

Common retrievers include:
- Vector store retrievers
- Search API retrievers
- Relational database retrievers


In [1]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings

## Loading Documents

In [2]:
loaders = [
    TextLoader("data/langchain.md"),
    TextLoader("data/langchain2.md"),
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

## Retrieving Documents

**Conflicting needs in document retrieval:**

- Need for small chunks to maintain embedding accuracy
- Need for longer chunks to preserve context

Steps:
1. Split and store small chunks of data.
2. The retriever first fetches the small chunks.
3. It then looks up the parent IDs for those chunks.
4. Finally, it returns the larger documents.
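
The four steps can be sketched without any LangChain dependency. This is a simplified illustration under stated assumptions: `split_text` is a naive fixed-width splitter, `score` uses word overlap in place of embedding similarity, and the `parents` dict and `child_index` list are hypothetical stand-ins for the docstore and vector store.

```python
import uuid


def split_text(text: str, chunk_size: int) -> list:
    """Naive fixed-width splitter (assumption; the notebook uses a recursive splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def score(query: str, chunk: str) -> int:
    """Word-overlap score, standing in for embedding similarity."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


parents = {}       # parent_id -> full parent text (plays the role of the docstore)
child_index = []   # (child_text, parent_id) pairs (plays the role of the vector store)


def add_documents(texts, chunk_size=40):
    for text in texts:
        parent_id = str(uuid.uuid4())
        parents[parent_id] = text
        # Step 1: split and store small chunks, each tagged with its parent ID.
        for chunk in split_text(text, chunk_size):
            child_index.append((chunk, parent_id))


def retrieve(query, k=2):
    # Step 2: fetch the k best-matching small chunks.
    ranked = sorted(child_index, key=lambda item: score(query, item[0]), reverse=True)
    # Step 3: look up parent IDs, dropping duplicates but keeping rank order.
    parent_ids = list(dict.fromkeys(pid for _, pid in ranked[:k]))
    # Step 4: return the larger parent documents.
    return [parents[pid] for pid in parent_ids]


add_documents([
    "Retrievers return documents for a query. They do not have to store anything themselves.",
    "Bread needs flour, water, salt and yeast. Knead the dough and let it rise.",
])
hits = retrieve("how do retrievers return documents", k=1)
print(len(hits), hits[0][:40])
```

The match is made against a 40-character chunk, but the caller receives the whole parent text, which is the trade-off the two conflicting needs call for.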

In [3]:
# Define a text splitter that will be used to create child documents from larger parent documents.
child_splitter = RecursiveCharacterTextSplitter(chunk_size=500)

# Initialize a vector store named "full_documents" which will index the child chunks of the documents.
# The OllamaEmbeddings model "snowflake-arctic-embed:33m" is used to generate embeddings for these chunks.
vectorstore = Chroma(
    collection_name="full_documents", embedding_function=OllamaEmbeddings(model="snowflake-arctic-embed:33m")
)
# Set up an in-memory storage layer that will store the parent documents.
store = InMemoryStore()

# Create a retriever that combines the vector store, the document store, and the child splitter.
# No parent splitter is set, so each loaded document is stored whole as a parent.
# Documents are split into child chunks when added; queries match child chunks and return their parents.
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

In [4]:
retriever.add_documents(docs, ids=None)
list(store.yield_keys())

['6052ec2b-3bec-4ed6-ab9e-73148f36109a',
 '5a539fb2-1e79-43ff-9c87-8862cd0897b7']

In [6]:
sub_docs = vectorstore.similarity_search("What is LangChain", k=1)
print(sub_docs)

[Document(metadata={'doc_id': '6052ec2b-3bec-4ed6-ab9e-73148f36109a', 'source': 'data/langchain.md'}, page_content='| Parameter      | Description                                                                                                                                                                                                                                                                                                                                                          |')]


In [10]:
retrieved_docs = retriever.invoke("What is LangChain")
print(len(retrieved_docs[0].page_content))
print(retrieved_docs)

21344
[Document(metadata={'source': 'data/langchain.md'}, page_content='Title: Chat models | 🦜️🔗 LangChain\n\nURL Source: https://python.langchain.com/docs/concepts/chat_models/\n\nMarkdown Content:\nOverview[\u200b](https://python.langchain.com/docs/concepts/chat_models/#overview "Direct link to Overview")\n-------------------------------------------------------------------------------------------------------\n\nLarge Language Models (LLMs) are advanced machine learning models that excel in a wide range of language-related tasks such as text generation, translation, summarization, question answering, and more, without needing task-specific fine tuning for every scenario.\n\nModern LLMs are typically accessed through a chat model interface that takes a list of [messages](https://python.langchain.com/docs/concepts/messages/) as input and returns a [message](https://python.langchain.com/docs/concepts/messages/) as output.\n\nThe newest generation of chat models offer additional capabilit

## Retrieving Large Chunks

In [11]:
# This text splitter is used to create the parent documents
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
# This text splitter is used to create the child documents
# It should create documents smaller than the parent
child_splitter = RecursiveCharacterTextSplitter(chunk_size=500)

# The vectorstore to use to index the child chunks
vectorstore = Chroma(
    collection_name="split_parents", embedding_function=OllamaEmbeddings(model="snowflake-arctic-embed:33m")
)
# The storage layer for the parent documents
store = InMemoryStore()

### ParentDocumentRetriever
- Splits and stores small chunks for embedding/indexing
- During retrieval, fetches small chunks first
- Then looks up and returns the parent documents of those chunks
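
To see why the docstore holds more entries once a parent splitter is configured, the sketch below counts parents and children for a synthetic text. `fixed_split` is a hypothetical fixed-width helper used in place of `RecursiveCharacterTextSplitter`, so counts on real documents will differ.

```python
def fixed_split(text: str, chunk_size: int) -> list:
    """Hypothetical fixed-width splitter; ignores separators and overlap."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


raw_document = "x" * 5000  # synthetic 5,000-character document

# With a parent splitter, the raw document is first cut into parent chunks...
parent_chunks = fixed_split(raw_document, 2000)

# ...and each parent chunk is cut again into the child chunks that get embedded.
children_per_parent = [fixed_split(parent, 500) for parent in parent_chunks]

print(len(parent_chunks))                     # docstore entries: 3
print([len(c) for c in children_per_parent])  # child chunks per parent: [4, 4, 2]
```

Each child chunk keeps a reference to its parent chunk rather than to the raw document, so retrieval returns a passage of at most the parent chunk size.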

In [12]:
# Create a retriever that uses the previously defined vector store, document store, child splitter, and parent splitter.
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

In [13]:
# Add documents to the retriever
retriever.add_documents(docs)

# Get the total number of keys in the store
len(list(store.yield_keys()))

22

In [16]:
sub_docs = vectorstore.similarity_search("what is LangChain used for", k=5)

print(sub_docs)

[Document(metadata={'doc_id': '50054b35-f67a-4614-8e08-58340bb732b8', 'source': 'data/langchain.md'}, page_content='LangChain provides a consistent interface for working with chat models from different providers while offering additional features for monitoring, debugging, and optimizing the performance of applications that use LLMs.'), Document(metadata={'doc_id': '50054b35-f67a-4614-8e08-58340bb732b8', 'source': 'data/langchain.md'}, page_content='*   Standard API for [structuring outputs](https://python.langchain.com/docs/concepts/structured_outputs/#structured-output-method) via the `with_structured_output` method.\n*   Provides support for [async programming](https://python.langchain.com/docs/concepts/async/), [efficient batching](https://python.langchain.com/docs/concepts/runnables/#optimized-parallel-execution-batch), [a rich streaming API](https://python.langchain.com/docs/concepts/streaming/).'), Document(metadata={'doc_id': '0d93b96d-ff50-430d-b2fd-8fb862d224d9', 'source': 'd

In [17]:
retrieved_docs = retriever.invoke("what is LangChain used for")

print(len(retrieved_docs[0].page_content))
print(retrieved_docs[0].page_content)

1989
LangChain provides a consistent interface for working with chat models from different providers while offering additional features for monitoring, debugging, and optimizing the performance of applications that use LLMs.

*   Integrations with many chat model providers (e.g., Anthropic, OpenAI, Ollama, Microsoft Azure, Google Vertex, Amazon Bedrock, Hugging Face, Cohere, Groq). Please see [chat model integrations](https://python.langchain.com/docs/integrations/chat/) for an up-to-date list of supported models.
*   Use either LangChain's [messages](https://python.langchain.com/docs/concepts/messages/) format or OpenAI format.
*   Standard [tool calling API](https://python.langchain.com/docs/concepts/tool_calling/): standard interface for binding tools to models, accessing tool call requests made by models, and sending tool results back to the model.
*   Standard API for [structuring outputs](https://python.langchain.com/docs/concepts/structured_outputs/#structured-output-method) via

## Putting It All Together

In [18]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_ollama import ChatOllama

model = ChatOllama(model='llama3.2:1b')

In [19]:
template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)  
print(prompt)

input_variables=['context', 'question'] input_types={} partial_variables={} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})]


In [21]:
# Join the text of the retrieved documents into a single context string
def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

chain = (
    # The question is sent to the retriever for context and also passed through unchanged
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt  # Fill the prompt template
    | model  # Generate a response with the language model
    | StrOutputParser()  # Extract the response text as a string
)

print(chain.invoke("What is LangChain"))  

LangChain is a library that provides a consistent interface for working with chat models from different providers while offering additional features for monitoring, debugging, and optimizing the performance of applications that use Large Language Models (LLMs).
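
The data flow of the chain up to the model call can be traced in plain Python. This is a sketch rather than how LangChain runs it internally: `fake_retriever` is a hypothetical function returning fixed documents, and `build_prompt` is an assumed helper that mimics the dict step and the template step.

```python
from types import SimpleNamespace

TEMPLATE = """Answer the question based only on the following context:

{context}

Question: {question}
"""


def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)


def fake_retriever(question):
    """Hypothetical retriever that always returns the same two documents."""
    return [
        SimpleNamespace(page_content="LangChain provides a consistent interface for chat models."),
        SimpleNamespace(page_content="It supports many providers."),
    ]


def build_prompt(question):
    # Mirrors the dict step: the question feeds the retriever and is also passed through.
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    # Mirrors the prompt step: fill both placeholders in the template.
    return TEMPLATE.format(**inputs)


prompt_text = build_prompt("What is LangChain")
print(prompt_text)
```

The printed text is what the model would receive; the model and output parser steps are left out because they need a running LLM.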
