# **Retrievers**

## **What's Covered?**
- Introduction to Retrievers
    - Specify Top k
    - Specify Top k and Search Type
    - Maximum Marginal Relevance Retrieval
    - Similarity Score Threshold Retrieval
- Building an End-to-End RAG Chain
    - Step 1: Initialize the Chroma DB Connection
    - Step 2: Create a Retriever Object
    - Step 3: Initialize a Chat Prompt Template
    - Step 4: Initialize a Generator (i.e. Chat Model)
    - Step 5: Initialize an Output Parser
    - Step 6: Define a RAG Chain
    - Step 7: Invoke the Chain

## **Building an End-to-End RAG Chain**

**Step 1: Initialize the Chroma DB Connection**  
**Step 2: Create a Retriever Object**   
**Step 3: Initialize a Chat Prompt Template**  
**Step 4: Initialize a Generator (i.e. Chat Model)**  
**Step 5: Initialize an Output Parser**  
**Step 6: Define a RAG Chain**  
**Step 7: Invoke the Chain**

In [9]:
# Step 1: Initialize the Chroma DB Connection

from langchain_chroma import Chroma

# Initialize the database connection
# If the database already exists, this connects to it using collection_name and persist_directory.
# Otherwise, a new collection is created.
db = Chroma(collection_name="vector_database", 
            embedding_function=embedding_model, 
            persist_directory="./chroma_db_")

# Count the entries already stored in the collection
print(len(db.get()["ids"]))

1004


In [10]:
# Step 2: Create a Retriever Object 

retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})
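
To make `search_type="similarity"` with `k=3` more concrete, here is a minimal sketch of top-k retrieval by cosine similarity in plain Python. It is only an illustration of the idea, not how Chroma does it: a real vector store typically uses an approximate nearest-neighbour index, and the distance metric depends on how the collection is configured. The toy vectors and the helper names `cosine_similarity` and `top_k` are assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return the ids of the k documents most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 2-D "embeddings" (hypothetical values, for illustration only)
doc_vecs = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.9, 0.1],
    "doc_c": [0.0, 1.0],
    "doc_d": [0.7, 0.7],
}
query_vec = [1.0, 0.0]

top_ids = top_k(query_vec, doc_vecs, k=3)
print(top_ids)  # ['doc_a', 'doc_b', 'doc_d']
```

`doc_c` is orthogonal to the query, so it is the one left out when `k=3`. Raising `k` gives the generator more context at the cost of a longer prompt.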

In [11]:
# Step 3: Initialize a Chat Prompt Template

from langchain_core.prompts import ChatPromptTemplate

PROMPT_TEMPLATE = """
Answer the question based only on the following context:
{context}
Answer the question based on the above context: {question}.
Provide a detailed answer.
Don’t justify your answers.
Don’t give information not mentioned in the CONTEXT INFORMATION.
Do not say "according to the context" or "mentioned in the context" or similar.
"""

prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
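
The `{context}` and `{question}` placeholders are filled when the chain is invoked. As a rough illustration of that substitution only, the sketch below uses plain `str.format` rather than LangChain's own code path. The shortened `template` and the sample values are hypothetical stand-ins.

```python
# Shortened stand-in for PROMPT_TEMPLATE (hypothetical, for illustration)
template = (
    "Answer the question based only on the following context:\n"
    "{context}\n"
    "Answer the question based on the above context: {question}."
)

# Made-up values; in the chain these come from the retriever and the user query
filled = template.format(
    context="Chunk one text.\n\nChunk two text.",
    question="What does chunk one say?",
)
print(filled)
```

With the actual object, `prompt_template.invoke({"context": ..., "question": ...})` performs the equivalent substitution and returns a prompt value rather than a plain string.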

In [12]:
# Step 4: Initialize a Generator (i.e. Chat Model)

from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY)

In [13]:
# Step 5: Initialize an Output Parser

from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

In [15]:
# Step 6: Define a RAG Chain

from langchain_core.runnables import RunnablePassthrough

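def format_docs(docs):
    # Join the text of the retrieved documents into one context string
    return "\n\n".join(doc.page_content for doc in docs)

# The dict sends the query to the retriever ("context") and passes it through unchanged ("question")
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_template
    | chat_model
    | parser
)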
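
Conceptually, the chain runs five steps in order: retrieve, format, fill the prompt, call the model, and parse the output. The sketch below mirrors that data flow in plain Python with stand-ins. `FakeDoc`, `fake_retriever`, `fill_prompt`, `fake_model`, `parse`, and `run_chain` are hypothetical placeholders, not LangChain classes, and the real chain offers more than this sequential version (for example streaming and batching).

```python
from dataclasses import dataclass

@dataclass
class FakeDoc:
    # Stand-in for a retrieved document; only page_content is used here
    page_content: str

def fake_retriever(query):
    # Hypothetical retriever: ignores the query and returns fixed documents
    return [FakeDoc("First chunk."), FakeDoc("Second chunk.")]

def format_docs(docs):
    # Same helper as in the chain: join document text with blank lines
    return "\n\n".join(doc.page_content for doc in docs)

def fill_prompt(inputs):
    # Plays the role of the prompt template
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"

def fake_model(prompt):
    # Hypothetical model: reports the prompt length instead of generating text
    return {"content": f"(model saw {len(prompt)} characters)"}

def parse(message):
    # Plays the role of the output parser: extract the text content
    return message["content"]

def run_chain(query):
    # The dict step: the same query feeds both keys
    inputs = {"context": format_docs(fake_retriever(query)), "question": query}
    return parse(fake_model(fill_prompt(inputs)))

answer = run_chain("What is in the first chunk?")
print(answer)
```

The point of the dict at the head of the chain is visible in `run_chain`: the single input string is used twice, once to retrieve context and once as the question itself.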

In [17]:
# Step 7: Invoke the Chain

query = 'Who is Rachem?'

rag_chain.invoke(query)

'Rachem is not a person, it is a misinterpretation of the name Rachel. The speaker is questioning what the term "Rachem" means and if it is a term related to paleontology. They clarify that they wouldn\'t know because they are just a waitress.'

In [18]:
# Step 7: Invoke the Chain (second query)

query = 'What is there on the List comparing Rachel and Julie?'

rag_chain.invoke(query)

'Rachel is described as being a waitress, while Julie is mentioned to have a lot in common with the speaker as they are both paleontologists.'