# Retrievers

--------------------------------------------------------------------------------------------------------

* A retriever is a tool or component used to search and fetch relevant documents or data from a large collection, often based on a query or a prompt.

* LangChain VectorStore objects do not subclass Runnable, so they cannot be used directly in LangChain Expression Language (LCEL) chains.

* LangChain Retrievers are Runnables, so they implement a standard set of methods (e.g. synchronous and asynchronous invoke and batch operations) and are designed to be incorporated into LCEL chains.

* We can create a simple version of this ourselves, without subclassing a retriever. Once we choose which method to use for retrieving documents, wrapping it in a Runnable is easy. Below we build one around the vector store's similarity search.
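To make "implements invoke and batch" concrete, here is a minimal pure-Python sketch of the idea. `SimpleRunnable` and `toy_search` are hypothetical names made up for illustration; this is not how LangChain implements Runnable, only a stand-in that shows why wrapping a search function gives you `invoke`, `batch`, and `bind`.

```python
from typing import Callable, List


class SimpleRunnable:
    """Toy stand-in for a Runnable: wraps a function and exposes invoke/batch/bind."""

    def __init__(self, func: Callable, **bound_kwargs):
        self.func = func
        self.bound_kwargs = bound_kwargs

    def bind(self, **kwargs) -> "SimpleRunnable":
        # Return a new wrapper with extra keyword arguments fixed in advance.
        return SimpleRunnable(self.func, **{**self.bound_kwargs, **kwargs})

    def invoke(self, query):
        # Run the wrapped function on a single input.
        return self.func(query, **self.bound_kwargs)

    def batch(self, queries):
        # Run the wrapped function once per input, preserving order.
        return [self.invoke(q) for q in queries]


def toy_search(query: str, k: int = 4) -> List[str]:
    # Hypothetical search: substring match over a tiny in-memory corpus.
    corpus = ["cats sleep a lot", "dogs are loyal", "parrots mimic speech"]
    hits = [text for text in corpus if query.lower() in text]
    return hits[:k]


toy_retriever = SimpleRunnable(toy_search).bind(k=1)
results = toy_retriever.batch(["cat", "dog"])
print(results)
```

The real `RunnableLambda` follows the same shape: the function does the work, and the wrapper supplies the standard calling methods.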

------------------------------------------------------------------------------------------------------

### Types of Retrievers

##### Vector-based retrievers: 

* These convert the text into vectors and use similarity search to find the most relevant documents.
    
##### Keyword-based retrievers: 
    
* These search for exact matches or partial matches of words in documents (like a traditional search engine).
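The difference between the two types can be sketched in a few lines of plain Python. This is a toy under stated assumptions: word counts stand in for the learned embeddings a real vector retriever would use, and all function names here (`keyword_retrieve`, `vector_retrieve`, and so on) are hypothetical.

```python
import math
from collections import Counter

corpus = [
    "Dogs are loyal animals and make great companions for humans.",
    "Cats sleep for about 12-16 hours a day and love to climb high places.",
    "Parrots are intelligent birds that can mimic human speech and sounds.",
]


def tokenize(text):
    # Lowercase each word and strip trailing punctuation.
    return [word.strip(".,").lower() for word in text.split()]


def keyword_retrieve(query, docs):
    # Keyword-based: keep every document that shares at least one exact word with the query.
    terms = set(tokenize(query))
    return [doc for doc in docs if terms & set(tokenize(doc))]


def embed(text):
    # Crude "embedding": a bag-of-words count vector (assumption, not a real model).
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_retrieve(query, docs, k=1):
    # Vector-based: rank all documents by similarity to the query vector, keep the top k.
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]


print(keyword_retrieve("parrots", corpus))
print(vector_retrieve("loyal dogs", corpus, k=1))
```

Note that the keyword version returns every match unranked, while the vector version always returns the k closest documents, even when none of them is a strong match.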

------------------------------------------------------------------------------------------------------

In [None]:
from typing import List
from langchain_core.documents import Document
from langchain_core.runnables import RunnableLambda

# batch() runs one similarity search per query ("cat", then "dog") and keeps the top match for each.
# `vectorstore` is the Chroma store built in Step 4 below.
retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1) # k=1 means return only the top result
retriever.batch(["cat", "dog"])

### Step 1 : Import Necessary Libraries and API Keys

In [18]:
# Load variables from the .env file
import os
from dotenv import load_dotenv
load_dotenv()

# Export the API key read from the .env file
os.environ['GROQ_API_KEY'] = os.getenv("GROQ_API_KEY") # Groq key, used to call LLMs hosted on the Groq platform

# Import necessary libraries
from langchain_chroma import Chroma # Chroma vector database, used to store the embeddings
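One caveat with the cell above: `os.getenv` returns `None` when the variable is missing, and assigning `None` into `os.environ` raises a `TypeError` that does not name the missing key. A small guard gives a clearer message. This is a sketch; `require_env` and `EXAMPLE_API_KEY` are hypothetical names used only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a readable error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Demo with a placeholder variable so the snippet runs anywhere.
os.environ["EXAMPLE_API_KEY"] = "example-value"
print(require_env("EXAMPLE_API_KEY"))
```

In this notebook the call would be `require_env("GROQ_API_KEY")`, placed after `load_dotenv()`.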


### Step 2 : Create Documents (hardcoded)

In [21]:
from langchain_core.documents import Document

documents = [
    Document(page_content="Dogs are loyal animals and make great companions for humans.",
             metadata={"source": "pet facts", "author": "John Doe", "date": "2024-03-10"}),

    Document(page_content="Cats sleep for about 12-16 hours a day and love to climb high places.",
             metadata={"source": "pet behavior", "author": "Jane Smith", "date": "2024-02-25"}),

    Document(page_content="Parrots are intelligent birds that can mimic human speech and sounds.",
             metadata={"source": "bird care", "author": "Alice Johnson", "date": "2023-12-15"}),

    Document(page_content="Goldfish have a memory span of at least three months and can recognize their owners.",
             metadata={"source": "fish facts", "author": "Dr. Emily Brown", "date": "2024-01-20"}),

    Document(page_content="Rabbits need plenty of space to hop around and should not be kept in small cages.",
             metadata={"source": "rabbit care", "author": "Michael Lee", "date": "2024-03-05"})
]
print(documents)

[Document(metadata={'source': 'pet facts', 'author': 'John Doe', 'date': '2024-03-10'}, page_content='Dogs are loyal animals and make great companions for humans.'), Document(metadata={'source': 'pet behavior', 'author': 'Jane Smith', 'date': '2024-02-25'}, page_content='Cats sleep for about 12-16 hours a day and love to climb high places.'), Document(metadata={'source': 'bird care', 'author': 'Alice Johnson', 'date': '2023-12-15'}, page_content='Parrots are intelligent birds that can mimic human speech and sounds.'), Document(metadata={'source': 'fish facts', 'author': 'Dr. Emily Brown', 'date': '2024-01-20'}, page_content='Goldfish have a memory span of at least three months and can recognize their owners.'), Document(metadata={'source': 'rabbit care', 'author': 'Michael Lee', 'date': '2024-03-05'}, page_content='Rabbits need plenty of space to hop around and should not be kept in small cages.')]


### Step 3 : Embedding using Huggingface model

In [22]:
from langchain_huggingface import HuggingFaceEmbeddings # use huggingface embedding model

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-V2") # download a model from huggingface
embeddings

HuggingFaceEmbeddings(model_name='all-MiniLM-L6-V2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

### Step 4 : Store the Vectors in ChromaDB

In [23]:
from langchain_chroma import Chroma # import Chroma vectordb to store embeddings

# Every document is converted into a vector and stored in Chroma
vectorstore = Chroma.from_documents(documents,embeddings)
vectorstore

<langchain_chroma.vectorstores.Chroma at 0x1903f812a40>

### Step 5.1 : Similarity Search (Sync Query)

* This is a synchronous call: the program waits for the search to finish before moving on.

In [24]:
vectorstore.similarity_search("cat") # return the documents most similar to "cat" (top k, default k=4)

[Document(id='b8e7226a-ab92-4513-84a5-80f3da200e16', metadata={'author': 'Jane Smith', 'date': '2024-02-25', 'source': 'pet behavior'}, page_content='Cats sleep for about 12-16 hours a day and love to climb high places.'),
 Document(id='4a83cccb-20ca-44c8-afe4-b65a57064dfa', metadata={'author': 'Jane Smith', 'date': '2024-02-25', 'source': 'pet behavior'}, page_content='Cats sleep for about 12-16 hours a day and love to climb high places.'),
 Document(id='0c8a6f6f-8fb1-4c4a-a8ae-8dcac07e4c35', metadata={'author': 'John Doe', 'date': '2024-03-10', 'source': 'pet facts'}, page_content='Dogs are loyal animals and make great companions for humans.'),
 Document(id='cdb5382f-3ef7-4e80-8dd7-30f19f07b54d', metadata={'author': 'John Doe', 'date': '2024-03-10', 'source': 'pet facts'}, page_content='Dogs are loyal animals and make great companions for humans.')]
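The result above lists each document twice under different IDs, which suggests the same documents were added to the store more than once (for example by re-running the cell in Step 4; that cause is an assumption). One way to tidy such results after the fact is to de-duplicate by content. The sketch below uses plain dictionaries as hypothetical stand-ins for `Document` objects, and the IDs are shortened placeholders.

```python
def dedupe_by_content(docs):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc["page_content"] not in seen:
            seen.add(doc["page_content"])
            unique.append(doc)
    return unique


# Placeholder results mirroring the shape of the output above.
hits = [
    {"id": "a", "page_content": "Cats sleep for about 12-16 hours a day and love to climb high places."},
    {"id": "b", "page_content": "Cats sleep for about 12-16 hours a day and love to climb high places."},
    {"id": "c", "page_content": "Dogs are loyal animals and make great companions for humans."},
    {"id": "d", "page_content": "Dogs are loyal animals and make great companions for humans."},
]

unique_hits = dedupe_by_content(hits)
print([doc["id"] for doc in unique_hits])
```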

### Step 5.2 : Async Similarity Search (asimilarity_search)

* This is an asynchronous call: the program can do other work while waiting for the search to finish.

In [None]:
await vectorstore.asimilarity_search("cat") # async counterpart of similarity_search
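To see what "doing other work while waiting" means, here is a self-contained sketch using only `asyncio`. `fake_search` is a hypothetical coroutine that sleeps instead of querying a real store, so the two searches below overlap rather than run back to back.

```python
import asyncio


async def fake_search(query: str, delay: float = 0.05) -> str:
    # Stand-in for an async search call: the sleep yields control to other tasks.
    await asyncio.sleep(delay)
    return f"results for {query}"


async def main():
    # Both searches are in flight at the same time; gather returns results in input order.
    return await asyncio.gather(fake_search("cat"), fake_search("dog"))


results = asyncio.run(main())
print(results)
```

Inside a notebook an event loop is already running, so use `await main()` there instead of `asyncio.run(main())`.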

### Step 5.3 : Similarity search with score

In [28]:
vectorstore.similarity_search_with_score("cat") # return the top matches for "cat" as (document, score) pairs

[(Document(id='b8e7226a-ab92-4513-84a5-80f3da200e16', metadata={'author': 'Jane Smith', 'date': '2024-02-25', 'source': 'pet behavior'}, page_content='Cats sleep for about 12-16 hours a day and love to climb high places.'),
  1.1713635921478271),
 (Document(id='4a83cccb-20ca-44c8-afe4-b65a57064dfa', metadata={'author': 'Jane Smith', 'date': '2024-02-25', 'source': 'pet behavior'}, page_content='Cats sleep for about 12-16 hours a day and love to climb high places.'),
  1.1713635921478271),
 (Document(id='0c8a6f6f-8fb1-4c4a-a8ae-8dcac07e4c35', metadata={'author': 'John Doe', 'date': '2024-03-10', 'source': 'pet facts'}, page_content='Dogs are loyal animals and make great companions for humans.'),
  1.4782705307006836),
 (Document(id='cdb5382f-3ef7-4e80-8dd7-30f19f07b54d', metadata={'author': 'John Doe', 'date': '2024-03-10', 'source': 'pet facts'}, page_content='Dogs are loyal animals and make great companions for humans.'),
  1.4782705307006836)]
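In the output above the cat document scores about 1.17 and the dog document about 1.48, so the better match has the lower number. That is consistent with the score being a distance rather than a similarity. The sketch below assumes a squared L2 distance (an assumption about the store's default metric) and uses made-up 2-D vectors to show why results are ranked in ascending order.

```python
import math


def squared_l2(a, b):
    # Sum of squared differences between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = [1.0, 0.0]
candidates = {
    "near": [0.8, 0.6],  # points mostly in the same direction as the query
    "far": [0.0, 1.0],   # orthogonal to the query
}

# Smaller distance means a closer match, so sort ascending by score.
scored = sorted(
    ((name, squared_l2(query, vec)) for name, vec in candidates.items()),
    key=lambda pair: pair[1],
)
print(scored)
```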

# Step 6 : Retrievers

### Technique 1 : Convert Vectorstore to Retrievers using RunnableLambda

In [29]:
from typing import List
from langchain_core.documents import Document
from langchain_core.runnables import RunnableLambda

# batch() runs one similarity search per query ("cat", then "dog") and keeps the top match for each.
retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1) # k=1 means return only the top result
retriever.batch(["cat", "dog"])

[[Document(id='b8e7226a-ab92-4513-84a5-80f3da200e16', metadata={'author': 'Jane Smith', 'date': '2024-02-25', 'source': 'pet behavior'}, page_content='Cats sleep for about 12-16 hours a day and love to climb high places.')],
 [Document(id='cdb5382f-3ef7-4e80-8dd7-30f19f07b54d', metadata={'author': 'John Doe', 'date': '2024-03-10', 'source': 'pet facts'}, page_content='Dogs are loyal animals and make great companions for humans.')]]

* Wrapping `vectorstore.similarity_search` in a RunnableLambda gives us a Runnable that behaves like a retriever.

### Technique 2 : Convert Vectorstore into Retriever using as_retriever (recommended)

* The as_retriever method turns a VectorStore into a Retriever, specifically a VectorStoreRetriever.

* It uses search_type and search_kwargs to tell the retriever how to search in the VectorStore, like which search methods to use and how to fine-tune the search process.

In [31]:
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k":1}
)
retriever.batch(["cat","dog"])

[[Document(id='b8e7226a-ab92-4513-84a5-80f3da200e16', metadata={'author': 'Jane Smith', 'date': '2024-02-25', 'source': 'pet behavior'}, page_content='Cats sleep for about 12-16 hours a day and love to climb high places.')],
 [Document(id='cdb5382f-3ef7-4e80-8dd7-30f19f07b54d', metadata={'author': 'John Doe', 'date': '2024-03-10', 'source': 'pet facts'}, page_content='Dogs are loyal animals and make great companions for humans.')]]

### Step 7 : PromptTemplate

In [33]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message="""
Answer this question using the provided context only.

{question}

context:
{context}
"""

prompt = ChatPromptTemplate.from_messages([("human",message)])
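The `{question}` and `{context}` placeholders are filled in when the chain runs. As a rough illustration of that substitution step only, the same template can be filled with plain `str.format`. This is a simplification: `ChatPromptTemplate` also wraps the result in chat messages, which is not shown here.

```python
message = """
Answer this question using the provided context only.

{question}

context:
{context}
"""

# Fill both placeholders by hand to see the text the model would receive.
filled = message.format(
    question="tell me about dogs",
    context="Dogs are loyal animals and make great companions for humans.",
)
print(filled)
```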

### Step 8 : Call a model from Groq Platform

In [34]:
from langchain_groq import ChatGroq # Import to use Groq's AI models with ultra-fast LPU inference

model = ChatGroq(model="Llama3-8b-8192") # Check the website to find the model name
model

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x00000190410A0130>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x00000190410A1060>, model_name='Llama3-8b-8192', model_kwargs={}, groq_api_key=SecretStr('**********'))

### Step 9 : Chain

In [36]:
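# The input question goes to the retriever (as "context") and passes through unchanged (as "question")
chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | model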
response = chain.invoke("tell me about dogs")
print(response.content)

According to the provided context, dogs are loyal animals and make great companions for humans.
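The data flow through the chain can be traced with ordinary functions. Everything below is a hypothetical stand-in: `fake_retriever` returns a fixed document and `fake_model` just echoes the last line of the prompt, so this shows only the order of the steps, not real retrieval or generation.

```python
TEMPLATE = """
Answer this question using the provided context only.

{question}

context:
{context}
"""


def fake_retriever(question):
    # Stand-in for the retriever: always returns one fixed document.
    return ["Dogs are loyal animals and make great companions for humans."]


def passthrough(value):
    # Stand-in for RunnablePassthrough: hands the input on unchanged.
    return value


def fan_out(question):
    # The dict step: the same input feeds every key.
    return {"context": fake_retriever(question), "question": passthrough(question)}


def fill_prompt(inputs):
    # The prompt step: substitute the retrieved text and the question into the template.
    return TEMPLATE.format(question=inputs["question"], context="\n".join(inputs["context"]))


def fake_model(prompt_text):
    # Stand-in for the LLM: echoes the last line of the prompt (the context).
    return prompt_text.strip().splitlines()[-1]


def run_chain(question):
    # Equivalent in spirit to: {"context": ..., "question": ...} | prompt | model
    return fake_model(fill_prompt(fan_out(question)))


answer = run_chain("tell me about dogs")
print(answer)
```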
