## Vector Retrievers



This notebook introduces LangChain's vector store and retriever abstractions.

These abstractions are designed to support retrieval of data, from vector databases and other sources, for integration with LLM workflows.

They are important for applications that fetch data to be reasoned over as part of model inference, as in the case of retrieval-augmented generation (RAG).




### Documents 

LangChain implements a Document abstraction, which is intended to represent a unit of text and associated metadata. It has two attributes:

- page_content: a string representing the content,
- metadata: a dict containing arbitrary metadata. The metadata attribute can capture information about the source of the document, its relationship to other documents, and other information. Note that an individual Document object often represents a chunk of a larger document.
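
To illustrate the point about chunks, the sketch below splits a longer text into fixed-size pieces and attaches metadata to each one. It uses a plain dataclass, `Chunk`, as a hypothetical stand-in for Document so that it runs without LangChain installed. The `chunk_text` helper, the `chunk_index` metadata key, the `pets-overview-doc` source name, and the chunk size are all assumptions made for illustration, not part of LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in that mirrors Document's two attributes."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, source: str, size: int = 40) -> list[Chunk]:
    """Split text into pieces of at most `size` characters, tagging each with its origin."""
    chunks = []
    for index, start in enumerate(range(0, len(text), size)):
        chunks.append(
            Chunk(
                page_content=text[start:start + size],
                metadata={"source": source, "chunk_index": index},
            )
        )
    return chunks


text = "Dogs are great companions. Cats are independent creatures. Birds can often fly."
chunks = chunk_text(text, source="pets-overview-doc")
for chunk in chunks:
    print(chunk.metadata, repr(chunk.page_content))
```

Each chunk carries the same `source` value, so a retrieved piece can be traced back to the document it came from.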



In [1]:
import os

import pydantic.v1
from dotenv import load_dotenv

# Load variables from a local .env file into the environment.
load_dotenv()

groq_api_key = os.getenv("GROQ_API_KEY")

In [2]:
from langchain_core.documents import Document

documents = [
    Document(
        page_content="Dogs are great companions known for loyalty & friendliness.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are independent creatures, often appreciated for their grace and agility.",
        metadata={"source": "feline-facts-doc"},
    ),
    Document(
        page_content="Birds are known for their ability to fly, though not all species have this capability.",
        metadata={"source": "avian-traits-doc"},
    ),
    Document(
        page_content="Horses have been domesticated by humans for centuries, primarily for transportation and labor.",
        metadata={"source": "equine-history-doc"},
    ),
    Document(
        page_content="Rabbits are social animals, often kept as pets due to their gentle nature and soft fur.",
        metadata={"source": "small-mammals-doc"},
    ),
]


documents

[Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions known for loyalty & friendliness.'),
 Document(metadata={'source': 'feline-facts-doc'}, page_content='Cats are independent creatures, often appreciated for their grace and agility.'),
 Document(metadata={'source': 'avian-traits-doc'}, page_content='Birds are known for their ability to fly, though not all species have this capability.'),
 Document(metadata={'source': 'equine-history-doc'}, page_content='Horses have been domesticated by humans for centuries, primarily for transportation and labor.'),
 Document(metadata={'source': 'small-mammals-doc'}, page_content='Rabbits are social animals, often kept as pets due to their gentle nature and soft fur.')]

In [4]:
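# Re-export the token only when it is set; assigning None to os.environ raises a TypeError.
hf_token = os.getenv("HUGGINGFACE_TOKEN")
if hf_token:
    os.environ["HUGGINGFACE_TOKEN"] = hf_token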

In [5]:
from langchain_groq import ChatGroq

model = ChatGroq(
    model="llama3-groq-70b-8192-tool-use-preview",
    groq_api_key=groq_api_key,
)

model

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7916741966e0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x791674195720>, model_name='llama3-groq-70b-8192-tool-use-preview', model_kwargs={}, groq_api_key=SecretStr('**********'))

### Converting text into embeddings

In [18]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)
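
The printed model summary lists three stages: a Transformer that produces one vector per token, mean pooling over those token vectors, and normalization. The sketch below illustrates the last two stages with made-up 3-dimensional token vectors (the summary above reports 384 dimensions for the real model). The numbers and helper names are invented for illustration, no model is loaded, and details such as padding masks are ignored.

```python
import math


def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average the per-token vectors component-wise into one sentence vector."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Made-up token vectors for a three-token input.
token_vectors = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [2.0, 0.0, 2.0],
]

pooled = mean_pool(token_vectors)   # [2.0, 0.0, 2.0]
embedding = l2_normalize(pooled)    # unit-length sentence embedding
print(pooled, embedding)
```

Because the final vector has unit length, comparing two embeddings depends only on the angle between them, not on how long the input text was.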

In [19]:
# Vector store: embed the documents and index them in a Chroma collection.
from langchain_chroma import Chroma

vectorStore = Chroma.from_documents(documents, embedding=embeddings)

vectorStore

<langchain_chroma.vectorstores.Chroma at 0x79156b8eba60>

In [20]:
vectorStore.similarity_search("cat")

[Document(metadata={'source': 'feline-facts-doc'}, page_content='Cats are independent creatures, often appreciated for their grace and agility.'),
 Document(metadata={'source': 'small-mammals-doc'}, page_content='Rabbits are social animals, often kept as pets due to their gentle nature and soft fur.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions known for loyalty & friendliness.'),
 Document(metadata={'source': 'equine-history-doc'}, page_content='Horses have been domesticated by humans for centuries, primarily for transportation and labor.')]

In [21]:
# Async query against the vector store (top-level await works inside a notebook cell).

await vectorStore.asimilarity_search("cat")

[Document(metadata={'source': 'feline-facts-doc'}, page_content='Cats are independent creatures, often appreciated for their grace and agility.'),
 Document(metadata={'source': 'small-mammals-doc'}, page_content='Rabbits are social animals, often kept as pets due to their gentle nature and soft fur.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions known for loyalty & friendliness.'),
 Document(metadata={'source': 'equine-history-doc'}, page_content='Horses have been domesticated by humans for centuries, primarily for transportation and labor.')]

In [22]:
# Chroma returns a distance here, so a lower score means a closer match.
vectorStore.similarity_search_with_score("cat")

[(Document(metadata={'source': 'feline-facts-doc'}, page_content='Cats are independent creatures, often appreciated for their grace and agility.'),
  0.9895869493484497),
 (Document(metadata={'source': 'small-mammals-doc'}, page_content='Rabbits are social animals, often kept as pets due to their gentle nature and soft fur.'),
  1.5084611177444458),
 (Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions known for loyalty & friendliness.'),
  1.5920161008834839),
 (Document(metadata={'source': 'equine-history-doc'}, page_content='Horses have been domesticated by humans for centuries, primarily for transportation and labor.'),
  1.7771875858306885)]
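
The scores returned with each document behave like distances: the cat document has the smallest value and is listed first. The sketch below shows how such a ranking arises, assuming the store compares vectors by squared Euclidean (L2) distance. That metric is believed to be Chroma's default but is not confirmed anywhere in this notebook, and the 2-dimensional vectors are invented for illustration.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Invented unit-length vectors standing in for real embeddings.
query = [1.0, 0.0]
candidates = {
    "feline-facts-doc": [0.8, 0.6],
    "mammal-pets-doc": [0.0, 1.0],
    "equine-history-doc": [-0.6, 0.8],
}

# Sort ascending: the lowest distance is the best match.
ranked = sorted(
    (squared_l2(query, vector), name) for name, vector in candidates.items()
)
for score, name in ranked:
    print(f"{score:.2f}  {name}")
```

For unit-length vectors this distance equals 2 minus twice the cosine similarity, so it ranges from 0 (identical direction) to 4 (opposite direction).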

### Retriever 

LangChain VectorStore objects do not subclass Runnable, and so cannot immediately be integrated into LangChain Expression Language (LCEL) chains.

LangChain Retrievers are Runnables, so they implement a standard set of methods (e.g., synchronous and asynchronous invoke and batch operations) and are designed to be incorporated into LCEL chains.





We can create a simple version of this ourselves, without subclassing Retriever.
If we choose which method we wish to use to retrieve documents, we can create a runnable easily. The cell below builds one around the similarity_search method.

In [29]:
from langchain_core.runnables import RunnableLambda

# Wrap the search method in a Runnable and fix k=1 so each query returns only the top result.
retriever = RunnableLambda(vectorStore.similarity_search).bind(k=1)

retriever.batch(["cat", "dog", "bird"])

[[Document(metadata={'source': 'feline-facts-doc'}, page_content='Cats are independent creatures, often appreciated for their grace and agility.')],
 [Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions known for loyalty & friendliness.')],
 [Document(metadata={'source': 'avian-traits-doc'}, page_content='Birds are known for their ability to fly, though not all species have this capability.')]]
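
To make the mechanics concrete, here is a simplified stand-in for what `RunnableLambda(...).bind(k=1)` provides: a wrapper that exposes `invoke` and `batch` around an ordinary function and fixes some keyword arguments in advance. `MiniRunnable` and `toy_search` are hypothetical names written for this sketch. LangChain's actual implementation is considerably richer (async variants, configuration, composition with `|`), so treat this only as a mental model.

```python
from typing import Any, Callable


class MiniRunnable:
    """Hypothetical, much-reduced imitation of a runnable wrapper."""

    def __init__(self, func: Callable[..., Any], **bound_kwargs: Any) -> None:
        self.func = func
        self.bound_kwargs = bound_kwargs

    def bind(self, **kwargs: Any) -> "MiniRunnable":
        """Return a new wrapper with extra keyword arguments fixed."""
        return MiniRunnable(self.func, **{**self.bound_kwargs, **kwargs})

    def invoke(self, value: Any) -> Any:
        """Call the wrapped function on a single input."""
        return self.func(value, **self.bound_kwargs)

    def batch(self, values: list[Any]) -> list[Any]:
        """Call the wrapped function on each input, preserving order."""
        return [self.invoke(value) for value in values]


def toy_search(query: str, k: int = 4) -> list[str]:
    """Toy keyword search standing in for a vector store lookup."""
    corpus = ["cats purr", "dogs bark", "birds sing", "cats nap"]
    hits = [text for text in corpus if query in text]
    return hits[:k]


toy_retriever = MiniRunnable(toy_search).bind(k=1)
results = toy_retriever.batch(["cats", "dogs", "birds"])
print(results)
```

Binding `k=1` up front is what lets `batch` accept a plain list of query strings: every call receives the same fixed keyword argument.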

Vector stores implement an as_retriever method that generates a Retriever, specifically a VectorStoreRetriever.
These retrievers include specific search_type and search_kwargs attributes that identify which methods of the underlying vector store to call and how to parameterize them.
For instance, we can replicate the above with the following:

In [30]:
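# Build a VectorStoreRetriever that returns the single most similar document per query.
retriever = vectorStore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)

retriever.batch(["cat", "dog", "bird"])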

[[Document(metadata={'source': 'feline-facts-doc'}, page_content='Cats are independent creatures, often appreciated for their grace and agility.')],
 [Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions known for loyalty & friendliness.')],
 [Document(metadata={'source': 'avian-traits-doc'}, page_content='Birds are known for their ability to fly, though not all species have this capability.')]]

In [34]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer this question using provided context only. 
{question}

Context:
{context}
"""


prompt = ChatPromptTemplate.from_messages([("human", message)])

# The retriever fills "context"; RunnablePassthrough forwards the input given at invoke time as "question".
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | model



response = rag_chain.invoke("tell me about dogs")

print(response.content)

Dogs are great companions known for loyalty and friendliness.
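
The chain can be read as three steps: the dict sends the input string both to the retriever and to RunnablePassthrough, the prompt template fills in `{context}` and `{question}`, and the model answers. The plain-Python sketch below traces the first two steps with a hypothetical `fake_retriever` based on word overlap. It does not call LangChain or any model, and joining the retrieved texts with newlines is a choice made for this sketch rather than what the chain itself does.

```python
TEMPLATE = """Answer this question using provided context only.
{question}

Context:
{context}
"""


def fake_retriever(question: str) -> list[str]:
    """Hypothetical retriever: return the stored text sharing the most words with the question."""
    corpus = [
        "Dogs are great companions known for loyalty & friendliness.",
        "Cats are independent creatures, often appreciated for their grace and agility.",
    ]
    words = set(question.lower().split())
    best = max(corpus, key=lambda text: len(words & set(text.lower().split())))
    return [best]


def build_prompt(question: str) -> str:
    """Mimic the dict step and the prompt step of the chain."""
    inputs = {
        "context": fake_retriever(question),  # role of the retriever
        "question": question,                 # role of RunnablePassthrough
    }
    context_text = "\n".join(inputs["context"])
    return TEMPLATE.format(question=inputs["question"], context=context_text)


prompt_text = build_prompt("tell me about dogs")
print(prompt_text)
```

Only the retrieved text reaches the model, so the quality of the answer depends directly on what the retriever returns.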
