# Vector Stores and Retrievers

In this notebook we get familiar with LangChain's vector store and retriever abstractions. These abstractions are designed to support retrieval of data, from vector databases and other sources, for integration with LLM workflows. They are important for applications that fetch data to be reasoned over as part of model inference, as in the case of retrieval-augmented generation (RAG).

We will cover:
1. Documents
2. Vector Stores
3. Retrievers

### Documents
LangChain implements a `Document` abstraction, which represents a unit of text and its associated metadata. It has two attributes:
- `page_content`: a string holding the content
- `metadata`: a dict containing arbitrary metadata

The `metadata` attribute can capture information about the source of the document, its relationship to other documents, and other details. Note that an individual `Document` object often represents a chunk of a larger document.
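To make the chunking idea concrete, here is a minimal sketch that splits a longer text into overlapping pieces. It uses only the standard library: the `chunk_text` helper, its `chunk_size` and `overlap` parameters, and the `chunk_index` metadata key are assumptions for illustration and are not part of LangChain. Each record is a plain dict carrying the same two fields as a `Document`.

```python
def chunk_text(text, source, chunk_size=40, overlap=10):
    """Split text into overlapping character windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append({
            "page_content": text[start:start + chunk_size],
            # Record where the chunk came from and its position in the source.
            "metadata": {"source": source, "chunk_index": index},
        })
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

long_text = "Dogs are great companions. Cats are independent pets. Rabbits are social animals."
chunks = chunk_text(long_text, source="mammal-pets-doc")
for chunk in chunks:
    print(chunk["metadata"]["chunk_index"], repr(chunk["page_content"]))
```

Every chunk shares the same `source` value, which is how a retrieved chunk can later be traced back to the document it came from.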

In [1]:
## create a few sample documents
from langchain_core.documents import Document

documents=[
    Document(
        page_content="Dogs are great companions , known for their loyality and friendliness",
        metadata={"source":"mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space",
        metadata={"source":"mammal-pets-doc"},
    ),
    Document(
        page_content="Goldfish are popular pets for beginners,requiring relatively simple care",
        metadata={"source":"fish-pets-doc"},
    ),
    Document(
        page_content="Parrots are intelligent birds capable of mimiking human speech",
        metadata={"source":"bird-pets-doc"},
    ),
    Document(
        page_content="Rabbits are social animals that need plenty of space to hop around",
        metadata={"source":"mammal-pets-doc"},
    )
]

In [2]:
documents

[Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions , known for their loyality and friendliness'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space'),
 Document(metadata={'source': 'fish-pets-doc'}, page_content='Goldfish are popular pets for beginners,requiring relatively simple care'),
 Document(metadata={'source': 'bird-pets-doc'}, page_content='Parrots are intelligent birds capable of mimiking human speech'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around')]

In [3]:
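import os
from dotenv import load_dotenv
from langchain_groq import ChatGroq

## load API keys from a local .env file
load_dotenv()
groq_api_key=os.getenv("GROQ_API_KEY")

## export the Hugging Face token only if it is set; assigning None to os.environ raises a TypeError
hf_token=os.getenv("HF_LANGCHAIN_TOKEN")
if hf_token:
    os.environ["HF_LANGCHAIN_TOKEN"]=hf_token

llm=ChatGroq(model="Llama3-8b-8192",groq_api_key=groq_api_key)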



In [4]:
## embeddings with Hugging Face
from langchain_huggingface import HuggingFaceEmbeddings
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")



In [5]:
## build a Chroma vector store from the documents
from langchain_chroma import Chroma

vectorstore=Chroma.from_documents(documents,embedding=embeddings)
vectorstore



<langchain_chroma.vectorstores.Chroma at 0x2370fa2f040>

In [6]:
vectorstore.similarity_search("cat")

[Document(id='26420b1a-63c4-4e59-b100-cc9df5e0f201', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space'),
 Document(id='afcfee5c-e14b-4161-817a-a1b5ebbbb84f', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions , known for their loyality and friendliness'),
 Document(id='4412218b-2d31-4c69-b99b-38506cc5a24a', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around'),
 Document(id='1e4383cc-6fa8-4c43-8b9d-ec94277825ef', metadata={'source': 'bird-pets-doc'}, page_content='Parrots are intelligent birds capable of mimiking human speech')]

In [7]:
## async query
await vectorstore.asimilarity_search("loyal")

[Document(id='afcfee5c-e14b-4161-817a-a1b5ebbbb84f', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions , known for their loyality and friendliness'),
 Document(id='26420b1a-63c4-4e59-b100-cc9df5e0f201', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space'),
 Document(id='4412218b-2d31-4c69-b99b-38506cc5a24a', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around'),
 Document(id='1e4383cc-6fa8-4c43-8b9d-ec94277825ef', metadata={'source': 'bird-pets-doc'}, page_content='Parrots are intelligent birds capable of mimiking human speech')]

In [8]:
vectorstore.similarity_search_with_score("cat")

[(Document(id='26420b1a-63c4-4e59-b100-cc9df5e0f201', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space'),
  0.9436442255973816),
 (Document(id='afcfee5c-e14b-4161-817a-a1b5ebbbb84f', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions , known for their loyality and friendliness'),
  1.554835557937622),
 (Document(id='4412218b-2d31-4c69-b99b-38506cc5a24a', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around'),
  1.5573378801345825),
 (Document(id='1e4383cc-6fa8-4c43-8b9d-ec94277825ef', metadata={'source': 'bird-pets-doc'}, page_content='Parrots are intelligent birds capable of mimiking human speech'),
  1.627744197845459)]
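
The second element of each tuple is a distance, so a lower score means a closer match, which is why the cat document comes first. The sketch below reproduces that ranking logic on hand-made three-dimensional vectors using squared Euclidean distance. It assumes the store uses Chroma's default distance setting; the vectors and labels are made up for illustration and are not real embeddings.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy vectors, not real embeddings (assumption for illustration).
toy_index = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.6, 0.4, 0.1],
    "parrots": [0.1, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

# Sort ascending: the smallest distance is the best match.
ranked = sorted(
    ((label, squared_l2(query, vector)) for label, vector in toy_index.items()),
    key=lambda pair: pair[1],
)
for label, distance in ranked:
    print(f"{label}: {distance:.2f}")
```

Because the value is a distance rather than a similarity, any threshold applied to these scores should keep results below the cutoff, not above it.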

### Retrievers

LangChain vector store objects do not subclass `Runnable`, so they cannot be integrated directly into LangChain Expression Language (LCEL) chains.

LangChain retrievers are runnables, so they implement a standard set of methods (e.g., synchronous and asynchronous `invoke` and `batch` operations) and are designed to be incorporated into LCEL chains.
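
To give a feel for what that shared interface buys, here is a toy sketch of the idea in plain Python. It is not LangChain's implementation: `ToyRunnable` is a hypothetical stand-in that only mimics `invoke`, `batch`, and composition with `|`, and the small `corpus` dict is made-up data.

```python
class ToyRunnable:
    """Hypothetical stand-in for the runnable idea; not LangChain code."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        # Run the wrapped function on a single input.
        return self.func(value)

    def batch(self, values):
        # Run the wrapped function on each input, preserving order.
        return [self.invoke(value) for value in values]

    def __or__(self, other):
        # Compose two steps: the output of self feeds into other.
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))

# A fake "retriever" that looks up text in a small dict (made-up data).
corpus = {"cat": "Cats are independent pets", "dog": "Dogs are great companions"}
lookup = ToyRunnable(lambda query: corpus.get(query, ""))
shout = ToyRunnable(str.upper)

chain = lookup | shout
print(chain.invoke("cat"))
print(chain.batch(["cat", "dog"]))
```

Since every step exposes the same methods, a composed chain can be called exactly like a single step, which is the property that lets retrievers slot into LCEL chains.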

We can create a simple version of a retriever ourselves, without subclassing a retriever class. Once we choose which method to use to retrieve documents, we can easily wrap it in a runnable.



In [9]:
## wrap a vector store method in a runnable
from langchain_core.runnables import RunnableLambda

## quick workaround: this is not the preferred way to get a retriever from a function
retriever=RunnableLambda(vectorstore.similarity_search).bind(k=1)
retriever.batch(["cat","dog"])


[[Document(id='26420b1a-63c4-4e59-b100-cc9df5e0f201', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space')],
 [Document(id='afcfee5c-e14b-4161-817a-a1b5ebbbb84f', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions , known for their loyality and friendliness')]]

#### Second technique

Vector stores implement an `as_retriever` method that generates a retriever, specifically a `VectorStoreRetriever`. These retrievers include `search_type` and `search_kwargs` attributes that identify which method of the underlying vector store to call and how to parameterize it.
For instance, we can replicate the above with the following:

In [10]:
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k":1}
)
retriever.batch(["cat","dog","fish"])

[[Document(id='26420b1a-63c4-4e59-b100-cc9df5e0f201', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space')],
 [Document(id='afcfee5c-e14b-4161-817a-a1b5ebbbb84f', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions , known for their loyality and friendliness')],
 [Document(id='d8ca7ed4-cae2-43d0-b02e-e2e4a9ebd3de', metadata={'source': 'fish-pets-doc'}, page_content='Goldfish are popular pets for beginners,requiring relatively simple care')]]

In [11]:
## integrate the retriever into a chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message="""
Answer the question by provided context

{question}

context:
{context}
"""

## each message is a (role, template) tuple
prompt=ChatPromptTemplate.from_messages([("human",message)])

rag_chain={"context":retriever,"question":RunnablePassthrough()}|prompt|llm
response=rag_chain.invoke("tell me about dog")

response


AIMessage(content="Based on the provided context, here's some information about dogs:\n\nDogs are great companions, known for their loyalty and friendliness.", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 92, 'total_tokens': 120, 'completion_time': 0.02409142, 'prompt_time': 0.019756759, 'queue_time': 0.264591046, 'total_time': 0.043848179}, 'model_name': 'Llama3-8b-8192', 'system_fingerprint': 'fp_c0b3855449', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--10344e20-e6aa-49e5-8655-ac6849533121-0', usage_metadata={'input_tokens': 92, 'output_tokens': 28, 'total_tokens': 120})
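
In the chain above, the retriever output is formatted into the prompt as a list of `Document` objects, so the model sees their full representation, metadata included. A common refinement is to join only the `page_content` values before they reach the prompt. The sketch below shows such a helper; `format_docs` is a hypothetical name, and `SimpleNamespace` objects stand in for real documents so the snippet runs without LangChain.

```python
from types import SimpleNamespace

def format_docs(docs):
    """Join the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)

# Stand-ins exposing only the attribute the helper reads (assumption).
docs = [
    SimpleNamespace(page_content="Dogs are great companions"),
    SimpleNamespace(page_content="Cats are independent pets"),
]
context = format_docs(docs)
print(context)
```

With real objects, one way to wire this in would be `retriever | RunnableLambda(format_docs)` for the `context` key, and the answer text itself can be read from `response.content`.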