# Vector store and retrievers

1. Documents
2. Vector store
3. Retrievers

# 1. Documents

LangChain implements a Document abstraction, which is intended to represent a unit of text and associated metadata. It has two attributes:

- `page_content`: a string representing the content;
- `metadata`: a dict containing arbitrary metadata.

The `metadata` attribute can capture information about the source of the document, its relationship to other documents, and other information. Note that an individual `Document` object often represents a chunk of a larger document.

In [2]:
from langchain_core.documents import Document

documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata={"source": "fish-pets-doc"},
    ),
    Document(
        page_content="Parrots are intelligent birds capable of mimicking human speech.",
        metadata={"source": "bird-pets-doc"},
    ),
    Document(
        page_content="Rabbits are social animals that need plenty of space to hop around.",
        metadata={"source": "mammal-pets-doc"},
    ),
]

In [3]:
documents

[Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
 Document(metadata={'source': 'fish-pets-doc'}, page_content='Goldfish are popular pets for beginners, requiring relatively simple care.'),
 Document(metadata={'source': 'bird-pets-doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around.')]

# 2. Vector store

In [10]:
import os
from dotenv import load_dotenv
load_dotenv()


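groq_api_key = os.getenv("GROQ_API_KEY")

# Re-export HF_TOKEN only if it is set; assigning None to os.environ raises a TypeError.
hf_token = os.getenv("HF_TOKEN")
if hf_token:
    os.environ["HF_TOKEN"] = hf_token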

In [11]:
from langchain_chroma import Chroma
from langchain_groq import ChatGroq

In [12]:
# Instantiate the Groq-hosted chat model

llm=ChatGroq(groq_api_key=groq_api_key,model="Llama3-8b-8192")
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000001CCEDF3FE80>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001CCEDF82230>, model_name='Llama3-8b-8192', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [15]:
# Convert text into vectors using Hugging Face embeddings
from langchain_huggingface import HuggingFaceEmbeddings
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

In [16]:
vector_store=Chroma.from_documents(documents,embedding=embeddings)

In [17]:
vector_store

<langchain_chroma.vectorstores.Chroma at 0x1ccedda33a0>

In [20]:
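# Async similarity search: called without `await`, it only returns a coroutine object.
# Use `await vector_store.asimilarity_search("cat")` to get the documents.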

vector_store.asimilarity_search("cat")

<coroutine object VectorStore.asimilarity_search at 0x000001CC9E016570>
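
The output above is a coroutine object rather than a list of documents because the async method was called without `await`. A minimal sketch of the difference, using a hypothetical stand-in `fake_asimilarity_search` (not part of LangChain) so that it runs without a vector store:

```python
import asyncio


# Hypothetical stand-in for an async search method; not a LangChain API.
async def fake_asimilarity_search(query: str) -> list[str]:
    await asyncio.sleep(0)  # yield control once, as a real I/O call would
    return [f"result for {query!r}"]


# Calling without await only creates the coroutine; nothing has run yet.
pending = fake_asimilarity_search("cat")
is_coroutine = asyncio.iscoroutine(pending)
pending.close()  # discard it cleanly to avoid a "never awaited" warning

# Running the coroutine to completion yields the actual results.
results = asyncio.run(fake_asimilarity_search("cat"))
print(is_coroutine, results)
```

This sketch is written as a standalone script. In a notebook cell, which already runs inside an event loop, the equivalent would be `await vector_store.asimilarity_search("cat")` rather than `asyncio.run(...)`.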

In [22]:
# Similarity search with score

vector_store.similarity_search_with_score("cat")

[(Document(id='ed9ad70f-833a-4f1f-8573-fc7618ee1773', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
  0.9351057410240173),
 (Document(id='920d4de9-7a0c-49ec-9208-0b268252efed', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
  1.574089765548706),
 (Document(id='61474ef6-3227-49c5-9d82-ec1fea626af6', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around.'),
  1.5956902503967285),
 (Document(id='3b08e10c-96b9-4ea8-9ee5-36dfd8a1da5d', metadata={'source': 'bird-pets-doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech.'),
  1.665792465209961)]
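
The scores above behave like distances rather than similarities: the closest match (the sentence about cats) has the smallest value. A minimal sketch of that ranking, assuming squared Euclidean distance as the metric and using invented 3-dimensional vectors in place of real embeddings:

```python
# Toy vectors standing in for embeddings; the numbers are invented for illustration.
toy_index = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.6, 0.5, 0.1],
    "parrots": [0.1, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Sort ascending: the smallest distance is the best match.
ranked = sorted(
    ((name, squared_l2(query_vector, vec)) for name, vec in toy_index.items()),
    key=lambda pair: pair[1],
)
for name, distance in ranked:
    print(f"{name}: {distance:.2f}")
```

With these toy numbers the order is cats, dogs, parrots, mirroring how the real results are listed from smallest to largest score.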

# 3. Retrievers

In [25]:
# Retrievers are Runnables, so they implement a standard set of methods and are designed to be incorporated in LCEL chains.

from typing import List

from langchain_core.documents import Document
from langchain_core.runnables import RunnableLambda

In [26]:
retriever=RunnableLambda(vector_store.similarity_search).bind(k=1)
retriever.batch(["cat","dog"])

[[Document(id='ed9ad70f-833a-4f1f-8573-fc7618ee1773', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.')],
 [Document(id='920d4de9-7a0c-49ec-9208-0b268252efed', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]]
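
Conceptually, `.bind(k=1)` fixes a keyword argument ahead of time and `.batch(...)` applies the runnable to each input in a list. A rough plain-Python analogy, where `toy_search` is a hypothetical substring matcher and not how the vector store actually searches:

```python
from functools import partial

corpus = [
    "Dogs are great companions, known for their loyalty and friendliness.",
    "Cats are independent pets that often enjoy their own space.",
    "Parrots are intelligent birds capable of mimicking human speech.",
]


# Hypothetical stand-in for similarity_search: naive substring matching.
def toy_search(query, k=4):
    hits = [text for text in corpus if query.lower() in text.lower()]
    return hits[:k]


# Analogous to .bind(k=1): pre-fill the k argument.
top1 = partial(toy_search, k=1)

# Analogous to .batch([...]): one result list per query.
batch_results = [top1(query) for query in ["cat", "dog"]]
print(batch_results)
```

The real Runnable interface adds more than this analogy shows, such as `invoke` and async variants, but the shape of the result (a list of result lists) is the same.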

In [27]:
# Second approach: build the retriever with the vector store's as_retriever method

retriever=vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k":1}
)
retriever.batch(["cat","dog"])

[[Document(id='ed9ad70f-833a-4f1f-8573-fc7618ee1773', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.')],
 [Document(id='920d4de9-7a0c-49ec-9208-0b268252efed', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]]

In [28]:
# RAG: answer a question using only the retrieved context
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""
prompt = ChatPromptTemplate.from_messages([("human", message)])

rag_chain={"context":retriever,"question":RunnablePassthrough()}|prompt|llm

response=rag_chain.invoke("tell me about dogs")
print(response.content)


According to the provided context, dogs are great companions, known for their loyalty and friendliness.


In [30]:
response=rag_chain.invoke("tell me about parrot")
print(response.content)

According to the provided context, parrots are intelligent birds capable of mimicking human speech.
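
The first step of the chain, `{"context": retriever, "question": RunnablePassthrough()}`, runs the retriever on the input and passes the input through unchanged; the resulting dict then fills the prompt template. A rough plain-Python analogy, in which `toy_retriever` and `build_prompt` are hypothetical helpers and the retriever returns a hard-coded string rather than `Document` objects:

```python
message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""


# Hypothetical stand-in for the retriever: always returns the same single hit.
def toy_retriever(question):
    return ["Dogs are great companions, known for their loyalty and friendliness."]


def build_prompt(question):
    # Mirrors the dict step: "context" comes from the retriever,
    # "question" is the original input passed through unchanged.
    inputs = {"context": toy_retriever(question), "question": question}
    return message.format(**inputs)


prompt_text = build_prompt("tell me about dogs")
print(prompt_text)
```

The printed text is what a model would receive in this simplified picture, which is why the answers above stay close to the wording of the retrieved sentence.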
