#### Vector stores and retrievers

In this notebook we get familiar with LangChain's vector store and retriever abstractions. These abstractions are designed to support retrieval of data from vector databases and other sources for integration with LLM workflows. They are important for applications that fetch data to be reasoned over as part of model inference, as in the case of retrieval-augmented generation (RAG).

We will cover:
- Document
- Vector stores 
- Retrievers
 


### Document

The documents module supports data retrieval and processing workflows.

It provides core abstractions for handling data in retrieval-augmented generation (RAG) pipelines, vector stores, and document processing workflows.


**Key distinction:**

##### Documents (this module): for data retrieval and processing workflows

- Vector stores, retrievers, RAG pipelines
- Text chunking, embedding, and semantic search
- Example: chunks of a PDF stored in a vector database

##### Content Blocks (messages.content): for LLM conversational I/O

- Multimodal message content sent to/from models
- Tool calls, reasoning, citations within chat
- Example: an image sent to a vision model in a chat message (via ImageContentBlock)
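
To make the "chunks of a PDF" example on the documents side more concrete, here is a minimal sketch of text chunking in plain Python. The `Chunk` class and `split_text` helper are hypothetical stand-ins written for this illustration, not LangChain APIs, and the character-window strategy is only one of many possible ways to split text.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_text(text: str, source: str, chunk_size: int = 40, overlap: int = 10) -> list:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for i, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + chunk_size]
        chunks.append(Chunk(piece, {"source": source, "chunk": i}))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


text = "LangChain integrates with vector databases. " * 3
chunks = split_text(text, source="docs")
for c in chunks:
    print(c.metadata, repr(c.page_content))
```

Each chunk keeps its own metadata, so a hit returned by a search can be traced back to where it came from. The overlap means a sentence cut at a window boundary still appears whole in at least one chunk, provided it is no longer than the overlap.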

In [41]:
from langchain_core.documents import Document

documents = [
    Document(
        page_content="LangChain supports prompt templates and chains.",
        metadata={"source": "docs", "page": 1}
    ),
    Document(
        page_content="LangChain integrates with vector databases.",
        metadata={"source": "docs", "page": 2}
    ),
    Document(
        page_content="LangChain provides memory and message history for conversational applications.",
        metadata={"source": "docs", "page": 3}
    ),
    Document(
        page_content="LangChain enables Retrieval-Augmented Generation (RAG) by combining LLMs with external knowledge.",
        metadata={"source": "docs", "page": 4}
    ),
    Document(
        page_content="LangChain supports agents that can decide which tools to use based on user queries.",
        metadata={"source": "docs", "page": 5}
    ),
    Document(
        page_content="Dogs are loyal and friendly animals that have been domesticated for thousands of years. They are known for their strong bonds with humans and their ability to be trained for various tasks.",
        metadata={"source": "animals", "page": 6}
    ),
    Document(
        page_content="Dogs come in many breeds with different sizes, colors, and temperaments. Popular breeds include Labrador Retrievers, German Shepherds, Golden Retrievers, and Bulldogs.",
        metadata={"source": "animals", "page": 7}
    ),
    Document(
        page_content="Dogs require regular exercise, proper nutrition, and veterinary care to stay healthy. They are social animals that thrive on interaction with their owners and other dogs.",
        metadata={"source": "animals", "page": 8}
    )
]


In [42]:
documents

[Document(metadata={'source': 'docs', 'page': 1}, page_content='LangChain supports prompt templates and chains.'),
 Document(metadata={'source': 'docs', 'page': 2}, page_content='LangChain integrates with vector databases.'),
 Document(metadata={'source': 'docs', 'page': 3}, page_content='LangChain provides memory and message history for conversational applications.'),
 Document(metadata={'source': 'docs', 'page': 4}, page_content='LangChain enables Retrieval-Augmented Generation (RAG) by combining LLMs with external knowledge.'),
 Document(metadata={'source': 'docs', 'page': 5}, page_content='LangChain supports agents that can decide which tools to use based on user queries.'),
 Document(metadata={'source': 'animals', 'page': 6}, page_content='Dogs are loyal and friendly animals that have been domesticated for thousands of years. They are known for their strong bonds with humans and their ability to be trained for various tasks.'),
 Document(metadata={'source': 'animals', 'page': 7}, 

In [43]:
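import os
from dotenv import load_dotenv

# Step 1: load API keys from a local .env file into the environment
load_dotenv()

# load_dotenv() already exports the keys; fail early with a clear message if the Groq key is missing
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set")

## LangSmith tracing (only enabled when a key is present)
if os.getenv("LANGCHAIN_API_KEY"):
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = "LangchainFramework"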


In [44]:
## Step 2: configure the model
from langchain_groq import ChatGroq

# ChatGroq reads GROQ_API_KEY from the environment (loaded in step 1)
llm = ChatGroq(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    temperature=0
)


In [45]:
## Quick check that the model responds
response = llm.invoke("what is python?")
print(response.content)

**Python** is a high-level, interpreted programming language that is widely used for various purposes such as:

* **Web Development**: Building web applications and web services using popular frameworks like Django and Flask.
* **Data Analysis and Science**: Performing data analysis, machine learning, and data visualization using libraries like NumPy, pandas, and scikit-learn.
* **Automation**: Automating tasks and processes using Python scripts.
* **Game Development**: Creating games using libraries like Pygame and Panda3D.
* **Education**: Teaching programming concepts and principles due to its simplicity and readability.

**Key Features of Python:**

* **Easy to Learn**: Python has a simple syntax and is relatively easy to learn, making it a great language for beginners.
* **High-Level Language**: Python abstracts away many low-level details, allowing developers to focus on the logic of their program.
* **Interpreted Language**: Python code is executed line by line, making it easier

In [46]:
## Step 3: load the Hugging Face embedding model all-MiniLM-L6-v2
from langchain_huggingface import HuggingFaceEmbeddings

# The model is downloaded and run locally; all-MiniLM-L6-v2 is public,
# so a Hub token is only needed for gated or private models
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

In [47]:
!pip install -U langchain-chroma

Collecting langchain-chroma
  Using cached langchain_chroma-1.1.0-py3-none-any.whl.metadata (1.9 kB)
Collecting chromadb<2.0.0,>=1.3.5 (from langchain-chroma)
  Using cached chromadb-1.4.1-cp39-abi3-win_amd64.whl.metadata (7.3 kB)
Collecting build>=1.0.3 (from chromadb<2.0.0,>=1.3.5->langchain-chroma)
  Using cached build-1.4.0-py3-none-any.whl.metadata (5.8 kB)
Collecting pybase64>=1.4.1 (from chromadb<2.0.0,>=1.3.5->langchain-chroma)
  Using cached pybase64-1.4.3-cp314-cp314-win_amd64.whl.metadata (9.1 kB)
Collecting posthog<6.0.0,>=2.4.0 (from chromadb<2.0.0,>=1.3.5->langchain-chroma)
  Using cached posthog-5.4.0-py3-none-any.whl.metadata (5.7 kB)
INFO: pip is looking at multiple versions of chromadb to determine which version is compatible with other requirements. This could take a while.
Collecting chromadb<2.0.0,>=1.3.5 (from langchain-chroma)
  Using cached chromadb-1.4.0-cp39-abi3-win_amd64.whl.metadata (7.3 kB)
  Using cached chromadb-1.3.7-cp39-abi3-win_amd64.whl.metadata (7.

ERROR: Cannot install langchain-chroma because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts


In [48]:


from langchain_community.vectorstores import FAISS

# Embed every document and build an in-memory FAISS index
vectorstore = FAISS.from_documents(documents, embeddings)
vectorstore


<langchain_community.vectorstores.faiss.FAISS at 0x1cadb89bed0>

In [49]:
## simple query
vectorstore.similarity_search('cat')

[Document(id='37bb4191-6106-4fd1-85c0-9d7feda58032', metadata={'source': 'animals', 'page': 6}, page_content='Dogs are loyal and friendly animals that have been domesticated for thousands of years. They are known for their strong bonds with humans and their ability to be trained for various tasks.'),
 Document(id='fc6ae43f-6446-44cf-bdfb-bfafee281780', metadata={'source': 'animals', 'page': 7}, page_content='Dogs come in many breeds with different sizes, colors, and temperaments. Popular breeds include Labrador Retrievers, German Shepherds, Golden Retrievers, and Bulldogs.'),
 Document(id='07fae842-87d9-4d0d-98d8-f7420bd09070', metadata={'source': 'animals', 'page': 8}, page_content='Dogs require regular exercise, proper nutrition, and veterinary care to stay healthy. They are social animals that thrive on interaction with their owners and other dogs.'),
 Document(id='45f1d284-5129-461a-97d3-971e6e7b7471', metadata={'source': 'docs', 'page': 3}, page_content='LangChain provides memory 

In [50]:
## Async query
await vectorstore.asimilarity_search('cat')

[Document(id='37bb4191-6106-4fd1-85c0-9d7feda58032', metadata={'source': 'animals', 'page': 6}, page_content='Dogs are loyal and friendly animals that have been domesticated for thousands of years. They are known for their strong bonds with humans and their ability to be trained for various tasks.'),
 Document(id='fc6ae43f-6446-44cf-bdfb-bfafee281780', metadata={'source': 'animals', 'page': 7}, page_content='Dogs come in many breeds with different sizes, colors, and temperaments. Popular breeds include Labrador Retrievers, German Shepherds, Golden Retrievers, and Bulldogs.'),
 Document(id='07fae842-87d9-4d0d-98d8-f7420bd09070', metadata={'source': 'animals', 'page': 8}, page_content='Dogs require regular exercise, proper nutrition, and veterinary care to stay healthy. They are social animals that thrive on interaction with their owners and other dogs.'),
 Document(id='45f1d284-5129-461a-97d3-971e6e7b7471', metadata={'source': 'docs', 'page': 3}, page_content='LangChain provides memory 

In [51]:
## Similarity search that also returns a score for each hit.
## With this FAISS store the score is a distance, so a lower score means a closer match.
vectorstore.similarity_search_with_score('cat')

[(Document(id='37bb4191-6106-4fd1-85c0-9d7feda58032', metadata={'source': 'animals', 'page': 6}, page_content='Dogs are loyal and friendly animals that have been domesticated for thousands of years. They are known for their strong bonds with humans and their ability to be trained for various tasks.'),
  np.float32(1.5415628)),
 (Document(id='fc6ae43f-6446-44cf-bdfb-bfafee281780', metadata={'source': 'animals', 'page': 7}, page_content='Dogs come in many breeds with different sizes, colors, and temperaments. Popular breeds include Labrador Retrievers, German Shepherds, Golden Retrievers, and Bulldogs.'),
  np.float32(1.6842835)),
 (Document(id='07fae842-87d9-4d0d-98d8-f7420bd09070', metadata={'source': 'animals', 'page': 8}, page_content='Dogs require regular exercise, proper nutrition, and veterinary care to stay healthy. They are social animals that thrive on interaction with their owners and other dogs.'),
  np.float32(1.7256761)),
 (Document(id='45f1d284-5129-461a-97d3-971e6e7b7471'
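
The scores in the output above grow as the hits get less relevant, which is how a distance behaves rather than a similarity. As far as I know, the LangChain FAISS wrapper defaults to Euclidean (L2) distance, so the smallest score is the best match. The sketch below illustrates that ranking with hand-made 3-dimensional vectors; the vectors and the `l2_distance` helper are made up for illustration and are not real embeddings or FAISS calls.

```python
import math

# Hypothetical 3-dimensional "embeddings"; real models produce hundreds of dimensions.
toy_vectors = {
    "dogs are loyal": [0.9, 0.1, 0.0],
    "dog breeds": [0.8, 0.3, 0.1],
    "prompt templates": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]  # stand-in for the embedding of a query such as "cat"


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Sort ascending: the smallest distance is the closest match.
ranked = sorted(
    ((text, l2_distance(vec, query_vector)) for text, vec in toy_vectors.items()),
    key=lambda pair: pair[1],
)
for text, distance in ranked:
    print(f"{distance:.3f}  {text}")
```

Because nothing in the document set is about cats, the "best" hits are simply the least distant ones. Inspecting the scores is a quick way to notice when a query has no genuinely close match.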

### Retriever

A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them.

***Retrievers accept a string query as input and return a list of Document objects as output.***
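
To illustrate that a retriever only has to map a query string to a list of documents, here is a minimal sketch that uses no vector store at all and ranks by shared words. `Doc` and `KeywordRetriever` are hypothetical names for this illustration, not LangChain classes; the `invoke` and `batch` method names merely mirror the style of the Runnable interface.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document (not the LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they contain."""

    def __init__(self, docs, k=1):
        self.docs = docs
        self.k = k

    def invoke(self, query: str) -> list:
        words = set(query.lower().split())
        scored = [
            (len(words & set(d.page_content.lower().split())), d) for d in self.docs
        ]
        # Keep only documents sharing at least one word, best match first.
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [d for _, d in hits[: self.k]]

    def batch(self, queries):
        # Run each query independently and collect the results in order.
        return [self.invoke(q) for q in queries]


docs = [
    Doc("Dogs are loyal and friendly animals", {"source": "animals"}),
    Doc("LangChain integrates with vector databases", {"source": "docs"}),
]
retriever_sketch = KeywordRetriever(docs, k=1)
results = retriever_sketch.batch(["friendly dogs", "vector databases"])
print(results)
```

Unlike the embedding-based search, this toy version returns an empty list when no words overlap, so a query such as "cat" retrieves nothing.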


Note: in LangChain, retrievers are Runnables, so they implement a standard set of methods (e.g., synchronous and asynchronous invoke and batch operations) and are designed to be incorporated in LCEL chains.

We can create a simple version of this ourselves, without subclassing Retriever. Once we choose which method we want to use to retrieve documents, we can easily create a runnable. Below we build one around the similarity_search method.

In [52]:
## RunnableLambda turns any function into a Runnable that works with LCEL (LangChain Expression Language).
## Why use it: we get a custom retriever that fits into LCEL without inheriting from any complex class,
## so it can be composed with other steps using the | operator.
from langchain_core.runnables import RunnableLambda

# bind(k=1) fixes the number of results returned per query
retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1)
retriever.batch(["cat", "dog"])

[[Document(id='37bb4191-6106-4fd1-85c0-9d7feda58032', metadata={'source': 'animals', 'page': 6}, page_content='Dogs are loyal and friendly animals that have been domesticated for thousands of years. They are known for their strong bonds with humans and their ability to be trained for various tasks.')],
 [Document(id='37bb4191-6106-4fd1-85c0-9d7feda58032', metadata={'source': 'animals', 'page': 6}, page_content='Dogs are loyal and friendly animals that have been domesticated for thousands of years. They are known for their strong bonds with humans and their ability to be trained for various tasks.')]]

This method is simpler and more standard than the RunnableLambda approach above. LangChain vector stores provide a built-in method, as_retriever(), that does all of this work automatically.

***Vectorstores*** implement an as_retriever method that generates a retriever, specifically a VectorStoreRetriever. These retrievers include specific search_type and search_kwargs attributes that identify which methods of the underlying vector store to call and how to parameterize them. For instance, we can replicate the above with the following:

In [53]:
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1}
)

retriever.batch(["cat", "dog"])

[[Document(id='37bb4191-6106-4fd1-85c0-9d7feda58032', metadata={'source': 'animals', 'page': 6}, page_content='Dogs are loyal and friendly animals that have been domesticated for thousands of years. They are known for their strong bonds with humans and their ability to be trained for various tasks.')],
 [Document(id='37bb4191-6106-4fd1-85c0-9d7feda58032', metadata={'source': 'animals', 'page': 6}, page_content='Dogs are loyal and friendly animals that have been domesticated for thousands of years. They are known for their strong bonds with humans and their ability to be trained for various tasks.')]]

In [None]:
## RAG
from operator import itemgetter

from langchain_core.prompts import ChatPromptTemplate

message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

# Build the prompt from a single human message
prompt = ChatPromptTemplate.from_messages([
    ("human", message)
])

# RAG chain: the retriever fills {context}; the question is passed through unchanged
rag_chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question")
    }
    | prompt
    | llm
)

response = rag_chain.invoke({"question": "tell me about dogs"})
print(response.content)


According to the provided context, dogs are loyal and friendly animals that have been domesticated for thousands of years. They are known for their strong bonds with humans and their ability to be trained for various tasks.
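
In the chain above, the retriever's output (a list of Document objects) is placed into the prompt as-is, so the model sees the objects' Python representation, metadata included. A common variation is to join only the page_content of each hit before building the prompt. The sketch below shows that step in isolation; `Doc` is a stand-in class and `format_docs` is a hypothetical helper name, neither taken from the notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join the text of each document, separated by blank lines."""
    return "\n\n".join(d.page_content for d in docs)


retrieved = [
    Doc("Dogs are loyal and friendly animals.", {"source": "animals", "page": 6}),
    Doc("Dogs come in many breeds.", {"source": "animals", "page": 7}),
]
context = format_docs(retrieved)
prompt_text = (
    "Answer this question using the provided context only.\n\n"
    "tell me about dogs\n\n"
    f"Context:\n{context}"
)
print(prompt_text)
```

Assuming the usual LCEL behaviour of coercing plain functions into runnables, a helper like this could be slotted into the chain as `itemgetter("question") | retriever | format_docs`. That keeps ids and metadata out of the prompt and spends fewer tokens on context.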
