### Vector Stores & Retrievers

LangChain's vector store and retriever abstractions support retrieving data from vector databases and other sources for use in LLM workflows. They matter for applications that fetch data for the model to reason over at inference time, as in retrieval-augmented generation (RAG).

### Documents

LangChain implements a Document abstraction, which represents a unit of text and its associated metadata. It has two attributes:

page_content: a string representing the content

metadata: a dict containing arbitrary metadata. The metadata attribute can capture information about the source of the document, its relationship to other documents, and other information. Note that an individual Document object often represents a chunk of a larger document.
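As a rough illustration of the two attributes, the sketch below uses a small stand-in dataclass rather than LangChain's own class. `SimpleDocument`, the file name, and the `chunk` key are hypothetical, chosen only to show how a chunk can carry metadata that points back to its source.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SimpleDocument:
    """Hypothetical stand-in mirroring the two attributes described above."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# Two chunks of the same (made-up) source file, told apart by their metadata.
chunk_a = SimpleDocument("First part of the text.", {"source": "example.txt", "chunk": 0})
chunk_b = SimpleDocument("Second part of the text.", {"source": "example.txt", "chunk": 1})

same_source = chunk_a.metadata["source"] == chunk_b.metadata["source"]
print(same_source)
```

Keeping the source in the metadata is what lets a later retrieval step report where an answer came from.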

In [1]:
import os
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain_chroma import Chroma
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
load_dotenv()

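groq_api_key = os.getenv("GROQ_API")
# Only export HF_TOKEN when it is set; assigning None to os.environ raises a TypeError.
hf_token = os.getenv("HF_TOKEN")
if hf_token:
    os.environ["HF_TOKEN"] = hf_token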

llm = ChatGroq(model="openai/gpt-oss-120b", groq_api_key=groq_api_key)


In [2]:
# data loading
from langchain_community.document_loaders import TextLoader
loader = TextLoader('C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt', encoding='utf-8')
docs = loader.load()
docs

[Document(metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technologies such as Artificial Intelligence (AI), Blo

In [3]:
# data chunking

from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)

chunks = splitter.split_documents(docs)
print(chunks)


[Document(metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technologies such as Artificial Intelligence (AI), Blo
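To make the roles of chunk_size and chunk_overlap more concrete, here is a minimal fixed-window sketch. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as paragraphs and lines); `split_with_overlap` is a hypothetical helper that only illustrates how consecutive chunks share a few characters.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


toy_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(toy_chunks)  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears, at least partly, in both neighbouring chunks.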

In [4]:
# embeddings
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings(model="all-MiniLM-L6-v2")

In [None]:
# storing the chunks (not the unsplit docs) in a vector store (Chroma)
vector_store = Chroma.from_documents(chunks, embedding)


In [10]:
res = vector_store.similarity_search("datopic")
print(res)

[Document(id='d65dc97b-6b8f-4059-8638-697e1a71e05c', metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technologie

In [11]:
# async query

res = await vector_store.asimilarity_search("datopic")
print(res)

[Document(id='d65dc97b-6b8f-4059-8638-697e1a71e05c', metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technologie

In [13]:
vector_store.similarity_search_with_score("datopic")

[(Document(id='d65dc97b-6b8f-4059-8638-697e1a71e05c', metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technologi
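Conceptually, a similarity search embeds the query and ranks the stored vectors by how close they are to it. The sketch below shows that idea with hand-written three-dimensional vectors and cosine similarity. The vectors, the texts, and the `top_k` helper are made up for illustration and say nothing about how Chroma indexes data.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], store: Dict[str, List[float]], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k stored texts most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; a real model such as all-MiniLM-L6-v2 produces far longer vectors.
toy_store = {
    "company overview": [0.9, 0.1, 0.0],
    "cybersecurity services": [0.1, 0.9, 0.2],
    "blockchain services": [0.0, 0.2, 0.9],
}

ranked = top_k([0.8, 0.2, 0.1], toy_store, k=2)
print([text for text, _ in ranked])  # ['company overview', 'cybersecurity services']
```

Depending on the vector store and its configuration, the score returned by similarity_search_with_score may be a distance (lower is closer) rather than a similarity, so it is worth checking the integration's documentation before comparing scores.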

### Retrievers

LangChain VectorStore objects do not subclass Runnable, so they cannot be used directly in LangChain Expression Language (LCEL) chains.

LangChain Retrievers are runnables, so they implement a standard set of methods (e.g., synchronous and asynchronous invoke and batch operations) and are designed to be incorporated in LCEL chains.
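The idea of a shared interface can be sketched in plain Python. `ToyRunnable` below is a hypothetical, heavily simplified imitation, not LangChain's implementation. It only shows why a common set of methods (invoke, batch, ainvoke) plus a pipe operator makes components easy to compose.

```python
import asyncio
from typing import Any, Callable, List


class ToyRunnable:
    """Hypothetical, simplified imitation of a runnable; not LangChain's implementation."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def batch(self, values: List[Any]) -> List[Any]:
        # Sequential for simplicity; a real implementation may run inputs concurrently.
        return [self.invoke(v) for v in values]

    async def ainvoke(self, value: Any) -> Any:
        # Run the synchronous function in a worker thread so the event loop is not blocked.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.invoke, value)

    def __or__(self, other: "ToyRunnable") -> "ToyRunnable":
        # a | b feeds the output of a into b, mirroring the pipe syntax used in chains.
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


to_upper = ToyRunnable(str.upper)
add_mark = ToyRunnable(lambda text: text + "!")
chain = to_upper | add_mark

single = chain.invoke("datopic")
many = chain.batch(["ai", "blockchain"])
print(single, many)  # DATOPIC! ['AI!', 'BLOCKCHAIN!']
```

In a notebook, the asynchronous variant would be called as `await chain.ainvoke("datopic")`.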


In [None]:
# Wrapping a vector store method in a RunnableLambda works, but it is not the recommended way to build a retriever

from langchain_core.runnables import RunnableLambda

retriever = RunnableLambda(vector_store.similarity_search_with_score).bind(k=1)
retriever.batch(["datopic", "cybersecurity"])

[[(Document(id='d65dc97b-6b8f-4059-8638-697e1a71e05c', metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technolog

VectorStores implement an as_retriever method that generates a Retriever, specifically a VectorStoreRetriever. These retrievers include search_type and search_kwargs attributes that identify which methods of the underlying vector store to call and how to parameterize them.
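One way to picture this is as a simple dispatch: search_type selects a method on the store, and search_kwargs is passed through as keyword arguments. The classes below are hypothetical toys (the word-overlap ranking stands in for real embeddings) and are only meant to sketch that relationship.

```python
from typing import Any, Dict, List, Optional


class ToyVectorStore:
    """Hypothetical store that ranks texts by words shared with the query."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, search_type: str = "similarity",
                     search_kwargs: Optional[Dict[str, Any]] = None) -> "ToyRetriever":
        return ToyRetriever(self, search_type, search_kwargs or {})


class ToyRetriever:
    """Remembers which store method to call and with which keyword arguments."""

    def __init__(self, store: ToyVectorStore, search_type: str, search_kwargs: Dict[str, Any]) -> None:
        self.store = store
        self.search_type = search_type
        self.search_kwargs = search_kwargs

    def invoke(self, query: str) -> List[str]:
        if self.search_type == "similarity":
            return self.store.similarity_search(query, **self.search_kwargs)
        raise ValueError(f"unsupported search_type: {self.search_type}")

    def batch(self, queries: List[str]) -> List[List[str]]:
        return [self.invoke(query) for query in queries]


store = ToyVectorStore(["ai and data analytics", "blockchain consulting", "cybersecurity audits"])
toy_retriever = store.as_retriever(search_type="similarity", search_kwargs={"k": 1})
results = toy_retriever.batch(["blockchain", "cybersecurity"])
print(results)  # [['blockchain consulting'], ['cybersecurity audits']]
```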

In [24]:
# Recommended way to build a retriever and query a vector DB

retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k":1}
)

retriever.batch(["AI", "Blockchain", "cybersecurity"])


[[Document(id='d65dc97b-6b8f-4059-8638-697e1a71e05c', metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technologi

In [None]:
# Basic RAG
# integrating retrievers in a chain

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer the user's question using the provided context only.

{question}

context:
{context}
"""

prompt = ChatPromptTemplate.from_messages([("human", message)])
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
response = rag_chain.invoke("what is datopic?")
print(response)

content='Datopic\u202f—\u202fshort for **Datopic Technologies Private Limited** — is an India‑based, emerging‑technology and innovation‑driven IT services company. Incorporated on\u202f7\u202fFebruary\u202f2017, it is registered in Delhi (CIN\u202fU74999DL2017PTC312445) with its operational hub in Noida, Uttar\u202fPradesh.  \n\nThe firm’s vision is “transforming data into intelligence and innovation into action,” and it focuses on building next‑generation digital solutions for enterprises using technologies such as artificial intelligence, blockchain, cloud computing, cybersecurity, and data analytics. With a lean team of roughly 20–50 professionals, Datopic delivers services across product engineering, AI & data analytics, cybersecurity, blockchain, and strategic IT consulting, targeting sectors like finance, healthcare, government, and retail.' additional_kwargs={'reasoning_content': 'The user asks: "what is datopic?" Must answer using provided context only. Summarize definition: Da
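The data flow of the chain above can be traced with plain functions: the same question is sent to every entry of the dict, the results fill the prompt template, and the filled prompt goes to the model. Everything in this sketch (`toy_retrieve`, `toy_llm`, `run_parallel`, the sample context) is a hypothetical stand-in, so it shows the flow only, not LangChain's internals.

```python
from typing import Any, Callable, Dict, List

PROMPT_TEMPLATE = """Answer the user's question using the provided context only.

{question}

context:
{context}
"""


def toy_retrieve(question: str) -> List[str]:
    """Stand-in retriever that always returns one made-up passage."""
    return ["Datopic is an IT services company."]


def toy_llm(prompt_text: str) -> str:
    """Stand-in model that only reports the size of the prompt it received."""
    return f"(reply to a prompt of {len(prompt_text)} characters)"


def run_parallel(steps: Dict[str, Callable[[str], Any]], value: str) -> Dict[str, Any]:
    """Feed the same input to every step and collect the results by key."""
    return {name: step(value) for name, step in steps.items()}


def build_prompt(question: str) -> str:
    inputs = run_parallel({"context": toy_retrieve, "question": lambda q: q}, question)
    context_text = "\n\n".join(inputs["context"])  # join retrieved passages into one string
    return PROMPT_TEMPLATE.format(question=inputs["question"], context=context_text)


prompt_text = build_prompt("what is datopic?")
answer = toy_llm(prompt_text)
print(prompt_text)
```

In the real chain the model returns a message object, so `response.content` gives just the answer text without the additional metadata shown in the printed output.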