### Vector Stores & Retrievers

LangChain's vector store and retriever abstractions are designed to support retrieval of data, from (vector) databases and other sources, for integration with LLM workflows. They matter for applications that fetch data for the model to reason over at inference time, as in retrieval-augmented generation (RAG).
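
To make the retrieve-then-generate pattern concrete, here is a minimal, library-free sketch. It assumes a hypothetical keyword-overlap retriever (`retrieve`) in place of a real vector store and a hypothetical `build_prompt` helper that places the retrieved text in front of the question. In a LangChain pipeline those roles would be played by a retriever (for example one obtained from `vector_store.as_retriever()`) and a prompt template, and the resulting prompt would be sent to a chat model.

```python
import re
from typing import List


def words(text: str) -> set:
    """Lower-case word set; punctuation is dropped."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Hypothetical retriever: rank passages by how many query words they share."""
    query_words = words(query)
    # reverse=True keeps sort stability, so ties stay in corpus order
    ranked = sorted(corpus, key=lambda p: len(query_words & words(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Hypothetical helper: place the retrieved passages before the question."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


corpus = [
    "Datopic Technologies was incorporated on 7 February 2017.",
    "The registered office is in New Delhi.",
    "Noida is the primary technology and development hub.",
]
question = "When was Datopic Technologies incorporated?"
prompt = build_prompt(question, retrieve(question, corpus, k=1))
print(prompt)  # this string is what would be sent to the chat model
```

Only the best-matching passage reaches the prompt, which is the point of retrieval: the model reasons over a small, relevant slice of the data instead of the whole corpus.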

### Documents

LangChain implements a `Document` abstraction, which is intended to represent a unit of text and its associated metadata. It has two attributes:

- `page_content`: a string representing the content.
- `metadata`: a dict containing arbitrary metadata, such as the source of the document, its relationship to other documents, and any other information.

Note that an individual `Document` object often represents a chunk of a larger document.
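
The two-attribute shape can be mirrored with a plain dataclass. The sketch below uses a hypothetical stand-in (`SimpleDocument`, `chunk_document`), not LangChain's own class; it only illustrates why chunks usually inherit their parent's metadata, so each chunk can still be traced back to its source. In real code you would use `langchain_core.documents.Document`, which takes the same `page_content` and `metadata` keyword arguments.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document: text plus arbitrary metadata."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def chunk_document(doc: SimpleDocument, size: int) -> List[SimpleDocument]:
    """Hypothetical helper: cut a document into fixed-size pieces.

    Each chunk copies the parent's metadata and records its position,
    so it can be traced back to the larger document it came from.
    """
    chunks = []
    for index, start in enumerate(range(0, len(doc.page_content), size)):
        metadata = dict(doc.metadata)  # copy, so chunks never share one dict
        metadata["chunk_index"] = index
        chunks.append(SimpleDocument(doc.page_content[start:start + size], metadata))
    return chunks


parent = SimpleDocument("abcdefghij", {"source": "data/text.txt"})  # hypothetical path
for chunk in chunk_document(parent, size=4):
    print(chunk)
```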

In [None]:
import os
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain_chroma import Chroma
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
load_dotenv()

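groq_api_key = os.getenv("GROQ_API")
hf_token = os.getenv("HF_TOKEN")
if hf_token:  # os.environ only accepts str; assigning None would raise TypeError
    os.environ["HF_TOKEN"] = hf_token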

llm = ChatGroq(model="openai/gpt-oss-120b", groq_api_key=groq_api_key)

In [14]:
# data loading
from langchain_community.document_loaders import TextLoader
loader = TextLoader('C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt', encoding='utf-8')
docs = loader.load()
docs

[Document(metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technologies such as Artificial Intelligence (AI), Blo
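
`TextLoader` reads the whole file into a single `Document` whose `metadata` records the file path under `source`, which is why the output above is a one-element list. A library-free sketch of the same behaviour, assuming a hypothetical `load_text` helper that returns plain dicts shaped like documents and using a temporary file instead of a machine-specific path:

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_text(path: str, encoding: str = "utf-8") -> List[Dict[str, Any]]:
    """Hypothetical loader: one file becomes one document-like dict."""
    text = Path(path).read_text(encoding=encoding)
    return [{"page_content": text, "metadata": {"source": path}}]


# Demo with a temporary file rather than an absolute, user-specific path.
with tempfile.TemporaryDirectory() as tmp:
    file_path = str(Path(tmp) / "text.txt")
    Path(file_path).write_text(
        "1. Company Overview\n\nDatopic Technologies Private Limited", encoding="utf-8"
    )
    docs = load_text(file_path)
    print(docs[0]["metadata"], len(docs[0]["page_content"]))
```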

In [19]:
# data chunking

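from langchain_text_splitters import RecursiveCharacterTextSplitter

# Chunks of at most 2000 characters; neighbouring chunks share up to 100 characters
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)

chunks = splitter.split_documents(docs)  # each chunk keeps the source metadata
print(chunks)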


[Document(metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technologies such as Artificial Intelligence (AI), Blo
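
`chunk_size` caps the length of each chunk (measured in characters by default) and `chunk_overlap` is how much neighbouring chunks may share, so text near a boundary keeps some context in both chunks. `RecursiveCharacterTextSplitter` first tries to break on natural separators (paragraphs, then lines, then words) before falling back to single characters. The sketch below deliberately swaps in a simpler fixed-window splitter (hypothetical `split_text`) that ignores separators, just to show how size and overlap interact:

```python
from typing import List


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Hypothetical fixed-window splitter (NOT the recursive algorithm).

    Consecutive windows start chunk_size - chunk_overlap characters apart,
    so each pair of neighbours shares chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

With the notebook's settings (`chunk_size=2000`, `chunk_overlap=100`), this sketch would cut a 5,000-character text into three windows starting at characters 0, 1,900 and 3,800; the real splitter's boundaries will differ because it prefers separator positions.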

In [20]:
# embeddings
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

# Sentence-transformers model used to embed both the chunks and later queries
embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

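
An embedding model maps text to a fixed-length vector so that similar texts land close together; `all-MiniLM-L6-v2` produces 384-dimensional vectors. As a rough, library-free illustration, the sketch below swaps in a hypothetical bag-of-words embedding over a fixed vocabulary (`build_vocab`, `embed`) and compares vectors with cosine similarity. Unlike a real sentence-transformer, it only captures shared words, not meaning.

```python
import math
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def build_vocab(texts: List[str]) -> List[str]:
    """Hypothetical vocabulary: every distinct word in the corpus, sorted."""
    return sorted({word for text in texts for word in tokenize(text)})


def embed(text: str, vocab: List[str]) -> List[float]:
    """Hypothetical embedding: word counts over the vocabulary, L2-normalised.

    Words outside the vocabulary are ignored, so every vector has len(vocab) entries.
    """
    index = {word: i for i, word in enumerate(vocab)}
    vector = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in index:
            vector[index[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


corpus = ["Datopic has a development hub in Noida", "Registrar of Companies, Delhi"]
vocab = build_vocab(corpus)
query = embed("Datopic office in Noida", vocab)
near, far = (embed(text, vocab) for text in corpus)
print(len(query), cosine(query, near), cosine(query, far))
```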


In [21]:
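# storing chunks in a vector store (Chroma)

# Index the chunks, not the whole-file docs, so searches return focused passages;
# with no persist_directory the collection lives only in memory
vector_store = Chroma.from_documents(chunks, embedding)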
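
`Chroma.from_documents` embeds every document with the given embedding model and stores the vectors alongside the text and metadata; `similarity_search` then embeds the query and returns the closest documents (`k` defaults to 4). The sketch below is a hypothetical, brute-force `InMemoryStore` with a word-count stand-in embedding, meant only to show that flow; Chroma itself uses an approximate nearest-neighbour (HNSW) index rather than scoring every row.

```python
import math
import re
from collections import Counter
from typing import Any, Callable, Dict, List, Optional, Tuple


def bag_of_words(text: str) -> Counter:
    """Hypothetical stand-in embedding: sparse word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical minimal vector store holding (text, metadata, vector) rows."""

    def __init__(self, embed: Callable[[str], Counter]) -> None:
        self.embed = embed
        self.rows: List[Tuple[str, Dict[str, Any], Counter]] = []

    def add_texts(self, texts: List[str], metadatas: Optional[List[Dict[str, Any]]] = None) -> None:
        # Embed once at insert time; searches only embed the query string.
        metadatas = metadatas or [{} for _ in texts]
        for text, metadata in zip(texts, metadatas):
            self.rows.append((text, metadata, self.embed(text)))

    def similarity_search(self, query: str, k: int = 4) -> List[Tuple[str, Dict[str, Any]]]:
        # Brute force: score every stored row, best match first.
        query_vector = self.embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(query_vector, row[2]), reverse=True)
        return [(text, metadata) for text, metadata, _ in ranked[:k]]


store = InMemoryStore(bag_of_words)
store.add_texts(
    ["Datopic was incorporated in 2017.", "The registered office is in New Delhi.", "Noida is the development hub."],
    [{"chunk": 0}, {"chunk": 1}, {"chunk": 2}],
)
print(store.similarity_search("where is the registered office", k=1))
```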


In [None]:
vector_store.similarity_search("datopic")

[Document(id='6647de1e-f478-4815-b358-d273a1586887', metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technologie

In [23]:
# async query

await vector_store.asimilarity_search("datopic")

[Document(id='6647de1e-f478-4815-b358-d273a1586887', metadata={'source': 'C:\\Users\\pouru\\Desktop\\LangChain\\Lanchain_AgenticAI\\data\\text.txt'}, page_content='1. Company Overview\n\nDatopic Technologies Private Limited is an India-based emerging technology and innovation-driven IT services company, incorporated on 7 February 2017 under the Companies Act, 2013. Its Corporate Identification Number (CIN) is U74999DL2017PTC312445, and it is registered with the Registrar of Companies, Delhi.\n\nThe company’s registered office is located at RZ-77A, Dabri Extension, Main Palam Road, New Delhi – 110045, India. Operationally, Datopic has also established a significant presence in Noida, Uttar Pradesh, serving as its primary technology and development hub.\n\nDatopic Technologies is guided by the vision of “transforming data into intelligence and innovation into action.” Its central focus lies in building next-generation digital solutions for enterprises, leveraging cutting-edge technologie
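
`asimilarity_search` is the asynchronous counterpart of `similarity_search`. Top-level `await` works in the cell above because Jupyter already runs an event loop; in a plain script you would wrap the call in a coroutine and start it with `asyncio.run`. LangChain's base `VectorStore` provides a default async implementation that runs the synchronous search in a thread executor, so several queries can overlap. The sketch below imitates that idea with a hypothetical blocking `slow_search` (a sleep stands in for embedding and lookup) wrapped via `asyncio.to_thread`:

```python
import asyncio
import time
from typing import List

CORPUS = [
    "Datopic was incorporated in 2017.",
    "The registered office is in New Delhi.",
    "Noida is the development hub.",
]


def slow_search(query: str) -> List[str]:
    """Hypothetical blocking search: the sleep stands in for embedding + lookup."""
    time.sleep(0.2)
    words = set(query.lower().split())
    return [text for text in CORPUS if words & set(text.lower().split())]


async def asearch(query: str) -> List[str]:
    """Async wrapper: run the blocking search in a worker thread."""
    return await asyncio.to_thread(slow_search, query)


async def main() -> List[List[str]]:
    # Both searches run concurrently, so the total wait is about one sleep, not two.
    return await asyncio.gather(asearch("datopic"), asearch("noida"))


start = time.perf_counter()
results = asyncio.run(main())  # inside Jupyter, use `await main()` instead
elapsed = time.perf_counter() - start
print(results, round(elapsed, 2))
```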