### How to work with ChromaDB and store documents in it

- Insert the documents into the vector database
- Run a similarity search over them
- Collect the retrieved chunks as context
- Provide that context, with scores, to the LLM
- Generate the response

In [5]:
from langchain_community.document_loaders import PyMuPDFLoader
import os
from dotenv import load_dotenv
load_dotenv()

True

In [6]:
# LangChain + ChromaDB
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document

# For ChromaDB
from langchain_community.vectorstores import Chroma
from typing import List
import os



In [7]:
strings_array = [
    "Python is renowned for its flexible data structures, which include lists, tuples, sets, and dictionaries. Lists are ordered collections that support dynamic resizing and a variety of methods for adding, removing, or searching elements. Tuples are similar to lists but immutable, ensuring that data remains unchanged once assigned and making them suitable for fixed data groupings. Dictionaries use key-value pairs allowing fast access, manipulation, and association of data by unique keys. Sets are collections of unordered, unique elements, great for removing duplicates and performing mathematical operations like unions and intersections. These data structures form the backbone for efficient algorithm development, making Python popular for data engineering, scientific computing, and rapid prototyping in diverse software projects."
    ,
    "Docker containers have revolutionized application deployment by encapsulating all dependencies within lightweight, portable units. The container lifecycle runs through image creation, build, run, and destruction, allowing consistency across environments from a developer’s laptop to cloud production servers. Networking and persistent storage are handled through Docker’s bridge networks and mounted volumes, often configured in a YAML file for orchestration. Security features, resource limits, and automated health checks help maintain uptime and isolation. Command-line tools and APIs provide granular control, while platforms like Kubernetes extend management to large-scale clusters. Docker’s architecture enables microservices, CI/CD pipelines, and efficient scaling for modern software infrastructure."
    ,
    "Vector databases have emerged as critical infrastructure for AI-driven applications by enabling the fast, approximate search of high-dimensional embeddings. Unlike traditional relational stores, vector DBs represent data points as mathematical vectors, supporting similarity queries, KNN search, and clustering. This approach underpins semantic retrieval in RAG systems, recommendation engines, and fraud detection. Technologies such as Pinecone, Weaviate, and ChromaDB offer APIs to store, update, and query embeddings generated by models like BERT or CLIP. They optimize for speed and scalability with techniques including approximate neighbors, distributed indexing, and GPU acceleration. Advanced filtering and metadata support enable hybrid retrieval for context-aware generative AI solutions."
    ,
    "SQL query optimization is crucial for scalable database operations. Indexing frequently searched columns is an essential strategy, but too many indexes can degrade write performance. Queries should avoid ‘SELECT *’, minimize joins to essential tables, and use WHERE clauses that leverage indexed columns. Tools like EXPLAIN PLAN visualize execution steps, guiding developers to restructure queries for efficiency. Partitioning large tables can improve access speed while reducing locking contention. Regularly updating table statistics ensures the optimizer selects the best execution path. Avoiding correlated subqueries and using batch processing techniques can reduce resource consumption. These approaches together lead to faster, more reliable database systems."
    ,
    "In Node.js, the event loop is a key mechanism that allows non-blocking I/O operations on a single thread. Each incoming request is delegated to the system kernel, freeing the JavaScript runtime to handle other events. Async callbacks are queued and executed when the kernel signals completion, drastically improving throughput. Promises and async/await syntax further simplify asynchronous code management, reducing callback hell and enhancing maintainability. Node.js excels in microservices, web servers, and real-time applications like chat or streaming services. Its event-driven model, combined with fast V8 execution, supports horizontal scaling and resource-efficient concurrency on modest hardware, making Node.js immensely popular for backend services."
]


In [8]:
# Store each sample in its own text file
import tempfile
temp_dir = tempfile.mkdtemp()

for i,doc in enumerate(strings_array):
    file_path = os.path.join(temp_dir, f"doc_{i}.txt")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(doc)

print("Doc created")

Doc created


In [None]:

# Document loading
from langchain_community.document_loaders import DirectoryLoader,TextLoader
load = DirectoryLoader(
    temp_dir,
    glob="*.txt",
    loader_cls=TextLoader,
    loader_kwargs={"encoding": "utf-8"}
)

documents=load.load()
for i,doc in enumerate(documents):
    print(f"Document {i}:\n{doc.page_content}")
    print(f"Metadata: {doc.metadata}")

### Text Splitting from docs


In [10]:
# Text Splitting
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter=RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=20,
    length_function=len,
    separators=[" "]
)
chunks=splitter.split_documents(documents)
print(f"Total chunks: {len(chunks)} of {len(documents)}")
print(f"Content: {chunks[0]}")
print(f"Metadata: {chunks[0].metadata}")

Total chunks: 15 of 5
Content: page_content='Python is renowned for its flexible data structures, which include lists, tuples, sets, and dictionaries. Lists are ordered collections that support dynamic resizing and a variety of methods for adding, removing, or searching elements. Tuples are similar to lists but immutable, ensuring that data' metadata={'source': 'C:\\Users\\ps19j\\AppData\\Local\\Temp\\tmpb9s9tz0u\\doc_0.txt'}
Metadata: {'source': 'C:\\Users\\ps19j\\AppData\\Local\\Temp\\tmpb9s9tz0u\\doc_0.txt'}
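To make the interaction between `chunk_size` and `chunk_overlap` more concrete, here is a minimal sliding-window sketch in plain Python. It is only an illustration: `split_with_overlap` is a hypothetical helper, and it does not reproduce `RecursiveCharacterTextSplitter`, which additionally tries to break at the configured separators rather than at fixed character offsets.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size sliding window (hypothetical helper, not the LangChain algorithm).

    Each chunk starts (chunk_size - chunk_overlap) characters after the previous one,
    so consecutive chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

The last three characters of each chunk reappear at the start of the next one, which is the role the 20-character overlap plays in the cell above.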


### Embedding

In [11]:
sample_text = "The quick brown fox jumps over the lazy dog."
embeddings=HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"

)
embeddings.embed_query(sample_text)

[0.043933555483818054,
 0.05893440172076225,
 0.04817839711904526,
 0.07754813879728317,
 0.02674437128007412,
 -0.03762954846024513,
 -0.0026051148306578398,
 -0.05994303897023201,
 -0.0024960089940577745,
 0.02207283116877079,
 0.048025909811258316,
 0.055755313485860825,
 -0.03894546255469322,
 -0.026616770774126053,
 0.0076933917589485645,
 -0.026237700134515762,
 -0.03641606494784355,
 -0.03781614825129509,
 0.07407816499471664,
 -0.04950505867600441,
 -0.05852171406149864,
 -0.06361967325210571,
 0.032435014843940735,
 0.022008540108799934,
 -0.07106371223926544,
 -0.03315779194235802,
 -0.06941041350364685,
 -0.05003739148378372,
 0.07462679594755173,
 -0.11113381385803223,
 -0.01230629812926054,
 0.03774565830826759,
 -0.02803134173154831,
 0.014535323716700077,
 -0.031558554619550705,
 -0.08058364689350128,
 0.05835256725549698,
 0.002590067917481065,
 0.0392802357673645,
 0.025769580155611038,
 0.049850545823574066,
 -0.0017562442226335406,
 -0.04552978649735451,
 0.029260773
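A vector like the one above is only useful when compared with other vectors. The sketch below shows cosine similarity, one common way to compare embeddings, on made-up 3-dimensional vectors. The vectors and names are assumptions for illustration; real `all-MiniLM-L6-v2` embeddings have 384 dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of three short texts (illustrative values)
v_cat = [1.0, 0.0, 1.0]
v_kitten = [0.9, 0.1, 0.8]
v_car = [0.0, 1.0, 0.0]

sim_close = cosine_similarity(v_cat, v_kitten)
sim_far = cosine_similarity(v_cat, v_car)
print(f"cat vs kitten: {sim_close:.3f}, cat vs car: {sim_far:.3f}")
```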

#### Storing into ChromaDB using HuggingFace embeddings

In [12]:
# Directory where ChromaDB persists the collection
persist_directory = "./chromaDB"

VECTOR_STORE = Chroma(
    persist_directory=persist_directory,
    embedding_function=HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    ),
    collection_name="Rag_collection"
)
# Note: the collection is persisted on disk, so re-running this cell adds the
# same chunks again and the vector count keeps growing.
VECTOR_STORE.add_documents(chunks)
print(f"Vector store created at {persist_directory}")
print(f"Number of vectors: {VECTOR_STORE._collection.count()}")

Vector store created at ./chromaDB
Number of vectors: 35
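The count above (35) is larger than the 15 chunks produced by the splitter, which would be consistent with the cell having been run more than once against the same persisted directory. One possible way to avoid such duplicates is to derive a stable ID from each chunk. The sketch below is an assumption-laden illustration: `file_name` and `chunk_id` are hypothetical helpers, and the commented-out call assumes that `add_documents` accepts an `ids` argument and that re-adding an existing ID overwrites the entry instead of appending a new one. Please verify both against the installed version.

```python
import hashlib


def file_name(path: str) -> str:
    """Return the last path component, accepting both / and \\ separators."""
    return path.replace("\\", "/").rsplit("/", 1)[-1]


def chunk_id(source: str, content: str) -> str:
    """Stable ID from file name + chunk text (hypothetical helper).

    Only the file name is hashed, because the temp directory in the full
    path changes on every run.
    """
    key = f"{file_name(source)}\n{content}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


# Same chunk written to two different temp directories yields the same ID
id_run_1 = chunk_id("C:\\Temp\\tmp_run_1\\doc_0.txt", "Python is renowned for ...")
id_run_2 = chunk_id("C:\\Temp\\tmp_run_2\\doc_0.txt", "Python is renowned for ...")
print(id_run_1 == id_run_2)

# Assumed usage with the store above (not executed here):
# ids = [chunk_id(c.metadata["source"], c.page_content) for c in chunks]
# VECTOR_STORE.add_documents(chunks, ids=ids)
```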


In [None]:
query="How to work with Node js?"
similar_chunks=VECTOR_STORE.similarity_search(query, k=3)
print(similar_chunks)
similar_chunks_with_score=VECTOR_STORE.similarity_search_with_score(query,k=3)
print(similar_chunks_with_score)  # each result is a (Document, score) tuple; with Chroma the score is a distance, so lower means more similar

[Document(metadata={'source': 'C:\\Users\\ps19j\\AppData\\Local\\Temp\\tmppjdesij8\\doc_4.txt'}, page_content='signals completion, drastically improving throughput. Promises and async/await syntax further simplify asynchronous code management, reducing callback hell and enhancing maintainability. Node.js excels in microservices, web servers, and real-time applications like chat or streaming services. Its'), Document(metadata={'source': 'C:\\Users\\ps19j\\AppData\\Local\\Temp\\tmpb9s9tz0u\\doc_4.txt'}, page_content='signals completion, drastically improving throughput. Promises and async/await syntax further simplify asynchronous code management, reducing callback hell and enhancing maintainability. Node.js excels in microservices, web servers, and real-time applications like chat or streaming services. Its'), Document(metadata={'source': 'C:\\Users\\ps19j\\AppData\\Local\\Temp\\tmppjdesij8\\doc_4.txt'}, page_content='In Node.js, the event loop is a key mechanism that allows non-blockin
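Conceptually, a similarity search ranks stored vectors by their distance to the query vector and keeps the `k` closest. The brute-force sketch below illustrates that idea with Euclidean distance on toy 2-dimensional vectors. It is a simplification under stated assumptions: the data and the `top_k` helper are made up, and a real vector store uses an approximate index and whatever distance metric the collection is configured with.

```python
import heapq
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int):
    """Return the k (text, distance) pairs closest to the query, nearest first."""
    scored = ((text, euclidean_distance(query_vec, vec)) for text, vec in store)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy store: (text, embedding) pairs with illustrative 2-d vectors
toy_store = [
    ("node event loop", [0.9, 0.1]),
    ("sql indexing", [0.1, 0.9]),
    ("node async callbacks", [0.8, 0.3]),
]

results = top_k([1.0, 0.0], toy_store, k=2)
for text, distance in results:
    print(f"{distance:.3f}  {text}")
```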

In [None]:
import os
from dotenv import load_dotenv  
load_dotenv()
OPENROUTER_API_KEY = os.getenv('OPENROUTER_API_KEY')
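`os.getenv` returns `None` when the variable is missing, and the failure then only shows up later as an authentication error. A small guard like the one below can surface the problem earlier. `require_env` is a hypothetical helper name, not part of `dotenv` or `os`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file"
        )
    return value


# Assumed usage in this notebook (not executed here):
# OPENROUTER_API_KEY = require_env("OPENROUTER_API_KEY")
```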


In [None]:
# Using a free LLM to learn how Agentic AI works
from openai import OpenAI
client = OpenAI(
    api_key=OPENROUTER_API_KEY,
    base_url="https://openrouter.ai/api/v1"
)

# First API call with reasoning
response = client.chat.completions.create(
  model="x-ai/grok-4.1-fast",
  messages=[
          {
            "role": "user",
            "content": "Can you explain Agentic AI to me?"
          }
        ],
  extra_body={"reasoning": {"enabled": True}}
)
response=response.choices[0].message.content
print(response)




In [14]:
from langchain_openai import ChatOpenAI
llm=ChatOpenAI(
    model="x-ai/grok-4.1-fast",
    api_key=OPENROUTER_API_KEY,
    base_url="https://openrouter.ai/api/v1"
)

response=llm.invoke("What is the capital of France?")
print(response)



content="**Paris** is the capital of France. It's located in the north-central part of the country along the Seine River and is famous for landmarks like the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral." additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 122, 'prompt_tokens': 163, 'total_tokens': 285, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 81, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'x-ai/grok-4.1-fast', 'system_fingerprint': None, 'id': 'gen-1764070297-udtHQqyafZRhDxw6Kz2M', 'finish_reason': 'stop', 'logprobs': None} id='lc_run--0fb0e2e8-30f6-4e79-9f51-5b8b816e3571-0' usage_metadata={'input_tokens': 163, 'output_tokens': 122, 'total_tokens': 285, 'input_token_details': {}, 'output_token_details': {'reasoning': 81}}
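Printing the whole message shows its metadata along with the text; `response.content` would give only the answer. The token counts in that metadata can be broken down with simple arithmetic. The sketch below copies the `usage_metadata` values from the output above and assumes that reasoning tokens are counted inside `output_tokens`, as OpenAI-style usage fields suggest.

```python
# Values copied from the usage_metadata printed above
usage_metadata = {
    "input_tokens": 163,
    "output_tokens": 122,
    "total_tokens": 285,
    "output_token_details": {"reasoning": 81},
}

# Assumption: reasoning tokens are a subset of output_tokens
reasoning_tokens = usage_metadata["output_token_details"].get("reasoning", 0)
visible_tokens = usage_metadata["output_tokens"] - reasoning_tokens

print(f"reasoning tokens: {reasoning_tokens}")
print(f"visible answer tokens: {visible_tokens}")
```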


# Using RAG Chains with ChromaDB

In [18]:
# Without using LangChain Expression Language (LCEL)
from langchain_classic.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_classic.chains.combine_documents import create_stuff_documents_chain

retriever=VECTOR_STORE.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},
)
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002563D8CFCB0>, search_kwargs={'k': 4})

In [23]:
# Creating custom prompts
system_prompt="""
Use the following chunks as reference for the answer of the question 
Provide a answer of nearly 200-300 words and use the chunks to gather the answer also use your own knowledge and experience

Context:
{context}

Answer:

"""

prompt=ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)



In [24]:
# Build the chain with create_stuff_documents_chain: it combines all the chunks, inserts them into the prompt, and passes the prompt to the LLM

document_chain=create_stuff_documents_chain(llm,prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\nUse the following chunks as reference for the answer of the question \nProvide a answer of nearly 200-300 words and use the chunks to gather the answer also use your own knowledge and experience\n\nContext:\n{context}\n\nAnswer:\n\n'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000002563FC0B890>, async_client=<openai.resources.chat.completions.complet

#### The full chain looks like this:
Input → Format Documents → Create Prompt → LLM Call → Parse Output<br>
**Internally, a RunnableBinding wraps the whole chain. It first gathers the context with format_docs, passes it to the ChatPromptTemplate, and sends the resulting prompt to ChatOpenAI. The model's response is then converted to a plain string by StrOutputParser().**

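- The format_docs() function is responsible for formatting the documents into a single string. A simplified version looks like this:
```
def format_docs(docs):
    # Join the text of every retrieved document, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)
```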

- The ChatPromptTemplate class is responsible for creating the prompt for the LLM. Its input variables match the placeholders used above, {context} and {input}.
```
ChatPromptTemplate(
    input_variables=['context', 'input'],
    messages=[
        SystemMessagePromptTemplate(
            prompt=PromptTemplate(
                template='''Use the following chunks as reference for the answer of the question 
Provide a answer of nearly 200-300 words and use the chunks to gather the answer also use your own knowledge and experience

Context:
{context}

Answer:

'''
            )
        ),
        HumanMessagePromptTemplate(
            prompt=PromptTemplate(template='{input}')
        )
    ]
)
```

- The ChatOpenAI class is responsible for calling the LLM. In this notebook it is configured for OpenRouter:
```ChatOpenAI(model="x-ai/grok-4.1-fast", api_key=OPENROUTER_API_KEY, base_url="https://openrouter.ai/api/v1")```

- The StrOutputParser class is responsible for parsing the output of the LLM into a plain string. It takes no arguments here.
```StrOutputParser()```

- The stages above are bound together with the pipe operator (|). Conceptually, the chain is equivalent to:
```document_chain = RunnablePassthrough.assign(context=lambda x: format_docs(x["context"])) | prompt | llm | StrOutputParser()```

- The chain is run by calling invoke with the retrieved documents and the question:
```document_chain.invoke({"context": docs, "input": question})```
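The stages listed above can be mimicked in plain Python to see how data flows from one to the next. This is a sketch under stated assumptions, not how LangChain implements the chain: `Doc`, `TEMPLATE`, `build_prompt`, `fake_llm`, `parse_output` and `stuff_chain` are all hypothetical names, and `fake_llm` is a stand-in that makes no network call.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (only page_content is needed)."""
    page_content: str


# Shortened, illustrative template with the same two placeholders
TEMPLATE = "Context:\n{context}\n\nQuestion: {input}"


def format_docs(docs: list[Doc]) -> str:
    # Stage 1: join all chunks into one context string
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(inputs: dict) -> str:
    # Stage 2: fill the placeholders with the context and the question
    return TEMPLATE.format(context=format_docs(inputs["context"]), input=inputs["input"])


def fake_llm(prompt: str) -> dict:
    # Stage 3: hypothetical stand-in for the model call, returns a message-like dict
    return {"content": f"(model reply to a prompt of {len(prompt)} characters)"}


def parse_output(message: dict) -> str:
    # Stage 4: keep only the text, like a string output parser would
    return message["content"]


def stuff_chain(inputs: dict) -> str:
    return parse_output(fake_llm(build_prompt(inputs)))


demo_inputs = {"context": [Doc("first chunk"), Doc("second chunk")], "input": "What is stored?"}
demo_prompt = build_prompt(demo_inputs)
demo_answer = stuff_chain(demo_inputs)
print(demo_prompt)
print(demo_answer)
```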


#### Final RAG with both chunks and ChromaDB

In [25]:
rag_chain=create_retrieval_chain(retriever,document_chain)
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002563D8CFCB0>, search_kwargs={'k': 4}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\nUse the following chunks as reference for the answer of the question \nProvide a answer of nearly 200-300 words and use the

In [None]:
# Final chain
# The retriever fetches chunks from the DB based on the query -> the retrieved chunks are passed as context
# -> the context is passed to the LLM
# -> the LLM generates a response based on the custom prompt

last_response=rag_chain.invoke({"input":"I want to know about vector database as a beginner"})
print(last_response)

{'input': 'I want to know about vector database as a beginner', 'context': [Document(metadata={'source': 'C:\\Users\\ps19j\\AppData\\Local\\Temp\\tmppjdesij8\\doc_2.txt'}, page_content='Vector databases have emerged as critical infrastructure for AI-driven applications by enabling the fast, approximate search of high-dimensional embeddings. Unlike traditional relational stores, vector DBs represent data points as mathematical vectors, supporting similarity queries, KNN search, and'), Document(metadata={'source': 'C:\\Users\\ps19j\\AppData\\Local\\Temp\\tmpb9s9tz0u\\doc_2.txt'}, page_content='Vector databases have emerged as critical infrastructure for AI-driven applications by enabling the fast, approximate search of high-dimensional embeddings. Unlike traditional relational stores, vector DBs represent data points as mathematical vectors, supporting similarity queries, KNN search, and'), Document(metadata={'source': 'C:\\Users\\ps19j\\AppData\\Local\\Temp\\tmppjdesij8\\doc_2.txt'}, pa

In [29]:
last_response['answer']

'### What is a Vector Database? A Beginner\'s Guide\n\nImagine you\'re trying to find a book in a massive library not by its title or author, but by how similar its content feels to what you\'re thinking—like "stories about brave adventurers in magical worlds." Traditional databases (like SQL ones) excel at exact matches: "Give me the book titled *Harry Potter*." But vector databases are built for **similarity searches**, treating data as points in a high-dimensional space.\n\nAt their core, vector databases store **embeddings**—numerical vectors (arrays of numbers) that represent things like text, images, or audio. For example, AI models like BERT or OpenAI\'s embeddings turn a sentence like "I love cats" into a vector [0.2, -0.5, 1.3, ...] capturing its meaning. These vectors live in hundreds or thousands of dimensions, where similar items cluster close together.\n\nUnlike relational databases with rows and tables, vector DBs use math like **cosine similarity** or **Euclidean distance** to find the **nearest neighbors (KNN)** quickly. They rely on **approximate nearest neighbors (ANN)** algorithms (e.g., HNSW or IVF) for speed on massive datasets—think billions of vectors—without scanning everything.\n\n**Why do they matter?** They\'re the backbone of modern AI apps:\n- **RAG (Retrieval-Augmented Generation)**: ChatGPT-like systems pull relevant docs for accurate answers.\n- **Recommendations**: Netflix suggests shows based on your tastes.\n- **Search & Fraud Detection**: Find similar images or spot anomalies.\n\nPopular open-source/free options: **ChromaDB** (super simple for local use), **Weaviate** (GraphQL-friendly with hybrid search), **Pinecone** (managed cloud service). Getting started? Install Chroma via pip, embed text with Hugging Face, and query in Python—it\'s beginner-friendly!\n\nVector DBs scale with GPU acceleration, distributed indexes, and metadata filtering for real-world apps. 
They\'re exploding in AI because they make "fuzzy" searches lightning-fast. Dive in with a tutorial; you\'ll see why they\'re essential infrastructure!\n\n*(Word count: 278)*'
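The dictionary returned by the chain holds the question under `input`, the retrieved documents under `context`, and the generated text under `answer`. A small helper can list which source files backed an answer. The sketch below uses `SimpleNamespace` objects as stand-ins for the retrieved documents, and `unique_sources` is a hypothetical helper; with the real result one would pass `last_response` directly.

```python
from types import SimpleNamespace


def unique_sources(result: dict) -> list[str]:
    """Collect distinct source file names from result['context'], in order of appearance."""
    seen = []
    for doc in result["context"]:
        path = doc.metadata.get("source", "")
        name = path.replace("\\", "/").rsplit("/", 1)[-1]  # handle Windows and POSIX paths
        if name and name not in seen:
            seen.append(name)
    return seen


# Stand-in result with the same three keys as the chain output (illustrative values)
demo_result = {
    "input": "I want to know about vector database as a beginner",
    "context": [
        SimpleNamespace(metadata={"source": "C:\\Temp\\tmp_a\\doc_2.txt"}, page_content="..."),
        SimpleNamespace(metadata={"source": "C:\\Temp\\tmp_b\\doc_2.txt"}, page_content="..."),
        SimpleNamespace(metadata={"source": "C:\\Temp\\tmp_a\\doc_3.txt"}, page_content="..."),
    ],
    "answer": "...",
}

print(unique_sources(demo_result))
```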