# Building a RAG System with LangChain and ChromaDB

## Introduction
Retrieval-Augmented Generation (RAG) is a powerful technique that combines the capabilities of large language models with external knowledge retrieval. This notebook will walk you through building a complete RAG system using:

- LangChain: A framework for developing applications powered by language models
- ChromaDB: An open-source vector database for storing and retrieving embeddings
- OpenAI: Supplies the embedding model and the chat model (other providers can be substituted)

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
## langchain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain.schema import Document

## vectorstores
from langchain_community.vectorstores import Chroma

## utility imports
import numpy as np
from typing import List

### RAG (Retrieval-Augmented Generation) Architecture


1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context
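
The eight steps above can be sketched end to end without any external service. This is a toy illustration only: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the in-memory list `toy_store` stands in for ChromaDB, and the last step builds the prompt rather than calling an LLM.

```python
import math
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Steps 1-2: load documents and split them (each string is already one chunk)
toy_chunks = [
    "supervised learning uses labeled data to train models",
    "convolutional neural networks are effective for image processing",
    "transformers use attention mechanisms to understand context",
]

# Steps 3-4: embed every chunk and keep (chunk, vector) pairs as the "vector store"
toy_store = [(chunk, embed(chunk)) for chunk in toy_chunks]

def retrieve(query, k=1):
    """Steps 5-6: embed the query and return the k most similar chunks."""
    query_vec = embed(query)
    ranked = sorted(toy_store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query, k=1):
    """Step 7: combine the retrieved chunks with the question."""
    context = "\n".join(retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Step 8 would send this prompt to an LLM; here it is only printed
print(build_prompt("which networks are good for image processing"))
```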

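Benefits of RAG:
- Reduces hallucinations by grounding answers in retrieved text
- Provides up-to-date information without retraining the model, as long as the document store is kept current
- Allows citing the sources each answer was built from
- Works with domain-specific knowledge the model was not trained on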

## Create Sample Data

In [5]:
## create sample documents
sample_docs = [
    """
    Machine Learning Fundamentals
    
    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine learning: supervised learning, unsupervised learning, and reinforcement 
    learning. Supervised learning uses labeled data to train models, while unsupervised 
    learning finds patterns in unlabeled data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties.
    """,
    
    """
    Deep Learning and Neural Networks
    
    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning has revolutionized fields like computer vision, natural language 
    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly 
    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers 
    excel at sequential data processing.
    """,
    
    """
    Natural Language Processing (NLP)
    
    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, 
    machine translation, and question answering. Modern NLP heavily relies on transformer 
    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand 
    context and relationships between words in text.
    """
]

sample_docs


['\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.\n    ',
 '\n    Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective f

In [None]:
# ------------------------------------------------------------
# Save Sample Documents to Temporary Text Files
# ------------------------------------------------------------
# This script demonstrates how to:
# 1. Create a temporary directory using Python's `tempfile` module.
# 2. Write multiple text documents into that directory.
# 3. Print the directory path where files are stored.
# ------------------------------------------------------------

import tempfile  # Provides utilities for creating temporary files and directories

# Step 1: Create a temporary directory
# ------------------------------------
# `tempfile.mkdtemp()` creates a unique temporary directory.
# It is not removed automatically; delete it yourself (e.g. with `shutil.rmtree`) when done.
temp_dir = tempfile.mkdtemp()

# Step 2: Write each document to a separate text file
# ---------------------------------------------------
# `enumerate(sample_docs)` gives both index (i) and content (doc)
# Each document is saved as: doc_0.txt, doc_1.txt, doc_2.txt, etc.
for i, doc in enumerate(sample_docs):
    # Open a new text file in write mode ('w'), using the same encoding the loader expects
    with open(f"{temp_dir}/doc_{i}.txt", "w", encoding="utf-8") as f:
        # Write the content of the document into the file
        f.write(doc)

# Step 3: Print the directory path
# --------------------------------
# Display the temporary directory location to help locate saved documents.
print(f"Sample documents created in: {temp_dir}")


In [None]:
## save sample documents to the "data" folder that the loader below reads from
os.makedirs("data", exist_ok=True)

for i,doc in enumerate(sample_docs):
    with open(os.path.join("data",f"doc_{i}.txt"),"w",encoding="utf-8") as f:
        f.write(doc)

In [7]:
temp_dir

'C:\\Users\\win10\\AppData\\Local\\Temp\\tmp36cdbi28'

## Document Loading

In [None]:
from langchain_community.document_loaders import DirectoryLoader,TextLoader

# ------------------------------------------------------------
# Load Documents from a Directory
# ------------------------------------------------------------
# This script demonstrates how to:
# 1. Use LangChain's DirectoryLoader to load multiple text files.
# 2. Specify custom file-matching patterns and loader configurations.
# 3. Preview the content of loaded documents.
# ------------------------------------------------------------

# Step 1: Initialize the Directory Loader
# ---------------------------------------
# DirectoryLoader helps automatically load all files from a directory.
# Parameters:
# - "data" : Path to the folder containing text files.
# - glob="*.txt" : File matching pattern; only loads `.txt` files.
# - loader_cls=TextLoader : Specifies that each file is loaded as text.
# - loader_kwargs : Extra arguments passed to the TextLoader (e.g., encoding type).
loader = DirectoryLoader(
    "data", 
    glob="*.txt", 
    loader_cls=TextLoader,
    loader_kwargs={'encoding': 'utf-8'}  # Ensures text files are correctly decoded
)

# Step 2: Load all matching documents
# -----------------------------------
# The `load()` method reads all the matched files and converts them
# into LangChain `Document` objects, each with metadata and content.
documents = loader.load()

# Step 3: Display summary information
# -----------------------------------
# Print how many documents were successfully loaded.
print(f"Loaded {len(documents)} documents")

# Step 4: Preview the first document
# -----------------------------------
# Display a small snippet of the first document's content.
print("\nFirst document preview:")
print(documents[0].page_content[:200] + "...")  # Show first 200 characters



Loaded 3 documents

First document preview:

    Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. Ther...


In [9]:
documents

[Document(metadata={'source': 'data\\doc_0.txt'}, page_content='\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.\n    '),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='\n    Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natur

## Document Splitting

In [None]:
# ------------------------------------------------------------
# Text Splitting into Manageable Chunks
# ------------------------------------------------------------
# This script demonstrates how to:
# 1. Initialize LangChain's RecursiveCharacterTextSplitter.
# 2. Divide large documents into smaller, overlapping chunks.
# 3. Preserve contextual flow across chunks.
# ------------------------------------------------------------

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Step 1: Initialize the Text Splitter
# ------------------------------------
# RecursiveCharacterTextSplitter works through an ordered list of separators,
# falling back to the next one when a piece is still longer than chunk_size.
# Parameters:
# - chunk_size: Maximum number of characters in each chunk.
# - chunk_overlap: Number of overlapping characters between adjacent chunks.
#   (Useful to retain context for models like embeddings or LLMs.)
# - length_function: Function used to measure text length (here we use `len`).
# - separators: Ordered list of strings to split on. Only " " is given here, so
#   splits fall on spaces; omit the argument to use the default
#   ["\n\n", "\n", " ", ""], which prefers paragraph and line breaks.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,       # Each chunk holds at most 500 characters
    chunk_overlap=50,     # Up to 50 characters are repeated between adjacent chunks
    length_function=len,  # Length is measured in number of characters
    separators=[" "]      # Split only on spaces
)

# Step 2: Split the documents
# ----------------------------
# The splitter divides each document into multiple smaller segments,
# returning a list of `Document` objects, each containing a chunk of text
# along with inherited metadata (e.g., source file).
chunks = text_splitter.split_documents(documents)

# Step 3: Display results
# ------------------------
# Provide an overview of how many chunks were created and preview one of them.
print(f"Created {len(chunks)} chunks from {len(documents)} documents")

# Step 4: Inspect an example chunk
# --------------------------------
# Show a short snippet from the first chunk for verification.
print("\nChunk example:")
print(f"Content: {chunks[0].page_content[:150]}...")  # First 150 characters
print(f"Metadata: {chunks[0].metadata}")               # Associated metadata


Created 5 chunks from 3 documents

Chunk example:
Content: Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experie...
Metadata: {'source': 'data\\doc_0.txt'}


In [11]:
chunks

[Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through'),
 Document(metadata={'source': 'data\\doc_0.txt'}, page_content='data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of in
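
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class splits on separators and merges the pieces); `split_with_overlap` is a hypothetical helper that only illustrates how overlap repeats the tail of one chunk at the head of the next.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return pieces

demo = "abcdefghijklmnopqrstuvwxyz"
for piece in split_with_overlap(demo, chunk_size=10, chunk_overlap=3):
    print(piece)
```

With a window of 10 and an overlap of 3, each printed piece starts with the last three characters of the previous one, which is the continuity that `chunk_overlap=50` provides in the cell above.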

### Embedding Models

In [12]:
# load_dotenv() already loaded the key; fail early with a clear message if it is missing
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"

In [13]:
sample_text="Machine learning is fascinating"
embeddings=OpenAIEmbeddings()
embeddings

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x000001D9674BECF0>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x000001D9674BF620>, model='text-embedding-ada-002', dimensions=None, deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base=None, openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)

In [None]:
vector=embeddings.embed_query(sample_text)

### Initialize the ChromaDB Vector Store and Store the Chunks as Vectors

In [None]:
# ------------------------------------------------------------
# Create and Persist a Chroma Vector Store
# ------------------------------------------------------------
# This section creates a local vector database using Chroma
# and stores text embeddings for efficient retrieval.
# ------------------------------------------------------------

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Directory path where the vector store will be saved
persist_directory = "./chroma_db"

# Build the Chroma store from the text chunks; embeddings are computed during this call
# - `documents`: list of chunked text data
# - `embedding`: embedding model (OpenAI embeddings used here)
# - `persist_directory`: location to save the vector index
# - `collection_name`: logical grouping name for the stored vectors
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory=persist_directory,
    collection_name="rag_collection"
)

# Display summary information
print(f"Vector store created with {vectorstore._collection.count()} vectors")
print(f"Persisted to: {persist_directory}")



Vector store created with 10 vectors
Persisted to: ./chroma_db
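
Note that the store reports 10 vectors although only 5 chunks were created. A likely explanation is that the cell was run twice against the same `persist_directory`, so every chunk was added a second time. One possible way to avoid this is to derive a deterministic id for each chunk and pass the list through the `ids` argument of `Chroma.from_documents`; this assumes the store updates an existing entry when it sees an id it already holds. The helper `stable_chunk_id` below is hypothetical and not part of LangChain or Chroma.

```python
import hashlib

def stable_chunk_id(source, content):
    """Hypothetical helper: the same source and text always give the same id."""
    digest = hashlib.sha256(f"{source}\n{content}".encode("utf-8")).hexdigest()
    return digest[:16]

# Stand-ins for (metadata["source"], page_content) of two chunks
demo_chunks = [
    ("data/doc_0.txt", "Machine Learning Fundamentals"),
    ("data/doc_1.txt", "Deep Learning and Neural Networks"),
]
ids = [stable_chunk_id(source, content) for source, content in demo_chunks]
print(ids)

# With real chunks the ids could be built the same way and handed to the store:
# ids = [stable_chunk_id(c.metadata["source"], c.page_content) for c in chunks]
# Chroma.from_documents(documents=chunks, embedding=..., ids=ids, ...)
```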


## Test Similarity Search

In [None]:
# ------------------------------------------------------------
# Query the Chroma Vector Store
# ------------------------------------------------------------
# Perform a semantic similarity search to find documents
# most relevant to a given natural language query.
# ------------------------------------------------------------

# Define a natural language query
query = "What are the types of machine learning?"

# Perform similarity search
# - Converts the query into an embedding vector
# - Retrieves the top-k most similar document chunks
# - Returns LangChain `Document` objects with content and metadata
similar_docs = vectorstore.similarity_search(query, k=3)

# Display results
print(f"✅ Retrieved {len(similar_docs)} relevant chunks:\n")
for i, doc in enumerate(similar_docs, 1):
    print(f"🔹 Result {i}:")
    print(doc.page_content[:200] + "...")  # Preview first 200 characters
    print(f"Metadata: {doc.metadata}\n")



[Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through'),
 Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, w

In [18]:
query="what is NLP?"

similar_docs=vectorstore.similarity_search(query,k=3)
similar_docs

[Document(metadata={'source': 'data\\doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.'),
 Document(metadata={'source': 'data\\doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mec

In [19]:
query="what is Deep Learning?"

similar_docs=vectorstore.similarity_search(query,k=3)
similar_docs

[Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neura

In [20]:
print(f"Query: {query}")
print(f"\nTop {len(similar_docs)} similar chunks:")
for i, doc in enumerate(similar_docs):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:200] + "...")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")

Query: what is Deep Learning?

Top 3 similar chunks:

--- Chunk 1 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...
Source: data\doc_1.txt

--- Chunk 2 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...
Source: data\doc_1.txt

--- Chunk 3 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...
Source: data\doc_0.txt


## Advanced Similarity Search With Scores

In [None]:
# ------------------------------------------------------------
# Perform similarity search with scores
# ------------------------------------------------------------
# Returns the top-k most similar documents along with
# their scores. Chroma reports a distance, so lower = more similar.
# Each result is a tuple: (Document, score)
# ------------------------------------------------------------

results_scores = vectorstore.similarity_search_with_score(query, k=3)
results_scores


[(Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
  0.23813432455062866),
 (Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recogn

## Understanding Similarity Scores

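The score returned by `similarity_search_with_score` measures how close a document chunk is to your query in vector space. With Chroma it is a distance, so its meaning depends on the distance metric the collection uses:

ChromaDB default: L2 (computed by Chroma as the squared Euclidean distance)

- Lower scores = MORE similar (closer in vector space)
- Score of 0 = identical vectors
- Range: 0 to 4 for unit-length embeddings such as OpenAI's, with most scores in practice between 0 and 2; unbounded for vectors that are not normalized


Cosine (if the collection is configured with it):

- Chroma returns cosine distance (1 - cosine similarity), so lower scores are still MORE similar
- Range: 0 to 2 (0 for vectors pointing in the same direction)
- The underlying cosine similarity ranges from -1 to 1 (1 meaning the same direction)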
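
The relationship between these metrics can be checked numerically. The sketch below uses small made-up vectors (not real embeddings) and only the standard library. For unit-length vectors, squared L2 distance equals `2 - 2 * cosine_similarity`, so both metrics rank results in the same order.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance: 0 for identical vectors, larger = less similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 when they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings", each scaled to unit length
query_vec = [1 / 3, 2 / 3, 2 / 3]
near_vec = [2 / 3, 2 / 3, 1 / 3]
far_vec = [-2 / 3, 1 / 3, -2 / 3]

for name, vec in [("near", near_vec), ("far", far_vec)]:
    d2 = squared_l2(query_vec, vec)
    cos = cosine_similarity(query_vec, vec)
    # For unit vectors: squared L2 == 2 - 2 * cosine similarity
    print(f"{name}: squared_l2={d2:.3f}  cosine_sim={cos:.3f}  2-2cos={2 - 2 * cos:.3f}")
```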

## Initialize the LLM, Prompt Template, and RAG Chain, then Query the RAG System

In [22]:
from langchain_openai import ChatOpenAI

llm=ChatOpenAI(
    model_name="gpt-3.5-turbo"
)


In [23]:
test_response=llm.invoke("What is Large Language Models")
test_response

AIMessage(content="Large Language Models (LLMs) are a type of artificial intelligence model that have the ability to generate and understand human language at a large scale. These models are trained on massive amounts of text data, allowing them to produce coherent and human-like text in responses to prompts or queries. Some well-known examples of LLMs include OpenAI's GPT-3 and Google's BERT. These models have a wide range of applications, including natural language processing, text generation, and dialogue generation.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 99, 'prompt_tokens': 12, 'total_tokens': 111, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-Bx8e1mrx70gI0XqLI3sdVxQDO8J2Z', 'service_tier':

In [24]:
from langchain.chat_models import init_chat_model

# The model string has the form "provider:model_name"
llm=init_chat_model("openai:gpt-3.5-turbo")
#llm=init_chat_model("groq:")
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001D96AA93110>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001D96AA92FD0>, root_client=<openai.OpenAI object at 0x000001D96AB19810>, root_async_client=<openai.AsyncOpenAI object at 0x000001D96AB196E0>, model_kwargs={}, openai_api_key=SecretStr('**********'))

In [25]:
llm.invoke("What is AI")

AIMessage(content='AI, or artificial intelligence, is the simulation of human intelligence processes by machines. These processes include learning, reasoning, problem-solving, perception, and language understanding. AI technologies are used in a wide range of applications, such as speech recognition, natural language processing, image recognition, and robotics.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 58, 'prompt_tokens': 10, 'total_tokens': 68, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-Bx8e8mjKa0JsflCAr1zExfdW3ASlJ', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--2715a612-f118-4c13-8de1-6b72506ae34c-0', usage_metadata={'input_tokens': 10, 'output_tokens': 58, 

### Modern RAG Chain

In [26]:
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

In [None]:
# ------------------------------------------------------------
# Create a Retriever from the Vector Store
# ------------------------------------------------------------
# A retriever provides an easy interface to fetch the most
# relevant document chunks for a given query.
# ------------------------------------------------------------

retriever = vectorstore.as_retriever(
    search_kwargs={"k": 3}  # Retrieve top 3 most relevant chunks
)

retriever

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001D9674BFB60>, search_kwargs={})

In [28]:
## Create a prompt template
from langchain_core.prompts import ChatPromptTemplate
system_prompt="""You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

In [29]:
prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

## What is create_stuff_documents_chain?

`create_stuff_documents_chain` builds a chain that "stuffs" (inserts) all retrieved documents into a single prompt and sends it to the LLM in one call. This works well when the retrieved chunks fit comfortably inside the model's context window, as the few short chunks here do.

In [30]:
### Create a document chain
from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain=create_stuff_documents_chain(llm,prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001D96AA93110>, async_client=<openai.res

This chain:

- Takes retrieved documents
- "Stuffs" them into the prompt's {context} placeholder
- Sends the complete prompt to the LLM
- Returns the LLM's response
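
Conceptually, the stuffing step amounts to string concatenation followed by template formatting. The sketch below is a simplified illustration in plain Python, not LangChain's implementation; `stuff_documents`, `demo_template`, and the separator placed between documents are assumptions made for the example.

```python
def stuff_documents(page_contents, question, template, separator="\n\n"):
    """Join all retrieved texts and insert them into the template's {context} slot."""
    context = separator.join(page_contents)
    return template.format(context=context, input=question)

demo_template = (
    "Use the following pieces of retrieved context to answer the question.\n\n"
    "Context: {context}\n\n"
    "Question: {input}"
)
retrieved = [
    "Deep learning is a subset of machine learning.",
    "CNNs are particularly effective for image processing.",
]
print(stuff_documents(retrieved, "What is deep learning?", demo_template))
```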

## What is create_retrieval_chain?


`create_retrieval_chain` combines a retriever (which fetches relevant documents) with a document chain (which passes those documents to an LLM) into a complete RAG pipeline. The resulting chain takes a dict with an `input` key and returns a dict containing `input`, `context`, and `answer`.
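
The data flow can be mimicked with two plain functions. This is a rough sketch under stated assumptions, not the library's code: `fake_retriever`, `fake_document_chain`, and `make_retrieval_chain` are hypothetical stand-ins, and only the shape of the result (a dict with `input`, `context`, and `answer` keys, as in the response shown later in this notebook) mirrors the real chain.

```python
def fake_retriever(query):
    """Hypothetical retriever: returns canned chunks regardless of the query."""
    return ["Deep learning is based on artificial neural networks."]

def fake_document_chain(inputs):
    """Hypothetical document chain: a real one would prompt an LLM with the context."""
    return f"(answer built from {len(inputs['context'])} retrieved chunk(s))"

def make_retrieval_chain(retriever, document_chain):
    """Compose retrieval and answering into a single callable."""
    def run(inputs):
        context = retriever(inputs["input"])       # 1. fetch relevant chunks
        enriched = {**inputs, "context": context}  # 2. attach them to the inputs
        answer = document_chain(enriched)          # 3. generate the answer
        return {**enriched, "answer": answer}      # 4. return everything together
    return run

demo_chain = make_retrieval_chain(fake_retriever, fake_document_chain)
print(demo_chain({"input": "What is deep learning?"}))
```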

In [31]:
### Create The Final RAG Chain
from langchain.chains import create_retrieval_chain
rag_chain=create_retrieval_chain(retriever,document_chain)
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001D9674BFB60>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don'

In [32]:
response=rag_chain.invoke({"input":"What is Deep LEarning"})

In [33]:
response

{'input': 'What is Deep LEarning',
 'context': [Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
  Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    process

In [34]:
response['answer']

'Deep learning is a subset of machine learning that relies on artificial neural networks inspired by the human brain, consisting of interconnected layers of nodes. It has significantly advanced fields such as computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers are commonly used architectures in deep learning for tasks like image processing and sequential data analysis.'
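To make the data flow concrete, here is a minimal plain-Python sketch of the retrieve-then-generate pattern. It is not LangChain's implementation: `make_toy_retrieval_chain`, `toy_retrieve`, `toy_generate`, and `TOY_CORPUS` are hypothetical stand-ins for the real retriever and LLM, chosen only to show how `input`, `context`, and `answer` end up in one result dictionary.

```python
from typing import Callable, Dict, List


def make_toy_retrieval_chain(
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, str], str],
) -> Callable[[Dict[str, str]], Dict[str, object]]:
    """Build a function that mimics the retrieve-then-generate flow."""

    def chain(inputs: Dict[str, str]) -> Dict[str, object]:
        question = inputs["input"]
        docs = retrieve(question)             # 1. fetch relevant chunks
        context = "\n\n".join(docs)           # 2. "stuff" them into one context string
        answer = generate(context, question)  # 3. ask the model with that context
        # Same three keys as the real chain's output shown above
        return {"input": question, "context": docs, "answer": answer}

    return chain


# Hypothetical two-chunk corpus standing in for the vector store
TOY_CORPUS = [
    "Deep learning is a subset of machine learning based on artificial neural networks.",
    "Supervised learning uses labeled data to train models.",
]


def toy_retrieve(question: str, k: int = 1) -> List[str]:
    """Rank chunks by how many lowercase words they share with the question."""
    words = set(question.lower().split())
    ranked = sorted(
        TOY_CORPUS,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_generate(context: str, question: str) -> str:
    """Pretend LLM: answer with the first sentence of the context."""
    return context.split(".")[0] + "."


toy_chain = make_toy_retrieval_chain(toy_retrieve, toy_generate)
toy_result = toy_chain({"input": "What is Deep Learning"})
print(toy_result["answer"])
```

Word overlap replaces embedding similarity here purely to keep the sketch dependency-free; the real retriever compares embedding vectors.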

In [None]:
# ------------------------------------------------------------
# Function: Query the Modern RAG System
# ------------------------------------------------------------
# This function sends a user question through the Retrieval-Augmented
# Generation (RAG) pipeline using `rag_chain.invoke()`.
# It prints both the final answer and the retrieved source context.
# ------------------------------------------------------------

def query_rag_modern(question):
    print(f"Question: {question}")
    print("-" * 50)
    
    # Run the query through the RAG chain
    # - "input" is passed to the chain, which handles retrieval + LLM response
    result = rag_chain.invoke({"input": question})
    
    # Display the generated answer
    print(f"Answer: {result['answer']}")
    
    # Show retrieved document context for transparency
    print("\nRetrieved Context:")
    for i, doc in enumerate(result['context']):
        print(f"\n--- Source {i+1} ---")
        print(doc.page_content[:200] + "...")  # Preview first 200 chars
    
    return result


# ------------------------------------------------------------
# Test Queries for the RAG System
# ------------------------------------------------------------
# Each query tests retrieval accuracy and reasoning quality
# of the integrated RAG pipeline.
# ------------------------------------------------------------

test_questions = [
    "What are the three types of machine learning?",
    "What is deep learning and how does it relate to neural networks?",
    "What are CNNs best used for?"
]

# Run the test queries and display results
for question in test_questions:
    result = query_rag_modern(question)
    print("\n" + "="*80 + "\n")  # Separator between results


Question: What are the three types of machine learning?
--------------------------------------------------
Answer: The three main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns through feedback from the environment.

Retrieved Context:

--- Source 1 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--- Source 2 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--- Source 3 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artifici

## Create RAG Chain Alternative - Using LCEL (LangChain Expression Language)

In [38]:
# Even more flexible approach using LCEL
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

In [39]:
# Create a custom prompt
custom_prompt = ChatPromptTemplate.from_template("""Use the following context to answer the question. 
If you don't know the answer based on the context, say you don't know.
Provide specific details from the context to support your answer.

Context:
{context}

Question: {question}

Answer:""")
custom_prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])

In [40]:
retriever

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001D9674BFB60>, search_kwargs={})

In [41]:
## Format the output documents for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [None]:
# ------------------------------------------------------------
# Build a Modern RAG Chain using LangChain Expression Language (LCEL)
# ------------------------------------------------------------
# This pipeline integrates retrieval, prompt formatting, LLM generation,
# and output parsing into a single executable chain.
# ------------------------------------------------------------

rag_chain_lcel = (
    {
        # Retrieve relevant documents and format them for the prompt
        "context": retriever | format_docs,
        # Pass the user's question directly without modification
        "question": RunnablePassthrough()
    }
    # Apply a custom prompt template using the retrieved context and question
    | custom_prompt
    # Generate the answer using the chosen LLM
    | llm
    # Parse the LLM output into a clean string
    | StrOutputParser()
)

# Display the LCEL RAG chain object
rag_chain_lcel


{
  context: VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001D9674BFB60>, search_kwargs={})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001D96AA93110>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001D96AA92FD0>, r

In [None]:
# ------------------------------------------------------------
# Query the LCEL RAG Chain
# ------------------------------------------------------------
# Sends a natural language question through the fully integrated
# retrieval + LLM pipeline and returns the generated answer.
# ------------------------------------------------------------

# Invoke the chain with a user question
response = rag_chain_lcel.invoke("What is Deep Learning")

# Display the raw response
response


'Deep learning is a subset of machine learning based on artificial neural networks. These networks consist of layers of interconnected nodes, inspired by the human brain, and have revolutionized fields like computer vision, natural language processing, and speech recognition.'
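The `|` composition used in the LCEL chain can look like magic at first. The toy sketch below shows one way such piping could work in plain Python. It is an illustration only, not how LangChain implements runnables: `Step`, `as_step`, and every `toy_` name are hypothetical.

```python
class Step:
    """Wrap a function so steps can be chained with `|`, loosely like an LCEL runnable."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `self | other`: feed this step's output into the next step
        nxt = as_step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other):
        # `{...} | step`: the left operand is a plain dict or function
        return as_step(other) | self


def as_step(obj):
    """Coerce a Step, a dict of steps, or a plain callable into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # A dict runs every branch on the same input and collects the results
        branches = {key: as_step(val) for key, val in obj.items()}
        return Step(lambda value: {k: s.invoke(value) for k, s in branches.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"Cannot turn {type(obj).__name__} into a Step")


# Hypothetical stand-ins for the retriever, prompt, and LLM
toy_retriever = Step(lambda question: ["Deep learning uses neural networks."])
toy_format_docs = lambda docs: "\n\n".join(docs)
toy_passthrough = Step(lambda value: value)
toy_prompt = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")
toy_llm = Step(lambda text: "ANSWER based on: " + text.splitlines()[1])

toy_lcel_chain = (
    {"context": toy_retriever | toy_format_docs, "question": toy_passthrough}
    | toy_prompt
    | toy_llm
)

print(toy_lcel_chain.invoke("What is deep learning?"))
```

The dict at the head of the chain is the key idea: both branches receive the same question string, which is why the real chain can be invoked with a bare string rather than a dictionary.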

In [None]:
# ------------------------------------------------------------
# Retrieve Relevant Documents Using the Retriever
# ------------------------------------------------------------
# Fetches the most relevant documents for a given query without
# running them through the LLM. Useful for inspecting context.
# ------------------------------------------------------------

retriever.invoke("What is Deep Learning")

[Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neura

In [None]:
# ------------------------------------------------------------
# Query the RAG System using the LCEL Approach
# ------------------------------------------------------------
# This function sends a user question through the LCEL-based RAG chain
# and optionally retrieves the supporting source documents.
# ------------------------------------------------------------

def query_rag_lcel(question):
    print(f"Question: {question}")
    print("-" * 50)
    
    # Step 1: Generate answer using LCEL RAG chain
    # - Directly pass the question string (RunnablePassthrough handles it)
    answer = rag_chain_lcel.invoke(question)
    print(f"Answer: {answer}")
    
    # Step 2: Retrieve supporting documents (optional)
    # - The LCEL chain returns only the answer string, so the retriever is
    #   queried a second time here to show the top-k chunks it returns
    docs = retriever.invoke(question)
    print("\nSource Documents:")
    for i, doc in enumerate(docs):
        print(f"\n--- Source {i+1} ---")
        print(doc.page_content[:200] + "...")  # Show first 200 characters


In [66]:
# Test LCEL chain
print("Testing LCEL Chain:")
query_rag_lcel("What are the key concepts in reinforcement learning?")

Testing LCEL Chain:
Question: What are the key concepts in reinforcement learning?
--------------------------------------------------
Answer: The key concepts in reinforcement learning are interaction with an environment, using rewards and penalties for learning. This involves the system learning through trial and error, receiving feedback in the form of rewards and penalties to adjust its behavior.

Source Documents:

--- Source 1 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 2 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 3 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--- Source 4 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificia

In [67]:
query_rag_lcel("What is machine learning?")

Question: What is machine learning?
--------------------------------------------------
Answer: Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It consists of three main types: supervised learning, unsupervised learning, and reinforcement learning.

Source Documents:

--- Source 1 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--- Source 2 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--- Source 3 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human 

In [69]:
query_rag_lcel("What is depe learning?")

Question: What is depe learning?
--------------------------------------------------
Answer: Deep learning is a subset of machine learning based on artificial neural networks. These networks consist of layers of interconnected nodes and are inspired by the human brain. Deep learning has revolutionized fields such as computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) are especially effective for image processing in deep learning.

Source Documents:

--- Source 1 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...

--- Source 2 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...

--- Source 3 ---
Machine Learning Fundamentals



## Add New Documents To Existing Vector Store

In [73]:
vectorstore

<langchain_community.vectorstores.chroma.Chroma at 0x1d9674bfb60>

In [74]:
# Add new documents to the existing vector store
new_document = """
Reinforcement Learning in Detail

Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with an environment. The agent receives rewards or penalties 
based on its actions and learns to maximize cumulative reward over time. Key concepts 
in RL include: states, actions, rewards, policies, and value functions. Popular RL 
algorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and 
Actor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), 
robotics, and autonomous systems.
"""

In [75]:
new_document

'\nReinforcement Learning in Detail\n\nReinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and \nActor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), \nrobotics, and autonomous systems.\n'

In [76]:
chunks

[Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through'),
 Document(metadata={'source': 'data\\doc_0.txt'}, page_content='data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of in

In [77]:
new_doc=Document(
    page_content=new_document,
    metadata={"source": "manual_addition", "topic": "reinforcement_learning"}
)

In [79]:
new_doc

Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='\nReinforcement Learning in Detail\n\nReinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and \nActor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), \nrobotics, and autonomous systems.\n')

In [81]:
## split the documents
new_chunks=text_splitter.split_documents([new_doc])
new_chunks

[Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='Reinforcement Learning in Detail\n\nReinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and \nActor-Critic methods. RL has been'),
 Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='methods, and \nActor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), \nrobotics, and autonomous systems.')]

In [82]:
### Add new documents to vectorstore
vectorstore.add_documents(new_chunks)



['b9b7448a-f68b-4593-a297-84e5697793f6',
 '4669dce8-2bbb-4d08-a11c-1f8b9ab2c01b']

In [83]:
print(f"Added {len(new_chunks)} new chunks to the vector store")
print(f"Total vectors now: {vectorstore._collection.count()}")

Added 2 new chunks to the vector store
Total vectors now: 12
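Several of the retrieval outputs earlier in this notebook list the same chunk twice, which suggests the same documents may have been added to the store more than once (for example by re-running an ingestion cell). One common mitigation is to derive a stable ID from each chunk's content so that re-adding it reuses the same ID. The sketch below shows only the ID logic in plain Python; `chunk_id` and `dedupe_chunks` are hypothetical helpers, and whether a repeated ID overwrites or errors depends on the vector store and version, so verify that against your setup.

```python
import hashlib
from typing import List, Tuple


def chunk_id(source: str, text: str) -> str:
    """Derive a stable ID from a chunk's source and content."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:32]


def dedupe_chunks(chunks: List[Tuple[str, str]]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (ids, unique_chunks) in first-seen order for (source, text) pairs."""
    seen = {}
    for source, text in chunks:
        # setdefault keeps the first occurrence and ignores later repeats
        seen.setdefault(chunk_id(source, text), (source, text))
    return list(seen.keys()), list(seen.values())


toy_chunks = [
    ("manual_addition", "Reinforcement Learning in Detail"),
    ("manual_addition", "Reinforcement Learning in Detail"),  # accidental repeat
    ("manual_addition", "robotics, and autonomous systems."),
]
toy_ids, toy_unique = dedupe_chunks(toy_chunks)
print(len(toy_chunks), "chunks in,", len(toy_unique), "unique chunks out")

# With real Document objects, the IDs could then be passed along, e.g.:
# vectorstore.add_documents(unique_docs, ids=ids)
```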


In [84]:
## query with the updated vector store
new_question="What are the key concepts in reinforcement learning"
query_rag_lcel(new_question)  # prints the answer and sources; returns None

Question: What are the key concepts in reinforcement learning
--------------------------------------------------
Answer: The key concepts in reinforcement learning include states, actions, rewards, policies, and value functions.

Source Documents:

--- Source 1 ---
Reinforcement Learning in Detail

Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with an environment. The agent receives rewards or p...

--- Source 2 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 3 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 4 ---
methods, and 
Actor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), 
robotics, and autonomous systems....


## Advanced RAG Techniques: Conversational Memory

### Understanding Conversational Memory in RAG
Conversational memory enables RAG systems to maintain context across multiple interactions. This is crucial for:

- Follow-up questions that reference previous answers
- Pronoun resolution (e.g., "it", "they", "that")
- Context-dependent queries that build on prior discussion
- Natural dialogue flow where users don't repeat context

Key Challenge:
Traditional RAG retrieves documents based only on the current query, missing important context from the conversation. For example:

- User: "Tell me about Python"
- Bot: explains Python programming language
- User: "What are its main libraries?" ← "its" refers to Python, but the retriever doesn't know this

Solution:
The modern approach uses a two-step process:

1. Query Reformulation: Transform context-dependent questions into standalone queries
2. Context-Aware Retrieval: Use the reformulated query to fetch relevant documents

LangChain building blocks used for this:

- create_history_aware_retriever: Makes the retriever understand conversation context
- MessagesPlaceholder: Placeholder for chat history in prompts
- HumanMessage/AIMessage: Structured message types for conversation history
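The two-step process can be sketched in plain Python before bringing in LangChain. This is a simplified illustration under stated assumptions: `history_aware_retrieve`, `toy_rewrite`, and `toy_search` are hypothetical names, and `toy_rewrite` is a deliberately naive string substitution standing in for the LLM call that would normally reformulate the question.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text), e.g. ("human", "Tell me about Python")


def history_aware_retrieve(
    question: str,
    chat_history: List[Message],
    rewrite: Callable[[List[Message], str], str],
    search: Callable[[str], List[str]],
) -> List[str]:
    """Reformulate the question only when there is history, then retrieve."""
    if not chat_history:
        # First turn: the question is already standalone
        return search(question)
    standalone = rewrite(chat_history, question)  # step 1: query reformulation
    return search(standalone)                     # step 2: context-aware retrieval


def toy_rewrite(chat_history: List[Message], question: str) -> str:
    """Naive stand-in for the LLM: replace 'its' with the last word the user said."""
    last_user_text = [text for role, text in chat_history if role == "human"][-1]
    topic = last_user_text.rstrip("?").split()[-1]
    return question.replace("its", f"{topic}'s")


def toy_search(query: str) -> List[str]:
    """Stand-in retriever that just reports which query it received."""
    return [f"docs for: {query}"]


toy_history = [
    ("human", "Tell me about Python"),
    ("ai", "Python is a programming language."),
]
print(history_aware_retrieve("What are its main libraries?", toy_history, toy_rewrite, toy_search))
```

Skipping the rewrite on the first turn avoids an unnecessary model call when there is nothing to resolve.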

In [85]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage

In [86]:
## create a prompt that includes the chat history
contextualize_q_system_prompt = """Given a chat history and the latest user question 
which might reference context in the chat history, formulate a standalone question 
which can be understood without the chat history. Do NOT answer the question, 
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

In [87]:
## create history aware retriever
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
history_aware_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001D9674BFB60>, search_kwargs={}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIM

In [88]:
# Create a new document chain with history
qa_system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

# Create conversational RAG chain
conversational_rag_chain = create_retrieval_chain(
    history_aware_retriever, 
    question_answer_chain
)
print("Conversational RAG chain created!")

Conversational RAG chain created!


In [89]:
chat_history=[]
# First question
result1 = conversational_rag_chain.invoke({
    "chat_history": chat_history,
    "input": "What is machine learning?"
})
print(f"Q: What is machine learning?")
print(f"A: {result1['answer']}")

Q: What is machine learning?
A: Machine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without being explicitly programmed. It encompasses supervised learning, unsupervised learning, and reinforcement learning as its main types. In supervised learning, models are trained using labeled data, while unsupervised learning identifies patterns in unlabeled data.


In [91]:
chat_history.extend([
    HumanMessage(content="What is machine learning"),
    AIMessage(content=result1['answer'])
])

In [92]:
chat_history

[HumanMessage(content='What is machine learning', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Machine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without being explicitly programmed. It encompasses supervised learning, unsupervised learning, and reinforcement learning as its main types. In supervised learning, models are trained using labeled data, while unsupervised learning identifies patterns in unlabeled data.', additional_kwargs={}, response_metadata={})]

In [93]:
# Follow-up question
result2 = conversational_rag_chain.invoke({
    "chat_history": chat_history,
    "input": "What are its main types?"  # Refers to ML from previous question
})
result2

{'chat_history': [HumanMessage(content='What is machine learning', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Machine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without being explicitly programmed. It encompasses supervised learning, unsupervised learning, and reinforcement learning as its main types. In supervised learning, models are trained using labeled data, while unsupervised learning identifies patterns in unlabeled data.', additional_kwargs={}, response_metadata={})],
 'input': 'What are its main types?',
 'context': [Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    lear

In [94]:
result2['answer']

'The main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning learns through a system of reward and punishment.'
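Updating `chat_history` by hand after every turn is easy to get wrong, for instance by storing a question that differs slightly from the one actually asked. A small helper can keep the two in sync. The sketch below is one possible shape, not part of LangChain: `ask` and `ToyChain` are hypothetical, `ToyChain` only imitates the `.invoke({...})` interface so the example runs without an API key, and plain `(role, text)` tuples stand in for the `HumanMessage`/`AIMessage` objects used in the notebook.

```python
class ToyChain:
    """Stand-in that mimics the .invoke({...}) -> {"answer": ...} shape used above."""

    def invoke(self, inputs):
        turn = len(inputs["chat_history"]) // 2 + 1
        return {"answer": f"(turn {turn}) reply to: {inputs['input']}"}


def ask(chain, chat_history, question):
    """Run one conversational turn and record it in chat_history."""
    result = chain.invoke({"chat_history": chat_history, "input": question})
    # Store exactly what was asked and answered so later turns see the real exchange
    chat_history.extend([("human", question), ("ai", result["answer"])])
    return result["answer"]


toy_chat_history = []
toy_chain_stub = ToyChain()
first = ask(toy_chain_stub, toy_chat_history, "What is machine learning?")
second = ask(toy_chain_stub, toy_chat_history, "What are its main types?")
print(first)
print(second)
print(len(toy_chat_history), "messages recorded")
```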

## Using Groq LLMs
 

In [97]:
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001D96AA93110>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001D96AA92FD0>, root_client=<openai.OpenAI object at 0x000001D96AB19810>, root_async_client=<openai.AsyncOpenAI object at 0x000001D96AB196E0>, model_kwargs={}, openai_api_key=SecretStr('**********'))

In [98]:
load_dotenv()

True

In [None]:
# Confirm the key is loaded without echoing the secret into the notebook output
os.getenv("GROQ_API_KEY") is not None

In [100]:
from langchain_groq import ChatGroq
from langchain.chat_models import init_chat_model

In [101]:
os.environ["GROQ_API_KEY"]=os.getenv("GROQ_API_KEY")
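If `GROQ_API_KEY` is missing, `os.getenv` returns `None`, and assigning `None` into `os.environ` raises a `TypeError` that does not mention the real cause. A slightly more defensive pattern is sketched below; `require_env` is a hypothetical helper, and `EXAMPLE_API_KEY` is a made-up variable set only so the demo runs.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Demo with a made-up variable so the sketch runs anywhere
os.environ["EXAMPLE_API_KEY"] = "demo-value"
print(require_env("EXAMPLE_API_KEY"))

# In this notebook the helper could be used like:
# llm = ChatGroq(model="gemma2-9b-it", api_key=require_env("GROQ_API_KEY"))
```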

In [102]:
llm=ChatGroq(model="gemma2-9b-it",api_key=os.getenv("GROQ_API_KEY"))
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000001D9741897F0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001D974189BE0>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [103]:
llm=init_chat_model(model="groq:gemma2-9b-it")
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000001D9742AB110>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001D9742AB610>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))

# **Additional Notes**

## **ChromaDB**

### **Introduction to Chroma DB**

**1. What is Chroma DB?**

**Chroma DB** is an open-source vector database designed to efficiently store, manage, and query high-dimensional embeddings generated by machine learning models. It is optimized for similarity search, recommendation systems, and other applications that require working with vector representations of data.

Key highlights:

* Fast vector similarity search using advanced indexing techniques.
* Integrates seamlessly with machine learning pipelines.
* Supports rich metadata for documents to enable contextual filtering.
* Scales for both small projects and industrial-level workloads.


**2. Overview of Chroma DB**

Chroma DB acts as a bridge between raw data and intelligent applications that rely on vector representations:

* It stores **vectors (embeddings)** produced by models such as OpenAI embeddings, SentenceTransformers, etc.
* Enables **efficient querying** to find vectors similar to a given input vector (nearest neighbor search).
* Organizes data in **collections** with metadata for easier filtering and management.


**3. Use Cases and Applications**

Chroma DB is widely used in scenarios involving **similarity search** and **semantic retrieval**:

* **Semantic Search:** Retrieve documents, images, or media relevant to a query vector.
* **Recommendation Systems:** Suggest items similar to user preferences based on embeddings.
* **Chatbots and RAG (Retrieval-Augmented Generation):** Retrieve contextually relevant documents for generating answers.
* **Anomaly Detection:** Identify outliers in high-dimensional feature space.
* **Image, Audio, and Video Search:** Use embeddings to find similar multimedia content.

**4. Core Concepts**

a. Vectors and Embeddings

* **Vector:** A numeric representation of an object in high-dimensional space (e.g., `[0.23, -0.14, 0.98]`).
* **Embedding:** A transformation of data (text, image, audio) into a vector, capturing semantic meaning.
* **Similarity Search:** Use metrics like cosine similarity or Euclidean distance to find vectors closest to a query.
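The two metrics named above can be written out in a few lines of plain Python. This is a small illustrative sketch rather than what a vector database runs internally; `cosine_similarity` and `euclidean_distance` are names chosen for this example, and the vectors are the same toy values used elsewhere in these notes.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors: 0.0 means identical."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query_vec = [0.1, 0.2, 0.3]
doc_a_vec = [0.1, 0.2, 0.3]
doc_b_vec = [0.4, 0.5, 0.6]

print("cosine(query, doc_a):", cosine_similarity(query_vec, doc_a_vec))
print("cosine(query, doc_b):", cosine_similarity(query_vec, doc_b_vec))
print("euclidean(query, doc_b):", euclidean_distance(query_vec, doc_b_vec))
```

Note the opposite orientation of the two metrics: higher cosine similarity means closer, while lower Euclidean distance means closer.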

b. Collections, Documents, and Metadata

* **Collection:** A container in Chroma DB to organize related vectors. Think of it as a table in relational databases.
* **Document:** A single entry in a collection, usually consisting of:

  * **Embedding vector**
  * **Content** (optional raw data like text or image reference)
  * **Metadata** (key-value pairs for filtering and organization)

c. Tenancy and Database Hierarchies

* **Tenancy:** Supports isolation of data for different users or applications.
* **Database Hierarchy:**

  * **Database:** Top-level container
  * **Collection:** Logical grouping within a database
  * **Document/Vector:** Individual entries within a collection

---
**5. Installation and Setup**

a. System Requirements

* **Python:** Version 3.8 or higher
* **OS:** Cross-platform (Linux, Windows, macOS)
* **Memory:** Dependent on dataset size, typically 8GB+ recommended for medium workloads

b. Installing Chroma DB

Chroma DB can be installed via Python's `pip`:

```bash
pip install chromadb
```

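Some older guides also install a DuckDB extra. Recent Chroma releases store data in SQLite by default, so check the release notes for your installed version before relying on it:

```bash
pip install "chromadb[duckdb]"
```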

c. Setting up Your First Collection

1. **Import Chroma DB client:**

```python
import chromadb
from chromadb.config import Settings

client = chromadb.Client(Settings())
```

2. **Create a collection:**

```python
collection = client.create_collection("my_first_collection")
```

3. **Add documents with embeddings:**

```python
collection.add(
    documents=["Hello world", "Chroma DB is great!"],
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["1", "2"]
)
```

4. **Query similar embeddings:**

```python
results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.3]],
    n_results=1
)
print(results)
```
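To build intuition for what a query returns, the sketch below does a brute-force nearest-neighbor search over the same two toy records and packs the result into nested lists, one inner list per query embedding, in a shape similar to the query result printed above. It is only an approximation for teaching purposes: `toy_query` and `toy_records` are hypothetical, Euclidean distance is an arbitrary choice here, and a real vector database uses an index rather than scanning every record.

```python
import math
from typing import Dict, List, Tuple

# id -> (embedding, document), mirroring the records added above
toy_records: Dict[str, Tuple[List[float], str]] = {
    "1": ([0.1, 0.2, 0.3], "Hello world"),
    "2": ([0.4, 0.5, 0.6], "Chroma DB is great!"),
}


def toy_query(
    records: Dict[str, Tuple[List[float], str]],
    query_embedding: List[float],
    n_results: int = 1,
) -> Dict[str, list]:
    """Return the n_results records closest to query_embedding."""
    scored = sorted(
        (math.dist(embedding, query_embedding), record_id)
        for record_id, (embedding, _document) in records.items()
    )
    top = scored[:n_results]
    # One inner list per query embedding (a single query in this sketch)
    return {
        "ids": [[record_id for _dist, record_id in top]],
        "distances": [[dist for dist, _record_id in top]],
        "documents": [[records[record_id][1] for _dist, record_id in top]],
    }


toy_results = toy_query(toy_records, [0.1, 0.2, 0.3], n_results=1)
print(toy_results)
```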

### **Creating and Managing Collections in Chroma DB**

**1. Defining Collections**

> A **collection** in Chroma DB is a container for storing related vectors (embeddings) along with their documents and metadata. Collections help organize data and enable efficient querying.

**Steps to Define a Collection**

1. **Initialize Chroma Client:**

```python
import chromadb
from chromadb.config import Settings

client = chromadb.Client(Settings())
```

2. **Create a Collection:**

```python
collection = client.create_collection(
    name="my_collection",
    metadata={"description": "A collection for demo purposes"}
)
```

* `name`: Unique name of the collection
* `metadata` (optional): Key-value pairs describing the collection

3. **Retrieve an Existing Collection:**

```python
collection = client.get_collection("my_collection")
```

4. **List All Collections:**

```python
collections = client.list_collections()
print([c.name for c in collections])
```

---

**2. Adding and Removing Documents**

a. Adding Documents

A **document** in Chroma DB includes:

* **embedding vector**: Numeric representation of the content
* **content**: Optional raw data
* **metadata**: Optional key-value pairs
* **id**: Unique identifier

**Example:**

```python
collection.add(
    documents=["Hello Chroma", "Vector databases are powerful!"],
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    metadatas=[{"type": "greeting"}, {"type": "statement"}],
    ids=["doc1", "doc2"]
)
```

**Notes:**

* Vectors should all have the **same dimension**.
* Metadata allows for **filtered queries** later.
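
The same-dimension requirement can be checked before calling `collection.add(...)`. The following is a minimal sketch in plain Python; `check_same_dimension` is a hypothetical helper written for this note, not part of the Chroma API.

```python
from typing import Sequence


def check_same_dimension(embeddings: Sequence[Sequence[float]]) -> int:
    """Return the shared dimension, or raise ValueError if the vectors differ in length."""
    if not embeddings:
        raise ValueError("no embeddings given")
    dim = len(embeddings[0])
    for i, vec in enumerate(embeddings):
        if len(vec) != dim:
            raise ValueError(f"embedding {i} has dimension {len(vec)}, expected {dim}")
    return dim


# The two vectors from the example above share dimension 3
dim = check_same_dimension([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(dim)
```

Running a check like this before ingestion surfaces a mismatched vector early, with the index of the offending entry.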

b. Removing Documents

Documents can be removed by their **IDs**:

```python
collection.delete(ids=["doc1"])
```

* Deletes only the specified documents.
* You can also remove documents based on metadata using filters:

```python
collection.delete(where={"type": "statement"})
```

---

**3. Modifying Collection Metadata**

Collections can have metadata attached for organizational purposes. This metadata can be **updated or modified** after creation.

a. Updating Metadata:

```python
collection.modify(
    metadata={"description": "Updated description for my collection"}
)
```

* This changes the collection's metadata without affecting the stored documents. Note that `collection.update(...)` is a different method: it updates individual documents by ID.

b. Retrieving Metadata:

```python
print(collection.metadata)
```

* Displays the key-value pairs associated with the collection.

c. Use Cases for Metadata:

* Descriptions, tags, or versioning of the collection
* Categorizing collections by project, domain, or data type
* Selecting collections by their metadata in application code (`where` filters apply to document metadata, not collection metadata)

---

**Summary**

| Operation         | Method              | Example                                                        |
| ----------------- | ------------------- | -------------------------------------------------------------- |
| Create Collection | `create_collection` | `client.create_collection("my_collection")`                    |
| Get Collection    | `get_collection`    | `client.get_collection("my_collection")`                       |
| List Collections  | `list_collections`  | `client.list_collections()`                                    |
| Add Document      | `add`               | `collection.add(documents=[...], embeddings=[...], ids=[...])` |
| Delete Document   | `delete`            | `collection.delete(ids=["doc1"])`                              |
| Update Metadata   | `modify`            | `collection.modify(metadata={"key":"value"})`                  |

### **Storage, Persistence, Indexing, and Retrieval in Chroma DB**

**1. Storage and Persistence**

> Chroma DB provides options to **persist your vector data** so it is not lost when the application stops. Understanding storage and persistence is crucial for production-grade deployments.

a. Understanding the Persistent Directory

* By default, `chromadb.Client()` runs **in-memory**, meaning data is lost after the process ends.
* To persist data, Chroma DB stores vectors, documents, and metadata in a **persistent directory**. In Chroma 0.4 and later this is done with `PersistentClient`:

```python
import chromadb

client = chromadb.PersistentClient(path="./chroma_data")  # specify your directory
```

* The directory holds all collections and documents, and writes are saved to disk automatically.
* Releases before 0.4 instead took `Settings(persist_directory="./chroma_data", chroma_db_impl="duckdb+parquet")` and required an explicit `client.persist()` call after adding or modifying documents. That method was removed in 0.4:

```python
# Legacy (Chroma < 0.4) only; not needed with PersistentClient
client.persist()
```

---

b. Managing Storage Backends

Which storage engine Chroma DB uses for persistent data depends on the release:

* **DuckDB + Parquet**: The persistent backend in releases before 0.4, selected through `chroma_db_impl`
* **SQLite**: The storage engine used from 0.4 onward; it needs no extra configuration

**Example (legacy DuckDB backend, Chroma < 0.4):**

```python
import chromadb
from chromadb.config import Settings

client = chromadb.Client(Settings(
    persist_directory="./chroma_data",
    chroma_db_impl="duckdb+parquet"
))
```

* `chroma_db_impl` controls the storage engine in those older releases; current releases reject the legacy values.

---

c. Data Durability and Recovery

* **Durability:** Persisted collections survive process restarts.
* **Recovery:** Simply initialize Chroma DB with the same directory to reload collections:

```python
# Chroma 0.4+; older releases pass Settings(persist_directory=...) to chromadb.Client
client = chromadb.PersistentClient(path="./chroma_data")
# All previous collections are restored
print(client.list_collections())
```

---

**2. Indexing and Retrieval**

Efficient retrieval of vectors requires proper **indexing**.

a. Introduction to HNSW Indexing

* Chroma DB uses **HNSW (Hierarchical Navigable Small World)** graphs for fast similarity search.
* HNSW allows **approximate nearest neighbor search** in high-dimensional vector spaces with high performance.
* Index parameters can be configured for speed vs. accuracy trade-offs (e.g., `M` and `ef_construction`).

```python
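collection = client.create_collection(
    "my_collection",
    # Distance metric for the index: "l2" is the default; "cosine" and "ip" are the alternatives
    metadata={"hnsw:space": "l2"},
    get_or_create=True
)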
# HNSW indexing is enabled by default
```

---

b. Performing Similarity Searches

* **Querying by embedding vector** finds documents closest in semantic space:

```python
query_vector = [0.1, 0.2, 0.3]
results = collection.query(
    query_embeddings=[query_vector],
    n_results=2
)
print(results)
```

* `n_results`: Number of closest documents to return
* Returned results include document IDs, documents, metadata, and distances; embeddings are returned only when requested through the `include` argument
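
To make the query semantics concrete, the sketch below runs an exact nearest-neighbor search in plain Python. It assumes squared Euclidean distance (the `l2` space) and a tiny in-memory dictionary in place of a collection; `brute_force_query` is a hypothetical helper, and Chroma itself answers queries through an approximate HNSW index rather than a full scan.

```python
from typing import Dict, List, Tuple


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_query(
    store: Dict[str, List[float]], query: List[float], n_results: int
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the closest n_results."""
    scored = [(doc_id, squared_l2(vec, query)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n_results]


store = {"doc1": [0.1, 0.2, 0.3], "doc2": [0.4, 0.5, 0.6]}
hits = brute_force_query(store, [0.1, 0.2, 0.3], n_results=2)
print(hits)  # closest document first
```

The full scan costs time proportional to the number of stored vectors per query, which is the cost an ANN index such as HNSW is meant to avoid.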

---

c. Filtering and Querying Documents

* Filters allow restricting searches based on **metadata**:

```python
results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.3]],
    n_results=2,
    where={"type": "greeting"}  # metadata filter
)
```

* **Advanced filtering:** Combine multiple conditions on metadata for more precise queries.
* You can also query **without embeddings** using metadata alone:

```python
results = collection.get(where={"type": "greeting"})
```

---

**Summary**

| Feature            | Description                              | Example                                                                  |
| ------------------ | ---------------------------------------- | ------------------------------------------------------------------------ |
| Persist Directory  | Stores collections/data on disk          | `chromadb.PersistentClient(path="./data")`                               |
| Storage Backends   | SQLite (0.4+), DuckDB + Parquet (legacy) | `chroma_db_impl="duckdb+parquet"` (before 0.4 only)                      |
| Durability         | Data persists across restarts            | Automatic in 0.4+; `client.persist()` before 0.4                         |
| HNSW Index         | Fast similarity search                   | Default in collections                                                   |
| Query by Embedding | Returns nearest neighbors                | `collection.query(query_embeddings=[[...]], n_results=3)`                |
| Metadata Filtering | Filter results using metadata            | `collection.query(query_embeddings=[[...]], where={"type":"greeting"})`  |

### **Embedding Models and Functions in Chroma DB**

Chroma DB searches over **vectors (embeddings)** rather than raw data, so integrating with the right embedding model is crucial for meaningful similarity search.

---

**1. Integrating with External Embedding Models**

Chroma DB can work with **any embedding model** that converts raw data (text, images, audio) into numeric vectors.

a. OpenAI Embeddings

* OpenAI provides text embeddings via models like `text-embedding-3-large` and `text-embedding-3-small`.
* Example integration:

```python
import os

import chromadb
from chromadb.utils import embedding_functions

# Create Chroma DB embedding function using OpenAI
# (reads the key from the environment instead of hard-coding it)
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-large"
)

# Assign embedding function to collection
collection = chromadb.Client().create_collection(
    name="my_collection",
    embedding_function=openai_ef
)
```

b. Hugging Face Models

* Hugging Face provides transformer-based models for embeddings (e.g., `sentence-transformers/all-MiniLM-L6-v2`).
* Integration example:

```python
import chromadb
from chromadb.utils import embedding_functions

# Requires the `sentence-transformers` package; the model is loaded by name
hf_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")

collection = chromadb.Client().create_collection(
    name="hf_collection",
    embedding_function=hf_ef
)
```

c. Custom Embedding Models

* You can use your own models trained on specific data.
* Any model that takes raw input and returns a **fixed-size vector** can be used:

```python
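import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings

def custom_embedding_fn(texts):
    # Example: returns 3-dimensional dummy embeddings
    return [[float(len(t)), float(len(t.split())), float(sum(ord(c) for c in t) % 100)] for t in texts]

class CustomEmbeddingFunction(EmbeddingFunction):
    # Recent Chroma releases expect a callable whose __call__ parameter is named `input`
    def __call__(self, input: Documents) -> Embeddings:
        return custom_embedding_fn(input)

collection = chromadb.Client().create_collection(
    name="custom_collection",
    embedding_function=CustomEmbeddingFunction()
)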
```

---

**2. Using Embedding Functions**

Embedding functions in Chroma DB abstract away the **vector generation step**:

* When adding documents:

```python
collection.add(
    documents=["Hello world", "Chroma DB rocks!"],
    ids=["1", "2"]
)  # embeddings are automatically computed via embedding_function
```

* When querying:

```python
results = collection.query(
    query_texts=["Hello!"],  # embedding function converts text to vector
    n_results=1
)
```

* Using embedding functions ensures **consistency between storage and queries**.

---

**3. Fine-Tuning Embeddings for Specific Tasks**

Fine-tuning embeddings improves performance for **domain-specific tasks**, such as legal, medical, or technical documents.

**Strategies:**

1. **Task-Specific Pretrained Models:**

   * Use domain-specific models from Hugging Face or OpenAI fine-tuned models.

2. **Custom Fine-Tuning:**

   * Train your model on labeled pairs or triplets (contrastive learning) for your dataset.

3. **Embedding Normalization:**

   * Normalize vectors to unit length for **cosine similarity**.

4. **Metadata-Aware Embeddings:**

   * Combine embeddings with metadata features to enhance retrieval relevance.

**Example:**

```python
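import numpy as np
import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings

class NormalizedEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        # custom_embedding_fn is the dummy embedder defined earlier
        vectors = np.array(custom_embedding_fn(input), dtype=float)
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        # Guard against division by zero for all-zero vectors
        return (vectors / np.maximum(norms, 1e-12)).tolist()

collection = chromadb.Client().create_collection(
    name="fine_tuned_collection",
    embedding_function=NormalizedEmbeddingFunction()
)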
```

---

**Summary**

| Topic                    | Key Points                                                             |
| ------------------------ | ---------------------------------------------------------------------- |
| OpenAI Integration       | Use `OpenAIEmbeddingFunction` to generate embeddings automatically     |
| Hugging Face Integration | Use `SentenceTransformerEmbeddingFunction` or similar                  |
| Custom Models            | Any function that returns numeric vectors can be used                  |
| Embedding Functions      | Automate vector creation during add/query operations                   |
| Fine-Tuning              | Use task-specific models, normalize embeddings, or train custom models |


### **Metadata and Filtering in Chroma DB**

Chroma DB allows you to attach **metadata** to your documents and use it for **filtered queries**, which is essential for narrowing down search results and organizing your vector data effectively.

---

**1. Associating Metadata with Documents**

Metadata is **additional information** stored alongside each document. It is typically structured as **key-value pairs** and can include any descriptive information such as tags, source, type, category, or timestamp.

Example: Adding Documents with Metadata

```python
collection.add(
    documents=[
        "Introduction to Chroma DB",
        "Using embeddings in vector databases"
    ],
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    metadatas=[
        {"category": "tutorial", "level": "beginner"},
        {"category": "tutorial", "level": "intermediate"}
    ],
    ids=["doc1", "doc2"]
)
```

* `metadatas` is a list of dictionaries matching each document.
* Metadata enables **filtered queries** without relying solely on embeddings.

---

**2. Querying Based on Metadata**

Chroma DB supports querying documents based on metadata filters using the `where` parameter.

Example: Filter by Metadata

```python
results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.3]],
    n_results=1,
    where={"level": "beginner"}
)
print(results)
```

* Only documents with `level` = `"beginner"` will be considered in the similarity search.
* Useful for combining **semantic search with structured filtering**.

Example: Retrieving Documents Without Embeddings

```python
docs = collection.get(where={"category": "tutorial"})
print(docs)
```

* Retrieves all documents matching metadata filters.
* Does not require embedding-based similarity search.

---

**3. Advanced Filtering Techniques**

Chroma DB supports more **complex filtering** using multiple conditions:

a. Multiple Metadata Conditions

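```python
results = collection.query(
    query_embeddings=[[0.4, 0.5, 0.6]],
    n_results=2,
    where={"$and": [{"category": "tutorial"}, {"level": "intermediate"}]}
)
```

* Documents must match **all conditions** listed under `$and`. Chroma expects a single top-level key in `where`, so conditions on several fields are combined with `$and` (or `$or`).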

b. Range Queries (numeric metadata)

```python
collection.add(
    documents=["Advanced AI tutorial"],
    embeddings=[[0.7, 0.8, 0.9]],
    metadatas=[{"difficulty": 8}],
    ids=["doc3"]
)

results = collection.query(
    query_embeddings=[[0.7, 0.8, 0.9]],
    n_results=1,
    where={"difficulty": {"$gt": 5}}  # difficulty greater than 5
)
```

* Use operators like `$gt` (greater than), `$lt` (less than), `$gte`, `$lte`.
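
How such a filter is interpreted can be sketched in plain Python. The evaluator below is a simplified illustration, not Chroma's implementation: `matches` is a hypothetical function that covers only shorthand equality, the four range operators listed above, and `$and`.

```python
from typing import Any, Dict

# Range operators mentioned above, mapped to plain Python comparisons
_COMPARATORS = {
    "$gt": lambda value, target: value > target,
    "$gte": lambda value, target: value >= target,
    "$lt": lambda value, target: value < target,
    "$lte": lambda value, target: value <= target,
}


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies the `where` filter."""
    for key, condition in where.items():
        if key == "$and":
            # Every sub-filter must hold
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # Operator form, e.g. {"difficulty": {"$gt": 5}}
            if key not in metadata:
                return False
            for op, target in condition.items():
                if not _COMPARATORS[op](metadata[key], target):
                    return False
        elif metadata.get(key) != condition:
            # Shorthand equality, e.g. {"level": "beginner"}
            return False
    return True


print(matches({"difficulty": 8}, {"difficulty": {"$gt": 5}}))
print(matches({"difficulty": 3}, {"difficulty": {"$gt": 5}}))
```

In this sketch a document that lacks the filtered key never matches a range condition.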

c. Combining Metadata Filters and Similarity Search

* Filters refine the **search space** for embeddings.
* Helps **prioritize relevance** while respecting metadata constraints.

---

**Summary**

| Feature                    | Description                                           | Example                                                                |
| -------------------------- | ----------------------------------------------------- | ---------------------------------------------------------------------- |
| Metadata                   | Key-value pairs attached to each document             | `{"category": "tutorial", "level": "beginner"}`                        |
| Simple Filtering           | Retrieve documents matching metadata                  | `collection.get(where={"category": "tutorial"})`                       |
| Filtered Similarity Search | Combine embeddings with metadata filters              | `collection.query(query_embeddings=[...], where={"level":"beginner"})` |
| Advanced Filtering         | Multiple conditions, range queries, numeric operators | `{"difficulty":{"$gt":5}}`                                             |



### **Performance Optimization in Chroma DB**

Chroma DB is designed for high-performance vector search, but **optimizing performance** becomes critical as dataset size grows. This section covers tuning, indexing strategies, and scaling considerations.

---

**1. Tuning for Large-Scale Datasets**

Large datasets require careful configuration to maintain **low-latency queries** and efficient storage.

a. Memory and Disk Considerations

* Keep frequently queried collections in memory for faster access.
* Use a **persistent directory** for durability and disk-backed storage.
* Monitor memory usage to avoid excessive swapping.

b. Batch Operations

* **Add documents in batches** rather than one by one to reduce overhead:

```python
collection.add(
    documents=batch_docs,
    embeddings=batch_embeddings,
    metadatas=batch_metadata,
    ids=batch_ids
)
```

* **Query multiple vectors at once** rather than issuing multiple single queries:

```python
results = collection.query(
    query_embeddings=batch_query_vectors,
    n_results=5
)
```
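
The `batch_docs`, `batch_ids`, and similar variables above have to come from somewhere. One way to produce them is a small slicing helper, sketched below; `batched` is a hypothetical name, and the batch size of 2 is only for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


ids = [f"doc{i}" for i in range(5)]
batches = list(batched(ids, 2))
print(batches)
```

Applying the same helper with the same `batch_size` to the documents, embeddings, and metadata lists keeps the slices aligned, so each group can be passed to one `collection.add(...)` call.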

c. Embedding Optimization

* Use **lower-dimensional embeddings** if accuracy allows (reduces storage and speeds up search).
* Normalize embeddings to improve **cosine similarity search efficiency**.
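
One reason normalization helps: once vectors have unit length, cosine similarity reduces to a plain dot product, so the norms never need to be recomputed at query time. The sketch below checks this numerically in plain Python; all function names are local to this example.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [0.1, 0.2, 0.3], [0.4, 0.5, 0.6]
ua, ub = normalize(a), normalize(b)

# For unit vectors the dot product equals the cosine similarity
print(cosine_similarity(a, b), dot(ua, ub))
```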

---

**2. Indexing Strategies**

Indexing determines **how efficiently nearest neighbors are retrieved**.

a. HNSW (Hierarchical Navigable Small World)

* Default index type in Chroma DB for vector search.
* **Key parameters:**

  * `M` → Maximum number of neighbors per node (trade-off between speed and accuracy).
  * `ef_construction` → Higher values improve accuracy at the cost of slower indexing.
* Example:

```python
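collection = client.create_collection(
    "optimized_collection",
    # HNSW settings passed as metadata keys (Chroma 0.4/0.5 naming; check your version).
    # The values are illustrative, not recommendations.
    metadata={"hnsw:space": "cosine", "hnsw:M": 16, "hnsw:construction_ef": 200},
    get_or_create=True
)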
```

* **Querying parameters:**

  * `ef` → Controls the number of candidates considered during search (higher = more accurate).

b. Hybrid Indexing

* Combine **metadata filtering** with vector search to reduce the search space.
* Example: Filter on a category before performing HNSW search.

---

**3. Scaling with Distributed Systems**

For **enterprise-scale workloads**, you may need to distribute Chroma DB or embed it into a larger pipeline.

a. Sharding

* Split large collections into multiple **shards** based on metadata or document type.
* Each shard can be queried independently and results merged.
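
The merge step can be sketched as follows, assuming each shard returns its hits as `(distance, id)` pairs where a smaller distance means a closer match. `merge_shard_results` is a hypothetical helper; Chroma does not provide sharding utilities of this kind.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (distance, document id); smaller distance = closer


def merge_shard_results(per_shard_hits: List[List[Hit]], k: int) -> List[Hit]:
    """Merge per-shard hit lists into the k globally closest hits."""
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    return heapq.nsmallest(k, all_hits)


shard_a = [(0.10, "a1"), (0.40, "a2")]
shard_b = [(0.05, "b1"), (0.30, "b2")]
top = merge_shard_results([shard_a, shard_b], k=3)
print(top)
```

For the merged list to be a correct global top-k, every shard has to be queried for at least `k` results, and all shards must use the same distance metric.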

b. Replication

* Maintain multiple copies of collections for **high availability** and **load balancing**.

c. Integration with Orchestrators

* Use **distributed orchestration frameworks** (e.g., Kubernetes, Airflow) to manage large-scale indexing, persistence, and querying.
* Store embeddings in distributed storage (DuckDB + Parquet, S3) for **scalable persistence**.

d. Approximate Nearest Neighbor (ANN) Trade-offs

* HNSW is **approximate**, so you can tune parameters to balance **query speed vs. accuracy**.
* Large datasets benefit from **lower `ef` during queries for speed** and higher `ef_construction` during indexing for accuracy.

---

**4. Summary of Performance Tips**

| Optimization Area      | Techniques                                                              |
| ---------------------- | ----------------------------------------------------------------------- |
| Memory & Storage       | Keep hot collections in memory, use persistent directory for durability |
| Batch Operations       | Add/query in batches to reduce overhead                                 |
| Embeddings             | Use lower dimensions if possible, normalize embeddings                  |
| Indexing               | Tune HNSW parameters: `M`, `ef_construction`, `ef`                      |
| Filtering              | Use metadata filters to reduce search space                             |
| Sharding & Replication | Split large collections, replicate for availability                     |
| Distributed Systems    | Integrate with orchestration frameworks and distributed storage         |
| ANN Trade-offs         | Adjust search parameters to balance speed and accuracy                  |

### **Integrating Chroma DB with LangChain**

Chroma DB is a natural fit for **Retrieval-Augmented Generation (RAG)** pipelines, where vector search enhances language model responses with relevant contextual data.

---

**1. Overview of RAG (Retrieval-Augmented Generation)**

**RAG** combines:

1. **Retrieval:** Fetch relevant documents from a knowledge base or vector store.
2. **Generation:** Use a language model (LLM) to generate responses based on retrieved documents.

**Benefits:**

* Provides contextually accurate answers.
* Reduces hallucinations by grounding the LLM with real data.
* Scales easily with your knowledge base.

---

**2. Setting Up Chroma DB with LangChain**

a. Install Dependencies

```bash
pip install chromadb langchain langchain-community langchain-openai
```

b. Initialize Chroma Client and Collection

```python
import chromadb

# Chroma 0.4+: PersistentClient writes to ./chroma_data automatically
client = chromadb.PersistentClient(path="./chroma_data")

collection = client.get_or_create_collection("my_collection")
```

---

**3. Integrating with Embedding Functions**

LangChain requires an **embedding function** compatible with your vector store:

```python
from langchain_openai import OpenAIEmbeddings

embedding_fn = OpenAIEmbeddings(model="text-embedding-3-large")
```

* Chroma DB collections can use the same embedding function for **document addition** and **querying**.

---

**4. Building a RAG Pipeline**

LangChain provides a **RetrievalQA** chain that connects Chroma DB (vector store) with LLMs:

a. Create a Chroma DB Retriever

```python
from langchain_community.vectorstores import Chroma

vectordb = Chroma(
    collection_name="my_collection",
    embedding_function=embedding_fn,
    persist_directory="./chroma_data"
)

retriever = vectordb.as_retriever(search_kwargs={"k": 3})  # fetch top 3 relevant docs
```

b. Connect Retriever to an LLM

```python
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model="gpt-4", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True  # optional: returns docs used for answers
)
```

---

c. Querying the RAG Pipeline

```python
query = "Explain the core concepts of Chroma DB"
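# .run() supports only a single output key, so use .invoke() when
# return_source_documents=True adds a second one
result = qa_chain.invoke({"query": query})

print(result["result"])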
```

* The **retriever** fetches relevant documents from Chroma DB.
* The **LLM** generates an answer grounded in the retrieved documents.

d. Accessing Source Documents

```python
result = qa_chain.invoke({"query": query})
answer = result["result"]
sources = result["source_documents"]
```

* Useful for **citation, debugging, or displaying retrieved context**.
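
For display or citation, the returned documents can be turned into a short numbered list. The sketch below uses a stand-in dataclass, `RetrievedDoc`, that mimics the `page_content` and `metadata` attributes of a LangChain `Document`, so it runs without any API calls. Both `RetrievedDoc` and `format_sources` are hypothetical names, and the `"source"` metadata key is an assumption about how the documents were loaded.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RetrievedDoc:
    """Stand-in exposing the same two attributes as a LangChain Document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_sources(docs: List[RetrievedDoc], max_chars: int = 60) -> str:
    """Render one numbered line per document: source name plus a short snippet."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        snippet = doc.page_content.strip().replace("\n", " ")[:max_chars]
        lines.append(f"[{i}] {source}: {snippet}")
    return "\n".join(lines)


docs = [
    RetrievedDoc("Chroma DB stores embeddings in collections.", {"source": "notes.txt"}),
    RetrievedDoc("HNSW enables approximate nearest neighbor search."),
]
print(format_sources(docs))
```

With a real chain, the list in `result["source_documents"]` could be passed in place of `docs`, since only the two attributes are accessed.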

---

**5. Best Practices for RAG Pipelines**

1. **Keep embedding models consistent:** Ensure embeddings used in Chroma DB match the embedding function used by the retriever.
2. **Tune `k` in search_kwargs:** Balance between context size and retrieval relevance.
3. **Filter with metadata:** Narrow retrieval results based on categories, timestamps, or document types.
4. **Persist collections:** Ensure the vector store persists across sessions for continuity.
5. **Batch document ingestion:** Improves indexing speed for large datasets.

---

**Summary**

| Component          | Role                                        | Example                                            |
| ------------------ | ------------------------------------------- | -------------------------------------------------- |
| Chroma DB          | Vector store for embeddings                 | `collection.add(...)`                              |
| Embedding Function | Converts text to vector                     | `OpenAIEmbeddings(model="text-embedding-3-large")` |
| Retriever          | Fetches top-k relevant docs                 | `vectordb.as_retriever(search_kwargs={"k":3})`     |
| LLM                | Generates answer based on retrieved context | `ChatOpenAI(model="gpt-4")`                        |
| RetrievalQA        | RAG chain combining retriever + LLM         | `RetrievalQA.from_chain_type(...)`                 |


### **How Chroma DB Works**

Chroma DB is designed as a **high-performance vector database** optimized for similarity search, semantic retrieval, and AI-driven pipelines. Understanding its internal architecture and algorithms helps optimize usage and design better systems.

**1. Core Architecture**

Chroma DB consists of the following main components:

a. Client Layer

* The **interface** for applications to interact with Chroma DB.
* Provides APIs to:

  * Create and manage collections
  * Add and query documents
  * Perform filtering and metadata operations
* Examples: Python SDK, HTTP API when Chroma runs as a server

b. Collection Layer

* Each **collection** is a logical container for:

  * **Embeddings (vectors)**
  * **Documents (raw data or content references)**
  * **Metadata (key-value pairs)**
* Collections isolate datasets and support multi-tenancy.

c. Storage Layer

* Persistent storage backend for vectors and metadata:

  * **SQLite** (used by Chroma 0.4 and later)
  * **DuckDB + Parquet** (legacy backend in releases before 0.4)
* Supports **disk persistence** and **recovery** after crashes.

d. Index Layer

* Responsible for fast nearest-neighbor search.
* Uses **Hierarchical Navigable Small World (HNSW) graphs** for vector indexing.

e. Query & Retrieval Layer

* Handles **similarity search** and **filtered retrieval**.
* Supports:

  * k-nearest neighbor search (k-NN)
  * Metadata-based filtering
  * Hybrid queries (embedding + metadata)

---

**2. How Data Flows in Chroma DB**

1. **Document ingestion**:

   * Raw data → Embedding function → Vector
   * Add vectors + metadata to a collection
2. **Indexing**:

   * HNSW index updates to include new vectors
3. **Querying**:

   * Input query → Embedding function → Query vector
   * HNSW index searches nearest neighbors
   * Metadata filters restrict which vectors are eligible as results
4. **Result Retrieval**:

   * Return the top-k IDs, documents, metadata, and distances (embeddings on request)
   * Can be fed into downstream applications (RAG, recommendation, etc.)

---

**3. Vector Storage and Representation**

* Vectors are stored in **dense numerical arrays**, typically float32.
* Each vector is associated with:

  * **Document content** (optional)
  * **ID** (unique identifier)
  * **Metadata** (key-value pairs)

Persistent Storage

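* Chroma DB serializes vectors and metadata to disk: SQLite tables plus HNSW index files in 0.4 and later, **Parquet files (via DuckDB)** in earlier releases.
* Releases before 0.4 flush changes with `client.persist()`; later releases persist writes automatically.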

---

**4. Indexing Mechanism: HNSW**

**HNSW (Hierarchical Navigable Small World)** is the primary algorithm for nearest neighbor search:

Key Concepts

* Builds a **multi-layered graph** where nodes represent vectors.
* **Top layer**: Sparse graph with long-range connections for fast approximate search.
* **Bottom layer**: Dense graph for precise local neighbor search.

Parameters

* `M`: Maximum number of links per node; higher M → better accuracy, more memory.
* `ef_construction`: Controls graph construction quality; higher → more accurate index.
* `ef` (query-time): Number of candidates explored during search; higher → slower but more accurate.

Search Algorithm

1. Start from the top layer and find closest node to query vector.
2. Navigate through layers using greedy search.
3. Reach bottom layer and perform **refined k-NN search**.

* **Advantages**:

  * Logarithmic search complexity for large datasets
  * High accuracy with approximate nearest neighbors
  * Incremental updates without rebuilding the entire index
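
The greedy navigation in step 2 can be illustrated on a single layer. The sketch below is a deliberately simplified toy, not the HNSW algorithm itself: it uses one hand-built graph, squared Euclidean distance, and returns a single node. Real HNSW repeats this walk layer by layer and keeps a candidate list of size `ef` at the bottom layer. All names are local to this example.

```python
from typing import Dict, List


def squared_l2(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_search(
    vectors: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
    entry: str,
    query: List[float],
) -> str:
    """Walk the graph, always moving to the neighbor closest to `query`.

    Stops when no neighbor is closer than the current node (a local minimum).
    """
    current = entry
    current_dist = squared_l2(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for candidate in neighbors[current]:
            d = squared_l2(vectors[candidate], query)
            if d < best_dist:
                best, best_dist = candidate, d
        if best == current:
            return current
        current, current_dist = best, best_dist


# Four points on a line, each linked to its immediate neighbors
vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [2.0, 0.0], "d": [3.0, 0.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

nearest = greedy_search(vectors, neighbors, entry="a", query=[2.9, 0.0])
print(nearest)
```

Because the walk only ever moves to a strictly closer node, it always terminates, but it can stop at a local minimum. That is why HNSW results are approximate and why a larger `ef` improves recall.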

---

**5. Metadata Filtering and Hybrid Search**

* Metadata allows **structured filtering** alongside vector search.
* During retrieval:

  * First, filter vectors by metadata
  * Then, perform HNSW k-NN search on the filtered subset
* Supports **complex filtering**: multiple conditions, range queries, boolean combinations.

---

**6. Embedding Functions and Model Integration**

* Chroma DB is **model-agnostic**: any embedding function that outputs fixed-size numeric vectors can be used.
* Typical integration workflow:

  * Text/Image → Embedding function (OpenAI, Hugging Face, custom) → Vector → Chroma DB
* Embeddings can be normalized for **cosine similarity** or used directly for **Euclidean distance**.

---

**7. Persistence, Durability, and Recovery**

* **Persistent directory** stores collections and indices.
* On restart, Chroma DB reloads vectors and indexes from disk.
* Supports **incremental persistence**, enabling high durability for large datasets.

---

**8. Scaling Considerations**

a. Sharding

* Split collections into smaller **logical shards** for distributed querying.

b. Replication

* Duplicate collections for **high availability** and **load balancing**.

c. Distributed Storage

* Leverage **DuckDB + Parquet on shared storage** for large-scale deployments.

d. Batch Operations

* Adding/querying in batches improves indexing speed and reduces memory overhead.

---

**9. Summary of Chroma DB Architecture**

| Layer              | Function                     | Notes                          |
| ------------------ | ---------------------------- | ------------------------------ |
| Client             | API interface                | Python SDK, REST API           |
| Collection         | Logical container            | Stores vectors, docs, metadata |
| Storage            | Persistent backend           | DuckDB + Parquet, SQLite       |
| Index              | HNSW nearest neighbor        | Efficient approximate k-NN     |
| Query              | Retrieval + filtering        | Embeddings + metadata          |
| Embedding Function | Converts raw data to vectors | OpenAI, Hugging Face, custom   |
| Scaling            | Sharding & replication       | For large datasets             |


# **Additional Notes - LangChain**

## **Introduction and Fundamentals**



**Introduction to LangChain**
LangChain is a framework designed to simplify the **development of applications using large language models (LLMs)**. It provides tools for building pipelines that combine LLMs, retrieval systems, prompts, and memory.

**What is LangChain?**
LangChain is a Python-based framework that allows developers to **orchestrate LLMs**, integrate external data sources, and create intelligent workflows that go beyond basic text generation.

**History and Evolution**

* Released to address the need for structured LLM applications.
* Evolved from simple LLM wrappers to a **comprehensive ecosystem** for building RAG systems, conversational agents, and AI workflows.
* Constantly expanding with new connectors, memory modules, and evaluation tools.

**Use Cases and Applications**

* Retrieval-Augmented Generation (RAG) systems
* Conversational AI agents
* Knowledge management and semantic search
* Summarization, question answering, and data analysis
* Automation and workflow orchestration using LLMs

**Architecture Overview**
LangChain's architecture consists of modular components that can be combined to create complex pipelines:

* **LLMs and Embeddings:** Core AI engines for generating text and creating vector representations
* **Chains:** Sequences of operations executed in order
* **Agents:** Decision-making entities that choose actions dynamically
* **Tools:** External utilities or APIs integrated into chains
* **Memory:** Persistent or session-based storage for conversational context

**Core Concepts**

**Chains, Agents, and Tools**

* **Chains:** Linear or branched sequences of LLM calls or operations
* **Agents:** Components that decide which action to take based on user input
* **Tools:** External resources or functions that agents can call (e.g., calculators, APIs, vector stores)

**Prompts and Prompt Templates**

* **Prompts:** Input text sent to LLMs
* **Prompt Templates:** Parameterized prompts allowing dynamic insertion of variables and context
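
The idea of a parameterized prompt can be shown with the standard library alone. The sketch below uses `string.Template` as a stand-in; it is not LangChain's `PromptTemplate` class, and the variable names `context` and `question` are assumptions chosen for a RAG-style prompt.

```python
from string import Template

# A fixed prompt skeleton with two placeholders
qa_template = Template(
    "Answer the question using only the context below.\n\n"
    "Context:\n$context\n\n"
    "Question: $question"
)

# Fill the placeholders at call time
prompt = qa_template.substitute(
    context="Chroma DB is a vector database.",
    question="What is Chroma DB?",
)
print(prompt)
```

Keeping the skeleton separate from the values means the same instructions are reused for every query, and `substitute` raises `KeyError` if a variable is missing.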

**LLMs (Large Language Models) and Embeddings**

* **LLMs:** Models like GPT-4 or LLaMA that generate text or complete tasks, accessed through wrapper classes such as `ChatOpenAI`
* **Embeddings:** Vector representations of text or data used for similarity search, retrieval, and semantic operations

**Memory in LangChain**

* Stores conversational context or intermediate results
* Types of memory:

  * **Conversation Buffer Memory:** Keeps the raw conversation history (a windowed variant keeps only the last few exchanges)
  * **Key-Value Memory:** Stores data for retrieval across sessions
  * **Summary Memory:** Condenses previous interactions into a summary
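
As an illustration of the buffer idea, the sketch below keeps only the last `k` exchanges using a bounded `deque`. `WindowMemory` is a hypothetical class written for this note and does not reproduce LangChain's memory interfaces.

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemory:
    """Keep only the last `k` (user, assistant) exchanges."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen discards the oldest entry automatically
        self._turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save(self, user: str, assistant: str) -> None:
        self._turns.append((user, assistant))

    def as_prompt(self) -> str:
        """Render the remembered turns as text to prepend to the next prompt."""
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self._turns)


memory = WindowMemory(k=2)
memory.save("Hi", "Hello!")
memory.save("What is RAG?", "Retrieval-Augmented Generation.")
memory.save("And Chroma?", "A vector database.")
print(memory.as_prompt())  # the first exchange has been dropped
```

A fixed window bounds prompt length at the cost of forgetting older turns, which is the gap summary-style memory is meant to close.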

**Setting Up the Environment**

**Installing LangChain**

```bash
pip install langchain
```

**Dependencies and Virtual Environments**

* Use **virtual environments** to manage dependencies:

```bash
python -m venv env
source env/bin/activate  # Linux/Mac
env\Scripts\activate     # Windows
```

* Install required libraries like `openai`, `chromadb`, `huggingface_hub` as needed:

```bash
pip install openai chromadb huggingface_hub
```

**Configuring API Keys (OpenAI, Hugging Face, etc.)**

* **OpenAI:**

```bash
export OPENAI_API_KEY="your_api_key_here"   # Linux/Mac
setx OPENAI_API_KEY "your_api_key_here"    # Windows
```

* **Hugging Face:**

```bash
export HUGGINGFACEHUB_API_TOKEN="your_token_here"
```

* Verify keys are accessible in your Python scripts:

```python
import os
openai_key = os.getenv("OPENAI_API_KEY")
```

This setup ensures LangChain can interact with **LLMs, embeddings, and external tools** effectively.
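
One way to catch a missing key early is to check the environment at startup. The sketch below is plain Python; the helper name `missing_env_vars` is hypothetical and not part of LangChain.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


required = ["OPENAI_API_KEY", "HUGGINGFACEHUB_API_TOKEN"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required keys are set.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.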


## **Prompt Engineering**



**Prompt Engineering** is the practice of designing and refining the input text (prompts) you give to a language model to get **accurate, relevant, and useful outputs**. Good prompt engineering is crucial for ensuring LLMs behave as expected.

---

**Simple vs. Complex Prompts**

Language models can respond differently depending on how prompts are structured.

| Type               | Description                                                                                                                      | Example                                                                                                                          |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| **Simple Prompt**  | Short and direct input, often a single question or instruction.                                                                  | `"Translate 'Hello' to French."`                                                                                                 |
| **Complex Prompt** | Includes multiple instructions, context, examples, or conditions. Often used for tasks requiring reasoning or structured output. | `"You are a helpful assistant. Translate 'Hello' to French, provide a formal and informal version, and explain the difference."` |

**Key Points:**

* **Simple prompts** are easier for beginners but may produce inconsistent results for complex tasks.
* **Complex prompts** guide the model more precisely, improving accuracy for multi-step or structured tasks.

---

**Templates and Variables**

Prompt templates allow you to **reuse prompts dynamically** by inserting variables.

* **Template Example:**

```text
Summarize the following text in one sentence: {text}
```

* **Python Usage with LangChain:**

```python
from langchain.prompts import PromptTemplate

template = PromptTemplate(
    input_variables=["text"],
    template="Summarize the following text in one sentence: {text}"
)

prompt_text = template.format(text="LangChain simplifies building LLM applications.")
print(prompt_text)
```

**Benefits of Using Templates:**

* Avoids rewriting prompts repeatedly
* Ensures consistency across multiple queries
* Makes prompts dynamic and adaptable to different inputs

---

**Few-Shot and Zero-Shot Prompting**

These techniques help the model understand the task better by providing examples or relying on instructions only.

| Technique               | Description                                                                     | Example                                                                                            |
| ----------------------- | ------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| **Zero-Shot Prompting** | You provide **only the task description**; the model must infer how to respond. | `"Translate 'Good morning' to Spanish."`                                                           |
| **Few-Shot Prompting**  | You provide **a few examples along with the task** to guide the model.          | `"English: Hello → Spanish: Hola\nEnglish: Thank you → Spanish: Gracias\nEnglish: Good morning →"` |

**Tips for Effective Prompting:**

* Start with **clear and concise instructions**
* Include **examples for few-shot prompts** to improve accuracy
* Specify the **format or style of output** if needed
* Test different prompt variations to see which yields the best results
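
To make the few-shot pattern concrete, here is a minimal plain-Python sketch that assembles the example pairs and the open query into one prompt string. The function name `build_few_shot_prompt` is an illustrative assumption, not a LangChain API.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Render example pairs followed by the unanswered query."""
    lines = [f"English: {src} → Spanish: {dst}" for src, dst in examples]
    lines.append(f"English: {query} → Spanish:")
    return "\n".join(lines)


examples = [("Hello", "Hola"), ("Thank you", "Gracias")]
prompt = build_few_shot_prompt(examples, "Good morning")
print(prompt)
```

With an empty example list the same function produces a bare query line, which is a quick way to compare few-shot and zero-shot variants of one task.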

---

**Key Takeaways:**

* Simple prompts are quick and easy; complex prompts provide better guidance.
* Templates allow dynamic, reusable, and consistent prompts.
* Zero-shot relies purely on instructions; few-shot improves reliability with examples.
* Effective prompt engineering directly impacts **model performance and output quality**.





## **Chains**

In LangChain, **Chains** are the fundamental building blocks for creating workflows. A chain connects multiple steps (such as LLM calls, data transformations, or retrieval operations) so you can process input and generate output in a structured way.

---

**Simple Sequential Chains**

* A **Sequential Chain** executes steps in order, passing the output of one step to the next.
* Useful for **linear workflows** where each operation depends on the previous one.

**Example:** Summarization after translation

```python
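from langchain.chains import LLMChain, SequentialChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# gpt-4 is a chat model, so use the chat wrapper
llm = ChatOpenAI(model="gpt-4", temperature=0)

# Each step needs a PromptTemplate and a named output key
translate = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template("Translate '{text}' to French."),
    output_key="french_text",
)
summarize = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template("Summarize this French text in one sentence: {french_text}"),
    output_key="summary",
)

# The output key of the first step feeds the input variable of the second
chain = SequentialChain(
    chains=[translate, summarize],
    input_variables=["text"],
    output_variables=["summary"],
)

result = chain.invoke({"text": "LangChain simplifies building LLM applications."})
print(result["summary"])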
```

**Key Points:**

* Steps are executed **in order**
* Output of one chain can become input for the next
* Ideal for **linear tasks**

---

**LLMChain vs. SequentialChain**

| Chain Type          | Description                                              | Use Case                                                     |
| ------------------- | -------------------------------------------------------- | ------------------------------------------------------------ |
| **LLMChain**        | Wraps a **single LLM call** with a prompt template.      | Translation, summarization, or single-step generation tasks. |
| **SequentialChain** | Combines **multiple chains or LLMChains** in a sequence. | Multi-step workflows like translate → summarize → reformat.  |

**Key Differences:**

* **LLMChain** = single operation
* **SequentialChain** = multiple operations chained together
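
The difference can be sketched without any LLM at all. In the plain-Python example below, each step is a function from a state dictionary to new keys, and `run_sequence` (a hypothetical helper, not a LangChain function) plays the role of the sequential wrapper. The `fake_*` functions are stand-ins for model calls.

```python
from typing import Callable, Dict, List

Step = Callable[[Dict[str, str]], Dict[str, str]]


def run_sequence(steps: List[Step], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run each step in order, merging its outputs into the shared state."""
    state = dict(inputs)
    for step in steps:
        state.update(step(state))
    return state


# Stand-ins for LLM calls so the sketch runs offline.
def fake_translate(state: Dict[str, str]) -> Dict[str, str]:
    return {"french_text": f"[FR] {state['text']}"}


def fake_summarize(state: Dict[str, str]) -> Dict[str, str]:
    return {"summary": state["french_text"][:20]}


# A single step on its own, then two steps run in sequence
single = fake_translate({"text": "Hello"})
multi = run_sequence([fake_translate, fake_summarize], {"text": "Hello world"})
print(single)
print(multi["summary"])
```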

---

**Conditional and Custom Chains**

* Sometimes, workflows need **decision-making** based on input or intermediate results.
* **Conditional Chains** allow branching using `if-else` logic or custom functions.
* **Custom Chains** let you define your own logic for processing inputs and outputs.

**Example:** Conditional chain based on input length

```python
from langchain.chains import TransformChain

# Plain Python function holding the branching logic
def conditional_logic(inputs: dict) -> dict:
    text = inputs["text"]
    if len(text.split()) > 10:
        return {"output": f"Long text summary: {text[:50]}..."}
    else:
        return {"output": f"Short text: {text}"}

# LangChain has no SimpleChain class; TransformChain wraps an arbitrary function as a chain step
custom_chain = TransformChain(
    input_variables=["text"],
    output_variables=["output"],
    transform=conditional_logic,
)
result = custom_chain.invoke({"text": "This is a very long text example to demonstrate conditional chains in LangChain."})
print(result["output"])
```

**Key Points:**

* Conditional Chains = branch execution based on logic
* Custom Chains = full control over input/output processing
* Useful for **dynamic workflows** or **task-specific routing**

---

**Summary Table of Chains in LangChain**

| Chain Type            | Characteristics                              | Example                                   |
| --------------------- | -------------------------------------------- | ----------------------------------------- |
| **LLMChain**          | Single LLM call with prompt                  | Translate text                            |
| **SequentialChain**   | Multiple LLMChains or operations in sequence | Translate → Summarize → Reformat          |
| **Conditional Chain** | Branches execution based on conditions       | Long vs. short text processing            |
| **Custom Chain**      | Fully custom Python logic                    | Specialized processing or transformations |



## **Memory**



Memory in LangChain is a **mechanism to store and recall information** across multiple steps in a workflow or conversation. It is especially important in **Retrieval-Augmented Generation (RAG)** pipelines and conversational AI applications, where context and previous interactions influence responses. By maintaining memory, language models can produce **more coherent, context-aware, and accurate outputs** over time.

---

**Types of Memory**

LangChain supports multiple types of memory, each suited for different tasks:

| Memory Type                               | Description                                                                                                             | Use Case                                                                                                                                 |
| ----------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| **Conversation Buffer Memory**            | Stores the conversation history verbatim; a windowed variant keeps only the last few interactions.                     | Short-term chat history for chatbots or assistants. Helps maintain context across recent messages.                                       |
| **Key-Value Memory**                      | Stores information as key-value pairs that can be retrieved later.                                                      | Useful for storing structured data, such as user preferences, variables, or intermediate results in multi-step workflows.                |
| **Summary Memory**                        | Condenses previous interactions or data into a **summarized form** to save space while retaining essential information. | Ideal for long conversations or large documents, reducing memory usage while maintaining context.                                        |
| **Document/Vector Memory (RAG-specific)** | Stores retrieved documents or vector embeddings from external knowledge bases.                                          | Enables retrieval-augmented generation, where models answer questions based on external documents rather than just conversation context. |

---

**Implementing Memory in Chains**

Memory in LangChain can be integrated directly into **chains**, allowing each step to access and update stored information:

```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate

# Initialize memory
memory = ConversationBufferMemory(memory_key="chat_history")

# Create a prompt template that includes memory
template = PromptTemplate(
    input_variables=["chat_history", "user_input"],
    template="Previous conversation: {chat_history}\nUser: {user_input}\nAssistant:"
)

llm = ChatOpenAI(model="gpt-4", temperature=0.5)

chain = LLMChain(llm=llm, prompt=template, memory=memory)

# Run the chain multiple times
response1 = chain.run(user_input="Hello! Who won the World Cup in 2018?")
response2 = chain.run(user_input="Can you summarize that result?")
```

* `ConversationBufferMemory` automatically keeps track of previous exchanges.
* Memory can be **read and updated dynamically** during the workflow.
* Using memory in chains allows the model to **reference prior context without repeating it in every prompt manually**.

---

**Long-Term vs. Short-Term Memory**

* **Short-Term Memory:**

  * Stores immediate or recent context, often limited to the last few interactions.
  * Examples: `ConversationBufferMemory` for chat history or a session-specific variable store.
  * Benefits: Lightweight, fast, reduces token usage, ideal for short conversations.

* **Long-Term Memory:**

  * Stores information persistently across sessions or for extended periods.
  * Examples: `ConversationSummaryMemory` or a vector database storing embeddings for RAG applications.
  * Benefits: Maintains knowledge over time, supports complex workflows, enhances model consistency in repeated interactions.

**Key Considerations:**

* **Memory management is critical** for performance and relevance; storing too much can slow down chains or increase token usage.
* **RAG systems** often combine long-term memory (external knowledge base) with short-term memory (recent conversation) for the best results.
* Memory can also be **augmented with filtering or summarization**, ensuring that the most relevant information is used for each step.
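
As a rough illustration of bounding short-term memory, the toy class below keeps only the most recent `k` exchanges. It is a plain-Python sketch, not LangChain's implementation; LangChain's `ConversationBufferWindowMemory` offers comparable behaviour, and the name `WindowMemory` here is hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemory:
    """Keep only the most recent `k` (user, assistant) exchanges."""

    def __init__(self, k: int = 3) -> None:
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def render(self) -> str:
        """Format the retained turns for insertion into a prompt."""
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)


memory = WindowMemory(k=2)
memory.save("Hi", "Hello!")
memory.save("Who won the 2018 World Cup?", "France.")
memory.save("Where was it held?", "Russia.")
print(memory.render())  # the first exchange has been dropped
```

A fixed window caps prompt growth at the cost of forgetting older turns, which is the trade-off that summarization-based memory tries to soften.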

---

**Summary Table: Memory Overview**

| Aspect         | Short-Term Memory             | Long-Term Memory                            |
| -------------- | ----------------------------- | ------------------------------------------- |
| **Purpose**    | Maintain recent context       | Maintain persistent knowledge               |
| **Examples**   | ConversationBufferMemory      | ConversationSummaryMemory, vector store embeddings |
| **Scope**      | Session-specific              | Across multiple sessions                    |
| **Advantages** | Lightweight, fast             | Retains knowledge, supports complex queries |
| **Use Case**   | Chatbots, temporary workflows | RAG pipelines, knowledge-grounded agents    |

## **Documents and Text Processing**



Processing documents and text is a **critical part of building intelligent applications** with LangChain. This involves organizing, splitting, embedding, storing, and retrieving textual data so that large language models (LLMs) can use it effectively for tasks like summarization, question answering, or retrieval-augmented generation (RAG).

---

**LangChain Documents**

* In LangChain, a **Document** is a standard data structure representing a piece of text along with optional metadata.
* Metadata can include:

  * Document title, author, or source URL
  * Timestamps or creation dates
  * Categories or tags for filtering
* Using metadata helps organize large datasets and enables **filtered queries** in vector stores.

**Example:**

```python
from langchain.schema import Document

doc = Document(
    page_content="LangChain simplifies building workflows with LLMs.",
    metadata={"source": "tutorial", "topic": "LangChain Basics"}
)
```

* Documents are the **primary unit** for embedding and retrieval operations.

---

**Text Splitters**

* **Text splitting** is necessary when dealing with large documents to:

  * Break text into chunks compatible with LLM token limits
  * Preserve context across segments
* LangChain provides multiple **text splitter classes**:

  * **CharacterTextSplitter:** Splits text by character count
  * **RecursiveCharacterTextSplitter:** Recursively splits text by paragraphs, sentences, or characters
  * **TokenTextSplitter:** Splits text based on tokens for better LLM compatibility

**Example:**

```python
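from langchain.text_splitter import RecursiveCharacterTextSplitter

# Placeholder input; in practice this is the text of a loaded document
long_text = "LangChain simplifies building LLM applications. " * 100

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(long_text)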
```
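
To see what `chunk_size` and `chunk_overlap` mean in isolation, here is a simplified sliding-window splitter in plain Python. It is only a sketch: unlike `RecursiveCharacterTextSplitter`, it cuts at fixed character offsets and ignores paragraph or sentence boundaries. The function name `split_with_overlap` is hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Slide a fixed-size window over the text, stepping by size minus overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each chunk repeats the last three characters of the previous one, which is how overlap carries context across chunk boundaries.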

---

**Document Loaders (PDF, CSV, Webpages, etc.)**

LangChain supports **loading documents from various sources**:

| Loader Type                | Description                             | Example                                       |
| -------------------------- | --------------------------------------- | --------------------------------------------- |
| **PyPDFLoader**            | Loads PDF files and extracts text       | `PyPDFLoader("file.pdf").load()`              |
| **CSVLoader**              | Reads CSV files as structured documents | `CSVLoader("data.csv").load()`                |
| **WebBaseLoader**          | Extracts text from web pages            | `WebBaseLoader("https://example.com").load()` |
| **UnstructuredFileLoader** | Generic loader for TXT, DOCX, etc.      | `UnstructuredFileLoader("file.txt").load()`   |

* Loaders automatically return a **list of Documents**, ready for splitting and embedding.

---

**Embeddings**

**What are Embeddings?**

* Embeddings are **numerical vector representations of text or data**, capturing semantic meaning.
* They allow LLMs and vector databases to perform **similarity searches**, semantic clustering, and retrieval tasks.
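
A common way to compare two embeddings is cosine similarity. The sketch below uses hand-written three-dimensional vectors purely for illustration; real embedding models return vectors with hundreds or thousands of dimensions, and the values here are made up.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


# Toy "embeddings": the first two point in similar directions, the third does not.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))
print(cosine_similarity(cat, invoice))
```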

**Using OpenAI and Hugging Face Embeddings**

* **OpenAI Embeddings** example:

```python
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(model="text-embedding-3-large")
vector = embedding_model.embed_query("LangChain is powerful")
```

* **Hugging Face Embeddings** example:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector = embedding_model.embed_query("LangChain simplifies workflows")
```

**Fine-Tuning Embeddings for Custom Tasks**

* Fine-tuning embeddings allows models to better capture **domain-specific semantics**.
* Methods include:

  * Training with **domain-specific text pairs** for similarity
  * Adjusting embedding dimensions or normalizations
  * Using supervised or contrastive learning to improve retrieval relevance

---

**Vector Stores**

**Introduction to Vector Databases**

* Vector databases store embeddings efficiently and enable **fast similarity searches**.
* They are optimized for **nearest neighbor queries** over large datasets.

**Popular Vector Databases**

| Database     | Features                            | Notes                                  |
| ------------ | ----------------------------------- | -------------------------------------- |
| **Chroma**   | Open-source, Python-native          | Easy integration with LangChain        |
| **Pinecone** | Cloud-based, scalable               | Fully managed service                  |
| **FAISS**    | Facebook AI library, local indexing | High-performance, GPU acceleration     |
| **Weaviate** | Open-source, cloud-ready            | Supports GraphQL API                   |
| **Milvus**   | High-performance vector DB          | Suitable for enterprise-scale datasets |

**Storing and Querying Embeddings**

* After generating embeddings, they are added to a vector store with optional metadata:

```python
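import chromadb

# In-memory client; `vector` is the embedding computed in the example above
client = chromadb.Client()
collection = client.get_or_create_collection("tutorial_docs")

collection.add(
    documents=["LangChain simplifies LLM workflows."],
    embeddings=[vector],
    metadatas=[{"source": "tutorial"}],
    ids=["doc1"]
)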
```

* Queries return **top-k similar documents** based on vector similarity:

```python
query_vector = embedding_model.embed_query("What does LangChain do?")
results = collection.query(query_embeddings=[query_vector], n_results=3)
```
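
Conceptually, a top-k query scores every stored vector against the query vector and keeps the best matches. The brute-force sketch below shows that idea in plain Python; it is not how Chroma is implemented (production stores use approximate nearest-neighbour indexes), and `TinyVectorStore` and the two-dimensional vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class TinyVectorStore:
    """Brute-force store: keeps every vector and scans them all on each query."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[List[float], str]] = {}

    def add(self, doc_id: str, embedding: List[float], document: str) -> None:
        self._items[doc_id] = (embedding, document)

    def query(self, query_embedding: Sequence[float], n_results: int = 3) -> List[Tuple[str, float]]:
        """Return (id, score) pairs for the n_results most similar documents."""
        scored = [(doc_id, _cosine(query_embedding, emb)) for doc_id, (emb, _) in self._items.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:n_results]


store = TinyVectorStore()
store.add("doc1", [1.0, 0.0], "LangChain simplifies LLM workflows.")
store.add("doc2", [0.0, 1.0], "Chroma stores embeddings.")
store.add("doc3", [0.7, 0.7], "RAG combines retrieval and generation.")
print(store.query([0.9, 0.1], n_results=2))
```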




## **Setting up a RAG Pipeline**

**Retrieval-Augmented Generation (RAG)** is a method where a **language model (LLM) generates answers based on retrieved documents** from a vector database. This approach allows the model to produce **accurate, contextually grounded responses** without relying solely on its internal knowledge.

---

**Steps to Set Up a RAG Pipeline**

1. **Prepare Your Documents**

   * Collect textual data from PDFs, web pages, CSVs, or plain text.
   * Use **document loaders** and optionally **split large documents** into smaller chunks using text splitters.

   ```python
   from langchain_community.document_loaders import PyPDFLoader
   from langchain.text_splitter import RecursiveCharacterTextSplitter

   # PyPDFLoader requires the `pypdf` package
   loader = PyPDFLoader("example.pdf")
   documents = loader.load()

   splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
   docs = splitter.split_documents(documents)
   ```

2. **Generate Embeddings**

   * Convert text chunks into **vector embeddings** using OpenAI, Hugging Face, or custom models.

   ```python
   from langchain_openai import OpenAIEmbeddings

   embedding_model = OpenAIEmbeddings(model="text-embedding-3-large")
   # embed_documents batches the chunks; embed_query is meant for single search queries
   embeddings = embedding_model.embed_documents([doc.page_content for doc in docs])
   ```

3. **Store Embeddings in a Vector Store**

   * Choose a vector database like **Chroma, FAISS, Pinecone, Weaviate, or Milvus**.
   * Add documents, embeddings, and metadata for efficient retrieval.

   ```python
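   import chromadb
   client = chromadb.Client()
   collection = client.create_collection("my_docs")
   ids = [f"chunk-{i}" for i in range(len(docs))]  # Chroma requires a unique id per item
   collection.add(ids=ids, documents=[doc.page_content for doc in docs], embeddings=embeddings)
   ```

4. **Set Up a Retriever**

   * A retriever queries the vector store and returns **top-k relevant documents** for a user query. A raw Chroma collection has no `as_retriever` method, so wrap it in LangChain's `Chroma` vector store first.

   ```python
   from langchain_community.vectorstores import Chroma
   vectorstore = Chroma(client=client, collection_name="my_docs", embedding_function=embedding_model)
   retriever = vectorstore.as_retriever(search_kwargs={"k": 5})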
   ```

5. **Integrate the LLM**

   * Combine the retriever with an LLM to **generate answers using the retrieved context**.

   ```python
   from langchain_openai import ChatOpenAI
   from langchain.chains import RetrievalQA

   llm = ChatOpenAI(model="gpt-4", temperature=0)
   rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
   ```

6. **Run the RAG Pipeline**

   * Input a user query and get a context-aware response:

   ```python
   query = "Explain how LangChain handles document embeddings."
   answer = rag_chain.run(query)
   print(answer)
   ```

---

**Combining LLMs with Vector Stores**

* **Vector Stores** provide **retrieval capability** by returning the most semantically relevant documents.
* **LLMs** generate answers **grounded in the retrieved documents**, reducing hallucinations and improving accuracy.
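
The "augmentation" step amounts to placing the retrieved chunks in the prompt ahead of the question. Here is a minimal plain-Python sketch of that formatting; the wording of the instructions and the function name `build_rag_prompt` are assumptions for illustration, not what `RetrievalQA` uses internally.

```python
from typing import List


def build_rag_prompt(question: str, chunks: List[str]) -> str:
    """Number each retrieved chunk and place it ahead of the question."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


retrieved = [
    "Chroma is an open-source vector database.",
    "LangChain retrievers return the top-k most similar chunks.",
]
print(build_rag_prompt("What does a retriever return?", retrieved))
```

Numbering the chunks makes it possible to ask the model to cite which chunk supported its answer.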

**Key Benefits of LLM + Vector Store Integration:**

| Feature               | Benefit                                                                                                 |
| --------------------- | ------------------------------------------------------------------------------------------------------- |
| Context-Aware Answers | The LLM generates responses using real documents instead of relying solely on its pretrained knowledge. |
| Scalability           | Easily scale to large datasets by storing embeddings in vector databases.                               |
| Flexibility           | Supports multiple data sources: PDFs, web pages, CSVs, or any text documents.                           |
| Accuracy              | Metadata filtering and top-k retrieval improve relevance and precision.                                 |

---

**Tips for Optimizing RAG Pipelines:**

* Use **chunked documents** to ensure the LLM receives context that fits within token limits.
* Fine-tune **retriever parameters** like `k` or distance metric to balance accuracy and performance.
* Include **metadata filtering** for domain-specific queries (e.g., date, category).
* Optionally, maintain **short-term memory** for multi-turn conversations to improve continuity.
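
Metadata filtering can be pictured as narrowing the candidate set before similarity ranking. The sketch below applies an exact-match filter over plain dictionaries; the documents, field names, and the helper `filter_by_metadata` are invented for illustration, and real vector stores expose their own filter syntax.

```python
from typing import Any, Dict, List


def filter_by_metadata(docs: List[Dict[str, Any]], where: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep documents whose metadata matches every key/value pair in `where`."""
    return [
        doc for doc in docs
        if all(doc["metadata"].get(key) == value for key, value in where.items())
    ]


docs = [
    {"text": "Q1 revenue summary", "metadata": {"category": "finance", "year": 2024}},
    {"text": "Onboarding guide", "metadata": {"category": "hr", "year": 2024}},
    {"text": "Q1 revenue summary (prior year)", "metadata": {"category": "finance", "year": 2023}},
]

candidates = filter_by_metadata(docs, {"category": "finance", "year": 2024})
print([doc["text"] for doc in candidates])
```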

## **Setting up a RAG Pipeline with LCEL**



**LangChain Expression Language (LCEL)** allows you to define **RAG workflows declaratively** using a concise, chainable syntax. By combining LCEL with vector stores and LLMs, you can build a **dynamic retrieval-augmented generation pipeline** without manually orchestrating each step.

**Steps to Set Up a RAG Pipeline with LCEL**

1. **Prepare and Embed Documents**

   * Load and split documents, then generate embeddings as usual:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings

# Load PDF documents (PyPDFLoader requires the `pypdf` package)
loader = PyPDFLoader("example.pdf")
documents = loader.load()

# Split documents into manageable chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(documents)

# Create embeddings in one batched call
embedding_model = OpenAIEmbeddings(model="text-embedding-3-large")
embeddings = embedding_model.embed_documents([doc.page_content for doc in docs])
```

2. **Store Embeddings in a Vector Store**

```python
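import chromadb
from langchain_community.vectorstores import Chroma

client = chromadb.Client()
collection = client.create_collection("my_docs")
ids = [f"chunk-{i}" for i in range(len(docs))]  # Chroma requires a unique id per item
collection.add(ids=ids, documents=[doc.page_content for doc in docs], embeddings=embeddings)

# Wrap the collection in LangChain's Chroma store; raw collections have no as_retriever()
vectorstore = Chroma(client=client, collection_name="my_docs", embedding_function=embedding_model)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})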
```

3. **Define the LCEL RAG Chain**

* LCEL allows you to define a **retrieval + prompt + LLM** pipeline in a single, readable expression.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)

# Example custom prompt template
custom_prompt = ChatPromptTemplate.from_template(
    "Using the context below, answer the question.\nContext: {context}\nQuestion: {question}\nAnswer:"
)

# Define the LCEL chain
rag_chain_lcel = (
    {
        # Retrieve relevant documents and format them
        "context": retriever | (lambda docs: "\n".join(d.page_content for d in docs)),
        # Pass the user query directly
        "question": RunnablePassthrough()
    }
    # Fill the prompt template (a bare lambda cannot follow a plain dict with `|`)
    | custom_prompt
    # Generate the answer using the LLM
    | llm
    # Parse the output into a clean string
    | StrOutputParser()
)
```

4. **Run the LCEL RAG Chain**

```python
query = "Explain how LangChain handles document embeddings."
answer = rag_chain_lcel.invoke(query)
print(answer)
```

---

**How LCEL Improves the RAG Workflow**

| Feature                      | Benefit                                                                                                        |
| ---------------------------- | -------------------------------------------------------------------------------------------------------------- |
| **Declarative Syntax**       | Define the pipeline in a concise, readable way without manually chaining functions.                            |
| **Composable Steps**         | Retrieval, formatting, LLM invocation, and output parsing are easily chained with the `\|` operator.           |
| **Dynamic Input Handling**   | User queries are passed directly via `RunnablePassthrough()`, making it flexible for interactive applications. |
| **Integration with Prompts** | Custom prompt templates can be applied dynamically to retrieved documents before generating responses.         |
| **Output Parsing**           | Built-in output parsers ensure the LLM response is clean and usable.                                           |
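
For intuition about how the pipe operator composes steps, the toy class below overloads `__or__` so that `a | b` yields a new step that runs `a` and then `b`. This is a simplified sketch of the idea only, not LCEL's actual `Runnable` implementation; the `Step` class and the offline stand-ins are hypothetical.

```python
from typing import Any, Callable


class Step:
    """Wrap a function so that `a | b` builds a new step running a, then b."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Offline stand-ins for the retriever, prompt, and model stages.
retrieve = Step(lambda question: {"context": "Chroma stores embeddings.", "question": question})
format_prompt = Step(lambda x: f"Context: {x['context']}\nQuestion: {x['question']}\nAnswer:")
fake_llm = Step(lambda prompt: f"(model output for {len(prompt)} prompt characters)")

pipeline = retrieve | format_prompt | fake_llm
print(pipeline.invoke("What does Chroma store?"))
```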

---

**Key Tips for LCEL RAG Pipelines:**

* Use **metadata filtering** in the retriever to improve relevance.
* Chunk documents appropriately to fit within LLM token limits.
* Combine **short-term memory** with retrieval to maintain context in multi-turn conversations.
* Fine-tune embeddings or prompt templates for **domain-specific accuracy**.

## **Agents and Tools**

Agents are a **core concept in LangChain** that allow for more dynamic and intelligent workflows compared to static chains. They act as **decision-making entities**, determining which actions to take based on user input, retrieved information, or intermediate results. Agents are particularly useful when tasks require reasoning, multi-step decision-making, or the integration of external tools.

---

**What are Agents?**

* Agents are components that **decide what to do next** rather than just performing a predefined sequence of operations.
* Unlike chains that execute steps linearly, agents can:

  * Call different tools dynamically
  * Handle conditional logic
  * Iterate until a satisfactory solution is found
* Agents are ideal for **complex workflows, question answering, or multi-turn conversations** where the model must reason about which action to take next.

**Key Characteristics of Agents:**

* **Autonomy:** Agents can choose the next action without human intervention.
* **Flexibility:** They can call different tools based on context.
* **Context-Aware:** Agents can maintain memory and use retrieved information to inform decisions.

---

**Types of Agents**

| Agent Type               | Description                                                                                                                                               | Example Use Case                                                               |
| ------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
| **Zero-Shot Agent**      | Operates without examples, relying on instructions and prompts to decide actions.                                                                         | Quickly answering questions using only a tool description.                     |
| **ReAct Agent**          | Combines **reasoning and action** in iterative steps. It alternates between thinking (reasoning) and acting (tool execution) until it reaches a solution. | Solving complex math problems by retrieving data and calculating step-by-step. |
| **Conversational Agent** | Maintains context across a conversation, allowing multi-turn interactions with the user. Often integrated with memory to provide coherent responses.      | Customer support chatbot that remembers previous queries.                      |

**Key Differences:**

* Zero-shot agents are **simpler and faster**, suitable for straightforward tasks.
* ReAct agents are **more robust for complex problem-solving** due to their iterative reasoning.
* Conversational agents are **optimized for dialogue**, leveraging memory and context tracking.
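
The reason-then-act loop behind a ReAct-style agent can be sketched with a scripted policy standing in for the LLM. Everything below is a hypothetical toy (the `run_agent` helper, the decision tuple format, and the `multiply` tool); it shows only the control flow, not LangChain's agent executor.

```python
from typing import Callable, Dict, List, Tuple

# A decision is ("act", tool_name, tool_input) or ("finish", answer, "").
Decision = Tuple[str, str, str]


def run_agent(
    decide: Callable[[str, List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Alternate between deciding and acting until the policy finishes."""
    observations: List[str] = []
    for _ in range(max_steps):
        kind, name_or_answer, tool_input = decide(question, observations)
        if kind == "finish":
            return name_or_answer
        # Run the chosen tool and record what it returned
        observations.append(tools[name_or_answer](tool_input))
    return "Stopped: step limit reached."


# Scripted policy standing in for the LLM's reasoning.
def scripted_policy(question: str, observations: List[str]) -> Decision:
    if not observations:
        return ("act", "multiply", "25,13")
    return ("finish", f"The result is {observations[-1]}.", "")


def multiply(args: str) -> str:
    a, b = (int(part) for part in args.split(","))
    return str(a * b)


print(run_agent(scripted_policy, {"multiply": multiply}, "What is 25 multiplied by 13?"))
```

The `max_steps` cap matters in practice: without it, a policy that never decides to finish would loop indefinitely.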

---

**Tool-Enabled Agents**

* Tool-enabled agents can **access external APIs, databases, calculators, or other utilities** to perform tasks beyond the capabilities of an LLM alone.
* Tools are registered with the agent, and the agent decides **which tool to call and when**.

**Example:** Using a calculator tool in a zero-shot ReAct agent

```python
from langchain.agents import initialize_agent, Tool
from langchain_openai import ChatOpenAI

# Define a simple calculator tool
# Caution: eval() executes arbitrary code; acceptable only for a local demo with trusted input
def calculator(input_text: str) -> str:
    return str(eval(input_text))

tools = [
    Tool(
        name="Calculator",
        func=calculator,
        description="Performs arithmetic calculations"
    )
]

# Initialize an LLM
llm = ChatOpenAI(model="gpt-4", temperature=0)

# Create a tool-enabled agent
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# Run the agent with a query
response = agent.run("What is 25 multiplied by 13?")
print(response)
```

**Benefits of Tool-Enabled Agents:**

* Extend the capabilities of LLMs to **perform real-world tasks**.
* Allow **dynamic decision-making**, selecting the appropriate tool based on the query.
* Enable integration with **databases, APIs, and custom functions**, making workflows more intelligent.
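
Because tool input is produced by a model, a calculator tool is safer when it parses the expression instead of handing it to `eval`. The sketch below uses Python's `ast` module to accept only numeric literals and the four basic operators. The function name `safe_calculate` is hypothetical, and the whitelist is deliberately minimal.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_calculate(expression: str) -> Number:
    """Evaluate +, -, *, / on numeric literals; reject everything else."""

    def _eval(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        # Function calls, names, attribute access, and so on all end up here
        raise ValueError("Unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_calculate("25 * 13"))
print(safe_calculate("125 * 32 + 10"))
```

Wrapping the result as `str(safe_calculate(text))` would give a function suitable for the `func` argument of a `Tool`.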

---

**Best Practices for Using Agents:**

1. **Define Clear Tool Descriptions:** Agents rely on tool descriptions to decide which to use.
2. **Limit Token Usage in Prompts:** Large context with many tools can increase token consumption.
3. **Use Memory for Multi-Turn Tasks:** Conversational or task-tracking agents benefit from memory integration.
4. **Combine with RAG Pipelines:** Agents can retrieve relevant documents and then act based on the retrieved knowledge.
5. **Monitor and Log Decisions:** For debugging, verbose logging helps understand agent reasoning and tool usage.




## **Tools**

In LangChain, **tools** are external utilities, APIs, or functions that an agent can leverage to perform specific tasks that go beyond the capabilities of a language model alone. Tools enable agents to **interact with the real world, perform computations, fetch external data, and execute custom workflows**, making them essential for building intelligent, dynamic AI applications.

---

**Integrating APIs and Functions**

* Tools can be **external APIs** or **Python functions** registered with an agent.
* Each tool typically includes:

  * **Name:** The identifier used by the agent to reference the tool
  * **Function:** The callable logic that performs the task
  * **Description:** Explains when and how the tool should be used

**Example: API Integration**

```python
from langchain.agents import Tool
import requests

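def weather_api(city: str) -> str:
    # Pass the city via params so it is URL-encoded; replace YOUR_API_KEY with a real key
    response = requests.get(
        "https://api.weatherapi.com/v1/current.json",
        params={"key": "YOUR_API_KEY", "q": city},
        timeout=10,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    data = response.json()
    return f"The current weather in {city} is {data['current']['condition']['text']} with a temperature of {data['current']['temp_c']}°C."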

weather_tool = Tool(
    name="WeatherChecker",
    func=weather_api,
    description="Provides the current weather for a specified city."
)
```

* Agents can dynamically select and call the API when a query requires weather information.

**Benefits:**

* Makes agents **context-aware** and able to respond to real-world queries.
* Allows integration with **any web-based service or internal API**.

---

**Python REPL and Custom Tool Integration**

* The **Python REPL tool** enables agents to execute arbitrary Python code at runtime.
* Useful for tasks such as:

  * Mathematical computations
  * Data transformations
  * Dynamic logic execution

**Example: Python REPL Tool**

```python
from langchain.agents import Tool, initialize_agent
from langchain_openai import ChatOpenAI

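# Minimal stand-in for a Python REPL tool.
# eval() handles single expressions only (not statements) and runs them unsandboxed,
# so this should only be used with trusted input.
python_tool = Tool(
    name="Python REPL",
    func=lambda code: str(eval(code)),
    description="Evaluates a Python expression and returns the result."
)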

# Initialize an LLM
llm = ChatOpenAI(model="gpt-4", temperature=0)

# Create an agent with Python REPL capabilities
agent = initialize_agent(
    tools=[python_tool],
    llm=llm,
    agent="zero-shot-react-description",
    verbose=True
)

# Run an example
result = agent.run("Compute 125 * 32 + 10")
print(result)
```

* This approach allows agents to **dynamically compute or process data** without predefining every possible operation.

**Benefits:**

* Offers **flexibility** for multi-step or unpredictable workflows
* Supports **custom logic** that can be integrated into agent reasoning

---

**Web Scraping and Query Tools**

* Web scraping tools allow agents to **extract live data from websites** and other online sources.
* Query tools can access **SQL, NoSQL, or other structured databases** to retrieve information dynamically.

**Example: Web Scraping Tool**

```python
from bs4 import BeautifulSoup
import requests
from langchain.agents import Tool

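def scrape_page(url: str) -> str:
    response = requests.get(url, timeout=10)  # avoid hanging on slow sites
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text()[:500]  # Return first 500 characters for brevity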

web_scraper_tool = Tool(
    name="WebScraper",
    func=scrape_page,
    description="Scrapes and returns the first 500 characters of the text from a webpage."
)
```

**Example: Database Query Tool**

```python
import sqlite3
from langchain.agents import Tool

def query_database(query: str) -> str:
    conn = sqlite3.connect("example.db")
    cursor = conn.cursor()
    cursor.execute(query)
    results = cursor.fetchall()
    conn.close()
    return str(results)

database_tool = Tool(
    name="DatabaseQuery",
    func=query_database,
    description="Executes SQL queries and returns results."
)
```
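
Since the database tool runs whatever SQL the agent writes, one precaution is to accept only `SELECT` statements and cap the number of rows returned. The sketch below assumes an in-memory SQLite database with a made-up `cities` table; `run_select` is a hypothetical helper, and a prefix check like this is a guard rail rather than a complete defense.

```python
import sqlite3


def run_select(conn: sqlite3.Connection, query: str, max_rows: int = 20) -> str:
    """Run a read-only query and return at most max_rows rows as text."""
    if not query.lstrip().lower().startswith("select"):
        return "Error: only SELECT statements are allowed."
    try:
        rows = conn.execute(query).fetchmany(max_rows)
    except sqlite3.Error as exc:
        # Return the error as text so the agent can correct its query
        return f"Error: {exc}"
    return str(rows)


# Illustrative in-memory data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, temp_c INTEGER)")
conn.executemany("INSERT INTO cities VALUES (?, ?)", [("London", 15), ("Tokyo", 19)])

print(run_select(conn, "SELECT name, temp_c FROM cities ORDER BY name"))
print(run_select(conn, "DROP TABLE cities"))
```
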

**Benefits:**

* Provides **real-time and up-to-date information**
* Enables agents to **perform complex lookups or data retrieval tasks**
* Supports integration with **enterprise systems and external datasets**

---

**Summary of Tool Integration in LangChain**

| Tool Type                | Description                               | Use Case                                                 |
| ------------------------ | ----------------------------------------- | -------------------------------------------------------- |
| **API Tool**             | Connects to external APIs or web services | Weather info, stock prices, translation APIs             |
| **Python REPL Tool**     | Executes arbitrary Python code            | Computation, data transformation, dynamic logic          |
| **Web Scraping Tool**    | Extracts content from web pages           | Research, news updates, live data extraction             |
| **Query Tool**           | Queries databases or structured datasets  | Reporting, analytics, retrieving structured information  |
| **Custom Function Tool** | Any Python function exposed to the agent  | Domain-specific tasks, preprocessing, or task automation |

## **Advanced Agent Strategies**

Advanced agent strategies in LangChain enable agents to **handle complex tasks, multi-step workflows, and real-world uncertainties**. By combining reasoning, planning, tool usage, and error handling, these agents become **dynamic problem solvers** capable of acting intelligently in diverse scenarios.

---

**Multi-Step Reasoning**

* Multi-step reasoning allows agents to **break down a complex problem into smaller, manageable steps** and solve them sequentially or iteratively.
* Agents use reasoning to **decide which tools to call, in which order**, and how to process intermediate results.

**Example:** Calculating the average temperature from multiple cities using a weather API tool

```python
from langchain.agents import Tool, initialize_agent
from langchain_openai import ChatOpenAI

# Example tool: weather API
def weather_api(city: str) -> str:
    # Simulated API response
    data = {"London": 15, "New York": 22, "Tokyo": 19}
    return str(data.get(city, "Data not available"))

tools = [Tool(name="WeatherChecker", func=weather_api, description="Returns temperature of a city")]

llm = ChatOpenAI(model="gpt-4", temperature=0)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

query = "Calculate the average temperature for London, New York, and Tokyo."
response = agent.run(query)
print(response)
```
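
For reference, the steps the agent is expected to work out for this query (one tool call per city, then an average) can be written out by hand. This sketch reuses the simulated data from the example; `average_temperature` is an illustrative helper, useful as a ground truth when checking the agent's answer.

```python
from typing import List


def weather_api(city: str) -> str:
    # Same simulated data as the tool in the example above
    data = {"London": 15, "New York": 22, "Tokyo": 19}
    return str(data.get(city, "Data not available"))


def average_temperature(cities: List[str]) -> float:
    readings = []
    for city in cities:
        observation = weather_api(city)  # step 1: one tool call per city
        try:
            readings.append(float(observation))  # step 2: parse the observation
        except ValueError:
            continue  # skip cities the tool has no data for
    if not readings:
        raise ValueError("No temperature data available")
    return sum(readings) / len(readings)  # step 3: aggregate


print(round(average_temperature(["London", "New York", "Tokyo"]), 2))
```
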

**Benefits:**

* Handles **multi-step workflows** automatically
* Reduces human effort in coordinating tool calls
* Useful for **analytics, planning, and problem-solving**

---

**Planning Agents**

* **Planning agents** can **decompose complex tasks into sub-tasks**, execute them in sequence, and track dependencies.
* Planning can be **explicit**, where tasks are predefined, or **dynamic**, where the agent decides on-the-fly based on input and retrieved context.

**Use Cases:**

* Multi-step document analysis: retrieve relevant documents → extract key points → summarize → generate report
* Event planning: check availability → book venue → send invitations → confirm attendees

**Example Workflow:**

1. Retrieve relevant documents using a vector store
2. Extract key insights from each document
3. Perform calculations or transformations on the extracted data
4. Generate a final output summary

Planning agents allow **longer workflows without losing context**, making them ideal for RAG systems, report generation, and knowledge-intensive tasks.
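
An explicit plan of this kind can be sketched as an ordered list of step functions that pass a shared state dictionary along. Everything here is a simplified stand-in: the in-memory `CORPUS`, the keyword match in place of a vector store, and the step names are assumptions for illustration, not LangChain APIs.

```python
from typing import Callable, Dict, List

CORPUS = [
    "RAG combines retrieval with generation. It grounds answers in documents.",
    "Chroma stores embeddings. It supports similarity search.",
    "RAG pipelines split documents into chunks. Chunks are embedded and indexed.",
]


def retrieve(state: Dict) -> Dict:
    # Stand-in for a vector store lookup: simple keyword match
    query = state["query"].lower()
    state["docs"] = [doc for doc in CORPUS if query in doc.lower()]
    return state


def extract(state: Dict) -> Dict:
    # Keep the first sentence of each document as its key point
    state["insights"] = [doc.split(".")[0].strip() for doc in state["docs"]]
    return state


def transform(state: Dict) -> Dict:
    # Example calculation on the extracted data
    state["word_count"] = sum(len(text.split()) for text in state["insights"])
    return state


def summarize(state: Dict) -> Dict:
    state["summary"] = (
        f"{len(state['insights'])} key points ({state['word_count']} words): "
        + "; ".join(state["insights"])
    )
    return state


def run_plan(steps: List[Callable[[Dict], Dict]], state: Dict) -> Dict:
    # Execute the steps in order and record which ones ran
    state["trace"] = []
    for step in steps:
        state = step(state)
        state["trace"].append(step.__name__)
    return state


result = run_plan([retrieve, extract, transform, summarize], {"query": "RAG"})
print(result["summary"])
```
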

---

**Error Handling and Fallbacks**

* Real-world tools and APIs may fail due to **network issues, invalid inputs, or service downtime**.
* Agents should include **error handling and fallback strategies** to maintain robustness and reliability.

**Error Handling Strategies:**

1. **Try-Except Logic:** Wrap tool calls in `try`/`except` blocks to catch runtime errors.
2. **Fallback Tools:** Use alternative tools if the primary tool fails.
3. **Default Responses:** Return safe or partial answers when tools cannot provide results.
4. **Retries:** Retry failed tool calls automatically before giving up.

**Example:** Robust weather API tool

```python
def safe_weather_api(city: str) -> str:
    try:
        # Simulated API call
        data = {"London": 15, "New York": 22, "Tokyo": 19}
        return str(data[city])
    except KeyError:
        return "Weather data not available."
    except Exception:
        return "Service currently unavailable, please try again later."
```
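
The retry and fallback strategies can be layered on top of any tool function with a small wrapper. This is a sketch under simple assumptions (a fixed delay between attempts, every exception treated as retryable); `with_retries` and its parameters are hypothetical names, not a LangChain API.

```python
import time
from typing import Callable, Optional


def with_retries(
    tool: Callable[[str], str],
    retries: int = 3,
    delay_seconds: float = 0.0,
    fallback: Optional[Callable[[str], str]] = None,
    default: str = "Service currently unavailable, please try again later.",
) -> Callable[[str], str]:
    def wrapped(tool_input: str) -> str:
        # 1) Retry the primary tool a fixed number of times
        for attempt in range(retries):
            try:
                return tool(tool_input)
            except Exception:
                if attempt < retries - 1:
                    time.sleep(delay_seconds)
        # 2) Try the fallback tool, if one was given
        if fallback is not None:
            try:
                return fallback(tool_input)
            except Exception:
                pass
        # 3) Give up with a safe default response
        return default

    return wrapped


calls = {"count": 0}


def flaky_weather(city: str) -> str:
    # Simulated tool that fails on its first two calls
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network failure")
    return f"{city}: 15"


robust_weather = with_retries(flaky_weather, retries=3)
print(robust_weather("London"))
```
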

* Integrating robust error handling ensures agents remain **responsive and reliable** in production scenarios.

---

**Summary Table: Advanced Agent Strategies**

| Strategy                       | Description                                                 | Benefits                                       | Example                                        |
| ------------------------------ | ----------------------------------------------------------- | ---------------------------------------------- | ---------------------------------------------- |
| **Multi-Step Reasoning**       | Decomposes complex tasks into sequential or iterative steps | Automates complex workflows, improves accuracy | Calculating averages from multiple data points |
| **Planning Agents**            | Breaks tasks into sub-tasks and executes them in order      | Maintains context, handles dependencies        | Multi-document analysis and summarization      |
| **Error Handling & Fallbacks** | Ensures resilience against tool failures or invalid inputs  | Improves reliability, avoids workflow crashes  | Retry API calls, provide default responses     |


## **Advanced LangChain Features**

**Callbacks and Logging**

Callbacks in LangChain allow you to **hook into the execution of chains, agents, or tools** to monitor activity, collect data, or trigger additional logic. Logging provides a **record of these interactions**, which is crucial for debugging, performance monitoring, and analysis.

**Key Concepts:**

* **Callbacks:** Functions or objects called at specific points during execution (e.g., before a chain runs, after an LLM generates output).
* **Logging:** Recording events, outputs, errors, or timing information.
* **Callback Managers:** Manage multiple callbacks, ensuring they are executed in order.

**Example: Using Callbacks for Logging**

```python
from langchain.callbacks import CallbackManager, StdOutCallbackHandler
from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate

# Create a handler that prints chain events to the console
handler = StdOutCallbackHandler()

# Define prompt and LLM
prompt = PromptTemplate(input_variables=["topic"], template="Explain {topic} in simple terms.")
llm = ChatOpenAI(model="gpt-4")

# Attach the handler to the chain; `callbacks` replaces the older `callback_manager` argument
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])
response = chain.run("LangChain callbacks")
```

* The `StdOutCallbackHandler` prints logs to the console whenever the chain runs.
* Callbacks can also be **custom-defined** to save logs to files, databases, or monitoring dashboards.

---

**Debugging and Monitoring Chains**

LangChain's callback and logging system makes it easier to **debug complex chains and agents**:

* **Track LLM Input/Output:** Monitor prompts sent to the model and responses generated.
* **Step-by-Step Debugging:** Observe intermediate steps in sequential or conditional chains.
* **Error Tracing:** Capture exceptions, invalid outputs, or tool failures.
* **Performance Monitoring:** Measure time taken for each step or tool call.

**Example: Custom Debugging Callback**

```python
from langchain.callbacks.base import BaseCallbackHandler

class DebugCallbackHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM started with prompts:", prompts)
    
    def on_llm_end(self, response, **kwargs):
        print("LLM finished, response length:", len(response.generations[0][0].text))

# Attach the custom handler via the `callbacks` argument
# (ChatOpenAI is imported as in the previous example)
llm = ChatOpenAI(model="gpt-4", callbacks=[DebugCallbackHandler()])
```
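
Timing can also be captured without the callback system by wrapping a tool function before it is registered. The decorator below is a library-independent sketch; `timed` and the `TIMINGS` list are illustrative names.

```python
import functools
import time
from typing import Callable, List, Tuple

# Collected (function name, elapsed seconds) pairs
TIMINGS: List[Tuple[str, float]] = []


def timed(func: Callable) -> Callable:
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the duration even if the tool raises
            TIMINGS.append((func.__name__, time.perf_counter() - start))

    return wrapper


@timed
def slow_lookup(city: str) -> str:
    time.sleep(0.01)  # simulate a slow external call
    return f"{city}: 15"


slow_lookup("London")
for name, seconds in TIMINGS:
    print(f"{name} took {seconds * 1000:.1f} ms")
```
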

* This approach gives **fine-grained visibility** into model behavior and chain execution.

---

**Integration with MLflow and Other Tools**

LangChain can integrate with **MLflow, Weights & Biases, and other monitoring or tracking platforms** to manage AI workflows in production:

* **MLflow Integration:**

  * Log chain inputs, outputs, metrics, and artifacts automatically.
  * Track experiments for different prompts, models, and embeddings.
* **Benefits:**

  * Enables reproducibility
  * Provides historical logs for analysis and optimization
  * Supports production-grade monitoring and versioning

**Example: Logging with MLflow**

```python
import mlflow

with mlflow.start_run(run_name="langchain_test"):
    result = chain.run("Explain RAG pipelines")
    mlflow.log_param("topic", "RAG pipelines")
    mlflow.log_metric("response_length", len(result))
    mlflow.log_text(result, "response.txt")
```

* You can track multiple **chains, agents, and experiments** centrally, facilitating **continuous improvement and monitoring**.

---

**Summary Table: Callbacks and Monitoring**

| Feature                     | Description                                            | Use Case                                             |
| --------------------------- | ------------------------------------------------------ | ---------------------------------------------------- |
| **Callbacks**               | Hook functions executed during chain or agent runtime  | Logging, triggering custom actions, debugging        |
| **Logging**                 | Record inputs, outputs, errors, and timing information | Track workflows, performance, and correctness        |
| **Debugging Handlers**      | Custom callback implementations for monitoring         | Fine-grained step-level visibility                   |
| **MLflow / External Tools** | Track experiments, metrics, and artifacts              | Production monitoring, reproducibility, optimization |

## **Custom Chains and Components**

**Building Reusable Chains**

* **Reusable chains** are modular sequences of operations that can be invoked multiple times with different inputs.
* Chains can combine **LLMs, tools, memory, and custom logic** into a single callable object.

**Benefits of Reusable Chains:**

* Avoids repetitive coding
* Standardizes workflows for consistent results
* Makes it easier to **scale and maintain** AI applications

**Example: A Reusable QA Chain**

```python
from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate

# Define a prompt template
qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="Using the context below, answer the question.\nContext: {context}\nQuestion: {question}\nAnswer:"
)

# Create a reusable chain
llm = ChatOpenAI(model="gpt-4", temperature=0)
qa_chain = LLMChain(llm=llm, prompt=qa_prompt)

# Execute the chain
context = "LangChain simplifies building workflows with LLMs."
question = "What is LangChain?"
answer = qa_chain.run(context=context, question=question)
print(answer)
```

* This chain can be reused for multiple questions or contexts without modification.

---

**Custom Prompt Templates**

* **Custom prompt templates** allow you to define the **format, variables, and instructions** for LLM input.
* Templates can be simple or complex and can include:

  * Conditional instructions
  * Few-shot examples
  * Dynamic variables from chain inputs

**Example: Few-Shot Prompt Template**

```python
few_shot_prompt = PromptTemplate(
    input_variables=["input_text"],
    template="""
    Translate the following English text to French:

    Example:
    English: Hello
    French: Bonjour

    English: {input_text}
    French:
    """
)
translation_chain = LLMChain(llm=llm, prompt=few_shot_prompt)
output = translation_chain.run(input_text="How are you?")
print(output)
```
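
When the examples or instructions vary per request, the template string itself can be assembled in plain Python before it is handed to `PromptTemplate`. The helper below is a sketch; `build_few_shot_template` and its `formal` flag are hypothetical, and the `{input_text}` placeholder is left unfilled so the result can still be used as a template. Example texts containing literal braces would need escaping.

```python
from typing import List, Tuple


def build_few_shot_template(examples: List[Tuple[str, str]], formal: bool = False) -> str:
    lines = ["Translate the following English text to French."]
    if formal:
        # Conditional instruction, added only when requested
        lines.append("Use the formal register (vous).")
    lines.append("")
    for english, french in examples:
        # One few-shot example per (English, French) pair
        lines.append(f"English: {english}")
        lines.append(f"French: {french}")
        lines.append("")
    # Placeholder that the chain fills in later
    lines.append("English: {input_text}")
    lines.append("French:")
    return "\n".join(lines)


template = build_few_shot_template(
    [("Hello", "Bonjour"), ("Thank you", "Merci")], formal=True
)
print(template.format(input_text="How are you?"))
```
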

* Custom templates help **guide LLM behavior** and improve output consistency.

---

**Extending LangChain with Python**

* LangChain allows you to **extend functionality using Python**, including:

  * Custom chains and modules
  * Tool definitions
  * Preprocessing and postprocessing of data
* Python extensions can integrate with:

  * APIs, databases, and external services
  * Custom logic for specialized workflows
  * Automated evaluation pipelines

**Example: Custom Chain Component**

```python
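from typing import Any, Dict, List
from langchain.chains.base import Chain

class ReverseTextChain(Chain):
    # input_keys and output_keys are abstract properties on Chain
    @property
    def input_keys(self) -> List[str]:
        return ["text"]

    @property
    def output_keys(self) -> List[str]:
        return ["reversed_text"]

    def _call(self, inputs: Dict[str, Any], run_manager=None) -> Dict[str, str]:
        reversed_text = inputs["text"][::-1]
        return {"reversed_text": reversed_text}

reverse_chain = ReverseTextChain()
result = reverse_chain.invoke({"text": "LangChain"})
print(result["reversed_text"])  # Output: niahCgnaL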
```

* This demonstrates **custom logic integration** in a reusable chain format.

---

## **Evaluation**

Evaluation in LangChain is essential to **measure the quality, accuracy, and relevance** of LLM outputs. By defining metrics and automated pipelines, you can systematically **assess performance** and improve models or prompts.

---

**Evaluating LLM Outputs**

* LLM outputs can be evaluated based on:

  * **Correctness:** Does the output match the expected answer?
  * **Relevance:** Does the response address the input query?
  * **Fluency:** Is the output grammatically correct and readable?
  * **Completeness:** Does it cover all required aspects?

**Manual Evaluation:**

* Human reviewers read LLM outputs and assign scores based on predefined criteria.
* Useful for nuanced assessments but **time-consuming**.

**Automated Evaluation:**

* Compare generated outputs with **gold standard references** using metrics like BLEU, ROUGE, or cosine similarity for embeddings.
* Enables **scalable evaluation** of large datasets.

---

**Metrics for Quality and Accuracy**

| Metric                             | Description                                                  | Use Case                            |
| ---------------------------------- | ------------------------------------------------------------ | ----------------------------------- |
| **BLEU**                           | Measures n-gram overlap between generated and reference text | Translation tasks                   |
| **ROUGE**                          | Measures recall-based overlap for summaries                  | Summarization tasks                 |
| **Cosine Similarity (Embeddings)** | Measures semantic similarity between texts                   | Open-ended QA, retrieval evaluation |
| **Exact Match (EM)**               | Checks if output exactly matches reference                   | Fact-based QA tasks                 |
| **Human Score**                    | Subjective assessment of correctness, fluency, relevance     | Complex reasoning tasks             |

---

**Automated Evaluation Pipelines**

* Automate evaluation using **LangChain components**:

  1. Generate outputs using chains or agents
  2. Compare outputs against reference answers
  3. Compute metrics automatically
  4. Log results for analysis or model improvement

**Example: Automated QA Evaluation**

```python
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

# Example data
generated_answers = ["LangChain is a framework for building LLM workflows."]
reference_answers = ["LangChain allows developers to create workflows using LLMs."]

# Compute embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
gen_emb = model.encode(generated_answers)
ref_emb = model.encode(reference_answers)

# Compute similarity
similarity_score = cosine_similarity(gen_emb, ref_emb)[0][0]
print(f"Semantic similarity score: {similarity_score:.2f}")
```
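
The lexical metrics from the table can be computed without an embedding model. The sketch below implements exact match and a unigram-overlap recall in the spirit of ROUGE-1; the lowercase-and-strip-punctuation normalization is an assumption, and established packages should be preferred for reported results.

```python
import re
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    # Lowercase and keep only alphanumeric tokens
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(prediction: str, reference: str) -> bool:
    return normalize(prediction) == normalize(reference)


def unigram_recall(prediction: str, reference: str) -> float:
    # Fraction of reference tokens that also appear in the prediction
    pred_counts = Counter(normalize(prediction))
    ref_counts = Counter(normalize(reference))
    if not ref_counts:
        return 0.0
    overlap = sum((pred_counts & ref_counts).values())
    return overlap / sum(ref_counts.values())


generated = "LangChain is a framework for building LLM workflows."
reference = "LangChain allows developers to create workflows using LLMs."

print("Exact match:", exact_match(generated, reference))
print(f"Unigram recall: {unigram_recall(generated, reference):.2f}")
```
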



## **Performance and Optimization**

**Speeding Up Embedding Queries**

Embedding-based retrieval is often the **bottleneck** in RAG and similarity search pipelines. Optimizing embedding queries can drastically reduce latency.

**Strategies:**

1. **Use Efficient Vector Stores:**

   * Tools like **FAISS, Chroma, Milvus, Pinecone** support fast approximate nearest neighbor (ANN) searches.
   * Choose **index types** like IVF, HNSW, or PQ based on dataset size and query speed requirements.

2. **Precompute Embeddings:**

   * Store embeddings for all documents beforehand rather than generating them on-the-fly.
   * Update embeddings only when new data is added.

3. **Reduce Embedding Dimensionality:**

   * Lower-dimensional embeddings reduce computation time and memory usage.
   * Smaller models such as **text-embedding-3-small** (1536 dimensions, versus 3072 for text-embedding-3-large) suit smaller datasets or cost-sensitive applications.

4. **Caching and Local Storage:**

   * Cache frequently queried embeddings or results in memory or Redis to **avoid repeated computation**.

**Example:** Precomputing and storing embeddings in Chroma

```python
from langchain_openai import OpenAIEmbeddings
import chromadb

embedding_model = OpenAIEmbeddings(model="text-embedding-3-large")
client = chromadb.Client()
collection = client.create_collection("precomputed_docs")

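# Precompute and add embeddings (`documents` is the list of Document chunks created earlier)
for i, doc in enumerate(documents):
    vector = embedding_model.embed_query(doc.page_content)
    # Chroma requires a unique id for every stored item
    collection.add(ids=[f"doc-{i}"], documents=[doc.page_content], embeddings=[vector])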
```
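
The caching strategy can be sketched as a thin wrapper that stores vectors keyed by a hash of the text, so repeated texts never reach the embedding model twice. `CachedEmbedder` is a hypothetical class, and `fake_embed` stands in for a real call such as `embed_query`; an in-process dict is assumed here, and a shared cache such as Redis would follow the same pattern.

```python
import hashlib
from typing import Callable, Dict, List


class CachedEmbedder:
    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the underlying model was called

    def embed(self, text: str) -> List[float]:
        # Hash the text so the cache key has a fixed size
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]


def fake_embed(text: str) -> List[float]:
    # Placeholder for a real embedding call
    return [float(len(text)), float(text.count(" "))]


embedder = CachedEmbedder(fake_embed)
embedder.embed("What is RAG?")
embedder.embed("What is RAG?")  # served from the cache
print("Model calls:", embedder.misses)
```
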

---

**Efficient Memory Management**

LangChain supports **short-term and long-term memory**, but memory can become a bottleneck in large-scale applications.

**Best Practices:**

1. **Chunking and Summarization:**

   * Split long documents into smaller chunks to reduce token usage.
   * Summarize intermediate content before storing in memory.

2. **Memory Pruning:**

   * Remove outdated or irrelevant conversation history in multi-turn dialogues.
   * Keep only **recent or relevant context** to conserve memory.

3. **Persistent vs. In-Memory Storage:**

   * Store long-term memory in **persistent databases** like Chroma, Pinecone, or SQLite.
   * Use in-memory structures for **active session memory** for speed.
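
Memory pruning can be as simple as keeping the most recent messages that fit a token budget. The sketch below approximates tokens by whitespace-separated words, which is an assumption for illustration; a real tokenizer would give different counts. `prune_history` is a hypothetical helper.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def prune_history(history: List[Message], max_tokens: int) -> List[Message]:
    kept: List[Message] = []
    used = 0
    # Walk from newest to oldest and stop once the budget is exhausted
    for role, content in reversed(history):
        cost = len(content.split())  # rough token estimate
        if used + cost > max_tokens:
            break
        kept.append((role, content))
        used += cost
    return list(reversed(kept))  # restore chronological order


history = [
    ("user", "Tell me about vector databases"),
    ("assistant", "They store embeddings for similarity search"),
    ("user", "How does Chroma persist data"),
]

for role, content in prune_history(history, max_tokens=12):
    print(f"{role}: {content}")
```
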

---

**Batch Processing**

Batch processing improves throughput and reduces latency when embedding multiple documents or queries.

**Benefits:**

* Reduces API call overhead
* Maximizes GPU/CPU utilization
* Cost-efficient for large-scale LLM or embedding calls

**Example: Batch Embeddings**

```python
batch_size = 32
all_texts = [doc.page_content for doc in documents]
batched_vectors = []

for i in range(0, len(all_texts), batch_size):
    batch = all_texts[i:i+batch_size]
    vectors = embedding_model.embed_documents(batch)
    batched_vectors.extend(vectors)
```


## **Deployment**

**API Deployment (FastAPI, Flask)**

* Wrap chains and agents in a **web API** for interactive applications or external integrations.

**Example: FastAPI Deployment**

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

@app.post("/ask")
def ask_question(query: Query):
    response = qa_chain.run(context="LangChain framework overview.", question=query.question)
    return {"answer": response}
```

* Supports **real-time user queries**; end-to-end latency is typically dominated by the LLM call rather than the web framework.
* Flask can be used similarly for smaller or legacy applications.

---

**Cloud Deployment (AWS, GCP, Azure)**

* Host APIs and chains on cloud services for **scalability, high availability, and auto-scaling**.
* Services include:

  * **AWS:** EC2, Lambda, ECS, or SageMaker
  * **GCP:** Cloud Run, Compute Engine, Vertex AI
  * **Azure:** App Service, Azure Functions, AKS

**Best Practices:**

* Use **autoscaling** for high-traffic scenarios.
* Separate **compute-intensive tasks** like embedding generation to dedicated GPU instances.

---

**Docker and CI/CD Pipelines**

* Containerize applications using **Docker** for portability and reproducibility.
* Use **CI/CD pipelines** (GitHub Actions, GitLab CI, Jenkins) for automated testing, deployment, and versioning.

**Example: Dockerfile for a LangChain API**

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

* Ensures **consistent runtime environments** across development, staging, and production.

---

## **Security and Best Practices**

Ensuring security is critical when deploying LangChain applications, especially when handling **sensitive data or external APIs**.

---

**API Key Management**

* Store keys securely using **environment variables, secrets managers, or encrypted storage**.
* Avoid hardcoding API keys in code or repositories.

**Example: Using Environment Variables**

```python
import os
from langchain_openai import ChatOpenAI

api_key = os.getenv("OPENAI_API_KEY")
llm = ChatOpenAI(model="gpt-4", openai_api_key=api_key)
```
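
`os.getenv` returns `None` when the variable is unset, which would otherwise surface later as an authentication error. A small guard can fail fast with a clearer message; `require_env` is a hypothetical helper name.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


try:
    api_key = require_env("OPENAI_API_KEY")
    print("API key loaded")  # never print the key itself
except RuntimeError as exc:
    print(exc)
```
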

---

**Data Privacy**

* Ensure user data is **anonymized and encrypted** in storage and transit.
* Avoid storing unnecessary personal data in memory or persistent databases.
* Implement **access controls and logging** for auditability.

---

**Handling Sensitive Information**

* Mask sensitive content before passing it to LLMs if required.
* Use **role-based access control** for sensitive pipelines.
* Employ **tokenization or pseudonymization** techniques for compliance with regulations like GDPR or HIPAA.
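
Masking can be sketched with regular expressions that replace obvious identifiers before the text reaches the model. The two patterns below (email addresses and long digit runs) are deliberately simple assumptions and will miss many kinds of sensitive data; `mask_sensitive` is a hypothetical helper.

```python
import re

# Simplified patterns for illustration only
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DIGITS_RE = re.compile(r"\b\d(?:[ -]?\d){8,}\b")  # nine or more digits, optionally separated


def mask_sensitive(text: str) -> str:
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = DIGITS_RE.sub("[NUMBER]", text)
    return text


print(mask_sensitive("Contact jane.doe@example.com or call 555-123-4567."))
```
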