### RAG Chain

In [71]:
sample_docs = [
  """Deep learning is a subfield of machine learning that uses neural networks with multiple layers. It is widely applied in image recognition, natural language processing, and speech recognition.""",

  """Machine learning is the process of teaching computers to make decisions without being explicitly programmed. It includes supervised, unsupervised, and reinforcement learning approaches.""",

  """Retrieval-Augmented Generation (RAG) combines a retrieval system with a generative model. It improves accuracy by retrieving relevant documents from a knowledge base before generating answers.""",

  """LangChain is a framework designed to build applications powered by large language models. It provides abstractions for chains, agents, and memory, making it easier to integrate LLMs with external data.""",

  """When using RAG with LangChain, the pipeline typically involves embedding documents, storing them in a vector database, retrieving relevant chunks, and passing them into the LLM for context-aware generation.""",

  """Popular vector databases used with LangChain for RAG pipelines include Pinecone, Weaviate, FAISS, and Milvus.""",

  """Deep learning models often require GPUs for efficient training because of their high computational requirements.""",

  """LangChain supports integration with multiple LLM providers, including OpenAI, Anthropic, and Hugging Face.""",

  """In a RAG pipeline, embeddings are numerical vector representations of text that allow semantic similarity search.""",

  """Fine-tuning machine learning models involves adjusting model parameters on domain-specific datasets to improve accuracy."""
]


### Document Loading
- But as I am using a dummy data over here, we can skip this part , or store it in a folder data and use directoryloader to import them

In [72]:
from langchain.docstore.document import Document
documents = [Document(page_content=doc) for doc in sample_docs] 
documents 

[Document(metadata={}, page_content='Deep learning is a subfield of machine learning that uses neural networks with multiple layers. It is widely applied in image recognition, natural language processing, and speech recognition.'),
 Document(metadata={}, page_content='Machine learning is the process of teaching computers to make decisions without being explicitly programmed. It includes supervised, unsupervised, and reinforcement learning approaches.'),
 Document(metadata={}, page_content='Retrieval-Augmented Generation (RAG) combines a retrieval system with a generative model. It improves accuracy by retrieving relevant documents from a knowledge base before generating answers.'),
 Document(metadata={}, page_content='LangChain is a framework designed to build applications powered by large language models. It provides abstractions for chains, agents, and memory, making it easier to integrate LLMs with external data.'),
 Document(metadata={}, page_content='When using RAG with LangChain,

### Document Splitting

In [73]:
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    separators=[" "],
    chunk_size=100,
    chunk_overlap=10,
    length_function=len
)
chunks = text_splitter.split_documents(documents)
print(f"Number of chunks: {len(chunks)}")
print(chunks[4])  

Number of chunks: 23
page_content='Retrieval-Augmented Generation (RAG) combines a retrieval system with a generative model. It'


### Embeddings using HuggingFace

In [74]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2") 
vector = embeddings.embed_query("Hello world")
print(vector)  

[-0.03447727486491203, 0.03102317824959755, 0.006734970025718212, 0.026108985766768456, -0.03936202451586723, -0.16030244529247284, 0.06692401319742203, -0.006441489793360233, -0.0474504791200161, 0.014758856035768986, 0.07087527960538864, 0.05552763119339943, 0.019193334504961967, -0.026251312345266342, -0.01010954286903143, -0.02694045566022396, 0.022307461127638817, -0.022226648405194283, -0.14969263970851898, -0.017493007704615593, 0.00767625542357564, 0.05435224249958992, 0.0032543970737606287, 0.031725890934467316, -0.0846213847398758, -0.02940601296722889, 0.05159561336040497, 0.04812406003475189, -0.0033148222137242556, -0.058279167860746384, 0.04196927323937416, 0.022210685536265373, 0.1281888335943222, -0.022338971495628357, -0.011656315997242928, 0.06292839348316193, -0.032876335084438324, -0.09122604131698608, -0.031175347045063972, 0.0526994913816452, 0.04703482985496521, -0.08420311659574509, -0.030056199058890343, -0.02074483036994934, 0.009517835453152657, -0.0037217906

### Initalize ChromaDB

In [75]:
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

persist_dir = "./chroma_db"

embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    collection_name="rag_collection",
    persist_directory=persist_dir,
)
vector_db.persist()

print(f"Number of documents in the collection: {vector_db._collection.count()}")


Number of documents in the collection: 157


### Testing Similarity Search

In [76]:
query = "What is Fine tuning?"
result = vector_db.similarity_search(query,k=3)
result

[Document(metadata={}, page_content='Fine-tuning machine learning models involves adjusting model parameters on domain-specific datasets'),
 Document(metadata={}, page_content='Fine-tuning machine learning models involves adjusting model parameters on domain-specific datasets'),
 Document(metadata={}, page_content='Fine-tuning machine learning models involves adjusting model parameters on domain-specific datasets')]

### Advance Similarity Search with Scores
- Lower the value closer to the result

In [77]:
results = vector_db.similarity_search_with_score(query,k=3)
results

[(Document(metadata={}, page_content='Fine-tuning machine learning models involves adjusting model parameters on domain-specific datasets'),
  1.1072428226470947),
 (Document(metadata={}, page_content='Fine-tuning machine learning models involves adjusting model parameters on domain-specific datasets'),
  1.1072428226470947),
 (Document(metadata={}, page_content='Fine-tuning machine learning models involves adjusting model parameters on domain-specific datasets'),
  1.1072428226470947)]

### Initalizing LLM 

In [78]:
import os

from dotenv import load_dotenv
load_dotenv()
from langchain.chat_models import init_chat_model
llm = init_chat_model("gemini-2.5-flash", model_provider="google_genai")

llm

ChatGoogleGenerativeAI(model='models/gemini-2.5-flash', google_api_key=SecretStr('**********'), client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x000001C335659B40>, default_metadata=(), model_kwargs={})

In [79]:
llm.invoke("What is LLM?")

AIMessage(content='**LLM stands for Large Language Model.**\n\nIn simple terms, an LLM is a type of **artificial intelligence (AI) program** that has been trained on an enormous amount of text data to understand, generate, and process human-like language.\n\nLet\'s break down the name:\n\n1.  **Large:** This refers to two main things:\n    *   **Data:** LLMs are trained on truly massive datasets of text, often comprising trillions of words from the internet (web pages, books, articles, code, conversations, etc.).\n    *   **Parameters:** They have billions, sometimes even trillions, of internal variables (parameters) that they adjust during training. The more parameters, the more complex patterns and nuances they can learn, leading to more sophisticated and human-like output.\n\n2.  **Language:** This indicates their primary focus: natural human language. They are designed to work with words, sentences, paragraphs, and the intricate rules and meanings that govern human communication.\n

### RAG Modern Chain

In [80]:
from langchain.prompts import ChatPromptTemplate 
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

In [81]:
vector_retriever = vector_db.as_retriever(
    search_kargs={"k":3}
)
vector_retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001C33565BD30>, search_kwargs={})

### A chatprompt template 
- Basically why I am usin this is because we need to talk to LLM at the end, so we need a prompt for it

In [82]:
system_prompt = """ You are a assistant for question-answering tasks. 
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum to answer and keep it concise.
Context : {context}"""

In [83]:
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

- Stuffing releveant context from db in create stuff document chain

In [84]:
document_chain = create_stuff_documents_chain(llm,prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template=" You are a assistant for question-answering tasks. \nUse the following pieces of context to answer the question at the end.\nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\nUse three sentences maximum to answer and keep it concise.\nContext : {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatGoogleGenerativeAI(model='models/gemini-2.5-flash', google_api_key=SecretStr('**********'), clie

In [85]:
## Final RAG Chain
rag_chain = create_retrieval_chain(vector_retriever,document_chain)
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001C33565BD30>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template=" You are a assistant for question-answering tasks. \nUse the following pieces of context to answer the question at the end.\nIf you

- Invoking Final rag chain

In [86]:
results = rag_chain.invoke({"input":"Where is MLR Institute of Technology located?"})
results

{'input': 'Where is MLR Institute of Technology located?',
 'context': [Document(metadata={'topic': 'Machine Learning', 'source': 'copilot'}, page_content='Machine learning is a field of artificial intelligence that focuses on enabling computers to learn'),
  Document(metadata={'source': 'copilot', 'topic': 'Machine Learning'}, page_content='Machine learning is a field of artificial intelligence that focuses on enabling computers to learn'),
  Document(metadata={}, page_content='LLMs with external data.'),
  Document(metadata={}, page_content='LLMs with external data.')],
 'answer': "I don't know the answer to where MLR Institute of Technology is located. The provided context does not contain information about educational institutions or their locations."}

### RAG Chain - Using LangChain Expression Language 

In [87]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough,RunnableParallel 

In [88]:
custom_prompt = ChatPromptTemplate.from_template(
    """" 
    You are a assistant for question-answering tasks. 
    Use the following pieces of context to answer the question at the end.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    Use three sentences maximum to answer and keep it concise.
    Context : {context}
    Question : {input} 
    Answer : """
)

In [89]:
def stuff_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [90]:
rag_chain_lcel = (
    {"context" : vector_retriever | stuff_docs,
    "question" : RunnablePassthrough()
    }
    | custom_prompt
    | llm
    | StrOutputParser()
)

In [91]:
results = rag_chain.invoke({"input" : "What is RAG?"})
results['answer']

'I apologize, but the provided context does not define what RAG stands for or what it is. It only describes a pipeline involving RAG with LangChain.'

### Adding new data to VectorDB

In [92]:
new_document = """ 
    NodeJs is a JavaScript runtime built on Chrome's V8 JavaScript engine. It allows developers to run JavaScript code on the server side, enabling the creation of scalable network applications. NodeJs is known for its event-driven, non-blocking I/O model, which makes it efficient and suitable for real-time applications.
"""

In [93]:
new_doc = Document(
    page_content=new_document,
    metadata={"source":"copilot","topic": "NodeJs"}
)

In [94]:
new_chunks = text_splitter.split_documents([new_doc])

In [95]:
vector_db.add_documents(new_chunks)

['cacb1163-36f7-4d97-afdb-d5a2f32cd0c6',
 '7dde22e4-ef31-49e5-bc52-a52ed3322148',
 '462e66c8-47b8-4d52-8ab2-859b6106d70f',
 '312a486f-c3aa-4206-a117-1c1507509641']

### Advance RAG Technique - Conversational Memory

In [96]:
from langchain_core.messages import AIMessage, HumanMessage
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

- We need this to remember previous context
- create_history_aware it makes the retriever understand conversation context
- MessagePlaceHolder - place holder for previous data/ chat history
- Human - input, AI - output messages

In [97]:
new_document = """ Machine learning is a field of artificial intelligence that focuses on enabling computers to learn from data and improve their performance on specific tasks without being explicitly programmed. It involves the development of algorithms and statistical models that can analyze and interpret complex datasets to make predictions or decisions. Machine learning is widely used in various applications, including image recognition, natural language processing, recommendation systems, and autonomous vehicles. Types of Machine Learning: Supervised Learning, Unsupervised Learning, Reinforcement Learning. """

In [98]:
new_doc = Document(
    page_content=new_document,
    metadata={"source":"copilot","topic": "Machine Learning"}
)

In [99]:
new_chunks = text_splitter.split_documents([new_doc])
vector_db.add_documents(new_chunks)

['b14d329c-f8db-4dc3-afdf-76ce044db58f',
 '9099f95b-1d1f-482a-b8f0-a6b649409f46',
 '7fc0b351-4ee1-4778-be72-4049a7484b1c',
 '48e690b1-498b-4e29-8036-a70380ac3a5b',
 '771d3364-7d7b-47da-97dd-9eb3a69496f9',
 '11b90ba9-7ff9-472d-ba69-586c5e68b8cd',
 'd0536bd3-1b39-4e8b-896f-8c04661f8785']

In [100]:
new_document = """ Types of fruits are apples, bananas, oranges, grapes, and strawberries. Fruits are a rich source of vitamins, minerals, and fiber, making them an essential part of a healthy diet. They can be consumed fresh, dried, or juiced, and are often used in cooking and baking to add natural sweetness and flavor. Regular consumption of fruits is associated with numerous health benefits, including improved digestion, reduced risk of chronic diseases, and enhanced immune function. """

In [101]:
new_doc = Document(
    page_content=new_document,
    metadata={"source":"copilot","topic": "Fruits"}
)

In [102]:
new_chunks = text_splitter.split_documents([new_doc])
vector_db.add_documents(new_chunks)

['6d13c96d-20c1-4f2c-9cd7-e9dfce3e3f7b',
 '2aed39c3-5f10-48ef-9c27-c92b108d75a7',
 'edea1d9c-1cda-42be-8472-0b6df1e2f4ae',
 'c208d6fa-3953-413a-94a5-708c0be08acb',
 '1c45d82b-5cd6-4394-bbaf-d29ecab6d0e7',
 '8c34eae9-0a06-4e37-8f83-0a83126654c1']

In [103]:
contextualize_q_system_prompt = """ Given a chat history and  the latest user question which might reference context in chat history, formulate a standalone question that can be understood without the chat history.
 Do NOT answer the question, just reformulate it if needed and otherwise return as it is ."""

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])

In [104]:
history_aware_retriever = create_history_aware_retriever(
    llm, vector_retriever, contextualize_q_prompt
)
history_aware_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001C33565BD30>, search_kwargs={}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag

In [105]:
system_prompt = """ You are a assistant for question-answering tasks. 
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum to answer and keep it concise.
Context : {context}"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])

question_answer_chain = create_stuff_documents_chain(llm,qa_prompt)

conversational_rag_chain = create_retrieval_chain(
    history_aware_retriever, question_answer_chain
)

In [106]:
chat_history = []

result1 = conversational_rag_chain.invoke(
    {"chat_history": chat_history,
     "input":"What is Machine Learning?"
    }
)
result1['answer']

'Machine learning is a field of artificial intelligence that focuses on enabling computers to learn. It involves teaching computers to make decisions without being explicitly programmed.'

In [107]:
chat_history = chat_history[:-2]
chat_history 

[]

In [108]:
chat_history.extend([ 
    HumanMessage(content="What is Machine Learning?"),
    AIMessage(content=result1['answer'])
])

In [109]:
chat_history

[HumanMessage(content='What is Machine Learning?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Machine learning is a field of artificial intelligence that focuses on enabling computers to learn. It involves teaching computers to make decisions without being explicitly programmed.', additional_kwargs={}, response_metadata={})]

In [110]:
result2 = conversational_rag_chain.invoke(
    {"input":"What are it different types?", "chat_history": chat_history}
)
result2['answer'] 

"I don't know the answer based on the provided context. The context explains what Machine Learning is but does not mention its different types."

In [111]:
new_document = """ 
    List in html is used to group a set of related items. There are two types of lists in HTML: unordered lists (ul) and ordered lists (ol). Unordered lists""" 

In [112]:
new_doc = Document(
    page_content=new_document,
    metadata={"source":"copilot","topic": "Lists"}
)

In [113]:
new_chunks = text_splitter.split_documents([new_doc])
vector_db.add_documents(new_chunks)

['11531927-37c1-413b-a998-04e83ef04fd0',
 '43c0e45f-08fb-405f-a146-65ce2ea6c565']

In [114]:
result1 = conversational_rag_chain.invoke(
    {"chat_history": chat_history,
     "input":"What is List in html?"
    }
)
result1['answer']

'A list in HTML is used to group a set of related items. There are two types of lists in HTML: unordered lists (ul) and ordered lists (ol).'

In [None]:
chat_history.extend([
    HumanMessage(content="What is List in html?"), AIMessage(content=result1['answer'])
])

In [118]:
chat_history 

[HumanMessage(content='What is Machine Learning?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Machine learning is a field of artificial intelligence that focuses on enabling computers to learn. It involves teaching computers to make decisions without being explicitly programmed.', additional_kwargs={}, response_metadata={}),
 (HumanMessage(content='What is List in html?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='A list in HTML is used to group a set of related items. There are two types of lists in HTML: unordered lists (ul) and ordered lists (ol).', additional_kwargs={}, response_metadata={}))]

In [119]:
flattened = []
for item in chat_history:
    if isinstance(item, tuple):
        flattened.extend(item)
    else:
        flattened.append(item)
chat_history = flattened
chat_history

[HumanMessage(content='What is Machine Learning?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Machine learning is a field of artificial intelligence that focuses on enabling computers to learn. It involves teaching computers to make decisions without being explicitly programmed.', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='What is List in html?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='A list in HTML is used to group a set of related items. There are two types of lists in HTML: unordered lists (ul) and ordered lists (ol).', additional_kwargs={}, response_metadata={})]

In [120]:
result2 = conversational_rag_chain.invoke({
    "input":"What are it different types?", 
    "chat_history": chat_history
})
result2['answer']

'The different types of lists in HTML are unordered lists (ul) and ordered lists (ol).'