# In this notebook we'll build a simple RAG pipeline and explore ways of integrating chat history into the context

I used `Ollama` to run the LLM locally. The entire pipeline runs on the CPU.  
[Setting up Ollama - Step-by-step guide](https://pulkit12dhingra.github.io/Blog/content/Setting_up_Ollama.html)

## First let's build a simple RAG pipeline
[Building a RAG pipeline](https://pulkit12dhingra.github.io/Blog/content/Building_a_RAG_Pipeline_with_PDFs.html)

In [32]:
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [None]:
# Read every .txt file in the 'data' directory and collect the loaded documents in a list
from langchain.document_loaders import TextLoader

# The os module is used to list the directory and build file paths
import os
# Store all loaded documents
documents = []

data_dir = "data"  # replace with your directory
# Loop through all files in the directory
for filename in os.listdir(data_dir):
    if filename.endswith(".txt"):
        file_path = os.path.join(data_dir, filename)
        loader = TextLoader(file_path)
        documents.extend(loader.load())  # append loaded documents


LangChain provides loaders for a wide variety of document types.

[Loading documents via langchain](https://pulkit12dhingra.github.io/Blog/content/LangChain_Document_Loaders.html)

[official documentation](https://python.langchain.com/docs/integrations/document_loaders/)

Now we split the loaded documents into smaller chunks.

In [36]:
# split the big text into smaller chunks 

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
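
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplified sliding window over characters and not the actual `RecursiveCharacterTextSplitter` algorithm, which also tries to split on separators such as paragraph and line breaks. The function name `sliding_window_chunks` and the sample string are made up for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Simplified
    illustration only; it ignores separators entirely.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return windows


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each window repeats the last three characters of the previous one, which is the role `chunk_overlap=50` plays in the cell above: text cut at a chunk boundary still appears intact in a neighbouring chunk.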

In [37]:
# embedding the text chunks

from langchain.embeddings import HuggingFaceEmbeddings
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")



Next, we create a vector store to hold the embeddings. It acts as a database: at query time, the retriever looks up the chunks whose embeddings are closest to the query and passes them to the LLM as context.

[Vector Stores](https://python.langchain.com/docs/integrations/vectorstores/)

In [38]:
from langchain.vectorstores import FAISS

vectorstore = FAISS.from_documents(chunks, embedding=embedding_model)
retriever = vectorstore.as_retriever()
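
The idea behind retrieval can be sketched without any libraries. The example below is a toy, assuming a bag-of-words "embedding" and cosine similarity; it does not reflect how `all-MiniLM-L6-v2` embeds text or how FAISS indexes and scores vectors. All names (`toy_embed`, `toy_retrieve`, `toy_texts`) are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (NOT a real sentence embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_retrieve(query: str, texts: list[str], k: int = 2) -> list[str]:
    """Return the k texts whose toy vectors are most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(
        texts,
        key=lambda t: cosine_similarity(query_vec, toy_embed(t)),
        reverse=True,
    )
    return ranked[:k]


toy_texts = [
    "the turing test measures machine intelligence",
    "deep learning uses neural networks",
    "bias is an ethical concern in ai",
]
top_match = toy_retrieve("what is the turing test", toy_texts, k=1)
print(top_match)  # ['the turing test measures machine intelligence']
```

A real retriever follows the same pattern (embed the query, rank stored vectors by closeness, return the top few), only with dense learned embeddings and an index built for fast search.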


LangChain supports a wide variety of models.

[Official documentation on loading different models](https://python.langchain.com/docs/integrations/chat/)

In [39]:
from langchain.llms import Ollama

# Change the model name below to load a different model

# Initialize Ollama with the gemma3 model
llm = Ollama(model="gemma3")
  

In [40]:
from langchain.chains import RetrievalQA
# RAG chain
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

# Ask a question
query = "What is the summary of the text?"
response = rag_chain.invoke({"query": query})  # RetrievalQA expects the "query" key

print("Answer:", response['result'])

Answer: The text discusses several key concepts in the field of Artificial Intelligence (AI). It defines AI as the broader concept of machines performing intelligent tasks, and outlines different types of AI like Machine Learning (ML) and Deep Learning (DL). It also details Natural Language Processing (NLP) as the focus on enabling machines to understand and generate human language. Furthermore, it describes the Turing Test as a measure of a machine’s ability to mimic human intelligence, and highlights ethical concerns surrounding AI such as bias, job displacement, and lack of transparency. Finally, it introduces Explainable AI (XAI) as a method for making AI decisions more understandable and accountable.


# Extend the pipeline to maintain chat history and provide additional context

In [42]:
# history aware chat

# Import function to create a retriever that can use chat history to reformulate questions
from langchain.chains import create_history_aware_retriever

# Import tools to construct prompt templates
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


# Define a prompt that instructs the LLM on how to reformulate questions.
# Each fragment ends with a space so the implicitly concatenated strings
# do not run together.
retriever_prompt = (
    "Given a chat history and the latest user question which might reference context in the chat history, "
    "formulate a standalone question which can be understood without the chat history. "
    "Do NOT answer the question, just reformulate it if needed and otherwise return it as is."
)


contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", retriever_prompt),  # Instructions to the model
        MessagesPlaceholder(variable_name="chat_history"),  # Past conversation messages
        ("human", "{input}"),  # Latest user question
    ]
)

history_aware_retriever = create_history_aware_retriever(
    llm,
    retriever,
    contextualize_q_prompt,  # defined above
)
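
Conceptually, the history-aware retriever decides which query is sent to the underlying retriever. The sketch below illustrates that control flow in plain Python, assuming (as I understand the documented behaviour of `create_history_aware_retriever`) that an empty history means the input is passed through unchanged. The `fake_reformulate` stub stands in for the LLM call and, like every other name here, is hypothetical.

```python
from typing import Callable


def history_aware_search_query(
    user_input: str,
    chat_history: list[tuple[str, str]],
    reformulate: Callable[[str, list[tuple[str, str]]], str],
) -> str:
    """Pick the query that is sent to the retriever.

    With no history the user's input is used unchanged; otherwise the
    reformulate callable (a stand-in for the LLM) rewrites the question
    into a standalone one.
    """
    if not chat_history:
        return user_input
    return reformulate(user_input, chat_history)


def fake_reformulate(user_input: str, history: list[tuple[str, str]]) -> str:
    """Stub standing in for the LLM: it just appends the previous user question."""
    last_user_turn = [text for role, text in history if role == "human"][-1]
    return f"{user_input} (follow-up to: {last_user_turn})"


first_query = history_aware_search_query("What is XAI?", [], fake_reformulate)
follow_up_query = history_aware_search_query(
    "Why does it matter?",
    [("human", "What is XAI?"), ("ai", "Explainable AI makes decisions understandable.")],
    fake_reformulate,
)
print(first_query)      # What is XAI?
print(follow_up_query)  # Why does it matter? (follow-up to: What is XAI?)
```

The point of the rewrite step is that "Why does it matter?" on its own retrieves poorly, while a standalone question naming the topic gives the vector store something to match.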



In [None]:
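# create_stuff_documents_chain "stuffs" all retrieved documents into the prompt at once
from langchain.chains.combine_documents import create_stuff_documents_chain

# Chat-style prompt holding only the history and the latest question.
# It has no {context} slot, so it is not the prompt passed to
# create_stuff_documents_chain below; stuff_prompt is used instead.
qa_prompt = ChatPromptTemplate.from_messages(
    [
        MessagesPlaceholder("chat_history"),  # Inserts the full chat history so the model understands the conversation flow
        ("human", "{input}"),  # Inserts the latest user question into the prompt
    ]
)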

In [None]:
from langchain_core.prompts import PromptTemplate

# Define a PromptTemplate that explicitly includes "context" and "input" as variables.
# This is used to feed retrieved documents and user questions into the LLM.
# Note: this template has no chat_history slot, so the history only influences
# retrieval (through the reformulated question), not the answering step.
stuff_prompt = PromptTemplate(
    input_variables=["context", "input"],
    template=(
        "Use the following pieces of context to answer the question at the end.\n"
        "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n"
        "{context}\n\nQuestion: {input}\nHelpful Answer:"
    )
)
# This chain feeds all retrieved context + the user query into the LLM using the above format.
question_answer_chain = create_stuff_documents_chain(llm, stuff_prompt)
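
To show what "stuffing" amounts to, here is a minimal plain-Python sketch that joins the retrieved texts into one context string and fills the same template. It is an illustration of the idea, not the library's implementation; the helper name `stuff_documents`, the `"\n\n"` separator, and the sample texts are assumptions.

```python
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n"
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\nQuestion: {input}\nHelpful Answer:"
)


def stuff_documents(doc_texts: list[str], question: str, separator: str = "\n\n") -> str:
    """Join every retrieved text into one context string and fill the template."""
    context = separator.join(doc_texts)
    return STUFF_TEMPLATE.format(context=context, input=question)


filled_prompt = stuff_documents(
    ["AI is the simulation of human intelligence.", "XAI makes AI decisions explainable."],
    "What is XAI?",
)
print(filled_prompt)
```

Because every retrieved chunk lands in a single prompt, this approach is simple but bounded by the model's context window, which is one reason the chunks were kept small earlier.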

In [43]:
from langchain.chains import create_retrieval_chain

# Combine the history-aware retriever and the QA chain into a full RAG pipeline
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

In [44]:
from langchain_core.messages import HumanMessage, AIMessage

# Start with an empty history; the first question has no prior context
chat_history = []

question1 = "What is the summary of the text?"
message1 = rag_chain.invoke({"input": question1, "chat_history": chat_history})

In [45]:
message1["answer"]

'The text discusses several key concepts in the field of Artificial Intelligence (AI). It defines AI as the broader concept of machines performing intelligent tasks, and introduces Machine Learning (ML) and Deep Learning (DL) as subsets of AI. It also details the Turing Test as a measure of a machine’s ability to mimic human intelligence, and Natural Language Processing (NLP) as the focus on enabling machines to understand and generate human language. Furthermore, it highlights ethical concerns surrounding AI such as bias, job displacement, and lack of transparency, and introduces Explainable AI (XAI) as a solution to improve trust and accountability in AI systems.'

In [46]:
chat_history.extend(
    [
        HumanMessage(content=question1),
        AIMessage(content=message1["answer"]),
    ]
)

In [47]:
chat_history


[HumanMessage(content='What is the summary of the text?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='The text discusses several key concepts in the field of Artificial Intelligence (AI). It defines AI as the broader concept of machines performing intelligent tasks, and introduces Machine Learning (ML) and Deep Learning (DL) as subsets of AI. It also details the Turing Test as a measure of a machine’s ability to mimic human intelligence, and Natural Language Processing (NLP) as the focus on enabling machines to understand and generate human language. Furthermore, it highlights ethical concerns surrounding AI such as bias, job displacement, and lack of transparency, and introduces Explainable AI (XAI) as a solution to improve trust and accountability in AI systems.', additional_kwargs={}, response_metadata={})]

In [48]:

second_question = "provide the previous answer within 100 characters"
message2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

print(message2["answer"])

AI is the simulation of human intelligence in machines.


# Session-aware chat memory in a conversational RAG pipeline 

In [49]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

In [23]:
store = {}

In [None]:
# Function to retrieve or create a chat history object for a given session ID

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

# Creates an in-memory dictionary store to hold chat histories for different users or sessions.
# Ensures each session has a dedicated message history, using the session ID as a key.

In [50]:
conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain, # The RAG chain created earlier
    get_session_history, # Function that supplies chat history per session
    input_messages_key="input",  # Key in input dict for the user message
    history_messages_key="chat_history", # Used by LangChain to auto-track conversation
    output_messages_key="answer", # Key to extract the model's response
)
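
The wrapper's job can be pictured as three steps: load the session's history, call the chain with it, then record the new turns. The sketch below mimics that with a plain dictionary and a stub chain; it is a simplified illustration under those assumptions, not how `RunnableWithMessageHistory` is implemented, and every name in it (`toy_store`, `invoke_with_history`, `echo_chain`) is hypothetical.

```python
from typing import Callable

toy_store: dict[str, list[tuple[str, str]]] = {}


def get_toy_history(session_id: str) -> list[tuple[str, str]]:
    """Return the message list for a session, creating it on first use."""
    return toy_store.setdefault(session_id, [])


def invoke_with_history(
    chain: Callable[[str, list[tuple[str, str]]], str],
    user_input: str,
    session_id: str,
) -> str:
    """Load the session history, call the chain, then record both turns."""
    history = get_toy_history(session_id)
    answer = chain(user_input, list(history))  # pass a copy of the history so far
    history.append(("human", user_input))
    history.append(("ai", answer))
    return answer


def echo_chain(user_input: str, history: list[tuple[str, str]]) -> str:
    """Stub chain: reports how many earlier messages it was given."""
    return f"seen {len(history)} earlier messages; you said: {user_input}"


invoke_with_history(echo_chain, "Summarize the text", "abc123")
second_answer = invoke_with_history(echo_chain, "Make it shorter", "abc123")
other_answer = invoke_with_history(echo_chain, "Hello", "xyz789")
print(second_answer)  # seen 2 earlier messages; you said: Make it shorter
print(other_answer)   # seen 0 earlier messages; you said: Hello
```

The second call in session `abc123` sees the first exchange, while session `xyz789` starts from an empty history, which is the isolation the `session_id` key provides.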

In [51]:

conversational_rag_chain.invoke(
    {"input": "Summarize the text"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

'The text explains Artificial Intelligence (AI) as the simulation of human intelligence in machines. It differentiates between AI, Machine Learning (ML), and Deep Learning (DL), outlining their relationships. It then details the main types of AI: Narrow AI, General AI, and Super AI. Finally, it lists several real-world applications of AI, including virtual assistants, recommendation engines, self-driving cars, fraud detection, medical diagnosis, and language translation, and describes Natural Language Processing (NLP) as a field focused on enabling machines to understand and generate human language.'

In [52]:
store

{'abc123': InMemoryChatMessageHistory(messages=[HumanMessage(content='Summarize the text', additional_kwargs={}, response_metadata={}), AIMessage(content='The text describes several key areas within Artificial Intelligence (AI). AI is the broader concept of machines performing intelligent tasks. It encompasses various types, including Narrow AI (or Weak AI), which is specialized like recommendation engines and chatbots, and General AI, which would possess human-level intelligence. The text also discusses related fields like Machine Learning (ML) and Deep Learning (DL), which utilize neural networks. Furthermore, it highlights specific applications of AI such as virtual assistants, self-driving cars, fraud detection, medical diagnosis, and language translation, alongside the historical Turing Test, which assesses a machine’s ability to mimic human conversation.', additional_kwargs={}, response_metadata={}), HumanMessage(content='provide the previous answer within 100 characters', additi

In [53]:
conversational_rag_chain.invoke(
    {"input": "provide the previous answer within 100 characters"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]

'Narrow, general, and super AI are the main types.'

In [54]:
for message in store["abc123"].messages:
    if isinstance(message, AIMessage):
        prefix = "AI"
    else:
        prefix = "User"
    print(f"{prefix}: {message.content}\n")

User: Summarize the text

AI: The text describes several key areas within Artificial Intelligence (AI). AI is the broader concept of machines performing intelligent tasks. It encompasses various types, including Narrow AI (or Weak AI), which is specialized like recommendation engines and chatbots, and General AI, which would possess human-level intelligence. The text also discusses related fields like Machine Learning (ML) and Deep Learning (DL), which utilize neural networks. Furthermore, it highlights specific applications of AI such as virtual assistants, self-driving cars, fraud detection, medical diagnosis, and language translation, alongside the historical Turing Test, which assesses a machine’s ability to mimic human conversation.

User: provide the previous answer within 100 characters

AI: AI encompasses narrow, general, and super AI types, with ethical concerns like bias and job displacement.

User: Summarize the text

AI: The text explains Artificial Intelligence (AI) as the