# Local Retrieval Augmented Generation (RAG) Chatbot

**Auther** Venkat Krishna Madabooshini  
**Description:** This notebook demonstrates a RAG - powered local chatbot capabale of answering queries on PDF/text documents using LangChain, ChromaDB, Ollama and LLM model of choice. Chatbot also supports persistent converssation memory for seamless multi-turn interactions. 


## 1. Setup & Installation  

The following code installs necessary packages (run only if not already installed).  

In [None]:
#follow thse steps to create an isolated python environment and install all dependencies
# create a virtual envrionment
# python -m venv venv
# activate the virtual environment
# On windows (PowerShell):
# venv\Scripts\Activate.ps1
# On macOS/Linux:
# source venv/bin/activate
# pip install langchain chromadb langchain-community "unstructured[pdf]" langchain-chroma langchain-ollama


## 2. Import Required Libraries

In [None]:
#Libraries for loading in the text document and spliting the data into chunks
from langchain_community.document_loaders import TextLoader
#from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
#Libraries for creating the (chromadb) vector database, text embeddings 
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
from langchain_ollama import OllamaLLM
#Libraries for creating retrieval chains and memory
from langchain.chains.retrieval import create_retrieval_chain
from langchain_core.prompts import PromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.memory import ConversationBufferMemory
from langchain.schema.messages import AIMessage, HumanMessage
#Misc Libraries
import os
import json
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

## 3. Persistent Conversation Memory Class  
Stores and loads conversation history to/from disk, ensuring user sessions are persistent across runs.

In [2]:
class PersistentConversationMemory:
    """
    A wrapper class around ConversationBufferMemory that adds persistent storage
    allowing converstaion history to be saved and retrieved from the disk.
    This enables conversation continuity across different application sessions.
    """
    def __init__(self, filepath="memory.json", **kwargs):
        """
        Initialize the persistent memory wrapper.
        filepath (str): Path tp the JSON file where conversation history will be stored.
                        defaults to "memory.json"
        **kwargs: Additional arguments passed to the underlying ConversationBufferMemory
        """
        self.filepath = filepath #store the filepath for persistent storage
        self.memory = ConversationBufferMemory(**kwargs) #create the underlying Langchain memory object
        self._load_memory() #Load any existing converstaion history from disk

    def _load_memory(self):
        """
        Load the conversation history from the JSON file if it exists.
        The JSON file contains a list of message obkects with 'tyoe' and 'content' fields.
        Messages are converted back to HumanMessage and AIMessage objects.
        """
        if os.path.exists(self.filepath): #check if the path exists
            try:
                with open(self.filepath,"r") as f: #open and parse the JSON file
                    data = json.load(f)
                self.memory.chat_memory.messages = [ #convert JSON data back to Langchain message objects
                    HumanMessage(content=msg["content"]) 
                    if msg["type"] == "human"
                    else AIMessage(content=msg["content"])
                    for msg in data]
                print(f"[Memory] Loaded {len(self.memory.chat_memory.messages)} messages from the disk")
            except Exception as e: #error handling
                print(f"[Memory] failed to load Memory: {e}")
        else: #no file found
            print("[Memory] No existing memory file found.")

    def save_memory(self):
        """
        Save the current conversation history to disk on JSON.
        Messages are serialized as dictionaries with 'type' amd 'content' fields.
        """
        try:
            data = [ #convert LangChain message object to serializable dictionaries
                {"type" : "human" if isinstance(msg, HumanMessage) 
                 else "ai", "content" : msg.content}
                 for msg in self.memory.chat_memory.messages]
            with open(self.filepath, "w") as f: #write the data to JSON file
                json.dump(data, f, indent = 2)        
            print(f"[Memory] Saved {len(data)} messages to {self.filepath}")
        except Exception as e: #error handling
            print(f"[Memory] failed to save Memory: {e}")

    def save_context(self, inputs, outputs):
        """
        Save a new converstaion to memory and then to disk.
        This method extends the standard Langchain save_context by 
        automatically saving the updated conversation history to JSON file.
        inputs (dict): the input data (typicaly user query)
        outputs (dict): the output data (typically ai response)
        """
        self.memory.save_context(inputs, outputs) #save the underlying context to memory object
        self.save_memory() #save it to disk

    def load_memory_variables(self, inputs):
        """
        Load memory variables for use in the conversation chain.
        This is a passthrough method that calls the underlying memory' load_memory_variables method
        inputs (dict): input variables (typicaly empty dict)
        return value: memory variables including chat history 
        """
        return self.memory.load_memory_variables(inputs)


## 4. Document Loader & Text Chunking  

Load the text file and split it into manageable chunks for effective retrieval.

In [None]:
txt_path = "data/book.txt"
print(f"text Path: {txt_path}")
# Using TextLoader as example could be replaced with UnstructuredPDFLoader for loading PDF files.
loader = TextLoader(txt_path) #intialize the TextLoader to extract text from text file.
documents = loader.load() #returns a list of document objects containing the extracted text.

text_splitter = RecursiveCharacterTextSplitter( #configure the text splitter for optimal chunk creation.
    chunk_size = 1000, #balance the context and retrieval efficiency, lager chunks -> more context may reduce retrieval precision.
    chunk_overlap = 500, #overlap to maintain context continuity between chunks.
    separators=["\n\n", "\n", ". ", " ", ""]) #define seperators in order for splitting text.

chunks = text_splitter.split_documents(documents) #split larger documents into smaller manageable chunks
print(f"Split {len(documents)} document(s) into {len(chunks)} chunks.")


PDF Path: data/book.txt
Split 1 document(s) into 1172 chunks.


## 5. Vector Database Setup

Initialize or load persistent vector database with document embeddings.

In [4]:
persist_dir = "chromaDB"  #define the directory path where ChromaDB will store vector embeddings
print(f"ChromaDB path: {persist_dir}")

#Initialize Ollama embeddings using nomic-embed-text model
#this model converts text chunks into high-dementional vectors for semantic search
ollama_embeddings = OllamaEmbeddings(model="nomic-embed-text:latest")

if os.path.isdir(persist_dir): #Load existing vector database from disk
    db = Chroma(persist_directory=persist_dir,
                embedding_function=ollama_embeddings)
    print(f"Loaded existing ChromaDB from {persist_dir}")
else: #Create a new vector database from document chunks
    #This process involves:
    #-1. computing embeddings for each text chunk using embedding model
    #-2. store vectors in ChromaDB with associated metadata
    #-3. creating search indices for efficient similarity retrieval
    db = Chroma.from_documents(
        chunks, #text chunks to be embedded
        ollama_embeddings, #embedding model
        persist_directory=persist_dir) # directory for persistent storage
    print(f"Created and persisted now ChromaDB to {persist_dir}")



ChromaDB path: chromaDB
Created and persisted now ChromaDB to chromaDB


## 6. Language Model & Retrieval Chain Setup  

Configure the LLM and the retrieval pipeline for answering queries.

In [5]:
#create a retriever from vector database for semantic document search
#this will return the most relevant document chunks based on query similarity
retriever = db.as_retriever()

#model options for different use cases
# - deepseek-r1:14b  (reasoning model)
# - gemma3:12b (multi-modal model supporting text and image processing)

llm = OllamaLLM(model="gemma3:12b",
             callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])) #enable streaming output for realtime response display

prompt_template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up the answer.

{context}

Question: {input}
"""

print(f"PromptTemplate: {prompt_template}")

#convert the string template into a langchain PromptTemplate object
#enables dynamic variable substitution for context and input

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context","input"])

#create a document processing chain:
# -  takes retrieved documents and user query
# -  combines them using the prompt template
combine_docs_chain = create_stuff_documents_chain(llm=llm, prompt=PROMPT)

retrieval_chain = create_retrieval_chain(retriever= retriever, #vecotr database retriever
                                        combine_docs_chain=combine_docs_chain) #document processing chain   
                                                      



PromptTemplate: Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up the answer.

{context}

Question: {input}



## 7. Persistent Conversation Memory initialization

In [6]:
memory_file = "persistent_memory.json"

#initializes persistent memory wrapper class
#return_messages = True returns actual HumanMessage and AIMessage objects instead of a single concatenated string
memory = PersistentConversationMemory(memory_key = "chat_history", #this key will be used when loading converstaion context for the LLM 
                                      return_messages = True, file_path=memory_file)

# memory system behaviour notes:
# - on initialization: automatically loads any existing conversation history
# - during conversation: maintains context in memory for multi-turn interactions
# - After each exchange: automatically saves the updated history to JSON file
# - file format: list of messages with 'type' and 'context' fields'
# - this enables seamless conversation continuation across application restarts.

[Memory] Loaded 8 messages from the disk


  self.memory = ConversationBufferMemory(**kwargs) #create the underlying Langchain memory object


## 8. Conversational Query Helper

Encapsulates the retrieval and memory upate logic for a single question-answer turn.

In [7]:
def conversational_runnable(data):
    """
    Execute a single conversational turn in the RAG system with persistant memory.
    This function orchestrates the complete RAG workflow:
    - 1. Extracts the user query from input data.
    - 2. loads converstaion history from persistent storage.
    - 3. Invokes the retrieval chain with context and query.
    - 4. saves the new interaction to persistent memory.
    - 5. returns the complete response with answer and source documents.

    data (dict): Input dictionary containing the user query under "input" key.
    returns: (dict) complete response containing:
        - "answer" : the AI - generated response in text.
        - "context" : list of retrieved document chunks used as sources.
        - additional metadata from the retrieval chain.
    """
    question = data["input"] #extract the user's question from the input data structure
    
    chat_history = memory.load_memory_variables({})["chat_history"] #retrieve the complete conversation history from the persistent memory
    
    #Use the retriever to find the relevant documnet chunks and combine it with chat history to generate a response using the constructed prompt.
    response = retrieval_chain.invoke({"input" : question, "chat_history" : chat_history})
    #save this question-answer pair to persistent memory.
    memory.save_context({"input" : question}, {"output" : response["answer"]})

    return response

## 9. Demo: Run a Sample Query

In [8]:
query = "What is the main topic discussed in this document?"
response = conversational_runnable({"input": query})

print(f"\nQuery : {query}")
print(f"AI Response: {response['answer']}")

#optionally print the source documents for context

print("\n Source Documents:")
for doc in response['context']:
    print(f" - Content: {doc.page_content[:200]} ... ")
    print(f"   Metadata: {doc.metadata}")

The main topic discussed in this document is Jules Verne's novel, "Twenty Thousand Leagues Under the Sea." It details various aspects of the story, including encounters, geographical locations, and a character's wealth.[Memory] Saved 10 messages to memory.json

Query : What is the main topic discussed in this document?
AI Response: The main topic discussed in this document is Jules Verne's novel, "Twenty Thousand Leagues Under the Sea." It details various aspects of the story, including encounters, geographical locations, and a character's wealth.

 Source Documents:
 - Content: "One last question, Captain Nemo."
"Ask it, Professor."
"You are rich?"
"Immensely rich, sir; and I could, without missing it, pay the national debt of
France."
I stared at the singular person who spo ... 
   Metadata: {'source': 'data/book.txt'}
 - Content: clear of all land at a few yards beneath the waves of the Atlantic.
CHAPTER XI
THE SARGASSO SEA
That day the Nautilus crossed a singular part of the Atlant

## 10. Next Steps & Improvements

 - Add a Web UI for interactive querying
 - Implement multi-document retrieval
 - Enhance persistent memory structure for user profiles.