## Imports

In [None]:
import os
import shutil
import gradio as gr

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma

from langchain.memory import ConversationBufferMemory
from langchain.prompts import ChatPromptTemplate
from langchain.prompts.prompt import PromptTemplate  

# Embeddings / LLM imports (choose what you need)
from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings  # community package, consistent with the other loaders
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain_community.llms import HuggingFaceHub
from langchain_community.llms.ollama import Ollama


#### Each import sets up the pieces needed to load the PDF, process it into a searchable form, and run a conversation through a chat interface.

## Configuration

In [None]:
OPENAI_API_KEY = "sk-proj-***"
openai_model = "gpt-4o-mini"
llama_model = "llama3.1:8b"
db_name = "vector_db"
pdf_path = "monopoly.pdf"

### This section defines configuration variables:

* **OPENAI_API_KEY:** Your OpenAI API key (redacted here). It’s needed if you use OpenAI models.

* **openai_model:** The name of the OpenAI model to use. *"gpt-4o-mini"* is GPT-4o mini, a smaller and cheaper variant of GPT-4o (the "o" stands for "omni").

* **llama_model:** The name of the local model for Ollama. *"llama3.1:8b"* is the Ollama tag for the 8-billion-parameter Llama 3.1 model, which must be pulled into Ollama’s local service before use. Ollama allows you to run such open-source LLMs locally.

* **db_name:** Name of the directory where the vector database will be saved ("vector_db").

* **pdf_path:** The path to the PDF file that will be used as the knowledge source ("monopoly.pdf").
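
Hardcoding the key in a notebook cell makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key has been exported as an environment variable named `OPENAI_API_KEY` (the helper `load_api_key` is hypothetical and not part of the original notebook):

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError when the variable is missing or empty, so a
    misconfiguration surfaces before any API call is attempted.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set.")
    return key
```

With this in place, the configuration cell could read `OPENAI_API_KEY = load_api_key()` instead of containing the literal key.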

## Helper Functions

In [None]:
def PDFLoader(file_path: str):
    loader = PyPDFLoader(file_path)
    return loader.load()

def TextSplitter(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=80,
        length_function=len,
        is_separator_regex=False
    )
    return text_splitter.split_documents(documents)

### These two functions prepare the PDF content for processing:

* **PDFLoader:** Takes a file path (string), creates a PyPDFLoader for that PDF, and calls *loader.load()*. This returns a list of LangChain Document objects; by default PyPDFLoader produces one Document per page.

* **TextSplitter:** Takes a list of Document objects and splits them into smaller chunks. It uses *RecursiveCharacterTextSplitter* with a *chunk_size* of 800 characters and a *chunk_overlap* of 80. This means each chunk is at most 800 characters long, and adjacent chunks can share up to 80 characters to preserve context between splits. The *split_documents* method returns a new list of smaller Document chunks. This ensures the text is in manageable pieces for embedding and retrieval.

Together, these helpers load the PDF content and divide it into small, overlapping pieces. The splitter prefers to break at paragraph and line boundaries before falling back to words and single characters, which tends to keep related sentences together. This is crucial for building an effective embedding index and for improving answer quality (smaller chunks keep context local).
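
To make the effect of *chunk_size* and *chunk_overlap* concrete, here is a simplified sliding-window splitter. It is only a sketch: unlike *RecursiveCharacterTextSplitter*, it ignores separators and cuts at fixed character offsets, and the function name `split_with_overlap` is made up for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 800, chunk_overlap: int = 80) -> list[str]:
    """Cut text into fixed-size windows where each window re-reads the
    last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))  # 2000 characters of dummy text
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])            # [800, 800, 560]
print(chunks[0][-80:] == chunks[1][:80])   # True: the 80-character overlap
```

The real splitter usually produces chunks shorter than 800 characters, because it stops at the nearest separator rather than at the exact limit.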

## Embedding Factories

In [None]:
def GetHuggingFaceEmbedding():
    return HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

def GetOpenAIEmbedding():
    # If you want openai embeddings instead, instantiate here (requires OpenAI key)
    return OpenAIEmbeddings(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

### This section defines two helper functions to create embedding generator objects:

* **GetHuggingFaceEmbedding:** Instantiates *HuggingFaceEmbeddings* using the model *sentence-transformers/all-MiniLM-L6-v2*. This is a popular, lightweight model for encoding sentences into numeric vectors.

* **GetOpenAIEmbedding:** Instantiates *OpenAIEmbeddings* using the OpenAI model *text-embedding-3-small*, requiring your API key. This uses OpenAI’s embedding service to turn text into vectors.

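You can choose which embedding source to use by calling the appropriate function. Hugging Face embeddings are computed locally, so they need no API key or calls to an embedding service (the model weights are downloaded on first use); OpenAI embeddings are computed by OpenAI’s API and require the key. The two models produce vectors of different sizes that are not comparable with each other, so an index must be queried with the same embedding model that built it.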
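
Whichever model is used, the resulting vectors are compared numerically at query time. The sketch below shows cosine similarity, one common measure, on made-up three-dimensional vectors (real embeddings have hundreds of dimensions or more). Which distance metric Chroma actually applies depends on how the collection is configured, so treat this as an illustration of the idea rather than of Chroma's internals.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only
rent = [0.9, 0.1, 0.0]
mortgage = [0.8, 0.3, 0.1]
dice = [0.0, 0.2, 0.9]

# Related concepts should score higher than unrelated ones
print(cosine_similarity(rent, mortgage) > cosine_similarity(rent, dice))  # True
```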

## Vector Store Creation

In [None]:
def GetVectorstore(chunks, embeddings, persist_directory=db_name):
    try:
        if os.path.exists(persist_directory):
            shutil.rmtree(persist_directory)
    except PermissionError:
        print(f"Warning: Could not delete {persist_directory}. Using existing database.")
    
    return Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=persist_directory)

### This function creates (or recreates) the vector database (Chroma) from text chunks:

* It first checks if a directory named *persist_directory* (default *db_name*, i.e. "vector_db") exists. If so, it attempts to delete that directory with *shutil.rmtree*, ensuring we start fresh. If permission is denied (perhaps the directory is in use), it prints a warning and continues; in that case the new chunks are written into the existing database rather than a fresh one, so stale or duplicate entries can remain.

* Then it calls *Chroma.from_documents(...)* with the list of chunks and the specified embeddings. This builds a Chroma vectorstore: each chunk of text is converted to a vector (using the embeddings provided) and stored in the database.

* Because *persist_directory* is given, Chroma saves the collection to disk in that directory. This allows the index to be reused in future runs without recomputing the embeddings.

*In summary*, GetVectorstore prepares a searchable semantic index of the PDF content. This index lets us later find relevant chunks when a user asks a question.
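
The delete-then-rebuild step can be isolated into a small helper that reports whether the directory was actually cleared, so the caller can decide what to do when it was not. This is a sketch using only the standard library; `reset_directory` is a hypothetical name, and the demonstration runs on a throwaway temporary directory rather than on "vector_db".

```python
import os
import shutil
import tempfile


def reset_directory(path: str) -> bool:
    """Delete `path` if it exists.

    Returns True when the location is now absent (a fresh build is safe)
    and False when deletion failed because of missing permissions.
    """
    if not os.path.exists(path):
        return True
    try:
        shutil.rmtree(path)
    except PermissionError:
        return False
    return True


# Demonstration on a temporary directory containing one stale file
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "index.bin"), "w") as fh:
    fh.write("stale data")

print(reset_directory(demo_dir), os.path.exists(demo_dir))  # True False
```

A caller could skip the rebuild, or choose a different directory, when the helper returns `False` instead of silently writing into the old database.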

## Loading PDF & Building the Vector Database

In [None]:
documents = PDFLoader(pdf_path)
chunks = TextSplitter(documents)

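def Embeddings(model):
    # Map the UI model label to the embedding backend used for the index
    if model == "GPT4o-mini":
        return GetOpenAIEmbedding()
    elif model == "Llama3-8b":
        return GetHuggingFaceEmbedding()
    raise ValueError(f"Unknown model: {model!r}")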

vectorstore = GetVectorstore(chunks, Embeddings("GPT4o-mini"), persist_directory=db_name)
#vectorstore = GetVectorstore(chunks, Embeddings("Llama3-8b"), persist_directory=db_name)
print("Vectorstore created.")

### This part of the code actually runs the loading and vector store creation:

**1. Load PDF:** *PDFLoader(pdf_path)* reads *monopoly.pdf* into a list of Document objects.

**2. Split text:** *TextSplitter(documents)* cuts those documents into overlapping chunks.

**3. Select embeddings:** The *Embeddings* function picks an embedding factory based on the model name: *"GPT4o-mini"* returns OpenAI embeddings and *"Llama3-8b"* returns Hugging Face embeddings.

**4. Build vectorstore:** *GetVectorstore(chunks, Embeddings("GPT4o-mini"), persist_directory=db_name)* is called. Because *"GPT4o-mini"* is passed to *Embeddings*, the chunks are embedded with OpenAI’s *text-embedding-3-small*; the commented-out line switches to the Hugging Face model instead. The resulting vectorstore is saved in *"vector_db"*.

At this point, we have a Chroma database of vectors for each text chunk from the PDF. This database can be queried semantically to find the chunks most relevant to any question.
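
The idea of "find the chunks most relevant to a question" can be sketched without any vector database. The example below ranks chunks with a toy word-overlap score standing in for embedding similarity; the function names and the sample chunks are invented for illustration and are not quoted from the PDF.

```python
import heapq


def toy_score(query: str, chunk: str) -> float:
    """Toy relevance score: shared words divided by all distinct words.
    A real system compares embedding vectors instead."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def top_k(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks with the highest score, best first."""
    return heapq.nlargest(k, chunks, key=lambda chunk: toy_score(query, chunk))


chunks = [
    "a player may buy a hotel after owning four houses",
    "roll the dice and move your token",
    "the bank collects all taxes and fines",
]
print(top_k("how do i buy a hotel", chunks, k=1))
```

A vector store performs the same ranking step, but it scores each chunk by the similarity between its stored embedding and the embedding of the query.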

## Conversational Retrieval with Memory

In [None]:
# Global memory object to persist conversation history across queries
global_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

def ConversationRetrieval(query: str, llm, memory=global_memory):
    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

    custom_prompt = ChatPromptTemplate.from_template(
        """Use the following context to answer the question:

{context}

Question: {question}

Answer:"""
    )

    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        combine_docs_chain_kwargs={"prompt": custom_prompt}
    )

    result = conversation_chain.invoke({"question": query})
    return result.get("answer")


### This section defines how to answer a query using the vectorstore and memory:

* **Memory:** global_memory is a *ConversationBufferMemory* that stores all past messages. It has *return_messages=True*, meaning it keeps the history as a list of messages. This memory is passed to the chain so that each new question can be answered in context.

* **ConversationRetrieval function:** Takes a user query (string) and an LLM instance.

    * It creates a retriever from the vectorstore with *k=5*, meaning up to 5 relevant documents (text chunks) will be retrieved for each query.

    * It defines a *custom_prompt* template. This template tells the model: *“Use the following context to answer the question...”* and fills in *{context}* (the retrieved chunks) and *{question}*.

    * It builds a *ConversationalRetrievalChain* with the given *llm*, *retriever*, and *memory*. The *combine_docs_chain_kwargs* include our *custom_prompt* for how to present the retrieved docs to the model.

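    * Finally, it calls *conversation_chain.invoke({"question": query})*. This runs the chain: when there is prior chat history, it first condenses the history and the new question into a standalone question, then retrieves the relevant chunks for that question, and finally generates an answer from them. The function returns the generated answer string.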

**In summary**, this sets up a retrieval-augmented generation (RAG) system with chat memory. According to the LangChain docs, *“This chain takes in chat history ... and new questions, and then returns an answer”* by combining retrieval and LLM steps. The conversation history plus the newly retrieved context helps the model answer accurately and consistently.
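
The two moving parts described above, a buffer that accumulates turns and a template filled with retrieved text, can be sketched in plain Python. This is not LangChain's implementation: `BufferMemory` and `build_prompt` are hypothetical stand-ins, and no model is called.

```python
PROMPT_TEMPLATE = """Use the following context to answer the question:

{context}

Question: {question}

Answer:"""


class BufferMemory:
    """Minimal stand-in for a conversation buffer: keeps every turn in order."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []

    def save_turn(self, question: str, answer: str) -> None:
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


memory = BufferMemory()
prompt = build_prompt("How can I buy a hotel?", ["chunk one text", "chunk two text"])
memory.save_turn("How can I buy a hotel?", "(model answer would go here)")

print(prompt)
print(len(memory.messages))  # 2: one human message and one AI message
```

Because the buffer only ever grows, a long conversation eventually adds a lot of history to each request; that is the trade-off of keeping the full transcript.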

## Query Wrapper with Persistent Memory

In [None]:
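def Query(query: str, model: str = "Llama3-8b"):
    # Pick the LLM that matches the dropdown label; fail loudly on anything else
    if model == "Llama3-8b":
        llm = Ollama(model=llama_model)
    elif model == "GPT4o-mini":
        llm = ChatOpenAI(model_name=openai_model, api_key=OPENAI_API_KEY)
    else:
        raise ValueError(f"Unknown model: {model!r}")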

    # Use the global memory object to persist history across calls
    return ConversationRetrieval(query, llm, memory=global_memory)

This function chooses which LLM to use based on the *model* string and then calls *ConversationRetrieval(query, llm, memory=global_memory)* to get the answer. Because *global_memory* is passed each time, the chat history persists across multiple queries.

**In effect**, Query hides the details of model selection and ensures that each new user question is processed with the **same memory** object, allowing the chatbot to carry context from previous turns.
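
Model selection by string is easy to get wrong when a label is mistyped, since no branch matches and no LLM gets created. One possible alternative is a lookup table that rejects unknown labels explicitly. In this sketch the factories return placeholder strings so that it runs without Ollama or an API key; in the notebook they would construct the real LLM objects. All names here are hypothetical.

```python
from typing import Callable


def make_llama() -> str:
    # Placeholder for constructing the local Ollama model
    return "llama3.1:8b client"


def make_openai() -> str:
    # Placeholder for constructing the OpenAI chat model
    return "gpt-4o-mini client"


# Keys match the labels offered in the UI dropdown
LLM_FACTORIES: dict[str, Callable[[], str]] = {
    "Llama3-8b": make_llama,
    "GPT4o-mini": make_openai,
}


def select_llm(model: str) -> str:
    """Look up and call the factory for `model`, or raise a clear error."""
    try:
        factory = LLM_FACTORIES[model]
    except KeyError:
        valid = ", ".join(sorted(LLM_FACTORIES))
        raise ValueError(f"Unknown model {model!r}; expected one of: {valid}") from None
    return factory()


print(select_llm("Llama3-8b"))
```

Adding a third model then means adding one dictionary entry rather than another branch.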

## Gradio Chat Interface

In [None]:
# --- Wrap for Gradio Chat ---
def ChatBot(query, history, model):
    answer = Query(query, model)  
    return answer

# --- Chat Interface ---
with gr.Blocks() as ui:
    gr.Markdown("## 📘 PDF Q&A Chatbot with Memory")
    gr.Markdown("#### Ask questions about the *Monopoly* board game PDF using *Llama3-8b* or *GPT4o-mini*.")

    with gr.Row():
        model_selector = gr.Dropdown(
            choices=["Llama3-8b", "GPT4o-mini"],
            label="Choose Model",
            value="GPT4o-mini"
        )

    chatbot = gr.Chatbot(height=400)

    msg = gr.Textbox(
        placeholder="Type your question...",
        label="Your Question"
    )
    # Clears the textbox and chat panel only; global_memory still holds earlier turns
    clear = gr.ClearButton([msg, chatbot])

    def user_submit(query, history, model):
        # Append user query
        history = history + [(query, None)]
        answer = Query(query, model)
        # Append bot answer
        history[-1] = (query, answer)
        return history, ""

    msg.submit(user_submit, [msg, chatbot, model_selector], [chatbot, msg])

ui.launch(inbrowser=True) 

**Overall**, this section uses Gradio’s Blocks API to create a simple chatbot UI. It combines static text (Markdown) and interactive elements (dropdown, textbox, chatbot panel). The logic ensures each user question is processed by the Query function (which uses the selected LLM and the memory-augmented retrieval chain), and the resulting answer is displayed in the chat.
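
The submit handler is plain Python, so its behaviour can be checked without launching Gradio. The sketch below rebuilds the same logic around an injected answer function; `make_submit_handler` and `fake_query` are hypothetical names, and `fake_query` stands in for the real *Query* so that no model is needed.

```python
from typing import Callable, Optional

History = list[tuple[str, Optional[str]]]


def make_submit_handler(answer_fn: Callable[[str, str], str]):
    """Build a handler with the same inputs and outputs as user_submit."""

    def user_submit(query: str, history: History, model: str):
        answer = answer_fn(query, model)
        # Return a new history list plus an empty string to clear the textbox
        return history + [(query, answer)], ""

    return user_submit


def fake_query(query: str, model: str) -> str:
    # Stand-in for the real Query function
    return f"[{model}] echo: {query}"


handler = make_submit_handler(fake_query)
history, textbox = handler("How can I buy a hotel?", [], "GPT4o-mini")
print(history)
print(repr(textbox))
```

The two return values line up with the outputs wired in *msg.submit*: the updated chat history and the new (empty) textbox content.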

### Example without interface

In [None]:
# ---------- Example ----------
if __name__ == "__main__":
    answer = Query("How can I buy a hotel?", "Llama3-8b")
    print("Answer:\n", answer)
    # You can continue the conversation and the memory/history will be preserved:
    followup = Query("What if I don't have enough money?", "Llama3-8b")
    print("Follow-up Answer:\n", followup)