Intuition behind RAG

```python
# Import necessary libraries
import ollama  # Python client for the local Ollama server
import gradio as gr  # provides the web interface for the model

# Document processing and retrieval
from langchain_community.document_loaders import PyMuPDFLoader  # extracts text from PDF files for processing
from langchain.text_splitter import RecursiveCharacterTextSplitter  # splits text into smaller chunks for better embedding and retrieval
from langchain_community.vectorstores import Chroma  # stores the vector embeddings in a Chroma database

# Embedding generation
from langchain_community.embeddings import OllamaEmbeddings  # converts text into numerical vectors using an Ollama model
import re  # regular expressions, used later to clean the model's output
```

#### Call DeepSeek-R1 1.5B via the Ollama API
In this snippet, we use `ollama.chat()` to generate a response from DeepSeek-R1 1.5B, which runs locally through Ollama. Let's break it down:

- **Choosing the model**: We specify `"deepseek-r1:1.5b"` using the `model` argument.
- **Passing user messages**: The `messages` parameter is a list of chat turns, where each message contains:
    - `role`: `"user"`, indicating that the message comes from the user.
    - `content`: `"explain the concept of orbital mechanics"`, the actual question asked.
- **Extracting and printing the response**: The model returns a structured response, and the text of the reply is stored in `response["message"]["content"]`. We print it to display the answer.

This approach lets us interact with an LLM entirely on our own machine, answering queries without relying on an external, hosted API.

```python
# Call the model served locally by Ollama
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[
        {"role": "user", "content": "explain the concept of orbital mechanics"},
    ],
)
print(response["message"]["content"])
```

```text
<think>
Okay, so I need to explain the concept of orbital mechanics. Hmm, where do I start? Well, I know that orbital mechanics is all about how objects move around a central body, like planets orbiting the sun or satellites orbiting Earth. But wait, what's the big deal about that? I think it has something to do with gravity and movement.

I remember hearing about Kepler's laws. There are three laws in orbital mechanics. The first one is about the square of the period being proportional to the cube of the semi-major axis. That sounds like Kepler's third law, right? So, for example, Earth orbits the sun more quickly than Venus because it's farther away. That makes sense because gravity pulls stronger when you're closer.

Then there's the first law, which says that objects orbit in an ellipse. Wait, but we all know ellipses are stretched out circles. So isn't circular motion just a special case of an ellipse? Yeah, I think so. And the second law talks about areas swept by the radius bein
```

## Preprocess the PDF Document for RAG

We will now create a function that pre-processes the PDF file for RAG. Below is a breakdown of its logic:

- **Check if a PDF is provided**: If no file is given, the function returns `(None, None, None)`, preventing unnecessary processing.
- **Extract text from the PDF**: Uses `PyMuPDFLoader` to load the file from its path and extract the raw text, one document per page.
- **Split the text into chunks**: Smaller chunks give more precise retrieval and fit comfortably in the model's context window, so we use `RecursiveCharacterTextSplitter`. Each chunk contains at most **500 characters**, with an **overlap of 100 characters** so that context cut at a chunk boundary carries over into the next chunk.
- **Generate embeddings for each chunk**: Uses `OllamaEmbeddings` with the `"deepseek-r1:1.5b"` model to convert text into **numerical vectors**. These embeddings allow us to find **meaning-based matches** rather than exact keyword searches.
- **Store embeddings in a vector database**: We use `ChromaDB` to **store and organize** the generated embeddings efficiently. The data is **persisted** to disk in `"./chroma_db"`, so the embeddings can be reloaded later instead of recomputed.
- **Create a retriever for searching the database**: The retriever acts like a **smart search engine**: it embeds the user's question and returns the stored chunks whose embeddings are most similar, so the chatbot can answer from the most relevant text.
- **Return essential components**
    - `text_splitter` (for future text processing)
    - `vectorstore` (holding the document embeddings)
    - `retriever` (allowing AI-powered search over the document)
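
To make the size and overlap parameters concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. The `chunk_text` helper is hypothetical and much simpler than `RecursiveCharacterTextSplitter`, which first tries to split on separators such as paragraph breaks, newlines, and spaces, so its chunks usually end at natural boundaries and can be shorter than 500 characters. The sketch only illustrates the window arithmetic: each new chunk starts `chunk_size - chunk_overlap` characters after the previous one.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Hypothetical helper: split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

# 1,200 characters of sample text
sample = "".join(chr(ord("a") + i % 26) for i in range(1200))
parts = chunk_text(sample)
print([len(p) for p in parts])             # length of each chunk
print(parts[0][-100:] == parts[1][:100])   # consecutive chunks share 100 characters
```

With 1,200 characters, the windows start at 0, 400, and 800, giving chunks of 500, 500, and 400 characters; the last 100 characters of one chunk are the first 100 of the next.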

## **What are embeddings?**
Embeddings are **numerical representations of text** that capture meaning. Instead of treating words as just sequences of letters, embeddings transform them into **multi-dimensional vectors** where similar words or sentences **are placed closer together**.

![image](https://miro.medium.com/v2/resize:fit:1400/1*OEmWDt4eztOcm5pr2QbxfA.png)
_Source: https://medium.com/towards-data-science/word-embeddings-intuition-behind-the-vector-representation-of-the-words-7e4eb2410bba_

### **Intuition: how do embeddings work?**
Imagine a **map of words**:
- Words with **similar meanings** (*cat* and *dog*) are **closer together**.
- Words with **different meanings** (*cat* and *car*) are **farther apart**.
- Sentences or paragraphs with similar **context** will have embeddings that are **close to each other**.

This means that when a user asks a question, the retriever doesn't just look for **exact words**; it finds the **most relevant text based on meaning**, even if the wording is different.
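
The notion of "closeness" can be made concrete with cosine similarity, one common way to compare embedding vectors (vector stores may use other distance metrics). The three-dimensional vectors below are hand-picked toy values, not output from a real embedding model, which would produce vectors with hundreds or thousands of dimensions; `cosine_similarity` is a hypothetical helper written for illustration.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-picked toy vectors; dimensions loosely mean [animal, pet, vehicle]
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.1],
    "car": [0.1, 0.0, 0.9],
}
cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])
print(round(cat_dog, 3), round(cat_car, 3))
```

Because *cat* and *dog* point in nearly the same direction, their score is close to 1, while *cat* and *car* score much lower.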

### **Why this matters**
This function enables a chatbot to **understand and retrieve information from PDFs efficiently**. Instead of simple keyword searches, it **finds contextually relevant information**, making AI responses **more accurate and useful**.
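
The retrieval step can be sketched the same way: embed the question, score every stored chunk against it, and return the best `k`. In this minimal sketch the chunk and query vectors are hand-made toy values and `top_k` is a hypothetical helper; the real retriever does the equivalent with model-generated embeddings stored in Chroma.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Hypothetical helper: indices of the k chunks most similar to the query, best first."""
    scores = [cosine_similarity(query_vec, vec) for vec in chunk_vecs]
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy "embedded" chunks; dimensions loosely mean [food, exercise, vehicles]
chunks = [
    "Cats need a high-protein diet.",
    "Dogs need daily walks.",
    "Check your car's oil level regularly.",
]
chunk_vecs = [[0.9, 0.1, 0.0], [0.2, 0.9, 0.0], [0.0, 0.1, 0.9]]
query_vec = [0.8, 0.2, 0.0]  # toy vector standing in for "What should I feed my kitten?"

for i in top_k(query_vec, chunk_vecs, k=2):
    print(chunks[i])
```

In this toy setup the question shares no keywords with the top-ranked chunk ("feed"/"diet", "kitten"/"cats"), yet it ranks first because its vector points in a similar direction, which is the kind of match a keyword search would miss.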



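```python
from langchain_community.vectorstores import Chroma  # vector store class (capital C)

# Define the function that processes the PDF
def process_pdf(pdf_path):
    """Load a PDF from a file path, chunk it, embed the chunks, and build a retriever."""
    if not pdf_path:
        return None, None, None

    # Load the PDF document (PyMuPDFLoader expects a file path, not raw bytes)
    loader = PyMuPDFLoader(pdf_path)
    data = loader.load()

    # Split the text into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    chunks = text_splitter.split_documents(data)

    # Embed the chunks and store them in a persistent Chroma database
    # (note: every call adds the chunks to the same on-disk collection again)
    embeddings = OllamaEmbeddings(model="deepseek-r1:1.5b")
    vectorstore = Chroma.from_documents(
        documents=chunks, embedding=embeddings, persist_directory="./chroma_db"
    )

    # Expose the vector store as a retriever for similarity search
    retriever = vectorstore.as_retriever()

    return text_splitter, vectorstore, retriever
```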

## **Combining retrieved document chunks**
Once the relevant chunks are retrieved, we need to stitch them together. The `combine_docs()` function merges the retrieved document chunks into a single string, separated by blank lines. Why do we do this?

- **Provides better context** – The model receives one continuous block of text instead of fragmented pieces.
- **Improves response quality** – Seeing all relevant chunks at once helps the LLM generate more coherent and complete answers.
- **Preserves retrieval order** – Chunks stay in the order the retriever returned them (most relevant first), with a blank line marking where each excerpt ends.
- **Optimizes model calls** – All retrieved chunks go into a single prompt, so the model is called once per question rather than once per chunk.
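
To see exactly what string the model receives, here is a self-contained sketch that runs the same joining logic on two stand-in documents. `FakeDocument` is a hypothetical substitute for LangChain's `Document` class that models only the `page_content` attribute `combine_docs()` reads; the chunk texts are made up.

```python
from dataclasses import dataclass

@dataclass
class FakeDocument:
    """Hypothetical stand-in for LangChain's Document; only page_content is modeled."""
    page_content: str

def combine_docs(docs):
    # Same logic as the notebook's combine_docs: one blank line between chunks
    return "\n\n".join(doc.page_content for doc in docs)

retrieved = [
    FakeDocument("Kepler's first law: orbits are ellipses."),
    FakeDocument("Kepler's third law: T^2 is proportional to a^3."),
]
context = combine_docs(retrieved)
print(context)
```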

```python
def combine_docs(docs):
    # Join the page text of each retrieved chunk, separated by a blank line
    return "\n\n".join(doc.page_content for doc in docs)
```

## Querying DeepSeek-R1 using Ollama

Now, our input to the model is ready. Let’s set up DeepSeek R1 using Ollama.

The `ollama_llm()` function **takes a user’s question and relevant context, formats a structured prompt, sends it to the DeepSeek-R1 model, and returns a clean generated response**.

### **How it works (step-by-step)**
- **Formats the input** – Combines the question and the retrieved context into a single prompt string.
- **Calls `deepseek-r1`** – Sends the formatted prompt to generate a response.
- **Extracts the response content** – Retrieves the AI’s answer.
- **Cleans unnecessary text** – Removes `<think>...</think>` traces that contain model reasoning.
- **Returns the cleaned response** – Provides a polished and readable AI answer.
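
The cleanup step can be tried on its own without a running model. This minimal sketch wraps the same regular expression used in `ollama_llm()` in a hypothetical `strip_think` helper and applies it to a made-up response string.

```python
import re

def strip_think(text: str) -> str:
    """Hypothetical helper: remove <think>...</think> blocks and trim surrounding whitespace."""
    # re.DOTALL lets '.' match newlines, so multi-line reasoning is removed;
    # the non-greedy '.*?' stops at the first closing tag
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>\nThe user asks about orbits.\nRecall Kepler's laws.\n</think>\n\nPlanets move in ellipses."
print(strip_think(raw))
print(strip_think("<think>reasoning cut off mid-sentence"))
```

Because the pattern needs both tags, a response cut off before `</think>` (like the truncated sample output earlier) passes through unchanged.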

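```python
def ollama_llm(question, context):
    # Combine the question and retrieved context into one prompt
    formatted_prompt = f"question: {question} \n\n context: {context}"
    # Send the prompt to DeepSeek-R1 running locally through Ollama
    response = ollama.chat(
        model="deepseek-r1:1.5b",
        messages=[
            {"role": "user", "content": formatted_prompt},
        ],
    )
    response_content = response["message"]["content"]
    # Strip the <think>...</think> reasoning trace; DOTALL lets '.' span newlines
    final_answer = re.sub(r"<think>.*?</think>", "",
                          response_content,
                          flags=re.DOTALL).strip()
    return final_answer
```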

## **Build a RAG pipeline** 

Now that we have all the required components, let's build the RAG pipeline for our demo. We will build the `rag_chain()` function, which **retrieves relevant document chunks, formats them, and generates a response with the additional context from the retrieval step**.

### **How it works**

- **Retrieves relevant document chunks**: `retriever.invoke(question)` searches for the most relevant text based on the user's question. Instead of relying solely on a language model's memory, it **fetches factual data** from stored documents.
- **Formats the retrieved content**: `combine_docs(retrieved_docs)` merges the document chunks into a single structured text. This ensures that DeepSeek receives a **well-organized input** for better reasoning.
- **Generates the response**: Calls `ollama_llm(question, formatted_content)`, which:
    - Passes the structured input to `deepseek-r1:1.5b` for processing.
    - Cleans up the response (removes `<think>` tags).
    - Returns a polished answer grounded in the retrieved text.
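
Because `rag_chain()` calls the retriever and `ollama_llm()` directly, exercising it requires a running Ollama server and a populated Chroma database. One option, sketched here under that assumption, is to pass the LLM in as a parameter so the three steps can be run with stand-ins. `FakeDocument`, `StubRetriever`, `echo_llm`, and `rag_chain_with` are all hypothetical names for illustration; the stub retriever only mimics the `invoke(question)` call.

```python
from dataclasses import dataclass

@dataclass
class FakeDocument:
    """Hypothetical stand-in for LangChain's Document (only page_content)."""
    page_content: str

class StubRetriever:
    """Hypothetical retriever exposing invoke(question), like a LangChain retriever."""
    def __init__(self, docs: list[FakeDocument]):
        self.docs = docs

    def invoke(self, question: str) -> list[FakeDocument]:
        # A real retriever ranks chunks by embedding similarity; the stub returns all of them
        return self.docs

def combine_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

def echo_llm(question: str, context: str) -> str:
    # Stub LLM: returns the prompt ollama_llm() would send instead of calling a model
    return f"question: {question} \n\n context: {context}"

def rag_chain_with(question, retriever, llm):
    """Same three steps as rag_chain(), with the LLM call injected."""
    retrieved_docs = retriever.invoke(question)       # 1. retrieve
    formatted_content = combine_docs(retrieved_docs)  # 2. format
    return llm(question, formatted_content)           # 3. generate

retriever = StubRetriever([FakeDocument("Orbits are ellipses."), FakeDocument("The Sun sits at one focus.")])
result = rag_chain_with("What shape is an orbit?", retriever, echo_llm)
print(result)
```

In the notebook, the equivalent real call would be `rag_chain_with(question, retriever, ollama_llm)` with the retriever returned by `process_pdf()`; injecting the model keeps the wiring testable without a server.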

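```python
# Define the rag_chain function for Retrieval-Augmented Generation
def rag_chain(question, text_splitter, vectorstore, retriever):
    # 1. Retrieve the chunks most relevant to the question
    retrieved_docs = retriever.invoke(question)
    # 2. Merge them into a single context string
    formatted_content = combine_docs(retrieved_docs)
    # 3. Generate the answer with DeepSeek-R1
    return ollama_llm(question, formatted_content)
```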

```python
# Putting it all together: a function that performs the logic expected by the chatbot
def ask_question(pdf_path, question):
    text_splitter, vectorstore, retriever = process_pdf(pdf_path)
    if text_splitter is None:
        return None
    return rag_chain(question, text_splitter, vectorstore, retriever)
```

### Building the Chat Interface with Gradio

```python
import gradio as gr

def ask_question(pdf_file, question):
    """Handles PDF processing and answers the user's question with DeepSeek-R1."""
    if not pdf_file:
        # If no PDF is uploaded, provide a default response
        return "No PDF uploaded. Please upload a PDF to get document-based answers."

    # gr.File passes a file path (str) in Gradio 4+ and a tempfile-like
    # object with a `.name` attribute in older versions
    pdf_path = pdf_file if isinstance(pdf_file, str) else pdf_file.name
    text_splitter, vectorstore, retriever = process_pdf(pdf_path)

    if retriever is None:
        return "Error processing PDF."

    # Retrieve the relevant document chunks
    docs = retriever.invoke(question)
    if not docs:
        return "No relevant information found in the document."

    # Generate the answer from the retrieved context
    return ollama_llm(question, combine_docs(docs))

# Define the Gradio interface
interface = gr.Interface(
    fn=ask_question,
    inputs=[
        gr.File(label="Upload PDF (optional)"),
        gr.Textbox(label="Question", placeholder="Type your question here..."),
    ],
    outputs="text",
    title="Ask a Question About a PDF",
    description="Upload a PDF and ask a question about it. If no PDF is uploaded, you will get a default response.",
)

interface.launch()
```

```text
* Running on local URL:  http://127.0.0.1:7867

To create a public link, set `share=True` in `launch()`.
```


