# RAG-powered Assistant

## RAG-Powered Assistant for PDF or Text Search

### Project Goal

Build a small app that answers questions about a document or knowledge base using Retrieval-Augmented Generation (RAG).
The pipeline: load and chunk the text, then embed → store → retrieve → generate.

### Imports

In [6]:
from textwrap import wrap
import os
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
import fitz  # PyMuPDF
from huggingface_hub import InferenceClient

## Load Environment Variables

In [7]:
from dotenv import load_dotenv
load_dotenv()
hf_api_key = os.getenv("HUGGINGFACE_API_KEY")

## Load PDF File

In [8]:
# the pdf (math chapter)
file = "./data/mathematical-concepts.pdf"

In [9]:
pdf_path = file
# Open the PDF and concatenate the text of every page
with fitz.open(pdf_path) as doc:
    text = "".join(page.get_text() for page in doc)

print(f"Extracted {len(text)} characters from PDF.")

Extracted 20640 characters from PDF.


## Split Text into Chunks

In [10]:
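chunk_size = 500  # maximum characters per chunk
# textwrap.wrap breaks at whitespace and replaces newlines with spaces
chunks = wrap(text, chunk_size)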
print(f"Document split into {len(chunks)} chunks.")

Document split into 42 chunks.
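
Because `wrap` cuts the text at fixed boundaries, a sentence can end up split across two chunks. One common variation is to let consecutive chunks overlap so that text near a boundary appears in both. The sketch below illustrates that idea with a simple character window; `chunk_with_overlap` and its `overlap` parameter are hypothetical names, not part of this notebook, and unlike `wrap` this version may cut words in the middle.

```python
def chunk_with_overlap(text, chunk_size=500, overlap=50):
    """Split text into character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last window always adds new characters
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


# Small illustrative input so the overlap is easy to see
demo_chunks = chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2)
print(demo_chunks)
```

Each chunk here repeats the last two characters of the previous one. Whether overlap helps retrieval on this PDF is untested; it is an option to try, not a measured improvement.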


## Embeddings

In [11]:
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode all chunks in one batched call
embeddings = embedding_model.encode(chunks, convert_to_numpy=True).astype("float32")

print(f"Generated embeddings with shape: {embeddings.shape}")

Generated embeddings with shape: (42, 384)


## Store Embeddings in FAISS

In [12]:
embedding_dim = embeddings.shape[1]
# IndexFlatL2 performs exact (brute-force) search using L2 distance
index = faiss.IndexFlatL2(embedding_dim)
index.add(embeddings)

print("FAISS index created with", index.ntotal, "chunks.")

FAISS index created with 42 chunks.
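
For intuition, `IndexFlatL2` is an exact index: it compares the query against every stored vector and reports the smallest squared Euclidean distances. The dependency-free sketch below mirrors that computation on toy vectors. `l2_search` is a hypothetical helper written for illustration, not a FAISS function, and it ignores the batching and optimizations that FAISS provides.

```python
def l2_search(vectors, query, top_k=3):
    """Return (distances, indices) of the top_k vectors nearest to query."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    top = scored[:top_k]
    return [d for d, _ in top], [i for _, i in top]


# Toy 2-D vectors standing in for the 384-dimensional embeddings
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
toy_distances, toy_indices = l2_search(toy_vectors, [1.0, 1.0], top_k=2)
print(toy_distances, toy_indices)
```

A flat index scales linearly with the number of stored vectors, which is fine for the 42 chunks here.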


## Retrieval Function

In [13]:
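def search(query, top_k=3):
    """Return up to top_k (chunk, distance) pairs, nearest first."""
    query_emb = embedding_model.encode(query)
    query_emb = np.array([query_emb], dtype="float32")
    distances, indices = index.search(query_emb, top_k)
    # FAISS pads with -1 when fewer than top_k vectors are available
    return [(chunks[i], distances[0][pos]) for pos, i in enumerate(indices[0]) if i != -1]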
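
Before passing retrieved chunks to the model, it can help to look at what `search` returns, since a lower distance means a closer match. The sketch below formats `(chunk, distance)` pairs for quick inspection. `format_results` and the sample data are hypothetical and only illustrate the shape of the output; the distances are made up, not results from this PDF.

```python
def format_results(results, preview_chars=80):
    """Render (chunk, distance) pairs as one line per hit, nearest first."""
    lines = []
    for rank, (chunk, distance) in enumerate(results, start=1):
        # Collapse whitespace and truncate so each hit fits on one line
        preview = " ".join(chunk.split())[:preview_chars]
        lines.append(f"{rank}. [distance={distance:.3f}] {preview}")
    return "\n".join(lines)


# Hand-written stand-in for the output of search(); values are illustrative only
sample_results = [
    ("The gradient of a scalar field points toward steepest increase.", 0.42),
    ("Divergence measures the net outflow of a vector field.", 0.97),
]
report = format_results(sample_results, preview_chars=40)
print(report)
```

In the notebook, the same helper could be called as `print(format_results(search("What is a gradient?")))`.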

## Generate Answers

In [22]:
client = InferenceClient(api_key=hf_api_key)

def answer_question(query):
    # Retrieve context
    relevant_chunks = search(query, top_k=3)
    context = "\n".join([chunk for chunk, _ in relevant_chunks])

    # System message holds the rules and retrieved context; the question goes in the user message
    system_prompt = (
        "Use the following pieces of context to answer the user's question. "
        "Follow these rules:\n"
        "1. Use ONLY the context below. If the answer is not contained in it, "
        "respond with \"I don't know.\" and do not make up an answer.\n"
        "2. If you find the answer, state it concisely in at most five sentences.\n\n"
        f"Context:\n{context}"
    )
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ]

    # Call Hugging Face Inference API
    response = client.chat.completions.create(
        messages=messages,
        model="Qwen/Qwen3-4B-Instruct-2507",
        max_tokens=300,
        temperature=0.3
    )

    return response.choices[0].message.content

# Example
print(answer_question("What is a gradient?"))

The gradient of a scalar function is a vector that points in the direction of the steepest increase of the function and indicates the rate of change of the function with respect to position. It is calculated as the vector of partial derivatives of the function with respect to each coordinate. For example, if 𝑓(𝑥,𝑦,𝑧) represents temperature, its gradient points in the direction of fastest temperature increase. In Cartesian coordinates, the gradient is expressed as ∇𝑓 = (∂𝑓/∂𝑥)î + (∂𝑓/∂𝑦)ĵ + (∂𝑓/∂𝑧)𝑘̂. The gradient is fundamental in vector calculus and applications such as electromagnetism and heat transfer.
