# RAG Workshop: Building a Retrieval-Augmented QA System with Azure OpenAI

Welcome! This notebook walks through an end-to-end mini Retrieval-Augmented Generation (RAG) pipeline using:
- Azure OpenAI (chat + embeddings)
- LangChain document loaders & text splitter
- FAISS in-memory vector store
- A nutrition PDF as our knowledge base

Workshop Flow:
1. Environment & credentials
2. Acquire source document (PDF)
3. Load & split into semantic chunks
4. Embed chunks + build vector index
5. Inspect an embedding (intuition)
6. Perform retrieval + grounded chat completion
7. Discuss next steps & enhancements

Learning Goals:
- Understand why chunking matters
- See how embeddings enable similarity search
- Learn prompt grounding basics
- Know where to optimize / productionize

### 1. Environment & Configuration
This cell:
- Looks for a `.env` file one directory above the notebook's working directory (the project root when the notebook runs from `src/`)
- Loads required Azure OpenAI + embedding deployment variables
- Verifies presence of critical values before continuing

Why it matters:
Without correct credentials, subsequent API calls (embeddings + chat completions) will fail. Early validation shortens debug time.

Checklist (before running):
- `.env` exists one level up from `src/` (i.e. in the project root)
- Contains: `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_VERSION`, `AZURE_OPENAI_MODEL`, `AZURE_EMBEDDING_DEPLOYMENT` (embedding deployment used in Section 4) and `AZURE_DEPLOYMENT` (chat deployment used in Section 6)

If any variables are reported missing, add them to `.env` and re-run this cell.


In [None]:
import os
from dotenv import load_dotenv


env_path = os.path.join('..', '.env')  # relative to the current working directory

# 1. Loading the variables
if os.path.exists(env_path):
    load_dotenv(dotenv_path=env_path)
    print(f"✅ Loaded configuration from: {os.path.abspath(env_path)}")
else:
    print(f"❌ Error: .env file not found at {os.path.abspath(env_path)}")

# 2. Verifying Credentials
required_vars = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_MODEL",
    "AZURE_EMBEDDING_DEPLOYMENT",  # embedding deployment (Section 4)
    "AZURE_DEPLOYMENT",            # chat deployment (Section 6)
]

missing = [var for var in required_vars if not os.getenv(var)]
if missing:
    print(f"❌ Missing environment variables: {missing}")
else:
    key = os.getenv("AZURE_OPENAI_API_KEY")
    print(f"✅ Azure Configured. Key: {key[:5]}...******")
    print(f"✅ Azure OpenAI Model: {os.getenv('AZURE_OPENAI_MODEL')}")
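The check above only prints a warning, so the notebook keeps going and a missing value surfaces later as a less obvious API error. A stricter variant could stop right here. Below is a minimal sketch; `require_env` is a hypothetical helper name, and the optional `env` mapping exists only so the check can be tried without a real `.env` file.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def require_env(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return {name: value} for every requested variable, or raise if any is missing/empty."""
    source = os.environ if env is None else env
    names = list(names)  # allow any iterable, including generators
    missing = [name for name in names if not source.get(name)]
    if missing:
        # Fail fast: stop the notebook here instead of in a later API call.
        raise RuntimeError(f"Missing environment variables: {missing}")
    return {name: source[name] for name in names}
```

In the notebook, calling `config = require_env(required_vars)` after `load_dotenv(...)` would halt execution at this cell if anything is missing, instead of printing and continuing.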

### 2. Source Document Acquisition (PDF Download)
Goal:
Fetch a public PDF once and store it locally so downstream steps operate on a stable artifact instead of re-hitting the network.

Key Points:
- Checks if the file already exists to avoid redundant downloads.
- Saves into `../pdf/`, relative to the notebook's working directory.

Why Local Storage?
Re-downloading a remote document on every run adds latency and a failure point (network errors, moved URLs). A local copy also guarantees that every participant works from the same file.

Try:
Delete the file and re-run to see the download logic.


In [None]:
import requests
import os 

# 1. Configuration
pdf_url = "https://globalwellnessinstitute.org/wp-content/uploads/2023/12/NUTRITION_4_HEALTH_SPAN_GWI_final_202301210_hi-res.pdf"

output_folder = "../pdf"

file_path = os.path.join(output_folder, "nutrition_healthspan.pdf")

# 2. Download if not exists
os.makedirs(output_folder, exist_ok=True)

if not os.path.exists(file_path):
    print(f"Downloading PDF from {pdf_url}...")
    headers = {'User-Agent': 'Mozilla/5.0'}  # some servers reject requests without a browser-like UA
    response = requests.get(pdf_url, headers=headers, timeout=60)  # avoid hanging forever

    if response.status_code == 200:
        with open(file_path, 'wb') as f:
            f.write(response.content)
        print("✅ Download complete.")
    else:
        print(f"❌ Failed to download. Status: {response.status_code}")
else:
    print("ℹ️ File already exists locally.")
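The cell above has two weak spots worth knowing about: a server can answer with status 200 but an HTML page (for example a bot check) instead of the PDF, and an interrupted write can leave a truncated file that the "already exists" check will then reuse on every run. A more defensive sketch, assuming a hypothetical `ensure_local_pdf` helper that accepts any zero-argument `fetch` function returning bytes:

```python
import os
import tempfile
from typing import Callable


def ensure_local_pdf(path: str, fetch: Callable[[], bytes]) -> bool:
    """Fetch and save a PDF only if `path` does not exist yet.

    Returns True if a download happened, False if the cached file was reused.
    """
    if os.path.exists(path):
        return False
    data = fetch()
    # Well-formed PDF files normally begin with the "%PDF-" signature; an HTML
    # error or bot-check page served with status 200 would not.
    if not data.startswith(b"%PDF-"):
        raise ValueError("Fetched content does not look like a PDF")
    folder = os.path.dirname(path) or "."
    os.makedirs(folder, exist_ok=True)
    # Write to a temporary file in the same folder, then rename it into place,
    # so an interrupted write never leaves a truncated file at `path`.
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

In the notebook, `fetch` could be a small function that calls `requests.get(pdf_url, headers=headers, timeout=60)`, then `raise_for_status()`, and returns `.content`; passing the fetcher in also makes the caching logic easy to test without network access.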

### 3. Loading & Chunking the Document
This step:
1. Loads the PDF pages into memory using `PyPDFLoader`.
2. Applies `RecursiveCharacterTextSplitter` to create overlapping chunks.
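The "recursive" in `RecursiveCharacterTextSplitter` refers to how it picks split points: it tries a list of separators in order (by default paragraph breaks `"\n\n"`, then newlines, then spaces, then individual characters) and only falls back to a finer separator when a piece is still longer than `chunk_size`. The sketch below is a simplified, hypothetical re-implementation for intuition only; it ignores `chunk_overlap` and differs from LangChain's code in details, so the pipeline should keep using the real splitter.

```python
from typing import List

# Separator order: paragraphs, then lines, then words, then single characters.
SEPARATORS = ["\n\n", "\n", " ", ""]


def recursive_split(text: str, chunk_size: int, separators: List[str] = SEPARATORS) -> List[str]:
    """Split text into pieces of at most chunk_size characters, preferring coarse separators."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Use the first separator that occurs in the text ("" always matches).
    sep = next(s for s in separators if s == "" or s in text)
    finer = separators[separators.index(sep) + 1:]
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Still too long: flush the current chunk, then recurse with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        # Greedily merge neighboring pieces while they fit in one chunk.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

For example, with `chunk_size=30` a short paragraph stays whole, while a 41-character paragraph falls back to splitting on spaces.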

Why Chunking?
- LLM context windows are limited; we can't pass the entire PDF.
- Smaller semantic units improve retrieval precision.
- Overlap (`chunk_overlap`) repeats the tail of one chunk at the start of the next, so text near a boundary keeps its surrounding context instead of being cut off abruptly.

Parameters:
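- `chunk_size=1000`: Maximum characters per chunk. Tune it for your model and context size: chunks that are too large mix several topics and blur retrieval precision, while chunks that are too small lose surrounding context.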
- `chunk_overlap=200`: Helps maintain context for boundary sentences.
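To see what `chunk_size` and `chunk_overlap` mean numerically, here is a plain sliding window over characters. It is a simplified stand-in, not LangChain's algorithm (which also respects separators), and `sliding_window_chunks` is a hypothetical name: each window starts `chunk_size - chunk_overlap` characters after the previous one, so neighbors share `chunk_overlap` characters.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the notebook's settings (1000 / 200) the step is 800 characters, so a 5,000-character page would yield 6 windows starting at 0, 800, 1600, 2400, 3200 and 4000.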

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

print("Loading PDF... (This reads the file)")
loader = PyPDFLoader(file_path)
docs = loader.load()  # one Document per page
print(f"   Loaded {len(docs)} pages.")

# Splitting Configuration
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # Max characters per chunk (measured with len())
    chunk_overlap=200   # Characters shared between neighboring chunks
)

print("Splitting document into chunks...")
splits = text_splitter.split_documents(docs)

print(f"✅ Created {len(splits)} chunks.")

### 4. Embeddings & Vector Index Construction
Purpose:
Transform textual chunks into high-dimensional vectors so we can perform similarity search (find the most relevant passages for a query).

Components:
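- `AzureOpenAIEmbeddings`: Calls your Azure OpenAI embedding deployment to turn each chunk into a vector.
- `FAISS.from_documents`: Embeds the chunks and stores the vectors in an in-memory FAISS index (a flat, exact-search index by default) for nearest-neighbor lookup.

Key Parameters:
- Model: `text-embedding-3-small` (1536 dimensions by default): a balance of cost and semantic fidelity.
- `chunk_size`: Not the splitter's chunk length; here it is the number of texts sent per embedding request. Larger batches mean fewer calls but can hit request or rate limits; smaller batches mean more round trips.
- `max_retries` / `retry_min_seconds`: Retry, with a wait between attempts, on transient network or API errors such as rate limiting.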
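The batching arithmetic behind `chunk_size=100` is just ceiling division: 250 chunks would need 3 embedding requests (100 + 100 + 50). The helpers below are hypothetical and only illustrate the idea; `AzureOpenAIEmbeddings` batches internally and may also split very long inputs by token count, so the real number of calls can differ. (Python 3.12+ also provides `itertools.batched`, which yields tuples.)

```python
import math
from typing import Iterator, List, Sequence


def batched(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` texts (one request each)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def request_count(n_texts: int, batch_size: int) -> int:
    """Ceiling division: how many requests are needed for n_texts."""
    return math.ceil(n_texts / batch_size)
```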

Why FAISS?
- It is a local, in-memory library and does not require a separate client or server to operate.
- Efficient vector similarity search
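By default, LangChain's `FAISS.from_documents` builds a flat index (`IndexFlatL2`), which does an exact, brute-force comparison of the query against every stored vector. The NumPy sketch below shows that same computation for intuition; `flat_l2_search` is a hypothetical name, and real FAISS adds optimized kernels plus approximate index types (such as IVF or HNSW) for larger collections.

```python
import numpy as np


def flat_l2_search(index_vectors: np.ndarray, query: np.ndarray, k: int):
    """Exact k-nearest-neighbor search: compare the query with every stored vector.

    Returns (indices, squared L2 distances), closest first.
    """
    # Squared Euclidean distance from the query to each row (broadcasting).
    dists = np.sum((index_vectors - query) ** 2, axis=1)
    k = min(k, len(dists))
    # A full sort is fine for a sketch; np.argpartition scales better.
    order = np.argsort(dists)[:k]
    return order, dists[order]
```

`IndexFlatL2` likewise reports squared L2 distances, so lower scores mean closer matches; `vectorstore.similarity_search_with_score(query, k=3)` returns these scores alongside the documents.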

Alternative Stores:
- Chroma, OpenSearch, Pinecone, etc.

After Running:
You have a retrievable knowledge base ready for RAG.


In [None]:
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores import FAISS
import os

# 1. Configuring Embedding Model
embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-3-small",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_deployment=os.getenv("AZURE_EMBEDDING_DEPLOYMENT"),
    chunk_size=100,          # texts per embedding request (batch size), not chunk length
    show_progress_bar=True,  # requires the tqdm package
    max_retries=20,          # retry transient failures such as rate limits
    retry_min_seconds=2      # minimum wait between retries
)

print(f"⚗️  Embedding chunks using: {os.getenv('AZURE_EMBEDDING_DEPLOYMENT')}...")

vectorstore = FAISS.from_documents(documents=splits, embedding=embeddings)

print("✅ Vector Store created successfully.")

### 5. Inspecting a Sample Embedding
Objective:
Demystify what an embedding looks like and confirm expected dimensionality.

What Happens:
- We call `embed_query` on a single word (`"apple"`).
- Print metadata: length (vector size), type, first few values.

Why Inspect?
- Quick sanity check to ensure the embedding model is working.
- Dimension mismatch is a common integration error when swapping models.

Notes:
- Values have no human-readable meaning individually.
- Semantically similar texts tend to produce vectors with higher cosine similarity (they point in similar directions).

Try:
Replace `sample_word` with terms like `fruit`, `nutrition`, `longevity` and compare.
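To compare two embeddings numerically, you can compute their cosine similarity: the dot product divided by the product of their lengths (1.0 means same direction, 0.0 means orthogonal). OpenAI embeddings are normalized to length 1, and for unit vectors the squared L2 distance equals 2 − 2·cos(a, b), so ranking by L2 distance (as the FAISS index in Section 4 does) gives the same order as ranking by cosine similarity. The helper name `cosine_similarity` below is a hypothetical pure-Python sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        # Same failure mode as mixing vectors from models with different dimensions.
        raise ValueError(f"Dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

For example, compare `cosine_similarity(embeddings.embed_query("apple"), embeddings.embed_query("fruit"))` with the same call using `"longevity"`; you would typically expect the first pair to score higher, though the exact numbers depend on the model.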


In [None]:
# Sample embedding
sample_word = "apple"
vector = embeddings.embed_query(sample_word)

print(f"Word: '{sample_word}'")
print(f"Vector Dimensions: {len(vector)}") # 1536 for text-embedding-3-small (default size)
print(f"Type: {type(vector)}")
print(f"First 10 numbers: {vector[:10]}")
print("...")

### 6. Retrieval-Augmented Generation (RAG) Query Flow

Overview:
1. Accept a user question interactively.
2. Perform vector similarity search (`k=3`) to fetch top relevant chunks.
3. Build a grounded prompt: strict system instructions + context + user question.
4. Call the Azure OpenAI chat completions API using the deployment named in `AZURE_DEPLOYMENT`.
5. Display the model's answer.

Why k=3?
A small k keeps the prompt short while still supplying several candidate passages. With `chunk_size=1000`, k=3 adds roughly 3,000 characters of context. Tune k together with the chunk size and the model's context window.
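Because k, chunk size and context window trade off against each other, one defensive pattern is to cap the retrieved context at a budget before building the messages. The helper below is a hypothetical sketch: it keeps chunks in retrieval order (best first) until a character budget is reached, then assembles a system/user message pair similar to the one in the next cell. Characters are only a proxy for tokens; a common rule of thumb for English text is roughly 4 characters per token.

```python
from typing import Dict, List, Sequence


def build_messages(system_prompt: str, chunks: Sequence[str], question: str,
                   max_context_chars: int = 4000) -> List[Dict[str, str]]:
    """Build system/user chat messages, keeping retrieved chunks within a character budget.

    `chunks` is assumed to be ordered best-first, as returned by similarity_search.
    """
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        # Count the "\n\n" separator that joining will add between chunks.
        extra = len(chunk) + (2 if kept else 0)
        if used + extra > max_context_chars:
            break  # stop at the first chunk that would overflow the budget
        kept.append(chunk)
        used += extra
    context = "\n\n".join(kept)
    user_message = f"Context:\n{context}\n\nQuestion:\n{question}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
```

In the notebook it could be used as `messages = build_messages(system_prompt, [d.page_content for d in relevant_docs], user_question)`, with `messages=messages` passed to `client.chat.completions.create`.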


In [None]:
from openai import AzureOpenAI
import os 

# 1. Setup Azure Client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

deployment = os.getenv("AZURE_DEPLOYMENT")

# 2. Define Question
user_question = input("Enter your question: ")

# 3. Retrieve Context (Manually)
print(f"🔍 Searching PDF for: '{user_question}'...")
relevant_docs = vectorstore.similarity_search(user_question, k=3)

# Join the retrieved text into one big string
context_data = "\n\n".join([doc.page_content for doc in relevant_docs])

# 4. Preparing the Prompt
system_prompt = """You are a strict assistant.
Your ONLY task is to answer the user's question based on the provided context below.
- Do NOT use your internal knowledge.
- Do NOT make up facts.
- If the answer is not explicitly written in the context, you MUST say "I don't know".
- Do not try to be helpful by adding outside information.
"""

user_message = f"""
Context:
{context_data}

Question: 
{user_question}
"""

# 5. Call the chat model (deployment name from AZURE_DEPLOYMENT)
print(f"🤖 Asking Azure {os.getenv('AZURE_OPENAI_MODEL')}...")

response = client.chat.completions.create(
    model=deployment,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message}
    ]
)

# 6. Output
print("\n--- Answer ---")
print(response.choices[0].message.content)