# AI Integrations for Developers — Exam

## Instructions

- This notebook is a **template** where you must put your code.  
- You should **fill in all empty variables** and complete the code so that when I download your notebook and click **Run all**, all cells execute correctly and provide the answers.  
- ⚠️ **Do NOT hardcode your API key**. Use Colab environment variables (`%env OPENAI_API_KEY=your_key_here`) and access them in your code.  
- You may **create more cells** if needed. It is recommended that your code is well-structured and split logically into separate cells.  
- The function **`ask_ai(query)`** must be implemented by you. All queries will call this function to check your solution.  
- ✅ **Test cases will be created by me (the instructor).** You are **not allowed to modify, remove, or add to the test cases cell**. Your code must work correctly with the provided test cases.  
- You are **ONLY ALLOWED** to use the following:  
  - **Models:** OpenAI or Anthropic  
  - **Technologies:** LangChain or vanilla Python code  
  - **Vector Store:** Chroma DB

🚨 **Any student who does not follow the template, does not stick to the required format, or whose code does not execute properly will be disqualified.**


### Important

Fill in **all the variables** in the cell.  
❌ **Do NOT put your API key directly in the code.**  
✅ The cell must be set up to take the API key from the Colab environment variables.
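
As a minimal sketch of the rule above, the key can be looked up at runtime instead of being written into the notebook. The helper name `load_api_key` is hypothetical, and the sketch assumes the key was exported with `%env` so that it is visible through `os.environ`.

```python
import os

def load_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`."""
    key = os.environ.get(name, "").strip()
    if not key:
        # Fail early with a readable message instead of a later authentication error.
        raise RuntimeError(f"{name} is not set; run `%env {name}=...` first.")
    return key
```

Failing early here keeps a missing key from surfacing later as an opaque API error in the middle of the pipeline.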


In [1]:
# ================================
# 🔧 RAG Configuration Variables
# ================================

# ⚠️ Do NOT put your API key here directly.
# Make sure you set your API key in Colab like this:
# %env OPENAI_API_KEY=your_key_here

import os
from google.colab import userdata

# API Key (read from the Colab environment variable; falls back to Colab secrets)
API_KEY = os.environ.get("OPENAI_API_KEY") or userdata.get("OPENAI_API_KEY")

# Prompt & Model Settings
PROMPT = ""                       # e.g. "Summarize the document in 3 sentences"
MODEL = "gpt-4o-mini"             # e.g. "gpt-4"
EMBEDDING_MODEL = "text-embedding-3-small"   # e.g. "text-embedding-ada-002"

# Chunking Parameters
CHUNK_SIZE = 300            # e.g. 500
CHUNK_OVERLAP = 50         # e.g. 50
TOP_N_RESULTS = 5         # e.g. 3

# Generation Parameters
OUTPUT_LENGTH = 420          # e.g. 200
TEMPERATURE = 0.2            # e.g. 0.7

### Code Organization

Create more cells if needed and put your code in them.  
It is **recommended** that your code is well-structured, split logically, and kept in separate cells for clarity.


In [2]:
# ================================
# 🔧 Install packages
# ================================

!pip install chromadb pypdf openai tiktoken



Collecting chromadb
  Downloading chromadb-1.1.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.2 kB)
Collecting pypdf
  Downloading pypdf-6.0.0-py3-none-any.whl.metadata (7.1 kB)
Collecting pybase64>=1.4.1 (from chromadb)
  Downloading pybase64-1.4.2-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl.metadata (8.7 kB)
Collecting posthog<6.0.0,>=2.4.0 (from chromadb)
  Downloading posthog-5.4.0-py3-none-any.whl.metadata (5.7 kB)
Collecting onnxruntime>=1.14.1 (from chromadb)
  Downloading onnxruntime-1.22.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.9 kB)
Collecting opentelemetry-exporter-otlp-proto-grpc>=1.2.0 (from chromadb)
  Downloading opentelemetry_exporter_otlp_proto_grpc-1.37.0-py3-none-any.whl.metadata (2.4 kB)
Collecting pypika>=0.48.9 (from chromadb)
  Downloading PyPika-0.48.9.tar.gz (67 kB)

In [3]:
# ================================
# 📂 File Upload (PDF)
# ================================

from google.colab import files

# Upload a PDF file
uploaded = files.upload()

# Get filename
pdf_path = list(uploaded.keys())[0]
print(f"✅ Uploaded file: {pdf_path}")


Saving Prompting_Guide_101.pdf to Prompting_Guide_101.pdf
✅ Uploaded file: Prompting_Guide_101.pdf


In [4]:
# ================================
# 📖 Extract text from PDF
# ================================

from pypdf import PdfReader

# Read PDF
reader = PdfReader(pdf_path)

# Extract text from all pages (fall back to "" if a page yields no text)
extracted_text = "\n".join((page.extract_text() or "") for page in reader.pages) + "\n"

# Save extracted text to a file for verification
text_file = "extracted_text.txt"
with open(text_file, "w", encoding="utf-8") as f:
    f.write(extracted_text)

print(f"✅ Text extracted and saved to {text_file} (length: {len(extracted_text)} chars)")

# Option to download file
# from google.colab import files
# files.download(text_file)


✅ Text extracted and saved to extracted_text.txt (length: 111401 chars)


In [None]:
# ================================
# ✂️ Sentence-aware Chunking
# ================================

import re
import tiktoken

# Load tokenizer for the embedding model
enc = tiktoken.encoding_for_model(EMBEDDING_MODEL)

def num_tokens(text: str) -> int:
    return len(enc.encode(text))

def split_into_sentences(text: str):
    # Simple regex-based sentence splitter
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [s for s in sentences if s]

def chunk_text(text, chunk_size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    sentences = split_into_sentences(text)
    chunks, current_chunk, current_tokens = [], [], 0

    for sent in sentences:
        sent_tokens = num_tokens(sent)

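        # If adding this sentence exceeds chunk size, save current chunk.
        # Skip this while the chunk is still empty, so an oversized first
        # sentence does not produce an empty chunk.
        if current_chunk and current_tokens + sent_tokens > chunk_size: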
            chunks.append(" ".join(current_chunk))
            # Start new chunk with overlap from last chunk
            overlap_tokens = []
            while current_chunk and num_tokens(" ".join(overlap_tokens)) < overlap:
                overlap_tokens.insert(0, current_chunk.pop())
            current_chunk = overlap_tokens.copy()
            current_tokens = num_tokens(" ".join(current_chunk))

        # Add sentence to current chunk
        current_chunk.append(sent)
        current_tokens += sent_tokens

    # Add last chunk
    if current_chunk:
        chunks.append(" ".join(current_chunk))

    return chunks

# Create chunks
chunks = chunk_text(extracted_text, CHUNK_SIZE, CHUNK_OVERLAP)

# Save chunks to file
chunks_file = "chunks_preview.txt"
with open(chunks_file, "w", encoding="utf-8") as f:
    for i, chunk in enumerate(chunks):
        f.write(f"--- Chunk {i+1} ---\n{chunk}\n\n")

# Print statistics
print(f"✅ Total Chunks: {len(chunks)}")
chunk_lengths = [num_tokens(c) for c in chunks]
print(f"📊 Avg tokens per chunk: {sum(chunk_lengths)//len(chunk_lengths)}")
print(f"📊 Min tokens: {min(chunk_lengths)}, Max tokens: {max(chunk_lengths)}")

# Preview first chunks
for i, chunk in enumerate(chunks[:3]):
    print(f"\n🔍 Chunk {i+1} ({num_tokens(chunk)} tokens):\n{chunk[:400]}...\n")

# Allow download
# from google.colab import files
# files.download(chunks_file)


✅ Total Chunks: 105
📊 Avg tokens per chunk: 286
📊 Min tokens: 120, Max tokens: 301

🔍 Chunk 1 (261 tokens):
1
October 2024 edition
A quick-start handbook 
for effective prompts

2
Writing effective prompts 
From the very beginning, Google Workspace was built to allow you to collaborate in real time with other people. Now, you can also collaborate with AI using Gemini for Google Workspace to help boost your productivity and 
creativity without sacrificing privacy or security. The embedded generative AI-p...


🔍 Chunk 2 (293 tokens):
This guide provides you with the foundational skills to write effective prompts when using Gemini for Workspace. You can think of a prompt as a conversation starter with your AI-powered assistant. You might write several 
prompts as the conversation progresses. While the possibilities are virtually endless, you can put consistent 
best practices to work today. The four main areas to consider when ...


🔍 Chunk 3 (300 tokens):
Express complete thoughts in  
f
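
The packing logic can be illustrated without a tokenizer. The sketch below is not the cell above: it counts words as a stand-in for tokens, and it carries forward at most `overlap` units, whereas the cell above carries sentences until the overlap is reached. The name `chunk_sentences` and the sample sentences are illustrative assumptions.

```python
def chunk_sentences(sentences, chunk_size, overlap, count=lambda s: len(s.split())):
    """Greedily pack sentences into chunks that target `chunk_size` units.

    `count` measures one sentence; by default it counts words as a proxy for tokens.
    A chunk can exceed `chunk_size` when carried overlap plus one sentence is larger.
    """
    chunks, current, size = [], [], 0
    for sent in sentences:
        n = count(sent)
        # Flush only a non-empty chunk, so no empty chunk is ever emitted.
        if current and size + n > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward while they fit in the overlap budget.
            carried, carried_size = [], 0
            for prev in reversed(current):
                if carried_size + count(prev) > overlap:
                    break
                carried.insert(0, prev)
                carried_size += count(prev)
            current, size = carried, carried_size
        current.append(sent)
        size += n
    if current:
        chunks.append(" ".join(current))
    return chunks

demo_sentences = ["one two three.", "four five.", "six seven eight nine.", "ten."]
demo_chunks = chunk_sentences(demo_sentences, chunk_size=5, overlap=2)
print(demo_chunks)
```

Swapping `count` for a real token counter such as `num_tokens` would turn the word budget into a token budget.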

In [5]:
# ================================
# ✂️ Semantic / Dynamic Chunking
# ================================

import re
import tiktoken

enc = tiktoken.encoding_for_model(EMBEDDING_MODEL)

def num_tokens(text: str) -> int:
    return len(enc.encode(text))

def semantic_chunk_text(text, chunk_size=200, overlap=50):
    # Split on paragraphs and bullet points
    paragraphs = re.split(r'\n\s*\n|•', text)
    chunks, current_chunk, current_tokens = [], [], 0

    for para in paragraphs:
        para = para.strip()
        if not para:
            continue

        tokens = num_tokens(para)

        # If paragraph alone is too big, split further by sentences
        if tokens > chunk_size:
            sentences = re.split(r'(?<=[.!?])\s+', para)
            for sent in sentences:
                sent_tokens = num_tokens(sent)
                if current_chunk and current_tokens + sent_tokens > chunk_size:
                    chunks.append(" ".join(current_chunk))
                    # Overlap handling
                    overlap_tokens = []
                    while current_chunk and num_tokens(" ".join(overlap_tokens)) < overlap:
                        overlap_tokens.insert(0, current_chunk.pop())
                    current_chunk = overlap_tokens.copy()
                    current_tokens = num_tokens(" ".join(current_chunk))
                current_chunk.append(sent)
                current_tokens += sent_tokens
        else:
            if current_chunk and current_tokens + tokens > chunk_size:
                chunks.append(" ".join(current_chunk))
                overlap_tokens = []
                while current_chunk and num_tokens(" ".join(overlap_tokens)) < overlap:
                    overlap_tokens.insert(0, current_chunk.pop())
                current_chunk = overlap_tokens.copy()
                current_tokens = num_tokens(" ".join(current_chunk))
            current_chunk.append(para)
            current_tokens += tokens

    if current_chunk:
        chunks.append(" ".join(current_chunk))

    return chunks

# Create semantic chunks
semantic_chunks = semantic_chunk_text(extracted_text, chunk_size=200, overlap=50)

print(f"✅ Created {len(semantic_chunks)} semantic chunks")
print(f"📊 Avg tokens: {sum(num_tokens(c) for c in semantic_chunks)//len(semantic_chunks)}")
print(f"🔍 First chunk:\n{semantic_chunks[0][:400]}...")


✅ Created 188 semantic chunks
📊 Avg tokens: 190
🔍 First chunk:
1
October 2024 edition
A quick-start handbook 
for effective prompts 2
Writing effective prompts 
From the very beginning, Google Workspace was built to allow you to collaborate in real time with other people. Now, you can also collaborate with AI using Gemini for Google Workspace to help boost your productivity and 
creativity without sacrificing privacy or security. The embedded generative AI-po...


In [7]:
# ================================
# 📝 Contextualization of Chunks
# ================================

from openai import OpenAI

client = OpenAI(api_key=API_KEY)

def contextualize_chunks(chunks):
    contextualized = []
    for i, chunk in enumerate(chunks):
        # Use GPT to generate a short description
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": "Summarize this chunk in one sentence for context retrieval."},
                {"role": "user", "content": chunk}
            ],
            max_tokens=50,
            temperature=0
        )
        summary = response.choices[0].message.content.strip()
        contextualized.append({
            "id": f"chunk_{i+1}",
            "text": chunk,
            "contextualized_text": f"Context: {summary}\n\n{chunk}"
        })

        if (i+1) % 10 == 0:
            print(f"✅ Contextualized {i+1}/{len(chunks)} chunks")

    return contextualized

# Apply contextualization
contextualized_chunks = contextualize_chunks(semantic_chunks)

print(f"✅ Total contextualized chunks: {len(contextualized_chunks)}")
print(f"🔍 Example:\n{contextualized_chunks[0]['contextualized_text'][:400]}...")


✅ Contextualized 10/188 chunks
✅ Contextualized 20/188 chunks
✅ Contextualized 30/188 chunks
✅ Contextualized 40/188 chunks
✅ Contextualized 50/188 chunks
✅ Contextualized 60/188 chunks
✅ Contextualized 70/188 chunks
✅ Contextualized 80/188 chunks
✅ Contextualized 90/188 chunks
✅ Contextualized 100/188 chunks
✅ Contextualized 110/188 chunks
✅ Contextualized 120/188 chunks
✅ Contextualized 130/188 chunks
✅ Contextualized 140/188 chunks
✅ Contextualized 150/188 chunks
✅ Contextualized 160/188 chunks
✅ Contextualized 170/188 chunks
✅ Contextualized 180/188 chunks
✅ Total contextualized chunks: 188
🔍 Example:
Context: The October 2024 edition introduces a quick-start handbook for using Gemini in Google Workspace, highlighting its generative AI features that enhance collaboration, productivity, and creativity while maintaining privacy and security across various Google applications.

1
October 2024 edition
A quick-start handbook 
for effective prompts 2
Writing effective prompts 
From the v

In [9]:
# ================================
# 🔑 Create Embeddings for Chunks
# ================================

import json
from openai import OpenAI

client = OpenAI(api_key=API_KEY)

embeddings = []

print("⏳ Generating embeddings...")

chunks = [c["contextualized_text"] for c in contextualized_chunks]

for i, chunk in enumerate(chunks):
    response = client.embeddings.create(
        model=EMBEDDING_MODEL,
        input=chunk
    )
    vector = response.data[0].embedding
    embeddings.append({
        "id": f"chunk_{i+1}",
        "text": chunk,
        "embedding": vector
    })

    if (i+1) % 10 == 0 or i == len(chunks)-1:
        print(f"✅ Processed {i+1}/{len(chunks)} chunks")

# Save to JSONL file
embeddings_file = "chunk_embeddings.jsonl"
with open(embeddings_file, "w", encoding="utf-8") as f:
    for e in embeddings:
        f.write(json.dumps(e) + "\n")

print(f"\n✅ Saved embeddings to {embeddings_file} (total {len(embeddings)})")

# Allow download
# from google.colab import files
# files.download(embeddings_file)


⏳ Generating embeddings...
✅ Processed 10/188 chunks
✅ Processed 20/188 chunks
✅ Processed 30/188 chunks
✅ Processed 40/188 chunks
✅ Processed 50/188 chunks
✅ Processed 60/188 chunks
✅ Processed 70/188 chunks
✅ Processed 80/188 chunks
✅ Processed 90/188 chunks
✅ Processed 100/188 chunks
✅ Processed 110/188 chunks
✅ Processed 120/188 chunks
✅ Processed 130/188 chunks
✅ Processed 140/188 chunks
✅ Processed 150/188 chunks
✅ Processed 160/188 chunks
✅ Processed 170/188 chunks
✅ Processed 180/188 chunks
✅ Processed 188/188 chunks

✅ Saved embeddings to chunk_embeddings.jsonl (total 188)
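
The loop above sends one request per chunk. The OpenAI embeddings endpoint also accepts a list of strings as `input`, so chunks could be grouped into fewer requests. The sketch below shows only the batching logic: `batched`, `embed_all`, and `fake_embed_batch` are hypothetical names, and the fake embedder stands in for a callable that would wrap `client.embeddings.create(model=EMBEDDING_MODEL, input=batch)` and return one vector per input.

```python
from typing import Callable, Iterator, List, Sequence

def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 64,
) -> List[List[float]]:
    """Embed `texts` batch by batch, preserving the input order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

# Hypothetical stand-in for a real embedding call: one 1-d "vector" per text.
def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    return [[float(len(text))] for text in batch]

demo_vectors = embed_all(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
print(demo_vectors)
```

The batch size of 64 is an arbitrary illustration, not a documented limit.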


In [11]:
!pip install rank_bm25

Collecting rank_bm25
  Downloading rank_bm25-0.2.2-py3-none-any.whl.metadata (3.2 kB)
Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Installing collected packages: rank_bm25
Successfully installed rank_bm25-0.2.2


In [17]:
# ================================
# 🔍 Hybrid Search (BM25 + Chroma)
# ================================

from rank_bm25 import BM25Okapi

import chromadb
from chromadb.utils import embedding_functions

# Create Chroma client (in-memory for now, can persist later)
chroma_client = chromadb.Client()

# Build BM25 index
bm25 = BM25Okapi([c["text"].split() for c in contextualized_chunks])

# Rebuild Chroma collection with contextualized text
try:
    chroma_client.delete_collection("prompting_guide_hybrid")
except Exception:
    # The collection does not exist on a fresh runtime; nothing to delete.
    pass

collection_hybrid = chroma_client.create_collection(
    name="prompting_guide_hybrid",
    metadata={"hnsw:space": "cosine"}
)

collection_hybrid.add(
    ids=[c["id"] for c in contextualized_chunks],
    documents=[c["contextualized_text"] for c in contextualized_chunks],
    # Reuse the vectors computed in the embeddings cell (same order and ids)
    # instead of calling the embeddings API again for every chunk.
    embeddings=[e["embedding"] for e in embeddings]
)

def hybrid_search(query, n_results=TOP_N_RESULTS, alpha=0.5):
    """
    Hybrid search using BM25 + Chroma embeddings.
    alpha = weight for BM25; embeddings are weighted by (1 - alpha). 0.5 = equal balance.
    """
    # Embedding search
    q_emb = client.embeddings.create(model=EMBEDDING_MODEL, input=query).data[0].embedding
    chroma_results = collection_hybrid.query(query_embeddings=[q_emb], n_results=n_results*2)

    # BM25 search
    bm25_scores = bm25.get_scores(query.split())

    # Normalize BM25 scores (guard against division by zero when no query term matches)
    max_score = max(bm25_scores)
    bm25_norm = [s / max_score if max_score > 0 else 0.0 for s in bm25_scores]

    # Combine scores
    combined = {}
    for i, cid in enumerate(chroma_results["ids"][0]):
        combined[cid] = {"text": chroma_results["documents"][0][i], "score": (1-alpha)*(1-chroma_results["distances"][0][i])}

    for i, c in enumerate(contextualized_chunks):
        if c["id"] not in combined:
            combined[c["id"]] = {"text": c["contextualized_text"], "score": 0}
        combined[c["id"]]["score"] += alpha * bm25_norm[i]

    # Sort by combined score
    ranked = sorted(combined.items(), key=lambda x: x[1]["score"], reverse=True)[:n_results]

    return [{"id": rid, "text": rdata["text"], "score": rdata["score"]} for rid, rdata in ranked]

# 🔍 Test hybrid search
test_query = "How many words should effective prompts average?"
results = hybrid_search(test_query, n_results=3)

print(f"🔎 Query: {test_query}\n")
for r in results:
    print(f"📌 {r['id']} (score={r['score']:.4f})")
    print(r["text"][:400] + "...\n")


🔎 Query: How many words should effective prompts average?

📌 chunk_7 (score=0.8305)
Context: Effective prompts for generative AI typically average around 21 words with relevant context, but users often submit shorter prompts, leading to unpredictable responses that require careful review before use.

Based on what we’ve learned from our users 
so far, the most fruitful prompts average around 21 words with relevant 
context, yet the prompts people try are usually less than nine wo...

📌 chunk_6 (score=0.6214)
Context: To enhance Gemini's output, personalize it with your Google Drive files and use the "Make this a power prompt" feature in Gemini Advanced to refine your prompts, aiming for around 21 words for optimal results.

Personalize Gemini’s output with information from your own files in Google Drive. 6. Make Gemini your prompt editor. When using Gemini Advanced, start your prompts with: “Make this...

📌 chunk_8 (score=0.5313)
Context: The document includes a table of contents outli
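
Both the BM25 index and the query above are tokenized with `str.split()`, which is case-sensitive and keeps punctuation attached to words. A small normalizing tokenizer is one possible alternative. The helper `tokenize` below is a hypothetical sketch, not part of `rank_bm25`, and it would have to be applied to the corpus and to the query alike.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

demo_query = "How many words should effective prompts average?"
print(demo_query.split()[-1])    # last token keeps the question mark
print(tokenize(demo_query)[-1])  # last token is the bare word
```

With plain `split()`, the query token `average?` cannot match the word `average` in a chunk, so that term contributes nothing to the BM25 score.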

In [27]:
# ================================
# 🔍 Hybrid Contextual Search (with score breakdown)
# ================================

def search_chunks(query, n_results=TOP_N_RESULTS, alpha=0.5):
    """
    Hybrid search using BM25 + Chroma embeddings + contextualized chunks.
    Returns both embedding and BM25 contributions in the score breakdown.
    alpha = weight for BM25 (0.5 = balanced between BM25 and embeddings).
    """
    # Step 1: Embed the query
    q_emb = client.embeddings.create(model=EMBEDDING_MODEL, input=query).data[0].embedding

    # Step 2: Chroma search (semantic similarity)
    chroma_results = collection_hybrid.query(
        query_embeddings=[q_emb],
        n_results=n_results * 2  # get more candidates for reranking
    )

    # Step 3: BM25 search (keyword exactness)
    bm25_scores = bm25.get_scores(query.split())
    max_score = max(bm25_scores)
    # Guard against division by zero when no query term matches any chunk
    bm25_norm = [s / max_score if max_score > 0 else 0.0 for s in bm25_scores]

    # Step 4: Combine scores with breakdown
    combined = {}

    # Add Chroma results (embedding similarity)
    for i, cid in enumerate(chroma_results["ids"][0]):
        emb_score = 1 - chroma_results["distances"][0][i]  # distance → similarity
        combined[cid] = {
            "text": chroma_results["documents"][0][i],
            "embedding_score": emb_score,
            "bm25_score": 0.0,
            "score": (1 - alpha) * emb_score
        }

    # Add BM25 scores
    for i, c in enumerate(contextualized_chunks):
        if c["id"] not in combined:
            combined[c["id"]] = {
                "text": c["contextualized_text"],
                "embedding_score": 0.0,
                "bm25_score": 0.0,
                "score": 0.0
            }
        combined[c["id"]]["bm25_score"] = bm25_norm[i]
        combined[c["id"]]["score"] += alpha * bm25_norm[i]

    # Step 5: Sort & return top results
    ranked = sorted(combined.items(), key=lambda x: x[1]["score"], reverse=True)[:n_results]

    return [
        {
            "id": rid,
            "text": rdata["text"],
            "embedding_score": rdata["embedding_score"],
            "bm25_score": rdata["bm25_score"],
            "score": rdata["score"]
        }
        for rid, rdata in ranked
    ]
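
The score fusion can be checked in isolation with plain dictionaries. The sketch below assumes the same weighting as `search_chunks` (BM25 weighted by `alpha`, embeddings by `1 - alpha`). The names `normalize_by_max` and `fuse_scores` and the raw BM25 values are illustrative assumptions.

```python
def normalize_by_max(scores):
    """Scale scores into [0, 1] by the maximum; all zeros if nothing matched."""
    top = max(scores, default=0.0)
    if top <= 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]

def fuse_scores(embedding_sims, bm25_scores, alpha=0.5):
    """Combine per-chunk scores: (1 - alpha) * embedding + alpha * normalized BM25.

    Both arguments map chunk id -> score; a chunk missing from one side counts as 0.
    Returns (chunk_id, fused_score) pairs sorted from best to worst.
    """
    ids = list(bm25_scores)
    bm25_norm = dict(zip(ids, normalize_by_max([bm25_scores[i] for i in ids])))
    fused = {}
    for cid in set(embedding_sims) | set(bm25_scores):
        fused[cid] = (1 - alpha) * embedding_sims.get(cid, 0.0) + alpha * bm25_norm.get(cid, 0.0)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

demo_ranked = fuse_scores(
    {"chunk_7": 0.6610, "chunk_6": 0.5650},                 # cosine similarities
    {"chunk_7": 10.0, "chunk_6": 6.777, "chunk_9": 0.0},    # made-up raw BM25 scores
)
for cid, score in demo_ranked:
    print(f"{cid}: {score:.4f}")
```

With `alpha=0.5`, a chunk with embedding similarity 0.6610 and normalized BM25 score 1.0 gets 0.5 × 0.6610 + 0.5 × 1.0 = 0.8305.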


In [30]:
# 🔍 Quick test (with score breakdown)
test_query = "How many words should effective prompts average?"
results = search_chunks(test_query, n_results=3)

print(f"🔎 Query: {test_query}\n")
for r in results:
    print(f"📌 {r['id']} | emb={r['embedding_score']:.4f} | bm25={r['bm25_score']:.4f} | combined={r['score']:.4f}")
    print(r["text"][:400].replace("\n", " ") + "...\n")


🔎 Query: How many words should effective prompts average?

📌 chunk_7 | emb=0.6609 | bm25=1.0000 | combined=0.8305
Context: Effective prompts for generative AI typically average around 21 words with relevant context, but users often submit shorter prompts, leading to unpredictable responses that require careful review before use.  Based on what we’ve learned from our users  so far, the most fruitful prompts average around 21 words with relevant  context, yet the prompts people try are usually less than nine wo...

📌 chunk_6 | emb=0.5649 | bm25=0.6777 | combined=0.6213
Context: To enhance Gemini's output, personalize it with your Google Drive files and use the "Make this a power prompt" feature in Gemini Advanced to refine your prompts, aiming for around 21 words for optimal results.  Personalize Gemini’s output with information from your own files in Google Drive. 6. Make Gemini your prompt editor. When using Gemini Advanced, start your prompts with: “Make this...

📌 chunk_8 | emb=0.545

In [28]:
# ================================
# 🧪 Debug Hybrid Search (Score Breakdown)
# ================================

debug_query = "How many words should effective prompts average?"

results = search_chunks(debug_query, n_results=10, alpha=0.5)

print(f"🔎 Query: {debug_query}\n")
for r in results:
    txt = r["text"][:300].replace("\n", " ")
    print(f"📌 {r['id']} | emb={r['embedding_score']:.4f} | bm25={r['bm25_score']:.4f} | combined={r['score']:.4f}")
    print(f"   {txt}\n")

🔎 Query: How many words should effective prompts average?

📌 chunk_7 | emb=0.6610 | bm25=1.0000 | combined=0.8305
   Context: Effective prompts for generative AI typically average around 21 words with relevant context, but users often submit shorter prompts, leading to unpredictable responses that require careful review before use.  Based on what we’ve learned from our users  so far, the most fruitful prompts aver

📌 chunk_6 | emb=0.5650 | bm25=0.6777 | combined=0.6214
   Context: To enhance Gemini's output, personalize it with your Google Drive files and use the "Make this a power prompt" feature in Gemini Advanced to refine your prompts, aiming for around 21 words for optimal results.  Personalize Gemini’s output with information from your own files in Google Drive

📌 chunk_173 | emb=0.4816 | bm25=0.7032 | combined=0.5924
   Context: The guide provides tips for writing effective prompts with Gemini for Workspace to enhance productivity and includes examples of prompt iteration for br

## Test Cases (Final Cell)

The final cell must contain your **test cases**.  
When executed, the AI should provide correct answers to the given questions **based on the PDF file**.


### AI Query Function

In this cell, you must implement the function **ask_ai(query)**.  
This function will be the final execution point of your pipeline (RAG / LLM).  


In [31]:
# ================================
# ❓ AI Query Function (with Debug Mode + Score Breakdown)
# ================================

DEBUG = True  # 🔎 Toggle evidence printing

def ask_ai(query: str) -> str:
    """
    Executes the final RAG / LLM pipeline with optional debug mode.
    Input:
        query (str): The question you want to ask the AI.
    Output:
        str: The AI's answer based on the PDF file.
    """
    # Step 1: Retrieve top-N chunks
    retrieved = search_chunks(query, n_results=TOP_N_RESULTS)
    context = "\n\n".join([r["text"] for r in retrieved])

    # Debug mode: show retrieved evidence with score breakdown
    if DEBUG:
        print(f"\n🔎 DEBUG: Retrieved {len(retrieved)} chunks for query → {query}\n")
        for r in retrieved:
            print(f"📌 {r['id']} | emb={r['embedding_score']:.4f} | bm25={r['bm25_score']:.4f} | combined={r['score']:.4f}")
            print(r["text"][:300].replace("\n", " ") + "...\n")

    # Step 2: Build QA prompt
    system_prompt = (
        "You are a helpful assistant answering questions from a company handbook.\n"
        "Only use the provided context to answer.\n"
        "If the answer is not found in the context, say: 'Not in the guide.'\n"
        "Keep answers concise and factual."
    )

    user_prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

    # Step 3: Call GPT model
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        max_tokens=OUTPUT_LENGTH,
        temperature=TEMPERATURE
    )

    # Step 4: Return the model's answer
    return response.choices[0].message.content.strip()
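
The prompt assembly in Step 2 can be separated from the API call so that it is inspectable without a network request. This is a minimal sketch. The name `build_messages` is hypothetical, and skipping blank chunks is an added assumption, not something `ask_ai` does.

```python
def build_messages(system_prompt, chunks, query):
    """Assemble chat messages: instructions first, then retrieved context and the question."""
    # Skip chunks that are empty or whitespace-only so they add no blank context blocks.
    context = "\n\n".join(chunk.strip() for chunk in chunks if chunk.strip())
    user_prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

demo_messages = build_messages(
    "Only use the provided context to answer.",
    ["Prompts average around 21 words.", "  "],
    "How many words should effective prompts average?",
)
print(demo_messages[1]["content"])
```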


### Test Queries

Use this cell to test your function with different queries.  
The answers must be generated correctly based on the PDF file.  


In [32]:
# ================================
# 🔍 Example Queries for Testing
# ================================

queries = [
    "How many words should effective prompts average?",
    "List the four main areas for effective prompts.",
    "What does 'persona' mean in prompt writing?",
    "Name three business roles covered in this guide.",
    "What is Gemini Advanced?"
]

# Call the AI with each query
for q in queries:
    print(f"Q: {q}")
    print(f"A: {ask_ai(q)}\n")


Q: How many words should effective prompts average?

🔎 DEBUG: Retrieved 5 chunks for query → How many words should effective prompts average?

📌 chunk_7 | emb=0.6610 | bm25=1.0000 | combined=0.8305
Context: Effective prompts for generative AI typically average around 21 words with relevant context, but users often submit shorter prompts, leading to unpredictable responses that require careful review before use.  Based on what we’ve learned from our users  so far, the most fruitful prompts aver...

📌 chunk_6 | emb=0.5650 | bm25=0.6777 | combined=0.6214
Context: To enhance Gemini's output, personalize it with your Google Drive files and use the "Make this a power prompt" feature in Gemini Advanced to refine your prompts, aiming for around 21 words for optimal results.  Personalize Gemini’s output with information from your own files in Google Drive...

📌 chunk_8 | emb=0.5458 | bm25=0.5169 | combined=0.5313
Context: The document includes a table of contents outlining various topics relate