# Freestyle Track – PDF Question-Answering Agent (Google × Kaggle 5-Day AI Agents Intensive Capstone)

This project implements a **PDF Question-Answering Agent** powered by Google's Gemini models.

**Track:** Freestyle  
**Agent goal:** Read a PDF and answer questions grounded only in that document.

**Key agent concepts demonstrated:**

1. **LLM-powered agent** (Gemini as the reasoning engine)  
2. **Tools** (custom PDF loading tool)  
3. **Sequential / loop agent** (multi-turn Q&A over the same PDF)  
4. **Sessions & memory** (chat history across questions)  
5. **Observability** (logging key steps in the agent)  
6. **Agent evaluation** (LLM-based scoring of the agent's answers)  


In [49]:
!pip install -q -U google-genai

from google import genai
from google.genai import types
from kaggle_secrets import UserSecretsClient

# Get API key from Kaggle Secrets (Add-ons ➜ Secrets ➜ GOOGLE_API_KEY)
user_secrets = UserSecretsClient()
GOOGLE_API_KEY = user_secrets.get_secret("GOOGLE_API_KEY")

client = genai.Client(api_key=GOOGLE_API_KEY)

# Choose a model – you can adjust based on your access
LLM_MODEL = "gemini-2.5-flash"  # fallback: "gemini-2.0-flash" or "gemini-1.5-flash"

print("Gemini client initialized ✅")


Gemini client initialized ✅


In [50]:
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

logging.info("Logging initialized for the PDF Q&A Agent.")


2025-12-01 06:38:04,759 - INFO - Logging initialized for the PDF Q&A Agent.


In [51]:
from pathlib import Path
from google.genai import types
import logging

# Path to your PDF (use your dataset and filename)
PDF_PATH = "/kaggle/input/dummy-resume/Dummy_resume.pdf"

def load_pdf_tool(pdf_path: str) -> types.Part:
    """
    Custom Tool: Load a PDF from disk and convert it into a Gemini Part.
    This tool allows the agent to read your PDF content.
    """
    logging.info("Loading PDF from: %s", pdf_path)
    pdf_bytes = Path(pdf_path).read_bytes()

    # Convert to Part for Gemini
    pdf_part = types.Part.from_bytes(
        data=pdf_bytes,
        mime_type="application/pdf"
    )

    logging.info("PDF loaded successfully into Gemini Part.")
    return pdf_part


# Load it now
pdf_part = load_pdf_tool(PDF_PATH)
print("PDF loaded as Gemini Part ✔️")


2025-12-01 06:38:04,782 - INFO - Loading PDF from: /kaggle/input/dummy-resume/Dummy_resume.pdf
2025-12-01 06:38:04,786 - INFO - PDF loaded successfully into Gemini Part.


PDF loaded as Gemini Part ✔️


In [52]:
from typing import List, Dict
import logging

def pdf_qa_agent(
    question: str,
    pdf_doc: types.Part,
    chat_history: List[Dict[str, str]] | None = None,
) -> Dict[str, object]:
    """
    PDF Question-Answering Agent.

    Inputs:
      - question: user question in natural language
      - pdf_doc: Gemini Part containing the PDF
      - chat_history: optional list of previous turns: {'role': 'user'/'agent', 'content': str}

    Returns:
      - dict with keys: 'answer' (str), 'updated_history' (list)
    """
    logging.info("Agent received question: %s", question)

    if chat_history is None:
        chat_history = []

    # --- Session & Memory: build compact context from recent history ---
    history_text = ""
    if chat_history:
        logging.info("Building context from chat history with %d messages.", len(chat_history))
        lines = []
        # keep only the last few turns (simple context compaction)
        for turn in chat_history[-6:]:
            role = turn["role"]
            content = turn["content"]
            lines.append(f"{role.upper()}: {content}")
        history_text = "Previous conversation:\n" + "\n".join(lines)

    # --- System instruction: how the agent should behave ---
    system_instruction = (
        "You are a helpful PDF question-answering agent. "
        "You must answer ONLY using information from the attached PDF (the user's resume). "
        "If the answer is not in the document, say you do not know. "
        "Be concise and, when helpful, mention where in the resume you found the info "
        "like 'In the experience section' or 'Under skills'."
    )

    contents = [system_instruction]

    if history_text:
        contents.append(history_text)

    # Attach the PDF and the current question
    contents.append(pdf_doc)
    contents.append(f"Question: {question}")

    logging.info("Calling Gemini model: %s", LLM_MODEL)
    response = client.models.generate_content(
        model=LLM_MODEL,
        contents=contents,
    )

    # response.text can be None when the model returns no text part
    answer_text = response.text or ""
    logging.info("Model returned an answer of length %d characters.", len(answer_text))

    # --- Update chat history (session memory) ---
    new_history = chat_history + [
        {"role": "user", "content": question},
        {"role": "agent", "content": answer_text},
    ]

    return {
        "answer": answer_text,
        "updated_history": new_history,
    }

print("PDF Q&A Agent defined 🚀")


PDF Q&A Agent defined 🚀


In [53]:
chat_history = []

questions = [
    "Give me a short summary of this resume in 4-5 bullet points.",
    "List all the skills mentioned in the resume.",
    "What work experience does the resume describe?",
]

for idx, q in enumerate(questions, start=1):
    logging.info("Running test question %d", idx)
    result = pdf_qa_agent(
        question=q,
        pdf_doc=pdf_part,
        chat_history=chat_history,
    )
    chat_history = result["updated_history"]

    print(f"\n=== Question {idx} ===")
    print("Q:", q)
    print("\nAgent answer:\n", result["answer"])
    print("=" * 60)


2025-12-01 06:38:04,841 - INFO - Running test question 1
2025-12-01 06:38:04,842 - INFO - Agent received question: Give me a short summary of this resume in 4-5 bullet points.
2025-12-01 06:38:04,844 - INFO - Calling Gemini model: gemini-2.5-flash
2025-12-01 06:38:04,845 - INFO - AFC is enabled with max remote calls: 10.
2025-12-01 06:38:10,378 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-12-01 06:38:10,382 - INFO - Model returned an answer of length 981 characters.
2025-12-01 06:38:10,383 - INFO - Running test question 2
2025-12-01 06:38:10,384 - INFO - Agent received question: List all the skills mentioned in the resume.
2025-12-01 06:38:10,385 - INFO - Building context from chat history with 2 messages.
2025-12-01 06:38:10,386 - INFO - Calling Gemini model: gemini-2.5-flash
2025-12-01 06:38:10,387 - INFO - AFC is enabled with max remote calls: 10.



=== Question 1 ===
Q: Give me a short summary of this resume in 4-5 bullet points.

Agent answer:
 Here is a short summary of the resume:

*   John Doe is a Full Stack Developer with experience in JavaScript technologies, frontend, backend, API design, and cloud deployment (from Professional Summary).
*   He holds a Bachelor of Science in Computer Science from the University of California, Berkeley (2020-2024) (from Education).
*   His experience includes internships as a Full Stack Developer at TechNova Solutions and a Frontend Developer at CloudByte Labs, where he built APIs, developed reusable components, and optimized performance (from Experience).
*   He is proficient in languages like JavaScript, Python, and TypeScript, and frameworks/tools such as React, Node.js, MongoDB, Git, and Docker (under Technical Skills).
*   He has earned a Meta Front-End Developer Certificate and a Google Data Analytics Certificate, and notable achievements include 150+ GitHub contributions in 90 days

2025-12-01 06:38:12,680 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-12-01 06:38:12,684 - INFO - Model returned an answer of length 316 characters.
2025-12-01 06:38:12,685 - INFO - Running test question 3
2025-12-01 06:38:12,686 - INFO - Agent received question: What work experience does the resume describe?
2025-12-01 06:38:12,687 - INFO - Building context from chat history with 4 messages.
2025-12-01 06:38:12,687 - INFO - Calling Gemini model: gemini-2.5-flash
2025-12-01 06:38:12,688 - INFO - AFC is enabled with max remote calls: 10.



=== Question 2 ===
Q: List all the skills mentioned in the resume.

Agent answer:
 Here are all the skills mentioned in the resume (under Technical Skills):

*   **Languages:** JavaScript (ES6+), Python, TypeScript
*   **Frontend:** React, Next.js, Tailwind CSS
*   **Backend:** Node.js, Express.js, REST APIs
*   **Databases:** MongoDB, PostgreSQL
*   **Tools:** Git, Docker, Postman, Jest, VS Code


2025-12-01 06:38:15,041 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-12-01 06:38:15,044 - INFO - Model returned an answer of length 250 characters.



=== Question 3 ===
Q: What work experience does the resume describe?

Agent answer:
 Here is the work experience described in the resume (from Experience):

*   **TechNova Solutions (Remote):** Full Stack Developer Intern (Jun 2024 - Aug 2024)
*   **CloudByte Labs (San Francisco, CA):** Frontend Developer Intern (Jan 2024 - Apr 2024)


In [54]:
def evaluate_agent_answer(question: str, answer: str) -> str:
    """
    LLM-based evaluator that rates the agent's answer.
    This satisfies the 'Agent Evaluation' requirement in the capstone.
    """
    logging.info("Evaluating agent's answer...")

    eval_prompt = f"""
You are an evaluator for a PDF question-answering agent.

Evaluate the agent's response based on:
1. Accuracy and correctness based on the PDF.
2. Clarity and structure.
3. Whether the answer stays grounded in the document.
4. Whether it directly answers the question.

Give a score from 1 to 10.
Then explain the score in 3–4 lines.

---

Question:
{question}

Agent Answer:
{answer}
"""

    response = client.models.generate_content(
        model=LLM_MODEL,
        contents=[eval_prompt],
    )

    return response.text

print("Evaluation function added ✔️")


Evaluation function added ✔️


In [55]:
# Extract the last user question and last agent answer from chat_history
last_question = None
last_answer = None

for turn in reversed(chat_history):
    if turn["role"] == "agent" and last_answer is None:
        last_answer = turn["content"]
    elif turn["role"] == "user" and last_question is None:
        last_question = turn["content"]
    if last_question and last_answer:
        break

# If we have both, evaluate the answer
if last_question and last_answer:
    print("Last Question:\n", last_question)
    print("\nLast Agent Answer:\n", last_answer)
    print("\n------------------ Evaluation ------------------\n")
    eval_result = evaluate_agent_answer(last_question, last_answer)
    print(eval_result)
else:
    print("No Q&A found in chat_history to evaluate.")


2025-12-01 06:38:15,082 - INFO - Evaluating agent's answer...
2025-12-01 06:38:15,084 - INFO - AFC is enabled with max remote calls: 10.


Last Question:
 What work experience does the resume describe?

Last Agent Answer:
 Here is the work experience described in the resume (from Experience):

*   **TechNova Solutions (Remote):** Full Stack Developer Intern (Jun 2024 - Aug 2024)
*   **CloudByte Labs (San Francisco, CA):** Frontend Developer Intern (Jan 2024 - Apr 2024)

------------------ Evaluation ------------------



2025-12-01 06:38:18,117 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"


Score: 10

The agent accurately extracts and presents all relevant work experience from the resume. The answer is clear, well-structured using bullet points, and directly addresses the question without adding extraneous information. It demonstrates excellent grounding in the document by providing specific details like roles, companies, locations, and dates.


In [56]:
def ask_agent(question: str):
    """
    A helper function to manually ask questions to the agent.
    Keeps memory and logs everything.
    """
    global chat_history
    logging.info("Manual question asked: %s", question)

    result = pdf_qa_agent(
        question=question,
        pdf_doc=pdf_part,
        chat_history=chat_history
    )
    chat_history = result["updated_history"]

    print("\nQ:", question)
    print("\nAgent:\n", result["answer"])


In [57]:
ask_agent("Explain my resume as if I am a beginner who doesn’t know technical terms.")


2025-12-01 06:38:22,636 - INFO - Manual question asked: Explain my resume as if I am a beginner who doesn’t know technical terms.
2025-12-01 06:38:22,637 - INFO - Agent received question: Explain my resume as if I am a beginner who doesn’t know technical terms.
2025-12-01 06:38:22,639 - INFO - Building context from chat history with 6 messages.
2025-12-01 06:38:22,639 - INFO - Calling Gemini model: gemini-2.5-flash
2025-12-01 06:38:22,640 - INFO - AFC is enabled with max remote calls: 10.
2025-12-01 06:38:30,705 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-12-01 06:38:30,708 - INFO - Model returned an answer of length 3837 characters.



Q: Explain my resume as if I am a beginner who doesn’t know technical terms.

Agent:
 Okay, let's break down your resume into simpler terms, imagining you're explaining it to someone who's new to tech:

Here's your resume, explained:

*   **Who you are (Professional Summary):** You're like a master builder for websites and apps. You're good at making both the parts users see (the "frontend") and the hidden machinery that makes everything work behind the scenes (the "backend"). You're good at designing how different parts of an app talk to each other ("API design"), and how to put apps online ("cloud deployment"). You care about making things work well, look good, and be easy for people to use.

*   **Your Schooling (Education):** You went to a big university, UC Berkeley, and got a degree in Computer Science, which is basically the study of how computers and software work. You also have a high school diploma.

*   **Your Work Experience (Experience):**
    *   **TechNova Solutions (

In [58]:
ask_agent("What job roles am I most suitable for based on my resume?")

2025-12-01 06:38:30,714 - INFO - Manual question asked: What job roles am I most suitable for based on my resume?
2025-12-01 06:38:30,716 - INFO - Agent received question: What job roles am I most suitable for based on my resume?
2025-12-01 06:38:30,717 - INFO - Building context from chat history with 8 messages.
2025-12-01 06:38:30,718 - INFO - Calling Gemini model: gemini-2.5-flash
2025-12-01 06:38:30,719 - INFO - AFC is enabled with max remote calls: 10.
2025-12-01 06:38:33,956 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-12-01 06:38:33,959 - INFO - Model returned an answer of length 423 characters.



Q: What job roles am I most suitable for based on my resume?

Agent:
 Based on your resume, you are most suitable for the following job roles:

*   **Full Stack Developer** (mentioned in Professional Summary and Experience, and implied by Key Projects)
*   **Frontend Developer** (mentioned in Professional Summary and Experience, and supported by Certifications and Technical Skills)

Your resume also highlights skills in API design and cloud deployment, which are components of these roles.


# 📄 Freestyle Track – PDF Question-Answering Agent (AI Agents Intensive Capstone)

This project implements a **PDF Question-Answering Agent** using Google Gemini models.  
The agent reads a PDF (my resume) and answers questions based only on the content of that document.

---

# 🎯 Goal of the Agent
The agent’s goal is to:
- Load and analyze a PDF document.
- Answer questions in natural language.
- Stay grounded strictly in the PDF content.
- Maintain memory across multiple questions.
- Provide clear, structured responses.

---

# 🤖 Key Agent Concepts Demonstrated

## 1. **LLM-Powered Agent**
The core function `pdf_qa_agent()` uses a Gemini model as the reasoning engine.  
The agent understands queries, analyzes the PDF, and generates grounded answers.

---

## 2. **Custom Tool Usage**
I implemented a **custom tool**:

`load_pdf_tool(pdf_path)`

This tool:
- Reads a PDF from the Kaggle input directory.
- Converts it into a Gemini-compatible `Part`.
- Allows the agent to “see” and process the resume.
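
As a complement to `load_pdf_tool`, the sketch below shows one way the file could be checked before it is handed to Gemini. The helper name `read_pdf_bytes` is hypothetical and not part of the notebook; it assumes a well-formed PDF starts with the ASCII marker `%PDF-`.

```python
from pathlib import Path


def read_pdf_bytes(pdf_path: str) -> bytes:
    """Hypothetical helper: read a file and check that it looks like a PDF."""
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"No file at: {pdf_path}")

    data = path.read_bytes()
    # Assumption: a well-formed PDF begins with the ASCII marker "%PDF-".
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"Not a PDF (missing %PDF- header): {pdf_path}")
    return data
```

The returned bytes could then be passed to `types.Part.from_bytes(data=..., mime_type="application/pdf")`, as `load_pdf_tool` already does.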

---

## 3. **Sequential Agent / Loop Behavior**
The agent handles multi-step interactions:
- I asked 3 test questions in sequence.
- The agent answered each one using the PDF.
- The agent maintained memory across turns.

This demonstrates a sequential agent workflow.
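
The loop can be sketched without any model call. In the example below, `run_session` and `echo_agent` are hypothetical names used only for illustration; `echo_agent` stands in for `pdf_qa_agent` and returns the same `{'answer', 'updated_history'}` shape.

```python
from typing import Callable, Dict, List

History = List[Dict[str, str]]


def run_session(
    questions: List[str],
    agent_fn: Callable[[str, History], Dict[str, object]],
) -> History:
    """Ask each question in order, passing the growing history to the next call."""
    history: History = []
    for question in questions:
        result = agent_fn(question, history)
        history = result["updated_history"]
    return history


def echo_agent(question: str, history: History) -> Dict[str, object]:
    """Stand-in for the real agent: no model call, just a canned reply."""
    answer = f"(answer {len(history) // 2 + 1})"
    updated = history + [
        {"role": "user", "content": question},
        {"role": "agent", "content": answer},
    ]
    return {"answer": answer, "updated_history": updated}


final_history = run_session(["Q1", "Q2", "Q3"], echo_agent)
print(len(final_history))  # 6 messages: three user turns, three agent turns
```

Because each call returns a new history list instead of mutating the old one, the caller decides when a session starts and ends.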

---

## 4. **Sessions & Memory**
The agent stores conversation turns using `chat_history`.  
The full history is kept, but only the last six messages (three question-answer pairs) are added to each prompt, which keeps the context small.

This enables:
- Follow-up questions
- Multi-turn reasoning
- Improved contextual grounding
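
The compaction step can be isolated as a small function. This is a minimal sketch that mirrors the `chat_history[-6:]` slice in `pdf_qa_agent`; the function name `compact_history` and its `max_messages` parameter are assumptions for illustration.

```python
from typing import Dict, List


def compact_history(chat_history: List[Dict[str, str]], max_messages: int = 6) -> str:
    """Format the most recent messages as a text block for the prompt."""
    # Guard against max_messages <= 0: a [-0:] slice would return the whole list.
    if not chat_history or max_messages <= 0:
        return ""
    recent = chat_history[-max_messages:]
    lines = [f"{turn['role'].upper()}: {turn['content']}" for turn in recent]
    return "Previous conversation:\n" + "\n".join(lines)


history = [{"role": "user", "content": f"q{i}"} for i in range(10)]
print(compact_history(history, max_messages=2))
```

Older messages stay in `chat_history`; they are simply left out of the text sent to the model.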

---

## 5. **Observability (Logging)**
I added Python `logging` to track:
- When the agent receives questions
- When the PDF loads
- When the model is called
- When answers are returned
- When evaluations occur

This provides transparency into the agent’s execution.
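
One possible extension is to log how long each step takes. The sketch below uses only the standard library; the logger name `pdf_qa_agent` and the helper `log_duration` are hypothetical, and `time.sleep` stands in for the real model call.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("pdf_qa_agent")


@contextmanager
def log_duration(step: str):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("%s finished in %.2f s", step, elapsed)


logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
with log_duration("model call"):
    time.sleep(0.1)  # placeholder for client.models.generate_content(...)
```

Wrapping the call in a context manager keeps the timing logic out of the agent function itself.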

---

## 6. **Agent Evaluation**
The function `evaluate_agent_answer()`:
- Uses Gemini to evaluate the agent’s response
- Scores the answer from 1 to 10 on accuracy, clarity, grounding in the PDF, and how directly it answers the question
- Provides reasoning for the score

This satisfies the “Agent Evaluation” requirement.
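
Since the evaluator returns free text, a numeric score has to be extracted before it can be compared across questions. The sketch below is one way to do that, assuming the reply contains a line such as `Score: 10`; `parse_score` is a hypothetical helper, not part of the notebook.

```python
import re
from typing import Optional

# Matches "Score: 10", "score = 7", or "**Score:** 8/10" (case-insensitive).
_SCORE_RE = re.compile(r"score\s*[:=]?\s*\**\s*(\d{1,2})(?!\d)", re.IGNORECASE)


def parse_score(eval_text: str) -> Optional[int]:
    """Return the first 1-10 score found in the evaluator's text, else None."""
    match = _SCORE_RE.search(eval_text or "")
    if not match:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 10 else None


print(parse_score("Score: 10\n\nThe agent accurately extracts..."))  # 10
```

Note that `evaluate_agent_answer()` sends only the question and the answer. If the judge should check the answer against the document, one option would be to add `pdf_part` to its `contents` as well.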

---

# 📊 Results
- The agent successfully analyzed my resume.
- It answered questions covering:  
  - Summary  
  - Skills  
  - Work experience  
- The LLM-based evaluation of the final answer (work experience) returned a **10/10 score**, indicating:
  - High accuracy  
  - Grounded responses  
  - Clear structure  

---

# 🚀 Conclusion

This capstone demonstrates a complete **Freestyle AI Agent** applying key concepts taught in the Google × Kaggle 5-Day AI Agents Intensive Course:

✔️ LLM agent  
✔️ Tools (custom tool)  
✔️ Memory  
✔️ Sequential agent behavior  
✔️ Observability (logging)  
✔️ Agent evaluation  

The result is a functional, extendable PDF Q&A assistant that can be improved with features like RAG, multi-agent workflows, or a user interface.

