# Freestyle Track – PDF Question-Answering Agent (Google × Kaggle 5-Day AI Agents Intensive Capstone)

This project implements a **PDF Question-Answering Agent** powered by Google's Gemini models.

**Track:** Freestyle  
**Agent goal:** Read a PDF and answer questions grounded only in that document.

**Key agent concepts demonstrated:**

1. **LLM-powered agent** (Gemini as the reasoning engine)  
2. **Tools** (custom PDF loading tool)  
3. **Sequential / loop agent** (multi-turn Q&A over the same PDF)  
4. **Sessions & memory** (chat history across questions)  
5. **Observability** (logging key steps in the agent)  
6. **Agent evaluation** (LLM-based scoring of the agent's answers)  


In [2]:
!pip install -q -U google-genai

from google import genai
from google.genai import types
from kaggle_secrets import UserSecretsClient

# Get API key from Kaggle Secrets (Add-ons ➜ Secrets ➜ GOOGLE_API_KEY)
user_secrets = UserSecretsClient()
GOOGLE_API_KEY = user_secrets.get_secret("GOOGLE_API_KEY")

client = genai.Client(api_key=GOOGLE_API_KEY)

# Choose a model – you can adjust based on your access
LLM_MODEL = "gemini-2.5-flash"  # fallback: "gemini-2.0-flash" or "gemini-1.5-flash"

print("Gemini client initialized ✅")


Gemini client initialized ✅


In [3]:
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

logging.info("Logging initialized for the PDF Q&A Agent.")


2025-12-01 05:50:25,784 - INFO - Logging initialized for the PDF Q&A Agent.


In [4]:
from pathlib import Path
from google.genai import types
import logging

# Path to your PDF (use your dataset and filename)
PDF_PATH = "/kaggle/input/document-analysis/MohseenAliHawaldar.pdf"

def load_pdf_tool(pdf_path: str) -> types.Part:
    """
    Custom Tool: Load a PDF from disk and convert it into a Gemini Part.
    This tool allows the agent to read your PDF content.
    """
    logging.info("Loading PDF from: %s", pdf_path)
    pdf_bytes = Path(pdf_path).read_bytes()

    # Convert to Part for Gemini
    pdf_part = types.Part.from_bytes(
        data=pdf_bytes,
        mime_type="application/pdf"
    )

    logging.info("PDF loaded successfully into Gemini Part.")
    return pdf_part


# Load it now
pdf_part = load_pdf_tool(PDF_PATH)
print("PDF loaded as Gemini Part ✔️")


2025-12-01 05:50:25,811 - INFO - Loading PDF from: /kaggle/input/document-analysis/MohseenAliHawaldar.pdf
2025-12-01 05:50:25,823 - INFO - PDF loaded successfully into Gemini Part.


PDF loaded as Gemini Part ✔️


In [5]:
from typing import List, Dict
import logging

def pdf_qa_agent(
    question: str,
    pdf_doc: types.Part,
    chat_history: List[Dict[str, str]] | None = None,
) -> Dict[str, object]:
    """
    PDF Question-Answering Agent.

    Inputs:
      - question: user question in natural language
      - pdf_doc: Gemini Part containing the PDF
      - chat_history: optional list of previous turns: {'role': 'user'/'agent', 'content': str}

    Returns:
      - dict with keys: 'answer' (str), 'updated_history' (list)
    """
    logging.info("Agent received question: %s", question)

    if chat_history is None:
        chat_history = []

    # --- Session & Memory: build compact context from recent history ---
    history_text = ""
    if chat_history:
        logging.info("Building context from chat history with %d messages.", len(chat_history))
        lines = []
        # keep only the last few turns (simple context compaction)
        for turn in chat_history[-6:]:
            role = turn["role"]
            content = turn["content"]
            lines.append(f"{role.upper()}: {content}")
        history_text = "Previous conversation:\n" + "\n".join(lines)

    # --- System instruction: how the agent should behave ---
    system_instruction = (
        "You are a helpful PDF question-answering agent. "
        "You must answer ONLY using information from the attached PDF (the user's resume). "
        "If the answer is not in the document, say you do not know. "
        "Be concise and, when helpful, mention where in the resume you found the info "
        "like 'In the experience section' or 'Under skills'."
    )

    contents = [system_instruction]

    if history_text:
        contents.append(history_text)

    # Attach the PDF and the current question
    contents.append(pdf_doc)
    contents.append(f"Question: {question}")

    logging.info("Calling Gemini model: %s", LLM_MODEL)
    response = client.models.generate_content(
        model=LLM_MODEL,
        contents=contents,
    )

    # response.text can be None (e.g. a blocked or empty response), so fall back to ""
    answer_text = response.text or ""
    logging.info("Model returned an answer of length %d characters.", len(answer_text))

    # --- Update chat history (session memory) ---
    new_history = chat_history + [
        {"role": "user", "content": question},
        {"role": "agent", "content": answer_text},
    ]

    return {
        "answer": answer_text,
        "updated_history": new_history,
    }

print("PDF Q&A Agent defined 🚀")


PDF Q&A Agent defined 🚀


In [6]:
chat_history = []

questions = [
    "Give me a short summary of this resume in 4-5 bullet points.",
    "List all the skills mentioned in the resume.",
    "What work experience does the resume describe?",
]

for idx, q in enumerate(questions, start=1):
    logging.info("Running test question %d", idx)
    result = pdf_qa_agent(
        question=q,
        pdf_doc=pdf_part,
        chat_history=chat_history,
    )
    chat_history = result["updated_history"]

    print(f"\n=== Question {idx} ===")
    print("Q:", q)
    print("\nAgent answer:\n", result["answer"])
    print("=" * 60)


2025-12-01 05:50:25,869 - INFO - Running test question 1
2025-12-01 05:50:25,871 - INFO - Agent received question: Give me a short summary of this resume in 4-5 bullet points.
2025-12-01 05:50:25,872 - INFO - Calling Gemini model: gemini-2.5-flash
2025-12-01 05:50:25,874 - INFO - AFC is enabled with max remote calls: 10.
2025-12-01 05:50:30,431 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-12-01 05:50:30,436 - INFO - Model returned an answer of length 656 characters.
2025-12-01 05:50:30,437 - INFO - Running test question 2
2025-12-01 05:50:30,437 - INFO - Agent received question: List all the skills mentioned in the resume.
2025-12-01 05:50:30,438 - INFO - Building context from chat history with 2 messages.
2025-12-01 05:50:30,439 - INFO - Calling Gemini model: gemini-2.5-flash
2025-12-01 05:50:30,440 - INFO - AFC is enabled with max remote calls: 10.



=== Question 1 ===
Q: Give me a short summary of this resume in 4-5 bullet points.

Agent answer:
 Here's a short summary of the resume:

*   Full Stack Developer with experience in the MERN stack and modern UI frameworks. (In the Professional Summary)
*   Holds a Bachelor of Engineering in Computer Science and Engineering (expected Jun 2026). (Under Education)
*   Completed internships as a Full Stack Developer Intern at Codec Technologies and a Front-End Software Engineering Intern at Skyscanner. (Under Experience)
*   Proficient in languages like JavaScript, Python, Java, and SQL, and frameworks such as React, Node.js, and Express.js. (Under Technical Skills)
*   Achieved over 90 GitHub contributions in the last 3 months. (Under Achievements)


2025-12-01 05:50:32,748 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-12-01 05:50:32,751 - INFO - Model returned an answer of length 346 characters.
2025-12-01 05:50:32,752 - INFO - Running test question 3
2025-12-01 05:50:32,753 - INFO - Agent received question: What work experience does the resume describe?
2025-12-01 05:50:32,754 - INFO - Building context from chat history with 4 messages.
2025-12-01 05:50:32,754 - INFO - Calling Gemini model: gemini-2.5-flash
2025-12-01 05:50:32,755 - INFO - AFC is enabled with max remote calls: 10.



=== Question 2 ===
Q: List all the skills mentioned in the resume.

Agent answer:
 Here are the skills mentioned in the resume:

*   **Languages:** JavaScript (ES6+), Python, Java, SQL (Under Technical Skills)
*   **Frameworks and Libraries:** React, Node.js, Express.js, Redux, Tailwind CSS, Framer Motion (Under Technical Skills)
*   **Database and Tools:** MongoDB, Git, Docker, Postman, Jest, VS Code (Under Technical Skills)


2025-12-01 05:50:35,192 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-12-01 05:50:35,195 - INFO - Model returned an answer of length 767 characters.



=== Question 3 ===
Q: What work experience does the resume describe?

Agent answer:
 The resume describes the following work experience:

*   **Full Stack Developer Intern** at Codec Technologies, Pune, Maharashtra (Virtual) from Jan 2025 to Feb 2025. In this role, the individual engineered web solutions using the MERN Stack, built secure RESTful APIs with JWT Authentication and RBAC, optimized MongoDB queries, and collaborated on debugging issues using MVC architecture. (Under Experience)
*   **Front-End Software Engineering Intern** at Skyscanner, Virtual from Sep 2025 to Oct 2025. Responsibilities included developing a high-traffic travel date-picker component using React, writing automated tests using Jest, enhancing frontend performance with code splitting, and ensuring cross-browser compatibility and WCAG adherence. (Under Experience)


In [7]:
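def evaluate_agent_answer(
    question: str,
    answer: str,
    pdf_doc: types.Part | None = None,
) -> str:
    """
    LLM-based evaluator that rates the agent's answer.
    This satisfies the 'Agent Evaluation' requirement in the capstone.

    Pass pdf_doc so the evaluator can check the answer against the PDF itself;
    without it, the evaluator only sees the question and the answer.
    """
    logging.info("Evaluating agent's answer...")

    eval_prompt = f"""
You are an evaluator for a PDF question-answering agent.

Evaluate the agent's response based on:
1. Accuracy and correctness based on the PDF.
2. Clarity and structure.
3. Whether the answer stays grounded in the document.
4. Whether it directly answers the question.

Give a score from 1 to 10.
Then explain the score in 3–4 lines.

---

Question:
{question}

Agent Answer:
{answer}
"""

    # Attach the PDF (when given) so accuracy and grounding can be checked against it
    contents = [eval_prompt] if pdf_doc is None else [pdf_doc, eval_prompt]

    response = client.models.generate_content(
        model=LLM_MODEL,
        contents=contents,
    )

    # response.text can be None (e.g. a blocked response); return "" in that case
    return response.text or ""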

print("Evaluation function added ✔️")


Evaluation function added ✔️


In [8]:
# Extract the last user question and last agent answer from chat_history
last_question = None
last_answer = None

for turn in reversed(chat_history):
    if turn["role"] == "agent" and last_answer is None:
        last_answer = turn["content"]
    elif turn["role"] == "user" and last_question is None:
        last_question = turn["content"]
    if last_question and last_answer:
        break

# If we have both, evaluate the answer
if last_question and last_answer:
    print("Last Question:\n", last_question)
    print("\nLast Agent Answer:\n", last_answer)
    print("\n------------------ Evaluation ------------------\n")
    eval_result = evaluate_agent_answer(last_question, last_answer)
    print(eval_result)
else:
    print("No Q&A found in chat_history to evaluate.")


2025-12-01 05:50:35,232 - INFO - Evaluating agent's answer...
2025-12-01 05:50:35,233 - INFO - AFC is enabled with max remote calls: 10.


Last Question:
 What work experience does the resume describe?

Last Agent Answer:
 The resume describes the following work experience:

*   **Full Stack Developer Intern** at Codec Technologies, Pune, Maharashtra (Virtual) from Jan 2025 to Feb 2025. In this role, the individual engineered web solutions using the MERN Stack, built secure RESTful APIs with JWT Authentication and RBAC, optimized MongoDB queries, and collaborated on debugging issues using MVC architecture. (Under Experience)
*   **Front-End Software Engineering Intern** at Skyscanner, Virtual from Sep 2025 to Oct 2025. Responsibilities included developing a high-traffic travel date-picker component using React, writing automated tests using Jest, enhancing frontend performance with code splitting, and ensuring cross-browser compatibility and WCAG adherence. (Under Experience)

------------------ Evaluation ------------------



2025-12-01 05:50:40,332 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"

Score: 10

The agent accurately and comprehensively lists the described work experiences from the resume, including roles, companies, dates, and key responsibilities. The answer is exceptionally clear, well-structured with bullet points, and directly addresses the question while remaining entirely grounded in the document's content.


In [9]:
def ask_agent(question: str):
    """
    A helper function to manually ask questions to the agent.
    Keeps memory and logs everything.
    """
    global chat_history
    logging.info("Manual question asked: %s", question)

    result = pdf_qa_agent(
        question=question,
        pdf_doc=pdf_part,
        chat_history=chat_history
    )
    chat_history = result["updated_history"]

    print("\nQ:", question)
    print("\nAgent:\n", result["answer"])


In [10]:
ask_agent("Explain my resume as if I am a beginner who doesn’t know technical terms.")


2025-12-01 05:50:43,557 - INFO - Manual question asked: Explain my resume as if I am a beginner who doesn’t know technical terms.
2025-12-01 05:50:43,558 - INFO - Agent received question: Explain my resume as if I am a beginner who doesn’t know technical terms.
2025-12-01 05:50:43,559 - INFO - Building context from chat history with 6 messages.
2025-12-01 05:50:43,560 - INFO - Calling Gemini model: gemini-2.5-flash
2025-12-01 05:50:43,561 - INFO - AFC is enabled with max remote calls: 10.
2025-12-01 05:50:54,398 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-12-01 05:50:54,400 - INFO - Model returned an answer of length 2056 characters.



Q: Explain my resume as if I am a beginner who doesn’t know technical terms.

Agent:
 Here's a summary of the resume explained in simple terms for a beginner:

This resume is for someone who builds complete websites. They are skilled at creating both the parts you see and interact with (like buttons and text) and the hidden systems that make websites run smoothly behind the scenes.

*   **What they're studying (Under Education):** They are currently pursuing a college degree in Computer Science and Engineering, which they expect to complete in June 2026. They have also finished their high school education.
*   **Where they've gained work experience (Under Experience):**
    *   At **Codec Technologies**, they helped build entire websites using a popular set of tools, making them faster to put online. They also set up secure ways to handle user data and made the website's information storage system work quicker.
    *   At **Skyscanner**, they created a popular part of a travel websi

In [11]:
ask_agent("What job roles am I most suitable for based on my resume?")

2025-12-01 05:50:54,407 - INFO - Manual question asked: What job roles am I most suitable for based on my resume?
2025-12-01 05:50:54,411 - INFO - Agent received question: What job roles am I most suitable for based on my resume?
2025-12-01 05:50:54,412 - INFO - Building context from chat history with 8 messages.
2025-12-01 05:50:54,413 - INFO - Calling Gemini model: gemini-2.5-flash
2025-12-01 05:50:54,413 - INFO - AFC is enabled with max remote calls: 10.
2025-12-01 05:50:58,550 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-12-01 05:50:58,553 - INFO - Model returned an answer of length 293 characters.



Q: What job roles am I most suitable for based on my resume?

Agent:
 Based on your resume, you are most suitable for the following job roles:

*   **Full Stack Developer** (As stated in your Professional Summary and experience at Codec Technologies)
*   **Front-End Software Engineer/Developer** (As highlighted in your experience at Skyscanner and Key Projects)


# 📄 Freestyle Track – PDF Question-Answering Agent (AI Agents Intensive Capstone)

This project implements a **PDF Question-Answering Agent** using Google Gemini models.  
The agent reads a PDF (my resume) and answers questions based only on the content of that document.

---

# 🎯 Goal of the Agent
The agent’s goal is to:
- Load and analyze a PDF document.
- Answer questions in natural language.
- Stay grounded strictly in the PDF content.
- Maintain memory across multiple questions.
- Provide clear, structured responses.

---

# 🤖 Key Agent Concepts Demonstrated

## 1. **LLM-Powered Agent**
The core function `pdf_qa_agent()` uses a Gemini model as the reasoning engine.  
The agent understands queries, analyzes the PDF, and generates grounded answers.

---

## 2. **Custom Tool Usage**
I implemented a **custom tool**:

`load_pdf_tool(pdf_path)`

This tool:
- Reads a PDF from the Kaggle input directory.
- Converts it into a Gemini-compatible `Part`.
- Allows the agent to “see” and process the resume.
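
A more defensive variant of this tool could validate the file before handing the bytes to Gemini. The sketch below is one way to do that, assuming a hypothetical helper named `read_pdf_bytes` that is not part of the notebook. It only checks that the file exists and starts with the `%PDF-` signature; wrapping the bytes in a `Part` is left out so the snippet runs without the Gemini SDK.

```python
from pathlib import Path

# PDF files normally begin with this signature
PDF_MAGIC = b"%PDF-"


def read_pdf_bytes(pdf_path: str) -> bytes:
    """Hypothetical helper: read a PDF from disk and sanity-check it."""
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"No PDF found at: {pdf_path}")

    data = path.read_bytes()
    if not data.startswith(PDF_MAGIC):
        raise ValueError(f"File does not look like a PDF: {pdf_path}")
    return data
```

The returned bytes could then be passed to `types.Part.from_bytes(..., mime_type="application/pdf")` exactly as `load_pdf_tool()` does.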

---

## 3. **Sequential Agent / Loop Behavior**
The agent handles multi-step interactions:
- I asked three test questions in sequence.
- The agent answered each one using the PDF.
- The chat history returned by each call was passed into the next one, so memory carried across turns.

This demonstrates a sequential agent workflow.
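
The loop itself can be sketched independently of Gemini. In the snippet below, `run_sequence` and `echo_agent` are hypothetical names that do not appear in the notebook; `echo_agent` is a stand-in that mimics the return shape of `pdf_qa_agent()` (`answer` and `updated_history`) without calling any model.

```python
from typing import Callable, Dict, List

Turn = Dict[str, str]


def echo_agent(question: str, chat_history: List[Turn]) -> Dict[str, object]:
    """Stand-in agent: returns a canned answer plus the extended history."""
    answer = f"(stub answer to: {question})"
    updated = chat_history + [
        {"role": "user", "content": question},
        {"role": "agent", "content": answer},
    ]
    return {"answer": answer, "updated_history": updated}


def run_sequence(
    questions: List[str],
    agent: Callable[..., Dict[str, object]],
) -> List[Turn]:
    """Ask each question in order, feeding the growing history into the next call."""
    history: List[Turn] = []
    for question in questions:
        result = agent(question=question, chat_history=history)
        history = result["updated_history"]
    return history


history = run_sequence(["Summary?", "Skills?", "Experience?"], echo_agent)
print(len(history))  # 3 questions -> 6 stored messages
```

Passing the agent in as a callable keeps the loop testable: the same `run_sequence` could drive a wrapper around the real `pdf_qa_agent()` or a stub like the one above.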

---

## 4. **Sessions & Memory**
The agent stores conversation turns in `chat_history`.  
Memory is compacted: only the last six messages (three question/answer pairs) are included in each prompt, which keeps the context short.

This enables:
- Follow-up questions
- Multi-turn reasoning
- Improved contextual grounding
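
The compaction step can be sketched as a small pure function. `compact_history` is a hypothetical name; the default window of six messages mirrors the `chat_history[-6:]` slice used inside `pdf_qa_agent()`.

```python
from typing import Dict, List


def compact_history(chat_history: List[Dict[str, str]], max_messages: int = 6) -> str:
    """Render only the most recent messages as a prompt-ready text block."""
    # Guard against max_messages <= 0: a slice like [-0:] would keep everything
    if not chat_history or max_messages <= 0:
        return ""
    recent = chat_history[-max_messages:]
    lines = [f"{turn['role'].upper()}: {turn['content']}" for turn in recent]
    return "Previous conversation:\n" + "\n".join(lines)


demo = [{"role": "user", "content": f"q{i}"} for i in range(10)]
print(compact_history(demo, max_messages=2))
```

A fixed message window is the simplest policy. It drops older turns entirely, so a follow-up question that refers to something said more than three exchanges ago would lose that context.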

---

## 5. **Observability (Logging)**
I added Python `logging` to track:
- When the agent receives questions
- When the PDF loads
- When the model is called
- When answers are returned
- When evaluations occur

This provides transparency into the agent’s execution.
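
One possible extension, not present in the notebook, is to log how long each step takes. The decorator `log_duration` below is a hypothetical helper built only on the standard library, and `slow_step` is a placeholder standing in for a model call.

```python
import functools
import logging
import time

logger = logging.getLogger("pdf_qa_agent")


def log_duration(func):
    """Log the wall-clock duration of each call to the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call succeeded or raised
            elapsed = time.perf_counter() - start
            logger.info("%s finished in %.3f s", func.__name__, elapsed)
    return wrapper


@log_duration
def slow_step(x: int) -> int:
    time.sleep(0.01)  # placeholder for a model call
    return x * 2
```

Applied to a function such as `pdf_qa_agent()`, this would add one latency line per call to the existing log stream.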

---

## 6. **Agent Evaluation**
The function `evaluate_agent_answer()`:
- Uses Gemini to evaluate the agent’s response
- Scores the answer from 1 to 10 on accuracy, clarity, grounding in the PDF, and how directly it answers the question
- Provides reasoning for the score

This satisfies the “Agent Evaluation” requirement.
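
Because the evaluator replies in free text, a numeric score has to be extracted before it can be aggregated or compared across questions. A minimal sketch, assuming the reply contains a line like `Score: 10` as in the run above (the model is not guaranteed to follow that format); `parse_score` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches e.g. "Score: 10", "score = 7", "Score 8"; rejects 3+ digit numbers
_SCORE_RE = re.compile(r"score\s*[:=]?\s*(\d{1,2})(?!\d)", re.IGNORECASE)


def parse_score(eval_text: str) -> Optional[int]:
    """Return the first 1-10 score found in the evaluator's reply, else None."""
    match = _SCORE_RE.search(eval_text)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 10 else None
```

Returning `None` instead of raising lets the caller decide whether to retry the evaluation or skip the sample.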

---

# 📊 Results
- The agent successfully analyzed my resume.
- It answered questions about:  
  - Summary  
  - Skills  
  - Work experience  
- The LLM-based evaluation gave the final answer (work experience) a **10/10 score**, citing:
  - High accuracy  
  - Grounded responses  
  - Clear structure  

---

# 🚀 Conclusion

This capstone demonstrates a complete **Freestyle AI Agent** applying key concepts taught in the Google × Kaggle 5-Day AI Agents Intensive Course:

✔️ LLM agent  
✔️ Tools (custom tool)  
✔️ Memory  
✔️ Sequential agent behavior  
✔️ Observability (logging)  
✔️ Agent evaluation  

The result is a functional, extendable PDF Q&A assistant that can be improved with features like RAG, multi-agent workflows, or a user interface.

