### Step 1: Import Packages

**What this step does:**

-   Installs all Python libraries required to build an AI agent with LangChain
-   Imports the core modules for agents, tools, LLMs, and the Gradio interface

**Key definitions:**

-   **LangChain**: Framework for building applications powered by language models; provides abstractions for agents, tools, and chains
-   **langchain-openai**: LangChain's integration with OpenAI's GPT models
-   **Gradio**: Library for quickly creating web UIs to demo ML models
-   **ChromaDB**: Vector database for semantic search (used in some RAG patterns)
-   **tiktoken**: OpenAI's tokenizer library for counting tokens
-   **pypdf / pdf2image / Pillow**: Libraries for parsing and rendering PDF documents

**Guardrails & best practices:**

-   Pin library versions to avoid breaking changes mid-course
-   Import only what you need to keep namespaces clean
-   Use `google.colab.userdata` (Colab Secrets) to store API keys---never hardcode them
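
As a minimal sketch of the last guardrail, the helper below reads the key from Colab Secrets when that module is available and otherwise falls back to an environment variable. The name `load_openai_key`, the `env` parameter, and the fallback order are assumptions made for illustration; the notebook itself only calls `userdata.get(...)`.

```python
import os


def load_openai_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from Colab Secrets or the environment, never from source code."""
    env = os.environ if env is None else env
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        key = userdata.get(name)
    except Exception:
        # Outside Colab, or if the secret is missing or not shared with this
        # notebook, fall back to the environment mapping.
        key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to Colab Secrets or your environment.")
    return key
```

Failing loudly when the key is absent tends to be easier to debug than an authentication error raised later by the first API call.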

**Why these imports matter:**

-   `ChatOpenAI` → the LLM that powers your agent
-   `tool` decorator → converts Python functions into agent-callable tools
-   `create_openai_tools_agent` → factory function that wires tools + LLM + prompt
-   `AgentExecutor` → runtime loop that invokes the agent and executes tools
-   `SystemMessage`, `ChatPromptTemplate` → structures for agent instructions

In [1]:
# ─── Cell 1: Install Packages  ─────────────────────────────────
!pip install --upgrade \
  "langchain==0.1.16" \
  "langchain-core==0.1.45" \
  "langchain-openai==0.1.3" \
  "openai>=1.0.0" \
  "gradio>=4.0.0" \
  "chromadb<0.5.0" \
  "sentence-transformers<3.0.0" \
  "pypdf<4.0.0" \
  "tiktoken" \
  "transformers<5.0.0" \
  "python-dotenv<1.1.0" \
  "pdf2image<1.17.0" \
  "pillow<11.0.0"

Collecting openai>=1.0.0
  Using cached openai-2.7.1-py3-none-any.whl.metadata (29 kB)


In [2]:
# ─── Cell 2: Imports & Setup  ─────────────────────────────────
import os
import json
import io
import base64
from dotenv import load_dotenv
from typing import List
from PIL import Image
from pdf2image import convert_from_path
import gradio as gr

from openai import OpenAI as OpenAIClient
from langchain_openai import ChatOpenAI, OpenAI
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

from pypdf import PdfReader

from google.colab import userdata




### Step 2: Define the Tools

**What this step does:**

-   Defines three Python functions and decorates them with `@tool` so the agent can call them
-   Each tool performs a specific task: parsing PDFs, rewriting text, or analyzing visual design

**Key definitions:**

-   **Tool**: A function the agent can invoke autonomously. The `@tool` decorator tells LangChain "this function is available to the agent."
-   **Docstring**: The description inside `"""triple quotes"""`. The LLM reads this to decide *when* to call the tool---make it clear and specific.
-   **JSON-serializable output**: Tools must return simple Python types (dict, str, list, int) that can be converted to JSON---no raw objects or file handles.
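
A quick way to check the JSON-serializable requirement is to pass a tool's return value through `json.dumps`. The helper name `is_json_serializable` and the sample values below are illustrative assumptions, not part of the notebook.

```python
import json


def is_json_serializable(value):
    """Return True if json.dumps can encode the value."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular references.
        return False


good = {"text": "Jane Doe\nEngineer", "job_spec": "Software Engineer"}
bad = {"reader": object()}  # stands in for a raw object such as a file handle

print(is_json_serializable(good))
print(is_json_serializable(bad))
```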

**The three tools:**

1.  **`parse_resume(pdf_path, job_spec)`**
    -   Extracts raw text from a PDF using `pypdf`
    -   Returns a dict with `{"text": "...", "job_spec": "..."}`
    -   **Why the job_spec parameter?** So the agent can thread context through multiple tool calls

1.  **`enhance_resume(text, job_spec)`**
    -   Rewrites résumé text to match the target role using a separate `gpt-4o` call
    -   Uses `ChatOpenAI` (chat model) + `HumanMessage` wrapper
    -   Returns improved text as a string

1.  **`vision_style_analyzer(pdf_path)`**
    -   Converts the first page of the PDF to a high-res image
    -   Sends it to GPT-4 Vision (`gpt-4o`) with a critique prompt
    -   Returns a structured JSON dict: `{"style_score": 7, "template_type": "modern", "positive_points": [...], "improvement_suggestions": [...]}`
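
Because the keys of that dict are only requested in the prompt, it can be worth validating the shape before using the result. The validator below is a sketch under that assumption; `REQUIRED_KEYS` and `validate_style_report` are hypothetical names.

```python
REQUIRED_KEYS = {
    "style_score": int,
    "template_type": str,
    "positive_points": list,
    "improvement_suggestions": list,
}


def validate_style_report(report):
    """Raise ValueError if the report does not match the documented shape."""
    if not isinstance(report, dict):
        raise ValueError("report must be a dict")
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in report:
            raise ValueError(f"missing key: {key}")
        if not isinstance(report[key], expected_type):
            raise ValueError(f"{key} must be {expected_type.__name__}")
    if not 1 <= report["style_score"] <= 10:
        raise ValueError("style_score must be between 1 and 10")
    return report


sample = {
    "style_score": 7,
    "template_type": "modern",
    "positive_points": ["Good use of columns for readability"],
    "improvement_suggestions": ["Make all section headers the same size"],
}
validate_style_report(sample)  # passes silently for a well-formed report
```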

**Guardrails & best practices:**

-   **Clear docstrings**: The agent can't read your mind---if the docstring says "Extract résumé text," the agent knows to call this tool when the user uploads a PDF
-   **Fail fast**: If a PDF path is invalid, `PdfReader` raises an error. Let it bubble up instead of returning empty text, so the failure is reported rather than silently passed to the next tool
-   **Type hints**: `pdf_path: str`, `job_spec: str` help with debugging and IDE autocomplete
-   **JSON mode**: For `vision_style_analyzer`, we set `response_format={"type": "json_object"}` so the model's reply is constrained to syntactically valid JSON (the specific keys are still only requested by the prompt)

**Why split into three tools?**

-   **Separation of concerns**: Parsing, rewriting, and visual analysis are distinct tasks
-   **Reusability**: The agent can call `parse_resume` alone if the user only wants extraction
-   **Observability**: You can debug each tool independently

In [3]:
# ─── Cell 3: Tool 1 (parse_resume) ──────────────────────────────────────────
@tool
def parse_resume(pdf_path: str, job_spec: str) -> dict:
    # The docstring below is what the agent reads, so it is deliberately specific.
    # It says "Extract the raw text of a résumé PDF", which tells the agent this tool
    # is for PDF parsing, not image analysis or web scraping.
    """
    Extract the raw text of a résumé PDF and carry along the job_spec.

    Args:
      pdf_path: Path to the uploaded PDF file
      job_spec: Target job role/description (e.g., "Software Engineer")

    Returns:
      dict with keys 'text' (str) and 'job_spec' (str)
    """
    reader = PdfReader(pdf_path)

    # Extract text from every page and join with newlines; a page with no extractable text contributes ""
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # Returns a JSON-serializable dict
    return {"text": text, "job_spec": job_spec}

In [None]:
# ─── Cell 4: Tool 2 (enhance_resume) ──────────────────────────────────────────
@tool
def enhance_resume(text: str, job_spec: str) -> str:
    """
    Rewrite a résumé text for better clarity, impact, and relevance to the job_spec.

    Args:
      text: Raw résumé text (output from parse_resume)
      job_spec: Target role (e.g., "Product Manager")

    Returns:
      Enhanced résumé text (str)
    """
    # Separate the instruction from the résumé text with a blank line so they don't run together
    prompt = (
        "You are an expert résumé coach. Improve the bullets and wording of "
        f"this résumé to match a {job_spec} role:\n\n{text}"
    )
    chat_llm = ChatOpenAI(temperature=0, model="gpt-4o")
    response = chat_llm.invoke([HumanMessage(content=prompt)])
    return response.content

In [5]:
# ─── Cell 5: Tool 3 (vision_style_analyzer) ──────────────────────────────────────────
@tool
def vision_style_analyzer(pdf_path: str) -> dict:
    """
    Analyze the first page of the resume using GPT-4 Vision.

    Returns a dict with:
      - style_score (int): design score 1 (poor) to 10 (excellent)
      - template_type (str): e.g. 'classic', 'modern', 'creative'
      - positive_points (List[str]): what works visually
      - improvement_suggestions (List[str]): concrete layout/design improvements
    """

    # ============================================================================
    # STEP 1: Convert PDF to Image
    # ============================================================================
    # Why? GPT-4 Vision can't read PDFs directly—it needs images.
    # We convert ONLY the first page because:
    #   - Résumés are judged by their first impression
    #   - Converting multiple pages wastes memory/time
    #   - Vision API charges per image
    pages = convert_from_path(pdf_path, dpi=250, first_page=1, last_page=1)
    img = pages[0].convert("RGB")

    # ============================================================================
    # STEP 2: Resize if Needed (Vision Model Limit)
    # ============================================================================
    # Why? In "high detail" mode the API first scales an image to fit within a
    # 2048 x 2048 px square, so a larger image is downscaled anyway. We do it
    # ourselves to control quality (LANCZOS is high-quality resampling).
    # thumbnail() resizes in place, keeps the aspect ratio, and never upscales.
    img.thumbnail((2048, 2048), Image.LANCZOS)

    # ============================================================================
    # STEP 3: Compress to JPEG
    # ============================================================================
    # Why? Vision APIs accept base64-encoded images. PNG files can be huge
    # (10+ MB for a full résumé page), causing:
    #   - Slow uploads
    #   - API timeouts
    #   - Higher costs (OpenAI charges per token, images consume many tokens)
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=75)
    buf.seek(0)


    # ============================================================================
    # STEP 4: Encode as Base64 Data URI
    # ============================================================================
    # Why? OpenAI's Vision API expects images as:
    #   1. A public URL (e.g., https://example.com/image.jpg), OR
    #   2. A base64 data URI (e.g., data:image/jpeg;base64,/9j/4AAQ...)
    b64_img = base64.b64encode(buf.getvalue()).decode("ascii")
    image_url = f"data:image/jpeg;base64,{b64_img}"

    # ============================================================================
    # STEP 5: Call GPT-4 Vision API
    # ============================================================================
    # We use the native OpenAI client (not LangChain's ChatOpenAI) because:
    #   - LangChain's vision support is less mature
    #   - We need precise control over the "detail" parameter
    client = OpenAIClient()

    # --- Craft the Prompt ---
    # This is CRITICAL. The prompt must:
    #   1. Set clear expectations ("brutally honest")
    #   2. Prevent generic advice ("don't just say 'add white space'")
    #   3. Specify exact JSON structure (so we can parse it reliably)
    prompt_text = (
      "You are a senior hiring manager at a top tech firm with a background in graphic design. "
      "Your critique must be brutally honest and focused on what will get a candidate noticed or rejected based on visual presentation alone. "
      "Analyze the attached résumé image and provide a detailed critique. "
      "Do not give generic advice like 'add more white space' unless the document is genuinely cramped; instead, point to specific sections that need it. "
      "Your response MUST be a single, raw JSON object with the following keys:\n"
      "  'style_score': An integer from 1-10 based on its immediate professional impact.\n"
      "  'template_type': One of 'classic', 'modern', or 'creative'.\n"
      "  'positive_points': An array of strings detailing what is visually effective (e.g., 'Good use of columns for readability').\n"
      "  'improvement_suggestions': An array of actionable, specific strings for visual improvement (e.g., 'The font size for section headers is inconsistent; make all headers 14pt').\n\n"
      "Provide ONLY the raw JSON object and nothing else."
    )

    # --- Build the Messages Array ---
    # OpenAI's vision API uses a nested structure:
    #   messages = [
    #       {"role": "system", "content": "<instructions>"},
    #       {"role": "user", "content": [<text>, <image>, ...]}
    #   ]
    messages = [
        {
            "role": "system",
            "content": "You are an expert design critic providing feedback as a JSON object."
        },
        {
            "role": "user",
            "content": [
                # Part 1: Text prompt
                {"type": "text", "text": prompt_text},

                # Part 2: Image
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url,
                        "detail": "high"
                    }
                }
            ]
        }
    ]

    # --- Make the API Call ---
    # Guardrail: response_format={"type": "json_object"} turns on JSON mode, which
    # constrains the model to emit syntactically valid JSON. The messages must mention
    # "JSON" (ours do), and the keys are still only requested by the prompt, so a
    # missing key is possible even when json.loads() succeeds.
    resp = client.chat.completions.create(
        model="gpt-4o",  # Using the more powerful model for better vision analysis
        messages=messages,
        temperature=0.5, # Giving it slightly more room for nuanced wording
        response_format={"type": "json_object"} # Key guardrail: Forces JSON output
    )

    # STEP 6: Parse the JSON string into a dict and return it
    return json.loads(resp.choices[0].message.content)
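
The resize arithmetic and the data-URI wrapping from Steps 2 and 4 of this cell can be checked without Pillow or the API. This is a sketch that assumes the 2048 px limit applies to the longer side of the image; `fit_within` and `to_data_uri` are hypothetical helper names.

```python
import base64


def fit_within(width, height, limit=2048):
    """Scale (width, height) down to fit a limit x limit square, keeping aspect ratio."""
    scale = min(1.0, limit / max(width, height))  # never upscale
    return round(width * scale), round(height * scale)


def to_data_uri(raw_bytes, mime="image/jpeg"):
    """Wrap raw image bytes in a base64 data URI."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# US Letter (8.5 x 11 in) rendered at 250 dpi is 2125 x 2750 px.
print(fit_within(2125, 2750))          # (1583, 2048)
print(fit_within(800, 600))            # (800, 600): already small enough
# The first three bytes of a JPEG file encode to "/9j/".
print(to_data_uri(b"\xff\xd8\xff"))    # data:image/jpeg;base64,/9j/
```

For a portrait page the height is the binding dimension, which is why a width-only check would leave the image taller than the limit.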

### Step 3: Define the Instruction

**What this step does:**

-   Creates a `SystemMessage` that serves as the agent's instruction manual
-   Defines the agent's role, available tools, and decision-making logic
-   Establishes the workflow: which tools to call and in what order

**Key definitions:**

-   **SystemMessage**: A LangChain message type that sets the agent's identity and behavior (equivalent to `{"role": "system", "content": "..."}` in OpenAI's API)
-   **Agent instructions**: The "rulebook" the LLM reads before every decision---tells it *who it is*, *what tools it has*, and *when to use them*
-   **Tool-calling strategy**: The logic for chaining tools (e.g., "Always call `parse_resume` first, then `enhance_resume`")

**Why this matters:**

-   Without clear instructions, the agent has only the tool docstrings to go on and may pick the wrong tool or the wrong order
-   The system prompt is where you encode domain knowledge (e.g., "résumés must be parsed before enhancement")
-   **The LLM cannot read your mind**---if you want the agent to call tools in a specific order, you must write it explicitly

**Anatomy of this system prompt:**

1.  **Identity**: "You are the Résumé Enhancement Agent"\
    → Sets the agent's persona and scope

1.  **Tool catalog**: Lists each tool with signature and description\
    → Reinforces what the docstrings already say (redundancy is good---LLMs need repetition)

1.  **Workflow rules**:
    -   "Always call `parse_resume` first"
    -   "Use `enhance_resume` on the parsed text"
    -   "Use `vision_style_analyzer` when the user asks for style critique"\
        → Explicit conditionals guide the agent's decision tree
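
One way to keep the tool catalog in step with the actual function signatures is to generate it from the functions themselves. The sketch below uses plain, undecorated stand-ins for two of the notebook's tools; `build_tool_catalog` is a hypothetical helper, and the notebook writes its catalog by hand.

```python
import inspect


def build_tool_catalog(functions):
    """Render a numbered catalog: signature plus first docstring line per function."""
    lines = []
    for index, fn in enumerate(functions, start=1):
        signature = inspect.signature(fn)
        doc = inspect.getdoc(fn) or ""
        summary = doc.splitlines()[0] if doc else "(no description)"
        lines.append(f"{index}. {fn.__name__}{signature}")
        lines.append(f"   - {summary}")
    return "\n".join(lines)


# Plain stand-ins for the notebook's tools (no @tool decorator, bodies omitted).
def parse_resume(pdf_path: str, job_spec: str) -> dict:
    """Extract the raw text of a résumé PDF and carry along the job_spec."""


def vision_style_analyzer(pdf_path: str) -> dict:
    """Analyze the first page of the resume using GPT-4 Vision."""


print(build_tool_catalog([parse_resume, vision_style_analyzer]))
```

For functions already wrapped by `@tool`, the resulting objects expose `name` and `description` attributes that could feed the same kind of catalog.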

**Guardrails & best practices:**

-   **Be explicit**: Don't assume the agent will infer the workflow---write it out step-by-step
-   **Use conditionals**: "If the user asks X, do Y" structures prevent tool misuse
-   **Redundancy is good**: Even though tools have docstrings, restating their purpose in the system message reinforces correct behavior
-   **Keep it concise**: Long prompts dilute focus; this one is about 100 words, enough for clarity without overwhelming the context window

**Common pitfall:**

-   **Vague instructions** like "Help the user improve their résumé" → the agent won't know whether to parse, enhance, or analyze first
-   **Fix:** Write explicit steps: "First parse, then enhance, then format the output"


In [6]:
# ─── Cell 6: Define the Instruction/System Prompt  ────────
SYSTEM_MESSAGE = SystemMessage(
    content="""
    You are the Résumé Enhancement Agent. You only “communicate” by calling one of these three functions:

    1. parse_resume(pdf_path: str, job_spec: str) → {text, job_spec}
       • Always call this first when enhancing content, to extract the raw text and carry the spec along.

    2. enhance_resume(text: str, job_spec: str) → enhanced text (str)
       • Use this on the parsed text to rewrite/improve the résumé.

    3. vision_style_analyzer(pdf_path: str) → {style_score, template_type, positive_points, improvement_suggestions}
       • Use this when the user asks for a style critique or template feedback.

    Workflow rules:
    If the user asks to critique the visual design or template of the résumé, call vision_style_analyzer.
    If the user asks to enhance content, follow the chain: parse_resume → enhance_resume.
"""
)

### Step 4: Initialise the Model

**What this step does:**

-   Sets the OpenAI API key securely using Colab Secrets
-   Assembles the three tools into a list
-   Creates a `ChatOpenAI` instance (the LLM "brain")
-   Builds a `ChatPromptTemplate` that combines system instructions, user input, and the agent's scratchpad (its running record of tool calls and results)

**Key definitions:**

-   **API key**: Your secret credential for accessing OpenAI's models---treat it like a password
-   **`ChatOpenAI`**: LangChain's wrapper around OpenAI's chat models (GPT-4, GPT-3.5, etc.)
-   **`temperature=0`**: Controls randomness; 0 = near-deterministic (the same input usually yields the same output), 1 = creative/varied
-   **`ChatPromptTemplate`**: A template that defines the structure of messages sent to the LLM
-   **`MessagesPlaceholder`**: A dynamic slot in the prompt where LangChain inserts the agent's "scratchpad" (history of tool calls and results)
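
To see why a low temperature makes output more repeatable, here is the standard temperature-scaled softmax in plain Python. It is a conceptual illustration with made-up scores, not a description of how the OpenAI API is implemented; `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as the greedy (argmax) limit")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature approaches 0, nearly all probability mass lands on the highest-scoring candidate, so sampling almost always picks the same token.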

In [None]:
# ─── Cell 7: Initialise the Model ──────────────────────────────────────────

# 1. Set OpenAI API key from Colab Secrets
# This sets an environment variable. LangChain's ChatOpenAI automatically looks for OPENAI_API_KEY in os.environ.
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

# 2. Assemble the tools list
# Simple list.
# The order doesn't matter here—the agent decides call order based on the system message, not this list.
# But keep it consistent for readability.
tools = [parse_resume, enhance_resume, vision_style_analyzer]

# 3. Initialize the LLM
# - model='gpt-4o': A GPT-4-class model that supports vision and function calling
# - temperature=0: Keeps sampling randomness to a minimum, so the same input
#   usually gives the same output (the API does not guarantee identical results).
#   Predictable behavior matters in production and keeps this workshop reproducible.
#   If you want creative variation (e.g., multiple résumé rewrites), try temperature=0.7.
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# 4. Build the prompt template
# ChatPromptTemplate: A template that defines the structure of messages sent to the LLM
prompt = ChatPromptTemplate.from_messages([
    SYSTEM_MESSAGE,
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

### Step 5: Initialise the AI Agent

**What this step does:**

-   Wires the LLM, tools, and prompt template into a functional agent
-   Wraps the agent in an `AgentExecutor` that handles the execution loop, error handling, and logging

**Key definitions:**

-   **`create_openai_tools_agent`**: Factory function that binds tools to the LLM and returns an agent runnable
-   **Agent**: The decision-making component---reads the prompt, decides which tool to call, formats the function call
-   **`AgentExecutor`**: The runtime loop that executes the agent iteratively until the task is complete
-   **`verbose=True`**: Enables detailed logging (shows each tool call, reasoning, and result in real-time)

**How these two pieces work together:**

1.  **The agent (created by `create_openai_tools_agent`):**
    -   Takes the `llm`, `tools`, and `prompt` from Step 4
    -   Automatically binds the tools' schemas to the LLM (similar in effect to calling `llm.bind_tools(tools)` yourself)
    -   Returns a "runnable"---a LangChain object that can process one decision cycle
    -   **Does NOT execute tools**---it only generates the decision ("I should call `parse_resume` with these args")

1.  **The AgentExecutor (wraps the agent):**
    -   Runs the agent in a loop:
        1.  Invoke the agent → get a tool call decision
        2.  Execute the tool → get the result
        3.  Append result to scratchpad
        4.  Invoke the agent again → repeat until done
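    -   Surfaces errors: by default an exception raised inside a tool propagates out of `agent_executor.invoke()`; `handle_parsing_errors=True` on the executor, or `handle_tool_error` on a tool that raises `ToolException`, sends the error text back to the agent instead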
    -   Enforces limits (default: max 15 iterations to prevent infinite loops)
    -   Logs everything if `verbose=True`
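
The loop described above can be sketched in plain Python. This is a toy model of the idea, not LangChain's implementation: `run_agent_loop`, `fake_decide`, and `toy_tools` are hypothetical stand-ins, and the scripted decision function takes the place of the LLM.

```python
def run_agent_loop(decide, tools, user_input, max_iterations=15):
    """Alternate between a decision function and tool execution until a final answer."""
    scratchpad = []  # (tool_name, args, result) tuples from earlier iterations
    for _ in range(max_iterations):
        decision = decide(user_input, scratchpad)
        if decision["type"] == "final":
            return decision["output"], scratchpad
        tool_fn = tools[decision["tool"]]
        result = tool_fn(**decision["args"])
        scratchpad.append((decision["tool"], decision["args"], result))
    return "Stopped: iteration limit reached.", scratchpad


def fake_decide(user_input, scratchpad):
    """Scripted 'agent': parse first, then enhance, then finish."""
    if not scratchpad:
        return {"type": "tool", "tool": "parse_resume",
                "args": {"pdf_path": "resume.pdf", "job_spec": "PM"}}
    if len(scratchpad) == 1:
        parsed = scratchpad[0][2]  # result of parse_resume
        return {"type": "tool", "tool": "enhance_resume", "args": parsed}
    return {"type": "final", "output": scratchpad[-1][2]}


toy_tools = {
    "parse_resume": lambda pdf_path, job_spec: {"text": "raw text", "job_spec": job_spec},
    "enhance_resume": lambda text, job_spec: f"[{job_spec}] improved: {text}",
}

output, trace = run_agent_loop(fake_decide, toy_tools, "Enhance my résumé for a PM role")
print(output)                       # [PM] improved: raw text
print([step[0] for step in trace])  # ['parse_resume', 'enhance_resume']
```

Note that the decision function never runs a tool itself; only the loop does, which mirrors the split between the agent and the executor.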

**Why separate the agent from the executor?**

-   **Separation of concerns**: The agent handles *reasoning*, the executor handles *orchestration*
-   **Testability**: You can test the agent's decision-making without executing tools
-   **Configurability**: You can swap out executors (e.g., use a custom executor with rate limiting) without changing the agent

**What happens when you run the agent:**


```
result = agent_executor.invoke({"input": "Enhance my résumé for a PM role"})
```

**Behind the scenes:**

1.  **Iteration 1:**
    -   AgentExecutor calls the agent with the prompt template filled in
    -   Agent reads the system message and user input
    -   Agent decides: "I should call `parse_resume`"
    -   AgentExecutor executes `parse_resume(pdf_path, "PM")`
    -   Result appended to scratchpad

1.  **Iteration 2:**
    -   AgentExecutor calls the agent again (with the updated scratchpad)
    -   Agent reads: "I called `parse_resume` and got text. Now I should call `enhance_resume`"
    -   AgentExecutor executes `enhance_resume(text, "PM")`
    -   Result appended to scratchpad

1.  **Iteration 3:**
    -   Agent reads: "I have the enhanced text. Task complete."
    -   Agent generates a final response (not a tool call)
    -   AgentExecutor returns the final output

**Guardrails & best practices:**

-   **Always use `verbose=True` during development** so you can see the agent's reasoning and debug issues
-   **Set `max_iterations`** if you have expensive tools (default is 15, which is usually safe)
-   **Handle errors gracefully**: By default a failing tool raises out of the executor; wrap the `invoke()` call in `try`/`except`, or have the tool return an error message, so the agent or UI can decide whether to retry or abort
-   **The agent is stateless**: Each invocation starts fresh---if you want multi-turn conversations, you must pass the conversation history explicitly
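
One possible way to turn tool failures into data the agent can read is a small wrapper that converts exceptions into a JSON-serializable dict. This trades the fail-fast behavior for a softer failure mode, so treat it as an option rather than the notebook's approach; `report_errors` and `read_text` are hypothetical names.

```python
import functools


def report_errors(fn):
    """Wrap a function so failures come back as an error dict instead of raising."""
    @functools.wraps(fn)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # broad on purpose: any failure becomes data
            return {"error": type(exc).__name__, "detail": str(exc)}
    return wrapper


@report_errors
def read_text(path: str) -> dict:
    """Toy stand-in for a file-reading tool."""
    with open(path, encoding="utf-8") as handle:
        return {"text": handle.read()}


result = read_text("/no/such/file.pdf")
print(result["error"])  # FileNotFoundError
```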

**Common pitfall:**

-   **Calling the agent directly** (`agent.invoke(...)`) instead of the executor → the agent will generate tool calls but not execute them
-   **Fix:** Always invoke the `AgentExecutor`, not the raw agent

In [8]:
# ─── Cell 8: Initialise the AI Agent ──────────────────────────────────────────

# 1. Create the agent using the modern 'create_openai_tools_agent' function
agent = create_openai_tools_agent(llm, tools, prompt)

# 2. Create the AgentExecutor, which will run the agent and tools in a loop
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
)

### Step 6: Initialise the Interface

**What this step does:**

-   Creates a Gradio web interface with file upload, text input, and output display
-   Wires two buttons ("Enhance Résumé" and "Critique Style") to the agent executor
-   Launches a shareable web app for interacting with the AI agent

**Key definitions:**

-   **Gradio (`gr.Blocks`)**: A Python library for building ML/AI web UIs with minimal code
-   **`gr.File`**: File upload component---returns a file object with `.name` (path on disk)
-   **`gr.Textbox`**: Text input/output component
-   **`gr.Button.click()`**: Connects a button to a Python function; when clicked, runs the function with the specified inputs and displays the result in outputs
-   **`.queue().launch(share=True)`**: Starts the web server; `share=True` creates a temporary public URL (the launch log in this notebook reports that the link expires in 1 week)

**Why Gradio for this project:**

-   **Rapid prototyping**: 20 lines of code to go from working agent to working demo
-   **No frontend knowledge needed**: Pure Python---no HTML, CSS, or JavaScript
-   **Shareable**: `share=True` generates a public link you can send to colleagues or clients
-   **Perfect for agents**: Gradio's simplicity lets us focus on the agent logic, not UI design

**How the interface works:**

1.  **User uploads a PDF** → Gradio saves it to `/tmp/gradio/...` and passes the file object to your function
2.  **User enters a job spec** (e.g., "Senior Data Scientist") → stored in the `job_spec` textbox
3.  **User clicks "Enhance Résumé"** → triggers the lambda function:

```
   lambda pdf, spec: run_agent(
       f"Please enhance the résumé at {pdf.name} for a {spec} position."
   )
```

  -   `pdf.name` is the file path (e.g., `/tmp/gradio/abc123/resume.pdf`)
  -   The lambda constructs a natural language query and passes it to `run_agent()`

4.  **`run_agent()` calls the agent executor** → agent parses → enhances → returns result
5.  **Result displayed in the output textbox**

**The two workflows:**

| Button | Query Template | Agent's Action |
| --- | --- | --- |
| **Enhance Résumé** | `"Please enhance the résumé at {path} for a {spec} position."` | Calls `parse_resume` → `enhance_resume` → returns improved text |
| **Critique Style** | `"Please critique the visual design and template of the résumé at {path}."` | Calls `vision_style_analyzer` → returns JSON with score, type, suggestions |
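
The two query templates can be pulled out of the lambdas into small functions that also guard against a missing upload or an empty job spec. This is a sketch: `build_enhance_query` and `build_critique_query` are hypothetical helpers, and the notebook builds the strings inline.

```python
def build_enhance_query(pdf_path, job_spec):
    """Build the natural-language request for the enhancement flow."""
    if not pdf_path:
        raise ValueError("Please upload a résumé PDF first.")
    if not job_spec or not job_spec.strip():
        raise ValueError("Please enter a job specification.")
    return f"Please enhance the résumé at {pdf_path} for a {job_spec.strip()} position."


def build_critique_query(pdf_path):
    """Build the natural-language request for the style-critique flow."""
    if not pdf_path:
        raise ValueError("Please upload a résumé PDF first.")
    return f"Please critique the visual design and template of the résumé at {pdf_path}."


print(build_enhance_query("/tmp/gradio/abc123/resume.pdf", "Senior Data Scientist"))
print(build_critique_query("/tmp/gradio/abc123/resume.pdf"))
```

Inside a Gradio handler, the `ValueError` could be swapped for `gr.Error` so the message appears in the UI rather than only in the logs.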

In [None]:
# ─── Cell 9: Initialise the Interface ──────────────────────────────────────────
with gr.Blocks() as demo:
    gr.Markdown("## AI‐Powered Résumé Enricher + Style Critic")

    with gr.Row():
        # No file_types filter is set, so Gradio accepts any file; the real validation happens in our Python code.
        resume_file = gr.File(label="Upload Résumé (PDF)")
        job_spec    = gr.Textbox(label="Job Specification",
                                 placeholder="e.g. Senior Software Engineer")

    btn_enhance = gr.Button("Enhance Résumé")
    btn_style   = gr.Button("Critique Style")
    output      = gr.Textbox(label="Result", lines=20)

    # Function to handle the agent invocation and extract the output
    def run_agent(query):
        response = agent_executor.invoke({"input": query})
        return response['output']

    # Launch enhancement flow
    btn_enhance.click(
        fn=lambda pdf, spec: run_agent(
            f"Please enhance the résumé at {pdf.name} for a {spec} position."
        ),
        inputs=[resume_file, job_spec],
        outputs=output,
    )

    # Launch style-critique flow
    btn_style.click(
        fn=lambda pdf: run_agent(
            f"Please critique the visual design and template of the résumé at {pdf.name}."
        ),
        inputs=[resume_file],
        outputs=output,
    )

# Launch after the Blocks context has closed, so the layout is fully defined first
demo.queue().launch(share=True, debug=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://9d73bdea619f5ceb2e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
