<a href="https://colab.research.google.com/github/dipanjanS/mastering-intelligent-agents-langgraph-workshop-dhs2025/blob/main/Module-5-Building-Agentic-RAG-and-Multimodal-Agentic-AI-Systems/M5LC2_Build_a_Multimodal_Multi_Agent_System_for_Invoice_Processing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build a Multimodal Multi-Agent System for **Invoice Processing** with LangGraph

> A real-world example of a **multimodal multi-agent system** that extracts data from PNG/PDF invoices, transforms it into a strict schema, and stores/queries it in a database.

![Architecture Diagram](https://i.imgur.com/gynMdpH.png)

---

## What this notebook does

- Accepts **invoice files** (PNG/PDF).
- Uses a **multimodal LLM** to extract invoice data as **Markdown**.
- Converts Markdown → **structured JSON** with checks on totals.
- **Persists** invoices and **retrieves** pending ones from a simple SQLite DB.
- Orchestrates the above with a **Supervisor Agent** and three specialist agents.

---

## Multi-Agent Architecture (at a glance)

## 1) Supervisor Agent
- **Role:** Central router that decides which specialist agent should act next.
- **Thinks only about routing** (it does not call any tools itself).
- **Routing policy (supports single or multiple invoices):**
  - **Invoice Extraction Agent** → when the latest user input includes one or more **file paths** ending in `.png`/`.pdf` and Markdown has **not** yet been extracted for those files. The extraction agent should call the correct tool **per file** and return one Markdown block per file, clearly separated.
  - **Invoice Transformation Agent** → when **Markdown is present** (for one or many invoices) but **no structured JSON** yet. The agent converts each Markdown block into **Structured JSON** (one object per invoice).
  - **Invoice Management Agent** → when the user wants to **store** structured JSON (e.g., “save/insert”) **or** **retrieve** pending invoices (e.g., “show/list pending”). It will either insert into the DB or return a Markdown table.
  - **FINISH** → when the objective is complete (e.g., after storing or after showing pending invoices) and nothing else is requested.
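The routing policy above is enforced by the supervisor LLM through its prompt, not by code. As a rough mental model, the same decision rules can be written as a deterministic function. This is a sketch under stated assumptions: the `Progress` dataclass and its flags are hypothetical simplifications, whereas the real supervisor infers this information by reading the message history.

```python
from dataclasses import dataclass
from typing import Literal

Route = Literal[
    "invoice_extraction_agent",
    "invoice_transformation_agent",
    "invoice_management_agent",
    "FINISH",
]

@dataclass
class Progress:
    # Hypothetical flags; the real supervisor infers these from the messages.
    has_invoice_paths: bool = False  # user supplied .png/.pdf paths
    has_markdown: bool = False       # extraction agent returned Markdown
    has_json: bool = False           # transformation agent returned JSON
    wants_store: bool = False        # user asked to save/insert
    wants_pending: bool = False      # user asked to show/list pending invoices
    management_done: bool = False    # invoices stored or pending list shown

def next_route(p: Progress) -> Route:
    """Apply the routing rules in priority order."""
    if p.management_done:
        return "FINISH"
    if p.has_invoice_paths and not p.has_markdown:
        return "invoice_extraction_agent"
    if p.has_markdown and not p.has_json:
        return "invoice_transformation_agent"
    if (p.has_json and p.wants_store) or p.wants_pending:
        return "invoice_management_agent"
    return "FINISH"

print(next_route(Progress(has_invoice_paths=True)))  # invoice_extraction_agent
print(next_route(Progress(wants_pending=True)))      # invoice_management_agent
```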

---

## 2) Specialized Agents

### 🧾 Invoice Extraction Agent
- **Goal:** Extract invoice content using a multimodal LLM (Gemini), **without altering it**.
- **Tools:**  
  - `extract_invoice_data_png(image_file_path)` – PNG → Markdown  
  - `extract_invoice_data_pdf(pdf_file_path)` – PDF → Markdown  
- **Output:** Markdown that preserves fields and tables.
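The extraction agent relies on the LLM to pick between the two tools. If you want to validate inputs up front, a small lookup can map each file's extension to the matching tool name and the MIME type that tool sends to Gemini (`image/png` or `application/pdf`). `pick_extractor` and the example paths are hypothetical, not part of the notebook.

```python
from pathlib import Path

# Extension -> (tool name, MIME type the tool passes to types.Part.from_bytes)
EXTRACTORS = {
    ".png": ("extract_invoice_data_png", "image/png"),
    ".pdf": ("extract_invoice_data_pdf", "application/pdf"),
}

def pick_extractor(path: str) -> tuple[str, str]:
    """Return (tool_name, mime_type) for a PNG/PDF invoice path."""
    suffix = Path(path).suffix.lower()  # case-insensitive: ".PDF" also matches
    if suffix not in EXTRACTORS:
        raise ValueError(f"Unsupported invoice file type: {path}")
    return EXTRACTORS[suffix]

# Hypothetical example paths
for p in ["invoices/acme.png", "invoices/globex.PDF"]:
    print(p, "->", pick_extractor(p))
```

Rejecting unsupported files before the agent runs saves an LLM round trip that would otherwise end in a tool error.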

### 🧩 Invoice Transformation Agent
- **Goal:** Convert Markdown → **InvoiceSchema JSON** (with totals checks).
- **Tool:**  
  - `transform_invoice_data(markdown_invoice)` – uses GPT-4o-mini with `with_structured_output` (Pydantic).
- **Output:** Strict JSON (invoice_id, vendor, dates, line items, subtotal, tax_total, grand_total, currency).
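In the notebook, the totals check is delegated to GPT-4o-mini through the prompt ("Make sure all totals are calculated correctly"), and nothing re-verifies the numbers afterwards. A deterministic post-check on the transformed JSON could look like the sketch below. The function name, the 0.01 tolerance, and the assumption that `subtotal + tax_total` should equal `grand_total` (no discounts or shipping lines) are illustrative choices, not part of the original tools.

```python
import json

def check_totals(invoice_json: str, tol: float = 0.01) -> list[str]:
    """Return a list of arithmetic problems found in an InvoiceSchema-style JSON string."""
    inv = json.loads(invoice_json)
    problems = []
    items = inv.get("line_items", [])
    # Each line total should equal quantity * unit_price when all three are present.
    for item in items:
        q, price, total = item.get("quantity"), item.get("unit_price"), item.get("line_total")
        if None not in (q, price, total) and abs(q * price - total) > tol:
            problems.append(f"line '{item.get('description')}': {q} x {price} != {total}")
    # Subtotal should equal the sum of the line totals.
    line_sum = sum(i["line_total"] for i in items if i.get("line_total") is not None)
    subtotal = inv.get("subtotal")
    if subtotal is not None and abs(line_sum - subtotal) > tol:
        problems.append(f"subtotal {subtotal} != sum of lines {line_sum:.2f}")
    # Grand total should equal subtotal + tax (assumes no discounts/shipping).
    tax, grand = inv.get("tax_total"), inv.get("grand_total")
    if None not in (subtotal, tax, grand) and abs(subtotal + tax - grand) > tol:
        problems.append(f"grand_total {grand} != subtotal + tax {subtotal + tax:.2f}")
    return problems

# Hypothetical sample invoice
sample = json.dumps({
    "invoice_id": "INV-001", "vendor": "Acme", "invoice_date": "2025-01-01",
    "line_items": [{"description": "Widget", "quantity": 2, "unit_price": 5.0, "line_total": 10.0}],
    "subtotal": 10.0, "tax_total": 0.8, "grand_total": 10.8,
})
print(check_totals(sample))  # []
```

An empty list means the arithmetic is consistent; a non-empty list could be fed back to the transformation agent or used to block `store_invoice`.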

### 🗄️ Invoice Management Agent
- **Goal:** **Store** or **Retrieve** invoices based on the query intent.
- **Tools:**  
  - `store_invoice(transformed_invoice)` – inserts the JSON string into SQLite (status defaults to *Pending*).  
  - `get_pending_invoices()` – fetches pending invoices and returns a **Markdown table** via LLM formatting.
- **Output:** Storage confirmation **or** a clean Markdown listing of pending invoices.
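`get_pending_invoices` asks GPT-4o-mini to render the database rows as a table, which costs an extra LLM call and leaves room for formatting drift. Because the rows already come from SQL, the table could also be built deterministically. The sketch below is an alternative, not what the notebook does; the function name, the `$` prefix, and the two-decimal formatting are assumptions chosen to mirror the "Total (USD)" column.

```python
def invoices_to_markdown(invoices: list[dict]) -> str:
    """Render pending-invoice dicts (shaped as in get_pending_invoices) as a Markdown table."""
    if not invoices:
        return "No pending invoices found."
    lines = [
        "| Invoice ID | Vendor | Invoice Date | Total (USD) | Status |",
        "|---|---|---|---:|---|",
    ]
    for inv in invoices:
        total = inv.get("grand_total")
        total_str = f"${total:,.2f}" if total is not None else "n/a"
        lines.append(
            f"| {inv['invoice_id']} | {inv['vendor']} | {inv['invoice_date']} | {total_str} | {inv['status']} |"
        )
    return "\n".join(lines)

# Hypothetical row
print(invoices_to_markdown([
    {"invoice_id": "INV-001", "vendor": "Acme", "invoice_date": "2025-01-01",
     "grand_total": 1234.5, "status": "Pending"},
]))
```

The trade-off: the LLM version tolerates messy values gracefully, while the deterministic version is free, fast, and cannot invent or drop rows.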

---

## Tools Layer

- **Extraction (multimodal):** `extract_invoice_data_png`, `extract_invoice_data_pdf`  
- **Transformation (structured):** `transform_invoice_data`  
- **Persistence & Retrieval:** `store_invoice`, `get_pending_invoices`

---

## Data Sources

- **Invoice Files:** PNG/PDF inputs.  
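- **Invoice Processing Database (SQLite):** an `invoices` table that stores each invoice as a JSON `doc` column, exposes `invoice_id`, `vendor`, `invoice_date`, and `grand_total` as virtual generated columns for filtering, and tracks a `current_status` column (default *Pending*).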
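Virtual generated columns are computed from the JSON document with `json_extract` whenever they are read, so the application only ever writes `doc`. The sketch below reproduces a trimmed version of the notebook's table in an in-memory database to show the effect. It assumes the SQLite library bundled with Python is 3.31 or newer (required for generated columns) and has the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE invoices (
  id INTEGER PRIMARY KEY,
  doc JSON,
  vendor TEXT GENERATED ALWAYS AS (json_extract(doc, '$.vendor')) VIRTUAL,
  grand_total REAL GENERATED ALWAYS AS (json_extract(doc, '$.grand_total')) VIRTUAL,
  current_status TEXT DEFAULT 'Pending'
)
""")

# Only the JSON document is written; vendor and grand_total are derived from it.
doc = {"invoice_id": "INV-001", "vendor": "Acme", "grand_total": 10.8}
conn.execute("INSERT INTO invoices(doc) VALUES (json(?))", (json.dumps(doc),))

# Filter on a virtual column exactly as if it were a regular one.
rows = conn.execute(
    "SELECT vendor, grand_total, current_status FROM invoices WHERE vendor = ?", ("Acme",)
).fetchall()
print(rows)  # [('Acme', 10.8, 'Pending')]
conn.close()
```

Because virtual columns are recomputed on every read, an index on a generated column such as `vendor` (SQLite allows this) is what would make filtering genuinely fast on a large table.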

---

## End-to-End Flow

1. **Provide one or more invoice file paths** → Supervisor routes to **Extraction**.  
2. **Markdown available** → Supervisor routes to **Transformation** (creates InvoiceSchema JSON).  
3. **JSON ready / user requests storage or pending list** → Supervisor routes to **Management**.  
4. **Done** → Supervisor returns a concise **final response** (and finishes).
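The supervisor arrives at this order dynamically, with one LLM routing decision per hop. Written out as plain sequential code, the flow for several invoices is roughly the loop below. The three stage functions are hypothetical stand-ins for the extraction, transformation, and storage tools; they return canned strings so the control flow runs on its own.

```python
from typing import Callable

def process_invoices(
    paths: list[str],
    extract: Callable[[str], str],
    transform: Callable[[str], str],
    store: Callable[[str], str],
) -> list[str]:
    """Run extract -> transform -> store for each invoice path, in order."""
    results = []
    for path in paths:
        markdown = extract(path)             # step 1: file -> Markdown
        invoice_json = transform(markdown)   # step 2: Markdown -> InvoiceSchema JSON
        results.append(store(invoice_json))  # step 3: persist and keep the confirmation
    return results

# Hypothetical stand-ins for the real tools.
def fake_extract(path: str) -> str:
    return f"markdown for {path}"

def fake_transform(markdown: str) -> str:
    return markdown.replace("markdown", "json")

def fake_store(invoice_json: str) -> str:
    return f"stored: {invoice_json}"

print(process_invoices(["a.png", "b.pdf"], fake_extract, fake_transform, fake_store))
# ['stored: json for a.png', 'stored: json for b.pdf']
```

A fixed loop like this is cheaper and more predictable; the supervisor design trades that for flexibility, such as answering a "show pending invoices" request through the same entry point.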

---

## Requirements & Setup

- API keys:
  - `GEMINI_API_KEY` (for PNG/PDF extraction to Markdown)
  - `OPENAI_API_KEY` (for GPT-4o-mini: agents, structured transformation, and table formatting)
- Python deps: `langgraph`, `langchain-openai`, `google-genai`, `pydantic` (`sqlite3` ships with Python's standard library)

---

## Notebook Layout

1. **Install & Keys**  
2. **Tool Definitions** (extraction, transformation, storage/retrieval)  
3. **Agents** (Extraction / Transformation / Management)  
4. **Supervisor & Graph Wiring**  
5. **Run Examples** (single file, multiple files; store + list pending)

---


## Install Gemini, LangChain, OpenAI & LangGraph Dependencies

In [None]:
!pip install google-genai==1.29.0 langchain==0.3.27 langchain-community==0.3.27 langchain-openai==0.3.30 langgraph==0.6.5 --quiet

## Enter API Keys & Setup Environment Variables

In [None]:
import os
import getpass

# OpenAI API Key (for the GPT-4o-mini agents, transformation, and formatting)
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key (https://platform.openai.com/account/api-keys):\n")

# Gemini API Key (for multimodal invoice processing)
if not os.environ.get("GEMINI_API_KEY"):
    os.environ["GEMINI_API_KEY"] = getpass.getpass("Enter your Google Gemini API key (https://aistudio.google.com/apikey):\n")

## Create Invoice Processing Database


In [None]:
# Start from a fresh database (-f: no error if the file does not exist yet)
!rm -f invoice_db.db

In [None]:
import sqlite3

# Create the DB file and the invoices table if they do not exist
# (generated VIRTUAL columns require SQLite 3.31+)
INVOICE_DATABASE = "invoice_db.db"
conn = sqlite3.connect(INVOICE_DATABASE)
cur = conn.cursor()

cur.execute("""
CREATE TABLE IF NOT EXISTS invoices (
  id INTEGER PRIMARY KEY,
  doc JSON,
  invoice_id TEXT GENERATED ALWAYS AS (json_extract(doc,'$.invoice_id')) VIRTUAL,
  vendor TEXT GENERATED ALWAYS AS (json_extract(doc,'$.vendor')) VIRTUAL,
  invoice_date TEXT GENERATED ALWAYS AS (json_extract(doc,'$.invoice_date')) VIRTUAL,
  grand_total REAL GENERATED ALWAYS AS (json_extract(doc,'$.grand_total')) VIRTUAL,
  current_status TEXT DEFAULT 'Pending'
);
""")
conn.commit()

## Implement Tools

- **Extraction (multimodal):** `extract_invoice_data_png`, `extract_invoice_data_pdf`  
- **Transformation (structured):** `transform_invoice_data`  
- **Persistence & Retrieval:** `store_invoice`, `get_pending_invoices`

In [None]:
from google import genai
from google.genai import types

GEMINI_MODEL_NAME = "gemini-2.0-flash"

gemini_client = genai.Client()

PNG_PROCESSOR_CONFIG = types.GenerateContentConfig(
    temperature=0,
    system_instruction="""You are an invoice processor.
    Extract all the major elements of the given invoice from the Image.
    Neatly format into a markdown response and return.
    Keep elements like tables intact, do not mess with structure."""
)

PDF_PROCESSOR_CONFIG = types.GenerateContentConfig(
    temperature=0,
    system_instruction="""You are an invoice processor.
    Extract all the major elements of the given invoice from the PDF document.
    Neatly format into a markdown response and return.
    Keep elements like tables intact, do not mess with structure."""
)


In [None]:
from typing import List, Optional
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
import json

@tool
def extract_invoice_data_png(image_file_path: str) -> str:
    """
    Extract invoice contents from a PNG file using multimodal LLM (Gemini).

    Args:
        image_file_path (str): Path to a PNG invoice.

    Returns:
        str: Invoice content in Markdown format, preserving fields and tables.
    """
    with open(image_file_path, "rb") as f:
        data = f.read()
    image_part = types.Part.from_bytes(data=data, mime_type="image/png")

    resp = gemini_client.models.generate_content(
        model=GEMINI_MODEL_NAME,
        contents=["Extract this invoice", image_part],
        config=PNG_PROCESSOR_CONFIG
    )
    return resp.text


@tool
def extract_invoice_data_pdf(pdf_file_path: str) -> str:
    """
    Extract invoice contents from a PDF file using multimodal LLM (Gemini).

    Args:
        pdf_file_path (str): Path to a PDF invoice.

    Returns:
        str: Invoice content in Markdown format, preserving fields and tables.
    """
    with open(pdf_file_path, "rb") as f:
        data = f.read()
    pdf_part = types.Part.from_bytes(data=data, mime_type="application/pdf")

    resp = gemini_client.models.generate_content(
        model=GEMINI_MODEL_NAME,
        contents=["Extract this invoice", pdf_part],
        config=PDF_PROCESSOR_CONFIG
    )
    return resp.text


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

class LineItem(BaseModel):
    description: str
    quantity: Optional[int]
    unit_price: Optional[float]
    line_total: Optional[float]

class InvoiceSchema(BaseModel):
    invoice_id: str
    vendor: str
    invoice_date: str
    due_date: Optional[str]
    currency: Optional[str]
    line_items: List[LineItem]
    subtotal: Optional[float]
    tax_total: Optional[float]
    grand_total: Optional[float]

@tool
def transform_invoice_data(markdown_invoice: str) -> str:
    """
    Transform extracted invoice Markdown into a structured InvoiceSchema.

    Args:
        markdown_invoice (str): Invoice content in Markdown format.

    Returns:
        str: Parsed JSON invoice conforming to InvoiceSchema.
    """
    structured_llm = llm.with_structured_output(InvoiceSchema)

    prompt = f"""
    Convert the following invoice Markdown into structured data that matches the schema.
    Make sure all totals are calculated correctly.
    Use "USD" for currency when "$" is present.

    Invoice Markdown:
    {markdown_invoice}
    """

    result = structured_llm.invoke(prompt)
    return result.model_dump_json()


@tool
def store_invoice(transformed_invoice: str) -> str:
    """
    Store a structured invoice JSON into the SQLite database.

    Args:
        transformed_invoice (str): Invoice JSON string from transform_invoice_data tool.

    Returns:
        str: Confirmation message with invoice_id.
    """
    conn = sqlite3.connect(INVOICE_DATABASE)
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO invoices(doc, current_status) VALUES (json(?), 'Pending')",
        (transformed_invoice,)
    )
    conn.commit()
    cur.close()
    conn.close()

    inv_id = json.loads(transformed_invoice).get("invoice_id", "UNKNOWN")
    return f"Invoice {inv_id} stored successfully with status Pending."


@tool
def get_pending_invoices() -> str:
    """
    Retrieve all invoices marked as Pending and format them nicely using GPT-4o-mini.

    Args:
       None.

    Returns:
        str: Markdown formatted summary of pending invoices.
    """
    conn = sqlite3.connect(INVOICE_DATABASE)
    cur = conn.cursor()

    rows = cur.execute("""
        SELECT invoice_id, vendor, invoice_date, grand_total, current_status
        FROM invoices WHERE current_status = 'Pending'
    """).fetchall()

    cur.close()
    conn.close()

    if not rows:
        return "No pending invoices found."

    invoices = [
        {
            "invoice_id": r[0],
            "vendor": r[1],
            "invoice_date": r[2],
            "grand_total": r[3],
            "status": r[4]
        }
        for r in rows
    ]

    # Use LLM to format the output into Markdown
    llm_prompt = f"""
    You are given a list of pending invoices as JSON:

    {json.dumps(invoices, indent=2)}

    Present them as a clean Markdown table with columns:
    - Invoice ID
    - Vendor
    - Invoice Date
    - Total (with USD symbol)
    - Status
    """

    response = llm.invoke(llm_prompt)  # llm is your gpt-4o-mini instance
    return response.content

## Multi-Agent System with Supervisor

### Implement Sub-Agents (Worker Agents)

In [None]:
# =========================
# Agents (workers)
# =========================
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

# base LLM for agents
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# 1) Invoice Extraction Agent  (PNG/PDF -> Markdown)
invoice_extraction_agent = create_react_agent(
    llm,
    tools=[extract_invoice_data_png,
           extract_invoice_data_pdf],
    prompt="""You are an invoice data extraction specialist.

Responsibilities:
- Use the appropriate tool to extract invoice content depending on the file type (PNG or PDF).
- Return the extracted content in Markdown format preserving fields and tables.
- Do not summarize or alter the invoice or add extra fields or explanations."""
)

# 2) Invoice Transformation Agent  (Markdown -> JSON InvoiceSchema)
invoice_transformation_agent = create_react_agent(
    llm,
    tools=[transform_invoice_data],
    prompt="""You are an invoice data transformation specialist.

Responsibilities:
- Take invoice content in Markdown and transform it into structured JSON using the provided tool.
- Ensure totals and calculations are correct.
- Use "USD" when the "$" symbol appears.
- Output must strictly follow the schema used in the tool."""
)

# 3) Invoice Management Agent  (DB store / retrieve pending)
invoice_management_agent = create_react_agent(
    llm,
    tools=[store_invoice,
           get_pending_invoices],
    prompt="""You are an invoice management assistant.

Responsibilities:
- Decide whether the user wants to STORE an invoice or RETRIEVE pending invoices.
  * If structured invoice JSON is provided, use 'store_invoice'.
  * If the user asks for pending invoices, use 'get_pending_invoices'.
- Do not perform both actions unless explicitly asked.
- Never invent invoice data - only use tool outputs.

Output:
- For storage: confirm success with the invoice ID and status as per tool response.
- For retrieval: use tool output to return a clean Markdown table with columns:
  Invoice ID | Vendor | Invoice Date | Total (USD) | Status."""
)


### Create Supervisor Agent

In [None]:
# =========================
# Custom Supervisor
# =========================
from typing import Literal, Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages
from langgraph.types import Command
from langgraph.graph import StateGraph, START, END
from langchain_core.messages import SystemMessage, AIMessage

# Graph state
class State(TypedDict):
    messages: Annotated[list, add_messages]

# Available members (for structured routing)
members = [
    "invoice_extraction_agent",
    "invoice_transformation_agent",
    "invoice_management_agent",
]

# Structured-output schema the supervisor LLM fills in to choose the next node
class RouterPath(TypedDict):
    next: Literal[
        "invoice_extraction_agent",
        "invoice_transformation_agent",
        "invoice_management_agent",
        "FINISH"
    ]

SUPERVISOR_PROMPT = f"""You are the Supervisor Agent for managing invoice processing.

Your job is to route to ONE of:
{members[0]}, {members[1]}, {members[2]}, or FINISH.

Routing policy (handles single or multiple invoices):

If there are multiple invoice paths, call the following agents for each path according to their functions:
- invoice_extraction_agent → Use to extract Markdown content from an invoice file path.
- invoice_transformation_agent → Use to transform Markdown content into structured invoice JSON.
- invoice_management_agent → Use when the user wants to STORE structured invoice JSON (e.g., “save/insert/persist”)
  or to RETRIEVE pending invoices (e.g., “show/list pending”).
  The management agent will either store the invoices or return a Markdown table of pending invoices.

- FINISH → Use when the objective appears complete (e.g., after storing or after showing pending invoices)
  and the user isn’t asking for anything else.

Remember to process all invoices with the agents above when there are multiple invoices.

Read the current messages. Decide who should act next.
If the full workflow is complete, respond with FINISH.
"""

def supervisor_node(state: State) -> Command[
    Literal["invoice_extraction_agent",
            "invoice_transformation_agent",
            "invoice_management_agent",
            "__end__"]
]:
    # Ask the LLM to choose the next hop via structured output
    messages = [SystemMessage(content=SUPERVISOR_PROMPT)] + state["messages"]
    response = llm.with_structured_output(RouterPath).invoke(messages)
    goto = response["next"]

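    if goto == "FINISH":
        goto = END  # map the router's FINISH label to the graph's end node

    # State only tracks messages, so the command just routes to the next node
    return Command(goto=goto)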

### Create Sub-Agents Node Functions

In [None]:
# =========================
# Worker agent node wrapper functions
# =========================
def invoice_extraction_node(state: State) -> Command[Literal["supervisor_agent"]]:
    result = invoice_extraction_agent.invoke(state)
    return Command(
        update={"messages": [AIMessage(content=result["messages"][-1].content,
                                       name="invoice_extraction_agent")]},
        goto="supervisor_agent"
    )

def invoice_transformation_node(state: State) -> Command[Literal["supervisor_agent"]]:
    result = invoice_transformation_agent.invoke(state)
    return Command(
        update={"messages": [AIMessage(content=result["messages"][-1].content,
                                       name="invoice_transformation_agent")]},
        goto="supervisor_agent"
    )

def invoice_management_node(state: State) -> Command[Literal["supervisor_agent"]]:
    result = invoice_management_agent.invoke(state)
    return Command(
        update={"messages": [AIMessage(content=result["messages"][-1].content,
                                       name="invoice_management_agent")]},
        goto="supervisor_agent"
    )


In [None]:
# =========================
# Graph wiring
# =========================
graph_builder = StateGraph(State)

graph_builder.add_node("supervisor_agent", supervisor_node)
graph_builder.add_node("invoice_extraction_agent", invoice_extraction_node)
graph_builder.add_node("invoice_transformation_agent", invoice_transformation_node)
graph_builder.add_node("invoice_management_agent", invoice_management_node)

graph_builder.add_edge(START, "supervisor_agent")

# Edges between workers are driven by each node's `goto="supervisor_agent"`
multi_agent = graph_builder.compile()

In [None]:
from IPython.display import Image, display

display(Image(multi_agent.get_graph(xray=True).draw_mermaid_png()))

### Get Agent Response Formatting Utilities

In [None]:
!gdown 1dSyjcjlFoZpYEqv4P9Oi0-kU2gIoolMB

In [None]:
from agent_utils import format_message
from IPython.display import Markdown, display

def call_agent_system(agent, prompt, verbose=False):

    for event in agent.stream(
        input={"messages": [{"role": "user", "content": prompt}]},
        config={"recursion_limit": 25},
        stream_mode='values' #returns full agent state with all messages including updates
    ):
        if verbose:
            format_message(event["messages"][-1])

    print('\n\nFinal Response:\n')
    display(Markdown(event["messages"][-1].content.replace(r'$', r'\$')))
    return event["messages"]

## Get Sample Invoices

In [None]:
!gdown 1kwylAodEdOR5gYdJ-nIWxRC0TkRbDkas

In [None]:
!unzip -q invoice_receipts.zip -d invoices

In [None]:
!ls invoices

## View Sample Invoices

![Sample invoices](https://i.imgur.com/MgrgaIn.png)

## Load Invoice Paths

In [None]:
from pathlib import Path

folder = Path("invoices")
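# Keep only supported invoice types, in a stable (sorted) order
files = sorted(
    str(p.resolve())
    for p in folder.rglob("*")
    if p.is_file() and p.suffix.lower() in {".png", ".pdf"}
)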
files

## Test Invoice Processing Agent

In [None]:
prompt = """Please process and store all these invoices now.

Invoices:
{invoices}""".format(invoices=files)
response = call_agent_system(multi_agent, prompt, verbose=True)

In [None]:
prompt = """Show me all pending invoices please"""
response = call_agent_system(multi_agent, prompt, verbose=True)

In [None]:
conn = sqlite3.connect(INVOICE_DATABASE)
cur = conn.cursor()
rows = cur.execute("""
    SELECT *
    FROM invoices WHERE current_status = 'Pending'
""").fetchall()
cur.close()
conn.close()

In [None]:
import pandas as pd

pd.DataFrame([json.loads(record[1]) for record in rows]).T