<a href="https://colab.research.google.com/github/ManeeshUjji/creator-inbox-intelligence-agent/blob/main/notebooks/creator_inbox_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Creator Inbox Intelligence Agent – Final Pipeline

This notebook presents the final, production-ready implementation of the Creator Inbox Intelligence Agent, a multi-agent system designed to automatically triage emails, retrieve relevant knowledge base entries, log actionable tickets, and generate draft responses for a content creator.

The system consolidates the architecture, tools, agents, orchestration flow, and evaluation developed across the previous stages of the project.

## 1. Problem Overview & Goals

Content creators deal with a constant stream of inbound email: brand inquiries, sponsorship requests, platform updates, support issues, and general fan communication. Sorting through all of this manually is time-consuming, inconsistent, and error-prone, especially when important opportunities are mixed in with noise. The result is lost deals, delayed responses, and operational bottlenecks that slow creators down.

This project builds an AI-powered multi-agent inbox assistant that automates the core steps of managing a creator’s inbox. The system reads incoming emails, classifies them into actionable categories, identifies the priority of each message, retrieves relevant knowledge base snippets, generates draft responses, and logs follow-up tickets when needed. Each agent performs a specialized role, and the orchestrator coordinates them into a unified pipeline.

The goal is to create a system that behaves like a reliable inbox manager: it should correctly triage emails, surface the right knowledge base entries, produce contextually accurate draft replies, and log follow-up actions consistently. Success means reducing manual workload while increasing response quality and ensuring that important messages receive the attention they deserve.

This notebook presents the final, production-ready version of the Creator Inbox Intelligence Agent, consolidating all components (data, tools, agents, orchestration logic, and evaluation) into a clean and reproducible workflow.

## 2. Dataset & Schema


The project uses three CSV datasets stored in the `datasets/` directory:

**Inbox Dataset** (`inbox.csv`)

A synthetic dataset representing a creator’s inbox.

Key fields include:

* email_id
* thread_id
* creator_id
* from_name, from_email
* subject, body_text
* received_datetime_utc
* Other metadata used for classification and ticketing (for example category, priority, status, and ticket_id)

**Knowledge Base** (`knowledge_base.csv`)

A collection of text-based entries representing:
* rate cards
* FAQs
* brand deal guidelines
* platform policies
* support instructions

The Knowledge Base Agent searches this dataset to return relevant context for replies.

**Tickets** (`tickets.csv`)

A log of action items created by the agent:
* follow-ups
* negotiation tasks
* billing reminders
* support escalations

The Ticket Logger Tool appends or updates this file when the system determines that an email requires human-level follow-up.
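As a rough illustration of the append step, the sketch below adds one ticket row and serializes the log with the standard `csv` module. The column list is copied from the tickets preview in this notebook; `append_ticket` and `to_csv_text` are hypothetical helpers written for this example, not the repository's Ticket Logger Tool, and the ID and timestamp rules are assumptions.

```python
import csv
import io
from datetime import datetime, timezone

# Column names as they appear in the tickets preview of this notebook.
TICKET_COLUMNS = [
    "ticket_id", "creator_id", "email_id", "type", "title", "description",
    "priority", "status", "created_by", "assigned_to",
    "created_datetime_utc", "updated_datetime_utc",
    "due_datetime_utc", "resolution_notes",
]


def append_ticket(rows, payload, created_by="orchestrator_agent"):
    """Return a new list of ticket rows with one ticket appended (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Assumption: ticket IDs are sequential integers.
    next_id = max((int(r["ticket_id"]) for r in rows), default=0) + 1
    ticket = {col: "" for col in TICKET_COLUMNS}
    ticket.update(payload)
    ticket.update({
        "ticket_id": next_id,
        "status": "open",
        "created_by": created_by,
        "created_datetime_utc": now,
        "updated_datetime_utc": now,
    })
    return rows + [ticket]


def to_csv_text(rows):
    """Serialize ticket rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=TICKET_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Payload shaped like the follow-up action shown later in this notebook.
payload = {
    "creator_id": "creator_01",
    "email_id": 1,
    "type": "Follow-up",
    "title": "Follow-up: Brand/Sponsorship – New Inquiry",
    "description": "Important opportunity: Brand/Sponsorship – New Inquiry",
    "priority": "P1",
}
tickets = append_ticket([], payload)
tickets = append_ticket(tickets, payload)
csv_text = to_csv_text(tickets)
```

Returning a new list instead of mutating the input keeps the helper easy to test; writing `csv_text` to `datasets/tickets.csv` would be a separate step.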

## 3. System Architecture (Agents + Tools)


The Creator Inbox Intelligence Agent follows a modular, multi-agent architecture:

**Triage Agent**

Classifies each email into a structured category (e.g., *Brand/Sponsorship – New Inquiry*, *Support*, *Platform Notice*) and assigns a priority level.

**Knowledge Base Agent**

Uses the KB Search Tool to retrieve relevant entries from `knowledge_base.csv` based on the email subject, body, and assigned category.

**Reply Agent**

Generates a draft response that incorporates the triage decision and retrieved knowledge snippets.

**Ticket Logger Tool**

Creates or updates entries in `tickets.csv` when follow-up action is required.

**Orchestrator Agent**

The central controller that:

1. Receives the email
2. Calls the Triage Agent
3. Calls the KB Agent
4. Optionally creates or updates a ticket
5. Calls the Reply Agent
6. Returns a unified result

This architecture mirrors how real-world email assistants are typically structured. Because each role sits behind its own agent or tool, components can be tested, replaced, or extended independently.
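The control flow described above can be sketched as a plain function that chains the four steps. This is a minimal sketch under stated assumptions: `run_pipeline` and the `stub_*` callables are hypothetical stand-ins, and the real `OrchestratorAgent` in `core/agents` may be wired differently.

```python
from typing import Callable, Optional


def run_pipeline(
    email: dict,
    triage: Callable[[dict], dict],
    search_kb: Callable[[dict, dict], list],
    log_ticket: Callable[[dict, dict], Optional[dict]],
    draft_reply: Callable[[dict, dict, list], str],
) -> dict:
    """Chain triage, KB retrieval, optional ticketing, and reply drafting."""
    decision = triage(email)                       # category + priority
    kb_hits = search_kb(email, decision)           # relevant KB entries
    ticket = log_ticket(email, decision)           # None when no follow-up is needed
    reply = draft_reply(email, decision, kb_hits)  # draft response text
    return {"triage": decision, "kb_hits": kb_hits, "ticket": ticket, "reply": reply}


# Stub components, for illustration only.
def stub_triage(email):
    is_brand = "partnership" in email["subject"].lower()
    return {
        "category": "Brand/Sponsorship – New Inquiry" if is_brand else "Other / Uncategorized",
        "priority": "P1" if is_brand else "P3",
    }


def stub_search(email, decision):
    return [{"kb_id": 2}] if decision["priority"] == "P1" else []


def stub_ticket(email, decision):
    if decision["priority"] in {"P0", "P1"}:
        return {"email_id": email["email_id"], "priority": decision["priority"]}
    return None


def stub_reply(email, decision, kb_hits):
    return f"Hi {email['from_name']}, thanks for reaching out."


brand_email = {"email_id": 1, "from_name": "Emma Lopez",
               "subject": "Partnership Opportunity with Prism Marketing"}
spam_email = {"email_id": 5, "from_name": "Unknown Sender",
              "subject": "CLAIM YOUR FREE GIFT CARD"}

brand_result = run_pipeline(brand_email, stub_triage, stub_search, stub_ticket, stub_reply)
spam_result = run_pipeline(spam_email, stub_triage, stub_search, stub_ticket, stub_reply)
```

Passing each step in as a callable is one way to keep the steps swappable in tests; it is a design choice for this sketch, not a description of the repository code.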

## 4. Environment Setup & Imports


This section initializes dependencies and loads all required Python modules.

It also configures file paths, display settings, and the tools/agents used in the pipeline.

The project expects the following structure:



```
notebooks/
core/
datasets/
evaluation/
```

Imports pull directly from `core/agents` and `core/tools`.

In [8]:
import os
import sys
import pathlib

# If running in Colab, clone the repo and cd into it
if "google.colab" in sys.modules:
    %cd /content
    repo_name = "creator-inbox-intelligence-agent"
    if not os.path.exists(repo_name):
        !git clone https://github.com/ManeeshUjji/creator-inbox-intelligence-agent.git
    %cd /content/creator-inbox-intelligence-agent

# Normalize working directory so that 'datasets/' etc. work
cwd = pathlib.Path().resolve()
if cwd.name == "notebooks":
    repo_root = cwd.parent
else:
    repo_root = cwd

os.chdir(repo_root)

# Make sure repo root is on sys.path for imports like core.agents.*
if str(repo_root) not in sys.path:
    sys.path.append(str(repo_root))

import pandas as pd
from IPython.display import display
from pprint import pprint

from core.tools.code_execution_tool import exec_read_csv
from core.tools.kb_search_tool import build_kb_index, search_kb
from core.agents.orchestrator_agent import OrchestratorAgent

print("Working directory:", os.getcwd())
print("Repo root:", repo_root)

/content
/content/creator-inbox-intelligence-agent
Working directory: /content/creator-inbox-intelligence-agent
Repo root: /content/creator-inbox-intelligence-agent


## 5. Load Data


This section loads the Inbox, Knowledge Base, and Ticket datasets into memory.

Basic shape checks and previews allow quick verification:

* Are all three datasets accessible?
* Are expected columns present?
* Does the inbox data look realistic?

The pipeline depends heavily on these datasets, so validating them early prevents cascading errors later.
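One lightweight way to answer the "expected columns" question is to compare the CSV header against a required set. The sketch below uses only the standard library; `missing_columns` is a hypothetical helper, and the required set is an assumption based on the inbox preview in this notebook (where the body column is named `body_text`).

```python
import csv
import io

# Assumed minimum schema, based on the inbox preview in this notebook.
REQUIRED_INBOX_COLUMNS = {
    "email_id", "thread_id", "creator_id", "from_name", "from_email",
    "subject", "body_text", "received_datetime_utc",
}


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return sorted(required - set(header))


# A deliberately incomplete sample: it has `body` instead of `body_text`.
sample = "email_id,thread_id,creator_id,from_name,from_email,subject,body,received_datetime_utc\n"
problems = missing_columns(sample, REQUIRED_INBOX_COLUMNS)
print("Missing columns:", problems)
```

With a pandas DataFrame, the same check is a set difference against `df.columns`.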

In [2]:
from pathlib import Path

data_dir = Path("datasets")

inbox_df = pd.read_csv(data_dir / "inbox.csv")
kb_df = pd.read_csv(data_dir / "knowledge_base.csv")
tickets_df = pd.read_csv(data_dir / "tickets.csv")

print("Inbox shape:", inbox_df.shape)
print("Knowledge base shape:", kb_df.shape)
print("Tickets shape:", tickets_df.shape)

print("\nInbox preview:")
display(inbox_df.head())

print("\nKnowledge base preview:")
display(kb_df.head())

print("\nTickets preview:")
display(tickets_df.head())

Inbox shape: (5, 18)
Knowledge base shape: (5, 13)
Tickets shape: (40, 14)

Inbox preview:


Unnamed: 0,email_id,thread_id,creator_id,from_name,from_email,to_email,subject,body_text,received_datetime_utc,raw_source,category,priority,status,assigned_to,last_action,last_action_datetime_utc,ticket_id,notes
0,1,1001,creator_01,Emma Lopez,emma.brand@prismmarketing.com,creator@email.com,Partnership Opportunity with Prism Marketing,Hi! We love your content and want to discuss a...,2025-01-14T18:22:10Z,raw_email_source_1,Brand/Sponsorship – New Inquiry,P1,open,unassigned,draft_reply,2025-12-01T02:15:59.769518Z,,"Hi Emma Lopez,\n\nThank you for reaching out a..."
1,2,1002,creator_01,David Chen,david.chen@creatorsunite.co,creator@email.com,Collab Inquiry,Huge fan of your work! Want to co-create a sho...,2025-01-13T20:10:44Z,raw_email_source_2,Creator Collaboration,P2,open,unassigned,draft_reply,2025-12-01T02:15:59.817834Z,,"Hi David Chen,\n\nThanks for reaching out abou..."
2,3,1003,creator_01,Meta Support,no-reply@meta.com,creator@email.com,Your Instagram Account Verification Status,Your verification badge review is complete. Pl...,2025-01-12T09:31:55Z,raw_email_source_3,Platform / Account Notifications,P2,open,system,draft_reply,2025-12-01T02:15:59.851549Z,,"Hi Meta Support,\n\nThanks for the update rega..."
3,4,1004,creator_01,Sarah Mills,sarah@brightpr.io,creator@email.com,Interview Request for BrightPR,We’re interviewing rising creators for a digit...,2025-01-11T14:47:28Z,raw_email_source_4,PR / Media / Interviews,P1,open,unassigned,draft_reply,2025-12-01T02:15:59.887450Z,,"Hi Sarah Mills,\n\nThank you for the media / i..."
4,5,1005,creator_01,Unknown Sender,spam@randomoffers.biz,creator@email.com,CLAIM YOUR FREE GIFT CARD,Click here to win a $500 Walmart gift card!,2025-01-10T02:10:03Z,raw_email_source_5,Other / Uncategorized,P3,ignored,unassigned,draft_reply,2025-12-01T02:15:59.935476Z,,"Hi Unknown Sender,\n\nThank you for reaching o..."



Knowledge base preview:


Unnamed: 0,kb_id,creator_id,title,category,tags,content,format,created_datetime_utc,updated_datetime_utc,is_active,language,scope,version
0,1,creator_01,Brand Collaboration – Rate Card,Brand/Sponsorship,"rates,pricing,brand deals","Base rate: $1,200 per sponsored video. Include...",text,2025-01-01T10:00:00Z,2025-01-05T12:00:00Z,True,en,public,v1
1,2,creator_01,Email Reply – New Brand Inquiry Template,Brand/Sponsorship,"template,reply,new inquiry",Thanks so much for reaching out! I'm excited t...,text,2025-01-02T09:15:00Z,2025-01-03T15:45:00Z,True,en,public,v1
2,3,creator_01,Creator Collaboration Guidelines,Creator Collaboration,"guidelines,collab","When collaborating with other creators, mainta...",text,2025-01-03T11:20:00Z,2025-01-05T08:30:00Z,True,en,public,v1
3,4,creator_01,Press/Media Interview Requirements,PR / Media / Interviews,"media,press,interview",For press interviews: request questions in adv...,text,2025-01-04T14:50:00Z,2025-01-06T09:10:00Z,True,en,public,v1
4,5,creator_01,Spam & Phishing Rules,Spam / Phishing / Irrelevant,"spam,security","If the sender domain is unknown, contains susp...",text,2025-01-01T07:25:00Z,2025-01-02T07:25:00Z,True,en,internal,v1



Tickets preview:


Unnamed: 0,ticket_id,creator_id,email_id,type,title,description,priority,status,created_by,assigned_to,created_datetime_utc,updated_datetime_utc,due_datetime_utc,resolution_notes
0,1,creator_01,1,Brand Deal – Follow-up,Send rate card and availability,Brand requested collaboration details. Send ra...,P1,open,orchestrator_agent,inbox_manager,2025-01-14T19:00:00Z,2025-01-14T19:00:00Z,2025-01-18T23:59:59Z,
1,2,creator_01,3,Account Security Check,Review Instagram verification and account status,Platform email about verification status. Log ...,P0,open,triage_agent,creator_01,2025-01-12T10:00:00Z,2025-01-12T10:05:00Z,2025-01-13T23:59:59Z,
2,3,creator_01,4,Media Interview – Coordination,Confirm interest and schedule interview,"Respond to BrightPR to confirm interest, reque...",P2,open,reply_agent,inbox_manager,2025-01-11T15:10:00Z,2025-01-11T15:10:00Z,2025-01-16T23:59:59Z,
3,4,creator_01,1,Brand Deal – Follow-up,Check in on Prism Marketing proposal,Follow up with Emma at Prism Marketing about t...,P1,open,orchestrator_agent,inbox_manager,2025-11-26T08:46:26Z,2025-11-26T08:46:26Z,,
4,5,creator_01,3,Follow-up,Test logging,This is a tool integration test.,P2,open,orchestrator_agent,inbox_manager,2025-11-26T08:48:20Z,2025-11-26T08:48:20Z,,


## 6. Define Tools


This section initializes the two primary system tools:

**KnowledgeBaseSearchTool**

Loads the knowledge base into memory and runs a similarity search over it to return relevant entries for any email. In this notebook the search is backed by a TF-IDF index: `build_kb_index()` builds the index and `search_kb()` queries it.
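To make the retrieval idea concrete, here is a small pure-Python TF-IDF search with cosine similarity. It is a sketch of the general technique only: the function names, tokenizer, and smoothed IDF formula are assumptions for this example and are not taken from `core/tools/kb_search_tool`. The documents are the five KB titles shown in the preview above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens, idf):
    """Map known tokens to term-frequency * IDF weights."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def build_index(docs):
    """Compute IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, [vectorize(toks, idf) for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, idf, vectors, top_k=3, min_score=0.1):
    """Return up to top_k (document, score) pairs with score >= min_score."""
    q = vectorize(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    hits = sorted((s, i) for s, i in scored if s >= min_score)[::-1]
    return [(docs[i], round(s, 3)) for s, i in hits[:top_k]]


kb_titles = [
    "Brand Collaboration – Rate Card",
    "Email Reply – New Brand Inquiry Template",
    "Creator Collaboration Guidelines",
    "Press/Media Interview Requirements",
    "Spam & Phishing Rules",
]
idf, vectors = build_index(kb_titles)
hits = search("Interview request from press", kb_titles, idf, vectors)
print(hits)
```

The `min_score` cutoff plays the same role as the `min_score` argument passed to `search_kb` in the cell below: entries with too little overlap are dropped rather than returned as weak matches.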

**TicketLoggerTool**

Handles writing new tickets and updating existing ones inside `tickets.csv`, which keeps human follow-up tasks organized and traceable.

Tools are lightweight components; agents call them only when needed.

In [9]:
from core.tools.kb_search_tool import build_kb_index, search_kb

build_kb_index()  # builds TF-IDF index over the KB

sample_email = inbox_df.iloc[0]
sample_query = f"{sample_email.get('subject', '')} {sample_email.get('body', '')}"

kb_hits = search_kb(sample_query, top_k=5, min_score=0.1)

print("Sample KB search query:")
print(sample_query[:250], "...\n")

print("Top KB hits:")
for hit in kb_hits[:3]:
    print(f"- kb_id={hit['kb_id']}, title={hit['title']}, score={hit['similarity_score']:.3f}")

Sample KB search query:
Partnership Opportunity with Prism Marketing  ...

Top KB hits:
- kb_id=2, title=Email Reply – New Brand Inquiry Template, score=0.179


## 7. Define Agents


Each agent encapsulates a specific role:

* **Triage Agent:** categorizes & prioritizes each email
* **KB Agent:** retrieves relevant knowledge entries
* **Reply Agent:** drafts responses using triage + KB context
* **Orchestrator Agent:** central pipeline manager

Defining them here keeps the notebook clean and separates logic from data processing.

In [15]:
from core.agents.triage_agent import TriageAgent
from core.agents.kb_agent import KnowledgeBaseAgent
from core.agents.reply_agent import ReplyAgent
from core.agents.orchestrator_agent import OrchestratorAgent


def build_inbox_agent(
    kb_top_k: int = 5,
    kb_min_score: float = 0.05,
):
    """
    Build the full multi-agent inbox assistant.

    Components:
    - TriageAgent: classifies each email and assigns a priority
    - KnowledgeBaseAgent: retrieves relevant KB entries
    - ReplyAgent: drafts replies + suggests follow-up actions
    - OrchestratorAgent: coordinates the whole flow

    Note:
    The OrchestratorAgent in core/ already wires these agents together
    internally. We still construct the individual agents here so the
    architecture is explicit for anyone reading the notebook.
    """

    triage_agent = TriageAgent()
    kb_agent = KnowledgeBaseAgent(top_k=kb_top_k, min_score=kb_min_score)
    reply_agent = ReplyAgent()

    # OrchestratorAgent in your repo currently constructs its own internal agents
    # in __init__, but we keep triage_agent/kb_agent/reply_agent here for clarity
    # and future extension if you ever change the signature.
    orchestrator = OrchestratorAgent()

    return {
        "triage_agent": triage_agent,
        "kb_agent": kb_agent,
        "reply_agent": reply_agent,
        "orchestrator": orchestrator,
    }


agents = build_inbox_agent()

triage_agent = agents["triage_agent"]
kb_agent = agents["kb_agent"]
reply_agent = agents["reply_agent"]
orchestrator = agents["orchestrator"]

def run_agent_on_email(email_id: int):
    result = orchestrator.process_email(email_id)
    pprint({
        "email_id": email_id,
        "category": getattr(result, "category", None),
        "priority": getattr(result, "priority", None),
        "reply_subject": getattr(result, "reply_subject", None),
        "reply_body": getattr(result, "reply_body", None),
    })

print("Multi-agent inbox assistant built:")
print(" - triage_agent:", type(triage_agent).__name__)
print(" - kb_agent:", type(kb_agent).__name__)
print(" - reply_agent:", type(reply_agent).__name__)
print(" - orchestrator:", type(orchestrator).__name__)

run_agent_on_email(1)

[ORCH] 2025-12-01 07:01:05,084 - INFO - Processing email_id=1
INFO:core.agents.orchestrator_agent:Processing email_id=1
[ORCH] 2025-12-01 07:01:05,103 - INFO - Loaded email_id=1 in thread_id=1001 (thread length=1)
INFO:core.agents.orchestrator_agent:Loaded email_id=1 in thread_id=1001 (thread length=1)
[ORCH] 2025-12-01 07:01:05,124 - INFO - Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
INFO:core.agents.orchestrator_agent:Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
[ORCH] 2025-12-01 07:01:05,138 - INFO - KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
INFO:core.agents.orchestrator_agent:KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
[ORCH] 2025-12-0

Multi-agent inbox assistant built:
 - triage_agent: TriageAgent
 - kb_agent: KnowledgeBaseAgent
 - reply_agent: ReplyAgent
 - orchestrator: OrchestratorAgent
{'category': None,
 'email_id': 1,
 'priority': None,
 'reply_body': 'Hi Emma Lopez,\n'
               '\n'
               'Thank you for reaching out and thinking of us for this '
               'campaign! (based on our internal guidelines: Email Reply – New '
               'Brand Inquiry Template)\n'
               '\n'
               "We'd love to learn more about the product, timeline, "
               'deliverables, and budget for this collaboration. \n'
               'Please share:\n'
               '- Your ideal posting dates\n'
               '- Required deliverables (e.g. 1x sponsored video, stories, '
               'usage rights)\n'
               '- Proposed budget and any specific guidelines\n'
               '\n'
               'Looking forward to hearing from you.\n'
               '\n'
               'Best,\n'
  

## 8. Orchestrator: End-to-End Flow


The orchestrator runs the entire multi-agent sequence:

1. Input: the `email_id` of a single inbox email (the orchestrator loads the corresponding row itself)
2. Output: a structured result containing:
      * category + priority
      * selected KB entries
      * optional ticket data
      * draft reply text
      * internal agent decisions

This section demonstrates the full pipeline on a single example email, letting readers understand how the system behaves end-to-end.
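For readers who want a mental model of such a structured result, the sketch below uses a dataclass. `PipelineResult` and `missing_fields` are hypothetical names chosen for this example; the object actually returned by `process_email` may expose different attributes. The helper also illustrates a general Python point: `getattr(obj, name, None)` returns `None` both when a field is unset and when it does not exist, so checking with `hasattr` first can tell the two cases apart.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PipelineResult:
    """Hypothetical container for one processed email (not the repo's class)."""
    email_id: int
    category: Optional[str] = None
    priority: Optional[str] = None
    kb_entries: list = field(default_factory=list)
    ticket: Optional[dict] = None
    reply_body: str = ""


def missing_fields(result, wanted):
    """Return the names in `wanted` that the result object does not define at all."""
    return [name for name in wanted if not hasattr(result, name)]


result = PipelineResult(
    email_id=1,
    category="Brand/Sponsorship – New Inquiry",
    priority="P1",
)

# getattr with a default hides the difference between "unset" and "absent".
unset_value = getattr(result, "ticket", None)           # field exists, value is None
absent_value = getattr(result, "reply_subject", None)   # field does not exist
absent = missing_fields(result, ["category", "ticket", "reply_subject"])
print("Fields not defined on the result:", absent)
```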

In [10]:
orchestrator = OrchestratorAgent()

sample_row = inbox_df.iloc[0]
sample_email_id = int(sample_row["email_id"])

import time

start = time.time()
single_result = orchestrator.process_email(sample_email_id)
end = time.time()

print(f"Processed email_id={sample_email_id} in {end - start:.3f} seconds\n")

single_summary = {
    "email_id": sample_email_id,
    "from_email": sample_row.get("from_email"),
    "subject": sample_row.get("subject"),
    "true_category": sample_row.get("category"),
    "true_priority": sample_row.get("priority"),
    "predicted_category": getattr(single_result, "category", None),
    "predicted_priority": getattr(single_result, "priority", None),
    "reply_subject": getattr(single_result, "reply_subject", None),
    "reply_body": getattr(single_result, "reply_body", None),
    "follow_up_action": getattr(single_result, "follow_up_action", None),
}

pprint(single_summary)


[ORCH] 2025-12-01 06:55:25,157 - INFO - Processing email_id=1
INFO:core.agents.orchestrator_agent:Processing email_id=1
[ORCH] 2025-12-01 06:55:25,167 - INFO - Loaded email_id=1 in thread_id=1001 (thread length=1)
INFO:core.agents.orchestrator_agent:Loaded email_id=1 in thread_id=1001 (thread length=1)
[ORCH] 2025-12-01 06:55:25,175 - INFO - Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
INFO:core.agents.orchestrator_agent:Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
[ORCH] 2025-12-01 06:55:25,181 - INFO - KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
INFO:core.agents.orchestrator_agent:KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
[ORCH] 2025-12-0

Processed email_id=1 in 0.052 seconds

{'email_id': 1,
 'follow_up_action': {'create_ticket': True,
                      'ticket_payload': {'creator_id': 'creator_01',
                                         'description': 'Important '
                                                        'opportunity: '
                                                        'Brand/Sponsorship – '
                                                        'New Inquiry',
                                         'email_id': 1,
                                         'priority': 'P1',
                                         'title': 'Follow-up: '
                                                  'Brand/Sponsorship – New '
                                                  'Inquiry',
                                         'type': 'Follow-up'}},
 'from_email': 'emma.brand@prismmarketing.com',
 'predicted_category': None,
 'predicted_priority': None,
 'reply_body': 'Hi Emma Lopez,\n'
               '\n'

In [16]:
# Helper to pretty-print a single email + agent output

def pretty_print_email_run(email_id: int):
    """Run the orchestrator on a single email and print a readable summary."""
    # Locate row in inbox
    row = inbox_df.loc[inbox_df["email_id"] == email_id]
    if row.empty:
        print(f"No email found with email_id={email_id}")
        return

    row = row.iloc[0]
    result = orchestrator.process_email(int(email_id))

    print("=" * 80)
    print(f"EMAIL ID: {email_id}")
    print(f"FROM: {row.get('from_name')} <{row.get('from_email')}>")
    print(f"SUBJECT: {row.get('subject')}")
    print("-" * 80)
    print("BODY:")
    print(row.get("body"))
    print("=" * 80)
    print("GROUND TRUTH")
    print(f"- Category: {row.get('category')}")
    print(f"- Priority: {row.get('priority')}")
    print("=" * 80)
    print("AGENT PREDICTION")
    print(f"- Predicted category: {getattr(result, 'category', None)}")
    print(f"- Predicted priority: {getattr(result, 'priority', None)}")
    print(f"- Follow-up action: {getattr(result, 'follow_up_action', None)}")
    print("=" * 80)
    print("DRAFT REPLY")
    reply_subject = getattr(result, "reply_subject", None)
    reply_body = getattr(result, "reply_body", None)

    if reply_subject:
        print(f"Subject: {reply_subject}")
    if reply_body:
        print(reply_body)
    else:
        print("(No reply generated)")
    print("=" * 80)


# Example usage:
pretty_print_email_run(int(inbox_df.iloc[0]["email_id"]))

[ORCH] 2025-12-01 07:03:10,602 - INFO - Processing email_id=1
INFO:core.agents.orchestrator_agent:Processing email_id=1
[ORCH] 2025-12-01 07:03:10,612 - INFO - Loaded email_id=1 in thread_id=1001 (thread length=1)
INFO:core.agents.orchestrator_agent:Loaded email_id=1 in thread_id=1001 (thread length=1)
[ORCH] 2025-12-01 07:03:10,620 - INFO - Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
INFO:core.agents.orchestrator_agent:Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
[ORCH] 2025-12-01 07:03:10,624 - INFO - KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
INFO:core.agents.orchestrator_agent:KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
[ORCH] 2025-12-0

EMAIL ID: 1
FROM: Emma Lopez <emma.brand@prismmarketing.com>
SUBJECT: Partnership Opportunity with Prism Marketing
--------------------------------------------------------------------------------
BODY:
None
GROUND TRUTH
- Category: Brand/Sponsorship – New Inquiry
- Priority: P1
AGENT PREDICTION
- Predicted category: None
- Predicted priority: None
- Follow-up action: {'create_ticket': True, 'ticket_payload': {'creator_id': 'creator_01', 'email_id': 1, 'type': 'Follow-up', 'title': 'Follow-up: Brand/Sponsorship – New Inquiry', 'description': 'Important opportunity: Brand/Sponsorship – New Inquiry', 'priority': 'P1'}}
DRAFT REPLY
Subject: Re: Partnership Opportunity with Prism Marketing
Hi Emma Lopez,

Thank you for reaching out and thinking of us for this campaign! (based on our internal guidelines: Email Reply – New Brand Inquiry Template)

We'd love to learn more about the product, timeline, deliverables, and budget for this collaboration. 
Please share:
- Your ideal posting dates
- R

## 9. Run the Pipeline on Sample Emails


To showcase real behavior, this section runs the orchestrator on a subset of inbox emails (e.g., first 5 entries).

Each output displays:

* triage category
* priority level
* KB matches
* generated reply
* whether a ticket was created

This gives an immediate, tangible view of how the agent processes realistic communication.

In [11]:
from tqdm.auto import tqdm

def run_pipeline_on_subset(df_inbox: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """
    Run the orchestrator on the first n emails and collect
    basic fields for inspection.
    """
    records = []
    n = min(n, len(df_inbox))

    for _, row in tqdm(df_inbox.head(n).iterrows(), total=n):
        email_dict = row.to_dict()
        email_id = int(email_dict["email_id"])

        start = time.time()
        result = orchestrator.process_email(email_id)
        end = time.time()

        rec = {
            "email_id": email_id,
            "from_email": email_dict.get("from_email"),
            "subject": email_dict.get("subject"),
            "true_category": email_dict.get("category"),
            "true_priority": email_dict.get("priority"),
            "pred_category": getattr(result, "category", None),
            "pred_priority": getattr(result, "priority", None),
            "latency_sec": end - start,
        }

        records.append(rec)

    return pd.DataFrame(records)

demo_df = run_pipeline_on_subset(inbox_df, n=5)
demo_df

  0%|          | 0/5 [00:00<?, ?it/s]

[ORCH] 2025-12-01 06:56:25,574 - INFO - Processing email_id=1
INFO:core.agents.orchestrator_agent:Processing email_id=1
[ORCH] 2025-12-01 06:56:25,585 - INFO - Loaded email_id=1 in thread_id=1001 (thread length=1)
INFO:core.agents.orchestrator_agent:Loaded email_id=1 in thread_id=1001 (thread length=1)
[ORCH] 2025-12-01 06:56:25,594 - INFO - Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
INFO:core.agents.orchestrator_agent:Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
[ORCH] 2025-12-01 06:56:25,600 - INFO - KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
INFO:core.agents.orchestrator_agent:KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
[ORCH] 2025-12-0

Unnamed: 0,email_id,from_email,subject,true_category,true_priority,pred_category,pred_priority,latency_sec
0,1,emma.brand@prismmarketing.com,Partnership Opportunity with Prism Marketing,Brand/Sponsorship – New Inquiry,P1,,,0.053285
1,2,david.chen@creatorsunite.co,Collab Inquiry,Creator Collaboration,P2,,,0.033314
2,3,no-reply@meta.com,Your Instagram Account Verification Status,Platform / Account Notifications,P2,,,0.033155
3,4,sarah@brightpr.io,Interview Request for BrightPR,PR / Media / Interviews,P1,,,0.046187
4,5,spam@randomoffers.biz,CLAIM YOUR FREE GIFT CARD,Other / Uncategorized,P3,,,0.036575


## 10. Evaluation & Metrics


This section summarizes the evaluation procedure developed in Part 5.
Evaluation focuses on:

* Classification performance (category and priority accuracy)
* Behavioral correctness (whether ticket creation was appropriate)
* Quality indicators (coherence of generated replies, relevance of KB retrieval)

Stored evaluation artifacts include:

* metrics_summary.json
* evaluation_report.csv
* Confusion matrix plots (if included in your repo)

Readers should come away with a clear understanding of how well the agent performs and where it struggles.
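As a reference for the classification metrics named above, the sketch below computes accuracy and macro-averaged F1 from two label lists using their standard definitions. Whether the saved `metrics_summary.json` was produced exactly this way is not shown in this notebook, so treat the code as an illustration. The true priorities are taken from the inbox preview; `example_pred` is made up for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the ground truth."""
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over every label seen in either list."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


true_priority = ["P1", "P2", "P2", "P1", "P3"]   # from the inbox preview
example_pred = ["P1", "P2", "P1", "P1", "P3"]    # illustrative predictions only
empty_pred = [None] * len(true_priority)          # what missing predictions look like

print("accuracy:", accuracy(true_priority, example_pred))
print("macro F1:", round(macro_f1(true_priority, example_pred), 4))
print("accuracy with no predictions:", accuracy(true_priority, empty_pred))
```

When every prediction is `None`, both functions return 0.0, because no predicted label ever matches a true label.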

In [12]:
import pandas as pd
import time

from core.agents.orchestrator_agent import OrchestratorAgent
from core.tools.code_execution_tool import exec_read_csv
from tqdm.auto import tqdm

def evaluate_email(email_row):
    """
    Runs a single email through the entire multi-agent pipeline:
    triage → KB retrieval → reply generation.
    Uses the existing design where the orchestrator
    takes an email_id and loads the row internally.
    """

    # Normalize to dict so we can grab ground-truth fields
    if isinstance(email_row, pd.Series):
        email_dict = email_row.to_dict()
    else:
        email_dict = dict(email_row)

    # The orchestrator expects an email_id, not the whole row
    email_id = int(email_dict["email_id"])

    start = time.time()

    orchestrator = OrchestratorAgent()
    result = orchestrator.process_email(email_id)

    end = time.time()

    # Handle ReplyResult-style object safely
    pred_category = getattr(result, "category", None)
    pred_priority = getattr(result, "priority", None)
    kb_used = getattr(result, "kb_entries", [])
    # some implementations name it reply_text, some draft_reply, so be defensive
    reply_text = getattr(result, "reply_text", None)
    if reply_text is None:
        reply_text = getattr(result, "draft_reply", "")

    return {
        "email_id": email_id,
        "true_category": email_dict.get("category"),
        "true_priority": email_dict.get("priority"),
        "pred_category": pred_category,
        "pred_priority": pred_priority,
        "kb_used": kb_used,
        "reply_text": reply_text,
        "latency_sec": end - start,
    }

print("Function exists:", callable(globals().get("evaluate_email")))

def evaluate_dataset(df_inbox: pd.DataFrame) -> pd.DataFrame:
    """
    Runs evaluate_email over an entire inbox dataframe.
    Returns a DataFrame where each row is one evaluation record.
    """
    records = []

    # tqdm just gives you a progress bar in Colab
    for _, row in tqdm(df_inbox.iterrows(), total=len(df_inbox)):
        rec = evaluate_email(row)
        records.append(rec)

    df_eval = pd.DataFrame(records)
    return df_eval

df_inbox_for_eval = exec_read_csv("inbox.csv")
df_eval = evaluate_dataset(df_inbox_for_eval)
df_eval.head()

Function exists: True


  0%|          | 0/5 [00:00<?, ?it/s]

[ORCH] 2025-12-01 06:56:51,606 - INFO - Processing email_id=1
INFO:core.agents.orchestrator_agent:Processing email_id=1
[ORCH] 2025-12-01 06:56:51,619 - INFO - Loaded email_id=1 in thread_id=1001 (thread length=1)
INFO:core.agents.orchestrator_agent:Loaded email_id=1 in thread_id=1001 (thread length=1)
[ORCH] 2025-12-01 06:56:51,629 - INFO - Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
INFO:core.agents.orchestrator_agent:Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
[ORCH] 2025-12-01 06:56:51,636 - INFO - KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
INFO:core.agents.orchestrator_agent:KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
[ORCH] 2025-12-0

Unnamed: 0,email_id,true_category,true_priority,pred_category,pred_priority,kb_used,reply_text,latency_sec
0,1,Brand/Sponsorship – New Inquiry,P1,,,[],,0.054182
1,2,Creator Collaboration,P2,,,[],,0.033312
2,3,Platform / Account Notifications,P2,,,[],,0.034752
3,4,PR / Media / Interviews,P1,,,[],,0.043643
4,5,Other / Uncategorized,P3,,,[],,0.034736


In [13]:
# Load saved evaluation artifacts (if present)

import json
from pathlib import Path

artifacts_dir = Path("evaluation") / "artifacts"
metrics_path = artifacts_dir / "metrics_summary.json"
report_path = artifacts_dir / "evaluation_report.csv"

sanity = {}

if metrics_path.exists():
    with open(metrics_path, "r") as f:
        metrics_summary = json.load(f)
    sanity["metrics_summary"] = metrics_summary
    print("Loaded metrics_summary.json:\n")
    pprint(metrics_summary)
else:
    print(f"metrics_summary.json not found at: {metrics_path}")

if report_path.exists():
    df_eval_file = pd.read_csv(report_path)
    sanity["df_eval_shape"] = df_eval_file.shape
    print("\nEvaluation report preview (evaluation_report.csv):")
    display(df_eval_file.head())
else:
    print(f"evaluation_report.csv not found at: {report_path}")

sanity

Loaded metrics_summary.json:

{'category_metrics': {'accuracy': 0.0,
                      'f1_macro': 0.0,
                      'precision_macro': 0.0,
                      'recall_macro': 0.0},
 'n_samples': 5,
 'priority_metrics': {'avg_distance_penalty': 2.2,
                      'macro_f1_priority': 0.0,
                      'weighted_accuracy': 0.1}}

Evaluation report preview (evaluation_report.csv):


Unnamed: 0,email_id,true_category,true_priority,pred_category,pred_priority,kb_used,reply_text,latency_sec
0,1,Brand/Sponsorship – New Inquiry,P1,,,[],,0.194015
1,2,Creator Collaboration,P2,,,[],,0.116806
2,3,Platform / Account Notifications,P2,,,[],,0.051782
3,4,PR / Media / Interviews,P1,,,[],,0.08446
4,5,Other / Uncategorized,P3,,,[],,0.139103


{'metrics_summary': {'n_samples': 5,
  'category_metrics': {'accuracy': 0.0,
   'precision_macro': 0.0,
   'recall_macro': 0.0,
   'f1_macro': 0.0},
  'priority_metrics': {'weighted_accuracy': 0.1,
   'macro_f1_priority': 0.0,
   'avg_distance_penalty': 2.2}},
 'df_eval_shape': (5, 8)}
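
The report preview above shows empty `pred_category` and `pred_priority` values for all five rows. A minimal sketch of a sanity check that flags this before the metrics are interpreted follows. The helper names `is_missing` and `missing_prediction_rate` are hypothetical and not part of the project's code. The records mirror the preview rows; in the notebook, the same shape could be obtained with `df_eval_file.to_dict("records")`.

```python
import math


def is_missing(value):
    """Return True for None, float NaN, and empty or whitespace-only strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def missing_prediction_rate(records, columns=("pred_category", "pred_priority")):
    """Fraction of records in which each prediction column is missing."""
    if not records:
        return {col: 0.0 for col in columns}
    return {
        col: sum(is_missing(row.get(col)) for row in records) / len(records)
        for col in columns
    }


# Rows shaped like the evaluation report preview (predictions are empty).
records = [
    {"email_id": 1, "true_category": "Brand/Sponsorship – New Inquiry", "pred_category": None, "pred_priority": None},
    {"email_id": 2, "true_category": "Creator Collaboration", "pred_category": None, "pred_priority": None},
    {"email_id": 3, "true_category": "Platform / Account Notifications", "pred_category": None, "pred_priority": None},
    {"email_id": 4, "true_category": "PR / Media / Interviews", "pred_category": None, "pred_priority": None},
    {"email_id": 5, "true_category": "Other / Uncategorized", "pred_category": None, "pred_priority": None},
]

rates = missing_prediction_rate(records)
print(rates)  # {'pred_category': 1.0, 'pred_priority': 1.0}
```

A rate of 1.0 means no prediction reached the report at all, so a category accuracy of 0.0 would say nothing about how well triage itself performed.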

In [17]:
# Mark misclassified rows in df_eval (if not already present)

if "is_correct_category" not in df_eval.columns:
    df_eval["is_correct_category"] = (
        df_eval["true_category"] == df_eval["pred_category"]
    )

if "is_correct_priority" not in df_eval.columns and "true_priority" in df_eval.columns:
    df_eval["is_correct_priority"] = (
        df_eval["true_priority"] == df_eval["pred_priority"]
    )

print("Evaluation dataframe shape:", df_eval.shape)
display(df_eval.head())

Evaluation dataframe shape: (5, 10)


Unnamed: 0,email_id,true_category,true_priority,pred_category,pred_priority,kb_used,reply_text,latency_sec,is_correct_category,is_correct_priority
0,1,Brand/Sponsorship – New Inquiry,P1,,,[],,0.054182,False,False
1,2,Creator Collaboration,P2,,,[],,0.033312,False,False
2,3,Platform / Account Notifications,P2,,,[],,0.034752,False,False
3,4,PR / Media / Interviews,P1,,,[],,0.043643,False,False
4,5,Other / Uncategorized,P3,,,[],,0.034736,False,False


In [18]:
# Show some misclassified examples for manual review

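misclassified_cat = df_eval[~df_eval["is_correct_category"]].copy()

# Priority correctness is optional; fall back to an empty frame when the column is absent
if "is_correct_priority" in df_eval.columns:
    misclassified_pri = df_eval[~df_eval["is_correct_priority"]].copy()
else:
    misclassified_pri = df_eval.iloc[0:0].copy()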

print(f"Total misclassified by category: {len(misclassified_cat)}")
if len(misclassified_cat):
    display(misclassified_cat.head())

if "is_correct_priority" in df_eval.columns:
    print(f"\nTotal misclassified by priority: {len(misclassified_pri)}")
    if len(misclassified_pri):
        display(misclassified_pri.head())

Total misclassified by category: 5


Unnamed: 0,email_id,true_category,true_priority,pred_category,pred_priority,kb_used,reply_text,latency_sec,is_correct_category,is_correct_priority
0,1,Brand/Sponsorship – New Inquiry,P1,,,[],,0.054182,False,False
1,2,Creator Collaboration,P2,,,[],,0.033312,False,False
2,3,Platform / Account Notifications,P2,,,[],,0.034752,False,False
3,4,PR / Media / Interviews,P1,,,[],,0.043643,False,False
4,5,Other / Uncategorized,P3,,,[],,0.034736,False,False



Total misclassified by priority: 5


Unnamed: 0,email_id,true_category,true_priority,pred_category,pred_priority,kb_used,reply_text,latency_sec,is_correct_category,is_correct_priority
0,1,Brand/Sponsorship – New Inquiry,P1,,,[],,0.054182,False,False
1,2,Creator Collaboration,P2,,,[],,0.033312,False,False
2,3,Platform / Account Notifications,P2,,,[],,0.034752,False,False
3,4,PR / Media / Interviews,P1,,,[],,0.043643,False,False
4,5,Other / Uncategorized,P3,,,[],,0.034736,False,False
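
Beyond listing individual rows, a per-category breakdown can show where errors concentrate. The following is a minimal sketch under stated assumptions: `accuracy_by_category` is a hypothetical helper, it works on plain records rather than on `df_eval` directly, and the row with `email_id=99` is invented purely to illustrate a correct prediction.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {true_category: {"n": row count, "accuracy": fraction correct}}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in records:
        true_cat = row["true_category"]
        totals[true_cat] += 1
        if row.get("pred_category") == true_cat:
            correct[true_cat] += 1
    return {
        cat: {"n": n, "accuracy": correct[cat] / n}
        for cat, n in totals.items()
    }


# Illustrative records only: the first two mirror rows from the table above
# (no prediction recorded); the third is hypothetical.
records = [
    {"email_id": 1, "true_category": "Brand/Sponsorship – New Inquiry", "pred_category": None},
    {"email_id": 2, "true_category": "Creator Collaboration", "pred_category": None},
    {"email_id": 99, "true_category": "Creator Collaboration", "pred_category": "Creator Collaboration"},
]

summary = accuracy_by_category(records)
for cat, stats in summary.items():
    print(f"{cat}: n={stats['n']}, accuracy={stats['accuracy']:.2f}")
```

With only five evaluation emails, each category has one row at most, so a breakdown like this becomes informative only on a larger labelled set.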


In [19]:
# If there are misclassified examples, inspect one with pretty_print_email_run

if not misclassified_cat.empty:
    bad_email_id = int(misclassified_cat.iloc[0]["email_id"])
    print(f"Inspecting misclassified email_id={bad_email_id}")
    pretty_print_email_run(bad_email_id)
else:
    print("No misclassified examples to inspect for category.")

[ORCH] 2025-12-01 07:04:52,710 - INFO - Processing email_id=1
INFO:core.agents.orchestrator_agent:Processing email_id=1
[ORCH] 2025-12-01 07:04:52,732 - INFO - Loaded email_id=1 in thread_id=1001 (thread length=1)
INFO:core.agents.orchestrator_agent:Loaded email_id=1 in thread_id=1001 (thread length=1)
[ORCH] 2025-12-01 07:04:52,750 - INFO - Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
INFO:core.agents.orchestrator_agent:Triage: category='Brand/Sponsorship – New Inquiry', priority='P1'
[ORCH] 2025-12-01 07:04:52,757 - INFO - KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
INFO:core.agents.orchestrator_agent:KB search: query='Partnership Opportunity with Prism Marketing Hi! We love your content and want to discuss a paid collaboration for our new product launch. Brand/Sponsorship – New Inquiry', hits=3
[ORCH] 2025-12-0

Inspecting misclassified email_id=1
EMAIL ID: 1
FROM: Emma Lopez <emma.brand@prismmarketing.com>
SUBJECT: Partnership Opportunity with Prism Marketing
--------------------------------------------------------------------------------
BODY:
None
GROUND TRUTH
- Category: Brand/Sponsorship – New Inquiry
- Priority: P1
AGENT PREDICTION
- Predicted category: None
- Predicted priority: None
- Follow-up action: {'create_ticket': True, 'ticket_payload': {'creator_id': 'creator_01', 'email_id': 1, 'type': 'Follow-up', 'title': 'Follow-up: Brand/Sponsorship – New Inquiry', 'description': 'Important opportunity: Brand/Sponsorship – New Inquiry', 'priority': 'P1'}}
DRAFT REPLY
Subject: Re: Partnership Opportunity with Prism Marketing
Hi Emma Lopez,

Thank you for reaching out and thinking of us for this campaign! (based on our internal guidelines: Email Reply – New Brand Inquiry Template)

We'd love to learn more about the product, timeline, deliverables, and budget for this collaboration. 
Please s

## 11. Discussion & Limitations


This project demonstrates that a multi-agent pipeline can successfully automate key parts of a creator’s inbox workflow. However, several limitations remain:

* The inbox dataset is synthetic and small; real-world data would introduce more complexity.
* Categories rely on a rule-based or model-based classifier that could be improved with embeddings or fine-tuned models.
* Knowledge base retrieval may miss subtle matches or depend on keyword overlap.
* Reply generation quality depends heavily on prompt structure and context.
* Ticket creation logic is deterministic and could be made more adaptive.
* In the evaluation run shown above, the report's `pred_category` and `pred_priority` columns are empty, so the 0.0 category accuracy reflects missing predictions in the evaluation output rather than measured triage quality.

These limitations provide clear opportunities for improvement in a production system.

## 12. Future Work


Potential enhancements include:

* Vector search / embeddings for smarter knowledge base retrieval
* Fine-tuned triage model trained on real creator inbox data
* Automated negotiation agent for brand deals
* Scam / fraud detector for suspicious emails
* Real email integration using Gmail or IMAP APIs
* Interactive dashboard for evaluating agent performance
* Better logging and observability for monitoring system drift
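
To make the first item concrete, here is a minimal sketch of similarity-based knowledge base retrieval. It is an illustration under stated assumptions, not the project's retrieval code: `embed`, `cosine`, and `search_kb` are hypothetical names, the word-count vectors are a stand-in for learned embeddings, and every knowledge base entry except the "Email Reply – New Brand Inquiry Template" title is invented.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: lowercase word counts.

    A learned embedding model would replace this function; the ranking
    logic below would stay the same.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search_kb(query, kb_entries, top_k=3):
    """Return the top_k (score, title) pairs, ranked by similarity to the query."""
    q = embed(query)
    scored = [
        (cosine(q, embed(f"{entry['title']} {entry['body']}")), entry["title"])
        for entry in kb_entries
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical knowledge base entries, for illustration only.
kb_entries = [
    {"title": "Email Reply – New Brand Inquiry Template",
     "body": "Reply to brands proposing a paid collaboration or sponsorship."},
    {"title": "Platform Notification Handling",
     "body": "How to handle account and platform notifications."},
    {"title": "Press Interview Guidelines",
     "body": "Guidelines for media and interview requests."},
]

results = search_kb(
    "We want to discuss a paid collaboration for our new product launch",
    kb_entries,
)
for score, title in results:
    print(f"{score:.3f}  {title}")
```

Because word counts only match on shared tokens, this sketch still has the keyword-overlap limitation noted earlier. Swapping `embed` for a dense embedding model is what would let paraphrased queries match.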

The current version establishes a strong architectural foundation that can evolve toward a fully autonomous inbox management platform.