# Week 12 – Graded Project (Use Case 5)
## Supply Chain & Operations – Incident Management (CrewAI Multi‑Agent Workflow)

**Goal:** Classify operational incidents, apply SLA/rules, recommend actions, and decide escalation/priority.

**Agents (4):**
1. Incident Classification Agent
2. Operations Rules & SLA Reasoning Agent
3. Action Recommendation Agent
4. Escalation & Priority Decision Agent

> Notes  
> • This notebook uses **static policies + mock rules** (no RAG).  
> • If you see **OpenAI connection errors**, verify internet/proxy and that `OPENAI_API_KEY` is set.


## 1) Environment Setup

### 1.1 Install dependencies
Run (once) in your activated environment:

```bash
pip install -U crewai langchain-openai pydantic python-dotenv ipykernel jupyter
```

### 1.2 Set API key
You can set it as an environment variable (recommended) or in a `.env` file.


In [None]:
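import os

from dotenv import load_dotenv

# Load variables from a local `.env` file if one exists, so the `.env` option
# described above actually takes effect. Variables already set in the
# environment are not overridden.
load_dotenv()

# --- Optional: reduce noisy telemetry / tracing network calls ---
# These environment variables are safe no-ops if not used by your setup,
# but often help in restricted corporate networks.
os.environ.setdefault("OTEL_SDK_DISABLED", "true")
os.environ.setdefault("CREWAI_TRACING_ENABLED", "false")
os.environ.setdefault("CREWAI_DISABLE_TELEMETRY", "true")

# Print only whether the key is present, never the key itself.
print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))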


## 2) Imports

In [None]:
import json
import re
import time
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field

from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI


## 3) Static Policies / SLA Rules (Non‑RAG)

In [None]:
OPERATIONS_POLICY = """You are operating under these simplified (static) supply‑chain policies:

A) Incident types:
- shipment_delay: delayed pickup/in-transit/out-for-delivery beyond SLA
- inventory_mismatch: physical vs system stock mismatch
- cold_chain_breach: temperature excursion / reefer issue for perishable goods
- vendor_issue: vendor non-compliance, missed ASN, quality/packaging issues
- process_failure: label failure, WMS outage, routing failure, documentation issue
- general_query: anything else

B) SLA bands (minutes/hours are examples):
- P0 (Critical): safety, perishables spoilage, widespread outage, theft pattern
  Target response: 15 min, mitigation immediately, escalate to on-call manager
- P1 (High): high-value shipment missing, repeated failures, customer-committed breach
  Target response: 1 hour, mitigation same day, escalate if unresolved
- P2 (Medium): single delay <24h, minor mismatch, vendor late but recoverable
  Target response: 4 hours, mitigation within 1-2 days
- P3 (Low): informational requests, minor status checks
  Target response: 1 business day

C) Escalation triggers:
- Cold chain breach OR temperature excursion => always escalate (P0)
- High-value loss/theft suspected => escalate (P1/P0)
- Any incident with confidence < 0.60 => escalate (uncertainty)
- Urgent keywords: 'urgent', 'ASAP', 'today', 'stuck at customs' increase priority
- Repeated issue keywords: 'again', 'third time', 'recurring' increase priority

D) Allowed actions (examples):
- Create incident ticket, notify carrier, initiate trace, place hold, re-route
- Trigger cycle count, quarantine inventory, block vendor, request CAPA
- Dispatch cold chain recovery SOP, move to backup reefer, quality inspection
"""
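
The SLA bands in section B of the policy can also be held as a plain lookup table, which gives a deterministic value to compare against whatever `sla_response_time` the SLA agent returns. The sketch below is illustrative only: `SLA_RESPONSE_TARGETS` and `expected_sla_response` are hypothetical names that do not appear elsewhere in this notebook, and the values are copied from the policy text above.

```python
from typing import Dict

# Target response times copied from policy section B.
# The dict and helper names are illustrative, not part of the notebook.
SLA_RESPONSE_TARGETS: Dict[str, str] = {
    "P0": "15 min",
    "P1": "1 hour",
    "P2": "4 hours",
    "P3": "1 business day",
}


def expected_sla_response(priority: str) -> str:
    """Return the policy's target response time for a priority band."""
    key = priority.strip().upper()
    if key not in SLA_RESPONSE_TARGETS:
        raise ValueError(f"Unknown priority band: {priority!r}")
    return SLA_RESPONSE_TARGETS[key]


print(expected_sla_response("p0"))  # 15 min
```

A check like this could flag cases where the agent's free-text SLA drifts from the policy, without replacing the agent's reasoning.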

## 4) Data Models (Structured Outputs)

In [None]:
class IncidentClassification(BaseModel):
    incident_type: str = Field(..., description="One of the policy incident types")
    confidence: float = Field(..., ge=0.0, le=1.0)
    key_entities: List[str] = Field(default_factory=list, description="Ids like shipment/order/SKU/vendor/location")
    summary: str

class SLADecision(BaseModel):
    priority: str = Field(..., description="P0/P1/P2/P3")
    sla_response_time: str = Field(..., description="e.g., '15 min', '1 hour'")
    policy_notes: str
    allowed_actions: List[str] = Field(default_factory=list)

class ActionPlan(BaseModel):
    immediate_actions: List[str]
    next_steps_24h: List[str]
    info_needed: List[str] = Field(default_factory=list)

class EscalationDecision(BaseModel):
    escalate: bool
    escalation_reason: str = ""

class IncidentHandoff(BaseModel):
    customer_query: str
    incident_type: Optional[str] = None
    confidence: Optional[float] = None
    priority: Optional[str] = None
    sla_response_time: Optional[str] = None
    policy_notes: Optional[str] = None
    allowed_actions: List[str] = Field(default_factory=list)
    immediate_actions: List[str] = Field(default_factory=list)
    next_steps_24h: List[str] = Field(default_factory=list)
    info_needed: List[str] = Field(default_factory=list)
    escalate: bool = False
    escalation_reason: str = ""
    risk_score: Optional[int] = None
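
Because `incident_type` is declared as a plain `str`, a label outside the policy list (for example `"delay"`) would pass through unchanged. One possible guard, sketched here with the standard library only, normalizes the classifier's raw dict before it is used. The helper name `normalize_classification` is hypothetical, and resetting confidence to 0.0 for an unknown label is an assumption made so that the low-confidence escalation trigger fires.

```python
from typing import Any, Dict

# Incident types copied from policy section A.
ALLOWED_INCIDENT_TYPES = {
    "shipment_delay",
    "inventory_mismatch",
    "cold_chain_breach",
    "vendor_issue",
    "process_failure",
    "general_query",
}


def normalize_classification(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Coerce a raw classification dict to policy values (hypothetical helper)."""
    incident_type = str(raw.get("incident_type", "")).strip().lower()
    try:
        confidence = float(raw.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    # Clamp to the 0..1 range the data model expects.
    confidence = max(0.0, min(1.0, confidence))

    if incident_type not in ALLOWED_INCIDENT_TYPES:
        # Assumption: an off-policy label is treated as an uncertain general query.
        incident_type = "general_query"
        confidence = 0.0

    return {**raw, "incident_type": incident_type, "confidence": confidence}


print(normalize_classification({"incident_type": "delay", "confidence": 0.95}))
# {'incident_type': 'general_query', 'confidence': 0.0}
```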


## 5) LLM + Agents

In [None]:
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

intent_agent = Agent(
    role="Incident Classification Agent",
    goal="Classify the incident type and extract key entities for an operations team.",
    backstory="You route operational incidents for a supply chain control tower.",
    llm=llm,
    verbose=False,
)

sla_agent = Agent(
    role="Operations Rules & SLA Reasoning Agent",
    goal="Apply the static policy to decide priority, SLA response time, and allowed actions.",
    backstory="You enforce operations policy and SLA bands consistently.",
    llm=llm,
    verbose=False,
)

action_agent = Agent(
    role="Action Recommendation Agent",
    goal="Propose practical immediate actions and next steps based on incident type and SLA.",
    backstory="You are a senior operations coordinator who knows standard playbooks.",
    llm=llm,
    verbose=False,
)

escalation_agent = Agent(
    role="Escalation & Priority Decision Agent",
    goal="Decide whether to escalate and why, based on risk/urgency/uncertainty and policy triggers.",
    backstory="You are the on-call incident commander who escalates only when needed.",
    llm=llm,
    verbose=False,
)


## 6) Prompts (Templates)

In [None]:
INTENT_PROMPT = """{policy}

You are the Incident Classification Agent.
Classify the customer query into ONE of:
- shipment_delay
- inventory_mismatch
- cold_chain_breach
- vendor_issue
- process_failure
- general_query

Return ONLY valid JSON with keys:
incident_type (string), confidence (number 0..1), key_entities (array of strings), summary (string)

Customer query:
{customer_query}
"""

SLA_PROMPT = """{policy}

You are the Operations Rules & SLA Reasoning Agent.
Given:
- customer_query: {customer_query}
- incident_type: {incident_type}
- confidence: {confidence}
Decide:
- priority: P0/P1/P2/P3
- sla_response_time: string
- policy_notes: short explanation referencing the static policy
- allowed_actions: list of actions allowed/appropriate

Return ONLY valid JSON with keys:
priority, sla_response_time, policy_notes, allowed_actions
"""

ACTION_PROMPT = """{policy}

You are the Action Recommendation Agent.
Context:
customer_query: {customer_query}
incident_type: {incident_type}
priority: {priority}
sla_response_time: {sla_response_time}
allowed_actions: {allowed_actions}

Return ONLY valid JSON with keys:
immediate_actions (array), next_steps_24h (array), info_needed (array)
"""

ESCALATION_PROMPT = """{policy}

You are the Escalation & Priority Decision Agent.
Context:
customer_query: {customer_query}
incident_type: {incident_type}
confidence: {confidence}
priority: {priority}
immediate_actions: {immediate_actions}

Apply policy triggers and decide escalation.
Return ONLY valid JSON with keys:
escalate (true/false), escalation_reason (string)
"""
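
Each template relies on `{name}` placeholders being filled from the `inputs` dict at kickoff, so a misspelled or missing key is easy to overlook. The sketch below lists the placeholders a template uses and reports any that the inputs do not cover. It assumes the placeholders follow Python's `str.format` syntax, which is how the templates above are written; the helper name `missing_placeholders` is hypothetical, and CrewAI's own interpolation may differ in details.

```python
from string import Formatter
from typing import Any, Dict, Set


def missing_placeholders(template: str, inputs: Dict[str, Any]) -> Set[str]:
    """Return placeholder names used in `template` that `inputs` does not provide."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return names - set(inputs)


template = "Context:\ncustomer_query: {customer_query}\npriority: {priority}\n"
print(missing_placeholders(template, {"customer_query": "Shipment ABC123 is delayed"}))
# {'priority'}
```

Running such a check for each template and its `inputs` before calling the crew surfaces wiring mistakes without spending an API call.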

## 7) Helpers: JSON parsing, mini-crews, risk scoring

In [None]:
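def parse_json_strict(text: str) -> Dict[str, Any]:
    """Parse agent output as JSON.

    Tries the whole string first. If that fails (for example, the model wrapped
    the JSON in prose or a code fence), falls back to the span from the first
    '{' to the last '}'.
    """
    text = str(text).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Greedy match: first '{' through last '}' in the output.
        match = re.search(r"\{[\s\S]*\}", text)
        if not match:
            raise ValueError(f"No JSON found in output. Raw (first 500 chars):\n{text[:500]}")
        return json.loads(match.group(0))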

def run_one(agent: Agent, prompt: str, expected_output: str, inputs: Dict[str, Any], verbose: bool = False) -> Dict[str, Any]:
    task = Task(description=prompt, expected_output=expected_output, agent=agent)
    crew = Crew(agents=[agent], tasks=[task], verbose=verbose)
    result = crew.kickoff(inputs=inputs)
    return parse_json_strict(str(result))

def simple_risk_score(query: str, incident_type: str, confidence: float) -> int:
    q = query.lower()
    score = 0

    # base by incident type
    if incident_type == "cold_chain_breach":
        score += 80
    elif incident_type in {"shipment_delay", "inventory_mismatch", "vendor_issue", "process_failure"}:
        score += 35
    else:
        score += 15

    # urgency / repetition (includes 'stuck at customs' from policy section C)
    if any(k in q for k in ["urgent", "asap", "today", "immediately", "stuck at customs"]):
        score += 15
    if any(k in q for k in ["again", "recurring", "third time", "multiple times"]):
        score += 10

    # loss / theft / missing high value
    if any(k in q for k in ["stolen", "theft", "missing", "never received", "lost"]):
        score += 15
    if any(k in q for k in ["expensive", "high value", "critical"]):
        score += 10

    # uncertainty
    if confidence < 0.60:
        score += 20

    return max(0, min(100, score))
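
The regex fallback in `parse_json_strict` matches from the first `{` to the last `}`, so it fails when the model emits two separate JSON objects or a stray closing brace after the JSON. An alternative, sketched below, uses the standard library's `json.JSONDecoder.raw_decode` to parse the first complete object and ignore whatever follows. The function name `extract_first_json_object` is hypothetical and is not used elsewhere in this notebook.

```python
import json
from typing import Any, Dict


def extract_first_json_object(text: str) -> Dict[str, Any]:
    """Return the first complete JSON object found in `text`."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("No JSON object found in text")


raw = 'Result: {"escalate": true, "escalation_reason": "P0"} Note: {"debug": 1}'
print(extract_first_json_object(raw))
# {'escalate': True, 'escalation_reason': 'P0'}
```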


## 8) Orchestrator (Goal‑Oriented Multi‑Agent Workflow)

In [None]:
def run_incident_workflow(customer_query: str, verbose: bool = False) -> IncidentHandoff:
    # 1) classify
    intent_out = run_one(
        intent_agent,
        INTENT_PROMPT,
        expected_output="JSON: incident_type, confidence, key_entities, summary",
        inputs={"policy": OPERATIONS_POLICY, "customer_query": customer_query},
        verbose=verbose
    )
    incident_type = intent_out.get("incident_type", "general_query")
    confidence = float(intent_out.get("confidence", 0.5))

    # 2) SLA/policy
    sla_out = run_one(
        sla_agent,
        SLA_PROMPT,
        expected_output="JSON: priority, sla_response_time, policy_notes, allowed_actions",
        inputs={
            "policy": OPERATIONS_POLICY,
            "customer_query": customer_query,
            "incident_type": incident_type,
            "confidence": confidence
        },
        verbose=verbose
    )

    # 3) action plan
    action_out = run_one(
        action_agent,
        ACTION_PROMPT,
        expected_output="JSON: immediate_actions, next_steps_24h, info_needed",
        inputs={
            "policy": OPERATIONS_POLICY,
            "customer_query": customer_query,
            "incident_type": incident_type,
            "priority": sla_out.get("priority"),
            "sla_response_time": sla_out.get("sla_response_time"),
            "allowed_actions": sla_out.get("allowed_actions", [])
        },
        verbose=verbose
    )

    # 4) escalation
    escalation_out = run_one(
        escalation_agent,
        ESCALATION_PROMPT,
        expected_output="JSON: escalate, escalation_reason",
        inputs={
            "policy": OPERATIONS_POLICY,
            "customer_query": customer_query,
            "incident_type": incident_type,
            "confidence": confidence,
            "priority": sla_out.get("priority"),
            "immediate_actions": action_out.get("immediate_actions", [])
        },
        verbose=verbose
    )

    handoff = IncidentHandoff(
        customer_query=customer_query,
        incident_type=incident_type,
        confidence=confidence,
        priority=sla_out.get("priority"),
        sla_response_time=sla_out.get("sla_response_time"),
        policy_notes=sla_out.get("policy_notes"),
        allowed_actions=sla_out.get("allowed_actions", []),
        immediate_actions=action_out.get("immediate_actions", []),
        next_steps_24h=action_out.get("next_steps_24h", []),
        info_needed=action_out.get("info_needed", []),
        escalate=bool(escalation_out.get("escalate", False)),
        escalation_reason=escalation_out.get("escalation_reason", "")
    )

    # deterministic safeguards
    handoff.risk_score = simple_risk_score(customer_query, incident_type, confidence)

    # The first matching trigger supplies the override reason.
    override_reason = None
    if incident_type == "cold_chain_breach":
        override_reason = "Cold chain breach trigger (policy): always escalate."
    elif confidence < 0.60:
        override_reason = "Low confidence classification; escalate for review."
    elif handoff.risk_score >= 70:
        override_reason = "High risk score based on static rules."

    if override_reason:
        # Keep the LLM's reason only if the LLM itself escalated. A reason it gave
        # for NOT escalating would contradict the forced escalation.
        if not handoff.escalate or not handoff.escalation_reason:
            handoff.escalation_reason = override_reason
        handoff.escalate = True

    return handoff

print("Orchestrator ready.")


## 9) Quick Test (Single Query)

In [None]:
q = "We have a temperature excursion in a reefer truck for dairy shipment. Urgent. What should we do?"
out = run_incident_workflow(q, verbose=False)
print(out.model_dump_json(indent=2))


## 10) End-to-End Tests (Sample Inputs)

In [None]:
TEST_QUERIES = [
    # Routine
    "Shipment ABC123 is delayed by 6 hours due to weather. What is the updated ETA?",
    "Warehouse says SKU-778 stock is 120 but system shows 40. Please advise.",
    "Vendor VND-21 missed the ASN again and cartons arrived damaged.",
    "Our WMS label printing failed and carriers are refusing pickup.",
    # Edge / high risk
    "Reefer temp went above threshold for 2 hours for vaccine shipment. Urgent!",
    "We suspect theft: multiple high-value parcels missing from hub BLR-DC2.",
    "Customs hold: shipment stuck at customs and customer committed date is today.",
    "This is the third time this lane has failed this week. Recurring delivery failures."
]

for q in TEST_QUERIES:
    print("\n" + "="*110)
    print("INCIDENT:", q)
    try:
        out = run_incident_workflow(q, verbose=False)
        print(out.model_dump_json(indent=2))
    except Exception as e:
        print("ERROR:", e)


## 11) Architecture Summary (What to write in your report)

- **Agent roles**
  - Classification: maps query → incident_type + confidence + entities
  - SLA/Policy: chooses priority + SLA + allowed actions based on static policy
  - Action: produces an action plan constrained by allowed actions and priority
  - Escalation: decides escalation based on triggers + uncertainty

- **Task flow / handoffs**
  - Each step consumes prior step outputs (incident_type, confidence, priority, allowed_actions).

- **Escalation logic**
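  - Policy triggers: `cold_chain_breach` always escalates; classification confidence below 0.60 escalates
  - A deterministic risk score (0 to 100) adds a final safeguard: a score of 70 or above forces escalation regardless of the LLM's decision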

- **Why this design**
  - Mirrors real control-tower operations: triage → SLA decision → playbook → escalation.
