# Tooling as a Product Surface (Simple Walkthrough)

This notebook is a **guided demo** showing how to design and operate tools as a first-class product surface for an agentic system:
1) Tools have **clear contracts** (typed inputs/outputs).  
2) Every call runs through **validation → permissions → (optional) dry-run → execution → postconditions → evidence → logging**.  
3) **Idempotency** prevents duplicate side effects.  
4) You’ll see a **visual pipeline**, **tables**, and **charts** that explain what happened.

> Runs entirely offline with simulated data and virtual services.

### Visualizations
- A **pipeline diagram** of a safe tool call.
- A **catalog table** of tools and their contracts.
- A **scenario run** with calls and outcomes (OK, error, dry-run, idempotent return).
- **Charts** for outcome counts and per-call durations.
- A clean **tool audit log** (like a black box recorder).

### Quick scenario controls
- `ROLE`: permissions profile (`analyst` or `manager`)
- `DRY_RUN`: preview actions without side effects
- `FAILURE_RATE`: how “messy” the simulated world is (0.0 to 0.35). About half of this rate shows up as raised errors (timeouts, bad IDs) and a further quarter as stale-data hints.
- `STEP_BUDGET`: max number of tool calls allowed in the scenario
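
The ranges above can be checked before a run starts. The following is a minimal sketch and not part of the notebook itself: `check_controls` is a hypothetical helper, and treating `STEP_BUDGET` as a non-negative integer is an assumption.

```python
# Hypothetical helper: validates the four scenario controls described above.
ALLOWED_ROLES = {"analyst", "manager"}   # roles listed in the controls
MAX_FAILURE_RATE = 0.35                  # upper bound stated in the controls

def check_controls(role, dry_run, failure_rate, step_budget):
    """Return a list of problems; an empty list means the controls are usable."""
    problems = []
    if role not in ALLOWED_ROLES:
        problems.append(f"ROLE must be one of {sorted(ALLOWED_ROLES)}, got {role!r}")
    if not isinstance(dry_run, bool):
        problems.append("DRY_RUN must be True or False")
    if not (0.0 <= failure_rate <= MAX_FAILURE_RATE):
        problems.append(f"FAILURE_RATE must be within 0.0..{MAX_FAILURE_RATE}")
    # Assumption: the budget is a whole number of calls and may be zero.
    if isinstance(step_budget, bool) or not isinstance(step_budget, int) or step_budget < 0:
        problems.append("STEP_BUDGET must be a non-negative integer")
    return problems

print(check_controls("analyst", False, 0.15, 40))   # no problems
print(check_controls("intern", False, 0.5, 40))     # bad role and bad rate
```

Returning a list of problems rather than raising on the first one lets a reader fix all the controls in a single pass.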

In [None]:
ROLE = "analyst"       # "analyst" or "manager"
DRY_RUN = False        # True runs with previews; False performs side effects
FAILURE_RATE = 0.15    # 0.0 (easy) ... 0.35 (harder)
STEP_BUDGET = 40       # budget for number of tool calls

import uuid
RUN_ID = f"tools-{uuid.uuid4().hex[:8]}"
print({"RUN_ID": RUN_ID, "ROLE": ROLE, "DRY_RUN": DRY_RUN, "FAILURE_RATE": FAILURE_RATE, "STEP_BUDGET": STEP_BUDGET})

## Workflow
Propose → Validate schema → Check permissions → Dry-run? → Execute → Verify postconditions → Attach evidence → Log
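
One way to read this workflow is as an ordered list of gates, where the first gate that objects stops the call. The sketch below illustrates only that control flow. `run_gates` and the two toy gates are hypothetical names, and the notebook's own wrapper (`tool_call`, defined later) is structured differently.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A gate inspects the call and returns an error message, or None to let it pass.
Gate = Callable[[Dict], Optional[str]]

def validate(call: Dict) -> Optional[str]:
    # Toy schema check (assumption): every call must carry an "order_id" string.
    return None if isinstance(call["args"].get("order_id"), str) else "missing order_id"

def check_permissions(call: Dict) -> Optional[str]:
    # Toy permission check (assumption): only these two roles may call tools.
    return None if call["role"] in {"analyst", "manager"} else "permission denied"

def run_gates(call: Dict, gates: List[Tuple[str, Gate]]) -> Dict:
    """Run gates in order; stop at the first failure and report which stage failed."""
    passed = []
    for name, gate in gates:
        err = gate(call)
        if err is not None:
            return {"status": "error", "failed_at": name, "error": err, "passed": passed}
        passed.append(name)
    return {"status": "ok", "passed": passed}

GATES = [("validate", validate), ("permissions", check_permissions)]
print(run_gates({"args": {"order_id": "O100"}, "role": "analyst"}, GATES))
print(run_gates({"args": {}, "role": "guest"}, GATES))
```

Because the gates run in order, a call that is both malformed and unauthorized reports only the validation failure.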

In [None]:
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, FancyArrowPatch

plt.figure(figsize=(12, 2.6))
ax = plt.gca(); ax.axis("off")
stages = ["Propose", "Validate", "Permissions", "Dry-run?", "Execute", "Post-\nconditions", "Evidence", "Log"]
x = 0.02
for s in stages:
    ax.add_patch(Rectangle((x, 0.35), 0.1, 0.3, fill=False))
    ax.text(x+0.05, 0.5, s, ha="center", va="center")
    x += 0.12
x = 0.12
for _ in range(len(stages)-1):
    ax.add_patch(FancyArrowPatch((x, 0.5), (x+0.02, 0.5), arrowstyle='->'))
    x += 0.12
plt.title("Tooling as a Product Surface — Pipeline")
plt.show()

## Setup: data, policy, tool registry, and wrappers

We simulate orders, refunds, and cases. Each tool is wrapped with:
- **Schemas** for inputs/outputs
- **Permission checks** based on role
- **Idempotency** for side effects
- **Dry-run** previews
- **Postconditions** (verify the action worked)
- **Evidence** links (where the fact/action came from)
- **Structured audit logs**
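
Idempotency is probably the least obvious item in this list, so here is a minimal sketch under stated assumptions. `issue_once`, `_seen`, and `ledger` are hypothetical names. The default key (tool name plus JSON-serialized arguments with sorted keys) mirrors what the notebook's wrapper does when no explicit key is supplied.

```python
import json

_seen = {}     # idempotency key -> first result (hypothetical in-memory store)
ledger = []    # stands in for a system with real side effects

def issue_once(tool, args, idem_key=None):
    """Perform the side effect at most once per key; replay the stored result after that."""
    # Default key: tool name plus canonical JSON of the arguments.
    key = idem_key or f"{tool}:{json.dumps(args, sort_keys=True)}"
    if key in _seen:
        return {"replayed": True, **_seen[key]}
    ledger.append(dict(args))                      # the side effect
    result = {"refund_id": f"R{len(ledger):03d}"}
    _seen[key] = result
    return {"replayed": False, **result}

first = issue_once("refund.issue", {"order_id": "O100", "amount": 20.0})
# Same arguments in a different key order still map to the same key.
again = issue_once("refund.issue", {"amount": 20.0, "order_id": "O100"})
print(first, again, len(ledger))
```

Sorting the keys before serializing is what makes the two calls collide. Without it, argument order alone could produce a second refund.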

In [None]:
import random, time, json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Callable

random.seed(7)

# --- Simulated Systems ---
ORDERS = {
    "O100": {"account_id": "A100", "total": 120.00, "status": "settled"},
    "O200": {"account_id": "A200", "total": 58.50, "status": "settled"},
    "O300": {"account_id": "A300", "total": 199.99, "status": "settled"},
}
REFUNDS = []  # ledger
CASES = []    # simple case store

POLICY = {
    "refund_limits": {"analyst": 50.0, "manager": 500.0},
    "allowed_reasons": ["late", "damaged", "other"]
}

# --- Utilities ---
def _now_ms(): 
    return int(time.time() * 1000)

class FailureInjector:
    def __init__(self, rate=0.1, latency=True):
        self.rate = rate; self.latency = latency
    def maybe_fail(self):
        if self.latency:
            time.sleep(random.uniform(0.005, 0.03))
        r = random.random()
        if r < self.rate * 0.25:  # timeout
            raise TimeoutError("simulated timeout")
        if r < self.rate * 0.5:   # not found / bad id
            raise ValueError("simulated 404/bad-id")
        if r < self.rate * 0.75:  # stale data hint
            return {"stale": True}
        return {}

FAIL = FailureInjector(rate=FAILURE_RATE, latency=True)

# --- Tool Contracts ---
@dataclass
class ToolSpec:
    name: str
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]
    permissions: List[str]
    side_effects: bool = False
    version: str = "v1"

@dataclass
class ToolResult:
    status: str
    data: Dict[str, Any] = field(default_factory=dict)
    errors: List[str] = field(default_factory=list)
    evidence: List[Dict[str, str]] = field(default_factory=list)
    idempotency_key: str = ""
    version: str = "v1"
    duration_ms: int = 0

class ToolRegistry:
    def __init__(self):
        self.specs: Dict[str, ToolSpec] = {}
        self.impls: Dict[str, Callable] = {}
        self.idem_store: Dict[str, ToolResult] = {}  # cache for idempotency
        self.audit: List[Dict[str, Any]] = []
    def register(self, spec: ToolSpec, fn: Callable):
        self.specs[spec.name] = spec
        self.impls[spec.name] = fn
    def list(self):
        return list(self.specs.values())

REG = ToolRegistry()

def validate_inputs(spec: ToolSpec, args: Dict[str, Any]) -> List[str]:
    errs = []
    for k, rule in spec.inputs.items():
        if rule.get("required") and k not in args:
            errs.append(f"missing required '{k}'")
        if k in args and "type" in rule:
            if rule["type"] == "string" and not isinstance(args[k], str):
                errs.append(f"'{k}' must be string")
            # bool is a subclass of int in Python, so reject it explicitly
            if rule["type"] == "number" and (isinstance(args[k], bool) or not isinstance(args[k], (int, float))):
                errs.append(f"'{k}' must be number")
        if k in args and "enum" in rule and args[k] not in rule["enum"]:
            errs.append(f"'{k}' must be one of {rule['enum']}")
    return errs

def tool_call(tool: str, args: Dict[str, Any], role: str, dry_run: bool, idem_key: str = None) -> ToolResult:
    start = _now_ms()
    spec = REG.specs[tool]
    # 1) Schema validation (runs first, matching the documented pipeline order)
    v_errs = validate_inputs(spec, args)
    if v_errs:
        res = ToolResult(status="error", errors=v_errs)
        res.duration_ms = _now_ms() - start
        REG.audit.append({"ts": _now_ms(), "tool": tool, "args": args, "role": role, "dry_run": dry_run, "result": res.__dict__})
        return res
    # 2) Permissions
    if role not in spec.permissions:
        res = ToolResult(status="error", errors=[f"permission denied for role '{role}'"])
        res.duration_ms = _now_ms() - start
        REG.audit.append({"ts": _now_ms(), "tool": tool, "args": args, "role": role, "dry_run": dry_run, "result": res.__dict__})
        return res
    # 3) Idempotency (only for side effects)
    idem_key = idem_key or (f"{tool}:{json.dumps(args, sort_keys=True)}" if spec.side_effects else "")
    if spec.side_effects and idem_key in REG.idem_store:
        prev = REG.idem_store[idem_key]
        res = ToolResult(status="ok", data=prev.data, evidence=prev.evidence, idempotency_key=idem_key, version=spec.version)
        res.duration_ms = _now_ms() - start
        REG.audit.append({"ts": _now_ms(), "tool": tool, "args": args, "role": role, "dry_run": dry_run, "result": res.__dict__, "note": "idempotent-return"})
        return res
    # 4) Dry-run preview
    if dry_run and spec.side_effects:
        preview = {"would_perform": tool, "args": args}
        res = ToolResult(status="ok", data=preview, idempotency_key=idem_key, version=spec.version, evidence=[{"type":"preview","url":f"{tool}://dry-run"}])
        res.duration_ms = _now_ms() - start
        REG.audit.append({"ts": _now_ms(), "tool": tool, "args": args, "role": role, "dry_run": dry_run, "result": res.__dict__})
        return res
    # 5) Execute impl (with simulated failures)
    try:
        mark = FAIL.maybe_fail()
        impl = REG.impls[tool]
        out = impl(args, mark)  # impl returns a dict with "data" and optional "evidence" / "post_ok"
        res = ToolResult(status="ok", data=out["data"], evidence=out.get("evidence", []), idempotency_key=idem_key, version=spec.version)
        # 6) Postconditions (simple checks)
        if spec.side_effects and not out.get("post_ok", True):
            res = ToolResult(status="error", errors=["postcondition failed"], idempotency_key=idem_key)
        # Cache for idempotency
        if spec.side_effects and res.status == "ok":
            REG.idem_store[idem_key] = res
    except Exception as e:
        res = ToolResult(status="error", errors=[str(e)], idempotency_key=idem_key)
    res.duration_ms = _now_ms() - start
    REG.audit.append({"ts": _now_ms(), "tool": tool, "args": args, "role": role, "dry_run": dry_run, "result": res.__dict__})
    return res

## Tools implemented in this demo
- `order.lookup(order_id)` → read-only lookup with evidence link  
- `refund.issue(order_id, amount, reason)` → side effect with limits, idempotency, evidence  
- `case.create(account_id, title, severity)` → side effect with simple post-check
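
To make the contract idea concrete, the sketch below checks sample arguments against rules in the same `required` / `type` / `enum` shape that the notebook uses for `refund.issue`. `check_args` is a hypothetical stand-in for the notebook's `validate_inputs`, written out so the example runs without the rest of the notebook. Rejecting booleans for `number` fields is a choice made in this sketch.

```python
REFUND_INPUTS = {
    "order_id": {"type": "string", "required": True},
    "amount":   {"type": "number", "required": True},
    "reason":   {"type": "string", "required": True, "enum": ["late", "damaged", "other"]},
}
PY_TYPES = {"string": str, "number": (int, float)}

def check_args(rules, args):
    """Return a list of human-readable contract violations (empty if args conform)."""
    errs = []
    for key, rule in rules.items():
        if key not in args:
            if rule.get("required"):
                errs.append(f"missing required '{key}'")
            continue
        value = args[key]
        expected = PY_TYPES.get(rule.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            errs.append(f"'{key}' must be {rule['type']}")
        if "enum" in rule and value not in rule["enum"]:
            errs.append(f"'{key}' must be one of {rule['enum']}")
    return errs

print(check_args(REFUND_INPUTS, {"order_id": "O100", "amount": 20.0, "reason": "late"}))
print(check_args(REFUND_INPUTS, {"order_id": 100, "reason": "lost"}))
```

The second call reports three separate problems (wrong type, missing field, value outside the enum), which is the kind of feedback an agent needs to repair a call in one attempt.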

In [None]:
# order.lookup
spec_lookup = ToolSpec(
    name="order.lookup",
    inputs={"order_id": {"type":"string", "required": True}},
    outputs={"order": {"type":"object"}},
    permissions=["analyst","manager"],
    side_effects=False
)
def impl_lookup(args, mark):
    oid = args["order_id"]
    if oid not in ORDERS: 
        raise ValueError("order not found")
    return {"data": {"order": ORDERS[oid]}, "evidence": [{"type":"source","url":f"order://{oid}"}]}
REG.register(spec_lookup, impl_lookup)

# refund.issue
spec_refund = ToolSpec(
    name="refund.issue",
    inputs={
        "order_id": {"type":"string", "required": True},
        "amount":   {"type":"number", "required": True},
        "reason":   {"type":"string", "required": True, "enum": POLICY["allowed_reasons"]},
    },
    outputs={"refund_id": {"type":"string"}},
    permissions=["analyst","manager"],
    side_effects=True
)
def impl_refund(args, mark):
    oid, amount, reason = args["order_id"], args["amount"], args["reason"]
    if oid not in ORDERS: 
        raise ValueError("order not found")
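    # Caveat: this reads the notebook-level ROLE; the role passed to tool_call() is not forwarded to impls.
    limit = POLICY["refund_limits"].get(ROLE, 0.0)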
    if amount > limit: 
        raise PermissionError(f"amount exceeds role limit: {limit}")
    rid = f"R{len(REFUNDS)+1:03d}"
    REFUNDS.append({"refund_id": rid, "order_id": oid, "amount": amount, "reason": reason})
    post_ok = any(r["refund_id"] == rid for r in REFUNDS)
    return {"data": {"refund_id": rid}, "evidence": [{"type":"ledger","url":f"refund://{rid}"}], "post_ok": post_ok}
REG.register(spec_refund, impl_refund)

# case.create
spec_case = ToolSpec(
    name="case.create",
    inputs={
        "account_id": {"type":"string", "required": True},
        "title": {"type":"string", "required": True},
        "severity": {"type":"string", "required": True, "enum": ["low","medium","high"]}
    },
    outputs={"case_id": {"type":"string"}},
    permissions=["analyst","manager"],
    side_effects=True
)
def impl_case(args, mark):
    cid = f"C{len(CASES)+1:03d}"
    CASES.append({"case_id": cid, **args})
    post_ok = any(c["case_id"] == cid for c in CASES)
    return {"data": {"case_id": cid}, "evidence": [{"type":"ticket","url":f"case://{cid}"}], "post_ok": post_ok}
REG.register(spec_case, impl_case)

## Tool Catalog (contracts, permissions)

In [None]:
import pandas as pd

catalog = []
for spec in REG.list():
    catalog.append({
        "tool": spec.name,
        "side_effects": spec.side_effects,
        "permissions": ",".join(spec.permissions),
        "inputs": json.dumps(spec.inputs),
        "outputs": json.dumps(spec.outputs),
        "version": spec.version,
    })
catalog_df = pd.DataFrame(catalog)

display(catalog_df)

## Run Scenario
This run intentionally includes a duplicate call and a policy edge case to show idempotency and role-based limits in action.

What to expect with the default controls (`ROLE = "analyst"`, `DRY_RUN = False`):
1. Every tool call goes through **schema validation** and **permission checks**.
2. **Idempotency** stops the duplicate refund from being issued twice.
3. The **over-limit refund** fails for an analyst (it would succeed for a manager).
4. **Dry-run mode** (toggle at the top) shows what would happen without side effects.
5. Every call writes a **structured audit record**, and successful calls return **evidence links**.

With `FAILURE_RATE > 0`, any executed call can also fail with a simulated timeout or bad-ID error, so individual outcomes may differ from the list above.
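
The over-limit case comes down to a single comparison against the per-role limit. Here is a minimal sketch using the same limits as `POLICY` in this notebook. `within_limit` is a hypothetical helper, and giving unknown roles a limit of 0.0 matches the fallback used in `impl_refund`.

```python
REFUND_LIMITS = {"analyst": 50.0, "manager": 500.0}  # same values as POLICY["refund_limits"]

def within_limit(role: str, amount: float) -> bool:
    """True if the role may refund this amount; unknown roles get a limit of 0.0."""
    return amount <= REFUND_LIMITS.get(role, 0.0)

for role in ("analyst", "manager"):
    for amount in (20.0, 120.0):
        verdict = "allowed" if within_limit(role, amount) else "over limit"
        print(f"{role:8s} {amount:6.2f} {verdict}")
```

Only the analyst's 120.00 refund is rejected, which is the error the scenario is designed to trigger.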

In [None]:
# Reset audit and state for a clean run
REG.audit.clear()
REG.idem_store.clear()
REFUNDS.clear(); CASES.clear()

# Bounded run helper
def bounded_call(tool, args, idem=None):
    global STEP_BUDGET
    if STEP_BUDGET <= 0:
        return ToolResult(status="error", errors=["step budget exceeded"])
    STEP_BUDGET -= 1
    return tool_call(tool, args, role=ROLE, dry_run=DRY_RUN, idem_key=idem)

# Scenario steps
results = []
results.append(bounded_call("order.lookup", {"order_id":"O100"}))
# Try a refund within analyst limit
results.append(bounded_call("refund.issue", {"order_id":"O100", "amount": 20.0, "reason":"late"}, idem="O100-20-late"))
# Duplicate refund with same idem -> should NOT double issue
results.append(bounded_call("refund.issue", {"order_id":"O100", "amount": 20.0, "reason":"late"}, idem="O100-20-late"))
# Over the limit for analyst -> expect error unless ROLE="manager"
results.append(bounded_call("refund.issue", {"order_id":"O200", "amount": 120.0, "reason":"damaged"}, idem="O200-120-damaged"))
# Create a case
results.append(bounded_call("case.create", {"account_id":"A200", "title":"Investigate integration", "severity":"high"}))

# Summaries
ok = sum(1 for r in results if r.status=="ok")
err = len(results) - ok
print("Calls:", len(results), "| OK:", ok, "| Errors:", err)
[r.__dict__ for r in results]

## Visuals
### Outcome counts
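
The chart in this section counts only `ok` versus everything else. If the four outcomes named in the introduction (OK, error, dry-run, idempotent return) are wanted separately, the audit records carry enough information to tell them apart. The sketch below assumes records shaped like the ones `tool_call` appends. `classify` is a hypothetical helper, and the sample records are made up for illustration.

```python
from collections import Counter

def classify(record):
    """Map one audit record to ok / error / dry-run / idempotent-return."""
    result = record["result"]
    if result["status"] != "ok":
        return "error"
    if record.get("note") == "idempotent-return":
        return "idempotent-return"
    # Dry-run previews are marked by evidence of type "preview".
    if any(e.get("type") == "preview" for e in result.get("evidence", [])):
        return "dry-run"
    return "ok"

# Illustrative records (made up), trimmed to the fields classify() reads.
sample_audit = [
    {"result": {"status": "ok", "evidence": [{"type": "source"}]}},
    {"result": {"status": "ok", "evidence": [{"type": "ledger"}]}},
    {"result": {"status": "ok", "evidence": [{"type": "ledger"}]}, "note": "idempotent-return"},
    {"result": {"status": "error", "evidence": []}},
    {"result": {"status": "ok", "evidence": [{"type": "preview"}]}},
]
counts = Counter(classify(r) for r in sample_audit)
print(dict(counts))
```

Passing `list(counts)` and `list(counts.values())` to `plt.bar` would give a four-bar version of the chart.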

In [None]:
import matplotlib.pyplot as plt

labels = ["ok","error"]
counts = [sum(1 for r in REG.audit if r["result"]["status"]=="ok"),
          sum(1 for r in REG.audit if r["result"]["status"]!="ok")]

plt.figure()
plt.bar(labels, counts)
plt.title("Tool call outcomes")
plt.xlabel("Status")
plt.ylabel("Count")
plt.show()

### Per-call duration (ms)

In [None]:
dur = [row["result"].get("duration_ms",0) for row in REG.audit]
if dur:
    import matplotlib.pyplot as plt
    plt.figure()
    plt.plot(dur)
    plt.title("Per-call duration (ms)")
    plt.xlabel("Call #")
    plt.ylabel("ms")
    plt.show()
else:
    print("No calls recorded.")