# Milestone 3 — Parallel Agents + Persistent Agent Memory (Pinecone)



This notebook implements Milestone 3:

- Run **sequential vs parallel** (async) agent pipelines and compare runtime.

- **Persist agent outputs** into Pinecone as “agent memory” vectors with metadata (`contract_id`, `agent_type`, `timestamp`).

- **Recall stored memory** later (filter by `contract_id` and optionally `agent_type`) to answer follow-ups **without rerunning** the agents.



Prereqs:

- Pinecone index already populated with contract chunk vectors (Milestone 2).

- Environment variable `PINECONE_API_KEY` set.


## 1) Project / Environment Setup


In [35]:
from __future__ import annotations

import asyncio
import json
import logging
import os
import random
import time
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

ROOT = Path.cwd().resolve()
ARTIFACTS_DIR = ROOT / "artifacts"

# Reproducibility
SEED = 42
random.seed(SEED)

# Avoid optional TensorFlow/JAX imports (common Windows DLL issues, and not needed here)
os.environ.setdefault("TRANSFORMERS_NO_TF", "1")
os.environ.setdefault("USE_TF", "0")
os.environ.setdefault("TRANSFORMERS_NO_FLAX", "1")

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(message)s",
)
logger = logging.getLogger("milestone3")

def utc_now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()

def load_env_file(path: Path) -> Dict[str, str]:
    """Minimal .env loader (no extra deps).

    - Skips blanks and comments
    - Supports KEY=VALUE
    - Strips surrounding quotes
    - Does NOT override already-set env vars
    """
    loaded: Dict[str, str] = {}
    if not path.exists():
        return loaded

    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if not key:
            continue
        if os.getenv(key) is None:
            os.environ[key] = value
            loaded[key] = value
    return loaded

# Auto-load root .env if present
env_loaded = load_env_file(ROOT / ".env")

logger.info(f"ROOT={ROOT}")
logger.info(f"ARTIFACTS_DIR={ARTIFACTS_DIR} (exists={ARTIFACTS_DIR.exists()})")
logger.info(f"Loaded .env keys: {sorted(env_loaded.keys())}")
logger.info("Env guards: TRANSFORMERS_NO_TF=%s USE_TF=%s TRANSFORMERS_NO_FLAX=%s", os.getenv("TRANSFORMERS_NO_TF"), os.getenv("USE_TF"), os.getenv("TRANSFORMERS_NO_FLAX"))

2026-01-09 19:22:17,185 | INFO | ROOT=C:\Users\LENOVO\OneDrive\Dokumen\legal contracts eda\milestone3
2026-01-09 19:22:17,185 | INFO | ARTIFACTS_DIR=C:\Users\LENOVO\OneDrive\Dokumen\legal contracts eda\milestone3\artifacts (exists=False)
2026-01-09 19:22:17,185 | INFO | Loaded .env keys: []
2026-01-09 19:22:17,185 | INFO | Env guards: TRANSFORMERS_NO_TF=1 USE_TF=0 TRANSFORMERS_NO_FLAX=1


In [36]:
# Shared agent list used across the notebook
AGENT_TYPES = ["legal_agent", "compliance_agent", "finance_agent", "operations_agent"]

In [37]:
# Dependency repair for this notebook kernel (run once if imports fail)
import sys

try:
    import importlib.metadata as _md  # py3.8+
except Exception:  # pragma: no cover
    _md = None

def _v(pkg: str) -> str:
    if _md is None:
        return "(unknown)"
    try:
        return _md.version(pkg)
    except Exception:
        return "(not installed)"

print("Python executable:", sys.executable)
print("Before:")
print("- huggingface-hub:", _v("huggingface-hub"))
print("- transformers:   ", _v("transformers"))
print("- sentence-transformers:", _v("sentence-transformers"))
print("- tensorflow:     ", _v("tensorflow"))
print("- tensorflow-intel:", _v("tensorflow-intel"))

# IMPORTANT: %pip installs into the currently-running Jupyter kernel environment.
# If you still see the same import error after this, restart the kernel and rerun from Cell 3.
%pip install -U "huggingface-hub>=0.24.0,<1.0" "transformers>=4.40.0" "sentence-transformers>=2.7.0"

# If you see: "Failed to load the native TensorFlow runtime" on Windows, TensorFlow is installed but broken.
# Sentence-transformers does not require TensorFlow for embeddings, so removing it is safe for this notebook:
# %pip uninstall -y tensorflow tensorflow-intel

print("\nAfter:")
print("- huggingface-hub:", _v("huggingface-hub"))
print("- transformers:   ", _v("transformers"))
print("- sentence-transformers:", _v("sentence-transformers"))
print("- tensorflow:     ", _v("tensorflow"))
print("- tensorflow-intel:", _v("tensorflow-intel"))

Python executable: c:\Users\LENOVO\anaconda3\python.exe
Before:
- huggingface-hub: 0.36.0
- transformers:    4.57.3
- sentence-transformers: 5.2.0
- tensorflow:      2.19.0
- tensorflow-intel: 2.18.0
Note: you may need to restart the kernel to use updated packages.

After:
- huggingface-hub: 0.36.0
- transformers:    4.57.3
- sentence-transformers: 5.2.0
- tensorflow:      2.19.0
- tensorflow-intel: 2.18.0



[notice] A new release of pip is available: 25.1.1 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


## 2) Connect to Pinecone + Load Embeddings Model


In [38]:
# Pinecone connection (supports both newer and older SDK styles)

from getpass import getpass

# Ensure optional TensorFlow/JAX stacks are not used (avoids Windows DLL issues)
os.environ.setdefault("TRANSFORMERS_NO_TF", "1")
os.environ.setdefault("USE_TF", "0")
os.environ.setdefault("TRANSFORMERS_NO_FLAX", "1")

PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
INDEX_NAME = os.getenv("PINECONE_INDEX", "cuad-index")
PINECONE_ENV = os.getenv("PINECONE_ENV")

# If the env var isn't set, allow interactive entry (common in notebooks)
if not PINECONE_API_KEY:
    PINECONE_API_KEY = getpass("Enter PINECONE_API_KEY (input hidden): ")
    PINECONE_API_KEY = (PINECONE_API_KEY or "").strip()
    if not PINECONE_API_KEY:
        raise RuntimeError(
            "Missing PINECONE_API_KEY. Set it as an environment variable, or re-run and enter it when prompted."
        )
    os.environ["PINECONE_API_KEY"] = PINECONE_API_KEY

index = None
try:
    # Newer SDK
    from pinecone import Pinecone

    pc = Pinecone(api_key=PINECONE_API_KEY)
    index = pc.Index(INDEX_NAME)
    logger.info(f"Connected to Pinecone index '{INDEX_NAME}' via pinecone.Pinecone")
except Exception as e_new:
    try:
        # Older SDK
        import pinecone

        if not PINECONE_ENV:
            raise RuntimeError(
                "Using legacy pinecone SDK requires PINECONE_ENV. "
                "Set env var PINECONE_ENV (e.g., 'us-east1-gcp'), then re-run."
            )
        pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENV)
        index = pinecone.Index(INDEX_NAME)
        logger.info(f"Connected to Pinecone index '{INDEX_NAME}' via pinecone.init")
    except Exception as e_old:
        raise RuntimeError(f"Failed to connect to Pinecone: new={e_new} old={e_old}")

# Embedding model
try:
    from sentence_transformers import SentenceTransformer
except Exception as e:
    msg = str(e)
    if "Failed to load the native TensorFlow runtime" in msg or "_pywrap_tensorflow_internal" in msg:
        raise RuntimeError(
            "TensorFlow is installed but failing to load native DLLs on this machine.\n\n"
            "Sentence-transformers does not require TensorFlow for embeddings. Fix by removing TensorFlow from this env:\n"
            "  pip uninstall -y tensorflow tensorflow-intel\n\n"
            "Then restart the kernel and rerun from Section 1."
        ) from e
    if "huggingface-hub" in msg and "Try:" in msg:
        raise RuntimeError(
            "Dependency mismatch while importing sentence-transformers.\n\n"
            "Your environment has an older huggingface-hub that is incompatible with transformers.\n\n"
            "Fix (pip):\n"
            "  pip install -U huggingface-hub transformers sentence-transformers\n\n"
            "Fix (conda-forge):\n"
            "  conda install -c conda-forge huggingface-hub transformers sentence-transformers\n\n"
            "Then restart the kernel and rerun from Section 1."
        ) from e
    raise RuntimeError(
        "Failed to import sentence-transformers.\n\n"
        "Try:\n"
        "  pip install -U sentence-transformers transformers huggingface-hub\n\n"
        "Then restart the kernel and rerun from Section 1.\n\n"
        f"Original error: {type(e).__name__}: {e}"
    ) from e

EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "all-MiniLM-L6-v2")
CACHE_DIR = ROOT / "models_cache" / "hub"
logger.info(f"Loading embedding model: {EMBEDDING_MODEL}")
model = SentenceTransformer(EMBEDDING_MODEL, cache_folder=str(CACHE_DIR))

def embed_query(text: str) -> List[float]:
    vec = model.encode([text], convert_to_numpy=True)[0]
    return vec.tolist()

2026-01-09 19:22:20,086 | INFO | Connected to Pinecone index 'cuad-index' via pinecone.Pinecone
2026-01-09 19:22:20,087 | INFO | Loading embedding model: all-MiniLM-L6-v2
2026-01-09 19:22:20,087 | INFO | Use pytorch device_name: cpu
2026-01-09 19:22:20,087 | INFO | Load pretrained SentenceTransformer: all-MiniLM-L6-v2


## 3) Agent Pipelines (Retrieval-First) + Timing (Sequential vs Parallel)


In [39]:
AGENT_TYPES = ["legal_agent", "compliance_agent", "finance_agent", "operations_agent"]



AGENT_QUERIES: Dict[str, List[str]] = {

    "legal_agent": [

        "What are the termination clauses and conditions?",

        "What happens in case of breach of contract?",

        "What are the confidentiality and non-disclosure obligations?",

        "What are the indemnification and hold harmless obligations?",

    ],

    "compliance_agent": [

        "What are the data protection and privacy obligations?",

        "What regulatory requirements must be followed?",

        "What are the audit and reporting requirements?",

        "What are the data retention and deletion obligations?",

        "What are the breach notification and incident reporting requirements?",

        "What security audit or certification requirements exist (SOC2/ISO/HIPAA)?",

    ],

    "finance_agent": [

        "What are the payment terms and conditions?",

        "What are the fees, invoices, and billing requirements?",

        "What are the penalties and late fees for non-payment?",

        "What are the interest charges or interest rate for late payment?",

        "What is the financial liability and indemnification?",

    ],

    "operations_agent": [

        "What are the deliverables and project outputs?",

        "What are the timelines and milestones for delivery?",

        "What are the service level agreements (SLAs)?",

        "What are the performance standards and obligations?",

        "What are the operational requirements and responsibilities?",

        "What are the uptime commitments, uptime guarantees, and service credits?",

    ],

}





def pinecone_query(

    *,

    query: str,

    top_k: int = 5,

    namespace: Optional[str] = None,

    metadata_filter: Optional[Dict[str, Any]] = None,

) -> Any:

    qvec = embed_query(query)

    kwargs: Dict[str, Any] = {

        "vector": qvec,

        "top_k": top_k,

        "include_metadata": True,

    }

    if namespace is not None:

        kwargs["namespace"] = namespace

    if metadata_filter is not None:

        kwargs["filter"] = metadata_filter

    return index.query(**kwargs)





def _extract_matches(resp: Any) -> List[Dict[str, Any]]:

    matches = getattr(resp, "matches", None)

    if matches is None and isinstance(resp, dict):

        matches = resp.get("matches")

    if not matches:

        return []



    out: List[Dict[str, Any]] = []

    for m in matches:

        md = getattr(m, "metadata", None)

        score = getattr(m, "score", None)

        if md is None and isinstance(m, dict):

            md = m.get("metadata")

            score = m.get("score")

        out.append({

            "score": float(score) if score is not None else None,

            "metadata": md or {},

        })

    return out





def _confidence_from_matches(matches: List[Dict[str, Any]]) -> Optional[float]:

    scores = [m.get("score") for m in matches if isinstance(m.get("score"), (int, float))]

    if not scores:

        return None

    return float(sum(scores) / len(scores))





def run_agent_pipeline(

    *,

    agent_type: str,

    question: str,

    contract_id: str,

    top_k_per_query: int = 5,

    chunks_namespace: Optional[str] = None,

    filter_chunks_by_contract_id: bool = False,

) -> Dict[str, Any]:

    if agent_type not in AGENT_QUERIES:

        raise ValueError(f"Unknown agent_type: {agent_type}")



    t0 = time.perf_counter()



    # Only enable this if your chunk vectors' metadata includes: {"contract_id": "..."}

    md_filter = (

        {"contract_id": {"$eq": contract_id}}

        if filter_chunks_by_contract_id

        else None

    )



    all_matches: List[Dict[str, Any]] = []

    per_query: List[Dict[str, Any]] = []

    for q in AGENT_QUERIES[agent_type]:

        resp = pinecone_query(query=q, top_k=top_k_per_query, namespace=chunks_namespace, metadata_filter=md_filter)

        matches = _extract_matches(resp)

        per_query.append({"query": q, "matches": matches})

        all_matches.extend(matches)



    confidence = _confidence_from_matches(all_matches)

    elapsed = time.perf_counter() - t0



    return {

        "agent_type": agent_type,

        "contract_id": contract_id,

        "question": question,

        "timestamp": utc_now_iso(),

        "elapsed_seconds": elapsed,

        "confidence": confidence,

        "retrieval": {

            "top_k_per_query": top_k_per_query,

            "filter_chunks_by_contract_id": filter_chunks_by_contract_id,

            "per_query": per_query,

        },

    }





def run_sequential(

    *,

    question: str,

    contract_id: str,

    agents: List[str] = AGENT_TYPES,

    filter_chunks_by_contract_id: bool = False,

) -> Tuple[Dict[str, Any], float]:

    t0 = time.perf_counter()

    out: Dict[str, Any] = {}

    for a in agents:

        out[a] = run_agent_pipeline(

            agent_type=a,

            question=question,

            contract_id=contract_id,

            filter_chunks_by_contract_id=filter_chunks_by_contract_id,

        )

    return out, time.perf_counter() - t0





async def run_parallel(

    *,

    question: str,

    contract_id: str,

    agents: List[str] = AGENT_TYPES,

    filter_chunks_by_contract_id: bool = False,

) -> Tuple[Dict[str, Any], float]:

    t0 = time.perf_counter()

    tasks = [

        asyncio.to_thread(

            run_agent_pipeline,

            agent_type=a,

            question=question,

            contract_id=contract_id,

            filter_chunks_by_contract_id=filter_chunks_by_contract_id,

        )

        for a in agents

    ]

    results = await asyncio.gather(*tasks)

    out = {r["agent_type"]: r for r in results}

    return out, time.perf_counter() - t0


In [40]:
# Configure your run

# - CONTRACT_ID can be any stable identifier you choose (used for memory persistence/recall).

# - If your chunk vectors include contract_id in metadata, set FILTER_CHUNKS_BY_CONTRACT_ID=True.

CONTRACT_ID = os.getenv("CONTRACT_ID", "demo_contract")

QUESTION = "What are the payment terms, audit requirements, and uptime commitments?"

FILTER_CHUNKS_BY_CONTRACT_ID = False



seq_out, seq_s = run_sequential(

    question=QUESTION,

    contract_id=CONTRACT_ID,

    filter_chunks_by_contract_id=FILTER_CHUNKS_BY_CONTRACT_ID,

)

par_out, par_s = await run_parallel(

    question=QUESTION,

    contract_id=CONTRACT_ID,

    filter_chunks_by_contract_id=FILTER_CHUNKS_BY_CONTRACT_ID,

)



print("Sequential seconds:", round(seq_s, 3))

print("Parallel seconds:  ", round(par_s, 3))



print("\nPer-agent confidence (parallel):")

for a in AGENT_TYPES:

    conf = par_out[a].get("confidence")

    print(f"- {a}: {None if conf is None else round(conf, 4)}")


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Sequential seconds: 9.716
Parallel seconds:   3.344

Per-agent confidence (parallel):
- legal_agent: 0.6607
- compliance_agent: 0.5007
- finance_agent: 0.5432
- operations_agent: 0.4704


## 4) Persist Agent Outputs as Vector Memory (Pinecone)


In [41]:
AGENT_MEMORY_NAMESPACE = "agent_memory"

AGENT_MEMORY_RECORD_TYPE = "agent_memory"



def safe_json_dumps(obj: Any, max_chars: int = 6000) -> str:
    """Serialize to JSON for storage in metadata.

    Important: If we truncate, we keep the returned string as VALID JSON so it can be json.loads()'d later.
    """
    try:
        s = json.dumps(obj, ensure_ascii=False)
    except Exception:
        s = str(obj)
    if len(s) <= max_chars:
        return s
    # Wrap a preview in a valid JSON object to avoid broken/partial JSON strings
    preview = s[: max_chars - 200]
    wrapper = {"_truncated": True, "preview": preview, "chars": len(s)}
    return json.dumps(wrapper, ensure_ascii=False)


@dataclass
class AgentMemoryRecord:
    contract_id: str
    agent_type: str
    timestamp: str
    question: str
    output: Any

    def to_text(self) -> str:
        return (
            f"contract_id: {self.contract_id}\n"
            f"agent_type: {self.agent_type}\n"
            f"timestamp: {self.timestamp}\n"
            f"question: {self.question}\n\n"
            f"output_json: {safe_json_dumps(self.output)}\n"
        )

    def to_metadata(self) -> Dict[str, Any]:
        md: Dict[str, Any] = {
            "record_type": AGENT_MEMORY_RECORD_TYPE,
            "contract_id": self.contract_id,
            "agent_type": self.agent_type,
            # Alias to match common examples
            "agent": self.agent_type,
            "timestamp": self.timestamp,
            "question": self.question[:1000],
            "output_json": safe_json_dumps(self.output, 6000),
        }

        # If output contains a risk_level or confidence, store explicitly for filtering / reporting.
        if isinstance(self.output, dict):
            rl = self.output.get("risk_level")
            if isinstance(rl, str) and rl.strip():
                md["risk_level"] = rl.strip().lower()
            conf = self.output.get("confidence")
            if isinstance(conf, (int, float)):
                md["confidence"] = float(conf)

        return md


def persist_agent_memory(*, records: List[AgentMemoryRecord], namespace: str = AGENT_MEMORY_NAMESPACE) -> List[str]:
    vectors = []
    ids: List[str] = []
    for r in records:
        text = r.to_text()
        vec = embed_query(text)
        vid = f"{r.contract_id}:{r.agent_type}:{r.timestamp}:{uuid.uuid4().hex}"
        ids.append(vid)
        vectors.append({"id": vid, "values": vec, "metadata": r.to_metadata()})

    index.upsert(vectors=vectors, namespace=namespace)
    return ids


def query_agent_memory(*, query: str, contract_id: str, agent_type: Optional[str] = None, top_k: int = 5, namespace: str = AGENT_MEMORY_NAMESPACE) -> Any:
    filt: Dict[str, Any] = {
        "record_type": {"$eq": AGENT_MEMORY_RECORD_TYPE},
        "contract_id": {"$eq": contract_id},
    }
    if agent_type:
        filt["agent_type"] = {"$eq": agent_type}

    return index.query(
        vector=embed_query(query),
        top_k=top_k,
        include_metadata=True,
        namespace=namespace,
        filter=filt,
    )

In [42]:
# Persist the PARALLEL outputs from Section 3

records = [

    AgentMemoryRecord(

        contract_id=CONTRACT_ID,

        agent_type=a,

        timestamp=utc_now_iso(),

        question=QUESTION,

        output=par_out[a],

    )

    for a in AGENT_TYPES

]



ids = persist_agent_memory(records=records)

print(f"Upserted {len(ids)} agent-memory vectors into namespace '{AGENT_MEMORY_NAMESPACE}'.")

print("Example IDs:", ids[:2])


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Upserted 4 agent-memory vectors into namespace 'agent_memory'.
Example IDs: ['demo_contract:legal_agent:2026-01-09T13:52:38.517959+00:00:6a8ebe41b4d04600b5e73e6919d81934', 'demo_contract:compliance_agent:2026-01-09T13:52:38.517959+00:00:35723597c5644adcb45da0cfcf076162']


## 5) Recall Stored Agent Memory (No Rerun)


In [43]:
# Recall examples (filtered by contract_id and optionally agent_type)

recall_ops = query_agent_memory(

    query="uptime commitments service credits",

    contract_id=CONTRACT_ID,

    agent_type="operations_agent",

    top_k=3,

)



recall_fin = query_agent_memory(

    query="interest charges late payment",

    contract_id=CONTRACT_ID,

    agent_type="finance_agent",

    top_k=3,

)



print("Operations memory matches:")

for m in getattr(recall_ops, "matches", [])[:3]:

    print("- score:", getattr(m, "score", None))

    print("  ts:", (getattr(m, "metadata", {}) or {}).get("timestamp"))

    print("  question:", (getattr(m, "metadata", {}) or {}).get("question"))



print("\nFinance memory matches:")

for m in getattr(recall_fin, "matches", [])[:3]:

    print("- score:", getattr(m, "score", None))

    print("  ts:", (getattr(m, "metadata", {}) or {}).get("timestamp"))

    print("  question:", (getattr(m, "metadata", {}) or {}).get("question"))


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Operations memory matches:
- score: 0.300049812
  ts: 2026-01-08T13:29:46.417164+00:00
  question: What are the payment terms, audit requirements, and uptime commitments?
- score: 0.287986785
  ts: 2026-01-09T13:31:38.825623+00:00
  question: What are the payment terms, audit requirements, and uptime commitments?
- score: 0.244141608
  ts: 2026-01-09T13:47:16.141924+00:00
  question: What are the payment terms, audit requirements, and uptime commitments?

Finance memory matches:
- score: 0.208692566
  ts: 2026-01-08T13:29:46.417164+00:00
  question: What are the payment terms, audit requirements, and uptime commitments?
- score: 0.205164433
  ts: 2026-01-09T13:31:38.825623+00:00
  question: What are the payment terms, audit requirements, and uptime commitments?
- score: 0.16263485
  ts: 2026-01-09T13:47:16.141924+00:00
  question: What are the payment terms, audit requirements, and uptime commitments?


## 6) Cross-Agent Refinement (Memory → Shared Context → Refine → Persist)

> Goal: enable one agent to use another agent’s stored output as context, refine risk assessment, and write back the refined result into Pinecone memory.

In [49]:
from datetime import datetime, timezone

# Fallback if earlier cells were not executed yet
AGENTS_FOR_REFINEMENT = globals().get("AGENT_TYPES") or [
    "legal_agent",
    "compliance_agent",
    "finance_agent",
    "operations_agent",
]

def _as_utc_aware(dt: datetime) -> datetime:
    """Normalize datetimes to timezone-aware UTC for safe comparisons."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

def _parse_ts(ts: Optional[str]) -> datetime:
    # Always return a timezone-aware UTC datetime to avoid naive/aware comparison errors.
    if not ts:
        return datetime.min.replace(tzinfo=timezone.utc)
    try:
        # Handles ISO 8601 like: 2026-01-08T12:34:56.789+00:00 or ...Z
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        return _as_utc_aware(dt)
    except Exception:
        return datetime.min.replace(tzinfo=timezone.utc)

def _matches(resp: Any) -> List[Any]:
    if isinstance(resp, dict):
        return resp.get("matches") or []
    return getattr(resp, "matches", []) or []

def _md(match: Any) -> Dict[str, Any]:
    if isinstance(match, dict):
        return match.get("metadata") or {}
    return getattr(match, "metadata", {}) or {}

def _infer_risk_from_text(text: str) -> Tuple[str, str]:
    t = (text or "").lower()
    # Very simple heuristic just for milestone demonstration
    high_terms = ["penalt", "late fee", "interest", "termination", "breach", "indemn", "liability", "service credit"]
    medium_terms = ["audit", "confidential", "privacy", "retention", "notification", "sla"]

    if any(k in t for k in high_terms):
        return "high", "Contains high-impact financial/legal terms (heuristic)."
    if any(k in t for k in medium_terms):
        return "medium", "Contains standard compliance/operations terms (heuristic)."
    return "medium", "Defaulted to medium (insufficient signal in stored output)."

def fetch_latest_agent_memory(*, contract_id: str, agent_type: str, top_k: int = 10) -> Optional[Dict[str, Any]]:
    resp = query_agent_memory(query=f"{agent_type} risk assessment", contract_id=contract_id, agent_type=agent_type, top_k=top_k)
    best = None
    best_ts = datetime.min.replace(tzinfo=timezone.utc)
    for m in _matches(resp):
        md = _md(m)
        ts = _parse_ts(md.get("timestamp"))
        if ts > best_ts:
            best = md
            best_ts = ts
    return best

# 1) Retrieve latest memory per agent and build shared_context
latest_by_agent: Dict[str, Dict[str, Any]] = {}
for agent_type in AGENTS_FOR_REFINEMENT:
    md = fetch_latest_agent_memory(contract_id=CONTRACT_ID, agent_type=agent_type)
    if md is None:
        latest_by_agent[agent_type] = {
            "agent": agent_type,
            "risk_level": "unknown",
            "confidence": None,
            "timestamp": None,
            "output_json": "",
        }
        continue

    # Prefer explicit risk_level metadata, else infer from stored output_json text
    output_json = md.get("output_json") or ""
    risk_level = md.get("risk_level")
    if not isinstance(risk_level, str) or not risk_level.strip():
        risk_level, _ = _infer_risk_from_text(output_json)

    # Best-effort confidence from memory metadata
    conf = md.get("confidence")
    if not isinstance(conf, (int, float)):
        conf = None

    latest_by_agent[agent_type] = {
        "agent": md.get("agent") or md.get("agent_type") or agent_type,
        "risk_level": risk_level,
        "confidence": float(conf) if isinstance(conf, (int, float)) else None,
        "timestamp": md.get("timestamp"),
        "output_json": output_json,
    }

shared_context = "\n".join([f"{v['agent']} risk: {v['risk_level']}" for v in latest_by_agent.values()])
print(shared_context)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

legal_agent risk: high
compliance_agent risk: high
finance_agent risk: high
operations_agent risk: medium


In [50]:
# 2) Let compliance agent read finance output and refine (risk escalation demo)
finance = latest_by_agent.get("finance_agent", {})
compliance = latest_by_agent.get("compliance_agent", {})

finance_risk = (finance.get("risk_level") or "unknown").lower()
compliance_risk = (compliance.get("risk_level") or "unknown").lower()

# Best-effort confidence: inherit from latest compliance if available, else finance, else None
def _as_float(x: Any) -> Optional[float]:
    return float(x) if isinstance(x, (int, float)) else None

inherited_confidence = (
    _as_float(compliance.get("confidence"))
    or _as_float((compliance.get("output") or {}).get("confidence") if isinstance(compliance.get("output"), dict) else None)
    or _as_float(finance.get("confidence"))
    or _as_float((finance.get("output") or {}).get("confidence") if isinstance(finance.get("output"), dict) else None)
    or None
 )

refined_risk = compliance_risk
reason = "No escalation: finance risk not high (heuristic)."

if finance_risk == "high" and compliance_risk in {"low", "medium", "unknown"}:
    refined_risk = "high"
    reason = "Escalated to high because finance risk is high; combined exposure increases compliance risk."

refined_compliance = {
    "agent_type": "compliance_agent",
    "risk_level": refined_risk,
    "confidence": inherited_confidence,
    "reason": reason,
    "based_on": {
        "shared_context": shared_context,
        "finance_risk": finance_risk,
    },
}

print(json.dumps(refined_compliance, indent=2))

{
  "agent_type": "compliance_agent",
  "risk_level": "high",
  "confidence": 0.54319379216,
  "reason": "No escalation: finance risk not high (heuristic).",
  "based_on": {
    "shared_context": "legal_agent risk: high\ncompliance_agent risk: high\nfinance_agent risk: high\noperations_agent risk: medium",
    "finance_risk": "high"
  }
}


In [51]:
# 3) Update Compliance memory (persist refined assessment)
refined_record = AgentMemoryRecord(
    contract_id=CONTRACT_ID,
    agent_type="compliance_agent",
    timestamp=utc_now_iso(),
    question="Cross-agent refinement: compliance reads finance output and re-evaluates risk",
    output=refined_compliance,
)

refined_ids = persist_agent_memory(records=[refined_record])
print("Upserted refined compliance memory:", refined_ids[0])

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Upserted refined compliance memory: demo_contract:compliance_agent:2026-01-09T14:01:55.035860+00:00:942d633582c140b7b21d36109d479307


## 7) Final Contract-Level JSON Output (Latest Memories → Standard JSON)

This section produces a **single standardized JSON** for a contract by:
- Pulling the **latest** stored memory per agent from Pinecone
- Aggregating **confidence** across agents
- Extracting a list of **high-risk clauses** (evidence snippets)
- Computing an **overall risk level**
- Saving the final JSON to disk (so it can be used outside the notebook)

In [52]:
# Define the final schema + generate the final contract-level JSON from latest Pinecone memories
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

# Where to save outputs (prefer milestone3/outputs if present; otherwise milestone3/artifacts)
OUTPUTS_DIR = (ROOT / "outputs") if (ROOT / "outputs").exists() else (ROOT / "artifacts")
OUTPUTS_DIR.mkdir(parents=True, exist_ok=True)

RISK_ORDER = {"low": 0, "medium": 1, "high": 2, "unknown": 1}
HIGH_RISK_TERMS = [
    "penalt", "late fee", "interest", "termination", "breach", "indemn", "liability", "service credit",
    "audit right", "uncapped", "limitation of liability", "data breach", "incident", "non-compliance",
    "security", "subprocessor", "cross-border", "governing law", "injunction",
]

# Defensive: if this list ever becomes nested (e.g., due to a trailing comma edit), flatten it
if HIGH_RISK_TERMS and isinstance(HIGH_RISK_TERMS[0], (list, tuple, set)):
    HIGH_RISK_TERMS = [t for group in HIGH_RISK_TERMS for t in group]

FINAL_CONTRACT_SCHEMA: Dict[str, Any] = {
    "contract_id": "",
    "legal": {},
    "compliance": {},
    "finance": {},
    "operations": {},
    "overall_risk": "",
    "confidence": {
        "per_agent": {},
        "overall_avg": None,
    },
    "high_risk_clauses": [],
    "generated_at": "",
}

def _utc_now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()

def _as_utc_aware(dt: datetime) -> datetime:
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

def _parse_ts(ts: Optional[str]) -> datetime:
    if not ts:
        return datetime.min.replace(tzinfo=timezone.utc)
    try:
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        return _as_utc_aware(dt)
    except Exception:
        return datetime.min.replace(tzinfo=timezone.utc)

def _matches(resp: Any) -> List[Any]:
    if isinstance(resp, dict):
        return resp.get("matches") or []
    return getattr(resp, "matches", []) or []

def _md(match: Any) -> Dict[str, Any]:
    if isinstance(match, dict):
        return match.get("metadata") or {}
    return getattr(match, "metadata", {}) or {}

def _safe_json_loads(s: str) -> Optional[Any]:
    if not isinstance(s, str) or not s.strip():
        return None
    try:
        return json.loads(s)
    except Exception:
        return None

def _extract_text_from_match_metadata(md: Dict[str, Any]) -> str:
    # Try common metadata keys used by chunking pipelines
    for key in ("text", "chunk_text", "content", "clause_text", "snippet", "page_content"):
        v = md.get(key)
        if isinstance(v, str) and v.strip():
            return v.strip()
    # Fallback: show a compact representation
    try:
        return json.dumps(md, ensure_ascii=False)[:800]
    except Exception:
        return str(md)[:800]

def _infer_risk_level(output: Any) -> str:
    if isinstance(output, dict):
        rl = output.get("risk_level")
        if isinstance(rl, str) and rl.strip():
            return rl.strip().lower()
    # fallback heuristic on serialized output
    text = json.dumps(output, ensure_ascii=False).lower() if output is not None else ""
    if any(t in text for t in HIGH_RISK_TERMS):
        return "high"
    return "medium"

def _extract_term_hits(*, text: str, agent_type: str, max_items: int = 5) -> List[Dict[str, Any]]:
    """Fallback evidence extraction when we don't have structured retrieval matches available."""
    if not isinstance(text, str) or not text:
        return []
    lower = text.lower()
    out: List[Dict[str, Any]] = []
    seen_terms = set()
    for term in HIGH_RISK_TERMS:
        if term in seen_terms:
            continue
        idx = lower.find(term)
        if idx < 0:
            continue
        seen_terms.add(term)
        start = max(0, idx - 120)
        end = min(len(text), idx + 160)
        snippet = text[start:end].replace("\n", " ")
        out.append({
            "agent": agent_type,
            "query": "(memory-text-scan)",
            "score": None,
            "snippet": snippet[:800],
            "is_high_risk": True,
            "matched_term": term,
        })
        if len(out) >= max_items:
            break
    return out

def get_latest_agent_output(*, contract_id: str, agent_type: str, top_k: int = 10) -> Dict[str, Any]:
    """Return the latest stored agent output for this contract+agent.

    Best-effort confidence behavior: if the latest record has no numeric confidence (common for refinement-only
    records), fall back to the newest record in the result set that *does* have confidence.
    """
    resp = query_agent_memory(
        query="risk",
        contract_id=contract_id,
        agent_type=agent_type,
        top_k=top_k,
    )
    matches = _matches(resp)
    if not matches:
        return {
            "agent_type": agent_type,
            "timestamp": None,
            "risk_level": "unknown",
            "confidence": None,
            "output": {"risk_level": "unknown", "note": "No memory found"},
            "_memory_metadata": {},
        }

    # Sort all candidates by timestamp (desc)
    ranked: List[Tuple[datetime, Dict[str, Any]]] = []
    for m in matches:
        md = _md(m)
        ranked.append((_parse_ts(md.get("timestamp")), md))
    ranked.sort(key=lambda t: t[0], reverse=True)

    best_md = ranked[0][1]
    # output_json is expected to be JSON; if it was truncated, it should still be valid JSON wrapper
    output_raw = best_md.get("output_json") or ""
    output_obj = _safe_json_loads(output_raw)
    if output_obj is None:
        output_obj = {"raw_output_json": output_raw}

    # Prefer explicit risk_level metadata, else infer
    risk_level = best_md.get("risk_level")
    if not isinstance(risk_level, str) or not risk_level.strip():
        risk_level = _infer_risk_level(output_obj)

    # 1) First try: explicit confidence from the latest record's metadata
    conf: Optional[float] = None
    c_latest = best_md.get("confidence")
    if isinstance(c_latest, (int, float)):
        conf = float(c_latest)
    # 2) Second try: confidence embedded in the latest record's output JSON
    if conf is None and isinstance(output_obj, dict):
        c2 = output_obj.get("confidence")
        if isinstance(c2, (int, float)):
            conf = float(c2)
    # 3) Fallback: newest record (by timestamp) that has numeric confidence in metadata or output
    if conf is None:
        for _, md in ranked[1:]:
            c_md = md.get("confidence")
            if isinstance(c_md, (int, float)):
                conf = float(c_md)
                break
            o = _safe_json_loads(md.get("output_json") or "")
            if isinstance(o, dict):
                c_o = o.get("confidence")
                if isinstance(c_o, (int, float)):
                    conf = float(c_o)
                    break

    return {
        "agent_type": agent_type,
        "timestamp": best_md.get("timestamp"),
        "risk_level": str(risk_level).lower(),
        "confidence": conf,
        "output": output_obj,
        "_memory_metadata": best_md,
    }

def _extract_high_risk_clauses(*, agent_output: Any, agent_type: str, max_items: int = 5) -> List[Dict[str, Any]]:
    """Extract evidence snippets from retrieval matches and label them as high-risk if terms match."""
    out: List[Dict[str, Any]] = []
    if not isinstance(agent_output, dict):
        return out
    retrieval = agent_output.get("retrieval")
    if not isinstance(retrieval, dict):
        return out
    per_query = retrieval.get("per_query")
    if not isinstance(per_query, list):
        return out

    candidates: List[Tuple[float, Dict[str, Any]]] = []
    for item in per_query:
        if not isinstance(item, dict):
            continue
        q = item.get("query")
        matches = item.get("matches")
        if not isinstance(matches, list):
            continue
        for m in matches:
            if not isinstance(m, dict):
                continue
            score = m.get("score")
            md = m.get("metadata") if isinstance(m.get("metadata"), dict) else {}
            snippet = _extract_text_from_match_metadata(md)
            snippet_l = snippet.lower()
            is_high = any(t in snippet_l for t in HIGH_RISK_TERMS)
            score_f = float(score) if isinstance(score, (int, float)) else 0.0
            rank_score = score_f + (0.25 if is_high else 0.0)
            candidates.append((rank_score, {
                "agent": agent_type,
                "query": q,
                "score": score_f if isinstance(score, (int, float)) else None,
                "snippet": snippet[:800],
                "is_high_risk": bool(is_high),
            }))

    candidates.sort(key=lambda t: t[0], reverse=True)
    seen = set()
    for _, c in candidates:
        key = (c.get("agent"), c.get("snippet"))
        if key in seen:
            continue
        seen.add(key)
        if c.get("is_high_risk"):
            out.append(c)
        if len(out) >= max_items:
            break
    return out

def _overall_risk(agent_risks: Dict[str, str]) -> str:
    best = "low"
    for r in agent_risks.values():
        r = (r or "unknown").lower()
        if RISK_ORDER.get(r, 1) > RISK_ORDER.get(best, 0):
            best = r
    if best not in {"low", "medium", "high"}:
        best = "medium"
    return best

# 1) Retrieve latest outputs
latest = {a: get_latest_agent_output(contract_id=CONTRACT_ID, agent_type=a) for a in AGENT_TYPES}

# 2) Collect into final JSON
final = dict(FINAL_CONTRACT_SCHEMA)
final["contract_id"] = CONTRACT_ID
final["generated_at"] = _utc_now_iso()

agent_risks: Dict[str, str] = {}
conf_per_agent: Dict[str, Optional[float]] = {}
high_risk_clauses: List[Dict[str, Any]] = []

for a in AGENT_TYPES:
    payload = latest[a]
    output_obj = payload.get("output")
    agent_risks[a] = payload.get("risk_level") or _infer_risk_level(output_obj)
    conf_per_agent[a] = payload.get("confidence")
    # Evidence from structured retrieval (if present)
    high_risk_clauses.extend(_extract_high_risk_clauses(agent_output=output_obj, agent_type=a, max_items=5))
    # Fallback evidence: scan stored memory text for high-risk terms
    if not high_risk_clauses:
        mem_text = (payload.get("_memory_metadata") or {}).get("output_json") or ""
        high_risk_clauses.extend(_extract_term_hits(text=mem_text, agent_type=a, max_items=3))

final["legal"] = latest["legal_agent"]["output"]
final["compliance"] = latest["compliance_agent"]["output"]
final["finance"] = latest["finance_agent"]["output"]
final["operations"] = latest["operations_agent"]["output"]

final["overall_risk"] = _overall_risk(agent_risks)
final["confidence"]["per_agent"] = conf_per_agent
vals = [v for v in conf_per_agent.values() if isinstance(v, (int, float))]
final["confidence"]["overall_avg"] = (sum(vals) / len(vals)) if vals else None

# Keep only high-risk items and cap length
final["high_risk_clauses"] = high_risk_clauses[:20]

out_path = OUTPUTS_DIR / f"final_contract_{CONTRACT_ID}.json"
out_path.write_text(json.dumps(final, ensure_ascii=False, indent=2), encoding="utf-8")

print("Saved:", out_path)
print("overall_risk:", final["overall_risk"])
print("confidence overall_avg:", final["confidence"]["overall_avg"])
print("high_risk_clauses:", len(final["high_risk_clauses"]))
if final["confidence"]["overall_avg"] is None:
    print("NOTE: Confidence is None because older memories may not include a stored confidence score.")
    print("      Re-run Section 3 + Section 4 to persist fresh memories; new upserts store confidence in metadata.")
if len(final["high_risk_clauses"]) == 0:
    print("NOTE: No high-risk evidence snippets were extracted.")
    print("      To improve this, ensure your chunk vectors store clause text in metadata (e.g., key 'text' or 'page_content').")

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Saved: C:\Users\LENOVO\OneDrive\Dokumen\legal contracts eda\milestone3\outputs\final_contract_demo_contract.json
overall_risk: high
confidence overall_avg: 0.5543624817841666
high_risk_clauses: 3


## 8) Human-Readable Report Template (Executive Summary + Section Bullets)

Same data → multiple views: executives don’t read JSON, lawyers want evidence, managers want summaries.

This section converts the final JSON into a simple report structure with:
- Plain-language **Executive Summary**
- Bullet points per section (legal / compliance / finance / operations)
- A short list of high-risk clause snippets (evidence)

In [48]:
REPORT_STRUCTURE = [
    "Executive Summary",
    "Overall Risk Assessment",
    "Legal Analysis",
    "Compliance Analysis",
    "Financial Analysis",
    "Operational Analysis",
    "Conclusion & Recommendations",
]

def _bulletize(lines: List[str]) -> str:
    return "\n".join([f"- {ln}" for ln in lines if isinstance(ln, str) and ln.strip()])

def _clean_snippet(s: str) -> str:
    if not isinstance(s, str):
        return ""
    # Normalize common escaped forms from JSON previews
    s = s.replace("\\n", " ").replace("\n", " ")
    s = s.replace("\\\"", '"').replace("\\/", "/")
    # Collapse whitespace
    s = " ".join(s.split())
    return s.strip()

def _looks_like_json_fragment(s: str) -> bool:
    """Detect snippets that are likely cut-through JSON rather than readable clause text."""
    if not isinstance(s, str):
        return True
    t = s.strip()
    if not t:
        return True
    tl = t.lower()

    # Hard-block: retrieval/memory JSON keys (these are NOT clause text)
    hard_markers = [
        "top_k_per_query",
        "filter_chunks_by_contract_id",
        "per_query",
        "matches",
        "retrieval\":",
        "metadata\":",
        "output_json",
        "raw_output_json",
    ]
    if any(m in tl for m in hard_markers):
        return True

    # Soft heuristics: many JSON-ish tokens and low natural-language signal
    jsonish = sum(t.count(ch) for ch in ["{", "}", "[", "]", ":"])
    backslashes = t.count("\\")
    quotes = t.count('"')
    letters = sum(ch.isalpha() for ch in t)
    spaces = t.count(" ")

    if (jsonish + backslashes + quotes) >= 8 and (letters < 60 or spaces < 8):
        return True
    return False

def build_executive_summary(final_json: Dict[str, Any]) -> str:
    risk = (final_json.get("overall_risk") or "medium").lower()
    conf = final_json.get("confidence", {}).get("overall_avg")
    conf_s = "unknown" if conf is None else f"{conf:.3f}"
    n_hi = len(final_json.get("high_risk_clauses") or [])
    # Simple language, no legal jargon
    return (
        f"Overall risk is {risk}. "
        f"Confidence score average is {conf_s}. "
        f"We found {n_hi} high-risk clause evidence snippets to review."
    )

def build_report(final_json: Dict[str, Any]) -> Dict[str, str]:
    per_agent_conf = (final_json.get("confidence") or {}).get("per_agent") or {}
    hi = final_json.get("high_risk_clauses") or []

    report: Dict[str, str] = {}
    report["Executive Summary"] = build_executive_summary(final_json)

    report["Overall Risk Assessment"] = _bulletize([
        f"Overall risk level: {(final_json.get('overall_risk') or 'medium').lower()}",
        f"Confidence (avg): {final_json.get('confidence', {}).get('overall_avg')}",
        f"Legal confidence: {per_agent_conf.get('legal_agent')}",
        f"Compliance confidence: {per_agent_conf.get('compliance_agent')}",
        f"Finance confidence: {per_agent_conf.get('finance_agent')}",
        f"Operations confidence: {per_agent_conf.get('operations_agent')}",
    ])

    # Per-section bullets (keep it generic; the evidence list carries the detail)
    report["Legal Analysis"] = _bulletize([
        "Key legal obligations summarized from retrieval outputs.",
        "Review termination, breach, and indemnity language if present.",
    ])
    report["Compliance Analysis"] = _bulletize([
        "Key privacy/security/compliance obligations summarized from retrieval outputs.",
        "Review audit rights, incident notification, and data handling language if present.",
    ])
    report["Financial Analysis"] = _bulletize([
        "Key payment, invoicing, and late-fee obligations summarized from retrieval outputs.",
        "Review liability and penalty exposure if present.",
    ])
    report["Operational Analysis"] = _bulletize([
        "Key deliverables, timelines, and SLA obligations summarized from retrieval outputs.",
        "Review uptime commitments and service credits if present.",
    ])

    # Evidence: prefer readable snippets (skip JSON fragments from truncated memory previews)
    top_evidence: List[str] = []
    for item in hi:
        if not isinstance(item, dict):
            continue
        raw = item.get("snippet") or ""
        snippet = _clean_snippet(raw)
        if not snippet:
            continue
        if _looks_like_json_fragment(snippet):
            continue
        agent = item.get("agent") or "unknown_agent"
        term = item.get("matched_term")
        prefix = f"[{agent}] " + (f"(term: {term}) " if isinstance(term, str) and term else "")
        top_evidence.append((prefix + snippet)[:220])
        if len(top_evidence) >= 8:
            break

    report["Conclusion & Recommendations"] = _bulletize([
        "Prioritize review of the high-risk clauses listed below.",
        "If overall risk is high, consider negotiation points or approvals before signing.",
        *(["High-risk evidence:"] + top_evidence if top_evidence else ["No clean high-risk evidence snippets were extracted from memory."]),
    ])

    return report

report = build_report(final)

print("\n".join(["=" * 80, "REPORT PREVIEW", "=" * 80]))
for section in REPORT_STRUCTURE:
    print(f"\n## {section}\n")
    print(report.get(section, ""))

REPORT PREVIEW

## Executive Summary

Overall risk is high. Confidence score average is 0.558. We found 3 high-risk clause evidence snippets to review.

## Overall Risk Assessment

- Overall risk level: high
- Confidence (avg): 0.5580853783255555
- Legal confidence: 0.6606792805499999
- Compliance confidence: None
- Finance confidence: 0.54319379216
- Operations confidence: 0.47038306226666665

## Legal Analysis

- Key legal obligations summarized from retrieval outputs.
- Review termination, breach, and indemnity language if present.

## Compliance Analysis

- Key privacy/security/compliance obligations summarized from retrieval outputs.
- Review audit rights, incident notification, and data handling language if present.

## Financial Analysis

- Key payment, invoicing, and late-fee obligations summarized from retrieval outputs.
- Review liability and penalty exposure if present.

## Operational Analysis

- Key deliverables, timelines, and SLA obligations summarized from retrieval out