Releases: dupuis1212/agentic-ai-course-labs

Scout v1.0

11 Jun 22:39

First production-grade release: Scout, a multi-agent research assistant —
give it a question, approve its research plan, and it searches the web,
reads and cross-checks sources, then writes a cited report, with full
tracing, evals, and guardrails.

The system (modules 1–12)

  • LLM access via hosted NIM endpoints (Nemotron 3, OpenAI-compatible
    client), model pinned in one place — config.py (M1, M3).
  • Agent core: hand-rolled ReAct, then a LangGraph StateGraph with
    tool calling and streaming (M2); architecture design doc + layout
    refactor (M3).
  • Planner with a self-critique loop and frozen Pydantic plan schemas (M4).
  • Memory: SQLite checkpointer (resume, time travel) + long-term user
    preference store (M5).
  • RAG: page fetching, ingestion into Chroma, retrieval as a tool,
    [n]-cited answers — embeddings/reranking via hosted Retriever NIMs (M6).
  • Multi-agent team: supervisor + Searcher / Reader / Fact-checker /
    Writer over shared state (M7).
  • Evals: 15-question golden set, LLM-as-judge (grounding / coverage /
    citations), regression comparison (M8).
  • Safety & oversight: NeMo Guardrails on retrieved content and output,
    audit trail, human plan approval via interrupt() (M9).
  • Deployment: FastAPI async job API (POST /research,
    GET /status/{job_id}, POST /research/{job_id}/approve), Dockerfile (M10).
  • Observability: Langfuse tracing per node, cost/latency accounting,
    continuous eval monitor with alert threshold (M11).
  • NVIDIA variant: NeMo Agent Toolkit profiling + Nemotron model
    comparison on the golden set (M12, module-12/nvidia-variant/).
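
Taken together, the async job API (M10) and the human plan approval step (M9) imply a job lifecycle in which a submitted job waits for approval before it runs. The sketch below models that flow with an in-memory registry in plain Python rather than FastAPI. The status names, the Job and JobRegistry classes, and the placeholder plan are all assumptions for illustration; the notes above name only the three endpoints.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical status names; the release notes only name the endpoints.
AWAITING_APPROVAL, RUNNING, DONE = "awaiting_approval", "running", "done"


@dataclass
class Job:
    question: str
    status: str = AWAITING_APPROVAL
    plan: List[str] = field(default_factory=list)
    report: Optional[str] = None


class JobRegistry:
    """In-memory stand-in for the three HTTP endpoints (illustrative only)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def submit(self, question: str) -> str:
        """Mirrors POST /research: create a job that waits for plan approval."""
        job_id = uuid.uuid4().hex
        # Placeholder plan; in Scout the plan comes from the planner.
        self._jobs[job_id] = Job(question, plan=[f"search: {question}"])
        return job_id

    def status(self, job_id: str) -> dict:
        """Mirrors GET /status/{job_id}."""
        job = self._jobs[job_id]
        return {"status": job.status, "plan": job.plan, "report": job.report}

    def approve(self, job_id: str) -> None:
        """Mirrors POST /research/{job_id}/approve: release the job to run."""
        job = self._jobs[job_id]
        if job.status != AWAITING_APPROVAL:
            raise ValueError(f"job is {job.status}, not awaiting approval")
        job.status = RUNNING

    def finish(self, job_id: str, report: str) -> None:
        """Called by the worker once the report is written."""
        job = self._jobs[job_id]
        job.status, job.report = DONE, report
```

A client would call submit(), poll status() until the plan appears, call approve(), then keep polling until the status reaches "done".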

Hardening (module 13 — this release)

  • scout/resilience.py: retry with exponential backoff + jitter on
    transient faults only (429/5xx/timeouts); per-tool timeouts; per-node
    timeout and global run budget via stream_with_budget — constants sized
    from module 12's real latency measurements.
  • Graceful failure: a timed-out or over-budget run ends with the API's
    partial status and a salvaged, still-cited report — never a hang,
    never an empty crash.
  • Idempotent submission, drilled: the Idempotency-Key header deduplicates
    repeated POST /research calls (same key → same job, one execution).
  • Crash recovery: recover_job() resumes a dead run from its last
    checkpoint under the same thread_id, so completed work is never paid
    for twice.
  • tests/failure_drills.py: dead search API, 429 burst, permanent fault,
    stuck node, blown budget, kill+recover, double submission — all simulated,
    all offline.
  • PRODUCTION-CHECKLIST.md: the NCP-AAI Production Checklist, one section
    per exam domain.
  • Demo run archived in module-13/demo/ (transcript, cited report, judge
    scores, annotated trace).
  • CI quality gate: offline smoke tests for modules 1–13 on every push/PR,
    plus a golden-set eval-regression job that fails the build if quality
    drops.
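
Two of the hardening items above are easy to picture in code: retrying only transient faults with exponential backoff and jitter, and deduplicating submissions by idempotency key. The following is a minimal sketch under stated assumptions, not the contents of scout/resilience.py. The exception classes, retry_transient, IdempotentSubmitter, and every default value are hypothetical names and numbers chosen for illustration.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for 429 / 5xx / timeout failures (hypothetical class)."""


class PermanentError(Exception):
    """Stand-in for faults that a retry cannot fix (hypothetical class)."""


def retry_transient(call, *, attempts=4, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep, rng=random.random):
    """Retry `call` on TransientError only, with capped exponential backoff.

    Uses "full jitter": each wait is uniform in [0, cap), where cap doubles
    per attempt up to max_delay. Any other exception propagates at once.
    """
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the last transient fault
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)


class IdempotentSubmitter:
    """Maps an idempotency key to the job it first created (in-memory)."""

    def __init__(self, start_job):
        self._start_job = start_job  # callable: question -> job_id
        self._by_key = {}

    def submit(self, idempotency_key, question):
        # Same key -> same job; start_job runs once per distinct key.
        if idempotency_key not in self._by_key:
            self._by_key[idempotency_key] = self._start_job(question)
        return self._by_key[idempotency_key]
```

The sleep and rng parameters exist so that a drill can run offline and instantly with fake timing. The submitter is not thread-safe as written; a concurrent server would need a lock or an atomic insert around the lookup.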

Known limits (deliberate — see PRODUCTION-CHECKLIST.md)

Single host; in-memory job registry; SQLite checkpointer (not shared);
no authn/z or multi-tenancy; no durable queue; no SLOs/paging; circuit
breaker covered as a concept only. These are the "beyond this course"
lines of the checklist.
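
For readers unfamiliar with the circuit breaker mentioned above, here is a minimal sketch of the concept. Scout v1.0 does not ship one; the CircuitBreaker class, its thresholds, and the CircuitOpenError name are all hypothetical and appear nowhere in the release.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down has passed."""

    def __init__(self, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: half-open, so one trial call goes through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open)
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Where retries handle brief faults, a breaker handles a dependency that stays down: after enough consecutive failures it fails fast instead of spending the run budget on calls that are likely to fail.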