Releases: dupuis1212/agentic-ai-course-labs
Scout v1.0
First production-grade release: Scout, a multi-agent research assistant —
give it a question, approve its research plan, and it searches the web,
reads and cross-checks sources, then writes a cited report, with full
tracing, evals, and guardrails.
The system (modules 1–12)
- LLM access via hosted NIM endpoints (Nemotron 3, OpenAI-compatible client), with the model pinned in one place, `config.py` (M1, M3).
- Agent core: hand-rolled ReAct, then a LangGraph `StateGraph` with tool calling and streaming (M2); architecture design doc + layout refactor (M3).
- Planner with a self-critique loop and frozen Pydantic plan schemas (M4).
- Memory: SQLite checkpointer (resume, time travel) + long-term user preference store (M5).
- RAG: page fetching, ingestion into Chroma, retrieval as a tool, and [n]-cited answers, with embeddings/reranking via hosted Retriever NIMs (M6).
- Multi-agent team: supervisor + Searcher / Reader / Fact-checker / Writer over shared state (M7).
- Evals: 15-question golden set, LLM-as-judge (grounding / coverage / citations), regression comparison (M8).
- Safety & oversight: NeMo Guardrails on retrieved content and output, audit trail, human plan approval via `interrupt()` (M9).
- Deployment: FastAPI async job API (`POST /research`, `GET /status/{job_id}`, `POST /research/{job_id}/approve`), Dockerfile (M10).
- Observability: Langfuse tracing per node, cost/latency accounting, continuous eval monitor with alert threshold (M11).
- NVIDIA variant: NeMo Agent Toolkit profiling + Nemotron model comparison on the golden set (M12, `module-12/nvidia-variant/`).
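The "hand-rolled ReAct" stage of the agent core (M2) is, at heart, a loop in which the model either requests a tool call or returns a final answer, and each tool result is fed back as an observation. A minimal sketch, assuming a hypothetical `model` callable that returns a dict (`{"action": ..., "input": ...}` or `{"final": ...}`) in place of a real Nemotron call; the reasoning text ReAct normally interleaves is omitted, and the tool registry and step cap are illustrative, not the course's code:

```python
from typing import Callable

# Hypothetical model interface: given the transcript so far, return either
# {"action": tool_name, "input": text} or {"final": answer}.
Model = Callable[[list[str]], dict]


def react_loop(question: str, model: Model,
               tools: dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Minimal ReAct loop: act, observe, repeat until the model gives a final answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)
        if "final" in step:
            return step["final"]
        tool = tools.get(step["action"])
        observation = tool(step["input"]) if tool else f"Unknown tool: {step['action']}"
        transcript.append(f"Action: {step['action']}[{step['input']}]")
        transcript.append(f"Observation: {observation}")
    return "Stopped: step limit reached without a final answer."


# Scripted stand-in for the LLM: search once, then answer from the observation.
def scripted_model(transcript: list[str]) -> dict:
    if not any(line.startswith("Observation:") for line in transcript):
        return {"action": "search", "input": "capital of France"}
    return {"final": transcript[-1].removeprefix("Observation: ")}


tools = {"search": lambda query: "Paris"}  # fake search tool with a canned result
print(react_loop("What is the capital of France?", scripted_model, tools))  # prints Paris
```

The step cap is what keeps a confused model from looping forever; a LangGraph `StateGraph` replaces this hand-written loop with explicit nodes and edges.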
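The [n]-cited answers (M6) and the "citations" judge dimension (M8) both rely on every [n] marker pointing at a retrieved source. A cheap deterministic pre-check can flag dangling markers and never-cited sources before any LLM judge runs. This is a standard-library sketch, assuming sources are numbered from 1 in retrieval order; `check_citations` is a hypothetical helper, not part of the repository:

```python
import re

CITATION = re.compile(r"\[(\d+)\]")  # matches markers such as [1], [12]


def check_citations(answer: str, num_sources: int) -> dict[str, list[int]]:
    """Compare [n] markers in an answer against sources numbered 1..num_sources."""
    cited = {int(n) for n in CITATION.findall(answer)}
    valid = set(range(1, num_sources + 1))
    return {
        "dangling": sorted(cited - valid),  # markers with no matching source
        "uncited": sorted(valid - cited),   # retrieved sources never referenced
    }


report = "LangGraph supports checkpointing [1]. Chroma stores embeddings [2][4]."
print(check_citations(report, num_sources=3))
# {'dangling': [4], 'uncited': [3]}
```

A dangling marker is a hard error (the reader cannot verify the claim); an uncited source is only a hint that retrieval fetched something the Writer did not use.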
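The continuous eval monitor with an alert threshold (M11) can be read as a rolling average of judge scores compared against a fixed floor. A minimal sketch, assuming scores normalised to the range 0 to 1 and hypothetical names (`EvalMonitor`, `threshold`, `window`); the real monitor's scale, window, and threshold are not given in these notes:

```python
from collections import deque
from statistics import fmean


class EvalMonitor:
    """Rolling-window monitor that signals an alert when the mean judge score drops."""

    def __init__(self, threshold: float = 0.7, window: int = 5) -> None:
        self.threshold = threshold
        self.scores: deque[float] = deque(maxlen=window)  # keeps only the latest runs

    def record(self, score: float) -> bool:
        """Add one run's score; return True if the rolling mean is below threshold."""
        self.scores.append(score)
        return fmean(self.scores) < self.threshold


monitor = EvalMonitor(threshold=0.7, window=3)
for s in [0.9, 0.8, 0.6, 0.5]:
    if monitor.record(s):
        print(f"ALERT: rolling mean below 0.7 after score {s}")
# prints once, after 0.5: the window [0.8, 0.6, 0.5] averages about 0.63
```

Averaging over a window trades alert latency for fewer false alarms from a single noisy judge score.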
Hardening (module 13 — this release)
- `scout/resilience.py`: retry with exponential backoff + jitter on transient faults only (429/5xx/timeouts); per-tool timeouts; per-node timeout and a global run budget via `stream_with_budget`, with constants sized from module 12's real latency measurements.
- Graceful failure: a timed-out or over-budget run ends with the API's `partial` status and a salvaged, still-cited report; never a hang, never an empty crash.
- Idempotent submission, drilled: the `Idempotency-Key` header dedups duplicate `POST /research` calls (same key → same job, one execution).
- Crash recovery: `recover_job()` picks a dead run back up from its last checkpoint, with the same `thread_id`, so completed work is never re-paid.
- `tests/failure_drills.py`: dead search API, 429 burst, permanent fault, stuck node, blown budget, kill+recover, double submission; all simulated, all offline.
- `PRODUCTION-CHECKLIST.md`: the NCP-AAI Production Checklist, one section per exam domain.
- Demo run archived in `module-13/demo/` (transcript, cited report, judge scores, annotated trace).
- CI quality gate: offline smoke tests for modules 1–13 on every push/PR, plus a golden-set eval-regression job that fails the build if quality drops.
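The retry policy in `scout/resilience.py` retries only transient faults (429, 5xx, timeouts) with exponential backoff plus jitter, and lets permanent faults fail fast. The sketch below is not that file: the `TransientError`/`PermanentError` classes, the `retry` helper, and its default constants are hypothetical, and it uses "full jitter" (a random delay between zero and the capped exponential bound), which may differ from the course's jitter scheme:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical: raised for 429s, 5xx responses, and timeouts."""


class PermanentError(Exception):
    """Hypothetical: raised for faults retrying cannot fix (e.g. 400, 401)."""


def retry(fn: Callable[[], T], max_attempts: int = 4, base: float = 0.5,
          cap: float = 8.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying TransientError with capped exponential backoff + full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts - 1):
        try:
            return fn()
        except TransientError:
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)].
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
        # PermanentError is not caught above, so it propagates immediately.
    return fn()  # final attempt: any exception reaches the caller


# Drill-style usage: a simulated 429 burst that clears on the third call.
calls = {"n": 0}


def flaky_search() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("429 Too Many Requests")
    return "results"


print(retry(flaky_search, sleep=lambda s: None), calls["n"])  # prints: results 3
```

Injecting `sleep` keeps offline drills instant; the jitter spreads retries from concurrent callers so a 429 burst is not immediately repeated in lockstep.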
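The per-node timeout, the global run budget, and the `partial` status fit together roughly like this: each node gets the smaller of its own timeout and the remaining budget, and when either runs out the run stops and returns whatever has finished so far instead of hanging. This is a hypothetical asyncio sketch (`run_with_budget`, the node list, and the result dict are illustrative); it is not the signature of the course's `stream_with_budget`:

```python
import asyncio
import time
from typing import Awaitable, Callable

Node = tuple[str, Callable[[], Awaitable[str]]]


async def run_with_budget(nodes: list[Node], node_timeout: float,
                          budget: float) -> dict:
    """Run nodes in order; stop early with status 'partial' on timeout or budget."""
    deadline = time.monotonic() + budget
    done: dict[str, str] = {}
    for name, node in nodes:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return {"status": "partial", "reason": "budget", "results": done}
        try:
            # Each node is bounded by its own timeout and by the global budget.
            done[name] = await asyncio.wait_for(node(), min(node_timeout, remaining))
        except asyncio.TimeoutError:
            return {"status": "partial", "reason": f"timeout in {name}", "results": done}
    return {"status": "done", "results": done}


async def search() -> str:
    return "3 sources found"


async def stuck_reader() -> str:
    await asyncio.sleep(10)  # simulates a stuck node
    return "never reached"


result = asyncio.run(run_with_budget(
    [("search", search), ("read", stuck_reader)], node_timeout=0.1, budget=5.0))
print(result)
# {'status': 'partial', 'reason': 'timeout in read', 'results': {'search': '3 sources found'}}
```

The salvaged `results` are what a final writing step could still turn into a cited report, which is the point of returning `partial` rather than raising.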
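Idempotent submission (same `Idempotency-Key` → same job, one execution) comes down to a locked lookup before a job is created. A minimal, framework-free sketch, assuming an in-memory registry (which these notes list as a known limit) and hypothetical `JobRegistry`/`submit` names; in the real API the check would sit inside the `POST /research` handler. The lock matters because two duplicate requests can arrive concurrently:

```python
import threading
import uuid
from typing import Callable, Optional


class JobRegistry:
    """In-memory job registry that dedups submissions by Idempotency-Key."""

    def __init__(self, start_job: Callable[[str, str], None]) -> None:
        self._start_job = start_job          # launches the research run
        self._by_key: dict[str, str] = {}    # idempotency key -> job_id
        self._lock = threading.Lock()        # concurrent duplicates race here

    def submit(self, question: str, idempotency_key: Optional[str] = None) -> str:
        with self._lock:
            if idempotency_key is not None and idempotency_key in self._by_key:
                return self._by_key[idempotency_key]  # duplicate: same job, no rerun
            job_id = uuid.uuid4().hex
            if idempotency_key is not None:
                self._by_key[idempotency_key] = job_id
        self._start_job(job_id, question)  # reached once per key, outside the lock
        return job_id


started: list[str] = []
registry = JobRegistry(lambda job_id, question: started.append(job_id))
first = registry.submit("What is NIM?", idempotency_key="abc-123")
second = registry.submit("What is NIM?", idempotency_key="abc-123")
print(first == second, len(started))  # prints: True 1
```

Registering the key before starting the job is what guarantees one execution: a duplicate that arrives while the run is still starting already sees the mapping.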
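Crash recovery works because each completed node's output is checkpointed under the run's `thread_id`, so resuming with the same `thread_id` skips work already paid for. In Scout this is the SQLite checkpointer plus `recover_job()`; the sketch below only imitates the idea with a plain dict as the checkpoint store and a hypothetical `run_pipeline` function, and is not LangGraph's API:

```python
from typing import Callable

Checkpoints = dict[str, dict[str, str]]  # thread_id -> {node name: output}


def run_pipeline(thread_id: str, nodes: list[tuple[str, Callable[[], str]]],
                 store: Checkpoints) -> dict[str, str]:
    """Run nodes in order, skipping any already checkpointed for this thread_id."""
    saved = store.setdefault(thread_id, {})
    for name, node in nodes:
        if name in saved:
            continue              # completed before the crash: never re-paid
        saved[name] = node()      # checkpoint right after the node finishes
    return saved


calls: list[str] = []
crash = {"armed": True}


def search() -> str:
    calls.append("search")
    return "sources"


def write() -> str:
    if crash["armed"]:
        crash["armed"] = False
        raise RuntimeError("process killed")  # simulated crash mid-run
    calls.append("write")
    return "report"


store: Checkpoints = {}
nodes = [("search", search), ("write", write)]
try:
    run_pipeline("job-1", nodes, store)
except RuntimeError:
    pass  # the run died after 'search' was checkpointed
print(run_pipeline("job-1", nodes, store), calls)
# {'search': 'sources', 'write': 'report'} ['search', 'write']
```

`search` runs only once across both attempts; with a durable store (SQLite rather than a dict) the same property survives a real process restart.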
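The eval-regression job in the CI gate needs little more than a comparison of new golden-set scores against a stored baseline and a non-zero exit code when a metric drops. A sketch, assuming per-metric mean scores keyed by the judge dimensions (grounding, coverage, citations), a hypothetical tolerance of 0.05, and illustrative numbers rather than real results:

```python
def regressions(baseline: dict[str, float], current: dict[str, float],
                tolerance: float = 0.05) -> list[str]:
    """Return the metrics whose score fell more than `tolerance` below baseline."""
    return [
        f"{metric}: {base:.2f} -> {current.get(metric, 0.0):.2f}"
        for metric, base in baseline.items()
        # A metric missing from the new run counts as 0.0, i.e. a regression.
        if current.get(metric, 0.0) < base - tolerance
    ]


def gate(baseline: dict[str, float], current: dict[str, float],
         tolerance: float = 0.05) -> int:
    """Print any regressions and return a process exit code (non-zero fails CI)."""
    failed = regressions(baseline, current, tolerance)
    for line in failed:
        print("REGRESSION", line)
    return 1 if failed else 0


# Illustrative numbers only, not Scout's real golden-set results.
baseline = {"grounding": 0.86, "coverage": 0.80, "citations": 0.90}
current = {"grounding": 0.87, "coverage": 0.71, "citations": 0.89}
exit_code = gate(baseline, current)  # prints: REGRESSION coverage: 0.80 -> 0.71
# A CI step would end with: raise SystemExit(exit_code)
```

The tolerance absorbs run-to-run noise from the LLM judge; without it, a 0.01 wobble on a 15-question set would fail builds at random.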
Known limits (deliberate — see PRODUCTION-CHECKLIST.md)
Single host; in-memory job registry; SQLite checkpointer (not shared);
no authn/z or multi-tenancy; no durable queue; no SLOs/paging; circuit
breaker covered as a concept only. These are the "beyond this course"
lines of the checklist.