Releases · dupuis1212/agentic-ai-course-labs

First production-grade release: Scout, a multi-agent research assistant —
give it a question, approve its research plan, and it searches the web,
reads and cross-checks sources, then writes a cited report, with full
tracing, evals, and guardrails.

The system (modules 1–12)

- LLM access via hosted NIM endpoints (Nemotron 3, OpenAI-compatible
  client), model pinned in one place: config.py (M1, M3).
- Agent core: hand-rolled ReAct, then a LangGraph StateGraph with
  tool calling and streaming (M2); architecture design doc + layout
  refactor (M3).
- Planner with a self-critique loop and frozen Pydantic plan schemas (M4).
- Memory: SQLite checkpointer (resume, time travel) + long-term user
  preference store (M5).
- RAG: page fetching, ingestion into Chroma, retrieval as a tool,
  [n]-cited answers; embeddings/reranking via hosted Retriever NIMs (M6).
- Multi-agent team: supervisor + Searcher / Reader / Fact-checker /
  Writer over shared state (M7).
- Evals: 15-question golden set, LLM-as-judge (grounding / coverage /
  citations), regression comparison (M8).
- Safety & oversight: NeMo Guardrails on retrieved content and output,
  audit trail, human plan approval via interrupt() (M9).
- Deployment: FastAPI async job API (POST /research,
  GET /status/{job_id}, POST /research/{job_id}/approve), Dockerfile (M10).
- Observability: Langfuse tracing per node, cost/latency accounting,
  continuous eval monitor with alert threshold (M11).
- NVIDIA variant: NeMo Agent Toolkit profiling + Nemotron model
  comparison on the golden set (M12, module-12/nvidia-variant/).
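
The three deployment endpoints above imply a small job lifecycle: submit, poll, approve. The sketch below models that flow with the standard library only, so it shows the shape of the API rather than the project's FastAPI code. The `Job` and `JobRegistry` names, the status strings, and the placeholder plan are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Job:
    question: str
    # Hypothetical status names; the real API's values may differ.
    status: str = "awaiting_approval"
    plan: list[str] = field(default_factory=list)


class JobRegistry:
    """In-memory job registry: a plain dict, so jobs are lost on restart."""

    def __init__(self) -> None:
        self._jobs: dict[str, Job] = {}

    def submit(self, question: str) -> str:
        # Counterpart of POST /research: register the job and return its id.
        job_id = uuid.uuid4().hex
        # Placeholder plan; a real planner would produce this.
        self._jobs[job_id] = Job(question=question, plan=[f"search: {question}"])
        return job_id

    def status(self, job_id: str) -> dict:
        # Counterpart of GET /status/{job_id}.
        job = self._jobs[job_id]
        return {"job_id": job_id, "status": job.status, "plan": job.plan}

    def approve(self, job_id: str) -> dict:
        # Counterpart of POST /research/{job_id}/approve: only a job that is
        # waiting on its plan can move on to execution.
        job = self._jobs[job_id]
        if job.status != "awaiting_approval":
            raise ValueError(f"job {job_id} is not awaiting approval")
        job.status = "running"
        return self.status(job_id)
```

A client would call `submit`, poll `status` until the plan is ready, then call `approve`. The approval gate is what pauses the run until a human has signed off on the plan.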

Hardening (module 13 — this release)

- scout/resilience.py: retry with exponential backoff + jitter on
  transient faults only (429/5xx/timeouts); per-tool timeouts; per-node
  timeout and global run budget via stream_with_budget, with constants
  sized from module 12's real latency measurements.
- Graceful failure: a timed-out or over-budget run ends with the API's
  partial status and a salvaged, still-cited report. It never hangs and
  never crashes with an empty result.
- Idempotent submission, drilled: the Idempotency-Key header dedups
  duplicate POST /research calls (same key → same job, one execution).
- Crash recovery: recover_job() picks a dead run back up from its last
  checkpoint, reusing the same thread_id, so completed work is never
  paid for twice.
- tests/failure_drills.py: dead search API, 429 burst, permanent fault,
  stuck node, blown budget, kill+recover, double submission. All are
  simulated and run offline.
- PRODUCTION-CHECKLIST.md: the NCP-AAI Production Checklist, one section
  per exam domain.
- Demo run archived in module-13/demo/ (transcript, cited report, judge
  scores, annotated trace).
- CI quality gate: offline smoke tests for modules 1–13 on every push/PR,
  plus a golden-set eval-regression job that fails the build if quality
  drops.
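
The retry policy in the first item (exponential backoff with jitter, applied to transient faults only) can be sketched as follows. This is not the contents of scout/resilience.py: `HTTPFault`, `is_transient`, `retry_transient`, and the default delays are hypothetical, and the "full jitter" variant is one common choice among several.

```python
import random
import time


class HTTPFault(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def is_transient(exc: Exception) -> bool:
    # Timeouts, 429 (rate limit) and 5xx are worth retrying; a 4xx such as
    # 404 is a permanent fault and should fail immediately.
    if isinstance(exc, TimeoutError):
        return True
    return isinstance(exc, HTTPFault) and (exc.status == 429 or 500 <= exc.status < 600)


def retry_transient(fn, *, max_attempts=4, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn(); retry transient faults with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if not is_transient(exc) or attempt == max_attempts:
                raise
            # The backoff cap doubles each attempt: base, 2*base, 4*base, ...
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random fraction of the cap so that many
            # clients retrying at once do not hit the server in lockstep.
            sleep(rng() * cap)
```

Passing `sleep` and `rng` as parameters keeps the function testable offline, in the same spirit as the simulated failure drills: a test can record the delays instead of actually waiting.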

Known limits (deliberate — see PRODUCTION-CHECKLIST.md)

Single host; in-memory job registry; SQLite checkpointer (not shared);
no authn/z or multi-tenancy; no durable queue; no SLOs/paging; circuit
breaker covered as a concept only. These are the "beyond this course"
lines of the checklist.
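
Since the circuit breaker is covered as a concept only, the following sketch illustrates the idea and is not part of this release. The `CircuitBreaker` class, its thresholds, and the injected clock are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Open: fail fast instead of calling a dependency that is down.
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Open (or re-open after a failed trial) once the threshold is hit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Where retries protect a single call, a breaker protects the dependency: after repeated failures it stops sending traffic for a cool-down period, then probes with one call before resuming.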

Scout v1.0

The system (modules 1–12)

Hardening (module 13 — this release)

Known limits (deliberate — see PRODUCTION-CHECKLIST.md)

Uh oh!