Autonomous agents analyse video using Google Gemma 4 E4B-it (multimodal), with FastAPI, Docker, and (planned) Google ADK and A2A.
- Main events observed
- Security issues
- Logistics / item tracking (SAM 3.1 later)
- Attendance: anonymous counts only (no identity)
All processed video runs are logged with timestamp + frame index + event payload.
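Each logged run entry can be pictured as a small record. A minimal sketch, assuming a simple shape (field names beyond timestamp, frame index, and event payload are illustrative, not the actual schema):

```python
import json
import time


def make_event(frame_index: int, payload: dict) -> dict:
    """Illustrative event record: timestamp + frame index + event payload."""
    return {
        "ts": time.time(),           # wall-clock timestamp of the observation
        "frame_index": frame_index,  # frame offset within the processed run
        "payload": payload,          # event-specific structured data
    }


event = make_event(1200, {"kind": "security", "note": "door left open"})
print(json.dumps(event, sort_keys=True))
```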
- Overwatch API: `GET /v1/health`, `GET/POST /v1/jobs`, `POST /v1/jobs/upload` (multipart video), `GET /v1/jobs/{id}/summary`, `GET /v1/jobs/{id}/events` (paginated; `legacy=true` for the full list)
- Agents (v0): job-level text agents over the stored summary JSON (same vLLM as the pipeline):
  - Synthesis — cross-chunk narrative and recommended actions (`agent_synthesis` event).
  - Risk review — safety / security triage (`agent_risk_review` event).
  - Incident brief — short handoff narrative and follow-ups (`agent_incident_brief` event).
  - Compliance brief — cross-industry SOP / safety alignment from the summary (`agent_compliance_brief`).
  - Loss prevention — retail / logistics behavioural narrative without identities (`agent_loss_prevention`).
  - Perimeter chain — ordered boundary / access storyline (`agent_perimeter_chain`).
  - Privacy review — identity-inference and sensitive-descriptor risks in the structured summary (`agent_privacy_review`).
- Async queue: `POST /v1/jobs/{id}/agent-runs` with body `{"agent": "<kind>", "force": ?}`, where `<kind>` is one of `synthesis`, `risk_review`, `incident_brief`, `compliance_brief`, `loss_prevention`, `perimeter_chain`, `privacy_review`. Returns 202 and a `run_id`; poll `GET /v1/agent-runs/{run_id}` until `status` is `completed` or `failed`. A background agent worker drains the queue and appends orchestrator events. Stale `processing` rows are marked failed periodically.
- Orchestrated flow (linear): `POST /v1/jobs/{id}/agent-runs/orchestrate` with body `{"steps": [...], "force": ?}` (up to 24 kinds; e.g. the core three or the full seven cross-industry agents) returns 202 with `orchestration_id` and `head_run_id`. Poll `GET /v1/agent-orchestrations/{orchestration_id}` until `status` is `completed` or `failed`; each step enqueues the next automatically on success. Only one running orchestration per job (409 if a second is started). List past runs: `GET /v1/jobs/{id}/agent-orchestrations`. The response includes `industry_pack` when the run was started via the industry endpoint (else `null`).
- Industry pipelines (named static graphs): `POST /v1/jobs/{id}/agent-runs/orchestrate/industry` with body `{"industry": "<pack>", "force": ?}` selects a curated agent order for the vertical. Packs: `general` (all seven agents), `retail_qsr`, `logistics_warehouse`, `manufacturing`, `commercial_real_estate`, `transportation_hubs`, `critical_infrastructure`, `banking_atm`, `hospitality_venues`, `education_campus`, `healthcare_facilities`. Orderings are defined in `industry_pipelines.py` (versionable, testable). This is the recommended approach before LLM-driven dynamic graphs: explicit policy per industry. The UI exposes an industry dropdown and a "Run industry pipeline" action.
- Blocking synthesis (optional): `POST /v1/jobs/{id}/agents/synthesis?blocking=true` waits for the LLM in-process (legacy / scripts).
- Latest by kind: `GET …/agents/synthesis`, `…/risk-review`, `…/incident-brief`, `…/compliance-brief`, `…/loss-prevention`, `…/perimeter-chain`, `…/privacy-review` return the newest stored event payload.
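As a sketch, the async-queue request body can be validated client-side before POSTing; the kind list mirrors the seven agents above, while the helper name itself is ours, not part of the API:

```python
# The seven agent kinds accepted by POST /v1/jobs/{id}/agent-runs.
AGENT_KINDS = {
    "synthesis", "risk_review", "incident_brief", "compliance_brief",
    "loss_prevention", "perimeter_chain", "privacy_review",
}


def build_agent_run_body(kind: str, force: bool = False) -> dict:
    """Build the JSON body for an agent run; reject unknown agent kinds."""
    if kind not in AGENT_KINDS:
        raise ValueError(f"unknown agent kind: {kind!r}")
    return {"agent": kind, "force": force}


print(build_agent_run_body("risk_review"))
# → {'agent': 'risk_review', 'force': False}
```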
- Web UI (Vite + React): upload a video and poll for job status + summary (`frontend/`; the Docker `overwatch-ui` service publishes the gateway on host port 80)
- SQLite job + event store under `DATA_DIR`; jobs include `summary_json` (aggregated structured analysis)
- Folder ingest: poll `INGEST_DIR` for new video files; stable-write detection (`INGEST_STABLE_SEC`)
- Worker — per chunk:
  - Observe (multimodal): MP4 clip → strict JSON `scene_summary` + `observations[]` (with JSON repair retries)
  - Specialists (text-only, 3 calls): `main_events`, `security` + `logistics`, count-only `attendance`
  - Merged `ChunkAnalysisMerged` stored on a `chunk_analysis` event
  - Job summary (`schema_version: "1"`) lists all chunk merges → `GET …/summary` and `JobRecord.summary`
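The job-summary step above can be sketched as a pure merge over the per-chunk records; field names other than `schema_version` and the chunk list are assumptions for illustration:

```python
def build_job_summary(chunk_merges: list) -> dict:
    """Aggregate per-chunk merged analyses into the job-level summary
    served by GET …/summary (schema_version "1")."""
    return {
        "schema_version": "1",
        "chunk_count": len(chunk_merges),      # illustrative convenience field
        "chunk_analyses": list(chunk_merges),  # one merged record per chunk
    }


summary = build_job_summary([{"scene_summary": "forklift enters"},
                             {"scene_summary": "dock door closes"}])
print(summary["schema_version"], summary["chunk_count"])  # → 1 2
```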
- Docker Compose: `overwatch-api` + `overwatch-ui` by default; local GPU vLLM is optional (`--profile local-vllm`).

`VLLM_BASE_URL` must be the OpenAI-compatible prefix immediately before `/chat/completions`; if your gateway uses a different layout, override `VLLM_BASE_URL` accordingly. Set `VLLM_API_KEY` if the hosted API requires a bearer token.
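The URL rule can be sketched as a tiny helper (the helper name is ours): the configured base is joined directly with `/chat/completions`, nothing in between.

```python
def chat_completions_url(base_url: str) -> str:
    """Join VLLM_BASE_URL with the OpenAI chat-completions path.

    The base must already be the prefix immediately before
    /chat/completions, e.g. https://your-host/v1.
    """
    return base_url.rstrip("/") + "/chat/completions"


print(chat_completions_url("https://your-vllm-host.example/v1"))
# → https://your-vllm-host.example/v1/chat/completions
```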
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export DATA_DIR=./data/overwatch INGEST_DIR=./data/ingest
export VLLM_BASE_URL=https://your-vllm-host.example/v1
export VLLM_MODEL=gemma4
# export VLLM_API_KEY=...  # if required
PYTHONPATH=src uvicorn overwatch.main:app --reload --port 8080
```

Drop files into `data/ingest/` (e.g. `.mp4`). After they are stable for `INGEST_STABLE_SEC`, a job is enqueued.
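The stable-write check behind `INGEST_STABLE_SEC` can be sketched as: a file is enqueued only once its size and mtime have stopped changing for the configured window. This is a simplified model, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Seen:
    size: int
    mtime: float
    first_seen: float  # when this (size, mtime) pair was first observed


def is_stable(prev: Optional[Seen], size: int, mtime: float,
              now: float, stable_sec: float) -> Tuple[Seen, bool]:
    """Return updated bookkeeping and whether the file is stable enough."""
    if prev is None or prev.size != size or prev.mtime != mtime:
        # New file, or still being written: restart the stability window.
        return Seen(size, mtime, now), False
    return prev, (now - prev.first_seen) >= stable_sec


rec, ok = is_stable(None, 1024, 10.0, now=100.0, stable_sec=2.0)   # first sight
rec, ok = is_stable(rec, 1024, 10.0, now=102.5, stable_sec=2.0)    # unchanged 2.5 s
print(ok)  # → True
```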
Web UI (development): in another terminal, from `frontend/` run `npm install` then `npm run dev`. The dev server proxies `/api/*` to `http://127.0.0.1:8080/v1/*`, so keep the API on port 8080 and open `http://localhost:5173`. You can upload a file there instead of copying into `INGEST_DIR`.
Set `VLLM_BASE_URL` (and optional `VLLM_API_KEY`) in your environment or a `.env` file next to `compose.yml` (Compose loads it automatically). Example:

```bash
# .env (not committed) or export in shell:
# VLLM_BASE_URL=https://your-vllm-host.example/v1
# VLLM_API_KEY=...
export VLLM_MODEL=gemma4  # served model id on your vLLM
docker compose up --build
```

- API + UI on port 80: `http://localhost/v1/health`, `http://localhost/v1/jobs`, … — the `overwatch-api` container is not published on 8080; nginx proxies `/v1/`, `/api/`, `/docs`, `/redoc`, `openapi.json`, and `/service` (same JSON as the API's `GET /` root).
- UI + gateway (Compose): `http://localhost/` — React app at `/`; ensure nothing else on the host is bound to port 80. If your Docker setup cannot publish host ports below 1024 (some rootless setups), change `overwatch-ui` in `compose.yml` to e.g. `"8088:80"` and use `http://localhost:8088/` (same paths under that origin).
- Upload 502 / "Host is unreachable": nginx re-resolves `overwatch-api` via Docker DNS (avoids stale IPs after `compose up --build`). `overwatch-ui` waits until the API healthcheck passes, so the backend is listening before the gateway starts. If 502 persists, run `docker compose ps` and check the `overwatch-api` logs (crash/OOM during large uploads).
- Remote vLLM: `https://some-vllm` (no local GPU in the default compose)
Creating a job from the host (Docker): paths inside the container are not the same as on your laptop. The compose file mounts host `./data/ingest` at `/data/ingest` in the container. Use either:

```bash
# Easiest: basename only (resolved under INGEST_DIR in the container)
curl -s -X POST 'http://localhost/v1/jobs' \
  -H 'Content-Type: application/json' \
  -d '{"filename":"warehouse-1.mp4"}'

# Or the full path *as seen inside the container*
curl -s -X POST 'http://localhost/v1/jobs' \
  -H 'Content-Type: application/json' \
  -d '{"source_path":"/data/ingest/warehouse-1.mp4"}'
```

Do not pass host paths like `/home/you/.../data/ingest/foo.mp4` — the API will reject them.
Local GPU vLLM (optional):

```bash
export HF_TOKEN=...
docker compose --profile local-vllm up --build
```

Point Overwatch at the local server with `VLLM_BASE_URL=http://vllm-server:10802/v1` (vLLM's OpenAI-compatible path is usually under `/v1`).

Mount points: `./data/ingest` → `/data/ingest`, `./data/overwatch` → `/data/overwatch`.
- Python / FastAPI
- Docker / Compose
- Hosted vLLM OpenAI API (chat completions) — Gemma 4 E4B-it on the remote service
- Google Agent Development Kit (planned)
- A2A — Agent-to-Agent protocol (planned)
- Engineering technical report (~10 pages): `docs/technical_report.md` — architecture, data model, chunk and agent pipelines, orchestration and industry packs, API overview, configuration, deployment, testing, and known limitations.
- Research evaluation plan: `docs/eval_report.md` — layered methodology, run cards, rubrics backlog.
- Factorio closed-loop (research): `docs/factorio_report.md` — prototype scope, phases, parser fixtures.
- Research doc index: `docs/todo_report.md` — links to the reports above.
| Variable | Default | Description |
|---|---|---|
| `DATA_DIR` | `./data/overwatch` | SQLite DB and state |
| `INGEST_DIR` | `./data/ingest` | Watched folder for new videos |
| `INGEST_POLL_INTERVAL_SEC` | `5` | Folder scan interval |
| `INGEST_STABLE_SEC` | `2` | File size + mtime must be unchanged this long |
| `INGEST_EXTENSIONS` | `.mp4,.mkv,...` | Allowed suffixes |
| `VLLM_BASE_URL` | (empty) | Required for vLLM calls: OpenAI prefix before `/chat/completions` (e.g. `https://your-host/v1`). Set in `.env` or the shell; do not hardcode in the app. |
| `VLLM_MODEL` | `gemma4` | Model id for chat completions |
| `VLLM_API_KEY` | (unset) | `Authorization: Bearer …` if required |
| `VLLM_CHAT_TIMEOUT_SEC` | `120` | HTTP timeout for text chat completions |
| `VLLM_MULTIMODAL_ENABLED` | `true` | Send MP4 clips per chunk to the model |
| `VLLM_MAX_CHUNKS_PER_JOB` | `4` | Max chunks analysed per job (cost/latency cap) |
| `VLLM_CHUNK_TIMEOUT_SEC` | `600` | Timeout per multimodal request (large JSON body) |
| `VLLM_CHUNK_MAX_TOKENS` | `1024` | Max completion tokens for observe (multimodal JSON) |
| `VLLM_JSON_RETRY_MAX` | `2` | JSON parse repair rounds per LLM step |
| `VLLM_SPECIALIST_MAX_TOKENS` | `800` | Max tokens per text specialist call |
| `VLLM_AGENT_MAX_TOKENS` | `2048` | Max tokens for job-level text agents (all seven kinds + any future agents) |
| `VLLM_AGENT_TIMEOUT_SEC` | `120` | HTTP timeout for job-level text agent chat completions |
| `VLLM_FACTORIO_MAX_TOKENS` | `1024` | Max tokens for the Factorio HUD screenshot parser (`overwatch.factorio`) |
| `VLLM_FACTORIO_TIMEOUT_SEC` | `120` | HTTP timeout for Factorio multimodal parser calls |
| `FACTORIO_ROOT` | (unset) | Session + frame store root; default is `DATA_DIR/factorio` |
| `FACTORIO_MAX_ACTIONS_PER_MINUTE` | `30` | Cap for `overwatch.factorio.SkillExecutor` when not in dry-run |
| `FACTORIO_CAPTURE_INTERVAL_SEC` | `2` | Default interval for `overwatch.factorio.capture_loop` |
| `MAX_UPLOAD_BYTES` | `536870912` | Multipart upload cap (default 512 MiB; align with reverse proxy `client_max_body_size`) |
| `AGENT_RUN_STALE_SEC` | `900` | Agent runs stuck in `processing` longer than this are marked failed (worker restart / hang) |
| `API_RATE_LIMIT_PER_MINUTE` | `0` | Per-IP (or `X-Forwarded-For`) cap over 60 s; `0` disables |
| `VLLM_SEGMENT_MAX_BYTES` | `18000000` | Skip chunk if the re-encoded MP4 exceeds this (~18 MB) |
| `VLLM_VIDEO_SCALE_WIDTH` | `480` | FFmpeg scale width before upload |
| `VLLM_SEGMENT_INCLUDE_AUDIO` | `true` | AAC in segment; retries video-only if ffmpeg fails |
| `WORKER_POLL_INTERVAL_SEC` | `1` | Worker idle sleep |
| `AGENT_WORKER_POLL_INTERVAL_SEC` | `0.4` | Sleep when the agent run queue is empty |
| `CORS_ORIGINS` | `http://localhost:5173,...` | Comma-separated browser origins for the API; use `*` to allow all (dev only). Needed when the UI origin differs from the API (e.g. Vite on 5173). The Compose gateway on port 80 uses same-origin `/api` and does not rely on CORS. |
Set `VLLM_BASE_URL` to an empty value to skip all vLLM calls (probe + chat).
API quick reference
- `GET /v1/jobs/{id}/events?limit=50&after_id=0` → `{ "items": [...], "next_after_id": <id|null> }` (use `next_after_id` as the next `after_id`)
- `GET /v1/jobs/{id}/events?legacy=true` → full array (large jobs)
- `GET /v1/jobs/{id}/summary` → aggregated `chunk_analyses` after the job completes
- `GET /v1/jobs/{id}` includes `summary` when present
- `POST /v1/jobs/{id}/agent-runs` → queue agent run (202 + `run_id`); `GET /v1/agent-runs/{run_id}` → status and `result`
- `POST /v1/jobs/{id}/agent-runs/orchestrate` → sequential multi-agent run (202 + `orchestration_id`, `head_run_id`); `POST /v1/jobs/{id}/agent-runs/orchestrate/industry` → same, with an `industry` pack and persisted `industry_pack` on the orchestration row
- `GET /v1/agent-orchestrations/{orchestration_id}` → pipeline status (includes `industry_pack` when applicable); `GET /v1/jobs/{id}/agent-orchestrations` → history
- `GET /v1/jobs/{id}/agent-runs` → recent runs for that job
- `POST /v1/jobs/{id}/agents/synthesis?blocking=true` → blocking synthesis; without `blocking`, 202 + async queue (same as `agent-runs` with `synthesis`)
- `GET /v1/jobs/{id}/agents/synthesis`, `…/risk-review`, `…/incident-brief`, `…/compliance-brief`, `…/loss-prevention`, `…/perimeter-chain`, `…/privacy-review` → latest stored event payload
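The `after_id` cursor loop can be sketched with an injected fetch function, so the paging logic is visible without a live API; `fetch_page` here is a stand-in for an HTTP GET against `…/events`:

```python
from typing import Callable, Iterator, Optional


def iter_events(fetch_page: Callable[[Optional[int]], dict],
                after_id: Optional[int] = 0) -> Iterator[dict]:
    """Yield events page by page, following next_after_id until it is null."""
    while after_id is not None:
        page = fetch_page(after_id)       # e.g. GET …/events?limit=50&after_id=…
        yield from page["items"]
        after_id = page["next_after_id"]  # None (JSON null) terminates the loop


# Fake two-page response to demonstrate the cursor handling.
pages = {0: {"items": [{"id": 1}, {"id": 2}], "next_after_id": 2},
         2: {"items": [{"id": 3}], "next_after_id": None}}
print([e["id"] for e in iter_events(pages.__getitem__)])  # → [1, 2, 3]
```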
Observe multimodal format: `chunk_video_user_messages` — text + `video_url` (`data:video/mp4;base64,...`). Structured parsing lives in `analysis/chunk_pipeline.py` and `analysis/json_extract.py`.
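A sketch of that user-message shape; the exact structure lives in `chunk_video_user_messages`, and this reconstruction assumes the OpenAI-style content-part layout:

```python
import base64


def chunk_video_user_message(prompt: str, mp4_bytes: bytes) -> dict:
    """Build a text + video_url user message with a base64 MP4 data URI."""
    b64 = base64.b64encode(mp4_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "video_url",
             "video_url": {"url": f"data:video/mp4;base64,{b64}"}},
        ],
    }


msg = chunk_video_user_message("Describe this clip.", b"\x00\x00")
print(msg["content"][1]["video_url"]["url"])
```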
```bash
PYTHONPATH=src python -m unittest discover -s tests -p 'test_*.py' -v
```