AI safety monitoring for toddlers using live video + audio.
The system watches only when a toddler is present, detects high-risk situations, verifies context, and speaks alerts in real time.
- Clear real-world problem: home child safety.
- Multi-signal intelligence: toddler presence, dangerous objects, fall events, and risky-zone crossing.
- Strong demo UX: live overlays + spoken warnings + structured alert logs.
- Practical design: false-positive controls, cooldowns, and stateful event transitions.
- Hackathon-ready evidence: saved videos, debug ticks, and JSON alert history.
- Toddler-first gating: safety processors run only while a toddler is detected in the frame.
- This reduces false alarms and saves compute when no toddler is visible.
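A minimal sketch of the gating idea, assuming a toddler detector that returns a boolean per frame. `ToddlerGate`, `detect_toddler`, and the processor names are hypothetical and not taken from the backend code; the point is only that no downstream processor runs on a frame without a toddler.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

Frame = Any  # placeholder for whatever image/frame type the pipeline uses


@dataclass
class ToddlerGate:
    """Hypothetical gate: run safety processors only when a toddler is present."""
    detect_toddler: Callable[[Frame], bool]
    processors: Dict[str, Callable[[Frame], Any]] = field(default_factory=dict)
    skipped_frames: int = 0

    def process(self, frame: Frame) -> Dict[str, Any]:
        if not self.detect_toddler(frame):
            # No toddler: skip every downstream model to save compute
            # and avoid alerts that cannot concern a toddler.
            self.skipped_frames += 1
            return {}
        return {name: fn(frame) for name, fn in self.processors.items()}
```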
- Local YOLO detection of dangerous object classes near the toddler.
- Proximity logic between the toddler bbox and each object bbox.
- The object detector optionally supports a second pass for harder-to-detect objects.
- Alert transitions are event-based (not frame-spam).
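The proximity and event-transition logic can be sketched as follows. The edge-to-edge gap metric and the "within half the toddler box width" rule are assumptions chosen for illustration, not the project's actual thresholds; `box_gap`, `is_near`, and `NearEvent` are hypothetical names.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def box_gap(a: Box, b: Box) -> float:
    """Edge-to-edge distance between two boxes; 0.0 if they overlap or touch."""
    dx = max(0.0, b[0] - a[2], a[0] - b[2])
    dy = max(0.0, b[1] - a[3], a[1] - b[3])
    return math.hypot(dx, dy)


def is_near(toddler: Box, obj: Box, ratio: float = 0.5) -> bool:
    """Hypothetical rule: near if the gap is at most `ratio` x toddler box width."""
    width = toddler[2] - toddler[0]
    return box_gap(toddler, obj) <= ratio * width


class NearEvent:
    """Fires only on the False -> True transition, not on every near frame."""

    def __init__(self) -> None:
        self.active = False

    def update(self, near: bool) -> bool:
        fired = near and not self.active
        self.active = near
        return fired
```

Because `NearEvent.update` returns `True` only on the rising edge, a toddler who stays next to an object for many frames produces one alert rather than one per frame.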
- Toddler-gated fall model inference.
- The fall processor buffers and thresholds model outputs to avoid flickering alerts.
- Emits an alert event and a spoken warning when the fall state transitions to true.
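One common way to buffer and threshold per-frame fall scores is a k-of-n window combined with rising-edge detection. This is a hypothetical sketch: the class name, the window size, and the 0.6 threshold are assumptions, since the source only says the fall processor is buffered and thresholded.

```python
from collections import deque


class FallDebouncer:
    """Hypothetical k-of-n debounce for fall-model scores.

    The fall state is True when at least `k` of the last `n` scores are
    >= `threshold`. An alert fires only when that state flips False -> True.
    """

    def __init__(self, n: int = 5, k: int = 3, threshold: float = 0.6) -> None:
        self.window = deque(maxlen=n)  # oldest result drops out automatically
        self.k = k
        self.threshold = threshold
        self.state = False

    def update(self, score: float) -> bool:
        self.window.append(score >= self.threshold)
        new_state = sum(self.window) >= self.k
        fired = new_state and not self.state
        self.state = new_state
        return fired
```

A single high-scoring frame is not enough to trigger an alert; several consistent frames are.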
- Startup Moondream two-step flow:
- Step 1: `/v1/query` returns unsafe place names.
- Step 2: `/v1/detect` localizes the returned place(s), prioritizing stairs-like labels.
- The risk zone locks after a successful detection.
- If not found, detection retries on later frames until success (configurable).
- Boundary crossing and near-zone state are tracked in real time.
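A sketch of near/inside tracking against a locked zone box, assuming the toddler's position is the bottom-center of its bounding box and that "near" means within a fixed pixel margin of the zone. Both choices, the 40-pixel margin, and the `ZoneTracker` name are assumptions for illustration.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def zone_status(toddler: Box, zone: Box, margin: float = 40.0) -> str:
    """Classify the toddler's foot point (bottom-center) against the zone."""
    fx = (toddler[0] + toddler[2]) / 2
    fy = toddler[3]
    x1, y1, x2, y2 = zone
    if x1 <= fx <= x2 and y1 <= fy <= y2:
        return "inside"
    if x1 - margin <= fx <= x2 + margin and y1 - margin <= fy <= y2 + margin:
        return "near"
    return "outside"


class ZoneTracker:
    """Tracks status changes against a zone that was locked once at startup."""

    def __init__(self, zone: Box) -> None:
        self.zone = zone
        self.status = "outside"

    def update(self, toddler: Box) -> Optional[str]:
        """Return an event name when the status changes, else None."""
        new = zone_status(toddler, self.zone)
        prev, self.status = self.status, new
        if new == prev:
            return None
        if new == "inside":
            return "crossed"
        if new == "near" and prev == "outside":
            return "near"
        return None
```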
- Moondream verifies dangerous-object candidates before the final danger alert is raised.
- This suppresses false positives that the local detector would produce on its own.
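The verification step can be pictured as a gate between the local detector and the alert. In this hypothetical sketch, `verify` stands in for the Moondream call (a real verifier would also receive the frame or crop), and verdicts are cached per label for a short TTL so the remote model is not queried on every frame. The caching is an assumption, not something the source describes.

```python
import time
from typing import Callable, Dict, Tuple


class VerifiedDangerGate:
    """Hypothetical gate: a local-detector candidate becomes a danger alert
    only if `verify(label)` confirms it. Verdicts are cached for `ttl_s`."""

    def __init__(self, verify: Callable[[str], bool], ttl_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.verify = verify
        self.ttl_s = ttl_s
        self.clock = clock
        self._cache: Dict[str, Tuple[bool, float]] = {}  # label -> (verdict, time)

    def confirm(self, label: str) -> bool:
        now = self.clock()
        cached = self._cache.get(label)
        if cached is not None and now - cached[1] < self.ttl_s:
            return cached[0]  # reuse a recent verdict instead of a remote call
        verdict = self.verify(label)
        self._cache[label] = (verdict, now)
        return verdict
```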
- Cartesia TTS warnings for:
- fall detected
- dangerous object near toddler
- toddler near/crossed/inside risky stairs zone
- Alert cooldown and dedupe logic prevents repetitive speech spam.
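A per-key cooldown is one simple way to implement this anti-spam behavior. `SpeechThrottle`, the 15-second default, and the key names are hypothetical; the actual Cartesia TTS call is not shown.

```python
import time
from typing import Callable, Dict, Optional


class SpeechThrottle:
    """Hypothetical sketch: allow a spoken alert for a given key at most once
    per `cooldown_s` seconds; repeats inside the window are dropped."""

    def __init__(self, cooldown_s: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._last: Dict[str, float] = {}  # key -> time of last spoken alert

    def should_speak(self, key: str) -> bool:
        now = self.clock()
        last: Optional[float] = self._last.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False
        self._last[key] = now
        return True
```

The caller would speak only when `should_speak(key)` returns `True`, using one key per alert type (for example `"fall"` or `"stairs_zone"`) so that different alerts do not silence each other.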
- Live overlay stream with all active safety annotations.
- JSONL alert sink for downstream actions (notifications/email integration).
- Saved zone snapshots and raw Moondream request/response logs for debugging.
- Offline video runners to reproduce and demonstrate results quickly.
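JSONL (one JSON object per line) suits an append-only alert sink: each alert is a self-contained line that can be appended without rewriting the file, and downstream consumers can read it line by line. A minimal sketch, assuming a record with `ts` and `event` fields; the field names and helper functions are hypothetical, not the project's actual schema.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List


def append_alert(path: Path, event: str, **details: Any) -> Dict[str, Any]:
    """Append one alert as a single JSON line (hypothetical schema)."""
    record = {"ts": time.time(), "event": event, **details}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_alerts(path: Path) -> List[Dict[str, Any]]:
    """Read alerts back; every non-empty line is one JSON object."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```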
- Live annotated stream:
- toddler boxes, danger boxes, fall box, red risky-zone box.
- status text including boundary crossing.
- Spoken messages (Cartesia):
- “Toddler fall detected.”
- “ is dangerously close to the toddler.”
- “Baby crossed into the stairs danger zone.”
- `backend/`: agent runtime, processors, routes, tools
- `frontend/`: viewer/demo UI
- Python 3.12
- `uv`
- API keys:
- `STREAM_API_KEY`, `STREAM_API_SECRET`
- `GOOGLE_API_KEY`
- `ROBOFLOW_API_KEY`
- `MOONDREAM_API_KEY`
- `CARTESIA_API_KEY` (for TTS)
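Before starting the agent, a small preflight check can catch missing keys early. This helper is hypothetical and not part of the repository; the key names are the ones listed above.

```python
import os
from typing import List, Mapping, Optional

# Key names taken from the requirements list above.
REQUIRED_KEYS = [
    "STREAM_API_KEY", "STREAM_API_SECRET", "GOOGLE_API_KEY",
    "ROBOFLOW_API_KEY", "MOONDREAM_API_KEY", "CARTESIA_API_KEY",
]


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required keys that are unset or empty (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

A startup script could call `missing_keys()` and exit with a clear message listing whatever it returns.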
- Run agent:
- Optional stream endpoint server:
- Stream URL: `http://127.0.0.1:8000/video/stream`
- Alert events: `backend/data/alerts/alerts.jsonl`
- Zone init artifacts:
- `backend/data/test_results/zone_risk/zone_init_*_raw.jpg`
- `backend/data/test_results/zone_risk/zone_init_*_marked.jpg`
- `backend/data/test_results/zone_risk/zone_init_*_response.json`
- `backend/data/test_results/zone_risk/moondream_api_debug.jsonl`
- Tick-level debug CSVs in `backend/data/test_results/`
- The frontend joins the Stream call via the backend token endpoint: `POST /auth/stream-token`
- The backend agent joins the same `call_id` and processes the incoming user camera.
- The frontend displays the processed output from `GET /video/stream`
- The `person` class is filtered out of the object overlay/state.
- The zone is locked after detection and is not continuously re-detected.
- Retry is enabled only until the initial zone lock succeeds.
- TTS is event-based with cooldown to avoid repetitive spam.
cd backend
uv venv --python 3.12 .venv
source .venv/bin/activate
uv sync
If needed:

uv add "vision-agents[moondream]"

Create `backend/.env`:

STREAM_API_KEY=...
STREAM_API_SECRET=...
GOOGLE_API_KEY=...
ROBOFLOW_API_KEY=...
MOONDREAM_API_KEY=...
CARTESIA_API_KEY=...
GEMINI_MODEL=gemini-2.5-flash-lite
STARTUP_SPEECH_ENABLED=false
ZONE_RISK_DEBUG_DIR=data/test_results/zone_risk
ZONE_RISK_INIT_AFTER_FRAMES=0
ZONE_RISK_INIT_RETRY_INTERVAL_FRAMES=30
ZONE_RISK_MAX_INIT_ATTEMPTS=0
ZONE_RISK_CROSSED_DISPLAY_SECONDS=3
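The `ZONE_RISK_*` settings control when the startup zone detection is attempted. The sketch below shows one plausible interpretation, and every part of it is an assumption: the first attempt happens at frame `ZONE_RISK_INIT_AFTER_FRAMES`, later attempts every `ZONE_RISK_INIT_RETRY_INTERVAL_FRAMES` frames, `ZONE_RISK_MAX_INIT_ATTEMPTS=0` means unlimited, and attempts stop once the zone locks. `ZoneInitSchedule` is a hypothetical name.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class ZoneInitSchedule:
    """Hypothetical interpretation of the ZONE_RISK_* retry settings."""
    init_after_frames: int = 0
    retry_interval_frames: int = 30
    max_attempts: int = 0  # assumption: 0 means "no limit"
    attempts: int = 0
    locked: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ZoneInitSchedule":
        return cls(
            init_after_frames=int(env.get("ZONE_RISK_INIT_AFTER_FRAMES", "0")),
            retry_interval_frames=int(env.get("ZONE_RISK_INIT_RETRY_INTERVAL_FRAMES", "30")),
            max_attempts=int(env.get("ZONE_RISK_MAX_INIT_ATTEMPTS", "0")),
        )

    def should_attempt(self, frame_idx: int) -> bool:
        if self.locked or frame_idx < self.init_after_frames:
            return False  # zone already locked, or too early
        if self.max_attempts and self.attempts >= self.max_attempts:
            return False  # attempt budget exhausted
        interval = max(1, self.retry_interval_frames)
        return (frame_idx - self.init_after_frames) % interval == 0

    def record(self, success: bool) -> None:
        self.attempts += 1
        self.locked = self.locked or success  # lock once, never re-detect
```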
Run the agent:

cd backend
source .venv/bin/activate
python server.py run --call-type default --call-id vision-test-1

Optional stream endpoint server:

python server.py serve --host 127.0.0.1 --port 8000
Full danger/fall pipeline on video:
backend/.venv/bin/python backend/tools/run_toddler_danger_video.py \
  --input backend/data/test_video/test_7.mp4 \
  --output backend/data/test_results/test_7_full_pipeline.mp4 \
  --process-fps 1 \
  --debug-log backend/data/test_results/test_7_full_pipeline_ticks.csv
Zone-risk only runner:
backend/.venv/bin/python backend/tools/run_zone_risk_video.py \
  --input backend/data/test_video/test_8.mp4 \
  --output backend/data/test_results/test_8_zone_risk.mp4 \
  --process-fps 2 \
  --debug-log backend/data/test_results/test_8_zone_risk_ticks.csv
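The `--process-fps` option suggests the runners analyze a subset of frames rather than every frame. A hypothetical sketch of such subsampling; the function name and the fall-through to "every frame" when `process_fps` is not lower than the source rate are assumptions, and the source frame rate would normally be read from the video file.

```python
from typing import Iterator


def sampled_frame_indices(total_frames: int, source_fps: float,
                          process_fps: float) -> Iterator[int]:
    """Yield frame indices so that roughly `process_fps` frames are handled
    per second of video (hypothetical sketch of a --process-fps option)."""
    if process_fps <= 0 or process_fps >= source_fps:
        yield from range(total_frames)  # no subsampling needed
        return
    step = source_fps / process_fps  # source frames per processed frame
    next_pick = 0.0
    for i in range(total_frames):
        if i >= next_pick:
            yield i
            next_pick += step
```

With a 30 fps source, `--process-fps 1` would select frames 0, 30, 60, ..., and `--process-fps 2` would select frames 0, 15, 30, .... Using a floating-point `step` keeps the spacing correct for non-integer source rates.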
Playable sample outputs:
- test_result_3.mp4
- test_result_2.mp4
- test_7.mp4
For the Vite frontend, set:
VITE_BACKEND_URL=http://127.0.0.1:8000