Make VLM analysis frame count configurable (beyond fixed 4-frame grid)

**Is your feature request related to a problem? Please describe.**

On SharpAI Aegis v0.2.9 (macOS, Apple Silicon), VLM clip analysis appears to be limited to a fixed **4-frame composite** per clip. Saved analyses in `~/.aegis-ai/media/video_chats/` consistently reference four sequential frames (e.g. "all four timestamps", "frame three", "four sequential moments").

For security monitoring, that often isn't enough temporal context. Short events (quick deliveries, brief intrusions) can fall between the sampled frames in a ~1-minute clip. Right now I cannot configure how many frames the VLM sees *within* each clip.

From inspecting the installed app, the motion-composite pipeline seems to cap grid mode at 4 frames (`render4FrameGrid`), and I couldn't find a user-facing setting in the Aegis UI or `~/.aegis-ai/` config files to change this.

---

**Describe the solution you'd like**

A configurable setting (global and/or per-camera) for how many frames are sent to the VLM per clip, for example:

- **Grid size:** 2×2 (4), 3×3 (9), 4×4 (16), or a custom frame count
- **Frame count:** or simply a number of frames sampled per clip (4, 6, 10, etc.)

Ideally this would be exposed in **AI Setup** or per-camera settings, with a short note on the tradeoff: more frames = richer temporal context, but higher VLM latency/cost and potentially lower per-frame resolution when composited into a single image.
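To make the resolution tradeoff concrete, here is a rough sketch of how a frame-count setting could map to a grid layout. Everything in it is hypothetical: `VlmFrameSettings`, `gridLayout`, and the composite dimensions are names and values I made up for illustration, not anything I found in Aegis.

```typescript
// Hypothetical settings shape; none of these names exist in Aegis today.
interface VlmFrameSettings {
  frameCount: number;      // frames sent to the VLM per clip
  compositeWidth: number;  // composite image width in pixels (assumed value)
  compositeHeight: number; // composite image height in pixels (assumed value)
}

interface GridLayout {
  cols: number;
  rows: number;
  tileWidth: number;  // pixels available to each frame
  tileHeight: number;
}

// Pick the smallest near-square grid that fits frameCount tiles,
// then work out how many pixels each tile gets.
function gridLayout(s: VlmFrameSettings): GridLayout {
  if (!Number.isInteger(s.frameCount) || s.frameCount < 1) {
    throw new RangeError("frameCount must be a positive integer");
  }
  const cols = Math.ceil(Math.sqrt(s.frameCount));
  const rows = Math.ceil(s.frameCount / cols);
  return {
    cols,
    rows,
    tileWidth: Math.floor(s.compositeWidth / cols),
    tileHeight: Math.floor(s.compositeHeight / rows),
  };
}
```

Assuming a 1280×720 composite, 4 frames gives 640×360 tiles, 9 frames gives 426×240, and 16 frames gives 320×180, which is the kind of number a tradeoff note in the UI could show.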

---

**Describe alternatives you've considered**

1. **Custom DeepCamera analysis skill**: viable for advanced users (e.g. multi-frame ffmpeg sampling like `smarthome-bench`), but requires building and maintaining a separate pipeline outside the default Aegis flow.
2. **Patching `app.asar`**: technically possible, but unsupported, breaks on every update, and not sustainable. I'm also not sure the license permits it.

---

**Additional context**

- **App:** SharpAI Aegis `0.2.9`
- **Platform:** macOS (Apple Silicon)
- **VLM:** Local engine (llama.cpp), Qwen3.5-2B
- **Camera:** Local webcam (testing)
- **Data dir:** `~/.aegis-ai/`

From the app bundle, the pipeline appears to sample frames at `sampleFps: 5` (up to ~60 stored frames), then choose between a stroboscopic composite (busy motion) or a fixed 4-frame 2×2 grid (simpler scenes). Aegis docs mention frame extraction at 0.1–5 fps, but it's unclear how that relates to the final 4-frame VLM input. Clarifying or unifying those settings would help.
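As a sketch of what a configurable count could mean for the selection step, the snippet below picks evenly spaced frames from the stored buffer and reports the largest gap between picks. I don't know how Aegis actually chooses its four frames; `pickFrameIndices` and `maxGapSeconds` are hypothetical helpers, and the midpoint spacing is my own assumption.

```typescript
// Hypothetical helper: choose `count` evenly spaced indices out of `total`
// stored frames, taking the midpoint of each equal-width segment.
function pickFrameIndices(total: number, count: number): number[] {
  if (!Number.isInteger(total) || !Number.isInteger(count) || total < 1 || count < 1) {
    throw new RangeError("total and count must be positive integers");
  }
  const n = Math.min(count, total); // cannot pick more frames than are stored
  const indices: number[] = [];
  for (let i = 0; i < n; i++) {
    indices.push(Math.floor(((i + 0.5) * total) / n));
  }
  return indices;
}

// Largest time gap between adjacent picks, assuming stored frames are
// spaced 1 / sampleFps seconds apart.
function maxGapSeconds(indices: number[], sampleFps: number): number {
  let maxGap = 0;
  for (let i = 1; i < indices.length; i++) {
    maxGap = Math.max(maxGap, indices[i] - indices[i - 1]);
  }
  return maxGap / sampleFps;
}
```

With 60 stored frames, 4 picks land 15 stored frames apart, while 9 picks are at most 7 apart. If the stored frames really are spaced at `sampleFps: 5`, that is a 3 s blind spot versus 1.4 s.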

Happy to provide sample `video_chats` JSON outputs or help test a beta build if this gets implemented.

Make VLM analysis frame count configurable (beyond fixed 4-frame grid) #203
