Autonomic Photography Assistant

This project runs an iterative photo-optimization loop for a portrait objective.

The pipeline captures one or more overview images from an Android phone, summarizes the scene with Gemini on Vertex, retrieves prior real examples, proposes a new {pan, orientation, z, zoom_ratio} setting, captures a candidate image, scores it, stores the attempt, and repeats until it stops.

Reference Visuals

These images are AI-generated and are included only as illustrative reference material. They may contain inaccuracies and should not be treated as the source of truth over the code and runtime behavior.

System Overview

UI Example

Flow

Step	What happens
1	Start from the current local initial pose: `pan=0`, `orientation=portrait`, `z=23`, `zoom_ratio=1.0`
2	Capture `PHOTO_ASSISTANT_OVERVIEW_COUNT` overview images; the current local env uses `2`
3	Gemini scene summary uses `VERTEX_SUMMARY_MODEL`; the current local env uses `gemini-2.5-flash`
4	Retrieval finds up to `4` good and `4` bad prior examples
5	Gemini reasoning uses `VERTEX_REASONER_MODEL` to propose the next `{pan, orientation, z, zoom_ratio}`; the current local env uses `gemini-2.5-flash`
6	In `manual` mode, you move the phone or rig and confirm capture
7	In `hardware` mode, the ESP8266 rig applies the proposal automatically
8	Gemini scoring uses `VERTEX_SCORER_MODEL`; the current local env uses `gemini-2.5-flash`
9	The run stores the attempt and either continues, stalls, or finishes
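The flow above can be condensed into a minimal loop sketch. It assumes steps 2 to 5 (overview capture, scene summary, retrieval, reasoning) are folded into one `propose` callable and steps 6 to 8 into `capture_and_score`. The names `run_loop`, `Attempt`, `propose`, `capture_and_score` and `should_stop` are hypothetical and are not taken from app/services/orchestrator.py.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A pose is a plain dict in this sketch, e.g.
# {"pan": 0, "orientation": "portrait", "z": 23, "zoom_ratio": 1.0}
Pose = dict


@dataclass
class Attempt:
    pose: Pose
    score: float


def run_loop(
    initial_pose: Pose,
    propose: Callable[[Pose, list[Attempt]], Pose],
    capture_and_score: Callable[[Pose], float],
    should_stop: Callable[[list[Attempt]], Optional[str]],
    max_iterations: int,
) -> tuple[list[Attempt], str]:
    """Hypothetical propose -> capture -> score -> store loop."""
    attempts: list[Attempt] = []
    pose = dict(initial_pose)  # step 1: start from the initial pose
    for _ in range(max_iterations):
        # Steps 2-5 (assumed to live inside propose): overview, summary,
        # retrieval, and reasoning produce the next pose.
        pose = propose(pose, attempts)
        # Steps 6-8: move the device (manually or via the rig), capture, score.
        score = capture_and_score(pose)
        # Step 9: store the attempt, then decide whether to continue.
        attempts.append(Attempt(pose=dict(pose), score=score))
        reason = should_stop(attempts)
        if reason is not None:
            return attempts, reason
    return attempts, "max_iterations"
```

Passing the stages in as callables keeps the loop itself free of camera, rig, and Vertex details, which is also what makes the mock execution mode practical for wiring checks.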

Current Local Model Configuration

Component	Model
Scene understanding	Vertex Gemini `gemini-2.5-flash`
Reasoning / proposal generation	Vertex Gemini `gemini-2.5-flash`
Candidate image scoring	Vertex Gemini `gemini-2.5-flash`

The current local env/config split is:

VERTEX_REASONER_MODEL
VERTEX_SUMMARY_MODEL
VERTEX_SCORER_MODEL

VERTEX_VISION_MODEL still works as a legacy fallback that sets both summary and scorer when the split variables are not provided.

Code defaults in app/config.py are still gemini-2.5-flash for reasoning and gemini-2.5-flash-lite for summary and scoring. The table above reflects the current local .env behavior, without exposing machine-specific paths or secrets.
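The precedence described above can be sketched as a small resolver. This is an illustration, not the code in app/config.py: the function name `resolve_models` is hypothetical, and it assumes each split variable falls back to VERTEX_VISION_MODEL independently, then to the code defaults.

```python
from typing import Mapping

# Code defaults as described for app/config.py.
DEFAULT_REASONER_MODEL = "gemini-2.5-flash"
DEFAULT_VISION_MODEL = "gemini-2.5-flash-lite"


def resolve_models(env: Mapping[str, str]) -> dict[str, str]:
    """Hypothetical resolution of reasoner/summary/scorer model names."""

    def pick(key: str, fallback: str) -> str:
        value = env.get(key, "").strip()
        return value or fallback  # empty or missing -> fallback

    # Legacy variable: used for summary and scorer only when the
    # corresponding split variable is not provided.
    vision_fallback = env.get("VERTEX_VISION_MODEL", "").strip() or DEFAULT_VISION_MODEL
    return {
        "reasoner": pick("VERTEX_REASONER_MODEL", DEFAULT_REASONER_MODEL),
        "summary": pick("VERTEX_SUMMARY_MODEL", vision_fallback),
        "scorer": pick("VERTEX_SCORER_MODEL", vision_fallback),
    }
```

For example, `resolve_models(os.environ)` with no Vertex model variables set yields the code defaults, while the current local .env sets all three split variables to `gemini-2.5-flash`.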

Current Runtime Assumptions

android_http is the main camera adapter for live runs.
vertex is the main reasoner for live runs.
gemini is the main VLM for live runs.
The current local env starts each run from:
- pan=0
- orientation=portrait
- z=23
- zoom_ratio=1.0
The current local env uses:
- hardware as the default execution mode
- retrieval_top_k=20
- max_iterations=15
- overview_count=2
- 4 good retrieved examples
- 4 bad retrieved examples
- 3 session-memory items
The first proposal is seeded by an overview-derived framing prior before retrieval examples are allowed to dominate.
Retrieval excludes both execution_mode="mock" examples and synthetic examples.
Mock runs no longer write new examples into long-term retrieval memory.
The current local Vertex env uses:
- VERTEX_REASONER_TIMEOUT_SECONDS=40
- VERTEX_REASONER_RETRY_COUNT=2
- VERTEX_REASONER_RETRY_BACKOFF_SECONDS=3
- no explicit scorer timeout override, so scorer timeout falls back to the code default
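The retrieval limits above can be illustrated with a filtering sketch. The README does not say how examples are classified as good or bad, so the `score` field and the `good_score` cutoff are hypothetical, as are the `synthetic` flag and the function name `select_prompt_examples`; only the `execution_mode` field and the limits (20, 4, 4) come from the text. It also assumes `top_k` is applied after mock and synthetic examples are dropped.

```python
from typing import Any, Iterable


def select_prompt_examples(
    ranked: Iterable[dict[str, Any]],
    top_k: int = 20,
    good_limit: int = 4,
    bad_limit: int = 4,
    good_score: float = 7.0,  # hypothetical cutoff between "good" and "bad"
) -> tuple[list[dict], list[dict]]:
    """Hypothetical good/bad split over similarity-ranked examples."""
    # Drop mock and synthetic examples, then keep the top_k most similar.
    eligible = [
        ex for ex in ranked
        if ex.get("execution_mode") != "mock" and not ex.get("synthetic", False)
    ][:top_k]
    # Preserve similarity order within each bucket.
    good = [ex for ex in eligible if ex["score"] >= good_score][:good_limit]
    bad = [ex for ex in eligible if ex["score"] < good_score][:bad_limit]
    return good, bad
```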

Phone Camera Setup

The phone side is the separate Android Studio project in android-camera-server/.

It exposes:

GET /health
POST /capture
GET /shot.jpg

The Android app currently serves HTTP on port 8080.

The Python pipeline expects:

PHOTO_ASSISTANT_ANDROID_CAPTURE_URL
PHOTO_ASSISTANT_ANDROID_SNAPSHOT_URL

Typical direct-network values:

PHOTO_ASSISTANT_ANDROID_CAPTURE_URL=http://<phone-ip>:8080/capture
PHOTO_ASSISTANT_ANDROID_SNAPSHOT_URL=http://<phone-ip>:8080/shot.jpg

POST /capture accepts query parameters for:

zoom_ratio or legacy zoom
orientation such as portrait or landscape
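A minimal client sketch for the phone endpoints, using only the standard library. It assumes the capture is triggered with POST /capture (query parameters only, empty body) and that GET /shot.jpg then returns the latest JPEG; whether /capture itself returns image data is not documented here. The function names are hypothetical, and the 10-second timeout mirrors PHOTO_ASSISTANT_ANDROID_TIMEOUT_SECONDS.

```python
import urllib.request
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_capture_url(capture_url: str, zoom_ratio: float, orientation: str) -> str:
    """Attach zoom_ratio and orientation as query parameters."""
    query = urlencode({"zoom_ratio": zoom_ratio, "orientation": orientation})
    return urlunsplit(urlsplit(capture_url)._replace(query=query))


def capture_and_fetch(
    capture_url: str,
    snapshot_url: str,
    zoom_ratio: float = 1.0,
    orientation: str = "portrait",
    timeout: float = 10.0,
) -> bytes:
    """Hypothetical: trigger a capture, then download the snapshot JPEG."""
    request = urllib.request.Request(
        build_capture_url(capture_url, zoom_ratio, orientation),
        data=b"",
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass  # assumed: the capture response body is not needed
    with urllib.request.urlopen(snapshot_url, timeout=timeout) as response:
        return response.read()
```

Usage against a phone on the same network: `capture_and_fetch("http://<phone-ip>:8080/capture", "http://<phone-ip>:8080/shot.jpg", 1.5, "landscape")`.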

ESP8266 Rig Setup

hardware mode expects the ESP8266 rig controller to expose:

GET /health
GET /pose
POST /apply
POST /task/start
POST /task/end

Set:

PHOTO_ASSISTANT_ESP8266_BASE_URL=http://<esp-ip>

The currently deployed ESP8266 controller source is stored in esp8266_code.txt as a reference snapshot of what is running on the rig. That file shows:

the controller serves HTTP on port 80
AutoFrame-compatible endpoints include GET /health, GET /pose, GET or POST /apply, and GET or POST for both /task/start and /task/end
the deployed home pose is pan=0, orientation=portrait, z=23
accepted orientation values are portrait and landscape
the configured height range is 23 to 90 cm
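The deployed constraints can be checked on the Python side before a proposal is sent to /apply. This sketch rejects unknown orientations and clamps z into the 23 to 90 cm range; clamping rather than rejecting is a design choice made here, not documented rig behavior. The `Pose` class and `clamp_pose` are hypothetical, and pan is left untouched because its range is not stated.

```python
from dataclasses import dataclass, replace

Z_MIN_CM, Z_MAX_CM = 23, 90  # configured height range on the rig
ORIENTATIONS = ("portrait", "landscape")


@dataclass(frozen=True)
class Pose:
    pan: float
    orientation: str
    z: float
    zoom_ratio: float = 1.0


HOME_POSE = Pose(pan=0, orientation="portrait", z=23)


def clamp_pose(pose: Pose) -> Pose:
    """Reject unsupported orientations and clamp z into the rig's range."""
    if pose.orientation not in ORIENTATIONS:
        raise ValueError(f"unsupported orientation: {pose.orientation!r}")
    return replace(pose, z=min(max(pose.z, Z_MIN_CM), Z_MAX_CM))
```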

Environment

HPC workflow

Before any Python command on the HPC environment, run:

module load mamba
source activate autoframe
export HF_HOME="/scratch/ihmehta/hf_cache"

Local Windows workflow

Use the repo venv interpreter explicitly to avoid python / pip mismatches:

py -3.11 -m venv autoframe
.\autoframe\Scripts\python.exe -m pip install -r requirements.txt

Important:

google-auth and requests are both required for Vertex authentication via Application Default Credentials (ADC).
The repo .env is re-read by app.config whenever get_settings() is called.
.env values currently overwrite inherited process environment values for the same keys.
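The precedence rule above (the .env file is re-read on each call and wins over inherited variables) can be sketched as follows. The helpers `read_dotenv` and `effective_env` are hypothetical, and the parser is deliberately simple: it skips blank lines and `#` comments and does not handle quoting. Note that an empty entry such as `VERTEX_ENDPOINT_OVERRIDE=` also overrides, as an empty string.

```python
from pathlib import Path
from typing import Mapping


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; missing file -> no overrides."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_env(dotenv_path: Path, inherited: Mapping[str, str]) -> dict[str, str]:
    """Re-read .env on every call; .env values win for duplicate keys."""
    merged = dict(inherited)
    merged.update(read_dotenv(dotenv_path))
    return merged
```

Calling `effective_env(Path(".env"), os.environ)` each time settings are needed mirrors the re-read behavior, so edits to .env take effect without restarting the process.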

Example `.env`

This example reflects the current local env values, except that network URLs, project identifiers, and credential paths stay portable and placeholder-safe.

PHOTO_ASSISTANT_PROJECT_ROOT=.
PHOTO_ASSISTANT_DATA_DIR=./data
PHOTO_ASSISTANT_LOG_LEVEL=INFO

PHOTO_ASSISTANT_ESP8266_BASE_URL=http://<esp-ip>

PHOTO_ASSISTANT_DEFAULT_MODE=hardware
PHOTO_ASSISTANT_CAMERA_KIND=android_http
PHOTO_ASSISTANT_VLM_KIND=gemini
PHOTO_ASSISTANT_REASONER_KIND=vertex
PHOTO_ASSISTANT_DRY_RUN=false

PHOTO_ASSISTANT_ANDROID_CAPTURE_URL=http://<phone-ip>:8080/capture
PHOTO_ASSISTANT_ANDROID_SNAPSHOT_URL=http://<phone-ip>:8080/shot.jpg
PHOTO_ASSISTANT_ANDROID_TIMEOUT_SECONDS=10.0

PHOTO_ASSISTANT_INITIAL_PAN=0
PHOTO_ASSISTANT_INITIAL_ORIENTATION=portrait
PHOTO_ASSISTANT_INITIAL_Z=23

PHOTO_ASSISTANT_RETRIEVAL_TOP_K=20
PHOTO_ASSISTANT_PROMPT_GOOD_EXAMPLES_LIMIT=4
PHOTO_ASSISTANT_PROMPT_BAD_EXAMPLES_LIMIT=4
PHOTO_ASSISTANT_PROMPT_SESSION_MEMORY_LIMIT=3
PHOTO_ASSISTANT_OBJECTIVE_THRESHOLD=0.35
PHOTO_ASSISTANT_OBJECTIVE_WEIGHT=0.65
PHOTO_ASSISTANT_SCENE_WEIGHT=0.35
PHOTO_ASSISTANT_MAX_ITERATIONS=15
PHOTO_ASSISTANT_GOOD_ENOUGH_THRESHOLD=8.8
PHOTO_ASSISTANT_STALL_WINDOW=3
PHOTO_ASSISTANT_STALL_DELTA=0.2
PHOTO_ASSISTANT_OVERVIEW_COUNT=2

PHOTO_ASSISTANT_FOLDER_CAMERA_DIR=./data/captures/folder_source

VERTEX_PROJECT=<your-vertex-project-id>
VERTEX_LOCATION=us-central1
VERTEX_REASONER_MODEL=gemini-2.5-flash
VERTEX_SUMMARY_MODEL=gemini-2.5-flash
VERTEX_SCORER_MODEL=gemini-2.5-flash
VERTEX_REASONER_TIMEOUT_SECONDS=40
VERTEX_REASONER_RETRY_COUNT=2
VERTEX_REASONER_RETRY_BACKOFF_SECONDS=3
VERTEX_ENDPOINT_OVERRIDE=
VERTEX_ACCESS_TOKEN=
GOOGLE_APPLICATION_CREDENTIALS=./credentials/vertex-service-account.json
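One plausible reading of the stopping parameters in this file is sketched below; it is an interpretation, not the orchestrator's code. It assumes scores are on a 0 to 10 scale, that a run finishes once any score reaches PHOTO_ASSISTANT_GOOD_ENOUGH_THRESHOLD, and that it stalls when the best score in the last PHOTO_ASSISTANT_STALL_WINDOW attempts beats the earlier best by less than PHOTO_ASSISTANT_STALL_DELTA. The function name `stop_reason` is hypothetical, and the objective threshold and weights are not modeled.

```python
from typing import Optional, Sequence


def stop_reason(
    scores: Sequence[float],
    max_iterations: int = 15,   # PHOTO_ASSISTANT_MAX_ITERATIONS
    good_enough: float = 8.8,   # PHOTO_ASSISTANT_GOOD_ENOUGH_THRESHOLD
    stall_window: int = 3,      # PHOTO_ASSISTANT_STALL_WINDOW
    stall_delta: float = 0.2,   # PHOTO_ASSISTANT_STALL_DELTA
) -> Optional[str]:
    """Return "finished", "stalled", "max_iterations", or None to continue."""
    if not scores:
        return None
    if max(scores) >= good_enough:
        return "finished"
    if len(scores) > stall_window:
        best_before = max(scores[:-stall_window])
        best_recent = max(scores[-stall_window:])
        if best_recent - best_before < stall_delta:
            return "stalled"
    if len(scores) >= max_iterations:
        return "max_iterations"
    return None
```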

Initialize Data

Run once after creating the environment:

.\autoframe\Scripts\python.exe -m app.cli init-data

If old mock or synthetic examples are present in the dataset, clean them out and validate the artifacts:

.\autoframe\Scripts\python.exe -m app.cli remove-mock-examples
.\autoframe\Scripts\python.exe -m app.cli remove-synthetic-examples
.\autoframe\Scripts\python.exe -m app.cli validate-jsonl

Run The Pipeline

Manual run

.\autoframe\Scripts\python.exe -m app.cli run-objective "Capture a flattering portrait of one person" `
  --execution-mode manual `
  --camera-kind android_http `
  --reasoner-kind vertex `
  --vlm-kind gemini

During a manual run:

Let the system capture the overview images.
Wait for the proposed pan / orientation / z / zoom_ratio.
Move the phone or rig to that position.
Confirm, skip, retry, or abort in the CLI.
Let the system score the result and continue.

Hardware run

.\autoframe\Scripts\python.exe -m app.cli run-objective "Capture a flattering portrait of one person" `
  --execution-mode hardware `
  --camera-kind android_http `
  --reasoner-kind vertex `
  --vlm-kind gemini

Mock run

.\autoframe\Scripts\python.exe -m app.cli run-objective "Capture a wide group portrait with background context" `
  --execution-mode mock `
  --camera-kind mock `
  --reasoner-kind mock `
  --vlm-kind mock

Mock runs are useful for wiring checks, but they do not contribute retrieval examples for live runs.

Maintenance Commands

Rebuild retrieval artifacts:

.\autoframe\Scripts\python.exe -m app.cli rebuild-embeddings

Validate dataset consistency:

.\autoframe\Scripts\python.exe -m app.cli validate-jsonl

Remove stored mock examples:

.\autoframe\Scripts\python.exe -m app.cli remove-mock-examples

Remove stored synthetic examples:

.\autoframe\Scripts\python.exe -m app.cli remove-synthetic-examples

Rescore stored examples that still have accessible image files:

.\autoframe\Scripts\python.exe -m app.cli rescore-examples

Add a real example manually:

.\autoframe\Scripts\python.exe -m app.cli offline-add-example ... --execution-mode manual

Export or import the retrieval dataset:

.\autoframe\Scripts\python.exe -m app.cli export-dataset .\dataset_export.json
.\autoframe\Scripts\python.exe -m app.cli import-dataset .\dataset_export.json

API Server

Start the API:

.\autoframe\Scripts\python.exe -m uvicorn app.main:app --reload --host 127.0.0.1 --port 8000

Useful endpoints:

GET /health
GET /ui
GET /dataset/ui
GET /ui/config
POST /objectives/run
POST /objectives/start
POST /retrieve/debug
POST /score/debug
POST /scene-summary/debug
POST /offline/add-example
POST /dataset/capture-overview
POST /dataset/capture-score
POST /dataset/save-example
POST /dataset/reset-rig

Where Outputs Go

data/captures/: saved overview and candidate images
data/sessions/: full session state
data/traces/: event logs
data/debug/: prompt/debug artifacts per iteration
data/examples/examples.jsonl: long-term retrieval memory source of truth
data/examples/*.npy: retrieval embeddings
data/examples/id_map.json: embedding row to example id mapping
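The three retrieval files must stay in step: every example in examples.jsonl needs an embedding row, and id_map.json must have one entry per row. The sketch below shows the kind of cross-check that validate-jsonl is meant to cover; it is not that command's implementation. It assumes each JSONL record has an `id` field and that id_map.json is either a list indexed by row or an object keyed by row number; the function names are hypothetical. The embedding row count would come from, for example, `numpy.load(path).shape[0]`.

```python
import json
from collections import Counter
from pathlib import Path


def load_example_ids(jsonl_path: Path) -> list[str]:
    ids = []
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                ids.append(str(json.loads(line)["id"]))  # assumed field name
    return ids


def load_row_ids(id_map_path: Path) -> list[str]:
    raw = json.loads(id_map_path.read_text(encoding="utf-8"))
    if isinstance(raw, dict):  # assumed shape: {"0": "ex-1", "1": "ex-2"}
        return [str(raw[k]) for k in sorted(raw, key=int)]
    return [str(x) for x in raw]  # assumed shape: ["ex-1", "ex-2"]


def check_consistency(jsonl_path: Path, id_map_path: Path, n_rows: int) -> list[str]:
    """Return human-readable problems; an empty list means consistent."""
    example_ids = load_example_ids(jsonl_path)
    row_ids = load_row_ids(id_map_path)
    problems = []
    if len(row_ids) != n_rows:
        problems.append(f"id_map has {len(row_ids)} rows, embeddings have {n_rows}")
    dupes = sorted(i for i, c in Counter(example_ids).items() if c > 1)
    if dupes:
        problems.append(f"duplicate example ids: {dupes}")
    missing = sorted(set(example_ids) - set(row_ids))
    if missing:
        problems.append(f"examples without embeddings: {missing}")
    orphans = sorted(set(row_ids) - set(example_ids))
    if orphans:
        problems.append(f"embedding rows without examples: {orphans}")
    return problems
```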

Key Files

app/services/orchestrator.py: main optimization loop
app/services/vlm.py: Gemini scene summary and scoring
app/services/sas_reasoner.py: Gemini reasoning
app/services/prompt_builder.py: prompt construction and example selection
app/memory/retrieval.py: retrieval scoring and filtering
app/services/camera.py: Android HTTP camera adapter
app/services/rig.py: ESP8266 rig controller
android-camera-server/: phone-side camera server

Current Limitations

Manual mode still requires a human to move the device between iterations.
Hardware mode assumes the rig is reachable directly over HTTP.
Large images and remote Vertex calls add noticeable latency.
Retrieval and dataset maintenance use examples.jsonl, id_map.json, and the embedding .npy files directly.
