feat(telegram): unique photo filenames + caption-aware auto-vision by jkyberneees · Pull Request #23 · BackendStack21/odek

jkyberneees · 2026-06-07T15:10:01Z

Summary

Two fixes for the Telegram photo flow, reported from live use.

1. Filename collision — "image already processed"

DownloadPhoto/DownloadVoice named files photo_<fileID[:16]>.<ext>. Telegram file_ids share a long constant prefix (e.g. AgACAgIAAxkBAAI…) that encodes file-type/datacenter/version — the bytes that actually distinguish one file from another come after char 16. Truncating kept only the shared prefix, so every photo mapped to the same filename and overwrote the previous one, making the bot treat each new image as already-seen.

Fix: new fileIDSuffix() hashes the full file_id (SHA-256, first 16 hex chars) for a genuinely unique suffix. Applied to both photo and voice downloads.
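A minimal sketch of the hashing scheme described above, assuming `fileIDSuffix` takes the raw file_id string and returns a string. The `photoFilename` helper and its parameter names are hypothetical and only illustrate how the suffix might be combined with the `photo_` prefix and extension.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// fileIDSuffix hashes the full file_id with SHA-256 and keeps the first
// 16 hex characters, so IDs that differ only after a long shared prefix
// still produce different suffixes.
func fileIDSuffix(fileID string) string {
	sum := sha256.Sum256([]byte(fileID))
	return hex.EncodeToString(sum[:])[:16]
}

// photoFilename is a hypothetical helper (not from the PR) that builds
// a name of the form photo_<suffix>.<ext>.
func photoFilename(fileID, ext string) string {
	return "photo_" + fileIDSuffix(fileID) + "." + ext
}
```

Because the digest depends on every byte of the file_id, two IDs that agree on their first 16 characters no longer collapse to one filename, which is the property the regression test checks.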

2. Caption-aware auto-vision

A photo can carry a caption (the user's actual request), which was silently dropped — and the agent had to discover/call vision itself.

Fix:

Message gains a Caption field; OnPhotoMessage now receives it.
New vision.auto_describe config (default true, mirrors transcription.auto_transcribe).
On a photo, the bot runs the vision model first (focused by the caption when present) to extract a description, then injects [description] + caption to the agent so it answers the request using the description.
Falls back to the path-based message when auto-describe is off or vision fails.

The extracted description stays wrapped in <untrusted_content> boundaries (image text is untrusted input); the caption is the user's own trusted request.

Behavior

| Input | Before | After |
|---|---|---|
| Two different photos | Same filename → "already processed" | Distinct filenames, each processed |
| Photo + caption "what breed?" | Caption dropped; path handed to agent | Vision extracts description focused on the caption → agent answers "what breed?" |
| Photo, no caption | Path handed to agent | Vision describes → agent summarizes |

Config

Docker configs (config.restricted.json, config.godmode.json) ship vision.auto_describe: true. Note: as with auto_transcribe, the default of true applies only when the vision section is entirely absent; a config that includes a vision section must set the flag explicitly, otherwise it resolves to false.
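The defaulting rule in the note can be illustrated with a small sketch. The `Config` and `VisionConfig` types, the `parseConfig` helper, and the `resolveVision` signature here are assumptions made for illustration; only the `vision.auto_describe` key and the "default applies only when the section is absent" behavior come from the description above.

```go
package main

import "encoding/json"

// VisionConfig is a hypothetical shape for the "vision" config section.
type VisionConfig struct {
	AutoDescribe bool `json:"auto_describe"`
}

// Config is a hypothetical top-level config; Vision is a pointer so an
// absent section can be told apart from a present-but-empty one.
type Config struct {
	Vision *VisionConfig `json:"vision,omitempty"`
}

// parseConfig decodes a JSON config document.
func parseConfig(raw string) (Config, error) {
	var c Config
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

// resolveVision applies the default only when the section is absent.
// A present section is taken as-is, so an omitted flag stays false.
func resolveVision(c Config) VisionConfig {
	if c.Vision == nil {
		return VisionConfig{AutoDescribe: true}
	}
	return *c.Vision
}
```

Under these assumptions, `{}` resolves to auto-describe on, `{"vision":{}}` resolves to off, and `{"vision":{"auto_describe":true}}` resolves to on, which is why the shipped Docker configs set the flag explicitly.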

Tests

TestDownloadPhoto_PrefixCollisionAvoided — regression: two IDs sharing a prefix produce different filenames.
TestDownloadVoice_HashedFileIDSuffix / TestDownloadPhoto_HashedFileIDSuffix — hashed suffix, raw prefix absent.
TestHandleUpdate_PhotoMessage — asserts caption threading.
TestResolveVision_Defaults / TestResolveVision_AutoDescribePreserved — auto_describe default + explicit values.

All packages build, go vet clean, tests pass under -race.

Docs

docs/CHEATSHEET.md and docs/TELEGRAM.md updated (auto-describe flow, new filename scheme, updated handler signature).

🤖 Generated with Claude Code

Two fixes for the Telegram photo flow:

1) Filename collision ("image already processed"). DownloadPhoto/DownloadVoice named files photo_<fileID[:16]>.<ext>, but Telegram file_ids share a long constant prefix (e.g. "AgACAgIAAxkBAAI…"); the distinguishing bytes come *after* char 16. Truncating kept only the shared prefix, so every photo mapped to the same filename and overwrote the last one. Now we hash the full file_id (SHA-256, first 16 hex chars) for a genuinely unique suffix. Adds a prefix-collision regression test.

2) Caption-aware vision. Photos can carry a caption (the user's request), which was silently dropped, and the agent had to discover/call vision itself. Now:
- Message gains a Caption field; OnPhotoMessage receives it.
- New vision.auto_describe config (default true, mirrors auto_transcribe).
- On a photo, the bot runs the vision model FIRST (focused by the caption if present) to extract a description, then injects "[description] + caption" to the agent so it answers the request. Falls back to the path-based message when auto-describe is off or vision fails.

Docker configs ship vision.auto_describe=true. Docs (CHEATSHEET, TELEGRAM) updated. All packages build, vet clean, tests pass under -race.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-06-07T15:10:06Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | odek | `714bbc1` | Commit Preview URL, Branch Preview URL | Jun 07 2026, 03:14 PM |

…e funcs

vprotocol auto-repair (§6.2 property tests). The photo-handler message composition lived inline in an untested closure in package main, leaving the new branching logic (caption present/absent, vision success/fallback) unexercised: the binding weakness in the verification η.

Extract three pure functions (photoVisionPrompt, photoVisionMessage, photoFallbackMessage) and cover them with unit tests, including a regression that the <untrusted_content> wrapping is preserved verbatim when the description is injected into the agent (axis 2.8). No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jkyberneees · 2026-06-07T15:17:02Z

vprotocol v5.2.7 — Verification Certificate

PR: #23 feat/telegram-vision-caption-and-unique-filenames · head 714bbc1 · 14 files · +330/-67 LOC
Generator: Claude Opus 4.8 (claude-opus-4-8) · Class: GeneratedCode (code + tests, same model/session)
Run date: 2026-06-07 · Single-model pipeline (B=C=D=E), monoculture fallback

Pre-scan (§0)

Deterministic scan of the diff for injection markers / verdict tokens / new exec sinks: clean. The one new untrusted→LLM path (image description → agent message) is delimited: the vision tool wraps the description in nonce'd <untrusted_content> boundaries before injection, and a regression test now asserts the wrapping survives. Axis 2.8 → pass.

Nine Axes

| Axis | Verdict | Notes |
|---|---|---|
| 2.1 Semantic Correctness | ✅ pass | Explicit error/fallback paths (download, vision, json); empty-caption handled |
| 2.2 Behavioral Contract | ⚠️ warn | No independent spec; PR description is the contract. `OnPhotoMessage` signature change: all callers + tests updated |
| 2.3 Security Surface | ✅ pass | Caption is user-trusted; description untrusted-wrapped; path from MediaDir; SHA-256 used only for filenames |
| 2.4 Structural Integrity | ✅ pass | Mirrors voice auto-transcribe; `fileIDSuffix` + 3 pure composers, single-responsibility |
| 2.5 Behavioral Exploration | ✅ pass | Collision regression (shared-prefix ids), empty/oversized caption, vision-error fallback all covered |
| 2.6 Dependency Integrity | ✅ pass | No new deps; stdlib `crypto/sha256`, `encoding/hex` |
| 2.7 Generator Provenance | ⚠️ warn | Code + tests: same model, same session → correlated (gates ρ) |
| 2.8 Adversarial Surface | ✅ pass | New image→prompt path explicitly delimited + provenance-tagged; verified by test |
| 2.9 Documentation Coverage | ✅ pass | `auto_describe` + filename scheme + handler signature documented (CHEATSHEET, TELEGRAM) |

η Derivation (re-derived post-repair)

| Signal | Weight | Value | Note |
|---|---|---|---|
| m (mutation kill) | 0.34 | 0.62 | composition branches now unit-tested; no mutation runner (estimated) |
| o (oracle agreement) | 0.24 | 0.38 | no independent Agent-C contract |
| b (branch coverage) | 0.14 | 0.70 | changed-line branches covered; handler orchestration still integration-only |
| f (fuzz survival) | 0.09 | 0.90 | no crashes; defensive (estimated, no fuzzer) |
| s (SAST clean) | 0.04 | 1.00 | `go vet` clean |
| t (static depth) | 0.10 | 1.00 | typed; compiler + vet clean on changed lines |
| d (doc coverage) | 0.05 | 1.00 | config/user surface documented |

η_raw = 0.671 · ρ = 0.24 (family +0.10, version +0.05, spec_independence +0.05, AST ~0.02, shared-mutants ~0.02)
η = clamp(0.671 − 0.24, 0, 1) = 0.431
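The arithmetic behind these two lines can be reproduced with a short sketch. It assumes η_raw is the weighted sum of the signal values in the table (the weights sum to 1.00) and that ρ is the sum of the listed penalty components; the type and function names are hypothetical and not part of vprotocol.

```go
package main

// signal pairs a weight with an observed value, as in the table above.
type signal struct {
	name   string
	weight float64
	value  float64
}

// signals reproduces the certificate's table rows.
var signals = []signal{
	{"m", 0.34, 0.62},
	{"o", 0.24, 0.38},
	{"b", 0.14, 0.70},
	{"f", 0.09, 0.90},
	{"s", 0.04, 1.00},
	{"t", 0.10, 1.00},
	{"d", 0.05, 1.00},
}

// rhoParts are the listed correlation penalties:
// family, version, spec_independence, AST, shared-mutants.
var rhoParts = []float64{0.10, 0.05, 0.05, 0.02, 0.02}

// weightedSum computes the sum of weight*value over all signals.
func weightedSum(ss []signal) float64 {
	total := 0.0
	for _, s := range ss {
		total += s.weight * s.value
	}
	return total
}

// sum adds up a slice of floats.
func sum(xs []float64) float64 {
	total := 0.0
	for _, x := range xs {
		total += x
	}
	return total
}

// clamp limits x to the range [lo, hi].
func clamp(x, lo, hi float64) float64 {
	if x < lo {
		return lo
	}
	if x > hi {
		return hi
	}
	return x
}

// eta applies the penalty and clamps the result to [0, 1].
func eta(raw, rho float64) float64 {
	return clamp(raw-rho, 0, 1)
}
```

Term by term, the weighted sum is 0.2108 + 0.0912 + 0.098 + 0.081 + 0.04 + 0.10 + 0.05 = 0.671, the penalties add up to 0.24, and the difference is 0.431, matching the certificate.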

Verdict: `HumanReviewRequired`

Binding gates: η 0.431 < 0.80, and ρ 0.24 ∈ (0.20, 0.30] → HumanReviewRequired regardless of η. A single model authoring both code and tests cannot self-certify higher — independent human review is the protocol-mandated next step.
ΔDebt ≈ 0.3 h (Low) · Ci_estimated: true · LOC 397 < 1,500 (standard pipeline).
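As a rough illustration of the two binding gates quoted above, the check below encodes only what the certificate states: η below 0.80, or ρ in the half-open interval (0.20, 0.30], each forces human review. The function name is hypothetical, and any other gates or verdicts vprotocol may define are not modeled.

```go
package main

// humanReviewRequired reports whether either of the two gates stated in
// the certificate fires. Other protocol rules are deliberately omitted.
func humanReviewRequired(eta, rho float64) bool {
	const etaThreshold = 0.80
	lowEta := eta < etaThreshold
	rhoInBand := rho > 0.20 && rho <= 0.30
	return lowEta || rhoInBand
}
```

With the certificate's values (η = 0.431, ρ = 0.24) both gates fire, so raising η alone would not change the verdict while ρ stays in that band.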

Auto-Repair Applied

§6.2 property tests (commit 714bbc1): extracted the photo-handler message composition (untested closure in package main) into three pure functions — photoVisionPrompt, photoVisionMessage, photoFallbackMessage — and added unit tests incl. an untrusted-wrapping-preservation regression. Raised η 0.333 → 0.431 by closing the testing gap on the new branching logic. No behavior change.

Open items for the human reviewer

Axis 2.7 (correlated generator): confirm the mock/test assumptions match real llama-mtmd-cli + Telegram behavior — code and tests share a single author.
Axis 2.2: no formal spec; verify the injected-message phrasing actually elicits the intended "extract then answer" behavior from the production model.
The handler orchestration (download→vision→dispatch wiring) remains integration-only by design, consistent with the existing voice handler.

Generated by vprotocol v5.2.7 auto-repair mode (single-model pipeline; ρ applied at full strength per §0.1 monoculture fallback).

jkyberneees merged commit 903e453 into main Jun 7, 2026
7 checks passed

jkyberneees merged 2 commits into main from feat/telegram-vision-caption-and-unique-filenames
