feat: gemini image analyzer (auto-detect role/vibe/treatment on upload)#29
Merged
Closes Milestone D from PR #25. Every image upload now fires a fire-and-forget Gemini Flash analysis that tags the manifest entry with role, mood vibe, a recommended image-scene treatment, and a 1-10 retention-strength estimate. The visual director consumes those fields as soft priors so the very first render has intentional visual logic instead of defaulting to editorial-bleed across every scene.

Backend
- packages/core/src/images/analyzer.ts: analyzeImage() uses Gemini Flash inlineData (base64 webp) for sub-2s latency; normalizeAnalysis defends against out-of-range scores, unknown enums, and missing fields.
- ImageEntry gains vibe / suggestedTreatment / retentionStrengthAtAttachment / analysisStatus / analysisError / analyzedAt / analysisRationale, all optional so old manifests continue to load.
- Upload route patches the entry to "pending" synchronously, then runs analysis off-cycle and writes the result back. POST /images/:id/analyze re-runs on demand.
- Visual director catalog labels new fields as "(analyzer prior)" so the LLM knows they are advisory and can override based on script context.

Frontend
- ImagesTab row chip surfaces analysis state: "analyzing…", "R7" (color-coded retention chip), or "analysis failed" with the reason on hover.
- Editor gains an Analyzer panel showing vibe / treatment prior / retention score / rationale, plus a re-analyze button that optimistically flips the chip to pending while the request is in flight.

Cost is logged under script.images.analyze with kind: gemini, ~$0.001 per image at gemini-2.5-flash rates. User-typed role / description / tags are preserved and never overwritten by the analyzer.

15 new analyzer tests; full suite stays green (760 core + 281 studio).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes Milestone D from #25. Every image upload now fires a fire-and-forget Gemini Flash analysis that tags the manifest entry with role, mood `vibe`, a recommended image-scene `suggestedTreatment`, and a 1-10 `retentionStrengthAtAttachment` estimate. The visual director consumes those fields as soft priors, so the very first render has intentional visual logic instead of defaulting to `editorial-bleed` across every scene.

- `packages/core/src/images/analyzer.ts`: Gemini Flash + `inlineData` (base64 webp) for sub-2s latency; pure `normalizeAnalysis` defends against out-of-range scores, unknown enums, and missing fields.
- `packages/core/src/images/manifest.ts`: `ImageEntry` gains `vibe`, `suggestedTreatment`, `retentionStrengthAtAttachment`, `analysisStatus`, `analysisError`, `analyzedAt`, `analysisRationale`. All optional, so old manifests load unchanged.
- `packages/core/src/studio-api/routes/images.ts`: Upload patches `analysisStatus: pending` synchronously, fires analysis off-cycle, writes result back. New `POST /images/:id/analyze` re-runs on demand.
- `packages/core/src/script/visualDirector.ts`: Catalog labels new fields as `(analyzer prior)` so the LLM treats them as advisory; new constraint #10 + retention-strength guidance for hook scenes.
- `packages/studio/src/components/sidebar/ImagesTab.tsx`: Row chip shows `analyzing…` / `R7` (color-coded retention) / `analysis failed`. Editor gains an Analyzer panel with vibe / treatment prior / score / rationale, plus a re-analyze button that optimistically flips state.

Cost
Logged under `script.images.analyze` with `kind: "gemini"`. ~$0.001 per image at `gemini-2.5-flash` rates.

Tests
15 new analyzer tests (clamping, role coercion, treatment normalization, char-clipping, default-treatment helper). Full suite green: 745 core + 281 studio passing.
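
For reviewers who want a concrete picture of the normalization the tests above exercise (clamping, enum coercion, char-clipping, default treatment), here is a minimal sketch. It is not the code in `analyzer.ts`: the function name `normalizeAnalysisSketch`, the role and treatment lists (other than `editorial-bleed`), the neutral score of 5, the fallback treatment, and the 200-character clip length are all assumptions made for illustration.

```typescript
// Placeholder enums. Only "editorial-bleed" appears in this PR; every other
// value below is hypothetical and stands in for the real lists.
const ROLES = ["hero", "supporting", "background"] as const;
const TREATMENTS = ["editorial-bleed", "framed", "split"] as const;

type Role = (typeof ROLES)[number];
type Treatment = (typeof TREATMENTS)[number];

interface NormalizedAnalysis {
  role: Role;
  vibe: string;
  suggestedTreatment: Treatment;
  retentionStrengthAtAttachment: number; // integer in 1..10
  analysisRationale: string;
}

const MAX_TEXT_CHARS = 200; // assumed clip length
const NEUTRAL_SCORE = 5; // assumed default when the score is missing or unparseable

// Clamp any numeric-looking input into the 1..10 integer range.
function clampScore(value: unknown): number {
  const n =
    typeof value === "number" ? value :
    typeof value === "string" ? parseFloat(value) :
    NaN;
  if (!isFinite(n)) return NEUTRAL_SCORE;
  return Math.min(10, Math.max(1, Math.round(n)));
}

// Accept a value only if it is a known enum member (case-insensitive).
function pickEnum<T extends string>(value: unknown, allowed: readonly T[], fallback: T): T {
  if (typeof value !== "string") return fallback;
  const candidate = value.trim().toLowerCase() as T;
  return allowed.indexOf(candidate) !== -1 ? candidate : fallback;
}

// Trim free text and clip it to a bounded length; non-strings become "".
function clipText(value: unknown): string {
  return typeof value === "string" ? value.trim().slice(0, MAX_TEXT_CHARS) : "";
}

// Pure function: no I/O, so it can be unit-tested without calling the model.
function normalizeAnalysisSketch(raw: unknown): NormalizedAnalysis {
  const r = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  return {
    role: pickEnum(r.role, ROLES, "supporting"),
    vibe: clipText(r.vibe),
    suggestedTreatment: pickEnum(r.suggestedTreatment, TREATMENTS, "editorial-bleed"),
    retentionStrengthAtAttachment: clampScore(r.retentionStrengthAtAttachment),
    analysisRationale: clipText(r.analysisRationale),
  };
}
```

Keeping the normalizer pure means malformed model output can be covered by plain unit tests, independent of the Gemini call itself.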
Failure modes
- `GEMINI_API_KEY` missing → entry flips to `analysisStatus: "failed"` with a "configure it in Settings" reason. Upload still succeeds.

Test plan
- Upload an image with `GEMINI_API_KEY` set → row chip shows `analyzing…` for 1-2s, then flips to a color-coded `R<n>`.
- `↻` re-analyze on an existing image → chip flips to `analyzing…`, then refreshes with new values.
- Unset `GEMINI_API_KEY` and upload a new image → chip shows `analysis failed` with the reason on hover; upload still succeeds.

🤖 Generated with Claude Code
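
The upload behaviour that the test plan checks (entry marked pending synchronously, analysis run off-cycle, failures recorded without failing the upload) can be sketched roughly as follows. This is an illustration under stated assumptions, not the route in `images.ts`: the in-memory `manifest`, `patchEntry`, `handleUploadSketch`, the `Analyzer` type, and the `"done"` status value are all hypothetical, since the PR names only the `pending` and `failed` states.

```typescript
// "pending" and "failed" come from the PR; "done" is an assumed success value.
type AnalysisStatus = "pending" | "done" | "failed";

interface ImageEntrySketch {
  id: string;
  analysisStatus?: AnalysisStatus;
  analysisError?: string;
  analyzedAt?: string;
  vibe?: string;
  retentionStrengthAtAttachment?: number;
}

// Hypothetical analyzer signature; a real one would call Gemini Flash.
type Analyzer = (imageId: string) => Promise<{
  vibe: string;
  retentionStrengthAtAttachment: number;
}>;

// In-memory stand-in for the on-disk manifest.
const manifest = new Map<string, ImageEntrySketch>();

function patchEntry(id: string, patch: Partial<ImageEntrySketch>): void {
  const current = manifest.get(id) ?? { id };
  manifest.set(id, { ...current, ...patch });
}

// The pending flag is written before this function returns. The returned
// promise is for tests only; an upload handler would not await it, which is
// what makes the analysis fire-and-forget.
function handleUploadSketch(id: string, analyze: Analyzer): Promise<void> {
  patchEntry(id, { analysisStatus: "pending" });

  return analyze(id)
    .then((result) =>
      patchEntry(id, {
        ...result,
        analysisStatus: "done",
        analysisError: undefined,
        analyzedAt: new Date().toISOString(),
      }),
    )
    .catch((err: unknown) =>
      // Analysis errors are stored on the entry; they never propagate to the upload.
      patchEntry(id, {
        analysisStatus: "failed",
        analysisError: err instanceof Error ? err.message : String(err),
      }),
    );
}

// Example analyzer that fails the way a missing API key might.
const missingKeyAnalyzer: Analyzer = async () => {
  throw new Error("GEMINI_API_KEY is not set; configure it in Settings");
};
```

Calling `handleUploadSketch("img-1", missingKeyAnalyzer)` leaves the entry at `pending` immediately and at `failed` with the stored reason once the promise settles, which mirrors the third test-plan item.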