1.AH — generative media-output adapters (TTS/Imagen/Sora/Veo) + later-phase media task records#38
Conversation
1.AG (media output generation, Phase D) merged to main via PR #37 (2026-06-21), including the final pre-merge review's two HIGH fixes (orphaned-vertex cost addend + crash-in-window run:paused). Flip its status everywhere, per the Roadmap-Done-After-Merge rule: - phase-1-engine-and-llm.md: the 1.AG dependency-table row (◇ → ✅ Done, PR #37), the 1.m6 summary row (1.AG ✅), and the narrative entry's Status line. - current.md: 1.AG ✅ Done (PR #37); "only 1.AH remains" on the 1.m6 sub-spine. - deferred-tasks.md: tick [x] the A5 async-media-job ADR + engine-loop obligation (PR #37). - CLAUDE.md: both Status paragraphs note 1.AG Done; remaining Phase-1 work is 1.AH. - README.md: 1.AG ✅; only 1.AH remains. - llm-provider-seam.md: clarify the streaming media triad — non-streaming output landed at 1.AG (generate() + generative endpoint, ADR-0046); the streaming triad is deferred to 1.AH (ADR-0046 §4). Docs-only; format:check clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The first 1.AH vendor-adapter section (package-level, no surface — provable offline).
Refactors OpenAI `generateMedia` into a modality dispatcher and wires the audio arm:
- generateMedia now dispatches by modality: image → gpt-image-1 (extracted into
openAiGenerateImage), audio → TTS (new openAiGenerateSpeech), video → typed
UnsupportedCapabilityError (the async Sora path is a later section); DeepSeek →
typed capability error (generates no media). The extraction also keeps each arm
small (no cognitive-complexity smell).
- openAiGenerateSpeech: `client.audio.speech` returns BINARY audio → base64-encoded
into an in-flight MediaPart (the engine de-inlines to a handle). `req.mimeType`
selects the vendor response_format (default mp3); providerOptions.audio.voice
selects the voice (default alloy). The raw audio bytes NEVER cross the seam — `raw`
carries only a non-byte `{ responseFormat }` diagnostic (strip-on-sink anyway, I3).
- TTS_FORMAT_TO_MIME + ttsResponseFormat (mimeType↔format) + ttsVoice helpers.
- No new runtime dependency (openai SDK already imported); Buffer base64 mirrors
packages/db's media-store (the adapter tsconfig carries `types: ["node"]`; the seam
itself stays pure under tsconfig.seam.json `types: []`).
Tests: TTS base64 + voice + format-default round-trip; req.mimeType → response_format
mapping (audio/opus → opus); the modality-reject test updated (OpenAI video + DeepSeek
reject; audio no longer rejected). Toolchain 16/16 green, format clean, Leakwatch 0.
Docs: deferred-tasks + llm-provider-seam — OpenAI-TTS now wired (1.AH A1); Gemini-Imagen
remains (A2). Pricing rows + count>1 deferred per the 1.AH scope decisions.
Refs: ADR-0045, ADR-0031
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…/count notes, empty-bytes test) Applies the verified findings from the A1 Opus review (0 blocker/high; 1 medium, 3 low — all confirmed): - M1 (medium): move the TTS body download (`await response.arrayBuffer()`) INSIDE the try/catch. `audio.speech` is a __binaryResponse — create() returns the raw Response unconsumed, so the multi-MB download happens at the read; a mid-download socket reset / abort was escaping unclassified and flattening to opaque `internal` instead of a classified LlmError (and an abort → `internal` not `cancelled`). Now routed through openaiErrorToLlmError, matching the image/egress-error contract. - L1 (low): document the `pcm → audio/L16` sample-rate loss — OpenAI pcm is 24 kHz but the seam's MediaMimeTypeSchema forbids the `;rate=` parameter (RFC 2586 default 8 kHz), so a consumer must assume 24 kHz. Mirrors the pre-existing chat pcm16 convention; recorded as a low deferred-tasks known-limitation. - L2 (low): comment that `count` is intentionally a no-op for TTS (per-character billing, single stream — no bill-N-deliver-1 hazard, unlike the image path). - L3 (low): test the empty-audio-bytes → bad_request rejection branch. Toolchain 16/16 green, format clean, Leakwatch 0. Refs: ADR-0045 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold the verified findings from the A1 Sonnet adversarial review (0 blocker/high):
- doc(nit): refresh the `generateMedia` method JSDoc — it still claimed audio
(TTS) was NOT wired; audio dispatches to `openAiGenerateSpeech` since A1. Video
remains the async Sora path (correctly throws `UnsupportedCapabilityError`).
- test(low): assert `result.raw === { responseFormat: 'mp3' }` on the TTS
happy-path — pins the I3 structural guarantee (no audio bytes leak into `raw`,
which is `z.unknown()`) at the test level.
- test(low): add a TTS-call-failure test — a rejected `audio.speech.create`
surfaces as a typed `LlmProviderError` (transport, retryable) via
`openaiErrorToLlmError`, proving the try/catch is wired rather than raw.
Toolchain 16/16 green; prettier clean; Leakwatch 0.
Refs: ADR-0045, ADR-0046
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the Gemini adapter's separate-endpoint image generation behind the seam (ADR-0045 §1, ADR-0046), mirroring the OpenAI image arm: - `GeminiTransport.generateImages` — a new required transport method (a transport honestly declares the generative surface it services; image is Gemini's only Phase-1 generative arm, Veo video is 1.AH A4). The default `sdkTransport` wraps `ai.models.generateImages`; the conformance harness replays it. - `GeminiImageRequest`/`GeminiImageResponse` — hand-rolled structural subsets so no vendor type crosses the seam (parity with `GeminiRequest`/`GeminiResponse`). - `generateMedia` dispatches by modality: image → `geminiGenerateImage` (`generatedImages[].image.imageBytes` → a base64 `media` part the engine de-inlines to a `media://` handle — I3); audio/video fail loud with a typed `UnsupportedCapabilityError`. `count > 1` is rejected before egress; a safety-filtered candidate (`raiFilteredReason`, no image) maps to `content_filter`; `numberOfImages` is pinned to 1 AFTER any providerOptions spread so an author count can't smuggle past the single-artifact seam. - Conformance: a recorded Imagen fixture drives the existing generative seam-contract scenario against the real fold; 9 adapter unit tests cover the happy path, MIME default, providerOptions threading, count>1, content_filter, no-image, transport rejection, and audio/video capability rejection. Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean. Refs: ADR-0045, ADR-0046, ADR-0011 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold the two verified nits from the A2 Opus adversarial review (0 blocker/high; the raw:response I3 question was assessed and confirmed a defensible parity choice — raw is a sink-stripped diagnostic by contract, types.ts:442-444): - doc(nit): document the content_filter taxonomy in geminiGenerateImage — only the in-body raiFilteredReason channel maps to content_filter; an HTTP-level Imagen safety 4xx normalizes to bad_request (Gemini's ApiError carries no structured content-policy code, unlike OpenAI's isContentPolicyCode, so a status-only block has no reliable signal to key on; message-sniffing would be fragile). A documented, behaviorally-inert divergence from the OpenAI sibling. - test(nit): assert a present-but-empty raiFilteredReason falls through to bad_request, not content_filter (the previously-uncovered guard sub-branch). Toolchain 16/16 green; prettier clean; Leakwatch 0. Refs: ADR-0045, ADR-0046, ADR-0011 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ker) Fold the A2 Sonnet second-pass adversarial review (1 blocker, 1 medium, 3 low, 3 nit — the SSRF blocker was missed entirely by the first Opus pass): - BLOCKER (B1) — SSRF / API-key exfiltration on the Imagen path: geminiGenerateImage spread providerOptions into the Imagen config WITHOUT stripTransportKeys, so a caller-supplied httpOptions.baseUrl could redirect egress + the live API key to an attacker host (the text generate() path strips it precisely for this reason). Fix: apply stripTransportKeys to the spread; extend stripTransportKeys to also drop an author-injected abortSignal (L1 — the signal is the engine's to set, never providerOptions), hardening both paths. + an SSRF regression test. - MEDIUM (M1) — vendor MIME not parameter-stripped: a parameterized Imagen MIME (image/png; q=1.0) reached MediaStore.put / media_objects.mimeType un-canonicalized (no parameter-stripping CHECK in the production de-inline path). Fix: strip ;-parameters to the bare MIME, mirroring the generate() arm (mapContent). + test. - LOW: assert model threading (L2); cover an entirely-absent generatedImages field (L3). NIT: strengthen isGeminiImageResponse to reject a non-array generatedImages (N2); assert raw is the exact transport response (N3). Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean. Refs: ADR-0045, ADR-0046, ADR-0011, ADR-0043 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the OpenAI adapter's ASYNC video LRO behind the seam (ADR-0045 §3/§7), the
first real async generative adapter (design validated by a dedicated research pass):
- generateMedia(video) → client.videos.create → ALWAYS returns an opaque { jobId }
(even on instant completion — completion is the engine poll loop's job; returning
{ media } would orphan the LRO). raw carries only { id, status }, never bytes.
- pollMediaJob(jobId, key, signal) → decode → videos.retrieve → status map: queued/
in_progress → pending(clamped progress); completed → videos.downloadContent →
base64 video/mp4 media part (downloaded in the same try as retrieve so a mid-
download abort classifies as cancelled); failed → mapVideoCreateError (content-
policy code → content_filter, else unknown). A malformed jobId returns a FATAL
`failed` (not a throw — a throw is engine-classified retryable and would loop).
- Opaque jobId encoding (ADR-0045 §7): rlv-mediajob:1:<base64url(vendorId)> — a
STATELESS reversible bijection (encodeVideoJobId/decodeVideoJobId), NOT an in-memory
Map, so a cold-process re-attach (§3) resolves the vendor id with no adapter state.
Relavium-namespaced + base64url'd: not the bare vendor op-name/URL, engine never
parses it (I1/ADR-0011), the non-secret resource id is safe to persist. NO new ADR
(within ADR-0045 §7; rationale recorded in the adapter doc-comment).
- providerOptions is NOT spread into the SDK call — only explicit soraSeconds (reject
non-{4,8,12}, never round) + soraSize knobs are extracted (no transport-key surface,
per the A2-verification sibling-sweep note).
- 11 unit tests (encoding round-trip/opacity, foreign-token decode, generateMedia
always-jobId, duration reject, full status mapping, content_filter/null-error,
malformed-token, abort threading, DeepSeek guards); the generative-seam conformance
caveat + the llm-provider-seam.md Wired list updated.
Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.
Refs: ADR-0045, ADR-0011, ADR-0031
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…overage) Fold the A3 Opus adversarial review (0 blocker/high; "no behavioral defects" — all 9 confirmed items were a doc overstatement or a coverage gap on correct code): - decodeVideoJobId now round-trip-validates (re-encode === jobId): base64url decode is lenient (drops invalid chars), so a corrupted/non-canonical right-prefix token would otherwise decode to a wrong-but-non-empty vendor id and be polled. The strict check rejects every such token; every minted token still round-trips exactly. - pollMediaJobSora: a defensive `default` arm on the status switch returns a FATAL `unknown` failed instead of falling through to `undefined` (which would break the Promise<MediaJobStatus> contract) if the SDK ever returns an out-of-union status. - Tests (the rest were coverage gaps on correct code): assert the create FormData body forwards model/prompt/seconds (videos.create is multipart — capture FormData fields); seconds defaults to 4 + a valid size knob is forwarded; an invalid size is dropped; progress clamps (150→1); completed-but-empty-bytes → bad_request; empty error code → unknown + omitted; the abort signal reaches BOTH retrieve and downloadContent; the non-canonical-token decode rejection. Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean. Refs: ADR-0045, ADR-0011, ADR-0031 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…on + coverage) Fold the A3 Sonnet second-pass review (0 blocker/high; 6 distinct items — 2 low, 4 nit): - abort classification (nit, one-canonical-home fix): a NATIVE abort during a binary body read (TTS audio.speech / Sora downloadContent arrayBuffer) throws a DOMException/ Error named 'AbortError', NOT the SDK's APIUserAbortError, so it previously classified as `unknown`. openaiErrorToLlmError now detects an 'AbortError'-named error → `cancelled` — benefits every binary-read arm (A1 TTS + A3 Sora) consistently. + a classifier test. - soraSeconds now takes providerId (was hardcoded 'openai'), matching every other error helper in the file. - comment: the `bytes === undefined` half of the completed-arm guard is the TS narrowing guard (bytes is control-flow-conditionally assigned), not dead code — clarified. - doc: types.ts pollMediaJob JSDoc no longer claims "no Phase-1 adapter implements it" (Sora wired at A3; Veo at A4). - tests (the rest were coverage gaps on correct code): the videos.create AbortSignal is threaded; durationSeconds 8/12 forward correctly; the pollMediaJobSora default arm maps an unrecognized SDK status to a fatal unknown failed. Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean. Refs: ADR-0045, ADR-0011 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oding
Wire the Gemini adapter's ASYNC Veo video LRO behind the seam (ADR-0045 §3/§7),
completing the four 1.AH generative adapters, and factor the opaque-jobId encoding
into one shared home:
- adapters/shared.ts: encodeMediaJobId/decodeMediaJobId (moved from openai.ts A3,
generalized — provider-agnostic, the engine routes pollMediaJob by the bound
provider so the shared rlv-mediajob:1: prefix is collision-safe). The Sora arm now
imports them; the round-trip-validated bijection + its tests live in shared.test.ts.
- GeminiTransport gains generateVideos + pollVideo (required) — plain Relavium subsets
(GeminiVideoRequest/Operation/Poll), no @google/genai type crosses the seam. The
default sdkTransport wraps models.generateVideos / operations.getVideosOperation;
getVideosOperation reads only operation.name, so pollVideo reconstructs a fresh
GenerateVideosOperation carrying the persisted name → stateless cold-process re-attach.
- generateMedia(video) → geminiGenerateVideo: ALWAYS returns an opaque { jobId } from the
operation name (never { media }); numberOfVideos pinned to 1 AFTER stripTransportKeys
(single-artifact + the A2 SSRF lesson — genai co-mingles httpOptions in the config);
durationSeconds threaded; raw byte-free.
- pollMediaJob → geminiPollVideo: pending (done:false); done delivers inline videoBytes as
base64 OR a re-hostable url source (ADR-0045 §7 — the engine de-inlines via the single
fetchMediaBytes, never a second fetch site); raiFilteredCount>0 → content_filter;
operation error → unknown; empty → bad_request; malformed jobId → fatal failed.
- 14 Veo unit tests (opaque jobId, SSRF strip, status mapping incl. url-source + base64,
content_filter/error/empty, malformed token, abort threading); conformance replay +
seam-doc Wired list updated.
Veo end-to-end completion of a url-source result needs the host MediaUrlFetch re-host
hook (1.AH host-wiring, deferred).
Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.
Refs: ADR-0045, ADR-0011, ADR-0031, ADR-0043
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold the A4 Opus adversarial review (0 blocker/high; "every finding is a test-coverage or style item on correct code — A3/Sora behavior fully preserved by the refactor"): - GeminiVideoPoll.raiFilteredCount → `number | undefined` for sibling parity (N1). - tests (coverage gaps on correct code): assert the Veo create-path threads the AbortSignal (L1); assert numberOfVideos:5 in providerOptions is force-pinned back to 1 (the A2 single-artifact guard, L3); a pollVideo transport-rejection classifies (L2 — the only previously-uncovered new lines); a pending→done poll sequence transition (L4 — also exercises fakeVideoTransport's sequence path); schema-validate the url-source done result for parity with the base64 case (N2). Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean. Refs: ADR-0045, ADR-0011 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold the A4 Sonnet second-pass review (0 blocker/high/medium; "the Opus GO holds — every finding is test-coverage/doc completeness on correct code"): - test: pollVideo threads the DECODED op-name to the transport (fakeVideoTransport now captures lastOperationName) — proves the re-attach decode→thread invariant (N1). - test: a parameterized vendor video MIME (video/mp4; codecs=h264) strips to the bare video/mp4 — parity with the Imagen arm (L1/N5). - test: durationSeconds is omitted from the Veo config when not requested (N2). - test: the pending→done poll sequence clamps on the last status (a 3rd poll stays done) — locks the fakeVideoTransport sequence contract (N9). - doc: the generative-seam conformance header no longer says Veo "remains deferred" (Veo is wired at A4, exercised via fakeVideoTransport) (N7). Skipped by design: a cross-provider token-rejection test (N3 — the opaque token is provider-agnostic by the maintainer-approved design; the engine routes pollMediaJob by the bound provider) and tightening GeminiVideoRequest.config typing (N4 — the Record<string,unknown> escape hatch is the established pattern for all 3 gemini requests). Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean. Refs: ADR-0045, ADR-0011 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ality knobs
Two adapter-level improvements (A5 save_to double-fetch stays deferred — it's a
rare-case perf optimization needing a risky engine-flow refactor; correctness is
already fine since the CAS put dedupes the bytes):
- bareMimeType (gemini.ts): a shared helper that strips ;-parameters AND validates the
vendor MIME via the shared MediaMimeTypeSchema (returns '' on absent/parameter-only/
ILLEGAL incl. a CR/LF-injected value that split(';') alone would pass). Replaces the
three split-only sites (mapContent inline media-out, geminiGenerateImage, geminiPollVideo)
— closes the A2-N1 cross-arm gap where a CR/LF MIME could reach media_objects.mimeType
(the production de-inline path runs no MIME regex). Callers fall back ('' → default) or
skip the part. + a CR/LF-rejection test.
- gpt-image-1 size/quality knobs (openai.ts): openAiGenerateImage now threads optional
size/quality from providerOptions.image.{size,quality}, narrowed to the valid gpt-image-1
values (unrecognized dropped — no 400-inducing passthrough), in the request BODY only
(never the SDK RequestOptions — the A2 SSRF lesson). E2E-reachable once the engine
populates providerOptions for a generative call (host-wiring). + forward/drop tests.
Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.
Refs: ADR-0044, ADR-0031, ADR-0011
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…se docs Analyze the media host-wiring that 1.AH deliberately left to later phases and record it as tasks in each phase doc (rule 8: task + canonical ADR/deferred-tasks pointer, never a spec restatement). No HW/S labels invented — each task is named by its mechanism + the canonical D-series / ADR pointer (matching deferred-tasks.md): - phase-1 §1.AH: expand the narrative bullet into the landed A1–A6 adapter list (the four generative adapters + shared opaque-jobId codec + bareMime, behind the seam, not yet runtime-reachable), the deferred decisions (count>1 ADR, pricing rows, the Responses-API path), and the host-wiring-spans-2–6 surface map + the build-with-the- first-surface warning. A5 noted deferred. - phase-2 §2.S (NEW): the CANONICAL home for the cross-surface host-wiring task list — resolveMediaSurface, the read_media MediaReadAccess factory, session-scope media_references, the load-check, the cost governor, resolveForEgress, the save_to write port, and the EgressCapability.fetch SSRF mechanism (security-critical) + milestone/dependency rows. - phase-3 §3.B: split into the desktop Rust CAS + handle-only de-inline integ test + the read_media Tauri byte gate + save_to MediaWritePort + the keychain no-raw-key IPC test + canvas rendering + a points-to-§2.S shared-wiring bullet + the reserved-triad note + exit criterion #9 + a Risks row + the media ADR Related links. - phase-4 §4.N (NEW): VS Code FS-CAS host-wiring (points to §2.S) + webview-resource rendering + the SSRF mechanism + a P4.M7 milestone + ADR links. - phase-5: managed gateway media poll-through (pass-through, not a store), mediaUnits metering + reserve→settle, durable settle of a billed-but-failed async job, the no-logging-extends-to-media-bytes guarantee, the per-modality $ dashboard axis, ADRs. - phase-6: object-storage MediaStore + BullMQ sweep, media-table tenancy/RLS + cross-org CAS dedup, GET /media/{handle} byte gate, durable failed/cancelled cost, the reserved {kind:'workspace'} read_media scope, and the record-only portal preview triad. Prettier clean; Leakwatch 0; all referenced ADR/phase links verified to exist. Refs: ADR-0031, ADR-0032, ADR-0042, ADR-0043, ADR-0044, ADR-0045, ADR-0046, ADR-0015 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Sorry @cemililik, your pull request is larger than the review limit of 150000 diff characters
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughImplements ChangesGenerative Media Adapter Implementation
Roadmap and Reference Documentation Updates
Sequence Diagram(s)sequenceDiagram
participant Caller
participant generateMedia
participant GeminiAdapter
participant OpenAIAdapter
participant encodeMediaJobId
participant VendorTransport
rect rgba(100, 149, 237, 0.5)
Note over Caller,VendorTransport: Sync image/audio path
Caller->>generateMedia: modality=image/audio
generateMedia->>GeminiAdapter: geminiGenerateImage(request)
GeminiAdapter->>VendorTransport: generateImages(request, key)
VendorTransport-->>GeminiAdapter: GeminiImageResponse (base64 bytes)
GeminiAdapter-->>Caller: MediaGenResult (inline media)
end
rect rgba(144, 238, 144, 0.5)
Note over Caller,VendorTransport: Async video LRO path (Sora / Veo)
Caller->>generateMedia: modality=video
generateMedia->>OpenAIAdapter: openaiGenerateVideo(request)
OpenAIAdapter->>VendorTransport: POST /videos/generations
VendorTransport-->>OpenAIAdapter: soraJobId
OpenAIAdapter->>encodeMediaJobId: encode(soraJobId)
OpenAIAdapter-->>Caller: opaque jobId token
Caller->>generateMedia: pollMediaJob(jobId, signal)
generateMedia->>OpenAIAdapter: decodeMediaJobId(jobId)
OpenAIAdapter->>VendorTransport: GET /videos/generations/{id}
VendorTransport-->>OpenAIAdapter: status (queued→completed)
OpenAIAdapter->>VendorTransport: GET /videos/generations/{id}/content
VendorTransport-->>OpenAIAdapter: MP4 bytes
OpenAIAdapter-->>Caller: MediaPollResult done (base64 video)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request implements the generative media-output adapters for OpenAI (TTS audio, Sora async video) and Gemini (Imagen sync image, Veo async video) behind the @relavium/llm seam, along with stateless opaque job ID encoding/decoding bijections and extensive unit and conformance tests. It also updates the project roadmap and documentation to reflect the completion of the media output generation phase (1.AG) and the transition to host-wiring (1.AH). The code review identified several critical issues in the newly added adapters where nullish or missing fields in API responses (such as video.error, video.progress, op.name, and poll.error) could bypass existing checks and trigger runtime TypeError crashes or Zod validation failures.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/llm/src/conformance/gemini.conformance.test.ts`:
- Around line 30-35: In the isGeminiImageResponse type guard function, replace
the unsafe `as` cast when accessing the generatedImages property. Instead of
using the cast `(value as { generatedImages?: unknown }).generatedImages`, use
`Reflect.get(value, 'generatedImages')` to safely access the property without
type casting. This maintains the same runtime behavior while adhering to the no
unsafe casts requirement.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4de2406a-ac49-41de-89ac-41ff300b7948
📒 Files selected for processing (21)
CLAUDE.mdREADME.mddocs/reference/shared-core/llm-provider-seam.mddocs/roadmap/current.mddocs/roadmap/deferred-tasks.mddocs/roadmap/phases/phase-1-engine-and-llm.mddocs/roadmap/phases/phase-2-cli.mddocs/roadmap/phases/phase-3-desktop.mddocs/roadmap/phases/phase-4-vscode.mddocs/roadmap/phases/phase-5-managed-inference.mddocs/roadmap/phases/phase-6-cloud-execution-portal.mdpackages/llm/src/adapters/gemini.test.tspackages/llm/src/adapters/gemini.tspackages/llm/src/adapters/openai.test.tspackages/llm/src/adapters/openai.tspackages/llm/src/adapters/shared.test.tspackages/llm/src/adapters/shared.tspackages/llm/src/conformance/fixtures/gemini.tspackages/llm/src/conformance/gemini.conformance.test.tspackages/llm/src/conformance/generative-seam.conformance.test.tspackages/llm/src/types.ts
…responses Address the gemini-code-assist review on PR #38 (1 high, 3 medium) — defensive guards for media-response fields the vendor SDKs type as present but a deviating runtime response could omit (the adapters trusted the SDK types): - mapVideoCreateError (HIGH): accept `VideoCreateError | null | undefined` and use `err == null` (was `=== null`) so a runtime-absent Sora `video.error` on a failed status maps to `unknown` instead of crashing on `err.code`; read `err.code`/`err.message` defensively (truthy/`??`) too. - Sora `video.progress` (medium): `(video.progress ?? 0) / 100` so a missing progress yields 0, not NaN (NaN would fail the engine z.number().min(0).max(1)). - Veo `op.name` (medium): `!op.name` (was `=== undefined || .length === 0`) — covers a runtime null/undefined/empty without a TypeError on `.length`. - Veo `poll.error` (medium): truthy guard `if (poll.error)` before reading `.message`. + two regression tests (Sora failed with absent error → unknown; Sora in_progress with absent progress → pending 0). The Veo null guards are exercised by the existing empty/truthy-path tests. Toolchain 16/16 green; prettier clean; Leakwatch 0. Refs: ADR-0045 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e (CodeRabbit) Address CodeRabbit's actionable comment on PR #38: the conformance type guard read `generatedImages` via `(value as { generatedImages?: unknown })` — an unsafe `as` against the project's "prefer type guards, no unsafe `as`" guideline. CodeRabbit suggested `Reflect.get`, but that returns `any` and trips the harder `@typescript-eslint/no-unsafe-assignment` rule — so instead use `in`-narrowing, which is both cast-free AND any-free and preserves the exact semantics (absent or undefined or array → valid; a present non-array → invalid). Toolchain 16/16 green; prettier clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ells) Address the 3 SonarCloud "extract this nested ternary into an independent statement" major code smells in the media test helpers: - gemini.test.ts fakeVideoTransport `polls` — the undefined / single / sequence selection is now an if/else chain (kept the `'done' in opts.poll` discriminant — Array.isArray does not narrow a readonly-array union). - openai.test.ts soraFetch `url` (typeof input string/URL/Request) and `call` (content/create/retrieve) — both nested ternaries extracted to if/else. Behavior identical; toolchain 16/16 green; prettier clean; Leakwatch 0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@coderabbitai review all |
|
✅ Action performedFull review finished. |
…n 2026-09-24) OpenAI announced the Sora 2 video models + the videos.* API shut down 2026-09-24, affecting the 1.AH A3 OpenAI/Sora async-video adapter. Record it as a tracked forward-obligation: impact is narrow (only the Sora adapter arm + tests — the engine async-job LRO, the shared opaque-jobId codec, the conformance, and Gemini/Veo are provider-agnostic and unaffected), no live exposure until the Phase-2 host wiring, and the maintainer action (retarget to a replacement model, or disable/remove the Sora arm leaving the seam + Veo intact) is captured with the deadline. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Review responsesAll automated-review findings addressed (CI green: lint·typecheck·test, SonarCloud, engine coverage floor, strict peer-deps): gemini-code-assist (4 —
CodeRabbit (1 — SonarCloud (3 nested-ternary smells — ℹ️ Separately recorded ( |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
docs/roadmap/phases/phase-1-engine-and-llm.md (1)
1065-1069: 💤 Low valueClarify "landed (unmerged)" phrasing for adapters.
Line 1066 states the adapters "landed (unmerged)," but earlier claims in this file (line 1059) mark 1.AG as ✅ Done (PR
#37, 2026-06-21). The parenthetical "(unmerged)" is confusing — the code is either merged (with the PR) or it is not.The clarification "They are not yet runtime-reachable" (line 1069) resolves the intent: the adapters shipped in PR
#37but require 1.AH host-wiring to be callable. Recommend rewording to remove the ambiguous "(unmerged)" or clarify it as "landed (deferred host-wiring)."📝 Suggested rephrase
- The Phase-1-doable half landed (unmerged): the **four generative media-output adapters** + The Phase-1-doable half landed in PR `#37`: the **four generative media-output adapters**Or:
- The Phase-1-doable half landed (unmerged): the **four generative media-output adapters** + The Phase-1-doable half landed (deferred host-wiring): the **four generative media-output adapters**🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/roadmap/phases/phase-1-engine-and-llm.md` around lines 1065 - 1069, The phrase "landed (unmerged)" at line 1066 is confusing because it contradicts the earlier note that 1.AG was marked Done in PR `#37`. Since the adapters (OpenAI-TTS, Gemini-Imagen, OpenAI/Sora, Gemini/Veo) are actually merged via PR `#37` but not yet runtime-reachable, replace "(unmerged)" with "(deferred host-wiring)" to clarify that the adapters were shipped in the PR but their integration/host-wiring for runtime access is still pending in 1.AH.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/roadmap/deferred-tasks.md`:
- Around line 169-179: The deferred-tasks.md roadmap file incorrectly attributes
the OpenAI-TTS audio adapter to 1.AH A1 (lines 171 and surrounding context) when
it was actually shipped in 1.AG, not deferred. Update the generateMedia deferred
task section to remove the claim that OpenAI-TTS is being wired in 1.AH, and
instead explicitly state that OpenAI-TTS audio adapter was already completed in
the earlier 1.AG sections. Adjust any references to "1.AH A1" in this section to
"1.AG A1" to match the actual completion status documented in PR objectives `#37`.
This correction should make clear that only Gemini-Imagen (A2) and the two async
adapters remain as deferred work for 1.AH.
In `@docs/roadmap/phases/phase-5-managed-inference.md`:
- Line 221: In the phase-5-managed-inference.md file at line 221, locate the
phrase containing "the D15 loader is the 1.AH host-wiring half" and replace "D15
loader" with "D15 load-check" to align with the canonical terminology used in
deferred-tasks.md. The updated phrase should read "the D15 load-check is the
1.AH host-wiring half" to maintain consistency across documentation for the
validation step where validateWorkflowWithCatalog is invoked by the host loader
against the managed catalog.
In `@packages/llm/src/adapters/openai.ts`:
- Around line 1125-1224: Update the SORA_SECONDS constant to include the missing
supported durations by adding entries for 16 and 20 seconds alongside the
existing 4, 8, and 12 second mappings. Additionally, update the error message in
the soraSeconds function to reflect all five supported duration values instead
of just the three currently mentioned, so users receive accurate guidance about
valid Sora clip lengths.
In `@packages/llm/src/adapters/shared.ts`:
- Around line 86-88: The encodeMediaJobId function currently accepts empty
vendor IDs, which causes invalid encoded values that the decode function
intentionally rejects on lines 102-109, breaking the bijection contract. Add
validation at the beginning of the encodeMediaJobId function to throw an error
or return early if the vendorId is an empty string or falsy value, ensuring that
only valid vendor IDs that can be successfully decoded are accepted at encoding
time rather than deferring validation to the decode phase.
---
Nitpick comments:
In `@docs/roadmap/phases/phase-1-engine-and-llm.md`:
- Around line 1065-1069: The phrase "landed (unmerged)" at line 1066 is
confusing because it contradicts the earlier note that 1.AG was marked Done in
PR `#37`. Since the adapters (OpenAI-TTS, Gemini-Imagen, OpenAI/Sora, Gemini/Veo)
are actually merged via PR `#37` but not yet runtime-reachable, replace
"(unmerged)" with "(deferred host-wiring)" to clarify that the adapters were
shipped in the PR but their integration/host-wiring for runtime access is still
pending in 1.AH.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b78c3976-560a-4955-9164-6438d77fc7e4
📒 Files selected for processing (21)
CLAUDE.mdREADME.mddocs/reference/shared-core/llm-provider-seam.mddocs/roadmap/current.mddocs/roadmap/deferred-tasks.mddocs/roadmap/phases/phase-1-engine-and-llm.mddocs/roadmap/phases/phase-2-cli.mddocs/roadmap/phases/phase-3-desktop.mddocs/roadmap/phases/phase-4-vscode.mddocs/roadmap/phases/phase-5-managed-inference.mddocs/roadmap/phases/phase-6-cloud-execution-portal.mdpackages/llm/src/adapters/gemini.test.tspackages/llm/src/adapters/gemini.tspackages/llm/src/adapters/openai.test.tspackages/llm/src/adapters/openai.tspackages/llm/src/adapters/shared.test.tspackages/llm/src/adapters/shared.tspackages/llm/src/conformance/fixtures/gemini.tspackages/llm/src/conformance/gemini.conformance.test.tspackages/llm/src/conformance/generative-seam.conformance.test.tspackages/llm/src/types.ts
Address the 2 valid findings from CodeRabbit's re-review of the pushed fixes: - shared.ts encodeMediaJobId: reject an empty vendorId at mint (it would produce a prefix-only token decodeMediaJobId rejects — breaking the bijection). The adapter call sites already guard, so this is a fail-fast contract assertion. + a test. - phase-5-managed-inference.md: "D15 loader" → "D15 load-check" for terminology consistency with deferred-tasks.md / phase-2 §2.S. Skipped (with rationale, replied on the threads): the SORA_SECONDS "add 16/20" finding (the pinned openai@6.42 types VideoSeconds as '4'|'8'|'12' — 16/20 would fail tsc — and Sora is deprecated, e169c7a) and the deferred-tasks "OpenAI-TTS is 1.AG" finding (a false positive — OpenAI-TTS is 1.AH A1; 1.AG wired OpenAI-image sync). Toolchain 16/16 green; prettier clean; Leakwatch 0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…fier (CodeRabbit nitpick) "landed (unmerged)" is a transient PR-state qualifier that reads as confusing in a roadmap doc (and goes stale once #38 merges). Rephrase to the accurate, non-transient state — "is implemented behind the seam" — the sentence already notes it is not yet runtime-reachable + needs the deferred host-wiring below, so nothing is over-claimed. (CodeRabbit suggested "(deferred host-wiring)", but the adapters aren't the deferred part — the host-wiring is.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
PR #38 merged — the Phase-1-doable half of 1.AH (the four generative media-output adapters OpenAI-TTS/Gemini-Imagen/OpenAI-Sora/Gemini-Veo behind the seam + the shared opaque-jobId codec + the bare-MIME validator) is done. Reconcile the status docs (roadmap-done-after-merge): - current.md: 1.AH Phase-1-doable half ✅ Done (PR #38); Phase 1 complete; the host-wiring half is distributed across Phases 2–6 as recorded tasks. Replaced both "only 1.AH remains" status lines. - CLAUDE.md: both status paragraphs — 1.AH ✅ Done; Phase 1 complete. - README.md: the multimodal-sub-spine status line — 1.AH ✅ (PR #38), 1.m6 / Phase 1 complete. - phase-1-engine-and-llm.md: the §1.AH bullet (Phase-1-doable half ✅ Done), the 1.m6 milestone row (1.AH ✅), and the dependency table row (adapters Done; host-wiring spans Phases 2–6). - deferred-tasks.md: flip the Imagen/TTS/knobs (A1/A2/A6) and Sora/Veo (A3/A4) adapter entries to ✅ Done, with the N4 opacity obligation discharged; count>1, verified pricing rows, the host-wiring D-mechanisms, A5, and the Sora deprecation stay deferred. A5 (save_to url double-fetch) remains deferred; OpenAI Sora 2 is vendor-deprecated (2026-09-24, A3-arm only). prettier clean; Leakwatch 0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… and milestones - Mark Phase 1 as complete with all workstreams (1.A–1.AH) merged (PR #6–#38). - Achieve global milestones M1 (LLM seam proven) and M2 (engine end-to-end). - Detail the completion of the critical path and additive sub-spines, including Lane C and multimodal sub-spine. - Document the decision on multimodal I/O and its integration into the roadmap.



Completes Phase-1 workstream 1.AH — the Phase-1-doable half of media output — and records the later-phase media host-wiring as tasks in each phase doc. Also folds the 1.AG-Done status-doc sync (
bbd8a47) made after PR #37 merged.The four generative media-output adapters (behind the
@relavium/llmseam)audio.speech→ base64 +response_format↔MIMEgenerateImages→ base64mediapartgenerateMedia→opaque{ jobId }+pollMediaJob(engine LRO)bareMimevalidator + gpt-image-1size/qualityknobsEach section followed develop → Opus adversarial review → fix → Sonnet adversarial review → fix. The A2 Sonnet pass caught a genuine SSRF/key-exfiltration blocker (the Imagen path spread
providerOptionsinto the SDK config withoutstripTransportKeys); fixed, then a dedicated 3-adversary verification confirmed it fully closed and the OpenAI sibling paths safe.Design highlights
rlv-mediajob:1:<base64url(vendorId)>— a stateless, round-trip-validated bijection (no in-memoryMap), so a cold-process re-attach resolves the vendor id with no adapter state. Sharedencode/decodeMediaJobIdinadapters/shared.ts(Sora + Veo). No new ADR — within ADR-0045 §7 (the one design fork, maintainer-approved).videoBytesbase64 OR a re-hostableurlsource the engine de-inlines via the singlefetchMediaBytes(never an in-adapter "second fetch site").AbortErrorduring a binary body read →cancelled) benefits both the TTS and Sora arms.@google/genai/openaivendor type crosses the seam;packages/coreuntouched; no new runtime dependency; adapters ship 0-cost (never a fabricated rate).PART C — later-phase media work recorded (rule 8: task + canonical pointer)
resolveMediaSurface,read_mediaMediaReadAccess, the load-check, cost governor,resolveForEgress,save_towrite port, theEgressCapability.fetchSSRF mechanism).read_mediabyte gate, keychain no-raw-key IPC test, canvas rendering, exit criterion + Risks row).Deferred (documented, not in this PR)
count > 1multi-artifact (needs a future media-array ADR), verified generative pricing rows, OpenAI Responses-API agentic image-gen.Verification
pnpm turbo run lint typecheck test build16/16 green · prettier clean · Leakwatch 0 · seam fence clean.Refs: ADR-0031, ADR-0032, ADR-0042, ADR-0043, ADR-0044, ADR-0045, ADR-0046
🤖 Generated with Claude Code
Summary by CodeRabbit