Skip to content

1.AH — generative media-output adapters (TTS/Imagen/Sora/Veo) + later-phase media task records#38

Merged
cemililik merged 21 commits into
mainfrom
development
Jun 21, 2026
Merged

1.AH — generative media-output adapters (TTS/Imagen/Sora/Veo) + later-phase media task records#38
cemililik merged 21 commits into
mainfrom
development

Conversation

@cemililik

@cemililik cemililik commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Completes Phase-1 workstream 1.AH — the Phase-1-doable half of media output — and records the later-phase media host-wiring as tasks in each phase doc. Also folds the 1.AG-Done status-doc sync (bbd8a47) made after PR #37 merged.

The four generative media-output adapters (behind the @relavium/llm seam)

Section What Review
A1 OpenAI-TTS audio — audio.speech → base64 + response_format↔MIME Opus + Sonnet, GO
A2 Gemini-Imagen image — generateImages → base64 media part Opus + Sonnet + SSRF-fix verify, GO
A3 OpenAI/Sora async video — generateMedia→opaque { jobId } + pollMediaJob (engine LRO) Opus + Sonnet, GO
A4 Gemini/Veo async video + shared opaque-jobId codec Opus + Sonnet, GO
A6 shared bareMime validator + gpt-image-1 size/quality knobs Opus, GO

Each section followed develop → Opus adversarial review → fix → Sonnet adversarial review → fix. The A2 Sonnet pass caught a genuine SSRF/key-exfiltration blocker (the Imagen path spread providerOptions into the SDK config without stripTransportKeys); fixed, then a dedicated 3-adversary verification confirmed it fully closed and the OpenAI sibling paths safe.

Design highlights

  • Opaque async-job id (ADR-0045 §7): rlv-mediajob:1:<base64url(vendorId)> — a stateless, round-trip-validated bijection (no in-memory Map), so a cold-process re-attach resolves the vendor id with no adapter state. Shared encode/decodeMediaJobId in adapters/shared.ts (Sora + Veo). No new ADR — within ADR-0045 §7 (the one design fork, maintainer-approved).
  • Veo byte delivery: inline videoBytes base64 OR a re-hostable url source the engine de-inlines via the single fetchMediaBytes (never an in-adapter "second fetch site").
  • Abort classification fix (a native AbortError during a binary body read → cancelled) benefits both the TTS and Sora arms.
  • No @google/genai/openai vendor type crosses the seam; packages/core untouched; no new runtime dependency; adapters ship 0-cost (never a fabricated rate).

PART C — later-phase media work recorded (rule 8: task + canonical pointer)

  • phase-1 §1.AH expanded into the A1–A6 list + deferred decisions + the host-wiring-spans-2–6 surface map.
  • phase-2 §2.S (new) — the canonical host-wiring task list (resolveMediaSurface, read_media MediaReadAccess, the load-check, cost governor, resolveForEgress, save_to write port, the EgressCapability.fetch SSRF mechanism).
  • phase-3 §3.B split (Rust CAS, handle-only de-inline test, read_media byte gate, keychain no-raw-key IPC test, canvas rendering, exit criterion + Risks row).
  • phase-4 §4.N (new) — VS Code FS-CAS wiring + webview rendering. phase-5 managed media metering/poll-through. phase-6 cloud object-storage media + tenancy + byte endpoint.

Deferred (documented, not in this PR)

  • A5 save_to url double-fetch (rare-case perf; correctness already fine — CAS dedupes the bytes; a risky engine refactor isn't worth it).
  • count > 1 multi-artifact (needs a future media-array ADR), verified generative pricing rows, OpenAI Responses-API agentic image-gen.
  • All host-wiring (the new Phase-2–6 tasks above) + vendor live-credential conformance — future-phase work, not 1.AH.

Verification

pnpm turbo run lint typecheck test build 16/16 green · prettier clean · Leakwatch 0 · seam fence clean.

Refs: ADR-0031, ADR-0032, ADR-0042, ADR-0043, ADR-0044, ADR-0045, ADR-0046

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Expanded multimodal media generation for images, TTS audio, and async video workflows.
    • Improved async media polling with opaque, vendor-safe job IDs (including clearer progress/done/failed handling).
  • Bug Fixes
    • Hardened media MIME handling and input sanitization; stricter capability/error handling for unsupported or invalid results.
    • Abort/cancellation is now consistently classified and propagated.
  • Documentation
    • Updated Phase 1 completion and roadmap/deferred-work notes across multimodal workflows.
  • Tests
    • Expanded Gemini/OpenAI adapter and conformance coverage, focusing on MIME normalization, abort behavior, and async polling edge cases.

cemililik and others added 15 commits June 21, 2026 11:15
1.AG (media output generation, Phase D) merged to main via PR #37 (2026-06-21),
including the final pre-merge review's two HIGH fixes (orphaned-vertex cost
addend + crash-in-window run:paused). Flip its status everywhere, per the
Roadmap-Done-After-Merge rule:

- phase-1-engine-and-llm.md: the 1.AG dependency-table row (◇ → ✅ Done, PR #37),
  the 1.m6 summary row (1.AG ✅), and the narrative entry's Status line.
- current.md: 1.AG ✅ Done (PR #37); "only 1.AH remains" on the 1.m6 sub-spine.
- deferred-tasks.md: tick [x] the A5 async-media-job ADR + engine-loop obligation
  (PR #37).
- CLAUDE.md: both Status paragraphs note 1.AG Done; remaining Phase-1 work is 1.AH.
- README.md: 1.AG ✅; only 1.AH remains.
- llm-provider-seam.md: clarify the streaming media triad — non-streaming output
  landed at 1.AG (generate() + generative endpoint, ADR-0046); the streaming triad
  is deferred to 1.AH (ADR-0046 §4).

Docs-only; format:check clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The first 1.AH vendor-adapter section (package-level, no surface — provable offline).
Refactors OpenAI `generateMedia` into a modality dispatcher and wires the audio arm:

- generateMedia now dispatches by modality: image → gpt-image-1 (extracted into
  openAiGenerateImage), audio → TTS (new openAiGenerateSpeech), video → typed
  UnsupportedCapabilityError (the async Sora path is a later section); DeepSeek →
  typed capability error (generates no media). The extraction also keeps each arm
  small (no cognitive-complexity smell).
- openAiGenerateSpeech: `client.audio.speech` returns BINARY audio → base64-encoded
  into an in-flight MediaPart (the engine de-inlines to a handle). `req.mimeType`
  selects the vendor response_format (default mp3); providerOptions.audio.voice
  selects the voice (default alloy). The raw audio bytes NEVER cross the seam — `raw`
  carries only a non-byte `{ responseFormat }` diagnostic (strip-on-sink anyway, I3).
- TTS_FORMAT_TO_MIME + ttsResponseFormat (mimeType↔format) + ttsVoice helpers.
- No new runtime dependency (openai SDK already imported); Buffer base64 mirrors
  packages/db's media-store (the adapter tsconfig carries `types: ["node"]`; the seam
  itself stays pure under tsconfig.seam.json `types: []`).

Tests: TTS base64 + voice + format-default round-trip; req.mimeType → response_format
mapping (audio/opus → opus); the modality-reject test updated (OpenAI video + DeepSeek
reject; audio no longer rejected). Toolchain 16/16 green, format clean, Leakwatch 0.

Docs: deferred-tasks + llm-provider-seam — OpenAI-TTS now wired (1.AH A1); Gemini-Imagen
remains (A2). Pricing rows + count>1 deferred per the 1.AH scope decisions.

Refs: ADR-0045, ADR-0031
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…/count notes, empty-bytes test)

Applies the verified findings from the A1 Opus review (0 blocker/high; 1 medium,
3 low — all confirmed):

- M1 (medium): move the TTS body download (`await response.arrayBuffer()`) INSIDE
  the try/catch. `audio.speech` is a __binaryResponse — create() returns the raw
  Response unconsumed, so the multi-MB download happens at the read; a mid-download
  socket reset / abort was escaping unclassified and flattening to opaque `internal`
  instead of a classified LlmError (and an abort → `internal` not `cancelled`). Now
  routed through openaiErrorToLlmError, matching the image/egress-error contract.
- L1 (low): document the `pcm → audio/L16` sample-rate loss — OpenAI pcm is 24 kHz
  but the seam's MediaMimeTypeSchema forbids the `;rate=` parameter (RFC 2586 default
  8 kHz), so a consumer must assume 24 kHz. Mirrors the pre-existing chat pcm16
  convention; recorded as a low deferred-tasks known-limitation.
- L2 (low): comment that `count` is intentionally a no-op for TTS (per-character
  billing, single stream — no bill-N-deliver-1 hazard, unlike the image path).
- L3 (low): test the empty-audio-bytes → bad_request rejection branch.

Toolchain 16/16 green, format clean, Leakwatch 0.

Refs: ADR-0045
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold the verified findings from the A1 Sonnet adversarial review (0 blocker/high):

- doc(nit): refresh the `generateMedia` method JSDoc — it still claimed audio
  (TTS) was NOT wired; audio dispatches to `openAiGenerateSpeech` since A1. Video
  remains the async Sora path (correctly throws `UnsupportedCapabilityError`).
- test(low): assert `result.raw === { responseFormat: 'mp3' }` on the TTS
  happy-path — pins the I3 structural guarantee (no audio bytes leak into `raw`,
  which is `z.unknown()`) at the test level.
- test(low): add a TTS-call-failure test — a rejected `audio.speech.create`
  surfaces as a typed `LlmProviderError` (transport, retryable) via
  `openaiErrorToLlmError`, proving the try/catch is wired rather than raw.

Toolchain 16/16 green; prettier clean; Leakwatch 0.

Refs: ADR-0045, ADR-0046
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the Gemini adapter's separate-endpoint image generation behind the seam
(ADR-0045 §1, ADR-0046), mirroring the OpenAI image arm:

- `GeminiTransport.generateImages` — a new required transport method (a transport
  honestly declares the generative surface it services; image is Gemini's only
  Phase-1 generative arm, Veo video is 1.AH A4). The default `sdkTransport` wraps
  `ai.models.generateImages`; the conformance harness replays it.
- `GeminiImageRequest`/`GeminiImageResponse` — hand-rolled structural subsets so no
  vendor type crosses the seam (parity with `GeminiRequest`/`GeminiResponse`).
- `generateMedia` dispatches by modality: image → `geminiGenerateImage`
  (`generatedImages[].image.imageBytes` → a base64 `media` part the engine
  de-inlines to a `media://` handle — I3); audio/video fail loud with a typed
  `UnsupportedCapabilityError`. `count > 1` is rejected before egress; a
  safety-filtered candidate (`raiFilteredReason`, no image) maps to
  `content_filter`; `numberOfImages` is pinned to 1 AFTER any providerOptions
  spread so an author count can't smuggle past the single-artifact seam.
- Conformance: a recorded Imagen fixture drives the existing generative
  seam-contract scenario against the real fold; 9 adapter unit tests cover the
  happy path, MIME default, providerOptions threading, count>1, content_filter,
  no-image, transport rejection, and audio/video capability rejection.

Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.

Refs: ADR-0045, ADR-0046, ADR-0011
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold the two verified nits from the A2 Opus adversarial review (0 blocker/high;
the raw:response I3 question was assessed and confirmed a defensible parity
choice — raw is a sink-stripped diagnostic by contract, types.ts:442-444):

- doc(nit): document the content_filter taxonomy in geminiGenerateImage — only
  the in-body raiFilteredReason channel maps to content_filter; an HTTP-level
  Imagen safety 4xx normalizes to bad_request (Gemini's ApiError carries no
  structured content-policy code, unlike OpenAI's isContentPolicyCode, so a
  status-only block has no reliable signal to key on; message-sniffing would be
  fragile). A documented, behaviorally-inert divergence from the OpenAI sibling.
- test(nit): assert a present-but-empty raiFilteredReason falls through to
  bad_request, not content_filter (the previously-uncovered guard sub-branch).

Toolchain 16/16 green; prettier clean; Leakwatch 0.

Refs: ADR-0045, ADR-0046, ADR-0011
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ker)

Fold the A2 Sonnet second-pass adversarial review (1 blocker, 1 medium, 3 low,
3 nit — the SSRF blocker was missed entirely by the first Opus pass):

- BLOCKER (B1) — SSRF / API-key exfiltration on the Imagen path: geminiGenerateImage
  spread providerOptions into the Imagen config WITHOUT stripTransportKeys, so a
  caller-supplied httpOptions.baseUrl could redirect egress + the live API key to an
  attacker host (the text generate() path strips it precisely for this reason). Fix:
  apply stripTransportKeys to the spread; extend stripTransportKeys to also drop an
  author-injected abortSignal (L1 — the signal is the engine's to set, never
  providerOptions), hardening both paths. + an SSRF regression test.
- MEDIUM (M1) — vendor MIME not parameter-stripped: a parameterized Imagen MIME
  (image/png; q=1.0) reached MediaStore.put / media_objects.mimeType un-canonicalized
  (no parameter-stripping CHECK in the production de-inline path). Fix: strip
  ;-parameters to the bare MIME, mirroring the generate() arm (mapContent). + test.
- LOW: assert model threading (L2); cover an entirely-absent generatedImages field
  (L3). NIT: strengthen isGeminiImageResponse to reject a non-array generatedImages
  (N2); assert raw is the exact transport response (N3).

Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.

Refs: ADR-0045, ADR-0046, ADR-0011, ADR-0043
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the OpenAI adapter's ASYNC video LRO behind the seam (ADR-0045 §3/§7), the
first real async generative adapter (design validated by a dedicated research pass):

- generateMedia(video) → client.videos.create → ALWAYS returns an opaque { jobId }
  (even on instant completion — completion is the engine poll loop's job; returning
  { media } would orphan the LRO). raw carries only { id, status }, never bytes.
- pollMediaJob(jobId, key, signal) → decode → videos.retrieve → status map: queued/
  in_progress → pending(clamped progress); completed → videos.downloadContent →
  base64 video/mp4 media part (downloaded in the same try as retrieve so a mid-
  download abort classifies as cancelled); failed → mapVideoCreateError (content-
  policy code → content_filter, else unknown). A malformed jobId returns a FATAL
  `failed` (not a throw — a throw is engine-classified retryable and would loop).
- Opaque jobId encoding (ADR-0045 §7): rlv-mediajob:1:<base64url(vendorId)> — a
  STATELESS reversible bijection (encodeVideoJobId/decodeVideoJobId), NOT an in-memory
  Map, so a cold-process re-attach (§3) resolves the vendor id with no adapter state.
  Relavium-namespaced + base64url'd: not the bare vendor op-name/URL, engine never
  parses it (I1/ADR-0011), the non-secret resource id is safe to persist. NO new ADR
  (within ADR-0045 §7; rationale recorded in the adapter doc-comment).
- providerOptions is NOT spread into the SDK call — only explicit soraSeconds (reject
  non-{4,8,12}, never round) + soraSize knobs are extracted (no transport-key surface,
  per the A2-verification sibling-sweep note).
- 11 unit tests (encoding round-trip/opacity, foreign-token decode, generateMedia
  always-jobId, duration reject, full status mapping, content_filter/null-error,
  malformed-token, abort threading, DeepSeek guards); the generative-seam conformance
  caveat + the llm-provider-seam.md Wired list updated.

Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.

Refs: ADR-0045, ADR-0011, ADR-0031
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…overage)

Fold the A3 Opus adversarial review (0 blocker/high; "no behavioral defects" —
all 9 confirmed items were a doc overstatement or a coverage gap on correct code):

- decodeVideoJobId now round-trip-validates (re-encode === jobId): base64url decode
  is lenient (drops invalid chars), so a corrupted/non-canonical right-prefix token
  would otherwise decode to a wrong-but-non-empty vendor id and be polled. The strict
  check rejects every such token; every minted token still round-trips exactly.
- pollMediaJobSora: a defensive `default` arm on the status switch returns a FATAL
  `unknown` failed instead of falling through to `undefined` (which would break the
  Promise<MediaJobStatus> contract) if the SDK ever returns an out-of-union status.
- Tests (the rest were coverage gaps on correct code): assert the create FormData body
  forwards model/prompt/seconds (videos.create is multipart — capture FormData fields);
  seconds defaults to 4 + a valid size knob is forwarded; an invalid size is dropped;
  progress clamps (150→1); completed-but-empty-bytes → bad_request; empty error code →
  unknown + omitted; the abort signal reaches BOTH retrieve and downloadContent; the
  non-canonical-token decode rejection.

Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.

Refs: ADR-0045, ADR-0011, ADR-0031
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…on + coverage)

Fold the A3 Sonnet second-pass review (0 blocker/high; 6 distinct items — 2 low, 4 nit):

- abort classification (nit, one-canonical-home fix): a NATIVE abort during a binary
  body read (TTS audio.speech / Sora downloadContent arrayBuffer) throws a DOMException/
  Error named 'AbortError', NOT the SDK's APIUserAbortError, so it previously classified
  as `unknown`. openaiErrorToLlmError now detects an 'AbortError'-named error → `cancelled`
  — benefits every binary-read arm (A1 TTS + A3 Sora) consistently. + a classifier test.
- soraSeconds now takes providerId (was hardcoded 'openai'), matching every other error
  helper in the file.
- comment: the `bytes === undefined` half of the completed-arm guard is the TS narrowing
  guard (bytes is control-flow-conditionally assigned), not dead code — clarified.
- doc: types.ts pollMediaJob JSDoc no longer claims "no Phase-1 adapter implements it"
  (Sora wired at A3; Veo at A4).
- tests (the rest were coverage gaps on correct code): the videos.create AbortSignal is
  threaded; durationSeconds 8/12 forward correctly; the pollMediaJobSora default arm maps
  an unrecognized SDK status to a fatal unknown failed.

Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.

Refs: ADR-0045, ADR-0011
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oding

Wire the Gemini adapter's ASYNC Veo video LRO behind the seam (ADR-0045 §3/§7),
completing the four 1.AH generative adapters, and factor the opaque-jobId encoding
into one shared home:

- adapters/shared.ts: encodeMediaJobId/decodeMediaJobId (moved from openai.ts A3,
  generalized — provider-agnostic, the engine routes pollMediaJob by the bound
  provider so the shared rlv-mediajob:1: prefix is collision-safe). The Sora arm now
  imports them; the round-trip-validated bijection + its tests live in shared.test.ts.
- GeminiTransport gains generateVideos + pollVideo (required) — plain Relavium subsets
  (GeminiVideoRequest/Operation/Poll), no @google/genai type crosses the seam. The
  default sdkTransport wraps models.generateVideos / operations.getVideosOperation;
  getVideosOperation reads only operation.name, so pollVideo reconstructs a fresh
  GenerateVideosOperation carrying the persisted name → stateless cold-process re-attach.
- generateMedia(video) → geminiGenerateVideo: ALWAYS returns an opaque { jobId } from the
  operation name (never { media }); numberOfVideos pinned to 1 AFTER stripTransportKeys
  (single-artifact + the A2 SSRF lesson — genai co-mingles httpOptions in the config);
  durationSeconds threaded; raw byte-free.
- pollMediaJob → geminiPollVideo: pending (done:false); done delivers inline videoBytes as
  base64 OR a re-hostable url source (ADR-0045 §7 — the engine de-inlines via the single
  fetchMediaBytes, never a second fetch site); raiFilteredCount>0 → content_filter;
  operation error → unknown; empty → bad_request; malformed jobId → fatal failed.
- 14 Veo unit tests (opaque jobId, SSRF strip, status mapping incl. url-source + base64,
  content_filter/error/empty, malformed token, abort threading); conformance replay +
  seam-doc Wired list updated.

Veo end-to-end completion of a url-source result needs the host MediaUrlFetch re-host
hook (1.AH host-wiring, deferred).

Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.

Refs: ADR-0045, ADR-0011, ADR-0031, ADR-0043
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold the A4 Opus adversarial review (0 blocker/high; "every finding is a test-coverage
or style item on correct code — A3/Sora behavior fully preserved by the refactor"):

- GeminiVideoPoll.raiFilteredCount → `number | undefined` for sibling parity (N1).
- tests (coverage gaps on correct code): assert the Veo create-path threads the
  AbortSignal (L1); assert numberOfVideos:5 in providerOptions is force-pinned back to 1
  (the A2 single-artifact guard, L3); a pollVideo transport-rejection classifies (L2 —
  the only previously-uncovered new lines); a pending→done poll sequence transition (L4 —
  also exercises fakeVideoTransport's sequence path); schema-validate the url-source done
  result for parity with the base64 case (N2).

Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.

Refs: ADR-0045, ADR-0011
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold the A4 Sonnet second-pass review (0 blocker/high/medium; "the Opus GO holds —
every finding is test-coverage/doc completeness on correct code"):

- test: pollVideo threads the DECODED op-name to the transport (fakeVideoTransport now
  captures lastOperationName) — proves the re-attach decode→thread invariant (N1).
- test: a parameterized vendor video MIME (video/mp4; codecs=h264) strips to the bare
  video/mp4 — parity with the Imagen arm (L1/N5).
- test: durationSeconds is omitted from the Veo config when not requested (N2).
- test: the pending→done poll sequence clamps on the last status (a 3rd poll stays done)
  — locks the fakeVideoTransport sequence contract (N9).
- doc: the generative-seam conformance header no longer says Veo "remains deferred"
  (Veo is wired at A4, exercised via fakeVideoTransport) (N7).

Skipped by design: a cross-provider token-rejection test (N3 — the opaque token is
provider-agnostic by the maintainer-approved design; the engine routes pollMediaJob by
the bound provider) and tightening GeminiVideoRequest.config typing (N4 — the
Record<string,unknown> escape hatch is the established pattern for all 3 gemini requests).

Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.

Refs: ADR-0045, ADR-0011
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ality knobs

Two adapter-level improvements (A5 save_to double-fetch stays deferred — it's a
rare-case perf optimization needing a risky engine-flow refactor; correctness is
already fine since the CAS put dedupes the bytes):

- bareMimeType (gemini.ts): a shared helper that strips ;-parameters AND validates the
  vendor MIME via the shared MediaMimeTypeSchema (returns '' on absent/parameter-only/
  ILLEGAL incl. a CR/LF-injected value that split(';') alone would pass). Replaces the
  three split-only sites (mapContent inline media-out, geminiGenerateImage, geminiPollVideo)
  — closes the A2-N1 cross-arm gap where a CR/LF MIME could reach media_objects.mimeType
  (the production de-inline path runs no MIME regex). Callers fall back ('' → default) or
  skip the part. + a CR/LF-rejection test.
- gpt-image-1 size/quality knobs (openai.ts): openAiGenerateImage now threads optional
  size/quality from providerOptions.image.{size,quality}, narrowed to the valid gpt-image-1
  values (unrecognized dropped — no 400-inducing passthrough), in the request BODY only
  (never the SDK RequestOptions — the A2 SSRF lesson). E2E-reachable once the engine
  populates providerOptions for a generative call (host-wiring). + forward/drop tests.

Toolchain 16/16 green; prettier clean; Leakwatch 0; seam fence clean.

Refs: ADR-0044, ADR-0031, ADR-0011
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…se docs

Analyze the media host-wiring that 1.AH deliberately left to later phases and record
it as tasks in each phase doc (rule 8: task + canonical ADR/deferred-tasks pointer,
never a spec restatement). No HW/S labels invented — each task is named by its
mechanism + the canonical D-series / ADR pointer (matching deferred-tasks.md):

- phase-1 §1.AH: expand the narrative bullet into the landed A1–A6 adapter list (the
  four generative adapters + shared opaque-jobId codec + bareMime, behind the seam,
  not yet runtime-reachable), the deferred decisions (count>1 ADR, pricing rows, the
  Responses-API path), and the host-wiring-spans-2–6 surface map + the build-with-the-
  first-surface warning. A5 noted deferred.
- phase-2 §2.S (NEW): the CANONICAL home for the cross-surface host-wiring task list —
  resolveMediaSurface, the read_media MediaReadAccess factory, session-scope
  media_references, the load-check, the cost governor, resolveForEgress, the save_to
  write port, and the EgressCapability.fetch SSRF mechanism (security-critical) +
  milestone/dependency rows.
- phase-3 §3.B: split into the desktop Rust CAS + handle-only de-inline integ test +
  the read_media Tauri byte gate + save_to MediaWritePort + the keychain no-raw-key IPC
  test + canvas rendering + a points-to-§2.S shared-wiring bullet + the reserved-triad
  note + exit criterion #9 + a Risks row + the media ADR Related links.
- phase-4 §4.N (NEW): VS Code FS-CAS host-wiring (points to §2.S) + webview-resource
  rendering + the SSRF mechanism + a P4.M7 milestone + ADR links.
- phase-5: managed gateway media poll-through (pass-through, not a store), mediaUnits
  metering + reserve→settle, durable settle of a billed-but-failed async job, the
  no-logging-extends-to-media-bytes guarantee, the per-modality $ dashboard axis, ADRs.
- phase-6: object-storage MediaStore + BullMQ sweep, media-table tenancy/RLS + cross-org
  CAS dedup, GET /media/{handle} byte gate, durable failed/cancelled cost, the reserved
  {kind:'workspace'} read_media scope, and the record-only portal preview triad.

Prettier clean; Leakwatch 0; all referenced ADR/phase links verified to exist.

Refs: ADR-0031, ADR-0032, ADR-0042, ADR-0043, ADR-0044, ADR-0045, ADR-0046, ADR-0015
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @cemililik, your pull request is larger than the review limit of 150000 diff characters

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99ba1bf2-bb38-4449-a825-e62665c82da0

📥 Commits

Reviewing files that changed from the base of the PR and between c0e226d and 5b9e539.

📒 Files selected for processing (5)
  • docs/roadmap/deferred-tasks.md
  • docs/roadmap/phases/phase-1-engine-and-llm.md
  • docs/roadmap/phases/phase-5-managed-inference.md
  • packages/llm/src/adapters/shared.test.ts
  • packages/llm/src/adapters/shared.ts

📝 Walkthrough

Walkthrough

Implements generateMedia/pollMediaJob for the Gemini adapter (Imagen sync image, Veo async LRO video) and OpenAI adapter (GPT-image-1 sync image, TTS sync audio, Sora async LRO video). Adds shared opaque job-ID encode/decode helpers, comprehensive unit and conformance tests, and updates all roadmap and reference documentation to mark milestone 1.AG complete while adding forward media host-wiring workstreams for phases 2–6.

Changes

Generative Media Adapter Implementation

Layer / File(s) Summary
Shared opaque media job ID encoding
packages/llm/src/adapters/shared.ts, packages/llm/src/adapters/shared.test.ts
Adds MEDIA_JOB_PREFIX, exported encodeMediaJobId/decodeMediaJobId with base64url encoding and canonical round-trip validation; tests cover opaque prefix, vendor-id round-trips, and fail-closed decoding for malformed/foreign tokens.
Gemini Imagen/Veo transport contracts and MIME helpers
packages/llm/src/adapters/gemini.ts, packages/llm/src/types.ts
Exports GeminiImageRequest/Response and GeminiVideoRequest/Operation/Poll interfaces; extends GeminiTransport with generateImages, generateVideos, pollVideo; adds bareMimeType validator and hardens stripTransportKeys to also remove abortSignal.
Gemini sdkTransport and adapter-level media handlers
packages/llm/src/adapters/gemini.ts
Implements sdkTransport Imagen/Veo methods; adds geminiGenerateImage (single-image, RAI mapping, MIME defaulting), geminiGenerateVideo (LRO token minting), geminiPollVideo (decode→poll→map); wires modality dispatch and pollMediaJob into createGeminiAdapter; replaces naïve MIME split with bareMimeType in chat-part normalization.
Gemini adapter media tests and transport fakes
packages/llm/src/adapters/gemini.test.ts
Refactors test fakes with unusedGenerative stub; adds scripted fakeVideoTransport and genMedia/pollMedia helpers; adds Imagen and Veo test suites covering MIME sanitization, SSRF stripping, abort-signal threading, opaque jobId, pending→done polling, safety-filter mapping, and error classification.
OpenAI image/TTS/Sora adapter implementation
packages/llm/src/adapters/openai.ts
Classifies AbortError as cancelled; adds image size/quality option helpers and TTS_FORMAT_TO_MIME voice helpers; rewrites generateMedia to route image/audio sync and Sora async LRO; implements soraSeconds/soraSize validation and Sora pollMediaJob with MP4 download and status mapping.
OpenAI adapter media and Sora tests
packages/llm/src/adapters/openai.test.ts
Adds TTS base64/MIME/error tests, image providerOptions forwarding/filtering tests, AbortError classification test, and Sora async-video test suite covering jobId, duration, abort-signal, status mapping, and DeepSeek rejection.
Gemini conformance fixture and replay transport
packages/llm/src/conformance/fixtures/gemini.ts, packages/llm/src/conformance/gemini.conformance.test.ts, packages/llm/src/conformance/generative-seam.conformance.test.ts
Adds imageGenerate fixture and registers mediaGenerate in GEMINI_FIXTURES; adds isGeminiImageResponse type guard and generateImages replay-transport implementation; updates generative-seam comment to reflect production adapters now implemented.

Roadmap and Reference Documentation Updates

Layer / File(s) Summary
1.AG completion status across roadmap files
CLAUDE.md, README.md, docs/roadmap/current.md, docs/roadmap/phases/phase-1-engine-and-llm.md, docs/roadmap/deferred-tasks.md
Marks 1.AG done (PR #37) across all status surfaces; removes 1.AG from remaining-work lists; splits deferred adapter items and adds audio/L16 MIME known-limitation bullet.
LLM provider seam reference doc update
docs/reference/shared-core/llm-provider-seam.md
Updates ADR-0031 generateMedia/pollMediaJob wiring narrative to enumerate OpenAI/Gemini adapter implementations; adjusts streaming triad docs to state emission is deferred to 1.AH.
Phase 2 CLI media host-wiring workstream (2.S)
docs/roadmap/phases/phase-2-cli.md
Adds 2.S workstream detailing CLI obligations: resolveMediaSurface, scoped read_media, output_modalities load check, containment-checked save_to, SSRF/redirect defense, TUI/JSON rendering, host media GC, durable cost snapshots; updates milestones and dependencies.
Phase 3 desktop IPC media requirements
docs/roadmap/phases/phase-3-desktop.md
Expands 3.B IPC section with handle-only Channel<StreamChunk> invariant test, off-channel read_media byte-gate spec, save_to/CAS-orphan sweep, keychain IPC test; adds live theater read_media rendering requirement; adds exit criterion and risk bullet for media-byte leakage across WebView↔Rust boundary.
Phase 4 VS Code media host-wiring workstream (4.N)
docs/roadmap/phases/phase-4-vscode.md
Adds 4.N workstream for VS Code host: filesystem CAS media store, durable read_media handles, webview rendering via asWebviewUri, SSRF egress blocking; adds P4.M7 milestone row.
Phase 5 managed inference and Phase 6 cloud media additions
docs/roadmap/phases/phase-5-managed-inference.md, docs/roadmap/phases/phase-6-cloud-execution-portal.md
Phase 5 adds managed poll-through egress, per-modality caps, mediaUnits metering, and failed-job durable settle. Phase 6 extends RLS to media tables, adds GET /media/{handle} endpoint, cloud CAS MediaStore with BullMQ sweep, billing snapshot for failed jobs, and workspace read_media scope.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant generateMedia
  participant GeminiAdapter
  participant OpenAIAdapter
  participant encodeMediaJobId
  participant VendorTransport

  rect rgba(100, 149, 237, 0.5)
    Note over Caller,VendorTransport: Sync image/audio path
    Caller->>generateMedia: modality=image/audio
    generateMedia->>GeminiAdapter: geminiGenerateImage(request)
    GeminiAdapter->>VendorTransport: generateImages(request, key)
    VendorTransport-->>GeminiAdapter: GeminiImageResponse (base64 bytes)
    GeminiAdapter-->>Caller: MediaGenResult (inline media)
  end

  rect rgba(144, 238, 144, 0.5)
    Note over Caller,VendorTransport: Async video LRO path (Sora / Veo)
    Caller->>generateMedia: modality=video
    generateMedia->>OpenAIAdapter: openaiGenerateVideo(request)
    OpenAIAdapter->>VendorTransport: POST /videos/generations
    VendorTransport-->>OpenAIAdapter: soraJobId
    OpenAIAdapter->>encodeMediaJobId: encode(soraJobId)
    OpenAIAdapter-->>Caller: opaque jobId token

    Caller->>generateMedia: pollMediaJob(jobId, signal)
    generateMedia->>OpenAIAdapter: decodeMediaJobId(jobId)
    OpenAIAdapter->>VendorTransport: GET /videos/generations/{id}
    VendorTransport-->>OpenAIAdapter: status (queued→completed)
    OpenAIAdapter->>VendorTransport: GET /videos/generations/{id}/content
    VendorTransport-->>OpenAIAdapter: MP4 bytes
    OpenAIAdapter-->>Caller: MediaPollResult done (base64 video)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • HodeTech/Relavium#37: Introduces the generative media implementation (Gemini Imagen/Veo and OpenAI image/TTS/Sora adapters) that this PR references and validates via status updates.
  • HodeTech/Relavium#11: Established the ADR-0031 LlmProvider multimodal seam contract (generateMedia/pollMediaJob) that the implementation layers of this PR fulfill.

Poem

🐇 Hop hop! The media flows at last,
Imagen and Sora — adapters amassed!
Base64 bytes, an opaque job token too,
SSRF defense and MIME scrubbed clean through.
The 1.AG checkbox — ✅ — hooray!
Only 1.AH stands in the way. 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementation of generative media-output adapters (TTS/Imagen/Sora/Veo) for Phase 1 workstream 1.AH, which directly matches the PR's primary objective and the substantial adapter code changes across OpenAI and Gemini implementations.
Docstring Coverage ✅ Passed Docstring coverage is 91.67% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch development

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the generative media-output adapters for OpenAI (TTS audio, Sora async video) and Gemini (Imagen sync image, Veo async video) behind the @relavium/llm seam, along with stateless opaque job ID encoding/decoding bijections and extensive unit and conformance tests. It also updates the project roadmap and documentation to reflect the completion of the media output generation phase (1.AG) and the transition to host-wiring (1.AH). The code review identified several critical issues in the newly added adapters where nullish or missing fields in API responses (such as video.error, video.progress, op.name, and poll.error) could bypass existing checks and trigger runtime TypeError crashes or Zod validation failures.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread packages/llm/src/adapters/openai.ts Outdated
Comment thread packages/llm/src/adapters/openai.ts Outdated
Comment thread packages/llm/src/adapters/gemini.ts Outdated
Comment thread packages/llm/src/adapters/gemini.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/llm/src/conformance/gemini.conformance.test.ts`:
- Around line 30-35: In the isGeminiImageResponse type guard function, replace
the unsafe `as` cast when accessing the generatedImages property. Instead of
using the cast `(value as { generatedImages?: unknown }).generatedImages`, use
`Reflect.get(value, 'generatedImages')` to safely access the property without
type casting. This maintains the same runtime behavior while adhering to the no
unsafe casts requirement.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4de2406a-ac49-41de-89ac-41ff300b7948

📥 Commits

Reviewing files that changed from the base of the PR and between 713feec and 8f09ac7.

📒 Files selected for processing (21)
  • CLAUDE.md
  • README.md
  • docs/reference/shared-core/llm-provider-seam.md
  • docs/roadmap/current.md
  • docs/roadmap/deferred-tasks.md
  • docs/roadmap/phases/phase-1-engine-and-llm.md
  • docs/roadmap/phases/phase-2-cli.md
  • docs/roadmap/phases/phase-3-desktop.md
  • docs/roadmap/phases/phase-4-vscode.md
  • docs/roadmap/phases/phase-5-managed-inference.md
  • docs/roadmap/phases/phase-6-cloud-execution-portal.md
  • packages/llm/src/adapters/gemini.test.ts
  • packages/llm/src/adapters/gemini.ts
  • packages/llm/src/adapters/openai.test.ts
  • packages/llm/src/adapters/openai.ts
  • packages/llm/src/adapters/shared.test.ts
  • packages/llm/src/adapters/shared.ts
  • packages/llm/src/conformance/fixtures/gemini.ts
  • packages/llm/src/conformance/gemini.conformance.test.ts
  • packages/llm/src/conformance/generative-seam.conformance.test.ts
  • packages/llm/src/types.ts

Comment thread packages/llm/src/conformance/gemini.conformance.test.ts
cemililik and others added 3 commits June 21, 2026 16:54
…responses

Address the gemini-code-assist review on PR #38 (1 high, 3 medium) — defensive
guards for media-response fields the vendor SDKs type as present but a deviating
runtime response could omit (the adapters trusted the SDK types):

- mapVideoCreateError (HIGH): accept `VideoCreateError | null | undefined` and use
  `err == null` (was `=== null`) so a runtime-absent Sora `video.error` on a failed
  status maps to `unknown` instead of crashing on `err.code`; read `err.code`/`err.message`
  defensively (truthy/`??`) too.
- Sora `video.progress` (medium): `(video.progress ?? 0) / 100` so a missing progress
  yields 0, not NaN (NaN would fail the engine z.number().min(0).max(1)).
- Veo `op.name` (medium): `!op.name` (was `=== undefined || .length === 0`) — covers a
  runtime null/undefined/empty without a TypeError on `.length`.
- Veo `poll.error` (medium): truthy guard `if (poll.error)` before reading `.message`.

+ two regression tests (Sora failed with absent error → unknown; Sora in_progress with
absent progress → pending 0). The Veo null guards are exercised by the existing
empty/truthy-path tests.

Toolchain 16/16 green; prettier clean; Leakwatch 0.

Refs: ADR-0045
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e (CodeRabbit)

Address CodeRabbit's actionable comment on PR #38: the conformance type guard read
`generatedImages` via `(value as { generatedImages?: unknown })` — an unsafe `as`
against the project's "prefer type guards, no unsafe `as`" guideline.

CodeRabbit suggested `Reflect.get`, but that returns `any` and trips the harder
`@typescript-eslint/no-unsafe-assignment` rule — so instead use `in`-narrowing, which
is both cast-free AND any-free and preserves the exact semantics (absent or undefined
or array → valid; a present non-array → invalid).

Toolchain 16/16 green; prettier clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ells)

Address the 3 SonarCloud "extract this nested ternary into an independent statement"
major code smells in the media test helpers:

- gemini.test.ts fakeVideoTransport `polls` — the undefined / single / sequence
  selection is now an if/else chain (kept the `'done' in opts.poll` discriminant —
  Array.isArray does not narrow a readonly-array union).
- openai.test.ts soraFetch `url` (typeof input string/URL/Request) and `call`
  (content/create/retrieve) — both nested ternaries extracted to if/else.

Behavior identical; toolchain 16/16 green; prettier clean; Leakwatch 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cemililik

Copy link
Copy Markdown
Contributor Author

@coderabbitai review all

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown

@cemililik Sure, I'll review all the changes in this PR now.

✅ Action performed

Full review finished.

…n 2026-09-24)

OpenAI announced the Sora 2 video models + the videos.* API shut down 2026-09-24,
affecting the 1.AH A3 OpenAI/Sora async-video adapter. Record it as a tracked
forward-obligation: impact is narrow (only the Sora adapter arm + tests — the engine
async-job LRO, the shared opaque-jobId codec, the conformance, and Gemini/Veo are
provider-agnostic and unaffected), no live exposure until the Phase-2 host wiring, and
the maintainer action (retarget to a replacement model, or disable/remove the Sora arm
leaving the seam + Veo intact) is captured with the deadline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cemililik

Copy link
Copy Markdown
Contributor Author

Review responses

All automated-review findings addressed (CI green: lint·typecheck·test, SonarCloud, engine coverage floor, strict peer-deps):

gemini-code-assist (4 — 525621b, threads resolved): defensive nullish-handling for media-response fields the vendor SDKs type as present but a deviating runtime response could omit —

  • mapVideoCreateError: accepts null | undefined (err == null) + reads code/message defensively (no TypeError);
  • Sora video.progress: (… ?? 0) / 100 (no NaN → no Zod failure);
  • Veo op.name: !op.name (handles null/undefined/empty);
  • Veo poll.error: truthy guard before .message.
    Plus 2 regression tests (Sora failed-with-absent-error → unknown; in_progress-with-absent-progress → pending 0).

CodeRabbit (1 — f686d00): the isGeminiImageResponse unsafe as cast removed. Note: the suggested Reflect.get returns any, which trips the harder @typescript-eslint/no-unsafe-assignment rule here — so I used in-narrowing instead (cast-free and any-free, identical semantics).

SonarCloud (3 nested-ternary smells — c0e226d): extracted the fakeVideoTransport polls and soraFetch url/call ternaries to if/else.

ℹ️ Separately recorded (e169c7a, deferred-tasks.md): OpenAI deprecated the Sora 2 models + Videos API (shutdown 2026-09-24). Impact is narrow — only the A3 Sora adapter arm; the engine async-job LRO, the shared opaque-jobId codec, the conformance, and Gemini/Veo are unaffected. Captured as a maintainer obligation (retarget or disable before the date); no live exposure until the Phase-2 host wiring.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
docs/roadmap/phases/phase-1-engine-and-llm.md (1)

1065-1069: 💤 Low value

Clarify "landed (unmerged)" phrasing for adapters.

Line 1066 states the adapters "landed (unmerged)," but earlier claims in this file (line 1059) mark 1.AG as ✅ Done (PR #37, 2026-06-21). The parenthetical "(unmerged)" is confusing — the code is either merged (with the PR) or it is not.

The clarification "They are not yet runtime-reachable" (line 1069) resolves the intent: the adapters shipped in PR #37 but require 1.AH host-wiring to be callable. Recommend rewording to remove the ambiguous "(unmerged)" or clarify it as "landed (deferred host-wiring)."

📝 Suggested rephrase
-  The Phase-1-doable half landed (unmerged): the **four generative media-output adapters**
+  The Phase-1-doable half landed in PR `#37`: the **four generative media-output adapters**

Or:

-  The Phase-1-doable half landed (unmerged): the **four generative media-output adapters**
+  The Phase-1-doable half landed (deferred host-wiring): the **four generative media-output adapters**
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/roadmap/phases/phase-1-engine-and-llm.md` around lines 1065 - 1069, The
phrase "landed (unmerged)" at line 1066 is confusing because it contradicts the
earlier note that 1.AG was marked Done in PR `#37`. Since the adapters
(OpenAI-TTS, Gemini-Imagen, OpenAI/Sora, Gemini/Veo) are actually merged via PR
`#37` but not yet runtime-reachable, replace "(unmerged)" with "(deferred
host-wiring)" to clarify that the adapters were shipped in the PR but their
integration/host-wiring for runtime access is still pending in 1.AH.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/roadmap/deferred-tasks.md`:
- Around line 169-179: The deferred-tasks.md roadmap file incorrectly attributes
the OpenAI-TTS audio adapter to 1.AH A1 (lines 171 and surrounding context) when
it was actually shipped in 1.AG, not deferred. Update the generateMedia deferred
task section to remove the claim that OpenAI-TTS is being wired in 1.AH, and
instead explicitly state that OpenAI-TTS audio adapter was already completed in
the earlier 1.AG sections. Adjust any references to "1.AH A1" in this section to
"1.AG A1" to match the actual completion status documented in PR objectives `#37`.
This correction should make clear that only Gemini-Imagen (A2) and the two async
adapters remain as deferred work for 1.AH.

In `@docs/roadmap/phases/phase-5-managed-inference.md`:
- Line 221: In the phase-5-managed-inference.md file at line 221, locate the
phrase containing "the D15 loader is the 1.AH host-wiring half" and replace "D15
loader" with "D15 load-check" to align with the canonical terminology used in
deferred-tasks.md. The updated phrase should read "the D15 load-check is the
1.AH host-wiring half" to maintain consistency across documentation for the
validation step where validateWorkflowWithCatalog is invoked by the host loader
against the managed catalog.

In `@packages/llm/src/adapters/openai.ts`:
- Around line 1125-1224: Update the SORA_SECONDS constant to include the missing
supported durations by adding entries for 16 and 20 seconds alongside the
existing 4, 8, and 12 second mappings. Additionally, update the error message in
the soraSeconds function to reflect all five supported duration values instead
of just the three currently mentioned, so users receive accurate guidance about
valid Sora clip lengths.

In `@packages/llm/src/adapters/shared.ts`:
- Around line 86-88: The encodeMediaJobId function currently accepts empty
vendor IDs, which causes invalid encoded values that the decode function
intentionally rejects on lines 102-109, breaking the bijection contract. Add
validation at the beginning of the encodeMediaJobId function to throw an error
or return early if the vendorId is an empty string or falsy value, ensuring that
only valid vendor IDs that can be successfully decoded are accepted at encoding
time rather than deferring validation to the decode phase.

---

Nitpick comments:
In `@docs/roadmap/phases/phase-1-engine-and-llm.md`:
- Around line 1065-1069: The phrase "landed (unmerged)" at line 1066 is
confusing because it contradicts the earlier note that 1.AG was marked Done in
PR `#37`. Since the adapters (OpenAI-TTS, Gemini-Imagen, OpenAI/Sora, Gemini/Veo)
are actually merged via PR `#37` but not yet runtime-reachable, replace
"(unmerged)" with "(deferred host-wiring)" to clarify that the adapters were
shipped in the PR but their integration/host-wiring for runtime access is still
pending in 1.AH.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b78c3976-560a-4955-9164-6438d77fc7e4

📥 Commits

Reviewing files that changed from the base of the PR and between 713feec and c0e226d.

📒 Files selected for processing (21)
  • CLAUDE.md
  • README.md
  • docs/reference/shared-core/llm-provider-seam.md
  • docs/roadmap/current.md
  • docs/roadmap/deferred-tasks.md
  • docs/roadmap/phases/phase-1-engine-and-llm.md
  • docs/roadmap/phases/phase-2-cli.md
  • docs/roadmap/phases/phase-3-desktop.md
  • docs/roadmap/phases/phase-4-vscode.md
  • docs/roadmap/phases/phase-5-managed-inference.md
  • docs/roadmap/phases/phase-6-cloud-execution-portal.md
  • packages/llm/src/adapters/gemini.test.ts
  • packages/llm/src/adapters/gemini.ts
  • packages/llm/src/adapters/openai.test.ts
  • packages/llm/src/adapters/openai.ts
  • packages/llm/src/adapters/shared.test.ts
  • packages/llm/src/adapters/shared.ts
  • packages/llm/src/conformance/fixtures/gemini.ts
  • packages/llm/src/conformance/gemini.conformance.test.ts
  • packages/llm/src/conformance/generative-seam.conformance.test.ts
  • packages/llm/src/types.ts

Comment thread docs/roadmap/deferred-tasks.md
Comment thread docs/roadmap/phases/phase-5-managed-inference.md Outdated
Comment thread packages/llm/src/adapters/openai.ts
Comment thread packages/llm/src/adapters/shared.ts
cemililik and others added 2 commits June 21, 2026 17:50
Address the 2 valid findings from CodeRabbit's re-review of the pushed fixes:

- shared.ts encodeMediaJobId: reject an empty vendorId at mint (it would produce a
  prefix-only token decodeMediaJobId rejects — breaking the bijection). The adapter
  call sites already guard, so this is a fail-fast contract assertion. + a test.
- phase-5-managed-inference.md: "D15 loader" → "D15 load-check" for terminology
  consistency with deferred-tasks.md / phase-2 §2.S.

Skipped (with rationale, replied on the threads): the SORA_SECONDS "add 16/20"
finding (the pinned openai@6.42 types VideoSeconds as '4'|'8'|'12' — 16/20 would fail
tsc — and Sora is deprecated, e169c7a) and the deferred-tasks "OpenAI-TTS is 1.AG"
finding (a false positive — OpenAI-TTS is 1.AH A1; 1.AG wired OpenAI-image sync).

Toolchain 16/16 green; prettier clean; Leakwatch 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…fier (CodeRabbit nitpick)

"landed (unmerged)" is a transient PR-state qualifier that reads as confusing in a
roadmap doc (and goes stale once #38 merges). Rephrase to the accurate, non-transient
state — "is implemented behind the seam" — the sentence already notes it is not yet
runtime-reachable + needs the deferred host-wiring below, so nothing is over-claimed.
(CodeRabbit suggested "(deferred host-wiring)", but the adapters aren't the deferred
part — the host-wiring is.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cemililik cemililik merged commit 72abf54 into main Jun 21, 2026
5 of 6 checks passed
@sonarqubecloud

Copy link
Copy Markdown

cemililik added a commit that referenced this pull request Jun 21, 2026
PR #38 merged — the Phase-1-doable half of 1.AH (the four generative media-output
adapters OpenAI-TTS/Gemini-Imagen/OpenAI-Sora/Gemini-Veo behind the seam + the shared
opaque-jobId codec + the bare-MIME validator) is done. Reconcile the status docs
(roadmap-done-after-merge):

- current.md: 1.AH Phase-1-doable half ✅ Done (PR #38); Phase 1 complete; the
  host-wiring half is distributed across Phases 2–6 as recorded tasks. Replaced both
  "only 1.AH remains" status lines.
- CLAUDE.md: both status paragraphs — 1.AH ✅ Done; Phase 1 complete.
- README.md: the multimodal-sub-spine status line — 1.AH ✅ (PR #38), 1.m6 / Phase 1
  complete.
- phase-1-engine-and-llm.md: the §1.AH bullet (Phase-1-doable half ✅ Done), the 1.m6
  milestone row (1.AH ✅), and the dependency table row (adapters Done; host-wiring
  spans Phases 2–6).
- deferred-tasks.md: flip the Imagen/TTS/knobs (A1/A2/A6) and Sora/Veo (A3/A4) adapter
  entries to ✅ Done, with the N4 opacity obligation discharged; count>1, verified
  pricing rows, the host-wiring D-mechanisms, A5, and the Sora deprecation stay deferred.

A5 (save_to url double-fetch) remains deferred; OpenAI Sora 2 is vendor-deprecated
(2026-09-24, A3-arm only). prettier clean; Leakwatch 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cemililik added a commit that referenced this pull request Jun 21, 2026
… and milestones

- Mark Phase 1 as complete with all workstreams (1.A–1.AH) merged (PR #6#38).
- Achieve global milestones M1 (LLM seam proven) and M2 (engine end-to-end).
- Detail the completion of the critical path and additive sub-spines, including Lane C and multimodal sub-spine.
- Document the decision on multimodal I/O and its integration into the roadmap.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant