feat(sdk): native OpenRouter audio/video/image routing across Python, TS, Go by santoshkumarradha · Pull Request #579 · Agent-Field/agentfield

santoshkumarradha · 2026-05-23T01:42:12Z

Summary

Adds first-class support for OpenRouter's full media surface in all three SDKs (Python, TypeScript, Go) without changing the public API. The provider now fetches each model's metadata once (cached per instance) and routes:

Audio → either POST /audio/speech (TTS-only models like hexgrad/kokoro-82m) or POST /chat/completions with audio modality (gpt-audio family).
Image → POST /chat/completions with modalities=["image"] (works for both image-only models like x-ai/grok-imagine-image-quality and dual-output models like google/gemini-2.5-flash-image).
Video → POST /api/v1/videos async lifecycle (now reads the current unsigned_urls array and downloads with Bearer auth — the "unsigned" URLs are served from openrouter.ai itself and require the same auth as the API).
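
The routing rules above can be sketched as a small pure function. This is an illustration under assumptions, not the SDK's code: `pick_route` and its `kind` argument are hypothetical names, and the check for `"audio"` in `output_modalities` is an assumption inferred from the "Tested models" table in this description.

```python
from typing import Optional, Sequence


def pick_route(kind: str, output_modalities: Optional[Sequence[str]]) -> str:
    """Return the endpoint path a request of `kind` would be sent to.

    `kind` is "audio", "image", or "video" (hypothetical argument).
    `output_modalities` is the model's metadata, or None if it was unavailable.
    """
    if kind == "image":
        return "/chat/completions"  # sent with modalities=["image"]
    if kind == "video":
        return "/videos"  # async job: submit, poll, then download
    if kind == "audio":
        # Assumption: chat-style audio models advertise "audio" as an output.
        if output_modalities is not None and "audio" in output_modalities:
            return "/chat/completions"
        # TTS-only models, and the fallback when metadata is unavailable.
        return "/audio/speech"
    raise ValueError(f"unknown media kind: {kind!r}")
```

Keeping the decision in one function like this makes the fallback explicit: a failed metadata lookup degrades to `/audio/speech` instead of raising.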

DX is unchanged — same generate_audio / generate_video / generate_image signatures, same defaults. Adds optional extra / image_url(s) / speed / frame_type parameters that simply pass through.

Why

Kokoro and every other TTS-only model on OpenRouter live on /audio/speech. The old code only knew the chat-completions audio modality, so those models 404'd with "No endpoints found that support the requested output modalities: text, audio".
x-ai/grok-imagine-image-quality is image-only output and rejects modalities=["image","text"]. We now send ["image"], verified to work for both image-only models and dual-output models.
Video download used the wrong field (unsigned_url singular) and fetched without auth, so google/veo-3.1-lite returned 401 even though the job completed successfully.
TS bug: frameImages / inputReferences were being passed as camelCase to the API, which expects snake_case (frame_images, input_references, frame_type).
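
The key-translation fix in the last item can be illustrated with a recursive rewrite of dictionary keys. The fix itself lives in the TypeScript SDK; this Python sketch only shows the idea, and `to_snake_case` and `snake_case_keys` are hypothetical helper names.

```python
import re
from typing import Any

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier to snake_case."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def snake_case_keys(value: Any) -> Any:
    """Rewrite dict keys to snake_case at every nesting level; values are untouched."""
    if isinstance(value, dict):
        return {to_snake_case(k): snake_case_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_case_keys(v) for v in value]
    return value
```

Only keys are rewritten, so string values such as `"first_frame"` pass through unchanged.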

What changes

Python (`sdk/python/agentfield/`)

media_providers.py:
- per-instance metadata cache (_model_meta_cache) + _fetch_model_meta() helper that hits /api/v1/models/{id}/endpoints lazily.
- new _openrouter_audio_speech() path. When caller asks for format="wav" we request pcm from upstream and wrap in a WAV header client-side so it stays playable.
- generate_audio routes by output_modalities and falls back to /audio/speech (the more broadly compatible endpoint) when metadata is unavailable.
- generate_video: downloads openrouter.ai URLs with Bearer auth and CDN URLs anonymously.
- new params: image_urls, speed, extra (passthrough merged into request body).
multimodal_response.py: ImageOutput.save() / get_bytes() now handle data:image/...;base64,... URLs.
vision.py: modalities=["image"] and multi-part user message when image_urls are passed.
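
The client-side WAV wrapping mentioned above can be sketched with Python's standard `wave` module. This is a minimal illustration, not the `_openrouter_audio_speech()` implementation: `wrap_pcm16_as_wav` is a hypothetical name, and the 24 kHz mono defaults are an assumption taken from the smoke-test output reported in the test plan.

```python
import io
import wave


def wrap_pcm16_as_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Prepend a RIFF/WAVE header to raw 16-bit PCM so players can open it."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # The wave module does not close a file object it did not open,
    # so the buffer is still readable here.
    return buf.getvalue()
```

Raw PCM carries no sample rate or channel count, which is why the header has to be supplied by the caller rather than derived from the bytes.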

TypeScript (`sdk/typescript/src/ai/`)

OpenRouterMediaProvider.ts:
- module-level WeakMap-backed metadata cache + fetchModelMeta helper.
- seedModelMeta(model, outputModalities, inputModalities) — public test helper to pre-populate the cache (used by tests against mock servers).
- new /audio/speech code path with wrapPcm16AsWav helper (RIFF header generation).
- chat-completions SSE path now decodes audio chunks, concatenates raw bytes, re-encodes, and wraps to WAV when requested.
- video download: detects openrouter.ai host and attaches Bearer header.
MediaProvider.ts: new typed VideoFrameImage (with frameType: "first_frame" | "last_frame"), VideoInputReference. Added imageUrl, extra to VideoRequest; imageUrls, extra, expanded imageConfig (strength, style, rgbColors, backgroundRgbColor, fontInputs) to ImageRequest; speed, extra to AudioRequest.
camelCase → snake_case translation for nested objects so OpenRouter actually receives the right field names.
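
The "decode, then concatenate" step in the SSE path is worth spelling out, because each base64 chunk may carry its own padding and joining the encoded strings first is not reliable. A minimal Python sketch of the idea follows; `join_audio_chunks` is a hypothetical name, and how the chunks are pulled out of the SSE events is deliberately left out.

```python
import base64
from typing import Iterable


def join_audio_chunks(b64_chunks: Iterable[str]) -> bytes:
    """Decode each base64 chunk on its own, then concatenate the raw bytes."""
    return b"".join(base64.b64decode(chunk) for chunk in b64_chunks)
```

The resulting bytes can then be re-encoded once, or wrapped in a WAV header, as a single payload.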

Go (`sdk/go/ai/`)

openrouter_media.go:
- mutex-protected metaCache + fetchModelMeta helper.
- SeedModelMeta(model, outputModalities, inputModalities) exported test helper.
- new generateAudioViaSpeechEndpoint method + wrapPCM16AsWAV helper.
- videoJobStatus.UnsignedURLs []string (plural) + Usage.Cost parsing.
- video download with Bearer when host is openrouter.ai.
media_provider.go: added ImageURL to VideoRequest; ImageURLs, Extra to ImageRequest; full ImageConfig expansion; Speed, Extra to AudioRequest; new typed FontInput.
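
The host check used for video downloads can be sketched as follows, in Python for consistency with the example in "DX preserved". `download_headers` is a hypothetical name, and treating subdomains of openrouter.ai as authenticated hosts is an assumption; the description only says the host is openrouter.ai.

```python
from typing import Dict
from urllib.parse import urlparse


def download_headers(url: str, api_key: str) -> Dict[str, str]:
    """Attach the API key only when the download URL is served by openrouter.ai."""
    host = (urlparse(url).hostname or "").lower()
    # Assumption: subdomains are treated the same as the apex domain.
    if host == "openrouter.ai" or host.endswith(".openrouter.ai"):
        return {"Authorization": f"Bearer {api_key}"}
    return {}  # CDN and other hosts are fetched anonymously
```

Comparing the parsed hostname, rather than searching the URL for a substring, keeps the key from being sent to a look-alike host such as `openrouter.ai.evil.example`.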

DX preserved

Same call shapes work; nothing changed for existing callers:

await app.ai_generate_audio(text="...", model="openrouter/hexgrad/kokoro-82m", voice="af_bella", format="wav")
await app.ai_generate_image(prompt="...", model="openrouter/x-ai/grok-imagine-image-quality")
await app.ai_generate_video(
    prompt="...",
    model="openrouter/google/veo-3.1-lite",
    frame_images=[
        {"type":"image_url","image_url":{"url":first},"frame_type":"first_frame"},
        {"type":"image_url","image_url":{"url":last}, "frame_type":"last_frame"},
    ],
)

The routing is metadata-driven, so every OpenRouter model in each category works automatically. No allowlist — new TTS / video / image models added by OpenRouter work without an SDK change.
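
A per-instance, lazily filled metadata cache of the kind described here might look like the sketch below. `ModelMetaCache`, `Fetcher`, and the method names are hypothetical; caching a failed lookup as `None` is an assumption about behaviour, not something this description states. The `seed` method mirrors the role of the `seedModelMeta` / `SeedModelMeta` test helpers.

```python
from typing import Callable, Dict, Optional, Sequence

# A fetcher takes a model id and returns its output modalities (hypothetical type).
Fetcher = Callable[[str], Optional[Sequence[str]]]


class ModelMetaCache:
    """Looks up each model's output modalities at most once per instance."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Optional[Sequence[str]]] = {}

    def output_modalities(self, model: str) -> Optional[Sequence[str]]:
        if model not in self._cache:
            try:
                self._cache[model] = self._fetch(model)
            except Exception:
                # Assumption: a failed lookup is remembered as "unknown".
                self._cache[model] = None
        return self._cache[model]

    def seed(self, model: str, output_modalities: Sequence[str]) -> None:
        """Pre-populate the cache, for example in tests against a mock server."""
        self._cache[model] = output_modalities
```

Because the cache lives on the instance, two providers with different API keys or base URLs never share entries.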

Test plan

Python: 107 media tests pass (pytest tests/test_openrouter_audio.py tests/test_openrouter_video.py tests/test_media_providers.py tests/test_media_providers_additional.py tests/test_media_integration.py tests/test_vision.py tests/test_image_config.py)
TypeScript: 596 tests pass (npm test)
Go: 228 tests pass (go test ./ai/...)
End-to-end smoke tests against real OpenRouter:
- audio: openrouter/hexgrad/kokoro-82m → 31s WAV (RIFF/WAVE PCM 16-bit mono 24kHz)
- image: openrouter/x-ai/grok-imagine-image-quality → 896×1280 JPEG
- video: openrouter/google/veo-3.1-lite → 4s 1280×720 MP4 (1MB)
- image-to-video: same model with frame_images=[first_frame, last_frame] from grok-imagine outputs → 4s 720×1280 MP4 (2.6MB)
CI green (the gate is what this PR has to clear)

Tested models (examples — not an allowlist)

Modality	Endpoint	Example models that route here
Image	`/chat/completions` w/ `modalities=["image"]`	`x-ai/grok-imagine-image-quality`, `google/gemini-2.5-flash-image`, `openai/gpt-image-1`, anything else with `image` in `output_modalities`
Audio TTS	`/audio/speech`	`hexgrad/kokoro-82m`, `openai/gpt-4o-mini-tts`, anything whose `output_modalities` is `["speech"]`
Audio chat	`/chat/completions` w/ `modalities=["text","audio"]` SSE	`openai/gpt-audio`, `openai/gpt-audio-mini`, `openai/gpt-4o-audio-preview`, `google/lyria-3-pro` (music)
Video	`/videos` async polling	`google/veo-3.1-lite`, `google/veo-3.1`, `kling-video/*`, anything with `video` in `output_modalities`

Website docs follow-up to Agent-Field/website2.0 once this merges.

… TypeScript, Go

Adds first-class support for OpenRouter's full media surface in all three SDKs without changing the public API. The provider now fetches model metadata once (cached) and routes audio to either `POST /audio/speech` (TTS-only models like hexgrad/kokoro-82m) or `POST /chat/completions` with audio modality (gpt-audio family). Image generation drops `"text"` from `modalities` so image-only models like x-ai/grok-imagine-image-quality stop 404-ing. Video properly reads the current `unsigned_urls` array shape and downloads with Bearer auth (the "unsigned" URLs are served from openrouter.ai itself). DX is unchanged: same `generate_audio/video/image` signatures, same defaults.

Why
- Kokoro and other TTS-only models live on `/audio/speech`; the old code only knew chat-completions audio modality so they 404'd.
- `x-ai/grok-imagine-image-quality` is image-only output and rejects `modalities=["image","text"]`; we now send `["image"]` which works for both image-only and dual-output models (verified vs. gemini-2.5-flash-image).
- Video download was using the wrong field (`unsigned_url` singular) and fetched without auth, so veo-3.1-lite returned 401.

What
- Python (`sdk/python/agentfield/media_providers.py`, `multimodal_response.py`, `vision.py`):
  + per-instance metadata cache + `_fetch_model_meta`
  + new `_openrouter_audio_speech` path with client-side WAV wrapping
  + `image_urls` reference-image support, `speed`, `extra` passthrough
  + ImageOutput now handles `data:` URLs
- TypeScript (`sdk/typescript/src/ai/{MediaProvider,OpenRouterMediaProvider}.ts`):
  + same routing + WAV wrapping
  + `seedModelMeta` test helper
  + fixed camelCase→snake_case for `frame_images` / `input_references`
  + new `VideoRequest.imageUrl`, `ImageRequest.imageUrls`, `extra` passthrough, expanded `ImageConfig` (strength/style/rgb_colors/...)
- Go (`sdk/go/ai/{media_provider,openrouter_media}.go`):
  + same metadata-driven routing
  + `SeedModelMeta` test helper
  + reads `unsigned_urls` plural
  + downloads with Bearer when host is openrouter.ai
  + new `VideoRequest.ImageURL`, `ImageRequest.ImageURLs`, `Speed`, `Extra` fields; full `ImageConfig` expansion

Tested
- Smoke-tested end-to-end against openrouter/hexgrad/kokoro-82m (audio), openrouter/x-ai/grok-imagine-image-quality (image), and openrouter/google/veo-3.1-lite (video), including image-to-video with first_frame / last_frame guidance. Outputs saved + verified as RIFF/WAVE, JPEG, and MP4.
- Python: 107 media tests pass.
- TypeScript: 596 tests pass.
- Go: 228 tests pass.

github-actions · 2026-05-23T01:43:53Z

Performance

SDK	Memory	Δ	Latency	Δ	Tests	Status
Python	9.4 KB	+4%	0.32 µs	-9%	✓	✓
Go	165 B	-41%	0.63 µs	-37%	✓	✓
TS	405 B	+16%	1.55 µs	-22%	✓	⚠

⚠ Regression detected:

TypeScript memory: 350 B → 405 B (+16%)

…essage

The refactor of ImageOutput.save() to delegate to get_bytes() dropped the 'to save' suffix that test_output_objects_raise_for_missing_data asserts on. Restore the upfront check so save() raises 'No image data or URL available to save' while get_bytes() still raises 'No image data or URL available'.

github-actions · 2026-05-23T01:54:04Z

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 86%, aggregate ≥ 88%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

Surface	Current	Baseline	Δ	Status
`control-plane`	87.50%	87.30%	↑ +0.20 pp	🟡
`sdk-go`	91.80%	90.70%	↑ +1.10 pp	🟢
`sdk-python`	93.73%	93.63%	↑ +0.10 pp	🟢
`sdk-typescript`	92.80%	92.56%	↑ +0.24 pp	🟢
`web-ui`	89.91%	90.01%	↓ -0.10 pp	🟡
aggregate	89.02%	89.01%	↑ +0.01 pp	🟡

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.
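
As a worked check of the thresholds against the table above, the sketch below re-derives the pass verdict. It is not the gate's actual script, which is not shown here; `gate_passes` is a hypothetical name and the logic is only a literal reading of the four thresholds quoted from .coverage-gate.toml.

```python
from typing import Dict, Tuple

# (current %, baseline %) per surface, copied from the table above.
SURFACES: Dict[str, Tuple[float, float]] = {
    "control-plane": (87.50, 87.30),
    "sdk-go": (91.80, 90.70),
    "sdk-python": (93.73, 93.63),
    "sdk-typescript": (92.80, 92.56),
    "web-ui": (89.91, 90.01),
}
AGGREGATE: Tuple[float, float] = (89.02, 89.01)


def gate_passes(
    surfaces: Dict[str, Tuple[float, float]],
    aggregate: Tuple[float, float],
    min_surface: float = 86.0,
    min_aggregate: float = 88.0,
    max_surface_drop: float = 1.0,
    max_aggregate_drop: float = 0.50,
) -> bool:
    """Apply the floor and the maximum-regression rule to every surface and the aggregate."""
    for current, baseline in surfaces.values():
        if current < min_surface or baseline - current > max_surface_drop:
            return False
    current, baseline = aggregate
    return current >= min_aggregate and baseline - current <= max_aggregate_drop
```

With these numbers the only regression is web-ui at 0.10 pp, well inside the 1.0 pp allowance, and every surface sits above its 86% floor.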

github-actions · 2026-05-23T01:54:05Z

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

Surface	Touched lines	Patch coverage	Status
`control-plane`	0	—	➖ no changes
`sdk-go`	239	90.00%	✅
`sdk-python`	0	—	➖ no changes
`sdk-typescript`	218	96.00%	✅
`web-ui`	0	—	➖ no changes

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

… to read message.images

- Adds Go test file (openrouter_media_routing_test.go) covering fetchModelMeta cache + error paths, /audio/speech success/error, frame_images + input_references + extra translation, and wrapPCM16AsWAV header correctness. Lifts Go patch coverage from 64% to >87% on touched lines.
- Adds TS test file (openrouter_media_routing.test.ts) covering the metadata cache (success / 500 / network exception), generateImage multi-part content + imageConfig snake_case translation, video param translation (imageUrl, frameImages, inputReferences, extra), and /audio/speech speed + extra passthrough. Lifts TS coverage to 94.5% lines / 80.1% branches on OpenRouterMediaProvider.ts.
- Fixes a real bug uncovered while writing tests: the TS image-response parser only read message.content[] (gpt-image-1 style) and dropped images that OpenRouter returns in the dedicated message.images[] array (gemini-*-image, grok-imagine, when content is null). Now parses both shapes.
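
The two response shapes named in the bug fix can be handled by one extractor. The fix itself is in the TypeScript SDK; this Python sketch only illustrates the idea. `extract_image_urls` is a hypothetical name, and the item layout `{"type": "image_url", "image_url": {"url": ...}}` is an assumption borrowed from the request format shown in the PR description.

```python
from typing import Any, Dict, List


def _url_of(item: Any) -> str:
    """Return item["image_url"]["url"] if present, else an empty string."""
    if not isinstance(item, dict):
        return ""
    return (item.get("image_url") or {}).get("url") or ""


def extract_image_urls(message: Dict[str, Any]) -> List[str]:
    """Collect image URLs from message.images[] and from message.content[] parts."""
    urls: List[str] = []
    # Shape 1: dedicated images array; content may be null in this case.
    for item in message.get("images") or []:
        url = _url_of(item)
        if url:
            urls.append(url)
    # Shape 2: image parts mixed into a content array.
    content = message.get("content")
    if isinstance(content, list):
        for part in content:
            if isinstance(part, dict) and part.get("type") == "image_url":
                url = _url_of(part)
                if url:
                    urls.append(url)
    return urls
```

Reading both locations on every response avoids having to know in advance which shape a given model returns.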

santoshkumarradha requested review from a team and AbirAbbas as code owners May 23, 2026 01:42

santoshkumarradha merged commit 2dcc803 into main May 23, 2026
33 checks passed

santoshkumarradha deleted the feat/openrouter-native-media branch May 23, 2026 02:12

santoshkumarradha merged 3 commits into main from feat/openrouter-native-media

1 participant
