
feat: Multi-Provider Media Generation (Video, Audio, Image) #475

Open
santoshkumarradha wants to merge 14 commits into main from dev/add-video


santoshkumarradha (Member) commented Apr 18, 2026

Summary

Epic PR for the Media Generation Milestone — adds multi-provider media generation (video, audio, image, music) across Python, TypeScript, and Go SDKs via OpenRouter and other providers.

Issues Closed

Closes #463, closes #464, closes #465, closes #466, closes #467, closes #468, closes #469, closes #470

What's Included

Wave 1 — Foundation (Merged ✅):

Wave 2 — Python Generation (Merged ✅):

Wave 3 — Cross-SDK (Merged ✅):

Wave 4 — Quality (Complete ✅):

Code Review Fixes Applied

All implementation PRs received deep code reviews (54 findings total). Key fixes:

  • SSRF protection: Job ID regex validation, URL scheme enforcement, private IP blocking
  • Timeout enforcement: aiohttp.ClientTimeout, AbortSignal.timeout(), context.WithTimeout
  • Memory safety: Max size limits (500MB audio, 500MB video), io.LimitReader, bounded accumulation
  • Retry logic: Transient error tolerance during video poll loops (3 retries)
  • API key security: WeakMap storage (TS), constructor validation (Go), toJSON exclusion
  • Code quality: SSE helper deduplication, cached provider instances, typed errors
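
The SSRF item above names three checks: job ID validation, URL scheme enforcement, and private IP blocking. A minimal Python sketch of how they could fit together follows. The regex, the function names, and the exact set of blocked address classes are assumptions made for illustration and are not taken from the PR's code.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hypothetical pattern: the PR only states that job IDs are regex-validated.
JOB_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,128}")


def validate_job_id(job_id: str) -> str:
    """Reject IDs that could alter the request path, such as '../admin'."""
    if not JOB_ID_RE.fullmatch(job_id):
        raise ValueError(f"invalid job id: {job_id!r}")
    return job_id


def assert_safe_url(url: str) -> str:
    """Allow only HTTPS URLs whose host is not local or a private IP literal."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("only https URLs are allowed")
    host = parsed.hostname
    if not host or host == "localhost" or host.endswith(".localhost"):
        raise ValueError("missing or local host")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Host is a DNS name. This sketch does not check what it resolves to.
        return url
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
        raise ValueError(f"blocked address: {ip}")
    return url
```

The sketch only inspects IP literals. A hostname that resolves to a private address would pass, so a stricter variant would also check the resolved addresses.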

Architecture

Agent Code
    ↓
MediaRouter.resolve(model, capability)
    ↓ (longest-prefix match)
┌─────────────┬──────────────┬───────────────┐
│ FalProvider  │ OpenRouter   │ LiteLLM       │
│ "fal-ai/"   │ "openrouter/"│ "" (catch-all) │
└─────────────┴──────────────┴───────────────┘

Provider capabilities:

  • OpenRouter: Image ✅ Gemini Flash, Video ✅ Async poll, Audio ✅ SSE stream, Music ✅ Lyria 3 Pro
  • Fal.ai: ✅ Direct API
  • LiteLLM: Image ✅ DALL-E/etc, Audio ✅ TTS
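
The longest-prefix dispatch in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in media_router.py: the `register` method, the string provider names, the capability sets (the Fal.ai set in particular is a placeholder), and the fall-through on a capability mismatch are all hypothetical. Only `resolve(model, capability)` and the three prefixes come from the PR description.

```python
from typing import FrozenSet, Iterable, List, Tuple


class MediaRouter:
    """Illustrative router: picks the provider with the longest matching prefix."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, str, FrozenSet[str]]] = []

    def register(self, prefix: str, provider: str, capabilities: Iterable[str]) -> None:
        self._routes.append((prefix, provider, frozenset(capabilities)))
        # Longest prefixes first, so the "" catch-all is always tried last.
        self._routes.sort(key=lambda route: len(route[0]), reverse=True)

    def resolve(self, model: str, capability: str) -> str:
        for prefix, provider, capabilities in self._routes:
            # Assumption: a capability mismatch falls through to the next prefix.
            if model.startswith(prefix) and capability in capabilities:
                return provider
        raise LookupError(f"no provider for {model!r} with capability {capability!r}")


router = MediaRouter()
router.register("", "litellm", {"image", "audio"})
router.register("openrouter/", "openrouter", {"image", "video", "audio", "music"})
router.register("fal-ai/", "fal", {"video"})  # placeholder capability set

print(router.resolve("openrouter/example-video-model", "video"))
print(router.resolve("dall-e-3", "image"))
```

Sorting on registration keeps `resolve` a simple linear scan, which is adequate for a handful of providers.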

CI Status

  • ✅ Python SDK CI — passing (3.10, 3.11, 3.12)
  • ✅ Go SDK CI — passing
  • ✅ TypeScript SDK CI — passing
  • ✅ Functional Tests — passing
  • ✅ Performance Check — passing
  • 🔄 Live API verification — in progress (real OpenRouter calls)

Test Plan

  • cd sdk/python && pytest — all tests pass (incl. 33 new integration tests)
  • cd sdk/python && ruff check . && ruff format --check . — lint clean
  • cd sdk/typescript && bun run build && bun test — 536 tests pass
  • cd sdk/go && go test ./... && go vet ./... — 1008+ tests pass
  • Security review: SSRF, timeouts, memory limits, API key handling
  • Cross-SDK integration tests: routing, lifecycle, error propagation
  • Live API verification with OPENROUTER_API_KEY (in progress)

Documentation

  • Agent-Field/website2.0#13 — Documentation update issue created

…ation (#466) (#472)

Squash merge: image_config support for OpenRouter (#466)
…lResponse (#469) (#473)

Squash merge: VideoOutput type and video support (#469)
…463) (#474)

- New MediaRouter class in media_router.py with longest-prefix-first matching
- Lazy _media_router property in AgentAI with fal/openrouter/litellm providers
- Refactored ai_with_vision(), ai_with_audio(), ai_generate_video() to use router
- Updated tests for new routing pattern

github-actions bot (Contributor) commented Apr 18, 2026

Performance

| SDK | Memory | Δ | Latency | Δ | Tests | Status |
|---|---|---|---|---|---|---|
| Python | 7.9 KB | -13% | 0.33 µs | -6% | | |
| Go | 223 B | -20% | 0.60 µs | -40% | | |
| TS | 458 B | +31% | 1.91 µs | -5% | | |

Regression detected:

  • TypeScript memory: 350 B → 458 B (+31%)

github-actions bot (Contributor) commented Apr 18, 2026

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 86%, aggregate ≥ 88%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

| Surface | Current | Baseline | Δ |
|---|---|---|---|
| control-plane | 87.20% | 87.30% | ↓ -0.10 pp 🟡 |
| sdk-go | 91.00% | 90.70% | ↑ +0.30 pp 🟢 |
| sdk-python | 93.63% | 93.63% | ↑ +0.00 pp 🟢 |
| sdk-typescript | 92.72% | 92.56% | ↑ +0.16 pp 🟢 |
| web-ui | 90.02% | 90.01% | ↑ +0.01 pp 🟢 |
| aggregate | 88.98% | 89.01% | ↓ -0.03 pp 🟡 |

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.
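
To show why the numbers above pass, here is a small Python sketch that applies the four stated thresholds to the reported figures. The function name and structure are hypothetical and the real gate script may evaluate things differently. Only the thresholds and the coverage values come from this comment.

```python
from typing import Dict, List, Tuple

MIN_SURFACE = 86.0            # per-surface floor, in percent
MIN_AGGREGATE = 88.0          # aggregate floor, in percent
MAX_SURFACE_DROP_PP = 1.0     # allowed per-surface regression, in percentage points
MAX_AGGREGATE_DROP_PP = 0.50  # allowed aggregate regression, in percentage points


def gate_failures(
    surfaces: Dict[str, Tuple[float, float]],
    aggregate: Tuple[float, float],
) -> List[str]:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    for name, (current, baseline) in surfaces.items():
        if current < MIN_SURFACE:
            failures.append(f"{name}: {current:.2f}% is below the floor")
        if baseline - current > MAX_SURFACE_DROP_PP:
            failures.append(f"{name}: regressed {baseline - current:.2f} pp")
    agg_current, agg_baseline = aggregate
    if agg_current < MIN_AGGREGATE:
        failures.append(f"aggregate: {agg_current:.2f}% is below the floor")
    if agg_baseline - agg_current > MAX_AGGREGATE_DROP_PP:
        failures.append(f"aggregate: regressed {agg_baseline - agg_current:.2f} pp")
    return failures


# (current, baseline) pairs copied from the table in this comment.
SURFACES = {
    "control-plane": (87.20, 87.30),
    "sdk-go": (91.00, 90.70),
    "sdk-python": (93.63, 93.63),
    "sdk-typescript": (92.72, 92.56),
    "web-ui": (90.02, 90.01),
}
AGGREGATE = (88.98, 89.01)

print(gate_failures(SURFACES, AGGREGATE) or "Gate passed")
```

The control-plane drop of 0.10 pp and the aggregate drop of 0.03 pp are both inside their limits, which is why those rows show 🟡 rather than a failure.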

github-actions bot (Contributor) commented Apr 18, 2026

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

| Surface | Touched lines | Patch coverage | Status |
|---|---|---|---|
| control-plane | 0 | ➖ | no changes |
| sdk-go | 419 | 90.00% | |
| sdk-python | 0 | ➖ | no changes |
| sdk-typescript | 383 | 93.00% | |
| web-ui | 0 | ➖ | no changes |

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

santoshkumarradha and others added 8 commits April 18, 2026 06:11
…tion (#468)

* feat(go-sdk): add MediaProvider interface and OpenRouter media generation (#468)

Adds MediaProvider interface, MediaRouter for model-prefix-based dispatch,
and OpenRouterMediaProvider supporting image, audio, and video generation.

* fix(02): CR-01 validate job ID to prevent SSRF via path traversal

* fix(02): CR-02+WR-02 use context.WithTimeout for poll loop, add transient error retry

* fix(02): CR-03 increase SSE scanner buffer to 1MB for large audio chunks

* fix(02): WR-01 cap io.ReadAll with 10MB LimitReader on all HTTP responses

* fix(02): WR-03 validate API key non-empty, return error from constructor

* fix(02): WR-05+WR-06 validate non-empty prompt/text before API calls

* fix(02): WR-07 return error on base64 decode failure instead of silent skip

* fix(02): IN-05 set VideoData.Filename to generated_video.mp4

* fix(02): WR-08 add full video poll lifecycle test and input validation tests
)

Implement SSE streaming audio via OpenRouter chat completions API and
add music generation capability to the MediaProvider ABC and
OpenRouterProvider.
…a generation (#467)

Ports MediaProvider abstraction to TS SDK with VideoRequest/ImageRequest/AudioRequest
types, MediaRouter prefix-based dispatch, and OpenRouterMediaProvider supporting
video (async job polling), image, and audio (SSE stream) generation.
…464)

* feat(python-sdk): add OpenRouter video generation via async polling (#464)

* fix(python-sdk): address code review findings for OpenRouter video (#464)

CR-01: Add image_url to request body (was silently dropped)
CR-02: Validate job_id format + enforce HTTPS-only video download URL
HI-01: Add MAX_VIDEO_BYTES (500MB) size limit on video downloads
HI-02: Add comment clarifying download uses no auth headers
HI-03: Add transient poll error retry (max 3 consecutive 502/503/504)
MD-01: Fix duration type to Optional[float], remove int() cast in agent_ai
MD-03: Move poll sleep to end of loop (poll immediately on first iteration)
LO-01: Truncate error response bodies to 500 chars
LO-02: Move _error_messages to class constant _VIDEO_ERROR_MESSAGES
IN-02: Add test for image_url passthrough in request body
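
The poll-loop fixes above (HI-03, MD-03, LO-01) can be illustrated with a short Python sketch. It is not the SDK's implementation: `fetch_status` is a hypothetical injected coroutine that returns `(http_status, json_body)`, the `"completed"` and `"failed"` state names are placeholders, and reading "max 3 consecutive" as "fail on the fourth" is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

TRANSIENT_STATUSES = {502, 503, 504}
MAX_TRANSIENT_RETRIES = 3  # consecutive transient errors tolerated


async def poll_video_job(
    fetch_status: Callable[[], Awaitable[Tuple[int, dict]]],
    interval: float = 5.0,
    timeout: float = 600.0,
) -> dict:
    """Poll until the job finishes, tolerating short runs of 502/503/504."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    transient = 0
    while True:
        # Poll first and sleep at the end, so the first check is immediate.
        status, body = await fetch_status()
        if status in TRANSIENT_STATUSES:
            transient += 1
            if transient > MAX_TRANSIENT_RETRIES:
                raise RuntimeError(f"gave up after {transient} consecutive HTTP {status}")
        elif status != 200:
            # Truncate the body so a large error page cannot flood the logs.
            raise RuntimeError(f"poll failed with HTTP {status}: {str(body)[:500]}")
        else:
            transient = 0  # any successful poll resets the counter
            state = body.get("status")  # placeholder field and state names
            if state == "completed":
                return body
            if state == "failed":
                raise RuntimeError(f"video job failed: {str(body)[:500]}")
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise TimeoutError("video job did not finish before the deadline")
        await asyncio.sleep(min(interval, remaining))
```

Injecting `fetch_status` keeps the loop independent of the HTTP client, which also makes the retry path easy to exercise in tests with a scripted sequence of responses.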
Apply fixes from REVIEW-465.md:
- CR-01: Add aiohttp.ClientTimeout(total=300s) to SSE streaming
- CR-02: Add MAX_AUDIO_B64_BYTES (500MB) size guard
- HI-01: Extract _stream_openrouter_audio() shared helper (dedup ~90 lines)
- HI-02: Cache _openrouter_provider as lazy property (like _fal_provider)
- HI-03: Rename format -> audio_format internally to avoid builtin shadow
- ME-02: Use resp.content.readline() for proper SSE line parsing
- ME-03: Truncate error response body to 500 chars
- ME-04: Validate duration > 0 and <= 600
- LO-02: Replace deprecated get_event_loop with @pytest.mark.asyncio
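
As an illustration of CR-02 and ME-04 above, the following Python sketch bounds the accumulated base64 audio and validates the duration range. The helper names and the error class are hypothetical. Treating 500MB as 500 × 1024 × 1024 bytes, and treating the chunks as consecutive slices of a single base64 stream, are both assumptions.

```python
import base64
from typing import Iterable

# Assumption: "500MB" is interpreted here as 500 MiB of base64 text.
MAX_AUDIO_B64_BYTES = 500 * 1024 * 1024


class AudioTooLargeError(Exception):
    """Raised when the streamed audio exceeds the configured size guard."""


def validate_duration(duration: float) -> float:
    """Accept durations in the range (0, 600] seconds, per ME-04."""
    if not 0 < duration <= 600:
        raise ValueError(f"duration must be > 0 and <= 600, got {duration}")
    return duration


def accumulate_audio(chunks: Iterable[str], max_b64_bytes: int = MAX_AUDIO_B64_BYTES) -> bytes:
    """Join base64 chunks, failing as soon as the running total passes the limit."""
    parts = []
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > max_b64_bytes:
            raise AudioTooLargeError(f"audio exceeded {max_b64_bytes} base64 bytes")
        parts.append(chunk)
    # Decode once at the end; the chunks are assumed to form one base64 stream.
    return base64.b64decode("".join(parts))
```

Checking the total before appending means an oversized stream is rejected without holding more than the limit in memory.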
Apply fixes from REVIEW-ts-sdk-media.md:
- CR-01: Add AbortSignal.timeout() to all fetch calls (30s API, 120s download)
- CR-02: SSRF validation — assertSafeUrl() blocks non-HTTPS, localhost, private IPs
- CR-03: API key stored in WeakMap, toJSON() excludes key
- WR-01: Poll loop checks deadline after sleep, uses Math.min for sleep duration
- WR-02: Process remaining SSE buffer after stream ends
- WR-04: Track parse errors, throw MediaProviderError after 50 consecutive
- WR-05: Include model + endpoint in all error messages
- WR-06: MediaProviderError typed error class for programmatic handling
- Python: 33 tests — MediaRouter routing, OpenRouter video/audio/music
  lifecycle, AgentAI dispatch, MultimodalResponse consistency, error
  propagation, provider caching
- TypeScript: 28 tests — MediaRouter, OpenRouter video/image/audio,
  SSRF protection (8 cases), MediaProviderError typing
- Go: 25 tests — MediaRouter, OpenRouter video lifecycle with httptest,
  audio SSE, input validation, context cancellation
Keep dev/add-video version which includes ai_generate_music delegate
and all media generation methods added during the milestone.
@santoshkumarradha santoshkumarradha marked this pull request as ready for review April 18, 2026 11:05
@santoshkumarradha santoshkumarradha requested review from a team and AbirAbbas as code owners April 18, 2026 11:05
- Fix flaky harness test: DurationMS can be 0ms for near-instant stubs
  in CI; use GreaterOrEqual(0) instead of Positive assertion
- Go SDK: fix image response parsing for models returning content as
  string or null, handle Gemini-style message.images[], default audio
  format to pcm16
- Python SDK: replace readline-based SSE parsing with manual chunked
  parsing to handle >64KB base64 audio lines from music models
The live verification agent changed _stream_openrouter_audio() from
readline() to iter_any() for handling large SSE lines. Update test
fakes (_FakeContent and integration test mocks) to implement iter_any()
as async generators instead of readline().

Fixes 12 test failures in CI: test_openrouter_audio.py and
test_media_integration.py.
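
The switch from readline() to iter_any() described above can be sketched as a manual line buffer over arbitrary byte chunks, paired with a test fake that exposes iter_any() as an async generator. The names `FakeContent`, `iter_sse_data`, and `collect` are hypothetical and the SDK's `_stream_openrouter_audio()` and `_FakeContent` may differ. The only behaviour assumed of aiohttp is that `StreamReader.iter_any()` yields bytes chunks as they arrive.

```python
import asyncio
from typing import AsyncIterator, List


class FakeContent:
    """Test double that mimics a response body exposing iter_any()."""

    def __init__(self, chunks: List[bytes]) -> None:
        self._chunks = chunks

    async def iter_any(self) -> AsyncIterator[bytes]:
        for chunk in self._chunks:
            yield chunk


async def iter_sse_data(content) -> AsyncIterator[str]:
    """Yield the payload of each 'data:' line, regardless of chunk boundaries."""
    buffer = b""
    async for chunk in content.iter_any():
        buffer += chunk
        # A line is only complete once its newline has arrived.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            text = line.decode("utf-8").strip()
            if text.startswith("data:"):
                yield text[5:].strip()
    # Handle a final line that ended without a trailing newline.
    tail = buffer.decode("utf-8").strip()
    if tail.startswith("data:"):
        yield tail[5:].strip()


async def collect(content) -> List[str]:
    """Gather every payload into a list (convenience for tests)."""
    return [payload async for payload in iter_sse_data(content)]
```

Because the buffer grows until a newline arrives, a single SSE line far larger than 64KB is reassembled correctly. A production version would also cap the buffer size.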
Add coverage tests for branches not exercised by the existing media
integration suite: optional video payload fields, submit/poll error
paths, image config+inline-base64 fallback, Gemini-style images[],
audio default voice, HTTP error, invalid SSE/base64 chunks, and
RawStdEncoding fallback. Lifts patch coverage from 69% to 89%.