Summary
Port the `MediaProvider` abstraction and `OpenRouterMediaProvider` to the TypeScript SDK, enabling video, image, and audio generation with the same developer experience (DX) as the Python SDK.
Context
The TypeScript SDK already has `MultimodalResponse.ts` and `multimodal.ts` (input helpers), but it cannot generate media: there is no image, video, or audio generation. The Python SDK, by contrast, has a full `MediaProvider` abstract base class (ABC) with Fal, LiteLLM, and OpenRouter implementations.
This issue brings the TypeScript SDK to parity by adding the provider interface and an OpenRouter implementation that covers video, image, and audio generation.
Scope
New Files

| File | Purpose |
|---|---|
| `sdk/typescript/src/ai/MediaProvider.ts` | `MediaProvider` interface + `MediaRouter` class |
| `sdk/typescript/src/ai/OpenRouterMediaProvider.ts` | OpenRouter implementation (video, image, audio) |

Modified Files

| File | Change |
|---|---|
| `sdk/typescript/src/ai/AIClient.ts` | Add `generateVideo()`, `generateImage()`, `generateAudio()` methods |
| `sdk/typescript/src/ai/index.ts` | Export new types |

Interface Design
```typescript
// MediaProvider.ts
// Assumes MultimodalResponse.ts exports a MultimodalResponse class/type.
import type { MultimodalResponse } from './MultimodalResponse';

export interface VideoRequest {
  prompt: string;
  model?: string;
  duration?: number; // seconds
  resolution?: '480p' | '720p' | '1080p' | '1K' | '2K' | '4K';
  aspectRatio?: '16:9' | '9:16' | '1:1' | '4:3' | '3:4' | '21:9' | '9:21';
  generateAudio?: boolean; // whether the generated video includes an audio track
  seed?: number;
  frameImages?: Array<{ type: string; imageUrl: { url: string }; frameType?: string }>;
  inputReferences?: Array<{ type: string; imageUrl: { url: string } }>;
  pollInterval?: number; // ms, default 30000
  timeout?: number; // ms, default 600000
}

export interface ImageRequest {
  prompt: string;
  model?: string;
  size?: string;
  quality?: string;
  imageConfig?: {
    aspectRatio?: string;
    imageSize?: string;
    superResolutionReferences?: string[];
    fontInputs?: Array<{ fontUrl: string; text: string }>;
  };
}

export interface AudioRequest {
  text: string;
  model?: string;
  voice?: string;
  format?: string;
}

export interface MediaProvider {
  readonly name: string;
  readonly supportedModalities: string[];
  generateImage(request: ImageRequest): Promise<MultimodalResponse>;
  generateAudio(request: AudioRequest): Promise<MultimodalResponse>;
  // Optional: not every provider supports video generation.
  generateVideo?(request: VideoRequest): Promise<MultimodalResponse>;
}
```
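`VideoRequest` carries `pollInterval` (default 30000 ms) and `timeout` (default 600000 ms), and the acceptance criteria describe video generation as an async poll: the job is submitted, then its status is checked until it finishes. The loop below is a minimal sketch of that pattern. The `JobStatus` shape, the `checkStatus` callback, and the `pollUntilDone` name are all hypothetical; the real OpenRouter job endpoints and response fields should be taken from the Python implementation in #464.

```typescript
// Hypothetical job-status shape; the real OpenRouter response schema may differ.
type JobStatus<T> =
  | { state: 'pending' }
  | { state: 'done'; result: T }
  | { state: 'failed'; error: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `checkStatus` every `pollInterval` ms until the job completes,
// fails, or `timeout` ms have elapsed since polling started.
async function pollUntilDone<T>(
  checkStatus: () => Promise<JobStatus<T>>,
  pollInterval = 30_000,
  timeout = 600_000,
): Promise<T> {
  const deadline = Date.now() + timeout;
  for (;;) {
    const status = await checkStatus();
    if (status.state === 'done') return status.result;
    if (status.state === 'failed') throw new Error(`Video job failed: ${status.error}`);
    const remaining = deadline - Date.now();
    if (remaining <= 0) throw new Error(`Video job timed out after ${timeout} ms`);
    // Never sleep past the deadline.
    await sleep(Math.min(pollInterval, remaining));
  }
}
```

Keeping the poll loop separate from the HTTP calls lets the provider reuse it for any long-running job and makes the timeout behaviour testable without network access.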
Developer Experience
```typescript
import { AIClient } from '@agentfield/sdk';

const ai = new AIClient({ model: 'openai/gpt-4o' });

// Video generation
const video = await ai.generateVideo({
  prompt: 'A golden retriever on a beach',
  model: 'openrouter/google/veo-3.1',
  resolution: '1080p',
  aspectRatio: '16:9',
  duration: 8,
});
await video.saveFile(video.files[0], 'dog.mp4');

// Image generation
const image = await ai.generateImage({
  prompt: 'A sunset over mountains',
  model: 'openrouter/google/gemini-2.5-flash-image',
  imageConfig: { aspectRatio: '16:9', imageSize: '2K' },
});
await image.saveImage(image.images[0], 'sunset.png');

// Audio generation
const audio = await ai.generateAudio({
  text: 'Welcome to AgentField',
  model: 'openrouter/openai/gpt-audio',
  voice: 'nova',
});
await audio.saveAudio(audio.audio!, 'welcome.wav');
```
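Every model string above carries a provider prefix (`openrouter/...`), and the Scope section places a `MediaRouter` class in `MediaProvider.ts` that performs prefix-based provider dispatch. The sketch below assumes the router strips the prefix before handing the remaining model ID (for example `google/veo-3.1`) to the provider. The stand-in `Provider` type and the `register`/`resolve` method names are hypothetical, not the final API.

```typescript
// Minimal stand-in for MediaProvider: only what routing needs.
interface Provider {
  readonly name: string;
}

// Hypothetical MediaRouter sketch: maps a model prefix such as "openrouter/"
// to the provider that should serve requests for that model.
class MediaRouter<P extends Provider = Provider> {
  private readonly routes = new Map<string, P>();

  register(prefix: string, provider: P): this {
    // Normalise so "fal" and "fal/" register the same route.
    this.routes.set(prefix.endsWith('/') ? prefix : `${prefix}/`, provider);
    return this;
  }

  // Returns the matching provider plus the model ID with the prefix removed.
  resolve(model: string): { provider: P; model: string } {
    // Prefer the longest matching prefix in case registered prefixes overlap.
    let best: string | undefined;
    for (const prefix of Array.from(this.routes.keys())) {
      if (model.startsWith(prefix) && (!best || prefix.length > best.length)) {
        best = prefix;
      }
    }
    if (!best) throw new Error(`No media provider registered for model "${model}"`);
    return { provider: this.routes.get(best)!, model: model.slice(best.length) };
  }
}
```

With this shape, `AIClient.generateVideo()` could call `router.resolve(request.model)` and forward the request, with the stripped model ID, to the provider's `generateVideo`, raising an error when that provider does not implement the optional method.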
Dependencies
Acceptance Criteria

- `MediaProvider` interface defined with `generateImage`, `generateAudio`, and `generateVideo`
- `MediaRouter` class handles prefix-based provider dispatch
- `OpenRouterMediaProvider` implements video generation (async polling), image generation, and audio generation (SSE)
- `AIClient` exposes `generateVideo()`, `generateImage()`, and `generateAudio()` methods
- Each method returns a `MultimodalResponse` with the appropriate content
- `npm run lint` passes in `sdk/typescript/`
- `npm test` passes in `sdk/typescript/`

Notes for Contributors
Severity: HIGH. This is a new feature; the TypeScript SDK currently has no media generation at all.
Use `fetch` (native in Node 18+) for HTTP calls; there is no need for axios or got. For SSE parsing, use a lightweight approach: read the response body as a stream, split events on `\n\n`, and parse the `data: {...}` lines.
Reference the Python implementation in #464 for the exact API request/response schemas. The OpenRouter API is the same regardless of client language.
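The lightweight SSE approach described in the notes can be sketched as an async generator: read the body stream, decode and buffer the text, split complete events on `\n\n`, and JSON-parse each `data:` line. This is a minimal sketch assuming `\n` line endings and OpenAI-style streaming, where `data: [DONE]` marks the end of the stream; the `parseSSE` name is hypothetical, and the shape of each chunk (and how audio bytes are pulled out of it) should come from #464. Each `data:` line is parsed on its own, as the notes describe; the full SSE format also allows one event's data to span several lines, which this sketch does not join.

```typescript
// Parses an SSE response body into JSON payloads taken from `data:` lines.
async function* parseSSE(body: ReadableStream<Uint8Array>): AsyncGenerator<unknown> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  try {
    for (;;) {
      const result = await reader.read();
      if (result.done) break; // an unterminated trailing event is dropped
      // `stream: true` keeps multi-byte characters split across chunks intact.
      buffer += decoder.decode(result.value, { stream: true });
      let sep: number;
      // Process every complete event currently in the buffer.
      while ((sep = buffer.indexOf('\n\n')) !== -1) {
        const event = buffer.slice(0, sep);
        buffer = buffer.slice(sep + 2);
        for (const line of event.split('\n')) {
          if (!line.startsWith('data:')) continue; // skip comments, `event:`, `id:`
          const data = line.slice('data:'.length).trim();
          if (data === '[DONE]') return;
          if (data) yield JSON.parse(data);
        }
      }
    }
  } finally {
    reader.releaseLock();
  }
}
```

Node 18's `fetch` exposes `response.body` as a `ReadableStream<Uint8Array>` (or `null` for bodyless responses), so after a null check it can be passed straight to `parseSSE` and consumed with `for await`.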