Summary
Add MediaProvider interface, MediaRouter, and OpenRouterMediaProvider to the Go SDK, enabling video, image, and audio generation with idiomatic Go patterns.
Context
The Go SDK's ai/ package currently has:
- client.go: chat completions (Complete, StreamComplete)
- multimodal.go: MIME type detection only (20 lines)
- No media generation of any kind
This issue adds a full media generation layer that matches the Python and TypeScript SDK capabilities.
New Files
| File | Purpose |
| --- | --- |
| sdk/go/ai/media_provider.go | MediaProvider interface, MediaRouter, request/response types |
| sdk/go/ai/openrouter_media.go | OpenRouterMediaProvider: video, image, and audio generation via OpenRouter |
| sdk/go/ai/media_provider_test.go | Unit tests for router + provider |
Interface Design
    // media_provider.go
    package ai

    import (
        "context"
        "time"
    )

    // FrameImage, InputReference, ImageConfig, and MediaResponse are referenced
    // below; their definitions are omitted from this outline.

    // VideoRequest describes a video generation job.
    type VideoRequest struct {
        Prompt          string           `json:"prompt"`
        Model           string           `json:"model"`
        Duration        int              `json:"duration,omitempty"`
        Resolution      string           `json:"resolution,omitempty"`
        AspectRatio     string           `json:"aspect_ratio,omitempty"`
        GenerateAudio   *bool            `json:"generate_audio,omitempty"`
        Seed            *int             `json:"seed,omitempty"`
        FrameImages     []FrameImage     `json:"frame_images,omitempty"`
        InputReferences []InputReference `json:"input_references,omitempty"`
        PollInterval    time.Duration    `json:"-"` // default 30s
        Timeout         time.Duration    `json:"-"` // default 10m
    }

    // ImageRequest describes an image generation call.
    type ImageRequest struct {
        Prompt      string       `json:"prompt"`
        Model       string       `json:"model,omitempty"`
        Size        string       `json:"size,omitempty"`
        Quality     string       `json:"quality,omitempty"`
        ImageConfig *ImageConfig `json:"image_config,omitempty"`
    }

    // AudioRequest describes a text-to-speech call.
    type AudioRequest struct {
        Text   string `json:"text"`
        Model  string `json:"model,omitempty"`
        Voice  string `json:"voice,omitempty"`
        Format string `json:"format,omitempty"`
    }

    // MediaProvider is implemented by each media generation backend.
    type MediaProvider interface {
        Name() string
        SupportedModalities() []string
        GenerateImage(ctx context.Context, req ImageRequest) (*MediaResponse, error)
        GenerateAudio(ctx context.Context, req AudioRequest) (*MediaResponse, error)
        GenerateVideo(ctx context.Context, req VideoRequest) (*MediaResponse, error)
    }

    // MediaRouter dispatches to a provider by model-name prefix.
    type MediaRouter struct { /* prefix-based dispatch */ }

    // Method signatures only; bodies are omitted in this outline.
    func (r *MediaRouter) Register(prefix string, provider MediaProvider)
    func (r *MediaRouter) Resolve(model, capability string) (MediaProvider, error)
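One possible shape for the prefix-based dispatch is sketched below. It is not taken from the Python or TypeScript SDKs. The longest-prefix-wins ordering, the fall-through to a shorter prefix when the capability is missing, the modality strings ("image", "video"), and the staticProvider test double are all assumptions made for illustration. The MediaProvider interface is trimmed to the two methods the router needs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// MediaProvider is trimmed to the methods the router needs; the Generate*
// methods from the interface design are left out of this sketch.
type MediaProvider interface {
	Name() string
	SupportedModalities() []string
}

// route pairs a model-name prefix with the provider that serves it.
type route struct {
	prefix   string
	provider MediaProvider
}

// MediaRouter maps model-name prefixes to providers.
type MediaRouter struct {
	routes []route
}

// Register adds a provider for every model whose name starts with prefix.
func (r *MediaRouter) Register(prefix string, provider MediaProvider) {
	r.routes = append(r.routes, route{prefix: prefix, provider: provider})
	// Keep longer prefixes first so the most specific match is tried first.
	sort.SliceStable(r.routes, func(i, j int) bool {
		return len(r.routes[i].prefix) > len(r.routes[j].prefix)
	})
}

// Resolve returns the first provider whose prefix matches model and that
// lists capability among its supported modalities.
func (r *MediaRouter) Resolve(model, capability string) (MediaProvider, error) {
	for _, rt := range r.routes {
		if !strings.HasPrefix(model, rt.prefix) {
			continue
		}
		for _, m := range rt.provider.SupportedModalities() {
			if m == capability {
				return rt.provider, nil
			}
		}
	}
	return nil, fmt.Errorf("no provider for model %q with capability %q", model, capability)
}

// staticProvider is a hypothetical test double, not part of the proposal.
type staticProvider struct {
	name       string
	modalities []string
}

func (s staticProvider) Name() string                  { return s.name }
func (s staticProvider) SupportedModalities() []string { return s.modalities }
```

With "openrouter/" and "openrouter/google/" both registered, a video model under "openrouter/google/" resolves to the more specific provider, while a capability that provider lacks falls through to the shorter prefix.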
Developer Experience
    import (
        "context"
        "log"
        "os"

        "github.com/Agent-Field/agentfield/sdk/go/ai"
    )

    ctx := context.Background()
    client, err := ai.NewClient(ai.Config{
        APIKey:  os.Getenv("OPENROUTER_API_KEY"),
        BaseURL: "https://openrouter.ai/api/v1",
    })
    if err != nil {
        log.Fatal(err)
    }

    // Video generation
    video, err := client.GenerateVideo(ctx, ai.VideoRequest{
        Prompt:     "A golden retriever on a beach",
        Model:      "openrouter/google/veo-3.1",
        Resolution: "1080p",
        Duration:   8,
    })
    if err != nil {
        log.Fatal(err)
    }
    video.SaveFile(video.Files[0], "dog.mp4")

    // Image generation
    image, err := client.GenerateImage(ctx, ai.ImageRequest{
        Prompt: "A sunset over mountains",
        Model:  "openrouter/google/gemini-2.5-flash-image",
    })
    if err != nil {
        log.Fatal(err)
    }
    image.SaveImage(image.Images[0], "sunset.png")

    // Audio generation
    audio, err := client.GenerateAudio(ctx, ai.AudioRequest{
        Text:  "Welcome to AgentField",
        Model: "openrouter/openai/gpt-audio",
        Voice: "nova",
    })
    if err != nil {
        log.Fatal(err)
    }
    audio.SaveAudio(audio.Audio, "welcome.wav")
Go-Specific Patterns
- Context-based cancellation: all methods take context.Context for timeout/cancel
- Polling uses time.Ticker with context cancellation, not sleep loops
- *http.Client injection: use the existing client's HTTP client or allow a custom one via an option
- Error types: define VideoJobError and ProviderError with status codes
- No goroutine leaks: the SSE reader must respect context cancellation
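The error types named in the list could look roughly like this. The issue only names VideoJobError and ProviderError, so the field names, the message formats, and the IsRetryable helper are assumptions made for illustration. The point of the sketch is that callers can recover the HTTP status with errors.As even when the provider error is wrapped inside a job error.

```go
package main

import (
	"errors"
	"fmt"
)

// ProviderError reports a non-2xx HTTP response from a media provider.
// Field names are assumptions for this sketch.
type ProviderError struct {
	Provider   string
	StatusCode int
	Message    string
}

func (e *ProviderError) Error() string {
	return fmt.Sprintf("%s: HTTP %d: %s", e.Provider, e.StatusCode, e.Message)
}

// VideoJobError reports a video job that ended in a failed state.
type VideoJobError struct {
	JobID  string
	Status string
	Err    error // underlying cause, may be nil
}

func (e *VideoJobError) Error() string {
	if e.Err == nil {
		return fmt.Sprintf("video job %s: %s", e.JobID, e.Status)
	}
	return fmt.Sprintf("video job %s: %s: %v", e.JobID, e.Status, e.Err)
}

// Unwrap exposes the cause so errors.Is and errors.As can see through it.
func (e *VideoJobError) Unwrap() error { return e.Err }

// IsRetryable is a hypothetical helper: it treats rate limits (429) and
// server errors (5xx) as retryable and everything else as final.
func IsRetryable(err error) bool {
	var pe *ProviderError
	if errors.As(err, &pe) {
		return pe.StatusCode == 429 || pe.StatusCode >= 500
	}
	return false
}
```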
Dependencies
Notes for Contributors
Severity: HIGH — New feature, Go SDK has zero media gen.
For the HTTP client, use the standard net/http package — no external deps. For SSE parsing, use bufio.Scanner on the response body. For JSON, use encoding/json.
The polling loop should use select with time.After and ctx.Done():
    for {
        select {
        case <-ctx.Done():
            return nil, ctx.Err()
        case <-time.After(req.PollInterval):
            status, err := pollVideoJob(ctx, jobID)
            // ...
        }
    }
Reference the Python implementation in #464 for exact API request/response JSON schemas.
Acceptance Criteria
- MediaProvider interface defined with GenerateImage, GenerateAudio, GenerateVideo
- MediaRouter handles prefix-based provider dispatch
- OpenRouterMediaProvider implements all three methods
- context.Context cancellation
- MediaResponse type with Files, Images, Audio fields + save helpers
- go test ./sdk/go/ai/... passes
- golangci-lint run ./sdk/go/... passes