[Go SDK] Add MediaProvider interface and OpenRouter media generation #468

@santoshkumarradha

Summary

Add MediaProvider interface, MediaRouter, and OpenRouterMediaProvider to the Go SDK, enabling video, image, and audio generation with idiomatic Go patterns.

Context

The Go SDK's ai/ package currently has:

  • client.go — Chat completions (Complete, StreamComplete)
  • multimodal.go — MIME type detection only (20 lines)
  • No media generation of any kind

This issue adds a full media generation layer that matches the Python and TypeScript SDK capabilities.

New Files

File                              Purpose
sdk/go/ai/media_provider.go       MediaProvider interface, MediaRouter, request/response types
sdk/go/ai/openrouter_media.go     OpenRouterMediaProvider: video, image, and audio generation via OpenRouter
sdk/go/ai/media_provider_test.go  Unit tests for router + provider

Interface Design

// media_provider.go

type VideoRequest struct {
    Prompt          string            `json:"prompt"`
    Model           string            `json:"model"`
    Duration        int               `json:"duration,omitempty"`
    Resolution      string            `json:"resolution,omitempty"`
    AspectRatio     string            `json:"aspect_ratio,omitempty"`
    GenerateAudio   *bool             `json:"generate_audio,omitempty"`
    Seed            *int              `json:"seed,omitempty"`
    FrameImages     []FrameImage      `json:"frame_images,omitempty"`
    InputReferences []InputReference  `json:"input_references,omitempty"`
    PollInterval    time.Duration     `json:"-"` // default 30s
    Timeout         time.Duration     `json:"-"` // default 10m
}

type ImageRequest struct {
    Prompt      string            `json:"prompt"`
    Model       string            `json:"model,omitempty"`
    Size        string            `json:"size,omitempty"`
    Quality     string            `json:"quality,omitempty"`
    ImageConfig *ImageConfig      `json:"image_config,omitempty"`
}

type AudioRequest struct {
    Text   string `json:"text"`
    Model  string `json:"model,omitempty"`
    Voice  string `json:"voice,omitempty"`
    Format string `json:"format,omitempty"`
}

type MediaProvider interface {
    Name() string
    SupportedModalities() []string
    GenerateImage(ctx context.Context, req ImageRequest) (*MediaResponse, error)
    GenerateAudio(ctx context.Context, req AudioRequest) (*MediaResponse, error)
    GenerateVideo(ctx context.Context, req VideoRequest) (*MediaResponse, error)
}

type MediaRouter struct { /* prefix-based dispatch */ }

func (r *MediaRouter) Register(prefix string, provider MediaProvider)
func (r *MediaRouter) Resolve(model, capability string) (MediaProvider, error)
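
The issue leaves the router internals open. One plausible reading of "prefix-based dispatch" is longest-prefix matching filtered by capability, sketched below. Everything beyond the Register/Resolve signatures is an assumption made for illustration: the `entries` slice, `NewMediaRouter`, the trimmed-down `provider` interface (standing in for MediaProvider), `stubProvider`, and the capability strings "image", "audio", and "video".

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// provider is a trimmed-down, hypothetical stand-in for MediaProvider that
// keeps only the methods routing needs.
type provider interface {
	Name() string
	SupportedModalities() []string
}

type routeEntry struct {
	prefix   string
	provider provider
}

// MediaRouter dispatches on model-name prefix. The internals are assumed.
type MediaRouter struct {
	entries []routeEntry
}

func NewMediaRouter() *MediaRouter { return &MediaRouter{} }

// Register adds a provider and keeps entries sorted longest prefix first,
// so that more specific prefixes win in Resolve.
func (r *MediaRouter) Register(prefix string, p provider) {
	r.entries = append(r.entries, routeEntry{prefix, p})
	sort.SliceStable(r.entries, func(i, j int) bool {
		return len(r.entries[i].prefix) > len(r.entries[j].prefix)
	})
}

// Resolve returns the first provider whose prefix matches model and whose
// modalities include capability.
func (r *MediaRouter) Resolve(model, capability string) (provider, error) {
	for _, e := range r.entries {
		if !strings.HasPrefix(model, e.prefix) {
			continue
		}
		for _, m := range e.provider.SupportedModalities() {
			if m == capability {
				return e.provider, nil
			}
		}
	}
	return nil, fmt.Errorf("no provider for model %q with capability %q", model, capability)
}

// stubProvider is a hypothetical provider used only for the demo.
type stubProvider struct {
	name       string
	modalities []string
}

func (s stubProvider) Name() string                  { return s.name }
func (s stubProvider) SupportedModalities() []string { return s.modalities }

func main() {
	r := NewMediaRouter()
	r.Register("openrouter/", stubProvider{"openrouter", []string{"image", "audio", "video"}})
	p, err := r.Resolve("openrouter/google/veo-3.1", "video")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.Name())
}
```

Sorting on registration keeps Resolve a simple linear scan. A map keyed by prefix would not give deterministic results when two registered prefixes both match a model.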

Developer Experience

import (
    "context"
    "log"
    "os"

    "github.com/Agent-Field/agentfield/sdk/go/ai"
)

ctx := context.Background()

client, err := ai.NewClient(ai.Config{
    APIKey:  os.Getenv("OPENROUTER_API_KEY"),
    BaseURL: "https://openrouter.ai/api/v1",
})
if err != nil {
    log.Fatal(err)
}

// Video generation
video, err := client.GenerateVideo(ctx, ai.VideoRequest{
    Prompt:     "A golden retriever on a beach",
    Model:      "openrouter/google/veo-3.1",
    Resolution: "1080p",
    Duration:   8,
})
if err != nil {
    log.Fatal(err)
}
video.SaveFile(video.Files[0], "dog.mp4")

// Image generation
image, err := client.GenerateImage(ctx, ai.ImageRequest{
    Prompt: "A sunset over mountains",
    Model:  "openrouter/google/gemini-2.5-flash-image",
})
if err != nil {
    log.Fatal(err)
}
image.SaveImage(image.Images[0], "sunset.png")

// Audio generation
audio, err := client.GenerateAudio(ctx, ai.AudioRequest{
    Text:  "Welcome to AgentField",
    Model: "openrouter/openai/gpt-audio",
    Voice: "nova",
})
if err != nil {
    log.Fatal(err)
}
audio.SaveAudio(audio.Audio, "welcome.wav")

Go-Specific Patterns

  1. Context-based cancellation — All methods take context.Context for timeout/cancel
  2. Polling uses time.Ticker with context cancellation, not sleep loops
  3. *http.Client injection — Use existing client's HTTP client or allow custom via option
  4. Error types — Define VideoJobError, ProviderError with status codes
  5. No goroutine leaks — SSE reader must respect context cancellation
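
As a rough illustration of pattern 4, the two error types could wrap one another so callers can recover the HTTP status with errors.As. The issue only names VideoJobError and ProviderError, so all field names and the `statusCode` helper below are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// ProviderError reports a non-success HTTP response from a provider.
// The field names are assumptions, not part of the issue.
type ProviderError struct {
	Provider   string
	StatusCode int
	Message    string
}

func (e *ProviderError) Error() string {
	return fmt.Sprintf("%s: HTTP %d: %s", e.Provider, e.StatusCode, e.Message)
}

// VideoJobError reports a video job that ended unsuccessfully.
type VideoJobError struct {
	JobID  string
	Status string
	Err    error // underlying cause, if any
}

func (e *VideoJobError) Error() string {
	return fmt.Sprintf("video job %s %s: %v", e.JobID, e.Status, e.Err)
}

// Unwrap exposes the cause so errors.Is and errors.As can walk the chain.
func (e *VideoJobError) Unwrap() error { return e.Err }

// statusCode (hypothetical helper) extracts the HTTP status from anywhere
// in the error chain.
func statusCode(err error) (int, bool) {
	var pe *ProviderError
	if errors.As(err, &pe) {
		return pe.StatusCode, true
	}
	return 0, false
}

func main() {
	err := &VideoJobError{
		JobID:  "job-123",
		Status: "failed",
		Err:    &ProviderError{Provider: "openrouter", StatusCode: 429, Message: "rate limited"},
	}
	if code, ok := statusCode(err); ok {
		fmt.Println("status:", code) // prints "status: 429"
	}
}
```

Pointer receivers plus Unwrap let callers branch on the status code without string matching on error messages.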

Dependencies

Acceptance Criteria

  • MediaProvider interface defined with GenerateImage, GenerateAudio, GenerateVideo
  • MediaRouter handles prefix-based provider dispatch
  • OpenRouterMediaProvider implements all three methods
  • Video polling respects context.Context cancellation
  • Audio SSE streaming collects base64 chunks correctly
  • MediaResponse type with Files, Images, Audio fields + save helpers
  • go test ./sdk/go/ai/... passes
  • golangci-lint run ./sdk/go/... passes

Notes for Contributors

Severity: HIGH — New feature, Go SDK has zero media gen.

For the HTTP client, use the standard net/http package — no external deps. For SSE parsing, use bufio.Scanner on the response body. For JSON, use encoding/json.
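
A minimal sketch of the SSE collection step follows, using only the standard library. The event payload shape (`{"audio":{"data":"..."}}`) and the `data: [DONE]` terminator are assumptions made for the example; the real schema should come from #464. `collectAudio` and `decodeString` are hypothetical names. The fragments are concatenated before decoding because a base64 stream split at arbitrary offsets may not decode chunk by chunk.

```go
package main

import (
	"bufio"
	"context"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// audioChunk is a hypothetical event payload; see #464 for the real schema.
type audioChunk struct {
	Audio struct {
		Data string `json:"data"`
	} `json:"audio"`
}

// collectAudio reads SSE "data:" lines from body, concatenates the base64
// fragments, and decodes once at the end.
func collectAudio(ctx context.Context, body io.Reader) ([]byte, error) {
	var b64 strings.Builder
	sc := bufio.NewScanner(body)
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // allow long data lines
	for sc.Scan() {
		if err := ctx.Err(); err != nil {
			return nil, err // stop promptly once the caller cancels
		}
		payload, ok := strings.CutPrefix(sc.Text(), "data:")
		if !ok {
			continue // comments, event names, blank separators
		}
		payload = strings.TrimSpace(payload)
		if payload == "[DONE]" {
			break
		}
		var chunk audioChunk
		if err := json.Unmarshal([]byte(payload), &chunk); err != nil {
			return nil, fmt.Errorf("decode SSE payload: %w", err)
		}
		b64.WriteString(chunk.Audio.Data)
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return base64.StdEncoding.DecodeString(b64.String())
}

// decodeString is a small convenience wrapper used by the demo.
func decodeString(stream string) ([]byte, error) {
	return collectAudio(context.Background(), strings.NewReader(stream))
}

func main() {
	stream := "data: {\"audio\":{\"data\":\"aGV\"}}\n\n" +
		"data: {\"audio\":{\"data\":\"sbG8=\"}}\n\n" +
		"data: [DONE]\n\n"
	audio, err := decodeString(stream)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", audio) // prints "hello"
}
```

The ctx.Err() check only runs between lines. To unblock a Scan that is waiting on the network, the request itself should be built with http.NewRequestWithContext so that cancellation closes the response body.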

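The polling loop should select on a time.Ticker channel and ctx.Done(), consistent with pattern 2 under Go-Specific Patterns:

// Apply the 30s default first: time.NewTicker panics on a non-positive interval.
ticker := time.NewTicker(req.PollInterval)
defer ticker.Stop()

for {
    select {
    case <-ctx.Done():
        return nil, ctx.Err()
    case <-ticker.C:
        status, err := pollVideoJob(ctx, jobID)
        if err != nil {
            return nil, err
        }
        // ... inspect status; return once the job completes or fails
    }
}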
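
The loop above omits the default interval, the overall timeout, and terminal-state handling. A fuller sketch might look like the following. It uses a time.Ticker, in line with pattern 2 under Go-Specific Patterns. The `jobStatus` shape, its state strings, the `pollFunc` signature (modeled on the pollVideoJob call above), and the names `waitForVideo` and `runDemo` are all assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// jobStatus is a hypothetical status shape.
type jobStatus struct {
	State string // "pending", "completed", or "failed" (assumed values)
	URL   string
}

// pollFunc mirrors the assumed signature of pollVideoJob.
type pollFunc func(ctx context.Context, jobID string) (jobStatus, error)

// waitForVideo polls until the job reaches a terminal state, the timeout
// elapses, or ctx is cancelled. Zero values fall back to the defaults noted
// on VideoRequest (30s interval, 10m timeout).
func waitForVideo(ctx context.Context, poll pollFunc, jobID string, interval, timeout time.Duration) (jobStatus, error) {
	if interval <= 0 {
		interval = 30 * time.Second
	}
	if timeout <= 0 {
		timeout = 10 * time.Minute
	}
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return jobStatus{}, ctx.Err()
		case <-ticker.C:
			st, err := poll(ctx, jobID)
			if err != nil {
				return jobStatus{}, err
			}
			switch st.State {
			case "completed":
				return st, nil
			case "failed":
				return st, errors.New("video job failed")
			}
		}
	}
}

// runDemo drives waitForVideo with a fake poller that completes on its
// third call.
func runDemo() (jobStatus, int, error) {
	calls := 0
	fake := func(ctx context.Context, jobID string) (jobStatus, error) {
		calls++
		if calls < 3 {
			return jobStatus{State: "pending"}, nil
		}
		return jobStatus{State: "completed", URL: "https://example.com/dog.mp4"}, nil
	}
	st, err := waitForVideo(context.Background(), fake, "job-123", 10*time.Millisecond, time.Second)
	return st, calls, err
}

func main() {
	st, calls, err := runDemo()
	fmt.Println(st.State, calls, err)
}
```

Deriving the timeout from the caller's context means an earlier deadline or cancellation on the parent still wins, and a single ctx.Done() case covers both.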

Reference the Python implementation in #464 for exact API request/response JSON schemas.

Metadata

Labels

  • ai-friendly (Well-documented task suitable for AI-assisted development)
  • area:ai (AI/LLM integration)
  • enhancement (New feature or request)
  • sdk:go (Go SDK related)
