RFC: Establish pkg/ai as the foundational AI kernel #2409

@shaj13

Description


Overview

This RFC proposes restructuring the Docker Agent so it can grow beyond a CLI tool for running agentic AI. There are many open-source AI frameworks out there, each with its own limitations and tradeoffs. The Docker Agent already has strong core capabilities — model fallback, streaming, tool execution, structured output, multi-provider support — but they're buried inside runtime internals, tightly coupled and not reusable.

The focus of this proposal is modularization and establishing a strong core. By redesigning around a central pkg/ai package with clean, idiomatic Go APIs, we create clear boundaries between layers. With that modularity in place, targeting different use cases becomes straightforward, adding new features becomes incremental instead of invasive, and we avoid the tradeoffs that come from having everything tangled together.

Pain

The Docker Agent was built around pkg/runtime as the primary entry point for model interaction. This made sense early on — the runtime powers the TUI agent loop and it does that job well. But as the project grew, more features needed to talk to models (session titles, compaction, background agents), and each one had to work around the runtime rather than with a shared foundation.

Today, the core LLM interaction logic — streaming, fallback, retry, tool execution — lives inside runtime alongside orchestration concerns like sessions, events, permissions, and hooks. These are tightly coupled, which makes it difficult to use one without pulling in the other.

Specific challenges:

  • Model interaction requires runtime. There is no lightweight way to call a model. Session compaction creates an entire nested runtime instance for a single summarization call. Session title generation reimplements its own stream drain and fallback loop to avoid depending on runtime.

  • Core types lack a shared root. Message and Usage live in pkg/chat, Tool and ToolCall in pkg/tools, Provider in pkg/model/provider, error handling in pkg/modelerrors. These all describe the same domain — LLM interaction — but there is no common package they stem from, which leads to cross-cutting imports and circular dependency pressure as new features are added.

  • Extending means working around coupling. Adding a new capability that needs model access (evaluation, RAG rewriting, structured extraction) requires either importing runtime with all its dependencies or duplicating the call-and-stream pattern. Neither scales well.

  • One shape for all use cases. The runtime is designed for long-running interactive agents. But not every model interaction is agentic — some are one-shot completions, some are structured data extraction, some are background tasks. These different shapes are forced through the same path today.

Goal

Establish pkg/ai as the core package — a dependency-free foundation that owns all LLM interaction primitives. Everything needed to communicate with a model lives here: types, interfaces, streaming, fallback, retry, tool execution, and structured output.

The package should:

  • Be the single import for model interaction. One package gives you messages, tools, providers, streaming, and completion — no need to assemble five imports to make a call.

  • Be generic and use-case agnostic. It should serve one-shot text generation, structured data extraction, and multi-turn tool loops equally well — without assuming agents, sessions, or UI.

  • Sit at the bottom of the dependency graph. pkg/ai depends on nothing internal. Everything else — runtime, agents, sessions, providers — depends on it. This breaks circular dependency pressure and gives the codebase a clear direction.

  • Make the runtime thinner. Runtime becomes focused on what it's good at: orchestrating long-running interactive agents. It calls into pkg/ai for model interaction instead of owning that logic. Session compaction and title generation become simple callers.

  • Lower the cost of new features. Any new capability that needs to talk to a model — evaluation, background summarization, RAG pipelines — imports pkg/ai and calls Generate. No runtime, no duplication.

Proposed Design

Package Structure

pkg/ai/
├── ai.go           // Generate, GenerateText, GenerateData[T]
├── message.go      // Message, Role, Usage, FinishReason (from pkg/chat)
├── tool.go         // Tool, ToolCall, ToolResult (from pkg/tools)
├── provider.go     // Provider interface (from pkg/model/provider)
├── stream.go       // StreamDelta, stream draining logic (from runtime/streaming.go)
├── fallback.go     // Retry, backoff, model chain (from runtime/fallback.go)
├── option.go       // Functional options: WithModel, WithTools, WithStream, etc.
├── errors.go       // Error classification, context overflow (from pkg/modelerrors)
└── result.go       // Result type returned by Generate

Core API

package ai

// Generate runs the full completion loop: call model → stream response →
// execute tools → repeat, until the model stops or max turns is reached.
// Fallback, retry, and streaming are handled internally.
func Generate(ctx context.Context, opts ...Option) (*Result, error)

// GenerateText is a convenience wrapper that returns the text content.
func GenerateText(ctx context.Context, opts ...Option) (string, error)

// GenerateData calls the model with structured output and unmarshals into T.
func GenerateData[T any](ctx context.Context, opts ...Option) (T, error)

Result

type Result struct {
    Text             string
    ReasoningContent string
    ToolCalls        []ToolCall
    FinishReason     FinishReason
    Usage            *Usage
    Model            string       // which model actually responded
}

Options (functional)

ai.WithModel(model)              // primary provider
ai.WithFallbacks(models...)      // fallback chain
ai.WithMessages(msgs...)         // conversation messages
ai.WithTools(tools...)           // available tools
ai.WithMaxTurns(n)               // max tool-call round trips
ai.WithMaxTokens(n)              // model max output tokens
ai.WithStream(callback)          // streaming delta callback
ai.WithRetries(n)                // per-model retry count
// ... more options as needed

Provider Interface

type Provider interface {
    ID() string
    CreateChatCompletionStream(ctx context.Context, messages []Message, tools []Tool) (MessageStream, error)
}

Existing provider implementations (OpenAI, Anthropic, etc.) implement this interface. They live outside pkg/ai and are passed in via WithModel().

What Generate Does Internally

1. Build model chain: primary + fallbacks
2. Call model (with retry/backoff on failure, fallback on exhaustion)
3. Drain stream → aggregate into Result
4. If model returned tool calls and turns remain:
   a. Execute tools (through event/hook system — design TBD)
   b. Append tool results to messages
   c. Go to step 2
5. Return final Result

Event / Hook System

The ai package will expose an event and hook system that allows callers to observe, intercept, and control the completion loop from outside. This includes:

  • Observing streaming deltas, tool calls, model fallbacks
  • Allowing or rejecting tool calls before execution
  • Modifying tool inputs/outputs
  • Injecting side effects (session recording, telemetry, UI events)

The specific design of this system is out of scope for this RFC and will be addressed in a follow-up.

Migration

Existing packages (pkg/chat, pkg/tools, pkg/model/provider, pkg/modelerrors) will alias their types to pkg/ai to avoid breaking existing code. Over time, consumers migrate to importing pkg/ai directly.
