Claudi

Pipeline orchestration for AI-assisted software development.

Claudi coordinates multi-step Claude agent interactions with resumability, state persistence, progress visualization, and fine-grained testability. It uses the @anthropic-ai/claude-agent-sdk directly—no wrapper layer.

Features

  • Resumability — Persist state for crash recovery and resume capability
  • State Persistence — Atomic file writes track step-by-step progress
  • Progress Visualization — Docker BuildKit-inspired real-time progress display
  • Fine-grained Testability — Pure functions, immutable state, dependency injection
  • Direct SDK Integration — No wrapper layer; uses @anthropic-ai/claude-agent-sdk directly
  • 4 Step Types — Agent, Script, Parallel, Sequence
  • Recursive Composition — Nest sequences within parallel steps, and vice versa
  • MCP Support — Stdio and HTTP MCP servers for custom tools
  • Subagent Spawning — Define specialized agents that Claude can invoke
  • Human Input — Request input from human operators during execution (approvals, clarifications)
  • Structured Output — Zod schemas for validated, typed responses
  • Typed Pipeline Inputs — Define input schemas with full TypeScript inference and CLI flag generation
  • Pattern-Aware Builder — High-level primitives (.map(), .gate(), .converge()) and nested builder callbacks
  • Cost Tracking — Per-step and aggregate cost tracking with budget enforcement

Installation

# Using Bun (recommended)
bun install claudi

# Using npm
npm install claudi

# Using yarn
yarn add claudi

Requirements

  • Bun v1.0+ (recommended) or Node.js v18+
  • TypeScript 5+
  • Anthropic API key (set ANTHROPIC_API_KEY environment variable)

Quick Start

1. Initialize a Project

bunx claudi init

This creates:

  • claudi.config.json — Project configuration
  • .pipelines/definitions/ — Directory for pipeline definitions

2. Create a Pipeline

// .pipelines/definitions/analyze.ts
import { buildPipeline, z } from 'claudi/sdk';

const AnalysisResult = z.object({
  summary: z.string(),
  issues: z.array(
    z.object({
      severity: z.enum(['low', 'medium', 'high']),
      description: z.string(),
      file: z.string(),
    })
  ),
  score: z.number().min(0).max(100),
});

export const pipeline = buildPipeline({
  id: 'analyze',
  version: '1.0.0',
  description: 'Analyze codebase for issues',
  maxBudgetUsd: 5.0,
  llmConfig: { model: 'sonnet' },
  inputSchema: z.object({}),
})
  .agent('scan', {
    description: 'Scan codebase for issues',
    prompt: 'Analyze this codebase for security vulnerabilities, code quality issues, and potential bugs.',
    llmConfig: { tools: ['Read', 'Glob', 'Grep'], model: 'sonnet', maxTurns: 50 },
    outputKey: 'analysis',
    outputSchema: AnalysisResult,
  })
  .script('report', {
    description: 'Generate report',
    execute: async (ctx) => {
      const analysis = ctx.outputs.analysis!;
      console.log(`Found ${analysis.issues.length} issues. Score: ${analysis.score}/100`);
      return { reportGenerated: true };
    },
    outputKey: 'report',
  })
  .build();

3. Run the Pipeline

bunx claudi run analyze

4. Check Results

# List all runs
bunx claudi run list

# Show specific run details
bunx claudi run show analyze-abc123

# View cost summary
bunx claudi run cost --days 7

CLI Reference

claudi init

Initialize a new Claudi project.

claudi init [options]

Option         Description
--force, -f    Overwrite existing files
--dry-run      Show what would be created without writing

Template Resolution: Templates are resolved from the local cache (~/.claudi/templates-cache/) or built-in defaults.


claudi run

Execute pipelines with real-time progress tracking.

claudi run <pipeline> [input...] [options]

Option                   Description
--stage N                Run up to step N only
--restart                Ignore previous progress, start fresh
--dry-run                Test without execution
--verbose                Show detailed output
--quiet, -q              Suppress progress output
--plan <runId>           Parent plan run ID
--todo-display <mode>    Display mode: inline, rich, json, quiet

Input Variables:

Pass input variables as key=value pairs:

claudi run implement feature="Add dark mode" priority=high

Access these values in a pipeline through ctx.variables, for example ctx.variables.feature.
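
As a rough illustration of the key=value convention above, the sketch below splits such arguments into a variables record. `parseInputVariables` is a hypothetical helper, not Claudi's actual parser; it assumes the first `=` separates key from value and that values stay strings.

```typescript
// Hypothetical helper for illustration only; not part of Claudi's API.
function parseInputVariables(args: string[]): Record<string, string> {
  const variables: Record<string, string> = {};
  for (const arg of args) {
    const eq = arg.indexOf('=');
    if (eq <= 0) {
      throw new Error(`Expected key=value, got "${arg}"`);
    }
    // Split on the first "=" only, so values may contain "=" themselves.
    variables[arg.slice(0, eq)] = arg.slice(eq + 1);
  }
  return variables;
}

// Mirrors: claudi run implement feature="Add dark mode" priority=high
// (the shell has already removed the quotes by the time the args arrive)
const parsedVariables = parseInputVariables(['feature=Add dark mode', 'priority=high']);
console.log(parsedVariables);
```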

Subcommands

List runs:

claudi run list [pipeline]

# Output:
# Runs:
#   implement-abc123  │ completed │ 5/5 steps │ $0.24 │ 2m15s
#   plan-def456       │ failed    │ 2/3 steps │ $0.08 │ Resumable

Show run details:

claudi run show <runId>

Resume failed/paused run:

claudi run resume <runId>

View cost aggregation:

claudi run cost [--days N]  # Default: 30 days

claudi pipeline

Query registered pipelines.

# List all pipelines
claudi pipeline list

# Show pipeline details
claudi pipeline show <name>

Example output:

Pipeline: implement
Version: 1.0.0
Description: Implementation pipeline

Steps:
  1. [agent] analyze - Analyze codebase
     Model: sonnet | Max Turns: 50
     Tools: Read, Glob, Grep

  2. [parallel] features - Implement features
     ├─ [agent] auth - Add authentication
     └─ [agent] dashboard - Add dashboard

claudi logs

View and query pipeline run logs.

claudi logs [runId] [options]

Option            Description
[runId]           Show logs for a specific run ID
--last, -l [N]    Show logs from Nth most recent run (default: 1)
--level           Minimum log level: error, warn, info, debug
--step            Filter by step ID
--follow, -f      Tail logs in real-time
--json            Output raw JSONL (for piping to jq)

Examples:

# View logs from most recent run (default)
claudi logs

# View logs for a specific run
claudi logs my-run-id

# View only errors from last run
claudi logs --last --level error

# Filter by step
claudi logs --last --step analyze

# Tail logs in real-time
claudi logs --follow

# Pipe raw JSONL to jq
claudi logs --last --json | jq .

Pipeline System

Pipeline Definition

A pipeline is a sequence of steps with optional defaults and hooks:

interface Pipeline<TInput = unknown> {
  id: string; // Unique identifier
  version: string; // Semantic version
  description: string; // What this pipeline does
  steps: PipelineStep[]; // Ordered steps to execute

  // Typed inputs (see "Typed Pipeline Inputs" section)
  inputSchema?: ZodType<TInput>; // Zod schema for validated inputs

  // Optional configuration
  maxBudgetUsd?: number; // Cost limit for entire run
  enableTodos?: boolean; // Enable TodoWrite tool (default: true)
  rateLimits?: RateLimitConfig; // API rate limiting
  llmConfig?: LLMConfig; // LLM defaults for all steps (model, tools, etc.)
}
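
To make the `maxBudgetUsd` field concrete, here is a minimal sketch of per-step cost tracking with a run-wide limit. `BudgetTracker` is a hypothetical class, and checking the limit after each recorded step is an assumption; the source does not say when Claudi enforces the budget.

```typescript
// Hypothetical budget tracker; illustrates maxBudgetUsd, not Claudi's internals.
class BudgetTracker {
  private readonly costs = new Map<string, number>();
  private readonly maxBudgetUsd: number | undefined;

  constructor(maxBudgetUsd?: number) {
    this.maxBudgetUsd = maxBudgetUsd;
  }

  // Record one step's cost, then fail if the run total exceeds the limit.
  record(stepId: string, costUsd: number): void {
    this.costs.set(stepId, (this.costs.get(stepId) ?? 0) + costUsd);
    if (this.maxBudgetUsd !== undefined && this.total() > this.maxBudgetUsd) {
      throw new Error(`Budget exceeded: ${this.total().toFixed(2)} USD > ${this.maxBudgetUsd.toFixed(2)} USD`);
    }
  }

  // Aggregate cost across all recorded steps.
  total(): number {
    let sum = 0;
    this.costs.forEach((cost) => {
      sum += cost;
    });
    return sum;
  }
}

const tracker = new BudgetTracker(5.0);
tracker.record('scan', 2.0);
tracker.record('report', 2.5);
console.log(tracker.total());
```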

Step Types

Claudi supports 4 step types, using a discriminated union pattern:

1. Agent Step

Execute Claude interactions with full SDK support.

{
  type: 'agent',
  id: 'analyze',
  description: 'Analyze codebase',
  prompt: 'Analyze this codebase for security issues.',

  // LLM configuration (all nested under llmConfig)
  llmConfig: {
    // Model & execution
    model: 'opus',                 // 'sonnet' | 'opus' | 'haiku'
    maxTurns: 100,
    timeout: 600000,               // 10 minutes
    permissionMode: 'acceptEdits', // Auto-accept file edits

    // Tools
    tools: ['Read', 'Grep', 'Glob', 'Write', 'Bash'],
    disallowedTools: ['Task'],     // Blocklist specific tools

    // Subagents (Claude can invoke these)
    agents: {
      'code-reviewer': {
        description: 'Expert code reviewer',
        prompt: 'You are a code review specialist...',
        tools: ['Read', 'Grep'],
      },
    },

    // MCP servers for custom tools
    mcpServers: {
      'filesystem': {
        command: 'npx',
        args: ['@modelcontextprotocol/server-filesystem'],
      },
    },

    // System prompt
    systemPrompt: { type: 'preset', preset: 'claude_code' },
    settingSources: ['project'],   // Load CLAUDE.md

    // Session management
    continueConversation: true,    // Maintain history
    forkSession: false,

    // Output validation
    outputSchema: AnalysisResultSchema,  // Zod schema

    // Human input (optional, see "Human Input" section)
    humanInput: {
      mode: 'auto',
      defaultTimeout: 300,
    },
  },

  outputKey: 'analysis',
}

2. Script Step

Execute arbitrary TypeScript code for data transformation and orchestration logic.

{
  type: 'script',
  id: 'transform',
  description: 'Transform analysis results',
  execute: async (ctx) => {
    const analysis = ctx.outputs.analysis;
    const filtered = (analysis?.issues ?? []).filter(i => i.severity === 'high');
    return { highPriorityIssues: filtered };
  },
  outputKey: 'filteredResults',
}

Script steps are designed for pure data transformations. For conditional branching or dynamic step generation, use Dynamic Steps instead.

3. Parallel Step

Execute steps concurrently with failure handling.

{
  type: 'parallel',
  id: 'features',
  description: 'Implement features in parallel',
  maxConcurrency: 3,             // Limit concurrent steps
  onFailure: 'continue',         // 'fail-fast' | 'continue' | 'ignore'
  steps: [
    { type: 'agent', id: 'auth', outputKey: 'auth_result', ... },
    { type: 'agent', id: 'dashboard', outputKey: 'dashboard_result', ... },
    { type: 'agent', id: 'api', outputKey: 'api_result', ... },
  ],
}

Failure Strategies:

Strategy               Behavior
fail-fast (default)    Cancel remaining steps on first failure
continue               Complete all steps, aggregate failures
ignore                 Log failures, continue as successful

Context Isolation:

  • Each branch gets a deep clone of outputs (prevents race conditions)
  • variables is shallow cloned (treat as read-only)
  • After completion, outputs are merged (last-write-wins for conflicts)
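
The isolation and merge rules above can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: `runBranches` is a hypothetical function, branches return only the keys they write, and merging happens in branch order. The real runner executes branches concurrently.

```typescript
type BranchOutputs = Record<string, unknown>;

// Hypothetical sketch of branch isolation; not Claudi's actual runner.
function runBranches(
  outputs: BranchOutputs,
  branches: Array<(snapshot: BranchOutputs) => BranchOutputs>,
): BranchOutputs {
  // Each branch reads its own deep clone, so one branch's mutations
  // never leak into another branch or into the original outputs.
  const written = branches.map((branch) => branch(structuredClone(outputs)));
  // Merge in branch order: on a key conflict, the later branch wins.
  return Object.assign({}, outputs, ...written);
}

const initialOutputs: BranchOutputs = { analysis: { issues: ['a'] } };
const mergedOutputs = runBranches(initialOutputs, [
  (snapshot) => {
    // Mutating the snapshot affects only this branch's clone.
    (snapshot.analysis as { issues: string[] }).issues.push('local-only');
    return { auth_result: 'ok', status: 'auth' };
  },
  (snapshot) => ({
    dashboard_result: 'ok',
    status: 'dashboard',
    issuesSeen: (snapshot.analysis as { issues: string[] }).issues.length,
  }),
]);
console.log(mergedOutputs);
```

Because both branches write `status`, the merged value comes from the second branch, which is why distinct `outputKey` values per branch are the safer choice.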

4. Sequence Step

Execute steps sequentially with optional checkpoints.

{
  type: 'sequence',
  id: 'main-flow',
  description: 'Main implementation flow',
  isCheckpoint: true,            // Pause for verification after completion
  steps: [
    { type: 'agent', id: 'plan', ... },
    { type: 'parallel', id: 'implement', ... },
    { type: 'agent', id: 'test', ... },
  ],
  onStepStart: (step, ctx) => console.log(`Starting: ${step.id}`),
  onStepComplete: (step, result) => console.log(`Completed: ${step.id}`),
}

Checkpoint Support:

  • isCheckpoint: true pauses execution for user verification
  • Enables manual intervention in long-running pipelines
  • Resume with claudi run resume <runId>

Dynamic Steps

Sequence and Parallel steps support dynamic step resolution via functions. This enables conditional logic and fan-out patterns without a separate step type.

Conditional Logic

Use dynamic steps to branch execution based on runtime context:

{
  type: 'sequence',
  id: 'adaptive-flow',
  description: 'Choose approach based on complexity',
  steps: (ctx) => {
    const complexity = ctx.outputs.analysis?.complexity;
    if (complexity === 'high') {
      return [
        { type: 'agent', id: 'detailed-plan', description: 'Detailed planning', prompt: '...' },
        { type: 'agent', id: 'careful-impl', description: 'Careful implementation', prompt: '...' },
      ];
    }
    return [
      { type: 'agent', id: 'quick-impl', description: 'Quick implementation', prompt: '...' },
    ];
  },
}

Fan-Out Pattern

Use dynamic parallel steps to process collections:

{
  type: 'parallel',
  id: 'process-items',
  description: 'Process all items in parallel',
  steps: (ctx) => {
    const items = ctx.outputs.items ?? [];
    return items.map((item) => ({
      type: 'agent' as const,
      id: `process-${item.id}`,
      description: `Process ${item.name}`,
      prompt: `Process this item: ${JSON.stringify(item)}`,
    }));
  },
}

Dynamic steps can return an empty array to skip execution entirely:

steps: (ctx) => ctx.outputs.skip ? [] : [{ type: 'agent', ... }]

Recursive Composition

Steps can be arbitrarily nested:

{
  type: 'sequence',
  id: 'main',
  steps: [
    {
      type: 'parallel',
      id: 'phase-1',
      steps: [
        {
          type: 'sequence',
          id: 'feature-a',
          steps: [
            { type: 'agent', id: 'plan-a', ... },
            { type: 'agent', id: 'impl-a', ... },
          ],
        },
        {
          type: 'sequence',
          id: 'feature-b',
          // Dynamic steps for conditional inclusion
          steps: (ctx) => ctx.variables.includeFeatureB
            ? [{ type: 'agent', id: 'impl-b', ... }]
            : [],
        },
      ],
    },
  ],
}
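
To illustrate how a discriminated union makes nested structures like this easy to traverse, the sketch below walks a step tree and lists every leaf step by path. The `SketchStep` type is a deliberately reduced, hypothetical stand-in for the real step types, and it covers static `steps` arrays only, not dynamic step functions.

```typescript
// Simplified, hypothetical step shapes; the real types carry many more fields.
type SketchStep =
  | { type: 'agent'; id: string }
  | { type: 'script'; id: string }
  | { type: 'parallel'; id: string; steps: SketchStep[] }
  | { type: 'sequence'; id: string; steps: SketchStep[] };

// Collect the path of every leaf (agent or script) step, however deeply nested.
function leafPaths(step: SketchStep, parent: string[] = []): string[] {
  const path = [...parent, step.id];
  switch (step.type) {
    case 'agent':
    case 'script':
      return [path.join('/')];
    case 'parallel':
    case 'sequence':
      // The `type` tag narrows `step`, so `step.steps` is known to exist here.
      return step.steps.flatMap((child) => leafPaths(child, path));
  }
}

const stepTree: SketchStep = {
  type: 'sequence',
  id: 'main',
  steps: [
    {
      type: 'parallel',
      id: 'phase-1',
      steps: [
        {
          type: 'sequence',
          id: 'feature-a',
          steps: [
            { type: 'agent', id: 'plan-a' },
            { type: 'agent', id: 'impl-a' },
          ],
        },
        { type: 'sequence', id: 'feature-b', steps: [{ type: 'agent', id: 'impl-b' }] },
      ],
    },
  ],
};

const allLeafPaths = leafPaths(stepTree);
console.log(allLeafPaths);
```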

Step Context

All steps receive a StepContext providing:

interface StepContext<TInput = unknown> {
  runId: string;                           // Unique run identifier
  sessionId?: string;                      // SDK session for continuity
  inputs: TInput;                          // Validated inputs from inputSchema (read-only)
  variables: Record<string, unknown>;      // Input variables (legacy)
  outputs: Record<string, unknown>;        // Previous step outputs
  config: PipelineConfig;                  // Resolved configuration
  stepPath: string[];                      // Path to current step

  emit: (event: EmittableEvent) => void;   // Event emission
  updateProgress?: (path, status) => void; // Progress updates
  completedTasks?: Array<{...}>;           // Resume support
}

Accessing Previous Outputs:

prompt: (ctx) => {
  const analysis = ctx.outputs.analysis;
  return `Based on this analysis: ${JSON.stringify(analysis)}, implement fixes.`;
},

Dynamic Configuration

Any configuration value can be static or computed at runtime:

{
  type: 'agent',
  id: 'dynamic-step',
  description: 'Process data',

  // Dynamic prompt (computed from context)
  prompt: (ctx) => `Process: ${ctx.outputs.preprocessed}`,

  // Dynamic llmConfig (computed from context)
  llmConfig: (ctx) => ({
    model: ctx.variables.useOpus ? 'opus' : 'sonnet',
    maxTurns: ctx.outputs.complexity === 'high' ? 100 : 50,
    tools: ctx.variables.allowBash ? ['Read', 'Bash'] : ['Read'],
  }),
}
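
One plausible way to support values that are either static or computed is a small resolver like the sketch below. `Resolvable` and `resolveValue` are hypothetical names chosen for illustration; the source does not show how Claudi resolves these values internally.

```typescript
// A value that is either given directly or computed from the step context.
type Resolvable<T, C> = T | ((ctx: C) => T);

// Hypothetical resolver: call the value if it is a function, otherwise return it.
// Caveat: this cannot distinguish a "computed" value from a T that is itself a function.
function resolveValue<T, C>(value: Resolvable<T, C>, ctx: C): T {
  return typeof value === 'function' ? (value as (ctx: C) => T)(ctx) : (value as T);
}

interface SketchCtx {
  variables: { useOpus?: boolean };
}

const sketchCtx: SketchCtx = { variables: { useOpus: true } };

const staticModel = resolveValue<string, SketchCtx>('sonnet', sketchCtx);
const dynamicModel = resolveValue<string, SketchCtx>(
  (ctx) => (ctx.variables.useOpus ? 'opus' : 'sonnet'),
  sketchCtx,
);
console.log(staticModel, dynamicModel);
```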

Typed Pipeline Inputs

Define type-safe, schema-validated inputs for pipelines using Zod schemas. The CLI automatically generates flags from your schema.

Using buildPipeline Builder

The buildPipeline builder provides full type inference for both inputs and outputs via method chaining. Each .agent() or .script() call with outputKey and outputSchema automatically extends the output type, so subsequent steps see properly typed ctx.outputs:

import { buildPipeline, z } from 'claudi/sdk';

const AnalysisSchema = z.object({
  summary: z.string(),
  issues: z.array(z.object({ file: z.string(), severity: z.string() })),
});

export const pipeline = buildPipeline({
  id: 'deploy',
  version: '1.0.0',
  description: 'Deploy to environment',
  inputSchema: z.object({
    environment: z.enum(['dev', 'staging', 'prod']).describe('Target environment'),
    dryRun: z.boolean().default(false).describe('Test without deploying'),
  }),
})
  .agent('analyze', {
    description: 'Analyze deployment target',
    prompt: (ctx) => `Analyze deployment to ${ctx.inputs.environment}`,
    llmConfig: { tools: ['Read', 'Grep'], model: 'sonnet', maxTurns: 20 },
    outputKey: 'analysis',
    outputSchema: AnalysisSchema,
  })
  .agent('deploy', {
    description: 'Execute deployment',
    // ctx.outputs.analysis is fully typed from the previous step
    prompt: (ctx) => `Deploy. Issues found: ${ctx.outputs.analysis!.issues.length}`,
    llmConfig: { tools: ['Bash'], model: 'opus', maxTurns: 30 },
  })
  .build();
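
For readers curious how a chained builder can extend the output type with each call, here is a heavily reduced sketch of the technique. `MiniBuilder` is hypothetical and unrelated to Claudi's real implementation: it supports only synchronous script-like steps and infers each output type from the callback's return value rather than from a Zod schema.

```typescript
type OutputMap = Record<string, unknown>;
type StoredStep = { key: string; run: (outputs: any) => unknown };

// Hypothetical mini builder: each .script() call returns a builder whose
// output type is the previous output type plus the newly added key.
class MiniBuilder<TOut extends OutputMap = {}> {
  private readonly steps: StoredStep[];

  constructor(steps: StoredStep[] = []) {
    this.steps = steps;
  }

  script<K extends string, V>(key: K, run: (outputs: TOut) => V): MiniBuilder<TOut & Record<K, V>> {
    return new MiniBuilder<TOut & Record<K, V>>([...this.steps, { key, run }]);
  }

  // Run the steps in order, storing each result under its key.
  run(): TOut {
    const outputs: OutputMap = {};
    for (const step of this.steps) {
      outputs[step.key] = step.run(outputs);
    }
    return outputs as TOut;
  }
}

const builderResult = new MiniBuilder()
  .script('scan', () => ({ files: ['a.ts', 'b.md'] }))
  // `outputs.scan.files` is typed as string[] here, with no manual annotation.
  .script('filtered', (outputs) => outputs.scan.files.filter((f) => f.endsWith('.ts')))
  .run();
console.log(builderResult.filtered);
```

The design choice worth noting is that the builder is immutable: every call returns a new instance with a wider type, which is what lets the compiler track outputs step by step.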

Builder methods:

Method                           Description
.agent(id, config)               Add an agent step. With outputKey/outputSchema, extends output types
.script(id, config)              Add a script step. With outputKey, extends output types
.sequence(id, config)            Add a sequence step (nested steps run sequentially)
.sequence(id, callback, opts)    Add a sequence via nested builder callback for type-safe composition
.parallel(id, config)            Add a parallel step (nested steps run concurrently)
.parallel(id, callback, opts)    Add a parallel via nested builder callback for type-safe composition
.map(id, config)                 Fan-out over a dynamic list (compiles to parallel with dynamic steps)
.gate(id, config)                Draft/critique/approve loop (compiles to loop with 3 agent children)
.converge(id, config)            Generate/fix convergence loop (compiles to loop with iteration-aware selection)
.outputs<T>()                    Declare additional output types from dynamic nested steps
.build()                         Produce the final Pipeline value object

Escape hatch for dynamic steps:

When a sequence or parallel step creates sub-steps with outputKey at runtime, use .outputs<>() to declare those types:

buildPipeline({ ... })
  .parallel('process', {
    description: 'Process items',
    steps: (ctx) => (ctx.outputs.items ?? []).map((item) => ({
      type: 'agent' as const,
      id: `task-${item.id}`,
      outputKey: `task-${item.id}`,
      // ...
    })),
  })
  .outputs<{ [x: `task-${string}`]: TaskOutput }>()
  .build();

Nested builder callbacks:

The .sequence() and .parallel() methods accept a callback overload for type-safe composition. The callback receives a NestedBuilder and the output types accumulate automatically:

buildPipeline({ id: 'example', version: '1.0.0', description: 'Example', inputSchema: z.object({}) })
  .sequence(
    'setup',
    (seq) =>
      seq
        .agent('scan', {
          description: 'Scan project',
          prompt: 'Scan the project structure',
          llmConfig: { model: 'sonnet', maxTurns: 10, tools: ['Read', 'Glob'] },
          outputKey: 'scan',
          outputSchema: z.object({ files: z.array(z.string()) }),
        })
        .script('filter', {
          description: 'Filter results',
          execute: async (ctx) => ({ relevant: ctx.outputs.scan!.files.filter((f) => f.endsWith('.ts')) }),
          outputKey: 'filtered',
        }),
    { description: 'Setup phase' }
  )
  .build();

Pattern-aware primitives:

High-level primitives that compile to existing step types (no runner changes needed):

// .map() — fan-out over a dynamic list
.map('process-files', {
  description: 'Process each file',
  over: (ctx) => ctx.outputs.scan!.files,
  step: (file) => ({ type: 'agent', id: `process-${file}`, description: `Process ${file}`, prompt: `Process: ${file}` }),
  maxConcurrency: 3,
  outputKey: 'results',
  outputSchema: z.object({ success: z.boolean() }),
})

// .gate() — draft/critique/approve loop
.gate('review', {
  description: 'Review until approved',
  maxIterations: 3,
  draft:    { description: 'Write draft',     prompt: 'Write a draft...', outputKey: 'draft',    outputSchema: DraftSchema,    llmConfig: { model: 'opus', maxTurns: 20 } },
  critique: { description: 'Critique draft',  prompt: 'Critique...',      outputKey: 'critique', outputSchema: CritiqueSchema, llmConfig: { model: 'sonnet', maxTurns: 10 } },
  approve:  { description: 'Approve or reject', prompt: 'Approve?',      outputKey: 'approval', outputSchema: ApprovalSchema, llmConfig: { model: 'sonnet', maxTurns: 5 } },
})

// .converge() — generate/fix convergence loop
.converge('implement', {
  description: 'Implement until tests pass',
  maxIterations: 5,
  generate: { description: 'Generate code',  prompt: 'Generate...', outputKey: 'code', outputSchema: CodeSchema, llmConfig: { model: 'opus', maxTurns: 30 } },
  fix:      { description: 'Fix test failures', prompt: 'Fix...',   outputKey: 'code', outputSchema: CodeSchema, llmConfig: { model: 'sonnet', maxTurns: 40 } },
  check: async (ctx) => { /* run tests, return result */ },
  checkOutputKey: 'testResult',
  until: (result) => result.allPassed,
})

Using definePipeline Helper

The definePipeline helper provides full TypeScript inference from your Zod schema:

import { definePipeline, z } from 'claudi/sdk';

export const pipeline = definePipeline({
  id: 'deploy',
  version: '1.0.0',
  description: 'Deploy to environment',
  inputSchema: z.object({
    environment: z.enum(['dev', 'staging', 'prod']).describe('Target environment'),
    dryRun: z.boolean().default(false).describe('Test without deploying'),
    tags: z.array(z.string()).optional().describe('Tags to apply'),
    maxRetries: z.number().int().min(0).max(10).default(3).describe('Max retry attempts'),
  }),
  steps: [
    {
      type: 'agent',
      id: 'deploy',
      description: 'Run deployment',
      // ctx.inputs is fully typed: { environment: 'dev'|'staging'|'prod', dryRun: boolean, ... }
      prompt: (ctx) => `Deploy to ${ctx.inputs.environment}. Dry run: ${ctx.inputs.dryRun}`,
    },
  ] as const,
});

Typed Outputs (Curried Form)

To type ctx.outputs with definePipeline, use the curried form. This lets you specify TOutputs explicitly while TInput is still inferred from inputSchema:

type DeployOutputs = {
  analysis: { summary: string; risk: 'low' | 'medium' | 'high' };
  result: { success: boolean; url: string };
};

export const pipeline = definePipeline<DeployOutputs>()({
  id: 'deploy',
  version: '1.0.0',
  description: 'Deploy with typed outputs',
  inputSchema: z.object({ environment: z.enum(['dev', 'prod']) }),
  steps: [
    {
      type: 'agent',
      id: 'analyze',
      description: 'Analyze',
      prompt: (ctx) => `Analyze ${ctx.inputs.environment}`, // ctx.inputs typed
      outputKey: 'analysis',
    },
    {
      type: 'agent',
      id: 'deploy',
      description: 'Deploy',
      prompt: (ctx) => `Risk: ${ctx.outputs.analysis?.risk}`, // ctx.outputs typed
      outputKey: 'result',
    },
  ],
});

Tip: Prefer buildPipeline when possible — it infers output types automatically from the chain, so you don't need to maintain a separate type definition.

Auto-Generated CLI Flags

When you define an inputSchema, the CLI automatically generates flags:

# Flags are auto-generated from schema
claudi run deploy --environment prod --dry-run --max-retries 5

# Boolean flags work as expected
claudi run deploy --environment staging --dry-run

# Array values can be repeated
claudi run deploy --environment dev --tags feature --tags release

Schema to Flag Mapping:

Zod Type               CLI Flag Type              Example
z.string()             --flag value               --name "my-app"
z.number()             --flag 123                 --retries 3
z.boolean()            --flag or --no-flag        --dry-run
z.enum([...])          --flag choice              --env prod
z.array(z.string())    --flag a --flag b          --tags v1 --tags v2
.optional()            Flag is optional           (not required)
.default(val)          Uses default if omitted    (has default)
.describe('...')       Shown in help text         (documentation)

Naming Convention:

  • Schema fields use camelCase: maxRetries
  • CLI flags use kebab-case: --max-retries
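
The naming convention can be expressed as a pair of small conversions. Both functions below are hypothetical helpers written for illustration; they handle simple camelCase names such as `maxRetries` and make no attempt to treat acronyms specially.

```typescript
// camelCase schema field -> kebab-case CLI flag (hypothetical helper).
function toCliFlag(field: string): string {
  return '--' + field.replace(/[A-Z]/g, (ch) => '-' + ch.toLowerCase());
}

// kebab-case CLI flag -> camelCase schema field (hypothetical helper).
function toSchemaField(flag: string): string {
  return flag.replace(/^--/, '').replace(/-([a-z])/g, (_match, ch: string) => ch.toUpperCase());
}

console.log(toCliFlag('maxRetries'));
console.log(toSchemaField('--dry-run'));
```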

Input Validation

Inputs are validated against the schema before execution:

# Valid - runs pipeline
claudi run deploy --environment prod

# Invalid - shows error
claudi run deploy --environment invalid
# Error: --environment: Invalid enum value. Expected 'dev' | 'staging' | 'prod'

# Missing required field
claudi run deploy
# Error: --environment: Required

Dry Run Validation

Use --dry-run to validate inputs without executing:

claudi run deploy --environment prod --dry-run
# ✓ Inputs validated successfully:
# {
#   "environment": "prod",
#   "dryRun": false,
#   "maxRetries": 3
# }
#
# Inputs:
#   --environment  Target environment (required) choices: "dev", "staging", "prod"
#   --dry-run      Test without deploying (default: false)
#   --tags         Tags to apply
#   --max-retries  Max retry attempts (default: 3)

Accessing Inputs in Steps

Use ctx.inputs (typed) instead of ctx.variables (untyped):

{
  type: 'script',
  id: 'check-env',
  description: 'Validate environment',
  execute: async (ctx) => {
    // ctx.inputs is typed from inputSchema
    const { environment, dryRun, maxRetries } = ctx.inputs;

    if (environment === 'prod' && !dryRun) {
      console.log('Production deployment with', maxRetries, 'retries');
    }

    return { validated: true };
  },
}

Without definePipeline

You can also define pipelines as plain objects that match the Pipeline shape. The pipeline loader accepts any object with the correct structure:

import { z } from 'claudi/sdk';

const inputSchema = z.object({
  target: z.string().describe('Target to process'),
});

export const pipeline = {
  id: 'process',
  version: '1.0.0',
  description: 'Process target',
  inputSchema,
  steps: [
    {
      type: 'agent' as const,
      id: 'process',
      description: 'Process',
      prompt: (ctx: { inputs: { target: string } }) => `Process: ${ctx.inputs.target}`,
    },
  ],
};

Configuration

Project Configuration

Create claudi.config.json in your project root:

{
  "claudiDir": ".pipelines",
  "llmConfig": {
    "model": "sonnet",
    "maxTurns": 50,
    "budgets": {
      "daily": 5.0,
      "monthly": 100.0
    },
    "tools": ["Read", "Write", "Glob", "Grep"],
    "plugins": [{ "type": "local", "path": "./my-plugin" }]
  }
}

The claudiDir field (default: .pipelines) consolidates all Claudi data under a single directory. Sub-paths are conventions derived by adapters:

  • definitions/ — Pipeline definition files
  • .reports/ — Run state persistence
  • .logs/ — Structured log files

Field                        Type        Default       Description
claudiDir                    string      .pipelines    Root directory for all Claudi data
llmConfig.model              string      sonnet        Default model
llmConfig.maxTurns           number      50            Default max turns
llmConfig.tools              string[]    -             Default tools for agent steps
llmConfig.plugins            object[]    -             SDK plugins to load
llmConfig.budgets.daily      number      -             Daily cost limit (USD)
llmConfig.budgets.monthly    number      -             Monthly cost limit (USD)

Pipeline LLM Config

Set LLM configuration at the pipeline level:

const pipeline: Pipeline = {
  id: 'my-pipeline',
  llmConfig: {
    model: 'sonnet',
    maxTurns: 50,
    timeout: 300000,           // 5 minutes
    tools: ['Read', 'Write', 'Bash'],
    plugins: [{ type: 'local', path: './custom-plugin' }],
    hooks: [
      { event: 'progress', handler: (event) => console.log(event) },
    ],
    retryOnFailure: true,
    maxRetries: 3,
    retryDelayMs: 1000,
    retryBackoffMultiplier: 2,
  },
  steps: [...],
};
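
The retry fields suggest exponential backoff. The sketch below computes the wait before each retry under the assumption that the delay is `retryDelayMs` multiplied by `retryBackoffMultiplier` once per previous attempt. That formula is an assumption for illustration; the source does not spell out how Claudi combines these values.

```typescript
interface RetrySketchConfig {
  maxRetries: number;
  retryDelayMs: number;
  retryBackoffMultiplier: number;
}

// Hypothetical helper: delay before retry n is retryDelayMs * multiplier^(n - 1).
function retryDelays(config: RetrySketchConfig): number[] {
  return Array.from(
    { length: config.maxRetries },
    (_unused, attempt) => config.retryDelayMs * config.retryBackoffMultiplier ** attempt,
  );
}

const delaysMs = retryDelays({ maxRetries: 3, retryDelayMs: 1000, retryBackoffMultiplier: 2 });
console.log(delaysMs); // [1000, 2000, 4000]
```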

Configuration Accumulation

Configuration flows through three levels: Project → Pipeline → Step. Values accumulate rather than replace:

Type                                 Behavior
Scalars (model, maxTurns)            Child overrides parent
Arrays (tools, disallowedTools)      Union with deduplication
Plugins                              Union by path (child wins on conflict)
Objects (agents, mcpServers, env)    Shallow merge (child wins on conflict)

Example:

// claudi.config.json
{ "llmConfig": { "tools": ["Read", "Write"] } }

// .pipelines/definitions/my-pipeline.ts
const pipeline: Pipeline = {
  llmConfig: { tools: ["Bash"] },  // Inherits + adds
  steps: [
    {
      type: 'agent',
      llmConfig: { tools: ["Glob"] },  // Inherits + adds
      // Final tools: ["Read", "Write", "Bash", "Glob"]
    },
  ],
};

Priority (for scalars):

  1. Step-level explicit config
  2. Pipeline llmConfig
  3. Project config (claudi.config.json)
  4. Built-in defaults
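
The accumulation rules can be sketched as a single merge function applied level by level. `LlmConfigSketch` and `mergeLlmConfig` are hypothetical and cover only a few representative fields (two scalars, one array, one object); the plugin rule is left out.

```typescript
interface LlmConfigSketch {
  model?: string;
  maxTurns?: number;
  tools?: string[];
  mcpServers?: Record<string, unknown>;
}

// Hypothetical merge of one parent level with one child level.
function mergeLlmConfig(parent: LlmConfigSketch, child: LlmConfigSketch): LlmConfigSketch {
  return {
    // Scalars: the child overrides the parent when it sets a value.
    model: child.model ?? parent.model,
    maxTurns: child.maxTurns ?? parent.maxTurns,
    // Arrays: union with deduplication, keeping the parent's order first.
    tools: Array.from(new Set([...(parent.tools ?? []), ...(child.tools ?? [])])),
    // Objects: shallow merge, the child wins on key conflicts.
    mcpServers: { ...parent.mcpServers, ...child.mcpServers },
  };
}

// Project -> Pipeline -> Step, applied in that order.
const configLevels: LlmConfigSketch[] = [
  { model: 'sonnet', tools: ['Read', 'Write'] }, // project
  { tools: ['Bash'] },                           // pipeline
  { model: 'opus', tools: ['Glob', 'Read'] },    // step
];
const effectiveConfig = configLevels.reduce(mergeLlmConfig, {});
console.log(effectiveConfig.model, effectiveConfig.tools);
```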

MCP Integration

Claudi supports Model Context Protocol servers for custom tools.

Stdio Servers (Local)

{
  type: 'agent',
  id: 'with-mcp',
  prompt: 'Query the database and save results to a file.',
  llmConfig: {
    mcpServers: {
      'filesystem': {
        command: 'node',
        args: ['./mcp-servers/filesystem.js'],
        env: { ROOT_DIR: '/project' },
      },
      'database': {
        command: 'npx',
        args: ['@modelcontextprotocol/server-postgres'],
        env: { DATABASE_URL: process.env.DATABASE_URL },
      },
    },
    tools: ['Read', 'MCP_filesystem_readFile', 'MCP_database_query'],
  },
}

HTTP Servers (Remote)

{
  llmConfig: {
    mcpServers: {
      'remote-api': {
        type: 'http',
        url: 'https://api.example.com/mcp',
        headers: { 'Authorization': `Bearer ${process.env.API_TOKEN}` },
      },
    },
  },
}

Subagents

Define specialized agents that Claude can invoke during execution:

{
  type: 'agent',
  id: 'orchestrator',
  prompt: 'Implement this feature. Use the code-reviewer to verify your changes and test-runner to ensure tests pass.',
  llmConfig: {
    tools: ['Read', 'Write', 'Task'],  // Include 'Task' to enable subagent invocation
    agents: {
      'code-reviewer': {
        description: 'Expert code reviewer for security and quality',
        prompt: 'You are a code review specialist. Focus on security vulnerabilities, performance issues, and code quality.',
        tools: ['Read', 'Grep', 'Glob'],
        model: 'sonnet',
      },
      'test-runner': {
        description: 'Runs and analyzes test suites',
        prompt: 'You are a test execution specialist. Run tests and analyze failures.',
        tools: ['Bash', 'Read', 'Grep'],
      },
      'documentation-writer': {
        description: 'Generates technical documentation',
        prompt: 'You are a technical writer. Create clear, comprehensive documentation.',
        tools: ['Read', 'Write'],
      },
    },
  },
}

Claude will autonomously decide when to invoke subagents based on the task.


Human Input

Enable agents to request input from human operators during execution. This is useful for approval workflows, clarification requests, and interactive decision points.

Enabling Human Input

Add RequestInput to the step's tools array:

{
  type: 'agent',
  id: 'interactive-deploy',
  description: 'Deploy with human approval',
  prompt: 'Deploy this application. Ask for confirmation before deploying to production.',
  llmConfig: { tools: ['Read', 'Write', 'RequestInput'] },
}

Input Types

The RequestInput tool supports four input types:

Type       Description                             Example Use Case
text       Free-form text input                    "Enter the deployment message"
confirm    Yes/no confirmation                     "Deploy to production?"
choice     Single/multiple selection from list     "Select target environment"
content    Multi-line content (logs, code, etc)    "Paste the error log"

Example agent prompt:

When ready to deploy, use the RequestInput tool to ask:
- type: "confirm"
- prompt: "Ready to deploy to production?"

If the user confirms, proceed with deployment.

Human Input Configuration

Configure timeout and fallback behavior with the humanInput field:

{
  type: 'agent',
  id: 'interactive-step',
  prompt: 'Analyze and request clarification if needed.',
  llmConfig: {
    tools: ['Read', 'RequestInput'],
    humanInput: {
      mode: 'auto',           // 'interactive' | 'file' | 'auto'
      defaultTimeout: 300,    // 5 minutes (in seconds)
      defaultOnTimeout: 'fail', // 'fail' | 'default' | 'skip'
    },
  },
}

Configuration Options:

Field               Type                               Default    Description
mode                'interactive' | 'file' | 'auto'    'auto'     Input collection mode
defaultTimeout      number                             -          Default timeout in seconds (undefined = wait)
defaultOnTimeout    'fail' | 'default' | 'skip'        'fail'     Behavior when timeout occurs

Mode Descriptions:

  • interactive — Prompt via terminal readline (requires TTY)
  • file — Write request to file, poll for response file (for background/CI)
  • auto — Use interactive if TTY available, otherwise file-based

Timeout Behaviors:

  • fail — Throw error and fail the step
  • default — Use the default value from the request (if provided)
  • skip — Return empty response and continue
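
The mode and timeout rules above can be sketched as two small pure functions. Both are hypothetical illustrations rather than Claudi's code. One point the source leaves open is what `default` does when the request carries no default value; the sketch assumes that case fails.

```typescript
type InputMode = 'interactive' | 'file' | 'auto';
type OnTimeout = 'fail' | 'default' | 'skip';
type TimeoutResult = { status: 'answered'; value: unknown } | { status: 'skipped' };

// 'auto' picks interactive when a TTY is available, otherwise file-based.
function resolveInputMode(mode: InputMode, hasTty: boolean): 'interactive' | 'file' {
  if (mode !== 'auto') return mode;
  return hasTty ? 'interactive' : 'file';
}

// Decide what a timed-out request resolves to.
function resolveTimeout(behavior: OnTimeout, request: { default?: unknown }): TimeoutResult {
  switch (behavior) {
    case 'fail':
      throw new Error('Human input request timed out');
    case 'default':
      // Assumption: with no default on the request, treat the timeout as a failure.
      if (request.default === undefined) {
        throw new Error('Timed out and the request has no default value');
      }
      return { status: 'answered', value: request.default };
    case 'skip':
      return { status: 'skipped' };
  }
}

console.log(resolveInputMode('auto', false));
console.log(resolveTimeout('default', { default: false }));
```

In a real process, the `hasTty` argument would typically come from something like `Boolean(process.stdin.isTTY)`.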

Dynamic Configuration

Human input configuration supports dynamic values:

{
  type: 'agent',
  id: 'conditional-approval',
  prompt: 'Process changes with appropriate approvals.',
  llmConfig: (ctx) => ({
    tools: ['Read', 'RequestInput'],
    humanInput: {
      mode: ctx.variables.ci ? 'file' : 'auto',
      defaultTimeout: ctx.variables.environment === 'prod' ? 600 : 120,
      defaultOnTimeout: ctx.variables.allowSkip ? 'skip' : 'fail',
    },
  }),
}

Human Input Events

Subscribe to human input events for logging, UI updates, or custom handling:

eventBus.on('step:human-input-requested', (event) => {
  console.log(`Input requested: ${event.request.prompt}`);
  console.log(`Type: ${event.request.type}`);
});

eventBus.on('step:human-input-received', (event) => {
  console.log(`Input received from: ${event.response.source}`);
});

eventBus.on('step:human-input-timeout', (event) => {
  console.log(`Input timed out, behavior: ${event.behavior}`);
});

Event Types:

Event                         Description
step:human-input-requested    Agent requested input from human
step:human-input-received     Human provided input response
step:human-input-timeout      Input request timed out

File-Based Input (CI/Background)

For non-interactive environments, the file collector writes requests and polls for responses:

.pipelines/.human-input/
├── request-{uuid}.json     # Written by Claudi
└── response-{uuid}.json    # Written by external process/human

Request file format:

{
  "type": "confirm",
  "prompt": "Deploy to production?",
  "default": false,
  "timeout": 300
}

Response file format:

{
  "value": true
}

This enables integration with external approval systems, Slack bots, or web dashboards.
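
As a rough sketch of what an external responder might do, the helpers below derive the response file name from a request file name and build the response body. Only the file naming and JSON shapes come from the formats above; `responsePathFor`, `buildResponse`, and `ConfirmRequest` are hypothetical, and reading or writing the files is left to the caller.

```typescript
// Hypothetical responder helpers; not part of Claudi's API.
interface ConfirmRequest {
  type: string;
  prompt: string;
  default?: unknown;
  timeout?: number;
}

// Map ".../request-{uuid}.json" to ".../response-{uuid}.json".
function responsePathFor(requestPath: string): string | null {
  const match = /^(.*\/)?request-([^/]+)\.json$/.exec(requestPath);
  if (!match) return null;
  const dir = match[1] ?? '';
  return `${dir}response-${match[2]}.json`;
}

// Build the JSON body an approver would write for a confirm request.
function buildResponse(request: ConfirmRequest, approved: boolean): string {
  if (request.type !== 'confirm') {
    throw new Error(`Unsupported request type: ${request.type}`);
  }
  return JSON.stringify({ value: approved }, null, 2);
}
```

A Slack bot or dashboard would watch the directory, show `prompt` to a reviewer, and write the string from `buildResponse` to the path from `responsePathFor`.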


Progress Display

Claudi provides Docker BuildKit-style progress visualization:

Pipeline: implement
Run ID: impl-abc123

Steps:
  ✓ analyze          00:15  ↑ 2.1k  ↓ 1.8k  $0.02
  ● implement        01:23  ↑ 5.4k  ↓ 12.3k $0.08  Writing src/auth.ts
    ├─ auth          00:45  ✓
    └─ dashboard     00:38  ● Reading components...
  ○ test             —
  ○ deploy           —

Total: $0.10 | 2/4 steps | 01:38 elapsed

Status Indicators:

  • ✓ Completed
  • Failed
  • ● Running
  • ○ Pending
  • Skipped

Metrics Displayed:

  • Duration per step
  • Token counts (↑ input, ↓ output)
  • Cost per step
  • Current tool activity
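
The metric columns in the sample display can be reproduced with a few small formatters. This is a sketch only; the function names are hypothetical and the exact rounding Claudi applies is an assumption.

```typescript
// Illustrative formatters; not Claudi's rendering code.
function formatDuration(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${String(minutes).padStart(2, '0')}:${String(seconds).padStart(2, '0')}`;
}

// Counts below 1000 are shown as-is, larger counts with one decimal and a "k" suffix.
function formatTokens(count: number): string {
  return count < 1000 ? String(count) : `${(count / 1000).toFixed(1)}k`;
}

function formatCost(usd: number): string {
  return `$${usd.toFixed(2)}`;
}

console.log(formatDuration(83), formatTokens(5400), formatCost(0.08));
// 01:23 5.4k $0.08
```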

Non-Interactive Mode

For CI/CD environments:

claudi run implement --quiet  # Minimal output

Or set the CI=true environment variable for automatic detection.


Logging

Claudi logs to JSONL files with automatic run ID correlation. Every log entry emitted during a pipeline run is tagged with the runId via AsyncLocalStorage-based ambient context — no manual threading required.

Location: .pipelines/.logs/claudi-{YYYY-MM-DD}.log

Log Entry Format:

{
  "ts": "2026-01-15T10:30:45.123Z",
  "level": "info",
  "msg": "Pipeline started",
  "runId": "impl-abc123",
  "stepId": "analyze",
  "sessionId": "sess-xyz",
  "durationMs": 1523,
  "costUsd": 0.02
}

Log Levels: error, warn, info, debug

Querying Logs

Use the claudi logs command for filtered, human-readable output:

# View logs from most recent run
claudi logs

# Filter by level
claudi logs --last --level error

# Filter by step
claudi logs --last --step analyze

# Tail in real-time
claudi logs --follow

# Raw JSONL for piping
claudi logs --last --json | jq .

For advanced queries, pipe raw JSONL through jq:

# Steps taking > 10 seconds
claudi logs --last --json | jq 'select(.durationMs > 10000)'

# Cost tracking
claudi logs --last --json | jq 'select(.costUsd != null) | {stepId, costUsd}'
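
When the analysis outgrows a jq one-liner, the same JSONL can be processed in TypeScript. The sketch below assumes the entry fields shown in the log entry format above; `parseJsonl` and `costByStep` are hypothetical helpers, not part of Claudi.

```typescript
// Field names follow the log entry format in this README; the helpers are illustrative.
interface LogEntry {
  ts: string;
  level: 'error' | 'warn' | 'info' | 'debug';
  msg: string;
  runId?: string;
  stepId?: string;
  durationMs?: number;
  costUsd?: number;
}

// Parse JSONL text (one JSON object per line), ignoring blank lines.
function parseJsonl(text: string): LogEntry[] {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as LogEntry);
}

// Sum costUsd per step, skipping entries without a cost or step id.
function costByStep(entries: LogEntry[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const entry of entries) {
    if (entry.stepId === undefined || entry.costUsd === undefined) continue;
    totals.set(entry.stepId, (totals.get(entry.stepId) ?? 0) + entry.costUsd);
  }
  return totals;
}
```

Feed `parseJsonl` the captured output of `claudi logs --last --json`, then pass the result to `costByStep` for a per-step total.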

Error Handling

Error Types

| Type | Description |
| --- | --- |
| step_failed | Generic step failure |
| timeout | Step exceeded timeout |
| budget_exceeded | Cost limit reached |
| aborted | User/system abort |
| rate_limited | API rate limit hit |
| validation_failed | Output schema validation failed |
| unknown | Unexpected error |

Retry Configuration

const pipeline: Pipeline = {
  llmConfig: {
    retryOnFailure: true,
    maxRetries: 3,
    retryDelayMs: 1000,
    retryBackoffMultiplier: 2,  // Exponential backoff
  },
  steps: [...],
};
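
To see what these settings imply, the sketch below computes the wait before each retry. It assumes the usual exponential formula, delay = retryDelayMs × retryBackoffMultiplier^(attempt − 1); whether Claudi adds jitter or caps the delay is not stated here. `retryDelays` is a hypothetical helper.

```typescript
// Assumed formula: delay before retry n (1-based) = retryDelayMs * multiplier^(n - 1).
interface RetryConfig {
  maxRetries: number;
  retryDelayMs: number;
  retryBackoffMultiplier: number;
}

function retryDelays(config: RetryConfig): number[] {
  return Array.from(
    { length: config.maxRetries },
    (_, attempt) => config.retryDelayMs * config.retryBackoffMultiplier ** attempt,
  );
}

console.log(
  retryDelays({ maxRetries: 3, retryDelayMs: 1000, retryBackoffMultiplier: 2 }).join(', '),
);
// 1000, 2000, 4000
```

Under that assumption, the configuration above would wait about 7 seconds in total across its three retries.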

Smart Error Suggestions

Claudi provides context-specific suggestions:

Error: Step 'analyze' timed out after 300000ms

Suggestions:
  • Resume with longer timeout:
    claudi run resume impl-abc123 --timeout 600000
  • Check step complexity - consider breaking into smaller steps
  • Review recent logs: claudi logs impl-abc123

Resume Failed Runs

# Check what failed
claudi run show impl-abc123

# Resume from failure point
claudi run resume impl-abc123

Programmatic API

Claudi exposes an SDK entry point (claudi/sdk) for defining pipelines programmatically. Two approaches are available:

import { buildPipeline, definePipeline, z } from 'claudi/sdk';
import type { Pipeline, PipelineStep, StepContext } from 'claudi/sdk';

// Builder pattern (recommended) — output types accumulate automatically
const pipeline = buildPipeline({
  id: 'my-pipeline',
  version: '1.0.0',
  description: 'My custom pipeline',
  inputSchema: z.object({ target: z.string() }),
})
  .agent('analyze', {
    description: 'Analyze code',
    prompt: (ctx) => `Analyze: ${ctx.inputs.target}`,
    llmConfig: { tools: ['Read', 'Grep'], model: 'sonnet', maxTurns: 20 },
    outputKey: 'analysis',
    outputSchema: z.object({ summary: z.string() }),
  })
  .build();

// definePipeline — direct definition with optional typed outputs
const pipeline2 = definePipeline({
  id: 'simple',
  version: '1.0.0',
  description: 'Simple pipeline',
  inputSchema: z.object({ target: z.string() }),
  steps: [
    {
      type: 'agent',
      id: 'analyze',
      description: 'Analyze code',
      prompt: (ctx) => `Analyze: ${ctx.inputs.target}`,
    },
  ],
});

export { pipeline };

The CLI entry point (claudi/cli) bootstraps the DI container and resolves the full application:

import { runCLI } from 'claudi/cli';

await runCLI();

Ports and Adapters

The hexagonal architecture separates concerns through ports (interfaces) and adapters (implementations). To provide a custom adapter for any driven port, implement the port interface and register it in the composition layer:

// Implement a driven port (e.g., custom state persistence)
import type { PipelineStateRepository } from '@ports/driven';

class CustomStateRepository implements PipelineStateRepository {
  async save(state) {
    /* Save to a database, S3, etc. (default location: .pipelines/.reports/) */
  }
  async load(runId) {
    /* Load from storage */
  }
  async list() {
    /* List all runs */
  }
}
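
For a fully worked adapter, here is a self-contained in-memory variant. The `RunState` and `StateRepository` shapes are hypothetical stand-ins so the sketch runs on its own; the real `PipelineStateRepository` and state types live in Claudi's ports and domain layers and may differ.

```typescript
// Hypothetical stand-ins for the real port and domain types.
interface RunState {
  runId: string;
  status: 'running' | 'completed' | 'failed';
}

interface StateRepository {
  save(state: RunState): Promise<void>;
  load(runId: string): Promise<RunState | undefined>;
  list(): Promise<RunState[]>;
}

class InMemoryStateRepository implements StateRepository {
  private readonly runs = new Map<string, RunState>();

  async save(state: RunState): Promise<void> {
    // Store a copy so later mutations by the caller do not leak in.
    this.runs.set(state.runId, { ...state });
  }

  async load(runId: string): Promise<RunState | undefined> {
    const found = this.runs.get(runId);
    return found ? { ...found } : undefined;
  }

  async list(): Promise<RunState[]> {
    return Array.from(this.runs.values()).map((run) => ({ ...run }));
  }
}
```

An adapter like this is also a convenient test double when mocking driven ports in application-layer tests.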

Event Handling

Subscribe to pipeline events via the EventBus driven port:

// Events are defined in the domain layer
// Adapters subscribe through the EventBus port

eventBus.on('step:start', (event) => {
  console.log(`Starting step: ${event.stepId}`);
});

eventBus.on('step:complete', (event) => {
  console.log(`Completed: ${event.stepId} ($${event.costUsd})`);
});

eventBus.on('step:fail', (event) => {
  console.error(`Failed: ${event.stepId} - ${event.error.message}`);
});
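
To make the subscription pattern concrete, here is a minimal typed event bus with a cost-aggregating subscriber. `TinyEventBus` and the `StepEvents` payload shapes are hypothetical; Claudi's EventBus port may expose a different interface and richer events.

```typescript
// Hypothetical payloads, modeled on the fields used in the handlers above.
interface StepEvents {
  'step:start': { stepId: string };
  'step:complete': { stepId: string; costUsd: number };
  'step:fail': { stepId: string; error: Error };
}

class TinyEventBus<Events> {
  private readonly handlers = new Map<keyof Events, Array<(event: unknown) => void>>();

  on<K extends keyof Events>(name: K, handler: (event: Events[K]) => void): void {
    const list = this.handlers.get(name) ?? [];
    // Stored untyped; emit only ever passes the payload type registered for this name.
    list.push(handler as (event: unknown) => void);
    this.handlers.set(name, list);
  }

  emit<K extends keyof Events>(name: K, event: Events[K]): void {
    for (const handler of this.handlers.get(name) ?? []) handler(event);
  }
}

const bus = new TinyEventBus<StepEvents>();
let totalCostUsd = 0;
bus.on('step:complete', (event) => {
  totalCostUsd += event.costUsd;
});
bus.emit('step:complete', { stepId: 'analyze', costUsd: 0.5 });
bus.emit('step:complete', { stepId: 'implement', costUsd: 0.25 });
console.log(totalCostUsd); // 0.75
```

Keying the payload type on the event name means a handler for 'step:complete' cannot accidentally read `error`, which only exists on 'step:fail'.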

Architecture

Claudi uses a hexagonal (ports & adapters) architecture with five layers. Dependencies flow strictly inward -- outer layers depend on inner layers, never the reverse. Boundary rules are enforced at lint time by eslint-plugin-boundaries (configured in boundary-spec.js).

For full architectural details -- layer responsibilities, data flow, DI patterns, and design decisions -- see ARCHITECTURE.md.

                  ┌────────────────────────────────────────┐
                  │           composition/                  │
                  │   DI wiring (tsyringe bootstrap)       │
                  │   Tokens, modules, container setup     │
                  └──────────┬──────────────────┬──────────┘
                             │                  │
        ┌────────────────────▼───┐    ┌────────▼────────────────┐
        │  adapters/ DRIVING     │    │  adapters/ DRIVEN       │
        │  ┌───────────────────┐ │    │  ┌────────────────────┐ │
        │  │ CLI (yargs/ink)   │ │    │  │ Claude LLM Gateway │ │
        │  │ SDK (buildPipe-   │ │    │  │ File State Repo    │ │
        │  │  line)            │ │    │  │ Config Manager     │ │
        │  └───────────────────┘ │    │  │ Logger, EventBus   │ │
        │                        │    │  │ Input Collectors   │ │
        └───────────┬────────────┘    │  └────────────────────┘ │
                    │                 └──────────┬──────────────┘
                    │                            │
        ┌───────────▼────────────────────────────▼──┐
        │              application/                  │
        │  ┌──────────────────────────────────────┐ │
        │  │ Use Cases: init, run, resume, list,  │ │
        │  │   show, cost-summary, view-logs      │ │
        │  ├──────────────────────────────────────┤ │
        │  │ Services: pipeline-runner,           │ │
        │  │   step-runner, step-runner strategies│ │
        │  ├──────────────────────────────────────┤ │
        │  │ Errors: abort, budget, timeout,      │ │
        │  │   validation, checkpoint-pause, ...  │ │
        │  └──────────────────────────────────────┘ │
        └───────────────────┬───────────────────────┘
                            │
        ┌───────────────────▼───────────────────────┐
        │                ports/                      │
        │  Driving:  *.usecase.ts (init, run, ...)  │
        │  Driven:   *.port.ts (LLM, state, config) │
        │  Internal: runner, step-runner, strategy  │
        └───────────────────┬───────────────────────┘
                            │
        ┌───────────────────▼───────────────────────┐
        │                domain/                     │
        │  Entities:  PipelineState, StepState      │
        │  Values:    Pipeline, PipelineStep,       │
        │             StepContext, StepResult,       │
        │             LLMConfig, LLMMessage,        │
        │             ClaudiConfig                   │
        │  Errors:    StepError                     │
        │  Events:    16 event classes              │
        │  (Pure business logic, zero dependencies) │
        └───────────────────────────────────────────┘

Dependency rules:

| Layer | Can depend on |
| --- | --- |
| domain | nothing |
| ports | domain |
| application | domain, ports |
| adapters | domain, ports (NOT application) |
| composition | everything |
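
The dependency rules above can be expressed as data and checked with a one-line function. This sketch is only a way to reason about the rules; it is not how eslint-plugin-boundaries is configured, and `canImport` is a hypothetical name.

```typescript
// Encodes the dependency rules as a lookup table.
type Layer = 'domain' | 'ports' | 'application' | 'adapters' | 'composition';

const allowed: Record<Layer, readonly Layer[]> = {
  domain: [],
  ports: ['domain'],
  application: ['domain', 'ports'],
  adapters: ['domain', 'ports'],
  composition: ['domain', 'ports', 'application', 'adapters'],
};

function canImport(from: Layer, to: Layer): boolean {
  // Assumption: imports within the same layer are always allowed.
  return from === to || allowed[from].includes(to);
}

console.log(canImport('adapters', 'application')); // false
```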

Key patterns:

  • DI via tsyringe -- Injection tokens in tokens.ts, modules register bindings, bootstrap() wires the container
  • Strategy pattern -- Step runner strategies (agent, script, parallel, sequence) registered via multi-registration
  • Child container per request -- Isolation between pipeline runs
  • Path aliases -- @domain/*, @ports/*, @application/*, @adapters/*, @composition/* (never relative cross-layer imports)
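
The strategy pattern listed above can be illustrated without a DI container: each step type gets one strategy, and the runner picks the strategy by `type`. All names here are hypothetical, the run is synchronous, and the tsyringe multi-registration Claudi uses is replaced by a plain array.

```typescript
// Hypothetical types; a simplified stand-in for the real step runner strategies.
type StepType = 'agent' | 'script' | 'parallel' | 'sequence';

interface Step {
  type: StepType;
  id: string;
  steps?: Step[];
}

type RunChild = (child: Step) => string[];

interface StepStrategy {
  readonly type: StepType;
  run(step: Step, runChild: RunChild): string[];
}

// Run every nested step and concatenate the traces in order.
const runChildren = (step: Step, runChild: RunChild): string[] =>
  (step.steps ?? []).reduce<string[]>((acc, child) => acc.concat(runChild(child)), []);

const strategies: StepStrategy[] = [
  { type: 'agent', run: (step) => [`agent:${step.id}`] },
  { type: 'script', run: (step) => [`script:${step.id}`] },
  // A real parallel strategy would run children concurrently; this sketch only records order.
  { type: 'parallel', run: runChildren },
  { type: 'sequence', run: runChildren },
];

function runStep(step: Step): string[] {
  const strategy = strategies.find((candidate) => candidate.type === step.type);
  if (!strategy) throw new Error(`No strategy for step type: ${step.type}`);
  return strategy.run(step, runStep);
}
```

Because container strategies receive `runStep` itself as the child runner, sequences can nest inside parallel steps and vice versa with no extra code.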

Development

Setup

# Clone repository
git clone https://github.com/your-org/claudi.git
cd claudi

# Install dependencies
bun install

Scripts

bun run dev          # Hot-reload CLI development
bun run build        # Build JS bundles + type declarations (dist/)
bun run build:bin    # Compile to standalone binary (dist/claudi)
bun run build:js     # Build CLI + SDK bundles (dist/)
bun run build:types  # Bundle SDK type declarations (dist/typings/)
bun run clean        # Remove dist/ and node_modules/
bun run test         # Run tests with coverage
bun run typecheck    # TypeScript checking (tsc --noEmit)
bun run lint         # ESLint with architectural boundary enforcement
bun run lint:fix     # Auto-fix lint issues
bun run format       # Format code with Prettier
bun run format:check # Check formatting without writing

Testing

Tests live in tests/ and mirror the src/ structure. Mock driven ports for application-layer tests.

bun test                    # All tests
bun test tests/specific.ts  # Single file

Boundary Enforcement

Architectural layer dependencies are enforced by eslint-plugin-boundaries at lint time. The rules are defined in boundary-spec.js. Violations will not show as TypeScript errors -- run bun run lint to catch them.

# Check for boundary violations
bun run lint

License

MIT


Contributing

Contributions are welcome! Please read the contributing guidelines and submit pull requests to the main repository.
