Skip to content

feat: add harness executor for coding agent CLI orchestration#1978

Merged
yottahmd merged 17 commits intomainfrom
feat/harness-executor
Apr 10, 2026
Merged

feat: add harness executor for coding agent CLI orchestration#1978
yottahmd merged 17 commits intomainfrom
feat/harness-executor

Conversation

@yottahmd
Copy link
Copy Markdown
Collaborator

@yottahmd yottahmd commented Apr 9, 2026

Summary

  • Add new type: harness step executor that spawns coding agent CLIs (Claude Code, Codex, OpenCode, Pi) as subprocess workflow steps
  • Each provider maps a unified config (model, effort, max_turns, output_format) to its CLI-specific flags, with provider-specific options and an extra_flags escape hatch
  • Update agent skill references (codingagent.md, executors.md) to use type: harness instead of bare shell scripts

Test plan

  • go build ./... passes
  • make lint — 0 issues
  • go test -race ./internal/runtime/builtin/harness/... — 33 tests pass
  • Manual test with Claude Code installed: dagu start a DAG with type: harness + provider: claude

Summary by CodeRabbit

  • New Features

    • Added a new harness executor to run external coding-agent CLIs (claude, codex, copilot, opencode, pi) as DAG steps, with provider registry, customizable invocation, fallback ordering, and explicit exit/timeout semantics.
    • Supports step/DAG-level harness defaults, named harness definitions, provider interpolation, flag-generation from config, and stdin piping for scripts.
  • Documentation

    • Added comprehensive harness executor reference and updated coding-agent docs with examples and behavior notes.

yottahmd added 2 commits April 9, 2026 16:14
Add a new `type: harness` step executor that spawns coding agent CLIs
(Claude Code, Codex, OpenCode, Pi) as subprocess workflow steps. Each
provider maps a unified config (model, effort, max_turns, output_format)
to its CLI-specific flags, with provider-specific options and an
extra_flags escape hatch for forward compatibility.
Rewrite codingagent.md to use the harness executor instead of bare
shell scripts. Add harness section to executors.md reference.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 9, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 928ebca5-0639-4e33-b295-2ec496f4db34

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new "harness" executor to run external coding-agent CLI binaries as subprocesses, a provider registry with built-in providers (claude, codex, copilot, opencode, pi), schema/spec support and DAG-level harness config, and comprehensive tests and docs for harness behavior and merging semantics.

Changes

Cohort / File(s) Summary
Executor Core
internal/runtime/builtin/harness/harness.go
New harness executor: builds/validates harness config, resolves provider/binary, constructs CLI args, spawns subprocess with env/stdin/stdout/stderr handling, supports fallback attempts, handles cancellation and exit-code mapping.
Provider Framework
internal/runtime/builtin/harness/provider.go, internal/runtime/builtin/builtin.go, internal/runtime/builtin/harness/schema.go
Introduces Provider interface (Name(), BinaryName(), BaseArgs(prompt string)), registry helpers (registerProvider, getProvider), package registration (blank import added to builtin entrypoint), and executor config JSON Schema registration.
Built-in Providers
internal/runtime/builtin/harness/claude.go, internal/runtime/builtin/harness/codex.go, internal/runtime/builtin/harness/copilot.go, internal/runtime/builtin/harness/opencode.go, internal/runtime/builtin/harness/pi.go
Adds five built-in provider implementations registering themselves and returning provider identity and base CLI-arg construction for a prompt.
Core Types & Utilities
internal/core/harness.go, internal/core/dag.go
Adds core HarnessDefinition types, enums (prompt mode/position/flag style), builtin provider listing helpers, and DAG-level Harness/Harnesses fields with cloning helpers.
Spec / Loader / Merge
internal/core/spec/dag.go, internal/core/spec/loader.go, internal/core/spec/loader_test.go, internal/core/spec/builder_test.go
Parses/validates top-level harness and harnesses in spec loader, merges/overrides harness definitions with custom mergo transformer, and adds tests for inheritance/merge/validation and blank imports to register builtins.
Step Build & Runtime Integration
internal/core/spec/step.go, internal/core/dag.go
Infers step type harness when DAG-level harness present, merges DAG/step harness configs with fallback inheritance rules, and extends DAG clone to deep-copy harness config.
Schema & Validation Tests
internal/cmn/schema/dag.schema.json, internal/cmn/schema/dag_schema_test.go
Adds schema entries for DAG-level harness/harnesses, new executorType enum value "harness", and tests validating harness-related schema behavior.
Harness Package Tests
internal/runtime/builtin/harness/harness_test.go
Comprehensive unit tests for provider selection, arg building, config normalization/coercion, fallback semantics, execution behavior (cwd, PATH resolution, timeouts), prompt extraction, and exit codes.
Docs
skills/dagu/references/codingagent.md, skills/dagu/references/executors.md
Replaces prior agent CLI docs with unified harness executor docs, examples and notes on prompt/flag mapping, stdin/script piping, output capture, fallback semantics, and exit codes.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DAG as DAG Step
    participant Executor as Harness Executor
    participant Registry as Provider Registry
    participant Provider as Provider Impl
    participant CLI as CLI Binary

    User->>DAG: trigger step (type: harness)
    DAG->>Executor: Run(ctx, step)
    Executor->>Executor: decode/merge harness config
    Executor->>Registry: getProvider(providerName)
    Registry-->>Executor: Provider instance
    Executor->>Executor: resolve binary path (PATH / workdir)
    Executor->>Provider: BaseArgs(prompt)
    Provider-->>Executor: CLI args
    Executor->>CLI: start subprocess (args, env, cwd, stdin)
    CLI-->>Executor: stdout / stderr streams
    Executor->>Executor: spool stdout, tail stderr
    Executor->>CLI: wait for completion or ctx cancel
    alt ctx cancelled
        Executor->>CLI: kill process group
        CLI-->>Executor: terminated (exit 124)
    end
    Executor-->>DAG: result (stdout file/stream, exit code, stderr tail)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: introducing a new harness executor for orchestrating coding agent CLI tools as workflow steps.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/harness-executor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
skills/dagu/references/codingagent.md (1)

55-68: Clarify the dual use of command: and script: in harness steps.

The example shows both command: and script: fields. While line 88 explains the behavior, the example at lines 58-66 could be confusing since the script: content references ${RESEARCH} which would be interpolated before being piped to stdin. Consider adding an inline comment in the YAML example clarifying this interaction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/dagu/references/codingagent.md` around lines 55 - 68, The YAML example
for the harness step named "review" is confusing because it shows both the
script: block (containing ${RESEARCH}) and a separate command: field without
clarifying which is used; update the example by adding a short inline comment
next to the script or command explaining that when both are present the script
is interpolated (variables like ${RESEARCH} are expanded) and its output is
piped to the command's stdin, so the command ("Review the research...") reads
the interpolated script from stdin. Reference the "review" step, the script:
block, the command: field, and the ${RESEARCH} variable in the comment so
readers understand the interaction.
internal/runtime/builtin/harness/provider.go (1)

27-33: Consider generating the supported provider list dynamically.

The error message hardcodes "claude, codex, opencode, pi" which could become stale if providers are added or removed. Consider generating this list from the registry keys.

♻️ Proposed refactor to generate provider list dynamically
 func getProvider(name string) (Provider, error) {
 	p, ok := providers[name]
 	if !ok {
-		return nil, fmt.Errorf("harness: unknown provider %q; supported: claude, codex, opencode, pi", name)
+		supported := make([]string, 0, len(providers))
+		for k := range providers {
+			supported = append(supported, k)
+		}
+		return nil, fmt.Errorf("harness: unknown provider %q; supported: %v", name, supported)
 	}
 	return p, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runtime/builtin/harness/provider.go` around lines 27 - 33, The error
in getProvider hardcodes the supported provider names; instead, build the
supported list dynamically from the providers registry: iterate over the map
keys of providers, collect them into a slice, sort if deterministic output is
desired, join with commas (e.g., strings.Join) and use that generated string in
the fmt.Errorf so the message always reflects the actual entries in providers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/runtime/builtin/harness/harness.go`:
- Around line 215-217: The current validation only checks that step.Commands is
non-empty but allows commands with empty prompts to slip through; update the
validation in the harness creation path (the block that currently checks if
len(step.Commands) == 0) to also iterate over step.Commands and reject any
command whose prompt/text field is empty (the same field that newHarness later
expects), returning a clear error (e.g., "harness: command prompt is required")
referencing the step.Commands entries; ensure this check runs before calling
newHarness so invalid steps are caught early.

In `@internal/runtime/builtin/harness/pi.go`:
- Around line 35-40: The current argument assembly can emit both cfg.Tools and
cfg.NoTools, producing conflicting flags; update the logic in the pi.go
argument-building code so the flags are mutually exclusive: check cfg.NoTools
first and if true only append "--no-tools", otherwise if cfg.Tools is non-empty
append "--tools" with its value (i.e., ensure you do not append "--tools" when
cfg.NoTools is set), making the decision around the same args slice where
cfg.Tools and cfg.NoTools are currently handled.

In `@skills/dagu/references/codingagent.md`:
- Around line 228-235: The Codex model names in the "Model tiers" table are
incorrect; update the Codex column entries (currently 'gpt-5.1-codex-mini',
'gpt-5.4', 'gpt-5.3-codex') to real identifiers (e.g., cheap ->
'codex-mini-latest' or 'GPT-5.1-Codex-Max'; balanced -> 'gpt-5-codex' or
'GPT-5.2-Codex'; most capable -> 'GPT-5.3-Codex') or explicitly mark them as
placeholders, ensuring the table in skills/dagu/references/codingagent.md and
the "Model tiers" section uses accurate, correctly-cased model names.

---

Nitpick comments:
In `@internal/runtime/builtin/harness/provider.go`:
- Around line 27-33: The error in getProvider hardcodes the supported provider
names; instead, build the supported list dynamically from the providers
registry: iterate over the map keys of providers, collect them into a slice,
sort if deterministic output is desired, join with commas (e.g., strings.Join)
and use that generated string in the fmt.Errorf so the message always reflects
the actual entries in providers.

In `@skills/dagu/references/codingagent.md`:
- Around line 55-68: The YAML example for the harness step named "review" is
confusing because it shows both the script: block (containing ${RESEARCH}) and a
separate command: field without clarifying which is used; update the example by
adding a short inline comment next to the script or command explaining that when
both are present the script is interpolated (variables like ${RESEARCH} are
expanded) and its output is piped to the command's stdin, so the command
("Review the research...") reads the interpolated script from stdin. Reference
the "review" step, the script: block, the command: field, and the ${RESEARCH}
variable in the comment so readers understand the interaction.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0db391f9-5ce8-4972-b13e-276dc0146671

📥 Commits

Reviewing files that changed from the base of the PR and between 867d606 and 3a19078.

📒 Files selected for processing (11)
  • internal/runtime/builtin/builtin.go
  • internal/runtime/builtin/harness/claude.go
  • internal/runtime/builtin/harness/codex.go
  • internal/runtime/builtin/harness/harness.go
  • internal/runtime/builtin/harness/harness_test.go
  • internal/runtime/builtin/harness/opencode.go
  • internal/runtime/builtin/harness/pi.go
  • internal/runtime/builtin/harness/provider.go
  • internal/runtime/builtin/harness/schema.go
  • skills/dagu/references/codingagent.md
  • skills/dagu/references/executors.md

yottahmd added 6 commits April 9, 2026 17:56
- Strengthen validation to reject empty prompts, not just empty command
  lists (harness.go)
- Make --tools and --no-tools mutually exclusive in Pi provider; --no-tools
  takes precedence (pi.go)
- Fix Codex model names to match actual OpenAI offerings (codingagent.md,
  harness.md)
- Fix go fmt alignment of mapstructure struct tags (harness.go)
…pilot

Redesign the harness executor to pass config map entries directly as
CLI flags instead of hardcoding provider-specific struct fields. Config
keys become --key, values are type-dependent (string→value, bool→flag,
number→N, array→repeated). This makes the executor future-proof — any
new CLI flag works immediately without Go code changes.

Add GitHub Copilot CLI as a new provider (binary: copilot, -p prompt).

Remove hardcoded model name constants from docs — the agent should look
up current model names from provider documentation.

Net -463 lines: simpler code, more providers, no stale config.
Allow users to define their own harness providers via config.binary
and config.prompt_args instead of using a built-in provider name.
This lets any CLI tool be used as a harness step without Go code
changes.

Built-in:  config.provider = "claude"
Custom:    config.binary = "gemini", config.prompt_args = ["-p"]

prompt_args defaults to ["-p"] if omitted. The prompt is appended
after prompt_args. All other config keys are passed as --key value.
@yottahmd
Copy link
Copy Markdown
Collaborator Author

yottahmd commented Apr 9, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 9, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (4)
internal/core/harness.go (2)

60-65: Return provider names in stable order.

At Line 60, map iteration makes BuiltinHarnessProviderNames() non-deterministic. Sorting before return will stabilize error messages and test output.

Suggested change
-import "maps"
+import (
+	"maps"
+	"sort"
+)
@@
 func BuiltinHarnessProviderNames() []string {
 	names := make([]string, 0, len(builtinHarnessProviders))
 	for name := range builtinHarnessProviders {
 		names = append(names, name)
 	}
+	sort.Strings(names)
 	return names
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/harness.go` around lines 60 - 65, BuiltinHarnessProviderNames
iterates a map (builtinHarnessProviders) so the returned slice order is
non-deterministic; collect the keys into names as currently done, then call
sort.Strings(names) before returning to stabilize output and tests, and add a
sort import if missing. Ensure you reference the BuiltinHarnessProviderNames
function and the builtinHarnessProviders map when making the change.

45-51: Avoid split-brain provider sources between core and runtime.

At Line 45, the built-in provider list is hard-coded in internal/core/harness.go while runtime registration is defined separately in internal/runtime/builtin/harness/*.go. This creates a drift risk where spec validation and runtime execution disagree.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/harness.go` around lines 45 - 51, The hard-coded
builtinHarnessProviders map in internal/core/harness.go can diverge from the
runtime registrations in internal/runtime/builtin/harness/*.go; replace the
static map with a single source of truth by loading the provider list from the
runtime registration mechanism (e.g., expose the runtime's registration registry
or init function) instead of hard-coding names: remove or deprecate
builtinHarnessProviders and reference the runtime registration API used by the
harness registration files so spec validation and runtime use the same provider
set (look for the map named builtinHarnessProviders and the registration code
under internal/runtime/builtin/harness to wire them together).
internal/runtime/builtin/harness/provider.go (1)

27-29: Prevent silent provider collisions during registration.

At Line 27, registerProvider overwrites existing entries without any signal. A duplicate provider name will mask the earlier registration and make behavior harder to debug.

Suggested change
 func registerProvider(p Provider) {
-	providers[p.Name()] = p
+	name := p.Name()
+	if _, exists := providers[name]; exists {
+		panic(fmt.Sprintf("harness: duplicate provider registration %q", name))
+	}
+	providers[name] = p
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runtime/builtin/harness/provider.go` around lines 27 - 29,
registerProvider currently overwrites existing entries silently; update the
function (registerProvider) to detect collisions by checking providers[p.Name()]
before assignment and fail fast with a clear error (e.g., panic or
processLogger.Fatalf) that includes the duplicate provider name and ideally both
providers' identities; keep the signature to avoid ripple changes or, if you
prefer to return an error, update all callers accordingly so duplicates are not
silently masked.
internal/cmn/schema/dag.schema.json (1)

3301-3344: Add schema-level requirement for prompt_flag when prompt_mode is flag.

Runtime validation already enforces this; adding it here keeps schema validation behavior consistent and fails earlier with clearer feedback.

🧩 Proposed schema parity update
     "harnessNamedDefinition": {
       "type": "object",
       "additionalProperties": false,
       "required": ["binary"],
+      "allOf": [
+        {
+          "if": {
+            "properties": { "prompt_mode": { "const": "flag" } },
+            "required": ["prompt_mode"]
+          },
+          "then": {
+            "required": ["prompt_flag"]
+          }
+        }
+      ],
       "properties": {
         "binary": {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/cmn/schema/dag.schema.json` around lines 3301 - 3344, Add a JSON
Schema conditional to harnessNamedDefinition that requires prompt_flag when
prompt_mode is "flag": inside the harnessNamedDefinition object, add an allOf
(or if/then) clause referencing properties.prompt_mode and
properties.prompt_flag so that if { "properties": { "prompt_mode": { "const":
"flag" } }, "required": ["prompt_mode"] } then enforce { "required":
["prompt_flag"] } — this ensures prompt_flag is required at schema validation
time for harnessNamedDefinition when prompt_mode is "flag".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/cmn/schema/dag.schema.json`:
- Around line 3355-3373: The harnessProviderConfig schema currently forbids
reserved keys "binary" and "prompt_args" via propertyNames.not.enum but still
allows "fallback" as an arbitrary flag; update the propertyNames.not.enum array
to also include "fallback" so nested entries cannot define a reserved "fallback"
key (refer to harnessProviderConfig -> propertyNames.not.enum and
additionalProperties -> $ref to harnessConfigValue), ensuring the schema treats
"fallback" as reserved for the top-level only.

In `@internal/core/dag.go`:
- Around line 235-237: Missing copy of Harnesses when rebuilding DAG from YAML:
in rebuildDAGRunSnapshotFromYAML assign the fresh parsed harness definitions
back to the original dag (i.e., add dag.Harnesses = fresh.Harnesses alongside
the existing copies of Env, Params, SMTP, SSH, RegistryAuths) so that
ctx.dag.Harnesses used in step validation (step.go referencing
ctx.dag.Harnesses) retains custom harness providers after queue/retry
deserialization.

In `@internal/runtime/builtin/harness/harness.go`:
- Around line 318-320: The harness currently silently truncates multi-command
steps because extractPrompt reads only step.Commands[0]; change the logic to
detect and reject steps with more than one command: update extractPrompt (or the
callers that assign prompt := extractPrompt(step)) to check len(step.Commands)
and return an error if >1, and apply the same validation in the other similar
use-site referenced (the block around the other prompt extraction at the 527-547
area) so any harness step with multiple commands fails fast with a clear error
that the harness type only accepts a single prompt/command.
- Around line 243-304: The problem is that built-in providers call
configToFlags(cfg.flags, nil) so unified keys like "max_turns" are forwarded
verbatim and not mapped to provider-specific tokens; update the caller
buildInvocation to pass the appropriate *core.HarnessDefinition (or the
provider's definition) into configToFlags instead of nil, and ensure
flagTokenForKey(key, definition) uses the HarnessDefinition mapping (or falls
back to a small normalization: convert common unified keys/underscores to
provider-expected tokens, e.g. underscore->hyphen) so BaseArgs(prompt) /
Provider invocations receive provider-specific flags rather than raw unified
keys.

In `@skills/dagu/references/codingagent.md`:
- Around line 207-220: The example YAML mixes camelCase and kebab-case for
config keys which will produce inconsistent CLI flags; update the example under
the steps -> name: task block so all non-reserved config keys use a single
convention (prefer kebab-case): change allowedTools to allowed-tools (and any
other camelCase keys) while keeping existing kebab-case keys like max-turns,
max-budget-usd, and permission-mode unchanged so forwarded CLI flags are
consistent.

In `@skills/dagu/references/executors.md`:
- Around line 431-447: The documentation for harnesses is missing the new
escape-hatch field extra_flags; add a short paragraph under the Custom
`harnesses:` section describing `extra_flags` as an array (or string) of raw CLI
switches that are forwarded verbatim to the invoked binary to support repeated
or ordering-sensitive flags, explain it bypasses the normal key=>`--key value`
mapping and is appended according to `prompt_position`/`flag_style` and any
`option_flags` overrides, and include a brief example showing repeated flags
(e.g., ["-I", "-I/usr/include"] or ["--foo","--bar"]) and note it is intended
for advanced use when precise ordering or duplicates are required.

---

Nitpick comments:
In `@internal/cmn/schema/dag.schema.json`:
- Around line 3301-3344: Add a JSON Schema conditional to harnessNamedDefinition
that requires prompt_flag when prompt_mode is "flag": inside the
harnessNamedDefinition object, add an allOf (or if/then) clause referencing
properties.prompt_mode and properties.prompt_flag so that if { "properties": {
"prompt_mode": { "const": "flag" } }, "required": ["prompt_mode"] } then enforce
{ "required": ["prompt_flag"] } — this ensures prompt_flag is required at schema
validation time for harnessNamedDefinition when prompt_mode is "flag".

In `@internal/core/harness.go`:
- Around line 60-65: BuiltinHarnessProviderNames iterates a map
(builtinHarnessProviders) so the returned slice order is non-deterministic;
collect the keys into names as currently done, then call sort.Strings(names)
before returning to stabilize output and tests, and add a sort import if
missing. Ensure you reference the BuiltinHarnessProviderNames function and the
builtinHarnessProviders map when making the change.
- Around line 45-51: The hard-coded builtinHarnessProviders map in
internal/core/harness.go can diverge from the runtime registrations in
internal/runtime/builtin/harness/*.go; replace the static map with a single
source of truth by loading the provider list from the runtime registration
mechanism (e.g., expose the runtime's registration registry or init function)
instead of hard-coding names: remove or deprecate builtinHarnessProviders and
reference the runtime registration API used by the harness registration files so
spec validation and runtime use the same provider set (look for the map named
builtinHarnessProviders and the registration code under
internal/runtime/builtin/harness to wire them together).

In `@internal/runtime/builtin/harness/provider.go`:
- Around line 27-29: registerProvider currently overwrites existing entries
silently; update the function (registerProvider) to detect collisions by
checking providers[p.Name()] before assignment and fail fast with a clear error
(e.g., panic or processLogger.Fatalf) that includes the duplicate provider name
and ideally both providers' identities; keep the signature to avoid ripple
changes or, if you prefer to return an error, update all callers accordingly so
duplicates are not silently masked.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 184b08a6-c9c6-47e7-9ba8-ef690c127a84

📥 Commits

Reviewing files that changed from the base of the PR and between 867d606 and 7fa618d.

📒 Files selected for processing (21)
  • internal/cmn/schema/dag.schema.json
  • internal/cmn/schema/dag_schema_test.go
  • internal/core/dag.go
  • internal/core/harness.go
  • internal/core/spec/builder_test.go
  • internal/core/spec/dag.go
  • internal/core/spec/loader.go
  • internal/core/spec/loader_test.go
  • internal/core/spec/step.go
  • internal/runtime/builtin/builtin.go
  • internal/runtime/builtin/harness/claude.go
  • internal/runtime/builtin/harness/codex.go
  • internal/runtime/builtin/harness/copilot.go
  • internal/runtime/builtin/harness/harness.go
  • internal/runtime/builtin/harness/harness_test.go
  • internal/runtime/builtin/harness/opencode.go
  • internal/runtime/builtin/harness/pi.go
  • internal/runtime/builtin/harness/provider.go
  • internal/runtime/builtin/harness/schema.go
  • skills/dagu/references/codingagent.md
  • skills/dagu/references/executors.md

Comment on lines +3355 to +3373
"harnessProviderConfig": {
"type": "object",
"propertyNames": {
"not": {
"enum": ["binary", "prompt_args"]
}
},
"properties": {
"provider": {
"type": "string",
"description": "Harness provider name. May reference a built-in provider such as claude, codex, copilot, opencode, or pi, or a custom top-level harnesses entry. Supports ${VAR} interpolation."
}
},
"additionalProperties": {
"$ref": "#/definitions/harnessConfigValue"
},
"required": ["provider"],
"description": "Primary or fallback harness provider configuration. Reserved keys are provider and fallback; all other keys are passed through to the CLI as flags."
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

harnessProviderConfig currently allows reserved fallback inside fallback entries.

fallback is documented as reserved, but in this definition it is treated as an arbitrary flag value. That can validate unsupported nested fallback shapes and produce ambiguous runtime behavior.

🔧 Proposed schema fix
     "harnessProviderConfig": {
       "type": "object",
       "propertyNames": {
         "not": {
-          "enum": ["binary", "prompt_args"]
+          "enum": ["binary", "prompt_args", "fallback"]
         }
       },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"harnessProviderConfig": {
"type": "object",
"propertyNames": {
"not": {
"enum": ["binary", "prompt_args"]
}
},
"properties": {
"provider": {
"type": "string",
"description": "Harness provider name. May reference a built-in provider such as claude, codex, copilot, opencode, or pi, or a custom top-level harnesses entry. Supports ${VAR} interpolation."
}
},
"additionalProperties": {
"$ref": "#/definitions/harnessConfigValue"
},
"required": ["provider"],
"description": "Primary or fallback harness provider configuration. Reserved keys are provider and fallback; all other keys are passed through to the CLI as flags."
},
"harnessProviderConfig": {
"type": "object",
"propertyNames": {
"not": {
"enum": ["binary", "prompt_args", "fallback"]
}
},
"properties": {
"provider": {
"type": "string",
"description": "Harness provider name. May reference a built-in provider such as claude, codex, copilot, opencode, or pi, or a custom top-level harnesses entry. Supports ${VAR} interpolation."
}
},
"additionalProperties": {
"$ref": "#/definitions/harnessConfigValue"
},
"required": ["provider"],
"description": "Primary or fallback harness provider configuration. Reserved keys are provider and fallback; all other keys are passed through to the CLI as flags."
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/cmn/schema/dag.schema.json` around lines 3355 - 3373, The
harnessProviderConfig schema currently forbids reserved keys "binary" and
"prompt_args" via propertyNames.not.enum but still allows "fallback" as an
arbitrary flag; update the propertyNames.not.enum array to also include
"fallback" so nested entries cannot define a reserved "fallback" key (refer to
harnessProviderConfig -> propertyNames.not.enum and additionalProperties -> $ref
to harnessConfigValue), ensuring the schema treats "fallback" as reserved for
the top-level only.

Comment on lines +235 to +237
// Harnesses contains reusable custom harness definitions available to harness steps.
// Excluded from JSON: derived from DAG/base config and rebuilt from YAML when needed.
Harnesses HarnessDefinitions `json:"-"`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '== DAG persistence / reload paths =='
rg -n -C2 --type=go 'dag\.json|json\.(Marshal|Unmarshal)|NewEncoder|NewDecoder|BaseConfigData|YamlData' internal

echo
echo '== Runtime use of harness definitions =='
rg -n -C2 --type=go '\bHarnesses\b|\bHarnessDefinition\b|\bdefinition\b|getProvider\(' internal/core internal/runtime

echo
echo '== Step build output for harness providers =='
rg -n -C2 --type=go 'mergeHarnessConfig|validateHarnessProviderConfig|ExecutorConfig\.Config' internal/core/spec

Repository: dagucloud/dagu

Length of output: 50371


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== DAG reload/deserialization patterns ==="
# Find where DAGs are unmarshaled from JSON
rg -n -B3 -A3 'json\.Unmarshal.*\*core\.DAG|newDAGFromSnapshot|readDAG' internal/core internal/runtime internal/service --type=go | head -100

echo
echo "=== YAML rebuild paths ==="
# Find where DAGs are rebuilt from YamlData
rg -n -B2 -A2 'rebuildDAGRunSnapshotFromYAML|BuildDAG.*YamlData|spec\.Parse.*YamlData' internal --type=go | head -80

echo
echo "=== Harnesses field usage at runtime ==="
# Check if dag.Harnesses is accessed during execution
rg -n 'dag\.Harnesses|\.Harnesses\[' internal/runtime internal/core/spec --type=go

echo
echo "=== HarnessDefinitions initialization ==="
# Find where Harnesses field is populated
rg -n -B2 -A2 'Harnesses\s*=' internal/core --type=go

Repository: dagucloud/dagu

Length of output: 3358


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== When/how are Harnesses populated? ==="
# Find where dag.Harnesses is assigned (not just cloned)
rg -n 'Harnesses\s*=' internal/core/spec --type=go -A2

echo
echo "=== Step.go line 1277 context ==="
# See the full context of the Harnesses access in step building
sed -n '1270,1285p' internal/core/spec/step.go

echo
echo "=== Does YAML parsing/BuildDAG populate Harnesses? ==="
# Find DAG initialization from YAML
rg -n -B5 -A10 'func.*BuildDAG|func.*Parse.*DAG' internal/core/spec/builder.go | head -80

echo
echo "=== Harness config building/merging ==="
# Find where harness configs are created/merged
rg -n -B2 -A5 'mergeHarnessConfig|NewHarnessDefinition|buildHarness' internal/core/spec --type=go | head -100

Repository: dagucloud/dagu

Length of output: 99


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== When/how are Harnesses populated? ==="
# Find where dag.Harnesses is assigned (including nested assignments)
rg -n 'Harnesses\s*:=|Harnesses\s*=' internal/core --type=go 

echo
echo "=== Step.go line 1277 full context ==="
# See the full context of the Harnesses access in step building
if [ -f internal/core/spec/step.go ]; then
  sed -n '1270,1290p' internal/core/spec/step.go
fi

echo
echo "=== BuildDAG function signature and start ==="
# Find the BuildDAG function
rg -n -A30 'func BuildDAG' internal/core/spec/builder.go | head -50

echo
echo "=== Harness initialization in builder ==="
# Look for where Harnesses field is set during building
rg -n -B3 -A3 'dag\.Harnesses.*=' internal/core/spec --type=go

Repository: dagucloud/dagu

Length of output: 1050


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Full rebuildDAGRunSnapshotFromYAML function ==="
sed -n '503,550p' internal/service/frontend/api/v1/dagruns.go

echo
echo "=== Where is dag.Harnesses assigned during DAG load/build? ==="
# Search more broadly in builder/loader
rg -n 'Harnesses' internal/core/spec/builder.go internal/core/spec/loader.go --type=go

echo
echo "=== DAG struct initialization ==="
# See if Harnesses is set anywhere in DAG creation
rg -n -B5 -A5 'func.*\*core\.DAG' internal/core/spec/builder.go | head -100

echo
echo "=== Check loader.go for Harnesses population ==="
cat internal/core/spec/loader.go | grep -n -A5 -B5 'Harnesses\|harness' | head -150

Repository: dagucloud/dagu

Length of output: 1414


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Complete rebuildDAGRunSnapshotFromYAML to see all field copies ==="
sed -n '503,560p' internal/service/frontend/api/v1/dagruns.go

echo
echo "=== LoadYAML function ==="
rg -n -A20 'func LoadYAML' internal/core/spec --type=go

echo
echo "=== Does LoadYAML populate Harnesses? ==="
rg -n -B5 -A10 'func.*Load.*\*core\.DAG|Harnesses' internal/core/spec/loader.go --type=go | head -100

echo
echo "=== Check what fresh DAG has after LoadYAML ==="
# See if Harnesses is computed in builder
rg -n 'harness' internal/core/spec/builder.go --type=go -i | head -50

Repository: dagucloud/dagu

Length of output: 6606


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Critical: Check if Harnesses are copied in rebuildDAGRunSnapshotFromYAML ==="
sed -n '545,560p' internal/service/frontend/api/v1/dagruns.go

echo
echo "=== Check builder.go to see if fresh DAG gets Harnesses populated ==="
rg -n 'Harnesses' internal/core/spec/builder.go --type=go -B3 -A3

echo
echo "=== Check if Harnesses are part of DAG initialization ==="
rg -n 'buildDAGFromConfig|NewDAG' internal/core/spec/builder.go --type=go -A20 | head -80

echo
echo "=== Look for where Harnesses field is set in builder ==="
rg -n 'dag\.Harnesses\s*=' internal/core/spec --type=go

Repository: dagucloud/dagu

Length of output: 646


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== CRITICAL: rebuildDAGRunSnapshotFromYAML does NOT copy Harnesses ==="
echo "Lines 545-557 copy Env, Params, SMTP, SSH, RegistryAuths but NOT Harnesses"

echo
echo "=== Does spec.LoadYAML populate fresh.Harnesses? ==="
# Trace through builder to see if Harnesses is set during DAG building
rg -n 'Harnesses\s*:=' internal/core/spec/builder.go --type=go

echo
echo "=== Where is dag.Harnesses initialized/populated? ==="
# Search entire spec package
find internal/core/spec -name '*.go' -exec grep -n 'Harnesses' {} + | grep -v test | grep -v '//'

echo
echo "=== Check buildDAG/NewDAG to see what fields are initialized ==="
rg -n -A30 'func buildDAGFromConfig' internal/core/spec/builder.go --type=go | head -100

echo
echo "=== Check if Harnesses comes from config/base definition ==="
rg -n 'Harnesses.*base|base.*Harnesses|config.*Harness' internal/core/spec --type=go

Repository: dagucloud/dagu

Length of output: 258


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check all Harnesses references in spec/ ==="
rg 'Harnesses' internal/core/spec --type=go | grep -v test | grep -v '//'

echo
echo "=== Check where Harnesses field is first set during DAG creation ==="
rg -n 'dag\.Harnesses' internal/core/spec --type=go

echo
echo "=== Dump loader.go to find Harnesses initialization ==="
wc -l internal/core/spec/loader.go
sed -n '220,350p' internal/core/spec/loader.go

echo
echo "=== Look for mergeHarnessConfig to understand where harnesses come from ==="
rg -n -B5 -A10 'mergeHarnessConfig' internal/core/spec --type=go

Repository: dagucloud/dagu

Length of output: 7355


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== FOUND IT: buildHarnesses function ==="
rg -n -A20 'func buildHarnesses' internal/core/spec/dag.go --type=go

echo
echo "=== Confirm: rebuildDAGRunSnapshotFromYAML MISSING Harnesses copy ==="
echo "rebuildDAGRunSnapshotFromYAML copies these fields from fresh DAG:"
sed -n '545,558p' internal/service/frontend/api/v1/dagruns.go
echo ""
echo "Missing: dag.Harnesses = fresh.Harnesses"

echo
echo "=== Verify DAG is built from YAML config (dag.Harnesses should be set) ==="
rg -n -B5 -A15 'type dag struct' internal/core/spec/dag.go | head -80

Repository: dagucloud/dagu

Length of output: 2431


Add missing Harnesses field copy in rebuildDAGRunSnapshotFromYAML.

rebuildDAGRunSnapshotFromYAML rebuilds the DAG from its stored YamlData, which correctly repopulates custom harness definitions via the YAML parser. However, the function only copies select fields (Env, Params, SMTP, SSH, RegistryAuths) back to the original DAG, omitting Harnesses. Since step validation at line 1277 in step.go accesses ctx.dag.Harnesses to resolve custom harness providers, this omission causes custom harnesses to be lost after queue/retry deserialization.

Add the missing line after line 550 in internal/service/frontend/api/v1/dagruns.go:

dag.Harnesses = fresh.Harnesses

This ensures that custom harness definitions are carried through the serialization/deserialization cycle.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/dag.go` around lines 235 - 237, Missing copy of Harnesses when
rebuilding DAG from YAML: in rebuildDAGRunSnapshotFromYAML assign the fresh
parsed harness definitions back to the original dag (i.e., add dag.Harnesses =
fresh.Harnesses alongside the existing copies of Env, Params, SMTP, SSH,
RegistryAuths) so that ctx.dag.Harnesses used in step validation (step.go
referencing ctx.dag.Harnesses) retains custom harness providers after
queue/retry deserialization.

Comment on lines +243 to +304
func configToFlags(cfg map[string]any, definition *core.HarnessDefinition) []string {
keys := make([]string, 0, len(cfg))
for k := range cfg {
if reservedKeys[k] {
continue
}
keys = append(keys, k)
}
sort.Strings(keys)

var args []string
for _, key := range keys {
flag := flagTokenForKey(key, definition)
switch v := cfg[key].(type) {
case bool:
if v {
args = append(args, flag)
}
case string:
if v != "" {
args = append(args, flag, v)
}
case int:
args = append(args, flag, strconv.Itoa(v))
case int8:
args = append(args, flag, strconv.FormatInt(int64(v), 10))
case int16:
args = append(args, flag, strconv.FormatInt(int64(v), 10))
case int32:
args = append(args, flag, strconv.FormatInt(int64(v), 10))
case int64:
args = append(args, flag, strconv.FormatInt(v, 10))
case uint:
args = append(args, flag, strconv.FormatUint(uint64(v), 10))
case uint8:
args = append(args, flag, strconv.FormatUint(uint64(v), 10))
case uint16:
args = append(args, flag, strconv.FormatUint(uint64(v), 10))
case uint32:
args = append(args, flag, strconv.FormatUint(uint64(v), 10))
case uint64:
args = append(args, flag, strconv.FormatUint(v, 10))
case float32:
args = append(args, flag, strconv.FormatFloat(float64(v), 'f', -1, 32))
case float64:
if v == float64(int(v)) {
args = append(args, flag, strconv.Itoa(int(v)))
} else {
args = append(args, flag, strconv.FormatFloat(v, 'f', -1, 64))
}
case []string:
for _, item := range v {
args = append(args, flag, item)
}
case []any:
for _, item := range v {
args = append(args, flag, fmt.Sprint(item))
}
}
}
return args
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Built-in providers still forward config keys verbatim, so the “unified” option surface won't actually be portable.

For built-ins, buildInvocation always calls configToFlags(cfg.flags, nil), and the Provider contract only supplies BaseArgs(prompt). That leaves no place to translate model, effort, max_turns, output_format, etc. into provider-specific flags. Today a documented key like max_turns would become --max_turns, and any provider that expects --max-turns or a different token will fail.

Also applies to: 594-598, 647-657

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runtime/builtin/harness/harness.go` around lines 243 - 304, The
problem is that built-in providers call configToFlags(cfg.flags, nil) so unified
keys like "max_turns" are forwarded verbatim and not mapped to provider-specific
tokens; update the caller buildInvocation to pass the appropriate
*core.HarnessDefinition (or the provider's definition) into configToFlags
instead of nil, and ensure flagTokenForKey(key, definition) uses the
HarnessDefinition mapping (or falls back to a small normalization: convert
common unified keys/underscores to provider-expected tokens, e.g.
underscore->hyphen) so BaseArgs(prompt) / Provider invocations receive
provider-specific flags rather than raw unified keys.

Comment on lines +318 to +320
prompt := extractPrompt(step)
if prompt == "" {
return nil, fmt.Errorf("harness: command field (prompt) is required")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject multi-command harness steps instead of silently truncating them.

extractPrompt only reads step.Commands[0], so any additional command entries are ignored without warning. That is easy to hit when converting an existing shell step to type: harness, and the executor will run with a partial prompt.

💡 Proposed fix
 func newHarness(ctx context.Context, step core.Step) (executor.Executor, error) {
+	if len(step.Commands) > 1 {
+		return nil, fmt.Errorf("harness: exactly one command entry is supported")
+	}
 	cfg := normalizeConfigMap(step.ExecutorConfig.Config)
 	var defs core.HarnessDefinitions
 	env := runtime.GetEnv(ctx)
@@
 func validateHarnessStep(step core.Step) error {
 	if len(step.Commands) == 0 || extractPrompt(step) == "" {
 		return fmt.Errorf("harness: command field (prompt) is required")
 	}
+	if len(step.Commands) > 1 {
+		return fmt.Errorf("harness: exactly one command entry is supported")
+	}
 	cfg := step.ExecutorConfig.Config

Also applies to: 527-547

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runtime/builtin/harness/harness.go` around lines 318 - 320, The
harness currently silently truncates multi-command steps because extractPrompt
reads only step.Commands[0]; change the logic to detect and reject steps with
more than one command: update extractPrompt (or the callers that assign prompt
:= extractPrompt(step)) to check len(step.Commands) and return an error if >1,
and apply the same validation in the other similar use-site referenced (the
block around the other prompt extraction at the 527-547 area) so any harness
step with multiple commands fails fast with a clear error that the harness type
only accepts a single prompt/command.

Comment on lines +207 to +220
```yaml
steps:
- name: task
type: harness
command: "Write tests for the auth module"
config:
provider: claude
model: sonnet
effort: high
max-turns: 20
max-budget-usd: 2.00
permission-mode: auto
allowedTools: "Bash,Read,Edit"
bare: true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use one config-key convention in the examples.

This page says harness forwards non-reserved keys as CLI flags, but the examples mix kebab-case (max-turns) with camelCase (allowedTools). If forwarding is verbatim, those produce different flags. Please normalize the examples to the actual accepted keys so readers do not copy a broken config.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/dagu/references/codingagent.md` around lines 207 - 220, The example
YAML mixes camelCase and kebab-case for config keys which will produce
inconsistent CLI flags; update the example under the steps -> name: task block
so all non-reserved config keys use a single convention (prefer kebab-case):
change allowedTools to allowed-tools (and any other camelCase keys) while
keeping existing kebab-case keys like max-turns, max-budget-usd, and
permission-mode unchanged so forwarded CLI flags are consistent.

Comment on lines +431 to +447
The `command` field is the prompt. `config.provider` can reference either a built-in provider or a named custom entry under top-level `harnesses:`. All non-reserved keys are passed directly as CLI flags (`--key value` for strings/numbers, `--key` for booleans). Reserved keys are `provider` and `fallback`.

Supported providers: `claude`, `codex`, `copilot`, `opencode`, `pi`.

Top-level `harness:` acts as a DAG-wide default. Step-level config overlays the DAG-level primary config, and step-level `fallback` replaces the DAG-level `fallback`. If a step omits `type:` and the DAG defines `harness:`, the step is inferred as `type: harness`.

`provider` may be parameterized with `${...}` and is resolved at runtime after interpolation.

Custom `harnesses:` definitions describe how to invoke arbitrary harness CLIs:
- `binary`
- `prefix_args`
- `prompt_mode`: `arg`, `flag`, or `stdin`
- `prompt_flag` for `flag` mode
- `prompt_position`: `before_flags` or `after_flags`
- `flag_style`: `gnu_long` or `single_dash`
- `option_flags` to override specific flag tokens

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Document extra_flags here as part of the harness surface.

This section explains key/value flag forwarding, but the PR also adds extra_flags as the escape hatch for raw CLI switches. Without that callout, users have no discoverable way to express repeated or ordering-sensitive flags.

I can draft a short example for this section if helpful.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/dagu/references/executors.md` around lines 431 - 447, The
documentation for harnesses is missing the new escape-hatch field extra_flags;
add a short paragraph under the Custom `harnesses:` section describing
`extra_flags` as an array (or string) of raw CLI switches that are forwarded
verbatim to the invoked binary to support repeated or ordering-sensitive flags,
explain it bypasses the normal key=>`--key value` mapping and is appended
according to `prompt_position`/`flag_style` and any `option_flags` overrides,
and include a brief example showing repeated flags (e.g., ["-I",
"-I/usr/include"] or ["--foo","--bar"]) and note it is intended for advanced use
when precise ordering or duplicates are required.

@yottahmd yottahmd merged commit c8ed7c1 into main Apr 10, 2026
5 checks passed
@yottahmd yottahmd deleted the feat/harness-executor branch April 10, 2026 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant