Skip to content

feat(dsl): provider-independent DSL with type-safe capabilities#33

Merged
arcaputo3 merged 18 commits intomainfrom
dsl/core-exploration
Apr 6, 2026
Merged

feat(dsl): provider-independent DSL with type-safe capabilities#33
arcaputo3 merged 18 commits intomainfrom
dsl/core-exploration

Conversation

@arcaputo3
Copy link
Copy Markdown
Contributor

Summary

Introduces a provider-independent DSL layer for Scalagent that decouples agent definitions from any specific LLM provider (Claude, Codex, etc.).

  • Core type system: TypedAgent[In, Out], Capability, Budget, Depth (Peano-encoded delegation limits), ExecutionPolicy, Utility scoring, TraceLogger/TraceSummary, Classification-gated review
  • Interpreters: ClaudeInterpreter, CodexInterpreter, A2AInterpreter — same agent definition runs on any backend
  • Protocol support: First-class A2A and MCP capability types with McpSurface tool loading
  • Safety layers: Capture-checked capabilities (experimental), ScopedCapabilities with runtime enforcement, AgenticReview with explainable scoring and classified segregation
  • Builder API: Fluent AgentBuilder for constructing agents without boilerplate
  • Examples: 8 runnable examples (dsl-basic, dsl-builder, dsl-delegation, dsl-review, dsl-cells, dsl-codex, dsl-cross, capture)
  • Tests: 19 new test suites covering core types, interpreters, event mappers, and experimental features
  • Docs: docs/dsl/ with foundations, protocols, mapping guide, examples, roadmap, and next steps

Test plan

  • ./mill examples.compile — all examples compile
  • ./mill agent.test — full test suite passes
  • Manual: ./mill examples.run dsl-basic end-to-end with API key
  • Manual: ./mill examples.run dsl-codex with Codex API key
  • Review docs in docs/dsl/ for accuracy

🤖 Generated with Claude Code

arcaputo3 and others added 16 commits April 3, 2026 13:09
…preter

Add a semantic DSL core (`com.tjclp.scalagent.core`) that sits above
the existing Claude-specific API without breaking it. This includes:

- Agent[-P, -I, +O] trait with principal, input, and policy parameters
- AgentRun[-R, +O] with shared event stream and typed result
- AgentEvent enum (8 cases) normalizing provider events with Native escape hatch
- ExecutionPolicy, Budget, StopStrategy, FallbackPolicy for semantic constraints
- OutputCodec[O] type class bridging String and StructuredOutput[A]
- RunSummary for terminal event metadata

ClaudeInterpreter bridges core Agent to existing ClaudeAgent runtime using
Queue+Promise shared execution (single provider call, both events and result
consumable). EventMapper provides pure AgentMessage → List[AgentEvent] mapping.

Design docs at docs/dsl/ cover foundations, type mapping, examples, protocol
boundaries, and phased roadmap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce a capability system using Scala 3 intersection types and
Peano-encoded delegation depth for compile-time enforcement:

- Capability marker traits: CanUseTools[T], CanSpawn[D], CanReadMemory,
  CanWriteMemory, CanEscalateHuman, HasBudget
- Peano depth types (Z, S[N]) with DepthLTE for compile-time comparison
- TypedAgent[P, I, O, C] wraps Agent with phantom capability evidence
- AgentBuilder accumulates capabilities via intersection type growth
- HasSpawn/HasToolsCap type classes extract evidence from intersections
- delegate() gated by HasSpawn — compile error without CanSpawn
- delegateTyped() enforces DepthLTE at compile time — equal or deeper
  child depth is a type error, not just a runtime assertion
- DelegationPolicy for budget slicing and turn limits
- ToolSurface for typed tool collections (composes with ++)
- ClaudeInterpreter.builder() provides agentTransform that wires tool
  restrictions into AgentOptions at runtime

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make A2A and MCP first-class citizens in the DSL vocabulary:

A2A (horizontal coordination):
- A2ARemoteAgent trait extends Agent — remote agents are seamlessly
  usable as delegation targets, in AgentBuilder, or called directly
- A2AInterpreter wraps A2AClient as A2ARemoteAgent (consumer)
- A2AServerAdapter wraps any Agent as an A2AEndpoint (producer)
- A2AEventMapper provides bidirectional StreamEvent ↔ AgentEvent mapping
- CanDelegateA2A capability marker

MCP (vertical capability access):
- McpToolSurface wraps tools with server provenance, converts to ToolSurface
- McpResource, McpResourceSurface for URI-addressed data sources
- McpPrompt, McpPromptSurface for parameterized prompt templates
- HasMcpTools, HasMcpResources, HasMcpPrompts capability markers
- McpToolLoader bridges existing McpServer to DSL tool surfaces

Note: MCP resources and prompts are forward-looking core types — the
Claude Agent SDK currently only exposes tools. Implementations will be
added when the SDK supports them.

Also: unseal Capability trait so protocol packages can extend it.
AgentBuilder gains withMcpTools, withMcpResources, withMcpPrompts,
withA2ADelegation methods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rovements

Experimental capture checking (Phase 7):
- FileSandbox, BudgetSlice, SpawnPermit extend SharedCapability
- SandboxedRun provides scoped callbacks where capabilities can't escape
- Compiler rejects closures that capture capabilities beyond scope
- FileSandbox validates paths at runtime (no directory traversal)
- CaptureCheckingExample demonstrates all three capability types
- CaptureCheckSpec verifies compile-time rejection via typeChecks

Runtime enforcement improvements (from agent review):
- ToolSurface now tracks allowedTools list for provider enforcement
- withReadOnlyTools validates tools are read-only compatible
- ClaudeInterpreter.claudeTransform applies distinctAllowedTools
- A2AInterpreter uses SharedRun pattern (single stream, both events
  and result consumable)
- A2AServerAdapter enhanced with full request handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add zio-blocks-scope 0.0.33 as dependency and create ScopedCapabilities
that wraps FileSandbox, BudgetSlice, SpawnPermit as scope-managed
Resources.

This provides a stable alternative to experimental capture checking:
- $[A] opaque types prevent capabilities from escaping scope boundaries
- Unscoped type class gates what values can leave (no instance for
  capabilities = compile error if you try to return them)
- $ operator macro validates lambda bodies (no capture in closures)
- Zero runtime cost (opaque types erased)

Two approaches now coexist in experimental/:
- Capabilities.scala: real capture checking (SharedCapability + ^)
- ScopedCapabilities.scala: zio-blocks/scope ($[A] + Unscoped + macros)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Make FileSandbox, BudgetSlice, SpawnPermit constructors
  private[experimental] — userland code must go through SandboxedRun
  or ScopedCapabilities factories
- Fix path traversal: use nodePath.relative instead of startsWith
  to catch sibling-prefix escapes (e.g., /safe/../safe-evil/secret.txt)
- Add sibling-prefix escape tests proving the naive startsWith approach
  is vulnerable but resolveSafe catches it
- Add compile-time tests verifying direct instantiation is rejected

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 6: Utility and Evaluation — making "good agent" measurable.

Core types:
- Utility[-P, -O]: observer-dependent scoring trait with built-in
  implementations (costMinimizing, reliability, latencyMinimizing,
  simplicityBiased, weighted composite)
- TraceSummary: rich trace data foldable from AgentEvent streams
  (tool counts, delegation counts, timing, cost, tool names)
- Complexity: execution graph metrics derived from TraceSummary
  (totalNodes, toolCallNodes, delegationNodes, graphDensity)
- Evaluation[P, O]: bundles principal, output, trace, complexity,
  and score into a single inspectable result
- TraceLogger: composable logging to arbitrary sinks (console,
  callback/JSONL, fan-out via all(), noop)

Grounded in the formalization paper:
- U_ω(α) = E[P(accept | α(x))] → Utility trait
- C(α) = E[|G(x)| | x] → Complexity metrics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Captures the next steps after completing all 7 roadmap phases:
PR/review, docs update, integration testing, ContextKernel,
SDK parity, Conversation DSL. Lists known gaps and key patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 3 runnable DSL examples (dsl-basic, dsl-builder, dsl-delegation)
  demonstrating one-shot/streaming, builder+evaluation, and typed delegation
- withTools/withMcpTools now produce CanUseTools[CustomTools] instead of
  AllTools — honest about what the surface actually declares
- Add TraceLogger.callbackZIO for effectful logging without unsafe nesting
- Rewrite EXAMPLES.md with 11 examples using actual implemented API
- Update NEXT.md with complete commit table and current branch state
- Add TraceLoggerSpec, update TypedAgentSpec for CustomTools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Second interpreter backed by OpenAI Codex SDK (@openai/codex-sdk@0.118.0):

- codex/CodexSdk.scala: JS facades for Codex, Thread, ThreadEvent, ThreadItem
- codex/CodexOptions.scala: SandboxMode, ApprovalMode, client/thread options
- codex/CodexClient.scala: Pure Scala wrapper + CodexEvent/CodexItem ADTs
- interop/codex/CodexEventMapper.scala: 8 item types → AgentEvent normalization
- interop/codex/CodexInterpreter.scala: Agent impl with SharedRun + builder

Zero changes to core/ — same Agent, AgentRun, AgentEvent, ExecutionPolicy,
AgentBuilder, TypedAgent, TraceSummary, Utility, Evaluation types work
identically across Claude and Codex interpreters.

43 test suites, 0 failures (16 new Codex event mapper tests, 7 options tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DslCodexExample: one-shot, builder, evaluation via Codex interpreter
- DslCrossProviderExample: Claude ↔ Codex chain (same DSL, two providers)
- ExampleRunner: runtime dispatcher — all examples linked once, selected
  at runtime via EXAMPLE env var. Fixes --example flag which was broken
  because Mill evaluates Task.Input before Task.Command bodies run.
- examples.go now passes --example as EXAMPLE env to bun subprocess

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mill ScalaJS run() drops CLI args entirely. Override run to pass the
first arg as EXAMPLE env var to bun, where ExampleRunner reads it.

  ./mill examples.run dsl-basic      # run specific example
  ./mill examples.run -- --help      # list all 19 examples
  ./mill examples.run                # default (macro)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ROADMAP.md: mark all 8 phases completed with commit refs
- NEXT.md: 12 commits, 43 test suites, add Running Examples section
- EXAMPLES.md: add Codex + cross-provider examples (12-13), running section
- MAPPING.md: add Codex type inventory and interpreter table
- README.md: add examples.run commands, update project structure
- CLAUDE.md: add examples section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Demonstrates capability-segregated agent cells with need-to-know
communication, classified review routing, and explainable scoring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Apr 6, 2026

PR Review: feat(dsl): provider-independent DSL with type-safe capabilities

This is a substantial and well-architected PR. The core type system design (Peano-encoded Depth, capability phantom types, Agent[P, I, O] variance) is excellent — these are the right abstractions for a provider-independent DSL. Below is a structured review organized by severity.


🔴 Critical

1. Race condition in ClaudeInterpreter.scala — shared state without synchronization

The getShared method uses a mutable var state that is read and written without any synchronization:

var state: SharedState[O] = SharedState.Empty()

def getShared: UIO[SharedRun[O]] =
  ZIO.suspendSucceed {
    state match
      case SharedState.Empty() =>
        for
          promise <- Promise.make[Nothing, SharedRun[O]]
          _ <- ZIO.succeed { state = SharedState.Initializing(promise) }  // not atomic
          ...

Two concurrent fibers can both observe SharedState.Empty() and both try to initialize, creating two independent underlying executions — breaking the single-execution contract. Fix: replace var state with Ref[SharedState[O]] and use Ref.modify for atomic state transitions.


🟠 High

2. Budget subtraction semantics are incorrect for the Unlimited - Unlimited case

def -(other: Budget): Budget = (this, other) match
  case (Unlimited, _)  => Unlimited
  case (_, Unlimited)  => Usd(0.0)  // <-- Unlimited - Unlimited hits this, returning $0
  case (Usd(a), Usd(b)) => Usd(math.max(0, a - b))

Unlimited - Unlimited resolves to the second case and returns Usd(0.0). This is also a precision concern: using Double for financial values can cause epsilon drift over many slice() operations. Consider BigDecimal or at minimum document the precision trade-off. Also clarify whether Usd(0.0).isExhausted should return true or false (currently true via <= 0).


🟡 Medium

3. AgentBuilder.withReadOnlyTools — validation happens after combining

val combined = tools ++ surface
require(combined.isReadOnlyCompatible, ...)

The ToolSurface is combined before the read-only check, meaning tools is already mutated in the new builder instance if a downstream check fails. Additionally, withReadOnlyTools silently injects ToolSurface.readOnlyBuiltins — callers may not expect implicit tools to be added. Consider either making this explicit or documenting it clearly in the method's scaladoc.

4. ToolSurface.++ doesn't deduplicate

def ++(other: ToolSurface): ToolSurface =
  ToolSurface(tools = tools ++ other.tools, allowedTools = allowedTools ++ other.allowedTools)

Multiple with* calls on AgentBuilder can produce duplicate tool entries. distinctAllowedTools only deduplicates at query time — not at construction. isReadOnlyCompatible and size can return inconsistent results. Consider deduplicating in ++ by tool name.

5. isReadOnlyCompatible validates the allowlist, not the tools themselves

def isReadOnlyCompatible: Boolean = allowedTools.forall(ToolName.isReadOnly)

A ToolDef can have side-effects while being registered under a read-only ToolName. The current check only validates the allowlist, not the actual tool behavior. This is a documentation gap at minimum.

6. TypedAgent.delegateTyped — runtime check direction is confusing

require(child.maxRuntimeDepth < maxRuntimeDepth, ...)

The assertion message says "child depth must be < parent depth" but this compares the integer values of depth levels, not delegation capacity. With Peano encoding, Depth2 has maxRuntimeDepth = 2 and Depth1 has maxRuntimeDepth = 1, so the check is functionally correct — but the mental model in the message conflicts with the Peano semantics described in Depth.scala. Add a comment explaining the Int vs. Peano mapping.

7. CodexClient.jsToJson silently swallows errors

private def jsToJson(value: js.Any): Json =
  try { ... zio.json.ast.Json.decoder.decodeJson(str).getOrElse(Json.Str(str)) }
  catch case _: Throwable => Json.Null

Failed MCP tool argument parsing is silently dropped as Json.Null. This makes debugging Codex tool calls very difficult. Consider returning a structured error value or at least logging the failure.

8. AgenticReview.enrich — permit consumed even if review fails

The permit.consume() call is not monadic — if it throws, the effect chain still proceeds to Evaluation.withReview(...). Should be permit.consume() *> Evaluation.withReview(evaluation, reviewer) to enforce sequencing.


🔵 Low / Suggestions

9. Missing experimental module documentation

The experimental/ package has no package.scala or README explaining:

  • That capture checking requires -language:experimental.captureChecking and is Scala 3 experimental
  • That ScopedCapabilities (zio-blocks/scope) is the stable alternative
  • Migration guidance if the feature stabilizes or is removed

This matters because CaptureCheckedOps.scala is in examples/ where users will encounter it first.

10. Utility.weightedNamed vs Utility.weighted — redundant APIs

Both methods solve the same problem (weighted combination). The weighted variant auto-generates names while weightedNamed requires explicit names. This is confusing. Consider deprecating weighted or documenting why both exist.

11. ExecutionPolicy.childPolicy appears unused

def childPolicy(fraction: Double): ExecutionPolicy = ...

This method doesn't appear to be called anywhere in the codebase. If it's intended for delegation, document it; if not, remove it to avoid API bloat.

12. CodexClientDouble.toInt on exit codes

item.exit_code.toOption.map(_.toInt)

JS exit codes are Double. For values outside Int range this silently truncates. Add bounds checking or document the assumption.

13. Test coverage gaps

The DSL test suite is solid, but several critical paths are untested:

  • CodexClient.startThread() / resumeThread() / runStreamed() (zero coverage)
  • CodexInterpreter integration with ExecutionPolicy and AgentBuilder
  • Error event routing through CodexEventMapper to AgentRun.result
  • AgentBuilder.build() with conflicting options (e.g., structured output + base options already having outputFormat)

✅ What's Well Done

  • Capability.scala and Depth.scala: The phantom type system is clean and correct. Inductive given instances for HasSpawn and DepthLTE are textbook-quality.
  • Agent.scala variance (-P, -I, +O) is correctly specified.
  • Security: No hardcoded keys anywhere; environment variable pattern is consistently used across all examples.
  • CaptureCheckSpec: Both positive and negative compile-time tests via typeChecks — this is the right way to test type-level enforcement.
  • build.mill ExampleRunner refactor: The EXAMPLE env-var dispatch via ExampleRunner is a significant UX improvement over the previous mainClass as Task.Input pattern.
  • A2A and MCP interop types: A2ARemoteAgent and McpSurface are well-factored protocol adapters with clear separation from the core DSL.

The foundation is strong. Addressing the race condition in ClaudeInterpreter and the Budget subtraction edge cases would be the highest-priority fixes before merge.

Critical fixes:
- Replace var state with Ref for atomic SharedState transitions in all
  three interpreters (Claude, A2A, Codex), preventing duplicate provider
  runs when events and result are consumed concurrently
- Harden FileSandbox.resolveSafe with realpathSync to reject symlink escapes
- Fail resultPromise on unsuccessful A2A/Codex completions instead of
  surfacing error text as successful output

Additional fixes:
- Add explicit Unlimited-Unlimited case in Budget subtraction
- Deduplicate ToolSurface.++ by tool name; sync filter() with allowedTools
- Wrap permit.consume() in ZIO for proper effect sequencing
- Log CodexClient.jsToJson conversion failures to stderr
- Remove unused ExecutionPolicy.childPolicy
- Add documentation for experimental package, isReadOnlyCompatible,
  withReadOnlyTools injection, withTools MCP caveat, depth check,
  Utility.weighted, and exit code assumption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Apr 6, 2026

Code Review: Provider-Independent DSL Layer

This is a substantial, well-designed PR. The core ideas — phantom-type capability tracking, Peano-encoded delegation depth, provider-independent Agent[P, I, O] with interpreter pattern — are architecturally sound. The 19 test suites show solid coverage of the type-level properties. Below are issues ranging from security bugs to design nits.


Security / Critical

Capabilities.scala — non-atomic mutations in BudgetSlice and ReviewPermit

// BudgetSlice.spend()
_remaining -= amount          // not atomic

// ReviewPermit.consume()
_remaining -= 1               // not atomic

Both classes are SharedCapability (implying shared ownership), yet state mutations use plain var fields without synchronization. Concurrent calls to spend() can overdraw the budget; concurrent consume() calls can exceed the review allowance. Fix: use AtomicReference or java.util.concurrent.atomic.AtomicDouble if targeting JVM, or a ZIO Ref if this ever runs in a ZIO fiber context.

Capabilities.scala — symlink escape edge case

catch
  case e: SecurityException => throw e
  case _: Throwable => resolved // root doesn't exist yet; lexical check passed

If the root directory is unreadable (permission denied on realpathSync), the whole symlink check is silently skipped and the code falls back to lexical resolution. A symlink planted alongside the root could escape the sandbox in this case. Consider making realpathSync failure on the root a hard error rather than a fallback.

Capabilities.scala — Node.js platform assumption

private val nodePath = scala.scalajs.js.Dynamic.global.require("node:path")
private val nodeFs   = scala.scalajs.js.Dynamic.global.require("node:fs")

This is Scala.js code that requires a Node.js runtime. The code will throw at construction time (not compile time) in any non-Node environment (browser, JVM, Deno). This should be documented prominently in the class Scaladoc, or the constructor should call require lazily and fail with a descriptive UnsupportedOperationException before any capability is used.


Bugs

TypedAgent.scala — depth comparison is off-by-one

// Line 75-76
require(
  child.maxRuntimeDepth < maxRuntimeDepth,
  s"Child depth ${child.maxRuntimeDepth} must be < parent depth $maxRuntimeDepth"
)

The type-level DepthLTE[CD, PD] evidence at call-site is defined as "child depth <= parent depth - 1", i.e., a strict predecessor relationship. But the runtime guard uses strict < on raw integers, so parent=2, child=2 would pass the type check (if DepthLTE is <=) but fail the runtime check, or vice-versa. Verify that DepthValue[D].value for Depth2 and the meaning of "predecessor" match the runtime comparison. A comment explaining the intended invariant would prevent this confusion.

ClaudeInterpreter.scala — daemon fiber leak

_ <- runner.forkDaemon   // no cleanup semantics

The runner fiber is forked but never interrupted if the result is abandoned (e.g., consumer calls .result and ignores the stream, or the ZIO runtime is shut down). This leaks a running fiber for each abandoned AgentRun. Use ZIO.scoped + Fiber.toManagedWith or wrap the fork in a ZIO.acquireRelease so the fiber is interrupted when the scope closes.

CodexEventMapper.scala — mutable shared state without isolation

class MapperState:
  var lastAgentMessage: Option[String] = None
  var startTimeMs: Long = System.currentTimeMillis()

MapperState is a mutable class. If the same MapperState is inadvertently shared across two concurrent streams, the fields corrupt each other. Consider making it immutable and threading state through ZStream.scan or ZStream.mapAccum.


Design Issues

Agent.scalaAgentRun[Any, O] loses ZIO environment type

def run(principal: P, input: I, policy: ExecutionPolicy): AgentRun[Any, O]

The Any environment erases any ZIO service requirements the underlying interpreter may need (e.g., ClaudeAgent, Scope). This forces all interpreters to either bake in their own ZLayer or use Unsafe escapes. Consider threading an R type parameter through Agent[R, P, I, O] and AgentRun[R, O] so callers can see and satisfy service requirements.

ExecutionPolicy.scala — no input validation

Retry(maxAttempts = 0) and maxTurns = Some(-1) are silently accepted. Add require guards:

case Retry(maxAttempts) => require(maxAttempts > 0, "maxAttempts must be positive")
// and in ExecutionPolicy constructor:
maxTurns.foreach(t => require(t > 0, "maxTurns must be positive"))

DelegationPolicy.scalamaxChildTurns not validated

maxChildTurns: Option[Int] = None

Some(-5) is accepted. Add maxChildTurns.foreach(t => require(t > 0, "maxChildTurns must be positive")).

AgentEvent.scalaDelegationStarted/DelegationFinished asymmetry

case DelegationStarted(childId: String, label: String)
case DelegationFinished(childId: String, status: String, summary: Option[RunSummary])

label is present on Started but missing on Finished, making it hard to correlate events in traces without joining on childId. status: String should be a sealed type (e.g., enum DelegationStatus { Success, Failed(error), Cancelled }) to prevent typos and enable exhaustive pattern matching.

Utility.scalasuccessGated breakdown double-counts the gate

The successGated wrapper appends a success_gate component to the breakdown, but the effect of the gate is already visible in the zero-ed out components. Downstream analysis code summing component contributions will over-attribute the impact of gating.


Minor / Quality

  • Budget.scala: Budget.usd(0.0) is allowed and silently produces a budget that immediately exhausts. Either document this as intentional (a no-op budget for dry-run scenarios) or add require(amount > 0).
  • ToolSurface.scala: isReadOnlyCompatible checks allowlist names, not handler behavior. A handler registered under ToolName.Read could still write. This limitation should be noted in the Scaladoc so users understand the check is name-based only.
  • TraceLogger.scala: The console logger uses print() directly; concurrent fiber output will interleave. Consider funneling through Console.print (ZIO's thread-safe version) or buffer per-run.
  • Capability.scala: CanReadMemory, CanWriteMemory, and CanEscalateHuman have no corresponding type-class extractors (HasReadMemory etc.), so they can be declared in phantom types but never queried at runtime. They're currently dead type-level information.
  • build.mill: The --no-server flag was removed from README examples. Make sure this is intentional and not a regression for environments where the Mill server causes issues (CI, Docker).

Test Coverage

The test suite is thorough for compile-time type checking (negative tests via typeChecks, builder accumulation, depth constraints). A few gaps worth considering:

  • No test verifies that BudgetSlice.spend() is safe under concurrent access.
  • ClaudeInterpreter tests (if any) appear to mock out the Claude SDK entirely — a runtime integration test with a mock that exercises the fiber/queue lifecycle would catch the resource-leak issue.
  • CodexEventMapper tests should include a case where MapperState is shared (even accidentally) to document/enforce isolation.

Overall this is high-quality exploratory work. The type-level safety story (Peano depth, phantom capabilities, DepthLTE) is genuinely novel and the builder pattern is clean. The main blockers before production use are the non-atomic mutations in Capabilities, the fiber leak in ClaudeInterpreter, and the Any environment erasure in AgentRun. The rest can be addressed incrementally.

…sed cancellation

- AgentBuilder now tracks McpToolSurface list; agentTransform receives it
- ClaudeInterpreter.claudeTransform registers ToolDef handlers through an
  implicit local MCP server (scalagent_dsl_local_tools), fixing the P2
  issue where withTools only forwarded names without runtime handlers
- ToolSurface.fromDefs derives MCP-qualified allowlist names so provider
  tool resolution matches the registered server
- ToolSurface.filter uses McpToolName-aware matching to correctly drop
  filtered tools even when allowedTools use qualified MCP names
- McpToolName.toToolName normalizes known names to built-in ToolName cases
  so read-only classification works for MCP tool surfaces
- ToolName.isReadOnly extended with McpResolveLibraryId, McpGetLibraryDocs
- All three interpreters: forkDaemon → forkScoped with onInterrupt cleanup,
  ensuring scope closure cancels the underlying provider run
- A2AServerAdapter: track active runs by taskId, cancelHandler interrupts
  the fiber instead of silently finishing
- New tests: scope-based interruption (Claude, A2A), local MCP server
  registration (ClaudeBuilder), MCP tool name normalization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Apr 6, 2026

Code Review: Provider-independent DSL with type-safe capabilities

This is a substantial, well-designed PR. The core ideas are sound and the implementation is generally clean. Below is a mix of blocking issues, design questions, and minor observations.


Architecture — overall impression

The phantom-type capability system (CanUseTools[T], CanSpawn[D], HasBudget, etc.), Peano-encoded delegation depth, and the Agent[-P, -I, +O] trait variance annotations are all appropriate choices. The shared-run pattern in ClaudeInterpreter (one provider invocation shared between .events and .result) is correctly implemented using an atomic Ref.modify + Promise, avoiding a double-execution race. Good work on that.


Issues worth fixing before merge

1. Budget.usd should guard against NaN/Infinity (Budget.scala)

Budget.usd(Double.PositiveInfinity) and Budget.usd(Double.NaN) both pass the current >= 0 check and silently create meaningless budget values that break arithmetic downstream. Add:

def usd(amount: Double): Budget =
  require(amount >= 0 && amount.isFinite, s"Budget must be a finite non-negative number: $amount")

Also the (_, Unlimited) arm in - deserves a comment ("spending an unlimited amount from a finite budget exhausts it") or an error, since this case is unlikely and confusing.

2. Utility.weightedNamed doesn't normalize weights (Utility.scala)

If caller passes weights summing to 2.0, the composite score can exceed 1.0. Either normalize internally, add a require that weights sum to approximately 1.0, or document the invariant. The factory docstring currently says nothing about this.

3. Runtime.default inside claudeTransform creates a detached runtime (ClaudeInterpreter.scala)

val runtime = Runtime.default

This runs at agent construction time and creates a new default ZIO runtime detached from any surrounding runtime context. If the caller's ZIO runtime has a custom layer (tracing, executor, etc.), the MCP tool server won't inherit it. Consider threading the runtime from the call site or using ZIO.runtime[Any] inside a ZIO effect.

4. Agent.run hardcodes R=Any (Agent.scala)

def run(principal: P, input: I, policy: ExecutionPolicy = ExecutionPolicy.unbounded): AgentRun[Any, O]

The return type is hardcoded to AgentRun[Any, O], preventing implementations from expressing environment requirements at the trait level. If intentional (all env deps resolved at interpreter construction), document this constraint clearly.


Design questions

5. Ref.unsafe.make in ClaudeInterpreter.make

val stateRef: Ref[SharedState[O]] = Unsafe.unsafe { implicit u =>
  Ref.unsafe.make[SharedState[O]](SharedState.Empty())
}

This works (ZIO Ref is a pure value so eager allocation is fine), but it's unusual and may surprise readers. Hoisting into a for-comprehension using Ref.make would be more idiomatic.

6. parseEvent fallthrough in CodexClient.scala discards unknown events

case other =>
  CodexEvent.Error(s"Unknown event type: $other")

If the Codex SDK adds a new event type, you'd see a synthetic Error event instead of structured unknown data. Consider a Native(tag: String, raw: js.Dynamic) case in CodexEvent (mirroring AgentEvent.Native).

7. CanReadMemory / CanWriteMemory defined but not wired

The capability markers exist in Capability.scala but the agentTransform only wires tool surfaces and MCP surfaces. If stubs for future work, a // TODO comment would make the gap explicit.


Minor observations

  • var opts in overlayPolicy — mutable variable pattern could be replaced with a fold chain for clarity, not wrong.
  • dsl-cells CLI key maps to DslClandestineCellExample — name mismatch is a minor discoverability issue.
  • --no-server flag removal — original ./mill --no-server in the README was likely there for CI reproducibility. Removed without explanation; worth confirming this is intentional.
  • Stale branch referencedocs/dsl/EXAMPLES.md line 1 says "All examples use the actual DSL API on dsl/core-exploration". Should be cleaned up before merge.
  • Tests — compile-time DepthLTE tests using typeChecks/typeCheckErrors are excellent. Would be good to add a BudgetSpec covering the Unlimited - Usd(x) arithmetic edge cases.

Summary

Severity Item
Fix Budget.usd should guard against NaN/Infinity
Fix Utility.weightedNamed — document or enforce weight normalization
Fix Runtime.default in claudeTransform creates detached runtime
Clarify Agent.run R=Any hardcoding — intentional or oversight?
Minor CanReadMemory/CanWriteMemory stubs need TODO annotations
Minor Stale branch reference in EXAMPLES.md
Minor --no-server removal from README needs explanation

Overall the design is solid and the type-safety story (phantom capabilities, Peano depth, DepthLTE compile-time checks) is well executed. The main concerns are Budget edge cases, missing weight validation in Utility, and the runtime-context issue in claudeTransform.

@arcaputo3 arcaputo3 merged commit 709b4cc into main Apr 6, 2026
3 checks passed
@arcaputo3 arcaputo3 deleted the dsl/core-exploration branch April 6, 2026 21:56
arcaputo3 added a commit that referenced this pull request Apr 7, 2026
…rmissions

Built on top of the squash-merged DSL core (#33). Adds:

- Safe-by-default tool access (no tools unless opted in)
- Structured output with StructuredOutput.derive + cross-provider (Claude + Codex)
- README rewrite leading with DSL, not SDK wrapper
- Directory-scoped agents with PreToolUse hook enforcement
- PermissionMode.Auto for model-classified tool approvals
- HookOutput.ToolPermission for per-tool-call permission decisions
- BuilderConfig refactor replacing 5-param agentTransform
- Codex sandboxedBuilder, typedBuilder, and SDK parity updates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
arcaputo3 added a commit that referenced this pull request Apr 7, 2026
…rmissions

Built on top of the squash-merged DSL core (#33). Adds:

- Safe-by-default tool access (no tools unless opted in)
- Structured output with StructuredOutput.derive + cross-provider (Claude + Codex)
- README rewrite leading with DSL, not SDK wrapper
- Directory-scoped agents with PreToolUse hook enforcement
- PermissionMode.Auto for model-classified tool approvals
- HookOutput.ToolPermission for per-tool-call permission decisions
- BuilderConfig refactor replacing 5-param agentTransform
- Codex sandboxedBuilder, typedBuilder, and SDK parity updates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
arcaputo3 added a commit that referenced this pull request Apr 7, 2026
…rmissions

Built on top of the squash-merged DSL core (#33). Adds:

- Safe-by-default tool access (no tools unless opted in)
- Structured output with StructuredOutput.derive + cross-provider (Claude + Codex)
- README rewrite leading with DSL, not SDK wrapper
- Directory-scoped agents with PreToolUse hook enforcement
- PermissionMode.Auto for model-classified tool approvals
- HookOutput.ToolPermission for per-tool-call permission decisions
- BuilderConfig refactor replacing 5-param agentTransform
- Codex sandboxedBuilder, typedBuilder, and SDK parity updates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
arcaputo3 added a commit that referenced this pull request Apr 7, 2026
…rmissions (#34)

Built on top of the squash-merged DSL core (#33). Adds:

- Safe-by-default tool access (no tools unless opted in)
- Structured output with StructuredOutput.derive + cross-provider (Claude + Codex)
- README rewrite leading with DSL, not SDK wrapper
- Directory-scoped agents with PreToolUse hook enforcement
- PermissionMode.Auto for model-classified tool approvals
- HookOutput.ToolPermission for per-tool-call permission decisions
- BuilderConfig refactor replacing 5-param agentTransform
- Codex sandboxedBuilder, typedBuilder, and SDK parity updates

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant