feat(dsl): provider-independent DSL with type-safe capabilities by arcaputo3 · Pull Request #33 · TJC-LP/scalagent

arcaputo3 · 2026-04-06T20:53:48Z

Summary

Introduces a provider-independent DSL layer for Scalagent that decouples agent definitions from any specific LLM provider (Claude, Codex, etc.).

Core type system: TypedAgent[In, Out], Capability, Budget, Depth (Peano-encoded delegation limits), ExecutionPolicy, Utility scoring, TraceLogger/TraceSummary, Classification-gated review
Interpreters: ClaudeInterpreter, CodexInterpreter, A2AInterpreter — same agent definition runs on any backend
Protocol support: First-class A2A and MCP capability types with McpSurface tool loading
Safety layers: Capture-checked capabilities (experimental), ScopedCapabilities with runtime enforcement, AgenticReview with explainable scoring and classified segregation
Builder API: Fluent AgentBuilder for constructing agents without boilerplate
Examples: 8 runnable examples (dsl-basic, dsl-builder, dsl-delegation, dsl-review, dsl-cells, dsl-codex, dsl-cross, capture)
Tests: 19 new test suites covering core types, interpreters, event mappers, and experimental features
Docs: docs/dsl/ with foundations, protocols, mapping guide, examples, roadmap, and next steps

Test plan

./mill examples.compile — all examples compile
./mill agent.test — full test suite passes
Manual: ./mill examples.run dsl-basic end-to-end with API key
Manual: ./mill examples.run dsl-codex with Codex API key
Review docs in docs/dsl/ for accuracy

🤖 Generated with Claude Code

…preter Add a semantic DSL core (`com.tjclp.scalagent.core`) that sits above the existing Claude-specific API without breaking it. This includes: - Agent[-P, -I, +O] trait with principal, input, and policy parameters - AgentRun[-R, +O] with shared event stream and typed result - AgentEvent enum (8 cases) normalizing provider events with Native escape hatch - ExecutionPolicy, Budget, StopStrategy, FallbackPolicy for semantic constraints - OutputCodec[O] type class bridging String and StructuredOutput[A] - RunSummary for terminal event metadata ClaudeInterpreter bridges core Agent to existing ClaudeAgent runtime using Queue+Promise shared execution (single provider call, both events and result consumable). EventMapper provides pure AgentMessage → List[AgentEvent] mapping. Design docs at docs/dsl/ cover foundations, type mapping, examples, protocol boundaries, and phased roadmap. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduce a capability system using Scala 3 intersection types and Peano-encoded delegation depth for compile-time enforcement: - Capability marker traits: CanUseTools[T], CanSpawn[D], CanReadMemory, CanWriteMemory, CanEscalateHuman, HasBudget - Peano depth types (Z, S[N]) with DepthLTE for compile-time comparison - TypedAgent[P, I, O, C] wraps Agent with phantom capability evidence - AgentBuilder accumulates capabilities via intersection type growth - HasSpawn/HasToolsCap type classes extract evidence from intersections - delegate() gated by HasSpawn — compile error without CanSpawn - delegateTyped() enforces DepthLTE at compile time — equal or deeper child depth is a type error, not just a runtime assertion - DelegationPolicy for budget slicing and turn limits - ToolSurface for typed tool collections (composes with ++) - ClaudeInterpreter.builder() provides agentTransform that wires tool restrictions into AgentOptions at runtime Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Make A2A and MCP first-class citizens in the DSL vocabulary: A2A (horizontal coordination): - A2ARemoteAgent trait extends Agent — remote agents are seamlessly usable as delegation targets, in AgentBuilder, or called directly - A2AInterpreter wraps A2AClient as A2ARemoteAgent (consumer) - A2AServerAdapter wraps any Agent as an A2AEndpoint (producer) - A2AEventMapper provides bidirectional StreamEvent ↔ AgentEvent mapping - CanDelegateA2A capability marker MCP (vertical capability access): - McpToolSurface wraps tools with server provenance, converts to ToolSurface - McpResource, McpResourceSurface for URI-addressed data sources - McpPrompt, McpPromptSurface for parameterized prompt templates - HasMcpTools, HasMcpResources, HasMcpPrompts capability markers - McpToolLoader bridges existing McpServer to DSL tool surfaces Note: MCP resources and prompts are forward-looking core types — the Claude Agent SDK currently only exposes tools. Implementations will be added when the SDK supports them. Also: unseal Capability trait so protocol packages can extend it. AgentBuilder gains withMcpTools, withMcpResources, withMcpPrompts, withA2ADelegation methods. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rovements Experimental capture checking (Phase 7): - FileSandbox, BudgetSlice, SpawnPermit extend SharedCapability - SandboxedRun provides scoped callbacks where capabilities can't escape - Compiler rejects closures that capture capabilities beyond scope - FileSandbox validates paths at runtime (no directory traversal) - CaptureCheckingExample demonstrates all three capability types - CaptureCheckSpec verifies compile-time rejection via typeChecks Runtime enforcement improvements (from agent review): - ToolSurface now tracks allowedTools list for provider enforcement - withReadOnlyTools validates tools are read-only compatible - ClaudeInterpreter.claudeTransform applies distinctAllowedTools - A2AInterpreter uses SharedRun pattern (single stream, both events and result consumable) - A2AServerAdapter enhanced with full request handling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add zio-blocks-scope 0.0.33 as dependency and create ScopedCapabilities that wraps FileSandbox, BudgetSlice, SpawnPermit as scope-managed Resources. This provides a stable alternative to experimental capture checking: - $[A] opaque types prevent capabilities from escaping scope boundaries - Unscoped type class gates what values can leave (no instance for capabilities = compile error if you try to return them) - $ operator macro validates lambda bodies (no capture in closures) - Zero runtime cost (opaque types erased) Two approaches now coexist in experimental/: - Capabilities.scala: real capture checking (SharedCapability + ^) - ScopedCapabilities.scala: zio-blocks/scope ($[A] + Unscoped + macros) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Make FileSandbox, BudgetSlice, SpawnPermit constructors private[experimental] — userland code must go through SandboxedRun or ScopedCapabilities factories - Fix path traversal: use nodePath.relative instead of startsWith to catch sibling-prefix escapes (e.g., /safe/../safe-evil/secret.txt) - Add sibling-prefix escape tests proving the naive startsWith approach is vulnerable but resolveSafe catches it - Add compile-time tests verifying direct instantiation is rejected Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Phase 6: Utility and Evaluation — making "good agent" measurable. Core types: - Utility[-P, -O]: observer-dependent scoring trait with built-in implementations (costMinimizing, reliability, latencyMinimizing, simplicityBiased, weighted composite) - TraceSummary: rich trace data foldable from AgentEvent streams (tool counts, delegation counts, timing, cost, tool names) - Complexity: execution graph metrics derived from TraceSummary (totalNodes, toolCallNodes, delegationNodes, graphDensity) - Evaluation[P, O]: bundles principal, output, trace, complexity, and score into a single inspectable result - TraceLogger: composable logging to arbitrary sinks (console, callback/JSONL, fan-out via all(), noop) Grounded in the formalization paper: - U_ω(α) = E[P(accept | α(x))] → Utility trait - C(α) = E[|G(x)| | x] → Complexity metrics Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Captures the next steps after completing all 7 roadmap phases: PR/review, docs update, integration testing, ContextKernel, SDK parity, Conversation DSL. Lists known gaps and key patterns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add 3 runnable DSL examples (dsl-basic, dsl-builder, dsl-delegation) demonstrating one-shot/streaming, builder+evaluation, and typed delegation - withTools/withMcpTools now produce CanUseTools[CustomTools] instead of AllTools — honest about what the surface actually declares - Add TraceLogger.callbackZIO for effectful logging without unsafe nesting - Rewrite EXAMPLES.md with 11 examples using actual implemented API - Update NEXT.md with complete commit table and current branch state - Add TraceLoggerSpec, update TypedAgentSpec for CustomTools Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Second interpreter backed by OpenAI Codex SDK (@openai/codex-sdk@0.118.0): - codex/CodexSdk.scala: JS facades for Codex, Thread, ThreadEvent, ThreadItem - codex/CodexOptions.scala: SandboxMode, ApprovalMode, client/thread options - codex/CodexClient.scala: Pure Scala wrapper + CodexEvent/CodexItem ADTs - interop/codex/CodexEventMapper.scala: 8 item types → AgentEvent normalization - interop/codex/CodexInterpreter.scala: Agent impl with SharedRun + builder Zero changes to core/ — same Agent, AgentRun, AgentEvent, ExecutionPolicy, AgentBuilder, TypedAgent, TraceSummary, Utility, Evaluation types work identically across Claude and Codex interpreters. 43 test suites, 0 failures (16 new Codex event mapper tests, 7 options tests). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- DslCodexExample: one-shot, builder, evaluation via Codex interpreter - DslCrossProviderExample: Claude ↔ Codex chain (same DSL, two providers) - ExampleRunner: runtime dispatcher — all examples linked once, selected at runtime via EXAMPLE env var. Fixes --example flag which was broken because Mill evaluates Task.Input before Task.Command bodies run. - examples.go now passes --example as EXAMPLE env to bun subprocess Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Mill ScalaJS run() drops CLI args entirely. Override run to pass the first arg as EXAMPLE env var to bun, where ExampleRunner reads it. ./mill examples.run dsl-basic # run specific example ./mill examples.run -- --help # list all 19 examples ./mill examples.run # default (macro) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- ROADMAP.md: mark all 8 phases completed with commit refs - NEXT.md: 12 commits, 43 test suites, add Running Examples section - EXAMPLES.md: add Codex + cross-provider examples (12-13), running section - MAPPING.md: add Codex type inventory and interpreter table - README.md: add examples.run commands, update project structure - CLAUDE.md: add examples section Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Demonstrates capability-segregated agent cells with need-to-know communication, classified review routing, and explainable scoring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-04-06T20:57:39Z

PR Review: `feat(dsl): provider-independent DSL with type-safe capabilities`

This is a substantial and well-architected PR. The core type system design (Peano-encoded Depth, capability phantom types, Agent[P, I, O] variance) is excellent — these are the right abstractions for a provider-independent DSL. Below is a structured review organized by severity.

🔴 Critical

1. Race condition in `ClaudeInterpreter.scala` — shared state without synchronization

The getShared method uses a mutable var state that is read and written without any synchronization:

var state: SharedState[O] = SharedState.Empty()

def getShared: UIO[SharedRun[O]] =
  ZIO.suspendSucceed {
    state match
      case SharedState.Empty() =>
        for
          promise <- Promise.make[Nothing, SharedRun[O]]
          _ <- ZIO.succeed { state = SharedState.Initializing(promise) }  // not atomic
          ...

Two concurrent fibers can both observe SharedState.Empty() and both try to initialize, creating two independent underlying executions — breaking the single-execution contract. Fix: replace var state with Ref[SharedState[O]] and use Ref.modify for atomic state transitions.

🟠 High

2. `Budget` subtraction semantics are incorrect for the `Unlimited - Unlimited` case

def -(other: Budget): Budget = (this, other) match
  case (Unlimited, _)  => Unlimited
  case (_, Unlimited)  => Usd(0.0)  // <-- Unlimited - Unlimited hits this, returning $0
  case (Usd(a), Usd(b)) => Usd(math.max(0, a - b))

Unlimited - Unlimited resolves to the second case and returns Usd(0.0). This is also a precision concern: using Double for financial values can cause epsilon drift over many slice() operations. Consider BigDecimal or at minimum document the precision trade-off. Also clarify whether Usd(0.0).isExhausted should return true or false (currently true via <= 0).

🟡 Medium

3. `AgentBuilder.withReadOnlyTools` — validation happens after combining

val combined = tools ++ surface
require(combined.isReadOnlyCompatible, ...)

The ToolSurface is combined before the read-only check, meaning tools is already mutated in the new builder instance if a downstream check fails. Additionally, withReadOnlyTools silently injects ToolSurface.readOnlyBuiltins — callers may not expect implicit tools to be added. Consider either making this explicit or documenting it clearly in the method's scaladoc.

4. `ToolSurface.++` doesn't deduplicate

def ++(other: ToolSurface): ToolSurface =
  ToolSurface(tools = tools ++ other.tools, allowedTools = allowedTools ++ other.allowedTools)

Multiple with* calls on AgentBuilder can produce duplicate tool entries. distinctAllowedTools only deduplicates at query time — not at construction. isReadOnlyCompatible and size can return inconsistent results. Consider deduplicating in ++ by tool name.

5. `isReadOnlyCompatible` validates the allowlist, not the tools themselves

def isReadOnlyCompatible: Boolean = allowedTools.forall(ToolName.isReadOnly)

A ToolDef can have side-effects while being registered under a read-only ToolName. The current check only validates the allowlist, not the actual tool behavior. This is a documentation gap at minimum.

6. `TypedAgent.delegateTyped` — runtime check direction is confusing

require(child.maxRuntimeDepth < maxRuntimeDepth, ...)

The assertion message says "child depth must be < parent depth" but this compares the integer values of depth levels, not delegation capacity. With Peano encoding, Depth2 has maxRuntimeDepth = 2 and Depth1 has maxRuntimeDepth = 1, so the check is functionally correct — but the mental model in the message conflicts with the Peano semantics described in Depth.scala. Add a comment explaining the Int vs. Peano mapping.

7. `CodexClient.jsToJson` silently swallows errors

private def jsToJson(value: js.Any): Json =
  try { ... zio.json.ast.Json.decoder.decodeJson(str).getOrElse(Json.Str(str)) }
  catch case _: Throwable => Json.Null

Failed MCP tool argument parsing is silently dropped as Json.Null. This makes debugging Codex tool calls very difficult. Consider returning a structured error value or at least logging the failure.

8. `AgenticReview.enrich` — permit consumed even if review fails

The permit.consume() call is not monadic — if it throws, the effect chain still proceeds to Evaluation.withReview(...). Should be permit.consume() *> Evaluation.withReview(evaluation, reviewer) to enforce sequencing.

🔵 Low / Suggestions

9. Missing experimental module documentation

The experimental/ package has no package.scala or README explaining:

That capture checking requires -language:experimental.captureChecking and is Scala 3 experimental
That ScopedCapabilities (zio-blocks/scope) is the stable alternative
Migration guidance if the feature stabilizes or is removed

This matters because CaptureCheckedOps.scala is in examples/ where users will encounter it first.

10. `Utility.weightedNamed` vs `Utility.weighted` — redundant APIs

Both methods solve the same problem (weighted combination). The weighted variant auto-generates names while weightedNamed requires explicit names. This is confusing. Consider deprecating weighted or documenting why both exist.

11. `ExecutionPolicy.childPolicy` appears unused

def childPolicy(fraction: Double): ExecutionPolicy = ...

This method doesn't appear to be called anywhere in the codebase. If it's intended for delegation, document it; if not, remove it to avoid API bloat.

12. `CodexClient` — `Double.toInt` on exit codes

item.exit_code.toOption.map(_.toInt)

JS exit codes are Double. For values outside Int range this silently truncates. Add bounds checking or document the assumption.

13. Test coverage gaps

The DSL test suite is solid, but several critical paths are untested:

CodexClient.startThread() / resumeThread() / runStreamed() (zero coverage)
CodexInterpreter integration with ExecutionPolicy and AgentBuilder
Error event routing through CodexEventMapper to AgentRun.result
AgentBuilder.build() with conflicting options (e.g., structured output + base options already having outputFormat)

✅ What's Well Done

Capability.scala and Depth.scala: The phantom type system is clean and correct. Inductive given instances for HasSpawn and DepthLTE are textbook-quality.
Agent.scala variance (-P, -I, +O) is correctly specified.
Security: No hardcoded keys anywhere; environment variable pattern is consistently used across all examples.
CaptureCheckSpec: Both positive and negative compile-time tests via typeChecks — this is the right way to test type-level enforcement.
build.mill ExampleRunner refactor: The EXAMPLE env-var dispatch via ExampleRunner is a significant UX improvement over the previous mainClass as Task.Input pattern.
A2A and MCP interop types: A2ARemoteAgent and McpSurface are well-factored protocol adapters with clear separation from the core DSL.

The foundation is strong. Addressing the race condition in ClaudeInterpreter and the Budget subtraction edge cases would be the highest-priority fixes before merge.

Critical fixes: - Replace var state with Ref for atomic SharedState transitions in all three interpreters (Claude, A2A, Codex), preventing duplicate provider runs when events and result are consumed concurrently - Harden FileSandbox.resolveSafe with realpathSync to reject symlink escapes - Fail resultPromise on unsuccessful A2A/Codex completions instead of surfacing error text as successful output Additional fixes: - Add explicit Unlimited-Unlimited case in Budget subtraction - Deduplicate ToolSurface.++ by tool name; sync filter() with allowedTools - Wrap permit.consume() in ZIO for proper effect sequencing - Log CodexClient.jsToJson conversion failures to stderr - Remove unused ExecutionPolicy.childPolicy - Add documentation for experimental package, isReadOnlyCompatible, withReadOnlyTools injection, withTools MCP caveat, depth check, Utility.weighted, and exit code assumption Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-04-06T21:21:37Z

Code Review: Provider-Independent DSL Layer

This is a substantial, well-designed PR. The core ideas — phantom-type capability tracking, Peano-encoded delegation depth, provider-independent Agent[P, I, O] with interpreter pattern — are architecturally sound. The 19 test suites show solid coverage of the type-level properties. Below are issues ranging from security bugs to design nits.

Security / Critical

Capabilities.scala — non-atomic mutations in BudgetSlice and ReviewPermit

// BudgetSlice.spend()
_remaining -= amount          // not atomic

// ReviewPermit.consume()
_remaining -= 1               // not atomic

Both classes are SharedCapability (implying shared ownership), yet state mutations use plain var fields without synchronization. Concurrent calls to spend() can overdraw the budget; concurrent consume() calls can exceed the review allowance. Fix: use AtomicReference or java.util.concurrent.atomic.AtomicDouble if targeting JVM, or a ZIO Ref if this ever runs in a ZIO fiber context.

Capabilities.scala — symlink escape edge case

catch
  case e: SecurityException => throw e
  case _: Throwable => resolved // root doesn't exist yet; lexical check passed

If the root directory is unreadable (permission denied on realpathSync), the whole symlink check is silently skipped and the code falls back to lexical resolution. A symlink planted alongside the root could escape the sandbox in this case. Consider making realpathSync failure on the root a hard error rather than a fallback.

Capabilities.scala — Node.js platform assumption

private val nodePath = scala.scalajs.js.Dynamic.global.require("node:path")
private val nodeFs   = scala.scalajs.js.Dynamic.global.require("node:fs")

This is Scala.js code that requires a Node.js runtime. The code will throw at construction time (not compile time) in any non-Node environment (browser, JVM, Deno). This should be documented prominently in the class Scaladoc, or the constructor should call require lazily and fail with a descriptive UnsupportedOperationException before any capability is used.

Bugs

TypedAgent.scala — depth comparison is off-by-one

// Line 75-76
require(
  child.maxRuntimeDepth < maxRuntimeDepth,
  s"Child depth ${child.maxRuntimeDepth} must be < parent depth $maxRuntimeDepth"
)

The type-level DepthLTE[CD, PD] evidence at call-site is defined as "child depth <= parent depth - 1", i.e., a strict predecessor relationship. But the runtime guard uses strict < on raw integers, so parent=2, child=2 would pass the type check (if DepthLTE is <=) but fail the runtime check, or vice-versa. Verify that DepthValue[D].value for Depth2 and the meaning of "predecessor" match the runtime comparison. A comment explaining the intended invariant would prevent this confusion.

ClaudeInterpreter.scala — daemon fiber leak

_ <- runner.forkDaemon   // no cleanup semantics

The runner fiber is forked but never interrupted if the result is abandoned (e.g., consumer calls .result and ignores the stream, or the ZIO runtime is shut down). This leaks a running fiber for each abandoned AgentRun. Use ZIO.scoped + Fiber.toManagedWith or wrap the fork in a ZIO.acquireRelease so the fiber is interrupted when the scope closes.

CodexEventMapper.scala — mutable shared state without isolation

class MapperState:
  var lastAgentMessage: Option[String] = None
  var startTimeMs: Long = System.currentTimeMillis()

MapperState is a mutable class. If the same MapperState is inadvertently shared across two concurrent streams, the fields corrupt each other. Consider making it immutable and threading state through ZStream.scan or ZStream.mapAccum.

Design Issues

Agent.scala — AgentRun[Any, O] loses ZIO environment type

def run(principal: P, input: I, policy: ExecutionPolicy): AgentRun[Any, O]

The Any environment erases any ZIO service requirements the underlying interpreter may need (e.g., ClaudeAgent, Scope). This forces all interpreters to either bake in their own ZLayer or use Unsafe escapes. Consider threading an R type parameter through Agent[R, P, I, O] and AgentRun[R, O] so callers can see and satisfy service requirements.

ExecutionPolicy.scala — no input validation

Retry(maxAttempts = 0) and maxTurns = Some(-1) are silently accepted. Add require guards:

case Retry(maxAttempts) => require(maxAttempts > 0, "maxAttempts must be positive")
// and in ExecutionPolicy constructor:
maxTurns.foreach(t => require(t > 0, "maxTurns must be positive"))

DelegationPolicy.scala — maxChildTurns not validated

maxChildTurns: Option[Int] = None

Some(-5) is accepted. Add maxChildTurns.foreach(t => require(t > 0, "maxChildTurns must be positive")).

AgentEvent.scala — DelegationStarted/DelegationFinished asymmetry

case DelegationStarted(childId: String, label: String)
case DelegationFinished(childId: String, status: String, summary: Option[RunSummary])

label is present on Started but missing on Finished, making it hard to correlate events in traces without joining on childId. status: String should be a sealed type (e.g., enum DelegationStatus { Success, Failed(error), Cancelled }) to prevent typos and enable exhaustive pattern matching.

Utility.scala — successGated breakdown double-counts the gate

The successGated wrapper appends a success_gate component to the breakdown, but the effect of the gate is already visible in the zero-ed out components. Downstream analysis code summing component contributions will over-attribute the impact of gating.

Minor / Quality

Budget.scala: Budget.usd(0.0) is allowed and silently produces a budget that immediately exhausts. Either document this as intentional (a no-op budget for dry-run scenarios) or add require(amount > 0).
ToolSurface.scala: isReadOnlyCompatible checks allowlist names, not handler behavior. A handler registered under ToolName.Read could still write. This limitation should be noted in the Scaladoc so users understand the check is name-based only.
TraceLogger.scala: The console logger uses print() directly; concurrent fiber output will interleave. Consider funneling through Console.print (ZIO's thread-safe version) or buffer per-run.
Capability.scala: CanReadMemory, CanWriteMemory, and CanEscalateHuman have no corresponding type-class extractors (HasReadMemory etc.), so they can be declared in phantom types but never queried at runtime. They're currently dead type-level information.
build.mill: The --no-server flag was removed from README examples. Make sure this is intentional and not a regression for environments where the Mill server causes issues (CI, Docker).

Test Coverage

The test suite is thorough for compile-time type checking (negative tests via typeChecks, builder accumulation, depth constraints). A few gaps worth considering:

No test verifies that BudgetSlice.spend() is safe under concurrent access.
ClaudeInterpreter tests (if any) appear to mock out the Claude SDK entirely — a runtime integration test with a mock that exercises the fiber/queue lifecycle would catch the resource-leak issue.
CodexEventMapper tests should include a case where MapperState is shared (even accidentally) to document/enforce isolation.

Overall this is high-quality exploratory work. The type-level safety story (Peano depth, phantom capabilities, DepthLTE) is genuinely novel and the builder pattern is clean. The main blockers before production use are the non-atomic mutations in Capabilities, the fiber leak in ClaudeInterpreter, and the Any environment erasure in AgentRun. The rest can be addressed incrementally.

…sed cancellation - AgentBuilder now tracks McpToolSurface list; agentTransform receives it - ClaudeInterpreter.claudeTransform registers ToolDef handlers through an implicit local MCP server (scalagent_dsl_local_tools), fixing the P2 issue where withTools only forwarded names without runtime handlers - ToolSurface.fromDefs derives MCP-qualified allowlist names so provider tool resolution matches the registered server - ToolSurface.filter uses McpToolName-aware matching to correctly drop filtered tools even when allowedTools use qualified MCP names - McpToolName.toToolName normalizes known names to built-in ToolName cases so read-only classification works for MCP tool surfaces - ToolName.isReadOnly extended with McpResolveLibraryId, McpGetLibraryDocs - All three interpreters: forkDaemon → forkScoped with onInterrupt cleanup, ensuring scope closure cancels the underlying provider run - A2AServerAdapter: track active runs by taskId, cancelHandler interrupts the fiber instead of silently finishing - New tests: scope-based interruption (Claude, A2A), local MCP server registration (ClaudeBuilder), MCP tool name normalization Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-04-06T21:53:29Z

Code Review: Provider-independent DSL with type-safe capabilities

This is a substantial, well-designed PR. The core ideas are sound and the implementation is generally clean. Below is a mix of blocking issues, design questions, and minor observations.

Architecture — overall impression

The phantom-type capability system (CanUseTools[T], CanSpawn[D], HasBudget, etc.), Peano-encoded delegation depth, and the Agent[-P, -I, +O] trait variance annotations are all appropriate choices. The shared-run pattern in ClaudeInterpreter (one provider invocation shared between .events and .result) is correctly implemented using an atomic Ref.modify + Promise, avoiding a double-execution race. Good work on that.

Issues worth fixing before merge

1. `Budget.usd` should guard against `NaN`/`Infinity` (`Budget.scala`)

Budget.usd(Double.PositiveInfinity) and Budget.usd(Double.NaN) both pass the current >= 0 check and silently create meaningless budget values that break arithmetic downstream. Add:

def usd(amount: Double): Budget =
  require(amount >= 0 && amount.isFinite, s"Budget must be a finite non-negative number: $amount")

Also the (_, Unlimited) arm in - deserves a comment ("spending an unlimited amount from a finite budget exhausts it") or an error, since this case is unlikely and confusing.

2. `Utility.weightedNamed` doesn't normalize weights (`Utility.scala`)

If caller passes weights summing to 2.0, the composite score can exceed 1.0. Either normalize internally, add a require that weights sum to approximately 1.0, or document the invariant. The factory docstring currently says nothing about this.

3. `Runtime.default` inside `claudeTransform` creates a detached runtime (`ClaudeInterpreter.scala`)

val runtime = Runtime.default

This runs at agent construction time and creates a new default ZIO runtime detached from any surrounding runtime context. If the caller's ZIO runtime has a custom layer (tracing, executor, etc.), the MCP tool server won't inherit it. Consider threading the runtime from the call site or using ZIO.runtime[Any] inside a ZIO effect.

4. `Agent.run` hardcodes `R=Any` (`Agent.scala`)

def run(principal: P, input: I, policy: ExecutionPolicy = ExecutionPolicy.unbounded): AgentRun[Any, O]

The return type is hardcoded to AgentRun[Any, O], preventing implementations from expressing environment requirements at the trait level. If intentional (all env deps resolved at interpreter construction), document this constraint clearly.

Design questions

5. `Ref.unsafe.make` in `ClaudeInterpreter.make`

val stateRef: Ref[SharedState[O]] = Unsafe.unsafe { implicit u =>
  Ref.unsafe.make[SharedState[O]](SharedState.Empty())
}

This works (ZIO Ref is a pure value so eager allocation is fine), but it's unusual and may surprise readers. Hoisting into a for-comprehension using Ref.make would be more idiomatic.

6. `parseEvent` fallthrough in `CodexClient.scala` discards unknown events

case other =>
  CodexEvent.Error(s"Unknown event type: $other")

If the Codex SDK adds a new event type, you'd see a synthetic Error event instead of structured unknown data. Consider a Native(tag: String, raw: js.Dynamic) case in CodexEvent (mirroring AgentEvent.Native).

7. `CanReadMemory` / `CanWriteMemory` defined but not wired

The capability markers exist in Capability.scala but the agentTransform only wires tool surfaces and MCP surfaces. If stubs for future work, a // TODO comment would make the gap explicit.

Minor observations

var opts in overlayPolicy — mutable variable pattern could be replaced with a fold chain for clarity, not wrong.
dsl-cells CLI key maps to DslClandestineCellExample — name mismatch is a minor discoverability issue.
--no-server flag removal — original ./mill --no-server in the README was likely there for CI reproducibility. Removed without explanation; worth confirming this is intentional.
Stale branch reference — docs/dsl/EXAMPLES.md line 1 says "All examples use the actual DSL API on dsl/core-exploration". Should be cleaned up before merge.
Tests — compile-time DepthLTE tests using typeChecks/typeCheckErrors are excellent. Would be good to add a BudgetSpec covering the Unlimited - Usd(x) arithmetic edge cases.

Summary

Severity	Item
Fix	`Budget.usd` should guard against `NaN`/`Infinity`
Fix	`Utility.weightedNamed` — document or enforce weight normalization
Fix	`Runtime.default` in `claudeTransform` creates detached runtime
Clarify	`Agent.run` `R=Any` hardcoding — intentional or oversight?
Minor	`CanReadMemory`/`CanWriteMemory` stubs need `TODO` annotations
Minor	Stale branch reference in `EXAMPLES.md`
Minor	`--no-server` removal from README needs explanation

Overall the design is solid and the type-safety story (phantom capabilities, Peano depth, DepthLTE compile-time checks) is well executed. The main concerns are Budget edge cases, missing weight validation in Utility, and the runtime-context issue in claudeTransform.

…rmissions Built on top of the squash-merged DSL core (#33). Adds: - Safe-by-default tool access (no tools unless opted in) - Structured output with StructuredOutput.derive + cross-provider (Claude + Codex) - README rewrite leading with DSL, not SDK wrapper - Directory-scoped agents with PreToolUse hook enforcement - PermissionMode.Auto for model-classified tool approvals - HookOutput.ToolPermission for per-tool-call permission decisions - BuilderConfig refactor replacing 5-param agentTransform - Codex sandboxedBuilder, typedBuilder, and SDK parity updates Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rmissions (#34) Built on top of the squash-merged DSL core (#33). Adds: - Safe-by-default tool access (no tools unless opted in) - Structured output with StructuredOutput.derive + cross-provider (Claude + Codex) - README rewrite leading with DSL, not SDK wrapper - Directory-scoped agents with PreToolUse hook enforcement - PermissionMode.Auto for model-classified tool approvals - HookOutput.ToolPermission for per-tool-call permission decisions - BuilderConfig refactor replacing 5-param agentTransform - Codex sandboxedBuilder, typedBuilder, and SDK parity updates Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

arcaputo3 and others added 16 commits April 3, 2026 13:09

feat(dsl): add explainable scoring and gated semantic review

21fdd97

feat(dsl): add classified review segregation

bc7868f

feat(dsl): add clandestine cell zero-trust example

2f53846

Demonstrates capability-segregated agent cells with need-to-know communication, classified review routing, and explainable scoring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

arcaputo3 merged commit 709b4cc into main Apr 6, 2026
3 checks passed

arcaputo3 deleted the dsl/core-exploration branch April 6, 2026 21:56

arcaputo3 mentioned this pull request Apr 8, 2026

chore(release): Bump version to 0.6.0 #36

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(dsl): provider-independent DSL with type-safe capabilities#33

feat(dsl): provider-independent DSL with type-safe capabilities#33
arcaputo3 merged 18 commits intomainfrom
dsl/core-exploration

arcaputo3 commented Apr 6, 2026

Uh oh!

claude bot commented Apr 6, 2026

Uh oh!

claude bot commented Apr 6, 2026

Uh oh!

claude bot commented Apr 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

arcaputo3 commented Apr 6, 2026

Summary

Test plan

Uh oh!

claude bot commented Apr 6, 2026

PR Review: feat(dsl): provider-independent DSL with type-safe capabilities

🔴 Critical

1. Race condition in ClaudeInterpreter.scala — shared state without synchronization

🟠 High

2. Budget subtraction semantics are incorrect for the Unlimited - Unlimited case

🟡 Medium

3. AgentBuilder.withReadOnlyTools — validation happens after combining

4. ToolSurface.++ doesn't deduplicate

5. isReadOnlyCompatible validates the allowlist, not the tools themselves

6. TypedAgent.delegateTyped — runtime check direction is confusing

7. CodexClient.jsToJson silently swallows errors

8. AgenticReview.enrich — permit consumed even if review fails

🔵 Low / Suggestions

9. Missing experimental module documentation

10. Utility.weightedNamed vs Utility.weighted — redundant APIs

11. ExecutionPolicy.childPolicy appears unused

12. CodexClient — Double.toInt on exit codes

13. Test coverage gaps

✅ What's Well Done

Uh oh!

claude bot commented Apr 6, 2026

Code Review: Provider-Independent DSL Layer

Security / Critical

Bugs

Design Issues

Minor / Quality

Test Coverage

Uh oh!

claude bot commented Apr 6, 2026

Code Review: Provider-independent DSL with type-safe capabilities

Architecture — overall impression

Issues worth fixing before merge

1. Budget.usd should guard against NaN/Infinity (Budget.scala)

2. Utility.weightedNamed doesn't normalize weights (Utility.scala)

3. Runtime.default inside claudeTransform creates a detached runtime (ClaudeInterpreter.scala)

4. Agent.run hardcodes R=Any (Agent.scala)

Design questions

5. Ref.unsafe.make in ClaudeInterpreter.make

6. parseEvent fallthrough in CodexClient.scala discards unknown events

7. CanReadMemory / CanWriteMemory defined but not wired

Minor observations

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

PR Review: `feat(dsl): provider-independent DSL with type-safe capabilities`

1. Race condition in `ClaudeInterpreter.scala` — shared state without synchronization

2. `Budget` subtraction semantics are incorrect for the `Unlimited - Unlimited` case

3. `AgentBuilder.withReadOnlyTools` — validation happens after combining

4. `ToolSurface.++` doesn't deduplicate

5. `isReadOnlyCompatible` validates the allowlist, not the tools themselves

6. `TypedAgent.delegateTyped` — runtime check direction is confusing

7. `CodexClient.jsToJson` silently swallows errors

8. `AgenticReview.enrich` — permit consumed even if review fails

10. `Utility.weightedNamed` vs `Utility.weighted` — redundant APIs

11. `ExecutionPolicy.childPolicy` appears unused

12. `CodexClient` — `Double.toInt` on exit codes

1. `Budget.usd` should guard against `NaN`/`Infinity` (`Budget.scala`)

2. `Utility.weightedNamed` doesn't normalize weights (`Utility.scala`)

3. `Runtime.default` inside `claudeTransform` creates a detached runtime (`ClaudeInterpreter.scala`)

4. `Agent.run` hardcodes `R=Any` (`Agent.scala`)

5. `Ref.unsafe.make` in `ClaudeInterpreter.make`

6. `parseEvent` fallthrough in `CodexClient.scala` discards unknown events

7. `CanReadMemory` / `CanWriteMemory` defined but not wired