feat(dsl): provider-independent DSL with type-safe capabilities #33
…preter

Add a semantic DSL core (`com.tjclp.scalagent.core`) that sits above the existing Claude-specific API without breaking it. This includes:
- Agent[-P, -I, +O] trait with principal, input, and policy parameters
- AgentRun[-R, +O] with shared event stream and typed result
- AgentEvent enum (8 cases) normalizing provider events with Native escape hatch
- ExecutionPolicy, Budget, StopStrategy, FallbackPolicy for semantic constraints
- OutputCodec[O] type class bridging String and StructuredOutput[A]
- RunSummary for terminal event metadata

ClaudeInterpreter bridges core Agent to existing ClaudeAgent runtime using Queue+Promise shared execution (single provider call, both events and result consumable). EventMapper provides pure AgentMessage → List[AgentEvent] mapping.

Design docs at docs/dsl/ cover foundations, type mapping, examples, protocol boundaries, and phased roadmap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce a capability system using Scala 3 intersection types and Peano-encoded delegation depth for compile-time enforcement:
- Capability marker traits: CanUseTools[T], CanSpawn[D], CanReadMemory, CanWriteMemory, CanEscalateHuman, HasBudget
- Peano depth types (Z, S[N]) with DepthLTE for compile-time comparison
- TypedAgent[P, I, O, C] wraps Agent with phantom capability evidence
- AgentBuilder accumulates capabilities via intersection type growth
- HasSpawn/HasToolsCap type classes extract evidence from intersections
- delegate() gated by HasSpawn: compile error without CanSpawn
- delegateTyped() enforces DepthLTE at compile time: equal or deeper child depth is a type error, not just a runtime assertion
- DelegationPolicy for budget slicing and turn limits
- ToolSurface for typed tool collections (composes with ++)
- ClaudeInterpreter.builder() provides agentTransform that wires tool restrictions into AgentOptions at runtime

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
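The compile-time depth check described in this commit can be illustrated with a small, self-contained sketch. It is not the PR's code: `DepthLT`, `Spawner`, and the given names are hypothetical, and the real `DepthLTE` may be encoded differently. The sketch assumes the rule stated above, namely that a child must be strictly shallower than its parent.

```scala
// Hypothetical sketch: names loosely mirror the commit message, not the actual source.
sealed trait Nat
sealed trait Z extends Nat
sealed trait S[N <: Nat] extends Nat

// Evidence that Child is strictly shallower than Parent.
sealed trait DepthLT[Child <: Nat, Parent <: Nat]
object DepthLT:
  // Base case: Z is shallower than any successor.
  given zeroLtSucc[N <: Nat]: DepthLT[Z, S[N]] = new DepthLT[Z, S[N]] {}
  // Inductive case: A < B implies S[A] < S[B].
  given succLtSucc[A <: Nat, B <: Nat](using DepthLT[A, B]): DepthLT[S[A], S[B]] =
    new DepthLT[S[A], S[B]] {}

// D is a phantom depth; it exists only at the type level.
final case class Spawner[D <: Nat](label: String):
  // Compiles only when the compiler can derive DepthLT[C, D].
  def delegateTyped[C <: Nat](child: Spawner[C])(using DepthLT[C, D]): String =
    s"$label -> ${child.label}"

type One = S[Z]
type Two = S[S[Z]]

val parent = Spawner[Two]("parent")
val child = Spawner[One]("child")
val ok: String = parent.delegateTyped(child)
// child.delegateTyped(parent) is rejected: no DepthLT[Two, One] can be derived.
```

Because the evidence is resolved by given search, an equal or deeper child fails at the call site rather than at runtime.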
Make A2A and MCP first-class citizens in the DSL vocabulary:

A2A (horizontal coordination):
- A2ARemoteAgent trait extends Agent: remote agents are seamlessly usable as delegation targets, in AgentBuilder, or called directly
- A2AInterpreter wraps A2AClient as A2ARemoteAgent (consumer)
- A2AServerAdapter wraps any Agent as an A2AEndpoint (producer)
- A2AEventMapper provides bidirectional StreamEvent ↔ AgentEvent mapping
- CanDelegateA2A capability marker

MCP (vertical capability access):
- McpToolSurface wraps tools with server provenance, converts to ToolSurface
- McpResource, McpResourceSurface for URI-addressed data sources
- McpPrompt, McpPromptSurface for parameterized prompt templates
- HasMcpTools, HasMcpResources, HasMcpPrompts capability markers
- McpToolLoader bridges existing McpServer to DSL tool surfaces

Note: MCP resources and prompts are forward-looking core types; the Claude Agent SDK currently only exposes tools. Implementations will be added when the SDK supports them.

Also: unseal Capability trait so protocol packages can extend it. AgentBuilder gains withMcpTools, withMcpResources, withMcpPrompts, withA2ADelegation methods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
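The way a builder can accumulate capability markers through intersection types, and gate an operation on one of them, might look roughly like the sketch below. All names here (`Builder`, `withTools`, `withSpawn`, `delegate`) are hypothetical stand-ins; the PR's `AgentBuilder`, `HasSpawn`, and MCP markers are not reproduced.

```scala
// Hypothetical capability markers; the real ones carry type parameters.
trait Capability
trait CanUseTools extends Capability
trait CanSpawn extends Capability

// C is a phantom type that grows by intersection as capabilities are added.
final class Builder[C] private (val granted: List[String]):
  def withTools: Builder[C & CanUseTools] = new Builder(granted :+ "tools")
  def withSpawn: Builder[C & CanSpawn] = new Builder(granted :+ "spawn")

object Builder:
  def empty: Builder[Capability] = new Builder(Nil)

// Only builders whose phantom type includes CanSpawn may delegate.
def delegate[C](b: Builder[C])(using C <:< CanSpawn): String =
  s"delegating with ${b.granted.mkString(",")}"

val full = Builder.empty.withTools.withSpawn
val message = delegate(full)
// delegate(Builder.empty.withTools) does not compile: CanSpawn was never granted.
```

The runtime list `granted` is only there to make the example observable; the gating itself happens entirely in the type parameter.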
…rovements

Experimental capture checking (Phase 7):
- FileSandbox, BudgetSlice, SpawnPermit extend SharedCapability
- SandboxedRun provides scoped callbacks where capabilities can't escape
- Compiler rejects closures that capture capabilities beyond scope
- FileSandbox validates paths at runtime (no directory traversal)
- CaptureCheckingExample demonstrates all three capability types
- CaptureCheckSpec verifies compile-time rejection via typeChecks

Runtime enforcement improvements (from agent review):
- ToolSurface now tracks allowedTools list for provider enforcement
- withReadOnlyTools validates tools are read-only compatible
- ClaudeInterpreter.claudeTransform applies distinctAllowedTools
- A2AInterpreter uses SharedRun pattern (single stream, both events and result consumable)
- A2AServerAdapter enhanced with full request handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add zio-blocks-scope 0.0.33 as dependency and create ScopedCapabilities that wraps FileSandbox, BudgetSlice, SpawnPermit as scope-managed Resources. This provides a stable alternative to experimental capture checking:
- $[A] opaque types prevent capabilities from escaping scope boundaries
- Unscoped type class gates what values can leave (no instance for capabilities = compile error if you try to return them)
- $ operator macro validates lambda bodies (no capture in closures)
- Zero runtime cost (opaque types erased)

Two approaches now coexist in experimental/:
- Capabilities.scala: real capture checking (SharedCapability + ^)
- ScopedCapabilities.scala: zio-blocks/scope ($[A] + Unscoped + macros)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Make FileSandbox, BudgetSlice, SpawnPermit constructors private[experimental]; userland code must go through SandboxedRun or ScopedCapabilities factories
- Fix path traversal: use nodePath.relative instead of startsWith to catch sibling-prefix escapes (e.g., /safe/../safe-evil/secret.txt)
- Add sibling-prefix escape tests proving the naive startsWith approach is vulnerable but resolveSafe catches it
- Add compile-time tests verifying direct instantiation is rejected

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
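The sibling-prefix escape mentioned above can be reproduced with a short sketch. The PR uses `nodePath.relative` under Scala.js; the version below swaps in `java.nio.file` as a JVM stand-in, and both function names are hypothetical. It is a purely lexical check and does not resolve symlinks.

```scala
import java.nio.file.Paths

// Naive check: a string prefix test, fooled by sibling directories such as /safe-evil.
def naiveIsInside(root: String, candidate: String): Boolean =
  val rootStr = Paths.get(root).toAbsolutePath.normalize().toString
  val targetStr = Paths.get(candidate).toAbsolutePath.normalize().toString
  targetStr.startsWith(rootStr)

// Relative-path check: the candidate is inside root only if the path from
// root to the candidate does not begin by climbing out via "..".
def relativeIsInside(root: String, candidate: String): Boolean =
  val rootPath = Paths.get(root).toAbsolutePath.normalize()
  val target = Paths.get(candidate).toAbsolutePath.normalize()
  !rootPath.relativize(target).startsWith("..")

val escape = "/safe/../safe-evil/secret.txt"
val naiveVerdict = naiveIsInside("/safe", escape)       // accepts the escape
val relativeVerdict = relativeIsInside("/safe", escape) // rejects the escape
```

The difference comes from comparing path elements instead of characters: `/safe-evil` shares a character prefix with `/safe` but is a different directory.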
Phase 6: Utility and Evaluation, making "good agent" measurable.

Core types:
- Utility[-P, -O]: observer-dependent scoring trait with built-in implementations (costMinimizing, reliability, latencyMinimizing, simplicityBiased, weighted composite)
- TraceSummary: rich trace data foldable from AgentEvent streams (tool counts, delegation counts, timing, cost, tool names)
- Complexity: execution graph metrics derived from TraceSummary (totalNodes, toolCallNodes, delegationNodes, graphDensity)
- Evaluation[P, O]: bundles principal, output, trace, complexity, and score into a single inspectable result
- TraceLogger: composable logging to arbitrary sinks (console, callback/JSONL, fan-out via all(), noop)

Grounded in the formalization paper:
- U_ω(α) = E[P(accept | α(x))] → Utility trait
- C(α) = E[|G(x)| | x] → Complexity metrics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
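A weighted composite utility of the kind listed above could be sketched as follows. The `Trace` shape, the scoring formulas, and the choice to normalise by total weight are all assumptions made for illustration; the PR's `Utility`, `TraceSummary`, and `weighted` signatures may differ.

```scala
// Hypothetical trace shape; the real TraceSummary carries more fields.
final case class Trace(toolCalls: Int, delegations: Int, costUsd: Double, succeeded: Boolean)

// A utility scores an output for a given principal; higher is better.
trait Utility[-P, -O]:
  def score(principal: P, output: O, trace: Trace): Double

object Utility:
  // Cheaper runs score closer to 1.0 (assumed formula).
  def costMinimizing(maxUsd: Double): Utility[Any, Any] =
    (_, _, t) => math.max(0.0, 1.0 - t.costUsd / maxUsd)

  // 1.0 for a successful run, 0.0 otherwise (assumed formula).
  def reliability: Utility[Any, Any] =
    (_, _, t) => if t.succeeded then 1.0 else 0.0

  // Weighted composite, divided by the total weight so that component
  // scores in [0, 1] yield a composite in [0, 1].
  def weighted[P, O](parts: (Double, Utility[P, O])*): Utility[P, O] =
    require(parts.nonEmpty && parts.forall(_._1 > 0), "weights must be positive")
    val total = parts.map(_._1).sum
    (p, o, t) => parts.map((w, u) => w * u.score(p, o, t)).sum / total

val composite = Utility.weighted[String, String](
  3.0 -> Utility.reliability,
  1.0 -> Utility.costMinimizing(maxUsd = 2.0)
)
val exampleScore = composite.score("reviewer", "answer", Trace(2, 0, 1.0, succeeded = true))
```

With these inputs the reliability part contributes 1.0 and the cost part 0.5, so the composite is (3.0 × 1.0 + 1.0 × 0.5) / 4.0.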
Captures the next steps after completing all 7 roadmap phases: PR/review, docs update, integration testing, ContextKernel, SDK parity, Conversation DSL. Lists known gaps and key patterns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 3 runnable DSL examples (dsl-basic, dsl-builder, dsl-delegation) demonstrating one-shot/streaming, builder+evaluation, and typed delegation
- withTools/withMcpTools now produce CanUseTools[CustomTools] instead of AllTools, honest about what the surface actually declares
- Add TraceLogger.callbackZIO for effectful logging without unsafe nesting
- Rewrite EXAMPLES.md with 11 examples using actual implemented API
- Update NEXT.md with complete commit table and current branch state
- Add TraceLoggerSpec, update TypedAgentSpec for CustomTools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Second interpreter backed by OpenAI Codex SDK (@openai/codex-sdk@0.118.0):
- codex/CodexSdk.scala: JS facades for Codex, Thread, ThreadEvent, ThreadItem
- codex/CodexOptions.scala: SandboxMode, ApprovalMode, client/thread options
- codex/CodexClient.scala: Pure Scala wrapper + CodexEvent/CodexItem ADTs
- interop/codex/CodexEventMapper.scala: 8 item types → AgentEvent normalization
- interop/codex/CodexInterpreter.scala: Agent impl with SharedRun + builder

Zero changes to core/: same Agent, AgentRun, AgentEvent, ExecutionPolicy, AgentBuilder, TypedAgent, TraceSummary, Utility, Evaluation types work identically across Claude and Codex interpreters.

43 test suites, 0 failures (16 new Codex event mapper tests, 7 options tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DslCodexExample: one-shot, builder, evaluation via Codex interpreter
- DslCrossProviderExample: Claude ↔ Codex chain (same DSL, two providers)
- ExampleRunner: runtime dispatcher; all examples linked once, selected at runtime via EXAMPLE env var. Fixes --example flag which was broken because Mill evaluates Task.Input before Task.Command bodies run.
- examples.go now passes --example as EXAMPLE env to bun subprocess

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mill ScalaJS run() drops CLI args entirely. Override run to pass the first arg as EXAMPLE env var to bun, where ExampleRunner reads it.

  ./mill examples.run dsl-basic   # run specific example
  ./mill examples.run -- --help   # list all 19 examples
  ./mill examples.run             # default (macro)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
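A runtime dispatcher keyed on the EXAMPLE environment variable might be structured like the sketch below. The registry contents and the `dispatch` function are hypothetical; only the EXAMPLE variable name and the `macro` default come from the commit messages.

```scala
// Hypothetical registry; the real ExampleRunner links the project's example objects.
val examples: Map[String, () => String] = Map(
  "macro" -> (() => "ran macro"),
  "dsl-basic" -> (() => "ran dsl-basic")
)

// Pick an example by name, falling back to the default when EXAMPLE is unset.
def dispatch(env: Map[String, String], default: String = "macro"): Either[String, String] =
  val name = env.getOrElse("EXAMPLE", default)
  examples.get(name) match
    case Some(run) => Right(run())
    case None =>
      Left(s"Unknown example '$name'. Available: ${examples.keys.toList.sorted.mkString(", ")}")
```

In a real entry point this would be called as `dispatch(sys.env)`; taking the environment as a parameter keeps the lookup easy to test.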
- ROADMAP.md: mark all 8 phases completed with commit refs
- NEXT.md: 12 commits, 43 test suites, add Running Examples section
- EXAMPLES.md: add Codex + cross-provider examples (12-13), running section
- MAPPING.md: add Codex type inventory and interpreter table
- README.md: add examples.run commands, update project structure
- CLAUDE.md: add examples section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Demonstrates capability-segregated agent cells with need-to-know communication, classified review routing, and explainable scoring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR Review:
Critical fixes:
- Replace var state with Ref for atomic SharedState transitions in all three interpreters (Claude, A2A, Codex), preventing duplicate provider runs when events and result are consumed concurrently
- Harden FileSandbox.resolveSafe with realpathSync to reject symlink escapes
- Fail resultPromise on unsuccessful A2A/Codex completions instead of surfacing error text as successful output

Additional fixes:
- Add explicit Unlimited-Unlimited case in Budget subtraction
- Deduplicate ToolSurface.++ by tool name; sync filter() with allowedTools
- Wrap permit.consume() in ZIO for proper effect sequencing
- Log CodexClient.jsToJson conversion failures to stderr
- Remove unused ExecutionPolicy.childPolicy
- Add documentation for experimental package, isReadOnlyCompatible, withReadOnlyTools injection, withTools MCP caveat, depth check, Utility.weighted, and exit code assumption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
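The idea behind replacing a `var` with an atomic state transition can be shown without ZIO. The sketch below swaps in `java.util.concurrent.atomic.AtomicReference` as a stand-in for `Ref`; `RunState`, `SharedRun`, and `ensureStarted` are hypothetical names, not the interpreters' actual types.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

enum RunState:
  case NotStarted, Running

final class SharedRun(startProvider: () => Unit):
  private val state = new AtomicReference[RunState](RunState.NotStarted)

  // Only the caller that wins the NotStarted -> Running transition starts
  // the provider; every other caller sees `false` and reuses the same run.
  def ensureStarted(): Boolean =
    val won = state.compareAndSet(RunState.NotStarted, RunState.Running)
    if won then startProvider()
    won

// Count provider starts to show that a second consumer does not trigger another run.
val starts = new AtomicInteger(0)
val sharedRun = new SharedRun(() => { starts.incrementAndGet(); () })
val firstConsumer = sharedRun.ensureStarted()
val secondConsumer = sharedRun.ensureStarted()
```

With a plain `var`, two consumers could both read `NotStarted` before either writes `Running`; the compare-and-set makes the check and the update a single step.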
Code Review: Provider-Independent DSL Layer
This is a substantial, well-designed PR. The core ideas: phantom-type capability tracking, Peano-encoded delegation depth, provider-independent
Security / Critical
// BudgetSlice.spend()
_remaining -= amount // not atomic
// ReviewPermit.consume()
_remaining -= 1 // not atomic
Both classes are
catch
case e: SecurityException => throw e
case _: Throwable => resolved // root doesn't exist yet; lexical check passed
If the root directory is unreadable (permission denied on
private val nodePath = scala.scalajs.js.Dynamic.global.require("node:path")
private val nodeFs = scala.scalajs.js.Dynamic.global.require("node:fs")
This is Scala.js code that requires a Node.js runtime. The code will throw at construction time (not compile time) in any non-Node environment (browser, JVM, Deno). This should be documented prominently in the class Scaladoc, or the constructor should call
Bugs
// Line 75-76
require(
child.maxRuntimeDepth < maxRuntimeDepth,
s"Child depth ${child.maxRuntimeDepth} must be < parent depth $maxRuntimeDepth"
)
The type-level
_ <- runner.forkDaemon // no cleanup semantics
The
class MapperState:
var lastAgentMessage: Option[String] = None
var startTimeMs: Long = System.currentTimeMillis()
Design Issues
def run(principal: P, input: I, policy: ExecutionPolicy): AgentRun[Any, O]
The
case Retry(maxAttempts) => require(maxAttempts > 0, "maxAttempts must be positive")
// and in ExecutionPolicy constructor:
maxTurns.foreach(t => require(t > 0, "maxTurns must be positive"))
maxChildTurns: Option[Int] = None
case DelegationStarted(childId: String, label: String)
case DelegationFinished(childId: String, status: String, summary: Option[RunSummary])
The
Minor / Quality
Test Coverage
The test suite is thorough for compile-time type checking (negative tests via
Overall this is high-quality exploratory work. The type-level safety story (Peano depth, phantom capabilities,
…sed cancellation

- AgentBuilder now tracks McpToolSurface list; agentTransform receives it
- ClaudeInterpreter.claudeTransform registers ToolDef handlers through an implicit local MCP server (scalagent_dsl_local_tools), fixing the P2 issue where withTools only forwarded names without runtime handlers
- ToolSurface.fromDefs derives MCP-qualified allowlist names so provider tool resolution matches the registered server
- ToolSurface.filter uses McpToolName-aware matching to correctly drop filtered tools even when allowedTools use qualified MCP names
- McpToolName.toToolName normalizes known names to built-in ToolName cases so read-only classification works for MCP tool surfaces
- ToolName.isReadOnly extended with McpResolveLibraryId, McpGetLibraryDocs
- All three interpreters: forkDaemon → forkScoped with onInterrupt cleanup, ensuring scope closure cancels the underlying provider run
- A2AServerAdapter: track active runs by taskId, cancelHandler interrupts the fiber instead of silently finishing
- New tests: scope-based interruption (Claude, A2A), local MCP server registration (ClaudeBuilder), MCP tool name normalization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review: Provider-independent DSL with type-safe capabilities
This is a substantial, well-designed PR. The core ideas are sound and the implementation is generally clean. Below is a mix of blocking issues, design questions, and minor observations.
Architecture: overall impression
The phantom-type capability system (
Issues worth fixing before merge
1.

| Severity | Item |
|---|---|
| Fix | Budget.usd should guard against NaN/Infinity |
| Fix | Utility.weightedNamed — document or enforce weight normalization |
| Fix | Runtime.default in claudeTransform creates detached runtime |
| Clarify | Agent.run R=Any hardcoding — intentional or oversight? |
| Minor | CanReadMemory/CanWriteMemory stubs need TODO annotations |
| Minor | Stale branch reference in EXAMPLES.md |
| Minor | --no-server removal from README needs explanation |
Overall the design is solid and the type-safety story (phantom capabilities, Peano depth, DepthLTE compile-time checks) is well executed. The main concerns are Budget edge cases, missing weight validation in Utility, and the runtime-context issue in claudeTransform.
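The NaN/Infinity concern for `Budget.usd` can be made concrete with a smart constructor. This is only a sketch: `UsdBudget` is a hypothetical type, and rejecting negative amounts is an added assumption that the review does not state. The underlying problem is that every ordered comparison with NaN is false, so a check such as `spent > limit` would never fire for a NaN limit.

```scala
// Hypothetical budget type; the private constructor forces use of the validated factory.
final case class UsdBudget private (amount: Double)

object UsdBudget:
  // Reject NaN, infinities, and (by assumption) negative amounts.
  def usd(amount: Double): Either[String, UsdBudget] =
    if amount.isNaN || amount.isInfinite then Left(s"budget must be finite, got $amount")
    else if amount < 0 then Left(s"budget must be non-negative, got $amount")
    else Right(new UsdBudget(amount))

// Why the guard matters: with a NaN limit, neither comparison is ever true.
val nanLimit = Double.NaN
val overBudget = 5.0 > nanLimit
val underBudget = 5.0 <= nanLimit
```

Returning `Either` keeps the failure visible to callers; throwing via `require` would be an equally reasonable choice for a constructor.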
…rmissions

Built on top of the squash-merged DSL core (#33). Adds:
- Safe-by-default tool access (no tools unless opted in)
- Structured output with StructuredOutput.derive + cross-provider (Claude + Codex)
- README rewrite leading with DSL, not SDK wrapper
- Directory-scoped agents with PreToolUse hook enforcement
- PermissionMode.Auto for model-classified tool approvals
- HookOutput.ToolPermission for per-tool-call permission decisions
- BuilderConfig refactor replacing 5-param agentTransform
- Codex sandboxedBuilder, typedBuilder, and SDK parity updates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rmissions (#34)

Built on top of the squash-merged DSL core (#33). Adds:
- Safe-by-default tool access (no tools unless opted in)
- Structured output with StructuredOutput.derive + cross-provider (Claude + Codex)
- README rewrite leading with DSL, not SDK wrapper
- Directory-scoped agents with PreToolUse hook enforcement
- PermissionMode.Auto for model-classified tool approvals
- HookOutput.ToolPermission for per-tool-call permission decisions
- BuilderConfig refactor replacing 5-param agentTransform
- Codex sandboxedBuilder, typedBuilder, and SDK parity updates

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Introduces a provider-independent DSL layer for Scalagent that decouples agent definitions from any specific LLM provider (Claude, Codex, etc.).
- TypedAgent[In, Out], Capability, Budget, Depth (Peano-encoded delegation limits), ExecutionPolicy, Utility scoring, TraceLogger/TraceSummary, Classification-gated review
- ClaudeInterpreter, CodexInterpreter, A2AInterpreter: same agent definition runs on any backend
- McpSurface tool loading
- experimental), ScopedCapabilities with runtime enforcement, AgenticReview with explainable scoring and classified segregation
- AgentBuilder for constructing agents without boilerplate
- dsl-basic, dsl-builder, dsl-delegation, dsl-review, dsl-cells, dsl-codex, dsl-cross, capture)
- docs/dsl/ with foundations, protocols, mapping guide, examples, roadmap, and next steps

Test plan
- ./mill examples.compile: all examples compile
- ./mill agent.test: full test suite passes
- ./mill examples.run dsl-basic end-to-end with API key
- ./mill examples.run dsl-codex with Codex API key
- docs/dsl/ for accuracy

🤖 Generated with Claude Code