Prepare Wardwright 0.0.11 release#73
Conversation
Reviewer's GuidePrepares the Wardwright 0.1.0 stable release by introducing a Wardwright-hosted server-tool framework for non-streaming Chat Completions (built-in policy cache status, trusted Dune functions, and trusted BEAM modules), enriching tool-context metadata with execution location/visibility, updating the OpenAPI contract and docs to the 0.1.0 line and honest framework claims, and wiring the new server-tools path into the completion pipeline alongside version and CI workflow updates. Sequence diagram for non-streaming Chat Completions with Wardwright-hosted server toolssequenceDiagram
actor Client
participant Router as Wardwright.Router
participant ServerTools as Wardwright.ServerTools
participant Core as Wardwright
participant Provider
Client->>Router: POST /v1/chat/completions
Router->>ServerTools: complete_selected_model(selected_model, request, config)
alt tools disabled or stream==true
ServerTools->>Core: complete_selected_model(selected_model, request, config)
Core-->>ServerTools: first_response
ServerTools-->>Router: first_response
else tools enabled and non-streaming
ServerTools->>ServerTools: configured_tools(config)
ServerTools->>ServerTools: inject_tools(request, tools)
ServerTools->>Core: complete_selected_model(selected_model, request_with_tools, config)
Core-->>ServerTools: first_response (tool_calls[])
alt matching server tool requested
ServerTools->>ServerTools: execute_tool(tool_call, tool, request, config)
opt builtin tool
ServerTools->>Wardwright.PolicyCache: status()
Wardwright.PolicyCache-->>ServerTools: policy_cache_status
end
opt dune tool
ServerTools->>Wardwright.PolicySandbox.Dune: eval_snippet(source, input, limits)
Wardwright.PolicySandbox.Dune-->>ServerTools: result
end
opt beam_module tool
ServerTools->>BeamModule: run(arguments, %{config, request})
BeamModule-->>ServerTools: result | {:error, reason}
end
ServerTools->>Core: complete_selected_model(selected_model, followup_request, config)
Core-->>ServerTools: second_response
ServerTools->>ServerTools: add_server_tool_metadata(second_response, first_response, execution)
ServerTools-->>Router: final_response_with_metadata
else no matching server tool
ServerTools-->>Router: first_response
end
end
Router-->>Client: JSON completion with provider_metadata.wardwright_server_tools[]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The server-tool normalization logic is split between
Wardwright.normalize_server_tools/1andWardwright.ServerTools.configured_tools/1(including separatepresent?/1and engine inference branches); consider consolidating this into a single shared normalization path to avoid drift between config serialization and runtime behavior. - In
Wardwright.ServerTools, the module loading paths (load_tool_path/1,select_tool_module/2,compile_erlang_tool/1,load_beam_tool/1) have several failure modes that are collapsed into generic{:error, reason}tuples; consider adding structured error tagging or logging so operators can more easily diagnose why a given BEAM server tool failed to load or was ignored.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The server-tool normalization logic is split between `Wardwright.normalize_server_tools/1` and `Wardwright.ServerTools.configured_tools/1` (including separate `present?/1` and engine inference branches); consider consolidating this into a single shared normalization path to avoid drift between config serialization and runtime behavior.
- In `Wardwright.ServerTools`, the module loading paths (`load_tool_path/1`, `select_tool_module/2`, `compile_erlang_tool/1`, `load_beam_tool/1`) have several failure modes that are collapsed into generic `{:error, reason}` tuples; consider adding structured error tagging or logging so operators can more easily diagnose why a given BEAM server tool failed to load or was ignored.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request introduces a server-side tool framework to Wardwright, enabling models to execute built-in tools, Dune snippets, and custom BEAM modules. It also updates the project version to v0.1.0 and adds execution_location and visibility_level metadata to the tool context. Feedback on the new ServerTools module highlights several critical issues: the tool loop only handles the first tool call, violating the OpenAI API contract for multiple calls; the BEAM module loader suffers from performance bottlenecks and a bug where tools fail to load on subsequent requests due to Code.require_file behavior; and the use of JSON.encode! on tool results risks runtime crashes when encountering atoms.
There was a problem hiding this comment.
Pull request overview
Prepares the Wardwright 0.1.0 stable release line by introducing a minimal Wardwright-hosted server-tool execution surface for non-streaming Chat Completions, extending tool-context provenance metadata, and updating contracts/docs/workflows/version strings from RC/draft wording to 0.1.0.
Changes:
- Add Wardwright-hosted server tools for non-streaming Chat Completions (builtin + trusted local Dune + trusted local BEAM modules) and wire the router to use this path.
- Extend tool-context normalization to include
execution_locationandvisibility_level, and update tests/contracts accordingly. - Update OpenAPI metadata, release docs, and workflow pins for the
0.1.0release line.
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Updates install/version references to v0.1.0. |
| docs/vision.md | Updates status banner to v0.1.0 wording. |
| docs/tutorial-news-monitor-agent.md | Updates validation notes and framework evidence wording for 0.1.0. |
| docs/tool-context-policy.md | Documents Wardwright-hosted server tools and execution/visibility semantics. |
| docs/ralph-runs/framework-adapter-validation-loop-supervisor.md | Adds post-RC evidence log slice for .NET + streaming probes. |
| docs/provider-credentials.md | Updates release-line wording from RC to stable. |
| docs/packaging.md | Updates packaging/version guidance and documents server-tool slice. |
| docs/index.md | Updates docs landing status banner for v0.1.0. |
| docs/framework-adapters.md | Updates framework adapter evidence/limits wording for 0.1.0. |
| docs/feature-spikes.md | Adds Wardwright-hosted server tools spike entry and guardrails. |
| docs/agent-authoring.md | Updates “release line” wording. |
| docs/agent-adapters.md | Updates adapter docs to 0.1.0 release line. |
| contracts/tool-context-policy-contract.md | Adds execution_location and visibility_level fields to the contract doc. |
| contracts/openapi.yaml | Bumps API version to 0.1.0 and adds server_tools / ServerTool schema. |
| app/test/tool_context_test.exs | Asserts new tool-context fields in normalized output. |
| app/test/stream_provider_transport_test.exs | Adds end-to-end tests for builtin/Dune/BEAM server tools. |
| app/test/gleam_policy_core_test.exs | Adds Gleam reference assertions for execution/visibility helpers. |
| app/test/agent_adapter_recording_test.exs | Updates adapter version assertions and TTL defaults in helper. |
| app/test/agent_adapter_identity_test.exs | Updates adapter version assertions to 0.1.0. |
| app/test_support/router_case.ex | Extends test provider to simulate server-tool calls/results. |
| app/src/wardwright/tool_context_core.gleam | Implements execution/visibility derivation helpers in Gleam core. |
| app/src/wardwright/elixir_reference/tool_context_core_reference.exs | Mirrors Gleam helpers in Elixir reference implementation. |
| app/mix.exs | Sets application version to 0.1.0. |
| app/lib/wardwright/tool_context.ex | Adds execution_location and visibility_level normalization. |
| app/lib/wardwright/server_tools/behaviour.ex | Introduces behaviour for trusted BEAM server tools (spec/0, run/2). |
| app/lib/wardwright/server_tools.ex | Implements server-tool registry, tool injection, one-loop execution, and receipt metadata. |
| app/lib/wardwright/router.ex | Routes non-streaming completions through Wardwright.ServerTools.complete_selected_model/3. |
| app/lib/wardwright/agent_adapters/pi_pack.ex | Updates adapter version constant to 0.1.0. |
| app/lib/wardwright/agent_adapters/omp_pack.ex | Updates adapter version constant to 0.1.0. |
| app/lib/wardwright/agent_adapters/claude_code_pack.ex | Updates adapter version constant to 0.1.0. |
| app/lib/wardwright.ex | Normalizes server_tools config into the public model config surface. |
| .github/workflows/wardwright-release.yml | Pins GitHub Actions to specific SHAs for reproducible release builds. |
Add request-side tool mediation controls
Scope
Prepares the next public Wardwright release as
0.0.11instead of0.1.0.This branch now includes:
0.0.11API/MCP proof run
The new smoke activates a canned Wardwright model through
POST /v1/policy-authoring/wardwright-models, calls it through/v1/chat/completions, capturesx-wardwright-receipt-id, discovers MCP tools throughtools/list, and loads the resulting trace throughload_control_debugger_trace.This proves the core authoring/debugging loop can be driven by agents over API/MCP. It does not claim every HTTP scenario-management endpoint is exposed as MCP, and it does not claim native framework state or exact replay fidelity beyond the tested surfaces.
Validation
Local validation:
cd app && MIX_ENV=test mise exec -- mix test --no-compile test/mcp_authoring_test.exs test/agent_adapter_identity_test.exs test/agent_adapter_recording_test.exs test/local_gemma_authoring_recipe_test.exs test/jido_adapter_smoke_test.exs-> 26 passedmise run check:docs-> passedmise check-> 458 passed, 6 excluded; docs/map/style/type/browser checks passedmise run package:smoke:darwin-arm64-> Burrito binary built and printed0.0.11GitHub validation on the current head:
Claim limits