Skip to content

Prepare Wardwright 0.0.11 release#73

Merged
bglusman merged 28 commits into
mainfrom
release/docs-and-version-prep
May 26, 2026
Merged

Prepare Wardwright 0.0.11 release#73
bglusman merged 28 commits into
mainfrom
release/docs-and-version-prep

Conversation

@bglusman
Copy link
Copy Markdown
Owner

@bglusman bglusman commented May 24, 2026

Scope

Prepares the next public Wardwright release as 0.0.11 instead of 0.1.0.

This branch now includes:

  • release/package/docs retargeting to 0.0.11
  • Jido framework dogfood smoke coverage and the local Gemma authoring dogfood recipe
  • the framework-adapter recipe/support documentation from the release-prep branch
  • a release smoke proving a real HTTP API plus MCP authoring/debugging loop without UI scraping
  • CI hardening for the secret-scan checkout range after GitHub Actions runner-token fetch failures

API/MCP proof run

The new smoke activates a canned Wardwright model through POST /v1/policy-authoring/wardwright-models, calls it through /v1/chat/completions, captures x-wardwright-receipt-id, discovers MCP tools through tools/list, and loads the resulting trace through load_control_debugger_trace.

This proves the core authoring/debugging loop can be driven by agents over API/MCP. It does not claim every HTTP scenario-management endpoint is exposed as MCP, and it does not claim native framework state or exact replay fidelity beyond the tested surfaces.

Validation

Local validation:

  • cd app && MIX_ENV=test mise exec -- mix test --no-compile test/mcp_authoring_test.exs test/agent_adapter_identity_test.exs test/agent_adapter_recording_test.exs test/local_gemma_authoring_recipe_test.exs test/jido_adapter_smoke_test.exs -> 26 passed
  • mise run check:docs -> passed
  • mise check -> 458 passed, 6 excluded; docs/map/style/type/browser checks passed
  • commit hooks and pre-push hooks -> app tests/docs/gitleaks/mise check passed
  • mise run package:smoke:darwin-arm64 -> Burrito binary built and printed 0.0.11

GitHub validation on the current head:

  • app -> passed
  • docs -> passed
  • packaged smoke -> passed
  • gitleaks -> passed
  • zizmor -> passed
  • aggregate test -> passed

Claim limits

  • Framework adapters remain recipes/smokes, not published native framework packages.
  • Jido is dogfooded in-app, but broader Jido native tool/stream/state fidelity is not claimed.
  • Some scenario-management operations remain HTTP-only; MCP covers the core authoring/debugging loop and exposed tools documented in the agent authoring guide.
  • The Darwin ARM64 artifact was built and smoked locally; release tags/Homebrew publication are not done by this PR itself.

Copilot AI review requested due to automatic review settings May 24, 2026 22:54
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented May 24, 2026

Reviewer's Guide

Prepares the Wardwright 0.1.0 stable release by introducing a Wardwright-hosted server-tool framework for non-streaming Chat Completions (built-in policy cache status, trusted Dune functions, and trusted BEAM modules), enriching tool-context metadata with execution location/visibility, updating the OpenAPI contract and docs to the 0.1.0 line and honest framework claims, and wiring the new server-tools path into the completion pipeline alongside version and CI workflow updates.

Sequence diagram for non-streaming Chat Completions with Wardwright-hosted server tools

sequenceDiagram
  actor Client
  participant Router as Wardwright.Router
  participant ServerTools as Wardwright.ServerTools
  participant Core as Wardwright
  participant Provider

  Client->>Router: POST /v1/chat/completions
  Router->>ServerTools: complete_selected_model(selected_model, request, config)
  alt tools disabled or stream==true
    ServerTools->>Core: complete_selected_model(selected_model, request, config)
    Core-->>ServerTools: first_response
    ServerTools-->>Router: first_response
  else tools enabled and non-streaming
    ServerTools->>ServerTools: configured_tools(config)
    ServerTools->>ServerTools: inject_tools(request, tools)
    ServerTools->>Core: complete_selected_model(selected_model, request_with_tools, config)
    Core-->>ServerTools: first_response (tool_calls[])
    alt matching server tool requested
      ServerTools->>ServerTools: execute_tool(tool_call, tool, request, config)
      opt builtin tool
        ServerTools->>Wardwright.PolicyCache: status()
        Wardwright.PolicyCache-->>ServerTools: policy_cache_status
      end
      opt dune tool
        ServerTools->>Wardwright.PolicySandbox.Dune: eval_snippet(source, input, limits)
        Wardwright.PolicySandbox.Dune-->>ServerTools: result
      end
      opt beam_module tool
        ServerTools->>BeamModule: run(arguments, %{config, request})
        BeamModule-->>ServerTools: result | {:error, reason}
      end
      ServerTools->>Core: complete_selected_model(selected_model, followup_request, config)
      Core-->>ServerTools: second_response
      ServerTools->>ServerTools: add_server_tool_metadata(second_response, first_response, execution)
      ServerTools-->>Router: final_response_with_metadata
    else no matching server tool
      ServerTools-->>Router: first_response
    end
  end
  Router-->>Client: JSON completion with provider_metadata.wardwright_server_tools[]
Loading

File-Level Changes

Change Details Files
Introduce Wardwright-hosted server-tools framework and integrate it into non-streaming Chat Completions.
  • Add Wardwright.ServerTools module to inject configured server tools into chat-completions requests, run a single model-requested server-tool call, and merge execution metadata back into provider_metadata and latency accounting.
  • Support three server-tool engines: a builtin wardwright_policy_cache_status tool, trusted local Dune function tools (inline source or snippet_id with limit options), and trusted BEAM module tools loaded from .ex/.exs/.erl/.beam paths via a common spec/0 and run/2 behaviour.
  • Normalize server_tools configuration in Wardwright config normalization so tools can be declared as strings, objects with engines, or with implicit engine inference; accept elixir_module as an alias for beam_module and filter out disabled/invalid tools.
  • Expose Wardwright.ServerTools.Behaviour for BEAM extensions and ensure BEAM tools are loaded safely (require_file, Erlang compile, or beam load) and module selection favors configured module names with run/2.
  • Update router to call Wardwright.ServerTools.complete_selected_model so the server-tool loop wraps the existing completion path for non-streaming calls only.
app/lib/wardwright/server_tools.ex
app/lib/wardwright/server_tools/behaviour.ex
app/lib/wardwright.ex
app/lib/wardwright/router.ex
Extend tool-context policy metadata with execution location and visibility level, and wire through core and reference implementations.
  • Add execution_location and visibility_level fields to tool_context contract and normalization, deriving them from tool namespace/source via new Gleam core functions.
  • Update Wardwright.ToolContext.normalize to compute primary_tool, execution_location, and visibility_level for both caller-metadata and inferred contexts, delegating to :wardwright@tool_context_core.
  • Implement execution_location/visibility_level in Gleam tool_context_core and keep Elixir reference implementation in sync; add tests to assert parity.
  • Extend tool_context tests to assert new fields for client tools, and add scenarios covering future wardwright_hosted and provider-declared tools.
contracts/tool-context-policy-contract.md
app/lib/wardwright/tool_context.ex
app/src/wardwright/tool_context_core.gleam
app/src/wardwright/elixir_reference/tool_context_core_reference.exs
app/test/tool_context_test.exs
app/test/gleam_policy_core_test.exs
Add test and test-support coverage for the new server-tool behavior against the OpenAI-compatible test provider.
  • Extend the streaming provider test harness to recognize server tool initiation/result loops, emitting tool_calls for wardwright_policy_cache_status, dune_echo_tool, and beam_reverse_tool, and returning distinct final assistant messages once tool results are sent back.
  • Add integration tests that configure server_tools in unit_policy_config, seed the policy cache, and assert that non-streaming /v1/chat/completions calls trigger Wardwright-hosted server tools and record wardwright_server_tools metadata with expected fields.
  • Add tests for Dune server tools that define an echo tool with inline source and for BEAM module tools that are written to a temporary .exs file implementing the Wardwright.ServerTools.Behaviour contract, verifying result_metadata echoes and reversals.
app/test/stream_provider_transport_test.exs
app/test_support/router_case.ex
Update OpenAPI contract and documentation to describe server_tools, clarify framework adapter claims, and move from 0.1.0-rc.1 to the 0.1.0 release line.
  • Extend the OpenAPI config/test schemas with a server_tools array and a ServerTool schema (engine, name, parameters, source/snippet_id/input, module/path, enabled, execution_location, visibility_level), and add allowed_tools to GovernanceRule kinds.
  • Update tool-context and feature-spikes documentation to describe the new Wardwright-hosted server-tool surface, including execution_location/visibility semantics and explicit limits (non-streaming only, no sandbox, no broad hidden tools).
  • Refresh framework-adapter docs, tutorial, vision, provider-credentials, packaging, and site index to reference the 0.1.0 release instead of 0.0.10/0.1.0-rc.1; incorporate results from post-RC Proxmox/.NET and streaming checks, and keep claims conservative about streaming, tool fidelity, and state.
  • Add a Ralph run log describing the post-RC framework gap mitigation (Proxmox LXC, NuGet smoke, streaming checks) and explicitly document remaining gaps and follow-ups.
contracts/openapi.yaml
docs/tool-context-policy.md
docs/feature-spikes.md
docs/framework-adapters.md
docs/tutorial-news-monitor-agent.md
docs/packaging.md
docs/index.md
docs/vision.md
docs/provider-credentials.md
docs/agent-authoring.md
docs/agent-adapters.md
docs/ralph-runs/framework-adapter-validation-loop-supervisor.md
README.md
Finalize 0.1.0 release wiring, versions, and CI workflow pinning.
  • Bump mix project version to 0.1.0 and update agent adapter identity tests and adapter pack modules to report adapter_version 0.1.0, including identity TTL tweaks in recording tests.
  • Update the release GitHub Actions workflow to pin actions/checkout, upload-artifact, and download-artifact to specific SHAs for reproducibility and security.
  • Adjust documentation and README install snippets, packaging notes, and agent adapter status sections to reference v0.1.0 as the stable release and describe its feature set and limits.
app/mix.exs
.github/workflows/wardwright-release.yml
app/test/agent_adapter_identity_test.exs
app/test/agent_adapter_recording_test.exs
app/lib/wardwright/agent_adapters/claude_code_pack.ex
app/lib/wardwright/agent_adapters/omp_pack.ex
app/lib/wardwright/agent_adapters/pi_pack.ex
README.md
docs/packaging.md
docs/agent-adapters.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The server-tool normalization logic is split between Wardwright.normalize_server_tools/1 and Wardwright.ServerTools.configured_tools/1 (including separate present?/1 and engine inference branches); consider consolidating this into a single shared normalization path to avoid drift between config serialization and runtime behavior.
  • In Wardwright.ServerTools, the module loading paths (load_tool_path/1, select_tool_module/2, compile_erlang_tool/1, load_beam_tool/1) have several failure modes that are collapsed into generic {:error, reason} tuples; consider adding structured error tagging or logging so operators can more easily diagnose why a given BEAM server tool failed to load or was ignored.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The server-tool normalization logic is split between `Wardwright.normalize_server_tools/1` and `Wardwright.ServerTools.configured_tools/1` (including separate `present?/1` and engine inference branches); consider consolidating this into a single shared normalization path to avoid drift between config serialization and runtime behavior.
- In `Wardwright.ServerTools`, the module loading paths (`load_tool_path/1`, `select_tool_module/2`, `compile_erlang_tool/1`, `load_beam_tool/1`) have several failure modes that are collapsed into generic `{:error, reason}` tuples; consider adding structured error tagging or logging so operators can more easily diagnose why a given BEAM server tool failed to load or was ignored.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a server-side tool framework to Wardwright, enabling models to execute built-in tools, Dune snippets, and custom BEAM modules. It also updates the project version to v0.1.0 and adds execution_location and visibility_level metadata to the tool context. Feedback on the new ServerTools module highlights several critical issues: the tool loop only handles the first tool call, violating the OpenAI API contract for multiple calls; the BEAM module loader suffers from performance bottlenecks and a bug where tools fail to load on subsequent requests due to Code.require_file behavior; and the use of JSON.encode! on tool results risks runtime crashes when encountering atoms.

Comment thread app/lib/wardwright/server_tools.ex
Comment thread app/lib/wardwright/server_tools.ex Outdated
Comment thread app/lib/wardwright/server_tools.ex
Comment thread app/lib/wardwright/server_tools.ex
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Prepares the Wardwright 0.1.0 stable release line by introducing a minimal Wardwright-hosted server-tool execution surface for non-streaming Chat Completions, extending tool-context provenance metadata, and updating contracts/docs/workflows/version strings from RC/draft wording to 0.1.0.

Changes:

  • Add Wardwright-hosted server tools for non-streaming Chat Completions (builtin + trusted local Dune + trusted local BEAM modules) and wire the router to use this path.
  • Extend tool-context normalization to include execution_location and visibility_level, and update tests/contracts accordingly.
  • Update OpenAPI metadata, release docs, and workflow pins for the 0.1.0 release line.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
README.md Updates install/version references to v0.1.0.
docs/vision.md Updates status banner to v0.1.0 wording.
docs/tutorial-news-monitor-agent.md Updates validation notes and framework evidence wording for 0.1.0.
docs/tool-context-policy.md Documents Wardwright-hosted server tools and execution/visibility semantics.
docs/ralph-runs/framework-adapter-validation-loop-supervisor.md Adds post-RC evidence log slice for .NET + streaming probes.
docs/provider-credentials.md Updates release-line wording from RC to stable.
docs/packaging.md Updates packaging/version guidance and documents server-tool slice.
docs/index.md Updates docs landing status banner for v0.1.0.
docs/framework-adapters.md Updates framework adapter evidence/limits wording for 0.1.0.
docs/feature-spikes.md Adds Wardwright-hosted server tools spike entry and guardrails.
docs/agent-authoring.md Updates “release line” wording.
docs/agent-adapters.md Updates adapter docs to 0.1.0 release line.
contracts/tool-context-policy-contract.md Adds execution_location and visibility_level fields to the contract doc.
contracts/openapi.yaml Bumps API version to 0.1.0 and adds server_tools / ServerTool schema.
app/test/tool_context_test.exs Asserts new tool-context fields in normalized output.
app/test/stream_provider_transport_test.exs Adds end-to-end tests for builtin/Dune/BEAM server tools.
app/test/gleam_policy_core_test.exs Adds Gleam reference assertions for execution/visibility helpers.
app/test/agent_adapter_recording_test.exs Updates adapter version assertions and TTL defaults in helper.
app/test/agent_adapter_identity_test.exs Updates adapter version assertions to 0.1.0.
app/test_support/router_case.ex Extends test provider to simulate server-tool calls/results.
app/src/wardwright/tool_context_core.gleam Implements execution/visibility derivation helpers in Gleam core.
app/src/wardwright/elixir_reference/tool_context_core_reference.exs Mirrors Gleam helpers in Elixir reference implementation.
app/mix.exs Sets application version to 0.1.0.
app/lib/wardwright/tool_context.ex Adds execution_location and visibility_level normalization.
app/lib/wardwright/server_tools/behaviour.ex Introduces behaviour for trusted BEAM server tools (spec/0, run/2).
app/lib/wardwright/server_tools.ex Implements server-tool registry, tool injection, one-loop execution, and receipt metadata.
app/lib/wardwright/router.ex Routes non-streaming completions through Wardwright.ServerTools.complete_selected_model/3.
app/lib/wardwright/agent_adapters/pi_pack.ex Updates adapter version constant to 0.1.0.
app/lib/wardwright/agent_adapters/omp_pack.ex Updates adapter version constant to 0.1.0.
app/lib/wardwright/agent_adapters/claude_code_pack.ex Updates adapter version constant to 0.1.0.
app/lib/wardwright.ex Normalizes server_tools config into the public model config surface.
.github/workflows/wardwright-release.yml Pins GitHub Actions to specific SHAs for reproducible release builds.

Comment thread app/lib/wardwright/server_tools.ex
Comment thread app/lib/wardwright/server_tools.ex
Comment thread app/lib/wardwright/server_tools.ex
Comment thread app/lib/wardwright/server_tools.ex
Comment thread app/lib/wardwright/server_tools.ex Outdated
Comment thread contracts/openapi.yaml
@bglusman bglusman changed the title Prepare Wardwright 0.1.0 release Prepare Wardwright 0.0.11 release May 26, 2026
@bglusman bglusman merged commit cfc56f5 into main May 26, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants