Release v0.25.0 · compusophy/localharness

Security

Hardening from a comprehensive adversarial audit (proxy / browser+seed / contracts
/ wallet-crypto). The crypto layer re-verified as sound (low-s signatures, fresh
per-op randomness + ECIES ephemerals, chainId+nonce+validity in Tempo tx, EIP-712
domain separation) and prior hardening holds (postMessage origin allowlist,
tx-target allowlist, markdown/error-string escaping). Real findings fixed:

Filesystem sandbox (workspace_only) audited + regression-tested. A
security deep-dive confirmed the agent file-tool sandbox holds against path
traversal (incl. deep ../../etc/passwd and Windows ..\), absolute-path
escape, sibling shared-prefix (<ws> vs <ws>-evil), case-bypass on
case-insensitive filesystems, symlink-out, and rename_file exfiltration (both
from AND to checked; missing args fail closed) — no exploitable bug.
Added +7 regression tests so a future refactor can't silently reintroduce a
starts_with sibling bug or a check-before-canonicalize symlink hole.
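
The sibling shared-prefix case above comes down to string-prefix versus component-wise comparison. A minimal sketch, assuming purely lexical normalization: `normalize` and `is_inside_workspace` are hypothetical names, not the crate's functions, and a real check also has to canonicalize before comparing in order to cover symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `.` and `..` without touching the filesystem.
/// Returns None when `..` would climb above the top of the path (fail closed).
fn normalize(path: &Path) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::CurDir => {}
            Component::ParentDir => {
                if !out.pop() {
                    return None;
                }
            }
            other => out.push(other.as_os_str()),
        }
    }
    Some(out)
}

/// Assumes `workspace` is already absolute and normalized.
fn is_inside_workspace(workspace: &Path, requested: &Path) -> bool {
    // `join` replaces the base when `requested` is absolute, so absolute
    // paths must pass the prefix check on their own.
    let joined = workspace.join(requested);
    match normalize(&joined) {
        // Path::starts_with compares whole components, so `/ws-evil`
        // does not count as being under `/ws`.
        Some(resolved) => resolved.starts_with(workspace),
        None => false,
    }
}
```

A plain string comparison such as `"/ws-evil/x".starts_with("/ws")` returns true, which is the sibling bug the regression tests guard against.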
ABI decoders hardened against hostile/garbage RPC responses (registry.rs).
Nine dynamic decoders read offset/length words from untrusted eth_call
responses then did unchecked arithmetic before slicing. In the release/wasm
profile (panic="abort", overflow-checks OFF — the deployed one) a hostile
word WRAPPED → silently sliced the wrong region → returned wrong owner /
metadata / persona / device / signaling bytes with no error (in dev it
panicked). devices_of also pre-allocated Vec::with_capacity(hostile_len)
(OOM). All derived indices now use checked_add/checked_mul + .get()
(behavior-preserving on valid input; hostile input → empty/None/Err). +9
hostile-input/edge-case tests.
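
The checked-index pattern can be sketched for the simplest case, a single ABI-encoded dynamic `bytes` return value. This is an illustration under stated assumptions, not the code in registry.rs: `read_word` and `decode_bytes` are hypothetical helpers.

```rust
/// Read the 32-byte big-endian word at `at` as a usize.
/// None if the word is out of bounds or does not fit.
fn read_word(data: &[u8], at: usize) -> Option<usize> {
    let end = at.checked_add(32)?;
    let word = data.get(at..end)?;
    // Anything in the high 24 bytes cannot be a valid offset or length.
    if word[..24].iter().any(|&b| b != 0) {
        return None;
    }
    let mut low = [0u8; 8];
    low.copy_from_slice(&word[24..]);
    let value = u64::from_be_bytes(low);
    if value > usize::MAX as u64 {
        None
    } else {
        Some(value as usize)
    }
}

/// Decode one dynamic `bytes` value: an offset word, then a length word
/// at that offset, then the payload.
fn decode_bytes(data: &[u8]) -> Option<&[u8]> {
    let offset = read_word(data, 0)?;
    let len = read_word(data, offset)?;
    let start = offset.checked_add(32)?;
    let end = start.checked_add(len)?;
    // `.get` returns None instead of slicing a wrapped or oversized range.
    data.get(start..end)
}
```

Every index derived from the response goes through `checked_add` and `.get()`, so a hostile word yields `None` in every build profile instead of wrapping.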
CLI: create now protects the persisted identity key. It sets owner-only
perms (0600, unix) and adds *.localharness.key to .gitignore (created if
absent) so a raw private key written to the working directory can't be
world-readable or accidentally git committed. Surfaced by the on-chain
test-user fleet (vex-qa) dogfooding the platform — a closed feedback loop:
the fleet filed it on-chain, this fixes it. (+ a pure unit test for the
idempotent .gitignore check.)
Proxy: auth-token replay window cut 24h → 5 min. FRESHNESS_WINDOW_SECS
was 86400, so a captured address:timestamp:signature token was replayable for
a day. Clients sign per request, so 300s (ample clock-skew tolerance) closes the
window at no UX cost.
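
The freshness check itself is small. A sketch in Rust for illustration only (the proxy is an edge function, and its actual code is not shown here): `token_timestamp` and `is_fresh` are hypothetical names, and treating the window as symmetric around the server clock is an assumption.

```rust
/// Replay window in seconds (was 86400).
const FRESHNESS_WINDOW_SECS: u64 = 300;

/// Pull the timestamp out of an `address:timestamp:signature` token.
/// None if the token does not have exactly three colon-separated parts.
fn token_timestamp(token: &str) -> Option<u64> {
    let mut parts = token.split(':');
    let (_address, ts, _signature) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None;
    }
    ts.parse().ok()
}

/// Accept a token only if its timestamp is within the window of `now_secs`.
/// Assumption: skew is tolerated in both directions.
fn is_fresh(now_secs: u64, token_ts_secs: u64) -> bool {
    now_secs.abs_diff(token_ts_secs) <= FRESHNESS_WINDOW_SECS
}
```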
Proxy: request-body size cap (16 MB). An oversized declared Content-Length
is now rejected up front (413) so one caller can't make the edge function buffer
a multi-GB body. Generous enough for max-context LLM requests.
Browser: closed an open redirect via ?then=. The linked-device hand-off
interpolated the raw ?then= query param into the redirect URL, so
?then=evil.com%23 → https://evil.com#.localharness.xyz/ navigated off-domain.
then is now validated as a bare DNS label (alphanumeric + hyphen, ≤63) first.
Contract (source; cut pending): ReleaseFacet MAIN guard reads storage
directly. It used a self-staticcall to mainOf that returns ok=false (not
a revert) if MainIdentityFacet were ever cut out — silently bypassing the
"can't release your MAIN" guard. Now reads LibMainIdentityStorage directly.
Source-only this pass; effective on the next diamondCut (low exploitability:
owner-misconfig + self-harm only).

Added

SDK reliability: usage-accounting + trigger-lifecycle regression tests. A
control-flow deep-dive verified the conversation usage accounting (cumulative
sums, last_turn resets each send, no per-step double-count — both backends
emit usage_metadata only on the terminal step), the trigger lifecycle
(double-start guard, stop() joins, callback error/panic isolation, Drop
aborts), and Agent::shutdown teardown order — no bug — and locked in +11
deterministic tests (240 lib total).
On-chain feedback garbage collection. The FeedbackFacet's append-only
Entry[] grew unbounded — every fleet run + probe appends an entry that costs
storage gas and lengthens feedbackRange forever (it had reached 46). Added an
owner-only clearFeedback() (cut into the live diamond via
script/AddFeedbackClear.s.sol) so on-chain feedback is a TRANSIENT inbox:
harvest/bridge off-chain (GitHub issues / harvest-feedback), then
scripts/clear-feedback.sh GCs the storage. The immutable FeedbackSubmitted
event log windows out naturally (100k-block cap), so localharness feedback
still shows recent notes after a clear. Verified live: storage 46 → 0, events
preserved.
CLI: publish is now one command — localharness publish <name> <src.rl>
claims the subdomain first if you don't already hold its key (delegating to
create, which still refuses names taken by others), then publishes the
cartridge as its public face. Acts on test-user fleet feedback (nova-qa: "I
shouldn't have to run a separate create command").
feedback → GitHub issues bridge (scripts/test-fleet/feedback-to-issues.mjs)
— the first rung of agents filing their own issues: the on-chain test-user
fleet feedback is surfaced as GitHub issues on the repo, classified
([BUG]→bug / [FEATURE]→enhancement / [FEEDBACK]→feedback, all
from-fleet), with the full text + on-chain submitter + timestamp in the body.
Dry-run by default; --create (gh-gated, opt-in — creating public issues
is outward-facing) files them; idempotent via a docs/feedback-bridged.txt
dedup ledger keyed on <timestamp>:<sender>. Backed by a new machine-readable
localharness feedback --json (+ unit test).
Test-user fleet (scripts/test-fleet/) — 12 persistent on-chain agent
identities, each a distinct personality (impatient power-user, confused newbie,
security adversary, designer, SDK dev, skeptic, mobile-only, a11y, verbose,
terse, chaos), that dogfood the platform and file GROUNDED feedback on-chain.
run-fleet.sh drives each persona: create → probe a live agent → reflect
in-persona on the REAL experience → submit one [BUG]/[FEATURE]/[FEEDBACK]
item (FeedbackFacet); read it back via harvest-feedback. Reuses the existing
CLI — no new server. Validated live: a 3-persona sample landed real DX,
onboarding, and security feedback on-chain (e.g. "create writes a raw private
key to the cwd with no chmod/.gitignore").
SDK: a minimal getting-started example (examples/basic_agent.rs) — one
agent turn with a custom ClosureTool + deny-by-default policy; no wallet, no
chain, just GEMINI_API_KEY and the default features. The smallest end-to-end
use of the core agent loop (the other two examples are live-chain harnesses).
README quickstart verified drift-free against the real API.
Discoverable agent cards on the apex explore view. The global
"explore / recent agents" view at localharness.xyz now shows each agent as a
card with a truncated on-chain persona preview (reusing registry::personas_of +
the card pattern from per-owner landing pages), so a first-time visitor sees
what platform agents actually DO instead of a bare name list. Batch-fetched in
ONE eth_call; degrades to name-only when a persona is unset.
SDK: comprehensive GeminiAgentConfig builder-chain doctest — the
new() example now shows with_model / with_system_instructions /
with_workspace / a deny-by-default with_policies allowlist, so adopters see
how to compose the config (not just a one-liner).
Discoverable agent portfolios on public landing pages. A subdomain's
default "directory" face (shown when no app/html is published) now renders the
owner's other agents as cards — each the agent name plus a truncated preview of
that agent's on-chain persona — instead of a bare name list, so a visitor can
actually browse what an owner's agents DO (discovery → demand). Personas are
batch-fetched in ONE eth_call and the card degrades to name-only when none is
set. (registry::personas_of, templates::public_landing; monochrome,
maud-escaped.)
MCP server surfaced in onboarding. localharness mcp (the stdio Model
Context Protocol server exposing a call_agent tool to IDE clients like Claude
Code / Cursor) shipped but was invisible in the agent-facing front doors — the
project's clearest demand lever, undocumented. Now web/skill.md and
web/llms.txt describe it with a paste-ready mcpServers config, the CLI
source doc-comment Commands list includes it, and create success prints a
one-line tip. (The runtime help text already covered it.)
Agent-teams P2P collaboration layer (Layer 5 wired). The foundation
(SignalingFacet/TeamFacet, webrtc.rs transport, sharedfs_sync.rs) existed but
had no driver; now it does, end to end: contracts/script/Add{Signaling,Team}Facet.s.sol
(deploy + diamondCut), a Rust signaling driver in registry.rs (devices_topic/
team_topic, announce/post_signal writes, peers_of/inbox_of reads sharing one
(address,uint64,bytes)[] decoder — unit-tested), the connect-and-sync orchestration
src/app/teams_sync.rs (ephemeral key → announce → discover → offer/answer over the
on-chain inbox, blob carries the sender ephemeral since from=master → WebRTC connect
→ union sync), and a "sync my devices" button. Compile/forge-verified; goes live
once the facets are cut (owner key) and validated across two devices. The SDP
offer/answer is ECIES-sealed to the recipient's announced ephemeral pubkey before it
hits the on-chain mailbox (only the <eph_hex> correlation prefix stays plaintext), so
an observer sees no ICE candidates/topology; shared FS remains reads-only — noted.
CLI billing self-test — localharness credits [--as <me>] (wallet $LH /
per-call meter / session) and localharness topup [--as <me>] (claim the daily
$LH allowance + deposit it into the per-request meter, sponsored). The end-to-end
billing check any agent can run as a real user: topup → call → credits.
rustlite: for i in a..b { … } loops. Desugared (no codegen change) to a
loop with the increment at the TOP and v pre-decremented, so continue stays
correct; bounds evaluated once. Range token .. added. Render-verified at runtime.
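
The desugaring can be illustrated in plain Rust. This sketch only mirrors the shape described above; the wrapping arithmetic and the function names are assumptions, not the compiler's output.

```rust
/// The surface form: a `for` loop with a `continue` in the body.
fn sum_odds_for(a: i32, b: i32) -> i32 {
    let mut sum = 0;
    for i in a..b {
        if i % 2 == 0 {
            continue;
        }
        sum += i;
    }
    sum
}

/// The desugared form: bound evaluated once, variable pre-decremented,
/// increment at the TOP so `continue` cannot skip it.
fn sum_odds_desugared(a: i32, b: i32) -> i32 {
    let mut sum = 0;
    let end = b;
    let mut i = a.wrapping_sub(1);
    loop {
        i = i.wrapping_add(1);
        if i >= end {
            break;
        }
        if i % 2 == 0 {
            continue; // jumps back to the increment, so the loop advances
        }
        sum += i;
    }
    sum
}
```

With the increment at the bottom of the body instead, the `continue` would jump past it and the loop would never advance.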
First integration test (tests/tool_hook_policy.rs, 5 tests) — exercises the
tool + hook + policy pipeline TOGETHER through the public API (the layer the
backend loop actually runs): policy precedence (specific-deny > wildcard-allow)
gating real ToolRunner dispatch, deny-by-default allowlists, ask-user verdicts,
first-deny-wins hook ordering, and post-hooks observing both allow and deny
outcomes. The repo's first tests/ suite; prior coverage was per-layer units.

Fixed

Accessibility: agent streams are now a screen-reader live region (+ labels).
The #transcript container — where every streamed assistant turn, text chunk,
and tool block lands — gained role="log" + aria-live="polite" +
aria-atomic="false", so screen-reader users hear new responses as they
stream (previously silent). Also: accessible names on the icon-only
send/stop/close/delete buttons + the unlabeled prompt textarea, a
role="status" live region on the status line, and aria-hidden on decorative
glyphs. Semantic-only — zero visual change. Surfaced by the test-user fleet
(ada-qa).
CLI: reject leading/trailing-hyphen names (dead-on-arrival subdomains).
name_is_valid allowed -foo / foo- (not valid DNS labels), so create
(and now publish) would mint them. Now rejected per RFC 1035. Surfaced by the
test-user fleet (juno-qa) — emoji/uppercase/oversized were ALREADY caught, so
this closed the one real residual, and confirmed the fleet (like real users)
files speculative bugs that need scrutiny before acting.
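
A sketch of the resulting rule set, assuming the accepted alphabet is lowercase ASCII letters, digits, and hyphens (inferred from "uppercase" already being rejected). The body below is illustrative and not the CLI's actual `name_is_valid`.

```rust
/// Accept only names that are valid DNS labels:
/// 1 to 63 bytes, [a-z0-9-], no leading or trailing hyphen.
fn name_is_valid(name: &str) -> bool {
    let bytes = name.as_bytes();
    if bytes.is_empty() || bytes.len() > 63 {
        return false;
    }
    if bytes[0] == b'-' || bytes[bytes.len() - 1] == b'-' {
        return false;
    }
    bytes
        .iter()
        .all(|&b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'-')
}
```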
MCP client: image-bearing tool results no longer silently fail.
ContentBlock::Image's mime_type field expected snake_case, but
rename_all="lowercase" renames only the variant tags (not struct fields) and
MCP sends camelCase mimeType — so a real {"type":"image",…,"mimeType":…}
block failed to deserialize, erroring the WHOLE tools/call response even
though the server answered correctly. Added #[serde(rename = "mimeType")].
+28 protocol/correlation/framing tests for a previously untested module (incl.
out-of-order / unmatched / duplicate response-id correlation).
Context compaction: silent TOTAL context loss in tool-heavy sessions (both
backends). pick_split walked the summarize/keep boundary FORWARD to avoid
orphaning a tool pair — but in a long unbroken run of [assistant tool_use, user tool_result] round-trips every index qualified, so it chained to
history.len() and kept ZERO messages, defeating KEEP_RECENT_TURNS exactly
when compaction fires. The request still succeeded, so the agent just went
silently context-blind mid-task. The boundary now walks EARLIER (a kept slice
can only be orphaned by a leading tool_result whose call was summarized),
keeping ≥ the recent window and preserving the tool_use↔tool_result pairing.
+20 tests (gemini + anthropic).
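
The corrected boundary walk can be sketched on a simplified message model. `Msg` and this `pick_split` are hypothetical stand-ins; the real history types carry content and may hold several tool blocks per message.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Msg {
    UserText,
    AssistantText,
    ToolUse,
    ToolResult,
}

/// Return the index where the kept slice starts: everything before it is
/// summarized, everything from it onward is kept verbatim.
fn pick_split(history: &[Msg], keep_recent: usize) -> usize {
    let mut split = history.len().saturating_sub(keep_recent);
    // A kept slice is only orphaned when it begins with a tool_result whose
    // tool_use was summarized, so walk EARLIER until that is no longer true.
    while split > 0 && split < history.len() && history[split] == Msg::ToolResult {
        split -= 1;
    }
    split
}
```

Walking earlier can only grow the kept slice, so the result never drops below the recent window, unlike the forward walk that could run to `history.len()`.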
Anthropic backend: 3 streaming/wire bugs + a compaction panic. (1) The
final SSE frame at EOF without a trailing blank line was dropped — losing the
turn's stop_reason + final output_tokens (the Gemini analog; ported
take_remaining). (2) Output-token usage was double-counted — Anthropic
streams cumulative usage (a message_start placeholder + each message_delta
carrying the running total), but the fold SUMMED them, over-reporting metered
tokens every turn (the credit proxy meters on these); now last-writer-wins.
(3) An unmodeled content block (redacted_thinking, future server-side types)
aborted the whole stream — added a Block::Other serde fallback. (4)
render_transcript truncated long tool-result bodies with &body[..512],
panicking on a multibyte char at byte 512 — now char-boundary-safe. +12 tests.
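
Bug (2) is easiest to see side by side. A minimal sketch, assuming each event carries the running output-token total as described above; `UsageEvent` and both fold functions are hypothetical.

```rust
#[derive(Debug, Clone, Copy)]
enum UsageEvent {
    MessageStart { output_tokens: u64 },
    MessageDelta { output_tokens: u64 },
}

impl UsageEvent {
    fn output_tokens(&self) -> u64 {
        match *self {
            UsageEvent::MessageStart { output_tokens }
            | UsageEvent::MessageDelta { output_tokens } => output_tokens,
        }
    }
}

/// The old behavior: summing cumulative totals over-reports.
fn fold_summed(events: &[UsageEvent]) -> u64 {
    events.iter().map(|e| e.output_tokens()).sum()
}

/// The fix: the last reported total is the total.
fn fold_last_writer_wins(events: &[UsageEvent]) -> u64 {
    events.iter().map(|e| e.output_tokens()).last().unwrap_or(0)
}
```

For a stream reporting 1, then 10, then 25 tokens, the summed fold meters 36 while the turn actually produced 25.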
Gemini streaming: 3 SSE/wire correctness bugs. (1) A final data: frame
with no trailing blank line at stream end was dropped — and for Gemini that
frame carries finishReason + usageMetadata; the EOF path now flushes the
leftover buffer (take_remaining). (2) Text parts stamped thought:false (the
documented Gemini 3.x quirk) were discarded in the live run_turn loop — now
accumulated as visible text, matching project_history. (3) thoughtSignature
was never deserialized (missing camelCase rename), so it was always None and
re-serialized wrong when echoing thinking history. +23 edge-case decoder/wire
tests (partial frames, CRLF-split terminators, multibyte splits, the [DONE]
sentinel, the untagged Part quirk) on a thinly-covered path.
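
The EOF flush in (1) can be sketched with a small frame buffer. This is a simplified model: it assumes LF-only terminators and already-decoded `&str` chunks, and `SseBuffer` is a hypothetical type. Only the `take_remaining` name comes from the notes above.

```rust
#[derive(Default)]
struct SseBuffer {
    buf: String,
}

impl SseBuffer {
    /// Append a chunk and return every frame completed by a blank line.
    fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut frames = Vec::new();
        while let Some(end) = self.buf.find("\n\n") {
            let raw: String = self.buf.drain(..end + 2).collect();
            let frame = raw.trim_end_matches('\n');
            if !frame.is_empty() {
                frames.push(frame.to_string());
            }
        }
        frames
    }

    /// At end of stream, hand back a final frame that never received its
    /// trailing blank line instead of dropping it.
    fn take_remaining(&mut self) -> Option<String> {
        let rest = std::mem::take(&mut self.buf);
        let rest = rest.trim();
        if rest.is_empty() {
            None
        } else {
            Some(rest.to_string())
        }
    }
}
```

A caller drains `push` while the stream is open and calls `take_remaining` once when it closes.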
Restored transcripts now show Gemini tool results, not just the calls.
The Gemini backend's project_history emptied its pending-calls buffer when the
assistant Content was pushed — before the following FunctionResponse
Content arrived — so projecting wire history into a saved transcript dropped
every tool result/error. A reloaded conversation showed tool calls with a blank
(no result/error) pill. Now matched per-name FIFO across the two wire contents,
lifting the live error convention ({"error": …} → typed error). +4 tests
incl. old-format TranscriptEntry backward-compat. (Anthropic/local were already
correct; the replay path already reuses the live tool-call templates.)
SDK: closed a ChatResponse cursor lost-wakeup window.
ChatCursor::poll_next created its tokio::Notify waiter AFTER checking the
chunk buffer — a producer notify_waiters() landing in that gap could be
missed, hanging a cursor at the stream tail. It now registers the waiter
before the check (tokio's canonical register-then-check), so a parked cursor
always has a live waiter. Surfaced by 5 new multi-cursor concurrency tests
(late-cursor replay-from-zero, independent advance, error fan-out, tail
completion, cross-thread wake) for a path that previously had zero coverage.
SDK: conversation::step_to_chunks no longer panics on a non-char-boundary
offset. The terminal-response tail-recovery byte-sliced
content[text_emitted..]; when a harness split a multibyte UTF-8 char across
deltas, text_emitted landed mid-char and the slice panicked (a library
panic crashes the consumer). It now uses str::get, degrading to a no-op on a
bad boundary, and the doc comment is corrected (it's a BYTE offset, not chars).
+3 regression tests.
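
The difference between indexing and `str::get` in a few lines. `unsent_tail` is a hypothetical helper that mirrors the described behavior; it is not the SDK function.

```rust
/// Return the part of `content` not yet emitted, given a BYTE offset.
/// A mid-character or out-of-range offset degrades to an empty tail
/// instead of panicking the way `&content[text_emitted..]` would.
fn unsent_tail(content: &str, text_emitted: usize) -> &str {
    content.get(text_emitted..).unwrap_or("")
}
```

In "héllo" the é occupies bytes 1 and 2, so offset 2 lands mid-character and yields an empty tail, while offsets 1 and 3 are valid boundaries.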
Browser app: the system prompt no longer advertises Gemini-only tools to
Claude agents. generate_image and start_subagent aren't registered on the
Anthropic backend, but the prompt listed them unconditionally — so every
Claude-backed agent was told it had tools it couldn't call. Those two bullets
are now gated on the selected backend (!is_anthropic(model); the model is
knowable at prompt-build time). Also RUNTIME_SUMMARY no longer claims
"Gemini-backed" (the platform runs Gemini, Claude, or local Gemma).
CLI: clearer empty-key error + doc completeness. An empty
.localharness.key produced a cryptic wallet-parse error; it now reports the
file is empty and to recreate it. credits/topup added to the source command
list (they were only in the runtime help).
Credit proxy: CORS origin check hardened + clearer first-call 402. The
localhost allowance used startsWith('http://localhost'), which also matched
http://localhost.evil.com — an attacker origin could read proxy responses
cross-origin; it now parses the URL and checks the hostname (localhost /
127.0.0.1 over http only). Separately, the first-call 402 (no active session or credit) was cryptic; it now explains the free-beta auto-session
and how to get $LH.
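
The hostname check, sketched in Rust for illustration (the proxy's own code parses the URL, and its implementation is not shown here). `is_allowed_localhost_origin` is a hypothetical name, and the hand-rolled parse assumes an Origin value of the form scheme://host[:port].

```rust
/// Allow only http://localhost and http://127.0.0.1, with an optional port.
fn is_allowed_localhost_origin(origin: &str) -> bool {
    let rest = match origin.strip_prefix("http://") {
        Some(rest) => rest,
        None => return false, // https and every other scheme are rejected
    };
    let (host, port) = match rest.split_once(':') {
        Some((host, port)) => (host, Some(port)),
        None => (rest, None),
    };
    // Compare the WHOLE hostname, never a prefix of the origin string.
    let host_ok = host == "localhost" || host == "127.0.0.1";
    let port_ok = match port {
        None => true,
        Some(p) => p.parse::<u16>().is_ok(),
    };
    host_ok && port_ok
}
```

The old prefix test accepts `http://localhost.evil.com` because that string starts with `http://localhost`; comparing the parsed hostname does not.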
Browser app: XSS hardening — unescaped error/dynamic strings in innerHTML.
Four DOM sinks interpolated RPC error strings and on-chain-derived values
straight into HTML via format! (not maud), so an attacker-controlled RPC node
could return an error containing <script>/<img onerror> that executes in the
wallet-bearing origin: the agents-list + explore-grid error paths
(mod.rs), and the device-signer list + sync-result (events.rs). All now
build via maud (auto-escaped); a full sweep confirmed every other sink already
escapes (dom::msg_span, set_status/set_text_content, maud templates).
Closes the long-open "escape error-string innerHTML" item. Also: an orphaned
ToolResult (no matching pending call) now logs a warning instead of silently
dropping (chat.rs).
Anthropic backend: malformed streamed tool-args no longer silently run with
{}. A non-empty partial_json that failed to parse executed the tool with
an empty object (silent wrong-args); it now surfaces a real tool error
(is_error ToolResult) the model can retry on. The legitimate no-arg ({})
and valid-JSON paths are preserved; +3 unit tests.
README: stale localharness = "0.20" → "0.24"; documented the anthropic
and local cargo features (agents copy-paste the quickstart — the version was
4 minors behind).
docs.rs: 4 broken intra-doc links resolved (raster Viewport /
Viewport::full, compose Pending, policy FS_TOOLS) — the module-level
//! docs used bare paths that didn't resolve; now fully-qualified
(crate::raster::Viewport) or de-linked for the private FS_TOOLS. cargo doc --no-deps is warning-free.
Credit proxy: the $LH meter debit is now authoritative (closes burst
over-serving). The proxy gated a request with a cheap creditOf read then
fired the meter() debit awaiting only SUBMISSION — so a flurry of concurrent
requests could all pass the gate and be served while only the first N debits fit
the balance; the rest reverted on-chain (InsufficientCredits) unnoticed and the
PLATFORM ate the over-served calls. (User balances were never at risk — the
contract reverts rather than underflowing.) meterDebit now awaits the RECEIPT
and a definitive revert returns 402 (unless a session also covers the caller);
ambiguous RPC/timeout still serves, to avoid double-charging on retry.
Per-call $LH billing now actually decrements. Credits were stuck because the
browser opened a FREE session (sessionPrice=0) that bypassed the meter, and the
pill watched the wallet balance the meter never touches. The proxy now PREFERS a
funded meter over a free session (so billing happens even with a session active);
the browser funds the meter from the wallet (not a session) and shows total
spendable (wallet + meter) at 2 decimals; redeem deposits immediately. Verified
live: meter 100.00 → 99.97 LH across 3 metered calls.
rustlite: hex integer literals (0xFF0000). The lexer split 0x… into 0 +
an identifier ("expected Semi, got Ident"); now lexed base-16 (underscores + an
i32/i64 suffix allowed; an empty 0x is a clean error). Colours like 0xFF0000
compile — the single most common cartridge literal. (On-chain feedback #15/#16.)
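
A sketch of the base-16 lexing rule, assuming the literal has already been isolated as one token. `lex_hex_literal` is a hypothetical helper, and the i32/i64 suffix handling is left out.

```rust
/// Parse a `0x…` literal, allowing `_` separators.
/// An empty digit run is a clean error rather than a split token.
fn lex_hex_literal(src: &str) -> Result<i64, String> {
    let digits = src
        .strip_prefix("0x")
        .ok_or_else(|| format!("not a hex literal: {}", src))?;
    let cleaned: String = digits.chars().filter(|&c| c != '_').collect();
    if cleaned.is_empty() {
        return Err("empty hex literal: expected digits after 0x".to_string());
    }
    // from_str_radix also accepts a leading sign, so check digits explicitly.
    if !cleaned.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("invalid hex digit in literal: {}", src));
    }
    i64::from_str_radix(&cleaned, 16).map_err(|e| format!("{}: {}", src, e))
}
```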
rustlite: compound assignment (x += 1, -=, *=, /=, %=). The lexer
split += into + then =, so these threw the confusing "expected expression,
got Eq" — the TRUE source of that feedback (if-exprs always compiled). Now lexed
as compound-assign tokens and desugared place OP= v → place = place OP v
(operand order preserved for the non-commutative ops). Found by the test-user
dogfood pass; filed + fixed in the same loop.
rustlite: break/continue inside an if or match arm hung the cartridge.
Codegen hardcoded the branch depth (br 1 / br 0), ignoring the enclosing
conditional frames — so while c { if x { break } } branched to the loop instead
of out of it and spun forever (any guarded break/continue, the common case).
Now a per-function extra_depth counter tracks open if/match frames between the
break/continue and its loop, so the branch reaches the right target. Runtime-proven
via the render harness (the hanging cases now terminate). A SEVERE pre-existing bug
the test-user dogfood pass surfaced (it's what made for-loops hang at first).
rustlite: char literals ('A'). The lexer hit unexpected byte 0x27 on a
'; now lexed to the byte value as an IntLit (chars are i32 glyph codes for
draw_char) with string-style escapes; empty/multi-byte literals are clear errors.
(On-chain feedback #15.)
rustlite: block comments (/* … */, nesting allowed). Only // was skipped,
so a /* lexed its leading / as division → "expected expression, got Slash".
Now skipped as trivia like line comments. (LLM-authored source emits these
constantly — ties into the #19/#20 first-shot-compile pain.)
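
Nesting needs a depth counter rather than a search for the first `*/`. A minimal sketch with a hypothetical `skip_block_comment`; the real lexer's shape is not shown in these notes.

```rust
/// Skip a (possibly nested) block comment that begins at `start`.
/// Returns the index just past the matching `*/`.
fn skip_block_comment(src: &[u8], start: usize) -> Result<usize, String> {
    let opens_here = src.get(start..).map_or(false, |s| s.starts_with(b"/*"));
    if !opens_here {
        return Err("no block comment at this position".to_string());
    }
    let mut depth = 0usize;
    let mut i = start;
    while i + 1 < src.len() {
        match (src[i], src[i + 1]) {
            (b'/', b'*') => {
                depth += 1;
                i += 2;
            }
            (b'*', b'/') => {
                depth -= 1;
                i += 2;
                if depth == 0 {
                    return Ok(i);
                }
            }
            _ => i += 1,
        }
    }
    Err("unterminated block comment".to_string())
}
```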
rustlite: top-level consts resolve. const W: i32 = 256; parsed + typechecked
but a function referencing W errored "undefined variable" — consts were never put
in scope. Now processed before functions (order-independent) and INLINED at each use
(a clone of the typed value → no runtime global, no codegen change); consts may
reference earlier consts. Runtime-verified (a const loop bound iterates the right N).

Added

rustlite: arrays (literals + indexed reads) — let pal = [0xFF0000, 0x00FF00];
and pal[i] (variable index). The single biggest missing feature: lookup tables,
palettes, sine/tile data. Stored in a static linear-memory region (re-initialised
when the literal evaluates), value = base pointer; arr[i] lowers to
i32.load(base + i*4). v1 is i32 elements, read-only (mutation arr[i] = v is a
clean "invalid assignment target" error, deferred). New ResolvedType::Array,
ExprKind::ArrayLit/Index. Runtime-verified ([3,5,7][1]→5, loop-lookup over a
table). The first piece of the linear-memory model that full tuples will share.
rustlite: bitwise + shift operators — & | ^ << >> (i32 + i64), with
Rust precedence (| < ^ < & < <</>> < +). Previously << lexed as two
<, and &/| were rejected as "no references/closures" — so color packing
(r<<16)|(g<<8)|b and masks & 0xFF were impossible, the most common cartridge
idiom. Lexer/AST/parser/typecheck (integer-only)/codegen all wired; runtime-verified
(values + precedence) via the render harness. Found by the test-user dogfood pass.
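
The color-packing idiom and the precedence ladder, written here as ordinary Rust. rustlite is described as following Rust precedence, so the same expressions should carry over; that carry-over is an assumption.

```rust
/// Pack three 8-bit channels into 0xRRGGBB.
fn pack_rgb(r: i32, g: i32, b: i32) -> i32 {
    (r << 16) | (g << 8) | b
}

/// Recover the channels with shifts and masks.
fn unpack_rgb(color: i32) -> (i32, i32, i32) {
    ((color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF)
}

/// `|` binds loosest, then `^`, then `&`, then shifts, then `+`.
/// Returns the unparenthesized and the fully parenthesized form.
fn precedence_demo() -> (i32, i32) {
    let implicit = 1 | 2 & 3 << 1 + 1;
    let explicit = 1 | (2 & (3 << (1 + 1)));
    (implicit, explicit)
}
```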
rustlite: as numeric casts — t as f64, (a * 10.0) as i32, i32↔i64, etc.
Previously as lexed as a bare identifier → "expected Semi, got Ident". Now an As
keyword + ExprKind::Cast with Rust precedence (tighter than * / %, looser than
unary); the codegen emits the right convert/trunc/extend/wrap/promote/demote opcode
per (from,to). The graphics staple — float math, then cast to a pixel coord.
Runtime-verified (3.7 as i32→3, (1.5*4.0) as i32→6).
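
The cast precedence matters in practice. The following is plain Rust that mirrors the cases above; the out-of-range float behavior in rustlite is not stated in these notes, so the sketch stays within range.

```rust
/// Float math first, then truncate toward zero for a pixel coordinate.
fn to_pixel(t: f64, scale: f64) -> i32 {
    (t * scale) as i32
}

/// `as` binds tighter than `*`: 3.9 is cast to 3 first, then multiplied.
fn cast_then_multiply() -> i32 {
    10 * 3.9_f64 as i32
}

/// Narrowing i64 to i32 keeps the low 32 bits.
fn narrow(v: i64) -> i32 {
    v as i32
}
```

Without parentheses `10 * 3.9 as i32` yields 30, while `(10.0 * 3.9) as i32` yields 39, which is why the float math is grouped before the cast.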
rustlite: match range patterns — 0..=5 => … (inclusive) and 0..5 => …
(exclusive). Previously the .. in an arm hit "expected FatArrow, got DotDot". Now
a ..= (DotDotEq) token + an IntRange pattern lowered to scrutinee >= lo & scrutinee <(=) hi. Runtime-verified (in-range vs out-of-range select the right arm).
