v0.25.0
Security
Hardening from a comprehensive adversarial audit (proxy / browser+seed / contracts
/ wallet-crypto). The crypto layer re-verified as sound (low-s signatures, fresh
per-op randomness + ECIES ephemerals, chainId+nonce+validity in Tempo tx, EIP-712
domain separation) and prior hardening holds (postMessage origin allowlist,
tx-target allowlist, markdown/error-string escaping). Real findings fixed:
- Filesystem sandbox (workspace_only) audited + regression-tested. A
security deep-dive confirmed the agent file-tool sandbox holds against path
traversal (incl. deep ../../etc/passwd and Windows ..\), absolute-path
escape, sibling shared-prefix (<ws> vs <ws>-evil), case-bypass on
case-insensitive filesystems, symlink-out, and rename_file exfiltration (both
from AND to checked; missing args fail closed): no exploitable bug.
Added +7 regression tests so a future refactor can't silently reintroduce a
starts_with sibling bug or a check-before-canonicalize symlink hole.
- ABI decoders hardened against hostile/garbage RPC responses (registry.rs).
Nine dynamic decoders read offset/length words from untrusted eth_call
responses then did unchecked arithmetic before slicing. In the release/wasm
profile (panic="abort", overflow-checks OFF; the deployed one) a hostile
word WRAPPED → silently sliced the wrong region → returned wrong owner /
metadata / persona / device / signaling bytes with no error (in dev it
panicked). devices_of also pre-allocated Vec::with_capacity(hostile_len)
(OOM). All derived indices now use checked_add/checked_mul + .get()
(behavior-preserving on valid input; hostile input → empty/None/Err). +9
hostile-input/edge-case tests.
- CLI: create now protects the persisted identity key. It sets owner-only
perms (0600, unix) and adds *.localharness.key to .gitignore (created if
absent) so a raw private key written to the working directory can't be
world-readable or accidentally committed to git. Surfaced by the on-chain
test-user fleet (vex-qa) dogfooding the platform, a closed feedback loop:
the fleet filed it on-chain, this fixes it. (+ a pure unit test for the
idempotent .gitignore check.)
- Proxy: auth-token replay window cut 24h → 5 min. FRESHNESS_WINDOW_SECS
was 86400, so a captured address:timestamp:signature token was replayable for
a day. Clients sign per request, so 300s (ample clock-skew tolerance) closes the
window at no UX cost.
- Proxy: request-body size cap (16 MB). An oversized declared Content-Length
is now rejected up front (413) so one caller can't make the edge function buffer
a multi-GB body. Generous enough for max-context LLM requests.
- Browser: closed an open redirect via ?then=. The linked-device hand-off
interpolated the raw ?then= query param into the redirect URL, so
?then=evil.com%23 → https://evil.com#.localharness.xyz/ navigated off-domain.
then is now validated as a bare DNS label (alphanumeric + hyphen, ≤63) first.
- Contract (source; cut pending): ReleaseFacet MAIN guard reads storage
directly. It used a self-staticcall to mainOf that returns ok=false (not
a revert) if MainIdentityFacet were ever cut out, silently bypassing the
"can't release your MAIN" guard. Now reads LibMainIdentityStorage directly.
Source-only this pass; effective on the next diamondCut (low exploitability:
owner-misconfig + self-harm only).
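The checked-arithmetic pattern from the ABI decoder item can be sketched as follows. This is a minimal illustration, not the code in registry.rs: the names read_word, decode_bytes, and word are hypothetical, and the layout assumed is a single ABI-encoded bytes return value (offset word, then length word, then payload).

```rust
/// Build a 32-byte big-endian ABI word from a u64 (helper for examples).
fn word(n: u64) -> [u8; 32] {
    let mut w = [0u8; 32];
    w[24..].copy_from_slice(&n.to_be_bytes());
    w
}

/// Read the 32-byte big-endian word at byte index `at` as a usize.
/// Returns None if the word lies outside `data` or does not fit in a usize.
fn read_word(data: &[u8], at: usize) -> Option<usize> {
    let end = at.checked_add(32)?; // never wraps, even for a hostile `at`
    let w = data.get(at..end)?; // bounds-checked slice instead of indexing
    // Anything above the low 64 bits cannot be a valid index; reject it
    // rather than truncate it.
    if w[..24].iter().any(|&b| b != 0) {
        return None;
    }
    let mut low = [0u8; 8];
    low.copy_from_slice(&w[24..]);
    let n = u64::from_be_bytes(low);
    if n > usize::MAX as u64 {
        return None;
    }
    Some(n as usize)
}

/// Decode one dynamic `bytes` value: [offset] ... [length][payload].
/// Every derived index goes through checked_add and .get().
fn decode_bytes(data: &[u8]) -> Option<&[u8]> {
    let offset = read_word(data, 0)?;
    let len = read_word(data, offset)?;
    let start = offset.checked_add(32)?;
    let end = start.checked_add(len)?;
    data.get(start..end)
}
```

With this shape, a well-formed response decodes to its payload, while a wrapped offset or an oversized length yields None instead of a slice of the wrong region.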
Added
- SDK reliability: usage-accounting + trigger-lifecycle regression tests. A
control-flow deep-dive verified the conversation usage accounting (cumulative
sums, last_turn resets each send, no per-step double-count; both backends
emit usage_metadata only on the terminal step), the trigger lifecycle
(double-start guard, stop() joins, callback error/panic isolation, Drop
aborts), and Agent::shutdown teardown order. It found no bug and locked in +11
deterministic tests (240 lib total).
- On-chain feedback garbage collection. The FeedbackFacet's append-only
Entry[] grew unbounded: every fleet run + probe appends an entry that costs
storage gas and lengthens feedbackRange forever (it had reached 46). Added an
owner-only clearFeedback() (cut into the live diamond via
script/AddFeedbackClear.s.sol) so on-chain feedback is a TRANSIENT inbox:
harvest/bridge off-chain (GitHub issues / harvest-feedback), then
scripts/clear-feedback.sh GCs the storage. The immutable FeedbackSubmitted
event log windows out naturally (100k-block cap), so localharness feedback
still shows recent notes after a clear. Verified live: storage 46 → 0, events
preserved.
- CLI: publish is now one command. localharness publish <name> <src.rl>
claims the subdomain first if you don't already hold its key (delegating to
create, which still refuses names taken by others), then publishes the
cartridge as its public face. Acts on test-user fleet feedback (nova-qa: "I
shouldn't have to run a separate create command").
- feedback → GitHub issues bridge (scripts/test-fleet/feedback-to-issues.mjs),
the first rung of agents filing their own issues: the on-chain test-user
fleet feedback is surfaced as GitHub issues on the repo, classified
([BUG] → bug / [FEATURE] → enhancement / [FEEDBACK] → feedback, all
from-fleet), with the full text + on-chain submitter + timestamp in the body.
Dry-run by default; --create (gh-gated, opt-in, since creating public issues
is outward-facing) files them; idempotent via a docs/feedback-bridged.txt
dedup ledger keyed on <timestamp>:<sender>. Backed by a new machine-readable
localharness feedback --json (+ unit test).
- Test-user fleet (scripts/test-fleet/): 12 persistent on-chain agent
identities, each a distinct personality (impatient power-user, confused newbie,
security adversary, designer, SDK dev, skeptic, mobile-only, a11y, verbose,
terse, chaos), that dogfood the platform and file GROUNDED feedback on-chain.
run-fleet.sh drives each persona: create → probe a live agent → reflect
in-persona on the REAL experience → submit one [BUG]/[FEATURE]/[FEEDBACK]
item (FeedbackFacet); read it back via harvest-feedback. Reuses the existing
CLI, with no new server. Validated live: a 3-persona sample landed real DX,
onboarding, and security feedback on-chain (e.g. "create writes a raw private
key to the cwd with no chmod/.gitignore").
- SDK: a minimal getting-started example (examples/basic_agent.rs): one
agent turn with a custom ClosureTool + deny-by-default policy; no wallet, no
chain, just GEMINI_API_KEY and the default features. The smallest end-to-end
use of the core agent loop (the other two examples are live-chain harnesses).
README quickstart verified drift-free against the real API.
- Discoverable agent cards on the apex explore view. The global
"explore / recent agents" view at localharness.xyz now shows each agent as a
card with a truncated on-chain persona preview (reusing registry::personas_of
and the card pattern from per-owner landing pages), so a first-time visitor sees
what platform agents actually DO instead of a bare name list. Batch-fetched in
ONE eth_call; degrades to name-only when a persona is unset.
- SDK: comprehensive GeminiAgentConfig builder-chain doctest. The
new() example now shows with_model / with_system_instructions /
with_workspace / a deny-by-default with_policies allowlist, so adopters see
how to compose the config (not just a one-liner).
- Discoverable agent portfolios on public landing pages. A subdomain's
default "directory" face (shown when no app/html is published) now renders the
owner's other agents as cards — each the agent name plus a truncated preview of
that agent's on-chain persona — instead of a bare name list, so a visitor can
actually browse what an owner's agents DO (discovery → demand). Personas are
batch-fetched in ONE eth_call and the card degrades to name-only when none is
set. (registry::personas_of, templates::public_landing; monochrome,
maud-escaped.)
- MCP server surfaced in onboarding. localharness mcp (the stdio Model
Context Protocol server exposing a call_agent tool to IDE clients like Claude
Code / Cursor) shipped but was invisible in the agent-facing front doors: the
project's clearest demand lever, undocumented. Now web/skill.md and
web/llms.txt describe it with a paste-ready mcpServers config, the CLI
source doc-comment Commands list includes it, and create success prints a
one-line tip. (The runtime help text already covered it.)
- Agent-teams P2P collaboration layer (Layer 5 wired). The foundation
(SignalingFacet/TeamFacet, webrtc.rs transport, sharedfs_sync.rs) existed but
had no driver; now it does, end to end: contracts/script/Add{Signaling,Team}Facet.s.sol
(deploy + diamondCut), a Rust signaling driver in registry.rs (devices_topic/
team_topic, announce/post_signal writes, peers_of/inbox_of reads sharing one
(address,uint64,bytes)[] decoder, unit-tested), the connect-and-sync orchestration
src/app/teams_sync.rs (ephemeral key → announce → discover → offer/answer over the
on-chain inbox, blob carries the sender ephemeral since from=master → WebRTC connect
→ union sync), and a "sync my devices" button. Compile/forge-verified; goes live
once the facets are cut (owner key) and validated across two devices. The SDP
offer/answer is ECIES-sealed to the recipient's announced ephemeral pubkey before it
hits the on-chain mailbox (only the <eph_hex> correlation prefix stays plaintext), so
an observer sees no ICE candidates/topology; shared FS remains read-only (noted).
- CLI billing self-test: localharness credits [--as <me>] (wallet $LH /
per-call meter / session) and localharness topup [--as <me>] (claim the daily
$LH allowance + deposit it into the per-request meter, sponsored). The end-to-end
billing check any agent can run as a real user: topup → call → credits.
- rustlite: for i in a..b { … } loops. Desugared (no codegen change) to a
loop with the increment at the TOP and v pre-decremented, so continue stays
correct; bounds evaluated once. Range token .. added. Render-verified at runtime.
- First integration test (tests/tool_hook_policy.rs, 5 tests): exercises the
tool + hook + policy pipeline TOGETHER through the public API (the layer the
backend loop actually runs): policy precedence (specific-deny > wildcard-allow)
gating real ToolRunner dispatch, deny-by-default allowlists, ask-user verdicts,
first-deny-wins hook ordering, and post-hooks observing both allow and deny
outcomes. The repo's first tests/ suite; prior coverage was per-layer units.
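The for-loop desugaring noted for rustlite above can be illustrated in plain Rust. This is a sketch of the shape only (increment at the top, counter pre-decremented, bound evaluated once), not rustlite's actual lowering; the function names and the wrapping arithmetic are assumptions made for the example.

```rust
/// Reference behaviour: sum the odd numbers in a..b with an ordinary `for`.
fn sum_odds_for(a: i32, b: i32) -> i32 {
    let mut total = 0;
    for i in a..b {
        if i % 2 == 0 {
            continue;
        }
        total += i;
    }
    total
}

/// The same loop written in the desugared shape.
fn sum_odds_desugared(a: i32, b: i32) -> i32 {
    let mut total = 0;
    let end = b; // upper bound evaluated once, before the loop
    let mut i = a.wrapping_sub(1); // counter starts pre-decremented
    loop {
        i = i.wrapping_add(1); // increment at the TOP of the body
        if i >= end {
            break;
        }
        if i % 2 == 0 {
            continue; // jumps back to the increment, so the loop still advances
        }
        total += i;
    }
    total
}
```

Because the increment sits at the top, a continue can never skip it, which is what a naive increment-at-the-bottom desugaring would get wrong. Calling both functions with the same bounds, for example (0, 10), gives the same sum.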
Fixed
- Accessibility: agent streams are now a screen-reader live region (+ labels).
The #transcript container (where every streamed assistant turn, text chunk,
and tool block lands) gained role="log" + aria-live="polite" +
aria-atomic="false", so screen-reader users hear new responses as they
stream (previously silent). Also: accessible names on the icon-only
send/stop/close/delete buttons + the unlabeled prompt textarea, a
role="status" live region on the status line, and aria-hidden on decorative
glyphs. Semantic-only, with zero visual change. Surfaced by the test-user fleet
(ada-qa).
- CLI: reject leading/trailing-hyphen names (dead-on-arrival subdomains).
name_is_valid allowed -foo / foo- (not valid DNS labels), so create
(and now publish) would mint them. Now rejected per RFC 1035. Surfaced by the
test-user fleet (juno-qa); emoji/uppercase/oversized were ALREADY caught, so
this closed the one real residual, and confirmed the fleet (like real users)
files speculative bugs that need scrutiny before acting.
- MCP client: image-bearing tool results no longer silently fail.
ContentBlock::Image's mime_type field expected snake_case, but
rename_all="lowercase" renames only the variant tags (not struct fields) and
MCP sends camelCase mimeType, so a real {"type":"image",…,"mimeType":…}
block failed to deserialize, erroring the WHOLE tools/call response even
though the server answered correctly. Added #[serde(rename = "mimeType")].
+28 protocol/correlation/framing tests for a previously untested module (incl.
out-of-order / unmatched / duplicate response-id correlation).
- Context compaction: silent TOTAL context loss in tool-heavy sessions (both
backends). pick_split walked the summarize/keep boundary FORWARD to avoid
orphaning a tool pair, but in a long unbroken run of
[assistant tool_use, user tool_result] round-trips every index qualified, so it
chained to history.len() and kept ZERO messages, defeating KEEP_RECENT_TURNS exactly
when compaction fires. The request still succeeded, so the agent just went
silently context-blind mid-task. The boundary now walks EARLIER (a kept slice
can only be orphaned by a leading tool_result whose call was summarized),
keeping ≥ the recent window and preserving the tool_use↔tool_result pairing.
+20 tests (gemini + anthropic).
- Anthropic backend: 3 streaming/wire bugs + a compaction panic. (1) The
final SSE frame at EOF without a trailing blank line was dropped, losing the
turn's stop_reason + final output_tokens (the Gemini analog; ported
take_remaining). (2) Output-token usage was double-counted: Anthropic
streams cumulative usage (a message_start placeholder + each message_delta
carrying the running total), but the fold SUMMED them, over-reporting metered
tokens every turn (the credit proxy meters on these); now last-writer-wins.
(3) An unmodeled content block (redacted_thinking, future server-side types)
aborted the whole stream; added a Block::Other serde fallback. (4)
render_transcript truncated long tool-result bodies with &body[..512],
panicking on a multibyte char at byte 512; now char-boundary-safe. +12 tests.
- Gemini streaming: 3 SSE/wire correctness bugs. (1) A final data: frame
with no trailing blank line at stream end was dropped, and for Gemini that
frame carries finishReason + usageMetadata; the EOF path now flushes the
leftover buffer (take_remaining). (2) Text parts stamped thought:false (the
documented Gemini 3.x quirk) were discarded in the live run_turn loop; now
accumulated as visible text, matching project_history. (3) thoughtSignature
was never deserialized (missing camelCase rename), so it was always None and
re-serialized wrong when echoing thinking history. +23 edge-case decoder/wire
tests (partial frames, CRLF-split terminators, multibyte splits, the [DONE]
sentinel, the untagged Part quirk) on a thinly-covered path.
- Restored transcripts now show Gemini tool results, not just the calls.
The Gemini backend's project_history emptied its pending-calls buffer when the
assistant Content was pushed (before the following FunctionResponse
Content arrived), so projecting wire history into a saved transcript dropped
every tool result/error. A reloaded conversation showed tool calls with a blank
(no result/error) pill. Now matched per-name FIFO across the two wire contents,
lifting the live error convention ({"error": …} → typed error). +4 tests
incl. old-format TranscriptEntry backward-compat. (Anthropic/local were already
correct; the replay path already reuses the live tool-call templates.)
- SDK: closed a ChatResponse cursor lost-wakeup window.
ChatCursor::poll_next created its tokio::Notify waiter AFTER checking the
chunk buffer, so a producer notify_waiters() landing in that gap could be
missed, hanging a cursor at the stream tail. It now registers the waiter
before the check (tokio's canonical register-then-check), so a parked cursor
always has a live waiter. Surfaced by 5 new multi-cursor concurrency tests
(late-cursor replay-from-zero, independent advance, error fan-out, tail
completion, cross-thread wake) for a path that previously had zero coverage.
- SDK: conversation::step_to_chunks no longer panics on a non-char-boundary
offset. The terminal-response tail-recovery byte-sliced
content[text_emitted..]; when a harness split a multibyte UTF-8 char across
deltas, text_emitted landed mid-char and the slice panicked (a library
panic crashes the consumer). It now uses str::get, degrading to a no-op on a
bad boundary, and the doc comment is corrected (it's a BYTE offset, not chars).
+3 regression tests.
- Browser app: the system prompt no longer advertises Gemini-only tools to
Claude agents. generate_image and start_subagent aren't registered on the
Anthropic backend, but the prompt listed them unconditionally, so every
Claude-backed agent was told it had tools it couldn't call. Those two bullets
are now gated on the selected backend (!is_anthropic(model); the model is
knowable at prompt-build time). Also RUNTIME_SUMMARY no longer claims
"Gemini-backed" (the platform runs Gemini, Claude, or local Gemma).
- CLI: clearer empty-key error + doc completeness. An empty
.localharness.key produced a cryptic wallet-parse error; it now reports that the
file is empty and says to recreate it. credits/topup added to the source command
list (they were only in the runtime help).
- Credit proxy: CORS origin check hardened + clearer first-call 402. The
localhost allowance used startsWith('http://localhost'), which also matched
http://localhost.evil.com, so an attacker origin could read proxy responses
cross-origin; it now parses the URL and checks the hostname (localhost /
127.0.0.1 over http only). Separately, the first-call 402 (no active session or
credit) was cryptic; it now explains the free-beta auto-session
and how to get $LH.
- Browser app: XSS hardening for unescaped error/dynamic strings in innerHTML.
Four DOM sinks interpolated RPC error strings and on-chain-derived values
straight into HTML via format! (not maud), so an attacker-controlled RPC node
could return an error containing <script> / <img onerror> that executes in the
wallet-bearing origin: the agents-list + explore-grid error paths
(mod.rs), and the device-signer list + sync-result (events.rs). All now
build via maud (auto-escaped); a full sweep confirmed every other sink already
escapes (dom::msg_span, set_status/set_text_content, maud templates).
Closes the long-open "escape error-string innerHTML" item. Also: an orphaned
ToolResult (no matching pending call) now logs a warning instead of silently
dropping (chat.rs).
- Anthropic backend: malformed streamed tool-args no longer silently run with
{}. A non-empty partial_json that failed to parse executed the tool with
an empty object (silent wrong-args); it now surfaces a real tool error
(is_error ToolResult) the model can retry on. The legitimate no-arg ({})
and valid-JSON paths are preserved; +3 unit tests.
- README: stale localharness = "0.20" → "0.24"; documented the anthropic
and local cargo features (agents copy-paste the quickstart, and the version was
4 minors behind).
- docs.rs: 4 broken intra-doc links resolved (raster Viewport /
Viewport::full, compose Pending, policy FS_TOOLS). The module-level
//! docs used bare paths that didn't resolve; now fully-qualified
(crate::raster::Viewport) or de-linked for the private FS_TOOLS.
cargo doc --no-deps is warning-free.
- Credit proxy: the $LH meter debit is now authoritative (closes burst
over-serving). The proxy gated a request with a cheap creditOf read then
fired the meter() debit awaiting only SUBMISSION, so a flurry of concurrent
requests could all pass the gate and be served while only the first N debits fit
the balance; the rest reverted on-chain (InsufficientCredits) unnoticed and the
PLATFORM ate the over-served calls. (User balances were never at risk: the
contract reverts rather than underflowing.) meterDebit now awaits the RECEIPT
and a definitive revert returns 402 (unless a session also covers the caller);
ambiguous RPC/timeout still serves, to avoid double-charging on retry.
- Per-call $LH billing now actually decrements. Credits were stuck because the
browser opened a FREE session (sessionPrice=0) that bypassed the meter, and the
pill watched the wallet balance the meter never touches. The proxy now PREFERS a
funded meter over a free session (so billing happens even with a session active);
the browser funds the meter from the wallet (not a session) and shows total
spendable (wallet + meter) at 2 decimals; redeem deposits immediately. Verified
live: meter 100.00 → 99.97 LH across 3 metered calls.
- rustlite: hex integer literals (0xFF0000). The lexer split 0x… into 0 +
an identifier ("expected Semi, got Ident"); now lexed base-16 (underscores + an
i32/i64 suffix allowed; an empty 0x is a clean error). Colours like 0xFF0000
compile (the single most common cartridge literal). (On-chain feedback #15/#16.)
- rustlite: compound assignment (x += 1, -=, *=, /=, %=). The lexer
split += into + then =, so these threw the confusing "expected expression,
got Eq" (the TRUE source of that feedback; if-exprs always compiled). Now lexed
as compound-assign tokens and desugared place OP= v → place = place OP v
(operand order preserved for the non-commutative ops). Found by the test-user
dogfood pass; filed + fixed in the same loop.
- rustlite: break/continue inside an if/match arm hung the cartridge.
Codegen hardcoded the branch depth (br 1 / br 0), ignoring the enclosing
conditional frames, so while c { if x { break } } branched to the loop instead
of out of it and spun forever (any guarded break/continue, the common case).
Now a per-function extra_depth counter tracks open if/match frames between the
break/continue and its loop, so the branch reaches the right target. Runtime-proven
via the render harness (the hanging cases now terminate). A SEVERE pre-existing bug
the test-user dogfood pass surfaced (it's what made for-loops hang at first).
- rustlite: char literals ('A'). The lexer hit unexpected byte 0x27 on a
'; now lexed to the byte value as an IntLit (chars are i32 glyph codes for
draw_char) with string-style escapes; empty/multi-byte literals are clear errors.
(On-chain feedback #15.)
- rustlite: block comments (/* … */, nesting allowed). Only // was skipped,
so a /* lexed its leading / as division → "expected expression, got Slash".
Now skipped as trivia like line comments. (LLM-authored source emits these
constantly, which ties into the #19/#20 first-shot-compile pain.)
- rustlite: top-level consts resolve. const W: i32 = 256; parsed + typechecked
but a function referencing W errored "undefined variable": consts were never put
in scope. Now processed before functions (order-independent) and INLINED at each use
(a clone of the typed value → no runtime global, no codegen change); consts may
reference earlier consts. Runtime-verified (a const loop bound iterates the right N).
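Two of the fixes above (the &body[..512] truncation panic and the content[text_emitted..] tail slice) come down to slicing a str at a byte offset that may fall inside a multibyte character. A minimal sketch of the safe versions follows; the function names are hypothetical and this is not the SDK's code, only the standard-library technique the entries describe (str::is_char_boundary and str::get).

```rust
/// Truncate `body` to at most `max_bytes` bytes without splitting a UTF-8 char.
/// Walks back from `max_bytes` to the nearest char boundary (0 always is one).
fn truncate_on_char_boundary(body: &str, max_bytes: usize) -> &str {
    if body.len() <= max_bytes {
        return body;
    }
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    &body[..end]
}

/// Tail of `content` after a BYTE offset. Returns "" instead of panicking
/// when the offset is out of range or lands in the middle of a char.
fn tail_from(content: &str, byte_offset: usize) -> &str {
    content.get(byte_offset..).unwrap_or("")
}
```

For the string "héllo" (6 bytes, with é occupying bytes 1 and 2), truncating to 2 bytes keeps only "h", and asking for the tail from byte 2 degrades to an empty string rather than a panic.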
Added
- rustlite: arrays (literals + indexed reads), i.e. let pal = [0xFF0000, 0x00FF00];
and pal[i] (variable index). The single biggest missing feature: lookup tables,
palettes, sine/tile data. Stored in a static linear-memory region (re-initialised
when the literal evaluates), value = base pointer; arr[i] lowers to
i32.load(base + i*4). v1 is i32 elements, read-only (mutation arr[i] = v is a
clean "invalid assignment target" error, deferred). New ResolvedType::Array,
ExprKind::ArrayLit/Index. Runtime-verified ([3,5,7][1] → 5, loop-lookup over a
table). The first piece of the linear-memory model that full tuples will share.
- rustlite: bitwise + shift operators (& | ^ << >>) for i32 + i64, with
Rust precedence (| < ^ < & < <</>> < +). Previously << lexed as two
<, and & / | were rejected as "no references/closures", so color packing
(r<<16)|(g<<8)|b and masks & 0xFF were impossible, the most common cartridge
idiom. Lexer/AST/parser/typecheck (integer-only)/codegen all wired; runtime-verified
(values + precedence) via the render harness. Found by the test-user dogfood pass.
- rustlite: as numeric casts (t as f64, (a * 10.0) as i32, i32↔i64, etc.).
Previously as lexed as a bare identifier → "expected Semi, got Ident". Now an As
keyword + ExprKind::Cast with Rust precedence (tighter than * / %, looser than
unary); the codegen emits the right convert/trunc/extend/wrap/promote/demote opcode
per (from,to). The graphics staple: float math, then cast to a pixel coord.
Runtime-verified (3.7 as i32 → 3, (1.5*4.0) as i32 → 6).
- rustlite: match range patterns, 0..=5 => … (inclusive) and 0..5 => …
(exclusive). Previously the .. in an arm hit "expected FatArrow, got DotDot". Now
a ..= (DotDotEq) token + an IntRange pattern lowered to
scrutinee >= lo & scrutinee <(=) hi. Runtime-verified (in-range vs out-of-range select the right arm).
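The idioms these rustlite additions unlock can be shown together in a short sketch. It is written in plain Rust rather than rustlite (the entries describe rustlite as following Rust precedence and cast semantics), and the function names are hypothetical.

```rust
/// Pack 8-bit channels into 0xRRGGBB using shifts, masks, and bitwise OR.
fn pack_rgb(r: i32, g: i32, b: i32) -> i32 {
    ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)
}

/// Select an arm with inclusive integer range patterns.
fn band(v: i32) -> i32 {
    match v {
        0..=5 => 0,
        6..=10 => 1,
        _ => 2,
    }
}

/// Read a palette entry from an array literal by variable index
/// (the index is wrapped so any usize is in range).
fn palette(i: usize) -> i32 {
    let pal = [0xFF0000, 0x00FF00, 0x0000FF];
    pal[i % pal.len()]
}

/// Float math first, then cast to an integer pixel coordinate.
/// The `as` cast truncates toward zero.
fn to_pixel(t: f64, scale: f64) -> i32 {
    (t * scale) as i32
}
```

For instance, pack_rgb(0x12, 0x34, 0x56) yields 0x123456, and to_pixel(1.5, 4.0) yields 6, matching the cast result quoted in the entry above.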