v0.25.0

@compusophy compusophy released this 08 Jun 09:01
· 154 commits to main since this release

Security

Hardening from a comprehensive adversarial audit (proxy / browser+seed / contracts
/ wallet-crypto). The crypto layer re-verified as sound (low-s signatures, fresh
per-op randomness + ECIES ephemerals, chainId+nonce+validity in Tempo tx, EIP-712
domain separation) and prior hardening holds (postMessage origin allowlist,
tx-target allowlist, markdown/error-string escaping). Real findings fixed:

  • Filesystem sandbox (workspace_only) audited + regression-tested. A
    security deep-dive confirmed the agent file-tool sandbox holds against path
    traversal (incl. deep ../../etc/passwd and Windows ..\), absolute-path
    escape, sibling shared-prefix (<ws> vs <ws>-evil), case-bypass on
    case-insensitive filesystems, symlink-out, and rename_file exfiltration (both
    from AND to checked; missing args fail closed) — no exploitable bug.
    Added +7 regression tests so a future refactor can't silently reintroduce a
    starts_with sibling bug or a check-before-canonicalize symlink hole.
  • ABI decoders hardened against hostile/garbage RPC responses (registry.rs).
    Nine dynamic decoders read offset/length words from untrusted eth_call
    responses then did unchecked arithmetic before slicing. In the release/wasm
    profile (panic="abort", overflow-checks OFF — the deployed one) a hostile
    word WRAPPED → silently sliced the wrong region → returned wrong owner /
    metadata / persona / device / signaling bytes with no error (in dev it
    panicked). devices_of also pre-allocated Vec::with_capacity(hostile_len)
    (OOM). All derived indices now use checked_add/checked_mul + .get()
    (behavior-preserving on valid input; hostile input → empty/None/Err). +9
    hostile-input/edge-case tests.
  • CLI: create now protects the persisted identity key. It sets owner-only
    perms (0600, unix) and adds *.localharness.key to .gitignore (created if
    absent) so a raw private key written to the working directory can't be
    world-readable or accidentally committed to git. Surfaced by the on-chain
    test-user fleet (vex-qa) dogfooding the platform, a closed feedback loop:
    the fleet filed it on-chain, this fixes it. (+ a pure unit test for the
    idempotent .gitignore check.)
  • Proxy: auth-token replay window cut 24h → 5 min. FRESHNESS_WINDOW_SECS
    was 86400, so a captured address:timestamp:signature token was replayable for
    a day. Clients sign per request, so 300s (ample clock-skew tolerance) closes the
    window at no UX cost.
  • Proxy: request-body size cap (16 MB). An oversized declared Content-Length
    is now rejected up front (413) so one caller can't make the edge function buffer
    a multi-GB body. Generous enough for max-context LLM requests.
  • Browser: closed an open redirect via ?then=. The linked-device hand-off
    interpolated the raw ?then= query param into the redirect URL, so
    ?then=evil.com%23https://evil.com#.localharness.xyz/ navigated off-domain.
    then is now validated as a bare DNS label (alphanumeric + hyphen, ≤63) first.
  • Contract (source; cut pending): ReleaseFacet MAIN guard reads storage
    directly. It used a self-staticcall to mainOf that returns ok=false (not
    a revert) if MainIdentityFacet were ever cut out — silently bypassing the
    "can't release your MAIN" guard. Now reads LibMainIdentityStorage directly.
    Source-only this pass; effective on the next diamondCut (low exploitability:
    owner-misconfig + self-harm only).
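
A minimal sketch of the checked-decoding pattern from the ABI decoder item above. It is not the actual registry.rs code: read_word, decode_bytes, and decode_word_array are hypothetical names, and the layout assumed is a response holding a single dynamic value (one offset word, then a length word, then the payload). Every index derived from an untrusted word goes through checked_add/checked_mul and .get(), so a hostile word yields None instead of wrapping or panicking.

```rust
use std::convert::TryFrom;

/// Size of one ABI word in bytes.
const WORD: usize = 32;

/// Read the 32-byte big-endian word at `pos` as a usize.
/// Returns None if the word is out of bounds or too large to be an index.
fn read_word(data: &[u8], pos: usize) -> Option<usize> {
    let end = pos.checked_add(WORD)?;
    let word = data.get(pos..end)?;
    // Anything with non-zero high bytes cannot be a valid offset or length.
    let (high, low) = word.split_at(WORD - 8);
    if high.iter().any(|&b| b != 0) {
        return None;
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(low);
    usize::try_from(u64::from_be_bytes(buf)).ok()
}

/// Decode one ABI-encoded dynamic `bytes` value (hypothetical helper).
fn decode_bytes(data: &[u8]) -> Option<&[u8]> {
    let offset = read_word(data, 0)?;
    let len = read_word(data, offset)?;
    let start = offset.checked_add(WORD)?;
    let end = start.checked_add(len)?;
    data.get(start..end)
}

/// Decode a dynamic array of 32-byte elements (hypothetical helper).
fn decode_word_array(data: &[u8]) -> Option<Vec<&[u8]>> {
    let offset = read_word(data, 0)?;
    let count = read_word(data, offset)?;
    let start = offset.checked_add(WORD)?;
    let byte_len = count.checked_mul(WORD)?;
    let end = start.checked_add(byte_len)?;
    // Slice first, allocate second: the Vec can never be larger than the
    // bytes actually present, whatever count the response claims.
    let body = data.get(start..end)?;
    Some(body.chunks_exact(WORD).collect())
}
```

On well-formed input these return the same slices an unchecked decoder would. A response whose offset or length word points past the buffer, or whose sum overflows, returns None in every build profile.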

Added

  • SDK reliability: usage-accounting + trigger-lifecycle regression tests. A
    control-flow deep-dive verified the conversation usage accounting (cumulative
    sums, last_turn resets each send, no per-step double-count — both backends
    emit usage_metadata only on the terminal step), the trigger lifecycle
    (double-start guard, stop() joins, callback error/panic isolation, Drop
    aborts), and Agent::shutdown teardown order — no bug — and locked in +11
    deterministic tests (240 lib total).
  • On-chain feedback garbage collection. The FeedbackFacet's append-only
    Entry[] grew unbounded — every fleet run + probe appends an entry that costs
    storage gas and lengthens feedbackRange forever (it had reached 46). Added an
    owner-only clearFeedback() (cut into the live diamond via
    script/AddFeedbackClear.s.sol) so on-chain feedback is a TRANSIENT inbox:
    harvest/bridge off-chain (GitHub issues / harvest-feedback), then
    scripts/clear-feedback.sh GCs the storage. The immutable FeedbackSubmitted
    event log windows out naturally (100k-block cap), so localharness feedback
    still shows recent notes after a clear. Verified live: storage 46 → 0, events
    preserved.
  • CLI: publish is now one command. localharness publish <name> <src.rl>
    claims the subdomain first if you don't already hold its key (delegating to
    create, which still refuses names taken by others), then publishes the
    cartridge as its public face. Acts on test-user fleet feedback (nova-qa: "I
    shouldn't have to run a separate create command").
  • feedback → GitHub issues bridge (scripts/test-fleet/feedback-to-issues.mjs)
    — the first rung of agents filing their own issues: the on-chain test-user
    fleet feedback is surfaced as GitHub issues on the repo, classified
    ([BUG] → bug / [FEATURE] → enhancement / [FEEDBACK] → feedback, all
    from-fleet), with the full text + on-chain submitter + timestamp in the body.
    Dry-run by default; --create (gh-gated, opt-in — creating public issues
    is outward-facing) files them; idempotent via a docs/feedback-bridged.txt
    dedup ledger keyed on <timestamp>:<sender>. Backed by a new machine-readable
    localharness feedback --json (+ unit test).
  • Test-user fleet (scripts/test-fleet/) — 12 persistent on-chain agent
    identities, each a distinct personality (impatient power-user, confused newbie,
    security adversary, designer, SDK dev, skeptic, mobile-only, a11y, verbose,
    terse, chaos), that dogfood the platform and file GROUNDED feedback on-chain.
    run-fleet.sh drives each persona: create → probe a live agent → reflect
    in-persona on the REAL experience → submit one [BUG]/[FEATURE]/[FEEDBACK]
    item (FeedbackFacet); read it back via harvest-feedback. Reuses the existing
    CLI — no new server. Validated live: a 3-persona sample landed real DX,
    onboarding, and security feedback on-chain (e.g. "create writes a raw private
    key to the cwd with no chmod/.gitignore").
  • SDK: a minimal getting-started example (examples/basic_agent.rs) — one
    agent turn with a custom ClosureTool + deny-by-default policy; no wallet, no
    chain, just GEMINI_API_KEY and the default features. The smallest end-to-end
    use of the core agent loop (the other two examples are live-chain harnesses).
    README quickstart verified drift-free against the real API.
  • Discoverable agent cards on the apex explore view. The global
    "explore / recent agents" view at localharness.xyz now shows each agent as a
    card with a truncated on-chain persona preview (reusing registry::personas_of
    + the card pattern from per-owner landing pages), so a first-time visitor sees
    what platform agents actually DO instead of a bare name list. Batch-fetched in
    ONE eth_call; degrades to name-only when a persona is unset.
  • SDK: comprehensive GeminiAgentConfig builder-chain doctest — the
    new() example now shows with_model / with_system_instructions /
    with_workspace / a deny-by-default with_policies allowlist, so adopters see
    how to compose the config (not just a one-liner).
  • Discoverable agent portfolios on public landing pages. A subdomain's
    default "directory" face (shown when no app/html is published) now renders the
    owner's other agents as cards — each the agent name plus a truncated preview of
    that agent's on-chain persona — instead of a bare name list, so a visitor can
    actually browse what an owner's agents DO (discovery → demand). Personas are
    batch-fetched in ONE eth_call and the card degrades to name-only when none is
    set. (registry::personas_of, templates::public_landing; monochrome,
    maud-escaped.)
  • MCP server surfaced in onboarding. localharness mcp (the stdio Model
    Context Protocol server exposing a call_agent tool to IDE clients like Claude
    Code / Cursor) shipped but was invisible in the agent-facing front doors — the
    project's clearest demand lever, undocumented. Now web/skill.md and
    web/llms.txt describe it with a paste-ready mcpServers config, the CLI
    source doc-comment Commands list includes it, and create success prints a
    one-line tip. (The runtime help text already covered it.)
  • Agent-teams P2P collaboration layer (Layer 5 wired). The foundation
    (SignalingFacet/TeamFacet, webrtc.rs transport, sharedfs_sync.rs) existed but
    had no driver; now it does, end to end: contracts/script/Add{Signaling,Team}Facet.s.sol
    (deploy + diamondCut), a Rust signaling driver in registry.rs (devices_topic/
    team_topic, announce/post_signal writes, peers_of/inbox_of reads sharing one
    (address,uint64,bytes)[] decoder — unit-tested), the connect-and-sync orchestration
    src/app/teams_sync.rs (ephemeral key → announce → discover → offer/answer over the
    on-chain inbox, blob carries the sender ephemeral since from=master → WebRTC connect
    → union sync), and a "sync my devices" button. Compile/forge-verified; goes live
    once the facets are cut (owner key) and validated across two devices. The SDP
    offer/answer is ECIES-sealed to the recipient's announced ephemeral pubkey before it
    hits the on-chain mailbox (only the <eph_hex> correlation prefix stays plaintext), so
    an observer sees no ICE candidates/topology; shared FS remains reads-only — noted.
  • CLI billing self-test: localharness credits [--as <me>] (wallet $LH /
    per-call meter / session) and localharness topup [--as <me>] (claim the daily
    $LH allowance + deposit it into the per-request meter, sponsored). The end-to-end
    billing check any agent can run as a real user: topup → call → credits.
  • rustlite: for i in a..b { … } loops. Desugared (no codegen change) to a
    loop with the increment at the TOP and v pre-decremented, so continue stays
    correct; bounds evaluated once. Range token .. added. Render-verified at runtime.
  • First integration test (tests/tool_hook_policy.rs, 5 tests) — exercises the
    tool + hook + policy pipeline TOGETHER through the public API (the layer the
    backend loop actually runs): policy precedence (specific-deny > wildcard-allow)
    gating real ToolRunner dispatch, deny-by-default allowlists, ask-user verdicts,
    first-deny-wins hook ordering, and post-hooks observing both allow and deny
    outcomes. The repo's first tests/ suite; prior coverage was per-layer units.
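
The rustlite for-loop desugaring described above (increment at the TOP, loop variable pre-decremented, bounds evaluated once) can be sketched in plain Rust. This illustrates the shape only and is not the compiler's actual output. The function names are hypothetical, and the wrapping arithmetic is an assumption made here so the pre-decrement is well-defined at i32::MIN.

```rust
/// Sum the even numbers in `a..b`, written the way the desugared loop is
/// described: bound captured once, variable pre-decremented, increment first.
fn sum_evens_desugared(a: i32, b: i32) -> i32 {
    let end = b; // upper bound evaluated once
    let mut i = a.wrapping_sub(1); // pre-decrement so the first increment lands on `a`
    let mut total = 0;
    loop {
        i = i.wrapping_add(1); // increment at the TOP of the body
        if i >= end {
            break;
        }
        if i % 2 != 0 {
            // `continue` jumps back to the top, where the increment still
            // runs, so the loop cannot stall on an odd value.
            continue;
        }
        total += i;
    }
    total
}

/// The same computation with an ordinary `for` loop, for comparison.
fn sum_evens_for(a: i32, b: i32) -> i32 {
    let mut total = 0;
    for i in a..b {
        if i % 2 != 0 {
            continue;
        }
        total += i;
    }
    total
}
```

Had the increment been placed at the bottom of the body, the continue branch would skip it and the loop would spin on the first odd value. Placing it at the top is what keeps continue correct.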

Fixed

  • Accessibility: agent streams are now a screen-reader live region (+ labels).
    The #transcript container — where every streamed assistant turn, text chunk,
    and tool block lands — gained role="log" + aria-live="polite" +
    aria-atomic="false", so screen-reader users hear new responses as they
    stream (previously silent). Also: accessible names on the icon-only
    send/stop/close/delete buttons + the unlabeled prompt textarea, a
    role="status" live region on the status line, and aria-hidden on decorative
    glyphs. Semantic-only — zero visual change. Surfaced by the test-user fleet
    (ada-qa).
  • CLI: reject leading/trailing-hyphen names (dead-on-arrival subdomains).
    name_is_valid allowed -foo / foo- (not valid DNS labels), so create
    (and now publish) would mint them. Now rejected per RFC 1035. Surfaced by the
    test-user fleet (juno-qa) — emoji/uppercase/oversized were ALREADY caught, so
    this closed the one real residual, and confirmed the fleet (like real users)
    files speculative bugs that need scrutiny before acting.
  • MCP client: image-bearing tool results no longer silently fail.
    ContentBlock::Image's mime_type field expected snake_case, but
    rename_all="lowercase" renames only the variant tags (not struct fields) and
    MCP sends camelCase mimeType — so a real {"type":"image",…,"mimeType":…}
    block failed to deserialize, erroring the WHOLE tools/call response even
    though the server answered correctly. Added #[serde(rename = "mimeType")].
    +28 protocol/correlation/framing tests for a previously untested module (incl.
    out-of-order / unmatched / duplicate response-id correlation).
  • Context compaction: silent TOTAL context loss in tool-heavy sessions (both
    backends). pick_split walked the summarize/keep boundary FORWARD to avoid
    orphaning a tool pair, but in a long unbroken run of
    [assistant tool_use, user tool_result] round-trips every index qualified,
    so it chained to
    history.len() and kept ZERO messages, defeating KEEP_RECENT_TURNS exactly
    when compaction fires. The request still succeeded, so the agent just went
    silently context-blind mid-task. The boundary now walks EARLIER (a kept slice
    can only be orphaned by a leading tool_result whose call was summarized),
    keeping ≥ the recent window and preserving the tool_use↔tool_result pairing.
    +20 tests (gemini + anthropic).
  • Anthropic backend: 3 streaming/wire bugs + a compaction panic. (1) The
    final SSE frame at EOF without a trailing blank line was dropped — losing the
    turn's stop_reason + final output_tokens (the Gemini analog; ported
    take_remaining). (2) Output-token usage was double-counted — Anthropic
    streams cumulative usage (a message_start placeholder + each message_delta
    carrying the running total), but the fold SUMMED them, over-reporting metered
    tokens every turn (the credit proxy meters on these); now last-writer-wins.
    (3) An unmodeled content block (redacted_thinking, future server-side types)
    aborted the whole stream — added a Block::Other serde fallback. (4)
    render_transcript truncated long tool-result bodies with &body[..512],
    panicking on a multibyte char at byte 512 — now char-boundary-safe. +12 tests.
  • Gemini streaming: 3 SSE/wire correctness bugs. (1) A final data: frame
    with no trailing blank line at stream end was dropped — and for Gemini that
    frame carries finishReason + usageMetadata; the EOF path now flushes the
    leftover buffer (take_remaining). (2) Text parts stamped thought:false (the
    documented Gemini 3.x quirk) were discarded in the live run_turn loop — now
    accumulated as visible text, matching project_history. (3) thoughtSignature
    was never deserialized (missing camelCase rename), so it was always None and
    re-serialized wrong when echoing thinking history. +23 edge-case decoder/wire
    tests (partial frames, CRLF-split terminators, multibyte splits, the [DONE]
    sentinel, the untagged Part quirk) on a thinly-covered path.
  • Restored transcripts now show Gemini tool results, not just the calls.
    The Gemini backend's project_history emptied its pending-calls buffer when the
    assistant Content was pushed — before the following FunctionResponse
    Content arrived — so projecting wire history into a saved transcript dropped
    every tool result/error. A reloaded conversation showed tool calls with a blank
    (no result/error) pill. Now matched per-name FIFO across the two wire contents,
    lifting the live error convention ({"error": …} → typed error). +4 tests
    incl. old-format TranscriptEntry backward-compat. (Anthropic/local were already
    correct; the replay path already reuses the live tool-call templates.)
  • SDK: closed a ChatResponse cursor lost-wakeup window.
    ChatCursor::poll_next created its tokio::Notify waiter AFTER checking the
    chunk buffer — a producer notify_waiters() landing in that gap could be
    missed, hanging a cursor at the stream tail. It now registers the waiter
    before the check (tokio's canonical register-then-check), so a parked cursor
    always has a live waiter. Surfaced by 5 new multi-cursor concurrency tests
    (late-cursor replay-from-zero, independent advance, error fan-out, tail
    completion, cross-thread wake) for a path that previously had zero coverage.
  • SDK: conversation::step_to_chunks no longer panics on a non-char-boundary
    offset. The terminal-response tail-recovery byte-sliced
    content[text_emitted..]; when a harness split a multibyte UTF-8 char across
    deltas, text_emitted landed mid-char and the slice panicked (a library
    panic crashes the consumer). It now uses str::get, degrading to a no-op on a
    bad boundary, and the doc comment is corrected (it's a BYTE offset, not chars).
    +3 regression tests.
  • Browser app: the system prompt no longer advertises Gemini-only tools to
    Claude agents. generate_image and start_subagent aren't registered on the
    Anthropic backend, but the prompt listed them unconditionally — so every
    Claude-backed agent was told it had tools it couldn't call. Those two bullets
    are now gated on the selected backend (!is_anthropic(model); the model is
    knowable at prompt-build time). Also RUNTIME_SUMMARY no longer claims
    "Gemini-backed" (the platform runs Gemini, Claude, or local Gemma).
  • CLI: clearer empty-key error + doc completeness. An empty
    .localharness.key produced a cryptic wallet-parse error; it now reports the
    file is empty and to recreate it. credits/topup added to the source command
    list (they were only in the runtime help).
  • Credit proxy: CORS origin check hardened + clearer first-call 402. The
    localhost allowance used startsWith('http://localhost'), which also matched
    http://localhost.evil.com — an attacker origin could read proxy responses
    cross-origin; it now parses the URL and checks the hostname (localhost /
    127.0.0.1 over http only). Separately, the first-call 402 (no active
    session or credit) was cryptic; it now explains the free-beta auto-session
    and how to get $LH.
  • Browser app: XSS hardening — unescaped error/dynamic strings in innerHTML.
    Four DOM sinks interpolated RPC error strings and on-chain-derived values
    straight into HTML via format! (not maud), so an attacker-controlled RPC node
    could return an error containing <script>/<img onerror> that executes in the
    wallet-bearing origin: the agents-list + explore-grid error paths
    (mod.rs), and the device-signer list + sync-result (events.rs). All now
    build via maud (auto-escaped); a full sweep confirmed every other sink already
    escapes (dom::msg_span, set_status/set_text_content, maud templates).
    Closes the long-open "escape error-string innerHTML" item. Also: an orphaned
    ToolResult (no matching pending call) now logs a warning instead of silently
    dropping (chat.rs).
  • Anthropic backend: malformed streamed tool-args no longer silently run with
    {}. A non-empty partial_json that failed to parse executed the tool with
    an empty object (silent wrong-args); it now surfaces a real tool error
    (is_error ToolResult) the model can retry on. The legitimate no-arg ({})
    and valid-JSON paths are preserved; +3 unit tests.
  • README: stale localharness = "0.20" → "0.24"; documented the anthropic
    and local cargo features (agents copy-paste the quickstart, and the version
    was 4 minors behind).
  • docs.rs: 4 broken intra-doc links resolved (raster Viewport /
    Viewport::full, compose Pending, policy FS_TOOLS) — the module-level
    //! docs used bare paths that didn't resolve; now fully-qualified
    (crate::raster::Viewport) or de-linked for the private FS_TOOLS.
    cargo doc --no-deps is warning-free.
  • Credit proxy: the $LH meter debit is now authoritative (closes burst
    over-serving). The proxy gated a request with a cheap creditOf read then
    fired the meter() debit awaiting only SUBMISSION — so a flurry of concurrent
    requests could all pass the gate and be served while only the first N debits fit
    the balance; the rest reverted on-chain (InsufficientCredits) unnoticed and the
    PLATFORM ate the over-served calls. (User balances were never at risk — the
    contract reverts rather than underflowing.) meterDebit now awaits the RECEIPT
    and a definitive revert returns 402 (unless a session also covers the caller);
    ambiguous RPC/timeout still serves, to avoid double-charging on retry.
  • Per-call $LH billing now actually decrements. Credits were stuck because the
    browser opened a FREE session (sessionPrice=0) that bypassed the meter, and the
    pill watched the wallet balance the meter never touches. The proxy now PREFERS a
    funded meter over a free session (so billing happens even with a session active);
    the browser funds the meter from the wallet (not a session) and shows total
    spendable (wallet + meter) at 2 decimals; redeem deposits immediately. Verified
    live: meter 100.00 → 99.97 LH across 3 metered calls.
  • rustlite: hex integer literals (0xFF0000). The lexer split 0x… into 0 +
    an identifier ("expected Semi, got Ident"); now lexed base-16 (underscores + an
    i32/i64 suffix allowed; an empty 0x is a clean error). Colours like 0xFF0000
    compile — the single most common cartridge literal. (On-chain feedback #15/#16.)
  • rustlite: compound assignment (x += 1, -=, *=, /=, %=). The lexer
    split += into + then =, so these threw the confusing "expected expression,
    got Eq" — the TRUE source of that feedback (if-exprs always compiled). Now lexed
    as compound-assign tokens and desugared place OP= v → place = place OP v
    (operand order preserved for the non-commutative ops). Found by the test-user
    dogfood pass; filed + fixed in the same loop.
  • rustlite: break/continue inside an if or match arm hung the cartridge.
    Codegen hardcoded the branch depth (br 1 / br 0), ignoring the enclosing
    conditional frames — so while c { if x { break } } branched to the loop instead
    of out of it and spun forever (any guarded break/continue, the common case).
    Now a per-function extra_depth counter tracks open if/match frames between the
    break/continue and its loop, so the branch reaches the right target. Runtime-proven
    via the render harness (the hanging cases now terminate). A SEVERE pre-existing bug
    the test-user dogfood pass surfaced (it's what made for-loops hang at first).
  • rustlite: char literals ('A'). The lexer hit unexpected byte 0x27 on a
    '; now lexed to the byte value as an IntLit (chars are i32 glyph codes for
    draw_char) with string-style escapes; empty/multi-byte literals are clear errors.
    (On-chain feedback #15.)
  • rustlite: block comments (/* … */, nesting allowed). Only // was skipped,
    so a /* lexed its leading / as division → "expected expression, got Slash".
    Now skipped as trivia like line comments. (LLM-authored source emits these
    constantly — ties into the #19/#20 first-shot-compile pain.)
  • rustlite: top-level consts resolve. const W: i32 = 256; parsed + typechecked
    but a function referencing W errored "undefined variable" — consts were never put
    in scope. Now processed before functions (order-independent) and INLINED at each use
    (a clone of the typed value → no runtime global, no codegen change); consts may
    reference earlier consts. Runtime-verified (a const loop bound iterates the right N).
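
Two of the fixes above (the &body[..512] truncation panic and the content[text_emitted..] tail slice) share one cause: slicing a str at a byte offset that is not a char boundary. A minimal sketch of the two safe patterns follows. The helper names are hypothetical and this is not the SDK's actual code.

```rust
/// Truncate `s` to at most `max_bytes` bytes without splitting a UTF-8 char.
/// If `max_bytes` lands mid-char, back up to the previous boundary.
fn truncate_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Offset 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Return the part of `content` after the BYTE offset `emitted`.
/// `str::get` yields None for an out-of-range or mid-char offset, which
/// degrades to an empty tail here instead of panicking.
fn tail_from(content: &str, emitted: usize) -> &str {
    content.get(emitted..).unwrap_or("")
}
```

For example, with the string "aé" (3 bytes: 'a' is one byte, 'é' is two), truncating to 2 bytes gives "a", and asking for the tail from offset 2 gives an empty string rather than a panic.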

Added

  • rustlite: arrays (literals + indexed reads): let pal = [0xFF0000, 0x00FF00];
    and pal[i] (variable index). The single biggest missing feature: lookup tables,
    palettes, sine/tile data. Stored in a static linear-memory region (re-initialised
    when the literal evaluates), value = base pointer; arr[i] lowers to
    i32.load(base + i*4). v1 is i32 elements, read-only (mutation arr[i] = v is a
    clean "invalid assignment target" error, deferred). New ResolvedType::Array,
    ExprKind::ArrayLit/Index. Runtime-verified ([3,5,7][1]→5, loop-lookup over a
    table). The first piece of the linear-memory model that full tuples will share.
  • rustlite: bitwise + shift operators: & | ^ << >> (i32 + i64), with
    Rust precedence (| < ^ < & < <</>> < +). Previously << lexed as two
    <, and &/| were rejected as "no references/closures" — so color packing
    (r<<16)|(g<<8)|b and masks & 0xFF were impossible, the most common cartridge
    idiom. Lexer/AST/parser/typecheck (integer-only)/codegen all wired; runtime-verified
    (values + precedence) via the render harness. Found by the test-user dogfood pass.
  • rustlite: as numeric casts: t as f64, (a * 10.0) as i32, i32↔i64, etc.
    Previously as lexed as a bare identifier → "expected Semi, got Ident". Now an As
    keyword + ExprKind::Cast with Rust precedence (tighter than * / %, looser than
    unary); the codegen emits the right convert/trunc/extend/wrap/promote/demote opcode
    per (from,to). The graphics staple — float math, then cast to a pixel coord.
    Runtime-verified (3.7 as i32→3, (1.5*4.0) as i32→6).
  • rustlite: match range patterns: 0..=5 => … (inclusive) and 0..5 => …
    (exclusive). Previously the .. in an arm hit "expected FatArrow, got DotDot". Now
    a ..= (DotDotEq) token + an IntRange pattern lowered to
    scrutinee >= lo & scrutinee <(=) hi. Runtime-verified (in-range vs
    out-of-range select the right arm).
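
For reference, the idioms these rustlite additions unlock look like this in ordinary Rust. This is a sketch in full Rust, not rustlite source, and the function names are hypothetical. One known difference is that Rust indexes arrays with usize, hence the explicit cast below.

```rust
/// Pack three 0..=255 channels into one 0xRRGGBB colour value.
fn pack_rgb(r: i32, g: i32, b: i32) -> i32 {
    ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)
}

/// Extract the red channel back out with a shift and a mask.
fn red_of(color: i32) -> i32 {
    (color >> 16) & 0xFF
}

/// Look up a palette entry by a variable index.
/// An out-of-range index panics in Rust.
fn palette_lookup(i: i32) -> i32 {
    let pal = [0xFF0000, 0x00FF00];
    pal[i as usize]
}

/// Float math, then a cast to an integer pixel coordinate.
/// `as` truncates toward zero.
fn to_pixel(t: f64) -> i32 {
    (t * 4.0) as i32
}

/// Classify a value with inclusive range patterns.
fn bucket(v: i32) -> i32 {
    match v {
        0..=5 => 0,
        6..=9 => 1,
        _ => 2,
    }
}
```

With these definitions, pack_rgb(255, 0, 0) is 0xFF0000, to_pixel(1.5) is 6, and bucket(5) selects the first arm while bucket(6) selects the second.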