fix(install): native-app installs read runtimeType from registry manifest #434

Closed
samxu01 wants to merge 17 commits into main from smoke/ui-walkthrough-2026-05-23
samxu01 commented May 23, 2026

Summary

  • AgentRegistry row for each native first-party app now carries manifest.runtime = { type: 'native', runtimeType: 'native' } (the demo-pod seed already wrote this onto AgentInstallation; this lifts the declaration up to the registry where every install path can see it).
  • /api/registry/install falls back to the registry manifest's runtime.runtimeType (or runtime.type) when the caller didn't supply one — symmetric with the demo-pod seed.

Why

Surfaced during the 2026-05-23 local UI smoke walkthrough. Installing pod-welcomer / task-clerk / pod-summarizer via /v2/agents/browse produced an AgentInstallation with config.runtime = {}. agentEventService.enqueue checks installation.config.runtime.runtimeType to decide whether to run the agent in-process via nativeRuntimeService.runAgent; with no value set, every chat.mention routed to the external pending queue (which has no listener for native apps), and the agent never replied.

The demo-pod seed in seed-native-agents.ts already wrote runtimeType: 'native' explicitly on the demo install, so the cluster's Team Orchestration Demo pod was unaffected. Every other install target — fresh local stack, freshly created pod via UI in dev — fell off the path silently.
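
The routing failure described above can be sketched roughly as follows. This is an illustration, not the actual agentEventService code: the `Installation` type, the `chooseDispatch` helper, and the `'native-in-process'` / `'external-queue'` labels are hypothetical names introduced here. Only the `config.runtime.runtimeType` field path and the `'native'` value come from the description.

```typescript
// Hypothetical shape; only config.runtime.runtimeType is taken from the PR text.
interface Installation {
  agentName: string;
  config: { runtime?: { runtimeType?: string } };
}

type Dispatch = 'native-in-process' | 'external-queue';

// Assumed decision rule: a native runtimeType runs in-process; anything else,
// including a missing value, falls through to the external pending queue.
function chooseDispatch(installation: Installation): Dispatch {
  const runtimeType = installation.config.runtime?.runtimeType;
  return runtimeType === 'native' ? 'native-in-process' : 'external-queue';
}

// An install that landed with config.runtime = {} (the bug) ...
const brokenInstall: Installation = { agentName: 'pod-welcomer', config: { runtime: {} } };
// ... versus one that carries the runtimeType (after the fix).
const fixedInstall: Installation = {
  agentName: 'pod-welcomer',
  config: { runtime: { runtimeType: 'native' } },
};

console.log(chooseDispatch(brokenInstall)); // external-queue
console.log(chooseDispatch(fixedInstall)); // native-in-process
```

With an empty runtime object the mention silently takes the queue branch, which matches the "agent never replied" symptom.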

Verification (local)

Fresh ./dev.sh up stack with LITELLM_BASE_URL + LITELLM_API_KEY wired to the dev cluster's LiteLLM (port-forwarded virtual key with $2 / 24h budget cap):

  1. Install pod-welcomer via /v2/agents/browse → AgentInstallation.config.runtime.runtimeType === 'native'
  2. @pod-welcomer ping in the host pod → backend logs [agent-event] enqueued ... status=delivered (delivered = native dispatch path), then [agent-message] posted agent=pod-welcomer
  3. Chat surface shows the agent reply within ~5s ✅

Risk

Very narrow:

  • The fallback only fires when both (a) caller didn't supply runtime.runtimeType AND (b) the manifest declares one. No existing installs change.
  • Seed manifest change is additive ($set), idempotent, and the demo-pod install path's explicit runtimeType: 'native' is unaffected.

Test plan

  • CI green
  • Backend unit tests (no test changes; behavior is purely a default that fires when an existing field is missing)
  • Manual dev verify: install pod-welcomer into any non-demo pod, send @pod-welcomer ping, confirm reply within ~10s

🤖 Generated with Claude Code

samxu01 and others added 2 commits May 23, 2026 05:04
…fest

Without this, installing a first-party native app (pod-welcomer,
task-clerk, pod-summarizer) via /v2/agents/browse landed with
config.runtime={} on the AgentInstallation. agentEventService.enqueue
checks installation.config.runtime.runtimeType to decide whether to run
the agent in-process via nativeRuntimeService.runAgent; with no value
set, every chat.mention routed to the external pending queue, which has
no listener for native apps. Result: the agent never replied.

The demo-pod seed in seed-native-agents.ts already wrote
config.runtime.runtimeType='native' explicitly on its installations, so
the cluster's demo pod (Team Orchestration Demo) was unaffected. Every
other install target — fresh local stack, freshly created pod via UI —
fell off the path.

This change:
1. Adds manifest.runtime={type:'native', runtimeType:'native'} to the
   AgentRegistry rows the native seed writes, so the declaration lives
   once on the registry side.
2. In the /api/registry/install handler, when the caller didn't supply
   a runtime, fall back to whatever the registry manifest declares.
   This is symmetric with the demo-pod seed and unblocks every other
   install path (UI, programmatic) without making them aware of which
   agents are native.

Verified end-to-end on a fresh local stack: install pod-welcomer via
/v2/agents/browse → AgentInstallation.config.runtime.runtimeType ==
'native' → @-mention enqueues event with status='delivered' → native
runtime runs in-process via LiteLLM → reply posts back to chat.

Surfaced during the 2026-05-23 local UI smoke walkthrough; until this
the local stack could not exercise the agent reply path at all.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six audit notes capturing the local UI smoke + v2 gap analysis
performed on 2026-05-23 against a fresh `./dev.sh up` stack with
the dev cluster's LiteLLM port-forwarded in:

- FINDINGS.md — top-level capsule + 8 findings table + next-sprint
  recommendation. P0 install bug already shipped as the prior commit
  on this branch (PR #434).
- walkthrough-2026-05-23.md — beat-by-beat record of the V2 shell
  walk (landing → login → /v2 → pods → chat → agents → install →
  agent-room → settings → marketplace).
- marketplace-v2-gaps.md — endpoint map showing /v2/marketplace calls
  the legacy /api/apps/marketplace* shadows instead of the 9 shipped
  /api/marketplace/* routes. 2-3 PR redesign plan.
- settings-v2-gaps.md — surface inventory + minimal v2 Settings hub
  proposal (Account security + My Pods member mgmt + Admin Console).
- landing-v2-proposal.md — v2 landing design proposal grounded in
  the commonly-design skill: light surface, single accent, GitHub-star
  CTA, ~700 LOC / 8 files, no new tokens required.
- local-agent-runtimes-verified.md — recipes for the two adapter
  paths exercised end-to-end on local: native (in-process via
  LiteLLM) and CLI-wrapper polling (stub adapter).

These are reference artifacts for the next phase of v2 work, not
runtime code. Kept under docs/audits/ per the Knowledge-Base
Discipline section in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

samxu01 commented May 23, 2026

Companion audit docs landed on this branch as a separate commit (docs/audits/ui-smoke-2026-05-23/). They capture the full walkthrough + 3 subagent gap reports (marketplace, settings, landing-v2 proposal) + recipes for the two runtime adapter paths verified locally.

If the docs are out of scope for this fix-PR, easy to drop with git rm -r docs/audits/ui-smoke-2026-05-23/ and force-push; they're additive.

samxu01 and others added 15 commits May 23, 2026 05:38
… bootstrap)

Extends the 2026-05-23 UI smoke audit with the missing agent-test
matrix the stop-hook flagged. Now verified end-to-end on a fresh
`./dev.sh up` local stack against the dev cluster's LiteLLM:

  ✅ Path 1 — Native runtime (3 first-party apps all reply via
     LiteLLM): pod-welcomer, task-clerk, pod-summarizer.
  ✅ Path 2 — CLI-wrapper stub adapter (echo).
  ✅ Path 3 — CLI-wrapper real codex CLI 0.133.0 in tmux, talking to
     LiteLLM via ~/.codex/config.toml (sidesteps the cluster-IP-bound
     OAuth gotcha — no `codex login --device-auth` on the laptop).
  ⚠️ Path 4 — OpenClaw moltbot via clawdbot-gateway local: full
     infrastructure verified (token chain + Docker build with
     fallback Dockerfile + gateway running in tmux + agent connected
     via WebSocket + chat.mention events delivered) but the LLM
     call sub-step is blocked on an openclaw auth-profile schema
     mystery in the minified fork build. Three attempted shapes
     documented in the audit; a reverse-engineering pass on
     `auth-profiles-5CHn7vq1.js` is the next step.

Compose change: surface OPENAI_API_KEY / OPENAI_BASE_URL /
OPENROUTER_API_KEY / OPENROUTER_BASE_URL to the clawdbot-gateway
container. The clawdbot block previously only passed GEMINI + ANTHROPIC
keys, which means an operator wanting to point local clawdbot at any
non-Anthropic / non-Google LLM (incl. LiteLLM proxying anything) had
to add these by hand. This makes the local clawdbot env match the
backend env shape.

Submodule blocker noted: _external/clawdbot/commonly-bundled-skills/
must exist locally for the open-source Dockerfile to satisfy a COPY,
even when empty. The fork ships the directory; the parent repo can't
track it (cross-submodule). Documented in the audit as a one-line
mkdir + touch before `./dev.sh clawdbot up`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…browse

Two narrow v2 fixes surfaced by the 2026-05-23 smoke walkthrough.

1. Chip auto-send (V2PodChat.tsx).
   First-message coaching in an agent-room renders 3 suggestion chips
   under "Say hi to <agent>". The chips were `onClick={() => setDraft(s)}`
   — they filled the composer but the user still had to press Enter or
   click Send. UX-wise the chips read as "send this for me" affordances;
   making the user take a second action is friction on the 60-second
   hero path (install agent → talk to it).

   Fix: `handleSend` grows an optional override-text param and the chip
   onClick invokes `handleSend(s)` directly. The override skips the
   `setDraft('')` clear so the composer state stays whatever the user
   was typing before the chip click (if anything).

2. v2 Marketplace endpoint rewire (AppsMarketplacePage.tsx).
   The component is mounted at /v2/marketplace via the v2 nav rail but
   was calling /api/apps/marketplace + /api/apps/marketplace/featured —
   legacy shadows that return 200 OK with empty bodies. The actual
   marketplace endpoint family (PR #215 + #230) lives at
   /api/marketplace/* and was never being called. Result: every v2
   user saw "Discover (0)" / "Installed (0)" no matter what state
   the backend was in.

   Fix: fetchMarketplace now calls /api/marketplace/browse with the
   shipped param shape (q / category / kind / sort / page / limit) and
   maps the returned Installable docs to the loose App shape AppCard
   consumes (id from _id, name passthrough, displayName from
   marketplace.displayName fallback). Featured shelf isn't shipped on
   the new endpoint family yet — surface the first 4 of browse as a
   stand-in row.

   Verified locally: /v2/marketplace now produces a 200 to
   /api/marketplace/browse (vs 200-with-empty on the legacy route) and
   no console errors. Local Installable collection is empty so the
   visible state hasn't changed, but the endpoint wiring is now
   correct — once deployed, v2 marketplace will surface every
   published Installable in the backend.

This is the smallest set that closes the two P0/P1 endpoint-mismatch
findings from `docs/audits/ui-smoke-2026-05-23/FINDINGS.md`. Detail
page / publish form / fork button / token-alignment redesign stay as
next-sprint items per the subagent recommendation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records the loop-tick advances on the 2026-05-23 smoke sprint:

- Agent-DM §3.7 fan-out verified: cuz-local's runtime token →
  POST /api/agents/runtime/agent-dm {target:{agentName:'pod-welcomer'}}
  → backend creates the Cuz Local ↔ Pod Welcomer agent-dm pod (2-member
  guard holds), and smoke-admin (shares Smoke Test Pod with both
  agents) can GET it via the PR #381 §3.7 carve-out.
- Runtime matrix updated: 4 paths exercised, 3 fully green + clawdbot
  blocked only at the LLM-auth sub-step.

Companion to the chip + marketplace fixes shipped earlier in this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the wrapper-adapter test matrix gap: claude (Claude Code 2.1.150)
attached via `commonly agent attach claude --pod <id> --name local-claude
--instance local` + run in tmux. @-mention round-trip confirmed:
"REAL_CLAUDE_OK" reply posted within ~10s, event acked.

Three commonly-cli wrappers now exercised side-by-side in tmux on the
local stack — stub + codex (0.133.0) + claude (2.1.150) — all sharing
the same poll/spawn/post-back code path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the explicit sprint structure for the huddle session running now
on `app-dev.commonly.me` pod `6a123d49221cc3cce97d9bd1`. Phase 1
(huddle setup) executing as I commit this; Phases 2-4 are the
backlog the huddle will claim/counter/decompose.

Phase 1 (this session):
  - dev admin JWT minted, huddle pod created
  - theo + nova + cody installed into huddle
  - local Claude Code attached to dev via `commonly agent attach
    claude --instance dev` + `commonly agent run --interval 3000`
    running in tmux window agents:claude-dev
  - seed message dropped with PR #434 context + Phase-2 backlog
  - Playwright as human observer

Phase 2 (huddle to claim):
  A. COMMONLY_LOCAL_CLAWDBOT=1 env opt-in (default off)
  B. compose default Dockerfile (OSS) not Dockerfile.commonly
  C. commonly-bundled-skills/.gitkeep upstream to openclaw fork
  D. commonly dev clawdbot CLI subcommand bundling the bootstrap
  E. local-credentials.md runbook
  F. openclaw auth-profile schema rev-eng or upstream CLI

Phase 3 (platform follow-up):
  Heartbeat for CLI wrappers — three design options, Claude to draft.

Phase 4:
  Log what agents reach for that doesn't exist → GH issues.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First snapshot from the 15-min monitor cron on the PR #434 huddle pod
(app-dev 6a123d49221cc3cce97d9bd1). 15 messages, all 4 agents posted,
no branch commits yet but solid design progress.

Five affordance gaps surfaced in agent behavior (Phase-4 material):

1. No commonly_pr_diff tool — Cody had to refuse review without it.
2. Agents bluff attachments — backend caught it (good guard rail).
3. Intro templates are verbose / could be ephemeral.
4. No commonly_create_task from chat — Theo offered to do it manually.
5. Cross-agent role handoff is ad-hoc @mention vs structured.

Per-agent state captured; Claude leading on Phase 3 (backend-emitted
heartbeat events, not CLI cron) with a concrete proposed CLI surface
`commonly agent heartbeat add --pod $POD --agent codex --cron "..."`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cron tick 2 captured a substantive review pass from Cody on PR #434
+ Claude's local MCP-tool-loading gap (Phase-4 finding #6). Cody's
findings are valid:

  P1 install.ts runtimeType fallback overreads — manifest.runtime.type
     for marketplace rows carries deployment shapes (standalone/
     commonly-hosted/hybrid) not canonical runtime identities; needs
     narrowing to manifest.runtime.runtimeType OR a shape→identity
     translator.
  P1 v2 marketplace Discover wired to /api/marketplace/browse
     (Installable schema) but install/remove still POST /api/apps/
     pods/:podId/apps (legacy App schema). Install clicks will fail.
  P2 AppCard fields lost in the Discover→App shim — kind, category,
     marketplace.totalInstalls, marketplace.logoUrl, etc.
  Test gap: AppsMarketplacePage.test.tsx still mocks old route, no
     regression coverage on /api/marketplace/browse.

Cody also drafted concrete Phase 2 shape for clawdbot bootstrap
(A+B+C bundle + COMMONLY_LOCAL_CLAWDBOT=1 + new `commonly dev
clawdbot` command) + noted backend schedulerService.ts ALREADY emits
heartbeat events — gap is the CLI wrapper dropping them. Lower-risk
than building cron from scratch.

I posted a huddle acknowledgment delegating fixes: Nova to draft the
install.ts narrowing fix, Cody to fully rewire the marketplace surface
(install/remove + AppCard mapping + test update), Theo to convert into
board tasks. Two-PR delivery (revisions on #434 + Phase 2 standalone)
is fine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cron tick 3:

- Cody pushed 6839eea (fix(v2): rewire marketplace installs through
  registry, +151 -42) addressing 2 of his own 3 review findings on
  PR #434. Added toMarketplaceApp / toInstalledRegistryApp /
  toInstalledLegacyApp helpers + installBackend discriminator routing
  installs to /api/registry/install for marketplace items and keeping
  legacy apps on /api/apps. Test file updated.
- Theo created board tasks but landed on pre-existing TASK-055/056/057
  (codex retirement) due to title-prefix collision — Phase-4 finding
  #7 logged: commonly_create_task seems to fuzzy-match and refuses
  duplicate creation, no force-create / disambiguation.
- Nova still quiet 25 min after the install.ts narrowing ask —
  posting a more specific spec + regression-test outline in the pod.
- Branch HEAD now 6839eea (Cody's commit) — fast-forwarded into the
  worktree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-codex

Cron tick 4:

- Nova responded 5min after the nudge but punted the fix to
  sam-local-codex via "next heartbeat" delegation rather than
  executing herself. Phase-4 finding #8 added: agents reflexively
  delegate even when (a) the spec is concrete, (b) the diff is small
  (~10 lines), (c) they have the capability (gpt-5.4-mini + GH PAT +
  repo access). Possible Commonly responses logged.
- No new branch commits (HEAD still 6839eea).
- Posted push-back to Nova asking her to (a) clarify the delegation
  routing (task ID? Codex Hub @-mention?) and (b) try executing
  directly first; delegate only with concrete reasoning if the work
  is actually outside her scope.

The delegation pattern is a real signal worth tracking for whether
our heartbeat prompts encourage "do" vs "queue for someone else"
behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…orrected

Nova's reply to the push-back doubled down on the wait-for-board-task
posture, describing a 2-hour-latency triple-hop chain (human → task →
Nova claim heartbeat → DM sam-local-codex → sam-local-codex heartbeat
→ fix lands). Sam (human) overrode this in-pod with the canonical
collaboration principle:

  Agents should either self-execute, or collaborate horizontally via
  @-mention in the pod or 1:1 DM. Cross-instance heartbeat handoffs
  are fine for production pipelines, wrong for collaborative huddles
  where peers are right there. The chat.mention IS the assignment.

Memorialized in commonly-skills as
`feedback-agents-collab-execute-not-handoff.md` — a USER PRINCIPLE
worth tracking across sessions. The HEARTBEAT.md prompts for dev
moltbots (nova/theo/pixel/ops/aria) currently shape a passive
"wait for orchestrator" posture that's wrong for collaborative pods.

Likely Commonly improvements:
  - HEARTBEAT.md tweak per agent or per pod type
  - Inline cue on chat.mention.payload.content for collaborative
    huddles ("Spec is concrete; if you have the tools, execute and
    push to <branch>. Delegate only when work exceeds scope.")
  - Delegation-rate metric per agent

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cron tick 6:

- Claude shipped a complete ADR-2.F design (heartbeat-for-CLI-wrappers)
  with 8 decisions resolved: backend-emitted events on pod inbox
  stream, heartbeats table schema, dedup key (schedule_id, fire_at),
  cron(1) backpressure semantics, COMMONLY_LOCAL_SCHEDULER=1 env gate
  composing with 2.A's COMMONLY_LOCAL_CLAWDBOT under a future
  COMMONLY_LOCAL_FULL_STACK umbrella, pod-member auth + --system flag,
  v1 system-actor only (unblocks from 2.E), 30s tick interval. Plus
  frozen v1 CLI surface and wrapper-side handler pseudocode.
- Theo + Nova both peer-reviewed in 60s with concrete feedback. Nova
  confirmed she's executing the install.ts fix and will pick up
  ADR-2.F implementation after.
- No new agent commits on the branch (still c97608a). PR pipeline
  building but not complete:
    * #434 revisions (marketplace): SHIPPED (Cody 6839eea)
    * install.ts narrowing: IN FLIGHT (Nova)
    * ADR-2.F implementation: DRAFTABLE (anyone)
    * Phase 2.A/B/C/D clawdbot bundle: OUTLINED, no code yet
    * Phase 2.E credentials runbook: UNCLAIMED
- No nudges. No new Phase-4 findings.
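
A minimal sketch of how the (schedule_id, fire_at) dedup key from the ADR summary above might behave. The `HeartbeatDeduper` class and its in-memory `Set` are assumptions made for illustration; the ADR itself describes a heartbeats table, whose schema is not reproduced here.

```typescript
// Hypothetical in-memory stand-in for the dedup check; the ADR stores
// heartbeats in a table, which this sketch does not model.
class HeartbeatDeduper {
  private readonly seen = new Set<string>();

  // Returns true the first time a (scheduleId, fireAt) pair is offered,
  // false for every repeat of the same pair.
  shouldDeliver(scheduleId: string, fireAt: Date): boolean {
    const key = `${scheduleId}:${fireAt.toISOString()}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

const deduper = new HeartbeatDeduper();
const fireAt = new Date('2026-05-23T12:00:00Z');
console.log(deduper.shouldDeliver('sched-1', fireAt)); // true
console.log(deduper.shouldDeliver('sched-1', fireAt)); // false
```

Keying on the scheduled fire time rather than the delivery time means a retried or doubly-emitted tick collapses onto the same key.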

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cron tick 7 (closing):

Cody shipped 807b539 (install.ts narrowing fix + 194-line regression
test) — and the diff is better than spec, adding a safety guard that
rejects deployment-shape values even if they sneak into the
runtimeType field. Theo cleared PR #434 review in 60s. Cody
explicitly signaled the phase transition.

Phase-4 finding #9 added: "Claim-the-orphan" pattern. When Nova
claimed but didn't ship for ~30 min, Cody picked it up directly
without bickering or escalation. This composes with #8
(execute-don't-handoff) — the principle isn't just self-execute
when assigned, it's also self-execute when a peer stalled.

Memorialized in commonly-skills as
feedback-claim-the-orphan-stalled-peer-work.md and indexed in MEMORY.md.

Final PR-pipeline state:
  - PR #434 revisions: SHIPPED (Cody x2)
  - ADR-2.F (Phase 3 heartbeat): PR-DRAFTABLE (Claude design)
  - Phase 2.A/B/C/D clawdbot bundle: outlined, unclaimed
  - Phase 2.E credentials runbook: unclaimed

Stop condition met: 3 of 5 items shipped or draftable; remaining 2
are natural next-sprint scope. Cron 07263397 cancelled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
samxu01 added a commit that referenced this pull request May 24, 2026
…+ chip autosend + observability

Wraps the 2026-05-23 sprint: a v2 UI walkthrough on local stack +
a dev-instance agent-collab huddle that produced multi-author code
on a single branch. PR #434.

Code changes
============

backend/routes/registry/install.ts
  - Native first-party app installs (pod-welcomer / task-clerk /
    pod-summarizer) now fall back to manifest.runtime.runtimeType
    when caller didn't supply a runtime. Demo-pod seed already
    wrote this explicitly; lifting the declaration up to the
    AgentRegistry manifest lets every install path (UI / API /
    programmatic) project it onto the AgentInstallation correctly.
  - Narrows the fallback to runtime.runtimeType ONLY, with a safety
    guard rejecting deployment-shape values (standalone /
    commonly-hosted / hybrid). manifest.runtime.type is registry-
    level deployment shape, not the install row's canonical driver
    identity.
  - 194-line regression test in backend/__tests__/unit/routes/
    registry.install-runtime-type.test.js locks both directions.
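
A rough sketch of the narrowed fallback described above. The `resolveRuntimeType` helper and the `RuntimeDecl` type are hypothetical names, and applying the guard only to the manifest value (not to caller-supplied input) is an assumption; the deployment-shape values are the three named in this message.

```typescript
// Deployment-shape values named in the commit message; these must never be
// projected onto an install row as its runtimeType.
const DEPLOYMENT_SHAPES = new Set(['standalone', 'commonly-hosted', 'hybrid']);

// Hypothetical shape for both the caller's runtime and manifest.runtime.
interface RuntimeDecl {
  type?: string;
  runtimeType?: string;
}

// Assumed rule: a caller-supplied runtimeType wins; otherwise read
// manifest.runtime.runtimeType only (never manifest.runtime.type), and drop
// it if it turns out to be a deployment shape.
function resolveRuntimeType(
  callerRuntime: RuntimeDecl | undefined,
  manifestRuntime: RuntimeDecl | undefined,
): string | undefined {
  if (callerRuntime?.runtimeType) return callerRuntime.runtimeType;
  const candidate = manifestRuntime?.runtimeType;
  if (!candidate || DEPLOYMENT_SHAPES.has(candidate)) return undefined;
  return candidate;
}

// Native first-party app: the manifest declares runtimeType, the caller sent nothing.
console.log(resolveRuntimeType(undefined, { type: 'native', runtimeType: 'native' })); // native
// Marketplace row: only a deployment shape in runtime.type, so no fallback applies.
console.log(resolveRuntimeType(undefined, { type: 'standalone' })); // undefined
```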

backend/scripts/seed-native-agents.ts
  - Registry seed now writes manifest.runtime = {type, runtimeType}
    so the install handler can read it back.

frontend/src/v2/components/V2PodChat.tsx
  - Chip-click in agent-room empty state now auto-sends instead of
    just filling the composer. handleSend grows an optional override
    text param; chips pass the suggestion through.
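
The override-text behavior can be modeled outside React roughly like this. The `Composer` class is a hypothetical, framework-free stand-in for the component state in V2PodChat.tsx; only the optional override parameter on `handleSend` and the draft-preserving behavior come from this PR.

```typescript
// Framework-free model of the composer; the real component is a React
// component with a draft state variable.
class Composer {
  draft = '';
  readonly sent: string[] = [];

  // overrideText carries a chip's suggestion. When it is present the draft
  // is left untouched, so whatever the user was typing survives the click.
  handleSend(overrideText?: string): void {
    const text = (overrideText ?? this.draft).trim();
    if (!text) return;
    this.sent.push(text);
    if (overrideText === undefined) this.draft = ''; // clear only on a normal send
  }
}

const composer = new Composer();
composer.draft = 'half-typed question';
composer.handleSend('Say hi'); // chip click: sends the suggestion directly
console.log(composer.sent); // [ 'Say hi' ]
console.log(composer.draft); // half-typed question
```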

frontend/src/components/apps/AppsMarketplacePage.tsx
  - Discover rewired from the dead /api/apps/marketplace* shadows
    to the shipped /api/marketplace/* family (PR #215/#230 backend).
  - Install/remove/installed-state branches per origin schema via
    new installBackend discriminator: marketplace items go through
    /api/registry/install with agentName=<installableId>; legacy
    apps stay on /api/apps/pods/:podId/apps. Installed-state merges
    both sources.
  - Full Installable→App shim mapping: kind, marketplace.category,
    stats.totalInstalls, marketplace.rating, marketplace.logoUrl,
    requires. Test file updated; both endpoint families mocked.
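
A hedged sketch of the Installable→App shim and the installBackend routing. The helper name `toMarketplaceApp` appears earlier in this PR, but the body below is a guess at its shape, not the shipped code. The `Installable` and `AppCardModel` types, the `'registry'` / `'legacy'` discriminator values, the `installEndpoint` helper, and the fallback of displayName to name are assumptions; the field names and the two endpoint paths are taken from the text above.

```typescript
// Hypothetical, loosely typed shapes; field names follow the commit message.
interface Installable {
  _id: string;
  name: string;
  kind?: string;
  requires?: string[];
  marketplace?: { displayName?: string; category?: string; rating?: number; logoUrl?: string };
  stats?: { totalInstalls?: number };
}

interface AppCardModel {
  id: string;
  name: string;
  displayName: string;
  kind?: string;
  category?: string;
  totalInstalls: number;
  rating?: number;
  logoUrl?: string;
  requires: string[];
  installBackend: 'registry' | 'legacy'; // assumed discriminator values
}

// Map a marketplace Installable doc onto the shape the card consumes.
function toMarketplaceApp(doc: Installable): AppCardModel {
  return {
    id: doc._id,
    name: doc.name,
    displayName: doc.marketplace?.displayName ?? doc.name,
    kind: doc.kind,
    category: doc.marketplace?.category,
    totalInstalls: doc.stats?.totalInstalls ?? 0,
    rating: doc.marketplace?.rating,
    logoUrl: doc.marketplace?.logoUrl,
    requires: doc.requires ?? [],
    installBackend: 'registry',
  };
}

// Route the install call by origin schema.
function installEndpoint(app: AppCardModel, podId: string): string {
  return app.installBackend === 'registry'
    ? '/api/registry/install'
    : `/api/apps/pods/${podId}/apps`;
}
```

Carrying the discriminator on the mapped object keeps the routing decision next to the data it depends on, so the card itself does not need to know which schema an item came from.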

docker-compose.dev.yml
  - clawdbot-gateway service now passes OPENAI_API_KEY,
    OPENAI_BASE_URL, OPENROUTER_API_KEY, OPENROUTER_BASE_URL
    through to the container. Local operators wanting to point
    clawdbot at any non-Anthropic/Google LLM (incl. LiteLLM
    proxying) no longer need to edit compose by hand.

Audit + plan docs
=================

docs/audits/ui-smoke-2026-05-23/
  - FINDINGS.md — top-level capsule + next-sprint recommendations
  - walkthrough-2026-05-23.md — beat-by-beat V2 shell record
  - marketplace-v2-gaps.md — endpoint map + redesign plan
  - settings-v2-gaps.md — surface inventory + minimal v2 hub
  - landing-v2-proposal.md — v2 landing design proposal
  - local-agent-runtimes-verified.md — 4-runtime adapter recipes
    (native + stub + real codex + real claude in tmux; clawdbot
    infra up but LLM auth schema rev-eng pending)
  - huddle-observations.md — 9 Phase-4 affordance findings from
    97-min agent-collab session including the execute-don't-handoff
    principle correction (Sam → Nova → all 3 agents pivoted)

docs/plans/sprint-2026-05-23-local-dev-and-agent-collab.md
  - Phase 1 (huddle setup): shipped this PR
  - Phase 2 (local-dev parity): A/B/C/D clawdbot bundle + E
    credentials runbook + F heartbeat — next sprint
  - Phase 3 (Commonly platform improvements): inline-cue for
    execute-not-handoff principle + ADR-2.F implementation

Co-authored real engineering
===========================

The dev huddle (theo + nova + cody + claude-sam-local) on
app-dev pod 6a123d49221cc3cce97d9bd1 produced two of the code
commits here (Cody) plus a complete ADR-2.F design for
heartbeat-for-CLI-wrappers (Claude). Sam's execute-don't-handoff
principle was tested live and landed cleanly. Two new memory
entries captured the prescriptive patterns
(feedback-agents-collab-execute-not-handoff,
feedback-claim-the-orphan-stalled-peer-work).

Co-Authored-By: Cody <cody@commonly.me>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

samxu01 commented May 24, 2026

Merged via local squash to main (per the repo convention that preserves multi-author trailers). The final commit on main carries all 8 co-authored changes from the smoke/ui-walkthrough-2026-05-23 branch. Closing PR + deleting branch.

samxu01 closed this May 24, 2026
samxu01 deleted the smoke/ui-walkthrough-2026-05-23 branch May 24, 2026 06:02
samxu01 added a commit that referenced this pull request May 24, 2026
…e 2 ~80%

Cron R4 epilogue (stop condition firmly hit):

Cody resolved the Theo+Nova claim collision by shipping all of
Phase 2.A + 2.B + 2.D in a single comprehensive commit 3d398ab
(+812 lines, 4 files) while the others were still acknowledging.

What landed in 3d398ab:
- docker-compose.dev.yml: Dockerfile.commonly → Dockerfile default
  on both clawdbot compose services (Phase 2.B, 2-line change)
- dev.sh: +71 lines — read_env_value / is_truthy_env_value helpers
  + COMMONLY_LOCAL_CLAWDBOT=1 gating on the clawdbot profile (2.A)
- cli/src/commands/dev.js: +593 lines — new `commonly dev clawdbot`
  bootstrap subcommand: login → pod → install moltbot → harvest
  runtime token → write external/clawdbot-state/config/moltbot.json
  with controlUi flag → drop OPENCLAW_USER_TOKEN +
  OPENCLAW_RUNTIME_TOKEN into .env (Phase 2.D)
- cli/__tests__/dev.test.mjs: +148 lines NEW regression test file

Cody's session total: 5 code commits across PR #434 + phase-2.
Theo + Nova: 0 code commits between them.

The standout signal of the session: in this multi-agent setup,
cloud-codex (Cody) is the sole implementer; openclaw moltbots add
review + coordination value but can't ship code today. Captured in
Phase-4 #14 (workspace gap) + #6 + #11 + #15 (tool registry gaps).

Final Phase 2 state:
  ✅ 2.E shipped (3fa0565)
  ✅ 2.A+B+D shipped (3d398ab)
  🟡 2.C openclaw bundled-skills upstream — cross-repo, needs Sam
  📐 ADR-2.F implementation — design complete, code natural-next-sprint

Cron 88af22d0 cancelled. Phase 2 ~80% complete.

Total Phase-4 affordance findings: 17. Memory entries written: 2
(feedback-agents-collab-execute-not-handoff,
feedback-claim-the-orphan-stalled-peer-work).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
samxu01 added a commit that referenced this pull request May 24, 2026
…affordance findings

Closes the Phase 2 sprint that ran on the dev-agent huddle pod
(app-dev 6a123d49221cc3cce97d9bd1) following PR #434. Cody (cloud
codex) shipped both feature commits; Theo (openclaw moltbot)
reviewed and approved; Nova + Claude (sam-local) contributed design
and observations.

Code changes
============

dev.sh + docker-compose.dev.yml + cli/src/commands/dev.js + cli/__tests__/dev.test.mjs
  Local-dev parity for clawdbot. Today a contributor wanting to run
  the OpenClaw gateway locally hand-bootstraps six manual hacks
  (mkdir bundled-skills, set CLAWDBOT_DOCKERFILE=Dockerfile, mint
  moltbot token, patch moltbot.json controlUi, pass OPENAI/OPENROUTER
  envs through compose, fight auth-profile schema). This sprint
  collapses those into a single CLI command + an env opt-in.

  Phase 2.A — COMMONLY_LOCAL_CLAWDBOT=1 env opt-in. dev.sh grows
  read_env_value + is_truthy_env_value helpers; the clawdbot compose
  profile is only included when the env is truthy. Default off.

  Phase 2.B — Compose default Dockerfile (OSS) not Dockerfile.commonly
  on the clawdbot-gateway + clawdbot-cli services. The fork ships
  Dockerfile at HEAD; Dockerfile.commonly is a private variant that
  was only available in an unmerged branch.

  Phase 2.D — `commonly dev clawdbot` CLI subcommand. New
  `cli/src/commands/dev.js` orchestration: ensure local instance
  config, login bootstrap, wait for backend healthy, resolve-or-
  create the local sandbox pod, install the openclaw runtime with
  config.runtime.runtimeType='moltbot' via the existing
  POST /api/registry/install endpoint, harvest the runtime token
  via /api/registry/pods/:podId/agents/openclaw/runtime-tokens,
  patch external/clawdbot-state/config/moltbot.json with the
  controlUi fallback flag, and upsert OPENCLAW_USER_TOKEN +
  OPENCLAW_RUNTIME_TOKEN into .env. No backend API changes;
  reuses existing routes. New regression test file
  cli/__tests__/dev.test.mjs covers the bootstrap flow.

  Phase 2.E — docs/development/local-credentials.md runbook +
  .env.example restructure. Required (GITHUB_PAT) /
  conditionally-required (LITELLM_API_KEY gated by
  COMMONLY_LOCAL_CLAWDBOT=1) / optional (Discord/Slack/Tavily/etc)
  / subsystem gates + troubleshooting + verified LiteLLM mint
  recipe via kubectl port-forward + POST /key/generate.
  docs/development/README.md indexes it.

Phase-4 affordance audit (docs/audits/ui-smoke-2026-05-23/)
==========================================================

The huddle session produced 17 Phase-4 findings about Commonly's
multi-agent collab affordance gaps, captured in
docs/audits/ui-smoke-2026-05-23/huddle-observations.md. The most
actionable, ranked by impact:

  1. Inline cue on chat.mention.payload.content for collaborative
     pods would replicate the in-pod corrections (execute-not-
     handoff + claim-the-orphan) that I had to make manually. Same
     pattern as the existing §9 DM cue and pod-context cue.
  2. OpenClaw moltbot workspace should be a git worktree with
     GH_PAT credentials (same shape cloud-codex has via boot
     script). Would let Theo/Nova actually ship code instead of
     just reviewing.
  3. CLI-wrapper (claude/codex) chat-turn tool registry should
     auto-load @commonlyai/mcp for full memory + post + DM tool
     access during chat.mention turns, not just heartbeat cycles.
  4. commonly_save_my_memory daily-section schema mismatch — the
     tool input contract and backend YYYY-MM-DD validator disagree.
     Backend rejects daily writes with sections.daily[].date must
     be YYYY-MM-DD. Cody surfaced + drafted the GH issue body.
  5. Event dedup gate — Theo posted 4 acknowledgements to a single
     human message because chat.mention + heartbeat fired
     concurrent LLM runs with no run-in-progress guard.
  6. Heartbeat-cycle should optionally ingest pod-conversational-
     gravity into long_term memory. Today moltbot heartbeats write
     routing pointers + task state but miss today's huddle content.
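
For finding 5, a run-in-progress guard could look roughly like the sketch below. Everything in it is hypothetical: the `RunGuard` class, its method names, and the choice to key on pod + agent are assumptions for illustration, not an existing Commonly API.

```typescript
// Hypothetical guard: at most one LLM run in flight per agent per pod.
class RunGuard {
  private readonly inFlight = new Set<string>();

  // Returns true if the caller may start a run, false if one is already running.
  tryStart(podId: string, agentName: string): boolean {
    const key = `${podId}:${agentName}`;
    if (this.inFlight.has(key)) return false;
    this.inFlight.add(key);
    return true;
  }

  // Must be called when the run ends (success or failure) to release the slot.
  finish(podId: string, agentName: string): void {
    this.inFlight.delete(`${podId}:${agentName}`);
  }
}

const guard = new RunGuard();
console.log(guard.tryStart('pod-1', 'theo')); // true  (chat.mention run starts)
console.log(guard.tryStart('pod-1', 'theo')); // false (heartbeat arrives mid-run)
guard.finish('pod-1', 'theo');
console.log(guard.tryStart('pod-1', 'theo')); // true  (slot released)
```

A guard like this only suppresses overlapping runs. An event that arrives after the first run finishes would still start a new one, so it would likely need to be paired with per-message dedup.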

Two prescriptive memory entries were committed to commonly-skills:
feedback-agents-collab-execute-not-handoff and
feedback-claim-the-orphan-stalled-peer-work.

Still open (next sprint scope)
==============================

  Phase 2.C — commonly-bundled-skills/.gitkeep upstream in
              Team-Commonly/openclaw + submodule bump. Cross-repo
              work; needs operator (Sam) for the openclaw-fork PR.
  Phase 3   — ADR-2.F implementation (heartbeat events for CLI
              wrappers). Claude shipped a complete design with all
              decisions resolved; needs a coder.

Co-Authored-By: Cody <cody@commonly.me>
Co-Authored-By: Theo <theo@commonly.me>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>