feat(kiloclaw): add Kilo CLI recovery agent#1657

Merged
RSO merged 25 commits into main from RSO/mousy-rest
Mar 30, 2026

@RSO RSO commented Mar 27, 2026

Summary

  • Add the ability to run kilo run --auto on a KiloClaw instance from the dashboard as a recovery tool for stuck/failing instances, with real-time output polling and DB persistence
  • Full stack implementation: controller spawn route → CF Worker DO proxy → platform routes → tRPC procedures → React polling page with auto-scrolling output viewer
  • Admin visibility via a dedicated CLI Runs tab with search and pagination
  • System context prompt template that gives the agent awareness of the KiloClaw environment (config paths, gateway setup, common issues)
  • UI rebranded from generic "Run Kilo Agent" to recovery-focused "Recover with Kilo" messaging

Verification

  • pnpm typecheck — passes (all packages)
  • pnpm format:check — passes
  • pnpm lint — passes (0 warnings, 0 errors)
  • Controller route tests included (12 tests in kilo-cli-run.test.ts)
  • Migration 0061 regenerated cleanly after rebase on main (no number collision with main's 0060)

Visual Changes

hooks-enabled-recovery.mp4

Reviewer Notes

  • The drizzle migration was regenerated as 0061 after rebasing onto main (which already has 0060). The migration adds the kiloclaw_cli_runs table with FK to kilocode_users and indexes on user_id and started_at.
  • The recovery agent uses the instance's configured default model and KiloCode API key. The system prompt template in kilo-cli-config.ts gives the agent context about the OpenClaw environment.
  • Output polling uses @tanstack/react-query with a 2s refetch interval while the run is active.
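The polling rule in the last note can be sketched as a small pure helper. This is a minimal sketch, not the PR's code: the status names and the `pollIntervalFor` helper are assumptions made for illustration; only the 2s interval and the "while the run is active" condition come from the notes above.

```typescript
// Hypothetical status values; the PR does not list the actual ones.
type CliRunStatus = "running" | "completed" | "failed" | "cancelled";

// The 2s interval mentioned in the reviewer notes.
const POLL_INTERVAL_MS = 2000;

// Returns a value suitable for react-query's `refetchInterval` option:
// a number of milliseconds while polling should continue, or `false` to stop.
function pollIntervalFor(status: CliRunStatus | undefined): number | false {
  // No data yet (first fetch still pending): keep polling.
  // This is a design choice of the sketch, not something the PR states.
  if (status === undefined) return POLL_INTERVAL_MS;
  // Poll only while the run is active; any terminal status stops polling.
  return status === "running" ? POLL_INTERVAL_MS : false;
}
```

In TanStack Query v5 the `refetchInterval` option also accepts a callback that receives the query, so a helper like this could plausibly be wired in as `refetchInterval: (query) => pollIntervalFor(query.state.data?.status)`, assuming the query data exposes a `status` field.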

RSO added 13 commits March 27, 2026 16:35
Add the ability to run `kilo run --auto` on a KiloClaw instance from the
dashboard, with real-time output polling and DB persistence.

Stack: controller spawn route -> CF Worker DO proxy -> platform routes ->
tRPC procedures -> React polling page. Includes admin visibility,
changelog entry, and 12 controller route tests.
Sync the Kilo CLI's model config with the user's selected KiloClaw
default model (KILOCODE_DEFAULT_MODEL). Previously the CLI ignored this
env var and fell back to kilo-auto/small. Now the model is written to
opencode.json on both fresh installs and every boot, converting the
kilocode/ provider prefix to kilo/ for the CLI's naming convention.
Wrap the user's prompt with system context (key paths, architecture,
diagnostic commands, and fix instructions) so the agent knows where
to look and how to repair broken OpenClaw instances. The original
user prompt is still stored for UI display; only the expanded prompt
is passed to `kilo run --auto`.
…CLI run UI

Export path constants from config-writer and kilo-cli-config so the
prompt template references them instead of duplicating string literals.
Add openclaw doctor to diagnostics. Improve the CLI run detail page
with SetPageTitle, remove max-height on output, and clean up layout.
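Two of the commit messages above describe small, pure transformations: rewriting the `kilocode/` provider prefix to `kilo/`, and wrapping the user's prompt with system context while keeping the original for display. A minimal sketch of both follows. The function names, the `PreparedPrompt` shape, and the separator between context and prompt are assumptions for illustration and are not taken from the PR's code.

```typescript
// Both helpers are hypothetical illustrations of the commit messages above.

const KILOCLAW_PREFIX = "kilocode/"; // provider prefix used on the KiloClaw side
const CLI_PREFIX = "kilo/"; // provider prefix the Kilo CLI expects

// Rewrites "kilocode/<rest>" to "kilo/<rest>"; any other id passes through unchanged.
function toCliModelId(defaultModel: string): string {
  return defaultModel.indexOf(KILOCLAW_PREFIX) === 0
    ? CLI_PREFIX + defaultModel.slice(KILOCLAW_PREFIX.length)
    : defaultModel;
}

interface PreparedPrompt {
  displayPrompt: string; // what the user typed; stored for UI display
  expandedPrompt: string; // system context plus user prompt; passed to the CLI
}

// Keeps the original prompt intact and builds the expanded variant next to it.
// The blank-line separator is an assumption of this sketch.
function preparePrompt(userPrompt: string, systemContext: string): PreparedPrompt {
  return {
    displayPrompt: userPrompt,
    expandedPrompt: `${systemContext}\n\n${userPrompt}`,
  };
}
```

For example, `toCliModelId("kilocode/some-vendor/some-model")` yields `"kilo/some-vendor/some-model"` (the model id here is made up), while an id without the `kilocode/` prefix is returned as-is.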
@RSO RSO self-assigned this Mar 27, 2026

kilo-code-bot bot commented Mar 27, 2026

Code Review Summary

Status: 5 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 5 |
| SUGGESTION | 0 |


WARNING

| File | Line | Issue |
| --- | --- | --- |
| src/routers/kiloclaw-router.ts | 1061 | Still polls the controller's single current/last run whenever the DB row is running, so a requested run can still show or persist a newer run's output if another recovery starts before the older row is finalized. |
| src/routers/kiloclaw-router.ts | 1096 | cancelKiloCliRun still sends an unscoped cancel request to the controller, so canceling an older runId can terminate a newer run and mark the wrong DB row as cancelled. |

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

| File | Line | Issue |
| --- | --- | --- |
| src/routers/admin-kiloclaw-instances-router.ts | 677 | Admin instance-detail worker calls are still keyed by userId; machineStart now explicitly passes undefined for instanceId, so a multi-instance user's page can inspect or control the wrong durable object. |
| kiloclaw/src/routes/platform.ts | 533 | PATCH /api/platform/secrets still resolves the durable object by userId, so per-instance secret or config-path updates can be applied to the wrong instance in multi-instance mode. |
| kiloclaw/src/routes/platform.ts | 1258 | volume-snapshots and candidate-volumes still ignore instanceId, so the admin volume tooling can read or operate on the wrong instance for multi-instance users. |

Resolved since previous review:

  • kiloclaw/controller/src/kilo-cli-config.ts:87 now relies on KILO_API_URL instead of the stale provider.kilo.options.baseURL override.
Files Reviewed (5 files)
  • src/routers/kiloclaw-router.ts - 2 issues
  • src/routers/admin-kiloclaw-instances-router.ts - 1 issue
  • kiloclaw/src/routes/platform.ts - 2 issues
  • src/lib/kiloclaw/kiloclaw-internal-client.ts - 0 issues
  • src/app/admin/components/KiloclawInstances/KiloclawInstanceDetail.tsx - 0 issues

Reviewed by gpt-5.4-20260305 · 3,615,017 tokens

Comment on lines +193 to +195
if (env.KILOCODE_API_BASE_URL) {
  env.KILO_API_URL = new URL(env.KILOCODE_API_BASE_URL).origin;
}
Contributor Author


This replaces the provider.kilo.options.baseURL setting in the opencode.json config, which doesn't work as expected.
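To illustrate what the `.origin` call in the commented lines does, here is a minimal standalone sketch. The `deriveApiOrigin` helper and the example URLs are hypothetical; only the use of `new URL(...).origin` comes from the diff.

```typescript
// Hypothetical helper mirroring the diff: derive a value for KILO_API_URL
// from a base URL that may carry a path. Returns undefined when unset.
function deriveApiOrigin(baseUrl: string | undefined): string | undefined {
  if (!baseUrl) return undefined;
  // URL.origin keeps scheme, host, and port; path, query, and fragment are dropped.
  // Note that `new URL` throws a TypeError on an invalid URL string.
  return new URL(baseUrl).origin;
}
```

With a made-up base URL such as `https://api.example.test/api/openrouter/`, the helper returns `https://api.example.test`, so whatever consumes the value starts from the bare origin rather than from a base that already includes a path.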

config.provider = config.provider || {};
config.provider.kilo = config.provider.kilo || {};
config.provider.kilo.options = config.provider.kilo.options || {};
delete config.provider.kilo.options.baseURL;
Contributor Author


As far as I could test, baseURL never works as intended: either the models fetch or the chat completion fails because it uses the wrong path.

@RSO RSO requested a review from a team March 27, 2026 15:58
RSO added 3 commits March 27, 2026 17:27
Exclude the output column from listAllCliRuns to avoid sending up to
~25MB per page. Add a dedicated getCliRunOutput procedure that fetches
output for a single run on demand when selected in the admin panel.
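The split described in this commit (list rows without the large output column, fetch output for one run on demand) can be sketched with in-memory data. This is an illustration under assumptions: the row shape and both function names are hypothetical, and a real implementation would select columns in the database query rather than filter arrays.

```typescript
// Hypothetical row shape; only user_id, started_at, and output are named in the PR.
interface CliRunRow {
  id: string;
  userId: string;
  startedAt: string;
  output: string; // potentially very large
}

// The list view never carries `output`.
type CliRunSummary = Omit<CliRunRow, "output">;

// Projects each row onto the lightweight columns, like an explicit column list in SQL.
function listRunSummaries(rows: CliRunRow[]): CliRunSummary[] {
  return rows.map((row) => ({
    id: row.id,
    userId: row.userId,
    startedAt: row.startedAt,
  }));
}

// Fetches the output of exactly one run, or undefined when the id is unknown.
function getRunOutput(rows: CliRunRow[], runId: string): string | undefined {
  for (const row of rows) {
    if (row.id === runId) return row.output;
  }
  return undefined;
}
```

The design point is that the page payload scales with the number of rows rather than with the size of their output, and the large field is only transferred for the run an admin actually selects.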
@RSO RSO enabled auto-merge March 27, 2026 16:38
@RSO RSO disabled auto-merge March 30, 2026 08:41
.input(z.object({ runId: z.string().uuid() }))
.mutation(async ({ ctx, input }) => {
  const client = new KiloClawInternalClient();
  const result = await client.cancelKiloCliRun(ctx.user.id);
Contributor


WARNING: runId is still not sent to the controller cancel API

cancelKiloCliRun only narrows the database update. The controller request here still cancels the user's current run regardless of input.runId, so canceling an older row after a newer recovery starts can terminate the newer job and leave the wrong row marked as cancelled.
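One way to address this warning is to scope the cancel request by run id on the controller side. The sketch below shows the idea under assumptions: `RunController`, `CancelResult`, and the reason strings are hypothetical, and the PR does not show the controller's actual cancel API.

```typescript
// Hypothetical result type for a scoped cancel request.
type CancelResult =
  | { ok: true }
  | { ok: false; reason: "no_active_run" | "run_id_mismatch" };

// Minimal stand-in for a controller that tracks a single active run.
class RunController {
  private currentRunId: string | null = null;

  // Starting a run replaces whatever run was tracked before.
  start(runId: string): void {
    this.currentRunId = runId;
  }

  // Cancels only when the caller names the run that is actually active.
  cancel(runId: string): CancelResult {
    if (this.currentRunId === null) {
      return { ok: false, reason: "no_active_run" };
    }
    if (this.currentRunId !== runId) {
      // A request for an older run must not terminate the newer one.
      return { ok: false, reason: "run_id_mismatch" };
    }
    // A real controller would also signal the child process here.
    this.currentRunId = null;
    return { ok: true };
  }
}
```

With this shape, the caller would mark a database row as cancelled only when `ok` is true for that same `runId`. The same id check could be applied to status polling, which is the concern in the related warning about showing a newer run's output.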

@RSO RSO merged commit b447254 into main Mar 30, 2026
33 of 34 checks passed
@RSO RSO deleted the RSO/mousy-rest branch March 30, 2026 09:41