fix(lifecycle): re-spawn boot loop on dormant reload#117

Merged
couragehong merged 2 commits into feat/go-migration from couragehong/fix/reload-respawn-bootloop
May 7, 2026

Conversation

@couragehong
Contributor

Summary

Two coupled changes around the boot-loop reload path that /rune:configure depends on after a fresh install. Without these, a freshly-spawned MCP server can't pick up a populated ~/.rune/config.json short of a Claude Code restart.

  1. Add lifecycle.Manager.SetReloadFunc + Retrigger.
  2. service.LifecycleService.ReloadPipelines now calls Retrigger and polls state when Dormant.
  3. cmd/rune-mcp/main.go registers SetReloadFunc and fixes the RunBootLoop call to the 3-arg signature.

End-to-end effect: after running /rune:configure for the first time, the server reaches Active without a Claude restart.

Why

The boot loop returns after a terminal Dormant exit (config missing, state != "active", vault endpoint/token empty). Before this change, LifecycleService.ReloadPipelines only reported state; it didn't actually re-trigger anything, so the goroutine stayed exited and the new config sat ignored.

That's why the normal flow (/rune:configure takes user input, writes ~/.rune/config.json, then calls reload_pipelines) still ended with "please restart one more time."
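The three terminal exit conditions can be sketched as a precondition check. RunBootLoop's actual structure isn't shown in this PR, so the function and field names below are assumptions, not the real implementation:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config is a stand-in for ~/.rune/config.json; field names are assumptions.
type Config struct {
	State         string
	VaultEndpoint string
	VaultToken    string
}

// checkBootPreconditions mirrors the three terminal Dormant exits described
// above: config missing, state != "active", empty vault endpoint/token.
func checkBootPreconditions(cfg *Config, loadErr error) error {
	switch {
	case loadErr != nil:
		return fmt.Errorf("config missing: %w", loadErr)
	case cfg.State != "active":
		return errors.New(`state != "active"`)
	case cfg.VaultEndpoint == "" || cfg.VaultToken == "":
		return errors.New("vault endpoint/token empty")
	}
	return nil
}

func main() {
	_, err := os.Open("/nonexistent/.rune/config.json")
	fmt.Println(checkBootPreconditions(nil, err) != nil) // true: loop would exit Dormant
	ok := &Config{State: "active", VaultEndpoint: "https://vault.local", VaultToken: "t"}
	fmt.Println(checkBootPreconditions(ok, nil) == nil) // true: loop proceeds to dial Vault
}
```

Any of the three failures ends the goroutine, which is exactly why a fresh config needed a process restart before this change.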

What changed

Manager.SetReloadFunc(f func()): main.go registers it once at startup.

Manager.Retrigger(): called by the service. Runs the callback only when state == Dormant (i.e. spawns a fresh RunBootLoop goroutine).

func (m *Manager) Retrigger() {
    if m.Current() != StateDormant {
        return  // the existing goroutine is still retrying or active; respawning would race
    }
    // ... run the callback
}
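The elided callback execution can be filled in as a minimal self-contained sketch. The mutex and the stored reloadFn field are assumptions about Manager's internals, not the actual implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// State mirrors the lifecycle states named in this PR.
type State int

const (
	StateDormant State = iota
	StateStarting
	StateActive
)

// Manager is a minimal stand-in; field names are assumptions.
type Manager struct {
	mu       sync.Mutex
	state    State
	reloadFn func()
}

func (m *Manager) Current() State {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.state
}

// SetReloadFunc installs the respawn closure once at startup.
func (m *Manager) SetReloadFunc(f func()) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.reloadFn = f
}

// Retrigger fires the closure only from Dormant; respawning while a
// boot-loop goroutine is still running would race on the same Deps.
func (m *Manager) Retrigger() {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.state != StateDormant || m.reloadFn == nil {
		return
	}
	m.reloadFn()
}

func main() {
	m := &Manager{state: StateDormant}
	m.SetReloadFunc(func() { fmt.Println("respawning boot loop") })
	m.Retrigger() // prints: respawning boot loop
}
```

Holding the lock across the check and the call makes the Dormant-only guard atomic in this sketch; the real code may structure the synchronization differently.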

ReloadPipelines now retriggers and then polls when Dormant:

if s.State.Current() == lifecycle.StateDormant {
    s.State.Retrigger()
    s.waitForBootProgress(ctx, 5*time.Second)
}

waitForBootProgress's 150ms initial grace: because Retrigger does `go RunBootLoop(...)`, the first read could still see the prior Dormant; the grace avoids that.

main.go:

deps.State.SetReloadFunc(func() {
    go lifecycle.RunBootLoop(ctx, deps.State, deps)
})
go lifecycle.RunBootLoop(ctx, deps.State, deps)

Also fixes the previous base's 2-arg RunBootLoop(ctx, deps.State) call (it mismatched the 3-arg signature, so the base didn't build).

couragehong and others added 2 commits May 7, 2026 20:05
CaptureService.CaptureLog stayed nil and LifecycleService.ConfigDir empty
because buildDeps never set them. Capture writes succeeded against
envector (records appeared in the team index) but capture_log.jsonl never
got appended (capture.go:155 skips Append when CaptureLog is nil), and
CaptureHistory tried to read "capture_log.jsonl" relative to the process
cwd via filepath.Join("", logio.DefaultFilename).

buildDeps now resolves ~/.rune via config.RuneDir() (with $HOME fallback
so handler dispatch never panics during boot), constructs a logio.New
pointed at ~/.rune/capture_log.jsonl, and injects it into both services.

Surfaces the missing wiring noticed during local-smoke end-to-end
verification: 6 records present in the envector index but
/rune:capture_history empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes — both around the boot-loop reload path that
/rune:configure depends on after a fresh install:

1. lifecycle.Manager.SetReloadFunc + Retrigger
-----------------------------------------------
The boot loop returns after a terminal Dormant exit (config missing,
state != "active", vault endpoint/token empty). Before this change there
was no way to ask for a fresh attempt without restarting the process —
LifecycleService.ReloadPipelines only reported state, it didn't actually
re-trigger anything.

  Manager.SetReloadFunc(f func())   — main.go installs the closure
                                      once at startup
  Manager.Retrigger()                — fires the closure ONLY when state
                                      is Dormant; no-ops otherwise

The Dormant-only guard is the key safety property. While state is
Starting / WaitingForVault / Active a goroutine is still running (either
retrying with backoff or already succeeded), so respawning would race a
second loop on the same Deps. Callers in those states see the existing
loop's progress via state polling. Active-state reload (token rotation,
endpoint change) needs graceful tear-down of the existing
vault/embedder/envector connections before respawn — that's deliberately
out of scope here and noted as follow-up.

For the Dormant cases, no connections were ever established (the loop
exits before vault.NewClient on all three terminal paths), so there is
nothing to close — respawn is leak-free.

2. service.LifecycleService.ReloadPipelines respawn + state polling
-------------------------------------------------------------------
ReloadPipelines now calls Manager.Retrigger when state is Dormant and
then polls state for up to 5s before snapshotting the response.

The 150ms initial grace exists because Retrigger does `go RunBootLoop(...)`
— the first state read can still see the prior Dormant snapshot before
the spawned goroutine reaches its first SetState(StateStarting). Polling
returns early on Active or a second Dormant (= the new attempt also
bailed); WaitingForVault keeps polling because the loop is still
actively retrying with backoff.

3. cmd/rune-mcp/main.go — wire SetReloadFunc + restore RunBootLoop call
-----------------------------------------------------------------------
main.go now installs the reload closure before kicking off the first boot
loop, and the call uses the full RunBootLoop(ctx, deps.State, deps)
signature. The previous form `RunBootLoop(ctx, deps.State)` was a stale
2-arg call leftover from an earlier signature; it would not compile
against the current 3-arg RunBootLoop, but no PR before this one
exercised the build path with the correct signature on this base.

End-to-end effect: a user running /rune:configure for the first time on
a freshly-spawned MCP server no longer needs to restart Claude Code for
the new ~/.rune/config.json to take effect. ReloadPipelines respawns the
boot loop, the new loop reads the populated config, dials Vault / runed /
envector, and the response reflects Active state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@couragehong couragehong self-assigned this May 7, 2026
@couragehong couragehong merged commit 8b0314f into feat/go-migration May 7, 2026
1 check passed
@couragehong couragehong deleted the couragehong/fix/reload-respawn-bootloop branch May 7, 2026 11:37
couragehong added a commit that referenced this pull request May 8, 2026
The Go rune-mcp at internal/* + cmd/rune-mcp/ has reached parity (PR
#102 + #110 + #117) and end-to-end verification (#118, #122, #124),
which means the Python tree is now dead weight. Carrying both
implementations is actively misleading: a fresh contributor following
the in-repo install instructions would still land in mcp/ + agents/
and try to set up a venv that no longer ships, and parity audits keep
re-discovering the Python source instead of treating the Go side as
the source of truth.

Removed
-------

  agents/common/        12 files  (config, embedding, llm, schemas)
  agents/retriever/      4 files  (query_processor, searcher, synthesizer)
  agents/scribe/        12 files  (detector, llm_extractor, handlers, server)
  agents/tests/         16 files  (pytest suite)
  agents/__init__.py
  agents/README.md            (Python agents intro — gone with the impl)
  agents/SLACK_SETUP.md       (Slack notifier setup for Python scribe)
  mcp/                  18 files  (Python adapter + server + tests)
  requirements.txt            (root Python dependency list)
  scripts/migrate_embeddings.py (one-off Python migration helper)

  Total: 67 files, 17,815 lines.

Kept (intentional)
------------------

  agents/claude/{scribe,retriever}.md   referenced by .claude-plugin/
                                        plugin.json — agent prompts that
                                        the runtime loads
  agents/codex/scribe.md                Codex-side agent prompt
  agents/gemini/{scribe,retriever}.md   Gemini-side agent prompts
  benchmark/                            deferred (separate decision —
                                        rewrite in Go vs delete entirely)
  docs/v04/spec/python-mapping.md       parity blueprint that maps Python
                                        source to Go destinations; useful
                                        as a historical record post-deletion
  docs/migration/*.md                   migration plan + audit trail —
                                        intentional history
  scripts/bootstrap-mcp.sh and other    referenced by gemini-extension.json;
    Python-era shell scripts            removal blocked on Gemini support
                                        decision (separate PR)

Not in scope
------------

  .github/workflows/pr-tests.yml + pr-comment.yml — Python pytest CI;
    handled by PR #125 (ci-drop-python).
  README.md, CLAUDE.md, SKILL.md, AGENT_INTEGRATION.md, GEMINI.md,
    CONTRIBUTING.md — top-level docs still describe the v0.3 install
    flow; rewrite scheduled separately so this commit stays focused
    on code deletion.

Verification
------------

  go build ./...   passes
  go vet ./...     passes
  go test ./...    full suite passes (no test referenced deleted paths)
  grep across remaining .{go,md,json,sh,toml,yml,yaml} for the deleted
    paths returned zero hits — no dangling references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>