
feat(self_improve): migrate to plugin hook architecture (Phase 1 Step 4)#8

Merged
AVADSA25 merged 3 commits into main from phase1-step4-self-improve-as-hook on May 1, 2026


AVADSA25 (Owner) commented May 1, 2026

Summary

Migrates codec_self_improve from nightly polling to event-driven hook plugin per Phase 1 Step 4. Same proposals end up in ~/.codec/skill_proposals/YYYY-MM-DD/ — only the trigger changes.

Architecture

post_tool / on_error           In-memory ring buffer (last 200 signals)
       │                       captures (tool, outcome, error_type)
       ▼
on_operation_end               Snapshot buffer → codec_self_improve._find_gaps
       │                       → throttle filter (30 min/tool)
       │                       → spawn daemon thread
       ▼
   threading.Thread(daemon)    LLM draft + write run in background.
       │                       on_operation_end returns immediately so
       │                       the user's operation isn't blocked.
       ▼
codec_self_improve.            Same _draft_skill / _validate /
   _draft_skill                 _write_proposal flow as nightly run.
   _validate                    Same dangerous-pattern gate.
   _write_proposal

codec_self_improve.run_once() is unchanged — CLI / skill / autopilot triggers all keep working. Plugin path is additive.

Commits

SHA     What
89c5a3c plugins/self_improve.py NEW (393 lines) — full plugin with metadata, hooks, throttle, kill switch
9509f85 codec_self_improve.py docstring update — annotates the dual-trigger architecture; declares _find_gaps/_draft_skill/_validate/_write_proposal/_existing_skill_names + MAX_PROPOSALS_PER_RUN/_GAP_KIND_TO_SIGNAL/_PROPOSALS_ROOT as the stable internal-API surface the plugin imports. Zero code change.
dc07b17 tests/test_self_improve_plugin.py NEW (361 lines) — 21 tests, 100% passing

Test plan

  • pytest tests/test_self_improve_plugin.py -v → 21 passed in 44.28s
  • Per Step 4 contract: did NOT run full pytest suite (avoids the destructive-skill cascade documented in docs/INCIDENT-2026-05-01-spurious-skill-fires.md)
  • Post-test state files clean: pending_questions=0, Apple Reminders=0, /tmp/codec_*.txt=0
  • Verified codec_self_improve public surface intact (10 names: 7 helpers + 3 module constants)
  • Plugin metadata AST-discoverable by codec_hooks._extract_metadata without executing the module

Test coverage breakdown

Area               Tests  What it asserts
Plugin metadata    1      AST scan finds PLUGIN_NAME / declared hooks
post_tool capture  4      OK / failure-string / self-recursion / observe-only
on_error capture   3      exception / TimeoutError / self-recursion
Drafter spawn      3      empty / sub-threshold / threshold breach
Throttle           2      blocks repeat / works at 0 (proves it's the gate)
Kill switch        5      post_tool / on_error / on_operation_end / default / aliases
Validation         2      dangerous rejected / safe accepted
End-to-end         1      _draft_and_write produces .md + .py
Total              21

All tests redirect codec_audit._AUDIT_LOG and codec_self_improve._PROPOSALS_ROOT to tmp_path. _draft_skill is mocked everywhere — NO Qwen calls, NO Apple state, NO Terminal popups, NO osascript.

Install path

Per the trust model (AGENTS.md §3 plugin section: "local Python files curated by the user"), the plugin file lives in the repo and the user installs it manually:

cp ~/codec-repo/plugins/self_improve.py ~/.codec/plugins/self_improve.py
pm2 restart codec-dashboard codec-mcp-http open-codec
# Plugin loads on next operation in any process that uses codec_hooks.

To uninstall: rm ~/.codec/plugins/self_improve.py and restart the PM2 processes, OR set SELF_IMPROVE_PLUGIN_ENABLED=false on the affected processes (takes effect when they are restarted).

Safety / what this PR does NOT do

  • No _HTTP_BLOCKED change.
  • No modification of codec_self_improve.py runtime behavior — only the docstring.
  • No osascript / subprocess calls anywhere in the plugin (per the 2026-05-01 incident contract).
  • No Apple Reminders / Notes / Calendar entries created.
  • Plugin auto-installs nothing — user must cp the file to ~/.codec/plugins/ themselves.

Per-feature kill switch

Env var                      Default  Effect
SELF_IMPROVE_PLUGIN_ENABLED  true     Set to false → all 3 hooks no-op, the buffer stops growing, and no drafts are spawned. Takes effect when the affected process is restarted under PM2. Tests: tests/test_self_improve_plugin.py::test_kill_switch_disables_post_tool and the 4 other kill-switch tests.
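A sketch of the kill-switch parsing, assuming the variable is read on each hook call and that any value other than the off-aliases exercised by the tests (false/0/no/off, case-insensitive) leaves the plugin enabled. plugin_enabled is a hypothetical name; the env mapping is injectable only so the function can be tested without touching os.environ.

```python
import os
from typing import Mapping, Optional

# Off-aliases from the kill-switch tests, compared case-insensitively.
_OFF_VALUES = {"false", "0", "no", "off"}


def plugin_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True unless SELF_IMPROVE_PLUGIN_ENABLED is set to an off-alias."""
    env = os.environ if env is None else env
    raw = env.get("SELF_IMPROVE_PLUGIN_ENABLED")
    if raw is None:
        return True  # default (unset): enabled
    return raw.strip().lower() not in _OFF_VALUES
```

Each hook would then open with "if not plugin_enabled(): return None", which is what makes all three hooks no-op and stops the buffer from growing.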

Out of scope (Phase 2)

  • AGENTS.md update with the new plugin path (will fold into Phase 1 sign-off doc, not this PR).
  • audit_report breakdown by extra.trigger (plugin_hook vs nightly_run) — additive, can land separately.
  • Removing the self_improve skill from MCP exposure entirely (it currently still fires when the LLM asks for it; the plugin just adds a second event-driven trigger). Decision deferred per user instruction.

Phase 1 status after this PR

  • Step 1 (audit envelope): merged + 24h watch clean
  • Step 2 (plugin lifecycle): merged + 24h watch clean
  • Step 3 (AskUserQuestion + stuck + step budget): merged 2026-05-01 13:47 UTC; 24h watch SKIPPED per user instruction (pattern established)
  • Step 4 (self_improve as plugin): THIS PR

Once this lands, Phase 1 is complete.

🤖 Generated with Claude Code

Mikarina13 added 3 commits May 1, 2026 16:02
…Phase 1 Step 4 a)

Adds plugins/self_improve.py — registers post_tool, on_error, and
on_operation_end hooks per the codec_hooks contract (Phase 1 Step 2).
Captures every tool fire as a signal in an in-memory ring buffer of
last 200 entries. On each operation-end, snapshots the buffer, runs the
EXISTING codec_self_improve._find_gaps + _draft_skill + _validate +
_write_proposal flow in a daemon thread, and emits the same
skill_proposal_staged audit event as the nightly path (with a new
extra.trigger="plugin_hook" so audit_report can break out the source).

Net effect: replaces the nightly polling cycle. Same proposals end up
in ~/.codec/skill_proposals/YYYY-MM-DD/ — only the trigger changes.

Design highlights:
- Plugin metadata via PLUGIN_NAME / PLUGIN_DESCRIPTION /
  PLUGIN_PRIORITY=200 (low — observe-only mostly) / PLUGIN_TOOL_FILTER=None
- Lazy-load codec_self_improve helpers on first hook fire (cheap startup,
  doesn't crash the AST scan if codec_self_improve is moved)
- Per-tool throttle of 30 min: prevents Qwen spam when the same gap fires on
  every operation
- Background daemon thread for the LLM call so on_operation_end returns
  fast; the user's operation isn't blocked by the ~2-min Qwen draft
- Self-recursion guard: skip tool_name in {"self_improve", ""} to avoid
  the plugin analyzing its own emits
- Kill switch: SELF_IMPROVE_PLUGIN_ENABLED env var (default true).
  Set false → all hooks no-op, buffer stops growing, no drafts.
- NO osascript, NO subprocess, NO Apple Reminders / Notes / Calendar
  (per the 2026-05-01 incident contract)
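The per-tool throttle can be sketched as a map from tool name to the time of its last allowed draft, assuming a monotonic clock that is injectable so tests need no sleeps. ToolThrottle is hypothetical; with window_seconds=0 every call is allowed, which is the property the "works at 0" throttle test relies on.

```python
import threading
import time
from typing import Callable, Dict


class ToolThrottle:
    """At most one draft per tool per window (default 30 minutes)."""

    def __init__(self, window_seconds: float = 30 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window_seconds
        self._clock = clock                # injectable for tests
        self._last: Dict[str, float] = {}  # tool -> time of last allowed draft
        self._lock = threading.Lock()      # hooks may fire from several threads

    def allow(self, tool: str) -> bool:
        """Return True (and open a new window) if `tool` may be drafted now."""
        with self._lock:
            now = self._clock()
            last = self._last.get(tool)
            if last is not None and now - last < self.window:
                return False
            self._last[tool] = now
            return True
```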

Coexistence: codec_self_improve.run_once() is unchanged. CLI invocation
(`python3 codec_self_improve.py`), skill (`self_improve`), or autopilot
trigger all keep working. Plugin path is ADDITIVE — both paths share
the same _find_gaps + _draft_skill + _write_proposal helpers and write
to the same proposal directory.

Install (manual, per trust model):
    cp ~/codec-repo/plugins/self_improve.py ~/.codec/plugins/self_improve.py
    pm2 restart codec-dashboard codec-mcp-http open-codec

Test surface: exposes _reset_state_for_test, _set_throttle_seconds_for_test,
_get_signals_snapshot_for_test for tests/test_self_improve_plugin.py
(commit b — next).
… plugin paths (Phase 1 Step 4 b)

Updates the module docstring to make explicit the dual-trigger
architecture introduced in Phase 1 Step 4:

  [1] Legacy nightly / on-demand — run_once(target_date)
      CLI / skill / autopilot paths unchanged.

  [2] Plugin path — plugins/self_improve.py
      post_tool / on_error / on_operation_end hooks; same
      _find_gaps + _draft_skill + _validate + _write_proposal
      helpers; daemon thread for the LLM call; per-tool throttle.

No code change. The module's public/internal-stable surface
(_existing_skill_names, _load_audit_for, _find_gaps, _draft_skill,
_validate, _write_proposal, run_once + MAX_PROPOSALS_PER_RUN,
_GAP_KIND_TO_SIGNAL, _PROPOSALS_ROOT constants) is documented as
the plugin's import target — renaming any of these is a breaking
change for the plugin AND for any future test_self_improve.py.

Verified the surface still imports cleanly:
  $ python3 -c "import codec_self_improve; ..."
  ✓ all 10 names present.

Per Step 4 plan: codec_self_improve.py is the SHIM (helpers + legacy
trigger); plugins/self_improve.py is the NEW trigger that imports
the shim. Both write to ~/.codec/skill_proposals/YYYY-MM-DD/.
Existing test surface and import paths preserved.
…plugin path (Phase 1 Step 4 c)

Validates plugins/self_improve.py end-to-end. 21 tests covering:

Plugin metadata (1):
  - AST-discoverable by codec_hooks._extract_metadata: PLUGIN_NAME=
    "self_improve", PLUGIN_PRIORITY=200, PLUGIN_TOOL_FILTER=None,
    declared hooks = {post_tool, on_error, on_operation_end}.

post_tool signal capture (4):
  - Successful call → buffer entry with outcome="ok"
  - Failure-string heuristic → outcome="error", error_type=
    "ResultStringError" (matches "Error:", "Skill X failed:", etc.)
  - Self-recursion guard → tool_name="self_improve" + "" both skipped
  - Returns None always (observe-only contract)
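The post_tool capture rules above can be sketched like this, assuming the failure heuristic matches result strings that start with "Error:" or "Skill <name> failed:" (the commit message only gives these as examples). classify_result and this post_tool signature, which takes the buffer as an argument, are hypothetical simplifications of the real hook.

```python
import re
from typing import List, Optional, Tuple

Signal = Tuple[str, str, Optional[str]]  # (tool, outcome, error_type)

_SKIP_TOOLS = {"self_improve", ""}  # self-recursion guard
# Assumed heuristic: the result *starts with* one of the failure prefixes.
_FAILURE_RE = re.compile(r"\s*(?:Error:|Skill .+? failed:)")


def classify_result(tool_name: str, result: object) -> Optional[Signal]:
    if tool_name in _SKIP_TOOLS:
        return None  # never analyze the plugin's own emits
    if isinstance(result, str) and _FAILURE_RE.match(result):
        return (tool_name, "error", "ResultStringError")
    return (tool_name, "ok", None)


def post_tool(tool_name: str, result: object, buffer: List[Signal]) -> None:
    sig = classify_result(tool_name, result)
    if sig is not None:
        buffer.append(sig)
    return None  # observe-only: the hook never replaces the tool result
```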

on_error signal capture (3):
  - Generic exception → outcome="error", error_type captured
  - TimeoutError-like → outcome="timeout" (distinct gap kind for
    codec_self_improve._find_gaps)
  - Self-recursion guard same as post_tool
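A sketch of the on_error classification, assuming "TimeoutError-like" means an instance of the built-in TimeoutError or any exception class whose name contains "Timeout" (for example a client library's ReadTimeout). classify_exception is a hypothetical helper, not the plugin's real function.

```python
from typing import Optional, Tuple

_SKIP_TOOLS = {"self_improve", ""}  # same self-recursion guard as post_tool


def classify_exception(tool_name: str, exc: BaseException) -> Optional[Tuple[str, str, str]]:
    """Map an exception to (tool, outcome, error_type); timeouts get their own outcome."""
    if tool_name in _SKIP_TOOLS:
        return None
    error_type = type(exc).__name__
    # Assumption: name-based match catches library timeouts that don't subclass TimeoutError.
    if isinstance(exc, TimeoutError) or "Timeout" in error_type:
        return (tool_name, "timeout", error_type)
    return (tool_name, "error", error_type)
```

Keeping "timeout" separate from "error" matters because _find_gaps treats it as a distinct gap kind.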

on_operation_end + drafter spawn (3):
  - Empty buffer → no thread spawned
  - Sub-threshold (1 unknown-tool call < the ≥2 missing_tool threshold)
    → no thread spawned
  - Threshold breach (2 unknown-tool calls) → drafter thread spawned
    with target=_draft_and_write, daemon=True
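The spawn decision in these three tests can be sketched as a threshold count, assuming a "missing_tool" gap means a tool name that was called but is not among the installed skills. find_missing_tool_gaps and should_spawn are hypothetical stand-ins; the real logic lives in codec_self_improve._find_gaps and is not shown in this PR.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set

MISSING_TOOL_THRESHOLD = 2  # the ">=2 missing_tool" threshold from the test description


def find_missing_tool_gaps(signals: Iterable[Sequence], known_tools: Set[str]) -> List[str]:
    """Tools called at least THRESHOLD times that no installed skill provides."""
    counts = Counter(sig[0] for sig in signals if sig[0] not in known_tools)
    return sorted(tool for tool, n in counts.items() if n >= MISSING_TOOL_THRESHOLD)


def should_spawn(signals: Sequence[Sequence], known_tools: Set[str]) -> bool:
    """Empty buffer or sub-threshold counts -> no drafter thread."""
    return bool(signals) and bool(find_missing_tool_gaps(signals, known_tools))
```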

Throttle (2):
  - Same-tool repeat within window → second draft blocked
  - Throttle=0 → both rounds spawn (proves the throttle was the gate)

Kill switch (5):
  - SELF_IMPROVE_PLUGIN_ENABLED=false → post_tool no-op
  - Same → on_error no-op
  - Same → on_operation_end no-op even with full buffer + threshold
  - Default (env unset) → enabled=True
  - All off-aliases (false/0/no/off/FALSE/Off) → enabled=False

Validation gate (2):
  - Dangerous code (os.system) → _validate rejects
  - Safe code → _validate accepts
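A dangerous-pattern gate of this kind can be sketched with an AST walk over call sites. The deny-list below is an assumption for illustration (the PR only confirms that os.system is rejected), and validate is a hypothetical stand-in for _validate.

```python
import ast
from typing import Tuple

# Hypothetical deny-list; the real _validate pattern set is not shown in this PR.
_DANGEROUS_CALLS = {
    "os.system", "os.popen", "eval", "exec",
    "subprocess.run", "subprocess.Popen", "subprocess.call", "subprocess.check_output",
}


def _dotted_name(node: ast.AST) -> str:
    """Turn Name / Attribute chains into 'a.b.c'; anything else -> ''."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted_name(node.value)
        return f"{base}.{node.attr}" if base else ""
    return ""


def validate(code: str) -> Tuple[bool, str]:
    """Reject unparsable code and any direct call to a deny-listed function."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return False, f"syntax error: {exc.msg}"
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name in _DANGEROUS_CALLS:
                return False, f"dangerous call: {name}"
    return True, "ok"
```

A name-based check like this is easy to evade (getattr(os, "system"), importlib), so it guards against accidental LLM output rather than acting as a sandbox; proposals still need human review before install.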

End-to-end (1):
  - Mock _draft_skill → run _draft_and_write → assert .md + .py written
    to _PROPOSALS_ROOT/YYYY-MM-DD/, .md contains skill name +
    PASSED/REJECTED status
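The on-disk layout the end-to-end test checks can be sketched as follows, assuming one .py and one .md per proposal under a per-day directory named with date.isoformat(), i.e. YYYY-MM-DD. write_proposal and the .md wording are hypothetical; only the directory shape and the PASSED/REJECTED status come from the PR.

```python
import datetime as dt
from pathlib import Path
from typing import Optional, Tuple


def write_proposal(root: Path, name: str, code: str, passed: bool, reason: str,
                   today: Optional[dt.date] = None) -> Tuple[Path, Path]:
    """Write <root>/<YYYY-MM-DD>/<name>.py and a .md summary; return (md, py)."""
    day_dir = root / (today or dt.date.today()).isoformat()
    day_dir.mkdir(parents=True, exist_ok=True)
    py_path = day_dir / f"{name}.py"
    md_path = day_dir / f"{name}.md"
    py_path.write_text(code, encoding="utf-8")
    status = "PASSED" if passed else "REJECTED"
    md_path.write_text(
        f"# Skill proposal: {name}\n\nValidation: {status} ({reason})\n",
        encoding="utf-8",
    )
    return md_path, py_path
```

Passing root explicitly mirrors what the tests do by redirecting _PROPOSALS_ROOT to tmp_path.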

Test isolation (per the 2026-05-01 incident lessons):
  - codec_audit._AUDIT_LOG redirected to tmp_path (no real audit writes)
  - codec_self_improve._PROPOSALS_ROOT redirected to tmp_path (no real
    proposals to ~/.codec/skill_proposals/)
  - _draft_skill monkeypatched to return canned data (NO Qwen calls)
  - threading.Thread mock-captured to assert spawn without actually
    running drafter (in spawn-assertion tests)
  - Plugin state reset via _reset_state_for_test fixture per test
  - NO osascript, NO subprocess, NO Apple Reminders / Notes / Calendar
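The "threading.Thread mock-captured" technique can be sketched with unittest.mock, assuming the code under test calls threading.Thread through the module attribute. spawn_drafter and this _draft_and_write are hypothetical stand-ins; the stand-in drafter raises, so the test would fail loudly if a real thread ever ran it.

```python
import threading
from unittest import mock


def _draft_and_write(signals):
    # Stand-in drafter: must never execute in a spawn-assertion test.
    raise AssertionError("drafter should not run")


def spawn_drafter(signals):
    worker = threading.Thread(target=_draft_and_write, args=(signals,), daemon=True)
    worker.start()
    return worker


def test_threshold_breach_spawns_drafter():
    # Replace the Thread class for the duration of the block only.
    with mock.patch("threading.Thread") as fake_thread:
        spawn_drafter(["sig"])
    fake_thread.assert_called_once_with(target=_draft_and_write, args=(["sig"],), daemon=True)
    fake_thread.return_value.start.assert_called_once()
```

If the plugin imported the class directly (from threading import Thread), the patch target would have to be the plugin module's own Thread name instead.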

Test run result: 21 passed in 44.28s (Qwen draft is mocked everywhere;
the 44s is mostly module-import + pytest collection overhead).

Post-test state files verified clean:
  - ~/.codec/pending_questions.json: 0 entries
  - Apple Reminders incomplete: 0
  - /tmp/codec_*.txt: 0 files

Per Step 4 contract: no full-suite pytest run, no fixture leak risk
beyond this file's own monkeypatches.
AVADSA25 merged commit 9858934 into main on May 1, 2026
1 check passed