feat(self_improve): migrate to plugin hook architecture (Phase 1 Step 4) by AVADSA25 · Pull Request #8 · AVADSA25/codec

AVADSA25 · 2026-05-01T14:07:20Z

Summary

Migrates codec_self_improve from nightly polling to an event-driven hook plugin, per Phase 1 Step 4. The same proposals end up in ~/.codec/skill_proposals/YYYY-MM-DD/; only the trigger changes.

Architecture

post_tool / on_error           In-memory ring buffer (last 200 signals)
       │                       captures (tool, outcome, error_type)
       ▼
on_operation_end               Snapshot buffer → codec_self_improve._find_gaps
       │                       → throttle filter (30 min/tool)
       │                       → spawn daemon thread
       ▼
   threading.Thread(daemon)    LLM draft + write run in background.
       │                       on_operation_end returns immediately so
       │                       the user's operation isn't blocked.
       ▼
codec_self_improve.            Same _draft_skill / _validate /
   _draft_skill                 _write_proposal flow as nightly run.
   _validate                    Same dangerous-pattern gate.
   _write_proposal

codec_self_improve.run_once() is unchanged — CLI / skill / autopilot triggers all keep working. Plugin path is additive.

Commits

sha	what
`89c5a3c`	`plugins/self_improve.py` NEW (393 lines) — full plugin with metadata, hooks, throttle, kill switch
`9509f85`	`codec_self_improve.py` docstring update — annotates the dual-trigger architecture; declares `_find_gaps`/`_draft_skill`/`_validate`/`_write_proposal`/`_existing_skill_names` + `MAX_PROPOSALS_PER_RUN`/`_GAP_KIND_TO_SIGNAL`/`_PROPOSALS_ROOT` as the stable internal-API surface the plugin imports. Zero code change.
`dc07b17`	`tests/test_self_improve_plugin.py` NEW (361 lines) — 21 tests, 100% passing

Test plan

pytest tests/test_self_improve_plugin.py -v → 21 passed in 44.28s
Per Step 4 contract: did NOT run full pytest suite (avoids the destructive-skill cascade documented in docs/INCIDENT-2026-05-01-spurious-skill-fires.md)
Post-test state files clean: pending_questions=0, Apple Reminders=0, /tmp/codec_*.txt=0
Verified codec_self_improve public surface intact (10 names: 7 helpers + 3 module constants)
Plugin metadata AST-discoverable by codec_hooks._extract_metadata without executing the module
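
For readers unfamiliar with AST-based discovery, here is a rough sketch of the idea behind the last item: reading `PLUGIN_*` constants and hook names from source text without importing it. The function `extract_metadata` and the `KNOWN_HOOKS` set are hypothetical stand-ins; how `codec_hooks._extract_metadata` is actually implemented is not shown in this PR.

```python
import ast

# Hook names taken from this PR; treating them as a fixed set is an assumption.
KNOWN_HOOKS = {"post_tool", "on_error", "on_operation_end"}


def extract_metadata(source):
    """Return (metadata dict, set of declared hooks) using only the AST."""
    tree = ast.parse(source)
    meta = {}
    hooks = set()
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id.startswith("PLUGIN_"):
                    try:
                        meta[target.id] = ast.literal_eval(node.value)
                    except (ValueError, TypeError, SyntaxError):
                        pass  # non-literal value: skip it rather than evaluate it
        elif isinstance(node, ast.FunctionDef) and node.name in KNOWN_HOOKS:
            hooks.add(node.name)
    return meta, hooks


# The raise below would fire on import, but parsing never executes it.
SAMPLE = '''
PLUGIN_NAME = "self_improve"
PLUGIN_PRIORITY = 200
PLUGIN_TOOL_FILTER = None
raise RuntimeError("only runs if the module is executed")

def post_tool(tool_name, result): ...
def on_error(tool_name, exc): ...
def on_operation_end(): ...
'''

metadata, declared_hooks = extract_metadata(SAMPLE)
```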

Test coverage breakdown

Area	Tests	What it asserts
Plugin metadata	1	AST scan finds PLUGIN_NAME / declared hooks
post_tool capture	4	OK / failure-string / self-recursion / observe-only
on_error capture	3	exception / TimeoutError / self-recursion
Drafter spawn	3	empty / sub-threshold / threshold breach
Throttle	2	blocks repeat / works at 0 (proves it's the gate)
Kill switch	5	post_tool / on_error / on_operation_end / default / aliases
Validation	2	dangerous rejected / safe accepted
End-to-end	1	`_draft_and_write` produces `.md` + `.py`

All tests redirect codec_audit._AUDIT_LOG and codec_self_improve._PROPOSALS_ROOT to tmp_path. _draft_skill is mocked everywhere — NO Qwen calls, NO Apple state, NO Terminal popups, NO osascript.

Install path

Per the trust model (AGENTS.md §3 plugin section: "local Python files curated by the user"), the plugin file lives in the repo and the user installs it manually:

cp ~/codec-repo/plugins/self_improve.py ~/.codec/plugins/self_improve.py
pm2 restart codec-dashboard codec-mcp-http open-codec
# Plugin loads on next operation in any process that uses codec_hooks.

To uninstall, either run rm ~/.codec/plugins/self_improve.py and restart the PM2 processes, or set the SELF_IMPROVE_PLUGIN_ENABLED=false env var on the affected processes.

Safety / what this PR does NOT do

No _HTTP_BLOCKED change.
No modification of codec_self_improve.py runtime behavior — only the docstring.
No osascript / subprocess calls anywhere in the plugin (per the 2026-05-01 incident contract).
No Apple Reminders / Notes / Calendar entries created.
Plugin auto-installs nothing — user must cp the file to ~/.codec/plugins/ themselves.

Per-feature kill switch

Env var	Default	Effect
`SELF_IMPROVE_PLUGIN_ENABLED`	`true`	Set to `false` → all 3 hooks no-op, buffer stops growing, no drafts. Set the variable, then restart the affected process under PM2. Test: `tests/test_self_improve_plugin.py::test_kill_switch_disables_post_tool` (plus 4 more kill-switch tests: on_error, on_operation_end, default, off-aliases).
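
A kill-switch check of this kind could be sketched as below. The off-values (false/0/no/off, case-insensitive) and the default of enabled come from the test descriptions in this PR; the function name `plugin_enabled`, the whitespace stripping, and the optional `environ` parameter are assumptions added for the example.

```python
import os

# Off-aliases listed in the PR's kill-switch tests, compared case-insensitively.
_OFF_VALUES = {"false", "0", "no", "off"}


def plugin_enabled(environ=None):
    """Return False only when the env var is set to a recognised off-alias."""
    env = os.environ if environ is None else environ
    raw = env.get("SELF_IMPROVE_PLUGIN_ENABLED")
    if raw is None:
        return True  # unset means enabled, matching the documented default
    return raw.strip().lower() not in _OFF_VALUES
```

Passing a plain dict as `environ` keeps the check testable without touching the real process environment.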

Out of scope (Phase 2)

AGENTS.md update with the new plugin path (will fold into Phase 1 sign-off doc, not this PR).
audit_report breakdown by extra.trigger (plugin_hook vs nightly_run) — additive, can land separately.
Removing the self_improve skill from MCP exposure entirely (it currently still fires when the LLM asks for it; the plugin just adds a second event-driven trigger). Decision deferred per user instruction.

Phase 1 status after this PR

Step 1 (audit envelope): merged + 24h watch clean
Step 2 (plugin lifecycle): merged + 24h watch clean
Step 3 (AskUserQuestion + stuck + step budget): merged 2026-05-01 13:47 UTC; 24h watch SKIPPED per user instruction (pattern established)
Step 4 (self_improve as plugin): THIS PR

Once this lands, Phase 1 is complete.

🤖 Generated with Claude Code

…Phase 1 Step 4 a)

Adds plugins/self_improve.py, which registers post_tool, on_error, and on_operation_end hooks per the codec_hooks contract (Phase 1 Step 2). Captures every tool fire as a signal in an in-memory ring buffer of the last 200 entries. On each operation-end, snapshots the buffer, runs the EXISTING codec_self_improve._find_gaps + _draft_skill + _validate + _write_proposal flow in a daemon thread, and emits the same skill_proposal_staged audit event as the nightly path (with a new extra.trigger="plugin_hook" so audit_report can break out the source).

Net effect: replaces the nightly polling cycle. Same proposals end up in ~/.codec/skill_proposals/YYYY-MM-DD/; only the trigger changes.

Design highlights:
- Plugin metadata via PLUGIN_NAME / PLUGIN_DESCRIPTION / PLUGIN_PRIORITY=200 (low; mostly observe-only) / PLUGIN_TOOL_FILTER=None
- Lazy-load codec_self_improve helpers on first hook fire (cheap startup, doesn't crash the AST scan if codec_self_improve is moved)
- Per-tool throttle of 30 min: prevents Qwen spam if the same gap fires every operation
- Background daemon thread for the LLM call so on_operation_end returns fast; the user's operation isn't blocked by the ~2-min Qwen draft
- Self-recursion guard: skip tool_name in {"self_improve", ""} to avoid the plugin analyzing its own emits
- Kill switch: SELF_IMPROVE_PLUGIN_ENABLED env var (default true). Set false → all hooks no-op, buffer stops growing, no drafts.
- NO osascript, NO subprocess, NO Apple Reminders / Notes / Calendar (per the 2026-05-01 incident contract)

Coexistence: codec_self_improve.run_once() is unchanged. CLI invocation (`python3 codec_self_improve.py`), skill (`self_improve`), or autopilot trigger all keep working. Plugin path is ADDITIVE: both paths share the same _find_gaps + _draft_skill + _write_proposal helpers and write to the same proposal directory.

Install (manual, per trust model):
  cp ~/codec-repo/plugins/self_improve.py ~/.codec/plugins/self_improve.py
  pm2 restart codec-dashboard codec-mcp-http open-codec

Test surface: exposes _reset_state_for_test, _set_throttle_seconds_for_test, _get_signals_snapshot_for_test for tests/test_self_improve_plugin.py (commit b, next).
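
The per-tool throttle mentioned in the design highlights could be sketched as follows. The 30-minute window and the "throttle of 0 lets everything through" behaviour come from this PR; the `ToolThrottle` class, its method names, and the injectable clock are hypothetical and exist only to make the example self-contained and testable.

```python
import time


class ToolThrottle:
    """Allow at most one draft per tool within a sliding time window."""

    def __init__(self, window_seconds=30 * 60, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so tests need not sleep
        self._last_allowed = {}      # tool name -> time of last allowed draft

    def allow(self, tool_name):
        """Return True and record the time if the tool is outside its window."""
        now = self._clock()
        last = self._last_allowed.get(tool_name)
        if last is not None and now - last < self.window_seconds:
            return False
        self._last_allowed[tool_name] = now
        return True
```

With `window_seconds=0` the comparison `now - last < 0` is never true, so every call is allowed. A test can use that to show the throttle, and nothing else, is what blocks a repeat draft.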

… plugin paths (Phase 1 Step 4 b)

Updates the module docstring to make explicit the dual-trigger architecture introduced in Phase 1 Step 4:

[1] Legacy nightly / on-demand: run_once(target_date). CLI / skill / autopilot paths unchanged.
[2] Plugin path: plugins/self_improve.py post_tool / on_error / on_operation_end hooks; same _find_gaps + _draft_skill + _validate + _write_proposal helpers; daemon thread for the LLM call; per-tool throttle.

No code change. The module's public/internal-stable surface (_existing_skill_names, _load_audit_for, _find_gaps, _draft_skill, _validate, _write_proposal, run_once + MAX_PROPOSALS_PER_RUN, _GAP_KIND_TO_SIGNAL, _PROPOSALS_ROOT constants) is documented as the plugin's import target. Renaming any of these is a breaking change for the plugin AND for any future test_self_improve.py.

Verified the surface still imports cleanly:
  $ python3 -c "import codec_self_improve; ..."
  ✓ all 10 names present.

Per Step 4 plan: codec_self_improve.py is the SHIM (helpers + legacy trigger); plugins/self_improve.py is the NEW trigger that imports the shim. Both write to ~/.codec/skill_proposals/YYYY-MM-DD/. Existing test surface and import paths preserved.
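
A surface check along these lines is one way to guard the ten names against accidental renames. This is not the elided `python3 -c` command from the commit message, whose body is not shown; `missing_names` is a hypothetical helper, and the demo runs against a stand-in module object because codec_self_improve itself is not available here.

```python
from types import ModuleType

# The ten names this commit documents as the stable import surface.
STABLE_SURFACE = (
    "_existing_skill_names", "_load_audit_for", "_find_gaps", "_draft_skill",
    "_validate", "_write_proposal", "run_once",
    "MAX_PROPOSALS_PER_RUN", "_GAP_KIND_TO_SIGNAL", "_PROPOSALS_ROOT",
)


def missing_names(module, names=STABLE_SURFACE):
    """Return the names from the stable surface that the module lacks."""
    return [name for name in names if not hasattr(module, name)]


# Stand-in module so the sketch runs without the real codec_self_improve.
stand_in = ModuleType("codec_self_improve_stand_in")
for _name in STABLE_SURFACE:
    setattr(stand_in, _name, object())
```

Against the real module, the same call would be `missing_names(codec_self_improve)` after importing it, with an empty list meaning the surface is intact.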

…plugin path (Phase 1 Step 4 c)

Validates plugins/self_improve.py end-to-end. 21 tests covering:

Plugin metadata (1):
- AST-discoverable by codec_hooks._extract_metadata: PLUGIN_NAME="self_improve", PLUGIN_PRIORITY=200, PLUGIN_TOOL_FILTER=None, declared hooks = {post_tool, on_error, on_operation_end}.

post_tool signal capture (4):
- Successful call → buffer entry with outcome="ok"
- Failure-string heuristic → outcome="error", error_type="ResultStringError" (matches "Error:", "Skill X failed:", etc.)
- Self-recursion guard → tool_name="self_improve" + "" both skipped
- Returns None always (observe-only contract)

on_error signal capture (3):
- Generic exception → outcome="error", error_type captured
- TimeoutError-like → outcome="timeout" (distinct gap kind for codec_self_improve._find_gaps)
- Self-recursion guard same as post_tool

on_operation_end + drafter spawn (3):
- Empty buffer → no thread spawned
- Sub-threshold (1 unknown-tool call < the ≥2 missing_tool threshold) → no thread spawned
- Threshold breach (2 unknown-tool calls) → drafter thread spawned with target=_draft_and_write, daemon=True

Throttle (2):
- Same-tool repeat within window → second draft blocked
- Throttle=0 → both rounds spawn (proves throttle was the gating)

Kill switch (5):
- SELF_IMPROVE_PLUGIN_ENABLED=false → post_tool no-op
- Same → on_error no-op
- Same → on_operation_end no-op even with full buffer + threshold
- Default (env unset) → enabled=True
- All off-aliases (false/0/no/off/FALSE/Off) → enabled=False

Validation gate (2):
- Dangerous code (os.system) → _validate rejects
- Safe code → _validate accepts

End-to-end (1):
- Mock _draft_skill → run _draft_and_write → assert .md + .py written to _PROPOSALS_ROOT/YYYY-MM-DD/, .md contains skill name + PASSED/REJECTED status

Test isolation (per the 2026-05-01 incident lessons):
- codec_audit._AUDIT_LOG redirected to tmp_path (no real audit writes)
- codec_self_improve._PROPOSALS_ROOT redirected to tmp_path (no real proposals to ~/.codec/skill_proposals/)
- _draft_skill monkeypatched to return canned data (NO Qwen calls)
- threading.Thread mock-captured to assert spawn without actually running drafter (in spawn-assertion tests)
- Plugin state reset via _reset_state_for_test fixture per test
- NO osascript, NO subprocess, NO Apple Reminders / Notes / Calendar

Test run result: 21 passed in 44.28s (Qwen draft is mocked everywhere; the 44s is mostly module-import + pytest collection overhead).

Post-test state files verified clean:
- ~/.codec/pending_questions.json: 0 entries
- Apple Reminders incomplete: 0
- /tmp/codec_*.txt: 0 files

Per Step 4 contract: no full-suite pytest run, no fixture leak risk beyond this file's own monkeypatches.
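
The signal classification these tests exercise could look roughly like the sketch below. The outcome values ("ok", "error", "timeout") and the "ResultStringError" label come from the test descriptions; the regular expression, the function names, and the rule for what counts as "TimeoutError-like" are assumptions, since the plugin's actual heuristics are only summarised here as matching "Error:", "Skill X failed:", etc.

```python
import re

# Assumed pattern: covers only the two examples named in the commit message.
_FAILURE_RE = re.compile(r"^\s*(Error:|Skill \S+ failed:)")


def classify_result(result):
    """Map a tool's return value to (outcome, error_type)."""
    if isinstance(result, str) and _FAILURE_RE.match(result):
        return ("error", "ResultStringError")
    return ("ok", None)


def classify_exception(exc):
    """Map an exception to (outcome, error_type), separating timeouts."""
    name = type(exc).__name__
    # Assumption: "TimeoutError-like" means a TimeoutError subclass or a
    # class whose name mentions "timeout".
    if isinstance(exc, TimeoutError) or "timeout" in name.lower():
        return ("timeout", name)
    return ("error", name)
```

Keeping timeouts as their own outcome lets a downstream gap finder treat slow tools differently from broken ones, which is the distinction the on_error tests call out.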

Mikarina13 added 3 commits May 1, 2026 16:02

AVADSA25 merged commit 9858934 into main May 1, 2026
1 check passed

AVADSA25 merged 3 commits into main from phase1-step4-self-improve-as-hook
