…Phase 1 Step 4 a)
Adds plugins/self_improve.py — registers post_tool, on_error, and
on_operation_end hooks per the codec_hooks contract (Phase 1 Step 2).
Captures every tool fire as a signal in an in-memory ring buffer of the
last 200 entries. On each operation-end, snapshots the buffer, runs the
EXISTING codec_self_improve._find_gaps + _draft_skill + _validate +
_write_proposal flow in a daemon thread, and emits the same
skill_proposal_staged audit event as the nightly path (with a new
extra.trigger="plugin_hook" so audit_report can break out the source).
Net effect: replaces the nightly polling cycle. Same proposals end up
in ~/.codec/skill_proposals/YYYY-MM-DD/ — only the trigger changes.
Design highlights:
- Plugin metadata via PLUGIN_NAME / PLUGIN_DESCRIPTION /
PLUGIN_PRIORITY=200 (low — observe-only mostly) / PLUGIN_TOOL_FILTER=None
- Lazy-load codec_self_improve helpers on first hook fire (cheap startup,
doesn't crash the AST scan if codec_self_improve is moved)
- Per-tool throttle of 30 min: prevents Qwen spam if the same gap fires
on every operation
- Background daemon thread for the LLM call so on_operation_end returns
fast; the user's operation isn't blocked by the ~2-min Qwen draft
- Self-recursion guard: skip tool_name in {"self_improve", ""} to avoid
the plugin analyzing its own emits
- Kill switch: SELF_IMPROVE_PLUGIN_ENABLED env var (default true).
Set false → all hooks no-op, buffer stops growing, no drafts.
- NO osascript, NO subprocess, NO Apple Reminders / Notes / Calendar
(per the 2026-05-01 incident contract)
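The lazy-load highlight above could look roughly like the following. This is a sketch under assumptions: load_helpers and _cache are hypothetical names, and the real plugin may resolve its helpers differently. The point it illustrates is that a missing codec_self_improve module degrades to a no-op instead of failing at plugin import time.

```python
import importlib

_cache = {}  # module name -> imported module (hypothetical cache)


def load_helpers(module_name="codec_self_improve"):
    """Import on first use; return None, and retry next time, if missing."""
    module = _cache.get(module_name)
    if module is None:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            return None  # caller treats this as "skip drafting"
        _cache[module_name] = module
    return module
```

Because nothing is imported at module top level, an AST-only scan of the plugin file never touches the helper module at all.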
Coexistence: codec_self_improve.run_once() is unchanged. CLI invocation
(`python3 codec_self_improve.py`), skill (`self_improve`), or autopilot
trigger all keep working. Plugin path is ADDITIVE — both paths share
the same _find_gaps + _draft_skill + _write_proposal helpers and write
to the same proposal directory.
Install (manual, per trust model):
cp ~/codec-repo/plugins/self_improve.py ~/.codec/plugins/self_improve.py
pm2 restart codec-dashboard codec-mcp-http open-codec
Test surface: exposes _reset_state_for_test, _set_throttle_seconds_for_test,
_get_signals_snapshot_for_test for tests/test_self_improve_plugin.py
(commit b — next).
… plugin paths (Phase 1 Step 4 b)
Updates the module docstring to make explicit the dual-trigger
architecture introduced in Phase 1 Step 4:
[1] Legacy nightly / on-demand — run_once(target_date)
CLI / skill / autopilot paths unchanged.
[2] Plugin path — plugins/self_improve.py
post_tool / on_error / on_operation_end hooks; same
_find_gaps + _draft_skill + _validate + _write_proposal
helpers; daemon thread for the LLM call; per-tool throttle.
No code change. The module's public/internal-stable surface
(_existing_skill_names, _load_audit_for, _find_gaps, _draft_skill,
_validate, _write_proposal, run_once + MAX_PROPOSALS_PER_RUN,
_GAP_KIND_TO_SIGNAL, _PROPOSALS_ROOT constants) is documented as
the plugin's import target — renaming any of these is a breaking
change for the plugin AND for any future test_self_improve.py.
Verified the surface still imports cleanly:
$ python3 -c "import codec_self_improve; ..."
✓ all 10 names present.
Per Step 4 plan: codec_self_improve.py is the SHIM (helpers + legacy
trigger); plugins/self_improve.py is the NEW trigger that imports
the shim. Both write to ~/.codec/skill_proposals/YYYY-MM-DD/.
Existing test surface and import paths preserved.
…plugin path (Phase 1 Step 4 c)
Validates plugins/self_improve.py end-to-end. 21 tests covering:
Plugin metadata (1):
- AST-discoverable by codec_hooks._extract_metadata: PLUGIN_NAME=
"self_improve", PLUGIN_PRIORITY=200, PLUGIN_TOOL_FILTER=None,
declared hooks = {post_tool, on_error, on_operation_end}.
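To illustrate what "AST-discoverable" can mean, here is a sketch that reads PLUGIN_* constants and hook function names from source text without executing it. It is not the actual codec_hooks._extract_metadata; extract_metadata, HOOK_NAMES, and SOURCE are hypothetical, and the hook set is limited to the three hooks named in this commit.

```python
import ast

# Hypothetical plugin source; the hook body would raise if it ever ran.
SOURCE = '''
PLUGIN_NAME = "self_improve"
PLUGIN_PRIORITY = 200
PLUGIN_TOOL_FILTER = None

def post_tool(*args, **kwargs):
    raise RuntimeError("never runs during the scan")

def on_error(*args, **kwargs):
    return None
'''

HOOK_NAMES = {"post_tool", "on_error", "on_operation_end"}


def extract_metadata(source):
    """Collect literal PLUGIN_* assignments and declared hook names."""
    meta, hooks = {}, set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            if isinstance(target, ast.Name) and target.id.startswith("PLUGIN_"):
                # literal_eval only accepts literals, so no code is executed
                meta[target.id] = ast.literal_eval(node.value)
        elif isinstance(node, ast.FunctionDef) and node.name in HOOK_NAMES:
            hooks.add(node.name)
    return meta, hooks


meta, hooks = extract_metadata(SOURCE)
```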
post_tool signal capture (4):
- Successful call → buffer entry with outcome="ok"
- Failure-string heuristic → outcome="error", error_type=
"ResultStringError" (matches "Error:", "Skill X failed:", etc.)
- Self-recursion guard → tool_name="self_improve" + "" both skipped
- Returns None always (observe-only contract)
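A sketch of the post_tool classification covered by these four tests. The regular expression is an assumption: the commit message names only "Error:" and "Skill X failed:" as examples, so the real heuristic may match more. classify, _FAILURE_RE, and _SKIP_TOOLS are hypothetical names.

```python
import re

# Assumed patterns, based only on the two examples named above.
_FAILURE_RE = re.compile(r"^(Error:|Skill \S+ failed:)")
_SKIP_TOOLS = {"self_improve", ""}  # self-recursion guard


def classify(tool_name, result):
    """Return a signal dict, or None when the tool must be skipped."""
    if tool_name in _SKIP_TOOLS:
        return None
    if isinstance(result, str) and _FAILURE_RE.match(result.strip()):
        return {"tool": tool_name, "outcome": "error",
                "error_type": "ResultStringError"}
    return {"tool": tool_name, "outcome": "ok"}
```

In an observe-only hook, the signal would be appended to the buffer and the hook itself would still return None, so the tool result passes through unmodified.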
on_error signal capture (3):
- Generic exception → outcome="error", error_type captured
- TimeoutError-like → outcome="timeout" (distinct gap kind for
codec_self_improve._find_gaps)
- Self-recursion guard same as post_tool
on_operation_end + drafter spawn (3):
- Empty buffer → no thread spawned
- Sub-threshold (1 unknown-tool call, below the missing_tool threshold of 2)
→ no thread spawned
- Threshold breach (2 unknown-tool calls) → drafter thread spawned
with target=_draft_and_write, daemon=True
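The three cases above can be sketched as follows. The "unknown_tool" outcome label, maybe_spawn_drafter, and MISSING_TOOL_THRESHOLD are hypothetical; only the threshold of 2 and daemon=True come from the commit message, and the real gap detection lives in codec_self_improve._find_gaps.

```python
import threading
from collections import Counter

MISSING_TOOL_THRESHOLD = 2  # threshold value taken from the text above


def maybe_spawn_drafter(signals, draft_fn):
    """Start a daemon drafter if any unknown tool reaches the threshold."""
    counts = Counter(
        s["tool"] for s in signals if s.get("outcome") == "unknown_tool"
    )
    gaps = sorted(t for t, n in counts.items() if n >= MISSING_TOOL_THRESHOLD)
    if not gaps:
        return None  # empty buffer or sub-threshold: nothing to draft
    thread = threading.Thread(target=draft_fn, args=(gaps,), daemon=True)
    thread.start()  # returns immediately; the slow draft runs in background
    return thread
```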
Throttle (2):
- Same-tool repeat within window → second draft blocked
- Throttle=0 → both rounds spawn (proves the throttle was the gating factor)
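A per-tool throttle along these lines would produce the behavior both tests check. This is a sketch, not the plugin's code: allow_draft and _last_draft are hypothetical names, and the injectable now parameter exists only to make the example deterministic.

```python
import time

_last_draft = {}  # tool name -> monotonic timestamp of the last allowed draft


def allow_draft(tool_name, throttle_seconds=30 * 60, now=None):
    """Return True and record the time unless the tool drafted too recently."""
    now = time.monotonic() if now is None else now
    last = _last_draft.get(tool_name)
    if last is not None and now - last < throttle_seconds:
        return False
    _last_draft[tool_name] = now
    return True
```

With throttle_seconds=0 the comparison can never be true, so every round is allowed, which matches the second test case.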
Kill switch (5):
- SELF_IMPROVE_PLUGIN_ENABLED=false → post_tool no-op
- Same → on_error no-op
- Same → on_operation_end no-op even with full buffer + threshold
- Default (env unset) → enabled=True
- All off-aliases (false/0/no/off/FALSE/Off) → enabled=False
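The alias handling tested here could be implemented roughly as below. plugin_enabled and _OFF_VALUES are hypothetical names; the env var name, the default of true, and the list of off-aliases come from the commit message.

```python
import os

_OFF_VALUES = {"false", "0", "no", "off"}  # compared case-insensitively


def plugin_enabled(environ=None):
    """Read the kill switch; unset means enabled."""
    environ = os.environ if environ is None else environ
    raw = environ.get("SELF_IMPROVE_PLUGIN_ENABLED", "true")
    return raw.strip().lower() not in _OFF_VALUES
```

Each hook would call this first and return None immediately when it is False, which is what keeps the buffer from growing.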
Validation gate (2):
- Dangerous code (os.system) → _validate rejects
- Safe code → _validate accepts
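How _validate decides is not shown in this PR. As an illustration only, a deny-list check over the AST would reject the os.system case and accept plain code; looks_safe and _BLOCKED_CALLS are hypothetical and much narrower than a real validation gate would need to be.

```python
import ast

_BLOCKED_CALLS = {("os", "system")}  # assumed deny-list for the sketch


def looks_safe(code):
    """Reject unparsable code and direct calls on the deny-list."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            owner = node.func.value
            if (isinstance(owner, ast.Name)
                    and (owner.id, node.func.attr) in _BLOCKED_CALLS):
                return False
    return True
```

A check this simple is bypassed by aliasing (for example "from os import system"), so it should be read as a shape for the test cases rather than as a sufficient safety gate.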
End-to-end (1):
- Mock _draft_skill → run _draft_and_write → assert .md + .py written
to _PROPOSALS_ROOT/YYYY-MM-DD/, .md contains skill name +
PASSED/REJECTED status
Test isolation (per the 2026-05-01 incident lessons):
- codec_audit._AUDIT_LOG redirected to tmp_path (no real audit writes)
- codec_self_improve._PROPOSALS_ROOT redirected to tmp_path (no real
proposals to ~/.codec/skill_proposals/)
- _draft_skill monkeypatched to return canned data (NO Qwen calls)
- threading.Thread mock-captured to assert spawn without actually
running drafter (in spawn-assertion tests)
- Plugin state reset via _reset_state_for_test fixture per test
- NO osascript, NO subprocess, NO Apple Reminders / Notes / Calendar
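The "threading.Thread mock-captured" item can be illustrated with the standard library's unittest.mock. The test file itself uses pytest fixtures; this sketch swaps in mock.patch so it runs standalone, and on_operation_end_stub is a hypothetical stand-in for the real hook.

```python
import threading
from unittest import mock


def on_operation_end_stub(draft_fn):
    """Hypothetical stand-in for the hook: spawns a daemon drafter."""
    threading.Thread(target=draft_fn, daemon=True).start()


def drafter():
    raise AssertionError("drafter must not run in a spawn-assertion test")


# While patched, threading.Thread is a MagicMock, so no real thread starts.
with mock.patch("threading.Thread") as thread_cls:
    on_operation_end_stub(drafter)

spawn_kwargs = thread_cls.call_args.kwargs
```

The test can then assert on the captured target and daemon flag without ever invoking the drafter.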
Test run result: 21 passed in 44.28s (Qwen draft is mocked everywhere;
the 44s is mostly module-import + pytest collection overhead).
Post-test state files verified clean:
- ~/.codec/pending_questions.json: 0 entries
- Apple Reminders incomplete: 0
- /tmp/codec_*.txt: 0 files
Per Step 4 contract: no full-suite pytest run, no fixture leak risk
beyond this file's own monkeypatches.
Summary
Migrates codec_self_improve from nightly polling to an event-driven hook plugin per Phase 1 Step 4. Same proposals end up in ~/.codec/skill_proposals/YYYY-MM-DD/; only the trigger changes.
Architecture
codec_self_improve.run_once() is unchanged: CLI / skill / autopilot triggers all keep working. Plugin path is additive.
Commits
- 89c5a3c: plugins/self_improve.py NEW (393 lines). Full plugin with metadata, hooks, throttle, kill switch.
- 9509f85: codec_self_improve.py docstring update. Annotates the dual-trigger architecture; declares _find_gaps / _draft_skill / _validate / _write_proposal / _existing_skill_names + MAX_PROPOSALS_PER_RUN / _GAP_KIND_TO_SIGNAL / _PROPOSALS_ROOT as the stable internal-API surface the plugin imports. Zero code change.
- dc07b17: tests/test_self_improve_plugin.py NEW (361 lines). 21 tests, 100% passing.
Test plan
- pytest tests/test_self_improve_plugin.py -v → 21 passed in 44.28s
- … docs/INCIDENT-2026-05-01-spurious-skill-fires.md)
- … pending_questions=0, Apple Reminders=0, /tmp/codec_*.txt=0
- codec_self_improve public surface intact (10 names: 7 helpers + 3 module constants)
- … codec_hooks._extract_metadata without executing the module
Test coverage breakdown
- … _draft_and_write produces .md + .py
All tests redirect codec_audit._AUDIT_LOG and codec_self_improve._PROPOSALS_ROOT to tmp_path. _draft_skill is mocked everywhere: NO Qwen calls, NO Apple state, NO Terminal popups, NO osascript.
Install path
Per the trust model (AGENTS.md §3 plugin section: "local Python files curated by the user"), the plugin file lives in the repo and the user installs it manually:
cp ~/codec-repo/plugins/self_improve.py ~/.codec/plugins/self_improve.py
pm2 restart codec-dashboard codec-mcp-http open-codec
To uninstall: rm ~/.codec/plugins/self_improve.py + PM2 restart, OR set the SELF_IMPROVE_PLUGIN_ENABLED=false env var on the affected processes.
Safety / what this PR does NOT do
- … _HTTP_BLOCKED change.
- … codec_self_improve.py runtime behavior; only the docstring.
- … osascript / subprocess calls anywhere in the plugin (per the 2026-05-01 incident contract).
- … cp the file to ~/.codec/plugins/ themselves.
Per-feature kill switch
SELF_IMPROVE_PLUGIN_ENABLED (default: true): false → all 3 hooks no-op, buffer stops growing, no drafts. Set on PM2 restart of the affected process. Test: tests/test_self_improve_plugin.py::test_kill_switch_disables_post_tool (and 4 more aliases).
Out of scope (Phase 2)
- … extra.trigger (plugin_hook vs nightly_run): additive, can land separately.
- … self_improve skill from MCP exposure entirely (it currently still fires when the LLM asks for it; the plugin just adds a second event-driven trigger). Decision deferred per user instruction.
Phase 1 status after this PR
Once this lands, Phase 1 is complete.
🤖 Generated with Claude Code