feat: learn standing orders from mission patterns (#87) #120
Merged
Implements a Darwin Gödel Machine-inspired pipeline that detects candidate standing orders from accumulated mission data and surfaces them for human review.

Pipeline (in scripts/nelson_data_patterns.py):
- Mine: cluster `avoid` text patterns across missions via Jaccard
- Score: Fisher's exact test for outcome correlation + log-odds ratio
- Filter: drop low-confidence patterns, patterns covered by existing orders (token containment), and previously dismissed candidates
- Synthesize: FM-assisted prose with heuristic-stub fallback if no client is wired or the response is malformed
- Persist: append to .nelson/memory/candidate-standing-orders.json

CLI: `nelson-data detect-patterns | promote-candidate | dismiss-candidate`. Promotion writes a new `.md` under references/standing-orders/ with an audit-lineage comment listing the evidence mission IDs, and appends a row to the SKILL.md lookup table. Dismissal moves the entry to a dismissed archive so re-runs cannot resurface it.

Hard constraint: candidates may only ADD new standing orders, never modify or remove existing ones. This mitigates the objective-hacking failure mode documented in DGM Appendix H.

The Intelligence Brief now surfaces a `CANDIDATE STANDING ORDERS (awaiting review): N` line when the queue is non-empty.

35 new tests cover statistical primitives, mining, novelty, end-to-end detection, FM-call robustness, ranking, promotion, dismissal, CLI smoke, and brief surfacing. Full suite remains green (338 passing).
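The Score step pairs Fisher's exact test with a log-odds ratio. As a rough illustration only, the sketch below computes both for a 2x2 table of pattern presence against mission outcome. The function names, the one-sided test, the table layout, and the 0.5 continuity correction are all assumptions made for this example; the module's own choices (including its sign convention for correlation) are not shown in this PR text and may differ.

```python
from math import comb, log


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
      a = missions with the pattern that failed
      b = missions with the pattern that succeeded
      c = missions without the pattern that failed
      d = missions without the pattern that succeeded
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    with_pattern = a + b
    failures = a + c
    total = a + b + c + d
    upper = min(with_pattern, failures)
    tail = sum(
        comb(with_pattern, k) * comb(total - with_pattern, failures - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, failures)


def log_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Log-odds ratio with a 0.5 correction so zero cells stay finite.

    With the layout above, a positive value means the pattern is
    associated with failure. This sign convention is an assumption.
    """
    return log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


if __name__ == "__main__":
    # Three failing missions all carry the pattern; three passing ones do not.
    print(fisher_exact_greater(3, 0, 0, 3))  # 1 / C(6, 3) = 0.05
    print(log_odds_ratio(3, 0, 0, 3))        # log(49)
```

Using both numbers together lets a filter require that an effect is unlikely under chance (small p-value) and also large in a known direction (log-odds sign and magnitude).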
Multi-perspective code review of the original commit surfaced one CRITICAL,
several HIGH-severity correctness/safety issues, missing regression tests
for the headline DGM safety invariant, and undocumented CLI surface. This
commit addresses them as a single fix pass — no scope changes.
Pipeline correctness
- Cluster fingerprint is now stable under input reordering. Replaced the
greedy single-pass clusterer with an order-independent union-find pass
and computed the fingerprint over the union of all variant tokens. The
canonical text is now the shortest variant so reruns pick a deterministic
representative. Without this, the dismissed-candidate archive could not
guarantee that dismissed patterns stay dismissed.
- Polarity guard at the candidacy gate. The filter now requires
``correlation < 0`` so a success-correlated avoid-text is never surfaced
as an anti-pattern candidate (its remedy would be the inverse of what
the data shows).
- ``_mine_event_sequences`` and ``_score_pattern`` skip records missing
``mission_id`` instead of raising ``KeyError`` — one corrupt entry no
longer takes down the whole brief surfacing path.
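To illustrate the order-independent clustering described in the first bullet above, here is a minimal sketch, not the code from this commit. It assumes whitespace tokenisation, a Jaccard threshold of 0.5, and a truncated SHA-256 fingerprint; the names ``cluster``, ``tokens`` and ``jaccard`` are hypothetical.

```python
import hashlib
from itertools import combinations


def tokens(text: str) -> frozenset[str]:
    # Assumed tokeniser: lowercase, split on whitespace.
    return frozenset(text.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts: list[str], threshold: float = 0.5) -> list[dict]:
    """Group texts into connected components of the similarity graph."""
    unique = sorted(set(texts))  # dedupe and sort so input order is irrelevant
    parent = list(range(len(unique)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Union every pair whose similarity clears the threshold.
    for i, j in combinations(range(len(unique)), 2):
        if jaccard(tokens(unique[i]), tokens(unique[j])) >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    groups: dict[int, list[str]] = {}
    for i, text in enumerate(unique):
        groups.setdefault(find(i), []).append(text)

    clusters = []
    for variants in groups.values():
        # Fingerprint over the union of all variant tokens, in sorted order.
        all_tokens = sorted(set().union(*(tokens(v) for v in variants)))
        digest = hashlib.sha256(" ".join(all_tokens).encode("utf-8")).hexdigest()
        # Shortest variant as canonical text; ties broken alphabetically.
        canonical = min(variants, key=lambda v: (len(v), v))
        clusters.append(
            {"fingerprint": digest[:12], "canonical": canonical, "variants": variants}
        )
    return sorted(clusters, key=lambda c: c["fingerprint"])
```

Because cluster membership depends only on which pairs clear the threshold, and the fingerprint depends only on the sorted token union, shuffling the input yields the same fingerprints. That is the property the dismissed-candidate archive relies on.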
Safety / file-system writes
- ``promote_candidate`` re-slugifies the candidate title at the promotion
boundary so a hand-edited queue entry like ``../../../tmp/pwn`` cannot
escape the standing-orders directory.
- The SKILL.md table insertion is now anchored to the ``## Standing Orders``
heading and only scans rows within that section, so a similar-shaped row
elsewhere in SKILL.md (e.g. Damage Control) can no longer become the
insertion point.
- Trigger text is flattened before insertion so newlines / pipes cannot
break out of the table cell into arbitrary markdown (extra rows,
headings, prose) that Claude would later read as skill instructions.
- The SKILL.md mutation is now pre-flighted: if the heading or table is
missing, promotion fails before touching disk. If the SKILL.md write
fails after the standing-order .md has been written, the .md is rolled
back so promotion is transactional from the caller's perspective.
- Both ``.md`` writes (standing-order body and SKILL.md) now go through an
atomic tempfile + os.replace helper so a crash mid-write cannot leave a
torn file behind.
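The write-safety measures above can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the helpers from this commit: the names ``slugify``, ``flatten_cell`` and ``atomic_write_text`` are hypothetical, and replacing pipes with ``/`` is just one possible way to neutralise them.

```python
import os
import re
import tempfile
from pathlib import Path


def slugify(title: str) -> str:
    """Reduce a title to lowercase alphanumerics joined by hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title contains no slug-safe characters")
    return slug


def flatten_cell(text: str) -> str:
    """Collapse all whitespace (including newlines) and drop table pipes."""
    return " ".join(text.replace("|", "/").split())


def atomic_write_text(path: Path, content: str) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    print(slugify("../../../tmp/pwn"))             # tmp-pwn
    print(flatten_cell("trigger\n## Injected | row"))
```

Creating the temp file next to the target keeps both on one filesystem, which ``os.replace`` needs in order to swap atomically.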
CLI consistency
- ``detect-patterns`` / ``promote-candidate`` / ``dismiss-candidate`` now
accept ``--missions-dir`` and derive ``memory_dir`` as
``{missions_dir}/../memory`` — the same rule that ``cmd_brief`` and
``nelson_data_memory`` use. Before, running ``detect-patterns`` from a
non-default ``--missions-dir`` could write to a memory dir that ``brief``
would not read.
- ``--candidate-id`` is now a named flag on promote and dismiss, matching
the flag-only convention used by every other subcommand.
- ``detect-patterns`` gains ``--json`` for machine-readable output.
- ``cmd_brief`` narrows its bare ``except Exception`` around the candidate
count to ``(OSError, ValueError, json.JSONDecodeError)`` so a real bug
in the candidates module surfaces in tests instead of being swallowed.
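A minimal ``argparse`` sketch of the flag layout described above, for orientation only. The flag and subcommand names come from this commit message; the default missions path, the ``build_parser`` and ``memory_dir_for`` helpers, and the use of ``Path.parent`` to express ``{missions_dir}/../memory`` are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nelson-data")
    sub = parser.add_subparsers(dest="command", required=True)

    # Shared options, inherited by all three subcommands.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument(
        "--missions-dir",
        type=Path,
        default=Path(".nelson/missions"),  # assumed default, not confirmed by the PR
    )

    detect = sub.add_parser("detect-patterns", parents=[common])
    detect.add_argument("--json", action="store_true")

    for name in ("promote-candidate", "dismiss-candidate"):
        cmd = sub.add_parser(name, parents=[common])
        cmd.add_argument("--candidate-id", required=True)
    return parser


def memory_dir_for(missions_dir: Path) -> Path:
    # Sibling "memory" directory next to the missions directory.
    return missions_dir.parent / "memory"


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["detect-patterns", "--missions-dir", "/data/proj/missions", "--json"]
    )
    print(memory_dir_for(args.missions_dir))
```

Deriving the memory directory in one helper, rather than per subcommand, is what keeps ``detect-patterns`` and ``brief`` pointed at the same location.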
Tests
- 15 new regression tests covering: cluster ID stability under reordering;
polarity gate filters success-correlated patterns; path-traversal title
is rejected at promotion; newline / pipe injection in trigger is
neutralised; SKILL.md row is inserted under the right section; promotion
is idempotent on second call; missing SKILL.md raises before writing the
standing-order .md; missing table aborts atomically; the add-only
invariant (existing standing-order files and SKILL.md rows are
untouched) is verified; first-run with zero candidates does not litter
the memory dir; ``--missions-dir`` derivation works from a non-project
CWD; missing ``mission_id`` records are skipped; ``--json`` emits a
structured summary. Full Python suite 338 → 353 passing.
Documentation
- ``references/structured-data.md`` documents all three new subcommands
(``detect-patterns``, ``promote-candidate``, ``dismiss-candidate``)
with arguments, exit semantics, and the new ``--missions-dir`` rule.
harrymunro added a commit that referenced this pull request on May 12, 2026
…e.md Replace the inline project-structure listing in both files with a short "Key references" section pointing at the new docs/project_structure.md and README.md. docs/project_structure.md captures the full repository layout in one place, including files previously missing from CLAUDE.md (circuit breakers, conflict radar, the full admiralty-templates and damage-control sets) and the learned-standing-orders pipeline from PR #120.
Summary
- Pipeline (`skills/nelson/scripts/nelson_data_patterns.py`) that mines `avoid` patterns from accumulated mission data, scores them with Fisher's exact + log-odds, filters against existing orders and a dismissed archive, and synthesises candidate standing orders for human review (with a heuristic-stub fallback when no FM client is wired).
- `nelson-data.py`: `detect-patterns`, `promote-candidate`, `dismiss-candidate`. Promotion writes a new `.md` under `references/standing-orders/` with an audit-lineage comment and adds a row to the SKILL.md lookup table; dismissal archives the fingerprint so re-runs cannot resurface it.
- `CANDIDATE STANDING ORDERS (awaiting review): N` in the Intelligence Brief (text + JSON) when the queue is non-empty.
- 35 new tests in `test_nelson_data_patterns.py`; full Python suite remains green (338 passing).

Closes #87.
Design notes
- Review score is `confidence × (1 + novelty)` rather than the plan's `sigmoid(confidence) × (1 + novelty)`. Sigmoid maps `[0, 1]` confidence into `[0.5, 0.73]`, which is too compressed to discriminate when novelty's range is `[1, 2]`. Documented inline in `_review_score`.
- `detect-patterns` skips writing an empty queue file on first runs to avoid littering the memory dir.

Test plan
- `pytest skills/nelson/scripts/test_nelson_data_patterns.py`: 35/35 pass
- `pytest skills/nelson/scripts/`: 338/338 pass
- `ruff check` clean on new code
- … `min_missions=10`): 0 candidates, no crash, no empty file
- … `.md` (header / Trigger / Symptoms / Remedy / Related orders / audit lineage) and appends row to SKILL.md table
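The ranking choice in the design notes can be checked numerically. The sketch below compares the two formulas for a low-confidence and a high-confidence candidate at equal novelty. It assumes confidence and novelty both lie in `[0, 1]`; the function names are hypothetical and are not the `_review_score` implementation.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear_score(confidence: float, novelty: float) -> float:
    # Formula described in the design notes.
    return confidence * (1.0 + novelty)


def sigmoid_score(confidence: float, novelty: float) -> float:
    # Formula from the original plan, shown for comparison.
    return sigmoid(confidence) * (1.0 + novelty)


if __name__ == "__main__":
    # sigmoid(0) is exactly 0.5 and sigmoid(1) is about 0.731, so confidence
    # in [0, 1] can only move the sigmoid term within that narrow band.
    low, high, novelty = 0.1, 0.9, 0.5
    for name, score in (("linear", linear_score), ("sigmoid", sigmoid_score)):
        ratio = score(high, novelty) / score(low, novelty)
        print(f"{name}: high/low score ratio = {ratio:.2f}")
```

With the linear form, the high-confidence candidate scores nine times the low-confidence one. With the sigmoid form the ratio stays below 1.5, so novelty would dominate the ordering.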