feat(experiments): rule engine POC for OODA Decide quadrant by Luis85 · Pull Request #525 · Luis85/agentic-workflow

Luis85 · 2026-05-17T11:31:45Z

Summary

Terminal-only TypeScript proof-of-concept of the "LLM extracts, rules decide" pattern — the LLM is constrained to producing structured flags from raw signals, and a deterministic rule engine maps those flags to a verdict tier with a fully replayable audit trail. The pattern comes from the AI fact-checking community (see experiments/rule-engine-poc/README.md for the source quote) and fits the OODA orchestrator concept (#502) cleanly: stochasticity is confined to the Orient quadrant; Decide becomes reproducible.

The example rule set encodes the per-stage Definition of Done from docs/quality-framework.md as machine-checkable rules. One DoD bullet, one rule, one audit-trail entry.

Lives under experiments/rule-engine-poc/ as a sandbox — not a formal /spec:start feature folder.

What's included

src/ — hand-rolled engine (~250 LOC): types, hash, engine, loader, cli, html-report. Pure functions; severity-first verdict (blocked > needs-attention > ready-to-progress > unknown); canonical-JSON + SHA-256 provenance hashes for replay.
rules/quality-gates.yaml — DoD-as-rules example set covering every stage from Idea through Review.
fixtures/*.json — five mock Orient-quadrant extractions covering ready, blocked, and needs-attention verdicts.
test/ — 24 passing vitest tests, including a dedicated reproducibility suite (the strategist-recommended North Star metric: byte-identical replay across runs).
HTML reporter — self-contained, inline CSS, no JS, no external assets. Drop into Slack, attach to a PR, or pipe to a static-site directory.
docs/ — architecture, DSL reference, audit trail (with EU AI Act / ISO 42001 mapping), extension guide, OODA integration.
research/ — five-angle research wave dispatched in parallel: technical landscape, regulatory auditability, positioning / JTBD / North Star, alternative technical design, risks / critique.
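The severity-first verdict and the canonical-JSON provenance hash listed above can be sketched roughly as follows. This is a minimal illustration, not the POC's engine: `canonicalJson`, `sha256`, and `pickVerdict` are hypothetical names, and the canonicalization scheme (recursive key sorting, dropping undefined-valued keys) is an assumption.

```typescript
import { createHash } from "node:crypto";

// Hypothetical names; the PR describes these concepts, not these exact signatures.
type Verdict = "blocked" | "needs-attention" | "ready-to-progress" | "unknown";

// Most severe first: blocked > needs-attention > ready-to-progress > unknown.
const SEVERITY: readonly Verdict[] = ["blocked", "needs-attention", "ready-to-progress", "unknown"];

// Canonical JSON (assumed scheme): object keys sorted recursively and undefined-valued
// keys dropped, so logically equal inputs always serialize to identical bytes.
function canonicalJson(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
}

// SHA-256 over the UTF-8 bytes, hex-encoded: the provenance hash recorded for replay.
function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Severity-first: the most severe tier among matched rules wins, regardless of how
// many lower-tier rules also matched. No matched rules yields "unknown".
function pickVerdict(matchedTiers: readonly Verdict[]): Verdict {
  for (const tier of SEVERITY) {
    if (matchedTiers.includes(tier)) return tier;
  }
  return "unknown";
}
```

Because `canonicalJson({ b: 1, a: [true, "x"] })` and `canonicalJson({ a: [true, "x"], b: 1 })` both produce `{"a":[true,"x"],"b":1}`, their hashes match, which is the property a byte-identical replay check relies on.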

Try it

cd experiments/rule-engine-poc
npm install
npm test                  # 24 passing tests in <1s
npm run demo              # ready-implementation fixture (text report)
npm run demo:html:all     # one HTML report per fixture into reports/

The CLI exits 1 on blocked, 0 otherwise — usable as a CI gate.

What this is not

Not a production rule engine. Intentionally minimal. The technical-landscape research recommends json-rules-engine behind a thin adapter when this graduates from POC.
Not opinionated about extraction. The LLM half of the pipeline is mocked by fixture JSON. Wiring a real Claude / GPT extraction call is a separate POC.
Not yet calibrated. Weights and severity ordering are placeholders. See research/05-risks-critique.md for the three Riskiest Assumption Tests the critic recommends running before any production wiring.

Test plan

npm test — 24/24 vitest tests pass (unit, reproducibility, operator coverage, loader validation)
npx tsc --noEmit — typecheck clean
Every fixture produces the expected verdict (text mode)
HTML reporter generates self-contained 27 KB reports for every fixture
CLI exit code maps correctly: 1 for blocked, 0 for non-blocked
Manual review of the research artifacts under experiments/rule-engine-poc/research/
Decide whether to evolve into a Specorator skill / plugin or keep as experiment

#502, idea: OODA loop plugin (observe-orient-decide-act orchestrator for continuous situation awareness); this POC implements its Decide quadrant
#501, idea: goal-oriented orchestrator plugin (Research → Design → Plan → Implement → Review loop)
docs/quality-framework.md — source of the example rule set

Generated by Claude Code

Terminal-only TypeScript POC of the "LLM extracts, rules decide" pattern from the AI fact-checking community: the LLM is constrained to producing structured flags from raw signals, and a deterministic rule engine maps those flags to a verdict tier with a fully replayable audit trail. Lives under experiments/rule-engine-poc/ as a sandbox (not formal Stage 1-7) and demonstrates the pattern against the repo's own quality framework. Each rule encodes a Definition of Done item from docs/quality-framework.md.

What's included:
- src/ - hand-rolled engine (~250 LOC): types, hash, engine, loader, cli, html-report. Pure functions; severity-first verdict; canonical JSON + SHA-256 provenance hashes for replay.
- rules/quality-gates.yaml - DoD-as-rules example set.
- fixtures/*.json - 5 mock Orient-quadrant extractions covering ready, blocked, and needs-attention verdicts.
- test/ - 24 passing tests (vitest), including a dedicated reproducibility suite (strategist-recommended North Star: byte-identical replay).
- HTML reporter - self-contained, inline CSS, no JS, no external assets.
- docs/ - architecture, DSL reference, audit trail + EU AI Act mapping, extension guide, OODA integration.
- research/ - five-angle research wave (technical landscape, regulatory auditability, positioning/JTBD, design alternatives, risks/critique).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32d02f485f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Validate that then.verdict is one of the four known tiers; previously a typo like 'blokced' would load successfully and silently degrade the rule into a no-op via tally[<unknown-key>] (#525 P1).
- Validate that when.all / when.any / when.not are arrays at load time; previously 'any: true' would load and crash at evaluation with TypeError on .map (#525 P2).
- Export VERDICTS as a runtime constant from types.ts so the schema check has one source of truth alongside the type.
- Three new loader tests cover the two failure modes plus a typo case.

Luis85 · 2026-05-17T11:36:11Z

@codex review

Addressed both findings from the previous round:

P1 — loader now rejects unknown then.verdict values against the VERDICTS runtime constant (loader.ts).
P2 — loader now rejects non-array when.all / when.any / when.not groups at load time (loader.ts).

Three new tests in test/loader.test.ts cover the two failure modes plus the typo case. npm test is 27/27 green.
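The two load-time checks from this round can be sketched as below. The `VERDICTS` constant mirrors the description of types.ts exporting a runtime constant alongside the type; `checkRule`, the raw rule shape, and the error strings are hypothetical, not the POC's actual loader.ts.

```typescript
// Runtime constant; the Verdict type is derived from it so both share one source of truth.
const VERDICTS = ["blocked", "needs-attention", "ready-to-progress", "unknown"] as const;
type Verdict = (typeof VERDICTS)[number];

function isVerdict(value: unknown): value is Verdict {
  return typeof value === "string" && (VERDICTS as readonly string[]).includes(value);
}

// Hypothetical raw rule shape as parsed from YAML, before validation.
interface RawRule {
  id: string;
  when?: Record<string, unknown>;
  then?: { verdict?: unknown };
}

// Returns load-time errors; an empty array means the rule passed these two checks.
function checkRule(rule: RawRule): string[] {
  const errors: string[] = [];
  const verdict = rule.then?.verdict;
  if (!isVerdict(verdict)) {
    // Catches typos like 'blokced' that would otherwise index an unknown tally key.
    errors.push(`${rule.id}: unknown then.verdict ${JSON.stringify(verdict)}`);
  }
  for (const group of ["all", "any", "not"] as const) {
    const conditions = rule.when?.[group];
    // 'any: true' would otherwise load and only crash later when evaluation calls .map.
    if (conditions !== undefined && !Array.isArray(conditions)) {
      errors.push(`${rule.id}: when.${group} must be an array`);
    }
  }
  return errors;
}
```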

Generated by Claude Code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6625ee1108

- engine: 'exists' now participates in the AND-chain instead of short-circuiting, so 'exists: true' combined with 'eq'/'ne'/'gt'/'lt'/'in'/'regex' correctly requires every operator to match (#525 P2). 'exists: false' still tolerates flag absence without surfacing the 'flag missing' reason; this is the only short-circuit retained.
- loader: empty 'when.all' / 'when.any' / 'when.not' arrays are now rejected at load time. Previously 'any: []' was vacuously satisfied by the length>0 guard in evaluateWhen, allowing a typo to fire a blocking rule unintentionally (#525 P1).
- ENGINE_VERSION bumped 0.1.0 -> 0.2.0 because the exists+value-op interaction is a semantic change. Per docs/extending.md, a version bump is the auditor's signal that prior verdicts may not replay.
- Six new tests: empty when.any / when.all rejection, four exists-AND cases including exists:false standalone.

Luis85 · 2026-05-17T11:41:22Z

@codex review

Round 2 findings addressed in 00a43e7:

P1 — empty when.all / when.any / when.not arrays now rejected at load time; the "omit the group instead" path forces the author to express intent explicitly.
P2 — exists no longer short-circuits when combined with value operators. It participates in the AND-chain alongside eq / ne / gt / lt / in / regex. exists: false retains its tolerance for flag absence (the only remaining short-circuit, and only when no value-op coexists).

ENGINE_VERSION bumped 0.1.0 → 0.2.0 because the exists + value-op interaction is a behavioural change — per docs/extending.md, the version bump is the auditor's signal that prior verdicts may not replay.

Six new tests; suite is 33/33 green.
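A minimal sketch of the condition evaluator after this change, assuming a flat condition object with optional operator keys. `evaluateCondition` matches the function name mentioned in this PR, but the types, reason strings, and operator handling shown here are illustrative assumptions rather than the POC's engine.ts.

```typescript
// Hypothetical condition shape: one flag plus any combination of operators.
interface Condition {
  flag: string;
  exists?: boolean;
  eq?: unknown;
  ne?: unknown;
  gt?: number;
  lt?: number;
  in?: unknown[];
  regex?: string;
}

interface ConditionResult {
  matched: boolean;
  reason?: string;
}

const VALUE_OPS = ["eq", "ne", "gt", "lt", "in", "regex"] as const;

function hasValueOp(c: Condition): boolean {
  return VALUE_OPS.some((op) => c[op] !== undefined);
}

function evaluateCondition(c: Condition, flags: Record<string, unknown>): ConditionResult {
  const present = Object.prototype.hasOwnProperty.call(flags, c.flag);
  // The one retained short-circuit: a standalone `exists: false` tolerates absence.
  if (c.exists === false && !hasValueOp(c)) return { matched: !present };
  if (!present) return { matched: false, reason: "flag missing in extraction" };
  const v = flags[c.flag];
  // AND-chain: `exists: true` is satisfied above; every declared value operator must also hold.
  if (c.eq !== undefined && v !== c.eq) return { matched: false, reason: "eq did not hold" };
  if (c.ne !== undefined && v === c.ne) return { matched: false, reason: "ne did not hold" };
  if (c.gt !== undefined && !(typeof v === "number" && v > c.gt)) return { matched: false, reason: "gt did not hold" };
  if (c.lt !== undefined && !(typeof v === "number" && v < c.lt)) return { matched: false, reason: "lt did not hold" };
  if (c.in !== undefined && !c.in.includes(v)) return { matched: false, reason: "in did not hold" };
  if (c.regex !== undefined && !(typeof v === "string" && new RegExp(c.regex).test(v))) {
    return { matched: false, reason: "regex did not hold" };
  }
  return { matched: true };
}
```

With this shape, `{ flag: "ci_passing", exists: true, eq: false }` no longer matches merely because the flag is present; the `eq` link must hold too.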

Generated by Claude Code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00a43e7630

- loader: each condition must declare at least one supported operator. Previously a typo like { flag: 'x', eqq: true } would load and then always match at runtime, silently flipping verdicts (#525 round 3 P2).
- loader: 'exists: false' combined with value operators is rejected at load time. The combination has no meaningful semantics: an absent flag has no value for eq/ne/gt/lt/in/regex to apply to (#525 round 3 P1).
- loader: condition objects without a 'flag' string are rejected.
- engine: code comment clarified to reflect the now-enforced invariant that exists:false is only valid alone.
- Three new loader tests cover the three rejection paths.
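The three rejection paths can be sketched as a single per-condition check. The operator list comes from this PR; `checkCondition`, the `where` label, and the error wording are hypothetical.

```typescript
const OPERATORS = ["exists", "eq", "ne", "gt", "lt", "in", "regex"] as const;
const VALUE_OPERATORS = OPERATORS.filter((op) => op !== "exists");

// Returns load-time errors for one raw condition; `where` is a human-readable location.
function checkCondition(raw: unknown, where: string): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return [`${where}: condition must be an object`];
  }
  const c = raw as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof c["flag"] !== "string") errors.push(`${where}: missing 'flag' string`);
  // A typo such as `eqq` leaves no supported operator; reject instead of always matching.
  if (!OPERATORS.some((op) => op in c)) {
    errors.push(`${where}: no supported operator (expected one of ${OPERATORS.join(", ")})`);
  }
  // An absent flag has no value for a value operator to test.
  if (c["exists"] === false && VALUE_OPERATORS.some((op) => op in c)) {
    errors.push(`${where}: 'exists: false' cannot be combined with value operators`);
  }
  return errors;
}
```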

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 179b883418

- Reject 'then.weight' values that are <=0, infinite, or NaN. A 'blocked' rule with weight 0 would contribute nothing to the tally and silently bypass the gate (#525 round 4 P1).
- Reject non-array 'in' operators at load time. Previously a typo like 'in: foo' would load and crash at evaluation when .some() is called on a non-array (#525 round 4 P2).
- Validate regex patterns at load time. Previously a malformed regex like 'regex: "["' would load and abort the entire decision run with a SyntaxError when new RegExp() throws during evaluation (#525 round 4 P2).
- Updated the existing 'missing then.weight' test to match the new 'invalid then.weight' error message.
- Four new loader tests cover the three rejection paths plus a negative-weight case.

End-to-end flow now drives the POC: the user adds content to the project, runs npm run plan to generate AI extraction prompts, pastes a prompt into Claude/ChatGPT, saves the JSON to extractions/, and runs npm run report to render HTML and open it in the browser.

Architecture:
- rule-engine.config.json declares targets, each with id + label + paths (files or directories, walked deterministically).
- rules/flag-schema.yaml documents every flag the rule set may reference (type + description + example); it is the contract between the AI extractor and the engine.
- src/plan.ts walks target paths, collects file contents with 8 KB truncation per file, and bundles role + schema + rules + source into a single prompt per target.
- src/report.ts loads extractions per target, runs the engine, renders the existing HTML reporter, and best-effort opens the first report in the OS default browser. Exit 0/1/2 = ok/blocked/missing.
- The prompt-builder follows analyst research (research/10): XML-tag structure with markdown redundancy, an explicit forbidden-fields list (verdict, assessment, conclusion, summary, recommendation, rationale, analysis), and an open <output> tag as a forcing function.
- The original single-shot src/cli.ts is preserved as a fixture-testing escape hatch.

20 new tests cover the new modules (config, flag-schema, context, prompt-builder). Suite total: 60 tests, all passing.

Research wave 2 (5 background agents) wrote 5 new artifacts under research/ covering independent review, workflow failure modes, workflow architecture, UX friction, and extraction prompt patterns.

The POC stays sandbox-scoped under experiments/rule-engine-poc/. No wiring into specs/, /spec:status, plugins/, or the main repo.

Luis85 · 2026-05-17T11:57:25Z

@codex review

Two updates pushed:

Codex round 4 (45577c4) — loader hardening:

then.weight must be a positive finite number (rejects 0, negative, infinite, and NaN values). A blocked rule with weight: 0 would have contributed nothing to the tally and silently bypassed the gate.
in operator must be an array at load time.
Regex patterns are compile-validated at load time so a malformed regex like regex: "[" fails fast instead of aborting evaluation mid-run.
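The three round-4 checks can be sketched as two small helpers. `checkWeight` and `checkOperatorShapes` are hypothetical names and the messages are illustrative; the rejection rules themselves are the ones listed above.

```typescript
// Rejects 0, negatives, NaN and ±Infinity: a zero-weight rule adds nothing to the tally.
function checkWeight(weight: unknown): string | undefined {
  if (typeof weight !== "number" || !Number.isFinite(weight) || weight <= 0) {
    return `invalid then.weight ${String(weight)}: expected a positive finite number`;
  }
  return undefined;
}

function checkOperatorShapes(c: Record<string, unknown>): string[] {
  const errors: string[] = [];
  // 'in: foo' would otherwise crash later when evaluation calls .some() on a string.
  if ("in" in c && !Array.isArray(c["in"])) errors.push("'in' must be an array");
  if ("regex" in c) {
    const pattern = c["regex"];
    if (typeof pattern !== "string") {
      errors.push("'regex' must be a string");
    } else {
      try {
        new RegExp(pattern); // compile once at load time; throws SyntaxError on e.g. "["
      } catch (err) {
        errors.push(`invalid regex ${JSON.stringify(pattern)}: ${(err as Error).message}`);
      }
    }
  }
  return errors;
}
```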

Plan/report workflow (271702e) — the POC's primary flow:

rule-engine.config.json declares targets with named paths.
npm run plan walks paths, bundles content + flag schema + rules into per-target extraction prompts under prompts/.
User pastes prompt into Claude/ChatGPT, saves JSON to extractions/<id>.json.
npm run report runs the engine and writes HTML to reports/<id>.html, best-effort opens browser.
Exit codes: 0 no blockers, 1 at least one blocked, 2 missing/malformed extraction.
rules/flag-schema.yaml documents every flag the rule set may reference; the contract between extractor and engine.
Prompt-builder follows research recommendations: XML structure with markdown redundancy, explicit forbidden-fields list, open <output> tag as a forcing function.
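The exit-code contract above could be aggregated across targets like this. The `TargetOutcome` shape and `exitCodeFor` are hypothetical, and letting a missing or malformed extraction (2) take precedence over a blocked verdict (1) is an assumption; the PR only states what each code means.

```typescript
type Verdict = "blocked" | "needs-attention" | "ready-to-progress" | "unknown";

// Hypothetical per-target outcome after `npm run report`.
type TargetOutcome =
  | { id: string; kind: "missing" } // extraction absent or malformed
  | { id: string; kind: "evaluated"; verdict: Verdict };

// 2 = some extraction missing/malformed, 1 = at least one target blocked, 0 = no blockers.
function exitCodeFor(outcomes: readonly TargetOutcome[]): 0 | 1 | 2 {
  if (outcomes.some((o) => o.kind === "missing")) return 2;
  if (outcomes.some((o) => o.kind === "evaluated" && o.verdict === "blocked")) return 1;
  return 0;
}

// A CLI entry point could then set: process.exitCode = exitCodeFor(results);
```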

Five new research artifacts (research/06–10) cover the independent review, workflow failure modes, alternative architecture, UX audit, and prompt-engineering patterns. Suite is 60/60 green.

POC stays self-contained under experiments/rule-engine-poc/. No wiring into specs/, /spec:status, or plugins/.

Generated by Claude Code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 271702eb54

…Codex round 5)
- open-browser: openInBrowser is now async and waits briefly for the spawn or error event before resolving. Previously it returned true immediately, so report.ts printed "opened in browser" even when xdg-open was missing in a headless container, misleading users during the primary plan->report flow (#525 round 5 P2).
- report.ts: awaits openInBrowser and prints the correct status line. Verified in this sandbox: now prints "could not spawn a browser; open manually: file://..." when no browser is installed.
- loader: 'exists' operator must be a boolean. Previously a typo like 'exists: "false"' would load and then silently never match because evaluateCondition compares boolean to string (#525 round 5 P2).
- loader: 'gt' and 'lt' operators must be numbers at load time, for consistency with the other type checks (the engine already failed matching at runtime, but failing at load is preferred).
- Two new loader tests cover the exists-boolean and gt-number paths.

Luis85 · 2026-05-17T12:05:28Z

@codex review

Round 5 findings addressed in 4e0d0ce:

P2 (open-browser) — openInBrowser is now async. It waits for either the spawn success event or the error event before resolving, with a 250 ms safety-net timeout. report.ts awaits it. Verified in this sandbox: now correctly prints could not spawn a browser; open manually: file://… when no browser is installed, instead of falsely claiming success.
P2 (loader exists) — exists operator must be a boolean at load time. Previously a typo like exists: "false" (string) would load and then silently never match at runtime because evaluateCondition compares boolean to string.
Bonus consistency: gt and lt operators must be numbers at load time. The engine already failed matching at runtime for these, but failing at load is preferred.

Two new loader tests; suite is 62/62 green.
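A minimal sketch of the async launch pattern, using Node's `child_process.spawn` and the child's `spawn` / `error` events. The helper names are hypothetical; the macOS `open` and Linux `xdg-open` launchers are assumptions, the Windows `cmd /c start ""` form reflects `start` being a cmd.exe built-in, and resolving `false` when the 250 ms safety net fires is an assumption about the fallback behaviour.

```typescript
import { spawn } from "node:child_process";

// Per-platform launcher (assumed). `start` is a cmd.exe built-in, so Windows goes
// through `cmd /c start "" <url>`, where "" is the empty window title.
function launcherFor(url: string, platform: NodeJS.Platform = process.platform): [string, string[]] {
  if (platform === "win32") return ["cmd", ["/c", "start", "", url]];
  if (platform === "darwin") return ["open", [url]];
  return ["xdg-open", [url]];
}

// Resolves true once the child emits 'spawn', false on 'error' (e.g. ENOENT when the
// launcher is not installed) or when neither event arrives within timeoutMs.
function tryLaunch(cmd: string, args: string[], timeoutMs = 250): Promise<boolean> {
  return new Promise((resolve) => {
    let settled = false;
    let timer: ReturnType<typeof setTimeout> | undefined;
    const finish = (ok: boolean): void => {
      if (settled) return;
      settled = true;
      if (timer !== undefined) clearTimeout(timer);
      resolve(ok);
    };
    timer = setTimeout(() => finish(false), timeoutMs);
    try {
      const child = spawn(cmd, args, { stdio: "ignore", detached: true });
      child.on("error", () => finish(false));
      child.once("spawn", () => {
        child.unref(); // let this process exit without waiting for the browser
        finish(true);
      });
    } catch {
      finish(false); // some invalid invocations throw synchronously
    }
  });
}

async function openInBrowser(url: string): Promise<boolean> {
  const [cmd, args] = launcherFor(url);
  return tryLaunch(cmd, args);
}
```

A caller such as report.ts would `await openInBrowser(...)` and print the manual `file://` fallback whenever it resolves `false`.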

Generated by Claude Code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e0d0cef09

…und 6)
- open-browser: spawn cmd /c start "" on Windows. 'start' is a cmd.exe built-in, not a standalone exe, so spawn('start', ...) raised ENOENT and browser open always failed for Windows users (#525 round 6 P2).
- report: validate that the parsed extraction is a plain object before passing it to evaluate. Previously valid JSON like null / [] / "text" would crash inside hasOwnProperty.call on null instead of producing a controlled error (#525 round 6 P2).
- prompt-builder: pick a fence length longer than any backtick run in the source content. Many repo markdown files contain ``` blocks, which would prematurely close the prompt's outer fence and corrupt the AI extraction prompt (#525 round 6 P2).
- The new pickFence helper is exported and unit-tested; a prompt-builder test asserts a 5-tick fence is emitted for content with a 4-tick run.
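The fence-selection rule fits in a few lines. This sketch assumes the helper only counts backtick runs and never returns fewer than the three backticks CommonMark requires for a fenced block; the POC's actual pickFence signature may differ, and `fenceBlock` is a hypothetical usage helper.

```typescript
// Returns a backtick fence strictly longer than the longest backtick run in `content`,
// with a floor of 3 (the CommonMark minimum for a fenced code block).
function pickFence(content: string): string {
  let longest = 0;
  for (const run of content.match(/`+/g) ?? []) {
    longest = Math.max(longest, run.length);
  }
  return "`".repeat(Math.max(3, longest + 1));
}

// Wraps source text so that no fence inside it can close the outer fence.
function fenceBlock(content: string, info = ""): string {
  const fence = pickFence(content);
  return `${fence}${info}\n${content}\n${fence}`;
}
```

For content containing a 4-backtick run, `pickFence` returns a 5-backtick fence, which matches the test case described above.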

Closes the schema-miss laundering failure mode flagged by the critic (research/07) and analyst (research/10): bad LLM output now fails loudly instead of becoming a reproducible-looking verdict.

What's new:
- src/validate.ts: validateExtraction(flags, schema, options) returns errors + warnings. Checks: forbidden fields (verdict, assessment, conclusion, summary, recommendation, rationale, analysis), unknown fields (warning), type mismatches (boolean/number/string/string[]), non-finite numbers, disallowed_values violations, and prompt-hash mismatch (when expectedPromptHash is provided; wired up in the next commit).
- src/validate-cli.ts: 'npm run validate' surfaces issues per target and exits 0 (clean) / 1 (errors) / 2 (missing/unparseable extraction).
- src/report.ts: validates each extraction before evaluating and refuses to render when validation fails. --skip-validate flag as an escape hatch.
- prompt-builder imports FORBIDDEN_FIELDS from validate.ts so the forbidden list lives in exactly one place.
- 12 new validate tests; suite total 77 passing.

Verified end-to-end: a polluted extraction with verdict + type-mismatch + unknown-flag is caught by both validate and report.
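A reduced sketch of the validation gate. The forbidden-field list and the check categories come from this PR; the `FlagSpec` shape, the issue codes, and skipping `__prompt_hash` here (it is checked separately against the prompt hash) are assumptions.

```typescript
const FORBIDDEN_FIELDS: readonly string[] = [
  "verdict", "assessment", "conclusion", "summary", "recommendation", "rationale", "analysis",
];

type FlagType = "boolean" | "number" | "string" | "string[]";
interface FlagSpec {
  type: FlagType;
  disallowed_values?: unknown[];
}
interface Issue {
  code: string; // hypothetical codes, e.g. "forbidden-field", "type-mismatch"
  flag: string;
  message: string;
}

function hasType(value: unknown, type: FlagType): boolean {
  if (type === "string[]") return Array.isArray(value) && value.every((v) => typeof v === "string");
  return typeof value === type;
}

function validateExtraction(flags: Record<string, unknown>, schema: Record<string, FlagSpec>) {
  const errors: Issue[] = [];
  const warnings: Issue[] = [];
  for (const [flag, value] of Object.entries(flags)) {
    if (flag === "__prompt_hash") continue; // checked separately against the prompt hash
    if (FORBIDDEN_FIELDS.includes(flag)) {
      errors.push({ code: "forbidden-field", flag, message: `extractor must not emit '${flag}'` });
      continue;
    }
    if (!Object.prototype.hasOwnProperty.call(schema, flag)) {
      warnings.push({ code: "unknown-flag", flag, message: `'${flag}' is not in the flag schema` });
      continue;
    }
    const spec = schema[flag] as FlagSpec;
    if (!hasType(value, spec.type)) {
      errors.push({ code: "type-mismatch", flag, message: `expected ${spec.type}` });
    } else if (typeof value === "number" && !Number.isFinite(value)) {
      // JSON.parse("1e999") yields Infinity, so this can arrive from a pasted extraction.
      errors.push({ code: "non-finite-number", flag, message: "number must be finite" });
    } else if (spec.disallowed_values?.includes(value)) {
      errors.push({ code: "disallowed-value", flag, message: `${JSON.stringify(value)} is disallowed` });
    }
  }
  return { errors, warnings };
}
```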

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0aaa114eb6

…ard) Closes the stale-extraction failure mode flagged by the critic (research/07): users edit source files between plan and report, and the old JSON still renders a confident verdict. The report now refuses extractions produced against a different prompt.

What's new:
- src/prompt-hash.ts: computePromptHash hashes the LOAD-BEARING inputs (target id, per-file sha, rule hashes, schema content), not the rendered prompt text. Cosmetic edits to the prompt template don't invalidate extractions; real source changes do.
- src/plan.ts: emits a sidecar prompts/<id>.hash.txt and embeds the hash into the prompt as (a) a top-of-file HTML comment, (b) an explicit rule asking the LLM to copy it into __prompt_hash, and (c) the response template's first key.
- src/validate.ts: enforces expectedPromptHash when provided. Surfaces missing-prompt-hash and stale-extraction error codes with re-run instructions.
- src/report.ts + src/validate-cli.ts: read the sidecar hash if it exists; absence falls back to the pre-binding behaviour for backwards-compat with fixtures.
- 6 new prompt-hash tests; suite total 83 passing.

End-to-end verified in this sandbox:
- matching hash -> exit 0 (ready-to-progress)
- stale hash -> exit 2 with explicit error
- missing field -> exit 2 with explicit error
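A minimal sketch of hashing load-bearing inputs rather than prompt text. The `PromptInputs` shape, the JSON payload layout, and sorting files by path are assumptions; the staleness check is shown comparing against a freshly recomputed hash, which is the behaviour a later commit in this PR adopts in place of trusting the sidecar. `checkBinding` is a hypothetical name; the two error codes come from the PR.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape covering the load-bearing inputs listed above.
interface PromptInputs {
  targetId: string;
  files: Array<{ path: string; content: string }>;
  ruleHashes: string[]; // assumed to arrive in a stable order
  schemaContent: string;
}

const sha256 = (text: string): string => createHash("sha256").update(text, "utf8").digest("hex");

// Hashes the inputs that determine the extraction, never the rendered prompt text,
// so cosmetic template edits keep old extractions valid while source edits do not.
function computePromptHash(inputs: PromptInputs): string {
  const files = [...inputs.files]
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .map((f) => [f.path, sha256(f.content)]);
  return sha256(
    JSON.stringify({
      target: inputs.targetId,
      files,
      rules: inputs.ruleHashes,
      schema: sha256(inputs.schemaContent),
    }),
  );
}

// Staleness check against a hash recomputed from current inputs.
function checkBinding(extraction: Record<string, unknown>, currentHash: string): string | undefined {
  const claimed = extraction["__prompt_hash"];
  if (typeof claimed !== "string") return "missing-prompt-hash";
  if (claimed !== currentHash) return "stale-extraction";
  return undefined;
}
```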

Closes a cluster of small findings from Codex round 7 (cli, context) and reviewer S2/S3 (engine, loader audit-trail honesty):
- cli.ts: validate that the JSON root is a plain object before evaluate. Same guard as report.ts; previously valid-JSON-but-not-object input would crash inside hasOwnProperty.call (#525 round 7 P2).
- context.ts: use lstat instead of stat and skip symlinks entirely. Previously a symlink cycle (a/sub/loop -> a/) would recurse until stack overflow during plan (#525 round 7 P2).
- engine.ts evaluateCondition: gt/lt against a non-number and regex against a non-string now set an explicit reason ('expected number for gt, got string') so the audit trail explains *why* the condition didn't match. Reviewer S2: previously these set matched=false with no reason.
- engine.ts evaluateWhen: when.not against a missing flag no longer silently fires. The inner condition's 'flag missing in extraction' reason is preserved through the not clause; the rule fails to match rather than inverting absence into success. Reviewer S2.
- loader.ts: duplicate rule ids in a single rule file are rejected at load time. Previously a second rule with the same id loaded silently and the engine evaluated it independently. Reviewer S3.
- Five new tests: gt-non-number reason, regex-non-string reason, not-missing flag, duplicate ids, symlink cycle handling. Suite: 88/88 passing.
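The symlink-safe walk can be sketched with `fs.lstatSync`, which reports a link as a link instead of following it. `walkFiles` is a hypothetical stand-in for the context.ts walker, sorting directory entries is an assumption about how the walk stays deterministic, and `buildCycleFixture` exists only to illustrate the a/sub/loop -> a/ case.

```typescript
import { lstatSync, mkdirSync, mkdtempSync, readdirSync, symlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Deterministic depth-first walk returning regular files in sorted order.
// lstat (unlike stat) reports a symlink as a symlink, so links are skipped
// entirely and a cycle like a/sub/loop -> a/ cannot recurse forever.
function walkFiles(path: string): string[] {
  const st = lstatSync(path);
  if (st.isSymbolicLink()) return [];
  if (st.isFile()) return [path];
  if (!st.isDirectory()) return [];
  const out: string[] = [];
  for (const name of readdirSync(path).sort()) {
    out.push(...walkFiles(join(path, name)));
  }
  return out;
}

// Usage illustration: a throwaway tree containing the a/sub/loop -> a/ cycle.
function buildCycleFixture(): string {
  const a = mkdtempSync(join(tmpdir(), "walk-"));
  mkdirSync(join(a, "sub"));
  writeFileSync(join(a, "README.md"), "# a\n");
  writeFileSync(join(a, "sub", "notes.md"), "notes\n");
  symlinkSync(a, join(a, "sub", "loop"), "dir");
  return a;
}
```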

Luis85 · 2026-05-17T12:18:40Z

@codex review

Big push since the last review. Three commits:

3dc1034 — validate gate between plan and report:

npm run validate parses pasted JSON, checks the flag schema, refuses forbidden fields (verdict, assessment, etc.).
report runs validation before evaluate and refuses bad input; --skip-validate escape.
Closes the schema-miss laundering failure mode flagged by the critic/analyst research.

a9b1db9 — prompt-extraction binding (stale-extraction guard):

plan hashes load-bearing inputs (target id, per-file sha, rule hashes, schema), embeds the hash in the prompt and a sidecar.
LLM is asked to copy the hash into __prompt_hash.
validate and report enforce the hash; stale extractions are rejected with a clear re-run message.
Verified end-to-end: exit 0 for a matching hash, exit 2 for a stale hash, exit 2 for a missing hash.

cc00478 — polish bundle (Codex round 7 + reviewer findings):

cli.ts validates JSON root is an object.
context.ts uses lstat + skips symlinks so symlink cycles don't blow the stack.
engine.ts: gt/lt/regex on wrong-typed observations now set an explicit reason in the audit trail.
engine.ts: when.not against a missing flag no longer silently inverts absence into success.
loader.ts: duplicate rule ids rejected at load time.

Plus an adoption re-research (research/11) concluding stay hand-rolled — every candidate fails ≥3 of the 6 locked-in constraints; the wrapper would be the engine. Recommendation to record an ADR at POC graduation.

Suite: 88/88 green across 10 files. Adds tests for everything above plus the new modules (validate, prompt-hash).

Generated by Claude Code

…arch/13)

Closes the 'paste the sidecar to bypass staleness' cheat the critic flagged as the highest-leverage fix in the post-validate workflow.

Previously: report.ts and validate-cli.ts read prompts/<id>.hash.txt (plain text) and trusted its value. An operator under deadline pressure could open the sidecar, copy the hash into the extraction's __prompt_hash field, and silently re-render a stale verdict.

Now: report and validate-cli recompute the hash from current source files + rules + schema (same code path as plan.ts). The sidecar is still written for diagnostic / debugging purposes, but it is never the authority for whether an extraction is stale. A real change to any source file invalidates the extraction automatically.

Smoke-tested in this sandbox:
- Source unchanged, paste-the-sidecar cheat -> exit 0 (correct; the extraction is still valid against current source).
- Source mutated, same paste-the-sidecar cheat -> stale-extraction error with both the pasted hash and the recomputed hash printed.

- research/14 (critic): three new failure modes the validate gate opened, ranked: --skip-validate, the sidecar-paste cheat, and reproducibility theatre. The highest-leverage fix already landed in the previous commit.
- research/15 (sre): CI integration sketch with concrete cost math (~$0.56/target, $1,700/month at 20 PRs/day on Opus 4.7) and a Day-1/30/90 operational milestones path.
- research/16 (user-researcher): 5-segment JTBD switch interview plan with sequencing (mine demand signal first, S1 indie devs next, fail fast before S2-S5), a full sample script, and RAT integration.

- research/12 (reviewer): independent re-review at HEAD. Verdict: pass-with-findings. S2-1 (docs drift: workflow.md still lists the validate gate as 'not yet here' despite shipping), S2-2 (sidecar deletion bypasses prompt-hash binding entirely), S2-3 (--skip-validate is undocumented), and an S3 cluster on test count drift, HTML provenance, and type-mismatch error messages.
- Fix two typos caught by CI spell check (typos v1.46.0): research/16 'pre-empted' -> 'confirmed' (reads more clearly anyway), research/12 'ci_passsing' -> 'ci_passingx' (illustrative typo recast to avoid a typos-tool false positive).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0509c11445

…grams Replaces the engine-internals-focused architecture.md with a comprehensive system view covering:
- System overview (component flowchart)
- User flow (sequence diagram across plan/AI/validate/report)
- Data flow (annotated with data shapes at each seam)
- Engine internals (evaluate algorithm + per-condition + severity picker)
- Validate gate + prompt-hash binding (sequence)
- OODA mapping (Observe/Orient/Decide/Act with stochasticity boundary)
- Module dependency graph (16 src/ modules)
- Why these shapes (design choices + research refs)

Seven Mermaid diagrams. docs/README.md now points to architecture.md as the start-here entry.

- config.ts: target ids must match /^[A-Za-z0-9][A-Za-z0-9_-]*$/. Previously a target id like '../escape' or 'foo/bar' was accepted and later interpolated into prompts/<id>.md, extractions/<id>.json, and reports/<id>.html: at best ENOENT, at worst a write outside the workspace (#525 round 8 P2).
- context.ts: extract truncateToBytes(), which walks back to a UTF-8 codepoint boundary. Previously slice(0, maxBytes) counted UTF-16 code units, so CJK / emoji-heavy markdown could emit prompt blocks up to 3x the advertised 8 KB cap (#525 round 8 P2).
- 7 new tests cover the three slug rejection paths and the multibyte truncation invariant. Suite: 95/95 passing.
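Both fixes can be sketched as follows. The regex is the one quoted in this PR; `isSafeTargetId` is a hypothetical wrapper, and the byte-walking approach in `truncateToBytes` is one straightforward way to find a UTF-8 boundary, not necessarily the POC's implementation.

```typescript
// Target ids become path segments (prompts/<id>.md etc.), so only a conservative slug is allowed.
const TARGET_ID = /^[A-Za-z0-9][A-Za-z0-9_-]*$/;

function isSafeTargetId(id: string): boolean {
  return TARGET_ID.test(id);
}

// Truncates to at most maxBytes of UTF-8 without cutting a multi-byte code point.
function truncateToBytes(text: string, maxBytes: number): string {
  const buf = Buffer.from(text, "utf8");
  if (buf.length <= maxBytes) return text;
  let end = Math.max(0, Math.floor(maxBytes));
  // Continuation bytes have the bit pattern 10xxxxxx; step back until `end` sits on
  // the first byte of a code point, so buf[0..end) holds only whole characters.
  while (end > 0 && (buf.readUInt8(end) & 0xc0) === 0x80) end--;
  return buf.subarray(0, end).toString("utf8");
}
```

For example, `truncateToBytes("héllo", 2)` returns `"h"`: byte 2 is the second byte of `é`, so the cut moves back to byte 1.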

chatgpt-codex-connector · 2026-05-17T12:55:45Z

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

Three rendered reports (ready, blocked, needs-attention) for the agents to inspect when reviewing report readability.

The sample reports under experiments/rule-engine-poc/research/sample-reports/ embed 12-char rule content-hash prefixes that randomly trip the typos spell checker's rules (e.g., 'afe...' -> 'safe'). Excluding the folder is consistent with the existing pattern that allow-lists specific commit-SHA fragments.

Also commits research/17 (ux-designer pass on the rendered HTML reports). Top finding: the audit trail buries the matched rules among ~21 'did not match' siblings; it recommends a 'What fired' section plus collapsing skipped rules by default.

Product-strategist pass on the HTML report as a downstream-shared artifact. Three findings:
- The report is one artifact serving six first-fields (PR reviewer, PM, EM, QA, compliance, auditor). Recommend one HTML with re-stacked sections rather than reader-specific exports, keeping the 'one artifact, many destinations' moat.
- Highest-leverage change: expand action slugs ('kick-ci', 'request-reviewer') to human sentences via an actions[].human field on the rule schema. Promote the 'verified' prompt-hash badge next to the verdict.
- Introduce a label_set config (default 'dev'; 'pm', 'qa', 'compliance' as presentational overrides) so headline labels match the reader's vocabulary.

Brand-reviewer pass on the rendered HTML report. Verdict: pass-with-findings; not S1-blocking while the POC stays under experiments/, but it would block the promotion-to-skill step flagged in research/13.

Findings:
- On-temperament (no emoji / gradients / icons; ASCII [+]/[-]/[?] markers are correctly monospace-as-iconography; restrained density).
- Off-token: 18 distinct literal hex values, literal -apple-system / SFMono-Regular font stacks, and a near-white page background instead of Specorator cream var(--paper).
- Voice is close, but section headers are bare labels rather than sentence-case-with-period declaratives; 'Suggested actions' is passive against Specorator's imperative voice.
- Open decision: Specorator has no red token. The blocked tier currently uses literal #fdecea / #d8281b / #7a160d. ADR-shaped choice before graduation: extend colors_and_type.css, repurpose --soft-orange and rename the tier 'at-risk', or stay literal until packaged.

…h/21) Critic pass on the rendered HTML report as a communication artifact. Three findings:
- Visual hierarchy contradicts the semantic model: severity-first is invisible in the weighted-tally widget (reads as a horse race), the alphabetically sorted action list silently asserts a priority the engine refuses to give, and cond--missing vs cond--miss are visually distinguished but never named (colour-blind readers lose the signal).
- The 'verified' badge is a trust-calibration trap: the green pill reads as 'extraction verified' to an auditor when it only means 'bound to current inputs'. This is compounded by --skip-validate runs producing HTML indistinguishable from validated ones (research/14 risk 1 leaks into the report layer).
- Most dangerous skim path: blocker-by-absence. A high-priority blocker rule whose input flag is missing from the extraction simply doesn't fire; neither the verdict tile nor any header-level summary tells the reader 'N higher-priority rules were un-evaluable'.
- 3 RATs proposed (verdict-tile-alone, action-list-as-priority, 'verified' interpretation). Default no-go if any fail.

Analyst pass on the HTML report from a regulator's reading perspective + 2026 benchmark against LangSmith / Inspect / W&B Weave / sklearn / model-card conventions. Closes the open item from research/02 (human-readable rationale presentation).

New sidecar mapping action slugs to imperative human sentences, so the HTML report can render readable guidance instead of bare slugs.
- rules/action-glossary.yaml: 28 entries covering every action used in rules/quality-gates.yaml, with optional urgency + category metadata. Imperative voice per the Specorator brand.
- src/action-glossary.ts: loader + diff-coverage helper, mirroring src/flag-schema.ts conventions.
Wiring into config.ts and the HTML renderer happens in subsequent commits, once Agents A2/B finish their slices. 123/123 tests still passing; no behaviour change yet.
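The following is a minimal sketch of a diff-coverage check like the one described above. It assumes the glossary YAML has already been parsed into a Map keyed by action slug. `GlossaryEntry`, `diffCoverage`, and the entry field names are hypothetical; this is not the actual API of src/action-glossary.ts.

```typescript
// Hypothetical shape of one parsed glossary entry. urgency/category are
// optional, as in the commit message; the field names are assumptions.
interface GlossaryEntry {
  sentence: string;
  urgency?: string;
  category?: string;
}

interface CoverageDiff {
  missing: string[]; // slugs used by rules but absent from the glossary
  unused: string[]; // glossary entries that no rule references
}

// Compare the action slugs referenced by a rule set against the glossary keys.
// Both lists are sorted so the diff is stable across runs.
function diffCoverage(
  ruleActions: Iterable<string>,
  glossary: ReadonlyMap<string, GlossaryEntry>,
): CoverageDiff {
  const used = new Set(ruleActions);
  const missing = Array.from(used).filter((slug) => !glossary.has(slug)).sort();
  const unused = Array.from(glossary.keys()).filter((slug) => !used.has(slug)).sort();
  return { missing, unused };
}
```

A test over the real files can then assert that `missing` is empty. `unused` is informational, since a glossary may legitimately carry more entries than one rule set uses.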

… A complete) Agent A's RALPH loop completed. The action glossary is now reachable via the config. It is still optional, so there is no behaviour change for callers that don't set actionGlossary:
- src/config.ts: optional 'actionGlossary' string field on RawConfig, resolved to 'actionGlossaryPath' on ResolvedConfig (same pattern as flagSchema).
- rule-engine.config.json: points at rules/action-glossary.yaml.
- test/action-glossary.test.ts: 12 tests covering loader validation, diff-coverage, real-file coverage of rules/quality-gates.yaml, and sentence-shape invariants.
Also picks up Agent C's in-flight architecture.md updates. The system overview and data-flow diagrams now show the glossary node (read only by the renderer) and the new HTML report sections: system-identity header, what fired, reproduce block, and audit trail with non-matched rules collapsed. The renderer itself (Agent B) is still in flight. Suite: 135/135 passing (123 baseline + 12 new). All 21 unique action slugs in rules/quality-gates.yaml are mapped in the glossary.
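The RawConfig → ResolvedConfig step might look roughly like the sketch below. It assumes relative paths are resolved against the directory containing the config file, and that an absent optional field simply stays undefined. Both interfaces are hypothetical subsets, not the real src/config.ts types.

```typescript
import { dirname, resolve } from "node:path";

// Hypothetical subset of the on-disk config; only fields relevant here.
interface RawConfig {
  rules: string;
  flagSchema?: string;
  actionGlossary?: string;
}

// Hypothetical resolved form: every path is absolute.
interface ResolvedConfig {
  rulesPath: string;
  flagSchemaPath?: string;
  actionGlossaryPath?: string;
}

// Resolve paths relative to the directory holding the config file.
// Absent optional fields stay undefined, so callers that never set
// actionGlossary see no behaviour change.
function resolveConfig(raw: RawConfig, configFile: string): ResolvedConfig {
  const base = dirname(resolve(configFile));
  const optional = (value: unknown, field: string): string | undefined => {
    if (value === undefined) return undefined;
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`Config field '${field}' must be a non-empty string`);
    }
    return resolve(base, value);
  };
  return {
    rulesPath: resolve(base, raw.rules),
    flagSchemaPath: optional(raw.flagSchema, "flagSchema"),
    actionGlossaryPath: optional(raw.actionGlossary, "actionGlossary"),
  };
}
```

Keeping the glossary path on ResolvedConfig, rather than loading the file here, matches the render-only role described above. Only the HTML renderer needs to open it.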

…xtending
- audit-trail.md: the 'Mapping to EU AI Act' table now credits the HTML report as the Art. 13 human-readable rationale surface. The credited features are what-fired with human sentences, the system-identity header, the tier glossary, and the reproduce block. Closes the research/02 open item about explainability presentation.
- workflow.md: still in flight (agent C); this commit picks up partial edits.
- extending.md: new 'Authoring action human sentences' section documents rules/action-glossary.yaml as a render-only sidecar. The engine never reads it, so editing sentences cannot change a verdict.

Agent B (HTML report rebuild) and Agent C (README sync) are still running their RALPH loops. This commit snapshots the current on-disk state so the working tree stays clean between iterations:
- src/html-report.ts: partial changes from agent B (rebuild for research wave 4 findings). 135/135 tests still passing; the partial state is internally consistent even if not yet feature-complete.
- src/report.ts: corresponding plumbing changes from agent B.
- README.md: agent C's in-flight test-count + file-map sync.
Will be superseded by the next commit when both agents report final.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a479ada85d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…t B complete) Agent B's RALPH loop completed. Twelve convergent findings from research wave 4 are now realised in the renderer:
1. 'What fired' section above the full audit trail (UX/17 + critic/21 + auditor/20). The verdict-card stats line now reads 'N rule(s) fired · M action(s) to take'.
2. Non-matched rules collapsed via <details class='rule-collapsed'>; matched rules stay inline (UX/17).
3. Blocker-by-absence banner adjacent to the verdict card when any rule's condition reports 'flag missing in extraction' (critic/21 + UX/17). It uses a yellow palette and names the missing flags.
4. Suggested actions now sorted by priority-of-cause instead of alphabetically (UX/17). The renderer walks evaluations in priority-desc order and deduplicates, preserving first-seen order. result.actions is unchanged for machine consumers.
5. Action human-sentence rendering via rules/action-glossary.yaml; falls back to the bare slug when an entry is missing (stakeholder/18).
6. Provenance section: preamble explaining the hashes + 'How to reproduce' block + 12-char hash truncation (UX/17 + auditor/20 + stakeholder/18).
7. System-identity header above the verdict card: engine version + a prominent timestamp moved out of the footer (auditor/20).
8. Verdict-tier glossary + [+]/[-]/[?] glyph legend in a collapsed <details class='legend'> block (auditor/20 + UX/17).
9. cond--miss now has a faint red row-wash matching cond--missing's amber, so the visual distinction isn't glyph-color-only (UX/17).
10. @media (max-width: 540px) single-column fallback (UX/17).
11. Trust calibration (stakeholder/18 + critic/21). A --skip-validate banner is shown prominently when validationStatus='skipped'. The verified-badge tooltip explains it only means 'extraction is bound to current inputs', not 'flags are correct'.
12. Section headers in sentence case with periods, imperative voice: 'Take these actions.', not 'Suggested actions' (brand/19).
28 new tests in test/html-report.test.ts; suite total 163/163.
Three sample reports regenerated under research/sample-reports/ so reviewers see the new shape.
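The priority-of-cause ordering in item 4 could be implemented along the lines of the sketch below. It rests on two assumptions: the `Evaluation` shape is hypothetical, and only matched rules contribute actions.

```typescript
// Hypothetical minimal shape of one rule evaluation from the audit trail.
interface Evaluation {
  ruleId: string;
  priority: number;
  matched: boolean;
  actions: readonly string[];
}

// Order action slugs by the priority of the rule that first produced them.
// The walk visits matched evaluations highest-priority first and keeps each
// slug at its first-seen position. filter() returns a fresh array, so sorting
// it does not mutate the caller's input. Ties keep evaluation order because
// Array.prototype.sort is stable (ES2019+).
function actionsByPriorityOfCause(evals: readonly Evaluation[]): string[] {
  const ordered = evals
    .filter((e) => e.matched)
    .sort((a, b) => b.priority - a.priority);
  const seen = new Set<string>();
  const out: string[] = [];
  for (const e of ordered) {
    for (const slug of e.actions) {
      if (!seen.has(slug)) {
        seen.add(slug);
        out.push(slug);
      }
    }
  }
  return out;
}
```

This only changes presentation order. A separate, unsorted `result.actions` can stay as-is for machine consumers.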

…x round 11)
- open-browser.ts: success now requires the opener process to exit with code 0 (or null, i.e. signal-terminated), not merely to spawn. On headless Linux / CI, xdg-open spawns successfully and then immediately exits non-zero because no browser handler is registered. Previously, report.ts printed 'opened in browser' even though the file was never opened. The safety-net timeout is bumped to 1s for opener daemons that never deliver an exit event (#525 round 11 P2).
- cli-shared.ts: takeOpt now throws 'Missing value for option <flag>' when the flag is the last argv entry or is followed by another flag (#525 round 11 P2). Previously, 'npm run report -- --target' would silently fall back to 'all targets', and 'npm run report -- --target --quiet' would interpret '--quiet' as the target id.
- Two new takeOpt tests cover the missing-value rejection path. Suite: 165/165 passing.
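Below is a sketch of the stricter success check, using Node's `child_process.spawn`. The function name `runOpener` is hypothetical, and so is the timeout policy. The commit does not say how a timeout is scored, so this sketch assumes an opener still running after the safety-net window counts as success.

```typescript
import { spawn } from "node:child_process";

// Resolve true only if the opener exits with code 0, or exits with a null
// code (terminated by a signal). A successful spawn alone is not enough:
// on headless Linux, xdg-open starts and then exits non-zero.
function runOpener(
  command: string,
  args: readonly string[],
  timeoutMs = 1000,
): Promise<boolean> {
  return new Promise<boolean>((done) => {
    let settled = false;
    const child = spawn(command, [...args], { stdio: "ignore" });
    // Safety net for openers that never report an exit event.
    // ASSUMPTION: a still-running opener is treated as success.
    const timer = setTimeout(() => {
      child.unref(); // let the CLI exit even if the opener keeps running
      finish(true);
    }, timeoutMs);
    // Spawn failure, e.g. ENOENT when the opener binary is not installed.
    child.on("error", () => finish(false));
    child.on("exit", (code) => finish(code === 0 || code === null));

    function finish(ok: boolean): void {
      if (settled) return; // 'error' and 'exit' may both fire
      settled = true;
      clearTimeout(timer);
      done(ok);
    }
  });
}
```

The `settled` guard matters because Node can emit both `error` and `exit` for the same child. Without it, the second event could overwrite the first result.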

Luis85 · 2026-05-17T13:17:06Z

@codex review

Three commits since last review:

28c84e9 — action glossary (sidecar): rules/action-glossary.yaml (21 entries) + src/action-glossary.ts + 12 tests; wired into config as optional actionGlossary field.

ded7400 — HTML report v3 (the wave-4 implementer pass): 12 convergent findings from research/17–21 realised:

What fired section above the audit trail (matched rules first, in priority order)
Audit-trail collapse for non-matched rules via <details> (default closed)
Blocker-by-absence banner adjacent to verdict card naming the missing flags
Suggested actions sorted by priority-of-cause (not alphabetic)
Action human sentences rendered from the glossary (graceful slug fallback)
Provenance reframe: preamble + "How to reproduce" block + 12-char hash truncation
System-identity header above the verdict card with prominent timestamp
Verdict-tier + glyph legend in a collapsed <details>
cond--miss row wash so the miss-vs-missing distinction isn't glyph-color-only
Mobile fallback @media (max-width: 540px)
Trust calibration: --skip-validate banner + verified badge tooltip
Sentence-case section headers, imperative voice ("Take these actions.")

28 new tests in test/html-report.test.ts; three sample reports regenerated under research/sample-reports/.

90f3fe1 — Codex round 11 fixes:

open-browser.ts: success now requires exit code 0 (or null), not just spawn. Headless / CI environments where xdg-open spawns then exits non-zero are now correctly reported as failed-to-open.
cli-shared.ts::takeOpt: throws Missing value for option '<flag>' when the next argv entry is missing or another flag. Previously npm run report -- --target would silently run against all targets.

Suite: 165/165 green. Typos clean.

Generated by Claude Code

chatgpt-codex-connector

💡 Codex Review


Reviewed commit: 90f3fe10f5


…odex round 12)
- html-report.ts missingFlagNames: only count rules whose final outcome was determined by the missing flag (matched === false). Previously, a 'when.any' rule with one matched branch + one missing branch counted as un-evaluable in the banner, even though it contributed to the verdict (#525 round 12 P2).
- html-report.ts reproCmd: paths are now single-quoted via a small shellQuote helper that escapes each embedded ' as the standard four-char '\'' sequence. Paths with spaces (e.g., 'My Projects/rules.yaml') no longer break the copy-pasted reproduce command (#525 round 12 P2).
- Two new html-report tests. A when.any rule with a missing branch is NOT counted in the banner, and reproCmd contains HTML-escaped quoted paths. Suite: 167/167 passing.
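The corrected banner logic is sketched below. It assumes each evaluation records whether the rule matched, plus a per-condition status where 'missing' means the flag was absent from the extraction. The types and status names are hypothetical; only the `matched === false` filter comes from the commit.

```typescript
// Hypothetical per-condition result. "missing" means the flag was absent
// from the extraction, so the condition could not be evaluated.
interface ConditionResult {
  flag: string;
  status: "hit" | "miss" | "missing";
}

interface RuleEvaluation {
  ruleId: string;
  matched: boolean;
  conditions: readonly ConditionResult[];
}

// Collect flags that plausibly kept a rule from firing. Rules that matched
// anyway (e.g. a when.any rule with one satisfied branch) are skipped,
// because their missing branch did not change the verdict.
function missingFlagNames(evals: readonly RuleEvaluation[]): string[] {
  const names = new Set<string>();
  for (const e of evals) {
    if (e.matched) continue;
    for (const c of e.conditions) {
      if (c.status === "missing") names.add(c.flag);
    }
  }
  return Array.from(names).sort(); // stable, deduplicated banner text
}
```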

Luis85 · 2026-05-17T13:23:25Z

@codex review

Round 12 P2s addressed in eb01077:

html-report.ts::missingFlagNames: now only counts rules whose final outcome was determined by the missing flag (matched === false). A when.any rule with one matched branch + one missing branch is no longer claimed as un-evaluable since it contributed to the verdict.
html-report.ts::reproCmd: paths shell-quoted via a small shellQuote helper that escapes embedded single quotes as the standard four-char '\'' sequence. Paths with spaces no longer break the copy-pasted reproduce command.

Two new html-report.test.ts tests pin the corrected behaviours. Suite: 167/167 green.
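The reproduce block needs two escaping layers: POSIX single-quoting for the shell, then HTML escaping for the report. `shellQuote` matches the helper's name in the PR, but this body and `escapeHtml` are illustrative, not the code in html-report.ts. The quoting targets POSIX shells; PowerShell and cmd.exe follow different rules.

```typescript
// POSIX single-quoting: inside '...' nothing is special except the closing
// quote itself. Each embedded ' therefore becomes '\'' (close the quoted
// string, emit an escaped quote, reopen).
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Minimal HTML escaping for text placed inside the report.
// & is replaced first so later entities are not double-escaped.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

The order of the layers matters: quote for the shell first, then escape for HTML. The browser decodes the entities, so the reader copies exactly the shell-quoted string.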

Generated by Claude Code

chatgpt-codex-connector

💡 Codex Review


Reviewed commit: eb01077fc6


…odex round 13) src/cli.ts has its own takeOption(), because the single-shot fixture flow doesn't use src/cli-shared.ts. It had the same bug as round 11 P2 in cli-shared. When --html had no value, the helper returned undefined and the CLI silently proceeded with no HTML output, breaking automation that relies on the artifact being written. It now fails fast with a clear stderr message and exit code 2 when the option is the last argv entry or is followed by another flag. Suite: 167/167 still passing (no test exercised the silent-skip path).
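Both helpers rest on one idea: a value-taking flag must be followed by a non-flag argument. The sketch below shows that check plus a fail-fast wrapper that maps the usage error to stderr and exit code 2. `takeOption`, `UsageError`, and `main` here are hypothetical stand-ins. Treating any argument that begins with `--` as a flag is an assumption, and it means values such as `-5` are still accepted.

```typescript
// Dedicated error type so the CLI can tell usage errors from engine failures.
class UsageError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UsageError";
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof reliable
  }
}

// Return the value after `flag`, undefined if the flag is absent, or throw
// if the flag is last or is directly followed by another `--` flag.
function takeOption(argv: readonly string[], flag: string): string | undefined {
  const i = argv.indexOf(flag);
  if (i === -1) return undefined; // absent: the caller applies its default
  const value = argv[i + 1];
  if (value === undefined || value.startsWith("--")) {
    throw new UsageError(`Missing value for option '${flag}'`);
  }
  return value;
}

// Fail fast: usage errors go to stderr with exit code 2; anything else rethrows.
function main(argv: readonly string[]): number {
  try {
    const htmlOut = takeOption(argv, "--html");
    // Placeholder: a real CLI would evaluate the fixture and write HTML here.
    if (htmlOut !== undefined) {
      process.stdout.write(`would write HTML report to ${htmlOut}\n`);
    }
    return 0;
  } catch (err) {
    if (err instanceof UsageError) {
      process.stderr.write(`${err.message}\n`);
      return 2;
    }
    throw err;
  }
}
```

Exit code 2 is the conventional 'usage error' status for Unix command-line tools. It lets automation distinguish a malformed invocation from a failed evaluation.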

chatgpt-codex-connector

💡 Codex Review


Reviewed commit: 003a05ed50


… 14)
- loader.ts: every entry of 'then.actions' must be a non-empty string slug (#525 round 14 P2). Previously, numbers, objects, and empty strings passed load-time validation. They flowed into the HTML reporter as unrecognised tokens that couldn't map to a glossary entry, breaking the remediation guidance the verdict is meant to provide.
- loader.ts: 'gt' and 'lt' now reject NaN and Infinity at load time (#525 round 14 P2). Both are technically of type 'number' but silently corrupt comparisons at runtime (every comparison with NaN is false), so a typo could make a gating rule unexpectedly never fire.
- Four new loader tests cover non-string action elements, empty-string action elements, NaN gt, and Infinity lt. Suite: 171/171 passing (167 + 4).
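The two load-time checks are sketched below as standalone validators. The function names and error texts are hypothetical stand-ins for what src/loader.ts does; the point is that `typeof` alone cannot catch these cases.

```typescript
// Every then.actions element must be a non-empty string slug.
function assertActionSlugs(ruleId: string, actions: unknown): string[] {
  if (!Array.isArray(actions)) {
    throw new Error(`Rule '${ruleId}': 'then.actions' must be an array`);
  }
  actions.forEach((a, i) => {
    if (typeof a !== "string" || a.length === 0) {
      throw new Error(`Rule '${ruleId}': then.actions[${i}] must be a non-empty string slug`);
    }
  });
  return actions as string[];
}

// typeof NaN === "number", so a typeof check alone lets YAML .nan / .inf
// through. Number.isFinite rejects NaN, Infinity and -Infinity.
function assertFiniteBound(ruleId: string, op: "gt" | "lt", value: unknown): number {
  if (typeof value !== "number" || !Number.isFinite(value)) {
    throw new Error(`Rule '${ruleId}': '${op}' must be a finite number`);
  }
  return value;
}
```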

New docs/compliance.md walks through the standards and regulations an adopter is most likely to be asked about in 2026. It catalogues what the POC contributes and what stays the adopter's job. The doc synthesises the two prior research passes (research/02 regulatory + research/20 auditor readability) into reference material for scoping conversations. Covered:
- EU AI Act Art. 11-14 + Art. 72 with a per-article table.
- ISO/IEC 42001 AIMS clauses 6-10.
- ISO/IEC 23894 AI risk management.
- NIST AI RMF Govern / Map / Measure / Manage.
- GDPR Art. 22 (when it applies vs when it doesn't).
- OECD AI Principles (1-paragraph summary).
Plus:
- 'What the POC ticks natively': per-artifact provenance.
- 'What is NOT in this POC': honest gap analysis.
- Maturity checklist before production, with rough effort estimates.
Leads with a disclaimer that this is engineering reference material, not legal advice or certification. docs/README.md updated to index the new doc.

…/git/Luis85/agentic-workflow into claude/rule-engine-poc-gO5yq

chatgpt-codex-connector

💡 Codex Review


Reviewed commit: cb23228592


chatgpt-codex-connector · 2026-05-17T13:51:45Z

+  if (typeof rule.priority !== "number") {
+    throw new Error(`Rule '${rule.id}' missing numeric 'priority'`);
+  }


Reject non-finite rule priorities during load

The loader only validates priority with typeof === "number", so YAML values like .nan or .inf are accepted. In evaluate, these values feed the sort comparator, and NaN in particular causes b.priority - a.priority to become NaN, which is treated as an equal comparison and bypasses the intended priority ordering. This silently turns a malformed rule into different rule/action ordering behavior instead of failing fast like other numeric fields.
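One way to close this gap is to validate with `Number.isFinite`, which (unlike `typeof`) rejects NaN and ±Infinity, and only then sort. This sketch assumes rules arrive as parsed objects; `assertPriority` and `sortByPriorityDesc` are hypothetical names. With a NaN priority, the comparator returns NaN, which `Array.prototype.sort` treats like 0, so ordering silently falls back to input order.

```typescript
interface RuleLike {
  id: string;
  priority: number;
}

// Fail fast on missing or non-finite priorities instead of letting NaN
// reach the comparator.
function assertPriority(rule: { id: string; priority?: unknown }): void {
  if (typeof rule.priority !== "number" || !Number.isFinite(rule.priority)) {
    throw new Error(`Rule '${rule.id}' missing finite numeric 'priority'`);
  }
}

// Validate every rule, then sort highest priority first on a copy.
// Array.prototype.sort is stable (ES2019+), so equal priorities keep file order.
function sortByPriorityDesc<T extends RuleLike>(rules: readonly T[]): T[] {
  for (const r of rules) assertPriority(r);
  return [...rules].sort((a, b) => b.priority - a.priority);
}
```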


chatgpt-codex-connector · 2026-05-17T13:51:45Z

+  for (const raw of paths) {
+    const abs = resolve(options.baseDir, raw);
+    walk(abs, extensions, collected);
+  }


Deduplicate files when collecting overlapping target paths

collectFiles walks each configured path independently and appends every discovered file without deduplication. If a target includes overlapping entries (for example both a directory and a file inside it, or the same path twice), the same source file is injected multiple times into the extraction prompt. That needlessly increases token usage and can skew extraction outcomes because duplicated evidence is overrepresented.
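Below is a sketch of the deduplicating walk Codex suggests. It assumes the real walker recurses into directories and filters by extension, as the snippet implies. This `collectFiles` is a self-contained stand-in, not the PR's implementation. Keying on `path.resolve` output catches repeated and overlapping entries, but not symlinked aliases; those would need `fs.realpathSync`.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Walk each configured path but record every file once, keyed by its
// absolute path. Overlapping entries (a directory plus a file inside it,
// or the same path twice) then cannot inject duplicate evidence.
function collectFiles(
  paths: readonly string[],
  baseDir: string,
  extensions: readonly string[],
): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  const walk = (abs: string): void => {
    if (fs.statSync(abs).isDirectory()) {
      // Sorted so file order, and therefore the prompt, is deterministic.
      for (const name of fs.readdirSync(abs).sort()) walk(path.join(abs, name));
    } else if (extensions.includes(path.extname(abs)) && !seen.has(abs)) {
      seen.add(abs);
      out.push(abs);
    }
  };
  for (const raw of paths) walk(path.resolve(baseDir, raw));
  return out;
}
```

Deduplicating at the file level keeps first-seen order. The final file list therefore still follows the order of the configured paths.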


chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/loader.ts

Comment thread experiments/rule-engine-poc/src/loader.ts

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/engine.ts

Comment thread experiments/rule-engine-poc/src/engine.ts

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/engine.ts

Comment thread experiments/rule-engine-poc/src/loader.ts

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/loader.ts Outdated

Comment thread experiments/rule-engine-poc/src/engine.ts

Comment thread experiments/rule-engine-poc/src/engine.ts Outdated

claude added 2 commits May 17, 2026 11:56

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/open-browser.ts Outdated

Comment thread experiments/rule-engine-poc/src/loader.ts

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/open-browser.ts Outdated

Comment thread experiments/rule-engine-poc/src/report.ts Outdated

Comment thread experiments/rule-engine-poc/src/prompt-builder.ts Outdated

claude added 2 commits May 17, 2026 12:11

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/cli.ts Outdated

Comment thread experiments/rule-engine-poc/src/context.ts

claude added 2 commits May 17, 2026 12:16

claude added 4 commits May 17, 2026 12:22

docs(rule-engine-poc): research wave 3 strategist re-evaluation (research/13)

3140320

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/config.ts

Comment thread experiments/rule-engine-poc/src/context.ts Outdated

claude added 2 commits May 17, 2026 12:34

claude added 11 commits May 17, 2026 12:57

docs(rule-engine-poc): commit sample HTML reports for research wave 4

1d106c2

Three rendered reports (ready, blocked, needs-attention) for the agents to inspect when reviewing report readability.

docs(rule-engine-poc): agent C audit-trail.md second-pass refinement

a479ada

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/cli-shared.ts

Comment thread experiments/rule-engine-poc/src/open-browser.ts Outdated

claude added 2 commits May 17, 2026 13:15

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/html-report.ts

Comment thread experiments/rule-engine-poc/src/html-report.ts Outdated

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/cli.ts

chatgpt-codex-connector Bot reviewed May 17, 2026


Comment thread experiments/rule-engine-poc/src/loader.ts Outdated

Comment thread experiments/rule-engine-poc/src/loader.ts

claude and others added 4 commits May 17, 2026 13:37

Merge branch 'develop' into claude/rule-engine-poc-gO5yq

b1bf4b1

Merge branch 'claude/rule-engine-poc-gO5yq' of http://127.0.0.1:43891/git/Luis85/agentic-workflow into claude/rule-engine-poc-gO5yq

cb23228

Luis85 merged commit dd9fcab into develop May 17, 2026
6 checks passed

Luis85 deleted the claude/rule-engine-poc-gO5yq branch May 17, 2026 13:50

chatgpt-codex-connector Bot reviewed May 17, 2026


Luis85 mentioned this pull request May 17, 2026

docs(rule-engine-poc): single-page HTML report reference #526

Open

4 tasks
