Add prompt adherence check to pipeline by neoneye · Pull Request #544 · PlanExeOrg/PlanExe

neoneye · 2026-04-09T15:30:39Z

Summary

Adds a Prompt Adherence step that checks how faithfully the final plan follows the original user prompt.
Two-phase LLM approach: Phase 1 extracts directives (constraints, stated facts, requirements, banned words, intent); Phase 2 scores each directive against the final plan.
Each directive is scored with adherence_5 (1-5 Likert scale) and categorized as Fully honored, Partially honored, Softened, Ignored, Contradicted, or Unsolicited caveat.
Produces a markdown report with a summary table (sorted by ID), an overall adherence percentage, and a detailed Issues section covering every directive scored below 5.
Appears as the last section of the final HTML report.
Splits plan.txt creation into two steps: the app saves plan_raw.json (raw user prompt and date, as JSON), and SetupTask renders plan.txt from a template. Prompt adherence reads the raw user prompt directly, not the templated version.
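
As a rough illustration of the scored-directive record described in the summary, here is a minimal sketch using only the standard library. The PR itself uses Pydantic models; apart from `adherence_5` and `directive_index`, every name here (`ScoredDirective`, `importance_5`, the snake_case category strings) is a hypothetical stand-in, not the PR's actual schema.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Hypothetical snake_case spellings of the six categories named in the summary.
Category = Literal[
    "fully_honored",
    "partially_honored",
    "softened",
    "ignored",
    "contradicted",
    "unsolicited_caveat",
]


@dataclass(frozen=True)
class ScoredDirective:
    directive_index: int  # incrementing integer ID
    text: str             # the directive extracted in Phase 1
    importance_5: int     # assumed 1-5 weight (field name is a guess)
    adherence_5: int      # 1-5 Likert score assigned in Phase 2
    category: Category

    def __post_init__(self) -> None:
        # Reject scores outside the 1-5 Likert range.
        for name in ("importance_5", "adherence_5"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be in 1..5, got {value}")
        # Literal is not enforced at runtime, so check membership explicitly.
        if self.category not in get_args(Category):
            raise ValueError(f"unknown category: {self.category!r}")
```

Using `Literal[...]` rather than a bare `Enum` mirrors the lint rule mentioned in the commit messages; the runtime checks stand in for what Pydantic validation would normally provide.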

Motivation

PlanExe's pipeline has a "normalization bias" — each step nudges the plan toward what a reasonable project should look like. Over ~70 nodes the cumulative drift is significant. User-stated facts get overridden, requirements get softened, and the plan adds unsolicited feasibility studies. This step surfaces that drift so the user can see it at a glance.

Test plan

7 unit tests for Pydantic models, markdown generation, and score calculation
330 total tests pass (no regressions)
extract_dag picks up the new node (71 nodes)
Manual test with real pipeline runs

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add PromptAdherenceTask to full_plan_pipeline.py requires() dict and report.py requires() dict and run_inner(). Also fix bare Enum types in prompt_adherence.py Pydantic models to use Literal[...] as required by the codebase lint rules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The app now saves plan_raw.json (user prompt + date as JSON). SetupTask reads plan_raw.json and produces plan.txt from a template. This separates the raw user input from the formatted pipeline input. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
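
The split between raw input and templated input can be sketched roughly as follows. This is only an illustration under stated assumptions: the JSON keys (`prompt`, `date`), the template text, and the function names are all hypothetical, since the PR does not show the real file layout.

```python
import json
from string import Template

# Hypothetical template; the real plan.txt format is not shown in the PR.
PLAN_TEMPLATE = Template("Plan:\n$prompt\n\nToday's date:\n$date\n")


def save_plan_raw(prompt: str, date: str) -> str:
    """What the app would write to plan_raw.json (assumed keys)."""
    return json.dumps({"prompt": prompt, "date": date}, indent=2)


def render_plan_txt(plan_raw_json: str) -> str:
    """What a SetupTask-like step would write to plan.txt."""
    raw = json.loads(plan_raw_json)
    return PLAN_TEMPLATE.substitute(prompt=raw["prompt"], date=raw["date"])


def read_raw_prompt(plan_raw_json: str) -> str:
    """What the adherence step would read: the untouched user prompt."""
    return json.loads(plan_raw_json)["prompt"]
```

Keeping the raw prompt in its own file means the adherence check compares the final plan against exactly what the user typed, with no template text mixed in.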

Shows three lines:
IMPORTANCE_ADHERENCE_SUM = (5×5 + 3×4 + ...) = 193
IMPORTANCE_SUM = 5 + 3 + ... = 41
OVERALL_ADHERENCE = IMPORTANCE_ADHERENCE_SUM / (IMPORTANCE_SUM × 5) = 193 / 205 = 94%
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
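
The importance-weighted score in this commit message can be worked through with a small sketch. The function names and the `(importance, adherence)` tuple representation are assumptions made for illustration; only the formula itself comes from the commit message.

```python
from typing import List, Tuple


def overall_adherence(scores: List[Tuple[int, int]]) -> float:
    """Return the weighted adherence ratio for (importance, adherence) pairs.

    Each value is expected to be on the 1-5 scale, so the best possible
    numerator is IMPORTANCE_SUM * 5.
    """
    if not scores:
        raise ValueError("at least one scored directive is required")
    importance_adherence_sum = sum(imp * adh for imp, adh in scores)
    importance_sum = sum(imp for imp, _ in scores)
    return importance_adherence_sum / (importance_sum * 5)


def format_score_math(scores: List[Tuple[int, int]]) -> str:
    """Render the three-line breakdown in the style the commit describes."""
    products = " + ".join(f"{imp}×{adh}" for imp, adh in scores)
    weights = " + ".join(str(imp) for imp, _ in scores)
    ia_sum = sum(imp * adh for imp, adh in scores)
    i_sum = sum(imp for imp, _ in scores)
    ratio = overall_adherence(scores)
    return "\n".join([
        f"IMPORTANCE_ADHERENCE_SUM = ({products}) = {ia_sum}",
        f"IMPORTANCE_SUM = {weights} = {i_sum}",
        f"OVERALL_ADHERENCE = {ia_sum} / ({i_sum} × 5) = {ratio:.0%}",
    ])
```

For example, three directives scored `(5, 5)`, `(3, 4)`, and `(2, 5)` give 47 / (10 × 5) = 94%. Dividing by `IMPORTANCE_SUM × 5` normalizes against the score a plan would get if every directive were fully honored.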

neoneye and others added 19 commits April 9, 2026 17:17

docs: add prompt adherence design spec

c197a6c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add prompt adherence implementation plan

818686d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add FilenameEnum entries for prompt adherence

4d8bea0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add prompt adherence Pydantic models, prompts, and markdown generation

06bfad3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add PromptAdherenceTask Luigi node

2ad532f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor: use directive_index (int) instead of directive_id (str)

de9f0b2

Incrementing integer prevents random ordering from the LLM. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use human-readable category labels in markdown output

ee655a9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use human-readable directive type labels in markdown output

4ceff77

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: move Prompt Adherence to last section in report

3876d99

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: sort summary table by ID, use "Issue N - title" format in issues

ca653eb

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: remove h1 header from prompt adherence markdown

57af3c6

The report generator adds its own section header. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: read plan_prompt from plan_raw.json in PromptAdherenceTask

b691e1a

Uses the raw user prompt directly, not the templated plan.txt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: show all non-perfect directives in Issues section

ee32bd3

Changed threshold from adherence <= 3 to adherence < 5 so partially_honored items appear in the Issues list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
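
The effect of the threshold change can be seen in a minimal sketch. The dict keys and function names are hypothetical; only the two conditions (`adherence <= 3` before, `adherence < 5` after) are taken from the commit message.

```python
from typing import Dict, List


def issues_old_rule(directives: List[Dict]) -> List[Dict]:
    # Previous behaviour: only clearly poor scores were listed.
    return [d for d in directives if d["adherence_5"] <= 3]


def issues_new_rule(directives: List[Dict]) -> List[Dict]:
    # New behaviour: every non-perfect directive is listed.
    return [d for d in directives if d["adherence_5"] < 5]


# Illustrative data, not taken from a real pipeline run.
directives = [
    {"directive_index": 1, "adherence_5": 5},  # perfect: never listed
    {"directive_index": 2, "adherence_5": 4},  # listed only under the new rule
    {"directive_index": 3, "adherence_5": 2},  # listed under both rules
]
```

With this data the old rule reports only directive 3, while the new rule reports directives 2 and 3, so a score of 4 is no longer silently dropped from the Issues section.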

feat: show adherence score math in markdown report

12cca36

Displays the formula below the overall score, e.g.: (5×5 + 4×4 + 5×3 + ...) / 250 = 94% Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: wrap adherence formula in code block

b28445c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: enable fenced_code in markdown_with_tables rendering

0cbb55d

Code blocks in sections like Prompt Adherence now render as <pre><code> instead of inline <code>, preserving line breaks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

neoneye merged commit 7da1664 into main Apr 9, 2026
3 checks passed

neoneye deleted the feature/prompt-adherence branch April 9, 2026 23:07
