
Add prompt adherence check to pipeline#544

Merged
neoneye merged 19 commits into main from feature/prompt-adherence
Apr 9, 2026

Conversation


@neoneye neoneye commented Apr 9, 2026

Summary

  • Adds a Prompt Adherence step that checks how faithfully the final plan follows the original user prompt
  • Two-phase LLM approach: Phase 1 extracts directives (constraints, stated facts, requirements, banned words, intent), Phase 2 scores each against the final plan
  • Each directive scored with adherence_5 (1-5 Likert) and categorized as Fully honored, Partially honored, Softened, Ignored, Contradicted, or Unsolicited caveat
  • Produces a markdown report with summary table (sorted by ID), overall adherence percentage, and detailed Issues section for all non-perfect items
  • Last section in the final HTML report
  • Splits plan.txt into plan_raw.json (raw user prompt + date as JSON) + SetupTask (template). Prompt adherence reads the raw user prompt directly, not the templated version.
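The Phase 2 output described above could be modeled roughly like this. This is a hedged sketch: only the adherence_5 field, the six categories, the incrementing integer ID, and the use of Pydantic models with Literal types are stated in the PR; all class and field names are assumptions.

```python
from typing import Literal

from pydantic import BaseModel, Field

# The six adherence categories from the PR, in snake_case as hinted by
# "partially_honored" in one of the commit messages.
AdherenceStatus = Literal[
    "fully_honored",
    "partially_honored",
    "softened",
    "ignored",
    "contradicted",
    "unsolicited_caveat",
]

class DirectiveScore(BaseModel):
    # Incrementing integer; prevents random ordering from the LLM.
    id: int
    directive: str
    # 1-5 Likert score.
    adherence_5: int = Field(ge=1, le=5)
    status: AdherenceStatus

class PromptAdherenceResult(BaseModel):
    scores: list[DirectiveScore]
```

Using Literal[...] instead of a bare Enum matches the lint-rule fix mentioned in the commit log below.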

Motivation

PlanExe's pipeline has a "normalization bias" — each step nudges the plan toward what a reasonable project should look like. Over ~70 nodes the cumulative drift is significant. User-stated facts get overridden, requirements get softened, and the plan adds unsolicited feasibility studies. This step surfaces that drift so the user can see it at a glance.

Test plan

  • 7 unit tests for Pydantic models, markdown generation, and score calculation
  • 330 total tests pass (no regressions)
  • extract_dag picks up the new node (71 nodes)
  • Manual test with real pipeline runs

🤖 Generated with Claude Code

neoneye and others added 19 commits April 9, 2026 17:17
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eration

Add PromptAdherenceTask to full_plan_pipeline.py requires() dict and
report.py requires() dict and run_inner(). Also fix bare Enum types
in prompt_adherence.py Pydantic models to use Literal[...] as required
by the codebase lint rules.

An incrementing integer ID prevents random ordering from the LLM.

The report generator adds its own section header.

The app now saves plan_raw.json (user prompt + date as JSON).
SetupTask reads plan_raw.json and produces plan.txt from a template.
This separates the raw user input from the formatted pipeline input.
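The split described above might look roughly like this. A hedged sketch: the PR only says plan_raw.json holds the user prompt and date as JSON and that SetupTask templates it into plan.txt; the key names and template text here are hypothetical.

```python
import json
from string import Template

# Hypothetical plan_raw.json content (key names are assumptions).
plan_raw = {"prompt": "Build a solar farm in Denmark", "date": "2026-04-09"}
plan_raw_json = json.dumps(plan_raw)

# Hypothetical SetupTask-style templating into plan.txt.
template = Template("Plan:\n$prompt\n\nToday's date:\n$date")
loaded = json.loads(plan_raw_json)
plan_txt = template.substitute(prompt=loaded["prompt"], date=loaded["date"])
```

The point of the split is that downstream steps such as prompt adherence can read plan_raw rather than the templated plan_txt.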

Uses the raw user prompt directly, not the templated plan.txt.

Changed threshold from adherence <= 3 to adherence < 5 so
partially_honored items appear in the Issues list.

Displays the formula below the overall score, e.g.:
(5×5 + 4×4 + 5×3 + ...) / 250 = 94%

Shows three lines:
IMPORTANCE_ADHERENCE_SUM = (5×5 + 3×4 + ...) = 205
IMPORTANCE_SUM = 5 + 3 + ... = 41
OVERALL_ADHERENCE = IMPORTANCE_ADHERENCE_SUM / (IMPORTANCE_SUM × 5) = 205 / 205 = 100%
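The three-line breakdown can be sketched as a small function (the function and variable names are illustrative, not taken from the codebase):

```python
def overall_adherence(items: list[tuple[int, int]]) -> float:
    """items: (importance, adherence_5) pairs, adherence_5 on a 1-5 Likert scale."""
    importance_adherence_sum = sum(imp * adh for imp, adh in items)
    importance_sum = sum(imp for imp, _ in items)
    # Normalize by the maximum possible weighted score, importance_sum * 5.
    return importance_adherence_sum / (importance_sum * 5)

# Two directives, both fully honored -> 1.0 (i.e. 100%):
# overall_adherence([(5, 5), (3, 5)])
```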

Code blocks in sections like Prompt Adherence now render as
<pre><code> instead of inline <code>, preserving line breaks.
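One way such a fix might look. This is a hypothetical regex-based sketch, not the actual report generator code, which is not shown in the PR:

```python
import re

def fences_to_pre(markdown: str) -> str:
    """Convert fenced code blocks to block-level <pre><code>, preserving
    line breaks, instead of collapsing them into inline <code> spans."""
    return re.sub(
        r"```[^\n]*\n(.*?)```",
        lambda m: "<pre><code>" + m.group(1) + "</code></pre>",
        markdown,
        flags=re.DOTALL,
    )
```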

@neoneye neoneye merged commit 7da1664 into main Apr 9, 2026
3 checks passed
@neoneye neoneye deleted the feature/prompt-adherence branch April 9, 2026 23:07
