fix: anonymize few-shot examples in identify_documents prompts (#582)
Merged
The three `IDENTIFY_DOCUMENTS_*_SYSTEM_PROMPT` variants contained concrete, domain-named examples (childcare subsidy policies, housing price indices, mental health survey data, marathon training, climate crop yield, etc.) that the LLM reproduced verbatim for unrelated plans.

Observed in a Denmark euro-adoption run: `documents_to_find` was populated with the exact social-policy names from the business prompt's NAMING CONVENTION and EXAMPLE MAPPING sections, while the downstream `DraftDocumentsToFindTask` correctly filled euro-adoption content into `essential_information`, producing a chimera output.

Replace the concrete named examples in the Documents to Find sections (EXAMPLE MAPPING, FORBIDDEN, NAMING CONVENTION) with bracketed placeholders, and mark the EXAMPLE MAPPING blocks as "illustrative patterns only — do NOT reuse these topics." The structural pattern is preserved; only the domain hooks the LLM was latching onto are removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
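The placeholder rewrite can be illustrated with a minimal sketch. Both strings below are hypothetical stand-ins, not the actual prompt text from `identify_documents.py`:

```python
# Hypothetical before/after sketch of the EXAMPLE MAPPING anonymization.
# Neither string is the real prompt; they only illustrate the pattern.

CONCRETE_EXAMPLE_MAPPING = """EXAMPLE MAPPING:
- "Improve childcare access" -> "Existing National Childcare Subsidy Policies"
- "Stabilize housing market" -> "National Housing Price Indices"
"""

# Domain-specific names replaced with bracketed placeholders; the
# structural pattern (plan goal -> named source document) is preserved.
ANONYMIZED_EXAMPLE_MAPPING = """EXAMPLE MAPPING (illustrative patterns only — do NOT reuse these topics):
- "[plan goal]" -> "[Official policy document governing the goal's domain]"
- "[plan metric]" -> "[National statistical dataset tracking that metric]"
"""
```

The anonymized block keeps the mapping shape the LLM needs while leaving no concrete domain string for it to copy into unrelated plans.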
neoneye added a commit that referenced this pull request on Apr 17, 2026:
Previous fix (00d997e) added out-of-scope exclusions for legislation, treaties, currency adoption, etc. — but left the concrete physics examples inside the same instruction: "perpetual motion, faster-than-light travel, reactionless/anti-gravity propulsion, time travel" plus "thermodynamics, conservation of energy, relativity".

On a Denmark euro-adoption run, item 1 was rated HIGH with the justification: "success literally requires breaking the named law of physics (conservation of energy) for a reactionless/anti-gravity propulsion system." Both "reactionless/anti-gravity propulsion" and "conservation of energy" are lifted verbatim from the instruction's own example lists. This is the same bleedthrough pattern as PR #582 on identify_documents: concrete few-shot examples get reproduced as findings by weaker models.

Fix: strip the sci-fi system examples and the named-law examples from the instruction. Add an explicit anti-fabrication rule: the model must quote text from the plan describing a physics-violating mechanism, or else rate LOW. For ratings of MEDIUM or higher, require the justification to quote the plan text alongside naming the violated law.

Scope: item 1 only. Does not touch Bug B (template lock across audit items 4-20 sharing identical justifications) — that is a separate concern and will be addressed in a follow-up PR if needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
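The anti-fabrication rule could also be enforced mechanically after the LLM call. A minimal sketch, assuming a simple quote-matching heuristic (the function name, rating labels, and 20-character threshold are illustrative assumptions, not code from the repository):

```python
import re

# Hypothetical post-hoc gate for the physics-violation audit item (item 1).
def gate_physics_rating(rating: str, justification: str, plan_text: str) -> str:
    """Downgrade a MEDIUM/HIGH rating to LOW unless the justification
    contains a double-quoted span (>= 20 chars) found verbatim in the
    plan text."""
    if rating not in ("MEDIUM", "HIGH"):
        return rating
    for quoted in re.findall(r'"([^"]{20,})"', justification):
        if quoted in plan_text:
            return rating  # grounded in the plan: keep the rating
    return "LOW"  # no verbatim plan quote: treat the finding as fabricated
```

A gate like this would have caught the euro-adoption false positive, since "reactionless/anti-gravity propulsion" appears nowhere in the plan text itself.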
Summary
The `IDENTIFY_DOCUMENTS_*_SYSTEM_PROMPT` variants in `worker_plan_internal/document/identify_documents.py` contained concrete, domain-named examples (childcare subsidy policies, housing price indices, mental health survey data, marathon training, climate crop yield, etc.) that the LLM reproduced verbatim for unrelated plans.

Observed in a Denmark euro-adoption run (`PlanExe-web/20260129_euro_adoption/`): `identified_documents_to_find.json` was populated with the exact social-policy document names from the business prompt's `NAMING CONVENTION` and `EXAMPLE MAPPING` sections. The downstream `DraftDocumentsToFindTask` (a separate per-document LLM call) correctly filled euro-adoption content into `essential_information`, producing a chimera output in `draft_documents_to_find.json`.

Fix: replace the concrete named examples in the Documents to Find sections (`EXAMPLE MAPPING`, `FORBIDDEN`, `NAMING CONVENTION`) with bracketed placeholders across all three variants (business / personal / other), and mark the `EXAMPLE MAPPING` blocks as "illustrative patterns only — do NOT reuse these topics." Structural guidance is preserved; only the domain hooks the LLM was latching onto are removed.

Root cause
The prompt's illustrative examples acted as a domain prior. For a Denmark euro-adoption plan, the LLM generated documents named "Existing National Childcare Subsidy Policies", "Data on Average Childcare Costs", "Tax Code Sections Related to Dependents", "National Housing Price Indices", "Official National Mental Health Survey Data", and "Existing Social Support Program Details" — all literal strings from the system prompt examples.

Test plan
- Verify `identified_documents_to_find.json` contains euro-adoption-relevant source materials (treaty texts, ERM II convergence criteria, ECB monetary policy, payment-system standards, currency-conversion studies), not social-policy items.
- Note: this prompt change has not been run through the `self_improve` loop — it has not been iterated on yet.

🤖 Generated with Claude Code
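The verification step could be automated as a regression check. A sketch, assuming the output JSON is a list of objects with a `document_name` field (that field name is an illustrative assumption; the forbidden names are the literal strings observed in the bug):

```python
# Hypothetical bleedthrough regression check for the documents-to-find output.
LEAKED_PROMPT_EXAMPLES = {
    "Existing National Childcare Subsidy Policies",
    "Data on Average Childcare Costs",
    "Tax Code Sections Related to Dependents",
    "National Housing Price Indices",
    "Official National Mental Health Survey Data",
    "Existing Social Support Program Details",
}

def find_bleedthrough(documents: list[dict]) -> list[str]:
    """Return document names copied verbatim from the old system-prompt
    examples instead of being derived from the plan."""
    return [d.get("document_name", "") for d in documents
            if d.get("document_name") in LEAKED_PROMPT_EXAMPLES]
```

Run against `identified_documents_to_find.json`, an empty result means no prompt example leaked into the output.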