
fix(prompts): LLM prompting audit fixes (8 findings)#218

Merged
dennys246 merged 4 commits into main from bug/prompting-audit-fixes
May 3, 2026

Conversation

@dennys246
Owner

Summary

Fixes from the LLM-prompting audit (8 findings, all closed). Net: 4 commits, +422/−350 LOC, 6307 tests pass (+10 new regression tests for the contracts that were silently relied on).

  • Acting Coach prompt section bumped to CRITICAL — was IMPORTANT while the bio signals it synthesises (causal_context, valence_context, body_state) were CRITICAL, so the interpreted view dropped before its source data under token pressure.
  • Observation section bumped to IMPORTANT — was NICE_TO_HAVE, so it dropped before conversation history even though tools need the current state to be safely invoked.
  • sense_presence.input_schema migrated from the legacy description-as-value pattern to JSONSchema, so the export marks context as optional (matching the implementation) instead of required (matching the prose). Strict MCP / Anthropic clients no longer reject calls that omit context.
  • Auto-sense blanket `except (KeyError, Exception): pass` in agent_loop.py split: KeyError (tool not registered) stays silent, anything else routes through log_swallowed_exception so silent blindness can no longer hide a real bug.
  • Dead PromptAssembler scaffold deleted (−326 LOC). The substrate plan's B1 ("PromptAssembler as single composition point") was archived as shipped but the migration was never completed; prompts/assembler.py was abandoned scaffold whose only consumer-of-a-consumer (MemoryHub.get_memory_summary) had zero production callers. CLAUDE.md "legacy" label corrected.
  • Hardcoded conversation-turn cap (3 vs 12) replaced with min(12, max(3, n_ctx // 2000)) for embodied / min(20, max(6, n_ctx // 800)) for non-embodied. Embodied agents on 32K-context models no longer get clipped to 3 turns regardless of context.
  • Motor programs truncation rewritten to drop full entries (name + Steps + Known risks together) instead of slicing by line count, which previously left orphan continuation lines whose owning name had been dropped.
  • AgentPool.remove() now drops module-level bio_integration stash entries (_episode_ticks, _latest_pain_intensity, _latest_substrate_nodes) so a future agent reusing the same id can't inherit stale state. No bug today — multi-agent runs use unique ids — but cheap insurance against NPC respawn / pool recycling patterns.
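The proportional turn-cap formulas above can be sketched as a small helper (the function name is hypothetical; the formulas and floors/ceilings are the ones from the change):

```python
def conversation_turn_cap(n_ctx: int, embodied: bool) -> int:
    """Scale the conversation-history turn cap with model context size.

    Floors keep a minimum of history on small-context models; ceilings
    stop huge-context models from flooding the prompt with old turns.
    """
    if embodied:
        return min(12, max(3, n_ctx // 2000))
    return min(20, max(6, n_ctx // 800))

# 4K-ctx embodied:  4096 // 2000 = 2, floored to 3 turns
# 32K-ctx embodied: 32768 // 2000 = 16, capped at 12 turns
```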

Plus regression tests pinning the deliberation-transcript suppression contract (bio_enrichment_context and working_memory_thoughts are intentionally suppressed when a transcript is present) so future producer/consumer divergence surfaces loudly instead of as a stale-prompt regression.

Test plan

  • Full fast suite (6307 passed, 15 skipped, 40 deselected)
  • ruff check + ruff format on every touched file
  • New regression tests for transcript suppression (7), motor program truncation (3), AgentPool stash cleanup (1), sense_presence schema export (1)
  • Sanity-run a sim with MAXIM_LOG_FILE to confirm Acting Coach now appears under token pressure on a 4K-ctx model
  • Verify on a real Anthropic / MCP client that sense_presence calls without context no longer get rejected

🤖 Generated with Claude Code

dennys246 and others added 4 commits May 3, 2026 09:18
From the prompting audit:

#2 Acting Coach was at SectionPriority.IMPORTANT while its source signals
(causal_context, valence_context, body_state) were CRITICAL. Under token
pressure the interpreted view dropped before the raw signals it
synthesises — backwards. Bumped to CRITICAL so the agent's safety
annotations survive alongside the bio data they summarise.
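A minimal sketch of the priority mechanics this fix relies on — the enum values, `Section` type, and `fit_sections` helper are hypothetical stand-ins for the project's real types, but the invariant (lowest-priority sections drop first under token pressure) is the one the fix restores:

```python
from dataclasses import dataclass
from enum import IntEnum

class SectionPriority(IntEnum):  # hypothetical mirror of the real enum
    NICE_TO_HAVE = 0
    IMPORTANT = 1
    CRITICAL = 2

@dataclass
class Section:
    name: str
    text: str
    priority: SectionPriority

def fit_sections(sections, budget_tokens, estimate_tokens):
    """Keep highest-priority sections first until the budget is spent,
    then restore the original prompt order for the kept sections."""
    kept, used = [], 0
    for s in sorted(sections, key=lambda s: s.priority, reverse=True):
        cost = estimate_tokens(s.text)
        if used + cost <= budget_tokens:
            kept.append(s)
            used += cost
    kept.sort(key=sections.index)
    return kept
```

With Acting Coach at IMPORTANT, it competed with (and lost to) sections its own content depends on; at CRITICAL it is retained in the same tier as its source signals.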

#4 Added regression tests for the deliberation_transcript suppression
contract in _add_perception_sections + _add_working_memory_section.
When transcript is present, bio_enrichment_context and
working_memory_thoughts are intentionally suppressed (transcript is
assumed to subsume them). The risk is silent staleness: a fresh percept
arriving after transcript construction would have its bio_enrichment
dropped. The tests pin both branches so any future producer/consumer
divergence surfaces loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… auto-sense logging

#7 Bumped current-observation section from NICE_TO_HAVE to IMPORTANT in
PromptBuilder._add_perception_sections. Tools cannot be safely invoked
without knowing the current state, so the observation should outlast
conversation history under token pressure.

#3 SensePresenceTool.input_schema migrated from the legacy
description-as-value pattern to JSONSchema directly. The old form
emitted required: ["context"] in the export despite the prose saying
"Optional", which strict MCP / Anthropic clients reject. execute()
already handles a missing context gracefully and the auto-sense caller
never passes one. Added regression test in test_tool_discovery.py.
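The shape of the migrated schema, sketched — the description text is an assumption, but the key point is that `context` sits in `properties` and not in `required`:

```python
# Sketch of the migrated export: strict MCP / Anthropic clients accept
# calls that omit "context" because it is no longer listed as required.
SENSE_PRESENCE_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "context": {
            "type": "string",
            "description": "Optional free-text context for the presence check.",
        },
    },
    "required": [],  # context is optional, matching execute()
}
```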

Auto-sense logging in agent_loop.py:1017-1057 — split the blanket
except (KeyError, Exception): pass into:
- KeyError on registry.get(): silent (tool not registered is a
  legitimate config, e.g. agents without entity_map).
- Anything else from execute(): log_swallowed_exception so silent
  blindness can no longer hide a real bug.
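A sketch of the split (the helper body and `auto_sense` signature are hypothetical; the two-branch structure is the change):

```python
import logging

log = logging.getLogger(__name__)

def log_swallowed_exception(tool_name: str, exc: Exception) -> None:
    # Hypothetical stand-in for the project's helper of the same name.
    log.warning("auto-sense: swallowed exception from %s", tool_name,
                exc_info=exc)

def auto_sense(registry: dict, tool_name: str, **kwargs):
    """A missing tool is a legitimate config and stays silent;
    a failure inside execute() is logged, not hidden."""
    try:
        tool = registry[tool_name]   # KeyError: tool not registered
    except KeyError:
        return None                  # e.g. agents without entity_map
    try:
        return tool.execute(**kwargs)
    except Exception as exc:         # anything else: log, don't swallow silently
        log_swallowed_exception(tool_name, exc)
        return None
```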

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e plan B1

The substrate plan's B1 ("PromptAssembler — single composition point")
was archived as shipped, but the migration was never completed. Exit
criterion (grep -r system_message=f outside the assembler returns
nothing) was never met. agents/prompt_builder.py is the actual
production builder at 1658 lines; prompts/assembler.py was a 184-line
abandoned scaffold.

What was dead:
- PromptAssembler class (no production caller, never instantiated)
- compose_memory_section, compose_observation_section methods
- MemorySummary.from_atl classmethod
- MemoryHub.get_memory_summary (only consumer of from_atl; itself had
  zero production callers, only one test asserting it returned None
  when substrate disabled)

Removed: prompts/assembler.py, prompts/__init__.py exports for the
three dead symbols, MemoryHub.get_memory_summary, related tests in
test_substrate_recognition.py (TestMemorySummary, TestPromptAssembler,
test_get_memory_summary_none_when_disabled), and the stale "PromptAssembler
will replace this" comment in dm_runtime.py.

Updated CLAUDE.md "Prompt composition" key files row: prompt_builder.py
is the canonical builder, no longer mislabelled "legacy" — the file
that had the misleading "(legacy)" annotation is the one in production,
and the file labelled canonical is what got deleted.

If a future plan wants to replace prompt_builder.py with a structured
composition layer, start fresh — the previous attempt is not a useful
foundation to build on.

Test count: 6303 passed (was 6312; 9 tests removed for the dead code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or truncation + AgentPool stash cleanup

#5 Two prompt-builder budget fixes:

- Conversation history turn cap was hardcoded `3 if acting_coach else 12`,
  ignoring n_ctx entirely. Embodied agents on 32K-context models still got
  3 turns; non-embodied on 4K still got 12. Replaced with proportional
  formulas that scale with n_ctx and floor at sensible minimums:
  embodied min(12, max(3, n_ctx // 2000)),
  non-embodied min(20, max(6, n_ctx // 800)).

- Motor programs section truncated by line count (`m // 15`), which left
  orphan continuation lines (Steps:/Known risks:) when the owning name
  line got dropped. Each motor program is 1-3 lines; line-count slicing
  doesn't respect that structure. Replaced with an entry-aware
  truncate_fn that drops full entries from the tail using a char-budget
  estimate (`max_tok * 4`, same heuristic as the deliberation-transcript
  truncator). Also stops once only the header remains.
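The entry-aware approach can be sketched as follows — a hypothetical `truncate_fn`, not the project's exact implementation, but it preserves the key property that a `Steps:` / `Known risks:` continuation line never survives without its owning name line:

```python
def truncate_motor_programs(lines: list[str], max_tok: int) -> list[str]:
    """Drop whole entries (name + continuation lines together) from the
    tail until the section fits a char budget of max_tok * 4."""
    header, body = lines[0], lines[1:]
    # Group lines into entries: a name line plus its continuations.
    entries, current = [], []
    for line in body:
        if line.strip().startswith(("Steps:", "Known risks:")) and current:
            current.append(line)
        else:
            if current:
                entries.append(current)
            current = [line]
    if current:
        entries.append(current)
    # Keep entries head-first until the budget is exhausted; a partial
    # entry is never emitted, so no orphan continuation lines remain.
    budget = max_tok * 4
    out, used = [header], len(header)
    for entry in entries:
        cost = sum(len(l) + 1 for l in entry)
        if used + cost > budget:
            break
        out.extend(entry)
        used += cost
    return out
```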

The deliberation-transcript budget at line 1277 (`min(2000,
available * 0.3)`) was flagged in the audit but is actually a correct
proportional pattern (cap + fraction); left unchanged.

#6 AgentPool.remove() now drops the per-agent bio_integration stash
entries (`_episode_ticks`, `_latest_pain_intensity`,
`_latest_substrate_nodes`) so a future agent reusing the same id
doesn't inherit stale tick counters / pain intensity / substrate
node refs from the removed one. No bug today since multi-agent
runs use unique ids, but the cleanup is cheap insurance against
NPC respawn or pool recycling patterns.

Tests: +4 (test_remove_clears_per_agent_stash + three motor program
truncation cases pinning no-orphans + tail-drop + head-preservation).
Total: 6307 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dennys246 dennys246 merged commit a815c71 into main May 3, 2026
5 checks passed
@dennys246 dennys246 deleted the bug/prompting-audit-fixes branch May 3, 2026 19:20