Summary
/map-explain has two UX defects that hurt it on real targets:
- Output language is hardcoded to English — the skill never respects the user's working language. When the conversation/user context is Russian, the walkthrough is still emitted entirely in English.
- Output is an enormous wall of text — the rigid always-all-10-sections structure plus "explain every important line / do not skip obvious lines" produces ~4000+ words on a single ~600-line diff. Technically excellent, but it overwhelms the reader and defeats the "fast path to understanding" goal.
Both reproduced today on a 600-line Go controller branch-diff (iam-operator), Russian-speaking user.
mapify_version: 3.5.0
Affected file: .claude/skills/map-explain/SKILL.md
Defect 1 — Respect the user's language
Observed: SKILL.md contains zero language instructions (verified: no Respond/language-of-output rule anywhere in the file). The user's language preference is explicitly present in the agent's context — both the user-global instruction ("respond in the same language as the user's message") and the host project's CLAUDE.md language-handling block establish Russian. This is not ambiguous: the preference is right there in context. The skill simply has no rule to honor it, so the agent defaults to English and silently overrides a stated preference.
Expected: the explanation body should be produced in the user's working language (technical terms, code, identifiers, file:line refs stay English). This matches how most other agentic skills behave.
Suggested fix: add an explicit rule near the top of SKILL.md, e.g.:
Write the explanation in the user's established language — honor the language already set in context (the conversation's language and the host/global CLAUDE.md language convention) rather than defaulting to English. Keep code, identifiers, commands, error messages, and file:line references in English; translate only the prose.
Defect 2 — Information overload / verbosity
Observed: on a 600-line controller diff the output was ~4000+ words across all 10 forced sections. Sections 6 ("what every important line does") and 7 ("why each non-trivial line is needed") are the worst offenders because SKILL.md:97 and SKILL.md:106 (do not skip "obvious" lines) force line-by-line bloat. The always-emit-all-10-sections rule also pads sections that don't apply (e.g. "entities" for a single function).
Expected: stay a genuine deep-understanding tool, but front-load the mental model and scale depth to target size instead of drowning the reader.
Recommended changes (from an LLM-council deliberation, prioritized)
The council (minimax-m3, gpt-5.5, qwen3.7-max, deepseek-v4-pro) was strongly aligned. Root cause: the prompt optimizes for completeness instead of signal — every rule pushes toward inclusion.
P1 — Biggest lever: replace "every important line" with "load-bearing lines". Delete the rule do not skip "obvious" lines (SKILL.md:106) entirely. Define load-bearing mechanically. A line qualifies if it:
- mutates external/system state;
- branches on a non-trivial condition;
- crosses an abstraction boundary / public contract;
- does validation, authz, normalization, or parsing;
- handles an error/retry/fallback/edge case;
- encodes a non-obvious invariant; or
- would silently break behavior if removed.
Explicitly skip type annotations, logging, trivial assignments, boilerplate imports, standard decorators, and name-equals-behavior getters/setters. Merge old sections 6 + 7 into one table (Location | What it does | Why it matters | If changed incorrectly), since they were explaining the same code twice. Expected ~40–60% word reduction on large targets. Embed a worked load-bearing-vs-not example in the prompt.
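One possible shape for the worked load-bearing-vs-not example is sketched below. It is illustrative only: `grant_role`, `QuotaExceeded`, the role names, and the `limit` parameter are hypothetical and do not come from iam-operator or from SKILL.md. Each line is annotated with the criterion from the list above that it does or does not meet.

```python
import logging

logger = logging.getLogger(__name__)


class QuotaExceeded(Exception):
    """Hypothetical error: the tenant already holds `limit` bindings."""


def grant_role(store: dict, tenant: str, user: str, role: str, limit: int = 3) -> bool:
    # NOT load-bearing: logging.
    logger.debug("grant_role(%s, %s, %s)", tenant, user, role)
    # LOAD-BEARING: normalization; without it " Admin " fails validation below.
    role = role.strip().lower()
    # LOAD-BEARING: validation on a public contract.
    if role not in {"viewer", "editor", "admin"}:
        raise ValueError(f"unknown role: {role!r}")
    # LOAD-BEARING: mutates shared state (creates the tenant entry if missing).
    bindings = store.setdefault(tenant, {})
    # LOAD-BEARING: edge case; a repeated grant is a no-op and reports False.
    if bindings.get(user) == role:
        return False
    # LOAD-BEARING: non-obvious invariant; the cap applies only to new users.
    if user not in bindings and len(bindings) >= limit:
        raise QuotaExceeded(tenant)
    # LOAD-BEARING: mutates shared state.
    bindings[user] = role
    # NOT load-bearing: trivial assignment.
    changed = True
    return changed
```

Under the proposed filter, the explanation would cover the six load-bearing lines and say nothing about the logging call or the `changed` assignment. In the merged table, the normalization line would get a single row: what it does (trims and lowercases the role), why it matters (callers may pass mixed-case input), and what breaks if changed incorrectly (valid roles get rejected as unknown).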
P2 — Scope compression for repeated patterns. When the target has repeated handlers/routes/cases/validators/mappers, explain the shared pattern once, then list only the meaningful exceptions. Never re-explain the same shape N times. Alone cuts 30–60% on controller-style files.
P3 — Adaptive size tiers with hard word budgets. LLMs respect numeric budgets far better than "be concise". Treat budgets as ceilings, not targets ("when in doubt, cut"):
| Tier | Trigger | Word budget | Load-bearing line cap |
|---|---|---|---|
| Tiny | ≤30–50 lines / single symbol | 300–700 | full line detail OK |
| Small | 31–150 lines | 600–1,200 | ≤8–10 |
| Medium | 151–300 lines | 900–1,600 | ≤12 |
| Large | 301–600 lines / multi-file | 1,200–2,200 | ≤18 |
| Huge | >600–800 lines | 1,800–2,800 | ≤20–25 |
Open with a framing line: Target: <type>. Size: <tier>. Budget: <range> words. Add snippet caps: no quote >3 lines, ~20 quoted lines total max. (Contested point: 800-word hard cap was judged too aggressive for deep-understanding on large diffs; 2.5–3.5k too loose. Center of gravity ≈ 1,200–2,200 words for a large diff — start there, tune on a tiny/medium/large sample.)
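The tier table and framing line could be prototyped as below, for example to check budgets while tuning on the tiny/medium/large sample. This is a sketch under stated assumptions, not part of the skill: `Tier`, `TIERS`, `pick_tier`, and `framing_line` are hypothetical names; ambiguous boundaries in the table ("≤30–50", "≤8–10", ">600–800", "≤20–25") are resolved to 50, 10, >600, and 25; and a multi-file target is treated as at least Large.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_lines: Optional[int]  # inclusive upper bound; None means unbounded
    min_words: int
    max_words: int            # a ceiling, not a target
    line_cap: Optional[int]   # None means full line detail is acceptable


# Assumption: ranged boundaries from the table are resolved as described above.
TIERS = (
    Tier("Tiny", 50, 300, 700, None),
    Tier("Small", 150, 600, 1200, 10),
    Tier("Medium", 300, 900, 1600, 12),
    Tier("Large", 600, 1200, 2200, 18),
    Tier("Huge", None, 1800, 2800, 25),
)
LARGE_INDEX = 3


def pick_tier(line_count: int, multi_file: bool = False) -> Tier:
    """Return the first tier whose line bound covers the target."""
    if line_count < 0:
        raise ValueError("line_count must be non-negative")
    index = next(
        i for i, tier in enumerate(TIERS)
        if tier.max_lines is None or line_count <= tier.max_lines
    )
    if multi_file:
        # Assumption: "multi-file" in the Large row acts as a floor.
        index = max(index, LARGE_INDEX)
    return TIERS[index]


def framing_line(target_type: str, tier: Tier) -> str:
    """Render the opening line in the format proposed above."""
    return (
        f"Target: {target_type}. Size: {tier.name}. "
        f"Budget: {tier.min_words:,}–{tier.max_words:,} words."
    )
```

For the 600-line diff from this report, `framing_line("PR diff", pick_tier(600))` yields `Target: PR diff. Size: Large. Budget: 1,200–2,200 words.`, which matches the suggested starting point for large diffs.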
P4 — Progressive disclosure in a single shot. No expand button exists, so front-load. Always open with a "Mental model in 60 seconds" block (≤5 sentences / ≤100 words: what it is, its job, the one thing you most need to know, the one thing most likely to surprise you). Tag section headers with read-tiers, e.g. [MUST READ] / [READ IF MODIFYING] / [SKIM], so the skip decision is visible. End with natural-language follow-up suggestions ("Explain the authorization path line by line", "Focus only on the data flow") — do not print fake CLI flags like --focus flow the CLI can't honor.
P5 — Make sections adaptive (menu, not checklist). Replace "always emit all 10 sections" with applicability rules: skip "entities" for a single function; for a PR fold "differences" into the before/after delta; skip "side effects" for a pure function; compress a trivial category to one sentence instead of a full section. Don't emit a header just to write "Skipped" — append one line at the end: Omitted: <list> — irrelevant for <reason>. For PR diffs specifically: move before/after/runtime-delta to the top (most critical context for a diff) and apply line-by-line analysis only to changed lines.
P6 — Prompt-wording surgery (cheap, reliable):
- Delete the rule do not skip "obvious" lines.
- Replace "be thorough/detailed" with "Be dense. Prefer one precise sentence over three vague ones."
- "Prefer statements of purpose/consequence over procedural description."
- Ban filler: "This line…", "Here we have…", "It is important to note…", "Essentially…", "In other words…", "Note that…".
- "No preamble, no apology, no closing pleasantries. Begin with the substance."
- Tighten the "Inferred:" rule: mark only inferences that required reading multiple files or guessing intent, not direct observations any competent reader would make.
- Default one sentence per load-bearing line for WHAT; add a second sentence only for WHY.
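The filler ban in the list above is easy to verify mechanically when evaluating sample outputs. The checker below is a minimal sketch, assuming the six phrases listed are the full ban list; `FILLER_PHRASES` and `find_filler` are hypothetical names, and the match is a plain case-insensitive phrase search, so it will also flag legitimate mid-sentence uses such as "this line".

```python
import re
from typing import List, Tuple

# The six filler phrases named in the wording-surgery list.
FILLER_PHRASES = (
    "This line",
    "Here we have",
    "It is important to note",
    "Essentially",
    "In other words",
    "Note that",
)

_FILLER_RE = re.compile(
    "|".join(r"\b" + re.escape(phrase) + r"\b" for phrase in FILLER_PHRASES),
    re.IGNORECASE,
)


def find_filler(text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, matched text) for every filler hit."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in _FILLER_RE.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

An empty result on a sample walkthrough is a cheap regression signal that the wording rules are being followed; it says nothing about whether the remaining prose is actually dense.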
Suggested rollout order
1. Load-bearing filter + merge sections 6/7 into a table + scope compression (ship first, ~40–60% reduction).
2. Size tiers + hard word/snippet budgets (guardrails).
3. Front-load the mental model with read-tier labels.
4. Wording surgery + a "lines deliberately summarized, not quoted" trust footer + natural-language follow-ups.
Council conversation id: 80e522f4-ac22-4c13-bde3-9de6d87c7b76