Skip to content

map-explain: respect user's language + cut wall-of-text verbosity (adaptive depth) #224

Description

@azalio

Summary

/map-explain has two UX defects that hurt it on real targets:

  1. Output language is hardcoded to English — the skill never respects the user's working language. When the conversation/user context is Russian, the walkthrough is still emitted entirely in English.
  2. Output is an enormous wall of text — the rigid always-all-10-sections structure plus "explain every important line / do not skip obvious lines" produces ~4000+ words on a single ~600-line diff. Technically excellent, but it overwhelms the reader and defeats the "fast path to understanding" goal.

Both reproduced today on a 600-line Go controller branch-diff (iam-operator), Russian-speaking user.

mapify_version: 3.5.0
Affected file: .claude/skills/map-explain/SKILL.md


Defect 1 — Respect the user's language

Observed: SKILL.md contains zero language instructions (verified: no Respond/language-of-output rule anywhere in the file). The user's language preference is explicitly present in the agent's context — both the user-global instruction ("respond in the same language as the user's message") and the host project's CLAUDE.md language-handling block establish Russian. This is not ambiguous: the preference is right there in context. The skill simply has no rule to honor it, so the agent defaults to English and silently overrides a stated preference.

Expected: the explanation body should be produced in the user's working language (technical terms, code, identifiers, file:line refs stay English). This matches how most other agentic skills behave.

Suggested fix: add an explicit rule near the top of SKILL.md, e.g.:

Write the explanation in the user's established language — honor the language already set in context (the conversation's language and the host/global CLAUDE.md language convention) rather than defaulting to English. Keep code, identifiers, commands, error messages, and file:line references in English; translate only the prose.


Defect 2 — Information overload / verbosity

Observed: on a 600-line controller diff the output was ~4000+ words across all 10 forced sections. Sections 6 ("what every important line does") and 7 ("why each non-trivial line is needed") are the worst offenders because SKILL.md:97 and SKILL.md:106 (do not skip "obvious" lines) force line-by-line bloat. The always-emit-all-10-sections rule also pads sections that don't apply (e.g. "entities" for a single function).

Expected: stay a genuine deep-understanding tool, but front-load the mental model and scale depth to target size instead of drowning the reader.

Recommended changes (from an LLM-council deliberation, prioritized)

The council (minimax-m3, gpt-5.5, qwen3.7-max, deepseek-v4-pro) was strongly aligned. Root cause: the prompt optimizes for completeness instead of signal — every rule pushes toward inclusion.

P1 — Biggest lever: replace "every important line" with "load-bearing lines". Delete do not skip "obvious" lines (SKILL.md:106) entirely. Define load-bearing mechanically — a line qualifies if it: mutates external/system state; branches on a non-trivial condition; crosses an abstraction boundary / public contract; does validation, authz, normalization, or parsing; handles an error/retry/fallback/edge case; encodes a non-obvious invariant; or would silently break behavior if removed. Explicitly skip type annotations, logging, trivial assignments, boilerplate imports, standard decorators, name-equals-behavior getters/setters. Merge old sections 6 + 7 into one table (Location | What it does | Why it matters | If changed incorrectly) — they were explaining the same code twice. Expected ~40–60% word reduction on large targets. Embed a worked load-bearing-vs-not example in the prompt.

P2 — Scope compression for repeated patterns. When the target has repeated handlers/routes/cases/validators/mappers, explain the shared pattern once, then list only the meaningful exceptions. Never re-explain the same shape N times. Alone cuts 30–60% on controller-style files.

P3 — Adaptive size tiers with hard word budgets. LLMs respect numeric budgets far better than "be concise". Treat budgets as ceilings, not targets ("when in doubt, cut"):

Tier Trigger Word budget Load-bearing line cap
Tiny ≤30–50 lines / single symbol 300–700 full line detail OK
Small 31–150 lines 600–1,200 ≤8–10
Medium 151–300 lines 900–1,600 ≤12
Large 301–600 lines / multi-file 1,200–2,200 ≤18
Huge >600–800 lines 1,800–2,800 ≤20–25

Open with a framing line: Target: <type>. Size: <tier>. Budget: <range> words. Add snippet caps: no quote >3 lines, ~20 quoted lines total max. (Contested point: 800-word hard cap was judged too aggressive for deep-understanding on large diffs; 2.5–3.5k too loose. Center of gravity ≈ 1,200–2,200 words for a large diff — start there, tune on a tiny/medium/large sample.)

P4 — Progressive disclosure in a single shot. No expand button exists, so front-load. Always open with a "Mental model in 60 seconds" block (≤5 sentences / ≤100 words: what it is, its job, the one thing you most need to know, the one thing most likely to surprise you). Tag section headers with read-tiers, e.g. [MUST READ] / [READ IF MODIFYING] / [SKIM], so the skip decision is visible. End with natural-language follow-up suggestions ("Explain the authorization path line by line", "Focus only on the data flow") — do not print fake CLI flags like --focus flow the CLI can't honor.

P5 — Make sections adaptive (menu, not checklist). Replace "always emit all 10 sections" with applicability rules: skip "entities" for a single function; for a PR fold "differences" into the before/after delta; skip "side effects" for a pure function; compress a trivial category to one sentence instead of a full section. Don't emit a header just to write "Skipped" — append one line at the end: Omitted: <list> — irrelevant for <reason>. For PR diffs specifically: move before/after/runtime-delta to the top (most critical context for a diff) and apply line-by-line analysis only to changed lines.

P6 — Prompt-wording surgery (cheap, reliable):

  • Delete do not skip "obvious" lines.
  • Replace "be thorough/detailed" with "Be dense. Prefer one precise sentence over three vague ones."
  • "Prefer statements of purpose/consequence over procedural description."
  • Ban filler: "This line…", "Here we have…", "It is important to note…", "Essentially…", "In other words…", "Note that…".
  • "No preamble, no apology, no closing pleasantries. Begin with the substance."
  • Tighten the Inferred: rule: mark only inferences that required reading multiple files or guessing intent — not direct observations any competent reader would make.
  • Default one sentence per load-bearing line for WHAT; add a second sentence only for WHY.

Suggested rollout order

  1. Load-bearing filter + merge sections 6/7 into a table + scope compression (ship first, ~40–60% reduction).
  2. Size tiers + hard word/snippet budgets (guardrails).
  3. Front-load the mental model with read-tier labels.
  4. Wording surgery + a "lines deliberately summarized, not quoted" trust footer + natural-language follow-ups.

Council conversation id: 80e522f4-ac22-4c13-bde3-9de6d87c7b76

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions