feat: add AI Teammate training system with learn-by-example patterns #148

Merged: anandgupta42 merged 22 commits into main from claude/ai-teammate-interface-p8SO6 on Mar 15, 2026
Conversation

anandgupta42 (Contributor) commented Mar 15, 2026

What does this PR do?

Adds a training system that lets the AI teammate learn from corrections and share that knowledge across your team.

Correct the agent once. It remembers forever. Your team inherits it.

When you correct the agent ("no, use DECIMAL not FLOAT"), it asks "Want me to remember this?" You say yes — 2 seconds, zero context switching. The correction persists across sessions. Your teammates inherit it via git pull. Research shows compact, focused context improves AI performance by 17pp while comprehensive docs hurt by 3pp (SkillsBench, 7,308 runs). Training delivers the right knowledge to the right agent.

What's included:

  • Correction capture in all agent modes (builder, analyst, validator, migrator, researcher, executive)
  • Trainer agent mode — dedicated mode for systematic teaching, onboarding, gap analysis, and curation
  • Researcher agent mode — deep multi-step investigation with structured reports
  • Context-aware injection — unified memory+training system that scores blocks by agent relevance (builder sees rules first, analyst sees glossary first)
  • 6 training kinds: rule, pattern, glossary, standard, context, playbook
  • Self-improvement insights: stale detection, applied count tracking, consolidation suggestions
  • Training tools: training_save, training_list, training_remove
  • Skills: /teach (learn from files), /train (learn from docs), /training-status (dashboard)
  • TUI tips: 5 new tips for training discoverability
  • Comprehensive docs: training guide, updated agent-modes page, homepage with all 7 agents

Architecture decisions:

  • Training wraps the existing Altimate Memory module (no parallel system)
  • Single unified injection path with agent-aware relevance scoring
  • Dedicated ALTIMATE_DISABLE_TRAINING feature flag (independent of memory)
  • Memory storage path uses .altimate-code/ primary with .opencode/ fallback
  • Budget of 20KB with priority-based selection (backed by SkillsBench research)

What was intentionally NOT built (based on 8-person evaluation team simulation):

  • training_scan — removed; was regex counting, not discovery
  • training_validate — removed; was keyword grep, not validation
  • accepted/rejected counters — removed; were never incremented (dead code)

Type of change

  • New feature (non-breaking change which adds functionality)

Issue for this PR

Closes #151

How did you verify your code works?

  • 152 tests across 7 test files (types, store, prompt, insights, integration, ux-improvements, tools)
  • Full lifecycle tests: save → list → inject → remove
  • TypeScript compilation passes
  • Evaluated by 8-person simulation team across 4 personas (analytics engineer, platform engineer, team lead, solo engineer) + 2 researchers + 2 adversarial analysts

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@github-actions

Hey! Your PR title "Add AI Teammate training system with learn-by-example patterns" doesn't follow conventional commit format.

Please update it to start with one of:

  • feat: or feat(scope): new feature
  • fix: or fix(scope): bug fix
  • docs: or docs(scope): documentation changes
  • chore: or chore(scope): maintenance tasks
  • refactor: or refactor(scope): code refactoring
  • test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

anandgupta42 changed the title from "Add AI Teammate training system with learn-by-example patterns" to "feat: add AI Teammate training system with learn-by-example patterns" on Mar 15, 2026
@github-actions

Thanks for updating your PR! It now meets our contributing guidelines. 👍

anandgupta42 force-pushed the claude/ai-teammate-interface-p8SO6 branch from f94f0cb to ae8f884 on March 15, 2026 20:10
anandgupta42 force-pushed the claude/ai-teammate-interface-p8SO6 branch from db0ab6c to 80c9e68 on March 15, 2026 22:25
const meta = parseTrainingMeta(block.content)
const appliedStr = meta && meta.applied > 0 ? ` (applied ${meta.applied}x)` : ""
// Strip the training metadata comment from content for display
const content = block.content.replace(/^<!--\s*training\n[\s\S]*?-->\n*/m, "").trim()

This comment was marked as outdated.

if (memoryBlocks.length > 0) {
  const memHeader = "\n### Memory\n"
  const firstMemFormatted = formatBlock(memoryBlocks[0].block)
  if (used + memHeader.length + firstMemFormatted.length + 2 < budget) {

This comment was marked as outdated.

claude and others added 11 commits March 15, 2026 15:54
Comprehensive design for repositioning altimate from "AI tool" to "AI
teammate" — including trainable knowledge system (/teach, /train,
/feedback), Deep Research mode for multi-step investigations, team
memory that persists via git, and UX reframing from "agent modes" to
"teammate roles."

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq
Add detailed competitive analysis from OpenClaw (self-improving memory,
heartbeat scheduler, meet-users-where-they-are), Devin ($10.2B
valuation, "junior partner" framing), and Factory AI (workflow
embedding). Add proactive behaviors section with background monitors
(cost alerts, freshness checks, schema drift, PII scanning) and
auto-promotion of learned corrections.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq
Core training infrastructure built on top of existing memory system:

Training Store & Types:
- TrainingStore wraps MemoryStore with training-specific conventions
- Four knowledge kinds: pattern, rule, glossary, standard
- Structured metadata (applied count, source, acceptance tracking)
- Training blocks stored in .opencode/memory/training/ (git-committable)
- One person teaches, whole team benefits via git

Training Tools:
- training_save: Save learned patterns, rules, glossary, standards
- training_list: List all learned knowledge with applied counts
- training_remove: Remove outdated training entries

Training Skills:
- /teach: Learn patterns from example files in the codebase
- /train: Learn standards from documents or style guides
- /training-status: Dashboard of all learned knowledge

System Prompt Injection:
- Training knowledge injected alongside memory at session start
- Structured by kind: rules first, then patterns, standards, glossary
- Budget-limited to 6000 chars to control prompt size
- Zero LLM calls on startup — just reads files from disk

Deep Research Agent Mode:
- New "researcher" agent for multi-step investigations
- 4-phase protocol: Plan → Gather → Analyze → Report
- Read-only access to all warehouse, schema, FinOps tools
- Structured reports with evidence, root causes, action items

Agent Awareness:
- All agent prompts updated with training awareness section
- Agents offer to save corrections as rules when users correct behavior
- Training tools permitted in all agent modes

Tests:
- 88 new tests across 5 test files (types, store, prompt, tools, integration)
- All tests standalone (no Instance dependency)
- Full lifecycle tests: save → list → format → inject → remove
- Edge cases: budget limits, meta roundtrips, coexistence with memory

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq
…n, budget visibility

- Fix researcher agent permissions: add training_save/remove (was read-only)
- Auto-lowercase + space-to-hyphen name transform in training_save (ARR → arr)
- Detect update vs new save, show "Updated" with preserved applied count
- Show training budget usage (chars/percent) on save, list, and remove
- Improve training_list: group by kind, show most-applied entries, budget %
- Improve training_remove: show available entries on not-found, applied count
- Show similar entry names in duplicate warnings (not just count)
- Raise content limit from 1800 to 2500 chars
- Export TRAINING_BUDGET constant, add budgetUsage() to TrainingPrompt
- Add 30 new tests: auto-lowercase, update detection, budget overflow,
  name collision, scale (80 entries), improved messaging
- All 118 training tests + 305 memory tests pass

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq
- Builder prompt: add attribution instructions (cite training entries that
  influenced output), correction detection (explicit + implicit patterns),
  conflict flagging between contradictory training entries
- Add /teach, /train, /training-status to Available Skills list in builder prompt
- Sort training entries by applied count (descending) in prompt injection so
  most-used entries get priority within the 6000-char budget
- Restructure Teammate Training section with clear subsections

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq
Simulation findings and fixes:

1. training_save now echoes back saved content so user can verify
   what was captured (new saves show content preview, updates show
   old vs new diff)

2. When training limit is reached, error now lists existing entries
   sorted by applied count and suggests the least-applied entry
   for removal

3. Researcher prompt now documents training_save/remove permissions
   (was contradicting its own permissions by saying "read-only" while
   having write access to training)

4. Added 10 new tests: content echo, update diff, limit suggestion,
   special character preservation (SQL -->, Jinja, HTML comments,
   code blocks), priority sorting verification

Verified: --> in content does NOT corrupt meta block (false positive).
The non-greedy regex terminates at the meta block's own --> correctly.
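
To illustrate why a `-->` inside saved content is harmless, here is a minimal sketch. The helper name `stripTrainingMeta` appears elsewhere in this PR, but the body below and the metadata fields (`kind`, `applied`) are assumptions made for illustration, not the actual implementation.

```typescript
// Assumed layout: a leading `<!-- training ... -->` comment, then the user content.
// The lazy quantifier `*?` stops at the FIRST `-->`, which is the meta block's own terminator.
const META_BLOCK = /^<!--\s*training\n[\s\S]*?-->\n*/

// Hypothetical helper body: drop the metadata comment and return only the content.
function stripTrainingMeta(raw: string): string {
  return raw.replace(META_BLOCK, "").trim()
}

const stored = [
  "<!-- training",
  "kind: rule",
  "applied: 3",
  "-->",
  "Use DECIMAL, not FLOAT. <!-- note --> SQL arrows like --> are kept.",
].join("\n")

// Only the metadata comment is removed; later `-->` sequences survive untouched.
const visible = stripTrainingMeta(stored)
```

A greedy `[\s\S]*` would instead run to the last `-->` in the string and swallow part of the content, which is the failure mode the tests rule out.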

128 training tests + 305 memory tests all pass.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq
…ction

OpenClaw-inspired self-improvement mechanisms:

1. Wire up incrementApplied at injection time — counters now actually
   increment once per session per entry (deduped via session-scoped set),
   making "Most Applied" dashboard and priority sorting meaningful

2. TrainingInsights module analyzes training metadata and surfaces:
   - Stale entries (7+ days old, never applied) — suggests cleanup
   - High-value entries (5+ applications) — highlights most impactful
   - Near-limit warnings (18-19 of 20 entries per kind)
   - Consolidation opportunities (3+ entries with shared name prefix)

3. Insights automatically shown in training_list output

4. 24 new tests covering all insight types, boundary conditions,
   session tracking dedup, and format output
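
The thresholds listed above can be sketched as plain filters over entry metadata. The `Entry` shape, the function names, and the choice of the text before the first hyphen as the "shared name prefix" are assumptions for illustration; only the numeric thresholds come from this commit message.

```typescript
// Hypothetical entry shape; field names are assumptions.
interface Entry {
  name: string
  kind: string
  applied: number
  updated: number // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000
const MAX_PER_KIND = 20

// Stale: at least 7 days old and never applied.
function staleEntries(entries: Entry[], now: number): Entry[] {
  return entries.filter((e) => e.applied === 0 && now - e.updated >= 7 * DAY_MS)
}

// High-value: applied 5 or more times.
function highValueEntries(entries: Entry[]): Entry[] {
  return entries.filter((e) => e.applied >= 5)
}

// Count entries per key, then keep the keys whose count passes the predicate.
function keysWhere(entries: Entry[], key: (e: Entry) => string, keep: (n: number) => boolean): string[] {
  const counts = new Map<string, number>()
  for (const e of entries) counts.set(key(e), (counts.get(key(e)) ?? 0) + 1)
  const out: string[] = []
  counts.forEach((n, k) => {
    if (keep(n)) out.push(k)
  })
  return out
}

// Near-limit: kinds holding 18 or 19 of the 20 allowed entries.
function nearLimitKinds(entries: Entry[]): string[] {
  return keysWhere(entries, (e) => e.kind, (n) => n >= MAX_PER_KIND - 2 && n < MAX_PER_KIND)
}

// Consolidation: 3 or more entries sharing the text before the first hyphen.
function consolidationPrefixes(entries: Entry[]): string[] {
  const prefix = (e: Entry) => {
    const i = e.name.indexOf("-")
    return i === -1 ? e.name : e.name.slice(0, i)
  }
  return keysWhere(entries, prefix, (n) => n >= 3)
}
```

Each check reads only metadata, so this kind of analysis needs no LLM call and can run whenever the list is displayed.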

152 training tests + 305 memory tests all pass.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq
- Add `ALTIMATE_DISABLE_TRAINING` flag independent of memory's disable flag
- Use new flag in session prompt injection and tool registry
- Remove unused `budget-warning` insight type from `TrainingInsight`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ncation

- Call `TrainingPrompt.resetSession()` at session start (step === 1)
  to prevent applied counters from growing unbounded across sessions
- Add structured error logging to all three training tools
- Add truncation indicator (`...`) when training list preview is cut off
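
A minimal sketch of session-scoped deduplication with a reset at session start. Only the name `resetSession` comes from this PR; the `AppliedTracker` class, its other methods, and the in-memory counter are hypothetical stand-ins for whatever the real store does.

```typescript
// Hypothetical tracker: counts each entry at most once per session.
class AppliedTracker {
  private seen = new Set<string>()
  private counts = new Map<string, number>()

  // Returns true only the first time an entry is marked within the current session.
  markApplied(id: string): boolean {
    if (this.seen.has(id)) return false
    this.seen.add(id)
    this.counts.set(id, (this.counts.get(id) ?? 0) + 1)
    return true
  }

  // Called at session start: clears the dedup set but keeps the accumulated counts.
  resetSession(): void {
    this.seen.clear()
  }

  applied(id: string): number {
    return this.counts.get(id) ?? 0
  }
}
```

With this shape, repeated injections inside one session leave the count unchanged, while a new session can count the entry again.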

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ode` fallback

Memory store was hardcoded to `.opencode/memory/` but the config system
already uses `.altimate-code` as primary with `.opencode` as fallback.

Now checks for `.altimate-code/` directory first, falls back to `.opencode/`,
and defaults to `.altimate-code/` for new projects. Result is cached per
process to avoid repeated filesystem checks.
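
The lookup order described above might look like the following sketch. The function name `resolveMemoryRoot`, the injected `exists` predicate, and the cache keyed by project directory are assumptions chosen to keep the example free of filesystem imports; the real code presumably checks the disk directly.

```typescript
// Hypothetical cache: one resolved path per project directory.
const rootCache = new Map<string, string>()

// `exists` is injected (an assumption) so the sketch can run without touching the disk.
function resolveMemoryRoot(projectDir: string, exists: (path: string) => boolean): string {
  const cached = rootCache.get(projectDir)
  if (cached !== undefined) return cached

  const primary = `${projectDir}/.altimate-code`
  const fallback = `${projectDir}/.opencode`
  // Prefer the primary directory; use the fallback only when it alone exists.
  // New projects (neither directory present) default to the primary.
  const root = exists(primary) || !exists(fallback) ? primary : fallback

  const resolved = `${root}/memory`
  rootCache.set(projectDir, resolved)
  return resolved
}
```

Keying the cache by directory rather than using a single module-level value avoids returning a stale path when more than one project is handled in the same process.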

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dation

Add dedicated trainer mode — the 8th primary agent — for systematically
building the AI teammate's knowledge base. Unlike inline corrections in
other modes, trainer mode actively scans codebases, validates training
against reality, and guides knowledge curation.

Changes:
- New `trainer` agent mode with read-only permissions (no write/edit/sql_execute)
- New `training_scan` tool: auto-discover patterns in models, SQL, config, tests, docs
- New `training_validate` tool: check training compliance against actual codebase
- Expand `TrainingKind` to 6 types: add `context` (background "why" knowledge)
  and `playbook` (multi-step procedures)
- Update `count()` to derive from enum (prevents drift when kinds change)
- Add KIND_HEADERS for context and playbook in prompt injection
- Update injection order: rules first, playbooks last (budget priority)
- Update training-save and training-list descriptions for new kinds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
anandgupta42 and others added 10 commits March 15, 2026 15:54
- New `data-engineering/training/index.md` (350+ lines):
  - Quick start with 3 entry points (trainer mode, inline corrections, /train skill)
  - Deep dive into all 4 trainer workflows (scan, validate, teach, gap analysis)
  - 5 comprehensive scenarios: new project onboarding, post-incident learning,
    quarterly review, business domain teaching, pre-migration documentation
  - Explicit limitations section (not a hard gate, budget limits, no auto-learning,
    heuristic validation, no conflict resolution, no version history)
  - Full reference tables for tools, skills, limits, and feature flag
- Updated `agent-modes.md`: add Researcher and Trainer mode sections with
  examples, capabilities, and "when to use" guidance
- Updated `getting-started.md`: add training link to "Next steps"
- Updated `mkdocs.yml`: add Training nav section under Data Engineering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s customization guide

Training is not a CLAUDE.md replacement — it's the mechanism by which users
customize the data engineering harness for their specific project. The agent
works WITH the user to discover what it needs to know, rather than requiring
users to write perfect static instructions.

Changes:
- Increase TRAINING_BUDGET from 6000 to 16000 chars (removes the #1 criticism
  from user simulations — budget was worse than unlimited CLAUDE.md)
- Complete docs rewrite with correct positioning:
  - "Customizing Your AI Teammate" framing (not "Training Your AI Teammate")
  - Research-backed "why" section (40-70% knowledge omission, guided discovery)
  - Clear comparison table: training vs CLAUDE.md (complementary, not competing)
  - 6 real-world scenarios including Databricks, Salesforce quirks, cost spikes
  - Honest limitations section (not a linter, not an audit trail, not automatic)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace two parallel injection systems (memory 8KB + training 16KB)
with a single unified injection that scores blocks by relevance to
the current agent.

How it works:
- All blocks (memory + training) loaded in one pass
- Each block scored: agent tag match (+10), training kind relevance
  per agent (+1-5), applied count bonus (+0-3), recency (+0-2),
  non-training base (+5)
- Builder sees rules/patterns first; analyst sees glossary/context first
- Budget is 20KB unified, filled greedily by score
- Training blocks still tracked with applied counts (fire-and-forget)
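
As a rough sketch of the scoring and greedy fill described above: the score ranges come from this commit message, while the `Block` shape, the per-agent weight table, the applied-count steps, and the recency cutoffs are assumptions made for illustration. The budget is measured in characters here, which is also an assumption.

```typescript
// Hypothetical block shape; `kind` is set only for training blocks.
interface Block {
  id: string
  content: string
  tags: string[]
  kind?: string
  applied: number
  ageDays: number
}

// Assumed per-agent relevance of each training kind, on the 1-5 scale.
const KIND_RELEVANCE: Record<string, Record<string, number>> = {
  builder: { rule: 5, pattern: 4, standard: 3, glossary: 2, context: 1, playbook: 1 },
  analyst: { glossary: 5, context: 4, rule: 3, standard: 2, pattern: 1, playbook: 1 },
}

function scoreBlock(block: Block, agent: string): number {
  let score = 0
  if (block.tags.indexOf(agent) !== -1) score += 10 // agent tag match
  if (block.kind === undefined) {
    score += 5 // non-training base
  } else {
    score += KIND_RELEVANCE[agent]?.[block.kind] ?? 1 // kind relevance, +1 to +5
    score += Math.min(3, Math.floor(block.applied / 5)) // applied bonus, +0 to +3 (assumed steps)
  }
  score += block.ageDays <= 7 ? 2 : block.ageDays <= 30 ? 1 : 0 // recency, +0 to +2 (assumed cutoffs)
  return score
}

// Greedy fill: take blocks in descending score order, skipping any that no longer fit.
function selectBlocks(blocks: Block[], agent: string, budget: number): Block[] {
  const ranked = blocks.slice().sort((a, b) => scoreBlock(b, agent) - scoreBlock(a, agent))
  const chosen: Block[] = []
  let used = 0
  for (const block of ranked) {
    if (used + block.content.length > budget) continue
    chosen.push(block)
    used += block.content.length
  }
  return chosen
}
```

Because the weight table is indexed by agent, the same set of blocks yields a different ordering for a builder than for an analyst, without a second injection path.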

Architecture:
- memory/prompt.ts: new scoreBlock(), unified inject() with InjectionContext
- memory/types.ts: UNIFIED_INJECTION_BUDGET, AGENT_TRAINING_RELEVANCE weights
- session/prompt.ts: single inject call with agent context (was 2 separate)
- training/prompt.ts: deprecated, delegates to MemoryPrompt (backward compat)

No changes to: MemoryStore, TrainingStore, training tools, memory tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Research from 8 independent evaluations + SkillsBench (7,308 test runs)
found that compact focused context beats comprehensive docs by 20pp.
The training system's value is in correction capture (2-sec saves) and
team propagation (git sync) — not in regex scanning or keyword grep.

Removed:
- training_scan (255 lines) — regex pattern counting, not discovery
- training_validate (315 lines) — keyword grep, not validation

Simplified:
- trainer.txt: removed scan/validate workflows, focused on guided
  teaching and curation
- agent-modes.md: updated trainer section with correction-focused example
- training docs: complete rewrite with new pitch:
  "Correct the agent once. It remembers forever. Your team inherits it."
  Backed by SkillsBench research showing compact > comprehensive.

Net: -753 lines. 152 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…limitations

Gaps found by simulation team:

1. Remove `accepted`/`rejected` counters from TrainingBlockMeta — they were
   never incremented anywhere in the codebase (dead code since inception)
2. Add 5 training discoverability tips to TUI tips (was 0 mentions in 152 tips)
3. Expand limitations section in docs with honest, complete list:
   context budget, 20/kind limit, no approval workflow, SQL-focused,
   git discipline required

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Homepage: update from "Four agents" to "Seven agents" — add Researcher,
  Trainer, Executive cards with descriptions
- Getting Started: update training link to match new pitch
  "Corrections That Stick"
- Tools index: add Training row (3 tools + 3 skills) with link
- All references now consistent with simplified training system

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. stripTrainingMeta/parseTrainingMeta regex: remove multiline `m` flag
   that could match user content starting with `<!-- training` mid-string
   (types.ts, store.ts)

2. training_save content limit: reduce from 2500 to 1800 chars to account
   for ~200 char metadata overhead against MemoryStore's 2048 char limit
   (training-save.ts)

3. injectTrainingOnly: change `break` to `continue` so budget-exceeding
   section headers skip to next kind instead of stopping all injection
   (memory/prompt.ts)

4. injectTrainingOnly: track itemCount and return empty string when no
   items injected (was returning header-only string, inflating budget
   reports) (memory/prompt.ts)

5. projectDir cache: replace module-level singleton with Map keyed by
   Instance.directory to prevent stale paths when AsyncLocalStorage
   context changes across concurrent requests (memory/store.ts)

6. budgetUsage side effect: already fixed — delegates to injectTrainingOnly
   which is read-only (no applied count increment). Sentry comments were
   against pre-refactor code.
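
Items 3 and 4 can be illustrated with a simplified sketch. `injectSections` and the `Section` type are hypothetical names; the real `injectTrainingOnly` is not shown in this PR page, so treat this as one possible shape of the fix rather than the actual code.

```typescript
// Hypothetical section shape: a header followed by its formatted entries.
interface Section {
  header: string
  items: string[]
}

function injectSections(sections: Section[], budget: number): string {
  let out = ""
  let itemCount = 0
  for (const section of sections) {
    // `continue` rather than `break`: a header that does not fit skips only
    // this section, so a later, smaller section can still be injected.
    if (out.length + section.header.length > budget) continue
    out += section.header
    for (const item of section.items) {
      if (out.length + item.length > budget) continue
      out += item
      itemCount++
    }
  }
  // Return an empty string when no item was injected, so that header-only
  // output does not inflate budget reports.
  return itemCount > 0 ? out : ""
}
```

This version can still leave a header without entries when another section did contribute items; it covers only the two behaviors named in items 3 and 4.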

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Agent test: add researcher + trainer to "all disabled" test so it
   correctly expects "no primary visible agent" when ALL agents are off

2. Orphaned section headers: add pre-check that at least one entry fits
   before adding section header in both injectTrainingOnly and inject
   memory section (prevents header-only output inflating budget reports)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes from 6-model consensus review (Claude + GPT + Gemini + Kimi + MiniMax + GLM-5):

1. training_remove: add name validation regex matching training_save
   (Gemini finding — prevents path traversal via malformed names)

2. training_save: improve name transform to strip ALL non-alphanumeric
   chars, not just whitespace (Gemini finding — "don't-use-float!"
   now becomes "don-t-use-float" instead of failing regex)

3. incrementApplied: replace silent `.catch(() => {})` with warning
   log (Kimi + GLM-5 consensus — fire-and-forget is by design but
   failures should be visible in logs for debugging)
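
The name handling in items 1 and 2 could be sketched as follows. The helper names and both regular expressions are assumptions; the PR page does not show the patterns that `training_save` and `training_remove` actually use.

```typescript
// Assumed validation pattern: lowercase alphanumeric words joined by single hyphens.
const VALID_NAME = /^[a-z0-9]+(-[a-z0-9]+)*$/

// Hypothetical transform applied before saving.
function normalizeName(input: string): string {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "") // no leading or trailing hyphens
}

// Hypothetical check shared by save and remove; rejects slashes and dots,
// so a name can never point outside the training directory.
function isValidName(name: string): boolean {
  return VALID_NAME.test(name)
}
```

Applying the same validation on remove as on save means a malformed name is rejected before it is ever turned into a file path.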

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… check

1. formatTrainingEntry regex: remove multiline `m` flag that could
   match user content mid-string (memory/prompt.ts:82)

2. Memory block budget check: change `<` to `<=` so blocks that fit
   exactly into remaining budget are included (memory/prompt.ts:204)

3 prior Sentry findings already fixed in earlier commits:
   - projectDir cache (Map keyed by Instance.directory)
   - injectTrainingOnly header-only return (itemCount guard)
   - orphaned section headers (first-entry pre-check)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
anandgupta42 force-pushed the claude/ai-teammate-interface-p8SO6 branch from c119de9 to 4087cce on March 15, 2026 22:55
Fixes from consensus across Claude, GPT 5.2, Gemini 3.1, Kimi K2.5,
MiniMax M2.5, and GLM-5:

1. parseTrainingMeta: check safeParse().success before accessing .data
   (GLM-5 + MiniMax consensus — accessing .data on failed parse returns
   undefined, could cause downstream errors)

2. Stale detection: use `e.updated` not `e.created` so entries updated
   recently aren't incorrectly flagged as stale (MiniMax finding)

3. training_list: pass scope/kind filter to count() so summary table
   matches the filtered entries list (GPT finding)

4. training_remove: show hint entries from same scope only, not all
   scopes (GPT + MiniMax finding)

Prior fixes already addressed: name validation on remove (Gemini),
name transform punctuation (Gemini), silent incrementApplied catch
(Kimi + GLM-5), regex m flag (MiniMax + Sentry).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anandgupta42 anandgupta42 merged commit 6d56feb into main Mar 15, 2026
7 checks passed
anandgupta42 added a commit that referenced this pull request Mar 17, 2026
…148)

* Add AI Teammate repositioning design document

Comprehensive design for repositioning altimate from "AI tool" to "AI
teammate" — including trainable knowledge system (/teach, /train,
/feedback), Deep Research mode for multi-step investigations, team
memory that persists via git, and UX reframing from "agent modes" to
"teammate roles."

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Enrich design doc with OpenClaw research and proactive behaviors

Add detailed competitive analysis from OpenClaw (self-improving memory,
heartbeat scheduler, meet-users-where-they-are), Devin ($10.2B
valuation, "junior partner" framing), and Factory AI (workflow
embedding). Add proactive behaviors section with background monitors
(cost alerts, freshness checks, schema drift, PII scanning) and
auto-promotion of learned corrections.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Implement AI Teammate training system and Deep Research mode

Core training infrastructure built on top of existing memory system:

Training Store & Types:
- TrainingStore wraps MemoryStore with training-specific conventions
- Four knowledge kinds: pattern, rule, glossary, standard
- Structured metadata (applied count, source, acceptance tracking)
- Training blocks stored in .opencode/memory/training/ (git-committable)
- One person teaches, whole team benefits via git

Training Tools:
- training_save: Save learned patterns, rules, glossary, standards
- training_list: List all learned knowledge with applied counts
- training_remove: Remove outdated training entries

Training Skills:
- /teach: Learn patterns from example files in the codebase
- /train: Learn standards from documents or style guides
- /training-status: Dashboard of all learned knowledge

System Prompt Injection:
- Training knowledge injected alongside memory at session start
- Structured by kind: rules first, then patterns, standards, glossary
- Budget-limited to 6000 chars to control prompt size
- Zero LLM calls on startup — just reads files from disk

Deep Research Agent Mode:
- New "researcher" agent for multi-step investigations
- 4-phase protocol: Plan → Gather → Analyze → Report
- Read-only access to all warehouse, schema, FinOps tools
- Structured reports with evidence, root causes, action items

Agent Awareness:
- All agent prompts updated with training awareness section
- Agents offer to save corrections as rules when users correct behavior
- Training tools permitted in all agent modes

Tests:
- 88 new tests across 5 test files (types, store, prompt, tools, integration)
- All tests standalone (no Instance dependency)
- Full lifecycle tests: save → list → format → inject → remove
- Edge cases: budget limits, meta roundtrips, coexistence with memory

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Polish AI Teammate training UX: auto-lowercase names, update detection, budget visibility

- Fix researcher agent permissions: add training_save/remove (was read-only)
- Auto-lowercase + space-to-hyphen name transform in training_save (ARR → arr)
- Detect update vs new save, show "Updated" with preserved applied count
- Show training budget usage (chars/percent) on save, list, and remove
- Improve training_list: group by kind, show most-applied entries, budget %
- Improve training_remove: show available entries on not-found, applied count
- Show similar entry names in duplicate warnings (not just count)
- Raise content limit from 1800 to 2500 chars
- Export TRAINING_BUDGET constant, add budgetUsage() to TrainingPrompt
- Add 30 new tests: auto-lowercase, update detection, budget overflow,
  name collision, scale (80 entries), improved messaging
- All 118 training tests + 305 memory tests pass

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Enhance training UX: attribution, correction detection, priority sorting

- Builder prompt: add attribution instructions (cite training entries that
  influenced output), correction detection (explicit + implicit patterns),
  conflict flagging between contradictory training entries
- Add /teach, /train, /training-status to Available Skills list in builder prompt
- Sort training entries by applied count (descending) in prompt injection so
  most-used entries get priority within the 6000-char budget
- Restructure Teammate Training section with clear subsections

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Fix experience gaps from user journey simulations

Simulation findings and fixes:

1. training_save now echoes back saved content so user can verify
   what was captured (new saves show content preview, updates show
   old vs new diff)

2. When training limit is reached, error now lists existing entries
   sorted by applied count and suggests the least-applied entry
   for removal

3. Researcher prompt now documents training_save/remove permissions
   (was contradicting its own permissions by saying "read-only" while
   having write access to training)

4. Added 10 new tests: content echo, update diff, limit suggestion,
   special character preservation (SQL -->, Jinja, HTML comments,
   code blocks), priority sorting verification

Verified: --> in content does NOT corrupt meta block (false positive).
The non-greedy regex terminates at the meta block's own --> correctly.

128 training tests + 305 memory tests all pass.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Add self-improvement loop: applied tracking, insights, staleness detection

OpenClaw-inspired self-improvement mechanisms:

1. Wire up incrementApplied at injection time — counters now actually
   increment once per session per entry (deduped via session-scoped set),
   making "Most Applied" dashboard and priority sorting meaningful

2. TrainingInsights module analyzes training metadata and surfaces:
   - Stale entries (7+ days old, never applied) — suggests cleanup
   - High-value entries (5+ applications) — highlights most impactful
   - Near-limit warnings (18-19 of 20 entries per kind)
   - Consolidation opportunities (3+ entries with shared name prefix)

3. Insights automatically shown in training_list output

4. 24 new tests covering all insight types, boundary conditions,
   session tracking dedup, and format output

152 training tests + 305 memory tests all pass.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* fix: add dedicated training feature flag and remove unused insight type

- Add `ALTIMATE_DISABLE_TRAINING` flag independent of memory's disable flag
- Use new flag in session prompt injection and tool registry
- Remove unused `budget-warning` insight type from `TrainingInsight`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reset training session tracking, add error logging, fix list truncation

- Call `TrainingPrompt.resetSession()` at session start (step === 1)
  to prevent applied counters from growing unbounded across sessions
- Add structured error logging to all three training tools
- Add truncation indicator (`...`) when training list preview is cut off

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use `.altimate-code/memory` as primary storage path with `.opencode` fallback

Memory store was hardcoded to `.opencode/memory/` but the config system
already uses `.altimate-code` as primary with `.opencode` as fallback.

Now checks for `.altimate-code/` directory first, falls back to `.opencode/`,
and defaults to `.altimate-code/` for new projects. Result is cached per
process to avoid repeated filesystem checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Trainer agent mode with pattern discovery and training validation

Add dedicated trainer mode — the 8th primary agent — for systematically
building the AI teammate's knowledge base. Unlike inline corrections in
other modes, trainer mode actively scans codebases, validates training
against reality, and guides knowledge curation.

Changes:
- New `trainer` agent mode with read-only permissions (no write/edit/sql_execute)
- New `training_scan` tool: auto-discover patterns in models, SQL, config, tests, docs
- New `training_validate` tool: check training compliance against actual codebase
- Expand `TrainingKind` to 6 types: add `context` (background "why" knowledge)
  and `playbook` (multi-step procedures)
- Update `count()` to derive from enum (prevents drift when kinds change)
- Add KIND_HEADERS for context and playbook in prompt injection
- Update injection order: rules first, playbooks last (budget priority)
- Update training-save and training-list descriptions for new kinds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add comprehensive training guide with scenarios and limitations

- New `data-engineering/training/index.md` (350+ lines):
  - Quick start with 3 entry points (trainer mode, inline corrections, /train skill)
  - Deep dive into all 4 trainer workflows (scan, validate, teach, gap analysis)
  - 5 comprehensive scenarios: new project onboarding, post-incident learning,
    quarterly review, business domain teaching, pre-migration documentation
  - Explicit limitations section (not a hard gate, budget limits, no auto-learning,
    heuristic validation, no conflict resolution, no version history)
  - Full reference tables for tools, skills, limits, and feature flag
- Updated `agent-modes.md`: add Researcher and Trainer mode sections with
  examples, capabilities, and "when to use" guidance
- Updated `getting-started.md`: add training link to "Next steps"
- Updated `mkdocs.yml`: add Training nav section under Data Engineering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: increase training budget to 16K chars and rewrite docs as harness customization guide

Training is not a CLAUDE.md replacement — it's the mechanism by which users
customize the data engineering harness for their specific project. The agent
works WITH the user to discover what it needs to know, rather than requiring
users to write perfect static instructions.

Changes:
- Increase TRAINING_BUDGET from 6000 to 16000 chars (removes the #1 criticism
  from user simulations — budget was worse than unlimited CLAUDE.md)
- Complete docs rewrite with correct positioning:
  - "Customizing Your AI Teammate" framing (not "Training Your AI Teammate")
  - Research-backed "why" section (40-70% knowledge omission, guided discovery)
  - Clear comparison table: training vs CLAUDE.md (complementary, not competing)
  - 6 real-world scenarios including Databricks, Salesforce quirks, cost spikes
  - Honest limitations section (not a linter, not an audit trail, not automatic)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: merge training into memory with context-aware relevance scoring

Replace two parallel injection systems (memory 8KB + training 16KB)
with a single unified injection that scores blocks by relevance to
the current agent.

How it works:
- All blocks (memory + training) loaded in one pass
- Each block scored: agent tag match (+10), training kind relevance
  per agent (+1-5), applied count bonus (+0-3), recency (+0-2),
  non-training base (+5)
- Builder sees rules/patterns first; analyst sees glossary/context first
- Budget is 20KB unified, filled greedily by score
- Training blocks still tracked with applied counts (fire-and-forget)
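
The scoring and greedy fill described above could look roughly like the sketch below. The score components (+10, +1-5, +0-3, +0-2, +5) come from this commit message. The `Block` fields, the values in `RELEVANCE`, the applied-count scale, the recency thresholds, and `selectBlocks` are assumptions made for illustration, and the real `scoreBlock()` in `memory/prompt.ts` may differ.

```typescript
// Hypothetical sketch; field names, weights, and thresholds are assumptions.
type Kind = "rule" | "pattern" | "glossary" | "standard" | "context" | "playbook"

interface Block {
  id: string
  content: string
  agentTags: string[]
  kind?: Kind      // undefined for plain memory blocks
  applied: number  // how often the block has been injected
  ageDays: number  // days since the block was last updated
}

// Assumed per-agent weights in the +1..+5 range.
const RELEVANCE: Record<string, Partial<Record<Kind, number>>> = {
  builder: { rule: 5, pattern: 4, standard: 3, playbook: 2, glossary: 1, context: 1 },
  analyst: { glossary: 5, context: 4, rule: 2, pattern: 1, standard: 1, playbook: 1 },
}

function scoreBlock(block: Block, agent: string): number {
  let score = 0
  if (block.agentTags.indexOf(agent) >= 0) score += 10   // agent tag match
  if (block.kind) {
    score += RELEVANCE[agent]?.[block.kind] ?? 1          // kind relevance per agent
    score += Math.min(3, Math.floor(block.applied / 5))   // applied bonus, capped at 3
  } else {
    score += 5                                            // non-training base
  }
  score += block.ageDays <= 7 ? 2 : block.ageDays <= 30 ? 1 : 0  // recency
  return score
}

// Greedy fill: highest score first, skipping blocks that no longer fit.
// The budget is measured in characters here, which is an assumption.
function selectBlocks(blocks: Block[], agent: string, budget: number): Block[] {
  const ranked = blocks.slice().sort((a, b) => scoreBlock(b, agent) - scoreBlock(a, agent))
  const chosen: Block[] = []
  let used = 0
  for (const block of ranked) {
    if (used + block.content.length > budget) continue
    chosen.push(block)
    used += block.content.length
  }
  return chosen
}
```

With these assumed weights, the same pair of blocks is ordered rule-first for `builder` and glossary-first for `analyst`, which is the behavior the commit describes.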

Architecture:
- memory/prompt.ts: new scoreBlock(), unified inject() with InjectionContext
- memory/types.ts: UNIFIED_INJECTION_BUDGET, AGENT_TRAINING_RELEVANCE weights
- session/prompt.ts: single inject call with agent context (was 2 separate)
- training/prompt.ts: deprecated, delegates to MemoryPrompt (backward compat)

No changes to: MemoryStore, TrainingStore, training tools, memory tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: cut training_scan and training_validate, simplify docs

Research from 8 independent evaluations + SkillsBench (7,308 test runs)
found that compact focused context beats comprehensive docs by 20pp.
The training system's value is in correction capture (2-sec saves) and
team propagation (git sync) — not in regex scanning or keyword grep.

Removed:
- training_scan (255 lines) — regex pattern counting, not discovery
- training_validate (315 lines) — keyword grep, not validation

Simplified:
- trainer.txt: removed scan/validate workflows, focused on guided
  teaching and curation
- agent-modes.md: updated trainer section with correction-focused example
- training docs: complete rewrite with new pitch:
  "Correct the agent once. It remembers forever. Your team inherits it."
  Backed by SkillsBench research showing compact > comprehensive.

Net: -753 lines. 152 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove dead accepted/rejected fields, add training tips, expand limitations

Gaps found by simulation team:

1. Remove `accepted`/`rejected` counters from TrainingBlockMeta — they were
   never incremented anywhere in the codebase (dead code since inception)
2. Add 5 training discoverability tips to TUI tips (was 0 mentions in 152 tips)
3. Expand limitations section in docs with honest, complete list:
   context budget, 20/kind limit, no approval workflow, SQL-focused,
   git discipline required

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update site-wide docs for training and new agent modes

- Homepage: update from "Four agents" to "Seven agents" — add Researcher,
  Trainer, Executive cards with descriptions
- Getting Started: update training link to match new pitch
  "Corrections That Stick"
- Tools index: add Training row (3 tools + 3 skills) with link
- All references now consistent with simplified training system

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Sentry review findings — 7 bugs fixed

1. stripTrainingMeta/parseTrainingMeta regex: remove multiline `m` flag
   that could match user content starting with `<!-- training` mid-string
   (types.ts, store.ts)

2. training_save content limit: reduce from 2500 to 1800 chars to account
   for ~200 char metadata overhead against MemoryStore's 2048 char limit
   (training-save.ts)

3. injectTrainingOnly: change `break` to `continue` so budget-exceeding
   section headers skip to next kind instead of stopping all injection
   (memory/prompt.ts)

4. injectTrainingOnly: track itemCount and return empty string when no
   items injected (was returning header-only string, inflating budget
   reports) (memory/prompt.ts)

5. projectDir cache: replace module-level singleton with Map keyed by
   Instance.directory to prevent stale paths when AsyncLocalStorage
   context changes across concurrent requests (memory/store.ts)

6. budgetUsage side effect: already fixed — delegates to injectTrainingOnly
   which is read-only (no applied count increment). Sentry comments were
   against pre-refactor code.
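
Fixes 3 and 4 can be illustrated with a small sketch. Only the `break` to `continue` change and the `itemCount` guard come from the commit message. The `Section` shape and the `injectSections` name are hypothetical, and the budget is assumed to be counted in characters.

```typescript
// Hypothetical sketch of fixes 3 and 4; names and shapes are assumptions.
interface Section {
  header: string
  items: string[]
}

function injectSections(sections: Section[], budget: number): string {
  let out = ""
  let itemCount = 0
  for (const section of sections) {
    // Fix 3: a header that does not fit skips this kind only.
    // With `break` here, every later section would be dropped as well.
    if (out.length + section.header.length > budget) continue
    out += section.header
    for (const item of section.items) {
      if (out.length + item.length > budget) continue
      out += item
      itemCount++
    }
  }
  // Fix 4: never report a header-only string as injected content.
  return itemCount === 0 ? "" : out
}
```

For example, with a budget of 20 characters, a first section whose header is 25 characters long is skipped and a later short section is still injected.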

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: CI failure + new Sentry finding — orphaned headers and agent test

1. Agent test: add researcher + trainer to "all disabled" test so it
   correctly expects "no primary visible agent" when ALL agents are off

2. Orphaned section headers: add pre-check that at least one entry fits
   before adding section header in both injectTrainingOnly and inject
   memory section (prevents header-only output inflating budget reports)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address multi-model code review findings

Fixes from 6-model consensus review (Claude + GPT + Gemini + Kimi + MiniMax + GLM-5):

1. training_remove: add name validation regex matching training_save
   (Gemini finding — prevents path traversal via malformed names)

2. training_save: improve name transform to strip ALL non-alphanumeric
   chars, not just whitespace (Gemini finding — "don't-use-float!"
   now becomes "don-t-use-float" instead of failing regex)

3. incrementApplied: replace silent `.catch(() => {})` with warning
   log (Kimi + GLM-5 consensus — fire-and-forget is by design but
   failures should be visible in logs for debugging)
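
One way to get the name behavior described in fixes 1 and 2 is sketched below. The actual regexes in `training_save` and `training_remove` are not shown in this PR text, so `toTrainingName`, `NAME_PATTERN`, and `isValidTrainingName` are hypothetical, chosen only to reproduce the `"don't-use-float!"` example.

```typescript
// Hypothetical helpers; the real regexes are not shown in the PR.

// Normalize free text into a slug: lowercase, collapse every run of
// non-alphanumeric characters into one hyphen, then trim edge hyphens.
function toTrainingName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
}

// Assumed validation rule shared by save and remove: lowercase
// alphanumeric segments joined by single hyphens. Dots and slashes
// never match, so "../" style names are rejected.
const NAME_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/

function isValidTrainingName(name: string): boolean {
  return NAME_PATTERN.test(name)
}
```

Under these assumptions, `toTrainingName("don't-use-float!")` yields `"don-t-use-float"`, and `isValidTrainingName("../../etc/passwd")` is `false`.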

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address new Sentry findings — regex m flag and off-by-one budget check

1. formatTrainingEntry regex: remove multiline `m` flag that could
   match user content mid-string (memory/prompt.ts:82)

2. Memory block budget check: change `<` to `<=` so blocks that fit
   exactly into remaining budget are included (memory/prompt.ts:204)

Three prior Sentry findings were already fixed in earlier commits:
   - projectDir cache (Map keyed by Instance.directory)
   - injectTrainingOnly header-only return (itemCount guard)
   - orphaned section headers (first-entry pre-check)
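
The effect of the `m` flag and of the `<` versus `<=` comparison can be shown in isolation. The marker pattern below is an assumed stand-in for the one in `memory/prompt.ts`, and `fitsBudget` is a hypothetical helper.

```typescript
// Assumed marker pattern; the real regex in memory/prompt.ts is not shown.
const WITH_M = /^<!-- training[\s\S]*?-->/m
const WITHOUT_M = /^<!-- training[\s\S]*?-->/

// User content that merely contains the marker at the start of a later line.
const userContent = "Use DECIMAL for money.\n<!-- training notes -->\nKeep this line."

// A real block, where the metadata comment opens the string.
const realBlock = "<!-- training\nkind: rule\n-->\nUse DECIMAL for money."

// With `m`, `^` matches after every newline, so the mid-string marker matches.
const matchesMidString = WITH_M.test(userContent)
// Without `m`, `^` matches only at position 0, so user content is left alone.
const ignoresMidString = !WITHOUT_M.test(userContent)
// The leading metadata comment still matches without `m`.
const matchesLeading = WITHOUT_M.test(realBlock)

// Off-by-one: a block that exactly fills the remaining budget should fit.
function fitsBudget(size: number, remaining: number): boolean {
  return size <= remaining
}
```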

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 6-model consensus review — 4 remaining bugs

Fixes from consensus across Claude, GPT 5.2, Gemini 3.1, Kimi K2.5,
MiniMax M2.5, and GLM-5:

1. parseTrainingMeta: check safeParse().success before accessing .data
   (GLM-5 + MiniMax consensus — accessing .data on failed parse returns
   undefined, could cause downstream errors)

2. Stale detection: use `e.updated` not `e.created` so entries updated
   recently aren't incorrectly flagged as stale (MiniMax finding)

3. training_list: pass scope/kind filter to count() so summary table
   matches the filtered entries list (GPT finding)

4. training_remove: show hint entries from same scope only, not all
   scopes (GPT + MiniMax finding)

Prior fixes already addressed: name validation on remove (Gemini),
name transform punctuation (Gemini), silent incrementApplied catch
(Kimi + GLM-5), regex m flag (MiniMax + Sentry).
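
Fix 2 can be sketched as follows. Only the `updated` versus `created` choice comes from the commit message. The `Entry` shape, the 90-day default, and the `isStale` name are assumptions.

```typescript
// Hypothetical sketch of stale detection; the threshold is an assumption.
interface Entry {
  name: string
  created: number // epoch milliseconds
  updated: number // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000

// Age is measured from the last update, so an old entry that was
// edited recently is not flagged as stale.
function isStale(entry: Entry, now: number, maxAgeDays: number = 90): boolean {
  return now - entry.updated > maxAgeDays * DAY_MS
}
```

An entry created 200 days ago but updated 5 days ago is not stale under this check, whereas a comparison against `created` would have flagged it.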

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
@anandgupta42 anandgupta42 deleted the claude/ai-teammate-interface-p8SO6 branch March 17, 2026 00:58
