feat: builder prompt rewrite + dbt skills consolidation + altimate-dbt CLI #174

Closed
suryaiyer95 wants to merge 10000 commits into main from feat/builder-prompt-and-dbt-skills

Conversation

@suryaiyer95 (Contributor)

Summary

  • Rewrite builder.txt from flat tool list to skills-first architecture with altimate_core offline SQL analysis tools surfaced
  • Add altimate-dbt CLI (packages/dbt-tools/) — 16 commands wrapping @altimateai/dbt-integration for one-shot dbt operations
  • Consolidate 8 dbt skills into 5 with progressive references/ system for lean AI routing
  • Fix dbt-tools CLI error handling in columns, init, and main dispatch

Changes

Builder Prompt (packages/opencode/src/altimate/prompts/builder.txt)

  • Skills-first architecture: organized tables for dbt (5), SQL (3), Training (3) skills
  • Surface 6 altimate_core tools (validate, semantics, lint, column_lineage, correct, grade) — previously invisible to the agent
  • Structured 5-phase workflow: Explore → Plan → Analyze → Execute → Validate
  • Common Pitfalls section from benchmark failure analysis
  • Removed verbose pre-execution protocol and FinOps tools

altimate-dbt CLI (packages/dbt-tools/)

  • 16 commands: init, doctor, info, compile, build, run, test, execute, columns, graph, deps, etc.
  • Config at ~/.altimate-code/dbt.json, auto-detected via altimate-dbt init
  • JSON output to stdout, logs to stderr
  • Python bridge fix for altimate_python_packages bundling

Skills Consolidation (.opencode/skills/)

  • New: dbt-develop, dbt-test, dbt-troubleshoot, dbt-analyze, dbt-docs (5 focused skills)
  • Deleted: dbt-cli, model-scaffold, generate-tests, yaml-config, incremental-logic, medallion-patterns, impact-analysis (7 merged)
  • Each skill has lean SKILL.md + references/ directory for progressive context

Test plan

  • Verify altimate-dbt init detects dbt project and creates config
  • Verify altimate-dbt build --model <name> compiles and runs models
  • Verify builder agent uses skills (dbt-develop, dbt-troubleshoot) when appropriate
  • Verify altimate_core_* tools appear in agent's tool list
  • Run Spider2-DBT benchmark to compare pass rate vs main

🤖 Generated with Claude Code

adamdotdevin and others added 30 commits March 3, 2026 05:35
Co-authored-by: Adam <2363879+adamdotdevin@users.noreply.github.com>
* fix: auto-bootstrap Python engine before starting bridge

Bridge.start() now calls ensureEngine() to download uv, create an
isolated venv, and install altimate-engine before spawning the Python
subprocess. resolvePython() also checks the managed venv path so
the correct interpreter is used after bootstrapping.

Previously, resolvePython() would fall through to system python3
which doesn't have altimate_engine installed, causing
ModuleNotFoundError on first run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add bridge client tests for ensureEngine and resolvePython

- Export resolvePython() from client.ts for direct unit testing
- Test that ALTIMATE_CLI_PYTHON env var takes highest priority
- Test that managed engine venv is used when present on disk
- Test fallback to python3 when no venvs exist
- Test that ensureEngine() is called before bridge spawn
- Mock only bridge/engine module to avoid leaking into other tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
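The priority order those tests pin down can be sketched roughly as follows. This is a minimal illustration, not the real bridge client: the `engineDir` parameter and the `venv/bin/python` layout are assumptions.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Sketch of the resolution order: env override, then managed venv, then
// system python3 (which may lack altimate_engine, per the bug description).
function resolvePython(engineDir: string, env: Record<string, string | undefined>): string {
  // 1. Explicit override always wins.
  if (env.ALTIMATE_CLI_PYTHON) return env.ALTIMATE_CLI_PYTHON;
  // 2. Managed engine venv, if ensureEngine() has already bootstrapped it.
  const managed = path.join(engineDir, "venv", "bin", "python");
  if (fs.existsSync(managed)) return managed;
  // 3. Last resort: the system interpreter.
  return "python3";
}
```

Each branch maps to one of the unit tests listed above: env var priority, managed venv on disk, and the python3 fallback.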

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Move existing data engineering docs into data-engineering/ subdirectory and
add 29 new pages covering platform features: TUI, CLI, web UI, IDE and CI/CD
integration, configuration, providers, tools, agents, models, themes, keybinds,
commands, formatters, permissions, LSP, MCP, ACP, skills, custom tools, SDK,
server, plugins, ecosystem, network, troubleshooting, and Windows/WSL.

All content adapted with altimate-code branding (env vars, config paths,
package names). mkdocs builds with zero warnings.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Remove navigation.expand so sub-sections start collapsed (less overwhelming)
- Group Configure's 16 flat items into 5 logical sub-sections:
  Providers & Models, Agents & Tools, Behavior, Appearance, Integrations
- Group orphaned bottom pages (Network, Troubleshooting, Windows/WSL) under Reference
- All 44 pages preserved, zero information lost

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
anandgupta42 and others added 6 commits March 15, 2026 15:59
…148)

* Add AI Teammate repositioning design document

Comprehensive design for repositioning altimate from "AI tool" to "AI
teammate" — including trainable knowledge system (/teach, /train,
/feedback), Deep Research mode for multi-step investigations, team
memory that persists via git, and UX reframing from "agent modes" to
"teammate roles."

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Enrich design doc with OpenClaw research and proactive behaviors

Add detailed competitive analysis from OpenClaw (self-improving memory,
heartbeat scheduler, meet-users-where-they-are), Devin ($10.2B
valuation, "junior partner" framing), and Factory AI (workflow
embedding). Add proactive behaviors section with background monitors
(cost alerts, freshness checks, schema drift, PII scanning) and
auto-promotion of learned corrections.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Implement AI Teammate training system and Deep Research mode

Core training infrastructure built on top of existing memory system:

Training Store & Types:
- TrainingStore wraps MemoryStore with training-specific conventions
- Four knowledge kinds: pattern, rule, glossary, standard
- Structured metadata (applied count, source, acceptance tracking)
- Training blocks stored in .opencode/memory/training/ (git-committable)
- One person teaches, whole team benefits via git

Training Tools:
- training_save: Save learned patterns, rules, glossary, standards
- training_list: List all learned knowledge with applied counts
- training_remove: Remove outdated training entries

Training Skills:
- /teach: Learn patterns from example files in the codebase
- /train: Learn standards from documents or style guides
- /training-status: Dashboard of all learned knowledge

System Prompt Injection:
- Training knowledge injected alongside memory at session start
- Structured by kind: rules first, then patterns, standards, glossary
- Budget-limited to 6000 chars to control prompt size
- Zero LLM calls on startup — just reads files from disk

Deep Research Agent Mode:
- New "researcher" agent for multi-step investigations
- 4-phase protocol: Plan → Gather → Analyze → Report
- Read-only access to all warehouse, schema, FinOps tools
- Structured reports with evidence, root causes, action items

Agent Awareness:
- All agent prompts updated with training awareness section
- Agents offer to save corrections as rules when users correct behavior
- Training tools permitted in all agent modes

Tests:
- 88 new tests across 5 test files (types, store, prompt, tools, integration)
- All tests standalone (no Instance dependency)
- Full lifecycle tests: save → list → format → inject → remove
- Edge cases: budget limits, meta roundtrips, coexistence with memory

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Polish AI Teammate training UX: auto-lowercase names, update detection, budget visibility

- Fix researcher agent permissions: add training_save/remove (was read-only)
- Auto-lowercase + space-to-hyphen name transform in training_save (ARR → arr)
- Detect update vs new save, show "Updated" with preserved applied count
- Show training budget usage (chars/percent) on save, list, and remove
- Improve training_list: group by kind, show most-applied entries, budget %
- Improve training_remove: show available entries on not-found, applied count
- Show similar entry names in duplicate warnings (not just count)
- Raise content limit from 1800 to 2500 chars
- Export TRAINING_BUDGET constant, add budgetUsage() to TrainingPrompt
- Add 30 new tests: auto-lowercase, update detection, budget overflow,
  name collision, scale (80 entries), improved messaging
- All 118 training tests + 305 memory tests pass

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Enhance training UX: attribution, correction detection, priority sorting

- Builder prompt: add attribution instructions (cite training entries that
  influenced output), correction detection (explicit + implicit patterns),
  conflict flagging between contradictory training entries
- Add /teach, /train, /training-status to Available Skills list in builder prompt
- Sort training entries by applied count (descending) in prompt injection so
  most-used entries get priority within the 6000-char budget
- Restructure Teammate Training section with clear subsections

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Fix experience gaps from user journey simulations

Simulation findings and fixes:

1. training_save now echoes back saved content so user can verify
   what was captured (new saves show content preview, updates show
   old vs new diff)

2. When training limit is reached, error now lists existing entries
   sorted by applied count and suggests the least-applied entry
   for removal

3. Researcher prompt now documents training_save/remove permissions
   (was contradicting its own permissions by saying "read-only" while
   having write access to training)

4. Added 10 new tests: content echo, update diff, limit suggestion,
   special character preservation (SQL -->, Jinja, HTML comments,
   code blocks), priority sorting verification

Verified: --> in content does NOT corrupt meta block (false positive).
The non-greedy regex terminates at the meta block's own --> correctly.

128 training tests + 305 memory tests all pass.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq
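The `-->` false-positive check can be reproduced with a small sketch. The meta-comment format and regex here are illustrative assumptions, not copied from the codebase:

```typescript
// Non-greedy match, deliberately without the `m` flag (a later commit in
// this PR removes `m` precisely because it could match mid-string content).
const META_RE = /<!-- training:(.*?)-->/;

const block = [
  "Cast currency columns explicitly.",
  "SQL arrows like `-->` inside content are preserved.",
  '<!-- training:{"kind":"pattern","applied":3}-->',
].join("\n");

// The non-greedy `.*?` stops at the meta block's own `-->`, so arrows
// earlier in the content never corrupt the parsed payload.
const meta = JSON.parse(block.match(META_RE)![1]);

// Stripping the meta comment leaves the content (and its arrows) intact.
const stripped = block.replace(META_RE, "").trimEnd();
```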

* Add self-improvement loop: applied tracking, insights, staleness detection

OpenClaw-inspired self-improvement mechanisms:

1. Wire up incrementApplied at injection time — counters now actually
   increment once per session per entry (deduped via session-scoped set),
   making "Most Applied" dashboard and priority sorting meaningful

2. TrainingInsights module analyzes training metadata and surfaces:
   - Stale entries (7+ days old, never applied) — suggests cleanup
   - High-value entries (5+ applications) — highlights most impactful
   - Near-limit warnings (18-19 of 20 entries per kind)
   - Consolidation opportunities (3+ entries with shared name prefix)

3. Insights automatically shown in training_list output

4. 24 new tests covering all insight types, boundary conditions,
   session tracking dedup, and format output

152 training tests + 305 memory tests all pass.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq
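The staleness and high-value rules above can be sketched like this. The entry shape is an assumption, and the sketch measures age from the update timestamp, matching a later fix in this PR that switched staleness from `created` to `updated`:

```typescript
interface TrainingEntry { name: string; applied: number; daysSinceUpdate: number }

// 7+ days without an update and never applied: candidate for cleanup.
function staleEntries(entries: TrainingEntry[]): string[] {
  return entries.filter((e) => e.applied === 0 && e.daysSinceUpdate >= 7).map((e) => e.name);
}

// 5+ applications: highlight as one of the most impactful entries.
function highValueEntries(entries: TrainingEntry[]): string[] {
  return entries.filter((e) => e.applied >= 5).map((e) => e.name);
}
```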

* fix: add dedicated training feature flag and remove unused insight type

- Add `ALTIMATE_DISABLE_TRAINING` flag independent of memory's disable flag
- Use new flag in session prompt injection and tool registry
- Remove unused `budget-warning` insight type from `TrainingInsight`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reset training session tracking, add error logging, fix list truncation

- Call `TrainingPrompt.resetSession()` at session start (step === 1)
  to prevent applied counters from growing unbounded across sessions
- Add structured error logging to all three training tools
- Add truncation indicator (`...`) when training list preview is cut off

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use `.altimate-code/memory` as primary storage path with `.opencode` fallback

Memory store was hardcoded to `.opencode/memory/` but the config system
already uses `.altimate-code` as primary with `.opencode` as fallback.

Now checks for `.altimate-code/` directory first, falls back to `.opencode/`,
and defaults to `.altimate-code/` for new projects. Result is cached per
process to avoid repeated filesystem checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
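The fallback logic, combined with the per-directory cache a later Sentry fix introduces (a Map keyed by the instance directory rather than a module-level singleton), can be sketched as follows. The function name and probe are illustrative:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Cached per directory, so concurrent requests in different directories
// never see each other's stale path.
const dirCache = new Map<string, string>();

function resolveMemoryRoot(instanceDir: string): string {
  const cached = dirCache.get(instanceDir);
  if (cached) return cached;
  const primary = path.join(instanceDir, ".altimate-code");
  const fallback = path.join(instanceDir, ".opencode");
  // Prefer .altimate-code/, fall back to an existing .opencode/, and
  // default to .altimate-code/ for new projects.
  const resolved = fs.existsSync(primary) || !fs.existsSync(fallback) ? primary : fallback;
  dirCache.set(instanceDir, resolved);
  return resolved;
}
```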

* feat: add Trainer agent mode with pattern discovery and training validation

Add dedicated trainer mode — the 8th primary agent — for systematically
building the AI teammate's knowledge base. Unlike inline corrections in
other modes, trainer mode actively scans codebases, validates training
against reality, and guides knowledge curation.

Changes:
- New `trainer` agent mode with read-only permissions (no write/edit/sql_execute)
- New `training_scan` tool: auto-discover patterns in models, SQL, config, tests, docs
- New `training_validate` tool: check training compliance against actual codebase
- Expand `TrainingKind` to 6 types: add `context` (background "why" knowledge)
  and `playbook` (multi-step procedures)
- Update `count()` to derive from enum (prevents drift when kinds change)
- Add KIND_HEADERS for context and playbook in prompt injection
- Update injection order: rules first, playbooks last (budget priority)
- Update training-save and training-list descriptions for new kinds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add comprehensive training guide with scenarios and limitations

- New `data-engineering/training/index.md` (350+ lines):
  - Quick start with 3 entry points (trainer mode, inline corrections, /train skill)
  - Deep dive into all 4 trainer workflows (scan, validate, teach, gap analysis)
  - 5 comprehensive scenarios: new project onboarding, post-incident learning,
    quarterly review, business domain teaching, pre-migration documentation
  - Explicit limitations section (not a hard gate, budget limits, no auto-learning,
    heuristic validation, no conflict resolution, no version history)
  - Full reference tables for tools, skills, limits, and feature flag
- Updated `agent-modes.md`: add Researcher and Trainer mode sections with
  examples, capabilities, and "when to use" guidance
- Updated `getting-started.md`: add training link to "Next steps"
- Updated `mkdocs.yml`: add Training nav section under Data Engineering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: increase training budget to 16K chars and rewrite docs as harness customization guide

Training is not a CLAUDE.md replacement — it's the mechanism by which users
customize the data engineering harness for their specific project. The agent
works WITH the user to discover what it needs to know, rather than requiring
users to write perfect static instructions.

Changes:
- Increase TRAINING_BUDGET from 6000 to 16000 chars (removes the #1 criticism
  from user simulations — budget was worse than unlimited CLAUDE.md)
- Complete docs rewrite with correct positioning:
  - "Customizing Your AI Teammate" framing (not "Training Your AI Teammate")
  - Research-backed "why" section (40-70% knowledge omission, guided discovery)
  - Clear comparison table: training vs CLAUDE.md (complementary, not competing)
  - 6 real-world scenarios including Databricks, Salesforce quirks, cost spikes
  - Honest limitations section (not a linter, not an audit trail, not automatic)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: merge training into memory with context-aware relevance scoring

Replace two parallel injection systems (memory 8KB + training 16KB)
with a single unified injection that scores blocks by relevance to
the current agent.

How it works:
- All blocks (memory + training) loaded in one pass
- Each block scored: agent tag match (+10), training kind relevance
  per agent (+1-5), applied count bonus (+0-3), recency (+0-2),
  non-training base (+5)
- Builder sees rules/patterns first; analyst sees glossary/context first
- Budget is 20KB unified, filled greedily by score
- Training blocks still tracked with applied counts (fire-and-forget)

Architecture:
- memory/prompt.ts: new scoreBlock(), unified inject() with InjectionContext
- memory/types.ts: UNIFIED_INJECTION_BUDGET, AGENT_TRAINING_RELEVANCE weights
- session/prompt.ts: single inject call with agent context (was 2 separate)
- training/prompt.ts: deprecated, delegates to MemoryPrompt (backward compat)

No changes to: MemoryStore, TrainingStore, training tools, memory tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
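The scoring and greedy fill described above can be sketched in miniature. The block shape, weight table values, and function names are illustrative stand-ins within the ranges the commit states, not the real `memory/prompt.ts` API:

```typescript
interface Block {
  content: string;
  agents?: string[];        // agent tags on the block
  trainingKind?: string;    // undefined for plain memory blocks
  appliedCount?: number;
  daysSinceUpdate?: number;
}

// Per-agent weight for each training kind (assumed example values).
const AGENT_TRAINING_RELEVANCE: Record<string, Record<string, number>> = {
  builder: { rule: 5, pattern: 4, standard: 3, glossary: 1 },
  analyst: { glossary: 5, context: 4, rule: 2 },
};

function scoreBlock(block: Block, agent: string): number {
  let score = 0;
  if (block.agents?.includes(agent)) score += 10;              // agent tag match
  if (block.trainingKind) {
    score += AGENT_TRAINING_RELEVANCE[agent]?.[block.trainingKind] ?? 1;
    score += Math.min(block.appliedCount ?? 0, 3);             // applied bonus, capped
  } else {
    score += 5;                                                // non-training base
  }
  if ((block.daysSinceUpdate ?? 99) <= 7) score += 2;          // recency bonus
  return score;
}

// Unified budget, filled greedily by score: highest-scoring blocks first.
function inject(blocks: Block[], agent: string, budget: number): string[] {
  const picked: string[] = [];
  let used = 0;
  for (const b of [...blocks].sort((a, z) => scoreBlock(z, agent) - scoreBlock(a, agent))) {
    if (used + b.content.length <= budget) {
      picked.push(b.content);
      used += b.content.length;
    }
  }
  return picked;
}
```

With these weights a builder agent sees rules before glossary entries, which is the per-agent ordering the commit describes.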

* refactor: cut training_scan and training_validate, simplify docs

Research from 8 independent evaluations + SkillsBench (7,308 test runs)
found that compact focused context beats comprehensive docs by 20pp.
The training system's value is in correction capture (2-sec saves) and
team propagation (git sync) — not in regex scanning or keyword grep.

Removed:
- training_scan (255 lines) — regex pattern counting, not discovery
- training_validate (315 lines) — keyword grep, not validation

Simplified:
- trainer.txt: removed scan/validate workflows, focused on guided
  teaching and curation
- agent-modes.md: updated trainer section with correction-focused example
- training docs: complete rewrite with new pitch:
  "Correct the agent once. It remembers forever. Your team inherits it."
  Backed by SkillsBench research showing compact > comprehensive.

Net: -753 lines. 152 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove dead accepted/rejected fields, add training tips, expand limitations

Gaps found by simulation team:

1. Remove `accepted`/`rejected` counters from TrainingBlockMeta — they were
   never incremented anywhere in the codebase (dead code since inception)
2. Add 5 training discoverability tips to TUI tips (was 0 mentions in 152 tips)
3. Expand limitations section in docs with honest, complete list:
   context budget, 20/kind limit, no approval workflow, SQL-focused,
   git discipline required

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update site-wide docs for training and new agent modes

- Homepage: update from "Four agents" to "Seven agents" — add Researcher,
  Trainer, Executive cards with descriptions
- Getting Started: update training link to match new pitch
  "Corrections That Stick"
- Tools index: add Training row (3 tools + 3 skills) with link
- All references now consistent with simplified training system

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Sentry review findings — 7 bugs fixed

1. stripTrainingMeta/parseTrainingMeta regex: remove multiline `m` flag
   that could match user content starting with `<!-- training` mid-string
   (types.ts, store.ts)

2. training_save content limit: reduce from 2500 to 1800 chars to account
   for ~200 char metadata overhead against MemoryStore's 2048 char limit
   (training-save.ts)

3. injectTrainingOnly: change `break` to `continue` so budget-exceeding
   section headers skip to next kind instead of stopping all injection
   (memory/prompt.ts)

4. injectTrainingOnly: track itemCount and return empty string when no
   items injected (was returning header-only string, inflating budget
   reports) (memory/prompt.ts)

5. projectDir cache: replace module-level singleton with Map keyed by
   Instance.directory to prevent stale paths when AsyncLocalStorage
   context changes across concurrent requests (memory/store.ts)

6. budgetUsage side effect: already fixed — delegates to injectTrainingOnly
   which is read-only (no applied count increment). Sentry comments were
   against pre-refactor code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: CI failure + new Sentry finding — orphaned headers and agent test

1. Agent test: add researcher + trainer to "all disabled" test so it
   correctly expects "no primary visible agent" when ALL agents are off

2. Orphaned section headers: add pre-check that at least one entry fits
   before adding section header in both injectTrainingOnly and inject
   memory section (prevents header-only output inflating budget reports)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address multi-model code review findings

Fixes from 6-model consensus review (Claude + GPT + Gemini + Kimi + MiniMax + GLM-5):

1. training_remove: add name validation regex matching training_save
   (Gemini finding — prevents path traversal via malformed names)

2. training_save: improve name transform to strip ALL non-alphanumeric
   chars, not just whitespace (Gemini finding — "don't-use-float!"
   now becomes "don-t-use-float" instead of failing regex)

3. incrementApplied: replace silent `.catch(() => {})` with warning
   log (Kimi + GLM-5 consensus — fire-and-forget is by design but
   failures should be visible in logs for debugging)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
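The improved name transform can be sketched as: lowercase, collapse every run of non-alphanumeric characters to a hyphen, and trim edge hyphens. This is an assumed implementation that reproduces the examples above, not the actual `training_save` code:

```typescript
function normalizeTrainingName(raw: string): string {
  return raw
    .toLowerCase()                  // ARR -> arr
    .replace(/[^a-z0-9]+/g, "-")    // strip ALL non-alphanumerics, not just whitespace
    .replace(/^-+|-+$/g, "");       // trim leading/trailing hyphens
}

console.log(normalizeTrainingName("ARR"));              // arr
console.log(normalizeTrainingName("don't-use-float!")); // don-t-use-float
```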

* fix: address new Sentry findings — regex m flag and off-by-one budget check

1. formatTrainingEntry regex: remove multiline `m` flag that could
   match user content mid-string (memory/prompt.ts:82)

2. Memory block budget check: change `<` to `<=` so blocks that fit
   exactly into remaining budget are included (memory/prompt.ts:204)

Three prior Sentry findings were already fixed in earlier commits:
   - projectDir cache (Map keyed by Instance.directory)
   - injectTrainingOnly header-only return (itemCount guard)
   - orphaned section headers (first-entry pre-check)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
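The off-by-one in finding 2 comes down to a single comparison; a minimal sketch with an illustrative helper name:

```typescript
// With `<`, a block that exactly fills the remaining budget was dropped;
// `<=` includes blocks that fit exactly.
function fitsBudget(used: number, blockLength: number, budget: number): boolean {
  return used + blockLength <= budget; // was `<`
}

console.log(fitsBudget(1000, 24, 1024)); // true: exact fit is now included
console.log(fitsBudget(1000, 25, 1024)); // false: still over budget
```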

* fix: address 6-model consensus review — 4 remaining bugs

Fixes from consensus across Claude, GPT 5.2, Gemini 3.1, Kimi K2.5,
MiniMax M2.5, and GLM-5:

1. parseTrainingMeta: check safeParse().success before accessing .data
   (GLM-5 + MiniMax consensus — accessing .data on failed parse returns
   undefined, could cause downstream errors)

2. Stale detection: use `e.updated` not `e.created` so entries updated
   recently aren't incorrectly flagged as stale (MiniMax finding)

3. training_list: pass scope/kind filter to count() so summary table
   matches the filtered entries list (GPT finding)

4. training_remove: show hint entries from same scope only, not all
   scopes (GPT + MiniMax finding)

Prior fixes already addressed: name validation on remove (Gemini),
name transform punctuation (Gemini), silent incrementApplied catch
(Kimi + GLM-5), regex m flag (MiniMax + Sentry).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
npm v7+ suppresses ALL postinstall output (stdout AND stderr), so
the welcome box was never visible after `npm install`. Users only
saw "added 2 packages" with no feedback.

Move the full welcome box into `showWelcomeBannerIfNeeded()` which
runs in the CLI middleware before the TUI starts. The postinstall
script now only writes the marker file — no output.

Flow:
1. `npm install` → postinstall writes `.installed-version` marker
2. First `altimate` run → CLI reads marker, shows welcome box, deletes marker
3. Subsequent runs → no marker, no banner

Closes #160
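The marker-file handshake can be sketched like this (the marker name comes from the flow above; the function shapes are illustrative):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Step 1: postinstall only writes the marker, producing no output
// (npm v7+ would suppress it anyway).
function postinstall(dir: string, version: string): void {
  fs.writeFileSync(path.join(dir, ".installed-version"), version);
}

// Step 2: first CLI run shows the banner and deletes the marker.
// Step 3: subsequent runs find no marker and stay quiet.
function showWelcomeBannerIfNeeded(dir: string): boolean {
  const marker = path.join(dir, ".installed-version");
  if (!fs.existsSync(marker)) return false;
  const version = fs.readFileSync(marker, "utf8");
  console.error(`Welcome to altimate-code ${version}!`);
  fs.rmSync(marker); // one-shot: delete so the banner never repeats
  return true;
}
```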
Multiple scripts and CI workflows were fetching/pushing tags in ways
that caused ~900 upstream OpenCode tags to leak into our origin remote:

- CI `git fetch upstream` auto-followed tags — added `--no-tags`
- Release scripts used `git push --tags` pushing ALL local tags to
  origin — changed to push only the specific new tag
- Release scripts used `git fetch --force --tags` without explicit
  remote — added explicit `origin`
- `script/publish.ts` used `--tags` flag — push only `v${version}`
- Docs referenced `git fetch upstream --tags` — fixed to `--no-tags`

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…erge (#168)

* fix: sidebar shows `OpenCode` instead of `Altimate Code` after upstream merge

- Replace `<b>Open</b><b>Code</b>` with `<b>Altimate</b><b> Code</b>` in sidebar footer
- Add `altimate_change` markers to protect branding block from future upstream merges
- Add TUI branding guard tests to `upstream-merge-guard.test.ts`

Closes #167

* fix: remove stale `accepted`/`rejected` properties from `TrainingBlockMeta` test

These fields were removed from the type but the test wasn't updated.
* feat: added a skill for data storytelling and visualizations / data products

* fix: rename skill to data-viz

* fix: reduce skills.md and references files by 60%

---------

Co-authored-by: Saurabh Arora <saurabh@altimate.ai>
suryaiyer95 and others added 4 commits March 15, 2026 17:54
New `packages/dbt-tools/` TypeScript package wrapping `@altimateai/dbt-integration`
to provide one-shot dbt CLI operations (compile, build, test, execute, introspect).

- 16 commands: init, doctor, info, compile, compile-query, build, run, test,
  build-project, execute, columns, columns-source, column-values, children,
  parents, deps, add-packages
- Config at `~/.altimate-code/dbt.json`, auto-detected via `altimate-dbt init`
- Prerequisite validation (`doctor`) checks Python, dbt-core, and project health
- Structured JSON output to stdout, logs to stderr, `--format text` for humans
- Graceful error handling with actionable `error` + `fix` fields
- Patch `python-bridge@1.1.0` to fix `bluebird.promisifyAll` crash
- Build with `bun build --target node` for Node.js runtime (Bun IPC bug workaround)
- 11 tests covering config round-trip, CLI dispatch, error paths
- `/dbt-cli` skill teaching AI agents when and how to invoke each command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
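The output contract (JSON on stdout for machines, logs on stderr for humans, failures as `error` + `fix` objects) can be sketched with an illustrative helper; `emitResult` is not the real CLI entry point:

```typescript
function emitResult(payload: unknown): string {
  const json = JSON.stringify(payload);
  console.error("[altimate-dbt] finished"); // logs go to stderr
  console.log(json);                        // data goes to stdout
  return json;
}

const okJson = emitResult({ models: ["stg_orders"] });
const errJson = emitResult({
  error: "no dbt project detected",
  fix: "run `altimate-dbt init` from your dbt project root",
});
```

Keeping the two streams separate means callers can pipe stdout straight into `jq` or a JSON parser without filtering log lines.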
… bridge

`bun build` bundles all JS into a single `dist/index.js`, causing
`import.meta.url` to resolve to the bundle location instead of the
original `@altimateai/dbt-integration/dist/` directory. This meant
`PYTHONPATH` pointed to a nonexistent `altimate_python_packages/` dir,
breaking `dbt_core_integration` imports.

- Add `script/copy-python.ts` post-build step that copies
  `altimate_python_packages/` from the npm package into `dist/`
- Remove developer-only build instructions from `/dbt-cli` skill

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nces/` system

Restructure all dbt-related skills for better AI routing and progressive context disclosure:

**New skills (5):**
- `dbt-develop` — model creation hub (merges model-scaffold, yaml-config, medallion-patterns, incremental-logic, dbt-cli)
- `dbt-test` — schema tests, unit tests, custom tests (merges generate-tests + new content)
- `dbt-troubleshoot` — diagnostic workflow for compilation, runtime, and test errors
- `dbt-analyze` — downstream impact analysis using lineage (replaces impact-analysis)
- `dbt-docs` — enhanced with `altimate-dbt` integration

**Deleted skills (7):**
- dbt-cli, model-scaffold, generate-tests, yaml-config, incremental-logic, medallion-patterns, impact-analysis

**Architecture:**
- Lean SKILL.md files for AI routing (When to Use / Do NOT Use sections)
- Deep `references/` directories for on-demand knowledge (read only when needed)
- Shared `altimate-dbt-commands.md` reference in every skill
- Iron Rules, Common Mistakes tables, and Rationalizations to Resist patterns
- All skills use `altimate-dbt` commands instead of raw `dbt`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ce `altimate_core` tools

- Restructure `builder.txt` from flat tool list to skills-first architecture
- Surface 6 `altimate_core` offline SQL analysis tools (validate, semantics, lint, column_lineage, correct, grade)
- Add structured 5-phase workflow: Explore → Plan → Analyze → Execute → Validate
- Add Common Pitfalls section from benchmark failure analysis
- Fix dbt-tools CLI: improve error handling in `columns`, `init`, and main dispatch
- Update dbt-develop and dbt-troubleshoot skills with `altimate-dbt` references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +175 to +178
      result = (await import("./commands/graph")).children(adapter, rest)
      break
    case "parents":
      result = (await import("./commands/graph")).parents(adapter, rest)

Bug: The children and parents commands are missing async/await, causing them to return a Promise instead of data, which results in empty {} output.
Severity: HIGH

Suggested Fix

Add the async keyword to the children and parents function declarations in graph.ts. Then, add await before the adapter method calls (getChildrenModels and getParentModels) within those functions. Finally, add await to the calls for these commands in index.ts.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: packages/dbt-tools/src/index.ts#L175-L178

Potential issue: The `children` and `parents` commands are not defined as `async`
functions and do not `await` the results from their respective adapter methods
(`getChildrenModels` and `getParentModels`). All other commands and adapter interactions
in the codebase use `async`/`await`. This inconsistency will cause the commands to
return a Promise object instead of the resolved data. Consequently, when the result is
stringified to JSON for output, it will produce an empty object `{}` instead of the
expected model graph, and any errors during execution will lead to unhandled promise
rejections.

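The empty `{}` symptom is easy to reproduce. Below is a minimal sketch with a stand-in adapter (the real one is `@altimateai/dbt-integration`; this shape is assumed):

```typescript
type Adapter = { getChildrenModels(name: string): Promise<string[]> };

const adapter: Adapter = {
  async getChildrenModels(name) {
    return [`${name}_child`];
  },
};

// Buggy shape: synchronous function, no await, so callers receive a Promise.
function childrenBuggy(a: Adapter, name: string) {
  return a.getChildrenModels(name);
}

// Fixed shape: async + await, so dispatch can do `result = await children(...)`
// and dispose the adapter only after the data has resolved.
async function childrenFixed(a: Adapter, name: string) {
  return await a.getChildrenModels(name);
}

// A pending Promise has no own enumerable properties, so the buggy path
// serializes to "{}", exactly the empty output described in the finding.
console.log(JSON.stringify(childrenBuggy(adapter, "orders"))); // {}
```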

@suryaiyer95 force-pushed the feat/builder-prompt-and-dbt-skills branch from a30e344 to ce928a6 on March 16, 2026 at 00:55
Comment on lines +174 to +178
    case "children":
      result = (await import("./commands/graph")).children(adapter, rest)
      break
    case "parents":
      result = (await import("./commands/graph")).parents(adapter, rest)

Bug: The children and parents commands are missing await, which can cause a race condition where the adapter is disposed before the async operation completes, leading to runtime errors.
Severity: HIGH

Suggested Fix

Mark the children and parents functions in graph.ts as async. Then, add await to the calls to these functions in index.ts for the children and parents cases to ensure the operations complete before the adapter is disposed.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: packages/dbt-tools/src/index.ts#L174-L178

Potential issue: The `children` and `parents` commands in the main switch statement do
not `await` the result of their respective functions. These functions call
`adapter.getChildrenModels` and `adapter.getParentModels`, which likely return Promises,
based on the usage pattern of other adapter methods. The `finally` block disposes of the
`adapter` immediately after the command is dispatched. This creates a race condition
where the adapter can be disposed of before the asynchronous operation completes,
leading to a runtime error when the pending Promise attempts to access the disposed
adapter's resources.
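The fix described above can be sketched as follows. This is a minimal standalone version: `GraphAdapter` and `flag` are hypothetical stand-ins for the real `DBTProjectIntegrationAdapter` and flag-parsing helper in the package, shown only to illustrate the `async`/`await` pattern.

```typescript
// Hypothetical stand-in for DBTProjectIntegrationAdapter from
// @altimateai/dbt-integration — only the two methods under discussion.
interface GraphAdapter {
  getChildrenModels(opts: { table: string }): Promise<string[]>
  getParentModels(opts: { table: string }): Promise<string[]>
}

// Hypothetical flag-parsing helper (the real one lives elsewhere in the CLI).
function flag(args: string[], name: string): string | undefined {
  const i = args.indexOf(`--${name}`)
  return i >= 0 ? args[i + 1] : undefined
}

// Declared async and awaited internally, so callers that await these
// functions are guaranteed the adapter is still live when the data arrives.
async function children(adapter: GraphAdapter, args: string[]) {
  const model = flag(args, "model")
  if (!model) return { error: "Missing --model" }
  return await adapter.getChildrenModels({ table: model })
}

async function parents(adapter: GraphAdapter, args: string[]) {
  const model = flag(args, "model")
  if (!model) return { error: "Missing --model" }
  return await adapter.getParentModels({ table: model })
}

// Demo with a stub adapter: awaiting yields real data instead of the
// "{}" that JSON.stringify produces for a pending Promise.
const stub: GraphAdapter = {
  getChildrenModels: async () => ["stg_orders", "fct_orders"],
  getParentModels: async () => ["raw_orders"],
}
const kids = await children(stub, ["--model", "orders"])
console.log(JSON.stringify(kids))
```

In the dispatch switch, the calls would then read `result = await (await import("./commands/graph")).children(adapter, rest)`, matching the pattern every other command already uses, so the `finally { await adapter.dispose() }` block only runs after the data is resolved.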

@dev-punia-altimate

🤖 Behavioral Analysis — 4 Finding(s)

🔴 Critical (1)

  • F1 packages/dbt-tools/src/commands/graph.ts:17 [Missing async/await — Promise returned as data]
    Both children and parents are declared as plain synchronous functions but call async adapter methods without await. They return a Promise instead of resolved data. In index.ts the results are also not awaited (lines 4116-4121). Effect 1: result is a Promise object — JSON.stringify(Promise) produces {} — both commands output an empty object instead of the model graph. Effect 2: the finally { await adapter.dispose() } block fires immediately after the switch, disposing the adapter while the Promise is still pending. When the Promise eventually resolves it tries to use a closed Python bridge, causing a crash caught by the unhandledRejection handler. All other commands in the switch use result = await ...; this is the only place the pattern is broken. Note: flagged twice by sentry bot but not yet acknowledged by the author. — Auto-fixable: In graph.ts add async to both function declarations and await the adapter calls. In index.ts add await before both .children(adapter, rest) and .parents(adapter, rest) calls.

🟡 Warnings (1)

  • F2 packages/opencode/src/tool/registry.ts:218 [DbtRunTool removed from registry despite description updated for fallback use]
    The PR simultaneously (a) updates dbt-run.ts description to 'Use this tool only as a fallback when altimate-dbt is unavailable' and (b) removes DbtRunTool from both the re-export in altimate/index.ts and the tool registry. The updated description implies the tool should remain available as a last resort, but removing it from the registry makes it completely inaccessible to the agent. The builder.txt prompt also contradicts the updated tool description: the prompt says 'Never call raw dbt directly' while the description says 'use when altimate-dbt is unavailable'. In environments where altimate-dbt has not been initialized (fresh installs, CI), the agent now has zero dbt execution path. The file dbt-run.ts still exists with an unreachable description, which will confuse future maintainers.

🔵 Nits (2)

  • F3 packages/dbt-tools/src/commands/execute.ts:8 — Two edge cases silently fall through to unlimited execution: (1) --limit 0: raw is '0' (truthy string), parseInt returns 0, if (0) is falsy so unlimited SQL runs instead of returning empty; (2) --limit abc or when the next argv is another flag: parseInt returns NaN, if (NaN) is falsy so unlimited SQL runs with no warning. Neither path signals to the caller that the limit was ignored.

  • F4 packages/dbt-tools/src/commands/columns.ts:36 — The values() function calls adapter.getColumnValues(model, col) without a try/catch. The sibling columns() and source() functions both wrap their adapter calls in try/catch and return structured { error: '...actionable message...' } objects. If getColumnValues throws, the error propagates to bail() which emits a generic 'Run: altimate-dbt doctor' message instead of the actionable context the other commands provide.

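The pattern F4 asks for can be sketched as below. `ColumnsAdapter` is a hypothetical slice of the real adapter interface; the error message format is an assumption modeled on the structured `{ error: ... }` objects the sibling `columns()` and `source()` functions are described as returning.

```typescript
// Hypothetical slice of the adapter interface for this sketch.
interface ColumnsAdapter {
  getColumnValues(model: string, col: string): Promise<string[]>
}

// Mirror the sibling columns()/source() pattern: catch adapter failures
// and return a structured, actionable error instead of letting the
// exception propagate to a generic bail() handler.
async function values(adapter: ColumnsAdapter, model: string, col: string) {
  try {
    return await adapter.getColumnValues(model, col)
  } catch (e) {
    const msg = e instanceof Error ? e.message : String(e)
    return { error: `Failed to fetch values for ${model}.${col}: ${msg}` }
  }
}

// Demo: a stub adapter that throws, as a missing relation would.
const failing: ColumnsAdapter = {
  getColumnValues: async () => { throw new Error("relation not found") },
}
const out = await values(failing, "orders", "status")
console.log(JSON.stringify(out))
```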
Analysis run | Powered by QA Autopilot

suryaiyer95 and others added 7 commits March 16, 2026 00:43
…, improve builder prompt

- Remove `coalesce(a, 0)` guidance from dbt-develop and dbt-troubleshoot skills
  (caused NULL→0 conversion failures in salesforce001 and others)
- Remove `dbt_packages/` reading instructions from dbt-develop skill
  (caused agent to spend too many events reading packages, fewer building)
- Change dbt build guidance from individual `altimate-dbt build --model` to
  full-project `dbt build` to ensure all models including package models materialize
- Add explicit `dbt deps` guidance for package installation
- Add NULL preservation guidance (don't coalesce unless explicitly required)
- Add date spine boundary guidance (derive from source data, not `current_date`)

Spider2-DBT benchmark context:
- Run 1 (pre-changes): 29/68 = 42.65%
- Run 2 (added dbt_packages reading): 21/68 = 30.88% (regression)
- Run 3 (removed coalesce/packages reading): 23/68 = 33.82%
- This commit targets the remaining issues for Run 4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ession)

Reverting `dbt build` (full project) back to `altimate-dbt build --model <name>`.
The full-project build caused agent to waste event budget and miss models.

Kept from post-baseline changes:
- `dbt deps` guidance for package installation
- NULL vs 0 preservation pitfall
- Removed coalesce guidance that caused salesforce001 failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add final `dbt build` step after all models are created to build package models
- Add column casing preservation guidance (e.g., keep `MINIMUM_NIGHTS` not `minimum_nights`)
- Add warning against extra columns not requested by the task
- Add date spine completeness guidance (derive boundaries from source data)
- Update dbt-develop skill with final full-project build step

Run 6 result: 27/68 = 39.7% (up from 33.8%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removed:
- Training tools/skills section (unused in benchmark)
- Teammate training section (unused in benchmark)
- Verbose 5-step workflow (replaced with 3-step)
- Verbose SQL analysis tools table (compressed to 4 bullets)
- `dbt build` final step (caused 20 timeouts in run 8)

Reverted dbt-develop skill to original (no full-project build).

Run 6: 27/68 (with dual build) → Run 7: 19/68 → Run 8: 18/68 (20 timeouts!)
Hypothesis: leaner prompt = more event budget for model creation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip everything non-essential:
- Remove skills tables (agent doesn't load skills in benchmark)
- Remove SQL analysis tools (agent rarely uses them)
- Remove redundant pitfalls
- Keep only: principles, dbt commands, workflow, key pitfalls

Hypothesis: less system prompt = more context for task = better results

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pt and skills

- Add principle 4 to builder prompt: "Fix everything" — run full `dbt build` after changes
- Add full project build instruction after first-build note
- Add 4 new common pitfalls: column casing, stopping at compile, skipping full build, ignoring pre-existing failures
- Add iron rule 5 to dbt-develop: fix ALL errors including pre-existing
- Expand dbt-troubleshoot iron rule to include fixing all errors, not just reported ones

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: improve builder prompt, skills, and add Langfuse tracing
Copilot AI review requested due to automatic review settings March 16, 2026 18:47
Comment on lines +8 to +9
const limit = raw ? parseInt(raw, 10) : undefined
if (limit) return adapter.immediatelyExecuteSQLWithLimit(sql, model, limit)

Bug: parseInt on the --limit option can return NaN. The subsequent truthiness check if (limit) fails silently, ignoring the user's invalid input instead of raising an error.
Severity: MEDIUM

Suggested Fix

After parsing the --limit value with parseInt, add a validation step using Number.isNaN() to check if the result is NaN. If it is, throw an error to inform the user that they have provided an invalid value for the limit. This aligns with the existing validation pattern in packages/opencode/src/session/retry.ts.

Prompt for AI Agent

Location: packages/dbt-tools/src/commands/execute.ts#L8-L9

Potential issue: When a non-numeric value is passed to the `--limit` option, `parseInt`
returns `NaN`. The code only performs a truthiness check on the result (`if (limit)`).
Since `NaN` is falsy, the condition fails, and the code silently falls back to executing
the SQL without a limit via `immediatelyExecuteSQL`. This contradicts the user's intent
and happens without any error or warning. The expected behavior is to validate the
parsed number and inform the user if their input is invalid, a pattern already
established elsewhere in the codebase.
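A minimal sketch of the validation suggested above. `parseLimit` is a hypothetical helper name, and rejecting negative values is an added assumption; the core point is that `Number.isNaN` catches garbage input and a `raw === undefined` check (rather than a truthiness check on the parsed number) keeps `--limit 0` from silently falling through to unlimited execution.

```typescript
// Parse a --limit value strictly: undefined means "no limit requested",
// NaN or negative input is an error, and 0 is a legitimate limit.
function parseLimit(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined
  const limit = Number.parseInt(raw, 10)
  if (Number.isNaN(limit) || limit < 0) {
    throw new Error(`Invalid --limit value: ${raw}`)
  }
  return limit
}

console.log(parseLimit("10"))      // 10
console.log(parseLimit(undefined)) // undefined — caller runs without a limit
console.log(parseLimit("0"))       // 0 — preserved, not dropped as falsy

let threw = false
try { parseLimit("abc") } catch { threw = true }
console.log(threw) // true — invalid input now fails loudly
```

The dispatch site would then branch on `limit !== undefined` instead of `if (limit)`, so the distinction between "no limit" and "limit of 0" survives.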

Comment on lines +39 to +41
function format(result?: CommandProcessResult) {
if (result?.stderr) return { error: result.stderr, stdout: result.stdout }
return { stdout: result?.stdout ?? "" }

Bug: The format function incorrectly uses the presence of stderr content to detect errors, rather than checking the exit_code. This can cause successful commands to be reported as failures.
Severity: HIGH

Suggested Fix

Modify the format function to determine success or failure based on the result.exit_code property. An exit_code of 0 indicates success. The presence of stderr should not be treated as a definitive error condition, as shown in the implementation for dbt-run.ts.

Prompt for AI Agent

Location: packages/dbt-tools/src/commands/build.ts#L39-L41

Potential issue: The `format` function in `build.ts` and `deps.ts` incorrectly
determines if a dbt command failed. It checks if `result.stderr` has content, but dbt
often writes non-error information like warnings and progress logs to stderr. The
function should instead check `result.exit_code === 0` to determine success, which is
the reliable indicator. This incorrect logic will cause successful dbt operations (like
`build`, `test`, or `deps`) to be reported as failures if dbt writes anything to stderr,
leading to user confusion and unnecessary retries.
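The exit-code-based check suggested above can be sketched like this. The `CommandProcessResult` shape with an `exit_code` field is taken from the review's description; the fallback message string is an assumption for illustration.

```typescript
// Result shape as described in the review; real definition lives in the package.
interface CommandProcessResult {
  stdout: string
  stderr: string
  exit_code: number
}

// Success is decided by exit_code, not by stderr content — dbt routinely
// writes warnings and progress logs to stderr on successful runs.
function format(result?: CommandProcessResult) {
  if (!result) return { stdout: "" }
  if (result.exit_code !== 0) {
    return { error: result.stderr || "command failed", stdout: result.stdout }
  }
  return { stdout: result.stdout }
}

// A successful run that logged a warning to stderr is still a success.
console.log(JSON.stringify(format({ stdout: "ok", stderr: "WARN: deprecated", exit_code: 0 })))
```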


Copilot AI left a comment


Pull request overview

This PR introduces a new altimate-dbt CLI package (Bun/TS) for dbt operations and adds a Spider2-DBT benchmarking/evaluation harness, while updating Altimate prompts/tools to prefer altimate-dbt over raw dbt execution and patching python-bridge for runtime compatibility.

Changes:

  • Add packages/dbt-tools (CLI + adapter wrapper around @altimateai/dbt-integration) with Bun tests and workspace wiring.
  • Add experiments/spider2_dbt benchmark runner, evaluator, report generator, and SQLite tracker for results.
  • Update Altimate prompts/tool exports/registry to de-emphasize or remove the dbt_run tool, and patch python-bridge@1.1.0.

Reviewed changes

Copilot reviewed 64 out of 66 changed files in this pull request and generated 10 comments.

File Description
patches/python-bridge@1.1.0.patch Patch python-bridge child_process import to avoid promisify behavior.
packages/opencode/src/tool/registry.ts Removes DbtRunTool import/registration from tool registry.
packages/opencode/src/altimate/tools/dbt-run.ts Updates tool description to prefer altimate-dbt (tool now appears orphaned).
packages/opencode/src/altimate/prompts/builder.txt Rewrites builder prompt to emphasize altimate-dbt workflow and principles.
packages/opencode/src/altimate/index.ts Removes barrel export for dbt-run.
packages/dbt-tools/tsconfig.json Adds strict TS config for the new dbt-tools package.
packages/dbt-tools/test/config.test.ts Adds basic tests around config handling/JSON structure.
packages/dbt-tools/test/cli.test.ts Adds CLI behavior tests via bun spawnSync.
packages/dbt-tools/src/index.ts Implements CLI routing, formatting, error diagnosis, adapter lifecycle.
packages/dbt-tools/src/config.ts Implements config file read/write under ~/.altimate-code/dbt.json.
packages/dbt-tools/src/commands/init.ts Adds init command to discover project + Python and write config.
packages/dbt-tools/src/commands/info.ts Adds info command wrapper.
packages/dbt-tools/src/commands/graph.ts Adds children/parents DAG wrappers.
packages/dbt-tools/src/commands/execute.ts Adds SQL execution wrapper with optional limit.
packages/dbt-tools/src/commands/doctor.ts Adds doctor command to report prerequisite status.
packages/dbt-tools/src/commands/deps.ts Adds deps installation and package add wrappers.
packages/dbt-tools/src/commands/compile.ts Adds model/query compile wrappers.
packages/dbt-tools/src/commands/columns.ts Adds model/source/values column inspection wrappers.
packages/dbt-tools/src/commands/build.ts Adds build/run/test/project wrappers.
packages/dbt-tools/src/check.ts Adds prerequisite checking and validation messaging.
packages/dbt-tools/src/adapter.ts Creates and initializes DBTProjectIntegrationAdapter.
packages/dbt-tools/script/copy-python.ts Copies bundled Python packages from @altimateai/dbt-integration into dist.
packages/dbt-tools/package.json Defines new workspace package, bin entry, build/test scripts.
packages/dbt-tools/bin/altimate-dbt Node bin shim that imports the built CLI entry.
package.json Adds packages/dbt-tools workspace and patches python-bridge@1.1.0.
experiments/spider2_dbt/tracker.py Adds SQLite-backed run/task tracking and comparison commands.
experiments/spider2_dbt/setup_spider2.py Adds one-time setup (clone/download/extract/verify) for Spider2-DBT.
experiments/spider2_dbt/schema_introspect.py Adds DuckDB schema summarization for prompt context.
experiments/spider2_dbt/run_benchmark.py Adds benchmark runner with retries, parallelism, caching, and result aggregation.
experiments/spider2_dbt/requirements.txt Adds Python deps for benchmark tooling.
experiments/spider2_dbt/report.py Adds single-file HTML report generator for evaluation results.
experiments/spider2_dbt/prompt_template.py Builds task prompts including YAML model discovery and DuckDB schema summary.
experiments/spider2_dbt/evaluate_results.py Evaluates benchmark outputs via Spider2 duckdb_match and writes summary JSON.
experiments/spider2_dbt/config.py Centralizes benchmark config (paths, defaults, leaderboard data).
experiments/spider2_dbt/altimate-code-dev.sh Adds local dev wrapper script (currently hardcoded path).
experiments/spider2_dbt/.gitignore Ignores cloned repo/workspace/results artifacts for experiments.
bun.lock Updates lockfile for new workspace/package deps and patched deps.
.opencode/skills/yaml-config/SKILL.md Removes legacy skill doc.
.opencode/skills/model-scaffold/SKILL.md Removes legacy skill doc.
.opencode/skills/medallion-patterns/SKILL.md Removes legacy skill doc.
.opencode/skills/incremental-logic/SKILL.md Removes legacy skill doc.
.opencode/skills/impact-analysis/SKILL.md Removes legacy skill doc.
.opencode/skills/generate-tests/SKILL.md Removes legacy skill doc.
.opencode/skills/dbt-troubleshoot/SKILL.md Adds new dbt-troubleshoot skill doc using altimate-dbt.
.opencode/skills/dbt-troubleshoot/references/test-failures.md Adds troubleshooting reference for test failures.
.opencode/skills/dbt-troubleshoot/references/runtime-errors.md Adds troubleshooting reference for runtime errors.
.opencode/skills/dbt-troubleshoot/references/compilation-errors.md Adds troubleshooting reference for compilation errors.
.opencode/skills/dbt-troubleshoot/references/altimate-dbt-commands.md Adds altimate-dbt command reference (troubleshoot skill).
.opencode/skills/dbt-test/SKILL.md Adds new dbt-test skill doc using altimate-dbt.
.opencode/skills/dbt-test/references/unit-test-guide.md Adds dbt unit testing guide.
.opencode/skills/dbt-test/references/schema-test-patterns.md Adds schema-test patterns reference.
.opencode/skills/dbt-test/references/custom-tests.md Adds custom test patterns reference.
.opencode/skills/dbt-test/references/altimate-dbt-commands.md Adds altimate-dbt command reference (test skill).
.opencode/skills/dbt-docs/SKILL.md Updates dbt docs skill to be altimate-dbt-driven + adds references.
.opencode/skills/dbt-docs/references/documentation-standards.md Adds documentation standards reference.
.opencode/skills/dbt-docs/references/altimate-dbt-commands.md Adds altimate-dbt command reference (docs skill).
.opencode/skills/dbt-develop/SKILL.md Adds new dbt-develop skill doc using altimate-dbt.
.opencode/skills/dbt-develop/references/yaml-generation.md Adds YAML generation reference.
.opencode/skills/dbt-develop/references/medallion-architecture.md Adds medallion architecture reference.
.opencode/skills/dbt-develop/references/layer-patterns.md Adds dbt layering patterns reference.
.opencode/skills/dbt-develop/references/incremental-strategies.md Adds incremental strategies reference.
.opencode/skills/dbt-develop/references/common-mistakes.md Adds common mistakes reference.
.opencode/skills/dbt-develop/references/altimate-dbt-commands.md Adds altimate-dbt command reference (develop skill).
.opencode/skills/dbt-analyze/SKILL.md Adds new dbt-analyze impact analysis skill doc.
.opencode/skills/dbt-analyze/references/lineage-interpretation.md Adds lineage interpretation reference.
.opencode/skills/dbt-analyze/references/altimate-dbt-commands.md Adds altimate-dbt command reference (analyze skill).


Comment on lines +357 to +361
FROM task_results a
FULL OUTER JOIN task_results b ON a.instance_id = b.instance_id AND b.run_id = ?
WHERE a.run_id = ?
ORDER BY instance_id
""", (run2, run1)).fetchall()
Comment on lines +174 to +179
case "children":
result = (await import("./commands/graph")).children(adapter, rest)
break
case "parents":
result = (await import("./commands/graph")).parents(adapter, rest)
break
Comment on lines +3 to +7
export function children(adapter: DBTProjectIntegrationAdapter, args: string[]) {
const model = flag(args, "model")
if (!model) return { error: "Missing --model" }
return adapter.getChildrenModels({ table: model })
}
Comment on lines +9 to +13
export function parents(adapter: DBTProjectIntegrationAdapter, args: string[]) {
const model = flag(args, "model")
if (!model) return { error: "Missing --model" }
return adapter.getParentModels({ table: model })
}
Comment on lines +7 to +10
const raw = flag(args, "limit")
const limit = raw ? parseInt(raw, 10) : undefined
if (limit) return adapter.immediatelyExecuteSQLWithLimit(sql, model, limit)
return adapter.immediatelyExecuteSQL(sql, model)
Comment on lines +17 to +21
function python(): string {
for (const cmd of ["python3", "python"]) {
try {
return execFileSync("which", [cmd], { encoding: "utf-8" }).trim()
} catch {}
Comment on lines +24 to +26
SPIDER2_REPO_URL = "https://github.com/xlang-ai/Spider2.git"
# Pin to a known-good commit for reproducibility
SPIDER2_COMMIT = "main"
Comment on lines +423 to +425
total_elapsed = time.perf_counter() - total_start
skipped = sum(1 for r in results if load_incremental(r["instance_id"]) is not None and r.get("_cached", False))

Comment on lines +1 to +2
#!/bin/bash
exec bun run --cwd /Users/surya/code/altimateai/altimate-code/packages/opencode --conditions=browser src/index.ts "$@"
Comment on lines 44 to 49
import { WarehouseAddTool } from "../altimate/tools/warehouse-add"
import { WarehouseRemoveTool } from "../altimate/tools/warehouse-remove"
import { WarehouseDiscoverTool } from "../altimate/tools/warehouse-discover"
import { DbtRunTool } from "../altimate/tools/dbt-run"

import { DbtManifestTool } from "../altimate/tools/dbt-manifest"
import { DbtProfilesTool } from "../altimate/tools/dbt-profiles"
@dev-punia-altimate

✅ Tests — All Passed

TypeScript — passed

Python — passed

Tested at cb8efd79 | Run log | Powered by QA Autopilot

