diff --git a/README.md b/README.md index 2350a34874..7a9850dcb3 100644 --- a/README.md +++ b/README.md @@ -7,19 +7,18 @@ **The open-source data engineering harness.** -The intelligence layer for data engineering AI — 99+ deterministic tools for SQL analysis, +The intelligence layer for data engineering AI — 100+ deterministic tools for SQL analysis, column-level lineage, dbt, FinOps, and warehouse connectivity across every major cloud platform. Run standalone in your terminal, embed underneath Claude Code or Codex, or integrate into CI pipelines and orchestration DAGs. Precision data tooling for any LLM. -[![npm](https://img.shields.io/npm/v/@altimateai/altimate-code)](https://www.npmjs.com/package/@altimateai/altimate-code) -[![npm](https://img.shields.io/npm/v/@altimateai/altimate-core)](https://www.npmjs.com/package/@altimateai/altimate-core) -[![npm downloads](https://img.shields.io/npm/dm/@altimateai/altimate-code)](https://www.npmjs.com/package/@altimateai/altimate-code) +[![npm](https://img.shields.io/npm/v/altimate-code)](https://www.npmjs.com/package/altimate-code) +[![npm downloads](https://img.shields.io/npm/dm/altimate-code)](https://www.npmjs.com/package/altimate-code) [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](./LICENSE) [![CI](https://github.com/AltimateAI/altimate-code/actions/workflows/ci.yml/badge.svg)](https://github.com/AltimateAI/altimate-code/actions/workflows/ci.yml) [![Slack](https://img.shields.io/badge/Slack-Join%20Community-4A154B?logo=slack)](https://altimate.ai/slack) -[![Docs](https://img.shields.io/badge/docs-altimateai.github.io-blue)](https://altimateai.github.io/altimate-code) +[![Docs](https://img.shields.io/badge/docs-docs.altimate.sh-blue)](https://docs.altimate.sh) @@ -29,7 +28,7 @@ into CI pipelines and orchestration DAGs. Precision data tooling for any LLM. 
```bash # npm (recommended) -npm install -g @altimateai/altimate-code +npm install -g altimate-code # Homebrew brew install AltimateAI/tap/altimate-code @@ -58,7 +57,7 @@ altimate /discover `/discover` auto-detects dbt projects, warehouse connections (from `~/.dbt/profiles.yml`, Docker, environment variables), and installed tools (dbt, sqlfluff, airflow, dagster, and more). Skip this and start building — you can always run it later. -> **Zero Python setup required.** On first run, the CLI automatically downloads [`uv`](https://github.com/astral-sh/uv), creates an isolated Python environment, and installs the data engine with all warehouse drivers. No `pip install`, no virtualenv management. +> **Zero additional setup.** One-command install. ## Why a specialized harness? @@ -162,7 +161,7 @@ Each agent has scoped permissions and purpose-built tools for its role. ## Supported Warehouses -Snowflake · BigQuery · Databricks · PostgreSQL · Redshift · DuckDB · MySQL · SQL Server +Snowflake · BigQuery · Databricks · PostgreSQL · Redshift · DuckDB · MySQL · SQL Server · Oracle · SQLite First-class support with schema indexing, query execution, and metadata introspection. SSH tunneling available for secure connections. 
@@ -220,10 +219,11 @@ Contributions welcome — docs, SQL rules, warehouse connectors, and TUI improve **[Read CONTRIBUTING.md →](./CONTRIBUTING.md)** -## What's New +## Changelog +- **v0.4.2** (March 2026) — yolo mode, Python engine elimination (all-native TypeScript), tool consolidation, path sandboxing hardening, altimate-dbt CLI, unscoped npm package - **v0.4.1** (March 2026) — env-based skill selection, session caching, tracing improvements -- **v0.4.0** (Feb 2026) — data visualization skill, 99+ tools, training system +- **v0.4.0** (Feb 2026) — data visualization skill, 100+ tools, training system - **v0.3.x** — [See full changelog →](CHANGELOG.md) ## License diff --git a/docs/docs/assets/css/extra.css b/docs/docs/assets/css/extra.css index 5c7c736fa8..4a62c26c0d 100644 --- a/docs/docs/assets/css/extra.css +++ b/docs/docs/assets/css/extra.css @@ -92,7 +92,7 @@ /* --- Feature cards --- */ .grid.cards > ul > li { border-radius: 8px; - padding: 0.8rem !important; + padding: 1rem !important; transition: box-shadow 0.2s ease, transform 0.2s ease; } @@ -112,15 +112,18 @@ /* --- Pill grid (LLM providers, warehouses) --- */ .pill-grid { - max-width: 600px; - margin: 0 auto; + width: 100%; + max-width: 640px; + margin: 1rem auto; + padding: 0 1rem; + box-sizing: border-box; } .pill-grid ul { display: flex; flex-wrap: wrap; justify-content: center; - gap: 0.45rem; + gap: 0.6rem 0.75rem; list-style: none; padding: 0; margin: 0; @@ -129,14 +132,14 @@ .pill-grid ul li { display: inline-flex; align-items: center; - gap: 0.3rem; - padding: 0.4rem 0.85rem; + gap: 0.4rem; + padding: 0.5rem 1.15rem; border-radius: 100px; font-size: 0.8rem; border: 1px solid var(--md-default-fg-color--lightest); color: var(--md-default-fg-color--light); white-space: nowrap; - margin: 0; + flex-shrink: 0; } .pill-grid ul li .twemoji { @@ -147,6 +150,30 @@ border-color: rgba(255, 255, 255, 0.12); } +/* Responsive pill sizing */ +@media (max-width: 768px) { + .pill-grid { + max-width: 100%; + 
padding: 0 0.75rem; + } + + .pill-grid ul { + gap: 0.5rem 0.6rem; + } + + .pill-grid ul li { + padding: 0.45rem 0.9rem; + font-size: 0.75rem; + } +} + +@media (max-width: 480px) { + .pill-grid ul li { + padding: 0.4rem 0.8rem; + font-size: 0.72rem; + } +} + /* --- Doc links footer --- */ .doc-links { text-align: center; diff --git a/docs/docs/configure/agents.md b/docs/docs/configure/agents.md index d111acb46e..2876080c96 100644 --- a/docs/docs/configure/agents.md +++ b/docs/docs/configure/agents.md @@ -4,26 +4,46 @@ Agents define different AI personas with specific models, prompts, permissions, ## Built-in Agents -### General Purpose +| Agent | Description | Access Level | +|-------|------------|-------------| +| `builder` | Create and modify dbt models, SQL pipelines, and data transformations | Full read/write. SQL mutations prompt for approval. | +| `analyst` | Explore data, run SELECT queries, inspect schemas, generate insights | Read-only (enforced). SQL writes denied. Safe bash commands auto-allowed. | +| `plan` | Plan before acting, restricted to planning files only | Minimal: no edits, no bash, no SQL | -| Agent | Description | -|-------|------------| -| `general` | Default general-purpose coding agent | -| `plan` | Planning agent — analyzes before acting | -| `build` | Build-focused agent — prioritizes code generation | -| `explore` | Read-only exploration agent | +### Builder -### Data Engineering +Full access mode. Can read/write files, run any bash command (with approval), execute SQL, and modify dbt models. SQL write operations (`INSERT`, `UPDATE`, `DELETE`, `CREATE`, etc.) prompt for user approval. Destructive SQL (`DROP DATABASE`, `DROP SCHEMA`, `TRUNCATE`) is hard-blocked. 
-| Agent | Description | Permissions | -|-------|------------|------------| -| `builder` | Create dbt models, SQL pipelines, transformations | Full read/write | -| `analyst` | Explore data, run SELECT queries, generate insights | Read-only (enforced) | -| `validator` | Data quality checks, schema validation, test coverage | Read + validate | -| `migrator` | Cross-warehouse SQL translation and migration | Read/write for migration | +### Analyst + +Truly read-only mode for safe data exploration: + +- **File access**: Read, grep, glob without prompts +- **SQL**: SELECT queries execute freely. Write queries are denied (not prompted, blocked entirely) +- **Bash**: Safe commands auto-allowed (`ls`, `grep`, `cat`, `head`, `tail`, `find`, `wc`). dbt read commands allowed (`dbt list`, `dbt ls`, `dbt debug`, `dbt deps`). Everything else denied. +- **Web**: Fetch and search allowed without prompts +- **Schema/warehouse/finops**: All inspection tools available !!! tip - Use the `analyst` agent when exploring data to ensure no accidental writes. Switch to `builder` when you are ready to create or modify models. + Use `analyst` when exploring data to ensure no accidental writes. Switch to `builder` when you're ready to create or modify models. + +### Plan + +Planning mode with minimal permissions. Can only read files and edit plan files. No SQL, no bash, no file modifications. 
+ +## SQL Write Access Control + +All SQL queries are classified before execution: + +| Query Type | Builder | Analyst | +|-----------|---------|---------| +| `SELECT`, `SHOW`, `DESCRIBE`, `EXPLAIN` | Allowed | Allowed | +| `INSERT`, `UPDATE`, `DELETE`, `CREATE`, `ALTER` | Prompts for approval | Denied | +| `DROP DATABASE`, `DROP SCHEMA`, `TRUNCATE` | Blocked (cannot override) | Blocked | + +The classifier detects write operations including: `INSERT`, `UPDATE`, `DELETE`, `MERGE`, `CREATE`, `DROP`, `ALTER`, `TRUNCATE`, `GRANT`, `REVOKE`, `COPY INTO`, `CALL`, `EXEC`, `EXECUTE IMMEDIATE`, `BEGIN`, `DECLARE`, `REPLACE`, `UPSERT`, `RENAME`. + +Multi-statement queries (`SELECT 1; INSERT INTO ...`) are classified as write if any statement is a write. ## Custom Agents @@ -86,11 +106,11 @@ You are a Snowflake cost optimization expert. For every query: ``` !!! info - Markdown agent files use YAML frontmatter for configuration and the body as the system prompt. This is a convenient way to define agents without editing your main config file. + Markdown agent files use YAML frontmatter for configuration and the body as the system prompt. 
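The multi-statement rule in the SQL Write Access Control section above can be sketched in a few lines. This is a hypothetical illustration, not the CLI's actual implementation; the real classifier also detects multi-word forms such as `COPY INTO` and `EXECUTE IMMEDIATE`, which this sketch omits:

```typescript
// Hypothetical sketch of SQL write classification (not the actual altimate implementation).
// A multi-statement query is classified as a write if ANY statement starts with a write keyword.
const WRITE_KEYWORDS = new Set([
  "INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "DROP", "ALTER", "TRUNCATE",
  "GRANT", "REVOKE", "CALL", "EXEC", "BEGIN", "DECLARE", "REPLACE", "UPSERT", "RENAME",
]);

function isWriteQuery(sql: string): boolean {
  return sql
    .split(";")                                  // naive statement split
    .map((stmt) => stmt.trim())
    .filter((stmt) => stmt.length > 0)
    .some((stmt) => WRITE_KEYWORDS.has(stmt.split(/\s+/)[0].toUpperCase()));
}

console.log(isWriteQuery("SELECT 1; INSERT INTO orders VALUES (1)")); // true (second statement writes)
console.log(isWriteQuery("SELECT * FROM orders"));                    // false (read-only)
```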
## Agent Permissions -Each agent can have its own permission overrides that restrict or expand the default permissions: +Each agent can have its own permission overrides: ```json { @@ -99,10 +119,11 @@ Each agent can have its own permission overrides that restrict or expand the def "permission": { "write": "deny", "edit": "deny", + "sql_execute_write": "deny", "bash": { - "dbt show *": "allow", + "*": "deny", "dbt list *": "allow", - "*": "deny" + "ls *": "allow" } } } @@ -117,4 +138,4 @@ Each agent can have its own permission overrides that restrict or expand the def - **TUI**: Press leader + `a` or use `/agent ` - **CLI**: `altimate --agent analyst` -- **In conversation**: Type `/agent validator` +- **In conversation**: Type `/agent analyst` diff --git a/docs/docs/configure/commands.md b/docs/docs/configure/commands.md index 02e8d07d26..d09d6a3e91 100644 --- a/docs/docs/configure/commands.md +++ b/docs/docs/configure/commands.md @@ -8,7 +8,7 @@ altimate ships with six built-in slash commands: |---------|-------------| | `/init` | Create or update an AGENTS.md file with build commands and code style guidelines. | | `/discover` | Scan your data stack and set up warehouse connections. Detects dbt projects, warehouse connections from profiles/Docker/env vars, installed tools, and config files. Walks you through adding and testing new connections, then indexes schemas. | -| `/review` | Review changes — accepts `commit`, `branch`, or `pr` as an argument (defaults to uncommitted changes). | +| `/review` | Review changes. Accepts `commit`, `branch`, or `pr` as an argument (defaults to uncommitted changes). | | `/feedback` | Submit product feedback as a GitHub issue. Guides you through title, category, description, and optional session context. | | `/configure-claude` | Configure altimate as a `/altimate` slash command in [Claude Code](https://claude.com/claude-code). Writes `~/.claude/commands/altimate.md` so you can invoke altimate from within Claude Code sessions. 
| | `/configure-codex` | Configure altimate as a skill in [Codex CLI](https://developers.openai.com/codex). Creates `~/.codex/skills/altimate/SKILL.md` so Codex can delegate data engineering tasks to altimate. | @@ -37,10 +37,10 @@ The recommended way to set up a new data engineering project. Run `/discover` in Submit product feedback directly from the CLI. The agent walks you through: -1. **Title** — a short summary of your feedback -2. **Category** — bug, feature, improvement, or ux -3. **Description** — detailed explanation -4. **Session context** (opt-in) — includes working directory name and session ID for debugging +1. **Title**: a short summary of your feedback +2. **Category**: bug, feature, improvement, or ux +3. **Description**: a detailed explanation +4. **Session context** (opt-in): includes working directory name and session ID for debugging ``` /feedback # start the guided feedback flow @@ -137,7 +137,7 @@ Commands are loaded from: Press leader + `/` to see all available commands. 
-## External CLI integration +## External CLI Integration The `/configure-claude` and `/configure-codex` commands write integration files to external CLI tools: diff --git a/docs/docs/configure/config.md b/docs/docs/configure/config.md index a66b8a6633..8c2ed93f7e 100644 --- a/docs/docs/configure/config.md +++ b/docs/docs/configure/config.md @@ -57,7 +57,7 @@ Configuration is loaded from multiple sources, with later sources overriding ear | `skills` | `object` | Skill paths and URLs | | `plugin` | `string[]` | Plugin specifiers | | `instructions` | `string[]` | Glob patterns for instruction files | -| `telemetry` | `object` | Telemetry settings (see [Telemetry](telemetry.md)) | +| `telemetry` | `object` | Telemetry settings (see [Telemetry](../reference/telemetry.md)) | | `compaction` | `object` | Context compaction settings (see [Context Management](context-management.md)) | | `experimental` | `object` | Experimental feature flags | @@ -149,7 +149,7 @@ Control how context is managed when conversations grow long: |-------|---------|-------------| | `auto` | `true` | Auto-compact when context is full | | `prune` | `true` | Prune old tool outputs | -| `reserved` | — | Token buffer to reserve | +| `reserved` | (none) | Token buffer to reserve | !!! info Compaction automatically summarizes older messages to free up context window space, allowing longer conversations without losing important context. See [Context Management](context-management.md) for full details. diff --git a/docs/docs/configure/context-management.md b/docs/docs/configure/context-management.md index 805da651eb..19cb216ab5 100644 --- a/docs/docs/configure/context-management.md +++ b/docs/docs/configure/context-management.md @@ -1,14 +1,14 @@ # Context Management -altimate automatically manages conversation context so you can work through long sessions without hitting model limits. 
When a conversation grows large, the CLI summarizes older messages, prunes stale tool outputs, and recovers from provider overflow errors — all without losing the important details of your work. +altimate automatically manages conversation context so you can work through long sessions without hitting model limits. When a conversation grows large, the CLI summarizes older messages, prunes stale tool outputs, and recovers from provider overflow errors, all without losing the important details of your work. ## How It Works Every LLM has a finite context window. As you work, each message, tool call, and tool result adds tokens to the conversation. When the conversation approaches the model's limit, altimate takes action: -1. **Prune** — Old tool outputs (file reads, command results, query results) are replaced with compact summaries -2. **Compact** — The entire conversation history is summarized into a continuation prompt -3. **Continue** — The agent picks up where it left off using the summary +1. **Prune.** Old tool outputs (file reads, command results, query results) are replaced with compact summaries +2. **Compact.** The entire conversation history is summarized into a continuation prompt +3. **Continue.** The agent picks up where it left off using the summary This happens automatically by default. You do not need to manually manage context. @@ -38,7 +38,7 @@ When a tool output is pruned, it is replaced with a brief fingerprint: [Tool output cleared — read_file(file: src/main.ts) returned 42 lines, 1.2 KB — "import { App } from './app'"] ``` -This tells the model what tool was called, what arguments were used, how much output it produced, and the first line of the result — enough to maintain continuity without consuming tokens. +This tells the model what tool was called, what arguments were used, how much output it produced, and the first line of the result. That is enough to maintain continuity without consuming tokens. 
**Pruning rules:** @@ -51,12 +51,12 @@ This tells the model what tool was called, what arguments were used, how much ou Compaction is aware of data engineering workflows. When summarizing a conversation, the compaction prompt preserves: -- **Warehouse connections** — which databases or warehouses are connected -- **Schema context** — discovered tables, columns, and relationships -- **dbt project state** — models, sources, tests, and project structure -- **Lineage findings** — upstream and downstream dependencies -- **Query patterns** — SQL dialects, anti-patterns, and optimization opportunities -- **FinOps context** — cost findings and warehouse sizing recommendations +- **Warehouse connections**, including which databases or warehouses are connected +- **Schema context**, including discovered tables, columns, and relationships +- **dbt project state**, including models, sources, tests, and project structure +- **Lineage findings**, including upstream and downstream dependencies +- **Query patterns**, including SQL dialects, anti-patterns, and optimization opportunities +- **FinOps context**, including cost findings and warehouse sizing recommendations This means you can run a long data exploration session and compaction will not lose track of what schemas you discovered, what dbt models you were working with, or what cost optimizations you identified. diff --git a/docs/docs/configure/governance.md b/docs/docs/configure/governance.md new file mode 100644 index 0000000000..b04a30c44b --- /dev/null +++ b/docs/docs/configure/governance.md @@ -0,0 +1,39 @@ +# Governance + +Most people think of governance as a cost, something you bolt on for compliance. In practice, governance makes agents produce **better results**, not just safer ones. + +LLMs have built-in randomization. Give them too much freedom and they explore dead ends, burn tokens, and produce inconsistent output. 
Constrain the solution space and they get to correct results faster, in fewer tokens, with more consistency. + +Task-scoped permissions aren't just about safety. They're about **focus**. When an Analyst agent knows it can only `SELECT`, it doesn't waste cycles considering whether to `CREATE` a temp table. When it has prescribed, deterministic tools for tracing lineage instead of trying to figure it out from scratch, the results are the same every time. + +There's an audit angle too. In regulated industries, prescribed tooling eliminates unnecessary audit cycles. When your tools generate SQL the same way every time, auditors can verify consistency. Change the SQL, even if the results are conceptually identical, and you trigger an investigation to prove equivalence. Deterministic tooling removes that overhead entirely. + +Altimate Code enforces governance at the **harness level**, not via prompt instructions the model can ignore. Four mechanisms work together: + +## Rules + +Project rules via `AGENTS.md` files guide agent behavior: coding conventions, naming standards, warehouse policies, and workflow instructions. Rules are loaded automatically from well-known file patterns and merged into the agent's system prompt. Place them at your project root, in subdirectories for scoped guidance, or host them remotely for organization-wide standards. + +[:octicons-arrow-right-24: Rules reference](rules.md) + +## Permissions + +Every tool has a permission level (`allow`, `ask`, or `deny`), configurable globally or per agent. The Analyst agent can't `INSERT`, `UPDATE`, `DELETE`, or `DROP`. That's not a prompt instruction the model can choose to ignore. It's enforced at the tool level. Pattern-based permissions give you fine-grained control: allow `dbt build *` but deny `rm -rf *`. + +[:octicons-arrow-right-24: Permissions reference](permissions.md) + +## Context Management + +Long sessions produce large conversation histories that can exceed model context windows. 
Altimate Code automatically prunes old tool outputs, compacts conversations into summaries, and recovers from provider overflow errors, all while preserving critical data engineering context like warehouse connections, schema discoveries, lineage findings, and cost analysis results. + +[:octicons-arrow-right-24: Context Management reference](context-management.md) + +## Formatters + +Every file edit is auto-formatted before it's written. This isn't optional consistency; it's enforced consistency. Altimate Code detects file types and runs the appropriate formatter (prettier, ruff, gofmt, sqlfluff, and 20+ others) automatically. The agent can't produce code that violates your formatting standards. + +[:octicons-arrow-right-24: Formatters reference](formatters.md) + +--- + +Together, these four mechanisms mean governance is not an afterthought; it's built into every agent interaction. The harness enforces the rules so your team doesn't have to police the output. diff --git a/docs/docs/configure/index.md b/docs/docs/configure/index.md new file mode 100644 index 0000000000..d2df2d3ed8 --- /dev/null +++ b/docs/docs/configure/index.md @@ -0,0 +1,57 @@ +# Configure + +Set up your warehouses, LLM providers, and preferences. For agents, tools, skills, and commands, see the [Use](../data-engineering/agent-modes.md) section. For rules, permissions, and context management, see [Governance](rules.md). + +## What's in this section + 
+ +- :material-cog:{ .lg .middle } **Config File Reference** + + --- + + JSON configuration file locations, schema, value substitution, and project structure. + + [:octicons-arrow-right-24: Config File](config.md) + +- :material-database:{ .lg .middle } **Warehouses** + + --- + + Connect to Snowflake, BigQuery, Databricks, PostgreSQL, Redshift, DuckDB, MySQL, SQL Server, Oracle, and SQLite. Includes key-pair auth, IAM, ADC, and SSH tunneling. + + [:octicons-arrow-right-24: Warehouses](warehouses.md) + +- :material-cloud-outline:{ .lg .middle } **LLMs** + + --- + + Connect to 35+ LLM providers: Anthropic, OpenAI, Bedrock, Ollama, and more. Configure API keys and model selection. + + [:octicons-arrow-right-24: Providers](providers.md) · [:octicons-arrow-right-24: Models](models.md) + +- :material-puzzle:{ .lg .middle } **MCPs & ACPs** + + --- + + Extend Altimate Code with MCP servers (local and remote) and ACP-compatible editor integrations. + + [:octicons-arrow-right-24: MCP Servers](mcp-servers.md) · [:octicons-arrow-right-24: ACP Support](acp.md) + +- :material-palette:{ .lg .middle } **Appearance** + + --- + + Themes, keybinds, and visual customization for the TUI. + + [:octicons-arrow-right-24: Themes](themes.md) · [:octicons-arrow-right-24: Keybinds](keybinds.md) + +- :material-dots-horizontal:{ .lg .middle } **Additional Config** + + --- + + LSP servers, network/proxy settings, and Windows/WSL setup. + + [:octicons-arrow-right-24: LSP Servers](lsp.md) · [:octicons-arrow-right-24: Network](../reference/network.md) · [:octicons-arrow-right-24: Windows / WSL](../reference/windows-wsl.md) + 
diff --git a/docs/docs/configure/keybinds.md b/docs/docs/configure/keybinds.md index 9ce0310028..281986e765 100644 --- a/docs/docs/configure/keybinds.md +++ b/docs/docs/configure/keybinds.md @@ -74,6 +74,12 @@ Override it in your config: | `Ctrl+Z` | Undo | | `Ctrl+Shift+Z` | Redo | +### Prompt + +| Keybind | Action | +|---------|--------| +| Leader + `i` | Enhance prompt (AI-powered rewrite for clarity) | + ### Other | Keybind | Action | diff --git a/docs/docs/configure/permissions.md b/docs/docs/configure/permissions.md index 93d8407821..3b4e7e7557 100644 --- a/docs/docs/configure/permissions.md +++ b/docs/docs/configure/permissions.md @@ -49,7 +49,7 @@ For tools that accept arguments (like `bash`), use pattern matching: } ``` -Patterns are matched in order — **last matching rule wins**. Use `*` as a wildcard. Place your catch-all `"*"` rule first and more specific rules after it. +Patterns are matched in order, and the **last matching rule wins**. Use `*` as a wildcard. Place your catch-all `"*"` rule first and more specific rules after it. For example, with `"*": "ask"` first and `"rm *": "deny"` after it, all `rm` commands are denied while everything else prompts. If you put `"*": "ask"` last, it would override the deny rule. @@ -86,6 +86,7 @@ Override permissions for specific agents: | `grep` | Yes | Search files | | `list` | Yes | List directories | | `bash` | Yes | Shell commands | +| `sql_execute_write` | Yes | SQL write operations (INSERT, UPDATE, DELETE, etc.) | | `task` | Yes | Spawn subagents | | `lsp` | Yes | LSP operations | | `skill` | Yes | Execute skills | @@ -125,7 +126,7 @@ export ALTIMATE_CLI_YOLO=true altimate-code run "analyze my queries" ``` -The fallback `OPENCODE_YOLO` env var is also supported. When both are set, `ALTIMATE_CLI_YOLO` takes precedence — setting it to `false` disables yolo even if `OPENCODE_YOLO=true`. +The fallback `OPENCODE_YOLO` env var is also supported. When both are set, `ALTIMATE_CLI_YOLO` takes precedence. 
Setting it to `false` disables yolo even if `OPENCODE_YOLO=true`. **Safety:** Explicit `deny` rules in your config are still enforced. Deny rules throw an error *before* any permission prompt is created, so yolo mode never sees them. If you've denied `rm *` or `DROP *`, those remain blocked even with `--yolo`. @@ -133,7 +134,7 @@ When yolo mode is active in the TUI, a `△ YOLO` indicator appears in the foote ## Recommended Configurations -### Data Engineering (Default — Balanced) +### Data Engineering (Default, Balanced) A good starting point for most data engineering workflows. Allows safe read operations, prompts for writes and commands: @@ -205,20 +206,22 @@ Give each agent only the permissions it needs: "permission": { "write": "deny", "edit": "deny", + "sql_execute_write": "deny", "bash": { - "SELECT *": "allow", - "dbt docs *": "allow", - "*": "deny" + "*": "deny", + "ls *": "allow", + "cat *": "allow", + "dbt list *": "allow" } } }, "builder": { "permission": { + "sql_execute_write": "ask", "bash": { "*": "ask", "dbt *": "allow", - "git *": "ask", - "DROP *": "deny" + "rm -rf *": "deny" } } } @@ -230,11 +233,11 @@ Give each agent only the permissions it needs: When the agent wants to use a tool, the permission system evaluates your rules in order: -1. **Config rules** — from `altimate-code.json` -2. **Agent-level rules** — per-agent overrides -3. **Session approvals** — patterns you've approved with "Allow always" during the current session +1. **Config rules** from `altimate-code.json` +2. **Agent-level rules** for per-agent overrides +3. **Session approvals** for patterns you've approved with "Allow always" during the current session -If a rule matches, it applies. If no rule matches, the default is `"ask"` — you'll be prompted. +If a rule matches, it applies. If no rule matches, the default is `"ask"`, which means you'll be prompted. 
When prompted, you have three choices: @@ -251,4 +254,4 @@ When prompted, you have three choices: - **Start with `"ask"` and relax as you build confidence.** You can always approve patterns with "Allow always" during a session. - **Use `"deny"` for truly dangerous commands** like `rm *`, `DROP *`, `git push --force *`, and `git reset --hard *`. These are blocked even if other rules would allow them. - **Use per-agent permissions** to enforce least-privilege. An analyst doesn't need write access. A builder doesn't need `DROP`. -- **Review the prompt before approving.** The TUI shows you exactly what will run — including diffs for file edits and the full command for bash operations. +- **Review the prompt before approving.** The TUI shows you exactly what will run, including diffs for file edits and the full command for bash operations. diff --git a/docs/docs/configure/providers.md b/docs/docs/configure/providers.md index 0b73800508..a62e96d9e6 100644 --- a/docs/docs/configure/providers.md +++ b/docs/docs/configure/providers.md @@ -69,7 +69,7 @@ Available models: `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5-2025 Uses the standard AWS credential chain. Set `AWS_PROFILE` or provide credentials directly. !!! note - If you have AWS SSO or IAM roles configured, Bedrock will use your default credential chain automatically — no explicit keys needed. + If you have AWS SSO or IAM roles configured, Bedrock will use your default credential chain automatically, so no explicit keys are needed. ## Azure OpenAI @@ -143,7 +143,7 @@ If `location` is not set, it defaults to `us-central1`. } ``` -No API key needed — runs entirely on your local machine. +No API key needed. Runs entirely on your local machine. !!! info Make sure Ollama is running before starting altimate. Install it from [ollama.com](https://ollama.com) and pull your desired model with `ollama pull llama3.1`. 
diff --git a/docs/docs/configure/rules.md b/docs/docs/configure/rules.md index 892e916418..f2b5e6751e 100644 --- a/docs/docs/configure/rules.md +++ b/docs/docs/configure/rules.md @@ -6,9 +6,9 @@ Rules are instructions that guide agent behavior. They are loaded automatically altimate looks for instruction files in these locations: -- `AGENTS.md` — Primary instruction file (searched up directory tree) -- `CLAUDE.md` — Fallback instruction file -- `.altimate-code/AGENTS.md` — Project-specific instructions +- `AGENTS.md`: Primary instruction file (searched up directory tree) +- `CLAUDE.md`: Fallback instruction file +- `.altimate-code/AGENTS.md`: Project-specific instructions - Custom patterns via the `instructions` config field !!! tip @@ -31,9 +31,9 @@ Specify additional instruction sources in your config: Patterns support: -- **Glob patterns** — `*.md`, `docs/**/*.md` -- **URLs** — fetched at startup -- **Relative paths** — resolved from project root +- **Glob patterns** such as `*.md`, `docs/**/*.md` +- **URLs**, which are fetched at startup +- **Relative paths**, which are resolved from project root ## Writing Effective Rules @@ -57,7 +57,7 @@ This is a dbt project for our analytics warehouse on Snowflake. ``` !!! example "Tips for effective rules" - - Be specific and actionable — vague rules get ignored + - Be specific and actionable, since vague rules get ignored - Include project-specific terminology and conventions - Reference file paths and commands that agents should use - Keep rules concise; overly long instructions dilute focus diff --git a/docs/docs/configure/skills.md b/docs/docs/configure/skills.md index 66801fd2b1..6a807cce3f 100644 --- a/docs/docs/configure/skills.md +++ b/docs/docs/configure/skills.md @@ -67,9 +67,68 @@ Skills are loaded from these locations (in priority order): ## Built-in Data Engineering Skills -altimate ships with built-in skills for common data engineering tasks. 
Skills are loaded and surfaced dynamically at runtime — type `/` in the TUI to browse what's available and get autocomplete on skill names. +altimate ships with built-in skills for common data engineering tasks. Type `/` in the TUI to browse what's available and get autocomplete on skill names. + +| Skill | Description | +|-------|-------------| +| `/sql-review` | SQL quality gate that lints 26 anti-patterns, validates syntax, and checks safety | +| `/sql-translate` | Cross-dialect SQL translation | +| `/schema-migration` | Schema migration planning and execution | +| `/pii-audit` | PII detection and compliance audits | +| `/cost-report` | Snowflake FinOps analysis | +| `/lineage-diff` | Column-level lineage comparison | +| `/query-optimize` | Query optimization suggestions | +| `/data-viz` | Interactive data visualization and dashboards | +| `/dbt-develop` | dbt model development and scaffolding | +| `/dbt-test` | dbt test generation | +| `/dbt-docs` | dbt documentation generation | +| `/dbt-analyze` | dbt project analysis | +| `/dbt-troubleshoot` | dbt issue diagnosis | +| `/teach` | Teach patterns from example files | +| `/train` | Learn standards from documents/style guides | +| `/training-status` | Dashboard of all learned knowledge | + +## Adding Custom Skills + +Add your own skills as Markdown files in `.altimate-code/skill/`: -For custom skills, see [Adding Custom Skills](#adding-custom-skills) below. +```markdown +--- +name: cost-review +description: Review SQL queries for cost optimization +--- + +Analyze the SQL query for cost optimization opportunities. +Focus on: $ARGUMENTS +``` + +`$ARGUMENTS` is replaced with whatever the user types after the skill name (e.g., `/cost-review SELECT * FROM orders` passes `SELECT * FROM orders`). + +Skills are loaded from these paths (highest priority first): + +1. `.altimate-code/skill/` (project) +2. `~/.altimate-code/skills/` (global) +3. 
Custom paths via config: + +```json +{ + "skills": { + "paths": ["./my-skills", "~/shared-skills"] + } +} +``` + +### Remote Skills + +Host skills at a URL and load them at startup: + +```json +{ + "skills": { + "urls": ["https://example.com/skills-registry.json"] + } +} +``` ## Disabling External Skills diff --git a/docs/docs/configure/tools.md b/docs/docs/configure/tools.md index 9069b54c21..1149312866 100644 --- a/docs/docs/configure/tools.md +++ b/docs/docs/configure/tools.md @@ -24,7 +24,7 @@ altimate includes built-in tools that agents use to interact with your codebase ## Data Engineering Tools -In addition to built-in tools, altimate provides 55+ specialized data engineering tools. See the [Data Engineering Tools](../data-engineering/tools/index.md) section for details. +In addition to built-in tools, altimate provides 100+ specialized data engineering tools. See the [Data Engineering Tools](../data-engineering/tools/index.md) section for details. ## Tool Permissions @@ -98,9 +98,9 @@ The `bash` tool executes shell commands in the project directory. Commands run i File tools respect the project boundaries and permission settings: -- **`read`** — Reads file contents, supports line ranges -- **`write`** — Creates or overwrites entire files -- **`edit`** — Surgical find-and-replace edits within files +- **`read`** reads file contents and supports line ranges +- **`write`** creates or overwrites entire files +- **`edit`** performs surgical find-and-replace edits within files ### LSP Tool diff --git a/docs/docs/configure/tools/config.md b/docs/docs/configure/tools/config.md new file mode 100644 index 0000000000..636a229ff8 --- /dev/null +++ b/docs/docs/configure/tools/config.md @@ -0,0 +1,54 @@ +# Tools + +altimate includes built-in tools that agents use to interact with your codebase and environment. 
+ +## Built-in Tools + +| Tool | Description | +|------|------------| +| `bash` | Execute shell commands | +| `read` | Read file contents | +| `edit` | Edit files with find-and-replace | +| `write` | Create or overwrite files | +| `glob` | Find files by pattern | +| `grep` | Search file contents with regex | +| `list` | List directory contents | +| `patch` | Apply multi-file patches | +| `lsp` | Language server operations (diagnostics, completions) | +| `webfetch` | Fetch and process web pages | +| `websearch` | Search the web | +| `question` | Ask the user a question | +| `todo_read` | Read task list | +| `todo_write` | Create/update tasks | +| `skill` | Execute a skill | + +## Data Engineering Tools + +In addition to built-in tools, altimate provides 100+ specialized data engineering tools. See the [Data Engineering Tools](index.md) section for details. + +## Tool Permissions + +Control which tools agents can use via the [permission system](../permissions.md). For full details, pattern-based rules, and recommended configurations, see the [Permissions reference](../permissions.md). + +## Tool Behavior + +### Bash Tool + +The `bash` tool executes shell commands in the project directory. Commands run in a non-interactive shell with the user's environment. 
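+The commands agents can run may be further narrowed through the permission system. A sketch of what that could look like in a project config (the keys and glob patterns below are illustrative assumptions; the [Permissions reference](../permissions.md) documents the real schema):
+
+```json
+{
+  "permission": {
+    "bash": {
+      "dbt *": "allow",
+      "git push*": "ask",
+      "rm -rf *": "deny"
+    }
+  }
+}
+```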
+
+### Read / Write / Edit Tools
+
+File tools respect the project boundaries and permission settings:
+
+- **`read`** reads file contents and supports line ranges
+- **`write`** creates or overwrites entire files
+- **`edit`** performs surgical find-and-replace edits within files
+
+### LSP Tool
+
+When [LSP servers](../lsp.md) are configured, the `lsp` tool provides:
+
+- Diagnostics (errors, warnings)
+- Go-to-definition
+- Hover information
+- Completions
diff --git a/docs/docs/configure/tools/core-tools.md b/docs/docs/configure/tools/core-tools.md
new file mode 100644
index 0000000000..ef0fc3716b
--- /dev/null
+++ b/docs/docs/configure/tools/core-tools.md
@@ -0,0 +1,281 @@
+# Core Tools
+
+The `altimate_core_*` tools are powered by a Rust-based SQL engine that provides fast, deterministic analysis without LLM calls. These tools handle validation, linting, safety scanning, lineage, formatting, and more.
+
+## Analysis & Validation
+
+### altimate_core_check
+
+Run the full analysis pipeline (validate + lint + safety scan + PII check) in a single call.
+
+**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+### altimate_core_validate
+
+Validate SQL syntax and schema references. Checks if tables and columns exist in the schema and if SQL is valid for the target dialect.
+
+**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+### altimate_core_lint
+
+Lint SQL for anti-patterns such as NULL comparisons, implicit casts, unused CTEs, and dialect-specific problems.
+
+**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+### altimate_core_grade
+
+Grade SQL quality on an A–F scale. Evaluates readability, performance, correctness, and best practices.
+
+**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+## Safety & Governance
+
+### altimate_core_safety
+
+Scan SQL for injection patterns, dangerous statements (DROP, TRUNCATE), and security threats.
+
+**Parameters:** `sql` (required)
+
+---
+
+### altimate_core_is_safe
+
+Quick boolean safety check that returns true/false indicating whether SQL is safe to execute.
+
+**Parameters:** `sql` (required)
+
+---
+
+### altimate_core_policy
+
+Check SQL against YAML-based governance policy guardrails. Validates compliance with custom rules like allowed tables, forbidden operations, and data access restrictions.
+
+**Parameters:** `sql` (required), `policy_json` (required), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+### altimate_core_classify_pii
+
+Classify PII columns in a schema by name patterns and data types. Identifies columns likely containing personally identifiable information.
+
+**Parameters:** `schema_path` (optional), `schema_context` (optional)
+
+---
+
+### altimate_core_query_pii
+
+Analyze query-level PII exposure. Checks if a SQL query accesses columns classified as PII and reports the exposure risk.
+
+**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+## SQL Transformation
+
+### altimate_core_fix
+
+Auto-fix SQL errors using fuzzy matching and iterative re-validation to correct syntax errors, typos, and schema reference issues.
+
+**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional), `max_iterations` (optional)
+
+---
+
+### altimate_core_correct
+
+Iteratively correct SQL using a propose-verify-refine loop. More thorough than `fix`; it applies multiple correction rounds.
+
+**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+### altimate_core_format
+
+Format SQL with dialect-aware keyword casing and indentation. Fast and deterministic.
+
+**Parameters:** `sql` (required), `dialect` (optional)
+
+---
+
+### altimate_core_rewrite
+
+Suggest query optimization rewrites. Analyzes SQL and proposes concrete rewrites for better performance.
+
+**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+### altimate_core_transpile
+
+Translate SQL between dialects using the Rust engine.
+
+**Parameters:** `sql` (required), `source_dialect` (required), `target_dialect` (required)
+
+---
+
+## Comparison & Equivalence
+
+### altimate_core_compare
+
+Structurally compare two SQL queries. Identifies differences in table references, join conditions, filters, projections, and aggregations.
+
+**Parameters:** `left_sql` (required), `right_sql` (required), `dialect` (optional)
+
+---
+
+### altimate_core_equivalence
+
+Check semantic equivalence of two SQL queries, determining whether they produce the same result set regardless of syntactic differences.
+
+**Parameters:** `sql1` (required), `sql2` (required), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+## Lineage & Metadata
+
+### altimate_core_column_lineage
+
+Trace schema-aware column lineage. Maps how columns flow through a query from source tables to output.
+
+**Parameters:** `sql` (required), `dialect` (optional), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+### altimate_core_track_lineage
+
+Track lineage across multiple SQL statements.
+
+**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional)
+
+---
+
+### altimate_core_extract_metadata
+
+Extract metadata from SQL, identifying tables, columns, functions, CTEs, and other structural elements referenced in a query.
+
+**Parameters:** `sql` (required), `dialect` (optional)
+
+---
+
+### altimate_core_resolve_term
+
+Resolve a business glossary term to schema elements using fuzzy matching. Maps human-readable terms like "revenue" or "customer" to actual table/column names.
+ +**Parameters:** `term` (required), `schema_path` (optional), `schema_context` (optional) + +--- + +### altimate_core_semantics + +Analyze semantic meaning of SQL elements. + +**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional) + +--- + +## Schema Operations + +### altimate_core_schema_diff + +Diff two schema versions to detect structural changes. + +**Parameters:** `old_ddl` (required), `new_ddl` (required), `dialect` (optional) + +--- + +### altimate_core_migration + +Analyze DDL migration safety. Detects potential data loss, type narrowing, missing defaults, and other risks in schema migration statements. + +**Parameters:** `old_ddl` (required), `new_ddl` (required), `dialect` (optional) + +--- + +### altimate_core_export_ddl + +Export a YAML/JSON schema as CREATE TABLE DDL statements. + +**Parameters:** `schema_path` (optional), `schema_context` (optional) + +--- + +### altimate_core_import_ddl + +Convert CREATE TABLE DDL into a structured YAML schema definition that other core tools can consume. + +**Parameters:** `ddl` (required), `dialect` (optional) + +--- + +### altimate_core_fingerprint + +Compute a SHA-256 fingerprint of a schema. Useful for cache invalidation and change detection. + +**Parameters:** `schema_path` (optional), `schema_context` (optional) + +--- + +### altimate_core_introspection_sql + +Generate INFORMATION_SCHEMA introspection queries for a given database type. Supports postgres, bigquery, snowflake, mysql, mssql, redshift. + +**Parameters:** `db_type` (required), `database` (required), `schema_name` (optional) + +--- + +## Context Optimization + +### altimate_core_optimize_context + +Optimize schema for LLM context window. Applies 5-level progressive disclosure to reduce schema size while preserving essential information. 
+ +**Parameters:** `schema_path` (optional), `schema_context` (optional) + +--- + +### altimate_core_optimize_for_query + +Prune schema to only tables and columns relevant to a specific query. Reduces context size for LLM prompts. + +**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional) + +--- + +### altimate_core_prune_schema + +Filter schema to only tables and columns referenced by a SQL query. + +**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional) + +--- + +## dbt & Autocomplete + +### altimate_core_parse_dbt + +Parse a dbt project directory. Extracts models, sources, tests, and project structure for analysis. + +**Parameters:** `project_dir` (required) + +--- + +### altimate_core_complete + +Get cursor-aware SQL completion suggestions. Returns table names, column names, functions, and keywords relevant to the cursor position. + +**Parameters:** `sql` (required), `cursor_pos` (required), `schema_path` (optional), `schema_context` (optional) + +--- + +### altimate_core_testgen + +Generate test cases for SQL queries. + +**Parameters:** `sql` (required), `schema_path` (optional), `schema_context` (optional) diff --git a/docs/docs/configure/tools/custom.md b/docs/docs/configure/tools/custom.md new file mode 100644 index 0000000000..18f121070c --- /dev/null +++ b/docs/docs/configure/tools/custom.md @@ -0,0 +1,94 @@ +# Custom Tools + +Create custom tools using TypeScript and the altimate plugin system. + +## Quick Start + +1. Create a tools directory: + +```bash +mkdir -p .altimate-code/tools +``` + +2. 
Create a tool file: + +```typescript +// .altimate-code/tools/my-tool.ts +import { defineTool } from "@altimateai/altimate-code-plugin/tool" +import { z } from "zod" + +export default defineTool({ + name: "my_custom_tool", + description: "Does something useful", + parameters: z.object({ + input: z.string().describe("The input to process"), + }), + async execute({ input }) { + // Your tool logic here + return { result: `Processed: ${input}` } + }, +}) +``` + +## Plugin Package + +For more complex tools, create a plugin package: + +```bash +npm init +npm install @altimateai/altimate-code-plugin zod +``` + +```typescript +// index.ts +import { definePlugin } from "@altimateai/altimate-code-plugin" +import { z } from "zod" + +export default definePlugin({ + name: "my-plugin", + tools: [ + { + name: "analyze_costs", + description: "Analyze warehouse costs", + parameters: z.object({ + warehouse: z.string(), + days: z.number().default(30), + }), + async execute({ warehouse, days }) { + // Implementation + return { costs: [] } + }, + }, + ], +}) +``` + +## Registering Plugins + +Add plugins to your config: + +```json +{ + "plugin": [ + "@altimateai/altimate-code-plugin-example", + "./my-local-plugin" + ] +} +``` + +## Plugin Hooks + +Plugins can hook into 30+ lifecycle events: + +- `onSessionStart` / `onSessionEnd` +- `onMessage` / `onResponse` +- `onToolCall` / `onToolResult` +- `onFileEdit` / `onFileWrite` +- `onError` +- And more... + +## Disabling Default Plugins + +```bash +export ALTIMATE_CLI_DISABLE_DEFAULT_PLUGINS=true +``` diff --git a/docs/docs/configure/tools/index.md b/docs/docs/configure/tools/index.md new file mode 100644 index 0000000000..7c1a387606 --- /dev/null +++ b/docs/docs/configure/tools/index.md @@ -0,0 +1,17 @@ +# Tools Reference + +Altimate Code has 100+ specialized tools organized by function. 
+
+| Category | Tools | Purpose |
+|---|---|---|
+| [Built-in Tools](config.md) | 15 tools | File operations, search, shell, subagents, and other core agent tools |
+| [Core Tools](core-tools.md) | 28 tools | Rust-based SQL engine: validation, linting, safety, lineage, formatting, PII, governance |
+| [SQL Tools](../../data-engineering/tools/sql-tools.md) | 10 tools | Analysis, optimization, translation, formatting, cost prediction |
+| [Schema Tools](../../data-engineering/tools/schema-tools.md) | 7 tools | Inspection, search, PII detection, tagging, diffing |
+| [FinOps Tools](../../data-engineering/tools/finops-tools.md) | 8 tools | Cost analysis, warehouse sizing, unused resources, RBAC |
+| [Lineage Tools](../../data-engineering/tools/lineage-tools.md) | 1 tool | Column-level lineage tracing with confidence scoring |
+| [dbt Tools](../../data-engineering/tools/dbt-tools.md) | 4 tools + 11 skills | Run, manifest, lineage, profiles, test generation, scaffolding |
+| [Warehouse Tools](../../data-engineering/tools/warehouse-tools.md) | 6 tools | Environment scanning, connection management, discovery, testing |
+| [Custom Tools](custom.md) | — | Build your own tools with TypeScript plugins |
+
+All tools are available in the interactive TUI. The agent automatically selects the right tools based on your request.
diff --git a/docs/docs/configure/tracing.md b/docs/docs/configure/tracing.md
index 2b09eb0969..f23b914bf6 100644
--- a/docs/docs/configure/tracing.md
+++ b/docs/docs/configure/tracing.md
@@ -1,13 +1,13 @@
 # Tracing
 
-Altimate Code captures detailed traces of every headless session — LLM generations, tool calls, token usage, cost, and timing — and saves them locally as JSON files. Traces are invaluable for debugging agent behavior, optimizing cost, and understanding how the agent solves problems.
+Altimate Code captures detailed traces of every headless session, including LLM generations, tool calls, token usage, cost, and timing, and saves them locally as JSON files. Traces are invaluable for debugging agent behavior, optimizing cost, and understanding how the agent solves problems. Tracing is **enabled by default** and requires no configuration. Traces are stored locally and never leave your machine unless you configure a remote exporter. ## Quick Start ```bash -# Run a prompt — trace is saved automatically +# Run a prompt (trace is saved automatically) altimate-code run "optimize my most expensive queries" # → Trace saved: ~/.local/share/altimate-code/traces/abc123.json @@ -44,7 +44,7 @@ When using SQL and dbt tools, traces automatically capture domain-specific data: | **Data Quality** | Row counts, null percentages, freshness, anomaly detection | | **Cost Attribution** | LLM cost + warehouse compute cost + storage delta = total cost, per user/team/project | -These attributes are purely optional — traces are valid without them. They're populated automatically by tools that have access to warehouse metadata. +These attributes are purely optional. Traces are valid without them. They're populated automatically by tools that have access to warehouse metadata. ## Configuration @@ -115,9 +115,9 @@ altimate-code trace view Opens a local web server with an interactive trace viewer in your browser. The viewer shows: -- **Summary cards** — duration, token breakdown (input/output/reasoning/cache), cost, generations, tool calls, status -- **Timeline** — horizontal bars for each span, color-coded by type (generation, tool, error) -- **Detail panel** — click any span to see its model info, token counts, finish reason, input/output, and domain-specific attributes (warehouse metrics, dbt results, etc.) 
+- **Summary cards** showing duration, token breakdown (input/output/reasoning/cache), cost, generations, tool calls, status +- **Timeline** with horizontal bars for each span, color-coded by type (generation, tool, error) +- **Detail panel** where you click any span to see its model info, token counts, finish reason, input/output, and domain-specific attributes (warehouse metrics, dbt results, etc.) Options: @@ -126,11 +126,11 @@ Options: | `--port` | Port for the viewer server (default: random) | | `--live` | Auto-refresh every 2s for in-progress sessions | -Partial session ID matching is supported — `altimate-code trace view abc` matches `abc123def456`. +Partial session ID matching is supported. For example, `altimate-code trace view abc` matches `abc123def456`. ### Live Viewing (In-Progress Sessions) -Traces are written incrementally — after every tool call and generation, a snapshot is flushed to disk. This means you can view a trace while the session is still running: +Traces are written incrementally. After every tool call and generation, a snapshot is flushed to disk. This means you can view a trace while the session is still running: ```bash # In terminal 1: run a long task @@ -178,7 +178,7 @@ Traces can be sent to remote backends via HTTP POST. Each exporter receives the - A failing exporter never blocks local file storage or other exporters - If the server responds with `{ "url": "..." }`, the URL is displayed to the user - Exporters have a 10-second timeout -- All export operations are best-effort — they never crash the CLI +- All export operations are best-effort and never crash the CLI ## Trace File Format @@ -296,13 +296,13 @@ All domain-specific attributes use the `de.*` prefix and are stored in the `attr Traces are designed to survive process crashes: -1. **Immediate snapshot** — A trace file is written as soon as `startTrace()` is called, before any LLM interaction. Even if the process crashes immediately, a minimal trace file exists. +1. 
**Immediate snapshot.** A trace file is written as soon as `startTrace()` is called, before any LLM interaction. Even if the process crashes immediately, a minimal trace file exists. -2. **Incremental snapshots** — After every tool call and generation completion, the trace file is updated atomically (write to temp file, then rename). The file on disk always contains a valid, complete JSON document. +2. **Incremental snapshots.** After every tool call and generation completion, the trace file is updated atomically (write to temp file, then rename). The file on disk always contains a valid, complete JSON document. -3. **Crash handlers** — The `run` command registers `SIGINT`/`SIGTERM`/`beforeExit` handlers that flush the trace synchronously with a `"crashed"` status. +3. **Crash handlers.** The `run` command registers `SIGINT`/`SIGTERM`/`beforeExit` handlers that flush the trace synchronously with a `"crashed"` status. -4. **Status indicators** — Trace status tells you exactly what happened: +4. **Status indicators.** Trace status tells you exactly what happened: | Status | Meaning | |--------|---------| @@ -335,7 +335,7 @@ Traces are stored **locally only** by default. They contain: - Tool inputs and outputs (SQL queries, file contents, command results) - Model responses -If you configure remote exporters, trace data is sent to those endpoints. No trace data is included in the anonymous telemetry described in [Telemetry](telemetry.md). +If you configure remote exporters, trace data is sent to those endpoints. No trace data is included in the anonymous telemetry described in [Telemetry](../reference/telemetry.md). !!! warning "Sensitive Data" Traces may contain SQL queries, file paths, and command outputs from your session. If you share trace files or configure remote exporters, be aware that this data will be included. 
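Because a trace on disk is always a valid, complete JSON document, post-hoc analysis can be a plain script. A minimal sketch in Python (the `status` and `spans` field names are assumptions inferred from the viewer description above, not the documented schema):

```python
import json

# Illustrative stand-in for a saved trace; real files live under
# ~/.local/share/altimate-code/traces/<session-id>.json
sample = """
{
  "status": "completed",
  "spans": [
    {"type": "generation"},
    {"type": "tool"},
    {"type": "tool"}
  ]
}
"""

trace = json.loads(sample)
tool_spans = [s for s in trace["spans"] if s["type"] == "tool"]
print(trace["status"], len(tool_spans))  # completed 2
```

Swap `sample` for the contents of a real trace file once you have confirmed its schema against the Trace File Format section.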
diff --git a/docs/docs/configure/warehouses.md b/docs/docs/configure/warehouses.md new file mode 100644 index 0000000000..4185576ba7 --- /dev/null +++ b/docs/docs/configure/warehouses.md @@ -0,0 +1,350 @@ +# Warehouses + +Altimate Code connects to 8 warehouse types. Configure them in the `warehouses` section of your config file or in `.altimate-code/connections.json`. + +## Configuration + +Each warehouse has a key (the connection name) and a config object: + +```json +{ + "warehouses": { + "my-connection-name": { + "type": "", + ... + } + } +} +``` + +!!! tip + Use `{env:...}` substitution for passwords and tokens so you never commit secrets to version control. + +## Snowflake + +```json +{ + "warehouses": { + "prod-snowflake": { + "type": "snowflake", + "account": "xy12345.us-east-1", + "user": "analytics_user", + "password": "{env:SNOWFLAKE_PASSWORD}", + "warehouse": "COMPUTE_WH", + "database": "ANALYTICS", + "role": "ANALYST_ROLE" + } + } +} +``` + +| Field | Required | Description | +|-------|----------|-------------| +| `account` | Yes | Snowflake account identifier (e.g. 
`xy12345.us-east-1`) | +| `user` | Yes | Username | +| `password` | Auth | Password (use one auth method) | +| `private_key_path` | Auth | Path to private key file (alternative to password) | +| `private_key_passphrase` | No | Passphrase for encrypted private key | +| `warehouse` | No | Warehouse name | +| `database` | No | Database name | +| `schema` | No | Schema name | +| `role` | No | User role | + +### Key-pair authentication + +```json +{ + "warehouses": { + "prod-snowflake": { + "type": "snowflake", + "account": "xy12345.us-east-1", + "user": "svc_altimate", + "private_key_path": "~/.ssh/snowflake_rsa_key.p8", + "private_key_passphrase": "{env:SNOWFLAKE_KEY_PASSPHRASE}", + "warehouse": "COMPUTE_WH", + "database": "ANALYTICS", + "role": "TRANSFORM_ROLE" + } + } +} +``` + +## BigQuery + +```json +{ + "warehouses": { + "bigquery-prod": { + "type": "bigquery", + "project": "my-gcp-project", + "credentials_path": "/path/to/service-account.json", + "location": "US" + } + } +} +``` + +| Field | Required | Description | +|-------|----------|-------------| +| `project` | Yes | Google Cloud project ID | +| `credentials_path` | No | Path to service account JSON file. 
Omit to use Application Default Credentials (ADC) | +| `location` | No | Default location (default: `US`) | + +### Using Application Default Credentials + +If you're already authenticated via `gcloud`, omit `credentials_path`: + +```json +{ + "warehouses": { + "bigquery-prod": { + "type": "bigquery", + "project": "my-gcp-project" + } + } +} +``` + +## Databricks + +```json +{ + "warehouses": { + "databricks-prod": { + "type": "databricks", + "server_hostname": "adb-1234567890.1.azuredatabricks.net", + "http_path": "/sql/1.0/warehouses/abcdef1234567890", + "access_token": "{env:DATABRICKS_TOKEN}", + "catalog": "main", + "schema": "default" + } + } +} +``` + +| Field | Required | Description | +|-------|----------|-------------| +| `server_hostname` | Yes | Databricks workspace hostname | +| `http_path` | Yes | HTTP path from compute resources | +| `access_token` | Yes | Personal Access Token (PAT) | +| `catalog` | No | Unity Catalog name | +| `schema` | No | Schema/database name | + +## PostgreSQL + +```json +{ + "warehouses": { + "my-postgres": { + "type": "postgres", + "host": "localhost", + "port": 5432, + "database": "analytics", + "user": "analyst", + "password": "{env:PG_PASSWORD}" + } + } +} +``` + +| Field | Required | Description | +|-------|----------|-------------| +| `connection_string` | No | Full connection string (alternative to individual fields) | +| `host` | No | Hostname (default: `localhost`) | +| `port` | No | Port (default: `5432`) | +| `database` | No | Database name (default: `postgres`) | +| `user` | No | Username | +| `password` | No | Password | + +### Using a connection string + +```json +{ + "warehouses": { + "my-postgres": { + "type": "postgres", + "connection_string": "postgresql://analyst:secret@localhost:5432/analytics" + } + } +} +``` + +## Redshift + +```json +{ + "warehouses": { + "redshift-prod": { + "type": "redshift", + "host": "my-cluster.abc123.us-east-1.redshift.amazonaws.com", + "port": 5439, + "database": "analytics", + 
"user": "admin", + "password": "{env:REDSHIFT_PASSWORD}" + } + } +} +``` + +| Field | Required | Description | +|-------|----------|-------------| +| `connection_string` | No | Full connection string (alternative to individual fields) | +| `host` | No | Hostname | +| `port` | No | Port (default: `5439`) | +| `database` | No | Database name (default: `dev`) | +| `user` | No | Username | +| `password` | No | Password | +| `iam_role` | No | IAM role ARN (alternative to password) | +| `region` | No | AWS region (default: `us-east-1`) | +| `cluster_identifier` | No | Cluster identifier (required for IAM auth) | + +### IAM authentication + +```json +{ + "warehouses": { + "redshift-prod": { + "type": "redshift", + "host": "my-cluster.abc123.us-east-1.redshift.amazonaws.com", + "database": "analytics", + "user": "admin", + "iam_role": "arn:aws:iam::123456789012:role/RedshiftReadOnly", + "cluster_identifier": "my-cluster", + "region": "us-east-1" + } + } +} +``` + +## DuckDB + +```json +{ + "warehouses": { + "dev-duckdb": { + "type": "duckdb", + "path": "./dev.duckdb" + } + } +} +``` + +| Field | Required | Description | +|-------|----------|-------------| +| `path` | No | Database file path. 
Omit or use `":memory:"` for in-memory | + +## MySQL + +```json +{ + "warehouses": { + "mysql-prod": { + "type": "mysql", + "host": "localhost", + "port": 3306, + "database": "analytics", + "user": "analyst", + "password": "{env:MYSQL_PASSWORD}" + } + } +} +``` + +| Field | Required | Description | +|-------|----------|-------------| +| `host` | No | Hostname (default: `localhost`) | +| `port` | No | Port (default: `3306`) | +| `database` | No | Database name | +| `user` | No | Username | +| `password` | No | Password | +| `ssl_ca` | No | Path to CA certificate file | +| `ssl_cert` | No | Path to client certificate file | +| `ssl_key` | No | Path to client key file | + +## SQL Server + +```json +{ + "warehouses": { + "sqlserver-prod": { + "type": "sqlserver", + "host": "localhost", + "port": 1433, + "database": "analytics", + "user": "sa", + "password": "{env:MSSQL_PASSWORD}" + } + } +} +``` + +| Field | Required | Description | +|-------|----------|-------------| +| `host` | No | Hostname (default: `localhost`) | +| `port` | No | Port (default: `1433`) | +| `database` | No | Database name | +| `user` | No | Username | +| `password` | No | Password | +| `driver` | No | ODBC driver name (default: `ODBC Driver 18 for SQL Server`) | +| `azure_auth` | No | Use Azure AD authentication (default: `false`) | +| `trust_server_certificate` | No | Trust server certificate without validation (default: `false`) | + +## SSH Tunneling + +All warehouse types support SSH tunneling for connections behind a bastion host: + +```json +{ + "warehouses": { + "prod-via-bastion": { + "type": "postgres", + "host": "10.0.1.50", + "database": "analytics", + "user": "analyst", + "password": "{env:PG_PASSWORD}", + "ssh_host": "bastion.example.com", + "ssh_port": 22, + "ssh_user": "ubuntu", + "ssh_auth_type": "key", + "ssh_key_path": "~/.ssh/id_rsa" + } + } +} +``` + +| Field | Required | Description | +|-------|----------|-------------| +| `ssh_host` | Yes | SSH bastion hostname | +| `ssh_port` 
| No | SSH port (default: `22`) | +| `ssh_user` | Yes | SSH username | +| `ssh_auth_type` | No | `"key"` or `"password"` | +| `ssh_key_path` | No | Path to SSH private key | +| `ssh_password` | No | SSH password | + +## Auto-Discovery + +The `/discover` command can automatically detect warehouse connections from: + +| Source | Detection | +|--------|-----------| +| dbt profiles | Parses `~/.dbt/profiles.yml` | +| Docker containers | Finds running PostgreSQL, MySQL, and SQL Server containers | +| Environment variables | Scans for `SNOWFLAKE_ACCOUNT`, `PGHOST`, `DATABRICKS_HOST`, etc. | + +See [Warehouse Tools](../data-engineering/tools/warehouse-tools.md) for the full list of environment variable signals. + +## Testing Connections + +After configuring a warehouse, verify it works: + +``` +> warehouse_test prod-snowflake + +Testing connection to prod-snowflake (snowflake)... + ✓ Connected successfully + Account: xy12345.us-east-1 + User: analytics_user + Role: ANALYST_ROLE + Warehouse: COMPUTE_WH + Database: ANALYTICS +``` diff --git a/docs/docs/data-engineering/agent-modes.md b/docs/docs/data-engineering/agent-modes.md index 6290e16760..97e612edcc 100644 --- a/docs/docs/data-engineering/agent-modes.md +++ b/docs/docs/data-engineering/agent-modes.md @@ -1,16 +1,12 @@ # Agent Modes -altimate runs in one of seven specialized modes. Each mode has different permissions, tool access, and behavioral guardrails. +altimate runs in one of three specialized modes. Each mode has different permissions, tool access, and behavioral guardrails. 
| Mode | Access | Purpose | |---|---|---| | **Builder** | Read/Write | Create and modify data pipelines | | **Analyst** | Read-only | Safe exploration and cost analysis | -| **Validator** | Read + Validate | Data quality and integrity checks | -| **Migrator** | Cross-warehouse | Dialect translation and migration | -| **Researcher** | Read-only + Parallel | Deep multi-step investigations | -| **Trainer** | Read-only + Training | Teach your AI teammate | -| **Executive** | Read-only | Business-friendly reporting (no SQL jargon) | +| **Plan** | Minimal | Planning only, no edits or execution | ## Builder @@ -20,11 +16,9 @@ altimate runs in one of seven specialized modes. Each mode has different permiss altimate --agent builder ``` -Builder mode follows a strict pre-execution protocol for every SQL operation: +> Tip: `--yolo` auto-approves permission prompts for faster iteration (`altimate --yolo --agent builder`). Not recommended with live warehouse connections. Use on local/dev environments only. See [Permissions: Yolo Mode](../configure/permissions.md#yolo-mode). -1. `sql_analyze` — Check for anti-patterns -2. `sql_validate` — Verify syntax and schema references -3. `sql_execute` — Run the query +Builder mode classifies every SQL query before execution. Read queries run freely. Write queries (`INSERT`, `UPDATE`, `DELETE`, `CREATE`, `ALTER`) prompt for approval. Destructive SQL (`DROP DATABASE`, `DROP SCHEMA`, `TRUNCATE`) is hard-blocked and cannot be overridden. 
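+As an illustration of that classification (the queries below are examples, not captured output):
+
+```sql
+-- Read: runs without a prompt
+SELECT order_id, order_total FROM stg_orders LIMIT 10;
+
+-- Write: pauses for an approval prompt before executing
+CREATE TABLE order_snapshots AS SELECT * FROM stg_orders;
+
+-- Destructive: hard-blocked, no override
+DROP SCHEMA analytics;
+```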
### Example: Create a staging model @@ -70,7 +64,7 @@ I'll create a staging model with proper typing, deduplication, and column naming ### What builder can do - Create and edit SQL files, dbt models, YAML configs -- Execute SQL (DDL/DML/DQL) +- Execute SQL (DDL/DML/DQL) with write approval prompts - Run dbt commands - Generate tests and documentation - Scaffold new models from templates @@ -87,10 +81,11 @@ altimate --agent analyst Analyst mode enforces strict guardrails: -- **SELECT only** — INSERT, UPDATE, DELETE, DROP are blocked by altimate-core -- **Cost-first** — Every query gets a cost prediction before execution -- **LIMIT enforced** — Large scans are automatically limited -- **No file writes** — Cannot modify your codebase +- **SELECT only.** INSERT, UPDATE, DELETE, DROP are denied (not prompted, blocked entirely) +- **No file writes.** Cannot modify your codebase +- **Safe bash only.** `ls`, `grep`, `cat`, `head`, `tail`, `find`, `wc` auto-allowed. dbt read commands allowed (`dbt list`, `dbt ls`, `dbt debug`, `dbt deps`). Everything else denied. +- **Full web access.** Fetch and search allowed without prompts +- **All inspection tools.** Schema, warehouse, finops, lineage, and core analysis tools available ### Example: Investigate expensive queries @@ -105,9 +100,9 @@ Analyst: Let me pull the query history and credit analysis. Daily trend: ↑ 15% vs last week > finops_expensive_queries prod-snowflake --days 7 --limit 5 - 1. SELECT * FROM events JOIN users... — 89.3 credits, 47 runs - 2. CREATE TABLE AS SELECT... — 45.1 credits, 1 run - 3. SELECT COUNT(DISTINCT...) — 23.7 credits, 312 runs + 1. SELECT * FROM events JOIN users... (89.3 credits, 47 runs) + 2. CREATE TABLE AS SELECT... (45.1 credits, 1 run) + 3. SELECT COUNT(DISTINCT...) (23.7 credits, 312 runs) The #1 cost driver is a query running 47 times that scans the full events table with a SELECT *. Let me analyze it: @@ -118,7 +113,7 @@ events table with a SELECT *. 
Let me analyze it: Recommendations: 1. Replace SELECT * with only the 6 columns actually used downstream -2. Add a date filter — the events table has 2.1B rows but most queries +2. Add a date filter, since the events table has 2.1B rows but most queries only need the last 7 days 3. Consider materializing this as a view with the filter baked in @@ -133,218 +128,40 @@ Estimated savings: ~60 credits/week (71% reduction on this query alone) - Analyze costs and credit consumption - Search warehouse metadata - Detect PII columns +- Save training entries --- -## Validator +## Plan -**Read + validate. For data quality and integrity checks.** +**Minimal permissions. For planning before acting.** ```bash -altimate --agent validator +altimate --agent plan ``` -### Example: Audit a model before merging - -``` -You: Validate the changes in models/marts/fct_revenue.sql - -Validator: - -> sql_analyze [fct_revenue.sql contents] - ⚠ FUNCTION_IN_FILTER: DATE_TRUNC on order_date prevents index usage - ✓ No other anti-patterns - -> lineage_check [fct_revenue.sql] - Sources: stg_orders.order_amount, stg_orders.order_date, dim_products.category - Confidence: high (no SELECT *, schema-qualified, small graph) - -> schema_diff [old version] [new version] - + Added column: revenue_category (VARCHAR) - ~ Modified column: total_revenue FLOAT → DECIMAL(18,2) - Severity: medium (type change may affect downstream) - -Issues found: -1. FUNCTION_IN_FILTER on line 23 — use a pre-computed date column instead -2. Type change from FLOAT to DECIMAL(18,2) — verify downstream consumers -3. Missing test coverage for the new revenue_category column -``` +Plan mode restricts the agent to reading files and editing plan files only. No SQL, no bash, no file modifications. Use this to outline an approach before switching to builder to execute it. 
--- -## Migrator - -**Cross-warehouse migration specialist.** - -```bash -altimate --agent migrator -``` - -### Example: Migrate from Snowflake to BigQuery - -``` -You: Translate our top 5 models from Snowflake to BigQuery - -Migrator: - -> sql_translate --source snowflake --target bigquery - Input: SELECT DATEADD(day, -7, CURRENT_TIMESTAMP()) AS week_ago, - IFF(status = 'active', 1, 0) AS is_active, - TRY_TO_NUMBER(raw_amount) AS amount - FROM analytics.public.orders - QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) = 1 - - Output: SELECT DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AS week_ago, - IF(status = 'active', 1, 0) AS is_active, - SAFE_CAST(raw_amount AS NUMERIC) AS amount - FROM `analytics.public.orders` - QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) = 1 - - Warnings: - - QUALIFY is supported natively in BigQuery (no change needed) - - TRY_TO_NUMBER → SAFE_CAST (NULL on failure, same behavior) - - Table reference backtick-quoted for BigQuery - -> lineage_check [source SQL] -> lineage_check [translated SQL] - ✓ Lineage preserved — same columns, same sources -``` - -### Supported translation paths - -| From | To | -|---|---| -| Snowflake | BigQuery, Databricks, Redshift, PostgreSQL | -| BigQuery | Snowflake, Databricks, Redshift, PostgreSQL | -| Redshift | Snowflake, BigQuery, PostgreSQL | -| PostgreSQL | Snowflake, BigQuery, Databricks | -| MySQL | PostgreSQL, Snowflake | -| SQL Server | PostgreSQL, Snowflake | - ---- - -## Researcher - -**Read-only + parallel investigation. For complex analytical questions.** - -```bash -altimate --agent researcher -``` - -Researcher mode follows a 4-phase protocol for thorough investigation: - -1. **Plan** — Outline questions, data sources, and tool sequence -2. **Gather** — Execute investigation steps, parallelize where possible -3. **Analyze** — Cross-reference findings, identify root causes -4. 
**Report** — Produce structured report with evidence and recommendations - -### Example: Investigate a cost spike - -``` -You: Why did our Snowflake costs spike last week? - -Researcher: Let me plan the investigation. - - Investigation Plan: - 1. Pull credit consumption trends (finops_analyze_credits) - 2. Identify top expensive queries (finops_expensive_queries) - 3. Check for deployment changes (git log) - 4. Correlate query patterns with cost changes - - Proceeding... +## SQL Write Access Control -> finops_analyze_credits prod-snowflake --days 14 - Last 7 days: 847 credits (↑ 38% vs prior week) - Top warehouse: TRANSFORM_WH (523 credits, 62%) +All SQL queries are classified before execution using AST-based parsing: -> finops_expensive_queries prod-snowflake --days 7 - 1. Full table scan on events (89 credits, 47 runs) - 2. New model fct_attribution (45 credits, added 3 days ago) +| Query Type | Builder | Analyst | +|-----------|---------|---------| +| `SELECT`, `SHOW`, `DESCRIBE`, `EXPLAIN` | Allowed | Allowed | +| `INSERT`, `UPDATE`, `DELETE`, `CREATE`, `ALTER` | Prompts for approval | Denied | +| `DROP DATABASE`, `DROP SCHEMA`, `TRUNCATE` | Blocked (cannot override) | Blocked | -> bash: git log --oneline --since="7 days ago" -- models/ - abc1234 feat: add fct_attribution model - def5678 fix: remove partition filter from stg_events +The classifier detects write operations including: `INSERT`, `UPDATE`, `DELETE`, `MERGE`, `CREATE`, `DROP`, `ALTER`, `TRUNCATE`, `GRANT`, `REVOKE`, `COPY INTO`, `CALL`, `EXEC`, `EXECUTE IMMEDIATE`, `BEGIN`, `DECLARE`, `REPLACE`, `UPSERT`, `RENAME`. - # Investigation Report - - ## Summary - Costs increased 38% due to two changes deployed 3 days ago. - - ## Key Findings - 1. fct_attribution model (new) scans 2.1B events rows without partition filter — 45 credits/week - 2. Partition filter was accidentally removed from stg_events in commit def5678 — caused 47 full scans - - ## Recommendations - 1. 
[ ] Add date partition filter to fct_attribution (estimated savings: 40 credits/week) - 2. [ ] Restore partition filter in stg_events (estimated savings: 60 credits/week) -``` - -### What researcher can do - -- Run all read-only SQL and schema tools -- Use `bash` for git history and code analysis -- Launch parallel sub-investigations with the `task` tool -- Access `websearch` and `webfetch` for external research -- Save discoveries as training for future sessions - -### When to use researcher vs analyst - -| Use Analyst for... | Use Researcher for... | -|---|---| -| Quick data exploration | Multi-step investigations | -| Single-query answers | Root cause analysis | -| Schema browsing | Cross-referencing multiple data sources | -| Ad-hoc cost checks | Comprehensive audits with reports | +Multi-statement queries (`SELECT 1; INSERT INTO ...`) are classified as write if any statement is a write. --- -## Trainer - -**Read-only + training management. For teaching your AI teammate.** - -```bash -altimate --agent trainer -``` - -Trainer mode helps you systematically teach the agent about your project. It cannot modify files — it only reads, explores, and manages training. - -### Example: Teach from a correction - -``` -You: I want you to know that we never use FLOAT for money columns - -Trainer: Good rule. Let me save this properly: - - Kind: rule - Name: no-float-financial - Content: "Use DECIMAL(18,2) for financial columns (*_amount, *_price, *_cost). - FLOAT causes rounding errors that compound across aggregations." - - Scope: project (shared with team via git) - Save this? - -You: Yes - -Trainer: Saved. Every agent mode will apply this from now on. - Your teammates get it when they pull. 
-``` - -### What trainer can do - -- Guide users through teaching interactively -- Read codebase files to understand existing patterns -- Save, list, and remove training entries -- Analyze training gaps and suggest what to teach next -- Review and curate training quality (stale detection, consolidation) - -### When to use trainer mode +## Custom Agent Modes -| Scenario | Why trainer mode | -|---|---| -| New project setup | Teach conventions before anyone starts building | -| New hire onboarding | Walk through what the team has taught | -| Post-incident review | Save lessons learned as permanent rules | -| Loading a style guide | Extract rules and standards from documentation | -| Quarterly audit | Remove stale entries, consolidate, fill gaps | +You can create custom agents with tailored permissions for specialized workflows like validation, migration, research, or executive reporting. See [Agent Configuration](../configure/agents.md#custom-agents) for details. -For the full guide, see [Training: Corrections That Stick](training/index.md). +For training your AI teammate, see [Training](training/index.md). 
diff --git a/docs/docs/data-engineering/guides/ci-headless.md b/docs/docs/data-engineering/guides/ci-headless.md index 11d29da1af..bacc07d8a1 100644 --- a/docs/docs/data-engineering/guides/ci-headless.md +++ b/docs/docs/data-engineering/guides/ci-headless.md @@ -51,7 +51,7 @@ SNOWFLAKE_WAREHOUSE=compute_wh | Code | Meaning | |---|---| -| `0` | Success — task completed | +| `0` | Success (task completed) | | `1` | Task completed but result indicates issues (e.g., anti-patterns found) | | `2` | Configuration error (missing API key, bad connection) | | `3` | Tool execution error (warehouse unreachable, query failed) | @@ -66,7 +66,7 @@ altimate run "validate models in models/staging/ for anti-patterns" || exit 1 ## Worked Examples -### Example 1 — Nightly Cost Check (GitHub Actions) +### Example 1: Nightly Cost Check (GitHub Actions) ```yaml # .github/workflows/cost-check.yml @@ -83,7 +83,7 @@ jobs: - uses: actions/checkout@v4 - name: Install altimate - run: npm install -g @altimateai/altimate-code + run: npm install -g altimate-code - name: Run cost report env: @@ -105,7 +105,7 @@ jobs: path: cost-report.json ``` -### Example 2 — Post-Deploy SQL Validation +### Example 2: Post-Deploy SQL Validation Add to your dbt deployment workflow to catch anti-patterns before they reach production: @@ -120,7 +120,7 @@ Add to your dbt deployment workflow to catch anti-patterns before they reach pro --output json ``` -### Example 3 — Automated Test Generation (Pre-commit) +### Example 3: Automated Test Generation (Pre-commit) ```bash #!/bin/bash @@ -152,4 +152,4 @@ See [Tracing](../../configure/tracing.md) for the full trace reference. ## Security Recommendation -Use a **read-only warehouse user** for CI jobs that only need to read data. Reserve write-access credentials for jobs that explicitly need them (e.g., test generation that writes files). See [Security FAQ](../../security-faq.md) and [Permissions](../../configure/permissions.md). 
+Use a **read-only warehouse user** for CI jobs that only need to read data. Reserve write-access credentials for jobs that explicitly need them (e.g., test generation that writes files). See [Security FAQ](../../reference/security-faq.md) and [Permissions](../../configure/permissions.md). diff --git a/docs/docs/data-engineering/guides/cost-optimization.md b/docs/docs/data-engineering/guides/cost-optimization.md index 068d1c0c21..654651383b 100644 --- a/docs/docs/data-engineering/guides/cost-optimization.md +++ b/docs/docs/data-engineering/guides/cost-optimization.md @@ -70,9 +70,9 @@ You: Are our warehouses the right size? ``` Common findings: -- **Over-provisioned warehouses** — Utilization below 30% means you're paying for idle compute -- **Missing auto-suspend** — Warehouses running 24/7 when only used during business hours -- **Wrong size for workload** — Small queries on XL warehouses waste credits +- **Over-provisioned warehouses.** Utilization below 30% means you're paying for idle compute. +- **Missing auto-suspend.** Warehouses running 24/7 when only used during business hours. +- **Wrong size for workload.** Small queries on XL warehouses waste credits. ## Step 4: Clean up unused resources diff --git a/docs/docs/data-engineering/guides/migration.md b/docs/docs/data-engineering/guides/migration.md index 1b62886ca5..c7b594ccfa 100644 --- a/docs/docs/data-engineering/guides/migration.md +++ b/docs/docs/data-engineering/guides/migration.md @@ -1,11 +1,11 @@ # Migration Guide -Use migrator mode to translate SQL across warehouse dialects while preserving lineage and correctness. +Use altimate to translate SQL across warehouse dialects while preserving lineage and correctness. 
-## Start migrator mode +## Start a migration ```bash -altimate --agent migrator +altimate --agent builder ``` ## Translation workflow @@ -15,7 +15,7 @@ altimate --agent migrator ``` You: Migrate our Snowflake models to BigQuery -Migrator: I'll translate each model and verify lineage is preserved. +Builder: I'll translate each model and verify lineage is preserved. Let me start by listing your models. > dbt_manifest ./target/manifest.json @@ -24,7 +24,7 @@ Migrator: I'll translate each model and verify lineage is preserved. ### 2. Translate with verification -For each model, the migrator: +For each model, the agent: 1. **Reads** the source SQL 2. **Translates** to target dialect @@ -66,7 +66,7 @@ Some features don't have direct equivalents: ``` You: Run all translations and report issues -Migrator: +Builder: ✓ 38/47 models translated cleanly ⚠ 6 models need manual review (VARIANT columns) ✗ 3 models use Snowflake-specific features (STREAMS, TASKS) @@ -127,8 +127,8 @@ WHERE RLIKE(email, '^[a-z]+@.*$'); ## Best practices -1. **Translate in batches** — Start with staging models, then intermediate, then marts -2. **Verify lineage** — Always check that column lineage is preserved after translation -3. **Test with LIMIT** — Run translated queries with `LIMIT 10` on the target warehouse first -4. **Check data types** — Type mappings may lose precision (e.g., `NUMBER(38,0)` → `INT64`) -5. **Handle NULL semantics** — Some warehouses handle NULLs differently in comparisons +1. **Translate in batches.** Start with staging models, then intermediate, then marts. +2. **Verify lineage.** Always check that column lineage is preserved after translation. +3. **Test with LIMIT.** Run translated queries with `LIMIT 10` on the target warehouse first. +4. **Check data types.** Type mappings may lose precision (e.g., `NUMBER(38,0)` to `INT64`). +5. **Handle NULL semantics.** Some warehouses handle NULLs differently in comparisons. 
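The "Test with LIMIT" practice is easy to script when smoke-testing a batch of translated models. A minimal sketch (the helper is ours, and its string matching is deliberately naive; a production version should parse the SQL instead):

```python
# Naive helper for smoke-testing translated queries on the target
# warehouse: append LIMIT n unless the query already ends in a LIMIT
# clause. Assumes a single statement with no trailing comment.

def with_limit(sql: str, n: int = 10) -> str:
    stripped = sql.rstrip().rstrip(";").rstrip()
    tokens = stripped.upper().split()
    if len(tokens) >= 2 and tokens[-2] == "LIMIT":
        return stripped  # already limited
    return f"{stripped} LIMIT {n}"
```

Run each translated model through a wrapper like this before executing it unbounded, so a bad translation fails cheaply.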
diff --git a/docs/docs/data-engineering/guides/using-with-codex.md b/docs/docs/data-engineering/guides/using-with-codex.md index acc8f29e9e..493d3987c2 100644 --- a/docs/docs/data-engineering/guides/using-with-codex.md +++ b/docs/docs/data-engineering/guides/using-with-codex.md @@ -56,7 +56,7 @@ Once authenticated, all altimate tools work with Codex as the LLM backend. No AP - altimate authenticates via PKCE OAuth flow with ChatGPT - Requests route through `chatgpt.com/backend-api/codex/responses` -- Your subscription covers all token usage — no per-token billing +- Your subscription covers all token usage, so there is no per-token billing - Token is stored locally at `~/.altimate/data/auth.json` ## Cost diff --git a/docs/docs/data-engineering/tools/dbt-tools.md b/docs/docs/data-engineering/tools/dbt-tools.md index 2b07099901..89f62e7c95 100644 --- a/docs/docs/data-engineering/tools/dbt-tools.md +++ b/docs/docs/data-engineering/tools/dbt-tools.md @@ -14,10 +14,10 @@ Running: dbt run --select stg_orders ``` **Parameters:** -- `command` (optional, default: "run") — dbt command: `run`, `test`, `build`, `compile`, `seed`, `snapshot` -- `select` (optional) — Model selection syntax (`stg_orders`, `+fct_revenue`, `tag:daily`) -- `args` (optional) — Additional CLI arguments -- `project_dir` (optional) — Path to dbt project root +- `command` (optional, default: "run"): dbt command: `run`, `test`, `build`, `compile`, `seed`, `snapshot` +- `select` (optional): Model selection syntax (`stg_orders`, `+fct_revenue`, `tag:daily`) +- `args` (optional): Additional CLI arguments +- `project_dir` (optional): Path to dbt project root ### Examples @@ -72,6 +72,36 @@ Source Freshness: --- +## altimate-dbt CLI + +`altimate-dbt` is a standalone CLI for dbt workflows. It auto-detects your dbt project directory, Python environment, and adapter type (Snowflake, BigQuery, Databricks, Redshift, etc.). 
+ +```bash +# Initialize dbt integration +altimate-dbt init + +# Diagnose issues +altimate-dbt doctor + +# Run dbt commands +altimate-dbt compile +altimate-dbt build +altimate-dbt run +altimate-dbt test + +# Utilities +altimate-dbt execute "SELECT 1" # Run a query via dbt adapter +altimate-dbt columns my_model # List model columns +altimate-dbt graph # View lineage/DAG +altimate-dbt deps # Manage dependencies +``` + +All commands provide friendly error diagnostics with actionable fix suggestions when something goes wrong. + +> **Tip:** In builder mode, the agent prefers `altimate-dbt` over the raw `dbt_run` tool for better error handling and auto-detection. + +--- + ## dbt Skills ### /generate-tests diff --git a/docs/docs/data-engineering/tools/finops-tools.md b/docs/docs/data-engineering/tools/finops-tools.md index 1cd5bd8b15..b7ffc987a7 100644 --- a/docs/docs/data-engineering/tools/finops-tools.md +++ b/docs/docs/data-engineering/tools/finops-tools.md @@ -27,11 +27,11 @@ Summary: ``` **Parameters:** -- `warehouse` (required) — Connection name -- `days` (optional, default: 7) — Lookback period -- `limit` (optional, default: 100) — Max queries returned -- `user` (optional) — Filter by username -- `warehouse_filter` (optional) — Filter by compute warehouse name +- `warehouse` (required): Connection name +- `days` (optional, default: 7): Lookback period +- `limit` (optional, default: 100): Max queries returned +- `user` (optional): Filter by username +- `warehouse_filter` (optional): Filter by compute warehouse name **Data sources by warehouse:** - Snowflake: `QUERY_HISTORY` function @@ -63,9 +63,9 @@ By Warehouse: DEV_WH (XS): 47.4 credits (6%) Recommendations: - 1. TRANSFORM_WH runs at 23% utilization — consider downsizing to L - 2. 340 queries on ANALYTICS_WH scan >1GB but return <100 rows — add filters - 3. DEV_WH has 0 queries between 2am-8am — enable auto-suspend + 1. TRANSFORM_WH runs at 23% utilization, consider downsizing to L + 2. 
340 queries on ANALYTICS_WH scan >1GB but return <100 rows, add filters + 3. DEV_WH has 0 queries between 2am-8am, enable auto-suspend ``` --- @@ -92,7 +92,7 @@ Top 5 Expensive Queries: 3. 23.7 credits | 312 executions | ANALYTICS_WH SELECT COUNT(DISTINCT user_id) FROM events WHERE ... Anti-patterns: None - Suggestion: Pre-aggregate in a materialized view — saves ~23 credits/week + Suggestion: Pre-aggregate in a materialized view, which saves ~23 credits/week 4. 18.2 credits | 7 executions | TRANSFORM_WH INSERT INTO daily_agg SELECT ... FROM raw_events @@ -153,14 +153,14 @@ Find tables and warehouses that are costing money but not being used. > finops_unused_resources prod-snowflake --days 30 Unused Tables (no reads in 30 days): - 1. RAW.LEGACY_EVENTS — 450GB, last accessed 2025-11-03 - 2. STAGING.STG_OLD_USERS — 12GB, last accessed 2025-12-15 - 3. ANALYTICS.TMP_MIGRATION_2024 — 89GB, last accessed 2025-08-22 + 1. RAW.LEGACY_EVENTS (450GB, last accessed 2025-11-03) + 2. STAGING.STG_OLD_USERS (12GB, last accessed 2025-12-15) + 3. ANALYTICS.TMP_MIGRATION_2024 (89GB, last accessed 2025-08-22) Total storage: 551GB → ~$23/month in storage costs Idle Warehouses (no queries in 7+ days): - 1. MIGRATION_WH (Medium) — last query 2026-02-10 - 2. TEST_WH (Small) — last query 2026-01-28 + 1. MIGRATION_WH (Medium), last query 2026-02-10 + 2. TEST_WH (Small), last query 2026-01-28 Recommendations: 1. Archive or drop the 3 unused tables → save $23/month diff --git a/docs/docs/data-engineering/tools/index.md b/docs/docs/data-engineering/tools/index.md index b8993ee5f8..5df590cc31 100644 --- a/docs/docs/data-engineering/tools/index.md +++ b/docs/docs/data-engineering/tools/index.md @@ -1,6 +1,6 @@ # Tools Reference -altimate has 99+ specialized tools organized by function. +altimate has 100+ specialized tools organized by function. | Category | Tools | Purpose | |---|---|---| @@ -8,9 +8,10 @@ altimate has 99+ specialized tools organized by function. 
| [Schema Tools](schema-tools.md) | 7 tools | Inspection, search, PII detection, tagging, diffing | | [FinOps Tools](finops-tools.md) | 8 tools | Cost analysis, warehouse sizing, unused resources, RBAC | | [Lineage Tools](lineage-tools.md) | 1 tool | Column-level lineage tracing with confidence scoring | -| [dbt Tools](dbt-tools.md) | 2 tools + 6 skills | Run, manifest parsing, test generation, scaffolding | +| [dbt Tools](dbt-tools.md) | 2 tools + 5 skills | Run, manifest parsing, test generation, scaffolding, `altimate-dbt` CLI | | [Warehouse Tools](warehouse-tools.md) | 6 tools | Environment scanning, connection management, discovery, testing | | [Altimate Memory](memory-tools.md) | 3 tools | Persistent cross-session memory for warehouse config, conventions, and preferences | | [Training](../training/index.md) | 3 tools + 3 skills | Correct the agent once, it remembers forever, your team inherits it | +| `tool_lookup` | 1 tool | Runtime introspection that discovers tool schemas and parameters dynamically | All tools are available in the interactive TUI. The agent automatically selects the right tools based on your request. diff --git a/docs/docs/data-engineering/tools/memory-tools.md b/docs/docs/data-engineering/tools/memory-tools.md index 47d4b43b40..03f837dd6c 100644 --- a/docs/docs/data-engineering/tools/memory-tools.md +++ b/docs/docs/data-engineering/tools/memory-tools.md @@ -2,16 +2,16 @@ Altimate Memory gives your data engineering agent **persistent, cross-session memory**. Instead of re-explaining your warehouse setup, naming conventions, or team preferences every session, the agent remembers what matters and picks up where you left off. -Memory blocks are plain Markdown files stored on disk — human-readable, version-controllable, and fully under your control. +Memory blocks are plain Markdown files stored on disk, making them human-readable, version-controllable, and fully under your control. 
## Why memory matters for data engineering General-purpose coding agents treat every session as a blank slate. For data engineering, this is especially painful because: -- **Warehouse context is stable** — your Snowflake warehouse name, default database, and connection details rarely change, but you re-explain them every session. -- **Naming conventions are tribal knowledge** — `stg_` for staging, `int_` for intermediate, `fct_`/`dim_` for marts. The agent needs to learn these once, not every time. -- **Past analyses inform future work** — if the agent optimized a query or traced lineage for a table last week, recalling that context avoids redundant work. -- **User preferences accumulate** — SQL style, preferred dialects, dbt patterns, warehouse sizing decisions. +- **Warehouse context is stable.** Your Snowflake warehouse name, default database, and connection details rarely change, but you re-explain them every session. +- **Naming conventions are tribal knowledge.** `stg_` for staging, `int_` for intermediate, `fct_`/`dim_` for marts. The agent needs to learn these once, not every time. +- **Past analyses inform future work.** If the agent optimized a query or traced lineage for a table last week, recalling that context avoids redundant work. +- **User preferences accumulate.** SQL style, preferred dialects, dbt patterns, warehouse sizing decisions. Altimate Memory solves this with three tools that let the agent save, recall, and manage its own persistent knowledge. @@ -41,7 +41,7 @@ Memory: 1 block(s) |---|---|---|---| | `scope` | `"global" \| "project" \| "all"` | `"all"` | Filter by scope | | `tags` | `string[]` | `[]` | Filter to blocks containing all specified tags | -| `id` | `string` | — | Read a specific block by ID | +| `id` | `string` | (none) | Read a specific block by ID | --- @@ -55,7 +55,7 @@ Create or update a persistent memory block. 
Memory: Created "warehouse-config" ``` -The agent automatically calls this when it learns something worth persisting — you can also explicitly ask it to "remember" something. +The agent automatically calls this when it learns something worth persisting. You can also explicitly ask it to "remember" something. **Parameters:** @@ -116,7 +116,7 @@ tags: ["snowflake", "warehouse"] - **Default database**: ANALYTICS_DB ``` -Files are human-readable and editable. You can create, edit, or delete them manually — the agent will pick up changes on the next session. +Files are human-readable and editable. You can create, edit, or delete them manually. The agent will pick up changes on the next session. ## Limits and safety @@ -134,7 +134,7 @@ Blocks are written to a temporary file first, then atomically renamed. This prev ## Disabling memory -Set the environment variable to disable all memory functionality — tools and automatic injection: +Set the environment variable to disable all memory functionality, including tools and automatic injection: ```bash ALTIMATE_DISABLE_MEMORY=true @@ -149,7 +149,7 @@ Altimate Memory automatically injects relevant blocks into the system prompt at **What this means in practice:** - With a typical block size of 200-500 characters, the default budget comfortably fits 15-40 blocks -- Memory injection adds a one-time cost at session start — it does not grow during the session +- Memory injection adds a one-time cost at session start and does not grow during the session - If you notice context pressure, reduce the number of blocks or keep them concise - The agent's own tool calls and responses consume far more context than memory blocks - To disable injection entirely (e.g., for benchmarks), set `ALTIMATE_DISABLE_MEMORY=true` @@ -173,7 +173,7 @@ Memory blocks persist indefinitely. 
If your warehouse configuration changes or a **How to prevent:** -- Review memory blocks periodically — they're plain Markdown files you can inspect directly +- Review memory blocks periodically, since they're plain Markdown files you can inspect directly - Ask the agent to "forget" outdated information when things change - Keep blocks focused on stable facts rather than ephemeral details @@ -193,7 +193,7 @@ The agent decides what to save based on conversation context. It may occasionall **How to fix:** - Delete the bad block: ask the agent or run `rm .altimate-code/memory/bad-block.md` -- Edit the file directly — it's just Markdown +- Edit the file directly, since it's just Markdown - Ask the agent to rewrite it: "Update the warehouse-config memory with the correct warehouse name" ### Context bloat @@ -219,7 +219,7 @@ Memory blocks are stored as plaintext files on disk. Be mindful of what gets sav - **Do not** save credentials, API keys, or connection strings in memory blocks - **Do** save structural information (warehouse names, naming conventions, schema patterns) - If using project-scoped memory in a shared repo, add `.altimate-code/memory/` to `.gitignore` to avoid committing sensitive context -- Memory blocks are scoped per-user (global) and per-project — there is no cross-user or cross-project leakage +- Memory blocks are scoped per-user (global) and per-project, so there is no cross-user or cross-project leakage !!! warning Memory blocks are not encrypted. Treat them like any other configuration file on your machine. Do not store secrets or PII in memory blocks. 
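The crash-safety described under "Limits and safety" (write to a temporary file, then atomically rename) is a standard pattern. A sketch of the technique in Python, as an illustration rather than altimate's actual implementation:

```python
# Illustration of the write-temp-then-rename pattern: a reader of the
# block file sees either the old version or the new one, never a
# half-written file, even if the process crashes mid-write.
import os
import tempfile

def atomic_write(path: str, content: str) -> None:
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The temporary file is created in the same directory as the target so the rename stays on one filesystem, which is what makes `os.replace` atomic.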
diff --git a/docs/docs/data-engineering/tools/schema-tools.md b/docs/docs/data-engineering/tools/schema-tools.md index 8de2ac6880..78726177aa 100644 --- a/docs/docs/data-engineering/tools/schema-tools.md +++ b/docs/docs/data-engineering/tools/schema-tools.md @@ -23,9 +23,9 @@ Table: ANALYTICS.PUBLIC.ORDERS ``` **Parameters:** -- `table` (required) — Table name (schema-qualified: `schema.table` or just `table`) -- `schema_name` (optional) — Schema to search in -- `warehouse` (optional) — Connection name +- `table` (required): Table name (schema-qualified: `schema.table` or just `table`) +- `schema_name` (optional): Schema to search in +- `warehouse` (optional): Connection name --- @@ -50,13 +50,13 @@ Run this once per warehouse (or periodically to refresh). Enables `schema_search ## schema_search -Search indexed metadata by keyword — finds tables, columns, and schemas. +Search indexed metadata by keyword to find tables, columns, and schemas. ``` > schema_search "revenue" --warehouse prod-snowflake Tables: - 1. ANALYTICS.MARTS.FCT_REVENUE (42 columns) — "Monthly revenue fact table" + 1. ANALYTICS.MARTS.FCT_REVENUE (42 columns), "Monthly revenue fact table" 2. ANALYTICS.STAGING.STG_REVENUE_EVENTS (18 columns) Columns: @@ -66,9 +66,9 @@ Columns: ``` **Parameters:** -- `query` (required) — Search term -- `warehouse` (optional) — Limit to one connection -- `limit` (optional) — Max results +- `query` (required): Search term +- `warehouse` (optional): Limit to one connection +- `limit` (optional): Max results --- @@ -84,7 +84,7 @@ Check cache freshness across all warehouses. 
├─────────────────┼──────────┼────────┼─────────┼─────────────────────┤ │ prod-snowflake │ 12 │ 847 │ 15,293 │ 2026-02-26 14:30:00 │ │ dev-duckdb │ 2 │ 23 │ 156 │ 2026-02-25 09:15:00 │ -│ bigquery-prod │ — │ — │ — │ Never │ +│ bigquery-prod │ n/a │ n/a │ n/a │ Never │ └─────────────────┴──────────┴────────┴─────────┴─────────────────────┘ ``` @@ -149,8 +149,8 @@ Compare schema changes between two SQL versions to understand migration impact. --new_sql "CREATE TABLE orders (id INT, amount DECIMAL(12,2), status TEXT, created_at TIMESTAMP)" Schema Changes: - ~ Modified: amount (FLOAT → DECIMAL(12,2)) — severity: medium - + Added: created_at (TIMESTAMP) — severity: low + ~ Modified: amount (FLOAT → DECIMAL(12,2)), severity: medium + + Added: created_at (TIMESTAMP), severity: low Impact: Type change on 'amount' may affect downstream consumers expecting FLOAT ``` diff --git a/docs/docs/data-engineering/tools/sql-tools.md b/docs/docs/data-engineering/tools/sql-tools.md index a776fbd4f6..f953f2ee15 100644 --- a/docs/docs/data-engineering/tools/sql-tools.md +++ b/docs/docs/data-engineering/tools/sql-tools.md @@ -18,9 +18,9 @@ Run SQL queries against your connected warehouse. ``` **Parameters:** -- `query` (required) — SQL to execute -- `warehouse` (optional) — Connection name from config. Uses default if omitted -- `limit` (optional, default: 100) — Max rows returned +- `query` (required): SQL to execute +- `warehouse` (optional): Connection name from config. Uses default if omitted +- `limit` (optional, default: 100): Max rows returned --- @@ -167,7 +167,7 @@ Diagnose and auto-fix SQL errors. --error "SQL compilation error: Object 'ANALYTICS.PUBLIC.USERSS' does not exist" \ "SELECT * FROM analytics.public.userss" -Diagnosis: Typo in table name — 'userss' should be 'users' +Diagnosis: Typo in table name. 'userss' should be 'users' Fixed SQL: SELECT * FROM analytics.public.users @@ -222,12 +222,12 @@ Rewritten SQL: ### Rewrite strategies -1. 
**Predicate pushdown** — Move filters closer to data source -2. **SELECT pruning** — Replace `*` with explicit columns -3. **Function elimination** — Replace non-sargable functions with range predicates -4. **JOIN reordering** — Smaller tables first -5. **Subquery flattening** — Convert to JOINs where possible -6. **UNION ALL promotion** — Replace UNION with UNION ALL when safe +1. **Predicate pushdown.** Move filters closer to data source +2. **SELECT pruning.** Replace `*` with explicit columns +3. **Function elimination.** Replace non-sargable functions with range predicates +4. **JOIN reordering.** Smaller tables first +5. **Subquery flattening.** Convert to JOINs where possible +6. **UNION ALL promotion.** Replace UNION with UNION ALL when safe --- @@ -260,6 +260,6 @@ Schema-aware SQL completion. > sql_autocomplete --prefix "SELECT o.order_id, o.amo" --table_context ["orders"] Suggestions: - 1. o.amount (DECIMAL) — orders.amount - 2. o.amount_usd (DECIMAL) — orders.amount_usd + 1. o.amount (DECIMAL), from orders.amount + 2. o.amount_usd (DECIMAL), from orders.amount_usd ``` diff --git a/docs/docs/data-engineering/tools/warehouse-tools.md b/docs/docs/data-engineering/tools/warehouse-tools.md index adaa76daf7..2505f0f282 100644 --- a/docs/docs/data-engineering/tools/warehouse-tools.md +++ b/docs/docs/data-engineering/tools/warehouse-tools.md @@ -9,9 +9,6 @@ Scan the entire data engineering environment in one call. Detects dbt projects, # Environment Scan -## Python Engine -✓ Engine healthy - ## Git Repository ✓ Git repo on branch `main` (origin: github.com/org/analytics) diff --git a/docs/docs/data-engineering/training/index.md b/docs/docs/data-engineering/training/index.md index 4e75b8791f..39050a65f4 100644 --- a/docs/docs/data-engineering/training/index.md +++ b/docs/docs/data-engineering/training/index.md @@ -4,7 +4,7 @@ ## The Problem -AI coding assistants make the same mistakes over and over. 
You say "use DECIMAL not FLOAT," it fixes it — then does the same thing next session. You write instructions in CLAUDE.md, but nobody updates it after corrections. The knowledge from your day-to-day work never becomes permanent. +AI coding assistants make the same mistakes over and over. You say "use DECIMAL not FLOAT," it fixes it, then does the same thing next session. You write instructions in CLAUDE.md, but nobody updates it after corrections. The knowledge from your day-to-day work never becomes permanent. ## How Training Works @@ -21,9 +21,9 @@ Builder: Saved. I'll apply this in every future session. Your team gets it too when they pull. ``` -That's it. **2 seconds.** No editing files. No context switching. The correction becomes permanent knowledge that every agent mode (builder, analyst, validator) sees in every future session. +That's it. **2 seconds.** No editing files. No context switching. The correction becomes permanent knowledge that every agent mode (builder, analyst) sees in every future session. -Research shows compact, focused context improves AI performance by 17 percentage points — while dumping comprehensive docs actually hurts by 3 points (SkillsBench, 7,308 test runs). Training delivers the right knowledge to the right agent at the right time, not everything to everyone. +Research shows compact, focused context improves AI performance by 17 percentage points, while dumping comprehensive docs actually hurts by 3 points (SkillsBench, 7,308 test runs). Training delivers the right knowledge to the right agent at the right time, not everything to everyone. ## Three Ways to Teach @@ -48,7 +48,7 @@ Point the agent at code that demonstrates a convention: ``` You: /teach @models/staging/stg_orders.sql -Trainer: I see the pattern: +Agent: I see the pattern: - source CTE → filtered CTE → final - ROW_NUMBER dedup on _loaded_at Save as pattern "staging-cte-structure"? 
@@ -75,7 +75,7 @@ Agent: I found 8 actionable rules: | Kind | Purpose | Example | |---|---|---| -| **rule** | Hard constraint | "Never use FLOAT for money — use DECIMAL(18,2)" | +| **rule** | Hard constraint | "Never use FLOAT for money. Use DECIMAL(18,2)." | | **pattern** | How code should look | "Staging models: source CTE → filtered → final" | | **standard** | Team convention | "Every PR needs tests + schema YAML" | | **glossary** | Business term | "ARR = Annual Recurring Revenue = MRR * 12" | @@ -91,22 +91,15 @@ Agent: I found 8 actionable rules: No meetings. No Slack messages. No "hey everyone, remember to..." -## Trainer Mode +## Systematic Teaching -For systematic teaching (not just corrections), switch to trainer mode: +For systematic teaching (not just corrections), use the `/teach` and `/train` skills in any agent mode: -```bash -altimate --agent trainer -``` - -Trainer mode is read-only — it can't modify your code. It helps you: - -- **Teach interactively**: "Let me teach you about our Databricks setup" -- **Find gaps**: "What don't you know about my project?" -- **Review training**: "Show me what the team has taught you" -- **Curate**: "Which entries are stale? What should we consolidate?" +- `/teach @file` to learn patterns from example files +- `/train @file` to learn standards from documentation +- `/training-status` to see all learned knowledge -### When to Use Trainer Mode +### When to Teach | Scenario | Why | |---|---| @@ -121,8 +114,6 @@ Training doesn't dump everything into every session. It delivers what's relevant - **Builder** gets rules and patterns first (naming conventions, SQL constraints) - **Analyst** gets glossary and context first (business terms, background knowledge) -- **Validator** gets rules and standards first (quality gates, test requirements) -- **Executive** gets glossary and playbooks first (business terms, procedures) Research shows 2-3 focused modules per task is optimal. 
The scoring system ensures each agent gets its most relevant knowledge first. @@ -148,7 +139,7 @@ Training doesn't replace CLAUDE.md. They complement each other: - **Advisory, not enforced.** Training guides the agent, but it's not a hard gate. For critical rules, also add dbt tests or sqlfluff rules that block CI. - **No approval workflow.** Anyone with repo access can save training to project scope. Use code review on `.altimate-code/memory/` changes for governance. -- **No audit trail** beyond git history. Training doesn't track who saved what — use `git blame` on the training files. +- **No audit trail** beyond git history. Training doesn't track who saved what, so use `git blame` on the training files. - **Context budget.** Training competes for context space. Under pressure, least-relevant entries are excluded. Run `/training-status` to see what's included. - **20 entries per kind.** Hard limit. Consolidate related rules into one entry rather than saving many small ones. - **SQL-focused file analysis.** The `/teach` skill works best with SQL/dbt files. Python, PySpark, and other patterns must be taught manually via conversation. diff --git a/docs/docs/data-engineering/training/team-deployment.md b/docs/docs/data-engineering/training/team-deployment.md index fec7848db0..a7ccfcd2f5 100644 --- a/docs/docs/data-engineering/training/team-deployment.md +++ b/docs/docs/data-engineering/training/team-deployment.md @@ -1,10 +1,10 @@ # Deploying Team Training -Get every teammate's AI automatically applying the same SQL conventions, naming standards, and anti-pattern rules. Achieved by committing `.altimate-code/memory/` to git — teammates inherit your training on `git pull`. +Get every teammate's AI automatically applying the same SQL conventions, naming standards, and anti-pattern rules. This is achieved by committing `.altimate-code/memory/` to git so that teammates inherit your training on `git pull`.
--- -## Step 1 — Create Your First Team Training Entries +## Step 1: Create Your First Team Training Entries Use the `/teach` or `/train` skills to save project-specific conventions: @@ -26,7 +26,7 @@ This shows all active training entries, their scope (global vs project), and whe --- -## Step 2 — Locate the Training Files +## Step 2: Locate the Training Files Training is stored in `.altimate-code/memory/` in your project root. Each entry is a markdown file with YAML frontmatter: @@ -40,11 +40,11 @@ Training is stored in `.altimate-code/memory/` in your project root. Each entry **Global vs. project scope:** - **Project scope** (`.altimate-code/memory/`): Applies when working in this project. Commit to git to share with team. -- **Global scope** (`~/.altimate-code/memory/`): Applies across all projects. Do not commit — this is personal. +- **Global scope** (`~/.altimate-code/memory/`): Applies across all projects. Do not commit, as this is personal. --- -## Step 3 — Commit to Git +## Step 3: Commit to Git ```bash git add .altimate-code/memory/ @@ -52,11 +52,11 @@ git commit -m "Add team SQL conventions and naming standards" git push ``` -Teammates who `git pull` automatically inherit all training entries. No additional setup required — the tool reads from `.altimate-code/memory/` on startup. +Teammates who `git pull` automatically inherit all training entries. No additional setup is required because the tool reads from `.altimate-code/memory/` on startup. --- -## Step 4 — Verify a Teammate Got the Training +## Step 4: Verify a Teammate Got the Training After a teammate pulls, they can run: @@ -85,4 +85,4 @@ Use project scope for team standards. Use global scope only for personal prefere ## Limitations -Training is as good as the corrections you save. The system doesn't infer conventions from your existing codebase — you teach it explicitly. For the full description of how training works, see [Training Overview](index.md). 
+Training is as good as the corrections you save. The system doesn't infer conventions from your existing codebase; you teach it explicitly. For the full description of how training works, see [Training Overview](index.md). diff --git a/docs/docs/develop/ecosystem.md b/docs/docs/develop/ecosystem.md index 66bfd9186b..3f847d5d41 100644 --- a/docs/docs/develop/ecosystem.md +++ b/docs/docs/develop/ecosystem.md @@ -12,15 +12,15 @@ altimate has a growing ecosystem of plugins, tools, and integrations. ## Integrations -- **GitHub Actions** — Automated PR review and issue triage -- **GitLab CI** — Merge request analysis -- **VS Code / Cursor** — IDE integration -- **MCP** — Model Context Protocol servers -- **ACP** — Agent Communication Protocol for editors +- **GitHub Actions**: Automated PR review and issue triage +- **GitLab CI**: Merge request analysis +- **VS Code / Cursor**: IDE integration +- **MCP**: Model Context Protocol servers +- **ACP**: Agent Communication Protocol for editors ## Community -- [GitHub Repository](https://github.com/AltimateAI/altimate-code) — Source code, issues, discussions +- [GitHub Repository](https://github.com/AltimateAI/altimate-code): Source code, issues, discussions - Share your plugins and tools with the community ## Contributing diff --git a/docs/docs/develop/plugins.md b/docs/docs/develop/plugins.md index 237904ea80..a4bb14b979 100644 --- a/docs/docs/develop/plugins.md +++ b/docs/docs/develop/plugins.md @@ -56,9 +56,9 @@ Add plugins to your `altimate-code.json` config file: Plugins can be specified as: -- **npm package name** — installed from the registry (e.g., `"npm-published-plugin"`) -- **Relative path** — a local directory (e.g., `"./path/to/local-plugin"`) -- **Scoped package** — with an org prefix (e.g., `"@altimateai/altimate-code-plugin-example"`) +- **npm package name**: installed from the registry (e.g., `"npm-published-plugin"`) +- **Relative path**: a local directory (e.g., `"./path/to/local-plugin"`) +- **Scoped 
package**: with an org prefix (e.g., `"@altimateai/altimate-code-plugin-example"`) ## Plugin Hooks @@ -70,7 +70,7 @@ Plugins can listen to lifecycle events. Each hook receives a context object with | `onSessionEnd` | A session is closed or expires | `session.id`, `session.duration`, `session.messageCount` | | `onMessage` | User sends a message to the agent | `message.content`, `message.sessionId`, `message.agent` | | `onResponse` | Agent generates a response | `response.content`, `response.sessionId`, `response.toolCalls` | -| `onToolCall` | Before a tool is executed | `call.name`, `call.parameters`, `call.sessionId` — return `false` to cancel | +| `onToolCall` | Before a tool is executed | `call.name`, `call.parameters`, `call.sessionId` (return `false` to cancel) | | `onToolResult` | After a tool finishes executing | `result.toolName`, `result.output`, `result.duration`, `result.error` | | `onFileEdit` | A file is modified via the agent | `edit.filePath`, `edit.oldContent`, `edit.newContent`, `edit.sessionId` | | `onFileWrite` | A new file is created via the agent | `write.filePath`, `write.content`, `write.sessionId` | @@ -92,7 +92,7 @@ Hooks fire in this order during a typical interaction: ## Example: SQL Anti-Pattern Plugin -This example creates a data-engineering-specific plugin that checks for `CROSS JOIN` without a `WHERE` clause in Snowflake SQL — a common anti-pattern that can cause massive result sets and runaway costs. +This example creates a data-engineering-specific plugin that checks for `CROSS JOIN` without a `WHERE` clause in Snowflake SQL. This is a common anti-pattern that can cause massive result sets and runaway costs. 
### Plugin File diff --git a/docs/docs/develop/sdk.md b/docs/docs/develop/sdk.md index 5502660509..bdd30dfe1c 100644 --- a/docs/docs/develop/sdk.md +++ b/docs/docs/develop/sdk.md @@ -186,11 +186,11 @@ try { | Import | Description | |--------|------------| -| `@altimateai/altimate-code-sdk` | Core SDK — error types, constants, utilities | -| `@altimateai/altimate-code-sdk/client` | HTTP client — `createClient()` | -| `@altimateai/altimate-code-sdk/server` | Server utilities — for embedding altimate in your own server | -| `@altimateai/altimate-code-sdk/v2` | v2 API types — TypeScript type definitions | -| `@altimateai/altimate-code-sdk/v2/client` | v2 client — auto-generated typed client | +| `@altimateai/altimate-code-sdk` | Core SDK: error types, constants, utilities | +| `@altimateai/altimate-code-sdk/client` | HTTP client: `createClient()` | +| `@altimateai/altimate-code-sdk/server` | Server utilities for embedding altimate in your own server | +| `@altimateai/altimate-code-sdk/v2` | v2 API types: TypeScript type definitions | +| `@altimateai/altimate-code-sdk/v2/client` | v2 client: auto-generated typed client | ## OpenAPI diff --git a/docs/docs/develop/server.md b/docs/docs/develop/server.md index d99f9a8a0f..5bae917ed6 100644 --- a/docs/docs/develop/server.md +++ b/docs/docs/develop/server.md @@ -44,12 +44,12 @@ The server uses HTTP Basic Authentication when credentials are set. 
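A minimal client-side sketch of that Basic Auth handshake. The header construction is the standard RFC 7617 scheme; the port and endpoint path shown in the comment are illustrative assumptions, not documented values:

```typescript
// Sketch: constructing the HTTP Basic Authentication header the server expects
// when credentials are set. Standard Basic scheme (RFC 7617): base64 of "user:password".
function basicAuthHeader(user: string, password: string): string {
  const token = Buffer.from(`${user}:${password}`, "utf8").toString("base64");
  return `Basic ${token}`;
}

// Hypothetical usage against a locally running server (the port and path below
// are illustrative assumptions, not documented endpoints):
// await fetch("http://localhost:4096/session", {
//   headers: { Authorization: basicAuthHeader("admin", process.env.SERVER_PASSWORD ?? "") },
// });

console.log(basicAuthHeader("admin", "secret")); // Basic YWRtaW46c2VjcmV0
```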
The server exposes REST endpoints for: -- **Sessions** — Create, list, delete sessions -- **Messages** — Send messages, stream responses -- **Models** — List available models -- **Agents** — List and switch agents -- **Tools** — Execute tools programmatically -- **Export/Import** — Session data management +- **Sessions**: Create, list, delete sessions +- **Messages**: Send messages, stream responses +- **Models**: List available models +- **Agents**: List and switch agents +- **Tools**: Execute tools programmatically +- **Export/Import**: Session data management Use the [SDK](sdk.md) for a typed client, or call the API directly. diff --git a/docs/docs/drivers.md b/docs/docs/drivers.md index 949a3e9f83..52c1d85cdf 100644 --- a/docs/docs/drivers.md +++ b/docs/docs/drivers.md @@ -2,7 +2,7 @@ ## Overview -Altimate Code connects to 10 databases natively via TypeScript drivers. No Python dependency required. Drivers are loaded lazily — only the driver you need is imported at runtime. +Altimate Code connects to 10 databases natively via TypeScript drivers. No Python dependency required. Drivers are loaded lazily, so only the driver you need is imported at runtime. ## Support Matrix @@ -21,7 +21,7 @@ Altimate Code connects to 10 databases natively via TypeScript drivers. No Pytho ## Installation -Drivers are `optionalDependencies` — install only what you need: +Drivers are `optionalDependencies`, so install only what you need: ```bash # Embedded databases (no external service needed) @@ -77,7 +77,7 @@ export ALTIMATE_CODE_CONN_MYDB='{"type":"postgres","host":"localhost","port":543 ### Via dbt Profiles (Recommended for dbt Users) -**dbt-first execution**: When working in a dbt project, `sql.execute` automatically uses dbt's own adapter to connect via `profiles.yml` — no separate connection configuration needed. If dbt is not configured or fails, it falls back to native drivers silently. 
+**dbt-first execution**: When working in a dbt project, `sql.execute` automatically uses dbt's own adapter to connect via `profiles.yml`, so no separate connection configuration is needed. If dbt is not configured or fails, it falls back to native drivers silently. Connections are also auto-discovered from `~/.dbt/profiles.yml` for the `warehouse.list` and `warehouse.discover` tools. Jinja `{{ env_var() }}` patterns are resolved automatically. Discovered connections are named `dbt_{profile}_{target}`. @@ -161,15 +161,15 @@ Connect through a bastion host by adding SSH config to any connection: SSH auth types: `"key"` (default) or `"password"` (set `ssh_password`). -> **Note:** SSH tunneling cannot be used with `connection_string` — use explicit `host`/`port` instead. +> **Note:** SSH tunneling cannot be used with `connection_string`. Use explicit `host`/`port` instead. ## Auto-Discovery The CLI auto-discovers connections from: -1. **Docker containers** — detects running PostgreSQL, MySQL, MariaDB, SQL Server, Oracle containers -2. **dbt profiles** — parses `~/.dbt/profiles.yml` for all supported adapters -3. **Environment variables** — detects `SNOWFLAKE_ACCOUNT`, `PGHOST`, `MYSQL_HOST`, `MSSQL_HOST`, `ORACLE_HOST`, `DUCKDB_PATH`, `SQLITE_PATH`, etc. +1. **Docker containers**: detects running PostgreSQL, MySQL, MariaDB, SQL Server, Oracle containers +2. **dbt profiles**: parses `~/.dbt/profiles.yml` for all supported adapters +3. **Environment variables**: detects `SNOWFLAKE_ACCOUNT`, `PGHOST`, `MYSQL_HOST`, `MSSQL_HOST`, `ORACLE_HOST`, `DUCKDB_PATH`, `SQLITE_PATH`, etc. Use the `warehouse_discover` tool or run project scan to find available connections. 
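The environment-variable source can be sketched as follows. The `ALTIMATE_CODE_CONN_*` name and JSON payload shape come from the documented `export ALTIMATE_CODE_CONN_MYDB=...` example above; the helper function and the lowercased-name convention are assumptions for illustration, not the CLI's actual code:

```typescript
// Illustrative sketch of ALTIMATE_CODE_CONN_* discovery. The JSON payload shape
// follows the documented example ('{"type":"postgres","host":"localhost","port":5432,...}').
interface ConnectionConfig {
  type: string;
  host?: string;
  port?: number;
  [extra: string]: unknown;
}

function discoverEnvConnections(
  env: Record<string, string | undefined>,
): Map<string, ConnectionConfig> {
  const prefix = "ALTIMATE_CODE_CONN_";
  const found = new Map<string, ConnectionConfig>();
  for (const [key, value] of Object.entries(env)) {
    if (!key.startsWith(prefix) || !value) continue;
    // Connection name = env var suffix, lowercased (a naming choice assumed here).
    found.set(key.slice(prefix.length).toLowerCase(), JSON.parse(value) as ConnectionConfig);
  }
  return found;
}

const conns = discoverEnvConnections({
  ALTIMATE_CODE_CONN_MYDB: '{"type":"postgres","host":"localhost","port":5432}',
  PATH: "/usr/bin", // ignored: no prefix
});
console.log(conns.get("mydb")?.type); // postgres
```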
@@ -177,7 +177,7 @@ Use the `warehouse_discover` tool or run project scan to find available connecti These features work based on SDK documentation but haven't been verified with automated E2E tests: -### Snowflake (partially tested — 37 E2E tests pass) +### Snowflake (partially tested, 37 E2E tests pass) - ✅ Password authentication - ✅ Key-pair with unencrypted PEM - ✅ Key-pair with encrypted PEM + passphrase @@ -188,7 +188,7 @@ These features work based on SDK documentation but haven't been verified with au - ❌ OAuth/external browser auth (requires interactive browser) - ❌ Multi-cluster warehouse auto-scaling -### BigQuery (partially tested — 25 E2E tests pass) +### BigQuery (partially tested, 25 E2E tests pass) - ✅ Service Account JSON key authentication - ✅ Schema introspection (datasets, tables, columns) - ✅ BigQuery types (UNNEST, STRUCT, DATE/DATETIME/TIMESTAMP, STRING_AGG) @@ -197,7 +197,7 @@ These features work based on SDK documentation but haven't been verified with au - ❌ Location-specific query execution - ❌ Dry run / cost estimation -### Databricks (partially tested — 24 E2E tests pass) +### Databricks (partially tested, 24 E2E tests pass) - ✅ Personal Access Token (PAT) authentication - ✅ Unity Catalog (SHOW CATALOGS, SHOW SCHEMAS) - ✅ Schema introspection (listSchemas, listTables, describeTable) @@ -246,11 +246,11 @@ User calls sql.execute("SELECT * FROM orders") ### Dispatcher Pattern -All 73 tool methods route through a central `Dispatcher` that maps method names to native TypeScript handlers. There is no Python bridge — every call executes in-process. +All 73 tool methods route through a central `Dispatcher` that maps method names to native TypeScript handlers. There is no Python bridge; every call executes in-process. ### Shared Driver Package -Database drivers live in `packages/drivers/` (`@altimateai/drivers`) — a workspace package shared across the monorepo. 
Each driver: +Database drivers live in `packages/drivers/` (`@altimateai/drivers`), a workspace package shared across the monorepo. Each driver: - Lazy-loads its npm package via dynamic `import()` (no startup cost) - Uses parameterized queries for schema introspection (SQL injection safe) - Implements a common `Connector` interface: `connect()`, `execute()`, `listSchemas()`, `listTables()`, `describeTable()`, `close()` @@ -259,9 +259,9 @@ Database drivers live in `packages/drivers/` (`@altimateai/drivers`) — a works Credentials are handled with a 3-tier fallback: -1. **OS Keychain** (via `keytar`) — preferred, secure. Credentials stored in macOS Keychain, Linux Secret Service, or Windows Credential Vault. -2. **Environment variables** (`ALTIMATE_CODE_CONN_*`) — for CI/headless environments. Pass full connection JSON. -3. **Refuse** — if keytar is unavailable and no env var set, credentials are NOT stored in plaintext. The CLI warns and tells you to use env vars. +1. **OS Keychain** (via `keytar`): preferred and secure. Credentials stored in macOS Keychain, Linux Secret Service, or Windows Credential Vault. +2. **Environment variables** (`ALTIMATE_CODE_CONN_*`): for CI/headless environments. Pass full connection JSON. +3. **Refuse**: if keytar is unavailable and no env var set, credentials are NOT stored in plaintext. The CLI warns and tells you to use env vars. Sensitive fields (`password`, `private_key_passphrase`, `access_token`, `ssh_password`, `connection_string`) are always stripped from `connections.json` on disk. @@ -289,4 +289,4 @@ Or in config: } ``` -Telemetry failures **never** affect functionality — every tracking call is wrapped in try/catch. +Telemetry failures **never** affect functionality because every tracking call is wrapped in try/catch. 
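The shared `Connector` interface can be sketched in TypeScript. The six method names are taken from the docs; the signatures, return types, and the in-memory connector are assumptions used only to show the shape:

```typescript
// Sketch of the common Connector interface described above. Method names come
// from the docs; parameter and return types are assumptions for illustration.
interface Connector {
  connect(): Promise<void>;
  execute(sql: string): Promise<{ rows: Record<string, unknown>[] }>;
  listSchemas(): Promise<string[]>;
  listTables(schema: string): Promise<string[]>;
  describeTable(schema: string, table: string): Promise<Record<string, string>>;
  close(): Promise<void>;
}

// Hypothetical in-memory connector showing a conforming implementation.
// A real driver would lazy-load its npm package via dynamic import(), e.g.
//   const pg = await import("pg"); // only imported when this driver is used
class MemoryConnector implements Connector {
  async connect(): Promise<void> {}
  async execute(sql: string): Promise<{ rows: Record<string, unknown>[] }> {
    return { rows: [{ ok: true, sql }] };
  }
  async listSchemas(): Promise<string[]> {
    return ["main"];
  }
  async listTables(_schema: string): Promise<string[]> {
    return ["orders"];
  }
  async describeTable(_schema: string, _table: string): Promise<Record<string, string>> {
    return { id: "INTEGER", amount: "DECIMAL(12,2)" };
  }
  async close(): Promise<void> {}
}

async function main(): Promise<void> {
  const conn: Connector = new MemoryConnector();
  await conn.connect();
  const result = await conn.execute("SELECT 1");
  console.log(result.rows.length); // 1
  await conn.close();
}
main();
```

Keeping the interface identical across drivers is what lets the dispatcher stay warehouse-agnostic, while the dynamic `import()` in each driver avoids any startup cost for databases you never connect to.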
diff --git a/docs/docs/examples/index.md b/docs/docs/examples/index.md new file mode 100644 index 0000000000..070aeb2142 --- /dev/null +++ b/docs/docs/examples/index.md @@ -0,0 +1,49 @@ +# Showcase + +Real-world examples showing what altimate can do across data engineering workflows. Each example demonstrates end-to-end automation — from discovery to implementation. + +
+ +- :material-pipe:{ .lg .middle } **Build, Test & Document dbt Models** + + --- + + Pull context from your Knowledge Hub, grab requirements from a Jira ticket, and build fully tested dbt models — all from your IDE. + + +- :material-snowflake:{ .lg .middle } **Find Broken Views in Snowflake** + + --- + + Create a "Sprint Work Agent" that queries Snowflake, finds empty views, traces root causes through dbt models, and files Jira tickets. + + +- :material-cash-multiple:{ .lg .middle } **Optimize Cost & Performance** + + --- + + Automate discovery and implementation of optimization opportunities across Snowflake, Databricks, and BigQuery. + + +- :material-swap-horizontal:{ .lg .middle } **Migrate PySpark to dbt** + + --- + + Convert a PySpark-based reporting project in Databricks to dbt with automated code conversion, testing, and validation. + + +- :material-bug:{ .lg .middle } **Debug an Airflow DAG** + + --- + + Use AI to debug Airflow DAGs by combining platform integrations, best-practice templates, and automated fix suggestions. + + +- :material-function:{ .lg .middle } **Write Snowflake UDFs** + + --- + + Use the Knowledge Hub to guide LLMs in building Snowflake UDFs with best practices, examples, and auto-generated documentation. + + +
diff --git a/docs/docs/getting-started.md b/docs/docs/getting-started.md index 86363af612..9f823fbea8 100644 --- a/docs/docs/getting-started.md +++ b/docs/docs/getting-started.md @@ -4,7 +4,7 @@ ## Why altimate? -altimate is the open-source data engineering harness — 99+ deterministic tools for building, validating, optimizing, and shipping data products. Unlike general-purpose coding agents, every tool is purpose-built for data engineering: +altimate is the open-source data engineering harness with 100+ deterministic tools for building, validating, optimizing, and shipping data products. Unlike general-purpose coding agents, every tool is purpose-built for data engineering: | Capability | General coding agents | altimate | |---|---|---| @@ -42,7 +42,7 @@ Then in the TUI: This walks you through selecting and authenticating with an LLM provider (Anthropic, OpenAI, Bedrock, Codex, Ollama, etc.). You need a working LLM connection before the agent can do anything useful. -## Step 3: Configure Your Warehouse +## Step 3: Configure Your Warehouse _(Optional)_ Set up warehouse connections so altimate can query your data platform. You have two options: @@ -54,19 +54,17 @@ Set up warehouse connections so altimate can query your data platform. You have `/discover` scans your environment and sets up everything automatically: -1. **Detects your dbt project** — finds `dbt_project.yml`, parses the manifest, and reads profiles -2. **Discovers warehouse connections** — from `~/.dbt/profiles.yml`, running Docker containers, and environment variables (e.g. `SNOWFLAKE_ACCOUNT`, `PGHOST`, `DATABASE_URL`) -3. **Checks installed tools** — dbt, sqlfluff, airflow, dagster, prefect, soda, sqlmesh, great_expectations, sqlfmt -4. **Offers to configure connections** — walks you through adding and testing each discovered warehouse -5. **Indexes schemas** — populates the schema cache for autocomplete and context-aware analysis +1. 
**Detects your dbt project** by finding `dbt_project.yml`, parsing the manifest, and reading profiles +2. **Discovers warehouse connections** from `~/.dbt/profiles.yml`, running Docker containers, and environment variables (e.g. `SNOWFLAKE_ACCOUNT`, `PGHOST`, `DATABASE_URL`) +3. **Checks installed tools** including dbt, sqlfluff, airflow, dagster, prefect, soda, sqlmesh, great_expectations, sqlfmt +4. **Offers to configure connections** and walks you through adding and testing each discovered warehouse +5. **Indexes schemas** to populate the schema cache for autocomplete and context-aware analysis Once complete, altimate indexes your schemas and detects your tooling, enabling schema-aware autocomplete and context-rich analysis. ### Option B: Manual configuration -Add a warehouse connection to your `altimate-code.json`. Here are minimal snippets for each warehouse type: - -#### Snowflake (quick-connect) +Add a warehouse connection to your `altimate-code.json`. Here's a quick example: ```json { @@ -83,52 +81,7 @@ Add a warehouse connection to your `altimate-code.json`. Here are minimal snippe } ``` -#### BigQuery (quick-connect) - -```json -{ - "warehouses": { - "bigquery": { - "type": "bigquery", - "project": "my-gcp-project", - "dataset": "analytics" - } - } -} -``` - -> Tip: Omit `service_account` to use Application Default Credentials (`gcloud auth application-default login`). - -#### Databricks (quick-connect) - -```json -{ - "warehouses": { - "databricks": { - "type": "databricks", - "host": "dbc-abc123.cloud.databricks.com", - "token": "${DATABRICKS_TOKEN}", - "warehouse_id": "abcdef1234567890", - "catalog": "main" - } - } -} -``` - -#### DuckDB (quick-connect) - -```json -{ - "warehouses": { - "duckdb": { - "type": "duckdb", - "database": "./dev.duckdb" - } - } -} -``` - -See [Warehouse connections](#warehouse-connections) below for full configuration options including key-pair auth, Redshift, and PostgreSQL. 
+For all warehouse types (Snowflake, BigQuery, Databricks, PostgreSQL, Redshift, DuckDB, MySQL, SQL Server, Oracle, SQLite) and advanced options (key-pair auth, ADC, SSH tunneling), see the [Warehouses reference](configure/warehouses.md). ## Step 4: Choose an Agent Mode altimate offers specialized agent modes for different workflows: | What do you want to do? | Use this agent mode | |---|---| -| Analyzing data without risk of changes | **Analyst** — read-only queries, cost analysis, data profiling | -| Building or generating dbt models | **Builder** — model scaffolding, SQL generation, ref() wiring | -| Validating data quality | **Validator** — test generation, anomaly detection, data contracts | -| Migrating across warehouses | **Migrator** — cross-dialect SQL translation, compatibility checks | -| Teaching team conventions | **Trainer** — learns corrections, enforces naming/style rules across team | -| Research and exploration | **Researcher** — deep-dive analysis, lineage tracing, impact assessment | -| Executive summaries and reports | **Executive** — high-level overviews, cost summaries, health dashboards | +| Analyzing data without risk of changes | **Analyst** for read-only queries, cost analysis, data profiling. SQL writes are blocked entirely. | +| Building or generating dbt models | **Builder** for model scaffolding, SQL generation, ref() wiring. SQL writes prompt for approval. | +| Planning before acting | **Plan** for outlining an approach before switching to builder to execute it | Switch modes in the TUI: @@ -162,94 +111,7 @@ altimate uses a JSON config file.
Create `altimate-code.json` in your project root. ### Warehouse connections -```json -{ - "warehouses": { - "prod-snowflake": { - "type": "snowflake", - "account": "xy12345.us-east-1", - "user": "analytics_user", - "password": "${SNOWFLAKE_PASSWORD}", - "warehouse": "COMPUTE_WH", - "database": "ANALYTICS", - "role": "ANALYST_ROLE" - }, - "dev-duckdb": { - "type": "duckdb", - "database": "./dev.duckdb" - } - } -} -``` - -### Snowflake (key-pair auth) - -```json -{ - "warehouses": { - "snowflake-prod": { - "type": "snowflake", - "account": "xy12345.us-east-1", - "user": "svc_altimate", - "private_key_path": "~/.ssh/snowflake_rsa_key.p8", - "warehouse": "COMPUTE_WH", - "database": "ANALYTICS", - "role": "SYSADMIN" - } - } -} -``` - -### BigQuery - -```json -{ - "warehouses": { - "bigquery-prod": { - "type": "bigquery", - "project": "my-gcp-project", - "dataset": "analytics", - "service_account": "/path/to/service-account.json" - } - } -} -``` - -Or use Application Default Credentials (ADC) — just omit `service_account` and run `gcloud auth application-default login`. - -### Databricks - -```json -{ - "warehouses": { - "databricks-prod": { - "type": "databricks", - "host": "dbc-abc123.cloud.databricks.com", - "token": "${DATABRICKS_TOKEN}", - "warehouse_id": "abcdef1234567890", - "catalog": "main", - "schema": "default" - } - } -} -``` - -### PostgreSQL / Redshift - -```json -{ - "warehouses": { - "postgres-dev": { - "type": "postgres", - "host": "localhost", - "port": 5432, - "database": "analytics", - "user": "analyst", - "password": "${PG_PASSWORD}" - } - } -} -``` +For all warehouse types and configuration options, see the [Warehouses reference](configure/warehouses.md). ## Project-level config @@ -302,7 +164,7 @@ altimate integrates with Codex in two ways: 1. Run `/connect` in the TUI 2. Select **Codex** as your provider 3. Authenticate via browser OAuth 4.
Your subscription covers all usage, so no API keys are needed See [Using with Codex](data-engineering/guides/using-with-codex.md) for details. @@ -358,10 +220,10 @@ Generate data quality tests for all models in the marts/ directory. For each mod ## Next steps -- [Terminal UI](usage/tui.md) — Learn the terminal interface, keybinds, and slash commands -- [CLI](usage/cli.md) — Subcommands, flags, and environment variables -- [Config Files](configure/config.md) — Full config file reference -- [Providers](configure/providers.md) — Set up Anthropic, OpenAI, Bedrock, Ollama, and more -- [Agent Modes](data-engineering/agent-modes.md) — Builder, Analyst, Validator, Migrator, Researcher, Trainer -- [Training](data-engineering/training/index.md) — Correct the agent once, it remembers forever, your team inherits it -- [Tools](data-engineering/tools/sql-tools.md) — 99+ specialized tools for SQL, dbt, and warehouses +- [Terminal UI](usage/tui.md): Learn the terminal interface, keybinds, and slash commands +- [CLI](usage/cli.md): Subcommands, flags, and environment variables +- [Config Files](configure/config.md): Full config file reference +- [Providers](configure/providers.md): Set up Anthropic, OpenAI, Bedrock, Ollama, and more +- [Agent Modes](data-engineering/agent-modes.md): Builder, Analyst, Plan +- [Training](data-engineering/training/index.md): Correct the agent once, it remembers forever, your team inherits it +- [Tools](data-engineering/tools/sql-tools.md): 100+ specialized tools for SQL, dbt, and warehouses diff --git a/docs/docs/getting-started/index.md b/docs/docs/getting-started/index.md new file mode 100644 index 0000000000..e7f5c373bb --- /dev/null +++ b/docs/docs/getting-started/index.md @@ -0,0 +1,182 @@ +--- +title: Altimate Code +hide: + - toc +--- + + + +
+ +

+ altimate-code +

+ +

Open-source data engineering harness.

+ +

100+ specialized data engineering tools for building, validating, optimizing, and shipping data products. Use in your terminal, CI pipeline, orchestration DAGs, or as the harness for your data agents. Evaluate across platforms, independent of any single warehouse provider.

+ +

+ +[Get Started](quickstart.md){ .md-button .md-button--primary } +[See Examples](../examples/index.md){ .md-button } +[View on GitHub :material-github:](https://github.com/AltimateAI/altimate-code){ .md-button } + +

+ +
+ +
+ +```bash +npm install -g altimate-code +``` + +
+ +--- + +

Why Altimate Code?

+

Every major data platform is building AI agents — but they're all locked to one ecosystem. Your data stack isn't.

+ +Your transformation logic is in dbt. Your orchestration is in Airflow or Dagster. Your warehouses span Snowflake and BigQuery (and maybe that Redshift cluster nobody wants to talk about). Your governance requirements cross every platform boundary. + +Altimate Code goes the other direction. It connects to your **entire** stack and lets you bring **any LLM** you want. No vendor lock-in. No platform tax. + +
+ +- :material-open-source-initiative:{ .lg .middle } **Open source & auditable** + + --- + + Every tool, every agent prompt, every analysis rule is inspectable, extensible, and auditable. For data teams in regulated industries, that's not a nice-to-have — it's a requirement. + +- :material-connection:{ .lg .middle } **Cross-platform, not single-vendor** + + --- + + Optimize a Snowflake query in the morning. Migrate a SQL Server pipeline to BigQuery in the afternoon. Same agent, same tools. No warehouse subscription required. First-class support for :material-snowflake: Snowflake, :material-google-cloud: BigQuery, :simple-databricks: Databricks, :material-elephant: PostgreSQL, :material-aws: Redshift, :material-duck: DuckDB, :material-database: MySQL, and :material-microsoft: SQL Server. + +- :material-cloud-outline:{ .lg .middle } **Works with any LLM** + + --- + + Model-agnostic — bring your own provider, use your existing subscription, or run locally. Swap models without swapping your harness. Supports :material-cloud: Anthropic, :material-creation: OpenAI, :material-google: Google Gemini, :material-google: Google Vertex AI, :material-aws: AWS Bedrock, :material-microsoft-azure: Azure OpenAI, :material-server: Ollama, :material-router-wireless: OpenRouter, :material-cog: Mistral, :material-lightning-bolt: Groq, :material-head-snowflake-outline: DeepInfra, :material-brain: Cerebras, :material-message-text: Cohere, :material-group: Together AI, :material-compass: Perplexity, :material-alpha-x-circle: xAI, and :material-github: GitHub Copilot. + +- :material-puzzle:{ .lg .middle } **Customizable to your workflow** + + --- + + Bring your own rules, agents, skills, and tools. Customize the framework to match your company's data conventions, naming standards, and testing patterns. 
+
+- :material-shield-check:{ .lg .middle } **Governed by design — three agent modes**
+
+    ---
+
+    Three agent modes — Builder, Analyst, and Plan — each with tool-level permissions you can `allow`, `ask`, or `deny` per agent. Create custom agents for specialized workflows. Layer on project rules via `AGENTS.md`, automatic context compaction for long sessions, and auto-formatting on every edit. Governance enforced by the harness.
+
+ +--- + +

100+ specialized tools

+

Unlike general-purpose coding agents, every tool is purpose-built for data engineering workflows.

+ +
+ +- :material-database-search:{ .lg .middle } **SQL Anti-Pattern Detection** + + --- + + 19 rules with confidence scoring. Catches SELECT *, missing filters, cartesian joins, non-sargable predicates, and more. 100% accuracy across 1,077 benchmark queries. + +- :material-graph-outline:{ .lg .middle } **Live Column-Level Lineage** + + --- + + Real-time lineage extraction from SQL. Trace any column back through joins, CTEs, and subqueries to its source. Not a cached graph — a living lineage that updates with every change. + +- :material-cash-multiple:{ .lg .middle } **FinOps & Cost Analysis** + + --- + + Credit analysis, expensive query detection, warehouse right-sizing, and unused resource cleanup. Specific optimization recommendations with estimated savings. + +- :material-translate:{ .lg .middle } **Cross-Dialect Translation** + + --- + + Deterministic engine translating SQL between Snowflake, BigQuery, Databricks, Redshift, PostgreSQL, MySQL, SQL Server, and DuckDB with lineage verification. + +- :material-shield-lock-outline:{ .lg .middle } **PII Detection & Safety** + + --- + + Automatic column scanning across 15+ PII categories. Safety checks and policy enforcement before every query touches production. + +- :material-pipe:{ .lg .middle } **dbt Native** + + --- + + Manifest parsing, test generation, model scaffolding, incremental model detection, and lineage-aware refactoring. Builds models that fit your project conventions. + +
+ +--- + +

See it in action

+

Build dbt models from Jira tickets, find broken Snowflake views, optimize warehouse costs, migrate PySpark to dbt, debug Airflow DAGs, and more — all from your terminal.

+ +

```bash

# Analyze a query for anti-patterns and optimization opportunities
> Analyze this query for issues:

# Translate SQL across dialects
> /sql-translate this Snowflake query to BigQuery:

# Get a cost report for your Snowflake or Databricks account
> /cost-report

# Scaffold a new dbt model following your project patterns
> /model-scaffold fct_revenue from stg_orders and stg_payments

# Generate column level lineage report for sensitive columns
# from a particular table and identify owners
> Trace the lineage for email_id and name columns from
  customer_data.customer_info table and generate a report
  of where sensitive data is replicated with table owners info

# Migrate PySpark jobs to dbt models
> Migrate this PySpark ETL to a dbt model:

# Debug a failing Airflow DAG
> Debug this Airflow DAG failure:
```

[:octicons-arrow-right-24: Browse more examples](../examples/index.md)

+ +--- + +

Benchmarks

+

Precision matters. Here's where we stand.

+ +| Benchmark | Result | +|---|---| +| **ADE-Bench (DuckDB Local)** | **74.4%** pass rate (32/43 tasks) — 15.4 points ahead of dbt Fusion+MCP (59%). | +| **SQL Anti-Pattern Detection** | 100% accuracy across 1,077 queries, 19 categories. Zero false positives. | +| **Column-Level Lineage** | 100% edge match across 500 queries with complex joins, CTEs, and subqueries. | +| **Snowflake Query Optimization (TPC-H)** | 16.8% average execution speedup (3.6x vs baseline). | + +

[:octicons-arrow-right-24: Full benchmark details](https://www.altimate.sh/benchmarks)

+ +--- + + diff --git a/docs/docs/getting-started/quickstart-new.md b/docs/docs/getting-started/quickstart-new.md new file mode 100644 index 0000000000..44ecc3540c --- /dev/null +++ b/docs/docs/getting-started/quickstart-new.md @@ -0,0 +1,197 @@ +--- +description: "Get value from Altimate Code in 10 minutes. For data engineers who know dbt, Snowflake, and SQL — skip the basics, see what Altimate adds to your workflow." +--- + +# Quickstart + +--- + +## Step 1: Install + +```bash +npm install -g altimate-code +``` + +Or via Homebrew: `brew install AltimateAI/tap/altimate-code` + +--- + +## Step 2: Connect Your LLM + +```bash +altimate # Launch the TUI +/connect # Interactive setup +``` + +Or set an environment variable and skip the prompt: + +```bash +export ANTHROPIC_API_KEY=sk-ant-... +altimate +``` + +> **No API key?** Select **Codex** in `/connect` — it's built-in with no setup. + +--- + +## Step 3: Connect Your Warehouse + +### Option A: Auto-detect from dbt profiles + +If you have `~/.dbt/profiles.yml` configured: + +```bash +/discover +``` + +Altimate reads your dbt profiles and creates warehouse connections automatically. You'll see output like: + +``` +Found dbt project: jaffle_shop (dbt-snowflake) +Found profile: snowflake_prod → Added connection 'snowflake_prod' +Indexing schema... 
142 tables, 1,847 columns indexed +``` + +### Option B: Manual configuration + +Add to `altimate-code.json` in your project root: + +=== "Snowflake" + + ```json + { + "connections": { + "snowflake": { + "type": "snowflake", + "account": "xy12345.us-east-1", + "user": "dbt_user", + "password": "${SNOWFLAKE_PASSWORD}", + "warehouse": "TRANSFORM_WH", + "database": "ANALYTICS", + "schema": "PUBLIC", + "role": "TRANSFORMER" + } + } + } + ``` + +=== "BigQuery" + + ```json + { + "connections": { + "bigquery": { + "type": "bigquery", + "project": "my-project-id", + "keyfile": "~/.config/gcloud/application_default_credentials.json" + } + } + } + ``` + +=== "PostgreSQL" + + ```json + { + "connections": { + "postgres": { + "type": "postgres", + "host": "localhost", + "port": 5432, + "database": "analytics", + "user": "postgres", + "password": "${POSTGRES_PASSWORD}" + } + } + } + ``` + +=== "DuckDB (local)" + + ```json + { + "connections": { + "local": { + "type": "duckdb", + "database": "./data/analytics.duckdb" + } + } + } + ``` + +Then index the schema for autocomplete and analysis: + +```bash +/schema-index snowflake +``` + +--- + +## Step 4: Your First Workflow — NYC Taxi Cab Analytics + +Try this end-to-end example. Paste this prompt into the TUI: + +``` +Take the New York City taxi cab public dataset, bring up a DuckDB instance, +and build a dashboard showing areas of maximum coverage and lowest coverage. +Set up a complete dbt project with staging, intermediate, and mart layers, +and create an Airflow DAG to orchestrate the pipeline. +``` + +**What altimate does:** + +1. **Downloads the NYC TLC trip data** into a local DuckDB instance +2. 
**Scaffolds a full dbt project** with proper directory structure: + ``` + nyc_taxi/ + models/ + staging/ + stg_yellow_trips.sql + stg_taxi_zones.sql + intermediate/ + int_trips_by_zone.sql + int_zone_coverage_stats.sql + marts/ + fct_zone_coverage.sql + dim_zones.sql + seeds/ + taxi_zone_lookup.csv + dbt_project.yml + profiles.yml # points to DuckDB + ``` +3. **Generates mart models** that aggregate pickup/dropoff counts per zone, rank zones by trip volume, and classify them as high-coverage or low-coverage +4. **Creates an Airflow DAG** (`dags/nyc_taxi_pipeline.py`) with tasks for data ingestion, `dbt run`, `dbt test`, and dashboard generation +5. **Builds an interactive dashboard** visualizing zone coverage across NYC — top zones, bottom zones, and geographic distribution + +This single prompt exercises warehouse connections, dbt scaffolding, SQL generation, orchestration wiring, and visualization — the full altimate toolkit. + +--- + +## Skill Discovery: What Can I Do? + +Type `/` in the TUI to see all available skills. Here's a quick reference for common tasks: + +| I want to... 
| Skill | Example | +| ------------------------- | ------------------- | -------------------------------------------------------- | +| Optimize a slow query | `/query-optimize` | `/query-optimize SELECT * FROM big_table` | +| Review SQL before merging | `/sql-review` | `/sql-review models/staging/stg_orders.sql` | +| Check Snowflake costs | `/cost-report` | `/cost-report` (last 30 days) | +| Scan for PII exposure | `/pii-audit` | `/pii-audit` (full schema) or `/pii-audit models/marts/` | +| Debug a dbt error | `/dbt-troubleshoot` | Paste the error message | +| Add tests to a model | `/dbt-test` | `/dbt-test models/staging/stg_orders.sql` | +| Document a model | `/dbt-docs` | `/dbt-docs models/marts/fct_revenue.sql` | +| Analyze downstream impact | `/dbt-analyze` | `/dbt-analyze stg_orders` (before refactoring) | +| Create a new dbt model | `/dbt-develop` | `Create a staging model for the raw_orders source` | +| Translate SQL dialects | `/sql-translate` | `/sql-translate snowflake bigquery SELECT DATEADD(...)` | +| Check migration safety | `/schema-migration` | `/schema-migration migrations/V003__alter_orders.sql` | +| Teach a pattern | `/teach` | `/teach @models/staging/stg_orders.sql` | + +**Pro tip:** You don't need to memorize these. Just describe what you want in plain English — the agent routes to the right skill automatically. 
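The `/sql-translate` row above covers rewrites like Snowflake's `DATEADD`, which has no literal BigQuery equivalent. A representative before/after (illustrative SQL, not captured tool output):

```sql
-- Snowflake input: add 7 days to a date
SELECT DATEADD(day, 7, order_date) AS due_date
FROM orders;

-- BigQuery equivalent after dialect translation
SELECT DATE_ADD(order_date, INTERVAL 7 DAY) AS due_date
FROM orders;
```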
+ +--- + +## What's Next + +- **[Setup](quickstart.md)** — Warehouses, LLM providers, agent modes, skills, and permissions +- **[Examples](../examples/index.md)** — End-to-end walkthroughs for common data engineering tasks +- **[Interfaces](../usage/tui.md)** — TUI, CLI, CI, IDE, and GitHub/GitLab integrations diff --git a/docs/docs/getting-started/quickstart.md b/docs/docs/getting-started/quickstart.md new file mode 100644 index 0000000000..d4395cfc05 --- /dev/null +++ b/docs/docs/getting-started/quickstart.md @@ -0,0 +1,443 @@ +--- +description: "Install altimate-code, connect your warehouse and LLM, configure agent modes, skills, and permissions." +--- + +# Setup + +> **You need:** npm 8+ or Homebrew. An API key for any supported LLM provider, or use Codex (built-in, no key required). + +--- + +## Step 1: Install + +```bash +# npm (recommended) +npm install -g altimate-code + +# Homebrew +brew install AltimateAI/tap/altimate-code +``` + +> **Zero additional setup.** One command install. + +--- + +## Step 2: Configure Your LLM + +```bash +altimate # Launch the TUI +/connect # Choose your provider and enter your API key +``` + +Or set an environment variable: + +```bash +export ANTHROPIC_API_KEY=your-key-here # Anthropic Claude (recommended) +export OPENAI_API_KEY=your-key-here # OpenAI +``` + +Minimal config file option (`altimate-code.json` in your project root): + +```json +{ + "provider": { + "anthropic": { + "apiKey": "{env:ANTHROPIC_API_KEY}" + } + }, + "model": "anthropic/claude-sonnet-4-6" +} +``` + +> **No API key?** Select **Codex** in the `/connect` menu. It's a built-in provider with no setup required. 
+ +### Changing your LLM provider + +Switch providers at any time by updating the `provider` and `model` fields in `altimate-code.json`: + +=== "Anthropic" + + ```json + { + "provider": { + "anthropic": { + "apiKey": "{env:ANTHROPIC_API_KEY}" + } + }, + "model": "anthropic/claude-sonnet-4-6" + } + ``` + +=== "OpenAI" + + ```json + { + "provider": { + "openai": { + "apiKey": "{env:OPENAI_API_KEY}" + } + }, + "model": "openai/gpt-4o" + } + ``` + +=== "AWS Bedrock" + + ```json + { + "provider": { + "bedrock": { + "region": "us-east-1", + "accessKeyId": "{env:AWS_ACCESS_KEY_ID}", + "secretAccessKey": "{env:AWS_SECRET_ACCESS_KEY}" + } + }, + "model": "bedrock/anthropic.claude-sonnet-4-6-v1" + } + ``` + +=== "Azure OpenAI" + + ```json + { + "provider": { + "azure": { + "apiKey": "{env:AZURE_OPENAI_API_KEY}", + "baseURL": "https://your-resource.openai.azure.com/openai/deployments/your-deployment" + } + }, + "model": "azure/gpt-4o" + } + ``` + +=== "Google Gemini" + + ```json + { + "provider": { + "google": { + "apiKey": "{env:GOOGLE_API_KEY}" + } + }, + "model": "google/gemini-2.5-pro" + } + ``` + +=== "Ollama (Local)" + + ```json + { + "provider": { + "ollama": { + "baseURL": "http://localhost:11434" + } + }, + "model": "ollama/llama3.1" + } + ``` + +=== "OpenRouter" + + ```json + { + "provider": { + "openrouter": { + "apiKey": "{env:OPENROUTER_API_KEY}" + } + }, + "model": "openrouter/anthropic/claude-sonnet-4-6" + } + ``` + +You can also set a smaller model for lightweight tasks like summarization: + +```json +{ + "model": "anthropic/claude-sonnet-4-6", + "small_model": "anthropic/claude-haiku-4-5-20251001" +} +``` + +--- + +## Step 3: Connect Your Warehouse + +### Auto-discover with `/discover` + +> Skip this step if you want to work locally. You can always run `/discover` later. 
+ +```bash +altimate /discover +``` + +Auto-detects your dbt projects, warehouse credentials from `~/.dbt/profiles.yml`, running Docker containers, and environment variables (`SNOWFLAKE_ACCOUNT`, `PGHOST`, `DATABASE_URL`, etc.). + +### Manual configuration + +Add a warehouse connection to `altimate-code.json`: + +=== "Snowflake" + + ```json + { + "warehouses": { + "snowflake": { + "type": "snowflake", + "account": "xy12345.us-east-1", + "user": "dbt_user", + "password": "{env:SNOWFLAKE_PASSWORD}", + "warehouse": "TRANSFORM_WH", + "database": "ANALYTICS", + "schema": "PUBLIC", + "role": "TRANSFORMER" + } + } + } + ``` + +=== "BigQuery" + + ```json + { + "warehouses": { + "bigquery": { + "type": "bigquery", + "project": "my-project-id", + "credentials_path": "~/.config/gcloud/application_default_credentials.json" + } + } + } + ``` + +=== "Databricks" + + ```json + { + "warehouses": { + "databricks": { + "type": "databricks", + "server_hostname": "dbc-abc123.cloud.databricks.com", + "http_path": "/sql/1.0/warehouses/abcdef", + "access_token": "{env:DATABRICKS_TOKEN}", + "catalog": "main", + "schema": "default" + } + } + } + ``` + +=== "PostgreSQL" + + ```json + { + "warehouses": { + "postgres": { + "type": "postgres", + "host": "localhost", + "port": 5432, + "database": "analytics", + "user": "postgres", + "password": "{env:POSTGRES_PASSWORD}" + } + } + } + ``` + +=== "DuckDB" + + ```json + { + "warehouses": { + "local": { + "type": "duckdb", + "path": "./data/analytics.duckdb" + } + } + } + ``` + +=== "Redshift" + + ```json + { + "warehouses": { + "redshift": { + "type": "redshift", + "host": "my-cluster.abc123.us-east-1.redshift.amazonaws.com", + "port": 5439, + "database": "analytics", + "user": "admin", + "password": "{env:REDSHIFT_PASSWORD}" + } + } + } + ``` + +All warehouse types support SSH tunneling for bastion hosts. See the [Warehouses reference](../configure/warehouses.md) for full options including key-pair auth, IAM roles, and ADC. 
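Each tab above shows a single connection, but the `warehouses` map can hold several side by side — for example a production Snowflake target plus a local DuckDB for development (account, warehouse, and path values below are illustrative):

```json
{
  "warehouses": {
    "snowflake-prod": {
      "type": "snowflake",
      "account": "xy12345.us-east-1",
      "user": "dbt_user",
      "password": "{env:SNOWFLAKE_PASSWORD}",
      "warehouse": "TRANSFORM_WH",
      "database": "ANALYTICS",
      "schema": "PUBLIC",
      "role": "TRANSFORMER"
    },
    "local-dev": {
      "type": "duckdb",
      "path": "./data/analytics.duckdb"
    }
  }
}
```

Each key is the connection name you pass to tools such as `warehouse_test`.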
+ +Verify your connection: + +``` +> warehouse_test snowflake +✓ Connected successfully +``` + +--- + +## Step 4: Choose an Agent Mode + +altimate ships with specialized agent modes, each with its own tool permissions: + +| Mode | Access | Use when you want to... | +|---|---|---| +| **Builder** | Read/Write | Create and modify SQL, dbt models, pipelines. SQL writes prompt for approval. | +| **Analyst** | Read-only | Explore production data safely, run cost analysis. SQL writes denied entirely. | +| **Plan** | Minimal | Plan an approach before switching to builder to execute it | + +Switch modes in the TUI: + +``` +/agent analyst +``` + +Or from the CLI: + +```bash +altimate --agent analyst +``` + +The **Analyst** mode is production-safe — it blocks INSERT, UPDATE, DELETE, and DROP statements at the harness level. The **Builder** mode has full read/write access for creating and editing SQL and dbt files. + +--- + +## Step 5: Select Skills + +Skills are reusable prompt templates for common workflows. Type `/` in the TUI to browse all available skills: + +| Skill | Purpose | +|---|---| +| `/query-optimize` | Optimize slow queries with anti-pattern detection | +| `/sql-review` | SQL quality gate with grading | +| `/sql-translate` | Cross-dialect SQL translation | +| `/cost-report` | Snowflake/Databricks cost analysis | +| `/pii-audit` | Scan for PII exposure | +| `/dbt-develop` | Scaffold new dbt models | +| `/dbt-test` | Generate dbt tests | +| `/dbt-docs` | Generate dbt documentation | +| `/dbt-analyze` | Column-level lineage and impact analysis | +| `/dbt-troubleshoot` | Debug dbt errors | +| `/data-viz` | Interactive dashboards and visualizations | +| `/teach` | Teach patterns from example files | +| `/train` | Load standards from documents | + +You don't need to memorize these — describe what you want in plain English and the agent routes to the right skill automatically. 
+ +### Custom skills + +Add your own skills as Markdown files in `.altimate-code/skill/`: + +```markdown +--- +name: cost-review +description: Review SQL queries for cost optimization +--- + +Analyze the SQL query for cost optimization opportunities. +Focus on: $ARGUMENTS +``` + +Skills are loaded from these paths (highest priority first): + +1. `.altimate-code/skill/` (project) +2. `~/.altimate-code/skills/` (global) +3. Custom paths via config: + +```json +{ + "skills": { + "paths": ["./my-skills", "~/shared-skills"] + } +} +``` + +--- + +## Step 6: Configure Permissions + +Governance is enforced at the harness level, not via prompts. Every tool has a permission level: `allow`, `ask`, or `deny`. + +### Per-agent permissions + +Set tool permissions for each agent mode in `altimate-code.json`: + +```json +{ + "agent": { + "analyst": { + "permission": { + "write": "deny", + "edit": "deny", + "bash": { + "dbt docs generate": "allow", + "*": "deny" + } + } + }, + "builder": { + "permission": { + "write": "allow", + "edit": "allow", + "bash": { + "dbt *": "allow", + "rm -rf *": "deny" + } + } + } + } +} +``` + +### Project rules with AGENTS.md + +Define project-wide conventions in an `AGENTS.md` file at your project root. 
These rules are automatically loaded into every agent's system prompt: + +```markdown +# Project Rules + +- All staging models must be prefixed with `stg_` +- Never run queries without a WHERE clause on production tables +- Use `ref()` instead of hardcoded table names in dbt models +- All new models require at least one unique test and one not_null test +``` + +### Default permissions by agent mode + +| Agent | File writes | SQL writes | Bash | Training | +|---|---|---|---|---| +| Builder | allow | ask (prompts for approval) | ask | allow | +| Analyst | deny | deny (blocked entirely) | deny (safe commands auto-allowed) | allow | +| Plan | deny | deny | deny | deny | + +--- + +## Step 7: Build Your First Artifact + +In the TUI, paste this prompt: + +``` +Build a NYC taxi analytics dashboard using BigQuery public data and dbt +for transformations. Include geographic demand analysis with +pickup/dropoff hotspots, top routes, airport traffic, and borough +comparisons. Add revenue analytics with fare breakdowns, fare +distribution, tip analysis, payment trends, and revenue-per-mile +by route. +``` + +--- + +## What's Next + +- [Agent Modes](../data-engineering/agent-modes.md): Deep dive into each mode's capabilities +- [Warehouses Reference](../configure/warehouses.md): All warehouse types, auth methods, SSH tunneling +- [Config Reference](../configure/config.md): Full config file schema +- [CI & Automation](../data-engineering/guides/ci-headless.md): Run altimate in automated pipelines diff --git a/docs/docs/index.md b/docs/docs/index.md index 3abd9c34cf..63085ca92b 100644 --- a/docs/docs/index.md +++ b/docs/docs/index.md @@ -17,7 +17,7 @@ hide:

The open-source data engineering harness.

-

99+ tools for building, validating, optimizing, and shipping data products. Use in your terminal, CI pipeline, orchestration DAGs, or as the harness for your data agents. Evaluate across any platform — independent of a single warehouse provider.

+

100+ tools for building, validating, optimizing, and shipping data products. Use in your terminal, CI pipeline, orchestration DAGs, or as the harness for your data agents. Evaluate across any platform, independent of a single warehouse provider.

@@ -39,7 +39,7 @@ npm install -g altimate-code ---

Purpose-built for the data product lifecycle

-

Every tool covers a specific stage — build, validate, optimize, or ship. Not general-purpose AI on top of SQL files.

+

Every tool covers a specific stage: build, validate, optimize, or ship. Not general-purpose AI on top of SQL files.

@@ -92,7 +92,7 @@ npm install -g altimate-code --- - Interactive TUI with 99+ tools, autocomplete for skills, and persistent memory across sessions. + Interactive TUI with 100+ tools, autocomplete for skills, and persistent memory across sessions. - :material-pipe-disconnected:{ .lg .middle } **CI Pipeline** @@ -110,13 +110,13 @@ npm install -g altimate-code --- - Mount altimate as the tool layer underneath Claude Code, Codex, or any AI agent — giving it deterministic, warehouse-aware capabilities. + Mount altimate as the tool layer underneath Claude Code, Codex, or any AI agent, giving it deterministic, warehouse-aware capabilities.
--- -

Seven specialized agents

+

Purpose-built agent modes

Each agent has scoped permissions and purpose-built tools for its role.

@@ -125,50 +125,28 @@ npm install -g altimate-code --- - Create dbt models, SQL pipelines, and data transformations with full read/write access. + Create dbt models, SQL pipelines, and data transformations with full read/write access. SQL writes prompt for approval. Destructive SQL is hard-blocked. - :material-chart-bar:{ .lg .middle } **Analyst** --- - Explore data, run SELECT queries, and generate insights. Read-only access is enforced. + Explore data, run SELECT queries, and generate insights. Read-only access is enforced. SQL writes are denied, not prompted. Safe bash commands auto-allowed. -- :material-check-decagram:{ .lg .middle } **Validator** +- :material-clipboard-text:{ .lg .middle } **Plan** --- - Data quality checks, schema validation, test coverage analysis, and CI gating. - -- :material-swap-horizontal:{ .lg .middle } **Migrator** - - --- - - Cross-warehouse SQL translation, schema migration, and dialect conversion workflows. - -- :material-magnify:{ .lg .middle } **Researcher** - - --- - - Deep multi-step investigations with structured reports. Root cause analysis, cost audits, deprecation checks. - -- :material-school:{ .lg .middle } **Trainer** - - --- - - Correct the agent once, it remembers forever, your team inherits it. Teach patterns, rules, and domain knowledge. - -- :material-account-tie:{ .lg .middle } **Executive** - - --- - - Business-friendly reporting. No SQL jargon — translates technical findings into impact and recommendations. + Plan before acting. Read-only with minimal permissions. No SQL, no bash, no file modifications.
+Create custom agents with tailored permissions for specialized workflows like validation, migration, research, or executive reporting. See [Agent Configuration](configure/agents.md#custom-agents). + ---

Works with any LLM

-

Model-agnostic — bring your own provider or run locally.

+

Model-agnostic. Bring your own provider or run locally.

@@ -185,7 +163,7 @@ npm install -g altimate-code ---

Evaluate across any platform

-

First-class support for 8 warehouses. Migrate, compare, and translate across platforms — not locked to one vendor.

+

First-class support for 10 databases. Migrate, compare, and translate across platforms, not locked to one vendor.

@@ -197,6 +175,8 @@ npm install -g altimate-code - :material-duck: **DuckDB** - :material-database: **MySQL** - :material-microsoft: **SQL Server** +- :material-database-outline: **Oracle** +- :material-database-search: **SQLite**
@@ -204,8 +184,8 @@ npm install -g altimate-code diff --git a/docs/docs/llms.txt b/docs/docs/llms.txt index 70eaa8733f..f287812765 100644 --- a/docs/docs/llms.txt +++ b/docs/docs/llms.txt @@ -1,42 +1,42 @@ # altimate-code llms.txt # AI-friendly documentation index for altimate-code -# Generated: 2026-03-17 | Version: v0.4.1 -# Source: https://altimateai.github.io/altimate-code +# Generated: 2026-03-18 | Version: v0.5.0 +# Source: https://docs.altimate.sh -> altimate-code is an open-source data engineering harness — 99+ tools for building, validating, optimizing, and shipping data products. Use in your terminal, CI pipeline, orchestration DAGs, or as the tool layer for your data agents. Includes a deterministic SQL Intelligence Engine (100% F1 across 1,077 queries), column-level lineage, FinOps analysis, PII detection, and dbt integration. Works with any LLM provider. Local-first, MIT-licensed. +> altimate-code is an open-source data engineering harness with 100+ tools for building, validating, optimizing, and shipping data products. Use in your terminal, CI pipeline, orchestration DAGs, or as the tool layer for your data agents. Includes a deterministic SQL Intelligence Engine (100% F1 across 1,077 queries), column-level lineage, FinOps analysis, PII detection, and dbt integration. Works with any LLM provider. Local-first, MIT-licensed. ## Get Started -- [Quickstart (5 min)](https://altimateai.github.io/altimate-code/quickstart/): Install altimate, configure your LLM provider, connect your warehouse, and run your first query in under 5 minutes. -- [Full Setup Guide](https://altimateai.github.io/altimate-code/getting-started/): Complete installation, warehouse configuration for all 8 supported warehouses, LLM provider setup, and first-run walkthrough. -- [Network & Proxy](https://altimateai.github.io/altimate-code/network/): Proxy configuration, CA certificate setup, firewall requirements. 
+- [Quickstart (5 min)](https://docs.altimate.sh/quickstart/): Install altimate, configure your LLM provider, connect your warehouse, and run your first query in under 5 minutes.
+- [Full Setup Guide](https://docs.altimate.sh/getting-started/): Complete installation, warehouse configuration for all 10 supported warehouses, LLM provider setup, and first-run walkthrough.
+- [Network & Proxy](https://docs.altimate.sh/reference/network/): Proxy configuration, CA certificate setup, firewall requirements.

## Data Engineering

-- [Agent Modes](https://altimateai.github.io/altimate-code/data-engineering/agent-modes/): 7 specialized agents — Builder (full read/write), Analyst (read-only enforced), Validator, Migrator, Researcher, Trainer, Executive — each with scoped permissions and purpose-built tool access.
-- [Training Overview](https://altimateai.github.io/altimate-code/data-engineering/training/): How to teach altimate project-specific patterns, naming conventions, and corrections that persist across sessions and team members.
-- [Team Deployment](https://altimateai.github.io/altimate-code/data-engineering/training/team-deployment/): How to commit training to git so your entire team inherits SQL conventions automatically.
-- [SQL Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/sql-tools/): 9 SQL analysis tools with 19 anti-pattern rules. 100% F1 accuracy on 1,077 benchmark queries.
-- [Schema Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/schema-tools/): Warehouse schema introspection, metadata indexing, and column-level analysis tools.
-- [FinOps Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/finops-tools/): Credit analysis, expensive query detection, warehouse right-sizing, unused resource cleanup, RBAC auditing.
-- [Lineage Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/lineage-tools/): Column-level lineage extraction from SQL.
100% edge-match accuracy on 500 benchmark queries.
-- [dbt Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/dbt-tools/): dbt manifest parsing, test generation, model scaffolding, incremental logic detection.
-- [Warehouse Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/warehouse-tools/): Direct connectivity to Snowflake, BigQuery, Databricks, PostgreSQL, Redshift, DuckDB, MySQL, SQL Server.
-- [Memory Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/memory-tools/): Session memory, persistent corrections, and team training storage.
-- [Cost Optimization Guide](https://altimateai.github.io/altimate-code/data-engineering/guides/cost-optimization/): Step-by-step warehouse cost reduction with before/after SQL examples and savings estimates.
-- [Migration Guide](https://altimateai.github.io/altimate-code/data-engineering/guides/migration/): Cross-warehouse SQL migration with side-by-side examples.
-- [CI & Headless Mode](https://altimateai.github.io/altimate-code/data-engineering/guides/ci-headless/): Non-interactive use in GitHub Actions, scheduled jobs, and pre-commit hooks.
+- [Agent Modes](https://docs.altimate.sh/data-engineering/agent-modes/): 3 agent modes (Builder, Analyst, Plan), each with scoped permissions and purpose-built tool access. Builder has full read/write; Analyst has read-only enforced.
+- [Training Overview](https://docs.altimate.sh/data-engineering/training/): How to teach altimate project-specific patterns, naming conventions, and corrections that persist across sessions and team members.
+- [Team Deployment](https://docs.altimate.sh/data-engineering/training/team-deployment/): How to commit training to git so your entire team inherits SQL conventions automatically.
+- [SQL Tools](https://docs.altimate.sh/data-engineering/tools/sql-tools/): 9 SQL analysis tools with 19 anti-pattern rules.
100% F1 accuracy on 1,077 benchmark queries.
+- [Schema Tools](https://docs.altimate.sh/data-engineering/tools/schema-tools/): Warehouse schema introspection, metadata indexing, and column-level analysis tools.
+- [FinOps Tools](https://docs.altimate.sh/data-engineering/tools/finops-tools/): Credit analysis, expensive query detection, warehouse right-sizing, unused resource cleanup, RBAC auditing.
+- [Lineage Tools](https://docs.altimate.sh/data-engineering/tools/lineage-tools/): Column-level lineage extraction from SQL. 100% edge-match accuracy on 500 benchmark queries.
+- [dbt Tools](https://docs.altimate.sh/data-engineering/tools/dbt-tools/): dbt manifest parsing, test generation, model scaffolding, incremental logic detection.
+- [Warehouse Tools](https://docs.altimate.sh/data-engineering/tools/warehouse-tools/): Direct connectivity to Snowflake, BigQuery, Databricks, PostgreSQL, Redshift, DuckDB, MySQL, SQL Server, Oracle, and SQLite.
+- [Memory Tools](https://docs.altimate.sh/data-engineering/tools/memory-tools/): Session memory, persistent corrections, and team training storage.
+- [Cost Optimization Guide](https://docs.altimate.sh/data-engineering/guides/cost-optimization/): Step-by-step warehouse cost reduction with before/after SQL examples and savings estimates.
+- [Migration Guide](https://docs.altimate.sh/data-engineering/guides/migration/): Cross-warehouse SQL migration with side-by-side examples.
+- [CI & Headless Mode](https://docs.altimate.sh/data-engineering/guides/ci-headless/): Non-interactive use in GitHub Actions, scheduled jobs, and pre-commit hooks.
## Configure
-- [Configuration Overview](https://altimateai.github.io/altimate-code/configure/config/): Full altimate-code.json schema, value substitution, project structure, experimental flags.
-- [Providers](https://altimateai.github.io/altimate-code/configure/providers/): 17 LLM provider configurations with JSON examples: Anthropic, OpenAI, Google Gemini, Vertex AI, Amazon Bedrock, Azure OpenAI, Mistral, Groq, Ollama, and more. -- [Agent Skills](https://altimateai.github.io/altimate-code/configure/skills/): How to configure, discover, and add custom skills. -- [Permissions](https://altimateai.github.io/altimate-code/configure/permissions/): Permission levels, pattern matching, per-agent restrictions, deny rules for destructive SQL. -- [Tracing](https://altimateai.github.io/altimate-code/configure/tracing/): Local-first observability — trace schema, span types, live viewing, remote OTLP exporters, crash recovery. -- [Telemetry](https://altimateai.github.io/altimate-code/configure/telemetry/): 25 anonymized event types, privacy guarantees, opt-out instructions. +- [Configuration Overview](https://docs.altimate.sh/configure/config/): Full altimate-code.json schema, value substitution, project structure, experimental flags. +- [Providers](https://docs.altimate.sh/configure/providers/): 17 LLM provider configurations with JSON examples: Anthropic, OpenAI, Google Gemini, Vertex AI, Amazon Bedrock, Azure OpenAI, Mistral, Groq, Ollama, and more. +- [Agent Skills](https://docs.altimate.sh/configure/skills/): How to configure, discover, and add custom skills. +- [Permissions](https://docs.altimate.sh/configure/permissions/): Permission levels, pattern matching, per-agent restrictions, deny rules for destructive SQL. +- [Tracing](https://docs.altimate.sh/configure/tracing/): Local-first observability covering trace schema, span types, live viewing, remote OTLP exporters, and crash recovery. +- [Telemetry](https://docs.altimate.sh/reference/telemetry/): 25 anonymized event types, privacy guarantees, opt-out instructions. 
## Reference -- [Security FAQ](https://altimateai.github.io/altimate-code/security-faq/): 12 Q&A pairs on data handling, credentials, permissions, network endpoints, and team hardening. -- [Troubleshooting](https://altimateai.github.io/altimate-code/troubleshooting/): 6 common issues with step-by-step fixes, including Python bridge failures and warehouse connection errors. +- [Security FAQ](https://docs.altimate.sh/reference/security-faq/): 12 Q&A pairs on data handling, credentials, permissions, network endpoints, and team hardening. +- [Troubleshooting](https://docs.altimate.sh/reference/troubleshooting/): 6 common issues with step-by-step fixes, including tool execution errors and warehouse connection setup. diff --git a/docs/docs/quickstart.md b/docs/docs/quickstart.md index a0292ffaad..273c5bec2b 100644 --- a/docs/docs/quickstart.md +++ b/docs/docs/quickstart.md @@ -1,28 +1,28 @@ --- -description: "Install altimate-code and run your first SQL analysis. The open-source data engineering harness — 99+ tools for building, validating, optimizing, and shipping data products." +description: "Install altimate-code and run your first SQL analysis. The open-source data engineering harness with 100+ tools for building, validating, optimizing, and shipping data products." --- # Quickstart -> **You need:** npm 8+ or Homebrew. An API key for any supported LLM provider — or use Codex (built-in, no key required). +> **You need:** npm 8+ or Homebrew. An API key for any supported LLM provider, or use Codex (built-in, no key required). --- -## Step 1 — Install +## Step 1: Install ```bash # npm (recommended) -npm install -g @altimateai/altimate-code +npm install -g altimate-code # Homebrew brew install AltimateAI/tap/altimate-code ``` -> **Zero Python setup required.** On first run, the CLI automatically downloads `uv`, creates an isolated Python environment, and installs the data engine. No `pip install`, no virtualenv management. 
+> **Zero additional setup.** One command install. --- -## Step 2 — Configure Your LLM +## Step 2: Configure Your LLM ```bash altimate # Launch the TUI @@ -48,11 +48,11 @@ Minimal config file option (`altimate-code.json` in your project root): } ``` -> **No API key?** Select **Codex** in the `/connect` menu — it's a built-in provider with no setup required. +> **No API key?** Select **Codex** in the `/connect` menu. It's a built-in provider with no setup required. --- -## Step 3 — Connect Your Warehouse _(Optional)_ +## Step 3: Connect Your Warehouse _(Optional)_ > Skip this step if you want to work locally or don't need warehouse/orchestration connections. You can always run `/discover` later. @@ -60,7 +60,7 @@ Minimal config file option (`altimate-code.json` in your project root): altimate /discover ``` -`/discover` scans for dbt projects, warehouse credentials (from `~/.dbt/profiles.yml`, environment variables, and Docker), and installed tools. It **reads but never writes** — safe to run against production. +Auto-detects your dbt projects, warehouse credentials, and installed tools. See [Full Setup](getting-started.md#step-3-configure-your-warehouse-optional) for details on what `/discover` finds and manual configuration options. 
**No cloud warehouse?** Use DuckDB with a local file: @@ -77,7 +77,7 @@ altimate /discover --- -## Step 4 — Build Your First Artifact +## Step 4: Build Your First Artifact In the TUI, try these prompts or describe your own use case: @@ -97,6 +97,6 @@ Build me a real time, interactive dashboard for my macbook system metrics and he ## What's Next -- [Full Setup](getting-started.md) — All warehouse configs, LLM providers, advanced setup -- [Agent Modes](data-engineering/agent-modes.md) — Choose the right agent for your task -- [CI & Automation](data-engineering/guides/ci-headless.md) — Run altimate in automated pipelines +- [Full Setup](getting-started.md): All warehouse configs, LLM providers, advanced setup +- [Agent Modes](data-engineering/agent-modes.md): Choose the right agent for your task +- [CI & Automation](data-engineering/guides/ci-headless.md): Run altimate in automated pipelines diff --git a/docs/docs/reference/changelog.md b/docs/docs/reference/changelog.md new file mode 100644 index 0000000000..e48494e67a --- /dev/null +++ b/docs/docs/reference/changelog.md @@ -0,0 +1,367 @@ +# Changelog + +All notable changes to this project will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + +## How to check your version + +```bash +altimate --version +``` + +## How to upgrade + +```bash +npm update -g altimate-code +``` + +After upgrading, the TUI welcome banner shows what changed since your previous version. 
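+
+If you installed with Homebrew rather than npm, upgrade through the tap instead. This assumes the `AltimateAI/tap/altimate-code` formula shown in the install instructions; adjust the name if your tap differs:
+
+```bash
+brew upgrade altimate-code
+```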
+ +--- + +## [0.5.0] - 2026-03-18 + +### Added + +- Smooth streaming mode for TUI response rendering (#281) +- Ship builtin skills to customers via postinstall (#279) +- `/configure-claude` and `/configure-codex` built-in commands (#235) + +### Fixed + +- Brew formula stuck at v0.3.1, version normalization in publish pipeline (#286) +- Harden auth field handling for all warehouse drivers (#271) +- Suppress console logging that corrupts TUI display (#269) + +## [0.4.9] - 2026-03-18 + +### Added + +- Script to build and run compiled binary locally (#262) + +### Fixed + +- Snowflake auth — support all auth methods (`password`, `keypair`, `externalbrowser`, `oauth`), fix field name mismatches (#268) +- dbt tool regression — schema format mismatch, silent failures, wrong results (#263) +- `altimate-dbt compile`, `execute`, and children commands fail with runtime errors (#255) +- `Cannot find module @altimateai/altimate-core` on `npm install` (#259) +- Dispatcher tests fail in CI due to shared module state (#257) + +### Changed + +- CI: parallel per-target builds — 12 jobs, ~5 min wall clock instead of ~20 min (#254) +- CI: faster release — build parallel with test, lower compression, tighter timeouts (#251) +- Docker E2E tests skip in CI unless explicitly opted in (#253) + +## [0.4.2] - 2026-03-18 + +### Breaking Changes + +- **Python engine eliminated** — all 73 tool methods now run natively in TypeScript. No Python, pip, venv, or `altimate-engine` installation required. Fixes #210. 
+ +### Added + +- `@altimateai/drivers` shared workspace package with 10 database drivers (Snowflake, BigQuery, PostgreSQL, Databricks, Redshift, MySQL, SQL Server, Oracle, DuckDB, SQLite) +- Direct `@altimateai/altimate-core` napi-rs bindings — SQL analysis calls go straight to Rust (no Python intermediary) +- dbt-first SQL execution — automatically uses `profiles.yml` connection when in a dbt project +- Warehouse telemetry (5 event types: connect, query, introspection, discovery, census) +- 340+ new tests including E2E tests against live Snowflake, BigQuery, and Databricks accounts +- Encrypted key-pair auth support for Snowflake (PKCS8 PEM with passphrase) +- Comprehensive driver documentation at `docs/docs/drivers.md` + +### Fixed + +- Python bridge connection failures for UV, conda, and non-standard venv setups (#210) +- SQL injection in finops/schema queries (parameterized queries + escape utility) +- Credential store no longer saves plaintext passwords +- SSH tunnel cleanup on SIGINT/SIGTERM +- Race condition in connection registry for concurrent access +- Databricks DATE_SUB syntax +- Redshift describeTable column name +- SQL Server describeTable includes views +- Dispatcher telemetry wrapped in try/catch +- Flaky test timeouts + +### Removed + +- `packages/altimate-engine/` — entire Python package (~17,000 lines) +- `packages/opencode/src/altimate/bridge/` — JSON-RPC bridge +- `.github/workflows/publish-engine.yml` — PyPI publish workflow + +## [0.4.1] - 2026-03-16 + +### Added + +- Local-first tracing system replacing Langfuse (#183) + +### Fixed + +- Engine not found when user's project has `.venv` in cwd — managed venv now takes priority (#199) +- Missing `[warehouses]` pip extra causing FinOps tools to fail with "snowflake-connector-python not installed" (#199) +- Engine install trusting stale manifest when venv/Python binary was deleted (#199) +- Extras changes not detected on upgrade — manifest now tracks installed extras (#199) +- Windows path 
handling for dev/cwd venv resolution (#199) +- Concurrent bridge startup race condition — added `pendingStart` mutex (#199) +- Unhandled spawn `error` event crashing host process on invalid Python path (#199) +- Bridge hung permanently after ping failure — child process now cleaned up (#199) +- `restartCount` incorrectly incremented on signal kills, prematurely disabling bridge (#199) +- TUI prompt corruption from engine bootstrap messages writing to stderr (#180) +- Tracing exporter timeout leaking timers (#191) +- Feedback submission failing when repo labels don't exist (#188) +- Pre-release security and resource cleanup fixes for tracing (#197) + +## [0.4.0] - 2026-03-15 + +### Added + +- Data-viz skill for data storytelling and visualizations (#170) +- AI Teammate training system with learn-by-example patterns (#148) + +### Fixed + +- Sidebar shows "OpenCode" instead of "Altimate Code" after upstream merge (#168) +- Prevent upstream tags from polluting origin (#165) +- Show welcome box on first CLI run, not during postinstall (#163) + +### Changed + +- Engine version bumped to 0.4.0 + +## [0.3.1] - 2026-03-15 + +### Fixed + +- Database migration crash when upgrading from v0.2.x — backfill NULL migration names for Drizzle beta.16 compatibility (#161) +- Install banner not visible during `npm install` — moved output from stdout to stderr (#161) +- Verbose changelog dump removed from CLI startup (#161) +- `altimate upgrade` detection broken — `method()` and `latest()` referenced upstream `opencode-ai` package names instead of `@altimateai/altimate-code` (#161) +- Brew formula detection and upgrade referencing `opencode` instead of `altimate-code` (#161) +- Homebrew tap updated to v0.3.0 (was stuck at 0.1.4 due to expired `HOMEBREW_TAP_TOKEN`) (#161) +- `.opencode/memory/` references in docs updated to `.altimate-code/memory/` (#161) +- Stale `@opencode-ai/plugin` reference in CONTRIBUTING.md (#161) + +### Changed + +- CI now uses path-based change detection to 
skip unaffected jobs (saves ~100s on non-TS changes) (#161) +- Release workflow gated on test job passing (#157) +- Upstream merge restricted to published GitHub releases only (#150) + +## [0.3.0] - 2026-03-15 + +### Added + +- AI-powered prompt enhancement (#144) +- Altimate Memory — persistent cross-session memory with TTL, namespaces, citations, and audit logging (#136) +- Upstream merge with OpenCode v1.2.26 (#142) + +### Fixed + +- Sentry review findings from PR #144 (#147) +- OAuth token refresh retry and error handling for idle timeout (#133) +- Welcome banner on first CLI run after install/upgrade (#132) +- `@altimateai/altimate-code` npm package name restored after upstream rebase +- Replace `mock.module()` with `spyOn()` to fix 149 test failures (#153) + +### Changed + +- Rebrand user-facing references to Altimate Code (#134) +- Bump `@modelcontextprotocol/sdk` dependency (#139) +- Engine version bumped to 0.3.0 + +## [0.2.5] - 2026-03-13 + +### Added + +- `/feedback` command and `feedback_submit` tool for in-app user feedback (#89) +- Datamate manager — dynamic MCP server management (#99) +- Non-interactive mode for `mcp add` command with input validation +- `mcp remove` command +- Upstream merge with OpenCode v1.2.20 + +### Fixed + +- TUI crash after upstream merge (#98) +- `GitlabAuthPlugin` type incompatibility in plugin loader (#92) +- All test failures from fork restructure (#91) +- CI/CD workflow paths updated from `altimate-code` to `opencode` +- Fallback to global config when not in a git repo +- PR standards workflow `TEAM_MEMBERS` ref corrected from `dev` to `main` (#101) + +### Changed + +- Removed self-hosted runners from public repo CI (#110) +- Migrated CI/release to ARC runners (#93, #94) +- Reverted Windows tests to `windows-latest` (#95) +- Engine version bumped to 0.2.5 + +## [0.2.4] - 2026-03-04 + +### Added + +- E2E tests for npm install pipeline: postinstall script, bin wrapper, and publish output (#50) + +## [0.2.3] - 2026-03-04 + 
+### Added + +- Postinstall welcome banner and changelog display after upgrade (#48) + +### Fixed + +- Security: validate well-known auth command type before execution, add confirmation prompt (#45) +- CI/CD: SHA-pin all GitHub Actions, per-job least-privilege permissions (#45) +- MCP: fix copy-paste log messages, log init errors, prefix floating promises (#45) +- Session compaction: clean up compactionAttempts on abort to prevent memory leak (#45) +- Telemetry: retry failed flush events once with buffer-size cap (#45, #46) +- Telemetry: flush events before process exit (#46) +- TUI: resolve worker startup crash from circular dependency (#47) +- CLI: define ALTIMATE_CLI build-time constants for correct version reporting (#41) +- Address 4 issues found in post-v0.2.2 commits (#49) +- Address remaining code review issues from PR #39 (#43) + +### Changed + +- CI/CD: optimize pipeline with caching and parallel builds (#42) + +### Docs + +- Add security FAQ (#44) + +## [0.2.2] - 2026-03-05 + +### Fixed + +- Telemetry init: `Config.get()` failure outside Instance context no longer silently disables telemetry +- Telemetry init: called early in CLI middleware and worker thread so MCP/engine/auth events are captured +- Telemetry init: promise deduplication prevents concurrent init race conditions +- Telemetry: pre-init events are now buffered and flushed (previously silently dropped) +- Telemetry: user email is SHA-256 hashed before sending (privacy) +- Telemetry: error message truncation standardized to 500 chars across all event types +- Telemetry: `ALTIMATE_TELEMETRY_DISABLED` env var now actually checked in init +- Telemetry: MCP disconnect reports correct transport type instead of hardcoded `stdio` +- Telemetry: `agent_outcome` now correctly reports `"error"` outcome for failed sessions + +### Changed + +- Auth telemetry events use session context when available instead of hardcoded `"cli"` + +## [0.2.1] - 2026-03-05 + +### Added + +- Comprehensive telemetry 
instrumentation: 25 event types across auth, MCP servers, Python engine, provider errors, permissions, upgrades, context utilization, agent outcomes, workflow sequencing, and environment census +- Telemetry docs page with event table, privacy policy, opt-out instructions, and contributor guide +- AppInsights endpoint added to network firewall documentation +- `categorizeToolName()` helper for tool classification (sql, schema, dbt, finops, warehouse, lineage, file, mcp) +- `bucketCount()` helper for privacy-safe count bucketing + +### Fixed + +- Command loading made resilient to MCP/Skill initialization failures + +### Changed + +- CLI binary renamed from `altimate-code` to `altimate` + +## [0.2.0] - 2026-03-04 + +### Added + +- Context management: auto-compaction with overflow recovery, observation masking, and loop protection +- Context management: data-engineering-aware compaction template preserving warehouse, schema, dbt, and lineage context +- Context management: content-aware token estimation (code, JSON, SQL, text heuristics) +- Context management: observation masking replaces pruned tool outputs with fingerprinted summaries +- Context management: provider overflow detection for Azure OpenAI patterns +- CLI observability: telemetry module with session, generation, tool call, and error tracking +- `/discover` command for data stack setup with project_scan tool +- User documentation for context management configuration + +### Fixed + +- ContextOverflowError now triggers automatic compaction instead of a dead-end error +- `isOverflow()` correctly reserves headroom for models with separate input/output limits +- `NamedError.isInstance()` no longer crashes on null input +- Text part duration tracking now preserves original start timestamp +- Compaction loop protection: max 3 consecutive attempts per turn, counter resets between turns +- Negative usable context guard for models where headroom exceeds base capacity + +### Changed + +- Removed cost estimation and 
complexity scoring bindings +- Docs: redesigned homepage with hero, feature cards, and pill layouts +- Docs: reorganized sidebar navigation for better discoverability + +## [0.1.10] - 2026-03-03 + +### Fixed + +- Build: resolve @opentui/core parser.worker.js via import.meta.resolve for monorepo hoisting +- Build: output binary as `altimate-code` instead of `opencode` +- Publish: update Docker/AUR/Homebrew references from anomalyco/opencode to AltimateAI/altimate-code +- Publish: make Docker/AUR/Homebrew steps non-fatal +- Bin wrapper: look for `@altimateai/altimate-code-*` scoped platform packages +- Postinstall: resolve `@altimateai` scoped platform packages +- Dockerfile: update binary paths and names + +## [0.1.9] - 2026-03-02 + +### Fixed + +- Build: fix solid-plugin import to use bare specifier for monorepo hoisting +- CI: install warehouse extras for Python tests (duckdb, boto3, etc.) +- CI: restrict pytest collection to tests/ directory +- CI: fix all ruff lint errors in Python engine +- CI: fix remaining TypeScript test failures (agent rename, config URLs, Pydantic model) +- Update theme schema URLs and documentation references to altimate-code.dev + +## [0.1.8] - 2026-03-02 + +### Changed + +- Rename npm scope from `@altimate` to `@altimateai` for all packages +- Wrapper package is now `@altimateai/altimate-code` (no `-ai` suffix) + +### Fixed + +- CI: test fixture writes config to correct filename (`altimate-code.json`) +- CI: add `dev` optional dependency group to Python engine for pytest/ruff + +## [0.1.7] - 2026-03-02 + +### Changed + +- Improve TUI logo readability: redesign M, E, T, I letter shapes +- Add two-tone logo color: ALTIMATE in peach, CODE in purple + +### Fixed + +- Release: npm publish glob now finds scoped package directories +- Release: PyPI publish skips existing versions instead of failing + +## [0.1.5] - 2026-03-02 + +### Added + +- Anthropic OAuth plugin ported in-tree +- Docs site switched from Jekyll to Material for MkDocs + +### 
Fixed + +- Build script: restore `.trim()` on models API JSON to prevent syntax error in generated `models-snapshot.ts` +- Build script: fix archive path for scoped package names in release tarball/zip creation + +## [0.1.0] - 2025-06-01 + +### Added + +- Initial open-source release +- SQL analysis and formatting via Python engine +- Column-level lineage tracking +- dbt integration (profiles, lineage, `+` operator) +- Warehouse connectivity (Snowflake, BigQuery, Databricks, Postgres, DuckDB, MySQL) +- AI-powered SQL code review +- TUI interface with Solid.js +- MCP (Model Context Protocol) server support +- Auto-bootstrapping Python engine via uv diff --git a/docs/docs/network.md b/docs/docs/reference/network.md similarity index 100% rename from docs/docs/network.md rename to docs/docs/reference/network.md diff --git a/docs/docs/security-faq.md b/docs/docs/reference/security-faq.md similarity index 64% rename from docs/docs/security-faq.md rename to docs/docs/reference/security-faq.md index 2abe309340..3206c435e9 100644 --- a/docs/docs/security-faq.md +++ b/docs/docs/reference/security-faq.md @@ -19,7 +19,7 @@ Altimate Code needs database credentials to connect to your warehouse. Credentia ## What can the agent actually execute? -Altimate Code can read files, write files, and run shell commands — but only with your permission. The [permission system](configure/permissions.md) lets you control every tool: +Altimate Code can read files, write files, and run shell commands, but only with your permission. The [permission system](../configure/permissions.md) lets you control every tool: | Level | Behavior | |-------|----------| @@ -96,9 +96,9 @@ No other outbound connections are made. See the [Network reference](network.md) Yes, with constraints. You need: -1. **A locally accessible LLM** — self-hosted model or a provider reachable from your network -2. **Model catalog disabled** — set `ALTIMATE_CLI_DISABLE_MODELS_FETCH=true` or provide a local models file -3. 
**Telemetry disabled** — set `ALTIMATE_TELEMETRY_DISABLED=true` +1. **A locally accessible LLM**, either a self-hosted model or a provider reachable from your network +2. **Model catalog disabled** by setting `ALTIMATE_CLI_DISABLE_MODELS_FETCH=true` or providing a local models file +3. **Telemetry disabled** by setting `ALTIMATE_TELEMETRY_DISABLED=true` ```bash export ALTIMATE_CLI_DISABLE_MODELS_FETCH=true @@ -108,7 +108,7 @@ export ALTIMATE_CLI_MODELS_PATH=/path/to/models.json ## What telemetry is collected? -Anonymous usage telemetry — event names, token counts, timing, and error types. **Never** code, queries, credentials, file paths, or prompt content. See the full [Telemetry reference](configure/telemetry.md) for the complete event list. +Anonymous usage telemetry, including event names, token counts, timing, and error types. **Never** code, queries, credentials, file paths, or prompt content. See the full [Telemetry reference](telemetry.md) for the complete event list. Disable telemetry entirely: @@ -130,8 +130,8 @@ export ALTIMATE_TELEMETRY_DISABLED=true When you run `altimate auth login `, the CLI fetches `/.well-known/altimate-code` to discover the server's auth command. Before executing anything: -1. **Validation** — The auth command must be an array of strings. Malformed or unexpected types are rejected. -2. **Confirmation prompt** — You are shown the exact command and must explicitly approve it before it runs. +1. **Validation.** The auth command must be an array of strings. Malformed or unexpected types are rejected. +2. **Confirmation prompt.** You are shown the exact command and must explicitly approve it before it runs. ``` $ altimate auth login https://mcp.example.com @@ -150,18 +150,23 @@ MCP (Model Context Protocol) servers extend Altimate Code with additional tools. - **MCP tool calls go through the permission system.** You can set MCP tools to `"ask"` or `"deny"` like any other tool. !!! 
warning - Third-party MCP servers are not reviewed or audited by Altimate. Treat them like any other third-party dependency — review the source, check for updates, and limit their access. + Third-party MCP servers are not reviewed or audited by Altimate. Treat them like any other third-party dependency: review the source, check for updates, and limit their access. -## How does the Python engine work? Is it safe? +## How does the SQL analysis engine work? -The Python engine (`altimate_engine`) runs as a local subprocess, communicating with the CLI over JSON-RPC via stdio. It: +As of v0.4.2, all 73 tool methods run natively in TypeScript via `@altimateai/altimate-core` (Rust napi-rs bindings). There is no Python dependency. The engine executes in-process with no subprocess, no network port, and no external service. -- Runs under your user account with your permissions -- Has no network access beyond what your warehouse connections require -- Restarts automatically if it crashes (max 2 restarts) -- Times out after 30 seconds per call +## What is `sensitive_write` protection? -The engine is not exposed on any network port — it only communicates through stdin/stdout pipes with the parent CLI process. +Altimate Code classifies writes to credential-adjacent files as `sensitive_write` operations. These always trigger a confirmation prompt, even if `write` is set to `"allow"` in your config. Protected patterns include: + +- **Environment files** such as `.env`, `.env.local`, `.env.production`, `.env.staging` +- **Credential files** such as `credentials.json`, `service-account.json`, `.npmrc`, `.pypirc`, `.netrc`, `.pgpass` +- **Secret key directories** such as `.ssh/`, `.aws/`, `.gnupg/`, `.gcloud/`, `.kube/`, `.docker/` +- **Private key extensions** such as `*.pem`, `*.key`, `*.p12`, `*.pfx` +- **Version control** files such as `.git/config`, `.git/hooks/*` + +You can approve per-file with "Allow always" to reduce prompt fatigue. 
The approval persists for your current session only. On macOS and Windows, matching is case-insensitive. ## Does Altimate Code store conversation history? @@ -170,22 +175,22 @@ Yes. Altimate Code persists session data locally on your machine: - **Session messages** are stored in a local SQLite database so you can resume, review, and revert conversations. - **Prompt history** (your recent inputs) is saved to `~/.state/prompt-history.jsonl` for command-line recall. -This data **never** leaves your machine — it is not sent to any service or included in telemetry. You can delete it at any time by removing the local database and history files. +This data **never** leaves your machine. It is not sent to any service or included in telemetry. You can delete it at any time by removing the local database and history files. !!! note Your LLM provider may have its own data retention policies. Check your provider's terms to understand how they handle API requests. ## How do I secure Altimate Code in a team environment? -1. **Use project-level config** — Place `altimate-code.json` in your project root with appropriate permission defaults. This ensures consistent security settings across the team. +1. **Use project-level config.** Place `altimate-code.json` in your project root with appropriate permission defaults. This ensures consistent security settings across the team. -2. **Restrict dangerous operations** — Deny destructive SQL and shell commands at the project level so individual users can't accidentally bypass them. +2. **Restrict dangerous operations.** Deny destructive SQL and shell commands at the project level so individual users can't accidentally bypass them. -3. **Use environment variables for secrets** — Never commit credentials. Use `ALTIMATE_CLI_PYTHON`, warehouse connection env vars, and your cloud provider's secret management. +3. **Use environment variables for secrets.** Never commit credentials. 
Use `ALTIMATE_CLI_PYTHON`, warehouse connection env vars, and your cloud provider's secret management. -4. **Review MCP servers** — Maintain a list of approved MCP servers. Don't let individual developers add arbitrary servers to shared configurations. +4. **Review MCP servers.** Maintain a list of approved MCP servers. Don't let individual developers add arbitrary servers to shared configurations. -5. **Lock down agent permissions** — Give each agent only the permissions it needs. The `analyst` agent doesn't need `write` access. The `builder` agent doesn't need `DROP` permissions. +5. **Lock down agent permissions.** Give each agent only the permissions it needs. The `analyst` agent doesn't need `write` access. The `builder` agent doesn't need `DROP` permissions. ## Can AI-generated SQL damage my database? @@ -202,12 +207,12 @@ For additional safety: Altimate Code includes several layers of protection to keep the agent within your project: -- **Project boundary enforcement** — File operations check that paths stay within your project directory (or git worktree for monorepos). Attempts to read or write outside the project trigger an `external_directory` permission prompt. -- **Symlink-aware path resolution** — Symlinks inside the project that point outside are detected and blocked. This prevents an agent from reading or writing outside your project through symlinks. -- **Path traversal blocking** — Paths containing `../` sequences that would escape the project are rejected with an "Access denied" error. -- **Sensitive file protection** — Writing to credential files (`.env`, `.ssh/`, `.aws/`, private keys) triggers a confirmation prompt, even inside the project. See [below](#why-am-i-being-prompted-to-edit-env-files) for details. -- **Bash command analysis** — The bash tool parses commands with tree-sitter to detect file operations (`rm`, `cp`, `mv`, etc.) targeting paths outside your project, and prompts for permission. 
-- **Non-git project safety** — For projects outside a git repository, the boundary is strictly the working directory (not the entire filesystem). +- **Project boundary enforcement.** File operations check that paths stay within your project directory (or git worktree for monorepos). Attempts to read or write outside the project trigger an `external_directory` permission prompt. +- **Symlink-aware path resolution.** Symlinks inside the project that point outside are detected and blocked. This prevents an agent from reading or writing outside your project through symlinks. +- **Path traversal blocking.** Paths containing `../` sequences that would escape the project are rejected with an "Access denied" error. +- **Sensitive file protection.** Writing to credential files (`.env`, `.ssh/`, `.aws/`, private keys) triggers a confirmation prompt, even inside the project. See [below](#why-am-i-being-prompted-to-edit-env-files) for details. +- **Bash command analysis.** The bash tool parses commands with tree-sitter to detect file operations (`rm`, `cp`, `mv`, etc.) targeting paths outside your project, and prompts for permission. +- **Non-git project safety.** For projects outside a git repository, the boundary is strictly the working directory (not the entire filesystem). These protections operate at the application level. For additional isolation, you can run Altimate Code inside a Docker container or VM. 
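+
+As a sketch of the container option (the image tag below is purely illustrative; use whatever image you build or pull for Altimate Code), mounting only the project directory makes the container enforce the same boundary at the filesystem level:
+
+```bash
+# Hypothetical image tag; only the project dir and one API key cross the boundary
+docker run --rm -it \
+  -v "$PWD":/work -w /work \
+  -e ANTHROPIC_API_KEY \
+  altimate-code:local altimate
+```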
@@ -225,13 +230,13 @@ Altimate Code prompts before modifying files that commonly contain credentials o When you see this prompt: -- **"Allow once"** — approves this single edit -- **"Allow always"** — approves edits to this specific file for the rest of the session (resets on restart) +- **"Allow once"** approves this single edit +- **"Allow always"** approves edits to this specific file for the rest of the session (resets on restart) -If you frequently edit `.env` files and find the prompts disruptive, click "Allow always" on the first prompt for each file — you won't be asked again for that file during your session. +If you frequently edit `.env` files and find the prompts disruptive, click "Allow always" on the first prompt for each file. You won't be asked again for that file during your session. !!! tip - This protection does **not** block reading these files — only writing. The agent can still read your `.env` to understand configuration without prompting. + This protection does **not** block reading these files, only writing. The agent can still read your `.env` to understand configuration without prompting. ## What commands are blocked or prompted by default? @@ -248,17 +253,17 @@ Altimate Code applies safe defaults so you don't have to configure anything for | `TRUNCATE *` | **Blocked** | Irreversible data deletion. | | All other commands | **Prompted** | You approve each command before it runs. | -**"Prompted"** means you'll see the command and can approve or reject it. **"Blocked"** means the agent cannot run it at all — you must override in config. +**"Prompted"** means you'll see the command and can approve or reject it. **"Blocked"** means the agent cannot run it at all; you must override in config. -To override defaults, add rules in `altimate-code.json`. See [Permissions](configure/permissions.md) for the full configuration reference. +To override defaults, add rules in `altimate-code.json`. 
See [Permissions](../configure/permissions.md) for the full configuration reference. ## Best practices for staying safe -1. **Review before approving.** The permission prompt shows you exactly what will happen — diffs for file edits, the full command for bash. Take a moment to read it. +1. **Review before approving.** The permission prompt shows you exactly what will happen, including diffs for file edits and the full command for bash. Take a moment to read it. -2. **Work on a branch.** Let the agent work on a feature branch so you can review changes before merging. Git gives you a full safety net — this is the single most effective protection. +2. **Work on a branch.** Let the agent work on a feature branch so you can review changes before merging. Git gives you a full safety net. This is the single most effective protection. -3. **Use per-agent permissions.** Give each agent only what it needs. The `analyst` agent doesn't need write access. See [Permissions](configure/permissions.md) for examples. +3. **Use per-agent permissions.** Give each agent only what it needs. The `analyst` agent doesn't need write access. See [Permissions](../configure/permissions.md) for examples. 4. **Use read-only database credentials for exploration.** When using the agent for analysis or ad-hoc queries, connect with a read-only database user. 
diff --git a/docs/docs/configure/telemetry.md b/docs/docs/reference/telemetry.md similarity index 65% rename from docs/docs/configure/telemetry.md rename to docs/docs/reference/telemetry.md index 84b9acaa0c..5d499d8e4d 100644 --- a/docs/docs/configure/telemetry.md +++ b/docs/docs/reference/telemetry.md @@ -11,27 +11,27 @@ We collect the following categories of events: | `session_start` | A new CLI session begins | | `session_end` | A CLI session ends (includes duration) | | `session_forked` | A session is forked from an existing one | -| `generation` | An AI model generation completes (model ID, token counts, duration — no prompt content) | -| `tool_call` | A tool is invoked (tool name and category — no arguments or output) | -| `bridge_call` | A Python engine RPC call completes (method name and duration — no arguments) | +| `generation` | An AI model generation completes (model ID, token counts, duration, but no prompt content) | +| `tool_call` | A tool is invoked (tool name and category, but no arguments or output) | +| `bridge_call` | A native tool call completes (method name and duration, but no arguments) | | `command` | A CLI command is executed (command name only) | -| `error` | An unhandled error occurs (error type and truncated message — no stack traces) | -| `auth_login` | Authentication succeeds or fails (provider and method — no credentials) | +| `error` | An unhandled error occurs (error type and truncated message, but no stack traces) | +| `auth_login` | Authentication succeeds or fails (provider and method, but no credentials) | | `auth_logout` | A user logs out (provider only) | | `mcp_server_status` | An MCP server connects, disconnects, or errors (server name and transport) | -| `provider_error` | An AI provider returns an error (error type and HTTP status — no request content) | -| `engine_started` | The Python engine starts or restarts (version and duration) | -| `engine_error` | The Python engine fails to start (phase and truncated error) | +| 
`provider_error` | An AI provider returns an error (error type and HTTP status, but no request content) | +| `engine_started` | The native tool engine initializes (version and duration) | +| `engine_error` | The native tool engine fails to start (phase and truncated error) | | `upgrade_attempted` | A CLI upgrade is attempted (version and method) | | `permission_denied` | A tool permission is denied (tool name and source) | | `doom_loop_detected` | A repeated tool call pattern is detected (tool name and count) | | `compaction_triggered` | Context compaction runs (strategy and token counts) | | `tool_outputs_pruned` | Tool outputs are pruned during compaction (count) | -| `environment_census` | Environment snapshot on project scan (warehouse types, dbt presence, feature flags — no hostnames) | +| `environment_census` | Environment snapshot on project scan (warehouse types, dbt presence, feature flags, but no hostnames) | | `context_utilization` | Context window usage per generation (token counts, utilization percentage, cache hit ratio) | | `agent_outcome` | Agent session outcome (agent type, tool/generation counts, cost, outcome status) | | `error_recovered` | Successful recovery from a transient error (error type, strategy, attempt count) | -| `mcp_server_census` | MCP server capabilities after connect (tool and resource counts — no tool names) | +| `mcp_server_census` | MCP server capabilities after connect (tool and resource counts, but no tool names) | | `context_overflow_recovered` | Context overflow is handled (strategy) | Each event includes a timestamp, anonymous session ID, and the CLI version. @@ -40,16 +40,16 @@ Each event includes a timestamp, anonymous session ID, and the CLI version. Telemetry events are buffered in memory and flushed periodically. If a flush fails (e.g., due to a transient network error), events are re-added to the buffer for one retry. On process exit, the CLI performs a final flush to avoid losing events from the current session. 
-No events are ever written to disk — if the process is killed before the final flush, buffered events are lost. This is by design to minimize on-disk footprint. +No events are ever written to disk. If the process is killed before the final flush, buffered events are lost. This is by design to minimize on-disk footprint. ## Why We Collect Telemetry Telemetry helps us: -- **Detect errors** — identify crashes, provider failures, and engine issues before users report them -- **Improve reliability** — track MCP server stability, engine startup success rates, and upgrade outcomes -- **Understand usage patterns** — know which tools and features are used so we can prioritize development -- **Measure performance** — track generation latency, engine startup time, and bridge call duration +- **Detect errors** by identifying crashes, provider failures, and engine issues before users report them +- **Improve reliability** by tracking MCP server stability, engine initialization, and upgrade outcomes +- **Understand usage patterns** to know which tools and features are used so we can prioritize development +- **Measure performance** by tracking generation latency, tool call duration, and startup time ## Disabling Telemetry @@ -79,7 +79,7 @@ We take your privacy seriously. 
Altimate Code telemetry **never** collects: - Code content, file contents, or file paths - Credentials, API keys, or tokens - Database connection strings or hostnames -- Personally identifiable information (your email is SHA-256 hashed before sending — used only for anonymous user correlation) +- Personally identifiable information (your email is SHA-256 hashed before sending and is used only for anonymous user correlation) - Tool arguments or outputs - AI prompt content or responses @@ -93,7 +93,7 @@ Telemetry data is sent to Azure Application Insights: |----------|---------| | `eastus-8.in.applicationinsights.azure.com` | Telemetry ingestion | -For a complete list of network endpoints, see the [Network Reference](../network.md). +For a complete list of network endpoints, see the [Network Reference](network.md). ## For Contributors @@ -101,21 +101,21 @@ For a complete list of network endpoints, see the [Network Reference](../network Event type names use **snake_case** with a `domain_action` pattern: -- `auth_login`, `auth_logout` — authentication events -- `mcp_server_status`, `mcp_server_census` — MCP server lifecycle -- `engine_started`, `engine_error` — Python engine events -- `provider_error` — AI provider errors -- `session_forked` — session lifecycle -- `environment_census` — environment snapshot events -- `context_utilization`, `context_overflow_recovered` — context management events -- `agent_outcome` — agent session events -- `error_recovered` — error recovery events +- `auth_login`, `auth_logout` for authentication events +- `mcp_server_status`, `mcp_server_census` for MCP server lifecycle +- `engine_started`, `engine_error` for native engine events +- `provider_error` for AI provider errors +- `session_forked` for session lifecycle +- `environment_census` for environment snapshot events +- `context_utilization`, `context_overflow_recovered` for context management events +- `agent_outcome` for agent session events +- `error_recovered` for error recovery 
events ### Adding a New Event -1. **Define the type** — Add a new variant to the `Telemetry.Event` union in `packages/altimate-code/src/telemetry/index.ts` -2. **Emit the event** — Call `Telemetry.track()` at the appropriate location -3. **Update docs** — Add a row to the event table above +1. **Define the type.** Add a new variant to the `Telemetry.Event` union in `packages/altimate-code/src/telemetry/index.ts` +2. **Emit the event.** Call `Telemetry.track()` at the appropriate location +3. **Update docs.** Add a row to the event table above ### Privacy Checklist diff --git a/docs/docs/troubleshooting.md b/docs/docs/reference/troubleshooting.md similarity index 90% rename from docs/docs/troubleshooting.md rename to docs/docs/reference/troubleshooting.md index 2e2c39f4a7..4429bbf7a4 100644 --- a/docs/docs/troubleshooting.md +++ b/docs/docs/reference/troubleshooting.md @@ -47,7 +47,7 @@ altimate --print-logs --log-level DEBUG # Example for PostgreSQL: bun add pg ``` -3. No Python installation is required — all tools run natively in TypeScript. +3. No Python installation is required. All tools run natively in TypeScript. ### Warehouse Connection Failed @@ -55,7 +55,7 @@ altimate --print-logs --log-level DEBUG **Solutions:** -1. **If using dbt:** Run `altimate-dbt init` to set up the dbt integration. The CLI will use your `profiles.yml` automatically — no separate connection config needed. +1. **If using dbt:** Run `altimate-dbt init` to set up the dbt integration. The CLI will use your `profiles.yml` automatically, so no separate connection config is needed. 2. **If not using dbt:** Add a connection via the `warehouse_add` tool, `~/.altimate-code/connections.json`, or `ALTIMATE_CODE_CONN_*` env vars. 3. Test connectivity: use the `warehouse_test` tool with your connection name. 4. Check that the warehouse hostname and port are reachable @@ -69,7 +69,7 @@ altimate --print-logs --log-level DEBUG **Solutions:** -1. 
Check the log files — MCP initialization errors are now logged with the server name and error message: +1. Check the log files. MCP initialization errors are now logged with the server name and error message: ``` WARN failed to initialize MCP server { key: "my-tools", error: "..." } ``` @@ -139,5 +139,5 @@ Then share `debug.log` when reporting issues. ## Getting Help -- [GitHub Issues](https://github.com/AltimateAI/altimate-code/issues) — Report bugs and request features +- [GitHub Issues](https://github.com/AltimateAI/altimate-code/issues): Report bugs and request features - Check [existing issues](https://github.com/AltimateAI/altimate-code/issues) before filing new ones diff --git a/docs/docs/windows-wsl.md b/docs/docs/reference/windows-wsl.md similarity index 93% rename from docs/docs/windows-wsl.md rename to docs/docs/reference/windows-wsl.md index 0367a64436..a3285a03ac 100644 --- a/docs/docs/windows-wsl.md +++ b/docs/docs/reference/windows-wsl.md @@ -8,7 +8,7 @@ You can install and run altimate directly in PowerShell or Command Prompt withou ```powershell # PowerShell or CMD — install globally -npm install -g @altimateai/altimate-code +npm install -g altimate-code # Launch altimate @@ -18,7 +18,7 @@ This works with Node.js 18+ installed natively on Windows. All core features wor ## WSL Setup (Recommended) -For the best experience — especially with file watching, shell tools, and dbt — we recommend WSL 2: +For the best experience (especially with file watching, shell tools, and dbt), we recommend WSL 2: 1. 
Install WSL:

```powershell
@@ -115,4 +115,4 @@ If you installed Node.js but `npm` or `node` is not recognized:
- Use WSL 2 for better performance
- Store your projects in the WSL filesystem (`~/projects/`) rather than `/mnt/c/` for faster file operations
- Set up your warehouse connections in the WSL environment
-- If using both WSL and native Windows, keep separate config files — the WSL and Windows file systems have different path conventions
+- If using both WSL and native Windows, keep separate config files because the WSL and Windows file systems have different path conventions
diff --git a/docs/docs/usage/ci-headless.md b/docs/docs/usage/ci-headless.md
new file mode 100644
index 0000000000..471acdaada
--- /dev/null
+++ b/docs/docs/usage/ci-headless.md
@@ -0,0 +1,155 @@
+# CI & Headless Mode
+
+Run any altimate prompt non-interactively from scripts, CI pipelines, or scheduled jobs. No TUI. Output is plain text or JSON.
+
+---
+
+## Basic Usage
+
+```bash
+altimate run "your prompt here"
+```
+
+Key flags:
+
+| Flag | Description |
+|---|---|
+| `--output json` | Structured JSON output instead of plain text |
+| `--model <model>` | Override the configured model |
+| `--connection <name>` | Select a specific warehouse connection |
+| `--no-color` | Disable ANSI color codes (for CI logs) |
+
+See `altimate run --help` for the full flag list, or [CLI Reference](cli.md).
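When scripting `altimate run`, capture the JSON output to a file so later pipeline steps can inspect or archive it. A minimal POSIX-sh sketch; the `altimate` function below is a stub standing in for the real CLI so the pattern runs anywhere, and the JSON shape it prints is illustrative, not the CLI's documented schema:

```shell
# Stub standing in for the installed CLI; delete this in a real script.
# The JSON body is a hypothetical example, not the real output schema.
altimate() { printf '{"status":"ok","findings":0}\n'; }

# Capture structured output instead of parsing colored terminal text.
result=$(altimate run "lint models/staging/" --output json --no-color)
printf '%s\n' "$result" > run-result.json

# Later steps can branch on the captured file.
if grep -q '"findings":0' run-result.json; then
  echo "no findings"
fi
```

The same capture-to-file pattern is what the nightly cost check workflow below relies on when it uploads `cost-report.json` as an artifact.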
+ +--- + +## Environment Variables for CI + +Configure without committing an `altimate-code.json` file: + +```bash +# LLM provider +ALTIMATE_PROVIDER=anthropic +ALTIMATE_ANTHROPIC_API_KEY=your-key-here + +# Or OpenAI +ALTIMATE_PROVIDER=openai +ALTIMATE_OPENAI_API_KEY=your-key-here + +# Warehouse (Snowflake example) +SNOWFLAKE_ACCOUNT=myorg-myaccount +SNOWFLAKE_USER=ci_user +SNOWFLAKE_PASSWORD=${{ secrets.SNOWFLAKE_PASSWORD }} +SNOWFLAKE_DATABASE=analytics +SNOWFLAKE_SCHEMA=public +SNOWFLAKE_WAREHOUSE=compute_wh +``` + +--- + +## Exit Codes + +| Code | Meaning | +|---|---| +| `0` | Success (task completed) | +| `1` | Task completed but result indicates issues (e.g., anti-patterns found) | +| `2` | Configuration error (missing API key, bad connection) | +| `3` | Tool execution error (warehouse unreachable, query failed) | + +Use exit codes to fail CI on actionable findings: + +```bash +altimate run "validate models in models/staging/ for anti-patterns" || exit 1 +``` + +--- + +## Worked Examples + +### Example 1: Nightly Cost Check (GitHub Actions) + +```yaml +# .github/workflows/cost-check.yml +name: Nightly Cost Check + +on: + schedule: + - cron: '0 8 * * 1-5' # 8am UTC, weekdays + +jobs: + cost-check: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Install altimate + run: npm install -g altimate-code + + - name: Run cost report + env: + ALTIMATE_PROVIDER: anthropic + ALTIMATE_ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }} + SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_CI_USER }} + SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_CI_PASSWORD }} + SNOWFLAKE_DATABASE: analytics + SNOWFLAKE_WAREHOUSE: compute_wh + run: | + altimate run "/cost-report" --output json > cost-report.json + cat cost-report.json + + - name: Upload cost report + uses: actions/upload-artifact@v4 + with: + name: cost-report + path: cost-report.json +``` + +### Example 2: Post-Deploy SQL Validation + +Add to your dbt 
deployment workflow to catch anti-patterns before they reach production:
+
+```yaml
+  - name: SQL anti-pattern check
+    env:
+      ALTIMATE_PROVIDER: anthropic
+      ALTIMATE_ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+    run: |
+      altimate run "validate all SQL files in models/staging/ for anti-patterns and fail if any are found" \
+        --no-color \
+        --output json
+```
+
+### Example 3: Automated Test Generation (Pre-commit)
+
+```bash
+#!/bin/bash
+# .git/hooks/pre-commit
+# Generate tests for newly added SQL model files in the staging area
+
+STAGED_MODELS=$(git diff --cached --name-only --diff-filter=A | grep "models/.*\.sql")
+
+if [ -n "$STAGED_MODELS" ]; then
+  echo "Generating tests for new models..."
+  altimate run "/generate-tests for: $STAGED_MODELS" --no-color
+fi
+```
+
+---
+
+## Tracing in Headless Mode
+
+Tracing works in headless mode. View traces after the run:
+
+```bash
+altimate trace list
+altimate trace view <trace-id>
+```
+
+See [Tracing](../configure/tracing.md) for the full trace reference.
+
+---
+
+## Security Recommendation
+
+Use a **read-only warehouse user** for CI jobs that only need to read data. Reserve write-access credentials for jobs that explicitly need them (e.g., test generation that writes files). See [Security FAQ](../reference/security-faq.md) and [Permissions](../configure/permissions.md).
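A pipeline can also branch on the specific exit codes from the table earlier in this page, rather than collapsing everything with `|| exit 1`. A minimal POSIX-sh sketch; the `altimate` function is a stub (returning `1`, i.e. "findings reported") so the control flow runs without a live warehouse:

```shell
# Stub standing in for the real CLI; returns 1 to simulate a run that
# completed but reported anti-patterns. Delete this in a real pipeline.
altimate() { return 1; }

altimate run "validate models/staging/ for anti-patterns"
code=$?

# Map the documented exit codes to distinct pipeline outcomes.
case "$code" in
  0) echo "clean" ;;
  1) echo "findings reported; failing the job" ;;
  2) echo "configuration error: check API key and connection" ;;
  3) echo "tool error: warehouse unreachable or query failed" ;;
  *) echo "unexpected exit code: $code" ;;
esac
```

This lets a scheduled job, for example, page on configuration errors (code `2`) while merely failing the build on findings (code `1`).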
diff --git a/docs/docs/usage/cli.md b/docs/docs/usage/cli.md
index 7b10316bbf..804923785e 100644
--- a/docs/docs/usage/cli.md
+++ b/docs/docs/usage/cli.md
@@ -41,14 +41,15 @@ altimate --agent analyst

## Global Flags

-| Flag | Description |
-| -------------------------- | ----------------------------------------------- |
-| `--model <model>` | Override the default model |
-| `--agent <agent>` | Start with a specific agent |
-| `--print-logs` | Print logs to stderr |
-| `--log-level <level>` | Set log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |
-| `--help`, `-h` | Show help |
-| `--version`, `-v` | Show version |
+| Flag | Description |
+|------|------------|
+| `--model <model>` | Override the default model |
+| `--agent <agent>` | Start with a specific agent |
+| `--yolo` | Auto-approve all permission prompts (explicit `deny` rules still enforced) |
+| `--print-logs` | Print logs to stderr |
+| `--log-level <level>` | Set log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |
+| `--help`, `-h` | Show help |
+| `--version`, `-v` | Show version |

## Environment Variables

@@ -85,6 +86,21 @@ Configuration can be controlled via environment variables:
| `ALTIMATE_CLI_SERVER_PASSWORD` | Server HTTP basic auth password |
| `ALTIMATE_CLI_PERMISSION` | Permission config as JSON |

+### Permissions & Safety
+
+| Variable | Description |
+|----------|------------|
+| `ALTIMATE_CLI_YOLO` | Auto-approve all permission prompts (`true`/`false`). Explicit `deny` rules still enforced. |
+| `OPENCODE_YOLO` | Fallback for `ALTIMATE_CLI_YOLO`. When both are set, `ALTIMATE_CLI_YOLO` takes precedence.
| + +### Memory & Training + +| Variable | Description | +|----------|------------| +| `ALTIMATE_DISABLE_MEMORY` | Disable the persistent memory system | +| `ALTIMATE_MEMORY_AUTO_EXTRACT` | Auto-extract memories at session end | +| `ALTIMATE_DISABLE_TRAINING` | Disable the AI teammate training system | + ### Experimental | Variable | Description | @@ -137,9 +153,11 @@ altimate --print-logs --log-level DEBUG run "test query" altimate run --no-trace "quick question" ``` +For CI pipelines and headless automation, see [CI & Automation](ci-headless.md). + ## Tracing -Every `run` command automatically saves a trace file with the full session details — generations, tool calls, tokens, cost, and timing. See [Tracing](../configure/tracing.md) for configuration options. +Every `run` command automatically saves a trace file with the full session details, including generations, tool calls, tokens, cost, and timing. See [Tracing](../configure/tracing.md) for configuration options. ```bash # List recent traces diff --git a/docs/docs/usage/tui.md b/docs/docs/usage/tui.md index a30be554a2..7ca9177253 100644 --- a/docs/docs/usage/tui.md +++ b/docs/docs/usage/tui.md @@ -10,9 +10,9 @@ altimate The TUI has three main areas: -- **Message area** — shows the conversation with the AI assistant -- **Input area** — where you type messages and commands -- **Sidebar** — shows session info, tool calls, and file changes (toggle with leader key + `s`) +- **Message area**: shows the conversation with the AI assistant +- **Input area**: where you type messages and commands +- **Sidebar**: shows session info, tool calls, and file changes (toggle with leader key + `s`) ## Input Shortcuts @@ -34,15 +34,16 @@ The leader key (default: `Ctrl+X`) gives access to all TUI keybindings. 
Press leader key, then:
| `s` | Toggle sidebar |
| `t` | List themes |
| `m` | List models |
+| `i` | Enhance prompt (rewrite with AI for clarity) |
| `a` | List agents |
| `k` | List keybinds |
| `q` | Quit |

## Scrolling

-- **Page up/down** — scroll messages
-- **Home/End** — jump to first/last message
-- **Mouse scroll** — scroll with mouse wheel
+- **Page up/down**: scroll messages
+- **Home/End**: jump to first/last message
+- **Mouse scroll**: scroll with mouse wheel

Configure scroll speed:

@@ -64,7 +65,7 @@ Switch between agents during a conversation:

- Press leader key + `a` to see all agents
- Use `/agent <name>` to switch directly
- Built-in agents: `general`, `plan`, `build`, `explore`
-- Data engineering agents: `builder`, `analyst`, `validator`, `migrator`
+- Data engineering agents: `builder`, `analyst`, `plan`

## Diff Display
- -```bash -altimate web -``` - -## Configuration - -Configure the web server in `altimate-code.json`: - -```json -{ - "server": { - "port": 3000, - "hostname": "localhost", - "cors": ["https://myapp.example.com"], - "mdns": true, - "mdnsDomain": "altimate-code.local" - } -} -``` - -| Option | Default | Description | -|--------|---------|------------| -| `port` | 3000 | HTTP port | -| `hostname` | `localhost` | Bind address | -| `cors` | `[]` | Allowed CORS origins | -| `mdns` | `false` | Enable mDNS discovery | -| `mdnsDomain` | — | Custom mDNS domain | - -## Authentication - -Set basic auth credentials: - -```bash -export ALTIMATE_CLI_SERVER_USERNAME=admin -export ALTIMATE_CLI_SERVER_PASSWORD=secret -altimate web -``` - -## Features - -The web UI provides the same conversational interface as the TUI: +Altimate Web is a browser-based interface for interacting with altimate's data engineering tools without the terminal. It provides the same conversational agent experience as the TUI, accessible from any browser. - Full chat interface with streaming responses +- Agent switching between builder, analyst, and plan modes - File references and tool call results -- Agent switching -- Session management +- Session management and history -!!! note - The web UI is the general-purpose agent interface. For data-engineering-specific UIs, see the [Data Engineering guides](../data-engineering/guides/index.md). +!!! info "Coming Soon" + The web UI is currently under development. For now, use the [TUI](tui.md) or [CLI](cli.md) to interact with altimate. diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index 4402a9bdca..9a4635260b 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -1,5 +1,5 @@ site_name: altimate-code -site_description: The open-source data engineering harness. 99+ tools for building, validating, optimizing, and shipping data products. +site_description: The open-source data engineering harness. 
100+ tools for building, validating, optimizing, and shipping data products. site_url: https://docs.altimate.sh repo_url: https://github.com/AltimateAI/altimate-code repo_name: AltimateAI/altimate-code @@ -73,77 +73,78 @@ markdown_extensions: emoji_generator: !!python/name:material.extensions.emoji.to_svg nav: - - Home: index.md - - - Get Started: - - Quickstart: quickstart.md - - Full Setup: getting-started.md - - Agent Modes: data-engineering/agent-modes.md + - Getting Started: + - Overview: getting-started/index.md + - Quickstart: getting-started/quickstart-new.md + - Setup: getting-started/quickstart.md + - Examples: + - Showcase: examples/index.md + - Use: + - Agents: + - Agent Modes: data-engineering/agent-modes.md + - Agent Config: configure/agents.md + - Tools: + - Overview: configure/tools/index.md + - Built-in Tools: configure/tools/config.md + - Core Tools: configure/tools/core-tools.md + - SQL Tools: data-engineering/tools/sql-tools.md + - Schema Tools: data-engineering/tools/schema-tools.md + - FinOps Tools: data-engineering/tools/finops-tools.md + - Lineage Tools: data-engineering/tools/lineage-tools.md + - dbt Tools: data-engineering/tools/dbt-tools.md + - Warehouse Tools: data-engineering/tools/warehouse-tools.md + - Memory Tools: data-engineering/tools/memory-tools.md + - Custom Tools: configure/tools/custom.md + - Skills: configure/skills.md + - Commands: configure/commands.md - Interfaces: - - Terminal UI: usage/tui.md + - TUI: usage/tui.md - CLI: usage/cli.md - - IDE / VS Code: usage/ide.md - Web UI: usage/web.md - - - Guides: - - Cost Optimization: data-engineering/guides/cost-optimization.md - - SQL Migration: data-engineering/guides/migration.md - - CI & Automation: data-engineering/guides/ci-headless.md - - - Tools: - - SQL Analysis: data-engineering/tools/sql-tools.md - - Schema & Metadata: data-engineering/tools/schema-tools.md - - Column-Level Lineage: data-engineering/tools/lineage-tools.md - - dbt Integration: 
data-engineering/tools/dbt-tools.md - - Cost & FinOps: data-engineering/tools/finops-tools.md - - Warehouse Tools: data-engineering/tools/warehouse-tools.md - - - Integrations: - - GitHub Actions: usage/github.md - - GitLab CI: usage/gitlab.md - - Claude Code: data-engineering/guides/using-with-claude-code.md - - Codex: data-engineering/guides/using-with-codex.md - - MCP Servers: configure/mcp-servers.md - - LSP: configure/lsp.md - - ACP: configure/acp.md - + - CI: usage/ci-headless.md + - IDE: usage/ide.md + - GitHub: usage/github.md + - GitLab: usage/gitlab.md + - Guides: + - Cost Optimization: data-engineering/guides/cost-optimization.md + - Migration: data-engineering/guides/migration.md + - Using with Claude Code: data-engineering/guides/using-with-claude-code.md + - Using with Codex: data-engineering/guides/using-with-codex.md - Configure: - - Config Files: configure/config.md - - AI Providers & Models: + - Overview: configure/index.md + - Warehouses: configure/warehouses.md + - LLMs: - Providers: configure/providers.md - Models: configure/models.md - - Agents & Skills: - - Agents: configure/agents.md - - Skills: configure/skills.md - - Tools & Access: - - Allowed Tools: configure/tools.md - - Custom Tools: configure/custom-tools.md - - Access Control: configure/permissions.md - - Behavior: - - Rules: configure/rules.md - - Commands: configure/commands.md - - Context Management: configure/context-management.md - - Memory: data-engineering/tools/memory-tools.md - - Training: - - Overview: data-engineering/training/index.md - - Team Deployment: data-engineering/training/team-deployment.md + - MCPs & ACPs: + - MCP Servers: configure/mcp-servers.md + - ACP Support: configure/acp.md - Appearance: - Themes: configure/themes.md - Keybinds: configure/keybinds.md - - Formatters: configure/formatters.md - Observability: - Tracing: configure/tracing.md - - Telemetry: configure/telemetry.md - - Network & Proxy: network.md - - Windows / WSL: windows-wsl.md - - - Extend: - 
- SDK: develop/sdk.md - - Server API: develop/server.md - - Plugins: develop/plugins.md - - Ecosystem: develop/ecosystem.md - + - Telemetry: reference/telemetry.md + - Training: + - Overview: data-engineering/training/index.md + - Team Deployment: data-engineering/training/team-deployment.md + - Additional Config: + - LSP Servers: configure/lsp.md + - Network: reference/network.md + - Windows / WSL: reference/windows-wsl.md + - Config File Reference: configure/config.md + - Governance: + - Overview: configure/governance.md + - Rules: configure/rules.md + - Permissions: configure/permissions.md + - Context Management: configure/context-management.md + - Formatters: configure/formatters.md - Reference: - - Security FAQ: security-faq.md - - Troubleshooting: troubleshooting.md - - Changelog: https://github.com/AltimateAI/altimate-code/blob/main/CHANGELOG.md + - Changelog: reference/changelog.md + - Security FAQ: reference/security-faq.md + - Troubleshooting: reference/troubleshooting.md + - Extend: + - SDK: develop/sdk.md + - Server API: develop/server.md + - Plugins: develop/plugins.md + - Ecosystem: develop/ecosystem.md