sql-code-graph

Pre-1.0 — expect breaking changes. APIs, CLI flags, and graph schema may change between releases without a deprecation period. Pin to an exact version in production. Re-indexing is always the migration path.

Dialect coverage. This tool has been developed and tested against a production Snowflake data warehouse with ~1,400 SQL files. Other dialects (bigquery, postgres, ansi, tsql) are wired up but have not been exercised against a real corpus — edge cases will surface. If you use sqlcg with a non-Snowflake dialect and hit parse failures or missing lineage, please open an issue with a minimal reproducer; we want to support other dialects but need a corpus to develop against.

SQL lineage and dependency analysis as an MCP server for Claude Code.

Indexes a directory of .sql files into a graph database and exposes lineage queries as MCP tools — so Claude can answer questions like "what tables does this view depend on?" or "where is orders.customer_id derived from?" without reading every file.

Quick start

Choose one:

Permanent install (recommended):

uv tool install sql-code-graph    # Fast, managed, no isolation needed
sqlcg install                     # Register MCP server in Claude Code

One-shot try (cold cache warning):

uvx sql-code-graph                # First run is slow (downloads deps)
                                  # Subsequent runs use cache, ~1s startup

Restart Claude Code, then inside your project ask:

Index my SQL files at ./sql --dialect snowflake

That's it. The MCP tools are now available to Claude in every conversation for that project.

Workflow (3 steps)

Initialize: sqlcg db init
Index: sqlcg index ./sql --dialect snowflake
Keep fresh: sqlcg git install-hooks (optional)

Full setup (recommended)

# 1. Install
pip install sql-code-graph

# 2. Register with Claude Code (~/.claude/settings.json)
sqlcg install

# 3. Restart Claude Code

# 4. Index your SQL repo
# Only git-tracked files are indexed — build artefacts, node_modules,
# and .venv are ignored automatically.
sqlcg db init
sqlcg index ./sql --dialect snowflake   # or: bigquery, postgres, ansi

# 5. (Optional) Keep the graph fresh on branch switches
cd /your/sql/repo
sqlcg git install-hooks

Step 5 installs a post-checkout git hook that re-indexes automatically whenever you switch branches. Without it the graph may be stale after a git checkout until you re-run sqlcg index manually.

Dialect config

To avoid passing --dialect every time, create .sqlcg.toml in your repo root:

[sqlcg]
dialect = "snowflake"   # snowflake | bigquery | postgres | ansi

The git hook and sqlcg index --dialect auto both read this file.

Add to your project CLAUDE.md (recommended)

Adding a short note to your project's CLAUDE.md helps Claude know the tools are available and when to use them:

## SQL lineage
This project uses sql-code-graph. MCP tools are available:
- `db_info` — check graph health and parse quality before running lineage queries
- `index_repo` — index or re-index a directory of SQL files
- `find_table_usages` — find all queries that read a table
- `trace_column_lineage` — trace where a column's value comes from
- `get_upstream_dependencies` / `get_downstream_dependencies` — dependency chains
- `search_sql_pattern` — full-text search across all indexed SQL
- `execute_cypher` — raw graph query for advanced analysis

The MCP server works without this — Claude can discover the tools on its own — but the CLAUDE.md snippet ensures they get used proactively.

Parse quality

After indexing, sqlcg gain shows a parse quality breakdown that tells you how much column-level lineage was extracted:

Quality	Meaning	Tools affected
`FULL`	Column-level lineage extracted	All tools work
`TABLE_ONLY`	Table edges only — no column lineage	`trace_column_lineage`, `get_*_dependencies` return empty
`SCRIPTING_FALLBACK`	sqlglot fell back to raw command node	Partial table edges; column lineage unavailable
`FAILED`	File failed to parse entirely	File invisible to all queries

Quality is shown per-file after sqlcg index and in sqlcg gain Section F. list_dialects_and_repos() warns when scripting fallback exceeds 20% of queries.

What causes TABLE_ONLY? Mostly SELECT * — sqlglot can't trace column names through a wildcard without knowing the source table's columns. See Resolving SELECT * below to fix this without rewriting your ETLs.

What causes SCRIPTING_FALLBACK? Snowflake $$ procedure bodies or BEGIN…END scripting blocks. sqlglot parses the block as a raw Command node and extracts DML via tokenizer fallback. Table edges are usually correct; column edges are not.

Check sqlcg db info for the parsing mode distribution across all indexed queries.

Resolving SELECT *

By default, SELECT * ETLs produce TABLE_ONLY parse quality because sqlglot needs to know the source table's column list to expand the wildcard. The fastest fix is to provide your production information schema as a CSV — sqlcg uses it as the authoritative column source, replacing any DDL-inferred columns.

Step 1 — Export from Snowflake (run in a worksheet, export as CSV):

SELECT
    TABLE_CATALOG,
    TABLE_SCHEMA,
    TABLE_NAME,
    COLUMN_NAME,
    ORDINAL_POSITION,
    DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA NOT IN ('INFORMATION_SCHEMA')
ORDER BY TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;

Step 2 — Drop it in your repo:

cp ~/Downloads/columns.csv <your-sql-repo>/.sqlcg/schema.csv

Step 3 — Index your ETL folder only:

sqlcg index etl/

That's the entire workflow. sqlcg index auto-discovers .sqlcg/schema.csv and loads it before indexing — no separate command needed. Point it at your ETL folder rather than the full repo: fewer files to scan, faster indexing, and the schema CSV covers what DDL parsing was doing anyway.

After indexing, sqlcg db info will show non-zero STAR_EXPANSION lineage edges, and trace_column_lineage will return results for queries that previously returned empty.

Alternative: explicit load

If your schema CSV lives elsewhere or you want to refresh it without re-indexing:

sqlcg load-schema /path/to/columns.csv   # load or refresh schema
sqlcg index etl/                         # re-index using updated schema

Precedence: production CSV > DDL files > nothing. Partial CSVs are fine — tables not covered fall back to DDL-inferred columns automatically.

MCP tools reference

Tool	Description
`index_repo(repo_path, dialect)`	Index a directory of SQL files
`trace_column_lineage(table_col)`	Trace column lineage upstream
`find_table_usages(table_name)`	Find all queries that read a table
`get_upstream_dependencies(table_col)`	Full upstream dependency chain
`get_downstream_dependencies(table_col)`	Full downstream dependency chain
`search_sql_pattern(query)`	Full-text search across indexed SQL
`list_dialects_and_repos()`	List indexed repos and dialects (catalogue)
`db_info()`	Graph health, node counts, parse quality breakdown, warnings
`execute_cypher(query)`	Raw Cypher query against the graph

LLM agent tip: call db_info() before lineage queries to check that SqlColumn > 0 and warnings is empty. If parse_quality["scripting_block"] is high, column lineage will be limited for those files — use table-level tools (find_table_usages, get_*_dependencies) instead.

CLI reference

Full option reference: docs/cli.md

sqlcg install                          # register MCP server in Claude Code
sqlcg db init                          # initialise graph database
sqlcg index <path> --dialect <d>       # index SQL files
sqlcg index <path> --dialect auto      # read dialect from .sqlcg.toml
sqlcg load-schema <csv>                # load production schema (auto-discovered from .sqlcg/schema.csv)
sqlcg load-schema <csv> --include-catalog  # use 3-part names (CATALOG.SCHEMA.TABLE)
sqlcg watch <path>                     # watch for file changes
sqlcg git install-hooks                # install post-checkout hook
sqlcg gain                             # show usage metrics
sqlcg report                           # generate FP/error report
sqlcg mcp start                        # start MCP server manually
sqlcg version                          # show installed version

Supported dialects

Dialect	Status
`snowflake`	Tested against a production DWH (~1,400 files)
`bigquery`	Wired up, untested against a real corpus
`postgres`	Wired up, untested against a real corpus
`ansi`	Wired up, untested against a real corpus
`tsql`	Wired up, untested against a real corpus
`dbt`	Via optional extra, untested against a real corpus

If you use a non-Snowflake dialect and run into issues, open an issue with a minimal SQL reproducer. Contributions of test fixtures from real (anonymised) corpora are especially welcome.

Development

git clone https://github.com/Warhorze/sql-code-graph
cd sql-code-graph
uv sync --all-extras
uv run pytest tests/unit

Issues

Bug reports and feature requests: github.com/Warhorze/sql-code-graph/issues

Questions and discussion: github.com/Warhorze/sql-code-graph/discussions

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 117 Commits
.claude		.claude
.github		.github
docs		docs
plan		plan
scripts		scripts
src/sqlcg		src/sqlcg
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.sqlcgignore		.sqlcgignore
ARCHITECTURE_REVIEW.md		ARCHITECTURE_REVIEW.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
pyrightconfig.json		pyrightconfig.json
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

sql-code-graph

Quick start

Workflow (3 steps)

Full setup (recommended)

Dialect config

Add to your project CLAUDE.md (recommended)

Parse quality

Resolving SELECT *

MCP tools reference

CLI reference

Supported dialects

Development

Issues

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

sql-code-graph

Quick start

Workflow (3 steps)

Full setup (recommended)

Dialect config

Add to your project CLAUDE.md (recommended)

Parse quality

Resolving SELECT *

MCP tools reference

CLI reference

Supported dialects

Development

Issues

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages