Skip to content

[feedback] LLM agent experience, silent failures, false positives — v0.2.1 #5

@Warhorze

Description

@Warhorze

Tested sqlcg v0.2.1 on a Snowflake DWH repo (1457 SQL files, 3245 tables, 8084 queries, SqlColumn: 0). Ran the full session via an LLM agent (Claude Code) using the MCP server. Issues grouped by severity.

1. LLM onboarding (critical)

  • Package is sql-code-graph, binary is sqlcg. An LLM invoking the MCP server name hits "executable not found, try uvx --from sql-code-graph sqlcg" and cannot self-recover without human intervention.
  • --help lists commands but gives no workflow order. The required sequence (db initindex <path>git install-hooks) is not implied anywhere. The LLM jumped straight to find/analyze and got empty results for the entire session.
  • mcp setup prints JSON with no mention of prerequisites. An LLM will run it and consider setup complete.
  • watch has no initialization guard — behavior is undefined if db init was never run.
  • Suggestion: add an LLM-oriented quick-start block at the top of --help:
    QUICK START (run in order):
      1. sqlcg db init
      2. sqlcg index <repo-root> --dialect <dialect>
      3. sqlcg git install-hooks
    
    NOTE: binary is 'sqlcg', MCP package name is 'sql-code-graph'
    Check health: sqlcg db info
    ⚠ If SqlColumn: 0 — column lineage unavailable, only table-level tracing works
    

2. Silent failures & misleading state

  • db info shows SqlColumn: 0 with no warning that trace_column_lineage and find column are therefore completely non-functional.
  • Index success rate reported as 100% despite 900KB of parse warnings and zero column nodes populated. "Success" should mean something.
  • find column returns "No results" with no explanation that the column graph is empty. An LLM cannot distinguish "column not found" from "column graph not built".
  • After db init with no index run, every query silently returns empty. There is no "database is empty, have you run sqlcg index?" guard.

3. analyze unused is all false positives

Returns ~100 IA_ANALYTICS / IA_TABLEAU / IA_BUSINESSOBJECTS views as unused. These are consumer-facing views used by Tableau and BI tools — external consumers that have no SQL references within the repo. The tool has no concept of external consumers.

Suggestions:

  • Add a --exclude-schema flag to filter known consumer schemas
  • Or warn: "results may include externally-consumed objects not referenced in indexed files"

4. analyze impact vs find pattern inconsistency

For the same table (BA.WTFV_VOORRAAD_DAGSTAND_IGDC):

  • analyze impact returned only DDL files
  • find pattern additionally found the ETL INSERT in etl/sql/fact/

The two commands should return consistent results.

5. Feedback loop has no false-negative path

trace_column_lineage was called 7 times via MCP, all returned empty, 0 feedback samples collected. The feedback prompt only fires when there are results to rate.

The FN case — empty result when the user expected one — is the most valuable signal at this stage and it's completely absent. Suggestion: prompt "Did you expect a result here? (y/n)" on empty returns.

As a proxy metric: execute_cypher was the #1 MCP tool with 15 calls vs trace_column_lineage at 7. That ratio means the LLM kept falling back to raw Cypher because the high-level tools failed. Worth surfacing in sqlcg gain as a signal that abstractions aren't working.

6. Parser: ALTER WAREHOUSE noise

ALTER WAREHOUSE IDENTIFIER($var) SET WAREHOUSE_SIZE = 'X-Large' is a standard pattern in Snowflake DDL rebuild files (every table rebuild uses it). The parser fails on it and falls back to Command type. Since it's not DML, stripping it before DML extraction would eliminate the noise entirely. Same applies to CALL MA.MSSPR_*() stored procedure calls.

7. Missing: persistent index config

No .sqlcg.toml to declare repo root and dialect. After a db reset or fresh clone there is no way to replay the index without remembering the original invocation. The git hook (which correctly resyncs on branch switch) also depends on the DB already existing — it cannot bootstrap from scratch.

Suggestion: .sqlcg.toml with path and dialect so sqlcg index with no arguments just works, and the git hook can call it safely.

8. No progress feedback during index

sqlcg index on 1457 files took ~17 seconds but produced zero stdout output while running — only the final line Indexed 1457 files — 2231 tables, 0 edges, 148 errors appeared at the end. With no progress indicator it's impossible to tell whether the command is working, hung, or waiting on something.

Also: the summary line reports 0 edges — a direct signal that the graph has no relationships and lineage tracing will return nothing — but it appears only in the raw log, never surfaced in db info.

Suggestions:

  • A simple progress indicator or periodic INFO: indexed N files... lines
  • Surface edges: 0 as a warning in db info — a graph with no edges cannot answer any lineage query

9. uvx cold-start is invisible

The recommended install and MCP server config both use uvx, which downloads and installs the package on first use. On a cold cache this took ~6.5 minutes with zero output — indistinguishable from a hang. The actual indexing of 1457 files took 17 seconds.

The MCP server config ("command": "uvx") means Claude Code also pays a startup cost each session.

Suggestions:

  • Docs should recommend uv tool install sql-code-graph for persistent installs, with uvx only for one-shot tries
  • Or at minimum note in the quick-start that first run will be slow due to package download

Environment: sqlcg 0.2.1, Snowflake dialect, 1457 files indexed, WSL2 / Linux, Claude Code MCP session.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions