Backlog: open findings after v0.3.1 #15

@Warhorze

Consolidating all known open findings into one place post v0.3.1. Full detail for each is in ARCHITECTURE_REVIEW.md.

Next sprint (HIGH)

Finding 6 — Star projection column lineage (no edges for SELECT *)
sqlglot.optimizer.qualify can expand SELECT * into explicit columns when a schema is available. This requires a two-pass index: a DDL pass first to populate SchemaResolver, then an ETL pass that runs qualify before sg_lineage. This is the biggest remaining gap in column lineage coverage.
Closes #5 when done.

Finding 11.6 — Indexer too slow for real corpora
index_repo is single-threaded. Parsing is CPU-bound, so threads will not help because of the GIL. It needs ProcessPoolExecutor for parallel file parsing, plus bulk DB commits (batches of N=50) to replace per-file fsync round-trips. Expected ~6-7x throughput gain on 8-core hardware. Implement after finding 6 (SchemaResolver must be pickle-safe first).
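
One possible shape for this, sketched with the standard library only. `parse_file`, `store_results`, the `files` table and its columns are hypothetical stand-ins for the real parser and schema, and sqlite3 is assumed here purely for illustration.

```python
import sqlite3
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

BATCH_SIZE = 50  # the N=50 from the finding


def parse_file(path):
    """Hypothetical stand-in for the CPU-bound parser.

    Must be a module-level function so worker processes can pickle it.
    """
    return path, len(path)


def batches(items, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def store_results(conn, results, batch_size=BATCH_SIZE):
    """Insert rows with one commit per batch; return the number of commits."""
    commits = 0
    for chunk in batches(results, batch_size):
        conn.executemany(
            "INSERT INTO files (path, node_count) VALUES (?, ?)", chunk
        )
        conn.commit()
        commits += 1
    return commits


def index_repo(paths, conn, max_workers=None):
    """Parse files in worker processes; write from the parent process only."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(parse_file, paths, chunksize=16)
        return store_results(conn, results)
```

Keeping all DB writes in the parent process avoids sharing a connection across processes; only the parse results cross the process boundary, which is why everything passed to or returned from `parse_file` has to be picklable.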

Medium priority

Finding 11.4 — Parse warnings go to stdout (#13)
Redirect WARNING log output to ~/.sqlcg/index.log by default; print only a count summary to stdout. Add --verbose to restore current behaviour.
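
A rough sketch of how this could be wired with the standard `logging` module. The logger name "sqlcg", the `WarningCounter` class and the `configure_logging` function are assumptions for illustration; the real CLI may structure this differently.

```python
import logging
import sys
from pathlib import Path


class WarningCounter(logging.Handler):
    """Counts records at WARNING level or above without printing them."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.count = 0

    def emit(self, record):
        self.count += 1


def configure_logging(log_path="~/.sqlcg/index.log", verbose=False):
    """Send warnings to a file; echo them to stdout only when verbose."""
    logger = logging.getLogger("sqlcg")  # assumed logger name
    logger.setLevel(logging.WARNING)
    logger.handlers.clear()
    logger.propagate = False  # keep warnings away from the root handlers

    path = Path(log_path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    logger.addHandler(logging.FileHandler(path, encoding="utf-8"))

    counter = WarningCounter()
    logger.addHandler(counter)

    if verbose:  # --verbose restores the current behaviour
        logger.addHandler(logging.StreamHandler(sys.stdout))
    return logger, counter
```

After indexing, the CLI could print a single line such as `f"{counter.count} parse warnings, see {log_path}"` instead of the individual messages.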

Finding 11.7 — E8 dynamic identifier marker (observability only)
The col_lineage_skip:dynamic_source: marker is already emitted as of v0.3.1. No further action unless the corpus gains statically resolvable variable patterns.

Low priority / deferred

Finding 12 — Case-sensitive table lookup (#12)
Uppercase unquoted identifiers at index time and normalise lookup input. sqlglot dialect normalization likely handles most of this.
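
To illustrate the intended rule independently of sqlglot, here is a small stdlib sketch that uppercases unquoted parts and preserves quoted ones. `normalise_identifier` is a hypothetical helper; the uppercase folding is taken from the description above, and real dialects differ in how they fold case.

```python
import re

# One identifier part: either "double quoted" (with "" as an escaped quote)
# or a bare run of characters up to the next dot.
_PART = re.compile(r'"((?:[^"]|"")*)"|([^."]+)')


def normalise_identifier(name):
    """Return a tuple key: unquoted parts uppercased, quoted parts kept as-is."""
    parts = []
    for match in _PART.finditer(name):
        quoted, bare = match.group(1), match.group(2)
        if bare is not None:
            parts.append(bare.strip().upper())
        else:
            parts.append(quoted.replace('""', '"'))
    return tuple(parts)


# The same function is applied at index time and to lookup input.
index = {normalise_identifier("Sales.Orders"): "table-node-1"}
hit = index.get(normalise_identifier("sales.orders"))
```

Returning a tuple rather than a dotted string keeps a quoted name that itself contains a dot from colliding with a schema-qualified name.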

Finding 3.3 — SchemaResolver not thread-safe under sqlcg watch
Add a threading.Lock around cache mutation in SchemaResolver. Blocked on finding 6, which redesigns the resolver anyway.
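
A sketch of what the locking could look like, assuming the resolver keeps a simple dict cache. `LockedSchemaResolver` and its methods are hypothetical and do not reflect the current class. Because `threading.Lock` objects cannot be pickled, the sketch also drops the lock from the pickled state, which is relevant to the pickle-safety requirement in finding 11.6.

```python
import threading


class LockedSchemaResolver:
    """Hypothetical resolver whose cache is guarded by a lock."""

    def __init__(self):
        self._cache = {}
        self._lock = threading.Lock()

    def register(self, table, columns):
        with self._lock:
            self._cache[table] = tuple(columns)

    def columns(self, table):
        with self._lock:
            return self._cache.get(table)

    def __len__(self):
        with self._lock:
            return len(self._cache)

    def __getstate__(self):
        # Locks are not picklable, so only the cache contents are serialised.
        with self._lock:
            return {"_cache": dict(self._cache)}

    def __setstate__(self, state):
        self._cache = state["_cache"]
        self._lock = threading.Lock()  # each unpickled copy gets a fresh lock
```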
