Consolidating all known open findings into one place post v0.3.1. Full detail for each is in ARCHITECTURE_REVIEW.md.
Next sprint (HIGH)
Finding 6 — Star projection column lineage (no edges for SELECT *)
sqlglot.optimizer.qualify can expand SELECT * into explicit columns when a schema is available. This requires a two-pass index (DDL first → populate SchemaResolver → ETL pass runs qualify before sg_lineage). It is the biggest remaining gap in column-lineage coverage.
Closes #5 when done.
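The ordering constraint (schema must exist before any ETL file is analysed) can be illustrated with a deliberately tiny, stdlib-only sketch. It is a toy. The regexes stand in for real parsing. In the actual indexer, pass 2 would hand the collected schema to sqlglot's qualify (which accepts a schema mapping) before sg_lineage, instead of expanding the star by hand. The function names `pass1_collect_schema` and `pass2_star_edges` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Toy regexes standing in for sqlglot parsing; they only handle the simple shapes below.
CREATE_RE = re.compile(r"CREATE\s+TABLE\s+(\w+)\s*\((.*?)\)\s*;?\s*$", re.IGNORECASE | re.DOTALL)
STAR_RE = re.compile(r"INSERT\s+INTO\s+(\w+)\s+SELECT\s+\*\s+FROM\s+(\w+)", re.IGNORECASE)


def pass1_collect_schema(files: List[str]) -> Dict[str, List[str]]:
    """Pass 1 (DDL only): build table -> ordered column names."""
    schema: Dict[str, List[str]] = {}
    for sql in files:
        m = CREATE_RE.search(sql.strip())
        if m:
            # "id INT, amount INT" -> ["id", "amount"]
            # (naive: breaks on types containing commas, e.g. DECIMAL(10,2))
            cols = [c.split()[0].lower() for c in m.group(2).split(",") if c.strip()]
            schema[m.group(1).lower()] = cols
    return schema


def pass2_star_edges(files: List[str], schema: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Pass 2 (ETL): expand SELECT * using the pass-1 schema and emit column edges."""
    edges: List[Tuple[str, str]] = []
    for sql in files:
        m = STAR_RE.search(sql)
        if not m:
            continue
        target, source = m.group(1).lower(), m.group(2).lower()
        # Unknown source table -> no edges, which is today's behaviour for every SELECT *.
        for col in schema.get(source, []):
            edges.append((f"{source}.{col}", f"{target}.{col}"))
    return edges


files = [
    "CREATE TABLE orders (id INT, amount INT);",
    "INSERT INTO orders_copy SELECT * FROM orders",
]
schema = pass1_collect_schema(files)      # must complete before pass 2 starts
edges = pass2_star_edges(files, schema)   # one edge per expanded column
```

Running pass 2 with an empty schema yields no edges at all, which mirrors the current gap this finding describes.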
Finding 11.6 — Indexer too slow for real corpora
index_repo is single-threaded. Parsing is CPU-bound, so threads won't help (GIL). It needs a ProcessPoolExecutor for parallel file parsing, plus bulk DB commits (batches of N=50) to replace per-file fsync round-trips. Expected throughput gain is ~6-7x on 8-core hardware. Implement after finding 6 (SchemaResolver must be pickle-safe first).
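A minimal sketch of the parse-in-workers, write-in-parent shape with batched commits follows. It assumes a SQLite backend and a `files(path, n_statements)` table. Both are illustrative, as are `parse_file` and `index_files`. The parse step is a placeholder that only counts semicolons. The executor is passed in so the batching logic can be exercised with a ThreadPoolExecutor in tests. Real indexing would pass a ProcessPoolExecutor.

```python
import sqlite3
from concurrent.futures import Executor
from typing import List, Tuple

BATCH_SIZE = 50  # commit once per N files instead of once per file


def parse_file(path: str) -> Tuple[str, int]:
    """Hypothetical CPU-bound worker. It must be a module-level function taking only
    picklable arguments so ProcessPoolExecutor can ship it to worker processes."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return path, text.count(";")  # placeholder for real parse output: statement count


def index_files(paths: List[str], conn: sqlite3.Connection, executor: Executor) -> int:
    """Parse files on `executor`, write results from the parent process only,
    and commit in batches. Returns the number of commits issued."""
    cur = conn.cursor()
    pending = commits = 0
    # map() yields results in input order; chunksize groups tasks per IPC round-trip
    # for ProcessPoolExecutor (ThreadPoolExecutor ignores it).
    for path, n_statements in executor.map(parse_file, paths, chunksize=8):
        cur.execute(
            "INSERT INTO files (path, n_statements) VALUES (?, ?)", (path, n_statements)
        )
        pending += 1
        if pending == BATCH_SIZE:
            conn.commit()  # one fsync per batch rather than per file
            commits += 1
            pending = 0
    if pending:  # flush the final partial batch
        conn.commit()
        commits += 1
    return commits
```

In production this would run as `with ProcessPoolExecutor(max_workers=8) as pool: index_files(paths, conn, pool)` under an `if __name__ == "__main__":` guard. Keeping all DB writes in the parent avoids sharing one connection across processes. Only `parse_file`'s arguments and return values cross the process boundary, which is why anything the workers touch (such as SchemaResolver) has to be picklable.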
Medium priority
Finding 11.4 — Parse warnings go to stdout (#13)
Redirect WARNING log output to ~/.sqlcg/index.log by default; print only a count summary to stdout. Add --verbose to restore current behaviour.
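One way to do this with the standard logging module is sketched below, under assumptions. The `--verbose` flag and the `~/.sqlcg/index.log` path come from the finding. The logger name `"sqlcg"` and the helpers `WarningCounter`, `configure_logging` and `summary_line` are hypothetical. A counting handler sits alongside either the file handler (default) or a stdout handler (`--verbose`), so the summary count is available in both modes.

```python
import argparse
import logging
import sys
from pathlib import Path

DEFAULT_LOG = Path.home() / ".sqlcg" / "index.log"


class WarningCounter(logging.Handler):
    """Counts WARNING-and-above records so a one-line summary can go to stdout."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.count = 0

    def emit(self, record: logging.LogRecord) -> None:
        self.count += 1


def configure_logging(verbose: bool, log_path: Path = DEFAULT_LOG) -> WarningCounter:
    logger = logging.getLogger("sqlcg")  # hypothetical logger name
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep records away from the root logger's handlers
    logger.handlers.clear()   # drop handlers left over from an earlier call
    counter = WarningCounter()
    logger.addHandler(counter)
    if verbose:
        # --verbose: print each warning to stdout, as today
        logger.addHandler(logging.StreamHandler(sys.stdout))
    else:
        log_path.parent.mkdir(parents=True, exist_ok=True)
        logger.addHandler(logging.FileHandler(log_path, encoding="utf-8"))
    return counter


def summary_line(counter: WarningCounter, log_path: Path = DEFAULT_LOG) -> str:
    return f"{counter.count} parse warning(s); details in {log_path}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", action="store_true",
                        help="print parse warnings to stdout instead of the log file")
    return parser
```

After indexing, the CLI would print `summary_line(counter)` once instead of one line per warning.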
Finding 11.7 — E8 dynamic identifier marker (observability only)
The col_lineage_skip:dynamic_source: marker is already emitted (since v0.3.1). No further action is needed unless the corpus gains statically resolvable variable patterns.
Low priority / deferred
Finding 12 — Case-sensitive table lookup (#12)
Uppercase unquoted identifiers at index time and normalise lookup input. sqlglot dialect normalization likely handles most of this.
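The rule in this finding (upper-case unquoted parts, keep quoted parts verbatim) can be sketched as a single key function applied both at index time and to lookup input. This is a hand-rolled illustration with hypothetical names. Upper-case folding matches ANSI/Snowflake-style behaviour, whereas some dialects (e.g. PostgreSQL) fold to lower case. sqlglot's dialect-aware identifier normalization (its `normalize_identifiers` optimizer rule) may make a custom version unnecessary.

```python
from typing import List, Tuple


def split_dotted(name: str) -> List[str]:
    """Split a dotted name on dots that are outside double quotes."""
    parts: List[str] = []
    buf: List[str] = []
    in_quotes = False
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == '"':
            if in_quotes and name[i + 1:i + 2] == '"':
                buf.append('""')  # escaped quote inside a quoted identifier
                i += 2
                continue
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == "." and not in_quotes:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf).strip())
    return parts


def normalize_part(part: str) -> str:
    """Quoted identifiers keep their exact case; unquoted ones are upper-cased."""
    if len(part) >= 2 and part[0] == part[-1] == '"':
        return part[1:-1].replace('""', '"')
    return part.upper()


def table_key(name: str) -> Tuple[str, ...]:
    """Lookup key used both when indexing and when resolving user input."""
    return tuple(normalize_part(p) for p in split_dotted(name))
```

Returning a tuple rather than a re-joined string keeps `"my.db".t` distinct from `my.db.t`.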
Finding 3.3 — SchemaResolver not thread-safe under sqlcg watch
Add a threading.Lock around cache mutation in SchemaResolver. Blocked on finding 6, which redesigns the resolver anyway.
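This finding interacts with finding 11.6. A bare threading.Lock attribute makes an object unpicklable, so a lock-guarded resolver could not be shipped to ProcessPoolExecutor workers. The sketch below shows one conventional way to satisfy both requirements: it guards cache mutation with a lock, drops the lock in `__getstate__`, and recreates it in `__setstate__`. `LockedSchemaCache` is a hypothetical stand-in, since SchemaResolver's actual internals are not described here.

```python
import pickle
import threading
from typing import Dict, List, Optional


class LockedSchemaCache:
    """Hypothetical stand-in for SchemaResolver's cache: table -> column names."""

    def __init__(self) -> None:
        self._cache: Dict[str, List[str]] = {}
        self._lock = threading.Lock()

    def register(self, table: str, columns: List[str]) -> None:
        with self._lock:  # serialise writers, e.g. concurrent `sqlcg watch` updates
            self._cache[table] = list(columns)

    def columns(self, table: str) -> Optional[List[str]]:
        with self._lock:
            cols = self._cache.get(table)
            # return a copy so callers cannot mutate the cache without the lock
            return None if cols is None else list(cols)

    def __getstate__(self) -> Dict[str, Dict[str, List[str]]]:
        # threading.Lock objects cannot be pickled; ship only the data
        with self._lock:
            return {"cache": {t: list(c) for t, c in self._cache.items()}}

    def __setstate__(self, state: Dict[str, Dict[str, List[str]]]) -> None:
        self._cache = state["cache"]
        self._lock = threading.Lock()  # each unpickled copy gets its own lock
```

Each worker process then holds an independent copy with its own lock. That is fine for read-mostly lookups but means workers do not see each other's writes.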