Skip to content

d-daemon/sql-etl-visualizer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL ETL Visualizer

Python License Build Code Style Repo

A lightweight tool that parses SQL ETL scripts (BigQuery dialect) and auto-generates an interactive DAG to visualize end-to-end data flow across granular, intermediate, and output layers.


Features

  • Parses CREATE TABLE, INSERT INTO, and MERGE INTO statements.
  • Identifies source and target tables across scripts.
  • Constructs directed dependency graphs (source → target).
  • Interactive Streamlit UI using PyVis.
  • CLI for lineage extraction and export to JSON/DOT.
  • Works locally without database access.

Quick Start

1. Clone and set up environment

uv sync

2. Run CLI lineage extraction

uv run sqlviz examples/sql --json lineage.json --dot lineage.dot

Output:

  • lineage.json: Node-link representation of lineage graph.
  • lineage.dot: Graphviz export for visualization or documentation.

3. Launch interactive UI

uv run streamlit run sql_visualizer/app_streamlit.py -- examples/sql

Then open the Streamlit app in your browser to explore the DAG.


Example DAG

SOURCE_SYSTEM.CUSTOMERS ─┐
                         ├─> RAW.CUSTOMER_BASE ─┐
SOURCE_SYSTEM.ACCOUNTS ─┘                       │
                                                 ├─> STG.CUST_ACCT_LINK ─> MART.SEGMENT_BALANCE_SUMMARY

Project Structure

sql-etl-visualizer/
├── sql_visualizer/
│   ├── sql_parser.py           # Parse SQL to extract lineage
│   ├── graph_builder.py        # Construct dependency DAG
│   ├── cli.py                  # CLI interface
│   └── app_streamlit.py        # Streamlit UI (PyVis)
├── examples/sql/               # Sample SQL scripts
├── pyproject.toml              # Project configuration
└── README.md                   # Documentation

Configuration

pyproject.toml includes all dependencies and dev tools.

Main dependencies:

  • sqlglot for SQL parsing
  • networkx for graph modeling
  • pyvis for visualization
  • streamlit for UI

Development tools:

  • ruff for linting
  • pytest for testing

Development

To test the parser:

pytest -v

To lint:

ruff check .

Next Steps

  1. Add layer-based coloring to project, dataset, and table/view in UI.
  2. Include metadata tooltips (file, modified date, statement ID).
  3. Add export buttons for PNG/JSON from Streamlit UI.
  4. Integrate basic unit tests for parser and graph validation.

License

MIT License.


About

Parses SQL ETL scripts and visualizes end-to-end lineage DAGs.

Resources

License

Stars

Watchers

Forks

Languages