A visual canvas for mapping data pipelines and interactively tracing column-level dependencies.
Built for data analysts and engineers who want to draw how data flows rather than describe it in text.
When refactoring complex analytical scripts or digging through legacy repositories, the same bottleneck keeps appearing: how to quickly trace the origin of a specific data attribute. Once a column passes through a sequence of joins, filters, renames, and custom transformations, its trail is easily lost in the codebase.
DataPav lets you build a visual graph of your pipeline and click any column to instantly highlight its full upstream path — all locally, with no backend or sign-up.
[Demo: drag columns between DataFrames to wire lineage edges]

[Demo: click any column to trace its full upstream path]

[Demo: search across all nodes and columns]
You need to modify the calculation logic of a user_segment column at the start of a script, but don't know where it gets renamed or used to compute downstream metrics like LTV. Map the script's core logic onto the canvas — clicking user_segment at the final output instantly isolates its lineage graph so you can refactor without breaking downstream dependencies.
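For illustration, a toy pandas script with the shape described above might look like this. Every table and column name except `user_segment` and the LTV metric is hypothetical, and the comments only suggest which node type would roughly represent each step on the canvas; the code is a reference point for what gets mapped, not something DataPav produces.

```python
import pandas as pd

# Hypothetical raw orders: one row per order.
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "revenue": [20.0, 30.0, 15.0, 10.0, 10.0, 40.0],
})

# The calculation you want to change lives here (GroupBy + derived column).
totals = orders.groupby("user_id", as_index=False)["revenue"].sum()
totals["user_segment"] = totals["revenue"].apply(lambda r: "high" if r >= 50 else "low")

# Rename node: after this line a text search for "user_segment" finds nothing downstream.
users = totals.rename(columns={"user_segment": "segment"})

# GroupBy node: the renamed column now drives the downstream LTV metric.
ltv = (
    users.groupby("segment", as_index=False)["revenue"]
    .mean()
    .rename(columns={"revenue": "ltv"})
)
print(ltv)
```

A plain search for `user_segment` stops at the `rename` call; only the lineage edge through the Rename node connects it to `ltv`.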
A final query produces an adjusted metric like revenue_adjusted across hundreds of lines and multiple CTEs. Instead of parsing the SQL from bottom to top, model the query visually: use Merge nodes for JOIN keys and Filter nodes for WHERE clauses. The origin of any metric becomes visible at a single glance.
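The same mapping can be sketched in pandas terms. Assuming a hypothetical `orders` table joined to a hypothetical `fx` rate table, each SQL clause lines up with one node: the JOIN with a Merge node, the WHERE with a Filter node, and the derived metric with a node that computes it (a Function node is one plausible choice).

```python
import pandas as pd

# Hypothetical inputs standing in for two CTEs.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "currency": ["EUR", "USD", "EUR", "GBP"],
    "amount": [100.0, 50.0, 80.0, 40.0],
    "status": ["paid", "paid", "refunded", "paid"],
})
fx = pd.DataFrame({"currency": ["EUR", "USD", "GBP"], "rate_to_usd": [1.1, 1.0, 1.25]})

# LEFT JOIN fx ON currency  ->  Merge node, key pair currency = currency, type "left"
joined = orders.merge(fx, on="currency", how="left")

# WHERE status = 'paid'  ->  Filter node
paid = joined[joined["status"] == "paid"]

# revenue_adjusted = amount * rate_to_usd  ->  node with inputs amount, rate_to_usd
result = paid.assign(revenue_adjusted=paid["amount"] * paid["rate_to_usd"])
print(result[["order_id", "revenue_adjusted"]])
```

Drawn on the canvas, `revenue_adjusted` would show two upstream inputs, `amount` from `orders` and `rate_to_usd` from `fx`, with the Filter node in between.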
Unused columns (large string fields, temporary metadata, redundant IDs) bloat RAM in in-memory pipelines. Trace the lineage of any attribute to see instantly whether it ever reaches a downstream aggregation or export. If it leads to a dead end, safely drop it early with `.drop(columns=[...])`.
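A minimal sketch of the payoff, using a hypothetical `raw_payload` string column that lineage shows to be a dead end: measure the frame with `DataFrame.memory_usage(deep=True)` before and after dropping it. Exact byte counts depend on the pandas version and string dtype, so none are quoted here.

```python
import pandas as pd

n = 10_000
df = pd.DataFrame({
    "user_id": range(n),
    "revenue": [1.0] * n,
    # Hypothetical wide text column that never reaches an aggregation or export.
    "raw_payload": ["x" * 200] * n,
})

before = df.memory_usage(deep=True).sum()

# Lineage shows raw_payload is a dead end, so drop it before any heavy steps.
df = df.drop(columns=["raw_payload"])

after = df.memory_usage(deep=True).sum()
print(f"{before:,} bytes -> {after:,} bytes")
```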
| Node | Purpose |
|---|---|
| DataFrame | Table with typed columns — the main building block |
| Merge | Visual JOIN with key pairs and join type (inner / left / right / outer) |
| Filter | Multi-condition WHERE builder with @column syntax |
| Transform | Type casts, fillna, drop columns, drop duplicates |
| Rename | Column rename mappings |
| GroupBy | Group-by keys + aggregations (sum, mean, count, nunique…) |
| Function | Free-form transformation with named inputs, outputs, and pass-through linking |
| Comment | Resizable sticky note with @ref highlighting |
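For orientation, the four options of the Transform node roughly correspond to the pandas calls below. This is a sketch on hypothetical data, not the tool's own code, and the order in which DataPav applies these options inside one node is not specified here.

```python
import pandas as pd

# Hypothetical input with string IDs, missing values, a throwaway column and a duplicate row.
raw = pd.DataFrame({
    "user_id": ["1", "2", "2", "3"],
    "country": ["DE", None, None, "FR"],
    "debug_note": ["a", "b", "b", "c"],
})

clean = (
    raw.astype({"user_id": "int64"})      # type cast
    .fillna({"country": "unknown"})       # fillna
    .drop(columns=["debug_note"])         # drop columns
    .drop_duplicates()                    # drop duplicates
    .reset_index(drop=True)
)
print(clean)
```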
- Column-level lineage: drag a column from one DataFrame onto another to copy it and auto-draw a lineage edge
- Attribute Tracker (`◎`): click any column to highlight its full upstream/downstream path; everything else fades out
- Search (`⌘K`): fuzzy search across all node labels and column names
- Auto-layout: one-click left-to-right arrangement via dagre
- Undo / Redo: history of up to 50 snapshots
- Multiple tabs: work on several canvases in the same session
- Save / Load JSON: full canvas state in a portable file
- Share link: compressed URL encoding the full canvas (~40 KB JSON → ~3–5 KB URL)
- Copy / Paste canvas: `⌃⇧C` / `⌃⇧V`
- SQL export / import: generate SELECT queries from the graph or scaffold nodes from a query
- Export PNG: 3× pixel-ratio render of the current viewport
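Conceptually, the Attribute Tracker answers a graph reachability question: starting from the clicked column, follow lineage edges backwards for the upstream path or forwards for the downstream path. The sketch assumes a hypothetical edge list of `(upstream, downstream)` column pairs keyed as `"node.column"`; DataPav's internal data model may differ.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges as (upstream column, downstream column) pairs.
EDGES = [
    ("orders.revenue", "totals.revenue"),
    ("totals.revenue", "totals.user_segment"),
    ("totals.user_segment", "users.segment"),
    ("totals.revenue", "users.revenue"),
    ("users.segment", "ltv.segment"),
    ("users.revenue", "ltv.ltv"),
]


def trace(start, edges, direction="upstream"):
    """Return every column reachable from `start`, excluding `start` itself.

    "upstream" walks edges backwards towards sources; "downstream" walks
    them forwards towards outputs. Breadth-first with a visited set, so
    cycles cannot cause an infinite loop.
    """
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    neighbours = defaultdict(list)
    for src, dst in edges:
        if direction == "upstream":
            neighbours[dst].append(src)
        else:
            neighbours[src].append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        column = queue.popleft()
        for nxt in neighbours[column]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


print(sorted(trace("ltv.segment", EDGES)))
# ['orders.revenue', 'totals.revenue', 'totals.user_segment', 'users.segment']
```

In this model, the clicked column plus the returned set stays highlighted and everything else fades out. The visited set keeps the walk linear in the number of edges.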
Everything runs in the browser and canvas state is kept in localStorage; no data is sent to external servers. Canvases can be exported and imported as local JSON files, and the app can be run offline via npm.
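Share links need no server because the whole canvas travels inside the URL. DataPav uses pako, a JavaScript port of zlib, for the compression step; this Python sketch uses the standard-library `zlib` module as a stand-in. The base64url encoding, the `#canvas=` fragment name and the sample canvas are assumptions for illustration, and real compression ratios depend on the canvas content.

```python
import base64
import json
import zlib


def encode_canvas(state: dict) -> str:
    """Compact JSON -> deflate (zlib format) -> URL-safe base64 without padding."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    packed = zlib.compress(raw, 9)
    return base64.urlsafe_b64encode(packed).decode("ascii").rstrip("=")


def decode_canvas(token: str) -> dict:
    """Reverse of encode_canvas: restore padding, decode, inflate, parse."""
    padded = token + "=" * (-len(token) % 4)
    packed = base64.urlsafe_b64decode(padded)
    return json.loads(zlib.decompress(packed).decode("utf-8"))


# Hypothetical canvas with many similar nodes; repetitive JSON compresses well.
canvas = {
    "nodes": [
        {"id": f"df{i}", "type": "DataFrame", "columns": ["user_id", "revenue", "user_segment"]}
        for i in range(200)
    ]
}
token = encode_canvas(canvas)
url = f"http://localhost:3000/#canvas={token}"  # hypothetical fragment name
print(len(json.dumps(canvas)), "->", len(token), "characters")
```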
To run DataPav locally:

```bash
git clone https://github.com/PavelLuchkov/dataloom.git
cd dataloom
npm install
npm start  # opens at http://localhost:3000
```

If port 3000 is occupied:

```bash
PORT=3001 npm start
```
On first load the app shows a built-in demo canvas with all node types, lineage edges, and usage tips.
| Shortcut | Action |
|---|---|
| `⌘K` | Open search |
| `⌃⇧F` | Toggle attribute tracker |
| `⌘Z` / `⌘Y` | Undo / Redo |
| `⌘C` / `⌘D` | Copy / Paste selected nodes |
| `⌃⇧C` / `⌃⇧V` | Copy / Paste full canvas |
| `Delete` | Remove selected nodes or edge |
| `@` in Filter | Autocomplete column from connected node |
Planned improvements:
- Saved DataFrames: save frequently used DataFrames to a personal pack and reuse them across all canvases
- Expandable Function nodes: Function nodes that contain internal logic can be expanded to walk through their internal paths
- Smoother refactoring: in the current version, refactoring a node in the middle of a path breaks its connections
| Tool | When to choose DataPav instead |
|---|---|
| Miro / Draw.io | When you need interactive column-level trace, not just static boxes and arrows |
| dbt docs | For projects built without dbt, or quick sketching without boilerplate YAML |
| Enterprise platforms (OpenLineage, Monte Carlo) | When you need a zero-config local tool without DevOps or compliance overhead |
Built with:
- React Flow: canvas, nodes, handles, edges
- React (Create React App) + Tailwind CSS v3
- dagre: auto-layout
- pako: canvas compression for share links
- html-to-image: PNG export
Released under the MIT License.


