A fully browser-based data wrangling tool built with React, TypeScript, and MUI. Upload a CSV or JSON file, clean and transform its contents, explore it through an interactive data grid, and visualise key statistics — all without a backend, a server, or any data ever leaving your machine.
Designed as a portfolio project that demonstrates the intersection of frontend engineering and data analytics: real-world ETL patterns (extract, transform, load) implemented entirely in the browser using Web Workers, virtualised rendering, and reactive state management.
- Features
- Tech Stack
- Architecture
- Data Flow
- Project Structure
- Views
- Transformation Operations
- Analytics & Aggregations
- Performance Design
- Responsive Design
- Getting Started
- Scripts
- Privacy
| Feature | Detail |
|---|---|
| Local-first | No data ever leaves your machine. All processing happens entirely in the browser using the File API and Web Worker threads. |
| Web Worker ingestion | CSV and JSON files are parsed off the main thread using a dedicated dataParser.worker.ts and PapaParse, keeping the UI fully interactive during ingestion of files up to 2 GB. Progress messages (row count) stream back to the UI via postMessage. |
| Virtualised grid | 60 FPS scrolling through datasets with 500,000+ rows. TanStack Virtual renders only ~30 DOM rows at a time using absolute positioning; the full virtual scroll height is maintained so the scrollbar behaves correctly. Columns are sortable (click header) and resizable (drag handle). |
| Transformation engine | A second transformer.worker.ts applies data-cleaning operations off the main thread. The original dataset is frozen in the store — a Reset to Original button is always available when rows have been removed. |
| Analytics dashboard | Recharts-powered visualisations: per-column null distribution (bar + pie), top-N value frequency histogram, numeric column distribution histogram with Min/Max/Mean/Median/Std Dev. |
| Live stats bar | A compact stats row above the grid shows live row count, column count, null-cell percentage (colour-coded warning/danger), and rows removed vs the original. |
| CSV export | The current (possibly cleaned) dataset is serialised back to CSV using PapaParse.unparse() and downloaded as a Blob — no server round-trip. |
| Responsive UI | Sidebar collapses to a hamburger-triggered temporary drawer on mobile; all panels and charts reflow gracefully across screen sizes. |
| Layer | Library | Version |
|---|---|---|
| Framework | React + TypeScript (strict) | React 19, TS 6 |
| Build tool | Vite | 8 |
| UI components | MUI (Material Design) | v7 |
| State management | Zustand | v5 |
| Table logic | TanStack Table (headless) | v8 |
| Row virtualisation | TanStack Virtual | v3 |
| CSV parsing & export | PapaParse | v5 |
| File ingestion UI | react-dropzone | v15 |
| Charts | Recharts | v3 |
| Linting | ESLint (strict TS) + Prettier | ESLint 9, Prettier 3 |
┌─────────────────────────────────────────────────────────────────┐
│ Browser — main thread │
│ │
│ React UI │
│ ├── Sidebar / Topbar (layout, navigation) │
│ │ │
│ ├── UploadZone ──────── postMessage({file}) ──────────────▶ │
│ │ dataParser │
│ │ ◀── { type:'progress', loaded } ◀───── .worker.ts │
│ │ ◀── { type:'complete', data, columns } ◀── (PapaParse) │
│ │ │
│ ├── DataGrid ◀──────────── Zustand selector ──────────────── │
│ │ (TanStack Table v8 + TanStack Virtual v3) │
│ │ │
│ ├── TransformPanel ─── postMessage({op, col, data}) ───────▶ │
│ │ transformer │
│ │ ◀── { type:'complete', data, affected } ◀─ .worker.ts │
│ │ (pure fns) │
│ │ │
│ └── AnalyticsPanel ◀─── Zustand + useMemo ────────────────── │
│ (Recharts) (aggregations.ts) │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Zustand Store │ │
│ │ { data, originalData, columns, isLoading, loadingMsg } │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Why Web Workers for parsing?
Parsing a 500,000-row CSV with PapaParse on the main thread blocks the JavaScript event loop for several seconds, freezing the UI. By offloading to a dedicated worker, the spinner, progress counter, and any other interactions remain fully responsive. The worker sends chunked progress messages as PapaParse streams through the file, giving the UI real-time row-count feedback before the full dataset arrives.
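The message protocol described above can be sketched as a discriminated union plus a small function that maps each worker message to the loading state shown in the UI. This is a minimal sketch, not the project's actual code: the type names, the `message` field on the error variant, and `nextLoadingState` are assumptions made for illustration.

```typescript
// Hypothetical message shapes. Field names follow the architecture diagram,
// but the real worker's types may differ.
type DataRow = Record<string, unknown>;

type ParserMessage =
  | { type: 'progress'; loaded: number }
  | { type: 'complete'; data: DataRow[]; columns: string[] }
  | { type: 'error'; message: string }; // `message` is an assumed field

interface LoadingState {
  isLoading: boolean;
  loadingMsg: string;
}

// Maps one worker message to the loading state the UI should display.
function nextLoadingState(msg: ParserMessage): LoadingState {
  switch (msg.type) {
    case 'progress':
      // Progress keeps the spinner up and refreshes the row counter.
      return { isLoading: true, loadingMsg: `Loaded ${msg.loaded} rows…` };
    case 'complete':
    case 'error':
      // Both terminal messages clear the loading indicator.
      return { isLoading: false, loadingMsg: '' };
  }
}
```

Typing the protocol as a union lets the compiler check that every message type is handled on the main thread.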
Why a second Web Worker for transforms?
Data cleaning operations iterate over every cell in the dataset. On a 500k-row file with 20 columns that's 10 million cell reads/writes, enough to block the main thread for 200–500 ms. The transformer worker receives the entire data array via postMessage (structured clone), processes it with pure functions, and sends back the result with an affected count. The per-cell work never runs on the main thread; only the structured-clone copy of the array in each direction does.
Why TanStack Virtual with absolute positioning?
Rendering a DOM element for every data row would create hundreds of thousands of nodes, causing catastrophic memory usage and complete loss of rendering performance. The virtualiser computes which rows fall within the current scroll viewport and renders only those (~30 at any time). Each row is absolutely positioned with transform: translateY(...) inside a container whose height equals the full virtual scroll size, so the native scrollbar tracks position accurately.
Why Zustand with per-component selectors?
Passing the full data array as a prop to every nested component causes the entire tree to re-render whenever any part of state changes. Zustand's selector API lets each component subscribe only to the exact slice it needs — DataGrid subscribes to data and columns, Topbar subscribes to rowCount, colCount, and isLoading independently. Re-renders are isolated to components that actually care about the changed value.
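To illustrate why selector subscriptions isolate re-renders, here is a tiny hand-rolled store. It is a stand-in written for this example and is not Zustand's implementation; `createStore`, `State`, and the counters are hypothetical names. A subscriber is notified only when the slice it selected actually changes.

```typescript
// A minimal stand-in for a selector-based store (NOT Zustand itself).
interface State {
  data: number[];
  isLoading: boolean;
}

type Listener = () => void;

function createStore(initial: State) {
  let state = initial;
  const listeners = new Set<Listener>();

  return {
    setState(partial: Partial<State>): void {
      state = { ...state, ...partial };
      listeners.forEach((listener) => listener());
    },
    // Calls onChange only when the selected slice changes (Object.is comparison).
    subscribe<T>(selector: (s: State) => T, onChange: (slice: T) => void): () => void {
      let previous = selector(state);
      const listener: Listener = () => {
        const next = selector(state);
        if (!Object.is(previous, next)) {
          previous = next;
          onChange(next);
        }
      };
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

// Two independent "components", each counting how often it would re-render.
const store = createStore({ data: [], isLoading: false });
let rowCountRenders = 0;
let loadingRenders = 0;
store.subscribe((s) => s.data.length, () => { rowCountRenders += 1; });
store.subscribe((s) => s.isLoading, () => { loadingRenders += 1; });

store.setState({ isLoading: true }); // only the isLoading subscriber fires
store.setState({ data: [1, 2, 3] }); // only the row-count subscriber fires
```

After both updates each counter is 1: neither subscriber was notified about the change it did not select.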
Why keep originalData separate?
Transforms are destructive by nature (dropping rows, mutating values). Storing the original snapshot separately lets the app offer a safe Reset to Original without re-parsing the file. originalData is set only once — when the first file is loaded — and is never mutated by transform operations.
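The snapshot rule can be expressed as three pure state transitions. This is a sketch under assumptions: the store is modelled as plain functions rather than a Zustand store, and `resetToOriginal` is a hypothetical name for the Reset to Original action.

```typescript
type DataRow = Record<string, unknown>;

interface DataState {
  data: DataRow[];
  originalData: DataRow[];
  columns: string[];
}

const emptyState: DataState = { data: [], originalData: [], columns: [] };

// Loading a file: the snapshot is captured only if none exists yet.
function setData(state: DataState, rows: DataRow[], columns: string[]): DataState {
  return {
    data: rows,
    originalData: state.originalData.length === 0 ? rows : state.originalData,
    columns,
  };
}

// Applying a transform: only `data` is replaced, the snapshot is untouched.
function updateData(state: DataState, rows: DataRow[]): DataState {
  return { ...state, data: rows };
}

// Hypothetical reset action: restore the snapshot without re-parsing.
function resetToOriginal(state: DataState): DataState {
  return { ...state, data: state.originalData };
}
```

Because transforms return new arrays instead of mutating rows in place, sharing the same array reference between data and originalData on first load is safe in this sketch.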
User drops/selects file
→ useDropzone fires onDrop
→ useDataParser.parse(file)
→ terminates any in-flight worker
→ creates new Worker(dataParser.worker.ts)
→ setLoading(true, 'Parsing filename…')
→ postMessage({ file })
dataParser.worker.ts receives file
├── .csv → Papa.parse(file, { header, dynamicTyping, chunk })
│ chunk callback → postMessage({ type:'progress', loaded })
│ complete callback → postMessage({ type:'complete', data, columns })
└── .json → file.text() → JSON.parse → normalizeJsonData
→ postMessage({ type:'complete', data, columns })
useDataParser.worker.onmessage
├── 'progress' → setLoading(true, 'Loaded N rows…')
├── 'complete' → setData(data, columns) → setLoading(false)
└── 'error' → setLoading(false) + console.error
useDataStore.setData(rows, columns)
→ data = rows
→ originalData = rows (only if originalData was empty)
→ columns = columns
App.useEffect([rowCount, isLoading])
→ when rowCount > 0 && !isLoading → setActiveView('grid')
User selects operation + target column → clicks Apply
→ useTransformer.run({ operation, column, fillValue?, onComplete, onError })
→ terminates any in-flight worker
→ creates new Worker(transformer.worker.ts)
→ setLoading(true, 'Applying operation…')
→ postMessage({ operation, column, data: currentData, fillValue? })
transformer.worker.ts receives message
→ dispatches to pure function: dropNulls | trimWhitespace | parseDates | castToNumber | fillNulls
→ each fn returns { data: DataRow[], affected: number }
→ postMessage({ type:'complete', data, affected })
useTransformer.worker.onmessage
├── 'complete' → updateData(data) → setLoading(false) → onComplete(affected)
└── 'error' → setLoading(false) → onError(message)
useDataStore.updateData(rows)
→ data = rows (originalData untouched)
User clicks Export CSV
→ useExport.exportCsv()
→ Papa.unparse(data, { columns })
→ new Blob([csv], { type: 'text/csv' })
→ URL.createObjectURL(blob)
→ programmatic <a> click → download
→ URL.revokeObjectURL(url)
zen-data/
├── public/
│ ├── favicon.svg
│ ├── icons.svg
│ └── zen-data.svg # App logo
├── src/
│ ├── components/
│ │ ├── charts/
│ │ │ ├── AnalyticsPanel.tsx # Full analytics view: null dist, value freq, histograms
│ │ │ └── DataSummary.tsx # Compact live stats bar above the data grid
│ │ ├── grid/
│ │ │ └── DataGrid.tsx # Virtualised, sortable, resizable table (TanStack)
│ │ ├── layout/
│ │ │ ├── Sidebar.tsx # Responsive nav drawer (permanent ↔ temporary)
│ │ │ └── Topbar.tsx # Fixed AppBar: dataset info + export + hamburger
│ │ ├── transform/
│ │ │ └── TransformPanel.tsx # Data cleaning operations UI + dataset summary
│ │ └── upload/
│ │ └── UploadZone.tsx # Drag-and-drop / file-picker ingestion UI
│ ├── hooks/
│ │ ├── useDataParser.ts # Worker lifecycle: spawn → stream progress → complete
│ │ ├── useExport.ts # PapaParse unparse → Blob → programmatic download
│ │ └── useTransformer.ts # Worker lifecycle: spawn → apply op → update store
│ ├── store/
│ │ └── useDataStore.ts # Zustand store: data, originalData, columns, loading
│ ├── theme/
│ │ └── theme.ts # MUI dark enterprise theme + global component overrides
│ ├── utils/
│ │ └── aggregations.ts # Pure aggregation functions (no side-effects)
│ ├── workers/
│ │ ├── dataParser.worker.ts # PapaParse streaming + JSON array normaliser
│ │ └── transformer.worker.ts # Pure transform fns: drop, trim, cast, dates, fill
│ ├── App.tsx # Root layout, view router, mobile drawer state
│ └── main.tsx # React 19 root mount + MUI ThemeProvider
├── index.html
├── vite.config.ts
├── tsconfig.app.json
├── eslint.config.js
└── package.json
Drag-and-drop zone powered by react-dropzone. Accepts .csv and .json files up to 2 GB. Shows a loading spinner with a live row counter during parsing. Once parsing completes, the app automatically navigates to the Data Grid view.
A headless, virtualised table built with TanStack Table v8 and TanStack Virtual v3.
- Columns are defined dynamically from the dataset's header row (or JSON keys).
- Sorting is handled client-side by TanStack Table's getSortedRowModel; click any column header to sort ascending/descending.
- Column resizing is done via a drag handle on every header cell (columnResizeMode: 'onChange').
- Virtualisation renders only the rows in the current viewport (~30 at a time) using absolute positioning and transform: translateY(...) for GPU-accelerated scrolling.
- A row index column (#) is prepended automatically and is never sortable or resizable.
- Null values are rendered in red italic null text to make data quality issues immediately visible.
- Numeric values use the tabular-nums font feature for aligned columns.
- The DataSummary bar above the grid shows live null stats and rows-removed indicator.
A two-panel layout (stacks vertically on mobile):
- Left panel — Operation selector with a target-column dropdown and a fill-value input. Each operation card shows a description and an Apply button.
- Right panel — Dataset summary stats (total rows, original rows, column count, null count) plus a per-column null chip grid. Clicking a chip sets that column as the target. A Reset to Original button appears whenever rows have been removed.
All operations run in the transformer Web Worker and report back the number of affected rows via a toast notification.
Three sections of Recharts visualisations:
- Overview — StatBox cards for rows, columns, null cells, numeric column count, categorical column count.
- Null distribution — Bar chart of null percentage per column (green < 5%, orange 5–20%, red > 20%) alongside a donut chart of overall completeness.
- Value distribution — Top-25 value frequency bar chart for any selected column.
- Numeric distribution — Descriptor stats (Min, Max, Mean, Median, Std Dev, Non-null count) and a 20-bin histogram for any numeric column.
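The null-distribution chart needs a per-column null percentage and a colour band. The sketch below assumes the same null rule as Drop Null Rows (null, undefined, or ""); the `NullEntry` field names and the `severity` helper are hypothetical, since the README only states that nullsByColumn returns NullEntry[].

```typescript
type DataRow = Record<string, unknown>;

// Assumed shape; the real NullEntry may use different field names.
interface NullEntry {
  column: string;
  nullCount: number;
  nullPct: number; // 0 to 100
}

// Assumption: null, undefined, and "" all count as null.
const isNullish = (value: unknown): boolean =>
  value === null || value === undefined || value === '';

function nullsByColumn(data: DataRow[], columns: string[]): NullEntry[] {
  return columns.map((column) => {
    const nullCount = data.reduce((n, row) => n + (isNullish(row[column]) ? 1 : 0), 0);
    const nullPct = data.length === 0 ? 0 : (nullCount / data.length) * 100;
    return { column, nullCount, nullPct };
  });
}

// Colour bands from the list above: green < 5%, orange 5-20%, red > 20%.
function severity(nullPct: number): 'green' | 'orange' | 'red' {
  if (nullPct < 5) return 'green';
  if (nullPct <= 20) return 'orange';
  return 'red';
}
```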
All operations are implemented as pure functions in transformer.worker.ts and execute entirely off the main thread.
| Operation | Target | Behaviour |
|---|---|---|
| Drop Null Rows | Single column or All | Filters out any row where the target column(s) contain null, undefined, or "". Reports rows removed. |
| Trim Whitespace | Single column or All | Strips leading and trailing whitespace from string values. Reports rows where at least one cell changed. |
| Cast to Number | Single column | Converts string values to float using Number(value.replace(/,/g, '')). Values that cannot be parsed are set to null. Reports cells changed. |
| Parse Dates | Single column | Matches strings against ISO, MM/DD/YYYY, DD-MM-YYYY, and long-form patterns. Valid dates are normalised to ISO 8601 (YYYY-MM-DD). Invalid strings are left unchanged. |
| Fill Nulls | Single column or All | Replaces null, undefined, and "" with a constant value you type in. Accepts strings and numbers. Reports cells changed. |
The originalData snapshot in the Zustand store is preserved across all transform operations. Resetting restores the full original dataset without re-parsing.
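Three of the operations in the table can be sketched as pure functions that return a new array plus an affected count. This is an illustrative sketch, not the contents of transformer.worker.ts: the single-column signatures omit the "All" target, and `TransformResult` and `isNullish` are assumed names. Note that Number('') evaluates to 0 in JavaScript, so the sketch treats blank strings as unparseable; whether the real worker does the same is not stated.

```typescript
type DataRow = Record<string, unknown>;

interface TransformResult {
  data: DataRow[];
  affected: number;
}

const isNullish = (value: unknown): boolean =>
  value === null || value === undefined || value === '';

// Drop Null Rows: removes rows whose target cell is null, undefined, or "".
function dropNulls(data: DataRow[], column: string): TransformResult {
  const kept = data.filter((row) => !isNullish(row[column]));
  return { data: kept, affected: data.length - kept.length };
}

// Cast to Number: strips thousands separators, then parses.
// Strings that cannot be parsed (including blank ones) become null.
function castToNumber(data: DataRow[], column: string): TransformResult {
  let affected = 0;
  const out = data.map((row) => {
    const value = row[column];
    if (typeof value !== 'string') return row; // already non-string: leave as-is
    const stripped = value.replace(/,/g, '');
    const parsed = stripped.trim() === '' ? NaN : Number(stripped);
    affected += 1;
    return { ...row, [column]: Number.isNaN(parsed) ? null : parsed };
  });
  return { data: out, affected };
}

// Fill Nulls: replaces null-like cells with a constant.
function fillNulls(data: DataRow[], column: string, fillValue: string | number): TransformResult {
  let affected = 0;
  const out = data.map((row) => {
    if (!isNullish(row[column])) return row;
    affected += 1;
    return { ...row, [column]: fillValue };
  });
  return { data: out, affected };
}
```

Each function copies only the rows it changes and never mutates its input, which is what keeps the originalData snapshot intact.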
All analytical computations live in src/utils/aggregations.ts as pure, side-effect-free functions memoised with useMemo in the component layer.
| Function | Description |
|---|---|
| countByValue(data, column, topN=25) | Builds a frequency map, sorts by count descending, returns top N { name, count } entries. Nulls are represented as "(null)". |
| nullsByColumn(data, columns) | Per-column null count and percentage as NullEntry[]. Used for the null distribution bar chart. |
| numericStats(data, column) | Min, max, mean, median (exact middle), population std dev, non-null count. Values are coerced from strings if needed. Returns null if no numeric values exist. |
| histogramBins(data, column, bins=20) | Equal-width bins over [min, max]. Each bin is { range, count, from, to }. Returns [] if all values are identical. |
| isNumericColumn(data, column) | Samples up to 200 rows; classifies as numeric if > 80% of non-null values parse as a number. Used to split columns into numeric vs categorical for chart selectors. |
| overallNullStats(data, columns) | Total cells, null cells, and null percentage across the entire dataset. Used in the DataSummary bar and the Overview section. |
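Two of the functions in the table, numericStats and histogramBins, can be sketched as follows. This is a sketch under assumptions rather than the code in aggregations.ts: the `NumericStats` field names, the `toNumber` helper, the range label format, and averaging the two middle values for an even count are all choices made for this example.

```typescript
type DataRow = Record<string, unknown>;

interface NumericStats {
  min: number; max: number; mean: number; median: number; stdDev: number; count: number;
}

interface Bin { range: string; count: number; from: number; to: number }

// Coerces a cell to a finite number; returns null for anything else.
function toNumber(value: unknown): number | null {
  if (typeof value === 'number') return Number.isFinite(value) ? value : null;
  if (typeof value === 'string' && value.trim() !== '') {
    const parsed = Number(value);
    return Number.isFinite(parsed) ? parsed : null;
  }
  return null;
}

function numericValues(data: DataRow[], column: string): number[] {
  return data.map((row) => toNumber(row[column])).filter((v): v is number => v !== null);
}

function numericStats(data: DataRow[], column: string): NumericStats | null {
  const values = numericValues(data, column).sort((a, b) => a - b);
  const count = values.length;
  if (count === 0) return null;
  const mean = values.reduce((sum, v) => sum + v, 0) / count;
  const mid = Math.floor(count / 2);
  // Assumption: an even count averages the two middle values.
  const median = count % 2 === 1 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
  // Population variance: divide by count, not count - 1.
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / count;
  return { min: values[0], max: values[count - 1], mean, median, stdDev: Math.sqrt(variance), count };
}

function histogramBins(data: DataRow[], column: string, bins = 20): Bin[] {
  const values = numericValues(data, column);
  if (values.length === 0) return [];
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  if (min === max) return []; // all values identical: no meaningful bins
  const width = (max - min) / bins;
  const out: Bin[] = Array.from({ length: bins }, (_, i) => {
    const from = min + i * width;
    const to = from + width;
    return { range: `${from.toFixed(2)} to ${to.toFixed(2)}`, count: 0, from, to };
  });
  for (const v of values) {
    // The maximum value is clamped into the last bin.
    const index = Math.min(Math.floor((v - min) / width), bins - 1);
    out[index].count += 1;
  }
  return out;
}
```

The min and max are found with a loop rather than Math.min(...values), since spreading a very large array into function arguments can exceed the engine's argument limit.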
PapaParse's chunk callback processes the CSV in streaming chunks rather than reading the entire file into memory before parsing. Each chunk appends to an accumulated array and fires a progress message to the main thread, so the UI displays a live row count. The worker is terminated immediately after complete fires to free memory.
TanStack Virtual calculates a virtualItems array containing only the rows within the visible viewport plus an overscan buffer of 15 rows. The table body is a single <Box> whose height equals rowVirtualizer.getTotalSize() (the full virtual height). Each visible row is absolutely positioned with transform: translateY(virtualRow.start) — a CSS property that triggers compositing rather than layout, enabling GPU-accelerated scrolling.
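The windowing arithmetic behind this can be sketched for the simplest case of a fixed row height. This is not TanStack Virtual's implementation; `visibleWindow` and its parameters are hypothetical, and the 36 px row height in the usage lines is an assumed value.

```typescript
interface VirtualWindow {
  startIndex: number;  // first rendered row (inclusive)
  endIndex: number;    // last rendered row (inclusive)
  offsetY: number;     // translateY of the first rendered row, in px
  totalHeight: number; // height of the scroll container's inner box, in px
}

// Computes which rows to render for a given scroll position,
// assuming every row has the same height.
function visibleWindow(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 15,
): VirtualWindow {
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  const startIndex = Math.max(0, firstVisible - overscan);
  const endIndex = Math.min(rowCount - 1, lastVisible + overscan);
  return {
    startIndex,
    endIndex,
    offsetY: startIndex * rowHeight,
    totalHeight: rowCount * rowHeight,
  };
}

// 500,000 rows of 36 px in a 540 px viewport, scrolled to row 1000.
const win = visibleWindow(36_000, 540, 36, 500_000);
// Renders rows 985 to 1029 (45 rows) out of 500,000.
```

The number of rendered rows depends only on the viewport height and the overscan, not on the dataset size, which is why scrolling cost stays flat as the row count grows.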
Total width of the table is calculated by summing all column sizes (table.getTotalSize()). The inner scroll wrapper is set to exactly this width, which causes the container's native horizontal scrollbar to appear when columns overflow the viewport.
Every component that reads from the Zustand store uses a field-level selector:
const rowCount = useDataStore((s) => s.data.length) // only re-renders on length change
const isLoading = useDataStore((s) => s.isLoading) // independent subscription

React.memo is applied to all view components and layout components (DataGrid, AnalyticsPanel, TransformPanel, Topbar, Sidebar) so they only re-render when their own props change.
Aggregation functions (countByValue, numericStats, etc.) are wrapped in useMemo with [data, column] dependencies inside AnalyticsPanel so expensive O(n) scans do not re-run on unrelated state updates.
The layout adapts across three breakpoints using MUI's responsive sx system ({ xs, sm, md } values):
| Breakpoint | Sidebar | Topbar | Main content |
|---|---|---|---|
| xs / sm (< 900 px) | Hidden; opens as a temporary drawer via hamburger button | Full width (left: 0), hamburger icon visible | Full width (ml: 0) |
| md+ (≥ 900 px) | Permanent fixed drawer (240 px) | Offset to the right of the sidebar | Left margin of 240 px |
Additional responsive adjustments:
- TransformPanel stacks its two panels vertically (flexDirection: column) below md.
- UploadZone reduces inner and outer padding on small screens.
- AnalyticsPanel scales padding and section gaps; the null-distribution bar chart becomes full-width before the pie chart on narrow viewports.
- DataSummary bar scrolls horizontally on mobile to prevent stats from being clipped.
- The Export CSV button label is hidden on xs (icon only) and the local-first chip is hidden on xs to keep the topbar uncluttered.
# Clone
git clone https://github.com/your-username/zen-data.git
cd zen-data
# Install dependencies
npm install
# Start development server (http://localhost:5173)
npm run dev

| Script | Command | Description |
|---|---|---|
| dev | vite | Starts the Vite dev server with HMR |
| build | tsc -b && vite build | Type-checks then produces an optimised production bundle |
| preview | vite preview | Serves the production build locally for verification |
| lint | eslint . | Runs ESLint with strict TypeScript rules |
| format | prettier --write "src/**/*.{ts,tsx}" | Formats all source files with Prettier |
Type-check without building:
npx tsc -b --noEmit

All data processing is strictly local-first:
- Files are read directly in the browser using the native File API — no upload, no network request.
- Parsing and transformation run inside Web Worker threads that are isolated to the browser tab.
- The Zustand store holds data only in JavaScript memory for the lifetime of the browser tab.
- Closing or refreshing the tab discards all data immediately.
- No analytics, telemetry, or tracking of any kind is present in this application.