Zen Data

A fully browser-based data wrangling tool built with React, TypeScript, and MUI. Upload a CSV or JSON file, clean and transform its contents, explore it through an interactive data grid, and visualise key statistics — all without a backend, a server, or any data ever leaving your machine.

Designed as a portfolio project that demonstrates the intersection of frontend engineering and data analytics: real-world ETL patterns (extract, transform, load) implemented entirely in the browser using Web Workers, virtualised rendering, and reactive state management.

Features

Feature	Detail
Local-first	No data ever leaves your machine. All processing happens entirely in the browser using the File API and Web Worker threads.
Web Worker ingestion	CSV and JSON files are parsed off the main thread using a dedicated `dataParser.worker.ts` and PapaParse, keeping the UI fully interactive during ingestion of files up to 2 GB. Progress messages (row count) stream back to the UI via `postMessage`.
Virtualised grid	60 FPS scrolling through datasets with 500,000+ rows. TanStack Virtual renders only ~30 DOM rows at a time using absolute positioning; the full virtual scroll height is maintained so the scrollbar behaves correctly. Columns are sortable (click header) and resizable (drag handle).
Transformation engine	A second `transformer.worker.ts` applies data-cleaning operations off the main thread. The original dataset is frozen in the store — a Reset to Original button is always available when rows have been removed.
Analytics dashboard	Recharts-powered visualisations: per-column null distribution (bar + pie), top-N value frequency histogram, numeric column distribution histogram with Min/Max/Mean/Median/Std Dev.
Live stats bar	A compact stats row above the grid shows live row count, column count, null-cell percentage (colour-coded warning/danger), and rows removed vs the original.
CSV export	The current (possibly cleaned) dataset is serialised back to CSV using `PapaParse.unparse()` and downloaded as a Blob — no server round-trip.
Responsive UI	Sidebar collapses to a hamburger-triggered temporary drawer on mobile; all panels and charts reflow gracefully across screen sizes.

Tech Stack

Layer	Library	Version
Framework	React + TypeScript (strict)	React 19, TS 6
Build tool	Vite	8
UI components	MUI (Material Design)	v7
State management	Zustand	v5
Table logic	TanStack Table (headless)	v8
Row virtualisation	TanStack Virtual	v3
CSV parsing & export	PapaParse	v5
File ingestion UI	react-dropzone	v15
Charts	Recharts	v3
Linting	ESLint (strict TS) + Prettier	ESLint 9, Prettier 3

Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Browser — main thread                                          │
│                                                                 │
│  React UI                                                       │
│  ├── Sidebar / Topbar (layout, navigation)                      │
│  │                                                              │
│  ├── UploadZone ──────── postMessage({file}) ──────────────▶   │
│  │                                               dataParser     │
│  │   ◀── { type:'progress', loaded }  ◀─────    .worker.ts     │
│  │   ◀── { type:'complete', data, columns } ◀──  (PapaParse)   │
│  │                                                              │
│  ├── DataGrid ◀──────────── Zustand selector ────────────────  │
│  │   (TanStack Table v8 + TanStack Virtual v3)                  │
│  │                                                              │
│  ├── TransformPanel ─── postMessage({op, col, data}) ───────▶  │
│  │                                               transformer    │
│  │   ◀── { type:'complete', data, affected } ◀─  .worker.ts    │
│  │                                               (pure fns)     │
│  │                                                              │
│  └── AnalyticsPanel ◀─── Zustand + useMemo ──────────────────  │
│      (Recharts)           (aggregations.ts)                     │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  Zustand Store                                           │  │
│  │  { data, originalData, columns, isLoading, loadingMsg }  │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Key design decisions

Why Web Workers for parsing? Parsing a 500,000-row CSV with PapaParse on the main thread blocks the JavaScript event loop for several seconds, freezing the UI. By offloading to a dedicated worker, the spinner, progress counter, and any other interactions remain fully responsive. The worker sends chunked progress messages as PapaParse streams through the file, giving the UI real-time row-count feedback before the full dataset arrives.

Why a second Web Worker for transforms? Data cleaning operations iterate over every cell in the dataset. On a 500k-row file with 20 columns that's 10 million cell reads/writes, enough to block the main thread for 200–500 ms. The transformer worker receives the entire data array via postMessage (structured clone), processes it with pure functions, and sends back the result with an affected count. The main thread pays only for cloning the data in and out; the per-cell work itself never runs on it.

Why TanStack Virtual with absolute positioning? Rendering a DOM element for every data row would create hundreds of thousands of nodes, causing catastrophic memory usage and complete loss of rendering performance. The virtualiser computes which rows fall within the current scroll viewport and renders only those (~30 at any time). Each row is absolutely positioned with transform: translateY(...) inside a container whose height equals the full virtual scroll size, so the native scrollbar tracks position accurately.

Why Zustand with per-component selectors? Passing the full data array as a prop to every nested component causes the entire tree to re-render whenever any part of state changes. Zustand's selector API lets each component subscribe only to the exact slice it needs — DataGrid subscribes to data and columns, Topbar subscribes to rowCount, colCount, and isLoading independently. Re-renders are isolated to components that actually care about the changed value.

Why keep originalData separate? Transforms are destructive by nature (dropping rows, mutating values). Storing the original snapshot separately lets the app offer a safe Reset to Original without re-parsing the file. originalData is set only once — when the first file is loaded — and is never mutated by transform operations.
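The snapshot rule can be sketched as three pure state transitions. This is a minimal illustration, not the store's actual code: the real logic lives in a Zustand store, and `emptyState` and `resetToOriginal` are hypothetical names introduced here.

```typescript
// Hypothetical types and names; the real store lives in src/store/useDataStore.ts.
type DataRow = Record<string, unknown>;

interface DataState {
  data: DataRow[];
  originalData: DataRow[];
  columns: string[];
}

const emptyState: DataState = { data: [], originalData: [], columns: [] };

// File load: capture the snapshot only if none exists yet.
function setData(state: DataState, rows: DataRow[], columns: string[]): DataState {
  return {
    data: rows,
    originalData: state.originalData.length === 0 ? rows : state.originalData,
    columns,
  };
}

// Transform result: replace the working copy, leave the snapshot alone.
function updateData(state: DataState, rows: DataRow[]): DataState {
  return { ...state, data: rows };
}

// Reset: point the working copy back at the snapshot, no re-parse needed.
function resetToOriginal(state: DataState): DataState {
  return { ...state, data: state.originalData };
}
```

The sketch relies on transforms returning a new array rather than mutating rows in place. Results that come back from a worker are fresh copies, so the snapshot and the working copy never share mutated rows.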

Data Flow

Ingestion

User drops/selects file
  → useDropzone fires onDrop
  → useDataParser.parse(file)
    → terminates any in-flight worker
    → creates new Worker(dataParser.worker.ts)
    → setLoading(true, 'Parsing filename…')
    → postMessage({ file })

dataParser.worker.ts receives file
  ├── .csv  → Papa.parse(file, { header, dynamicTyping, chunk })
  │     chunk callback → postMessage({ type:'progress', loaded })
  │     complete callback → postMessage({ type:'complete', data, columns })
  └── .json → file.text() → JSON.parse → normalizeJsonData
              → postMessage({ type:'complete', data, columns })

useDataParser.worker.onmessage
  ├── 'progress' → setLoading(true, 'Loaded N rows…')
  ├── 'complete' → setData(data, columns) → setLoading(false)
  └── 'error'   → setLoading(false) + console.error

useDataStore.setData(rows, columns)
  → data = rows
  → originalData = rows  (only if originalData was empty)
  → columns = columns

App.useEffect([rowCount, isLoading])
  → when rowCount > 0 && !isLoading → setActiveView('grid')
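The message protocol in the ingestion flow above maps naturally onto a discriminated union. The following is a sketch under assumptions: the field name on the `'error'` message and the `reportError` callback (standing in for `console.error`) are hypothetical, and the real handler lives inside `useDataParser`.

```typescript
type DataRow = Record<string, unknown>;

// 'progress' and 'complete' follow the flow above; the 'error' payload field is an assumption.
type ParserMessage =
  | { type: 'progress'; loaded: number }
  | { type: 'complete'; data: DataRow[]; columns: string[] }
  | { type: 'error'; message: string };

// Only the store actions this handler needs.
interface StoreActions {
  setLoading: (loading: boolean, msg?: string) => void;
  setData: (rows: DataRow[], columns: string[]) => void;
}

function handleParserMessage(
  msg: ParserMessage,
  store: StoreActions,
  reportError: (message: string) => void, // hypothetical stand-in for console.error
): void {
  if (msg.type === 'progress') {
    store.setLoading(true, `Loaded ${msg.loaded} rows…`);
  } else if (msg.type === 'complete') {
    store.setData(msg.data, msg.columns);
    store.setLoading(false);
  } else {
    store.setLoading(false);
    reportError(msg.message);
  }
}
```

Typing the union once and sharing it between the worker and the hook lets the compiler catch a message shape that one side sends and the other does not handle.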

Transformation

User selects operation + target column → clicks Apply
  → useTransformer.run({ operation, column, fillValue?, onComplete, onError })
    → terminates any in-flight worker
    → creates new Worker(transformer.worker.ts)
    → setLoading(true, 'Applying operation…')
    → postMessage({ operation, column, data: currentData, fillValue? })

transformer.worker.ts receives message
  → dispatches to pure function: dropNulls | trimWhitespace | parseDates | castToNumber | fillNulls
  → each fn returns { data: DataRow[], affected: number }
  → postMessage({ type:'complete', data, affected })

useTransformer.worker.onmessage
  ├── 'complete' → updateData(data) → setLoading(false) → onComplete(affected)
  └── 'error'   → setLoading(false) → onError(message)

useDataStore.updateData(rows)
  → data = rows   (originalData untouched)
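A sketch of the worker-side dispatch described above, covering three of the five operations. It is illustrative only: the request here carries a `columns` array so that "All" can be expressed as every column name, which is an assumption (the flow above shows a single `column` field), and `dispatchTransform` is a hypothetical name.

```typescript
type DataRow = Record<string, unknown>;
interface TransformResult { data: DataRow[]; affected: number }

const isNullish = (v: unknown): boolean => v === null || v === undefined || v === '';

// Removes rows where any target column is null-like; affected = rows removed.
function dropNulls(data: DataRow[], columns: string[]): TransformResult {
  const kept = data.filter((row) => !columns.some((c) => isNullish(row[c])));
  return { data: kept, affected: data.length - kept.length };
}

// Trims string cells; affected = rows where at least one cell changed.
function trimWhitespace(data: DataRow[], columns: string[]): TransformResult {
  let affected = 0;
  const out = data.map((row) => {
    const next: DataRow = { ...row };
    let changed = false;
    for (const c of columns) {
      const v = row[c];
      if (typeof v === 'string' && v.trim() !== v) {
        next[c] = v.trim();
        changed = true;
      }
    }
    if (changed) affected++;
    return next;
  });
  return { data: out, affected };
}

// Replaces null-like cells with a constant; affected = cells changed.
function fillNulls(data: DataRow[], columns: string[], fillValue: string | number): TransformResult {
  let affected = 0;
  const out = data.map((row) => {
    const next: DataRow = { ...row };
    for (const c of columns) {
      if (isNullish(row[c])) {
        next[c] = fillValue;
        affected++;
      }
    }
    return next;
  });
  return { data: out, affected };
}

// Hypothetical request shape (see the lead-in for the `columns` assumption).
type TransformRequest =
  | { operation: 'dropNulls' | 'trimWhitespace'; columns: string[]; data: DataRow[] }
  | { operation: 'fillNulls'; columns: string[]; data: DataRow[]; fillValue: string | number };

function dispatchTransform(req: TransformRequest): TransformResult {
  if (req.operation === 'dropNulls') return dropNulls(req.data, req.columns);
  if (req.operation === 'trimWhitespace') return trimWhitespace(req.data, req.columns);
  return fillNulls(req.data, req.columns, req.fillValue);
}
```

Because every function returns a new array and never mutates its input, the same code can be unit-tested on the main thread and run unchanged inside the worker.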

Export

User clicks Export CSV
  → useExport.exportCsv()
    → Papa.unparse(data, { columns })
    → new Blob([csv], { type: 'text/csv' })
    → URL.createObjectURL(blob)
    → programmatic <a> click → download
    → URL.revokeObjectURL(url)
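To show what the serialisation step produces before it is wrapped in a Blob, here is a simplified stand-in for `Papa.unparse`. It is not PapaParse's implementation; `escapeField` and `toCsv` are hypothetical names, and the quoting follows the usual RFC 4180 convention (quote fields containing a comma, quote, or line break, and double any embedded quotes).

```typescript
type DataRow = Record<string, unknown>;

// Quote a field only when it contains a delimiter, a quote, or a line break.
function escapeField(value: unknown): string {
  if (value === null || value === undefined) return '';
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Header row first, then one line per data row, in the given column order.
function toCsv(data: DataRow[], columns: string[]): string {
  const header = columns.map(escapeField).join(',');
  const lines = data.map((row) => columns.map((c) => escapeField(row[c])).join(','));
  return [header, ...lines].join('\r\n');
}
```

Passing `columns` explicitly, as the export flow above does, keeps the column order stable even when individual rows have keys in a different order or are missing some keys.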

Project Structure

zen-data/
├── public/
│   ├── favicon.svg
│   ├── icons.svg
│   └── zen-data.svg               # App logo
├── src/
│   ├── components/
│   │   ├── charts/
│   │   │   ├── AnalyticsPanel.tsx  # Full analytics view: null dist, value freq, histograms
│   │   │   └── DataSummary.tsx     # Compact live stats bar above the data grid
│   │   ├── grid/
│   │   │   └── DataGrid.tsx        # Virtualised, sortable, resizable table (TanStack)
│   │   ├── layout/
│   │   │   ├── Sidebar.tsx         # Responsive nav drawer (permanent ↔ temporary)
│   │   │   └── Topbar.tsx          # Fixed AppBar: dataset info + export + hamburger
│   │   ├── transform/
│   │   │   └── TransformPanel.tsx  # Data cleaning operations UI + dataset summary
│   │   └── upload/
│   │       └── UploadZone.tsx      # Drag-and-drop / file-picker ingestion UI
│   ├── hooks/
│   │   ├── useDataParser.ts        # Worker lifecycle: spawn → stream progress → complete
│   │   ├── useExport.ts            # PapaParse unparse → Blob → programmatic download
│   │   └── useTransformer.ts       # Worker lifecycle: spawn → apply op → update store
│   ├── store/
│   │   └── useDataStore.ts         # Zustand store: data, originalData, columns, loading
│   ├── theme/
│   │   └── theme.ts                # MUI dark enterprise theme + global component overrides
│   ├── utils/
│   │   └── aggregations.ts         # Pure aggregation functions (no side-effects)
│   ├── workers/
│   │   ├── dataParser.worker.ts    # PapaParse streaming + JSON array normaliser
│   │   └── transformer.worker.ts   # Pure transform fns: drop, trim, cast, dates, fill
│   ├── App.tsx                     # Root layout, view router, mobile drawer state
│   └── main.tsx                    # React 19 root mount + MUI ThemeProvider
├── index.html
├── vite.config.ts
├── tsconfig.app.json
├── eslint.config.js
└── package.json

Views

Import (`/upload`)

Drag-and-drop zone powered by react-dropzone. Accepts .csv and .json files up to 2 GB. Shows a loading spinner with a live row counter during parsing. Once parsing completes, the app automatically navigates to the Data Grid view.

Data Grid (`/grid`)

A headless, virtualised table built with TanStack Table v8 and TanStack Virtual v3.

Columns are defined dynamically from the dataset's header row (or JSON keys).
Sorting is handled client-side by TanStack Table's getSortedRowModel; click any column header to sort ascending/descending.
Column resizing is done via a drag handle on every header cell (columnResizeMode: 'onChange').
Virtualisation renders only the rows in the current viewport (~30 at a time) using absolute positioning and transform: translateY(...) for GPU-accelerated scrolling.
A row index column (#) is prepended automatically and is never sortable or resizable.
Null values are rendered as the text null in red italics to make data quality issues immediately visible.
Numeric values use tabular-nums font feature for aligned columns.
The DataSummary bar above the grid shows live null stats and rows-removed indicator.

Transform (`/transform`)

A two-panel layout (stacks vertically on mobile):

Left panel — Operation selector with a target-column dropdown and a fill-value input. Each operation card shows a description and an Apply button.
Right panel — Dataset summary stats (total rows, original rows, column count, null count) plus a per-column null chip grid. Clicking a chip sets that column as the target. A Reset to Original button appears whenever rows have been removed.

All operations run in the transformer Web Worker and report back the number of affected rows via a toast notification.

Analytics (`/charts`)

Four sections of Recharts visualisations:

Overview — StatBox cards for rows, columns, null cells, numeric column count, categorical column count.
Null distribution — Bar chart of null percentage per column (green < 5%, orange 5–20%, red > 20%) alongside a donut chart of overall completeness.
Value distribution — Top-25 value frequency bar chart for any selected column.
Numeric distribution — Descriptor stats (Min, Max, Mean, Median, Std Dev, Non-null count) and a 20-bin histogram for any numeric column.
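The null-distribution colouring can be sketched as a per-column percentage plus a threshold lookup. The `NullEntry` field names and the `nullBand` helper are hypothetical, and treating `""` as null is an assumption carried over from the transform operations.

```typescript
type DataRow = Record<string, unknown>;

// Hypothetical shape; the README only names the type `NullEntry`.
interface NullEntry { column: string; nullCount: number; pct: number }

// Assumption: null, undefined and "" all count as null.
const isNullish = (v: unknown): boolean => v === null || v === undefined || v === '';

function nullsByColumn(data: DataRow[], columns: string[]): NullEntry[] {
  return columns.map((column) => {
    let nullCount = 0;
    for (const row of data) {
      if (isNullish(row[column])) nullCount++;
    }
    const pct = data.length === 0 ? 0 : (nullCount / data.length) * 100;
    return { column, nullCount, pct };
  });
}

type Band = 'green' | 'orange' | 'red';

// green < 5%, orange 5–20% (both ends inclusive), red > 20%.
function nullBand(pct: number): Band {
  if (pct < 5) return 'green';
  if (pct <= 20) return 'orange';
  return 'red';
}
```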

Transformation Operations

All operations are implemented as pure functions in transformer.worker.ts and execute entirely off the main thread.

Operation	Target	Behaviour
Drop Null Rows	Single column or All	Filters out any row where the target column(s) contain `null`, `undefined`, or `""`. Reports rows removed.
Trim Whitespace	Single column or All	Strips leading and trailing whitespace from string values. Reports rows where at least one cell changed.
Cast to Number	Single column	Converts string values to a JavaScript `number` using `Number(value.replace(/,/g, ''))`. Values that cannot be parsed are set to `null`. Reports cells changed.
Parse Dates	Single column	Matches strings against ISO, `MM/DD/YYYY`, `DD-MM-YYYY`, and long-form patterns. Valid dates are normalised to ISO 8601 (`YYYY-MM-DD`). Invalid strings are left unchanged.
Fill Nulls	Single column or All	Replaces `null`, `undefined`, and `""` with a constant value you type in. Accepts strings and numbers. Reports cells changed.
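Two of the rows above have edge cases worth spelling out, so here is a hedged sketch of both. For Cast to Number, note that `Number('')` evaluates to `0`, so this sketch guards empty strings explicitly and maps them to `null`; that guard is an assumption, not confirmed behaviour of the worker. For Parse Dates, only the three numeric patterns are covered (long-form dates are omitted), and `parseDateString` and `toIso` are hypothetical helper names.

```typescript
type DataRow = Record<string, unknown>;
interface TransformResult { data: DataRow[]; affected: number }

// Every string cell becomes a number or null; affected = cells changed.
function castToNumber(data: DataRow[], column: string): TransformResult {
  let affected = 0;
  const out = data.map((row) => {
    const v = row[column];
    if (typeof v !== 'string') return row;
    const cleaned = v.replace(/,/g, '').trim();
    // Number('') is 0, so empty strings are treated as unparseable here.
    const n = cleaned === '' ? NaN : Number(cleaned);
    affected++;
    return { ...row, [column]: Number.isNaN(n) ? null : n };
  });
  return { data: out, affected };
}

// Builds YYYY-MM-DD, rejecting impossible dates such as 30 February.
function toIso(y: number, m: number, d: number): string | null {
  const dt = new Date(Date.UTC(y, m - 1, d));
  const valid =
    dt.getUTCFullYear() === y && dt.getUTCMonth() === m - 1 && dt.getUTCDate() === d;
  if (!valid) return null;
  const pad = (n: number, width: number): string => String(n).padStart(width, '0');
  return `${pad(y, 4)}-${pad(m, 2)}-${pad(d, 2)}`;
}

// Returns the normalised date, or null when the caller should leave the string unchanged.
function parseDateString(input: string): string | null {
  const t = input.trim();
  let m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(t); // ISO
  if (m) return toIso(Number(m[1]), Number(m[2]), Number(m[3]));
  m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(t); // MM/DD/YYYY
  if (m) return toIso(Number(m[3]), Number(m[1]), Number(m[2]));
  m = /^(\d{1,2})-(\d{1,2})-(\d{4})$/.exec(t); // DD-MM-YYYY
  if (m) return toIso(Number(m[3]), Number(m[2]), Number(m[1]));
  return null;
}
```

Using `Date.UTC` for validation keeps the result independent of the user's time zone, which matters when the output is a date without a time component.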

The originalData snapshot in the Zustand store is preserved across all transform operations. Resetting restores the full original dataset without re-parsing.

Analytics & Aggregations

All analytical computations live in src/utils/aggregations.ts as pure, side-effect-free functions. Their results are memoised with useMemo in the component layer.

Function	Description
`countByValue(data, column, topN=25)`	Builds a frequency map, sorts by count descending, returns top N `{ name, count }` entries. Nulls are represented as `"(null)"`.
`nullsByColumn(data, columns)`	Per-column null count and percentage as `NullEntry[]`. Used for the null distribution bar chart.
`numericStats(data, column)`	Min, max, mean, median (exact middle), population std dev, non-null count. Values are coerced from strings if needed. Returns `null` if no numeric values exist.
`histogramBins(data, column, bins=20)`	Equal-width bins over `[min, max]`. Each bin is `{ range, count, from, to }`. Returns `[]` if all values are identical.
`isNumericColumn(data, column)`	Samples up to 200 rows; classifies as numeric if > 80% of non-null values parse as a number. Used to split columns into numeric vs categorical for chart selectors.
`overallNullStats(data, columns)`	Total cells, null cells, and null percentage across the entire dataset. Used in the DataSummary bar and the Overview section.
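As an illustration of the two numeric functions in the table, the sketch below follows the documented signatures and return shapes. The helper `numericValues`, the averaging of the two middle values for an even-length median, and the format of the `range` label are assumptions.

```typescript
type DataRow = Record<string, unknown>;

// Collects finite numbers from a column, coercing numeric strings.
function numericValues(data: DataRow[], column: string): number[] {
  const out: number[] = [];
  for (const row of data) {
    const v = row[column];
    if (v === null || v === undefined || v === '') continue;
    const n = typeof v === 'number' ? v : Number(v);
    if (Number.isFinite(n)) out.push(n);
  }
  return out;
}

interface NumericStats {
  min: number; max: number; mean: number; median: number; stdDev: number; count: number;
}

function numericStats(data: DataRow[], column: string): NumericStats | null {
  const xs = numericValues(data, column).sort((a, b) => a - b);
  const n = xs.length;
  if (n === 0) return null;
  const at = (i: number): number => xs[i] ?? NaN;
  const mean = xs.reduce((s, x) => s + x, 0) / n;
  const mid = Math.floor(n / 2);
  // Assumption: even-length columns average the two middle values.
  const median = n % 2 === 1 ? at(mid) : (at(mid - 1) + at(mid)) / 2;
  // Population variance: divide by n, not n - 1.
  const variance = xs.reduce((s, x) => s + (x - mean) ** 2, 0) / n;
  // min/max come from the sorted ends, avoiding Math.min(...xs) on huge arrays.
  return { min: at(0), max: at(n - 1), mean, median, stdDev: Math.sqrt(variance), count: n };
}

interface Bin { range: string; count: number; from: number; to: number }

function histogramBins(data: DataRow[], column: string, bins = 20): Bin[] {
  const xs = numericValues(data, column);
  let min = Infinity;
  let max = -Infinity;
  for (const x of xs) {
    if (x < min) min = x;
    if (x > max) max = x;
  }
  if (xs.length === 0 || min === max) return []; // nothing to bin
  const width = (max - min) / bins;
  const counts = new Array<number>(bins).fill(0);
  for (const x of xs) {
    // The maximum value would land in bin `bins`; clamp it into the last bin.
    const idx = Math.min(bins - 1, Math.floor((x - min) / width));
    counts[idx] = (counts[idx] ?? 0) + 1;
  }
  return counts.map((count, i) => {
    const from = min + i * width;
    const to = from + width;
    return { range: `${from.toFixed(2)} to ${to.toFixed(2)}`, count, from, to };
  });
}
```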

Performance Design

Parsing large files

PapaParse's chunk callback processes the CSV in streaming chunks rather than reading the entire file into memory before parsing. Each chunk appends to an accumulated array and fires a progress message to the main thread, so the UI displays a live row count. The worker is terminated immediately after complete fires to free memory.
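The accumulate-and-report pattern can be sketched without the parser itself. The `ChunkResult` interface below mirrors only the `data` field of the result object PapaParse passes to a `chunk` callback; `createAccumulator` is a hypothetical name, and deriving `columns` from the first row's keys is a simplification made for this example.

```typescript
type DataRow = Record<string, unknown>;

// Only the part of a chunk result this sketch needs.
interface ChunkResult { data: DataRow[] }

type WorkerMessage =
  | { type: 'progress'; loaded: number }
  | { type: 'complete'; data: DataRow[]; columns: string[] };

// `post` stands in for the worker's postMessage.
function createAccumulator(post: (msg: WorkerMessage) => void) {
  const rows: DataRow[] = [];
  return {
    // Called once per streamed chunk: append rows, then report the running total.
    onChunk(results: ChunkResult): void {
      // A loop avoids the argument-count limit that push(...bigArray) can hit.
      for (const row of results.data) rows.push(row);
      post({ type: 'progress', loaded: rows.length });
    },
    // Called when the parser finishes: hand over the full dataset once.
    onComplete(): void {
      const columns = Object.keys(rows[0] ?? {});
      post({ type: 'complete', data: rows, columns });
    },
  };
}
```

In the worker, `onChunk` and `onComplete` would be wired to the `chunk` and `complete` options of `Papa.parse`, with `post` bound to the worker's `postMessage`.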

Virtualised rendering

TanStack Virtual calculates a virtualItems array containing only the rows within the visible viewport plus an overscan buffer of 15 rows. The table body is a single <Box> whose height equals rowVirtualizer.getTotalSize() (the full virtual height). Each visible row is absolutely positioned with transform: translateY(virtualRow.start) — a CSS property that triggers compositing rather than layout, enabling GPU-accelerated scrolling.

Total width of the table is calculated by summing all column sizes (table.getTotalSize()). The inner scroll wrapper is set to exactly this width, which causes the container's native horizontal scrollbar to appear when columns overflow the viewport.
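The windowing arithmetic behind this can be sketched for the fixed-row-height case. This is a simplification for illustration, not TanStack Virtual's implementation; `visibleRows` and its parameters are hypothetical, and the 36 px row height in the usage note is an assumed value.

```typescript
interface VirtualRow { index: number; start: number }

// Computes which rows to render and where to place them, assuming equal row heights.
function visibleRows(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 15,
): { totalSize: number; rows: VirtualRow[] } {
  // Full virtual height: this sizes the container so the scrollbar is accurate.
  const totalSize = rowCount * rowHeight;
  if (rowCount === 0) return { totalSize, rows: [] };

  // First and last rows that intersect the viewport.
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;

  // Extend by the overscan buffer, clamped to the dataset bounds.
  const from = Math.max(0, first - overscan);
  const to = Math.min(rowCount - 1, last + overscan);

  const rows: VirtualRow[] = [];
  for (let i = from; i <= to; i++) {
    // `start` is the offset used in transform: translateY(start px).
    rows.push({ index: i, start: i * rowHeight });
  }
  return { totalSize, rows };
}
```

With a 540 px viewport and 36 px rows, `visibleRows(0, 540, 36, 500000)` yields 30 rows (15 visible plus 15 of overscan below), regardless of how many rows the dataset holds.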

Re-render isolation

Every component that reads from the Zustand store uses a field-level selector:

const rowCount = useDataStore((s) => s.data.length) // only re-renders on length change
const isLoading = useDataStore((s) => s.isLoading) // independent subscription

React.memo is applied to all view components and layout components (DataGrid, AnalyticsPanel, TransformPanel, Topbar, Sidebar) so that a parent re-render does not cascade into them; they re-render only when their own props or their subscribed store slices change.

Aggregation functions (countByValue, numericStats, etc.) are wrapped in useMemo with [data, column] dependencies inside AnalyticsPanel so expensive O(n) scans do not re-run on unrelated state updates.
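The idea behind selector-based subscriptions can be shown with a tiny store written from scratch. This is a conceptual sketch, not Zustand's implementation or API: `createStore` and its `subscribe(selector, onChange)` signature are hypothetical and exist only to show why a subscriber to `data.length` is not notified when `isLoading` changes.

```typescript
type Listener = () => void;

function createStore<S extends object>(initial: S) {
  let state: S = initial;
  const listeners = new Set<Listener>();

  return {
    getState: (): S => state,

    // Shallow-merge a patch and notify every internal listener.
    setState(patch: Partial<S>): void {
      state = { ...state, ...patch };
      listeners.forEach((listener) => listener());
    },

    // Calls onChange only when the selected slice changes (Object.is comparison).
    subscribe<T>(selector: (s: S) => T, onChange: (value: T) => void): () => void {
      let prev = selector(state);
      const listener: Listener = () => {
        const next = selector(state);
        if (!Object.is(prev, next)) {
          prev = next;
          onChange(next);
        }
      };
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}
```

Selecting a primitive such as `s.data.length` is what makes the comparison cheap and stable. A selector that builds a new object or array on every call would compare as changed each time and defeat the isolation.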

Responsive Design

The layout adapts across three breakpoints using MUI's responsive sx system ({ xs, sm, md } values):

Breakpoint	Sidebar	Topbar	Main content
xs / sm (< 900 px)	Hidden; opens as a temporary drawer via hamburger button	Full width (`left: 0`), hamburger icon visible	Full width (`ml: 0`)
md+ (≥ 900 px)	Permanent fixed drawer (240 px)	Offset to the right of the sidebar	Left margin of 240 px

Additional responsive adjustments:

TransformPanel stacks its two panels vertically (flexDirection: column) below md.
UploadZone reduces inner and outer padding on small screens.
AnalyticsPanel scales padding and section gaps; the null-distribution bar chart becomes full-width before the pie chart on narrow viewports.
DataSummary bar scrolls horizontally on mobile to prevent stats from being clipped.
The Export CSV button label is hidden on xs (icon only) and the local-first chip is hidden on xs to keep the topbar uncluttered.

Getting Started

# Clone
git clone https://github.com/your-username/zen-data.git
cd zen-data

# Install dependencies
npm install

# Start development server (http://localhost:5173)
npm run dev

Scripts

Script	Command	Description
`dev`	`vite`	Starts the Vite dev server with HMR
`build`	`tsc -b && vite build`	Type-checks then produces an optimised production bundle
`preview`	`vite preview`	Serves the production build locally for verification
`lint`	`eslint .`	Runs ESLint with strict TypeScript rules
`format`	`prettier --write "src/**/*.{ts,tsx}"`	Formats all source files with Prettier

Type-check without building:

npx tsc -b --noEmit

Privacy

All data processing is strictly local-first:

Files are read directly in the browser using the native File API — no upload, no network request.
Parsing and transformation run inside Web Worker threads that are isolated to the browser tab.
The Zustand store holds data only in JavaScript memory for the lifetime of the browser tab.
Closing or refreshing the tab discards all data immediately.
No analytics, telemetry, or tracking of any kind is present in this application.
