Enrich and generate datasets directly inside a Google Sheet, backed by any data warehouse.
Analysts and ops teams live in spreadsheets, not query editors. The gap between "I have a list of ids and want the rest of the row" (or "I want a filtered list of records") and "someone runs a query for me and pastes the result back" is pure friction — and it repeats every day. This toolkit closes that gap: a menu and a couple of sidebars turn a Google Sheet into a front-end for your warehouse, without giving every analyst raw query access or building a bespoke internal tool from scratch.
It's shipped as a documented skeleton: real, runnable reference code
against an example schema, plus the docs to port it to your own backend. The
value is in the two engineering-heavy pieces most teams get wrong the first
time — safely streaming large results back into a browser-side script, and
deciding when a result is too big for a spreadsheet at all — solved once and
documented in depth in docs/engineering-notes.md.
┌────────────┐ menu / sidebar ┌──────────────┐
│ Google │ ───────────────────▶│ Apps Script │
│ Sheet │ │ (Code.gs) │
│ (UI) │◀─────────────────── │ │
└────────────┘ values + colors └──────┬───────┘
│ submit(mode, params)
│ poll(run_id)
│ fetch_result(run_id, output_ref)
▼
┌──────────────┐
│ DataBackend │ <-- the seam
│ (interface) │
└──────┬───────┘
│ implemented by
▼
┌──────────────┐
│ Adapter │
│ (Databricks, │
│ Snowflake…) │
└──────┬───────┘
│ runs a notebook / query
▼
┌──────────────┐
│ Query engine │
│ (Spark, SQL) │
└──────┬───────┘
│ reads/writes
▼
┌──────────────┐
│ Storage │
│ (table + │
│ result file)│
└──────────────┘
Full walkthrough, hop by hop: docs/architecture.md.
| Mode | What you do | What you get |
|---|---|---|
| Complete | Paste a column of ids into a sheet | Each id's row filled in with a fixed field set (name, status, region, tier, created_at); unmatched ids flagged and highlighted |
| Generate | Pick filters in a sidebar (status, region, tier, activity, created-at range) | A generated list of matching records, written to a sheet tab, an .xlsx, or a CSV — chosen automatically if the result is too large for the requested format |
| Expand | Pick metrics + a number of months, on top of an already-Completed sheet | Extra columns appended per metric per month (e.g. activity_metric_2026-06-01) |
Screenshot placeholder: a short GIF of pasting ids and running Complete, and of the Generate sidebar with filters checked, belongs here once this is deployed against a real sheet.
Every mode operates against one example table, entities — see
schema/example-schema.md.
-
Copy the config template.
cp apps-script/config.example.gs apps-script/config.gsFill in
backendHost,backendToken,clusterId.config.gsis git-ignored — it holds your API token. -
Deploy the Apps Script project. Open your target Google Sheet → Extensions → Apps Script, and add
Code.gs,config.gs,Sidebar.html,SidebarExpand.html, andappsscript.json. Reload the sheet; a "Sheet Data Toolkit" menu should appear. -
Create the
entitiestable (or point at your own equivalent) in your warehouse — seeschema/example-schema.md. -
Port the three notebooks under
backend/notebooks/into your workspace, and point an adapter (start frombackend/adapters/databricks.py) at your platform's job/query API. -
Create the sheet tabs referenced in
config.gs'sSHEET_NAMES(Complete #1/#2,Generate #1/#2by default).
Full step-by-step, including a Snowflake/BigQuery adapter sketch:
docs/implementing-your-backend.md.
sheet-data-toolkit/
├── apps-script/
│ ├── Code.gs # menu, orchestration, streaming reader, batch writer
│ ├── config.example.gs # copy to config.gs and fill in your backend
│ ├── appsscript.json # Apps Script manifest
│ ├── Sidebar.html # Generate mode UI
│ └── SidebarExpand.html # Expand mode UI
├── backend/
│ ├── interface.py # DataBackend contract + destination cap logic
│ ├── test_interface.py
│ ├── adapters/
│ │ ├── databricks.py # reference adapter (Jobs API + DBFS)
│ │ └── README.md # how to write your own adapter
│ └── notebooks/
│ ├── complete_records.py # Complete mode
│ ├── generate_list.py # Generate mode
│ └── expand_metrics.py # Expand mode
├── schema/
│ └── example-schema.md # the `entities` table every mode references
├── examples/
│ └── field-catalog.example.json # filter <-> field <-> widget data contract
└── docs/
├── architecture.md
├── request-response-contract.md
├── implementing-your-backend.md
└── engineering-notes.md
docs/architecture.md— the full flow, hop by hop, and where theDataBackendseam sits.docs/request-response-contract.md— the exact request/response JSON for each mode.docs/implementing-your-backend.md— step-by-step guide to porting this to your own stack.docs/engineering-notes.md— the five patterns worth understanding before you editCode.gsor the notebooks: the streaming UTF-8-safe reader, the batch write with found/not-found reorder, the dynamic filter sidebar, the per-destination cap with auto-fallback to CSV, and the distributed export withoutcollect().
The only backend-specific file is the adapter. If you're not on Databricks,
start at docs/implementing-your-backend.md
and backend/adapters/README.md — both include
sketches for Snowflake and BigQuery adapters that satisfy the same
submit/poll/fetch_result contract.
This toolkit is generalized from an internal production tool built at a
company for its data analysts, with every company-specific detail —
hostnames, table names, credentials, business domain — removed or replaced
with the generic entities example used throughout this repo. The
architecture, the three modes, and the five engineering patterns in
docs/engineering-notes.md are the same ones that ran in production; only
the identifying details changed.
Apache-2.0 — see LICENSE.